The 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing
Co-located with the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)
Workshop Objective
As artificial intelligence and other forms of cognitive computing continue to proliferate into new domains, many forums for dialogue and knowledge sharing have emerged. This workshop focuses primarily on energy-efficient techniques and architectures for cognitive computing and machine learning, particularly for applications and systems running at the edge. In such resource-constrained environments, performance alone is never sufficient: system designers must carefully balance performance with power, energy, and area (the overall PPA metric).
The goal of this workshop is to provide a forum for researchers exploring novel ideas in energy-efficient machine learning and artificial intelligence for a variety of applications. We also hope to provide a solid platform for forging relationships and exchanging ideas between industry and academia through discussions and active collaborations.
Call for Papers
A new wave of intelligent computing, driven by recent advances in machine learning and cognitive algorithms coupled with process technology and new design methodologies, has the potential to usher in unprecedented disruption in the way conventional computing solutions are designed and deployed. These new and innovative approaches often provide an attractive and efficient alternative not only in terms of performance but also in power, energy, and area. This disruption is visible across the whole spectrum of computing systems, ranging from low-end mobile devices to large-scale data centers and servers.
A key class of these intelligent solutions provides real-time, on-device cognition at the edge, enabling many novel applications including vision and image processing, language translation, autonomous driving, malware detection, and gesture recognition. Naturally, these applications have diverse requirements for performance, energy, reliability, accuracy, and security that demand a holistic approach to designing the hardware, software, and intelligence algorithms to achieve the best power, performance, and area (PPA).
Topics for the Workshop
- Architectures for the edge: IoT, automotive, and mobile
- Approximation, quantization, and reduced-precision computing
- Hardware/software techniques for sparsity
- Neural network architectures for resource constrained devices
- Neural network pruning, tuning, and automatic architecture search
- Novel memory architectures for machine learning
- Communication/computation scheduling for better performance and energy
- Load balancing and efficient task distribution techniques
- Exploring the interplay between precision, performance, power and energy
- Exploration of new and efficient applications for machine learning
- Characterization of machine learning benchmarks and workloads
- Performance profiling and synthesis of workloads
- Simulation and emulation techniques, frameworks and platforms for machine learning
- Power, performance and area (PPA) based comparison of neural networks
- Verification, validation and determinism in neural networks
- Efficient on-device learning techniques
- Security, safety and privacy challenges and building secure AI systems
Yann LeCun is VP & Chief AI Scientist at Facebook and Silver Professor at NYU, affiliated with the Courant Institute of Mathematical Sciences & the Center for Data Science. He was the founding Director of Facebook AI Research and of the NYU Center for Data Science. He received an Engineering Diploma from ESIEE (Paris) and a PhD from Sorbonne Université. After a postdoc in Toronto, he joined AT&T Bell Labs in 1988, and AT&T Labs in 1996 as Head of Image Processing Research. He joined NYU as a professor in 2003 and Facebook in 2013. His interests include AI, machine learning, computer perception, robotics, and computational neuroscience. He is a member of the National Academy of Engineering and the recipient of the 2018 ACM Turing Award (with Geoffrey Hinton and Yoshua Bengio) for “conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing”.
Cheap, Fast, and Low Power Deep Learning: I need it now!
In this talk I will describe the need for low-power machine learning systems. I will motivate this by describing several current projects at Purdue University that need energy-efficient deep learning; in some cases, real deployment of these methods will not be possible without lower-power solutions. The applications include precision farming, health care monitoring, and edge-based surveillance.
Edward J. Delp is currently The Charles William Harrison Distinguished Professor of Electrical and Computer Engineering and Professor of Biomedical Engineering at Purdue University. His research interests include image and video processing, image analysis, computer vision, machine learning, image and video compression, multimedia security, medical imaging, multimedia systems, communication and information theory. Dr. Delp is a Life Fellow of the IEEE. In 2004 Dr. Delp received the Technical Achievement Award from the IEEE Signal Processing Society for his work in image and video compression and multimedia security. In 2008 Dr. Delp received the Society Award from the IEEE Signal Processing Society.
Efficient Computing for AI and Robotics
Computing near the sensor is preferred over the cloud due to privacy and/or latency concerns for a wide range of applications including robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. However, at the sensor there are often stringent constraints on energy consumption and cost in addition to the throughput and accuracy requirements of the application. In this talk, we will describe how joint algorithm and hardware design can be used to reduce energy consumption while delivering real-time and robust performance for applications including deep learning, computer vision, autonomous navigation/exploration and video/image processing. We will show how energy-efficient techniques that exploit correlation and sparsity to reduce compute, data movement and storage costs can be applied to various tasks including image classification, depth estimation, super-resolution, localization and mapping.
Vivienne Sze is an Associate Professor at MIT in the Electrical Engineering and Computer Science Department. Her research interests include energy-aware signal processing algorithms, and low-power circuit and system design for portable multimedia applications, including computer vision, deep learning, autonomous navigation, and video processing/coding. Prior to joining MIT, she was a Member of Technical Staff in the R&D Center at TI, where she designed low-power algorithms and architectures for video coding. She also represented TI in the JCT-VC committee of the ITU-T and ISO/IEC standards bodies during the development of High Efficiency Video Coding (HEVC), which received a Primetime Engineering Emmy Award. She is a co-editor of the book entitled “High Efficiency Video Coding (HEVC): Algorithms and Architectures” (Springer, 2014).
Prof. Sze received the B.A.Sc. degree from the University of Toronto in 2004, and the S.M. and Ph.D. degrees from MIT in 2006 and 2010, respectively. In 2011, she received the Jin-Au Kong Outstanding Doctoral Thesis Prize in Electrical Engineering at MIT. She is a recipient of the 2018 Facebook Faculty Award, the 2018 & 2017 Qualcomm Faculty Award, the 2018 & 2016 Google Faculty Research Award, the 2016 AFOSR Young Investigator Research Program (YIP) Award, the 2016 3M Non-Tenured Faculty Award, the 2014 DARPA Young Faculty Award, the 2007 DAC/ISSCC Student Design Contest Award, and a co-recipient of the 2018 Symposium on VLSI Circuits Best Student Paper Award, the 2017 CICC Outstanding Invited Paper Award, the 2016 IEEE Micro Top Picks Award and the 2008 A-SSCC Outstanding Design Award.
For more information about research in the Energy-Efficient Multimedia Systems Group at MIT visit: http://www.rle.mit.edu/eems/
Adaptive Multi-Task Neural Networks for Efficient Inference
Very deep convolutional neural networks have shown remarkable success in many computer vision tasks, yet their computational expense limits their impact in domains where fast inference is essential. While there has been significant progress on model compression and acceleration, most methods rely on a one-size-fits-all network, where the same set of features is extracted for all images or tasks, regardless of their complexity. In this talk, I will first describe an approach called BlockDrop, which learns to dynamically choose which layers of a deep network to execute during inference, depending on the image complexity, so as to best reduce total computation without degrading prediction accuracy. Then, I will show how this approach can be extended to design compact multi-task networks, where a different set of layers is executed depending on the task complexity, and the level of feature sharing across tasks is automatically determined to maximize both the accuracy and efficiency of the model. Finally, I will conclude the talk by presenting an efficient multi-scale neural network model, which achieves state-of-the-art results in terms of accuracy and FLOPs reduction on standard benchmarks such as the ImageNet dataset.
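To make per-input layer skipping concrete, here is a minimal sketch, not the BlockDrop implementation. It assumes scalar activations and toy residual branches for brevity, and `toy_policy` is a hypothetical threshold rule standing in for BlockDrop's learned policy network. What it does show is that a skipped residual block reduces to the identity, so the forward pass stays well-defined whichever blocks are dropped.

```python
from typing import Callable, List, Sequence

# A residual branch f(x); scalar activations keep the sketch short.
Block = Callable[[float], float]


def run_resnet(x: float, blocks: Sequence[Block], keep: Sequence[bool]) -> float:
    """Run only the residual blocks selected by a per-input keep mask.

    A residual block computes x + f(x), so skipping it leaves x unchanged:
    dropping any subset of blocks still yields a valid forward pass.
    """
    for f, k in zip(blocks, keep):
        if k:
            x = x + f(x)
    return x


def toy_policy(x: float, num_blocks: int, threshold: float = 1.0) -> List[bool]:
    """Hypothetical stand-in for a learned policy network.

    Treats small-magnitude inputs as "easy" and runs only the first block;
    every other input runs all blocks.
    """
    if abs(x) < threshold:
        return [i == 0 for i in range(num_blocks)]
    return [True] * num_blocks


blocks = [lambda v: 0.5 * v, lambda v: 0.1, lambda v: -0.2 * v]
for sample in (0.5, 2.0):
    mask = toy_policy(sample, len(blocks))
    print(sample, mask, run_resnet(sample, blocks, mask))
```

The compute saving comes entirely from the mask: an "easy" input here evaluates one branch instead of three.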
Rogerio Schmidt Feris is the head of computer vision and multimedia research at IBM T.J. Watson Research Center. He joined IBM in 2006 after receiving a Ph.D. from the University of California, Santa Barbara. He has also worked as an Affiliate Associate Professor at the University of Washington and as an Adjunct Associate Professor at Columbia University. His work has not only been published in top AI conferences, but has also been integrated into multiple IBM products, including Watson Visual Recognition, Watson Media, and Intelligent Video Analytics. He currently serves as an Associate Editor of TPAMI, has served as a Program Chair of WACV 2017, and as an Area Chair of conferences such as NeurIPS, CVPR, and ICCV.
Efficient Algorithms to Accelerate Deep Learning on Edge Devices
Efficient deep learning computing requires algorithm and hardware co-design to enable specialization. However, the extra degree of freedom creates a much larger design space. We propose AutoML techniques to architect efficient neural networks. We investigate automatically designing small and fast models (ProxylessNAS), auto channel pruning (AMC), and auto mixed-precision quantization (HAQ). We demonstrate that such learning-based, automated design achieves better performance and efficiency than rule-based human design. Moreover, we shorten the design cycle by 200× compared with previous work, so that we can afford to search for efficient, specialized neural network models for different hardware platforms. We accelerate computation-intensive AI applications, including the Temporal Shift Module (TSM) for efficient video recognition and PVCNN for efficient 3D recognition on point clouds. Finally, we’ll describe scalable distributed training and the potential security issues of efficient deep learning.
Song Han is an assistant professor at MIT EECS. Dr. Han received the Ph.D. degree in Electrical Engineering from Stanford, advised by Prof. Bill Dally. Dr. Han’s research focuses on efficient deep learning computing. He proposed “Deep Compression” and the “EIE Accelerator”, which impacted the industry. His work received best paper awards at ICLR’16 and FPGA’17. He was the co-founder and chief scientist of DeePhi Tech, which was acquired by Xilinx.
Putting the “Machine” Back in Machine Learning: The Case for Hardware-ML Model Co-design
Machine learning (ML) applications have entered and impacted our lives unlike any other recent technological advance. Indeed, almost every aspect of how we live or interact with others relies on or uses ML, for applications ranging from image classification and object detection to processing multi‐modal and heterogeneous datasets. While the holy grail for judging the quality of an ML model has largely been serving accuracy, and only recently its resource usage, neither of these metrics translates directly to energy efficiency, runtime, or mobile device battery lifetime. This talk will uncover the need for building accurate, platform‐specific power and latency models for convolutional neural networks (CNNs) and efficient hardware-aware CNN design methodologies, thus allowing machine learners and hardware designers to identify not just the most accurate NN configuration, but also those that satisfy given hardware constraints. Our proposed modeling framework is applicable to both high‐end and mobile platforms and achieves 88.24% accuracy for latency, 88.34% for power, and 97.21% for energy prediction. Using similar predictive models, we demonstrate a novel differentiable neural architecture search (NAS) framework, dubbed Single-Path NAS, that uses one single-path over-parameterized CNN to encode all architectural decisions based on shared convolutional kernel parameters. Single-Path NAS achieves state-of-the-art top-1 ImageNet accuracy (75.62%), outperforming existing mobile NAS methods under similar latency constraints (∼80ms), and finds the final configuration up to 5,000× faster than prior work.
Combined with our quantized CNNs (Flexible Lightweight CNNs, or FLightNNs) that customize precision level in a layer-wise fashion and achieve almost iso-accuracy at 5-10x energy reduction, such a modeling, analysis, and optimization framework is poised to lead to true co-design of hardware and ML models, orders of magnitude faster than the state of the art, while satisfying both accuracy and latency or energy constraints.
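The layer-wise precision knob can be sketched as follows. Plain symmetric uniform quantization with one max-abs scale per layer is used purely as a stand-in; it is an assumption and not the quantization scheme FLightNNs actually use. The function names are hypothetical, and the sketch only shows that each layer gets its own bit width, which sets both its rounding error and its weight-storage cost.

```python
from typing import Dict, List, Tuple


def quantize_layer(weights: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantization with one max-abs scale per layer.

    Returns de-quantized values ("fake quantization") so the rounding error
    is easy to inspect.
    """
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits, 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:                       # all-zero layer: nothing to quantize
        return list(weights)
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def quantize_model(
    layers: Dict[str, List[float]], bits: Dict[str, int]
) -> Tuple[Dict[str, List[float]], int]:
    """Quantize every layer at its own bit width.

    Returns the quantized weights and the total weight storage in bits.
    """
    quantized = {name: quantize_layer(w, bits[name]) for name, w in layers.items()}
    storage_bits = sum(len(w) * bits[name] for name, w in layers.items())
    return quantized, storage_bits


layers = {"conv1": [0.5, -1.0, 0.25], "fc": [0.1, 0.2, -0.3, 0.4]}
bit_widths = {"conv1": 8, "fc": 4}     # hypothetical per-layer precision choice
q, storage = quantize_model(layers, bit_widths)
print(storage)  # 3 * 8 + 4 * 4 = 40 bits
```

Lowering a layer's bit width halves or quarters its storage but widens its rounding step, which is the trade-off a per-layer precision search navigates.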
Diana Marculescu is the David Edward Schramm Professor of Electrical and Computer Engineering at Carnegie Mellon University and the incoming Chair of the Department of Electrical and Computer Engineering at University of Texas at Austin (starting December 2019). Diana is the Founding Director of the College of Engineering Center for Faculty Success at Carnegie Mellon University (since 2015) and has served as Associate Department Head for Academic Affairs in Electrical and Computer Engineering (2014-2018). She received the Dipl.Ing. degree in computer science from the Polytechnic University of Bucharest, Bucharest, Romania (1991), and the Ph.D. degree in computer engineering from the University of Southern California, Los Angeles, CA (1998). Her research interests include energy- and reliability-aware computing, hardware aware machine learning, and computing for sustainability and natural science applications. Diana was a recipient of the National Science Foundation Faculty Career Award (2000-2004), the ACM SIGDA Technical Leadership Award (2003), the Carnegie Institute of Technology George Tallman Ladd Research Award (2004), and several best paper awards. She was an IEEE Circuits and Systems Society Distinguished Lecturer (2004-2005) and the Chair of the Association for Computing Machinery (ACM) Special Interest Group on Design Automation (2005-2009). Diana chaired several conferences and symposia in her area and is currently an Associate Editor for IEEE Transactions on Computers. She was selected as an ELATE Fellow (2013-2014), and is the recipient of an Australian Research Council Future Fellowship (2013-2017), the Marie R. Pistilli Women in EDA Achievement Award (2014), and the Barbara Lazarus Award from Carnegie Mellon University (2018). Diana is an IEEE Fellow and an ACM Distinguished Scientist.
Advances and Prospects for In-memory Computing
Edge AI applications retain the need for high-performing inference models, while driving platforms beyond their limits of energy efficiency and throughput. Digital hardware acceleration, enabling 10-100x gains over general-purpose architectures, is already widely deployed, but is ultimately restricted by the data movement and memory accesses that dominate deep-learning computations. In-memory computing, based on both SRAM and emerging memory, offers fundamentally new tradeoffs for overcoming these barriers, with the potential for 10x higher energy efficiency and area-normalized throughput demonstrated in recent designs. But those tradeoffs introduce new challenges, especially affecting scaling to the level of computation required, integration into practical heterogeneous architectures, and mapping of diverse software. This talk examines those tradeoffs to characterize the challenges. It then explores recent research that provides promising paths forward, making in-memory computing more of a practical reality than ever before.
Naveen Verma received the B.A.Sc. degree in Electrical and Computer Engineering from the University of British Columbia (UBC), Vancouver, Canada in 2003, and the M.S. and Ph.D. degrees in Electrical Engineering from MIT in 2005 and 2009 respectively. Since July 2009 he has been a faculty member at Princeton University. His research focuses on advanced sensing systems, exploring how systems for learning, inference, and action planning can be enhanced by algorithms that exploit new sensing and computing technologies. This includes research on large-area, flexible sensors, energy-efficient statistical-computing architectures and circuits, and machine-learning and statistical-signal-processing algorithms. Prof. Verma has served as a Distinguished Lecturer of the IEEE Solid-State Circuits Society, and currently serves on the technical program committees for ISSCC, VLSI Symp., DATE, and IEEE Signal-Processing Society (DISPS).
Algorithm-Accelerator Co-Design for Neural Network Specialization
In recent years, machine learning (ML) with deep neural networks (DNNs) has been widely deployed in diverse application domains. However, the growing complexity of DNN models, the slowdown of technology scaling, and the proliferation of edge devices are driving a demand for higher DNN performance and energy efficiency. ML applications have shifted from general-purpose processors to dedicated hardware accelerators in both academic and commercial settings. In line with this trend, there has been an active body of research on both algorithms and hardware architectures for neural network specialization.
This talk presents our recent investigation into DNN optimization and low-precision quantization, using a co-design approach featuring contributions to both algorithms and hardware accelerators. First, we review static network pruning techniques and show a fundamental link between group convolutions and circulant matrices – two previously disparate lines of research in DNN compression. Then we discuss channel gating, a dynamic, fine-grained, and trainable technique for DNN acceleration. Unlike static approaches, channel gating exploits input-dependent dynamic sparsity at run time. This results in a significant reduction in compute cost with a minimal impact on accuracy. Finally, we present outlier channel splitting, a technique to improve DNN weight quantization by removing outliers from the weight distribution without retraining.
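The basic splitting step behind outlier channel splitting can be sketched for a single fully connected layer (y = Wx), as we read it from the abstract. This is an illustration under stated assumptions, not the speakers' implementation, and the helper names are hypothetical: the input column holding the largest-magnitude weight is halved and duplicated, and the matching input activation is duplicated too, so the layer output is unchanged while the weight range a quantizer must cover shrinks.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # weights indexed as W[output][input]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Compute y = W x for a fully connected layer."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def split_outlier_channel(w: Matrix) -> Tuple[Matrix, int]:
    """Halve the input column holding the largest |weight| and append a copy.

    Returns the widened weight matrix and the index of the split input
    channel; the caller must duplicate that entry of the input vector.
    """
    n_in = len(w[0])
    col = max(range(n_in), key=lambda j: max(abs(row[j]) for row in w))
    widened = []
    for row in w:
        new_row = row[:]               # copy so the original weights are untouched
        new_row[col] = row[col] / 2
        new_row.append(row[col] / 2)   # the duplicate carries the other half
        widened.append(new_row)
    return widened, col


w = [[0.1, -0.2, 2.0],
     [0.3, 0.05, -0.4]]
x = [1.0, 2.0, 3.0]
w_split, j = split_outlier_channel(w)
x_split = x + [x[j]]                   # duplicate the matching input activation
print(matvec(w, x), matvec(w_split, x_split))       # equal up to float rounding
print(max(abs(v) for row in w_split for v in row))  # largest |w| halves: 1.0
```

Because the outlier no longer sets the quantization range, a max-abs quantizer can use a finer step for every other weight in the layer, at the cost of one extra channel.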
Zhiru Zhang is an Associate Professor in the School of ECE at Cornell University. His current research investigates new algorithms, design methodologies, and automation tools for heterogeneous computing. His research has been recognized with a Google Faculty Research Award (2018), the DAC Under-40 Innovators Award (2018), the Rising Professional Achievement Award from the UCLA Henry Samueli School of Engineering and Applied Science (2018), a DARPA Young Faculty Award (2015), the IEEE CEDA Ernest S. Kuh Early Career Award (2015), an NSF CAREER Award (2015), the Ross Freeman Award for Technical Innovation from Xilinx (2012), and multiple best paper awards and nominations. Prior to joining Cornell, he was a co-founder of AutoESL, a high-level synthesis start-up later acquired by Xilinx.
EMC2 Model Compression Challenge (EMCC)
Deep learning has recently pushed the boundaries of the state of the art in many computer vision tasks. However, existing deep learning models are both computationally and memory intensive, making them difficult to deploy on devices with limited compute and memory resources. To fit these models on such devices, novel compression techniques are needed that do not significantly decrease model accuracy.
The EMC2 Model Compression Challenge (EMCC) aims to identify the best technology for deep learning model compression. To win a prize in EMCC, a solution is evaluated against the two metrics below:
- Achieve the highest accuracy within the target model size. The submission will not be evaluated if the model size is outside the target range.
- Achieve the smallest model size within the target accuracy. The submission will not be evaluated if the accuracy is outside the target range.
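The two evaluation rules above amount to filtering submissions by one constraint and sorting by the other metric. The sketch below is an illustration, not the official scoring code: it assumes each target is a one-sided bound (size at most the target, accuracy at least the target), and the `Submission` record and tie-breaking rules are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submission:
    team: str
    accuracy: float   # e.g. Top-1 accuracy as a fraction in [0, 1]
    size_bytes: int   # size of the submitted model


def rank_by_accuracy(subs: List[Submission], max_size_bytes: int) -> List[Submission]:
    """Metric 1: highest accuracy among submissions that fit the size target."""
    eligible = [s for s in subs if s.size_bytes <= max_size_bytes]
    # Hypothetical tie-break: smaller model wins on equal accuracy.
    return sorted(eligible, key=lambda s: (-s.accuracy, s.size_bytes))


def rank_by_size(subs: List[Submission], min_accuracy: float) -> List[Submission]:
    """Metric 2: smallest model among submissions that reach the accuracy target."""
    eligible = [s for s in subs if s.accuracy >= min_accuracy]
    # Hypothetical tie-break: higher accuracy wins on equal size.
    return sorted(eligible, key=lambda s: (s.size_bytes, -s.accuracy))


subs = [
    Submission("A", 0.712, 3_400_000),
    Submission("B", 0.730, 3_900_000),  # too large for metric 1
    Submission("C", 0.701, 2_100_000),
]
print([s.team for s in rank_by_accuracy(subs, 3_500_000)])  # ['A', 'C']
print([s.team for s in rank_by_size(subs, 0.70)])           # ['C', 'A', 'B']
```

Note that the same submission can rank differently under the two metrics: "B" is the most accurate model but is ineligible for the size-constrained ranking.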
A participant (or a team) can submit a single model in TensorFlow (https://www.tensorflow.org/) or PyTorch (https://pytorch.org/). Final scores will be computed after submissions close.
There are three tracks; each participant can submit to any or all of them.
- ImageNet Classification: This category focuses on image classification models.
- COCO Object Detection: This category focuses on object detection models.
- PASCAL Object Segmentation: This category focuses on object segmentation models.
Each track uses the following dataset:
- ImageNet Classification: ILSVRC 2012 classification dataset at http://image-net.org/challenges/LSVRC/2012/index#data
- COCO Object Detection: COCO 2017 detection dataset at http://cocodataset.org
- PASCAL Object Segmentation: PASCAL 2012 object segmentation dataset at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
Submissions are evaluated based on the following metrics:
- Highest accuracy within the target model size. Each target size is based on the size of a state-of-the-art DL model, with an additional 5% size budget allowed:
  - ImageNet 2012 Classification: Top-1 accuracy under a target model size of 3.5M bytes, estimated from an 8-bit quantized MobileNetV2 model.
  - COCO 2017 Object Detection: COCO metrics under a target model size of 6.2M bytes, estimated from an 8-bit quantized MobileNetV2-SSD model.
  - PASCAL 2012 Object Segmentation: mIoU under a target model size of 2.1M bytes, based on an 8-bit quantized MobileNetV2-DeepLab model.
- Smallest model size within the target accuracy. Each target accuracy is based on the accuracy of a state-of-the-art DL model, with an additional 5% accuracy budget allowed:
  - ImageNet 2012 Classification: Smallest model size with a target Top-1 accuracy of 70%, estimated from an 8-bit quantized MobileNetV2 model.
  - COCO 2017 Object Detection: Smallest model size with a target mAP of 26% (COCO metrics), estimated from an 8-bit quantized MobileNetV2-SSD model.
  - PASCAL 2012 Object Segmentation: Smallest model size with a target mIoU of 70%, based on an 8-bit quantized MobileNetV2-DeepLab model.
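As a rough way to check a model against the size targets above, weight storage can be estimated as parameter count times bits per parameter. This is a sketch under assumptions, not the challenge's measurement procedure: it ignores metadata, quantization scales, and file-format overhead; it treats "3.5M bytes" as 3.5 × 10^6 bytes; and because the page does not say whether the published targets already include the 5% allowance, the `slack` parameter is our assumption (set it to 0 if the target is already the hard cap).

```python
def model_size_bytes(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in bytes (ignores metadata and scales)."""
    return num_params * bits_per_param / 8


def within_size_budget(size_bytes: float, target_bytes: float,
                       slack: float = 0.05) -> bool:
    """True if size fits the target plus a fractional slack (assumed 5%)."""
    return size_bytes <= target_bytes * (1 + slack)


IMAGENET_TARGET = 3_500_000  # 3.5M bytes, from the ImageNet track above

for bits in (8, 6):
    size = model_size_bytes(4_000_000, bits)  # hypothetical 4M-parameter model
    print(bits, size, within_size_budget(size, IMAGENET_TARGET))
```

For this hypothetical model, 8-bit weights need about 4.0M bytes and miss the ImageNet target even with the slack, while 6-bit weights (3.0M bytes) fit.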
3. Benchmark Environment and Input
Submissions will be run with TensorFlow 1.14.0 or PyTorch 1.1.0. The inputs are ImageNet, COCO, or PASCAL images.
4. Output and Model Conversion
- ImageNet Classification: The output must be a tensor encoding probabilities of the 1000 classes. (Labels are available here.)
- COCO Object Detection: The output must be the bounding boxes and the probabilities of the 80 classes.
- PASCAL Object Segmentation: The output must be the segmentation mask.
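For the segmentation track, the predicted mask is scored with mIoU. The sketch below computes it from flattened per-pixel class labels and is an illustration, not the official evaluation code. It assumes the PASCAL VOC convention of labelling void/boundary pixels 255 in the ground truth, assumes predicted labels lie in `[0, num_classes)`, and averages over classes that appear in either mask. It scores a single image for brevity, whereas standard PASCAL evaluation accumulates intersection and union counts over the whole evaluation set before averaging.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean intersection-over-union over classes present in pred or target.

    `pred` and `target` are flattened per-pixel labels of equal length;
    pixels labelled `ignore_index` in `target` are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1     # pixel counts toward both intersection and union
            union[t] += 1
        else:
            union[p] += 1     # a mismatch enlarges the union of both classes
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


print(mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2))  # 0.5
```

In the example, each class has one correctly labelled pixel out of a two-pixel union, and the last pixel is ignored.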
5. Timeline and Submission
Please see the submission page for dates and details.