TY - GEN
T1 - Accelerating deep neural network computation on a low power reconfigurable architecture
AU - Xiong, Y.
AU - Zhou, J.
AU - Pal, S.
AU - Blaauw, D.
AU - Kim, H. S.
AU - Mudge, T.
AU - Dreslinski, R.
AU - Chakrabarti, C.
N1 - Funding Information:
This work investigated the performance of commonly used kernels in DNN computations under different cache mode configurations on a low-power reconfigurable architecture with two levels of cache hierarchy. All kernels were mapped onto this architecture, achieving greater than 90% core utilization in most cases. Kernel-level evaluation showed that the private-private mode has the lowest execution time. End-to-end implementations of ResNet, AlexNet and AWD LSTM RNN achieved very high performance of 188.19, 150.53 and 120.68 GOPS/W, respectively, in the 14 nm node. Acknowledgment: The material is based on research sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under agreement number FA8650-18-2-7864. The views and conclusions contained herein are those of the authors and do not represent the official policies or endorsements, either expressed or implied, of AFRL and DARPA or the U.S. Government.
Publisher Copyright:
© 2020 IEEE
PY - 2020
Y1 - 2020
N2 - Recent work on neural network architectures has focused on bridging the gap between performance/efficiency and programmability. We consider implementations of three popular neural networks, ResNet, AlexNet and ASGD weight-dropped Recurrent Neural Network (AWD RNN) on a low power programmable architecture, Transformer. The architecture consists of light-weight cores interconnected by caches and crossbars that support run-time reconfiguration between shared and private cache mode operations. We present efficient implementations of key neural network kernels and evaluate the performance of each kernel when operating in different cache modes. The best-performing cache modes are then used in the implementation of the end-to-end network. Simulation results show superior performance with ResNet, AlexNet and AWD RNN achieving 188.19 GOPS/W, 150.53 GOPS/W and 120.68 GOPS/W, respectively, in the 14 nm technology node.
AB - Recent work on neural network architectures has focused on bridging the gap between performance/efficiency and programmability. We consider implementations of three popular neural networks, ResNet, AlexNet and ASGD weight-dropped Recurrent Neural Network (AWD RNN) on a low power programmable architecture, Transformer. The architecture consists of light-weight cores interconnected by caches and crossbars that support run-time reconfiguration between shared and private cache mode operations. We present efficient implementations of key neural network kernels and evaluate the performance of each kernel when operating in different cache modes. The best-performing cache modes are then used in the implementation of the end-to-end network. Simulation results show superior performance with ResNet, AlexNet and AWD RNN achieving 188.19 GOPS/W, 150.53 GOPS/W and 120.68 GOPS/W, respectively, in the 14 nm technology node.
UR - http://www.scopus.com/inward/record.url?scp=85097844379&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097844379&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85097844379
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - 2020 IEEE International Symposium on Circuits and Systems, ISCAS 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 52nd IEEE International Symposium on Circuits and Systems, ISCAS 2020
Y2 - 10 October 2020 through 21 October 2020
ER -