TY - JOUR
T1 - An energy-efficient deep convolutional neural network accelerator featuring conditional computing and low external memory access
AU - Kim, Minkyu
AU - Seo, Jae Sun
N1 - Funding Information:
Manuscript received June 4, 2020; revised August 28, 2020; accepted October 2, 2020. Date of publication October 19, 2020; date of current version February 24, 2021. This article was approved by Guest Editor Mark Oude Alink. This work was supported in part by NSF under Grant 1652866; in part by Samsung Electronics; and in part by the Center for Brain-Inspired Computing (C-BRIC), one of the six centers in Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) Program sponsored by the Defense Advanced Research Projects Agency (DARPA). (Corresponding author: Minkyu Kim.) Minkyu Kim was with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287 USA. He is now with Qualcomm, San Diego, CA 92121 USA (e-mail: minkkim@qti.qualcomm.com).
Publisher Copyright:
© 2020 IEEE.
PY - 2021/3
Y1 - 2021/3
N2 - With their algorithmic success in many machine learning tasks and applications, deep convolutional neural networks (DCNNs) have been implemented with custom hardware in a number of prior works. However, such works have not fully exploited conditional/approximate computing to eliminate the redundant computations of CNNs. This article presents a DCNN accelerator featuring a novel conditional computing scheme that synergistically combines precision cascading (PC) with zero skipping (ZS). To reduce the many redundant convolutions that are followed by max-pooling operations, we propose precision cascading, where the input features are divided into a number of low-precision groups and approximate convolutions with only the most significant bits (MSBs) are performed first. Based on this approximate computation, the full-precision convolution is performed only for the identified maximum-pooling output. In this way, the total number of bit-wise convolutions can be reduced by ∼2× with <0.8% degradation in ImageNet accuracy. PC provides the added benefit of increased sparsity per low-precision group, which we exploit with ZS to eliminate unnecessary clock cycles and external memory accesses. The proposed conditional computing scheme has been implemented with a custom architecture in a 40-nm prototype chip, which achieves a peak energy efficiency of 24.97 TOPS/W at a 0.6-V supply and a low external memory access rate of 0.0018 access/MAC with the VGG-16 CNN for ImageNet classification, and a peak energy efficiency of 28.51 TOPS/W at a 0.9-V supply with FlowNet for the Flying Chairs data set.
AB - With their algorithmic success in many machine learning tasks and applications, deep convolutional neural networks (DCNNs) have been implemented with custom hardware in a number of prior works. However, such works have not fully exploited conditional/approximate computing to eliminate the redundant computations of CNNs. This article presents a DCNN accelerator featuring a novel conditional computing scheme that synergistically combines precision cascading (PC) with zero skipping (ZS). To reduce the many redundant convolutions that are followed by max-pooling operations, we propose precision cascading, where the input features are divided into a number of low-precision groups and approximate convolutions with only the most significant bits (MSBs) are performed first. Based on this approximate computation, the full-precision convolution is performed only for the identified maximum-pooling output. In this way, the total number of bit-wise convolutions can be reduced by ∼2× with <0.8% degradation in ImageNet accuracy. PC provides the added benefit of increased sparsity per low-precision group, which we exploit with ZS to eliminate unnecessary clock cycles and external memory accesses. The proposed conditional computing scheme has been implemented with a custom architecture in a 40-nm prototype chip, which achieves a peak energy efficiency of 24.97 TOPS/W at a 0.6-V supply and a low external memory access rate of 0.0018 access/MAC with the VGG-16 CNN for ImageNet classification, and a peak energy efficiency of 28.51 TOPS/W at a 0.9-V supply with FlowNet for the Flying Chairs data set.
KW - Application-specific integrated circuit (ASIC)
KW - approximate computing
KW - conditional computing
KW - deep convolutional neural network (DCNN)
KW - deep learning
KW - energy-efficient accelerator
UR - http://www.scopus.com/inward/record.url?scp=85101838156&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101838156&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2020.3029235
DO - 10.1109/JSSC.2020.3029235
M3 - Article
AN - SCOPUS:85101838156
SN - 0018-9200
VL - 56
SP - 803
EP - 813
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
IS - 3
M1 - 9229157
ER -