TY - JOUR
T1 - Vesti: Energy-Efficient In-Memory Computing Accelerator for Deep Neural Networks
AU - Yin, Shihui
AU - Jiang, Zhewei
AU - Kim, Minkyu
AU - Gupta, Tushar
AU - Seok, Mingoo
AU - Seo, Jae-Sun
N1 - Funding Information:
Manuscript received May 10, 2019; revised July 28, 2019; accepted August 19, 2019. Date of publication October 14, 2019; date of current version December 27, 2019. This work was supported in part by NSF under Grant 1652866; in part by the Center for Brain-Inspired Computing (C-BRIC), one of the six centers in Joint University Microelectronics Program (JUMP); and in part by the Semiconductor Research Corporation (SRC) Program sponsored by Defense Advanced Research Projects Agency (DARPA). (Corresponding author: Shihui Yin.) S. Yin, M. Kim, and J.-S. Seo are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: syin11@asu.edu; jaesun.seo@asu.edu).
Publisher Copyright:
© 1993-2012 IEEE.
PY - 2020/1
Y1 - 2020/1
N2 - To enable essential deep learning computation on energy-constrained hardware platforms, including mobile, wearable, and Internet of Things (IoT) devices, a number of digital ASIC designs have presented customized dataflow and enhanced parallelism. However, in conventional digital designs, the biggest bottleneck for energy-efficient deep neural networks (DNNs) has reportedly been data access and movement. To eliminate this storage access bottleneck, new SRAM macros that support in-memory computing have recently been demonstrated. Several in-SRAM computing works have used a mix of analog and digital circuits to perform XNOR-and-ACcumulate (XAC) operations without row-by-row memory access and can map a subset of DNNs with binary weights and binary activations. At the single-array level, a large improvement in energy efficiency (e.g., two orders of magnitude) has been reported for computing XAC over digital-only hardware performing the same operation. In this article, by integrating many instances of such in-memory computing SRAM macros with an ensemble of peripheral digital circuits, we architect a new DNN accelerator, titled Vesti. This new accelerator is designed to support configurable multibit activations and large-scale DNNs seamlessly while substantially improving chip-level energy efficiency with a favorable accuracy tradeoff compared to conventional digital ASICs. Vesti also employs double-buffering with two groups of in-memory computing SRAMs, effectively hiding their row-by-row write latencies. The Vesti accelerator is fully designed and laid out in 65-nm CMOS, demonstrating ultralow energy consumption of <20 nJ for MNIST classification and <40 μJ for CIFAR-10 classification at a 1.0-V supply.
AB - To enable essential deep learning computation on energy-constrained hardware platforms, including mobile, wearable, and Internet of Things (IoT) devices, a number of digital ASIC designs have presented customized dataflow and enhanced parallelism. However, in conventional digital designs, the biggest bottleneck for energy-efficient deep neural networks (DNNs) has reportedly been data access and movement. To eliminate this storage access bottleneck, new SRAM macros that support in-memory computing have recently been demonstrated. Several in-SRAM computing works have used a mix of analog and digital circuits to perform XNOR-and-ACcumulate (XAC) operations without row-by-row memory access and can map a subset of DNNs with binary weights and binary activations. At the single-array level, a large improvement in energy efficiency (e.g., two orders of magnitude) has been reported for computing XAC over digital-only hardware performing the same operation. In this article, by integrating many instances of such in-memory computing SRAM macros with an ensemble of peripheral digital circuits, we architect a new DNN accelerator, titled Vesti. This new accelerator is designed to support configurable multibit activations and large-scale DNNs seamlessly while substantially improving chip-level energy efficiency with a favorable accuracy tradeoff compared to conventional digital ASICs. Vesti also employs double-buffering with two groups of in-memory computing SRAMs, effectively hiding their row-by-row write latencies. The Vesti accelerator is fully designed and laid out in 65-nm CMOS, demonstrating ultralow energy consumption of <20 nJ for MNIST classification and <40 μJ for CIFAR-10 classification at a 1.0-V supply.
KW - Deep learning accelerator
KW - SRAM
KW - deep neural networks (DNNs)
KW - double-buffering
KW - in-memory computing
UR - http://www.scopus.com/inward/record.url?scp=85077821278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077821278&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2019.2940649
DO - 10.1109/TVLSI.2019.2940649
M3 - Article
AN - SCOPUS:85077821278
SN - 1063-8210
VL - 28
SP - 48
EP - 61
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 1
M1 - 8867863
ER -