TY - JOUR
T1 - Vesti: Energy-Efficient In-Memory Computing Accelerator for Deep Neural Networks
AU - Yin, Shihui
AU - Jiang, Zhewei
AU - Kim, Minkyu
AU - Gupta, Tushar
AU - Seok, Mingoo
AU - Seo, Jae-Sun
N1 - Funding Information:
Manuscript received May 10, 2019; revised July 28, 2019; accepted August 19, 2019. Date of publication October 14, 2019; date of current version December 27, 2019. This work was supported in part by NSF under Grant 1652866; in part by the Center for Brain-Inspired Computing (C-BRIC), one of the six centers in Joint University Microelectronics Program (JUMP); and in part by the Semiconductor Research Corporation (SRC) Program sponsored by Defense Advanced Research Projects Agency (DARPA). (Corresponding author: Shihui Yin.) S. Yin, M. Kim, and J.-S. Seo are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: syin11@asu.edu; jaesun.seo@asu.edu).
Publisher Copyright:
© 1993-2012 IEEE.
PY - 2020/1
Y1 - 2020/1
N2 - To enable essential deep learning computation on energy-constrained hardware platforms, including mobile, wearable, and Internet of Things (IoT) devices, a number of digital ASIC designs have presented customized dataflow and enhanced parallelism. However, in conventional digital designs, the biggest bottleneck for energy-efficient deep neural networks (DNNs) has reportedly been data access and movement. To eliminate this storage access bottleneck, new SRAM macros that support in-memory computing have recently been demonstrated. Several in-SRAM computing works have used a mix of analog and digital circuits to perform XNOR-and-ACcumulate (XAC) operations without row-by-row memory access and can map a subset of DNNs with binary weights and binary activations. At the single-array level, a large improvement in energy efficiency (e.g., two orders of magnitude) has been reported for computing XAC over digital-only hardware performing the same operation. In this article, by integrating many instances of such in-memory computing SRAM macros with an ensemble of peripheral digital circuits, we architect a new DNN accelerator, titled Vesti. This new accelerator is designed to support configurable multibit activations and large-scale DNNs seamlessly while substantially improving chip-level energy efficiency with a favorable accuracy tradeoff compared to conventional digital ASICs. Vesti also employs double-buffering with two groups of in-memory computing SRAMs, effectively hiding their row-by-row write latencies. The Vesti accelerator is fully designed and laid out in 65-nm CMOS, demonstrating ultralow energy consumption of <20 nJ for MNIST classification and <40 μJ for CIFAR-10 classification at a 1.0-V supply.
AB - To enable essential deep learning computation on energy-constrained hardware platforms, including mobile, wearable, and Internet of Things (IoT) devices, a number of digital ASIC designs have presented customized dataflow and enhanced parallelism. However, in conventional digital designs, the biggest bottleneck for energy-efficient deep neural networks (DNNs) has reportedly been data access and movement. To eliminate this storage access bottleneck, new SRAM macros that support in-memory computing have recently been demonstrated. Several in-SRAM computing works have used a mix of analog and digital circuits to perform XNOR-and-ACcumulate (XAC) operations without row-by-row memory access and can map a subset of DNNs with binary weights and binary activations. At the single-array level, a large improvement in energy efficiency (e.g., two orders of magnitude) has been reported for computing XAC over digital-only hardware performing the same operation. In this article, by integrating many instances of such in-memory computing SRAM macros with an ensemble of peripheral digital circuits, we architect a new DNN accelerator, titled Vesti. This new accelerator is designed to support configurable multibit activations and large-scale DNNs seamlessly while substantially improving chip-level energy efficiency with a favorable accuracy tradeoff compared to conventional digital ASICs. Vesti also employs double-buffering with two groups of in-memory computing SRAMs, effectively hiding their row-by-row write latencies. The Vesti accelerator is fully designed and laid out in 65-nm CMOS, demonstrating ultralow energy consumption of <20 nJ for MNIST classification and <40 μJ for CIFAR-10 classification at a 1.0-V supply.
KW - Deep learning accelerator
KW - SRAM
KW - deep neural networks (DNNs)
KW - double-buffering
KW - in-memory computing
UR - http://www.scopus.com/inward/record.url?scp=85077821278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077821278&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2019.2940649
DO - 10.1109/TVLSI.2019.2940649
M3 - Article
AN - SCOPUS:85077821278
SN - 1063-8210
VL - 28
SP - 48
EP - 61
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 1
M1 - 8867863
ER -