TY - JOUR
T1 - PIMCA
T2 - A Programmable In-Memory Computing Accelerator for Energy-Efficient DNN Inference
AU - Zhang, Bo
AU - Yin, Shihui
AU - Kim, Minkyu
AU - Saikia, Jyotishman
AU - Kwon, Soonwan
AU - Myung, Sungmeen
AU - Kim, Hyunsoo
AU - Kim, Sang Joon
AU - Seo, Jae Sun
AU - Seok, Mingoo
N1 - Publisher Copyright:
© 1966-2012 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - This article presents a programmable in-memory computing accelerator (PIMCA) for low-precision (1-2 b) deep neural network (DNN) inference. The custom 10T1C bitcell in the in-memory computing (IMC) macro uses four additional transistors and one capacitor to perform capacitive-coupling-based multiply-and-accumulate (MAC) operations in the analog-mixed-signal (AMS) domain. A macro containing 256 × 128 bitcells can activate all rows simultaneously and can therefore perform a vector-matrix multiplication (VMM) in one cycle. PIMCA integrates 108 such IMC static random-access memory (SRAM) macros with a custom six-stage pipeline and a custom instruction set architecture (ISA) for instruction-level programmability. The results of the IMC macros are fed to a single-instruction-multiple-data (SIMD) processor for other computations such as partial-sum accumulation, max-pooling, and activation functions. To use the IMC and SIMD datapaths effectively, we customize the ISA, notably by adding hardware loop support, which reduces program size by up to 73%. The accelerator is prototyped in 28-nm technology and integrates a total of 3.4-Mb IMC SRAM and 1.5-Mb off-the-shelf activation SRAM, making it one of the largest IMC accelerators to date. It achieves a system-level energy efficiency of 437 TOPS/W and a peak throughput of 49 TOPS at a 42-MHz clock frequency and 1-V supply for VGG9 and ResNet-18 on the CIFAR-10 dataset.
AB - This article presents a programmable in-memory computing accelerator (PIMCA) for low-precision (1-2 b) deep neural network (DNN) inference. The custom 10T1C bitcell in the in-memory computing (IMC) macro uses four additional transistors and one capacitor to perform capacitive-coupling-based multiply-and-accumulate (MAC) operations in the analog-mixed-signal (AMS) domain. A macro containing 256 × 128 bitcells can activate all rows simultaneously and can therefore perform a vector-matrix multiplication (VMM) in one cycle. PIMCA integrates 108 such IMC static random-access memory (SRAM) macros with a custom six-stage pipeline and a custom instruction set architecture (ISA) for instruction-level programmability. The results of the IMC macros are fed to a single-instruction-multiple-data (SIMD) processor for other computations such as partial-sum accumulation, max-pooling, and activation functions. To use the IMC and SIMD datapaths effectively, we customize the ISA, notably by adding hardware loop support, which reduces program size by up to 73%. The accelerator is prototyped in 28-nm technology and integrates a total of 3.4-Mb IMC SRAM and 1.5-Mb off-the-shelf activation SRAM, making it one of the largest IMC accelerators to date. It achieves a system-level energy efficiency of 437 TOPS/W and a peak throughput of 49 TOPS at a 42-MHz clock frequency and 1-V supply for VGG9 and ResNet-18 on the CIFAR-10 dataset.
KW - Capacitive coupling computing
KW - deep neural network (DNN)
KW - in-memory computing (IMC)
KW - programmable accelerator
UR - http://www.scopus.com/inward/record.url?scp=85140791733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140791733&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2022.3211290
DO - 10.1109/JSSC.2022.3211290
M3 - Article
AN - SCOPUS:85140791733
SN - 0018-9200
VL - 58
SP - 1436
EP - 1449
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
IS - 5
ER -