TY - GEN
T1 - Sparse and Robust RRAM-based Efficient In-memory Computing for DNN Inference
AU - Meng, Jian
AU - Yeo, Injune
AU - Shim, Wonbo
AU - Yang, Li
AU - Fan, Deliang
AU - Yu, Shimeng
AU - Seo, Jae Sun
N1 - Funding Information:
We thank Winbond Electronics for RRAM chip fabrication support. This work is partially supported by NSF grants 1652866/1715443/1740225, SRC AIHW program, and C-BRIC/ASCENT, two of six centers in JUMP, an SRC program sponsored by DARPA.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Resistive random-access memory (RRAM)-based in-memory computing (IMC) has recently become a promising paradigm for efficient deep neural network (DNN) acceleration. Multi-bit RRAM arrays provide dense storage and high throughput, but the physical non-idealities of RRAM devices impair the retention characteristics of the resistive cells, leading to accuracy degradation. On the algorithm side, various hardware-aware compression algorithms have been proposed to accelerate DNN computation. However, most recent works consider "model compression" and "hardware robustness" separately, and the impact of RRAM non-idealities on sparse models remains under-explored. In this work, we present a novel temperature-resilient RRAM-based IMC scheme for reliable DNN inference hardware. Based on measurements from a 90 nm RRAM prototype chip, we first explore the robustness of sparse models across different operating temperatures (25°C to 85°C). On top of that, we propose a novel robustness-aware pruning algorithm and further enhance model robustness with a novel sparsity-aware noise-injected fine-tuning. The proposed scheme achieves >92% CIFAR-10 inference accuracy after one day of operation, which is >37% higher than the state-of-the-art method.
KW - Convolutional neural network
KW - data retention
KW - in-memory computing
KW - multilevel RRAM
KW - structured pruning
UR - http://www.scopus.com/inward/record.url?scp=85130718348&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130718348&partnerID=8YFLogxK
U2 - 10.1109/IRPS48227.2022.9764480
DO - 10.1109/IRPS48227.2022.9764480
M3 - Conference contribution
AN - SCOPUS:85130718348
T3 - IEEE International Reliability Physics Symposium Proceedings
SP - 3C11
EP - 3C16
BT - 2022 IEEE International Reliability Physics Symposium, IRPS 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Reliability Physics Symposium, IRPS 2022
Y2 - 27 March 2022 through 31 March 2022
ER -