TY - GEN
T1 - Accurate inference with inaccurate RRAM devices
T2 - 57th ACM/IEEE Design Automation Conference, DAC 2020
AU - Charan, Gouranga
AU - Hazra, Jubin
AU - Beckmann, Karsten
AU - Du, Xiaocong
AU - Krishnan, Gokul
AU - Joshi, Rajiv V.
AU - Cady, Nathaniel C.
AU - Cao, Yu
N1 - Funding Information:
This work was supported in part by C-BRIC, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program, and in part by the National Science Foundation (NSF) under CCF 1715443.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
AB - Resistive random-access memory (RRAM) is a promising technology for in-memory computing, offering high storage density, fast inference, and good compatibility with CMOS. However, mapping a pre-trained deep neural network (DNN) model onto RRAM suffers from realistic device issues, especially variation and quantization error, resulting in a significant loss of inference accuracy. In this work, we first extract these statistical properties from 65 nm RRAM data measured on 300 mm wafers. The RRAM data exhibit 10 quantization levels and 50% variance, causing accuracy to drop to 31.76% and 10.49% on the MNIST and CIFAR-10 datasets, respectively. Based on the experimental data, we propose a combination of machine learning algorithms and on-line adaptation to recover the accuracy with minimal overhead. The recipe first applies Knowledge Distillation (KD) to transfer the ideal model to a student model that incorporates the statistical variations and 10 quantization levels. An on-line sparse adaptation (OSA) method is then applied to the DNN model mapped onto the RRAM array. Using importance sampling, OSA adds a small SRAM array that is sparsely connected to the main RRAM array; only this SRAM array is updated to recover the accuracy. As demonstrated on the MNIST and CIFAR-10 datasets, a 7.86% area cost is sufficient to achieve baseline accuracy with the 65 nm RRAM devices.
KW - In-memory computing
KW - Knowledge Distillation
KW - On-line adaptation
KW - Resistive random access memory (RRAM)
KW - Robustness
UR - http://www.scopus.com/inward/record.url?scp=85093927059&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85093927059&partnerID=8YFLogxK
U2 - 10.1109/DAC18072.2020.9218605
DO - 10.1109/DAC18072.2020.9218605
M3 - Conference contribution
AN - SCOPUS:85093927059
T3 - Proceedings - Design Automation Conference
BT - 2020 57th ACM/IEEE Design Automation Conference, DAC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 July 2020 through 24 July 2020
ER -