TY - JOUR
T1 - An Accurate, Error-Tolerant, and Energy-Efficient Neural Network Inference Engine Based on SONOS Analog Memory
AU - Xiao, T. Patrick
AU - Feinberg, Ben
AU - Bennett, Christopher H.
AU - Agrawal, Vineet
AU - Saxena, Prashant
AU - Prabhakar, Venkatraman
AU - Ramkumar, Krishnaswamy
AU - Medu, Harsha
AU - Raghavan, Vijay
AU - Chettuvetty, Ramesh
AU - Agarwal, Sapan
AU - Marinella, Matthew J.
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2022/4/1
Y1 - 2022/4/1
N2 - We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, matching the weight distribution typical of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy of a SONOS-based system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high on/off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
AB - We demonstrate SONOS (silicon-oxide-nitride-oxide-silicon) analog memory arrays that are optimized for neural network inference. The devices are fabricated in a 40 nm process and operated in the subthreshold regime for in-memory matrix multiplication. Subthreshold operation enables low conductances to be implemented with low error, matching the weight distribution typical of neural networks, which is heavily skewed toward near-zero values. This leads to high accuracy in the presence of programming errors and process variations. We simulate the end-to-end neural network inference accuracy, accounting for the measured programming error, read noise, and retention loss in a fabricated SONOS array. Evaluated on the ImageNet dataset using ResNet50, the accuracy of a SONOS-based system is within 2.16% of floating-point accuracy without any retraining. The unique error properties and high on/off ratio of the SONOS device allow scaling to large arrays without bit slicing, and enable an inference architecture that achieves 20 TOPS/W on ResNet50, a >10× gain in energy efficiency over state-of-the-art digital and analog inference accelerators.
KW - Analog
KW - Charge trap memory
KW - In-memory computing
KW - Inference accelerator
KW - Neural network
KW - Neuromorphic
KW - SONOS
UR - http://www.scopus.com/inward/record.url?scp=85122565802&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122565802&partnerID=8YFLogxK
U2 - 10.1109/TCSI.2021.3134313
DO - 10.1109/TCSI.2021.3134313
M3 - Article
AN - SCOPUS:85122565802
SN - 1549-8328
VL - 69
SP - 1480
EP - 1493
JO - IEEE Transactions on Circuits and Systems I: Regular Papers
JF - IEEE Transactions on Circuits and Systems I: Regular Papers
IS - 4
ER -