A Mixed-Signal Quantized Neural Network Accelerator Using Flash Transistors

Kyler R. Scott, Cheng Yen Lee, Sunil P. Khatri, Sarma Vrudhula

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors, to achieve extremely high throughput with extremely low power, energy and memory requirements. Its low resource utilization makes our design especially suited for use in edge devices. The network weights are stored in-memory using flash transistors, and neurons perform operations in the analog current domain. Our design can be programmed with any QNN whose hyperparameters (the number of layers, the number of filters, the filter size, etc.) do not exceed the maximum provisioned. Once the flash devices are programmed with a trained model and our IC is given an input, our architecture performs inference with zero access to off-chip memory. We demonstrate the robustness of our design under current-mode non-linearities arising from process, voltage, and temperature (PVT) variations. We test validation accuracy on the ImageNet dataset, and show that our IC suffers only 0.71% and 0.92% reduction in classification accuracy for Top-1 and Top-5 outputs, respectively. Our implementation achieves between 2.1× and 125× better energy efficiency than previous NVM-based QNN accelerators. Our approach provides layer partitioning and neuron sharing options, which allow us to trade off latency, power, and area against one another.
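The abstract gives no circuit-level detail, so the following is only a behavioural toy model of the general idea it describes: quantized weights act as per-device currents, a neuron sums those currents, and a device-to-device error stands in for PVT variation. Every name and parameter here (`quantize`, `neuron_current`, `unit_current`, `mismatch`, the uniform error model, the threshold activation) is an assumption made for illustration and is not taken from the paper.

```python
import random


def quantize(value, levels):
    """Snap a value to the nearest of `levels` evenly spaced steps in [-1, 1]."""
    step = 2.0 / (levels - 1)
    clipped = max(-1.0, min(1.0, value))
    return round((clipped + 1.0) / step) * step - 1.0


def neuron_current(weights, inputs, unit_current=1.0, mismatch=0.0, rng=None):
    """Sum the per-weight currents of one neuron.

    `mismatch` is a hypothetical relative error applied independently to each
    device, a crude stand-in for process, voltage, and temperature variation.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
    total = 0.0
    for w, x in zip(weights, inputs):
        error = 1.0 + rng.uniform(-mismatch, mismatch)
        total += w * x * unit_current * error
    return total


def neuron_output(weights, inputs, threshold=0.0, **kwargs):
    """Binary activation: fire when the summed current reaches the threshold."""
    return 1 if neuron_current(weights, inputs, **kwargs) >= threshold else 0


# Quantize trained weights to 5 levels, then evaluate with 5% device mismatch.
weights = [quantize(w, 5) for w in (0.93, -0.88, 0.41)]
print(weights, neuron_output(weights, [1, 1, 1], mismatch=0.05))
```

In this toy model the summed current can deviate from its ideal value by at most `mismatch` times the sum of the absolute per-device currents, which is one simple way to reason about how much margin a threshold needs.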

Original language: English (US)
Pages (from-to): 1025-1038
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
Volume: 71
Issue number: 3
State: Published - Mar 1 2024

Keywords

  • Machine learning accelerators
  • current-mode circuits
  • floating-gate transistors
  • low-power circuits
  • quantized neural networks

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Hardware and Architecture
