Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing

Akshay Dua; Yixing Li; Fengbo Ren

doi:10.1109/FCCM48280.2020.00064

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

This paper presents Systolic-CNN, an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture, optimized for performing the low-latency, energy-efficient inference of various convolutional neural networks (CNNs) in the context of multi-tenancy cloud/edge computing. Systolic-CNN adopts a highly pipelined and parallelized 1-D systolic array architecture, which efficiently explores both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized, which can be easily adapted by users to achieve 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) for a given FPGA. In addition, Systolic-CNN is run-time-flexible, which can be time-shared, in the context of multi-tenancy cloud or edge computing, to accelerate a variety of CNN models at run time without the need of recompiling the FPGA kernel hardware nor reprogramming the FPGA. The experiment results based on an Intel Arria 10 GX FPGA Development board show that Systolic-CNN, when mapped with a single-precision data format, can achieve 100% utilization of the DSP block resource and an average inference latency of 10 ms, 84 ms, 1615 ms, and 990 ms per image for accelerating AlexNet, ResNet-50, RetinaNet, and Light-weight RetinaNet, respectively. The peak computational throughput is measured at 80-170 GFLOPS/s across the acceleration of different CNN models. Codes are available at https://github.com/PSCLab-ASU/SystolicCNN.
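
As a rough illustration of what a 1-D systolic array does, the following Python sketch simulates a chain of processing elements (PEs) computing a matrix-vector product. It is not the authors' OpenCL kernel (that code is in the repository linked above). The function name `systolic_1d_matvec`, the weight-stationary dataflow, and the one-MAC-per-cycle timing model are all assumptions made for this example.

```python
def systolic_1d_matvec(weights, x):
    """Simulate a generic 1-D systolic array computing y = W @ x.

    Assumed dataflow (illustrative only): PE i holds row i of W.
    Input elements enter PE 0 one per cycle and are forwarded to the
    neighbouring PE on every cycle, so PE i sees x[j] at cycle i + j.
    """
    num_pe = len(weights)
    n = len(x)
    acc = [0.0] * num_pe      # one accumulator per PE
    pipe = [None] * num_pe    # pipe[i] = (index, value) currently held by PE i
    cycles = 0

    for t in range(n + num_pe - 1):
        # Temporal reuse: each input moves one PE down the chain per cycle.
        for i in range(num_pe - 1, 0, -1):
            pipe[i] = pipe[i - 1]
        pipe[0] = (t, x[t]) if t < n else None

        # Spatial parallelism: every PE holding valid data performs one
        # multiply-accumulate in the same cycle.
        for i in range(num_pe):
            if pipe[i] is not None:
                j, value = pipe[i]
                acc[i] += weights[i][j] * value
        cycles += 1

    return acc, cycles


if __name__ == "__main__":
    y, cycles = systolic_1d_matvec([[1, 2, 3], [4, 5, 6]], [1, 1, 1])
    print(y, cycles)
```

In this model each input value is read once and reused by every PE as it travels down the chain, and the total cycle count is the input length plus the pipeline fill time (number of PEs minus one). How the actual design maps convolution loops onto its PEs is described in the paper, not here.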

Original language	English (US)
Title of host publication	Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	231
Number of pages	1
ISBN (Electronic)	9781728158037
DOIs	https://doi.org/10.1109/FCCM48280.2020.00064
State	Published - May 2020
Event	28th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020 - Fayetteville, United States
Duration	May 3 2020 → May 6 2020

Publication series

Name	Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020

Conference

Conference	28th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020
Country/Territory	United States
City	Fayetteville
Period	5/3/20 → 5/6/20

ASJC Scopus subject areas

Computational Mathematics
Computer Networks and Communications
Computer Science Applications
Hardware and Architecture
Signal Processing

Access to Document

10.1109/FCCM48280.2020.00064

Cite this

Dua, A., Li, Y., & Ren, F. (2020). Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing. In Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020 (pp. 231). Article 9114649 (Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/FCCM48280.2020.00064

Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing. / Dua, Akshay; Li, Yixing; Ren, Fengbo.
Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020. Institute of Electrical and Electronics Engineers Inc., 2020. p. 231 9114649 (Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020).

Dua, A, Li, Y & Ren, F 2020, Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing. in Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020., 9114649, Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020, Institute of Electrical and Electronics Engineers Inc., pp. 231, 28th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020, Fayetteville, United States, 5/3/20. https://doi.org/10.1109/FCCM48280.2020.00064

Dua A, Li Y, Ren F. Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing. In Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020. Institute of Electrical and Electronics Engineers Inc. 2020. p. 231. 9114649. (Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020). doi: 10.1109/FCCM48280.2020.00064

Dua, Akshay ; Li, Yixing ; Ren, Fengbo. / Systolic-CNN : An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing. Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 231 (Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020).

@inproceedings{a7e913fad421466eaaf8baa0c4a70fc0,

title = "Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing",

abstract = "This paper presents Systolic-CNN, an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture, optimized for performing the low-latency, energy-efficient inference of various convolutional neural networks (CNNs) in the context of multi-tenancy cloud/edge computing. Systolic-CNN adopts a highly pipelined and parallelized 1-D systolic array architecture, which efficiently explores both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized, which can be easily adapted by users to achieve 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) for a given FPGA. In addition, Systolic-CNN is run-time-flexible, which can be time-shared, in the context of multi-tenancy cloud or edge computing, to accelerate a variety of CNN models at run time without the need of recompiling the FPGA kernel hardware nor reprogramming the FPGA. The experiment results based on an Intel Arria 10 GX FPGA Development board show that Systolic-CNN, when mapped with a single-precision data format, can achieve 100% utilization of the DSP block resource and an average inference latency of 10 ms, 84 ms, 1615 ms, and 990 ms per image for accelerating AlexNet, ResNet-50, RetinaNet, and Light-weight RetinaNet, respectively. The peak computational throughput is measured at 80-170 GFLOPS/s across the acceleration of different CNN models. Codes are available at https://github.com/PSCLab-ASU/SystolicCNN.",

author = "Akshay Dua and Yixing Li and Fengbo Ren",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 28th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020 ; Conference date: 03-05-2020 Through 06-05-2020",

year = "2020",

month = may,

doi = "10.1109/FCCM48280.2020.00064",

language = "English (US)",

series = "Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "231",

booktitle = "Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020",

}

TY - GEN

T1 - Systolic-CNN

T2 - 28th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020

AU - Dua, Akshay

AU - Li, Yixing

AU - Ren, Fengbo

PY - 2020/5

Y1 - 2020/5

N2 - This paper presents Systolic-CNN, an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture, optimized for performing the low-latency, energy-efficient inference of various convolutional neural networks (CNNs) in the context of multi-tenancy cloud/edge computing. Systolic-CNN adopts a highly pipelined and parallelized 1-D systolic array architecture, which efficiently explores both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized, which can be easily adapted by users to achieve 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) for a given FPGA. In addition, Systolic-CNN is run-time-flexible, which can be time-shared, in the context of multi-tenancy cloud or edge computing, to accelerate a variety of CNN models at run time without the need of recompiling the FPGA kernel hardware nor reprogramming the FPGA. The experiment results based on an Intel Arria 10 GX FPGA Development board show that Systolic-CNN, when mapped with a single-precision data format, can achieve 100% utilization of the DSP block resource and an average inference latency of 10 ms, 84 ms, 1615 ms, and 990 ms per image for accelerating AlexNet, ResNet-50, RetinaNet, and Light-weight RetinaNet, respectively. The peak computational throughput is measured at 80-170 GFLOPS/s across the acceleration of different CNN models. Codes are available at https://github.com/PSCLab-ASU/SystolicCNN.

AB - This paper presents Systolic-CNN, an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture, optimized for performing the low-latency, energy-efficient inference of various convolutional neural networks (CNNs) in the context of multi-tenancy cloud/edge computing. Systolic-CNN adopts a highly pipelined and parallelized 1-D systolic array architecture, which efficiently explores both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized, which can be easily adapted by users to achieve 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) for a given FPGA. In addition, Systolic-CNN is run-time-flexible, which can be time-shared, in the context of multi-tenancy cloud or edge computing, to accelerate a variety of CNN models at run time without the need of recompiling the FPGA kernel hardware nor reprogramming the FPGA. The experiment results based on an Intel Arria 10 GX FPGA Development board show that Systolic-CNN, when mapped with a single-precision data format, can achieve 100% utilization of the DSP block resource and an average inference latency of 10 ms, 84 ms, 1615 ms, and 990 ms per image for accelerating AlexNet, ResNet-50, RetinaNet, and Light-weight RetinaNet, respectively. The peak computational throughput is measured at 80-170 GFLOPS/s across the acceleration of different CNN models. Codes are available at https://github.com/PSCLab-ASU/SystolicCNN.

UR - http://www.scopus.com/inward/record.url?scp=85087329503&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85087329503&partnerID=8YFLogxK

U2 - 10.1109/FCCM48280.2020.00064

DO - 10.1109/FCCM48280.2020.00064

M3 - Conference contribution

AN - SCOPUS:85087329503

T3 - Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020

SP - 231

BT - Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 3 May 2020 through 6 May 2020

ER -
