TY - GEN
T1 - Systolic-CNN
T2 - 28th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020
AU - Dua, Akshay
AU - Li, Yixing
AU - Ren, Fengbo
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - This paper presents Systolic-CNN, an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture, optimized for performing the low-latency, energy-efficient inference of various convolutional neural networks (CNNs) in the context of multi-tenancy cloud/edge computing. Systolic-CNN adopts a highly pipelined and parallelized 1-D systolic array architecture, which efficiently explores both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized, which can be easily adapted by users to achieve 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) for a given FPGA. In addition, Systolic-CNN is run-time-flexible, which can be time-shared, in the context of multi-tenancy cloud or edge computing, to accelerate a variety of CNN models at run time without the need of recompiling the FPGA kernel hardware nor reprogramming the FPGA. The experiment results based on an Intel Arria 10 GX FPGA Development board show that Systolic-CNN, when mapped with a single-precision data format, can achieve 100% utilization of the DSP block resource and an average inference latency of 10ms, 84ms, 1615ms, and 990ms per image for accelerating AlexNet, ResNet-50, RetinaNet, and Light-weight RetinaNet, respectively. The peak computational throughput is measured at 80-170 GFLOPS/s across the acceleration of different CNN models. Codes are available at https://github.com/PSCLab-ASU/SystolicCNN.
AB - This paper presents Systolic-CNN, an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture, optimized for performing the low-latency, energy-efficient inference of various convolutional neural networks (CNNs) in the context of multi-tenancy cloud/edge computing. Systolic-CNN adopts a highly pipelined and parallelized 1-D systolic array architecture, which efficiently explores both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized, which can be easily adapted by users to achieve 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) for a given FPGA. In addition, Systolic-CNN is run-time-flexible, which can be time-shared, in the context of multi-tenancy cloud or edge computing, to accelerate a variety of CNN models at run time without the need of recompiling the FPGA kernel hardware nor reprogramming the FPGA. The experiment results based on an Intel Arria 10 GX FPGA Development board show that Systolic-CNN, when mapped with a single-precision data format, can achieve 100% utilization of the DSP block resource and an average inference latency of 10ms, 84ms, 1615ms, and 990ms per image for accelerating AlexNet, ResNet-50, RetinaNet, and Light-weight RetinaNet, respectively. The peak computational throughput is measured at 80-170 GFLOPS/s across the acceleration of different CNN models. Codes are available at https://github.com/PSCLab-ASU/SystolicCNN.
UR - http://www.scopus.com/inward/record.url?scp=85087329503&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85087329503&partnerID=8YFLogxK
U2 - 10.1109/FCCM48280.2020.00064
DO - 10.1109/FCCM48280.2020.00064
M3 - Conference contribution
AN - SCOPUS:85087329503
T3 - Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020
SP - 231
BT - Proceedings - 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 May 2020 through 6 May 2020
ER -