TY - GEN
T1 - Algorithm-hardware co-design of single shot detector for fast object detection on FPGAs
AU - Ma, Yufei
AU - Zheng, Tu
AU - Cao, Yu
AU - Vrudhula, Sarma
AU - Seo, Jae-sun
N1 - Funding Information:
This work was supported in part by the NSF I/UCRC Center for Embedded Systems through NSF grants 1230401, 1237856, 1701241, 1361926 and 1535669, NSF grants 1652866 and 1715443, Intel Labs, and C-BRIC, one of six centers in JUMP, a SRC program sponsored by DARPA.
Publisher Copyright:
© 2018 ACM.
PY - 2018/11/5
Y1 - 2018/11/5
N2 - The rapid improvement in computation capability has made convolutional neural networks (CNNs) a great success in recent years on image classification tasks, which has also fostered the development of object detection algorithms with significantly improved accuracy. However, during the deployment phase, many applications demand low-latency processing of a single image under strict power consumption requirements, which reduces the efficiency of GPUs and other general-purpose platforms and creates opportunities for dedicated acceleration hardware, e.g., FPGAs, where the digital circuit is customized for the inference algorithm. Therefore, this work proposes to customize the detection algorithm, e.g., SSD, to benefit its hardware implementation with low data precision at the cost of marginal accuracy degradation. The proposed FPGA-based deep learning inference accelerator is demonstrated on two Intel FPGAs for the SSD algorithm, achieving up to 2.18 TOPS throughput and up to 3.3X better energy efficiency compared to a GPU.
AB - The rapid improvement in computation capability has made convolutional neural networks (CNNs) a great success in recent years on image classification tasks, which has also fostered the development of object detection algorithms with significantly improved accuracy. However, during the deployment phase, many applications demand low-latency processing of a single image under strict power consumption requirements, which reduces the efficiency of GPUs and other general-purpose platforms and creates opportunities for dedicated acceleration hardware, e.g., FPGAs, where the digital circuit is customized for the inference algorithm. Therefore, this work proposes to customize the detection algorithm, e.g., SSD, to benefit its hardware implementation with low data precision at the cost of marginal accuracy degradation. The proposed FPGA-based deep learning inference accelerator is demonstrated on two Intel FPGAs for the SSD algorithm, achieving up to 2.18 TOPS throughput and up to 3.3X better energy efficiency compared to a GPU.
KW - FPGA
KW - HW/SW co-design
KW - hardware accelerator
KW - neural network
UR - http://www.scopus.com/inward/record.url?scp=85058172945&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058172945&partnerID=8YFLogxK
U2 - 10.1145/3240765.3240775
DO - 10.1145/3240765.3240775
M3 - Conference contribution
AN - SCOPUS:85058172945
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
BT - 2018 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2018 - Digest of Technical Papers
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 37th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2018
Y2 - 5 November 2018 through 8 November 2018
ER -