Object detection is an important computer vision task, with many applications in autonomous driving, smart surveillance, robotics, and other domains. Single-shot detectors (SSDs) coupled with a convolutional neural network (CNN) for feature extraction can efficiently detect, classify, and localize various objects in an input image with very high accuracy. In such systems, the convolution layers extract features and predict the bounding box locations for the detected objects as well as their confidence scores. Then, a non-maximum suppression (NMS) algorithm eliminates partially overlapping boxes and selects the bounding box with the highest score per class. However, these two components are strictly sequential; a conventional NMS algorithm must wait for all box predictions to be produced before processing them. This prohibits any overlap between the execution of the convolutional layers and NMS, resulting in significant latency overhead and throughput degradation. In this paper, we present a novel NMS algorithm that alleviates this bottleneck and enables a fully-pipelined hardware implementation. We also implement an end-to-end system for low-latency SSD-MobileNet-V1 object detection, which combines a state-of-the-art deeply-pipelined CNN accelerator with a custom hardware implementation of our novel NMS algorithm. As a result of our new algorithm, the NMS module adds a minimal latency overhead of only 0.13 μs to the SSD-MobileNet-V1 convolution layers. Our end-to-end object detection system implemented on an Intel Stratix 10 FPGA runs at a maximum operating frequency of 350 MHz, with a throughput of 609 frames per second and an end-to-end batch-1 latency of 2.4 ms. Our system achieves 1.5× higher throughput and 4.4× lower latency compared to the current state-of-the-art SSD-based object detection systems on FPGAs.
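To illustrate the sequential bottleneck described above, the following is a minimal sketch of conventional greedy NMS (not the paper's pipelined algorithm): because candidates are sorted globally by score, no box can be emitted until every prediction has arrived. The box format, function names, and IoU threshold here are illustrative assumptions, shown single-class for brevity.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: the global sort over *all* scores is what forces this
    step to run only after the CNN has produced every box prediction."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)             # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two heavily overlapping boxes and one distant box:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the lower-scoring overlap is suppressed
```

A fully-pipelined hardware variant must avoid this up-front global sort so that suppression decisions can begin while box predictions are still streaming out of the convolution layers.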