TY - GEN
T1 - High-performance face detection with CPU-FPGA acceleration
AU - Mohanty, Abinash
AU - Suda, Naveen
AU - Kim, Minkyu
AU - Vrudhula, Sarma
AU - Seo, Jae-sun
AU - Cao, Yu
PY - 2016/7/29
Y1 - 2016/7/29
AB - Face detection is a critical function in many embedded applications, such as computer vision and security. Although face detection has been well studied, detecting a large number of faces with different scales and large variations (pose, expression, or illumination) usually involves computationally expensive classification algorithms. These algorithms may divide an image into sub-windows at different scales, evaluate a large set of features for each sub-window, and determine the presence and location of a face. Even with state-of-the-art CPUs, it is still challenging to perform real-time face detection with sufficiently high energy efficiency and accuracy. In this paper, we propose a suite of acceleration techniques to enable such a capability on a CPU-FPGA platform, based on a state-of-the-art face detection algorithm that employs a large number of simple classifiers. We first map the algorithm onto the FPGA using its integrated OpenCL environment. Matching the structure of the algorithm, we propose a nested architecture that speeds up both memory access and the computing iterations. This multi-layer architecture distributes the parallel computing cores alongside the memory. The physical parameters of the nested architecture, such as the core size and the number of cores, are further optimized to achieve real-time face detection under realistic hardware constraints.
KW - FPGA
KW - Face detection
KW - OpenCL
KW - hardware acceleration
UR - http://www.scopus.com/inward/record.url?scp=84983399733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84983399733&partnerID=8YFLogxK
U2 - 10.1109/ISCAS.2016.7527184
DO - 10.1109/ISCAS.2016.7527184
M3 - Conference contribution
AN - SCOPUS:84983399733
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
SP - 117
EP - 120
BT - ISCAS 2016 - IEEE International Symposium on Circuits and Systems
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE International Symposium on Circuits and Systems, ISCAS 2016
Y2 - 22 May 2016 through 25 May 2016
ER -