TY - GEN
T1 - A novel design of adaptive and hierarchical convolutional neural networks using partial reconfiguration on FPGA
AU - Farhadi, Mohammad
AU - Ghasemi, Mehdi
AU - Yang, Yezhou
N1 - Funding Information:
In this paper, we proposed a new approach to run heavy neural networks on FPGAs with constrained resources. We stacked various shallow and deep models, yielding an adaptive and hierarchical structure for quantized neural networks. We conducted experiments on CIFAR-10, CIFAR-100, and SVHN, and empirically validated that AH-CNN maintains an inference time as low as that of the shallow models while achieving the high recognition accuracy of the deep model on image classification tasks. The flexible nature of this hierarchical method makes it suitable for applications that need adaptive behavior toward dynamic priority changes over object categories, such as an agent with active perception.
Funding Information:
The National Science Foundation under the Robust Intelligence Program (1750082), and the IoT Innovation (I-square) fund provided by ASU Fulton Schools of Engineering are gratefully acknowledged. We also acknowledge NVIDIA and Xilinx for the donation of GPUs and FPGAs.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - Nowadays most research in visual recognition using Convolutional Neural Networks (CNNs) follows the 'deeper model with deeper confidence' belief to gain higher recognition accuracy. At the same time, a deeper model brings heavier computation. On the other hand, for a large portion of recognition challenges, a system can classify images correctly using simple models, or so-called shallow networks. Moreover, the implementation of CNNs faces size, weight, and energy constraints on embedded devices. In this paper, we implement adaptive switching between shallow and deep networks to reach the highest throughput on a resource-constrained MPSoC with a CPU and an FPGA. To this end, we develop and present a novel architecture for CNNs in which a gate decides whether using the deeper model is beneficial. Due to resource limitations on the FPGA, the idea of partial reconfiguration is used to accommodate deep CNNs within the FPGA resources. We report experimental results on the CIFAR-10, CIFAR-100, and SVHN datasets to validate our approach. Using a confidence metric as the decision-making factor, only 69.8%, 71.8%, and 43.8% of the computation of the deepest network is performed for CIFAR-10, CIFAR-100, and SVHN, respectively, while the desired accuracy is maintained with a throughput of around 400 images per second on the SVHN dataset. https://github.com/mfarhadi/AHCNN.
AB - Nowadays most research in visual recognition using Convolutional Neural Networks (CNNs) follows the 'deeper model with deeper confidence' belief to gain higher recognition accuracy. At the same time, a deeper model brings heavier computation. On the other hand, for a large portion of recognition challenges, a system can classify images correctly using simple models, or so-called shallow networks. Moreover, the implementation of CNNs faces size, weight, and energy constraints on embedded devices. In this paper, we implement adaptive switching between shallow and deep networks to reach the highest throughput on a resource-constrained MPSoC with a CPU and an FPGA. To this end, we develop and present a novel architecture for CNNs in which a gate decides whether using the deeper model is beneficial. Due to resource limitations on the FPGA, the idea of partial reconfiguration is used to accommodate deep CNNs within the FPGA resources. We report experimental results on the CIFAR-10, CIFAR-100, and SVHN datasets to validate our approach. Using a confidence metric as the decision-making factor, only 69.8%, 71.8%, and 43.8% of the computation of the deepest network is performed for CIFAR-10, CIFAR-100, and SVHN, respectively, while the desired accuracy is maintained with a throughput of around 400 images per second on the SVHN dataset. https://github.com/mfarhadi/AHCNN.
UR - http://www.scopus.com/inward/record.url?scp=85076684354&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076684354&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2019.8916237
DO - 10.1109/HPEC.2019.8916237
M3 - Conference contribution
AN - SCOPUS:85076684354
T3 - 2019 IEEE High Performance Extreme Computing Conference, HPEC 2019
BT - 2019 IEEE High Performance Extreme Computing Conference, HPEC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE High Performance Extreme Computing Conference, HPEC 2019
Y2 - 24 September 2019 through 26 September 2019
ER -