TY - GEN
T1 - Efficient and modularized training on FPGA for real-time applications
AU - Venkataramanaiah, Shreyas Kolala
AU - Du, Xiaocong
AU - Li, Zheng
AU - Yin, Shihui
AU - Cao, Yu
AU - Seo, Jae Sun
N1 - Funding Information:
This work was supported in part by the Semiconductor Research Corporation (SRC) and DARPA. It was also partially supported by National Science Foundation (NSF) under CCF #1715443.
Publisher Copyright:
© 2020 Inst. Sci. inf., Univ. Defence in Belgrade. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Training of deep Convolutional Neural Networks (CNNs) requires a tremendous amount of computation and memory, and thus GPUs are widely used to meet the computation demands of these complex training tasks. However, lacking the flexibility to exploit architectural optimizations, GPUs have poor energy efficiency and are difficult to deploy on energy-constrained platforms. FPGAs are highly suitable for training tasks such as real-time learning at the edge, as they provide higher energy efficiency and better flexibility to support algorithmic evolution. This paper first develops a training accelerator on FPGA, with 16-bit fixed-point computing and various training modules. Furthermore, leveraging model segmentation techniques from Progressive Segmented Training, the newly developed FPGA accelerator is applied to online learning, achieving much lower computation cost. We demonstrate the performance of representative CNNs trained for CIFAR-10 on Intel Stratix-10 MX FPGA, evaluating both the conventional training procedure and the online learning algorithm. The demo is available at https://github.com/dxc33linger/PSTonFPGA demo.
AB - Training of deep Convolutional Neural Networks (CNNs) requires a tremendous amount of computation and memory, and thus GPUs are widely used to meet the computation demands of these complex training tasks. However, lacking the flexibility to exploit architectural optimizations, GPUs have poor energy efficiency and are difficult to deploy on energy-constrained platforms. FPGAs are highly suitable for training tasks such as real-time learning at the edge, as they provide higher energy efficiency and better flexibility to support algorithmic evolution. This paper first develops a training accelerator on FPGA, with 16-bit fixed-point computing and various training modules. Furthermore, leveraging model segmentation techniques from Progressive Segmented Training, the newly developed FPGA accelerator is applied to online learning, achieving much lower computation cost. We demonstrate the performance of representative CNNs trained for CIFAR-10 on Intel Stratix-10 MX FPGA, evaluating both the conventional training procedure and the online learning algorithm. The demo is available at https://github.com/dxc33linger/PSTonFPGA demo.
UR - http://www.scopus.com/inward/record.url?scp=85097352355&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097352355&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85097352355
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 5237
EP - 5239
BT - Proceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020
A2 - Bessiere, Christian
PB - International Joint Conferences on Artificial Intelligence
T2 - 29th International Joint Conference on Artificial Intelligence, IJCAI 2020
Y2 - 1 January 2021
ER -