TY - JOUR
T1 - Build a compact binary neural network through bit-level sensitivity and data pruning
AU - Li, Yixing
AU - Zhang, Shuai
AU - Zhou, Xichuan
AU - Ren, Fengbo
N1 - Funding Information:
Fengbo Ren received the B.Eng. degree from Zhejiang University, Hangzhou, China, in 2008 and the M.S. and Ph.D. degrees from the University of California, Los Angeles, in 2010 and 2014, respectively, all in electrical engineering. In 2015, he joined the faculty of the School of Computing, Informatics, and Decision Systems Engineering at Arizona State University (ASU). His Ph.D. research involved designing energy-efficient VLSI systems, accelerating compressive sensing signal reconstruction, and developing emerging memory technology. His current research interests are focused on hardware acceleration and parallel computing solutions for data analytics and information processing, with an emphasis on bringing energy efficiency and signal intelligence into a wide spectrum of today’s computing infrastructures, from data center server systems to wearable and Internet-of-Things devices. He is a member of the Digital Signal Processing Technical Committee and the VLSI Systems and Applications Technical Committee of the IEEE Circuits and Systems Society. He received the Broadcom Fellowship in 2012, the National Science Foundation (NSF) Faculty Early Career Development (CAREER) Award in 2017, and the Google Faculty Research Award in 2018.
Funding Information:
The work by Arizona State University is supported by an NSF grant (IIS/CPS-1652038) and an unrestricted gift (CG#1319167) from Cisco Research Center. The work by Chongqing University is funded by the Science and Technology on Analog Integrated Circuit Laboratory (No. 6142802WD201807). The GPUs used for this research were donated by the NVIDIA Corporation.
Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020/7/20
Y1 - 2020/7/20
N2 - Due to their high computational complexity and memory storage requirements, it is hard to deploy a full-precision convolutional neural network (CNN) directly on embedded devices. Hardware-friendly designs are needed for resource-limited and energy-constrained embedded devices. Emerging solutions have been adopted for neural network compression, e.g., binary/ternary weight networks, pruned networks and quantized networks. Among them, the binary neural network (BNN) is believed to be the most hardware-friendly framework due to its small network size and low computational complexity. No existing work has further shrunk the size of a BNN. In this work, we explore the redundancy in BNNs and build a compact BNN (CBNN) based on bit-level sensitivity analysis and bit-level data pruning. The input data is converted to a high-dimensional bit-sliced format. In the post-training stage, we analyze the impact of different bit slices on the accuracy. By pruning the redundant input bit slices and shrinking the network size, we are able to build a more compact BNN. Our results show that we can further scale down the network size of the BNN by up to 3.9x with no more than a 1% accuracy drop. The actual runtime can be reduced by up to 2x and 9.9x compared with the baseline BNN and its full-precision counterpart, respectively.
AB - Due to their high computational complexity and memory storage requirements, it is hard to deploy a full-precision convolutional neural network (CNN) directly on embedded devices. Hardware-friendly designs are needed for resource-limited and energy-constrained embedded devices. Emerging solutions have been adopted for neural network compression, e.g., binary/ternary weight networks, pruned networks and quantized networks. Among them, the binary neural network (BNN) is believed to be the most hardware-friendly framework due to its small network size and low computational complexity. No existing work has further shrunk the size of a BNN. In this work, we explore the redundancy in BNNs and build a compact BNN (CBNN) based on bit-level sensitivity analysis and bit-level data pruning. The input data is converted to a high-dimensional bit-sliced format. In the post-training stage, we analyze the impact of different bit slices on the accuracy. By pruning the redundant input bit slices and shrinking the network size, we are able to build a more compact BNN. Our results show that we can further scale down the network size of the BNN by up to 3.9x with no more than a 1% accuracy drop. The actual runtime can be reduced by up to 2x and 9.9x compared with the baseline BNN and its full-precision counterpart, respectively.
KW - Binary neural networks
KW - Deep learning
KW - Deep neural networks
KW - Neural network compression
UR - http://www.scopus.com/inward/record.url?scp=85080141588&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85080141588&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2020.02.012
DO - 10.1016/j.neucom.2020.02.012
M3 - Article
AN - SCOPUS:85080141588
SN - 0925-2312
VL - 398
SP - 45
EP - 54
JO - Neurocomputing
JF - Neurocomputing
ER -