TY - GEN
T1 - Efficient memory compression in deep neural networks using coarse-grain sparsification for speech applications
AU - Kadetotad, Deepak
AU - Arunachalam, Sairam
AU - Chakrabarti, Chaitali
AU - Seo, Jae-sun
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/11/7
Y1 - 2016/11/7
N2 - Recent breakthroughs in deep neural networks have led to the proliferation of their use in image and speech applications. Conventional deep neural networks (DNNs) are fully-connected multi-layer networks with hundreds or thousands of neurons in each layer. Such a network requires a very large weight memory to store the connectivity between neurons. In this paper, we propose a hardware-centric methodology to design low power neural networks with significantly smaller memory footprint and computation resource requirements. We achieve this by judiciously dropping connections in large blocks of weights. The corresponding technique, termed coarse-grain sparsification (CGS), introduces hardware-aware sparsity during the DNN training, which leads to efficient weight memory compression and significant computation reduction during classification without losing accuracy. We apply the proposed approach to DNN design for keyword detection and speech recognition. When the two DNNs are trained with 75% of the weights dropped and classified with 5-6 bit weight precision, the weight memory requirement is reduced by 95% compared to their fully-connected counterparts with double precision, while maintaining similar performance in keyword detection accuracy, word error rate, and sentence error rate. To validate this technique in real hardware, a time-multiplexed architecture using a shared multiply and accumulate (MAC) engine was implemented in 65nm and 40nm low power (LP) CMOS. In 40nm at 0.6V, the keyword detection network consumes 7μW and the speech recognition network consumes 103μW, making this technique highly suitable for mobile and wearable devices.
KW - deep neural networks
KW - keyword detection
KW - low power design
KW - memory compression
KW - speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85000995893&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85000995893&partnerID=8YFLogxK
U2 - 10.1145/2966986.2967028
DO - 10.1145/2966986.2967028
M3 - Conference contribution
AN - SCOPUS:85000995893
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
BT - 2016 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 35th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2016
Y2 - 7 November 2016 through 10 November 2016
ER -