TY - GEN
T1 - Efficient memory compression in deep neural networks using coarse-grain sparsification for speech applications
AU - Kadetotad, Deepak
AU - Arunachalam, Sairam
AU - Chakrabarti, Chaitali
AU - Seo, Jae-sun
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/11/7
Y1 - 2016/11/7
N2 - Recent breakthroughs in deep neural networks have led to the proliferation of their use in image and speech applications. Conventional deep neural networks (DNNs) are fully-connected multi-layer networks with hundreds or thousands of neurons in each layer. Such a network requires a very large weight memory to store the connectivity between neurons. In this paper, we propose a hardware-centric methodology to design low power neural networks with significantly smaller memory footprint and computation resource requirements. We achieve this by judiciously dropping connections in large blocks of weights. The corresponding technique, termed coarse-grain sparsification (CGS), introduces hardware-aware sparsity during the DNN training, which leads to efficient weight memory compression and significant computation reduction during classification without losing accuracy. We apply the proposed approach to DNN design for keyword detection and speech recognition. When the two DNNs are trained with 75% of the weights dropped and classified with 5-6 bit weight precision, the weight memory requirement is reduced by 95% compared to their fully-connected counterparts with double precision, while maintaining similar performance in keyword detection accuracy, word error rate, and sentence error rate. To validate this technique in real hardware, a time-multiplexed architecture using a shared multiply and accumulate (MAC) engine was implemented in 65nm and 40nm low power (LP) CMOS. In 40nm at 0.6V, the keyword detection network consumes 7μW and the speech recognition network consumes 103μW, making this technique highly suitable for mobile and wearable devices.
KW - deep neural networks
KW - keyword detection
KW - low power design
KW - memory compression
KW - speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85000995893&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85000995893&partnerID=8YFLogxK
U2 - 10.1145/2966986.2967028
DO - 10.1145/2966986.2967028
M3 - Conference contribution
AN - SCOPUS:85000995893
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
BT - 2016 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 35th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2016
Y2 - 7 November 2016 through 10 November 2016
ER -