TY - GEN
T1 - Transmuter
T2 - 2020 ACM International Conference on Parallel Architectures and Compilation Techniques, PACT 2020
AU - Pal, Subhankar
AU - Feng, Siying
AU - Park, Dong Hyeon
AU - Kim, Sung
AU - Amarnath, Aporva
AU - Yang, Chi Sheng
AU - He, Xin
AU - Beaumont, Jonathan
AU - May, Kyle
AU - Xiong, Yan
AU - Kaszyk, Kuba
AU - Morton, John Magnus
AU - Sun, Jiawen
AU - O'Boyle, Michael
AU - Cole, Murray
AU - Chakrabarti, Chaitali
AU - Blaauw, David
AU - Kim, Hun Seok
AU - Mudge, Trevor
AU - Dreslinski, Ronald
N1 - Funding Information:
We thank the anonymous reviewers for their helpful feedback. The material is based on research sponsored by Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreement number FA8650-18-2-7864. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) or the U.S. Government.
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/9/30
Y1 - 2020/9/30
N2 - With the end of Dennard scaling and Moore's law, it is becoming increasingly difficult to build hardware for emerging applications that meet power and performance targets, while remaining flexible and programmable for end users. This is particularly true for domains that have frequently changing algorithms and applications involving mixed sparse/dense data structures, such as those in machine learning and graph analytics. To overcome this, we present a flexible accelerator called Transmuter, in a novel effort to bridge the gap between General-Purpose Processors (GPPs) and Application-Specific Integrated Circuits (ASICs). Transmuter adapts to changing kernel characteristics, such as data reuse and control divergence, through the ability to reconfigure the on-chip memory type, resource sharing and dataflow at run-time within a short latency. This is facilitated by a fabric of light-weight cores connected to a network of reconfigurable caches and crossbars. Transmuter addresses a rapidly growing set of algorithms exhibiting dynamic data movement patterns, irregularity, and sparsity, while delivering GPU-like efficiencies for traditional dense applications. Finally, in order to support programmability and ease-of-adoption, we prototype a software stack composed of low-level runtime routines, and a high-level language library called TransPy, that cater to expert programmers and end-users, respectively. Our evaluations with Transmuter demonstrate average throughput (energy-efficiency) improvements of 5.0× (18.4×) and 4.2× (4.0×) over a high-end CPU and GPU, respectively, across a diverse set of kernels predominant in graph analytics, scientific computing and machine learning. Transmuter achieves energy-efficiency gains averaging 3.4× and 2.0× over prior FPGA and CGRA implementations of the same kernels, while remaining on average within 9.3× of state-of-the-art ASICs.
AB - With the end of Dennard scaling and Moore's law, it is becoming increasingly difficult to build hardware for emerging applications that meet power and performance targets, while remaining flexible and programmable for end users. This is particularly true for domains that have frequently changing algorithms and applications involving mixed sparse/dense data structures, such as those in machine learning and graph analytics. To overcome this, we present a flexible accelerator called Transmuter, in a novel effort to bridge the gap between General-Purpose Processors (GPPs) and Application-Specific Integrated Circuits (ASICs). Transmuter adapts to changing kernel characteristics, such as data reuse and control divergence, through the ability to reconfigure the on-chip memory type, resource sharing and dataflow at run-time within a short latency. This is facilitated by a fabric of light-weight cores connected to a network of reconfigurable caches and crossbars. Transmuter addresses a rapidly growing set of algorithms exhibiting dynamic data movement patterns, irregularity, and sparsity, while delivering GPU-like efficiencies for traditional dense applications. Finally, in order to support programmability and ease-of-adoption, we prototype a software stack composed of low-level runtime routines, and a high-level language library called TransPy, that cater to expert programmers and end-users, respectively. Our evaluations with Transmuter demonstrate average throughput (energy-efficiency) improvements of 5.0× (18.4×) and 4.2× (4.0×) over a high-end CPU and GPU, respectively, across a diverse set of kernels predominant in graph analytics, scientific computing and machine learning. Transmuter achieves energy-efficiency gains averaging 3.4× and 2.0× over prior FPGA and CGRA implementations of the same kernels, while remaining on average within 9.3× of state-of-the-art ASICs.
KW - Dataflow reconfiguration
KW - General-purpose acceleration
KW - Hardware acceleration
KW - Memory reconfiguration
KW - Reconfigurable architectures
UR - http://www.scopus.com/inward/record.url?scp=85094212659&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094212659&partnerID=8YFLogxK
U2 - 10.1145/3410463.3414627
DO - 10.1145/3410463.3414627
M3 - Conference contribution
AN - SCOPUS:85094212659
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 175
EP - 190
BT - PACT 2020 - Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques
PB - Association for Computing Machinery
Y2 - 3 October 2020 through 7 October 2020
ER -