TY - GEN
T1 - LATTE-CC
T2 - 24th IEEE International Symposium on High Performance Computer Architecture, HPCA 2018
AU - Arunkumar, Akhil
AU - Lee, Shin Ying
AU - Soundararajan, Vignesh
AU - Wu, Carole-Jean
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/3/27
Y1 - 2018/3/27
N2 - General-purpose GPU applications are significantly constrained by the efficiency of the memory subsystem and the availability of data cache capacity on GPUs. Cache compression, while able to expand the effective cache capacity and improve cache efficiency, comes at the cost of increased hit latency. This has confined the application of cache compression mostly to lower-level caches, leaving it unexplored for L1 caches and for GPUs. Directly applying state-of-the-art high-performance cache compression schemes on GPUs results in a wide performance variation, from -52% to 48%. To maximize the performance and energy benefits of cache compression for GPUs, we propose a new compression management scheme called LATTE-CC. LATTE-CC is designed to exploit the dynamically varying latency tolerance of GPUs. LATTE-CC compresses cache lines based on its prediction of the degree of latency tolerance of GPU streaming multiprocessors, choosing between three distinct compression modes: no compression, low-latency, and high-capacity. LATTE-CC improves the performance of cache-sensitive GPGPU applications by as much as 48.4% and by an average of 19.2%, outperforming the static application of compression algorithms. LATTE-CC also reduces GPU energy consumption by an average of 10%, twice as much as the state-of-the-art compression scheme.
AB - General-purpose GPU applications are significantly constrained by the efficiency of the memory subsystem and the availability of data cache capacity on GPUs. Cache compression, while able to expand the effective cache capacity and improve cache efficiency, comes at the cost of increased hit latency. This has confined the application of cache compression mostly to lower-level caches, leaving it unexplored for L1 caches and for GPUs. Directly applying state-of-the-art high-performance cache compression schemes on GPUs results in a wide performance variation, from -52% to 48%. To maximize the performance and energy benefits of cache compression for GPUs, we propose a new compression management scheme called LATTE-CC. LATTE-CC is designed to exploit the dynamically varying latency tolerance of GPUs. LATTE-CC compresses cache lines based on its prediction of the degree of latency tolerance of GPU streaming multiprocessors, choosing between three distinct compression modes: no compression, low-latency, and high-capacity. LATTE-CC improves the performance of cache-sensitive GPGPU applications by as much as 48.4% and by an average of 19.2%, outperforming the static application of compression algorithms. LATTE-CC also reduces GPU energy consumption by an average of 10%, twice as much as the state-of-the-art compression scheme.
KW - Cache Compression
KW - Graphics Processing Units
KW - Latency Tolerance
UR - http://www.scopus.com/inward/record.url?scp=85046709116&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046709116&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2018.00028
DO - 10.1109/HPCA.2018.00028
M3 - Conference contribution
AN - SCOPUS:85046709116
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 221
EP - 234
BT - Proceedings - 24th IEEE International Symposium on High Performance Computer Architecture, HPCA 2018
PB - IEEE Computer Society
Y2 - 24 February 2018 through 28 February 2018
ER -