TY - GEN
T1 - LATTE-CC
T2 - 24th IEEE International Symposium on High Performance Computer Architecture, HPCA 2018
AU - Arunkumar, Akhil
AU - Lee, Shin Ying
AU - Soundararajan, Vignesh
AU - Wu, Carole-Jean
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/3/27
Y1 - 2018/3/27
N2 - General-purpose GPU applications are significantly constrained by the efficiency of the memory subsystem and the availability of data cache capacity on GPUs. Cache compression, while able to expand the effective cache capacity and improve cache efficiency, comes at the cost of increased hit latency. This has confined the application of cache compression mostly to lower-level caches, leaving it unexplored for L1 caches and for GPUs. Directly applying state-of-the-art high-performance cache compression schemes on GPUs results in a wide performance variation, from -52% to 48%. To maximize the performance and energy benefits of cache compression for GPUs, we propose a new compression management scheme called LATTE-CC. LATTE-CC is designed to exploit the dynamically varying latency tolerance of GPUs. LATTE-CC compresses cache lines based on its prediction of the degree of latency tolerance of GPU streaming multiprocessors, choosing between three distinct compression modes: no compression, low-latency, and high-capacity. LATTE-CC improves the performance of cache-sensitive GPGPU applications by as much as 48.4% and by an average of 19.2%, outperforming the static application of compression algorithms. LATTE-CC also reduces GPU energy consumption by an average of 10%, twice as much as the state-of-the-art compression scheme.
AB - General-purpose GPU applications are significantly constrained by the efficiency of the memory subsystem and the availability of data cache capacity on GPUs. Cache compression, while able to expand the effective cache capacity and improve cache efficiency, comes at the cost of increased hit latency. This has confined the application of cache compression mostly to lower-level caches, leaving it unexplored for L1 caches and for GPUs. Directly applying state-of-the-art high-performance cache compression schemes on GPUs results in a wide performance variation, from -52% to 48%. To maximize the performance and energy benefits of cache compression for GPUs, we propose a new compression management scheme called LATTE-CC. LATTE-CC is designed to exploit the dynamically varying latency tolerance of GPUs. LATTE-CC compresses cache lines based on its prediction of the degree of latency tolerance of GPU streaming multiprocessors, choosing between three distinct compression modes: no compression, low-latency, and high-capacity. LATTE-CC improves the performance of cache-sensitive GPGPU applications by as much as 48.4% and by an average of 19.2%, outperforming the static application of compression algorithms. LATTE-CC also reduces GPU energy consumption by an average of 10%, twice as much as the state-of-the-art compression scheme.
KW - Cache Compression
KW - Graphics Processing Units
KW - Latency Tolerance
UR - http://www.scopus.com/inward/record.url?scp=85046709116&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046709116&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2018.00028
DO - 10.1109/HPCA.2018.00028
M3 - Conference contribution
AN - SCOPUS:85046709116
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 221
EP - 234
BT - Proceedings - 24th IEEE International Symposium on High Performance Computer Architecture, HPCA 2018
PB - IEEE Computer Society
Y2 - 24 February 2018 through 28 February 2018
ER -