TY - GEN
T1 - Ctrl-C
T2 - 34th IEEE International Conference on Computer Design, ICCD 2016
AU - Lee, Shin Ying
AU - Wu, Carole-Jean
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/11/22
Y1 - 2016/11/22
AB - The performance of general-purpose graphics processing units (GPGPUs) is often limited by the efficiency of the memory subsystem, particularly the L1 data caches. Because of the massively multithreaded computation paradigm, GPGPU workloads often exhibit significant memory resource contention and cache thrashing, which lead to high cache miss rates and substantial pipeline stall time. To improve the efficiency of GPU caches, we propose an instruction-aware, control-loop-based adaptive cache bypassing design (Ctrl-C). Ctrl-C applies an instruction-aware algorithm to dynamically identify the cache reuse behavior of each memory instruction. Ctrl-C then uses feedback control loops to bypass memory requests probabilistically, protecting cache lines with short reuse distances from early eviction. Evaluation with the GPGPU-Sim simulator shows that Ctrl-C improves the performance of cache-sensitive GPGPU workloads by 41.5% and raises cache and interconnect bandwidth utilization, at an area overhead of only 3.5%.
UR - https://www.scopus.com/pages/publications/85006826095
UR - https://www.scopus.com/pages/publications/85006826095#tab=citedBy
U2 - 10.1109/ICCD.2016.7753271
DO - 10.1109/ICCD.2016.7753271
M3 - Conference contribution
AN - SCOPUS:85006826095
T3 - Proceedings of the 34th IEEE International Conference on Computer Design, ICCD 2016
SP - 133
EP - 140
BT - Proceedings of the 34th IEEE International Conference on Computer Design, ICCD 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 2 October 2016 through 5 October 2016
ER -