TY - GEN
T1 - Understanding the future of energy efficiency in multi-module GPUs
AU - Arunkumar, Akhil
AU - Bolotin, Evgeny
AU - Nellans, David
AU - Wu, Carole-Jean
N1 - Funding Information:
The authors would like to thank the anonymous reviewers for their insightful feedback, which has been used to improve the paper. The authors would also like to thank NVIDIA for the equipment donation. This work is supported in part by the National Science Foundation under grants CCF-1618039 and CCF-1525462.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/3/26
Y1 - 2019/3/26
N2 - As Moore's law slows down, GPUs must pivot towards multi-module designs to continue scaling performance at historical rates. Prior work on multi-module GPUs has focused on performance while largely ignoring the issue of energy efficiency. In this work, we propose a new metric for GPU efficiency called EDP Scaling Efficiency that quantifies the effects of both strong performance scaling and overall energy efficiency in these designs. To enable this analysis, we develop a novel top-down GPU energy estimation framework that is accurate to within 10% of a recent GPU design. Because it is decoupled from granular GPU microarchitectural details, the framework is appropriate for energy efficiency studies of future GPUs. Using this model in conjunction with performance simulation, we show that the dominating factor influencing the energy efficiency of GPUs over the next decade is GPU module (GPM) idle time. Furthermore, neither inter-module interconnect energy nor GPM microarchitectural design is expected to play a key role in this regard. We demonstrate that multi-module GPUs are on a trajectory to become 2× less energy efficient than current monolithic designs, a significant issue for data centers, which are already energy constrained. Finally, we show that architects must be willing to spend more (not less) energy to enable higher-bandwidth inter-GPM connections because, counter-intuitively, this additional energy expenditure can reduce total GPU energy consumption by as much as 45%, providing a path to energy-efficient strong scaling in the future.
AB - As Moore's law slows down, GPUs must pivot towards multi-module designs to continue scaling performance at historical rates. Prior work on multi-module GPUs has focused on performance while largely ignoring the issue of energy efficiency. In this work, we propose a new metric for GPU efficiency called EDP Scaling Efficiency that quantifies the effects of both strong performance scaling and overall energy efficiency in these designs. To enable this analysis, we develop a novel top-down GPU energy estimation framework that is accurate to within 10% of a recent GPU design. Because it is decoupled from granular GPU microarchitectural details, the framework is appropriate for energy efficiency studies of future GPUs. Using this model in conjunction with performance simulation, we show that the dominating factor influencing the energy efficiency of GPUs over the next decade is GPU module (GPM) idle time. Furthermore, neither inter-module interconnect energy nor GPM microarchitectural design is expected to play a key role in this regard. We demonstrate that multi-module GPUs are on a trajectory to become 2× less energy efficient than current monolithic designs, a significant issue for data centers, which are already energy constrained. Finally, we show that architects must be willing to spend more (not less) energy to enable higher-bandwidth inter-GPM connections because, counter-intuitively, this additional energy expenditure can reduce total GPU energy consumption by as much as 45%, providing a path to energy-efficient strong scaling in the future.
KW - Energy Efficiency
KW - Energy Model
KW - GPU
KW - Moore's Law
KW - Multi-Chip Module
KW - NUMA
UR - http://www.scopus.com/inward/record.url?scp=85064192758&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064192758&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2019.00063
DO - 10.1109/HPCA.2019.00063
M3 - Conference contribution
AN - SCOPUS:85064192758
T3 - Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
SP - 519
EP - 532
BT - Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
Y2 - 16 February 2019 through 20 February 2019
ER -