TY - GEN
T1 - Understanding the Power of Evolutionary Computation for GPU Code Optimization
AU - Liou, Jhe-Yu
AU - Awan, Muaaz
AU - Hofmeyr, Steven
AU - Forrest, Stephanie
AU - Wu, Carole-Jean
N1 - Funding Information:
This work is supported in part by the National Science Foundation under grants CCF-1652132, CCF-1618039, and CCF-2211750. The authors acknowledge support for computational resources from the ASU Research Technology Office and the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231. The authors would also like to thank Antonio Espinoza, Joshua Daymude, Joseph Renzullo, Kirtus Leyba, Pemma Reiter, and the anonymous reviewers for their valuable comments and suggestions.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Achieving high performance for GPU codes requires developers to have significant knowledge of parallel programming and GPU architectures, as well as an in-depth understanding of the application. This combination makes it challenging to find performance optimizations for GPU-based applications, especially in scientific computing. This paper shows that the tool GEVO can achieve significant speedups over human-optimized GPU code on two quite different scientific workloads. GEVO uses evolutionary computation to find code edits that improve the runtime of a multiple sequence alignment kernel and a SARS-CoV-2 simulation by 28.9% and 29%, respectively. Further, when GEVO begins with an early, unoptimized version of the sequence alignment program, it finds an impressive 30-times speedup, a performance improvement similar to that of the hand-tuned version. This work presents an in-depth analysis of the discovered optimizations, revealing that the primary sources of improvement vary across applications, that most of the optimizations generalize across GPU architectures, and that several of the most important optimizations involve significant code interdependencies. The results showcase the potential of automated program optimization tools to help reduce the optimization burden for scientific computing developers and to enhance performance portability for domain-specific accelerators.
AB - Achieving high performance for GPU codes requires developers to have significant knowledge of parallel programming and GPU architectures, as well as an in-depth understanding of the application. This combination makes it challenging to find performance optimizations for GPU-based applications, especially in scientific computing. This paper shows that the tool GEVO can achieve significant speedups over human-optimized GPU code on two quite different scientific workloads. GEVO uses evolutionary computation to find code edits that improve the runtime of a multiple sequence alignment kernel and a SARS-CoV-2 simulation by 28.9% and 29%, respectively. Further, when GEVO begins with an early, unoptimized version of the sequence alignment program, it finds an impressive 30-times speedup, a performance improvement similar to that of the hand-tuned version. This work presents an in-depth analysis of the discovered optimizations, revealing that the primary sources of improvement vary across applications, that most of the optimizations generalize across GPU architectures, and that several of the most important optimizations involve significant code interdependencies. The results showcase the potential of automated program optimization tools to help reduce the optimization burden for scientific computing developers and to enhance performance portability for domain-specific accelerators.
UR - http://www.scopus.com/inward/record.url?scp=85145661483&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85145661483&partnerID=8YFLogxK
U2 - 10.1109/IISWC55918.2022.00025
DO - 10.1109/IISWC55918.2022.00025
M3 - Conference contribution
AN - SCOPUS:85145661483
T3 - Proceedings - 2022 IEEE International Symposium on Workload Characterization, IISWC 2022
SP - 185
EP - 198
BT - Proceedings - 2022 IEEE International Symposium on Workload Characterization, IISWC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Symposium on Workload Characterization, IISWC 2022
Y2 - 6 November 2022 through 8 November 2022
ER -