TY - JOUR
T1 - GEVO: GPU Code Optimization Using Evolutionary Computation
AU - Liou, Jhe-Yu
AU - Wang, Xiaodong
AU - Forrest, Stephanie
AU - Wu, Carole-Jean
N1 - Funding Information:
This work is supported in part by the National Science Foundation under CCF-1618039, SHF-1652132, CCF-1908633; DARPA FA8750-19C-0003; AFRL FA8750-19-1-0501 for Jhe-Yu Liou, Stephanie Forrest, and Carole-Jean Wu at ASU. Authors’ addresses: J.-Y. Liou, Arizona State University, 1151 S. Forest Ave, Tempe, AZ 85287; email: jhe-yu.liou@asu.edu; X. Wang, Facebook, 1 Hacker Way, Menlo Park, CA 94025; email: xdwang@fb.com; S. Forrest, Arizona State University, 1151 S. Forest Ave, Tempe, AZ 85287 and Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501; email: stephanie.forrest@asu.edu; C.-J. Wu, Arizona State University, 1151 S. Forest Ave, Tempe, AZ 85287 and Facebook, 1 Hacker Way, Menlo Park, CA 94025; email: carole-jean.wu@asu.edu.
Publisher Copyright:
© 2020 ACM.
PY - 2020/11
Y1 - 2020/11
AB - GPUs are a key enabler of the revolution in machine learning and high-performance computing, functioning as de facto co-processors to accelerate large-scale computation. As the programming stack and tool support have matured, GPUs have also become accessible to programmers, who may lack detailed knowledge of the underlying architecture and fail to fully leverage the GPU's computation power. GEVO (Gpu optimization using EVOlutionary computation) is a tool for automatically discovering optimization opportunities and tuning the performance of GPU kernels in the LLVM representation. GEVO uses population-based search to find edits to GPU code compiled to LLVM-IR and improves performance on desired criteria while retaining required functionality. We demonstrate that GEVO improves the execution time of general-purpose GPU programs and machine learning (ML) models on NVIDIA Tesla P100. For the Rodinia benchmarks, GEVO improves GPU kernel runtime performance by an average of 49.48% and by as much as 412% over the fully compiler-optimized baseline. If kernel output accuracy is relaxed to tolerate up to 1% error, GEVO can find kernel variants that outperform the baseline by an average of 51.08%. For the ML workloads, GEVO achieves kernel performance improvement for SVM on the MNIST handwriting recognition (3.24×) and the a9a income prediction (2.93×) datasets with no loss of model accuracy. GEVO achieves 1.79× kernel performance improvement on image classification using ResNet18/CIFAR-10, with less than 1% model accuracy reduction.
KW - GPU code optimization
KW - Genetic improvement
KW - LLVM intermediate representation
KW - approximate computing
KW - multi-objective evolutionary computation
UR - http://www.scopus.com/inward/record.url?scp=85097228723&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097228723&partnerID=8YFLogxK
U2 - 10.1145/3418055
DO - 10.1145/3418055
M3 - Article
AN - SCOPUS:85097228723
SN - 1544-3566
VL - 17
JO - ACM Transactions on Architecture and Code Optimization
JF - ACM Transactions on Architecture and Code Optimization
IS - 4
M1 - 3418055
ER -