TY - JOUR
T1 - Cyclebite
T2 - Extracting Task Graphs From Unstructured Compute-Programs
AU - Willis, Benjamin R.
AU - Shrivastava, Aviral
AU - Mack, Joshua
AU - Dave, Shail
AU - Chakrabarti, Chaitali
AU - Brunhaver, John
N1 - Publisher Copyright:
© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
PY - 2024/1/1
Y1 - 2024/1/1
N2 - —Extracting portable performance in an application requires structuring that program into a data-flow graph of coarse-grained tasks (CGTs). Structuring applications that interconnect multiple external libraries and custom code (i.e., “Code From The Wild” (CFTW)) is challenging. When experts manually restructure a program, they trivialize the extraction of structure; however, this expertise is not broadly available. Automatic structuring approaches focus on the intersection of hot code and static loops, ignoring the data dependencies between tasks and significantly reducing the scope of analyzeable programs. This work addresses the problem of extracting the data-flow graph of CGTs from CFTW. To that end, we present Cyclebite. Our approach extracts CGTs from unstructured compute-programs by detecting CGT candidates in the simplified Markov Control Graph (MCG), and localizing CGTs in an epoch profile. Additionally, the epoch profile extracts the data dependence between CGTs required to build the data-flow graph of CGTs. Cyclebite demonstrates a robust selectivity for critical CGTs relative to the state-of-the-art (SoA), leading to a potential speedup of 12x on average and thread-scaling of 24x on average compared to modern compiler optimizers. We validate the results of Cyclebite and compare them to two SoA techniques using an input corpus of 25 open-source C/C++ libraries with 2,019 unique execution profiles.
AB - —Extracting portable performance in an application requires structuring that program into a data-flow graph of coarse-grained tasks (CGTs). Structuring applications that interconnect multiple external libraries and custom code (i.e., “Code From The Wild” (CFTW)) is challenging. When experts manually restructure a program, they trivialize the extraction of structure; however, this expertise is not broadly available. Automatic structuring approaches focus on the intersection of hot code and static loops, ignoring the data dependencies between tasks and significantly reducing the scope of analyzeable programs. This work addresses the problem of extracting the data-flow graph of CGTs from CFTW. To that end, we present Cyclebite. Our approach extracts CGTs from unstructured compute-programs by detecting CGT candidates in the simplified Markov Control Graph (MCG), and localizing CGTs in an epoch profile. Additionally, the epoch profile extracts the data dependence between CGTs required to build the data-flow graph of CGTs. Cyclebite demonstrates a robust selectivity for critical CGTs relative to the state-of-the-art (SoA), leading to a potential speedup of 12x on average and thread-scaling of 24x on average compared to modern compiler optimizers. We validate the results of Cyclebite and compare them to two SoA techniques using an input corpus of 25 open-source C/C++ libraries with 2,019 unique execution profiles.
KW - Produce-consume task graph
KW - dynamic control flow graph
KW - epoch
KW - memory dependency analysis
KW - task partitioning
UR - http://www.scopus.com/inward/record.url?scp=85181575483&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85181575483&partnerID=8YFLogxK
U2 - 10.1109/TC.2023.3327504
DO - 10.1109/TC.2023.3327504
M3 - Article
AN - SCOPUS:85181575483
SN - 0018-9340
VL - 73
SP - 221
EP - 234
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 1
ER -