TY - GEN
T1 - Profile-Guided Parallel Task Extraction and Execution for Domain Specific Heterogeneous SoC
AU - Chang, Liangliang
AU - Mack, Joshua
AU - Willis, Benjamin
AU - Chen, Xing
AU - Brunhaver, John
AU - Akoglu, Ali
AU - Chakrabarti, Chaitali
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In this study, we introduce a methodology for automatically transforming user applications in the radar and communication domain written in C C++ based on dynamic profiling to a parallel representation targeted for a heterogeneous SoC. We present our approach for instrumenting the user application binary during the compilation process with barrier synchronization primitives that enable runtime system schedule and execute independent tasks concurrently over the available compute resources. We demonstrate the capabilities of our integrated compile time and runtime flow through task-level parallel and functionally correct execution of real-life applications. We perform validation of our integrated system by executing four distinct applications each carrying various degrees of task level parallelism over the Xeon-based multi-core homogeneous processor. We use the proposed compilation and code transformation methodology to re-target each application for execution on a heterogeneous SoC composed of three ARM cores and one FFT accelerator that is emulated on the Xilinx Zynq Ultra Scale+ platform. We demonstrate our runtime's ability to process application binary, dispatch independent tasks over the available compute resources of the emulated SoC on the Zynq FPGA based on three different scheduling heuristics. Finally we demonstrate execution of each application individually with task level parallelism on the Zynq FPGA and execution of workload scenarios composed of multiple instances of the same application as well as mixture of two distinct applications to demonstrate ability to realize both application and task level parallel execution. Our integrated approach offers a path forward for application developers to take full advantage of the target SoC without requiring users to become hardware and parallel programming experts.
AB - In this study, we introduce a methodology for automatically transforming user applications in the radar and communication domain written in C C++ based on dynamic profiling to a parallel representation targeted for a heterogeneous SoC. We present our approach for instrumenting the user application binary during the compilation process with barrier synchronization primitives that enable runtime system schedule and execute independent tasks concurrently over the available compute resources. We demonstrate the capabilities of our integrated compile time and runtime flow through task-level parallel and functionally correct execution of real-life applications. We perform validation of our integrated system by executing four distinct applications each carrying various degrees of task level parallelism over the Xeon-based multi-core homogeneous processor. We use the proposed compilation and code transformation methodology to re-target each application for execution on a heterogeneous SoC composed of three ARM cores and one FFT accelerator that is emulated on the Xilinx Zynq Ultra Scale+ platform. We demonstrate our runtime's ability to process application binary, dispatch independent tasks over the available compute resources of the emulated SoC on the Zynq FPGA based on three different scheduling heuristics. Finally we demonstrate execution of each application individually with task level parallelism on the Zynq FPGA and execution of workload scenarios composed of multiple instances of the same application as well as mixture of two distinct applications to demonstrate ability to realize both application and task level parallel execution. Our integrated approach offers a path forward for application developers to take full advantage of the target SoC without requiring users to become hardware and parallel programming experts.
KW - Task-level parallelism
KW - dynamic profiling
KW - heterogeneous SoC and runtime
KW - parallelism detection
UR - http://www.scopus.com/inward/record.url?scp=85152020746&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152020746&partnerID=8YFLogxK
U2 - 10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00121
DO - 10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00121
M3 - Conference contribution
AN - SCOPUS:85152020746
T3 - Proceedings - 20th IEEE International Symposium on Parallel and Distributed Processing with Applications, 12th IEEE International Conference on Big Data and Cloud Computing, 12th IEEE International Conference on Sustainable Computing and Communications and 15th IEEE International Conference on Social Computing and Networking, ISPA/BDCloud/SocialCom/SustainCom 2022
SP - 913
EP - 920
BT - Proceedings - 20th IEEE International Symposium on Parallel and Distributed Processing with Applications, 12th IEEE International Conference on Big Data and Cloud Computing, 12th IEEE International Conference on Sustainable Computing and Communications and 15th IEEE International Conference on Social Computing and Networking, ISPA/BDCloud/SocialCom/SustainCom 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE International Symposium on Parallel and Distributed Processing with Applications, 12th IEEE International Conference on Big Data and Cloud Computing, 12th IEEE International Conference on Sustainable Computing and Communications and 15th IEEE International Conference on Social Computing and Networking, ISPA/BDCloud/SocialCom/SustainCom 2022
Y2 - 17 December 2022 through 19 December 2022
ER -