TY - GEN
T1 - Operation tables for scheduling in the presence of incomplete bypassing
AU - Shrivastava, Aviral
AU - Earlie, Eugene
AU - Dutt, Nikil
AU - Nicolau, Alex
PY - 2004
Y1 - 2004
N2 - Register bypassing is a powerful and widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, bypassing has a significant impact on the cycle time, area, and power consumption of the processor. Due to the strict constraints on performance, cost, and power consumption in embedded processors, architects need to evaluate and implement incomplete register bypassing mechanisms. However, traditional data hazard detection and/or avoidance techniques used in retargetable schedulers break down in the presence of incomplete bypassing. In this paper, we present the concept of Operation Tables, which can be used to detect data hazards even in the presence of incomplete bypassing. Furthermore, our technique integrates the detection of both data and resource hazards, and can be easily employed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor platform show that, even with a simple intra-basic-block scheduling technique, we achieve up to 20% performance improvement over fully optimized GCC-generated code on embedded applications from the MiBench suite.
AB - Register bypassing is a powerful and widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, bypassing has a significant impact on the cycle time, area, and power consumption of the processor. Due to the strict constraints on performance, cost, and power consumption in embedded processors, architects need to evaluate and implement incomplete register bypassing mechanisms. However, traditional data hazard detection and/or avoidance techniques used in retargetable schedulers break down in the presence of incomplete bypassing. In this paper, we present the concept of Operation Tables, which can be used to detect data hazards even in the presence of incomplete bypassing. Furthermore, our technique integrates the detection of both data and resource hazards, and can be easily employed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor platform show that, even with a simple intra-basic-block scheduling technique, we achieve up to 20% performance improvement over fully optimized GCC-generated code on embedded applications from the MiBench suite.
KW - Bypass
KW - Hazard Detection
KW - Operation Table
KW - Reservation Table
KW - Retargetable Compilers
KW - Scheduling
UR - http://www.scopus.com/inward/record.url?scp=16244386553&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=16244386553&partnerID=8YFLogxK
U2 - 10.1145/1016720.1016768
DO - 10.1145/1016720.1016768
M3 - Conference contribution
AN - SCOPUS:16244386553
SN - 1581139373
SN - 9781581139372
T3 - Second IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2004
SP - 194
EP - 199
BT - International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2004
PB - Association for Computing Machinery
T2 - Second IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2004
Y2 - 8 September 2004 through 10 September 2004
ER -