Abstract
The design complexity and power consumption of hardware cache coherence logic increase considerably with the number of cores. Although skipping hardware coherence can simplify the hardware and make it more power-efficient, programming becomes more challenging because programmers must manually insert DMA instructions to keep shared data coherent between cores. To reduce the burden of parallel programming, we propose program transformations and a runtime library that enable correct execution of data-race-free multi-threaded programs. Our scheme manages coherence at byte granularity rather than the conventional page granularity. We further optimize performance by introducing a private write notice for each core and by combining write notices in our coherence implementation. Experimental results from running multi-threaded signal processing benchmarks on the 8-core non-cache-coherent Texas Instruments TMS320C6678 processor demonstrate that our technique achieves a 12X performance improvement over the naive scheme of disabling caches and a 2X performance improvement over the state-of-the-art technique.
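To make the idea of byte-granularity write notices more concrete, the following is a minimal sketch in C. It is not the paper's runtime library: the names `write_notice_t`, `notice_log_t`, `log_write`, and `log_publish` are hypothetical, the log layout and the merging policy are assumptions, and `memcpy` merely stands in for whatever DMA transfer or cache write-back a real non-coherent platform would use. Each core keeps a private log of the byte ranges it has written, touching or overlapping ranges are combined into one notice, and at a synchronization point only the noted bytes are copied to shared memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NOTICES 64

/* One write notice: a half-open byte range [start, end), expressed as
 * offsets into the shared region (hypothetical representation). */
typedef struct {
    size_t start;
    size_t end;
} write_notice_t;

/* Private per-core log of write notices (assumed layout). */
typedef struct {
    write_notice_t notices[MAX_NOTICES];
    size_t count;
} notice_log_t;

static void log_init(notice_log_t *log) { log->count = 0; }

/* Record a write of len bytes at offset off. Any existing notice that
 * overlaps or touches the new range is absorbed into it, so the log
 * holds disjoint, non-adjacent ranges. Returns 0 on success, or -1 if
 * the log is full (in which case the log is left unchanged). */
static int log_write(notice_log_t *log, size_t off, size_t len)
{
    assert(len > 0);
    size_t start = off, end = off + len;
    size_t i = 0;
    while (i < log->count) {
        write_notice_t *n = &log->notices[i];
        if (start <= n->end && n->start <= end) {
            /* Grow the new range to cover the existing notice. */
            if (n->start < start) start = n->start;
            if (n->end > end) end = n->end;
            /* Remove the absorbed notice by moving the last one here. */
            *n = log->notices[--log->count];
            /* The grown range may now touch notices already skipped. */
            i = 0;
        } else {
            i++;
        }
    }
    if (log->count == MAX_NOTICES) return -1;
    log->notices[log->count].start = start;
    log->notices[log->count].end = end;
    log->count++;
    return 0;
}

/* Publish a core's writes: copy only the noted byte ranges from the
 * core's local copy into shared memory, then clear the log. */
static void log_publish(notice_log_t *log, uint8_t *shared,
                        const uint8_t *local)
{
    for (size_t i = 0; i < log->count; i++) {
        size_t off = log->notices[i].start;
        size_t len = log->notices[i].end - log->notices[i].start;
        memcpy(shared + off, local + off, len);
    }
    log->count = 0;
}
```

In this sketch, two cores that write different bytes of the same region do not overwrite each other's data when they publish, because each copies back only the bytes it logged. With page-granularity tracking, both would have to transfer the whole page.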
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the IEEE International Conference on VLSI Design |
Publisher | IEEE Computer Society |
Pages | 397-402 |
Number of pages | 6 |
Volume | 2016-March |
ISBN (Print) | 9781467387002 |
State | Published - Mar 16 2016 |
Event | 29th International Conference on VLSI Design, VLSID 2016 - Kolkata, India. Duration: Jan 4 2016 → Jan 8 2016 |
Other
Other | 29th International Conference on VLSI Design, VLSID 2016 |
---|---|
Country/Territory | India |
City | Kolkata |
Period | 1/4/16 → 1/8/16 |
Keywords
- Multi-core Processor
- Scratchpad Memory
- Software Coherence Management
- Software Managed Multicores
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Hardware and Architecture