Software Coherence Management on Non-coherent Cache Multi-cores

Jian Cai; Aviral Shrivastava

doi:10.1109/VLSID.2016.70

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

The design complexity and power consumption of hardware cache coherence logic increase considerably with the increase in the number of cores. Although skipping coherence can simplify hardware and make it more power-efficient, programming becomes more challenging as programmers have to manually insert DMA instructions to ensure that there is coherence of shared data between cores. To reduce the burden of parallel programming, we propose program transformations and a runtime library that will enable correct execution of data-race-free multi-threaded programs. Our scheme manages coherence at byte granularity rather than the conventional page granularity. We further optimize the performance by introducing the concept of private write notice for each core and combining write notices in our coherence implementation. Experimental results of running multi-threaded signal processing benchmarks on the 8-core non-cache-coherent Texas Instruments processor TMS320C6678 demonstrate that our technique achieves 12X performance improvement over the naive scheme of disabling caches, and 2X performance improvement over the state-of-the-art technique.
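
The abstract names two ideas, a private write notice per core and the combining of write notices, without giving details. The sketch below is one possible reading, not the authors' implementation. It assumes each core privately logs the byte ranges it writes and, at a synchronization point, coalesces overlapping or adjacent ranges before publishing them. The names `Core`, `record_write`, `release`, and `combine_write_notices`, as well as the `(start, end)` range representation, are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation of a write notice: a half-open byte range
# (start_address, end_address_exclusive). Not taken from the paper.
Range = Tuple[int, int]


def combine_write_notices(notices: List[Range]) -> List[Range]:
    """Coalesce overlapping or adjacent byte ranges into a sorted minimal list."""
    merged: List[Range] = []
    for start, end in sorted(notices):
        if merged and start <= merged[-1][1]:
            # Range overlaps or touches the previous one: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


class Core:
    """Assumed per-core state: byte ranges written since the last release."""

    def __init__(self) -> None:
        self.private_notices: List[Range] = []

    def record_write(self, addr: int, size: int) -> None:
        # Log the write privately; nothing is shared with other cores yet.
        self.private_notices.append((addr, addr + size))

    def release(self) -> List[Range]:
        # At a synchronization point, publish the combined notices and reset.
        published = combine_write_notices(self.private_notices)
        self.private_notices = []
        return published
```

Under these assumptions, a core that writes bytes 0-3, 4-7, 16-17, and 2-4 before a release would publish two ranges instead of four, so a reader would have fewer ranges to invalidate or refetch.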

Original language	English (US)
Title of host publication	Proceedings of the IEEE International Conference on VLSI Design
Publisher	IEEE Computer Society
Pages	397-402
Number of pages	6
Volume	2016-March
ISBN (Print)	9781467387002
DOIs	https://doi.org/10.1109/VLSID.2016.70
State	Published - Mar 16 2016
Event	29th International Conference on VLSI Design, VLSID 2016 - Kolkata, India; Duration: Jan 4 2016 → Jan 8 2016

Other

Other	29th International Conference on VLSI Design, VLSID 2016
Country/Territory	India
City	Kolkata
Period	1/4/16 → 1/8/16

Keywords

Multi-core Processor
Scratchpad Memory
Software Coherence Management
Software Managed Multicores

ASJC Scopus subject areas

Electrical and Electronic Engineering
Hardware and Architecture

Cite this

@inproceedings{314440ea1ef2456b8b4aeb74b236c5dc,
title = "Software Coherence Management on Non-coherent Cache Multi-cores",
abstract = "The design complexity and power consumption of hardware cache coherence logic increase considerably with the increase in the number of cores. Although skipping coherence can simplify hardware and make it more power-efficient, programming becomes more challenging as programmers have to manually insert DMA instructions to ensure that there is coherence of shared data between cores. To reduce the burden of parallel programming, we propose program transformations and a runtime library that will enable correct execution of data-race-free multi-threaded programs. Our scheme manages coherence at byte granularity rather than the conventional page granularity. We further optimize the performance by introducing the concept of private write notice for each core and combining write notices in our coherence implementation. Experimental results of running multi-threaded signal processing benchmarks on the 8-core non-cache-coherent Texas Instruments processor TMS320C6678 demonstrate that our technique achieves 12X performance improvement over the naive scheme of disabling caches, and 2X performance improvement over the state-of-the-art technique.",
keywords = "Multi-core Processor, Scratchpad Memory, Software Coherence Management, Software Managed Multicores",
author = "Jian Cai and Aviral Shrivastava",
year = "2016",
month = mar,
day = "16",
doi = "10.1109/VLSID.2016.70",
language = "English (US)",
isbn = "9781467387002",
volume = "2016-March",
pages = "397--402",
booktitle = "Proceedings of the IEEE International Conference on VLSI Design",
publisher = "IEEE Computer Society",
note = "29th International Conference on VLSI Design, VLSID 2016 ; Conference date: 04-01-2016 Through 08-01-2016",
}

TY  - GEN
T1  - Software Coherence Management on Non-coherent Cache Multi-cores
AU  - Cai, Jian
AU  - Shrivastava, Aviral
PY  - 2016/3/16
Y1  - 2016/3/16
N2  - The design complexity and power consumption of hardware cache coherence logic increase considerably with the increase in the number of cores. Although skipping coherence can simplify hardware and make it more power-efficient, programming becomes more challenging as programmers have to manually insert DMA instructions to ensure that there is coherence of shared data between cores. To reduce the burden of parallel programming, we propose program transformations and a runtime library that will enable correct execution of data-race-free multi-threaded programs. Our scheme manages coherence at byte granularity rather than the conventional page granularity. We further optimize the performance by introducing the concept of private write notice for each core and combining write notices in our coherence implementation. Experimental results of running multi-threaded signal processing benchmarks on the 8-core non-cache-coherent Texas Instruments processor TMS320C6678 demonstrate that our technique achieves 12X performance improvement over the naive scheme of disabling caches, and 2X performance improvement over the state-of-the-art technique.
AB  - The design complexity and power consumption of hardware cache coherence logic increase considerably with the increase in the number of cores. Although skipping coherence can simplify hardware and make it more power-efficient, programming becomes more challenging as programmers have to manually insert DMA instructions to ensure that there is coherence of shared data between cores. To reduce the burden of parallel programming, we propose program transformations and a runtime library that will enable correct execution of data-race-free multi-threaded programs. Our scheme manages coherence at byte granularity rather than the conventional page granularity. We further optimize the performance by introducing the concept of private write notice for each core and combining write notices in our coherence implementation. Experimental results of running multi-threaded signal processing benchmarks on the 8-core non-cache-coherent Texas Instruments processor TMS320C6678 demonstrate that our technique achieves 12X performance improvement over the naive scheme of disabling caches, and 2X performance improvement over the state-of-the-art technique.
KW  - Multi-core Processor
KW  - Scratchpad Memory
KW  - Software Coherence Management
KW  - Software Managed Multicores
UR  - http://www.scopus.com/inward/record.url?scp=84964644446&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84964644446&partnerID=8YFLogxK
U2  - 10.1109/VLSID.2016.70
DO  - 10.1109/VLSID.2016.70
M3  - Conference contribution
AN  - SCOPUS:84964644446
SN  - 9781467387002
VL  - 2016-March
SP  - 397
EP  - 402
BT  - Proceedings of the IEEE International Conference on VLSI Design
PB  - IEEE Computer Society
T2  - 29th International Conference on VLSI Design, VLSID 2016
Y2  - 4 January 2016 through 8 January 2016
ER  -
