TY - GEN
T1 - CacheDedup: In-line Deduplication for Flash Caching
T2 - 14th USENIX Conference on File and Storage Technologies, FAST 2016
AU - Li, Wenji
AU - Jean-Baptiste, Gregory
AU - Riveros, Juan
AU - Narasimhan, Giri
AU - Zhang, Tong
AU - Zhao, Ming
N1 - Funding Information:
We thank the anonymous reviewers and our shepherd, Geoff Kuenning, for their thorough reviews and insightful suggestions, and our colleagues at the VISA Research Lab, Dulcardo Arteaga, for his help with the caching framework, and Saman Biook Aghazadeh for his support of this paper including collecting the Hadoop traces. This research is sponsored by National Science Foundation CAREER award CNS-125394 and Department of Defense award W911NF-13-1-0157.
Publisher Copyright:
© 2016 by The USENIX Association. All Rights Reserved.
PY - 2016
Y1 - 2016
AB - Flash caching has emerged as a promising solution to the scalability problems of storage systems by using fast flash memory devices as the cache for slower primary storage. But its adoption faces serious obstacles due to the limited capacity and endurance of flash devices. This paper presents CacheDedup, a solution that addresses these limitations using in-line deduplication. First, it proposes a novel architecture that integrates the caching of data and deduplication metadata (source addresses and fingerprints of the data) and efficiently manages these two components. Second, it proposes duplication-aware cache replacement algorithms (D-LRU, DARC) to optimize both cache performance and endurance. The paper presents a rigorous analysis of the algorithms to prove that they do not waste valuable cache space and that they are efficient in time and space usage. The paper also includes an experimental evaluation using real-world traces, which confirms that CacheDedup substantially improves I/O performance (up to 20% reduction in miss ratio and 51% in latency) and flash endurance (up to 89% reduction in writes sent to the cache device) compared to traditional cache management. It also shows that the proposed architecture and algorithms can be extended to support the combination of compression and deduplication for flash caching and improve its performance and endurance.
UR - http://www.scopus.com/inward/record.url?scp=85077197000&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077197000&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85077197000
T3 - Proceedings of the 14th USENIX Conference on File and Storage Technologies, FAST 2016
SP - 301
EP - 314
BT - Proceedings of the 14th USENIX Conference on File and Storage Technologies, FAST 2016
PB - USENIX Association
Y2 - 22 February 2016 through 25 February 2016
ER -