Abstract
This paper studies the automatic optimization of data placement parameters for the inter-job write once, read many (WORM) scenario, in which data is first materialized to storage by a producer job and then accessed many times by one or more consumer jobs. Such a scenario is ubiquitous in Big Data analytics applications, but existing Big Data auto-tuning techniques often focus on single-job performance. To address this shortcoming, this paper investigates data placement parameters for blocking, partitioning and replication, and models the trade-offs among different configurations of these parameters with a producer-consumer model. We then present WATSON, a novel cross-layer solution that automatically predicts the data access patterns of future workloads and tunes data placement parameters accordingly to optimize performance in an inter-job WORM scenario. WATSON can achieve up to an eightfold speedup on various analytics workloads.
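To make the producer-consumer trade-off concrete, here is a minimal toy sketch. It is not WATSON's model: the cost formulas, bandwidths, overheads, candidate block sizes and the `n_reads` input (standing in for a predicted number of consumer reads) are all hypothetical, and partitioning is omitted for brevity. The producer pays once to write every replica, and each consumer read benefits from replicas being node-local. Block size trades per-block overhead against parallelism. The best configuration therefore depends on how many times the data will be read.

```python
import math
from dataclasses import dataclass
from itertools import product

# All constants are illustrative assumptions, not values from WATSON.
NODES = 4            # cluster nodes that can hold a replica
SLOTS = 8            # concurrent tasks per job
WRITE_MBPS = 100.0   # per-task write bandwidth for one replica
LOCAL_MBPS = 200.0   # read bandwidth when a replica is on the reading node
REMOTE_MBPS = 50.0   # read bandwidth when data must cross the network
PER_BLOCK_S = 0.5    # fixed scheduling/metadata cost per block (one task per block)


@dataclass(frozen=True)
class Placement:
    block_mb: int     # larger blocks: less overhead, but less parallelism
    replication: int  # more replicas: costlier writes, more local reads


def num_blocks(p: Placement, data_mb: int) -> int:
    return math.ceil(data_mb / p.block_mb)


def parallelism(p: Placement, data_mb: int) -> int:
    # One task per block, capped by the available slots.
    return min(num_blocks(p, data_mb), SLOTS)


def producer_cost(p: Placement, data_mb: int) -> float:
    # Write once: every replica must be written.
    work = data_mb * p.replication / WRITE_MBPS + num_blocks(p, data_mb) * PER_BLOCK_S
    return work / parallelism(p, data_mb)


def consumer_cost(p: Placement, data_mb: int) -> float:
    # Read many: probability that a reader finds a local replica grows with replication.
    local = min(1.0, p.replication / NODES)
    sec_per_mb = local / LOCAL_MBPS + (1 - local) / REMOTE_MBPS
    work = data_mb * sec_per_mb + num_blocks(p, data_mb) * PER_BLOCK_S
    return work / parallelism(p, data_mb)


def total_cost(p: Placement, data_mb: int, n_reads: int) -> float:
    # One producer write amortized over n_reads consumer reads.
    return producer_cost(p, data_mb) + n_reads * consumer_cost(p, data_mb)


def best_placement(data_mb: int, n_reads: int) -> Placement:
    candidates = [Placement(b, r) for b, r in product((64, 128, 256, 512), (1, 2, 3))]
    return min(candidates, key=lambda p: total_cost(p, data_mb, n_reads))


if __name__ == "__main__":
    for n in (1, 10):
        print(n, best_placement(1024, n))
```

Under these assumed constants, a 1024 MB dataset read once favors a single replica, because the extra write cost is never recovered. When the same data is read ten times, three replicas win, because the cheaper local reads are paid back on every consumer job. A 128 MB block size wins in both cases: it fills all task slots while keeping per-block overhead low.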
Original language | English (US) |
---|---|
State | Published - 2020 |
Event | 36th International Conference on Massive Storage Systems and Technology, MSST 2020 - Virtual, Online; Duration: Oct 29 2020 → Oct 30 2020 |
Conference
Conference | 36th International Conference on Massive Storage Systems and Technology, MSST 2020 |
---|---|
City | Virtual, Online |
Period | 10/29/20 → 10/30/20 |
Keywords
- auto-tuning
- Big Data analytics
- data placement
- parameter optimization
- storage
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Hardware and Architecture