WATSON: A Workflow-based Data Storage Optimizer for Analytics

Jia Zou, Ming Zhao, Juwei Shi, Chen Wang

Research output: Contribution to conference › Paper › peer-review

1 Scopus citation

Abstract

This paper studies the automatic optimization of data placement parameters for the inter-job write-once-read-many (WORM) scenario, in which data is first materialized to storage by a producer job and then accessed many times by one or more consumer jobs. Such a scenario is ubiquitous in Big Data analytics applications, but existing Big Data auto-tuning techniques often focus on single-job performance. To address this shortcoming, the paper investigates data placement parameters for blocking, partitioning, and replication, and models the trade-offs caused by different configurations of these parameters through a producer-consumer model. We then present a novel cross-layer solution, WATSON, which automatically predicts future workloads’ data access patterns and tunes data placement parameters accordingly to optimize performance in an inter-job WORM scenario. WATSON achieves up to an eightfold speedup on various analytics workloads.
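
The producer-consumer trade-off described in the abstract can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not WATSON's actual model: the function names, the bandwidth figures, and the assumption that read throughput scales with the number of replicas (capped by the number of concurrent readers) are all hypothetical. It shows why the best replication factor depends on how many times the data will be read after it is written once.

```python
# Toy producer-consumer cost model (hypothetical; not the model from the paper).
# Assumption: the producer pays to write every replica once, while each
# consumer pass reads faster when more replicas are available.

def producer_cost(data_gb: float, replication: int, write_gb_per_s: float = 1.0) -> float:
    """Seconds to materialize the data, writing every replica once."""
    return data_gb * replication / write_gb_per_s


def consumer_cost(data_gb: float, replication: int,
                  readers: int = 4, read_gb_per_s: float = 1.0) -> float:
    """Seconds for one consumer pass; replicas help only up to `readers`."""
    effective_parallelism = min(replication, readers)
    return data_gb / (read_gb_per_s * effective_parallelism)


def total_cost(data_gb: float, replication: int, num_reads: int) -> float:
    """Write once, then read `num_reads` times."""
    return producer_cost(data_gb, replication) + num_reads * consumer_cost(data_gb, replication)


def best_replication(data_gb: float, num_reads: int, candidates=range(1, 5)) -> int:
    """Pick the candidate replication factor with the lowest total cost."""
    return min(candidates, key=lambda r: total_cost(data_gb, r, num_reads))


if __name__ == "__main__":
    for reads in (1, 10, 100):
        print(reads, best_replication(100.0, reads))
```

In this toy setting, a dataset that is read only once favors a single replica, while a dataset that is read many times justifies the extra write cost of more replicas. A tuner of this kind therefore needs a prediction of future access patterns, which is the role the abstract assigns to WATSON.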

Original language: English (US)
State: Published - 2020
Event: 36th International Conference on Massive Storage Systems and Technology, MSST 2020 - Virtual, Online
Duration: Oct 29 2020 to Oct 30 2020

Conference

Conference: 36th International Conference on Massive Storage Systems and Technology, MSST 2020
City: Virtual, Online
Period: 10/29/20 to 10/30/20

Keywords

  • auto-tuning
  • Big Data analytics
  • data placement
  • parameter optimization
  • storage

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Hardware and Architecture
