HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond

Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present great challenges to FPGA and ML researchers. Existing datasets either cover only a subset of previously published benchmarks, provide no way to enumerate optimization design spaces, are limited to a specific vendor, or have no reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs to existing datasets, limiting wider adoption and sustainability of such datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This tripartite architecture not only ensures broad coverage of data points via design space expansion but also supports interoperability with tools from multiple vendors. Users can contribute to each stage easily by submitting their own HLS designs or synthesis results via provided user APIs. The framework is also flexible, allowing extensions at every step via user APIs with custom frontends, synthesis tools, and scripts. To demonstrate the framework functionality, we include an initial set of built-in base designs from PolyBench, MachSuite, Rosetta, CHStone, Kastner et al.'s Parallel Programming for FPGAs, and curated kernels from existing open-source HLS designs. We report statistical analyses and design space visualizations to demonstrate the completed end-to-end compilation flow and to highlight the effectiveness of our design space expansion beyond the initial base dataset, which greatly contributes to dataset diversity and coverage. In addition to its evident application in ML, we showcase the versatility and multi-functionality of our framework through seven case studies: I) building an ML model for post-implementation QoR prediction; II) using design space sampling in stage 1 to expand the design space covered from a small base set of HLS designs; III) demonstrating the speedup from the fine-grained design parallelism backend; IV) extending HLSFactory to target Intel's HLS flow across all stages; V) adding and running new auxiliary designs using HLSFactory; VI) integrating previously published HLS data in stage 3; VII) using HLSFactory to perform HLS tool version regression benchmarking. Code available at https://github.com/sharc-lab/HLSFactory.
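The three stages described in the abstract can be illustrated with a minimal sketch. It is not HLSFactory's API: every name here (`DIRECTIVE_SPACE`, `expand_design_space`, `mock_synthesize`, `run_flow`) is hypothetical, the directive options are invented for illustration, and a toy cost model stands in for a real HLS tool run. It enumerates the full space exhaustively, whereas the abstract also mentions sampling.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical directive options for one base design.
# This is an illustrative format, not the one HLSFactory uses.
DIRECTIVE_SPACE = {
    "unroll_factor": [1, 2, 4],
    "pipeline": [False, True],
    "array_partition_factor": [1, 2],
}


def expand_design_space(base_design, space):
    """Stage 1 (sketch): turn one base design into every directive combination."""
    keys = sorted(space)
    return [
        {"design": base_design, **dict(zip(keys, values))}
        for values in product(*(space[k] for k in keys))
    ]


def mock_synthesize(point):
    """Stage 2 stand-in: a toy cost model replaces the real HLS/FPGA tool run."""
    parallelism = point["unroll_factor"] * point["array_partition_factor"]
    latency = 1024 // parallelism
    if point["pipeline"]:
        latency //= 2
    luts = 100 * parallelism + (50 if point["pipeline"] else 0)
    return {**point, "latency_cycles": latency, "luts": luts}


def run_flow(base_design, space, workers=4):
    """Run all three stages and return a list of standardized records."""
    points = expand_design_space(base_design, space)
    # Stage 2 (sketch): evaluate design points concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(mock_synthesize, points))
    # Stage 3 (sketch): aggregate into one uniformly keyed, sorted dataset.
    return sorted(results, key=lambda r: (r["latency_cycles"], r["luts"]))


dataset = run_flow("gemm", DIRECTIVE_SPACE)
print(len(dataset), "design points from one base design")
```

A thread pool is enough for this sketch because the stand-in work is trivial; a real flow would launch external vendor tools per design point, which is where concurrent execution pays off.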

Original language: English (US)
Title of host publication: MLCAD 2024 - Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400706998
State: Published - Sep 9 2024
Externally published: Yes
Event: 6th ACM/IEEE International Symposium on Machine Learning for CAD, MLCAD 2024 - Snowbird, United States
Duration: Sep 9 2024 → Sep 11 2024

Publication series

Name: MLCAD 2024 - Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD

Conference

Conference: 6th ACM/IEEE International Symposium on Machine Learning for CAD, MLCAD 2024
Country/Territory: United States
City: Snowbird
Period: 9/9/24 → 9/11/24

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design
  • Control and Optimization
  • Modeling and Simulation
