FSpGEMM: A Framework for Accelerating Sparse General Matrix-Matrix Multiplication Using Gustavson's Algorithm on FPGAs

Erfan Bank Tavakoli, Michael Riera, Masudul Hassan Quraishi, Fengbo Ren

Research output: Contribution to journalArticlepeer-review

Abstract

General sparse matrix-matrix multiplication (SpGEMM) is integral to many high-performance computing (HPC) and machine learning applications. However, prior field-programmable gate array (FPGA)-based SpGEMM accelerators either use the inner product algorithm with wasted and costly operations or Gustavson's algorithm with a cache-based hardware architecture suffering from long-latency cache miss penalties and limited to embedded devices. In this work, we propose framework for accelerating SpGEMM (FSpGEMM), an OpenCL-based SpGEMM framework for accelerating Gustvason's algorithm that includes an FPGA kernel implementing a throughput-optimized and scalable hardware architecture compatible with high-bandwidth memory (HBM) or traditional DDR-based memory. In addition, to address the irregular memory access patterns incurred by Gustavson's algorithm, we propose a new buffering scheme tailored to Gustavson's algorithm enabled by a new compressed sparse vector (CSV) format for representing sparse matrices and a row reordering technique as a preprocessing step to improve data reuse, and consequently, resource utilization. The proposed framework includes a host program implementing preprocessing functions for reordering input matrices and storing them in the proposed CSV format for further use. We implemented FSpGEMM using Intel FPGA SDK for OpenCL and experimented with a benchmark of sparse matrices selected from the SuiteSparse Matrix Collection on a Bittware 520N-MX FPGA board. The results show that the reordering technique improves the performance on average by 20.3% compared with the baseline. Finally, FSpGEMM outperforms the state-of-the-art (SOTA) FPGA implementation by an average of 2.23× in terms of execution cycles with the same benchmark and memory system configuration for a fair comparison.

Original languageEnglish (US)
Pages (from-to)633-644
Number of pages12
JournalIEEE Transactions on Very Large Scale Integration (VLSI) Systems
Volume32
Issue number4
DOIs
StatePublished - Apr 1 2024

Keywords

  • Field-programmable gate array (FPGA)
  • Gustavson's algorithm
  • OpenCL
  • general sparse matrix-matrix multiplication (SpGEMM)
  • reconfigurable computing

ASJC Scopus subject areas

  • Software
  • Electrical and Electronic Engineering
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'FSpGEMM: A Framework for Accelerating Sparse General Matrix-Matrix Multiplication Using Gustavson's Algorithm on FPGAs'. Together they form a unique fingerprint.

Cite this