Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs

Endri Taka, Dimitrios Gourounas, Andreas Gerstlauer, Diana Marculescu, Aman Arora

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

FPGAs are a promising platform for accelerating Deep Learning (DL) applications, due to their high performance, low power consumption, and reconfigurability. Recently, the leading FPGA vendors have enhanced their architectures to more efficiently support the computational demands of DL workloads. However, the two most prominent AI-optimized FPGAs, i.e., AMD/Xilinx Versal ACAP and Intel Stratix 10 NX, employ significantly different architectural approaches. This paper presents novel systematic frameworks to optimize the performance of General Matrix Multiplication (GEMM), a fundamental operation in DL workloads, by exploiting the unique and distinct architectural characteristics of each FPGA. Our evaluation on GEMM workloads for int8 precision shows up to 77 and 68 TOPs (int8) throughput, with up to 0.94 and 1.35 TOPs/W energy efficiency for Versal VC1902 and Stratix 10 NX, respectively. This work provides insights and guidelines for optimizing GEMM-based applications on both platforms, while also delving into their programmability trade-offs and associated challenges.
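
For readers less familiar with how int8 GEMM throughput figures of this kind are usually derived, the following is a minimal reference sketch. It is not the paper's framework and does not model either FPGA. It assumes the common convention of counting one multiply and one add per multiply-accumulate, i.e. 2·M·N·K operations for an M×K by K×N product, which the abstract does not state explicitly. The function names (`gemm_int8`, `gemm_ops`, `tera_ops_per_second`) are hypothetical and chosen for illustration only.

```python
def gemm_int8(a, b):
    """Reference GEMM: a is M x K, b is K x N, both holding int8 values.

    Products are accumulated in Python integers, which stand in for the
    wider accumulators that int8 hardware typically uses.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a) or any(len(row) != n for row in b):
        raise ValueError("inner dimensions do not match")
    for row in a + b:
        for value in row:
            if not -128 <= value <= 127:
                raise ValueError("operand outside int8 range")
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def gemm_ops(m, n, k):
    """Operation count, assuming 2 operations (multiply + add) per MAC."""
    return 2 * m * n * k


def tera_ops_per_second(m, n, k, seconds):
    """Throughput in tera-operations per second for one GEMM that took `seconds`."""
    return gemm_ops(m, n, k) / seconds / 1e12


if __name__ == "__main__":
    print(gemm_int8([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(gemm_ops(1024, 1024, 1024))
```

Dividing a throughput figure obtained this way by the measured power in watts gives an energy-efficiency figure in the same style as the TOPs/W values quoted in the abstract.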

Original language: English (US)
Title of host publication: Proceedings - 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 54-65
Number of pages: 12
ISBN (Electronic): 9798350372434
State: Published - 2024
Externally published: Yes
Event: 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2024 - Orlando, United States
Duration: May 5, 2024 to May 8, 2024

Publication series

Name: Proceedings - 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2024

Conference

Conference: 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2024
Country/Territory: United States
City: Orlando
Period: 5/5/24 to 5/8/24

Keywords

  • ACAP
  • AI Engine
  • AI Tensor Blocks
  • Deep Learning
  • FPGA
  • GEMM
  • Hardware Acceleration
  • Stratix
  • Versal

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software
  • Control and Optimization
