Exploiting Parallelism Opportunities with Deep Learning Frameworks

Yu Emma Wang, Carole Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance-profiling effort and often relies on domain-specific knowledge. This article takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedup. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.30× and 1.38×, respectively.

Original language: English (US)
Article number: 9
Journal: ACM Transactions on Architecture and Code Optimization
Issue number: 1
State: Published - Jan 2021
Externally published: Yes


Keywords

  • Machine learning frameworks
  • parallel computing
  • performance analysis

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

