
A Comprehensive Survey on Data Augmentation

  • Zaitian Wang
  • Pengfei Wang
  • Kunpeng Liu
  • Pengyang Wang
  • Yanjie Fu
  • Chang Tien Lu
  • Charu C. Aggarwal
  • Jian Pei
  • Yuanchun Zhou

Research output: Contribution to journal, Article, peer-review

Abstract

Data augmentation is a series of techniques that generate high-quality artificial data by manipulating existing data samples. By leveraging data augmentation techniques, AI models can achieve significantly improved applicability in tasks involving scarce or imbalanced datasets, thereby substantially enhancing AI models’ generalization capabilities. Existing literature surveys only focus on a certain type of specific modality data and categorize these methods from modality-specific and operation-centric perspectives, which lacks a consistent summary of data augmentation methods across multiple modalities and limits the comprehension of how existing data samples serve the data augmentation process. To bridge this gap, this survey proposes a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities by investigating how to take advantage of the intrinsic relationship between and within instances. Additionally, it categorizes data augmentation methods across five data modalities through a unified inductive approach.

Original language: English (US)
Pages (from-to): 47-66
Number of pages: 20
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 38
Issue number: 1
DOIs
State: Published - 2026
Externally published: Yes

Keywords

  • Data augmentation
  • data-centric taxonomy
  • multi-modality

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
