PatchSwap: A Regularization Technique for Vision Transformers

Sachin Chhabra, Hemanth Venkateswara, Baoxin Li

Research output: Contribution to conference › Paper › peer-review

Abstract

Vision Transformers have recently gained popularity due to their superior performance on visual computing tasks. However, this performance rests on training with huge datasets, and maintaining it on small datasets remains a challenge. Regularization helps alleviate the overfitting that is common when dealing with small datasets, but most existing regularization techniques are designed with ConvNets in mind. Because Vision Transformers process images differently, new regularization techniques crafted for them are needed. In this paper, we propose a regularization technique called PatchSwap, which interchanges patches between two images, producing a new input for regularizing the transformer. Our extensive experiments show that PatchSwap outperforms existing state-of-the-art methods. Further, the simplicity of PatchSwap allows a straightforward extension to a semi-supervised setting with minimal effort.
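The abstract describes the core operation as interchanging patches between two images to form a new training input. A minimal NumPy sketch of that patch-interchange idea is given below; the function name, the random-subset selection, and the use of the swapped fraction as a soft-label mixing weight are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def patchswap(img_a, img_b, patch_size, swap_ratio, rng=None):
    """Sketch of a PatchSwap-style input: replace a random subset of
    img_a's patches with the corresponding patches of img_b.

    Returns the mixed image and the fraction of patches actually
    swapped (which could serve as a label-mixing weight). This is an
    illustrative interpretation of the abstract, not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = img_a.shape
    gh, gw = h // patch_size, w // patch_size   # patch grid dimensions
    n_patches = gh * gw
    n_swap = int(round(swap_ratio * n_patches))

    # Choose which patch positions to take from img_b.
    swap_idx = rng.choice(n_patches, size=n_swap, replace=False)

    mixed = img_a.copy()
    for idx in swap_idx:
        row, col = divmod(int(idx), gw)
        ys, xs = row * patch_size, col * patch_size
        mixed[ys:ys + patch_size, xs:xs + patch_size] = \
            img_b[ys:ys + patch_size, xs:xs + patch_size]
    return mixed, n_swap / n_patches
```

For example, with two 32×32 images, 8×8 patches (a 4×4 grid), and a swap ratio of 0.5, the result keeps 8 patches from the first image and takes 8 from the second.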

Original language: English (US)
State: Published - 2022
Event: 33rd British Machine Vision Conference Proceedings, BMVC 2022 - London, United Kingdom
Duration: Nov 21 2022 - Nov 24 2022

Conference

Conference: 33rd British Machine Vision Conference Proceedings, BMVC 2022
Country/Territory: United Kingdom
City: London
Period: 11/21/22 - 11/24/22

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
