Benchmarking and Boosting Transformers for Medical Image Classification

Dong Ao Ma; Mohammad Reza Hosseinzadeh Taher; Jiaxuan Pang; Nahid Ui Islam; Fatemeh Haghighi; Michael B. Gotway; Jianming Liang

doi:10.1007/978-3-031-16852-9_2

College of Health Solutions (CHS)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Visual transformers have recently gained popularity in the computer vision community as they began to outrank convolutional neural networks (CNNs) in one representative visual benchmark after another. However, the competition between visual transformers and CNNs in medical imaging is rarely studied, leaving many important questions unanswered. As a first step, we benchmark how well existing transformer variants that use various (supervised and self-supervised) pre-training methods perform against CNNs on a variety of medical classification tasks. Furthermore, given the data-hungry nature of transformers and the annotation-deficiency challenge of medical imaging, we present a practical approach for bridging the domain gap between photographic and medical images by utilizing unlabeled large-scale in-domain data. Our extensive empirical evaluations reveal the following insights in medical imaging: (1) good initialization is more crucial for transformer-based models than for CNNs, (2) self-supervised learning based on masked image modeling captures more generalizable representations than supervised models, and (3) assembling a larger-scale domain-specific dataset can better bridge the domain gap between photographic and medical images via self-supervised continuous pre-training. We hope this benchmark study can direct future research on applying transformers to medical image analysis. All code and pre-trained models are available on our GitHub page: https://github.com/JLiangLab/BenchmarkTransformers.
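
The abstract names masked image modeling as the self-supervised objective but gives no implementation details. As a rough illustration only, the sketch below shows one common formulation: hide a random subset of image patches and score the reconstruction with an L1 loss computed over the hidden patches alone. The function names, the mask ratio, and the choice of L1 loss are assumptions made for this example; this is not the authors' code, which is available in their repository.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=0):
    """Return a list of 0/1 flags; 1 marks a patch hidden from the encoder.

    Hypothetical helper for illustration; the ratio and seed are arbitrary.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [1 if i in masked else 0 for i in range(num_patches)]


def masked_l1_loss(target, prediction, mask):
    """Mean absolute error over the pixels of masked patches only.

    target, prediction: lists of patches, each patch a list of floats.
    mask: list of 0/1 flags, one per patch.
    """
    total, count = 0.0, 0
    for t_patch, p_patch, flag in zip(target, prediction, mask):
        if flag:
            total += sum(abs(t - p) for t, p in zip(t_patch, p_patch))
            count += len(t_patch)
    # With nothing masked there is nothing to reconstruct.
    return total / count if count else 0.0


# Toy usage: 16 patches of 4 pixels each, half of them hidden.
mask = random_patch_mask(num_patches=16, mask_ratio=0.5)
target = [[float(i)] * 4 for i in range(16)]
prediction = [[0.0] * 4 for _ in range(16)]
loss = masked_l1_loss(target, prediction, mask)
```

Because the loss ignores visible patches, the model is rewarded only for inferring hidden content from context, which requires no labels. That is what makes this kind of objective usable on large unlabeled in-domain collections.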

Original language	English (US)
Title of host publication	Domain Adaptation and Representation Transfer - 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Proceedings
Editors	Konstantinos Kamnitsas, Lisa Koch, Mobarakol Islam, Ziyue Xu, Jorge Cardoso, Qi Dou, Nicola Rieke, Sotirios Tsaftaris
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	12-22
Number of pages	11
ISBN (Print)	9783031168512
DOIs	https://doi.org/10.1007/978-3-031-16852-9_2
State	Published - 2022
Event	4th MICCAI Workshop on Domain Adaptation and Representation Transfer, DART 2022, held in conjunction with the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2022 - Singapore, Singapore
Duration	Sep 22 2022 → Sep 22 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13542 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	4th MICCAI Workshop on Domain Adaptation and Representation Transfer, DART 2022, held in conjunction with the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2022
Country/Territory	Singapore
City	Singapore
Period	9/22/22 → 9/22/22

Keywords

Benchmarking
Domain-adaptive pre-training
Transfer learning
Vision Transformer

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

Ma, D. A., Hosseinzadeh Taher, M. R., Pang, J., Islam, N. U., Haghighi, F., Gotway, M. B., & Liang, J. (2022). Benchmarking and Boosting Transformers for Medical Image Classification. In K. Kamnitsas, L. Koch, M. Islam, Z. Xu, J. Cardoso, Q. Dou, N. Rieke, & S. Tsaftaris (Eds.), Domain Adaptation and Representation Transfer - 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Proceedings (pp. 12-22). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13542 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16852-9_2

Benchmarking and Boosting Transformers for Medical Image Classification. / Ma, Dong Ao; Hosseinzadeh Taher, Mohammad Reza; Pang, Jiaxuan et al.
Domain Adaptation and Representation Transfer - 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Proceedings. ed. / Konstantinos Kamnitsas; Lisa Koch; Mobarakol Islam; Ziyue Xu; Jorge Cardoso; Qi Dou; Nicola Rieke; Sotirios Tsaftaris. Springer Science and Business Media Deutschland GmbH, 2022. p. 12-22 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13542 LNCS).

Ma, DA, Hosseinzadeh Taher, MR, Pang, J, Islam, NU, Haghighi, F, Gotway, MB & Liang, J 2022, Benchmarking and Boosting Transformers for Medical Image Classification. in K Kamnitsas, L Koch, M Islam, Z Xu, J Cardoso, Q Dou, N Rieke & S Tsaftaris (eds), Domain Adaptation and Representation Transfer - 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13542 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 12-22, 4th MICCAI Workshop on Domain Adaptation and Representation Transfer, DART 2022, held in conjunction with the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2022, Singapore, Singapore, 9/22/22. https://doi.org/10.1007/978-3-031-16852-9_2

Ma DA, Hosseinzadeh Taher MR, Pang J, Islam NU, Haghighi F, Gotway MB et al. Benchmarking and Boosting Transformers for Medical Image Classification. In Kamnitsas K, Koch L, Islam M, Xu Z, Cardoso J, Dou Q, Rieke N, Tsaftaris S, editors, Domain Adaptation and Representation Transfer - 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 12-22. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-16852-9_2

Ma, Dong Ao ; Hosseinzadeh Taher, Mohammad Reza ; Pang, Jiaxuan et al. / Benchmarking and Boosting Transformers for Medical Image Classification. Domain Adaptation and Representation Transfer - 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Proceedings. editor / Konstantinos Kamnitsas ; Lisa Koch ; Mobarakol Islam ; Ziyue Xu ; Jorge Cardoso ; Qi Dou ; Nicola Rieke ; Sotirios Tsaftaris. Springer Science and Business Media Deutschland GmbH, 2022. pp. 12-22 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{6387ffc9249f4a48ab8dc3467663c720,

title = "Benchmarking and Boosting Transformers for Medical Image Classification",

abstract = "Visual transformers have recently gained popularity in the computer vision community as they began to outrank convolutional neural networks (CNNs) in one representative visual benchmark after another. However, the competition between visual transformers and CNNs in medical imaging is rarely studied, leaving many important questions unanswered. As the first step, we benchmark how well existing transformer variants that use various (supervised and self-supervised) pre-training methods perform against CNNs on a variety of medical classification tasks. Furthermore, given the data-hungry nature of transformers and the annotation-deficiency challenge of medical imaging, we present a practical approach for bridging the domain gap between photographic and medical images by utilizing unlabeled large-scale in-domain data. Our extensive empirical evaluations reveal the following insights in medical imaging: (1) good initialization is more crucial for transformer-based models than for CNNs, (2) self-supervised learning based on masked image modeling captures more generalizable representations than supervised models, and (3) assembling a larger-scale domain-specific dataset can better bridge the domain gap between photographic and medical images via self-supervised continuous pre-training. We hope this benchmark study can direct future research on applying transformers to medical imaging analysis. All codes and pre-trained models are available on our GitHub page https://github.com/JLiangLab/BenchmarkTransformers.",

keywords = "Benchmarking, Domain-adaptive pre-training, Transfer learning, Vision Transformer",

author = "Ma, {Dong Ao} and {Hosseinzadeh Taher}, {Mohammad Reza} and Jiaxuan Pang and Islam, {Nahid Ui} and Fatemeh Haghighi and Gotway, {Michael B.} and Jianming Liang",

note = "Funding Information: Acknowledgments. This research has been supported in part by ASU and Mayo Clinic through a Seed Grant and an Innovation Grant, and in part by the NIH under Award Number R01HL128785. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This work has utilized the GPUs provided in part by the ASU Research Computing and in part by the Extreme Science and Engineering Discovery Environment (XSEDE) funded by the National Science Foundation (NSF) under grant numbers: ACI-1548562, ACI-1928147, and ACI-2005632. We thank Manas Chetan Valia and Haozhe Luo for evaluating the pre-trained ResNet50 models on the five chest X-ray tasks and the pre-trained transformer models on the VinDr-CXR target tasks, respectively. The content of this paper is covered by patents pending. Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 4th MICCAI Workshop on Domain Adaptation and Representation Transfer, DART 2022, held in conjunction with the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2022 ; Conference date: 22-09-2022 Through 22-09-2022",

year = "2022",

doi = "10.1007/978-3-031-16852-9_2",

language = "English (US)",

isbn = "9783031168512",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "12--22",

editor = "Konstantinos Kamnitsas and Lisa Koch and Mobarakol Islam and Ziyue Xu and Jorge Cardoso and Qi Dou and Nicola Rieke and Sotirios Tsaftaris",

booktitle = "Domain Adaptation and Representation Transfer - 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Proceedings",

address = "Germany",

}

TY - GEN

T1 - Benchmarking and Boosting Transformers for Medical Image Classification

AU - Ma, Dong Ao

AU - Hosseinzadeh Taher, Mohammad Reza

AU - Pang, Jiaxuan

AU - Islam, Nahid Ui

AU - Haghighi, Fatemeh

AU - Gotway, Michael B.

AU - Liang, Jianming

N1 - Funding Information: Acknowledgments. This research has been supported in part by ASU and Mayo Clinic through a Seed Grant and an Innovation Grant, and in part by the NIH under Award Number R01HL128785. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This work has utilized the GPUs provided in part by the ASU Research Computing and in part by the Extreme Science and Engineering Discovery Environment (XSEDE) funded by the National Science Foundation (NSF) under grant numbers: ACI-1548562, ACI-1928147, and ACI-2005632. We thank Manas Chetan Valia and Haozhe Luo for evaluating the pre-trained ResNet50 models on the five chest X-ray tasks and the pre-trained transformer models on the VinDr-CXR target tasks, respectively. The content of this paper is covered by patents pending. Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2022

Y1 - 2022

AB - Visual transformers have recently gained popularity in the computer vision community as they began to outrank convolutional neural networks (CNNs) in one representative visual benchmark after another. However, the competition between visual transformers and CNNs in medical imaging is rarely studied, leaving many important questions unanswered. As the first step, we benchmark how well existing transformer variants that use various (supervised and self-supervised) pre-training methods perform against CNNs on a variety of medical classification tasks. Furthermore, given the data-hungry nature of transformers and the annotation-deficiency challenge of medical imaging, we present a practical approach for bridging the domain gap between photographic and medical images by utilizing unlabeled large-scale in-domain data. Our extensive empirical evaluations reveal the following insights in medical imaging: (1) good initialization is more crucial for transformer-based models than for CNNs, (2) self-supervised learning based on masked image modeling captures more generalizable representations than supervised models, and (3) assembling a larger-scale domain-specific dataset can better bridge the domain gap between photographic and medical images via self-supervised continuous pre-training. We hope this benchmark study can direct future research on applying transformers to medical imaging analysis. All codes and pre-trained models are available on our GitHub page https://github.com/JLiangLab/BenchmarkTransformers.

KW - Benchmarking

KW - Domain-adaptive pre-training

KW - Transfer learning

KW - Vision Transformer

UR - http://www.scopus.com/inward/record.url?scp=85140442747&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85140442747&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-16852-9_2

DO - 10.1007/978-3-031-16852-9_2

M3 - Conference contribution

AN - SCOPUS:85140442747

SN - 9783031168512

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 12

EP - 22

BT - Domain Adaptation and Representation Transfer - 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Proceedings

A2 - Kamnitsas, Konstantinos

A2 - Koch, Lisa

A2 - Islam, Mobarakol

A2 - Xu, Ziyue

A2 - Cardoso, Jorge

A2 - Dou, Qi

A2 - Rieke, Nicola

A2 - Tsaftaris, Sotirios

PB - Springer Science and Business Media Deutschland GmbH

T2 - 4th MICCAI Workshop on Domain Adaptation and Representation Transfer, DART 2022, held in conjunction with the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2022

Y2 - 22 September 2022 through 22 September 2022

ER -