TY - GEN
T1 - Transformer-based Automatic Mapping of Clinical Notes to Specific Clinical Concepts
AU - Ganesh, Jay
AU - Bansal, Ajay
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - A significant proportion of medical errors involve crucial medical information, and most stem from misinterpretation of non-standardized clinical notes. This research compares four transformer-based models, namely BERT (Bidirectional Encoder Representations from Transformers) Base Uncased, Emilyalsentzer Bio-ClinicalBERT, RoBERTa (Robustly Optimized BERT Pre-Training Approach), and DeBERTa (Decoding-enhanced BERT with disentangled attention), to determine which of the four is the best backbone model for mapping free text in clinical notes to specific clinical concepts. In addition, the impact of context-specific embeddings on BERT was studied to determine the need for a clinical BERT in Clinical Skills exam scoring. After comparing DeBERTa with the three other transformer models, this research proposes it as the backbone model for patient note scoring in the United States Medical Licensing Examination (USMLE) Clinical Skills exam. The disentangled attention mechanism and enhanced mask decoder integrated into DeBERTa were credited for its high performance. Furthermore, the effect of meta pseudo labeling was investigated, which further enhanced DeBERTa's performance.
AB - A significant proportion of medical errors involve crucial medical information, and most stem from misinterpretation of non-standardized clinical notes. This research compares four transformer-based models, namely BERT (Bidirectional Encoder Representations from Transformers) Base Uncased, Emilyalsentzer Bio-ClinicalBERT, RoBERTa (Robustly Optimized BERT Pre-Training Approach), and DeBERTa (Decoding-enhanced BERT with disentangled attention), to determine which of the four is the best backbone model for mapping free text in clinical notes to specific clinical concepts. In addition, the impact of context-specific embeddings on BERT was studied to determine the need for a clinical BERT in Clinical Skills exam scoring. After comparing DeBERTa with the three other transformer models, this research proposes it as the backbone model for patient note scoring in the United States Medical Licensing Examination (USMLE) Clinical Skills exam. The disentangled attention mechanism and enhanced mask decoder integrated into DeBERTa were credited for its high performance. Furthermore, the effect of meta pseudo labeling was investigated, which further enhanced DeBERTa's performance.
KW - BERT
KW - BERT Base Uncased
KW - DeBERTa
KW - RoBERTa
KW - clinical BERT
KW - meta pseudo labeling
UR - http://www.scopus.com/inward/record.url?scp=85168921695&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168921695&partnerID=8YFLogxK
U2 - 10.1109/COMPSAC57700.2023.00080
DO - 10.1109/COMPSAC57700.2023.00080
M3 - Conference contribution
AN - SCOPUS:85168921695
T3 - Proceedings - International Computer Software and Applications Conference
SP - 558
EP - 563
BT - Proceedings - 2023 IEEE 47th Annual Computers, Software, and Applications Conference, COMPSAC 2023
A2 - Shahriar, Hossain
A2 - Teranishi, Yuuichi
A2 - Cuzzocrea, Alfredo
A2 - Sharmin, Moushumi
A2 - Towey, Dave
A2 - Majumder, AKM Jahangir Alam
A2 - Kashiwazaki, Hiroki
A2 - Yang, Ji-Jiang
A2 - Takemoto, Michiharu
A2 - Sakib, Nazmus
A2 - Banno, Ryohei
A2 - Ahamed, Sheikh Iqbal
PB - IEEE Computer Society
T2 - 47th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2023
Y2 - 26 June 2023 through 30 June 2023
ER -