TY - GEN
T1 - Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction
AU - Zhang, Pengfei
AU - Bang, Seojin
AU - Lee, Heewook
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - T cell receptors (TCRs) are critical components of adaptive immune systems, responsible for responding to threats by recognizing epitope sequences presented on the host cell surface. Computational prediction of binding affinity between TCRs and epitope sequences using machine/deep learning has attracted intense attention recently. However, its success is hindered by the lack of large collections of annotated TCR-epitope pairs. Annotating their binding affinity requires expensive and time-consuming wet-lab evaluation. To reduce annotation cost, we present ActiveTCR, a framework that incorporates active learning and TCR-epitope binding affinity prediction models. Starting with a small set of labeled training pairs, ActiveTCR iteratively searches for unlabeled TCR-epitope pairs that are "worthy" for annotation. It aims to maximize performance gains while minimizing the cost of annotation. We compared four query strategies with a random sampling baseline and demonstrated that ActiveTCR reduces annotation costs by approximately 40%. Furthermore, we showed that providing ground truth labels of TCR-epitope pairs to query strategies can help identify and reduce more than 40% redundancy among already annotated pairs without compromising model performance, enabling users to train equally powerful prediction models with less training data. Our work is the first systematic investigation of data optimization for TCR-epitope binding affinity prediction.
AB - T cell receptors (TCRs) are critical components of adaptive immune systems, responsible for responding to threats by recognizing epitope sequences presented on the host cell surface. Computational prediction of binding affinity between TCRs and epitope sequences using machine/deep learning has attracted intense attention recently. However, its success is hindered by the lack of large collections of annotated TCR-epitope pairs. Annotating their binding affinity requires expensive and time-consuming wet-lab evaluation. To reduce annotation cost, we present ActiveTCR, a framework that incorporates active learning and TCR-epitope binding affinity prediction models. Starting with a small set of labeled training pairs, ActiveTCR iteratively searches for unlabeled TCR-epitope pairs that are "worthy" for annotation. It aims to maximize performance gains while minimizing the cost of annotation. We compared four query strategies with a random sampling baseline and demonstrated that ActiveTCR reduces annotation costs by approximately 40%. Furthermore, we showed that providing ground truth labels of TCR-epitope pairs to query strategies can help identify and reduce more than 40% redundancy among already annotated pairs without compromising model performance, enabling users to train equally powerful prediction models with less training data. Our work is the first systematic investigation of data optimization for TCR-epitope binding affinity prediction.
KW - Active Learning
KW - TCR-epitope Binding Affinity
UR - http://www.scopus.com/inward/record.url?scp=85184878590&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184878590&partnerID=8YFLogxK
U2 - 10.1109/BIBM58861.2023.10385683
DO - 10.1109/BIBM58861.2023.10385683
M3 - Conference contribution
AN - SCOPUS:85184878590
T3 - Proceedings - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023
SP - 988
EP - 993
BT - Proceedings - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023
A2 - Jiang, Xingpeng
A2 - Wang, Haiying
A2 - Alhajj, Reda
A2 - Hu, Xiaohua
A2 - Engel, Felix
A2 - Mahmud, Mufti
A2 - Pisanti, Nadia
A2 - Cui, Xuefeng
A2 - Song, Hong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023
Y2 - 5 December 2023 through 8 December 2023
ER -