TY - GEN
T1 - Choose Your QA Model Wisely
T2 - 1st Workshop on Semiparametric Methods in NLP: Decoupling Logic from Knowledge, Spa-NLP 2022
AU - Luo, Man
AU - Hashimoto, Kazuma
AU - Yavuz, Semih
AU - Liu, Zhiwei
AU - Baral, Chitta
AU - Zhou, Yingbo
N1 - Publisher Copyright: © 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - While both extractive and generative readers have been successfully applied to the Question Answering (QA) task, little attention has been paid toward the systematic comparison of them. Characterizing the strengths and weaknesses of the two readers is crucial not only for making a more informed reader selection in practice but also for developing a deeper understanding to foster further research on improving readers in a principled manner. Motivated by this goal, we make the first attempt to systematically study the comparison of extractive and generative readers for question answering. To be aligned with the state-of-the-art, we explore nine transformer-based large pre-trained language models (PrLMs) as backbone architectures. Furthermore, we organize our findings under two main categories: (1) keeping the architecture invariant, and (2) varying the underlying PrLMs. Among several interesting findings, it is important to highlight that (1) the generative readers perform better in long context QA, (2) the extractive readers perform better in short context while also showing better out-of-domain generalization, and (3) the encoder of encoder-decoder PrLMs (e.g., T5) turns out to be a strong extractive reader and outperforms the standard choice of encoder-only PrLMs (e.g., RoBERTa). We also study the effect of multi-task learning on the two types of readers varying the underlying PrLMs and perform qualitative and quantitative diagnosis to provide further insights into future directions in modeling better readers.
UR - http://www.scopus.com/inward/record.url?scp=85134654601&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134654601&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85134654601
T3 - Spa-NLP 2022 - 1st Workshop on Semiparametric Methods in NLP: Decoupling Logic from Knowledge, Proceedings of the Workshop
SP - 7
EP - 22
BT - Spa-NLP 2022 - 1st Workshop on Semiparametric Methods in NLP
A2 - Das, Rajarshi
A2 - Lewis, Patrick
A2 - Min, Sewon
A2 - Thai, June
A2 - Zaheer, Manzil
PB - Association for Computational Linguistics (ACL)
Y2 - 27 May 2022
ER -