TY - GEN
T1 - Automated Paraphrase Quality Assessment Using Recurrent Neural Networks and Language Models
AU - Nicula, Bogdan
AU - Dascalu, Mihai
AU - Newton, Natalie
AU - Orcutt, Ellen
AU - McNamara, Danielle S.
N1 - Funding Information:
Acknowledgments. The work was funded by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS – UEFISCDI, project number TE 70 PN-III-P1-1.1-TE-2019-2209, ATES – “Automated Text Evaluation and Simplification”. This research was also supported in part by the Institute of Education Sciences (R305A190063 and R305A190050) and the Office of Naval Research (N00014-17-1-2300 and N00014-19-1-2424). The opinions expressed are those of the authors and do not represent the views of the IES or ONR.
Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - The ability to automatically assess the quality of paraphrases can be very useful for facilitating literacy skills and providing timely feedback to learners. Our aim is twofold: a) to automatically evaluate the quality of paraphrases across four dimensions: lexical similarity, syntactic similarity, semantic similarity, and paraphrase quality, and b) to assess how well models trained for this task generalize. The task is modeled as a classification problem, and three different methods are explored: a) manual feature extraction combined with an Extra Trees model, b) GloVe embeddings and a Siamese neural network, and c) a pretrained BERT model fine-tuned on our task. Starting from a dataset of 1998 paraphrases from the User Language Paraphrase Corpus (ULPC), we explore how the three models trained on the ULPC dataset generalize when applied to a separate, small paraphrase corpus based on children's inputs. The best out-of-the-box generalization performance is obtained by the Extra Trees model, with average F1-scores of at least 75% for the three similarity dimensions. We also show that the Siamese neural network and BERT models can obtain an improvement of at least 5% after fine-tuning across all dimensions.
AB - The ability to automatically assess the quality of paraphrases can be very useful for facilitating literacy skills and providing timely feedback to learners. Our aim is twofold: a) to automatically evaluate the quality of paraphrases across four dimensions: lexical similarity, syntactic similarity, semantic similarity, and paraphrase quality, and b) to assess how well models trained for this task generalize. The task is modeled as a classification problem, and three different methods are explored: a) manual feature extraction combined with an Extra Trees model, b) GloVe embeddings and a Siamese neural network, and c) a pretrained BERT model fine-tuned on our task. Starting from a dataset of 1998 paraphrases from the User Language Paraphrase Corpus (ULPC), we explore how the three models trained on the ULPC dataset generalize when applied to a separate, small paraphrase corpus based on children's inputs. The best out-of-the-box generalization performance is obtained by the Extra Trees model, with average F1-scores of at least 75% for the three similarity dimensions. We also show that the Siamese neural network and BERT models can obtain an improvement of at least 5% after fine-tuning across all dimensions.
KW - Language models
KW - Natural language processing
KW - Paraphrase quality assessment
KW - Recurrent neural networks
UR - http://www.scopus.com/inward/record.url?scp=85112250426&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112250426&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-80421-3_36
DO - 10.1007/978-3-030-80421-3_36
M3 - Conference contribution
AN - SCOPUS:85112250426
SN - 9783030804206
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 333
EP - 340
BT - Intelligent Tutoring Systems - 17th International Conference, ITS 2021, Proceedings
A2 - Cristea, Alexandra I.
A2 - Troussas, Christos
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th International Conference on Intelligent Tutoring Systems, ITS 2021
Y2 - 7 June 2021 through 11 June 2021
ER -