Automated Paraphrase Quality Assessment Using Recurrent Neural Networks and Language Models

Bogdan Nicula, Mihai Dascalu, Natalie Newton, Ellen Orcutt, Danielle S. McNamara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

The ability to automatically assess the quality of paraphrases can help build literacy skills and provide timely feedback to learners. Our aim is twofold: a) to automatically evaluate the quality of paraphrases across four dimensions: lexical similarity, syntactic similarity, semantic similarity and paraphrase quality, and b) to assess how well models trained for this task generalize. The task is modeled as a classification problem and three different methods are explored: a) manual feature extraction combined with an Extra Trees model, b) GloVe embeddings and a Siamese neural network, and c) a pretrained BERT model fine-tuned on our task. Starting from a dataset of 1,998 paraphrases from the User Language Paraphrase Corpus (ULPC), we explore how the three models trained on the ULPC dataset generalize when applied to a separate, small paraphrase corpus based on children's inputs. The best out-of-the-box generalization performance is obtained by the Extra Trees model, with average F1-scores of at least 75% for the three similarity dimensions. We also show that the Siamese neural network and BERT models can obtain an improvement of at least 5% across all dimensions after fine-tuning.
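
As a rough illustration of what "manual feature extraction" for the lexical similarity dimension might look like, the sketch below computes two simple hand-crafted features for a source sentence and its paraphrase. The tokenizer, the function names (tokenize, lexical_features) and the two features (Jaccard token overlap and length ratio) are assumptions chosen for illustration; the abstract does not say which features the authors used.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately simple)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lexical_features(source, paraphrase):
    """Return illustrative lexical features for a (source, paraphrase) pair.

    These features are hypothetical examples, not the feature set from the paper.
    """
    src, par = tokenize(source), tokenize(paraphrase)
    src_set, par_set = set(src), set(par)
    union = src_set | par_set

    # Jaccard overlap: shared vocabulary divided by combined vocabulary.
    jaccard = len(src_set & par_set) / len(union) if union else 0.0

    # Length ratio: shorter token count divided by longer token count.
    if src and par:
        length_ratio = min(len(src), len(par)) / max(len(src), len(par))
    else:
        length_ratio = 0.0

    return {"jaccard": jaccard, "length_ratio": length_ratio}


features = lexical_features(
    "The cat sat on the mat.",
    "A cat was sitting on the mat.",
)
print(features)
```

Feature dictionaries of this kind could then be turned into numeric vectors and passed to a tree-based classifier such as Extra Trees, with one classifier (or one output) per rated dimension.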

Original language: English (US)
Title of host publication: Intelligent Tutoring Systems - 17th International Conference, ITS 2021, Proceedings
Editors: Alexandra I. Cristea, Christos Troussas
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 333-340
Number of pages: 8
ISBN (Print): 9783030804206
State: Published - 2021
Event: 17th International Conference on Intelligent Tutoring Systems, ITS 2021 - Virtual, Online
Duration: Jun 7 2021 to Jun 11 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12677 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Intelligent Tutoring Systems, ITS 2021
City: Virtual, Online
Period: 6/7/21 to 6/11/21

Keywords

  • Language models
  • Natural language processing
  • Paraphrase quality assessment
  • Recurrent neural networks

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
