The components of paraphrase evaluations

Philip M. McCarthy, Rebekah H. Guess, Danielle S. McNamara

Research output: Contribution to journal › Article › peer-review

38 Scopus citations


Two sentences are paraphrases if their meanings are equivalent but their words and syntax are different. Paraphrasing can be used to aid comprehension, stimulate prior knowledge, and assist in writing-skills development. As such, paraphrasing is a feature of fields as diverse as discourse psychology, composition, and computer science. Although automated paraphrase assessment is both commonplace and useful, research has centered solely on artificial, edited paraphrases and has used only binary dimensions (i.e., is or is not a paraphrase). In this study, we use an extensive database (N = 1,998) of natural paraphrases generated by high school students that have been assessed along 10 dimensions (e.g., semantic completeness, lexical similarity, syntactical similarity). This study investigates the components of paraphrase quality emerging from these dimensions and examines whether computational approaches can simulate those human evaluations. The results suggest that semantic and syntactic evaluations are the primary components of paraphrase quality, and that computationally light systems such as latent semantic analysis (semantics) and minimal edit distances (syntax) present promising approaches to simulating human evaluations of paraphrases.
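As an illustration of the "minimal edit distance" idea mentioned in the abstract, the following is a minimal sketch of a word-level Levenshtein distance between a sentence and a candidate paraphrase. It is not the authors' implementation: the whitespace tokenization, the lowercasing, the unit costs, and the function names `word_edit_distance` and `normalized_distance` are all assumptions made for this example, and the abstract does not say how the study tokenized or scored its data.

```python
def word_edit_distance(source, paraphrase):
    """Levenshtein distance over word tokens.

    Assumption: tokens are lowercased, whitespace-separated words, and each
    insertion, deletion, or substitution costs 1.
    """
    x = source.lower().split()
    y = paraphrase.lower().split()
    # prev[j] holds the distance between the first i-1 words of x
    # and the first j words of y.
    prev = list(range(len(y) + 1))
    for i, word_x in enumerate(x, start=1):
        curr = [i]
        for j, word_y in enumerate(y, start=1):
            cost = 0 if word_x == word_y else 1
            curr.append(min(prev[j] + 1,          # delete word_x
                            curr[j - 1] + 1,      # insert word_y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(source, paraphrase):
    """Edit distance divided by the longer sentence's word count (0.0 to 1.0)."""
    longest = max(len(source.split()), len(paraphrase.split()))
    if longest == 0:
        return 0.0
    return word_edit_distance(source, paraphrase) / longest
```

A lower normalized distance means the two sentences share more of their word order, which is why a measure of this kind can serve as a rough proxy for syntactic similarity. It says nothing about meaning, so it would need to be paired with a semantic measure such as latent semantic analysis.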

Original language: English (US)
Pages (from-to): 682-690
Number of pages: 9
Journal: Behavior Research Methods
Issue number: 3
State: Published - Aug 2009
Externally published: Yes

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Psychology (miscellaneous)
  • Psychology (all)

