The user-language paraphrase corpus

Philip M. McCarthy, Danielle S. McNamara

Research output: Chapter in Book/Report/Conference proceeding › Chapter

8 Scopus citations


The corpus in this challenge comprises 1998 target-sentence/student-response text pairs, or protocols. The protocols have been evaluated by extensively trained human raters; however, unlike established paraphrase corpora that evaluate paraphrases as either true or false, the User-Language Paraphrase Corpus evaluates protocols along 10 dimensions of paraphrase characteristics on a six-point scale. Along with the protocols, the database comprising the challenge includes 10 computational indices that have been used to assess these protocols. The challenge posed for researchers is to describe and assess their own approach (computational or statistical) to evaluating, characterizing, and/or categorizing any, some, or all of the paraphrase dimensions in this corpus. The purpose of establishing such evaluations of user-language paraphrases is to allow intelligent tutoring systems (ITSs) to provide users with accurate assessment and, subsequently, facilitative feedback, with assessment comparable to that of one or more trained human raters. Thus, these evaluations will help to develop the field of natural language assessment and understanding (Rus, McCarthy, McNamara, & Graesser, 2008 [a]).

Original language: English (US)
Title of host publication: Cross-Disciplinary Advances in Applied Natural Language Processing
Subtitle of host publication: Issues and Approaches
Publisher: IGI Global
Number of pages: 17
ISBN (Print): 9781613504475
State: Published - 2011
Externally published: Yes

ASJC Scopus subject areas

  • General Computer Science

