Natural language processing in an intelligent writing strategy tutoring system

Danielle McNamara; Scott A. Crossley; Rod Roscoe

doi:10.3758/s13428-012-0258-1

Research output: Contribution to journal › Article › peer-review

120 Scopus citations

Abstract

The Writing Pal is an intelligent tutoring system that provides writing strategy training. A large part of its artificial intelligence resides in the natural language processing algorithms to assess essay quality and guide feedback to students. Because writing is often highly nuanced and subjective, the development of these algorithms must consider a broad array of linguistic, rhetorical, and contextual features. This study assesses the potential for computational indices to predict human ratings of essay quality. Past studies have demonstrated that linguistic indices related to lexical diversity, word frequency, and syntactic complexity are significant predictors of human judgments of essay quality but that indices of cohesion are not. The present study extends prior work by including a larger data sample and an expanded set of indices to assess new lexical, syntactic, cohesion, rhetorical, and reading ease indices. Three models were assessed. The model reported by McNamara, Crossley, and McCarthy (Written Communication 27:57-86, 2010) including three indices of lexical diversity, word frequency, and syntactic complexity accounted for only 6 % of the variance in the larger data set. A regression model including the full set of indices examined in prior studies of writing predicted 38 % of the variance in human scores of essay quality with 91 % adjacent accuracy (i.e., within 1 point). A regression model that also included new indices related to rhetoric and cohesion predicted 44 % of the variance with 94 % adjacent accuracy. The new indices increased accuracy but, more importantly, afford the means to provide more meaningful feedback in the context of a writing tutoring system.
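The abstract reports two evaluation figures for each regression model: the share of variance in human scores that the model predicts, and "adjacent accuracy" (predictions within 1 point of the human score). The sketch below is one possible way to compute both. It is not the authors' code. The function names, the rounding of predictions to whole scores, and the use of the coefficient of determination as the variance measure are all assumptions made for illustration.

```python
from typing import Sequence


def adjacent_accuracy(predicted: Sequence[float],
                      human: Sequence[int],
                      tolerance: int = 1) -> float:
    """Fraction of essays whose rounded predicted score falls within
    `tolerance` points of the human score (hypothetical helper)."""
    if len(predicted) != len(human) or len(human) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    # Assumption: predictions are rounded to whole scores before comparison.
    # Note that Python's round() rounds exact halves to the nearest even integer.
    hits = sum(abs(round(p) - h) <= tolerance
               for p, h in zip(predicted, human))
    return hits / len(human)


def variance_explained(predicted: Sequence[float],
                       human: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (hypothetical helper).
    Assumption: this stands in for the 'variance predicted' in the abstract."""
    if len(predicted) != len(human) or len(human) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    mean_human = sum(human) / len(human)
    ss_tot = sum((h - mean_human) ** 2 for h in human)
    if ss_tot == 0:
        raise ValueError("human scores have zero variance")
    ss_res = sum((h - p) ** 2 for h, p in zip(human, predicted))
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the study.
    model_scores = [3.2, 4.6, 1.1, 5.9]
    rater_scores = [3, 6, 3, 6]
    print(adjacent_accuracy(model_scores, rater_scores))
    print(variance_explained(model_scores, rater_scores))
```

With a tolerance of 1, a predicted score of 4.6 against a human score of 6 counts as a hit (it rounds to 5), while 1.1 against 3 does not. Adjacent accuracy can therefore be high even when the variance explained is modest, which is consistent with the gap between the two figures reported in the abstract.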

Original language	English (US)
Pages (from-to)	499-515
Number of pages	17
Journal	Behavior Research Methods
Volume	45
Issue number	2
DOIs	https://doi.org/10.3758/s13428-012-0258-1
State	Published - Jun 2013

Keywords

Automated essay scoring
Computational linguistics
Corpus linguistics
Intelligent tutoring systems
Natural language processing
Writing pedagogy

ASJC Scopus subject areas

Experimental and Cognitive Psychology
Developmental and Educational Psychology
Arts and Humanities (miscellaneous)
Psychology (miscellaneous)
General Psychology

Cite this

@article{d13573887f53478888bae7f0d42a406d,

title = "Natural language processing in an intelligent writing strategy tutoring system",

abstract = "The Writing Pal is an intelligent tutoring system that provides writing strategy training. A large part of its artificial intelligence resides in the natural language processing algorithms to assess essay quality and guide feedback to students. Because writing is often highly nuanced and subjective, the development of these algorithms must consider a broad array of linguistic, rhetorical, and contextual features. This study assesses the potential for computational indices to predict human ratings of essay quality. Past studies have demonstrated that linguistic indices related to lexical diversity, word frequency, and syntactic complexity are significant predictors of human judgments of essay quality but that indices of cohesion are not. The present study extends prior work by including a larger data sample and an expanded set of indices to assess new lexical, syntactic, cohesion, rhetorical, and reading ease indices. Three models were assessed. The model reported by McNamara, Crossley, and McCarthy (Written Communication 27:57-86, 2010) including three indices of lexical diversity, word frequency, and syntactic complexity accounted for only 6 \% of the variance in the larger data set. A regression model including the full set of indices examined in prior studies of writing predicted 38 \% of the variance in human scores of essay quality with 91 \% adjacent accuracy (i.e., within 1 point). A regression model that also included new indices related to rhetoric and cohesion predicted 44 \% of the variance with 94 \% adjacent accuracy. The new indices increased accuracy but, more importantly, afford the means to provide more meaningful feedback in the context of a writing tutoring system.",

keywords = "Automated essay scoring, Computational linguistics, Corpus linguistics, Intelligent tutoring systems, Natural language processing, Writing pedagogy",

author = "Danielle McNamara and Crossley, {Scott A.} and Rod Roscoe",

note = "Funding Information: This research was supported in part by the Institute for Education Sciences (IES R305A080589 and IES R305G20018-02). Ideas expressed in this material are those of the authors and do not necessarily reflect the views of the IES. We are thankful to the members of the Writing Pal project who have contributed feedback to various aspects of this study and other studies that have led to this study. We are particularly thankful to Zhiqiang Cai and Art Graesser. We also thank Russell Brandon, Laura Varner, and Jen Weston, as well as Brad Campbell, Daniel White, Steve Chrestman, Michael Kardos, Becky Hagenston, LaToya Bogards, Ashley Leonard, and Marty Price, who scored the essays in this study.",

year = "2013",

month = jun,

doi = "10.3758/s13428-012-0258-1",

language = "English (US)",

volume = "45",

pages = "499--515",

journal = "Behavior Research Methods",

issn = "1554-351X",

publisher = "Springer New York",

number = "2",

}

TY - JOUR

T1 - Natural language processing in an intelligent writing strategy tutoring system

AU - McNamara, Danielle

AU - Crossley, Scott A.

AU - Roscoe, Rod

N1 - Funding Information: This research was supported in part by the Institute for Education Sciences (IES R305A080589 and IES R305G20018-02). Ideas expressed in this material are those of the authors and do not necessarily reflect the views of the IES. We are thankful to the members of the Writing Pal project who have contributed feedback to various aspects of this study and other studies that have led to this study. We are particularly thankful to Zhiqiang Cai and Art Graesser. We also thank Russell Brandon, Laura Varner, and Jen Weston, as well as Brad Campbell, Daniel White, Steve Chrestman, Michael Kardos, Becky Hagenston, LaToya Bogards, Ashley Leonard, and Marty Price, who scored the essays in this study.

PY - 2013/6

Y1 - 2013/6

AB - The Writing Pal is an intelligent tutoring system that provides writing strategy training. A large part of its artificial intelligence resides in the natural language processing algorithms to assess essay quality and guide feedback to students. Because writing is often highly nuanced and subjective, the development of these algorithms must consider a broad array of linguistic, rhetorical, and contextual features. This study assesses the potential for computational indices to predict human ratings of essay quality. Past studies have demonstrated that linguistic indices related to lexical diversity, word frequency, and syntactic complexity are significant predictors of human judgments of essay quality but that indices of cohesion are not. The present study extends prior work by including a larger data sample and an expanded set of indices to assess new lexical, syntactic, cohesion, rhetorical, and reading ease indices. Three models were assessed. The model reported by McNamara, Crossley, and McCarthy (Written Communication 27:57-86, 2010) including three indices of lexical diversity, word frequency, and syntactic complexity accounted for only 6 % of the variance in the larger data set. A regression model including the full set of indices examined in prior studies of writing predicted 38 % of the variance in human scores of essay quality with 91 % adjacent accuracy (i.e., within 1 point). A regression model that also included new indices related to rhetoric and cohesion predicted 44 % of the variance with 94 % adjacent accuracy. The new indices increased accuracy but, more importantly, afford the means to provide more meaningful feedback in the context of a writing tutoring system.

KW - Automated essay scoring

KW - Computational linguistics

KW - Corpus linguistics

KW - Intelligent tutoring systems

KW - Natural language processing

KW - Writing pedagogy

UR - http://www.scopus.com/inward/record.url?scp=84878218112&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878218112&partnerID=8YFLogxK

U2 - 10.3758/s13428-012-0258-1

DO - 10.3758/s13428-012-0258-1

M3 - Article

C2 - 23055164

AN - SCOPUS:84878218112

SN - 1554-351X

VL - 45

SP - 499

EP - 515

JO - Behavior Research Methods

JF - Behavior Research Methods

IS - 2

ER -
