Predicting math performance using natural language processing tools

Scott Crossley, Ran Liu, Danielle McNamara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Scopus citations


A number of studies have demonstrated links between linguistic knowledge and performance in math. Studies examining these links in first language speakers of English have traditionally relied on correlational analyses between linguistic knowledge tests and standardized math tests. For second language (L2) speakers, the majority of studies have compared math performance between proficient and non-proficient speakers of English. In this study, we take a novel approach and examine the linguistic features of student language while students are engaged in collaborative problem solving within an on-line math tutoring system. We transcribe the students' speech and use natural language processing tools to extract linguistic information related to text cohesion, lexical sophistication, and sentiment. Our criterion variables are individuals' pretest and posttest math performance scores. In addition to examining relations between linguistic features of student language production and math scores, we also control for a number of non-linguistic factors, including gender, age, grade, school, and content focus (procedural versus conceptual). Linear mixed-effects modeling indicates that the non-linguistic factors are not predictive of math scores. However, linguistic features related to cohesion, affect, and lexical proficiency explain approximately 30% of the variance in math scores (R² = .303).
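To make the abstract's feature families concrete, the sketch below computes toy proxies for cohesion (word overlap between adjacent utterances), lexical sophistication (type-token ratio), and sentiment (a tiny word list) from transcribed utterances, plus a helper showing what "R² = .303" means: the proportion of variance in scores explained by a model's predictions. This is a hypothetical illustration, not the NLP tools or model used in the study; the lexicons, function names, and feature definitions are all assumptions. Mixed-effects models also often report their own R² variants, whereas the helper shows only the basic definition, 1 − SS_res / SS_tot.

```python
"""Toy proxies for the feature families named in the abstract (cohesion,
lexical sophistication, sentiment) plus a basic R^2 helper.

Hypothetical illustration only: these are NOT the NLP tools used in the study.
"""
import re
from statistics import mean

# Hypothetical mini-lexicons; real sentiment tools rely on large validated dictionaries.
POSITIVE = {"good", "right", "great", "easy", "yes"}
NEGATIVE = {"wrong", "hard", "confused", "no", "stuck"}


def tokenize(utterance: str) -> list[str]:
    """Lowercase word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", utterance.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Lexical diversity proxy: distinct words / total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def adjacent_overlap(utterances: list[list[str]]) -> float:
    """Cohesion proxy: mean share of each utterance's word types that
    also occur in the preceding utterance."""
    scores = []
    for prev, cur in zip(utterances, utterances[1:]):
        if cur:
            scores.append(len(set(cur) & set(prev)) / len(set(cur)))
    return mean(scores) if scores else 0.0


def sentiment_balance(tokens: list[str]) -> float:
    """(positive count - negative count) normalized by token count."""
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def extract_features(transcript: list[str]) -> dict[str, float]:
    """Compute all three toy features for one student's transcript."""
    utts = [tokenize(u) for u in transcript]
    all_tokens = [t for u in utts for t in u]
    return {
        "ttr": type_token_ratio(all_tokens),
        "overlap": adjacent_overlap(utts),
        "sentiment": sentiment_balance(all_tokens),
    }


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Proportion of variance explained: 1 - SS_res / SS_tot."""
    ybar = mean(observed)
    ss_tot = sum((y - ybar) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    talk = ["I think the slope is two", "yes the slope is two", "now find the intercept"]
    print(extract_features(talk))
    print(r_squared([2, 4, 6], [3, 4, 5]))  # 0.75
```

For the three-utterance example, the type-token ratio is 10/15, the mean adjacent overlap is (0.8 + 0.25) / 2 = 0.525, and the sentiment balance is 1/15. In an actual analysis, per-student features like these would be entered as predictors alongside the non-linguistic controls, with the modeling choices left to the researcher.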

Original language: English (US)
Title of host publication: LAK 2017 Conference Proceedings - 7th International Learning Analytics and Knowledge Conference
Subtitle of host publication: Understanding, Informing and Improving Learning with Data
Publisher: Association for Computing Machinery
Number of pages: 9
ISBN (Electronic): 9781450348706
State: Published, Mar 13 2017
Event: 7th International Conference on Learning Analytics and Knowledge, LAK 2017, Vancouver, Canada
Duration: Mar 13 2017 to Mar 17 2017

Publication series

Name: ACM International Conference Proceeding Series


Other: 7th International Conference on Learning Analytics and Knowledge, LAK 2017

Keywords

  • Educational data mining
  • Natural language processing
  • On-line tutoring systems
  • Predictive analytics
  • Sentiment analysis

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

