Predicting the proficiency level of language learners using lexical indices

Scott A. Crossley, Tom Salsbury, Danielle S. McNamara

Research output: Contribution to journal › Article › peer-review

79 Scopus citations


This study explores how second language (L2) texts written by learners at various proficiency levels can be classified using computational indices that characterize lexical competence. For this study, 100 writing samples taken from 100 L2 learners were analyzed using lexical indices reported by the computational tool Coh-Metrix. The L2 writing samples were categorized into beginning, intermediate, and advanced groupings based on the TOEFL and ACT ESL Compass scores of the writer. A discriminant function analysis was used to predict the level categorization of the texts using lexical indices related to breadth of lexical knowledge (word frequency, lexical diversity), depth of lexical knowledge (hypernymy, polysemy, semantic co-referentiality, and word meaningfulness), and access to core lexical items (word concreteness, familiarity, and imagability). The strongest predictors of an individual's proficiency level were word imagability, word frequency, lexical diversity, and word familiarity. In total, the indices correctly classified 70% of the texts based on proficiency level in both a training and a test set. The authors argue for the applicability of a statistical model as a method to investigate lexical competence across language levels, as a method to assess L2 lexical development, and as a method to classify L2 proficiency.

Original language: English (US)
Pages (from-to): 243-263
Number of pages: 21
Journal: Language Testing
Issue number: 2
State: Published - Apr 2012
Externally published: Yes


  • frequency
  • language proficiency
  • lexical competence
  • lexical diversity
  • second language acquisition
  • word familiarity
  • word imagability

ASJC Scopus subject areas

  • Language and Linguistics
  • Social Sciences (miscellaneous)
  • Linguistics and Language

