Multilingual Age of Exposure

Robert Mihai Botarleanu, Mihai Dascalu, Micah Watanabe, Danielle S. McNamara, Scott Andrew Crossley

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

The ability to objectively quantify the complexity of a text can be a useful indicator of how likely learners of a given level are to comprehend it. Words are the basic building blocks of a text, so before more complex models of text difficulty are created, it should be recognized that a text’s overall difficulty is greatly influenced by the complexity of its underlying words. One approach is to measure a word’s Age of Acquisition (AoA), an estimate of the average age at which a speaker of a language understands the semantics of a specific word. Age of Exposure (AoE) statistically models the process of word learning and, in turn, yields an estimate of a given word’s AoA. In this paper, we expand on the AoE model by training regression models that learn and generalize AoA word lists across multiple languages, including English, German, French, and Spanish. Our approach allows for the estimation of AoA scores for words that are not found in the original lists, up to the majority of the target language’s vocabulary. Our method can be uniformly applied across multiple languages through the usage of parallel corpora and helps bridge the gap in the size of AoA word lists available for non-English languages. This work is particularly important for efforts toward extending AI to languages with fewer resources and benchmarked corpora.
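As a rough illustration of the idea of generalizing an AoA list to unlisted words, the sketch below fits a one-feature least-squares line on words that have an AoA rating and applies it to a word that does not. Everything in it is an assumption made for illustration: the feature values, the AoA ratings, and the names `fit_line`, `features`, `aoa_list`, and `estimate_aoa` are invented, and a single-feature linear fit merely stands in for the regression models the abstract mentions.

```python
# Toy illustration only: the feature values and AoA ratings are invented,
# and a one-feature least-squares fit stands in for the paper's regression models.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical per-word feature, available for every word in the vocabulary.
features = {"dog": 1.0, "house": 2.0, "justice": 6.0, "entropy": 9.0, "river": 3.0}
# Hypothetical AoA list that covers only part of the vocabulary.
aoa_list = {"dog": 3.0, "house": 4.0, "justice": 8.0, "entropy": 11.0}

# Fit only on words that appear in the AoA list.
listed = sorted(aoa_list)
slope, intercept = fit_line(
    [features[w] for w in listed],
    [aoa_list[w] for w in listed],
)

def estimate_aoa(word):
    """Estimate AoA for any word with a feature value, listed or not."""
    return slope * features[word] + intercept

# "river" is absent from aoa_list, so its score is an extrapolation from the fit.
print(round(estimate_aoa("river"), 2))
```

The same pattern would be repeated per language, with each language contributing its own AoA list and its own feature values.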

Original language: English (US)
Title of host publication: Artificial Intelligence in Education - 22nd International Conference, AIED 2021, Proceedings
Editors: Ido Roll, Danielle McNamara, Sergey Sosnovsky, Rose Luckin, Vania Dimitrova
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 77-87
Number of pages: 11
ISBN (Print): 9783030782917
DOIs
State: Published - 2021
Event: 22nd International Conference on Artificial Intelligence in Education, AIED 2021 - Virtual, Online
Duration: Jun 14 2021 to Jun 18 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12748 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Artificial Intelligence in Education, AIED 2021
City: Virtual, Online
Period: 6/14/21 to 6/18/21

Keywords

  • Age of acquisition
  • Age of exposure
  • Multilingual
  • Natural language processing

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
