Multilingual Age of Exposure 2.0

Robert Mihai Botarleanu, Micah Watanabe, Mihai Dascalu, Scott A. Crossley, Danielle S. McNamara

Research output: Contribution to journal, Article, peer-review

Abstract

Age of Acquisition (AoA) scores approximate the age at which a language speaker fully understands a word’s semantic meaning and provide a quantitative measure of the relative difficulty of words in a language. AoA word lists exist for various languages, with English having the most complete lists, which cover the largest percentage of the vocabulary. In contrast, other languages have smaller lists, making large-scale analyses difficult. Given the usefulness of AoA scores, methods have been developed that use Machine Learning models to estimate them automatically, through Age of Exposure (AoE) scores, for the entire vocabulary of a language. These generated AoE scores use simulated learning trajectories to evaluate properties similar to AoA. In this work, we propose a method that leverages the greater size of existing English AoA lists to improve the performance of AoE prediction models for other languages. Our main contributions are threefold. First, we introduce a novel AoE regression architecture that applies a Recurrent Neural Network to the simulated word exposure trajectories. Second, we consider word embeddings projected into a unified multilingual space. Third, we apply transfer learning from the English AoE regressor to improve the performance of non-English AoE regressors. We show that AoA lists across languages share inherent similarities that enable Machine Learning models to transfer insights from one language to another, thus reducing the effect of the smaller sample sizes available for non-English languages.
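The first and third contributions can be illustrated with a minimal sketch. It is not the authors' architecture: the abstract does not specify the recurrent cell, layer sizes, trajectory features, or which weights are transferred. The sketch assumes a plain tanh recurrent cell, a trajectory given as a list of small feature vectors (one per simulated exposure step), and a transfer step that copies the recurrent weights from a source-language model and re-initialises only the output layer. All function names (`init_rnn`, `predict_score`, `transfer_from`) are hypothetical, and training is omitted.

```python
import math
import random


def init_rnn(input_dim, hidden_dim, seed):
    """Create randomly initialised weights for a toy recurrent regressor (hypothetical helper)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": matrix(hidden_dim, input_dim),    # input-to-hidden weights
        "W_rec": matrix(hidden_dim, hidden_dim),  # hidden-to-hidden weights
        "b": [0.0] * hidden_dim,                  # hidden bias
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)],  # regression head
        "b_out": 0.0,
    }


def predict_score(params, trajectory):
    """Run the recurrent cell over a trajectory and return one scalar score."""
    hidden = [0.0] * len(params["b"])
    for step in trajectory:  # one feature vector per simulated exposure step
        new_hidden = []
        for i in range(len(hidden)):
            z = params["b"][i]
            z += sum(w * x for w, x in zip(params["W_in"][i], step))
            z += sum(w * h for w, h in zip(params["W_rec"][i], hidden))
            new_hidden.append(math.tanh(z))
        hidden = new_hidden
    # Linear read-out from the final hidden state.
    return params["b_out"] + sum(w * h for w, h in zip(params["w_out"], hidden))


def transfer_from(source_params, seed):
    """Assumed transfer step: reuse recurrent weights, re-initialise the output layer."""
    rng = random.Random(seed)
    return {
        "W_in": [row[:] for row in source_params["W_in"]],
        "W_rec": [row[:] for row in source_params["W_rec"]],
        "b": source_params["b"][:],
        "w_out": [rng.uniform(-0.1, 0.1) for _ in source_params["w_out"]],
        "b_out": 0.0,
    }


# Toy usage: the numbers below are illustrative, not real trajectory data.
english_model = init_rnn(input_dim=3, hidden_dim=4, seed=0)
toy_trajectory = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.0]]
english_score = predict_score(english_model, toy_trajectory)
other_language_model = transfer_from(english_model, seed=1)
other_score = predict_score(other_language_model, toy_trajectory)
```

In a real setting the transferred model would then be fine-tuned on the smaller AoA list of the target language; the copied weights serve only as a starting point.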

Original language: English (US)
Journal: International Journal of Artificial Intelligence in Education
State: Accepted/In press - 2023

Keywords

  • Age of Acquisition
  • Age of Exposure
  • Multi-lingual Model
  • Natural Language Processing
  • Transfer Learning

ASJC Scopus subject areas

  • Education
  • Computational Theory and Mathematics
