TY - JOUR
T1 - Multilingual Age of Exposure 2.0
AU - Botarleanu, Robert Mihai
AU - Watanabe, Micah
AU - Dascalu, Mihai
AU - Crossley, Scott A.
AU - McNamara, Danielle S.
N1 - Publisher Copyright: © 2023, International Artificial Intelligence in Education Society.
PY - 2023
Y1 - 2023
AB - Age of Acquisition (AoA) scores approximate the age at which a language speaker fully understands a word’s semantic meaning and represent a quantitative measure of the relative difficulty of words in a language. AoA word lists exist across various languages, with English having the most complete lists that capture the largest percentage of the vocabulary. In contrast, other languages have smaller lists making large-scale analyses difficult. Given the usefulness of AoA scores, methods have been developed to leverage the use of Machine Learning models to estimate AoA scores automatically through Age of Exposure (AoE) scores for the entire vocabulary of a language. These generated AoE scores use simulated learning trajectories to evaluate properties similar to AoA. In this work, we propose a method that leverages the greater size of existing English AoA lists to improve the performance of AoE prediction models for other languages. Our main contributions are threefold. First, we introduce a novel AoE regression architecture that uses a Recurrent Neural Network applied to the simulated word exposure trajectories. Second, we consider word embeddings projected into a unified multilingual space. Third, we apply transfer learning on the English AoE regressor to improve the performance of non-English AoE regressors. We show that AoA lists across languages share inherent similarities that enable Machine Learning models to transfer insights from one language to another, thus diminishing the effect of the smaller sample sizes for non-English languages.
KW - Age of Acquisition
KW - Age of Exposure
KW - Multi-lingual Model
KW - Natural Language Processing
KW - Transfer Learning
UR - http://www.scopus.com/inward/record.url?scp=85180170897&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85180170897&partnerID=8YFLogxK
U2 - 10.1007/s40593-023-00386-7
DO - 10.1007/s40593-023-00386-7
M3 - Article
AN - SCOPUS:85180170897
SN - 1560-4292
JO - International Journal of Artificial Intelligence in Education
JF - International Journal of Artificial Intelligence in Education
ER -