Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

Kan Xu, Xuanyi Zhao, Hamsa Bastani, Osbert Bastani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Sparse regression has recently been applied to enable transfer learning from very limited data. We study an extension of this approach to unsupervised learning, in particular learning word embeddings from unstructured text corpora using low-rank matrix factorization. Intuitively, when transferring word embeddings to a new domain, we expect that the embeddings change for only a small number of words, e.g., the ones with novel meanings in that domain. We propose a novel group-sparse penalty that exploits this sparsity to perform transfer learning when there is very little text data available in the target domain, e.g., a single article of text. We prove generalization bounds for our algorithm. Furthermore, we empirically evaluate its effectiveness, both in terms of prediction accuracy on downstream tasks and in terms of the interpretability of the results.
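The abstract does not spell out the optimization problem, so the following is only a minimal illustrative sketch of what a row-wise group-sparse penalty on embedding changes can look like, not the authors' actual formulation. It assumes fixed source word and context embeddings W_src and C_src and a target-domain co-occurrence (or PMI) matrix M_tgt, all hypothetical names, and learns a row-sparse correction D by proximal gradient descent so that only a few words' embeddings move.

import numpy as np

def group_soft_threshold(D, tau):
    """Row-wise group soft-thresholding (proximal operator of tau * sum_i ||D_i||_2):
    rows of D with small l2 norm are set exactly to zero."""
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return D * scale

def transfer_embeddings(M_tgt, W_src, C_src, lam=0.1, n_iters=500):
    """Proximal-gradient sketch for
        min_D 0.5 * ||M_tgt - (W_src + D) C_src^T||_F^2 + lam * sum_i ||D_i||_2,
    i.e., learn a row-sparse correction D to the source word embeddings W_src
    while keeping the source context embeddings C_src fixed (an assumption made
    here to keep the subproblem convex)."""
    D = np.zeros_like(W_src)
    step = 1.0 / (np.linalg.norm(C_src, 2) ** 2)  # 1 / Lipschitz constant of the smooth term
    for _ in range(n_iters):
        grad = ((W_src + D) @ C_src.T - M_tgt) @ C_src
        D = group_soft_threshold(D - step * grad, step * lam)
    return W_src + D, D

Rows of D that remain exactly zero correspond to words whose embeddings are carried over unchanged from the source domain; the row-wise l2 penalty is what produces this whole-row sparsity.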

Original language: English (US)
Title of host publication: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Publisher: ML Research Press
Pages: 11603-11612
Number of pages: 10
ISBN (Electronic): 9781713845065
State: Published - 2021
Externally published: Yes
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: Jul 18, 2021 – Jul 24, 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (Electronic): 2640-3498

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 7/18/21 – 7/24/21

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
