TY - GEN
T1 - Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings
AU - Xu, Kan
AU - Zhao, Xuanyi
AU - Bastani, Hamsa
AU - Bastani, Osbert
N1 - Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
N2 - Sparse regression has recently been applied to enable transfer learning from very limited data. We study an extension of this approach to unsupervised learning: in particular, learning word embeddings from unstructured text corpora using low-rank matrix factorization. Intuitively, when transferring word embeddings to a new domain, we expect that the embeddings change for only a small number of words, e.g., the ones with novel meanings in that domain. We propose a novel group-sparse penalty that exploits this sparsity to perform transfer learning when there is very little text data available in the target domain, e.g., a single article of text. We prove generalization bounds for our algorithm. Furthermore, we empirically evaluate its effectiveness, both in terms of prediction accuracy in downstream tasks as well as the interpretability of the results.
AB - Sparse regression has recently been applied to enable transfer learning from very limited data. We study an extension of this approach to unsupervised learning: in particular, learning word embeddings from unstructured text corpora using low-rank matrix factorization. Intuitively, when transferring word embeddings to a new domain, we expect that the embeddings change for only a small number of words, e.g., the ones with novel meanings in that domain. We propose a novel group-sparse penalty that exploits this sparsity to perform transfer learning when there is very little text data available in the target domain, e.g., a single article of text. We prove generalization bounds for our algorithm. Furthermore, we empirically evaluate its effectiveness, both in terms of prediction accuracy in downstream tasks as well as the interpretability of the results.
UR - http://www.scopus.com/inward/record.url?scp=85126710944&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126710944&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126710944
T3 - Proceedings of Machine Learning Research
SP - 11603
EP - 11612
BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021
PB - ML Research Press
T2 - 38th International Conference on Machine Learning, ICML 2021
Y2 - 18 July 2021 through 24 July 2021
ER -