TY - GEN
T1 - Choosing the Right Words
T2 - 2nd Joint Conference on Lexical and Computational Semantics, *SEM 2013
AU - Schwartz, H. Andrew
AU - Eichstaedt, Johannes
AU - Dziurzynski, Lukasz
AU - Blanco, Eduardo
AU - Kern, Margaret L.
AU - Ramones, Stephanie
AU - Seligman, Martin
AU - Ungar, Lyle
N1 - Funding Information:
Support for this research was provided by the Robert Wood Johnson Foundation’s Pioneer Portfolio, through a grant to Martin Seligman, “Exploring Concepts of Positive Health”. We thank the reviewers for their constructive and insightful comments.
Publisher Copyright:
© 2013 Association for Computational Linguistics
PY - 2013
Y1 - 2013
N2 - Social scientists are increasingly using the vast amount of text available on social media to measure variation in happiness and other psychological states. Such studies count words deemed to be indicators of happiness and track how the word frequencies change across locations or time. This word count approach is simple and scalable, yet often picks up false signals, as words can appear in different contexts and take on different meanings. We characterize the types of errors that occur using the word count approach, and find lexical ambiguity to be the most prevalent. We then show that one can reduce error with a simple refinement to such lexica by automatically eliminating highly ambiguous words. The resulting refined lexica improve precision as measured by human judgments of word occurrences in Facebook posts.
UR - http://www.scopus.com/inward/record.url?scp=84930612773&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84930612773&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84930612773
T3 - *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics
SP - 296
EP - 305
BT - *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics
PB - Association for Computational Linguistics (ACL)
Y2 - 13 June 2013 through 14 June 2013
ER -