TY - GEN
T1 - Finding needles of interested tweets in the haystack of Twitter network
AU - Tian, Qiongjie
AU - Lagisetty, Jashmi
AU - Li, Baoxin
N1 - Funding Information:
This work was supported in part by a grant (#1135616) from National Science Foundation. Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of the NSF
Publisher Copyright:
© 2016 IEEE.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2016/11/21
Y1 - 2016/11/21
N2 - Drug use and abuse is a serious societal problem. The fast development and adoption of social media and smart mobile devices in recent years bring about new opportunities for advancing computer-based strategies for understanding and intervention of drug-related behaviors. However, the existing literature still lacks principled ways of building computational models for supporting effective analysis of large-scale, often unstructured social media data. Part of the challenge stems from the difficulty of obtaining so-called ground-truth data that are typically required for training computational models. This paper presents a progressive semi-supervised learning approach to identifying Twitter tweets that are related to personal and recreational use of marijuana. Based on a small, labeled dataset, the proposed approach first learns optimal mapping of raw features from the tweets for classification, using a method of weakly hierarchical lasso. The learned feature model is then used to support unsupervised clustering of Web-scale data. Experiments with realistic data crawled from Twitter are used to validate the proposed approach, demonstrating its effectiveness.
AB - Drug use and abuse is a serious societal problem. The fast development and adoption of social media and smart mobile devices in recent years bring about new opportunities for advancing computer-based strategies for understanding and intervention of drug-related behaviors. However, the existing literature still lacks principled ways of building computational models for supporting effective analysis of large-scale, often unstructured social media data. Part of the challenge stems from the difficulty of obtaining so-called ground-truth data that are typically required for training computational models. This paper presents a progressive semi-supervised learning approach to identifying Twitter tweets that are related to personal and recreational use of marijuana. Based on a small, labeled dataset, the proposed approach first learns optimal mapping of raw features from the tweets for classification, using a method of weakly hierarchical lasso. The learned feature model is then used to support unsupervised clustering of Web-scale data. Experiments with realistic data crawled from Twitter are used to validate the proposed approach, demonstrating its effectiveness.
UR - http://www.scopus.com/inward/record.url?scp=85006716950&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006716950&partnerID=8YFLogxK
U2 - 10.1109/ASONAM.2016.7752273
DO - 10.1109/ASONAM.2016.7752273
M3 - Conference contribution
AN - SCOPUS:85006716950
T3 - Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
SP - 447
EP - 452
BT - Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
A2 - Kumar, Ravi
A2 - Caverlee, James
A2 - Tong, Hanghang
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016
Y2 - 18 August 2016 through 21 August 2016
ER -