TY - GEN
T1 - Unsupervised feature selection for linked social media data
AU - Tang, Jiliang
AU - Liu, Huan
PY - 2012
Y1 - 2012
N2 - The prevalent use of social media produces mountains of unlabeled, high-dimensional data. Feature selection has been shown effective in dealing with high-dimensional data for efficient data mining. Feature selection for unlabeled data remains a challenging task due to the absence of label information by which the feature relevance can be assessed. The unique characteristics of social media data further complicate the already challenging problem of unsupervised feature selection, (e.g., part of social media data is linked, which makes invalid the independent and identically distributed assumption), bringing about new challenges to traditional unsupervised feature selection algorithms. In this paper, we study the differences between social media data and traditional attribute-value data, investigate if the relations revealed in linked data can be used to help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We perform experiments with real-world social media datasets to evaluate the effectiveness of the proposed framework and probe the working of its key components.
AB - The prevalent use of social media produces mountains of unlabeled, high-dimensional data. Feature selection has been shown effective in dealing with high-dimensional data for efficient data mining. Feature selection for unlabeled data remains a challenging task due to the absence of label information by which the feature relevance can be assessed. The unique characteristics of social media data further complicate the already challenging problem of unsupervised feature selection, (e.g., part of social media data is linked, which makes invalid the independent and identically distributed assumption), bringing about new challenges to traditional unsupervised feature selection algorithms. In this paper, we study the differences between social media data and traditional attribute-value data, investigate if the relations revealed in linked data can be used to help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We perform experiments with real-world social media datasets to evaluate the effectiveness of the proposed framework and probe the working of its key components.
KW - linked social media data
KW - pseudo-class label
KW - unsupervised feature selection
UR - http://www.scopus.com/inward/record.url?scp=84866052116&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866052116&partnerID=8YFLogxK
U2 - 10.1145/2339530.2339673
DO - 10.1145/2339530.2339673
M3 - Conference contribution
AN - SCOPUS:84866052116
SN - 9781450314626
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 904
EP - 912
BT - KDD'12 - 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
T2 - 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2012
Y2 - 12 August 2012 through 16 August 2012
ER -