Robust unsupervised feature selection on networked data

Jundong Li; Xia Hu; Liang Wu; Huan Liu

doi:10.1137/1.9781611974348.44

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

62 Scopus citations

Abstract

Feature selection has proven effective at preparing high-dimensional data for many data mining and machine learning tasks. Traditional feature selection algorithms are mainly based on the assumption that data instances are independent and identically distributed. However, this assumption does not hold for networked data, where instances are not only associated with high-dimensional features but are also inherently interconnected. In addition, obtaining label information for networked data is time-consuming and labor-intensive, and without labels to guide feature selection it is difficult to assess feature relevance. In contrast to scarce label information, link information in networks is abundant and could help select relevant features. However, most networked data contains many noisy links, which make feature selection algorithms less effective. To address these issues, we propose NetFS, a robust unsupervised feature selection framework for networked data that embeds latent representation learning into feature selection. As a result, content information can help mitigate the negative effects of noisy links when learning latent representations, while good latent representations in turn help extract more meaningful features. In other words, the two phases can cooperate and boost each other. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework.
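The abstract describes NetFS only at a high level. The Python sketch below illustrates one way to couple link-based latent representation learning with sparse feature weighting; it is an illustration under stated assumptions, not the authors' implementation. It assumes an objective of the form ||XW − U||_F^2 + α·||W||_{2,1} + (β/2)·||A − UUᵀ||_F^2 with U ≥ 0, where X (n × d) holds node content, A (n × n) holds links, U (n × k) is a nonnegative latent representation, and W (d × k) maps features onto it. Features are then ranked by the row norms of W. The function name `netfs_sketch`, the parameters `k`, `alpha`, `beta`, `n_iter`, and both update rules (an iteratively reweighted solve for the ℓ2,1 term and a generic multiplicative update for U) are hypothetical choices made for this example.

```python
import numpy as np


def netfs_sketch(X, A, k=2, alpha=1.0, beta=1.0, n_iter=50, eps=1e-10, seed=0):
    """Hypothetical sketch: rank features of X (n x d) with help from links A (n x n).

    Assumed objective (illustrative, not necessarily the paper's exact form):
        ||X W - U||_F^2 + alpha * ||W||_{2,1} + (beta / 2) * ||A - U U^T||_F^2,  U >= 0
    Returns (scores, W, U), where scores[i] = ||W[i, :]||_2.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = (A + A.T) / 2.0                      # treat links as undirected (symmetric A)
    U = rng.random((n, k)) + eps             # strictly positive initial latent representation
    W = np.zeros((d, k))
    D = np.eye(d)                            # reweighting matrix for the l2,1 penalty
    XtX = X.T @ X
    for _ in range(n_iter):
        # W-step: solve (X^T X + alpha D) W = X^T U, the standard
        # iteratively reweighted update for an l2,1-regularized least-squares fit
        W = np.linalg.solve(XtX + alpha * D, X.T @ U)
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
        # U-step: multiplicative update that keeps U nonnegative; the content
        # term XW is split into positive and negative parts so that the
        # numerator and denominator are both nonnegative
        XW = X @ W
        numer = np.maximum(XW, 0.0) + beta * (A @ U)
        denom = U + np.maximum(-XW, 0.0) + beta * (U @ (U.T @ U)) + eps
        U = U * numer / denom
    scores = np.linalg.norm(W, axis=1)       # row norms of W act as feature scores
    return scores, W, U


# Hypothetical usage on random data: score 8 features of 30 linked nodes
rng = np.random.default_rng(1)
X_demo = rng.random((30, 8))
A_demo = (rng.random((30, 30)) < 0.1).astype(float)
scores, W, U = netfs_sketch(X_demo, A_demo, k=3)
print(np.argsort(-scores)[:3])  # indices of the three highest-scoring features
```

In this sketch the content term pulls U toward XW, which is one way content could dampen the influence of noisy links on the learned representation, while the ℓ2,1 penalty pushes whole rows of W toward zero so that only a few features carry the representation.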

Original language	English (US)
Title of host publication	16th SIAM International Conference on Data Mining 2016, SDM 2016
Editors	Sanjay Chawla Venkatasubramanian, Wagner Meira
Publisher	Society for Industrial and Applied Mathematics Publications
Pages	387-395
Number of pages	9
ISBN (Electronic)	9781510828117
DOIs	https://doi.org/10.1137/1.9781611974348.44
State	Published - 2016
Event	16th SIAM International Conference on Data Mining 2016, SDM 2016, Miami, United States (May 5, 2016 → May 7, 2016)

Publication series

Name	16th SIAM International Conference on Data Mining 2016, SDM 2016

Other

Other	16th SIAM International Conference on Data Mining 2016, SDM 2016
Country/Territory	United States
City	Miami
Period	5/5/16 → 5/7/16

ASJC Scopus subject areas

Computer Science Applications
Software

Access to Document

10.1137/1.9781611974348.44

Cite this

Li, J., Hu, X., Wu, L., & Liu, H. (2016). Robust unsupervised feature selection on networked data. In S. C. Venkatasubramanian & W. Meira (Eds.), 16th SIAM International Conference on Data Mining 2016, SDM 2016 (pp. 387-395). (16th SIAM International Conference on Data Mining 2016, SDM 2016). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974348.44

Robust unsupervised feature selection on networked data. / Li, Jundong; Hu, Xia; Wu, Liang et al.
16th SIAM International Conference on Data Mining 2016, SDM 2016. ed. / Sanjay Chawla Venkatasubramanian; Wagner Meira. Society for Industrial and Applied Mathematics Publications, 2016. p. 387-395 (16th SIAM International Conference on Data Mining 2016, SDM 2016).

Li, J, Hu, X, Wu, L & Liu, H 2016, Robust unsupervised feature selection on networked data. in SC Venkatasubramanian & W Meira (eds), 16th SIAM International Conference on Data Mining 2016, SDM 2016. 16th SIAM International Conference on Data Mining 2016, SDM 2016, Society for Industrial and Applied Mathematics Publications, pp. 387-395, 16th SIAM International Conference on Data Mining 2016, SDM 2016, Miami, United States, 5/5/16. https://doi.org/10.1137/1.9781611974348.44

Li J, Hu X, Wu L, Liu H. Robust unsupervised feature selection on networked data. In Venkatasubramanian SC, Meira W, editors, 16th SIAM International Conference on Data Mining 2016, SDM 2016. Society for Industrial and Applied Mathematics Publications. 2016. p. 387-395. (16th SIAM International Conference on Data Mining 2016, SDM 2016). doi: 10.1137/1.9781611974348.44

Li, Jundong ; Hu, Xia ; Wu, Liang et al. / Robust unsupervised feature selection on networked data. 16th SIAM International Conference on Data Mining 2016, SDM 2016. editor / Sanjay Chawla Venkatasubramanian ; Wagner Meira. Society for Industrial and Applied Mathematics Publications, 2016. pp. 387-395 (16th SIAM International Conference on Data Mining 2016, SDM 2016).

@inproceedings{c1889231ac734b70b3822d6026ddc88e,

title = "Robust unsupervised feature selection on networked data",

abstract = "Feature selection has proven effective at preparing high-dimensional data for many data mining and machine learning tasks. Traditional feature selection algorithms are mainly based on the assumption that data instances are independent and identically distributed. However, this assumption does not hold for networked data, where instances are not only associated with high-dimensional features but are also inherently interconnected. In addition, obtaining label information for networked data is time-consuming and labor-intensive, and without labels to guide feature selection it is difficult to assess feature relevance. In contrast to scarce label information, link information in networks is abundant and could help select relevant features. However, most networked data contains many noisy links, which make feature selection algorithms less effective. To address these issues, we propose NetFS, a robust unsupervised feature selection framework for networked data that embeds latent representation learning into feature selection. As a result, content information can help mitigate the negative effects of noisy links when learning latent representations, while good latent representations in turn help extract more meaningful features. In other words, the two phases can cooperate and boost each other. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework.",

author = "Jundong Li and Xia Hu and Liang Wu and Huan Liu",

note = "Publisher Copyright: Copyright {\textcopyright} by SIAM.; 16th SIAM International Conference on Data Mining 2016, SDM 2016 ; Conference date: 05-05-2016 Through 07-05-2016",

year = "2016",

doi = "10.1137/1.9781611974348.44",

language = "English (US)",

series = "16th SIAM International Conference on Data Mining 2016, SDM 2016",

publisher = "Society for Industrial and Applied Mathematics Publications",

pages = "387--395",

editor = "Venkatasubramanian, {Sanjay Chawla} and Wagner Meira",

booktitle = "16th SIAM International Conference on Data Mining 2016, SDM 2016",

}

TY - GEN

T1 - Robust unsupervised feature selection on networked data

AU - Li, Jundong

AU - Hu, Xia

AU - Wu, Liang

AU - Liu, Huan

PY - 2016

Y1 - 2016

N2 - Feature selection has proven effective at preparing high-dimensional data for many data mining and machine learning tasks. Traditional feature selection algorithms are mainly based on the assumption that data instances are independent and identically distributed. However, this assumption does not hold for networked data, where instances are not only associated with high-dimensional features but are also inherently interconnected. In addition, obtaining label information for networked data is time-consuming and labor-intensive, and without labels to guide feature selection it is difficult to assess feature relevance. In contrast to scarce label information, link information in networks is abundant and could help select relevant features. However, most networked data contains many noisy links, which make feature selection algorithms less effective. To address these issues, we propose NetFS, a robust unsupervised feature selection framework for networked data that embeds latent representation learning into feature selection. As a result, content information can help mitigate the negative effects of noisy links when learning latent representations, while good latent representations in turn help extract more meaningful features. In other words, the two phases can cooperate and boost each other. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework.

AB - Feature selection has proven effective at preparing high-dimensional data for many data mining and machine learning tasks. Traditional feature selection algorithms are mainly based on the assumption that data instances are independent and identically distributed. However, this assumption does not hold for networked data, where instances are not only associated with high-dimensional features but are also inherently interconnected. In addition, obtaining label information for networked data is time-consuming and labor-intensive, and without labels to guide feature selection it is difficult to assess feature relevance. In contrast to scarce label information, link information in networks is abundant and could help select relevant features. However, most networked data contains many noisy links, which make feature selection algorithms less effective. To address these issues, we propose NetFS, a robust unsupervised feature selection framework for networked data that embeds latent representation learning into feature selection. As a result, content information can help mitigate the negative effects of noisy links when learning latent representations, while good latent representations in turn help extract more meaningful features. In other words, the two phases can cooperate and boost each other. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework.

UR - http://www.scopus.com/inward/record.url?scp=84991628944&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84991628944&partnerID=8YFLogxK

U2 - 10.1137/1.9781611974348.44

DO - 10.1137/1.9781611974348.44

M3 - Conference contribution

AN - SCOPUS:84991628944

T3 - 16th SIAM International Conference on Data Mining 2016, SDM 2016

SP - 387

EP - 395

BT - 16th SIAM International Conference on Data Mining 2016, SDM 2016

A2 - Venkatasubramanian, Sanjay Chawla

A2 - Meira, Wagner

PB - Society for Industrial and Applied Mathematics Publications

T2 - 16th SIAM International Conference on Data Mining 2016, SDM 2016

Y2 - 5 May 2016 through 7 May 2016

ER -
