Efficiently handling feature redundancy in high-dimensional data

Lei Yu; Huan Liu

doi:10.1145/956750.956840

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

55 Scopus citations

Abstract

High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique in pre-processing high-dimensional data for successful data mining. Traditionally, feature selection is focused on removing irrelevant features. However, for high-dimensional data, removing redundant features is equally critical. In this paper, we provide a study of feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. The extensive empirical study using real-world data shows that the proposed approach is efficient and effective in removing redundant and irrelevant features.

Original language	English (US)
Title of host publication	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages	685-690
Number of pages	6
DOIs	https://doi.org/10.1145/956750.956840
State	Published - 2003
Event	9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration	Aug 24 2003 → Aug 27 2003

Keywords

Feature selection
High-dimensional data
Redundancy

ASJC Scopus subject areas

Software
Information Systems

Cite this

@inproceedings{8796b241a0ba4c3ca24ec46cc515dc28,
  title = "Efficiently handling feature redundancy in high-dimensional data",
  abstract = "High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique in pre-processing high-dimensional data for successful data mining. Traditionally, feature selection is focused on removing irrelevant features. However, for high-dimensional data, removing redundant features is equally critical. In this paper, we provide a study of feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. The extensive empirical study using real-world data shows that the proposed approach is efficient and effective in removing redundant and irrelevant features.",
  keywords = "Feature selection, High-dimensional data, Redundancy",
  author = "Lei Yu and Huan Liu",
  year = "2003",
  doi = "10.1145/956750.956840",
  language = "English (US)",
  pages = "685--690",
  booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
  note = "9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 ; Conference date: 24-08-2003 Through 27-08-2003",
}

TY  - GEN
T1  - Efficiently handling feature redundancy in high-dimensional data
AU  - Yu, Lei
AU  - Liu, Huan
PY  - 2003
Y1  - 2003
N2  - High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique in pre-processing high-dimensional data for successful data mining. Traditionally, feature selection is focused on removing irrelevant features. However, for high-dimensional data, removing redundant features is equally critical. In this paper, we provide a study of feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. The extensive empirical study using real-world data shows that the proposed approach is efficient and effective in removing redundant and irrelevant features.
AB  - High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique in pre-processing high-dimensional data for successful data mining. Traditionally, feature selection is focused on removing irrelevant features. However, for high-dimensional data, removing redundant features is equally critical. In this paper, we provide a study of feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. The extensive empirical study using real-world data shows that the proposed approach is efficient and effective in removing redundant and irrelevant features.
KW  - Feature selection
KW  - High-dimensional data
KW  - Redundancy
UR  - http://www.scopus.com/inward/record.url?scp=12244249636&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=12244249636&partnerID=8YFLogxK
U2  - 10.1145/956750.956840
DO  - 10.1145/956750.956840
M3  - Conference contribution
AN  - SCOPUS:12244249636
SP  - 685
EP  - 690
BT  - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
T2  - 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Y2  - 24 August 2003 through 27 August 2003
ER  - 
