Abstract
High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique for pre-processing high-dimensional data before mining. Traditionally, feature selection has focused on removing irrelevant features. For high-dimensional data, however, removing redundant features is equally critical. In this paper, we study feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. An extensive empirical study on real-world data shows that the proposed approach is efficient and effective in removing both redundant and irrelevant features.
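To make the distinction between irrelevant and redundant features concrete, the following is a minimal sketch of a generic correlation-based filter. It is not the method proposed in the paper: this record does not state which correlation measure or search procedure the authors use, so Pearson correlation, the greedy pass, the function names, and both thresholds (`relevance_min`, `redundancy_max`) are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_filter(features, target, relevance_min=0.3, redundancy_max=0.9):
    """Hypothetical filter: `features` maps a name to a list of values.

    Both thresholds are assumed values chosen for illustration only.
    """
    # Relevance step: drop features that are weakly correlated with the target.
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    candidates = sorted(
        (name for name in features if relevance[name] >= relevance_min),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # Redundancy step: walk candidates from most to least relevant and skip any
    # feature that is strongly correlated with one that was already kept.
    selected = []
    for name in candidates:
        if all(
            abs(pearson(features[name], features[kept])) < redundancy_max
            for kept in selected
        ):
            selected.append(name)
    return selected


# Toy data: "a" tracks the class, "a_copy" is a linear copy of "a" (redundant),
# and "noise" is uncorrelated with the class (irrelevant).
target = [0, 0, 1, 1, 0, 1, 1, 0]
a = [1.0, 2.0, 8.0, 9.0, 1.5, 7.5, 8.5, 2.5]
features = {
    "a": a,
    "a_copy": [2 * v + 1 for v in a],
    "noise": [1, 2, 1, 2, 2, 1, 2, 1],
}
selected = correlation_filter(features, target)
print(selected)
```

On this toy data the relevance step discards `noise`, and the redundancy step keeps only one of `a` and `a_copy`, since each carries the same information about the class. A relevance-only filter would have kept both, which is the gap the abstract points to.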
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Pages | 685-690 |
Number of pages | 6 |
State | Published - 2003 |
Event | 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, Washington, DC, United States; Aug 24 2003 → Aug 27 2003 |
Other
Other | 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 |
---|---|
Country/Territory | United States |
City | Washington, DC |
Period | 8/24/03 → 8/27/03 |
Keywords
- Feature selection
- High-dimensional data
- Redundancy
ASJC Scopus subject areas
- Software
- Information Systems