CRAFTER: A Tree-Ensemble Clustering Algorithm for Static Datasets with Mixed Attributes and High Dimensionality

Sangdi Lin, Bahareh Azarnoush, George Runger

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Clustering is an important aspect of data mining, but clustering high-dimensional mixed-attribute data in a scalable fashion remains a challenging problem. In this paper, we propose CRAFTER, a tree-ensemble clustering algorithm for static datasets, to tackle this problem. CRAFTER handles categorical and numeric attributes simultaneously and scales well with both the dimensionality and the size of datasets. It leverages the advantages of a tree ensemble to handle mixed attributes and high dimensionality, and uses class probability estimates to identify representative data points for clustering. Through a series of experiments on both synthetic and real datasets, we demonstrate that CRAFTER is superior to Random Forest Clustering (RFC), an existing tree-based clustering method, in terms of both clustering quality and computational cost.

Original language: English (US)
Article number: 8294273
Pages (from-to): 1686-1696
Number of pages: 11
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 30
Issue number: 9
State: Published - Sep 1 2018

Keywords

  • Clustering
  • categorical attribute
  • ensemble method
  • high dimensionality
  • mixed attributes
  • random forest
  • static datasets

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
