TY - GEN
T1 - Dempster-Shafer evidence theory based multi-characteristics fusion for clustering evaluation
AU - Yue, Shihong
AU - Wu, Teresa
AU - Wang, Yamin
AU - Zhang, Kai
AU - Liu, Weixia
PY - 2010
Y1 - 2010
N2 - Clustering is a widely used unsupervised learning method to group data with similar characteristics. The performance of a clustering method can in general be evaluated through validity indices. However, most validity indices are designed for specific algorithms along with a specific structure of the data space. Moreover, these indices consist of a few within- and between-clustering distance functions. The applicability of these indices relies heavily on the correctness of combining these functions. In this research, we first summarize three common characteristics of any clustering evaluation: (1) the clustering outcome can be evaluated by a group of validity indices if some efficient validity indices are available, (2) the clustering outcome can be measured by an independent intra-cluster distance function and (3) the clustering outcome can be measured by neighborhood-based functions. Considering the complementary and unstable nature of these clustering evaluations, we then apply Dempster-Shafer (D-S) Evidence Theory to fuse the three characteristics to generate a new index, termed fused Multiple Characteristic Indices (fMCI). The fMCI is generally capable of evaluating clustering outcomes of arbitrary clustering methods associated with more complex structures of data space. We conduct a number of experiments to demonstrate that the fMCI is applicable to evaluating different clustering algorithms on different datasets and that the fMCI can achieve more accurate and robust clustering evaluation compared to existing indices.
AB - Clustering is a widely used unsupervised learning method to group data with similar characteristics. The performance of a clustering method can in general be evaluated through validity indices. However, most validity indices are designed for specific algorithms along with a specific structure of the data space. Moreover, these indices consist of a few within- and between-clustering distance functions. The applicability of these indices relies heavily on the correctness of combining these functions. In this research, we first summarize three common characteristics of any clustering evaluation: (1) the clustering outcome can be evaluated by a group of validity indices if some efficient validity indices are available, (2) the clustering outcome can be measured by an independent intra-cluster distance function and (3) the clustering outcome can be measured by neighborhood-based functions. Considering the complementary and unstable nature of these clustering evaluations, we then apply Dempster-Shafer (D-S) Evidence Theory to fuse the three characteristics to generate a new index, termed fused Multiple Characteristic Indices (fMCI). The fMCI is generally capable of evaluating clustering outcomes of arbitrary clustering methods associated with more complex structures of data space. We conduct a number of experiments to demonstrate that the fMCI is applicable to evaluating different clustering algorithms on different datasets and that the fMCI can achieve more accurate and robust clustering evaluation compared to existing indices.
KW - Dempster-Shafer evidence theory
KW - Validity index
KW - clustering algorithm
KW - data structure
UR - http://www.scopus.com/inward/record.url?scp=78349277289&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78349277289&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-16248-0_70
DO - 10.1007/978-3-642-16248-0_70
M3 - Conference contribution
AN - SCOPUS:78349277289
SN - 3642162479
SN - 9783642162473
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 499
EP - 519
BT - Rough Set and Knowledge Technology - 5th International Conference, RSKT 2010, Proceedings
T2 - 5th International Conference on Rough Set and Knowledge Technology, RSKT 2010
Y2 - 15 October 2010 through 17 October 2010
ER -