J-score: a robust measure of clustering accuracy

Navid Ahmadinejad; Yunro Chung; Li Liu

doi:10.7717/peerj-cs.1545

J-score: a robust measure of clustering accuracy

Navid Ahmadinejad, Yunro Chung, Li Liu

Health Solutions, College of (CHS)

Research output: Contribution to journal › Article › peer-review

1 Scopus citations

Abstract

Background. Clustering analysis discovers hidden structures in a data set by partitioning them into disjoint clusters. Robust accuracy measures that evaluate the goodness of clustering results are critical for algorithm development and model diagnosis. Common problems of clustering accuracy measures include overlooking unmatched clusters, biases towards excessive clusters, unstable baselines, and difficulties of interpretation. In this study, we presented a novel accuracy measure, J-score, to address these issues. Methods. Given a data set with known class labels, J-score quantifies how well the hypothetical clusters produced by clustering analysis recover the true classes. It starts with bidirectional set matching to identify the correspondence between true classes and hypothetical clusters based on Jaccard index. It then computes two weighted sums of Jaccard indices measuring the reconciliation from classes to clusters and vice versa. The final J-score is the harmonic mean of the two weighted sums. Results. Through simulation studies and analyses of real data sets, we evaluated the performance of J-score and compared with existing measures. Our results show that J-score is effective in distinguishing partition structures that differ only by unmatched clusters, rewarding correct inference of class numbers, addressing biases towards excessive clusters, and having a relatively stable baseline. The simplicity of its calculation makes the interpretation straightforward. It is a valuable tool complementary to other accuracy measures. We released an R/jScore package implementing the algorithm.

Original language	English (US)
Article number	e1545
Journal	PeerJ Computer Science
Volume	9
DOIs	https://doi.org/10.7717/peerj-cs.1545
State	Published - 2023

Keywords

Accuracy metrics
Cluster analysis

ASJC Scopus subject areas

General Computer Science

Access to Document

10.7717/peerj-cs.1545

Cite this

@article{22518826b14a4b3987976de59ff233e6,

title = "J-score: a robust measure of clustering accuracy",

abstract = "Background. Clustering analysis discovers hidden structures in a data set by partitioning them into disjoint clusters. Robust accuracy measures that evaluate the goodness of clustering results are critical for algorithm development and model diagnosis. Common problems of clustering accuracy measures include overlooking unmatched clusters, biases towards excessive clusters, unstable baselines, and difficulties of interpretation. In this study, we presented a novel accuracy measure, J-score, to address these issues. Methods. Given a data set with known class labels, J-score quantifies how well the hypothetical clusters produced by clustering analysis recover the true classes. It starts with bidirectional set matching to identify the correspondence between true classes and hypothetical clusters based on Jaccard index. It then computes two weighted sums of Jaccard indices measuring the reconciliation from classes to clusters and vice versa. The final J-score is the harmonic mean of the two weighted sums. Results. Through simulation studies and analyses of real data sets, we evaluated the performance of J-score and compared with existing measures. Our results show that J-score is effective in distinguishing partition structures that differ only by unmatched clusters, rewarding correct inference of class numbers, addressing biases towards excessive clusters, and having a relatively stable baseline. The simplicity of its calculation makes the interpretation straightforward. It is a valuable tool complementary to other accuracy measures. We released an R/jScore package implementing the algorithm.",

keywords = "Accuracy metrics, Cluster analysis",

author = "Navid Ahmadinejad and Yunro Chung and Li Liu",

note = "Publisher Copyright: {\textcopyright} 2023 Ahmadinejad et al.",

year = "2023",

doi = "10.7717/peerj-cs.1545",

language = "English (US)",

volume = "9",

journal = "PeerJ Computer Science",

issn = "2376-5992",

publisher = "PeerJ Inc.",

}

TY - JOUR

T1 - J-score

T2 - a robust measure of clustering accuracy

AU - Ahmadinejad, Navid

AU - Chung, Yunro

AU - Liu, Li

PY - 2023

Y1 - 2023

N2 - Background. Clustering analysis discovers hidden structures in a data set by partitioning them into disjoint clusters. Robust accuracy measures that evaluate the goodness of clustering results are critical for algorithm development and model diagnosis. Common problems of clustering accuracy measures include overlooking unmatched clusters, biases towards excessive clusters, unstable baselines, and difficulties of interpretation. In this study, we presented a novel accuracy measure, J-score, to address these issues. Methods. Given a data set with known class labels, J-score quantifies how well the hypothetical clusters produced by clustering analysis recover the true classes. It starts with bidirectional set matching to identify the correspondence between true classes and hypothetical clusters based on Jaccard index. It then computes two weighted sums of Jaccard indices measuring the reconciliation from classes to clusters and vice versa. The final J-score is the harmonic mean of the two weighted sums. Results. Through simulation studies and analyses of real data sets, we evaluated the performance of J-score and compared with existing measures. Our results show that J-score is effective in distinguishing partition structures that differ only by unmatched clusters, rewarding correct inference of class numbers, addressing biases towards excessive clusters, and having a relatively stable baseline. The simplicity of its calculation makes the interpretation straightforward. It is a valuable tool complementary to other accuracy measures. We released an R/jScore package implementing the algorithm.

AB - Background. Clustering analysis discovers hidden structures in a data set by partitioning them into disjoint clusters. Robust accuracy measures that evaluate the goodness of clustering results are critical for algorithm development and model diagnosis. Common problems of clustering accuracy measures include overlooking unmatched clusters, biases towards excessive clusters, unstable baselines, and difficulties of interpretation. In this study, we presented a novel accuracy measure, J-score, to address these issues. Methods. Given a data set with known class labels, J-score quantifies how well the hypothetical clusters produced by clustering analysis recover the true classes. It starts with bidirectional set matching to identify the correspondence between true classes and hypothetical clusters based on Jaccard index. It then computes two weighted sums of Jaccard indices measuring the reconciliation from classes to clusters and vice versa. The final J-score is the harmonic mean of the two weighted sums. Results. Through simulation studies and analyses of real data sets, we evaluated the performance of J-score and compared with existing measures. Our results show that J-score is effective in distinguishing partition structures that differ only by unmatched clusters, rewarding correct inference of class numbers, addressing biases towards excessive clusters, and having a relatively stable baseline. The simplicity of its calculation makes the interpretation straightforward. It is a valuable tool complementary to other accuracy measures. We released an R/jScore package implementing the algorithm.

KW - Accuracy metrics

KW - Cluster analysis

UR - http://www.scopus.com/inward/record.url?scp=85172310004&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85172310004&partnerID=8YFLogxK

U2 - 10.7717/peerj-cs.1545

DO - 10.7717/peerj-cs.1545

M3 - Article

AN - SCOPUS:85172310004

SN - 2376-5992

VL - 9

JO - PeerJ Computer Science

JF - PeerJ Computer Science

M1 - e1545

ER -

J-score: a robust measure of clustering accuracy

Abstract

Keywords

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this