TY - JOUR
T1 - Fuzzy c-means clustering with prior biological knowledge
AU - Tari, Luis
AU - Baral, Chitta
AU - Kim, Seungchan
N1 - Funding Information:
The authors appreciate the insightful comments and editorial help from Dr. Michael Bittner at the Translational Genomics Research Institute (http://www.tgen.org), Phoenix, AZ 85005. The authors would also like to thank the anonymous reviewers for their valuable comments. SK was partially funded by P01-CA27502-23 (NIH/NCI), P01 CA109552-01A1 (NIH/NCI), U19 AI067773 (NIH/NIAID), and W81XWH-06-1-090 (DoD/CDMRP).
PY - 2009/2
Y1 - 2009/2
AB - We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and uses Gene Ontology annotations as prior knowledge to guide the grouping of functionally related genes. Unlike traditional clustering methods, our method can assign genes to multiple clusters, which more appropriately represents the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were used to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far more biologically meaningful clusters, even when only a small percentage of Gene Ontology annotations is used. Our experiments further indicate that the use of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
KW - Fuzzy c-means clustering
KW - Gene Ontology
KW - Gene expression data
KW - Gene function prediction
KW - Saccharomyces cerevisiae yeast
KW - Semi-supervised clustering
UR - http://www.scopus.com/inward/record.url?scp=60049085522&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=60049085522&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2008.05.009
DO - 10.1016/j.jbi.2008.05.009
M3 - Article
C2 - 18595779
AN - SCOPUS:60049085522
SN - 1532-0464
VL - 42
SP - 74
EP - 81
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 1
ER -