TY - JOUR
T1 - Drosophila gene expression pattern annotation through multi-instance multi-label learning
AU - Li, Ying Xin
AU - Ji, Shuiwang
AU - Kumar, Sudhir
AU - Ye, Jieping
AU - Zhou, Zhi Hua
N1 - Funding Information:
The authors want to thank Dr. Charlotte Konikoff for examining the in situ image groups, and the associate editor and anonymous reviewers for helpful comments and suggestions. This research was partially supported by the National Fundamental Research Program of China (2010CB327903), the National Science Foundation of China (60721002, 61073097), the Jiangsu Science Foundation (BK2008018), the Postdoctoral Science Foundation of China (20090461086), the Jiangsu Postdoctoral Foundation (0802001C), the National Institutes of Health (HG002516), and the US National Science Foundation (IIS-0612069, IIS-0953662).
PY - 2012
Y1 - 2012
N2 - In the studies of Drosophila embryogenesis, a large number of two-dimensional digital images of gene expression patterns have been produced to build an atlas of spatio-temporal gene expression dynamics across developmental time. Gene expressions captured in these images have been manually annotated with anatomical and developmental ontology terms using a controlled vocabulary (CV), which are useful in research aimed at understanding gene functions, interactions, and networks. With the rapid accumulation of images, the process of manual annotation has become increasingly cumbersome, and computational methods to automate this task are urgently needed. However, the automated annotation of embryo images is challenging. This is because the annotation terms spatially correspond to local expression patterns of images, yet they are assigned collectively to groups of images and it is unknown which term corresponds to which region of which image in the group. In this paper, we address this problem using a new machine learning framework, Multi-Instance Multi-Label (MIML) learning. We first show that the underlying nature of the annotation task is a typical MIML learning problem. Then, we propose two support vector machine algorithms under the MIML framework for the task. Experimental results on the FlyExpress database (a digital library of standardized Drosophila gene expression pattern images) reveal that the exploitation of MIML framework leads to significant performance improvement over state-of-the-art approaches.
KW - Drosophila
KW - Gene expression pattern
KW - image annotation
KW - machine learning
KW - multi-instance multi-label (MIML) learning
KW - support vector machine
UR - http://www.scopus.com/inward/record.url?scp=81455143767&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=81455143767&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2011.73
DO - 10.1109/TCBB.2011.73
M3 - Article
C2 - 21519115
AN - SCOPUS:81455143767
SN - 1545-5963
VL - 9
SP - 98
EP - 112
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 1
M1 - 5753882
ER -