TY - JOUR
T1 - Biological workflow with BlastQuest
AU - Farmerie, William G.
AU - Hammer, Joachim
AU - Liu, Li
AU - Sahni, Anuj
AU - Schneider, Markus
N1 - Funding Information:
In this paper we have described BlastQuest, a Web-based and interactive tool for importing and persistently storing genomic data from multiple BLAST queries in a relational database, applying DBMS functionality for processing and querying these data, and visualizing them appropriately. In addition, BlastQuest can connect sequence identities inferred from BLAST results with gene-associated biological functions described through the efforts of the Gene Ontology (GO) Consortium and with pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG). This type of cross-referencing has been shown to be an ideal way to describe the functionality of a newly discovered gene and helps biologists annotate and catalogue genes in a way that is universally accepted. BlastQuest is supported by the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida and has been successfully employed and tested by scientists on campus and their collaborators around the world for over eighteen months. We are now developing the next-generation BlastQuest, which addresses the limitations of our existing concept, mainly with respect to the need for a more expressive and extensible representation and data model, tools to support the browsing and integration of external repositories, and a richer and more intuitive query language that can be extended with new analytical functions and that can take advantage of the new data model.
William G. Farmerie is an Assistant Director of the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida. He received his B.S. degree in Biological Science from Florida State University in 1973, and his Ph.D. degree in Biomedical Sciences from the University of Tennessee in 1980. Prior to joining the University of Florida, Dr. Farmerie held Research Associate positions at the University of Michigan and at the University of North Carolina at Chapel Hill. He is the Director of the ICBR large-scale DNA sequencing facility and the ICBR bioinformatics group, which provide genomics-based services to research scientists throughout the University of Florida.
Joachim Hammer is an Associate Professor in the Department of Computer and Information Science and Engineering at the University of Florida. He received his B.S. degree in Computer Science and Applied Mathematics from the University of Rochester in 1988, and his M.S. and Ph.D. degrees in Computer Science from the University of Southern California in 1990 and 1994. Prior to joining the University of Florida, Dr. Hammer was a Research Scientist in the database group at Stanford University. His areas of research are heterogeneous and semi-structured information systems, federated databases, data warehousing, knowledge engineering, and data management for biological databases. Dr. Hammer is a member of ACM and IEEE. Since August 2001, he has been the elected secretary and treasurer of the Association for Computing Machinery’s Special Interest Group on Management of Data (SIGMOD).
Li Liu has an M.D. degree from the Peking Union Medical College in China and an M.S. degree in Computer Science from the New Jersey Institute of Technology. Dr. Liu is currently a senior bioinformatician at the ICBR at the University of Florida. Her responsibility is to conduct computational analysis of sequence data and microarray data. Prior to joining ICBR, she worked on a biological database integration project at a biotechnology company.
Anuj Sahni received an M.S. in Electrical and Computer Engineering from the University of Florida in 2002. He is currently working as a Bioinformatics Software Engineer in the Interdisciplinary Center for Biotechnology Research (ICBR), where he is responsible for designing and implementing high-throughput bioinformatics software tools, including BlastQuest.
Markus Schneider received his Diploma degree in Computer Science from the University of Dortmund, Germany, in 1990, and his Ph.D. degree (Dr. rer. nat.) in Computer Science from the University of Hagen, Germany, in 1995. After that, until 2001, he was a research assistant (lecturer) at the University of Hagen. Since 2002, he has been an Assistant Professor at the University of Florida, Gainesville, FL, USA. His research interests are spatial, spatio-temporal, fuzzy, and genomics databases. In particular, he focuses on the design and implementation of spatial data types (geo-relational algebra, ROSE algebra, realms), spatio-temporal data types (moving objects), fuzzy spatial objects, and spatial and spatio-temporal partitions and networks, as well as on the design and implementation of data structures and geometric algorithms for these topics.
PY - 2005/4
Y1 - 2005/4
N2 - Besides domain-specific biological problems, biologists are confronted with many computational problems. The large amount of varying, heterogeneous, and semi-structured biological data, the increasing complexity of biological applications, methods, and tools afflicted with uncertainty and missing knowledge, and the lack of interoperability among available tools necessitate integrative measures to enable biological workflow. In this paper we address these problems in the context of the processing and evaluation of BLAST query results. We present a new tool, called BlastQuest, which relies on database technology and provides sophisticated interactive and Web-enabled query, analysis, and visualization facilities for genomics data. The interface with the Gene Ontology and the KEGG pathway databases decisively fosters the biological workflow. Finally, based on our experience with BlastQuest, we briefly sketch a new concept, called Genomics Algebra, for solving genomic data management problems from a broader perspective.
AB - Besides domain-specific biological problems, biologists are confronted with many computational problems. The large amount of varying, heterogeneous, and semi-structured biological data, the increasing complexity of biological applications, methods, and tools afflicted with uncertainty and missing knowledge, and the lack of interoperability among available tools necessitate integrative measures to enable biological workflow. In this paper we address these problems in the context of the processing and evaluation of BLAST query results. We present a new tool, called BlastQuest, which relies on database technology and provides sophisticated interactive and Web-enabled query, analysis, and visualization facilities for genomics data. The interface with the Gene Ontology and the KEGG pathway databases decisively fosters the biological workflow. Finally, based on our experience with BlastQuest, we briefly sketch a new concept, called Genomics Algebra, for solving genomic data management problems from a broader perspective.
KW - BLAST
KW - Gene Ontology
KW - Genomics Algebra
KW - KEGG pathway database
KW - Unifying database
UR - http://www.scopus.com/inward/record.url?scp=10444279210&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=10444279210&partnerID=8YFLogxK
U2 - 10.1016/S0169-023X(04)00113-2
DO - 10.1016/S0169-023X(04)00113-2
M3 - Article
AN - SCOPUS:10444279210
SN - 0169-023X
VL - 53
SP - 75
EP - 97
JO - Data and Knowledge Engineering
JF - Data and Knowledge Engineering
IS - 1
ER -