TY - JOUR
T1 - The GNAT library for local and remote gene mention normalization
AU - Hakenberg, Jörg
AU - Gerner, Martin
AU - Haeussler, Maximilian
AU - Solt, Illés
AU - Plake, Conrad
AU - Schroeder, Michael
AU - Gonzalez, Graciela
AU - Nenadic, Goran
AU - Bergman, Casey M.
N1 - Funding Information:
Funding: Biotechnology and Biological Sciences Research Council (CASE studentship to M.G., grant BB/G000093/1 to C.M.B., G.N.); the European Commission (grant HEALTH-F4-2008-223210 to C.M.B.); German Academic Exchange Service (DAAD) to I.S.
PY - 2011/10
Y1 - 2011/10
AB - Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a Tap-20 score of 0.1987.
UR - http://www.scopus.com/inward/record.url?scp=80053441509&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053441509&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btr455
DO - 10.1093/bioinformatics/btr455
M3 - Article
C2 - 21813477
AN - SCOPUS:80053441509
SN - 1367-4803
VL - 27
SP - 2769
EP - 2771
JO - Bioinformatics
JF - Bioinformatics
IS - 19
M1 - btr455
ER -