The GNAT library for local and remote gene mention normalization

Jörg Hakenberg, Martin Gerner, Maximilian Haeussler, Illés Solt, Conrad Plake, Michael Schroeder, Graciela Gonzalez, Goran Nenadic, Casey M. Bergman

    Research output: Contribution to journalArticlepeer-review

    53 Scopus citations


    Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a Tap-20 score of 0.1987.

    Original languageEnglish (US)
    Article numberbtr455
    Pages (from-to)2769-2771
    Number of pages3
    Issue number19
    StatePublished - Oct 2011

    ASJC Scopus subject areas

    • Statistics and Probability
    • Biochemistry
    • Molecular Biology
    • Computer Science Applications
    • Computational Theory and Mathematics
    • Computational Mathematics


    Dive into the research topics of 'The GNAT library for local and remote gene mention normalization'. Together they form a unique fingerprint.

    Cite this