TY - JOUR
T1 - A SNPshot of PubMed to associate genetic variants with drugs, diseases, and adverse reactions
AU - Hakenberg, Jörg
AU - Voronov, Dmitry
AU - Nguyên, Võ Hà
AU - Liang, Shanshan
AU - Anwar, Saadat
AU - Lumpkin, Barry
AU - Leaman, Robert
AU - Tari, Luis
AU - Baral, Chitta
N1 - Funding Information: We kindly acknowledge funding by the National Science Foundation (VN, SL, BL), Science Foundation Arizona (LT, RL), Fulbright International Student Program Russia (DV), and Arizona State University (JH, CB). We would like to thank the anonymous reviewers whose suggestions helped to improve this manuscript.
PY - 2012/10
Y1 - 2012/10
N2 - Motivation: Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the "assumed average." Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic variants, and drug response affected by changes in enzymatic activity. Here, we seek to aid researchers, database curators, and clinicians in their search for relevant information by automatically extracting these data from literature. Approach: We automatically populate a repository of information on genetic variants, relations to drugs, occurrence in sub-populations, and associations with disease. We mine textual data from PubMed abstracts to discover such genotype-phenotype associations, focusing on SNPs that can be associated with variations in drug response. The overall repository covers relations found between genes, variants, alleles, drugs, diseases, adverse drug reactions, populations, and allele frequencies. We cross-reference these data to EntrezGene, PharmGKB, PubChem, and others. Results: The performance regarding entity recognition and relation extraction yields a precision of 90-92% for the major entity types (gene, drug, disease), and 76-84% for relations involving these types. Comparison of our repository to PharmGKB reveals a coverage of 93% of gene-drug associations in PharmGKB and 97% of the gene-variant mappings based on 180,000 PubMed abstracts. Availability: http://bioai4core.fulton.asu.edu/snpshot.
AB - Motivation: Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the "assumed average." Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic variants, and drug response affected by changes in enzymatic activity. Here, we seek to aid researchers, database curators, and clinicians in their search for relevant information by automatically extracting these data from literature. Approach: We automatically populate a repository of information on genetic variants, relations to drugs, occurrence in sub-populations, and associations with disease. We mine textual data from PubMed abstracts to discover such genotype-phenotype associations, focusing on SNPs that can be associated with variations in drug response. The overall repository covers relations found between genes, variants, alleles, drugs, diseases, adverse drug reactions, populations, and allele frequencies. We cross-reference these data to EntrezGene, PharmGKB, PubChem, and others. Results: The performance regarding entity recognition and relation extraction yields a precision of 90-92% for the major entity types (gene, drug, disease), and 76-84% for relations involving these types. Comparison of our repository to PharmGKB reveals a coverage of 93% of gene-drug associations in PharmGKB and 97% of the gene-variant mappings based on 180,000 PubMed abstracts. Availability: http://bioai4core.fulton.asu.edu/snpshot.
KW - Databases
KW - Information extraction
KW - Pharmacogenomics
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=84865981409&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865981409&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2012.04.006
DO - 10.1016/j.jbi.2012.04.006
M3 - Article
C2 - 22564364
AN - SCOPUS:84865981409
SN - 1532-0464
VL - 45
SP - 842
EP - 850
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 5
ER -