Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants

Li Liu; Sudhir Kumar

doi:10.1093/molbev/mst037

Center for Evolution and Medicine

Research output: Contribution to journal (peer-reviewed article)

16 Scopus citations

Abstract

Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013).
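The balancing idea in the abstract (equal numbers of positive and negative controls within each evolutionary conservation class), the per-class false-positive and false-negative rates it is meant to reduce, and the concordance measure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout (conservation and label keys), the class names "high" and "low", the use of random downsampling to equalize counts, and the helper names evolutionarily_balance, per_class_error_rates and concordance_fraction are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical record layout (an assumption, not the paper's data format):
#   "conservation": evolutionary conservation class of the site, e.g. "high" or "low"
#   "label": 1 for a disease-associated (positive) control, 0 for a neutral (negative) control


def evolutionarily_balance(variants, seed=0):
    """Equalize positives and negatives within every conservation class.

    Assumes random downsampling of the larger group; a class lacking
    either label contributes nothing to the balanced set.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: {0: [], 1: []})
    for v in variants:
        groups[v["conservation"]][v["label"]].append(v)
    balanced = []
    for cls in sorted(groups):
        pos, neg = groups[cls][1], groups[cls][0]
        n = min(len(pos), len(neg))
        balanced.extend(rng.sample(pos, n))
        balanced.extend(rng.sample(neg, n))
    return balanced


def per_class_error_rates(variants, predict):
    """Return {class: (false-positive rate, false-negative rate)}.

    predict maps a variant to 0 or 1; a rate is None when the class has no
    negatives (for the FPR) or no positives (for the FNR).
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for v in variants:
        c = counts[v["conservation"]]
        call = predict(v)
        if v["label"] == 1:
            c["pos"] += 1
            c["fn"] += int(call == 0)
        else:
            c["neg"] += 1
            c["fp"] += int(call == 1)
    return {
        cls: (c["fp"] / c["neg"] if c["neg"] else None,
              c["fn"] / c["pos"] if c["pos"] else None)
        for cls, c in counts.items()
    }


def concordance_fraction(calls_per_variant):
    """Fraction of variants on which every method gives the same binary call."""
    if not calls_per_variant:
        return 0.0
    agree = sum(1 for calls in calls_per_variant if len(set(calls)) == 1)
    return agree / len(calls_per_variant)


def conservation_only(v):
    """Toy predictor that calls every variant at a conserved site damaging."""
    return 1 if v["conservation"] == "high" else 0


if __name__ == "__main__":
    # Toy data only: conserved sites dominated by positives, fast-evolving sites by negatives.
    toy = (
        [{"conservation": "high", "label": 1} for _ in range(8)]
        + [{"conservation": "high", "label": 0} for _ in range(2)]
        + [{"conservation": "low", "label": 1} for _ in range(1)]
        + [{"conservation": "low", "label": 0} for _ in range(5)]
    )
    print(len(evolutionarily_balance(toy)))
    # Shows the biases the abstract describes for a conservation-driven model:
    # false positives at conserved sites, false negatives at fast-evolving sites.
    print(per_class_error_rates(toy, conservation_only))
    print(concordance_fraction([(1, 1), (1, 0), (0, 0), (0, 1)]))
```

Downsampling discards training examples; reweighting each class-label cell is an alternative way to reach the same balance, and the sketch does not claim which of the two the authors used.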

Original language	English (US)
Pages (from-to)	1252-1257
Number of pages	6
Journal	Molecular Biology and Evolution
Volume	30
Issue number	6
DOIs	https://doi.org/10.1093/molbev/mst037
State	Published - Jun 2013

Keywords

computational prediction
evolutionary medicine
nonsynonymous single nucleotide variant

ASJC Scopus subject areas

Ecology, Evolution, Behavior and Systematics
Molecular Biology
Genetics

Cite this

@article{2615504c22de42d38eb6ef8cebe7d86e,
  title = "Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants",
  abstract = "Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013).",
  keywords = "computational prediction, evolutionary medicine, nonsynonymous single nucleotide variant",
  author = "Li Liu and Sudhir Kumar",
  year = "2013",
  month = jun,
  doi = "10.1093/molbev/mst037",
  language = "English (US)",
  volume = "30",
  pages = "1252--1257",
  journal = "Molecular Biology and Evolution",
  issn = "0737-4038",
  publisher = "Oxford University Press",
  number = "6",
}

TY  - JOUR
T1  - Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants
AU  - Liu, Li
AU  - Kumar, Sudhir
PY  - 2013/6
Y1  - 2013/6
N2  - Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013).
AB  - Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013).
KW  - computational prediction
KW  - evolutionary medicine
KW  - nonsynonymous single nucleotide variant
UR  - http://www.scopus.com/inward/record.url?scp=84877766852&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84877766852&partnerID=8YFLogxK
U2  - 10.1093/molbev/mst037
DO  - 10.1093/molbev/mst037
M3  - Article
C2  - 23462317
AN  - SCOPUS:84877766852
SN  - 0737-4038
VL  - 30
SP  - 1252
EP  - 1257
JO  - Molecular Biology and Evolution
JF  - Molecular Biology and Evolution
IS  - 6
ER  - 