Assessing the effect of selection at the amino acid level in malaria antigen sequences through Bayesian generalized linear models

Daniel Merl, Raquel Prado, Ananías A. Escalante

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


We present a statistical approach for identifying residues in DNA sequences for which diversity may be maintained by natural selection. Bayesian generalized linear models (GLMs) are used to describe patterns of mutation in a DNA sequence alignment. Posterior distributions of key quantities, such as probabilities of nonsynonymous and synonymous mutations per site, are studied. Inference in this class of models is achieved through customary Markov chain Monte Carlo methods. Model selection is dealt with by means of a minimum posterior predictive loss approach. We describe how information on the evolutionary process underlying the sequences can be formally incorporated into the models through structured priors. The proposed methodology was designed to analyze several DNA sequences encoding the vaccine candidate apical membrane antigen-1 (AMA-1) of the human malaria parasite Plasmodium falciparum. The study of genetic variability in antigen sequences is relevant to determining whether a particular antigen is a viable target for a vaccine construct. Using a simulation study, we first compare the GLM-based approach to existing methods for detecting sites under selection that are based on stochastic models of sequence evolution. We then apply the proposed models to the AMA-1 sequence data, which allows us to identify residues with the greatest disparities between nonsynonymous and synonymous changes. Recent experimental evidence suggests that several of these residues are immunologically relevant, indicating that the proposed models may be used predictively to identify functionally significant residues in antigens for which experimental results are not yet available.
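The mutation count data that the abstract refers to can be illustrated with a minimal sketch. It is not the authors' model: it only tallies, for each codon site, how many aligned sequences differ from a reference by a synonymous or a nonsynonymous change, which is the kind of per-site summary a GLM could then take as input. The function name `count_changes`, the use of a single reference sequence, and the toy sequences are assumptions made for illustration; the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_changes(reference, sequences):
    """Count synonymous and nonsynonymous codon differences per site.

    Hypothetical helper: each sequence is compared with `reference` codon by
    codon. Codons containing gaps or ambiguous bases are skipped.
    """
    n_sites = len(reference) // 3
    counts = [{"syn": 0, "nonsyn": 0} for _ in range(n_sites)]
    for seq in sequences:
        for i in range(n_sites):
            ref_codon = reference[3 * i:3 * i + 3]
            codon = seq[3 * i:3 * i + 3]
            if codon == ref_codon:
                continue  # no change at this site
            if codon not in CODON_TABLE or ref_codon not in CODON_TABLE:
                continue  # gap or ambiguous base
            if CODON_TABLE[codon] == CODON_TABLE[ref_codon]:
                counts[i]["syn"] += 1      # same amino acid
            else:
                counts[i]["nonsyn"] += 1   # amino acid replaced
    return counts


# Toy alignment (made up for illustration, not AMA-1 data).
reference = "ATGGCTAAA"                     # Met-Ala-Lys
aligned = ["ATGGCCAAA",                     # GCT -> GCC, Ala -> Ala
           "ATGGCTAGA",                     # AAA -> AGA, Lys -> Arg
           "ATGGTTAAA"]                     # GCT -> GTT, Ala -> Val
site_counts = count_changes(reference, aligned)
```

Sites where nonsynonymous counts are large relative to synonymous counts are the ones the abstract describes as having the greatest disparities. The sketch does no statistical modeling; posterior inference and model comparison are outside its scope.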

Original language: English (US)
Pages (from-to): 1496-1507
Number of pages: 12
Journal: Journal of the American Statistical Association
Issue number: 484
State: Published - Dec 2008


Keywords

  • Bayesian generalized linear model
  • DNA sequence data
  • Malaria antigens
  • Model comparison
  • Mutation count data
  • Natural selection
  • Structured priors

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

