How should gaps be treated in parsimony? A comparison of approaches using simulation

T. Heath Ogden, Michael S. Rosenberg

Research output: Contribution to journal › Article › peer-review

77 Scopus citations


Simulation with indels was used to produce alignments where true site homologies in DNA sequences were known; the gaps from these datasets were removed and the sequences were then aligned to produce hypothesized alignments. Both alignments were then analyzed under three widely used methods of treating gaps during tree reconstruction under the maximum parsimony principle. With the true alignments, for many cases (82%), there was no difference in topological accuracy for the different methods of gap coding. However, in cases where a difference was present, coding gaps as a fifth state character or as separate presence/absence characters outperformed treating gaps as unknown/missing data nearly 90% of the time. For the hypothesized alignments, on average, all gap treatment approaches performed equally well. Data sets with higher sequence divergence and more pectinate tree shapes with variable branch lengths are more affected by gap coding than datasets associated with shallower non-pectinate tree shapes.
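The three gap treatments compared in the abstract can be illustrated on a toy alignment. The sketch below is only an illustration under stated assumptions: the function names, the numeric state labels, and the toy sequences are hypothetical, and the presence/absence scheme shown (one binary character per distinct gap run, defined by its start and end columns) is a simplified variant that may differ from the coding used in the study.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence, '-' marks a gap

# Hypothetical state labels for the fifth-state treatment.
STATE = {"A": "0", "C": "1", "G": "2", "T": "3", "-": "4"}


def gap_runs(seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) columns, inclusive, of each maximal run of '-'."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, ch in enumerate(seq):
        if ch == "-":
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(seq) - 1))
    return runs


def gaps_as_missing(aln: Alignment) -> Alignment:
    """Treatment 1: every gap becomes unknown/missing data ('?')."""
    return {name: seq.replace("-", "?") for name, seq in aln.items()}


def gaps_as_fifth_state(aln: Alignment) -> Alignment:
    """Treatment 2: a gap is a fifth character state alongside A, C, G, T."""
    return {
        name: "".join(STATE.get(ch.upper(), "?") for ch in seq)
        for name, seq in aln.items()
    }


def gaps_as_presence_absence(aln: Alignment) -> Tuple[List[Tuple[int, int]], Alignment]:
    """Treatment 3 (simplified): one binary character per distinct gap run.

    A taxon scores '1' if it has a gap run with exactly those start and end
    columns, else '0'. These characters would be appended to a matrix in
    which the gap positions themselves are scored as missing.
    """
    distinct = sorted({run for seq in aln.values() for run in gap_runs(seq)})
    coded = {}
    for name, seq in aln.items():
        own = set(gap_runs(seq))
        coded[name] = "".join("1" if run in own else "0" for run in distinct)
    return distinct, coded


# Toy alignment (hypothetical data, not from the study).
toy = {
    "t1": "ACGTACGT",
    "t2": "AC--ACGT",
    "t3": "AC--AC-T",
    "t4": "ACGTAC-T",
}

missing = gaps_as_missing(toy)
fifth = gaps_as_fifth_state(toy)
indel_chars, binary = gaps_as_presence_absence(toy)
```

Under the missing-data treatment the shared two-column gap in t2 and t3 contributes no grouping information, whereas the fifth-state treatment counts it once per column and the presence/absence treatment counts it once as a single event.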

Original language: English (US)
Pages (from-to): 817-826
Number of pages: 10
Journal: Molecular Phylogenetics and Evolution
Issue number: 3
State: Published - Mar 2007


Keywords

  • Clustal
  • DNA sequence alignment
  • Evolutionary distance
  • Gap character coding
  • Homology
  • Indel
  • Parsimony
  • Simulation
  • Tree reconstruction

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

