TY - JOUR
T1 - Multiple sequence alignment accuracy and phylogenetic inference
AU - Ogden, T. Heath
AU - Rosenberg, Michael S.
N1 - Funding Information: Thanks to D. Morrison, K. Kjer, and an anonymous reviewer for comments and suggestions on an earlier version of this manuscript. This work was partially supported by NIH grant R03-LM008637 (MSR) and Arizona State University.
PY - 2006/4
Y1 - 2006/4
N2 - Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.
KW - Bayesian
KW - Maximum likelihood
KW - Maximum parsimony
KW - Multiple sequence alignment
KW - Neighbor joining
KW - Phylogenetics
KW - Simulation
KW - Tree reconstruction
UR - http://www.scopus.com/inward/record.url?scp=33744993430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33744993430&partnerID=8YFLogxK
U2 - 10.1080/10635150500541730
DO - 10.1080/10635150500541730
M3 - Article
C2 - 16611602
AN - SCOPUS:33744993430
SN - 1063-5157
VL - 55
SP - 314
EP - 328
JO - Systematic Biology
JF - Systematic Biology
IS - 2
ER -