TY - JOUR
T1 - PICS-Ord
T2 - Unlimited coding of ambiguous regions by pairwise identity and cost scores ordination
AU - Lücking, Robert
AU - Hodkinson, Brendan P.
AU - Stamatakis, Alexandros
AU - Cartwright, Reed A.
N1 - Funding Information:
This study was elaborated in the framework of phylogenetic studies of the lichen family Graphidaceae (= Thelotremataceae) and the basidiolichen-containing family Hygrophoraceae. Both studies received support from the National Science Foundation under the titles ‘Phylogeny and Taxonomy of Ostropalean Fungi, with Emphasis on the Lichen-forming Thelotremataceae’ (NSF-DEB 0516116 to The Field Museum; PI H.T. Lumbsch; Co-PI R. Lücking) and ‘Phylogenetic Diversity of Mycobionts and Photobionts in the Cyanolichen Genus Dictyonema, with Emphasis on the Neotropics and the Galapagos Islands’ (NSF-DEB 0841405 to George Mason University, Virginia; PI J. Lawrey, Co-PIs R. Lücking and P. Gillevet). Additional material for these phylogenetic studies that was used in part for the present paper was obtained through the NSF-funded projects ‘TICOLICHEN - The Costa Rican Biodiversity Inventory’ (NSF-DEB 0206125 to The Field Museum; PI R. Lücking) and ‘Neotropical Epiphytic Microlichens - An Innovative Inventory of a Highly Diverse yet Little Known Group of Symbiotic Organisms’ (NSF-DEB 715660 to The Field Museum; PI R. Lücking). Brendan P. Hodkinson was funded by a Graduate Fellowship from the Mycological Society of America and a Doctoral Dissertation Improvement Grant from the National Science Foundation entitled ‘A Phylogenetic Characterization of the Lichen Microbiome’ (NSF-DEB 1011504 to Duke University; PI F. Lutzoni, Co-PI B. P. Hodkinson). Reed Cartwright was funded by NLM grant LM010009-01 to D. Graur and G. Landan. Marti Anderson (Massey University, Auckland, New Zealand) and Bruce McCune (Oregon State University) are warmly thanked for discussions and advice regarding principal coordinates analysis, and François Lutzoni (Duke University) for discussing methods of ambiguous region coding.
Eimy Rivas Plata (University of Illinois-Chicago), Matthew Nelsen (University of Chicago), James Lawrey (George Mason University), and four anonymous reviewers reviewed an earlier version of this manuscript.
PY - 2011/1/7
Y1 - 2011/1/7
AB - Background: We present a novel method, 'Pairwise Identity and Cost Scores Ordination' (PICS-Ord), for encoding ambiguously aligned regions in fixed multiple sequence alignments. The method ordinates matrices of sequence identity or cost scores by means of Principal Coordinates Analysis (PCoA). After ambiguous regions have been identified, the method computes pairwise distances as sequence identities or cost scores, ordinates the resulting distance matrix by PCoA, and encodes the principal coordinates as ordered integers. Three biological and 100 simulated datasets were used to assess the performance of the new method. Results: Including ambiguous regions coded by PICS-Ord increased topological accuracy, resolution, and bootstrap support in both real biological and simulated datasets, compared with the alternative of excluding such regions from the analysis a priori. In terms of accuracy, PICS-Ord performs as well as or better than previously available methods of ambiguous region coding (e.g., INAASE), with the advantages of a practically unlimited alignment size, increased analytical speed, and the possibility of analyzing PICS-Ord scores together with DNA data in a partitioned maximum likelihood model. Conclusions: Advantages of PICS-Ord over step matrix-based ambiguous region coding with INAASE include a practically unlimited number of OTUs, seamless integration of PICS-Ord codes into phylogenetic datasets, and faster phylogenetic analysis. Unlike word- and frequency-based methods, PICS-Ord retains the advantage of using pairwise sequence alignment to derive distances, and the method is flexible with respect to how distance scores are calculated. In addition to distance and maximum parsimony methods, PICS-Ord codes can be analyzed in a Bayesian or maximum likelihood framework. RAxML version 7.2.6 or higher (the version developed for this study) allows up to 32-state ordered or unordered characters. A GTR, MK, or ORDERED model can be applied to the PICS-Ord code partition, with GTR performing slightly better than MK and ORDERED. Availability: An implementation of the PICS-Ord algorithm is available from http://scit.us/projects/ngila/wiki/PICS-Ord. It requires the statistical software R (http://www.r-project.org) and the alignment software Ngila (http://scit.us/projects/ngila).
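The abstract describes three steps: pairwise distances, PCoA ordination of the distance matrix, and coding of the principal coordinates as ordered integers. The following is a minimal, stdlib-only sketch of those steps under stated assumptions; it is not the authors' R/Ngila implementation. Assumptions: unit-cost edit distance stands in for Ngila alignment costs or identity scores; PCoA uses Gower double-centering and a Jacobi eigensolver; each axis is binned into equal-width states (at most 32, matching the RAxML limit quoted above) and written with the hypothetical symbol set 0-9A-V. All function names are hypothetical.

```python
import math

SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # assumed 32-state alphabet

def edit_distance(a, b):
    """Unit-cost Levenshtein distance (stand-in for an alignment cost score)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def pcoa(d, n_axes=2):
    """Classical PCoA: Gower-centre -0.5*d^2, keep axes with positive eigenvalues."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    means = [sum(r) / n for r in a]  # row means equal column means (symmetric)
    grand = sum(means) / n
    b = [[a[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    coords = []
    for i in sorted(range(n), key=lambda i: vals[i], reverse=True)[:n_axes]:
        if vals[i] <= 1e-9:
            break
        coords.append([vecs[k][i] * math.sqrt(vals[i]) for k in range(n)])
    return coords  # one list per axis, one coordinate per sequence

def encode_axis(values, n_states=32):
    """Equal-width binning of one axis into ordered states 0..n_states-1."""
    lo, hi = min(values), max(values)
    if hi - lo < 1e-12:
        return "0" * len(values)
    return "".join(SYMBOLS[round((x - lo) / (hi - lo) * (n_states - 1))] for x in values)

if __name__ == "__main__":
    region = ["ACGTACGT", "ACGTACGA", "ACG-ACGA".replace("-", ""), "TTTTGGGG"]
    d = [[float(edit_distance(x, y)) for y in region] for x in region]
    for axis in pcoa(d):
        print(encode_axis(axis))  # one ordered multistate character per axis
```

Because eigenvector signs are arbitrary, an axis may come out mirrored between runs or tools; for an ordered character this only reverses the state order and does not change the pairwise step distances.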
UR - http://www.scopus.com/inward/record.url?scp=78650935230&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650935230&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-12-10
DO - 10.1186/1471-2105-12-10
M3 - Article
C2 - 21214904
AN - SCOPUS:78650935230
SN - 1471-2105
VL - 12
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 10
ER -