DEPP: Deep Learning Enables Extending Species Trees using Single Genes

Yueyu Jiang, Metin Balaban, Qiyun Zhu, Siavash Mirarab

Research output: Contribution to journal, Article, peer-review

7 Scopus citations

Abstract

Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. Existing placement methods assume that query sequences have evolved under specific models directly on the reference phylogeny. For example, they assume single-gene data (e.g., 16S rRNA amplicons) have evolved under the GTR model on a gene tree. Placement, however, often has a more ambitious goal: extending a (genome-wide) species tree given data from individual genes without knowing the evolutionary model. Addressing this challenging problem requires new directions. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without prespecified models. In simulations and on real data, we show that DEPP can match the accuracy of model-based methods without any prior knowledge of the model. We also show that DEPP can update the multilocus microbial tree-of-life with single genes with high accuracy. We further demonstrate that DEPP can combine 16S and metagenomic data onto a single tree, enabling community structure analyses that take advantage of both sources of data.

Original language: English (US)
Pages (from-to): 17-34
Number of pages: 18
Journal: Systematic Biology
Volume: 72
Issue number: 1
State: Published - Jan 1 2023

Keywords

  • Deep learning
  • gene tree discordance
  • metagenomics
  • microbiome analyses
  • neural networks
  • phylogenetic placement

ASJC Scopus subject areas

  • Genetics
  • Ecology, Evolution, Behavior and Systematics
