TY - JOUR
T1 - Performance evaluation of six popular short-read simulators
AU - Milhaven, Mark
AU - Pfeifer, Susanne P.
N1 - Funding Information:
This work was supported by a National Science Foundation CAREER grant to SPP (DEB-2045343). Computations were performed on Arizona State University’s High-Performance Compute Cluster.
Publisher Copyright:
© 2022, The Author(s).
PY - 2023/2
Y1 - 2023/2
AB - High-throughput sequencing data enables the comprehensive study of genomes and the variation therein. Essential for the interpretation of this genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas “gold-standard” empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design—yet, the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. We here compare the performance of six popular short-read simulators—ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim—and discuss important considerations for selecting suitable models for benchmarking.
UR - http://www.scopus.com/inward/record.url?scp=85143593608&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143593608&partnerID=8YFLogxK
U2 - 10.1038/s41437-022-00577-3
DO - 10.1038/s41437-022-00577-3
M3 - Article
C2 - 36496447
AN - SCOPUS:85143593608
SN - 0018-067X
VL - 130
SP - 55
EP - 63
JO - Heredity
JF - Heredity
IS - 2
ER -