PopInf: An Approach for Reproducibly Visualizing and Assigning Population Affiliation in Genomic Samples of Uncertain Origin

Angela M. Taravella Oill; Anagha J. Deshpande; Heini M. Natri; Melissa A. Wilson

doi:10.1089/cmb.2019.0434

School of Life Sciences (SOLS)

Research output: Contribution to journal, peer-reviewed article

2 Scopus citations

Abstract

Germline genetic variation contributes to cancer etiology, but self-reported race is not always consistent with genetic ancestry, and samples may not have identifying ancestry information. In this study, we describe a flexible computational pipeline, PopInf, to visualize principal component analysis output and assign ancestry to samples with unknown genetic ancestry, given a reference population panel of known origins. PopInf is implemented as a reproducible workflow in Snakemake with a tutorial on GitHub. We provide a preprocessed reference population panel that can be quickly and efficiently applied in cancer genetics studies. We ran PopInf on The Cancer Genome Atlas (TCGA) liver cancer data and identified discrepancies between reported race and inferred genetic ancestry. The PopInf pipeline facilitates visualization and identification of genetic ancestry across samples, so that this ancestry can be accounted for in studies of disease risk.
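The abstract describes assigning ancestry to unlabelled samples from principal component analysis output, given a labelled reference panel, and then comparing the result with reported race. The sketch below illustrates one simple way such a step could work; it is not PopInf's code. It assumes PC1/PC2 coordinates have already been computed for reference and study samples together, assigns each study sample to the nearest reference-population centroid, and leaves it unassigned if it lies farther than a hypothetical cutoff of 3 root-mean-square spreads from that centroid. The function names, the cutoff, the population codes, the race-to-population mapping, and all coordinates are illustrative assumptions, not values from the paper.

```python
"""Toy nearest-centroid ancestry assignment on precomputed PC coordinates.

Hypothetical sketch, not PopInf's implementation: the cutoff rule, names,
population codes and race-to-population mapping are illustrative assumptions.
"""
from collections import defaultdict
from math import dist, sqrt
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]  # (PC1, PC2) coordinates of one sample


def population_centroids(
    reference: Iterable[Tuple[str, float, float]],
) -> Dict[str, Tuple[Point, float]]:
    """Group labelled reference samples by population and return, per population,
    its centroid in PC1/PC2 space and the root-mean-square distance of its
    samples from that centroid (used below as a spread measure)."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for population, pc1, pc2 in reference:
        groups[population].append((pc1, pc2))
    result: Dict[str, Tuple[Point, float]] = {}
    for population, points in groups.items():
        n = len(points)
        centre = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
        spread = sqrt(sum(dist(p, centre) ** 2 for p in points) / n)
        result[population] = (centre, spread)
    return result


def assign_ancestry(
    sample: Point,
    centroids: Dict[str, Tuple[Point, float]],
    max_spreads: float = 3.0,  # hypothetical cutoff, in units of population spread
) -> str:
    """Return the population whose centroid is nearest to `sample`, or
    "Unassigned" if even the nearest centroid is more than `max_spreads`
    spreads away."""
    population, (centre, spread) = min(
        centroids.items(), key=lambda item: dist(sample, item[1][0])
    )
    if dist(sample, centre) > max_spreads * spread:
        return "Unassigned"
    return population


def find_discrepancies(
    samples: Dict[str, Point],
    reported: Dict[str, str],
    expected_population: Dict[str, str],
    centroids: Dict[str, Tuple[Point, float]],
) -> List[Tuple[str, str, str]]:
    """List (sample_id, reported_race, inferred_population) for every sample whose
    inferred population differs from the one its reported race maps to.
    Samples with no reported race, or a race missing from the mapping, are skipped."""
    flagged = []
    for sample_id, pcs in samples.items():
        race = reported.get(sample_id)
        expected = expected_population.get(race) if race is not None else None
        if expected is None:
            continue
        inferred = assign_ancestry(pcs, centroids)
        if inferred != expected:
            flagged.append((sample_id, race, inferred))
    return flagged


# Toy, made-up PC coordinates (not real data) for three reference populations.
TOY_REFERENCE = [
    ("AFR", 0.10, 0.00), ("AFR", 0.12, 0.01), ("AFR", 0.11, -0.01),
    ("EUR", -0.05, 0.08), ("EUR", -0.06, 0.07), ("EUR", -0.04, 0.09),
    ("EAS", -0.05, -0.08), ("EAS", -0.06, -0.07), ("EAS", -0.04, -0.09),
]
TOY_SAMPLES = {"S1": (0.105, 0.005), "S2": (-0.05, 0.075), "S3": (0.0, 0.0)}
TOY_REPORTED = {"S1": "White", "S2": "White", "S3": "Asian"}
TOY_MAPPING = {"White": "EUR", "Black or African American": "AFR", "Asian": "EAS"}

if __name__ == "__main__":
    cents = population_centroids(TOY_REFERENCE)
    for sid, pcs in TOY_SAMPLES.items():
        print(sid, assign_ancestry(pcs, cents))
    print(find_discrepancies(TOY_SAMPLES, TOY_REPORTED, TOY_MAPPING, cents))
```

On the toy data, S1 is assigned to AFR, S2 to EUR, and S3 (which sits between clusters) is left unassigned; S1 and S3 are then flagged because neither matches the population their reported race maps to. The spread cutoff is one way to avoid forcing admixed or off-panel samples into a reference population; a real analysis would likely use more principal components and a cutoff chosen for the specific panel.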

Original language	English (US)
Pages (from-to)	296-303
Number of pages	8
Journal	Journal of Computational Biology
Volume	28
Issue number	3
DOIs	https://doi.org/10.1089/cmb.2019.0434
State	Published - Mar 2021

Keywords

cancer GWAS
computational pipeline
population ancestry
principal component analysis
visualization

ASJC Scopus subject areas

Modeling and Simulation
Molecular Biology
Genetics
Computational Mathematics
Computational Theory and Mathematics

Cite this

@article{fd4263af3823413389c0f5d4d2686c01,
  title = "{PopInf}: An Approach for Reproducibly Visualizing and Assigning Population Affiliation in Genomic Samples of Uncertain Origin",
  abstract = "Germline genetic variation contributes to cancer etiology, but self-reported race is not always consistent with genetic ancestry, and samples may not have identifying ancestry information. In this study, we describe a flexible computational pipeline, PopInf, to visualize principal component analysis output and assign ancestry to samples with unknown genetic ancestry, given a reference population panel of known origins. PopInf is implemented as a reproducible workflow in Snakemake with a tutorial on GitHub. We provide a preprocessed reference population panel that can be quickly and efficiently applied in cancer genetics studies. We ran PopInf on The Cancer Genome Atlas (TCGA) liver cancer data and identified discrepancies between reported race and inferred genetic ancestry. The PopInf pipeline facilitates visualization and identification of genetic ancestry across samples, so that this ancestry can be accounted for in studies of disease risk.",
  keywords = "cancer GWAS, computational pipeline, population ancestry, principal component analysis, visualization",
  author = "Oill, {Angela M. Taravella} and Deshpande, {Anagha J.} and Natri, {Heini M.} and Wilson, {Melissa A.}",
  year = "2021",
  month = mar,
  doi = "10.1089/cmb.2019.0434",
  language = "English (US)",
  volume = "28",
  pages = "296--303",
  journal = "Journal of Computational Biology",
  issn = "1066-5277",
  publisher = "Mary Ann Liebert Inc.",
  number = "3",
}

TY  - JOUR
T1  - PopInf: An Approach for Reproducibly Visualizing and Assigning Population Affiliation in Genomic Samples of Uncertain Origin
AU  - Oill, Angela M. Taravella
AU  - Deshpande, Anagha J.
AU  - Natri, Heini M.
AU  - Wilson, Melissa A.
PY  - 2021/3
Y1  - 2021/3
N2  - Germline genetic variation contributes to cancer etiology, but self-reported race is not always consistent with genetic ancestry, and samples may not have identifying ancestry information. In this study, we describe a flexible computational pipeline, PopInf, to visualize principal component analysis output and assign ancestry to samples with unknown genetic ancestry, given a reference population panel of known origins. PopInf is implemented as a reproducible workflow in Snakemake with a tutorial on GitHub. We provide a preprocessed reference population panel that can be quickly and efficiently applied in cancer genetics studies. We ran PopInf on The Cancer Genome Atlas (TCGA) liver cancer data and identified discrepancies between reported race and inferred genetic ancestry. The PopInf pipeline facilitates visualization and identification of genetic ancestry across samples, so that this ancestry can be accounted for in studies of disease risk.
AB  - Germline genetic variation contributes to cancer etiology, but self-reported race is not always consistent with genetic ancestry, and samples may not have identifying ancestry information. In this study, we describe a flexible computational pipeline, PopInf, to visualize principal component analysis output and assign ancestry to samples with unknown genetic ancestry, given a reference population panel of known origins. PopInf is implemented as a reproducible workflow in Snakemake with a tutorial on GitHub. We provide a preprocessed reference population panel that can be quickly and efficiently applied in cancer genetics studies. We ran PopInf on The Cancer Genome Atlas (TCGA) liver cancer data and identified discrepancies between reported race and inferred genetic ancestry. The PopInf pipeline facilitates visualization and identification of genetic ancestry across samples, so that this ancestry can be accounted for in studies of disease risk.
KW  - cancer GWAS
KW  - computational pipeline
KW  - population ancestry
KW  - principal component analysis
KW  - visualization
UR  - http://www.scopus.com/inward/record.url?scp=85102123927&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85102123927&partnerID=8YFLogxK
U2  - 10.1089/cmb.2019.0434
DO  - 10.1089/cmb.2019.0434
M3  - Article
C2  - 33074720
AN  - SCOPUS:85102123927
SN  - 1066-5277
VL  - 28
SP  - 296
EP  - 303
JO  - Journal of Computational Biology
JF  - Journal of Computational Biology
IS  - 3
ER  - 
