Genetic differentiation between and within Northern Native American language groups: an argument for the expansion of the Native American CODIS database

Jessica A. Weise; Jillian Ng; Robert F. Oldt; Joy Viray; Kelly L. McCulloh; David Glenn Smith; Sreetharan Kanthaswamy

doi:10.1080/20961790.2021.1963088

Research output: Contribution to journal › Article › peer-review

Abstract

The National Research Council recommends that genetic differentiation among subgroups of an ethnic sample account for less than 3% of the total genetic variation within that sample if the sample is to be used to estimate reliable random match probabilities for forensic use. Native American samples in the United States’ Combined DNA Index System (CODIS) database represent four language families: Algonquian, Na-Dene, Eskimo-Aleut, and Salishan. However, at least 27 Native American language families exist in the US, not including language isolates. Our goal was to determine whether genetic differences correlate with language groupings and, if so, whether additional language families would give a more accurate representation of current genetic diversity among tribal populations. The 21 short tandem repeat (STR) loci included in the GlobalFiler® PCR Amplification Kit were used to characterize six indigenous language families, including three of the four represented in the CODIS database (i.e. Algonquian, Na-Dene, and Eskimo-Aleut), and two language isolates (Miwok and Seri). The analyses used standard population genetic diversity metrics, such as F statistics, and Bayesian clustering analysis of genotype frequencies. Most of the genetic variation (97%) occurred within language families rather than among them (3%). In contrast, when only the three language families represented in both the CODIS database and the present study were considered, 4% of the genetic variation occurred among language groups. The maximum posterior probability from Bayesian clustering indicated three genetically distinct groups among the eight language families and isolates: (1) Eskimo, (2) Seri, and (3) all other language groups and isolates. This result confirms genetic subdivision among subgroups of the CODIS Native American database.
This genetic structure indicates the need for more Native American populations, defined by language affiliation, in the CODIS database, as well as more robust sample sets for those language families. Supplemental data for this article are available online at https://doi.org/10.1080/20961790.2021.1963088.
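The 97%/3% split reported in the abstract is a partition of genetic variation into within-group and among-group components. To illustrate what such a partition measures, the Python sketch below computes Nei's G_ST, a simple F_ST analogue, for a single STR locus: G_ST = (H_T - H_S) / H_T, where H_S is the mean within-group expected heterozygosity and H_T is the expected heterozygosity of the pooled allele frequencies. This is an assumed stand-in, not the authors' method (the abstract reports F statistics without naming an estimator). Groups are weighted equally, and the allele frequencies are hypothetical, not data from the study.

```python
"""Sketch: Nei's G_ST at a single STR locus.

Assumption: Nei's G_ST stands in for the F statistics named in the abstract.
All allele frequencies below are hypothetical, not data from the study.
"""
from typing import Dict, List


def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """Gene diversity H = 1 - sum(p_i^2) for one group at one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(groups: List[Dict[str, float]]) -> float:
    """Share of total gene diversity found among groups (groups weighted equally)."""
    k = len(groups)
    # H_S: mean gene diversity within groups
    h_s = sum(expected_heterozygosity(g) for g in groups) / k
    # Pooled frequencies: unweighted mean across groups (absent allele counts as 0)
    alleles = set().union(*groups)
    pooled = {a: sum(g.get(a, 0.0) for g in groups) / k for a in alleles}
    # H_T: gene diversity of the pooled population
    h_t = expected_heterozygosity(pooled)
    if h_t == 0.0:
        # Every group fixed for the same allele: no diversity to partition
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical repeat-count allele frequencies for two groups at one locus
    group_a = {"14": 0.5, "15": 0.3, "16": 0.2}
    group_b = {"14": 0.4, "15": 0.4, "16": 0.2}
    gst = nei_gst([group_a, group_b])
    # prints: among groups: 0.79%, within groups: 99.21%
    print(f"among groups: {gst:.2%}, within groups: {1 - gst:.2%}")
```

Multi-locus estimates such as those in the abstract combine many loci (for example, by averaging H_S and H_T over loci before taking the ratio) and often use sample-size-corrected estimators; this sketch omits both.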

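The NRC threshold matters because random match probabilities are built from allele frequencies, and population substructure makes the simple product-rule genotype frequencies (p^2 and 2pq) anticonservative. One widely used correction, adopted in the NRC's 1996 report on forensic DNA evidence and following Balding and Nichols, adjusts each genotype probability with a coancestry coefficient theta; values of 0.01 and, for small isolated populations, 0.03 are commonly used. The Python sketch below applies that correction and multiplies across loci. It assumes independent loci, and the three-locus profile and its allele frequencies are hypothetical.

```python
"""Sketch: theta-corrected random match probability across STR loci.

Assumptions: Balding-Nichols / NRC II theta correction, independence across
loci (product rule), and hypothetical allele frequencies (not study data).
"""
from math import prod
from typing import List, Optional, Tuple

# One locus: (p, q) allele frequencies; q is None for a homozygote (p, p).
Locus = Tuple[float, Optional[float]]


def genotype_match_prob(p: float, q: Optional[float], theta: float) -> float:
    """Theta-corrected match probability for one single-locus genotype."""
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:
        # Homozygote: [2t + (1-t)p][3t + (1-t)p] / [(1+t)(1+2t)]
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # Heterozygote: 2[t + (1-t)p][t + (1-t)q] / [(1+t)(1+2t)]
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom


def random_match_prob(profile: List[Locus], theta: float) -> float:
    """Multiply per-locus match probabilities (assumes loci are independent)."""
    return prod(genotype_match_prob(p, q, theta) for p, q in profile)


if __name__ == "__main__":
    # Hypothetical three-locus profile: two heterozygotes and one homozygote
    profile: List[Locus] = [(0.10, 0.20), (0.15, None), (0.05, 0.30)]
    for theta in (0.0, 0.01, 0.03):
        print(f"theta={theta:.2f}: RMP = {random_match_prob(profile, theta):.3e}")
```

With theta = 0 each term reduces to the product-rule value (p^2 or 2pq); larger theta values raise the match probability, so the estimate becomes more conservative when subgroups within a reference sample differ genetically.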
Original language	English (US)
Pages (from-to)	662-672
Number of pages	11
Journal	Forensic Sciences Research
Volume	7
Issue number	4
DOIs	https://doi.org/10.1080/20961790.2021.1963088
State	Published - 2022
Externally published	Yes

Keywords

Forensic sciences
Native Americans
North America
languages
population genetics
short tandem repeats (STRs or microsatellites)

ASJC Scopus subject areas

Analytical Chemistry
Pathology and Forensic Medicine
Biochemistry, Genetics and Molecular Biology (miscellaneous)
Anthropology
Physical and Theoretical Chemistry
Psychiatry and Mental health


Cite this

@article{f2ae663d1f6448658a23975205305660,

title = "Genetic differentiation between and within Northern Native American language groups: an argument for the expansion of the Native American CODIS database",

abstract = "The National Research Council recommends that genetic differentiation among subgroups of ethnic samples be lower than 3% of the total genetic differentiation within the ethnic sample to be used for estimating reliable random match probabilities for forensic use. Native American samples in the United States{\textquoteright} Combined DNA Index System (CODIS) database represent four language families: Algonquian, Na-Dene, Eskimo-Aleut, and Salishan. However, a minimum of 27 Native American language families exists in the US, not including language isolates. Our goal was to ascertain whether genetic differences are correlated with language groupings and, if so, whether additional language families would provide a more accurate representation of current genetic diversity among tribal populations. The 21 short tandem repeat (STR) loci included in the Globalfiler{\textregistered} PCR Amplification Kit were used to characterize six indigenous language families, including three of the four represented in the CODIS database (i.e. Algonquian, Na-Dene, and Eskimo-Aleut), and two language isolates (Miwok and Seri) using major population genetic diversity metrics such as F statistics and Bayesian clustering analysis of genotype frequencies. Most of the genetic variation (97%) was found to be within language families instead of among them (3%). In contrast, when only the three of the four language families represented in both the CODIS database and the present study were considered, 4% of the genetic variation occurred among the language groups. Bayesian clustering resulted in a maximum posterior probability indicating three genetically distinct groups among the eight language families and isolates: (1) Eskimo, (2) Seri, and (3) all other language groups and isolates, thus confirming genetic subdivision among subgroups of the CODIS Native American database. 
This genetic structure indicates the need for an increased number of Native American populations based on language affiliation in the CODIS database as well as more robust sample sets for those language families. Supplemental data for this article is available online at https://doi.org/10.1080/20961790.2021.1963088.",

keywords = "Forensic sciences, Native Americans, North America, languages, population genetics, short tandem repeats (STRs or microsatellites)",

author = "Weise, {Jessica A.} and Jillian Ng and Oldt, {Robert F.} and Joy Viray and McCulloh, {Kelly L.} and Smith, {David Glenn} and Sreetharan Kanthaswamy",

note = "Publisher Copyright: {\textcopyright} 2021 The Author(s). Published by Taylor & Francis Group on behalf of the Academy of Forensic Science.",

year = "2022",

doi = "10.1080/20961790.2021.1963088",

language = "English (US)",

volume = "7",

pages = "662--672",

journal = "Forensic Sciences Research",

issn = "2096-1790",

publisher = "Taylor and Francis Ltd.",

number = "4",

}

TY  - JOUR
T1  - Genetic differentiation between and within Northern Native American language groups
T2  - an argument for the expansion of the Native American CODIS database
AU  - Weise, Jessica A.
AU  - Ng, Jillian
AU  - Oldt, Robert F.
AU  - Viray, Joy
AU  - McCulloh, Kelly L.
AU  - Smith, David Glenn
AU  - Kanthaswamy, Sreetharan
PY  - 2022
Y1  - 2022
AB  - The National Research Council recommends that genetic differentiation among subgroups of ethnic samples be lower than 3% of the total genetic differentiation within the ethnic sample to be used for estimating reliable random match probabilities for forensic use. Native American samples in the United States’ Combined DNA Index System (CODIS) database represent four language families: Algonquian, Na-Dene, Eskimo-Aleut, and Salishan. However, a minimum of 27 Native American language families exists in the US, not including language isolates. Our goal was to ascertain whether genetic differences are correlated with language groupings and, if so, whether additional language families would provide a more accurate representation of current genetic diversity among tribal populations. The 21 short tandem repeat (STR) loci included in the Globalfiler® PCR Amplification Kit were used to characterize six indigenous language families, including three of the four represented in the CODIS database (i.e. Algonquian, Na-Dene, and Eskimo-Aleut), and two language isolates (Miwok and Seri) using major population genetic diversity metrics such as F statistics and Bayesian clustering analysis of genotype frequencies. Most of the genetic variation (97%) was found to be within language families instead of among them (3%). In contrast, when only the three of the four language families represented in both the CODIS database and the present study were considered, 4% of the genetic variation occurred among the language groups. Bayesian clustering resulted in a maximum posterior probability indicating three genetically distinct groups among the eight language families and isolates: (1) Eskimo, (2) Seri, and (3) all other language groups and isolates, thus confirming genetic subdivision among subgroups of the CODIS Native American database. This genetic structure indicates the need for an increased number of Native American populations based on language affiliation in the CODIS database as well as more robust sample sets for those language families. Supplemental data for this article is available online at https://doi.org/10.1080/20961790.2021.1963088.
KW  - Forensic sciences
KW  - Native Americans
KW  - North America
KW  - languages
KW  - population genetics
KW  - short tandem repeats (STRs or microsatellites)
UR  - http://www.scopus.com/inward/record.url?scp=85115223400&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85115223400&partnerID=8YFLogxK
U2  - 10.1080/20961790.2021.1963088
DO  - 10.1080/20961790.2021.1963088
M3  - Article
AN  - SCOPUS:85115223400
SN  - 2096-1790
VL  - 7
SP  - 662
EP  - 672
JO  - Forensic Sciences Research
JF  - Forensic Sciences Research
IS  - 4
ER  - 