An Extended GFfit Statistic Defined on Orthogonal Components of Pearson’s Chi-Square

Mark Reiser; Silvia Cagnone; Junfei Zhu

doi:10.1007/s11336-022-09866-6

Mathematical and Statistical Sciences, School of (SoMSS)

Research output: Contribution to journal › Article › peer-review

Abstract

The Pearson and likelihood ratio statistics are commonly used to test goodness of fit for models applied to data from a multinomial distribution. The goodness-of-fit test based on Pearson’s chi-squared statistic is sometimes considered to be a global test that gives little guidance to the source of poor fit when the null hypothesis is rejected, and it has also been recognized that the global test can often be outperformed in terms of power by focused or directional tests. For the cross-classification of a large number of manifest variables, the GFfit statistic focused on second-order marginals for variable pairs i, j has been proposed as a diagnostic to aid in finding the source of lack of fit after the model has been rejected based on a more global test. When data are from a table formed by the cross-classification of a large number of variables, the common global statistics may also have low power and inaccurate Type I error level due to sparseness in the cells of the table. The sparseness problem is rarely encountered with the GFfit statistic because it is focused on the lower-order marginals. In this paper, a new and extended version of the GFfit statistic is proposed by decomposing the Pearson statistic from the full table into orthogonal components defined on marginal distributions and then defining the new version, GFfit⊥(ij), as a partial sum of these orthogonal components. While the emphasis is on lower-order marginals, GFfit⊥(ij) is also extended to higher-order tables so that the GFfit⊥ statistics sum to the Pearson statistic.
As orthogonal components of the Pearson X² statistic, GFfit⊥(ij) statistics have advantages over other lack-of-fit diagnostics that are currently available for cross-classified tables. First, simulation results show that the GFfit⊥(ij) statistics generally have higher power to detect lack of fit while maintaining good Type I error control, even if the joint frequencies are very sparse. Second, theoretical results establish that the GFfit⊥(ij) statistics have known degrees of freedom and are asymptotically independent with a known joint distribution. This property facilitates less conservative control of the false discovery rate (FDR) or familywise error rate (FWER) in a high-dimensional table, which would produce a large number of bivariate lack-of-fit diagnostics. Third, the GFfit⊥(ij) statistics are computationally stable. The extended GFfit⊥(ij) statistic can be applied to a variety of models for cross-classified tables. An application of the new GFfit statistic as a diagnostic for a latent variable model is presented.
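
The idea of splitting Pearson's X² into orthogonal components defined on marginals can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it assumes a simple (fully specified) null hypothesis rather than the composite null treated in the paper, a toy 2×2×2 table, made-up counts and probabilities, and one particular ordering of the marginals. All variable names are hypothetical. It orthogonalizes the marginal residuals sequentially with a Cholesky factor and checks that the squared components add up to X².

```python
import itertools
import numpy as np

# All 2^3 response patterns for three binary variables (toy table, assumed).
cells = np.array(list(itertools.product([0, 1], repeat=3)))  # shape (8, 3)

# Simple null: cell probabilities are fully specified (illustrative values).
marg = np.array([0.6, 0.5, 0.3])  # assumed P(variable = 1)
pi = np.prod(np.where(cells == 1, marg, 1 - marg), axis=1)

# Made-up observed counts, used only to have something to decompose.
counts = np.array([30, 10, 25, 15, 35, 30, 30, 25], dtype=float)
n = counts.sum()
r = counts / n - pi  # residuals on the cell proportions

# Pearson statistic computed directly from the full table.
x2 = n * np.sum(r**2 / pi)

# Rows of H map cell proportions to marginal proportions:
# first order, then second order, then third order.
rows, labels = [], []
for order in (1, 2, 3):
    for subset in itertools.combinations(range(3), order):
        rows.append(np.all(cells[:, list(subset)] == 1, axis=1).astype(float))
        labels.append(subset)
H = np.array(rows)  # shape (7, 8)

# Covariance of sqrt(n) * proportions under the simple multinomial null.
omega = np.diag(pi) - np.outer(pi, pi)

# Sequential orthogonalization: F is lower triangular with F F' = H omega H'.
F = np.linalg.cholesky(H @ omega @ H.T)
gamma = np.sqrt(n) * np.linalg.solve(F, H @ r)
components = gamma**2  # one squared orthogonal component per marginal row

for label, comp in zip(labels, components):
    print(label, round(float(comp), 4))
print("sum of components:", components.sum(), "Pearson X2:", x2)
```

Because the Cholesky factor is lower triangular, each component is adjusted for the marginals listed before it, so the individual values depend on the ordering of the rows of H while their total does not. In this toy setting the components at positions 3 to 5 correspond to the variable pairs (0, 1), (0, 2), and (1, 2), which is the kind of pairwise partial sum the abstract describes.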

Original language	English (US)
Pages (from-to)	208-240
Number of pages	33
Journal	Psychometrika
Volume	88
Issue number	1
DOIs	https://doi.org/10.1007/s11336-022-09866-6
State	Published - Mar 2023

Keywords

composite null hypothesis
multivariate discrete distribution
orthogonal components
overlapping cells

ASJC Scopus subject areas

General Psychology
Applied Mathematics

Cite this

@article{6cbf14d30a344dc58e1baeb0b4113d8c,

title = "An Extended GFfit Statistic Defined on Orthogonal Components of Pearson{\textquoteright}s Chi-Square",

keywords = "composite null hypothesis, multivariate discrete distribution, orthogonal components, overlapping cells",

author = "Mark Reiser and Silvia Cagnone and Junfei Zhu",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s) under exclusive licence to The Psychometric Society.",

year = "2023",

month = mar,

doi = "10.1007/s11336-022-09866-6",

language = "English (US)",

volume = "88",

pages = "208--240",

journal = "Psychometrika",

issn = "0033-3123",

publisher = "Springer New York",

number = "1",

}

TY - JOUR

T1 - An Extended GFfit Statistic Defined on Orthogonal Components of Pearson’s Chi-Square

AU - Reiser, Mark

AU - Cagnone, Silvia

AU - Zhu, Junfei

PY - 2023/3

Y1 - 2023/3

KW - composite null hypothesis

KW - multivariate discrete distribution

KW - orthogonal components

KW - overlapping cells

UR - http://www.scopus.com/inward/record.url?scp=85131321200&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85131321200&partnerID=8YFLogxK

U2 - 10.1007/s11336-022-09866-6

DO - 10.1007/s11336-022-09866-6

M3 - Article

C2 - 35661291

AN - SCOPUS:85131321200

SN - 0033-3123

VL - 88

SP - 208

EP - 240

JO - Psychometrika

JF - Psychometrika

IS - 1

ER -
