TY - JOUR
T1 - Reliable Trees
T2 - Reliability Informed Recursive Partitioning for Psychological Data
AU - Grimm, Kevin J.
AU - Jacobucci, Ross
N1 - Funding Information:
Illustrative data come from the National Longitudinal Survey of Youth-Children and Young Adults (NLSY-CYA; Center for Human Resource Research, ), which was funded by the U.S. Bureau of Labor Statistics. The NLSY-CYA began in 1986 with children of female respondents to the National Longitudinal Survey of Youth 1979. These participants were assessed every two years. In 2010, participants were administered an 11-item short form of the Center for Epidemiologic Studies-Depression (CES-D) Scale, and in 2012 they were asked about suicidal ideation. In this illustration, we use the CES-D total score and item responses to predict recent suicidal ideation (seriously considered suicide during the past 12 months) for participants who indicated that they seriously considered attempting suicide. Participants with missing values were removed from the analysis, leaving 299 participants who completed the CES-D in 2010 and responded to the suicidal ideation question in 2012.
Publisher Copyright:
© 2020 Taylor & Francis Group, LLC.
PY - 2021
Y1 - 2021
AB - Recursive partitioning, also known as decision trees and classification and regression trees (CART), is a machine learning procedure that has gained traction in the behavioral sciences because of its ability to search for nonlinear and interactive effects and to produce interpretable predictive models. The recursive partitioning algorithm is greedy: it searches for the variable and splitting value that maximize outcome homogeneity. Thus, the algorithm can be overly sensitive to chance associations in the data, particularly in small samples. In an effort to limit chance associations, we propose and evaluate a reliability-based cost function for recursive partitioning. The reliability-based cost function increases the likelihood of selecting more reliable variables, which should have more consistent associations with the outcome of interest. Two reliability-based cost functions are proposed, evaluated through simulation, and compared to the CART algorithm. Results indicate that reliability-based cost functions can be beneficial, particularly with smaller samples and when more reliable variables are important to the prediction, but can overlook important associations between the outcome and lower-reliability predictors. The use of these cost functions is illustrated using data on depression and suicidal ideation from the National Longitudinal Survey of Youth.
KW - CART
KW - Machine learning
KW - reliability
UR - http://www.scopus.com/inward/record.url?scp=85084121164&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084121164&partnerID=8YFLogxK
U2 - 10.1080/00273171.2020.1751028
DO - 10.1080/00273171.2020.1751028
M3 - Article
C2 - 32298157
AN - SCOPUS:85084121164
SN - 0027-3171
VL - 56
SP - 595
EP - 607
JO - Multivariate Behavioral Research
JF - Multivariate Behavioral Research
IS - 4
ER -