TY - JOUR
T1 - Classifying Refugee Status Using Common Features in EMR
AU - Morrison, Malia
AU - Nobles, Vanessa
AU - Johnson-Agbakwu, Crista E.
AU - Bailey, Celeste
AU - Liu, Li
N1 - Publisher Copyright: © 2022 Wiley-VHCA AG, Zurich, Switzerland.
PY - 2022/10
Y1 - 2022/10
N2 - Automated and accurate identification of refugees in healthcare databases is a critical first step toward investigating the healthcare needs of this vulnerable population and reducing health disparities. In this study, we developed a machine-learning method, named the refugee identification system (RIS), to address this need. We curated a data set consisting of 103 refugees and 930 non-refugees in Arizona. We compiled de-identified individual-level information (age, primary language, and noise-masked home address), together with state-level refugee resettlement statistics and world language statistics. We then performed feature engineering to convert language and masked address into quantitative features. Finally, we built a random forest model to classify refugees and non-refugees. RIS achieved high classification accuracy (overall accuracy=0.97, specificity=0.99, sensitivity=0.85, positive predictive value=0.88, negative predictive value=0.98, and area under receiver operating characteristic curve=0.98). RIS is customizable for refugee identification outside Arizona. Its application enables large-scale investigation of refugee healthcare needs and reduction of health disparities.
AB - Automated and accurate identification of refugees in healthcare databases is a critical first step toward investigating the healthcare needs of this vulnerable population and reducing health disparities. In this study, we developed a machine-learning method, named the refugee identification system (RIS), to address this need. We curated a data set consisting of 103 refugees and 930 non-refugees in Arizona. We compiled de-identified individual-level information (age, primary language, and noise-masked home address), together with state-level refugee resettlement statistics and world language statistics. We then performed feature engineering to convert language and masked address into quantitative features. Finally, we built a random forest model to classify refugees and non-refugees. RIS achieved high classification accuracy (overall accuracy=0.97, specificity=0.99, sensitivity=0.85, positive predictive value=0.88, negative predictive value=0.98, and area under receiver operating characteristic curve=0.98). RIS is customizable for refugee identification outside Arizona. Its application enables large-scale investigation of refugee healthcare needs and reduction of health disparities.
KW - health disparity
KW - health informatics
KW - machine learning
KW - refugee health
UR - http://www.scopus.com/inward/record.url?scp=85138568740&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138568740&partnerID=8YFLogxK
U2 - 10.1002/cbdv.202200651
DO - 10.1002/cbdv.202200651
M3 - Article
C2 - 36050919
AN - SCOPUS:85138568740
SN - 1612-1872
VL - 19
JO - Chemistry and Biodiversity
JF - Chemistry and Biodiversity
IS - 10
M1 - e202200651
ER -