Employing computational linguistics techniques to identify limited patient health literacy: Findings from the ECLIPPSE study

Dean Schillinger; Renu Balyan; Scott A. Crossley; Danielle S. McNamara; Jennifer Y. Liu; Andrew J. Karter

doi:10.1111/1475-6773.13560

Psychology

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Objective: To develop novel, scalable, and valid literacy profiles that identify patients with limited health literacy by harnessing natural language processing.

Data Source: For the linguistic content, we analyzed 283 216 secure messages sent by 6941 diabetes patients to physicians within an integrated system's electronic portal. Sociodemographic, clinical, and utilization data were obtained via questionnaire and electronic health records.

Study Design: This retrospective study used natural language processing and machine learning to generate five unique “Literacy Profiles” from different sets of linguistic indices: Flesch-Kincaid (LP_FK); basic indices of writing complexity, including lexical diversity (LP_LD) and writing quality (LP_WQ); and advanced indices related to syntactic complexity, lexical sophistication, and diversity, modeled from self-reported (LP_SR) and expert-rated (LP_Exp) health literacy. We first determined how well each literacy profile discriminated between high and low health literacy, relative to self-reported and expert-rated health literacy. We then assessed the Literacy Profiles’ relationships with known correlates of health literacy, such as patient sociodemographics and a range of health-related outcomes, including ratings of physician communication, medication adherence, diabetes control, comorbidities, and utilization.

Principal Findings: LP_SR and LP_Exp performed best in discriminating between high and low self-reported health literacy (C-statistics: 0.86 and 0.58, respectively) and expert-rated health literacy (C-statistics: 0.71 and 0.87, respectively). Both were significantly associated with educational attainment, race/ethnicity, Consumer Assessment of Healthcare Providers and Systems (CAHPS) scores, adherence, glycemia, comorbidities, and emergency department visits.

Conclusions: Because health literacy is a potentially remediable explanatory factor in health care disparities, the development of automated health literacy indicators represents a significant accomplishment with broad clinical and population health applications. Health systems could apply literacy profiles to efficiently determine whether quality of care and outcomes vary by patient health literacy; identify at-risk populations for targeting tailored health communications and self-management support interventions; and inform clinicians to promote improvements in individual-level care.
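
The C-statistics reported under Principal Findings measure discrimination: the probability that a randomly chosen patient from the high health literacy group receives a higher profile score than a randomly chosen patient from the low group, with ties counted as one half. The following is a minimal Python sketch of that definition only. It is not the study's code, and the function name `c_statistic` and the toy scores are illustrative assumptions.

```python
from itertools import product


def c_statistic(scores, labels):
    """Concordance (C) statistic for a binary outcome.

    scores: numeric model outputs, higher meaning "more likely positive".
    labels: 1 for the positive class, 0 for the negative class.
    Hypothetical helper for illustration; not taken from the study.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each class")
    # Compare every positive/negative pair: 1 if the positive case scores
    # higher, 0.5 on a tie, 0 otherwise.
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return concordant / (len(pos) * len(neg))


# Toy data (made up): two positive and two negative cases.
toy_scores = [0.9, 0.4, 0.6, 0.2]
toy_labels = [1, 1, 0, 0]
toy_c = c_statistic(toy_scores, toy_labels)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so for a cohort of the size described here a rank-based routine would be the more practical choice.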

Original language	English (US)
Pages (from-to)	132-144
Number of pages	13
Journal	Health Services Research
Volume	56
Issue number	1
DOIs	https://doi.org/10.1111/1475-6773.13560
State	Published - Feb 2021

Keywords

communication
diabetes
health literacy
machine learning
managed care
natural language processing
secure messaging

ASJC Scopus subject areas

Health Policy

Cite this

@article{6856ac86592d4319acb53aa40e151fed,

title = "Employing computational linguistics techniques to identify limited patient health literacy: Findings from the ECLIPPSE study",

abstract = "Objective: To develop novel, scalable, and valid literacy profiles for identifying limited health literacy patients by harnessing natural language processing. Data Source: With respect to the linguistic content, we analyzed 283 216 secure messages sent by 6941 diabetes patients to physicians within an integrated system's electronic portal. Sociodemographic, clinical, and utilization data were obtained via questionnaire and electronic health records. Study Design: Retrospective study used natural language processing and machine learning to generate five unique “Literacy Profiles” by employing various sets of linguistic indices: Flesch-Kincaid (LP_FK); basic indices of writing complexity, including lexical diversity (LP_LD) and writing quality (LP_WQ); and advanced indices related to syntactic complexity, lexical sophistication, and diversity, modeled from self-reported (LP_SR), and expert-rated (LP_Exp) health literacy. We first determined the performance of each literacy profile relative to self-reported and expert-rated health literacy to discriminate between high and low health literacy and then assessed Literacy Profiles{\textquoteright} relationships with known correlates of health literacy, such as patient sociodemographics and a range of health-related outcomes, including ratings of physician communication, medication adherence, diabetes control, comorbidities, and utilization. Principal Findings: LP_SR and LP_Exp performed best in discriminating between high and low self-reported (C-statistics: 0.86 and 0.58, respectively) and expert-rated health literacy (C-statistics: 0.71 and 0.87, respectively) and were significantly associated with educational attainment, race/ethnicity, Consumer Assessment of Provider and Systems (CAHPS) scores, adherence, glycemia, comorbidities, and emergency department visits. 
Conclusions: Since health literacy is a potentially remediable explanatory factor in health care disparities, the development of automated health literacy indicators represents a significant accomplishment with broad clinical and population health applications. Health systems could apply literacy profiles to efficiently determine whether quality of care and outcomes vary by patient health literacy; identify at-risk populations for targeting tailored health communications and self-management support interventions; and inform clinicians to promote improvements in individual-level care.",

keywords = "communication, diabetes, health literacy, machine learning, managed care, natural language processing, secure messaging",

author = "Dean Schillinger and Renu Balyan and Crossley, {Scott A.} and McNamara, {Danielle S.} and Liu, {Jennifer Y.} and Karter, {Andrew J.}",

note = "Publisher Copyright: {\textcopyright} 2020 The Authors. Health Services Research published by Wiley Periodicals LLC on behalf of Health Research and Educational Trust",

year = "2021",

month = feb,

doi = "10.1111/1475-6773.13560",

language = "English (US)",

volume = "56",

pages = "132--144",

journal = "Health Services Research",

issn = "0017-9124",

publisher = "Wiley-Blackwell",

number = "1",

}

TY - JOUR

T1 - Employing computational linguistics techniques to identify limited patient health literacy

T2 - Findings from the ECLIPPSE study

AU - Schillinger, Dean

AU - Balyan, Renu

AU - Crossley, Scott A.

AU - McNamara, Danielle S.

AU - Liu, Jennifer Y.

AU - Karter, Andrew J.

PY - 2021/2

Y1 - 2021/2

AB - Objective: To develop novel, scalable, and valid literacy profiles for identifying limited health literacy patients by harnessing natural language processing. Data Source: With respect to the linguistic content, we analyzed 283 216 secure messages sent by 6941 diabetes patients to physicians within an integrated system's electronic portal. Sociodemographic, clinical, and utilization data were obtained via questionnaire and electronic health records. Study Design: Retrospective study used natural language processing and machine learning to generate five unique “Literacy Profiles” by employing various sets of linguistic indices: Flesch-Kincaid (LP_FK); basic indices of writing complexity, including lexical diversity (LP_LD) and writing quality (LP_WQ); and advanced indices related to syntactic complexity, lexical sophistication, and diversity, modeled from self-reported (LP_SR), and expert-rated (LP_Exp) health literacy. We first determined the performance of each literacy profile relative to self-reported and expert-rated health literacy to discriminate between high and low health literacy and then assessed Literacy Profiles’ relationships with known correlates of health literacy, such as patient sociodemographics and a range of health-related outcomes, including ratings of physician communication, medication adherence, diabetes control, comorbidities, and utilization. Principal Findings: LP_SR and LP_Exp performed best in discriminating between high and low self-reported (C-statistics: 0.86 and 0.58, respectively) and expert-rated health literacy (C-statistics: 0.71 and 0.87, respectively) and were significantly associated with educational attainment, race/ethnicity, Consumer Assessment of Provider and Systems (CAHPS) scores, adherence, glycemia, comorbidities, and emergency department visits. 
Conclusions: Since health literacy is a potentially remediable explanatory factor in health care disparities, the development of automated health literacy indicators represents a significant accomplishment with broad clinical and population health applications. Health systems could apply literacy profiles to efficiently determine whether quality of care and outcomes vary by patient health literacy; identify at-risk populations for targeting tailored health communications and self-management support interventions; and inform clinicians to promote improvements in individual-level care.

KW - communication

KW - diabetes

KW - health literacy

KW - machine learning

KW - managed care

KW - natural language processing

KW - secure messaging

UR - http://www.scopus.com/inward/record.url?scp=85091296602&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85091296602&partnerID=8YFLogxK

U2 - 10.1111/1475-6773.13560

DO - 10.1111/1475-6773.13560

M3 - Article

C2 - 32966630

AN - SCOPUS:85091296602

SN - 0017-9124

VL - 56

SP - 132

EP - 144

JO - Health Services Research

JF - Health Services Research

IS - 1

ER -
