A Data-Driven Iterative Approach for Semi-automatically Assessing the Correctness of Medication Value Sets: A Proof of Concept Based on Opioids

Linyi Li; Adela Grando; Abeed Sarker

doi:10.1055/s-0041-1740358

A Data-Driven Iterative Approach for Semi-automatically Assessing the Correctness of Medication Value Sets: A Proof of Concept Based on Opioids

Linyi Li, Adela Grando, Abeed Sarker

Health Solutions, College of (CHS)

Research output: Contribution to journal › Article › peer-review

Abstract

Background Value sets are lists of terms (e.g., opioid medication names) and their corresponding codes from standard clinical vocabularies (e.g., RxNorm) created with the intent of supporting health information exchange and research. Value sets are manually-created and often exhibit errors. Objectives The aim of the study is to develop a semi-automatic, data-centric natural language processing (NLP) method to assess medication-related value set correctness and evaluate it on a set of opioid medication value sets. Methods We developed an NLP algorithm that utilizes value sets containing mostly true positives and true negatives to learn lexical patterns associated with the true positives, and then employs these patterns to identify potential errors in unseen value sets. We evaluated the algorithm on a set of opioid medication value sets, using the recall, precision and F ₁-score metrics. We applied the trained model to assess the correctness of unseen opioid value sets based on recall. To replicate the application of the algorithm in real-world settings, a domain expert manually conducted error analysis to identify potential system and value set errors. Results Thirty-eight value sets were retrieved from the Value Set Authority Center, and six (two opioid, four non-opioid) were used to develop and evaluate the system. Average precision, recall, and F ₁-score were 0.932, 0.904, and 0.909, respectively on uncorrected value sets; and 0.958, 0.953, and 0.953, respectively after manual correction of the same value sets. On 20 unseen opioid value sets, the algorithm obtained average recall of 0.89. Error analyses revealed that the main sources of system misclassifications were differences in how opioids were coded in the value sets-while the training value sets had generic names mostly, some of the unseen value sets had new trade names and ingredients. Conclusion The proposed approach is data-centric, reusable, customizable, and not resource intensive. It may help domain experts to easily validate value sets.

Original language	English (US)
Pages (from-to)	E111-E119
Journal	Methods of Information in Medicine
Volume	60
DOIs	https://doi.org/10.1055/s-0041-1740358
State	Published - Dec 1 2021

Keywords

medication value sets
natural language processing
opioids
prescription drugs

ASJC Scopus subject areas

Health Informatics
Advanced and Specialized Nursing
Health Information Management

Access to Document

10.1055/s-0041-1740358

Cite this

@article{f317a4b33af0484ea2808f7d3fca4eda,

title = "A Data-Driven Iterative Approach for Semi-automatically Assessing the Correctness of Medication Value Sets: A Proof of Concept Based on Opioids",

abstract = "Background Value sets are lists of terms (e.g., opioid medication names) and their corresponding codes from standard clinical vocabularies (e.g., RxNorm) created with the intent of supporting health information exchange and research. Value sets are manually-created and often exhibit errors. Objectives The aim of the study is to develop a semi-automatic, data-centric natural language processing (NLP) method to assess medication-related value set correctness and evaluate it on a set of opioid medication value sets. Methods We developed an NLP algorithm that utilizes value sets containing mostly true positives and true negatives to learn lexical patterns associated with the true positives, and then employs these patterns to identify potential errors in unseen value sets. We evaluated the algorithm on a set of opioid medication value sets, using the recall, precision and F 1-score metrics. We applied the trained model to assess the correctness of unseen opioid value sets based on recall. To replicate the application of the algorithm in real-world settings, a domain expert manually conducted error analysis to identify potential system and value set errors. Results Thirty-eight value sets were retrieved from the Value Set Authority Center, and six (two opioid, four non-opioid) were used to develop and evaluate the system. Average precision, recall, and F 1-score were 0.932, 0.904, and 0.909, respectively on uncorrected value sets; and 0.958, 0.953, and 0.953, respectively after manual correction of the same value sets. On 20 unseen opioid value sets, the algorithm obtained average recall of 0.89. Error analyses revealed that the main sources of system misclassifications were differences in how opioids were coded in the value sets-while the training value sets had generic names mostly, some of the unseen value sets had new trade names and ingredients. Conclusion The proposed approach is data-centric, reusable, customizable, and not resource intensive. It may help domain experts to easily validate value sets.",

keywords = "medication value sets, natural language processing, opioids, prescription drugs",

author = "Linyi Li and Adela Grando and Abeed Sarker",

year = "2021",

month = dec,

day = "1",

doi = "10.1055/s-0041-1740358",

language = "English (US)",

volume = "60",

pages = "E111--E119",

journal = "Methods of Information in Medicine",

issn = "0026-1270",

publisher = "Schattauer GmbH",

}

TY - JOUR

T1 - A Data-Driven Iterative Approach for Semi-automatically Assessing the Correctness of Medication Value Sets

T2 - A Proof of Concept Based on Opioids

AU - Li, Linyi

AU - Grando, Adela

AU - Sarker, Abeed

PY - 2021/12/1

Y1 - 2021/12/1

N2 - Background Value sets are lists of terms (e.g., opioid medication names) and their corresponding codes from standard clinical vocabularies (e.g., RxNorm) created with the intent of supporting health information exchange and research. Value sets are manually-created and often exhibit errors. Objectives The aim of the study is to develop a semi-automatic, data-centric natural language processing (NLP) method to assess medication-related value set correctness and evaluate it on a set of opioid medication value sets. Methods We developed an NLP algorithm that utilizes value sets containing mostly true positives and true negatives to learn lexical patterns associated with the true positives, and then employs these patterns to identify potential errors in unseen value sets. We evaluated the algorithm on a set of opioid medication value sets, using the recall, precision and F 1-score metrics. We applied the trained model to assess the correctness of unseen opioid value sets based on recall. To replicate the application of the algorithm in real-world settings, a domain expert manually conducted error analysis to identify potential system and value set errors. Results Thirty-eight value sets were retrieved from the Value Set Authority Center, and six (two opioid, four non-opioid) were used to develop and evaluate the system. Average precision, recall, and F 1-score were 0.932, 0.904, and 0.909, respectively on uncorrected value sets; and 0.958, 0.953, and 0.953, respectively after manual correction of the same value sets. On 20 unseen opioid value sets, the algorithm obtained average recall of 0.89. Error analyses revealed that the main sources of system misclassifications were differences in how opioids were coded in the value sets-while the training value sets had generic names mostly, some of the unseen value sets had new trade names and ingredients. Conclusion The proposed approach is data-centric, reusable, customizable, and not resource intensive. It may help domain experts to easily validate value sets.

AB - Background Value sets are lists of terms (e.g., opioid medication names) and their corresponding codes from standard clinical vocabularies (e.g., RxNorm) created with the intent of supporting health information exchange and research. Value sets are manually-created and often exhibit errors. Objectives The aim of the study is to develop a semi-automatic, data-centric natural language processing (NLP) method to assess medication-related value set correctness and evaluate it on a set of opioid medication value sets. Methods We developed an NLP algorithm that utilizes value sets containing mostly true positives and true negatives to learn lexical patterns associated with the true positives, and then employs these patterns to identify potential errors in unseen value sets. We evaluated the algorithm on a set of opioid medication value sets, using the recall, precision and F 1-score metrics. We applied the trained model to assess the correctness of unseen opioid value sets based on recall. To replicate the application of the algorithm in real-world settings, a domain expert manually conducted error analysis to identify potential system and value set errors. Results Thirty-eight value sets were retrieved from the Value Set Authority Center, and six (two opioid, four non-opioid) were used to develop and evaluate the system. Average precision, recall, and F 1-score were 0.932, 0.904, and 0.909, respectively on uncorrected value sets; and 0.958, 0.953, and 0.953, respectively after manual correction of the same value sets. On 20 unseen opioid value sets, the algorithm obtained average recall of 0.89. Error analyses revealed that the main sources of system misclassifications were differences in how opioids were coded in the value sets-while the training value sets had generic names mostly, some of the unseen value sets had new trade names and ingredients. Conclusion The proposed approach is data-centric, reusable, customizable, and not resource intensive. It may help domain experts to easily validate value sets.

KW - medication value sets

KW - natural language processing

KW - opioids

KW - prescription drugs

UR - http://www.scopus.com/inward/record.url?scp=85122982255&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85122982255&partnerID=8YFLogxK

U2 - 10.1055/s-0041-1740358

DO - 10.1055/s-0041-1740358

M3 - Article

C2 - 34965602

AN - SCOPUS:85122982255

SN - 0026-1270

VL - 60

SP - E111-E119

JO - Methods of Information in Medicine

JF - Methods of Information in Medicine

ER -

A Data-Driven Iterative Approach for Semi-automatically Assessing the Correctness of Medication Value Sets: A Proof of Concept Based on Opioids

Abstract

Keywords

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this