Mining peripheral arterial disease cases from narrative clinical notes using natural language processing

Naveed Afzal, Sunghwan Sohn, Sara Abram, Christopher G. Scott, Rajeev Chaudhry, Hongfang D Liu, Iftikhar Jan Kullo, Adelaide M Arruda-Olson

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

Objective: Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm with billing code algorithms, using ankle-brachial index test results as the gold standard.

Methods: We compared the performance of the NLP algorithm to (1) results of gold standard ankle-brachial index; (2) previously validated algorithms based on relevant International Classification of Diseases, Ninth Revision diagnostic codes (simple model); and (3) a combination of International Classification of Diseases, Ninth Revision codes with procedural codes (full model). A dataset of 1569 patients with PAD and controls was randomly divided into training (n = 935) and testing (n = 634) subsets.

Results: We iteratively refined the NLP algorithm in the training set including narrative note sections, note types, and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP, 91.8%; full model, 81.8%; simple model, 83%; P < .001), positive predictive value (NLP, 92.9%; full model, 74.3%; simple model, 79.9%; P < .001), and specificity (NLP, 92.5%; full model, 64.2%; simple model, 75.9%; P < .001).

Conclusions: A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support.
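
The accuracy, positive predictive value, and specificity reported in the abstract are standard confusion-matrix measures of a classifier against a gold standard. The Python sketch below shows how such measures are typically computed. It is an illustration only, not the authors' code: the class and function names are hypothetical, and the example counts are invented, because the abstract does not report the underlying true/false positive and negative counts. The only figures taken from the abstract are the dataset sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Classifier calls tallied against a gold standard (hypothetical helper)."""
    tp: int  # flagged as PAD, gold standard agrees
    fp: int  # flagged as PAD, gold standard disagrees
    tn: int  # flagged as control, gold standard agrees
    fn: int  # flagged as control, gold standard disagrees

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def accuracy(c: ConfusionCounts) -> float:
    # Share of all patients classified correctly.
    return (c.tp + c.tn) / c.total


def positive_predictive_value(c: ConfusionCounts) -> float:
    # Share of positive calls that are true cases.
    return c.tp / (c.tp + c.fp)


def specificity(c: ConfusionCounts) -> float:
    # Share of true controls that are called negative.
    return c.tn / (c.tn + c.fp)


# Dataset sizes stated in the abstract: training + testing = full dataset.
TRAIN_N, TEST_N, TOTAL_N = 935, 634, 1569
assert TRAIN_N + TEST_N == TOTAL_N

# HYPOTHETICAL counts for illustration only; these are not study data.
example = ConfusionCounts(tp=90, fp=10, tn=80, fn=20)

if __name__ == "__main__":
    print(f"accuracy    = {accuracy(example):.3f}")
    print(f"PPV         = {positive_predictive_value(example):.3f}")
    print(f"specificity = {specificity(example):.3f}")
```

Positive predictive value and specificity both penalize false positives, which is consistent with the abstract reporting its largest gaps between the NLP algorithm and the billing code models on those two measures.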

Original language: English (US)
Journal: Journal of Vascular Surgery
DOIs: 10.1016/j.jvs.2016.11.031
State: Accepted/In press - Aug 10 2016

Fingerprint

Natural Language Processing
Peripheral Arterial Disease
Ankle Brachial Index
International Classification of Diseases
Clinical Decision Support Systems
Electronic Health Records
Lower Extremity

ASJC Scopus subject areas

  • Surgery
  • Cardiology and Cardiovascular Medicine

Cite this

@article{6a9eff9ab83d44bbaf0b2a8e0debe99e,
title = "Mining peripheral arterial disease cases from narrative clinical notes using natural language processing",
abstract = "Objective: Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm with billing code algorithms, using ankle-brachial index test results as the gold standard. Methods: We compared the performance of the NLP algorithm to (1) results of gold standard ankle-brachial index; (2) previously validated algorithms based on relevant International Classification of Diseases, Ninth Revision diagnostic codes (simple model); and (3) a combination of International Classification of Diseases, Ninth Revision codes with procedural codes (full model). A dataset of 1569 patients with PAD and controls was randomly divided into training (n = 935) and testing (n = 634) subsets. Results: We iteratively refined the NLP algorithm in the training set including narrative note sections, note types, and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP, 91.8{\%}; full model, 81.8{\%}; simple model, 83{\%}; P < .001), positive predictive value (NLP, 92.9{\%}; full model, 74.3{\%}; simple model, 79.9{\%}; P < .001), and specificity (NLP, 92.5{\%}; full model, 64.2{\%}; simple model, 75.9{\%}; P < .001). Conclusions: A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support.",
author = "Naveed Afzal and Sunghwan Sohn and Sara Abram and Scott, {Christopher G.} and Rajeev Chaudhry and Liu, {Hongfang D} and Kullo, {Iftikhar Jan} and Arruda-Olson, {Adelaide M}",
year = "2016",
month = "8",
day = "10",
doi = "10.1016/j.jvs.2016.11.031",
language = "English (US)",
journal = "Journal of Vascular Surgery",
issn = "0741-5214",
publisher = "Mosby Inc.",

}

TY - JOUR
T1 - Mining peripheral arterial disease cases from narrative clinical notes using natural language processing
AU - Afzal, Naveed
AU - Sohn, Sunghwan
AU - Abram, Sara
AU - Scott, Christopher G.
AU - Chaudhry, Rajeev
AU - Liu, Hongfang D
AU - Kullo, Iftikhar Jan
AU - Arruda-Olson, Adelaide M
PY - 2016/8/10
Y1 - 2016/8/10

AB - Objective: Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm with billing code algorithms, using ankle-brachial index test results as the gold standard. Methods: We compared the performance of the NLP algorithm to (1) results of gold standard ankle-brachial index; (2) previously validated algorithms based on relevant International Classification of Diseases, Ninth Revision diagnostic codes (simple model); and (3) a combination of International Classification of Diseases, Ninth Revision codes with procedural codes (full model). A dataset of 1569 patients with PAD and controls was randomly divided into training (n = 935) and testing (n = 634) subsets. Results: We iteratively refined the NLP algorithm in the training set including narrative note sections, note types, and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP, 91.8%; full model, 81.8%; simple model, 83%; P < .001), positive predictive value (NLP, 92.9%; full model, 74.3%; simple model, 79.9%; P < .001), and specificity (NLP, 92.5%; full model, 64.2%; simple model, 75.9%; P < .001). Conclusions: A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support.

UR - http://www.scopus.com/inward/record.url?scp=85011903621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85011903621&partnerID=8YFLogxK
U2 - 10.1016/j.jvs.2016.11.031
DO - 10.1016/j.jvs.2016.11.031
M3 - Article
JO - Journal of Vascular Surgery
JF - Journal of Vascular Surgery
SN - 0741-5214
ER -