TY - JOUR
T1 - Mining peripheral arterial disease cases from narrative clinical notes using natural language processing
AU - Afzal, Naveed
AU - Sohn, Sunghwan
AU - Abram, Sara
AU - Scott, Christopher G.
AU - Chaudhry, Rajeev
AU - Liu, Hongfang
AU - Kullo, Iftikhar J.
AU - Arruda-Olson, Adelaide M.
N1 - Funding Information:
Research reported in this publication was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health (award K01HL124045) and the NHGRI eMERGE (Electronic Records and Genomics) Network grants HG04599 and HG006379. This study was made possible using the resources of the Rochester Epidemiology Project supported by the National Institute on Aging of the National Institutes of Health (award R01AG034676) and the NLP framework established through the NIGMS award R01GM102283A1. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright:
© 2016 The Authors
PY - 2017/6
Y1 - 2017/6
AB - Objective: Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm with billing code algorithms, using ankle-brachial index test results as the gold standard. Methods: We compared the performance of the NLP algorithm to (1) results of gold standard ankle-brachial index; (2) previously validated algorithms based on relevant International Classification of Diseases, Ninth Revision diagnostic codes (simple model); and (3) a combination of International Classification of Diseases, Ninth Revision codes with procedural codes (full model). A dataset of 1569 patients with PAD and controls was randomly divided into training (n = 935) and testing (n = 634) subsets. Results: We iteratively refined the NLP algorithm in the training set including narrative note sections, note types, and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP, 91.8%; full model, 81.8%; simple model, 83%; P < .001), positive predictive value (NLP, 92.9%; full model, 74.3%; simple model, 79.9%; P < .001), and specificity (NLP, 92.5%; full model, 64.2%; simple model, 75.9%; P < .001). Conclusions: A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support.
UR - http://www.scopus.com/inward/record.url?scp=85011903621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85011903621&partnerID=8YFLogxK
U2 - 10.1016/j.jvs.2016.11.031
DO - 10.1016/j.jvs.2016.11.031
M3 - Article
C2 - 28189359
AN - SCOPUS:85011903621
SN - 0741-5214
VL - 65
SP - 1753
EP - 1761
JO - Journal of Vascular Surgery
JF - Journal of Vascular Surgery
IS - 6
ER -