Discovering peripheral arterial disease cases from radiology notes using natural language processing

Guergana K. Savova; Jin Fan; Zi Ye; Sean P. Murphy; Jiaping Zheng; Christopher G. Chute; Iftikhar J. Kullo

Cardiovascular Medicine

Research output: Contribution to journal › Article › peer-review

43 Scopus citations

Abstract

As part of the Electronic Medical Records and Genomics Network, we applied, extended, and evaluated an open-source clinical natural language processing system, Mayo's Clinical Text Analysis and Knowledge Extraction System, for discovering peripheral arterial disease cases from radiology reports. The manually created gold standard consisted of 223 positive, 19 negative, 63 probable, and 150 unknown cases. Overall accuracy of the system against the gold standard was 0.93, compared with 0.46 for a named entity recognition baseline. Sensitivity was 0.93-0.96 for the positive, probable, and unknown cases and 0.72 for the negative cases. Specificity and negative predictive value were in the 0.90s for all categories. Positive predictive value was in the high 0.90s for the positive and unknown categories, 0.84 for the negative category, and 0.63 for the probable category. We outline the main sources of errors and suggest improvements.
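
The per-category figures in the abstract (sensitivity, specificity, positive and negative predictive value) are one-vs-rest metrics over a four-way classification. As a minimal sketch of how such figures can be derived from a confusion matrix, the Python below uses a small made-up matrix. The counts, the `per_class_metrics` and `overall_accuracy` helper names, and the row/column convention (rows are gold-standard labels, columns are system labels) are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List

# The four categories named in the abstract.
LABELS = ["positive", "negative", "probable", "unknown"]

# Hypothetical counts for illustration only; NOT the study's data.
# Rows: gold-standard label. Columns: system-assigned label.
TOY_MATRIX = [
    [8, 0, 2, 0],
    [1, 3, 0, 1],
    [0, 0, 4, 1],
    [0, 1, 0, 9],
]


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of cases where the system label equals the gold label."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return _ratio(correct, total)


def per_class_metrics(
    matrix: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Compute one-vs-rest sensitivity, specificity, PPV, and NPV per label."""
    total = sum(sum(row) for row in matrix)
    results: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                        # gold = label, system = label
        fn = sum(matrix[i]) - tp                 # gold = label, system = other
        fp = sum(row[i] for row in matrix) - tp  # gold = other, system = label
        tn = total - tp - fn - fp                # neither gold nor system = label
        results[label] = {
            "sensitivity": _ratio(tp, tp + fn),
            "specificity": _ratio(tn, tn + fp),
            "ppv": _ratio(tp, tp + fp),
            "npv": _ratio(tn, tn + fn),
        }
    return results


if __name__ == "__main__":
    print(f"accuracy: {overall_accuracy(TOY_MATRIX):.2f}")
    for label, scores in per_class_metrics(TOY_MATRIX, LABELS).items():
        formatted = ", ".join(f"{k}={v:.2f}" for k, v in scores.items())
        print(f"{label}: {formatted}")
```

With a small category such as the 19 negative cases in the gold standard, a handful of misclassifications moves sensitivity considerably, which is one reason per-category figures are reported alongside overall accuracy.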

Original language	English (US)
Pages (from-to)	722-726
Number of pages	5
Journal	AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
Volume	2010
State	Published - 2010

ASJC Scopus subject areas

General Medicine

Cite this

@article{9c19e3572f0f4299a6f30c0cb650c2aa,
  title = "Discovering peripheral arterial disease cases from radiology notes using natural language processing",
  abstract = "As part of the Electronic Medical Records and Genomics Network, we applied, extended and evaluated an open source clinical Natural Language Processing system, Mayo's Clinical Text Analysis and Knowledge Extraction System, for the discovery of peripheral arterial disease cases from radiology reports. The manually created gold standard consisted of 223 positive, 19 negative, 63 probable and 150 unknown cases. Overall accuracy agreement between the system and the gold standard was 0.93 as compared to a named entity recognition baseline of 0.46. Sensitivity for the positive, probable and unknown cases was 0.93-0.96, and for the negative cases was 0.72. Specificity and negative predictive value for all categories were in the 90's. The positive predictive value for the positive and unknown categories was in the high 90's, for the negative category was 0.84, and for the probable category was 0.63. We outline the main sources of errors and suggest improvements.",
  author = "Savova, {Guergana K.} and Jin Fan and Zi Ye and Murphy, {Sean P.} and Jiaping Zheng and Chute, {Christopher G.} and Kullo, {Iftikhar J.}",
  year = "2010",
  language = "English (US)",
  volume = "2010",
  pages = "722--726",
  journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
  issn = "1559-4076",
  publisher = "American Medical Informatics Association",
}

TY - JOUR
T1 - Discovering peripheral arterial disease cases from radiology notes using natural language processing
AU - Savova, Guergana K.
AU - Fan, Jin
AU - Ye, Zi
AU - Murphy, Sean P.
AU - Zheng, Jiaping
AU - Chute, Christopher G.
AU - Kullo, Iftikhar J.
PY - 2010
Y1 - 2010
AB - As part of the Electronic Medical Records and Genomics Network, we applied, extended and evaluated an open source clinical Natural Language Processing system, Mayo's Clinical Text Analysis and Knowledge Extraction System, for the discovery of peripheral arterial disease cases from radiology reports. The manually created gold standard consisted of 223 positive, 19 negative, 63 probable and 150 unknown cases. Overall accuracy agreement between the system and the gold standard was 0.93 as compared to a named entity recognition baseline of 0.46. Sensitivity for the positive, probable and unknown cases was 0.93-0.96, and for the negative cases was 0.72. Specificity and negative predictive value for all categories were in the 90's. The positive predictive value for the positive and unknown categories was in the high 90's, for the negative category was 0.84, and for the probable category was 0.63. We outline the main sources of errors and suggest improvements.

UR - http://www.scopus.com/inward/record.url?scp=84964959775&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964959775&partnerID=8YFLogxK
M3 - Article
C2 - 21347073
AN - SCOPUS:84964959775
SN - 1559-4076
VL - 2010
SP - 722
EP - 726
JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
ER -
