Electronic medical records for clinical research: Application to the identification of heart failure

Serguei Pakhomov; Susan A. Weston; Steven J. Jacobsen; Christopher G. Chute; Ryan Meverden; Veronique L. Roger

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: To identify patients with heart failure (HF) by using language contained in the electronic medical record (EMR).

Methods: We validated 2 methods of identifying HF through the EMR, which offers transcription of clinical notes within 24 hours or less of the encounter. The first method was natural language processing (NLP) of the EMR text. The second method was predictive modeling based on machine learning, using the text of clinical reports. Natural language processing was compared with both manual record review and billing records. Predictive modeling was compared with manual record review.

Results: Natural language processing identified 2904 HF cases; billing records independently identified 1684 HF cases, 252 (15%) of them not identified by NLP. Review of a random sample of these 252 cases did not identify HF, yielding 100% sensitivity (95% confidence interval [CI] = 86, 100) and 97.8% specificity (95% CI = 97.7, 97.9) for NLP. Manual review confirmed 1107 of the 2904 cases identified by NLP, yielding a positive predictive value (PPV) of 38% (95% CI = 36, 40). Predictive modeling yielded a PPV of 82% (95% CI = 73, 93), 56% sensitivity (95% CI = 46, 67), and 96% specificity (95% CI = 94, 99).

Conclusions: The EMR can be used to identify HF via 2 complementary approaches. Natural language processing may be more suitable for studies requiring highest sensitivity, whereas predictive modeling may be more suitable for studies requiring higher PPV.
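
The two proportions in the Results that are quoted with their raw counts can be checked directly. The following is a minimal sketch, not the authors' analysis: the helper name `proportion_with_wald_ci` is hypothetical, and the normal-approximation (Wald) interval is an assumption, since the abstract does not say how its confidence intervals were computed.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) for a binomial proportion.

    Uses a normal-approximation (Wald) interval. This is an assumed
    method; the abstract does not state how its CIs were derived.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts quoted in the abstract.
ppv, ppv_lo, ppv_hi = proportion_with_wald_ci(1107, 2904)  # NLP cases confirmed by manual review
billing_only_share = 252 / 1684  # billing-identified cases not identified by NLP

print(f"NLP PPV: {ppv:.1%} (95% CI {ppv_lo:.1%} to {ppv_hi:.1%})")
print(f"Billing cases not identified by NLP: {billing_only_share:.1%}")
```

Under this assumption the values round to the figures reported above: a PPV of 38% (36, 40) and a 15% share of billing cases missed by NLP. The sensitivity and specificity figures cannot be recomputed this way because the abstract does not give their underlying counts.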

Original language	English (US)
Pages (from-to)	281-288
Number of pages	8
Journal	American Journal of Managed Care
Volume	13
Issue number	6 I
State	Published - Jun 2007

ASJC Scopus subject areas

Health Policy

Cite this

@article{100239486d6d43b897210c41f4db49b9,

title = "Electronic medical records for clinical research: Application to the identification of heart failure",

abstract = "Objective: To identify patients with heart failure (HF) by using language contained in the electronic medical record (EMR). Methods: We validated 2 methods of identifying HF through the EMR, which offers transcription of clinical notes within 24 hours or less of the encounter. The first method was natural language processing (NLP) of the EMR text. The second method was predictive modeling based on machine learning, using the text of clinical reports. Natural language processing was compared with both manual record review and billing records. Predictive modeling was compared with manual record review. Results: Natural language processing identified 2904 HF cases; billing records independently identified 1684 HF cases, 252 (15%) of them not identified by NLP. Review of a random sample of these 252 cases did not identify HF, yielding 100% sensitivity (95% confidence interval [CI] = 86, 100) and 97.8% specificity (95% CI = 97.7, 97.9) for NLP. Manual review confirmed 1107 of the 2904 cases identified by NLP, yielding a positive predictive value (PPV) of 38% (95% CI = 36, 40). Predictive modeling yielded a PPV of 82% (95% CI = 73,93), 56% sensitivity (95% CI = 46, 67), and 96% specificity (95% CI = 94, 99). Conclusions: The EMR can be used to identify HF via 2 complementary approaches. Natural language processing may be more suitable for studies requiring highest sensitivity, whereas predictive modeling may be more suitable for studies requiring higher PPV.",

author = "Serguei Pakhomov and Weston, {Susan A.} and Jacobsen, {Steven J.} and Chute, {Christopher G.} and Ryan Meverden and Roger, {Veronique L.}",

year = "2007",

month = jun,

language = "English (US)",

volume = "13",

pages = "281--288",

journal = "American Journal of Managed Care",

issn = "1088-0224",

publisher = "Ascend Media",

number = "6 I",

}

TY - JOUR

T1 - Electronic medical records for clinical research

T2 - Application to the identification of heart failure

AU - Pakhomov, Serguei

AU - Weston, Susan A.

AU - Jacobsen, Steven J.

AU - Chute, Christopher G.

AU - Meverden, Ryan

AU - Roger, Veronique L.

PY - 2007/6

Y1 - 2007/6

AB - Objective: To identify patients with heart failure (HF) by using language contained in the electronic medical record (EMR). Methods: We validated 2 methods of identifying HF through the EMR, which offers transcription of clinical notes within 24 hours or less of the encounter. The first method was natural language processing (NLP) of the EMR text. The second method was predictive modeling based on machine learning, using the text of clinical reports. Natural language processing was compared with both manual record review and billing records. Predictive modeling was compared with manual record review. Results: Natural language processing identified 2904 HF cases; billing records independently identified 1684 HF cases, 252 (15%) of them not identified by NLP. Review of a random sample of these 252 cases did not identify HF, yielding 100% sensitivity (95% confidence interval [CI] = 86, 100) and 97.8% specificity (95% CI = 97.7, 97.9) for NLP. Manual review confirmed 1107 of the 2904 cases identified by NLP, yielding a positive predictive value (PPV) of 38% (95% CI = 36, 40). Predictive modeling yielded a PPV of 82% (95% CI = 73,93), 56% sensitivity (95% CI = 46, 67), and 96% specificity (95% CI = 94, 99). Conclusions: The EMR can be used to identify HF via 2 complementary approaches. Natural language processing may be more suitable for studies requiring highest sensitivity, whereas predictive modeling may be more suitable for studies requiring higher PPV.

UR - http://www.scopus.com/inward/record.url?scp=34250895016&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34250895016&partnerID=8YFLogxK

M3 - Article

C2 - 17567225

AN - SCOPUS:34250895016

SN - 1088-0224

VL - 13

SP - 281

EP - 288

JO - American Journal of Managed Care

JF - American Journal of Managed Care

IS - 6 I

ER -
