Importance of multi-modal approaches to effectively identify cataract cases from electronic health records

Peggy L. Peissig, Luke V. Rasmussen, Richard L. Berg, James G. Linneman, Catherine A. McCarty, Carol Waudby, Lin Chen, Joshua C. Denny, Russell A. Wilke, Jyotishman Pathak, David Carrell, Abel N. Kho, Justin B. Starren

Research output: Contribution to journal (Article)

62 Citations (Scopus)

Abstract

Objective: There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts.

Materials and methods: We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions.

Results: An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy.

Discussion: A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents.

Conclusion: We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.
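As a rough illustration of the two quantities the abstract reports (how many subjects a multi-modal strategy identifies, and its PPV against manual chart review), the following is a minimal sketch, not the authors' algorithm. It assumes that each mode yields a set of subject IDs and that the modes are combined by a simple union; the abstract does not describe the actual combination logic. The function names, mode labels, and subject IDs are hypothetical, and the toy data are invented for illustration and are not study results.

```python
from typing import Dict, Set


def combine_modes(mode_hits: Dict[str, Set[str]]) -> Set[str]:
    """Return every subject ID flagged by at least one mode (assumed union rule)."""
    combined: Set[str] = set()
    for ids in mode_hits.values():
        combined |= ids
    return combined


def positive_predictive_value(flagged: Set[str], confirmed: Set[str]) -> float:
    """PPV = flagged subjects confirmed by chart review / all flagged subjects."""
    if not flagged:
        raise ValueError("PPV is undefined when no subjects are flagged")
    return len(flagged & confirmed) / len(flagged)


# Invented toy data: subject IDs flagged by each hypothetical mode.
mode_hits = {
    "structured_query": {"s1", "s2"},
    "nlp": {"s2", "s3", "s4"},
    "ocr": {"s4", "s5", "s6"},
}
# Invented gold standard: subjects confirmed by manual chart review.
chart_review_confirmed = {"s1", "s2", "s3", "s4", "s5"}

combined = combine_modes(mode_hits)
combined_ppv = positive_predictive_value(combined, chart_review_confirmed)
structured_ppv = positive_predictive_value(
    mode_hits["structured_query"], chart_review_confirmed
)

print(f"structured only: {len(mode_hits['structured_query'])} flagged, PPV {structured_ppv:.2f}")
print(f"multi-modal:     {len(combined)} flagged, PPV {combined_ppv:.2f}")
```

In this toy example the union flags more subjects than the structured query alone, at some cost in PPV; whether yield rises while PPV stays high in practice depends on the data, which is the empirical question the study addresses.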

Original language: English (US)
Pages (from-to): 225-234
Number of pages: 10
Journal: Journal of the American Medical Informatics Association
Volume: 19
Issue number: 2
DOI: https://doi.org/10.1136/amiajnl-2011-000456
State: Published - Mar 2012

Fingerprint

Electronic Health Records
Cataract
Natural Language Processing
Genomics
Databases
Costs and Cost Analysis

ASJC Scopus subject areas

  • Health Informatics

Cite this

Peissig, P. L., Rasmussen, L. V., Berg, R. L., Linneman, J. G., McCarty, C. A., Waudby, C., ... Starren, J. B. (2012). Importance of multi-modal approaches to effectively identify cataract cases from electronic health records. Journal of the American Medical Informatics Association, 19(2), 225-234. https://doi.org/10.1136/amiajnl-2011-000456

@article{bdd6dc6ca69f43debb42f9273eafe4dd,
title = "Importance of multi-modal approaches to effectively identify cataract cases from electronic health records",
abstract = "Objective There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. Materials and methods We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. Results An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95{\%}. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. Discussion A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. Conclusion We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.",
author = "Peissig, {Peggy L.} and Rasmussen, {Luke V.} and Berg, {Richard L.} and Linneman, {James G.} and McCarty, {Catherine A.} and Carol Waudby and Lin Chen and Denny, {Joshua C.} and Wilke, {Russell A.} and Jyotishman Pathak and David Carrell and Kho, {Abel N.} and Starren, {Justin B.}",
year = "2012",
month = "3",
doi = "10.1136/amiajnl-2011-000456",
language = "English (US)",
volume = "19",
pages = "225--234",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR
T1 - Importance of multi-modal approaches to effectively identify cataract cases from electronic health records
AU - Peissig, Peggy L.
AU - Rasmussen, Luke V.
AU - Berg, Richard L.
AU - Linneman, James G.
AU - McCarty, Catherine A.
AU - Waudby, Carol
AU - Chen, Lin
AU - Denny, Joshua C.
AU - Wilke, Russell A.
AU - Pathak, Jyotishman
AU - Carrell, David
AU - Kho, Abel N.
AU - Starren, Justin B.
PY - 2012/3
Y1 - 2012/3
N2 - Objective There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. Materials and methods We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. Results An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. Discussion A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. Conclusion We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.

UR - http://www.scopus.com/inward/record.url?scp=84863123646&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863123646&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2011-000456
DO - 10.1136/amiajnl-2011-000456
M3 - Article
C2 - 22319176
AN - SCOPUS:84863123646
VL - 19
SP - 225
EP - 234
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
SN - 1067-5027
IS - 2
ER -