Validation of electronic medical record-based phenotyping algorithms: Results and lessons learned from the eMERGE network

Katherine M. Newton; Peggy L. Peissig; Abel Ngo Kho; Suzette J. Bielinski; Richard L. Berg; Vidhu Choudhary; Melissa Basford; Christopher G. Chute; Iftikhar J. Kullo; Rongling Li; Jennifer A. Pacheco; Luke V. Rasmussen; Leslie Spangler; Joshua C. Denny

doi:10.1136/amiajnl-2012-000896

Research output: Contribution to journal › Article › peer-review

199 Scopus citations

Abstract

Background: Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats.

Objective: To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies.

Materials and methods: The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University.

Results: By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results.

Conclusions: Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.

Original language	English (US)
Pages (from-to)	e147-e154
Journal	Journal of the American Medical Informatics Association
Volume	20
Issue number	E1
DOIs	https://doi.org/10.1136/amiajnl-2012-000896
State	Published - 2013

ASJC Scopus subject areas

Health Informatics

Cite this

Newton, K. M., Peissig, P. L., Kho, A. N., Bielinski, S. J., Berg, R. L., Choudhary, V., Basford, M., Chute, C. G., Kullo, I. J., Li, R., Pacheco, J. A., Rasmussen, L. V., Spangler, L., & Denny, J. C. (2013). Validation of electronic medical record-based phenotyping algorithms: Results and lessons learned from the eMERGE network. Journal of the American Medical Informatics Association, 20(E1), e147-e154. https://doi.org/10.1136/amiajnl-2012-000896

Newton, KM, Peissig, PL, Kho, AN, Bielinski, SJ, Berg, RL, Choudhary, V, Basford, M, Chute, CG, Kullo, IJ, Li, R, Pacheco, JA, Rasmussen, LV, Spangler, L & Denny, JC 2013, 'Validation of electronic medical record-based phenotyping algorithms: Results and lessons learned from the eMERGE network', Journal of the American Medical Informatics Association, vol. 20, no. E1, pp. e147-e154. https://doi.org/10.1136/amiajnl-2012-000896

@article{2600dee51c1f48abb2efe21511b0e777,

title = "Validation of electronic medical record-based phenotyping algorithms: Results and lessons learned from the {eMERGE} network",

abstract = "Background Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats. Objective To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies. Materials and methods The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University. Results By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results. Conclusions Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.",

author = "Newton, {Katherine M.} and Peissig, {Peggy L.} and Kho, {Abel Ngo} and Bielinski, {Suzette J.} and Berg, {Richard L.} and Choudhary, Vidhu and Basford, Melissa and Chute, {Christopher G.} and Kullo, {Iftikhar J.} and Li, Rongling and Pacheco, {Jennifer A.} and Rasmussen, {Luke V.} and Spangler, Leslie and Denny, {Joshua C.}",

year = "2013",

doi = "10.1136/amiajnl-2012-000896",

language = "English (US)",

volume = "20",

pages = "e147--e154",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "E1",

}

TY  - JOUR
T1  - Validation of electronic medical record-based phenotyping algorithms: Results and lessons learned from the eMERGE network
AU  - Newton, Katherine M.
AU  - Peissig, Peggy L.
AU  - Kho, Abel Ngo
AU  - Bielinski, Suzette J.
AU  - Berg, Richard L.
AU  - Choudhary, Vidhu
AU  - Basford, Melissa
AU  - Chute, Christopher G.
AU  - Kullo, Iftikhar J.
AU  - Li, Rongling
AU  - Pacheco, Jennifer A.
AU  - Rasmussen, Luke V.
AU  - Spangler, Leslie
AU  - Denny, Joshua C.
PY  - 2013
Y1  - 2013
N2  - Background Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats. Objective To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies. Materials and methods The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University. Results By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results. Conclusions Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.
AB  - Background Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats. Objective To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies. Materials and methods The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University. Results By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results. Conclusions Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.
UR  - http://www.scopus.com/inward/record.url?scp=84881328205&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84881328205&partnerID=8YFLogxK
U2  - 10.1136/amiajnl-2012-000896
DO  - 10.1136/amiajnl-2012-000896
M3  - Article
C2  - 23531748
AN  - SCOPUS:84881328205
SN  - 1067-5027
VL  - 20
SP  - e147
EP  - e154
JO  - Journal of the American Medical Informatics Association
JF  - Journal of the American Medical Informatics Association
IS  - E1
ER  - 