PheKB: A catalog and workflow for creating electronic phenotype algorithms for transportability

Jacqueline C. Kirby; Peter Speltz; Luke V. Rasmussen; Melissa Basford; Omri Gottesman; Peggy L. Peissig; Jennifer A. Pacheco; Gerard Tromp; Jyotishman Pathak; David S. Carrell; Stephen B. Ellis; Todd Lingren; Will K. Thompson; Guergana Savova; Jonathan Haines; Dan M. Roden; Paul A. Harris; Joshua C. Denny

doi:10.1093/jamia/ocv202

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

102 Scopus citations

Abstract

Objective: Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems.

Materials and Methods: We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites.

Results: As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%).

Discussion: These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others.

Conclusion: By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
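As a rough illustration of the performance measure reported in the abstract, the sketch below computes a positive predictive value per site from chart-review counts and then takes the median across sites. The site names and counts are hypothetical placeholders, not data from the paper, and the abstract does not describe how the authors computed their figures; this is only the standard definition, PPV = true positives / (true positives + false positives).

```python
from statistics import median


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: confirmed cases / all algorithm-flagged cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no flagged records to evaluate")
    return true_positives / flagged


# Hypothetical chart-review results per site: (confirmed, not confirmed).
# These numbers are invented for illustration only.
reviews = {
    "site_a": (48, 2),
    "site_b": (97, 3),
    "site_c": (45, 5),
}

# PPV for each site, then the median across sites.
site_ppv = {site: ppv(tp, fp) for site, (tp, fp) in reviews.items()}
median_ppv = median(site_ppv.values())

for site, value in site_ppv.items():
    print(f"{site}: PPV = {value:.1%}")
print(f"median PPV = {median_ppv:.1%}")
```

A median is less sensitive than a mean to a single poorly performing site, which may be one reason to summarize multi-site validation this way.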

Original language	English (US)
Pages (from-to)	1046-1052
Number of pages	7
Journal	Journal of the American Medical Informatics Association
Volume	23
Issue number	6
DOIs	https://doi.org/10.1093/jamia/ocv202
State	Published - Nov 1 2016

Keywords

Clinical research
Electronic health records
Electronic phenotyping
Genomic research
Natural language processing

ASJC Scopus subject areas

Health Informatics

Access to Document

10.1093/jamia/ocv202

Cite this

Kirby, J. C., Speltz, P., Rasmussen, L. V., Basford, M., Gottesman, O., Peissig, P. L., Pacheco, J. A., Tromp, G., Pathak, J., Carrell, D. S., Ellis, S. B., Lingren, T., Thompson, W. K., Savova, G., Haines, J., Roden, D. M., Harris, P. A., & Denny, J. C. (2016). PheKB: A catalog and workflow for creating electronic phenotype algorithms for transportability. Journal of the American Medical Informatics Association, 23(6), 1046-1052. https://doi.org/10.1093/jamia/ocv202

Kirby, JC, Speltz, P, Rasmussen, LV, Basford, M, Gottesman, O, Peissig, PL, Pacheco, JA, Tromp, G, Pathak, J, Carrell, DS, Ellis, SB, Lingren, T, Thompson, WK, Savova, G, Haines, J, Roden, DM, Harris, PA & Denny, JC 2016, 'PheKB: A catalog and workflow for creating electronic phenotype algorithms for transportability', Journal of the American Medical Informatics Association, vol. 23, no. 6, pp. 1046-1052. https://doi.org/10.1093/jamia/ocv202

@article{b0c6b1357e324496af81b0bc25f83cfd,

title = "PheKB: A catalog and workflow for creating electronic phenotype algorithms for transportability",

keywords = "Clinical research, Electronic health records, Electronic phenotyping, Genomic research, Natural language processing",

author = "Kirby, {Jacqueline C.} and Peter Speltz and Rasmussen, {Luke V.} and Melissa Basford and Omri Gottesman and Peissig, {Peggy L.} and Pacheco, {Jennifer A.} and Gerard Tromp and Jyotishman Pathak and Carrell, {David S.} and Ellis, {Stephen B.} and Todd Lingren and Thompson, {Will K.} and Guergana Savova and Jonathan Haines and Roden, {Dan M.} and Harris, {Paul A.} and Denny, {Joshua C.}",

year = "2016",

month = nov,

day = "1",

doi = "10.1093/jamia/ocv202",

language = "English (US)",

volume = "23",

pages = "1046--1052",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "6",

}

TY - JOUR

T1 - PheKB

T2 - A catalog and workflow for creating electronic phenotype algorithms for transportability

AU - Kirby, Jacqueline C.

AU - Speltz, Peter

AU - Rasmussen, Luke V.

AU - Basford, Melissa

AU - Gottesman, Omri

AU - Peissig, Peggy L.

AU - Pacheco, Jennifer A.

AU - Tromp, Gerard

AU - Pathak, Jyotishman

AU - Carrell, David S.

AU - Ellis, Stephen B.

AU - Lingren, Todd

AU - Thompson, Will K.

AU - Savova, Guergana

AU - Haines, Jonathan

AU - Roden, Dan M.

AU - Harris, Paul A.

AU - Denny, Joshua C.

PY - 2016/11/1

Y1 - 2016/11/1

KW - Clinical research

KW - Electronic health records

KW - Electronic phenotyping

KW - Genomic research

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=84994697920&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994697920&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocv202

DO - 10.1093/jamia/ocv202

M3 - Article

C2 - 27026615

AN - SCOPUS:84994697920

SN - 1067-5027

VL - 23

SP - 1046

EP - 1052

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

IS - 6

ER -
