Characterizing variability of electronic health record-driven phenotype definitions

Pascal S. Brandt; Abel Kho; Yuan Luo; Jennifer A. Pacheco; Theresa L. Walunas; Hakon Hakonarson; George Hripcsak; Cong Liu; Ning Shang; Chunhua Weng; Nephi Walton; David S. Carrell; Paul K. Crane; Eric B. Larson; Christopher G. Chute; Iftikhar J. Kullo; Robert Carroll; Josh Denny; Andrea Ramirez; Wei Qi Wei; Jyoti Pathak; Laura K. Wiley; Rachel Richesson; Justin B. Starren; Luke V. Rasmussen

doi:10.1093/jamia/ocac235

Cardiovascular Medicine

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: The aim of this study was to analyze a publicly available sample of rule-based phenotype definitions to characterize and evaluate the variability of logical constructs used.

Materials and Methods: A sample of 33 preexisting phenotype definitions used in research that are represented using Fast Healthcare Interoperability Resources and Clinical Quality Language (CQL) was analyzed using automated analysis of the computable representation of the CQL libraries.

Results: Most of the phenotype definitions include narrative descriptions and flowcharts, while few provide pseudocode or executable artifacts. Most use 4 or fewer medical terminologies. The number of codes used ranges from 5 to 6865, and value sets from 1 to 19. We found that the most common expressions used were literal, data, and logical expressions. Aggregate and arithmetic expressions are the least common. Expression depth ranges from 4 to 27.

Discussion: Despite the range of conditions, we found that all of the phenotype definitions consisted of logical criteria, representing both clinical and operational logic, and tabular data, consisting of codes from standard terminologies and keywords for natural language processing. The total number and variety of expressions are low, which may be to simplify implementation, or authors may limit complexity due to data availability constraints.

Conclusions: The phenotype definitions analyzed show significant variation in specific logical, arithmetic, and other operators but are all composed of the same high-level components, namely tabular data and logical expressions. A standard representation for phenotype definitions should support these formats and be modular to support localization and shared logic.

Original language	English (US)
Pages (from-to)	427-437
Number of pages	11
Journal	Journal of the American Medical Informatics Association
Volume	30
Issue number	3
DOIs	https://doi.org/10.1093/jamia/ocac235
State	Published - Mar 1 2023

Keywords

CQL
EHR-driven phenotyping
FHIR
cohort identification

ASJC Scopus subject areas

Health Informatics

Access to Document

10.1093/jamia/ocac235

Cite this

Brandt, P. S., Kho, A., Luo, Y., Pacheco, J. A., Walunas, T. L., Hakonarson, H., Hripcsak, G., Liu, C., Shang, N., Weng, C., Walton, N., Carrell, D. S., Crane, P. K., Larson, E. B., Chute, C. G., Kullo, I. J., Carroll, R., Denny, J., Ramirez, A., ... Rasmussen, L. V. (2023). Characterizing variability of electronic health record-driven phenotype definitions. Journal of the American Medical Informatics Association, 30(3), 427-437. https://doi.org/10.1093/jamia/ocac235

Brandt, PS, Kho, A, Luo, Y, Pacheco, JA, Walunas, TL, Hakonarson, H, Hripcsak, G, Liu, C, Shang, N, Weng, C, Walton, N, Carrell, DS, Crane, PK, Larson, EB, Chute, CG, Kullo, IJ, Carroll, R, Denny, J, Ramirez, A, Wei, WQ, Pathak, J, Wiley, LK, Richesson, R, Starren, JB & Rasmussen, LV 2023, 'Characterizing variability of electronic health record-driven phenotype definitions', Journal of the American Medical Informatics Association, vol. 30, no. 3, pp. 427-437. https://doi.org/10.1093/jamia/ocac235

@article{1022d30a8e1e4add8ceb4aef00038d72,

title = "Characterizing variability of electronic health record-driven phenotype definitions",

keywords = "CQL, EHR-driven phenotyping, FHIR, cohort identification",

author = "Brandt, {Pascal S.} and Abel Kho and Yuan Luo and Pacheco, {Jennifer A.} and Walunas, {Theresa L.} and Hakon Hakonarson and George Hripcsak and Cong Liu and Ning Shang and Chunhua Weng and Nephi Walton and Carrell, {David S.} and Crane, {Paul K.} and Larson, {Eric B.} and Chute, {Christopher G.} and Kullo, {Iftikhar J.} and Robert Carroll and Josh Denny and Andrea Ramirez and Wei, {Wei Qi} and Jyoti Pathak and Wiley, {Laura K.} and Rachel Richesson and Starren, {Justin B.} and Rasmussen, {Luke V.}",

year = "2023",

month = mar,

day = "1",

doi = "10.1093/jamia/ocac235",

language = "English (US)",

volume = "30",

pages = "427--437",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "3",

}

TY - JOUR

T1 - Characterizing variability of electronic health record-driven phenotype definitions

AU - Brandt, Pascal S.

AU - Kho, Abel

AU - Luo, Yuan

AU - Pacheco, Jennifer A.

AU - Walunas, Theresa L.

AU - Hakonarson, Hakon

AU - Hripcsak, George

AU - Liu, Cong

AU - Shang, Ning

AU - Weng, Chunhua

AU - Walton, Nephi

AU - Carrell, David S.

AU - Crane, Paul K.

AU - Larson, Eric B.

AU - Chute, Christopher G.

AU - Kullo, Iftikhar J.

AU - Carroll, Robert

AU - Denny, Josh

AU - Ramirez, Andrea

AU - Wei, Wei Qi

AU - Pathak, Jyoti

AU - Wiley, Laura K.

AU - Richesson, Rachel

AU - Starren, Justin B.

AU - Rasmussen, Luke V.

PY - 2023/3/1

Y1 - 2023/3/1

KW - CQL

KW - EHR-driven phenotyping

KW - FHIR

KW - cohort identification

UR - http://www.scopus.com/inward/record.url?scp=85148250157&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85148250157&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocac235

DO - 10.1093/jamia/ocac235

M3 - Article

C2 - 36474423

AN - SCOPUS:85148250157

SN - 1067-5027

VL - 30

SP - 427

EP - 437

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

IS - 3

ER -
