Normalization and standardization of electronic health records for high-throughput phenotyping

The SHARPn consortium

Jyotishman Pathak, Kent R. Bailey, Calvin E. Beebe, Steven Bethard, David S. Carrell, Pei J. Chen, Dmitriy Dligach, Cory M. Endle, Lacey A. Hart, Peter J. Haug, Stanley M. Huff, Vinod C. Kaggal, Dingcheng Li, Hongfang D. Liu, Kyle Marchant, James Masanz, Timothy Miller, Thomas A. Oniki, Martha Palmer, Kevin J. Peterson, Susan Rea, Guergana K. Savova, Craig R. Stancl, Sunghwan Sohn, Harold R. Solbrig, Dale B. Suesse, Cui Tao, David P. Taylor, Les Westberg, Stephen Wu, Ning Zhuo, Christopher G. Chute

Research output: Contribution to journal › Article

55 Citations (Scopus)

Abstract

Research objective: To develop a scalable informatics infrastructure for normalizing both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction.

Materials and methods: Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems, Mayo Clinic and Intermountain Healthcare, were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU)-conformant terminology and value-set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical quality measures modeled in the Quality Data Model (QDM) and run on an open-source rules engine.

Results: Using CEMs and open-source natural language processing and terminology services engines, namely the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services (CTS2), we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform by executing a QDM-based MU quality measure, which determines the percentage of patients between 18 and 75 years of age with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL, on a randomly selected cohort of 273 Mayo Clinic patients. The platform identified 21 and patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria.

Conclusions: End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular, open-source resources for enabling secondary use of EHR data by normalizing it into a standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts.
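
As a rough illustration of the cohort logic behind this quality measure, the following Python sketch counts the denominator (patients aged 18-75 with diabetes) and the numerator (those whose most recent LDL cholesterol result during the measurement year is below 100 mg/dL). It assumes a simplified, hypothetical record layout; the field names (id, birth_date, has_diabetes, ldl_results) are illustrative placeholders, not the CEM attributes or QDM value sets used by the SHARPn platform, which expressed these criteria as a QDM measure over normalized data and executed them with an open-source rules engine.

```python
from datetime import date

def age_on(birth_date, as_of):
    """Age in whole years on a given date."""
    return as_of.year - birth_date.year - (
        (as_of.month, as_of.day) < (birth_date.month, birth_date.day)
    )

def evaluate_measure(patients, year):
    """Return (denominator, numerator) sets of patient ids for the LDL-C measure."""
    period_start, period_end = date(year, 1, 1), date(year, 12, 31)
    denominator, numerator = set(), set()
    for p in patients:
        # Denominator: patients aged 18-75 with a diabetes diagnosis.
        if not p["has_diabetes"]:
            continue
        if not 18 <= age_on(p["birth_date"], period_end) <= 75:
            continue
        denominator.add(p["id"])
        # Numerator: most recent LDL-C result within the measurement year is <100 mg/dL.
        in_period = [(d, v) for d, v in p["ldl_results"] if period_start <= d <= period_end]
        if in_period and max(in_period)[1] < 100:
            numerator.add(p["id"])
    return denominator, numerator

# Example: one diabetic patient whose latest 2012 LDL-C result is in range.
patients = [{
    "id": "p001",
    "birth_date": date(1960, 5, 2),
    "has_diabetes": True,
    "ldl_results": [(date(2012, 3, 1), 130.0), (date(2012, 11, 15), 92.0)],
}]
den, num = evaluate_measure(patients, 2012)  # den == num == {"p001"}
```

The measure score is then the size of the numerator divided by the size of the denominator.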

Original language: English (US)
Journal: Journal of the American Medical Informatics Association
Volume: 20
Issue number: E2
ISSN: 1067-5027
DOI: https://doi.org/10.1136/amiajnl-2013-001939
State: Published - 2013

Fingerprint

Electronic Health Records
Terminology
Natural Language Processing
Computer Security
Informatics
Vocabulary
Automatic Data Processing
LDL Cholesterol
Software
Delivery of Health Care
Phenotype
Research
Data Accuracy

ASJC Scopus subject areas

  • Health Informatics

Cite this

Normalization and standardization of electronic health records for high-throughput phenotyping: The SHARPn consortium. / Pathak, Jyotishman; Bailey, Kent R; Beebe, Calvin E.; Bethard, Steven; Carrell, David S.; Chen, Pei J.; Dligach, Dmitriy; Endle, Cory M.; Hart, Lacey A.; Haug, Peter J.; Huff, Stanley M.; Kaggal, Vinod C.; Li, Dingcheng; Liu, Hongfang D; Marchant, Kyle; Masanz, James; Miller, Timothy; Oniki, Thomas A.; Palmer, Martha; Peterson, Kevin J.; Rea, Susan; Savova, Guergana K.; Stancl, Craig R.; Sohn, Sunghwan; Solbrig, Harold R.; Suesse, Dale B.; Tao, Cui; Taylor, David P.; Westberg, Les; Wu, Stephen; Zhuo, Ning; Chute, Christopher G.

In: Journal of the American Medical Informatics Association, Vol. 20, No. E2, 2013.

Research output: Contribution to journal › Article

Pathak, J, Bailey, KR, Beebe, CE, Bethard, S, Carrell, DS, Chen, PJ, Dligach, D, Endle, CM, Hart, LA, Haug, PJ, Huff, SM, Kaggal, VC, Li, D, Liu, HD, Marchant, K, Masanz, J, Miller, T, Oniki, TA, Palmer, M, Peterson, KJ, Rea, S, Savova, GK, Stancl, CR, Sohn, S, Solbrig, HR, Suesse, DB, Tao, C, Taylor, DP, Westberg, L, Wu, S, Zhuo, N & Chute, CG 2013, 'Normalization and standardization of electronic health records for high-throughput phenotyping: The SHARPn consortium', Journal of the American Medical Informatics Association, vol. 20, no. E2. https://doi.org/10.1136/amiajnl-2013-001939
Pathak, Jyotishman; Bailey, Kent R; Beebe, Calvin E.; Bethard, Steven; Carrell, David S.; Chen, Pei J.; Dligach, Dmitriy; Endle, Cory M.; Hart, Lacey A.; Haug, Peter J.; Huff, Stanley M.; Kaggal, Vinod C.; Li, Dingcheng; Liu, Hongfang D; Marchant, Kyle; Masanz, James; Miller, Timothy; Oniki, Thomas A.; Palmer, Martha; Peterson, Kevin J.; Rea, Susan; Savova, Guergana K.; Stancl, Craig R.; Sohn, Sunghwan; Solbrig, Harold R.; Suesse, Dale B.; Tao, Cui; Taylor, David P.; Westberg, Les; Wu, Stephen; Zhuo, Ning; Chute, Christopher G. / Normalization and standardization of electronic health records for high-throughput phenotyping: The SHARPn consortium. In: Journal of the American Medical Informatics Association. 2013; Vol. 20, No. E2.