Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries

Na Hong, Andrew Wen, Daniel J. Stone, Shintaro Tsuji, Paul R. Kingsbury, Luke V. Rasmussen, Jennifer A. Pacheco, Prakash Adekkanattu, Fei Wang, Yuan Luo, Jyotishman Pathak, Hongfang Liu, Guoqian Jiang

Research output: Contribution to journal (Article)

Abstract

Background: Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for secondary use of electronic healthcare records (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used to model and integrate both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework that identifies patients with obesity and its multiple comorbidities from semi-structured discharge summaries, leveraging a FHIR-based clinical data normalization pipeline known as NLP2FHIR.

Methods: We implemented a multi-class and multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the FHIR-based EHR phenotyping framework. The framework has two core parts: (a) the conversion of discharge summaries into the corresponding FHIR resources (Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory) using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers that predict the disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used macro- and micro-averaged precision (P), recall (R), and F1 score (F1) to evaluate classifier performance. We validated the framework using a second obesity dataset extracted from the MIMIC-III database.

Results: Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as instances of the FHIR Composition resource, comprising 5677 records with 16 unique section types. After NLP processing and FHIR modeling, 244,438 FHIR clinical resource instances were generated. Among the four machine learning classifiers, the random forest algorithm performed best, with F1-micro 0.9466 / F1-macro 0.7887 for intuitive classification (reflecting medical professionals’ judgments) and F1-micro 0.9536 / F1-macro 0.6524 for textual classification (reflecting judgments based on explicitly reported information about diseases). The MIMIC-III obesity dataset was successfully integrated for prediction with minimal configuration of the NLP2FHIR pipeline and machine learning models.

Conclusions: The study demonstrated that the FHIR-based EHR phenotyping approach could effectively identify the state of obesity and multiple comorbidities using semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing the interpretability of machine learning-based phenotyping algorithms.
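The abstract reports micro-averaged F1 well above macro-averaged F1, most visibly for textual classification (0.9536 versus 0.6524). The sketch below is a minimal illustration of how such a gap can arise when a rare class is predicted poorly. It is not the study's code or data: the function name `f1_scores`, the toy labels, and the single-label averaging scheme are all assumptions made for illustration, and the abstract does not state the exact averaging procedure used.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) for two equal-length label sequences.

    Hypothetical helper for illustration only.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted class p gains a false positive
            fn[g] += 1          # true class g gains a false negative

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy labels (assumed, not from the paper): two common classes and one rare class.
gold = ["Y"] * 5 + ["N"] * 4 + ["Q"]
pred = ["Y"] * 5 + ["N"] * 4 + ["N"]   # the single rare "Q" case is missed

micro, macro = f1_scores(gold, pred)
print(f"micro={micro:.4f} macro={macro:.4f}")  # micro=0.9000 macro=0.6296
```

Because macro averaging weights every class equally, one missed rare class pulls the macro score down sharply while the micro score, dominated by the frequent classes, stays high.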

Original language: English (US)
Article number: 103310
Journal: Journal of Biomedical Informatics
Volume: 99
DOI: 10.1016/j.jbi.2019.103310
State: Published - Nov 2019

Fingerprint

Interoperability
Comorbidity
Obesity
Delivery of Health Care
Learning systems
Pipelines
Macros
Classifiers
Data integration
Terminology
Decision trees
Chemical analysis
Learning algorithms
Support vector machines
Logistics
Labels

Keywords

  • Algorithm portability
  • Clinical phenotyping
  • Electronic Health Records (EHRs)
  • HL7 Fast Healthcare Interoperability Resources (FHIR)
  • Natural language processing

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries. / Hong, Na; Wen, Andrew; Stone, Daniel J.; Tsuji, Shintaro; Kingsbury, Paul R.; Rasmussen, Luke V.; Pacheco, Jennifer A.; Adekkanattu, Prakash; Wang, Fei; Luo, Yuan; Pathak, Jyotishman; Liu, Hongfang; Jiang, Guoqian.

In: Journal of Biomedical Informatics, Vol. 99, 103310, 11.2019.

@article{20acdf20aaa44c78a1f349c1a421b39d,
title = "Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries",
keywords = "Algorithm portability, Clinical phenotyping, Electronic Health Records (EHRs), HL7 Fast Healthcare Interoperability Resources (FHIR), Natural language processing",
author = "Na Hong and Andrew Wen and Stone, {Daniel J.} and Shintaro Tsuji and Kingsbury, {Paul R.} and Rasmussen, {Luke V.} and Pacheco, {Jennifer A.} and Prakash Adekkanattu and Fei Wang and Yuan Luo and Jyotishman Pathak and Hongfang Liu and Guoqian Jiang",
year = "2019",
month = "11",
doi = "10.1016/j.jbi.2019.103310",
language = "English (US)",
volume = "99",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",

}

TY  - JOUR
T1  - Developing a FHIR-based EHR phenotyping framework
T2  - A case study for identification of patients with obesity and multiple comorbidities from discharge summaries
AU  - Hong, Na
AU  - Wen, Andrew
AU  - Stone, Daniel J.
AU  - Tsuji, Shintaro
AU  - Kingsbury, Paul R.
AU  - Rasmussen, Luke V.
AU  - Pacheco, Jennifer A.
AU  - Adekkanattu, Prakash
AU  - Wang, Fei
AU  - Luo, Yuan
AU  - Pathak, Jyotishman
AU  - Liu, Hongfang
AU  - Jiang, Guoqian
PY  - 2019/11
Y1  - 2019/11
KW  - Algorithm portability
KW  - Clinical phenotyping
KW  - Electronic Health Records (EHRs)
KW  - HL7 Fast Healthcare Interoperability Resources (FHIR)
KW  - Natural language processing
UR  - http://www.scopus.com/inward/record.url?scp=85073228853&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85073228853&partnerID=8YFLogxK
U2  - 10.1016/j.jbi.2019.103310
DO  - 10.1016/j.jbi.2019.103310
M3  - Article
C2  - 31622801
AN  - SCOPUS:85073228853
VL  - 99
JO  - Journal of Biomedical Informatics
JF  - Journal of Biomedical Informatics
SN  - 1532-0464
M1  - 103310
ER  -