Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data

Research output: Contribution to journal / Article / peer-review

5 Scopus citations

Abstract

Objective: To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification.

Methods: We established a FHIR-based clinical data normalization pipeline, NLP2FHIR, that comprises three main modules: (1) a core natural language processing (NLP) engine with a FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated the FHIR modeling capability, focusing on core clinical resources such as Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory, using Mayo Clinic's unstructured EHR data. We constructed a gold standard by reusing annotation corpora from previous NLP projects.

Results: A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. For each clinical resource, we identified the elements that require integration of structured data. Unstructured data modeling achieved F scores ranging from 0.69 to 0.99 across FHIR element representations (0.69-0.99 for Condition; 0.75-0.84 for Procedure; 0.71-0.99 for MedicationStatement; and 0.75-0.95 for FamilyMemberHistory).

Conclusion: We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and integrating structured elements into the model. The outcomes of this work provide standards-based tools for clinical data normalization, which is indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future development of the FHIR specification with regard to handling unstructured clinical data.
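To make the mapping and normalization steps described in the abstract more concrete, the following is a minimal Python sketch of how an NLP-extracted mention might be turned into a FHIR Condition resource. It is not the NLP2FHIR implementation: the `Mention` class, the `NORMALIZATION_RULES` table, and the `mention_to_condition` function are hypothetical names introduced for illustration, and the single SNOMED CT entry stands in for the pipeline's actual rule set, which is not reproduced here.

```python
from dataclasses import dataclass

# Code system URI that FHIR uses for SNOMED CT codings.
SNOMED_SYSTEM = "http://snomed.info/sct"

# Hypothetical normalization table (illustration only, not NLP2FHIR's rules):
# lower-cased surface form -> (SNOMED CT code, display text).
NORMALIZATION_RULES = {
    "type 2 diabetes": ("44054006", "Diabetes mellitus type 2"),
    "t2dm": ("44054006", "Diabetes mellitus type 2"),
}


@dataclass
class Mention:
    """A hypothetical NLP output: the matched text and its character offsets."""
    text: str
    begin: int
    end: int


def mention_to_condition(mention: Mention, patient_id: str) -> dict:
    """Map a mention to a FHIR Condition, represented as a plain dict."""
    resource = {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        # Always keep the original wording as CodeableConcept.text.
        "code": {"text": mention.text},
    }
    # Normalization step: attach a standard coding when a rule matches.
    rule = NORMALIZATION_RULES.get(mention.text.strip().lower())
    if rule is not None:
        code, display = rule
        resource["code"]["coding"] = [
            {"system": SNOMED_SYSTEM, "code": code, "display": display}
        ]
    return resource


if __name__ == "__main__":
    import json

    condition = mention_to_condition(Mention("T2DM", 42, 46), "example")
    print(json.dumps(condition, indent=2))
```

In this sketch an unmatched mention still yields a Condition that carries only the original text, so unnormalized findings are preserved rather than dropped; whether NLP2FHIR behaves this way is not stated in the abstract.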

Original language: English (US)
Pages (from-to): 570-579
Number of pages: 10
Journal: JAMIA Open
Volume: 2
Issue number: 4
State: Published - Dec 1 2019

Keywords

  • Data standards
  • Electronic health records
  • Fast Healthcare Interoperability Resources
  • Natural language processing

ASJC Scopus subject areas

  • Health Informatics
