Standardizing Heterogeneous Annotation Corpora Using HL7 FHIR for Facilitating their Reuse and Integration in Clinical NLP

Na Hong, Andrew Wen, Majid Rastegar Mojarad, Sunghwan Sohn, Hongfang D Liu, Guoqian D Jiang

Research output: Contribution to journal › Article

Abstract

Manually annotated clinical corpora are commonly used as the gold standards for the training and evaluation of clinical natural language processing (NLP) tools. The creation of these manual annotation corpora, however, is both costly and time-consuming. There is an emerging need in the clinical NLP community for reusing existing annotation corpora across different clinical NLP tasks. The objective of this study is to design, develop and evaluate a framework and accompanying tools to support the standardization and integration of annotation corpora using the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. The framework contains two main modules: 1) an automatic schema transformation module, in which the annotation schema in each corpus is automatically transformed into the FHIR-based schema; 2) an expert-based verification and annotation module, in which existing annotations can be verified and new annotations can be added for new elements defined in FHIR. We evaluated the framework using various annotation corpora created as part of different clinical NLP projects at the Mayo Clinic. We demonstrated that it is feasible to leverage FHIR as a standard data model for standardizing heterogeneous annotation corpora for their reuse and integration in advanced clinical NLP research and practices.

Original language: English (US)
Pages (from-to): 574-583
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Volume: 2018
State: Published - Jan 1 2018

Fingerprint

Natural Language Processing
Delivery of Health Care
Research

ASJC Scopus subject areas

  • Medicine (all)

Cite this

@article{eb6df3cdd0534679b6c39c2eafa6b503,
title = "Standardizing Heterogeneous Annotation Corpora Using HL7 FHIR for Facilitating their Reuse and Integration in Clinical NLP",
abstract = "Manually annotated clinical corpora are commonly used as the gold standards for the training and evaluation of clinical natural language processing (NLP) tools. The creation of these manual annotation corpora, however, is both costly and time-consuming. There is an emerging need in the clinical NLP community for reusing existing annotation corpora across different clinical NLP tasks. The objective of this study is to design, develop and evaluate a framework and accompanying tools to support the standardization and integration of annotation corpora using the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. The framework contains two main modules: 1) an automatic schema transformation module, in which the annotation schema in each corpus is automatically transformed into the FHIR-based schema; 2) an expert-based verification and annotation module, in which existing annotations can be verified and new annotations can be added for new elements defined in FHIR. We evaluated the framework using various annotation corpora created as part of different clinical NLP projects at the Mayo Clinic. We demonstrated that it is feasible to leverage FHIR as a standard data model for standardizing heterogeneous annotation corpora for their reuse and integration in advanced clinical NLP research and practices.",
author = "Na Hong and Andrew Wen and Mojarad, {Majid Rastegar} and Sunghwan Sohn and Liu, {Hongfang D} and Jiang, {Guoqian D}",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
volume = "2018",
pages = "574--583",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",
}

TY - JOUR

T1 - Standardizing Heterogeneous Annotation Corpora Using HL7 FHIR for Facilitating their Reuse and Integration in Clinical NLP

AU - Hong, Na

AU - Wen, Andrew

AU - Mojarad, Majid Rastegar

AU - Sohn, Sunghwan

AU - Liu, Hongfang D

AU - Jiang, Guoqian D

PY - 2018/1/1

Y1 - 2018/1/1

AB - Manually annotated clinical corpora are commonly used as the gold standards for the training and evaluation of clinical natural language processing (NLP) tools. The creation of these manual annotation corpora, however, is both costly and time-consuming. There is an emerging need in the clinical NLP community for reusing existing annotation corpora across different clinical NLP tasks. The objective of this study is to design, develop and evaluate a framework and accompanying tools to support the standardization and integration of annotation corpora using the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. The framework contains two main modules: 1) an automatic schema transformation module, in which the annotation schema in each corpus is automatically transformed into the FHIR-based schema; 2) an expert-based verification and annotation module, in which existing annotations can be verified and new annotations can be added for new elements defined in FHIR. We evaluated the framework using various annotation corpora created as part of different clinical NLP projects at the Mayo Clinic. We demonstrated that it is feasible to leverage FHIR as a standard data model for standardizing heterogeneous annotation corpora for their reuse and integration in advanced clinical NLP research and practices.

UR - http://www.scopus.com/inward/record.url?scp=85062377763&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062377763&partnerID=8YFLogxK

M3 - Article

VL - 2018

SP - 574

EP - 583

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

SN - 1559-4076

ER -