Clinical documentation variations and NLP system portability

A case study in asthma birth cohorts across institutions

Sunghwan Sohn, Yanshan Wang, Chung Il Wi, Elizabeth A. Krusemark, Euijung Ryu, Mir H. Ali, Young J Juhn, Hongfang D Liu

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

Objective: To assess clinical documentation variations across health care institutions using different electronic medical record systems and investigate how they affect natural language processing (NLP) system portability.

Materials and Methods: Birth cohorts from Mayo Clinic and Sanford Children's Hospital (SCH) were used in this study (n=298 for each). Documentation variations regarding asthma between the 2 cohorts were examined in various aspects: (1) overall corpus at the word level (ie, lexical variation), (2) topics and asthma-related concepts (ie, semantic variation), and (3) clinical note types (ie, process variation). We compared those statistics and explored NLP system portability for asthma ascertainment in 2 stages: prototype and refinement.

Results: There exist notable lexical variations (word-level similarity=0.669) and process variations (differences in major note types containing asthma-related concepts). However, semantic-level corpora were relatively homogeneous (topic similarity=0.944, asthma-related concept similarity=0.971). The NLP system for asthma ascertainment had an F-score of 0.937 at Mayo, and produced 0.813 (prototype) and 0.908 (refinement) when applied at SCH.

Discussion: The criteria for asthma ascertainment are largely dependent on asthma-related concepts. Therefore, we believe that semantic similarity is important to estimate NLP system portability. As the Mayo Clinic and SCH corpora were relatively homogeneous at a semantic level, the NLP system, developed at Mayo Clinic, was imported to SCH successfully with proper adjustments to deal with the intrinsic corpus heterogeneity.
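The abstract reports corpus similarity scores and F-scores without stating how they were computed. The following is a minimal sketch of how such numbers are commonly obtained, not the authors' method: it assumes cosine similarity over whitespace-tokenized term-frequency vectors for the word-level comparison, and the balanced F1 score for ascertainment performance. The function names `cosine_similarity` and `f_score` and the toy notes are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(corpus_a, corpus_b):
    """Cosine similarity between the term-frequency vectors of two corpora.

    Assumption: lowercase whitespace tokenization; the abstract does not
    specify the tokenizer or the similarity measure actually used.
    """
    tf_a = Counter(w for doc in corpus_a for w in doc.lower().split())
    tf_b = Counter(w for doc in corpus_b for w in doc.lower().split())
    # Only words present in both corpora contribute to the dot product.
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def f_score(tp, fp, fn):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy notes, invented for illustration; not data from the study.
    site_a = ["patient reports wheezing at night", "asthma diagnosed by physician"]
    site_b = ["pt c/o nocturnal wheeze", "physician diagnosed asthma"]
    print(round(cosine_similarity(site_a, site_b), 3))
    print(round(f_score(tp=45, fp=3, fn=4), 3))
```

Under these assumptions, a low word-level score alongside a high concept-level score would correspond to the pattern the abstract describes: the two sites use different wording for largely the same clinical content.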

Original language: English (US)
Pages (from-to): 353-359
Number of pages: 7
Journal: Journal of the American Medical Informatics Association
Volume: 25
Issue number: 3
DOI: 10.1093/jamia/ocx138
State: Published - Mar 1 2018

Fingerprint

Natural Language Processing
Documentation
Asthma
Parturition
Semantics
Electronic Health Records
Delivery of Health Care

Keywords

  • Asthma
  • Documentation variation
  • Electronic medical records
  • Natural language processing
  • Portability

ASJC Scopus subject areas

  • Health Informatics

Cite this

Clinical documentation variations and NLP system portability: A case study in asthma birth cohorts across institutions. / Sohn, Sunghwan; Wang, Yanshan; Wi, Chung Il; Krusemark, Elizabeth A.; Ryu, Euijung; Ali, Mir H.; Juhn, Young J; Liu, Hongfang D.

In: Journal of the American Medical Informatics Association, Vol. 25, No. 3, 01.03.2018, p. 353-359.

@article{315b5b1f52404020818be0b314daa4de,
title = "Clinical documentation variations and NLP system portability: A case study in asthma birth cohorts across institutions",
abstract = "Objective: To assess clinical documentation variations across health care institutions using different electronic medical record systems and investigate how they affect natural language processing (NLP) system portability. Materials and Methods: Birth cohorts from Mayo Clinic and Sanford Children's Hospital (SCH) were used in this study (n=298 for each). Documentation variations regarding asthma between the 2 cohorts were examined in various aspects: (1) overall corpus at the word level (ie, lexical variation), (2) topics and asthma-related concepts (ie, semantic variation), and (3) clinical note types (ie, process variation). We compared those statistics and explored NLP system portability for asthma ascertainment in 2 stages: prototype and refinement. Results: There exist notable lexical variations (word-level similarity=0.669) and process variations (differences in major note types containing asthma-related concepts). However, semantic-level corpora were relatively homogeneous (topic similarity=0.944, asthma-related concept similarity=0.971). The NLP system for asthma ascertainment had an F-score of 0.937 at Mayo, and produced 0.813 (prototype) and 0.908 (refinement) when applied at SCH. Discussion: The criteria for asthma ascertainment are largely dependent on asthma-related concepts. Therefore, we believe that semantic similarity is important to estimate NLP system portability. As the Mayo Clinic and SCH corpora were relatively homogeneous at a semantic level, the NLP system, developed at Mayo Clinic, was imported to SCH successfully with proper adjustments to deal with the intrinsic corpus heterogeneity.",
keywords = "Asthma, Documentation variation, Electronic medical records, Natural language processing, Portability",
author = "Sunghwan Sohn and Yanshan Wang and Wi, {Chung Il} and Krusemark, {Elizabeth A.} and Euijung Ryu and Ali, {Mir H.} and Juhn, {Young J} and Liu, {Hongfang D}",
year = "2018",
month = "3",
day = "1",
doi = "10.1093/jamia/ocx138",
language = "English (US)",
volume = "25",
pages = "353--359",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "3",
}

TY - JOUR

T1 - Clinical documentation variations and NLP system portability

T2 - A case study in asthma birth cohorts across institutions

AU - Sohn, Sunghwan

AU - Wang, Yanshan

AU - Wi, Chung Il

AU - Krusemark, Elizabeth A.

AU - Ryu, Euijung

AU - Ali, Mir H.

AU - Juhn, Young J

AU - Liu, Hongfang D

PY - 2018/3/1

Y1 - 2018/3/1

AB - Objective: To assess clinical documentation variations across health care institutions using different electronic medical record systems and investigate how they affect natural language processing (NLP) system portability. Materials and Methods: Birth cohorts from Mayo Clinic and Sanford Children's Hospital (SCH) were used in this study (n=298 for each). Documentation variations regarding asthma between the 2 cohorts were examined in various aspects: (1) overall corpus at the word level (ie, lexical variation), (2) topics and asthma-related concepts (ie, semantic variation), and (3) clinical note types (ie, process variation). We compared those statistics and explored NLP system portability for asthma ascertainment in 2 stages: prototype and refinement. Results: There exist notable lexical variations (word-level similarity=0.669) and process variations (differences in major note types containing asthma-related concepts). However, semantic-level corpora were relatively homogeneous (topic similarity=0.944, asthma-related concept similarity=0.971). The NLP system for asthma ascertainment had an F-score of 0.937 at Mayo, and produced 0.813 (prototype) and 0.908 (refinement) when applied at SCH. Discussion: The criteria for asthma ascertainment are largely dependent on asthma-related concepts. Therefore, we believe that semantic similarity is important to estimate NLP system portability. As the Mayo Clinic and SCH corpora were relatively homogeneous at a semantic level, the NLP system, developed at Mayo Clinic, was imported to SCH successfully with proper adjustments to deal with the intrinsic corpus heterogeneity.

KW - Asthma

KW - Documentation variation

KW - Electronic medical records

KW - Natural language processing

KW - Portability

UR - http://www.scopus.com/inward/record.url?scp=85043335010&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85043335010&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocx138

DO - 10.1093/jamia/ocx138

M3 - Article

VL - 25

SP - 353

EP - 359

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 3

ER -