TY - JOUR
T1 - Clinical documentation variations and NLP system portability
T2 - A case study in asthma birth cohorts across institutions
AU - Sohn, Sunghwan
AU - Wang, Yanshan
AU - Wi, Chung Il
AU - Krusemark, Elizabeth A.
AU - Ryu, Euijung
AU - Ali, Mir H.
AU - Juhn, Young J.
AU - Liu, Hongfang
N1 - Funding Information:
This work was made possible by National Institute of General Medical Sciences R01GM102282, National Library of Medicine R01LM11934, National Institute of Biomedical Imaging and Bioengineering R01EB19403, National Heart, Lung, and Blood Institute R01HL126667, National Institute of Child Health and Human Development R21AI116839-01, and the T Denny Sanford Pediatric Collaborative Research Fund.
Publisher Copyright:
© The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
PY - 2018/3/1
Y1 - 2018/3/1
N2 - Objective: To assess clinical documentation variations across health care institutions using different electronic medical record systems and investigate how they affect natural language processing (NLP) system portability. Materials and Methods: Birth cohorts from Mayo Clinic and Sanford Children's Hospital (SCH) were used in this study (n=298 for each). Documentation variations regarding asthma between the 2 cohorts were examined in various aspects: (1) overall corpus at the word level (ie, lexical variation), (2) topics and asthma-related concepts (ie, semantic variation), and (3) clinical note types (ie, process variation). We compared those statistics and explored NLP system portability for asthma ascertainment in 2 stages: prototype and refinement. Results: There exist notable lexical variations (word-level similarity=0.669) and process variations (differences in major note types containing asthma-related concepts). However, semantic-level corpora were relatively homogeneous (topic similarity=0.944, asthma-related concept similarity=0.971). The NLP system for asthma ascertainment had an F-score of 0.937 at Mayo, and produced 0.813 (prototype) and 0.908 (refinement) when applied at SCH. Discussion: The criteria for asthma ascertainment are largely dependent on asthma-related concepts. Therefore, we believe that semantic similarity is important to estimate NLP system portability. As the Mayo Clinic and SCH corpora were relatively homogeneous at a semantic level, the NLP system, developed at Mayo Clinic, was imported to SCH successfully with proper adjustments to deal with the intrinsic corpus heterogeneity.
KW - Asthma
KW - Documentation variation
KW - Electronic medical records
KW - Natural language processing
KW - Portability
UR - http://www.scopus.com/inward/record.url?scp=85043335010&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043335010&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocx138
DO - 10.1093/jamia/ocx138
M3 - Article
C2 - 29202185
AN - SCOPUS:85043335010
VL - 25
SP - 353
EP - 359
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
SN - 1067-5027
IS - 3
ER -