TY - JOUR
T1 - Clinical documentation variations and NLP system portability
T2 - A case study in asthma birth cohorts across institutions
AU - Sohn, Sunghwan
AU - Wang, Yanshan
AU - Wi, Chung Il
AU - Krusemark, Elizabeth A.
AU - Ryu, Euijung
AU - Ali, Mir H.
AU - Juhn, Young J.
AU - Liu, Hongfang
N1 - Publisher Copyright:
© The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
PY - 2018/3/1
Y1 - 2018/3/1
AB - Objective: To assess clinical documentation variations across health care institutions using different electronic medical record systems and investigate how they affect natural language processing (NLP) system portability. Materials and Methods: Birth cohorts from Mayo Clinic and Sanford Children's Hospital (SCH) were used in this study (n=298 for each). Documentation variations regarding asthma between the 2 cohorts were examined in various aspects: (1) overall corpus at the word level (ie, lexical variation), (2) topics and asthma-related concepts (ie, semantic variation), and (3) clinical note types (ie, process variation). We compared those statistics and explored NLP system portability for asthma ascertainment in 2 stages: prototype and refinement. Results: There exist notable lexical variations (word-level similarity=0.669) and process variations (differences in major note types containing asthma-related concepts). However, semantic-level corpora were relatively homogeneous (topic similarity=0.944, asthma-related concept similarity=0.971). The NLP system for asthma ascertainment had an F-score of 0.937 at Mayo, and produced 0.813 (prototype) and 0.908 (refinement) when applied at SCH. Discussion: The criteria for asthma ascertainment are largely dependent on asthma-related concepts. Therefore, we believe that semantic similarity is important to estimate NLP system portability. As the Mayo Clinic and SCH corpora were relatively homogeneous at a semantic level, the NLP system, developed at Mayo Clinic, was imported to SCH successfully with proper adjustments to deal with the intrinsic corpus heterogeneity.
KW - Asthma
KW - Documentation variation
KW - Electronic medical records
KW - Natural language processing
KW - Portability
UR - http://www.scopus.com/inward/record.url?scp=85043335010&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043335010&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocx138
DO - 10.1093/jamia/ocx138
M3 - Article
C2 - 29202185
AN - SCOPUS:85043335010
SN - 1067-5027
VL - 25
SP - 353
EP - 359
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 3
ER -