A hybrid model to identify fall occurrence from electronic health records

Sunyang Fu; Bjoerg Thorsteinsdottir; Xin Zhang; Guilherme S. Lopes; Sandeep R. Pagali; Nathan K. LeBrasseur; Andrew Wen; Hongfang Liu; Walter A. Rocca; Janet E. Olson; Jennifer St Sauver; Sunghwan Sohn

doi:10.1016/j.ijmedinf.2022.104736

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: Falls are a leading cause of unintentional injury in the elderly. Electronic health records (EHRs) offer a unique opportunity to develop models that can identify fall events. However, identifying fall events in clinical notes requires advanced natural language processing (NLP) to simultaneously address multiple issues because the word “fall” is a typical homonym. Methods: We implemented a context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), to identify falls from the EHR text and further fused the BERT model into a hybrid architecture coupled with post-hoc heuristic rules to enhance the performance. The models were evaluated on real-world EHR data and were compared to conventional rule-based and deep learning models (CNN and Bi-LSTM). To better understand the ability of each approach to identify falls, we further categorized fall-related concepts (i.e., risk of fall, prevention of fall, homonym) and performed a detailed error analysis. Results: The hybrid model achieved the highest F1 score at the sentence (0.971), document (0.985), and patient (0.954) levels. At the sentence level (the basic data unit in the model), the hybrid model achieved a sensitivity, specificity, positive predictive value, and negative predictive value of 0.954, 1.000, 0.988, and 0.999, respectively. The error analysis showed that machine learning-based approaches demonstrated higher performance than a rule-based approach in challenging cases that required contextual understanding. The context-aware language model (BERT) slightly outperformed the word embedding approach trained on Bi-LSTM. No single model yielded the best performance for all fall-related semantic categories. Conclusion: A context-aware language model (BERT) was able to identify challenging fall events that require contextual understanding in EHR free text. The hybrid model combined with post-hoc rules allowed a custom fix on the BERT outcomes and further improved the performance of fall detection.
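
The hybrid idea described in the abstract (a classifier whose outputs are corrected by post-hoc rules) and the reported metrics can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the regular-expression rules, the `keyword_model` stand-in for a fine-tuned BERT classifier, and the "any positive sentence" roll-up to document level are hypothetical and are not taken from the study.

```python
import re
from typing import Callable, Dict, List

# Hypothetical post-hoc rules (NOT the study's rules): contexts in which
# the word "fall" does not describe an actual fall event.
NON_EVENT_PATTERNS = [
    re.compile(r"\b(risk|prevention) of fall", re.IGNORECASE),
    re.compile(r"\bfall (risk|precautions?)\b", re.IGNORECASE),
    re.compile(r"\bfall(ing)? asleep\b", re.IGNORECASE),
    re.compile(r"\bfall (semester|season|of \d{4})\b", re.IGNORECASE),
]


def keyword_model(sentence: str) -> int:
    """Stand-in for a trained classifier (assumption): flags any fall keyword."""
    return int(bool(re.search(r"\b(fall\w*|fell)\b", sentence, re.IGNORECASE)))


def hybrid_predict(sentence: str, model_predict: Callable[[str], int]) -> int:
    """Return 1 for a fall event, 0 otherwise.

    The rules act only as a veto on positive model outputs.
    """
    label = model_predict(sentence)
    if label == 1 and any(p.search(sentence) for p in NON_EVENT_PATTERNS):
        return 0
    return label


def document_label(sentence_labels: List[int]) -> int:
    """Assumed roll-up: a document is positive if any sentence is positive."""
    return int(any(sentence_labels))


def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of positive predictive value and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics named in the abstract."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1_score(ppv, sensitivity),
    }
```

As a consistency check, `f1_score(0.988, 0.954)` evaluates to roughly 0.971, which agrees with the sentence-level F1 score reported above. In a real pipeline, `keyword_model` would be replaced by the trained model's prediction function, and the rule list would come from an error analysis of that model's outputs.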

Original language	English (US)
Article number	104736
Journal	International Journal of Medical Informatics
Volume	162
DOIs	https://doi.org/10.1016/j.ijmedinf.2022.104736
State	Published - Jun 2022

Keywords

BERT
EHR
Fall
NLP

ASJC Scopus subject areas

Health Informatics

Cite this

@article{31a7c0c947594c7ba2cecd560d2d0d32,

title = "A hybrid model to identify fall occurrence from electronic health records",

abstract = "Introduction: Falls are a leading cause of unintentional injury in the elderly. Electronic health records (EHRs) offer a unique opportunity to develop models that can identify fall events. However, identifying fall events in clinical notes requires advanced natural language processing (NLP) to simultaneously address multiple issues because the word “fall” is a typical homonym. Methods: We implemented a context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), to identify falls from the EHR text and further fused the BERT model into a hybrid architecture coupled with post-hoc heuristic rules to enhance the performance. The models were evaluated on real-world EHR data and were compared to conventional rule-based and deep learning models (CNN and Bi-LSTM). To better understand the ability of each approach to identify falls, we further categorized fall-related concepts (i.e., risk of fall, prevention of fall, homonym) and performed a detailed error analysis. Results: The hybrid model achieved the highest F1 score at the sentence (0.971), document (0.985), and patient (0.954) levels. At the sentence level (the basic data unit in the model), the hybrid model achieved a sensitivity, specificity, positive predictive value, and negative predictive value of 0.954, 1.000, 0.988, and 0.999, respectively. The error analysis showed that machine learning-based approaches demonstrated higher performance than a rule-based approach in challenging cases that required contextual understanding. The context-aware language model (BERT) slightly outperformed the word embedding approach trained on Bi-LSTM. No single model yielded the best performance for all fall-related semantic categories. Conclusion: A context-aware language model (BERT) was able to identify challenging fall events that require contextual understanding in EHR free text. The hybrid model combined with post-hoc rules allowed a custom fix on the BERT outcomes and further improved the performance of fall detection.",

keywords = "BERT, EHR, Fall, NLP",

author = "Sunyang Fu and Bjoerg Thorsteinsdottir and Xin Zhang and Lopes, {Guilherme S.} and Pagali, {Sandeep R.} and LeBrasseur, {Nathan K.} and Andrew Wen and Hongfang Liu and Rocca, {Walter A.} and Olson, {Janet E.} and {St Sauver}, Jennifer and Sunghwan Sohn",

note = "Publisher Copyright: {\textcopyright} 2022 Elsevier B.V.",

year = "2022",

month = jun,

doi = "10.1016/j.ijmedinf.2022.104736",

language = "English (US)",

volume = "162",

journal = "International Journal of Medical Informatics",

issn = "1386-5056",

publisher = "Elsevier Ireland Ltd",

}

TY - JOUR

T1 - A hybrid model to identify fall occurrence from electronic health records

AU - Fu, Sunyang

AU - Thorsteinsdottir, Bjoerg

AU - Zhang, Xin

AU - Lopes, Guilherme S.

AU - Pagali, Sandeep R.

AU - LeBrasseur, Nathan K.

AU - Wen, Andrew

AU - Liu, Hongfang

AU - Rocca, Walter A.

AU - Olson, Janet E.

AU - St Sauver, Jennifer

AU - Sohn, Sunghwan

PY - 2022/6

Y1 - 2022/6

N2 - Introduction: Falls are a leading cause of unintentional injury in the elderly. Electronic health records (EHRs) offer a unique opportunity to develop models that can identify fall events. However, identifying fall events in clinical notes requires advanced natural language processing (NLP) to simultaneously address multiple issues because the word “fall” is a typical homonym. Methods: We implemented a context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), to identify falls from the EHR text and further fused the BERT model into a hybrid architecture coupled with post-hoc heuristic rules to enhance the performance. The models were evaluated on real-world EHR data and were compared to conventional rule-based and deep learning models (CNN and Bi-LSTM). To better understand the ability of each approach to identify falls, we further categorized fall-related concepts (i.e., risk of fall, prevention of fall, homonym) and performed a detailed error analysis. Results: The hybrid model achieved the highest F1 score at the sentence (0.971), document (0.985), and patient (0.954) levels. At the sentence level (the basic data unit in the model), the hybrid model achieved a sensitivity, specificity, positive predictive value, and negative predictive value of 0.954, 1.000, 0.988, and 0.999, respectively. The error analysis showed that machine learning-based approaches demonstrated higher performance than a rule-based approach in challenging cases that required contextual understanding. The context-aware language model (BERT) slightly outperformed the word embedding approach trained on Bi-LSTM. No single model yielded the best performance for all fall-related semantic categories. Conclusion: A context-aware language model (BERT) was able to identify challenging fall events that require contextual understanding in EHR free text. The hybrid model combined with post-hoc rules allowed a custom fix on the BERT outcomes and further improved the performance of fall detection.

KW - BERT

KW - EHR

KW - Fall

KW - NLP

UR - http://www.scopus.com/inward/record.url?scp=85126622773&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85126622773&partnerID=8YFLogxK

U2 - 10.1016/j.ijmedinf.2022.104736

DO - 10.1016/j.ijmedinf.2022.104736

M3 - Article

C2 - 35316697

AN - SCOPUS:85126622773

SN - 1386-5056

VL - 162

JO - International Journal of Medical Informatics

JF - International Journal of Medical Informatics

M1 - 104736

ER -
