TY - JOUR
T1 - A hybrid model to identify fall occurrence from electronic health records
AU - Fu, Sunyang
AU - Thorsteinsdottir, Bjoerg
AU - Zhang, Xin
AU - Lopes, Guilherme S.
AU - Pagali, Sandeep R.
AU - LeBrasseur, Nathan K.
AU - Wen, Andrew
AU - Liu, Hongfang
AU - Rocca, Walter A.
AU - Olson, Janet E.
AU - St Sauver, Jennifer
AU - Sohn, Sunghwan
N1 - Funding Information:
This work was supported by National Institute on Aging R21 AG58738, National Institute on Aging R01 AG34676, and National Institute on Aging R01 AG068007.
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/6
Y1 - 2022/6
N2 - Introduction: Falls are a leading cause of unintentional injury in the elderly. Electronic health records (EHRs) offer a unique opportunity to develop models that can identify fall events. However, identifying fall events in clinical notes requires advanced natural language processing (NLP) to simultaneously address multiple issues because the word “fall” is a typical homonym. Methods: We implemented a context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), to identify falls from the EHR text and further fused the BERT model into a hybrid architecture coupled with post-hoc heuristic rules to enhance the performance. The models were evaluated on real-world EHR data and were compared to conventional rule-based and deep learning models (CNN and Bi-LSTM). To better understand the ability of each approach to identify falls, we further categorized fall-related concepts (i.e., risk of fall, prevention of fall, homonym) and performed a detailed error analysis. Results: The hybrid model achieved the highest F1-score at the sentence (0.971), document (0.985), and patient (0.954) levels. At the sentence level (the basic data unit in the model), the hybrid model achieved 0.954, 1.000, 0.988, and 0.999 in sensitivity, specificity, positive predictive value, and negative predictive value, respectively. The error analysis showed that machine learning-based approaches demonstrated higher performance than a rule-based approach in challenging cases that required contextual understanding. The context-aware language model (BERT) slightly outperformed the word embedding approach trained on Bi-LSTM. No single model yielded the best performance for all fall-related semantic categories. Conclusion: A context-aware language model (BERT) was able to identify challenging fall events that require contextual understanding in EHR free text. The hybrid model combined with post-hoc rules allowed a custom fix on the BERT outcomes and further improved the performance of fall detection.
AB - Introduction: Falls are a leading cause of unintentional injury in the elderly. Electronic health records (EHRs) offer a unique opportunity to develop models that can identify fall events. However, identifying fall events in clinical notes requires advanced natural language processing (NLP) to simultaneously address multiple issues because the word “fall” is a typical homonym. Methods: We implemented a context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), to identify falls from the EHR text and further fused the BERT model into a hybrid architecture coupled with post-hoc heuristic rules to enhance the performance. The models were evaluated on real-world EHR data and were compared to conventional rule-based and deep learning models (CNN and Bi-LSTM). To better understand the ability of each approach to identify falls, we further categorized fall-related concepts (i.e., risk of fall, prevention of fall, homonym) and performed a detailed error analysis. Results: The hybrid model achieved the highest F1-score at the sentence (0.971), document (0.985), and patient (0.954) levels. At the sentence level (the basic data unit in the model), the hybrid model achieved 0.954, 1.000, 0.988, and 0.999 in sensitivity, specificity, positive predictive value, and negative predictive value, respectively. The error analysis showed that machine learning-based approaches demonstrated higher performance than a rule-based approach in challenging cases that required contextual understanding. The context-aware language model (BERT) slightly outperformed the word embedding approach trained on Bi-LSTM. No single model yielded the best performance for all fall-related semantic categories. Conclusion: A context-aware language model (BERT) was able to identify challenging fall events that require contextual understanding in EHR free text. The hybrid model combined with post-hoc rules allowed a custom fix on the BERT outcomes and further improved the performance of fall detection.
KW - BERT
KW - EHR
KW - Fall
KW - NLP
UR - http://www.scopus.com/inward/record.url?scp=85126622773&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126622773&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2022.104736
DO - 10.1016/j.ijmedinf.2022.104736
M3 - Article
AN - SCOPUS:85126622773
SN - 1386-5056
VL - 162
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
M1 - 104736
ER -