TY - JOUR
T1 - A fast, resource efficient, and reliable rule-based system for COVID-19 symptom identification
AU - Sahoo, Himanshu S.
AU - Silverman, Greg M.
AU - Ingraham, Nicholas E.
AU - Lupei, Monica I.
AU - Puskarich, Michael A.
AU - Finzel, Raymond L.
AU - Sartori, John
AU - Zhang, Rui
AU - Knoll, Benjamin C.
AU - Liu, Sijia
AU - Liu, Hongfang
AU - Melton, Genevieve B.
AU - Tignanelli, Christopher J.
AU - Pakhomov, Serguei V.S.
N1 - Publisher Copyright: © 2021 The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2021/7/1
Y1 - 2021/7/1
AB - Objective: With COVID-19, there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, which limits real-world integration with CDS. A potential solution to these issues is the rule-based gazetteer developed at our institution. Materials and Methods: Performance, resource utilization, and runtime of the rule-based gazetteer were compared with those of five annotation systems: BioMedICUS, cTAKES, MetaMap, CLAMP, and MedTagger. Results: The rule-based gazetteer was the fastest, had a low resource footprint, and had similar performance on weighted microaverage and macroaverage measures of precision, recall, and F1-score compared with the other annotation systems. Discussion: Opportunities to increase its performance include fine-tuning lexical rules for symptom identification. Additionally, it could be run on multiple compute nodes for faster runtime. Conclusion: The rule-based gazetteer overcame key technical limitations, facilitating real-time identification of COVID-19 symptomatology and integration of unstructured data elements into our CDS. It is ideal for large-scale deployment across a wide variety of healthcare settings for surveillance of acute COVID-19 symptoms and their integration into prognostic modeling. Such a system is currently being leveraged to monitor the progression of postacute sequelae of COVID-19 (PASC) in COVID-19 survivors. This study conducted the first in-depth analysis and developed a rule-based gazetteer for COVID-19 symptom extraction with the following key features: low processor and memory utilization, faster runtime, and similar weighted microaverage and macroaverage measures of precision, recall, and F1-score compared with industry-standard annotation systems.
KW - Natural language processing
KW - artificial intelligence
KW - clinical decision support systems
KW - follow-up studies
KW - information extraction
KW - signs and symptoms
UR - http://www.scopus.com/inward/record.url?scp=85118120384&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118120384&partnerID=8YFLogxK
U2 - 10.1093/jamiaopen/ooab070
DO - 10.1093/jamiaopen/ooab070
M3 - Article
AN - SCOPUS:85118120384
SN - 2574-2531
VL - 4
JO - JAMIA Open
JF - JAMIA Open
IS - 3
M1 - ooab070
ER -