A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19

David Oniani, Yanshan Wang

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

1 Scopus citation

Abstract

COVID-19 (Coronavirus Disease 2019) has resulted in an ongoing pandemic and, as of 26 July 2020, had caused more than 15.7 million cases and over 640,000 deaths. The highly dynamic and rapidly evolving situation with COVID-19 has made it difficult to access accurate, on-demand information about the disease. Online communities, forums, and social media are potential venues for searching for relevant questions and answers, or for posting questions and seeking answers from other members. However, because of the nature of such sites, only a limited number of relevant questions and responses are available to search, and posted questions are rarely answered immediately. Advances in natural language processing, particularly in language models, have made it possible to design chatbots that automatically answer consumer questions. However, such models are rarely applied and evaluated in the healthcare domain to meet information needs with accurate and up-to-date healthcare data. In this paper, we propose applying a language model to automatically answer questions related to COVID-19 and qualitatively evaluate the generated responses. We used the GPT-2 language model and applied transfer learning to retrain it on the COVID-19 Open Research Dataset (CORD-19) corpus. To improve the quality of the generated responses, we applied four approaches, namely tf-idf (Term Frequency-Inverse Document Frequency), Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT), and Universal Sentence Encoder (USE), to filter and retain relevant sentences in the responses. In the performance evaluation step, we asked two medical experts to rate the responses. We found that BERT and BioBERT, on average, outperform both tf-idf and USE in relevance-based sentence filtering tasks.
Additionally, based on the chatbot, we created a user-friendly interactive web application for online hosting and made its source code freely available to anyone interested in running it locally, online, or for experimentation. Overall, our work yielded significant results both in designing a chatbot that produces high-quality responses to COVID-19-related questions and in comparing several embedding generation techniques.
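The relevance-based sentence filtering step described in the abstract can be illustrated with the tf-idf variant. The sketch below is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: it assumes a naive regex sentence splitter and tokenizer, a smoothed idf, and a hypothetical cosine-similarity cutoff of 0.1. The names `filter_response` and `threshold` are hypothetical. Each sentence of a generated response is scored against the question and kept only if its similarity clears the cutoff.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens only (not the paper's tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text: str) -> list[str]:
    # Assumption: naive split after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    # Build sparse tf-idf vectors over the small corpus (question + sentences).
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1
        vec = {}
        for term, count in tf.items():
            idf = math.log((1 + n) / (1 + df[term])) + 1.0  # smoothed idf
            vec[term] = (count / total) * idf
        vectors.append(vec)
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_response(question: str, response: str, threshold: float = 0.1) -> list[str]:
    # Hypothetical helper: keep response sentences similar enough to the question.
    sentences = split_sentences(response)
    docs = [tokenize(question)] + [tokenize(s) for s in sentences]
    vectors = tfidf_vectors(docs)
    q_vec = vectors[0]
    return [s for s, v in zip(sentences, vectors[1:]) if cosine(q_vec, v) >= threshold]


if __name__ == "__main__":
    question = "How is COVID-19 transmitted?"
    response = (
        "COVID-19 is transmitted mainly through respiratory droplets. "
        "The stock market fell sharply. "
        "Close contact increases transmission risk."
    )
    print(filter_response(question, response))
```

Running the example keeps only the first sentence. The third sentence is dropped even though it is on topic, because "transmission" and "transmitted" are different surface tokens. This lexical-matching limitation is what embedding-based scorers such as BERT, BioBERT, or USE are meant to address: in those variants, the tf-idf vectors would be replaced by sentence embeddings while the similarity-and-cutoff logic stays the same.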

Original language: English (US)
Title of host publication: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450379649
State: Published - Sep 21 2020
Event: 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020 - Virtual, Online, United States
Duration: Sep 21 2020 to Sep 24 2020

Publication series

Name: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020

Conference

Conference: 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
Country: United States
City: Virtual, Online
Period: 9/21/20 to 9/24/20

Keywords

  • ai
  • bert
  • biobert
  • cord-19
  • covid-19
  • dataset
  • gpt-2
  • nlp
  • semantic similarity
  • tf-idf
  • use

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Biomedical Engineering
  • Health Informatics

