Adapting and evaluating a deep learning language model for clinical why-question answering

Andrew Wen; Mohamed Y. Elwazir; Sungrim Moon; Jungwei Fan

doi:10.1093/JAMIAOPEN/OOZ072

Digital Health Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: To adapt and evaluate a deep learning language model for answering why-questions based on patient-specific clinical text. Materials and Methods: Bidirectional encoder representations from transformers (BERT) models were trained with varying data sources to perform SQuAD 2.0 style why-question answering (why-QA) on clinical notes. The evaluation focused on: (1) comparing the merits from different training data and (2) error analysis. Results: The best model achieved an accuracy of 0.707 (or 0.760 by partial match). Training toward customization for the clinical language helped increase 6% in accuracy. Discussion: The error analysis suggested that the model did not really perform deep reasoning and that clinical why-QA might warrant more sophisticated solutions. Conclusion: The BERT model achieved moderate accuracy in clinical why-QA and should benefit from the rapidly evolving technology. Despite the identified limitations, it could serve as a competent proxy for question-driven clinical information extraction.

Original language	English (US)
Pages (from-to)	16-20
Number of pages	5
Journal	JAMIA Open
Volume	3
Issue number	1
DOIs	https://doi.org/10.1093/JAMIAOPEN/OOZ072
State	Published - 2021

Keywords

Artificial intelligence
Clinical decision-making
Evaluation studies
Natural language processing
Question answering

ASJC Scopus subject areas

Health Informatics

Access to Document

10.1093/JAMIAOPEN/OOZ072

Cite this

@article{2cf3132e89ae4ceda0f27adb986da344,
  title = "Adapting and evaluating a deep learning language model for clinical why-question answering",
  abstract = "Objectives: To adapt and evaluate a deep learning language model for answering why-questions based on patient-specific clinical text. Materials and Methods: Bidirectional encoder representations from transformers (BERT) models were trained with varying data sources to perform SQuAD 2.0 style why-question answering (why-QA) on clinical notes. The evaluation focused on: (1) comparing the merits from different training data and (2) error analysis. Results: The best model achieved an accuracy of 0.707 (or 0.760 by partial match). Training toward customization for the clinical language helped increase 6% in accuracy. Discussion: The error analysis suggested that the model did not really perform deep reasoning and that clinical why-QA might warrant more sophisticated solutions. Conclusion: The BERT model achieved moderate accuracy in clinical why-QA and should benefit from the rapidly evolving technology. Despite the identified limitations, it could serve as a competent proxy for question-driven clinical information extraction.",
  keywords = "Artificial intelligence, Clinical decision-making, Evaluation studies, Natural language processing, Question answering",
  author = "Andrew Wen and Elwazir, {Mohamed Y.} and Sungrim Moon and Jungwei Fan",
  note = "Publisher Copyright: {\textcopyright} The Author(s) 2020.",
  year = "2021",
  doi = "10.1093/JAMIAOPEN/OOZ072",
  language = "English (US)",
  volume = "3",
  pages = "16--20",
  journal = "JAMIA Open",
  issn = "2574-2531",
  publisher = "Oxford University Press",
  number = "1",
}

TY  - JOUR
T1  - Adapting and evaluating a deep learning language model for clinical why-question answering
AU  - Wen, Andrew
AU  - Elwazir, Mohamed Y.
AU  - Moon, Sungrim
AU  - Fan, Jungwei
N1  - Publisher Copyright: © The Author(s) 2020.
PY  - 2021
Y1  - 2021
N2  - Objectives: To adapt and evaluate a deep learning language model for answering why-questions based on patient-specific clinical text. Materials and Methods: Bidirectional encoder representations from transformers (BERT) models were trained with varying data sources to perform SQuAD 2.0 style why-question answering (why-QA) on clinical notes. The evaluation focused on: (1) comparing the merits from different training data and (2) error analysis. Results: The best model achieved an accuracy of 0.707 (or 0.760 by partial match). Training toward customization for the clinical language helped increase 6% in accuracy. Discussion: The error analysis suggested that the model did not really perform deep reasoning and that clinical why-QA might warrant more sophisticated solutions. Conclusion: The BERT model achieved moderate accuracy in clinical why-QA and should benefit from the rapidly evolving technology. Despite the identified limitations, it could serve as a competent proxy for question-driven clinical information extraction.
AB  - Objectives: To adapt and evaluate a deep learning language model for answering why-questions based on patient-specific clinical text. Materials and Methods: Bidirectional encoder representations from transformers (BERT) models were trained with varying data sources to perform SQuAD 2.0 style why-question answering (why-QA) on clinical notes. The evaluation focused on: (1) comparing the merits from different training data and (2) error analysis. Results: The best model achieved an accuracy of 0.707 (or 0.760 by partial match). Training toward customization for the clinical language helped increase 6% in accuracy. Discussion: The error analysis suggested that the model did not really perform deep reasoning and that clinical why-QA might warrant more sophisticated solutions. Conclusion: The BERT model achieved moderate accuracy in clinical why-QA and should benefit from the rapidly evolving technology. Despite the identified limitations, it could serve as a competent proxy for question-driven clinical information extraction.
KW  - Artificial intelligence
KW  - Clinical decision-making
KW  - Evaluation studies
KW  - Natural language processing
KW  - Question answering
UR  - http://www.scopus.com/inward/record.url?scp=85100647550&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85100647550&partnerID=8YFLogxK
U2  - 10.1093/JAMIAOPEN/OOZ072
DO  - 10.1093/JAMIAOPEN/OOZ072
M3  - Article
AN  - SCOPUS:85100647550
SN  - 2574-2531
VL  - 3
SP  - 16
EP  - 20
JO  - JAMIA Open
JF  - JAMIA Open
IS  - 1
ER  - 
