Abstract
Objectives: To adapt and evaluate a deep learning language model for answering why-questions based on patient-specific clinical text. Materials and Methods: Bidirectional encoder representations from transformers (BERT) models were trained with varying data sources to perform SQuAD 2.0-style why-question answering (why-QA) on clinical notes. The evaluation focused on: (1) comparing the merits of different training data sources and (2) error analysis. Results: The best model achieved an accuracy of 0.707 (or 0.760 by partial match). Training that customized the model for clinical language increased accuracy by 6%. Discussion: The error analysis suggested that the model did not truly perform deep reasoning and that clinical why-QA might warrant more sophisticated solutions. Conclusion: The BERT model achieved moderate accuracy in clinical why-QA and should benefit from the rapidly evolving technology. Despite the identified limitations, it could serve as a competent proxy for question-driven clinical information extraction.
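The abstract reports accuracy under both exact and partial match. The following is a minimal Python sketch of how two such scores could be computed for SQuAD 2.0-style predictions. It is not the authors' evaluation code. The partial-match rule (any shared token between predicted and gold answer), the whitespace tokenization, the use of an empty string for "no answer", and the example pairs are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace; an empty list stands for 'no answer'."""
    return text.lower().split()


def exact_match(pred: str, gold: str) -> bool:
    """True when the normalized token sequences are identical."""
    return normalize(pred) == normalize(gold)


def partial_match(pred: str, gold: str) -> bool:
    """Assumed rule: the two answers share at least one token.

    If either side is 'no answer' (empty), both must be empty to match.
    """
    p, g = normalize(pred), normalize(gold)
    if not p or not g:
        return p == g
    return bool(set(p) & set(g))


def accuracy(pairs: List[Tuple[str, str]],
             match: Callable[[str, str], bool]) -> float:
    """Fraction of (prediction, gold) pairs accepted by the match function."""
    return sum(match(p, g) for p, g in pairs) / len(pairs)


# Hypothetical (prediction, gold) pairs; "" marks an unanswerable question.
pairs = [
    ("chest pain", "chest pain"),  # exact and partial match
    ("pain", "chest pain"),        # partial match only
    ("", ""),                      # correctly predicted as unanswerable
    ("nausea", ""),                # answered an unanswerable question
]

exact_acc = accuracy(pairs, exact_match)
partial_acc = accuracy(pairs, partial_match)
```

Under this assumed rule, partial-match accuracy can never be lower than exact-match accuracy, which is consistent with the ordering of the two figures reported above.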
Original language | English (US) |
---|---|
Pages (from-to) | 16-20 |
Number of pages | 5 |
Journal | JAMIA Open |
Volume | 3 |
Issue number | 1 |
State | Published - 2021 |
Keywords
- Artificial intelligence
- Clinical decision-making
- Evaluation studies
- Natural language processing
- Question answering
ASJC Scopus subject areas
- Health Informatics