Analysis of Language Embeddings for Classification of Unstructured Pathology Reports

Aishwarya Krishna Allada, Yuanxin Wang, Veni Jindal, Morteza Babaie, H. R. Tizhoosh, Mark Crowley

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

A pathology report is one of the most significant medical documents, providing interpretive insights into the visual appearance of the patient's biopsy sample. In digital pathology, high-resolution images of tissue samples are stored along with pathology reports. Despite the valuable information that pathology reports hold, they are not used in any systematic manner to promote computational pathology. In this work, we focus on analyzing the reports, which are generally unstructured documents written in English with sophisticated and highly specialized medical terminology. We provide a comparative analysis of pre-trained embedding models such as BioBERT, Clinical BioBERT, and BioMed-RoBERTa; of Term Frequency-Inverse Document Frequency (TF-IDF), a traditional NLP technique; and of combinations of embeddings from the pre-trained models with TF-IDF. Our results demonstrate the effectiveness of various word embedding techniques for classifying pathology reports.
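The abstract does not state which TF-IDF variant was used or how the pre-trained embeddings and TF-IDF features were combined. Purely as an illustration, the minimal sketch below computes L2-normalised TF-IDF vectors with a smoothed IDF, log((1 + N) / (1 + df)) + 1 (the variant scikit-learn uses by default), and then fuses a TF-IDF vector with a dense document embedding by simple concatenation. The naive whitespace tokenizer, the concatenation-based fusion, the function names `tfidf_vectors` and `combine`, the toy report snippets, and the placeholder embedding values are all hypothetical assumptions, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace.
    # Real pathology reports would need handling of punctuation and abbreviations.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalised TF-IDF vectors) for the docs.

    Term frequency is the raw count; IDF is smoothed as
    log((1 + N) / (1 + df)) + 1, where N is the number of documents.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def combine(tfidf_vec, embedding):
    # Hypothetical fusion: concatenate the sparse TF-IDF vector with a dense
    # document embedding (e.g. a pooled output of a pre-trained language model).
    return list(tfidf_vec) + list(embedding)


# Toy usage with made-up report snippets.
reports = [
    "invasive ductal carcinoma grade 2",
    "benign fibroadenoma no carcinoma",
    "ductal carcinoma in situ",
]
vocab, vecs = tfidf_vectors(reports)
placeholder_embedding = [0.1, -0.2, 0.3]  # stands in for a real model output
features = combine(vecs[0], placeholder_embedding)
print(len(vocab), len(features))  # prints: 10 13
```

Because "carcinoma" appears in every toy report, its IDF is the minimum value of 1, so within the first report it receives a lower weight than the rarer term "invasive". Concatenation is only one possible fusion choice; weighting token embeddings by their TF-IDF scores would be another.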

Original language: English (US)
Title of host publication: 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2378-2381
Number of pages: 4
ISBN (Electronic): 9781728111797
State: Published - 2021
Event: 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2021 - Virtual, Online, Mexico
Duration: Nov 1 2021 to Nov 5 2021

Publication series

Name: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
ISSN (Print): 1557-170X

Conference

Conference: 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2021
Country/Territory: Mexico
City: Virtual, Online
Period: 11/1/21 to 11/5/21

ASJC Scopus subject areas

  • Signal Processing
  • Biomedical Engineering
  • Computer Vision and Pattern Recognition
  • Health Informatics
