TY - GEN
T1 - Analysis of Language Embeddings for Classification of Unstructured Pathology Reports
AU - Allada, Aishwarya Krishna
AU - Wang, Yuanxin
AU - Jindal, Veni
AU - Babee, Morteza
AU - Tizhoosh, H. R.
AU - Crowley, Mark
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
AB - A pathology report is one of the most significant medical documents providing interpretive insights into the visual appearance of the patient's biopsy sample. In digital pathology, high-resolution images of tissue samples are stored along with pathology reports. Despite the valuable information that pathology reports hold, they are not used in any systematic manner to promote computational pathology. In this work, we focus on analyzing the reports, which are generally unstructured documents written in English with sophisticated and highly specialized medical terminology. We provide a comparative analysis of various embedding models like BioBERT, Clinical BioBERT, BioMed-RoBERTa and Term Frequency-Inverse Document Frequency (TF-IDF), a traditional NLP technique, as well as the combination of embeddings from pre-trained models with TF-IDF. Our results demonstrate the effectiveness of various word embedding techniques for pathology reports.
UR - http://www.scopus.com/inward/record.url?scp=85122538999&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122538999&partnerID=8YFLogxK
U2 - 10.1109/EMBC46164.2021.9630347
DO - 10.1109/EMBC46164.2021.9630347
M3 - Conference contribution
C2 - 34891760
AN - SCOPUS:85122538999
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
SP - 2378
EP - 2381
BT - 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2021
Y2 - 1 November 2021 through 5 November 2021
ER -