The aim of this study is to develop an automated method for classifying Brain Tumor Reporting and Data System (BT-RADS) categories from unstructured and structured brain magnetic resonance (MR) imaging reports. This retrospective study included 1410 BT-RADS structured reports dated from January 2014 to December 2017 and a test set of 109 unstructured brain MR reports dated from January 2010 to December 2014. Text vector representations and semantic word embeddings were generated from individual report sections (i.e., “History,” “Findings,” etc.) using TF-IDF statistics and a fine-tuned word2vec model, respectively. Section-wise ensemble models were trained using gradient boosting (XGBoost), elastic net regularization, and random forests, and classification accuracy was evaluated on an independent test set of unstructured brain MR reports and a validation set of BT-RADS structured reports. When classifying unstructured reports, section-wise ensemble models using XGBoost and word2vec semantic word embeddings were more accurate than those using TF-IDF statistics, with an F1 score of 0.72. In contrast, when classifying structured reports, models using traditional TF-IDF statistics outperformed the word2vec semantic approach, with an F1 score of 0.98. The proposed natural language processing pipeline is capable of inferring BT-RADS scores from unstructured reports after training on structured report data. Our study provides a detailed experimentation process and may provide guidance for the development of RADS-focused information extraction (IE) applications from structured and unstructured radiology reports.
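As a rough illustration of the section-wise TF-IDF step described above, the following minimal sketch fits a separate vocabulary and weighting for each report section. It is not the study's implementation: the toy reports, the whitespace tokenizer, the helper names (`tokenize`, `fit_idf`, `tfidf_vector`), and the smoothed IDF formula are all assumptions chosen for brevity, and the downstream section-wise classifiers are not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the study's tokenizer is not stated.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    # This is one common variant, assumed here for illustration.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf):
    # Term frequency times IDF, L2-normalized; terms outside the vocabulary are ignored.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


# Hypothetical toy reports, already split into sections.
reports = [
    {"History": "glioblastoma status post resection",
     "Findings": "stable enhancement no new lesion"},
    {"History": "glioblastoma follow up",
     "Findings": "increased enhancement new lesion"},
]

# One IDF table per section, so each section gets its own feature space.
sections = ("History", "Findings")
section_idf = {s: fit_idf([r[s] for r in reports]) for s in sections}

# One sparse vector per report per section.
section_vectors = [
    {s: tfidf_vector(r[s], section_idf[s]) for s in sections} for r in reports
]
```

In a pipeline of this kind, each section's vectors would feed a separate classifier whose outputs are then combined; terms shared by every report in a section (here, "glioblastoma" in "History") receive the lowest weight.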
- Deep learning
- Distributional semantics
- Text mining
ASJC Scopus subject areas
- Radiological and Ultrasound Technology
- Radiology, Nuclear Medicine and Imaging
- Computer Science Applications