Purpose: Conventional model observers (MOs) in CT are often limited to uniform backgrounds or to randomly varying backgrounds that can be modeled analytically. It is unclear whether these conventional MOs can be readily generalized to predict human observer performance in clinical CT tasks that involve realistic anatomical backgrounds. Deep-learning-based model observers (DL-MOs) have recently been developed, but they have not been validated for challenging low-contrast diagnostic tasks in abdominal CT. We therefore sought to validate a DL-MO for a low-contrast hepatic metastasis localization task.
Methods: We adapted our recently developed DL-MO framework to the liver metastasis localization task. Our previously validated projection-domain lesion- and noise-insertion techniques were used to synthesize realistic positive and low-dose abdominal CT exams from archived patient projection data. Ten experimental conditions were generated, covering different lesion sizes and contrasts, radiation dose levels, and image reconstruction types. Each condition included 100 trials generated from a patient cohort of 7 cases. Each trial was presented as liver image patches (160×160×5 voxels). The DL-MO performance was calculated for each condition and compared with human observer performance, which was obtained from three subspecialized radiologists in an observer study. The performance of the DL-MO and of the radiologists was measured by the area under the localization receiver operating characteristic (LROC) curve. The generalization performance of the DL-MO was estimated by repeated twofold cross-validation over the same set of trials used in the human observer study. A multi-slice channelized Hotelling observer (CHO) was compared with the DL-MO across the same experimental conditions.
Results: The performance of the DL-MO was highly correlated with that of the radiologists (Pearson's correlation coefficient: 0.987; 95% CI: [0.942, 0.997]). The performance level of the DL-MO was comparable to that of the grouped radiologists: the mean performance difference was -3.3%. Even before any internal noise was added, the CHO performed worse than the grouped radiologists. The correlation between the CHO and the radiologists was weaker (Pearson's correlation coefficient: 0.812; 95% CI: [0.378, 0.955]), and the corresponding performance bias (-29.5%) was statistically significant.
Conclusion: This study demonstrated the potential of using the DL-MO for image quality assessment in patient abdominal CT tasks.
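The agreement statistics reported in the Results (a Pearson correlation coefficient with a 95% confidence interval, computed across the ten experimental conditions) can be illustrated with a short sketch. The abstract does not state how the confidence intervals were obtained; the Fisher z-transformation used below is one common choice and is an assumption here, as are the function names `pearson_r` and `fisher_ci`. The per-condition values in the usage example are invented placeholders, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for r via the Fisher z-transformation.

    Assumption: this is a generic textbook interval, not necessarily the
    method used in the study. z_crit defaults to the two-sided 95% normal
    quantile; n is the number of paired observations (here, conditions).
    """
    if n < 4:
        raise ValueError("the Fisher interval needs at least 4 observations")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # Hypothetical per-condition LROC areas (placeholders, NOT study data).
    model_auc = [0.52, 0.61, 0.66, 0.70, 0.74, 0.79, 0.83, 0.87, 0.90, 0.94]
    human_auc = [0.55, 0.60, 0.69, 0.71, 0.78, 0.80, 0.85, 0.88, 0.93, 0.95]
    r = pearson_r(model_auc, human_auc)
    low, high = fisher_ci(r, len(model_auc))
    print(f"r = {r:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

With only ten conditions the interval is wide unless the correlation is very high, which is why a coefficient near 0.8 can still carry a lower bound well below 0.5.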
ASJC Scopus subject areas
- Radiology, Nuclear Medicine and Imaging