Deep learning predicts interval and screening-detected cancer from screening mammograms: A case-case-control study in 6369 women

Xun Zhu; Thomas K. Wolfgruber; Lambert Leong; Matthew Jensen; Christopher Scott; Stacey Winham; Peter Sadowski; Celine Vachon; Karla Kerlikowske; John A. Shepherd

doi:10.1148/radiol.2021203758

Deep learning predicts interval and screening-detected cancer from screening mammograms: A case-case-control study in 6369 women

Xun Zhu, Thomas K. Wolfgruber, Lambert Leong, Matthew Jensen, Christopher Scott, Stacey Winham, Peter Sadowski, Celine Vachon, Karla Kerlikowske, John A. Shepherd

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Background: The ability of deep learning (DL) models to classify women as at risk for either screening mammography-detected or interval cancer (not detected at mammography) has not yet been explored in the literature. Purpose: To examine the ability of DL models to estimate the risk of interval and screening-detected breast cancers with and without clinical risk factors. Materials and Methods: This study was performed on 25 096 digital screening mammograms obtained from January 2006 to December 2013. The mammograms were obtained in 6369 women without breast cancer, 1609 of whom developed screeningdetected breast cancer and 351 of whom developed interval invasive breast cancer. A DL model was trained on the negative mammograms to classify women into those who did not develop cancer and those who developed screening-detected cancer or interval invasive cancer. Model effectiveness was evaluated as a matched concordance statistic (C statistic) in a held-out 26% (1669 of 6369) test set of the mammograms. Results: The C statistics and odds ratios for comparing patients with screening-detected cancer versus matched controls were 0.66 (95% CI: 0.63, 0.69) and 1.25 (95% CI: 1.17, 1.33), respectively, for the DL model, 0.62 (95% CI: 0.59, 0.65) and 2.14 (95% CI: 1.32, 3.45) for the clinical risk factors with the Breast Imaging Reporting and Data System (BI-RADS) density model, and 0.66 (95% CI: 0.63, 0.69) and 1.21 (95% CI: 1.13, 1.30) for the combined DL and clinical risk factors model. For comparing patients with interval cancer versus controls, the C statistics and odds ratios were 0.64 (95% CI: 0.58, 0.71) and 1.26 (95% CI: 1.10, 1.45), respectively, for the DL model, 0.71 (95% CI: 0.65, 0.77) and 7.25 (95% CI: 2.94, 17.9) for the risk factors with BIRADS density (b rated vs non-b rated) model, and 0.72 (95% CI: 0.66, 0.78) and 1.10 (95% CI: 0.94, 1.29) for the combined DL and clinical risk factors model. The P values between the DL, BI-RADS, and combined model's ability to detect screen and interval cancer were .99, .002, and .03, respectively. Conclusion: The deep learning model outperformed in determining screening-detected cancer risk but underperformed for interval cancer risk when compared with clinical risk factors including breast density.

Original language	English (US)
Pages (from-to)	550-558
Number of pages	9
Journal	Radiology
Volume	301
Issue number	3
DOIs	https://doi.org/10.1148/radiol.2021203758
State	Published - Dec 2021

ASJC Scopus subject areas

Radiology Nuclear Medicine and imaging

Access to Document

10.1148/radiol.2021203758

Cite this

@article{face80b0a3424ae986e9596c370849e4,

title = "Deep learning predicts interval and screening-detected cancer from screening mammograms: A case-case-control study in 6369 women",

abstract = "Background: The ability of deep learning (DL) models to classify women as at risk for either screening mammography-detected or interval cancer (not detected at mammography) has not yet been explored in the literature. Purpose: To examine the ability of DL models to estimate the risk of interval and screening-detected breast cancers with and without clinical risk factors. Materials and Methods: This study was performed on 25 096 digital screening mammograms obtained from January 2006 to December 2013. The mammograms were obtained in 6369 women without breast cancer, 1609 of whom developed screeningdetected breast cancer and 351 of whom developed interval invasive breast cancer. A DL model was trained on the negative mammograms to classify women into those who did not develop cancer and those who developed screening-detected cancer or interval invasive cancer. Model effectiveness was evaluated as a matched concordance statistic (C statistic) in a held-out 26% (1669 of 6369) test set of the mammograms. Results: The C statistics and odds ratios for comparing patients with screening-detected cancer versus matched controls were 0.66 (95% CI: 0.63, 0.69) and 1.25 (95% CI: 1.17, 1.33), respectively, for the DL model, 0.62 (95% CI: 0.59, 0.65) and 2.14 (95% CI: 1.32, 3.45) for the clinical risk factors with the Breast Imaging Reporting and Data System (BI-RADS) density model, and 0.66 (95% CI: 0.63, 0.69) and 1.21 (95% CI: 1.13, 1.30) for the combined DL and clinical risk factors model. For comparing patients with interval cancer versus controls, the C statistics and odds ratios were 0.64 (95% CI: 0.58, 0.71) and 1.26 (95% CI: 1.10, 1.45), respectively, for the DL model, 0.71 (95% CI: 0.65, 0.77) and 7.25 (95% CI: 2.94, 17.9) for the risk factors with BIRADS density (b rated vs non-b rated) model, and 0.72 (95% CI: 0.66, 0.78) and 1.10 (95% CI: 0.94, 1.29) for the combined DL and clinical risk factors model. The P values between the DL, BI-RADS, and combined model's ability to detect screen and interval cancer were .99, .002, and .03, respectively. Conclusion: The deep learning model outperformed in determining screening-detected cancer risk but underperformed for interval cancer risk when compared with clinical risk factors including breast density.",

author = "Xun Zhu and Wolfgruber, {Thomas K.} and Lambert Leong and Matthew Jensen and Christopher Scott and Stacey Winham and Peter Sadowski and Celine Vachon and Karla Kerlikowske and Shepherd, {John A.}",

year = "2021",

month = dec,

doi = "10.1148/radiol.2021203758",

language = "English (US)",

volume = "301",

pages = "550--558",

journal = "Radiology",

issn = "0033-8419",

publisher = "Radiological Society of North America Inc.",

number = "3",

}

TY - JOUR

T1 - Deep learning predicts interval and screening-detected cancer from screening mammograms

T2 - A case-case-control study in 6369 women

AU - Zhu, Xun

AU - Wolfgruber, Thomas K.

AU - Leong, Lambert

AU - Jensen, Matthew

AU - Scott, Christopher

AU - Winham, Stacey

AU - Sadowski, Peter

AU - Vachon, Celine

AU - Kerlikowske, Karla

AU - Shepherd, John A.

PY - 2021/12

Y1 - 2021/12

N2 - Background: The ability of deep learning (DL) models to classify women as at risk for either screening mammography-detected or interval cancer (not detected at mammography) has not yet been explored in the literature. Purpose: To examine the ability of DL models to estimate the risk of interval and screening-detected breast cancers with and without clinical risk factors. Materials and Methods: This study was performed on 25 096 digital screening mammograms obtained from January 2006 to December 2013. The mammograms were obtained in 6369 women without breast cancer, 1609 of whom developed screeningdetected breast cancer and 351 of whom developed interval invasive breast cancer. A DL model was trained on the negative mammograms to classify women into those who did not develop cancer and those who developed screening-detected cancer or interval invasive cancer. Model effectiveness was evaluated as a matched concordance statistic (C statistic) in a held-out 26% (1669 of 6369) test set of the mammograms. Results: The C statistics and odds ratios for comparing patients with screening-detected cancer versus matched controls were 0.66 (95% CI: 0.63, 0.69) and 1.25 (95% CI: 1.17, 1.33), respectively, for the DL model, 0.62 (95% CI: 0.59, 0.65) and 2.14 (95% CI: 1.32, 3.45) for the clinical risk factors with the Breast Imaging Reporting and Data System (BI-RADS) density model, and 0.66 (95% CI: 0.63, 0.69) and 1.21 (95% CI: 1.13, 1.30) for the combined DL and clinical risk factors model. For comparing patients with interval cancer versus controls, the C statistics and odds ratios were 0.64 (95% CI: 0.58, 0.71) and 1.26 (95% CI: 1.10, 1.45), respectively, for the DL model, 0.71 (95% CI: 0.65, 0.77) and 7.25 (95% CI: 2.94, 17.9) for the risk factors with BIRADS density (b rated vs non-b rated) model, and 0.72 (95% CI: 0.66, 0.78) and 1.10 (95% CI: 0.94, 1.29) for the combined DL and clinical risk factors model. The P values between the DL, BI-RADS, and combined model's ability to detect screen and interval cancer were .99, .002, and .03, respectively. Conclusion: The deep learning model outperformed in determining screening-detected cancer risk but underperformed for interval cancer risk when compared with clinical risk factors including breast density.

AB - Background: The ability of deep learning (DL) models to classify women as at risk for either screening mammography-detected or interval cancer (not detected at mammography) has not yet been explored in the literature. Purpose: To examine the ability of DL models to estimate the risk of interval and screening-detected breast cancers with and without clinical risk factors. Materials and Methods: This study was performed on 25 096 digital screening mammograms obtained from January 2006 to December 2013. The mammograms were obtained in 6369 women without breast cancer, 1609 of whom developed screeningdetected breast cancer and 351 of whom developed interval invasive breast cancer. A DL model was trained on the negative mammograms to classify women into those who did not develop cancer and those who developed screening-detected cancer or interval invasive cancer. Model effectiveness was evaluated as a matched concordance statistic (C statistic) in a held-out 26% (1669 of 6369) test set of the mammograms. Results: The C statistics and odds ratios for comparing patients with screening-detected cancer versus matched controls were 0.66 (95% CI: 0.63, 0.69) and 1.25 (95% CI: 1.17, 1.33), respectively, for the DL model, 0.62 (95% CI: 0.59, 0.65) and 2.14 (95% CI: 1.32, 3.45) for the clinical risk factors with the Breast Imaging Reporting and Data System (BI-RADS) density model, and 0.66 (95% CI: 0.63, 0.69) and 1.21 (95% CI: 1.13, 1.30) for the combined DL and clinical risk factors model. For comparing patients with interval cancer versus controls, the C statistics and odds ratios were 0.64 (95% CI: 0.58, 0.71) and 1.26 (95% CI: 1.10, 1.45), respectively, for the DL model, 0.71 (95% CI: 0.65, 0.77) and 7.25 (95% CI: 2.94, 17.9) for the risk factors with BIRADS density (b rated vs non-b rated) model, and 0.72 (95% CI: 0.66, 0.78) and 1.10 (95% CI: 0.94, 1.29) for the combined DL and clinical risk factors model. The P values between the DL, BI-RADS, and combined model's ability to detect screen and interval cancer were .99, .002, and .03, respectively. Conclusion: The deep learning model outperformed in determining screening-detected cancer risk but underperformed for interval cancer risk when compared with clinical risk factors including breast density.

UR - http://www.scopus.com/inward/record.url?scp=85118476838&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85118476838&partnerID=8YFLogxK

U2 - 10.1148/radiol.2021203758

DO - 10.1148/radiol.2021203758

M3 - Article

C2 - 34491131

AN - SCOPUS:85118476838

SN - 0033-8419

VL - 301

SP - 550

EP - 558

JO - Radiology

JF - Radiology

IS - 3

ER -

Deep learning predicts interval and screening-detected cancer from screening mammograms: A case-case-control study in 6369 women

Abstract

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this