Comparing deep learning-based automatic segmentation of breast masses to expert interobserver variability in ultrasound imaging

Jeremy M. Webb; Shaheeda A. Adusei; Yinong Wang; Naziya Samreen; Kalie Adler; Duane D. Meixner; Robert T. Fazzio; Mostafa Fatemi; Azra Alizad

doi:10.1016/j.compbiomed.2021.104966

Physiology & Biomedical Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

Deep learning is a powerful tool that became practical in 2008 by harnessing the power of graphics processing units (GPUs), and it has developed rapidly in image, video, and natural language processing. There are ongoing developments in the application of deep learning to medical data for a variety of tasks across multiple imaging modalities. The reliability and repeatability of deep learning techniques are of utmost importance if deep learning is to be considered a tool for assisting experts, including physicians, radiologists, and sonographers. Owing to the high cost of labeling data, deep learning models are often evaluated against a single expert, and it is unknown whether their errors fall within a clinically acceptable range. Ultrasound is a commonly used imaging modality in breast cancer screening and for visually estimating risk using the Breast Imaging Reporting and Data System (BI-RADS) score. This process is highly dependent on the skills and experience of sonographers and radiologists, which leads to interobserver variability in interpretation. For these reasons, we propose an interobserver reliability study comparing the performance of a current top-performing deep learning segmentation model against three experts who manually segmented suspicious breast lesions in clinical ultrasound (US) images. We pretrained the model using a US thyroid segmentation dataset with 455 patients and 50,993 images, and trained the model using a US breast segmentation dataset with 733 patients and 29,884 images. We found a mean Fleiss kappa value of 0.78 for the agreement among the three experts in breast mass segmentation, compared with a mean Fleiss kappa value of 0.79 for the agreement among the experts and the optimized deep learning model.
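For readers unfamiliar with how a Fleiss kappa applies to segmentation, the sketch below computes it for binary masks, assuming each pixel is treated as a subject, each rater (expert or model) assigns it to "mass" (1) or "background" (0), kappa is computed per image, and the per-image values are then averaged. This is an illustrative assumption, not necessarily the exact protocol used in the study; the helper name fleiss_kappa_binary and the toy masks are hypothetical.

```python
from typing import Sequence


def fleiss_kappa_binary(masks: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for n raters labeling the same N pixels as 0 or 1.

    Hypothetical helper: masks holds one flattened binary mask per rater,
    all of the same length (pixels are treated as the rated subjects).
    """
    n = len(masks)  # number of raters
    if n < 2:
        raise ValueError("need at least two raters")
    N = len(masks[0])  # number of pixels (subjects)
    if N == 0 or any(len(m) != N for m in masks):
        raise ValueError("masks must be non-empty and the same length")

    p_bar_sum = 0.0
    total_ones = 0
    for i in range(N):
        ones = sum(m[i] for m in masks)  # raters labeling pixel i as "mass"
        zeros = n - ones                 # raters labeling pixel i as "background"
        total_ones += ones
        # Per-pixel agreement: P_i = (sum_j n_ij^2 - n) / (n (n - 1))
        p_bar_sum += (ones * ones + zeros * zeros - n) / (n * (n - 1))

    p_bar = p_bar_sum / N                # mean observed agreement
    p1 = total_ones / (N * n)            # overall proportion of "mass" labels
    p_e = p1 * p1 + (1 - p1) * (1 - p1)  # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa undefined: every rater used one label everywhere")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy example: three hypothetical raters on one 8-pixel "image".
    rater_a = [0, 0, 1, 1, 1, 1, 0, 0]
    rater_b = [0, 0, 1, 1, 1, 0, 0, 0]
    rater_c = [0, 1, 1, 1, 1, 1, 0, 0]
    per_image = [fleiss_kappa_binary([rater_a, rater_b, rater_c])]
    # Averaging over images, analogous to a "mean Fleiss kappa".
    print(f"mean kappa = {sum(per_image) / len(per_image):.3f}")  # prints "mean kappa = 0.667"
```

Comparing experts alone with experts plus the model, as in the abstract, would amount to calling the helper once with the three expert masks and once with the expert masks plus the model's mask, then averaging each over all images.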

Original language	English (US)
Article number	104966
Journal	Computers in Biology and Medicine
Volume	139
DOIs	https://doi.org/10.1016/j.compbiomed.2021.104966
State	Published - Dec 2021

Keywords

Automatic segmentation
Breast cancer
Deep learning
Interobserver variability
Ultrasound

ASJC Scopus subject areas

Health Informatics
Computer Science Applications

Cite this

@article{2a39ddd25b28433d991a35dfb2543bc5,
  title = "Comparing deep learning-based automatic segmentation of breast masses to expert interobserver variability in ultrasound imaging",
  abstract = "Deep learning is a powerful tool that became practical in 2008 by harnessing the power of graphics processing units (GPUs), and it has developed rapidly in image, video, and natural language processing. There are ongoing developments in the application of deep learning to medical data for a variety of tasks across multiple imaging modalities. The reliability and repeatability of deep learning techniques are of utmost importance if deep learning is to be considered a tool for assisting experts, including physicians, radiologists, and sonographers. Owing to the high cost of labeling data, deep learning models are often evaluated against a single expert, and it is unknown whether their errors fall within a clinically acceptable range. Ultrasound is a commonly used imaging modality in breast cancer screening and for visually estimating risk using the Breast Imaging Reporting and Data System (BI-RADS) score. This process is highly dependent on the skills and experience of sonographers and radiologists, which leads to interobserver variability in interpretation. For these reasons, we propose an interobserver reliability study comparing the performance of a current top-performing deep learning segmentation model against three experts who manually segmented suspicious breast lesions in clinical ultrasound (US) images. We pretrained the model using a US thyroid segmentation dataset with 455 patients and 50,993 images, and trained the model using a US breast segmentation dataset with 733 patients and 29,884 images. We found a mean Fleiss kappa value of 0.78 for the agreement among the three experts in breast mass segmentation, compared with a mean Fleiss kappa value of 0.79 for the agreement among the experts and the optimized deep learning model.",
  keywords = "Automatic segmentation, Breast cancer, Deep learning, Interobserver variability, Ultrasound",
  author = "Webb, {Jeremy M.} and Adusei, {Shaheeda A.} and Wang, Yinong and Samreen, Naziya and Adler, Kalie and Meixner, {Duane D.} and Fazzio, {Robert T.} and Fatemi, Mostafa and Alizad, Azra",
  note = "Publisher Copyright: {\textcopyright} 2021 The Authors",
  year = "2021",
  month = dec,
  doi = "10.1016/j.compbiomed.2021.104966",
  language = "English (US)",
  volume = "139",
  journal = "Computers in Biology and Medicine",
  issn = "0010-4825",
  publisher = "Elsevier Limited",
}

TY  - JOUR
T1  - Comparing deep learning-based automatic segmentation of breast masses to expert interobserver variability in ultrasound imaging
AU  - Webb, Jeremy M.
AU  - Adusei, Shaheeda A.
AU  - Wang, Yinong
AU  - Samreen, Naziya
AU  - Adler, Kalie
AU  - Meixner, Duane D.
AU  - Fazzio, Robert T.
AU  - Fatemi, Mostafa
AU  - Alizad, Azra
PY  - 2021/12
Y1  - 2021/12
N2  - Deep learning is a powerful tool that became practical in 2008 by harnessing the power of graphics processing units (GPUs), and it has developed rapidly in image, video, and natural language processing. There are ongoing developments in the application of deep learning to medical data for a variety of tasks across multiple imaging modalities. The reliability and repeatability of deep learning techniques are of utmost importance if deep learning is to be considered a tool for assisting experts, including physicians, radiologists, and sonographers. Owing to the high cost of labeling data, deep learning models are often evaluated against a single expert, and it is unknown whether their errors fall within a clinically acceptable range. Ultrasound is a commonly used imaging modality in breast cancer screening and for visually estimating risk using the Breast Imaging Reporting and Data System (BI-RADS) score. This process is highly dependent on the skills and experience of sonographers and radiologists, which leads to interobserver variability in interpretation. For these reasons, we propose an interobserver reliability study comparing the performance of a current top-performing deep learning segmentation model against three experts who manually segmented suspicious breast lesions in clinical ultrasound (US) images. We pretrained the model using a US thyroid segmentation dataset with 455 patients and 50,993 images, and trained the model using a US breast segmentation dataset with 733 patients and 29,884 images. We found a mean Fleiss kappa value of 0.78 for the agreement among the three experts in breast mass segmentation, compared with a mean Fleiss kappa value of 0.79 for the agreement among the experts and the optimized deep learning model.
AB  - Deep learning is a powerful tool that became practical in 2008 by harnessing the power of graphics processing units (GPUs), and it has developed rapidly in image, video, and natural language processing. There are ongoing developments in the application of deep learning to medical data for a variety of tasks across multiple imaging modalities. The reliability and repeatability of deep learning techniques are of utmost importance if deep learning is to be considered a tool for assisting experts, including physicians, radiologists, and sonographers. Owing to the high cost of labeling data, deep learning models are often evaluated against a single expert, and it is unknown whether their errors fall within a clinically acceptable range. Ultrasound is a commonly used imaging modality in breast cancer screening and for visually estimating risk using the Breast Imaging Reporting and Data System (BI-RADS) score. This process is highly dependent on the skills and experience of sonographers and radiologists, which leads to interobserver variability in interpretation. For these reasons, we propose an interobserver reliability study comparing the performance of a current top-performing deep learning segmentation model against three experts who manually segmented suspicious breast lesions in clinical ultrasound (US) images. We pretrained the model using a US thyroid segmentation dataset with 455 patients and 50,993 images, and trained the model using a US breast segmentation dataset with 733 patients and 29,884 images. We found a mean Fleiss kappa value of 0.78 for the agreement among the three experts in breast mass segmentation, compared with a mean Fleiss kappa value of 0.79 for the agreement among the experts and the optimized deep learning model.
KW  - Automatic segmentation
KW  - Breast cancer
KW  - Deep learning
KW  - Interobserver variability
KW  - Ultrasound
UR  - http://www.scopus.com/inward/record.url?scp=85117788600&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85117788600&partnerID=8YFLogxK
U2  - 10.1016/j.compbiomed.2021.104966
DO  - 10.1016/j.compbiomed.2021.104966
M3  - Article
C2  - 34715553
AN  - SCOPUS:85117788600
SN  - 0010-4825
VL  - 139
JO  - Computers in Biology and Medicine
JF  - Computers in Biology and Medicine
M1  - 104966
ER  -
