Deep neural network to locate and segment brain tumors outperformed the expert technicians who created the training data

Joseph Ross Mitchell; Konstantinos Kamnitsas; Kyle W. Singleton; Scott A. Whitmire; Kamala R. Clark-Swanson; Sara Ranjbar; Cassandra R. Rickertsen; Sandra K. Johnston; Kathleen M. Egan; Dana E. Rollison; John Arrington; Karl N. Krecke; Theodore J. Passe; Jared T. Verdoorn; Alex A. Nagelschneider; Carrie M. Carr; John D. Port; Alice Patton; Norbert G. Campeau; Greta B. Liebo; Laurence J. Eckel; Christopher P. Wood; Christopher H. Hunt; Prasanna Vibhute; Kent D. Nelson; Joseph M. Hoxworth; Ameet C. Patel; Brian W. Chong; Jeffrey S. Ross; Jerrold L. Boxerman; Michael A. Vogelbaum; Leland S. Hu; Ben Glocker; Kristin R. Swanson

doi:10.1117/1.JMI.7.5.055501

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Purpose: Deep learning (DL) algorithms have shown promising results for brain tumor segmentation in MRI. However, validation is required prior to routine clinical use. We report the first randomized and blinded comparison of DL and trained technician segmentations.

Approach: We compiled a multi-institutional database of 741 pretreatment MRI exams. Each contained a postcontrast T1-weighted exam, a T2-weighted fluid-attenuated inversion recovery exam, and at least one technician-derived tumor segmentation. The database included 729 unique patients (470 males and 259 females). Of these exams, 641 were used for training the DL system, and 100 were reserved for testing. We developed a platform to enable qualitative, blinded, controlled assessment of lesion segmentations made by technicians and the DL method. On this platform, 20 neuroradiologists performed 400 side-by-side comparisons of segmentations on 100 test cases. They scored each segmentation between 0 (poor) and 10 (perfect). Agreement between segmentations from technicians and the DL method was also evaluated quantitatively using the Dice coefficient, which produces values between 0 (no overlap) and 1 (perfect overlap).

Results: The neuroradiologists gave technician and DL segmentations mean scores of 6.97 and 7.31, respectively (p < 0.00007). The DL method achieved a mean Dice coefficient of 0.87 on the test cases.

Conclusions: This was the first objective comparison of automated and human segmentation using a blinded controlled assessment study. Our DL system learned to outperform its “human teachers” and produced output that was better, on average, than its training data.
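The Dice coefficient mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `dice_coefficient` and the two toy 2D masks are hypothetical, chosen only for illustration (the study's segmentations come from MRI exams), and returning 1.0 when both masks are empty is an assumed convention.

```python
from typing import Sequence


def dice_coefficient(
    mask_a: Sequence[Sequence[int]], mask_b: Sequence[Sequence[int]]
) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of rows")
    overlap = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        if len(row_a) != len(row_b):
            raise ValueError("mask rows must have the same length")
        for a, b in zip(row_a, row_b):
            a_on, b_on = bool(a), bool(b)
            size_a += a_on            # pixels labeled tumor in mask A
            size_b += b_on            # pixels labeled tumor in mask B
            overlap += a_on and b_on  # pixels labeled tumor in both
    if size_a + size_b == 0:
        # Both masks empty: Dice is undefined; 1.0 is an assumed convention.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Hypothetical toy masks (1 = tumor, 0 = background).
technician = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
model = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(dice_coefficient(technician, model))
```

Here the masks share 4 pixels and contain 4 and 5 pixels respectively, so the score is 2 × 4 / (4 + 5) = 8/9, roughly 0.89. Identical masks score 1 and disjoint masks score 0, matching the range described in the abstract.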

Original language	English (US)
Article number	055501-1
Journal	Journal of Medical Imaging
Volume	7
Issue number	5
DOIs	https://doi.org/10.1117/1.JMI.7.5.055501
State	Published - Sep 1 2020

Keywords

Brain tumors
Deep learning
Observer studies
Segmentation
Validation

ASJC Scopus subject areas

Radiology, Nuclear Medicine and Imaging

Access to Document

10.1117/1.JMI.7.5.055501

Cite this

Mitchell, J. R., Kamnitsas, K., Singleton, K. W., Whitmire, S. A., Clark-Swanson, K. R., Ranjbar, S., Rickertsen, C. R., Johnston, S. K., Egan, K. M., Rollison, D. E., Arrington, J., Krecke, K. N., Passe, T. J., Verdoorn, J. T., Nagelschneider, A. A., Carr, C. M., Port, J. D., Patton, A., Campeau, N. G., ... Swanson, K. R. (2020). Deep neural network to locate and segment brain tumors outperformed the expert technicians who created the training data. Journal of Medical Imaging, 7(5), Article 055501-1. https://doi.org/10.1117/1.JMI.7.5.055501

Mitchell, JR, Kamnitsas, K, Singleton, KW, Whitmire, SA, Clark-Swanson, KR, Ranjbar, S, Rickertsen, CR, Johnston, SK, Egan, KM, Rollison, DE, Arrington, J, Krecke, KN, Passe, TJ, Verdoorn, JT, Nagelschneider, AA, Carr, CM, Port, JD, Patton, A, Campeau, NG, Liebo, GB, Eckel, LJ, Wood, CP, Hunt, CH, Vibhute, P, Nelson, KD, Hoxworth, JM, Patel, AC, Chong, BW, Ross, JS, Boxerman, JL, Vogelbaum, MA, Hu, LS, Glocker, B & Swanson, KR 2020, 'Deep neural network to locate and segment brain tumors outperformed the expert technicians who created the training data', Journal of Medical Imaging, vol. 7, no. 5, 055501-1. https://doi.org/10.1117/1.JMI.7.5.055501

@article{db75b054377449ee91495a416da631ef,

title = "Deep neural network to locate and segment brain tumors outperformed the expert technicians who created the training data",

abstract = "Purpose: Deep learning (DL) algorithms have shown promising results for brain tumor segmentation in MRI. However, validation is required prior to routine clinical use. We report the first randomized and blinded comparison of DL and trained technician segmentations. Approach: We compiled a multi-institutional database of 741 pretreatment MRI exams. Each contained a postcontrast T1-weighted exam, a T2-weighted fluid-attenuated inversion recovery exam, and at least one technician-derived tumor segmentation. The database included 729 unique patients (470 males and 259 females). Of these exams, 641 were used for training the DL system, and 100 were reserved for testing. We developed a platform to enable qualitative, blinded, controlled assessment of lesion segmentations made by technicians and the DL method. On this platform, 20 neuroradiologists performed 400 side-by-side comparisons of segmentations on 100 test cases. They scored each segmentation between 0 (poor) and 10 (perfect). Agreement between segmentations from technicians and the DL method was also evaluated quantitatively using the Dice coefficient, which produces values between 0 (no overlap) and 1 (perfect overlap). Results: The neuroradiologists gave technician and DL segmentations mean scores of 6.97 and 7.31, respectively (p < 0.00007). The DL method achieved a mean Dice coefficient of 0.87 on the test cases. Conclusions: This was the first objective comparison of automated and human segmentation using a blinded controlled assessment study. Our DL system learned to outperform its “human teachers” and produced output that was better, on average, than its training data.",

keywords = "Brain tumors, Deep learning, Observer studies, Segmentation, Validation",

author = "Mitchell, {Joseph Ross} and Konstantinos Kamnitsas and Singleton, {Kyle W.} and Whitmire, {Scott A.} and Clark-Swanson, {Kamala R.} and Sara Ranjbar and Rickertsen, {Cassandra R.} and Johnston, {Sandra K.} and Egan, {Kathleen M.} and Rollison, {Dana E.} and John Arrington and Krecke, {Karl N.} and Passe, {Theodore J.} and Verdoorn, {Jared T.} and Nagelschneider, {Alex A.} and Carr, {Carrie M.} and Port, {John D.} and Alice Patton and Campeau, {Norbert G.} and Liebo, {Greta B.} and Eckel, {Laurence J.} and Wood, {Christopher P.} and Hunt, {Christopher H.} and Prasanna Vibhute and Nelson, {Kent D.} and Hoxworth, {Joseph M.} and Patel, {Ameet C.} and Chong, {Brian W.} and Ross, {Jeffrey S.} and Boxerman, {Jerrold L.} and Vogelbaum, {Michael A.} and Hu, {Leland S.} and Ben Glocker and Swanson, {Kristin R.}",

note = "Publisher Copyright: {\textcopyright} The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License.",

year = "2020",

month = sep,

day = "1",

doi = "10.1117/1.JMI.7.5.055501",

language = "English (US)",

volume = "7",

journal = "Journal of Medical Imaging",

issn = "2329-4302",

publisher = "SPIE",

number = "5",

}

TY - JOUR

T1 - Deep neural network to locate and segment brain tumors outperformed the expert technicians who created the training data

AU - Mitchell, Joseph Ross

AU - Kamnitsas, Konstantinos

AU - Singleton, Kyle W.

AU - Whitmire, Scott A.

AU - Clark-Swanson, Kamala R.

AU - Ranjbar, Sara

AU - Rickertsen, Cassandra R.

AU - Johnston, Sandra K.

AU - Egan, Kathleen M.

AU - Rollison, Dana E.

AU - Arrington, John

AU - Krecke, Karl N.

AU - Passe, Theodore J.

AU - Verdoorn, Jared T.

AU - Nagelschneider, Alex A.

AU - Carr, Carrie M.

AU - Port, John D.

AU - Patton, Alice

AU - Campeau, Norbert G.

AU - Liebo, Greta B.

AU - Eckel, Laurence J.

AU - Wood, Christopher P.

AU - Hunt, Christopher H.

AU - Vibhute, Prasanna

AU - Nelson, Kent D.

AU - Hoxworth, Joseph M.

AU - Patel, Ameet C.

AU - Chong, Brian W.

AU - Ross, Jeffrey S.

AU - Boxerman, Jerrold L.

AU - Vogelbaum, Michael A.

AU - Hu, Leland S.

AU - Glocker, Ben

AU - Swanson, Kristin R.

PY - 2020/9/1

Y1 - 2020/9/1

N2 - Purpose: Deep learning (DL) algorithms have shown promising results for brain tumor segmentation in MRI. However, validation is required prior to routine clinical use. We report the first randomized and blinded comparison of DL and trained technician segmentations. Approach: We compiled a multi-institutional database of 741 pretreatment MRI exams. Each contained a postcontrast T1-weighted exam, a T2-weighted fluid-attenuated inversion recovery exam, and at least one technician-derived tumor segmentation. The database included 729 unique patients (470 males and 259 females). Of these exams, 641 were used for training the DL system, and 100 were reserved for testing. We developed a platform to enable qualitative, blinded, controlled assessment of lesion segmentations made by technicians and the DL method. On this platform, 20 neuroradiologists performed 400 side-by-side comparisons of segmentations on 100 test cases. They scored each segmentation between 0 (poor) and 10 (perfect). Agreement between segmentations from technicians and the DL method was also evaluated quantitatively using the Dice coefficient, which produces values between 0 (no overlap) and 1 (perfect overlap). Results: The neuroradiologists gave technician and DL segmentations mean scores of 6.97 and 7.31, respectively (p < 0.00007). The DL method achieved a mean Dice coefficient of 0.87 on the test cases. Conclusions: This was the first objective comparison of automated and human segmentation using a blinded controlled assessment study. Our DL system learned to outperform its “human teachers” and produced output that was better, on average, than its training data.

AB - Purpose: Deep learning (DL) algorithms have shown promising results for brain tumor segmentation in MRI. However, validation is required prior to routine clinical use. We report the first randomized and blinded comparison of DL and trained technician segmentations. Approach: We compiled a multi-institutional database of 741 pretreatment MRI exams. Each contained a postcontrast T1-weighted exam, a T2-weighted fluid-attenuated inversion recovery exam, and at least one technician-derived tumor segmentation. The database included 729 unique patients (470 males and 259 females). Of these exams, 641 were used for training the DL system, and 100 were reserved for testing. We developed a platform to enable qualitative, blinded, controlled assessment of lesion segmentations made by technicians and the DL method. On this platform, 20 neuroradiologists performed 400 side-by-side comparisons of segmentations on 100 test cases. They scored each segmentation between 0 (poor) and 10 (perfect). Agreement between segmentations from technicians and the DL method was also evaluated quantitatively using the Dice coefficient, which produces values between 0 (no overlap) and 1 (perfect overlap). Results: The neuroradiologists gave technician and DL segmentations mean scores of 6.97 and 7.31, respectively (p < 0.00007). The DL method achieved a mean Dice coefficient of 0.87 on the test cases. Conclusions: This was the first objective comparison of automated and human segmentation using a blinded controlled assessment study. Our DL system learned to outperform its “human teachers” and produced output that was better, on average, than its training data.

KW - Brain tumors

KW - Deep learning

KW - Observer studies

KW - Segmentation

KW - Validation

UR - http://www.scopus.com/inward/record.url?scp=85096707928&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85096707928&partnerID=8YFLogxK

U2 - 10.1117/1.JMI.7.5.055501

DO - 10.1117/1.JMI.7.5.055501

M3 - Article

AN - SCOPUS:85096707928

SN - 2329-4302

VL - 7

JO - Journal of Medical Imaging

JF - Journal of Medical Imaging

IS - 5

M1 - 055501-1

ER -
