TY - JOUR
T1 - A pathologist-annotated dataset for validating artificial intelligence
T2 - A project description and pilot study
AU - Dudgeon, Sarah N.
AU - Wen, Si
AU - Hanna, Matthew G.
AU - Gupta, Rajarsi
AU - Amgad, Mohamed
AU - Sheth, Manasi
AU - Marble, Hetal
AU - Huang, Richard
AU - Herrmann, Markus D.
AU - Szu, Clifford H.
AU - Tong, Darick
AU - Werness, Bruce
AU - Szu, Evan
AU - Larsimont, Denis
AU - Madabhushi, Anant
AU - Hytopoulos, Evangelos
AU - Chen, Weijie
AU - Singh, Rajendra
AU - Hart, Steven N.
AU - Sharma, Ashish
AU - Saltz, Joel
AU - Salgado, Roberto
AU - Gallas, Brandon D.
N1 - Publisher Copyright:
© 2021 Wolters Kluwer Medknow Publications. All rights reserved.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Purpose: Validating artificial intelligence algorithms for clinical use in medical images is a challenging endeavor due to a lack of standard reference data (ground truth). This topic typically occupies a small portion of the discussion in research papers since most of the efforts are focused on developing novel algorithms. In this work, we present a collaboration to create a validation dataset of pathologist annotations for algorithms that process whole slide images. We focus on data collection and evaluation of algorithm performance in the context of estimating the density of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer. Methods: We digitized 64 glass slides of hematoxylin- and eosin-stained invasive ductal carcinoma core biopsies prepared at a single clinical site. A collaborating pathologist selected 10 regions of interest (ROIs) per slide for evaluation. We created training materials and workflows to crowdsource pathologist image annotations in two modes: an optical microscope and two digital platforms. The microscope platform allows the same ROIs to be evaluated in both modes. The workflows collect the ROI type, a decision on whether the ROI is appropriate for estimating the density of sTILs, and if appropriate, the sTIL density value for that ROI. Results: In total, 19 pathologists made 1645 ROI evaluations during a data collection event and the following 2 weeks. The pilot study yielded an abundance of cases with nominal sTIL infiltration. Furthermore, we found that the sTIL densities are correlated within a case, and there is notable pathologist variability. Consequently, we outline plans to improve our ROI and case sampling methods. We also outline statistical methods to account for ROI correlations within a case and pathologist variability when validating an algorithm. Conclusion: We have built workflows for efficient data collection and tested them in a pilot study.
As we prepare for pivotal studies, we will investigate methods to use the dataset as an external validation tool for algorithms. We will also consider what it will take for the dataset to be fit for a regulatory purpose: study size, patient population, and pathologist training and qualifications. To this end, we will elicit feedback from the Food and Drug Administration via the Medical Device Development Tool program and from the broader digital pathology and AI community. Ultimately, we intend to share the dataset, statistical methods, and lessons learned.
KW - Artificial intelligence validation
KW - medical image analysis
KW - pathology
KW - reference standard
KW - tumor-infiltrating lymphocytes
UR - http://www.scopus.com/inward/record.url?scp=85120959752&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120959752&partnerID=8YFLogxK
U2 - 10.4103/jpi.jpi-83-20
DO - 10.4103/jpi.jpi-83-20
M3 - Article
AN - SCOPUS:85120959752
SN - 2229-5089
VL - 12
JO - Journal of Pathology Informatics
JF - Journal of Pathology Informatics
IS - 1
M1 - 330486
ER -