Federated learning and differential privacy for medical image analysis

Mohammed Adnan; Shivam Kalra; Jesse C. Cresswell; Graham W. Taylor; Hamid R. Tizhoosh

doi:10.1038/s41598-022-05539-7

Artificial Intelligence and Informatics

Research output: Contribution to journal › Article › peer-review

Abstract

The artificial intelligence revolution has been spurred forward by the availability of large-scale datasets. In contrast, the paucity of large-scale medical datasets hinders the application of machine learning in healthcare. The lack of publicly available multi-centric and diverse datasets mainly stems from confidentiality and privacy concerns around sharing medical data. To demonstrate a feasible path forward in medical imaging, we conduct a case study of applying a differentially private federated learning framework for analysis of histopathology images, the largest and perhaps most complex medical images. We study the effects of IID and non-IID distributions along with the number of healthcare providers, i.e., hospitals and clinics, and the individual dataset sizes, using The Cancer Genome Atlas (TCGA) dataset, a public repository, to simulate a distributed environment. We empirically compare the performance of private, distributed training to conventional training and demonstrate that distributed training can achieve similar performance with strong privacy guarantees. We also study the effect of different source domains for histopathology images by evaluating the performance using external validation. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.

Original language	English (US)
Article number	1953
Journal	Scientific reports
Volume	12
Issue number	1
DOIs	https://doi.org/10.1038/s41598-022-05539-7
State	Published - Dec 2022

ASJC Scopus subject areas

General

Cite this

@article{97c9f5c0c4714251a5c377616bf32211,
  title = "Federated learning and differential privacy for medical image analysis",
  abstract = "The artificial intelligence revolution has been spurred forward by the availability of large-scale datasets. In contrast, the paucity of large-scale medical datasets hinders the application of machine learning in healthcare. The lack of publicly available multi-centric and diverse datasets mainly stems from confidentiality and privacy concerns around sharing medical data. To demonstrate a feasible path forward in medical imaging, we conduct a case study of applying a differentially private federated learning framework for analysis of histopathology images, the largest and perhaps most complex medical images. We study the effects of IID and non-IID distributions along with the number of healthcare providers, i.e., hospitals and clinics, and the individual dataset sizes, using The Cancer Genome Atlas (TCGA) dataset, a public repository, to simulate a distributed environment. We empirically compare the performance of private, distributed training to conventional training and demonstrate that distributed training can achieve similar performance with strong privacy guarantees. We also study the effect of different source domains for histopathology images by evaluating the performance using external validation. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.",
  author = "Mohammed Adnan and Shivam Kalra and Cresswell, {Jesse C.} and Taylor, {Graham W.} and Tizhoosh, {Hamid R.}",
  note = "Publisher Copyright: {\textcopyright} 2022, The Author(s).",
  year = "2022",
  month = dec,
  doi = "10.1038/s41598-022-05539-7",
  language = "English (US)",
  volume = "12",
  journal = "Scientific reports",
  issn = "2045-2322",
  publisher = "Nature Publishing Group",
  number = "1",
}

TY  - JOUR
T1  - Federated learning and differential privacy for medical image analysis
AU  - Adnan, Mohammed
AU  - Kalra, Shivam
AU  - Cresswell, Jesse C.
AU  - Taylor, Graham W.
AU  - Tizhoosh, Hamid R.
PY  - 2022/12
Y1  - 2022/12
N2  - The artificial intelligence revolution has been spurred forward by the availability of large-scale datasets. In contrast, the paucity of large-scale medical datasets hinders the application of machine learning in healthcare. The lack of publicly available multi-centric and diverse datasets mainly stems from confidentiality and privacy concerns around sharing medical data. To demonstrate a feasible path forward in medical imaging, we conduct a case study of applying a differentially private federated learning framework for analysis of histopathology images, the largest and perhaps most complex medical images. We study the effects of IID and non-IID distributions along with the number of healthcare providers, i.e., hospitals and clinics, and the individual dataset sizes, using The Cancer Genome Atlas (TCGA) dataset, a public repository, to simulate a distributed environment. We empirically compare the performance of private, distributed training to conventional training and demonstrate that distributed training can achieve similar performance with strong privacy guarantees. We also study the effect of different source domains for histopathology images by evaluating the performance using external validation. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.
AB  - The artificial intelligence revolution has been spurred forward by the availability of large-scale datasets. In contrast, the paucity of large-scale medical datasets hinders the application of machine learning in healthcare. The lack of publicly available multi-centric and diverse datasets mainly stems from confidentiality and privacy concerns around sharing medical data. To demonstrate a feasible path forward in medical imaging, we conduct a case study of applying a differentially private federated learning framework for analysis of histopathology images, the largest and perhaps most complex medical images. We study the effects of IID and non-IID distributions along with the number of healthcare providers, i.e., hospitals and clinics, and the individual dataset sizes, using The Cancer Genome Atlas (TCGA) dataset, a public repository, to simulate a distributed environment. We empirically compare the performance of private, distributed training to conventional training and demonstrate that distributed training can achieve similar performance with strong privacy guarantees. We also study the effect of different source domains for histopathology images by evaluating the performance using external validation. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.
UR  - http://www.scopus.com/inward/record.url?scp=85124172819&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85124172819&partnerID=8YFLogxK
U2  - 10.1038/s41598-022-05539-7
DO  - 10.1038/s41598-022-05539-7
M3  - Article
C2  - 35121774
AN  - SCOPUS:85124172819
SN  - 2045-2322
VL  - 12
JO  - Scientific reports
JF  - Scientific reports
IS  - 1
M1  - 1953
ER  -
