TY - JOUR
T1 - Assessing and Mitigating Bias in Medical Artificial Intelligence
T2 - The Effects of Race and Ethnicity on a Deep Learning Model for ECG Analysis
AU - Noseworthy, Peter A.
AU - Attia, Zachi I.
AU - Brewer, La Princess C.
AU - Hayes, Sharonne N.
AU - Yao, Xiaoxi
AU - Kapa, Suraj
AU - Friedman, Paul A.
AU - Lopez-Jimenez, Francisco
N1 - Publisher Copyright:
© 2020 Lippincott Williams and Wilkins. All rights reserved.
PY - 2020/3/1
Y1 - 2020/3/1
N2 - Background: Deep learning algorithms derived in homogeneous populations may be poorly generalizable and have the potential to reflect, perpetuate, and even exacerbate racial/ethnic disparities in health and health care. In this study, we aimed to (1) assess whether the performance of a deep learning algorithm designed to detect low left ventricular ejection fraction using the 12-lead ECG varies by race/ethnicity and to (2) determine whether its performance is determined by the derivation population or by racial variation in the ECG. Methods: We performed a retrospective cohort analysis that included 97 829 patients with paired ECGs and echocardiograms. We tested the model performance by race/ethnicity for convolutional neural network designed to identify patients with a left ventricular ejection fraction ≤35% from the 12-lead ECG. Results: The convolutional neural network that was previously derived in a homogeneous population (derivation cohort, n=44 959; 96.2% non-Hispanic white) demonstrated consistent performance to detect low left ventricular ejection fraction across a range of racial/ethnic subgroups in a separate testing cohort (n=52 870): non-Hispanic white (n=44 524; area under the curve [AUC], 0.931), Asian (n=557; AUC, 0.961), black/African American (n=651; AUC, 0.937), Hispanic/Latino (n=331; AUC, 0.937), and American Indian/Native Alaskan (n=223; AUC, 0.938). In secondary analyses, a separate neural network was able to discern racial subgroup category (black/African American [AUC, 0.84], and white, non-Hispanic [AUC, 0.76] in a 5-class classifier), and a network trained only in non-Hispanic whites from the original derivation cohort performed similarly well across a range of racial/ethnic subgroups in the testing cohort with an AUC of at least 0.930 in all racial/ethnic subgroups. Conclusions: Our study demonstrates that while ECG characteristics vary by race, this did not impact the ability of a convolutional neural network to predict low left ventricular ejection fraction from the ECG. We recommend reporting of performance among diverse ethnic, racial, age, and sex groups for all new artificial intelligence tools to ensure responsible use of artificial intelligence in medicine.
AB - Background: Deep learning algorithms derived in homogeneous populations may be poorly generalizable and have the potential to reflect, perpetuate, and even exacerbate racial/ethnic disparities in health and health care. In this study, we aimed to (1) assess whether the performance of a deep learning algorithm designed to detect low left ventricular ejection fraction using the 12-lead ECG varies by race/ethnicity and to (2) determine whether its performance is determined by the derivation population or by racial variation in the ECG. Methods: We performed a retrospective cohort analysis that included 97 829 patients with paired ECGs and echocardiograms. We tested the model performance by race/ethnicity for convolutional neural network designed to identify patients with a left ventricular ejection fraction ≤35% from the 12-lead ECG. Results: The convolutional neural network that was previously derived in a homogeneous population (derivation cohort, n=44 959; 96.2% non-Hispanic white) demonstrated consistent performance to detect low left ventricular ejection fraction across a range of racial/ethnic subgroups in a separate testing cohort (n=52 870): non-Hispanic white (n=44 524; area under the curve [AUC], 0.931), Asian (n=557; AUC, 0.961), black/African American (n=651; AUC, 0.937), Hispanic/Latino (n=331; AUC, 0.937), and American Indian/Native Alaskan (n=223; AUC, 0.938). In secondary analyses, a separate neural network was able to discern racial subgroup category (black/African American [AUC, 0.84], and white, non-Hispanic [AUC, 0.76] in a 5-class classifier), and a network trained only in non-Hispanic whites from the original derivation cohort performed similarly well across a range of racial/ethnic subgroups in the testing cohort with an AUC of at least 0.930 in all racial/ethnic subgroups. Conclusions: Our study demonstrates that while ECG characteristics vary by race, this did not impact the ability of a convolutional neural network to predict low left ventricular ejection fraction from the ECG. We recommend reporting of performance among diverse ethnic, racial, age, and sex groups for all new artificial intelligence tools to ensure responsible use of artificial intelligence in medicine.
KW - United States
KW - artificial intelligence
KW - electrocardiography
KW - humans
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85082098420&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082098420&partnerID=8YFLogxK
U2 - 10.1161/CIRCEP.119.007988
DO - 10.1161/CIRCEP.119.007988
M3 - Article
C2 - 32064914
AN - SCOPUS:85082098420
SN - 1941-3149
VL - 13
SP - E007988
JO - Circulation: Arrhythmia and Electrophysiology
JF - Circulation: Arrhythmia and Electrophysiology
IS - 3
ER -