COVID-19 mortality prediction from deep learning in a large multistate electronic health record and laboratory information system data set: Algorithm development and validation

Saranya Sankaranarayanan; Jagadheshwar Balan; Jesse R. Walsh; Yanhong Wu; Sara Minnich; Amy Piazza; Collin Osborne; Gavin R. Oliver; Jessica Lesko; Kathy L. Bates; Kia Khezeli; Darci R. Block; Margaret DiGuardo; Justin Kreuter; John C. O’Horo; John Kalantari; Eric W. Klee; Mohamed E. Salama; Benjamin Kipp; William G. Morice; Garrett Jenkinson

doi:10.2196/30157

Research output: Contribution to journal › Review article › peer-review

Abstract

Background: COVID-19 is caused by the SARS-CoV-2 virus and has strikingly heterogeneous clinical manifestations, with most individuals contracting mild disease but a substantial minority experiencing fulminant cardiopulmonary symptoms or death. The clinical covariates and the laboratory tests performed on a patient provide robust statistics to guide clinical treatment. Deep learning approaches on a data set of this nature enable patient stratification and provide methods to guide clinical treatment.

Objective: Here, we report on the development and prospective validation of a state-of-the-art machine learning model to provide mortality prediction shortly after confirmation of SARS-CoV-2 infection in the Mayo Clinic patient population.

Methods: We retrospectively constructed one of the largest reported and most geographically diverse COVID-19 laboratory information system and electronic health record data sets in the published literature, which included 11,807 patients residing in 41 states of the United States of America and treated at medical sites across 5 states in 3 time zones. Traditional machine learning models were evaluated independently as well as in a stacked learner approach by using AutoGluon, and various recurrent neural network architectures were considered. The traditional machine learning models were implemented using the AutoGluon-Tabular framework, whereas the recurrent neural networks used the TensorFlow Keras framework. We trained these models to operate solely on routine laboratory measurements and clinical covariates available within 72 hours of a patient’s first positive COVID-19 nucleic acid test result.

Results: The GRU-D recurrent neural network achieved peak cross-validation performance, with an area under the receiver operating characteristic curve (AUROC) of 0.938 (SE 0.004). This model retained strong performance when the follow-up time was reduced to 12 hours (AUROC 0.916 [SE 0.005]), and the leave-one-out feature importance analysis indicated that the most independently valuable features were age, Charlson comorbidity index, minimum oxygen saturation, fibrinogen level, and serum iron level. In the prospective testing cohort, this model provided an AUROC of 0.901 and a statistically significant difference in survival (P<.001, hazard ratio for those predicted to survive, 95% CI 0.043-0.106).

Conclusions: Our deep learning approach using GRU-D provides an alert system to flag mortality for COVID-19–positive patients by using clinical covariates and laboratory values within a 72-hour window after the first positive nucleic acid test result.
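The cross-validation figures in the abstract are reported as an AUROC with a standard error. As a rough illustration of how such a summary can be produced, the sketch below computes a rank-based AUROC per fold and then the mean and standard error across folds. It is not the authors' code: the function names `auroc` and `cv_summary`, the toy labels and scores, and the choice of the standard error of the mean across folds are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def auroc(labels, scores):
    """Rank-based AUROC: the fraction of (positive, negative) pairs in which
    the positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cv_summary(fold_aurocs):
    """Mean AUROC and standard error of the mean across folds
    (assumed definition of SE; the abstract does not specify one)."""
    k = len(fold_aurocs)
    return mean(fold_aurocs), stdev(fold_aurocs) / math.sqrt(k)


# Hypothetical toy folds: (true mortality labels, predicted risk scores).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.1, 0.3]),
    ([0, 1, 0, 1], [0.4, 0.4, 0.1, 0.9]),
]
fold_aurocs = [auroc(y, s) for y, s in folds]
mean_auc, se_auc = cv_summary(fold_aurocs)
print(f"AUROC {mean_auc:.3f} (SE {se_auc:.3f})")
```

The pairwise loop is quadratic in the number of patients, which is fine for a toy example; on a cohort of thousands, a sorted-rank implementation or a library routine would be the practical choice.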

Original language	English (US)
Article number	e30157
Journal	Journal of medical Internet research
Volume	23
Issue number	9
DOIs	https://doi.org/10.2196/30157
State	Published - Sep 2021

Keywords

Algorithm
COVID-19
Deep learning
Development
EHR
Electronic health record
Machine learning
Missing data
Mortality
Neural network
Prediction
Recurrent neural networks
Time series
Validation

ASJC Scopus subject areas

Health Informatics

Cite this

Sankaranarayanan, S., Balan, J., Walsh, J. R., Wu, Y., Minnich, S., Piazza, A., Osborne, C., Oliver, G. R., Lesko, J., Bates, K. L., Khezeli, K., Block, D. R., DiGuardo, M., Kreuter, J., O’Horo, J. C., Kalantari, J., Klee, E. W., Salama, M. E., Kipp, B., ... Jenkinson, G. (2021). COVID-19 mortality prediction from deep learning in a large multistate electronic health record and laboratory information system data set: Algorithm development and validation. Journal of medical Internet research, 23(9), Article e30157. https://doi.org/10.2196/30157

COVID-19 mortality prediction from deep learning in a large multistate electronic health record and laboratory information system data set: Algorithm development and validation. / Sankaranarayanan, Saranya; Balan, Jagadheshwar; Walsh, Jesse R. et al.
In: Journal of medical Internet research, Vol. 23, No. 9, e30157, 09.2021.

Sankaranarayanan, S, Balan, J, Walsh, JR, Wu, Y, Minnich, S, Piazza, A, Osborne, C, Oliver, GR, Lesko, J, Bates, KL, Khezeli, K, Block, DR, DiGuardo, M, Kreuter, J, O’Horo, JC, Kalantari, J, Klee, EW, Salama, ME, Kipp, B, Morice, WG & Jenkinson, G 2021, 'COVID-19 mortality prediction from deep learning in a large multistate electronic health record and laboratory information system data set: Algorithm development and validation', Journal of medical Internet research, vol. 23, no. 9, e30157. https://doi.org/10.2196/30157

@article{001e64d2fbe142e396f5503d6243b9be,

title = "COVID-19 mortality prediction from deep learning in a large multistate electronic health record and laboratory information system data set: Algorithm development and validation",

keywords = "Algorithm, COVID-19, Deep learning, Development, EHR, Electronic health record, Machine learning, Missing data, Mortality, Neural network, Prediction, Recurrent neural networks, Time series, Validation",

author = "Saranya Sankaranarayanan and Jagadheshwar Balan and Walsh, {Jesse R.} and Yanhong Wu and Sara Minnich and Amy Piazza and Collin Osborne and Oliver, {Gavin R.} and Jessica Lesko and Bates, {Kathy L.} and Kia Khezeli and Block, {Darci R.} and Margaret DiGuardo and Justin Kreuter and O{\textquoteright}Horo, {John C.} and John Kalantari and Klee, {Eric W.} and Salama, {Mohamed E.} and Benjamin Kipp and Morice, {William G.} and Garrett Jenkinson",

note = "Publisher Copyright: John Kalantari, Eric W Klee, Mohamed E Salama, Benjamin Kipp, William G Morice, Garrett Jenkinson.",

year = "2021",

month = sep,

doi = "10.2196/30157",

language = "English (US)",

volume = "23",

journal = "Journal of medical Internet research",

issn = "1439-4456",

publisher = "Journal of medical Internet Research",

number = "9",

}

TY - JOUR

T1 - COVID-19 mortality prediction from deep learning in a large multistate electronic health record and laboratory information system data set

T2 - Algorithm development and validation

AU - Sankaranarayanan, Saranya

AU - Balan, Jagadheshwar

AU - Walsh, Jesse R.

AU - Wu, Yanhong

AU - Minnich, Sara

AU - Piazza, Amy

AU - Osborne, Collin

AU - Oliver, Gavin R.

AU - Lesko, Jessica

AU - Bates, Kathy L.

AU - Khezeli, Kia

AU - Block, Darci R.

AU - DiGuardo, Margaret

AU - Kreuter, Justin

AU - O’Horo, John C.

AU - Kalantari, John

AU - Klee, Eric W.

AU - Salama, Mohamed E.

AU - Kipp, Benjamin

AU - Morice, William G.

AU - Jenkinson, Garrett

N1 - Publisher Copyright: John Kalantari, Eric W Klee, Mohamed E Salama, Benjamin Kipp, William G Morice, Garrett Jenkinson.

PY - 2021/9

Y1 - 2021/9

KW - Algorithm

KW - COVID-19

KW - Deep learning

KW - Development

KW - EHR

KW - Electronic health record

KW - Machine learning

KW - Missing data

KW - Mortality

KW - Neural network

KW - Prediction

KW - Recurrent neural networks

KW - Time series

KW - Validation

UR - http://www.scopus.com/inward/record.url?scp=85116596888&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85116596888&partnerID=8YFLogxK

U2 - 10.2196/30157

DO - 10.2196/30157

M3 - Review article

C2 - 34449401

AN - SCOPUS:85116596888

SN - 1439-4456

VL - 23

JO - Journal of medical Internet research

JF - Journal of medical Internet research

IS - 9

M1 - e30157

ER -
