Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: Comparison of machine learning and other statistical approaches

Jarrod D. Frizzell; Li Liang; Phillip J. Schulte; Clyde W. Yancy; Paul A. Heidenreich; Adrian F. Hernandez; Deepak L. Bhatt; Gregg C. Fonarow; Warren K. Laskey

doi:10.1001/jamacardio.2016.3956

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

110 Scopus citations

Abstract

IMPORTANCE: Several attempts have been made at developing models to predict 30-day readmissions in patients with heart failure, but none have sufficient discriminatory capacity for clinical use. Machine-learning (ML) algorithms represent a novel approach and may have potential advantages over traditional statistical modeling. OBJECTIVE: To develop models using a ML approach to predict all-cause readmissions 30 days after discharge from a heart failure hospitalization and to compare ML model performance with models developed using “conventional” statistically based methods. DESIGN, SETTING, AND PARTICIPANTS: Models were developed using ML algorithms, specifically, a tree-augmented naive Bayesian network, a random forest algorithm, and a gradient-boosted model and compared with traditional statistical methods using 2 independently derived logistic regression models (a de novo model and an a priori model developed using electronic health records) and a least absolute shrinkage and selection operator method. The study sample was randomly divided into training (70%) and validation (30%) sets to develop and test model performance. This was a registry-based study, and the study sample was obtained by linking patients from the Get With the Guidelines Heart Failure registry with Medicare data. After applying appropriate inclusion and exclusion criteria, 56 477 patients were included in our analysis. The study was conducted between January 4, 2005, and December 1, 2010, and analysis of the data was conducted between November 25, 2014, and June 30, 2016. MAIN OUTCOMES AND MEASURES: C statistics were used for comparison of discriminatory capacity across models in the validation sample. RESULTS: The overall 30-day rehospitalization rate was 21.2% (11 959 of 56 477 patients). 
For the tree-augmented naive Bayesian network, random forest, gradient-boosted, logistic regression, and least absolute shrinkage and selection operator models, C statistics for the validation sets were similar: 0.618, 0.607, 0.614, 0.624, and 0.618, respectively. Applying the previously validated electronic health records model to our study sample yielded a C statistic of 0.589 for the validation set. CONCLUSIONS AND RELEVANCE: Use of a number of ML algorithms did not improve prediction of 30-day heart failure readmissions compared with more traditional prediction models. Although there will likely be further applications of ML approaches in prognostic modeling, our study fits within the literature of limited predictive ability for heart failure readmissions.
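The abstract compares models by their C statistic on a randomly drawn 30% validation sample. The record does not include the authors' code, so the following is only a minimal sketch under stated assumptions. It shows a reproducible random 70%/30% split and the rank-based (Mann-Whitney) form of the C statistic for a binary outcome such as 30-day readmission. The function names, the seed, and the toy data are hypothetical, and the outcome is assumed to be coded 1 = readmitted, 0 = not readmitted.

```python
import random
from typing import List, Sequence, Tuple


def train_validation_split(n: int, train_frac: float = 0.7,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: randomly assign record indices 0..n-1 to
    a training set (train_frac) and a validation set (the remainder)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # reproducible random permutation
    cut = round(n * train_frac)            # size of the training set
    return indices[:cut], indices[cut:]


def c_statistic(outcome: Sequence[int], risk: Sequence[float]) -> float:
    """C statistic (area under the ROC curve) for an outcome coded 0/1.

    Equals the probability that a randomly chosen patient with the event
    (outcome == 1) receives a higher predicted risk than a randomly chosen
    patient without it, with tied predictions counted as one half.
    Computed from average ranks (Mann-Whitney U) rather than all pairs.
    """
    n = len(outcome)
    order = sorted(range(n), key=lambda i: risk[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block order[i..j] of tied predicted risks.
        j = i
        while j + 1 < n and risk[order[j + 1]] == risk[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1     # 1-based average rank of the block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_events = sum(1 for y in outcome if y == 1)
    n_non_events = n - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")
    event_rank_sum = sum(r for r, y in zip(ranks, outcome) if y == 1)
    u = event_rank_sum - n_events * (n_events + 1) / 2
    return u / (n_events * n_non_events)


if __name__ == "__main__":
    # Toy data (hypothetical): 3 of the 4 event/non-event pairs are concordant.
    outcome = [0, 0, 1, 1]
    risk = [0.10, 0.40, 0.35, 0.80]
    print(c_statistic(outcome, risk))      # prints 0.75
```

On this scale, 0.5 corresponds to ranking patients no better than chance and 1.0 to perfect separation of readmitted from non-readmitted patients, which puts the validation C statistics of 0.589 to 0.624 reported above in context. The rank-based form avoids comparing every event/non-event pair explicitly, which matters for validation samples of several thousand patients.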

Original language	English (US)
Pages (from-to)	204-209
Number of pages	6
Journal	JAMA Cardiology
Volume	2
Issue number	2
DOIs	https://doi.org/10.1001/jamacardio.2016.3956
State	Published - Feb 2017

ASJC Scopus subject areas

Cardiology and Cardiovascular Medicine


Cite this

Frizzell, J. D., Liang, L., Schulte, P. J., Yancy, C. W., Heidenreich, P. A., Hernandez, A. F., Bhatt, D. L., Fonarow, G. C., & Laskey, W. K. (2017). Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: Comparison of machine learning and other statistical approaches. JAMA Cardiology, 2(2), 204–209. https://doi.org/10.1001/jamacardio.2016.3956

@article{01b1608f778948dbb10da6662d1008e7,

title = "Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: Comparison of machine learning and other statistical approaches",

abstract = "IMPORTANCE: Several attempts have been made at developing models to predict 30-day readmissions in patients with heart failure, but none have sufficient discriminatory capacity for clinical use. Machine-learning (ML) algorithms represent a novel approach and may have potential advantages over traditional statistical modeling. OBJECTIVE: To develop models using a ML approach to predict all-cause readmissions 30 days after discharge from a heart failure hospitalization and to compare ML model performance with models developed using “conventional” statistically based methods. DESIGN, SETTING, AND PARTICIPANTS: Models were developed using ML algorithms, specifically, a tree-augmented naive Bayesian network, a random forest algorithm, and a gradient-boosted model and compared with traditional statistical methods using 2 independently derived logistic regression models (a de novo model and an a priori model developed using electronic health records) and a least absolute shrinkage and selection operator method. The study sample was randomly divided into training (70%) and validation (30%) sets to develop and test model performance. This was a registry-based study, and the study sample was obtained by linking patients from the Get With the Guidelines Heart Failure registry with Medicare data. After applying appropriate inclusion and exclusion criteria, 56 477 patients were included in our analysis. The study was conducted between January 4, 2005, and December 1, 2010, and analysis of the data was conducted between November 25, 2014, and June 30, 2016. MAIN OUTCOMES AND MEASURES: C statistics were used for comparison of discriminatory capacity across models in the validation sample. RESULTS: The overall 30-day rehospitalization rate was 21.2% (11 959 of 56 477 patients). 
For the tree-augmented naive Bayesian network, random forest, gradient-boosted, logistic regression, and least absolute shrinkage and selection operator models, C statistics for the validation sets were similar: 0.618, 0.607, 0.614, 0.624, and 0.618, respectively. Applying the previously validated electronic health records model to our study sample yielded a C statistic of 0.589 for the validation set. CONCLUSIONS AND RELEVANCE: Use of a number of ML algorithms did not improve prediction of 30-day heart failure readmissions compared with more traditional prediction models. Although there will likely be further applications of ML approaches in prognostic modeling, our study fits within the literature of limited predictive ability for heart failure readmissions.",

author = "Frizzell, {Jarrod D.} and Liang, Li and Schulte, {Phillip J.} and Yancy, {Clyde W.} and Heidenreich, {Paul A.} and Hernandez, {Adrian F.} and Bhatt, {Deepak L.} and Fonarow, {Gregg C.} and Laskey, {Warren K.}",

year = "2017",

month = feb,

doi = "10.1001/jamacardio.2016.3956",

language = "English (US)",

volume = "2",

pages = "204--209",

journal = "JAMA Cardiology",

issn = "2380-6583",

publisher = "American Medical Association",

number = "2",

}

TY - JOUR

T1 - Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure

T2 - Comparison of machine learning and other statistical approaches

AU - Frizzell, Jarrod D.

AU - Liang, Li

AU - Schulte, Phillip J.

AU - Yancy, Clyde W.

AU - Heidenreich, Paul A.

AU - Hernandez, Adrian F.

AU - Bhatt, Deepak L.

AU - Fonarow, Gregg C.

AU - Laskey, Warren K.

PY - 2017/2

Y1 - 2017/2

AB - IMPORTANCE: Several attempts have been made at developing models to predict 30-day readmissions in patients with heart failure, but none have sufficient discriminatory capacity for clinical use. Machine-learning (ML) algorithms represent a novel approach and may have potential advantages over traditional statistical modeling. OBJECTIVE: To develop models using a ML approach to predict all-cause readmissions 30 days after discharge from a heart failure hospitalization and to compare ML model performance with models developed using “conventional” statistically based methods. DESIGN, SETTING, AND PARTICIPANTS: Models were developed using ML algorithms, specifically, a tree-augmented naive Bayesian network, a random forest algorithm, and a gradient-boosted model and compared with traditional statistical methods using 2 independently derived logistic regression models (a de novo model and an a priori model developed using electronic health records) and a least absolute shrinkage and selection operator method. The study sample was randomly divided into training (70%) and validation (30%) sets to develop and test model performance. This was a registry-based study, and the study sample was obtained by linking patients from the Get With the Guidelines Heart Failure registry with Medicare data. After applying appropriate inclusion and exclusion criteria, 56 477 patients were included in our analysis. The study was conducted between January 4, 2005, and December 1, 2010, and analysis of the data was conducted between November 25, 2014, and June 30, 2016. MAIN OUTCOMES AND MEASURES: C statistics were used for comparison of discriminatory capacity across models in the validation sample. RESULTS: The overall 30-day rehospitalization rate was 21.2% (11 959 of 56 477 patients). 
For the tree-augmented naive Bayesian network, random forest, gradient-boosted, logistic regression, and least absolute shrinkage and selection operator models, C statistics for the validation sets were similar: 0.618, 0.607, 0.614, 0.624, and 0.618, respectively. Applying the previously validated electronic health records model to our study sample yielded a C statistic of 0.589 for the validation set. CONCLUSIONS AND RELEVANCE: Use of a number of ML algorithms did not improve prediction of 30-day heart failure readmissions compared with more traditional prediction models. Although there will likely be further applications of ML approaches in prognostic modeling, our study fits within the literature of limited predictive ability for heart failure readmissions.

UR - http://www.scopus.com/inward/record.url?scp=85017203403&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85017203403&partnerID=8YFLogxK

U2 - 10.1001/jamacardio.2016.3956

DO - 10.1001/jamacardio.2016.3956

M3 - Article

C2 - 27784047

AN - SCOPUS:85017203403

SN - 2380-6583

VL - 2

SP - 204

EP - 209

JO - JAMA Cardiology

JF - JAMA Cardiology

IS - 2

ER -