Prediction and Inference With Missing Data in Patient Alert Systems

Curtis Storlie, Terry M Therneau, Rickey E. Carter, Nicholas D Chia, John R. Bergquist, Jeanne M. Huddleston, Santiago Romero-Brufau

Research output: Contribution to journal › Article

Abstract

We describe the Bedside Patient Rescue (BPR) project, the goal of which is risk prediction of adverse events for non-intensive care unit patients using ∼100 variables (vitals, lab results, assessments, etc.). Most patients have several missing predictor values, which in the health sciences is the norm rather than the exception. A Bayesian approach is presented that addresses many of the shortcomings of standard approaches to missing predictors: (i) treatment of the uncertainty due to imputation is straightforward in the Bayesian paradigm, (ii) the predictor distribution is flexibly modeled as an infinite normal mixture with latent variables to explicitly account for discrete predictors (i.e., as in multivariate probit regression models), and (iii) certain missing-not-at-random situations can be handled effectively by allowing the indicator of missingness into the predictor distribution only to inform the distribution of the missing variables. The proposed approach also has the benefit of providing a distribution for the prediction, including the uncertainty inherent in the imputation. Therefore, we can ask questions such as: Is it possible that this individual is at high risk but we are missing too much information to know for sure? How much would we reduce the uncertainty in our risk prediction by obtaining a particular missing value? This approach is applied to the BPR problem, resulting in excellent predictive capability to identify deteriorating patients. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
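The idea of a prediction distribution that carries imputation uncertainty can be illustrated with a deliberately simplified sketch. This is not the authors' model: it assumes a single multivariate normal for the predictors (instead of the infinite normal mixture with latent variables), treats the parameters `mu`, `sigma`, `beta0`, and `beta` as known rather than sampled from a posterior, and uses a logistic risk model chosen only for illustration. All function and variable names are hypothetical.

```python
import numpy as np


def conditional_normal(mu, sigma, x, missing):
    """Mean and covariance of x[missing] given x[~missing] under N(mu, sigma)."""
    obs = ~missing
    s_mo = sigma[np.ix_(missing, obs)]
    s_oo = sigma[np.ix_(obs, obs)]
    s_mm = sigma[np.ix_(missing, missing)]
    # w = s_mo @ inv(s_oo), computed with a linear solve (s_oo is symmetric).
    w = np.linalg.solve(s_oo, s_mo.T).T
    cond_mu = mu[missing] + w @ (x[obs] - mu[obs])
    cond_sigma = s_mm - w @ s_mo.T
    return cond_mu, cond_sigma


def risk_draws(x, mu, sigma, beta0, beta, n_draws=2000, seed=0):
    """Draws of predicted risk for one patient; NaN entries of x are missing.

    Assumes at least one predictor is observed. Each draw imputes the missing
    values from their conditional distribution, then applies a logistic model.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(x)
    filled = np.tile(x, (n_draws, 1))
    if missing.any():
        cond_mu, cond_sigma = conditional_normal(mu, sigma, x, missing)
        filled[:, missing] = rng.multivariate_normal(
            cond_mu, cond_sigma, size=n_draws
        )
    logits = beta0 + filled @ beta
    return 1.0 / (1.0 + np.exp(-logits))


# Illustrative (made-up) parameters for three standardized predictors.
mu = np.zeros(3)
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
beta0, beta = -1.0, np.array([1.0, -0.5, 2.0])

# Same patient with two values missing, then with one of them obtained.
two_missing = risk_draws(np.array([0.2, np.nan, np.nan]), mu, sigma, beta0, beta)
one_missing = risk_draws(np.array([0.2, np.nan, 0.5]), mu, sigma, beta0, beta)

for label, draws in [("two missing", two_missing), ("one missing", one_missing)]:
    lo, hi = np.quantile(draws, [0.05, 0.95])
    print(f"{label}: 90% interval for risk = [{lo:.3f}, {hi:.3f}]")
```

Comparing the two intervals is one way to approach the abstract's question of how much a particular missing value would reduce uncertainty in the risk prediction. A wide interval that reaches into the high-risk range corresponds to the case where the patient may be at high risk but too much information is missing to know.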

Original language: English (US)
Journal: Journal of the American Statistical Association
State: Published - Jan 1 2019

Keywords

  • Continuous and categorical
  • Dirichlet process
  • Hierarchical Bayesian model
  • Latent variable
  • Missing data
  • Multiple imputation

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
