Evaluation of predictive model performance of an existing model in the presence of missing data

Pin Li; Jeremy M.G. Taylor; Daniel E. Spratt; R. Jeffery Karnes; Matthew J. Schipper

doi:10.1002/sim.8978

Urology

Research output: Contribution to journal › Article › peer-review

Abstract

In medical research, the Brier score (BS) and the area under the receiver operating characteristic (ROC) curve (AUC) are two common metrics used to evaluate prediction models of a binary outcome, such as models that use biomarkers to predict the risk of developing a disease in the future. Assessing an existing prediction model using data with missing covariate values is challenging. In this article, we propose inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimates of AUC and BS to handle the missing data. An alternative approach uses multiple imputation (MI), which requires a model for the distribution of the missing variable. We evaluated the performance of IPW and AIPW in comparison with MI in simulation studies under missing completely at random, missing at random, and missing not at random scenarios. When there are missing observations in the data, MI and IPW can be used to obtain unbiased estimates of BS and AUC if the imputation model for the missing variable or the model for the missingness, respectively, is correctly specified. MI is more efficient than IPW. Our simulation results suggest that AIPW can be more efficient than IPW and also achieves double robustness against misspecification of either the missingness model or the imputation model. The outcome variable should be included in the model for the missing variable under all scenarios, while it needs to be included in the missingness model only if the missingness depends on the outcome. We illustrate these methods using an example from prostate cancer.
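A minimal sketch, in Python for illustration, of how IPW estimates of the Brier score and the AUC can be computed from the complete cases. It assumes that the probability of observing the covariate, pi, has already been estimated for each subject (for example, by a logistic regression of the observation indicator on the fully observed variables, including the outcome when missingness depends on it) and that the existing model's predicted risk is available wherever the covariate is observed. The function names ipw_brier and ipw_auc and the normalized-weight form are illustrative assumptions, not necessarily the exact estimators defined in the paper.

```python
from typing import Optional, Sequence

# Hypothetical helpers for illustration; not the authors' code.


def ipw_brier(y: Sequence[int], p: Sequence[Optional[float]],
              r: Sequence[int], pi: Sequence[float]) -> float:
    """IPW Brier score using only subjects whose covariate is observed.

    y  : binary outcomes (assumed always observed)
    p  : predicted risks from the existing model (None where r == 0)
    r  : 1 if the covariate the model needs is observed, 0 otherwise
    pi : estimated P(r == 1 | fully observed variables) for each subject
    Weights 1/pi are normalized to sum to one over the complete cases.
    """
    num = den = 0.0
    for yi, pv, ri, wi in zip(y, p, r, pi):
        if ri == 1:
            w = 1.0 / wi
            num += w * (yi - pv) ** 2
            den += w
    return num / den


def ipw_auc(y: Sequence[int], p: Sequence[Optional[float]],
            r: Sequence[int], pi: Sequence[float]) -> float:
    """IPW AUC as a weighted Mann-Whitney statistic.

    Each complete-case (case, control) pair gets weight 1/(pi_case * pi_control);
    the pair is concordant when the case has the higher predicted risk,
    and tied predictions count one half.
    """
    obs = [(yi, pv, 1.0 / wi) for yi, pv, ri, wi in zip(y, p, r, pi) if ri == 1]
    cases = [(pv, w) for yi, pv, w in obs if yi == 1]
    controls = [(pv, w) for yi, pv, w in obs if yi == 0]
    num = den = 0.0
    for p_case, w_case in cases:
        for p_ctrl, w_ctrl in controls:
            w = w_case * w_ctrl
            den += w
            if p_case > p_ctrl:
                num += w
            elif p_case == p_ctrl:
                num += 0.5 * w
    return num / den
```

With every pi set to 1, both functions reduce to the ordinary complete-case Brier score and the Mann-Whitney form of the AUC, which is a quick sanity check; with a correctly specified pi, the weights instead compensate for subjects who are more or less likely to have the covariate observed.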

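The AIPW estimate described in the abstract adds an augmentation term built from an imputation model, which is where its double robustness comes from. The sketch below covers the Brier score only and assumes one common AIPW form: for each subject i, m[i] approximates the expected squared error E[(Y - p(X, Z))^2 | X, Y] under an imputation model for the missing covariate Z, for example by averaging over K imputed values of Z. The value m[i] is needed for every subject, including those whose Z is observed, and the imputation model conditions on the outcome, in line with the abstract's recommendation. The names aipw_brier and expected_loss are hypothetical, and the paper's exact estimator may differ.

```python
from typing import Optional, Sequence

# Hypothetical helpers for illustration; not the authors' code.


def expected_loss(y_i: int, imputed_risks: Sequence[float]) -> float:
    """Monte Carlo estimate of E[(y_i - p(X_i, Z))^2 | X_i, y_i].

    imputed_risks: the existing model's predicted risk for subject i evaluated
    at K values of Z drawn from an imputation model that conditions on X and y
    (the imputation model itself is assumed and not shown).
    """
    return sum((y_i - q) ** 2 for q in imputed_risks) / len(imputed_risks)


def aipw_brier(y: Sequence[int], p: Sequence[Optional[float]], r: Sequence[int],
               pi: Sequence[float], m: Sequence[float]) -> float:
    """AIPW Brier score (one common form, a sketch).

    Per subject the contribution is m_i + r_i * (L_i - m_i) / pi_i, which equals
    r_i * L_i / pi_i - (r_i - pi_i) * m_i / pi_i, where L_i = (y_i - p_i)^2
    can only be computed when the covariate is observed (r_i == 1).
    """
    total = 0.0
    for yi, pv, ri, wi, mi in zip(y, p, r, pi, m):
        if ri == 1:
            loss = (yi - pv) ** 2
            total += mi + (loss - mi) / wi   # observed: IPW-corrected residual
        else:
            total += mi                      # missing: rely on the imputation model
    return total / len(y)
```

If m is the correct conditional expectation, the weighted residual term has mean zero whatever pi is; if instead pi is correct, the augmentation has mean zero and m only affects efficiency. These two cases are the informal content of the double-robustness claim.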
Original language	English (US)
Pages (from-to)	3477-3498
Number of pages	22
Journal	Statistics in Medicine
Volume	40
Issue number	15
DOIs	https://doi.org/10.1002/sim.8978
State	Published - Jul 10 2021

Keywords

Brier score
area under the ROC curve
augmented inverse probability weighting
inverse probability weighting
multiple imputation

ASJC Scopus subject areas

Epidemiology
Statistics and Probability

Cite this

@article{94ad177ba64b482e8492418c76ea1d69,

title = "Evaluation of predictive model performance of an existing model in the presence of missing data",

abstract = "In medical research, the Brier score (BS) and the area under the receiver operating characteristic (ROC) curve (AUC) are two common metrics used to evaluate prediction models of a binary outcome, such as models that use biomarkers to predict the risk of developing a disease in the future. Assessing an existing prediction model using data with missing covariate values is challenging. In this article, we propose inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimates of AUC and BS to handle the missing data. An alternative approach uses multiple imputation (MI), which requires a model for the distribution of the missing variable. We evaluated the performance of IPW and AIPW in comparison with MI in simulation studies under missing completely at random, missing at random, and missing not at random scenarios. When there are missing observations in the data, MI and IPW can be used to obtain unbiased estimates of BS and AUC if the imputation model for the missing variable or the model for the missingness, respectively, is correctly specified. MI is more efficient than IPW. Our simulation results suggest that AIPW can be more efficient than IPW and also achieves double robustness against misspecification of either the missingness model or the imputation model. The outcome variable should be included in the model for the missing variable under all scenarios, while it needs to be included in the missingness model only if the missingness depends on the outcome. We illustrate these methods using an example from prostate cancer.",

keywords = "Brier score, area under the ROC curve, augmented inverse probability weighting, inverse probability weighting, multiple imputation",

author = "Li, Pin and Taylor, {Jeremy M.G.} and Spratt, {Daniel E.} and Karnes, {R. Jeffery} and Schipper, {Matthew J.}",

note = "Publisher Copyright: {\textcopyright} 2021 John Wiley \& Sons Ltd.",

year = "2021",

month = jul,

day = "10",

doi = "10.1002/sim.8978",

language = "English (US)",

volume = "40",

pages = "3477--3498",

journal = "Statistics in Medicine",

issn = "0277-6715",

publisher = "John Wiley and Sons Ltd",

number = "15",

}

TY  - JOUR
T1  - Evaluation of predictive model performance of an existing model in the presence of missing data
AU  - Li, Pin
AU  - Taylor, Jeremy M.G.
AU  - Spratt, Daniel E.
AU  - Karnes, R. Jeffery
AU  - Schipper, Matthew J.
PY  - 2021/7/10
Y1  - 2021/7/10
AB  - In medical research, the Brier score (BS) and the area under the receiver operating characteristic (ROC) curve (AUC) are two common metrics used to evaluate prediction models of a binary outcome, such as models that use biomarkers to predict the risk of developing a disease in the future. Assessing an existing prediction model using data with missing covariate values is challenging. In this article, we propose inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimates of AUC and BS to handle the missing data. An alternative approach uses multiple imputation (MI), which requires a model for the distribution of the missing variable. We evaluated the performance of IPW and AIPW in comparison with MI in simulation studies under missing completely at random, missing at random, and missing not at random scenarios. When there are missing observations in the data, MI and IPW can be used to obtain unbiased estimates of BS and AUC if the imputation model for the missing variable or the model for the missingness, respectively, is correctly specified. MI is more efficient than IPW. Our simulation results suggest that AIPW can be more efficient than IPW and also achieves double robustness against misspecification of either the missingness model or the imputation model. The outcome variable should be included in the model for the missing variable under all scenarios, while it needs to be included in the missingness model only if the missingness depends on the outcome. We illustrate these methods using an example from prostate cancer.
KW  - Brier score
KW  - area under the ROC curve
KW  - augmented inverse probability weighting
KW  - inverse probability weighting
KW  - multiple imputation
UR  - http://www.scopus.com/inward/record.url?scp=85104142254&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85104142254&partnerID=8YFLogxK
U2  - 10.1002/sim.8978
DO  - 10.1002/sim.8978
M3  - Article
C2  - 33843085
AN  - SCOPUS:85104142254
SN  - 0277-6715
VL  - 40
SP  - 3477
EP  - 3498
JO  - Statistics in Medicine
JF  - Statistics in Medicine
IS  - 15
ER  -
