The effect of retrospective sampling on estimates of prediction error for multifactor dimensionality reduction

Stacey J. Winham, Alison A. Motsinger-Reif

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

The standard in genetic association studies of complex diseases is replication and validation of positive results, with an emphasis on assessing the predictive value of associations. In response to this need, a number of analytical approaches have been developed to identify predictive models that account for complex genetic etiologies. Multifactor Dimensionality Reduction (MDR) is a commonly used, highly successful method designed to evaluate potential gene-gene interactions. MDR relies on classification error in a cross-validation framework to rank and evaluate potentially predictive models. Previous work has demonstrated the high power of MDR, but has not considered the accuracy and variance of the MDR prediction error estimate. Here, we evaluate the bias and variance of the MDR error estimate as both a retrospective and a prospective estimator and show that MDR can both underestimate and overestimate error. We argue that a prospective error estimate is necessary if MDR models are used for prediction, and propose a bootstrap resampling estimate, integrating population prevalence, to accurately estimate prospective error. We demonstrate that this bootstrap estimate is preferable for prediction to the error estimate currently produced by MDR. While demonstrated with MDR, the proposed estimate is applicable to any data-mining method that uses a similar error estimate.
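The abstract contrasts the error MDR reports on a case-control (retrospective) sample with the error a model would make in the population it is applied to (prospective). A minimal Python sketch of that contrast follows, under stated assumptions: genotypes are coded 0/1/2 per SNP and grouped into multilocus cells; each cell is labelled high-risk when its training case:control ratio exceeds the overall training ratio (the usual MDR pooling rule); cells unseen in training default to low-risk; and folds are stratified by case status. The helper names (`fit_mdr`, `sens_spec`, `stratified_folds`, `cv_errors`) and the toy data are hypothetical. The prospective figure uses the standard reweighting K(1 - sensitivity) + (1 - K)(1 - specificity) with population prevalence K; it is not the bootstrap estimator proposed by the authors, which is not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]  # one multilocus genotype, e.g. (0, 2) for two SNPs coded 0/1/2


def fit_mdr(cells: Sequence[Cell], status: Sequence[int]) -> Dict[Cell, int]:
    """Pool genotype cells into high-risk (1) and low-risk (0) groups.

    status: 1 = case, 0 = control. A cell is high-risk when its case:control
    ratio exceeds the overall case:control ratio of the training data.
    """
    cases = Counter(c for c, y in zip(cells, status) if y == 1)
    controls = Counter(c for c, y in zip(cells, status) if y == 0)
    threshold = sum(cases.values()) / sum(controls.values())
    # cases/controls > threshold, rearranged so cells with no controls are safe
    return {c: int(cases[c] > threshold * controls[c]) for c in set(cases) | set(controls)}


def sens_spec(model: Dict[Cell, int], cells: Sequence[Cell],
              status: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of an MDR model on held-out data."""
    pred = [model.get(c, 0) for c in cells]  # unseen cells -> low-risk (assumption)
    tp = sum(1 for p, y in zip(pred, status) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(pred, status) if p == 0 and y == 0)
    n_case = sum(status)
    return tp / n_case, tn / (len(status) - n_case)


def stratified_folds(status: Sequence[int], k: int) -> List[int]:
    """Deal cases and controls round-robin into k folds."""
    seen = {0: 0, 1: 0}
    folds = []
    for y in status:
        folds.append(seen[y] % k)
        seen[y] += 1
    return folds


def cv_errors(cells: Sequence[Cell], status: Sequence[int], k: int = 5,
              prevalence: float = 0.01) -> Tuple[float, float]:
    """Return (retrospective, prospective) k-fold cross-validated error."""
    folds = stratified_folds(status, k)
    retro, prosp = [], []
    for f in range(k):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        model = fit_mdr([cells[i] for i in train], [status[i] for i in train])
        test_y = [status[i] for i in test]
        se, sp = sens_spec(model, [cells[i] for i in test], test_y)
        frac_case = sum(test_y) / len(test_y)
        # Error in the case-control sample: weighted by the sampled case fraction.
        retro.append(frac_case * (1 - se) + (1 - frac_case) * (1 - sp))
        # Error in the target population: weighted by disease prevalence K.
        prosp.append(prevalence * (1 - se) + (1 - prevalence) * (1 - sp))
    return sum(retro) / k, sum(prosp) / k


if __name__ == "__main__":
    # Toy single-SNP data: all 10 cases carry genotype 1;
    # 4 controls carry genotype 1 and 6 carry genotype 0.
    cells = [(1,)] * 14 + [(0,)] * 6
    status = [1] * 10 + [0] * 10
    retro, prosp = cv_errors(cells, status, k=2, prevalence=0.01)
    print(round(retro, 3), round(prosp, 3))  # 0.2 0.396
```

On this toy sample with 50% cases, the cross-validated error is 0.2, but reweighting the same sensitivity (1.0) and specificity (0.6) to a 1% prevalence gives about 0.396, so here the sample error understates the population error; with the opposite sensitivity/specificity trade-off the bias runs the other way.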

Original language: English (US)
Pages (from-to): 46-61
Number of pages: 16
Journal: Annals of Human Genetics
Volume: 75
Issue number: 1
DOI: 10.1111/j.1469-1809.2010.00587.x
State: Published - Jan 2011
Externally published: Yes

Keywords

  • Bias
  • Epistasis
  • Gene-gene interaction
  • Prediction error
  • Retrospective and prospective sampling
  • Variance

ASJC Scopus subject areas

  • Genetics (clinical)
  • Genetics

Cite this

The effect of retrospective sampling on estimates of prediction error for multifactor dimensionality reduction. / Winham, Stacey J; Motsinger-Reif, Alison A.

In: Annals of Human Genetics, Vol. 75, No. 1, 01.2011, p. 46-61.

@article{55f568b10c9e40b39dfd9c3e46b3d705,
title = "The effect of retrospective sampling on estimates of prediction error for multifactor dimensionality reduction",
abstract = "The standard in genetic association studies of complex diseases is replication and validation of positive results, with an emphasis on assessing the predictive value of associations. In response to this need, a number of analytical approaches have been developed to identify predictive models that account for complex genetic etiologies. Multifactor Dimensionality Reduction (MDR) is a commonly used, highly successful method designed to evaluate potential gene-gene interactions. MDR relies on classification error in a cross-validation framework to rank and evaluate potentially predictive models. Previous work has demonstrated the high power of MDR, but has not considered the accuracy and variance of the MDR prediction error estimate. Currently, we evaluate the bias and variance of the MDR error estimate as both a retrospective and prospective estimator and show that MDR can both underestimate and overestimate error. We argue that a prospective error estimate is necessary if MDR models are used for prediction, and propose a bootstrap resampling estimate, integrating population prevalence, to accurately estimate prospective error. We demonstrate that this bootstrap estimate is preferable for prediction to the error estimate currently produced by MDR. While demonstrated with MDR, the proposed estimation is applicable to all data-mining methods that use similar estimates.",
keywords = "Bias, Epistasis, Gene-gene interaction, Prediction error, Retrospective and prospective sampling, Variance",
author = "Winham, {Stacey J} and Motsinger-Reif, {Alison A.}",
year = "2011",
month = jan,
doi = "10.1111/j.1469-1809.2010.00587.x",
language = "English (US)",
volume = "75",
pages = "46--61",
journal = "Annals of Human Genetics",
issn = "0003-4800",
publisher = "Wiley-Blackwell",
number = "1",

}

TY  - JOUR
T1  - The effect of retrospective sampling on estimates of prediction error for multifactor dimensionality reduction
AU  - Winham, Stacey J
AU  - Motsinger-Reif, Alison A.
PY  - 2011/1
Y1  - 2011/1
N2  - The standard in genetic association studies of complex diseases is replication and validation of positive results, with an emphasis on assessing the predictive value of associations. In response to this need, a number of analytical approaches have been developed to identify predictive models that account for complex genetic etiologies. Multifactor Dimensionality Reduction (MDR) is a commonly used, highly successful method designed to evaluate potential gene-gene interactions. MDR relies on classification error in a cross-validation framework to rank and evaluate potentially predictive models. Previous work has demonstrated the high power of MDR, but has not considered the accuracy and variance of the MDR prediction error estimate. Currently, we evaluate the bias and variance of the MDR error estimate as both a retrospective and prospective estimator and show that MDR can both underestimate and overestimate error. We argue that a prospective error estimate is necessary if MDR models are used for prediction, and propose a bootstrap resampling estimate, integrating population prevalence, to accurately estimate prospective error. We demonstrate that this bootstrap estimate is preferable for prediction to the error estimate currently produced by MDR. While demonstrated with MDR, the proposed estimation is applicable to all data-mining methods that use similar estimates.
AB  - The standard in genetic association studies of complex diseases is replication and validation of positive results, with an emphasis on assessing the predictive value of associations. In response to this need, a number of analytical approaches have been developed to identify predictive models that account for complex genetic etiologies. Multifactor Dimensionality Reduction (MDR) is a commonly used, highly successful method designed to evaluate potential gene-gene interactions. MDR relies on classification error in a cross-validation framework to rank and evaluate potentially predictive models. Previous work has demonstrated the high power of MDR, but has not considered the accuracy and variance of the MDR prediction error estimate. Currently, we evaluate the bias and variance of the MDR error estimate as both a retrospective and prospective estimator and show that MDR can both underestimate and overestimate error. We argue that a prospective error estimate is necessary if MDR models are used for prediction, and propose a bootstrap resampling estimate, integrating population prevalence, to accurately estimate prospective error. We demonstrate that this bootstrap estimate is preferable for prediction to the error estimate currently produced by MDR. While demonstrated with MDR, the proposed estimation is applicable to all data-mining methods that use similar estimates.
KW  - Bias
KW  - Epistasis
KW  - Gene-gene interaction
KW  - Prediction error
KW  - Retrospective and prospective sampling
KW  - Variance
UR  - http://www.scopus.com/inward/record.url?scp=78650150426&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=78650150426&partnerID=8YFLogxK
U2  - 10.1111/j.1469-1809.2010.00587.x
DO  - 10.1111/j.1469-1809.2010.00587.x
M3  - Article
C2  - 20560921
AN  - SCOPUS:78650150426
VL  - 75
SP  - 46
EP  - 61
JO  - Annals of Human Genetics
JF  - Annals of Human Genetics
SN  - 0003-4800
IS  - 1
ER  - 