Super learner

Mark J. Van Der Laan, Eric Polley, Alan E. Hubbard

Research output: Contribution to journal › Article

362 Citations (Scopus)

Abstract

When trying to learn a model for predicting an outcome from a set of covariates, a statistician has many estimation procedures in their toolbox. A few examples of these candidate learners are least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method that builds the super learner as a weighted combination of many candidate learners. This article proposes a fast algorithm for constructing a super learner for prediction, which uses V-fold cross-validation to select the weights that combine an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter that can be defined as the minimizer of a loss function.
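The procedure the abstract describes — obtain V-fold cross-validated predictions from each candidate learner, choose combination weights by minimizing cross-validated risk, then refit the candidates on the full data and combine them — can be illustrated with a minimal NumPy sketch. The toy learners and the crude clip-and-renormalize step onto the probability simplex are illustrative assumptions, not the constrained optimization or the candidate library used in the paper or in the authors' SuperLearner software:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: nonlinear truth, so no single candidate is best everywhere.
n = 200
X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(2 * X[:, 0]) + 0.3 * rng.standard_normal(n)

# Candidate learners: each fit_* returns a predict(Xnew) closure.
def fit_mean(Xtr, ytr):
    m = ytr.mean()
    return lambda Xn: np.full(len(Xn), m)

def fit_ols(Xtr, ytr):
    A = np.column_stack([np.ones(len(Xtr)), Xtr[:, 0]])
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn[:, 0]]) @ beta

def fit_knn(Xtr, ytr, k=10):
    def predict(Xn):
        d = np.abs(Xn[:, :1] - Xtr[:, 0][None, :])   # pairwise distances
        idx = np.argsort(d, axis=1)[:, :k]           # k nearest neighbors
        return ytr[idx].mean(axis=1)
    return predict

candidates = [fit_mean, fit_ols, fit_knn]

# Step 1: V-fold cross-validated predictions for every candidate.
V = 5
folds = np.array_split(rng.permutation(n), V)
Z = np.zeros((n, len(candidates)))  # "level-one" prediction matrix
for val in folds:
    tr = np.setdiff1d(np.arange(n), val)
    for j, fit in enumerate(candidates):
        Z[val, j] = fit(X[tr], y[tr])(X[val])

# Step 2: weights minimizing cross-validated squared error, pushed onto
# the simplex by clipping negatives and renormalizing (a simplification).
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
w = np.clip(w, 0, None)
w /= w.sum()

# Step 3: refit each candidate on all data; combine with the CV weights.
fits = [fit(X, y) for fit in candidates]
super_learner = lambda Xn: sum(wj * f(Xn) for wj, f in zip(w, fits))

cv_mse = [np.mean((y - Z[:, j]) ** 2) for j in range(len(candidates))]
sl_mse = np.mean((y - super_learner(X)) ** 2)
```

The key point, mirroring the abstract, is that the weights are chosen against the cross-validated predictions `Z` rather than the in-sample fits, which is what protects the combination from favoring overfit candidates.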

Original language: English (US)
Article number: 25
Journal: Statistical Applications in Genetics and Molecular Biology
ISSN: 1544-6115
Publisher: Berkeley Electronic Press
Volume: 6
Issue number: 1
PubMed ID: 17910531
State: Published - Sep 16 2007
Externally published: Yes

Keywords

  • Cross-validation
  • Loss-based estimation
  • Machine learning
  • Prediction

ASJC Scopus subject areas

  • Statistics and Probability
  • Molecular Biology
  • Genetics
  • Computational Mathematics

Cite this

Van Der Laan, M. J., Polley, E., & Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1), Article 25.
