Super learner

Mark J. van der Laan, Eric C. Polley, Alan E. Hubbard

Research output: Contribution to journal (Article)

460 Scopus citations

Abstract

When trying to learn a model for the prediction of an outcome given a set of covariates, a statistician has many estimation procedures in their toolbox. A few examples of these candidate learners are least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method that builds a weighted combination of many candidate learners, which we call the super learner. This article proposes a fast algorithm for constructing a super learner for prediction, which uses V-fold cross-validation to select the weights that combine an initial set of candidate learners. In addition, the article contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter that can be defined as the minimizer of a loss function.
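The idea in the abstract, using V-fold cross-validation to choose weights for combining candidate learners, can be illustrated with a minimal sketch. This is not the authors' implementation: the three toy learners (`fit_mean`, `fit_linear`, `fit_knn`), the single covariate, the squared-error loss, the restriction to convex weights, and the grid search in `select_weights` are all assumptions made here to keep the example self-contained and dependency-free.

```python
import random

# Hypothetical candidate learners: each takes training data and returns a
# prediction function of one covariate x.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=5):
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def cv_predictions(xs, ys, learners, v=5):
    """Return z[i][j]: prediction for observation i from learner j,
    fitted on the V-1 folds that do not contain observation i."""
    n = len(xs)
    z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(v):
        train = [i for i in range(n) if i % v != fold]
        valid = [i for i in range(n) if i % v == fold]
        for j, fit in enumerate(learners):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in valid:
                z[i][j] = model(xs[i])
    return z

def simplex_grid(k, steps):
    """Yield all k-tuples of non-negative integers summing to `steps`."""
    if k == 1:
        yield (steps,)
        return
    for a in range(steps + 1):
        for rest in simplex_grid(k - 1, steps - a):
            yield (a,) + rest

def select_weights(z, ys, steps=20):
    """Grid search over convex weights minimizing cross-validated squared error."""
    best_w, best_risk = None, float("inf")
    for counts in simplex_grid(len(z[0]), steps):
        w = [c / steps for c in counts]
        risk = sum((y - sum(wj * zj for wj, zj in zip(w, row))) ** 2
                   for row, y in zip(z, ys)) / len(ys)
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk

def super_learner(xs, ys, learners, v=5, steps=20):
    z = cv_predictions(xs, ys, learners, v)
    w, risk = select_weights(z, ys, steps)
    models = [fit(xs, ys) for fit in learners]  # refit each learner on all data
    return (lambda x: sum(wj * m(x) for wj, m in zip(w, models))), w, risk

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-2, 2) for _ in range(100)]
    ys = [x * x + rng.gauss(0, 0.2) for x in xs]
    predict, weights, cv_risk = super_learner(xs, ys, [fit_mean, fit_linear, fit_knn])
    print("weights:", weights, "CV risk:", cv_risk)
```

Because the weight grid includes the vertices of the simplex (all weight on one learner), the cross-validated risk of the selected combination in this sketch is never worse than that of the best single candidate evaluated on the same folds.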

Original language: English (US)
Article number: 25
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 6
Issue number: 1
State: Published - Sep 16 2007

Keywords

  • Cross-validation
  • Loss-based estimation
  • Machine learning
  • Prediction

ASJC Scopus subject areas

  • Statistics and Probability
  • Molecular Biology
  • Genetics
  • Computational Mathematics
