### Abstract

When learning a model to predict an outcome from a set of covariates, a statistician has many estimation procedures in their toolbox. A few examples of these candidate learners are least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross-validation to select an optimal learner from among many candidates. Motivated by this use of cross-validation, we propose a new prediction method that builds the super learner as a weighted combination of many candidate learners. This article proposes a fast algorithm for constructing a super learner for prediction that uses V-fold cross-validation to select the weights combining an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. The approach generalizes to any parameter that can be defined as the minimizer of a loss function.
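The weighting scheme the abstract describes can be sketched in a few lines. This is a minimal illustration of the idea, not the authors' implementation: two hypothetical candidate learners (linear and cubic least squares) are cross-validated, and a convex weight between them is chosen by a simple grid search over the V-fold cross-validated squared error (the paper permits any minimizer of the loss).

```python
# Minimal super-learner sketch (illustrative only): combine two candidate
# learners with a convex weight chosen by V-fold cross-validation.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(n)

def fit_ols(Xtr, ytr):
    # Candidate 1: least squares on [1, x].
    A = np.column_stack([np.ones(len(Xtr)), Xtr[:, 0]])
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn[:, 0]]) @ beta

def fit_cubic(Xtr, ytr):
    # Candidate 2: cubic polynomial regression.
    A = np.vander(Xtr[:, 0], 4)
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Xn: np.vander(Xn[:, 0], 4) @ beta

learners = [fit_ols, fit_cubic]

# Step 1: V-fold cross-validated predictions for each candidate learner.
V = 5
folds = np.array_split(rng.permutation(n), V)
Z = np.empty((n, len(learners)))  # cross-validated prediction matrix
for idx in folds:
    mask = np.ones(n, dtype=bool)
    mask[idx] = False
    for j, fit in enumerate(learners):
        Z[idx, j] = fit(X[mask], y[mask])(X[idx])

# Step 2: choose the convex weight minimizing cross-validated squared error.
alphas = np.linspace(0, 1, 101)
sse = [np.sum((y - (a * Z[:, 0] + (1 - a) * Z[:, 1])) ** 2) for a in alphas]
alpha = alphas[int(np.argmin(sse))]

# Step 3: refit each candidate on all the data; the super learner is the
# weighted combination.
full = [fit(X, y) for fit in learners]

def super_learner(Xnew):
    return alpha * full[0](Xnew) + (1 - alpha) * full[1](Xnew)
```

Because the grid includes the endpoints 0 and 1, the selected combination can never do worse in cross-validated risk than either candidate alone, which is the practical appeal of the method.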

| Original language | English (US) |
|---|---|
| Article number | 25 |
| Journal | Statistical Applications in Genetics and Molecular Biology |
| Volume | 6 |
| Issue number | 1 |
| State | Published - Sep 16 2007 |
| Externally published | Yes |

### Keywords

- Cross-validation
- Loss-based estimation
- Machine learning
- Prediction

### ASJC Scopus subject areas

- Statistics and Probability
- Molecular Biology
- Genetics
- Computational Mathematics

### Cite this

Van Der Laan, M. J., Polley, E., & Hubbard, A. E. (2007). Super learner. *Statistical Applications in Genetics and Molecular Biology*, *6*(1), 25.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Super learner

AU - Van Der Laan, Mark J.

AU - Polley, Eric

AU - Hubbard, Alan E.

PY - 2007/9/16

Y1 - 2007/9/16


KW - Cross-validation

KW - Loss-based estimation

KW - Machine learning

KW - Prediction

UR - http://www.scopus.com/inward/record.url?scp=34548705586&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548705586&partnerID=8YFLogxK

M3 - Article

C2 - 17910531

AN - SCOPUS:34548705586

VL - 6

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

SN - 1544-6115

IS - 1

M1 - 25

ER -