Super learning: An application to the prediction of HIV-1 drug resistance

Sandra E. Sinisi, Eric Polley, Maya L. Petersen, Soo Yon Rhee, Mark J. Van Der Laan

Research output: Contribution to journal > Article

26 Citations (Scopus)

Abstract

Many alternative data-adaptive algorithms can be used to learn a predictor based on observed data. Examples of such learners include decision trees, neural networks, support vector regression, least angle regression, logic regression, and the Deletion/Substitution/Addition algorithm. The optimal learner for prediction will vary depending on the underlying data-generating distribution. In this article we introduce the "super learner", a prediction algorithm that applies any set of candidate learners and uses cross-validation to select between them. Theory shows that asymptotically the super learner performs essentially as well as or better than any of the candidate learners. In this article we present the theory behind the super learner, and illustrate its performance using simulations. We further apply the super learner to a data example, in which we predict the phenotypic antiretroviral susceptibility of HIV based on viral genotype. Specifically, we apply the super learner to predict susceptibility to a specific protease inhibitor, nelfinavir, using a set of database-derived non-polymorphic treatment-selected mutations.
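The selection step the abstract describes (fit every candidate learner, estimate each one's risk by cross-validation, keep the one with the smallest risk) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the three toy candidates (`fit_mean`, `fit_linear`, `fit_nearest`), the squared-error loss, the five-fold split, and the simulated data are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Hypothetical candidate 1: ignore x and always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Hypothetical candidate 2: least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """Hypothetical candidate 3: predict the outcome of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def cv_risk(fit, xs, ys, folds=5):
    """V-fold cross-validated mean squared error for one candidate learner."""
    n = len(xs)
    total = 0.0
    for v in range(folds):
        train = [i for i in range(n) if i % folds != v]
        valid = [i for i in range(n) if i % folds == v]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in valid)
    return total / n

def select_learner(candidates, xs, ys, folds=5):
    """Choose the candidate with the smallest cross-validated risk, then refit it on all data."""
    risks = {name: cv_risk(fit, xs, ys, folds) for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, candidates[best](xs, ys), risks

# Simulated data (an assumption for illustration): a noisy straight line.
rng = random.Random(0)
xs = [i / 10 for i in range(60)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

candidates = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
best, predictor, risks = select_learner(candidates, xs, ys)
print(best, risks)
```

Because the simulated outcome is linear in `x`, the linear candidate should have the smallest cross-validated risk here; with a different data-generating distribution another candidate could win, which is the point the abstract makes about the optimal learner varying.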

Original language: English (US)
Article number: 7
Pages (from-to): 1-24
Number of pages: 24
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 6
Issue number: 1
DOI: 10.2202/1544-6115.1240
State: Published - Feb 23 2007
Externally published: Yes

Fingerprint

Drug Resistance
HIV-1
Learning
Prediction
Decision trees
Adaptive algorithms
Nelfinavir
Susceptibility
Substitution reactions
Regression
Protease Inhibitors
Neural networks
Support Vector Regression
Protease
Genotype
Cross-validation

Keywords

  • Antiretroviral
  • Cross-validation
  • Genomics
  • Loss-based estimation
  • Machine learning

ASJC Scopus subject areas

  • Statistics and Probability
  • Molecular Biology
  • Genetics
  • Computational Mathematics

Cite this

Super learning: An application to the prediction of HIV-1 drug resistance. / Sinisi, Sandra E.; Polley, Eric; Petersen, Maya L.; Rhee, Soo Yon; Van Der Laan, Mark J.

In: Statistical Applications in Genetics and Molecular Biology, Vol. 6, No. 1, 7, 23.02.2007, p. 1-24.

@article{1bb45fe65a12409f86fce3a46799ab2b,
title = "Super learning: An application to the prediction of HIV-1 drug resistance",
keywords = "Antiretroviral, Cross-validation, Genomics, Loss-based estimation, Machine learning",
author = "Sinisi, {Sandra E.} and Polley, Eric and Petersen, {Maya L.} and Rhee, {Soo Yon} and {Van Der Laan}, {Mark J.}",
year = "2007",
month = "2",
day = "23",
doi = "10.2202/1544-6115.1240",
language = "English (US)",
volume = "6",
pages = "1--24",
journal = "Statistical Applications in Genetics and Molecular Biology",
issn = "1544-6115",
publisher = "Berkeley Electronic Press",
number = "1",

}

TY - JOUR

T1 - Super learning

T2 - An application to the prediction of HIV-1 drug resistance

AU - Sinisi, Sandra E.

AU - Polley, Eric

AU - Petersen, Maya L.

AU - Rhee, Soo Yon

AU - Van Der Laan, Mark J.

PY - 2007/2/23

Y1 - 2007/2/23

KW - Antiretroviral

KW - Cross-validation

KW - Genomics

KW - Loss-based estimation

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=33847387164&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847387164&partnerID=8YFLogxK

U2 - 10.2202/1544-6115.1240

DO - 10.2202/1544-6115.1240

M3 - Article

C2 - 17402922

AN - SCOPUS:33847387164

VL - 6

SP - 1

EP - 24

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

SN - 1544-6115

IS - 1

M1 - 7

ER -