Super learning: An application to the prediction of HIV-1 drug resistance

Sandra E. Sinisi; Eric C. Polley; Maya L. Petersen; Soo Yon Rhee; Mark J. Van Der Laan

doi:10.2202/1544-6115.1240

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

33 Scopus citations

Abstract

Many alternative data-adaptive algorithms can be used to learn a predictor based on observed data. Examples of such learners include decision trees, neural networks, support vector regression, least angle regression, logic regression, and the Deletion/Substitution/Addition algorithm. The optimal learner for prediction will vary depending on the underlying data-generating distribution. In this article we introduce the "super learner", a prediction algorithm that applies any set of candidate learners and uses cross-validation to select between them. Theory shows that asymptotically the super learner performs essentially as well as or better than any of the candidate learners. In this article we present the theory behind the super learner, and illustrate its performance using simulations. We further apply the super learner to a data example, in which we predict the phenotypic antiretroviral susceptibility of HIV based on viral genotype. Specifically, we apply the super learner to predict susceptibility to a specific protease inhibitor, nelfinavir, using a set of database-derived non-polymorphic treatment-selected mutations.
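The selection step described in the abstract (fit every candidate learner, then use cross-validation to choose between them) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the squared-error loss, the three toy candidate learners (`fit_mean`, `fit_linear`, `fit_1nn`), the helper names `cv_risk` and `cv_select`, and the simulated data are all assumptions made for the example.

```python
import numpy as np

# Each hypothetical candidate learner takes training data and returns a predict function.
def fit_mean(X, y):
    m = y.mean()
    return lambda Xn: np.full(len(Xn), m)

def fit_linear(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_1nn(X, y):
    # One-nearest-neighbour regression under squared Euclidean distance.
    def predict(Xn):
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        return y[d.argmin(axis=1)]
    return predict

def cv_risk(fit, X, y, folds):
    """V-fold cross-validated mean squared error of one candidate."""
    sq_err = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        predict = fit(X[train], y[train])
        sq_err += ((predict(X[test_idx]) - y[test_idx]) ** 2).sum()
    return sq_err / len(y)

def cv_select(candidates, X, y, n_folds=5, seed=0):
    """Pick the candidate with the smallest cross-validated risk, then refit it on all data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    risks = {name: cv_risk(fit, X, y, folds) for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks, candidates[best](X, y)

# Simulated data (an assumption for this sketch): a linear signal with small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

candidates = {"mean": fit_mean, "linear": fit_linear, "1nn": fit_1nn}
best, risks, predictor = cv_select(candidates, X, y)
print(best, risks)
```

Because the simulated outcome is linear in the covariates, the least-squares candidate should have the lowest cross-validated risk here. With a different data-generating distribution another candidate could win, which is the motivation the abstract gives for selecting among learners rather than fixing one in advance.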

Original language	English (US)
Article number	7
Pages (from-to)	1-24
Number of pages	24
Journal	Statistical Applications in Genetics and Molecular Biology
Volume	6
Issue number	1
DOIs	https://doi.org/10.2202/1544-6115.1240
State	Published - Feb 23 2007

Keywords

Antiretroviral
Cross-validation
Genomics
Loss-based estimation
Machine learning

ASJC Scopus subject areas

Statistics and Probability
Molecular Biology
Genetics
Computational Mathematics

Cite this

@article{1bb45fe65a12409f86fce3a46799ab2b,

title = "Super learning: An application to the prediction of HIV-1 drug resistance",

abstract = "Many alternative data-adaptive algorithms can be used to learn a predictor based on observed data. Examples of such learners include decision trees, neural networks, support vector regression, least angle regression, logic regression, and the Deletion/Substitution/Addition algorithm. The optimal learner for prediction will vary depending on the underlying data-generating distribution. In this article we introduce the {"}super learner{"}, a prediction algorithm that applies any set of candidate learners and uses cross-validation to select between them. Theory shows that asymptotically the super learner performs essentially as well as or better than any of the candidate learners. In this article we present the theory behind the super learner, and illustrate its performance using simulations. We further apply the super learner to a data example, in which we predict the phenotypic antiretroviral susceptibility of HIV based on viral genotype. Specifically, we apply the super learner to predict susceptibility to a specific protease inhibitor, nelfinavir, using a set of database-derived non-polymorphic treatment-selected mutations.",

keywords = "Antiretroviral, Cross-validation, Genomics, Loss-based estimation, Machine learning",

author = "Sinisi, {Sandra E.} and Polley, {Eric C.} and Petersen, {Maya L.} and Rhee, {Soo Yon} and {Van Der Laan}, {Mark J.}",

year = "2007",

month = feb,

day = "23",

doi = "10.2202/1544-6115.1240",

language = "English (US)",

volume = "6",

pages = "1--24",

journal = "Statistical Applications in Genetics and Molecular Biology",

issn = "1544-6115",

publisher = "Berkeley Electronic Press",

number = "1",

}

TY - JOUR

T1 - Super learning

T2 - An application to the prediction of HIV-1 drug resistance

AU - Sinisi, Sandra E.

AU - Polley, Eric C.

AU - Petersen, Maya L.

AU - Rhee, Soo Yon

AU - Van Der Laan, Mark J.

PY - 2007/2/23

Y1 - 2007/2/23

AB - Many alternative data-adaptive algorithms can be used to learn a predictor based on observed data. Examples of such learners include decision trees, neural networks, support vector regression, least angle regression, logic regression, and the Deletion/Substitution/Addition algorithm. The optimal learner for prediction will vary depending on the underlying data-generating distribution. In this article we introduce the "super learner", a prediction algorithm that applies any set of candidate learners and uses cross-validation to select between them. Theory shows that asymptotically the super learner performs essentially as well as or better than any of the candidate learners. In this article we present the theory behind the super learner, and illustrate its performance using simulations. We further apply the super learner to a data example, in which we predict the phenotypic antiretroviral susceptibility of HIV based on viral genotype. Specifically, we apply the super learner to predict susceptibility to a specific protease inhibitor, nelfinavir, using a set of database-derived non-polymorphic treatment-selected mutations.

KW - Antiretroviral

KW - Cross-validation

KW - Genomics

KW - Loss-based estimation

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=33847387164&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847387164&partnerID=8YFLogxK

U2 - 10.2202/1544-6115.1240

DO - 10.2202/1544-6115.1240

M3 - Article

C2 - 17402922

AN - SCOPUS:33847387164

SN - 1544-6115

VL - 6

SP - 1

EP - 24

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

IS - 1

M1 - 7

ER -
