Q- and A-learning methods for estimating optimal dynamic treatment regimes

Phillip J. Schulte; Anastasios A. Tsiatis; Eric B. Laber; Marie Davidian

doi:10.1214/13-STS450

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

82 Scopus citations

Abstract

In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data, a key goal is estimating the optimal regime, that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.

Original language	English (US)
Pages (from-to)	640-661
Number of pages	22
Journal	Statistical Science
Volume	29
Issue number	4
DOIs	https://doi.org/10.1214/13-STS450
State	Published - 2014

Keywords

Advantage learning
Bias-variance trade-off
Model misspecification
Personalized medicine
Potential outcomes
Sequential decision-making

ASJC Scopus subject areas

Statistics and Probability
General Mathematics
Statistics, Probability and Uncertainty

Cite this

@article{8bb968c90a394995a5576731dcb337fb,
  title = "Q- and A-learning methods for estimating optimal dynamic treatment regimes",
  abstract = "In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data, a key goal is estimating the optimal regime, that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.",
  keywords = "Advantage learning, Bias-variance trade-off, Model misspecification, Personalized medicine, Potential outcomes, Sequential decision-making",
  author = "Schulte, {Phillip J.} and Tsiatis, {Anastasios A.} and Laber, {Eric B.} and Davidian, Marie",
  note = "Publisher Copyright: {\textcopyright} Institute of Mathematical Statistics, 2014.",
  year = "2014",
  doi = "10.1214/13-STS450",
  language = "English (US)",
  volume = "29",
  pages = "640--661",
  journal = "Statistical Science",
  issn = "0883-4237",
  publisher = "Institute of Mathematical Statistics",
  number = "4",
}

TY  - JOUR
T1  - Q- and A-learning methods for estimating optimal dynamic treatment regimes
AU  - Schulte, Phillip J.
AU  - Tsiatis, Anastasios A.
AU  - Laber, Eric B.
AU  - Davidian, Marie
N1  - Publisher Copyright: © Institute of Mathematical Statistics, 2014.
PY  - 2014
Y1  - 2014
N2  - In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data, a key goal is estimating the optimal regime, that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.
AB  - In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data, a key goal is estimating the optimal regime, that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.
KW  - Advantage learning
KW  - Bias-variance trade-off
KW  - Model misspecification
KW  - Personalized medicine
KW  - Potential outcomes
KW  - Sequential decision-making
UR  - http://www.scopus.com/inward/record.url?scp=84921477485&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84921477485&partnerID=8YFLogxK
U2  - 10.1214/13-STS450
DO  - 10.1214/13-STS450
M3  - Article
AN  - SCOPUS:84921477485
SN  - 0883-4237
VL  - 29
SP  - 640
EP  - 661
JO  - Statistical Science
JF  - Statistical Science
IS  - 4
ER  -
