Modernizing Statistics Through Treatment Regimes
Marie Davidian
Department of Statistics, North Carolina State University
2019 ICSA Applied Statistics Symposium, June 11, 2019
1/65
Outline
Treatment Regimes and Precision Medicine
A Brief History
Optimal Single Decision Regimes
Optimal Multiple Decision Regimes
Treatment Regimes in Practice
Premise
Patient heterogeneity:
• Genetic/genomic profile
• Demographic, physiological characteristics
• Clinical variables
• Medical history, concomitant conditions
• Environment, lifestyle factors
• Adverse reactions, adherence to prior treatments
• Preference
• . . .
Clinical decision-making:
• Key decision points in the disease/disorder process
• Multiple treatment options at each
• A patient’s characteristics are implicated in which treatment options s/he should receive
Example: Acute leukemia
Two decision points:
• Decision 1: Induction chemotherapy (2 options: C1, C2)
• Decision 2:
  – Maintenance treatment for patients who respond (2 options: M1, M2)
  – Salvage chemotherapy for those who don’t respond (2 options: S1, S2)
Clinical decision-making
How are treatment decisions made?
• Clinical judgment, practice guidelines
• Synthesis of all information on a patient up to the point of a decision to determine the next treatment action from among the feasible options
• Goal: Make the “best” decisions leading to the most beneficial expected outcome for the patient
Precision medicine: Inform clinical decision-making and make it evidence-based
• Evidence-based decision support
Informing clinical decision-making
Treatment regime:
• A set of decision rules, each corresponding to a decision point
• Static rule: The recommended treatment action does not depend on patient information
• Dynamic rule: Takes as input all available information on a patient to that point and recommends a treatment action from among the possible options
• Also called a dynamic treatment regime, adaptive treatment strategy, adaptive intervention, or policy
• Formalizes clinical decision-making and defines an algorithm for treating any individual patient
Treatment regime
Simplest: E.g., acute leukemia
• Decision 1: Give C1
• Decision 2: If response, give M2; if nonresponse, give S1
Individualized rules: More complex rules incorporating patient information
• “Tailoring variables”
• Consistent with precision medicine
Treatment regime
For example: Acute leukemia
• Decision 1:
  – If age < 50 years and WBC < 10.0 × 10³/µl, give chemotherapy C2; otherwise, give C1
• Decision 2:
  – If the patient responded: if baseline WBC < 11.2, current WBC < 10.5, no grade 3+ hematologic adverse event, and current ECOG Performance Status ≤ 2, give maintenance M1; otherwise, give M2
  – If the patient did not respond: if age > 60, current WBC < 11.0, and ECOG ≥ 2, give S1; otherwise, give S2
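The example rules can be coded directly as functions of the accrued history. The following is a minimal Python sketch; the dataclasses and field names (`age`, `wbc`, `responded`, `ecog`, `grade3_heme_ae`) are hypothetical, and the thresholds are copied from the example as stated (WBC in units of 10³/µl).

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Hypothetical baseline information x1."""
    age: float  # years
    wbc: float  # white blood cell count, 10^3/ul


@dataclass
class Interim:
    """Hypothetical information x2 collected between Decisions 1 and 2."""
    responded: bool
    wbc: float            # current WBC, 10^3/ul
    ecog: int             # current ECOG Performance Status
    grade3_heme_ae: bool  # any grade 3+ hematologic adverse event


def d1(x1: Baseline) -> str:
    """Decision 1: induction chemotherapy C1 or C2."""
    return "C2" if (x1.age < 50 and x1.wbc < 10.0) else "C1"


def d2(x1: Baseline, a1: str, x2: Interim) -> str:
    """Decision 2 rule on h2 = (x1, a1, x2); this example rule happens not to use a1."""
    if x2.responded:
        if (x1.wbc < 11.2 and x2.wbc < 10.5
                and not x2.grade3_heme_ae and x2.ecog <= 2):
            return "M1"
        return "M2"
    if x1.age > 60 and x2.wbc < 11.0 and x2.ecog >= 2:
        return "S1"
    return "S2"


# One hypothetical patient followed through both decisions
x1 = Baseline(age=45, wbc=8.0)
a1 = d1(x1)
x2 = Interim(responded=True, wbc=9.0, ecog=1, grade3_heme_ae=False)
print(a1, d2(x1, a1, x2))  # C2 M1
```

Writing each rule as a function of the history makes explicit which tailoring variables it uses and what it returns for every possible patient.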
Two decision regime: Acute leukemia
• At baseline: Information x1, accrued information h1 = x1 ∈ H1
• Decision 1: Set of options A1 = {C1, C2}; rule 1: d1(h1): H1 → A1
• Between Decisions 1 and 2: Collect additional information x2, including responder status
• Accrued information h2 = (x1, chemotherapy at Decision 1, x2) ∈ H2
• Decision 2: Set of options A2 = {M1, M2, S1, S2}; rule 2:
  d2(h2): H2 → {M1, M2} (responder), d2(h2): H2 → {S1, S2} (nonresponder)
• Treatment regime: d = {d1(h1), d2(h2)} = (d1, d2)
In general
Treatment regime with K decision points:
• Baseline information x1 ∈ X1, intermediate information xk ∈ Xk between Decisions k − 1 and k, k = 2, . . . , K
• Set of treatment options Ak at Decision k, elements ak ∈ Ak
• Accrued information or history
  h1 = x1 ∈ H1
  hk = (x1, a1, . . . , xk−1, ak−1, xk) ∈ Hk, k = 2, . . . , K
• Decision rules d1(h1), d2(h2), . . . , dK(hK), dk : Hk → Ak
• Treatment regime
  d = {d1(h1), . . . , dK(hK)} = (d1, d2, . . . , dK)
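Operationally, a regime d = (d1, . . . , dK) is a list of functions applied in turn, with the history growing by (ak, xk+1) after each decision. The sketch below illustrates only that bookkeeping; the callback `next_info`, which stands in for observing the intermediate information, and the toy rules are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

History = Tuple[Any, ...]          # h_k = (x1, a1, ..., x_{k-1}, a_{k-1}, x_k)
Rule = Callable[[History], Any]    # d_k : H_k -> A_k


def follow_regime(rules: Sequence[Rule],
                  next_info: Callable[[History], Any],
                  x1: Any) -> Tuple[History, List[Any]]:
    """Apply d = (d_1, ..., d_K) to one individual.

    next_info is a hypothetical stand-in for observing x_{k+1}; it receives
    the history through the latest action, (x1, a1, ..., x_k, a_k).
    """
    h: History = (x1,)
    actions: List[Any] = []
    for k, d_k in enumerate(rules):
        a_k = d_k(h)                   # option recommended at this decision
        actions.append(a_k)
        h = h + (a_k,)
        if k < len(rules) - 1:         # no new information after Decision K
            h = h + (next_info(h),)
    return h, actions


# Toy usage with made-up numeric information and binary options
d1 = lambda h: int(h[0] > 0)           # uses h1 = (x1,)
d2 = lambda h: int(h[-1] > h[0])       # uses h2 = (x1, a1, x2)
next_info = lambda h: h[-2] + h[-1]    # hypothetical dynamics: x2 = x1 + a1
print(follow_regime([d1, d2], next_info, 1.0))  # ((1.0, 1, 2.0, 1), [1, 1])
```

Because each dk sees the full tuple hk, the same interface covers static rules (ignore h) and dynamic rules (use any part of h).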
Treatment regimes and precision medicine
Premise: There is an infinitude of possible regimes d
• D = class of all possible treatment regimes
• Given a health outcome of interest. . .
• Can we define an optimal treatment regime in D formalizing the clinician’s goal to make the “best” decisions to achieve the most beneficial expected outcome for a patient?
• Can we estimate an optimal treatment regime from data?
• And thereby inform clinical decision-making and make it evidence-based?
Result: An explosion of statistical methodological research on estimation of (optimal) treatment regimes in the past decade
Outline
Treatment Regimes and Precision Medicine
A Brief History
Optimal Single Decision Regimes
Optimal Multiple Decision Regimes
Treatment Regimes in Practice
Causal inference framework
Jamie Robins
Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393–1512 (and Addendum) – Framework for causal inference on effects of time-varying treatment
Causal inference framework
Robins, J. M. (1997). Causal inference from complex longitudinal data. In Berkane, M., editor, Latent Variable Modeling and Applications to Causality. Lecture Notes in Statistics (120), New York: Springer Verlag, 69–117 – Refines the causal inference framework
Robins, J. M., Hernán, M., and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550–560 – Modeling and estimation of the mean outcome if the patient population were to follow a static regime (inverse probability weighting)
Sequential treatment
Peter Thall
Thall, P., Millikan, R., and Sung, H. (2000). Evaluating multiple treatment courses in clinical trials. Statistics in Medicine, 19, 1011–1028 – Sequential treatments
Sequential treatment
Lavori, P. W. and Dawson, R. (2000). A design for testing clinical strategies: Biased adaptive within-subject randomization. JRSS-A, 163, 29–38 – Clinical trials for evaluating sequential treatments
Murphy, S. A., van der Laan, M. J., Robins, J. M., and CPPRG (2001). Marginal mean models for dynamic regimes. JASA, 96, 1410–1423 – Estimation of mean outcome for dynamic regimes
Lunceford, J., Davidian, M., and Tsiatis, A. A. (2002). Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials. Biometrics, 58, 48–57 – Estimation of mean survival outcome for simple dynamic regimes
Optimal regimes
Susan Murphy
Murphy, S. (2003). Optimal dynamic treatment regimes (with discussion). JRSS-B, 65, 331–366 – Definition and estimation of an optimal treatment regime from data
Optimal regimes
Robins, J. M. (2004). Optimal structural nested models for optimal sequential decisions. In Lin, D. Y. and Heagerty, P., editors, Proceedings of the Second Seattle Symposium on Biostatistics, 189–326, New York: Springer – Definition and estimation of an optimal treatment regime from data
Rosthøj, S., Fullwood, C., Henderson, R., and Stewart, S. (2006). Estimation of optimal dynamic anticoagulation regimes from observational data: A regret-based approach. Statistics in Medicine, 25, 4197–4215.
Moodie, E. E. M., Richardson, T. S., and Stephens, D. A. (2007). De-mystifying optimal dynamic treatment regimes. Biometrics, 63, 447–455.
Sequential multiple assignment randomized trials
Lavori, P. W. and Dawson, R. (2004). Dynamic treatment regimes: Practical design considerations. Clinical Trials, 1, 9–20.
Murphy, S. A. (2005). An experimental design for the development of adaptive treatment strategies. Statistics in Medicine, 24, 1455–1481 – Formal framework for SMARTs
Nahum-Shani, I., Qian, M., Almirall, D., Pelham, W., Gnagy, B., et al. (2012). Experimental design and primary data analysis methods for comparing adaptive interventions. Psychological Methods, 17, 457–477.
Estimation of optimal regimes, K = 1
Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012). Estimating individualized treatment rules using outcome weighted learning. JASA, 107, 1106–1118.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012). A robust method for estimating optimal treatment regimes. Biometrics, 68, 1010–1018.
Zhang, B., Tsiatis, A., Davidian, M., Zhang, M., and Laber, E. (2012). Estimating optimal treatment regimes from a classification perspective. Stat, 1, 103–114.
Estimation of optimal regimes, K ≥ 2
Orellana, L., Rotnitzky, A., and Robins, J. M. (2010a). Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, part I: Main content. The International Journal of Biostatistics, 6.
Nahum-Shani, I., Qian, M., Almirall, D., Pelham, W., Gnagy, B., et al. (2012). Q-learning: A data analysis method for constructing adaptive interventions. Psychological Methods, 17, 478–494.
Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013). Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika, 100, 681–694.
Zhao, Y., Zeng, D., Laber, E. B., and Kosorok, M. R. (2015). New statistical learning methods for estimating optimal dynamic treatment regimes. JASA, 110, 583–598.
Today
An enormous body of literature on estimation of optimal single- and multiple-decision regimes. . .
Outline
Treatment Regimes and Precision Medicine
A Brief History
Optimal Single Decision Regimes
Optimal Multiple Decision Regimes
Treatment Regimes in Practice
Statistical framework
For simplicity: Two treatment options, A1 = {0,1}
With K = 1: Baseline information x1 = h1 ∈ H1
• Treatment regime d ∈ D: d = {d1(h1)}, d1 : H1 → A1
• Example: Rules involving thresholds (with 0 = C1, 1 = C2)
d1(h1) = I(age < 50 and WBC < 10)
• Example: Rules involving linear combinations
d1(h1) = I{age + 8.7 log(WBC)− 60 > 0}
Convention: Larger outcomes are more beneficial
Statistical framework
For a randomly chosen individual from the population with history H1:
• Potential outcomes Y*(0) and Y*(1) that would be achieved under options 0 and 1
• Potential outcome if assigned treatment according to d ∈ D
$$Y^*(d) = Y^*(1)\, I\{d_1(H_1) = 1\} + Y^*(0)\, I\{d_1(H_1) = 0\}$$
• For general A1, a1 ∈ A1
$$Y^*(d) = \sum_{a_1 \in \mathcal{A}_1} Y^*(a_1)\, I\{d_1(H_1) = a_1\}$$
Value of a treatment regime
For regime d ∈ D: E{Y*(d)} is the expected outcome if all individuals in the population were to receive treatment in A1 according to rule d1 in d
• Referred to as the value of regime d
$$V(d) = E\{Y^*(d)\}$$
Definition of an optimal regime: dopt ∈ D is a regime satisfying
$$d^{opt} = \arg\max_{d \in \mathcal{D}} E\{Y^*(d)\} = \arg\max_{d \in \mathcal{D}} V(d)$$
i.e., V(dopt) = E{Y*(dopt)} ≥ E{Y*(d)} = V(d) for all d ∈ D
Optimal treatment regime
Characterization of an optimal regime:
$$V(d) = E\{Y^*(d)\} = E\big[E\{Y^*(d) \mid H_1\}\big] = E\big[E\{Y^*(1) \mid H_1\}\, I\{d_1(H_1) = 1\} + E\{Y^*(0) \mid H_1\}\, I\{d_1(H_1) = 0\}\big]$$
• Maximizing the inner expression at every h1 makes V(d) as large as possible; this is achieved if
  d1(h1) = 1 when E{Y*(1)|H1 = h1} > E{Y*(0)|H1 = h1}
  d1(h1) = 0 when E{Y*(1)|H1 = h1} ≤ E{Y*(0)|H1 = h1}
• Thus, dopt has rule
$$d_1^{opt}(h_1) = I\big[E\{Y^*(1) \mid H_1 = h_1\} > E\{Y^*(0) \mid H_1 = h_1\}\big]$$
Optimal treatment regime
In general:
$$d_1^{opt}(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} E\{Y^*(a_1) \mid H_1 = h_1\}$$
• Chooses the option in A1 with the maximum expected outcome given an individual’s history h1
• Thus, makes the best decision for such an individual given only knowledge of that history
• So formalizes the goal of clinical decision-making
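Under a fully specified generative model, both potential outcomes can be simulated, so V(d) can be computed for any rule and the argmax characterization checked numerically. The sketch assumes a hypothetical model, not taken from the talk: H1 ~ N(0, 1) and Y*(a1) = 1 + H1 + a1(0.5 − H1) + ε, so that E{Y*(1) − Y*(0) | H1 = h1} = 0.5 − h1 and dopt1(h1) = I(h1 < 0.5).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical model (assumption): H1 ~ N(0, 1), Y*(a1) = 1 + H1 + a1 (0.5 - H1) + noise
h1 = rng.normal(size=n)
noise = rng.normal(size=n)
y0 = 1 + h1 + noise                   # Y*(0)
y1 = 1 + h1 + (0.5 - h1) + noise      # Y*(1)


def value(rule):
    """Monte Carlo approximation of V(d) = E{Y*(d)} for d = {d1}."""
    d = rule(h1)
    return np.mean(np.where(d == 1, y1, y0))


# E{Y*(1) | H1 = h1} > E{Y*(0) | H1 = h1}  <=>  h1 < 0.5
regimes = {
    "always 0": lambda h: np.zeros_like(h, dtype=int),
    "always 1": lambda h: np.ones_like(h, dtype=int),
    "I(h1 < 0)": lambda h: (h < 0).astype(int),
    "optimal I(h1 < 0.5)": lambda h: (h < 0.5).astype(int),
}
values = {name: value(rule) for name, rule in regimes.items()}
for name, v in values.items():
    print(f"{name:>20s}: V(d) = {v:.3f}")
```

Under this model V(dopt) = 1 + 0.5Φ(0.5) + φ(0.5) ≈ 1.698, while the two static regimes have values 1.0 and 1.5. With real data only Y = Y*(A1) is observed for each individual, which is why identifiability assumptions are needed.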
Estimating dopt
Observed data: (X1i, A1i, Yi), i = 1, . . . , n
• H1i = X1i = history for individual i
• A1i = option in A1 actually received by i
• Yi = observed outcome for i
• From a clinical trial or observational study (real-world data, RWD)
Goal: Estimate dopt based on these data
$$d_1^{opt}(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} E\{Y^*(a_1) \mid H_1 = h_1\}$$
• dopt is defined in terms of potential outcomes
• Must be able to express the definition of dopt in terms of the observed data
Identifiability assumptions
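SUTVA (consistency):
$$Y = Y^*(A_1) = \sum_{a_1 \in \mathcal{A}_1} Y^*(a_1)\, I(A_1 = a_1)$$
No unmeasured confounders (NUC): {Y*(1), Y*(0)} ⊥⊥ A1 | H1
Positivity: P(A1 = a1 | H1 = h1) > 0, a1 ∈ A1, for all h1 ∈ H1 with P(H1 = h1) > 0
Under these assumptions: For a1 ∈ A1
$$E\{Y^*(a_1) \mid H_1\} = E\{Y^*(a_1) \mid H_1, A_1 = a_1\} = E(Y \mid H_1, A_1 = a_1)$$
(the first equality by NUC, the second by SUTVA; positivity ensures the conditioning event has positive probability), so that
$$d_1^{opt}(h_1) = I\{E(Y \mid H_1 = h_1, A_1 = 1) > E(Y \mid H_1 = h_1, A_1 = 0)\}$$
• Regression E(Y | H1 = h1, A1 = a1) = Q1(h1, a1)
$$d_1^{opt}(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} Q_1(h_1, a_1)$$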
Estimation of an optimal regime
Q-learning: Posit and fit a regression model Q1(h1, a1; β1), giving the estimator β̂1
• Substitution estimator
$$\hat d^{opt}_{Q,1}(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} Q_1(h_1, a_1; \hat\beta_1), \qquad \hat d^{opt}_Q = \{\hat d^{opt}_{Q,1}(h_1)\}$$
• The form of the model Q1(h1, a1; β1) induces a class of regimes indexed by β1 to which the search is restricted
• If Q1(h1, a1; β1) is misspecified, d̂optQ could be far from dopt
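A minimal Q-learning sketch for K = 1 follows. It assumes simulated randomized data from a hypothetical model with treatment contrast 0.5 − h1 and the posited linear model Q1(h1, a1; β1) = β10 + β11h1 + a1(β12 + β13h1); NumPy least squares stands in for whatever regression method one would actually use.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
# Hypothetical randomized data (assumption): P(A1 = 1) = 0.5, contrast 0.5 - h1
h1 = rng.normal(size=n)
a1 = rng.binomial(1, 0.5, size=n)
y = 1 + h1 + a1 * (0.5 - h1) + rng.normal(size=n)


def design(h, a):
    """Columns for Q1(h1, a1; beta) = b0 + b1 h1 + a1 (b2 + b3 h1)."""
    return np.column_stack([np.ones_like(h), h, a, a * h])


beta_hat, *_ = np.linalg.lstsq(design(h1, a1), y, rcond=None)


def q1(h, a):
    """Fitted Q1(h, a; beta_hat) for a scalar option a."""
    return design(h, np.full_like(h, a, dtype=float)) @ beta_hat


def d_opt_Q(h):
    """Substitution estimator: argmax over a1 in {0, 1} of the fitted Q1."""
    return (q1(h, 1) > q1(h, 0)).astype(int)


grid = np.linspace(-3, 3, 601)
agreement = np.mean(d_opt_Q(grid) == (grid < 0.5))
print("beta_hat:", np.round(beta_hat, 2), " agreement with I(h1 < 0.5):", agreement)
```

Because the posited model is correctly specified here, the induced threshold rule recovers I(h1 < 0.5) up to sampling error; if the true contrast were not linear in h1, the same linear model could produce a rule far from dopt.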
Estimation of an optimal regime
Alternative approach: Deliberately restrict to a class Dη ⊂ D of regimes dη with rules of the form d1(h1; η1)
• Chosen for interpretability, cost, etc.; e.g.,
  d1(h1; η1) = I(x11 < η11, x12 < η12)
• Optimal restricted regime: doptη ∈ Dη with rule d1(h1; ηopt1), where
$$\eta_1^{opt} = \arg\max_{\eta_1} V(d_\eta), \qquad d_\eta^{opt} = \{d_1(h_1; \eta_1^{opt})\}$$
Policy search estimation: Estimate V(dη) by an estimator V̂(dη) for fixed η1
• Maximize V̂(dη) in η1 to obtain
$$\hat\eta_1^{opt} = \arg\max_{\eta_1} \hat V(d_\eta), \qquad \hat d_\eta^{opt} = \{d_1(h_1; \hat\eta_1^{opt})\}$$
Estimation of an optimal regime
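Define: π1(h1, a1) = P(A1 = a1 | H1 = h1)
Can show: For any d, under SUTVA, NUC, and positivity
$$V(d) = E\{Y^*(d)\} = E\left[\frac{I\{A_1 = d_1(H_1)\}\, Y}{P\{A_1 = d_1(H_1) \mid H_1\}}\right] = E\left[\frac{I\{A_1 = d_1(H_1)\}\, Y}{\pi_1(H_1, A_1)}\right]$$
The last step uses π1(H1, A1) = P{A1 = d1(H1) | H1} whenever A1 = d1(H1).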
Estimation of an optimal regime
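Suggests: With a model π1(h1, a1; γ1), fitted to give γ̂1, and writing πd,1(H1; η1, γ̂1) = π1{H1, d1(H1; η1); γ̂1} for the probability of receiving the option the rule recommends
• Inverse probability weighted estimator (IPW)
$$\hat V_{IPW}(d_\eta) = n^{-1}\sum_{i=1}^n \frac{I\{A_{1i} = d_1(H_{1i}; \eta_1)\}\, Y_i}{\pi_{d,1}(H_{1i}; \eta_1, \hat\gamma_1)}$$
• Augmented inverse probability weighted estimator (AIPW)
$$\hat V_{AIPW}(d_\eta) = n^{-1}\sum_{i=1}^n \left[\frac{I\{A_{1i} = d_1(H_{1i}; \eta_1)\}\, Y_i}{\pi_{d,1}(H_{1i}; \eta_1, \hat\gamma_1)} - \frac{I\{A_{1i} = d_1(H_{1i}; \eta_1)\} - \pi_{d,1}(H_{1i}; \eta_1, \hat\gamma_1)}{\pi_{d,1}(H_{1i}; \eta_1, \hat\gamma_1)}\, Q_{d_\eta,1}(H_{1i}; \eta_1, \hat\beta_1)\right]$$
$$Q_{d_\eta,1}(H_1; \eta_1, \hat\beta_1) = Q_1\{H_1, d_1(H_1; \eta_1); \hat\beta_1\}$$
• The AIPW estimator is doubly robust (consistent if either π1(h1, a1; γ1) or Q1(h1, a1; β1) is correctly specified) and can be considerably more efficient than IPW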
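The sketch below computes both estimators for the threshold class d1(h1; η1) = I(h1 < η1) and then performs policy search by brute-force grid search over η1, a simple stand-in for the nonsmooth optimization. Assumptions (all hypothetical): simulated data with confounded assignment P(A1 = 1 | H1) = expit(0.8 H1), a logistic propensity model fit by Newton-Raphson, a linear outcome model, and the same contrast 0.5 − h1, so that ηopt1 = 0.5.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
h = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-0.8 * h))      # hypothetical P(A1 = 1 | H1): confounded
a = rng.binomial(1, p_true)
y = 1 + h + a * (0.5 - h) + rng.normal(size=n)

# Propensity model pi1(h1, 1; gamma) = expit(g0 + g1 h1), fit by Newton-Raphson
Xg = np.column_stack([np.ones(n), h])
gamma = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-Xg @ gamma))
    grad = Xg.T @ (a - p)
    hess = (Xg * (p * (1 - p))[:, None]).T @ Xg
    gamma += np.linalg.solve(hess, grad)
pi1 = 1 / (1 + np.exp(-Xg @ gamma))      # fitted P(A1 = 1 | H1)


def design(hh, aa):
    """Columns for Q1(h1, a1; beta) = b0 + b1 h1 + a1 (b2 + b3 h1)."""
    return np.column_stack([np.ones_like(hh), hh, aa, aa * hh])


beta, *_ = np.linalg.lstsq(design(h, a), y, rcond=None)


def v_ipw_aipw(eta):
    """IPW and AIPW value estimates for d1(h1; eta) = I(h1 < eta)."""
    d = (h < eta).astype(int)
    c = (a == d).astype(float)              # I{A1 = d1(H1; eta)}
    pi_d = np.where(d == 1, pi1, 1 - pi1)   # pi1{H1, d1(H1; eta)}
    q_d = design(h, d) @ beta               # Q1{H1, d1(H1; eta); beta_hat}
    ipw = np.mean(c * y / pi_d)
    aipw = np.mean(c * y / pi_d - (c - pi_d) / pi_d * q_d)
    return ipw, aipw


# Policy search: maximize each estimated value over a grid of eta
etas = np.linspace(-2, 2, 81)
est = np.array([v_ipw_aipw(e) for e in etas])
eta_ipw = etas[np.argmax(est[:, 0])]
eta_aipw = etas[np.argmax(est[:, 1])]
print(f"eta_hat (IPW) = {eta_ipw:.2f}, eta_hat (AIPW) = {eta_aipw:.2f}; true eta_opt = 0.5")
```

The estimated value curve is a step function of η1, which is why a grid (or a classification reformulation) is used instead of gradient-based optimization.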
Classification analogy
Challenge: Maximization of V̂IPW(dη) or V̂AIPW(dη) in η1 is a nonsmooth optimization problem
Can show: With two options
• This maximization can be recast as minimization of a weighted classification error
• Can view the rule d1(h1; η1) as a classifier
• Can exploit existing algorithms for classification problems (for nonsmooth optimization)
• CART, SVM, etc.
Classification analogy
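Algebra: Can write
$$\hat V_{AIPW}(d_\eta) = n^{-1}\sum_{i=1}^n \big[\psi_1(H_{1i}, A_{1i}, Y_i)\, I\{d_1(H_{1i}; \eta_1) = 1\} + \psi_0(H_{1i}, A_{1i}, Y_i)\, I\{d_1(H_{1i}; \eta_1) = 0\}\big]$$
$$\psi_{a_1}(H_1, A_1, Y) = \frac{I(A_1 = a_1)\, Y}{\pi_1(H_1, a_1)} - \frac{I(A_1 = a_1) - \pi_1(H_1, a_1)}{\pi_1(H_1, a_1)}\, Q_1(H_1, a_1)$$
• Maximizing V̂AIPW(dη) is equivalent to minimizing in η1
$$n^{-1}\sum_{i=1}^n |C_1(H_{1i}, A_{1i}, Y_i)|\, I\big[I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} \neq d_1(H_{1i}; \eta_1)\big], \qquad C_1(H_{1i}, A_{1i}, Y_i) = \psi_1(H_{1i}, A_{1i}, Y_i) - \psi_0(H_{1i}, A_{1i}, Y_i)$$
  because C1 I{d1 = 1} = max(C1, 0) − |C1| I[I(C1 > 0) ≠ d1], and the remaining terms do not depend on η1
• In the form of a weighted classification error
• Similarly for V̂IPW(dη)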
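The equivalence can be checked numerically: for any rule, V̂AIPW(d) equals a term that does not depend on the rule minus the weighted classification error. The sketch assumes a hypothetical randomized design with π1 = 0.5 and, for brevity, plugs in the true Q1 of a hypothetical model instead of a fitted one.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000
# Hypothetical randomized data (assumption): P(A1 = 1 | H1) = P(A1 = 0 | H1) = 0.5
h = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n)
y = 1 + h + a * (0.5 - h) + rng.normal(size=n)
PI = 0.5


def q1(hh, a1):
    """True Q1 of the hypothetical model, plugged in for brevity instead of a fit."""
    return 1 + hh + a1 * (0.5 - hh)


def psi(a1):
    """psi_{a1}(H1, A1, Y) with pi1(H1, a1) = PI."""
    ind = (a == a1).astype(float)
    return ind * y / PI - (ind - PI) / PI * q1(h, a1)


psi1, psi0 = psi(1), psi(0)
C = psi1 - psi0                          # contrast C1(H1, A1, Y)


def v_aipw(d):
    return np.mean(psi1 * d + psi0 * (1 - d))


def weighted_error(d):
    """Weight |C1|, label I(C1 > 0), classifier output d."""
    return np.mean(np.abs(C) * ((C > 0).astype(int) != d))


# C1 * d = max(C1, 0) - |C1| * I[I(C1 > 0) != d], hence
# V_AIPW(d) = mean(psi0) + mean(max(C1, 0)) - weighted_error(d)
const = np.mean(psi0) + np.mean(np.maximum(C, 0))
for eta in (-1.0, 0.0, 0.5, 1.0):
    d = (h < eta).astype(int)
    print(f"eta = {eta:+.1f}: V_AIPW = {v_aipw(d):.4f}, const - error = {const - weighted_error(d):.4f}")
```

The two printed columns agree for every η1 up to floating-point rounding, so any classifier that lowers the weighted error raises the estimated value by the same amount.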
Classification analogy
Decision function f1: d1(h1; η1) = I{f1(h1; η1) > 0}
• Equivalently, with ℓ0-1(x) = I(x ≤ 0), minimize
$$n^{-1}\sum_{i=1}^n |C_1(H_{1i}, A_{1i}, Y_i)|\; \ell_{0\text{-}1}\Big(\big[2I\{C_1(H_{1i}, A_{1i}, Y_i) > 0\} - 1\big]\, f_1(H_{1i}; \eta_1)\Big)$$
• For SVM, replace the nonconvex ℓ0-1(x) by the convex surrogate ℓhinge(x) = (1 − x)+, x+ = max(0, x), and impose a penalty for overfitting of f1
• Outcome Weighted Learning (OWL)
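A bare-bones sketch of the hinge-loss version with a linear decision function f1(h1) = w h1 + b follows, minimized by plain subgradient descent rather than an SVM solver. It assumes a hypothetical randomized example in which the optimal rule is I(h1 < 0), an AIPW contrast C1 built with the true Q1, and an arbitrary ridge penalty λ = 0.01 on the slope. It illustrates the weighted-classification form only; it is not a reference implementation of OWL.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4_000
# Hypothetical randomized example (assumption): P(A1 = 1) = 0.5 and
# E(Y | H1, A1) = 1 + H1 - A1 * H1, so the optimal rule is I(h1 < 0)
h = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n)
y = 1 + h - a * h + rng.normal(size=n)


def q1(hh, a1):
    """True Q1 of the hypothetical model, used in place of a fitted model."""
    return 1 + hh - a1 * hh


ind1 = (a == 1).astype(float)
psi1 = ind1 * y / 0.5 - (ind1 - 0.5) / 0.5 * q1(h, 1)
psi0 = (1 - ind1) * y / 0.5 - ((1 - ind1) - 0.5) / 0.5 * q1(h, 0)
C = psi1 - psi0
wgt = np.abs(C)                        # weights |C1|
z = np.where(C > 0, 1.0, -1.0)         # labels 2 I(C1 > 0) - 1
lam = 0.01                             # hypothetical penalty on the slope


def objective(w, b):
    """Weighted hinge loss of f1(h1) = w h1 + b plus lam * w^2."""
    return np.mean(wgt * np.maximum(0.0, 1 - z * (w * h + b))) + lam * w ** 2


# Subgradient descent on the convex, nonsmooth objective; keep the best iterate
w, b = 0.0, 0.0
best = (objective(w, b), w, b)
for k in range(1, 3001):
    active = (1 - z * (w * h + b)) > 0   # points where the hinge is not flat
    g_w = -np.mean(wgt * z * h * active) + 2 * lam * w
    g_b = -np.mean(wgt * z * active)
    step = 0.5 / np.sqrt(k)
    w, b = w - step * g_w, b - step * g_b
    obj = objective(w, b)
    if obj < best[0]:
        best = (obj, w, b)

_, w_hat, b_hat = best
# d1(h1) = I{f1(h1) > 0}; with w_hat < 0 this is I(h1 < -b_hat / w_hat)
print(f"f1(h1) = {w_hat:.3f} h1 + {b_hat:.3f}; treat when h1 < {-b_hat / w_hat:.3f}")
```

The weights |C1| make individuals with a large estimated treatment contrast count more, and replacing the 0-1 loss by the hinge loss turns the problem into a convex one that standard solvers handle.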
Interpretability versus flexibility
Classification approach:
• Flexible, highly parameterized decision rules
• Pro: Can synthesize high-dimensional patient information and get close to the true dopt ∈ D
• Con: “Black box,” little scientific insight
Opposing view: Parsimony and interpretability
• Focus on Dη with understandable rules
• Pro: Accessibility, scientific insight
• Con: The optimal restricted regime doptη may not get close to the true dopt ∈ D
Outline
Treatment Regimes and Precision Medicine
A Brief History
Optimal Single Decision Regimes
Optimal Multiple Decision Regimes
Treatment Regimes in Practice
K decision treatment regime
• Baseline information x1 ∈ X1, intermediate information xk ∈ Xk between Decisions k − 1 and k, k = 2, . . . , K
• Set of treatment options Ak at Decision k, elements ak ∈ Ak
• Accrued information or history
$$h_1 = x_1 \in \mathcal{H}_1, \qquad h_k = (x_1, a_1, \ldots, x_{k-1}, a_{k-1}, x_k) = (\bar x_k, \bar a_{k-1}) \in \mathcal{H}_k, \quad k = 2, \ldots, K$$
  where x̄k = (x1, . . . , xk), and āk similarly, k = 1, . . . , K
• Decision rules d1(h1), d2(h2), . . . , dK(hK), dk : Hk → Ak
• Treatment regime
  d = {d1(h1), . . . , dK(hK)} = (d1, d2, . . . , dK)
• Write d̄k = (d1, . . . , dk), k = 1, . . . , K
Statistical framework
For a randomly chosen individual from the population with history H1 = X1:
• Potential outcomes if the individual were to receive āK = (a1, . . . , aK)
$$\{X_2^*(a_1), X_3^*(\bar a_2), \ldots, X_K^*(\bar a_{K-1}), Y^*(\bar a_K)\}$$
• All possible potential outcomes
$$W^* = \{X_2^*(a_1), X_3^*(\bar a_2), \ldots, X_K^*(\bar a_{K-1}), Y^*(\bar a_K), \text{ for } a_1 \in \mathcal{A}_1, a_2 \in \mathcal{A}_2, \ldots, a_{K-1} \in \mathcal{A}_{K-1}, a_K \in \mathcal{A}_K\}$$
• For regime d ∈ D, can define, in terms of H1 and W*, the potential outcomes under d
$$\{X_2^*(d_1), X_3^*(\bar d_2), \ldots, X_K^*(\bar d_{K-1}), Y^*(d)\}$$
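Potential outcomes for regime d ∈ D
Formally:
• Define
$$\bar X_k^*(\bar a_{k-1}) = \{X_1, X_2^*(a_1), X_3^*(\bar a_2), \ldots, X_k^*(\bar a_{k-1})\}, \quad k = 2, \ldots, K$$
• Then
$$X_2^*(d_1) = \sum_{a_1 \in \mathcal{A}_1} X_2^*(a_1)\, I\{d_1(X_1) = a_1\}$$
$$X_k^*(\bar d_{k-1}) = \sum_{\bar a_{k-1} \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_{k-1}} X_k^*(\bar a_{k-1}) \prod_{j=1}^{k-1} I\big[d_j\{\bar X_j^*(\bar a_{j-1}), \bar a_{j-1}\} = a_j\big], \quad k = 3, \ldots, K$$
$$Y^*(d) = \sum_{\bar a_K \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_K} Y^*(\bar a_K) \prod_{j=1}^{K} I\big[d_j\{\bar X_j^*(\bar a_{j-1}), \bar a_{j-1}\} = a_j\big]$$
  with the convention that X̄*1(ā0) = X1 and ā0 is empty
• Also define
$$\bar X_k^*(\bar d_{k-1}) = \{X_1, X_2^*(d_1), X_3^*(\bar d_2), \ldots, X_k^*(\bar d_{k-1})\}, \quad k = 2, \ldots, K$$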
Value of a K -decision treatment regime
For regime d ∈ D: E{Y*(d)} is the expected outcome if all individuals in the population were to receive treatment in A1, . . . , AK according to the rules in d
• The value of regime d
$$V(d) = E\{Y^*(d)\}$$
Definition of an optimal regime: dopt ∈ D is a regime satisfying
$$d^{opt} = \arg\max_{d \in \mathcal{D}} E\{Y^*(d)\} = \arg\max_{d \in \mathcal{D}} V(d)$$
i.e., V(dopt) = E{Y*(dopt)} ≥ E{Y*(d)} = V(d) for all d ∈ D
Characterization of an optimal regime
Sketch for K = 2, using backward induction
• For a randomly chosen individual
At Decision 2:
• If she started with H1 = X1 = x1 and received a1 ∈ A1 at Decision 1, she will already have achieved X*2(a1)
• With a1 and X̄*2(a1) = {X1, X*2(a1)} = x̄2 already determined, the optimal decision at Decision 2 is to choose the a2 ∈ A2 resulting in the largest expected outcome given that she is already at this point
$$V_2(h_2) = V_2(\bar x_2, a_1) = \max_{a_2 \in \mathcal{A}_2} E\{Y^*(a_1, a_2) \mid \bar X_2^*(a_1) = \bar x_2\}$$
• Optimal rule at Decision 2
$$d_2^{opt}(h_2) = d_2^{opt}(\bar x_2, a_1) = \arg\max_{a_2 \in \mathcal{A}_2} E\{Y^*(a_1, a_2) \mid \bar X_2^*(a_1) = \bar x_2\}$$
Characterization of an optimal regime
At Decision 1: If she starts with H1 = X1 = x1, choose a1 ∈ A1 to maximize the expected outcome given X1 = x1, taking into account that she will receive treatment at Decision 2 by following dopt2 in the future
• If a1 ∈ A1 is selected now, she will arrive at Decision 2 with X̄*2(a1) = {x1, X*2(a1)} and, with treatment selected by dopt2, will have expected outcome
$$V_2\{x_1, X_2^*(a_1), a_1\} = \max_{a_2 \in \mathcal{A}_2} E\{Y^*(a_1, a_2) \mid X_2^*(a_1), X_1 = x_1\}$$
• Optimal rule at Decision 1
$$d_1^{opt}(h_1) = d_1^{opt}(x_1) = \arg\max_{a_1 \in \mathcal{A}_1} E\big[V_2\{x_1, X_2^*(a_1), a_1\} \mid X_1 = x_1\big]$$
  selects the a1 ∈ A1 that maximizes the maximum expected outcome that would result from choosing treatment optimally at Decision 2
Characterization of an optimal regime
Can be shown: dopt = (dopt1, dopt2) defined this way satisfies the definition of an optimal regime
• And the reasoning extends to general K
Estimating dopt
Observed data: For i = 1, . . . , n, i.i.d.
$$(X_{1i}, A_{1i}, X_{2i}, A_{2i}, \ldots, X_{Ki}, A_{Ki}, Y_i) = (\bar X_{Ki}, \bar A_{Ki}, Y_i) = (\bar X_i, \bar A_i, Y_i)$$
• X1 = baseline information at Decision 1, taking values in X1
• Ak = treatment option actually received at Decision k, k = 1, . . . , K, taking values in Ak
• Xk = intervening information between Decisions k − 1 and k, k = 2, . . . , K, taking values in Xk
• History H1 = X1, Hk = (X1, A1, . . . , Xk−1, Ak−1, Xk) = (X̄k, Āk−1), k = 2, . . . , K
• Y = observed outcome (ascertained after Decision K, or a function of HK)
• Data sources discussed shortly
Goal: Estimate dopt based on these data
• Must be able to express dopt in terms of the observed data
Identifiability assumptions
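SUTVA (consistency):
$$Y = Y^*(\bar A_K) = \sum_{\bar a_K \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_K} Y^*(\bar a_K)\, I(\bar A_K = \bar a_K)$$
$$X_k = X_k^*(\bar A_{k-1}) = \sum_{\bar a_{k-1} \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_{k-1}} X_k^*(\bar a_{k-1})\, I(\bar A_{k-1} = \bar a_{k-1}), \quad k = 2, \ldots, K$$
Sequential randomization assumption (SRA): Robins (1986)
$$W^* \perp\!\!\!\perp A_k \mid H_k, \quad k = 1, \ldots, K$$
Positivity: P(Ak = ak | Hk = hk) > 0, ak ∈ Ak, for all hk ∈ Hk with P(Hk = hk) > 0, k = 1, . . . , K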
Identifiability assumptions
Under these assumptions: It is possible to identify the distribution of
$$\{X_1, X_2^*(d_1), X_3^*(\bar d_2), \ldots, X_K^*(\bar d_{K-1}), Y^*(d)\},$$
which depends on that of (X1, W*), from the distribution of the observed data
$$(X_1, A_1, X_2, A_2, \ldots, X_K, A_K, Y)$$
• Robins (1986) g-computation algorithm
Characterization in terms of observed data
For K = 2 (generalizes to arbitrary K):
Decision 2: With Q2(h2, a2) = E(Y | H2 = h2, A2 = a2)
$$d_2^{opt}(h_2) = \arg\max_{a_2 \in \mathcal{A}_2} Q_2(h_2, a_2), \qquad V_2(h_2) = \max_{a_2 \in \mathcal{A}_2} Q_2(h_2, a_2)$$
Decision 1: Define
$$Q_1(h_1, a_1) = Q_1(x_1, a_1) = E\{V_2(\bar X_2, A_1) \mid X_1 = x_1, A_1 = a_1\}$$
$$d_1^{opt}(h_1) = \arg\max_{a_1 \in \mathcal{A}_1} Q_1(h_1, a_1)$$
• Qk(hk, ak), k = 1, 2, are the Q-functions
Suggests: Positing and fitting models for the Q-functions
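The backward recursion can be verified on a small discrete example: compute dopt2 and V2 first, then Q1 and dopt1, and compare the resulting value with a brute-force search over every deterministic regime. The model (binary X1, X2, A1, A2 with made-up probabilities and means) is hypothetical, and V(d) is evaluated with the g-computation formula, i.e., treating these conditional distributions as identifying the potential-outcome distributions under SUTVA, SRA, and positivity.

```python
from itertools import product

# Hypothetical discrete model (assumption for illustration):
# X1 ~ Bernoulli(0.5); X2 | X1, A1 ~ Bernoulli(p2); E(Y | X1, A1, X2, A2) = mu
P_X1 = {0: 0.5, 1: 0.5}
A = (0, 1)


def p2(x1, a1):
    """P(X2 = 1 | X1 = x1, A1 = a1)."""
    return 0.3 + 0.4 * a1 - 0.2 * x1


def p_x2(x2, x1, a1):
    return p2(x1, a1) if x2 == 1 else 1 - p2(x1, a1)


def mu(x1, a1, x2, a2):
    return x1 + x2 + a2 * (2 * x2 - 1) + a1 * (0.5 - x1)


# Backward induction: Decision 2 first, then Decision 1
H2 = list(product((0, 1), A, (0, 1)))     # histories h2 = (x1, a1, x2)
d2_opt = {h2: max(A, key=lambda a2: mu(*h2, a2)) for h2 in H2}
V2 = {h2: mu(*h2, d2_opt[h2]) for h2 in H2}


def Q1(x1, a1):
    """E{V2(X1, A1, X2) | X1 = x1, A1 = a1}."""
    return sum(p_x2(x2, x1, a1) * V2[(x1, a1, x2)] for x2 in (0, 1))


d1_opt = {x1: max(A, key=lambda a1: Q1(x1, a1)) for x1 in (0, 1)}


def value(d1, d2):
    """V(d) by the g-computation formula for d = (d1, d2) given as dicts."""
    return sum(
        P_X1[x1] * p_x2(x2, x1, d1[x1]) * mu(x1, d1[x1], x2, d2[(x1, d1[x1], x2)])
        for x1 in (0, 1) for x2 in (0, 1)
    )


v_backward = value(d1_opt, d2_opt)

# Brute force over all 4 x 256 deterministic regimes
v_brute = max(
    value(dict(zip((0, 1), r1)), dict(zip(H2, r2)))
    for r1 in product(A, repeat=2)
    for r2 in product(A, repeat=len(H2))
)
print(f"backward induction: {v_backward:.4f}, brute force: {v_brute:.4f}, d1_opt: {d1_opt}")
```

Here backward induction finds the maximum value (1.7) after evaluating only a handful of Q-function values, while brute force has to evaluate 1,024 regimes; with continuous histories or larger K, enumeration is not an option at all.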
Q-learning
Estimation of dopt:
• Decision 2: Posit and fit a model Q2(h2, a2; β2) by regressing Y on H2, A2 (e.g., least squares) to obtain β̂2, and estimate
$$\hat d^{opt}_{Q,2}(h_2) = d^{opt}_{Q,2}(h_2; \hat\beta_2) = I\{Q_2(h_2, 1; \hat\beta_2) > Q_2(h_2, 0; \hat\beta_2)\}$$
• For each i, form the “pseudo outcome”
$$\tilde V_{2i} = V_2(H_{2i}; \hat\beta_2) = \max\{Q_2(H_{2i}, 0; \hat\beta_2), Q_2(H_{2i}, 1; \hat\beta_2)\}$$
• Decision 1: Posit and fit a model Q1(h1, a1; β1) by regressing Ṽ2 on H1, A1 (e.g., least squares) to obtain β̂1, and estimate
$$\hat d^{opt}_{Q,1}(h_1) = d^{opt}_{Q,1}(h_1; \hat\beta_1) = I\{Q_1(h_1, 1; \hat\beta_1) > Q_1(h_1, 0; \hat\beta_1)\}$$
• Estimated regime d̂optQ = (d̂optQ,1, d̂optQ,2)
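The two-step procedure can be sketched with linear models and least squares. Assumptions (hypothetical): simulated data with both decisions randomized with probability 0.5, X2 = X1 + A1(0.5 − X1) + noise, and Y = 1 + X1 + X2 − A2X2 + noise, so that dopt2(h2) = I(x2 < 0) and, because E{max(X2, 0)} increases with the mean of X2, dopt1(h1) = I(x1 < 0.5).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
# Hypothetical two-decision model (assumption); both decisions randomized with probability 0.5
x1 = rng.normal(size=n)
a1 = rng.binomial(1, 0.5, size=n)
x2 = x1 + a1 * (0.5 - x1) + rng.normal(size=n)
a2 = rng.binomial(1, 0.5, size=n)
y = 1 + x1 + x2 - a2 * x2 + rng.normal(size=n)   # true d2_opt(h2) = I(x2 < 0)


def ols(X, target):
    """Least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


# Decision 2: Q2(h2, a2; beta2) = b0 + b1 x1 + b2 a1 + b3 x2 + a2 (b4 + b5 x2)
def design2(a2v):
    return np.column_stack([np.ones(n), x1, a1, x2, a2v, a2v * x2])


beta2 = ols(design2(a2), y)
q2_0, q2_1 = design2(np.zeros(n)) @ beta2, design2(np.ones(n)) @ beta2
v2_tilde = np.maximum(q2_0, q2_1)                # pseudo outcome


# Decision 1: Q1(h1, a1; beta1) = c0 + c1 x1 + a1 (c2 + c3 x1), fit to the pseudo outcome
def design1(a1v):
    return np.column_stack([np.ones(n), x1, a1v, a1v * x1])


beta1 = ols(design1(a1), v2_tilde)

# Both fitted contrasts are linear, so each estimated rule is a threshold rule
thr2 = -beta2[4] / beta2[5]    # d2_hat(h2) = I(x2 < thr2) when beta2[5] < 0
thr1 = -beta1[2] / beta1[3]    # d1_hat(h1) = I(x1 < thr1) when beta1[3] < 0
print(f"Decision 2 threshold {thr2:.2f} (true 0); Decision 1 threshold {thr1:.2f} (true 0.5)")
```

The Decision 2 model is correctly specified, but the true Q1 is nonlinear in x1 because the pseudo outcome contains max(x2, 0). By our calculation for this model, the linear Decision 1 fit settles near a threshold of about 0.27 rather than 0.5 in large samples, a small instance of how misspecified Q-function models distort the estimated regime.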
Restricted class of regimes
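Q-learning: The Q-function models for k = K − 1, . . . , 1 are almost certainly misspecified, because the pseudo outcome involves a maximum and its conditional mean is therefore typically a complicated, nonlinear function of the history
Alternative approach: Deliberately restrict to a class Dη ⊂ D comprising regimes
$$d_\eta = \{d_1(h_1; \eta_1), \ldots, d_K(h_K; \eta_K)\}, \qquad \eta = (\eta_1^T, \ldots, \eta_K^T)^T$$
• Chosen for interpretability, cost, etc.
• Optimal restricted regime: doptη ∈ Dη satisfies
$$d_\eta^{opt} = \{d_1(h_1; \eta_1^{opt}), \ldots, d_K(h_K; \eta_K^{opt})\}, \qquad \eta^{opt} = (\eta_1^{opt\,T}, \ldots, \eta_K^{opt\,T})^T = \arg\max_{\eta} V(d_\eta)$$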
Restricted class of regimes
Policy search estimation: Estimate V(dη) by V̂(dη) for fixed η
• Maximize V̂(dη) in η to obtain
$$\hat\eta^{opt} = (\hat\eta_1^{opt\,T}, \ldots, \hat\eta_K^{opt\,T})^T = \arg\max_{\eta} \hat V(d_\eta), \qquad \hat d_\eta^{opt} = \{d_1(h_1; \hat\eta_1^{opt}), \ldots, d_K(h_K; \hat\eta_K^{opt})\}$$
Estimation of an optimal regime
IPW estimator: Extension of the single decision case
$$\hat V_{IPW}(d_\eta) = n^{-1}\sum_{i=1}^n \frac{\prod_{k=1}^K I\{A_{ki} = d_k(H_{ki}; \eta_k)\}\, Y_i}{\prod_{k=1}^K \pi_k(H_{ki}, A_{ki}; \hat\gamma_k)}$$
• πk(hk, ak) = P(Ak = ak | Hk = hk); model as πk(hk, ak; γk)
• An AIPW estimator is possible; it is doubly robust and can be considerably more efficient
• Valid under SUTVA, SRA, and positivity
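For a fixed two-decision regime, the IPW estimator only needs to check whether each individual's observed treatments agree with the regime at every decision. The sketch assumes simulated SMART-like data with πk = 0.5 known by design and a hypothetical two-decision model; because this is a simulation, the estimate can be compared with the value obtained by forcing everyone to follow the regime.

```python
import numpy as np

rng = np.random.default_rng(7)


def simulate(n, rule1=None, rule2=None):
    """Hypothetical two-decision model. A rule of None means randomize with
    probability 0.5 (SMART-like); otherwise the given rule assigns treatment."""
    x1 = rng.normal(size=n)
    a1 = rng.binomial(1, 0.5, size=n) if rule1 is None else rule1(x1)
    x2 = x1 + a1 * (0.5 - x1) + rng.normal(size=n)
    a2 = rng.binomial(1, 0.5, size=n) if rule2 is None else rule2(x1, a1, x2)
    y = 1 + x1 + x2 - a2 * x2 + rng.normal(size=n)
    return x1, a1, x2, a2, y


def d1(x1):
    return (x1 < 0.5).astype(int)


def d2(x1, a1, x2):
    return (x2 < 0).astype(int)


# Observed data: pi_1 = pi_2 = 0.5 are known by design
x1, a1, x2, a2, y = simulate(40_000)
follows_d = (a1 == d1(x1)) & (a2 == d2(x1, a1, x2))   # prod_k I{A_k = d_k(H_k)}
v_ipw = np.mean(follows_d * y / (0.5 * 0.5))

# Simulation-only check: everyone follows d
v_true = np.mean(simulate(400_000, d1, d2)[-1])
print(f"IPW estimate {v_ipw:.3f}; value from forcing d {v_true:.3f}")
```

Only about a quarter of the individuals follow the regime at both decisions, and each of them receives weight 4; this loss of information as K grows is one motivation for AIPW estimators.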
Estimation of an optimal regime
High-dimensional η: Direct maximization in η is infeasible
Backward iterative implementation: Basic idea for K = 2
• Decision 2: History H2 = (X̄2, A1) is already determined, so selection of the Decision 2 treatment is like a single decision problem with “baseline history” H2 and single decision rule d2(h2; η2)
• ⇒ Maximize a single decision estimator V̂(dη) in η2 to obtain η̂opt2
• Decision 1: Maximize a two decision estimator V̂(dη,1, dη,2) in η1, with dη,2(h2) = d2(h2; η2) held fixed at d2(h2; η̂opt2)
• Can be shown: Results in an estimator for doptη
• Classification analogy at each stage
• Backward Outcome Weighted Learning (BOWL)
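The backward iterative idea can be sketched with simple threshold rules and grid search on IPW estimators. Assumptions (hypothetical): simulated SMART-like data with known randomization probability 0.5 at both decisions, a model under which the optimal rules are I(x1 < 0.5) and I(x2 < 0), and the restricted class d1(h1; η1) = I(x1 < η1), d2(h2; η2) = I(x2 < η2). This sketch uses the 0-1 (indicator) form directly, not the hinge-loss surrogate that BOWL uses.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50_000
# Hypothetical SMART-like data (assumption): randomization probability 0.5 at each decision
x1 = rng.normal(size=n)
a1 = rng.binomial(1, 0.5, size=n)
x2 = x1 + a1 * (0.5 - x1) + rng.normal(size=n)
a2 = rng.binomial(1, 0.5, size=n)
y = 1 + x1 + x2 - a2 * x2 + rng.normal(size=n)

grid = np.round(np.arange(-1.5, 2.01, 0.1), 2)   # candidate thresholds


# Decision 2: single decision IPW estimator with "baseline history" H2
def v2_hat(eta2):
    return np.mean((a2 == (x2 < eta2)) * y / 0.5)


eta2_hat = grid[np.argmax([v2_hat(e) for e in grid])]

# Decision 1: two decision IPW estimator with d2 held fixed at eta2_hat
d2_fixed = x2 < eta2_hat


def v1_hat(eta1):
    follows = (a1 == (x1 < eta1)) & (a2 == d2_fixed)
    return np.mean(follows * y / (0.5 * 0.5))


eta1_hat = grid[np.argmax([v1_hat(e) for e in grid])]
print(f"eta1_hat = {eta1_hat:.1f} (true 0.5), eta2_hat = {eta2_hat:.1f} (true 0.0)")
```

Each stage is a one-dimensional search, so the cost grows additively rather than multiplicatively in the number of decisions; with richer rule classes the grid search at each stage would be replaced by a weighted classification algorithm.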
Outline
Treatment Regimes and Precision Medicine
A Brief History
Optimal Single Decision Regimes
Optimal Multiple Decision Regimes
Treatment Regimes in Practice
Evidence-based decision support
Result: From these or other approaches
• An evidence-based regime based on formal statistical principles that can be used to inform selection of treatment at each decision point
• Insight on key characteristics (tailoring variables) that should be incorporated at each decision point
• Interpretability versus flexibility
Data sources
Data sources:
• Existing data from a longitudinal observational study or a previously conducted conventional clinical trial with follow-up (RWD)
• Prospectively collected data from a clinical trial conducted specifically for this purpose
Sequential Multiple Assignment Randomized Trial (SMART):
• Randomize at each decision point
• SRA, positivity automatically satisfied
• Collect rich baseline and intervening information to support estimation of an optimal regime
SMART for Acute Leukemia
[Figure: SMART schematic for acute leukemia; randomization occurs at the points marked •]
Growing interest in SMARTs
SMARTs on which I am a collaborator: Optimizing
• Behavioral cancer pain management
• Colorectal screening
• HIV prevention, management
• Anti-epilepsy medication adherence
I-SPY 2+ platform trial in breast cancer
How to treat women with locally advanced breast cancer who do not respond to initial therapy?
• I-SPY 2: Adaptive phase II platform trial, collaborative effort of NCI, FDA, industry (FNIH Biomarkers Consortium)
• I-SPY 2+: SMART with re-randomization of nonresponders
• P01 CA210961, PI: Laura Esserman, UCSF
Remarks
“Modernizing statistics:”
• Questions of “what comes next?” and “when and for whom?” arise routinely
• Usually after the fact in a conventional clinical trial. . .
• These questions usually can be cast in terms of treatment regimes
• A formal statistical framework and methods to address them exist, as do methods for design of SMARTs
• We must promote thinking in terms of treatment regimes
• Prospectively rather than retrospectively
Pontification
“Modernizing statistics:”
• The research-practice gap is still huge
• Development of methods for estimating optimal treatment regimes continues
• But major conceptual questions remain unresolved
• E.g., how to characterize the contribution of a particular treatment option to a regime?
• E.g., what should be the regulatory path forward for a treatment that is a critical component of an overall regime?
• Challenge: Resolution of these and associated methodology
Shameless promotion
Coming in 2020:
Introduction to Dynamic Treatment Regimes: Statistical Methods for Precision Medicine
Tsiatis, A. A., Davidian, M., Holloway, S. T., and Laber, E. B.
• Published by Chapman & Hall
• Dedicated website with software, code, worked examples
R package: DynTxRegime, available on CRAN and at https://www2.cscc.unc.edu/impact7/DynTxRegime
Course notes: Eric Laber and I taught the PhD course Introduction to Dynamic Treatment Regimes in Spring 2019 for the SAMSI Precision Medicine (PMED) program (www.samsi.info)
Acknowledgement
IMPACT – Innovative Methods Program for Advancing Clinical Trials
• A joint venture of Duke, UNC-Chapel Hill, NC State
• Supported by NCI Program Project P01 CA142538 (2010–2020)
http://impact.unc.edu
• Statistical methods for precision cancer medicine