Lecture 5: Sequential Multiple Assignment Randomized Trials (SMARTs) for DTRs
Donglin Zeng, Department of Biostatistics, University of North Carolina
Introduction
I Consider simple DTRs: D = (D1, ..., DK), where Dk(Hk) = 1 or −1 (Ak = {−1, 1}).
  That is, a fixed treatment is assigned to all patients at stage k.
I How shall we evaluate V(D)? Conduct a single-arm randomized trial.
I How shall we compare two or more different DTRs, D1 vs D2 (not necessarily simple ones)? Conduct a multiple-arm randomized trial.
Randomization Is Key
I Why is randomization needed?
  I It provides representative coverage of the patient population.
  I It controls all potential confounders that may bias the comparison of treatment effects.
I One essential mathematical relationship links the observed outcome to the potential outcome:
\[
\begin{aligned}
E[R(d) \mid H] &= E[R(d) \mid A = d, H] && \text{(by randomization)} \\
               &= E[R \mid A = d, H]    && \text{(by the consistency assumption)}.
\end{aligned}
\]
Consistency/SUTVA: R = R(d) if A = d, and R is not affected by the particular treatment assignments given to other patients.
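The identity above can be illustrated numerically. The following is a minimal sketch under made-up assumptions (a single covariate H uniform on [0, 1], hypothetical potential outcomes R(1) = 1 + H + noise and R(−1) = H + noise, and a function name `simulate` chosen for illustration). It compares the mean observed reward among treated patients with the true mean of R(1), marginally over H, once under randomization and once when assignment depends on H.

```python
import random

def simulate(n, randomized, seed=0):
    """Return (mean observed R among patients with A = 1, true mean of R(1))."""
    rng = random.Random(seed)
    sum_treated, n_treated, sum_potential = 0.0, 0, 0.0
    for _ in range(n):
        h = rng.random()                       # hypothetical baseline covariate H
        r1 = 1.0 + h + rng.gauss(0.0, 0.1)     # potential outcome R(1)
        r0 = h + rng.gauss(0.0, 0.1)           # potential outcome R(-1)
        p = 0.5 if randomized else h           # P(A = 1 | H)
        a = 1 if rng.random() < p else -1
        r = r1 if a == 1 else r0               # consistency: R = R(A)
        sum_potential += r1
        if a == 1:
            sum_treated += r
            n_treated += 1
    return sum_treated / n_treated, sum_potential / n

print(simulate(20000, randomized=True))    # the two numbers should be close
print(simulate(20000, randomized=False))   # the first number is biased upward
```

In the non-randomized run, patients with larger H are more likely to be treated, so the treated group is no longer representative and the observed mean overstates E[R(1)].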
Difficulty in Designing Studies for Optimal DTRs
I Even for simple DTRs with only two treatment options at each stage, the total number of DTRs is 2^K.
I The optimal DTRs permit treatment decisions to depend on patient characteristics, Hk, at each stage k, so the total number of DTRs is infinite.
I Some smart design is needed.
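To make the count concrete, here is a small sketch that enumerates every simple DTR for a given number of stages. The function name `simple_dtrs` is illustrative only.

```python
from itertools import product

def simple_dtrs(num_stages, options=(-1, 1)):
    """List every simple DTR: one fixed treatment per stage."""
    return list(product(options, repeat=num_stages))

for K in (1, 2, 3, 5, 10):
    print(K, len(simple_dtrs(K)))   # the count doubles with every added stage
```

A separate trial arm for each of these regimes quickly becomes impractical, and this is before any tailoring on Hk is allowed.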
Learn from Artificial Intelligence
What is Reinforcement Learning in AI
I Reinforcement learning (RL) is a machine learning method in artificial intelligence.
I Its history dates back decades, to when AI engineers tried to mimic the learning process of the human brain.
I Unlike commonly known machine learning methods, RL learns through interactions between an agent and an unknown environment, with a feedback loop.
I It consists of exploitation and exploration.
  Exploitation: apply decisions already learnt.
  Exploration: try and learn from a new environment.
Link between RL and DTR
I They both involve sequential decisions.
I They share the same goal: find optimal decisions to maximize some value.
I Equivalence between DTR and RL terminologies:
  treatment == action
  state/covariate == state
  reward outcome == reward
  treatment rule == policy
How RL Designs Studies in AI
I In artificial intelligence (AI), a single agent (robot) trains and learns by itself for a particular task, for example, playing chess to win.
I A typical design for this learning is a trial-and-error learning process.
I The single agent tries many different episodes, each with a given policy.
I It learns from these episodes and updates its policies in new episodes.
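The trial-and-error loop above can be sketched with a simple epsilon-greedy agent on a toy problem with three actions. This is an illustration only, not a method from the lecture: the reward means, the value of `epsilon`, and the function name `epsilon_greedy` are all assumptions.

```python
import random

def epsilon_greedy(true_means, episodes=5000, epsilon=0.1, seed=0):
    """Run a single agent for many episodes and return its estimates and pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Exploration: try an action at random.
            action = rng.randrange(len(true_means))
        else:
            # Exploitation: apply the action currently believed to be best.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[action], 1.0)   # feedback from the environment
        counts[action] += 1
        # Update the running mean reward of the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = epsilon_greedy([0.0, 1.0, 2.0])
print(estimates, counts)
```

The agent updates its policy after every episode, which is exactly what a clinical trial cannot do with a single patient.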
How RL Should Be Used Differently in Medical Trials
I Instead of training one single agent with many policies, we “train” many patients, each given potentially different treatments.
I Randomization is necessary to ensure sufficient representation and to avoid selection bias in treatment assignment.
I This is essentially the idea of the Sequential Multiple Assignment Randomized Trial (SMART).
SMART
SMART: Sequential Multiple Assignment Randomized Trial (Lavori & Dawson 2000, 2004; Murphy 2005)
I Patients are sequentially randomized at each critical decision stage.
I Randomization probability may depend on the current states of patients.
I Practical SMARTs:
  – Adaptive Pharmacological and Behavioral Treatments for ADHD;
  – Sequenced Treatment Alternatives to Relieve Depression (STAR*D);
  – CATIE for schizophrenia;
  – ExTENd for alcohol dependence;
  – Adaptive therapy for androgen-independent prostate cancer.
Example: Two-Stage SMART Study
I The study (Kasari et al., 2014) was designed to study communication intervention for minimally verbal children with autism.
I The study aimed to test the effect of SGD, with each stage lasting 12 weeks.
I SGD: speech-generating device; JASP+EMT: blended developmental/behavioral intervention.
I The second stage had another 12-week follow-up.
I The study started with 61 eligible children, and 46 completed both stages.
Diagram of The Autism Study
Excerpt from Kasari et al. (2014), Journal of the American Academy of Child & Adolescent Psychiatry, Volume 53, Number 6, June 2014, p. 637 (www.jaacap.org):

[...] 12 weeks in duration. At the beginning of Stage 1 (baseline), all children meeting inclusion criteria were randomized with equal probability to JASP+EMT versus JASP+EMT+SGD. At the end of 12 weeks, children were assessed for early response versus slow response (defined in section “Stage 2 Treatments: Weeks 13 to 24”) to Stage 1 treatment. At the beginning of Stage 2 (i.e., beginning of week 13), the subsequent treatments were adapted based on response status. All early responders continued with the same treatment for another 12 weeks. For slow responders to JASP+EMT+SGD, treatment was intensified (3 sessions per week). Slow responders to JASP+EMT were re-randomized with equal probability to intensified JASP+EMT or augmented JASP+EMT+SGD (Figure 1). The institutional review board at each site approved the study protocol. Randomization was conducted by an independent data-coordinating center.

Participants. Inclusion criteria were as follows: previous clinical diagnosis of ASD, confirmed by research-reliable staff using the Autism Diagnostic Observational Schedule (ADOS-Generic) Module 1 appropriate for children without phrase speech; chronological age between 5 and 8 years; evidence of being minimally verbal, with fewer than 20 spontaneous different words used during the 20-minute NLS; at least 2 years of prior intervention, per parent-report; and receptive language age of at least 24 months (based on performance of 2 of 3 assessments, given the potential difficulty complying with standardized test conditions). Exclusion criteria were as follows: major medical conditions other than ASD; sensory disabilities (e.g., deafness); motor disabilities (e.g., cerebral palsy); uncontrolled seizure disorders; and proficient use of an SGD based on parent-report and observation during study administration of the NLS.

Figure 1 (from the article): Participant flow through trial. Note: JASP+EMT = spoken mode of JASPER plus Enhanced Milieu Teaching; JASP+EMT+SGD = spoken mode of JASPER plus Enhanced Milieu Teaching plus Speech Generating Device.

Figure: SMART Design of Autism Study (Kasari et al. 2014)
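The assignment scheme described for this design can be sketched as a short simulation. The early-response rates in `p_early` are made up for illustration and are not taken from the study; the function name `assign` is likewise hypothetical. The returned probability is the chance of the observed treatment sequence given the response status, which is the quantity later used for weighting.

```python
import random
from collections import Counter

def assign(rng, p_early):
    """Simulate one child's path: (stage-1 arm, early responder?, stage-2 action, assignment probability)."""
    stage1 = rng.choice(["JASP+EMT", "JASP+EMT+SGD"])    # equal-probability randomization
    early = rng.random() < p_early[stage1]                # early vs slow response at week 12
    if early:
        stage2, prob = "continue", 0.5                    # early responders continue the same treatment
    elif stage1 == "JASP+EMT+SGD":
        stage2, prob = "intensify", 0.5                   # slow responders to JASP+EMT+SGD are intensified
    else:
        stage2 = rng.choice(["intensify", "augment with SGD"])   # re-randomization
        prob = 0.5 * 0.5
    return stage1, early, stage2, prob

rng = random.Random(0)
p_early = {"JASP+EMT": 0.3, "JASP+EMT+SGD": 0.5}          # hypothetical response rates
paths = [assign(rng, p_early) for _ in range(61)]
print(Counter((s1, s2) for s1, _, s2, _ in paths))
```

Only slow responders to JASP+EMT are randomized twice, so only their paths carry the smaller probability 0.25.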
Advantages of SMART
Research questions to be answered from a SMART
I Main effects of treatments
  I What is the better initial treatment, JASP+EMT or JASP+EMT+SGD?
  I What about the slow responders: intensify or not?
I Effects of embedded DTRs
  I JASP+EMT→intensify vs JASP+EMT+SGD→intensify vs JASP+EMT→JASP+EMT+SGD
I Exploring the optimal treatment strategy (deep tailoring)
  I Should the decision to intensify or not in the second stage depend on additional intermediate outcomes?
General Advantages
I Valid comparisons of different treatment options at different stages, by virtue of randomization.
I Discover adaptive treatment strategies that are embedded in the SMART trial.
I Inform the development of adaptive and deeply tailored treatments (using potentially high-dimensional biomarkers).
Unbiased Value Estimation under SMART
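Recall the value function associated with D = (D1, ..., DK):
\[
V(D) = E[R(D)] = E[R(D_1, \ldots, D_K)].
\]
Let πk(a, h) be the randomization probability P(Ak = a | Hk = h).
I Start from stage K:
\[
\begin{aligned}
E[R(D_1, \ldots, D_{K-1}, D_K)]
&= E\left\{ E\left[ R(D_1, \ldots, D_{K-1}, D_K) \,\middle|\, H_K, A_K = D_K(H_K) \right] \right\} \\
&= E\left\{ E\left[ \frac{R(D_1, \ldots, D_{K-1}, A_K)\, I(A_K = D_K(H_K))}{\pi_K(A_K, H_K)} \,\middle|\, H_K \right] \right\} \\
&= E\left[ \frac{R(D_1, \ldots, D_{K-1}, A_K)\, I(A_K = D_K(H_K))}{\pi_K(A_K, H_K)} \right].
\end{aligned}
\]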
Continue to Prior Stages
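I Now at stage K − 1, we repeat the same derivation to obtain
\[
E[R(D_1, \ldots, D_{K-1}, D_K)]
= E\left[ \frac{R(D_1, \ldots, D_{K-2}, A_{K-1}, A_K)\, I(A_{K-1} = D_{K-1}(H_{K-1}),\, A_K = D_K(H_K))}{\pi_{K-1}(A_{K-1}, H_{K-1})\, \pi_K(A_K, H_K)} \right].
\]
I Continue backwards until stage 1:
\[
\begin{aligned}
V(D) &= E\left[ \frac{R(A_1, \ldots, A_{K-1}, A_K)\, I(A_1 = D_1(H_1), \ldots, A_K = D_K(H_K))}{\prod_{k=1}^{K} \pi_k(A_k, H_k)} \right] \\
&= E\left[ \frac{R\, I(A_1 = D_1(H_1), \ldots, A_K = D_K(H_K))}{\prod_{k=1}^{K} \pi_k(A_k, H_k)} \right].
\end{aligned}
\]
I The value function for D can be estimated by the average reward outcome among the patients whose treatments follow D, weighted by the inverse of their randomization probabilities.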
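The weighted average described above can be sketched on simulated data. Everything in the data-generating model below (a binary intermediate response, the reward formula, the randomization probabilities 0.5 and 0.7) is a made-up assumption for illustration, as are the function names. Under this model the regime "give A1 = 1, then A2 = 1 for responders and A2 = −1 otherwise" has true value 2.6, which the estimate can be compared against.

```python
import random

def simulate_smart(n, seed=0):
    """Simulate a two-stage SMART; each record is (a1, resp, a2, assignment probability, reward)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a1 = rng.choice([-1, 1])                       # stage 1: pi_1 = 0.5
        pi1 = 0.5
        resp = 1 if rng.random() < (0.6 if a1 == 1 else 0.4) else 0
        p2 = 0.7 if resp == 1 else 0.5                 # P(A2 = 1 | H2) depends on the current state
        a2 = 1 if rng.random() < p2 else -1
        pi2 = p2 if a2 == 1 else 1.0 - p2
        r = 1.0 + 0.5 * a1 + resp + 0.5 * a2 * (2 * resp - 1) + rng.gauss(0.0, 1.0)
        data.append((a1, resp, a2, pi1 * pi2, r))
    return data

def ipw_value(data, d1, d2):
    """Average of R * I(A follows D) / prod_k pi_k over all patients."""
    total = 0.0
    for a1, resp, a2, prob, r in data:
        if a1 == d1 and a2 == d2(resp):
            total += r / prob
    return total / len(data)

def respond_rule(resp):
    """Stage-2 rule of the regime being evaluated."""
    return 1 if resp == 1 else -1

data = simulate_smart(50000)
print(ipw_value(data, 1, respond_rule))    # should be near the true value 2.6
```

The sum is divided by the total sample size, not by the number of patients who follow D, which matches the expectation in the formula.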
An Alternative Interpretation
I In a SMART, a particular treatment assignment occurs with probability
\[
\prod_{k=1}^{K} \pi_k(A_k, H_k).
\]
I If we run a single-arm trial for a given D, the treatment assignment occurs with probability
\[
\prod_{k=1}^{K} I(A_k = D_k(H_k)).
\]
I To use a SMART to estimate the expected reward in the D-specified trial, importance sampling theory gives
\[
V(D) = E\left[ \frac{R\, I(A_1 = D_1(H_1), \ldots, A_K = D_K(H_K))}{\prod_{k=1}^{K} \pi_k(A_k, H_k)} \right].
\]
I In conclusion, a SMART provides unbiased estimation of, and comparisons between, DTRs, and so can lead to valid estimation of optimal DTRs.
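The importance sampling step can be written out as a change of measure. The notation P_π (distribution of the data in the SMART) and P_D (distribution in the single-arm trial that follows D) is introduced here for illustration. The sketch assumes that the two trials differ only in how treatments are assigned, and that every πk(a, h) is strictly positive.

```latex
% Likelihood ratio between the D-specified trial and the SMART:
% the conditional distributions of H_k and R given the past cancel,
% leaving only the treatment-assignment factors.
\[
\frac{dP_D}{dP_\pi}
= \frac{\prod_{k=1}^{K} I(A_k = D_k(H_k))}{\prod_{k=1}^{K} \pi_k(A_k, H_k)} .
\]
% Expected reward in the D-specified trial, rewritten as an expectation
% over the SMART distribution:
\[
V(D) = E_{P_D}[R]
= E_{P_\pi}\!\left[ R \, \frac{dP_D}{dP_\pi} \right]
= E_{P_\pi}\!\left[ \frac{R \, \prod_{k=1}^{K} I(A_k = D_k(H_k))}{\prod_{k=1}^{K} \pi_k(A_k, H_k)} \right].
\]
```

The product of indicators equals the single indicator I(A1 = D1(H1), ..., AK = DK(HK)), so this agrees with the formula obtained by backward induction.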
Demo of SMART Trial
NSCLC trial
In treating advanced non-small cell lung cancer (NSCLC), patients typically experience two or more lines of treatment, and many studies demonstrate that three lines of treatment can improve survival for patients.
1st-line → 2nd-line → 3rd-line
1st-line therapy
Scagliotti et al., 2008:
  Cisplatin + Gemcitabine
  Cisplatin + Pemetrexed
Sandler et al., 2006:
  Paclitaxel + Carboplatin
  Paclitaxel + Carboplatin + Bevacizumab (Avastin)
Pirker et al., 2008:
  Cisplatin + Vinorelbine
  Cisplatin + Vinorelbine + Cetuximab (Erbitux)
2nd-line therapy
Approved regimens
  Docetaxel
  Pemetrexed
  Erlotinib (approved for third-line, too)
Timing issue
I Immediate or delayed docetaxel? (Fidias et al., 2007; Ciuleanu et al., 2008)
I Comparison of PFS suggests immediate, but OS did not reach a significant difference.
I Merits of pemetrexed and erlotinib remain unclear.
Important clinical questions
1. Among many approved 1st-line treatments, what treatment to administer?
2. Then, at the end of the 1st-line treatment:
  I Among approved 2nd-line treatments, what treatment to administer?
  I When to begin the 2nd line of treatment?
[Figure: treatment timeline. 1st-line: possible treatments. 2nd-line: possible treatments and initial timings. Time points marked: immediate, progression, death.]
Reinforcement Learning
I Provides a well-known framework for optimizing a sequence of treatments in an evolving, time-varying system.
I Discovers individualized treatment regimens for cancer.
I Selects treatments that improve outcomes even when the relationship between treatments and outcomes is not fully known.
I Can evaluate treatments based on immediate and long-term effects.
Applications:
I Behavioral disorders (Pineau et al., 2007).
I Sequential multiple assignment randomized trial (SMART) designs for developing dynamic treatment regimes (Murphy et al., 2007).
Clinical Reinforcement Trial
Drafted Protocol
1. A finite, reasonably small set of decision times is identified.
2. For each decision time, a set of possible treatments to be randomized is identified.
3. A utility function is identified which can be assessed at each time point.
4. Patients are then recruited into the study and randomized to the treatment set under the protocol restrictions at each decision point.
5. The patient data are collected, and Q-learning is applied in combination with support vector regression (SVR) to estimate the optimal treatment rule as a function of patient variables and biomarkers at each decision time.
Reinforcement Learning Trial
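[Figure: timelines of the four possible patient trajectories starting at t1, with outcomes labelled as follows.
  (1) Ends at T1 before t2: YD = 0, δ = 1, T0 = T1.
  (2) Ends at C before t2: YD = 0, δ = 0, T0 = C.
  (3) Passes t2 and (t2 + TM) ∧ TP, ends at T2: YD = 1, δ = 1, T0 = t2 + T2.
  (4) Passes t2 and (t2 + TM) ∧ TP, ends at C: YD = 1, δ = 0, T0 = C.]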
I T1 = TD ∧ t2, YD = I(TD ∧ C ≥ t2)
I T2 = (TD − t2)I(TD ≥ t2) = (TD − t2)I(T1 = t2)
I C2 = (C − t2)I(C ≥ t2)
I TD = T1 + T2, T0 = TD ∧ C = T1 ∧ C + YD(T2 ∧ C2)
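The decomposition of the observed time T0 into stage-wise pieces can be checked numerically. The sketch below draws arbitrary death and censoring times (uniform draws and t2 = 4 are assumptions for illustration only) and verifies that T1 ∧ C + YD (T2 ∧ C2) reproduces TD ∧ C in every case.

```python
import random

def decompose(t_d, c, t2):
    """Split death time t_d and censoring time c at the second decision time t2."""
    t1 = min(t_d, t2)                                  # T1 = TD ^ t2
    y_d = 1 if min(t_d, c) >= t2 else 0                # YD = I(TD ^ C >= t2)
    stage2_time = (t_d - t2) if t_d >= t2 else 0.0     # T2 = (TD - t2) I(TD >= t2)
    c2 = (c - t2) if c >= t2 else 0.0                  # C2 = (C - t2) I(C >= t2)
    t0 = min(t1, c) + y_d * min(stage2_time, c2)       # T1 ^ C + YD (T2 ^ C2)
    return t1, y_d, stage2_time, c2, t0

rng = random.Random(0)
for _ in range(5):
    t_d, c = rng.uniform(0, 10), rng.uniform(0, 10)
    t0 = decompose(t_d, c, t2=4.0)[-1]
    print(round(t_d, 2), round(c, 2), round(t0, 2), round(min(t_d, c), 2))
```

In each printed row the last two columns agree, covering the cases where death or censoring falls before or after t2.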
Learning Algorithm
1. At t1, the data are $\{(w_1, m_1, d_1, T_1 \wedge C, \Delta)_i\}_{i=1}^{n}$; at t2, the data are $\{(w_2, m_2, d_2, T_M, T_2 \wedge C_2, \Delta)_j\}_{j=1}^{n'}$, where $n' \le n$.
2. Q2, the expected reward at stage 2, is estimated using regression methods with outcome
\[
T_2 \wedge C_2.
\]
3. Q1, the expected optimal reward at stage 1, is estimated using regression methods with outcome
\[
T_1 \wedge C + I(T_1 = t_2) \times \max_{d_2, T_M} Q_2.
\]
4. In steps 2 and 3, regression methods can include machine learning algorithms (SVR).
5. Given Q1(θ̂1) and Q2(θ̂2), compute D̂1 and D̂2 by maximizing Q1 and Q2, respectively.
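The backward steps above can be sketched in a simplified setting. This is not the lecture's implementation: it replaces SVR with a saturated regression (one mean per state-action cell), uses a single binary state per stage, and ignores censoring and the timing decision TM. The data-generating model and all names (`simulate_trial`, `cell_means`, `q_learning`) are hypothetical.

```python
import random
from collections import defaultdict

def simulate_trial(n, seed=0):
    """Each record: (w1, d1, y1, alive at t2, w2, d2, y2). A1 suits w1 = 1; A3 suits w2 = 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w1, d1 = rng.randint(0, 1), rng.choice(["A1", "A2"])
        y1 = 3.0 + (2.0 if (d1 == "A1") == (w1 == 1) else 0.0) + rng.gauss(0.0, 0.5)
        alive = rng.random() < 0.8
        w2, d2 = rng.randint(0, 1), rng.choice(["A3", "A4"])
        y2 = 5.0 + (2.0 if (d2 == "A3") == (w2 == 1) else 0.0) + rng.gauss(0.0, 0.5)
        data.append((w1, d1, y1, alive, w2, d2, y2 if alive else 0.0))
    return data

def cell_means(rows):
    """Saturated regression of the outcome on (state, action): one mean per cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for state, action, y in rows:
        sums[(state, action)] += y
        counts[(state, action)] += 1
    return {key: sums[key] / counts[key] for key in sums}

def q_learning(data, actions1=("A1", "A2"), actions2=("A3", "A4")):
    # Step 2: fit Q2 using only patients who reached stage 2.
    q2 = cell_means([(w2, d2, y2) for _, _, _, alive, w2, d2, y2 in data if alive])
    # Step 3: fit Q1 on the stage-1 reward plus the best predicted stage-2 reward.
    rows1 = []
    for w1, d1, y1, alive, w2, _, _ in data:
        future = max(q2[(w2, d)] for d in actions2) if alive else 0.0
        rows1.append((w1, d1, y1 + future))
    q1 = cell_means(rows1)
    # Step 5: the estimated rules maximize the fitted Q-functions.
    rule1 = {w: max(actions1, key=lambda d: q1[(w, d)]) for w in (0, 1)}
    rule2 = {w: max(actions2, key=lambda d: q2[(w, d)]) for w in (0, 1)}
    return rule1, rule2

print(q_learning(simulate_trial(4000)))
```

With continuous covariates such as quality of life and tumor size, the cell means would be replaced by a regression model, which is where SVR enters in the protocol.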
Simulation Settings
I 1st-line treatment regimens: A1 or A2
I 2nd-line treatment regimens: A3 or A4
I initiation time for 2nd-line treatment: random time between 2.8 and 4.8 months from 1st-line treatment
I prognostic factors: Wt (quality of life), Mt (tumor size)
I four groups are generated along with their optimal treatment regimens
Simulation scenario table
Table 1: The scenarios studied in the simulation. Sample size = 100/group.

Group  State Variables                         Status       Timing           Optimal Regimen
1      W1 ∼ N(0.25, σ^2), M1 ∼ N(0.75, σ^2)    W1 ↓, M1 ↑   [not recovered]  A1A32
2      W1 ∼ N(0.75, σ^2), M1 ∼ N(0.75, σ^2)    W1 ↑, M1 ↑   [not recovered]  A1A41
3      W1 ∼ N(0.25, σ^2), M1 ∼ N(0.25, σ^2)    W1 ↓, M1 ↓   [not recovered]  A2A33
4      W1 ∼ N(0.75, σ^2), M1 ∼ N(0.25, σ^2)    W1 ↑, M1 ↓   [not recovered]  A2A42
Performance of the optimal regimen
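[Figure: overall survival (y-axis: Overall Survival, 0 to 25) for each fixed regimen and for the optimal regimen. Values shown in the plot, in the order of the regimen labels:]

Regimen   Overall Survival
A1A31      9.23
A1A32     10.39
A1A33      9.04
A1A41      9.59
A1A42     10.25
A1A43      9.12
A2A31     10.53
A2A32     11.29
A2A33     10.31
A2A41      9.15
A2A42      9.75
A2A43      8.90
optimal   17.48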
Performance of the optimal regimen
Table: Comparisons between true optimal regimens and estimated optimal regimens for overall survival (months). Each reinforcement trial is of size 100/group with 10 simulation runs. The confirmatory trial is of size 100/group.

         Optimal   True       Predicted survival
Group    regimen   survival   Min     Mean    Max
1        A1A32     16.00      15.83   15.93   16.00
2        A1A41     15.33      14.96   15.13   15.28
3        A2A33     18.37      17.75   17.99   18.27
4        A2A42     20.75      20.60   20.86   20.97
Average            17.61      17.28   17.48   17.63
[Figure: four histograms (y-axis: Frequency, 0 to 100), one panel per group, each titled with that group's optimal regimen: A1A32, Group 1 (92%, 100%); A1A41, Group 2 (100%, 100%); A2A33, Group 3 (100%, 100%); A2A42, Group 4 (100%, 81%).]
Sensitivity of the predicted survival to the sample size
[Figure: overall survival (y-axis, 13 to 17) against sample size for each group (x-axis: 2, 3, 6, 10, 20, 40, 60, 80, 100, 150, 200, 300, 400, 500, 600).]
ε-SVR-C performance
[Figure: predicted overall survival (y-axis, 15.0 to 18.0) in three panels: (a) None vs 25% delete; (b) None vs 50% delete; (c) None vs 75% delete.]
Summary
I Identified optimal treatment strategies tailored to the appropriate subpopulations of NSCLC patients.
I Solved the timing problem of initiating second-line therapy in NSCLC.
I Handled right-censored data.
Design Limitations
Practical limitations of SMART
I The operational cost of administering multiple-stage studies and multiple treatments is high.
I The trial period is long (March et al. 2010).
I Study dropout or noncompliance is common even in regular RCTs:
  I In the CATIE study, 705 of 1460 patients stayed for the entire 18 months of the study.
  I In ExTENd, the drop-out rate was 17% (52 out of 302) in the first-stage treatment and an additional 13% (41 out of 302) during the second stage.