Variable Selection for Optimal
Decision Making
Susan Murphy & Lacey Gunter, University of Michigan Statistics Department
Artificial Intelligence Seminar
Joint work with Ji Zhu
Simple Motivating Example
Nefazodone - CBASP Trial
Patients randomized (R) to one of two arms:
Nefazodone
Nefazodone + Cognitive Behavioral-analysis System of Psychotherapy (CBASP)
50+ baseline covariates, both categorical and continuous
Complex Motivating Example
STAR*D: "Sequenced Treatment Alternatives to Relieve Depression"
[Diagram: a multi-level treatment sequence. After initial CIT, patient preference determines whether treatment is augmented (CIT + BUS, CIT + BUP) or switched (to BUP, SER, or VEN); remission leads to follow-up, while non-remission moves the patient to the next level, where treatment is again augmented (L2-Tx + THY, L2-Tx + LI) or switched (to MIRT, NTP, or MIRT + VEN), with R marking randomization at each decision]
30+ baseline variables, 10+ variables at each treatment level, both categorical and continuous
Outline
Framework and notation for decision making
Need for variable selection
Variables that are important to decision making
Introduce a new technique
Simulated and real data results
Future work
Optimal Decision Making
3 components: observations X = (X1, X2, …, Xp), an action, A, and a reward, R
A policy, π, maps observations, X, to actions, A
Policies are compared via expected mean reward, V_π = E_π[R], called the Value of π (Sutton & Barto, 1998)
Long Term Goal: find a policy, π*, for which
π* = argmax_π V_π = argmax_π E_π[R]
Some Working Assumptions
Data collection is difficult and expensive:
limited number of trajectories (<1000)
training set with randomized actions
many observations
Finite horizon (only 1-4 time points); we will initially work with just one time point
Noisy data with little knowledge about underlying system dynamics
Little knowledge about which variables are most important for decision making
Simple Example
A clinical trial to test two alternative drug treatments
The goal: to discover which treatment is optimal for any given future patient
Components
X: baseline variables such as patient's background, medical history, current symptoms, etc.
A: assigned treatment
R: patient's condition and symptoms post treatment
Variable Selection
Multiple reasons for variable selection in decision making, for example:
Better performance: avoid inclusion of spurious variables that lead to bad policies
Limited resources: only a small number of variables can be collected when enacting policies in a real-world setting
Interpretability: policies with fewer variables are easier to understand
What are people currently using?
Variable selection for reinforcement learning in medical settings predominantly guided by expert opinion
Predictive selection techniques, such as Lasso (Loth et al., 2006) and decision trees (Ernst et al., 2005) have been proposed
Good predictive variables are useful in decision making, but are only a small part of the puzzle
Need variables that help determine optimal actions, variables that qualitatively interact with the action
Qualitative Interactions
What is a qualitative interaction? X qualitatively interacts with A if at least two distinct, non-empty sets exist within the space of X for which the optimal action is different (Peto, 1982)
[Figure: three panels plotting R against X1, X2, and X3, each with lines for A=0 and A=1 — no interaction (X1), non-qualitative interaction (X2), qualitative interaction (X3)]
Qualitative interactions tell us which actions are optimal
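In code, the distinction is that a qualitative interaction changes which action is optimal, not just the size of the treatment effect. A minimal numeric sketch with invented reward functions (not from the talk):

```python
import numpy as np

# Hypothetical reward models, for illustration only:
def r_nonqual(x, a):
    return 0.2 + 0.3 * x + a * (0.1 + 0.2 * x)  # A=1 always at least as good

def r_qual(x, a):
    return 0.5 + a * (x - 0.5)                  # optimal action flips at x = 0.5

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)  # a single baseline covariate

# Per-subject optimal action under each model
opt_nonqual = (r_nonqual(x, 1) > r_nonqual(x, 0)).astype(int)
opt_qual = (r_qual(x, 1) > r_qual(x, 0)).astype(int)
```

Under the non-qualitative model the same action is optimal for every subject; under the qualitative model the optimal action depends on x.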
Qualitative Interactions
We focus on two important factors:
The magnitude of the interaction between the variable and the action
The proportion of patients whose optimal choice of action changes given knowledge of the variable
[Figure: three panels plotting R against X4, X5, and X6, each with lines for A=0 and A=1 — big interaction & big proportion (X4), small interaction & big proportion (X5), big interaction & small proportion (X6)]
Variable Ranking for Qualitative Interactions
We propose ranking the variables in X based on potential for a qualitative interaction with A
We give a score for ranking the variables
Given data on i = 1, …, n subjects with j = 1, …, p variables in X, along with an action, A, and a reward, R, for each subject
For Ê[R | A = a] an estimator of E[R | A = a], define
a* = argmax_a Ê[R | A = a]
Variable Ranking Components
Ranking score based on 2 usefulness factors
Interaction Factor:
D_j = max_{1≤i≤n} ( Ê[R | X_j = x_ij, A = a*] − Ê[R | X_j = x_ij, A = a] ) − min_{1≤i≤n} ( Ê[R | X_j = x_ij, A = a*] − Ê[R | X_j = x_ij, A = a] ), for a ≠ a*
[Figure: R plotted against X_j with lines for A=0 and A=1 = a*; the max difference is 1 − 0 = 1 and the min difference is 0.3 − 0.7 = −0.4, so D_j = 1 − (−0.4) = 1.4]
Variable Ranking Components
Proportion Factor:
P_j = (1/n) Σ_{i=1}^{n} 1{ argmax_a Ê[R | X_j = x_ij, A = a] ≠ a* }
[Figure: R plotted against X_j with lines for A=0 and A=1 = a*; 2 out of 7 subjects would change their choice of optimal action given X_j, so P_j = 2/7]
Ranking Score
Ranking Score:
U_j = ( (D_j − min_k D_k) / (max_k D_k − min_k D_k) ) × ( (P_j − min_k P_k) / (max_k P_k − min_k P_k) )
The score U_j, j = 1, …, p, can be used to rank the p variables in X based on their potential for a qualitative interaction with A
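The two factors and the score can be computed directly from per-subject estimates of Ê[R | X_j, A]. A minimal sketch for a binary action; the input arrays and toy numbers below are hypothetical:

```python
import numpy as np

def rank_scores(pred_r0, pred_r1, a_star):
    """Interaction factor D, proportion factor P, and ranking score U.

    pred_r0, pred_r1: (n, p) arrays; column j holds Ê[R | X_j = x_ij, A = a]
    for actions a = 0 and a = 1. a_star: the overall best action ignoring X.
    """
    diff = pred_r1 - pred_r0                    # per-subject effect of A=1 vs A=0
    D = diff.max(axis=0) - diff.min(axis=0)     # interaction factor D_j
    best = (diff > 0).astype(int)               # optimal action given X_j, per subject
    P = (best != a_star).mean(axis=0)           # proportion factor P_j

    def scale(v):                               # normalize to [0, 1] across variables
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    U = scale(D) * scale(P)                     # ranking score U_j
    return D, P, U

# toy example: variable 0 has a qualitative interaction, variable 1 does not
pred_r1 = np.array([[0.0, 0.50], [0.3, 0.55], [0.7, 0.60], [1.0, 0.65]])
pred_r0 = np.array([[0.7, 0.40], [0.6, 0.45], [0.4, 0.50], [0.0, 0.55]])
D, P, U = rank_scores(pred_r0, pred_r1, a_star=1)
```

Variable 0, whose per-subject treatment effect changes sign, gets the higher score.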
Variable Selection Algorithm
1. Select important main effects of X on R using some predictive variable selection method
a. Choose tuning parameter value that gives best predictive model
2. Rank variables in X using score Uj; select top k in rank
3. Again use a predictive variable selection method, this time selecting among main effects of X from step 1, main effect of A, and ranked interactions from step 2
a. Choose tuning parameter value such that the total subset of variables selected leads to a policy with the highest estimated Value
Simulation
Data simulated under a wide variety of scenarios (with and without qualitative interactions)
Used observation matrix, X, and actions, A, from a real data set
Generated new rewards, R, based on several different realistic models
Compared new ranking method Uj versus a standard method
1000 repetitions: recorded percentage of time each interaction was selected for each method
Methods Used in Simulation
Standard Method: Lasso on (X, A, XA) (Tibshirani, 1996)
The standard Lasso minimization criterion is
β̂(λ) = argmin_β Σ_{i=1}^{n} (R_i − Z_i β)² + λ Σ_j |β_j|
where Z_i is the vector of predictors for observation i and λ is a penalty parameter
Coefficient for A, β_{p+1}, is not included in the penalty term
Value of λ chosen by cross-validation on the prediction error
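The talk does not say how the modified criterion is optimized; one simple sketch is proximal gradient descent (ISTA), soft-thresholding every coefficient except the action's so that β_{p+1} is never shrunk:

```python
import numpy as np

def lasso_unpenalized_action(Z, R, lam, unpenalized_idx, n_iter=5000):
    """Minimize sum_i (R_i - Z_i b)^2 + lam * sum_{j != unpenalized_idx} |b_j|
    by proximal gradient descent (ISTA)."""
    d = Z.shape[1]
    step = 1.0 / (2.0 * np.linalg.norm(Z, ord=2) ** 2)  # 1 / Lipschitz constant
    b = np.zeros(d)
    penalized = np.ones(d, dtype=bool)
    penalized[unpenalized_idx] = False
    for _ in range(n_iter):
        b = b - step * 2.0 * Z.T @ (Z @ b - R)          # gradient step
        b[penalized] = np.sign(b[penalized]) * np.maximum(
            np.abs(b[penalized]) - step * lam, 0.0)      # soft-threshold
    return b

# hypothetical data: reward depends on the action and on X[:, 0] only
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
A = rng.integers(0, 2, size=200).astype(float)
R = A + 0.5 * X[:, 0] + 0.1 * rng.normal(size=200)
Z = np.column_stack([X, A])                              # predictors (X, A)
b = lasso_unpenalized_action(Z, R, lam=100.0, unpenalized_idx=3)
```

With a large λ the spurious coefficients are driven to zero while the action coefficient, being outside the penalty, stays near its least-squares value.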
Methods Used in Simulation
New Method:
1. Select important main effects of X on R using Lasso
   a. Choose λ value by cross-validation on prediction error
2. Rank variables in X using score U_j; select top k in rank
3. Use Lasso to select among main effects of X chosen in step 1, main effect of A, and interactions chosen in step 2
   a. Choose λ value such that the total subset of variables selected leads to a policy with the highest estimated Value
Simulation Results
[Plots: percentage of time each interaction was selected — continuous qualitative interaction vs. spurious interactions, and binary qualitative interaction vs. spurious interactions]
Simulation Results
[Plots: percentage of time each interaction was selected — binary and continuous qualitative interactions vs. non-qualitative and spurious interactions]
Depression Study Analysis
Data from a randomized controlled trial comparing treatments for chronic depression (Keller et al., 2000)
n = 440 patients, p = 64 observation variables in X; actions A = Nefazodone or A = Nefazodone + Cognitive Behavioral-analysis System of Psychotherapy (CBASP)
Reward, R = Hamilton Rating Scale for Depression score
Depression Study Results
Ran both methods on 1000 bootstrap samples
[Plots of the resulting selection percentages; the most frequently selected interactions involve ALC1 and ALC2 (alcohol indicators), Somatic Anxiety, and OCD]
Inclusion Thresholds
Based on the previous plots, which variables should we select?
Need inclusion thresholds
Idea: remove the effect of X on R from the data, then run the algorithm to determine the maximum percentage of selections
This tells us the noise threshold; variables with percentages above this threshold are selected
Inclusion Thresholds
Do 100 times:
Randomly assign the observed rewards to different subjects given a particular action
Run the methods on the new data
Record the variables that were selected by each method
Threshold: largest percentage of time a variable was selected over the 100 iterations
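The permutation procedure above can be sketched as follows; `select_one` is a hypothetical stand-in for either selection method:

```python
import numpy as np

def permutation_null_percentages(X, A, R, select_fn, n_reps=100, seed=0):
    """Selection percentages under the null: within each action group the
    rewards are reshuffled across subjects, destroying any X-R relationship
    while preserving the marginal effect of A. The largest resulting
    percentage serves as the inclusion threshold."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_reps):
        R_perm = R.copy()
        for a in np.unique(A):
            idx = np.where(A == a)[0]
            R_perm[idx] = rng.permutation(R[idx])   # shuffle rewards within action group
        for j in select_fn(X, A, R_perm):
            counts[j] += 1
    return counts / n_reps

# stand-in selector (hypothetical): pick the variable most correlated with R
def select_one(X, A, R):
    return [int(np.argmax(np.abs(X.T @ (R - R.mean()))))]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
A = rng.integers(0, 2, size=100)
R = 2.0 * X[:, 0] + rng.normal(size=100)
null_pct = permutation_null_percentages(X, A, R, select_one, n_reps=100)
threshold = null_pct.max()
```

On the original (unpermuted) data the selector reliably picks variable 0; the permuted runs show how often any variable is picked by chance alone.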
Thresholds for Depression Study
[Figure: three panels plotting the percentage of time each variable was chosen against variable number under permuted data — Standard Lasso, New Method U, and New Method S]
We should disregard any interactions selected 6% of the time or less when using either method
Threshold on Results
New method U includes two indicator variables for alcohol problems (ALC1, ALC2) and the Somatic Anxiety score
Standard Lasso includes 39 variables!
[Figure: percentage of time each variable was chosen against variable number on the original data — Standard Lasso and New Method U panels; the variables above threshold for New Method U are ALC1, ALC2, and Somatic Anxiety]
Future Work
Extend algorithm to select variables for multiple time points
How best to do this?
What rewards to use at each time point?
Do we need to adjust the distribution of our X based on prior actions?
In what order should variable selection be done?
Other Issues To Think About
Do we need to account for variability in our estimate of E[R | X_j, A = a] over different X_j?
Can we reasonably estimate the value of a derived policy from a fixed data set collected under random actions when the number of time points gets larger?
Any other issues?
References & Acknowledgements
For more information see: L. Gunter, J. Zhu, S.A. Murphy (2007). Variable Selection for Optimal Decision Making. Technical Report, Department of Statistics, University of Michigan.
This work was partially supported by NIH grants: R21 DA019800, K02 DA15674, P50 DA10075
Technical and data support:
A. John Rush, MD, Betty Jo Hay Chair in Mental Health at the University of Texas Southwestern Medical Center, Dallas
Martin Keller and the investigators who conducted the trial 'A Comparison of Nefazodone, the Cognitive Behavioral-analysis System of Psychotherapy, and Their Combination for the Treatment of Chronic Depression'
Addressing Concerns
Much of the biostatistics literature discourages looking for qualitative interactions and is very skeptical when new interactions are found. Why is this?
Qualitative interactions are hard to find and have small effects
Too many people fishing without disclosing
Strict entry criteria for most clinical trials mean small variability in X, which precludes looking at interesting subgroups
How are we addressing these concerns?
Testing new algorithms in multiple settings where no qualitative interactions exist
No Interaction: What can we expect?
No Qualitative Interactions
No relationship between (X, A, X*A) and R
Main effects of X only
Main effects of X & moderate effect of A only
Everything but qualitative interactions
Estimating the Value
1. Fit selected variables into chosen estimator, Ê
2. Estimate optimal policy:
π̂*(x) = argmax_a Ê[R | X = x, A = a]
3. Estimate Value of π̂* by:
V̂_{π̂*} = (1/n) Σ_{i=1}^{n} 1{π̂*(x_i) = a_i} R_i / P(A = a_i | X = x_i)
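Step 3 is an inverse-probability-weighted estimator; a minimal sketch, assuming a randomized trial where the propensities P(A = a | X = x) are known:

```python
import numpy as np

def estimate_value(R, A, X, policy, propensity):
    """IPW estimate of the Value of `policy`:
    V̂ = (1/n) Σ_i 1{policy(x_i) = a_i} R_i / P(A = a_i | X = x_i).
    `propensity(a, x)` is the known randomization probability."""
    match = np.array([float(a == policy(x)) for a, x in zip(A, X)])
    probs = np.array([propensity(a, x) for a, x in zip(A, X)])
    return float(np.mean(match * R / probs))

# toy check: A randomized uniformly (P = 0.5); reward equals the action,
# so the policy "always take action 1" has true Value 1
A = np.array([0, 1] * 50)
X = np.zeros(100)
R = A.astype(float)
v = estimate_value(R, A, X, policy=lambda x: 1, propensity=lambda a, x: 0.5)
```

Only trajectories whose observed action agrees with the policy contribute, reweighted by the inverse randomization probability to remove the selection effect.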
Estimating the Value (2 time points)
1. Estimate the optimal policy:
π̂*_1(x_1) = argmax_{a_1} Ê[R_1 | X_1 = x_1, A_1 = a_1]
π̂*_2(x_1, a_1, x_2) = argmax_{a_2} Ê[R_1 + R_2 | X_1 = x_1, A_1 = a_1, X_2 = x_2, A_2 = a_2]
2. Estimate Value of π̂* = (π̂*_1, π̂*_2) by:
V̂_{π̂*} = (1/n) Σ_{i=1}^{n} [ 1{π̂*_1(x_{1,i}) = a_{1,i}} R_{1,i} / P(A_1 = a_{1,i} | X_1 = x_{1,i}) + 1{π̂*_1(x_{1,i}) = a_{1,i}} 1{π̂*_2(x_{1,i}, a_{1,i}, x_{2,i}) = a_{2,i}} R_{2,i} / ( P(A_1 = a_{1,i} | X_1 = x_{1,i}) P(A_2 = a_{2,i} | X_1 = x_{1,i}, A_1 = a_{1,i}, X_2 = x_{2,i}) ) ]
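The two-time-point estimator extends the one-stage version with products of indicators and propensities; a sketch (the trajectory tuple format is hypothetical):

```python
def estimate_value_two_stage(data, policy1, policy2, p1, p2):
    """IPW Value estimate for a two-time-point policy (policy1, policy2).

    data: iterable of trajectories (x1, a1, r1, x2, a2, r2).
    policy1(x1), policy2(x1, a1, x2): the stage-wise decision rules.
    p1(a1, x1), p2(a2, x1, a1, x2): known randomization probabilities.
    """
    total, n = 0.0, 0
    for x1, a1, r1, x2, a2, r2 in data:
        m1 = float(a1 == policy1(x1))                 # stage-1 action matches policy
        m2 = float(a2 == policy2(x1, a1, x2))         # stage-2 action matches policy
        total += m1 * r1 / p1(a1, x1)
        total += m1 * m2 * r2 / (p1(a1, x1) * p2(a2, x1, a1, x2))
        n += 1
    return total / n

# toy check: both stages randomized uniformly (P = 0.5); rewards equal the
# actions, so the policy "always take action 1" has true Value 1 + 1 = 2
data = [(0, a1, float(a1), 0, a2, float(a2)) for a1 in (0, 1) for a2 in (0, 1)]
v2 = estimate_value_two_stage(
    data,
    policy1=lambda x1: 1,
    policy2=lambda x1, a1, x2: 1,
    p1=lambda a1, x1: 0.5,
    p2=lambda a2, x1, a1, x2: 0.5,
)
```

The second-stage reward is weighted by the product of both stages' propensities, and only trajectories consistent with the policy at every decision contribute to it.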