Machine/Reinforcement Learning in Clinical Research
S.A. Murphy, May 19, 2008

Transcript
Page 1

Machine/Reinforcement Learning in Clinical Research

S.A. Murphy

May 19, 2008

Page 2

Outline

Goal: Improving Clinical Decision Making Using Data

– Clinical Decision Making
– Types of Training Data
– Incomplete Mechanistic Models
– Clinical Trials
– Some Open Problems
– Example

Page 3

Page 4

Page 5

Questions

Patient Evaluation Screen with MSE

Page 6

Page 7

Policies are individually tailored treatments, with treatment type and dosage changing according to the patient’s outcomes.

k stages for each patient

Observation available at the jth stage: Oj

Action at the jth stage (usually a treatment): Aj

Page 8

k stages

History available at the jth stage: Hj = (O1, A1, O2, A2, …, Oj)

Reward following the jth stage: Rj = rj(Hj, Aj, Oj+1), where rj is a known function

Primary outcome: Y = R1 + R2 + … + Rk, the cumulative reward
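
As a concrete illustration of this data structure, here is a minimal Python sketch (the names and the exact reward signature are assumptions chosen to match the notation Rj = rj(Hj, Aj, Oj+1) above, not code from the seminar):

    # Sketch (illustrative, not from the slides): one k-stage trajectory
    # O1, A1, O2, A2, ..., Ok, Ak, Ok+1, with Y the sum of known rewards r_j.
    from dataclasses import dataclass
    from typing import Any, Callable, List

    @dataclass
    class Trajectory:
        observations: List[Any]   # O_1, ..., O_{k+1}
        actions: List[Any]        # A_1, ..., A_k

    def cumulative_reward(traj: Trajectory, reward_fns: List[Callable]) -> float:
        """Y = r_1(H_1, A_1, O_2) + ... + r_k(H_k, A_k, O_{k+1})."""
        y = 0.0
        for j, r_j in enumerate(reward_fns):                  # j = 0, ..., k-1 (0-based)
            h = (traj.observations[:j + 1], traj.actions[:j])  # history up to this stage
            y += r_j(h, traj.actions[j], traj.observations[j + 1])
        return y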

Page 9

Goal:

Use training data to construct decision rules, d1,…, dk that input information in the history at each stage and output a recommended action; these decision rules should lead to a maximal mean Y (cumulative reward).

The policy is the sequence of decision rules, d1,…, dk .

In implementation of the policy, the actions are set to Aj = dj(Hj).
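
A minimal sketch of what "setting the actions by the policy" means in code (the function names, and the stand-in get_next_observation for whatever produces Oj+1, are hypothetical):

    # Sketch: a policy is the sequence of decision rules d_1, ..., d_k;
    # at stage j the action is set to A_j = d_j(H_j).
    def run_policy(decision_rules, get_next_observation, o1):
        observations, actions = [o1], []
        for d_j in decision_rules:
            h_j = (list(observations), list(actions))             # H_j
            a_j = d_j(h_j)                                        # A_j = d_j(H_j)
            actions.append(a_j)
            observations.append(get_next_observation(h_j, a_j))  # O_{j+1}
        return observations, actions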

Page 10

Some Characteristics of a Clinical Decision Making Policy

• The learned policy should be decisive only when warranted.

• The learned policy should not require excessive data collection in order to implement.

• The learned policy should be justifiable to clinical scientists.

Page 11

Types of Data

• Clinical Trial Data
   – Actions are manipulated (randomized)

• Large Databases or Observational Data Sets
   – Actions are not manipulated by scientist

• Bench research on cells/animals/humans

Page 12

Clinical Trial Data Sets

• Experimental trials conducted for research purposes
   – Scientists decide proactively which data to collect and how to collect this data
   – Use scientific knowledge to enhance the quality of the proxies for observation, reward
   – Actions are manipulated (randomized) by scientist
   – Short horizon (less than 5)
   – Hundreds of subjects.

Page 13

Observational Data Sets

• Observational data collected for research purposes
   – Use scientific knowledge to pinpoint high quality proxies for observation, action, reward
   – Scientists decide proactively which proxies to collect and how to collect this data
   – Actions are not manipulated by scientist
   – Moderate horizon
   – Hundreds to thousands of subjects.

Page 14

Observational Data Sets

• Clinical databases or registries
   – (an example in the US would be the VA registries)
   – Data was not collected for research purposes
   – Only gross proxies are available to define observation, action, reward
   – Moderate to long horizon
   – Thousands to millions of subjects

Page 15

Mechanistic Models

• In many areas of RL, scientists can use mechanistic theory, e.g., physical laws, to model or simulate the interrelationships between observations and how the actions might impact the observations.

• Scientists know many (the most important) of the causes of the observations and know a model for how the observations relate to one another.

Page 16

Low Availability of Mechanistic Models

• Clinical scientists have recourse to only crude, qualitative models

• Unknown causes create problems. Scientists who want to use observational data to construct policies must confront the fact that non-causal “associations” occur due to the unknown causes of the observations.

Page 17

[Diagram, Conceptual Structure in the Clinical Sciences (observational data): Unknown Causes sit above the sequence Observations (Time 1) → Action (Time 2) → Observations (Time 2) → Action (Time 3) → Reward.]

Page 18

Unknown, Unobserved Causes

(Incomplete Mechanistic Models)

[Diagram: "Maturity/Unknown Causes" and the "Decision to join 'Adult' Society", with positive (+) links, sit above the sequence Binge Drinking (Time 1) → Treatment: Counseling (Time 2) → Binge Drinking (Time 2) → Functionality (Time 3).]

Page 19

Unknown, Unobserved Causes (Incomplete Mechanistic Models)

• Problem: Non-causal associations between treatment (here counseling) and rewards are likely.

• Solutions:
   – Collect clinical trial data in which treatment actions are randomized. This breaks the non-causal associations yet permits causal associations.
   – Participate in the observational data collection; proactively brainstorm with domain experts to ascertain and measure the main determinants of treatment selection. Then take advantage of causal inference methods designed to utilize this information.

Page 20

[Diagram, Conceptual Structure in the Clinical Sciences (experimental trial data): "Maturity/Unknown Causes" and the "Decision to join 'Adult' Society", with a "+" link, sit above the sequence Observations (Time 1) → Treatment: Counseling (Time 2) → Binge Drinking (Time 2) → Functionality (Time 3).]

Page 21

STAR*D

• The statistical expertise relevant for policy construction was unavailable at the time the trial was designed.

• This trial is over and one can apply for access to this data

• One goal of the trial is to construct good treatment sequences for patients suffering from treatment-resistant depression.

www.star-d.org

Page 22

STAR*D: "Sequenced Treatment Alternatives to Relieve Depression"

[Diagram, STAR*D design: Preference → Treatment Two → Intermediate Outcome → Preference → Treatment Three → Follow-up. Starting from CIT, patients preferring to augment are randomized (R) to CIT + BUS or CIT + BUP-SR; patients preferring to switch are randomized (R) to BUP-SR, VEN, or SER. Remission leads to follow-up; non-remission leads to a further preference-based randomization (R) at Treatment Three: augment (L2-Tx + THY or L2-Tx + LI) or switch (MIRT or NTP).]

Page 23

ExTENd

• Ongoing study at U. Pennsylvania

• Goal is to learn how best to help alcohol dependent individuals reduce alcohol consumption.

Page 24

Oslin ExTENd

[Diagram, Oslin ExTENd design: random assignment to an early trigger or a late trigger for nonresponse. Within each arm, after 8 weeks, responders are randomly assigned to Naltrexone or TDM + Naltrexone; nonresponders are randomly assigned to CBI or CBI + Naltrexone.]

Page 25

Clinical Trials

• Data from these short-horizon clinical trials make excellent test beds for combinations of supervised/unsupervised and reinforcement learning methods.
   – In the clinical trial, large amounts of data are collected at each stage of treatment
   – Small number of finite-horizon patient trajectories
   – The learned policy can vary greatly from one training set to another.

Page 26

Open Problems

1) Equivalent Actions
   – Need to know when a subset of actions is equivalent, that is, when there is no or little evidence to contradict this equivalence.

2) Evaluation
   – Need to assess the quality of the learned policy (or compare policies) using training data

Page 27

Open Problems

3) Variable Selection
   – To reduce the large number of variables to those most useful for decision making
   – Once a small number of variables is identified, we need to know if there is sufficient evidence that a particular variable (e.g. the output of a biological test) should be part of the policy.

Page 28

Measures of Confidence

• A statistician’s approach: use measures of confidence to address these three challenges
   – Pinpointing equivalent actions
   – Pinpointing necessary patient inputs to the policy
   – Evaluating the quality of a learned policy

Page 29

Evaluating the quality of a learned policy using the training data

• Traditional methods for constructing measures of confidence require differentiability (to assess the variation in the policy from training set to training set).

• The mean outcome following use of a policy (the value of the policy) is a non-differentiable function of the policy.

Page 30

Example: Single Stage (k=1)

• Find a prediction interval for the mean outcome if a particular estimated policy (here one decision rule) is employed.

• Action A is binary in {-1,1}.

• Suppose the decision rule is of the form d(O1) = sign(βT O1).

• We do not assume the Bayes decision boundary is linear.

Page 31

Single Stage (k=1)

Mean outcome following this policy is

V(β) = E[ 1{A = sign(βT O1)} Y / p(A | O1) ]

where p(A | O1) is the randomization probability.
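
A sketch of the corresponding plug-in estimator on randomized-trial data (illustrative names; it assumes, as on the slide, that A was randomized with known probability p(A | O1)):

    # Sketch: inverse-probability-weighted estimate of V(beta), the mean
    # outcome under the rule d(o) = sign(beta' o), from trial data.
    import numpy as np

    def value_ipw(beta, O1, A, Y, prob_A):
        """O1: n x p observations; A: n actions in {-1, 1}; Y: n outcomes;
        prob_A: known randomization probabilities p(A_i | O1_i)."""
        follows_rule = (A == np.sign(O1 @ beta)).astype(float)
        return float(np.mean(follows_rule * Y / prob_A))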

Page 32

Prediction Interval for the Value of the Estimated Decision Rule

Two problems

• V(β) is not necessarily smooth in β.

• We don’t know V, so V must be estimated as well. The data set is small, so overfitting is a problem.

Page 33

Similar Problem in Classification

Misclassification rate for a given decision rule (classifier), where V is defined by

V(β) = P( A ≠ sign(βT O1) )

(A is the {-1,1} classification; O1 is the observation; βT O1 is a linear classification boundary)
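
For reference, the classification-side quantity is estimated by an empirical misclassification rate; a minimal sketch (illustrative names):

    # Sketch: empirical misclassification rate of the linear rule
    # sign(beta' O1), the classification analogue of V(beta).
    import numpy as np

    def misclassification_rate(beta, O1, A):
        """A: true {-1, 1} labels; O1: n x p observations."""
        return float(np.mean(A != np.sign(O1 @ beta)))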

Page 34

Jittering

V(β) is non-smooth in β.

Toy Example: The unknown Bayes classifier has a quadratic decision boundary. We fit, by least squares, a linear decision boundary

f(o) = sign(β0 + β1 o)
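
A small sketch of this toy example (the particular quadratic data-generating model below is invented for illustration): refitting the linear rule by least squares on repeated small samples shows how the fitted rule jumps ("jitters") from one training set to the next.

    # Sketch: the Bayes classifier has a quadratic boundary, but we fit a
    # linear rule f(o) = sign(b0 + b1*o) by least squares; the coefficients
    # jitter across training sets, especially for small N.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample(n):
        o = rng.uniform(-2.0, 2.0, size=n)
        p = 1.0 / (1.0 + np.exp(-(o**2 - 1.0)))   # P(A = 1 | o), quadratic in o
        a = np.where(rng.uniform(size=n) < p, 1.0, -1.0)
        return o, a

    for n in (30, 100):
        for rep in range(3):
            o, a = sample(n)
            X = np.column_stack([np.ones(n), o])
            b0, b1 = np.linalg.lstsq(X, a, rcond=None)[0]   # least-squares fit
            print(f"N={n} rep={rep}: b0={b0:+.2f}, b1={b1:+.2f}")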

Page 35

[Plots: jittering across training sets, shown for N=100 and N=30.]

Page 36

Simulation Example

• Data Sets from the UCI repository

• Use squared error loss to form classification rule

• Sample 30 examples from each data set; for each sample construct prediction interval. Assess coverage using remaining examples.

• Repeat 1000 times
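
A sketch of that protocol, with a naive percentile bootstrap standing in for the intervals being assessed (data loading and the adjusted method from the next slide's table are omitted; names are illustrative; X is assumed to include an intercept column):

    # Sketch of the coverage experiment: sample 30 training examples, fit a
    # linear rule by least squares, bootstrap a "95%" percentile interval
    # for its error, and check it against the error on held-out examples.
    import numpy as np

    rng = np.random.default_rng(0)

    def fit(X, a):        # least-squares coefficients
        return np.linalg.lstsq(X, a, rcond=None)[0]

    def err(b, X, a):     # misclassification rate of sign(X b)
        return float(np.mean(a != np.sign(X @ b)))

    def coverage(X, a, n_train=30, reps=1000, B=500):
        n, hits = len(a), 0
        for _ in range(reps):
            train = rng.choice(n, n_train, replace=False)
            test = np.setdiff1d(np.arange(n), train)
            Xtr, atr = X[train], a[train]
            b_hat = fit(Xtr, atr)
            boot = []
            for _ in range(B):                    # naive percentile bootstrap
                j = rng.choice(n_train, n_train, replace=True)
                boot.append(err(fit(Xtr[j], atr[j]), Xtr, atr))
            lo, hi = np.percentile(boot, [2.5, 97.5])
            hits += int(lo <= err(b_hat, X[test], a[test]) <= hi)
        return hits / reps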

Page 37

“95% Prediction Intervals”

Data Set     Percentile Bootstrap   Adjusted Bootstrap
Ionosphere   .61                    .93
Heart        .41                    .98
Simulated    .83                    .72

Confidence rate should be ≥ .95

Page 38

Prediction Interval for the Value of the Estimated Decision Rule

Our method obtains a prediction interval for a smooth upper bound on the value of the estimated rule; the bound is built from the training error.

Page 39

Prediction Interval for the Value of the Estimated Decision Rule

The upper bound is taken over the set of β close to the estimate in terms of squared error loss. Form a percentile bootstrap interval for this smooth upper bound.

• This method is generally too conservative
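
Only the general shape of this construction is recoverable from the slide; the following is a rough sketch under that reading (the tolerance, the candidate set, and the lack of smoothing are placeholders, not the method's actual definitions):

    # Rough sketch: bound the error of the fitted rule by the largest error
    # over candidate rules whose squared-error training loss is within a
    # tolerance of the fitted rule's, then percentile-bootstrap that bound.
    import numpy as np

    rng = np.random.default_rng(0)

    def sq_loss(b, X, a):
        return float(np.mean((a - X @ b) ** 2))

    def err(b, X, a):
        return float(np.mean(a != np.sign(X @ b)))

    def upper_bound(X, a, candidates, tol=0.05):
        b_hat = np.linalg.lstsq(X, a, rcond=None)[0]
        near = [b for b in candidates
                if sq_loss(b, X, a) <= sq_loss(b_hat, X, a) + tol]
        return max(err(b, X, a) for b in near + [b_hat])

    def percentile_interval(X, a, candidates, B=500):
        n = len(a)
        stats = [upper_bound(X[j], a[j], candidates)
                 for j in (rng.choice(n, n, replace=True) for _ in range(B))]
        return np.percentile(stats, [2.5, 97.5])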

Page 40

“95% Prediction Intervals”

Data Set            CUD         CV          Inverse Binomial
Ionosphere (width)  1.00 (.4)   .99 (.5)    .75 (.3)
Heart (width)       1.00 (.5)   1.00 (.4)   .46 (.4)
Simulated (width)   .99 (.5)    .76 (.5)    .95 (.4)

Confidence rate should be ≥ .95

Page 41

A Challenge!

Methods for constructing the policy (or classifier) and providing an evaluation of the policy (or classifier) must use the same small data set.

How might you better address this problem?

Page 42

Discussion

1) Equivalent Actions: Need to know when a subset of actions is equivalent, that is, when there is no or little evidence to contradict this equivalence.

2) Evaluating the usefulness of a particular variable in the learned policy.

3) Methods for producing composite rewards.
   – High quality elicitation of functionality

4) Feature construction for decision making in addition to prediction

Page 43

This seminar can be found at:

http://www.stat.lsa.umich.edu/~samurphy/seminars/Benelearn08.ppt

Email me with questions or if you would like a copy:

[email protected]

Page 44

Unknown, Unobserved Causes (Incomplete Mechanistic Models)

[Diagram: "Maturity/Unknown Causes" and the "Decision to join 'Adult' Society", with + and - links, sit above the sequence Binge Drinking (Yes) → Counseling on Health Consequences (Time 2, Yes/No) → Binge Drinking (Yes/No) → Sanctions + counseling (Time 3, Yes/No) → Functionality.]

Page 45

Unknown, Unobserved Causes (Incomplete Mechanistic Models)

[Diagram: "Maturity/Unknown Causes" and the "Decision to join 'Adult' Society", with + and - links, sit above the sequence Binge Drinking (Yes) → Counseling on Health Consequences (Time 2, Yes/No) → Binge Drinking (Yes/No) → Sanctions + counseling (Time 3, Yes/No) → Functionality.]

Page 46

Questions

Patient Evaluation Screen with MSE

