Human Reward / Stimulus / Response Signal Experiment: Data and Analysis

Draws on:

• Alan and Bill’s experiment

• Usher & McClelland model and experiments

• Patrick Simen’s model

• Sam and Phil’s analysis

• Juan’s further analysis

Human experiment examining the reward bias effect, with the response signal given at different times after target onset

• Target stimuli are rectangles shifted 1, 3, or 5 pixels L or R of fixation

• Reward cue occurs 750 msec before stimulus.

– Small arrow head pointing L or R visible for 250 msec.

– Only biased reward conditions (2 vs 1 and 1 vs 2) are used.

• Response signal occurs at different times after target onset (msec):

0, 75, 150, 225, 300, 450, 600, 900, 1200, 2000

- Participant receives reward only if response is correct and occurs within 250 msec of response signal.

- Participants were run for 15-25 sessions to provide stable data.

- Data shown are from later sessions in which effects were all stable.
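The reward rule described above can be sketched in a few lines. This is only an illustration: the function name `reward_earned`, the point values 2 and 1 (read off the "2 vs 1" conditions), and the reading of "within 250 msec" as a window that opens at the response signal are all assumptions, not details confirmed by the slides.

```python
def reward_earned(correct, rt_after_signal_ms, chose_high_reward,
                  window_ms=250, high_points=2, low_points=1):
    """Points earned on one trial (hypothetical helper).

    A reward is given only if the response is correct AND falls inside
    the window that opens at the response signal (assumed 0..250 msec).
    """
    on_time = 0 <= rt_after_signal_ms <= window_ms
    if not (correct and on_time):
        return 0
    return high_points if chose_high_reward else low_points
```

For example, `reward_earned(True, 180, True)` models a correct, on-time response to the high-reward side.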

A participant with very little reward bias

• Top panel shows probability of response giving larger reward as a function of actual response time for combinations of:

– Stimulus shift (1, 3, or 5 pixels)

– Reward-stimulus compatibility

• Lower panel shows data transformed to z scores, and corresponds to the theoretical construct:

$$z(t) = \frac{\mathrm{mean}(x_1(t) - x_2(t)) + \mathrm{bias}(t)}{\mathrm{sd}(x_1(t) - x_2(t))}$$

where x1 represents the state of the accumulator associated with greater reward, x2 the same for lesser reward, and S (the subject) is thought to choose larger reward if x1(t)-x2(t)+bias(t) > 0.
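The z-score transform used for the lower panel can be sketched as follows. This is a minimal illustration, assuming the transform is the inverse standard normal CDF applied to the proportion of larger-reward choices. The function name `z_score` and the clipping constant `eps` are hypothetical and not part of the original analysis.

```python
from statistics import NormalDist


def z_score(p_high, eps=1e-3):
    """Inverse-normal transform of a choice probability.

    eps (an assumption) clips p away from 0 and 1, where the
    inverse CDF is undefined.
    """
    p = min(max(p_high, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)
```

Clipping matters at very short lags, where a participant may choose the larger reward on every trial and the raw proportion is exactly 1.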

Participants Showing Reward Bias

Analysis Assumptions

• Decision variable x varies as a function of t.

• Choice is made at some time t = signal lag + rt.

• At the time the choice is made:

– For a single difficulty level, two distributions with means +μ and -μ and equal sd σ, set to 1. Choose high reward if decision variable x > -Xc.

– For three difficulty levels, fixed σ = 1 and means μi (i=1,2,3); assume the same Xc for all difficulty levels.

– Xc can be regarded as a positive increment to the state of the decision variable; high reward is chosen if x > 0 in this case.

[Figure: the two distributions, with means -μ and +μ, on an axis from -10 to 10; the criterion is marked at -Xc]

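Only one diff level:

$$Z_H = \mathrm{invNorm}(P(H \mid H)), \qquad Z_L = \mathrm{invNorm}(P(H \mid L))$$

$$\mu = \frac{Z_H - Z_L}{2}, \qquad X_c = \frac{Z_H + Z_L}{2}$$

where P(H | H) and P(H | L) are the probabilities of choosing the high-reward response when the stimulus favors the high-reward or the low-reward response.

Three diff levels:

$$Z_{H,i} = \mathrm{invNorm}(P(H \mid H_i)), \qquad Z_{L,i} = \mathrm{invNorm}(P(H \mid L_i))$$

$$\mu_i = \frac{Z_{H,i} - Z_{L,i}}{2} \quad (i = 1, 2, 3), \text{ with a single } X_c \text{ shared across levels}$$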
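A short sketch of how μ and Xc could be recovered from two choice proportions. It assumes the model stated under Analysis Assumptions: unit-variance Gaussians centered at +μ and -μ, with the high-reward response chosen when x > -Xc, so that P(H | H) = Φ(μ + Xc) and P(H | L) = Φ(-μ + Xc). The function name `mu_and_xc` is hypothetical.

```python
from statistics import NormalDist

_inv_norm = NormalDist().inv_cdf


def mu_and_xc(p_h_given_h, p_h_given_l):
    """Recover (mu, Xc) from the two high-reward choice probabilities.

    p_h_given_h: P(choose high reward | stimulus favors high reward)
    p_h_given_l: P(choose high reward | stimulus favors low reward)
    Assumes sd = 1, so z-scores are already in sd units.
    """
    z_hh = _inv_norm(p_h_given_h)   # equals  mu + Xc under the model
    z_hl = _inv_norm(p_h_given_l)   # equals -mu + Xc under the model
    mu = (z_hh - z_hl) / 2.0
    xc = (z_hh + z_hl) / 2.0
    return mu, xc
```

With three difficulty levels the same function can be applied per level. How the per-level Xc estimates are pooled into one shared value is not specified in the slides and is left open here.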

Subject’s sensitivity, as defined in theory of signal detectability:

$$d'_i = \frac{2\mu_i}{\sigma}$$

When response signal delay varies: $d'_i(t)$

For each subject, fit with function from UM’01:

$$d'^{\,\mathrm{fit}}_i(t) = d'_i \left(1 - e^{-(t - t_0)/\tau}\right), \quad t > t_0$$
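The sensitivity fit can be illustrated with a small function. This sketch assumes the UM’01 function is a shifted exponential approach to asymptote. The parameter names `d_asym` (asymptotic sensitivity), `tau` (time constant), and `t0` (onset time) are hypothetical labels, and no fitting routine is shown.

```python
import math


def d_prime(t, d_asym, tau, t0):
    """Sensitivity at time t (seconds) under a shifted exponential.

    Returns 0 before t0, then rises toward d_asym with time constant tau.
    """
    if t <= t0:
        return 0.0
    return d_asym * (1.0 - math.exp(-(t - t0) / tau))
```

One asymptote per difficulty level with shared `tau` and `t0` would be one way to fit three levels jointly; whether the original fits shared these parameters is not stated.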

[Figure: Subject Sensitivity. Panels plot sensitivity against RT + response cue delay (0 to 2.5), showing data and fits for diff = 5, 3, and 1; further panels have stimulus (diff) level on the x-axis]

Optimal “bias” Xc/σ, based on observed sensitivity data.

Observed “bias”, treated as a positive offset favoring the response associated with high reward.
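As a worked illustration of how an optimal bias follows from sensitivity: assume both stimulus directions are equally likely, rewards are in a 2:1 ratio, and the decision variable is Gaussian with means ±μ and sd 1. Expected reward is then maximized at Xc = ln(2)/(2μ) = ln(2)/d′. The function names below are hypothetical, and the equal-probability assumption is not stated in the slides.

```python
import math
from statistics import NormalDist


def optimal_bias(d_prime, reward_ratio=2.0):
    """Reward-maximizing Xc in sd units, given sensitivity d' = 2*mu."""
    return math.log(reward_ratio) / d_prime


def expected_reward(xc, mu, reward_ratio=2.0):
    """Expected reward per trial when high reward is chosen if x > -xc."""
    nd = NormalDist()
    p_correct_high = nd.cdf(mu + xc)          # stimulus favors high reward
    p_correct_low = 1.0 - nd.cdf(-mu + xc)    # stimulus favors low reward
    return 0.5 * (reward_ratio * p_correct_high + p_correct_low)
```

The formula shows why the optimal bias is large when sensitivity is low (short lags, hard stimuli) and shrinks as sensitivity grows.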

[Figure: panels showing bias over time (x-axis 0 to 2.5), each with a curve labeled “optimal”]

Some possible models

• OU process (λ < 0, σ0 = 0) following F&H, with reward bias effect implemented as:

1. An alteration in initial condition, subject to decay

2. Optimal time-varying decision boundary outside of the OU process

3. An input ‘current’ starting at presentation of reward signal

3.1. Noise from reward onset

3.2. Noise from stimulus onset

4. A constant offset or criterion shift unaffected by time

1. Reward as a change in initial condition, subject to decay

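Note:

1. Effect of the bias decays away for λ < 0.

2. There is a dip at

$$t = \frac{1}{\lambda}\log\frac{aC + \lambda C_0}{aC}$$

3. At t=0, p=1.

[Figure: model prediction vs. Time (s), 0 to 2.5; curves for RSC 1 and RSC 0 at diff 5, 3, and 1]

Feng & Holmes notes

$$\mu(C,t) = C_0 e^{\lambda t} + \frac{aC}{\lambda}\left(e^{\lambda t} - 1\right); \qquad v(t) = \frac{\sigma^2}{2\lambda}\left(e^{2\lambda t} - 1\right)$$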
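A numerical sketch of this first variant. It assumes an OU process dx = (λx + aC)dt + σ dW started at the reward-induced offset C0 with no starting-point variance, and a high-reward choice whenever x > 0. Under those assumptions the mean is C0·e^{λt} + (aC/λ)(e^{λt} - 1), the variance is (σ²/2λ)(e^{2λt} - 1), and the dip time is where mean/sd is smallest. Function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def p_high(t, lam, a, C, c0, sigma):
    """P(choose high reward) at time t > 0 on a reward-compatible trial."""
    mean = c0 * math.exp(lam * t) + (a * C / lam) * (math.exp(lam * t) - 1.0)
    var = sigma ** 2 / (2.0 * lam) * (math.exp(2.0 * lam * t) - 1.0)
    return NormalDist().cdf(mean / math.sqrt(var))


def dip_time(lam, a, C, c0):
    """Time at which mean/sd is minimal; None if there is no dip at t > 0."""
    ratio = 1.0 + lam * c0 / (a * C)
    if not 0.0 < ratio < 1.0:
        return None
    return math.log(ratio) / lam
```

For instance, with the illustrative values λ = -2, a = C = σ = 1, and C0 = 0.3, `dip_time` returns a time a little under half a second, and `p_high` is lower there than shortly before or after.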

2. Time-varying optimal bias (Outside of OU process)

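Note:

1. Effect of the bias persists.

2. There is a dip at

$$t = \frac{1}{\lambda}\log\frac{4a^2C^2 + \lambda\sigma^2\log 2}{4a^2C^2 - \lambda\sigma^2\log 2}$$

3. At t=0, p=1.

4. The smaller the stimulus effect, the larger the bias.

5. The harder the stimulus condition, the later the dip.

$$b(t) = \frac{\sigma^2 \log 2}{4aC}\left(1 + e^{\lambda t}\right)$$

$$\mu(C,t) = b(t) + \frac{aC}{\lambda}\left(e^{\lambda t} - 1\right); \qquad v(t) = \frac{\sigma^2}{2\lambda}\left(e^{2\lambda t} - 1\right)$$

[Figure: model prediction vs. Time (s), 0 to 2.5]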
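A sketch of the time-varying optimal bias, under stated assumptions: the OU state has mean m(t) = (aC/λ)(e^{λt} - 1) and variance v(t) = (σ²/2λ)(e^{2λt} - 1), rewards are 2:1, and the reward-maximizing shift is b(t) = v(t)·ln 2 / (2 m(t)), which simplifies to σ²·ln 2·(1 + e^{λt})/(4aC). The dip time is where (b + m)/sd is smallest. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def optimal_shift(t, lam, a, C, sigma):
    """Reward-maximizing criterion shift b(t) for a 2:1 reward ratio."""
    return sigma ** 2 * math.log(2.0) / (4.0 * a * C) * (1.0 + math.exp(lam * t))


def p_high(t, lam, a, C, sigma):
    """P(choose high reward) at t > 0 on a reward-compatible trial."""
    m = (a * C / lam) * (math.exp(lam * t) - 1.0)
    v = sigma ** 2 / (2.0 * lam) * (math.exp(2.0 * lam * t) - 1.0)
    return NormalDist().cdf((optimal_shift(t, lam, a, C, sigma) + m) / math.sqrt(v))


def dip_time(lam, a, C, sigma):
    """Time minimizing (b + m)/sd; None if no minimum exists at t > 0."""
    k = lam * sigma ** 2 * math.log(2.0)
    ratio = (4.0 * a * a * C * C + k) / (4.0 * a * a * C * C - k)
    if not 0.0 < ratio < 1.0:
        return None
    return math.log(ratio) / lam
```

Halving C in `optimal_shift` doubles the shift, and `dip_time` grows as C shrinks, which matches the two qualitative notes about stimulus strength.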

3.1. Reward acts as input “current”, stays on from reward signal to end of trial, noise starts at reward onset

Reward signal comes T seconds before stimulus

Note:

1. Effect of the bias persists.

2. There is no dip.

3. At t=0, p<1.

Feng & Holmes notes

[Figure: model prediction vs. Time (s), 0 to 2.5]

3.2. Same as 3.1 but variability is introduced only at stimulus onset

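Note:

1. Effect of the bias persists.

2. There is a dip at

$$t = \frac{1}{\lambda}\log\frac{aC + b\,e^{\lambda T}}{aC + b}$$

where b is the reward input and T is the time by which the reward signal leads the stimulus.

3. At t=0, p=1 since all accumulators have no variance.

[Figure: model prediction vs. Time (s), 0 to 2.5]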
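Variants 3.1 and 3.2 differ only in when noise starts, which a single sketch can make concrete. Assumptions: the reward input b switches on T seconds before the stimulus and stays on, the stimulus adds input aC from t = 0, and the OU variance grows from reward onset (3.1) or from stimulus onset (3.2). The symbols b and T and all function names are labels chosen for this illustration.

```python
import math
from statistics import NormalDist


def mean_var(t, lam, a, C, b, T, sigma, noise_from="stimulus"):
    """Mean and variance of the OU state t seconds after stimulus onset.

    noise_from="reward" gives variant 3.1, "stimulus" gives variant 3.2.
    """
    mean = (b / lam) * (math.exp(lam * (t + T)) - 1.0) \
        + (a * C / lam) * (math.exp(lam * t) - 1.0)
    t_noise = t + T if noise_from == "reward" else t
    var = sigma ** 2 / (2.0 * lam) * (math.exp(2.0 * lam * t_noise) - 1.0)
    return mean, var


def p_high(t, **params):
    """P(choose high reward) on a reward-compatible trial."""
    mean, var = mean_var(t, **params)
    if var == 0.0:  # no noise yet: the choice follows the sign of the mean
        return 1.0 if mean > 0 else (0.0 if mean < 0 else 0.5)
    return NormalDist().cdf(mean / math.sqrt(var))


def dip_time_3_2(lam, a, C, b, T):
    """Time at which mean/sd is minimal in variant 3.2."""
    return math.log((a * C + b * math.exp(lam * T)) / (a * C + b)) / lam
```

Calling `p_high(0.0, ..., noise_from="reward")` gives a value below 1, while `noise_from="stimulus"` gives exactly 1, which is the contrast between the two variants at t = 0.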

4. Reward as a constant offset

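Note:

1. Equivalent to 3.2 for large T (a long lead of the reward signal over the stimulus).

2. There is a dip at

$$t = \frac{1}{\lambda}\log\frac{aC}{aC - \lambda C_0}$$

3. At t=0, p=1.

[Figure: model prediction vs. Time (s), 0 to 2.5]

$$\mu(C,t) = C_0 + \frac{aC}{\lambda}\left(e^{\lambda t} - 1\right); \qquad v(t) = \frac{\sigma^2}{2\lambda}\left(e^{2\lambda t} - 1\right)$$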
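The equivalence between a constant offset and a long-running reward input can be checked numerically. The sketch assumes the constant-offset mean is C0 + (aC/λ)(e^{λt} - 1) and that a reward input b switched on T seconds before the stimulus contributes (b/λ)(e^{λ(t+T)} - 1), which tends to -b/λ as T grows when λ < 0. The names below are hypothetical.

```python
import math


def mean_offset(t, lam, a, C, c0):
    """Mean with a constant reward offset c0."""
    return c0 + (a * C / lam) * (math.exp(lam * t) - 1.0)


def mean_current(t, lam, a, C, b, T):
    """Mean with reward input b switched on T seconds before the stimulus."""
    return (b / lam) * (math.exp(lam * (t + T)) - 1.0) \
        + (a * C / lam) * (math.exp(lam * t) - 1.0)


def equivalent_offset(lam, b):
    """Offset that the reward input saturates to when T is large (lam < 0)."""
    return -b / lam
```

With a short lead the two means differ most at stimulus onset, because the input has not yet saturated.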

Some possible models

• OU models (λ < 0, σ0 = 0) following F&H, with reward bias effect implemented as:

1. An alteration in initial condition, subject to decay

2. Optimal time-varying decision boundary outside of the OU process

3. An input ‘current’ starting at presentation of reward signal

3.1. Noise from reward onset

3.2. Noise from stimulus onset

4. A constant offset or criterion shift unaffected by time

• While none fit perfectly, starting point variability (σ0 > 0) would potentially improve 3.2 and 4.

Jay’s favorite mechanistic story (draws from Simen’s model)

• Participant learns to inject waves of activation that prime response accumulators; waves peak just after stimulus onset and have a residual.

– Wave is higher for the high-reward response.

• Stimulus activation accumulates as in LCAM.

• Response signal initiates added drive to both accumulators equally.

• First accumulator to fixed threshold initiates the response.
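A toy simulation can make this story concrete. It is a heavily simplified sketch, not the model itself: the priming wave is reduced to a residual starting activation, the accumulators follow a generic leaky competing update with Euler steps, and every parameter value and name (`simulate_trial`, `prime`, `signal_drive`, and so on) is an assumption made for illustration.

```python
import random


def simulate_trial(stim_input=(0.6, 0.4), prime=(0.2, 0.1), signal_time=0.5,
                   signal_drive=1.0, leak=1.0, inhibition=0.5, noise_sd=0.3,
                   threshold=1.0, dt=0.001, t_max=5.0, rng=None):
    """Race between two leaky, mutually inhibiting accumulators.

    Index 0 is the high-reward response. Returns (choice, time), or
    (None, t_max) if neither accumulator reaches threshold.
    """
    rng = rng or random.Random()
    x = list(prime)  # residual of the priming wave, larger for high reward
    t = 0.0
    while t < t_max:
        # The response signal adds the same drive to both accumulators.
        drive = signal_drive if t >= signal_time else 0.0
        new_x = []
        for i in (0, 1):
            j = 1 - i
            dx = (stim_input[i] + drive - leak * x[i] - inhibition * x[j]) * dt
            dx += noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            new_x.append(max(0.0, x[i] + dx))  # activations stay non-negative
        x = new_x
        t += dt
        if max(x) >= threshold:
            return (0 if x[0] >= x[1] else 1), t
    return None, t
```

With these illustrative values, stimulus input alone cannot reach threshold, so responses occur only after the response signal adds its drive. Passing `rng=random.Random(seed)` makes a noisy run repeatable.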
