Date post: | 04-Jan-2016 |
Upload: | simon-atkinson |
Schedules of Reinforcement and Choice
Simple Schedules
• Ratio
• Interval
• Fixed
• Variable
Fixed Ratio
• CRF = FR1
• Partial/intermittent reinforcement
• Post reinforcement pause
Causes of FR PRP
• Fatigue hypothesis
• Satiation hypothesis
• Remaining-responses hypothesis
– Reinforcer is a discriminative stimulus signaling absence of next reinforcer any time soon
Evidence
• PRP increases as FR size increases
– Does not support satiation
• Multiple FR schedules
– Long and short schedules
– PRP longer if next schedule long, shorter if next one short
• Does not support fatigue
[Figure: multiple FR schedule with FR10 and FR40 components; PRPs labelled L (long) or S (short)]
Fixed Interval
• Also has PRP
• Not remaining responses, though
• Time estimation
• Minimize cost-to-benefit
Variable Ratio
• Steady response pattern
• PRPs unusual
• High response rate
Variable Interval
• Steady response pattern
• Slower response rate than VR
Comparison of VR and VI Response Rates
• Response rate for VR faster than for VI
• Molecular theories
– Small-scale events
– Reinforcement on trial-by-trial basis
• Molar theories
– Large-scale events
– Reinforcement over whole session
IRT Reinforcement Theory
• Molecular theory
• IRT: Interresponse time
• Time between two consecutive responses
• VI schedule
– Long IRT reinforced
• VR schedule
– Time irrelevant
– Short IRT reinforced
Time b/t responses: 1 3 10 4 1 1 1 3 8 8 7 9 9 7 1 1 5 6 1 9 8 5 1 4 1 9 6 3 10 5 …
Time/number for reinforcement: 8 9 5 3 2 3 5 6 6 4 3 6 1 4 5 6 6 8 …
[Figure: two histograms, "Interval" and "Ratio"; x-axis: seconds (1 to 10), y-axis: number]
• Random number generator (mean=5)
• 30 reinforcer deliveries
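The IRT argument can be checked with a small simulation. The sketch below is only an illustration under stated assumptions, not the procedure behind the slide: IRTs are drawn uniformly from 1 to 10 s, VI intervals are exponential with mean 5 s, and the VR schedule is approximated by reinforcing each response with probability 1/5. The function name `mean_reinforced_irt` is hypothetical.

```python
import random


def mean_reinforced_irt(schedule, n_responses=50_000, mean=5.0, seed=1):
    """Return (mean IRT over all responses, mean IRT of reinforced responses)."""
    rng = random.Random(seed)
    # VI only: seconds left until the next reinforcer becomes available.
    time_left = rng.expovariate(1.0 / mean)
    total_irt = 0
    reinforced_irts = []
    for _ in range(n_responses):
        irt = rng.randint(1, 10)  # assumption: IRTs uniform on 1..10 s
        total_irt += irt
        if schedule == "VI":
            time_left -= irt
            if time_left <= 0:  # interval has elapsed, so this response pays off
                reinforced_irts.append(irt)
                time_left = rng.expovariate(1.0 / mean)
        elif schedule == "VR":
            if rng.random() < 1.0 / mean:  # same chance for every response
                reinforced_irts.append(irt)
        else:
            raise ValueError("schedule must be 'VI' or 'VR'")
    return total_irt / n_responses, sum(reinforced_irts) / len(reinforced_irts)


if __name__ == "__main__":
    for schedule in ("VI", "VR"):
        overall, reinforced = mean_reinforced_irt(schedule)
        print(f"{schedule}: mean IRT {overall:.2f} s, "
              f"mean reinforced IRT {reinforced:.2f} s")
```

Under these assumptions the reinforced IRTs on VI come out longer than the average IRT, while on VR they do not differ from it. On VR, time is irrelevant to each response's payoff, so short IRTs simply earn reinforcers sooner.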
Response-Reinforcer Correlation Theory
• Molar theory
• Response-reinforcer relationship across whole experimental session
– Long-range reinforcement outcome
– Trial-by-trial unimportant
• Criticism: too cognitive
[Figure: reinforcers/hour plotted against responses/minute for VI 60 sec and VR 60 schedules; axis ticks at 50 and 100]
Choice
• 2 key/lever protocol
• Ratio-ratio
• Interval-interval
• Typically VI-VI
• CODs (changeover delays)
Matching Law
• B = behaviour (responses)
• R = reinforcement
B1/(B1 + B2) = R1/(R1 + R2)
or
B1/B2 = R1/R2
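As a worked example of the matching relation B1/(B1 + B2) = R1/(R1 + R2), the sketch below computes the predicted share of responses from two reinforcement rates. The function name `matching_proportion` and the rates of 40 and 20 reinforcers per hour are made up for illustration.

```python
def matching_proportion(r1, r2):
    """Predicted share of responses on alternative 1: R1 / (R1 + R2)."""
    if r1 < 0 or r2 < 0 or r1 + r2 == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return r1 / (r1 + r2)


# Hypothetical rates: 40 vs. 20 reinforcers per hour.
share = matching_proportion(40, 20)
print(round(share, 3))  # 0.667
```

With these rates, strict matching predicts two thirds of responses on the first alternative, which is the same as a 2:1 response ratio in the B1/B2 = R1/R2 form.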
Bias
• Spend more time on one alternative than predicted
• Side preferences
• Biological predispositions
• Quality and amount
• Undermatching, overmatching
Qualities and Amounts
• Q1: quality of first reinforcer
• Q2: quality of second reinforcer
• A1: amount of first reinforcer
• A2: amount of second reinforcer
Undermatching
• Most common
• Response proportions less extreme than reinforcement proportions
Overmatching
• Response proportions are more extreme than reinforcement proportions
• Rare
• Found when large penalty imposed for switching
– e.g., barrier between keys
Undermatching/Overmatching
[Figure: two panels, "Undermatching" and "Overmatching", each plotting B1/(B1+B2) against R1/(R1+R2); both axes run from 0 to 1 with 0.5 marked]
Baum’s Variation
B1/B2 = b(R1/R2)^s
• s = sensitivity of behaviour relative to rate of reinforcement
– Perfect matching, s=1
– Undermatching, s<1
– Overmatching, s>1
• b = response bias
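To see how s and b change the prediction, the sketch below evaluates B1/B2 = b(R1/R2)^s and converts the ratio to a response proportion. The function name `generalized_matching` and the 3:1 reinforcement ratio are assumptions chosen for illustration.

```python
def generalized_matching(r1, r2, s=1.0, b=1.0):
    """Return (B1/B2, B1/(B1+B2)) predicted by B1/B2 = b * (R1/R2) ** s."""
    if r1 <= 0 or r2 <= 0:
        raise ValueError("reinforcement rates must be positive")
    ratio = b * (r1 / r2) ** s
    return ratio, ratio / (1.0 + ratio)


# Hypothetical 3:1 reinforcement ratio, no bias (b = 1).
for s in (1.0, 0.5, 2.0):
    ratio, proportion = generalized_matching(3, 1, s=s)
    print(f"s={s}: B1/B2={ratio:.2f}, B1/(B1+B2)={proportion:.2f}")
```

With s=1 the response proportion equals the reinforcement proportion (0.75). With s<1 it is less extreme (undermatching), and with s>1 it is more extreme (overmatching).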
Matching as a Theory of Choice
• Animals match because they are evolved to do so.
• Nice, simple approach, but ultimately wrong.
• Consider a VR-VR schedule
– Exclusively choose one alternative
• Whichever has the lower ratio requirement
– Matching law can’t explain this
Melioration Theory
• Invest effort in “best” alternative
• In VI-VI, partition responses to get best reinforcer:response ratio
– Overshooting the goal; feedback loop
• In VR-VR, keep shifting towards lower schedule; gives best reinforcer:response ratio
• Mixture of responding important over long run, but trial-by-trial responding shifts the balance
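The melioration idea can be sketched as a simple feedback loop: shift a little more responding toward whichever alternative currently has the better reinforcer:response ratio. Everything here is a simplifying assumption rather than a model from the slides: the function name `meliorate`, the fixed step size, the rates used, and the treatment of VI reinforcers as a fixed total that does not depend on how responses are allocated.

```python
def meliorate(local_rate_1, local_rate_2, p=0.5, step=0.01, n_iter=2000):
    """Shift the share of responses p toward the alternative whose local
    reinforcer:response ratio is currently higher; return the final p."""
    lo, hi = 0.001, 0.999  # keep both shares above zero
    for _ in range(n_iter):
        rate_1 = local_rate_1(p)
        rate_2 = local_rate_2(1.0 - p)
        if rate_1 > rate_2:
            p = min(hi, p + step)
        elif rate_2 > rate_1:
            p = max(lo, p - step)
    return p


# VI-VI (assumed): 40 and 20 reinforcers are earned regardless of allocation,
# so the local ratio is reinforcers divided by the share of responses.
vi_vi = meliorate(lambda share: 40.0 / share, lambda share: 20.0 / share)

# VR-VR (assumed VR 10 vs. VR 20): the local ratio is fixed at 1/requirement.
vr_vr = meliorate(lambda share: 1 / 10, lambda share: 1 / 20)

print(f"VI-VI share on alternative 1: {vi_vi:.2f}")
print(f"VR-VR share on alternative 1: {vr_vr:.2f}")
```

In the VI-VI case the share overshoots and corrects around the matching value of 2/3, while in the VR-VR case it keeps shifting until nearly all responding is on the lower ratio schedule.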
Optimization Theory
• Optimize reinforcement over long-term
• Minimum work for maximum gain
• Respond to both choices to maximize reinforcement
Momentary Maximization Theory
• Molecular theory
• Select alternative that has highest value at that moment
• Short-term vs. long-term benefits
Delay-reduction Theory
• Immediate or delayed reinforcement
• Basic principles of matching law, and...
• Choice directed towards whichever alternative gives greatest reduction in delay to next reinforcer
• Molar (matching response:reinforcement) and molecular (control by shorter delay) features
Self-Control
• Conflict between short- and long-term choices
• Choice between small, immediate reward or larger, delayed reward
• Self-control easier if immediate reinforcer delayed or harder to get
Value-Discounting Function
• V = M/(1+KD)
– V = value of reinforcer
– M = reward magnitude
– K = discounting rate parameter
– D = reward delay
• Set M = 10, K = 5
– If D = 0, then V = M/(1+0) = 10
– If D = 10, then V = M/(1+5*10) = 10/51 = 0.196
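The discounting function V = M/(1+KD) is easy to evaluate directly. The sketch below reproduces the two worked values; the function name `discounted_value` is hypothetical.

```python
def discounted_value(magnitude, k, delay):
    """Hyperbolic discounting: V = M / (1 + K * D)."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return magnitude / (1.0 + k * delay)


print(discounted_value(10, 5, 0))             # 10.0
print(round(discounted_value(10, 5, 10), 3))  # 0.196
```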
Reward Size & Delay
• Set M=5, K=5, D=1
– V = 5/(1+5*1) = 5/6 = 0.833
• Set M=10, K=5, D=5
– V = 10/(1+5*5) = 10/26 = 0.385
• To get same V (0.833) with D=5, need to set M = (5/6)(1+5*5) = 21.67
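The magnitude needed to offset a longer delay follows from rearranging V = M/(1+KD) to M = V(1+KD). A minimal sketch, with `magnitude_for_value` as a hypothetical helper name:

```python
def magnitude_for_value(target_value, k, delay):
    """Invert V = M / (1 + K * D) to get the magnitude M that yields V."""
    return target_value * (1.0 + k * delay)


# Value of the small, sooner reward: M=5, K=5, D=1.
target = 5 / (1 + 5 * 1)

# Magnitude needed at D=5 to reach the same value.
print(round(magnitude_for_value(target, 5, 5), 2))  # 21.67
```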
Ainslie-Rachlin Theory
[Figure: value of long-term benefit and short-term benefit plotted against time, with choice points T1 and T2]
• Value of reinforcer decreases as delay b/t choice & getting reinforcer increases
• Choose reinforcer with higher value at the moment of choice
• Ability to change mind; binding decisions
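The change of mind between an early and a late choice point can be illustrated with the same hyperbolic function, V = M/(1+KD). The reward sizes, delivery times, and the two choice times below are hypothetical values picked for illustration, with K=5 as in the earlier examples.

```python
K = 5.0
# (magnitude, delivery time): hypothetical small-sooner and large-later rewards.
SMALL = (5.0, 10.0)
LARGE = (10.0, 15.0)


def value_at(now, reward, k=K):
    """Value of a reward as seen at time `now`: M / (1 + K * delay)."""
    magnitude, delivery_time = reward
    return magnitude / (1.0 + k * (delivery_time - now))


def preferred(now):
    """Name the reward with the higher value at the moment of choice."""
    return "small" if value_at(now, SMALL) > value_at(now, LARGE) else "large"


for now in (0.0, 9.0):  # an early choice point and a late one
    print(f"t={now}: small={value_at(now, SMALL):.3f}, "
          f"large={value_at(now, LARGE):.3f}, choose {preferred(now)}")
```

Far from both rewards the larger one has the higher value, but shortly before the small reward becomes available the preference reverses. A binding decision made at the early point locks in the larger reward.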