Schedules of Reinforcement and Choice. Simple Schedules Ratio Interval Fixed Variable.

Schedules of Reinforcement and Choice

Simple Schedules

• Ratio

• Interval

• Fixed

• Variable

Fixed Ratio

• CRF = FR1

• Partial/intermittent reinforcement

• Post-reinforcement pause (PRP)

Causes of FR PRP

• Fatigue hypothesis

• Satiation hypothesis

• Remaining-responses hypothesis

– Reinforcer is a discriminative stimulus signaling absence of next reinforcer any time soon

Evidence

• PRP increases as FR size increases

– Does not support satiation

• Multiple FR schedules

– Long and short schedules

– PRP longer if next schedule long, shorter if next one short

• Does not support fatigue

[Figure: multiple FR schedule with FR10 and FR40 components; labels L and S]

Fixed Interval

• Also has PRP

• Not remaining responses, though

• Time estimation

• Minimize cost-to-benefit

Variable Ratio

• Steady response pattern

• PRPs unusual

• High response rate

Variable Interval

• Steady response pattern

• Slower response rate than VR
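
The four simple schedules can be sketched as a small class that decides whether each response earns a reinforcer. This is an illustrative sketch only: the class name `SimpleSchedule`, its `respond` method, and the uniform draw of variable requirements around the mean are assumptions, not something specified in these slides.

```python
import random


class SimpleSchedule:
    """Hypothetical helper: decides whether each response is reinforced."""

    def __init__(self, kind, value, seed=0):
        if kind not in ("FR", "VR", "FI", "VI"):
            raise ValueError(f"unknown schedule kind: {kind}")
        self.kind = kind
        self.value = value            # responses (ratio) or seconds (interval)
        self.rng = random.Random(seed)
        self.count = 0                # responses since the last reinforcer
        self.last_time = 0.0          # time of the last reinforcer, in seconds
        self.requirement = self._next_requirement()

    def _next_requirement(self):
        # Fixed schedules always use the same requirement.
        if self.kind in ("FR", "FI"):
            return self.value
        # Assumption: variable requirements are uniform with mean `value`.
        if self.kind == "VR":
            return self.rng.randint(1, 2 * self.value - 1)
        return self.rng.uniform(0, 2 * self.value)

    def respond(self, now):
        """Register a response at time `now`; return True if it is reinforced."""
        self.count += 1
        if self.kind in ("FR", "VR"):
            met = self.count >= self.requirement           # count-based
        else:
            met = now - self.last_time >= self.requirement  # time-based
        if met:
            self.count = 0
            self.last_time = now
            self.requirement = self._next_requirement()
        return met


# CRF is FR1: every response is reinforced.
crf = SimpleSchedule("FR", 1)
print([crf.respond(t) for t in range(3)])
```

On ratio schedules only the response count matters; on interval schedules only the time since the last reinforcer matters, so extra responses during the interval earn nothing.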

Comparison of VR and VI Response Rates

• Response rate for VR faster than for VI

• Molecular theories

– Small-scale events

– Reinforcement on trial-by-trial basis

• Molar theories

– Large-scale events

– Reinforcement over whole session

IRT Reinforcement Theory

• Molecular theory

• IRT: Interresponse time

• Time between two consecutive responses

• VI schedule

– Long IRT reinforced

• VR schedule

– Time irrelevant

– Short IRT reinforced

Time b/t responses: 1 3 10 4 1 1 1 3 8 8 7 9 9 7 1 1 5 6 1 9 8 5 1 4 1 9 6 3 10 5 …

Time/number for reinforcement: 8 9 5 3 2 3 5 6 6 4 3 6 1 4 5 6 6 8 …

[Figure: two histograms, "Interval" and "Ratio"; x-axis: seconds (1 to 10), y-axis: number]

• Random number generator (mean=5)

• 30 reinforcer deliveries
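
As a rough illustration of the IRT argument, the sketch below draws interresponse times and schedule requirements at random and records which IRTs end in a reinforcer. The function name, the uniform 1 to 10 second IRT range, and the uniform draw of requirements around the mean are assumptions for illustration; this is not the procedure behind the slide's figure.

```python
import random


def simulate_reinforced_irts(schedule, n_reinforcers=30, mean_requirement=5, seed=0):
    """Return (reinforced_irts, all_irts) for an 'interval' or 'ratio' schedule."""
    if schedule not in ("interval", "ratio"):
        raise ValueError("schedule must be 'interval' or 'ratio'")
    rng = random.Random(seed)
    reinforced, all_irts = [], []
    while len(reinforced) < n_reinforcers:
        # Assumption: requirement is uniform on 1..(2*mean - 1), so its mean
        # equals mean_requirement (seconds for interval, responses for ratio).
        requirement = rng.randint(1, 2 * mean_requirement - 1)
        progress = 0  # seconds elapsed or responses made since last reinforcer
        while True:
            irt = rng.randint(1, 10)  # assumed IRT in whole seconds
            all_irts.append(irt)
            progress += irt if schedule == "interval" else 1
            if progress >= requirement:
                reinforced.append(irt)  # the IRT that ended in a reinforcer
                break
    return reinforced, all_irts


def mean(values):
    return sum(values) / len(values)


vi_irts, _ = simulate_reinforced_irts("interval", n_reinforcers=5000, seed=1)
vr_irts, _ = simulate_reinforced_irts("ratio", n_reinforcers=5000, seed=1)
print("mean reinforced IRT, interval:", round(mean(vi_irts), 2))
print("mean reinforced IRT, ratio:   ", round(mean(vr_irts), 2))
```

Under these assumptions the interval schedule tends to reinforce longer-than-average IRTs, while the ratio schedule reinforces IRTs without regard to their length. The sketch does not model the response bursts that would favour short IRTs on VR.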

Response-Reinforcer Correlation Theory

• Molar theory

• Response-reinforcer relationship across whole experimental session

– Long-range reinforcement outcome

– Trial-by-trial unimportant

• Criticism: too cognitive

[Figure: reinforcers/hour (y-axis) against responses/minute (x-axis) for VI 60 sec and VR 60; both axes marked at 50 and 100]
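
The molar relationship between response rate and reinforcement rate can be sketched with two feedback functions. The VR function follows directly from the schedule definition. The VI function is an assumed approximation (responses spread randomly in time, timer restarting when each reinforcer is collected) and is not taken from the slides; both function names are hypothetical.

```python
def vr_reinforcers_per_hour(responses_per_min, ratio=60):
    """VR: one reinforcer per `ratio` responses on average, so rate scales linearly."""
    return responses_per_min * 60 / ratio


def vi_reinforcers_per_hour(responses_per_min, interval_sec=60):
    """VI (assumed approximation): mean time per reinforcer is the scheduled
    interval plus the mean wait for the next response."""
    if responses_per_min <= 0:
        return 0.0
    mean_wait_sec = 60 / responses_per_min
    return 3600 / (interval_sec + mean_wait_sec)


for rate in (10, 50, 100):
    print(rate,
          round(vr_reinforcers_per_hour(rate), 1),
          round(vi_reinforcers_per_hour(rate), 1))
```

In this sketch, doubling the response rate doubles the payoff on VR 60 but barely changes it on VI 60 sec, which stays just under 60 reinforcers per hour.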

Choice

• 2 key/lever protocol

• Ratio-ratio

• Interval-interval

• Typically VI-VI

• Changeover delays (CODs)

Matching Law

• B = behaviour (responses)

• R = reinforcement

B1/(B1 + B2) = R1/(R1 + R2)

or

B1/B2 = R1/R2
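
The matching law's prediction can be computed directly from the reinforcement rates on the two alternatives. A minimal sketch follows; the function names and the example rates (40 and 20 reinforcers) are made up for illustration.

```python
def matching_proportion(r1, r2):
    """Predicted B1 / (B1 + B2) under strict matching: R1 / (R1 + R2)."""
    total = r1 + r2
    if total == 0:
        raise ValueError("at least one alternative must deliver reinforcers")
    return r1 / total


def matching_ratio(r1, r2):
    """Predicted B1 / B2 under strict matching: R1 / R2."""
    if r2 == 0:
        raise ValueError("R2 must be non-zero for the ratio form")
    return r1 / r2


# Hypothetical rates: alternative 1 pays 40 reinforcers, alternative 2 pays 20.
print(matching_proportion(40, 20))  # two thirds of responses go to alternative 1
print(matching_ratio(40, 20))       # 2.0
```

The two forms say the same thing: a proportion of 2/3 on alternative 1 is a 2:1 response ratio.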

Bias

• Spend more time on one alternative than predicted

• Side preferences

• Biological predispositions

• Quality and amount

• Undermatching, overmatching

Qualities and Amounts

• Q1: quality of first reinforcer

• Q2: quality of second reinforcer

• A1: amount of first reinforcer

• A2: amount of second reinforcer

Undermatching

• Most common

• Response proportions less extreme than reinforcement proportions

Overmatching

• Response proportions are more extreme than reinforcement proportions

• Rare

• Found when large penalty imposed for switching– e.g., barrier between keys

Undermatching/Overmatching

[Figure: two panels, "Undermatching" and "Overmatching", each plotting B1/(B1+B2) (y-axis) against R1/(R1+R2) (x-axis); both axes run from 0 to 1 with 0.5 marked]

Baum’s Variation

B1/B2 = b(R1/R2)^s

• s = sensitivity of behaviour relative to rate of reinforcement

– Perfect matching, s=1

– Undermatching, s<1

– Overmatching, s>1

• b = response bias
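
The effect of the sensitivity and bias parameters can be seen by evaluating Baum's form, B1/B2 = b(R1/R2)^s, for a few values. This is a small sketch; the function name and the 3:1 reinforcement ratio are assumptions chosen for illustration.

```python
def generalized_matching_ratio(r1, r2, b=1.0, s=1.0):
    """Predicted B1 / B2 = b * (R1 / R2) ** s."""
    if r2 == 0:
        raise ValueError("R2 must be non-zero")
    return b * (r1 / r2) ** s


# Hypothetical 3:1 reinforcement ratio, no bias (b = 1).
for label, s in (("undermatching", 0.8), ("perfect matching", 1.0), ("overmatching", 1.2)):
    ratio = generalized_matching_ratio(3, 1, s=s)
    print(f"{label:17s} s={s}: B1/B2 = {ratio:.2f}")

# Bias shifts responding even when reinforcement is equal on both sides.
print(generalized_matching_ratio(1, 1, b=2.0))  # 2.0
```

With s<1 the response ratio is closer to 1:1 than the reinforcement ratio; with s>1 it is further from 1:1.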

Matching as a Theory of Choice

• Animals match because they have evolved to do so.

• Nice, simple approach, but ultimately wrong.

• Consider a VR-VR schedule

– Exclusively choose one alternative

• Whichever is lower

– Matching law can’t explain this

Melioration Theory

• Invest effort in “best” alternative

• In VI-VI, partition responses to get best reinforcer:response ratio

– Overshooting the goal; feedback loop

• In VR-VR, keep shifting towards lower schedule; gives best reinforcer:response ratio

• Mixture of responding important over long run, but trial-by-trial responding shifts the balance
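
The melioration idea can be sketched as a feedback loop that keeps shifting responses toward whichever alternative currently pays more per response. The function names, step size, and example schedule values are assumptions; the VI case also assumes that each alternative's reinforcers per hour stay constant however responses are split, which is only approximately true.

```python
def meliorate(local_rate_1, local_rate_2, p=0.5, step=0.01, iterations=2000):
    """Shift the share of responses on alternative 1 (p) toward the alternative
    with the higher reinforcers-per-response; return the final share."""
    eps = 1e-6  # keep p strictly inside (0, 1) to avoid dividing by zero
    for _ in range(iterations):
        r1 = local_rate_1(p)
        r2 = local_rate_2(1.0 - p)
        p += step * (r1 - r2) / (r1 + r2)
        p = min(1.0 - eps, max(eps, p))
    return p


def vi_local_rate(reinforcers_per_hour, responses_per_hour=1000):
    # Assumption: VI payoff per hour is fixed, so payoff per response falls
    # as more responses are allocated to that alternative.
    return lambda share: reinforcers_per_hour / (share * responses_per_hour)


def vr_local_rate(ratio):
    # VR pays 1 reinforcer per `ratio` responses regardless of allocation.
    return lambda share: 1.0 / ratio


p_vi = meliorate(vi_local_rate(40), vi_local_rate(20))
p_vr = meliorate(vr_local_rate(10), vr_local_rate(40))
print("VI-VI share on alternative 1:", round(p_vi, 3))
print("VR-VR share on alternative 1:", round(p_vr, 3))
```

In the VI-VI case the loop settles where both alternatives pay equally per response, which is the matching proportion; in the VR-VR case the lower ratio always pays better, so responding drifts to it exclusively.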

Optimization Theory

• Optimize reinforcement over long-term

• Minimum work for maximum gain

• Respond to both choices to maximize reinforcement

Momentary Maximization Theory

• Molecular theory

• Select alternative that has highest value at that moment

• Short-term vs. long-term benefits

Delay-reduction Theory

• Immediate or delayed reinforcement

• Basic principles of matching law, and...

• Choice directed towards whichever alternative gives greatest reduction in delay to next reinforcer

• Molar (matching response:reinforcement) and molecular (control by shorter delay) features

Self-Control

• Conflict between short- and long-term choices

• Choice between small, immediate reward or larger, delayed reward

• Self-control easier if immediate reinforcer delayed or harder to get

Value-Discounting Function

• V = M/(1+KD)

– V = value of reinforcer

– M = reward magnitude

– K = discounting rate parameter

– D = reward delay

• Set M = 10, K = 5

– If D = 0, then V = M/(1+0) = 10

– If D = 10, then V = M/(1+5*10) = 10/51 = 0.196

Reward Size & Delay

• Set M=5, K=5, D=1

– V = 5/(1+5*1) = 5/6 = 0.833

• Set M=10, K=5, D=5

– V = 10/(1+5*5) = 10/26 = 0.385

• To get same V (0.833) with D=5, need to set M = 0.833*(1+5*5) = 21.67
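
The worked values for V = M/(1+KD) can be checked with a few lines of code. This is a minimal sketch; the function names are hypothetical, and `magnitude_for_value` simply rearranges the same formula to M = V(1+KD).

```python
def discounted_value(magnitude, k, delay):
    """Hyperbolic discounting: V = M / (1 + K * D)."""
    return magnitude / (1 + k * delay)


def magnitude_for_value(target_value, k, delay):
    """Rearranged form: the M needed to reach a given V at delay D."""
    return target_value * (1 + k * delay)


print(discounted_value(10, 5, 0))             # 10.0
print(round(discounted_value(10, 5, 10), 3))  # 0.196
print(round(discounted_value(5, 5, 1), 3))    # 0.833
print(round(discounted_value(10, 5, 5), 3))   # 0.385

# Magnitude needed at D=5 to match the value of M=5 at D=1.
small_soon = discounted_value(5, 5, 1)
print(round(magnitude_for_value(small_soon, 5, 5), 2))  # 21.67
```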

Ainslie-Rachlin Theory

[Figure: curves labelled "long-term benefit" and "short-term benefit"; x-axis: Time, with points T1 and T2 marked]

• Value of reinforcer decreases as delay b/t choice & getting reinforcer increases

• Choose reinforcer with higher value at the moment of choice

• Ability to change mind; binding decisions
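
The "change of mind" can be illustrated by applying the hyperbolic discounting function V = M/(1+KD) to both rewards at an early and a late moment of choice. The reward sizes, delivery times, K = 1, and function names below are assumptions picked for illustration, not values from the slides.

```python
def discounted_value(magnitude, k, delay):
    """Hyperbolic discounting: V = M / (1 + K * D)."""
    return magnitude / (1 + k * delay)


def preferred(now, k=1.0, small=(5, 10), large=(10, 15)):
    """Each reward is (magnitude, delivery_time). Return which reward has the
    higher discounted value when the choice is made at time `now`.
    `now` must not be later than the small reward's delivery time."""
    v_small = discounted_value(small[0], k, small[1] - now)
    v_large = discounted_value(large[0], k, large[1] - now)
    return "small" if v_small > v_large else "large"


print(preferred(now=0))  # large: both rewards are still far away
print(preferred(now=9))  # small: the smaller reward is now almost immediate
```

A binding decision made at the early point locks in the larger reward before the smaller one's value overtakes it.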

Date post:	04-Jan-2016
Category:	Documents
Upload:	simon-atkinson