Operant Conditioning
Learning & Memory
Arlo Clark-Foos
Instrumental or Operant
• Law of Effect
“operates” on environment to cause an outcome
behavior is “instrumental” in causing outcome
• Priscilla the Fastidious Pig
• Thorndike & Skinner
https://www.youtube.com/watch?v=LSv992Ts6as
Classical vs. Instrumental
• Differences
– Classical
• Reflexive, automatic behavior
• Reinforcement follows CS, regardless of response
– Instrumental
• Voluntary behavior
• Reinforcement only follows the response
• Similarities
– Negative acceleration, blocking, conditioned inhibition, spontaneous recovery, generalization and discrimination…
History of Instrumental Cond.
• Edward Thorndike’s (1898) puzzle boxes
– Initially random acts
– Decrease in time to escape
– Law of Effect (S-R Association)
• “Annoying” vs. “Satisfying” events
• Believed reinforcer is not part of association!
SD → R
Superstitious Behavior
B.F. Skinner (1948) showed that nearly any behavior a pigeon happens to be performing when reinforcement is delivered will increase in frequency.
Belongingness
• Breland & Breland (1961)
– What makes Sammy dance?
• Shettleworth (1975)
“Reinforcing with food only reinforces feeding behaviors”
Learned Helplessness
• Seligman & Maier (1967)
– Dogs and yoked shocks
– Later extended to college students and anagrams
– Also extended to depression
Losing Streaks
Detroit Lions, 2008
Detroit Lions, 2015?
METHODOLOGY
Studying/Observing Instrumental Learning
Willard Small
• 1901: Introduced mazes to animal research
Hampton Court, London
Mazes in Research
• T-Maze
– Alternation learning
– Better at win-shift than win-stay
• Radial Arm Maze
– Random without repetition
– Memory Load: 16+
Mazes in Research
• Morris Water Maze
– Cued (Response) Learning
• Rats can see the platform: S-R Association
– Place Learning
• Platform is below surface: Explicit, cognitive memory
Conditioning Takes Time
• Skinner’s Free Operant Protocol (vs. Discrete Trials)
– Skinner box (automatizing data collection)
• Cumulative recorder (akin to Odometer)
– Secondary Reinforcer
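The odometer analogy for the cumulative recorder can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 10-second bin width, and the press counts are hypothetical, not data from any experiment. The record never decreases, and a steeper rise means a faster response rate.

```python
from itertools import accumulate

def cumulative_record(responses_per_bin):
    """Running total of responses, one value per time bin (like an odometer)."""
    return list(accumulate(responses_per_bin))

def overall_rate(responses_per_bin, bin_seconds):
    """Average responses per second across the whole session."""
    total_time = len(responses_per_bin) * bin_seconds
    return sum(responses_per_bin) / total_time

# Hypothetical session: lever presses counted in 10-second bins.
presses = [0, 2, 1, 0, 3]
record = cumulative_record(presses)  # [0, 2, 3, 3, 6]
```

Flat stretches in the record (here, the fourth bin) correspond to pauses in responding.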
What is Learned?
• Discriminative Stimuli (SD)
SD (light on) → R (press lever) → O (get food)
SD (light off) → R (press lever) → O (no food)
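The light/lever example can be written as a small lookup, which makes explicit that the same response leads to different outcomes depending on the discriminative stimulus. This is a minimal sketch; the table and function names are hypothetical, and treating every unlisted pair as "no food" is an assumption made for illustration.

```python
# Hypothetical table for the light/lever/food example.
CONTINGENCIES = {
    ("light on", "press lever"): "food",
    ("light off", "press lever"): "no food",
}

def outcome(stimulus: str, response: str) -> str:
    """Return the outcome O that follows response R in the presence of stimulus SD."""
    # Assumption: any stimulus/response pair not in the table earns nothing.
    return CONTINGENCIES.get((stimulus, response), "no food")
```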
Habit Slips
(Slips of Action; Reason, 1975)
• Responses (R)
– Lashley’s rats swimming mazes (different motor responses)
• Outcomes (O)
– Reinforcers and Punishments
Shaping Behavior
• Shaping
– Requires skilled trainer
• Physical rehabilitation and language in autism
• Bomb/drug detecting dogs
• Chaining
– Backward chaining
Twiggy
https://www.youtube.com/watch?v=dVfXF8O-lHw
Human Skills and Habits
• Walking
– feedback from vision/muscles?
1. Lashley (1951): RTs > 100ms
• Pianists: 16+ movements per second
2. Damage to sensory feedback
3. Sequencing errors
4. Time to initiate depends on length
Human Skills and Habits
• Motor Programs
– Initiated complete
– General outline, malleable (Schmidt, 1988)
• Skill Acquisition (Anderson, 1982)
1. Cognitive Stage
2. Associative Stage
3. Autonomous Stage
Reinforcers
• Primary
– Food, water, sleep, sex, shelter (temp control)
• Secondary
– Predict arrival of primary
– Token Economies (Conestogas)
• Drive Reduction Theory (Hull, 1943)
– Primary not always reinforcing
• Negative contrast
– Nipple sucking for sugar water
– Lame treats on Halloween
Punishers
• Determinants of effectiveness
1. Punishment leads to more variable behavior
• Hot stove
2. SD can encourage cheating
• Speeding or my dog and Krispy Kreme
3. Concurrent reinforcement
• Class clowns
4. Intensity matters
• Child rearing or criminal justice
Differential Reinforcement of Alternative Behaviors (DRA)
• Cinemark (2011)
Building SD → R → O
• Timing
– Immediate is best
• Criminal Justice, Punishment
• Self Control
– Immediate vs. Delayed Reward
– Diets, Studying, etc.
– Precommitment (SI)
Positive vs Negative Reinforcement
Positive vs Negative Punishment
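The four combinations in these two slide titles follow from two questions: is a stimulus added or removed, and does the behavior become more or less frequent? The sketch below encodes that standard 2×2 scheme; the function name and argument labels are hypothetical.

```python
def classify(stimulus: str, behavior: str) -> str:
    """Name the procedure from what happens to the stimulus and to the behavior.

    stimulus: "added" or "removed" (after the response)
    behavior: "increases" or "decreases" (in future frequency)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior]
    return f"{sign} {kind}"
```

For example, taking away a toy so that hitting becomes less frequent is `classify("removed", "decreases")`, that is, negative punishment.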
Reinforcement Schedules
• Continuous vs. Partial
• Fixed-ratio (FR)
– Postreinforcement pause
• Variable-ratio (VR)
– Slot machine (keep playing)
• Fixed-interval (FI)
– TBPM
• Variable-interval (VI)
– Waiting is the hardest part
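The four partial schedules differ only in the rule that decides whether a given response earns a reinforcer. The Python sketch below states each rule as a small function; it is an illustration under assumptions, not a model of any particular apparatus. In particular, VR is approximated by reinforcing each response with probability 1/mean, VI draws its waiting times from an exponential distribution, and all names are hypothetical.

```python
import random

def fixed_ratio(n):
    """FR: reinforce every n-th response."""
    count = 0
    def respond(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond

def variable_ratio(mean_n, rng):
    """VR (assumed random-ratio form): each response pays off with probability 1/mean_n."""
    def respond(t):
        return rng.random() < 1.0 / mean_n
    return respond

def fixed_interval(seconds):
    """FI: reinforce the first response at least `seconds` after the last reinforcer."""
    last = 0.0
    def respond(t):
        nonlocal last
        if t - last >= seconds:
            last = t
            return True
        return False
    return respond

def variable_interval(mean_seconds, rng):
    """VI (assumed exponential waits): like FI, but the required wait changes each time."""
    last = 0.0
    wait = rng.expovariate(1.0 / mean_seconds)
    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.expovariate(1.0 / mean_seconds)
            return True
        return False
    return respond

def run(schedule, response_times):
    """Feed a sequence of response times to a schedule; True marks a reinforced response."""
    return [schedule(t) for t in response_times]
```

Calling `run(fixed_ratio(3), range(1, 10))` reinforces the third, sixth, and ninth responses. On the ratio schedules the payoff depends on how many responses are made; on the interval schedules extra responses during the wait earn nothing.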
Choosing Between Behaviors
• Concurrent reinforcement schedules
– Football on Saturdays
• Matching Law
– Behavioral Economics (Thaler wins Nobel Prize, 2017)
– Bliss point and Sunfish (observation of behavior)
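In its simplest two-option form, the matching law says that the share of responses given to an option matches the share of reinforcers that option delivers: B1 / (B1 + B2) = R1 / (R1 + R2). The sketch below computes that predicted share; the function name and the example rates are hypothetical.

```python
def predicted_share(r1: float, r2: float) -> float:
    """Matching law, two options: predicted fraction of responses on option 1.

    r1, r2: reinforcement rates earned on each option (same units).
    """
    total = r1 + r2
    if total == 0:
        raise ValueError("at least one option must be reinforced")
    return r1 / total

# Hypothetical example: option 1 pays 40 reinforcers/hour, option 2 pays 20.
share = predicted_share(40, 20)  # two thirds of responses go to option 1
```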
Why do I watch football?
• Behaviors with no primary reinforcers
• Premack Principle (1959)
– Rats with water/wheel, Children with candy/pinball
• For me: Grading/Cleaning
– Response Deprivation Hypothesis
• Illegal Drugs?
BRAIN SUBSTRATES
SD → R
• Basal ganglia
– Dorsal Striatum (caudate nucleus, putamen)
• Receives highly processed sensory info
• Projects to M1
• Lesioned rats fail to learn behaviors in response to stimuli
SD (light) → R (lever press) → O (food)
• Habitual and Automatic Behaviors
– Bike riding, playing instruments, running past food in a maze
R → O
• Prefrontal Cortex
– Orbitofrontal cortex (OPFC)
• Receives sensory input (senses and visceral)
• Projects to dorsal striatum
• Grape juice neurons (Tremblay & Schultz, 1999)
“I want you to want me” by Cheap Trick
• James Olds (1954)
– Electrical current in lateral hypothalamus
• 700 times an hour, physical exhaustion, starvation
• Ventral Tegmental Area (VTA)
– Pleasure center?
– Excitement/anticipation?
– Motivational value
– Projects to SNc
Wanting in the VTA/SNc
• VTA → SNc
– Dopaminergic System
– Incentive Salience Hypothesis
– Working for pleasure (want/drive)
• What if there is no drive (no dopamine)?
• Addiction, cues, and precommitment
Endogenous Opioids
• Exogenous Opiates: Opium, Morphine, Heroin
– May mediate Hedonic value
• Increases liking of other stimuli
• Decreases perception of pain
– Endogenous released in response to primary reinforcers
• Which and how many activated may determine preference
– Nipple Suckers
– Play Halo or Watch Cartoons
Punishment Signaling
• Somatosensory Cortex (S1)
– Nociceptors
• Social Rejection
– Insular Cortex (Insula)
• Dorsal posterior insula
• Degree of activation correlates with magnitude of punisher
– Dorsal Anterior Cingulate Cortex
• Motivational value of punishment
Drug Addiction
• Pathological
– Known harmful consequences
– Concurrent reinforcement
• “Yay drugs” & “Boo withdrawals”
• Dopaminergic System
– Stroke damage to insula can wipe out addiction
“Might as well face it, you’re addicted to love”
• Behavioral Addiction
– Gambling, VR Schedules (Skinner), and Gambler’s Fallacy
– Parkinson’s patients and dopamine agonists
– Cognitive and Behavioral Therapies based on Conditioning
Not All Conditioning is Equal
• Partial Reinforcement Effect
– Partial Reinforcement Extinction Effect (PREE)
• Frustration (Amsel) vs. Sequential (Capaldi) Theories
• Fixed vs. Variable & Ratio vs. Interval
– Child rearing, pet training, gambling, superstition
What explains the PREE?
Frustration Theory (Amsel)
CRF: R+
Extinction: R-
Frustration Punishes Response
Evidence for Frustration:
• Behavior of pigeons
• Children’s tantrums
CRF: R+ R+ R+ R+ R+ R+
• Develop (R-O) expectancy
PRF: R+ R+ R- R+ R- R-
• Develop (R-O) and (R-no O) expectancy
S(frustration)
R → O
What explains the PREE?
Sequential Theory (Capaldi)
Outcome of previous trial serves as a cue for subsequent behavior
PRF: R+ R+ R- R+ R- R-
Fm Fm NFm Fm NFm NFm
• NFm – R (S-R) strengthened by next R+
What happens with long ITI? Decay
• Frustration?
• Memory?
Stronger PREE with long ITI
Complex Behavior
• Response Chaining
– Backward Chaining
– Breaks in the “chain”
– Animal intelligence
Striatum and Skill/Habit
• Caudate, putamen, nucleus accumbens
• Organizes somatosensory representations and motor responses for planning and executing goal-oriented behavior.
Double Dissociation
• Broca vs. Wernicke
Packard et al. (1989)
• Radial Arm Maze (8 arms)
• Win-Stay vs. Win-Shift
Response vs. Place Learning
Habit Learning in Humans
• Parkinson’s Disease
– Impaired dopaminergic system in striatum
• Huntington’s Disease
– Loss of some striatal function
(Gabrieli, 1995)
Weather Prediction Game
• Knowlton et al. (1996)
Weather Prediction Game
• Poldrack et al. (1999)
Neurophysiological Data
• Mink (1996)
– Neurons in striatum fire in anticipation of movement
• Schultz (2006)
– DA Neurons from brain stem into striatum
– Fire with expectation and reception of rewards
• Blocking and expectation
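One common way to connect expectation, reward, and blocking is a prediction-error learning rule. The sketch below uses the Rescorla-Wagner update as an assumed stand-in; the slides do not name a specific model, and the cue labels, learning rate, and trial counts are hypothetical. Each trial's error is the reward received minus the summed prediction of the cues present, and every present cue is adjusted by a fraction of that error.

```python
def rescorla_wagner(trials, alpha=0.2):
    """Run Rescorla-Wagner learning over (cues, reward) trials.

    Returns the final associative weights and the prediction error on each trial.
    """
    weights = {}
    errors = []
    for cues, reward in trials:
        prediction = sum(weights.get(c, 0.0) for c in cues)
        error = reward - prediction          # large when reward is unexpected
        for c in cues:
            weights[c] = weights.get(c, 0.0) + alpha * error
        errors.append(error)
    return weights, errors

# Blocking design: cue A alone is rewarded first, then the compound A+B.
phase1 = [(("A",), 1.0)] * 50
phase2 = [(("A", "B"), 1.0)] * 50
weights, errors = rescorla_wagner(phase1 + phase2)
```

Because A already predicts the reward by the time B appears, the error in the second phase is close to zero and B gains almost no associative strength, which is the blocking pattern.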
Loose Ends
• Addiction and Drug Use
– Dopamine and Reward
• Stress and Memory
– Anxiogenics → Response Strategy (Packard & Wingard, 2004)
• Peripheral or Intra-Basolateral Amygdala (Hippocampus)
• Yohimbine, RS 79948-197, Vehicle (Placebo)
– “Autopilot”