Computational Neuromodulation
Peter Dayan Gatsby Computational Neuroscience Unit
University College London
Nathaniel Daw Sham Kakade Read Montague
John O’Doherty Wolfram Schultz Ben Seymour
Terry Sejnowski Angela Yu
5. Diseases of the Will
• Contemplators
• Bibliophiles and Polyglots
• Megalomaniacs
• Instrument addicts
• Misfits
• Theorists
Theorists
There are highly cultivated, wonderfully endowed minds whose wills suffer from a particular form of lethargy. Its undeniable symptoms include a facility for exposition, a creative and restless imagination, an aversion to the laboratory, and an indomitable dislike for concrete science and seemingly unimportant data… When faced with a difficult problem, they feel an irresistible urge to formulate a theory rather than question nature.
As might be expected, disappointments plague the theorist…
Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space, time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
Conditioning
• Ethology
• Psychology
– classical/operant conditioning
• Computation
– dynamic programming
– Kalman filtering
• Algorithm
– TD/delta rules
• Neurobiology
– neuromodulators; amygdala; OFC; nucleus accumbens; dorsal striatum
prediction: of important events (policy evaluation)
control: in the light of those predictions (policy improvement)
Dopamine
[Figure: dopamine cell recordings (Schultz et al) in three conditions: no prediction; prediction, reward; prediction, no reward]
• drug addiction, self-stimulation
• effect of antagonists
• effect on vigour
• link to action
• ‘scalar’ signal
Prediction, but What Sort?
• Sutton: predict sum future reward
$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
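A minimal sketch of how the TD error behaves over learning, assuming a tabular value with one state per time step after cue onset. The trial layout, the names `run_trial`, `cue_t`, `reward_t`, and the learning rate are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def run_trial(w, cue_t, reward_t, n_steps, alpha, rewarded=True):
    """One trial of tabular TD(0). w[k] is the value k steps after cue onset (assumed layout)."""
    r = np.zeros(n_steps)
    if rewarded:
        r[reward_t] = 1.0

    def value(t):
        # Time steps before the cue (and after the trial) carry no prediction.
        k = t - cue_t
        return w[k] if 0 <= k < len(w) else 0.0

    delta = np.zeros(n_steps)
    for t in range(n_steps):
        delta[t] = r[t] + value(t + 1) - value(t)   # TD error
        k = t - cue_t
        if 0 <= k < len(w):
            w[k] += alpha * delta[t]                # delta rule on the visited state
    return delta

n_steps, cue_t, reward_t, alpha = 20, 5, 15, 0.3    # hypothetical trial timing
w = np.zeros(n_steps - cue_t)

first = run_trial(w, cue_t, reward_t, n_steps, alpha)       # naive: error at the reward
for _ in range(500):
    run_trial(w, cue_t, reward_t, n_steps, alpha)
trained = run_trial(w, cue_t, reward_t, n_steps, alpha)     # error on the step into the cue
omitted = run_trial(w.copy(), cue_t, reward_t, n_steps, alpha, rewarded=False)
```

In this sketch the error sits at the reward on the first trial, moves to the step leading into the cue after training, and turns negative at the expected reward time when the reward is withheld, which is the qualitative pattern of the three recording conditions.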
Rewards rather than Punishments
[Figure: dopamine cells in VTA/SNc (Schultz et al) in three conditions (no prediction; prediction, reward; prediction, no reward), with traces for V(t) and the TD error]
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
Prediction, but What Sort?
• Sutton: predict sum future reward
$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
• Watkins: policy evaluation
$V(x) = E_{a \sim \pi(x)}\left[ r(x,a) + \sum_y P_{xy}(a)\, V(y) \right]$
Policy Improvement
• Sutton: define $\pi(x; M)$, do R-M on:
$E_{a \sim \pi(x; M)}\left[ r(x,a) + \sum_y P_{xy}(a)\, V(y) \right] - V(x)$
uses the same TD error $\delta(t)$
• Watkins: value iteration with $Q(x,a)$
$Q^*(x,a) = r(x,a) + \sum_y P_{xy}(a) \max_b Q^*(y,b)$
$\delta_Q(t) = r(t) + \max_b Q(t+1,b) - Q(t,a)$
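A minimal sketch of the Watkins-style update, using the error r(t) + max_b Q(t+1,b) − Q(t,a) as the learning signal. The two-step task in `TRANSITIONS`, the learning rate, and the purely random behaviour policy are hypothetical choices made only for illustration.

```python
import random

# Hypothetical two-step task: TRANSITIONS[state][action] = (reward, next_state);
# next_state None marks the end of the episode.
TRANSITIONS = {
    0: {0: (1.0, None), 1: (0.0, 1)},
    1: {0: (3.0, None), 1: (0.0, None)},
}

def q_learning(n_episodes=500, alpha=0.5, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(n_episodes):
        s = 0
        while s is not None:
            a = rng.choice(list(TRANSITIONS[s]))       # purely exploratory behaviour
            r, s_next = TRANSITIONS[s][a]
            best_next = max(Q[s_next].values()) if s_next is not None else 0.0
            delta = r + best_next - Q[s][a]            # Q-learning prediction error
            Q[s][a] += alpha * delta
            s = s_next
    return Q

Q = q_learning()
greedy = {s: max(Q[s], key=Q[s].get) for s in Q}       # improved policy read off Q
```

Because the error uses the maximum over next actions, the learned values approach the optimal ones even though behaviour here is random; in this toy task the greedy policy forgoes the immediate reward of 1 in state 0 to collect 3 one step later.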
Active Issues
• exploration/exploitation
• model-based (PFC)/cached (striatal) methods
• motivational influences
• vigour
• hierarchical control (PFC)
• hyperbolic discounting, Pavlovian misbehavior and ‘the will’
• representational learning
• appetitive/aversive opponency
• links with behavioural economics
Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space, time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
– exploration/exploitation trade-offs
Uncertainty
Computational functions of uncertainty:
• weaken top-down influence over sensory processing
• promote learning about the relevant representations
We focus on two different kinds of uncertainties:
• expected uncertainty from known variability or ignorance (ACh)
• unexpected uncertainty due to gross mismatch between prediction and observation (NE)
Norepinephrine
• vigilance
• reversals
• modulates plasticity? exploration?
• scalar
Aston-Jones: Target Detection
detect and react to a rare target amongst common distractors
• elevated tonic activity for reversal
• activated by rare target (and reverses)
• not reward/stimulus related? more response related?
Vigilance Task
• variable time in start
• η controls confusability
• one single run
• cumulative is clearer
• exact inference
• effect of 80% prior
Phasic NE
• onset response from timing uncertainty (SET)
• growth as P(target)/0.2 rises
• act when P(target)=0.95
• stop if P(target)=0.01
• arbitrarily set NE=0 after 5 timesteps (small prob of reflexive action)
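The thresholds above can be illustrated with a small sequential-inference sketch. It assumes a much simpler observation model than the slides' task: one binary observation per time step, with η as the probability that the observation matches the true stimulus, and no start state. The names `posterior_update` and `run_trial` are hypothetical; the prior of 0.2, the thresholds 0.95 and 0.01, and η = 0.675 are the values mentioned in the slides.

```python
import random

def posterior_update(p_target, obs, eta):
    """Bayes' rule for one binary observation; eta = P(obs=1 | target) = P(obs=0 | distractor)."""
    like_target = eta if obs == 1 else 1.0 - eta
    like_distractor = 1.0 - eta if obs == 1 else eta
    evidence = like_target * p_target + like_distractor * (1.0 - p_target)
    return like_target * p_target / evidence

def run_trial(is_target, rng, eta=0.675, prior=0.2, act_at=0.95, stop_at=0.01, max_steps=1000):
    p = prior
    ratio_trace = []                       # P(target) / prior, the quantity tracked on the slide
    for step in range(1, max_steps + 1):
        p_one = eta if is_target else 1.0 - eta
        obs = 1 if rng.random() < p_one else 0
        p = posterior_update(p, obs, eta)
        ratio_trace.append(p / prior)
        if p >= act_at:                    # confident enough to act
            return "respond", step, ratio_trace
        if p <= stop_at:                   # confident it is a distractor
            return "withhold", step, ratio_trace
    return "undecided", max_steps, ratio_trace

rng = random.Random(0)
outcome, n_steps, ratio_trace = run_trial(is_target=True, rng=rng)
one_step = posterior_update(0.2, 1, 0.675)
```

Lowering η towards 0.5 makes each observation less informative, so in this sketch evidence has to accumulate over more steps before either threshold is reached.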
Four Types of Trial
[Figure: four trial types, labelled 19%, 1.5%, 1% and 77%]
fall is rather arbitrary
Response Locking
slightly flatters the model – since no further response variability
Interrupts/Resets (SB)
LC
PFC/ACC
Active Issues
• approximate inference strategy
• interaction with expected uncertainty (ACh)
• other representations of uncertainty
• finer gradations of ignorance
Computational Neuromodulation
• general: excitability, signal/noise ratios
• specific: prediction errors, uncertainty signals
Learning and Inference
• Learning: predict; control
Δ weight ∝ (learning rate) × (error) × (stimulus)
– dopamine: phasic prediction error for future reward
– serotonin: phasic prediction error for future punishment
– acetylcholine: expected uncertainty boosts learning
– norepinephrine: unexpected uncertainty boosts learning
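One way to see how uncertainty can set the learning rate in the rule Δ weight ∝ (learning rate) × (error) × (stimulus) is a scalar Kalman filter, in which the gain grows with the current uncertainty about the weight. This is a generic sketch of learning as filtering; identifying the variance with any particular neuromodulator is an assumption for illustration only, and `kalman_step`, `obs_noise`, and `drift` are hypothetical names and values.

```python
def kalman_step(w, var, stimulus, outcome, obs_noise=1.0, drift=0.01):
    """One trial of a scalar Kalman filter estimating the weight w."""
    var = var + drift                                   # weight may have drifted since the last trial
    error = outcome - w * stimulus                      # prediction error
    gain = var * stimulus / (var * stimulus ** 2 + obs_noise)   # (learning rate) x (stimulus)
    w = w + gain * error                                # delta-rule form of the update
    var = (1.0 - gain * stimulus) * var                 # observing the outcome reduces uncertainty
    return w, var, gain

# Same stimulus and outcome, different prior uncertainty about the weight.
_, var_after, gain_uncertain = kalman_step(w=0.0, var=1.0, stimulus=1.0, outcome=1.0)
_, _, gain_confident = kalman_step(w=0.0, var=0.1, stimulus=1.0, outcome=1.0)
```

With identical errors, the more uncertain learner takes the larger step, and its uncertainty shrinks after the observation.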
Learning and Inference
[Figure: model schematic. Labels: context; top-down processing; cortical processing; bottom-up processing; sensory inputs; variables z, y, x; ACh: expected uncertainty; NE: unexpected uncertainty; prediction, learning, ...]
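Temporal Difference Prediction Error
[Figure: task structure leading to High Pain or Low Pain, with transition probabilities 0.8, 0.2 and 1.0]
predict sum future pain:
$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
Δ weight ∝ (learning rate) × (error) × (stimulus)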
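Temporal Difference Prediction Error
[Figure: task structure leading to High Pain or Low Pain, with transition probabilities 0.8, 0.2 and 1.0, annotated with Value and Prediction error]
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$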
Temporal Difference Prediction Error
experimental sequence: A – B – HIGH, C – D – LOW, C – B – HIGH, A – B – HIGH, A – D – LOW, C – D – LOW, A – B – HIGH, A – B – HIGH, C – D – LOW, C – B – HIGH, …
[Figure: the sequence passes through the TD model to give a prediction error, and through the MR scanner to give brain responses; the two are compared (?)]
Ben Seymour; John O’Doherty
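A minimal sketch of how a TD model could turn the experimental sequence on this slide into trial-by-trial prediction errors. The pain magnitudes (HIGH = 1, LOW = 0), the learning rate, and the function name `td_regressors` are assumptions for illustration; the slides do not give the parameters actually used.

```python
# Trial list transcribed from the slide: (first cue, second cue, outcome).
SEQUENCE = [
    ("A", "B", "HIGH"), ("C", "D", "LOW"), ("C", "B", "HIGH"), ("A", "B", "HIGH"),
    ("A", "D", "LOW"), ("C", "D", "LOW"), ("A", "B", "HIGH"), ("A", "B", "HIGH"),
    ("C", "D", "LOW"), ("C", "B", "HIGH"),
]
PAIN = {"HIGH": 1.0, "LOW": 0.0}    # assumed magnitudes
ALPHA = 0.5                         # assumed learning rate

def td_regressors(sequence, alpha=ALPHA):
    V = {cue: 0.0 for cue in "ABCD"}
    errors = []
    for cue1, cue2, outcome in sequence:
        d_onset = V[cue1]                      # nothing was predicted before the trial
        d_cue2 = V[cue2] - V[cue1]             # no pain yet, so r = 0
        d_outcome = PAIN[outcome] - V[cue2]    # trial ends, so V(t+1) = 0
        V[cue1] += alpha * d_cue2
        V[cue2] += alpha * d_outcome
        errors.append((d_onset, d_cue2, d_outcome))
    return V, errors

V, errors = td_regressors(SEQUENCE)
```

Each trial yields three errors (first cue, second cue, outcome); a series of this kind is what would be compared against the measured brain responses. Transitions that break the usual pairing, such as A followed by D, give a negative error at the second cue in this sketch.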
TD prediction error:
ventral striatum
Z=-4 R
Temporal Difference Values
right anterior insula dorsal raphe?
TD Prediction Errors
• computation: dynamic programming and optimal control
• algorithm: ongoing error in predictions of the future
• implementation:
– dopamine: phasic prediction error for reward; tonic punishment
– serotonin: phasic prediction error for punishment; tonic reward
• evident in VTA; striatum; raphe?
• next: action; motivation; addiction; misbehavior
Task Difficulty
• set η=0.65 rather than 0.675
• information accumulates over a longer period
• hits more affected than correct rejections
• timing not quite right
Intra-trial Uncertainty
• phasic NE as unexpected state change within a model
• relative to prior probability; against default
• interrupts (resets) ongoing processing
• tie to ADHD?
• close to alerting (AJ) – but not necessarily tied to behavioral output (onset rise)
• close to behavioural switching (PR) – but not DA
• farther from optimal inference (EB)
• phasic ACh: aspects of known variability within a state?
Where Next
• dopamine
– tonic release and vigour
– appetitive misbehaviour and hyperbolic discounting
– actions and habits
– psychosis
• serotonin
– aversive misbehaviour and psychiatry
• norepinephrine
– stress, depression and beyond
Experimental Data
ACh & NE have similar physiological effects:
• suppress recurrent & feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
• enhance thalamocortical transmission (e.g. Gil et al, 1997)
• boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)
ACh & NE have distinct behavioral effects:
• ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
• NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)
Model Schematics
[Figure: model schematic. Labels: context; top-down processing; cortical processing; bottom-up processing; sensory inputs; variables z, y, x; ACh: expected uncertainty; NE: unexpected uncertainty; prediction, learning, ...]
Attention
attentional selection for (statistically) optimal processing, above and beyond the traditional view of resource constraint
Example 1: Posner’s Task
[Figure: trial timeline (cue, target, response) with intervals 0.2-0.5s, 0.1s, 0.1s, 0.15s; cue, stimulus location and sensory input shown for high-validity and low-validity cues] (Phillips, McAlonan, Robb, & Brown, 2000)
generalize to the case that cue identity changes with no notice
Formal Framework
cues: vestibular, visual, ...: $c_1, c_2, c_3, c_4$
target $S$: stimulus location, exit direction...
Sensory Information
• ACh: variability in quality of relevant cue
• NE: variability in identity of relevant cue
avoid representing full uncertainty
t1 t1
it ttt DP )|(*
1
1)|(*
hDijP ttt
Simulation Results: Posner’s Task
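[Figure: validity effect (VE) in Posner’s task. Experimental data (Phillips, McAlonan, Robb, & Brown, 2000): validity effect against concentration of nicotine and of scopolamine. Model: validity effect against ACh level, increased (100, 120, 140% of normal level) and decreased (100, 80, 60% of normal level).]
$\mathrm{VE} \propto (1 - \mathrm{NE})(1 - \mathrm{ACh})$
• vary cue validity → vary ACh
• fix relevant cue → low NE
[Diagram labels: cues $c_1, c_2, c_3$; target $S$]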
Maze Task
example 2: attentional shift
[Figure: maze with reward, cue 1 and cue 2, before and after the shift. Before: cue 1 relevant, cue 2 irrelevant. After: cue 1 irrelevant, cue 2 relevant.] (Devauges & Sara, 1990)
no issue of validity
Simulation Results: Maze Navigation
• fix cue validity → no explicit manipulation of ACh
• change relevant cue → NE
[Diagram labels: cues $c_1, c_2, c_3$; target $S$]
[Figure: % rats reaching criterion against no. days after shift from spatial to visual task; experimental data (Devauges & Sara, 1990) and model data]
Simulation Results: Full Model
[Figure, plotted over trials: true & estimated relevant stimuli; neuromodulation in action; validity effect (VE)]
Simulated Psychopharmacology
[Figure: simulations with 50% NE and with 50% ACh/NE]
• ACh compensation
• NE can nearly catch up
Summary
• single framework for understanding ACh, NE and some aspects of attention
• ACh/NE as expected/unexpected uncertainty signals
• experimental psychopharmacological data replicated by model simulations
• implications from complex interactions between ACh & NE
• predictions at the cellular, systems, and behavioral levels
• activity vs weight vs neuromodulatory vs population representations of uncertainty