
ACQ and the Basal Ganglia

Jimmy Bonaiuto

USC Brain Project

6/26/2007


Actor-Critic Learning

• Actor – learns action policy

• Critic – learns value functions

• Different actor-critic architectures have been proposed for learning different value functions:

– V(s) = State values (most common)

– V(a) = Action values

– Q(s,a) = State-action pair values


Actor-Critic Architecture

• Core data – recordings of midbrain dopaminergic neurons in appetitive learning tasks (Schultz, 1992; Schultz, 1998)

(from Barto, 1995)


Critic – V(s), V(a), or Q(s,a)?

• How do dopamine cells know about reward value?

– The largest input to the striatum is from cortex (Haber and Gdowski, 2004)

– V(s) and Q(s,a) learning may require the ventral striatum, SNc, and/or VTA to receive a copy of the same cortical projections that the dorsal striatum receives (state information)

– V(a) may only require a projection from the dorsal striatum or globus pallidus (actor) to the ventral striatum, SNc, and/or VTA (critic)

– The largest forebrain input to dopamine neurons is from the striatum (Haber and Gdowski, 2004)

• V(a) may be more biologically plausible in terms of connectivity


Actor-Critic in the Basal Ganglia

• Dopamine targets (striatum) are the site of value and policy learning (Suri & Schultz, 2001)

• The striatum is split into dorsal and ventral divisions (some say dorsolateral and ventromedial) (Voorn et al., 2004)

– Ventral striatum – inputs from limbic structures (critic?)

– Dorsal striatum – connected with motor and associative cortices (actor?)


Role of Dopamine

• Dopamine neurons are found in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) (Joel & Weiner, 2000)

– VTA projects to ventral striatum – learning state values

– SNc projects to dorsal striatum – policy learning

• Little difference in VTA and SNc firing (Schultz et al., 1993)

– Predicted by the TD learning equation, since the policy and values are both updated using the TD error
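
The point that a single TD error drives both updates can be sketched in a few lines of Python. This is a minimal tabular illustration, not the model from the talk: the function name `td_actor_critic_step`, the dictionary-based tables, the learning rates, and the discount factor `gamma` are all assumptions made for the example.

```python
def td_actor_critic_step(values, prefs, s, a, reward, s_next,
                         gamma=0.9, critic_lr=0.1, actor_lr=0.1):
    """Apply one TD update to a tabular critic and actor.

    values: dict mapping state -> V(s)
    prefs:  dict mapping (state, action) -> action preference
    All names and default constants are illustrative assumptions.
    """
    # TD error: reward plus discounted next-state value, minus current value.
    td_error = reward + gamma * values.get(s_next, 0.0) - values.get(s, 0.0)
    # The critic (state values) and the actor (policy preferences)
    # are both adjusted with the same error signal.
    values[s] = values.get(s, 0.0) + critic_lr * td_error
    prefs[(s, a)] = prefs.get((s, a), 0.0) + actor_lr * td_error
    return td_error


values, prefs = {}, {}
delta = td_actor_critic_step(values, prefs, "cue", "press", 1.0, "end")
```

Because both tables receive the same `td_error`, a recording that reflects this signal would look alike whether it feeds the value update or the policy update, which is the sense in which similar VTA and SNc firing is consistent with TD learning.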


ACQ

• Reinforcement learning should maximize total utility, not necessarily total reward. Motivations map outcomes to utilities (Niv et al., 2006)

• Multiple critics – one for each dimension of interoception (hunger, thirst, etc.)

– Q(s_i, a), s_i = internal state, a = action

• Actor

– Composite policy

    • Desirability – based on internal state

    • Executability – based on environmental state

– Eligibility trace from mirror and canonical motor signals
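
One way to picture multiple critics feeding a composite policy is the Python sketch below. The slide does not say how the pieces are combined, so everything here is an assumption for illustration: desirability is taken as a drive-weighted sum of each critic's Q(s_i, a), the composite priority is the product of desirability and executability, and the names `desirability_from_critics` and `select_action` are hypothetical.

```python
def desirability_from_critics(q_by_drive, drive_levels):
    """Combine per-drive critics into one desirability value per action.

    Assumption: each critic's Q(s_i, a) is weighted by the current level
    of its internal drive and the results are summed.
    """
    actions = next(iter(q_by_drive.values())).keys()
    return {a: sum(drive_levels[d] * q[a] for d, q in q_by_drive.items())
            for a in actions}


def select_action(desirability, executability):
    """Pick the action with the highest combined priority.

    Assumption: priority = desirability * executability, winner takes all.
    """
    priority = {a: desirability[a] * executability.get(a, 0.0)
                for a in desirability}
    return max(priority, key=priority.get)


# Two critics (hunger, thirst) and two candidate actions.
q_by_drive = {"hunger": {"eat": 1.0, "drink": 0.0},
              "thirst": {"eat": 0.0, "drink": 1.0}}
desirability = desirability_from_critics(
    q_by_drive, {"hunger": 0.2, "thirst": 0.9})
choice = select_action(desirability, {"eat": 1.0, "drink": 0.5})
```

With a strong thirst drive the sketch prefers drinking even though it is harder to execute. Setting the executability of `"drink"` to zero switches the choice to `"eat"`, which is the separation of internal and environmental state that the slide describes.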


ACQ – Actor/Multiple Critics

x = executed action

x̂ = recognized action


ACQ - Eligibility Trace

• x = executed action (from efference copy)

• x̂ = recognized action (from mirror system)

Action Outcome    x     x̂     ε
Not Attempted     0.0   0.0    0.0
Unsuccessful      1.0   0.0   -1.0
Unintended        0.0   1.0    1.0
Successful        1.0   1.0    2.0

Idealized situations (perfect recognition). A realistic implementation would have confidence values between 0.0 and 1.0 for x and x̂, but the pattern of values for ε would be the same.

[Equation: ε as a function of x and x̂]
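
The four rows of the table can be reproduced with a short Python function. The bilinear expression used here is an assumption chosen because it matches the table at all four corners; it is not taken from the slide's own equation, and the name `eligibility` is hypothetical.

```python
def eligibility(x, x_hat):
    """Eligibility for one action.

    x:     executed-action signal (efference copy), 0.0 to 1.0
    x_hat: recognized-action signal (mirror system), 0.0 to 1.0

    Assumed bilinear form that reproduces the four idealized table rows.
    """
    return x_hat - x + 2.0 * x * x_hat


outcomes = {
    "not attempted": (0.0, 0.0),
    "unsuccessful": (1.0, 0.0),
    "unintended": (0.0, 1.0),
    "successful": (1.0, 1.0),
}
table = {name: eligibility(x, x_hat)
         for name, (x, x_hat) in outcomes.items()}
```

Graded confidence values between 0.0 and 1.0 can be passed to the same function, which interpolates smoothly between the four idealized cases.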


ACQ - Weight Modification

• Desirability and Executability updated using same eligibility and reinforcement signals

• Requires different weight change rules:

• Desirability: $\Delta W_I(t) \propto \hat{r}(t)\,\Theta\!\left(\varepsilon(t-1)\right)\max_i \hat{x}_i(t)$

– Step function of the eligibility trace, which makes the sign of the weight change depend on r̂(t)

– Don’t update the value of the last action unless some action is currently recognized

• Executability: $\Delta W_E(t) \propto \left(d + \hat{r}(t)\right)\varepsilon(t)$

– Tonic dopamine level, d, is added to the TD error, which makes the sign of the weight change depend on ε(t)
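
The two weight change rules described above can be sketched in Python as follows. This is a hedged illustration rather than the talk's implementation: the function names, the learning rate `lr`, the default tonic level, the step threshold at zero, and the use of the previous step's eligibility for desirability are all assumptions.

```python
def executability_delta(td_error, eligibility, tonic_dopamine=0.5, lr=0.1):
    """Weight change whose sign follows the eligibility trace.

    Assumes the tonic level is larger than the magnitude of the TD error,
    so (tonic_dopamine + td_error) stays positive.
    """
    return lr * (tonic_dopamine + td_error) * eligibility


def desirability_delta(td_error, prev_eligibility, recognized, lr=0.1):
    """Weight change whose sign follows the TD error.

    recognized: recognition confidences for all actions at the current step.
    """
    # Step function of the trace (threshold at zero is an assumption).
    step = 1.0 if prev_eligibility > 0.0 else 0.0
    # Gate: zero when no action is currently recognized.
    gate = max(recognized, default=0.0)
    return lr * td_error * step * gate
```

In this sketch an unsuccessful attempt (negative eligibility) lowers executability even when the TD error is positive, while desirability only changes for an eligible action and only moves in the direction of the TD error.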


Multiple Critics – Q(s_i, a)

• Is there evidence for multiple critics gated by interoceptive information?

– The lateral hypothalamus does project to the SNc, VTA, and the ventral striatum (Saper et al., 1979; Fadel & Deutch, 2002; Brog et al., 1993)

– The accumbens shell of the ventral striatum is reciprocally connected with the lateral hypothalamus and has been called a “sensory sentinel” or “visceral striatum” (Kelley, 1999, 2004)

– Motivational state, such as food deprivation, can influence the magnitude of dopamine release in the ventral striatum (Wilson et al., 1995; Ahn & Phillips, 1999)

– Sexual satiety is signaled by serotonin from the lateral hypothalamus to the ventral striatum, which reduces dopamine levels (Lorrain et al., 1999)


Internal State-Dependent Policy

• Is there evidence for internal state-dependent policies? (Kelley et al., 2005)

– Information from the lateral hypothalamus reaches the dorsal striatum through the paraventricular nucleus

– Hypothalamic-midline thalamic-striatal projections carry internal state information to cholinergic interneurons of the dorsal striatum

    • These are thought to modulate dorsal striatal output neurons


Eligibility Trace from the Mirror System

• What is the evidence for an eligibility signal from mirror neurons?

– People can implicitly learn sequences through action observation (Bird et al., 2005)

– The striatum is consistently implicated in implicit sequence learning, and the magnitude of activation is correlated with reaction time improvement (Rauch et al., 1997, 1998)

– The basal ganglia are active during action observation (Frey & Gerry, 2006)

– Projection from ventral premotor cortex (including the arcuate sulcus) to dorsal and ventral striatum in the macaque (McFarland & Haber, 2000)


References

• Ahn S, Phillips AG (1999) Dopaminergic Correlates of Sensory-Specific Satiety in the Medial Prefrontal Cortex and Nucleus Accumbens of the Rat. The Journal of Neuroscience, 19:RC29:1-6.

• Bird G, Osman M, Saggerson A, Heyes C (2005) Sequence learning by action, observation and action observation. British Journal of Psychology, 96: 371–388.

• Brog JS, Salyapongse A, Deutch AY, Zahm DS (1993) The patterns of afferent innervation of the core and shell in the Accumbens part of the rat ventral striatum: Immunohistochemical detection of retrogradely transported fluoro-gold. The Journal of Comparative Neurology, 338(2): 255-278.

• Fadel J, Deutch AY (2002) Anatomical Substrates of Orexin-Dopamine Interactions: Lateral hypothalamic projections to the ventral tegmental area. Neuroscience, 111(2): 379-387.

• Frey SH, Gerry VE (2006) Modulation of Neural Activity during Observational Learning of Actions and Their Sequential Orders. The Journal of Neuroscience, 26(51):13194-13201.

• Haber SN, Gdowski MJ (2004) The basal ganglia. In: The human nervous system (Paxinos G, Mai JK, eds) Ed 2 pp. 676–738. New York: Elsevier Academic.

• Joel D, Weiner I (2000) The connections of the dopaminergic system with the striatum in rats and primates: An analysis with respect to the functional and compartmental organization of the striatum. Neuroscience, 96: 451–474.

• Kelley AE (1999) Functional Specificity of Ventral Striatal Compartments in Appetitive Behaviors. Annals New York Academy of Sciences.

• Kelley AE (2004) Ventral striatal control of appetitive motivation: role in ingestive behavior and reward-related learning. Neurosci Biobehav Rev, 27: 765-776.

• Kelley AE, Baldo BA, Pratt WE (2005) A proposed hypothalamic-thalamic-striatal axis for the integration of energy balance, arousal, and food reward. J Comp Neurol. 493(1):72-85.


• Lorrain DS, Riolo JV, Matuszewich L, Hull EM (1999) Lateral Hypothalamic Serotonin Inhibits Nucleus Accumbens Dopamine: Implications for Sexual Satiety. The Journal of Neuroscience, 19(17):7648-7652.

• McFarland NR, Haber SN (2000) Convergent Inputs from Thalamic Motor Nuclei and Frontal Cortical Areas to the Dorsal Striatum in the Primate. The Journal of Neuroscience, 20(10): 3798–3813.

• Niv Y, Joel D, Dayan P (2006) A normative perspective on motivation. Trends in Cognitive Sciences, 10(8): 375-381.

• Rauch SL, Whalen PJ, Savage CR, Curran T, Kendrick A, Brown HD, Bush G, Breiter HC, Rosen BR (1997) Striatal Recruitment During an Implicit Sequence Learning Task as Measured by Functional Magnetic Resonance Imaging. Human Brain Mapping 5:124–132.

• Rauch SL, Whalen PJ, Curran T, McInerney S, Heckers S, Savage CR (1998) Thalamic deactivation during early implicit sequence learning: a functional MRI study. NeuroReport, 9: 865–870.

• Saper CB, Swanson LW, Cowan WM (1979) An autoradiographic study of the efferent connections of the lateral hypothalamic area in the rat. J Comp Neurol., 183(4): 689-706.

• Schultz W (1992) Activity of dopamine neurons in the behaving primate. Seminars in the Neurosciences, 4: 129–138.

• Schultz W (1998) Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80: 1–27.

• Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. Journal of Neuroscience, 13: 900–913.

• Suri RE, Schultz W (2001) Temporal difference model reproduces predictive neural activity. Neural Computation, 13: 841–862.

• Voorn P, Vanderschuren LJ, Groenewegen HJ, Robbins TW, Pennartz CM (2004) Putting a spin on the dorsal-ventral divide of the striatum. Trends in Neurosciences, 27: 468–474.

• Wilson C, Nomikos GG, Collu M, Fibiger HC (1995) Dopaminergic correlates of motivated behavior: importance of drive. Journal of Neuroscience, 15: 5169-5178.

