The Basal Ganglia (Lecture 6) Harry R. Erwin, PhD COMM2E University of Sunderland.

Post on 13-Jan-2016



Why is this important?

• Not well understood.

• A hot research area.

• Apparently underlies reward learning.

• Related to the production of behaviour.

• May play a role in spatial localization.

• Now known to be insufficient for goal-directed behaviour (Daw and Dayan), which seems to involve forward models in the prefrontal cortex or some specialised processing in the basal ganglia. Care for a Nobel Prize? Solve this!

Reinforcement Learning

• Montague, PR, Hyman, SE & Cohen, JD, 2004, “Computational roles for dopamine in behavioural control,” Nature, 431:760-767, 14 October 2004.

• Reinforcement learning theories discuss how (habitual) behaviour is organized in response to rewards or reinforcers. This is not stimulus-response learning. This is also not how goal-oriented behaviour is learned.

How it Works

• The 'reinforcement signal' distribution measures the current value of the possible states of the agent.

• The current state of the agent is converted into a 'value' using a 'value function'.

• A 'policy function' then maps the agent's states to its possible actions, with the probability of each possible action weighted by the value of the next state produced by the action.
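The three points above can be illustrated with a small table-driven sketch. Everything concrete here is an assumption for illustration: the hypothetical `next_state` table (a deterministic toy world), the numbers in `V`, and the use of a softmax to turn next-state values into action probabilities.

```python
import math

# Hypothetical toy world: in each state, each action leads deterministically to a next state.
next_state = {
    "start": {"left": "pit", "right": "food"},
}

# Tabular value function: one scalar value per state (assumed numbers).
V = {"start": 0.0, "pit": -1.0, "food": 1.0}


def policy(state, temperature=1.0):
    """Return action probabilities weighted by the value of the state each action produces.

    Softmax weighting is an assumption; any monotone weighting fits the slide's description.
    """
    actions = next_state[state]
    weights = {a: math.exp(V[s2] / temperature) for a, s2 in actions.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


probs = policy("start")
print(probs)  # "right" (towards food) is far more likely than "left" (towards the pit)
```

Lowering `temperature` makes the policy greedier; raising it makes behaviour more exploratory.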

Temporal Difference Learning

• A form of reinforcement learning of interest here is temporal difference (TD) learning, where
– current TD error = current reward + gamma × next prediction - current prediction,
– and gamma (between 0 and 1) is a discount factor applied to the next prediction.

• Supports the learning of a plan leading to a reward.
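To see how TD learning supports learning a plan that leads to a reward, here is a minimal TD(0) sketch on a hypothetical five-state chain in which the agent always steps right and is rewarded only at the end. The chain, `GAMMA`, and `ALPHA` are assumed values. Value propagates backwards from the reward, so each earlier state comes to predict it, discounted once per remaining step.

```python
# Hypothetical 5-state chain: the agent always moves one step right;
# reward 1 arrives on the transition out of the last state into the terminal state.
N_STATES = 5
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.1   # learning rate (assumed)

V = [0.0] * (N_STATES + 1)  # index N_STATES is the terminal state; its value stays 0

for episode in range(2000):
    for s in range(N_STATES):
        s_next = s + 1
        r = 1.0 if s_next == N_STATES else 0.0
        # current reward + gamma * next prediction - current prediction
        td_error = r + GAMMA * V[s_next] - V[s]
        V[s] += ALPHA * td_error

print([round(v, 3) for v in V[:N_STATES]])
```

Each `V[s]` converges to `GAMMA ** (N_STATES - 1 - s)`: 1.0 for the state just before the reward, then 0.9, 0.81, and so on back to the start.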

Actor-Critic Model

• Sutton and Barto (1998, Reinforcement Learning, MIT Press) describe a mechanism for bootstrapping reinforcement learning.

• “Actor-critic methods … have a separate memory structure to explicitly represent the policy independent of the value function. The policy structure is known as the actor, because it is used to select actions, and the estimated value function is known as the critic, because it criticizes the actions made by the actor….”

• The critique takes the form of a TD error estimate.

The Algorithm

• $\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$, where
– $r_{t+1}$ is the actual reward at time t+1,
– $s_t$ is the state at time t,
– $V(s)$ is the current perceived value of the state s, and
– $\gamma$ is the discount rate that translates a value at time t+1 to a lower value at time t.

• $\delta_t$ is the TD error estimate, at time t+1, of following a specific action, a, at time t.

• $V(s)$ is zero at terminal states, and $r_t$ is zero unless there is a real reward at time t.
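A direct transcription of the TD error equation, with the two conventions from the last bullet (terminal states have value zero; rewards are zero except when a real reward arrives). The function name `td_error` and the example numbers are hypothetical.

```python
def td_error(r_next, v_next, v_now, gamma, next_is_terminal=False):
    """delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t), with V taken as 0 at terminal states."""
    if next_is_terminal:
        v_next = 0.0  # any stored estimate for a terminal state is ignored
    return r_next + gamma * v_next - v_now


# No reward yet; the next state looks slightly better than predicted: small positive error.
d1 = td_error(0.0, 0.5, 0.4, 0.9)                          # 0.9*0.5 - 0.4 = 0.05 (approx.)
# A real reward of 1 ends the episode from a state valued 0.8: positive error.
d2 = td_error(1.0, 0.3, 0.8, 0.9, next_is_terminal=True)   # 1 - 0.8 = 0.2 (approx.)
# The same state, but the reward fails to arrive: negative error.
d3 = td_error(0.0, 0.3, 0.8, 0.9, next_is_terminal=True)   # 0 - 0.8 = -0.8
print(d1, d2, d3)
```

The sign carries the meaning: positive when things turn out better than the critic predicted, negative when worse.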

Interpretation

• $\delta_t$ is the quantity that appears to correspond to the dopamine level output by the basal ganglia to the cortex (Schultz et al.).

• How are V(s) and the preferences for the various actions, a, updated?

• Given a set of actions, a, let $p(s_t, a_t)$ be the preference for action $a_t$ in state $s_t$ at time t.

• Then let the probability of picking a in state s be $\pi(s,a) = \exp(p(s,a)) / \sum_b \exp(p(s,b))$, where the sum runs over all reasonable actions b.
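The softmax rule above, written out for the preferences of a single state. The action names and preference values are made up. Subtracting the largest preference before exponentiating is a standard numerical safeguard: it multiplies every term by the same factor, which cancels between numerator and sum, so the probabilities are unchanged.

```python
import math
import random


def softmax_policy(prefs):
    """pi(s, a) = exp(p(s, a)) / sum_b exp(p(s, b)) for one state's preferences."""
    m = max(prefs.values())
    exps = {a: math.exp(p - m) for a, p in prefs.items()}  # shift by max to avoid overflow
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def choose_action(prefs, rng=random):
    """Sample one action according to the softmax probabilities."""
    probs = softmax_policy(prefs)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


# Hypothetical preferences p(s, a) for one state s.
prefs = {"approach": 2.0, "wait": 0.0, "withdraw": -1.0}
pi = softmax_policy(prefs)
print(pi, choose_action(prefs))
```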

Learning Processes

• Now, update the value $V(s_t)$ by adding $\delta_t$ times some learning rate (less than one).

• Update $p(s_t, a_t)$ by adding $\delta_t$ times another learning rate (less than one).

• That’s all, folks.

• Note that the state space is very large.

• Actor-critic learning cannot cope with changing goal values.
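Putting the pieces together, a minimal tabular actor-critic in the style described by Sutton and Barto might look like the following. The corridor task, the learning rates, the episode counts, and the softmax actor are assumptions chosen to keep the sketch small; it shows the two updates side by side rather than reproducing any specific published simulation.

```python
import math
import random

# Hypothetical task: a corridor of states 0..3. "right" from state 3 ends the episode with
# reward 1; "left" from state 0 bumps into a wall and stays put. All other rewards are 0.
N = 4
ACTIONS = ("left", "right")
GAMMA, ALPHA, BETA = 0.9, 0.1, 0.1  # discount, critic rate, actor rate (assumed values)


def step(s, a):
    """Return (next_state, reward, done)."""
    if a == "right":
        if s == N - 1:
            return None, 1.0, True
        return s + 1, 0.0, False
    return max(s - 1, 0), 0.0, False


def policy(p, s):
    """Actor: softmax over the preferences p[s][a]."""
    m = max(p[s].values())
    w = {a: math.exp(p[s][a] - m) for a in ACTIONS}
    z = sum(w.values())
    return {a: w[a] / z for a in ACTIONS}


def train(episodes=500, max_steps=200, seed=0):
    rng = random.Random(seed)
    V = [0.0] * N                                       # critic: V(s)
    p = [{a: 0.0 for a in ACTIONS} for _ in range(N)]   # actor: p(s, a)
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            probs = policy(p, s)
            a = rng.choices(ACTIONS, weights=[probs[x] for x in ACTIONS])[0]
            s2, r, done = step(s, a)
            v_next = 0.0 if done else V[s2]             # V is zero at terminal states
            delta = r + GAMMA * v_next - V[s]           # the critic's TD error
            V[s] += ALPHA * delta                       # critic update
            p[s][a] += BETA * delta                     # actor update, same error signal
            if done:
                break
            s = s2
    return V, p


V, p = train()
print([round(v, 2) for v in V])
print([round(policy(p, s)["right"], 2) for s in range(N)])
```

The same TD error drives both the critic's `V` and the actor's `p`; that shared signal is the role the slides assign to dopamine.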

A Few Points

• Actor-critic learning works better for high-level rather than low-level actions. Somehow the biological system is able to shift up to higher-level actions.

• Note that the error, $\delta_t$, can be either positive or negative. The basal ganglia output both dopamine (+) and GABA (-) to represent the error. Cocaine has the property of producing an error signal that is always positive, which really fouls up the learning process.

• Mirror neurons may play a role in this, and autism may be a disorder of this system.
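One way to see why an always-positive error fouls up learning: in the sketch below a cue is followed by a reward of 1 and the end of the trial. With a normal TD error the cue's value settles at 1, where the error is zero. If the error is instead prevented from falling below some positive amount `D` (an assumed, purely illustrative stand-in for a drug effect, not a model taken from these slides), there is no value at which the error vanishes, so the learned value keeps climbing.

```python
GAMMA, ALPHA = 0.9, 0.1
D = 0.2  # hypothetical drug-induced boost to the error signal


def learn(trials, drug=False):
    """TD learning of the value V of a cue that is followed by reward 1 and then termination."""
    V = 0.0
    for _ in range(trials):
        delta = 1.0 + GAMMA * 0.0 - V      # reward 1; the next state is terminal (value 0)
        if drug:
            delta = max(delta + D, D)      # assumed: the error can never fall below D > 0
        V += ALPHA * delta
    return V


normal = learn(1000)              # settles at the true value, 1.0
drugged = learn(1000, drug=True)  # overshoots and grows by ALPHA * D every trial
print(round(normal, 3), round(drugged, 1))
```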

The Bootstrap Issue

• To make this work, the critic has to either innately know the rewards for various actor actions, or it has to learn them.

• The resulting ‘bootstrap’ problem is of particular importance in biological systems that might implement the model.

• One approach might be for the critic to reward all actions indiscriminately and then, as noxious stimuli are reported by sensory systems, reduce the corresponding rewards.

• Is this biologically realistic?
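The indiscriminate-reward idea in the bullets above can be sketched as an innate critic that starts out treating every action as equally rewarding and lowers an action's estimate whenever a sensory system reports a noxious outcome. The action names, the starting value of 1, and the rule that pulls a punished estimate toward -1 are hypothetical choices for illustration only.

```python
INITIAL_REWARD = 1.0  # hypothetical innate assumption: every action pays off
PENALTY_RATE = 0.5    # hypothetical step size for lowering an estimate


def make_critic(actions):
    """Innate starting point: the same positive reward estimate for every action."""
    return {a: INITIAL_REWARD for a in actions}


def report(critic, action, noxious):
    """Lower the estimate for `action` when a sensory system reports a noxious outcome.

    Each noxious report moves the estimate halfway toward -1; harmless outcomes leave it unchanged.
    """
    if noxious:
        critic[action] -= PENALTY_RATE * (critic[action] + 1.0)
    return critic[action]


critic = make_critic(["touch_flame", "eat_berry"])
for _ in range(3):
    report(critic, "touch_flame", noxious=True)
    report(critic, "eat_berry", noxious=False)
print(critic)
```

Whether anything like this is biologically realistic is exactly the open question the slide raises.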

The Basal Ganglia

• A richly connected set of brain nuclei in the fore- and mid-brain of amniotes.

• Degenerative diseases tend to produce severe movement deficits, but there is reason to believe the function of the basal ganglia is more general: the selection among candidate movements, goals, strategies, and interpretations of sensory information.
– (Wilson, 2004, in Shepherd, from which much of this presentation is derived.)

Rostral Anatomy


<http://thalamus.wustl.edu/course/cerebell.html>

Medial Anatomy

<http://thalamus.wustl.edu/course/cerebell.html>


Caudal Anatomy

<http://thalamus.wustl.edu/course/cerebell.html>


Nuclei of the Basal Ganglia

• The most prominent are the following:
– Caudate Nucleus
– Putamen (together with the caudate nucleus, it forms the striatum)
– Nucleus Accumbens
– Globus Pallidus (GP): external segment (GPe) and internal segment (GPi)
– Substantia Nigra (SN): pars reticulata (SNr) and pars compacta (SNc)
– Subthalamic Nucleus

• The two largest sources of input are the cerebral cortex and the thalamus.

BG Circuits


(from Dreyer, http://www.unifr.ch/biochem/DREYER/BG.html)

Neostriatum

• The neostriatum consists of the caudate nucleus, the putamen, and the nucleus accumbens.

• For the caudate nucleus and putamen, inputs from sensory, motor, and association cortical areas converge with inputs from the thalamic intralaminar nuclei, dopaminergic inputs from the SNc, and serotonergic (5-HT) inputs from the dorsal raphe nucleus.

• This subsystem supports planning and reinforcement learning involving the PFC.

Putamen

• A portion of the basal ganglia that forms the outermost part of the lenticular nucleus.

• The motor and somatosensory cortices, the intralaminar nuclei of the thalamus, and the substantia nigra project to the putamen.

• The putamen projects to premotor and supplementary motor areas of cortex via the globus pallidus and thalamus.

• Coextensive with the insula, which has been found to contain mirror neurons.

Nucleus Accumbens

• There are similar connections from the limbic cortex (emotional) and hippocampus, converging with inputs from the ventral tegmental area (VTA) in the nucleus accumbens.

• The VTA is dopaminergic and seems to play a role in reward learning.

• This subsystem appears to support emotional learning.

Input Structure

• The cortex, thalamus, and amygdala provide glutamatergic input to the neostriatum (and can produce LTP or LTD).

• Most neostriatal interneurons are GABAergic (the exception is the cholinergic cells, which are neuromodulatory), and the output of the principal cells is also GABAergic.

Neostriatal Structure

• The neostriatum consists mainly of principal neurons and afferent fibres, with smaller populations of interneurons.

• The neostriatum appears to be a functional remapping of the cortex, based on common interests of some sort. For example, the cortical neurons concerned with a finger tend to project to a common area.

• Coincidence detection is important.

Neostriatal Neurons

• GABAergic principal neurons fire rarely and for short periods (100-3000 ms).

• The axons emit local collaterals to form an extremely rich arborization and then project to their long-range destinations.

• Approximately half are direct-pathway neurons and the other half are indirect-pathway neurons. Wilson is unclear on this point, but it may be that only the direct-pathway neurons are collateralized.

Neostriatal Interneurons

• A number of rare types (an estimated eight to nine). The three major types are as follows:
– Giant cholinergic interneurons forming a dense plexus of extremely fine axonal branches. Tonically active.
– GABA/parvalbumin-containing basket cells. Very similar to basket cells of the hippocampus and cerebral cortex. Linked by gap junctions.
– Somatostatin (SOM)/nitric oxide synthase (NOS)-containing interneurons. A neuromodulatory function. Probably GABAergic.

Neostriatal Outputs

• The output of the neostriatum projects to the GPe, GPi, and SNr.

• The GPi and SNr project (GABAergic projections) outside the basal ganglia to the thalamus (and mostly from there to the frontal cortex), the lateral habenular nucleus, and the deep layers of the superior colliculus.

• The GPe projects mostly to the subthalamic nucleus, which also receives frontal input and finally projects to the GPe, GPi, and SN.

Intermediate Processing

• At the GP and SN, most afference is from the neostriatum, with secondary input from the subthalamic nucleus.

• The GPe projects to the GPi and SNr and has recurrent local inhibitory connections.

• The GPe also receives some input from the cerebral cortex and thalamus.

• The subthalamic neurons receive excitatory inputs from the cortex and inhibitory input from the GPe.

GP Processing

• The principal cells of the GP are inhibitory, receive excitatory input from the subthalamic nucleus, and inhibitory input from the neostriatum.

• The GPe inhibits the GPi and the SNr, which are the output nuclei of the basal ganglia.

Phasic/Tonic

• The principal cells of the SNc are dopaminergic and neuromodulatory. The SNc and the VTA seem to encode rewards.

• The cells of the GP and SN fire tonically, at very high rates, the GP and SNr inhibiting neurons in the thalamus and SC.

• Phasic firing of neostriatal neurons produces a pause in this tonic firing, allowing thalamic and SC neurons to respond to input. (This can also terminate tonic activity in the cortex.)
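The disinhibition mechanism can be shown with a toy rate model in which the striatum inhibits the GPi/SNr and the GPi/SNr tonically inhibits the thalamus. The weights, the tonic drive, and the rectified-linear rates are assumptions; only the signs of the connections come from the slides.

```python
GP_TONIC = 1.0   # hypothetical tonic drive to the GPi/SNr
W_STR_GP = 1.0   # hypothetical strength of striatal inhibition of the GPi/SNr
W_GP_THAL = 1.0  # hypothetical strength of GPi/SNr inhibition of the thalamus


def relu(x):
    """Firing rates cannot go below zero."""
    return max(0.0, x)


def thalamus_rate(striatal_rate, cortical_input):
    gp = relu(GP_TONIC - W_STR_GP * striatal_rate)  # phasic striatal firing pauses the GP
    return relu(cortical_input - W_GP_THAL * gp)    # the thalamus responds only when released


# The same cortical input reaches the thalamus throughout; the striatum fires at t = 2 and 3.
for t, striatum in enumerate([0, 0, 1, 1, 0]):
    print(t, thalamus_rate(striatum, cortical_input=0.8))
```

The thalamic output is zero except while the striatum fires, mirroring the pause-and-release described above.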

Detailed Neostriatal Projections

• There are two pathways:
– Direct pathway: neurons with direct projections to GPi and SN (possibly in addition to the GPe), directly playing a role in the output of the basal ganglia.
– Indirect pathway: neurons that project only to GPe. These affect the output of the basal ganglia via projections of the subthalamic nucleus and the GPe.
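Because each connection is either excitatory or inhibitory, the net effect of a pathway on the thalamus follows from multiplying the signs along it. The sketch uses only connections described in these slides (treating GPi and SNr together as "GPi") and ignores strengths and timing, so it is a qualitative check rather than a model.

```python
from math import prod

# +1 = excitatory (glutamatergic), -1 = inhibitory (GABAergic), as described in the slides.
SIGN = {
    ("striatum", "GPi"): -1,
    ("striatum", "GPe"): -1,
    ("GPe", "STN"): -1,
    ("GPe", "GPi"): -1,
    ("STN", "GPi"): +1,
    ("GPi", "thalamus"): -1,
}


def net_effect(path):
    """Multiply connection signs along a path: +1 means activity at the start excites the end."""
    return prod(SIGN[(a, b)] for a, b in zip(path, path[1:]))


direct = net_effect(["striatum", "GPi", "thalamus"])
indirect_stn = net_effect(["striatum", "GPe", "STN", "GPi", "thalamus"])
indirect_gpe = net_effect(["striatum", "GPe", "GPi", "thalamus"])
print(direct, indirect_stn, indirect_gpe)
```

The direct pathway comes out positive (striatal firing releases the thalamus) and both indirect routes negative (striatal firing increases inhibition of the thalamus), which is the balanced opposition described under "Basic Mechanism of the BG".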

Cell Counts

• The neostriatum is estimated to contain about 100,000,000 neurons.

• The GP contains about 700,000 neurons in total, 170,000 of them in the GPi, so the projection from the neostriatum is highly convergent.

• Spiking in the GP and SN is very localized.

• Principal cells of the neostriatum receive about 11,000 afferent synapses from about the same number of thalamic and cortical neurons.
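Dividing the counts above gives a rough sense of how convergent the projection is. This assumes, for illustration, that every neostriatal neuron projects to the GP and, for the second ratio, that the direct-pathway half of the neostriatum all reaches the GPi; since direct-pathway neurons also target the SN, the second figure is likely an overestimate.

```latex
\frac{1 \times 10^{8}}{7 \times 10^{5}} \approx 143 \ \text{neostriatal neurons per GP neuron},
\qquad
\frac{0.5 \times 1 \times 10^{8}}{1.7 \times 10^{5}} \approx 294 \ \text{direct-pathway neurons per GPi neuron}.
```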

Patch Structure

• The primate neostriatum is organized into cell islands or clusters (striosomes or patches) in a background of lesser cellular density (the matrix). Afferent fibres respect this compartmentalization, with some cortical regions projecting to each.

• Infragranular pyramidal neurons (layers 5 and 6) seem to project to the patches, while supragranular neurons (layers 2 and 3) project to the matrix.

Targets of Patches

• The patches project preferentially to the dopaminergic neurons of the SNc, while the matrix projects to the SNr (non-dopaminergic neurons projecting to the thalamus and SC).

• This results in two parallel pathways (in addition to the direct and indirect pathways, which are present in both).

• Interneurons in the neostriatum may provide intercommunication between the two paths.

General Role of the Basal Ganglia

• The basal ganglia are suspected of being a system that detects candidate movements, goals, strategies, or interpretations of sensory patterns and releases responses.

• They seem to be a multisensory integration system, and this seems particularly the case with reference to the SC.

How it May Work

• DA neurons fire in response to the resolution of uncertainty about the prospects for reward, providing a training signal for the neostriatal system:
– They fire more at the moment when the animal recognizes it can begin a behavioural sequence that will end with a reward.
– They pause when an expected reward isn’t received.

• The neostriatum thus detects patterns of cortical activity associated with future reward, associating values to situations.
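The two dopamine observations above match the behaviour of the TD error from the actor-critic slides. In this minimal sketch, a cue is followed three steps later by a reward; the delay, `GAMMA`, `ALPHA`, the trial count, and the assumption that the pre-cue state has value 0 (because the cue's timing is unpredictable) are all illustrative choices. Before learning, the error appears at the reward; after learning, it appears at cue onset instead, and omitting the expected reward produces a negative error, the analogue of the dopamine pause.

```python
GAMMA, ALPHA = 0.9, 0.1
CUE_STEPS = 3  # hypothetical: the reward arrives 3 time steps after cue onset


def run_trial(V, rewarded=True, learn=True):
    """Return the TD errors of one trial: cue onset first, then one entry per step.

    V[i] is the value of the state 'i steps since the cue'; the last entry is at reward time.
    The pre-cue state is assumed to have value 0 because the cue's arrival is unpredictable.
    """
    deltas = [GAMMA * V[0] - 0.0]  # transition from the pre-cue state to cue onset
    for i in range(CUE_STEPS):
        last = i == CUE_STEPS - 1
        r = 1.0 if (last and rewarded) else 0.0
        v_next = 0.0 if last else V[i + 1]  # the trial ends after the last step
        delta = r + GAMMA * v_next - V[i]
        if learn:
            V[i] += ALPHA * delta
        deltas.append(delta)
    return deltas


V = [0.0] * CUE_STEPS
before = run_trial(V, learn=False)                   # naive: the error appears at the reward
for _ in range(1000):
    run_trial(V)                                     # training trials, all rewarded
after = run_trial(V, learn=False)                    # trained: the error has moved to cue onset
omitted = run_trial(V, rewarded=False, learn=False)  # expected reward withheld

print("before: ", [round(d, 2) for d in before])
print("after:  ", [round(d, 2) for d in after])
print("omitted:", [round(d, 2) for d in omitted])
```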

Why Two Neostriatal Areas?

• The matrix seems to learn what has worked in the past.

• The patches learn which cortical inputs are best able to predict the value of particular situations.

• Patches might use dopaminergic signals based on current knowledge to learn how to predict dopaminergic signals more accurately. (Houk, Adams, and Barto)

• To avoid a bootstrap problem, there has to be innate neural connectivity so that immediate rewards for behaviour are signalled to the SN via the patches.

Basic Mechanism of the BG

• ‘Disinhibition of proposed actions’

• The basal ganglia output nuclei tonically inhibit the thalamic nuclei and the superior colliculus.

• This inhibition is released when input patterns excite principal neurons of the neostriatum.

• Tonic activity is regulated by striatal projections to the GPe (inhibitory principal neurons), which in turn project to the subthalamic nucleus (excitatory principal neurons) that increases the activity of the GPi and SNr neurons, producing a balanced opposition of activity.

Feedback in the Neostriatum

• Plenz, D, 2003, "When inhibition goes incognito: feedback interaction between spiny projection neurons in striatal function," TINS, 26(8):436-443, August 2003.

• This paper discusses how spiny projection neurons (the principal GABAergic neurons of the striatum) process cortical inputs in a highly parallel way.

Implications

• Striatal dynamics are probably not 'winner-take-all'. Local depolarization facilitates the depolarization of nearby cells, so that behavioural sequences can be generated.

• Plenz suggests the striatum could also function as a resistive grid that computes state transitions for movement trajectories. (See Connolly and Burns, 1993, "A model for the functioning of the striatum," Biological Cybernetics 68:535-544.)

• This is an important but unclear idea.
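The resistive-grid idea is easier to picture with a concrete case. The sketch shows the general technique only, not Connolly and Burns's striatal model: in a grid of equal resistors, each node's voltage settles at the mean of its four neighbours (a discrete Laplace equation), and following the voltage downhill from any start cell leads to the goal without getting stuck. The map, the clamped values (walls at 1, goal at 0), and the iteration count are assumptions.

```python
# Hypothetical map: '#' = wall/obstacle, 'S' = start, 'G' = goal, '.' = free space.
GRID = [
    "#########",
    "#S..#...#",
    "#...#.#.#",
    "#.....#G#",
    "#########",
]


def find(grid, ch):
    return next((r, c) for r, row in enumerate(grid) for c, x in enumerate(row) if x == ch)


def solve_potential(grid, iterations=5000):
    """Relax Laplace's equation: each free cell becomes the mean of its 4 neighbours,
    like node voltages in a grid of equal resistors. Walls are clamped at 1, the goal at 0."""
    rows, cols = len(grid), len(grid[0])
    phi = [[1.0] * cols for _ in range(rows)]
    gr, gc = find(grid, "G")
    phi[gr][gc] = 0.0
    for _ in range(iterations):
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if grid[r][c] in ".S":
                    phi[r][c] = (phi[r - 1][c] + phi[r + 1][c] + phi[r][c - 1] + phi[r][c + 1]) / 4.0
    return phi


def descend(grid, phi):
    """Follow the steepest downhill free neighbour from S; returns the path of (row, col) cells."""
    cur = find(grid, "S")
    path = [cur]
    while grid[cur[0]][cur[1]] != "G":
        r, c = cur
        nbrs = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        nxt = min((n for n in nbrs if grid[n[0]][n[1]] != "#"), key=lambda n: phi[n[0]][n[1]])
        if phi[nxt[0]][nxt[1]] >= phi[r][c]:
            break  # no downhill neighbour; should not happen for a converged potential
        path.append(nxt)
        cur = nxt
    return path


phi = solve_potential(GRID)
path = descend(GRID, phi)
print(path)
```

A harmonic potential has no local minima away from the goal, which is what makes such a grid attractive as a trajectory generator.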

Conclusions

• If you want to use reward learning in a system that generates behaviour, look at the Actor-Critic model.

• If you want to build a biologically-inspired reward learning system, consider the basal ganglia as a model.

• If you want to do the same for a trajectory prediction system, also consider modelling the basal ganglia.