
Population Codes & Inference in Neurons

Richard Zemel

Department of Computer Science, University of Toronto

Basic questions of neural representation

Fundamental issue in computational neuroscience:

How is information represented in the brain?

What are the units of computation?

How is information processed at neural level?

Important part of answer: information not processed by single cells, but by populations

Population Codes

Coding first thought to be localist: neurons as binary units, each encoding a unique value

Alternative: more distributed, graded response; neuron’s level of activity conveys information

Population code: group of units tuned to common variable

Good computational strategy: efficient and robust

Population codes all the way down

Examples: visual features; motor commands; other sensory properties; place fields

Outline
1) Information processing in population codes
   a) reading the neural code
   b) computation in populations
2) Extending the information in population codes
   a) representing probability distributions
   b) methods for encoding/decoding distributions in neurons
3) Maintaining and updating distributions through time: dynamic distributions
   a) optimal analytic formulation
   b) network approximation

Reading the Neural Code

Neurophysiologists collect neural recordings: sequences of action potentials (spikes) from one or several cells during a controlled experiment

Task: reconstruct identity, or value of parameter(s)

Why play the homunculus?
• Assess degree to which that parameter is encoded (establish sufficiency, not necessity)
• Limits on reliability and accuracy of neuronal encoding (estimate optimal parameters)
• Characterize information processing: the nervous system is faced with this decoding problem

Rate representation of response

Spikes convey information through timing

Typically converted into scalar rate value, summarized in ri: firing rate of cell i (#spikes in interval/interval size)

Interval size determines amount of information about spike timing lost in firing rate representation

Can also consider firing rate of cell as the probability that the cell will fire within specified time interval
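A minimal sketch of this conversion, using made-up spike times for one cell; the bin width plays the role of the "interval size" above:

```python
import numpy as np

# Hypothetical spike times (seconds) for one cell during a 2 s trial.
spike_times = np.array([0.05, 0.12, 0.31, 0.33, 0.80, 1.10, 1.15, 1.62])

def firing_rate(spikes, t_start, t_stop):
    """Scalar rate: number of spikes in the interval / interval size."""
    count = np.sum((spikes >= t_start) & (spikes < t_stop))
    return count / (t_stop - t_start)

# Whole-trial rate vs. a finer binning: the smaller the interval,
# the more spike-timing information is retained (at the cost of noisier counts).
print(firing_rate(spike_times, 0.0, 2.0))   # spikes/s over the full trial
bins = np.arange(0.0, 2.25, 0.25)
counts, _ = np.histogram(spike_times, bins=bins)
print(counts / 0.25)                        # rate in each 250 ms bin
```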

Example: Reconstructing movement direction

Task: given rates in a population of direction-selective cells (r = r1,…,rN), compute arm direction

Cells in motor cortex (M1) tuned to movement angle

Tuning function (curve) fi(x):

fi(x) = A + B cos(x − xi), where A = ½ (ri^max + ri^min) and B = ½ (ri^max − ri^min)

Population vector method

Consider each cell as a vector pointing in its preferred direction xi

Length of vector represents relative response strength for particular movement direction

Sum of vectors is the estimated movement direction: the population vector

Simple, robust, accurate method if N is large and the {xi} randomly and uniformly span the space of directions

Can also view as reconstruction with cosine basis:

x̂_pop = Σi ri ci, where ci is the unit vector pointing in cell i's preferred direction xi
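A minimal sketch of the population vector estimate, assuming cosine tuning and simulated Poisson spike counts (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
preferred = rng.uniform(0, 2 * np.pi, N)   # preferred directions x_i, roughly uniform
true_dir = np.deg2rad(60.0)

# Cosine tuning f_i(x) = A + B cos(x - x_i), as above.
A, B = 25.0, 20.0
rates = A + B * np.cos(true_dir - preferred)
r = rng.poisson(rates)                     # noisy responses

# Population vector: sum of unit vectors along each preferred direction,
# weighted by the cell's response.
vec = np.array([np.sum(r * np.cos(preferred)), np.sum(r * np.sin(preferred))])
estimate = np.arctan2(vec[1], vec[0])
print(np.rad2deg(estimate))                # close to 60 deg when N is large
```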

Bayesian reconstruction

Basis function methods perform well, but another class of methods is in some sense optimal

Set up a statistical model: signal x produces response r; need to invert the model to find the likely x for a given r

Begin with encoding model

ri(x) = fi(x) + η

rate ri(x) is a random variable: response of cell i in the population to stimulus x

tuning function fi(x) describes expected rate

noise η typically assumed Poisson or Gaussian

Goal: decode responses to form posterior distribution

P(x|r)= P(r|x) P(x) / P(r)

Standard Bayesian reconstruction

likelihood P(r|x) based on encoding model

assumptions in standard model:
• spikes have Poisson distribution (natural if rate is defined as a spike count, with spikes distributed independently, randomly)
• noise uncorrelated between different cells: all variability captured in P(ri|x)

intuition: gain precision through multiplying rather than adding basis functions (tuning curves here)

obtain single value estimate through MAP or ML
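A sketch of this reconstruction on a discretized stimulus grid, assuming independent Poisson spike counts, Gaussian tuning curves, and a flat prior (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x_grid = np.linspace(-10, 10, 201)          # candidate stimulus values
centers = np.linspace(-10, 10, 30)          # preferred values x_i
sigma, gain = 1.5, 20.0

def f(x):                                   # Gaussian tuning curves f_i(x)
    return gain * np.exp(-0.5 * ((x[:, None] - centers) / sigma) ** 2)

x_true = 2.3
r = rng.poisson(f(np.array([x_true]))[0])   # observed spike counts

# log P(r|x) for independent Poisson noise, up to terms independent of x
rates = f(x_grid)
log_like = r @ np.log(rates + 1e-12).T - rates.sum(axis=1)
log_post = log_like + np.log(1.0 / len(x_grid))   # uniform prior P(x)
post = np.exp(log_post - log_post.max())
post /= post.sum()                          # posterior P(x|r) on the grid

print(x_grid[np.argmax(post)])              # MAP estimate (= ML under a flat prior)
```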

Application: hippocampal population codes

P(x) based on spatial occupancy; P(ri|x) are place fields

Zhang et al.

ML reconstruction

under simplifying assumptions, ML reconstruction has simple intuitive form

implement ML by maximizing the log likelihood log P(r|x); for independent Poisson noise this means maximizing Σi ri log fi(x)

if tuning curves are evenly distributed (Σi fi(x) approximately constant), then for Gaussian tuning curves:

xML = Σi ri xi / Σi ri
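Under these assumptions the estimate is just a spike-count-weighted average of the preferred values; a minimal numeric sketch (the counts are made up):

```python
import numpy as np

# Preferred values x_i and observed spike counts r_i (illustrative numbers).
centers = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
r = np.array([0, 2, 7, 12, 6, 1])

# x_ML = sum_i r_i x_i / sum_i r_i  (Gaussian tuning curves, even coverage)
x_ml = np.sum(r * centers) / np.sum(r)
print(x_ml)
```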

Computation in population codes

most of the computational focus on population codes is based on the observation that they offer a compromise:
• localist codes have problems with noise, robustness, and the number of neurons required
• fully distributed codes can make decoding complicated, and cannot handle multiple values

other properties of population codes have been studied recently; a key focus (driven partly by biological studies) is on recurrent connections between units in the population

Line attractor

simple network model, with recurrent connections Tij, governed by the dynamic equation

τ dui/dt = −ui + Σj Tij rj + hi

ui is the net input into unit i; rate ri is its output; Σj Tij rj is its recurrent input; hi its feedforward input

if the rate is linear above a threshold input: recurrent contribution of j on i is Tij rj; feedforward contribution is hi

in general, a set of N linear equations in N unknowns has a unique solution, but the connections can be tuned so that the fixed points (attractors) lie along a line
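A rough numerical sketch of these dynamics, using Euler integration of the equation above with a threshold-linear rate and an illustrative local-excitation/broad-inhibition weight profile. The weights are not tuned as a true line attractor requires, so this only shows the network settling into a smooth hill near the input peak:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)            # units' preferred values

# Recurrent weights T_ij: local excitation, broad inhibition (illustrative, untuned).
d = np.angle(np.exp(1j * (theta[:, None] - theta[None, :])))     # circular distances
T = (2.0 * np.exp(-0.5 * (d / 0.3) ** 2) - 0.5) / N

# Noisy feedforward hill h_i centred on unit 30's preferred value.
h = np.exp(-0.5 * (d[:, 30] / 0.5) ** 2) + 0.3 * rng.standard_normal(N)

u = np.zeros(N)
tau, dt = 10.0, 1.0
for _ in range(1000):
    r = np.maximum(u, 0.0)                 # threshold-linear rate
    u += (dt / tau) * (-u + T @ r + h)     # du/dt = -u + recurrent + feedforward
print(theta[30], theta[np.argmax(np.maximum(u, 0.0))])   # input centre vs. settled peak
```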

Line attractor model

applied to a number of problems:
• short-term memory: remembering facing direction after closing eyes, rotating head
• noise removal: used to clean up noisy population responses; set up lateral connections so that a smooth hill centered on any point is stable; given a transient noisy input, the network settles into a hill of activity whose peak position closely approximates xML, so this process allows a simple decoder (e.g., the population vector method) to approximate ML

other recurrent connection schemes produce stimulus selection (nonlinear, WTA); gain modulation (linear, scale responses by background amplitude)

Outline
1) Information processing in population codes
   a) reading the neural code
   b) computation in populations
2) Extending the information in population codes
   a) representing probability distributions
   b) methods for encoding/decoding distributions in neurons
3) Maintaining and updating distributions through time: dynamic distributions
   a) optimal analytic formulation
   b) network approximation

Extending information in population codes

Standard model focuses on encoding a single value of x in the face of noisy r

Alternative: populations represent more than a single value; motivated by computational efficiency, and also by necessity: handling important natural situations

(1). Multiple values

Extending information in population codes

(2). Uncertainty (noise at all levels; inherent in the image: insufficient information, e.g., low-contrast images)

Aperture Problem

Adelson & Movshon


Inherent Ambiguity

All possible motion vectors lie along a line in the 2D vx,vy ‘velocity space’


Human behavior: Bayesian judgements

Prior × Likelihood → Posterior

Weiss, Simoncelli, Adelson

Bayesian cue combination

Ernst & Banks

(A). Gain Encoding

Simple extension of the standard population code interpretation: activity is the noisy response of units to a single underlying value

Encoding: P(ri|θ), for example, bell-shaped tuning curves fi(θ)

Aim: given unit activities r and tuning curves fi(θ), find the direction posterior P(θ|r)

Decoding: via log P(ri|θ), e.g., assuming independent Poisson noise

(A). Gain Encoding (cont).

Gaussian, homogeneous fi(θ), uniform prior: log P(θ|{ri}) → Gaussian

Solve for the posterior mean and variance by completing the square

Simple mechanism for encoding uncertainty: change the overall population activity (gain); but limited to a Gaussian posterior
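A sketch of this idea, assuming independent Poisson noise and Gaussian tuning curves over θ (all numbers illustrative): scaling the overall gain of the population changes the width, and hence the certainty, of the resulting posterior.

```python
import numpy as np

rng = np.random.default_rng(3)
theta_grid = np.linspace(-np.pi, np.pi, 361)
centers = np.linspace(-np.pi, np.pi, 40, endpoint=False)
sigma = 0.4

def posterior(gain):
    """Grid posterior over theta from Poisson responses with Gaussian tuning, flat prior."""
    f = gain * np.exp(-0.5 * ((theta_grid[:, None] - centers) / sigma) ** 2)
    r = rng.poisson(gain * np.exp(-0.5 * ((0.5 - centers) / sigma) ** 2))  # true theta = 0.5
    log_post = r @ np.log(f + 1e-12).T - f.sum(axis=1)
    p = np.exp(log_post - log_post.max())
    return p / p.sum()

for g in (5.0, 50.0):
    p = posterior(g)
    mean = np.sum(theta_grid * p)
    std = np.sqrt(np.sum((theta_grid - mean) ** 2 * p))
    print(g, mean, std)          # higher gain -> narrower (more certain) posterior
```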

(A). Gain encoding: Transparent motion

Solve for the posterior mean and variance by completing the square: this convolves the responses with unimodal kernels

1. a unimodal response pattern produces a unimodal distribution
2. surprisingly, it also fails on bimodal response patterns:

only extracts single motion component from responses to transparent motion

(B). Direct Encoding

Activity corresponds directly to probability

Simple case: binary (A vs. B): probability that neuron 1 spikes is P(A), or can wait to compute rates: r1 ∝ P(A)

Note: r1 can also represent log P(A), or log P(A)/P(B)

Shadlen et al; Rao; Deneve; Hoyer & Hyvarinen
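A tiny sketch of the binary case: if the rate stands for log P(A)/P(B), then independent pieces of evidence combine by simple addition of rates (the mapping from probability to rate here is an illustrative assumption):

```python
import numpy as np

# Log-odds representation: r1 ~ log P(A|evidence) / P(B|evidence).
# Independent cues multiply in probability, so their log-odds contributions add.
def log_odds(p_a):
    return np.log(p_a / (1.0 - p_a))

cues = [0.6, 0.7, 0.55]       # P(A) suggested by three independent cues (flat prior)
prior = 0.5
r1 = log_odds(prior) + sum(log_odds(p) - log_odds(prior) for p in cues)
p_a_combined = 1.0 / (1.0 + np.exp(-r1))   # read the combined belief back out
print(r1, p_a_combined)
```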

(B). Direct Encoding: Example

Discrete alternative velocities vi for explaining input s

ri ∝ log P(s|vi), the log likelihood for vi

Standard model for neural motion analysis: motion energy filter

Filter response gi(s) is the energy of the video s(y,t) convolved with an oriented filter tuned to velocity vi

Probabilistic model predicts that the ideal video is formed by an image s(y) translating at velocity vi

(B). Direct Encoding: Example

[Figure: space-time plots s(y,t) of translating patterns for velocities v = 0, 1, 2]

Weiss & Fleet, 02

(C). Convolution Codes

Characterize the population response in terms of P(θ|r); the standard model is restricted to a Gaussian posterior

Convolution codes can represent more general density functions, introducing a level of indirection relative to the direct method

Two forms of convolution codes:
1. Decoding kernels
2. Encoding kernels

(C). Convolution Codes

Decoding kernels (bases):
• bases can be distributions: P(θ|r) normalized
• bases can have a simple form: φi(θ) = δ(θ − θi)
• multimodal P(θ|r) if active neurons have different θi

Anderson

(C). Convolution Codes: DPC

if P(θ) = δ(θ − θ*) then ⟨ri⟩ = φi(θ*), so one could choose the tuning functions fi(θ) as kernels

Zemel, Dayan, & Pouget

Encoding kernels (bases):

Decoding:
• deconvolution (cannot recover high frequencies)
• probabilistic approach: nonlinear regression to optimize P(θ|r) under the encoding model
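A rough sketch contrasting the two kernel roles, with illustrative Gaussian tuning functions and a made-up bimodal distribution over θ: encoding projects P(θ) onto the tuning functions (⟨ri⟩ = Σθ fi(θ) P(θ)), and a simple decoding-kernel read-out sums kernels weighted by the rates and renormalizes.

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 180)
centers = np.linspace(-np.pi, np.pi, 30, endpoint=False)
f = np.exp(-0.5 * ((theta[:, None] - centers) / 0.35) ** 2)   # tuning functions f_i(theta)

# A bimodal distribution over theta (e.g., transparent motion with two directions).
p = np.exp(-0.5 * ((theta + 1.0) / 0.2) ** 2) + np.exp(-0.5 * ((theta - 1.0) / 0.2) ** 2)
p /= p.sum()

# Encoding kernels: expected rates are projections of P(theta) onto the tuning functions.
rates = f.T @ p                      # <r_i> = sum_theta f_i(theta) P(theta)

# Decoding kernels: sum kernels weighted by the rates, then normalize.
p_hat = f @ rates
p_hat /= p_hat.sum()

# The reconstruction stays bimodal (though blurred): one peak on each side of zero.
print(theta[theta < 0][np.argmax(p_hat[theta < 0])],
      theta[theta > 0][np.argmax(p_hat[theta > 0])])
```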

Sums or Products?

kernel decoder

kernel encoder

(C). Convolution codes: Transparent motion

Bimodal response patterns: recovers the generating distribution
Unimodal patterns fit as well, up to a point (matches subjects' uncertainty)

(C). Convolution Codes: Extension

handles situations with multiple values and uncertainty

library of functions φ(θ) that describe combinations of values of θ

Sahani & Dayan

Outline
1) Information processing in population codes
   a) reading the neural code
   b) computation in populations
2) Extending the information in population codes
   a) representing probability distributions
   b) methods for encoding/decoding distributions in neurons
3) Maintaining and updating distributions through time: dynamic distributions
   a) optimal analytic formulation
   b) network approximation

Dynamic distributions: motivation

Dynamic cue combination

information is constantly changing over time: extend the framework to encode/decode dynamic distributions

Kording & Wolpert

Dynamic cue combination

Dynamic Distributions: decoding

Given a spike train R(t), what is P(X(t)|R(t))?

Markov: dynamics determined by Tij = P(Xi(t)|Xj(t−1))

More general form, continuous time: R(t), X(t) are the spike and position histories from 0 to t: R(0)…R(t−ε); X(0)…X(t−ε)

GP spikes: Encoding model & prior

instantaneous, independent, inhomogeneous Poisson process:

and a Gaussian Process prior:

α defines the smoothness of the prior, and τ defines the speed of movement

P(R(t)|X(t)) = ∏j=1..N ∏m=0..M P(Rj(tm)|X(tm)) ∝ ∏j,tm fj(X(tm))

Huys, Zemel, Natarajan, Dayan
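A sketch of how this likelihood could be evaluated for a discretized candidate trajectory X(t); a simple squared-difference smoothness penalty stands in for the Gaussian Process prior here, and the tuning curves and all numbers are illustrative assumptions:

```python
import numpy as np

def log_posterior(X, spike_steps, spike_cells, centers, sigma, gain, dt, smooth_weight):
    """X: candidate trajectory sampled every dt; spikes given as (time step, cell index)."""
    # Inhomogeneous Poisson likelihood: sum of log-rates at the spikes,
    # minus the integrated total rate along the trajectory.
    rates = gain * np.exp(-0.5 * ((X[:, None] - centers) / sigma) ** 2)   # f_j(X(t_m))
    log_like = np.sum(np.log(rates[spike_steps, spike_cells] + 1e-12)) - dt * rates.sum()
    # Stand-in smoothness prior: penalize large step-to-step changes in X(t).
    log_prior = -smooth_weight * np.sum(np.diff(X) ** 2)
    return log_like + log_prior

# Illustrative use: compare a smooth trajectory against a jumpy one for the same spikes.
centers = np.linspace(-5, 5, 20)
T = 50
spike_steps = np.array([5, 12, 30, 44])
spike_cells = np.array([9, 10, 12, 13])
smooth = np.linspace(-1, 2, T)
jumpy = smooth + np.random.default_rng(4).normal(0, 1.0, T)
for X in (smooth, jumpy):
    print(log_posterior(X, spike_steps, spike_cells, centers, sigma=1.0,
                        gain=5.0, dt=0.02, smooth_weight=10.0))
```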

GP spikes decoding: Dynamics prior is key

static stimulus prior (α = 0)

dynamic stimulus prior (α > 0): spikes not eternally informative
• 1st-order Markov (α = 1): OU process
• higher-order (α = 2): smooth process

m(t) = Σj μj ( Σ{tm < t} k(tm) Rj(tm) )

Trajectories & kernels: OU (α = 1), Smooth (α = 2)

Optimal Dynamic Distributions

Analytically tractable formulation
Prior is important for rapidly changing stimuli: fewer spikes than temporal variations in the stimulus
For smooth (natural) stimuli: no recursive formulation; recompute the kernel per spike
Decoding: must maintain the spike history

Hypothesis: Recoding spikes

Recode input spikes into a new set of spikes to facilitate downstream processing; obviates the need to store the spike history
Train a network to produce new spikes so that a simple decoder can approximate optimal decoding of the input spikes

Natarajan, Huys, Dayan, Zemel

Log-linear spike decoding

Effect of a spike on a postsynaptic neuron: produces a smoothly decaying postsynaptic potential


Hinton & Brown

1. Convolution kernel decoder for S(t):

2. Processing dynamics: standard recurrent net

3. Learn weights W, V to minimize the mismatch between the simple decoding of S(t) and the optimal P(X(t)|R(t))

Dynamic Distributions: recoding network

Aim: map spikes R(t) to S(t), so that simple decoding of S(t) approximates the optimal P(X(t)|R(t))
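A minimal sketch of one way such a recoding network could be wired up, following the architecture described above: input spikes R(t) drive a simple recurrent layer through weights W and V, the new spikes S(t) are decoded by a fixed exponential convolution kernel, and W, V would be trained so that this simple read-out tracks the optimal posterior mean. The spiking nonlinearity, kernel, and read-out are illustrative assumptions, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(5)
N_in, N_out, T = 20, 20, 200
R = (rng.random((T, N_in)) < 0.05).astype(float)      # input spike trains R(t)

W = 0.1 * rng.standard_normal((N_out, N_in))          # feedforward weights (untrained)
V = 0.05 * rng.standard_normal((N_out, N_out))        # recurrent weights (untrained)
centers = np.linspace(-5, 5, N_out)                   # preferred values of output cells

tau_k, dt = 0.1, 0.01
decay = np.exp(-dt / tau_k)                           # exponential decoding kernel

S = np.zeros((T, N_out))
trace = np.zeros(N_out)                               # kernel-filtered output spikes
estimate = np.zeros(T)
s_prev = np.zeros(N_out)
for t in range(T):
    drive = W @ R[t] + V @ s_prev                     # recurrent-net processing dynamics
    p_spike = 1.0 / (1.0 + np.exp(-(drive - 1.0)))    # assumed spiking nonlinearity
    S[t] = (rng.random(N_out) < p_spike).astype(float)  # new spikes S(t)
    s_prev = S[t]
    trace = decay * trace + S[t]                      # convolution-kernel decoding of S(t)
    if trace.sum() > 0:
        estimate[t] = np.sum(centers * trace) / trace.sum()   # simple read-out of X(t)

# W and V would be learned so that `estimate` approximates the optimal posterior mean.
print(estimate[-5:])
```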

Recoding network: example

Recoding network: analyzing kernels

Recoding network: results summary

Discussion

Current directions: Apply scheme recursively, hierarchically

Relate model to experimental results, e.g., Kording & Wolpert

Open issues: High-dimensional spaces: is the curse of dimensionality doubled?

Experimental validation or refutation of proposed distributional schemes?