
arXiv:q-bio.NC/0505003 v1 2 May 2005

Features and dimensions: Motion estimation in fly vision

William Bialek^a and Rob R. de Ruyter van Steveninck^b

^a Joseph Henry Laboratories of Physics, ^b Department of Molecular Biology, and the Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544

^b Department of Physics, Indiana University, Bloomington, Indiana 47405

(Dated: May 2, 2005)

We characterize the computation of motion in the fly visual system as a mapping from the high dimensional space of signals in the retinal photodetector array to the probability of generating an action potential in a motion sensitive neuron. Our approach to this problem identifies a low dimensional subspace of signals within which the neuron is most sensitive, and then samples this subspace to visualize the nonlinear structure of the mapping. The results illustrate the computational strategies predicted for a system that makes optimal motion estimates given the physical noise sources in the detector array. More generally, the hypothesis that neurons are sensitive to low dimensional subspaces of their inputs formalizes the intuitive notion of feature selectivity and suggests a strategy for characterizing the neural processing of complex, naturalistic sensory inputs.

I. INTRODUCTION

Vision begins with the counting of photons by a large array of detector elements in the retina. From these inputs, the brain is thought to extract features, such as the edges in an image or the velocity of motion across the visual field, out of which our perceptions are constructed. In some cases we can point to individual neurons in the brain that represent the output of this feature extraction; classic examples include the center–surround comparison encoded by retinal ganglion cells [1], the orientation selective “edge detectors” described by Hubel and Wiesel in primary visual cortex [2], and the direction selective, motion sensitive neurons found in visual systems from flies [3] to rabbits [4] to primates [5]. As emphasized by Marr, feature extraction seems such a natural and elementary step in sensory signal processing that it is easy to overlook the challenges posed by these computations [6].

Our focus in this paper is on the computations leading to the extraction of motion across the visual field, although we believe that the key issues are common to many different problems in neural computation. In primates the spike trains of motion sensitive neurons are correlated with perceptual decision making on a trial–by–trial basis [7], lesions to populations of these neurons produce specific behavioral deficits [8], and stimulation of the population can bias both perceptual decisions [9] and more graded visuomotor behaviors [10]. In insects, ablation of single motion sensitive neurons leads to deficits of visuomotor behavior that match the spatial and directional tuning of the individual neurons [11], and one can use the spike sequences from these cells to estimate the trajectory of motion across the visual field or to distinguish among subtly different trajectories [12]. Strikingly, at least under some conditions the precision of these motion estimates approaches the physical limits set by diffraction and photon shot noise at the visual input [13, 14, 15]. Taken together these observations suggest strongly that the motion sensitive neurons represent most of what the organism knows about visual motion, and in some cases everything that the organism could know given the physical signals and noise in the retina.

It is tempting to think that the stimulus for a motion sensitive neuron is the velocity of motion across the visual field, but this is wrong: the input to all visual computation is a representation of the spatiotemporal history of light intensity falling on the retina, C(x, t). This representation is approximate, first because the physical carrier, a photon stream, is inherently noisy, and second because the intensity pattern is blurred by the optics, and sampled in a discrete raster. Features, such as velocity, must be computed explicitly from this raw input stream. As discussed below, even the simplest visual computations have access to D ∼ 10^2 spacetime samples of the input. If the response of a single neuron were an arbitrary function on a space of 100 dimensions, then no reasonable experiment would be sufficient to characterize the computation that is represented by the neuron’s output spike train. Any method for characterizing the mapping from the visual input to an estimate of motion as encoded by a motion sensitive neuron must thus involve some simplifying assumptions.

Models for the neural computation of motion go back (at least) to the classic work of Hassenstein and Reichardt, who proposed that insects compute motion by evaluating a spatiotemporal correlation of the signals from the array of photodetector cells in the compound eye [16]. Essentially the same computational strategy is at the core of the motion energy models that are widely applied to the analysis of human perception and neural responses in primates [17]. Both the correlation model and the motion energy model have been extended in various ways to include saturation or normalization of the responses [18, 19]. A seemingly very different approach emphasizes that motion is a relationship between spatial and temporal variation in the image, and in the simplest case this means that velocity should be recoverable as the ratio of temporal and spatial derivatives [20]. Finally, the fact that the fly visual system achieves motion estimates with a precision close to the physical limits [13, 14] motivates the theoretical question of which estimation strategies will in fact make best use of the available signals, and this leads to rather specific predictions about the form of the motion computation [21]. The work described here has its origins in the attempt to test these predictions of optimal estimation theory.

The traditional approach to testing theories of motion estimation involves the design of particular visual stimuli which would highlight or contrast the predictions of particular models. This tradition is best developed in the work of the Reichardt school, which aimed at testing and elaborating the correlation model for motion estimation in fly vision [22, 23]. The fact that simple mathematical models developed from elegant behavioral experiments in the early 1950s provided a basis for the design and analysis of experiments on the responses of single neurons decades later [24] should be viewed as one of the great triumphs of theoretical approaches to brain function. While the correlation model (or the related motion energy models) describes many aspects of the neural response, it probably is fair to say that the simplest versions of these models are not sufficient for describing neural responses to motion generally, and especially in more natural conditions.

One of the clear predictions of optimal estimation theory is that computational strategies which make the best use of the available information in the retinal array must adapt to take account of different stimulus ensembles [21]: not only will the computation of motion reach a different answer in (for example) a bright, high contrast environment and in a dim, low contrast environment, the optimal strategy actually involves computing a different function in each of these environments. Further, this adaptation in computational strategy is not like the familiar light and dark adaptation; instead it involves adjusting the brain’s computational strategy in relation to the whole distribution of visual inputs rather than just the mean. If statistical adaptation occurs, then the program of testing computational models by careful choice of stimuli has a major difficulty, namely that the system will adapt to our chosen stimulus ensemble and we may not be able to isolate different aspects of the computation as expected.

There is evidence that the coding of dynamic signals adapts to the input distribution both in the fly motion sensitive neurons [25, 26, 27] and in the vertebrate retina [28, 29], so that in these systems at least the representation of stimulus features depends on the context in which they are presented. In these examples context is defined by the probability distribution from which the signals are drawn, but there also is a large body of work demonstrating that neural responses at many levels of the visual system are modulated by the (instantaneous) spatial context in which localized stimuli are presented [30]. What is needed, then, is a method which allows us to analyze the responses to more complex—and ultimately to fully natural—inputs and decompose these responses into the elementary computational steps.

We will see that models of motion estimation share a common structure: estimation involves a projection of the high dimensional visual inputs onto a lower dimensional space, followed by a nonlinear interaction among variables in this low dimensional space. The main goal of this paper, then, is to present an analysis method that allows us to observe directly the small number of dimensions which are relevant to the processing of complex, high dimensional inputs by any particular neuron. We use this approach to show that the motion sensitive neuron H1 in the fly visual system computes a function of its inputs that is of the form predicted by optimal estimation theory. More generally, we believe that the reduction of dimensionality may be the essential simplification required for progress on the subjects of neural coding and processing of complex, naturalistic sensory signals.

II. MODELS OF MOTION ESTIMATION

The classic model of visual motion detection is the Reichardt correlator, schematized in Fig. 1. In the simplest version of the model, the output signal is just the product of the voltages in neighboring photoreceptors that have been passed through different filters,

\[
\theta_{\rm est}(t) \approx \left[\int d\tau\, f(\tau)\, V_n(t-\tau)\right] \times \left[\int d\tau'\, g(\tau')\, V_{n-1}(t-\tau')\right]. \tag{1}
\]

This signal has a directionality, since the nth receptor voltage is passed through filter f while its left neighbor is passed through filter g. A better estimate of motion, however, would be genuinely antisymmetric rather than merely directional, and to achieve this we can subtract the signal computed with the opposite directionality:

\[
\theta_{\rm est}(t) = \left[\int d\tau\, f(\tau)\, V_n(t-\tau)\right] \times \left[\int d\tau'\, g(\tau')\, V_{n-1}(t-\tau')\right]
- \left[\int d\tau\, f(\tau)\, V_{n-1}(t-\tau)\right] \times \left[\int d\tau'\, g(\tau')\, V_{n}(t-\tau')\right]. \tag{2}
\]
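The antisymmetric correlator of Eq. (2) is easy to sketch in discrete time. The following is an illustrative implementation, not taken from the paper: the exponential filters f and g, their time constants, and the drifting-grating input are all hypothetical choices, made only so that the two branches have different temporal phases.

```python
import numpy as np

def reichardt(v_n, v_nm1, f, g, dt):
    """Antisymmetric correlator of Eq. (2), with the integrals over tau
    approximated by causal discrete convolutions."""
    flt = lambda v, h: np.convolve(v, h)[:len(v)] * dt
    return flt(v_n, f) * flt(v_nm1, g) - flt(v_nm1, f) * flt(v_n, g)

dt = 1e-3                            # 1 ms time step
tau = np.arange(0, 0.15, dt)         # ~150 ms filter window
f = np.exp(-tau / 0.01) / 0.01       # fast filter (10 ms), hypothetical
g = np.exp(-tau / 0.05) / 0.05       # slow filter (50 ms), hypothetical

# A drifting grating produces phase-shifted signals in neighboring receptors.
t = np.arange(0, 1.0, dt)
v_n = np.sin(2 * np.pi * 5 * t)            # receptor n
v_nm1 = np.sin(2 * np.pi * 5 * t - 0.5)    # receptor n-1, phase lagged

fwd = reichardt(v_n, v_nm1, f, g, dt)      # motion in one direction
rev = reichardt(v_nm1, v_n, f, g, dt)      # the same pattern moving oppositely
print(np.mean(fwd[200:]), np.mean(rev[200:]))
```

Reversing the direction of motion swaps the roles of the two receptors, and the time-averaged output changes sign; this is exactly the directionality that the subtraction in Eq. (2) is built to produce.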

Although it is natural to discuss visual computation using the photoreceptor signals as input, in fact we can’t control these signals and so we should refer the computation back to image intensities or contrasts. If the photoreceptors give linear responses to contrast C(x, t) over some reasonable range, then we can write

\[
V_n(t) = \int d^2x\, M(x - x_n) \int d\tau\, T(\tau)\, C(x, t-\tau), \tag{3}
\]

where M(x) is the spatial transfer function or aperture of the receptor and T(τ) is the temporal impulse response. Substituting into Eq. (2), we obtain

\[
\theta^{(n)}_{\rm est}(t) = s_1(t)\, s_4(t) - s_2(t)\, s_3(t), \tag{4}
\]

FIG. 1: The correlator model of visual motion detection, adapted from Ref. [16]. A spatiotemporal contrast pattern C(x, t) is blurred by the photoreceptor point spread function, M(x), and sampled by an array of photoreceptors, two of which (neighboring photoreceptors numbers n − 1 and n) are shown here. After phototransduction, the signals in each photoreceptor are filtered by two different linear filters, f(t) and g(t). The outputs of these filters from the different photoreceptors, s1(t) and s3(t) from photoreceptor n and s2(t) and s4(t) from photoreceptor n − 1, are multiplied, and one of these products is subtracted from the other by the addition unit, yielding a direction selective response.

where we are now careful to note that this is the estimate obtained at point n on the retina, and each of the signals si(t) is a linearly filtered version of the spatiotemporal contrast pattern,

\[
s_1(t) = \int d^2x\, d\tau\, M(x - x_n)\, \bar f(\tau)\, C(x, t-\tau), \tag{5}
\]
\[
s_2(t) = \int d^2x\, d\tau\, M(x - x_{n-1})\, \bar f(\tau)\, C(x, t-\tau), \tag{6}
\]
\[
s_3(t) = \int d^2x\, d\tau\, M(x - x_n)\, \bar g(\tau)\, C(x, t-\tau), \tag{7}
\]
\[
s_4(t) = \int d^2x\, d\tau\, M(x - x_{n-1})\, \bar g(\tau)\, C(x, t-\tau), \tag{8}
\]

with the temporal filters

\[
\bar f(\tau) = \int d\tau'\, f(\tau - \tau')\, T(\tau'), \tag{9}
\]
\[
\bar g(\tau) = \int d\tau'\, g(\tau - \tau')\, T(\tau'). \tag{10}
\]

Although much of the effort in applying the Reichardt (and related) models to experiment has focused on measuring the particular filters f and g, we want to emphasize here the fact that the form of Eq. (4) is simple no matter how complex the filters might be.

In principle, the estimate of motion at time t can be influenced by the entire movie that we have seen up to this point in time, that is by the whole history C(x, t − τ) with τ > 0. In real systems it typically makes sense to use a finite time window, and in the concrete example of the fly’s motion sensitive neurons, the relevant window for the computation can be on the order of 150 msec [31], while the absolute time resolution is at worst a few milliseconds [32]. This means that there are at least ∼ 50 time points which can enter our description of this movie. To compute motion we need access to at least two independent spatial pixels, so altogether the history C(x, t − τ) involves at least one hundred numbers: “the stimulus” is a point in a space of over 100 dimensions. Despite this complexity of the stimulus, Eq. (4) tells us that—if this model of motion computation is correct—only four stimulus parameters are relevant. The computation of motion involves a nonlinear combination of these parameters, but the parameters themselves are just linear combinations of the ∼ 10^2 stimulus variables. While our immediate concern is with the fly visual system, similar dimensionalities arise if we think about the stimuli which drive neurons in the early stages of mammalian visual cortex, or the acoustic stimuli of relevance for early stages of auditory processing.

More formally, we can think of the stimulus history C(x, t − τ) leading up to time t as a vector s_t in D ∼ 10^2 dimensions. If the motion sensitive neurons encode the output of the simplest Reichardt correlator, then the probability per unit time r(t) of generating an action potential will be of the form

\[
r(t) = \bar r\, G(s_1, s_2, \cdots, s_K), \tag{11}
\]
\[
s_1 = \hat v_1 \cdot \vec s_t, \tag{12}
\]
\[
s_2 = \hat v_2 \cdot \vec s_t, \tag{13}
\]
\[
\cdots, \tag{14}
\]

where in this case K = 4 and the vectors v̂_i describe the spatiotemporal filters from Eqs. (5–8). The central point is not that the function G has a simple form—it might not, especially when we consider the nonlinearities associated with spike generation itself—but rather that the number of relevant dimensions K is much less than the full stimulus dimensionality D.
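The structure of Eqs. (11–14) can be made concrete with a small numerical sketch. Everything specific here is an assumption for illustration: the dimensions D = 100 and K = 4, the randomly chosen orthonormal filters, and the correlator-like choice of G.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 100, 4                         # full vs. relevant dimensionality (illustrative)

# K orthonormal vectors spanning the relevant subspace, chosen at random here
V = np.linalg.qr(rng.normal(size=(D, K)))[0]

def firing_rate(stim, rbar=50.0):
    """r = rbar * G(s_1, ..., s_K), Eq. (11): only K projections of the
    D-dimensional stimulus matter.  This G is a made-up correlator-like
    nonlinearity; any nonnegative function of the s_i fits the framework."""
    s = V.T @ stim                    # the projections of Eqs. (12-13)
    return rbar * np.exp(s[0] * s[3] - s[1] * s[2])

stim = rng.normal(size=D)             # a random 100-dimensional stimulus
delta = rng.normal(size=D)
delta -= V @ (V.T @ delta)            # perturbation orthogonal to the relevant subspace

# Perturbing the stimulus outside the relevant subspace leaves the rate unchanged
print(firing_rate(stim), firing_rate(stim + delta))
```

The two printed rates agree, which is the operational meaning of the hypothesis: the cell is blind to the D − K dimensions outside the span of the v̂_i.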

As described here, the correlation computation involves just two photoreceptor elements. In motion energy models these individual detector elements are replaced by potentially more complex spatial receptive fields [17], so that M(x) in Eq. (3) can have a richer structure than that determined by photoreceptor optics; more generally we can imagine that rather than two identical but displaced spatial receptive fields we just have two different fields. The general structure is the same, however: two spatial samples of the movie C(x, t) are passed through two different temporal filters, and the resulting four variables are combined in some appropriate nonlinear fashion. Elaborated versions of both the Reichardt and motion energy models might include six or eight projections, but the space of relevant stimulus variables always is much smaller than the hundreds of dimensions describing the input stimulus movie.

Wide field motion sensitive neurons, such as the fly’s H1 cell which is the subject of the experiments below, are thought to sum the outputs of many elementary pairwise correlators to obtain an estimate of the global or rigid body rotational motion,

\[
\theta_{\rm est}(t) = \sum_{n=1}^{N} \theta^{(n)}_{\rm est}(t), \tag{15}
\]

where N is the total number of photoreceptors and the local estimators θ^(n) at each point n along the retina are defined in Eq. (4); we have written this as if all the photoreceptors are arrayed along a line, but there is a simple generalization to a fully two dimensional array. This computation takes as input not 2 × 50 samples of the movie that we project onto the retina, but rather N × 50, which can reach D ∼ 10^4. In this case the essential reduction of dimensionality is from this enormous space to one of only (N/2) × 4 ∼ 10^2 dimensions. While dimensionality reduction still is a key to understanding the computation in this case, we would like to start with a more modest problem. In principle we can probe the response of this estimator by stimulating only two photoreceptors [33]. Alternatively we can limit the dimensionality of the input by restricting stimuli to a single spatial frequency, or equivalently just two components which vary as sine and cosine of the visual angle x,

\[
I(x, t) = \bar I\, [1 + s(t) \sin(kx) + c(t) \cos(kx)], \tag{16}
\]

where I(x, t) is the light intensity with mean Ī, k/2π is the spatial frequency, and the dynamics of the stimulus is defined by the functions s(t) and c(t). The prediction of the correlator model Eq. (15) is that the motion estimate is again determined by only four stimulus parameters, and in the limit that the cell integrates over a large number of receptors we find the simple result

\[
\theta_{\rm est}(t) \propto \left[\int d\tau\, f(\tau)\, s(t-\tau)\right] \times \left[\int d\tau'\, g(\tau')\, c(t-\tau')\right]
- \left[\int d\tau\, f(\tau)\, c(t-\tau)\right] \times \left[\int d\tau'\, g(\tau')\, s(t-\tau')\right]. \tag{17}
\]

We emphasize that even with this simplification of the stimulus the known combination of temporal precision and integration times in motion computation mean that ∼ 10^2 samples of the functions s(t) and c(t) could be relevant to the probability of spike generation in the motion sensitive neurons.
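A sketch of this reduced stimulus, and of the correlator prediction of Eq. (17) acting on it: for a sine pattern rigidly displaced by an angle θ(t), the two stimulus components are s(t) = cos(kθ(t)) and c(t) = −sin(kθ(t)). The spatial frequency and the exponential filters below are hypothetical; only the difference in the two filters' time scales matters for the sign of the output.

```python
import numpy as np

dt = 1e-3
t = np.arange(0, 2.0, dt)
k = 2 * np.pi / 0.1                  # assumed spatial frequency

# Illustrative temporal filters for Eq. (17)
tau = np.arange(0, 0.15, dt)
f = np.exp(-tau / 0.01) / 0.01
g = np.exp(-tau / 0.05) / 0.05
flt = lambda sig, h: np.convolve(sig, h)[:len(sig)] * dt

def correlator_output(theta):
    """Eq. (17): the filtered versions of s(t) and c(t), combined
    antisymmetrically, for a pattern rigidly displaced by theta(t)."""
    s, c = np.cos(k * theta), -np.sin(k * theta)
    return flt(s, f) * flt(c, g) - flt(c, f) * flt(s, g)

fwd = correlator_output(+0.05 * t)   # constant-velocity drift one way
rev = correlator_output(-0.05 * t)   # the opposite way
print(np.mean(fwd[500:]), np.mean(rev[500:]))
```

The time-averaged outputs for the two directions of drift have opposite signs, as the antisymmetry of Eq. (17) requires; note that the stimulus history entering this computation is still ∼ 10^2 samples of s(t) and c(t).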

Thus far we have discussed models of how the visual system could estimate motion; one can also ask if there is a way that the system should estimate motion. In particular, for the problem of wide field motion estimation faced by the fly, we can ask how to process the signals coming from the array of photodetectors so as to generate an estimate of velocity which is as accurate as possible given the constraints of noise and blur in the photoreceptor array. This is a well posed theoretical problem, and the results are as follows [21]:

Low SNR. In the limit of low signal–to–noise ratios (SNR), the optimal estimator is a generalization of the correlator model in Eq. (15),

\[
\theta_{\rm est}(t) = \sum_{nm} \int d\tau\, d\tau'\, V_n(t-\tau)\, K_{nm}(\tau, \tau')\, V_m(t-\tau') + \cdots, \tag{18}
\]

where the higher order terms (···) include more complicated products of receptor voltages. More precisely, this is the leading term in a power series expansion, and at low SNR the leading term is guaranteed to dominate. The detailed structure of the kernels Knm depends on our assumptions about the statistical structure of the visual world, but the general correlator form is independent of these assumptions.

Intermediate SNR. As the signal–to–noise ratio increases, higher order terms in the expansion of Eq. (18) become important, and the kernels Knm also become modified so that the optimal estimator integrates over shorter times when the signals are more robust and when typical velocities are larger.

High SNR. At high SNR, under a broad range of assumptions about the statistics of the visual world the optimal estimator crosses over to approximate

\[
\theta_{\rm est}(t) = \frac{\sum_n [V_{n+1}(t) - V_n(t)] \cdot (dV_n(t)/dt)}{A + \sum_n [V_{n+1}(t) - V_n(t)]^2} \tag{19}
\]
\[
\sim \frac{\int dx\, [\partial C(x, t)/\partial x] \cdot [\partial C(x, t)/\partial t]}{A' + \int dx\, [\partial C(x, t)/\partial x]^2} \tag{20}
\]
\[
\rightarrow \frac{\partial C(x, t)/\partial t}{\partial C(x, t)/\partial x}, \tag{21}
\]

where A and A′ are constants that depend on the details of the image statistics and the last expression indicates schematically the limiting behavior at high contrast. In this limit the optimal estimator is just the ratio of temporal and spatial derivatives. At very high SNR there is no need to average over time to suppress noise, so we show the estimator as being an instantaneous function of the receptor voltages and their derivatives; more generally at finite SNR the form of the estimator is the same but the receptor responses need to be smoothed over time.

Perhaps the most interesting result of the theory is that both the correlator model and a ratio of derivatives model emerge as opposite limiting cases of the general estimation problem. The ratio of derivatives is in some sense the naive solution to the problem of motion estimation, since if the movie on the retina has the form C(x, t) = F(x − vt), corresponding to pure rigid motion, then it is easy to see that the velocity v = −(∂C/∂t)/(∂C/∂x). This isn’t the general solution because the combination of differentiation and division tends to amplify noise; thus the ratio of derivatives emerges only as the high SNR limit of the general problem. At the opposite extreme, the correlator model is maximally robust against noise although it does make well known systematic errors by confounding the true velocity with the contrast and spatial structure of the image; the theory shows that these errors—which have correlates in the responses of the motion sensitive neurons, as well as in behavioral experiments—may emerge not as limitations of neural hardware but rather as part of the optimal strategy for dealing with noise.
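The claim that rigid motion makes velocity recoverable from derivatives is easy to check numerically. The sketch below uses an arbitrary smooth pattern F and a regularized ratio in the spirit of Eq. (20); the pattern, the small constant A, and the finite-difference scheme are all illustrative choices.

```python
import numpy as np

v_true = 2.0                              # true pattern velocity
x = np.linspace(0.0, 10.0, 1001)
dx = x[1] - x[0]
dt = 1e-3

F = lambda u: np.exp(-((u - 3.0) ** 2))   # arbitrary smooth spatial pattern
C_now = F(x)                              # C(x, t) = F(x - v t) at t = 0
C_next = F(x - v_true * dt)               # ... and one time step later

dC_dt = (C_next - C_now) / dt             # temporal derivative (finite difference)
dC_dx = np.gradient(C_now, dx)            # spatial derivative

# Regularized estimator in the spirit of Eq. (20); since v = -(dC/dt)/(dC/dx)
# for C = F(x - v t), a minus sign recovers the velocity in these conventions.
A = 1e-6                                  # small regularizing constant, illustrative
v_est = -np.sum(dC_dx * dC_dt) / (A + np.sum(dC_dx ** 2))
print(v_est)                              # close to v_true
```

With noiseless input the estimate lands on the true velocity; adding photon-like noise to C would make the bare ratio unstable wherever ∂C/∂x is small, which is exactly why the constant in the denominator, and the correlator limit, appear in the optimal theory.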

Although the theory of optimal estimation gives a new perspective on the correlator model, one can hardly count the well established correlator–like behavior of motion sensitive neurons as a success of the theory. The real test would be to see the crossover from correlator–like behavior to the ratio of derivatives. Notice from Eq. (20) that if the overall SNR is large but the contrast is (instantaneously) small, the optimal estimate is again correlator–like because the contrast dependent term in the denominator can be neglected. Thus even under a statistically stationary set of conditions corresponding to high SNR, we should be able to see both the correlator, with its multiplicative nonlinearity, and the divisive nonlinearity from the ratio of derivatives. Behaviors consistent with this prediction have been observed [34], but we would like a more direct demonstration.

If we consider stimuli of the simple form in Eq. (16), then it is easy to see that the motion estimator in Eq. (20) can be written as

\[
\theta_{\rm est}(t) \approx \frac{s(t) \cdot [dc(t)/dt] - c(t) \cdot [ds(t)/dt]}{B + [s^2(t) + c^2(t)]}, \tag{22}
\]

where again B is a constant. More generally, if the receptor signals are all smoothed in the time domain by a filter f(τ), then by analogy with Eqs. (5–8), we can define four relevant dimensions of the input movie,

\[
s_1 = \int d\tau\, f(\tau)\, s(t-\tau), \tag{23}
\]
\[
s_2 = \int d\tau\, f(\tau)\, c(t-\tau), \tag{24}
\]
\[
s_3 = \int d\tau\, f(\tau)\, \frac{ds(t-\tau)}{dt} = \int d\tau\, \frac{df(\tau)}{d\tau}\, s(t-\tau), \tag{25}
\]
\[
s_4 = \int d\tau\, f(\tau)\, \frac{dc(t-\tau)}{dt} = \int d\tau\, \frac{df(\tau)}{d\tau}\, c(t-\tau), \tag{26}
\]

and then

\[
\theta_{\rm est}(t) \approx \frac{s_1 s_4 - s_2 s_3}{B + [s_1^2 + s_2^2]}. \tag{27}
\]

Thus the optimal estimator again is a function of four relevant dimensions out of the high dimensional space of input signals; these dimensions are built by operating on s(t) and c(t) with two filters, where one is the time derivative of the other, and then the four dimensions are combined nonlinearly. By analogy with Eq. (4), we can identify the “correlator variable” Vcorr constructed from these four variables,

\[
V_{\rm corr} = s_1 s_4 - s_2 s_3, \tag{28}
\]

and the full optimal estimator normalizes this correlator variable through a divisive nonlinearity,

\[
\theta_{\rm est} = \frac{V_{\rm corr}}{B + D}, \tag{29}
\]


where D = s_1^2 + s_2^2 approximates the mean square spatial derivative of the image. Note that the different dimensions enter in highly symmetric combinations.
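A numerical sketch of Eqs. (27–29) shows the key qualitative signature: the divisive normalization makes the output nearly contrast-invariant once the contrast term in the denominator dominates the constant B. The smoothing filter, the drift frequency, and the value of B are all hypothetical choices.

```python
import numpy as np

dt = 1e-3
tau = np.arange(0, 0.15, dt)
f = np.exp(-tau / 0.02) / 0.02            # assumed smoothing filter f(tau)
flt = lambda sig, h: np.convolve(sig, h)[:len(sig)] * dt

def theta_opt(s, c, B=0.01):
    """Normalized correlator of Eqs. (27-29)."""
    s1, s2 = flt(s, f), flt(c, f)                       # Eqs. (23-24)
    s3, s4 = np.gradient(s1, dt), np.gradient(s2, dt)   # Eqs. (25-26): derivative filters
    V_corr = s1 * s4 - s2 * s3                          # Eq. (28)
    return V_corr / (B + s1**2 + s2**2)                 # Eq. (29)

t = np.arange(0, 2.0, dt)
omega = 4.0                               # drift frequency of the pattern (rad/s)
out = {}
for contrast in (1.0, 3.0):
    s = contrast * np.cos(omega * t)
    c = -contrast * np.sin(omega * t)
    out[contrast] = np.mean(theta_opt(s, c)[500:])
print(out)   # nearly the same value at both contrasts, magnitude ~ omega
```

A pure correlator (the numerator Vcorr alone) would instead grow as the square of the contrast; the divisive term removes that dependence, which is the crossover between multiplicative and divisive nonlinearities discussed in the text.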

The goal of the experiments described below is in fact to test the predictions from Eq. (27), but we can also view this as an example of the more general problem of searching for relevant low dimensional structures within the high dimensional space of inputs to which a neuron responds. Related combinations of multiplicative and divisive nonlinearities arise quite generally in models for the “normalization” of neural responses in visual cortex [18, 35] and in particular in the normalization models applied to the motion sensitive cells of primate area MT [19]. Recent work [36] suggests that this sort of computation can be derived for motion estimation in primate vision from the same sorts of optimization arguments used previously for insect vision [21].

Although our focus here has been on the problem of motion estimation in vision, it is important to note that the same kind of dimensionality reduction provides a precise formulation of feature selectivity in other systems as well. The classic example of center–surround organization in retinal ganglion cells [1] can be thought of as projecting the image (or movie) onto two dimensions corresponding to spatial filters with different radii. Truly linear center–surround behavior would then correspond to the cell responding to just a linear combination (difference) of these two dimensions, so that really only one combined projection is relevant, while more subtle forms of interaction between center and surround (e.g., shunting inhibition) would still correspond to a projection onto two dimensions, but the nonlinear operation which relates firing probability to location in this two dimensional space would be more complicated. Similarly, the oriented receptive fields in cortical simple cells are described as having multiple subregions [2], but if there are nonlinear interactions among the subregions then effectively there is no single projection of the stimulus to which the cell responds; rather the small number of subregions defines a projection onto a low dimensional subspace of images, and there may be nontrivial computations within this subspace. Indeed, computational models for the detection of contours and object boundaries have precisely this sort of nonlinear interaction among linear receptive subfields [37].

More generally, while filtering and feature selectivity sometimes are used as synonyms, actually detecting a feature requires a logical operation, and in fact interesting features may be defined by conjunctions of logical operations. These more sophisticated notions of feature selectivity are not summarized by a single filter or receptive field. These computations do fit, however, within the framework suggested here, as nonlinear (perhaps even hard threshold) operations on a low dimensional projection of the stimulus.

III. SEARCHING FOR LOW DIMENSIONAL STRUCTURES

We have argued that models of motion estimation, and perhaps other examples of neural feature selectivity, belong to a class of models in which neurons are sensitive only to a low dimensional projection of their high dimensional input signals. One approach would be to find the best model in this class to describe particular neurons. It would, however, be more compelling if we could provide direct evidence for the applicability of the class of models before finding the best model within the class. The essential hypothesis of Eq. (12) is that neurons are sensitive to a subspace of inputs with dimensionality K much smaller than the full stimulus dimensionality D. This suggests a series of questions:

1. Can we make a direct measurement of K, the number of relevant dimensions?

2. Can we find the set of vectors vn that span this relevant subspace?

3. Can we map the nonlinearity G(·) that the neuron implements within this space?

In the simple case where there is only one relevant dimension, the idea of reverse or triggered correlation [12, 38] allows us to find this one special direction in stimulus space provided that we choose our ensemble of stimuli correctly. If we want to test a model in which there are multiple stimulus dimensions we need to compute objects that have a chance of defining more than one relevant vector. The basic suggestion comes from early work on the fly’s motion sensitive visual neurons [39]. Instead of computing the average stimulus that precedes a spike, we can characterize the fluctuations around the average by their covariance matrix. Along most directions in stimulus space, this covariance matrix has a structure determined only by correlations in the stimulus. There are a small number of directions, however, along which the stimuli that trigger spikes have a different variance than expected a priori. The fact that the number of directions with different variances is small provides direct evidence that the cell is sensitive only to a small number of projections. Further, identifying the directions along which the variance is different provides us with a coordinate system that spans the set of relevant projections. The following arguments, leading to Eq. (44), formalize this intuition, answering the first two questions above. Then we turn to an analysis of the nonlinearities in G, leading to Eq. (50).


It is useful to think about the spike train as a sum of unit impulses,

ρ(t) = ∑i δ(t − ti),  (30)

where the ti are the spike times. Then the quantities of interest are correlation functions between ρ(t) and the stimulus vector ~st; recall that this vector can represent both the spatial and temporal variations in the stimulus movie. As an example, the average stimulus preceding a spike is

〈~st^spike〉 = (1/r)〈ρ(t)~st〉,  (31)

where r is the mean spike rate and 〈· · ·〉 denotes an average over a very long experiment. If we repeat the same stimulus for many trials and average the resulting spike trains, then we will obtain the probability per unit time r(t) that the cell spikes, where t is measured by a clock synchronized with the repeating stimulus as in the usual poststimulus time histogram. Thus

〈ρ(t)〉trials = r(t). (32)

The spike rate r(t) is an average of the spike train ρ(t) over all the noise in the neural response, so that when we need to compute averages over a long experiment, we imagine doing this formally by first averaging over the noise with the stimulus held fixed, and then averaging over the distribution of signals; for example,

〈ρ(t)~st〉 = 〈r(t)~st〉s, (33)

where 〈· · ·〉s denotes an average over the distribution of signals presented in the experiment.

To find multiple relevant directions we consider the matrix of second moments that characterizes the stimuli leading to a spike [39]. If the components of the stimulus vector ~st are written as st(i), with the index i = 1, 2, · · · , D running over the full dimensionality of the stimulus space, then the second moments of stimuli preceding a spike are

Cspike(i, j) ≡ 〈st^spike(i) st^spike(j)〉  (34)
            = (1/r)〈ρ(t)st(i)st(j)〉.  (35)

From the arguments above this can be rewritten as

Cspike(i, j) = (1/r)〈r(t)st(i)st(j)〉.  (36)

It is crucial that Cspike(i, j) is something we can estimate directly from data, looking back at the stimuli that lead to a spike and computing the matrix of their second moments according to the definition in Eq. (34). On the other hand, Eq. (36) gives us a way of relating these computations from the data to underlying models of how the spike rate r(t) depends on the stimulus.

In general it is hard to go further than Eq. (36) analytically. More precisely, with stimuli chosen from an arbitrary distribution the relation between Cspike and some underlying model of the response can be arbitrarily complicated [40]. We can make progress, however, if we are willing to restrict our attention to stimuli that are drawn from a Gaussian distribution as in reverse correlation analyses. It is important to realize that this restriction, while significant, does not specify a uniquely “random” stimulus. Gaussian does not imply white; we can construct an arbitrary correlation function for our stimuli, including correlation functions modelled after natural signals [41]. Further, we can construct stimuli which are nonlinear functions of underlying “hidden” Gaussian variables; these stimuli can have a complex and even naturalistic structure (see, for example, Ref. [27]), and such hidden variable methods may be useful as a bridge to more general application of the dimensionality reduction idea.

If the distribution of signals is Gaussian, then averages such as Eq. (36) are straightforward to compute. The key step is the following identity: If ~x = x1, x2, · · · , xD is a vector drawn from a multidimensional Gaussian distribution with zero mean, and f(~x) is a differentiable function of this vector, then

〈xi f(~x)〉 = ∑_{j=1}^{D} Cij 〈∂f(~x)/∂xj〉,  (37)

where Cij = 〈xi xj〉 is the covariance matrix of ~x. This can be applied twice:

〈xi xj f(~x)〉 = ∑_{k=1}^{D} Cik 〈∂[xj f(~x)]/∂xk〉
             = ∑_{m=1}^{D} Cim [ δjm 〈f(~x)〉 + 〈xj ∂f(~x)/∂xm〉 ]  (38)
             = Cij 〈f(~x)〉 + ∑_{m,n=1}^{D} Cim Cjn 〈∂²f(~x)/∂xm ∂xn〉.  (39)
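As an aside, the identity Eq. (37) is easy to check numerically. The sketch below is a hypothetical three dimensional example, not taken from the paper: it draws correlated Gaussian samples and compares both sides of Eq. (37) by Monte Carlo for f(~x) = exp(a · ~x), whose gradient is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3
A = rng.normal(size=(D, D))
C = A @ A.T / D                      # an arbitrary positive definite covariance
x = rng.multivariate_normal(np.zeros(D), C, size=400_000)

a = np.array([0.3, -0.2, 0.1])       # hypothetical test direction
f = np.exp(x @ a)                    # f(x) = exp(a.x), so df/dx_j = a_j f(x)

lhs = (x * f[:, None]).mean(axis=0)          # <x_i f(x)>
rhs = C @ (f[:, None] * a).mean(axis=0)      # sum_j C_ij <df/dx_j>
assert np.allclose(lhs, rhs, atol=0.02)      # both sides agree to sampling error
```

The same Monte Carlo comparison, applied to ⟨xi xj f⟩, verifies the iterated form in Eq. (39).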

We can use this in evaluating Cspike from Eq. (36) by identifying the vector ~x with the stimulus ~st and the spike rate r(t) with the function f(~x). The result is

Cspike(i, j) = Cprior(i, j) + ∆C(i, j),  (40)

∆C(i, j) = (1/r) Cprior(i, k) 〈∂²r(t)/∂st(k)∂st(l)〉 Cprior(l, j),  (41)


where we sum over the repeated indices k and l, and Cprior(i, j) is the second moment of stimuli averaged over the whole experiment,

Cprior(i, j) = 〈st(i)st(j)〉.  (42)

Further, if the rate has the ‘low dimensional’ form of Eq. (12), then the derivatives in the full stimulus space reduce to derivatives of the function G with respect to its K arguments:

∂²r(t)/∂st(k)∂st(l) = r [∂²G(s1, s2, · · · , sK)/∂sα ∂sβ] vα(k)vβ(l),  (43)

where as with the stimulus vector ~st we use vα(i) to denote the components of the projection vectors ~vα; again the index i runs over the full dimensionality of the stimulus, i = 1, 2, · · · , D, while the index α runs over the number of relevant dimensions, α = 1, 2, · · · , K, and we sum over repeated indices α and β.

Putting these results together, we find an expression for the difference ∆C between the second moments of stimuli that lead to a spike and stimuli chosen at random:

∆C(i, j) = [Cprior(i, k)vα(k)] A(α, β) [vβ(l)Cprior(l, j)],  (44)

A(α, β) = 〈∂²G(s1, s2, · · · , sK)/∂sα ∂sβ〉,  (45)

and we sum over all repeated indices α, β, k and l in Eq. (44). There are several important points which follow from these expressions.

First, Eq. (44) shows that ∆C(i, j), which is a D × D matrix, is determined by the K × K matrix A(α, β) formed from the second derivatives of the function G. This means that ∆C(i, j) can have only K nonzero eigenvalues, where K is the number of relevant stimulus dimensions. Thus we can test directly the hypothesis that the number of relevant dimensions is small just by looking at the eigenvalues of ∆C. Further, this test is independent of assumptions about the nature of the nonlinearities represented by the function G.
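This eigenvalue test is easy to demonstrate in simulation. The sketch below is a hypothetical model cell, not the H1 data: white Gaussian stimuli, spikes generated from a nonlinear function of K = 2 projections, and the spectrum of ∆C = Cspike − Cprior showing exactly two eigenvalues standing out from the sampling-noise background.

```python
import numpy as np

rng = np.random.default_rng(1)
D, T = 30, 300_000
S = rng.normal(size=(T, D))                  # white stimuli, so C_prior ~ identity

# two hypothetical relevant directions (orthonormalized for clarity)
t = np.linspace(0, 1, D)
v1 = np.sin(np.pi * t); v1 /= np.linalg.norm(v1)
v2 = np.cos(2 * np.pi * t); v2 -= (v2 @ v1) * v1; v2 /= np.linalg.norm(v2)

s1, s2 = S @ v1, S @ v2
rate = 0.05 * np.exp(0.8 * s1 * s2 - 0.3 * s2**2)   # nonlinear in 2 projections only
spikes = rng.random(T) < np.clip(rate, 0, 1)        # Bernoulli spike generation

X = S[spikes]
dC = X.T @ X / len(X) - S.T @ S / T                 # Delta-C of Eq. (40)
w_sorted = np.sort(np.abs(np.linalg.eigvalsh(dC)))[::-1]
# two eigenvalues stand out; the remaining D-2 form a narrow noise band
assert w_sorted[1] > 3 * w_sorted[2]
```

With white stimuli the deblurring step is trivial, and the two leading eigenvectors of dC span (up to sampling noise) the plane defined by v1 and v2.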

Second, the eigenvectors of ∆C associated with the nonzero eigenvalues are linear combinations of the vectors ~vα, blurred by the correlations in the stimulus itself. More precisely, if we look at the set of nontrivial eigenvectors ~uα, with α = 1, 2, · · · , K, and undo the effects of stimulus correlations to form the vectors ~v′α = [Cprior]⁻¹ · ~uα, then we will find that these vectors span the same space as the vectors ~vα which define the relevant subspace of stimuli.
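A minimal sketch of this deblurring step (toy matrices with one relevant dimension; in practice Cprior is estimated from the stimulus ensemble and its inverse may need regularization):

```python
import numpy as np

def deblur(C_prior, u, ridge=1e-6):
    """Undo stimulus correlations: map an eigenvector u of Delta-C to
    v' = C_prior^{-1} u (the ridge term guards a poorly conditioned inverse)."""
    D = C_prior.shape[0]
    return np.linalg.solve(C_prior + ridge * np.eye(D), u)

# toy check: if Delta-C = (C v) A (C v)^T as in Eq. (44) with one relevant
# direction v, deblurring the top eigenvector recovers v up to scale and sign
rng = np.random.default_rng(7)
D = 10
M = rng.normal(size=(D, D))
C = M @ M.T / D + 0.5 * np.eye(D)            # a correlated-stimulus covariance
v = rng.normal(size=D); v /= np.linalg.norm(v)
dC = np.outer(C @ v, C @ v)                  # K = 1, A = 1
w, U = np.linalg.eigh(dC)
u = U[:, np.argmax(np.abs(w))]               # the single nontrivial eigenvector
v_rec = deblur(C, u)
v_rec /= np.linalg.norm(v_rec)
assert min(np.linalg.norm(v_rec - v), np.linalg.norm(v_rec + v)) < 1e-4
```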

Third, we note that the eigenvalue analysis of ∆C is not a principal components analysis of the stimulus probability distribution. In particular, unless the function G were of a very special form, the distribution of stimuli that lead to a spike will be strongly non–Gaussian, and so a principal components analysis of this distribution will not capture its structure. Further, directions in stimulus space that have small variance can nonetheless make large contributions to ∆C. Note also that the eigenvalues of ∆C can be either positive or negative, while of course the spectrum of a covariance matrix (associated with the principal components of the underlying distribution) always is positive.

Finally, the eigenvectors (or their deblurred versions) that emerge from this analysis are useful only because they define a set of dimensions spanning the space of relevant stimulus features. Once we are in this restricted space, we are free to choose any set of coordinates. In this sense, the notion of finding “the” linear filters or receptive fields that characterize the cell becomes meaningless once we leave behind a model in which only one stimulus dimension is relevant. The only truly complete characterization is in terms of the full nonlinear input/output relation within the relevant subspace.

Once we have identified a subspace of stimuli s1, s2, · · · , sK, we actually can map the nonlinear function G directly provided that K is not too large. We recall that the spike rate r(t) is the probability per unit time that a spike will occur at time t, given the stimulus ~st leading up to that time. Formally,

r(t) = P[spike @ t | ~st].  (46)

From Eq. (12), the rate depends only on K projections of the stimulus, and so

r(t) = P[spike @ t | s1, s2, · · · , sK].  (47)

But the probability of a spike given the stimulus can be rewritten using Bayes’ rule:

P[spike @ t | s1, s2, · · · , sK] = (P[spike @ t] / P[s1, s2, · · · , sK]) × P[s1, s2, · · · , sK | spike @ t].  (48)

In the same way that the function P[spike @ t | ~st] gives the time dependent spike rate r(t), the number P[spike @ t] is just the average spike rate r. Thus the nonlinear computation within the K–dimensional relevant subspace that determines the neural response can be found from the ratio of probability distributions in this subspace,

r(t) = r G(s1, s2, · · · , sK)  (49)
     = r · P[s1, s2, · · · , sK | spike @ t] / P[s1, s2, · · · , sK].  (50)

FIG. 2: A two second segment of the stimulus and corresponding neural response. The stimulus movie consists of vertical stripes. Here each horizontal slice indicates the pattern of these stripes at one instant of time. Plus signs at the right indicate spike times from H1. Brief periods with coherent motion to the left are correlated with the spikes, while clear motions to the right inhibit the neuron. The challenge of the present analysis is to make more precise this connection between features of the movie and the probability of spiking.

Now the full distribution P[s1, s2, · · · , sK] is known, since this defines the conditions of the experiment; further, we have considered situations in which this distribution is Gaussian and hence is defined completely by a K × K covariance matrix. The probability distribution of stimuli given a spike, the response–conditional ensemble [39], can be estimated by sampling: each time we see a spike, we can look back at the full stimulus ~st and form the K projections s1, s2, · · · , sK; this set of projections at one spike time provides one sample drawn from the distribution P[s1, s2, · · · , sK | spike @ t], and from many such samples we can estimate the underlying distribution. This Bayesian strategy for mapping the nonlinear input/output relation provides a large dynamic range proportional to the total number of spikes observed in the experiment; see, for example, Ref. [26].
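As a sanity check of the sampling strategy in Eq. (50), the sketch below simulates a cell with one relevant projection and a known (hypothetical) nonlinearity, then recovers the rate as the mean rate times the histogram ratio P(s | spike)/P(s):

```python
import numpy as np

rng = np.random.default_rng(2)
n, dt = 500_000, 0.002                      # hypothetical 2 ms time bins
s = rng.normal(size=n)                      # one stimulus projection, unit variance
true_rate = 40.0 * np.exp(-(s - 1.0)**2)    # assumed nonlinearity, in spikes/s
spike = rng.random(n) < true_rate * dt      # Bernoulli spiking in each bin

edges = np.linspace(-3, 3, 31)
centers = 0.5 * (edges[:-1] + edges[1:])
p_spike, _ = np.histogram(s[spike], edges, density=True)   # P(s | spike)
p_prior, _ = np.histogram(s, edges, density=True)          # P(s)
r_est = spike.mean() / dt * p_spike / p_prior              # Eq. (50)

mask = np.abs(centers) <= 2                 # compare where both histograms are sampled
assert np.max(np.abs(r_est[mask] - 40.0 * np.exp(-(centers[mask] - 1.0)**2))) < 4.0
```

The accuracy of the recovered nonlinearity is limited only by bin width and by the number of spikes, which is the sense in which the dynamic range grows with the size of the data set.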

We emphasize that the procedure described above rests not on a family of parameterized models which we fit to the data, but rather on the idea that if the dimensionality of the relevant subspace is sufficiently small then we don’t really need a model. In practice we have to assume only that the relevant functions are reasonably smooth, and then the combination of eigenvalue analysis and Bayesian sampling provides explicit answers to the three questions raised at the beginning of this section.

IV. AN EXPERIMENT IN H1

We apply these ideas to an experiment on the fly’s motion sensitive H1 neuron, where we would like to dissect the different features of motion computation discussed in Section II. A segment of the visual stimulus and H1’s response is shown in Fig. 2. While the most elementary model of motion computation involves temporal comparisons between two pixels, H1 is a wide field neuron and so is best stimulated by spatially extended stimuli. To retain the simplicity of the two–pixel limit in an extended stimulus we consider here a stimulus which has just one spatial frequency, as in Eq. (16). For technical details of stimulus generation and neural recording see Appendix A.

To make use of the results derived above, we choose s(t) and c(t) to be Gaussian stochastic processes, as seen in Fig. 3, with correlation times of 50 msec. Similar experiments with correlation times from 10–100 msec lead to essentially the same results described here, although there is adaptation to the correlation time, as expected from earlier work [25, 34]. The problem we would like to solve is to describe the relation between the stimulus movie I(x, t) and the spike arrival times (cf. Fig. 2).
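For reference, a unit-variance Gaussian signal with exponential autocorrelation exp(−|τ|/τc) can be generated as a first order autoregressive (Ornstein–Uhlenbeck) process; a minimal sketch, assuming a 4 msec time step together with the 50 msec correlation time used here:

```python
import numpy as np

def correlated_gaussian(n, dt=0.004, tau_c=0.050, rng=None):
    """Unit-variance Gaussian process with autocorrelation exp(-|tau|/tau_c)."""
    rng = rng or np.random.default_rng()
    lam = np.exp(-dt / tau_c)                    # AR(1) coefficient
    x = np.empty(n)
    x[0] = rng.normal()
    for i in range(1, n):
        x[i] = lam * x[i - 1] + np.sqrt(1.0 - lam**2) * rng.normal()
    return x

s = correlated_gaussian(100_000, rng=np.random.default_rng(3))
c = correlated_gaussian(100_000, rng=np.random.default_rng(4))
```

The lag-one correlation of the result is exp(−dt/τc), and longer lags decay exponentially, matching the autocorrelations shown in Fig. 3.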

Simple computation of the spike–triggered average movie produces no statistically significant results: the cell is sensitive to motion, and invariant to the overall contrast of the movie, so that the stimulus generated by the transformation (s, c) → (−s, −c) will produce indistinguishable responses. This being said, we want to proceed with the covariance matrix analysis.

FIG. 3: Statistics of the visual stimulus. Probability distribution of s(t) and c(t) compared with a Gaussian. Because these signals represent image contrast, there is inevitably some clipping at large amplitude (light intensity cannot be negative), but this affects only ∼ 1% of the signals. At right, the autocorrelations of the signals, 〈s(t)s(t + τ)〉 (circles) and 〈c(t)c(t + τ)〉 (squares), are almost perfect exponential decays exp(−|τ|/τc), τc = 50 msec.

FIG. 4: Distribution of interspike intervals. Solid line shows the observed distribution collected in 2 msec bins. Dashed line is an exponential fit to the tail of the distribution. Exponential decay of the interval distribution means that successive spikes are occurring independently, and the data indicate that one must wait ∼ 40 msec for this independence to be established. Further analysis is done only with these “isolated spikes.”

From previous work we know that different patterns of spikes in time can stand for very different motion trajectories [39]. From the present point of view, this connection to spike patterns is not a question about the nature of the motion computation but rather about how the output of this computation is represented in the spike train. To simplify the discussion, we will focus on spikes that occur in relative isolation from previous spikes. Specifically, when we look at the interspike interval distribution in Fig. 4, we see that for intervals longer than ∼ 40 msec the distribution has the form of a decaying exponential. This is what we expect if after such long intervals spikes are generated independently without memory or correlation to previous spikes. More colloquially, spikes in this exponential regime are being generated independently in response to the stimulus, and not in relation to previous spikes. All further analysis is done using these isolated spikes; for a related discussion in the context of model neurons see Ref. [42].
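Selecting isolated spikes is straightforward once spike times are in hand; the sketch below (hypothetical spike times, with the ∼ 40 msec threshold suggested by Fig. 4) keeps only spikes preceded by a sufficiently long silent interval:

```python
import numpy as np

def isolated_spikes(spike_times, min_interval=0.040):
    """Keep spikes whose preceding interspike interval is >= min_interval (s).
    The first spike has no predecessor and is kept by convention."""
    t = np.sort(np.asarray(spike_times))
    keep = np.concatenate(([True], np.diff(t) >= min_interval))
    return t[keep]

times = [0.010, 0.015, 0.100, 0.105, 0.180]
iso = isolated_spikes(times)           # keeps the spikes at 0.010, 0.100, 0.180 s
```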

Qualitative examination of the change in stimulus covariance in the neighborhood of an isolated spike, ∆C, reveals that it is a very smooth matrix, consistent with the idea that it is composed out of a small number of significant eigenvectors. To quantify these observations, and proceed along the analysis program outlined in the previous section, we diagonalize ∆C to give the eigenvalues shown in Fig. 5. In trying to plot the results there is a natural question about units. Because the stimuli themselves are not white, different stimulus components have different variances. The eigenvalue analysis of ∆C provides a coordinate system on the stimulus space, and the eigenvalues themselves measure the change in stimulus variance along each coordinate when we trigger on a spike. Small changes in variance along directions with large total variance presumably are not very meaningful, while along a direction with small variance even a small change could mean that the spike points precisely to a particular value of that stimulus component. This suggests measuring the eigenvalues of ∆C in units of the stimulus variance along each eigendirection, and this is what we do in Fig. 5. This normalization has the added value (not relevant here) that one can describe the stimulus in terms of components with different physical units and still make meaningful comparisons among the different eigenvalues. Figure 5 shows clearly that four directions in stimulus space stand out relative to a background of 196 other dimensions. The discussion in Section II of models for motion estimation certainly prepares us to think about four special directions, but before looking at their structure we should answer questions about their statistical significance.

In practice we form the matrix ∆C from a finite amount of data; even if spikes and stimuli were completely uncorrelated, this finite sampling gives rise to some structure in ∆C and to a spectrum of eigenvalues which broadens around zero. One way to check the significance of eigenvalues is to examine the dependence of the whole spectrum on the number of samples. Out of the ∼ 8000 isolated spikes which we have collected in this experiment, we show in the left panel of Fig. 6 what happens if we choose 10%, 20%, ... , 90% at random to use as the basis for constructing ∆C. The basic structure of four modes separated from the background is clear once we have included roughly half the data, and the background seems (generally) to narrow as we include more data.

A different approach to statistical significance is to generate a set of random data that have comparable statistical properties to the real data, breaking only the correlations between stimuli and spikes. If we shift all the spikes forward in time by several seconds relative to the stimulus, then since the correlations in the stimulus itself are short–lived, there will be no residual correlation between stimulus and spikes, but all the internal correlations of these signals are untouched. If we choose the shift times at random, with a minimum value, then we can generate many examples of uncorrelated stimuli and spikes, and find the eigenvalue spectra of ∆C in each example. Taken together, a large number of these examples gives us the distribution of eigenvalues that we expect to arise from noise alone, and this distribution is shown in cumulative form in the right panel of Fig. 6. A crucial point, expected from the analytic analysis of eigenvalues in simpler cases of random matrices [43], is that the distribution of eigenvalues in the pure noise case has a sharp edge rather than a long tail, so that the band of eigenvalues in a single data set will similarly have a fairly definite endpoint rather than a long tail of ‘stragglers’ which could be confused with significant dimensions. While larger data sets might reveal more significant dimensions, Fig. 6 indicates that the present data set points to four and only four significant stimulus dimensions out of a total of 200.
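When stimuli and spikes are binned on a common clock, the time-shift surrogate is a few lines of code. The sketch below uses hypothetical binned data and circular shifts (a common variant of the forward shifts described in the text) to build the null distribution of ∆C eigenvalues:

```python
import numpy as np

def null_spectrum(S, spike_idx, n_shifts=20, min_shift=1000, rng=None):
    """Eigenvalues of Delta-C after breaking stimulus-spike correlations by
    random circular shifts of the spike train (pure sampling noise)."""
    rng = rng or np.random.default_rng()
    T, D = S.shape
    C_prior = S.T @ S / T
    eigs = []
    for _ in range(n_shifts):
        shift = int(rng.integers(min_shift, T - min_shift))
        X = S[(spike_idx + shift) % T]          # spikes now uncorrelated with S
        eigs.append(np.linalg.eigvalsh(X.T @ X / len(X) - C_prior))
    return np.concatenate(eigs)

rng = np.random.default_rng(5)
S = rng.normal(size=(20_000, 20))               # hypothetical binned stimuli
spike_idx = rng.choice(20_000, size=500, replace=False)
w = null_spectrum(S, spike_idx, rng=rng)
# the null spectrum forms a band with a sharp edge, not a long tail
```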

In Fig. 7 we show the eigenvectors of ∆C associated with the four significant nonzero eigenvalues. We can think of these eigenvectors as filters in the time domain which are applied to the two spatial components of the movie s(t) and c(t); the four eigenvectors thus determine eight filters. We see that among these eight filters there are some similarities. For example, the filter applied to c(t) in eigenvector (a) looks very similar to that which is applied to s(t) in eigenvector (b), the filter applied to s(t) in eigenvector (c) is close to being the negative of that which is applied to c(t) in eigenvector (d), and so on. Some of these relations are a consequence of the approximate invariance of H1’s response to static translations of the visual inputs, and this is related to the fact that the significant eigenvalues form two nearly degenerate pairs. In fact the similarity of the different filters is even greater than required by translation invariance, as indicated by the singular value decomposition shown in the lower panels of Fig. 7: the eight temporal filters which emerge from the eigenvalue analysis are constructed largely from only two underlying filters which account for 80% of the variance among the filter waveforms. These two filters are extended in time because they still contain [as expected from Eq. (44)] a “blurring” due to intrinsic correlations in the stimulus ensemble. When we deconvolve these correlations we inevitably get a noisy result, but it is clear that the two filters form a pair, one which smooths the input signal over a ∼ 40 msec window, and one which smooths the time derivative of the input signal over the same window.

FIG. 5: Eigenvalues of ∆C. Stimuli are represented as segments of s(t) and c(t) in windows of ±200 msec surrounding isolated spikes, sampled at 4 msec resolution; ∆C thus is a 200 × 200 matrix. As explained in the text, the eigenvalue measures the spike–triggered change in stimulus variance along a particular direction in stimulus space, while the eigenvector specifies this direction. Since the stimulus itself has correlations, different directions have different variances a priori, and we express the change in variance as a fraction of the total variance along the corresponding direction. There are four dimensions which stand out clearly from the background.

FIG. 6: Testing the significance of the eigenvalue distributions. At left we show the evolution of the eigenvalue spectrum as we analyze larger data sets. The four dimensions which stand out from the background in the full data set also have stable eigenvalues as a function of data set size, in contrast to the background of eigenvalues which come from a distribution which narrows as more data is included. At right we show the cumulative probability distribution of eigenvalues from surrogate data in which the correlations between stimulus and spike train have been broken by random time shifts, as explained in the text. Eigenvalues with absolute value larger than 0.1 arise roughly 1% of the time, but there is a very steep edge to the distribution such that absolute values larger than 0.13 occur only 0.01% of the time. The sharp edge in the random data is essential in identifying eigenvalues which stand out from the background, and this edge is inherited from the simpler problem of eigenvalues in truly random matrices [43].

The results thus far already provide confirmation for important predictions of the models discussed in Section II. With stimuli of the form used here, these models predict that the motion estimator is constructed from four relevant stimulus dimensions, that these dimensions in turn are built from just two distinct temporal filters applied to two different spatial components of the visual input, and that one filter is the time derivative of the other [cf. Eq’s. (23–27)]. All three of these features are seen in Figs. 5 and 7.

To proceed further we sample the probability distributions along the different stimulus dimensions, as explained in the discussion surrounding Eq. (50). Although the reduction from 200 to 4 stimulus dimensions is useful, it still is difficult to examine probability distributions in a four dimensional space. We proceed in steps, guided by the intuition from the models in Section II. We begin by looking at a two dimensional projection. We recall that the optimal estimation strategy involves the correlation of spatial and temporal derivatives. When we do this [as in Eq’s. (27) and (28)] we find terms involving the product of the time derivative of s(t) with the current value of c(t), as well as the other combination in which s and c are exchanged. This suggests that we look at the response of H1 in the plane determined by the differentiating filter applied to s and the smoothing filter applied to c, and this is shown in Fig. 8. We see the general structure expected if the system is sensitive to a product of the two dimensions: symmetry across the quadrants, and contours of equal response have an approximately hyperbolic form. The same structure is seen in the other pair of dimensions, but with the 90° rotation expected from the theory.

FIG. 7: Top panels: Eigenvectors associated with the four significant eigenvalues. Solid lines show components along the s stimulus directions, dashed lines along the c directions. Bottom panels: Analysis of filters. (e) Results of singular value decomposition demonstrate that most of the variation among the eight filters at left can be captured by two modes, which are shown in (f). Deconvolving the stimulus correlations from these vectors we find the results in (g), where we note that anti–causal pieces of the filters are now just noise, as expected if the deconvolution is successful. Closer examination shows that one of these filters is approximately the derivative of the other, and in (h) we impose this condition exactly and truncate the filters to the 100 msec window which seems most relevant.

FIG. 8: Spike probability in two stimulus dimensions. Color code represents log10[r(t)/(spikes/sec)], with r(t) determined from sampling the probability distributions in Eq. (50), and we normalize the projections of the stimulus along each dimension such that they have unit variance. Note that there is no absolute preference for the sign of the individual stimulus components; rather the spike probability is higher when the two components have opposite signs and lower when they have the same sign. This is the pattern expected if the neuron in fact is sensitive to the product of the two components, as in the correlation computation of Eq. (28).

If we take the product structure of the correlator

models [and the numerator of the optimal estimation theory prediction in Eq. (27)] seriously, then we can take the two dimensional projection of Fig. 8 and collapse it onto a single dimension by forming the product of the two stimulus variables. We can do the same thing in the other two stimulus dimensions, and then sum these two (nonlinear) stimulus variables to form the anti–symmetric combination Vcorr from Eq. (28). The dependence of the spike rate on Vcorr is shown in Fig. 9: the rate is modulated by roughly a factor of one hundred in response to changes in this stimulus variable. We emphasize that this conclusion is derived not by “manually” changing the value of the correlator variable, but rather by extracting this nonlinear combination of stimulus variables from the continuous variations of a complex dynamic input.
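Given the four projections s1, · · · , s4 delivered by the eigenvector analysis, the two nonlinear variables used in this analysis, the correlator variable of Eq. (28) and the invariant combination of Eq. (51), are simple quadratic forms; a minimal sketch:

```python
import numpy as np

def correlator_variables(s1, s2, s3, s4):
    """Anti-symmetric correlator and mean square spatial derivative,
    built from the four relevant projections [Eqs. (28) and (51)]."""
    V_corr = s1 * s3 - s2 * s4
    D = s1**2 + s2**2
    return V_corr, D

V, Dv = correlator_variables(np.array([1.0]), np.array([2.0]),
                             np.array([3.0]), np.array([4.0]))
# V = 1*3 - 2*4 = -5 ;  Dv = 1**2 + 2**2 = 5
```

The rate as a function of either variable then follows from the same histogram-ratio estimate of Eq. (50), now applied to these nonlinear coordinates.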

The correlator variable Vcorr = s1 · s3 − s2 · s4 from Eq. (28) is only one of many possible nonlinear combinations of the four stimulus dimensions that emerge from the analysis of ∆C. It is predicted by theory to be a central part of the motion computation, but it also defines a quantity that is invariant to a time–independent spatial translation of the visual stimulus; thus the correlator variable automatically incorporates the approximate translation invariance of H1’s response. Another such invariant combination is

D = s1² + s2².  (51)

In the predictions of optimal estimation theory [cf. Eq. (27)], this combination arises because it is proportional to the mean square spatial derivative of the (temporally filtered) image on the retina. In Fig. 10 we show the firing rate r(t) as a function of the two variables Vcorr and D. By exploring this (nonlinear) two dimensional space we expose a much larger dynamic range of firing rates than can be seen by projecting onto the correlator variable alone as in Fig. 9. We see in the right panel of Fig. 10 that with D fixed there is little evidence of saturation in the plot of spike rate vs. Vcorr, and the gain in the neural response to the correlator variable is modulated by the magnitude of D.

FIG. 9: Neural response to the correlator variable Vcorr from Eq. (28). At left, probability distributions of the correlator variable a priori (dashed) and conditional on an isolated spike (solid). Note that since the correlator variable is a nonlinear combination of the four relevant stimulus dimensions, the prior distribution is not Gaussian but in fact almost precisely exponential in form. At right, the spike rate r(t) calculated from these distributions using Eq. (50).

FIG. 10: Spike probability in two nonlinear dimensions. At left, the spike rate as a function of the correlator variable Vcorr (as in Fig. 9) and the mean square spatial derivative of the movie, as computed from our stimulus projections as proportional to D [Eq. (51)]. Color scale indicates log10 r(t), with r measured in spikes/sec. We see a hint that the contours of constant color form a fan, as expected if the neuron responds approximately to the ratio of the correlator variable and the mean square spatial derivative. At right, we make this clearer by sorting the stimuli into small, medium and large magnitudes of the spatial derivative, effectively taking slices through the two dimensional plot at left and averaging over the distribution of stimuli. The gain of the response to the correlator variable clearly is modulated by the mean square spatial derivative.

Another way to look at the results of Fig. 10 is to plot r(t) vs. Vcorr with separate points for the different values of D. Because the rate has a substantial dependence on D, there is a large amount of scatter, as seen in the left panel of Fig. 11. On the other hand, the prediction of optimal estimation theory is that the response should be a function of the normalized correlator variable Vcorr/(B + D) [Eq. (29)]. If this prediction is correct, then if we plot r(t) vs. this normalized quantity all of the scatter should collapse. The only uncertainty from optimal estimation theory is the value of the parameter B, which depends on detailed assumptions about the statistical structure of the visual world. We can choose the value of B which minimizes the scatter, however, and the results are shown in the right panel of Fig. 11. We see that by constructing the normalized correlator variable predicted from optimal estimation theory we have revealed a dynamic range of nearly 10³ in the modulation of spike rate. Since the experiment involves only ∼ 8 × 10³ isolated spikes, it is difficult to imagine meaningful measurements over a larger dynamic range.
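The one-parameter fit for B can be done by direct scatter minimization. The sketch below uses synthetic data with a known B (and a simple grid search; the text does not specify the authors' fitting procedure) to illustrate how minimizing the within-bin variance of the rate recovers the normalizing constant:

```python
import numpy as np

def scatter(B, V, D, rate, n_bins=20):
    """Mean squared residual of `rate` around its bin-wise mean,
    binned on the normalized variable V / (B + D)."""
    x = V / (B + D)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    b = np.clip(np.searchsorted(edges, x) - 1, 0, n_bins - 1)
    means = np.array([rate[b == k].mean() for k in range(n_bins)])
    return np.mean((rate - means[b]) ** 2)

# synthetic data: the rate really is a function of V/(B0 + D) with B0 = 2
rng = np.random.default_rng(6)
V = 2.0 * rng.normal(size=50_000)
D = rng.chisquare(2, size=50_000)          # positive, like a mean square derivative
rate = np.tanh(V / (2.0 + D))

grid = np.linspace(0.2, 8.0, 40)
B_hat = grid[np.argmin([scatter(B, V, D, rate) for B in grid])]
assert abs(B_hat - 2.0) < 0.5              # the scatter minimum sits near B0
```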

V. DISCUSSION

The approach to analysis of neural responses that we have taken here grows out of the reverse correlation method. We recall that correlating the input and output of neurons, especially spiking neurons, can be viewed from two very different perspectives, as reviewed in Ref. [12]. In one view, we imagine writing the firing rate of a neuron as a functional power series in the input signal,

r(t) = r0 + ∑i K1(i) st(i) + (1/2) ∑ij K2(i, j) st(i) st(j) + · · · ,  (52)

where as in the discussion above st(i) is the ith component of the stimulus history that leads up to time t, and K1, K2, · · · are a set of “kernels” that form the coefficients of the series expansion. If we choose inputs ~st with Gaussian statistics, then by computing the spike–triggered average stimulus we can recover K1(i), by computing the spike–triggered second moments we can recover K2(i, j), and so on; this is the Wiener method for analysis of a nonlinear system.

FIG. 11: Collapsing back to a single stimulus feature. At left we show the data of Fig. 10 as a scatter plot of spike rate vs. the correlator variable. Each data point corresponds to a combination of the correlator variable and the mean square spatial derivative; the broad scatter arises from the fact that the spike rate depends significantly on both variables. At right, we construct a normalized version of the correlator variable corresponding to the form predicted by optimal estimation theory, Eq. (29); the one arbitrary parameter is chosen to minimize the scatter. Note that the same number of data points appear in both plots; at right many of the points lie on top of one another.

Poggio and Reichardt [23] emphasized that the correlator model of motion estimation defines a computation that is precisely equivalent to a functional series as in Eq (52), but with only the K2 term contributing. Further, they showed that other visual computations, such as the separation of objects from background via relative motion, can be cast in the same framework, but the minimal nonlinearities are of higher order (e.g., K4 in the case of figure–ground separation).

Marmarelis and McCann used the Wiener method to analyze the responses of fly motion sensitive neurons [44]. Using a pair of independently modulated light spots they verified that K1 makes a negligible contribution to the response, and showed that K2 has the dynamical structure predicted from experiments with double flash stimuli. By construction the second order Wiener kernel describes the same quadratic nonlinearity that is present in the correlator model. Results on the structure of K2 in motion sensitive neurons, as with many other experiments, thus are consistent with the correlator model, but do not really constitute a test of that model. In particular, such an analysis cannot exclude the presence of higher order contributions, such as those described by Eqs (19–21).

Despite its formal generality, the Wiener approach has the problem that in practice it is restricted to low order terms. If we can measure only the first few terms in Eq (52), we are in effect hoping that neural responses will be only weakly nonlinear functions of the sensory input, and this generally is not the case. Similarly, while Poggio and Reichardt were able to identify the minimum order of nonlinearity required for different visual computations, it is not clear why the brain should use just these minimal terms. Crucially, there is an interpretation of reverse correlation that does not rest on these minimalist or weak nonlinearity assumptions.

In their early work on the auditory system, de Boer and Kuyper emphasized that if there is a single stage of linear filtering followed by an arbitrary instantaneous nonlinearity, then with Gaussian inputs the spike–triggered average or "reverse correlation" will uncover the linear filter [38]. In our notation, if we can write

r(t) = r0 G(~v1 · ~st) ,   (53)

then the spike–triggered average stimulus allows us to recover a vector in stimulus space proportional to the filter or receptive field ~v1, independent of assumptions about the form of the nonlinear function G, as long as symmetries of this function do not force the spike–triggered average to be zero. The hypothesis that neurons are sensitive to a single component of the stimulus clearly is very different from the hypothesis that the neuron responds linearly. Our approach generalizes this interpretation of reverse correlation to encompass models in which the neural response is driven by multiple stimulus components, but still many fewer than are required to describe the stimulus completely, as in Eq (12).

The success of the receptive field concept as a tool for the qualitative description of neural responses has led to considerable interest in quantifying the precise form of the receptive fields and their analogs in different systems. This focus on the linear component of the strongly nonlinear neural response leaves open several important questions. In the auditory system, for example, observation of spectrotemporal receptive fields with frequency sweep structures does not tell us whether the neuron simply sums the energy in different frequency bands with different delays, or if the neuron has a strong, genuine "feature detecting" nonlinearity such that it responds only when power in one frequency band is followed by power in a neighboring band. Similarly, the different models of motion estimation discussed in Section II are distinguished not by dramatically different predictions for the spatiotemporal filtering of incoming visual stimuli, but by the way in which these filtered components are combined nonlinearly. If we hope to explore nonlinear interactions among multiple stimulus components, it is crucial that there not be too many relevant components. The spike–triggered covariance method as developed here provides us with tools for counting the number of relevant stimulus dimensions, for identifying these dimensions or features explicitly, and for exploring their interactions.
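The counting of relevant dimensions can be sketched numerically: we simulate a hypothetical cell whose spike probability depends multiplicatively on two filter outputs, and diagonalize the difference of covariance matrices ∆C = C_spike − C_prior; two eigenvalues then stand out from the noise floor, one positive and one negative. The filters and the nonlinearity here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two assumed relevant filters in a 30-dimensional stimulus space
dim = 30
v1 = np.zeros(dim); v1[:10] = 1.0; v1 /= np.linalg.norm(v1)
v2 = np.zeros(dim); v2[10:20] = 1.0; v2 /= np.linalg.norm(v2)

# Gaussian stimuli; spiking depends on the product of the two projections,
# loosely analogous to a correlator-style multiplicative nonlinearity
S = rng.normal(size=(200_000, dim))
p1, p2 = S @ v1, S @ v2
p_spike = 1.0 / (1.0 + np.exp(-3.0 * p1 * p2))
spiking = rng.random(len(S)) < 0.05 * p_spike

# Difference of the spike-conditional and prior covariance matrices
C_prior = np.cov(S, rowvar=False)
C_spike = np.cov(S[spiking], rowvar=False)
eig = np.sort(np.linalg.eigvalsh(C_spike - C_prior))

# Two eigenvalues stand out from the noise floor -> two relevant dimensions
print(eig[0], eig[-1])
```

The eigenvectors attached to the outlying eigenvalues span the same subspace as v1 and v2, so the analysis both counts the dimensions and identifies them.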

As far as we know, the first consideration of spike–triggered covariance matrices was by Bryant and Segundo in an analysis of the neural responses to injected currents [45]; in many ways this paper was far ahead of its time. Our own initial work on the spike–triggered (or more generally response–triggered) covariance came from an interest in making models of the full distribution of stimuli conditional on a spike or combination of spikes, from which we could compute the information carried by these events. In that context the small number of nontrivial eigenvalues in ∆C meant that we could make estimates which were more robust against the problems of small sample size [39]. Roughly ten years later we realized that this structure implies that the probability of generating a spike must depend on only a low dimensional projection of the stimulus, and that the analysis of spike–triggered covariance matrices thus provides a generalization of reverse correlation to multiple relevant dimensions, as presented here [46].

The ideas of the covariance matrix analysis were stated and used in work on adaptation of the neural code in the fly motion sensitive neurons [26], and in characterizing the computation done by the Hodgkin–Huxley model neuron [42]. Preliminary results suggest that the same approach via low–dimensional structures may be useful for characterizing the feature selectivity of auditory neurons [47]. In the mammalian visual system the covariance matrix methods have been used in both the retina [48] and in the primary visual cortex [49, 50, 51] to characterize the neural response beyond the model of a single receptive field. Most recently this approach has been used to reveal the sensitivity of retinal ganglion cells to multiple dimensions of temporal modulation, and to demonstrate a striking diversity in how these dimensions are combined nonlinearly to determine the neural response [52].

In the particular case of motion estimation, the spike–triggered covariance method has made it possible to test important predictions of optimal estimation theory. The optimal motion estimator exhibits a smooth crossover from correlator–like behavior at low signal to noise ratios to ratio of gradients behavior at high signal to noise ratios [21]. In particular this means that the well known confounding of contrast and velocity in the correlator model should give way to a more "pure" velocity sensitivity as the signal to noise ratio is increased. In practice this means that the contrast dependence of responses in motion sensitive neurons should saturate at high contrast, but that this saturated response should retain its dependence on velocity. Further, the form of the saturation should depend on the mean light intensity and on the statistics of the input movies. All of these features are observed in the responses of H1 [34], but none is a "smoking gun" for the optimal estimator. Specifically, the optimal estimator has two very different types of nonlinearity: the multiplicative nonlinearity of the correlator model, and the divisive nonlinearity of the gradient ratio. It is this nonlinear structure, and not, for example, a dramatic shift in frequency response or other quasi–linear filtering properties, that seems to be the central prediction of optimal estimation theory. The spike–triggered covariance matrix method has allowed us to demonstrate directly that both nonlinearities operate simultaneously in shaping the response of H1 to complex, dynamic inputs: the multiplicative nonlinearity is illustrated by Figs 8 and 9, while the divisive nonlinearity is revealed in Figs 10 and 11.
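The two kinds of nonlinearity can be made concrete with a toy translating pattern: a correlator-style estimate is a product of filtered signals (here the spatial and temporal derivatives), while the gradient-ratio estimate divides by the mean square spatial derivative and thereby reports velocity independent of contrast. This is a schematic of the two estimator classes discussed in the text, not the optimal estimator itself; the pattern and all parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# A smooth 1-D spatial pattern translating at constant velocity v_true (pixels/frame)
nx, nt, v_true, dt = 256, 100, 3.0, 1.0
x = np.arange(nx)
phases = rng.uniform(0, 2 * np.pi, 8)

def pattern(shift):
    # Sum of low spatial frequencies so that finite differences are accurate
    return sum(np.cos(2 * np.pi * k * (x - shift) / nx + ph)
               for k, ph in zip(range(1, 9), phases))

movie = np.array([pattern(v_true * t * dt) for t in range(nt)])

# Spatial and temporal derivatives on the common grid
ds_dx = np.gradient(movie, axis=1)
ds_dt = np.gradient(movie, axis=0) / dt

# Divisive ("gradient ratio") estimate: contrast-invariant velocity readout
v_grad = -np.mean(ds_dt * ds_dx) / np.mean(ds_dx ** 2)

# Multiplicative (correlator-like) quantity: grows with contrast as well as velocity
v_corr = -np.mean(ds_dt * ds_dx)

print(v_grad)
```

Doubling the pattern contrast leaves v_grad unchanged but quadruples v_corr, which is exactly the confounding of contrast and velocity that the divisive nonlinearity removes.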

Although much remains to be done, the demonstration that the nonlinearities in motion computation are of the form predicted from optimal estimation theory helps to close a circle of ideas which began with the observation that H1 encodes near–optimal motion estimates, at least under some range of conditions [13, 14]. If the nervous system is to achieve optimal performance (at motion estimation or any other task) then it must carry out certain very specific computations. Evidence for optimality thus opens a path for theories of neural computation based on mathematical analysis of the structure of the problem that the system has to solve, rather than on assumptions about the internal dynamics of the circuitry that implements the solution. Except in the very simplest cases, testing these theories requires methods for analysis of the nonlinear structure in the neural response to complex stimuli.

No matter one what one’s theoretical prejudices, an-alyzing neural processing of complex, dynamic inputsrequires simplifying hypotheses. Linearity or weaknonlinearity, central features of the classical Wienersystem identification methods, are unlikely to be accu-rate or sufficient. The widely used concept of receptivefield replaces linearity with a single stimulus template,and much effort has gone into developing methods forfinding this single relevant direction in stimulus space.But already the earliest discussions of receptive fieldsmade clear that there can be more than one relevantstimulus dimension, and that these features can inter-act nonlinearly.

The methods developed here go systematically beyond the "single template" view of receptive fields: we can count the number of relevant stimulus dimensions, identify these features explicitly, and in favorable cases map the full structure of their nonlinear interactions. Crucially, essential aspects of the results are clear even from the analysis of relatively small data sets (cf Fig 6), so that one can make substantial progress with minutes rather than hours of data. Further, the possibility of visualizing directly the nonlinear interactions among different stimulus dimensions by sampling the relevant probability distributions allows us to go beyond fitting models to the data; instead one can be surprised by unexpected features of the neuron's computation, as in recent work on the retina [52].

The idea that neurons are sensitive to low dimensional subspaces within the high dimensional space of natural sensory signals is a hypothesis that needs to be tested more fully. If correct, this dimensionality reduction can be sufficiently powerful to render tractable the otherwise daunting problem of characterizing the neural processing and representation of complex inputs.

Acknowledgements

We thank GD Lewen and A Schweitzer for their help with the experiments. Discussions with N Brenner and N Tishby were crucial in the development of our ideas, and we thank also B Aguera y Arcas, AJ Doupe, AL Fairhall, R Harris, NC Rust, E Schneidman, K Sen, T Sharpee, JA White, and BD Wright for many discussions exploring the application of these methods in other contexts. Early stages of this work were supported by the NEC Research Institute and were presented as part of the summer course on computational neuroscience at the Marine Biological Laboratory; we thank the many students and course faculty who asked penetrating questions and offered helpful suggestions. Completion of the work was supported in part by National Science Foundation Grant IIS–0423039, as part of the program for Collaborative Research in Computational Neuroscience.

APPENDIX A: EXPERIMENTAL METHODS

Spikes from H1 are recorded with a conventional extracellular tungsten microelectrode (FHC Inc., 3 MΩ), using a silver wire as the reference electrode. H1 is identified by the combination of its projection across the midline of the brain and its selectivity to inward motion [3, 53, 54]. Spike arrival times are digitized to 10 µs precision and stored for further analysis, using a CED 1401plus real time computer. Stimulus patterns are computed using a Digital Signal Processor board (Ariel) based on a Motorola 56001 processor, and consist of frames of nominally 200 vertical lines, written at a frame rate of 500 Hz. Thus the patterns are essentially one–dimensional, but extended in the vertical direction. They are displayed on a Tektronix 608 monitor (phosphor P31), at a radiance of I = 165 mW/m^2 · sr. Taking the spectral and optical characteristics of the photoreceptor lens–waveguide into account, this light intensity corresponds to a flux of ∼ 4 × 10^4 effectively transduced photons/s in each retinal photoreceptor. Frames are generated in synchrony with the spike timing clock by forcing the DSP to generate frames triggered by a 2 ms timing pulse from the CED. Angular dimensions of the display are calibrated using the motion reversal response of the H1 neuron [55].

[1] HB Barlow, Summation and inhibition in the frog's retina. J Physiol (Lond) 119, 69–88 (1953). SW Kuffler, Discharge patterns and functional organization of mammalian retina. J Neurophysiol 16, 37–68 (1953).

[2] DH Hubel & TN Wiesel, Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. J Physiol (Lond) 160, 106–154 (1962).

[3] LG Bishop & DG Keehn, Two types of neurones sensitive to motion in the optic lobe of the fly. Nature 213, 1374–1376 (1966). K Hausen, The lobular complex of the fly: Structure, function, and significance in behavior. In Photoreception and Vision in Invertebrates, M Ali, ed., pp. 523–559 (Plenum, New York, 1984).

[4] HB Barlow & WR Levick, The mechanism of directionally selective units in rabbit's retina. J Physiol (Lond) 178, 477–504 (1965).


[5] JF Baker, SE Petersen, WT Newsome & JM Allman, Visual response properties of neurons in four extrastriate visual areas of the owl monkey (Aotus trivirgatus): A quantitative comparison of medial, dorsomedial, dorsolateral, and middle temporal areas. J Neurophysiol 76, 416 (1981).

[6] D Marr, Vision (WH Freeman, San Francisco, 1982).

[7] KH Britten, WT Newsome, MN Shadlen, S Celebrini & JA Movshon, A relationship between behavioral choice and the visual responses of neurons in macaque MT. Vis Neurosci 13, 87–100 (1996).

[8] WT Newsome, RH Wurtz, MR Dursteler & A Mikami, Deficits in visual motion processing following ibotenic acid lesions of the middle temporal visual area of the macaque monkey. J Neurosci 5, 825–840 (1985).

[9] CD Salzman, KH Britten & WT Newsome, Cortical microstimulation influences perceptual judgements of motion direction. Nature 346, 174–177 (1990). Erratum 346, 589 (1990).

[10] JM Groh, RT Born & WT Newsome, How is a sensory map read out? Effects of microstimulation in visual area MT on saccades and smooth pursuit eye movements. J Neurosci 17, 4312–4330 (1997). MJ Nichols & WT Newsome, Middle temporal visual area microstimulation influences veridical judgments of motion direction. J Neurosci 22, 9530–9540 (2002).

[11] K Hausen & C Wehrhahn, Microsurgical lesion of horizontal cells changes optomotor yaw responses in the blowfly Calliphora erythrocephala. Proc R Soc Lond B 21, 211–216 (1983).

[12] F Rieke, D Warland, R de Ruyter van Steveninck & W Bialek, Spikes: Exploring the Neural Code (MIT Press, Cambridge, 1997).

[13] W Bialek, F Rieke, RR de Ruyter van Steveninck & D Warland, Reading a neural code. Science 252, 1854–1857 (1991).

[14] R de Ruyter van Steveninck & W Bialek, Reliability and statistical efficiency of a blowfly movement–sensitive neuron. Phil Trans R Soc Lond 348, 321–340 (1995).

[15] See also pp. 235–253 of [12]. The question of what limits the precision of motion estimation in the primate visual system is much less clear; see HB Barlow & SP Tripathy, Correspondence noise and signal pooling in the detection of coherent visual motion. J Neurosci 17, 7954–7966 (1997).

[16] B Hassenstein & W Reichardt, Systemtheoretische Analyse der Zeit–, Reihenfolgen–, und Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers. Z Naturforsch 11b, 513–524 (1956). W Reichardt, Autocorrelation, a principle for evaluation of sensory information by the central nervous system. In Sensory Communication, W Rosenblith, ed., pp. 303–317 (J Wiley and Sons, New York, 1961).

[17] EH Adelson & JR Bergen, Spatiotemporal energy models for the perception of motion. J Opt Soc Am A 2, 284–299 (1985).

[18] D Heeger, Normalization of cell responses in cat striate cortex. Vis Neurosci 9, 181–198 (1992).

[19] EP Simoncelli & DJ Heeger, A model of neuronal responses in visual area MT. Vision Res 38, 743–761 (1998).

[20] JO Limb & JA Murphy, Estimating the velocity of moving objects in television signals. Comp Graph Image Proc 4, 311–327 (1975).

[21] M Potters & W Bialek, Statistical mechanics and visual signal processing. J Phys I France 4, 1755–1775 (1994).

[22] W Reichardt & T Poggio, Visual control of orientation behavior in the fly. Part I. A quantitative analysis. Quart Rev Biophys 9, 311–375 (1976).

[23] T Poggio & W Reichardt, Visual control of orientation behavior in the fly. Part II. Towards the underlying neural interactions. Quart Rev Biophys 9, 377–438 (1976).

[24] See, for example, S Single & A Borst, Dendritic integration and its role in computing image velocity. Science 281, 1848–1850 (1998).

[25] RR de Ruyter van Steveninck, W Bialek, M Potters, RH Carlson & GD Lewen, Adaptive movement computation by the blowfly visual system. In Natural and Artificial Parallel Computation: Proceedings of the Fifth NEC Research Symposium, DL Waltz, ed., pp. 21–41 (SIAM, Philadelphia, 1996).

[26] N Brenner, W Bialek & R de Ruyter van Steveninck, Adaptive rescaling optimizes information transmission. Neuron 26, 695–702 (2000).

[27] AL Fairhall, GD Lewen, W Bialek & RR de Ruyter van Steveninck, Efficiency and ambiguity in an adaptive neural code. Nature 412, 787–792 (2001).

[28] S Smirnakis, MJ Berry II, DK Warland, W Bialek & M Meister, Adaptation of retinal processing to image contrast and spatial scale. Nature 386, 69–73 (1997).

[29] M Meister & MJ Berry II, The neural code of the retina. Neuron 22, 435–450 (1999).

[30] J Allman, F Miezin & E McGuinness, Stimulus specific responses from beyond the classical receptive field: Neurophysiological mechanisms for local–global comparisons in visual neurons. Ann Rev Neurosci 8, 407–430 (1985).

[31] RR de Ruyter van Steveninck, WH Zaagman & HAK Mastebroek, Adaptation of transient responses of a movement–sensitive neuron in the visual system of the blowfly. Biol Cybern 54, 223–236 (1986).

[32] R de Ruyter van Steveninck, A Borst & W Bialek, Real time encoding of motion: Answerable questions and questionable answers from the fly's visual system. In Processing Visual Motion in the Real World: A Survey of Computational, Neural and Ecological Constraints, JM Zanker & J Zeil, eds., pp. 279–306 (Springer–Verlag, Berlin, 2001). See also physics/0004060.

[33] N Franceschini, A Riehle & A le Nestour, Directionally selective motion detection by insect neurons. In Facets of Vision, RC Hardie & DG Stavenga, eds., pp. 360–390 (Springer–Verlag, Berlin, 1989).

[34] RR de Ruyter van Steveninck, W Bialek, M Potters & RH Carlson, Statistical adaptation and optimal estimation in movement computation by the blowfly visual system. In Proc IEEE Conf Sys Man Cybern, pp. 302–307 (1994). RR de Ruyter van Steveninck & W Bialek, Optimality and adaptation in motion estimation by the blowfly visual system. In Proceedings of the IEEE 22nd Annual Northeast Bioengineering Conference, pp. 40–41 (1996).

[35] M Carandini, DJ Heeger & JA Movshon, Linearity and normalization of simple cells of the macaque primary visual cortex. J Neurosci 17, 8621–8644 (1997).

[36] Y Weiss, EP Simoncelli & EH Adelson, Motion illusions as optimal percepts. Nature Neurosci 5, 598–604 (2002).

[37] L Iverson & SW Zucker, Logical/linear operators for image curves. IEEE Trans PAMI 17, 982–996 (1995).

[38] E de Boer & P Kuyper, Triggered correlation. IEEE Trans Biomed Eng 15, 169–179 (1968).

[39] R de Ruyter van Steveninck & W Bialek, Real–time performance of a movement sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc R Soc Lond Ser B 234, 379–414 (1988).

[40] T Sharpee, NC Rust & W Bialek, Analyzing neural responses to natural signals: Maximally informative dimensions. Neural Comp 16, 223–250 (2004). See also physics/0212110, and a preliminary account in Advances in Neural Information Processing 15, S Becker, S Thrun & K Obermayer, eds., pp. 261–268 (MIT Press, Cambridge, 2003).

[41] F Rieke, DA Bodnar & W Bialek, Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory neurons. Proc R Soc Lond Ser B 262, 259–265 (1995).

[42] B Aguera y Arcas, AL Fairhall & W Bialek, Computation in single neurons: Hodgkin and Huxley revisited. Neural Comp 15, 1715–1749 (2003). See also physics/0212113.

[43] ML Mehta, Random Matrices and the Statistical Theory of Energy Levels (Academic Press, New York, 1967).

[44] PZ Marmarelis & GD McCann, Development and application of white–noise modeling techniques for studies of insect visual nervous systems. Kybernetik 12, 74–89 (1973).

[45] HL Bryant & JP Segundo, Spike initiation by transmembrane current: A white noise analysis. J Physiol (Lond) 260, 279–314 (1976).

[46] Initial results were described in a technical report: W Bialek & RR de Ruyter van Steveninck, What do motion sensitive visual neurons compute? NEC Research Institute Technical Report 98-002N (1998). In this first effort, however, the connection to models of motion estimation was incomplete.

[47] K Sen, BD Wright, AJ Doupe & W Bialek, Discovering features of natural stimuli relevant to forebrain auditory neurons. 30th Annual Meeting of the Society for Neuroscience, Society for Neuroscience Abstracts (2000).

[48] O Schwartz, EJ Chichilnisky & EP Simoncelli, Characterizing neural gain control using spike–triggered covariance. In Advances in Neural Information Processing 14, TG Dietterich, S Becker & Z Ghahramani, eds., pp. 269–276 (MIT Press, Cambridge, 2002).

[49] J Touryan, B Lau & Y Dan, Isolation of relevant visual features from random stimuli for cortical complex cells. J Neurosci 22, 10811–10818 (2002).

[50] NC Rust, Signal transmission, feature representation and computation in areas V1 and MT of the macaque monkey. PhD Dissertation, New York University (2004).

[51] NC Rust, O Schwartz, JA Movshon & EP Simoncelli, Spike–triggered characterization of excitatory and suppressive stimulus dimensions in monkey V1. Neurocomputing 58–60, 793–799 (2004).

[52] AL Fairhall, CA Burlingame, R Harris, J Puchalla & MJ Berry II, Multidimensional feature selectivity in the retina. Preprint (2004).

[53] N Strausfeld, Atlas of an Insect Brain (Springer–Verlag, Berlin, 1976).

[54] JD Armstrong, K Kaiser, A Muller, K–F Fischbach, N Merchant & NJ Strausfeld, Flybrain, an on–line atlas and database of the Drosophila nervous system. Neuron 15, 17–20 (1995). See also http://flybrain.neurobio.arizona.edu/.

[55] K Götz, Optomotorische Untersuchungen des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik 2, 77–92 (1964).


