[Source: redwood.psych.cornell.edu/papers/graham-field-neuroscience-2007.pdf]

To appear: New Encyclopedia of Neuroscience ed. Larry R. Squire (Elsevier, 2007).

EFFICIENT CODING OF NATURAL IMAGES

Daniel J. Graham and David J. Field

Department of Psychology
Cornell University

Ithaca, NY 14853 USA

OUTLINE:
ABSTRACT
INTRODUCTION
  Efficient for What Task?
  Defining Efficiency
A. Representational Efficiency
  Correlation and Decorrelation
  Optimal Information Transfer
  Beyond Correlations: Sparseness and Independence
  Optimality with Nonlinear Systems
B. Metabolic efficiency
  Spike Efficiency
  Minimum Wiring
C. Learning Efficiency
  Sparseness and Invariance
  Overcompleteness
  "Hard-Coded" Efficiency
  Efficient Learning from the Environment
  Hybrid Strategies: Efficient "Programmed" Learning
CONCLUSION
REFERENCES
FURTHER READING
CROSS REFERENCES
FIGURES

ABSTRACT

A wide variety of studies over the last twenty years have demonstrated that our sensory systems are remarkably efficient at coding the sensory environment. Much of this work has focused on the visual system and it has demonstrated that many properties of the early visual system are extremely well matched to the statistical structure of the visual world. However, there remain many questions regarding how far this approach can be taken in understanding the full visual system, especially higher levels of visual processing. Basic theories of efficiency (e.g., decorrelation, sparseness, independence, etc.) are likely to be insufficient to account for the more complex nonlinear representations found in higher levels. In this chapter, we take a closer look at how efficiency might be defined. In particular, we consider three forms of efficiency: representational efficiency, metabolic efficiency, and learning efficiency. Although the majority of studies have focused on representational and metabolic efficiency, we argue that a complete account of visual processing must consider all three forms of efficiency.

Keywords: Vision, Natural Scenes, Efficient Coding, V1, Retina, Sparseness, Sparse Coding, Metabolism, Minimal Wiring, Retinal Waves, Programmed Learning, Invariance.


INTRODUCTION

Images and movies of the natural world are known to share a variety of statistical regularities. Such stimuli show consistent spatial statistics (e.g., Field, 1987, 1994; Burton and Moorhead, 1987), spatio-temporal statistics (van Hateren and Ruderman, 1998), contrast and intensity distributions (e.g., Brady and Field, 2000; Frazor and Geisler, 2006), and chromatic structure (e.g., Webster and Mollon, 1997; Hoyer and Hyvärinen, 2000). Consider the images shown in Figure 1. The first row shows white noise and the second shows natural scenes. A process that generates random white noise images will generate all possible images with equal probability. However, because of the statistical regularities of natural scenes, the probability of seeing a natural scene generated from a white noise process is extremely low. For even a small 8x8 pixel patch, the entropy of the white noise patches with 8 bits of grey level is 8x8x8 or 512 bits. This results in a total of 2^512, or about 10^154, equally probable images. Natural scenes are considerably redundant (i.e., they have much lower entropy), with estimates that they contain approximately 40% of the entropy of white noise for small (8x8) patches, and still lower relative entropy for larger patches (Chandler and Field, 2007). This difference would in turn imply that one white noise pattern out of 10^90 would have the basic statistics of natural scenes (assuming a flat distribution). Although the precise ratio requires an estimate of the true distribution, the estimate does demonstrate just how redundant natural scenes are. Because of this redundancy, it is possible to build a visual system that is efficient if it is specifically dedicated to representing such an environment. A visual system can reduce the size of its problem space and focus its efforts on what it is likely to encounter in the world. Because of this redundancy, visual systems can be shaped by evolution and development to take advantage of the environment.

[Figure 1 here]
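The patch-counting arithmetic above can be checked directly. This is a sketch under the stated assumptions (the 40% entropy figure and a flat distribution over typical patches); the exact exponent one obtains depends on the entropy estimate used:

```python
import math

# Entropy of an 8x8 white-noise patch with 8 bits of grey level per pixel.
bits_white = 8 * 8 * 8                       # 512 bits
print(round(bits_white * math.log10(2)))     # 154: about 10^154 equally probable patches

# If natural scenes carry roughly 40% of that entropy (Chandler and Field, 2007),
# the excess entropy of white noise gives the odds of a noise patch looking natural.
bits_natural = 0.4 * bits_white
excess_exponent = (bits_white - bits_natural) * math.log10(2)
print(round(excess_exponent))                # 92: roughly one patch in 10^92
```

Under these assumptions the odds come out near one in 10^92, consistent with the order-of-magnitude estimate cited above.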

Using the supposition that the visual system employs codes that approach optimal efficiency with respect to predictable (i.e., redundant) structure, one can go some way in explaining why visual neurons show the coding properties they do. A number of studies have shown success at predicting many of the basic linear properties of visual neurons (e.g., Field, 1987, 1994; Zetzsche and Röhrbein, 2001; Wiskott and Sejnowski, 2002; Simoncelli and Olshausen, 2001). In the following sections we will review some of this work. But we note that this work and any extension of it depend on the appropriate definition of efficiency. In the classical definition of efficiency used in engineering, a system is most efficient if all of its work (within limits imposed by thermodynamics) is done in service of its task. In this view, efficiency presumes that the system has some well-defined tasks, and that the system's design is a direct reflection of the need to do as little work as possible to achieve those tasks. But how do we define these tasks for biological systems?

Efficient for What Task?

Many sensory systems, whose task or "goal" is often readily definable and measurable, have sensitivity that is near their physical limits (e.g., Squires, 2004; Aizenberg et al., 2001; Sundar et al., 2003; Baylor et al., 1979; Denk and Webb, 1989). With respect to the brain as a whole, however, the only clearly defined task is the most general one in biology: differential reproductive success. But this goal tells us little about why a visual system would have the structure it does. Clearly, the visual system is involved in vision, but to treat it as an independent sub-system with some basic input/output relationship is too much of a simplification. Certainly, the output of the visual system cannot be reduced to a behavior. And although it may seem computationally reasonable, there is little reason to presume that the output of the visual system is simply some "object-detector". There are three reasons for this:


Object recognition is not a well-localized processing task in cortex; no known congenital disorder obliterates this ability exclusively; and patients with visual agnosia caused by lesions typically show severe deficits in other visual faculties (Bloom, 2002; Farah, 2004). Even though object recognition is distributed, one might still propose that object recognition could be the collective goal of the visual system's many parts. But object recognition surely competes for neural real estate with areas whose goals include accurate spatial mapping, motion predictions, and a host of other tasks.

This difficulty in defining the visual system's general goal is likely to prove a significant hurdle for studies attempting to apply efficient coding theories to later stages of processing. In addition, our current ability to predict a neuron's behavior is significantly limited at these later stages. Much of the success of efficient coding techniques has been in the application to neurons relatively early in the visual pathway (retina, lateral geniculate nucleus, and V1). For neurons in these areas, the family of response properties is relatively well defined and it is possible to talk about how the information in the visual image is represented with the array of neurons in each of these areas (although see Olshausen and Field, 2005). This in turn allows us to address the question of whether the array of neurons is processing the information efficiently. However, the approach still requires a precise definition of efficiency.

Defining Efficiency

For those who have proposed theories of optimality in early visual coding, much of the discussion has centered on the metric of efficiency. The majority of the papers in the field focus on what we will call representational efficiency. Such papers employ the tools of information theory, and they have explored the properties of neurons that are involved in representing the image. These papers have focused on issues relating to correlations and statistical independence in the firing rates of neurons. Much of this work has involved neural networks and computational models of visual areas. A second line of research has focused on what we call metabolic efficiency. Several influential papers have investigated the metabolic costs of generating spikes, and others have argued that constraints that minimize wiring are important for explaining known neuronal properties. In this paper, we propose a third form of efficiency that we call learning efficiency. We argue that an important consideration for any sensory system is the challenge of learning about the relative probability of events in the world based on a handful of samples.

To the extent that the visual system is optimally efficient at carrying out its multitude of tasks, these efficiency rules are likely to each contribute significantly to a description of why the visual system is designed as it is. Visual systems may approach optimality (in the engineering sense of the word) across each of these three dimensions, but as we will argue, one of these dimensions—learning efficiency—is one that engineers rarely consider. These dimensions are not necessarily orthogonal to one another, nor do they currently all have well-defined units of measurement. These dimensions are mere sketches of the terrain over which the human visual system appears to have been optimized through evolution. Furthermore, we believe these dimensions can find application in other modalities as well, though here we focus on the visual system. Together, these efficiency dimensions may also suggest ways to design efficient artificial visual systems.

A. Representational Efficiency

Marcelja (1980) was the first to propose that neurons found in the primary visual cortex (V1) show a number of similarities to the mathematical functions described by Gabor's (1946) theory of communication. Later studies (Field and Tolhurst, 1986; Jones and Palmer, 1987) confirmed that Marcelja's Gaussian-modulated sinusoid model (i.e., the Gabor function model) provided a good first-order approximation to the receptive field properties of these neurons. Some of the first computational models of the early visual system (e.g., Watson, 1983; Daugman, 1985) demonstrated how an array of neurons with these properties can represent a visual image. However, this work left unresolved the important question of why such a solution might evolve.

One approach suggests that an understanding of the why question requires consideration of the environment in which the system functions. This approach suggests that theories should be guided by an understanding of the statistics of natural scenes. Early television researchers, and later Field (1987) and Burton and Moorhead (1987), found that the Fourier spatial frequency spectra of natural scenes typically fall off as 1/f^k, where f is spatial frequency and k is approximately 1.2 (Tolhurst et al., 1992; Field, 1993). This regularity expresses the same redundancy as the autocorrelation function of scenes, which measures how similar neighboring points are in terms of luminance. The 1/f structure follows from two properties: (1) neighboring points are correlated, and (2) the images are roughly scale invariant (Field, 1987). A number of studies have shown that there are a wide variety of forms of redundancy found in natural scenes that go beyond the pairwise correlations and spatial frequency spectra, as described below. However, pairwise correlations have been the source of a number of theories regarding efficient coding and many researchers imply that these are the most relevant statistics, so we begin with a discussion of such correlations.
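The 1/f^k regularity is easy to probe numerically. The sketch below synthesizes an image with a prescribed 1/f^1.2 amplitude spectrum and then re-estimates k from a log-log fit of the spectrum; the image size and frequency cutoffs are arbitrary choices for illustration, not values from the literature:

```python
import numpy as np

def one_over_f_image(n=128, k=1.2, seed=0):
    """Synthesize an n x n image whose amplitude spectrum falls as 1/f^k."""
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    f = np.hypot(fx, fy)
    f[0, 0] = 1.0                       # avoid division by zero at DC
    amplitude = 1.0 / f**k
    amplitude[0, 0] = 0.0               # zero-mean image
    phase = rng.uniform(0, 2 * np.pi, (n, n))
    spectrum = amplitude * np.exp(1j * phase)
    return np.fft.ifft2(spectrum).real

def spectral_slope(img):
    """Estimate k by a log-log fit of spectral amplitude vs. spatial frequency."""
    n = img.shape[0]
    amp = np.abs(np.fft.fft2(img))
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    f = np.hypot(fx, fy).ravel()
    a = amp.ravel()
    keep = (f > 0.02) & (f < 0.4)       # ignore DC and corner frequencies
    coeffs = np.polyfit(np.log(f[keep]), np.log(a[keep]), 1)
    return -coeffs[0]                    # k in the 1/f^k fall-off
```

Running `spectral_slope(one_over_f_image())` recovers a slope close to the 1.2 that was built in; applied to a calibrated photograph of a natural scene, the same fit would estimate its empirical k.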

Correlation and Decorrelation

When the notion of stimulus redundancy is considered, most papers typically refer to the pairwise correlations in the data. Many of the earliest theories of efficient sensory coding (e.g., Attneave, 1954; Barlow, 1961) developed from the notion that if the neural responses to two stimuli are correlated, then an efficient system should strive to represent the data with reduced correlation. This is certainly one important form of redundancy. Chandler and Field (2007) argue that the pairwise correlations (as represented by the power spectra) account for approximately 40% of the total redundancy in natural scenes (for 8x8 natural scene patches).

In theory, any representation with significant correlations implies that most of the signal lies in a subspace within the larger space of possible representations. By choosing a representation that codes for only that subspace, it is possible to represent the data with significantly reduced dimensionality (e.g., a reduced number of neurons, or a smaller dynamic range of responses). Srinivasan, Laughlin and Dubs (1982) showed that the center-surround structure of fly large monopolar cells is well matched to the correlations in a collection of scenes they considered relevant to the fly. Given a certain level of noise, they found that a weighted, linear sum over space transmits the greatest amount of information about their collection of natural inputs. Atick and Redlich (1992) continued this line of inquiry, arguing that the amplitude spectra of natural scenes are effectively flattened once they reach retinal outputs. The argument is that the roughly linear rise in sensitivity with increasing spatial frequency is inversely related to the 1/f-distributed fall-off in natural scene spectra, thus allowing neighboring neurons to be uncorrelated (i.e., have a flat spectrum). Atick and Redlich (1992) argue that this decorrelation is reflected in a flattening of the spatial frequency response of retinal ganglion cells. However, this argument is dependent on appropriate spacing of retinal neurons. Moreover, the current evidence suggests that neighboring neurons are significantly correlated in the presence of natural scenes (Puchalla et al., 2005; Nirenberg et al., 2001).
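Decorrelation of this kind can be illustrated with a two-pixel toy model: a whitening transform built from the eigendecomposition of the covariance removes the pairwise correlation entirely. The 0.9 correlation and the ZCA form of the transform are illustrative choices, not measurements:

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated "pixel pair" data: neighboring samples share most of their variance,
# mimicking the strong pairwise correlation of adjacent image points.
n = 10_000
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=n)     # shape (n, 2)

# Whitening (decorrelating) transform from the eigendecomposition of the
# sample covariance; this is the ZCA (symmetric) form of the whitening matrix.
c = np.cov(x, rowvar=False)
evals, evecs = np.linalg.eigh(c)
w = evecs @ np.diag(evals ** -0.5) @ evecs.T
y = x @ w.T

print(np.cov(y, rowvar=False).round(2))                  # identity: correlations removed
```

The same construction applied to vectors of image-patch pixels yields the decorrelating filters discussed above; with added channel noise, the optimal transform flattens the spectrum only partially, as in the Atick and Redlich account.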

We have argued (Graham, Chandler and Field, 2006) that an independent goal of retinal coding is to achieve compression while maintaining equal response magnitude across the array of neurons of different sizes. This approach achieves a form of response spectrum flattening, but it is quite independent of whether the neurons are tuned and spaced ideally to achieve decorrelation. This work suggests that in order to explain the details of neural response properties, we must acknowledge that decorrelation in the retina and LGN is far from complete and that efficient coding schemes must consider additional constraints.

Optimal Information Transfer

Another strategy attempts to utilize each level of response magnitude with equal frequency. To this end, Laughlin (1981) compared the distribution of local contrasts in the blowfly environment with contrast responses in large monopolar cells in the creature's eye. The intensity-response function measured for these cells is well matched to the distribution of natural contrasts, such that the range of possible responses approximates the information-theoretic ideal. In other words, contrast responses in the fly are distributed across the range of environmental contrasts in such a way as to minimize the number of "code words" necessary for transmitting information about the entire range of contrasts at a given fidelity. Later studies have shown that adaptation of visual neurons with respect to local luminance (van Hateren and Snippe, 2001; Schwartz and Simoncelli, 2001), orientation (Wainwright, 1999), and contrast (Yu and Lee, 2004) can also be addressed using this maximum information transfer (infomax) approach. A more elaborate extension of infomax models combines feedback of stored predictors from higher cortical areas with typical natural inputs (Mumford, 1994; Rao and Ballard, 1997). These schemes optimize over families of representations that minimize error between the input and a top-down representation. Feedback representation optimization models of this sort effectively produce a system where primary visual areas reduce their response as higher areas provide better descriptions of input "content," in line with recent imaging findings (e.g., Murray et al., 2002). It should be noted that in many infomax models, optimal redundancy reduction (decorrelation) and maximum information transfer strategies are equivalent.
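Laughlin's matching principle amounts to histogram equalization: if the response function is the cumulative distribution of environmental contrasts, every response level is used equally often. The exponential contrast distribution below is a stand-in for measured statistics, not Laughlin's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "environmental contrast" samples, weighted toward low contrasts.
contrasts = rng.exponential(scale=0.2, size=50_000)

# Infomax response function: the empirical cumulative distribution of the inputs.
# Mapping each contrast through the CDF spreads responses uniformly over [0, 1).
sorted_c = np.sort(contrasts)
def response(c):
    return np.searchsorted(sorted_c, c) / len(sorted_c)

r = response(contrasts)
counts, _ = np.histogram(r, bins=10, range=(0, 1))
print(counts)   # each response decile is used equally often
```

Any fixed response curve other than the CDF would crowd many inputs into a few response levels, wasting the cell's limited dynamic range on rarely seen contrasts.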

Beyond Correlations: Sparseness and Independence

The pairwise correlations found in natural scenes represent only one form of redundancy. Figure 2 shows two images—a natural scene and noise—with similar 1/f structure and therefore similar pairwise correlations. There are a number of ways to describe the differences between these two images. For example, they differ in their phase spectra. But the two images can also be described in terms of differences in their sparse structure. For the 1/f noise image, all linear representations produce response distributions that are Gaussian. However, for the natural scene, some projections of the data are non-Gaussian. That is, when the appropriate array of linear filters is used to represent a natural scene, the histogram of activity will be non-Gaussian.

[Figure 2 here]

As noted by Field (1994), a non-Gaussian histogram implies low entropy in the first-order responses and relatively high entropy in the higher-order relationships between the filters for a linear transform (i.e., more independent). In other words, a system that produces maximally non-Gaussian histograms produces a representation where the neurons are maximally independent. This is the basic idea behind sparse coding algorithms (Olshausen and Field, 1996) and independent components analysis, or ICA (e.g., Bell and Sejnowski, 1997).

In a code with maximal independence, the firing of each neuron provides maximal unique information (i.e., the sharing of information with other neurons has been minimized). If the data consist of an array of relatively rare, sparse events, then matching the neurons to those events will produce activity that is sparse. The definitions of sparseness have varied in the literature. In general, "sparse" implies a relatively high probability of no activity across the population, and some proportion of relatively active neurons. In the computational literature, where one is often modeling neurons as linear operators, the kurtosis (the 4th statistical moment) of the response histogram can be used to describe relative sparseness. Other metrics such as the sparseness index (e.g., Rolls and Tovee, 1995) have proved more useful for spike trains.
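For linear-model responses, kurtosis-based sparseness is straightforward to compute. A minimal comparison of a Gaussian (non-sparse) response histogram and a Laplacian one (peaked at zero with heavy tails, the shape typical of V1-like filter outputs on natural scenes):

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(0)
dense = rng.normal(size=100_000)      # Gaussian responses: not sparse
sparse = rng.laplace(size=100_000)    # mostly near-zero responses: sparse

print(excess_kurtosis(dense))   # near 0
print(excess_kurtosis(sparse))  # near 3: strongly leptokurtic
```

A filter bank whose responses to natural scenes maximize this statistic is, in this sense, matched to the sparse structure of the input.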

Field (1987, 1994) demonstrated that arrays of linear neurons with properties like those found in primary visual cortex appear to maximize the sparse response to natural scenes. Olshausen and Field (1996) further demonstrated that a neural network that attempts to represent natural scenes and maximize sparseness will produce an array of neurons with spatial properties like those found in cortical simple cells. That is, a system that is forced to produce a faithful representation of the input using only a handful of neurons (each firing near its maximum response when it is active) gives simple-cell-like receptive fields. This suggests that at least at the level of primary visual cortex, visual system representations show evidence of being efficiently matched to the sparseness of natural scenes. Similar results have been found for spatio-chromatic stimuli (e.g., Hoyer and Hyvärinen, 2000; Tailor et al., 2000; Wachtler et al., 2001; Lee et al., 2002; Caywood, Willmore and Tolhurst, 2004) and spatio-temporal patterns (e.g., van Hateren and Ruderman, 1998; Olshausen, 2003b).

It must be emphasized that the sparse outputs of these networks result from the sparse structure of the data. It would be relatively simple to produce a nonlinearity that forces a sparse output independent of the input, but that would not be an efficient coding strategy. The networks described above expressly search for the sparse structure that exists in the data. Moreover, recordings of primate visual neurons in response to natural scenes have also been shown to produce a sparse output (e.g., Vinje and Gallant, 2000, 2002; Willmore and Tolhurst, 2001; David, Vinje and Gallant, 2004). Visual neurons show sparse responses both in early stages of cortical processing like V1 and in higher levels like inferotemporal cortex (Lamme, 1995; Baddeley et al., 1997). Highly sparse firing is a widespread phenomenon in a variety of brain areas and species, including monkey association areas (Abeles et al., 1990), rabbit motor areas (Beloozerova, Sirota and Swadlow, 2003), rat somatosensory areas (Brecht and Sakmann, 2002), rat auditory areas (DeWeese et al., 2003), rat hippocampus (Thompson and Best, 1989) and centers believed to be involved in bird song generation (Hahnloser et al., 2002).

However, the sparse response of these neurons is not direct evidence that these neurons are efficiently representing the environment. All of these neurons are nonlinear, and higher-level neurons are very nonlinear. A proof that these codes are efficient would require a clear understanding of how the array of neurons represents the input image. We are not at this level of understanding—and we may never be. Most visual neurons beyond the retina can be modulated by higher levels of processing. Although a number of papers have implied that a decent theory of primary visual cortex is close at hand, Olshausen and Field (2005) have argued that we are still a long way from such a theory. Part of the argument comes from the work of the Gallant lab (e.g., David et al., 2004). This work has made an effort to measure the responses of primate visual neurons to a set of natural stimuli and to predict from these responses how each neuron would respond to an arbitrary natural scene. They have found that even in V1, typically less than 50% of the response variance to new stimuli can be predicted. In higher levels of the visual system, that prediction accuracy is further reduced. In addition, as noted in Olshausen and Field (2005), recording studies select only a small portion of the neurons in any given area, and researchers typically record only from large pyramidal neurons that produce large spikes. Other neuron types (e.g., granular neurons that produce smaller spikes) are often not surveyed in these studies, which has led to the interesting suggestion that there exists a neural "dark matter" problem (Shoham et al., 2006).

Optimality with Nonlinear Systems

Despite the difficulties in providing a clear metric for defining efficiency in nonlinear systems, a number of efforts have been made to generalize efficiency arguments to account for the known nonlinear properties of visual neurons (e.g., Field, 1993; Schwartz and Simoncelli, 2001; Wainwright et al., 2002; Zetzsche and Röhrbein, 2001; Prenger et al., 2003). One particularly fruitful approach involves a technique called slow feature analysis (Wiskott and Sejnowski, 2002; Berkes and Wiskott, 2005). According to this model, a stimulus moving across an image results in quick changes to the neurons responding to the input (for a linear array of neurons). If we consider the histogram of neural activity integrated over some time period, we would find that the activity is less sparse than in a single time frame. However, the changes in images are not random and are often caused by objects or backgrounds showing consistent movement. Slow feature analysis attempts to take into account this redundancy in the movements of features. The technique attempts to find nonlinear solutions that are capable of describing the moving images with a relatively consistent set of neurons, despite the changes that are occurring. Slow feature analysis has been demonstrated to produce nonlinear behavior like that shown by complex cells (Berkes and Wiskott, 2005). It has been argued that current implementations of slow feature analysis are mathematically equivalent to a form of spatiotemporal ICA (Blaschke et al., 2006). Given that object identities are relatively invariant over time, there is hope that these lines of work will eventually be capable of producing object-level representations like those found in inferotemporal cortex. However, we have not yet reached that point.
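The linear core of slow feature analysis can be sketched in a few lines: whiten the input signals, then take the whitened direction whose temporal derivative has the smallest variance. The two-channel mixture below is a hypothetical stimulus (a slow sinusoid mixed with fast noise); real SFA implementations additionally expand the input nonlinearly before this step:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 2000)

# Two hidden sources, a slowly varying signal and fast noise, linearly mixed.
slow = np.sin(t)
fast = rng.normal(size=t.size)
x = np.column_stack([slow + 0.5 * fast, slow - 0.5 * fast])

# Step 1: whiten the inputs so every direction has unit variance.
x = x - x.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
z = x @ (evecs @ np.diag(evals ** -0.5) @ evecs.T).T

# Step 2: among whitened directions, find the one whose temporal
# derivative has minimal variance (the slowest-varying feature).
dz = np.diff(z, axis=0)
dvals, dvecs = np.linalg.eigh(np.cov(dz, rowvar=False))
slow_feature = z @ dvecs[:, 0]

# The recovered feature tracks the slow source almost perfectly.
corr = abs(np.corrcoef(slow_feature, slow)[0, 1])
print(round(corr, 3))
```

With a nonlinear expansion of the input (e.g., quadratic terms) the same two steps yield the complex-cell-like invariances reported by Berkes and Wiskott (2005).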

It should also be noted that there exist a variety of techniques that come under the general heading of nonlinear ICA (e.g., Schwartz and Simoncelli, 2001; Kayser et al., 2003; Malo and Gutierrez, 2006). This area of research is too vast to be reviewed here. Many of these papers have generated some of the nonlinear properties of visual neurons. However, since we do not yet have a complete model of the nonlinearities in these neurons, we cannot yet argue that such techniques are capable of accounting for the full array of neural properties in any region of visual cortex.

Without a clear account of how information is represented in any given area, we are left without any kind of proof that the information is represented efficiently. Linear models that optimize sparseness and independence do produce simulated neurons with many properties like those found in V1. This certainly supports the notion that V1 is directed towards an efficient code for natural scenes. But as we will argue in the next sections, there are both metabolic and learning costs for high sparseness, and therefore a more complete understanding of efficient coding requires examining factors beyond those considered in proposals of representational efficiency. Both the design of artificial visual systems and the development of algorithms for representing natural scenes will likely benefit from our understanding of a larger range of evolutionary constraints, like those described below. These additional biological constraints require a broadening of current notions of efficient design beyond representational efficiency. We may someday find that the best artificial systems look remarkably like the systems we find in biology, and it is likely that a full account of biological systems will require an understanding of these biological requirements.

B. Metabolic efficiency

Consideration of representational efficiency as discussed above leads to important insights into the design of visual systems. It provides a number of metrics of efficiency that are generally independent of the limitations of energy or of neural hardware. In this section, we consider attempts to explain the properties of neurons from a consideration of metabolic constraints. We discuss two broad hypotheses of efficient neural design: spike efficiency and neural wiring optimization.

Spike Efficiency

Any information processing strategy in the nervous system will incur metabolic costs. If we accept that information is primarily transmitted through the use of spikes, then a relevant question is whether one of the constraints on information processing is the number of spikes. Two studies have made detailed efforts to estimate the metabolic cost of spikes, and both have come to the conclusion that the high cost of spikes indeed results in an important constraint on neural processing. Attwell and Laughlin (2001) and Lennie (2003) both find that the total available metabolic resources and the cost of a spike limit the firing rate to less than 1 Hz—and probably less than 0.2 Hz. As noted by Olshausen and Field (2005), most studies have found firing rates significantly higher, suggesting that the cells that are typically recorded have unusually high firing rates. Levy and Baxter (1996) argue that when the cost of spiking is considered, maximum information transfer is attained when only 2-16% of the neurons are firing. Lennie (2003) estimates that the limited resources imply that at any given moment, only 1/50th of the population of cortical neurons will show high firing rates. Thus, in order to save energy individually and across populations, visual neurons must adopt highly sparse patterns of firing. Therefore, we can argue that from both a representational point of view and a metabolic point of view, sparse firing is efficient.

A related debate centers on the optimal histogram of activity for a real neuron that can only produce a positive firing rate. If the goal of a neuron is to maximize the total information rate while minimizing the mean activity, the most efficient distribution is the exponential distribution (Levy and Baxter, 1996; Rieke et al., 1997). There is certainly a similarity between the histograms of activity of visual neurons (in response to natural signals) and the exponential distribution (Baddeley et al., 1997). However, on close inspection, the exponential model does not appear to fit. For example, Treves et al. (1999) found that for neurons in inferotemporal cortex, the exponential model could be rejected in 84% of cases. The lack of any particular successful model led these authors to conclude that there was "no special optimization principle or purpose to the firing distributions found."
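The optimality claim follows from the exponential being the maximum-entropy distribution on the nonnegative reals for a fixed mean. A quick analytic check against two alternative firing-rate distributions with the same mean (the choice of alternatives and the unit mean are arbitrary):

```python
import math

mu = 1.0  # fixed mean firing rate, arbitrary units

# Differential entropies (in nats) of nonnegative distributions with mean mu.
h_exponential = 1 + math.log(mu)                       # exponential with mean mu
h_uniform     = math.log(2 * mu)                       # uniform on [0, 2*mu]
sigma         = mu * math.sqrt(math.pi / 2)            # half-normal scale giving mean mu
h_halfnormal  = 0.5 * math.log(math.pi * sigma**2 / 2) + 0.5

print(h_exponential, h_uniform, h_halfnormal)          # exponential is largest
```

For mu = 1 the entropies come out to 1.0, about 0.69, and about 0.95 nats respectively: at the same mean spike cost, the exponential histogram conveys the most information per neuron.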

Moreover, we question whether the goal should be to optimize the information rate. Consider a case where there are n possible causes or features in an image and there exist n neurons available to represent those features. One could argue from a representational viewpoint that one should match the neurons to the features. If each feature had a particular response probability, then matching that feature would produce a histogram of activity that matched the histogram of the feature, not one that necessarily matched the histogram corresponding to the optimal compression algorithm. Certainly we do not expect that there is a match between the number of features and the number of neurons. But this example is meant to show that if we expect neurons to provide an explicit representation of their environment, part of the impetus for that explicit representation may be to match the probability distributions of the environment, not simply to maximize information rates.

Minimum Wiring

In addition to metabolic constraints, any neural architecture will be dependent on the anatomical "wiring" available. This so-called wiring optimization principle (see, e.g., Koulakov and Chklovskii, 2001) dates back to Ramon y Cajal (see Laughlin and Sejnowski, 2003). This type of metabolic efficiency is typically measured by comparing the wiring volume and distribution of real neurons with calculations of the minimum volume needed

Page 9: EFFICIENT CODING OF NATURAL IMAGESredwood.psych.cornell.edu/papers/graham-field-neuroscience-2007.pdf · computational models of the early visual system (e.g., Watson, 1983; Daugman,


to connect model cells and cortical areas to one another. Mitchison (1991) has argued that stripe patterns in cortex (e.g., ocular dominance columns) are efficient at minimizing the volume of dendritic wiring needed by areas receiving input from two separate sources, compared to alternative arrangements. Durbin and Mitchison (1990) found that a highly reduced model of cortical wiring arrangement matched real wiring in certain ways and minimized a model of wiring cost. Along the same lines, Koulakov and Chklovskii (2001) found that pinwheel patterns of orientation preference in cortex allow efficient wiring. In a related study, Chklovskii et al. (2002) found that neural wiring density and local signal delay similarly reach an optimal compromise. Together, these studies suggest that intracortical wiring optimization can help explain why cortical maps have the topography they do (Chklovskii and Koulakov, 2004). Cherniak et al. (2004) and others have shown that minimal wiring may also help explain why brain areas are placed and interconnected as they are. In an early study using a reduced anatomical model of the C. elegans ganglia, Cherniak (1994) employed a connection-optimizing algorithm to search for connectivities that minimized total wiring. The actual wiring of the ganglia matched the optimal arrangement from a family of possible orderings.
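Cherniak's connection-optimization approach can be sketched in miniature: exhaustively search the orderings of a handful of components placed along a line for the one that minimizes total wiring, where wire length is the connection count between a pair times their separation. The five components and their connection counts below are invented for illustration.

```python
from itertools import permutations

# Hypothetical connection counts between 5 model ganglia (symmetric pairs).
connections = {(0, 1): 4, (0, 2): 1, (1, 2): 3, (1, 3): 1,
               (2, 3): 5, (3, 4): 2, (0, 4): 1}

def total_wire(order):
    """Total wiring cost when components sit on a line in the given order."""
    position = {c: i for i, c in enumerate(order)}
    return sum(n * abs(position[a] - position[b])
               for (a, b), n in connections.items())

# Brute-force search over all 5! placements, as is feasible for small models.
best_order = min(permutations(range(5)), key=total_wire)
print(best_order, total_wire(best_order))
```

The same cost-versus-placement comparison, run against the measured connectivity of real ganglia, is what lets one ask whether the actual layout sits at or near the optimum.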

Some have argued that brain evolution and development impose invariant connectivity ratios, which may tend to enforce optimal wiring. For example, Changizi and Shimojo (2005) suggest that across phyla the average number of synapses per neuron scales with the number of neurons per physiologically defined area, and that the number of connections among areas (per area) scales with the number of areas. However, this approach has several limitations. It should be noted that the number of ways that an efficient wiring system could strive to optimize the cost of adding cell volume, increasing cell metabolism, delaying and attenuating the signal, and making projections during development is essentially unlimited. But when considered in concert with representational constraints, minimum wiring arguments can provide insights about efficient receptive field design (e.g., Vincent and Baddeley, 2003).

C. Learning Efficiency

In addition to the fact that neural systems, unlike human-engineered ones, are difficult to assess in terms of their goals and costs, they also differ from traditionally engineered systems because they are produced through development and learning rather than through carefully planned assembly. Indeed, part of the reason why the level of efficiency of a brain is hard to measure is that it becomes specialized to a variety of tasks, often concurrently, during development, the combination of which helps the adult creature reproduce. Therefore, in order to explain why the visual system of the adult has the structure it does given a proposal of efficient processing of natural scenes, we must consider how visual system representations may depend on efficient learning and development. This efficiency dimension in particular is one place where our intuition that the brain seeks desperately to conserve energy can lead us astray. Indeed, learning efficiency could be at odds with metabolic cost optimizations, since the proportion of metabolic energy consumption in infant human brains (which are most subject to this constraint) is roughly three times that of adults (Hofman, 1983).

In discussions of efficiency in sensory systems, many studies have used various types of neural networks and other learning algorithms to generate early sensory processes. There are certain basic problems that any learning algorithm must overcome. We examine one of these issues: invariance. Then we describe evidence of contributions to learning efficiency from innate, purely learned, and hybrid strategies, which span the well-known nature/nurture debate.

Sparseness and Invariance


Certain constraints are imposed on a system that must learn about its environment from a relatively small number of examples. In essence, efficient learning depends on finding a balance between the selectivity of neurons for specific features and invariance across examples whose features vary in irrelevant ways (e.g., lighting). This task is further complicated because each example is likely to be seen just once in a lifetime. A second instance is unlikely to be presented with the same lighting, the same position, the same size, and the same orientation as the first. To allow any calculation of what is likely, it is critical that the system be capable of generalizing across multiple instances of that object.
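The selectivity-invariance tension can be made concrete with a toy example (ours, not drawn from the literature): a unit that simply matches a feature template loses the feature when it shifts position, while a unit that pools (takes a maximum) over shifted copies of the template responds invariantly.

```python
def response(template, signal):
    """Dot-product response of a linear 'neuron' to a signal."""
    return sum(t * s for t, s in zip(template, signal))

feature = [0, 1, 1, 0, 0, 0]   # a simple feature template
shifted = [0, 0, 0, 1, 1, 0]   # the same feature at a new position

def shifts(v):
    """All circular shifts of a vector."""
    return [v[k:] + v[:k] for k in range(len(v))]

# A purely selective unit misses the displaced feature entirely...
direct = response(feature, shifted)

# ...while a unit pooling over positions responds as strongly as ever.
pooled = max(response(t, shifted) for t in shifts(feature))

print(direct, pooled)  # 0 versus 2
```

Pooling buys invariance at the cost of discarding position information, which is one face of the balance described above.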

Sparse coding, in its strict version, does not help with this problem. The typical argument against hierarchical sparse codes is that a system that develops a neuron for every object would require too many object detectors in order to function. This "grandmother cell" hypothesis has been critiqued elsewhere (see Gross, 2002). But it is worth noting that plausible models of object recognition (e.g., Riesenhuber and Poggio, 2000) suggest the need for grandmother cell-like coding for certain tasks (e.g., distinguishing between different faces) and sparse population codes for other tasks (e.g., categorization). That is, object recognition may require a variety of strategies that vary in their degree of sparseness.

It has also been argued that optimizing for sparseness and independence can assist a system in identifying novel inputs and in detecting new relationships (Baum et al., 1988; Field, 1994; Olshausen, 2003a). Consider the case where objects are uniquely represented by just one neuron versus the case where the object is identified by the relative activity of 100 neurons. In the single neuron case, to learn that two objects often co-occur requires a relatively simple algorithm (e.g., Hebbian learning). But when identification requires a particular activity profile among many neurons, the learning algorithm would require a far more complex association among the 100 units involved in the representation.
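The single-neuron case in this argument reduces to elementary Hebbian learning: with one-hot ("grandmother cell") codes, a co-occurrence is recorded by a single weight, whereas a distributed code would spread the same association across every pairwise weight. A minimal sketch (the two-object setup is invented for illustration):

```python
# Hebbian co-occurrence learning with a local (one-hot) code.
n_neurons = 6
weights = [[0.0] * n_neurons for _ in range(n_neurons)]

def one_hot(i, n):
    """Local code: exactly one unit active per object."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def hebbian_update(a, b, lr=1.0):
    """Strengthen weights between co-active units (outer-product rule)."""
    for i in range(n_neurons):
        for j in range(n_neurons):
            weights[i][j] += lr * a[i] * b[j]

# Objects 2 and 5 co-occur; with one-hot codes a single weight records it.
hebbian_update(one_hot(2, n_neurons), one_hot(5, n_neurons))
print(weights[2][5])  # 1.0 — the association sits in one explicit synapse
```

With 100-unit distributed patterns in place of `one_hot`, the same update smears the association over thousands of weights, and reading it back out requires matching a whole activity profile rather than probing one connection.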

Overcompleteness

An important property of visual representations beyond the optic nerve is that they are highly overcomplete. Visual cortex in humans contains on the order of 1000 times more neurons than the two optic nerves (Barlow, 2001). In macaque V1 alone, there are approximately 50 times more output fibers than input fibers. These overcomplete codes must involve significant redundancy. Overcompleteness has been suggested as an optimally efficient way "to model the redundancy in images, not necessarily to reduce it" (Barlow, 2001). As Riesenhuber and Poggio (2000) point out, overcompleteness allows any particular signal to be represented with a higher degree of sparseness than is possible with complete codes, a property which is useful for generalization during learning, as described above. In order to achieve efficient learning, one must consider solutions that may not be optimally efficient from a strict representational efficiency point of view. That is, solutions that are representationally efficient (e.g., maximum information transfer arguments) may not help explain how the system achieves efficient learning.
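The sparseness advantage of overcompleteness can be illustrated with a toy dictionary of unit vectors in the plane: the more directions the dictionary offers, the better a random signal is approximated, on average, by a single active element. The dictionary sizes below are arbitrary choices for illustration.

```python
import math, random

def one_atom_error(signal_angle, n_atoms):
    """Residual error after approximating a unit vector by its projection
    onto the best single atom of a dictionary of n_atoms evenly spaced
    directions (sign is absorbed by the coefficient, hence abs(cos))."""
    best = max(abs(math.cos(signal_angle - k * math.pi / n_atoms))
               for k in range(n_atoms))
    return math.sqrt(max(0.0, 1.0 - best ** 2))  # orthogonal residual

random.seed(0)
angles = [random.uniform(0, 2 * math.pi) for _ in range(1000)]

# 2 atoms span the plane (complete); 8 atoms are 4x overcomplete.
complete = sum(one_atom_error(a, 2) for a in angles) / len(angles)
overcomplete = sum(one_atom_error(a, 8) for a in angles) / len(angles)

print(complete, overcomplete)  # the overcomplete code leaves a smaller residual
```

In other words, an overcomplete code can describe most signals with fewer active coefficients for the same accuracy, which is exactly the higher degree of sparseness Riesenhuber and Poggio point to.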

"Hard-Coded" EfficiencyThough every part of the visual system

undergoes developmental change, someefficient properties of the adult cells arethought to be genetically pre-programmed.Maloney (1986) found that the distribution ofnatural chromaticities could be efficiently andaccurately coded using just three wavelengthvectors, and that the three cone spectralsensitivities in particular may optimallyachieve color constancy based on typicalsurface spectral reflectances in the naturalworld (see also Buchsbaum and Gottschalk,1983). Others (e.g., Osorio and Vorobyev,1996) have proposed that the distribution ofspectral sensitivities in primates is optimallyweighted so as to separate nutritious fruit

Page 11: EFFICIENT CODING OF NATURAL IMAGESredwood.psych.cornell.edu/papers/graham-field-neuroscience-2007.pdf · computational models of the early visual system (e.g., Watson, 1983; Daugman,

11

from surrounding foliage, a genetic adaptationthat could have aided propagation oftrichromatic primates.

Moreover, at the level of the retina, there is a deep conservation of "virtually all functional and structural features of importance" and of the developmental program in the vertebrate retina (Finlay et al., 2005). This may indicate that visual system development is optimized to allow successful elaboration, robustness, and specialization (what is referred to as "evolvability") for a range of species that live in a variety of habitats using a single developmental and organizational scheme (Finlay et al., 2005). It can therefore be seen as an efficiency that is innate, but one that is connected to the types of variations present in a changing visual environment.

The development of classical receptive field organization in cortex does not appear to require learning, and relatively little spatial refinement is needed to achieve adult-level acuity. One area of debate centers on the question of whether visual experience is necessary to achieve adult receptive field structures, which presumably contribute to efficient representations in the brain. Recent studies (e.g., Carrasco et al., 2005) have suggested that the prevailing view, which proposes that animals dark-reared during the critical period of development are left with unrefined receptive fields because visual experience had no chance to "prune" dendritic arbors, could be in need of reevaluation.

Efficient Learning from the Environment

In higher levels of cortex, learning leads to a marked decrease in mean response in prefrontal and inferotemporal cortex, which has been linked to the detection of novel stimuli. In particular, learning based on natural stimulus matching in monkeys led to systematically lower mean responses to learned natural stimuli in prefrontal cortex, and these responses did not vary with the addition of noise (Rainer and Miller, 2000; Ranganath and Rainer, 2003). At lower levels of cortex, learning has a minor effect on classical response properties like orientation tuning and receptive field size in V1 cells (Ghose et al., 2002). Learning also has a relatively small effect on responses in V4 to noise-degraded natural stimuli (Rainer et al., 2004). Therefore, efficient learning strategies may be an important principle especially in higher levels of visual cortex.

Hybrid Strategies: Efficient Innate Learning

Finally, there is evidence that spontaneous activity in the visual system in a period before the creature's eyes open may help refine the organization of the visual system. There is good evidence that suppressing spontaneous activity in the retina (so-called retinal waves) can affect the refinement of retinal projections to the LGN (see, e.g., Wong, 1999; Butts, 2002). As some have noted, spontaneous, patterned activity is observed in cortex, hippocampus, thalamus, retina, and spinal cord in a variety of creatures (Butts et al., 1999). Simple programs of the sort that could theoretically produce spontaneous, patterned activity could well be coded into genes, and the "running" of these programs could produce the necessary statistical properties that a developing visual system needs to operate at a basic level once the eyes open. We refer to this notion as innate learning, since it employs elements of both learning and innateness. These programmed learning strategies are efficient in the sense that, in the absence of external stimuli, they require far fewer genetic instructions in order to develop proper response properties compared to full genetic specificity.
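As a cartoon of the idea that a compactly specified program can generate patterned activity, the one-dimensional toy below propagates activity outward from a random initiation site using only a local "fire if a neighbor just fired, unless you fired last step" rule. The rule and parameters are invented for illustration and are not a model of actual retinal waves.

```python
import random

random.seed(1)
n_cells = 40
state = [0] * n_cells
state[random.randrange(n_cells)] = 1  # spontaneous initiation site

history = []
for _ in range(20):
    history.append(state[:])
    nxt = []
    for i in range(n_cells):
        # A cell fires if a neighbor fired on the previous step,
        # with a one-step refractory period after its own firing.
        neighbor_fired = (i > 0 and state[i - 1]) or (i < n_cells - 1 and state[i + 1])
        nxt.append(1 if (neighbor_fired and not state[i]) else 0)
    state = nxt

for row in history[:5]:
    print("".join("#" if c else "." for c in row))
```

The point is only that a few lines of "program" suffice to produce spatially correlated, propagating activity, the kind of statistical structure a developing visual system might exploit before the eyes open.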

However, innate learning in the form of spontaneous retinal activity is not required for the initial formation of ocular dominance columns (Crowley and Katz, 2000) or of orientation selectivity. There is strong evidence that retinotopic maps and ocular dominance patterns fail to refine properly in animals raised in altered visual environments (e.g., strobe environments, binocular deprivation, environments with altered spatial statistics). But it should be noted, especially considering the brief time during which spontaneous activity has so far been studied, that "the extent to which spontaneous and visually driven activity contribute to the development and maintenance of stimulus specificity is unclear" (Carrasco et al., 2005).

In short, learning efficiency may require solutions that would not be predicted by representational efficiency arguments or metabolic efficiency arguments alone. The difficulty of learning invariant properties of objects based on a handful of presentations is one reason why this is so. The degree to which the visual system is efficiently engineered to learn is dependent on innate properties, on learned associations, and on patterns of programmed spontaneous activity, the combination of which may approach "optimal" efficiency in concert with metabolic and representational constraints.

CONCLUSION

Visual systems have been argued to be operating near theoretical limits of optimality, not simply parsimoniously, for certain tasks. But given the profusion of ways that brain structure, cell development and function, and neural representation are believed to be optimal in some way, we argue that visual system design represents a compromise among these many demands for efficiency. We have attempted to elucidate a variety of ways in which visual systems can strive for efficiency. No single quantity (e.g., energy consumption) imposes strict limits on brain design (except in the absolute sense), nor is the system optimized to perform a single, "optimizable" task. There is much to be learned from investigating the contributions from a variety of constraints on efficiency.

REFERENCES

Abeles, M., Vaadia, E. and Bergman, H. 1990. Firing patterns of single units in the prefrontal cortex and neural network models. Netw. Comput. Neural Syst. 1, 13-25.

Aizenberg, J., Tkachenko, A., Weiner, S., Addadi, L. and Hendler, G. 2001. Calcitic microlenses as part of the photoreceptor system in brittlestars. Nature 412, 819-822.

Atick, J. J. and Redlich, A. N. 1992. What does the retina know about natural scenes? Neural Comput. 4, 196-210.

Attneave, F. 1954. Some informational aspects of visual perception. Psychol. Rev. 61, 183-193.

Attwell, D. and Laughlin, S. B. 2001. An energy budget for signaling in the grey matter of the brain. J. Cereb. Blood Flow Metab. 21, 1133-1145.

Baddeley, R., Abbott, L. F., Booth, M. C., Sengpiel, F., Freeman, T., Wakeman, E. A. and Rolls, E. T. 1997. Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc. R. Soc. Lond. B 264, 1775-1783.

Barlow, H. B. 1961. Possible principles underlying the transformation of sensory messages. In: Sensory Communication, W. A. Rosenblith, ed. Cambridge, MA: MIT Press.

Barlow, H. B. 2001. Redundancy reduction revisited. Netw. Comput. Neural Syst. 12, 241-253.

Baylor, D. A., Lamb, T. D. and Yau, K. W. 1979. Responses of retinal rods to single photons. J. Physiol. (Lond.) 288, 613-634.

Baum, E. B., Moody, J. and Wilczek, F. 1988. Internal representations for associative memory. Biological Cybernetics 59, 217-228.

Bell, A. J. and Sejnowski, T. J. 1997. The 'independent components' of natural scenes are edge filters. Vision Res. 37, 3327-3338.

Beloozerova, I. N., Sirota, M. G. and Swadlow, H. A. 2003. Activity of different classes of neurons of the motor cortex during locomotion. J. Neurosci. 23, 1087-1097.

Berkes, P. and Wiskott, L. 2005. Slow feature analysis yields a rich repertoire of complex cell properties. J. Vision 5, 579-602.

Blaschke, T., Berkes, P. and Wiskott, L. 2006. What is the relation between slow feature analysis and independent component analysis? Neural Comput. 18, 2495-2508.

Bloom, P. 2000. Descartes' Baby. New York: Basic Books.

Brady, N. and Field, D. J. 2000. Local contrast in natural images: normalisation and coding efficiency. Perception 29, 1041-1055.

Brecht, M. and Sakmann, B. 2002. Dynamic representation of whisker deflection by synaptic potentials in spiny stellate and pyramidal cells in the barrels and septa of layer 4 rat somatosensory cortex. J. Physiol. 543, 49-70.

Buchsbaum, G. and Gottschalk, A. 1983. Trichromacy, opponent colours coding and optimum colour information transmission in the retina. Proc. R. Soc. Lond. B 220, 89-113.

Burton, G. J. and Moorhead, I. R. 1987. Color and spatial structure in natural scenes. Appl. Optics 26, 157-170.

Butts, D. A. 2002. Retinal waves: implications for synaptic learning rules during development. Neuroscientist 8, 243-253.

Butts, D. A., Feller, M. B., Shatz, C. J. and Rokshar, D. S. 1999. Retinal waves are governed by collective network properties. J. Neurosci. 19, 3580-3593.

Carrasco, M. M., Razak, K. A. and Pallas, S. L. 2005. Visual experience is necessary for maintenance but not development of refined retinotopic maps in superior colliculus. J. Neurophysiol. 94, 1962-1970.

Caywood, M. S., Willmore, B. and Tolhurst, D. J. 2004. Independent components of color natural scenes resemble V1 neurons in their spatial and color tuning. J. Neurophysiol. 91, 2859-2873.

Chandler, D. M. and Field, D. J. 2007. Estimates of the information content and dimensionality of natural scenes from proximity distributions. J. Opt. Soc. Am. In press.

Changizi, M. A. and Shimojo, S. 2005. Parcellation and area-area connectivity as a function of neocortex size. Brain, Behavior and Evolution 66, 88-98.

Chapman, B. and Godecke, I. 2000. Cortical cell orientation selectivity fails to develop in the absence of on-center retinal ganglion cell activity. J. Neurosci. 20, 1922-1930.

Chen, Y., Geisler, W. S. and Seidemann, E. 2006. Optimal decoding of correlated neural population responses in the primate visual cortex. Nature Neurosci. 9, 1412-1420.

Cherniak, C. 1994. Component placement optimization in the brain. J. Neurosci. 14, 2418-2427.

Cherniak, C., Mokhtarzada, Z., Rodriguez-Esteban, R. and Changizi, K. 2004. Global optimization of cerebral cortex layout. Proc. Natl. Acad. Sci. USA 101, 1081-1086.

Chklovskii, D. B., Schikorski, T. and Stevens, C. F. 2002. Wiring optimization in cortical circuits. Neuron 34, 341-347.

Chklovskii, D. B. and Koulakov, A. A. 2004. Maps in the brain: what can we learn from them? Ann. Rev. Neurosci. 27, 369-392.

Crotty, P., Sangrey, T. and Levy, W. B. 2006. Energy cost of action potential velocity. J. Neurophysiol. 96, 1237-1246.

Crowley, J. C. and Katz, L. C. 2000. Early development of ocular dominance columns. Science 290, 1321-1324.

Daugman, J. G. 1985. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A 2, 1160-1169.

David, S. V., Vinje, W. E. and Gallant, J. L. 2004. Natural stimulus statistics alter the receptive field structure of V1 neurons. J. Neurosci. 24, 6991-7006.

Denk, W. and Webb, W. W. 1989. Thermal-noise-limited transduction observed in mechanosensory receptors of the inner ear. Phys. Rev. Lett. 63, 207-210.

DeWeese, M., Wehr, M. and Zador, A. 2003. Binary spiking in auditory cortex. J. Neurosci. 23, 7940-7949.

Durbin, R. and Mitchison, G. 1990. A dimension reduction framework for understanding cortical maps. Nature 343, 644-647.


Farah, M. 2004. Visual Agnosia. Cambridge, Mass.: MIT Press.

Field, D. J. 1987. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4, 2379-2394.

Field, D. J. 1993. Scale-invariance and self-similar 'wavelet' transforms: an analysis of natural scenes and mammalian visual systems. In: Wavelets, Fractals and Fourier Transforms: New Developments and New Applications, M. Farge, J. C. R. Hunt and J. C. Vassilicos, eds. Oxford: Oxford University Press.

Field, D. J. 1994. What is the goal of sensory coding? Neural Comput. 6, 559-601.

Finlay, B. L., Silveira, L. C. L. and Reichenbach, A. 2005. Comparative aspects of visual system development. In: The Structure, Function and Evolution of the Primate Visual System, J. Kremers, ed. John Wiley and Sons.

Frazor, R. A. and Geisler, W. S. 2006. Local luminance and contrast in natural images. Vision Res. 46, 1585-1598.

Gabor, D. 1946. Theory of communication. J. Inst. Elect. Eng. 93, 429-457.

Ghose, G. M., Yang, T. and Maunsell, J. H. 2002. Physiological correlates of perceptual learning in monkey V1 and V2. J. Neurophysiol. 87, 1867-1888.

Graham, D. J., Chandler, D. M. and Field, D. J. 2006. Can the theory of "whitening" explain the center-surround properties of retinal ganglion cell receptive fields? Vision Res. 46, 2901-2913.

Gross, C. G. 2002. Genealogy of the "Grandmother Cell." Neuroscientist 8, 512-518.

Hahnloser, R. H. R., Kozhevnikov, A. and Fee, M. S. 2002. An ultrasparse code underlies the generation of neural sequences in songbirds. Nature 419, 65-70.

Hofman, M. A. 1983. Energy metabolism, brain size and longevity in mammals. Q. Rev. Biol. 58, 495.

Hoyer, P. O. and Hyvärinen, A. 2000. Independent component analysis applied to feature extraction from colour and stereo images. Netw. Comput. Neural Syst. 11, 191-210.

Hyvärinen, A. and Hoyer, P. O. 2000. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput. 12, 1705-1720.

Jones, J. P. and Palmer, L. A. 1987. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol. 58, 1233-1258.

Kayser, C., Körding, K. P. and König, P. 2003. Learning the nonlinearity of neurons from natural visual stimuli. Neural Comput. 15, 1751-1759.

Kendrick, K. M. and Baldwin, B. A. 1987. Cells in temporal cortex of conscious sheep can respond preferentially to the sight of faces. Science 236, 448-450.

Koulakov, A. A. and Chklovskii, D. B. 2001. Orientation preference patterns in mammalian visual cortex: a wire length minimization approach. Neuron 29, 519-527.

Kreutz-Delgado, K., Murray, J. F., Rao, B. D., Engen, K., Lee, T.-W. and Sejnowski, T. J. 2003. Dictionary learning algorithms for sparse representation. Neural Comput. 15, 349-396.

Lamme, V. A. 1995. The neurophysiology of figure-ground segregation in primary visual cortex. J. Neurosci. 15, 1605-1615.

Laughlin, S. B. 1981. A simple coding procedure enhances a neuron's information capacity. Z. Naturforsch. 36C, 910-912.

Laughlin, S. B., de Ruyter van Steveninck, R. R. and Anderson, J. C. 1998. The metabolic cost of neural information. Nature Neurosci. 1, 36-41.

Laughlin, S. B. and Sejnowski, T. J. 2003. Communication in neuronal networks. Science 301, 1870-1874.

Lee, T. W., Wachtler, T. and Sejnowski, T. J. 2002. Color opponency is an efficient representation of spectral properties in natural scenes. Vision Res. 42, 2095-2103.

Lennie, P. 1998. Single units and visual cortical organization. Perception 27, 889-935.

Lennie, P. 2003. The cost of cortical computation. Curr. Biol. 13, 493-497.

Levy, W. B. and Baxter, R. A. 1996. Energy efficient neural codes. Neural Comput. 8, 531-543.

Malo, J. and Gutiérrez, J. 2006. V1 non-linear properties emerge from local-to-global non-linear ICA. Netw. Comput. Neural Syst. 17, 85-102.

Maloney, L. T. and Wandell, B. A. 1986. Color constancy: a method for recovering surface spectral reflectance. J. Opt. Soc. Am. A 3, 29-33.

Marcelja, S. 1980. Mathematical description of the responses of simple cortical cells. J. Opt. Soc. Am. 70, 1297-1300.

Mitchison, G. 1991. Neuronal branching patterns and the economy of cortical wiring. Proc. R. Soc. Lond. B 245, 151-158.

Mumford, D. 1994. Neuronal architectures for pattern-theoretic problems. In: Large Scale Neuronal Theories of the Brain, C. Koch and J. L. Davis, eds. Cambridge, Mass.: MIT Press.

Murray, S. O., Kersten, D., Olshausen, B. A., Schrater, P. and Woods, D. L. 2002. Shape perception reduces activity in human primary visual cortex. Proc. Natl. Acad. Sci. USA 99, 15164-15169.

Nirenberg, S., Carcieri, S. M., Jacobs, A. L. and Latham, P. E. 2001. Retinal ganglion cells act largely as independent encoders. Nature 411, 698-701.

Olshausen, B. A. 2003a. Principles of image representation in visual cortex. In: The Visual Neurosciences, L. M. Chalupa and J. S. Werner, eds. MIT Press.

Olshausen, B. A. 2003b. Learning sparse, overcomplete representations of time-varying natural images. IEEE International Conference on Image Processing. Sept. 14-17, 2003. Barcelona, Spain.

Olshausen, B. A., Anderson, C. H. and Van Essen, D. C. 1993. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J. Neurosci. 13, 4700-4719.

Olshausen, B. A. and Field, D. J. 1996. Emergence of simple cell receptive field properties by learning a sparse code for natural images. Nature 381, 607-609.

Olshausen, B. A. and Field, D. J. 2004. Sparse coding of sensory inputs. Curr. Opp. Neurobiol. 14, 481-487.

Olshausen, B. A. and Field, D. J. 2005. How close are we to understanding V1? Neural Comput. 17, 1665-1699.

Osorio, D. and Vorobyev, M. 1996. Colour vision as an adaptation to frugivory in primates. Proc. R. Soc. Lond. B 263, 593-599.

Prenger, R., Wu, M. C.-K., David, S. V. and Gallant, J. L. 2004. Nonlinear V1 responses to natural scenes revealed by neural network analysis. Neural Netw. 17, 663-679.

Puchalla, J., Schneidman, E., Harris, R. A. and Berry, M. J. 2005. Redundancy in the population code of the retina. Neuron 46, 493-504.

Ranganath, C. and Rainer, G. 2003. Neural mechanisms for detecting and remembering novel events. Nat. Rev. Neurosci. 4, 193-202.

Rao, R. P. N. and Ballard, D. H. 1997. Dynamic model of visual recognition predicts neural response properties in the visual cortex. Neural Comput. 9, 721-763.

Rainer, G. and Miller, E. K. 2000. Effects of visual experience on the representation of objects in the prefrontal cortex. Neuron 27, 179-189.

Rainer, G., Lee, H. and Logothetis, N. 2004. The effect of learning on the function of monkey extrastriate visual cortex. PLoS Biol. 2, 275-283.

Reinagel, P. and Reid, R. C. 2000. Temporal coding of visual information in the thalamus. J. Neurosci. 20, 5392-5400.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. and Bialek, W. 1997. Spikes. Cambridge, Mass.: MIT Press.

Riesenhuber, M. and Poggio, T. 2000. Models of object recognition. Nature Neurosci. 3, 1199-1204.

Rolls, E. T. and Tovee, M. J. 1995. Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. J. Neurophysiol. 73, 713-726.

Ruderman, D. L. 1994. Designing receptive fields for highest fidelity. Netw. Comput. Neural Syst. 5, 147-155.

Schwartz, O. and Simoncelli, E. P. 2001. Natural signal statistics and sensory gain control. Nature Neurosci. 4, 819-825.

Shoham, S., O'Connor, D. H. and Segev, R. 2006. How silent is the brain: is there a "dark matter" problem in neuroscience? J. Comp. Physiol. A 192, 777-784.

Simoncelli, E. P. and Olshausen, B. A. 2001. Natural image statistics and neural representation. Ann. Rev. Neurosci. 24, 1193-1215.

Squires, T. M. 2004. Optimizing the vertebrate vestibular semicircular canal: could we balance any better? Phys. Rev. Lett. 93, 198106.

Srinivasan, M. V., Laughlin, S. B. and Dubs, A. 1982. Predictive coding: a fresh view of inhibition in the retina. Proc. R. Soc. Lond. B 216, 427-459.

Sundar, V. C., Yablon, A. D., Grazul, J. L., Ilan, M. and Aizenberg, J. 2003. Fibre-optical features of a glass sponge. Nature 424, 899-900.

Tailor, D. R., Finkel, L. H. and Buchsbaum, G. 2000. Color-opponent receptive fields derived from independent component analysis of natural images. Vision Res. 40, 2671-2676.

Thompson, L. T. and Best, P. J. 1989. Place cells and silent cells in the hippocampus of freely-behaving rats. J. Neurosci. 9, 2382-2390.

Tolhurst, D. J., Tadmor, Y. and Tang, C. 1992. The amplitude spectra of natural images. Ophthalmic Physiol. Opt. 12, 229-232.

Treves, A., Panzeri, S., Rolls, E. T., Booth, M. and Wakeman, E. A. 1999. Firing rate distributions and efficiency of information transmission of inferior temporal cortex neurons to natural visual stimuli. Neural Comput. 11, 601-631.

Van Essen, D. C. 1997. A tension-based theory of morphogenesis and compact wiring in the central nervous system. Nature 385, 313-318.

van Hateren, J. H. and Ruderman, D. 1998. Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex. Proc. R. Soc. Lond. B 265, 2315-2320.

van Hateren, J. H. and van der Schaaf, A. 1998. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. R. Soc. Lond. B 265, 359-366.

van Hateren, J. H. and Snippe, H. P. 2001. Information theoretical evaluation of parametric models of gain control in blowfly photoreceptor cells. Vision Res. 41, 1851-1865.

Vincent, B. T. and Baddeley, R. J. 2003. Synaptic energy efficiency in retinal processing. Vision Res. 43, 1283-1290.

Vinje, W. E. and Gallant, J. L. 2000. Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287, 1273-1276.

Vinje, W. E. and Gallant, J. L. 2002. Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci. 22, 2904-2915.

Wachtler, T., Albright, T. D. and Sejnowski, T. 2001. Nonlocal interactions in color perception: nonlinear processing of chromatic signals from remote inducers. Vision Res. 41, 1535-1546.

Watson, A. B. 1983. Detection and recognition of simple spatial forms. In: Physical and Biological Processing of Images, O. J. Braddick and A. C. Sleigh, eds. Berlin: Springer-Verlag.

Wainwright, M. J. 1999. Visual adaptation as optimal information transmission. Vision Res. 39, 3960-3974.

Webster, M. A. and Mollon, J. D. 1997. Adaptation and the color statistics of natural images. Vision Res. 37, 3283-3298.

Willmore, B. and Tolhurst, D. J. 2001. Characterizing the sparseness of neural codes. Netw. Comput. Neural Syst. 12, 255-270.

Wiskott, L. 2005. How does our visual system achieve shift and size invariance? In: 23 Problems in Systems Neuroscience, J. L. van Hemmen and T. J. Sejnowski, eds. Oxford University Press.

Wiskott, L. and Sejnowski, T. J. 2002. Slow feature analysis: unsupervised learning of invariances. Neural Comput. 14, 715-770.

Wong, R. O. L. 1999. Retinal waves and visual system development. Ann. Rev. Neurosci. 22, 29-47.

Yu, Y. and Lee, T. S. 2004. Adaptive contrast gain control and information maximization. Neurocomputing 65-66, 111-116.

Zetzsche, C. and Röhrbein, F. 2001. Nonlinear and extra-classical receptive field properties and the statistics of natural scenes. Netw. Comput. Neural Syst. 12, 331-350.

FURTHER READING

Barlow, H. B. 1961. Possible principles underlying the transformation of sensory messages. In: Sensory Communication, W. A. Rosenblith, ed. Cambridge, Mass.: MIT Press.

Gross, C. G. 2002. Genealogy of the "Grandmother Cell." Neuroscientist 8, 512-518.

Lennie, P. 1998. Single units and visual cortical organization. Perception 27, 889-935.

Olshausen, B. A. and Field, D. J. 1996. Emergence of simple cell receptive field properties by learning a sparse code for natural images. Nature 381, 607-609.

Olshausen, B. A. and Field, D. J. 2005. How close are we to understanding V1? Neural Comput. 17, 1665-1699.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. and Bialek, W. 2001. Spikes. Cambridge, Mass.: MIT Press.

Shoham, S., O'Connor, D. H. and Segev, R. 2006. How silent is the brain: is there a "dark matter" problem in neuroscience? J. Comp. Physiol. A 192, 777-784.

Simoncelli, E. P. and Olshausen, B. A. 2001. Natural image statistics and neural representation. Ann. Rev. Neurosci. 24, 1193-1215.

Vinje, W. E. and Gallant, J. L. 2000. Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287, 1273-1276.

Wiskott, L. 2005. How does our visual system achieve shift and size invariance? In: 23 Problems in Systems Neuroscience, J. L. van Hemmen and T. J. Sejnowski, eds. Oxford University Press.

Wong, R. O. L. 1999. Retinal waves and visual system development. Ann. Rev. Neurosci. 22, 29-47.

CROSS REFERENCES

Activity in Visual Development
Early Development of Visual Cortex
Information Coding
Retina Models
Visual Cortex Models


Figure 1. A process that chooses pixel intensity randomly will produce white noise images (A). Images resembling natural scenes (B) would almost never occur in such a process.

Figure 2. Noise with a spatial frequency amplitude spectrum like that of natural scenes (A) has the same pairwise correlations as the natural scene (B) but lacks other statistical regularities of scenes.
