
Neuron

Article

Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal Cortex

Nuo Li1 and James J. DiCarlo1,*

1McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
*Correspondence: [email protected]
DOI 10.1016/j.neuron.2010.08.029

SUMMARY

We easily recognize objects and faces across a myriad of retinal images produced by each object. One hypothesis is that this tolerance (a.k.a. "invariance") is learned by relying on the fact that object identities are temporally stable. While we previously found neuronal evidence supporting this idea at the top of the nonhuman primate ventral visual stream (inferior temporal cortex, or IT), we here test if this is a general tolerance learning mechanism. First, we found that the same type of unsupervised experience that reshaped IT position tolerance also predictably reshaped IT size tolerance, and the magnitude of reshaping was quantitatively similar. Second, this tolerance reshaping can be induced under naturally occurring dynamic visual experience, even without eye movements. Third, unsupervised temporally contiguous experience can build new neuronal tolerance. These results suggest that the ventral visual stream uses a general unsupervised tolerance learning algorithm to build its invariant object representation.

INTRODUCTION

Our ability to recognize objects and faces is remarkably tolerant to variation in the retinal images produced by each object. That is, we can easily recognize each object even though it can appear in different positions, sizes, poses, etc. In the primate brain, the solution to this "invariance" problem is thought to be achieved through a series of transformations along the ventral visual stream. At the highest stage of this stream, the inferior temporal cortex (IT), a tolerant object representation is obtained in which individual IT neurons have a preference for some objects ("selectivity") over others, and this rank-order preference is largely maintained across identity-preserving image transformations (Ito et al., 1995; Logothetis and Sheinberg, 1996; Tanaka, 1996; Vogels and Orban, 1996). Though most IT neurons are not strictly "invariant" (DiCarlo and Maunsell, 2003; Ito et al., 1995; Logothetis and Sheinberg, 1996; Vogels and Orban, 1996), reasonably sized populations of these so-called "tolerant" neurons can support object recognition tasks (Afraz et al., 2006; Hung et al., 2005; Li et al., 2009). However, we do not yet understand how IT neurons construct this tolerant response phenomenology.

One potentially powerful idea is that time can act as an implicit teacher, in that the temporal contiguity of object features during natural visual experience can instruct the learning of tolerance, potentially in an unsupervised manner (Foldiak, 1991; Masquelier et al., 2007; Masquelier and Thorpe, 2007; Sprekeler et al., 2007; Stryker, 1991; Wiskott and Sejnowski, 2002; Wyss et al., 2006). The overarching logic is as follows: during natural visual experience, objects tend to remain present for seconds or more, while object motion or viewer motion (e.g., eye movements) tends to cause rapid changes in the retinal image cast by each object over shorter time intervals (hundreds of ms). In theory, the ventral stream could construct a tolerant object representation by taking advantage of this natural tendency for temporally contiguous retinal images to belong to the same object, thus yielding tolerant object selectivity in IT cortex. A recent experimental result in adult nonhuman primate IT has provided some neuronal support for this temporal contiguity hypothesis (Li and DiCarlo, 2008). Specifically, we found that alterations of unsupervised experience of temporally contiguous object image changes across saccadic eye movements can induce rapid reshaping (within hours) of IT neuronal position tolerance (i.e., a reshaping of each IT neuron's ability to respond with consistent object selectivity across the retina). This IT neuronal learning likely has perceptual consequences because similar temporal contiguity manipulations of eye-movement-driven position experience can produce qualitatively similar changes in the position tolerance of human object perception (Cox et al., 2005).

However, these previous studies have two key limitations. First, they only uncovered evidence for temporal contiguity learning under a very restricted set of conditions: they showed learning effects only in the context of eye movements, and they only tested one type of tolerance: position tolerance. Because eye movements drive a great deal of the image statistics relevant only to position tolerance (temporally contiguous image translations), the previous results could reflect only a special case of tolerance learning. Second, the previous studies did not directly show that temporally contiguous image statistics can build new tolerance, but only showed that alterations of those statistics can disrupt normal tolerance.


Because of these limitations, we do not know if the naive ventral stream uses a general, temporal contiguity-driven learning mechanism to construct its tolerance to all types of image variation.

Here, we set out to test the temporal contiguity hypothesis in three ways. First, we reasoned that, if the ventral stream is using temporal contiguity to drive a general tolerance-building mechanism, alterations in that temporal contiguity should reshape other types of tolerance (e.g., size tolerance, pose tolerance, illumination tolerance), and the magnitude of that reshaping should be similar to that found for position tolerance. We decided to test size tolerance, because normal size tolerance in IT is much better described (Brincat and Connor, 2004; Ito et al., 1995; Logothetis and Sheinberg, 1996; Vogels and Orban, 1996) than pose or illumination tolerance. Our experimental logic follows our previous work on position tolerance (Cox et al., 2005; Li and DiCarlo, 2008). Specifically, when an adult animal with a mature (e.g., size-tolerant) object representation is exposed to an altered visual world in which object identity is consistently swapped across object size change, its visual system should learn from those image statistics such that it predictably "breaks" the size tolerance of that mature object representation. Assuming IT conveys this object representation (Afraz et al., 2006; Hung et al., 2005; Logothetis and Sheinberg, 1996; Tanaka, 1996), that learning should result in a specific change in the size tolerance of mature IT neurons (Figure 1).

Second, many types of identity-preserving image transformations in natural vision do not involve intervening eye movements (e.g., object motion producing a change in object image size). If the ventral stream is using a general tolerance-building mechanism, we should be able to find size tolerance reshaping even without intervening eye movements, and we should also be able to find size tolerance reshaping when the dynamics of the image statistics mimic naturally occurring image dynamics.

Third, our previous studies (Cox et al., 2005; Li and DiCarlo, 2008) and our first two aims above use the breaking of naturally occurring image statistics to try to break the normal tolerance observed in IT (i.e., to weaken existing IT object selectivity in a position- or size-specific manner; Figure 1). Such results support the inference that naturally occurring image statistics instruct the "building" of that tolerance in the naive ventral stream. However, we also sought to test that inference more directly by looking for evidence that temporally contiguous image statistics can build new tolerance in IT neurons with immature tolerance (i.e., can produce an increase in existing IT object selectivity in a position- or size-specific manner).

Our results showed that targeted alterations in the temporal contiguity of visual experience robustly and predictably reshaped IT neuronal size tolerance over a period of hours. This change in size tolerance grew gradually stronger with increasing visual experience, and the rate of reshaping was very similar to previously reported position tolerance reshaping (Li and DiCarlo, 2008). Second, we found that the size tolerance reshaping occurred without eye movements, and it occurred when the dynamics of the image statistics mimicked naturally occurring dynamics. Third, we found that exposure to "broken" temporal contiguity image statistics could weaken and even reverse the previously normal IT object selectivity at a specific position or size (i.e., exposure could break old correct tolerance and build new "incorrect" tolerance), and that naturally occurring temporal contiguity image statistics could build new, correct position or size tolerance. Taken together with previous work, these results argue that the ventral stream uses unsupervised, natural visual experience and a common learning mechanism (a.k.a. unsupervised temporal tolerance learning, or UTL) to build and maintain its tolerant (invariant) object representation.

[Figure 1 graphics: (A) Test Phase / Exposure Phase timeline (~10 min / ~1 hr) with free viewing; non-swap and swap exposure events (first image 100 ms, second image 100–200 ms); (B) exposure design across object sizes 1.5°, 4.5°, and 9° for objects P and N; (C) predicted IT selectivity (P − N) over time at the swap and non-swap sizes.]

Figure 1. Experimental Design and Prediction
(A) IT selectivity was tested in the Test Phases, whereas animals received experience in the altered visual world in the Exposure Phases.
(B) The chart shows the full exposure design for a single IT site in Experiment I. Arrows show the temporal contiguity experience of retinal images (arrowheads point to the retinal images occurring later in time; e.g., A). Each arrow shows a particular exposure event type (i.e., temporally linked images shown to the animal), and all eight exposure event types were shown equally often (randomly interleaved) in each Exposure Phase.
(C) Prediction for IT responses collected in the Test Phase: if the visual system builds size tolerance using temporal contiguity, the swap exposure should cause incorrect grouping of two different object images (P and N). The qualitative prediction is a decrease in object selectivity at the swap size (images and data points outlined in red) that grows stronger with increasing exposure (in the limit, reversing object preference as illustrated schematically here), and little or no change in object selectivity at the non-swap size. The experiment makes no quantitative prediction for the selectivity at the medium size (gray oval, see text).



RESULTS

In three separate experiments (Experiments I, II, III), two unsupervised nonhuman primates (Rhesus monkeys, Macaca mulatta) were exposed to altered visual worlds in which we manipulated the temporal contiguity statistics of the animals' visual experience with object size (Figure 1A, Exposure Phases). In each experiment, we recorded multiunit activity (MUA) in an unbiased sample of recording sites in the anterior region of IT to monitor any experience-induced change (Figure 1A, Test Phases). Specifically, for each IT site, a preferred object (P) and a less-preferred object (N) were chosen based on testing of a set of 96 objects (Figure 1B). We then measured the baseline IT neuronal selectivity for P and N at three retinal sizes (1.5°, 4.5°, and 9°) in a Test Phase (~10 min) by presenting the object images in a rapid but naturally paced sequence (5 images/s) on the animals' center of gaze. For all the results below, we report selectivity values determined from these Test Phases, which we conducted both before and after experience manipulations. Thus, all response data shown in the results below were collected during orthogonal behavioral tasks in which object identity and size were irrelevant (Supplemental Experimental Procedures available online).
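For concreteness, the Test Phase protocol just described can be collected into a small parameter sketch (Python). This is our own illustrative summary, not the authors' code, and all names are hypothetical:

```python
# Illustrative summary of the Test Phase parameters described above.
TEST_PHASE = {
    "objects_screened": 96,            # P and N chosen per IT site from this set
    "test_objects": ("P", "N"),        # preferred and less-preferred object
    "test_sizes_deg": (1.5, 4.5, 9.0), # retinal sizes tested
    "presentation_rate_hz": 5,         # rapid but naturally paced sequence
    "duration_min": 10,                # approximate Test Phase duration
    "placement": "center of gaze",
    "task": "orthogonal (object identity and size irrelevant)",
}
```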

Consistent with previous reports (Kreiman et al., 2006), the initial Test Phase data showed that each IT site tended to maintain its preference for object P over object N at each size tested here (Figures 3 and S3 available online). That is, most IT sites showed good baseline size tolerance. Following the logic outlined in the Introduction, the goal of Experiments I–III was to determine if consistently applied, unsupervised experience manipulations would predictably reshape that baseline size tolerance of each IT site (see Figure 1 for the basic prediction). In particular, we monitored changes in each IT site's preference for object P over N at each of the three object sizes, and any change in that selectivity following experience that was not seen in control conditions was taken as evidence for an experience-induced reshaping of IT size tolerance.

In each experiment, the key experience manipulation was deployed in one or more Exposure Phases that were all under precise, automated computer-display control to implement spatiotemporally reliable experience manipulations (see Experimental Procedures). Specifically, during each Exposure Phase the animals freely viewed a gray display monitor on which images of object P or N intermittently appeared at randomly chosen retinal positions away from the center of gaze (object size: 1.5°, 4.5°, or 9°). The animals almost always looked to foveate each object (>95% of object appearances) within ~124 ms (mean; median, 109 ms), placing the object image on the center of gaze. Following that object acquisition saccade, we reliably manipulated the visual experience of the animals over the next 200–300 ms. The details of the experience manipulation (i.e., which object sizes were shown and the timing of those object images) were different in the three experiments, but all three experiments used the same basic logic outlined in the Introduction and in Figure 1.
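To make the exposure-event logic concrete, here is a minimal Python sketch of the Figure 1B design for one IT site (our own illustration, not the authors' display-control code; all identifiers are hypothetical):

```python
import random

# Sketch of the Experiment I exposure design (Figure 1B), for a site whose
# swap size is 9 deg (non-swap control size: 1.5 deg). Each exposure event
# temporally links a medium-size image (shown ~100 ms after the acquisition
# saccade) to a large- or small-size image (shown for the next ~100-200 ms).
SWAP_SIZE, NON_SWAP_SIZE, MEDIUM_SIZE = 9.0, 1.5, 4.5  # degrees

def make_exposure_events(n_per_type=100):
    """Return a shuffled list of (first_image, second_image) events,
    where each image is an (object_label, size_deg) pair."""
    events = []
    for obj, other in (("P", "N"), ("N", "P")):
        # Swap events: object identity changes across the size change.
        events += [((obj, MEDIUM_SIZE), (other, SWAP_SIZE))] * n_per_type
        events += [((obj, SWAP_SIZE), (other, MEDIUM_SIZE))] * n_per_type
        # Non-swap (control) events: object identity is preserved.
        events += [((obj, MEDIUM_SIZE), (obj, NON_SWAP_SIZE))] * n_per_type
        events += [((obj, NON_SWAP_SIZE), (obj, MEDIUM_SIZE))] * n_per_type
    random.shuffle(events)  # all eight event types randomly interleaved
    return events
```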

Experiment I: Does Unsupervised Visual Experience Reshape IT Size Tolerance?

In Experiment I, following the object acquisition saccade, we left the newly foveated object image unchanged for 100 ms, and then we changed the size of the object image (while its retinal position remained on the animal's center of gaze) for the next 100 ms (Figure 1A). We reasoned that this creates a temporal experience linkage ("exposure event") between one object image at one size and another object image at another size. Importantly, on half of the exposure events, one object was swapped out for the other object: for example, a medium-sized (4.5°) object P would become a big (9°) object N (Figure 1A, "swap exposure event"). As one key control, we also exposed the animal to more normal exposure events in which object identity did not change during the size change (Figure 1A, "non-swap exposure event"). The full exposure design for one IT site is shown in Figure 1B; the animal received 800–1600 swap exposures within the time period of 2–3 hr. Each day, we made continuous recordings from a single IT site, and we always deployed the swap exposure at a particular object size (either 1.5° or 9°, i.e., the swap size) while keeping the other size as a control (i.e., the non-swap size). Across different IT sites (i.e., different recording days), we strictly alternated the object size at which the swap manipulation took place so that object size was counterbalanced across our recorded IT population (n = 27).

UTL theory makes the qualitative prediction that the altered experience will induce a size-specific confusion of object identity in the IT response as the ventral stream learns to associate the temporally linked images. In particular, our exposure design should cause the IT site to reduce its original selectivity for images of object P and N at the swap size (perhaps even reversing that selectivity in the limit of large amounts of experience; Figure 1C, red). UTL is not currently specific enough to make a quantitative prediction of what this altered experience should do for selectivity among the medium object size images because those images were temporally paired in two ways: with images at the swap size (altered visual experience) and with the images at the non-swap size (normal visual experience). Thus, our key experimental prediction and planned comparison is between the selectivity (P versus N) at the swap and non-swap size: we predict a selectivity decrease at the swap size that should be much larger than any selectivity change at the non-swap object size (Figure 1C, blue).

This key prediction was borne out by the data: as the animals received experience in the altered visual world, IT selectivity among objects P and N began to decrease at the swap size, but not at the control size. This change in selectivity grew stronger with increasing experience over the time course of 2–3 hr (Figure 2A).


To quantify the selectivity change, for each IT site we took the difference between the selectivity (P − N, response difference in units of spikes/s, see Experimental Procedures) in the first (pre-exposure) and last Test Phase. This Δ(P − N) sought to quantify the total amount of selectivity change for each IT site induced by our experience manipulation. On average, there was a significant decrease in selectivity at the swap size (Figure 2B, p < 0.0001, two-tailed t test against 0) and no significant change at the non-swap control size (Figure 2B, p = 0.89). Incidentally, we also observed a significant decrease in selectivity at the medium size (p = 0.002). This is not surprising given that the images at the medium object size were exposed to the altered statistics half of the time, when they were temporally paired with the images at the swap size. Because no prediction was made about the selectivity change at the medium size, we concentrate below on the planned comparison between the swap and non-swap size.
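A minimal sketch of this quantification (our own illustrative code, assuming mean P and N firing rates per size and per Test Phase; scipy provides the two-tailed t test):

```python
import numpy as np
from scipy import stats

def delta_selectivity(first_phase, last_phase):
    """Total selectivity change, Delta(P - N), in spikes/s, for one IT site.

    Each argument maps an object size (e.g., 'swap', 'non-swap', 'medium')
    to a (response_to_P, response_to_N) pair of mean rates in spikes/s.
    """
    return {size: (last_phase[size][0] - last_phase[size][1])
                  - (first_phase[size][0] - first_phase[size][1])
            for size in first_phase}

def test_swap_decrease(deltas_swap):
    """Two-tailed t test of the per-site Delta(P - N) values against 0
    (deltas_swap holds one value per IT site; n = 27 in Experiment I)."""
    t, p = stats.ttest_1samp(np.asarray(deltas_swap), 0.0)
    return t, p
```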

[Figure 2 graphics: exposure-event schematics for Experiment I (foveate image; 100 ms, then 100–200 ms) and Experiment II (foveate; 200 ms, then 100 ms), and panels A–F plotting the change in selectivity, Δ(P − N) in spikes/s (normalized per 800 exposure events), against the number of exposure events and at the swap versus non-swap sizes, for monkeys M1 and M2 (n = 27, Experiment I; n = 15, Experiment II).]

Figure 2. Experiment I and II Key Results
(A) Mean ± SEM IT object selectivity change, Δ(P − N), from the first Test Phase as a function of the number of exposure events is shown. Each data point shows the average across all the sites tested for that particular amount of experience (n = 27, 800 exposure events; n = 22, 1600 exposure events).
(B) Mean ± SEM selectivity change at the swap, non-swap, and medium size (4.5°). For each IT site (n = 27), total Δ(P − N) was computed using the data from the first and last Test Phase, excluding any middle Test Phase data. Hence, not all data from (A) were included. *p < 0.05 by two-tailed t test; **p < 0.01; n.s. p > 0.05.
(C) For each IT site (n = 27), we fit a line (linear regression) to the (P − N) data as a function of the number of exposure events (inset). We used the slope of the line fit, Δs(P − N), to quantify the selectivity change. The Δs(P − N) is a measure that leverages all our data while normalizing out the variable of exposure amount [for sites with only two Test Phases, Δs(P − N) equals Δ(P − N)]. Δs(P − N) was normalized to show selectivity change per 800 exposure events. Error bars indicate the standard error of the procedure used to compute selectivity (Supplemental Experimental Procedures). M1, monkey 1; M2, monkey 2.
(D) Mean Δs(P − N) at the swap and non-swap size (n = 27 IT sites; M1: 7, M2: 20). Error bars indicate SEM over neuronal sites.
(E) Change in selectivity, Δs(P − N), of all IT sites from Experiment II at the swap and non-swap size.
(F) Mean ± SEM Δs(P − N) at the swap and non-swap size.

We statistically confirmed the size specificity of the experience-induced decrease in selectivity by two different approaches: (1) a direct t test on the Δ(P − N) between the swap and non-swap size (p < 0.001, two-tailed), and (2) a significant interaction of "exposure × object size" on the raw selectivity measurements (P − N); that is, IT selectivity was decreased by exposure only at the swap size (p = 0.0018, repeated-measures ANOVA; p = 0.006, bootstrap, see Supplemental Experimental Procedures).

To ask if the experience-induced selectivity change was specific to the manipulated objects or the features contained in those objects, we also tested each IT site's responses to a second pair of objects (P′ and N′, control objects; see Experimental Procedures). Images of these control objects at three sizes were tested together with the swap objects during all Test Phases (randomly interleaved), but they were not shown during the Exposure Phase. On average, we observed no change in IT selectivity among these unexposed control objects (Figure S4). This shows that the experience-induced reshaping of IT size tolerance has at least some specificity for the experienced objects or the features contained in those objects.

We next set out to quantify the amount of IT size tolerance reshaping induced by the altered visual experience. Because each IT site was tested for different amounts of exposure time (due to experimental time constraints), we wanted to control for this and still leverage all the data for each site to gain maximal power. To do so, we fit linear regressions to the (P − N) selectivity of individual sites at each object size (Figure 2C, inset).


The slope of the line fit, which we will refer to as Δs(P − N), provided us with a sensitive, unbiased measure of the amount of selectivity change that normalizes for the amount of exposure experience. The Δs(P − N) for the swap size and non-swap size is shown in Figures 2C and 2D, which qualitatively confirmed the result obtained in Figure 2B (using the simple measure of selectivity change), and showed a mean selectivity change of −9.2 spikes/s for every 800 swap exposure events.
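A minimal sketch of this slope measure, assuming each site contributes (P − N) values at a few exposure counts (our own illustration; the example numbers below are hypothetical):

```python
import numpy as np

def slope_selectivity_change(n_exposures, p_minus_n, per=800):
    """Delta_s(P - N): slope of a least-squares line fit to (P - N)
    selectivity versus number of exposure events, expressed in
    spikes/s per `per` exposure events.

    n_exposures : exposure counts at each Test Phase, e.g., [0, 800, 1600]
    p_minus_n   : (P - N) selectivity (spikes/s) at those Test Phases
    """
    slope, _intercept = np.polyfit(n_exposures, p_minus_n, deg=1)
    return slope * per  # with only two Test Phases this equals Delta(P - N)

# Hypothetical site tested at 0, 800, and 1600 exposures:
print(slope_selectivity_change([0, 800, 1600], [20.0, 12.0, 3.0]))  # -> -8.5
```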

Importantly, we note that this reshaping of IT tolerance was induced by unsupervised exposure to temporally linked images that did not include a saccadic eye movement to make that link (Figure 1A). We also considered the possibility that small intervening microsaccades might still have been present, but found that they cannot account for the reshaping (Figure S7). The size specificity of the selectivity change also rules out alternative explanations such as adaptation, which would not predict this specificity (because our exposure design equated the amount of exposure for both the swap and non-swap size). We also found the same amount of tolerance reshaping when the sites were grouped by the physical object size at which we deployed the swap (1.5° versus 9°, p = 0.26, t test). Thus the learning is independent of low-level factors like the total luminance of the swapped objects. In sum, we found that unsupervised, temporally linked experience with object images across object size change can reshape IT size tolerance.

Experiment II: Does Size Tolerance Learning Generalize to the "Natural" Visual World?

In the natural world, objects tend to undergo size change smoothly on our retinas as a result of object motion or viewer motion, but, in Experiment I (above), the object size changes we deployed were discontinuous: one image of an object was immediately replaced by an image of another object with no smooth transition (Figure 2, top). Therefore, although those results show that unsupervised experience with object images at different sizes linked in time could induce the predicted IT selectivity change, we wanted to know if that learning was also found during exposure to more natural (i.e., temporally smooth) image dynamics.

To answer this question, we carried out a second experiment (Experiment II) in which we deployed essentially the same manipulation as Experiment I (object identity changes during object size changes, no intervening eye movement), but with natural (i.e., smoothly varying) stimulus sequences. The dynamics in these movie stimuli were closely modeled after the kind of dynamics that our visual system encounters daily in the natural environment (Figure S2). To create smoothly varying object identity changes over object size changes, we created morph lines between the pairs of objects we swapped in Experiment I (P and N). This allowed us to parametrically transform the shape of the objects (Figure 2, bottom). All other experimental procedures were identical to Experiment I except that, in the Exposure Phases, objects underwent size change smoothly while changing identity (swap exposure) or preserving identity (non-swap exposure; Figure S2).

When we carried out this temporally smooth experience manipulation on a new population of IT sites (n = 15), we replicated the Experiment I results (Figures 2E and 2F): there was a predicted decrease in IT selectivity at the swap size and not at the non-swap control size. This size specificity of the effect was, again, confirmed statistically by (1) a direct t test on the total selectivity change, Δ(P − N), between the swap and non-swap size [Δ(P − N) = −10.3 spikes/s at the swap size, +2.8 at the non-swap size; p < 0.0001, two-tailed t test]; and (2) a significant interaction of "exposure × object size" on the raw selectivity measurements (P − N) (p < 0.001, repeated-measures ANOVA; p = 0.001, bootstrap). This result suggests that image linking across time is sufficient to induce tolerance learning in IT and is robust to the temporal details of that image linking (at least over the ~200 ms time windows of linking used here). More importantly, Experiment II shows that unsupervised size tolerance learning occurs in a spatiotemporal image regime encountered in real-world vision.

Size Tolerance Learning: Observations and Effect Size Comparison

Despite a wide diversity in the initial tuning of the recorded IT multiunit sites, our experience manipulation induced a predictable selectivity change that was large enough to be observed in individual IT sites: 40% (17/42 sites, Experiment I and II data combined) of the individual IT sites showed a significant selectivity decrease at the swap size within a single recording session (only 7% of sites showed a significant selectivity decrease at the non-swap size, which is essentially the fraction expected by chance; 3/42 sites, p < 0.05, permutation test, see Supplemental Experimental Procedures). Eight example sites are shown in Figure 3.
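The per-site significance call could look roughly like the following label-shuffling permutation test (our own sketch; the authors' exact procedure is in their Supplemental Experimental Procedures, and the per-trial data format here is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_p(pre_trials, post_trials, n_perm=10000):
    """One-sided permutation test for a selectivity decrease at one site.

    pre_trials / post_trials: per-trial (P - N) selectivity estimates from
    the first and last Test Phase (an assumed data format, for illustration).
    Shuffling the phase labels builds the null distribution of the observed
    decrease, delta = mean(post) - mean(pre).
    """
    pre, post = np.asarray(pre_trials, float), np.asarray(post_trials, float)
    observed = post.mean() - pre.mean()
    pooled = np.concatenate([pre, post])
    null = np.empty(n_perm)
    for i in range(n_perm):
        rng.shuffle(pooled)
        null[i] = pooled[len(pre):].mean() - pooled[:len(pre)].mean()
    return float(np.mean(null <= observed))  # fraction of shuffles as extreme
```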

We found that the magnitude of size tolerance reshaping depended on the initial selectivity at the medium object size, 4.5° (Pearson correlation, r = 0.54, p < 0.01). That is, on average, IT sites that we initially encountered with greater object selectivity at the medium size underwent greater exposure-induced selectivity change at the swap size. This correlation is not simply explained by the hypothesis that it is easier to break highly selective neurons (e.g., due to factors that might have nothing to do with neuronal learning, such as loss of isolation), because the correlation was not seen for changes in selectivity at the non-swapped size (r = −0.16, p = 0.35) and we found no average change in selectivity at the non-swapped size (Figure 2 and statistics above). Instead, this observation is consistent with the overarching hypothesis of this study: the initial image selectivity at the medium object size provides (at least part of) the driving force for selectivity learning because those images are temporally linked with the swapped images at the swap size.

The change in selectivity produced by the experience manipulation was found throughout the entire time period of the IT response, including the earliest part of that period where IT neurons are just beginning to respond above baseline (~100 ms from stimulus onset; Figure S5). This shows that the experience-induced change in IT selectivity cannot be explained by changes in long-lag feedback alone (>100 ms; also see Discussion). On average, the selectivity change at the swap size resulted from both a decrease in the response to the image of the preferred object (P) and an increase in the response to the less preferred object (N).


Consistent with this, we found that the experience manipulation produced no average change in the IT sites' mean response rate (Figure S5).

In this study, we concentrated on multiunit response data because it had a clear advantage as a direct test of our hypothesis: it allowed us to longitudinally track IT selectivity during altered visual experience across the entirety of each experimental session. We also examined the underlying single-unit data and found results that were consistent with the multiunit data. Figure 4A shows an example of a rare single-unit IT neuronal recording that we were able to track across an entire recording session (~3 hr). The confidence that we were recording from the same unit comes from the consistency of the unit's waveform and its consistent pattern of response among the nonexposed control object images (Figure 4B). During this stable recording, the (P − N) selectivity at the swap size gradually decreased while the selectivity at the non-swap size remained stable, perfectly mirroring the multiunit results described above. However, these ~3 hr single-unit recordings were very rare because single units have limited hold time in the awake primate physiology preparation. Thus we took a more standard population approach to analyze the single-unit data (Baker et al., 2002; Kobatake et al., 1998; Sakai and Miyashita, 1991; Sigala et al., 2002). Specifically, we performed spike-sorting analyses to obtain clear single units from each Test Phase (Experimental Procedures). We considered each single unit obtained from each Test Phase as a sample of the IT population, taken either before or after the experience in the altered visual world.

[Figure 3 graphics: eight example IT sites (panels A and B); mean response (spikes/s) versus object size (1.5°, 4.5°, 9°), before and after exposure, with the swap sizes highlighted.]

Figure 3. Example Single IT Sites
Mean ± SEM IT response to P (solid square) and N (open circle) as a function of object size for eight example IT sites (from both Experiments I and II). The data shown are from the first ("before exposure") and last ("after exposure") Test Phase. (A) Swap size, 1.5°; (B) swap size, 9° (highlighted by red boxes and arrows). Gray dotted lines show the baseline response to a blank image (interleaved with the test images).

This analysis does not require that the sampled units were the same neurons. The prediction is that IT single units sampled after exposure (i.e., at the last Test Phase of each day) would be less size tolerant at the swap size than at the non-swap size. This prediction was clearly observed in our single-unit data (Figure 4C, after exposure, p < 0.05; for reference, the size tolerance before the exposure is also shown, and we observed no difference between the swap and non-swap size). The result was robust to the choice of the criteria used to define "single units" (Figure S6). Similarly, we found that each single-unit population sampled after successively more exposure showed a successively larger change in size tolerance (Figure 4D).
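The size tolerance index used for these population comparisons is defined in the Figure 4 legend as (P − N)/(P − N)medium; a minimal sketch (our own illustrative code):

```python
def size_tolerance(p_resp, n_resp, p_med, n_med, min_medium_sel=1.0):
    """Size tolerance at a tested size: (P - N) / (P - N)_medium.

    p_resp, n_resp: responses (spikes/s) to objects P and N at the tested
    size (swap or non-swap); p_med, n_med: responses at the medium size.
    Returns None for units excluded by the (P - N)_medium > 1 spikes/s
    selectivity criterion. A value near 1 means the unit is as selective
    at the tested size as at the medium size; values near 0 (or negative)
    mean weakened (or reversed) selectivity at that size.
    """
    medium_sel = p_med - n_med
    if medium_sel <= min_medium_sel:
        return None
    return (p_resp - n_resp) / medium_sel
```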

We next aimed to quantify the absolute magnitude of this size tolerance learning effect across the different experience manipulations deployed here, and to compare that magnitude with our previous results on position-tolerance learning (Li and DiCarlo, 2008). To do this, we plotted the mean selectivity change at the swap size from each experiment as a function of the number of swap exposures (Figure 5). We found that Experiments I and II produced a very similar magnitude of learning: ~5 spikes/s per 400 swap exposures (also see Discussion for comparison to previous work). This effect grew larger at this approximately constant rate for as long as we could run each experiment, and the magnitude of the size tolerance learning was remarkably similar to that seen in our previous study of position tolerance (Li and DiCarlo, 2008).

Size and Position Tolerance Learning: Reversing Old IT Object Selectivity and Building New IT Object Selectivity

The results on size tolerance presented above and our previous study of position tolerance (Li and DiCarlo, 2008) both used the breaking of naturally occurring temporal contiguity experience to discover that we can break normal position tolerance and size tolerance (i.e., we can cause a decrease in adult IT object selectivity in a size- or position-specific manner). While these results are consistent with the inference that naturally occurring image statistics instruct the original building of that normal tolerance (see Introduction), we next sought to test that inference more directly. Specifically, we asked if the temporal contiguity statistics of visual experience can instruct the creation of new IT tolerance (i.e., if they can cause an increase in IT object selectivity in a size- or position-specific manner). Our experimental data offered two ways to test this idea (below), and both ways revealed that unsupervised temporal contiguity learning could indeed build new IT tolerance.


To do these analyses, we took advantage of the fact that we found very similar effects for both size tolerance and position tolerance (Li and DiCarlo, 2008), and we maximized our power by pooling the data across this experiment (Figure 5: size Experiments I and II; n = 42 MUA sites) and our previous position experiment (n = 10 MUA sites). This pooling did not qualitatively change the result: the effects shown in Figures 5 and 6 below were seen in the size tolerance data alone (Figure S9).

First, as outlined in Figure 1C, a strong form of the UTL hypothesis predicts that our experience manipulation should not only degrade existing IT selectivity for P over N at the swap size/position, but should eventually reverse that selectivity and then build new incorrect selectivity for N over P (Figure 1C; note that we refer to this as incorrect selectivity because the full IT response pattern is inappropriate for the veridical world, in which objects maintain their identity across changes in position and size). Though the plasticity we discovered is remarkably strong (~5 spikes/s per hour), it did not produce a selectivity reversal for the "mean" IT site within the 2 hr recording session (Figure S5D). Instead, it only produced a ~50% decrease in selectivity for that mean site, which is entirely consistent with the fact that our mean IT site had reasonably strong initial selectivity for P over N (mean P − N = ~20 spikes/s). To look more deeply at this issue, we made use of the well-known observation that not all adult IT neurons are identical; some have a large amount of size or position tolerance, whereas others show a small amount of tolerance (DiCarlo and Maunsell, 2003; Ito et al., 1995; Logothetis and Sheinberg, 1996; Op De Beeck and Vogels, 2000). Specifically, some IT sites strongly prefer object P to N at some sizes/positions, but show only weak (P − N) selectivity at the swap sizes/positions (this neuronal response pattern is illustrated schematically at the top of Figure 6). We reasoned that examination of these sites should reveal whether our experience manipulation is capable of causing a reversal in selectivity and a building of new selectivity. Thus, we used independent data to select neuronal subpopulations from our data pool with varying amounts of initial selectivity at the swap size/position (Supplemental Experimental Procedures). Note that all of these neuronal sites had robust selectivity for P over N at the medium sizes/positions (as schematically illustrated in Figure 6A).

Figure 4. Single-Unit Results
(A) P versus N selectivity of a rare single-unit IT neuron that was isolated across an entire recording session (~3 hr).
(B) The example single unit's response to the six control object images during each Test Phase and its waveforms (gray: all traces from a Test Phase; red: mean).
(C) Mean ± SEM size tolerance at the swap (red) and non-swap (blue) size for single units obtained before and after exposure. Size tolerance for the control objects is also shown at these two sizes (black). Each neuron's size tolerance was computed as (P − N)/(P − N)medium, where (P − N) is the selectivity at the tested size and (P − N)medium is the selectivity at the medium object size. Only units that showed selectivity at the medium size were included [(P − N)medium > 1 spikes/s]. The top and bottom panels include neurons that had selectivity for the swap objects, the control objects, or both. Thus they show different but overlapping populations of neurons. The result is unchanged if we only examine populations for which each neuron has selectivity for both the swap and control objects (i.e., the intersections of the neuronal populations in the top and bottom panels; Figure S6).
(D) Mean ± SEM size tolerance at the swap size further broken out by the amount of exposure to the altered visual statistics. To quantify the change in IT size tolerance, we performed linear regression of the size tolerance as a function of the amount of experience. Consistent with the multiunit results, we found a significant negative slope (Δ size tolerance = −0.84 per 800 exposures; p = 0.002, bootstrap; c.f. −0.42 for multiunit, Figure S6). No decrease in size tolerance was observed at the non-swap control size (Δ size tolerance = 0.30; c.f. 0.12 for multiunit).


This analysis revealed that our manipulation caused neuronal sites with weak initial selectivity at the swap size/position to reverse their selectivity and to build new selectivity (building incorrect selectivity for N over P), exactly as predicted by the UTL hypothesis (Figure 6).

A second way in which our data might reveal whether UTL can build tolerance is to carefully look for any changes in selectivity at the non-swap (control) size/position. Our experiment was designed to present a large number of normal temporal contiguity exposures at that control size/position so that we would perfectly equate its amount of retinal exposure with that provided at the swap size/position. Although some forms of unsupervised temporal contiguity theory might predict that these normal temporal contiguity exposures should increase the (P − N) selectivity at the control size/position, we did not initially make that prediction (Figure 1C, blue) because we reasoned that most IT sites would already have strong, adult-like selectivity for object P versus N at that size/position, such that further supporting statistics would have little to teach those IT sites (Figure 7A, top right). Consistent with this, we found little mean change in (P − N) selectivity for the control condition in either our position tolerance experiment (Li and DiCarlo, 2008) or our size tolerance experiment (Figure 2, blue). However, examination of all of our IT sites revealed that some sites happened to have initially weak (P − N) selectivity at the control size/position while still having strong selectivity at the medium size/position (Figure 7A, top left). This suggested that these sites might be in a more naive state with respect to the particular objects being tested, such that our temporal contiguity statistics might expand their tolerance for these objects (i.e., increase their P − N selectivity at the control size/position). Indeed, examination of these sites reveals that our exposure experiment caused a clear, significant building of new, correct selectivity among these sites (Figure 7B), again directly demonstrating that unsupervised temporal contiguity experience can build IT tolerance.

Experiment III: Does the Learning Depend on the Temporal Direction of the Experience?

Our results show that targeted alteration of unsupervised natural visual experience rapidly reshapes IT size tolerance, as predicted by the hypothesis that the ventral stream uses a temporal contiguity learning strategy to build that tolerance in the first place. Several instantiated computational models show how this conceptual strategy can build tolerance (Foldiak, 1991; Masquelier et al., 2007; Masquelier and Thorpe, 2007; Wallis and Rolls, 1997; Wiskott and Sejnowski, 2002; Wyss et al., 2006), and such models can be implemented using variants of Hebbian-like learning rules that are dependent on the timing of spikes (Gerstner et al., 1996; Sprekeler et al., 2007; Wallis and Rolls, 1997; Morrison et al., 2008; Sprekeler and Gerstner, 2009). The time course and task independence of the observed learning are consistent with synaptic plasticity (Markram et al., 1997; Meliza and Dan, 2006), but our data do not constrain the underlying mechanism. One can imagine ventral stream neurons using almost temporally coincident activity to learn which sets of their afferents correspond to features of the same object across size changes. If tolerance learning is spike timing dependent, any experience-induced change in IT selectivity might reflect temporal asymmetries at the level of the underlying synaptic learning mechanism. For example, one hypothesis is that lingering postsynaptic activity caused by temporally leading images drives synaptic plasticity in afferents activated by temporally lagging images. Alternatively, afferents activated by temporally leading images might be modified by the later arrival of postsynaptic activity caused by temporally lagging images. Or a combination of both hypotheses might be the case. To look for reflections of any such underlying temporal asymmetry, we carried out a third experiment (Experiment III) centered on the question, "Do temporally leading images teach temporally lagging ones, or vice versa?"
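For intuition only, here is a minimal sketch of one such mechanism, a trace-style Hebbian rule in the spirit of Foldiak (1991) and Wallis and Rolls (1997). It is our own simplified illustration, not a model fit to these data:

```python
import numpy as np

def trace_rule_update(w, x_t, y_trace, lr=0.01, decay=0.8):
    """One step of a simplified trace learning rule (cf. Foldiak, 1991).

    w: afferent weight vector; x_t: current afferent activity (driven by the
    temporally lagging image); y_trace: low-pass-filtered ("lingering")
    postsynaptic activity carried over from temporally leading images.
    Because the trace outlives any single image, afferents driven by
    successive images of the same (or swapped) object are pulled toward
    a common postsynaptic response.
    """
    y_t = decay * y_trace + (1 - decay) * float(w @ x_t)  # update the trace
    w = w + lr * y_t * x_t                                # Hebbian step
    w = w / np.linalg.norm(w)                             # keep weights bounded
    return w, y_t
```

Because the weight update pairs the lingering trace with the current afferent activity, a rule of this form would preferentially modify the afferents of temporally lagging images, which is one way the temporal asymmetry probed in Experiment III could arise.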

We deployed the same experience manipulation as before (linking of different object images across size changes, the same as in Experiment I), but this time only in one direction (compare the single-headed arrows in Figure 8A with the double-headed arrows in Figure 1B). For example, during the recording of a particular IT site, the animal only received experience seeing objects temporally transition from a small size (arrow "tail" in Figure 8A) to a large size (arrow "head" in Figure 8A) while swapping identity. We strictly alternated the temporal direction of the experience across different IT sites. That is, for the next IT site we recorded, the animal experienced objects transitioning from a large size to a small size while swapping identity. Thus, object size was counterbalanced across our recorded population, so that we could isolate changes in selectivity among the temporally leading stimuli (i.e., arrow tail stimuli) from changes in selectivity among the temporally lagging stimuli (i.e., arrow head stimuli). As in Experiments I and II, we measured the expression of any experience-induced learning by looking for any change in (P − N) selectivity at each object size, measured in a neutral task with all images randomly interleaved (Test Phase). We replicated the results in Experiments I and II in that a decrease in (P − N) selectivity was found following swapped experience (red bars are negative in Figure 8B). When we sorted our data based on the temporal direction of the animals' experience, we found greater selectivity change (i.e., learning) for the temporally lagging images (Figure 8B).

[Figure 5 graphic: change in selectivity, Δ(P − N) (spikes/s), versus number of exposure events (0–1600); legend: Experiment I (swap); Experiment II; Experiment III, temporally leading; Experiment III, temporally lagging; position experiment, SUA and MUA (Li & DiCarlo, 2008); non-swap control.]

Figure 5. Effect Size Comparisons across Different Experience Manipulations
Mean object selectivity change as a function of the number of swap exposure events for different experiments. For comparison, the data from a position tolerance learning experiment (Li and DiCarlo, 2008) are also shown. Plot format is the same as Figure 2A without the error bars. Mean ± SEM Δ(P − N) at the non-swap size/position is shown in blue (all experiments pooled). SUA, single-unit activity; MUA, multiunit activity.


This difference was statistically significant (p = 0.038, n = 31, two-tailed t test) and cannot be explained by any differences in the IT sites' initial selectivity (Figure S4C; also see Figure S4B for results with all sites included). This result is consistent with an underlying learning mechanism that favors experience-induced plasticity of the afferents corresponding to temporally lagging images.

To test if the tolerance learning spread beyond the specifically experienced images, here we also tested object images at an intermediate size (3°) between the two exposed sizes (Figure 8).

[Figure 6 graphics: (A) schematic of normal versus fully altered position/size tolerance and the destroying → reversal → building progression at the swap position/size; (B) normalized responses to P and N, pre versus post, for site groups 1–6 (n = 34, 28, 24, 13, 10, 4), ordered from destroying initial (correct) selectivity to building new (incorrect) selectivity.]

Figure 6. Altered Statistics in Visual Experience Builds Incorrect Selectivity
(A) Prediction: top, most adult IT neurons start with fully position/size tolerant selectivity (left). In the limit of a large amount of altered visual experience, temporal contiguity learning predicts that each neuron will acquire fully altered tolerance (right). Bottom, at the swap position/size (red), the selectivity for P over N is predicted to reverse in the limit (prefer N over P). Because we could only record longitudinally from a multiunit site for less than 3 hr, we do not expect our experience manipulation within a session to produce the full selectivity reversal (pre versus post) among neuronal sites with strong initial selectivity. However, because different IT sites differ in their degrees of initial selectivity, they start at different distances from selectivity reversal. Thus, our manipulation should produce selectivity reversal among the initially weakly selective sites and build new ("incorrect") selectivity.
(B) Mean ± SEM normalized response to objects P and N at the swap position/size among subpopulations of IT multiunit sites. Sites are grouped by their initial selectivity at the swap position/size using independent data. Data from the size and position tolerance experiments (Li and DiCarlo, 2008) were combined to gain maximal power (size Experiments I and II; position experiment, see Supplemental Experimental Procedures). These sites show strong selectivity at the non-swap (control) position/size, and no negative change in that selectivity was observed (not shown). **p < 0.01; *p < 0.05, one-tailed t test against no change. (Size experiment data only, groups 1–6: p < 0.01; p < 0.01; p < 0.01; p = 0.02; p = 0.07; n.s.).

Unlike in Experiments I and II, this medium size was not shown to the animals during the Exposure Phase (it was also a different physical size from the medium size in Experiments I and II). We observed a significant selectivity change for the medium-size image pairs (Figure 8B, middle bar; p = 0.01, two-tailed t test against zero), which suggests that the tolerance learning has some degree of spread (but not to very different objects; Figure S4). Finally, the effect size observed in Experiment III was consistent with, and can explain, the effect sizes observed in Experiments I and II. That is, based on the Experiment III effect sizes for the temporally lagging and leading images, a first-order prediction of the net effect in Experiments I and II is the average of these two effects (because Experiments I and II employed a 50-50 mix of the experience manipulations considered separately in Experiment III). That prediction is very close to what we found (Figure 5).

DISCUSSION

The overarching goal of this work is to ask whether the primate ventral visual stream uses a general, temporal contiguity-driven learning mechanism to construct its tolerance to object-identity-preserving image transformations.

Neuron

Natural Experience Reshapes IT Size Tolerance

1070 Neuron 67, 1062–1075, September 23, 2010 ª2010 Elsevier Inc.

Page 10: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

object-identity-preserving image transformations. Our strategywas to use experience manipulations of temporally contiguousimage statistics to look for changes in IT neuronal tolerancethat are predicted by this hypothetical learning mechanism.Here we tested three key predictions that were not answeredby previous work (Li and DiCarlo, 2008). First, we asked if theseexperience manipulations predictably reshaped the size toler-ance of IT neurons. Our results strongly confirmed this predic-tion: we found that the change in size tolerance was large ("5spikes/s, "25% IT selectivity change per hour of exposure)and grew gradually stronger with increasing visual experience.Second, we asked if this tolerance reshaping was induced undervisual experience that mimics the common size-tolerance-

Figure 7. Normal ("Correct") Statistics in Visual Experience Builds Tolerant Selectivity
(A) Prediction follows the same logic as in Figure 6A, but here for the control conditions in which normal temporal contiguity statistics were provided (Figure 1). Top, temporal contiguity learning predicts that neurons will be taught to build new "correct" selectivity (i.e., normal tolerance), and neurons starting with initially weak position/size tolerant selectivity (left) have the highest potential to reveal that effect. Bottom, at the non-swap position/size (blue), our manipulation should build new correct selectivity for P over N among IT sites with weak initial selectivity.
(B) Mean ± SEM normalized response to objects P and N at the non-swap position/size among subpopulations of IT multiunit sites. Sites are grouped by their initial selectivity at the non-swap position/size using independent data. Other details are the same as those in Figure 6B. (Size experiment data only, groups 1-5: p = 0.06; p < 0.01; p = 0.05; n.s.; n.s.)

Our suggestion that UTL is a general tolerance learning mechanism is supported by a number of empirical commonalities between the size tolerance learning here and our previously reported position tolerance learning (Li and DiCarlo, 2008). (1) Object specificity: the experience-induced changes in IT size tolerance and position tolerance have at least some specificity for the exposed objects. (2) Learning induction (driving force): in both studies, the magnitude of learning depended on the initial selectivity of the temporally adjacent images (medium object size here, foveal position in the position tolerance study), which is consistent with the idea that the initial selectivity may provide at least part of the driving force for the learning. (3) Time course of learning expression: learning increased with increasing amount of experience and changed the initial part of the IT response (100 ms after stimulus onset). (4) Response change of learning expression: in both studies, the IT selectivity change arose from a response decrease to the preferred object (P) and a response increase to the less preferred object (N). (5) Effect size: our different experience manipulations here as well as our previous position manipulation revealed a similar effect magnitude (~5 spikes/s per 400 swap exposures). More specifically, when measured as learning magnitude per exposure event, size tolerance learning was slightly smaller than that found for position tolerance learning (Figure 5), and when considered as learning magnitude per unit time, the results of all three experiments were nearly identical (Figure S8). However, we note that our data cannot cleanly deconfound exposure amount from exposure time.

Relation to Previous Literature
Previous psychophysical studies have shown that human object perception depends on the statistics of visual experience (e.g., Brady and Oliva, 2008; Fiser and Aslin, 2001; Turk-Browne et al., 2005). Several studies have also shown that manipulating the spatiotemporal contiguity statistics of visual experience can alter the tolerance of human object perception (Cox et al., 2005; Wallis et al., 2009; Wallis and Bulthoff, 2001). In particular, an earlier study (Cox et al., 2005) showed that the same type of experience manipulation deployed here (experience of different object images across position change) produces increased confusion of object identities across position, a result that qualitatively mirrors the neuronal results reported here and in our previous neuronal study (Li and DiCarlo, 2008). Thus, the available psychophysical data suggest that UTL has perceptual consequences. However, this remains an open empirical question (see "Limitations and Future Direction" subsection).

Previous neurophysiological investigations in the monkey ventral visual stream showed that IT and perirhinal neurons could learn to give similar responses to temporally nearby stimuli when instructed by reward (i.e., so-called "paired associate" learning; Messinger et al., 2001; Miyashita, 1988; Sakai and Miyashita, 1991), or sometimes even in the absence of reward (Erickson and Desimone, 1999). Though these studies were motivated in the context of visual memory (Miyashita, 1993) and used visual presentation rates of seconds or more, it was recognized that the same associational learning across time might also be used to learn invariant visual features for object recognition (e.g., Foldiak, 1991; Stryker, 1991; Wallis, 1998; Wiskott and Sejnowski, 2002). Our studies provide a direct test of these ideas by showing that temporally contiguous experience with object images can specifically reshape the size and position tolerance of IT neurons' selectivity among visual objects. This is consistent with the hypothesis that the ventral visual stream relies on a temporal contiguity strategy to learn its tolerant object representations in the first place. Our results also demonstrate that UTL is somewhat specific to the experienced objects' images (i.e., object, size, and position specificity) and operates over natural, very fast time scales (hundreds of ms, faster than those previously reported) in a largely unsupervised manner. This suggests that, during natural visual exploration, the visual system can leverage an enormous amount of visual experience to construct its object invariance.

Computational models of the ventral visual stream have put forms of the temporal contiguity hypothesis to the test, and have shown that learning to extract slowly varying features across time can produce tolerant feature representations with units that mimic the basic response properties of ventral stream neurons (Masquelier et al., 2007; Masquelier and Thorpe, 2007; Sprekeler et al., 2007; Wallis and Rolls, 1997; Wiskott and Sejnowski, 2002; Wyss et al., 2006). These models can be implemented using variants of Hebbian-like learning rules (Masquelier and Thorpe, 2007; Sprekeler and Gerstner, 2009; Sprekeler et al., 2007; Wallis and Rolls, 1997). The time course and task independence of UTL reported here is consistent with synaptic plasticity (Markram et al., 1997; Rolls et al., 1989), and the temporal asymmetry in learning magnitude (Figure 8) constrains the possible underlying mechanisms. While the experimental approach used here may seem to imply that experience with all possible images of each object is necessary for UTL to build an invariant IT object representation, this is not believed to be true in a full computational model of the ventral stream. For example, V1 complex cells that encode edges may learn position tolerance that ultimately supports the invariant encoding of many objects. Our observation of partial spread of tolerance learning to nonexperienced images (Figure 8) is consistent with this idea. In particular, at each level of the ventral stream, afferent input likely reflects tolerance already constructed for simpler features at the previous level (e.g., in the context of this study, some IT afferents may respond to an object's image at both the medium size and the swap size). Thus any modification of the swap-size-image afferents would result in a partial generalization of the learning beyond the specifically experienced images.
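To make this model family concrete, here is a toy Python sketch (our own illustration, not the implementation of any cited model; all parameter values are arbitrary) of the trace-rule idea: Hebbian learning driven by a temporally smoothed response favors input dimensions that vary slowly, such as object identity, over dimensions that change quickly, such as retinal size.

```python
import numpy as np

rng = np.random.default_rng(0)

def trace_rule(inputs, eta=0.01, delta=0.2):
    """A single linear unit trained with a Foldiak (1991)-style trace rule.
    `inputs` is an (n_frames, n_dims) array holding a temporally ordered
    stimulus sequence; eta is the learning rate and delta the per-frame
    update weight of the activity trace (values here are arbitrary).
    """
    w = rng.normal(scale=0.1, size=inputs.shape[1])
    trace = 0.0
    for x in inputs:
        y = w @ x                                # instantaneous response
        trace = (1 - delta) * trace + delta * y  # slow running average
        w += eta * trace * x                     # Hebbian update via the trace
        w /= np.linalg.norm(w)                   # keep the weights bounded
    return w

# Toy demo: dimension 0 ("identity") persists for ~40 frames at a time,
# while dimension 1 ("size") flips randomly every frame. The learned
# weight should load almost entirely on the slow identity dimension.
identity = np.repeat(rng.choice([-1.0, 1.0], size=250), 40)
size = rng.choice([-1.0, 1.0], size=identity.size)
w = trace_rule(np.stack([identity, size], axis=1))
print(w)  # expect |w[0]| >> |w[1]|
```

The trace is a running average of the unit's own response, so input dimensions that persist across consecutive frames are credited more strongly than dimensions that flicker, which is the core intuition behind slowness-based tolerance learning.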

Figure 8. Experiment III Exposure Design and Key Results
(A) Exposure Phase design (top, same format as in Figure 1B) and example object images used (bottom).
(B) Mean ± SEM selectivity change, Δ(P − N), among the temporally leading images, the nonexposed images at the medium object size (3°), and the temporally lagging images. Δ(P − N) was normalized to show selectivity change per 800 exposure events. *p = 0.038, two-tailed t test.

Limitations and Future Direction
Because the change in object selectivity was expressed in the earliest part of the IT response after learning (Figure S5A), even while the animal was performing tasks unrelated to the object identity, this rules out any simple attentional account of the effect. However, our data do not rule out the possibility that attention or other top-down signals may be required to mediate the learning during the Exposure Phase. These potential top-down signals could include nonspecific reward, attentional, and arousal signals. Indeed, psychophysical evidence (Seitz et al., 2009; Shibata et al., 2009) and physiological evidence (Baker et al., 2002; Freedman and Assad, 2006; Froemke et al., 2007; Goard and Dan, 2009; Law and Gold, 2008) both suggest that reward is an important factor that can modulate or gate learning. We also cannot rule out the possibility that the attentional or the arousal system may be required for the learning to occur. In our work, we sought to engage the subjects in natural exploration during the Exposure Phases under the assumption that visual arousal may be important for ongoing learning, even though we deployed the manipulation during the brief periods of fixation during that exploration. Future experiments in which we systematically control these variables will shed light on these questions, and will help expose the circuits that underlie UTL.

Although the UTL phenomenology induced by our experiments was a very specific change in IT neuronal selectivity, the magnitude of this learning effect was quite large when expressed in units of spikes per second (Figure 5: ~5 spikes/s, ~25% change in IT selectivity per hour of exposure). This is comparable to or larger than other important neuronal phenomenology (e.g., attention; Maunsell and Cook, 2002). However, because this effect size was evaluated from the multiunit signal, without knowledge of how many neurons we are recording from, this effect size should be interpreted with caution. Furthermore, connecting this neuronal phenomenology (i.e., change in IT image selectivity) to the larger problem of size or position tolerance at the level of the IT population or the animal's behavior is not straightforward. Quantitatively linking a neuronal effect size to a behavioral effect size requires a more complete understanding of how that neuronal representation is read out to support behavior, and large effects in confusion of object identities in individual IT neurons may or may not correspond to large confusions of object identities in perception. Such questions are the target of our ongoing and future monkey studies in which one has simultaneous measures of the neuronal learning and the animal's behaviors (modeled after those such as Britten et al., 1992; Cook and Maunsell, 2002).

The rapid and unsupervised nature of UTL gives us new experimental access to understand how cortical object representations are actively maintained by the sensory environment. However, it also calls for further characterization of the time course of this learning to inform our understanding of the stability of ventral stream object representations in the face of constantly available, natural visual experience. This sets the stage for future studies on how the ventral visual stream assembles its neuronal representations at multiple cortical processing levels, particularly during early postnatal visual development, so as to achieve the remarkably powerful adult object representation.

EXPERIMENTAL PROCEDURES

Animals and Surgery
Aseptic surgery was performed on two male Rhesus monkeys (8 and 6 kg) to implant a head post and a scleral search coil. After brief behavioral training (1-3 months), a second surgery was performed to place a recording chamber to reach the anterior half of the temporal lobe. All animal procedures were performed in accordance with National Institutes of Health guidelines and the Massachusetts Institute of Technology Committee on Animal Care.

General Design
On each experimental day, we recorded from a single IT multiunit site for 2-3 hr. During that time, the animal was provided with altered visual experience in Exposure Phases, and we made repeated measurements of the IT site's selectivity during Test Phases (Figure 1). The study consisted of three separate experiments (Experiments I, II, and III), which differed from each other only in the Exposure Phase design (described below). We focused on one pair of objects (swap objects) that the IT site was selective for (preferred object P and nonpreferred object N, chosen using a prescreening procedure; see Supplemental Experimental Procedures).

Experiment I
Objects (P and N at 1.5°, 4.5°, or 9°) appeared at random positions on a gray computer screen and animals naturally looked to the objects. The image of the just-foveated object was replaced by an image of the other object at a different size (swap exposure event, Figure 1A) or an image of the same object at a different size (non-swap exposure event, Figure 1A). The image change was initiated 100 ms after foveation and was instantaneous (Figure 2, top). We used a fully symmetric design, illustrated graphically in Figure 1B. This experience manipulation temporally linked pairs of object images (Figure 1A shows one such link), and each link could go in both directions (Figure 1B shows the full design). For each IT site, we always deployed the swap manipulation at one particular size (referred to as the swap size: 1.5° or 9°, prechosen, strictly alternated between sites), keeping the other size as the exposure-equalized control (referred to as the non-swap size).
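To make the event structure concrete, here is a toy sketch of one exposure event under assumptions drawn from the description above. `draw_image`, `wait_for_foveation`, and `wait_ms` are hypothetical stand-ins for the display and eye-tracking layer (the actual control software is custom and not published here), and we assume the medium size (4.5°) anchors each link, per Figure 1B.

```python
import random

SWAP_SIZE = 9.0   # degrees; prechosen per site (1.5 or 9, strictly alternated)

def exposure_event(draw_image, wait_for_foveation, wait_ms):
    # Each exposure event links the medium size (4.5 deg, assumed anchor)
    # with one extreme size (1.5 or 9 deg); links run in both directions.
    extreme = random.choice([1.5, 9.0])
    size0, size1 = random.choice([(4.5, extreme), (extreme, 4.5)])
    obj = random.choice(["P", "N"])

    draw_image(obj, size0, position="random")  # object appears on the screen
    wait_for_foveation()                       # animal naturally looks to it
    wait_ms(100)                               # change initiated 100 ms later

    if extreme == SWAP_SIZE:
        obj = "N" if obj == "P" else "P"       # swap event: identity changes
    # otherwise: non-swap event, same object at the new size (exposure-
    # equalized control)
    draw_image(obj, size1, position="gaze")    # instantaneous replacement
```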


Experiment II
All design parameters were identical to Experiment I except that the image changes were smooth across time (Figure 2, bottom). The image change sequence started immediately after the animal had foveated the image, and the entire sequence lasted 200 ms (Figure S2). Identity-changing morph lines were only achievable on the silhouette shapes. Only Monkey 2 was tested in Experiment II (given the stimulus class assignment).

Experiment III
We used an asymmetric design, illustrated graphically in Figure 8A: for each IT site, we only gave the animals experience of image changes in one direction (1.5°→4.5° or vice versa, prechosen, strictly alternated between sites). The timing of the image change was identical to that in Experiment I. Another pair of control objects (P′ and N′, not shown in the Exposure Phase) was also used to probe the IT site's responses in the Test Phase. The selectivity among the control objects served as a measure of recording stability (below). In each Test Phase, the swap and control objects were tested at three sizes (Experiments I and II: 1.5°, 4.5°, 9°; Experiment III: 1.5°, 3°, 4.5°) by presenting them briefly (100 ms) on the animals' center of gaze (50-60 repetitions, randomized) during orthogonal behavioral tasks in which object identity and size were irrelevant. See Supplemental Experimental Procedures for details of the task design and behavioral monitoring.

Neuronal Assays
We recorded MUA from the anterior region of IT using standard single microelectrode methods. Our previous study on IT position tolerance learning showed that we could uncover the same learning in both single-unit activity and MUA with comparable effect size (Li and DiCarlo, 2008), so here we only recorded MUA to maximize recording time. Over a series of recording days, we sampled across IT, and sites selected for all our primary analyses were required to be selective among objects P and N (ANOVA, object × sizes, p < 0.05 for "object" main effect or interaction) and to pass a stability criterion (n = 27 for Experiment I; 15 for Experiment II; 31 for Experiment III). We verified that the key result is robust to the choice of the stability criteria (Figure S4). See Supplemental Experimental Procedures for details of the recording procedures and site selection.
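A minimal sketch of this site-inclusion test follows (our re-implementation in Python; the original analyses were done in MATLAB, and the data layout here is our assumption).

```python
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def is_object_selective(df, alpha=0.05):
    """`df` is assumed to be a pandas DataFrame with one row per trial:
    'rate' (spikes/s in the response window), 'obj' ("P" or "N"), and
    'size' (1.5, 4.5, or 9 deg). Keep the site if the two-way ANOVA shows
    an object main effect or an object x size interaction at p < alpha.
    """
    model = smf.ols("rate ~ C(obj) * C(size)", data=df).fit()
    table = anova_lm(model, typ=2)
    return (table.loc["C(obj)", "PR(>F)"] < alpha or
            table.loc["C(obj):C(size)", "PR(>F)"] < alpha)
```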

Data Analyses
All the analyses and statistical tests were done in MATLAB (MathWorks, Natick, MA) with either custom-written scripts or standard statistical packages. The IT response to each image was computed from the spike count in a 150 ms time window (100-250 ms poststimulus onset, data from Test Phases only). Neuronal selectivity was computed as the response difference, in units of spikes/s, between images of objects P and N at different object sizes. To avoid any bias in this estimate of selectivity, for each IT site we defined the labels P (preferred) and N by using a portion of the pre-exposure data to determine these labels, and the remaining data to compute the selectivity values reported in the text (Supplemental Experimental Procedures). In cases where neuronal response data were normalized and combined (Figures 6 and 7), each site's response from each Test Phase was normalized to its mean response to all object images in that Test Phase. The key results were evaluated statistically using a combination of t tests and interaction tests (Supplemental Experimental Procedures). For analyses presented in Figure 4, we extracted clear single units from the waveform data of each Test Phase using a PCA-based spike-sorting algorithm (Supplemental Experimental Procedures).
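For concreteness, the core selectivity computation just described can be sketched as follows (a minimal re-implementation in Python under our own assumptions about the data layout; the authors' MATLAB code is not reproduced here).

```python
import numpy as np

def selectivity_change(pre_half1, pre_half2, post, n_exposures, per=800.0):
    """Each of pre_half1, pre_half2, and post is assumed to be a dict
    mapping "obj1"/"obj2" to 1-D arrays of single-trial firing rates
    (spikes/s) from the 100-250 ms spike-count window; pre_half1 and
    pre_half2 split the pre-exposure Test Phase, and post is the
    post-exposure Test Phase.
    """
    # Determine the labels P and N on one half of the pre-exposure data...
    if np.mean(pre_half1["obj1"]) >= np.mean(pre_half1["obj2"]):
        p, n = "obj1", "obj2"
    else:
        p, n = "obj2", "obj1"
    # ...then compute unbiased (P - N) selectivity on the held-out data.
    sel_pre = np.mean(pre_half2[p]) - np.mean(pre_half2[n])
    sel_post = np.mean(post[p]) - np.mean(post[n])
    # Express the change in "per 800 exposure events" units, as in Figure 8B.
    return (sel_post - sel_pre) * per / n_exposures
```

Splitting the pre-exposure data this way keeps the P/N labeling independent of the data used to measure selectivity, which is what prevents the regression-to-the-mean bias the text alludes to.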

SUPPLEMENTAL INFORMATION

Supplemental Information for this article includes nine figures and Supplemental Experimental Procedures and can be found with this article online at doi:10.1016/j.neuron.2010.08.029.

ACKNOWLEDGMENTS

We thank Professors T. Poggio, N. Kanwisher, and E. Miller and the members of our laboratory for valuable discussion and comment on this work. We also thank J. Deutsch, B. Andken, and Dr. R. Marini for technical support. This work was supported by the NIH (grant R01-EY014970 and its ARRA supplement to J.J.D., NRSA 1F31EY020057 to N.L.) and The McKnight Endowment Fund for Neuroscience.

Accepted: August 5, 2010

Published: September 22, 2010

REFERENCES

Afraz, S.R., Kiani, R., and Esteky, H. (2006). Microstimulation of inferotemporal cortex influences face categorization. Nature 442, 692-695.

Baker, C.I., Behrmann, M., and Olson, C.R. (2002). Impact of learning on representation of parts and wholes in monkey inferotemporal cortex. Nat. Neurosci. 5, 1210-1216.

Brady, T.F., and Oliva, A. (2008). Statistical learning using real-world scenes: extracting categorical regularities without conscious intent. Psychol. Sci. 19, 678-685.

Brincat, S.L., and Connor, C.E. (2004). Underlying principles of visual shape selectivity in posterior inferotemporal cortex. Nat. Neurosci. 7, 880-886.

Britten, K.H., Shadlen, M.N., Newsome, W.T., and Movshon, J.A. (1992). The analysis of visual motion: a comparison of neuronal and psychophysical performance. J. Neurosci. 12, 4745-4765.

Cook, E.P., and Maunsell, J.H.R. (2002). Attentional modulation of behavioral performance and neuronal responses in middle temporal and ventral intraparietal areas of macaque monkey. J. Neurosci. 22, 1994-2004.

Cox, D.D., Meier, P., Oertelt, N., and DiCarlo, J.J. (2005). 'Breaking' position-invariant object recognition. Nat. Neurosci. 8, 1145-1147.

DiCarlo, J.J., and Maunsell, J.H.R. (2003). Anterior inferotemporal neurons of monkeys engaged in object recognition can be highly sensitive to object retinal position. J. Neurophysiol. 89, 3264-3278.

Erickson, C.A., and Desimone, R. (1999). Responses of macaque perirhinal neurons during and after visual stimulus association learning. J. Neurosci. 19, 10404-10416.

Fiser, J., and Aslin, R.N. (2001). Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychol. Sci. 12, 499-504.

Foldiak, P. (1991). Learning invariance from transformation sequences. Neural Comput. 3, 194-200.

Freedman, D.J., and Assad, J.A. (2006). Experience-dependent representation of visual categories in parietal cortex. Nature 443, 85-88.

Froemke, R.C., Merzenich, M.M., and Schreiner, C.E. (2007). A synaptic memory trace for cortical receptive field plasticity. Nature 450, 425-429.

Gerstner, W., Kempter, R., van Hemmen, J.L., and Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature 383, 76-81.

Goard, M., and Dan, Y. (2009). Basal forebrain activation enhances cortical coding of natural scenes. Nat. Neurosci. 12, 1444-1449.

Hung, C.P., Kreiman, G., Poggio, T., and DiCarlo, J.J. (2005). Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863-866.

Ito, M., Tamura, H., Fujita, I., and Tanaka, K. (1995). Size and position invariance of neuronal responses in monkey inferotemporal cortex. J. Neurophysiol. 73, 218-226.

Kobatake, E., Wang, G., and Tanaka, K. (1998). Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys. J. Neurophysiol. 80, 324-330.

Kreiman, G., Hung, C.P., Kraskov, A., Quiroga, R.Q., Poggio, T., and DiCarlo, J.J. (2006). Object selectivity of local field potentials and spikes in the macaque inferior temporal cortex. Neuron 49, 433-445.

Law, C.T., and Gold, J.I. (2008). Neural correlates of perceptual learning in a sensory-motor, but not a sensory, cortical area. Nat. Neurosci. 11, 505-513.

Li, N., and DiCarlo, J.J. (2008). Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science 321, 1502-1507.

Li, N., Cox, D.D., Zoccolan, D., and DiCarlo, J.J. (2009). What response properties do individual neurons need to underlie position and clutter "invariant" object recognition? J. Neurophysiol. 102, 360-376.

Logothetis, N.K., and Sheinberg, D.L. (1996). Visual object recognition. Annu. Rev. Neurosci. 19, 577-621.

Markram, H., Lubke, J., Frotscher, M., and Sakmann, B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275, 213-215.

Masquelier, T., and Thorpe, S.J. (2007). Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Comput. Biol. 3, e31.

Masquelier, T., Serre, T., Thorpe, S.J., and Poggio, T. (2007). Learning complex cell invariance from natural video: a plausibility proof. CBCL Paper (Cambridge, MA: Massachusetts Institute of Technology).

Maunsell, J.H.R., and Cook, E.P. (2002). The role of attention in visual processing. Philos. Trans. R. Soc. Lond. B Biol. Sci. 357, 1063-1072.

Meliza, C.D., and Dan, Y. (2006). Receptive-field modification in rat visual cortex induced by paired visual stimulation and single-cell spiking. Neuron 49, 183-189.

Messinger, A., Squire, L.R., Zola, S.M., and Albright, T.D. (2001). Neuronal representations of stimulus associations develop in the temporal lobe during learning. Proc. Natl. Acad. Sci. USA 98, 12239-12244.

Miyashita, Y. (1988). Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature 335, 817-820.

Miyashita, Y. (1993). Inferior temporal cortex: where visual perception meets memory. Annu. Rev. Neurosci. 16, 245-263.

Morrison, A., Diesmann, M., and Gerstner, W. (2008). Phenomenological models of synaptic plasticity based on spike timing. Biol. Cybern. 98, 459-478.

Op De Beeck, H., and Vogels, R. (2000). Spatial sensitivity of macaque inferior temporal neurons. J. Comp. Neurol. 426, 505-518.

Rolls, E.T., Baylis, G.C., Hasselmo, M.E., and Nalwa, V. (1989). The effect of learning on the face selective responses of neurons in the cortex in the superior temporal sulcus of the monkey. Exp. Brain Res. 76, 153-164.

Sakai, K., and Miyashita, Y. (1991). Neural organization for the long-term memory of paired associates. Nature 354, 152-155.

Seitz, A.R., Kim, D., and Watanabe, T. (2009). Rewards evoke learning of unconsciously processed visual stimuli in adult humans. Neuron 61, 700-707.

Shibata, K., Yamagishi, N., Ishii, S., and Kawato, M. (2009). Boosting perceptual learning by fake feedback. Vision Res. 49, 2574-2585.

Sigala, N., Gabbiani, F., and Logothetis, N.K. (2002). Visual categorization and object representation in monkeys and humans. J. Cogn. Neurosci. 14, 187-198.

Sprekeler, H., and Gerstner, W. (2009). Robust learning of position invariant visual representations with OFF responses. COSYNE (Salt Lake City).

Sprekeler, H., Michaelis, C., and Wiskott, L. (2007). Slowness: an objective for spike-timing-dependent plasticity? PLoS Comput. Biol. 3, e112.

Stryker, M.P. (1991). Neurobiology. Temporal associations. Nature 354, 108-109.

Tanaka, K. (1996). Inferotemporal cortex and object vision. Annu. Rev. Neurosci. 19, 109-139.

Turk-Browne, N.B., Junge, J., and Scholl, B.J. (2005). The automaticity of visual statistical learning. J. Exp. Psychol. Gen. 134, 552-564.

Vogels, R., and Orban, G.A. (1996). Coding of stimulus invariances by inferior temporal neurons. Prog. Brain Res. 112, 195-211.

Wallis, G. (1998). Spatio-temporal influences at the neural level of object recognition. Network 9, 265-278.

Wallis, G., and Bulthoff, H.H. (2001). Effects of temporal association on recognition memory. Proc. Natl. Acad. Sci. USA 98, 4800-4804.

Wallis, G., and Rolls, E.T. (1997). Invariant face and object recognition in the visual system. Prog. Neurobiol. 51, 167-194.

Wallis, G., Backus, B.T., Langer, M., Huebner, G., and Bulthoff, H. (2009). Learning illumination- and orientation-invariant representations of objects through temporal association. J. Vis. 9, 6.

Wiskott, L., and Sejnowski, T.J. (2002). Slow feature analysis: unsupervised learning of invariances. Neural Comput. 14, 715-770.

Wyss, R., Konig, P., and Verschure, P.F. (2006). A model of the ventral visual system based on temporal stability and local memory. PLoS Biol. 4, e120.


Page 15: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! "

!"#$%&'()%*#+"(!"(

,#--*"+"&./*(0&1%$+/.2%&((3&4#-"$)24"5(!/.#$/*(624#/*(78-"$2"&9"(:/-25*;(:"4</-"4(,2="(0&)/$2/&.(>?@"9.(:"-$"4"&./.2%&(2&(0&1"$2%$(A"+-%$/*(B%$."8((!"#$%&$'()$*'+,-$*.$/&0'12#$

(#$%%&'(')*+&,-./$0',#1,!

!C2D#$"(,EF((!"#$%&#'()*'+$(,-'.)(&/0-0'''1.2'3-'0-&-4"-*'567-4"'8(#90':95$'";5'*#::-9-)"'0"#$%&%0'4&(00-0'1<='4%"5%"')("%9(&'0>(8-0?'<='0#&>5%-""-'0>(8-02@''A>-'0;(8'567-4"'8(#9'1B'()*'C'%0-*':59'">-'D-/'-E8-9#-)4-'$()#8%&("#5)2';(0'(&;(/0'8#4D-*':95$'5)-'4&(00':59'-(4>'()#$(&'1F5)D-/'GH')("%9(&?'F5)D-/'IH'0#&>5%-""-2@''A>-'45)"95&'567-4"'8(#9'1BJ'()*'CJ2';(0'(&;(/0'8#4D-*':95$'">-'5">-9'0"#$%&%0'4&(00@'''

Page 16: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! #

1K2'!"#$%&#':95$'">-'";5'4&(00-0'(9-'L%#"-'*#::-9-)"':95$'-(4>'5">-9'#)'">-#9'8#E-&M;#0-'0#$#&(9#"/@''A>#0'#0'#&&%0"9("-*';>-)'">-'0"#$%&%0'#$(,-0'(9-'8&5""-*'6/'0459-0'5:'">-':#90"'">9--'89#)4#8&-'45$85)-)"0'1BN2'#)'">-'8#E-&'08(4-@''B9#)4#8&-'45$85)-)"0';-9-'45$8%"-*':95$'(&&'OP'#$(,-0@''+$(,-0';-9-'89-M8954-00-*'"5'>(Q-'-L%(&'$-()'()*'%)#"'Q(9#()4-'6-:59-'#$(,-'()(&/0-0@''!5&#*'0/$65&0H')("%9(&?'58-)'0/$65&0H'0#&>5%-""-@!!

#$%%&'(')*+&,-./$0',#2,,

!

Page 17: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! $

!C2D#$"(,GF((!"#$%&#':95$'RE8-9#$-)"'++'()*'N5$8(9#05)0'"5'C("%9(&'S#0%(&'359&*'RE($8&-@'''1.2'N%"5%"'0#&>5%-""-'0>(8-0';-9-'9-)*-9-*'%0#),')5)M%)#:59$'9("#5)(&'KM08&#)-@''R(4>'0>(8-';(0'9-)*-9-*':95$'('0-"'5:'I<'45)"95&'85#)"0@''F("4>#),'()*'#)"-985&("#),'6-";--)'">-'45)"95&'85#)"0'(&&5;-*'%0'"5'8(9($-"9#4(&&/'$598>'6-";--)'*#::-9-)"'0>(8-0@''F598>M&#)-0';-9-'5)&/'(4>#-Q(6&-'5)'('0%60-"'5:'(&&'8500#6&-'0>(8-'8(#90'#)'">-'0#&>5%-""-'4&(00'1T#,%9-'!G2@''A>-':#,%9-'0>5;0'(&&'">-'$598>M&#)-'8(#90'%0-*'#)'RE8-9#$-)"'++@''U)&/'F5)D-/'I';(0'"-0"-*'#)'RE8-9#$-)"'++',#Q-)'">-'0"#$%&%0'4&(00'(00#,)$-)"@''A>-'-E($8&-'8(#9'#)'1K2'#0'>#,>&#,>"-*@''1K2'A58V'('9-(&';59&*'-E($8&-'5:')("%9(&'Q#0%(&'-E8-9#-)4-';>-)'&#:"#),'('4%8'"5'*9#)D@''K5""5$V'-E($8&-'-E850%9-'-Q-)"0';-'%0-*'#)'RE8-9#$-)"'++'1"58V')5)M0;(8'-E850%9-'-Q-)"?'65""5$V'0;(8'-E850%9-'-Q-)"2@''W%9#),'-(4>'-E850%9-'-Q-)"V'">-'567-4"'0#X-'4>(),-';(0'8&(/-*'5%"'0$55">&/'5Q-9'('"#$-'8-9#5*'5:'IYY'$0'1:9($-'9("-H'<Y':9($-0Z0-42@''3-'%0-*'">-'0($-'*/)($#4'1#@-@'0($-'0#X-'4>(),-'895:#&-'6%"'04(&-*'#)'($8&#"%*-2':59'">-'";5'*#::-9-)"'"/8-0'5:'0#X-'#)49-(0-'-E850%9-'-Q-)"0'1G@[\!<@[\V'<@[\!O\V'T#,%9-'GK2@''T59'">-'567-4"'0#X-'*-49-(0-'-E850%9-'-Q-)"0'1<@[\!G@[\V'O\!<@[\V'T#,%9-'GK2V'">-'9-Q-90-'0-L%-)4-';(0'8&(/-*V';>#4>'(&05'$#$#4D-*'">-')("%9(&'Q#0%(&'-E8-9#-)4-'5:'8%""#),'*5;)'('4%8'1)5"'0>5;)2@'''1N2'3-'L%()"#:#-*'">-'0"("#0"#40'5:'">-'Q#0%(&';59&*'-E($8&-'()*'5%9'$5Q#-'0"#$%&#'6/'(')%$6-9'5:'*#::-9-)"'#$(,-'$-(0%9-0@''K&(4D'&#)-0'0>5;'">-'Q#0%(&';59&*'-E($8&-'1$-()'45$8%"-*':95$'Q#*-50'5:'$%&"#8&-'9-8-("0'5:'">-'0($-'(4"#5)2?'6&%-'&#)-0'0>5;'5%9'$5Q#-'0"#$%&#'1$-()'45$8%"-*':95$'(&&'-E850%9-'-Q-)"02?'0>(*-*'(9-(0'0>5;'!RF0@''U67-4"'0#X-';(0'$-(0%9-*'6/'">-'9(*#%0'5:'">-'0$(&&-0"'65%)*#),'0L%(9-'(95%)*'">-'0>(8-'19-859"-*'#)'%)#"0'5:'54"(Q-V')59$(&#X-*'"5'">-'#)#"#(&'0#X-2@''U67-4"'0#X-'4>(),-'08--*';(0'45$8%"-*'6/'"(D#),'">-'*-9#Q("#Q-'5:'">-'567-4"'0#X-'$-(0%9-$-)"0@''U8"#4(&':&5;';(0'45$8%"-*'%0#),'0"()*(9*'45$8%"-9'Q#0#5)'(&,59#">$'1]59)V'GO=P2@''K9#,>")-00'8(""-9)0'#)'">-'#$(,-'$5Q-'(0'">-'567-4"0'">("',#Q-'9#0-'"5'">-$'$5Q-@''U8"#4(&':&5;'L%()"#:#-0'">-'(88(9-)"'$5"#5)'5:'">-'69#,>")-00'8(""-9)@'']-9-V'$-()'58"#4(&':&5;'$(,)#"%*-'5Q-9'">-'-)"#9-'#$(,-';(0'45$8%"-*@''B#E-&'4>(),-';(0'45$8%"-*'6/'"(D#),'">-'8#E-&'#)"-)0#"/'*#::-9-)4-0'6-";--)'(*7(4-)"'Q#*-5':9($-0'()*'">-'R%4&#*-()')59$'5:'">-'8#E-&'*#::-9-)4-'5Q-9'">-'-)"#9-'#$(,-';(0'45$8%"-*@''.&&'Q#*-5':9($-0';-9-'89-M8954-00-*'"5'>(Q-'%)#"'Q(9#()4-'6-:59-'#$(,-'()(&/0-0@''''''''''''''''''''

Page 18: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! %

!

!

#$%%&'(')*+&,-./$0',#3,,

!

!,C2D#$"(,HF('+A'F%&"#M%)#"'.4"#Q#"/'RE>#6#"0'!#X-'A5&-9()"'U67-4"'!-&-4"#Q#"/@'''1.2'+A')-%95)0'>(Q-'567-4"'9()D'59*-9'0-&-4"#Q#"/'">("'#0'&(9,-&/'%)(::-4"-*'6/'567-4"'0#X-'4>(),-0'1K9#)4("'()*'N5))59V'IYY<?'+"5'-"'(&@V'GOO[?'^5,5">-"#0'()*'!>-#)6-9,V'GOOP?'S5,-&0'()*'U96()V'GOOP2V'()*'">("'0#X-'"5&-9()4-'#0'9-:&-4"-*'#)'">-'+A'$%&"#M%)#"'(4"#Q#"/'1]%),'-"'(&@V'IYY[?'_9-#$()'-"'(&@V'IYYP2@''N5)0#0"-)"';#">'89-Q#5%0'9-859"0V'$50"'5:'">-'+A'0#"-0';-'9-459*-*'$(#)"(#)-*'">-#9'567-4"'9()D'59*-9'89-:-9-)4-'(49500'">-'9(),-'5:'567-4"'0#X-'"-0"-*'>-9-'1G@[\`O\2@''A5'L%()"#:/'">-'*-,9--'5:'+A'0#X-'"5&-9()4-':59'">-'0;(8'()*'45)"95&'567-4"'8(#90V':59'-(4>'+A'0#"-';-'*-"-9$#)-*'#"0'89-:-99-*'1B2'()*'&-00'89-:-99-*'1C2'567-4"';#">#)'()'567-4"'8(#9'%0#),'('859"#5)'5:'">-'9-085)0-'*("('("'">-'$-*#%$'567-4"'0#X-'1<@[\2@''3-'">-)'%0-*'">50-'aBb'aCb'&(6-&0'"5'45$8%"-'">-'567-4"'0-&-4"#Q#"/'1BMC2':95$'">-'9-$(#)#),'9-085)0-'*("('()*':59'5">-9'567-4"'0#X-@''A>-'8&5"0'0>5;'">-'$-()'c'!RF'0-&-4"#Q#"/'5:'(&&'567-4"'0-&-4"#Q-'0#"-0':95$'RE8-9#$-)"'+'()*'++'1)dPe2@''B50#"#Q-'0-&-4"#Q#"/'#)*#4("-0'">("'+A'0#"-0V'5)'(Q-9(,-V'$(#)"(#)-*'">-#9'567-4"'89-:-9-)4-'(49500'0#X-'4>(),-0@'''1K2'F50"'5:'">-'#)*#Q#*%(&'+A'0#"-0'1`=YfV')dPe2'$(#)"(#)-*'">-#9'567-4"'9()D'59*-9'89-:-9-)4-@''A>-'8&5"'

Page 19: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! &

0>5;0'">-':9(4"#5)'5:'">-'+A'0#"-0'#)'1.2'">("'$(#)"(#)-*'">-#9'567-4"'9()D'59*-9'89-:-9-)4-'("'-(4>'567-4"'0#X-@''R99596(90'0>5;'!RF0@''1N2'A5'0%$$(9#X-'">-'(Q-9(,-'-::-4"'5:'567-4"'0#X-'4>(),-0'5)'+A'567-4"'0-&-4"#Q#"/'(49500'(&&':5%9'567-4"0'10;(8'()*'45)"95&'567-4"'8(#90'45$6#)-*2V';-'08&#"'">-'Pe'567-4"'0-&-4"#Q-'+A'0#"-0'#)"5'">9--',95%80'6(0-*'5)'">-#9'0#X-'89-:-9-)4-@''B9-:-99-*'0#X-':59'()'+A'0#"-';(0'*-:#)-*'(0'">-'0#X-'("';>#4>'()/'567-4"'-Q5D-*'">-'$(E#$%$'9-085)0-':95$'">-'0#"-@''3-'">-)'9()D-*'">-'567-4"'89-:-9-)4-'6(0-*'5)'">-'9-085)0-'("'">-'89-:-99-*'0#X-'1:95$'6-0"'"5';590"2@''A>-'(604#00('9-89-0-)"0'">-')59$(&#X-*'9-085)0-'"5'">-'6-0"'567-4"'("'-(4>'8(9"#4%&(9'0#X-@'A>-'59*#)("-'9-89-0-)"0'">-')59$(&#X-*'9-085)0-'"5'">-'6-0"'567-4"'("'">-'89-:-99-*'0#X-@''R(4>'*("('85#)"'0>5;0'">-'$-()'c'!RF@''U)'(Q-9(,-V'+A'0#"-0'$(#)"(#)-*'">-#9'567-4"'9()D'59*-9'89-:-9-)4-@''3-':5%)*'$59-'0#"-0'89-:-99#),'">-'-E"9-$#"/'567-4"'0#X-0'1G@[g'()*'Og2'">()'">-'$-*#%$'567-4"'0#X-'1<@[g2V';#">'$59-'0#"-0'89-:-99#),'">-'6#,'567-4"'0#X-'1Og2@''

,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Page 20: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! '

#$%%&'(')*+&,-./$0',#4,,,

!!

(

Page 21: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! (

C2D#$"(,IF('+A'h-0%&"0';#">'.&&'U67-4"'!-&-4"#Q-'!#"-0'6-:59-'()*'(:"-9'!"(6#&#"/'!49--)@''''1.2'3-'*-8&5/-*'">-'D-/'-E8-9#-)4-'$()#8%&("#5)';#">'('8(#9'5:'0;(8'567-4"0'1B'()*'C2'#)'">-'!"#$%&'()*+,%(@''3-'(&05'$-(0%9-*'">-'+A'9-085)0-'"5'('0-45)*'8(#9'5:'45)"95&'567-4"0'1BJ'()*'CJ2'(&5),';#">'">-'0;(8'567-4"0'#)'">-'-(%.)*+,%('10--'!%88&-$-)"(&'RE8-9#$-)"(&'B954-*%9-02@''3-';-9-'#)"-9-0"-*'#)'08-4#:#4'0-&-4"#Q#"/'4>(),-'#)'+A'#)*%4-*'6/'5%9'-E8-9#-)4-'$()#8%&("#5)@'']5;-Q-9V'">-9-';-9-'85"-)"#(&')5)M08-4#:#4'4>(),-0'#)'0-&-4"#Q#"/'1-@,@':95$'-&-4"95*-'*9#:"0'#)'"#00%-'59'"#00%-'*-(">2'">("'45%&*'45)"($#)("-'5%9'-::-4"'5:'#)"-9-0"@''i)&#D-'"9(*#"#5)(&'0#),&-M%)#"'9-459*#),';>-9-'5)-'45%&*'7%*,-'">-'0"(6#&#"/'5:'&5),M"-9$'9-459*#),'6(0-*'5)'08#D-';(Q-:59$V';-'*#*')5"'>(Q-'0%4>'$-(0%9-'#)'$%&"#M%)#"'9-459*#),@''A>%0';-'05%,>"'()5">-9'#)*-8-)*-)"'$-(0%9-'5:'&5),M"-9$'9-459*#),'0"(6#&#"/'1IMe'>5%902@''A5'*5'">#0V';-'9-&#-*'5)'+A'0-&-4"#Q#"/'($5),'">-'#$(,-0'5:'">-'45)"95&'567-4"0'1Bj'()*'Cj2@''3-'8#4D-*'">-0-'45)"95&'567-4"0'"5'6-'0%::#4#-)"&/'*#::-9-)"':95$'">-'0;(8'567-4"0'#)'">-#9'8#E-&M;#0-'0#$#&(9#"/'1T#,%9-'!G2@''U%9'()(&/0-0'18()-&'K'&-:"'45&%$)2'()*'5%9'89-Q#5%0'#)Q-0"#,("#5)'1^#'()*'W#N(9&5V'IYY=2'>(Q-'9-Q-(&-*'">("'()/'-E8-9#-)4-M#)*%4-*'4>(),-'#)'0-&-4"#Q#"/';(0'08-4#:#4'"5'">-'0;(8'567-4"0@''^-Q-9(,#),'">#0V';-'$(*-'">-'(00%$8"#5)'">("'">-'45)"95&'567-4"0';-9-':(9'(8(9"':95$'">-'0;(8'567-4"0'#)'+A'0>(8-'08(4-V'">%0'">-/'0>5%&*'6-'&#""&-'(::-4"-*'6/'5%9'-E8-9#-)4-'$()#8%&("#5)@''T59'-(4>'+A'0#"-V';-'45$8%"-*'B-(905)J0'4599-&("#5)'6-";--)'#"0'9-085)0-'Q-4"590'"5'">-0-'45)"95&'567-4"'#$(,-0'1P'*#$-)0#5)(&'Q-4"59V'I'567-4"0'E'e'0#X-02'$-(0%9-*':95$'">-':#90"'()*'&(0"'-(%.)*+,%('19#,>"'8()-&H'$-()'c'!RF?'*("(':95$'RE8-9#$-)"'+'5)&/2@''.':9(4"#5)'5:'">-'0#"-0'0>5;-*'&5;'4599-&("#5)0V'$-()#),'">-#9'9-085)0-0'"5'">-'45)"95&'567-4"'#$(,-0'>(*'*-Q#("-*':95$'">50-'$-(0%9-*'#)'">-':#90"'-(%.)*+,%(@''C5"-'">("'('0#"-'45%&*'(&05'>(Q-'&5;'4599-&("#5)':95$'>(Q#),')5'"%)#),'($5),'">-'45)"95&'567-4"'#$(,-0'"5'6-,#)';#">V'#)'">50-'4(0-0V';-'>(*')5'85;-9'"5'7%*,-'9-459*#),'0"(6#&#"/@''+)'89(4"#4-V';-'*--$-*'('0#"-'0"(6&-'#:'#"'>(*'('4599-&("#5)'Q(&%-'>#,>-9'">()'Y@k@'''1K2'.&&'">-'$(#)'"-E"'9-0%&"0'45)4-)"9("-*'5)'">-'0"(6&-'+A'0#"-0@'']-9-V';-'89-0-)"'">-'$(#)'+A'9-0%&"0':95$'(&&'567-4"'0-&-4"#Q-'0#"-0@''^-:"'45&%$)'8()-&0'0>5;'$-()'c'!RF'0-&-4"#Q#"/'4>(),-V'"01BMC2V'5:'">-'0;(8'567-4"0'19-*V'0;(8'0#X-?'6&%-V')5)M0;(8'0#X-2'()*'45)"95&'567-4"0'16&(4DV'0($-'0#X-'(0'">-'0;(8'567-4"02@''3-':5%)*'">-'4>(),-'#)'+A'0-&-4"#Q#"/';(0'08-4#:#4'"5'">-'0;(8'567-4"0'("'">-'0;(8'0#X-@''!"("#0"#4(&&/V'567-4"'08-4#:#4#"/'5:'">-'0-&-4"#Q#"/'4>(),-'("'">-'0;(8'0#X-';(0'45):#9$-*'6/'('0#,)#:#4()"'l567-4"'E'-E850%9-l'#)"-9(4"#5)'18dY@YYOV'9-8-("-*'$-(0%9-0'.CUS.2@''C-E"V';-'(88&#-*'">-'0"(6#&#"/'049--)'5%"&#)-*'#)'1.2'%0#),'">-'+A'9-085)0-0'"5'">-'45)"95&'567-4"'#$(,-0'1)5"'%0-*':59'">-'$(#)'()(&/0-02V';-'">-)'&55D-*'"5'">-'4>(),-'#)'0-&-4"#Q#"/V'"01BMC2V'($5),'">-'0;(8'567-4"0'("'">-'0;(8'()*')5)M0;(8'0#X-@''A>-'0"(6#&#"/'049--)'9-Q-(&-*')5)M08-4#:#4'4>(),-0'#)'0-&-4"#Q#"/'5:'">-')5)M0"(6&-'+A'0#"-0'1$#**&-'45&%$)'8()-&02@''.$5),'">-'0#"-0';-'*--$-*'0"(6&-'19#,>"'45&%$)'8()-&02V'5%9'-E8-9#-)4-'$()#8%&("#5)'#)*%4-*'Q-9/'08-4#:#4'4>(),-'#)'0-&-4"#Q#"/'5)&/'("'">-'0;(8'0#X-@''"01BMC2';(0')59$(&#X-*'"5'0>5;'+A'0-&-4"#Q#"/'4>(),-'8-9'=YY'-E850%9-'-Q-)"0@''m'8nY@Y['6/'"M"-0"?'mm'8nY@YG?')@0@'8oY@Y[''1N2'3-'(&05'"-0"-*'$59-'0"9#4"':59$0'5:'0"(6#&#"/'49#"-9#('">("'#)4&%*-*'6(0-&#)-'9-085)0-'14>(),-'nGYV'n[V'()*'nI'08#D-0Z0V'6-:
59-'Q0@'(:"-9'-E850%9-2'#)'(**#"#5)'"5'">-'0"()*(9*'0"(6#&#"/'049--)@''A>-'8&5"'0>5;0'"01BMC2'("'">-'0;(8'19-*2'()*')5)M0;(8'0#X-'16&%-2@''W("(':95$'RE8-9#$-)"'+'()*'++'(9-'45$6#)-*'1&-:"'"5'9#,>"H')dPe?')d<I?')dG=?')dGG?')d[2@''m'8nY@Y[?'mm'8nY@YGV'"M"-0"V'0;(8'Q0@')5)M0;(8'0#X-@''1W2'F-()'c'!RF'#)#"#(&'0-&-4"#Q#"/V'1BMC2V'$-(0%9-*':95$'">-':#90"'-(%.)*+,%(@'''

'!

!

!

!

Page 22: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! )

!

#$%%&'(')*+&,-./$0',#5!!

!

Page 23: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! *

C2D#$"(,J@''+A'h-085)0-'N>(),-0'+)*%4-*'6/'S#0%(&'RE8-9#-)4-@'''1.2'Mean ± SEM IT selectivity time course at the swap (left) and non-swap size (right) measured in the first (light colored) and last Test Phase (dark colored). Data from Experiment I and II are combined (n=42 IT sites). Gray region shows the standard spike count time window we used for all other analyses in the main text. '1K2'+A':#9#),'9("-';(0')5"'(&"-9-*'6/'Q#0%(&'-E8-9#-)4-@''T59'-(4>'+A'0#"-V';-'45$8%"-*'#"0'$-()'-Q5D-*':#9#),'9("-'"5'(&&'567-4"'#$(,-0':95$'">-':#90"'()*'&(0"'-(%.)*+,%(@''.&&'567-4"'0-&-4"#Q-'0#"-0';-9-'45$6#)-*':95$'RE8-9#$-)"'+'()*'++'1)dPe2@''3-'560-9Q-*')5')-"'4>(),-'#)'">-'$-()'-Q5D-*':#9#),'9("-'6-:59-'()*'(:"-9'5%9'-E8-9#-)4-'$()#8%&("#5)'1&-:"'8()-&?'8dY@I<V'";5'"(#&-*'"M"-0"V'6-:59-'Q-90%0'(:"-92@''3-'(&05'560-9Q-*')5')-"'4>(),-'#)'+A'6(4D,95%)*':#9#),'9("-'19#,>"'8()-&?'8dY@GkV'";5'"(#&-*'"M"-0"2@''K(4D,95%)*':#9#),';(0'$-(0%9-*':95$'9()*5$&/'#)"-9&-(Q-*'6&()D'0"#$%&%0'89-0-)"("#5)0'*%9#),'">-'-(%.)*+,%(%@''.':-;'0#"-0'0>5;-*'&(9,-'4>(),-'#)'">-#9'6(4D,95%)*':#9#),'9("-'-Q-)'">5%,>'">-/';-9-'4&(00#:#-*'(0'a0"(6&-'0#"-0b'6/'">-#9'0-&-4"#Q#"/':59'">-'45)"95&'567-4"'#$(,-0'1T#,%9-'!<2@''3-'">%0'"-0"-*'$59-'0"9#4"':59$0'5:'0"(6#&#"/'49#"-9#('">("'#)4&%*-*'6(4D,95%)*':#9#),'9("-';#">'D-/'9-0%&"0'%)4>(),-*'1T#,%9-'!<2@''1N2'3-':#"'0"()*(9*'&#)-(9'9-,9-00#5)'"5'-(4>'+A'0#"-J0'9-085)0-0'"5'567-4"'B'()*'C'("'-(4>'567-4"'0#X-'(0'(':%)4"#5)'5:'">-')%$6-9'5:'-E850%9-'-Q-)"0@'A>-'0&58-'5:'">-'&#)-':#"0'1"02'895Q#*-*'('$-(0%9-'5:'">-'9-085)0-'4>(),-0'"5'B'()*'C':59'-(4>'+A'0#"-@''A>-'>#0"5,9($0'0>5;'">-'0&58-'Q(&%-0'5:'(&&'">-'0"(6&-'0#"-0':95$'RE8-9#$-)"'+'()*'++'1)d<I2@''"0B'()*'"0C';-9-')59$(&#X-*'"5'0>5;'9-085)0-'4>(),-0'8-9'=YY'-E850%9-'-Q-)"0@'''1W2'F-()'c'!RF')59$(&#X-*'9-085)0-0'"5'567-4"'B'()*'C'(0'(':%)4"#5)'5:'">-')%$6-9'5:'-E850%9-'-Q-)"0@''T59'-(4>'+A'0#"-V'9-085)0-'5:'-(4>'-(%.)*+,%(';(0')59$(&#X-*'"5'">-'$-()'9-085)0-'"5'(&&'567-4"'#$(,-0'#)'">("'-(%.)*+,%(@''''''

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

Page 24: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! "+

!

#$%%&'(')*+&,-./$0',#!,!

!!

!C2D#$"(,KF''+A'!#),&-Mi)#"'9-0%&"'#0'h56%0"'"5'i)#"'!-&-4"#5)'N9#"-9#(@''1.2'3-'8-9:59$-*'BN.M6(0-*'08#D-'059"#),'5)'">-';(Q-:59$0'45&&-4"-*'*%9#),'-(4>'-(%.)*+,%(V'"9-("#),'-(4>'%)#"'(0'()'#)*-8-)*-)"'0($8&-':95$'">-'+A'858%&("#5)'-#">-9'6-:59-'59'(:"-9'">-'(&"-9-*'Q#0%(&'-E8-9#-)4-@''R(4>'%)#"'56"(#)-*':95$'">-'08#D-'059"#),';(0':%9">-9'-Q(&%("-*'6/'#"0'0#,)(&M"5M)5#0-'9("#5'1!ChH'9("#5'5:'8-(DM"5M8-(D'$-()';(Q-:59$'($8&#"%*-'"5'0"()*(9*'*-Q#("#5)'5:'">-')5#0-2@''A>-'>#0"5,9($'0>5;0'">-'*#0"9#6%"#5)'5:'!Ch':59'(&&'">-'%)#"0'56"(#)-*@''T59'(&&'">-'0#),&-M%)#"'()(&/0-0'#)'">-'$(#)'"-E"'1T#,%9-'<2V';-'0-"'('!Ch'">9-0>5&*'1*(0>M&#)-H'!Chd[@Y2'(65Q-';>#4>';-';#&&'"-9$'('%)#"'a0#),&-M%)#"b@''''1K2'A5'(0D'#:'">-'9-0%&"';(0'956%0"'"5'5%9'4>5#4-'5:'">-'0#),&-M%)#"'!Ch'">9-0>5&*V';-'0/0"-$("#4(&&/'Q(9#-*'">-'">9-0>5&*'()*'9-M8-9:59$-*'">-'0($-'()(&/0-0@''A>-'8&5"'0>5;0'">-'-E8-9#-)4-M#)*%4-*'4>(),-'#)'0#X-'"5&-9()4-'1"'0#X-'"5&-9()4-V'0($-'(0'#)'T#,%9-'<W2'("'">-'0;(8'19-*2'()*')5)M0;(8'16&%-2'0#X-@''3-':5%)*'">("'">-'9-0%&"';(0'>#,>&/'956%0"'"5'">-'0#),&-M%)#"'0-&-4"#5)'49#"-9#(V'()*'">-'-E8-9#-)4-'#)*%4-*'-::-4"'("'">-'0;(8'0#X-'5)&/',9-;'0"95),-9';>-)';-'#)49-(0-*'">-'0"9#4")-00'5:'">-'0#),&-M%)#"0'49#"-9#(@''mm'8nY@YYGV'655"0"9(8?'(995;'>-(*'0>5;0'">-'0#),&-M%)#"'">9-0>5&*'%0-*'#)'T#,%9-'<@''

Page 25: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! ""

1N2'F-()'c'!RF'0#X-'"5&-9()4-':59'">-'0;(8'()*'45)"95&'567-4"0'$-(0%9-*'#)'">-'0($-'858%&("#5)'5:')-%95)0@''!($-'(0'T#,%9-'<K@'

'#$%%&'(')*+&,-./$0',#",!

!!

C2D#$"(,LF((R/-'F5Q-$-)"'B(""-9)'*%9#),'RE850%9-'RQ-)"0@'''

Page 26: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! "#

1.2'U%9'89-Q#5%0'0"%*/'5)'+A'850#"#5)'"5&-9()4-'&-(9)#),'1^#'()*'W#N(9&5V'IYY=2'0>5;-*'">("'%)0%8-9Q#0-*'-E8-9#-)4-'5:'"-$859(&&/'45)"#,%5%0'#$(,-0'45%8&-*'6/'()'#)"-9Q-)#),'0(44(*-'4()'9-0>(8-'+A'850#"#5)'"5&-9()4-@'']-9-V';-'0>5;-*'">("'%)0%8-9Q#0-*'-E8-9#-)4-'5:'"-$859(&&/'45)"#,%5%0'#$(,-0'89-0-)"-*'5)'()#$(&0j'4-)"-9'5:',(X-'#0'0%::#4#-)"'"5'#)*%4-'+A'0#X-'"5&-9()4-'&-(9)#),@''A>-'()#$(&0':9--&/'Q#-;-*'(',9(/'45$8%"-9'049--)'5)';>#4>'567-4"0'#)"-9$#""-)"&/'(88-(9-*'("'9()*5$'850#"#5)@''3-'*-8&5/-*'">-'-E8-9#-)4-'$()#8%&("#5)'1#@-@'#$(,-'8(#9#),'(49500'"#$-2'*%9#),'69#-:'8-9#5*0'5:'">-'()#$(&0j':#E("#5)@''A>-'-E850%9-'-Q-)"0';-9-'$-()"'"5'$#$#4'9-,#$-0'5:')("%9(&'Q#0#5)';>-9-'567-4"'4>(),-'0#X-'5)'">-'9-"#)(&'*%-'"5'567-4"'$5"#5)'1*-M45%8&-*':95$'#)"-9Q-)#),'-/-'$5Q-$-)"02@'']5;-Q-9V'#"';(0'8500#6&-'">("'">-'*#045)"#)%5%0'#$(,-'4>(),-0';-'-$8&5/-*'(&;(/0'#)*%4-*'0$(&&'0(44(*-0':95$'">-'()#$(&0'*%9#),'">-'-E850%9-'-Q-)"0V'>-)4-'">-'560-9Q-*'+A'0#X-'"5&-9()4-'&-(9)#),'#0'0#$8&/'">-'0($-'8#-4-'5:'8>-)5$-)5&5,/'(0'">-'+A'850#"#5)'"5&-9()4-'&-(9)#),'9-859"-*'6-:59-@'']-9-';-'-E($#)-'">#0'8500#6#&#"/'6/'()(&/X#),'">-'!"#$%&'()*+,%('-/-'$5Q-$-)"'*("('(95%)*'">-'"#$-'5:'#$(,-'4>(),-'1cGYY$02@''A>-'8&5"0'0>5;'">-'0"#$%&%0'89-0-)"("#5)'"#$-'0-L%-)4-'1"582'()*'(&#,)-*'-/-'850#"#5)'*("('165""5$2'*%9#),'(':-;'-E850%9-'-Q-)"0':95$'5)-'-E($8&-'!"#$%&'()*+,%(@''A>-'()#$(&0';-9-'(6&-'"5'$(#)"(#)'">-#9',(X-'850#"#5)'">95%,>5%"'">-'8-9#5*0'5:'#$(,-'4>(),-'#)'$50"'4(0-0V'">5%,>'">-9-';-9-'$#)59'*9#:"0'1"/8#4(&&/'nG\2@''U44(0#5)(&&/V'">-'()#$(&0'$(*-'0$(&&'0(44(*-0'19-*'-/-'"9(4-02V'>5;-Q-9V'">-0-'5)&/'45)0"#"%"-*'('0$(&&':9(4"#5)'5:'(&&'-E850%9-'-Q-)"0V'0--'1N2@'''1K2'.&&'">-'-/-'$5Q-$-)"'*("(':95$'">-'-E($8&-'!"#$%&'()*+,%(';(0'8&5""-*'#)'">-#9'9-&("#5)0>#8'6-";--)'">-'"5"(&'-/-'*#08&(4-$-)"'()*'8-(D'Q-&54#"/'(95%)*'">-'"#$-'5:'#$(,-'4>(),-'1cGYY$02@''R(4>'*("('85#)"'9-89-0-)"0'*("(':95$'5)-'-E850%9-'-Q-)"'1#@-@'5)-'"9(4-'#)'1.22@''T59'0(44(*-0'19-*'*5"02V'">-9-';(0'('0/0"-$("#4'9-&("#5)0>#8'6-";--)'">-'8-(D'Q-&54#"/'()*'-/-'*#08&(4-$-)"'1#@-@'$(#)'0-L%-)4-2V';>#4>'*#0"#),%#0>-*'#"0-&:':95$'">-'8(""-9)'5:':#E("#5)'-/-'$5Q-$-)"'16&(4D'*5"02@''A>-9-';(0'(&;(/0',55*'0-8(9("#5)'6-";--)'">-'";5'"/8-0'5:'-/-'$5Q-$-)"'8(""-9)V'">%0';-'%0-*'('8-(D'Q-&54#"/'">9-0>5&*'"5'*-:#)-'0(44(*-0'1F5)D-/'GH'`PY\Z0?'F5)D-/'IH'`<Y\Z02@''1N2']#0"5,9($0'5:'-/-'$5Q-$-)"'8-(D'Q-&54#"/'*%9#),'(&&'-E850%9-'-Q-)"0'1RE8-9#$-)"'+'858%&("#5)'*("(H'(&&'!"#$%&'()*+,%(%'(49500'(&&'9-459*#),'0-00#5)0';-9-'45$6#)-*2@''RE850%9-'-Q-)"0'">("'45)"(#)-*'0(44(*-0'(9-'0>5;)'#)'9-*'6#)0'()*'-E850%9-'-Q-)"0';#">5%"'0(44(*-0'(9-'#)'6&(4D'6#)0@''A>-'()#$(&0'$(*-'0(44(*-0'5)&/'5)'('0$(&&':9(4"#5)'5:'(&&'-E850%9-'-Q-)"0'1F5)D-/'I';(0'0&#,>"&/';590-2@''p#Q-)'">-'0$(&&'544%99-)4-'5:'0(44(*-0'#)'45$8(9#05)'"5'5%9'89-Q#5%0'0"%*/'5)'850#"#5)'"5&-9()4-';>-9-'0(44(*-0'(445$8()#-*'-Q-9/'-E850%9-'-Q-)"'1^#'()*'W#N(9&5V'IYY=2V';-'45)4&%*-*'">("'">-'8500#6#&#"/'5:'#)"-9Q-)#),'0(44(*-0'4())5"'(445%)"':59'">-'560-9Q-*'+A'0-&-4"#Q#"/'4>(),-@!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

Page 27: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! "$

#$%%&'(')*+&,-./$0',#6!!

!

!!

!C2D#$"(,MF((R::-4"'!#X-'N5$8(9#05)0'(49500'W#::-9-)"'RE8-9#-)4-'F()#8%&("#5)0'(0'('T%)4"#5)'5:'RE850%9-'A#$-@'''F-()'4>(),-'#)'+A'567-4"'0-&-4"#Q#"/V'"1BMC2V'(0'(':%)4"#5)'5:'0;(8'-E850%9-'"#$-':59'*#::-9-)"'-E8-9#-)4-'$()#8%&("#5)0'1#@-@'RE8-9#$-)"0'+V'++V'+++?'850#"#5)'-E8-9#$-)"0H'^#'()*'W#N(9&5V'IYY=2@''RE850%9-'"#$-';(0'*-"-9$#)-*'6(0-*'5)'">-'"#$-'-(%.)*+,%('*("(':#&-0';-9-'0(Q-*@''T59'-(4>'*("('85#)"0V';-'45$8%"-*'">-'(Q-9(,-'-E850%9-'"#$-'(49500'(&&'">-')-%95)0Z0#"-0'1,95%8-*'6/'">-#9'-(%.)*+,%(')%$6-902@''B&5"':59$("'#0'">-'0($-'(0'$(#)'"-E"'T#,%9-'[@''F-()'c'!RF'0-&-4"#Q#"/'4>(),-'("'">-')5)M0;(8'0#X-'159'850#"#5)2'#0'0>5;)'#)'6&%-'1855&-*'(49500'(&&'-E8-9#$-)"02@''!i.H'0#),&-M%)#"'(4"#Q#"/?'Fi.H'$%&"#M%)#"'(4"#Q#"/@''''

'!

!

!

!

!

!

!

!

!

!

!

!

!

!

Page 28: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! "%

#$%%&'(')*+&,-./$0',#7,!

!!

(C2D#$"(,NF((K9-(D#),'()*'K%#&*#),'A5&-9()"'0-&-4"#Q#"/V'!#X-'RE8-9#$-)"'W("(@''F-()'c'!RF')59$(&#X-*'9-085)0-'"5'567-4"'B'()*'C'("'">-'0;(8'0#X-'1.2'()*')5)M0;(8'0#X-'1K2'($5),'0%6M858%&("#5)0'5:'+A'$%&"#M%)#"'0#"-0@''U">-9'*-"(#&0'0($-'(0'T#,%9-'P'()*'k@''!#X-'-E8-9#$-)"'*("('5)&/'1RE8-9#$-)"'+'()*'++2@''

!

!

!

!

!

!

!

!

!

!

Page 29: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! "&

,#--*"+"&./*(78-"$2+"&./*(O$%9"5#$"4(((624#/*(,.2+#*2('''!"#$%&#';-9-'89-0-)"-*'5)'('IGb'NhA'$5)#"59'1=[']X'9-:9-0>'9("-V'`<='4$'(;(/V'6(4D,95%)*',9(/'&%$#)()4-H'II'N*Z$IV'$(E';>#"-H'<P'N*Z$I2@''3-'%0-*'OP'(4>95$("#4'#$(,-0':95$'";5'4&(00-0'5:'Q#0%(&'0"#$%&#H'<='4%"5%"')("%9(&'567-4"0'()*'<='0#&>5%-""-'0>(8-0V'65">'89-0-)"-*'5)',9(/'6(4D,95%)*'1T#,%9-'!G2@''3-'4>50-'">-0-'";5'4&(00-0'5:'0"#$%&#'"5'6-'0%::#4#-)"&/'*#::-9-)"':95$'-(4>'5">-9'#)'">-#9'8#E-&M;#0-'0#$#&(9#"/'1T#,%9-'!G2V'05'">("')-%95)(&'8&(0"#4#"/'#)*%4-*'($5),'5)-'567-4"'4&(00';5%&*'6-'%)&#D-&/'a08#&&M5Q-9b'"5'">-'5">-9'4&(00'15%9'9-0%&"0'()*'89-Q#5%0';59D'45):#9$-*'">#0'(00%$8"#5)V'0--'T#,%9-'!<'()*'^#'()*'W#N(9&5V'IYY=2@''.&&'0"#$%&#';-9-'89-0-)"-*'5)'">-'()#$(&j0'4-)"-9'5:',(X-'*%9#),'+A'0-&-4"#Q#"/'"-0"#),@''+)'(&&'-E8-9#$-)"0V';-'(&;(/0'%0-*'">9--'567-4"'0#X-0'1G@[\V'<@[\V'O\V'#)'RE8-9#$-)"'+'()*'++?'G@[\V'e\V'<@[\'#)'RE8-9#$-)"'+++2@''U67-4"'0#X-';(0'*-:#)-*'(0'">-';#*">'5:'">-'0$(&&-0"'65%)*#),'0L%(9-'"5'45)"(#)'">-'567-4"@''A>-'$-*#%$'567-4"'0#X-0';-9-'%0-*'"5'8#4D'89-:-99-*'1B2'()*')5)M89-:-99-*'1C2'567-4"0':59'()'+A'0#"-'#)'()'#)#"#(&'049--)#),'10--'/(&'$0,1)2%%,3%'6-&5;2V'6%"';-'*-0#,)-*'5%9'$()#8%&("#5)0'()*'()(&/0-0'"5':54%0'5)'">-'";5'-E"9-$#"/'0#X-0'1T#,%9-0'GKV'GNV'=.2@'''''+)'RE8-9#$-)"'++V'"5'49-("-'">-'0$55">&/MQ(9/#),'#*-)"#"/M4>(),#),'$5Q#-'0"#$%&#V';-'49-("-*'$598>'&#)-0'6-";--)'('0%60-"'5:'">-'0#&>5%-""-'0>(8-0@''!-Q-)'#)"-9$-*#("-'$598>0';-9-'49-("-*'#)M6-";--)'-(4>'567-4"'8(#90@''A>-'$5Q#-'0"#$%&#';-9-'49-("-*'"5'$("4>'">-'*/)($#40'5:'567-4"'0#X-'4>(),-0'">("'45%&*'6-'-)45%)"-9-*'#)'">-')("%9(&';59&*'10--'T#,%9-'!I2@''''P"</)2%$/*(Q44/;((N%0"5$'05:";(9-'45)"95&&-*'">-'0"#$%&%0'89-0-)"("#5)'()*'6->(Q#59(&'$5)#"59#),@''R/-'850#"#5)';(0'$5)#"59-*'#)')-(9&/'9-(&M"#$-'1&(,'5:'`e'$02'%0#),'0"()*(9*'04&-9('45#&'"-4>)#L%-'1h56#)05)V'GOPe2'()*'#)M>5%0-'05:";(9-V'()*'0(44(*-0'oY@Ig';-9-'9-&#(6&/'*-"-4"-*'1W#N(9&5'()*'F(%)0-&&V'IYYY2@'''''(8'9*,:;+9'<''W%9#),'-(4>'-(%.)*+,%('1`GY'$#)%"-02V'+A')-%95)(&'0-&-4"#Q#"/';(0'8956-*'#)'";5'*#::-9-)"'"(0D0@''F5)D-/'G':9--&/'0-(94>-*'()'(99(/'5:'-#,>"'0$(&&'*5"0'10#X-'Y@I\2'Q-9"#4(&&/'(99(),-*'e\'(8(9"@''A>-'*5"0')-Q-9'4>(),-*'#)'(88-(9()4-V'6%"'5)'-(4>'a"9#(&bV'5)-'*5"';5%&*'6-'9()*5$&/'6(#"-*'#)'">("'('7%#4-'9-;(9*';(0',#Q-)';>-)'">-'()#$(&':5Q-("-*'">("'*5"V'()*'">-')-E"'a"9#(&b'45)"#)%-*'%)#)"-99%8"-*@''A/8#4(&&/V'">-'$5)D-/'0(44(*-*':95$'5)-'*5"'"5'()5">-9'1)5"'(&;(/0'">-'4&50-0"'*5"2'&55D#),':59'">-'>#**-)'9-;(9*@''W%9#),'">#0'"(0DV'567-4"'#$(,-0';-9-'89-0-)"-*'1GYY'$0'*%9("#5)2'5)'">-'()#$(&J0'4-)"-9'5:',(X-V'15)0-"'"#$-';(0'">-'*-"-4"-*'-)*'5:'('0(44(*-?'(8895E#$("-&/'5)-'0%4>'89-0-)"("#5)'-Q-9/'5">-9'0(44(*-V')-Q-9'6(4DM"5M6(4D'0(44(*-02@''A>%0V'">-'$5)D-/J0'"(0D';(0'%)9-&("-*'"5'">-0-'"-0"'0"#$%&#@''A5'&#$#"'%);()"-*'-E8-9#-)4-';#">'">-'Q#0%(&'0"#$%&#V'-(4>'0%4>'89-0-)"-*'567-4"';(0'#$$-*#("-&/'9-$5Q-*'%85)'*-"-4"#5)'5:'()/'0(44(*-'()*'">-0-'(659"-*'89-0-)"("#5)0';-9-')5"'#)4&%*-*'#)'">-'5::&#)-'()(&/0-0@''F5)D-/'I'8-9:59$-*'('$59-'0"()*(9*':#E("#5)'"(0D'#)';>#4>'#"':5Q-("-*'('0#),&-V'4-)"9(&'*5"'10#X-'Y@I\V'cG@[\':#E("#5)';#)*5;2';>#&-'567-4"'#$(,-0';-9-'89-0-)"-*'("'('

Page 30: Unsupervised Natural Visual Experience Rapidly Reshapes ...Neuron Article Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal

! "'

)("%9(&V'9(8#*'9("-'1['#$(,-0Z0?'GYY'$0'*%9("#5)V'GYY'$0'6&()D'#)"-9Q(&02@''h-;(9*';(0',#Q-)'("'">-'-)*'5:'">-'"9#(&'1[M='#$(,-0'89-0-)"-*'8-9'"9#(&2@''i85)'()/'69-(D'#)':#E("#5)V'()/'4%99-)"&/'89-0-)"'567-4"'#$(,-';(0'#$$-*#("-&/'9-$5Q-*'1()*')5"'#)4&%*-*'#)'">-'()(&/0-02V'()*'">-'"9#(&'(659"-*@''A>-'()#$(&'45%&*'"/8#4(&&/'$(#)"(#)':#E("#5)'0%44-00:%&&/'#)'ok[f'5:'">-'"9#(&0@''.0#*-':95$'">-'"(0D'*#::-9-)4-0'1:9--MQ#-;#),'0-(94>'Q0@':#E("#5)2V'9-"#)(&'0"#$%&("#5)'#)'">-'";5'"(0D0';(0'-00-)"#(&&/'#*-)"#4(&@''`PY'1cI2'9-8-"#"#5)0'5:'-(4>'#$(,-';-9-'45&&-4"-*'#)'">-':#90"'-(%.)*+,%('()*'`[Y'1cI2'9-8-"#"#5)0'#)'(&&'">-'&("-9'-(%.)*+,%(%@'''''=>%?9$0',:;+9'<''W%9#),'-(4>'!"#$%&'()*+,%('1`G@['>92V'">-'()#$(&':9--&/'Q#-;-*'">-'$5)#"59';>#&-'567-4"'#$(,-0'180-%*5M9()*5$&/'4>50-)2'#)"-9$#""-)"&/'(88-(9-*'("'9()*5$'850#"#5)0'5)'">-'049--)@''K-4(%0-':5Q-("#),'('0%**-)&/'(88-(9#),'567-4"'#0'(')("%9(&V'(%"5$("#4'6->(Q#59V'-00-)"#(&&/')5'"9(#)#),';(0'9-L%#9-*V'()*'">-'$5)D-/'(&$50"'(&;(/0'&55D-*'*#9-4"&/'"5'">-'567-4"'1oOYf'5:'">-'"#$-2@''GYY'$0'(:"-9'">-'()#$(&'>(*':5Q-("-*'">-'567-4"'1*-:#)-*'6/'('0(44(*-'5::0-"'49#"-9#('5:'-/-'Q-&54#"/nGY\Z0'()*'('cG@[\';#)*5;'4-)"-9-*'5)'">-'567-4"2V'">-'567-4"'%)*-9;-)"'('0#X-'4>(),-'5)'">-'()#$(&J0'4-)"-9'5:',(X-@''+$859"()"&/V'05$-'5:'">-'567-4"'0#X-'4>(),-0';-9-'(445$8()#-*'6/'#*-)"#"/'4>(),-0'1#@-@'5%9'D-/'$()#8%&("#5)V'0--'*-"(#&0'5:'08-4#:#4'-E8-9#$-)"0)#)'">-'$(#)'"-E"'RE8-9#$-)"(&'B954-*%9-02@''A>-':9--'Q#-;#),';(0'$-()"'"5'D--8'">-'$5)D-/'-),(,-*'#)')("%9(&'Q#0%(&'-E8&59("#5)V'6%"'">-'$()#8%&("#5)'5:'567-4"'0#X-'0"("#0"#40';(0'(&;(/0'*-8&5/-*'*%9#),'">-'69#-:'#)"-9Q(&0'5:':#E("#5)'*%9#),')("%9(&'-E8&59("#5)'10--'-/-'$5Q-$-)"'()(&/0-0'#)'T#,%9-'!k2@''A>-'()#$(&';(0'5)&/'9-;(9*-*':59'&55D#),'"5'">-'567-4"'"5'-)45%9(,-'-E8&59("#5)V'">%0')5'-E8&#4#"'0%8-9Q#0#5)';(0'#)Q5&Q-*@''A>-9-';-9-'('"5"(&'5:'='*#::-9-)"'-E850%9-'-Q-)"'"/8-0'#)'">-':%&&'*-0#,)'1#&&%0"9("-*'6/'">-'-#,>"'(995;0'#)'T#,%9-'GK2@''U)-'!"#$%&'()*+,%('45)0#0"-*'5:'GPYY'-E850%9-'-Q-)"0H'IYY'-E850%9-'-Q-)"0'8-9'(995;'-E(4"&/@''(((!"#$%&/*(Q44/;((F%"#M%)#"'(4"#Q#"/'1Fi.2';(0',(">-9-*':95$'G[<'+A'0#"-0'1)d<<':59'RE8-9#$-)"'+?'GO':59'RE8-9#$-)"'++?'OG':59'RE8-9#$-)"'+++2'6/'9()*5$&/'0($8&#),'5Q-9'('`<E<'$$'(9-('5:'">-'Q-)"9(&'!A!'()*'Q-)"9(&'0%9:(4-'&("-9(&'"5'">-'.FA!'1]590-/MN&(9D'4559*#)("-0H'.B'GeMGk'$$?'F^'G=MII'$$'("'9-459*#),'*-8">2':95$'">-'&-:"'>-$#08>-9-0'5:'";5'$5)D-/0@''Fi.';(0'*-:#)-*'(0'(&&'">-'0#,)(&';(Q-:59$0'#)'">-'08#D#),'6()*'1eYY']X'q'k'D]X2'">("'49500-*'('">9-0>5&*'0-"'"5'`I'0@*@'5:'">-'6(4D,95%)*'(4"#Q#"/@''A>("'">9-0>5&*';(0'>-&*'45)0"()"':59'">-'-)"#9-'0-00#5)@''.'0)#88-"'5:';(Q-:59$'*("('0($8&-*'("'Y@Yk'$0'#)"-9Q(&0';(0'9-459*-*':59'='$0'(95%)*'-(4>'">9-0>5&*M"9#,,-9#),'-Q-)"'()*'0(Q-*':59'5::&#)-'08#D-'059"#),'10--'4,.,)20,13%(%'6-&5;2@''R(4>'*(/V'(',&(00'0>#-&*-*'8&("#)%$M#9#*#%$'$#495-&-4"95*-';#9-';(0'#)"95*%4-*'#)"5'">-'69(#)'56,'(',%#*-M"%6-'()*'(*Q()4-*'"5'">-'Q-)"9(&'0%9:(4-'5:'">-'"-$859(&'&56-'6/'('>/*9(%&#4'$#495*9#Q-'1,%#*-*'6/'()("5$#4(&'Fh+2@''3-'">-)'(*Q()4-*'">-'$#495-&-4"95*-';>#&-'">-'OP'567-4"'#$(,-0'1T#,%9-'!G2';-9-'80-%*5M9()*5$&/'89-0-)"-*'5)'">-'()#$(&0J'4-)"-9'5:',(X-'1()#$(&'"(0D0'#*-)"#4(&'"5'">50-'#)'">-'-(%.)*+,%(%2@''U)4-'('Q#0%(&&/'*9#Q-)'9-459*#),'0#"-';(0':5%)*'16(0-*'5)'5)&#)-'#)08-4"#5)2V';-'0"588-*'(*Q()4#),'()*'&-:"'">-'-&-4"95*-'#)'">-'69(#)'"5'(&&5;':59'"#00%-'0-""&#),'1%8'"5'I'>5%902'6-:59-'">-'9-459*#),'0-00#5)'0"(9"-*@''R(4>'9-459*#),'0-00#5)'6-,()';#">'()'#)#"#(&'049--)#),'#)';>#4>'">-'+A'0#"-0';-9-'8956-*';#">'">-'0($-'567-4"'0-"'1OP'
567-4"0V'`GY'9-8-"#"#5)0'8-9'567-4"V'(&&'89-0-)"-*'5)'">-'4-)"-9'5:',(X-2':59'567-4"'8(#9'


0-&-4"#5)H'',,,,@+.),8'9*,ABC'D*9,E#F+%,ABC'D*9G<''.$5),'">-'567-4"0'">("'*95Q-'">-'0#"-'0#,)#:#4()"&/'(65Q-'#"0'6(4D,95%)*'9-085)0-'1"M"-0"'(,(#)0"'9()*5$&/'#)"-9&-(Q-*'6&()D'89-0-)"("#5)V'8nY@Y[V')5"'4599-4"-*':59'$%&"#8&-'"-0"02V'">-'$50"'89-:-99-*'1B2'()*'&-(0"'89-:-99-*'1C2'567-4"0';-9-'4>50-)'(0'('8(#9@''A>%0V'65">'567-4"0'"-)*-*'"5'*9#Q-'">-')-%95)(&'9-459*#),'0#"-V'()*'$50"'0#"-0'>(*'0-&-4"#Q#"/':59'5)-'1B2'5Q-9'">-'5">-9'1C2@'A>-0-'";5'567-4"0';-9-'4>50-)'0%67-4"'"5'">-'45)*#"#5)'">("'65">'567-4"0';-9-':95$'">-')("%9(&'567-4"'4&(00'1F5)D-/'G2'59'65">';-9-':95$'">-'0#&>5%-""-'567-4"'4&(00'1F5)D-/'I?'0--'T#,%9-'!G2@'',,,,H?)*0?&,ABC'D*9<''T59'-(4>'9-459*-*'+A'0#"-V';-'(&05'%0-*'">-'0($-'#)#"#(&'049--)#),'1(65Q-2'"5'4>550-'('0-45)*'8(#9'5:'45)"95&'567-4"0'1Bj'()*'Cj2@''U%9',5(&';(0'"5'4>550-'";5'567-4"0'">-'+A'0#"-';(0'0-&-4"#Q-':59'6%"';-9-'Q-9/'*#0"()"':95$'">-'0;(8'567-4"0'#)'+A'0>(8-'08(4-@''K-4(%0-';-'*5')5"'D)5;'">-'*#$-)0#5)0'5:'+A'0>(8-'08(4-V';-'4())5"'0"9#4"&/'-):594-'">#0@''+)'89(4"#4-V';-'0#$8&/'-)0%9-*'">("'">-'45)"95&'567-4"0';-9-'(&;(/0'4>50-)':95$'">-'567-4"'4&(00'">("';(0')5"'%0-*':59'">-'0;(8'567-4"0'1#@-@'">-'0#&>5%-""-'567-4"'4&(00':59'F5)D-/'GV'()*'">-')("%9(&'567-4"'4&(00':59'F5)D-/'IV'0--'T#,%9-'!G2@''3#">#)'">#0'45)0"9(#)"V'">-'45)"95&'567-4"0';-9-'4>50-)'%0#),'">-'-E(4"'0($-'9-085)0#Q#"/'()*'0-&-4"#Q#"/'49#"-9#('(0'">-'0;(8'567-4"0'1*-049#6-*'(65Q-2@'''''U)4-'">-'#)#"#(&'049--)#),'()*'567-4"'0-&-4"#5)';(0'45$8&-"-*V';-'">-)'4(99#-*'5%"'">-'-(%.'()*'!"#$%&'()*+,%(%'#)'(&"-9)("#5)';>#&-'$(D#),'45)"#)%5%0'9-459*#),':95$'">-'+A'0#"-':59'">-'-)"#9-'9-459*#),'0-00#5)'1`e'>5%902@''A>-'0;(8'567-4"0'()*'45)"95&'567-4"0';-9-'-(4>'"-0"-*'("'(&&'">9--'0#X-0'#)'-(4>'-(%.)*+,%('6%"'5)&/'">-'0;(8'567-4"0';-9-'0>5;)'()*'$()#8%&("-*'*%9#),'">-'!"#$%&'()*+,%(%@'''''R/./(Q&/*;4"4(((C-%95)(&'*("('9-459*-*':95$'">-'G[<'+A'0#"-0';(0':#90"'"-0"-*':59'">-#9'567-4"'0-&-4"#Q#"/@''U::&#)-'()(&/0-0'9-Q-(&-*'">("'(':9(4"#5)'5:'">-'0#"-0';-9-')5"'0#,)#:#4()"&/'0-&-4"#Q-'($5),'">-'0;(8'567-4"'8(#90'1";5M;(/'.CUS.V'I'567-4"'E'e'0#X-0V'8oY@Y[':59'65">'a567-4"b'$(#)'-::-4"'()*'a567-4"'E'0#X-b'#)"-9(4"#5)2V'8956(6&/'6-4(%0-'5)&/'('&#$#"-*')%$6-9'5:'9-085)0-'9-8-"#"#5)0';-9-'45&&-4"-*'*%9#),'">-'#)#"#(&'049--)#),'()*';-'0-&-4"-*'">-'567-4"0'"5'65">'895*%4-'('0"("#0"#4(&&/'0#,)#:#4()"'9-085)0-'1(0'*-049#6-*'(65Q-2@''3-'-E4&%*-*'">50-'0#"-0'()*'5)&/'45)4-)"9("-*'5)'">-'9-$(#)#),'567-4"M0-&-4"#Q-'0#"-0'1)d<e':59'RE8-9#$-)"'+?'GO':59'RE8-9#$-)"'++?'eP':59'RE8-9#$-)"'+++V'$()/'0#"-0':95$'RE8-9#$-)"'+++'0>5;-*'0#,)#:#4()"'0-&-4"#Q#"/'5)&/':59'">-'0;(8'567-4"'8(#9'59'5)&/':59'">-'45)"95&'567-4"'8(#9V'6%"';-'45)4-)"9("-*'5)'">-'0#"-0'">("'0>5;-*'0#,)#:#4()"'0-&-4"#Q#"/':59'65">'">-'0;(8'()*'45)"95&'567-4"'8(#902@'A>-0-'0#"-0';-9-'0%67-4"'"5'5)-'$59-'049--)#),':59'9-459*#),'0"(6#&#"/'10--'6-&5;2'()*'(&&'">-'9-0%&"0'89-0-)"-*'#)'">-'$(#)'"-E"';-9-':95$'">-'567-4"M0-&-4"#Q-'()*'0"(6&-'0#"-0'1)dIk':59'RE8-9#$-)"'+?'G[':59'RE8-9#$-)"'++?'eG':59'RE8-9#$-)"'+++2@''((I'D?0J.)/,#*+B.&.*K,#D0'')<''3-';-9-'#)"-9-0"-*'#)'08-4#:#4'0-&-4"#Q#"/'4>(),-0'#)*%4-*'6/'5%9'-E8-9#-)4-'$()#8%&("#5)@'']5;-Q-9V';-';-9-'45)4-9)-*'">("')5)M08-4#:#4'0-&-4"#Q#"/'4>(),-0'1-@,@'9-0%&"#),':95$'-&-4"95*-'*9#:"0'#)'"#00%-'59')-%95)(&'#)7%9/2'45%&*'85"-)"#(&&/'45)"($#)("-'5%9'-::-4"'5:'#)"-9-0"@''U%9'45)"95&0';-9-'*-0#,)-*'"5'$(D-'0%9-'">("';-';5%&*')5"'#)"-989-"'()/'0%4>'-::-4"0'(0'-Q#*-)4-'5:'&-(9)#),V'6%"';-'0"#&&';()"-*'"5'*5'5%9'6-0"'"5'#)0%9-'


">("'()/')5)M08-4#:#4'-::-4"0';5%&*')5"'$(0D'">-'0#X-'5:'5%9'-::-4"'5:'#)"-9-0"@'''i)&#D-'0#),&-M%)#"'9-459*#),';>-9-'5)-'4()'7%*,-'">-'0"(6#&#"/'5:'9-459*#),'6(0-*'5)'08#D-';(Q-:59$'#05&("#5)V';-'*5')5"'>(Q-'0%4>'$-(0%9-0'#)'$%&"#M%)#"'9-459*#),@''A>%0';-'05%,>"'()5">-9'#)*-8-)*-)"'$-(0%9-'5:'9-459*#),'0"(6#&#"/@''A5'*5'">#0V';-'9-&#-*'5)'">-'0-&-4"#Q#"/'($5),'">-'45)"95&'567-4"'#$(,-0'10--'(65Q-2@''3-'8954--*-*'%)*-9'">-'(00%$8"#5)'">("'">-0-'45)"95&'567-4"'#$(,-0';-9-':(9'(8(9"':95$'">-'0;(8'567-4"'8(#90'#)'">-'+A'0>(8-'08(4-V'">-9-'0>5%&*'6-'&#""&-'4>(),-'#)'">-'0-&-4"#Q#"/'($5),'">-0-'45)"95&'567-4"'#$(,-0'#)*%4-*'6/'5%9'-E8-9#-)4-'$()#8%&("#5)'15%9'9-0%&"0'()*'89-Q#5%0';59D'45):#9$-*'">#0'(00%$8"#5)?'0--'!%88&-$-)"(&'T#,%9-'!<'()*'^#'()*'W#N(9&5V'IYY=2@'''A>("'#0V'">-'9-085)0-'"5'">-0-'567-4"0'895Q#*-0'(',(%,-'5:'()/')5)M08-4#:#4'4>(),-0'#)'+A'0-&-4"#Q#"/@''A5'L%()"#:/'">("',(%,-V';-'45$8%"-*'B-(905)J0'4599-&("#5)'6-";--)'">-'45)"95&'#$(,-'9-085)0-'Q-4"590'1P'*#$-)0#5)(&'Q-4"59V'I'567-4"0'E'e'0#X-02'$-(0%9-*':95$'">-':#90"'()*'&(0"'-(%.)*+,%(%@''3-'*--$-*'()'+A'0#"-'a0"(6&-b'#:'#"'>(*'('4599-&("#5)'Q(&%-'>#,>-9'">()'Y@k'1T#,%9-'!<2@''+)'">-'$(#)'"-E"V';-'5)&/'89-0-)"'9-0%&"0':95$'">-0-'0"(6&-'0#"-0'6-4(%0-'">-/'895Q#*-'">-'4&-()-0"'&55D'("'5%9'*("('()*'">-'6-0"'L%()"#"("#Q-'$-(0%9-'5:'&-(9)#),'$(,)#"%*-@''N9#"#4(&&/V'">#0'0#"-'0-&-4"#5)'8954-*%9-'9-&#-0'5)&/'5)'*("('">("'#0':%&&/'#)*-8-)*-)"'5:'5%9'D-/'-E850%9-'45)*#"#5)'()*'D-/'45)"95&'45)*#"#5)'1-@,@'T#,%9-'GK2V'05'">-9-'#0')5'0-&-4"#5)'6#(0@''C-Q-9">-&-00V';-'(&05'9-8-("-*'">-'0($-'()(&/0-0'5)'(&&'5:'">-'9-459*-*'+A'0#"-0'()*':5%)*'">("'">-'$(#)'9-0%&"0';-9-'L%(&#"("#Q-&/'%)4>(),-*'10--'T#,%9-'!<2@''3-'(&05'"-0"-*'$59-'0"9#4"':59$0'5:'0"(6#&#"/'49#"-9#('">("'#)4&%*-*'6(4D,95%)*'(4"#Q#"/'4>(),-'1nGYV'n[V'()*'nI'08#D-0Z02@''3#">'">-0-'0"(6#&#"/'49#"-9#(V'(&&'">-'D-/'9-0%&"0'(&05'9-$(#)-*'">-'0($-'1T#,%9-'!<N2@''H?(%$*.)/,E:LMG,)'$0?)+&,9'&'D*.N.*K<''A5'(Q5#*'()/'6#(0'#)'">#0'-0"#$("-'5:'0-&-4"#Q#"/V':59'-(4>'+A'0#"-V';-'0-"'(0#*-'()'#)*-8-)*-)"'0-"'5:'9-085)0-'*("(':95$'">-':#90"'-(%.)*+,%('1GY'9-085)0-'9-8-"#"#5)0'"5'-(4>'567-4"'#)'-(4>'0#X-2'()*'%0-*'">50-'*("('5)&/'"5'*-:#)-'">-'&(6-&0'lBl'()*'lCl'1lBl';(0'"(D-)'(0'">-'567-4"'">("'-&#4#"-*'('6#,,-9'5Q-9(&&'9-085)0-'855&-*'(49500'567-4"'0#X-2@''3-'9-459*-*'GY'-E"9('9-085)0-'9-8-"#"#5)0'#)'">-':#90"'-(%.)*+,%('#)'()"#4#8("#5)'5:'">#0')--*':59'#)*-8-)*-)"'*("('1PY'9-8-"#"#5)0'#)'">-':#90"'-(%.)*+,%(V'[Y'9-8-"#"#5)0'#)'">-'&("-9'-(%.)*+,%(%2@''A>-'&(6-&'lBl'()*'lCl':59'">-'0#"-';(0'">-)'>-&*':#E-*'(49500'567-4"'0#X-'()*'&("-9'-(%.)*+,%(%V'()*'(&&'9-$(#)#),'*("(';(0'%0-*'"5'45$8%"-'">-'0-&-4"#Q#"/'1BMC2'%0#),'">-0-'&(6-&0@''A>#0'8954-*%9-'-)0%9-*'">("'()/'560-9Q-*'9-085)0-'*#::-9-)4-'6-";--)'567-4"'B'()*'C'9-:&-4"-*'"9%-'0-&-4"#Q#"/V')5"'0-&-4"#5)'6#(0@''K-4(%0-'*#::-9-)"'08&#""#),'5:'049--)'()*'9-$(#)#),'*("('$(/')5"'9-0%&"'#)'45)0#0"-)"'lBl'lCl'&(6-&V':59'-(4>'+A'0#"-'">#0'8954-*%9-';(0'8-9:59$-*'GYY'"#$-0'1*#::-9-)"'08&#""#),'5:'049--)'()*'9-$(#)#),'*("('#)'">-':#90"'-(%.)*+,%(2'"5'56"(#)'()'(Q-9(,-*'0-&-4"#Q#"/'-0"#$("-'1BMC2@''S(9#(6#&#"/'(9#0#),':95$'">#0'8954-*%9-'#0'9-:&-4"-*'#)'">-'-9959'6(90'5:'T#,%9-'IN'()*'eK':59'-(4>'+A'0#"-@'''#*+*.9*.D+&,8'9*9,O?0,*;',P#.Q',>,=>%?9$0'P,R)*'0+D*.?)<,,A>-'D-/'8(9"'5:'5%9'-E8-9#$-)"(&'89-*#4"#5)'#0'">("'()/'4>(),-'#)'567-4"'0-&-4"#Q#"/'0>5%&*'6-':5%)*'89-*5$#)()"&/'("'">-'0;(8'0#X-'1T#,%9-'GN2@''A5'*#9-4"&/'"-0"':59'0%4>'()'#)"-9(4"#5)'6-";--)'567-4"'0#X-'()*'5%9'#)*-8-)*-)"'Q(9#(6&-'1-E850%9-2V';-'8-9:59$-*'";5'*#::-9-)"'0"("#0"#4(&'"-0"0'
5)'">-')-%95)(&'0-&-4"#Q#"/'$-(0%9-$-)"0'1BMCV'#)'%)#"0'5:'08#D-0Z02@''A>#0'$(#)'89-*#4"#5)'()*'0"("#0"#4(&'9-0%&"0'(9-':95$'855&#),'(49500')-%95)0'1#@-@'855&-*'a0%67-4"0b'*-0#,)';#">'45%)"-96(&()4-2@''T#90"V';-'(88&#-*'('";5M:(4"59'9-8-("-*'$-(0%9-0'.CUS.@''A5'*-0#,)'">-'"-0"V';-'"9-("-*'-(4>'+A'0#"-'(0'5)-'9-8-("-*'$-(0%9-$-)"'1#@-@'5)-'0%67-4"2';#">'";5';#">#)M,95%8':(4"590'


Statistical Tests for the "Size x Exposure" Interaction: The key part of our experimental prediction is that any change in object selectivity should be found predominantly at the swap size (Figure 1C). To directly test for such an interaction between object size and our independent variable (exposure), we performed two different statistical tests on the neuronal selectivity measurements (P-N, in units of spikes/s). This main prediction and these statistical results are based on pooling across neurons (i.e., a pooled-"subjects" design with counterbalancing). First, we applied a two-factor repeated-measures ANOVA, treating each IT site as one repeated measurement (i.e., one subject) with two within-group factors ("exposure" and "size"). Repeated-measures ANOVA expects that all subjects are measured across the same number of conditions; however, each IT site was tested for a different amount of time: some IT sites had three Test Phases while others had only two (due to different rates of experimental progress on each day and normal variation in the animal's daily work ethic). To get around this problem, for each IT site we simply used the data only from the first and last Test Phase, omitting the data from the intermediate Test Phase for those sites that had one. Thus, in our ANOVA design, the "exposure" factor had two levels, and the "size" factor also had two levels: swap and non-swap. Our main focus was on the significant interaction between "exposure" and "size" (see main text). Our data also revealed significant main effects of "exposure" (Experiment I: p=0.0004; Experiment II: p=0.014) and no significant main effect of "size" (p = 0.72; p = 0.32). Given our experience manipulation and counterbalanced experience design across object size, this pattern of main effects is expected under the temporal contiguity hypothesis (see Figure 1C).
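For illustration, this two-factor repeated-measures design can be expressed with statsmodels (our choice of tool, not the authors'; the data frame holds toy values, not data from the study, with one "subject" per IT site and one row per exposure x size cell):

import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    "site":     [0, 0, 0, 0, 1, 1, 1, 1],           # one subject per IT site
    "exposure": ["pre", "pre", "post", "post"] * 2,  # first vs. last Test Phase
    "size":     ["swap", "nonswap"] * 4,
    "pn":       [12.0, 11.5, 6.0, 11.0, 9.0, 8.5, 4.0, 8.8],  # toy (P-N)
})
res = AnovaRM(df, depvar="pn", subject="site",
              within=["exposure", "size"]).fit()
print(res.anova_table)   # the "exposure:size" row carries the interaction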

,! !"-#$%&'#" #.()*$" (./01'(%&$" $ " +# # ," $ (# ,# $ $ # ,$ $ ,( $ $-'

A>-'">9--'#)*-8-)*-)"'Q(9#(6&-0'5:'">-'$5*-&';-9-H'l0#X-l'1%2V'l-E850%9-l'1(2V'()*'">-#9'#)"-9(4"#5)'1#@-@'">-#9'895*%4"V'%7(2@''A>-'l0#X-l':(4"59'>(*'";5'&-Q-&0'1#@-@'%'d'G':59'0;(8'0#X-V'MG':59')5)M0;(8'0#X-2'">-'l-E850%9-l':(4"59'>(*'%8'"5'">9--'&-Q-&0'*-8-)*#),'>5;'&5),'('0#"-';(0'"-0"-*V'1#@-@'('d'Y':59'89-M-E850%9-V'()*'45%&*'6-'%8'"5'GPYY'-E850%9-0'#)'#)49-$-)"0'5:'=YYj02@''R(4>',0';(0'">-'0-&-4"#Q#"/'5::0-"'08-4#:#4'"5'-(4>'+A'0#"-?'89V'8:V'()*'8;';-9-'0&58-'8(9($-"-90'">("';-9-'0>(9-*'($5),'(&&'">-'0#"-0'1#@-@';#">#)'0%67-4"':(4"5902@''A>%0V'">-'45$8&-"-'$5*-&':59'5%9'858%&("#5)'5:'0'0#"-0'<0dIkV'RE8-9#$-)"'+?')dG[V'RE8-9#$-)"'++2'45)"(#)-*'('"5"(&'5:'0re'8(9($-"-90'">("';-9-':#"'0#$%&"()-5%0&/'"5'5%9'-)"#9-'*("('0-"@''A>-',0J0'(60596-*'">-'0#"-M6/M0#"-'0-&-4"#Q#"/'*#::-9-)4-0'">("';-9-')5"'5:'#)"-9-0"'>-9-V'()*'">-'9-$(#)#),'">9--'8(9($-"-90'*-049#6-*'">-'$(#)'-::-4"0'#)'">-'858%&("#5)V';#">'8;)5:'89#$(9/'#)"-9-0"'1#)"-9(4"#5)2@'''''3-':#"'">-'&#)-(9'$5*-&'"5'">-'*("('10"()*(9*'&-(0"'0L%(9-02V'()*'">-)'(0D-*'#:'">-'560-9Q-*'Q(&%-'5:'">-'#)"-9(4"#5)'8(9($-"-9'18;2';(0'0"("#0"#4(&&/'*#::-9-)"':95$'Y@''A5'*5'">#0V';-'56"(#)-*'">-'Q(9#("#5)'5:'">-'6e'-0"#$("-'56,'655"0"9(8'5Q-9'65">'+A'0#"-0'()*'9-8-"#"#5)0'5:'-(4>'0#"-J0'9-085)0-'*("(@''A>-'-E(4"'8954-*%9-';(0'*5)-'(0':5&&5;0H':59'-(4>'95%)*'5:'655"0"9(8'5Q-9'+A'0#"-0V';-'9()*5$&/'0-&-4"-*'1;#">'9-8&(4-$-)"2'0'0#"-0':95$'5%9'9-459*-*'0'0#"-0V'05'('0#"-'45%&*'85"-)"#(&&/'-)"-9'5)-'95%)*'5:'655"0"9(8'$%&"#8&-'"#$-0@''U)4-'">-'0#"-0';-9-'0-&-4"-*V';-'">-)'9()*5$&/'0-&-4"-*'1;#">'9-8&(4-$-)"2'">-'9-085)0-'9-8-"#"#5)0'#)4&%*-*':59'-(4>'0#"-'15%9'%)#"'5:'*("('>-9-';(0'('04(&(9'08#D-'9("-'#)'9-085)0-'"5'('0#),&-'9-8-"#"#5)'5:'5)-'567-4"'#$(,-'#)'5)-'0#X-2@''+$859"()"&/V'">-'0-&-4"#5)'5:'">-'9-085)0-'9-8-"#"#5)0';(0'*5)-'(:"-9';-'>(Q-'-E4&%*-*'GY'9-085)0-'9-8-"#"#5)0'9-0-9Q-*':59'*-"-9$#)#),'567-4"'&(6-&0'1aBb'()*'aCb2@''A>#0'(605&%"-'#)*-8-)*-)4-'5:'">-'*("('(&&5;-*'%0'"5'56"(#)'%)6#(0-*'0-&-4"#Q#"/'-0"#$("-0@''R(4>'0#"-J0'1BMC2';(0'45$8%"-*':95$'#"0'0-&-4"-*'9-085)0-'9-8-"#"#5)0@''A>-'&#)-(9'$5*-&';(0'">-)'


:#"'"5'">-'*("('("'">-'-)*'5:'">-0-'";5'9()*5$'0($8&-0'"5'56"(#)'(')-;'6e'-0"#$("-@'A>#0'8954-*%9-';(0'9-8-("-*'GYYY'"#$-0'/#-&*#),'('*#0"9#6%"#5)'5:'6e'-0"#$("-0V'()*'">-':#)(&'8MQ(&%-';(0'45$8%"-*'(0'">-':9(4"#5)'5:'">("'*#0"9#6%"#5)'">("';(0'&-00'">()'Y@''A>#0'8MQ(&%-';(0'#)"-989-"-*'(0H'#:';-';-9-'"5'9-8-("'">#0'-E8-9#$-)"V';#">'65">'">-'Q(9#(6#&#"/'560-9Q-*'#)'">-')-%95)(&'9-085)0-0'(0';-&&'(0'">-'Q(9#(6#&#"/'#)';>#4>'+A'0#"-0';-9-'0($8&-*V';>("'#0'">-'4>()4-'">("';-';5%&*'0$.'0--'">-'#)"-9(4"#5)'560-9Q-*'>-9-s''+)'-::-4"V'">#0'655"0"9(8'8954-*%9-'(&&5;-*'%0'"5'*-9#Q-'('45):#*-)4-'#)"-9Q(&'5)'">-'$5*-&'8(9($-"-9'-0"#$("-'16e2V'()*'">-'*%(&#"/'5:'45):#*-)4-'#)"-9Q(&0'()*'>/85">-0-0'"-0"#),'(&&5;-*'%0'"5'9-859"'">("'45):#*-)4-'#)"-9Q(&'(0'('8MQ(&%-'1R:95)'()*'A#60>#9()#V'IYYe2@'''#*+*.9*.D+&,8'9*9,O?0,*;',I'9%?)9',H;+)/',.),#.)/&',#.*'9<''3-'-Q(&%("-*'-(4>'+A'$%&"#M%)#"'0#"-j0'0-&-4"#Q#"/'1BMC2'4>(),-'6/':#""#),'&#)-(9'9-,9-00#5)'(0'(':%)4"#5)'5:'">-')%$6-9'5:'-E850%9-'-Q-)"0'"5'56"(#)'('0&58-V'"01BMC2@''A>-'0"("#0"#4(&'0#,)#:#4()4-'5:'">-'9-085)0-'4>(),-':59'-(4>'+A'0#"-';(0'-Q(&%("-*'6/'8-9$%"("#5)'"-0"@''!8-4#:#4(&&/V':59'-(4>'0#"-V';-'9()*5$&/'8-9$%"-*'">-'-(%.)*+,%('&(6-&'5:'">-'9-085)0-'*("('1#@-@';>#4>'-(%.)*+,%('-(4>'0($8&-'5:'B'()*'C'9-085)0-'*("('6-&5),-*'"5V'5%9'%)#"'5:'*("('>-9-';(0'('04(&(9'08#D-'9("-'#)'9-085)0-'"5'('0#),&-'9-8-"#"#5)'5:'5)-'567-4"'#$(,-'#)'5)-'0#X-2@''3-'">-)'9-M45$8%"-*'">-'1BMC2'0-&-4"#Q#"/'5)'">-'8-9$%"-*'*("('()*':#"'">-'&#)-(9'9-,9-00#5)@''A>-'8-9$%"("#5)'8954-*%9-';(0'8-9:59$-*'GYYY'"#$-0'"5'/#-&*'('*#0"9#6%"#5)'5:'0&58-0'1-$8#9#4(&'a)%&&'*#0"9#6%"#5)b'5:'"01BMC22@''A>-'8MQ(&%-';(0'*-"-9$#)-*'6/'45%)"#),'">-':9(4"#5)'5:'">-')%&&'*#0"9#6%"#5)'">("'-E4--*-*'">-'&#)-(9'9-,9-00#5)'0&58-'56"(#)-*':95$'">-'*("(@''.&&'0#"-0';#">'8'n'Y@Y[';-9-'*--$-*'0#,)#:#4()"'10--'$(#)'"-E"2@''''H?(B.).)/,*;',:?9.*.?),+)J,#.Q',8?&'0+)D',S'+0).)/,T+*+<,,+)'$(#)'"-E"'T#,%9-0'P'()*'kV';-'855&-*'">-'*("(':95$'0#X-'-E8-9#$-)"'+V'++V'1)d<I'Fi.'0#"-02V'()*'5%9'89-Q#5%0'850#"#5)'"5&-9()4-'-E8-9#$-)"'1)dGY'Fi.'0#"-0'45&&-4"-*'%0#),'">-'0($-'$-">5*'*-049#6-*'(65Q-V'0--'^#'()*'W#N(9&5V'IYY=2'6-4(%0-'">-'";5'-E8-9#$-)"0'%0-*'0#$#&(9'-E8-9#-)4-'$()#8%&("#5)0'()*'">-'-::-4"'$(,)#"%*-';(0'45$8(9(6&-'1T#,%9-'[2@''A5'-)"-9'">#0'()(&/0#0V';-'9-L%#9-*'">("'">-'0#"-0'>(*'1BMC2'0-&-4"#Q#"/'("'">-'$-*#%$'567-4"'0#X-Z850#"#5)'1o['08#D-0Z0'()*'n[Y'08#D-0Z0V')de<2@''A>#0';(0'*5)-'%)*-9'">-'&5,#4'">("'0%4>'0-&-4"#Q#"/'#0')--*-*'"5'895Q#*-'('*9#Q#),':594-':59'&-(9)#),@''3-'">-)'%0-*'#)*-8-)*-)"'*("('"5'*#Q#*-'">-'0#"-0'#)"5'*#::-9-)"',95%80'6(0-*'5)'">-'0-&-4"#Q#"/'("'">-'0;(8'850#"#5)Z0#X-'#)'T#,%9-'P'1p95%8'GH'(&&'0#"-0?'p95%8'IH'n<Y?'p95%8'eH'nIY?'p95%8'<H'nGY?'p95%8'[H'n[?'p95%8'PH'nY2'59'("'">-')5)M0;(8'850#"#5)Z0#X-'#)'T#,%9-'k'1p95%8'GH'nY?'p95%8'IH'n[?'p95%8'eH'nGY?'p95%8'<H'nIY?'p95%8'[H'(&&'0#"-02@'3-'%0-*'#)*-8-)*-)"'*("('"5'0-&-4"'">-0-'0%6M858%&("#5)0'05'">("'()/'0"54>(0"#4':&%4"%("#5)0'#)'0#"-M6/M0#"-'0-&-4"#Q#"/';5%&*'895*%4-')5'(Q-9(,-'0-&-4"#Q#"/'4>(),-@''#.)/&'L$).*,#?0*.)/,+)J,U)+&K9'9<''3-'8-9:59$-*'89#)4#8&-'45$85)-)"'()(&/0-0'1BN.2'6(0-*'08#D-'059"#),'5)'">-';(Q-:59$'*("('45&&-4"-*'*%9#),'-(4>'-(%.)*+,%(@''_M$-()'4&%0"-9#),';(0'8-9:59$-*'#)'">-'BN.':-("%9-'08(4-'"5'/#-&*'$%&"#8&-'%)#"0@''A>-')%$6-9'5:'4&%0"-90';(0'*-"-9$#)-*'(%"5$("#4(&&/'6/'$(E#$#X#),'">-'*#0"()4-0'6-";--)'85#)"0'5:'*#::-9-)"'4&%0"-90@''R(4>'%)#"'56"(#)-*':95$'">-'4&%0"-9#),';(0':%9">-9'-Q(&%("-*'6/'#"0'0#,)(&M"5M)5#0-'9("#5'1!ChH'9("#5'5:'8-(DM"5M8-(D'$-()';(Q-:59$'($8&#"%*-'"5'0"()*(9*'*-Q#("#5)'5:'">-')5#0
Combining the Position and Size Tolerance Learning Data: In main text Figures 6 and 7, we pooled the data from size Experiments I and II (n=42 MUA sites) and from our previous position tolerance experiment (n=10 MUA sites collected using the same method described above; see Li and DiCarlo, 2008), because the two experiments used similar experience manipulations and the effect magnitudes were comparable (Figure 5). To enter this analysis, we required that the sites had (P-N) selectivity at the medium object size/position (>5 spikes/s and <50 spikes/s; n=34). This was done under the logic that such selectivity is needed to provide a driving force for learning. We then used independent data to divide the sites into different groups based on the selectivity at the swap position/size in Figure 6 (Group 1: all sites; Group 2: <40; Group 3: <20; Group 4: <10; Group 5: <5; Group 6: <0) or at the non-swap position/size in Figure 7 (Group 1: <0; Group 2: <5; Group 3: <10; Group 4: <20; Group 5: all sites). We used independent data to select these sub-populations so that any stochastic fluctuations in site-by-site selectivity would produce no average selectivity change.

Single-Unit Sorting and Analyses: We performed principal component analysis (PCA)-based spike sorting on the waveform data collected during each Test Phase. K-means clustering was performed in the PCA feature space to yield multiple units. The number of clusters was determined automatically by maximizing the distances between points of different clusters. Each unit obtained from the clustering was further evaluated by its signal-to-noise ratio (SNR: the ratio of the peak-to-peak mean waveform amplitude to the standard deviation of the noise). For the analyses presented in Figure 4, we set an SNR threshold of 5.0, above which we term a unit a "single-unit". We verified that the key result is robust to the choice of this threshold (Figure S6).
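The sorting pipeline can be sketched as follows. The silhouette-based choice of cluster count is our stand-in for the distance-maximizing rule described above, and the residual standard deviation stands in for the noise estimate; names and parameters are otherwise our assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def sort_units(snippets, max_k=5, snr_min=5.0):
    """snippets: (n_events, n_samples) waveform array for one site.
    Returns cluster labels and the clusters passing the SNR screen."""
    feats = PCA(n_components=3).fit_transform(snippets)
    best_k, best_score = 2, -1.0
    for k in range(2, max_k + 1):
        lab = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)
        score = silhouette_score(feats, lab)     # cluster-separation proxy
        if score > best_score:
            best_k, best_score = k, score
    labels = KMeans(n_clusters=best_k, n_init=10,
                    random_state=0).fit_predict(feats)
    units = []
    for u in range(best_k):
        w = snippets[labels == u]
        mean_w = w.mean(axis=0)
        noise_sd = (w - mean_w).std()            # residual s.d. as noise
        snr = (mean_w.max() - mean_w.min()) / noise_sd   # peak-to-peak / s.d.
        if snr >= snr_min:
            units.append((u, snr))               # kept as a "single-unit"
    return labels, units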


Because there was a great amount of cell-to-cell variability in IT neurons' selectivity, we computed a normalized selectivity measure for each neuron (Figure 4). Each neuron's size tolerance was computed as (P-N)/(P-N)medium, where (P-N) is the selectivity between the two objects at the tested size and (P-N)medium is the selectivity at the medium object size. A size tolerance of 1.0 means that a neuron perfectly maintained its selectivity across the size variations spanned here. Because not all the single-units had object selectivity, only units that showed selectivity at the medium size were included ((P-N)medium > 1 spike/s).
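The normalization is a one-line computation per unit; a sketch, assuming (P-N) has already been computed at each of the three sizes (the medium-size index and names are ours):

import numpy as np

def size_tolerance(pn_by_size, medium_idx=1, min_medium=1.0):
    """pn_by_size: (P-N) at each tested size; tolerance is relative to the
    medium size, and units with (P-N)medium <= 1 spike/s are excluded."""
    pn_med = pn_by_size[medium_idx]
    if pn_med <= min_medium:
        return None                   # excluded from the Figure 4 analyses
    return np.asarray(pn_by_size) / pn_med   # 1.0 = fully maintained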

,#--*"+"&./*(:"1"$"&9"4((2345678.!9:;:.!75<!=>55>3.!=:?:!,#++%-:!@5</3AB45C!1345641A/D!>E!F4DG7A!DH71/!D/A/684F48B!45!1>D8/34>3!

45E/3>8/I1>37A!6>38/0:!J78!J/G3>D64-..!))+K))':!

L4=73A>.!M:M:.!75<!N7G5D/AA.!M:O:P:!,#+++-:!Q>3I!3/13/D/58784>5!45!I>5R/B!45E/3>8/I1>37A!6>38/0!4D!

F438G7AAB!G57A8/3/<!SB!E3//!F4/T45C:!J78!J/G3>D64-/.!)"%K)#":!

?E3>5.!2:.!75<!U4SDH43754.!P:M:!,#++$-:!V5!W583><G684>5!8>!8H/!2>>8D8371:!,=H71I75!X!O7AA-:!

O>35.!2:!,"*)'-:!P>S>8!Y4D4>5!,NWU!Z3/DD!-:!

OG5C.!=:Z:.![3/4I75.!\:.!Z>CC4>.!U:.!75<!L4=73A>.!M:M:!,#++&-:!Q7D8!3/7<>G8!>E!>S]/68!4</5848B!E3>I!

I767^G/!45E/34>3!8/I1>37A!6>38/0:!964/56/-/01.!)'$K)'':!

W8>.!N:.!U7IG37.!O:.!QG]487.!W:.!75<!U757R7.![:!,"**&-:!94_/!75<!1>D484>5!45F734756/!>E!5/G3>57A!

3/D1>5D/D!45!I>5R/B!45E/3>8/I1>37A!6>38/0:!M>G357A!>E!J/G3>1HBD4>A>CB-./.!#")K##':!

[3/4I75.!\:.!OG5C.!=:Z:.![37DR>F.!V:.!`G43>C7.!P:`:.!Z>CC4>.!U:.!75<!L4=73A>.!M:M:!,#++'-:!aS]/68!

D/A/684F48B!>E!A>67A!E4/A<!1>8/5847AD!75<!D14R/D!45!8H/!I767^G/!45E/34>3!8/I1>37A!6>38/0:!J/G3>5-23.!%$$K

%%&:!

;4.!J:.!75<!L4=73A>.!M:M:!,#++)-:!@5DG1/3F4D/<!578G37A!/01/34/56/!3714<AB!7A8/3D!45F734758!>S]/68!

3/13/D/58784>5!45!F4DG7A!6>38/0:!964/56/-/40.!"&+#K"&+(:!

;>C>8H/84D.!J:[:.!75<!9H/45S/3C.!L:;:!,"**'-:!Y4DG7A!>S]/68!3/6>C5484>5:!V55:!P/F:!J/G3>D64:-03.!&((K

'#":!

P>S45D>5.!L:V:!,"*'$-:!V!I/8H><!>E!I/7DG345C!/B/!I>F/I/58D!GD45C!7!D6A/37A!D/736H!6>4A!45!7!I7C5/846!

E4/A<:!W???!U375D7684>5D!>5!24>I/<467A!?5C45//345C-010.!"$"K"%&:!

Y>C/AD.!P:.!75<!a3S75.!\:V:!,"**'-:!=><45C!>E!D84IGAGD!45F734756/D!SB!45E/34>3!8/I1>37A!5/G3>5D:!Z3>C!

23745!P/D-004.!"*&K#"":!

!

