What do neurons really want? The role of semantics in cortical
representations

Gabriel Kreiman
Children’s Hospital, Harvard Medical School

Keywords: visual recognition, deep convolutional network, computational models,
ventral visual cortex, abstract meaning, categorization, neurophysiology

Number of Figures: 2

Abstract
What visual inputs best trigger activity for a given neuron in cortex and what
type of semantic information may guide those neuronal responses? We revisit the
methodologies used so far to design visual experiments, and what those methodologies
have taught us about neural coding in visual cortex. Despite heroic and seminal work
in ventral visual cortex, we still do not know what types of visual features are optimal
for cortical neurons. We briefly review state-of-the-art standard models of visual
recognition and argue that such models should constitute the null hypothesis for any
measurement that purports to ascribe semantic meaning to neuronal responses. While
it remains unclear when, where, and how abstract semantic information is
incorporated in visual neurophysiology, there exists clear evidence of top-down
modulation in the form of attention, task-modulation and expectations. Such top-down
signals open the doors to some of the most exciting questions today towards
elucidating how abstract knowledge can be incorporated into our models of visual
processing.

In this Chapter, I aim to highlight critical lacunae in our understanding of the
tuning properties of visual neurons, especially the role of high-level knowledge in
neural coding of visual inputs. First of all, I should clarify the obvious. Neurons do
not “want” anything. A neuron emits an action potential when its intracellular
voltage exceeds a certain threshold, typically but not exclusively, in the axon hillock
(Koch, 1999). This voltage is a weighted sum of the influences received through the
neuron’s thousands of dendritic inputs, which include bottom-up, horizontal, and
top-down connections. It remains experimentally challenging to trace all incoming
signals to a given cortical neuron and to propagate those signals all the way back to
the sensory inputs, not to mention all other non-sensory inputs. Thus, in the vast
majority of cases, we correlate the activity of a cortical neuron with the presentation
of sensory signals. It is in this sense that the question in the title should be
understood. I ask what sensory inputs best trigger activity for a given cortical neuron
and what type of semantic information may guide or modulate those neuronal
responses.
I start with a succinct description of the classical view on what types of visual
stimuli trigger activity in neurons along the ventral visual cortex. I introduce state-of-the-art
standard computational models of vision and consider them as a basic
null hypothesis to evaluate neuronal tuning properties and potential semantic
influences, particularly in the context of visual categorization tasks. Next, I provide a
few examples of how top-down signals can modulate responses along the ventral
visual cortex while emphasizing that we have a long way to go to understand the
role of common knowledge in visual processing. I conclude by discussing critical
Hilbert questions in the field and offering a brief glimpse of the ample opportunities and
challenges ahead.

Assumptions and definitions
I focus here on triggering activity in the sense of firing rates defined as spike
counts in short windows spanning tens of milliseconds. This is by no means the only
or agreed upon relevant property of cortical neurons; there has been extensive
discussion about neural codes (see for example (Kreiman, 2004)). A neuron might
contribute to representing information by firing only a few spikes at a precise time
in concert with other spikes in the network. A neuron may also represent
information by not firing, in the same way that Sherlock Holmes intuited who the
murderer was by attending to the dog that did not bark. Additionally, in non-invasive
studies, there are multiple experimental techniques that measure non-neuronal
signals that are less well understood and which are difficult to interpret
directly in terms of neuronal firing rates. The general flavor of the discussion here
could well be extended to other neural codes, but in the interest of simplicity we
understand the question in the title to indicate what types of sensory inputs lead to
high firing rates for a given cortical neuron.
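As a concrete reading of "firing rate defined as a spike count in a short window", the following minimal Python sketch counts spikes in a window and divides by the window duration. The spike times and the 50 ms window are made-up values chosen only for illustration; they are not taken from any recording discussed in this chapter.

```python
def firing_rate(spike_times_ms, window_start_ms, window_end_ms):
    """Spike count in [window_start_ms, window_end_ms) divided by window length (spikes/s)."""
    if window_end_ms <= window_start_ms:
        raise ValueError("window must have positive duration")
    count = sum(window_start_ms <= t < window_end_ms for t in spike_times_ms)
    duration_s = (window_end_ms - window_start_ms) / 1000.0
    return count / duration_s

# Hypothetical spike times (ms relative to stimulus onset) for one trial.
spike_times_ms = [12.5, 104.0, 118.3, 131.9, 140.2, 163.7, 250.1]

# A 50 ms window starting 100 ms after stimulus onset (the window choice is an assumption).
rate = firing_rate(spike_times_ms, 100.0, 150.0)
print(f"{rate:.1f} spikes/s")
```

The choice of window start and length is an analysis decision; the same spike train yields different rates for different windows.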
A few assumptions and disclaimers are pertinent before proceeding. In order
to investigate what neurons want, I restrict the question to vision and solicit
inspiration from biologically plausible computational models of vision. The focus on
vision is merely a practical one: (i) We know more about the architecture of the
visual system than about other systems; (ii) We can stand on the shoulders of giants that
have paved the way through more than a century of behavioral studies of vision and
over half a century of neurophysiological scrutiny of visual cortex; (iii) We have an
arsenal of tools to synthesize visual stimuli, to precisely control the timing of
presentation, to measure eye movements, and to capitalize on millions of digital
images and videos. The emphasis on biologically plausible computational models of
vision reflects the need to formalize our assumptions, and to generate a common
language that can be used to directly test our ideas across labs and across
experiments. Verbal descriptions such as “neurons in V2 respond preferentially to
angles” or “neurons in IT respond preferentially to objects” are insufficient and
vague, lack predictive power, are hard to reject or validate, and often get us into
trouble. We need mathematical models that are instantiated into computer code
where we can use exactly the same conditions and exactly the same images as in
behavioral or neurophysiological experiments.
Beyond the pattern of inputs propagated from the retina onto visual cortex,
what a neuron wants is likely to be modulated by semantic prior knowledge about
the world, probably conveyed through top-down connections. What exactly do we
mean by semantics? The Oxford English Dictionary defines semantics as “…the
branch of linguistics and logic concerned with meaning.” How this definition applies
to interpreting the responses of neurons along ventral visual cortex is not clear. We
attempt to provide a more quantitative definition later on in this chapter. For the
moment, as an example of semantics in the context of visual recognition, we
understand pictures of grapes, oranges, or pineapples to represent different types of
fruits, even though they are rather different in terms of their visual features.
Similarly, we refer to ants, elephants or goldfish as animals. We will ask what roles
this and other types of high-level knowledge about the world play in visual
processing.

Neuronal responses in visual cortex, the classical view

The introduction of techniques to record the activity of neurons at the
beginning of the last century led to decades of experiments interrogating neuronal
responses to visual stimulation. The history of studying neuronal tuning properties
in visual cortex is the history of visual stimuli. How do we investigate the feature
preferences of a neuron in visual cortex? We need to decide which stimuli to use in
the experiments. The central challenge in answering this question is that it is
essentially impossible with current (and foreseeable) technology to exhaustively
explore the entire space of images: the number of possible images is beyond
astronomical. Considering a small image patch of 100 x 100 pixels, there are
$2^{10{,}000} \approx 10^{3{,}010}$ possible binary images, $\sim 10^{24{,}082}$ grayscale images with 256 shades of
gray per pixel, and $\sim 10^{72{,}247}$ color images with 8 bits per color channel. As a consequence, investigators
have traditionally used several astute and reasonable strategies to select visual
stimuli for experiments:
(i) Stimuli from previous studies. Past performance is a strong predictor of current
performance for neurons. Stimuli that have excited neurons in previous studies are
often a good initial guess to design experiments. For example, ever since the
discovery that V1 neurons in cats and monkeys respond vigorously to bars or
gratings of specific orientations (Hubel and Wiesel, 1968), investigators have used
oriented bars and gratings to probe the responses essentially in every species and
every visual area (Coogan, 1993; Chapman et al., 1996; Hegde and Van Essen, 2007;
Ghose and Maunsell, 2008; Niell and Stryker, 2010; Nassi et al., 2014). A variation of
this approach is to start with effective stimuli from previous studies and evaluate
neuronal responses to modified versions of those stimuli (Kobatake and Tanaka,
1994; Tanaka, 2003; Leopold et al., 2006; Tsao et al., 2006).
(ii) Natural stimuli. It seems reasonable to assume that neurons represent
behaviorally relevant stimuli and the types of images encountered in the real world.
Thus, multiple studies have probed neural responses to natural images and movies
(Olshausen and Field, 1996; Vinje and Gallant, 2000; Simoncelli and Olshausen,
2001; Lesica and Stanley, 2004; Okazawa et al., 2015; Isik et al., 2017), and also to
everyday objects (Logothetis and Sheinberg, 1996; Sheinberg and Logothetis, 2001;
Hung et al., 2005; Liu et al., 2009), including faces (Desimone et al., 1984; Allison et
al., 1994; Kanwisher et al., 1997; Tsao et al., 2006).
(iii) Semi-serendipitous findings. Hubel and Wiesel claimed that they discovered
orientation tuning while they were scrutinizing the activity of primary visual cortex
neurons and observed the responses elicited when they inserted a slide in the
projector (Hubel, 1981). Gross and Desimone observed that neurons in ITC fired
vigorously when one of the investigators passed in front of the monkey, leading to
the investigations about neurons that respond to face stimuli (Gross, 1994). Our
own descriptions of so-called “Clinton” or “Aniston” cells in the human medial
temporal lobe were also fortuitous (Kreiman, 2002; Quian Quiroga et al., 2005).
While the role of luck can be debated, rigorous analyses of neural responses to novel
stimuli can lead to discovering unexpected feature preferences.
(iv) Computational methods. Despite enormous progress in developing
computational models to explain and predict neural responses along ventral visual
cortex (Riesenhuber and Poggio, 1999; Wu et al., 2006; Connor et al., 2007; Serre et
al., 2007; DiCarlo et al., 2012), there have been few efforts to use those models to
create stimuli that efficiently drive a visual neuron. One of these approaches is
reverse correlation whereby a rapid succession of white noise stimuli is presented
followed by averaging the images preceding spikes (Jones et al., 1987). This
approach has been successful in elucidating the structure of receptive fields in the
retina and, to some extent, in primary visual cortex, but it does not seem to work in
higher visual areas, due to the accumulation of non-linearities and also the curse of
dimensionality dictated by the reduced sampling of stimulus space. Rather than
starting from noise, exploiting natural stimulus statistics has been a productive way
of synthesizing images and predicting responses in areas V1 (Olshausen and Field,
1996; Olshausen and Field, 2004), V2 (Freeman et al., 2013), and V4 (Okazawa et al.,
2015). An elegant alternative approach is to use a genetic algorithm whereby the
neuron under study can itself dictate which stimuli it prefers. A successful
implementation of this idea by Connor and colleagues (Yamane et al., 2008) has
been used to investigate selectivity in macaque areas V4 and ITC (Carlson et al.,
2011; Hung et al., 2012; Vaziri and Connor, 2016).
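The reverse-correlation procedure mentioned in (iv) can be illustrated with a toy simulation. The sketch below presents white-noise stimuli to a hypothetical model neuron (a linear filter followed by a hard threshold, both invented for this example) and averages the stimuli that triggered a spike. It is a minimal illustration in standard-library Python, assuming an idealized neuron with no temporal dynamics; it is not the analysis pipeline of any study cited here.

```python
import random

def spike_triggered_average(stimuli, spikes):
    """Average the stimuli that were followed by a spike (reverse correlation)."""
    triggering = [s for s, fired in zip(stimuli, spikes) if fired]
    n_pixels = len(stimuli[0])
    return [sum(s[i] for s in triggering) / len(triggering) for i in range(n_pixels)]

# Hypothetical model neuron: a linear filter followed by a hard threshold.
TRUE_FILTER = [0.0, 0.0, 1.0, 1.0, -1.0, -1.0, 0.0, 0.0]
THRESHOLD = 1.0

def model_neuron_fires(stimulus):
    drive = sum(w * x for w, x in zip(TRUE_FILTER, stimulus))
    return drive > THRESHOLD

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# White-noise stimuli: each of the 8 "pixels" is an independent Gaussian sample.
stimuli = [[rng.gauss(0.0, 1.0) for _ in TRUE_FILTER] for _ in range(4000)]
spikes = [model_neuron_fires(s) for s in stimuli]

sta = spike_triggered_average(stimuli, spikes)
print([round(v, 2) for v in sta])
```

For this nearly linear toy neuron, the spike-triggered average recovers the sign pattern of the underlying filter. The text's caveat applies directly: with cascaded non-linearities and high-dimensional stimuli, the same average becomes far less informative.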
Using a combination of these stimulus selection approaches, seminal studies
led to foundational discoveries about visual processing, including center-surround
receptive fields (Kuffler, 1953), neurons in primary visual cortex that are tuned to
the orientation of a bar placed within their receptive fields (Hubel and Wiesel,
1962), neurons in area MT that discriminate motion direction (Movshon and
Newsome, 1992), neurons in area V4 that are sensitive to colors (Zeki, 1983) and
curvature (Gallant et al., 1993; Pasupathy and Connor, 2001), selectivity to natural
objects (Logothetis and Sheinberg, 1996; DiCarlo et al., 2012) including faces
(Desimone et al., 1984; Tsao et al., 2006), among many others. Despite these
extensive and heroic efforts, we still do not know whether any of those tuning properties
are optimal for those neurons – where optimal means triggering high firing rates. It
is conceivable that there could be other stimuli that might more strongly drive
neurons in all those areas. Mechanistic models can help us understand how
neuronal responses arise and thus design better stimuli. The last two decades have
seen significant progress in the development of computational models to help us
understand neural tuning properties in visual cortex.
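The image counts quoted earlier in this section for a 100 x 100 patch can be checked with a few lines of arithmetic. The sketch below works in log10 to avoid printing numbers with tens of thousands of digits; it assumes that the color case means three channels with 256 levels each.

```python
import math

def log10_image_count(n_pixels, levels_per_pixel, channels=1):
    """log10 of the number of distinct images: levels ** (n_pixels * channels)."""
    return n_pixels * channels * math.log10(levels_per_pixel)

N_PIXELS = 100 * 100  # the 100 x 100 patch from the text

binary = log10_image_count(N_PIXELS, 2)
gray = log10_image_count(N_PIXELS, 256)
color = log10_image_count(N_PIXELS, 256, channels=3)  # assumes 3 channels of 8 bits each

print(f"binary: 10^{binary:.0f}, grayscale: 10^{gray:.0f}, color: 10^{color:.0f}")
```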

Computational models of ventral visual cortex
Inspired by neuroanatomy and neurophysiology, many investigators have
developed computational models that capture the basic principles that
progressively transform a pixel-like representation of inputs into complex features
that can be linearly decoded to recognize objects (Fukushima, 1980; Olshausen et
al., 1993; Mel, 1997; Wallis and Rolls, 1997; Riesenhuber and Poggio, 1999; Deco
and Rolls, 2004; Serre et al., 2007; DiCarlo et al., 2012). More recently, this family of
models has taken over the computer vision community in the form of deep
convolutional network architectures that perform quite well in many object labeling
and object detection tasks (Krizhevsky et al., 2012; Simonyan and Zisserman, 2014;
He et al., 2015; He et al., 2018; Serre, 2019).
While there are important variations across different models, they all share
basic design principles and we generically refer to them as a family of models. The
models are hierarchical, typically following a sequential path of operations,
mimicking the approximately hierarchical nature of ventral visual cortex (Felleman
and Van Essen, 1991). The models consist of multiple layers, following a divide-and-conquer
strategy that breaks the problem of object recognition into multiple smaller
and simpler steps. Each of these steps is characterized by a series of biologically
plausible canonical computations, typically including a filter implemented by a dot
product, a normalization step, and a max pooling operation. In most of the steps, the
operations are performed in a convolutional fashion, such that the same
computation is repeated throughout the entire visual field. The dot product
operation is characterized by sets of weights that are learnt via training. In the
computer vision literature, a prominent way of training these weights is via
supervised learning algorithms implementing back-propagation. Through the
sequence of computations, units show tuning for increasingly complex
features, accompanied by an increasing degree of invariance to transformations of
those features such as changes in position, scale, etc.
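The three canonical computations named above (a filter implemented by a dot product, a normalization step, and max pooling, all applied convolutionally) can be sketched in one dimension with plain Python. The edge kernel, the L2 normalization and the pooling size are illustrative assumptions, not the parameters of any model cited in this chapter.

```python
import math

def convolve(signal, kernel):
    """Slide the kernel over the signal; each output is a dot product (valid mode)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def normalize(responses, eps=1e-6):
    """Divide every response by the overall (L2) response magnitude."""
    norm = math.sqrt(sum(r * r for r in responses)) + eps
    return [r / norm for r in responses]

def max_pool(responses, size):
    """Keep the largest response in each non-overlapping window."""
    return [max(responses[i:i + size]) for i in range(0, len(responses) - size + 1, size)]

def layer(signal, kernel, pool_size=2):
    """One stage: filter, then normalize, then pool."""
    return max_pool(normalize(convolve(signal, kernel)), pool_size)

EDGE_KERNEL = [-1.0, 1.0]  # hypothetical "dark-to-bright edge" filter
image = [0, 0, 0, 1, 1, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 1, 1, 0, 0]  # same pattern, shifted right by one pixel

print(layer(image, EDGE_KERNEL))
print(layer(shifted, EDGE_KERNEL))
```

In this example the one-pixel shift keeps the edge inside the same pooling window, so both inputs produce the same output. That is a small-scale version of the tolerance to position changes described in the text; larger shifts would move the response to a neighboring pooled unit.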
These models perform quite well in many object labeling tasks. For example,
the ResNet architecture achieved a 4% top-5 error rate in the ImageNet dataset
consisting of 1,000 possible categories (He et al., 2015). These models also provide a
reasonable – yet certainly imperfect – first order approximation to characterize
human and monkey behavioral performance in rapid object categorization tasks
(Russakovsky et al., 2014; Rajalingham et al., 2018). For example, a recent study
showed that deep convolutional network architectures performed as well as, and in
many cases better than, forensic facial examiner experts, facial reviewers and so-called
facial “super-recognizers” in a face identification task (Phillips et al., 2018).
Furthermore, the activity of units in these models can be mapped onto the activity of
neurons along the ventral visual cortex (Yamins et al., 2014), even extrapolating
across categories when learning the transformation from models to neurons
(O'Connell et al., 2018).
Despite the multiple successes of this family of models, it is clear that they
only scratch the surface of what we need to understand about visual cortex and
there is a large amount of neuroscience and behavioral data that cannot quite be
accounted for by current instantiations of these algorithms (Markov et al., 2014;
Kubilius et al., 2016; Ullman et al., 2016; Linsley et al., 2017; Tang et al., 2018; Serre,
2019). Because such models do not incorporate aspects of common sense cognitive
knowledge about the world other than what was used to label images for training,
they constitute a suitable standard benchmark and null hypothesis to contrast
against for any study that aims to investigate any type of high-level information
encoding (Kreiman, 2017).
With some exceptions, this family of models has been less concerned with the
roles of top-down influences on ventral visual cortex responses. Yet, there is
extensive data documenting how top-down signals can modulate neuronal activity
in vision. For example, spatial attention can enhance neuronal responses throughout
visual cortex (Desimone and Duncan, 1995; Reynolds and Heeger, 2009). Top-down
influences are also manifested in the form of modulation by task demands and
expectations (Gilbert and Li, 2013).
Of note, this family of models does not explicitly incorporate any type of
linguistic or semantic encoding in its design. The models are typically trained to
learn to separate images that were labeled as belonging to different classes. For
example, an investigator may label 1,000 images as pineapples, and label another
set of 1,000 images as elephants. The model may be trained via supervised learning
to separate those two groups of images and the algorithms cited above can do a
remarkable job in labeling images, including extrapolating to novel pictures of
pineapples and elephants. We next turn our attention to ask whether this ability to
assign category labels indicates any type of semantic representation.

Category-selective responses do not imply semantic encoding
In many typical neuroscience experiments, investigators may present images
containing objects belonging to different categories (Desimone et al., 1984;
Logothetis and Sheinberg, 1996; Sugase et al., 1999; Vogels, 1999; Kreiman et al.,
2000b; Freedman et al., 2001; Thomas et al., 2001; Sigala and Logothetis, 2002;
Hung et al., 2005; Quian Quiroga et al., 2005; Tsao et al., 2006; Kiani et al., 2007;
Meyers et al., 2008; Liu et al., 2009; Kourtzi and Connor, 2011; Mormann et al.,
2011). Throughout inferior temporal cortex, and even in areas of the medial
temporal lobe and pre-frontal cortex, investigators have reported selective neuronal
responses with higher firing rates elicited by some groups of stimuli compared to
others. Do these differential responses indicate any type of semantic encoding?
To be clear about what this question means, we return to the definition of
semantics as a linguistic representation concerned with meaning. We understand this
definition to imply that meaning indicates an abstract representation, beyond what
is purely captured by the stimulus features. A system or algorithm that
comprehends semantic information should be able to capture the link between
lemons and pineapples, and it should be able to discern that a tennis ball is
functionally closer to a tennis racquet, even though it looks more similar to a lemon.
To investigate whether distinct neuronal responses to different groups of
stimuli reflect semantic encoding, we turn to the null hypothesis for visual
representations outlined in the previous section, namely, computational models of
object recognition. Consider the model architecture shown in Figure 1A, consisting
of an input image conveyed to a cascade of 3 convolutional layers (Conv1-Conv3)
and a fully connected (fc) layer that classifies input images into one of 6 possible
categories. There are 6 fc units that indicate the probability that the image belongs
to each of the 6 categories. This is clearly a far cry from state-of-the-art models that
include hundreds of layers and categorize hundreds of images. The details of the
architecture are not too relevant; other architectures including state-of-the-art
computer vision models would produce similar results to the ones shown below. We
deliberately keep it simple for illustration purposes, and to provide source code that
can easily be run on any machine (see links at the end of the Chapter). This model
was trained via back-propagation using images from 6 categories in the ImageNet
dataset (Russakovsky et al., 2014): biological cells (synset number n00006484),
Labrador dogs (synset number n02099712), fire ants (synset number n02221083),
sports cars (synset number n04285008), roses (synset number n04971313), and ice
(synset number n14915184). Examples of these images are shown in the top part of
Figure 1B. The model was able to separate the stimulus categories: top-1
performance in a cross-validated set was 78% (where chance is 16.7%). A 2D
representation of the activation strength of the 6 fc units at the top of the model in
response to each of the images is shown in Figure 1B, using a dimensionality
reduction technique called tSNE which maps the 6-dimensional output vector onto
two dimensions for visualization purposes (van der Maaten and Hinton, 2008). The
colors represent the 6 different categories, which cluster into overlapping yet
distinct groups. For example, the images belonging to the “ice” category (pink)
mostly clustered on the bottom left while images belonging to the “rose” category
(blue) mostly clustered on the top in Figure 1B.
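To make "top-1 performance" and "chance is 16.7%" concrete, the sketch below turns raw activations of 6 output units into a label and scores a handful of images. The softmax step is an assumption about how activations become probabilities, and the activation values are made-up numbers rather than outputs of the model in Figure 1A.

```python
import math

CATEGORIES = ["cell", "Labrador", "fire ant", "sports car", "rose", "ice"]

def softmax(activations):
    """Convert raw fc activations into probabilities that sum to 1."""
    peak = max(activations)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def top1_label(activations):
    """Label = category of the most active fc unit."""
    best = max(range(len(activations)), key=lambda i: activations[i])
    return CATEGORIES[best]

def top1_accuracy(all_activations, true_labels):
    hits = sum(top1_label(a) == y for a, y in zip(all_activations, true_labels))
    return hits / len(true_labels)

# Hypothetical fc activations for three images (made-up numbers, not model output).
activations = [
    [2.1, -0.3, 0.4, -1.0, 0.2, 0.9],   # true label: cell
    [0.1, 0.2, -0.5, 3.0, 1.1, -0.7],   # true label: sports car
    [1.4, -0.2, 0.0, -0.9, 0.3, 1.2],   # true label: ice (the "cell" unit wins)
]
labels = ["cell", "sports car", "ice"]

print(f"top-1 accuracy = {top1_accuracy(activations, labels):.2f}")
print(f"chance = {1 / len(CATEGORIES):.1%}")
```

The third made-up image illustrates the kind of error discussed in this section: an ice image that drives the "cell" unit slightly more than the "ice" unit is mislabeled, even though both units are strongly active.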

We further examined the responses of each of the 6 fc units to all the ~8,000
images (Figure 1C). For example, in the leftmost column, each circle corresponds to
the activation of fc unit 1 in response to one of the images. The vertical dotted lines
separate images from the 6 different categories. As expected, based on the way the
model was trained, each of the fc units showed specialization and responded most
strongly to one of the image categories. For example, fc unit 1 showed higher
activation on average to the images from the “cell” category (red) compared to all
the other categories. The responses were not all-or-none and showed a considerable
degree of overlap between categories. For example, certain images of ice (last set of
images) yielded stronger activation for fc unit 1 than some of the images of cells
(first set of images; compare the two circles highlighted by the two arrows for fc
unit 1 in Figure 1C). The fc units are category units par excellence: by construction,
their activation dictates how the model will label a particular image. Yet, the
distribution of their activation patterns shows considerable overlap across
categorical borders. Even though the model does a decent job at separating the 6
image categories, the model does not seem to have any notion of semantics. A
zoomed-in picture of a pink car may well be misclassified as a rose. And the diverse
and strange patterns of cell shapes can often be misconstrued to indicate ice or ants.
The problem in terms of semantics is not with the model performance itself. Deeper
models and more extensive training can lead to higher performance. To err is
algorithmic, after all. The point here is that the model has no sense of abstract
meaning, beyond the similarity of shape features within a category represented by its
units.

We can still refer to fc unit 5 as a “rose unit” for simplicity. What we mean by
a “rose unit” is a unit that is more strongly -- but not exclusively -- activated by
images that contain visual shape features that are common in the set of roses in
ImageNet. The unit does not know anything semantic about roses and can show
high activation for images from other categories and also low activation for images
containing roses, depending on the visual shape features present in the image.

A comparison that pervades the literature is the distinction between images
labeled as “human faces” and images labeled as “houses”. Would the model in
Figure 1A be able to discriminate human faces from houses? One might imagine
that the model should not be able to distinguish human faces from houses because
the model was never trained with such images. Even if one were to try to argue that
the model has some sort of concrete, as opposed to abstract, understanding of the
meaning of cells, sports cars, roses, etc., the model should have no knowledge
whatsoever about human faces or houses. In other words, by construction, the
model has no semantic information about faces or houses. If the model can still
separate faces from houses, then any such separation cannot be based on semantic
knowledge. To evaluate whether the model in Figure 1A can separate pictures of
faces versus houses, we considered two additional categories of images: faces
(synset number n09618957), and houses (synset number n03545150). We
extracted the activation patterns of the 6 fc units of the model in response to each of
those human face and house images without any re-training (i.e., the model was
trained to label the 6 categories in Figure 1B and we merely measured the
activation in response to these two new categories). We used an SVM classifier with
a linear kernel to discriminate pictures of human faces versus houses based on the
activity of the 6 fc units. In other words, we asked whether the representation given
by the “cell unit”, the “Labrador unit”, the “fire ant unit”, the “sports car unit”, the
“rose unit”, and the “ice unit” was sufficient to separate images of human faces and
houses. The classifier achieved a performance of 86% (where chance is 50%). That
is, the pattern of activation of the 6 fc units – which are specialized to discriminate
cells, Labrador dogs, fire ants, sports cars, roses, and ice – can well separate pictures
of human faces from houses. A 2D rendering of the activation patterns of those 6 fc
units elicited by the human faces and houses is shown in Figure 1D, depicting again a clear
but certainly not perfect separation of the two categories.
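The logic of this readout analysis (freeze the network, collect the 6 fc activations for two new categories, then fit a linear classifier on those activations) can be sketched with synthetic numbers. The code below is a toy under stated assumptions: the activation means are invented, the data are Gaussian samples rather than model outputs, and the linear SVM is a simple hinge-loss subgradient-descent implementation written for this example, not the solver used for the results reported above.

```python
import random

def train_linear_svm(X, y, epochs=50, lr=0.01, reg=0.01, seed=0):
    """Linear SVM via stochastic subgradient descent on the hinge loss (labels are +1/-1)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * reg) for wj in w]  # L2 shrinkage on the weights
            if margin < 1.0:  # hinge update only when the margin is violated
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def accuracy(w, b, X, y):
    return sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)

def sample(rng, means, sd, n):
    """Draw n synthetic 6-unit activation vectors around the given means."""
    return [[rng.gauss(m, sd) for m in means] for _ in range(n)]

rng = random.Random(1)
# Made-up mean activations of 6 fc units for two categories the network never saw.
FACE_MEANS = [0.0, 0.5, 0.0, 0.5, 1.0, -0.5]
HOUSE_MEANS = [0.0, -0.5, 0.0, -1.5, -0.5, 0.5]

train_X = sample(rng, FACE_MEANS, 1.0, 200) + sample(rng, HOUSE_MEANS, 1.0, 200)
train_y = [1] * 200 + [-1] * 200
test_X = sample(rng, FACE_MEANS, 1.0, 200) + sample(rng, HOUSE_MEANS, 1.0, 200)
test_y = [1] * 200 + [-1] * 200

w, b = train_linear_svm(train_X, train_y)
print(f"held-out accuracy: {accuracy(w, b, test_X, test_y):.2f} (chance = 0.50)")
```

The readout can only exploit differences in the activation statistics of the two groups; nothing in it refers to what a face or a house is.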

A system that has no semantic knowledge about faces or houses can still
separate the two categories quite well. Given the abundant literature on studies
about faces versus houses, it is worth further scrutinizing this result. The
photographs in the ImageNet dataset are taken from the web and there are a
handful of human faces and houses included in the 6 categories chosen here. These
few human faces and houses are not uniformly distributed among those
6 categories and could introduce a small bias. Yet, removing those few human faces
and houses does not change the results. Aficionados of the idea that human faces
constitute a special group might argue that the images of Labrador dogs do contain
animal faces and therefore the “Labrador” fc unit may help the classifier separate
faces from houses. To evaluate this possibility, we computed the signal-to-noise ratio
for each of the 6 fc units in discriminating faces versus houses. The best fc unit was
unit number 4 (the one that showed stronger activation by images of sports cars),
closely followed by unit number 5 (roses). The worst fc unit was unit number 3 (fire
ants), followed by unit number 1 (cells). In other words, the Labrador fc unit is not
the one that contributes most to the separation of human faces versus houses. The
activation pattern of fc unit number 4 (sports cars) in response to human faces and
houses is shown in Figure 1E. This fc unit showed a clear separation of the two
image categories, responding more strongly to images of human faces (mean activation =
0.47 ± 1.72) compared to houses (mean activation = -1.54 ± 1.18). As pointed out
earlier in connection with Figure 1C, the distribution of responses for the two
categories clearly overlapped.

Now consider an experiment that studies the responses of actual neurons to
images of faces versus houses. Recording the activity of a neuron that behaved like
fc unit 4, in an experiment similar to the one in Figure 1E, an investigator might be
tempted to argue that the neuron represents the semantic concept of faces. Yet fc
unit 4 is clearly more strongly tuned to images of sports cars (Figure 1C, fourth
subplot): the mean activation in response to sports cars was 4.59 ± 2.27, which is
about ten times larger than the mean activation in response to human faces (0.47 ±
1.72). There is nothing particularly special about this unit; in fact, all fc units except
unit number 3 (fire ants) showed a statistically significant differentiation between
images of human faces versus houses. To further dispel any doubts that the
Labrador images are playing any role here, we ran a separate simulation where
we trained the same architecture in Figure 1A from scratch with only 2 fc output
units to discriminate images of desks (synset number n03179701) versus images of
fried rice (synset number n07868340). The algorithm achieved an accuracy of 98%
(chance = 50%). These two fc units could be described as a “desk unit” and a “fried
rice unit”. The pattern of activation of those two fc units in response to images of
human faces and houses (without any retraining of the network) was able to
distinguish them with 73% accuracy. The desk unit showed an activation of 2.49 ±
1.55 in response to images of human faces and an activation of 0.98 ± 1.11 in
response to images of houses, clearly differentiating the two categories. The fried
rice unit showed an activation of -2.32 ± 1.37 in response to images of human faces
versus an activation of -1.13 ± 1.11 for images of houses, clearly differentiating
between the two categories. In sum, measuring higher activation for pictures of one
category versus others (e.g. sports cars versus roses or faces versus houses), in and
of itself, should not be taken to imply any type of semantic representation.
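The activation statistics quoted above can be turned into a single separation index per unit. The text does not spell out how its signal-to-noise ratio was computed, so the sketch below uses the discriminability index d' (difference of means divided by the pooled standard deviation) as an assumed stand-in, and it assumes that the "±" values are standard deviations.

```python
import math

def d_prime(mean_a, sd_a, mean_b, sd_b):
    """Separation of two response distributions in units of their pooled SD."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return abs(mean_a - mean_b) / pooled_sd

# (mean, sd) of activation for faces and houses, as quoted in the text.
units = {
    "sports car unit (fc unit 4)": ((0.47, 1.72), (-1.54, 1.18)),
    "desk unit": ((2.49, 1.55), (0.98, 1.11)),
    "fried rice unit": ((-2.32, 1.37), (-1.13, 1.11)),
}

for name, ((mf, sf), (mh, sh)) in units.items():
    print(f"{name}: d' = {d_prime(mf, sf, mh, sh):.2f}")
```

Under these assumptions all three units separate faces from houses by roughly one pooled standard deviation, which fits the description of distributions that differ reliably while still overlapping.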

One may still want to maintain that the fc units in Figure 1A encode some
flavor of semantics. After all, a thresholded version of the activity of those units is
sufficient to provide a categorical image label. Furthermore, those units are capable
of a certain degree of abstraction in the sense that they can assign novel images that
the model has never seen before to those 6 categories. Such a version of semantics
could perhaps be best described as concrete visual shape semantics, as opposed to
some abstract version of semantics that transcends visual features.

What are the preferred stimuli for visual neurons?
What do those fc units in Figure 1A actually want? That is, what types of
images would trigger high activation in those fc units? We know already from
Figure 1C that images of cells lead to high activation in fc unit 1, images of
Labradors lead to high activation in fc unit 2, etc. Therefore, it seems reasonable to
argue that fc unit 1 “wants” images of cells, fc unit 2 “wants” images of Labradors
and so on. One might even go on to describe fc unit 2 as a “Labrador unit”, as we
have been doing. But is it possible that there exist other images that lead to even
higher activation of those fc units? To investigate this question, we used the Alexnet
model (Krizhevsky et al., 2012), pre-trained on the ImageNet dataset (Russakovsky
et al., 2014). We considered two of the output units (layer labeled fc8 in Alexnet).
The same analyses can be performed for any other layer but we focus on the
classification layer because this is the stage that would presumably contain the
highest degree of categorical information. For illustration purposes, we show the
activation of fc8 unit number 209 (Figure 2A) and fc8 unit number 527 (Figure 2B)
in response to four categories of stimuli: Labradors, fire ants, desks and sports cars.
As expected based on the way that the model was trained, the “Labrador unit” (unit
209) showed larger activation for images containing Labradors compared to the
other images (Figure 2A). Similarly, the “Desk unit” (unit 527) showed larger
activation for images containing desks compared to the other images (Figure 2B).
This is the equivalent of the results presented in Figure 1C. Next, we used the
“DeepDream” algorithm to generate images that lead to high activation for those fc
units (Mordvintsev et al., 2015). Essentially, the DeepDream algorithm uses the
network in reverse mode. Instead of going from pixels to the feature representation
in a given unit in the network, DeepDream goes from the feature representation in a
given unit back to pixels, generating images as its output, and optimizing those
images in each iteration to elicit a high activation in the chosen unit. Using
DeepDream to generate images that lead to high activation for the “Labrador unit”
produced the image shown in Figure 2C. The activation of the “Labrador
unit” in response to the image in Figure 2C is shown by
the green and blue symbols in Figure 2A, depending on the image size (the blue symbol
corresponds to the same exact size as all the other images). The image in Figure 2C
thus triggered higher activation than any of the 1,846 photographs of Labradors
(even though those photographs were used to train the network). The image in
Figure 2C could well be described using words by a human observer as containing
multiple renderings of distorted, sketchy, blurred, Labrador-like patches. Similar
results are shown for the “Desk unit” in Figures 2B and 2D. After some squinting, it
is also possible to discern some resemblance to desk-like features in Figure 2D, but
it is less obvious. In sum, what fc units want is an image rendering complex features,
features that are not easily mapped onto English words, though they certainly
resemble aspects of the actual photographs used to train the algorithms. Those fc
units respond most strongly to images that cannot be obviously predicted by the
labels assigned to them. While one may still want to refer to those units as
“Labrador units” and “Desk units”, it is clear that there are many images that would
not be labeled as Labradors or desks by any human observer, and yet they trigger
higher activation in those units, even higher than real-world photographs
containing those categories.
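The idea of running the network "in reverse" can be illustrated with a toy version of activation maximization: hold the unit fixed and perform gradient ascent on the pixels. The sketch below uses a hypothetical unit whose activation is a plain weighted sum of 8 pixels, so the gradient is simply the weight vector. It stands in for, and is much simpler than, DeepDream applied to a deep network, and the "photographs" are random vectors rather than real images.

```python
import random

# Hypothetical unit: activation is a weighted sum of 8 "pixels" in [0, 1].
WEIGHTS = [1.0, -0.5, 0.8, -1.2, 0.3, 0.6, -0.7, 0.9]

def activation(image):
    return sum(w * x for w, x in zip(WEIGHTS, image))

def gradient(image):
    """d(activation)/d(pixel); for this linear unit it is just the weights."""
    return list(WEIGHTS)

def maximize_activation(start, steps=100, step_size=0.1):
    """Gradient ascent on the pixels, clipped to the valid range [0, 1]."""
    image = list(start)
    for _ in range(steps):
        grad = gradient(image)
        image = [min(1.0, max(0.0, x + step_size * g)) for x, g in zip(image, grad)]
    return image

rng = random.Random(0)
# Stand-ins for "photographs": random images, not real data.
photographs = [[rng.random() for _ in WEIGHTS] for _ in range(1000)]
best_photo_activation = max(activation(p) for p in photographs)

synthetic = maximize_activation([0.5] * len(WEIGHTS))
print(f"best photograph: {best_photo_activation:.2f}")
print(f"synthetic image: {activation(synthetic):.2f}")
```

Even in this toy setting the optimized input drives the unit harder than any of the sampled images, because the search is guided by the unit itself rather than by the distribution of available pictures.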
To summarize, typical neuroscience experiments are limited by how long it
is possible to record from a neuron. Investigators must make hard choices about
which stimuli to present. There is a rich and exciting literature with many
experiments showing that neuronal responses can discriminate among different
categories of stimuli. As illustrated here by the computational models in Figures 1
and 2, these types of responses do not imply any type of semantic encoding. Simple
computational models can also yield responses that distinguish different categories
(Figure 1B), those responses are not all-or-none (Figure 1C), category
differentiation can be demonstrated using units that are known to be clearly
semantically unrelated to those categories (Figures 1D-E), and complex images that
do not directly map onto any semantic meaning can trigger higher activation in
putative categorical units (Figure 2).

Models versus real brains

These deep convolutional bottom-up computational models cast doubt on
claims about semantic encoding based on category-selective responses and provide
a null hypothesis to compare against. Yet, these computational models are a far cry
from real biological systems in all sorts of ways and therefore it is fair to question to
what extent we can extrapolate conclusions from these computational models to the
types of representations manifested by real neurons. Advocates of semantics would
rightly argue that the exercises in the previous section merely reflect toy models
and that it remains unclear whether the same observations apply to actual neuronal
recordings from real brains. The observation that these models can reproduce
certain aspects of selectivity in neurophysiological recordings does not imply that
one can rule out the presence of semantic information in neural data.
489
Althoughdeepconvolutionalmodelsarestillratherprimitiveandfailto490
incorporatemuchofthearchitectureandfunctionofbiologicalcircuits,recent491
studieshaveshownthatthesemodelscanexplainarelativelylargefractionofthe492
varianceinneuronalresponses(Yaminsetal.,2014;Maheswaranathanetal.,2018).493
Infact,category-selectiveresponsesfrombiologicalneuronsalsoshowthetypeof494
propertiesillustratedinFigure1(e.g.,(Vogels;Kreimanetal.;Sigalaand495
Logothetis;Hungetal.))andthereforethesamecautionarynotesshouldbeusedin496
interpretingneuronalselectivity.Furthermore,recentworkhasshownthatitis497
possibletogenerateeffectivestimuliforbiologicalneuronsinafashionsimilarto498
theprocedureillustratedinFigure2(Ponceetal.,2019).Theauthorsuseda499
proceduresimilartotheDeepDreamalgorithmdiscussedearliertogenerateimages500
whilerecordingneuronalresponsesusedtoguidetheevolutionofimagestriggering501
highfiringrates.Theresultingsetofsyntheticimagestriggeredactivationin502
biologicalneuronsthatwasasstrongasorinmanycasesevenstrongerthannatural503
stimuli,similartothesyntheticimagescreatedinFigure2.Inotherwords,for504
biologicalneuronsalongtheventralvisualcortex,thetypeofstimulithattrigger505
strongestactivationarenotrealworldobjectswithsemanticmeanings,butrather506
complexshapeswithfeaturessharedwithrealworldobjectsbutdistinctand507
abstractandwithoutanyobvioussemanticmeaning(Ponceetal.,2019).508
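
The closed-loop idea described above, in which recorded responses guide the evolution
of images, can be sketched with a simple evolutionary search. This is an illustration
under stated assumptions and not the authors' code: `toy_neuron` is a hypothetical
stand-in for a recorded neuron, the four-number "code" stands in for the input to an
image generator, and the population size, number of survivors, and mutation size are
arbitrary choices.

```python
import math
import random


def toy_neuron(code):
    """Hypothetical stand-in for a recorded neuron: firing rate (spikes/s)
    peaks when the image code matches a hidden preferred pattern."""
    preferred = [0.5, -1.0, 2.0, 0.0]
    squared_distance = sum((c - p) ** 2 for c, p in zip(code, preferred))
    return 100.0 * math.exp(-0.1 * squared_distance)


def evolve(respond, n_dims=4, pop_size=20, n_generations=30, n_keep=5,
           sigma=0.3, seed=0):
    """Gradient-free search for codes that drive `respond` strongly."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(n_generations):
        ranked = sorted(population, key=respond, reverse=True)
        best_per_generation.append(respond(ranked[0]))
        parents = ranked[:n_keep]  # the strongest codes survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Recombine the two parents element by element, then mutate.
            child = [rng.choice(pair) + rng.gauss(0.0, sigma)
                     for pair in zip(mother, father)]
            children.append(child)
        population = parents + children
    best = max(population, key=respond)
    return best, best_per_generation


best_code, history = evolve(toy_neuron)
print(f"best response: first generation {history[0]:.1f}, "
      f"last generation {history[-1]:.1f} spikes/s")
```

The search only needs the response to each candidate, not the gradient of the
response, which is what makes this kind of loop applicable to a biological neuron.
Because the strongest codes are carried over unchanged, the best response can never
decrease from one generation to the next.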

Absence of compelling evidence for semantic encoding does not constitute
evidence of absence of semantics. The fact that we cannot conclude that there is
abstract semantic information by observing the responses to a given category
versus others in this type of experiment certainly should not be interpreted to
imply that semantic information does not exist. The point in the previous section is
that merely showing differential patterns of activity between two (or more)
categories of stimuli reflects the choice of stimuli and the way the images were
gathered more than any mysterious notion of abstract
meaning. The family of deep convolutional network models should be used as a null
hypothesis for any statement concerning the representation of abstract meaning in
experiments on visual images. We can thus define abstract semantic encoding as
visual discriminations that cannot be accounted for by the family of null
computational models of visual recognition.
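
One simple way to make this definition operational is sketched below: a neuron's
category discrimination counts as potential evidence for semantics only if it exceeds
what the null-model units achieve on the same images. The discriminability index d',
the "better than every null unit" criterion, and all the numbers are assumptions made
for illustration; a real analysis would also need cross-validation and a proper
statistical test across the full family of models.

```python
from math import sqrt
from statistics import mean, variance


def d_prime(responses_a, responses_b):
    """Discriminability between two sets of responses (pooled-variance d')."""
    pooled_sd = sqrt((variance(responses_a) + variance(responses_b)) / 2.0)
    return abs(mean(responses_a) - mean(responses_b)) / pooled_sd


def exceeds_null(neuron, null_units):
    """True if the neuron separates the categories better than every null unit."""
    best_null = max(d_prime(a, b) for a, b in null_units)
    return d_prime(*neuron) > best_null


# Hypothetical activations of two null-model units to category A and B images.
null_units = [
    ([5, 6, 5, 6], [1, 1, 2, 2]),
    ([3, 4, 3, 4], [3, 3, 4, 4]),
]
# Hypothetical firing rates (spikes/s) of two recorded neurons.
neuron_1 = ([10, 12, 11, 13], [2, 3, 1, 2])
neuron_2 = ([7, 9, 8, 8], [5, 6, 6, 7])

for name, neuron in (("neuron 1", neuron_1), ("neuron 2", neuron_2)):
    print(name, "exceeds the null units:", exceeds_null(neuron, null_units))
```

In this toy example, neuron 2 is category-selective, yet a null unit discriminates
the same two categories better, so its selectivity alone would not count as evidence
for semantic encoding.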

Rather than discussing presence or absence of semantic information in a
binary fashion, it is probably more useful to consider different levels of abstraction
and invariance. At the bottom level is the notion of template matching, i.e., a neuron
that responds when a specific combination of pixels is shown within its receptive
field. Increasing the degree of invariance, we can consider a neuron that responds
with approximately the same intensity when the stimulus shows small changes, such
as a complex cell in primary visual cortex responding to an optimally oriented
bar at different positions within the receptive field. Increasing the degree of
abstraction, we can consider neurons in inferior temporal cortex that show
tolerance to some amount of 2D rotation of their preferred stimuli and neurons that
respond to visually similar exemplars from a given category, such as the ones
modeled in the previous section. A significant step upwards in invariance would be
to find neurons that show a similar response to a tennis ball, a tennis racquet, a
tennis court, a tennis skirt and the word Wimbledon. To the best of my knowledge,
there is no evidence yet for such a representation.

In search of abstraction in the brain

What type of experimental data would provide evidence in favor of abstract
semantic information? Returning to the examples used in the definition of
semantics, it would be nice to show neuronal responses that are similar for a tennis
ball and a tennis racquet and yet very different between a tennis ball and a lemon. In
other words, it would be nice to show (i) images that have a similar visual
appearance (e.g., a tennis ball and a lemon) yet trigger very different
responses, and (ii) images that are visually dissimilar (e.g., a tennis ball and a tennis
racquet) yet trigger very similar responses.
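
Criteria (i) and (ii) can be written down as a small check on one triplet of images.
The sketch below is a toy under stated assumptions: the feature vectors stand in for
the visual description provided by a null model, the firing rates are invented, and
the two cut-off values are arbitrary.

```python
from math import sqrt


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def response_difference(rate_a, rate_b):
    """Absolute difference in firing rate, normalized by the larger rate."""
    return abs(rate_a - rate_b) / max(rate_a, rate_b)


# Hypothetical null-model feature vectors (visual appearance).
features = {
    "tennis ball": [0.9, 0.8, 0.1],
    "lemon": [0.8, 0.9, 0.2],
    "tennis racquet": [0.1, 0.2, 0.9],
}
# Hypothetical firing rates (spikes/s) of one neuron.
rates = {"tennis ball": 40.0, "lemon": 5.0, "tennis racquet": 38.0}


def dissociates(anchor, look_alike, associate, sim_cut=0.5, diff_cut=0.5):
    """Check criteria (i) and (ii) for one triplet of images."""
    looks_similar = cosine_similarity(features[anchor], features[look_alike]) > sim_cut
    looks_different = cosine_similarity(features[anchor], features[associate]) < sim_cut
    responds_differently = response_difference(rates[anchor], rates[look_alike]) > diff_cut
    responds_similarly = response_difference(rates[anchor], rates[associate]) < diff_cut
    criterion_i = looks_similar and responds_differently
    criterion_ii = looks_different and responds_similarly
    return criterion_i and criterion_ii


print(dissociates("tennis ball", "lemon", "tennis racquet"))
```

The key design choice is that visual similarity is measured by the model and not by
eye, so that the dissociation between appearance and response is quantitative.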

An elegant step in this direction was taken by generating morphs
between synthetic images of cats and dogs and training monkeys to behaviorally
separate them (Freedman et al., 2001). The authors could titrate the visual
similarity of the stimuli and separate purely visual shape features from the
task-relevant categorical differentiation between them. The authors described the
activity of neurons in pre-frontal cortex that correlated with the categorical
distinctions rather than the visual appearance distinctions between stimuli. While
pre-frontal cortex neurons better reflected such task-dependent abstract
information, neurons in inferior temporal cortex also showed evidence for encoding
the categorical boundaries (Meyers et al., 2008). Furthermore, monkeys could be
retrained to change their definition of the categorical boundaries, and pre-frontal
cortex neurons altered their tuning to reflect the new categorical definitions
imposed by the task demands (Cromer et al., 2010).

Another set of intriguing results comes from neural recordings in human
epilepsy patients. Some patients suffering from pharmacologically intractable
epilepsy are implanted with electrodes as part of the clinical procedure for potential
surgical resection of the seizure focus. This clinical situation provides a
unique opportunity to record the spiking activity of neurons in the human brain,
particularly in areas of the medial temporal lobe including the hippocampus,
entorhinal cortex, parahippocampal gyrus and the amygdala (Engel et al., 2005;
Kreiman, 2007; Mukamel and Fried, 2012; Fried et al., 2014). This line of research
has generated observations leading to claims about categorical invariance (Kreiman
et al., 2000b; Mormann et al., 2011). Responses to specific individuals or to
specific landmarks have also been reported (Quian Quiroga et al., 2005). These studies are
subject to the same type of caveats highlighted in the previous section. However, in
several of those cases, the invariant responses were triggered by sets of images that
were very different from each other based on visual inspection (there was no
quantitative documentation of visual shape similarity based on computational
models). The subjective visual dissimilarity of those stimuli suggests that it would
be difficult to account for those responses purely based on the type of visual
similarity described by the null family of standard models. Particularly striking are
the cases where the neurons responded to the image of a particular individual as
well as to a text version of their name (Quian Quiroga et al., 2005), and cases where the
neurons responded in a selective fashion during visual imagery in the absence of
any visual input (Kreiman et al., 2000a; Gelbard-Sagiv et al., 2008). The activity of
human medial temporal lobe neurons taken as a whole therefore reflects a high
degree of abstraction. Interestingly, these responses tend to occur rather late in the
game, arising somewhere between 200 and 300 milliseconds after stimulus onset
depending on the specific area, which is at the very least 50-150 milliseconds after
the selective visual responses described in both monkey (Eskandar et al., 1992;
Keysers et al., 2001; Hung et al., 2005) and human (Liu et al., 2009) inferior
temporal cortex. Additionally, both humans (Thorpe et al., 1996) and monkeys
(Fabre-Thorpe et al., 1998) can behaviorally categorize images well before the onset
of those responses. Thus, these medial temporal lobe responses are more likely to
reflect the encoding of emotional information and the formation of episodic
memories (both of which are likely to depend on semantic encoding of information),
rather than visual categorization per se.

According to the broad definition of semantics as aspects of neuronal
responses that cannot be accounted for by the null standard models of visual
recognition, multiple studies have shown task-dependent modulation of
neurophysiological responses throughout visual cortex. For example, in an elegant
study, neurons in primary visual cortex responded differently to the same stimulus
within their receptive field depending on whether the information was relevant or not
for the current task (Li et al., 2004). Task-dependent expectations can also modulate
responses all the way down to V1 neurons (Gilbert and Li, 2013). Satisfying such
task demands can be considered an important aspect of abstraction in the sense of
considering the incoming inputs in the context of current goals.

Semantics and the least common sense
Common sense, or general semantic knowledge about the world, is hard to
find. The definition of semantics including linguistic-like information, at least taken
literally, suggests that we should be looking for a high level of abstraction, beyond
what can be described by current visual object recognition models. One practical
obstacle to tackling semantics is that it is difficult to study language in non-human
animals. Strangely, there are even investigators who have claimed that language is
unique to humans (Berwick and Chomsky, 2015). Additionally, there is minimal data
on single neuron responses in language areas in the human brain. One may imagine
that any linguistic information from medial temporal lobe structures, from
task-dependent representations in pre-frontal cortex, or from language areas
may very well propagate back to ventral visual cortical areas, and it might be
possible to discern those top-down semantic influences in visual cortex.

In the spirit of stimulating further discussions and future work, we conclude
with a brief list of desiderata for experiments and models to further our understanding
of what visual cortical neurons really want and the role of semantic information.
[1] Computational models should play an integral part in the design of visual
experiments to elucidate what neurons want. The family of deep convolutional
network models provides a reasonable null hypothesis to start with. The models can
be used to quantify what fraction of neuronal response variability can be explained,
but also to generate images and design the experiments themselves. As an example
of this approach, Figure 2 illustrates a way in which a model can generate images
that trigger high activation in its units and it will be interesting to further evaluate
this line of reasoning in neuronal recordings.
[2] Given the limited amount of data that we can acquire for a given neuron despite
laborious and heroic experiments, we should be open to the idea that we have yet to
uncover what neurons truly want. Our understanding and description of the tuning
properties of neurons along ventral visual cortex may have to be significantly
revisited. Two important recent developments may accelerate progress: the advent
of sophisticated computational models that can provide quantitative hypotheses for
testing beyond classical experimental designs, and the experimental possibility of
holding neuronal recordings for prolonged periods of time (McMahon et al., 2014).
[3] Task demands seem to play a critical role in dynamically shaping neuronal
responses beyond the dimensions that are purely dictated by sensory inputs. As
one example of a recent surprising finding in this direction, neurons in rodent
primary visual cortex are strongly modulated not only by the visual inputs but also
by the speed at which the animal is moving (Niell and Stryker, 2010). There are
plenty of opportunities to further investigate how top-down modulation can
dynamically route information according to the current behavioral goals.
[4] To uncover semantic encoding, we would like to ensure that the neuronal
responses cannot be explained by the null family of models. A neuron encoding
semantic information should show a similar response to images that share meaning
but which have no similarity in their appearance. Additionally, such a neuron should
show a different response to images that are visually similar but do not share the
same meaning.
[5] Another important question for future research is to elucidate the neuronal
mechanisms of how abstraction can be learnt. There has been extensive work
showing that visual cortical neurons can change their responses as a consequence of
associations formed between different stimuli (Miyashita, 1988; Higuchi and Miyashita,
1996; Messinger et al., 2001; Suzuki, 2007). Extending such mechanisms might lead
to the formation of semantic links such as those established by the statistical
co-occurrences of tennis balls, racquets, courts, and skirts.
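
The notion of links established by statistical co-occurrence, raised in desideratum
[5], can be made concrete with a small counting sketch. It is only an illustration of
the statistics and not a model of a neuronal learning rule: the "scenes" are
hypothetical, and the overlap measure used as association strength (the Jaccard
index) is an assumed choice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "scenes": sets of objects that appeared together.
scenes = [
    {"tennis ball", "racquet", "court"},
    {"tennis ball", "racquet", "skirt", "court"},
    {"racquet", "court"},
    {"lemon", "knife", "plate"},
    {"lemon", "plate"},
    {"tennis ball", "court"},
]

object_counts = Counter()
pair_counts = Counter()
for scene in scenes:
    object_counts.update(scene)
    pair_counts.update(frozenset(pair) for pair in combinations(sorted(scene), 2))


def association(a, b):
    """Fraction of scenes containing a or b in which both appear (Jaccard index)."""
    both = pair_counts[frozenset((a, b))]
    either = object_counts[a] + object_counts[b] - both
    return both / either


for other in ("racquet", "court", "lemon"):
    print(f"tennis ball ~ {other}: {association('tennis ball', other):.2f}")
```

In these toy scenes the tennis ball ends up linked to the racquet and the court but
not to the lemon, even though the ball and the lemon look far more alike; the link
comes from shared context and not from shared appearance.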

Data availability
All the code used to generate Figures 1 and 2 is available for download from:
http://klab.tch.harvard.edu/resources/Categorization_Semantics.html
We cannot provide the images used in the experiments in Figures 1 and 2. However,
we provide the synset identification numbers, which can be used to freely download
all the images from the following site:
http://image-net.org/

Figure Captions

Figure 1
A. Simple multi-layer convolutional network consisting of an input layer, 3
convolutional layers and a fully connected classification layer that classifies images
into one of 6 possible categories: cells, Labradors, fire ants, sports cars, roses and ice
(example images from those categories are shown in part B). The network was
trained via backpropagation to optimize classification of images belonging to those
6 categories. B. Dimensionality reduction using stochastic embedding (van der
Maaten and Hinton, 2008) of the activation pattern for the 6 fc layer units from part
A in response to each of the images. The color of each dot reflects the image
category. C. Activation strength for each of the 6 fc units in response to all the
images. The image categories are separated by vertical dotted lines. The images
from the category eliciting the strongest activation for each of the fc units are shown
in color, with the colors matching the ones in part B (e.g., fc unit 1 showed stronger
activation to images corresponding to the cell category). D. Dimensionality
reduction using stochastic embedding of the activation pattern for the 6 fc layer
units from part A in response to images of faces (red) or houses (blue). The network
was not trained to recognize either faces or houses. Yet, a support vector machine
classifier with a linear kernel could separate the two categories (empty circles
represent wrongly classified images and filled circles represent correctly classified
images). E. Activation pattern of fc unit number 4 (the one showing strongest
responses to sports cars) in response to all the images containing faces (red) or
houses (blue). The horizontal dashed line indicates the average responses. All the
parameters and source code to generate these images are available in
http://klab.tch.harvard.edu/resources/Categorization_Semantics.html

Figure 2
A. Activation of unit corresponding to channel 209 in layer fc8 in Alexnet
(Krizhevsky et al., 2012) in response to 1846 images of Labrador dogs (red circles),
972 images of ants, 1366 images of desks, and 1165 images of sports cars (black
circles). The vertical dotted lines separate the different image categories. This
neural network was trained via backpropagation using 1000 image categories,
including the 4 categories shown here. The channel shown here corresponds to the
classification unit for the label “Labrador dog”; as expected, activation for those
images was generally larger than activation for other images. B. Same as A for unit
corresponding to channel 527 (Desk). C. Image generated using DeepDream for
Alexnet channel 209 in layer fc8 (Mordvintsev et al., 2015). D. Image generated
using DeepDream for Alexnet channel 527 in layer fc8. The images in C-D led to the
activation denoted by the green triangles in A-B. Upon resizing the images in C-D to
be the same size as all the other images in parts A-B, the corresponding activations
are the ones shown by the blue squares in A-B. All the parameters and source code
to generate these images are available in
http://klab.tch.harvard.edu/resources/Categorization_Semantics.html

References

Allison T, Ginter H, McCarthy G, Nobre AC, Puce A, Luby M, Spencer DD (1994) Face recognition in human extrastriate cortex. Journal of Neurophysiology 71:821-825.
Berwick R, Chomsky N (2015) Why Only Us: Language and Evolution. Cambridge, MA: MIT Press.
Carlson ET, Rasquinha RJ, Zhang K, Connor CE (2011) A sparse object coding scheme in area V4. Current Biology 21:288-293.
Chapman B, Stryker M, Bonhoeffer T (1996) Development of Orientation Preference Maps in Ferret Primary Visual Cortex. Journal of Neuroscience 16:6443-6453.
Connor CE, Brincat SL, Pasupathy A (2007) Transformation of shape information in the ventral pathway. Current opinion in neurobiology 17:140-147.
Coogan T, Burkhalter A (1993) Hierarchical Organization of Areas in Rat Visual Cortex. The Journal of Neuroscience 13:3749-3772.
Cromer JA, Roy JE, Miller EK (2010) Representation of multiple, independent categories in the primate prefrontal cortex. Neuron 66:796-807.
Deco G, Rolls ET (2004) Computational Neuroscience of Vision. Oxford: Oxford University Press.
Desimone R, Duncan J (1995) Neural mechanisms of selective visual attention. Annual Review of Neuroscience 18:193-222.
Desimone R, Albright T, Gross C, Bruce C (1984) Stimulus-selective properties of inferior temporal neurons in the macaque. Journal of Neuroscience 4:2051-2062.
DiCarlo JJ, Zoccolan D, Rust NC (2012) How does the brain solve visual object recognition? Neuron 73:415-434.
Engel AK, Moll CK, Fried I, Ojemann GA (2005) Invasive recordings from the human brain: clinical insights and beyond. Nat Rev Neurosci 6:35-47.
Eskandar EN, Richmond BJ, Optican LM (1992) Role of inferior temporal neurons in visual memory. I. Temporal encoding of information about visual images, recalled images, and behavioral context. J Neurophysiol 68:1277-1295.
Fabre-Thorpe M, Richard G, Thorpe SJ (1998) Rapid categorization of natural images by rhesus monkeys. Neuroreport 9:303-308.
Felleman DJ, Van Essen DC (1991) Distributed hierarchical processing in the primate cerebral cortex. Cerebral cortex 1:1-47.
Freedman D, Riesenhuber M, Poggio T, Miller E (2001) Categorical representation of visual stimuli in the primate prefrontal cortex. Science 291:312-316.
Freeman J, Ziemba CM, Heeger DJ, Simoncelli EP, Movshon JA (2013) A functional and perceptual signature of the second visual area in primates. Nature neuroscience 16:974-981.
Fried I, Cerf M, Rutishauser U, Kreiman G (2014) Single neuron studies of the human brain. Probing cognition. Cambridge, MA: MIT Press.
Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36:193-202.
Gallant JL, Braun J, Van Essen DC (1993) Selectivity for polar, hyperbolic, and Cartesian gratings in macaque visual cortex. Science 259:100-103.
Gelbard-Sagiv H, Mukamel R, Harel M, Malach R, Fried I (2008) Internally Generated Reactivation of Single Neurons in Human Hippocampus During Free Recall. Science.
Ghose GM, Maunsell JH (2008) Spatial summation can explain the attentional modulation of neuronal responses to multiple stimuli in area V4. Journal of Neuroscience 28:5115-5126.
Gilbert CD, Li W (2013) Top-down influences on visual processing. Nat Rev Neurosci 14:350-363.
Gross CG (1994) How inferior temporal cortex became a visual area. Cerebral cortex 5:455-469.
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. arXiv 1512.03385.
He K, Gkioxari G, Dollar P, Girshick R (2018) Mask R-CNN. In: IEEE Trans. Pattern Anal. Mach Intell.
Hegde J, Van Essen DC (2007) A comparative study of shape representation in macaque visual areas v2 and v4. Cerebral cortex 17:1100-1116.
Higuchi S, Miyashita Y (1996) Formation of mnemonic neuronal responses to visual paired associates in inferotemporal cortex is impaired by perirhinal and entorhinal lesions. PNAS 93:739-743.
Hubel D (1981) Evolution of ideas on the primary visual cortex, 1955-1978: a biased historical account. In: Nobel Lectures.
Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of physiology 160:106-154.
Hubel DH, Wiesel TN (1968) Receptive fields and functional architecture of monkey striate cortex. The Journal of physiology 195:215-243.
Hung CC, Carlson ET, Connor CE (2012) Medial axis shape coding in macaque inferotemporal cortex. Neuron 74:1099-1113.
Hung CP, Kreiman G, Poggio T, DiCarlo JJ (2005) Fast Read-out of Object Identity from Macaque Inferior Temporal Cortex. Science 310:863-866.
Isik L, Singer J, Madsen JR, Kanwisher N, Kreiman G (2017) What is changing when: Decoding visual information in movies from human intracranial recordings. Neuroimage: In Press.
Jones JP, Stepnoski A, Palmer LA (1987) The two-dimensional spectral structure of simple receptive fields in cat striate cortex. J Neurophysiol 58:1212-1232.
Kanwisher N, McDermott J, Chun MM (1997) The fusiform face area: a module in human extrastriate cortex specialized for face perception. Journal of Neuroscience 17:4302-4311.
Keysers C, Xiao DK, Foldiak P, Perret DI (2001) The speed of sight. Journal of Cognitive Neuroscience 13:90-101.
Kiani R, Esteky H, Mirpour K, Tanaka K (2007) Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. J Neurophysiol 97:4296-4309.
Kobatake E, Tanaka K (1994) Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. J Neurophysiol 71:856-867.
Koch C (1999) Biophysics of Computation. New York: Oxford University Press.
Kourtzi Z, Connor CE (2011) Neural Representations for Object Perception: Structure, Category, and Adaptive Coding. Annu Rev Neurosci 34:45-67.
Kreiman G (2002) On the neuronal activity in the human brain during visual recognition, imagery and binocular rivalry. In: Biology. Pasadena: California Institute of Technology.
Kreiman G (2004) Neural coding: computational and biophysical perspectives. Physics of Life Reviews 1:71-102.
Kreiman G (2007) Single neuron approaches to human vision and memories. Current opinion in neurobiology 17:471-475.
Kreiman G (2017) A null model for cortical representations with grandmothers galore. Language, Cognition and Neuroscience 32:274-285.
Kreiman G, Koch C, Fried I (2000a) Imagery neurons in the human brain. Nature 408:357-361.
Kreiman G, Koch C, Fried I (2000b) Category-specific visual responses of single neurons in the human medial temporal lobe. Nature Neuroscience 3:946-953.
Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet Classification with Deep Convolutional Neural Networks. In: NIPS. Montreal.
Kubilius J, Bracci S, Op de Beeck HP (2016) Deep Neural Networks as a Computational Model for Human Shape Sensitivity. PLoS Comput Biol 12:e1004896.
Kuffler S (1953) Discharge patterns and functional organization of mammalian retina. Journal of Neurophysiology 16:37-68.
Leopold DA, Bondar IV, Giese MA (2006) Norm-based face encoding by single neurons in the monkey inferotemporal cortex. Nature 442:572-575.
Lesica NA, Stanley GB (2004) Encoding of natural scene movies by tonic and burst spikes in the lateral geniculate nucleus. Journal of Neuroscience 24:10731-10740.
Li W, Piech V, Gilbert CD (2004) Perceptual learning and top-down influences in primary visual cortex. Nature neuroscience 7:651-657.
Linsley D, Eberhardt S, Sharma T, Gupta P, Serre T (2017) What are the visual features underlying human versus machine vision? In: IEEE ICCV Workshop on the Mutual Benefit of Cognitive and Computer Vision.
Liu H, Agam Y, Madsen JR, Kreiman G (2009) Timing, timing, timing: Fast decoding of object information from intracranial field potentials in human visual cortex. Neuron 62:281-290.
Logothetis NK, Sheinberg DL (1996) Visual object recognition. Annual Review of Neuroscience 19:577-621.
Maheswaranathan N, Kastner DB, Baccus SA, Ganguli S (2018) Inferring hidden structure in multilayered neural circuits. PLoS Comput Biol 14:e1006291.
Markov NT et al. (2014) A Weighted and Directed Interareal Connectivity Matrix for Macaque Cerebral Cortex. Cerebral cortex 24:17-36.
McMahon DB, Jones AP, Bondar IV, Leopold DA (2014) Face-selective neurons maintain consistent visual responses across months. Proceedings of the National Academy of Sciences of the United States of America 111:8251-8256.
Mel B (1997) SEEMORE: Combining color, shape and texture histogramming in a neurally inspired approach to visual object recognition. Neural Computation 9:777.
Messinger A, Squire LR, Zola SM, Albright TD (2001) Neuronal representations of stimulus associations develop in the temporal lobe during learning. Proceedings of the National Academy of Sciences of the United States of America 98:12239-12244.
Meyers E, Freedman D, Kreiman G, Miller E, Poggio T (2008) Dynamic Population Coding of Category Information in ITC and PFC. Journal of Neurophysiology 100:1407-1419.
Miyashita Y (1988) Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature 335:817-820.
Mordvintsev A, Olah C, Tyka M (2015) DeepDream - a code example for visualizing Neural Networks. In: Google Research. Mountain View: Google.
Mormann F, Dubois J, Kornblith S, Milosavljevic M, Cerf M, Ison M, Tsuchiya N, Kraskov A, Quiroga RQ, Adolphs R, Fried I, Koch C (2011) A category-specific response to animals in the right human amygdala. Nature neuroscience 14:1247-1249.
Movshon JA, Newsome WT (1992) Neural Foundations of Visual Motion Perception. Current Directions in Psychological Science 1:35-39.
Mukamel R, Fried I (2012) Human intracranial recordings and cognitive neuroscience. Annual review of psychology 63:511-537.
Nassi J, Gomez-Laberge C, Kreiman G, Born R (2014) Corticocortical feedback increases the spatial extent of normalization. Frontiers in Systems Neuroscience 8.
Niell CM, Stryker MP (2010) Modulation of visual responses by behavioral state in mouse visual cortex. Neuron 65:472-479.
O'Connell T, Chun MM, Kreiman G (2018) Zero-shot neural decoding of basic-level object category. In: Cosyne. Denver.
Okazawa G, Tajima S, Komatsu H (2015) Image statistics underlying natural texture selectivity of neurons in macaque V4. Proceedings of the National Academy of Sciences of the United States of America 112:E351-360.
Olshausen B, Field D (2004) Sparse coding of sensory inputs. Current opinion in neurobiology 14:481-487.
Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607-609.
Olshausen BA, Anderson CH, Van Essen DC (1993) A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. Journal of Neuroscience 13:4700-4719.
Pasupathy A, Connor CE (2001) Shape representation in area V4: position-specific tuning for boundary conformation. J Neurophysiol 86:2505-2519.
Phillips PJ, Yates AN, Hu Y, Hahn CA, Noyes E, Jackson K, Cavazos JG, Jeckeln G, Ranjan R, Sankaranarayanan S, Chen JC, Castillo CD, Chellappa R, White D, O'Toole AJ (2018) Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms. Proceedings of the National Academy of Sciences of the United States of America 115:6171-6176.
Ponce CR, Xiao W, Schade PF, Hartmann TS, Kreiman G, Livingstone M (2019) Evolving super stimuli for real neurons using deep generative networks. Biorxiv 10.1101/516484.
Quian Quiroga R, Reddy L, Kreiman G, Koch C, Fried I (2005) Invariant visual representation by single neurons in the human brain. Nature 435:1102-1107.
Rajalingham R, Issa EB, Bashivan P, Kar K, Schmidt K, DiCarlo JJ (2018) Large-Scale, High-Resolution Comparison of the Core Visual Object Recognition Behavior of Humans, Monkeys, and State-of-the-Art Deep Artificial Neural Networks. Journal of Neuroscience 38:7255-7269.
Reynolds JH, Heeger DJ (2009) The normalization model of attention. Neuron 61:168.
Riesenhuber M, Poggio T (1999) Hierarchical models of object recognition in cortex. Nature Neuroscience 2:1019-1025.
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang S, Karpathy A, Khosla A, Bernstein M, Berg A, Fei-Fei L (2014) ImageNet Large Scale Visual Recognition Challenge. In: CVPR: arXiv:1409.0575, 2014.
Serre T (2019) Deep learning: the good, the bad and the ugly. Annual Review of Vision In Press.
Serre T, Kreiman G, Kouh M, Cadieu C, Knoblich U, Poggio T (2007) A quantitative theory of immediate visual recognition. Progress In Brain Research 165C:33-56.
Sheinberg DL, Logothetis NK (2001) Noticing familiar objects in real world scenes: the role of temporal cortical neurons in natural vision. Journal of Neuroscience 21:1340-1350.
Sigala N, Logothetis N (2002) Visual categorization shapes feature selectivity in the primate temporal cortex. Nature 415:318-320.
Simoncelli E, Olshausen B (2001) Natural Image Statistics and Neural Representation. Annual Review of Neuroscience 24:193-216.
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv 1409.1556.
Sugase Y, Yamane S, Ueno S, Kawano K (1999) Global and fine information coded by single neurons in the temporal visual cortex. Nature 400:869-873.
Suzuki WA (2007) Making new memories: the role of the hippocampus in new associative learning. Annals of the New York Academy of Sciences 1097:1-11.
Tanaka K (2003) Columns for complex visual object features in the inferotemporal cortex: clustering of cells with similar but slightly different stimulus selectivities. Cerebral cortex 13:90-99.
Tang H, Lotter W, Schrimpf M, Paredes A, Ortega J, Hardesty W, Cox D, Kreiman G (2018) Recurrent computations for visual pattern completion. PNAS.
Thomas E, van Hulle M, Vogels R (2001) Encoding of categories by noncategory-specific neurons in the inferior temporal cortex. Journal of Cognitive Neuroscience 13:190-200.
Thorpe S, Fize D, Marlot C (1996) Speed of processing in the human visual system. Nature 381:520-522.
Tsao DY, Freiwald WA, Tootell RB, Livingstone MS (2006) A cortical region consisting entirely of face-selective cells. Science 311:670-674.
Ullman S, Assif L, Fetaya E, Harari D (2016) Atoms of recognition in human and computer vision. PNAS.
van der Maaten L, Hinton G (2008) Visualizing High-Dimensional Data Using t-SNE. J Machine Learning Res 9:2579-2605.
Vaziri S, Connor CE (2016) Representation of Gravity-Aligned Scene Structure in Ventral Pathway Visual Cortex. Current Biology 26:766-774.
Vinje WE, Gallant JL (2000) Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287:1273-1276.
Vogels R (1999) Categorization of complex visual images by rhesus monkeys: Part 2: single-cell study. European Journal of Neuroscience 11:1239-1255.
Wallis G, Rolls ET (1997) Invariant face and object recognition in the visual system. Progress in Neurobiology 51:167-194.
Wu MC, David SV, Gallant JL (2006) Complete functional characterization of sensory neurons by system identification. Annu Rev Neurosci 29:477-505.
Yamane Y, Carlson ET, Bowman KC, Wang Z, Connor CE (2008) A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nature neuroscience 11:1352-1360.
Yamins DL, Hong H, Cadieu CF, Solomon EA, Seibert D, DiCarlo JJ (2014) Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences of the United States of America 111:8619-8624.
Zeki S (1983) Color coding in the cerebral cortex - The reaction of cells in monkey visual cortex to wavelengths and colors. Neuroscience 9:741-765.