
Unsupervised Visual Representation Learning by Context Prediction

Date post: 14-Feb-2017
Upload: phamkien
Unsupervised Visual Representation Learning by Context Prediction. Most slides in this presentation are adapted from the authors' original presentation at ICCV 2015. Berkan Demirel
Transcript
Page 1: Unsupervised Visual Representation Learning by Context Prediction

Unsupervised Visual Representation Learning by Context Prediction

Most slides in this presentation are adapted from the authors' original presentation at ICCV 2015

Berkan Demirel

Page 2: Unsupervised Visual Representation Learning by Context Prediction

ImageNet + Deep Learning

Beagle

- Image Retrieval
- Detection (RCNN)
- Segmentation (FCN)
- Depth Estimation
- …

Page 3: Unsupervised Visual Representation Learning by Context Prediction

ImageNet + Deep Learning

Beagle

Do we need semantic labels?

Pose? Boundaries? Geometry?

Parts? Materials?

Page 4: Unsupervised Visual Representation Learning by Context Prediction

Context as Supervision [Collobert & Weston 2008; Mikolov et al. 2013]

Deep Net

Page 5: Unsupervised Visual Representation Learning by Context Prediction

Context Prediction for Images

A B

? ? ?

??

? ? ?

Page 6: Unsupervised Visual Representation Learning by Context Prediction

Semantics from a non-semantic task

Page 7: Unsupervised Visual Representation Learning by Context Prediction

Randomly Sample Patch

Sample Second Patch

CNN CNN

Classifier

Relative Position Task

8 possible locations
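
A minimal sketch of how the 8-way relative position label could be encoded. The row-major ordering of the eight locations and the names `NEIGHBOR_OFFSETS`, `relative_position_label`, and `sample_pair` are illustrative assumptions; the slides only state that there are 8 possible locations.

```python
import random

# The eight positions around the first patch, as (row, col) grid offsets.
# Row-major ordering is an assumption; it fixes which class index means what.
NEIGHBOR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1),           (0, 1),
    (1, -1),  (1, 0),  (1, 1),
]


def relative_position_label(row_offset, col_offset):
    """Map the grid offset of the second patch to a class index in 0..7."""
    return NEIGHBOR_OFFSETS.index((row_offset, col_offset))


def sample_pair(rng):
    """Pick one of the 8 locations uniformly; return (offset, label)."""
    label = rng.randrange(len(NEIGHBOR_OFFSETS))
    return NEIGHBOR_OFFSETS[label], label


if __name__ == "__main__":
    offset, label = sample_pair(random.Random(0))
    print("second patch offset:", offset, "class label:", label)
```

The classifier then only has to predict the class index from the two patches, so no human-provided labels are involved.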

Page 8: Unsupervised Visual Representation Learning by Context Prediction

CNN CNN

Classifier

Patch Embedding

Input Nearest Neighbors

CNN

Note: connects across instances!

Page 9: Unsupervised Visual Representation Learning by Context Prediction

Architecture

[Figure: two identical stacks with tied weights, one per patch (Patch 1, Patch 2). Each stack contains five convolution layers, three max-pooling layers (two of them paired with LRN), and one fully connected layer. The outputs of the two stacks feed two further fully connected layers and a softmax loss.]

Page 10: Unsupervised Visual Representation Learning by Context Prediction

Avoiding Trivial Shortcuts

Include a gap

Jitter the patch locations
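
A rough sketch of the gap and jitter, using the values given on the Implementation Details slide (96-pixel patches, a 48-pixel gap, jitter of -7 to 7 pixels). The function name `jittered_corner`, the grid indexing, and the extra margin that keeps coordinates non-negative are assumptions made for illustration.

```python
import random

PATCH = 96   # patch side in pixels
GAP = 48     # nominal gap between neighbouring patches in the grid
JITTER = 7   # maximum jitter in pixels, in each direction


def jittered_corner(grid_row, grid_col, rng):
    """Top-left (y, x) corner of the patch in the given grid cell.

    Grid cells are spaced PATCH + GAP pixels apart, and each patch is
    shifted independently by up to JITTER pixels per axis. A margin of
    JITTER keeps the coordinates non-negative (an assumption of this sketch).
    """
    stride = PATCH + GAP
    y = JITTER + grid_row * stride + rng.randint(-JITTER, JITTER)
    x = JITTER + grid_col * stride + rng.randint(-JITTER, JITTER)
    return y, x


if __name__ == "__main__":
    rng = random.Random(0)
    y1, x1 = jittered_corner(0, 0, rng)
    y2, x2 = jittered_corner(0, 1, rng)
    print("horizontal gap between patches:", x2 - (x1 + PATCH))
```

Because both patches are jittered independently, the actual gap varies between 34 and 62 pixels, so the network cannot rely on patch edges lining up exactly.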

Page 11: Unsupervised Visual Representation Learning by Context Prediction

Position in Image

A Not-So “Trivial” Shortcut

Page 12: Unsupervised Visual Representation Learning by Context Prediction

Chromatic Aberration

Page 13: Unsupervised Visual Representation Learning by Context Prediction

Solutions

Color Dropping: Randomly drop 2 of the 3 color channels from each patch, then replace the dropped colors with Gaussian noise (standard deviation ~1/100 of the standard deviation of the remaining channel).

Projection: Shift green and magenta (red + blue) towards gray.
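
The color dropping step described above could look roughly like the following sketch. It assumes a patch is a flat list of (r, g, b) float pixels that have already been mean-subtracted, so the noise is centred on zero; the zero mean, the data layout, and the name `drop_colors` are assumptions, not details from the slides.

```python
import random
import statistics


def drop_colors(patch, rng):
    """Keep one random color channel and replace the other two with noise.

    patch: list of (r, g, b) float tuples.
    Returns (new_patch, kept_channel_index).
    """
    keep = rng.randrange(3)
    kept_values = [pixel[keep] for pixel in patch]
    # Noise std is ~1/100 of the std of the channel that is kept.
    noise_std = statistics.pstdev(kept_values) / 100.0
    new_patch = []
    for pixel in patch:
        new_pixel = [rng.gauss(0.0, noise_std) for _ in range(3)]
        new_pixel[keep] = pixel[keep]  # the kept channel is left untouched
        new_patch.append(tuple(new_pixel))
    return new_patch, keep


if __name__ == "__main__":
    toy_patch = [(float(i), float(i) / 2.0, 255.0 - i) for i in range(16)]
    dropped, kept = drop_colors(toy_patch, random.Random(0))
    print("kept channel:", kept, "first pixel:", dropped[0])
```

With only one real channel left, the network cannot compare channels against each other, which is the cue that chromatic aberration provides.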

Page 14: Unsupervised Visual Representation Learning by Context Prediction

Implementation Details
• Train on the ImageNet 2012 training set (1.3M images), using only the images and discarding the labels.
• Resize each image to between 150K and 450K total pixels, preserving the aspect ratio.
• Sample patches at a resolution of 96-by-96 pixels.
• Sample the patches from a grid-like pattern. Each sampled patch can participate in as many as 8 separate pairings.
• Allow a gap of 48 pixels between the sampled patches in the grid, but also jitter the location of each patch in the grid by -7 to 7 pixels in each direction.
• Preprocess patches by (1) mean subtraction, (2) projecting or dropping colors, and (3) randomly downsampling some patches to as little as 100 total pixels and then upsampling them, to build robustness to pixelation.
• Use batch normalization, without the scale and shift.
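
As an illustration of the last item, batch normalization without the scale and shift simply standardizes each feature over the batch and stops there, with no learned multiplier or offset. The sketch below uses plain Python lists; the name `batch_norm_no_affine` and the `eps` value are assumptions, and it covers only the training-time statistics, not running averages.

```python
import math


def batch_norm_no_affine(batch, eps=1e-5):
    """Standardize each feature over the batch, with no scale or shift.

    batch: list of equal-length feature vectors (lists of floats).
    """
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[d] for row in batch) / n for d in range(dims)]
    variances = [
        sum((row[d] - means[d]) ** 2 for row in batch) / n for d in range(dims)
    ]
    return [
        [(row[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
        for row in batch
    ]


if __name__ == "__main__":
    print(batch_norm_no_affine([[1.0, 10.0], [3.0, 30.0]]))
```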

Page 15: Unsupervised Visual Representation Learning by Context Prediction

Experiments
• Chromatic Aberration
• Nearest-Neighbor Matching
• Object Detection
• Geometry Estimation
• Visual Data Mining
• Layout Prediction

Page 16: Unsupervised Visual Representation Learning by Context Prediction

Chromatic Aberration

CNN

Page 17: Unsupervised Visual Representation Learning by Context Prediction

Chromatic Aberration

CNN

Page 18: Unsupervised Visual Representation Learning by Context Prediction

Nearest-Neighbor Matching
• Features from the fc6 layer are used, taken from only one of the two stacks.
• fc7 and higher layers are removed.
• Normalized cross-correlation is used to find similar patches.
• Randomly selected 96x96 patches are used in the comparison.
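
A small sketch of nearest-neighbor matching with normalized cross-correlation. It assumes the features of each patch have already been extracted into a flat list of floats; the function names and the toy vectors are hypothetical.

```python
import math


def normalized_cross_correlation(a, b):
    """NCC of two equal-length vectors: mean-centre, then cosine similarity."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centred_a = [x - mean_a for x in a]
    centred_b = [x - mean_b for x in b]
    norm_a = math.sqrt(sum(x * x for x in centred_a))
    norm_b = math.sqrt(sum(x * x for x in centred_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # constant vectors carry no pattern to correlate
    return sum(x * y for x, y in zip(centred_a, centred_b)) / (norm_a * norm_b)


def nearest_neighbors(query, database, k=3):
    """Indices of the k database vectors most correlated with the query."""
    scored = [
        (normalized_cross_correlation(query, features), index)
        for index, features in enumerate(database)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [index for _, index in scored[:k]]


if __name__ == "__main__":
    database = [[3.0, 2.0, 1.0], [2.0, 4.0, 6.0], [1.0, 1.0, 2.0]]
    print(nearest_neighbors([1.0, 2.0, 3.0], database, k=2))
```

Because both vectors are mean-centred and length-normalized, the score ignores overall brightness and contrast of the feature vector and compares only its pattern.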

Page 19: Unsupervised Visual Representation Learning by Context Prediction

What is learned?

Input Ours Random Initialization ImageNet AlexNet

Page 20: Unsupervised Visual Representation Learning by Context Prediction

Still don’t capture everything

Input Ours Random Initialization ImageNet AlexNet

You don’t always need to learn!

Input Ours Random Initialization ImageNet AlexNet

Page 21: Unsupervised Visual Representation Learning by Context Prediction

Object Detection

Pre-train on relative-position task, w/o labels

[Girshick et al. 2014]

Page 22: Unsupervised Visual Representation Learning by Context Prediction

Object Detection

[Girshick et al. 2014]

Page 23: Unsupervised Visual Representation Learning by Context Prediction

Object Detection

[Girshick et al. 2014]

Page 24: Unsupervised Visual Representation Learning by Context Prediction

Multi-Task Training?

Page 25: Unsupervised Visual Representation Learning by Context Prediction

Surface-normal Estimation

Method             Error (Lower Better)    % Good Pixels (Higher Better)
No Pretraining     38.6   26.5             33.1   46.8   52.5
Unsup. Track.      34.2   21.9             35.7   50.6   57.0
Ours               33.2   21.3             36.0   51.2   57.8
ImageNet Labels    33.3   20.8             36.7   51.7   58.1

Page 26: Unsupervised Visual Representation Learning by Context Prediction

Visual Data Mining
• Sample a constellation of four adjacent patches from an image (we use four to reduce the likelihood of a matching spatial arrangement happening by chance).
• Find the top 100 images which have the strongest matches for all four patches, ignoring spatial layout.
• Use a type of geometric verification to filter away the images where the four matches are not geometrically consistent.
• Apply the described mining algorithm to Pascal VOC 2011.

Page 27: Unsupervised Visual Representation Learning by Context Prediction

Visual Data Mining

Via Geometric Verification

Simplified from [Chum et al. 2007]

Page 28: Unsupervised Visual Representation Learning by Context Prediction

Mined from Pascal VOC 2011

Page 29: Unsupervised Visual Representation Learning by Context Prediction

Layout Prediction

Visual Data Mining Algorithm results for 15,000 Street View images from Paris

Page 30: Unsupervised Visual Representation Learning by Context Prediction

Purity Test

Page 31: Unsupervised Visual Representation Learning by Context Prediction

So, do we need semantic labels?

Page 32: Unsupervised Visual Representation Learning by Context Prediction

Source Code & Supplementary Materials

• MagicInit
• Unsupervised Visual Representation Learning by Context Prediction
• Visual Data Mining Results on unlabeled PASCAL VOC 2011 Images
• Nearest Neighbors on PASCAL VOC 2007
• More

Page 33: Unsupervised Visual Representation Learning by Context Prediction

THANK YOU!

