Deep Fisher Networks and Class Saliency Maps for Object Classification and Localisation Karén Simonyan, Andrea Vedaldi, Andrew Zisserman Visual Geometry Group, University of Oxford
Page 1: Deep Fisher Networks and Class Saliency Maps for Object Classification and Localisation

Karén Simonyan, Andrea Vedaldi, Andrew Zisserman
Visual Geometry Group, University of Oxford

Slides: image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf

Page 2: Outline

• Classification challenge
  • can Fisher Vector encodings be improved by a deep architecture?
  • deep Fisher Network (FN)
  • combination of two deep models: Convolutional Network (CN) and deep Fisher Network

• Localisation challenge
  • visualization of class saliency maps and per-image foreground pixels from a single classification CN
  • bounding boxes computed from foreground pixels
  • weak supervision: only image class labels used for training

Page 3: Shallow Image Encoding & Classification

• Dense SIFT features
• Bag of Visual Words (BOW) pipeline

[Figure: BOW pipeline with the stages "VQ" and "Linear SVM", ending in the class label "dogs"]

[Luong & Malik, 1999] [Varma & Zisserman, 2003] [Csurka et al, 2004] [Vogel & Schiele, 2004] [Jurie & Triggs, 2005] [Lazebnik et al, 2006] [Bosch et al, 2006]
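The VQ step of a bag-of-visual-words pipeline can be sketched as below. This is a minimal illustration only: the function names, the toy 2-D descriptors and the three-word codebook are assumptions for the example, not values from the slides (real SIFT descriptors are 128-D and the codebook is learnt, e.g. by k-means).

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (squared Euclidean distance)."""
    dists = [sum((d - c) ** 2 for d, c in zip(descriptor, word)) for word in codebook]
    return dists.index(min(dists))


def bow_histogram(descriptors, codebook):
    """Vector-quantise each local descriptor and return an L1-normalised word histogram."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist) or 1.0  # guard against an image with no descriptors
    return [h / total for h in hist]


# Toy data (hypothetical): three visual words, four local descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.7, 5.2]]
hist = bow_histogram(descriptors, codebook)
```

The resulting histogram is the fixed-length image representation that would then be passed to a linear SVM.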

Page 4: Fisher Vector (FV) – Encoding

Dense set of local SIFT features → Fisher vector (high dim)

• soft-assignment to GMM
• 1st order stats (k-th Gaussian)
• 2nd order stats (k-th Gaussian)
• stacking of the per-Gaussian 80-D blocks, e.g. if SIFT x is reduced to 80 dimensions by PCA
• FV dimensionality: 80×2×512 = 81,920 (for a mixture of 512 Gaussians)

Perronnin et al CVPR 07 & 10, ECCV 10
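A minimal sketch of the encoding, assuming the commonly used "improved Fisher vector" statistics for a diagonal-covariance GMM (the slide's own equations did not survive in this transcript, so the exact normalisation here is an assumption). For each Gaussian k with weight π_k, mean μ_k and standard deviation σ_k, the 1st order block averages α_k(x)(x − μ_k)/σ_k and the 2nd order block averages α_k(x)((x − μ_k)²/σ_k² − 1), where α_k(x) is the soft assignment. All names and the toy GMM are hypothetical.

```python
import math


def fisher_vector(descriptors, weights, means, sigmas):
    """Fisher vector of D-dim descriptors under a K-component diagonal GMM.

    Returns a flat list of length 2*K*D: for each Gaussian, its 1st order
    block followed by its 2nd order block.
    """
    n, K, D = len(descriptors), len(weights), len(means[0])
    first = [[0.0] * D for _ in range(K)]
    second = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # Soft assignment: posterior of each Gaussian given x (log domain for stability).
        logp = []
        for k in range(K):
            ll = math.log(weights[k])
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                ll -= 0.5 * z * z + math.log(sigmas[k][d]) + 0.5 * math.log(2 * math.pi)
            logp.append(ll)
        top = max(logp)
        post = [math.exp(l - top) for l in logp]
        norm = sum(post)
        post = [p / norm for p in post]
        # Accumulate 1st and 2nd order statistics for every Gaussian.
        for k in range(K):
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                first[k][d] += post[k] * z
                second[k][d] += post[k] * (z * z - 1.0)
    fv = []
    for k in range(K):
        fv += [v / (n * math.sqrt(weights[k])) for v in first[k]]
        fv += [v / (n * math.sqrt(2 * weights[k])) for v in second[k]]
    return fv


# Toy GMM (hypothetical): K = 2 Gaussians over D = 2 dimensional descriptors.
weights = [0.4, 0.6]
means = [[0.0, 0.0], [2.0, 2.0]]
sigmas = [[1.0, 1.0], [0.5, 0.5]]
fv = fisher_vector([[0.1, 0.2], [1.9, 2.1], [2.2, 1.8]], weights, means, sigmas)
```

With D = 80 and K = 512 the same layout gives the 80×2×512 = 81,920 dimensions quoted on the slide.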

Page 5: Projection Learning

Fisher vector (high dim) → low dimensional representation

• Learn projection onto a low-dim space where classes are well-separated
• Joint learning of projection and projected-space classifiers (WSABIE)
• Or project onto the space of classifier scores:
  • the projections are linear SVM classifiers in the high-dimensional FV space
  • fast-to-learn
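The second option (projecting onto classifier scores) amounts to one dot product per classifier. A minimal sketch, assuming the linear SVM weights and biases have already been learnt; the function name and the toy numbers are hypothetical:

```python
def project_onto_scores(fv, svm_weights, svm_biases):
    """Map a high-dimensional vector to one score per linear classifier (w . x + b)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, fv)) + b
            for w, b in zip(svm_weights, svm_biases)]


# Toy example (hypothetical): a 4-D "Fisher vector" projected onto 2 classifier scores.
fv = [1.0, 2.0, 3.0, 4.0]
svm_weights = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 1.0, 0.0]]
svm_biases = [0.5, -1.0]
scores = project_onto_scores(fv, svm_weights, svm_biases)
```

The output dimensionality equals the number of classifiers, which is what makes the representation low-dimensional.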

Page 6: Deep Fisher Network

[Figure: two block diagrams, box labels listed below]

Shallow Fisher Vector
• Dense feature extraction: SIFT, raw patches, …
• FV encoder
• SSR & L2 norm.
• One vs rest linear SVMs

Deep Fisher Network
• input image
• 0-th layer: Dense feature extraction (SIFT, colour)
• 1-st Fisher layer (local & global pooling): low-dim FV encoder, Spatial stacking, L2 norm. & PCA, SSR & L2 norm.
• 2-nd Fisher layer (global pooling): FV encoder, SSR & L2 norm.
• classifier layer: One vs. rest linear SVMs
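The "SSR & L2 norm." boxes can be illustrated as follows, assuming SSR stands for signed square-rooting (the slides do not expand the abbreviation). The function name is hypothetical.

```python
import math


def ssr_l2_normalise(v):
    """Signed square-rooting followed by L2 normalisation of a feature vector."""
    # SSR: keep the sign of each component, take the square root of its magnitude.
    rooted = [math.copysign(math.sqrt(abs(x)), x) for x in v]
    norm = math.sqrt(sum(x * x for x in rooted))
    # Leave an all-zero vector unchanged instead of dividing by zero.
    return [x / norm for x in rooted] if norm > 0 else rooted


normalised = ssr_l2_normalise([4.0, -9.0, 0.0])
```

Square-rooting compresses large components before the vector is scaled to unit length, so a few dominant entries have less influence on the linear classifier.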

Page 7: Fisher Layer

Stages: Compressed local Fisher encoding → Spatial stacking (2×2) → L2 normalisation & PCA decorrelation

[Figure: feature map sizes along the layer, as width × height × dimension]
feature: w × h × 80 → w/2 × h/2 × 82000 → w/2 × h/2 × 1000 → w/2 × h/2 × 4000 → w/2 × h/2 × 256
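The 2×2 spatial stacking step, which takes the 1000-D compressed encodings to 4000-D, can be sketched as below. The stride of 1 and the function name are assumptions made for illustration; the slide only gives the window size and the dimensions.

```python
def spatial_stack_2x2(fmap):
    """Concatenate the feature vectors of every 2x2 window of neighbouring cells.

    fmap[y][x] is a list of D floats; each output cell holds 4*D floats.
    A stride of 1 is assumed, so an h x w map becomes (h-1) x (w-1).
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for y in range(h - 1):
        row = []
        for x in range(w - 1):
            row.append(fmap[y][x] + fmap[y][x + 1] + fmap[y + 1][x] + fmap[y + 1][x + 1])
        out.append(row)
    return out


# Toy 3x3 map of 2-D features (each cell stores its own coordinates).
fmap = [[[float(y), float(x)] for x in range(3)] for y in range(3)]
stacked = spatial_stack_2x2(fmap)
```

Stacking makes each feature cover a larger image region before the next Fisher encoding; the subsequent L2 normalisation and PCA then bring the 4000-D vectors down to 256-D.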

Page 9: Classification Results for Fisher Network

ImageNet 2010 challenge dataset:
• 1.2M images, 1K classes
• SIFT & colour features
• Learning: 2-3 days on 200 CPU cores (MATLAB + MEX implementation)

Classification accuracy is improved by adding a layer

Page 10: Deep ConvNet Implementation

• Based on cuda-convnet [Krizhevsky et al., 2012]
• 8 weight layers (rather narrow):
  conv64-conv256-conv256-conv256-conv256-full4096-full4096-full1000
• Jittering:
  • cropping, flipping, PCA-aligned noise
  • random occlusion [figure]
• Single ConvNet instance
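The jittering operations can be sketched on a single-channel image as follows. This is only an illustration: the crop size, occlusion size, fill value of 0 and flip probability of 0.5 are assumptions, and the PCA-aligned colour noise is left out.

```python
import random


def jitter(image, crop, occ, rng):
    """Random crop, random horizontal flip and random square occlusion.

    image is an H x W list of rows; returns a new crop x crop image and
    leaves the input untouched.
    """
    h, w = len(image), len(image[0])
    top, left = rng.randint(0, h - crop), rng.randint(0, w - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:  # flip left-right half of the time
        patch = [row[::-1] for row in patch]
    # Blank out an occ x occ square at a random position (fill value assumed to be 0).
    oy, ox = rng.randint(0, crop - occ), rng.randint(0, crop - occ)
    for y in range(oy, oy + occ):
        for x in range(ox, ox + occ):
            patch[y][x] = 0
    return patch


image = [[1] * 8 for _ in range(8)]
out = jitter(image, crop=6, occ=2, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible.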

Page 11: Classification Results

ImageNet 2012 challenge dataset:
• 1.2M images, 1K classes
• top-5 classification accuracy

Method                                     top-5 accuracy
FV encoding (our 2012 entry)               72.7%
Deep FishNet                               76.9%
Deep ConvNet [Krizhevsky et al., 2012]     81.8%, 83.6% (5 ConvNets)
Deep ConvNet (our implementation)          82.3%
Deep ConvNet + Deep FishNet                84.8%

ConvNet and FisherNet are complementary

Page 12: Outline

• Classification challenge
  • can Fisher Vector encodings be improved by a deep architecture?
  • deep Fisher Network (FN)
  • combination of two deep models: Convolutional Network (CN) and deep Fisher Network

• Localisation challenge
  • visualization of class saliency maps and per-image foreground pixels from a single classification CN
  • bounding boxes computed from foreground pixels
  • weak supervision: only image class labels used for training

Page 13: Deep inside ConvNets: What Has Been Learnt?

ConvNet class model visualisation
• find a (regularised) image with a high class score $S_c(I)$:
  $\arg\max_I S_c(I) - \lambda \|I\|_2^2$, with a fixed learnt model
• compute $\partial S_c / \partial I$ using back-prop

Cf ConvNet training
• max log-likelihood of the correct class
• using back-prop

[Figure labels: fully connected classifier layer, soft-max layer]

Visualizing higher-layer features of a deep network. Erhan, D., Bengio, Y., Courville, A., Vincent, P. Technical report, University of Montreal, 2009.
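Class model visualisation is gradient ascent on the image rather than on the weights. A minimal sketch, assuming an L2-regularised objective of the form S_c(I) − λ‖I‖² and a toy two-unit ReLU scorer standing in for the ConvNet; the model, step size, λ and the small positive starting image are all hypothetical choices made so the example runs.

```python
def score_and_grad(image, hidden_w, out_w):
    """Toy class score S_c(I) = sum_j out_w[j] * relu(hidden_w[j] . I) and dS_c/dI."""
    grad = [0.0] * len(image)
    score = 0.0
    for w_j, v_j in zip(hidden_w, out_w):
        pre = sum(w * x for w, x in zip(w_j, image))
        if pre > 0:  # ReLU is active, so the gradient flows back to the input
            score += v_j * pre
            for i, w in enumerate(w_j):
                grad[i] += v_j * w
    return score, grad


def class_model_image(hidden_w, out_w, lam=0.1, lr=0.1, steps=200):
    """Gradient ascent on S_c(I) - lam * ||I||^2 with the model weights held fixed."""
    image = [0.01] * len(hidden_w[0])  # small positive start so the toy ReLUs are active
    for _ in range(steps):
        _, g = score_and_grad(image, hidden_w, out_w)
        # Ascent step: score gradient minus the gradient of the L2 penalty.
        image = [x + lr * (gi - 2 * lam * x) for x, gi in zip(image, g)]
    return image


hidden_w = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0]]
out_w = [1.0, 0.5]
img = class_model_image(hidden_w, out_w, lam=0.1, lr=0.1, steps=2000)
```

In this toy case the score gradient is constant at [1, 1, 0], so the iterate converges to gradient/(2λ) = [5, 5, 0]; the regulariser is what keeps the image bounded.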

Pages 14-16: [Figures labelled: fox, pepper, dumbbell]

Page 17: Deep inside ConvNets: What Has Been Learnt?

ConvNet class model visualisation
• find a (regularised) image with a high class score $S_c(I)$:
  $\arg\max_I S_c(I) - \lambda \|I\|_2^2$, with a fixed learnt model
• compute $\partial S_c / \partial I$ using back-prop

NB: maximising the soft-max class posterior instead of the class score gives less prominent visualisation, as it concentrates on reducing scores of other classes

[Figure labels: fully connected classifier layer, soft-max layer]

Visualizing higher-layer features of a deep network. Erhan, D., Bengio, Y., Courville, A., Vincent, P. Technical report, University of Montreal, 2009.

Page 18: Deep inside ConvNets: What Makes an Image Belong to a Class?

• ConvNets are highly non-linear → local linear approximation
• 1st order expansion of a class score around a given image $I_0$:
  $S_c(I) \approx w^T I + b$, where $w = \left.\frac{\partial S_c}{\partial I}\right|_{I_0}$
  – $w$: computed using back-prop
  – $S_c$: score of the $c$-th class
• $w$ has the same dimensions as image $I_0$
• magnitude of $w$ defines a saliency map for image $I_0$ and class $c$

How to Explain Individual Classification Decisions. Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., Müller, K.-R. JMLR, 2010.
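A saliency map of this kind needs a single backward pass: take the gradient of the class score with respect to the input pixels and reduce its magnitude to one value per pixel. The sketch below uses a toy ReLU scorer in place of a ConvNet and takes the maximum absolute gradient over colour channels; both the toy model and the max-over-channels reduction are assumptions for illustration.

```python
def input_gradient(pixels, hidden_w, out_w):
    """dS_c/dI for the toy score S_c(I) = sum_j out_w[j] * relu(hidden_w[j] . I)."""
    grad = [0.0] * len(pixels)
    for w_j, v_j in zip(hidden_w, out_w):
        # Only hidden units whose ReLU is active pass gradient back to the input.
        if sum(w * x for w, x in zip(w_j, pixels)) > 0:
            for i, w in enumerate(w_j):
                grad[i] += v_j * w
    return grad


def saliency_map(image, hidden_w, out_w):
    """image[y][x] is a list of C channel values; returns an H x W map of max_c |dS_c/dI|."""
    h, wd, c = len(image), len(image[0]), len(image[0][0])
    flat = [v for row in image for px in row for v in px]
    grad = input_gradient(flat, hidden_w, out_w)
    return [[max(abs(grad[(y * wd + x) * c + ch]) for ch in range(c))
             for x in range(wd)] for y in range(h)]


# Toy 1 x 2 image with 2 channels per pixel (hypothetical values).
image = [[[1.0, 2.0], [0.5, 0.0]]]
hidden_w = [[1.0, -1.0, 0.0, 0.0],
            [0.0, 0.0, 2.0, -3.0]]
out_w = [1.0, 1.0]
sal = saliency_map(image, hidden_w, out_w)
```

Here only the second hidden unit is active, so all saliency falls on the second pixel: the map has the spatial size of the image but a single channel.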

Pages 19-21: Saliency Maps For Top-1 Class

[Figures only]

Page 22

[Figure columns: Image, Saliency Map]

• Weakly supervised
  • computed using classification ConvNet, trained on image class labels
  • no additional annotation required (e.g. boxes or masks)
• Highlights discriminative object parts
• Instant computation – no sliding window
• Fires on several object instances
• Related to deconvnet [Zeiler and Fergus, 2013]
  • very similar for convolution, max-pooling, and RELU layers
  • but we also back-prop through fully-connected layers

Page 23: Saliency Maps for Object Localisation

• Image → top-k class → class saliency map → object box

Pages 24-27: BBox Localisation for ILSVRC Submission

• Given an image and a saliency map:
  1. Foreground/background mask using thresholds on saliency
     (figure legend: blue – foreground, cyan – background, red – undefined)
  2. GraphCut colour segmentation [Boykov and Jolly, 2001]
  3. Bounding box of the largest connected component
• Colour information propagates segmentation from the most discriminative areas
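Steps 1 and 3 of the localisation procedure can be sketched in plain Python. The GraphCut segmentation of step 2 is deliberately left out: this sketch feeds the thresholded foreground mask straight into the connected-component step, which is a simplification and not the submission's pipeline. The thresholds, function names and 4-connectivity are assumptions.

```python
from collections import deque


def threshold_mask(saliency, fg_thresh, bg_thresh):
    """Label each pixel 'fg', 'bg' or 'undefined' from two saliency thresholds."""
    return [["fg" if s >= fg_thresh else "bg" if s <= bg_thresh else "undefined"
             for s in row] for row in saliency]


def largest_component_bbox(fg):
    """fg is an H x W list of booleans. Returns (top, left, bottom, right), inclusive,
    for the largest 4-connected foreground component, or None if there is none."""
    h, w = len(fg), len(fg[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y0 in range(h):
        for x0 in range(w):
            if not fg[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            comp, queue = [], deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    if not best:
        return None
    ys, xs = [p[0] for p in best], [p[1] for p in best]
    return min(ys), min(xs), max(ys), max(xs)


saliency = [[0.9, 0.1, 0.0, 0.0],
            [0.1, 0.0, 0.8, 0.9],
            [0.0, 0.5, 0.9, 0.7],
            [0.0, 0.0, 0.1, 0.0]]
mask = threshold_mask(saliency, fg_thresh=0.7, bg_thresh=0.2)
bbox = largest_component_bbox([[label == "fg" for label in row] for row in mask])
```

In the toy map the isolated high-saliency pixel in the top-left corner is ignored because the four-pixel blob is larger, which mirrors why only one box is returned even when saliency fires in several places.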

Pages 28-29: Segmentation-Localisation Examples

[Figures only]

Pages 30-32: Segmentation-Localisation Failure Cases

• Several object instances
• Segmentation isn’t propagated from the salient parts
• Limitations of GraphCut segmentation

Page 33: Summary

• Fisher encoding benefits from stacking
• Deep FishNet is complementary to Deep ConvNet
• Class saliency maps are useful for localisation
  • location of discriminative object parts
  • weakly supervised: bounding boxes not used for training
  • fast to compute

