Page 1: Spatial Transformer Networks

Spatial Transformer Networks

Shashank Tyagi, Ishan Gupta

Based on: Jaderberg, Max, et al. "Spatial transformer networks." Proceedings of the 28th International Conference on Neural Information Processing Systems. MIT Press, 2015.

Page 3: Spatial Transformer Networks

Outline
● Introduction
● Limitations of CNNs
● Related work
● Spatial transformer
  ○ Architecture
  ○ Mathematical formulation
● Experimental results
● Conclusion

3

Page 4: Spatial Transformer Networks

Introduction
● Convolutional Neural Networks (CNNs).

4

Page 5: Spatial Transformer Networks

Visualizing CNNs

Harley, Adam W. "An interactive node-link visualization of convolutional neural networks." International Symposium on Visual Computing. Springer International Publishing, 2015.

5

Page 6: Spatial Transformer Networks

Limitations
● Limited spatial invariance.
● Max pooling has only small spatial support.
● Only deep layers (towards the output) achieve invariance.
● No rotation or scaling invariance.
● The fixed location and size of the receptive fields are a bottleneck for handling invariance.

6
http://cdn-ak.f.st-hatena.com/images/fotolife/v/vaaaaaanquish/20150126/20150126055504.png

Page 7: Spatial Transformer Networks

Related Work

7

● Hinton's work on transforming autoencoders

● Locally scale-invariant convolutional neural networks

Page 8: Spatial Transformer Networks

Related Work
● Previous work covers modelling transformations with neural networks and learning transformation-invariant representations.
● Spatial transformers manipulate the data itself rather than the feature extractors.
● Selective attention introduced the idea of looking at specific parts of the image, i.e. regions of interest.
● In that sense, the spatial transformer is introduced as a differentiable attention scheme that also learns the spatial transformation.

8

Page 9: Spatial Transformer Networks

Spatial Transformer
● A dynamic mechanism that actively spatially transforms an image or feature map by learning an appropriate transformation matrix.
● The transformation can include translation, rotation, scaling, cropping and non-rigid deformations.
● Allows for end-to-end trainable models using standard back-propagation.

9

Page 10: Spatial Transformer Networks

Spatial Transformer
● Three differentiable modules:
  ○ Localisation network.
  ○ Parameterised Sampling Grid (Grid Generator).
  ○ Differentiable Image Sampling (Sampler).

10

Page 11: Spatial Transformer Networks

Localisation Network

● Takes an input feature map U ∈ R^(H×W×C) and outputs the parameters θ of the transformation to be applied.

● Can be realised as a fully-connected or convolutional network with a final regression layer producing the transformation parameters (see the sketch below).
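As an aside (not from the slides), a minimal PyTorch sketch of such a localisation network regressing the six parameters of an affine transform; the layer sizes are arbitrary assumptions, and the final layer is initialised so training starts from the identity transform, as the paper recommends.

```python
import torch
import torch.nn as nn

class LocalisationNet(nn.Module):
    """Regress a 2x3 affine matrix theta from a feature map U of shape (N, C, H, W)."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(32), nn.ReLU(),
            nn.Linear(32, 6),  # six affine parameters
        )
        # Start from the identity transform ("no warp").
        self.fc[-1].weight.data.zero_()
        self.fc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, U):
        theta = self.fc(self.conv(U))
        return theta.view(-1, 2, 3)

theta = LocalisationNet()(torch.randn(4, 1, 28, 28))
print(theta.shape)  # torch.Size([4, 2, 3])
```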

11

Page 12: Spatial Transformer Networks

Parameterised Sampling Grid (Grid Generator)

● Generates the sampling grid by applying the transformation predicted by the localisation network to a regular grid over the output.

12

Page 13: Spatial Transformer Networks

Parameterised Sampling Grid (Grid Generator)
● Attention model: isotropic scaling and translation (see the equation below).

13

[Figure: target regular grid mapped onto the source transformed grid; identity transform shown (s = 1, t_x = 0, t_y = 0)]
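The transformation itself appeared as an image on the slide; reconstructed here from the original paper, the attention model constrains the transform to an isotropic scale s and a translation (t_x, t_y), mapping each target grid point to a source sampling point:

\[
\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix}
= \begin{bmatrix} s & 0 & t_x \\ 0 & s & t_y \end{bmatrix}
\begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
\]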

Page 14: Spatial Transformer Networks

Parameterised Sampling Grid (Grid Generator)
● Affine transform (see the equation below).

14

[Figure: target regular grid mapped onto the source transformed grid under a general affine transform]
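The general affine case (reconstructed from the original paper): each point G_i = (x_i^t, y_i^t) of the regular target grid is mapped to source sampling coordinates by the predicted 2×3 matrix A_θ:

\[
\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix}
= \mathcal{T}_\theta(G_i)
= A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
= \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix}
\begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
\]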

Page 15: Spatial Transformer Networks

Differentiable Image Sampling (Sampler)

● Samples the input feature map at the points of the sampling grid to produce the output feature map (see the sketch below).
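For illustration, a minimal sketch of the grid generator plus sampler using PyTorch's built-in ops, which implement the same parameterised grid and bilinear sampling described here (the original paper predates these helpers; shapes below are arbitrary):

```python
import torch
import torch.nn.functional as F

# U: input feature map (N, C, H, W); theta: (N, 2, 3) affine parameters
# predicted by the localisation network (identity transform used here).
U = torch.randn(4, 3, 64, 64)
theta = torch.tensor([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]]).repeat(4, 1, 1)

# Grid generator: map a regular target grid through theta to source coordinates.
grid = F.affine_grid(theta, size=U.shape, align_corners=False)

# Sampler: bilinearly sample U at the source coordinates to get the output V.
V = F.grid_sample(U, grid, mode='bilinear', align_corners=False)
print(V.shape)  # torch.Size([4, 3, 64, 64])
```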

15

Page 16: Spatial Transformer Networks

Mathematical Formulation of Sampling
● General formulation (equation reconstructed below), with:
  ○ V_i^c : target feature value at location i in channel c
  ○ U_nm^c : input feature value at location (n, m) in channel c
  ○ (x_i^s, y_i^s) : sampling coordinates
  ○ k(·) : sampling kernel
  ○ Φ_x, Φ_y : kernel parameters

16
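The formula itself appeared as an image on the slide; reconstructed here from the original paper using the symbols above:

\[
V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^{c} \, k(x_i^s - m;\ \Phi_x) \, k(y_i^s - n;\ \Phi_y)
\qquad \forall i \in [1 \ldots H'W'], \ \forall c \in [1 \ldots C]
\]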

Page 17: Spatial Transformer Networks

Kernels (equations reconstructed below)
● Integer sampling kernel

● Bilinear sampling kernel
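The kernel definitions were shown as images; reconstructed from the original paper, the integer (nearest-neighbour) and bilinear kernels specialise the general formulation to:

\[
V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^{c} \, \delta(\lfloor x_i^s + 0.5 \rfloor - m) \, \delta(\lfloor y_i^s + 0.5 \rfloor - n)
\]
\[
V_i^c = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^{c} \, \max(0, 1 - |x_i^s - m|) \, \max(0, 1 - |y_i^s - n|)
\]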

17

Page 18: Spatial Transformer Networks

Backpropagation through the Sampling Mechanism
● Gradients with the bilinear sampling kernel (reconstructed below)
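The gradient expressions were images on the slide; reconstructed from the original paper for the bilinear kernel (the gradient with respect to y_i^s is analogous, and gradients flow back to the transformation parameters θ through the grid generator):

\[
\frac{\partial V_i^c}{\partial U_{nm}^{c}} = \max(0, 1 - |x_i^s - m|) \, \max(0, 1 - |y_i^s - n|)
\]
\[
\frac{\partial V_i^c}{\partial x_i^s} = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}^{c} \, \max(0, 1 - |y_i^s - n|)
\begin{cases} 0 & \text{if } |m - x_i^s| \ge 1 \\ 1 & \text{if } m \ge x_i^s \\ -1 & \text{if } m < x_i^s \end{cases}
\]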

18

Page 19: Spatial Transformer Networks

Experiments: Evaluating spatial transformer networks.

● Distorted MNIST
● Traffic Sign Detection
● Co-localisation

19

Applications: Incorporating spatial transformers in CNNs.

● Multiple Spatial Transformers
● Spatial Attention
● Saliency Detection and Refinement

Page 20: Spatial Transformer Networks

Distorted MNIST
● Large reductions in training loss are easily achieved with deep networks already trained on diverse classes of images.
● But what happens when the trained network sees this?

20

Page 21: Spatial Transformer Networks

Distorted MNIST
● The distorted MNIST dataset is created by applying rotation, rotation-translation-scaling, and projective transformations to the original MNIST dataset.
● Affine, projective and thin plate spline transformations were learnt by the localisation network of the spatial transformer.
● The ST-FCN model improves over the baseline FCN and CNN models, and the experiments suggest that spatial transformers are complementary to max pooling.

21

Page 22: Spatial Transformer Networks

Distorted MNIST
● Results

22

R : Rotation
RTS : Rotation, Translation and Scaling
P : Projective Distortion
E : Elastic Distortion

(a) : Inputs to the network
(b) : Transformation applied by the Spatial Transformer
(c) : Output of the Spatial Transformer Network

Page 23: Spatial Transformer Networks

Traffic Sign Detection
● Experiment performed by MoodStocks (a French image-recognition startup).
● Evaluated on GTSRB (the German Traffic Sign Recognition Benchmark).
● The GTSRB dataset contains images spread over 43 classes.
● A total of 39,209 training examples and 12,630 test examples.

23

Page 24: Spatial Transformer Networks

Traffic Sign Detection
Visualising spatial transformers during training:

24

● On the left is the original image.

● On the right is the spatially transformed output.

● At the bottom is the counter of training steps.

Page 25: Spatial Transformer Networks

Traffic Sign Detection
Post training:

● Images taken from a video sequence while approaching a traffic sign.

25

Page 26: Spatial Transformer Networks

Traffic Sign Detection
Results:

26

Page 27: Spatial Transformer Networks

Co-Localisation
● A semi-supervised learning scheme.
● Requires no training labels or location ground truth.
● Applied to a dataset where each sample contains a common feature of some class.
● Wait, this covers the "semi" part, but it still needs a training signal. How do you train it?

Triplet loss (see the sketch below)

27

[Figure: cropped image I_n, cropped image I_m, and a randomly sampled patch]
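A rough sketch of the hinge-style triplet objective sketched above (the encoder, margin and shapes are assumptions for illustration, not taken from the slides): the encoding of the crop from image I_n should be closer to the encoding of the crop from I_m than to that of a randomly sampled patch.

```python
import torch
import torch.nn.functional as F

def colocalisation_triplet_loss(e_n, e_m, e_rand, margin=1.0):
    """Hinge loss: the two crop encodings (e_n, e_m) should be closer to each
    other than e_n is to the encoding of a random patch (e_rand)."""
    d_pos = (e_n - e_m).pow(2).sum(dim=1)       # distance between the two crops
    d_neg = (e_n - e_rand).pow(2).sum(dim=1)    # distance to the random patch
    return F.relu(d_pos - d_neg + margin).mean()

# Random vectors stand in for the outputs of an encoder applied to the crops.
e_n, e_m, e_rand = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
print(colocalisation_triplet_loss(e_n, e_m, e_rand))
```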

Page 28: Spatial Transformer Networks

Co-Localisation
Training procedure:

28

Page 29: Spatial Transformer Networks

Co-Localisation
Iterating the training process.

29

Page 30: Spatial Transformer Networks

Multiple Spatial Transformers
● As seen in the previous slides, spatial transformers can be inserted before or after the convolutional layers and before or after max pooling.
● Spatial transformers can also be attached in parallel to focus on multiple objects at once.
● Limitation:
  ○ You need as many parallel spatial transformers as there are objects to model.

30

Page 31: Spatial Transformer Networks

Multiple Spatial Transformers
● Adding digits in two images

31

Page 32: Spatial Transformer Networks

Spatial Attention
Inspiration behind attention:

● How do humans perceive a scene?
● Do we compress the entire image into a single static representation?
● Or do we focus on one object at a time and build up meaning from the resulting sequence of glimpses?

32

Page 33: Spatial Transformer Networks

Spatial Attention
Inspiration behind attention:

33

Neural Machine Translation

Page 34: Spatial Transformer Networks

Spatial Attention
● Motivation:

34

Page 35: Spatial Transformer Networks

Spatial Attention
Hard vs. soft attention:

35

Page 36: Spatial Transformer Networks

Spatial Attention
Soft attention:

● Uses a weighted sum of features as the input to the sequence generator (see the sketch below).
● A probability (weight) for each feature is learned.
● Fully differentiable, so it can be trained with standard back-propagation.
● Uses the whole input at every step.
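A minimal sketch of the weighted-sum idea (names and shapes are illustrative assumptions): unnormalised relevance scores over feature locations are turned into a softmax distribution, and the context vector fed to the sequence generator is the expectation of the features under that distribution.

```python
import torch
import torch.nn.functional as F

def soft_attention(features, scores):
    """features: (N, L, D) feature vectors at L locations;
    scores: (N, L) unnormalised relevance scores from the attention module."""
    weights = F.softmax(scores, dim=1)                   # probabilities per location
    context = (weights.unsqueeze(-1) * features).sum(1)  # weighted sum, shape (N, D)
    return context, weights

features, scores = torch.randn(2, 49, 512), torch.randn(2, 49)
context, weights = soft_attention(features, scores)
print(context.shape, weights.sum(dim=1))  # (2, 512); weights sum to 1 per example
```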

36

Page 37: Spatial Transformer Networks

Spatial Attention
Soft attention:

37

Page 38: Spatial Transformer Networks

Spatial Attention
Hard attention:

● Uses a single feature at a time for sequence generation.
● A special case of soft attention where all weights except one are zero.
● Not differentiable.
● Uses reinforcement learning to assign rewards and decide the next state.

38

Page 39: Spatial Transformer Networks

Spatial Attention
Hard attention:

39

Page 40: Spatial Transformer Networks

Spatial Attention

● Spatial transformers can be utilised as a differentiable attention mechanism.
● Each transformer in the network focuses on discriminative object parts.
● It predicts the location of the attention window and samples the cropped region.
● Each output can then be processed by its own network stream.

40

Page 41: Spatial Transformer Networks

Spatial Attention
Network architecture:

41

Page 42: Spatial Transformer Networks

Spatial Attention
Results on the CUB-200-2011 birds dataset using spatial transformers.

42

Page 43: Spatial Transformer Networks

Saliency Detection and Refinement
What is saliency detection?

● Detecting the most prominent objects in an image and segmenting them out along their boundaries.

43
[Figure: input image and its saliency map]

Page 44: Spatial Transformer Networks

Saliency Detection and Refinement
Detection cues:

● Color spatial distribution.
● Center-surround histogram.
● Multi-scale contrast.

44

Page 45: Spatial Transformer Networks

Saliency Detection and Refinement
The need for accurate detection and refinement:

● Such hand-crafted cues cannot capture high-level information about the object and its surroundings.

● Handling all scales requires computationally intensive solutions.

45

Page 46: Spatial Transformer Networks

Saliency Detection and Refinement
CNN-DecNN architecture.

46

[Figure: CNN-DecNN mapping an input image to a saliency map]

Page 47: Spatial Transformer Networks

Saliency Detection and Refinement
Recurrent model using spatial transformers.

47

Remember: spatial transformers can perform attention!

Page 48: Spatial Transformer Networks

Saliency Detection and Refinement
Implementation details (see the sketch below):

● Generate an initial saliency map using a pre-trained CNN-DecNN network.
● RNNs provide recurrent attention to refine the saliency map.
● Spatial transformers are learned to focus on sub-parts of the image.
● The focus is decided using context information from the previous RNN state.

48

[Figure: input image, initial saliency map, and successive attention windows]
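A very rough, hypothetical sketch of one refinement step as described above (module names, shapes and the update rule are illustrative assumptions, not the implementation from Kuen et al.): the RNN state, conditioned on the previous step, proposes attention parameters, a spatial transformer crops the corresponding sub-region, and a small decoder re-estimates saliency for that crop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentAttentionStep(nn.Module):
    """One hypothetical refinement step: the RNN state proposes an attention
    window, a spatial transformer crops it, and a small decoder re-estimates
    saliency for that crop. (Pasting the refined crop back into the full map
    via the inverse transform is omitted here.)"""
    def __init__(self, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(4, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.GRUCell(feat_dim, hidden_dim)
        self.to_theta = nn.Linear(hidden_dim, 6)   # affine attention parameters
        self.refine = nn.Conv2d(4, 1, 3, padding=1)

    def forward(self, image, saliency, h):
        x = torch.cat([image, saliency], dim=1)            # (N, 4, H, W)
        h = self.rnn(self.encode(x), h)                    # context from previous state
        theta = self.to_theta(h).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.shape, align_corners=False)
        crop = F.grid_sample(x, grid, align_corners=False) # attended sub-region
        refined_crop = torch.sigmoid(self.refine(crop))    # refined saliency for the crop
        return refined_crop, h

step = RecurrentAttentionStep()
image, saliency = torch.randn(2, 3, 64, 64), torch.rand(2, 1, 64, 64)
h = torch.zeros(2, 256)
refined, h = step(image, saliency, h)
print(refined.shape)  # torch.Size([2, 1, 64, 64])
```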

Page 49: Spatial Transformer Networks

Saliency Detection and Refinement
Implementation details:

● Hidden-to-hidden interactions pass contextual information that is used for saliency refinement.

● Convolutional operations are used inside the RNNs to preserve the spatial information needed by the deconvolutional network.

● A two-layer RNN learns location and contextual dependencies separately.

49

Page 50: Spatial Transformer Networks

Saliency Detection and Refinement
Implementation details:

50

Page 51: Spatial Transformer Networks

Saliency Detection and Refinement
Results

51
Precision-recall curves

Page 52: Spatial Transformer Networks

Saliency Detection and Refinement
Results

52
Qualitative saliency results on some evaluated images. From the leftmost column: input image, saliency ground truth, the saliency maps of the proposed method (CNN-DecNN + RACDNN) with mean-shift post-processing, MCDL, MDF, RRWR, BSCA, DRFI, RBD, DSR, MC and HS.

Page 53: Spatial Transformer Networks

Conclusion
● Introduced a new module: the spatial transformer.
● Helps in learning explicit spatial transformations of features, such as translation, rotation, scaling, cropping and non-rigid deformations.
● Can be used in any network, at any layer, and learnt in an end-to-end trainable manner.
● Improves the performance of existing models.

53

Page 54: Spatial Transformer Networks

QUESTIONS?

54

Page 55: Spatial Transformer Networks

Resources
● Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial Transformer Networks." Advances in Neural Information Processing Systems, 2015.
● A. W. Harley. "An Interactive Node-Link Visualization of Convolutional Neural Networks." In ISVC, pages 867-877, 2015.
● CS231n coursework @ Stanford.
● Spatial Transformer Networks - slides by Victor Campos.
● Kuen, Jason, Zhenhua Wang, and Gang Wang. "Recurrent Attentional Networks for Saliency Detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
● Hinton, Geoffrey, Alex Krizhevsky, and Sida Wang. "Transforming Auto-encoders." Artificial Neural Networks and Machine Learning - ICANN 2011 (2011): 44-51.
● Kanazawa, Angjoo, Abhishek Sharma, and David Jacobs. "Locally Scale-Invariant Convolutional Neural Networks." arXiv preprint arXiv:1412.5104 (2014).

55

