Dynamic Routing Between Capsules
Explainable Machine Learning
6/19/18 Michael Dorkenwald
Introduction
Dynamic Routing Between Capsules
by Sara Sabour, Nicholas Frosst, Geoffrey Hinton (October 2017)
Geoffrey Hinton
● Significant contributions to the backpropagation algorithm
● Idea for AlexNet
● Invented Dropout
List of Content
● Motivation for Capsules
● Idea of Inverse Graphics
● Capsules
● Dynamic Routing Between Capsules
● Capsules on MNIST
● Conclusion
Convolutional Neural Networks (CNN)
● Special type of multi-layer neural networks, constructed to recognize visual patterns directly from pixel images
Feature Maps of CNNs
Max-Pooling Layer
● Dimension reduction
● Selective routing of features
● Loses positional information
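A tiny numpy sketch (not from the slides) makes the point concrete: two inputs that differ only in the exact position of a feature can produce identical max-pooled outputs.

```python
import numpy as np

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling on a 2D feature map.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0   # feature in the top-left corner
b = np.zeros((4, 4)); b[1, 1] = 1.0   # same feature shifted one pixel
# Both inputs pool to the same 2x2 output: the exact position is lost.
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
```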
Achievements of CNNs
Spatial Relation
Motivation
Hinton: “The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster.”
Looking for equivariance: changes in viewpoint lead to corresponding changes in the neural activities
List of Content
● Motivation for Capsules
● Idea of Inverse Graphics
● Capsules
● Dynamic Routing Between Capsules
● Capsules on MNIST
● Conclusion
Computer Graphics
● Construct a visual image (rendering) from abstract representation of an object
Inverse Graphics
● Reverse process: start from the image and recover the parameters through inverse rendering
List of Content
● Motivation for Capsules
● Idea of Inverse Graphics
● Capsules
● Dynamic Routing Between Capsules
● Capsules on MNIST
● Conclusion
Capsule Network
A capsule network is a neural network that tries to perform inverse graphics
Capsule
● A group of neurons
● Goal: predict the presence and the instantiation parameters of a specific entity at a given location
● Presence is represented by the length of the activity vector (a probability)
● Instantiation parameters are:
– position, size, orientation, deformation, hue, texture, etc.
Primary Capsule Activities
Input: Image features
1st layer: Convolutional layer with ReLU activation
2nd layer: Convolutional capsule layer
Squashing function
Squashing Function
● The length of the output vector represents the probability that the entity is present
● Applies a non-linearity to the whole capsule vector
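As a sketch, the squashing function v = (||s||² / (1 + ||s||²)) · s/||s|| can be written in a few lines of numpy; the eps term is my addition for numerical stability at the zero vector.

```python
import numpy as np

def squash(s, eps=1e-8):
    # v = ||s||^2 / (1 + ||s||^2) * s / ||s||
    # Long vectors get length close to 1, short vectors shrink toward 0.
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)
```

So the direction of the capsule vector is preserved while its length becomes a valid probability in [0, 1).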
Capsules
Inverse Rendering
Image
Capsule activations
Capsules
Inverse Rendering
Image
Equivariance
Capsule activations
List of Content
● Motivation for Capsules
● Idea of Inverse Graphics
● Capsules
● Dynamic Routing Between Capsules
● Capsules on MNIST
● Conclusion
Dynamic Routing
● Prediction vector: û_{j|i} = W_ij u_i
– with the previous capsule output u_i and the transformation matrix W_ij
● Capsule output of the next layer: s_j = Σ_i c_ij û_{j|i}, v_j = squash(s_j)
– with v_j the output of capsule j in the next layer
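A minimal numpy illustration of one prediction vector; the 8-D and 16-D sizes match the paper's PrimaryCaps/DigitCaps dimensions, while the weights here are random placeholders, not learned values.

```python
import numpy as np

# Capsule i (8-D output u_i) predicts the output of capsule j in the
# next layer via a learned transformation matrix W_ij.
u_i = np.full(8, 0.1)                                    # lower-level capsule output
W_ij = np.random.default_rng(0).standard_normal((16, 8)) * 0.1
u_hat_ji = W_ij @ u_i                                    # prediction vector
print(u_hat_ji.shape)  # (16,)
```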
Dynamic Routing
● Coupling coefficients c_ij
– determined during the iterative dynamic routing process
– computed by a ‘routing softmax’ whose initial logits b_ij are the log prior probabilities that capsule i should be coupled to capsule j
● Agreement
– simply the scalar product û_{j|i} · v_j
Dynamic Routing
Routing Algorithm:
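The routing-by-agreement procedure can be sketched in numpy as follows; the shapes and the softmax-over-output-capsules convention follow the paper, while the concrete sizes are illustrative.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # v = ||s||^2 / (1 + ||s||^2) * s / ||s||, applied per capsule vector.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """u_hat: prediction vectors, shape (num_in, num_out, dim)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                  # routing logits b_ij
    for _ in range(num_iterations):
        # Coupling coefficients: softmax of b_i over the output capsules j.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)        # weighted sum s_j
        v = squash(s)                                # output capsules v_j
        b = b + np.einsum('ijd,jd->ij', u_hat, v)    # agreement update
    return v, c
```

With predictions that agree on one output capsule and cancel out on another, the coupling coefficients shift toward the agreeing capsule after a few iterations.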
Dynamic Routing
Loss function:
● For each digit capsule k, the loss is the margin loss
L_k = T_k max(0, m⁺ − ||v_k||)² + λ (1 − T_k) max(0, ||v_k|| − m⁻)²
● where T_k = 1 when digit k is present, m⁺ = 0.9, m⁻ = 0.1, and λ = 0.5 by default
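A direct numpy transcription of the margin loss, with λ = 0.5 down-weighting the loss for absent digit classes as in the paper:

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """v_norms: lengths of the digit-capsule vectors, shape (num_classes,).
    targets: one-hot vector, T_k = 1 if digit k is present."""
    present = targets * np.maximum(0.0, m_pos - v_norms) ** 2
    absent = lam * (1 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2
    return np.sum(present + absent)
```

A confident correct capsule (length above 0.9) and quiet wrong capsules (lengths below 0.1) incur zero loss.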
Capsules
Hierarchy of parts
Capsules
Inverse Rendering
Capsules
Inverse Rendering
Predicted outputs
Capsules
Inverse Rendering
Agreement: should only be routed to the 7
Predicted outputs
Capsules
Inverse Rendering
Predicted outputs
Dynamic Routing:
● b_ij = 0 for all i, j
● c_i = softmax(b_i)
(all coupling coefficients start at 0.5)
Capsules
Inverse Rendering
Predicted outputs
s_j = weighted sum
v_j = squash(s_j)
Output for round #1 (all coupling coefficients 0.5)
Capsules
Inverse Rendering
Predicted outputs
s_j = weighted sum
v_j = squash(s_j)
Output for round #1 (all coupling coefficients 0.5)
Huge agreement
Capsules
Inverse Rendering
Predicted outputs
s_j = weighted sum
v_j = squash(s_j)
Output for round #1 (all coupling coefficients 0.5)
Small disagreement
Capsules
Inverse Rendering
Predicted outputs
Dynamic Routing:
● b_ij updated by the agreement û_{j|i} · v_j
● c_i = softmax(b_i)
(updated coupling coefficients: 0.8, 0.2, 0.1, 0.9)
Capsules
Inverse Rendering
Predicted outputs
s_j = weighted sum
v_j = squash(s_j)
Output for round #2 (coupling coefficients 0.8, 0.2, 0.1, 0.9)
Clustering on agreement
What really happens:
Mean
Clustering on agreement
What really happens:
Weighted mean
Clustering on agreement
What really happens:
Weighted mean
Classification
Inverse Rendering
Loss function
Capsule Network Architecture
Reconstruction
Inverse Rendering
Loss function
Decoder
Neural Net
Reconstruction
Reconstruction
● Decoder structure to reconstruct a digit from the DigitCaps layer
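In the paper the decoder is three fully connected layers: 512 and 1024 ReLU units, then 784 sigmoid outputs (one per MNIST pixel). A forward-pass sketch with randomly initialized placeholder weights (not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Input: the 16-D activity vectors of all 10 digit capsules (16 * 10 = 160),
# with every capsule except the correct one masked to zero.
W1 = rng.standard_normal((160, 512)) * 0.01
W2 = rng.standard_normal((512, 1024)) * 0.01
W3 = rng.standard_normal((1024, 784)) * 0.01

def decode(masked_caps):
    h1 = relu(masked_caps @ W1)
    h2 = relu(h1 @ W2)
    return sigmoid(h2 @ W3)   # 784 pixel intensities in (0, 1)
```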
Reconstruction as a regularization method
● Forces the digit capsules to encode the instantiation parameters of the input digit
● Minimize the sum of squared differences between the outputs of the logistic units and the pixel intensities
● Loss = margin loss + α · reconstruction loss, with α = 0.0005
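Combining the two terms is then one line; α = 0.0005 comes from the slide, and the reconstruction term is the sum of squared pixel differences:

```python
import numpy as np

def total_loss(margin_loss_val, reconstruction, image, alpha=5e-4):
    # Reconstruction term is scaled down by alpha so it does not
    # dominate the margin loss during training.
    recon_loss = np.sum((reconstruction - image) ** 2)
    return margin_loss_val + alpha * recon_loss
```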
List of Content
● Motivation for Capsules
● Idea of Inverse Graphics
● Capsules
● Dynamic Routing Between Capsules
● Capsules on MNIST
● Conclusion
Capsules on MNIST
● Images have been shifted by up to 2 pixels in each direction with zero padding; no other data augmentation or model averaging
● Baseline: a standard CNN with three conv layers (256, 256, 128 channels, 5×5 kernels, stride 1) followed by two fully connected layers (328 and 192 units, with dropout)
● Number of parameters: baseline 35.4M, CapsNet 8.2M, and 6.8M without the reconstruction subnetwork
Individual Dimensions of a Capsule
Robustness to Affine Transformations
● Trained a CapsNet and a traditional CNN (with pooling) on a padded and translated MNIST training set
● Tested the networks on affNIST (MNIST digits with random small affine transformations)
● Under-trained CapsNet (99.23% on MNIST) achieved 79%
● Traditional CNN (99.22%) with a similar number of parameters achieved 66%
MultiMNIST
● Create the MultiMNIST dataset by overlaying a digit on top of another digit from a different class
● For each digit in MNIST they generate 1K MultiMNIST examples
● Training set size 60M, test set size 10M
MultiMNIST
CIFAR10
● Slight modification of the simple model used for MNIST, with 3 routing iterations
● Achieved 10.6% test error
● About what standard CNNs achieved when they were first applied to CIFAR10
List of Content
● Motivation for Capsules
● Idea of Inverse Graphics
● Capsules
● Dynamic Routing Between Capsules
● Capsules on MNIST
● Conclusion
Conclusion
● Achieved state-of-the-art accuracy on MNIST
● Spatial relations are preserved (equivariance)
– Promising for object detection and segmentation
● Dynamic routing works well for overlapping digits
● Robust to affine transformations
● Activation vectors are easier to interpret (scale, thickness, rotation, etc.)
● Ability to analyze the hierarchy of objects
Conclusion
● Not state of the art on CIFAR10
● No results on larger datasets (e.g. ImageNet)
● Slow to train, because of the inner loop in dynamic routing
Sources
● Slide 3: https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
● Slide 5: https://www.mathworks.com/discovery/convolutional-neural-network.html
● Slide 6 and 7: https://cs231n.github.io/convolutional-networks/#pool
● Slide 11: https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/clyj4jv/
● Slide 7: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/pooling_layer.html
● Slide 13 and 14 : https://kndrck.co/posts/capsule_networks_explained/
● Slide 8: https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
● Slide 9: https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952
Thank you !