Deep Learning with CNNs
University of Rome "La Sapienza", Department of Computer, Control and Management Engineering "A. Ruberti"
Valsamis Ntouskos, ALCOR Lab
Outline
• Introduction - Motivation
• Theoretical aspects
• Brief history of CNNs
• Evolution of CNNs for image classification
• Applications of CNNs in computer vision
Compositional Models
Learned End-to-End
Hierarchy of Representations
- vision: pixel, motif, part, object
- text: character, word, clause, sentence
- speech: audio, band, phone, word
(learning moves representations from concrete to abstract)
Slides from Caffe framework tutorial @ CVPR2015
Compositional Models
Learned End-to-End
Back-propagation jointly learns
all of the model parameters to
optimize the output for the task.
Slides from Caffe framework tutorial @ CVPR2015
Motivation
Up to now, we have treated inputs as generic feature vectors.
In some cases, inputs have a special structure:
• Audio
• Images
• Videos
Signals: numerical representations of physical quantities
Deep learning can be applied directly to signals by using suitable operators.
Motivation
Audio: 1D data - (variable-length) vectors of samples
e.g. . . . 0.0468 0.0468 0.0468 0.0390 0.0390 0.0390 0.0546 0.0625 0.0625 0.0390 0.0312 0.0468 0.0625 . . .
Motivation
Images: 2D data - matrices
Video: a sequence of images sampled through time - 3D data
What is a CNN?
Some theory
Convolution
From Steve Seitz and Richard Szeliski's slides (https://courses.cs.washington.edu/courses/cse576/08sp/)
Interactive examples: http://setosa.io/ev/image-kernels/
Some theory
Convolution
• Image filtering is based on convolution with special kernels
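A minimal sketch of the operation in NumPy (a naive reference implementation for illustration, not the code any framework actually uses):

import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    k = kernel[::-1, ::-1]  # flip the kernel: convolution rather than cross-correlation
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

image = np.random.rand(8, 8)
edge_kernel = np.array([[-1., -1., -1.],
                        [-1.,  8., -1.],
                        [-1., -1., -1.]])  # a classic edge-detection kernel
print(conv2d(image, edge_kernel).shape)  # (6, 6)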
Some theory
Pooling
• Introduces subsampling
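For example, max pooling over non-overlapping 2×2 windows halves the spatial resolution of a feature map. A minimal NumPy sketch (assuming a single-channel input whose sides are multiples of the pool size):

import numpy as np

def max_pool2d(x, size=2):
    """Max pooling over non-overlapping size×size windows (stride = size)."""
    h, w = x.shape
    x = x[:h // size * size, :w // size * size]  # crop to a multiple of the pool size
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(x))
# [[ 5.  7.]
#  [13. 15.]]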
Some theory
Activation
Standard way to model a neuron
f(x) = tanh(x) or f(x) = 1 / (1 + e^(-x))
Very slow to train (saturation)
Non-saturating nonlinearity (ReLU): f(x) = max(0, x)
Quick to train
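A quick NumPy illustration of why saturation matters: the sigmoid gradient s(1 - s) vanishes for large |x|, while the ReLU gradient stays at 1 for every positive input (printed values are approximate):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(x)
print(s * (1 - s))            # ~[4.5e-05, 0.197, 0.25, 0.197, 4.5e-05]: saturates
print((x > 0).astype(float))  # [0., 0., 0., 1., 1.]: ReLU gradient does not shrink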
Some theory
Regularization
Dropout
• Applied to the fully-connected layers
• During training, each node is dropped with probability α
• During testing, all nodes are kept and their outputs are scaled by the retention probability (1 - α)
Image from Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
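A toy NumPy sketch of the two regimes described above (many frameworks instead use the equivalent "inverted dropout", which rescales at training time):

import numpy as np

rng = np.random.default_rng(0)

def dropout_train(a, alpha=0.5):
    """Training: zero each activation independently with probability alpha."""
    mask = (rng.random(a.shape) >= alpha).astype(a.dtype)
    return a * mask

def dropout_test(a, alpha=0.5):
    """Testing: keep every unit, scale outputs by the retention probability."""
    return a * (1.0 - alpha)

a = rng.random(10)
print(dropout_train(a))  # roughly half of the entries are zeroed
print(dropout_test(a))   # all entries scaled by 0.5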
Some theory
Every convolutional layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations.
(Figure: a regular 3-layer neural network, shown for comparison)
Material from Fei-Fei’s group
Some theory
Each neuron is connected to a local region of the input volume spatially, but to all channels.
The neurons still compute a dot product of their weights with the input, followed by a non-linearity.
Material from Fei-Fei’s group
Terminology
• Kernel: matrix corresponding to convolution / filter
• Depth: number of feature maps / filters (d)
• Depth slice: a single feature map
• Padding: zero-filled rows/columns added around the input (p)
• Stride: step of the sliding kernel (s)
– e.g. a value of 1 moves the kernel one pixel at a time
• Receptive field: 2D dimensions of the kernel (w_k × h_k)
• Weight or parameter sharing: the parameters of the filter are shared across a depth slice
– i.e. the parameters are the same for all units of the same feature map
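With this terminology, the spatial size of a convolution's output follows the standard formula out = (in - k + 2p)/s + 1 per dimension. A small helper to illustrate it (the 227×227 / 11×11 / stride-4 example below is the usual reading of AlexNet's first layer):

def conv_output_size(w, h, wk, hk, p=0, s=1):
    """Output width/height of a convolution with a wk×hk kernel, padding p, stride s."""
    out_w = (w - wk + 2 * p) // s + 1
    out_h = (h - hk + 2 * p) // s + 1
    return out_w, out_h

print(conv_output_size(227, 227, 11, 11, p=0, s=4))  # (55, 55)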
Algorithms
• Each* neuron/layer is differentiable!
• Just apply backpropagation (chain-rule)
• Use standard gradient-based optimization algorithms
(SGD, AdaGrad, …)
• The devil is in the details, though …
▪Choosing hyperparameters / loss-function
▪Exploding/Vanishing gradients – batch normalization
▪Overfitting – Regularization
▪Cost of performing experiments
▪Convergence
▪…
*What about max-pooling? It is not differentiable everywhere, but in practice the gradient is simply routed to the input that attained the maximum.
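As a concrete illustration of the optimization loop, a bare-bones SGD sketch on a single linear unit with a squared-error loss; the learning rate here is exactly the kind of hyperparameter that has to be chosen with care:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)  # toy targets

w = np.zeros(3)
lr = 0.01                                     # learning rate (a key hyperparameter)
for epoch in range(50):
    for xi, yi in zip(X, y):
        grad = 2 * (xi @ w - yi) * xi         # gradient of the squared error (chain rule)
        w -= lr * grad                        # SGD update
print(w)  # ≈ [1.0, -2.0, 0.5]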
Kernels and Feature Maps
Material from Fei-Fei’s group
Brief history of CNNs
Foundational work was done in the middle of the 1900s:
• 1940s-1960s: Cybernetics [McCulloch and Pitts 1943, Hebb 1949, Rosenblatt 1958]
• 1980s-mid 1990s: Connectionism [Rumelhart 1986, Hinton 1989]
• 1990s: modern convolutional networks [LeCun et al. 1998], LSTM [Hochreiter & Schmidhuber 1997], MNIST and other large datasets
Brief history of CNNs
Hubel & Wiesel [60s]: simple & complex cells architecture
Fukushima's Neocognitron [70s]
Yann LeCun's early CNNs [80s]
Brief history of CNNs
Convolutional Networks: 1989
LeNet: a layered model composed of convolution and subsampling operations followed
by a holistic representation and ultimately a classifier for handwritten digits. [ LeNet ]
Recent success
• Parallel Computation (GPU)
• Larger training sets
• International Competitions
• Theoretical advancements
– Dropout
– ReLUs
– Batch Normalization
Recent success
Better Hardware – GPUs
• CUDA, Jetson TX1, TK1
• Android lib, demo
• OpenCL branch
Recent success
Larger training sets: ImageNet
• Over 15M labeled high-resolution images
• Roughly 22K categories
• Collected from the web and labeled via Amazon Mechanical Turk
Recent success
Competitions: ILSVRC
• Annual competition of image classification at large scale
• 1.2M images in 1K categories
• Classification: make 5 guesses about the image label
(Figure: visually similar classes, e.g. Entlebucher vs. Appenzeller)
Evolution of CNNs for image classification
Convolutional Nets: 2012 – AlexNet
AlexNet: a layered model composed of convolution, subsampling, and further operations followed by a holistic representation; all in all, a landmark classifier on ILSVRC12. [ AlexNet ]
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 Winners: ~6.6% Top-5 error
- GoogLeNet: composition of multi-scale dimension-reduced modules
+ depth
+ data
+ dimensionality reduction
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 Winners: ~6.6% Top-5 error
- VGG: 16 layers of 3x3 convolution interleaved with max pooling + 3 fully-connected layers
+ depth
+ data
+ dimensionality reduction
Evolution of CNNs for image classification
Convolutional Nets: 2015
ResNet
ILSVRC15 Winner: ~3.6% Top-5 error
Intuition: it is easier to learn a zero mapping (a residual) than the identity function, so each block learns a residual F(x) and outputs F(x) + x
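A schematic residual block in NumPy; this is only a sketch of the skip connection y = x + F(x), not the actual ResNet block (which uses convolutions, batch normalization, and careful initialization):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = relu(x + F(x)): the block only has to learn the residual F.
    If W1 and W2 are near zero, the block is close to the identity mapping."""
    f = relu(x @ W1) @ W2   # F(x): two small layers standing in for the conv layers
    return relu(x + f)      # the skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # a batch of 4 feature vectors
W1 = 0.01 * rng.normal(size=(8, 8))  # near-zero weights ...
W2 = 0.01 * rng.normal(size=(8, 8))  # ... so the block starts out close to the identity
print(np.abs(residual_block(x, W1, W2) - relu(x)).max())  # tiny difference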
Evolution of CNNs for image classification
Reasonable questions
• Is this just for a particular dataset? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
Object localization [R-CNN, HyperColumns, OverFeat, etc.]
Pose estimation [Tompson et al., CVPR'15]
• Is this just for a particular task? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
Semantic segmentation [Pinheiro, Collobert, Dollár, ICCV'15]
• Is this just for a particular task? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Fine Tuning
Take a pre-trained model and fine-tune it to new tasks [DeCAF] [Zeiler-Fergus] [OverFeat]
Example: Dogs vs. Cats Kaggle competition - top 10 in 10 minutes (© kaggle.com)
(Diagram: a model pre-trained on lots of data, e.g. ImageNet, is transferred to your task, e.g. style recognition)
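A toy NumPy sketch of the recipe: reuse the pre-trained layers as a frozen feature extractor and train only a new classification head on the new task. All names, shapes, and data below are illustrative stand-ins, not any particular model:

import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" feature extractor: frozen weights (a stand-in for the conv layers)
W_frozen = rng.normal(size=(20, 16)) / np.sqrt(20)

def features(x):
    """Frozen part of the network, reused as-is on the new task."""
    return np.maximum(0.0, x @ W_frozen)

# New task: a small dataset and a new head trained from scratch
X_new = rng.normal(size=(200, 20))
y_new = (X_new[:, 0] > 0).astype(float)   # toy binary labels
w_head, lr = np.zeros(16), 0.5

for _ in range(500):                      # only the head is updated
    f = features(X_new)
    p = 1.0 / (1.0 + np.exp(-(f @ w_head)))         # sigmoid classifier
    w_head -= lr * f.T @ (p - y_new) / len(y_new)   # logistic-loss gradient step

pred = 1.0 / (1.0 + np.exp(-(features(X_new) @ w_head))) > 0.5
print((pred == y_new).mean())             # training accuracy well above chance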
Pixelwise Prediction
Fully convolutional networks for pixel prediction, in particular semantic segmentation
- end-to-end learning
- efficient inference and learning: 100 ms per-image prediction
- multi-modal, multi-task
Applications
- semantic segmentation
- denoising
- depth estimation
- optical flow
Jon Long*, Evan Shelhamer*, Trevor Darrell. CVPR'15
Dealing with sequences
Recurrent Nets and Long Short-Term Memory networks (LSTMs) are sequential models for
- video
- language
- dynamics
learned by backpropagation through time
Recurrent Networks for Sequences
LRCN: Long-term Recurrent Convolutional Network
- activity recognition (sequence-in)
- image captioning (sequence-out)
- video captioning (sequence-to-sequence)
LRCN: recurrent + convolutional for visual sequences
Dealing with sequences
Visual Sequence Tasks
Jeff Donahue et al. CVPR’15
Based on Long short-term memory (LSTM) layers
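For reference, a single LSTM step written out in NumPy with the standard gate equations; the shapes and initialization here are illustrative only:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM time step. W maps [x, h] to the four gate pre-activations."""
    z = np.concatenate([x, h]) @ W + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c + i * g                             # new cell state (the "memory")
    h = o * np.tanh(c)                            # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = 0.1 * rng.normal(size=(n_in + n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):  # unroll over a short input sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)  # (16,)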
What’s next?
Various questions/problems are still open:
• Learning with constraints / on manifolds
• Using high-level knowledge/structure
• Exploring the mathematics of the networks
– what types of functions can they represent?
– are these functions useful/interesting?
– convergence/efficiency
• Rotation invariance (group operators)
• CNNs can be easily 'fooled' (adversarial examples)
• …
Resources
Frameworks:
• Caffe/Caffe 2 (UC Berkeley) | C/C++, Python, Matlab
• TensorFlow (Google) | C/C++, Python, Java, Go
• Theano (U Montreal) | Python
• CNTK (Microsoft) | Python, C++ , C#/.Net, Java
• Torch/PyTorch (Facebook) | Lua/Python
• MxNet (DMLC) | Python, C++, R, Perl, …
• Darknet (Redmon J.) | C
• …
Resources
High-level libraries:
• Keras | Backends: TensorFlow (TF), Theano
Models:
• Depends on the framework, e.g.
– https://github.com/BVLC/caffe/wiki/Model-Zoo (Caffe)
– https://github.com/tensorflow/models/tree/master/research (TF)
Interactive Interfaces:
• DIGITS (NVIDIA) | Caffe, TF, Torch
• TensorBoard (TF)
Tools:
• http://ethereon.github.io/netscope (for networks defined in protobuf)