Deep Networks and Kernel Methods
Edgar Marca
Grupo de Reconocimiento de Patrones e Inteligencia Artificial Aplicada — PUC, Lima, Perú
June 18th, 2015
Table of Contents
▶ Image Classification Problem
▶ Deep Networks
  ▶ Convolutional Neural Networks
  ▶ Software
  ▶ How to start
▶ Kernel Methods
  ▶ SVM
  ▶ The Kernel Trick
  ▶ History of Kernel Methods
  ▶ Software
  ▶ How to start
▶ Kernels and Deep Learning
  ▶ Convolutional Kernel Networks
  ▶ Deep Fried Convnets
Image Classification Problem
Figure: http://www.image-net.org/
Deep Networks
Human-level control through deep reinforcement learning
▶ Volodymyr Mnih et al., Human-level control through deep reinforcement learning.
Convolutional Neural Networks
Software
▶ Torch7 — http://torch.ch/
▶ Caffe — http://caffe.berkeleyvision.org/
▶ Minerva — https://github.com/dmlc/minerva
▶ Theano — http://deeplearning.net/software/theano/
How to start I
▶ Deep Learning Course by Nando de Freitas — https://www.youtube.com/watch?v=PlhFWT7vAEw&list=PLjK8ddCbDMphIMSXn-w1IjyYpHU3DaUYw
▶ Alex Smola Lecture on Deep Networks — https://www.youtube.com/watch?v=xZzZb7wZ6eE
▶ Convolutional Neural Networks for Visual Recognition — http://vision.stanford.edu/teaching/cs231n/
▶ Deep Learning, Spring 2015 — http://cilvr.cs.nyu.edu/doku.php?id=courses:deeplearning2015:start
How to start II
▶ Deep Learning for Natural Language Processing — http://cs224d.stanford.edu/
▶ Applied Deep Learning for Computer Vision with Torch — http://torch.ch/docs/cvpr15.html
▶ Deep Learning, an MIT Press book in preparation — http://www.iro.umontreal.ca/~bengioy/dlbook/
▶ Reading List — http://deeplearning.net/reading-list/
Kernel Methods
Linear Support Vector Machine
Figure: Linear Support Vector Machine, showing the decision surface ⟨w,x⟩ + b = 0, the margin hyperplanes ⟨w,x⟩ + b = 1 and ⟨w,x⟩ + b = −1, and the margin between them.
Linear Support Vector Machine
Linear SVM — Primal Problem
Given a linearly separable training set

D = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} ⊂ Rⁿ × {+1, −1},

we can compute the maximum-margin decision surface ⟨w*, x⟩ = b* by solving the convex program

(P)  min_{w,b} φ(w, b) = (1/2)⟨w, w⟩
     subject to y_i⟨w, x_i⟩ ≥ 1 + y_i b, for i = 1, ..., l,   (1)

where (x_i, y_i) ∈ D ⊂ Rⁿ × {−1, +1}.

1. The objective function does not depend on b.
2. The bias b appears only in the constraints.
3. The number of constraints equals the number of training points.
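As a concrete illustration (not part of the original slides), the primal program (P) can be solved directly for a tiny toy dataset with a general-purpose constrained optimizer. The data and the choice of SciPy's SLSQP solver are assumptions made for this sketch:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data in R^2 (hypothetical example).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables packed as theta = (w_1, w_2, b).
def objective(theta):
    w = theta[:2]
    return 0.5 * w @ w  # phi(w, b) = (1/2) <w, w>

# One constraint per training point: y_i <w, x_i> - y_i b - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": lambda th, i=i: y[i] * (X[i] @ th[:2]) - y[i] * th[2] - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w_star, b_star = res.x[:2], res.x[2]
# Decision rule of the slides: sign(<w*, x> - b*).
pred = np.sign(X @ w_star - b_star)
```

For this symmetric dataset the support vectors are (2, 2) and (−2, −2), giving w* = (1/4, 1/4) and b* = 0.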
Linear Support Vector Machine
Linear SVM — Dual Problem

(DP)  max_α h(α) = ∑_{i=1}^{l} α_i − (1/2) ∑_{i=1}^{l} ∑_{j=1}^{l} α_i α_j y_i y_j ⟨x_i, x_j⟩
      subject to ∑_{i=1}^{l} α_i y_i = 0,
                 α_i ≥ 0, for i = 1, ..., l.

The bias b* is computed from w* as follows:

b⁺ = min {⟨w*, x⟩ | (x, y) ∈ D, y = +1}
b⁻ = max {⟨w*, x⟩ | (x, y) ∈ D, y = −1}

Then b* = (b⁺ + b⁻)/2.

The training vectors with α_i > 0 are called support vectors.
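The recipe for b* can be checked numerically. The sketch below is an assumption, not from the slides: it uses scikit-learn's SVC with a very large C to approximate the hard-margin dual, recovers w* = ∑ α*_i y_i x_i, and computes b⁺, b⁻, and b* exactly as defined above:

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (hypothetical): positives on the right, negatives on the left.
X = np.array([[3.0, 0.0], [4.0, 0.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin dual problem.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w_star = clf.coef_[0]              # w* = sum_i alpha*_i y_i x_i
scores = X @ w_star
b_plus = scores[y == 1].min()      # b+ over the positive class
b_minus = scores[y == -1].max()    # b- over the negative class
b_star = (b_plus + b_minus) / 2.0  # b* = (b+ + b-) / 2

# Dual feasibility: sum_i alpha_i y_i = 0 (dual_coef_ stores alpha_i * y_i).
balance = clf.dual_coef_.sum()
```

Here the support vectors are (3, 0) and (0, 0), so w* = (2/3, 0) and b* = 1, and every point satisfies sign(⟨w*, x⟩ − b*) = y.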
Linear Support Vector Machine
f̂(x) = sign( ∑_{i=1}^{l} α*_i y_i ⟨x_i, x⟩ − b* )
The Kernel Trick
Motivation
▶ How can we separate data that is not linearly separable?
▶ How can we apply algorithms that work for linearly separable data and depend only on inner products?
R to R² Case
How to separate the two classes?
φ : R → R², φ(x) = (x, x²)

Figure: Separating the two classes of points by transforming them into a higher-dimensional space where the data is separable.
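A minimal numeric version of this picture (with toy data assumed for the sketch): points on the real line with class −1 inside [−1, 1] and class +1 outside cannot be split by any single threshold, but the lift φ(x) = (x, x²) makes them separable by the horizontal line x₂ = 1:

```python
import numpy as np

# Hypothetical 1-D data: class +1 outside [-1, 1], class -1 inside.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.where(np.abs(x) > 1, 1, -1)

# No single threshold on R puts all of one class on one side...
one_cut_works = any(
    ((x <= t) == (y == -1)).all() or ((x <= t) == (y == 1)).all()
    for t in np.sort(x)
)

# ...but after phi(x) = (x, x^2) the line x2 = 1 separates the classes.
phi = np.stack([x, x**2], axis=1)
separable = bool((phi[y == 1, 1] > 1).all() and (phi[y == -1, 1] < 1).all())
```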
R² to R³ Case
Figure: Data which is not linearly separable.
R² to R³ Case
A simulation
Figure: SVM with polynomial kernel visualization.
Idea
Figure: φ is a non-linear mapping from the input space to the feature space.
Non Linear Support Vector Machine
Now we can use a non-linear function φ to map the data from the input space to a higher-dimensional space.
Non Linear Support Vector Machine
f̂(x) = sign( ∑_{i=1}^{l} α*_i y_i ⟨φ(x_i), φ(x)⟩ − b* )
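To make the mapping concrete, here is a sketch (data and feature map chosen for illustration, not taken from the slides) using the classic lift φ(x₁, x₂) = (x₁², √2·x₁x₂, x₂²), for which ⟨φ(u), φ(v)⟩ = ⟨u, v⟩². XOR-style data that no line in R² separates becomes linearly separable in R³:

```python
import numpy as np

def phi(x):
    # phi: R^2 -> R^3 with <phi(u), phi(v)> = <u, v>^2.
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

# XOR-style data: not linearly separable in R^2 (hypothetical example).
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

Z = np.array([phi(x) for x in X])
# In R^3 the middle coordinate sqrt(2)*x1*x2 alone separates the classes.
separable = bool((Z[y == 1, 1] > 0).all() and (Z[y == -1, 1] < 0).all())

# Sanity check of the inner-product identity on two arbitrary points.
u, v = np.array([0.5, -2.0]), np.array([3.0, 1.0])
identity_holds = abs(phi(u) @ phi(v) - (u @ v) ** 2) < 1e-9
```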
Definition 3.1 (Kernel)
Let X be a non-empty set. A function k : X × X → K is called a kernel on X if and only if there exist a Hilbert space H and a mapping Φ : X → H such that for all s, t ∈ X it holds that

k(t, s) := ⟨Φ(t), Φ(s)⟩_H.   (2)

The function Φ is called a feature map and H a feature space of k.
Example 3.2
Consider X = R and the function k defined by

k(s, t) = st = ⟨(s/√2, s/√2), (t/√2, t/√2)⟩,

where the feature maps are Φ(s) = s and Φ̃(s) = (s/√2, s/√2), and the feature spaces are H = R and H̃ = R² respectively.
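A quick numeric check of Example 3.2 (a sketch; the test values are chosen arbitrarily): both feature maps realize the same kernel k(s, t) = st.

```python
import numpy as np

def k(s, t):
    # Kernel on X = R from Example 3.2.
    return s * t

def phi(s):
    # Feature map into H = R.
    return s

def phi_tilde(s):
    # Alternative feature map into H~ = R^2.
    return np.array([s / np.sqrt(2.0), s / np.sqrt(2.0)])

s, t = 3.0, -1.5
via_phi = phi(s) * phi(t)                    # inner product in R
via_phi_tilde = phi_tilde(s) @ phi_tilde(t)  # inner product in R^2
```

The example shows that feature maps and feature spaces of a kernel are not unique.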
Non Linear Support Vector Machines
Using the kernel trick we can replace ⟨φ(x_i), φ(x)⟩ by a kernel k(x_i, x):
f̂(x) = sign( ∑_{i=1}^{l} α*_i y_i k(x_i, x) − b* )
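As a sketch of the trick in practice (the dataset and hyperparameters here are assumptions), scikit-learn's SVC evaluates k(x_i, x) directly and never materializes φ: a linear kernel cannot classify XOR-style data, while an RBF kernel separates it.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style data: not linearly separable in R^2 (hypothetical example).
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

# A linear kernel cannot classify all four points correctly...
linear_acc = (SVC(kernel="linear").fit(X, y).predict(X) == y).mean()

# ...but the kernel trick with an RBF kernel k(u, v) = exp(-gamma ||u - v||^2) can.
rbf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)
rbf_acc = (rbf.predict(X) == y).mean()
```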
History of Kernel Methods
Timeline
Table: Timeline of Support Vector Machines Algorithm Development
1909 • Mercer's Theorem — James Mercer, "Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations".
1950 • Moore–Aronszajn Theorem — Nachman Aronszajn, "Reproducing Kernel Hilbert Spaces".
1964 • Geometrical interpretation of kernels as inner products in a feature space — Aizerman, Braverman and Rozonoer, "Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning".
1964 • Original SVM algorithm — Vladimir Vapnik and Alexey Chervonenkis, "A Note on One Class of Perceptrons".
1965 • Cover's Theorem — Thomas Cover, "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition".
1992 • Support Vector Machines — Bernhard Boser, Isabelle Guyon and Vladimir Vapnik, "A Training Algorithm for Optimal Margin Classifiers".
1995 • Soft-margin Support Vector Machines — Corinna Cortes and Vladimir Vapnik, "Support-Vector Networks".
Software
▶ LibSVM — https://www.csie.ntu.edu.tw/~cjlin/libsvm/
▶ SVMLight — http://svmlight.joachims.org/
▶ scikit-learn — http://scikit-learn.org/stable/modules/svm.html
How to start
▶ Introduction to Support Vector Machines — https://beta.oreilly.com/learning/intro-to-svm
▶ Lutz H. Hamel, Knowledge Discovery with Support Vector Machines.
▶ John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis.
Kernels and Deep Learning
▶ Julien Mairal et al., Convolutional Kernel Networks.▶ Zichao Yang et al., Deep Fried Convnets.
Convolutional Kernel Networks
Deep Fried Convnets
▶ Quoc Viet Le et al., Fastfood: Approximate Kernel Expansions in Loglinear Time.
Questions?
Thanks