An Introduction to Support Vector Machines Classification
Lorenzo Rosasco ([email protected])
Department of Brain and Cognitive Sciences, MIT
6.783, Biomedical Decision Support
Friday, October 30, 2009
A typical problem
We have a cohort of patients from two groups, say A and B.
We wish to devise a classification rule to distinguish patients of one group from patients of the other group.
Learning and Generalization
Goal: correctly classify new patients
Plan
1. Linear SVM
2. Non Linear SVM: Kernels
3. Tuning SVM
4. Beyond SVM: Regularization Networks
Learning from Data
To make predictions we need information about the patients:
patient 1: $x = (x_1, \dots, x_n)$
patient 2: $x = (x_1, \dots, x_n)$
....
patient $\ell$: $x = (x_1, \dots, x_n)$
Linear Model
Patients of class A are labeled $y = 1$.
Patients of class B are labeled $y = -1$.
Classification rule: $\mathrm{sign}(w \cdot x)$, where
$$w \cdot x = \sum_{j=1}^{n} w_j x_j$$
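The rule is just the sign of an inner product. A minimal numpy sketch with made-up numbers (not from the slides):

```python
import numpy as np

# Toy data: one row per patient, one column per measurement.
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [-1.0, -1.5],
              [-2.0, -0.5]])
w = np.array([0.8, 0.6])  # a candidate weight vector

# Classification rule: sign(w . x), applied to every patient at once.
predictions = np.sign(X @ w)
print(predictions)  # entries in {-1.0, 1.0}
```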
1D Case
[Figure: points on the real line with labels $y = 1$ and $y = -1$; the rule predicts $y = 1$ where $w \cdot x > 0$ and $y = -1$ where $w \cdot x < 0$, with decision boundary $w \cdot x = 0$.]
How do we find a good solution?
2D classification problem: each patient is a point $x = (x_1, x_2)$ with label $y = 1$ or $y = -1$.
How do we find a good solution?
[Figure: a separating hyperplane $w \cdot x = 0$ splits the space into the half-spaces $w \cdot x > 0$ and $w \cdot x < 0$.]
How do we find a good solution?
Many different hyperplanes separate the training data perfectly: which one should we pick? Idea: choose the hyperplane that maximizes the margin $M$, the distance from the decision boundary to the closest training points.
Maximum Margin Hyperplane
....with little effort... one can show that maximizing the margin $M$ is equivalent to maximizing $\frac{1}{\|w\|}$, that is, minimizing $\|w\|^2$.
Linear and Separable SVM
$$\min_{w \in \mathbb{R}^n} \|w\|^2$$
$$\text{subject to: } y_i (w \cdot x_i) \geq 1, \quad i = 1, \dots, \ell$$
Typically an offset term is added to the solution: $f(x) = \mathrm{sign}(w \cdot x + b)$.
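As a sketch, the separable problem can be approximated with an off-the-shelf solver by taking a very large C; scikit-learn (which wraps libSVM) is an assumed choice here, not one the slides make:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin (separable) problem.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)    # w and the offset b
print(clf.predict([[1.0, 1.5]]))    # sign(w . x + b)
```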
A More General Algorithm
There are two things we would like to improve:
Allow for errors
Non-linear models
Measuring errors
Measuring errors (cont)
[Figure: points that are misclassified or fall inside the margin; the violation of each constraint is measured by a slack variable $\xi_i$.]
Linear SVM
$$\min_{w \in \mathbb{R}^n,\ \xi \in \mathbb{R}^\ell,\ b \in \mathbb{R}} \; C \sum_{i=1}^{\ell} \xi_i + \frac{1}{2}\|w\|^2$$
$$\text{subject to: } y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad i = 1, \dots, \ell$$
$$\xi_i \geq 0, \quad i = 1, \dots, \ell$$
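The slides give no code, but this primal problem can be transcribed almost literally into a generic convex solver. A sketch assuming cvxpy and toy data:

```python
import numpy as np
import cvxpy as cp

# Toy data: rows are patients, labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [0.5, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
l, n = X.shape
C = 1.0

w = cp.Variable(n)
b = cp.Variable()
xi = cp.Variable(l)

# C * sum(xi) + (1/2)||w||^2, subject to margin and slack constraints.
objective = cp.Minimize(C * cp.sum(xi) + 0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()
print(w.value, b.value, xi.value)
```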
Optimization
How do we solve this minimization problem?
(...and why do we call it SVM anyway?)
Some facts
Representer Theorem
Dual Formulation
Box Constraints and Support Vectors
Representer Theorem
The solution to the minimization problem can be written as
$$w \cdot x = \sum_{i=1}^{\ell} c_i (x \cdot x_i)$$
Dual Problem
The coefficients can be found by solving:
$$\max_{\alpha \in \mathbb{R}^\ell} \; \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \alpha^T Q \alpha$$
$$\text{subject to: } \sum_{i=1}^{\ell} y_i \alpha_i = 0, \qquad 0 \leq \alpha_i \leq C, \quad i = 1, \dots, \ell$$
Here $Q_{ij} = y_i y_j (x_i \cdot x_j)$ and $\alpha_i = c_i / y_i$.
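As a small sketch of the ingredients, $Q$ is just the label-weighted Gram matrix (numpy assumed, toy data):

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [0.5, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Q_ij = y_i y_j (x_i . x_j): the Gram matrix weighted by the labels.
Q = np.outer(y, y) * (X @ X.T)
print(Q)  # one row and column per training point
```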
Optimality conditions
With little effort one can show that if $y_i f(x_i) > 1$ then $\alpha_i = 0$.
The solution is sparse: some training points do not contribute to the solution.
Sparse Solution
Note that:
The solution depends only on the training set points (no dependence on the number of features!).
The training points with $\alpha_i \neq 0$ are called support vectors.
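With a libSVM-style solver the sparsity is easy to inspect; a sketch assuming scikit-learn, whose `support_` and `dual_coef_` attributes expose the points with $\alpha_i \neq 0$:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [0.5, -1.0], [0.2, 0.1]])
y = np.array([1, 1, -1, -1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_)    # indices of the support vectors (alpha_i != 0)
print(clf.dual_coef_)  # their nonzero coefficients y_i * alpha_i
```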
Feature Map
Map the inputs to a feature space through a map $\Phi$ and look for a linear solution there:
$$f(x) = w \cdot \Phi(x)$$
A Key Observation
The solution depends on the inputs only through $Q$:
$$\max_{\alpha \in \mathbb{R}^\ell} \; \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \alpha^T Q \alpha$$
$$\text{subject to: } \sum_{i=1}^{\ell} y_i \alpha_i = 0, \qquad 0 \leq \alpha_i \leq C, \quad i = 1, \dots, \ell$$
Idea: in place of $Q_{ij} = y_i y_j (x_i \cdot x_j)$, use $Q_{ij} = y_i y_j (\Phi(x_i) \cdot \Phi(x_j))$.
Kernels and Feature Maps
The crucial quantity is the inner product
$$K(x, t) = \Phi(x) \cdot \Phi(t),$$
called a kernel. A function is a kernel if it is:
symmetric
positive definite
Examples of Kernels
Linear kernel: $K(x, x') = x \cdot x'$
Gaussian kernel: $K(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}}$, $\sigma > 0$
Polynomial kernel: $K(x, x') = (x \cdot x' + 1)^d$, $d \in \mathbb{N}$
For specific applications, designing an effective kernel is a challenging problem.
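A quick sketch of the three kernels as plain functions (numpy assumed; the parameter names `sigma` and `d` follow the formulas above):

```python
import numpy as np

def linear_kernel(x, t):
    return np.dot(x, t)

def gaussian_kernel(x, t, sigma=1.0):
    return np.exp(-np.linalg.norm(x - t) ** 2 / (2 * sigma ** 2))

def polynomial_kernel(x, t, d=2):
    return (np.dot(x, t) + 1) ** d

x, t = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(linear_kernel(x, t), gaussian_kernel(x, t), polynomial_kernel(x, t))
```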
Non Linear SVM
Summing up:
Define the feature map, either explicitly or via a kernel
Find the linear solution in the feature space
Use the same solver as in the linear case
The representer theorem now gives:
$$w \cdot \Phi(x) = \sum_{i=1}^{\ell} c_i (\Phi(x) \cdot \Phi(x_i)) = \sum_{i=1}^{\ell} c_i K(x, x_i)$$
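A sketch on XOR-like data, which no linear rule can separate (scikit-learn's Gaussian kernel assumed); the decision function being evaluated is exactly the kernel expansion above plus the offset $b$:

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(X, y)
# decision_function(x) evaluates sum_i c_i K(x, x_i) + b.
print(clf.decision_function(X))
print(clf.predict([[0.9, 1.1]]))
```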
Example in 1D
[Figure: 1D data with labels $y = 1$ and $y = -1$, separated by a non-linear decision function.]
Software
SVM Light: http://svmlight.joachims.org
SVM Torch: http://www.torch.ch
libSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Model Selection
We have to fix the regularization parameter C
We have to choose the kernel (and its parameter)
Using default values is usually a BAD idea
Regularization Parameter
Large C: we try to minimize errors, ignoring the complexity of the solution.
Small C: we ignore the errors to obtain a simple solution.
$$\min_{w \in \mathbb{R}^n,\ \xi \in \mathbb{R}^\ell,\ b \in \mathbb{R}} \; C \sum_{i=1}^{\ell} \xi_i + \frac{1}{2}\|w\|^2$$
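One way to see the trade-off is to watch the number of support vectors as C varies; a hedged sketch on synthetic data (scikit-learn assumed):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C tolerates margin violations (simpler solution, usually
    # more support vectors); large C penalizes the errors hard.
    print(C, clf.support_.size)
```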
Which Kernel?
For very high dimensional data the linear kernel is often the default choice:
it allows computational speed-ups
it is less prone to overfitting
The Gaussian kernel with proper tuning is another common choice.
Whenever possible, use prior knowledge to build problem-specific features or kernels.
2D demo: http://svm.dcs.rhbnc.ac.uk/pagesnew/GPat.shtml
Practical Rules
We can choose C (and the kernel parameter) via cross validation:
Holdout set
K-fold cross validation
K = # of examples is called Leave One Out
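A sketch of the K-fold approach over a grid of C and kernel-parameter values, assuming scikit-learn's GridSearchCV (the slides themselves list other packages):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the patient data.
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = np.where(X[:, 0] + 0.5 * rng.randn(60) > 0, 1, -1)

grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)  # 5-fold CV
search.fit(X, y)
print(search.best_params_, search.best_score_)
```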
K-Fold CV
We have to compute several solutions...
A Rule of Thumb
This is what the CV error typically looks like.
Fix a reasonable kernel, then fine tune C.
[Figure: cross-validation error as a function of the (log-scale) kernel parameter and C.]
Which values do we start from?
For the Gaussian kernel, pick $\sigma$ of the order of the average distance between training points.
Take the min (and max) value of C as the value for which the training set error does not increase (decrease) anymore.
$$K(X_i, X_j) = \exp\left(-\frac{\|X_i - X_j\|^2}{2\sigma^2}\right)$$
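The $\sigma$ heuristic is a one-liner; a sketch using scipy's pairwise distances (an assumed dependency):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.RandomState(0)
X = rng.randn(50, 4)

# Starting guess: sigma of the order of the average pairwise distance.
sigma0 = np.mean(pdist(X))
print(sigma0)
```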
Computational Considerations
The training time depends on the parameters: the more we fit, the slower the algorithm.
Typically the computational burden is in the selection of the regularization parameter (solvers for the regularization path).
Regularization Networks
SVMs are an example of a family of algorithms of the form:
$$C \sum_{i=1}^{\ell} V(y_i, w \cdot \Phi(x_i)) + \|w\|^2$$
$V$ is called the loss function.
Hinge Loss
[Figure: the 0-1 loss and the hinge loss $V(y\,w \cdot \Phi(x))$ plotted against $y\,w \cdot \Phi(x)$.]
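For concreteness, the hinge loss is $\max(0, 1 - y f(x))$ (standard, though the formula is not spelled out on the slide). A tiny numpy comparison with the 0-1 loss:

```python
import numpy as np

def hinge_loss(margin):          # margin = y * f(x)
    return np.maximum(0.0, 1.0 - margin)

def zero_one_loss(margin):       # 1 for a misclassification, else 0
    return (margin <= 0).astype(float)

margins = np.linspace(-2.0, 2.0, 9)
print(hinge_loss(margins))
print(zero_one_loss(margins))
```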
Loss functions
[Figure: plots of several common loss functions.]
Representer Theorem
For a LARGE class of loss functions:
$$w \cdot \Phi(x) = \sum_{i=1}^{\ell} c_i (\Phi(x) \cdot \Phi(x_i)) = \sum_{i=1}^{\ell} c_i K(x, x_i)$$
The way we compute the coefficients depends on the loss function considered.
Regularized LS
The simplest, yet powerful, algorithm is probably RLS.
Square loss: $V(y, w \cdot \Phi(x)) = (y - w \cdot \Phi(x))^2$
Algorithm: solve $\left(Q + \frac{1}{C} I\right) c = y$, where $Q_{ij} = K(x_i, x_j)$.
Leave one out can be computed at the price of one (!!!) solution.
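A minimal numpy sketch of RLS; the closed-form leave-one-out residuals used at the end are the standard ridge-regression identity, which is my reading of what the slide alludes to:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = np.where(X[:, 0] + 0.1 * rng.randn(40) > 0, 1.0, -1.0)

# Gaussian kernel matrix Q_ij = K(x_i, x_j).
sigma, C = 1.0, 10.0
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Q = np.exp(-sq_dists / (2 * sigma ** 2))

# RLS: solve (Q + (1/C) I) c = y.
A = Q + np.eye(len(y)) / C
c = np.linalg.solve(A, y)

# Leave-one-out residuals from this single solution, via
# e_i = (y_i - f(x_i)) / (1 - G_ii) with G = Q A^{-1}.
G = Q @ np.linalg.inv(A)
loo_residuals = (y - Q @ c) / (1.0 - np.diag(G))
print(np.mean(loo_residuals ** 2))  # LOO mean squared error
```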
Summary
Separable, Linear SVM
Non Separable, Linear SVM
Non Separable, Non Linear SVM
How to use SVM