Page 1: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Introduction to Support Vector Machines (SVM)

By Debprakash Patnaik, M.E (SSA)

Page 2: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Introduction

- SVMs provide a learning technique for:
  - Pattern recognition
  - Regression estimation
- The solution provided by an SVM is:
  - Theoretically elegant
  - Computationally efficient
  - Very effective in many large practical problems
- It has a simple geometrical interpretation in a high-dimensional feature space that is nonlinearly related to the input space.
- By using kernels, all computations remain simple.
- It contains ANN, RBF, and polynomial classifiers as special cases.

Page 3: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

History

- The study of Statistical Learning Theory was started in the 1960s by Vapnik.
- Statistical Learning Theory is the theory of machine learning from small sample sizes.
- The Support Vector Machine is a practical learning method based on Statistical Learning Theory.
- A simple SVM could beat a sophisticated neural network with elaborate features in a handwriting recognition task.

Page 4: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Learning Machine

A bound on the generalization performance of a learning machine.

Expected risk:

$$R(\alpha) = \int \tfrac{1}{2}\,\lvert y - f(x,\alpha)\rvert \, dP(x,y)$$

Empirical risk:

$$R_{emp}(\alpha) = \frac{1}{2l}\sum_{i=1}^{l} \lvert y_i - f(x_i,\alpha)\rvert$$

With probability $1-\eta$:

$$R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\bigl(\log(2l/h)+1\bigr) - \log(\eta/4)}{l}}$$

where $h$ is the VC dimension, a measure of the notion of capacity of a classifier.

Page 5: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

VC Dimension

- The VC dimension is a property of a set of functions $\{f(\alpha)\}$, and can be defined for various classes of functions $f$.
- The VC dimension of the set of functions $\{f(\alpha)\}$ is defined as the maximum number of training points that can be shattered by $\{f(\alpha)\}$.
- The VC dimension gives concreteness to the notion of the capacity of a given set of functions.
- The number of parameters of a learning machine is not proportional to its VC dimension.

Page 6: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

VC Dimension – An Example

The VC dimension of the set of oriented hyperplanes in $\mathbb{R}^n$ is $(n+1)$.
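
As a concrete illustration of shattering (my own sketch, not from the slides): the snippet below brute-forces all $2^3$ labelings of three non-collinear points in $\mathbb{R}^2$ and checks that each labeling is realized by some oriented line, consistent with the VC dimension of oriented hyperplanes in $\mathbb{R}^2$ being $2 + 1 = 3$.

```python
# Hypothetical check: three non-collinear points in R^2 can be "shattered"
# by oriented lines, i.e. every one of the 2^3 labelings is linearly separable.
from itertools import product

import numpy as np
from sklearn.svm import SVC

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # non-collinear

for labels in product([-1, 1], repeat=len(points)):
    y = np.array(labels)
    if len(set(labels)) == 1:
        # A one-class labeling is trivially realizable: put every point on
        # the same side of some line.
        separable = True
    else:
        clf = SVC(kernel="linear", C=1e6)   # (almost) hard-margin linear SVM
        clf.fit(points, y)
        separable = clf.score(points, y) == 1.0
    print(labels, "separable:", separable)
```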

Page 7: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Structural Risk Minimization

$$R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\bigl(\log(2l/h)+1\bigr) - \log(\eta/4)}{l}}$$

The second term on the right-hand side is the VC confidence.

Page 8: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Two Approaches

Goal: to find a trained machine in the series whose sum of empirical risk and VC confidence is minimal.

- Neural network: fix the VC confidence and minimize the empirical risk.
- Support vector machine: fix the empirical risk and minimize the VC confidence.

Page 9: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

The Two-Class Problem

- Several decision boundaries can separate these two classes.
- The perceptron algorithm learns any separating hyperplane.
- The SVM learns the best separating hyperplane.

[Figure: Class 1 and Class 2 with several candidate decision boundaries]

Page 10: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Perceptron Algorithm

[Figure: Class 1 and Class 2 separated by a hyperplane found by the perceptron]

A simple perceptron learning algorithm is sketched below.
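
The slide's perceptron pseudocode did not survive extraction; the following is a minimal sketch of the classic perceptron update (my own illustration), which finds some separating hyperplane when one exists.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classic perceptron rule: on each misclassified point, w <- w + y_i * x_i.
    X is an (n, d) array, y holds labels in {-1, +1}. Returns (w, b) of a
    separating hyperplane if the data are linearly separable and max_epochs
    is large enough."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi             # nudge the hyperplane toward xi
                b += yi
                mistakes += 1
        if mistakes == 0:                # converged: all points classified correctly
            break
    return w, b
```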

Page 11: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

SVM Algorithm

Finding the optimal separating hyperplane in an SVM.

[Figure: Class 1 and Class 2 with the optimal separating hyperplane; the points lying on the margin are the support vectors]

Page 12: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Decision Boundary

- The decision boundary/hyperplane should be as far away from the data of both classes as possible.
- We should maximize the margin, m.

[Figure: Class 1 and Class 2 separated by a hyperplane with margin m]

Page 13: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

The Optimization Problem

- Let {x1, ..., xn} be our data set, and let yi ∈ {1, −1} be the class label of xi.
- The decision boundary should classify all points correctly.
- This yields a constrained optimization problem (stated below).
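
The formulas on this slide did not survive extraction. For reference, the standard hard-margin primal: requiring $y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1$ for all points and maximizing the margin $m = 2/\lVert\mathbf{w}\rVert$ is equivalent to

$$\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1,\quad i = 1,\dots,n.$$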

Page 14: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Dual Formulation

- The Lagrangian for this problem is given below, where the µi are the Lagrange multipliers.
- The problem has a quadratic cost and linear constraints.
- The Kuhn–Tucker conditions hold at (w*, b*), the global solution of L, with µ* the optimal Lagrange multipliers.
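
The slide's equations are missing from the transcript; the standard Lagrangian and the stationarity part of the Kuhn–Tucker conditions are:

$$L(\mathbf{w}, b, \boldsymbol{\mu}) = \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
 - \sum_{i=1}^{n}\mu_i\bigl[y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) - 1\bigr],
\qquad \mu_i \ge 0$$

$$\frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w}^{*} = \sum_{i=1}^{n}\mu_i^{*} y_i \mathbf{x}_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n}\mu_i^{*} y_i = 0.$$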

Page 15: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Support Vectors

- Complementary slackness condition: we must have the relation given below.
- Support vectors are the set of xi's that have µ*i > 0.
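
The condition itself is missing from the transcript; in its standard form it reads

$$\mu_i^{*}\bigl[y_i(\mathbf{w}^{*\top}\mathbf{x}_i + b^{*}) - 1\bigr] = 0,\qquad i = 1,\dots,n,$$

so $\mu_i^{*} > 0$ forces $y_i(\mathbf{w}^{*\top}\mathbf{x}_i + b^{*}) = 1$, i.e. the support vectors lie exactly on the margin.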

Page 16: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

The Dual Problem

- We can transform the problem to its dual, subject to the constraints given below.
- This is a quadratic programming (QP) problem.
- w*, b* can be recovered from the dual solution as shown below.
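
For reference (the slide's equations are missing), the standard dual and the recovery of the primal solution are:

$$\max_{\boldsymbol{\mu}}\ \sum_{i=1}^{n}\mu_i - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\mu_i\mu_j y_i y_j\,\mathbf{x}_i^{\top}\mathbf{x}_j
\quad\text{subject to}\quad \mu_i \ge 0,\ \ \sum_{i=1}^{n}\mu_i y_i = 0$$

$$\mathbf{w}^{*} = \sum_{i=1}^{n}\mu_i^{*} y_i \mathbf{x}_i,
\qquad
b^{*} = y_k - \mathbf{w}^{*\top}\mathbf{x}_k \ \text{ for any support vector } \mathbf{x}_k.$$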

Page 17: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

A Geometrical Interpretation

[Figure: Class 1 and Class 2 with the separating hyperplane; only the support vectors have nonzero multipliers, e.g. µ1 = 0.8, µ6 = 1.4, µ8 = 0.6, while all other µi = 0]

Page 18: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Some Notes

- There are theoretical upper bounds on the error on unseen data for SVMs:
  - The larger the margin, the smaller the bound.
  - The smaller the number of support vectors, the smaller the bound.
- Note that in both training and testing, the data are referenced only through inner products, $x^{\top}y$.
- This is important for generalizing to the non-linear case.

Page 19: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

If Not Linearly Separable

We allow an "error" ξi in classification.

[Figure: Class 1 and Class 2 with overlapping points; misclassified points incur a slack ξi]

Page 20: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Soft Margin Hyperplane

- Define ξi = 0 if there is no error for xi.
- The ξi are just "slack variables" in optimization theory.
- We want to minimize a combination of the margin term and the total slack; C is a tradeoff parameter between error and margin.
- The optimization problem becomes the one shown below.
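
The objective on the slide is missing from the transcript; the standard soft-margin primal is:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0.$$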

Page 21: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

The New Optimization Problem

- The dual of the problem is given below.
- w is also recovered in the same way as before.
- The only difference from the linearly separable case is that there is an upper bound C on the µi.
- A QP solver can be used to find the µi's.
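
The dual on the slide is missing from the transcript; in its standard form it differs from the separable case only in the box constraint on the multipliers:

$$\max_{\boldsymbol{\mu}}\ \sum_{i=1}^{n}\mu_i - \tfrac{1}{2}\sum_{i,j}\mu_i\mu_j y_i y_j\,\mathbf{x}_i^{\top}\mathbf{x}_j
\quad\text{subject to}\quad 0 \le \mu_i \le C,\ \ \sum_{i=1}^{n}\mu_i y_i = 0,
\qquad \mathbf{w}^{*} = \sum_{i=1}^{n}\mu_i^{*} y_i \mathbf{x}_i.$$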

Page 22: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Extension to Non-linear Decision Boundary

- Key idea: transform xi to a higher dimensional space to "make classes linearly separable".
  - Input space: the space the xi are in.
  - Feature space: the space of the φ(xi) after the transformation.
- Why transform?
  - A linear operation in the feature space is equivalent to a non-linear operation in the input space.
  - The classification task can be "easier" with a proper transformation. Example: XOR.

Page 23: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Higher Dimensions

Project the data into a high-dimensional space where it is linearly separable, and then use a linear SVM (using kernels).

[Figure: a 1-D data set with points at −1, 0, +1 labeled +, −, +, shown mapped to the 2-D points (1,0), (0,0), (0,1), where the classes become linearly separable]

Page 24: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

The XOR Problem

The input X = (x1, x2) is mapped to Z = (x1, x2, x1x2).
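
A small sketch (my own illustration, not from the slides) showing that XOR is not linearly separable in the input space X = (x1, x2) but becomes separable after the mapping Z = (x1, x2, x1x2):

```python
import numpy as np
from sklearn.svm import SVC

# XOR in the original input space X = (x1, x2): not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Map to Z = (x1, x2, x1*x2); the extra coordinate makes a separating
# hyperplane possible.
Z = np.column_stack([X, X[:, 0] * X[:, 1]])

linear = SVC(kernel="linear", C=1e6)
print("training accuracy in X:", linear.fit(X, y).score(X, y))  # below 1.0
print("training accuracy in Z:", linear.fit(Z, y).score(Z, y))  # 1.0
```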

Page 25: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Extension to Non-linear Decision Boundary

- Possible problems with the transformation:
  - High computational burden, and hard to get a good estimate.
- SVM solves these two issues simultaneously:
  - Kernel tricks for efficient computation.
  - Minimizing ||w||² can lead to a "good" classifier.

[Figure: the map Φ: x → φ(x) from input space to feature space]

Page 26: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

What is a Kernel?

An example kernel function:

$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y})^{2}$$

Page 27: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Example Transformation

- Define the kernel function K(x, y).
- Consider a corresponding transformation φ(·).
- The inner product can be computed by K without going through the map φ(·).

Page 28: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Kernel Trick

- The relationship between the kernel function K and the mapping φ(·) is K(x, y) = φ(x) · φ(y).
- This is known as the kernel trick.
- In practice, we specify K, thereby specifying φ(·) indirectly, instead of choosing φ(·) explicitly.
- K(x, y) needs to satisfy the Mercer condition in order for φ(·) to exist.
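
A quick numerical check (my own illustration, assuming the degree-2 kernel $K(x,y) = (x \cdot y)^2$ from the "What is a Kernel?" slide, with the explicit map $\varphi(x) = (x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2)$ shown on a later slide) that the kernel computes the feature-space inner product without ever constructing $\varphi$:

```python
import numpy as np

def phi(v):
    """Explicit feature map for K(x, y) = (x . y)^2 on R^2."""
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

def K(x, y):
    """Kernel evaluated directly in input space."""
    return float(x @ y) ** 2

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)
print(phi(x) @ phi(y))   # inner product after the explicit map
print(K(x, y))           # same value, without ever forming phi(x), phi(y)
```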

Page 29: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Examples of Kernel Functions

- Polynomial kernel with degree d.
- Radial basis function (RBF) kernel with width σ.
  - Closely related to radial basis function neural networks.
- Sigmoid kernel with parameters κ and θ.
  - It does not satisfy the Mercer condition for all κ and θ.
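
The kernel formulas on this slide did not survive extraction; their commonly used forms (an assumption on my part, the slide's exact parameterization may differ) are:

$$K(\mathbf{x},\mathbf{y}) = (\mathbf{x}\cdot\mathbf{y} + 1)^{d},
\qquad
K(\mathbf{x},\mathbf{y}) = \exp\!\left(-\frac{\lVert\mathbf{x}-\mathbf{y}\rVert^{2}}{2\sigma^{2}}\right),
\qquad
K(\mathbf{x},\mathbf{y}) = \tanh(\kappa\,\mathbf{x}\cdot\mathbf{y} + \theta).$$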

Page 30: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

[Figure: data points (x1, x2) with labels y = +1 and y = −1]

Page 31: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

[Figure: the same data after the mapping $(x_1, x_2) \mapsto (x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2)$]

Page 32: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Optimization Algorithms

- The most popular optimization algorithms for SVMs are SMO [Platt '99] and SVMlight [Joachims '99]; both use decomposition to hill-climb over a subset of the µi's at a time.
- The idea behind SMO:
  - Adjust only 2 µi's at each step (the smallest change that keeps the constraint Σi µi yi = 0 satisfied).
  - All µi's are initialized to zero.

Page 33: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

SVM vs. Neural Networks

Neural networks:
- Generalize well, but do not have a solid mathematical foundation.
- Can easily be learnt in an incremental fashion.
- To learn complex functions, use a complex multi-layer structure.

SVM:
- A relatively new concept.
- Nice generalization properties.
- Harder to learn: learned in batch mode using QP techniques.
- Using kernels, it can learn very complex functions.

Page 34: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Example of Non-linear SVM

Page 35: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Results

Page 36: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

A Nonlinear Kernel Application

Checkerboard training set: 1000 points in $\mathbb{R}^2$; separate 486 asterisks from 514 dots.

Page 37: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Previous Work

Page 38: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

Polynomial Kernel

$$K(A, A') = \bigl((100^{-1}A)(100^{-1}A') - 0.5\bigr)^{6}$$

Page 39: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

SVM Applications

- Pattern recognition:
  - Handwriting recognition
  - 3D object recognition
  - Speaker identification
  - Face detection
  - Text categorization
  - Bioinformatics
- Regression estimation or function learning
- More…

Page 40: Introduction to Support Vector Machines (SVM) By Debprakash Patnaik M.E (SSA)

References

[1] C.J.C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition”, 1998

[2] P.S. Sastry, “An Introduction to Support Vector Machine”

[3] J. Platt, “Sequential minimal optimization: A fast algorithm for training support vector machines”, 1999

