Transcript
Slide 1/47

SVMC: An introduction to Support Vector Machines Classification

Lorenzo Rosasco ([email protected])

Department of Brain and Cognitive Science, MIT

    6.783, Biomedical Decision Support

    Friday, October 30, 2009

Slide 2/47

    A typical problem

We have a cohort of patients from two groups, say A and B.

We wish to devise a classification rule to distinguish patients of one group from patients of the other group.

Slide 3/47

Learning and Generalization

Goal: correctly classify new patients

Slide 4/47

    Plan

    1. Linear SVM

    2. Non Linear SVM: Kernels

    3. Tuning SVM

    4. Beyond SVM: Regularization Networks

Slide 5/47

    Learning from Data

To make predictions we need information about the patients:

patient 1: x = (x1, . . . , xn)

patient 2: x = (x1, . . . , xn)

....

patient ℓ: x = (x1, . . . , xn)

Slide 6/47

Linear model

Patients of class A are labeled y = 1

Patients of class B are labeled y = -1

Classification rule: sign(w · x), where

w · x = Σ_{j=1}^n w_j x_j
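To make the linear rule concrete, here is a minimal numpy sketch (illustrative only; the toy values and the weight vector w are made up, since w would in practice be learned as described in the following slides):

```python
import numpy as np

# Toy data: 3 patients, n = 2 features each (hypothetical values).
X = np.array([[0.5, 1.2],
              [-0.3, 0.8],
              [1.1, -0.4]])

w = np.array([0.7, -0.2])      # weight vector (assumed already learned)

scores = X @ w                 # w . x for every patient
predictions = np.sign(scores)  # classification rule: sign(w . x)
print(predictions)             # +1 -> class A, -1 -> class B
```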

Slide 7/47

    1D Case

[Figure: 1D example. Points labeled y = 1 and y = -1 on the X axis, with the regions w · x > 0 and w · x < 0 separated by the decision boundary w · x = 0.]

Slide 8/47

    How do we find a good solution?

    2D Classification Problem

[Figure: points x = (x1, x2) labeled y = 1 and y = -1.]

Slide 9/47

    How do we find a good solution?

[Figure: a candidate separating hyperplane w · x = 0 with the halfspaces w · x > 0 and w · x < 0.]

Slide 10/47

    How do we find a good solution?

Slide 11/47

    How do we find a good solution?

Slide 12/47

    How do we find a good solution?

Slide 13/47

    How do we find a good solution?

[Figure: a separating hyperplane with margin M.]

Slide 14/47

    Maximum Margin Hyperplane

...with little effort... one can show that maximizing the margin M is equivalent to minimizing ||w||.

Slide 15/47

    SVM

    Linear and Separable SVM

min_{w ∈ R^n} ||w||²

subject to: y_i (w · x_i) ≥ 1,  i = 1, . . . , ℓ

Typically an offset term is added to the solution: f(x) = sign(w · x + b).

Slide 16/47

A more general algorithm

There are two things we would like to improve:

    Allow for errors

    Non Linear Models

Slide 17/47

    Measuring errors

Slide 18/47

    Measuring errors (cont)

[Figure: points violating the margin; the slack variables ξ_i measure the violations.]

Slide 19/47

    Linear SVM

min_{w ∈ R^n, ξ ∈ R^ℓ, b ∈ R}  C Σ_{i=1}^ℓ ξ_i + (1/2) ||w||²

subject to: y_i (w · x_i + b) ≥ 1 − ξ_i,  i = 1, . . . , ℓ

ξ_i ≥ 0,  i = 1, . . . , ℓ
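As a hedged illustration (not part of the original slides), the same soft-margin linear problem can be solved with scikit-learn, whose SVC class wraps libSVM; the toy data are made up and C below is the regularization parameter of the formulation above:

```python
import numpy as np
from sklearn.svm import SVC

# Toy training set: rows are patients, columns are features (hypothetical data).
X = np.array([[0.2, 1.0], [0.5, 1.5], [1.8, 0.3], [2.0, 0.1]])
y = np.array([1, 1, -1, -1])          # class A -> 1, class B -> -1

clf = SVC(kernel="linear", C=1.0)     # linear soft-margin SVM
clf.fit(X, y)

print(clf.coef_, clf.intercept_)      # w and b of the learned rule sign(w . x + b)
print(clf.predict([[1.0, 0.8]]))      # classify a new patient
```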

Slide 20/47

    Optimization

    How do we solve this minimization problem?

    (...and why do we call it SVM anyway?)

Slide 21/47

    Some facts

    Representer Theorem

    Dual Formulation

    Box Constraints and Support Vectors

Slide 22/47

    Representer Theorem

The solution to the minimization problem can be written as

w · x = Σ_{i=1}^ℓ c_i (x · x_i)
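A small numpy sketch (with made-up coefficients c_i, not computed from the dual) just to illustrate that a solution of the form w = Σ_i c_i x_i only needs inner products with the training points:

```python
import numpy as np

X = np.array([[0.2, 1.0], [0.5, 1.5], [1.8, 0.3]])   # training points
c = np.array([0.4, -0.1, -0.3])                       # hypothetical coefficients c_i

w = X.T @ c                              # w = sum_i c_i x_i  (representer theorem form)

x_new = np.array([1.0, 0.8])
direct = w @ x_new                       # w . x
via_inner_products = c @ (X @ x_new)     # sum_i c_i (x . x_i)
print(np.isclose(direct, via_inner_products))  # True: same value either way
```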

Slide 23/47

    Dual Problem

    The coefficients can be found solving:

max_{α ∈ R^ℓ}  Σ_{i=1}^ℓ α_i − (1/2) α^T Q α

subject to: Σ_{i=1}^ℓ y_i α_i = 0

0 ≤ α_i ≤ C,  i = 1, . . . , ℓ

Here Q_{ij} = y_i y_j (x_i · x_j) and α_i = c_i / y_i.

Slide 24/47

    Optimality conditions

    with little effort ... one can show that

If y_i f(x_i) > 1, then α_i = 0.

The solution is sparse: some training points do not contribute to the solution.

Slide 25/47

Sparse Solution

Note that:

The solution depends only on the training set points (no dependence on the number of features!).
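To see this sparsity in practice, a fitted scikit-learn SVC (continuing the earlier hypothetical toy example) exposes the support vectors and their coefficients:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.2, 1.0], [0.5, 1.5], [1.8, 0.3], [2.0, 0.1], [0.1, 2.0]])
y = np.array([1, 1, -1, -1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_)          # indices of the training points with alpha_i > 0
print(clf.n_support_)        # number of support vectors per class
print(clf.dual_coef_)        # y_i * alpha_i for the support vectors only
```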

Slide 26/47

    Feature Map

f(x) = w · Φ(x)

Slide 27/47

    A Key Observation

The solution depends only on Q:

max_{α ∈ R^ℓ}  Σ_{i=1}^ℓ α_i − (1/2) α^T Q α

subject to: Σ_{i=1}^ℓ y_i α_i = 0

0 ≤ α_i ≤ C,  i = 1, . . . , ℓ

Idea: instead of Q_{ij} = y_i y_j (x_i · x_j), use Q_{ij} = y_i y_j (Φ(x_i) · Φ(x_j)).

Slide 28/47

Kernels and Feature Maps

The crucial quantity is the inner product, called Kernel:

K(x, t) = Φ(x) · Φ(t)

A function is called a Kernel if it is:

symmetric

positive definite

Slide 29/47

    Examples of Kernels

Linear kernel: K(x, x′) = x · x′

Gaussian kernel: K(x, x′) = exp(−||x − x′||² / 2σ²),  σ > 0

Polynomial kernel: K(x, x′) = (x · x′ + 1)^d,  d ∈ N

For specific applications, designing an effective kernel is a challenging problem.
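A minimal numpy sketch of these three kernels (illustrative; the bandwidth sigma and degree d are arbitrary example values, not recommendations):

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z                              # K(x, x') = x . x'

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def polynomial_kernel(x, z, d=3):
    return (x @ z + 1) ** d                   # K(x, x') = (x . x' + 1)^d

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, z), gaussian_kernel(x, z), polynomial_kernel(x, z))
```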

Slide 30/47

    Non Linear SVM

    Summing up:

Define Feature Map either explicitly or via a kernel

    Find linear solution in the Feature space

    Use same solver as in the linear case

    Representer theorem now gives:

w · Φ(x) = Σ_{i=1}^ℓ c_i (Φ(x) · Φ(x_i)) = Σ_{i=1}^ℓ c_i K(x, x_i)
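As a hedged sketch of "use the same solver as in the linear case", scikit-learn's SVC accepts a precomputed Gram matrix, so any kernel K can be plugged in (the toy data and Gaussian kernel choice are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0], [1.0], [2.0], [3.0]])    # 1D toy inputs
y = np.array([1, 1, -1, -1])

def gram(A, B, sigma=1.0):
    # Gaussian kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(gram(X, X), y)                        # train on the ell x ell Gram matrix

X_new = np.array([[0.5], [2.5]])
print(clf.predict(gram(X_new, X)))            # test kernel: rows = new points, cols = training points
```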

Slide 31/47

    Example in 1D

[Figure: 1D example with points labeled y = 1 and y = -1 along the X axis.]

Slide 32/47

    Software

    SVM Light: http://svmlight.joachims.org

    SVM Torch: http://www.torch.ch

libSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Slide 33/47

    Model Selection

    We have to fix the Regularization parameter C

We have to choose the kernel (and its parameter)

Using default values is usually a BAD BAD idea

Slide 34/47

    Regularization Parameter

Large C: we try to minimize errors ignoring the complexity of the solution

Small C: we ignore the errors to obtain a simple solution

min_{w ∈ R^n, ξ ∈ R^ℓ, b ∈ R}  C Σ_{i=1}^ℓ ξ_i + (1/2) ||w||²
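A hedged illustration of this trade-off with scikit-learn (the toy data and the specific values of C are arbitrary assumptions): larger C typically tolerates fewer margin violations, smaller C yields a "simpler", larger-margin solution.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(1.5, 1, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)            # two overlapping classes

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: training accuracy={clf.score(X, y):.2f}, "
          f"support vectors={len(clf.support_)}")
```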

Slide 35/47

    Which Kernel?

For very high dimensional data the linear kernel is often the default choice:

allows computational speed up; less prone to overfitting

Gaussian Kernel with proper tuning is another common choice

Whenever possible use prior knowledge to build problem specific features or kernels

Slide 36/47

    2D demo

Demo: http://svm.dcs.rhbnc.ac.uk/pagesnew/GPat.shtml
Slide 37/47

    Practical Rules

We can choose C (and the kernel parameter) via cross validation

    Holdout set

    K-fold cross validation

K = # of examples is called Leave One Out
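A hedged sketch of K-fold cross-validation for choosing C and the Gaussian kernel parameter with scikit-learn (the toy data and the parameter grid are arbitrary example assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([1] * 30 + [-1] * 30)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}   # gamma = 1/(2 sigma^2)
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)        # 5-fold cross validation
search.fit(X, y)

print(search.best_params_, search.best_score_)
```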

Slide 38/47

    K-Fold CV

    We have to compute several solutions...

Slide 39/47

    A Rule of Thumb

This is how the CV error typically looks

    Fix a reasonable kernel, then fine tune C

[Figure: typical behavior of the cross-validation error as C and the kernel parameter vary.]

Slide 40/47

    Which values do we start from?

For the Gaussian kernel, pick sigma of the order of the average distance...

Take min (and max) C as the value for which the training set error does not increase (decrease) anymore.

k(X_i, X_j) = exp(−||X_i − X_j||² / 2σ²)
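A small numpy sketch of the "average distance" heuristic (illustrative; the toy data are made up, and using the median instead of the mean pairwise distance is an equally common variant):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                      # toy training inputs

# Pairwise Euclidean distances between all training points.
diffs = X[:, None, :] - X[None, :, :]
dists = np.sqrt((diffs ** 2).sum(-1))

sigma = dists[dists > 0].mean()                   # average distance as a starting sigma
gamma = 1.0 / (2 * sigma ** 2)                    # corresponding scikit-learn 'gamma'
print(sigma, gamma)
```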

Slide 41/47

    Computational Considerations

The training time depends on the parameters: the more we fit, the slower the algorithm.

Typically the computational burden is in the selection of the regularization parameter (solvers for the regularization path).

Slide 42/47

    Regularization Networks

SVMs are an example of a family of algorithms of the form:

C Σ_{i=1}^ℓ V(y_i, w · Φ(x_i)) + ||w||²

V is called the loss function.

Slide 43/47

    Hinge Loss

[Figure: the 0-1 loss and the hinge loss plotted as functions of y w · Φ(x).]
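For reference, the hinge loss used by the SVM is V(y, f(x)) = max(0, 1 − y f(x)); a minimal numpy sketch comparing it to the 0-1 loss (the margin values below are arbitrary examples):

```python
import numpy as np

def hinge_loss(y, f):
    return np.maximum(0.0, 1.0 - y * f)       # max(0, 1 - y f(x))

def zero_one_loss(y, f):
    return (np.sign(f) != y).astype(float)    # 1 if misclassified, else 0

margins = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])   # values of y * f(x)
print(hinge_loss(1, margins))                     # [2.  1.  0.5 0.  0. ]
print(zero_one_loss(1, margins))                  # [1. 1. 0. 0. 0.]
```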

Slide 44/47

    Loss functions

Slide 45/47

    Representer Theorem

    For a LARGE class of loss functions:

w · Φ(x) = Σ_{i=1}^ℓ c_i (Φ(x) · Φ(x_i)) = Σ_{i=1}^ℓ c_i K(x, x_i)

The way we compute the coefficients depends on the considered loss function.

Slide 46/47

    Regularized LS

The simplest, yet powerful, algorithm is probably RLS.

Square loss: V(y, w · Φ(x)) = (y − w · Φ(x))²

Algorithm: (Q + (1/C) I) c = y,  where Q_{i,j} = K(x_i, x_j)

Leave one out can be computed at the price of one (!!!) solution
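A minimal numpy sketch of solving this RLS linear system (the toy data, the Gaussian kernel, and the values of C and sigma are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
y = np.array([1.0] * 10 + [-1.0] * 10)
C, sigma = 1.0, 1.0

# Gram matrix Q_ij = K(x_i, x_j) with a Gaussian kernel.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Q = np.exp(-d2 / (2 * sigma ** 2))

# Solve (Q + (1/C) I) c = y for the coefficients c.
c = np.linalg.solve(Q + np.eye(len(y)) / C, y)

# Predictions on the training set: f(x) = sum_i c_i K(x, x_i).
print(np.sign(Q @ c))
```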

Slide 47/47

    Summary

    Separable, Linear SVM

    Non Separable, Linear SVM

    Non Separable, Non Linear SVM

    How to use SVM

