
    A (short) Introduction to Support Vector Machines and Kernel-based Learning

    Johan Suykens

    K.U. Leuven, ESAT-SCD-SISTA
    Kasteelpark Arenberg 10
    B-3001 Leuven (Heverlee), Belgium
    Tel: 32/16/32 18 02 - Fax: 32/16/32 19 70
    Email: [email protected]
    http://www.esat.kuleuven.ac.be/sista/members/suykens.html

    ESANN 2003, Bruges, April 2003


    Overview

    Disadvantages of classical neural nets

    SVM properties and standard SVM classifier

    Related kernel-based learning methods

    Use of the kernel trick (Mercer Theorem)

    LS-SVMs: extending the SVM framework

    Towards a next generation of universally applicable models?

    The problem of learning and generalization

    Introduction to SVM and kernel-based learning - Johan Suykens - ESANN 2003


    Classical MLPs

    [Figure: one-hidden-layer MLP with inputs x1, ..., xn, weights w1, ..., wn, bias b, activation functions h(·) and output y]

    Multilayer Perceptron (MLP) properties:

    Universal approximation of continuous nonlinear functions

    Learning from input-output patterns; either off-line or on-line learning

    Parallel network architecture, multiple inputs and outputs

    Use in feedforward and recurrent networks

    Use in supervised and unsupervised learning applications

    Problems: Existence of many local minima! How many neurons are needed for a given task?


    Support Vector Machines (SVM)

    [Figure: cost function as a function of the weights - many local minima for the MLP, a single convex minimum for the SVM]

    Nonlinear classification and function estimation by convex optimization with a unique solution and primal-dual interpretations.

    Number of neurons automatically follows from a convex program.

    Learning and generalization in huge dimensional input spaces (able to avoid the curse of dimensionality!).

    Use of kernels (e.g. linear, polynomial, RBF, MLP, splines, ...). Application-specific kernels possible (e.g. text mining, bioinformatics).


    SVM: support vectors

    [Figure: two binary classification examples in the (x1, x2) plane; the decision boundary is determined by the support vectors]

    Decision boundary can be expressed in terms of a limited number of support vectors (subset of the given training data); sparseness property

    Classifier follows from the solution to a convex QP problem.


    SVMs: living in two worlds ...

    [Figure: classes '+' and 'x' that are not linearly separable in the input space become linearly separable in the feature space under the mapping φ(x)]

    Primal space (→ large data sets), parametric: estimate w ∈ R^{n_h}

        y(x) = sign[w^T φ(x) + b]

    Dual space (→ high dimensional inputs), non-parametric: estimate α ∈ R^N

        y(x) = sign[Σ_{i=1}^{#sv} α_i y_i K(x, x_i) + b]

    K(x_i, x_j) = φ(x_i)^T φ(x_j)  (kernel trick)
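The kernel trick equality K(x_i, x_j) = φ(x_i)^T φ(x_j) can be checked directly for a kernel whose feature map is small enough to write down. The sketch below (an illustration, not from the slides) uses the degree-2 homogeneous polynomial kernel K(x, z) = (x^T z)^2 on R^2, whose explicit feature map is φ(x) = (x1^2, √2·x1·x2, x2^2):

```python
import math

def phi(x):
    # Explicit feature map of the degree-2 homogeneous polynomial kernel on R^2.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def K(x, z):
    # The same kernel evaluated directly in the input space: (x^T z)^2.
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
lhs = sum(a * b for a, b in zip(phi(x), phi(z)))  # phi(x)^T phi(z)
rhs = K(x, z)                                     # (x^T z)^2
print(lhs, rhs)  # both equal 1.0: (1*3 + 2*(-1))^2 = 1
```

The dual classifier only ever needs K(x, z), so φ never has to be formed explicitly, which is what makes infinite-dimensional feature spaces usable.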

    [Figure: primal network with hidden features φ_1(x), ..., φ_{n_h}(x) and weights w_1, ..., w_{n_h}; dual network with kernel units K(x, x_1), ..., K(x, x_{#sv}) and weights α_1, ..., α_{#sv}]


    Standard SVM classifier (1)

    Training set {x_i, y_i}_{i=1}^N: inputs x_i ∈ R^n; class labels y_i ∈ {-1, +1}

    Classifier: y(x) = sign[w^T φ(x) + b]

    with φ(·): R^n → R^{n_h} a mapping to a high dimensional feature space (which can be infinite dimensional!)

    For separable data, assume

        w^T φ(x_i) + b ≥ +1, if y_i = +1
        w^T φ(x_i) + b ≤ -1, if y_i = -1

    ⇔ y_i [w^T φ(x_i) + b] ≥ 1, ∀i

    Optimization problem (non-separable case):

        min_{w,b,ξ} J(w, ξ) = (1/2) w^T w + c Σ_{i=1}^N ξ_i
        s.t. y_i [w^T φ(x_i) + b] ≥ 1 - ξ_i,  ξ_i ≥ 0,  i = 1, ..., N
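For the linear case (φ(x) = x) the primal problem above can be attacked directly. The sketch below (my own illustration, not part of the slides) minimizes the equivalent unconstrained form (1/2)·w^T w + c·Σ_i max(0, 1 - y_i(w^T x_i + b)) by plain subgradient descent on a tiny separable data set:

```python
def train_linear_svm(X, y, c=1.0, lr=0.01, epochs=2000):
    # Subgradient descent on (1/2)||w||^2 + c * sum_i max(0, 1 - y_i (w.x_i + b)).
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0          # gradient of the regularizer is w itself
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:             # hinge term active: subgradient -c*y_i*x_i
                for j in range(n):
                    gw[j] -= c * yi * xi[j]
                gb -= c * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

X = [(2.0, 2.0), (1.5, 2.5), (-2.0, -2.0), (-2.5, -1.5)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]
print(preds)
```

In practice the problem is solved as a QP (the dual on the next slide), which also handles nonlinear φ; the subgradient version is only meant to make the primal objective tangible.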


    Standard SVM classifier (2)

    Lagrangian:

        L(w, b, ξ; α, ν) = J(w, ξ) - Σ_{i=1}^N α_i {y_i [w^T φ(x_i) + b] - 1 + ξ_i} - Σ_{i=1}^N ν_i ξ_i

    Find saddle point: max_{α,ν} min_{w,b,ξ} L(w, b, ξ; α, ν)

    One obtains

        ∂L/∂w = 0  →  w = Σ_{i=1}^N α_i y_i φ(x_i)
        ∂L/∂b = 0  →  Σ_{i=1}^N α_i y_i = 0
        ∂L/∂ξ_i = 0  →  0 ≤ α_i ≤ c,  i = 1, ..., N


    Standard SVM classifier (3)

    Dual problem: QP problem

        max_α Q(α) = -(1/2) Σ_{i,j=1}^N y_i y_j K(x_i, x_j) α_i α_j + Σ_{j=1}^N α_j
        s.t. Σ_{i=1}^N α_i y_i = 0
             0 ≤ α_i ≤ c, ∀i

    with kernel trick (Mercer Theorem): K(x_i, x_j) = φ(x_i)^T φ(x_j)

    Obtained classifier: y(x) = sign[Σ_{i=1}^N α_i y_i K(x, x_i) + b]

    Some possible kernels K(·, ·):

        K(x, x_i) = x_i^T x  (linear SVM)
        K(x, x_i) = (x_i^T x + τ)^d  (polynomial SVM of degree d)
        K(x, x_i) = exp{-‖x - x_i‖_2^2 / σ^2}  (RBF kernel)
        K(x, x_i) = tanh(κ x_i^T x + θ)  (MLP kernel)
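The four kernels above can be written down in a few lines. A minimal sketch (the parameter names tau, sigma, kappa, theta stand for the constants in the formulas):

```python
import math

def linear(x, xi):
    # x_i^T x
    return sum(a * b for a, b in zip(xi, x))

def poly(x, xi, tau=1.0, d=2):
    # (x_i^T x + tau)^d
    return (linear(x, xi) + tau) ** d

def rbf(x, xi, sigma=1.0):
    # exp{-||x - x_i||^2 / sigma^2}
    dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-dist2 / sigma ** 2)

def mlp(x, xi, kappa=1.0, theta=0.0):
    # tanh(kappa * x_i^T x + theta); note this one is not positive definite
    # for all parameter choices, so Mercer's condition can fail.
    return math.tanh(kappa * linear(x, xi) + theta)

x, xi = (1.0, 0.0), (0.0, 1.0)
print(linear(x, xi), poly(x, xi), rbf(x, xi))  # 0.0, 1.0, exp(-2)
```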


    Kernel-based learning: many related methods and fields

    [Diagram: SVMs, LS-SVMs, regularization networks, Gaussian processes, kriging, kernel ridge regression, RKHS - equivalent? related?]

    Some early history on RKHS:

        1910-1920: Moore
        1940: Aronszajn
        1951: Krige
        1970: Parzen
        1971: Kimeldorf & Wahba

    SVMs are closely related to learning in Reproducing Kernel Hilbert Spaces


    Wider use of the kernel trick

    Angle between vectors:

        Input space: cos θ_{x,z} = x^T z / (‖x‖_2 ‖z‖_2)
        Feature space: cos θ_{φ(x),φ(z)} = φ(x)^T φ(z) / (‖φ(x)‖_2 ‖φ(z)‖_2) = K(x, z) / √(K(x, x) K(z, z))

    Distance between vectors:

        Input space: ‖x - z‖_2^2 = (x - z)^T (x - z) = x^T x + z^T z - 2 x^T z
        Feature space: ‖φ(x) - φ(z)‖_2^2 = K(x, x) + K(z, z) - 2 K(x, z)
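Both quantities are computable from kernel evaluations alone, without ever forming φ(x). A small sketch (the RBF kernel is assumed here for concreteness; since K(x, x) = 1 for it, all feature vectors lie on the unit sphere):

```python
import math

def rbf(x, z, sigma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / sigma ** 2)

def feature_cos(x, z, K=rbf):
    # cos of the angle between phi(x) and phi(z), via the kernel trick
    return K(x, z) / math.sqrt(K(x, x) * K(z, z))

def feature_dist2(x, z, K=rbf):
    # squared distance ||phi(x) - phi(z)||^2, via the kernel trick
    return K(x, x) + K(z, z) - 2.0 * K(x, z)

x, z = (0.0, 0.0), (1.0, 1.0)
print(feature_cos(x, z))    # equals K(x, z) = exp(-2) for the RBF kernel
print(feature_dist2(x, z))  # equals 2 - 2*exp(-2)
```

Any algorithm that only needs angles or distances (clustering, nearest neighbours, PCA, ...) can therefore be "kernelized" this way.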


    LS-SVM models: extending the SVM framework

    Linear and nonlinear classification and function estimation, applicable in high dimensional input spaces; primal-dual optimization formulations.

    Solving linear systems; link with Gaussian processes, regularization networks and kernel versions of Fisher discriminant analysis.

    Sparse approximation and robust regression (robust statistics).

    Bayesian inference (probabilistic interpretations, inference of hyperparameters, model selection, automatic relevance determination for input selection).

    Extensions to unsupervised learning: kernel PCA (and the related methods of kernel PLS, CCA), density estimation (clustering).

    Fixed-size LS-SVMs: large scale problems; adaptive learning machines; transductive inference.

    Extensions to recurrent networks and control.
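The "solving linear systems" point can be made concrete. In the LS-SVM classifier the inequality constraints of the standard SVM are replaced by equality constraints with squared error terms, and the dual reduces to the linear system [[0, y^T], [y, Ω + I/γ]] [b; α] = [0; 1] with Ω_ij = y_i y_j K(x_i, x_j). The sketch below solves that system for a small RBF-kernel problem; the toy data and the naive Gaussian-elimination solver are my own, for illustration only:

```python
import math

def rbf(x, z, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma ** 2)

def solve(A, rhs):
    # Naive Gaussian elimination with partial pivoting (fine for tiny systems).
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def lssvm_train(X, y, gamma=10.0):
    # Build the KKT system [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1].
    N = len(X)
    A = [[0.0] + [float(yi) for yi in y]]
    for i in range(N):
        row = [float(y[i])]
        for j in range(N):
            row.append(y[i] * y[j] * rbf(X[i], X[j]) + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * N)
    return sol[0], sol[1:]  # b, alpha

def lssvm_predict(X, y, alpha, b, x):
    s = sum(a * yi * rbf(xi, x) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1

X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (-1.0, -1.0), (-0.8, -1.2), (-1.2, -0.8)]
y = [1, 1, 1, -1, -1, -1]
b, alpha = lssvm_train(X, y)
print([lssvm_predict(X, y, alpha, b, xi) for xi in X])  # reproduces the labels
```

The price of the linear system is that every α_i is generally nonzero, i.e. sparseness is lost, which is what the sparse approximation and fixed-size techniques on the following slides address.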


    Towards a next generation of universal models?

    [Figure: grid of model classes - FDA, PCA, PLS, CCA, classifiers, regression, clustering, recurrent models - each in linear, robust linear, kernel and robust kernel versions, spanned by the LS-SVM and SVM frameworks]

    Research issues:

        Large scale methods
        Adaptive processing
        Robustness issues
        Statistical aspects
        Application-specific kernels


    Fixed-size LS-SVM (1)

    [Diagram: interplay between primal and dual space - Nystrom method, kernel PCA, density estimate, entropy criteria, eigenfunctions, SV selection, regression]

    Modelling in view of primal-dual representations

    Link between Nystrom approximation (GP), kernel PCA and density estimation


    Fixed-size LS-SVM (2)

    high dimensional inputs, large data sets, adaptive learning machines (using LS-SVMlab)

    Sinc function (20,000 data points, 10 SV):

    [Figure: noisy sinc training data and the fixed-size LS-SVM estimate, x ∈ [-10, 10]]

    Santa Fe laser data:

    [Figure: laser time series y_k over discrete time k, and predicted vs. true values]


    Fixed-size LS-SVM (3)

    [Figure: four panels of 2D results on [-2.5, 2.5] × [-2.5, 2.5]]


    The problem of learning and generalization (1)

    Different mathematical settings exist, e.g.

    Vapnik et al.:

        Predictive learning problem (inductive inference)
        Estimating values of functions at given points (transductive inference)
        Vapnik V. (1998) Statistical Learning Theory, John Wiley & Sons, New York.

    Poggio et al., Smale:

        Estimate the true function f with an analysis of the approximation error and the sample error (e.g. in an RKHS, a Sobolev space)
        Cucker F., Smale S. (2002) On the mathematical foundations of learning, Bulletin of the AMS, 39, 1-49.

    Goal: deriving bounds on the generalization error (these can be used to determine regularization parameters and other tuning constants). Important for practical applications is trying to get sharp bounds.


    The problem of learning and generalization (2)

    (see Pontil, ESANN 2003)

    Random variables x ∈ X, y ∈ Y ⊆ R
    Draw i.i.d. samples from the (unknown) probability distribution ρ(x, y)

    Generalization error:

        E[f] = ∫_{X×Y} L(y, f(x)) ρ(x, y) dx dy

    Loss function L(y, f(x)); empirical error E_N[f] = (1/N) Σ_{i=1}^N L(y_i, f(x_i))

    f* := arg min_f E[f] (true function);  f_N := arg min_f E_N[f]

    If L(y, f) = (f - y)^2 then f*(x) = ∫_Y y ρ(y|x) dy (regression function)

    Consider a hypothesis space H with f_H := arg min_{f ∈ H} E[f]
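The empirical error E_N[f] is the only one of these quantities computable from data; the generalization error E[f] involves the unknown ρ. A minimal sketch with the squared loss (the sample and the candidate f below are synthetic, for illustration):

```python
def empirical_error(f, samples, L=lambda y, fy: (fy - y) ** 2):
    # E_N[f] = (1/N) * sum_i L(y_i, f(x_i)) -- the computable surrogate for E[f]
    return sum(L(y, f(x)) for x, y in samples) / len(samples)

# Synthetic data drawn near y = 2x, with one noisy observation.
samples = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.5)]
f = lambda x: 2.0 * x
print(empirical_error(f, samples))  # (0 + 0 + 0 + 0.25) / 4 = 0.0625
```

Minimizing E_N[f] over a hypothesis space H yields f_N; how far E[f_N] is from E[f*] is exactly the decomposition on the next slide.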


    The problem of learning and generalization (3)

    generalization error = sample error + approximation error

        E[f_N] - E[f*] = (E[f_N] - E[f_H]) + (E[f_H] - E[f*])

    approximation error: depends only on H (not on the sampled examples)
    sample error:

        E[f_N] - E[f_H] ≤ ε(N, 1/h, 1/δ)  (w.p. ≥ 1 - δ)

    ε is a non-decreasing function
    h measures the size of the hypothesis space H

    Overfitting when h is large & N is small (large sample error)
    Goal: obtain a good trade-off between sample error and approximation error


    Interdisciplinary challenges

    NATO-ASI on Learning Theory and Practice, Leuven, July 2002
    http://www.esat.kuleuven.ac.be/sista/natoasi/ltp2002.html

    [Diagram: SVM & kernel methods at the intersection of linear algebra, mathematics, statistics, systems and control theory, signal processing, optimization, machine learning, pattern recognition, data mining and neural networks]

    J.A.K. Suykens, G. Horvath, S. Basu, C. Micchelli, J. Vandewalle (Eds.), Advances in Learning Theory: Methods, Models and Applications, NATO-ASI Series Computer and Systems Sciences, IOS Press, 2003.


    Books, software, papers ...

    www.kernel-machines.org & www.esat.kuleuven.ac.be/sista/lssvmlab/

    Introductory papers:

    C.J.C. Burges (1998) A tutorial on support vector machines for pattern recognition, Knowledge Discovery and Data Mining, 2(2), 121-167.

    A.J. Smola, B. Scholkopf (1998) A tutorial on support vector regression, NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK.

    T. Evgeniou, M. Pontil, T. Poggio (2000) Regularization networks and support vector machines, Advances in Computational Mathematics, 13(1), 1-50.

    K.-R. Muller, S. Mika, G. Ratsch, K. Tsuda, B. Scholkopf (2001) An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks, 12(2), 181-201.
