Kernel Methods

Page 1: Kernel Methods

Kernel Methods

A B M Shawkat Ali


Page 2: Kernel Methods

Data Mining

¤ DM or KDD (Knowledge Discovery in Databases): extracting previously unknown, valid, and actionable information that supports crucial decisions.

¤ Approach: Train Data → Model; the model is then applied to Test Data to drive the crucial decisions.

Page 3: Kernel Methods

History of SVM

• The original optimal hyperplane algorithm, proposed by Vladimir Vapnik in 1963, was a linear classifier.

• In 1992, Bernhard Boser, Isabelle Guyon, and Vapnik suggested a way to create non-linear classifiers by applying the kernel trick (originally proposed by Aizerman et al.) to maximum-margin hyperplanes. The resulting algorithm is formally similar, except that every dot product is replaced by a non-linear kernel function. This allows the algorithm to fit the maximum-margin hyperplane in a transformed feature space. The transformation may be non-linear and the transformed space high-dimensional; thus, although the classifier is a hyperplane in the high-dimensional feature space, it may be non-linear in the original input space.

Page 4: Kernel Methods

Properties of the SVM

¤ Relatively new approach
¤ Much recent interest, with many successes, e.g., text classification
¤ Important concepts:
  – Transformation into a high-dimensional space
  – Finding a "maximal margin" separation
  – Structural risk minimization rather than empirical risk minimization

Page 5: Kernel Methods

Support Vector Machine (SVM)

¤ Classification: grouping of similar data.
¤ Regression: prediction from historical knowledge.
¤ Novelty Detection: detecting abnormal instances in a dataset.
¤ Clustering, Feature Selection

Page 6: Kernel Methods

SVM Block Diagram

Training Data Domain → Non-linear Mapping by Kernel → Linear Feature Space of SVM → Choose Optimal Hyperplane

Page 7: Kernel Methods

SVM Block Diagram

Test Data Domain → Kernel Mapping → Constructed Model (through feature knowledge) → Class I / Class II

Page 8: Kernel Methods

SVM Formulation

Find a hyperplane $\mathbf{w} \cdot \mathbf{X} + b$ that classifies every training point with margin at least 1:

$y_i(\mathbf{w} \cdot \mathbf{X}_i + b) \ge 1, \quad \forall i \in S$

The weight vector can be expanded over the training points, $\mathbf{w} = \sum_{i \in S} \alpha_i y_i \mathbf{X}_i$, so the decision function becomes

$y = \operatorname{sign}(\mathbf{w} \cdot \mathbf{X} + b) = \operatorname{sign}\Big(\sum_{i \in S} \alpha_i y_i (\mathbf{X}_i \cdot \mathbf{X}) + b\Big)$

Since the margin is $1/\|\mathbf{w}\|$, the maximum-margin hyperplane solves

$\min_{\mathbf{w}} \; \mathbf{w} \cdot \mathbf{w} \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{X}_i + b) \ge 1$

Page 9: Kernel Methods

SVM Formulation (soft margin)

When the classes are not perfectly separable, margin violations are penalized through a cost parameter $C$:

$\min_{\mathbf{w}, b} \; \tfrac{1}{2}\, \mathbf{w} \cdot \mathbf{w} + C \sum_i \big(1 - y_i(\mathbf{w} \cdot \mathbf{X}_i + b)\big)_+$

The expansion $\mathbf{w} = \sum_{i \in S} \alpha_i y_i \mathbf{X}_i$ and the decision function $y = \operatorname{sign}\big(\sum_{i \in S} \alpha_i y_i (\mathbf{X}_i \cdot \mathbf{X}) + b\big)$ are unchanged.

Page 10: Kernel Methods

SVM Formulation (kernel trick)

Map each input into a feature space, $\mathbf{x} \to \mathbf{X} = \Phi(\mathbf{x})$, so that inner products become $\mathbf{X}_i \cdot \mathbf{X} = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x})$.

A kernel function evaluates this inner product directly in input space,

$K(\mathbf{x}_i, \mathbf{x}) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x})$

which is legitimate for any $K$ satisfying Mercer's condition. The decision function then becomes

$y = \operatorname{sign}\Big(\sum_{i \in S} \alpha_i y_i \, \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}) + b\Big) = \operatorname{sign}\Big(\sum_{i \in S} \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\Big)$
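
This decision function is easy to sketch in code. Below is a minimal illustrative version (the helper names `rbf_kernel` and `svm_predict` and the choice of an RBF kernel as the example are ours, not from the slides); it assumes the multipliers $\alpha_i$ and the bias $b$ have already been obtained from training.

```python
import numpy as np

def rbf_kernel(xi, x, sigma=1.0):
    # K(xi, x) = exp(-||xi - x||^2 / (2 sigma^2))
    d = np.asarray(xi, dtype=float) - np.asarray(x, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def svm_predict(x, support_vectors, support_labels, alphas, b, kernel=rbf_kernel):
    # y = sign( sum_i alpha_i * y_i * K(x_i, x) + b )
    s = sum(a * yi * kernel(xi, x)
            for a, yi, xi in zip(alphas, support_labels, support_vectors))
    return np.sign(s + b)
```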

Page 11: Kernel Methods

Types of Kernels

Common kernels for SVM:
¤ Linear
¤ Polynomial
¤ Radial Basis Function

New kernels (not used in SVM):
¤ Laplace
¤ Multiquadratic

Page 12: Kernel Methods

SVM kernel

Linear: $K(\mathbf{x}_i, \mathbf{x}) = \mathbf{x}_i \cdot \mathbf{x}$

Polynomial: $K(\mathbf{x}_i, \mathbf{x}) = (\mathbf{x}_i \cdot \mathbf{x} + k)^d$

Gaussian (Radial Basis Function): $K(\mathbf{x}_i, \mathbf{x}) = \exp\!\left( -\frac{\|\mathbf{x}_i - \mathbf{x}\|^2}{2\sigma^2} \right)$, where $\|\mathbf{x}_i - \mathbf{x}\|^2 = \mathbf{x}_i \cdot \mathbf{x}_i - 2\,\mathbf{x}_i \cdot \mathbf{x} + \mathbf{x} \cdot \mathbf{x}$
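
As a quick illustration, here are the three kernels written as plain NumPy functions (a sketch of ours; the parameter names `k`, `d`, and `sigma` follow the formulas above):

```python
import numpy as np

def linear_kernel(xi, x):
    # K(xi, x) = xi . x
    return np.dot(xi, x)

def polynomial_kernel(xi, x, k=1.0, d=2):
    # K(xi, x) = (xi . x + k)^d
    return (np.dot(xi, x) + k) ** d

def gaussian_kernel(xi, x, sigma=1.0):
    # K(xi, x) = exp(-||xi - x||^2 / (2 sigma^2))
    diff = np.asarray(xi, dtype=float) - np.asarray(x, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))
```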

Page 13: Kernel Methods

Laplace kernel

Introduced by Pavel Paclik et al. in Pattern Recognition Letters 21 (2000).

The Laplace kernel is based on the Laplace probability density; the class-conditional density estimate is

$\hat{f}(\mathbf{x} \mid c) = \frac{1}{N_c} \sum_{i=1}^{N_c} \prod_{j=1}^{D} \frac{1}{2h} \exp\!\left( -\frac{|x_j - x_{ij}|}{h} \right)$

where $h$ is the smoothing parameter (Sp), $N_c$ is the number of training samples in class $c$, and $D$ is the input dimension.
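
A minimal sketch of this density estimate, assuming the reconstruction above is faithful (the function name `laplace_density` is ours):

```python
import numpy as np

def laplace_density(x, class_samples, h=1.0):
    # f(x | c) = (1/Nc) sum_i prod_j (1/(2h)) exp(-|x_j - x_ij| / h)
    x = np.asarray(x, dtype=float)                          # shape (D,)
    class_samples = np.asarray(class_samples, dtype=float)  # shape (Nc, D)
    per_dim = np.exp(-np.abs(x - class_samples) / h) / (2.0 * h)
    return float(np.mean(np.prod(per_dim, axis=1)))
```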

Page 14: Kernel Methods


Linear Kernel

Page 15: Kernel Methods

The reality of data separation

Page 16: Kernel Methods


RBF kernel

Page 17: Kernel Methods

XOR solved by SVM

Table 5.3. Boolean XOR Problem

Input data x    Output class y
(-1, -1)        -1
(-1, +1)        +1
(+1, -1)        +1
(+1, +1)        -1

Page 18: Kernel Methods

First, we transform the dataset with the 2nd-degree polynomial kernel:

$K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^T \mathbf{x}_j)^2$

Here, the inner products $\mathbf{x}_i^T \mathbf{x}_j$ come from the data matrix and its transpose:

$\begin{bmatrix} -1 & -1 \\ -1 & +1 \\ +1 & -1 \\ +1 & +1 \end{bmatrix} \begin{bmatrix} -1 & -1 & +1 & +1 \\ -1 & +1 & -1 & +1 \end{bmatrix}$

Page 19: Kernel Methods

Therefore the kernel matrix is:

$K(\mathbf{x}_i, \mathbf{x}_j) = \begin{bmatrix} 9 & 1 & 1 & 1 \\ 1 & 9 & 1 & 1 \\ 1 & 1 & 9 & 1 \\ 1 & 1 & 1 & 9 \end{bmatrix}$

We can write the maximization term, following the SVM implementation given in Figure 5.20, as:

$\max_{\alpha} \; \sum_{i=1}^{4} \alpha_i - \frac{1}{2} \sum_{i=1}^{4} \sum_{j=1}^{4} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$

$= \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 - \tfrac{1}{2}\big( 9\alpha_1^2 - 2\alpha_1\alpha_2 - 2\alpha_1\alpha_3 + 2\alpha_1\alpha_4 + 9\alpha_2^2 + 2\alpha_2\alpha_3 - 2\alpha_2\alpha_4 + 9\alpha_3^2 - 2\alpha_3\alpha_4 + 9\alpha_4^2 \big)$

subject to: $\sum_{i=1}^{4} \alpha_i y_i = 0$ and $\alpha_1 \ge 0, \; \alpha_2 \ge 0, \; \alpha_3 \ge 0, \; \alpha_4 \ge 0$
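
The kernel matrix is easy to verify numerically; a small NumPy check of ours:

```python
import numpy as np

# XOR inputs and labels from Table 5.3
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])

# 2nd-degree polynomial kernel: K_ij = (1 + x_i . x_j)^2
K = (1.0 + X @ X.T) ** 2
print(K)  # 9 on the diagonal, 1 everywhere else, matching the matrix above
```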

Page 20: Kernel Methods

Setting the derivative of the objective with respect to each $\alpha_i$ to zero gives the linear system:

$9\alpha_1 - \alpha_2 - \alpha_3 + \alpha_4 = 1$
$-\alpha_1 + 9\alpha_2 + \alpha_3 - \alpha_4 = 1$
$-\alpha_1 + \alpha_2 + 9\alpha_3 - \alpha_4 = 1$
$\alpha_1 - \alpha_2 - \alpha_3 + 9\alpha_4 = 1$

By solving these equations we can write the solution to this optimisation problem as:

$\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \tfrac{1}{8}$

Therefore, the decision function in the inner product representation is:

$\hat{f}(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) = 0.125 \sum_{i=1}^{4} y_i (\mathbf{x}_i^T \mathbf{x} + 1)^2$
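
A short NumPy check (our sketch) confirms both the multipliers and the resulting decision function; note that the symmetric solution also satisfies $\sum_i \alpha_i y_i = 0$ and $\alpha_i \ge 0$:

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])
K = (1.0 + X @ X.T) ** 2

# The four stationarity equations above are H a = 1 with H_ij = y_i y_j K_ij
H = np.outer(y, y) * K
alphas = np.linalg.solve(H, np.ones(4))
print(alphas)  # [0.125 0.125 0.125 0.125], i.e. alpha_i = 1/8

def f_hat(x):
    # f(x) = 0.125 * sum_i y_i (x_i . x + 1)^2, with bias b = 0 for this problem
    return 0.125 * sum(yi * (xi @ np.asarray(x, dtype=float) + 1.0) ** 2
                       for xi, yi in zip(X, y))

for xi, target in zip(X, y):
    print(xi, f_hat(xi), target)  # f_hat reproduces each XOR label exactly
```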

Page 21: Kernel Methods

The 2nd degree polynomial kernel function expands as:

$K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^T \mathbf{x}_j + 1)^2 = 1 + 2x_{i1}x_{j1} + 2x_{i2}x_{j2} + (x_{i1}x_{j1})^2 + 2(x_{i1}x_{j1})(x_{i2}x_{j2}) + (x_{i2}x_{j2})^2 = \Phi(\mathbf{x}_i)^T \Phi(\mathbf{x}_j)$

Now we can write the 2nd degree polynomial transformation function as:

$\Phi(\mathbf{x}_i) = [1, \; \sqrt{2}\,x_{i1}, \; x_{i1}^2, \; \sqrt{2}\,x_{i1}x_{i2}, \; x_{i2}^2, \; \sqrt{2}\,x_{i2}]^T$
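
To see that this map really reproduces the kernel, compare $\Phi(\mathbf{x}_i)^T \Phi(\mathbf{x}_j)$ against $(\mathbf{x}_i^T \mathbf{x}_j + 1)^2$ directly; a small check of ours:

```python
import numpy as np

def phi(x):
    # phi(x) = [1, sqrt(2) x1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x2]^T
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, x1 ** 2, s * x1 * x2, x2 ** 2, s * x2])

xi, xj = np.array([-1.0, 1.0]), np.array([1.0, 1.0])
print(phi(xi) @ phi(xj))     # 1.0
print((xi @ xj + 1.0) ** 2)  # 1.0 -- the same value, as the expansion shows
```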

Page 22: Kernel Methods

In the feature space the weight vector is:

$\mathbf{w}_o = \sum_{i=1}^{4} \alpha_i y_i \Phi(\mathbf{x}_i) = \tfrac{1}{8}\big[ -\Phi(\mathbf{x}_1) + \Phi(\mathbf{x}_2) + \Phi(\mathbf{x}_3) - \Phi(\mathbf{x}_4) \big]$

Page 23: Kernel Methods

Substituting the four transformed points:

$\mathbf{w}_o = \tfrac{1}{8}\left( -\begin{bmatrix} 1 \\ -\sqrt{2} \\ 1 \\ \sqrt{2} \\ 1 \\ -\sqrt{2} \end{bmatrix} + \begin{bmatrix} 1 \\ -\sqrt{2} \\ 1 \\ -\sqrt{2} \\ 1 \\ \sqrt{2} \end{bmatrix} + \begin{bmatrix} 1 \\ \sqrt{2} \\ 1 \\ -\sqrt{2} \\ 1 \\ -\sqrt{2} \end{bmatrix} - \begin{bmatrix} 1 \\ \sqrt{2} \\ 1 \\ \sqrt{2} \\ 1 \\ \sqrt{2} \end{bmatrix} \right) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ -1/\sqrt{2} \\ 0 \\ 0 \end{bmatrix}$

Therefore the optimal hyperplane function for this XOR problem is:

$\hat{f}(\mathbf{x}) = \mathbf{w}_o^T \Phi(\mathbf{x}) = -x_1 x_2$
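
The whole computation condenses to a few lines of NumPy (our sketch), building $\mathbf{w}_o$ from the transformed points and confirming that $\mathbf{w}_o^T \Phi(\mathbf{x})$ reduces to $-x_1 x_2$:

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])
alphas = np.full(4, 0.125)

def phi(x):
    # 2nd-degree polynomial feature map from the slide
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, x1 ** 2, s * x1 * x2, x2 ** 2, s * x2])

# w_o = sum_i alpha_i y_i phi(x_i): only the sqrt(2) x1 x2 coordinate survives
w = sum(a * yi * phi(xi) for a, yi, xi in zip(alphas, y, X))
print(w)  # [0 0 0 -0.7071 0 0], i.e. -1/sqrt(2) in the x1*x2 coordinate

x = np.array([-1.0, 1.0])
print(w @ phi(x))  # 1.0 == -(x1 * x2) for x = (-1, 1)
```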

Page 24: Kernel Methods

Conclusions

• Research Issues
  – How to select a kernel automatically
  – How to select optimal parameter values for a kernel

