Kernels Usman Roshan CS 675 Machine Learning
Transcript
Page 1

Kernels

Usman Roshan
CS 675 Machine Learning

Page 2

Feature space representation

• Consider two classes shown below
• Data cannot be separated by a hyperplane

Page 3

Feature space representation

• Suppose we square each coordinate
• In other words, (x_1, x_2) => (x_1^2, x_2^2)
• Now the data are well separated
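A minimal numpy sketch (not part of the original slides; the circle radii and the threshold 5 are illustrative) of why squaring the coordinates makes the two classes linearly separable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes: an inner circle (radius 1) and an outer circle (radius 3).
# Neither class is linearly separable in the original (x1, x2) coordinates.
theta = rng.uniform(0, 2 * np.pi, 100)
inner = np.c_[1.0 * np.cos(theta), 1.0 * np.sin(theta)]
outer = np.c_[3.0 * np.cos(theta), 3.0 * np.sin(theta)]

# Square each coordinate: (x1, x2) -> (x1^2, x2^2).
inner_sq, outer_sq = inner ** 2, outer ** 2

# In the squared space each class lies on a line x1^2 + x2^2 = r^2,
# so a single hyperplane (here: sum of coordinates = 5) separates them.
print((inner_sq.sum(axis=1) < 5).all())   # True
print((outer_sq.sum(axis=1) > 5).all())   # True
```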

Page 4

Feature spaces/Kernel trick

• Using a linear classifier (nearest means or SVM) we solve a non-linear problem simply by working in a different feature space.

• With kernels
– we don’t have to make the new feature space explicit.
– we can implicitly work in a different space and efficiently compute dot products there.
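A quick numpy check of this point, using the degree-2 polynomial kernel that appears later (the explicit feature map phi written out here is for illustration only):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D point x = (x1, x2)."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z, d=2):
    """Polynomial kernel (x.z + 1)^d, computed without ever building phi."""
    return (x @ z + 1.0) ** d

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Same number, two routes: explicit feature space vs. implicit kernel.
print(phi(x) @ phi(z))        # 4.0
print(poly_kernel(x, z))      # 4.0
```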

Page 5

Support vector machine

• Consider the hard margin SVM optimization

• Solve by applying the Karush–Kuhn–Tucker (KKT) conditions. Think of KKT as a tool for constrained convex optimization.

• Form the Lagrangian

L_p = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \big( y_i (w^T x_i + w_0) - 1 \big)

where the \alpha_i are Lagrange multipliers.

Page 6

Support vector machine

• KKT says the optimal w and w0 are given by the saddle point solution

• And the KKT conditions imply that w = \sum_i \alpha_i y_i x_i and \sum_i \alpha_i y_i = 0

\min_{w, w_0} \max_{\alpha} \; L_p = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \big( y_i (w^T x_i + w_0) - 1 \big)

subject to \alpha_i \ge 0
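For completeness (this derivation is standard but was not legible on the scraped slide), the two conditions in the bullet above come from setting the gradients of L_p to zero:

\frac{\partial L_p}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i

\frac{\partial L_p}{\partial w_0} = -\sum_i \alpha_i y_i = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0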

Page 7

Support vector machine

• Substituting w = \sum_i \alpha_i y_i x_i back into the primal Lagrangian gives the dual, which is maximized over the \alpha_i
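The slide's own equation did not survive extraction; the standard hard-margin dual obtained this way is:

\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{subject to } \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0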

Page 8

SVM and kernels

• We can rewrite the dual in a compact form:
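The compact form itself was lost in extraction; one standard way to write it, with the dot products collected into a kernel matrix K, is:

\max_{\alpha} \; \mathbf{1}^T \alpha - \frac{1}{2} \, \alpha^T \operatorname{diag}(y) \, K \, \operatorname{diag}(y) \, \alpha \quad \text{subject to } \alpha^T y = 0, \; \alpha \ge 0, \qquad K_{ij} = x_i^T x_j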

Page 9

Optimization

• The SVM dual is thus a quadratic program that can be solved by any quadratic program solver (see the sketch below).

• Platt’s Sequential Minimal Optimization (SMO) algorithm offers a simple solver tailored to the SVM dual

• The idea is to perform coordinate ascent, selecting two variables at a time to optimize (two at a time because the equality constraint \sum_i \alpha_i y_i = 0 must be preserved)
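As a sketch of the quadratic-program view (cvxopt is my choice of solver, not something the slides prescribe, and the toy data are illustrative):

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

def svm_dual_qp(X, y):
    """Solve the hard-margin SVM dual as a generic quadratic program:
    minimize (1/2) a' Q a - 1' a  subject to  a >= 0  and  y' a = 0,
    where Q_ij = y_i y_j (x_i . x_j).
    """
    n = X.shape[0]
    K = X @ X.T                              # linear kernel matrix
    Q = np.outer(y, y) * K
    P = matrix(Q, tc='d')
    q = matrix(-np.ones(n), tc='d')
    G = matrix(-np.eye(n), tc='d')           # -a <= 0, i.e. a >= 0
    h = matrix(np.zeros(n), tc='d')
    A = matrix(y.reshape(1, n), tc='d')
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    w = (alpha * y) @ X                      # recover w = sum_i a_i y_i x_i
    return alpha, w

# Toy separable data: two points per class.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, w = svm_dual_qp(X, y)
print(np.sign(X @ w))                        # [ 1.  1. -1. -1.]
```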

• Let’s look at some kernels.

Page 10

Example kernels

• Polynomial kernels of degree d give a feature space with higher order non-linear terms

• Radial basis kernel gives an infinite-dimensional feature space (via the Taylor series of the exponential)

K(x_i, x_j) = (x_i^T x_j + 1)^d

K(x_i, x_j) = e^{-\|x_i - x_j\|^2 / (2\sigma^2)}
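A small numpy sketch of both kernel matrices (function and parameter names are mine, not from the slides):

```python
import numpy as np

def poly_kernel(X, d=2):
    """Polynomial kernel matrix K_ij = (x_i . x_j + 1)^d."""
    return (X @ X.T + 1.0) ** d

def rbf_kernel(X, sigma=1.0):
    """Radial basis kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(poly_kernel(X, d=2))
print(rbf_kernel(X, sigma=1.0))
```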

Page 11

Example kernels

• Empirical kernel map
– Define a set of reference vectors m_1, ..., m_p
– Define a score s(x_i, m_j) between x_i and m_j
– Then \Phi(x_i) = (s(x_i, m_1), ..., s(x_i, m_p))
– And K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)
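A hypothetical sketch of this construction (the score function and reference vectors below are placeholders chosen for illustration):

```python
import numpy as np

def empirical_kernel(X, M, score):
    """Empirical kernel map: Phi(x)[j] = score(x, m_j), then K = Phi Phi^T."""
    Phi = np.array([[score(x, m) for m in M] for x in X])
    return Phi @ Phi.T

# Example: negative squared distance as the score, first two points as references.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
M = X[:2]
K = empirical_kernel(X, M, lambda x, m: -np.sum((x - m) ** 2))
print(K.shape)   # (3, 3)
```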

Page 12

Example kernels

• Bag of words
– Given two documents D1 and D2, we define the kernel K(D1, D2) as the number of words they have in common
– To prove this is a kernel, first create a large set of words W_i. Define the mapping Φ(D1) as a high-dimensional vector where Φ(D1)[i] is 1 if the word W_i is present in the document and 0 otherwise. Then K(D1, D2) = Φ(D1)^T Φ(D2), which is exactly the number of words in common.
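A one-function Python sketch of this kernel (whitespace tokenization is an assumption made for illustration):

```python
def bow_kernel(doc1: str, doc2: str) -> int:
    """Bag-of-words kernel: number of distinct words the documents share.

    Equivalent to the dot product of binary word-presence vectors.
    """
    return len(set(doc1.lower().split()) & set(doc2.lower().split()))

print(bow_kernel("the cat sat on the mat", "a cat on a hat"))  # 2: 'cat', 'on'
```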

Page 13

SVM and kernels

• What if we make the kernel matrix K a variable and optimize the dual over it as well?

• But now there is no way to tie the kernel matrix to the training data points.

Page 14

SVM and kernels

• To tie the kernel matrix to the training data, we assume the kernel to be determined is a linear combination of existing base kernels, K = \sum_k \mu_k K_k (see the sketch below).

• Now we have a problem that is not a quadratic program anymore.

• Instead we have a semi-definite program (Lanckriet et al., 2002)
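A minimal numpy sketch of the combined-kernel construction (the weights here are fixed by hand purely for illustration; in MKL they are learned jointly with the SVM):

```python
import numpy as np

def combined_kernel(base_kernels, mu):
    """Return K = sum_k mu_k K_k for fixed, hand-picked weights mu."""
    mu = np.asarray(mu, dtype=float)
    return sum(m * K for m, K in zip(mu, base_kernels))

# Two toy base kernels on three points (weights are illustrative).
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
K_linear = X @ X.T
K_rbf = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2))
K = combined_kernel([K_linear, K_rbf], mu=[0.3, 0.7])
print(K)
```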

Page 15

Theoretical foundation

• Recall the margin error theorem (7.3 from Learning with kernels)

Page 16

Theoretical foundation

• The kernel analogue of Theorem 7.3, from Lanckriet et al. 2002:

Page 17

How does MKL work in practice?

• Gönen and Alpaydın, JMLR, 2011
• Datasets:
– Digit recognition
– Internet advertisements
– Protein folding
• Form kernels with different sets of features
• Apply SVM with various kernel learning algorithms

Page 18

How does MKL work in practice?

From Gönen and Alpaydın, JMLR, 2011

Page 19

How does MKL work in practice?

From Gönen and Alpaydın, JMLR, 2011

Page 20

How does MKL work in practice?

From Gönen and Alpaydın, JMLR, 2011

Page 21

How does MKL work in practice?

• MKL better than single kernel
• Mean kernel hard to beat
• Non-linear MKL looks promising

