
Multiple Kernel Learning

Hossein Hajimirsadeghi

School of Computing Science

Simon Fraser University

November 5, 2013

Introduction - SVM

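[Figure: separating hyperplane $w\cdot\phi(x) + b = 0$ between the margin hyperplanes $w\cdot\phi(x) + b = 1$ and $w\cdot\phi(x) + b = -1$, with normal vector $w$]

Max. Margin: $\frac{2}{\|w\|}$

$$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\,(w\cdot\phi(x_i) + b) \ge 1 \quad \forall i$$

$$f(x) = w\cdot\phi(x) + b$$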

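$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i\,(w\cdot\phi(x_i) + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \quad \forall i$$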

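$$\min_{w,b} \; \underbrace{\frac{1}{2}\|w\|^2}_{\text{Regularizer}} + C\sum_i \underbrace{\max\big(0,\; 1 - y_i\,(w\cdot\phi(x_i) + b)\big)}_{\text{Loss function } l(f(x_i),\, y_i)}$$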
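A minimal sketch of this unconstrained objective, assuming the identity feature map $\phi(x) = x$ and plain Python lists for vectors; `hinge` and `primal_objective` are hypothetical helper names, not taken from the slides.

```python
def hinge(z):
    """Hinge loss max(0, 1 - z), where z = y_i * f(x_i) is the functional margin."""
    return max(0.0, 1.0 - z)


def primal_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)), assuming phi(x) = x."""
    regularizer = 0.5 * sum(wk * wk for wk in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        f = sum(wk * xk for wk, xk in zip(w, xi)) + b  # f(x_i) = w . x_i + b
        loss += hinge(yi * f)
    return regularizer + C * loss


X = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
y = [1, 1, -1]
print(primal_objective([0.5, 0.5], 0.0, X, y, C=1.0))  # 0.25: every margin is exactly 1
```

Points with $y_i f(x_i) \ge 1$ contribute nothing to the loss term, so only the regularizer remains in this example; shrinking $w$ to `[0.25, 0.25]` lowers the regularizer but makes every point pay a hinge penalty.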

SVM: Optimization Problem

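$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i\,(w\cdot\phi(x_i) + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \quad \forall i$$

$$L(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C\sum_i \xi_i - \sum_i \alpha_i\big(y_i\,(w\cdot\phi(x_i) + b) - 1 + \xi_i\big) - \sum_i \beta_i \xi_i, \qquad \alpha_i \ge 0, \;\; \beta_i \ge 0$$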

SVM: Dual

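Primal:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i\,(w\cdot\phi(x_i) + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \quad \forall i$$

Dual:

$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \phi(x_i)\cdot\phi(x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0$$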

SVM-Dual

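$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \phi(x_i)\cdot\phi(x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0$$

Resulting Classifier:

$$f(x) = w\cdot\phi(x) + b = \sum_i \alpha_i y_i\, \phi(x_i)\cdot\phi(x) + b$$

$$b = y_j - \sum_i \alpha_i y_i\, \phi(x_i)\cdot\phi(x_j) \quad \text{for any } j \text{ with } 0 < \alpha_j < C$$

$$\phi(x_i)\cdot\phi(x_j) = K(x_i, x_j)$$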

Kernel Methods

Define $K: X \times X \to \mathbb{R}$, called a kernel, such that:

$$K(x, y) = \phi(x)\cdot\phi(y)$$

Ideas:

K is often interpreted as a similarity measure

Benefits: efficiency, flexibility

$$K(x, y) = (x_1 y_1 + x_2 y_2 + c)^2 = \phi(x)\cdot\phi(y), \qquad \phi(x) = \big[\,x_1^2,\;\; x_2^2,\;\; \sqrt{2}\,x_1 x_2,\;\; \sqrt{2c}\,x_1,\;\; \sqrt{2c}\,x_2,\;\; c\,\big]^T$$
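The explicit feature map of this polynomial kernel can be checked numerically. A minimal sketch, assuming 2-D inputs and $c \ge 0$ (so that $\sqrt{2c}$ is real); `quad_kernel` and `quad_features` are illustrative names.

```python
import math


def quad_kernel(x, z, c):
    """K(x, z) = (x1*z1 + x2*z2 + c)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1] + c) ** 2


def quad_features(x, c):
    """Explicit 6-D feature map phi(x) whose dot product reproduces quad_kernel (needs c >= 0)."""
    x1, x2 = x
    r2, r2c = math.sqrt(2.0), math.sqrt(2.0 * c)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2c * x1, r2c * x2, c]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z, c = (1.0, 2.0), (3.0, -1.0), 1.0
print(quad_kernel(x, z, c))                              # 4.0
print(dot(quad_features(x, c), quad_features(z, c)))     # same value, up to float rounding
```

Evaluating `quad_kernel` needs one 2-D dot product, while the explicit map builds two 6-D vectors first; this gap grows quickly with the input dimension and polynomial degree, which is the efficiency benefit of working with $K$ directly.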

Kernelized SVM

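$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \phi(x_i)\cdot\phi(x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0$$

Classifier:

$$f(x) = w\cdot\phi(x) + b = \sum_i \alpha_i y_i\, \phi(x_i)\cdot\phi(x) + b, \qquad b = y_j - \sum_i \alpha_i y_i\, \phi(x_i)\cdot\phi(x_j)$$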

Kernelized

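$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, K(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0$$

Classifier:

$$f(x) = w\cdot\phi(x) + b = \sum_i \alpha_i y_i\, K(x_i, x) + b, \qquad b = y_j - \sum_i \alpha_i y_i\, K(x_i, x_j)$$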
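A minimal sketch of the kernelized classifier and its bias, assuming the dual variables $\alpha_i$ have already been found by some SVM solver; `decision_function`, `bias_from_support_vector`, and the linear stand-in kernel are hypothetical names for illustration. The bias formula only holds for an index $j$ with $0 < \alpha_j < C$.

```python
def linear_kernel(u, v):
    """K(u, v) = u . v, a stand-in; any kernel function with this signature works."""
    return sum(a * b for a, b in zip(u, v))


def decision_function(x, X, y, alpha, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b


def bias_from_support_vector(j, X, y, alpha, kernel):
    """b = y_j - sum_i alpha_i y_i K(x_i, x_j); only valid when 0 < alpha_j < C."""
    return y[j] - sum(a * yi * kernel(xi, X[j]) for a, yi, xi in zip(alpha, y, X))


# Two 1-D points; alpha = (0.5, 0.5) solves the dual for this toy data when C >= 0.5.
X = [[1.0], [-1.0]]
y = [1, -1]
alpha = [0.5, 0.5]
b = bias_from_support_vector(0, X, y, alpha, linear_kernel)
print(b)                                                         # 0.0
print(decision_function([2.0], X, y, alpha, b, linear_kernel))   # 2.0
```

Only `kernel` ever touches the inputs, so swapping `linear_kernel` for any other valid kernel changes the model without changing the rest of the code.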

Kernelized SVM

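$$\mathbf{K} = \begin{bmatrix} K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_N) \\ K(x_2, x_1) & K(x_2, x_2) & \cdots & K(x_2, x_N) \\ \vdots & \vdots & \ddots & \vdots \\ K(x_N, x_1) & K(x_N, x_2) & \cdots & K(x_N, x_N) \end{bmatrix}$$

$$\max_{\boldsymbol\alpha} \; \mathbf{1}^T\boldsymbol\alpha - \frac{1}{2}\boldsymbol\alpha^T \mathbf{Y}\mathbf{K}\mathbf{Y}\boldsymbol\alpha \quad \text{subject to} \quad \mathbf{0} \le \boldsymbol\alpha \le C\mathbf{1}, \;\; \mathbf{1}^T\mathbf{Y}\boldsymbol\alpha = 0$$

where $\mathbf{Y} = \operatorname{diag}(y_1, \ldots, y_N)$.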
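The matrix form can be evaluated directly. This sketch, assuming $\mathbf{Y} = \operatorname{diag}(y)$ and nested lists for matrices, builds the Gram matrix, computes the dual objective, and checks the two constraints; it does not solve the QP, and the helper names are hypothetical.

```python
def gram_matrix(X, kernel):
    """N x N matrix K with K[i][j] = K(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def dual_objective(alpha, y, K):
    """1^T alpha - 0.5 * alpha^T Y K Y alpha, where Y = diag(y)."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha, y, C, tol=1e-9):
    """Check the box 0 <= alpha <= C and the equality 1^T Y alpha = 0."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return in_box and balanced


def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))


X, y = [[1.0], [-1.0]], [1, -1]
K = gram_matrix(X, linear_kernel)
print(K)                                   # [[1.0, -1.0], [-1.0, 1.0]]
print(dual_objective([0.5, 0.5], y, K))    # 0.5
print(is_feasible([0.5, 0.5], y, C=1.0))   # True
```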

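Ideal Kernel Matrix

$$\mathbf{K} = \mathbf{y}\mathbf{y}^T, \qquad K(x_i, x_j) = y_i y_j = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j \end{cases}$$

$$f(x) = \sum_i \alpha_i y_i K(x_i, x) + b = \sum_i \alpha_i y_i\, y_i y + b = y\sum_i \alpha_i y_i^2 + b = y\sum_i \alpha_i + b$$

where $y$ is the label of $x$ and $y_i^2 = 1$.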

Motivation for MKL

• Success of SVM depends on the choice of a good kernel:
  – How to choose kernels?
    • Kernel function
    • Parameters

• Practical problems involve multiple heterogeneous data sources:
  – How can kernels help to fuse features?
    • Especially features from different modalities

Multiple Kernel Learning

General MKL:

$$K(x_i, x_j) = f_\eta\big(\{K_m(x_i^m, x_j^m)\}_{m=1}^{P}\big)$$

Linear MKL:

$$K(x_i, x_j) = \sum_{m=1}^{P} \eta_m K_m(x_i^m, x_j^m)$$
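With precomputed base Gram matrices, the linear MKL kernel is a weighted elementwise sum. A minimal sketch, assuming each $\mathbf{K}_m$ was computed on its own feature source $x^m$ and the weights $\eta_m$ are given; `combine_kernels` is a hypothetical name.

```python
def combine_kernels(kernel_mats, eta):
    """Linear MKL: K[i][j] = sum_m eta_m * K_m[i][j] over precomputed N x N Gram matrices."""
    if len(kernel_mats) != len(eta):
        raise ValueError("need exactly one weight per base kernel")
    n = len(kernel_mats[0])
    return [[sum(e * Km[i][j] for e, Km in zip(eta, kernel_mats)) for j in range(n)]
            for i in range(n)]


# K1 and K2 stand for Gram matrices from two different feature sources of the same samples.
K1 = [[1.0, 0.0], [0.0, 1.0]]
K2 = [[2.0, 1.0], [1.0, 2.0]]
print(combine_kernels([K1, K2], [0.25, 0.75]))  # [[1.75, 0.75], [0.75, 1.75]]
```

With $\eta_m \ge 0$ and every $\mathbf{K}_m$ positive semidefinite, the weighted sum is again positive semidefinite, so it can be passed to any SVM solver that accepts a precomputed kernel.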

MKL Algorithms

• Fixed Rules
• Heuristic Approaches
• Similarity Optimization
  – Maximizing the similarity to ideal kernel matrix
• Structural Risk Optimization
  – Minimizing “regularization term” + “error term”

Similarity Optimization

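• Similarity:
  – Kernel alignment
  – Euclidean distance
  – Kullback-Leibler (KL) divergence

$$A(K_1, K_2) = \frac{\langle K_1, K_2\rangle_F}{\sqrt{\langle K_1, K_1\rangle_F\,\langle K_2, K_2\rangle_F}}, \qquad \langle K_1, K_2\rangle_F = \sum_i\sum_j K_1(x_i, x_j)\,K_2(x_i, x_j)$$

Alignment with the ideal kernel matrix: $A(\mathbf{K}, \mathbf{y}\mathbf{y}^T)$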
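Kernel alignment is a normalized Frobenius inner product, so it takes only a few lines to compute. A minimal sketch over nested-list matrices; the helper names are hypothetical and the labels in the example are made up for illustration.

```python
import math


def frobenius(K1, K2):
    """<K1, K2>_F = sum_i sum_j K1[i][j] * K2[i][j]."""
    return sum(a * b for r1, r2 in zip(K1, K2) for a, b in zip(r1, r2))


def alignment(K1, K2):
    """A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)."""
    return frobenius(K1, K2) / math.sqrt(frobenius(K1, K1) * frobenius(K2, K2))


def ideal_kernel(y):
    """Target matrix y y^T: +1 for same-label pairs, -1 otherwise."""
    return [[yi * yj for yj in y] for yi in y]


y = [1, 1, -1]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(alignment(ideal_kernel(y), ideal_kernel(y)))  # 1.0
print(alignment(identity, ideal_kernel(y)))         # 1/sqrt(3), about 0.577
```

The identity kernel treats every pair of distinct samples as unrelated, so it only matches the diagonal of $\mathbf{y}\mathbf{y}^T$ and scores well below 1.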

Similarity Optimization

• Lanckriet et al. (2004)

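$$\max_{\mathbf{K}} \; A(\mathbf{K}, \mathbf{y}\mathbf{y}^T) \quad \text{s.t.} \quad \mathbf{K} = \sum_{m=1}^{P} \eta_m \mathbf{K}_m, \quad \operatorname{tr}(\mathbf{K}) = 1, \quad \mathbf{K} \succeq 0$$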

This can be converted to a semidefinite programming (SDP) problem.

Better results: centered kernel alignment, Cortes et al. (2010)
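Centered kernel alignment computes the same normalized Frobenius product, but on centered matrices $\mathbf{K}_c = \mathbf{H}\mathbf{K}\mathbf{H}$ with $\mathbf{H} = \mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}^T$. A minimal sketch, assuming nested-list matrices; the helper names and the toy matrices are hypothetical.

```python
import math


def center(K):
    """K_c = H K H with H = I - (1/n) 1 1^T: subtract row and column means, add the grand mean."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def frobenius(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K1, K2):
    return frobenius(K1, K2) / math.sqrt(frobenius(K1, K1) * frobenius(K2, K2))


def centered_alignment(K1, K2):
    """Alignment computed after centering both kernel matrices."""
    return alignment(center(K1), center(K2))


y = [1, 1, -1, -1]
target = [[float(yi * yj) for yj in y] for yi in y]
# Same label structure plus a constant offset of 5 on every entry.
shifted = [[t + 5.0 for t in row] for row in target]
print(alignment(shifted, target))           # about 0.196: the offset hides the structure
print(centered_alignment(shifted, target))  # 1.0: centering removes the offset
```

The example shows why centering helps: a constant offset carries no label information but dominates the uncentered inner products.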

Structural Risk Optimization

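$$J(\mathbf{K}_\eta) = \max_{\boldsymbol\alpha} \; \mathbf{1}^T\boldsymbol\alpha - \frac{1}{2}\boldsymbol\alpha^T \mathbf{Y}\mathbf{K}_\eta\mathbf{Y}\boldsymbol\alpha \quad \text{subject to} \quad \mathbf{0} \le \boldsymbol\alpha \le C\mathbf{1}, \;\; \mathbf{1}^T\mathbf{Y}\boldsymbol\alpha = 0$$

$$\min_\eta \; J(\mathbf{K}_\eta) + r(\eta) \quad \text{subject to} \quad \mathbf{K}_\eta \succeq 0$$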

Structural Risk Optimization

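$$\min_\eta \; J(\mathbf{K}_\eta) + r(\eta) \quad \text{subject to} \quad \mathbf{K}_\eta \succeq 0$$

General MKL (Varma et al. 2009)

$$J(\mathbf{K}_\eta) = \max_{\boldsymbol\alpha} \; \mathbf{1}^T\boldsymbol\alpha - \frac{1}{2}\boldsymbol\alpha^T \mathbf{Y}\mathbf{K}_\eta\mathbf{Y}\boldsymbol\alpha$$

$$\frac{\partial J}{\partial \eta} = -\frac{1}{2}\,\boldsymbol\alpha^{*T}\mathbf{Y}\,\frac{\partial \mathbf{K}_\eta}{\partial \eta}\,\mathbf{Y}\boldsymbol\alpha^{*}$$

where $\boldsymbol\alpha^{*}$ is the optimal dual solution for the current $\eta$.

Coordinate descent algorithm:
1- Fix the kernel parameters $\eta$ and find $\boldsymbol\alpha$
2- Fix $\boldsymbol\alpha$ and update $\eta$ by gradient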
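For the linear combination $\mathbf{K}_\eta = \sum_m \eta_m \mathbf{K}_m$ we have $\partial \mathbf{K}_\eta / \partial \eta_m = \mathbf{K}_m$, so the gradient reduces to one quadratic form per base kernel. A minimal sketch of step 2, assuming $\boldsymbol\alpha^{*}$ comes from an external SVM solver, $r(\eta)$ is dropped, and a plain projected step (clipping at zero) is used; all names are hypothetical.

```python
def quad_form(alpha, y, K):
    """alpha^T Y K Y alpha with Y = diag(y)."""
    n = len(alpha)
    return sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))


def dual_value(alpha, y, kernel_mats, eta):
    """1^T alpha - 0.5 * alpha^T Y K_eta Y alpha for K_eta = sum_m eta_m K_m."""
    return sum(alpha) - 0.5 * sum(e * quad_form(alpha, y, Km)
                                  for e, Km in zip(eta, kernel_mats))


def eta_gradient(alpha, y, kernel_mats):
    """dJ/d eta_m = -0.5 * alpha*^T Y K_m Y alpha*, using dK_eta/d eta_m = K_m.

    alpha must be the optimal dual solution for the current eta (step 1).
    """
    return [-0.5 * quad_form(alpha, y, Km) for Km in kernel_mats]


def gradient_step(eta, grad, step):
    """Projected gradient step; clipping at zero keeps every kernel weight non-negative."""
    return [max(0.0, e - step * g) for e, g in zip(eta, grad)]


K1 = [[1.0, -1.0], [-1.0, 1.0]]
K2 = [[1.0, 0.0], [0.0, 1.0]]
alpha, y = [0.5, 0.5], [1, -1]
grad = eta_gradient(alpha, y, [K1, K2])
print(grad)                                  # [-0.5, -0.25]
print(gradient_step([0.5, 0.5], grad, 0.1))
```

For positive semidefinite $\mathbf{K}_m$ every gradient component is $\le 0$, so without $r(\eta)$ or a constraint on $\eta$ (such as the simplex used by SimpleMKL) the weights would keep growing.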

Structural Risk: Another View

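$$K_\eta(x_i, x_j) = \sum_{m=1}^{P} \eta_m K_m(x_i^m, x_j^m) = \phi_\eta(x_i)\cdot\phi_\eta(x_j)$$

if $\eta_m \ge 0$, where $K_m(x_i^m, x_j^m) = \phi_m(x_i^m)\cdot\phi_m(x_j^m)$ and

$$\phi_\eta(x_i) = \begin{bmatrix} \sqrt{\eta_1}\,\phi_1(x_i^1) \\ \sqrt{\eta_2}\,\phi_2(x_i^2) \\ \vdots \\ \sqrt{\eta_P}\,\phi_P(x_i^P) \end{bmatrix}$$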
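The identity can be checked on explicit features: stacking $\sqrt{\eta_m}\,\phi_m(x^m)$ gives a vector whose dot product equals the weighted kernel sum. A minimal sketch, assuming identity feature maps $\phi_m(x^m) = x^m$ for two sources; `phi_eta` is a hypothetical name.

```python
import math


def phi_eta(feature_blocks, eta):
    """Concatenate sqrt(eta_m) * phi_m(x^m) for m = 1..P (requires every eta_m >= 0)."""
    out = []
    for block, e in zip(feature_blocks, eta):
        out.extend(math.sqrt(e) * v for v in block)
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Two sources with identity feature maps: phi_1(x^1) = x^1 and phi_2(x^2) = x^2.
xi = ([1.0, 2.0], [3.0])
xj = ([0.0, 1.0], [2.0])
eta = [0.25, 4.0]
combined = sum(e * dot(a, b) for e, a, b in zip(eta, xi, xj))  # sum_m eta_m K_m(x_i^m, x_j^m)
print(combined)                                  # 24.5
print(dot(phi_eta(xi, eta), phi_eta(xj, eta)))   # 24.5
```

The square roots are why $\eta_m \ge 0$ is required: a negative weight has no real feature-space interpretation.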

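Structural Risk: Another View

$$f_{w,b,\eta}(x) = w\cdot\phi_\eta(x) + b = [w_1, w_2, \ldots, w_P]\cdot\begin{bmatrix} \sqrt{\eta_1}\,\phi_1(x^1) \\ \sqrt{\eta_2}\,\phi_2(x^2) \\ \vdots \\ \sqrt{\eta_P}\,\phi_P(x^P) \end{bmatrix} + b = [\sqrt{\eta_1}\,w_1, \sqrt{\eta_2}\,w_2, \ldots, \sqrt{\eta_P}\,w_P]\cdot\begin{bmatrix} \phi_1(x^1) \\ \phi_2(x^2) \\ \vdots \\ \phi_P(x^P) \end{bmatrix} + b$$

With $d_m = \eta_m$:

$$f_{w,b,d}(x) = \sum_{m=1}^{P} \sqrt{d_m}\,w_m\cdot\phi_m(x^m) + b$$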

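Structural Risk: Another View

$$f_{w,b,d}(x) = \sum_{m=1}^{P} \sqrt{d_m}\,w_m\cdot\phi_m(x^m) + b$$

$$\min_{w,b,d,\xi} \; \frac{1}{2}\sum_{m=1}^{P}\|w_m\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i\Big(\sum_{m=1}^{P} \sqrt{d_m}\,w_m\cdot\phi_m(x_i^m) + b\Big) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \quad \forall i$$

$v_m := \sqrt{d_m}\,w_m$:

$$\min_{v,b,d,\xi} \; \frac{1}{2}\sum_{m=1}^{P}\frac{1}{d_m}\|v_m\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i\Big(\sum_{m=1}^{P} v_m\cdot\phi_m(x_i^m) + b\Big) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \quad \forall i$$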
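The change of variables can be sanity-checked numerically: with $v_m = \sqrt{d_m}\,w_m$ and $d_m > 0$, both the regularizer and the score are unchanged. A minimal sketch with made-up blocks $w_m$, weights $d_m$, and features $\phi_m(x^m)$ for a single sample; `substitute` is a hypothetical name.

```python
import math


def sq_norm(v):
    return sum(a * a for a in v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def substitute(w_blocks, d):
    """v_m = sqrt(d_m) * w_m for every block m (assumes d_m > 0)."""
    return [[math.sqrt(dm) * a for a in wm] for wm, dm in zip(w_blocks, d)]


w_blocks = [[1.0, -2.0], [0.5]]
d = [0.25, 0.75]
phi_blocks = [[3.0, 1.0], [2.0]]  # phi_m(x^m) for one sample
v_blocks = substitute(w_blocks, d)

# Regularizer: 0.5 * sum ||w_m||^2 versus 0.5 * sum ||v_m||^2 / d_m.
reg_w = 0.5 * sum(sq_norm(wm) for wm in w_blocks)
reg_v = 0.5 * sum(sq_norm(vm) / dm for vm, dm in zip(v_blocks, d))
# Score (without b): sum sqrt(d_m) w_m . phi_m versus sum v_m . phi_m.
score_w = sum(math.sqrt(dm) * dot(wm, pm) for wm, dm, pm in zip(w_blocks, d, phi_blocks))
score_v = sum(dot(vm, pm) for vm, pm in zip(v_blocks, phi_blocks))
print(reg_w, reg_v)
print(score_w, score_v)
```

After the substitution the constraints are linear in $v_m$ and $d_m$ only enters through $\|v_m\|^2 / d_m$, which is jointly convex in $(v_m, d_m)$ for $d_m > 0$.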

Structural Risk Optimization

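Simple MKL (Rakotomamonjy et al. 2008)

$$\min_{d} \; J(d) \quad \text{such that} \quad \sum_{m=1}^{P} d_m = 1, \;\; d_m \ge 0$$

$$J(d) = \min_{v,b,\xi} \; \frac{1}{2}\sum_{m=1}^{P}\frac{1}{d_m}\|v_m\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i\Big(\sum_{m=1}^{P} v_m\cdot\phi_m(x_i^m) + b\Big) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \quad \forall i$$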
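SimpleMKL minimizes $J(d)$ over the simplex with a reduced gradient method: the equality constraint is eliminated through the largest weight $d_\mu$, and the resulting direction is followed no further than the step that keeps every $d_m \ge 0$. The sketch below computes only that direction and the largest feasible step, assuming $\partial J / \partial d_m$ is already available; it omits the line search of the original algorithm, and setting $D_\mu$ to minus the sum of the other components (so the direction always sums to zero) is this sketch's own choice. Function names are hypothetical.

```python
def reduced_gradient_direction(d, grad, tol=1e-12):
    """Descent direction for min J(d) s.t. sum(d) = 1, d >= 0, in the spirit of SimpleMKL.

    mu is the index of the largest weight; the equality constraint is handled by
    moving d_mu against all other components, so the direction sums to zero.
    Weights already at zero whose reduced gradient is positive stay at zero.
    """
    mu = max(range(len(d)), key=lambda m: d[m])
    D = [0.0] * len(d)
    for m in range(len(d)):
        if m == mu:
            continue
        reduced = grad[m] - grad[mu]
        if d[m] <= tol and reduced > 0:
            D[m] = 0.0
        else:
            D[m] = -reduced
    D[mu] = -sum(D[m] for m in range(len(d)) if m != mu)
    return D


def max_feasible_step(d, D):
    """Largest gamma >= 0 such that d + gamma * D stays non-negative."""
    steps = [-dm / Dm for dm, Dm in zip(d, D) if Dm < 0]
    return min(steps) if steps else float("inf")


d = [0.5, 0.3, 0.2]
grad = [-1.0, -3.0, -0.5]  # dJ/dd_m, i.e. -0.5 * alpha^T Y K_m Y alpha for each m
D = reduced_gradient_direction(d, grad)
print(D)                         # [-1.5, 2.0, -0.5]
print(max_feasible_step(d, D))   # about 0.333
```

Weight moves toward the kernel with the most negative gradient (here $d_2$), and the step size is capped by the first weight that would hit zero.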

Multi-Class SVM

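$$f_{w,b}(x, y) = w_y\cdot\phi(x) + b_y \qquad \text{or} \qquad f_w(x, y) = w\cdot\Phi(x, y)$$

$$\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad f_w(x_i, y_i) - f_w(x_i, y) \ge \Delta(y_i, y) - \xi_i \;\; \forall i, y, \quad \xi_i \ge 0$$

$$\Delta(y_i, y) = \begin{cases} 0 & \text{if } y = y_i \\ 1 & \text{if } y \ne y_i \end{cases}, \qquad l_i = \max_y \big(\Delta(y_i, y) + f_w(x_i, y) - f_w(x_i, y_i)\big)$$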
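The per-sample loss $l_i$ can be computed by scoring every class. A minimal sketch, assuming the identity feature map, one weight vector and bias per class, and the 0/1 cost $\Delta$; the function names are hypothetical.

```python
def multiclass_scores(W, b, x):
    """f(x, y) = w_y . x + b_y for every class y (identity feature map assumed)."""
    return [sum(wk * xk for wk, xk in zip(wy, x)) + by for wy, by in zip(W, b)]


def multiclass_hinge(W, b, x, y_true):
    """l_i = max_y [Delta(y_i, y) + f(x_i, y) - f(x_i, y_i)] with 0/1 Delta."""
    f = multiclass_scores(W, b, x)
    return max((0.0 if y == y_true else 1.0) + f[y] - f[y_true] for y in range(len(f)))


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [2.0, 0.5]
print(multiclass_scores(W, b, x))    # [2.0, 0.5, -2.5]
print(multiclass_hinge(W, b, x, 0))  # 0.0: class 0 wins by a margin of at least 1
print(multiclass_hinge(W, b, x, 1))  # 2.5
```

Because the $y = y_i$ term contributes exactly 0, the loss is never negative, which matches the constraint $\xi_i \ge 0$.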

Latent SVM

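$$\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i f_w(x_i) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \quad \forall i$$

$$F_w(x, h) = w\cdot\Phi(x, h), \qquad f_w(x) = \max_h w\cdot\Phi(x, h)$$

[Figure: input parts $x_1, x_2, \ldots, x_m$ with latent variables $h_1, h_2, \ldots, h_m$; feature terms $\Phi(x_1, h), \Phi(x_2, h), \ldots, \Phi(x_m, h)$, a term on $h$, and the sample score $\max_h w\cdot\Phi(x_i, h)$]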
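When the latent space is small and finite, $f_w(x) = \max_h w\cdot\Phi(x, h)$ can be computed by enumeration. A minimal sketch, assuming a hypothetical joint feature `select_part` in which $h$ simply picks one candidate part feature vector of $x$; all names are illustrative.

```python
def latent_score(w, x, latent_values, joint_feature):
    """f_w(x) = max_h w . Phi(x, h), by enumerating a finite latent set; returns (score, h*)."""
    best_h, best = None, float("-inf")
    for h in latent_values:
        s = sum(a * b for a, b in zip(w, joint_feature(x, h)))
        if s > best:
            best_h, best = h, s
    return best, best_h


def latent_slack(w, x, y, latent_values, joint_feature):
    """Smallest xi satisfying y * f_w(x) >= 1 - xi and xi >= 0."""
    score, _ = latent_score(w, x, latent_values, joint_feature)
    return max(0.0, 1.0 - y * score)


def select_part(x, h):
    """Hypothetical Phi(x, h): the latent h picks one of the candidate part features of x."""
    return x[h]


x = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
w = [1.0, 0.5]
print(latent_score(w, x, range(len(x)), select_part))       # (2.0, 1)
print(latent_slack(w, x, -1, range(len(x)), select_part))   # 3.0
```

For a negative sample the max over $h$ makes the constraint hold for every latent value at once, while for a positive sample only the best $h$ needs to score well.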

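Multi-Class Latent SVM

$$F_w(x, h, y) = w\cdot\Phi(x, h, y), \qquad f_w(x, y) = \max_h w\cdot\Phi(x, h, y)$$

$$\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad f_w(x_i, y_i) - f_w(x_i, y) \ge \Delta(y_i, y) - \xi_i \;\; \forall i, y, \quad \xi_i \ge 0$$

[Figure: input parts $x_1, x_2, \ldots, x_m$ with latent variables $h_1, h_2, \ldots, h_m$ and class label $y$; feature terms $\Phi(x_1, h), \Phi(x_2, h), \ldots, \Phi(x_m, h)$ and a term on $(h, y)$; scores $\max_h w\cdot\Phi(x_i, h, y)$ and $\max_h w\cdot\Phi(x_i, h, y_i)$]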

Latent Kernelized Structural SVM

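Wu and Jia 2012

$$\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$$

$$F_w(x, h, y) = w\cdot\Phi(x, h, y), \qquad f_w(x, y) = \max_h F_w(x, h, y)$$

$$\xi_i = l_i = \max_y \big(\Delta(y_i, y) + f_w(x_i, y) - f_w(x_i, y_i)\big) = \max\Big(0,\; 1 + \max_{h,\, y \ne y_i} F_w(x_i, h, y) - \max_h F_w(x_i, h, y_i)\Big)$$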
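The slack can be evaluated by brute force over labels and latent values. A minimal sketch, assuming small finite label and latent sets and a hypothetical score $F_w(x, h, y) = w_y\cdot x_h$ (a class-specific weight vector applied to the part selected by $h$); the table `W` and all function names are illustrative.

```python
def latent_structural_slack(F, x, y_true, labels, latent_values):
    """xi_i = max(0, 1 + max_{h, y != y_i} F(x, h, y) - max_h F(x, h, y_i)) with 0/1 Delta."""
    best_true = max(F(x, h, y_true) for h in latent_values)
    best_other = max(F(x, h, y) for y in labels if y != y_true for h in latent_values)
    return max(0.0, 1.0 + best_other - best_true)


# Hypothetical F_w(x, h, y) = w_y . x[h]: class-specific weights on the part chosen by h.
W = {0: [2.0, 0.0], 1: [0.0, 1.5]}


def F(x, h, y):
    return sum(a * b for a, b in zip(W[y], x[h]))


x = [[1.0, 0.0], [0.0, 1.0]]
print(latent_structural_slack(F, x, 0, [0, 1], [0, 1]))  # 0.5
print(latent_structural_slack(F, x, 1, [0, 1], [0, 1]))  # 1.5
```

Each label is scored with its own best latent value, so the correct class and a competing class may use different $h$.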

Latent Kernelized Structural SVM

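$$\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i, \qquad \xi_i = \max\Big(0,\; 1 + \max_{h,\, y \ne y_i} F_w(x_i, h, y) - \max_h F_w(x_i, h, y_i)\Big)$$

Find the dual

The dual variables: $\alpha_{iu}$, $u \in S$

$$F_w(x, h, y) = F_w(x, v) = \sum_i \sum_{u \in S} \alpha_{iu}\, K(x_i, u, x, v), \qquad v = (h, y)$$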

Latent Kernelized Structural SVM

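$$F_w(x, h, y) = w\cdot\Phi(x, h, y), \qquad f_w(x, y) = \max_h F_w(x, h, y), \qquad f_w(x) = \max_{v \in S} w\cdot\Phi(x, v)$$

Inference:

$$f_w(x) = \max_{v \in S} F_w(x, v) = \max_{v \in S} \sum_i \sum_{u \in S} \alpha_{iu}\, K(x_i, u, x, v)$$

NO EFFICIENT EXACT SOLUTION:

$$\max_{h_i, h_j} K(x_i, h_i, x_j, h_j) \;\; ?$$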

Latent MKL

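Vahdat et al. 2013: Latent Version of SimpleMKL

$$\min_{d,v,b,\xi} \; \frac{1}{2}\sum_{m=1}^{P}\frac{1}{d_m}\|v_m\|^2 + \frac{1}{2}\sum_m d_m^2 + C\sum_i \xi_i$$

$$\text{s.t.} \quad y_i\Big(\sum_{m=1}^{P} v_m\cdot\phi_m(x_i, h) + b\Big) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad d_m \ge 0, \qquad h = h_i^* \text{ if } y_i = 1, \;\; \forall h \text{ if } y_i = -1$$

$$f(x) = \max_h \sum_{m=1}^{P} \sqrt{d_m}\,w_m\cdot\phi_m(x, h)$$

Coordinate descent learning algorithm:
1- Perform inference for positive samples
2- Solve the dual optimization problem like SimpleMKL

Find the dual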

Some other works

• Hierarchical MKL (Bach 2008)
• Latent Kernel SVM (Yang et al. 2012)
• Deep MKL (Strobl and Visweswaran 2013)

References

• Gönen, M., & Alpaydın, E. (2011). Multiple kernel learning algorithms. The Journal of Machine Learning Research, 2211-2268.

• Rakotomamonjy, A., Bach, F., Canu, S., & Grandvalet, Y. (2008). SimpleMKL. The Journal of Machine Learning Research, 9, 2491-2521.

• Varma, M., & Babu, B. R. (2009). More generality in efficient multiple kernel learning. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1065-1072).

• Cortes, C., Mohri, M., & Rostamizadeh, A. (2010). Two-stage learning kernel algorithms. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 239-246).

• Lanckriet, G. R., Cristianini, N., Bartlett, P., Ghaoui, L. E., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. The Journal of Machine Learning Research, 5, 27-72.

• Wu, X., & Jia, Y. (2012). View-invariant action recognition using latent kernelized structural SVM. In Computer Vision – ECCV 2012 (pp. 411-424). Springer Berlin Heidelberg.

• Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008) (pp. 1-8). IEEE.

• Yang, W., Wang, Y., Vahdat, A., & Mori, G. (2012). Kernel latent SVM for visual recognition. In Advances in Neural Information Processing Systems (pp. 818-826).

• Vahdat, A., Cannons, K., Mori, G., Oh, S., & Kim, I. (2013). Compositional models for video event detection: A multiple kernel learning latent variable approach. In IEEE International Conference on Computer Vision (ICCV).

• Cortes, C., Mohri, M., & Rostamizadeh, A. (2011). Learning kernels. ICML 2011 tutorial.

