Multiple Kernel Learning
Hossein Hajimirsadeghi
School of Computing Science
Simon Fraser University
November 5, 2013
Introduction - SVM

Decision function:
f(x) = w·φ(x) + b
Separating hyperplane: w·φ(x) + b = 0, with margin boundaries w·φ(x) + b = ±1.
Maximizing the margin 1/‖w‖ is equivalent to the hard-margin problem:
min_{w,b} (1/2)‖w‖²   s.t.  y_i (w·φ(x_i) + b) ≥ 1, ∀i
Introduction - SVM (soft margin)

Allowing margin violations with slack variables ξ_i:
min_{w,b,ξ} (1/2)‖w‖² + C Σ_i ξ_i
s.t.  y_i (w·φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0, ∀i

Eliminating the slacks gives the unconstrained hinge-loss form:
min_{w,b} (1/2)‖w‖² + C Σ_i max(0, 1 − y_i (w·φ(x_i) + b))
i.e.  Regularizer + Loss function Σ_i l(f(x_i), y_i)
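The hinge-loss objective above is straightforward to evaluate directly. A minimal numpy sketch with a linear feature map (φ(x) = x) and a hypothetical two-point toy dataset:

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Soft-margin SVM objective: (1/2)||w||^2 + C * sum of hinge losses.

    X: (n, d) feature matrix (here phi(x) = x, i.e. a linear kernel),
    y: (n,) labels in {-1, +1}.
    """
    margins = y * (X @ w + b)                # y_i (w . x_i + b)
    hinge = np.maximum(0.0, 1.0 - margins)   # max(0, 1 - margin)
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Toy check: both points lie outside the margin, so only the
# regularizer contributes.
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
print(svm_objective(w, b=0.0, X=X, y=y, C=1.0))  # 0.5 = (1/2)||w||^2
```

Points with margin y_i f(x_i) ≥ 1 contribute zero loss; the C parameter trades the regularizer against the total hinge loss.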
SVM: Optimization Problem

min_{w,b,ξ} (1/2)‖w‖² + C Σ_i ξ_i
s.t.  y_i (w·φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0, ∀i

Lagrangian (with multipliers α_i ≥ 0, μ_i ≥ 0):
L(w, b, ξ, α, μ) = (1/2)‖w‖² + C Σ_i ξ_i − Σ_i α_i [ y_i (w·φ(x_i) + b) − 1 + ξ_i ] − Σ_i μ_i ξ_i
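The dual on the following slides comes from the standard stationarity conditions of this Lagrangian; as a reminder (the intermediate steps are not on the slide):

```latex
% Stationarity of L(w, b, \xi, \alpha, \mu) in the primal variables:
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i \,\phi(x_i),
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0,
\qquad
\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \mu_i = 0.
% Substituting back eliminates (w, b, \xi); together with \mu_i \ge 0
% this yields the box constraint 0 \le \alpha_i \le C and the dual
\max_{\alpha}\; \sum_i \alpha_i \;-\; \tfrac{1}{2}\sum_{i,j}
\alpha_i \alpha_j\, y_i y_j\, \phi(x_i)\!\cdot\!\phi(x_j).
```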
SVM: Dual

Primal:
min_{w,b,ξ} (1/2)‖w‖² + C Σ_i ξ_i
s.t.  y_i (w·φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0, ∀i

Dual:
max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j φ(x_i)·φ(x_j)
s.t.  Σ_i α_i y_i = 0,  0 ≤ α_i ≤ C, ∀i
SVM: Dual

max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j φ(x_i)·φ(x_j)
s.t.  Σ_i α_i y_i = 0,  0 ≤ α_i ≤ C, ∀i

Resulting classifier:
f(x) = w·φ(x) + b = Σ_i α_i y_i φ(x_i)·φ(x) + b
b = y_j − Σ_i α_i y_i φ(x_i)·φ(x_j)  (for any support vector x_j)

The data enter only through inner products φ(x_i)·φ(x_j) = K(x_i, x_j).
Kernel Methods

Define K : X × X → ℝ, called a kernel, such that
K(x, y) = φ(x)·φ(y)

Ideas:
• K is often interpreted as a similarity measure.
• Benefits: efficiency, flexibility.

Example (polynomial kernel, x, y ∈ ℝ²):
K(x, y) = (x·y + c)² = (x₁y₁ + x₂y₂ + c)²
corresponds to the explicit feature map
φ(x) = ( x₁², x₂², √2 x₁x₂, √(2c) x₁, √(2c) x₂, c )
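The identity K(x, y) = φ(x)·φ(y) for this polynomial kernel can be checked numerically. A short sketch (the function names are illustrative):

```python
import numpy as np

def poly_kernel(x, y, c):
    """K(x, y) = (x . y + c)^2 for 2-D inputs."""
    return (np.dot(x, y) + c) ** 2

def feature_map(x, c):
    """Explicit map phi with K(x, y) = phi(x) . phi(y)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1**2, x2**2, s * x1 * x2,
                     s * np.sqrt(c) * x1, s * np.sqrt(c) * x2, c])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
c = 2.0
# Both sides evaluate to (1*3 + 2*(-1) + 2)^2 = 9.
print(poly_kernel(x, y, c))
print(np.dot(feature_map(x, c), feature_map(y, c)))
```

The efficiency benefit is visible here: the kernel needs one inner product in ℝ², while the explicit map works in a 6-dimensional space (and the gap grows quickly with the polynomial degree).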
Kernelized SVM

max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j φ(x_i)·φ(x_j)
s.t.  Σ_i α_i y_i = 0,  0 ≤ α_i ≤ C, ∀i
Classifier:  f(x) = w·φ(x) + b = Σ_i α_i y_i φ(x_i)·φ(x) + b,  b = y_j − Σ_i α_i y_i φ(x_i)·φ(x_j)

Kernelized:
max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j)
s.t.  Σ_i α_i y_i = 0,  0 ≤ α_i ≤ C, ∀i
Classifier:  f(x) = Σ_i α_i y_i K(x_i, x) + b,  b = y_j − Σ_i α_i y_i K(x_i, x_j)
Kernelized SVM (matrix form)

Gram matrix:
K = [ K(x₁, x₁)  K(x₁, x₂)  …  K(x₁, x_N)
      K(x₂, x₁)  K(x₂, x₂)  …  K(x₂, x_N)
      …
      K(x_N, x₁) K(x_N, x₂) …  K(x_N, x_N) ]

max_α 1ᵀα − (1/2) αᵀ Y K Y α
Subject to  1ᵀYα = 0,  0 ≤ α ≤ C
(Y = diag(y))
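The Gram matrix above can be assembled directly from any kernel function; for a valid kernel it must be symmetric positive semidefinite. A small numpy sketch (the RBF kernel with γ = 1 and the random data are arbitrary choices for illustration):

```python
import numpy as np

def gram_matrix(X, kernel):
    """N x N matrix K with K[i, j] = kernel(x_i, x_j)."""
    N = len(X)
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = kernel(X[i], X[j])
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))  # RBF kernel, gamma = 1

K = gram_matrix(X, rbf)
# A valid kernel yields a symmetric PSD Gram matrix with unit diagonal
# (for the RBF kernel, K(x, x) = 1).
print(np.allclose(K, K.T), np.min(np.linalg.eigvalsh(K)) > -1e-10)
```

Once K is in hand, the dual above is a quadratic program in α over the box and equality constraints; off-the-shelf QP or SVM solvers accept such a precomputed matrix.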
Ideal Kernel Matrix

The ideal kernel matrix is K = yyᵀ, i.e.
K(x_i, x_j) = y_i y_j = +1 if y_i = y_j, −1 otherwise.

Plugging it into the classifier f(x) = Σ_i α_i y_i K(x_i, x) + b: for a point x with true label y,
f(x) = Σ_i α_i y_i (y_i y) + b = y Σ_i α_i y_i² + b = y Σ_i α_i + b
so the sign of f(x) matches the true label.
Motivation for MKL

• The success of an SVM depends on the choice of a good kernel:
  – How to choose the kernel function and its parameters?
• Practical problems involve multiple heterogeneous data sources:
  – How can kernels help to fuse features, especially features from different modalities?
Multiple Kernel Learning

General MKL:
K(x_i, x_j) = f( { K_m(x_i^m, x_j^m) }_{m=1}^P )

Linear MKL:
K(x_i, x_j) = Σ_{m=1}^P η_m K_m(x_i^m, x_j^m)

(x_i^m denotes the representation of example i in the m-th feature space.)
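Linear MKL with nonnegative weights keeps the combined matrix a valid (PSD) kernel, since a nonnegative combination of PSD matrices is PSD. A minimal numpy sketch with two toy base kernels and hypothetical weights:

```python
import numpy as np

def combine_kernels(Ks, eta):
    """Linear MKL: K = sum_m eta_m * K_m (eta_m >= 0 keeps K PSD)."""
    Ks = np.asarray(Ks)       # (P, N, N) stack of base Gram matrices
    eta = np.asarray(eta)     # (P,) kernel weights
    assert np.all(eta >= 0), "negative weights may break PSD-ness"
    return np.tensordot(eta, Ks, axes=1)

# Two toy base kernels on 3 points: a linear kernel and an RBF kernel.
X = np.array([[0.0], [1.0], [2.0]])
K_lin = X @ X.T
K_rbf = np.exp(-np.square(X - X.T))
K = combine_kernels([K_lin, K_rbf], eta=[0.7, 0.3])
print(np.min(np.linalg.eigvalsh(K)) > -1e-10)  # combined matrix stays PSD
```

In the heterogeneous-data setting, each K_m would come from a different feature channel (e.g. one per modality), and learning η is exactly the MKL problem discussed next.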
MKL Algorithms

• Fixed rules
• Heuristic approaches
• Similarity optimization
  – Maximizing the similarity to the ideal kernel matrix
• Structural risk optimization
  – Minimizing "regularization term" + "error term"
Similarity Optimization

• Similarity measures:
  – Kernel alignment
  – Euclidean distance
  – Kullback-Leibler (KL) divergence

Kernel alignment:
A(K₁, K₂) = ⟨K₁, K₂⟩ / √( ⟨K₁, K₁⟩ ⟨K₂, K₂⟩ )
where  ⟨K₁, K₂⟩ = Σ_{i,j} K₁(x_i, x_j) K₂(x_i, x_j)

MKL objective: maximize A(K, yyᵀ).
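The alignment above is a few lines of numpy. A sketch, with a check that the ideal kernel yyᵀ is perfectly aligned with itself:

```python
import numpy as np

def alignment(K1, K2):
    """Kernel alignment A(K1, K2) = <K1, K2> / sqrt(<K1, K1><K2, K2>),
    where <K1, K2> is the Frobenius inner product sum_ij K1[i,j]*K2[i,j]."""
    inner = np.sum(K1 * K2)
    return inner / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

y = np.array([1.0, 1.0, -1.0])
ideal = np.outer(y, y)                # ideal kernel matrix y y^T
print(alignment(ideal, ideal))         # perfectly aligned: 1.0
print(alignment(np.eye(3), ideal))     # identity vs ideal: 3 / sqrt(9 * 3)
```

By Cauchy-Schwarz, A always lies in [−1, 1], with 1 meaning the two Gram matrices are positive multiples of each other.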
Similarity Optimization

• Lanckriet et al. (2004):
max_K A(K, yyᵀ)
s.t.  tr(K) = 1,  K ⪰ 0,  K = Σ_{m=1}^P η_m K_m

Can be converted to a semidefinite programming (SDP) problem.
Better results: centered kernel alignment (Cortes et al. 2010).
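Centered alignment first removes the mean from each Gram matrix. A sketch assuming the usual centering K_c = HKH with H = I − (1/n)11ᵀ; one consequence, checked below, is that the alignment becomes invariant to adding a constant to every kernel entry:

```python
import numpy as np

def center(K):
    """Center a Gram matrix: Kc = H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def centered_alignment(K1, K2):
    """Centered kernel alignment: <Kc1, Kc2> / (||Kc1||_F ||Kc2||_F)."""
    Kc1, Kc2 = center(K1), center(K2)
    return np.sum(Kc1 * Kc2) / (np.linalg.norm(Kc1) * np.linalg.norm(Kc2))

y = np.array([1.0, 1.0, -1.0, -1.0])
ideal = np.outer(y, y)
print(centered_alignment(ideal, ideal))        # 1.0
print(centered_alignment(ideal + 5.0, ideal))  # constant shift: still 1.0
```

Plain alignment can be inflated by a large constant offset in the kernel values; centering removes that effect, which is one reason Cortes et al. report better correlation with downstream accuracy.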
Structural Risk Optimization

min_η r(η) + J(K(η))
Subject to  K(η) ⪰ 0

where r(η) is a regularizer on the kernel parameters and J(K(η)) is the SVM dual objective:
J(K(η)) = max_α 1ᵀα − (1/2) αᵀ Y K(η) Y α
Subject to  1ᵀYα = 0,  0 ≤ α ≤ C
Structural Risk Optimization

General MKL (Varma et al. 2009):
min_η r(η) + J(K(η))
Subject to  K(η) ⪰ 0,  with  J(K(η)) = max_α 1ᵀα − (1/2) αᵀ Y K(η) Y α

Coordinate descent algorithm:
1. Fix the kernel parameters η and solve for α (a standard SVM).
2. Fix α and update η by a gradient step:
   ∂/∂η [ r(η) + J(K(η)) ] = ∂r/∂η − (1/2) αᵀ Y (∂K(η)/∂η) Y α
Structural Risk: Another View

Linear MKL:
K(x_i, x_j; η) = Σ_{m=1}^P η_m K_m(x_i^m, x_j^m) = φ_η(x_i)·φ_η(x_j),  η_m ≥ 0

with the stacked, scaled feature map
φ_η(x) = ( √η₁ φ₁(x¹), √η₂ φ₂(x²), …, √η_P φ_P(x^P) )
Structural Risk: Another View

f_{w,b,η}(x) = w·φ_η(x) + b = Σ_{m=1}^P √η_m w̃_m·φ_m(x^m) + b,  where w = (w̃₁, …, w̃_P)

Writing d_m = η_m and rescaling w_m = w̃_m / √d_m (so that ‖w‖² = Σ_m d_m ‖w_m‖²):
f_{w,b,d}(x) = Σ_{m=1}^P d_m w_m·φ_m(x^m) + b

Each weight d_m scales the contribution of the m-th feature channel.
Structural Risk: Another View

min_{w,b,d,ξ} (1/2) Σ_m d_m ‖w_m‖² + C Σ_i ξ_i
s.t.  y_i ( Σ_{m=1}^P d_m w_m·φ_m(x_i^m) + b ) ≥ 1 − ξ_i,  ξ_i ≥ 0, ∀i

Substituting v_m := d_m w_m (so that d_m ‖w_m‖² = ‖v_m‖² / d_m):
min_{v,b,d,ξ} (1/2) Σ_m (1/d_m) ‖v_m‖² + C Σ_i ξ_i
s.t.  y_i ( Σ_{m=1}^P v_m·φ_m(x_i^m) + b ) ≥ 1 − ξ_i,  ξ_i ≥ 0, ∀i
Structural Risk Optimization

SimpleMKL (Rakotomamonjy et al. 2008):
min_d J(d)  Such that  Σ_{m=1}^P d_m = 1,  d_m ≥ 0

where
J(d) = min_{v,b,ξ} (1/2) Σ_m (1/d_m) ‖v_m‖² + C Σ_i ξ_i
s.t.  y_i ( Σ_{m=1}^P v_m·φ_m(x_i^m) + b ) ≥ 1 − ξ_i,  ξ_i ≥ 0, ∀i
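SimpleMKL alternates between solving a standard SVM with the combined kernel Σ_m d_m K_m (which gives J(d) and its dual solution α) and updating d. A minimal numpy sketch of one d-update, assuming α comes from an SVM solver (here a made-up toy value) and simplifying SimpleMKL's reduced-gradient step with line search to a single projected-gradient step:

```python
import numpy as np

def mkl_gradient(alpha, y, Ks):
    """dJ/dd_m = -(1/2) (alpha*y)^T K_m (alpha*y), from the SVM dual with
    combined kernel K(d) = sum_m d_m K_m (alpha held at the dual optimum)."""
    beta = alpha * y
    return np.array([-0.5 * beta @ K @ beta for K in Ks])

def project_simplex(d):
    """Euclidean projection onto {d : d >= 0, sum_m d_m = 1}."""
    u = np.sort(d)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(d)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(d + theta, 0.0)

# One (hypothetical) update step; Ks, alpha are toy stand-ins.
Ks = [np.eye(3), np.ones((3, 3))]
alpha = np.array([0.5, 0.5, 1.0])
y = np.array([1.0, 1.0, -1.0])
d = np.array([0.5, 0.5])
d_new = project_simplex(d - 0.1 * mkl_gradient(alpha, y, Ks))
print(d_new)  # stays on the simplex: nonnegative, sums to 1
```

The full algorithm repeats "solve SVM with K(d), step d" until convergence; kernels whose weight is driven to 0 drop out of the combination, which is the sparsity the simplex constraint encourages.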
Multi-Class SVM

Score function:  f_w(x, y) = w·φ(x, y)
(e.g. the one-vs-all form f_{w,b}(x, y) = w_y·φ(x) + b_y)

min_{w,ξ} (1/2)‖w‖² + C Σ_i ξ_i
s.t.  f_w(x_i, y_i) − f_w(x_i, y) ≥ Δ(y_i, y) − ξ_i,  ∀i, y;  ξ_i ≥ 0

with the 0-1 margin  Δ(y, y′) = 0 if y = y′, 1 otherwise.
Equivalently,  ξ_i = max_{y≠y_i} ( Δ(y_i, y) + f_w(x_i, y) − f_w(x_i, y_i) ).
Latent SVM

Potential function:  F_w(x, h) = w·Φ(x, h)
Score:  f_w(x) = max_h w·Φ(x, h)

min_{w,ξ} (1/2)‖w‖² + C Σ_i ξ_i
s.t.  y_i f_w(x_i) ≥ 1 − ξ_i,  ξ_i ≥ 0, ∀i

[Figure: examples x₁, x₂, …, x_m, each with latent variables h₁, h₂, …, h_m; the feature map Φ(x, h) depends on the latent value, and max_h w·Φ(x_i, h) picks the best configuration.]
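When the latent set is small enough to enumerate, the score f_w(x) = max_h w·Φ(x, h) can be computed by brute force. A toy numpy sketch (the window-based feature map and the data are hypothetical, chosen only to make the max over h concrete):

```python
import numpy as np

def latent_score(w, x, H, phi):
    """f_w(x) = max over h in H of w . Phi(x, h), for an enumerable H.
    Returns the best score and the maximizing latent value h*."""
    scores = [w @ phi(x, h) for h in H]
    best = int(np.argmax(scores))
    return scores[best], H[best]

# Toy setup: x is a 1-D signal, h selects which length-2 window to score
# (loosely analogous to a part location in a deformable part model).
phi = lambda x, h: x[h:h + 2]
x = np.array([0.0, 3.0, 1.0, -2.0])
w = np.array([1.0, 1.0])
H = [0, 1, 2]                     # candidate window offsets

score, h_star = latent_score(w, x, H, phi)
print(score, h_star)               # window at h=1 scores 3 + 1 = 4
```

In real latent SVMs (e.g. deformable part models), h ranges over part placements and the max is computed with dynamic programming or distance transforms rather than enumeration.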
Multi-Class Latent SVM

Potential function:  F_w(x, h, y) = w·Φ(x, h, y)
Score:  f_w(x, y) = max_h w·Φ(x, h, y)

min_{w,ξ} (1/2)‖w‖² + C Σ_i ξ_i
s.t.  f_w(x_i, y_i) − f_w(x_i, y) ≥ Δ(y_i, y) − ξ_i,  ∀i, y;  ξ_i ≥ 0

[Figure: examples x₁, …, x_m with latent variables h₁, …, h_m; the score of class y is max_h w·Φ(x_i, h, y).]
Latent Kernelized Structural SVM

Wu and Jia 2012
F_w(x, h, y) = w·Φ(x, h, y),  f_w(x, y) = max_h w·Φ(x, h, y)

min_{w,ξ} (1/2)‖w‖² + C Σ_i ξ_i,  with
ξ_i = max( 0, 1 + max_{y≠y_i, h} F_w(x_i, h, y) − max_h F_w(x_i, h, y_i) )
Latent Kernelized Structural SVM

min_w (1/2)‖w‖² + C Σ_i max( 0, 1 + max_{y≠y_i, h} F_w(x_i, h, y) − max_h F_w(x_i, h, y_i) )

Find the dual. The dual variables α_{iu} are indexed by example i and configuration u = (h, y) ∈ S, so w becomes a combination of feature maps and
F_w(x, v) = Σ_i Σ_{u∈S} α_{iu} K( (x_i, u), (x, v) )

Inference:
f_w(x) = max_{v∈S} F_w(x, v) = max_{v∈S} Σ_i Σ_{u∈S} α_{iu} K( (x_i, u), (x, v) )

NO EFFICIENT EXACT SOLUTION in general: the max over latent configurations no longer decomposes once the score is a sum of kernel evaluations, e.g. how to compute max_{h_i, h_j} K( (x_i, h_i), (x_j, h_j) ) ?
Latent MKL

Vahdat et al. 2013: a latent version of SimpleMKL.

f_{w,d}(x) = max_h Σ_{m=1}^P d_m w_m·φ_m(x, h) + b

min_{v,b,d,ξ} (1/2) Σ_m (1/d_m) ‖v_m‖² + C Σ_i ξ_i
s.t.  y_i ( Σ_{m=1}^P v_m·φ_m(x_i, h_i) + b ) ≥ 1 − ξ_i,  ξ_i ≥ 0, ∀i;  d_m ≥ 0

For positive samples (y_i = 1), h_i is fixed to the inferred configuration h_i*; for negative samples (y_i = −1), the constraint must hold for all h.

Coordinate descent learning algorithm:
1. Perform inference for the positive samples (find h_i*).
2. Find the dual and solve the optimization problem as in SimpleMKL.
Some other works

• Hierarchical MKL (Bach 2008)
• Latent Kernel SVM (Yang et al. 2012)
• Deep MKL (Strobl and Visweswaran 2013)
References
• Gönen, M., & Alpaydın, E. (2011). Multiple kernel learning algorithms. Journal of Machine Learning Research, 12, 2211-2268.
• Rakotomamonjy, A., Bach, F., Canu, S., & Grandvalet, Y. (2008). SimpleMKL. Journal of Machine Learning Research, 9, 2491-2521.
• Varma, M., & Babu, B. R. (2009). More generality in efficient multiple kernel learning. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1065-1072).
• Cortes, C., Mohri, M., & Rostamizadeh, A. (2010). Two-stage learning kernel algorithms. In Proceedings of the 27th International Conference on Machine Learning (pp. 239-246).
• Lanckriet, G. R., Cristianini, N., Bartlett, P., El Ghaoui, L., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 27-72.
• Wu, X., & Jia, Y. (2012). View-invariant action recognition using latent kernelized structural SVM. In Computer Vision - ECCV 2012 (pp. 411-424). Springer.
• Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1-8).
• Yang, W., Wang, Y., Vahdat, A., & Mori, G. (2012). Kernel Latent SVM for Visual Recognition. In Advances in Neural Information Processing Systems (pp. 818-826).
• Vahdat, A., Cannons, K., Mori, G., Oh, S., & Kim, I. (2013). Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach. In IEEE International Conference on Computer Vision (ICCV).
• Cortes, C., Mohri, M., & Rostamizadeh, A. (2011). Learning Kernels. ICML 2011 Tutorial.