Kernels and the Kernel Trick - svivek

Date post: 12-Jul-2020
Transcript
Page 1: Kernels and the Kernel Trick - svivek · The Kernel Matrix • The Gram matrix of a set of n vectors S = {x 1…x n} is the n×n matrix Gwith G ij= x i Tx j –The kernel matrix is

Machine Learning

Kernels and the Kernel Trick

Support vector machines

• Training by maximizing margin
• The SVM objective
• Solving the SVM optimization problem
• Support vectors, duals and kernels

This lecture

1. Support vectors
2. Kernels
3. The kernel trick
4. Properties of kernels
5. Another example of the kernel trick

So far we have seen

• Support vector machines
• Hinge loss and optimizing the regularized loss

More broadly, different algorithms for learning linear classifiers

What about non-linear models?

One way to learn non-linear models

Explicitly introduce non-linearity into the feature space

If the true separator is quadratic, transform all input points with a feature map $\phi(x_1, x_2)$ whose coordinates include the quadratic terms.

Now, we can try to find a weight vector in this higher dimensional space

That is, predict using $w^T \phi(x_1, x_2) \geq b$
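As a minimal sketch of this idea, the snippet below lifts 2-D points into a quadratic feature space and applies a linear rule there. The particular feature map `phi`, the weights `w`, the threshold `b`, and the unit-circle separator are all assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np

def phi(x1, x2):
    """Hypothetical quadratic feature map for a 2-D point."""
    return np.array([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

# Assumed separator: the unit circle x1^2 + x2^2 = 1. It is not linear in
# (x1, x2), but it is linear in phi-space: w^T phi(x) >= b.
w = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
b = 1.0

def predict(x1, x2):
    """Return +1 if w^T phi(x1, x2) >= b, else -1."""
    return 1 if w @ phi(x1, x2) >= b else -1

inside = predict(0.2, 0.3)    # point inside the circle
outside = predict(1.5, -1.0)  # point outside the circle
```

The classifier itself stays linear; only the representation of the input changes.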

SVM: Primals and duals

The SVM objective

This is called the primal form of the objective.

This can be converted to its dual form, which will let us prove a very useful property.

The dual is another optimization problem, with the property that max Dual = min Primal.
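For reference, one common way to write the soft-margin SVM primal is shown below. This is the standard textbook form, stated here as an assumption: it folds the bias into $w$ and uses a trade-off constant $C$, and the exact notation used in the lecture may differ.

```latex
\min_{w} \;\; \frac{1}{2} w^T w \;+\; C \sum_{i=1}^{m} \max\left(0,\; 1 - y_i \, w^T x_i\right)
```

The first term is the regularizer and the second is the hinge loss summed over the $m$ training examples.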

Support vector machines

Let w be the minimizer of the SVM problem for some dataset with m examples: $\{(x_i, y_i)\}$. Then, for i = 1…m, there exist $\alpha_i \geq 0$ such that the optimum w can be written as

$w = \sum_{i=1}^{m} \alpha_i y_i x_i$

Furthermore, the value of $\alpha_i$ depends on where example i lies relative to the margin. Three cases are distinguished:

• All points outside the margin
• All points on the wrong side of the margin
• All points on the margin

[Figure: positive (+) and negative (-) examples separated by a margin]

Support vectors

The weight vector is completely defined by training examples whose $\alpha_i$ are not zero

These examples are called the support vectors
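The following sketch illustrates why only the support vectors matter when building the weight vector. The data `X`, labels `y`, and coefficients `alpha` are made-up values for illustration, not the output of an SVM solver.

```python
import numpy as np

# Hypothetical toy data and dual coefficients (made-up values).
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
alpha = np.array([0.25, 0.0, 0.25, 0.0])

# w = sum_i alpha_i * y_i * x_i, using every training example.
w = (alpha * y) @ X

# Support vectors are the examples with non-zero alpha.
support = np.flatnonzero(alpha > 0)

# Rebuilding w from the support vectors alone gives the same vector.
w_from_support = (alpha[support] * y[support]) @ X[support]
```

Examples with a zero coefficient contribute nothing to the sum, so they can be dropped after training.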

This lecture

✓ Support vectors
2. Kernels
3. The kernel trick
4. Properties of kernels
5. Another example of the kernel trick

Predicting with linear classifiers

• Prediction = $\mathrm{sgn}(w^T x)$ and $w = \sum_i \alpha_i y_i x_i$
• That is, we just showed that $w^T x = \sum_i \alpha_i y_i x_i^T x$
  – That is, we only need to compute dot products between training examples and the new example x
• This is true even if we map examples to a high dimensional space
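A small numerical check of this identity, under assumed toy values: the score computed from an explicit weight vector matches the score computed only from dot products with the training examples. The arrays `X`, `y`, `alpha`, and `x_new` are hypothetical.

```python
import numpy as np

X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
alpha = np.array([0.25, 0.0, 0.25, 0.0])  # made-up coefficients

x_new = np.array([0.5, 1.0])

# Primal-style score: build w explicitly, then compute w^T x.
w = (alpha * y) @ X
primal_score = w @ x_new

# Dual-style score: only the dot products x_i^T x are needed.
dual_score = np.sum(alpha * y * (X @ x_new))

prediction = int(np.sign(dual_score))
```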

Dot products in high dimensional spaces

Let us define a dot product in the high dimensional space

$K(x, z) = \phi(x)^T \phi(z)$

So prediction with this high dimensional lifting map is

$\mathrm{sgn}(w^T \phi(x)) = \mathrm{sgn}\left(\sum_i \alpha_i y_i K(x_i, x)\right)$

because $w = \sum_i \alpha_i y_i \phi(x_i)$

Kernel based methods

Predict using $\mathrm{sgn}\left(\sum_i \alpha_i y_i K(x_i, x)\right)$

What does this new formulation give us? If we have to compute $\phi$ every time anyway, we gain nothing.

If we can compute the value of K without explicitly writing the blown up representation, then we will have a computational advantage.
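A sketch of kernelized prediction, assuming the kernel is passed in as a plain Python function. The names `kernel_predict` and `quadratic_kernel`, and the data and coefficients, are hypothetical; note that the feature map never appears in the code.

```python
import numpy as np

def kernel_predict(x_new, X, y, alpha, K):
    """Return sgn(sum_i alpha_i y_i K(x_i, x_new)) as +1 or -1."""
    score = sum(a * yi * K(xi, x_new) for a, yi, xi in zip(alpha, y, X))
    return 1 if score >= 0 else -1

def quadratic_kernel(x, z):
    """K(x, z) = (1 + x^T z)^2, computed in the original space."""
    return (1.0 + np.dot(x, z)) ** 2

X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 3.0], [0.0, -3.0]])
y = np.array([1, 1, -1, -1])             # made-up labels
alpha = np.array([1.0, 1.0, 0.2, 0.2])   # made-up coefficients

label_near = kernel_predict(np.array([0.9, 0.0]), X, y, alpha, quadratic_kernel)
label_far = kernel_predict(np.array([0.0, 2.5]), X, y, alpha, quadratic_kernel)
```

Swapping in a different kernel function changes the implicit feature space without touching the prediction code.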

This lecture

✓ Support vectors
✓ Kernels
3. The kernel trick
4. Properties of kernels
5. Another example of the kernel trick

Example: Polynomial Kernel

• Given two examples x and z we want to map them to a high dimensional space [for example, quadratic]

$\phi(x_1, \ldots, x_n) = (1,\; x_1, \ldots, x_n,\; x_1^2, \ldots, x_n^2,\; x_1 x_2, \ldots, x_{n-1} x_n)$

  – $1$: all degree zero terms
  – $x_1, \ldots, x_n$: all degree one terms
  – $x_1^2, \ldots, x_n^2, x_1 x_2, \ldots, x_{n-1} x_n$: all degree two terms

and compute the dot product $A = \phi(x)^T \phi(z)$ [takes time]

• Instead, in the original space, compute

$B = k(x, z) = (1 + x^T z)^2$

Claim: A = B (Coefficients do not really matter)

Example: Two dimensions, quadratic kernel

$A = \phi(x)^T \phi(z)$
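The two-dimensional case can be checked numerically. The sketch below assumes a quadratic feature map with $\sqrt{2}$ scaling on the linear and cross terms, which is the choice of coefficients that makes the explicit dot product match $(1 + x^T z)^2$ exactly; the points `x` and `z` are arbitrary.

```python
import numpy as np

def phi(x):
    """Explicit quadratic features for a 2-D point (sqrt(2) scaling is an assumption)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

A = phi(x) @ phi(z)      # dot product in the 6-dimensional space
B = (1.0 + x @ z) ** 2   # same value computed in the 2-dimensional space
```

Without the scaling, A and B contain the same monomials but with different coefficients, which is the sense in which the coefficients do not really matter.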

The Kernel Trick

Suppose we wish to compute $K(x, z) = \phi(x)^T \phi(z)$

Here $\phi$ maps x and z to a high dimensional space

The Kernel Trick: Save time/space by computing the value of K(x, z) by performing operations in the original space (without a feature transformation!)

Computing dot products efficiently

Kernel Trick: You want to work with degree 2 polynomial features, $\phi(x)$. Then, your dot product will operate on vectors in a space of dimensionality n(n+1)/2.

The kernel trick allows you to save time/space and compute dot products in an n dimensional space. (Not just for degree 2 polynomials)

This lecture

✓ Support vectors
✓ Kernels
✓ The kernel trick
4. Properties of kernels
5. Another example of the kernel trick

Which functions are kernels?

• Can we use any function K(.,.)?
  – No! A function K(x, z) is a valid kernel if it corresponds to an inner product in some (perhaps infinite dimensional) feature space.
• General condition: construct the Gram matrix $\{K(x_i, x_j)\}$; check that it's positive semidefinite

Reminder: Positive semi-definite matrices

A symmetric matrix M is positive semi-definite if, for any non-zero vector z, we have $z^T M z \geq 0$

(A useful property characterizing many interesting mathematical objects)
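In practice, positive semi-definiteness of a small symmetric matrix can be checked through its eigenvalues, which must all be non-negative. The helper `is_psd` and its tolerance `tol` are hypothetical names introduced for this sketch.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Check that M is symmetric and that all eigenvalues are >= -tol."""
    M = np.asarray(M, dtype=float)
    if not np.allclose(M, M.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

psd_example = np.array([[2.0, 1.0], [1.0, 2.0]])      # eigenvalues 1 and 3
not_psd_example = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues -1 and 3
```

The small tolerance absorbs floating-point error for matrices whose smallest eigenvalue is exactly zero.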

The Kernel Matrix

• The Gram matrix of a set of n vectors $S = \{x_1, \ldots, x_n\}$ is the n×n matrix G with $G_{ij} = x_i^T x_j$
  – The kernel matrix is the Gram matrix of $\{\phi(x_1), \ldots, \phi(x_n)\}$
  – (size depends on the # of examples, not dimensionality)
• Showing that a function K is a valid kernel
  – Direct approach: If you have the $\phi(x_i)$, you have the Gram matrix (and it's easy to see that it will be positive semi-definite). Why?
  – Indirect: If you have the Kernel, write down the Kernel matrix $K_{ij}$, and show that it is a legitimate kernel, without an explicit construction of $\phi(x_i)$
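One way to answer the "Why?" in the direct approach, as a short sketch: for any real coefficients $c_1, \ldots, c_n$ (symbols introduced here for illustration), the quadratic form of the Gram matrix of the $\phi(x_i)$ is a squared norm, so it cannot be negative.

```latex
c^T G c
  = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j \, \phi(x_i)^T \phi(x_j)
  = \Big\| \sum_{i=1}^{n} c_i \, \phi(x_i) \Big\|^2
  \geq 0
```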

Mercer's condition

Let K(x, z) be a function that maps two n dimensional vectors to a real number

K is a valid kernel if for every finite set $\{x_1, x_2, \ldots\}$, for any choice of real valued $c_1, c_2, \ldots$, we have

$\sum_i \sum_j c_i c_j K(x_i, x_j) \geq 0$

Polynomial kernels

• Linear kernel: $k(x, z) = x^T z$
• Polynomial kernel of degree d: $k(x, z) = (x^T z)^d$
  – only dth-order interactions
• Polynomial kernel up to degree d: $k(x, z) = (x^T z + c)^d$ (c > 0)
  – all interactions of order d or lower
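The three kernels above can be sketched as plain functions, together with a helper that builds the kernel matrix for a set of points. The function names and the random test points are assumptions made for illustration.

```python
import numpy as np

def linear_kernel(x, z):
    """k(x, z) = x^T z."""
    return float(np.dot(x, z))

def poly_kernel(x, z, d):
    """Only d-th order interactions: (x^T z)^d."""
    return float(np.dot(x, z)) ** d

def poly_kernel_up_to(x, z, d, c=1.0):
    """All interactions of order d or lower: (x^T z + c)^d, with c > 0."""
    return (float(np.dot(x, z)) + c) ** d

def gram(K, X):
    """Kernel matrix with entries K(x_i, x_j)."""
    return np.array([[K(xi, xj) for xj in X] for xi in X])

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # six arbitrary points in three dimensions

G = gram(lambda x, z: poly_kernel_up_to(x, z, d=3), X)
min_eig = np.linalg.eigvalsh(G).min()  # non-negative up to rounding for a valid kernel
```

Each kernel costs one dot product in the original space, regardless of how many monomials the implicit feature space contains.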

Gaussian Kernel (or the radial basis function kernel)

$k(x, z) = \exp\left(-\frac{\|x - z\|^2}{c}\right)$

– $\|x - z\|^2$: squared Euclidean distance between x and z
– $c = \sigma^2$: a free parameter
– very small c: K ≈ identity matrix (every item is different)
– very large c: K ≈ all-ones matrix (all items are the same)
– k(x, z) ≈ 1 when x, z close
– k(x, z) ≈ 0 when x, z dissimilar

Exercises:
1. Prove that this is a kernel.
2. What is the "blown up" feature space for this kernel?
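The two limiting cases can be seen numerically. This sketch assumes the kernel is scaled as $\exp(-\|x - z\|^2 / c)$; other conventions place a different constant in the denominator, which changes the numbers but not the limiting behavior. The three points in `X` are arbitrary.

```python
import numpy as np

def gaussian_kernel_matrix(X, c):
    """K_ij = exp(-||x_i - x_j||^2 / c), assuming this scaling of c."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / c)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])

K_small_c = gaussian_kernel_matrix(X, c=1e-3)  # close to the identity matrix
K_large_c = gaussian_kernel_matrix(X, c=1e6)   # close to the all-ones matrix
```

With a very small c each point is similar only to itself, and with a very large c every pair looks alike, so c is usually tuned on held-out data.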

Constructing New Kernels

You can construct new kernels k'(x, x') from existing ones:

– Multiplying k(x, x') by a constant c > 0: $c\,k(x, x')$
– Multiplying k(x, x') by a function f applied to x and x': $f(x)\,k(x, x')\,f(x')$
– Applying a polynomial (with non-negative coefficients) to k(x, x'): $P(k(x, x'))$ with $P(z) = \sum_i a_i z^i$ and $a_i \geq 0$
– Exponentiating k(x, x'): $\exp(k(x, x'))$

Constructing New Kernels (2)

• You can construct k'(x, x') from $k_1(x, x')$, $k_2(x, x')$ by:
  – Adding $k_1(x, x')$ and $k_2(x, x')$: $k_1(x, x') + k_2(x, x')$
  – Multiplying $k_1(x, x')$ and $k_2(x, x')$: $k_1(x, x')\,k_2(x, x')$
• Also:
  – If $\phi(x) \in \mathbb{R}^m$ and $k_m(z, z')$ is a valid kernel in $\mathbb{R}^m$, then $k(x, x') = k_m(\phi(x), \phi(x'))$ is also a valid kernel
  – If A is a symmetric positive semi-definite matrix, $k(x, x') = x^T A x'$ is also a valid kernel
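These closure rules can be spot-checked numerically: the kernel matrices of a sum, a product, and an exponential of valid kernels should have no negative eigenvalues beyond rounding error. This is a sanity check on one random sample, not a proof; the base kernels `k1` and `k2` and the points are assumptions.

```python
import numpy as np

def gram(K, X):
    """Kernel matrix with entries K(x_i, x_j)."""
    return np.array([[K(xi, xj) for xj in X] for xi in X])

def min_eigenvalue(M):
    return np.linalg.eigvalsh(M).min()

k1 = lambda x, z: float(np.dot(x, z))               # linear kernel
k2 = lambda x, z: (float(np.dot(x, z)) + 1.0) ** 2  # polynomial kernel up to degree 2

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))   # five arbitrary points in two dimensions

G_sum = gram(lambda x, z: k1(x, z) + k2(x, z), X)
G_prod = gram(lambda x, z: k1(x, z) * k2(x, z), X)
G_exp = gram(lambda x, z: np.exp(k1(x, z)), X)

min_eigs = [min_eigenvalue(G) for G in (G_sum, G_prod, G_exp)]
```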

This lecture

✓ Support vectors
✓ Kernels
✓ The kernel trick
✓ Properties of kernels
5. Another example of the kernel trick

Kernel Trick: An example

Let the blown up feature space represent the space of all $3^n$ conjunctions. Then,

$K(x, z) = \phi(x)^T \phi(z) = \sum_i \phi_i(z)\,\phi_i(x) = 2^{\mathrm{same}(x, z)}$

where same(x, z) is the number of features that have the same value for both x and z

Example: Take n = 3; x = (001), z = (011); we have conjunctions of size 0, 1, 2, 3

Proof: let m = same(x, z); construct "surviving" conjunctions by
1. choosing to include one of these m literals with the right polarity in the conjunction, or
2. choosing to not include it at all.
Conjunctions with literals outside this set disappear.
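The n = 3 example can be verified by brute force. The sketch below assumes each feature is an indicator of whether the input satisfies one conjunction, where every variable appears as a positive literal, a negated literal, or not at all (giving $3^n$ conjunctions); the encoding with `1`, `0`, and `None` is a choice made for this illustration.

```python
from itertools import product

def phi(x):
    """Indicator features over all 3^n conjunctions.

    In each conjunction a variable is positive (1), negated (0), or absent (None).
    """
    features = []
    for conj in product((1, 0, None), repeat=len(x)):
        satisfied = all(lit is None or lit == xi for lit, xi in zip(conj, x))
        features.append(1 if satisfied else 0)
    return features

def same(x, z):
    """Number of positions where x and z have the same value."""
    return sum(1 for xi, zi in zip(x, z) if xi == zi)

x = (0, 0, 1)
z = (0, 1, 1)

explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # dot product over 27 features
implicit = 2 ** same(x, z)                             # kernel trick, no feature map
```

The explicit computation enumerates a number of features that grows exponentially in n, while the kernel needs a single pass over the n input bits.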

Exercises

1. Show that this argument works for a specific example
  – Take $X = \{x_1, x_2, x_3, x_4\}$
  – $\phi(x)$ = the space of all $3^n$ conjunctions; $|\phi(x)| = 81$
  – Consider x = (1100), z = (1101)
  – Write $\phi(x)$, $\phi(z)$, the representation of x, z in the $\phi$ space
  – Compute $\phi(x)^T \phi(z)$
  – Show that $K(x, z) = \phi(x)^T \phi(z) = \sum_i \phi_i(z)\,\phi_i(x) = 2^{\mathrm{same}(x, z)} = 8$
2. Try to develop another kernel, e.g., where the feature space is the space of all conjunctions of size 3 (exactly)

Summary: Kernel trick

• To make the final prediction, we are computing dot products
• The kernel trick is a computational trick to compute dot products in higher dimensional spaces
• This is applicable not just to SVMs. The same idea can be extended to Perceptron too: the Kernel Perceptron
• Important: All the bounds we have seen (e.g., Perceptron bound, etc.) depend on the underlying dimensionality
  – By moving to a higher dimensional space, we are incurring a penalty on sample complexity
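As a rough sketch of how the same idea carries over to the Perceptron, the code below keeps one mistake counter per training example instead of an explicit weight vector. This is the textbook kernel perceptron written under assumptions: the function names, the epoch limit, and the XOR-like toy data are all chosen for illustration.

```python
import numpy as np

def kernel_perceptron(X, y, K, epochs=10):
    """Return mistake counts alpha; the implicit weights are sum_i alpha_i y_i phi(x_i)."""
    m = len(X)
    alpha = np.zeros(m)
    gram = np.array([[K(xi, xj) for xj in X] for xi in X])
    for _ in range(epochs):
        mistakes = 0
        for i in range(m):
            score = np.sum(alpha * y * gram[:, i])
            if y[i] * score <= 0:   # mistake: add this example to the expansion
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:           # a full pass without mistakes: stop
            break
    return alpha

def predict(x_new, X, y, alpha, K):
    score = sum(a * yi * K(xi, x_new) for a, yi, xi in zip(alpha, y, X))
    return 1 if score > 0 else -1

# XOR-like data: not linearly separable in the original space.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])
K = lambda x, z: (1.0 + np.dot(x, z)) ** 2   # quadratic kernel

alpha = kernel_perceptron(X, y, K)
predictions = [predict(x, X, y, alpha, K) for x in X]
```

With the quadratic kernel the perceptron separates this data, which a linear perceptron in the original two dimensions cannot do.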

