Kernels and the Kernel Trick - svivek · The Kernel Matrix • The Gram matrix of a set of n...

Machine Learning

Kernels and the Kernel Trick

Support vector machines

• Training by maximizing margin

• The SVM objective

• Solving the SVM optimization problem

• Support vectors, duals and kernels

This lecture

1. Support vectors

2. Kernels

3. The kernel trick

4. Properties of kernels

5. Another example of the kernel trick

So far we have seen

• Support vector machines

• Hinge loss and optimizing the regularized loss

More broadly, different algorithms for learning linear classifiers

What about non-linear models?

One way to learn non-linear models

Explicitly introduce non-linearity into the feature space

If the true separator is quadratic

Transform all input points with a quadratic feature map $\phi(x_1, x_2)$.

Now, we can try to find a weight vector in this higher dimensional space.

That is, predict using $w^T \phi(x_1, x_2) \ge b$
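As a small illustration of the lifting idea, the sketch below uses one possible quadratic feature map and a hand-picked weight vector. The map `phi`, the toy points, and the values of `w` and `b` are all assumptions made for this example; they are not taken from the slides.

```python
# Hypothetical toy data: label +1 inside the circle x1^2 + x2^2 = 1, -1 outside.
points = [(0.0, 0.0), (0.5, -0.3), (-0.4, 0.6), (1.5, 0.0), (-1.2, 1.1), (0.2, -1.7)]
labels = [1 if x1 * x1 + x2 * x2 < 1.0 else -1 for x1, x2 in points]


def phi(x1, x2):
    """One possible quadratic feature map (an assumption, not the slide's exact map)."""
    return (x1, x2, x1 * x1, x2 * x2, x1 * x2)


# In the lifted space the circle becomes a hyperplane:
#   -x1^2 - x2^2 >= -1   is the same as   w . phi(x) >= b
w = (0.0, 0.0, -1.0, -1.0, 0.0)
b = -1.0


def predict(x1, x2):
    score = sum(wi * fi for wi, fi in zip(w, phi(x1, x2)))
    return 1 if score >= b else -1


predictions = [predict(x1, x2) for x1, x2 in points]
print(predictions == labels)
```

No line in the original two coordinates separates these points, but a linear rule on the lifted features does.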

SVM: Primals and duals

The SVM objective

This is called the primal form of the objective

This can be converted to its dual form, which will let us prove a very useful property

– The dual is another optimization problem
– It has the property that max Dual = min Primal
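For reference, the usual soft-margin SVM primal can be written as below. This is the textbook form, given here as an assumption about the notation on the original slide: $C$ is the usual trade-off hyperparameter, the sum is the hinge loss over the $m$ training examples, and the bias is taken to be folded into $w$.

```latex
\min_{w} \;\; \frac{1}{2}\, w^T w \;+\; C \sum_{i=1}^{m} \max\left(0,\; 1 - y_i\, w^T x_i\right)
```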


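Let $w$ be the minimizer of the SVM problem for some dataset with $m$ examples: $\{(x_i, y_i)\}$. Then, for $i = 1 \dots m$, there exist $\alpha_i \ge 0$ such that the optimum $w$ can be written as

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i$$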



Furthermore, the training points fall into three groups (the slides mark each group on a scatter plot of + and - examples around the separator):

• All points outside the margin

• All points on the wrong side of the margin

• All points on the margin

Support vectors

The weight vector is completely defined by training examples whose $\alpha_i$s are not zero

These examples are called the support vectors
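A minimal sketch of this statement: the weight vector built from all training examples equals the one built from the support vectors alone. The data and the `alphas` values are hypothetical and chosen by hand; they are not the output of an actual SVM solver.

```python
# Hypothetical training set and dual coefficients (illustrative values only).
xs = [(1.0, 2.0), (2.0, 0.5), (-1.0, -1.5), (-2.0, 0.0)]
ys = [1, 1, -1, -1]
alphas = [0.7, 0.0, 0.7, 0.0]   # the 2nd and 4th examples are not support vectors


def weight_vector(alphas, ys, xs):
    """w = sum_i alpha_i * y_i * x_i."""
    dim = len(xs[0])
    w = [0.0] * dim
    for a, y, x in zip(alphas, ys, xs):
        for j in range(dim):
            w[j] += a * y * x[j]
    return w


w_all = weight_vector(alphas, ys, xs)

# Keep only the support vectors (alpha_i != 0) and rebuild w from them.
support = [(a, y, x) for a, y, x in zip(alphas, ys, xs) if a != 0.0]
w_sv = weight_vector(*zip(*support))

print(w_all, w_sv)
```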

This lecture

✓ Support vectors

2. Kernels

3. The kernel trick

Predicting with linear classifiers

• Prediction = $\mathrm{sgn}(w^T x)$ and $w = \sum_i \alpha_i y_i x_i$

• That is, we just showed that $w^T x = \sum_i \alpha_i y_i x_i^T x$

– That is, we only need to compute dot products between training examples and the new example $x$

• This is true even if we map examples to a high dimensional space

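Dot products in high dimensional spaces

Let us define a dot product in the high dimensional space: $K(x, z) = \phi(x)^T \phi(z)$

So prediction with this high dimensional lifting map is $\mathrm{sgn}(w^T \phi(x)) = \mathrm{sgn}\left(\sum_i \alpha_i y_i K(x_i, x)\right)$

because $w = \sum_i \alpha_i y_i \phi(x_i)$, and therefore $w^T \phi(x) = \sum_i \alpha_i y_i \phi(x_i)^T \phi(x)$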

Kernel based methods

Predict using $\mathrm{sgn}\left(\sum_i \alpha_i y_i K(x_i, x)\right)$

What does this new formulation give us? If we have to compute $\phi$ every time anyway, we gain nothing.

If we can compute the value of $K$ without explicitly writing the blown up representation, then we will have a computational advantage.
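The prediction rule can be sketched as a function that only ever touches the data through a kernel. The names `predict`, `linear_kernel`, and `quadratic_kernel`, the tie-breaking at zero, and the numeric values are assumptions for illustration.

```python
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def quadratic_kernel(x, z):
    return (1.0 + linear_kernel(x, z)) ** 2


def predict(alphas, ys, xs, kernel, x_new):
    """sgn(sum_i alpha_i * y_i * K(x_i, x_new)); a score of exactly 0 maps to +1 here."""
    score = sum(a * y * kernel(x, x_new) for a, y, x in zip(alphas, ys, xs))
    return 1 if score >= 0 else -1


# Hypothetical support vectors and coefficients (not from a real solver).
xs = [(1.0, 2.0), (-1.0, -1.5)]
ys = [1, -1]
alphas = [0.7, 0.7]
x_new = (0.5, -1.0)

# With the linear kernel this matches sgn(w . x) for w = sum_i alpha_i y_i x_i.
w = [sum(a * y * x[j] for a, y, x in zip(alphas, ys, xs)) for j in range(2)]
primal = 1 if sum(wj * xj for wj, xj in zip(w, x_new)) >= 0 else -1
dual = predict(alphas, ys, xs, linear_kernel, x_new)

# Swapping the kernel changes the feature space without changing the code.
dual_quadratic = predict(alphas, ys, xs, quadratic_kernel, x_new)
print(primal, dual, dual_quadratic)
```

Passing the kernel as an argument keeps the predictor independent of the feature space, which is what makes the substitution of $K$ for the explicit dot product possible.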

This lecture

✓ Support vectors

✓ Kernels

3. The kernel trick

Example: Polynomial Kernel

• Given two examples $x$ and $z$, we want to map them to a high dimensional space [for example, quadratic]: $\phi(x)$ collects all degree zero terms (the constant 1), all degree one terms ($x_1, \dots, x_n$), and all degree two terms (the products $x_i x_j$), and then compute the dot product $A = \phi(x)^T \phi(z)$ [takes time]

• Instead, in the original space, compute $B = k(x, z) = (1 + x^T z)^2$

Theorem: $A = B$ (coefficients do not really matter)

Example: Two dimensions, quadratic kernel

$A = \phi(x)^T \phi(z)$
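The two-dimensional case can be checked numerically. The sketch below assumes the feature map $\phi(x) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1x_2)$; the $\sqrt{2}$ factors are exactly the coefficients that "do not really matter", chosen so that the explicit dot product matches $(1 + x^T z)^2$ term by term. The input vectors are arbitrary.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit quadratic feature map for 2-D inputs (6 features)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)


def k(x, z):
    """Quadratic kernel computed in the original 2-D space."""
    return (1.0 + dot(x, z)) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
A = dot(phi(x), phi(z))   # six multiplications in the lifted space
B = k(x, z)               # two multiplications, one addition, one square
print(A, B)
```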

The Kernel Trick

Suppose we wish to compute $K(x, z) = \phi(x)^T \phi(z)$

Here $\phi$ maps $x$ and $z$ to a high dimensional space

The Kernel Trick: Save time/space by computing the value of $K(x, z)$ by performing operations in the original space (without a feature transformation!)

Computing dot products efficiently

Kernel trick (not just for degree 2 polynomials): You want to work with degree 2 polynomial features, $\phi(x)$. Then, your dot product will operate on vectors in a space of dimensionality on the order of $n^2$ (there are $n(n+1)/2$ degree two terms alone).

The kernel trick allows you to save time/space and compute dot products in an $n$ dimensional space.

• Can we use any function $K(\cdot, \cdot)$?
– No! A function $K(x, z)$ is a valid kernel if it corresponds to an inner product in some (perhaps infinite dimensional) feature space.

• General condition: construct the Gram matrix $\{K(x_i, x_j)\}$; check that it’s positive semi-definite

This lecture

✓ Support vectors

✓ Kernels

✓ The kernel trick

Which functions are kernels?


Reminder: Positive semi-definite matrices

A symmetric matrix $M$ is positive semi-definite if, for any non-zero vector $z$, we have $z^T M z \ge 0$

(A useful property characterizing many interesting mathematical objects)

The Kernel Matrix

• The Gram matrix of a set of $n$ vectors $S = \{x_1 \dots x_n\}$ is the $n \times n$ matrix $G$ with $G_{ij} = x_i^T x_j$
– The kernel matrix is the Gram matrix of $\{\phi(x_1), \dots, \phi(x_n)\}$
– (size depends on the # of examples, not dimensionality)

• Showing that a function $K$ is a valid kernel
– Direct approach: If you have the $\phi(x_i)$, you have the Gram matrix (and it’s easy to see that it will be positive semi-definite). Why?
– Indirect: If you have the kernel, write down the kernel matrix $K_{ij}$, and show that it is a legitimate kernel, without an explicit construction of $\phi(x_i)$
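One way to answer the "Why?" in the direct approach is the following short derivation, where $c$ is an arbitrary real vector of length $n$ (a symbol introduced here for the argument). The quadratic form of a Gram matrix is a squared norm, so it cannot be negative.

```latex
c^T G c
  = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j \, \phi(x_i)^T \phi(x_j)
  = \Big( \sum_{i=1}^{n} c_i \phi(x_i) \Big)^T \Big( \sum_{j=1}^{n} c_j \phi(x_j) \Big)
  = \Big\| \sum_{i=1}^{n} c_i \phi(x_i) \Big\|^2 \;\ge\; 0
```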

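Mercer’s condition

Let $K(x, z)$ be a function that maps two $n$ dimensional vectors to a real number

$K$ is a valid kernel if for every finite set $\{x_1, x_2, \dots\}$, for any choice of real valued $c_1, c_2, \dots$, we have

$$\sum_i \sum_j c_i c_j K(x_i, x_j) \ge 0$$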
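The condition can be probed numerically, with an important caveat: random trials can only reject a candidate function, never prove that it is a kernel. The sketch below evaluates the quadratic form for a quadratic polynomial kernel and for a function that is not a kernel (negative squared distance). The function names, the random points, and the tolerance are assumptions for illustration.

```python
import random


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def quadratic_form(kernel, xs, cs):
    """sum_i sum_j c_i c_j K(x_i, x_j)."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(cs, xs)
               for cj, xj in zip(cs, xs))


def poly_kernel(x, z):
    return (1.0 + dot(x, z)) ** 2


def neg_sq_dist(x, z):
    """Not a kernel: the diagonal is 0 while off-diagonal entries are negative."""
    return -sum((a - b) ** 2 for a, b in zip(x, z))


rng = random.Random(0)
xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]

# Random coefficient vectors never expose a violation for the polynomial kernel
# (a small negative tolerance absorbs floating-point rounding).
poly_ok = all(
    quadratic_form(poly_kernel, xs, [rng.uniform(-1, 1) for _ in xs]) >= -1e-9
    for _ in range(200)
)

# A single hand-picked c = (1, 1) on two distinct points rejects the second function.
violation = quadratic_form(neg_sq_dist, xs[:2], [1.0, 1.0])
print(poly_ok, violation)
```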

Polynomial kernels

• Linear kernel: $k(x, z) = x^T z$

• Polynomial kernel of degree $d$: $k(x, z) = (x^T z)^d$
– only $d$th-order interactions

• Polynomial kernel up to degree $d$: $k(x, z) = (x^T z + c)^d$ ($c > 0$)
– all interactions of order $d$ or lower
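To make "only $d$th-order interactions" concrete, the sketch below compares $(x^T z)^d$ with an explicit dot product over all ordered products of $d$ coordinates, which gives $n^d$ features. The helper `monomials` and the sample vectors are assumptions for illustration; `math.prod` requires Python 3.8 or later.

```python
from itertools import product
from math import prod


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def monomials(x, d):
    """All ordered products x[i1] * ... * x[id], i.e. n**d features."""
    return [prod(x[i] for i in idx) for idx in product(range(len(x)), repeat=d)]


x, z, d = (1.0, 2.0, -1.0), (0.5, 1.0, 3.0), 3

explicit = dot(monomials(x, d), monomials(z, d))   # 27 features per vector
implicit = dot(x, z) ** d                          # 3 multiplications, then a power
print(explicit, implicit)
```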

Gaussian Kernel (or the radial basis function kernel)

$$k(x, z) = \exp\left(-\frac{(x - z)^2}{c}\right)$$

– $(x - z)^2$: squared Euclidean distance between $x$ and $z$
– $c = \sigma^2$: a free parameter
– very small $c$: $K \approx$ identity matrix (every item is different)
– very large $c$: $K \approx$ unit matrix, i.e. all entries 1 (all items are the same)
– $k(x, z) \approx 1$ when $x$, $z$ close
– $k(x, z) \approx 0$ when $x$, $z$ dissimilar

Exercises:
1. Prove that this is a kernel.
2. What is the “blown up” feature space for this kernel?
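The effect of the free parameter can be seen on a tiny kernel matrix. This sketch assumes the form $k(x, z) = \exp(-\lVert x - z \rVert^2 / c)$; other texts place a factor of 2 in the denominator, which only rescales $c$. The points and the two values of `c` are arbitrary.

```python
import math


def gaussian_kernel(x, z, c):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / c)


def kernel_matrix(xs, c):
    return [[gaussian_kernel(xi, xj, c) for xj in xs] for xi in xs]


xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

K_small = kernel_matrix(xs, c=0.01)   # close to the identity matrix
K_large = kernel_matrix(xs, c=1e6)    # close to the all-ones matrix

for row in K_small:
    print(row)
for row in K_large:
    print(row)
```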

Constructing New Kernels

You can construct new kernels $k'(x, x')$ from existing ones:

– Multiplying $k(x, x')$ by a constant $c > 0$

$c\,k(x, x')$

– Multiplying $k(x, x')$ by a function $f$ applied to $x$ and $x'$

$f(x)\,k(x, x')\,f(x')$

– Applying a polynomial (with non-negative coefficients) to $k(x, x')$

$P(k(x, x'))$ with $P(z) = \sum_i a_i z^i$ and $a_i \ge 0$

– Exponentiating $k(x, x')$

$\exp(k(x, x'))$

Constructing New Kernels (2)

• You can construct $k'(x, x')$ from $k_1(x, x')$, $k_2(x, x')$ by:
– Adding $k_1(x, x')$ and $k_2(x, x')$: $k_1(x, x') + k_2(x, x')$
– Multiplying $k_1(x, x')$ and $k_2(x, x')$: $k_1(x, x')\,k_2(x, x')$

• Also:
– If $\phi(x) \in \mathbb{R}^m$ and $k_m(z, z')$ is a valid kernel in $\mathbb{R}^m$, then $k(x, x') = k_m(\phi(x), \phi(x'))$ is also a valid kernel
– If $A$ is a symmetric positive semi-definite matrix, $k(x, x') = x^T A x'$ is also a valid kernel
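The sum and product rules have a simple reading in feature space, sketched below for two specific kernels chosen for illustration: $k_1(x, z) = x^T z$ and $k_2(x, z) = (x^T z)^2$. Adding kernels corresponds to concatenating feature vectors, and multiplying kernels corresponds to taking all pairwise products of features. The function names and the test vectors are assumptions.

```python
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi1(x):
    """Feature map of the linear kernel k1(x, z) = x . z."""
    return list(x)


def phi2(x):
    """Feature map of k2(x, z) = (x . z)**2: all ordered pairs x_i * x_j."""
    return [x[i] * x[j] for i, j in product(range(len(x)), repeat=2)]


def k1(x, z):
    return dot(x, z)


def k2(x, z):
    return dot(x, z) ** 2


def phi_sum(x):
    """k1 + k2 corresponds to concatenated features."""
    return phi1(x) + phi2(x)


def phi_prod(x):
    """k1 * k2 corresponds to all pairwise products of the two feature sets."""
    return [a * b for a in phi1(x) for b in phi2(x)]


x, z = (1.0, -2.0), (0.5, 3.0)
sum_gap = abs(dot(phi_sum(x), phi_sum(z)) - (k1(x, z) + k2(x, z)))
prod_gap = abs(dot(phi_prod(x), phi_prod(z)) - k1(x, z) * k2(x, z))
print(sum_gap, prod_gap)
```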

Kernel Trick: An example

Let the blown up feature space represent the space of all $3^n$ conjunctions. Then,

$$K(x, z) = \phi(x)^T \phi(z) = 2^{\mathrm{same}(x, z)}$$

where same(x, z) is the number of features that have the same value for both $x$ and $z$

Example: Take $n = 3$; $x = (001)$, $z = (011)$; we have conjunctions of size 0, 1, 2, 3

Proof: let $m = \mathrm{same}(x, z)$; construct “surviving” conjunctions by
1. choosing to include one of these $m$ literals with the right polarity in the conjunction, or
2. choosing to not include it at all.
Conjunctions with literals outside this set disappear.
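The counting argument can be checked by brute force on the slide's $n = 3$ example. The sketch below enumerates all $3^n$ conjunctions, where each feature is absent, appears as a positive literal, or appears as a negated literal, and sets a feature to 1 when the input satisfies the conjunction. The encoding with `None`, `0`, and `1` is an assumption made for this example.

```python
from itertools import product


def satisfies(x, conj):
    """conj[i] is None (feature absent), 1 (literal x_i) or 0 (literal NOT x_i)."""
    return all(lit is None or lit == xi for xi, lit in zip(x, conj))


def phi(x):
    """Indicator vector over all 3**n conjunctions (the empty one is always satisfied)."""
    return [1 if satisfies(x, conj) else 0
            for conj in product((None, 0, 1), repeat=len(x))]


def same(x, z):
    return sum(1 for a, b in zip(x, z) if a == b)


x, z = (0, 0, 1), (0, 1, 1)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))   # sums over all 27 conjunctions
implicit = 2 ** same(x, z)                              # one pass over n features
print(explicit, implicit)
```

The explicit route grows as $3^n$ while the kernel route is linear in $n$.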

This lecture

✓ Support vectors

✓ Kernels

✓ The kernel trick

✓ Properties of kernels

Exercises

1. Show that this argument works for a specific example
– Take $X = \{x_1, x_2, x_3, x_4\}$
– $\phi(x)$ = the space of all $3^n$ conjunctions; $|\phi(x)| = 81$
– Consider $x = (1100)$, $z = (1101)$
– Write $\phi(x)$, $\phi(z)$, the representation of $x$, $z$ in the $\phi$ space
– Compute $\phi(x)^T \phi(z)$
– Show that $K(x, z) = \phi(x)^T \phi(z) = \sum_i \phi_i(z) \phi_i(x) = 2^{\mathrm{same}(x, z)} = 8$

2. Try to develop another kernel, e.g., where the feature space is the space of all conjunctions of size 3 (exactly)

Summary: Kernel trick

• To make the final prediction, we are computing dot products

• The kernel trick is a computational trick to compute dot products in higher dimensional spaces

• This is applicable not just to SVMs. The same idea can be extended to the Perceptron too: the Kernel Perceptron

• Important: All the bounds we have seen (e.g., the Perceptron bound) depend on the underlying dimensionality
– By moving to a higher dimensional space, we are incurring a penalty on sample complexity
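A minimal sketch of the Kernel Perceptron mentioned in the summary, assuming the common formulation in which each $\alpha_i$ counts the mistakes made on example $i$ and the implicit weight vector is $\sum_i \alpha_i y_i \phi(x_i)$. The function names, the epoch limit, and the XOR-like toy data are assumptions for illustration.

```python
def quadratic_kernel(x, z):
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


def score(alphas, xs, ys, kernel, x):
    """sum_i alpha_i * y_i * K(x_i, x), the kernelized version of w . phi(x)."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alphas, ys, xs))


def kernel_perceptron(xs, ys, kernel, epochs=10):
    """Return mistake counts alpha_i; w is never formed explicitly."""
    alphas = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * score(alphas, xs, ys, kernel, x) <= 0:
                alphas[i] += 1      # same effect as w += y * phi(x)
                mistakes += 1
        if mistakes == 0:           # a full pass with no updates: stop early
            break
    return alphas


def predict(alphas, xs, ys, kernel, x):
    return 1 if score(alphas, xs, ys, kernel, x) >= 0 else -1


# XOR-like data: not linearly separable in the original 2-D space.
xs = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
ys = [1, 1, -1, -1]

alphas = kernel_perceptron(xs, ys, quadratic_kernel)
predictions = [predict(alphas, xs, ys, quadratic_kernel, x) for x in xs]
print(alphas, predictions)
```

With the quadratic kernel the product feature $x_1 x_2$ is available implicitly, which is what lets a linear rule in the lifted space fit this data.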

Date post:	12-Jul-2020