Linearized Kernel Dictionary Learning
Alona Golts, Prof. Miki Elad
236862 – Introduction to Sparse and Redundant Representations
4.1.18
What We Shall See Today
§ Sparse representations as a model for signal processing.
§ This model is successful in machine learning tasks as well.
§ Kernels are also extremely popular in machine learning.
§ Sparse representations and kernels "give birth" to an interesting combination.
§ This new model has its share of growing pains, in both space and runtime.
§ Our pre-processing, called LKDL, preserves the "good" while dealing with the "bad".
Wright et al. ('09); http://cs.stanford.edu/people/karpathy; http://alex.smola.org/books.html
Outline
1) Intro to sparse representations
2) Intro to kernels
3) Kernel dictionary learning
4) Linearized kernel dictionary learning (LKDL)
5) Results and summary
(1–3: background; 4–5: our work)
Intro to Sparse Representations
Why Use Sparse Representations?
Denoising [1], Inpainting [2], Super-Resolution [3], Compression [4]
[1] Dabov, Foi, Katkovnik and Egiazarian ('07)
[2] Mairal, Elad and Sapiro ('08)
[3] Yang, Wright, Huang and Ma ('10)
[4] Bryt and Elad ('08)
Sparse Coding
§ "Sparse coding" – representing a signal with a sparse combination of "dictionary atoms", where x ∈ ℝ^d is the signal, D ∈ ℝ^{d×m} the dictionary and γ ∈ ℝ^m the sparse vector:

    (∗)  argmin_γ ‖x − Dγ‖₂²  s.t.  ‖γ‖₀ ≤ q      (q – the "cardinality")

§ Naïve solution of (∗): scanning all (combinatorially many) candidate supports of γ, solving least squares for each, and choosing the best reconstruction – NOT a good idea!
Greedy Approach – OMP
§ Step 1: choose the atom that best matches x.
§ Next steps: given the previously found atoms, choose the next one that best fits the residual r_{t−1}:

    j₀ = argmax_j |⟨r_{t−1}, d_j⟩|

§ Update the coefficients of the sparse vector and the residual:

    γ_t = argmin_γ ‖x − D_t γ‖₂²,   r_t = x − D_t γ_t

§ Repeat q times or until a target threshold is reached.
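The OMP steps above can be sketched in a few lines of NumPy; this is a minimal illustrative implementation (the function name and interface are mine, not from the slides), assuming a dictionary with unit-norm columns:

```python
import numpy as np

def omp(D, x, q):
    """Orthogonal Matching Pursuit: greedily build a q-sparse code of x
    over a dictionary D with (approximately) unit-norm columns."""
    m = D.shape[1]
    support, gamma, residual = [], np.zeros(m), x.copy()
    for _ in range(q):
        # atom selection: atom best correlated with the current residual
        j0 = int(np.argmax(np.abs(D.T @ residual)))
        if j0 in support:          # no new atom helps; stop early
            break
        support.append(j0)
        Dt = D[:, support]
        # least-squares update of the coefficients on the current support
        coeffs, *_ = np.linalg.lstsq(Dt, x, rcond=None)
        gamma[:] = 0.0
        gamma[support] = coeffs
        residual = x - Dt @ coeffs
    return gamma
```

When x is exactly one atom, a single greedy step recovers it with coefficient 1.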
Dictionary Learning
§ "Dictionary learning" – finding a set of atoms and representations that "best sparsify" a collection of inputs X, where X ∈ ℝ^{d×N} is the input signal matrix, D ∈ ℝ^{d×m} the dictionary and Γ ∈ ℝ^{m×N} the sparse representation matrix:

    argmin_{D,Γ} ‖X − DΓ‖_F²  s.t.  ‖γ_i‖₀ ≤ q,  ∀i = 1…N
Dictionary Learning
    argmin_{D,Γ} ‖X − DΓ‖_F²  s.t.  ‖γ_i‖₀ ≤ q,  ∀i = 1…N

§ Basic strategy: block coordinate descent.
§ Iterate over the following for T iterations:
  Ø Given D, find the sparse representations Γ.
  Ø Given Γ, update the dictionary D:
    o MOD [1] – update the entire dictionary at once.
    o KSVD [2] – update one atom at a time, along with its coefficients, solving a rank-1 SVD problem.

[1] Engan, Aase and Husoy ('99)
[2] Elad and Aharon ('06)
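The MOD dictionary update is a closed-form least-squares solve. A minimal NumPy sketch (the function name, the trailing atom normalization, and the test data are illustrative additions, not from the slides):

```python
import numpy as np

def mod_update(X, Gamma):
    """MOD dictionary update: given signals X and fixed sparse codes Gamma,
    solve argmin_D ||X - D Gamma||_F^2 in closed form,
    D = X Gamma^T (Gamma Gamma^T)^{-1}, then re-normalize the atoms."""
    D = X @ Gamma.T @ np.linalg.inv(Gamma @ Gamma.T)
    return D / np.linalg.norm(D, axis=0, keepdims=True)
```

If X was generated exactly as D₀Γ with unit-norm atoms and full-row-rank Γ, the update recovers D₀.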
Intro to Kernels
Classification Problem
[Figure: a nonlinearly separable classification problem in the "input space" 𝒳, with coordinates (x₁, x₂), becomes linearly separable after the mapping Φ into the "feature space" ℱ with coordinates (z₁, z₂, z₃) = (x₁², √2·x₁x₂, x₂²).]
Kernel Trick
§ For the previous mapping, let us calculate the inner product between two signals in the feature space:

    ⟨Φ(x), Φ(y)⟩ = ⟨(x₁², √2·x₁x₂, x₂²), (y₁², √2·y₁y₂, y₂²)⟩
                 = x₁²y₁² + 2x₁x₂y₁y₂ + x₂²y₂²
                 = (x₁y₁ + x₂y₂)² = ⟨x, y⟩² = κ(x, y)      (the "kernel")
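This identity is easy to check numerically. A small sketch using the explicit degree-2 feature map from the slide (the helper name `phi` and the test vectors are illustrative):

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([v[0] ** 2, np.sqrt(2.0) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
lhs = phi(x) @ phi(y)   # inner product computed in the feature space
rhs = (x @ y) ** 2      # same quantity via the kernel, in the input space
```

Both sides agree, so the feature-space inner product never has to be formed explicitly.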
Positive Definite Kernels
The following two are equivalent:
§ κ is positive definite (PD), i.e., for any training points x₁, …, x_N ∈ 𝒳 and for arbitrary scalars a₁, …, a_N ∈ ℝ, the following holds:

    Σ_{i,j} a_i a_j K_{i,j} ≥ 0,   K_{i,j} = κ(x_i, x_j)

§ There exists a map Φ into a dot-product space ℋ s.t.:

    κ(x, x′) = ⟨Φ(x), Φ(x′)⟩
Types of Kernels
Commonly used kernels:
§ Linear: κ(x, x′) = ⟨x, x′⟩ + c
§ Polynomial: κ(x, x′) = (⟨x, x′⟩ + c)^d
§ Gaussian/RBF: κ(x, x′) = exp(−‖x − x′‖²/2σ²)
The kernel matrix consists of the inner products of the feature vectors in the high-dimensional space:

    X = [x₁, x₂, …, x_N],   K = Φ(X)ᵀΦ(X),   K_{i,j} = κ(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩
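As an illustration, the Gaussian/RBF kernel matrix can be computed entirely in the input space, without ever forming the feature vectors; a minimal NumPy sketch (function name and test data are illustrative):

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for the columns x_i of X."""
    sq = np.sum(X ** 2, axis=0)
    # pairwise squared distances: ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    return np.exp(-dist2 / (2.0 * sigma ** 2))
```

The result is symmetric, has unit diagonal, and (as a PD kernel matrix) no meaningfully negative eigenvalues.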
Kernel Matrix
[Figure: the data matrix X ∈ ℝ^{d×N} and the resulting kernel matrix K ∈ ℝ^{N×N}.]
Kernels in Machine Learning
§ Kernels lend powerful representational power to linear machine learning algorithms, and have thus been used extensively over the past 20 years:
  § SVM
  § Kernel PCA
  § Kernel Regression
  § Kernel K-means
  § Kernel NN
  § ...
Classification using Sparsity
§ The sparsity model is also effective in discriminative tasks, as well as generative ones:
  § "Sparse Representation for Signal Classification", Huang et al. ('06)
  § "Robust Face Recognition using Sparse Representations", Wright et al. ('09)
  § "Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification", Yang et al. ('09)
  § "Sparse Representation for Computer Vision and Pattern Recognition", Wright et al. ('10)
  § "Robust Visual Tracking and Vehicle Classification via Sparse Representation", Mei et al. ('11)
  § "Learning Sparse Representations for Human Action Recognition", Guha et al. ('12)
  § "Learning Structured Low-rank Representations for Image Classification", Zhang et al. ('13)
  § "Multiview Hessian Discriminative Sparse Coding for Image Annotation", Liu et al. ('14)
  § "Learning Discriminative Sparse Representations for Hyperspectral Image Classification", Du et al. ('15)
§ Why not then "kernelize" classic sparse representation algorithms?
Kernel Sparse Representations
§ In the past 5 years there has been a multitude of work concentrated on kernel sparse representations.
§ Some examples:
  § Vincent & Bengio ('02)
  § Gao, Tsang & Chia ('10)
  § Zhang, Zhou, Chang, Liu, Wang & Li ('12)
  § Nguyen, Patel, Nasrabadi & Chellappa ('12)
§ We choose to concentrate on kernel dictionary learning to highlight the benefit of our approach.
Kernel Dictionary Learning
Nguyen, Patel, Nasrabadi and Chellappa ('12)
Kernel Dictionary Learning
§ Perform linear dictionary learning in the feature space, X → Φ(X), D → Φ(D):

    argmin_{Φ(D),Γ} ‖Φ(X) − Φ(D)Γ‖_F²  s.t.  ‖γ_i‖₀ ≤ q,  ∀i = 1…N

§ Writing the dictionary as (∗) Φ(D) = Φ(X)A, A ∈ ℝ^{N×m}, this becomes:

    argmin_{A,Γ} ‖Φ(X) − Φ(X)AΓ‖_F²  s.t.  ‖γ_i‖₀ ≤ q,  ∀i = 1…N

(∗) "Representer theorem" – Kimeldorf and Wahba ('71); "Double Sparsity" – Rubinstein, Zibulevsky and Elad ('10)
Kernel Dictionary Learning
    argmin_{A,Γ} ‖Φ(X) − Φ(X)AΓ‖_F²  s.t.  ‖γ_i‖₀ ≤ q,  ∀i = 1…N
[Figure: Φ(X) is approximated by the product of Φ(X), A ∈ ℝ^{N×m} and Γ ∈ ℝ^{m×N}.]
"Kernelization" of OMP
AS: Choose the atom that best matches the residual:
§ Classic:

    j₀ = argmax_j |⟨x − D_{t−1}γ_{t−1}, d_j⟩| = argmax_j |⟨r_{t−1}, d_j⟩|

§ Kernel:

    j₀ = argmax_j |⟨Φ(x) − Φ(X)A_{t−1}γ_{t−1}, Φ(X)a_j⟩|
       = argmax_j |K(x, X)a_j − γ_{t−1}ᵀ A_{t−1}ᵀ K(X, X)a_j|

where K(x, X) ∈ ℝ^{1×N} involves the input signal and the train set, and K(X, X) ∈ ℝ^{N×N}.
"Kernelization" of OMP
LS: Update the sparse vector using least squares:
§ Classic:

    γ_t = argmin_γ ‖x − D_t γ‖₂² = (D_tᵀD_t)^{−1} D_tᵀ x

§ Kernel:

    γ_t = argmin_γ ‖Φ(x) − Φ(X)A_t γ‖₂² = (Φ(X)A_t)† Φ(x)
        = (A_tᵀ K(X, X) A_t)^{−1} A_tᵀ K(X, x)
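Both KOMP steps touch the data only through kernel values. A minimal NumPy sketch combining the two (the function name, interface, and the linear-kernel test below are illustrative assumptions, not code from the slides):

```python
import numpy as np

def kernel_omp(K_XX, k_xX, A, q):
    """Kernel OMP: sparse-code Phi(x) over the dictionary Phi(X)A using
    kernel values only. K_XX = K(X, X); k_xX = K(x, X) as a length-N
    vector; A holds one coefficient column per atom."""
    m = A.shape[1]
    support, gamma = [], np.zeros(m)
    for _ in range(q):
        # atom selection: <Phi(x) - Phi(X) A gamma, Phi(X) a_j> for every j
        corr = k_xX @ A - (gamma @ A.T) @ K_XX @ A
        corr[support] = 0.0                      # never re-select an atom
        support.append(int(np.argmax(np.abs(corr))))
        # least squares on the support: (A_t^T K A_t)^{-1} A_t^T K(X, x)
        At = A[:, support]
        coeffs = np.linalg.solve(At.T @ K_XX @ At, At.T @ k_xX)
        gamma[:] = 0.0
        gamma[support] = coeffs
    return gamma
```

With a linear kernel and A = I (atoms = training signals), this reduces to ordinary OMP over the columns of X.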
"Kernelization" of MOD
§ Once the sparse representation Γ is known, update A:

    argmin_A ‖Φ(X) − Φ(X)AΓ‖_F²

§ Update for Kernel MOD:

    A = Γ† = Γᵀ(ΓΓᵀ)^{−1}

§ KSVD can likewise be updated using kernels only (see the Kernel KSVD appendix).
Problems with KDL
(N − number of signals, q − target cardinality, d − signal dimension; typically N ≫ d ≫ q)

Memory: X ∈ ℝ^{d×N} vs. the kernel matrix K ∈ ℝ^{N×N}.

Runtime per step:
§ OMP – Atom Selection: O(dq + d)
§ KOMP – Atom Selection: O(N² + Nq + N)
§ OMP – Least Squares: O(dq² + dq + q³)
§ KOMP – Least Squares: O(N²q + Nq + q³)
KDL: Pros and Cons
The Good:
§ Introduces nonlinearity to sparse representation algorithms.
§ Fairly easy to substitute dot products with kernels.
§ Flexibility in the choice of kernel.
The Bad:
§ High dependence on a possibly huge kernel matrix.
§ Complexity of the algorithms depends on the number of signals instead of their dimension.
§ A specific "tailoring" of the kernel is needed in each individual algorithm.
§ The algorithm cannot always be written using dot products.
Our Work: Linearized Kernel Dictionary Learning
Our Objective
§ Incorporate nonlinearity into dictionary learning by kernelizing.
§ Faster runtime, less memory.
§ Turn any DL algorithm into a kernel DL algorithm in an easy way.
Kernel Matrix Decomposition
Any PD kernel matrix can be decomposed into:

    K = Φ(X)ᵀΦ(X) = FᵀF

where X ∈ ℝ^{d×N} holds the original samples and F ∈ ℝ^{N×N} the "virtual samples".
Zhang, Lan, Wang and Moerchen ('12)
Linearized Kernel DL (LKDL)
§ Decompose the kernel matrix into an inner product of "virtual samples": K = FᵀF.
§ Perform classical (linear) DL on the virtual samples:

    argmin_{D,Γ} ‖F − DΓ‖_F²

§ Produce the classification result.
How to Decompose K?
§ Eigendecomposition:

    UΣUᵀ = K = FᵀF  →  F = Σ^{1/2} Uᵀ

§ Not practical for large kernel matrices K ∈ ℝ^{N×N}: computational cost O(N³), or O(N²k) for a rank-k decomposition.
Nyström Method
§ Find an approximation of the PD matrix: K̃ ≈ K.
§ Sampling: choose c columns from K ∈ ℝ^{N×N} to form C ∈ ℝ^{N×c}, c ≪ N.
§ In block form, K = [W Sᵀ; S B], and the sampled columns are C = [W; S], with W ∈ ℝ^{c×c} the sampled block.
Nyström Method
§ Approximation, for k ≤ c:

    K̃ = C W† Cᵀ,   C ∈ ℝ^{N×c},  W ∈ ℝ^{c×c}
Virtual Sample Computation
§ Nyström: K̃ = C W† Cᵀ
§ Eigendecomposition of the small matrix: W = VΛVᵀ → W† = VΛ†Vᵀ
  with W† ∈ ℝ^{c×c}, V ∈ ℝ^{c×k}, Λ† ∈ ℝ^{k×k}, Vᵀ ∈ ℝ^{k×c}
  (c − number of sampled columns in Nyström, k − degree of the eigendecomposition)
Virtual Sample Computation
§ "Virtual sample" computation:

    K̃ = FᵀF  →  F = (Λ†)^{1/2} Vᵀ Cᵀ

  with (Λ†)^{1/2} ∈ ℝ^{k×k}, Vᵀ ∈ ℝ^{k×c}, Cᵀ ∈ ℝ^{c×N}, so F ∈ ℝ^{k×N} approximates the full K ∈ ℝ^{N×N}.
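The virtual-sample formula above is a few lines of linear algebra; a minimal NumPy sketch (the function name, the `eps` clamp for near-zero eigenvalues, and the test are my illustrative choices):

```python
import numpy as np

def nystrom_virtual_samples(C, W, k=None, eps=1e-10):
    """Nystrom virtual samples: given C = K(X, X_s) (N x c) and
    W = K(X_s, X_s) (c x c), return F of shape (k x N) with F^T F ~= K."""
    eigvals, V = np.linalg.eigh(W)           # W = V Lambda V^T
    eigvals, V = eigvals[::-1], V[:, ::-1]   # sort eigenvalues descending
    if k is not None:                        # keep a rank-k truncation
        eigvals, V = eigvals[:k], V[:, :k]
    # (Lambda^+)^{1/2}: invert square roots of the (numerically) nonzero eigenvalues
    inv_sqrt = np.where(eigvals > eps, 1.0 / np.sqrt(np.maximum(eigvals, eps)), 0.0)
    return (inv_sqrt[:, None] * V.T) @ C.T   # F = (Lambda^+)^{1/2} V^T C^T
```

When all N columns are sampled (c = N, so C = W = K), the approximation is exact: FᵀF = K K† K = K.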
Classification using DL
§ Train L dictionaries with KSVD, one for each class, where X = [X₁, X₂, …, X_L]:

    argmin_{D_i,Γ_i} ‖X_i − D_i Γ_i‖_F²

yielding the dictionaries D₁, D₂, …, D_L.
Classification using DL
§ Sparse code each test sample over the L dictionaries with OMP:

    argmin_{γ_i} ‖x_test − D_i γ_i‖₂²  s.t.  ‖γ_i‖₀ ≤ q,  ∀i = 1…L

yielding the sparse codes γ₁, γ₂, …, γ_L.
Classification using DL
§ The chosen class is the one with the minimal representation error:

    r_i = ‖x_test − D_i γ_i‖₂²,  ∀i = 1…L
    class = argmin_i r_i
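The decision rule above is a one-liner once the per-class codes are available; a minimal sketch (function name and test data are illustrative, and the per-class codes are assumed to be precomputed, e.g. by OMP):

```python
import numpy as np

def classify_by_residual(x_test, dictionaries, codes):
    """Pick the class whose dictionary/code pair reconstructs x_test best:
    class = argmin_i ||x_test - D_i gamma_i||_2."""
    errs = [np.linalg.norm(x_test - D @ g) for D, g in zip(dictionaries, codes)]
    return int(np.argmin(errs))
```

A sample generated exactly by class 1's dictionary is assigned to class 1 (zero residual there, nonzero elsewhere).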
Classification using KDL
§ Train L dictionaries with Kernel KSVD, one per class, using the per-class kernel matrices K₁, …, K_L of X₁, …, X_L:

    argmin_{A_i,Γ_i} ‖Φ(X_i) − Φ(X_i)A_i Γ_i‖_F²

yielding the coefficient matrices A₁, …, A_L ∈ ℝ^{N_i×m}.
Classification using KDL
§ Sparse code each test sample over the L dictionaries with KOMP, using the kernel vectors k_test^i = κ(x_test, X_i); solve:

    argmin_{γ_i} ‖Φ(x_test) − Φ(X_i)A_i γ_i‖₂²  s.t.  ‖γ_i‖₀ ≤ q,  ∀i = 1…L

yielding the sparse codes γ₁, …, γ_L.
Classification using KDL
§ The chosen class is the one with the minimal representation error:

    r_i = ‖Φ(x_test) − Φ(X_i)A_i γ_i‖₂²,  ∀i = 1…L
    class = argmin_i r_i
Classification using LKDL
1. Sample signals from the training set: X → X_s
2. Compute C = K(X, X_s)
3. Compute W = K(X_s, X_s)
4. Approximate W = VΛVᵀ
5. Compute the virtual train set: F = (Λ†)^{1/2} Vᵀ Cᵀ
6. Compute c_test = K(x_test, X_s)
7. Compute the virtual test sample: f_test = (Λ†)^{1/2} Vᵀ c_testᵀ
8. Classification using DL on the virtual samples.
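The steps above can be sketched end-to-end in NumPy. This is an illustrative implementation under my own naming and interface assumptions (`kernel(A, B)` returns the matrix of kernel values between the columns of A and B; the eigenvalue clamp is mine):

```python
import numpy as np

def lkdl_features(X, X_test, kernel, c, k):
    """LKDL pre-processing (steps 1-7): sample c training signals, build the
    Nystrom factors, and return virtual train/test samples for linear DL."""
    N = X.shape[1]
    idx = np.random.choice(N, size=c, replace=False)  # 1. sample X -> X_s
    Xs = X[:, idx]
    C = kernel(X, Xs)                                 # 2. C = K(X, X_s)
    W = kernel(Xs, Xs)                                # 3. W = K(X_s, X_s)
    eigvals, V = np.linalg.eigh(W)                    # 4. W = V Lambda V^T
    eigvals, V = eigvals[::-1][:k], V[:, ::-1][:, :k]
    inv_sqrt = 1.0 / np.sqrt(np.maximum(eigvals, 1e-10))
    B = inv_sqrt[:, None] * V.T                       # (Lambda^+)^{1/2} V^T
    F_train = B @ C.T                                 # 5. virtual train set
    C_test = kernel(X_test, Xs)                       # 6. K(x_test, X_s)
    F_test = B @ C_test.T                             # 7. virtual test samples
    return F_train, F_test                            # 8. feed these to any DL
```

With a linear kernel, full sampling (c = N) and full rank (k = N), the virtual samples reproduce the kernel values exactly: F_trainᵀF_train = K(X, X) and F_trainᵀF_test = K(X, X_test).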
LKDL Results

Results – Objective
1. LKDL improves discriminability over linear DL.
2. LKDL works as well as or better than KDL.
3. LKDL is more efficient than KDL.
4. LKDL can be incorporated seamlessly in virtually any DL algorithm.
USPS Dataset
§ signal dim.: 256
§ size of train set: 7291
§ size of test set: 2007
§ # classes: 10 (digits)
§ # atoms per class: 300
§ cardinality: 5
§ # iterations: 5
§ kernel: polynomial, parameter 2
§ c − number of samples in Nyström: 20% of train samples
§ k − approx. dim.: 256
Approximation Quality

    err = ‖K − K̃‖_F / ‖K‖_F
Dependence on c/N
Robustness to Corruptions
[Figures: effect of noise; effect of missing pixels.]
MNIST Dataset
LeCun et al. ('98)
§ signal dim.: 784
§ size of train set: 60,000
§ size of test set: 10,000
§ # classes: 10 (digits)
§ # atoms per class: 700
§ cardinality: 11
§ # iterations: 2
§ kernel: polynomial, parameter 2
§ c − number of samples in Nyström: 15% of train samples
§ k − approx. dim.: 784
Runtime Improvement
[Figures: accuracy, test time and train time comparisons.]
LKDL – Pros and Cons
The Good:
§ Introduces nonlinearity to sparse representation algorithms.
§ Can scale up and deal with a relatively high number of input samples.
§ Can easily be added to any dictionary learning algorithm.
§ Flexibility in the choice of kernel.
The Bad:
§ The Nyström method requires calculating and storing the matrix C, which is large in itself.
§ The eigendecomposition of W is computationally demanding for very large datasets.
§ The virtual samples usually bear no relation to the original data, so image processing tasks are off limits.
01/03/16, MSc Seminar, Alona Golts
Summary
§ There are benefits in using kernels in DL-based classification tasks.
§ Kernel DL improves accuracy over DL but suffers from dimensionality problems.
§ LKDL – a method that computes kernel-based features and runs linear DL on top of them – was presented.
§ LKDL provides comparable accuracy to KDL, with faster training and testing.
§ LKDL can be combined with any DL algorithm.

Thank You!
Kernel KSVD (Appendix)
Rank-1 update stage:

    ‖Φ(X) − Φ(X)AΓ‖_F² = ‖Φ(X) − Φ(X) Σ_{j=1}^{m} a_j γ^j‖_F²
    = ‖Φ(X)(I − Σ_{j≠k} a_j γ^j) − Φ(X) a_k γ^k‖_F²
    = ‖Φ(X)E_k − Φ(X)M_k‖_F²

Restrict to the support of γ^k, E_k^R = E_k Ω_k:

    ‖Φ(X)E_k^R − Φ(X) a_k γ_R^k‖_F²

Solve the rank-1 problem via the SVD, Φ(X)E_k^R = UΣVᵀ → Φ(X) a_k γ_R^k = σ₁u₁v₁ᵀ, giving:

    γ_R^k = σ₁v₁ᵀ,   Φ(X)a_k = u₁,   a_k = σ₁^{−1} E_k^R v₁