Deep Learning III: Unsupervised Learning
Russ Salakhutdinov
Machine Learning Department, Carnegie Mellon University
Canadian Institute for Advanced Research
Unsupervised Learning

Non-probabilistic Models
Ø Sparse Coding
Ø Autoencoders
Ø Others (e.g. k-means)

Probabilistic (Generative) Models

Explicit Density p(x)
• Tractable Models
Ø Fully observed Belief Nets
Ø NADE
Ø PixelRNN
• Non-Tractable Models
Ø Boltzmann Machines
Ø Variational Autoencoders
Ø Helmholtz Machines
Ø Many others…

Implicit Density
Ø Generative Adversarial Networks
Ø Moment Matching Networks
Talk Roadmap
• Basic Building Blocks:
Ø Sparse Coding
Ø Autoencoders
• Deep Generative Models
Ø Restricted Boltzmann Machines
Ø Deep Belief Networks and Deep Boltzmann Machines
Ø Helmholtz Machines / Variational Autoencoders
• Generative Adversarial Networks
• Model Evaluation
[Figure: Deep Belief Network and Deep Boltzmann Machine, each with visible layer v and hidden layers h1, h2, h3 connected by weights W1, W2, W3]
DBNs vs. DBMs

DBNs are hybrid models:
• Inference in DBNs is problematic due to explaining away.
• Only greedy pretraining, no joint optimization over all layers.
• Approximate inference is feed-forward only: no combined bottom-up and top-down pass.
Mathematical Formulation

Deep Boltzmann Machine

[Figure: DBM with input v and hidden layers h1, h2, h3 connected by weights W1, W2, W3]

$$P_\theta(v) = \frac{1}{Z(\theta)} \sum_{h^1, h^2, h^3} \exp\Big( v^\top W^1 h^1 + {h^1}^\top W^2 h^2 + {h^2}^\top W^3 h^3 \Big)$$

where $\theta = \{W^1, W^2, W^3\}$ are the model parameters.

• Bottom-up and Top-down:

$$p(h^1_j = 1 \mid v, h^2) = \sigma\Big( \sum_i W^1_{ij} v_i + \sum_k W^2_{jk} h^2_k \Big)$$

Unlike many existing feed-forward models (ConvNet (LeCun), HMAX (Poggio et al.), Deep Belief Nets (Hinton et al.)):
• Dependencies between hidden variables.
• All connections are undirected.
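As a concrete toy illustration of this formulation, the sketch below evaluates the DBM energy and the corresponding unnormalized log-probability for random binary states. All layer sizes and parameter values are made-up assumptions for the demo, not values from the talk, and bias terms are omitted.

```python
# Illustrative sketch: energy of a 3-hidden-layer Deep Boltzmann Machine,
# E(v,h1,h2,h3) = -v'W1h1 - h1'W2h2 - h2'W3h3 (bias terms omitted).
import numpy as np

def dbm_energy(v, h1, h2, h3, W1, W2, W3):
    """Energy of a DBM state; lower energy means higher probability."""
    return -(v @ W1 @ h1 + h1 @ W2 @ h2 + h2 @ W3 @ h3)

rng = np.random.default_rng(0)
sizes = [6, 5, 4, 3]                              # |v|, |h1|, |h2|, |h3|
W1, W2, W3 = (0.1 * rng.standard_normal((m, n))
              for m, n in zip(sizes, sizes[1:]))  # random toy parameters
v, h1, h2, h3 = (rng.integers(0, 2, s).astype(float) for s in sizes)

# Unnormalized log-probability: log P(v,h) = -E(v,h) - log Z(theta),
# where the partition function Z(theta) is the intractable part.
print(-dbm_energy(v, h1, h2, h3, W1, W2, W3))
```

The partition function $Z(\theta)$ is deliberately left out: evaluating the energy is cheap, while summing it over all joint states is what makes exact learning intractable.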
Mathematical Formulation

[Figure: Deep Boltzmann Machine vs. Deep Belief Network, each with input v, hidden layers h1, h2, h3, and weights W1, W2, W3; in the DBM all connections are undirected, while in the DBN the lower connections are directed]

Unlike many existing feed-forward models (ConvNet (LeCun), HMAX (Poggio), Deep Belief Nets (Hinton)), inference in the DBM incorporates both bottom-up and top-down signals.
Mathematical Formulation

Deep Boltzmann Machine

[Figure: DBM with layers v, h1, h2, h3 and weights W1, W2, W3]

Maximum likelihood learning:

$$\frac{\partial \log P_\theta(v)}{\partial W^1} = \mathbb{E}_{P_{\text{data}}}\big[ v\, {h^1}^\top \big] - \mathbb{E}_{P_\theta}\big[ v\, {h^1}^\top \big]$$

where $\theta = \{W^1, W^2, W^3\}$ are the model parameters. This is the standard learning rule for undirected graphical models: MRFs, CRFs, factor graphs.

Problem: Both expectations are intractable!
• Dependencies between hidden variables.
Approximate Learning

(Approximate) Maximum Likelihood:

$$\frac{\partial \log P_\theta(v)}{\partial W^1} = \mathbb{E}_{P_{\text{data}}}\big[ v\, {h^1}^\top \big] - \mathbb{E}_{P_\theta}\big[ v\, {h^1}^\top \big]$$

• Both expectations are intractable!
• Data-dependent expectation: the posterior over the hidden units is not factorial anymore! → Variational Inference
• Data-independent (model) expectation: → Stochastic Approximation (MCMC-based)
Previous Work

Many approaches for learning Boltzmann machines have been proposed over the last 20 years:
• Hinton and Sejnowski (1983)
• Peterson and Anderson (1987)
• Galland (1991)
• Kappen and Rodriguez (1998)
• Lawrence, Bishop, and Jordan (1998)
• Tanaka (1998)
• Welling and Hinton (2002)
• Zhu and Liu (2002)
• Welling and Teh (2003)
• Yasuda and Tanaka (2009)

Many of the previous approaches were not successful for learning general Boltzmann machines with hidden variables.

Real-world applications involve thousands of hidden and observed variables with millions of parameters.

Algorithms based on Contrastive Divergence, Score Matching, Pseudo-Likelihood, Composite Likelihood, MCMC-MLE, and Piecewise Learning cannot handle multiple layers of hidden variables.
New Learning Algorithm

Key Idea:
• Conditional (Posterior Inference): approximate the conditional $p(h \mid v)$. Data-dependent: Variational Inference, mean-field theory.
• Unconditional (Simulate from the Model): approximate the joint distribution $p(v, h)$. Data-independent: Stochastic Approximation, MCMC-based.
• Match the data-dependent and data-independent statistics.

(Salakhutdinov, 2008; NIPS 2009)
Stochastic Approximation

[Figure: persistent Gibbs chain over (v, h1, h2) at times t = 1, 2, 3, alternately updating the hidden and visible states]

• Generate $(v_t, h_t)$ by simulating from a Markov chain that leaves $P_{\theta_t}$ invariant (e.g. a Gibbs or M-H sampler).
• Update $\theta_t$ by replacing the intractable model expectation with a point estimate at $(v_t, h_t)$.

In practice we simulate several Markov chains in parallel.

(Robbins and Monro, Ann. Math. Stats, 1957; L. Younes, Probability Theory 1989)
Learning Algorithm

Update $\theta_t$ and the chain state $(v_t, h_t)$ sequentially. The update rule decomposes into a true gradient plus a perturbation term, with almost sure convergence guarantees as the learning rate decreases to zero.

Connections to the theory of stochastic approximation and adaptive MCMC.

Problem: With high-dimensional data, the probability landscape is highly multimodal.

Key insight: The transition operator can be any valid transition operator: Tempered Transitions, Parallel/Simulated Tempering.
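The stochastic-approximation loop above can be sketched for the simplest case, a one-hidden-layer binary RBM, where the data-dependent term is exact and persistent Gibbs chains supply the point estimate of the model expectation. The toy data, layer sizes, and learning-rate schedule below are illustrative assumptions.

```python
# Sketch of stochastic approximation (persistent chains + Robbins-Monro
# updates) for a binary RBM, the one-hidden-layer special case of a DBM.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, n_chains = 8, 4, 10
W = 0.01 * rng.standard_normal((n_vis, n_hid))
data = rng.integers(0, 2, (100, n_vis)).astype(float)   # toy binary data
v_chain = rng.integers(0, 2, (n_chains, n_vis)).astype(float)

for t in range(200):
    batch = data[rng.choice(len(data), 10)]
    # Data-dependent term: for an RBM the posterior over h is exact.
    pos = batch.T @ sigmoid(batch @ W) / len(batch)
    # Data-independent term: one Gibbs sweep on the persistent chains,
    # which leave the current model distribution invariant.
    h_chain = (sigmoid(v_chain @ W) > rng.random((n_chains, n_hid))).astype(float)
    v_chain = (sigmoid(h_chain @ W.T) > rng.random((n_chains, n_vis))).astype(float)
    neg = v_chain.T @ sigmoid(v_chain @ W) / n_chains
    # Robbins-Monro step with a decreasing learning rate.
    W += (0.1 / (1.0 + 0.01 * t)) * (pos - neg)

print(W.shape)  # (8, 4)
```

The chains persist across updates rather than restarting at the data, which is what makes this a stochastic-approximation procedure rather than plain Contrastive Divergence.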
Variational Inference

Posterior Inference: approximate the intractable distribution $P_\theta(h \mid v)$ with a simpler, tractable distribution $Q_\mu(h \mid v)$.

Variational Lower Bound:

$$\log P_\theta(v) \ge \sum_h Q_\mu(h \mid v) \log P_\theta(v, h) + \mathcal{H}(Q_\mu) = \log P_\theta(v) - \mathrm{KL}\big( Q_\mu(h \mid v) \,\|\, P_\theta(h \mid v) \big)$$

Minimize the KL between the approximating and true distributions with respect to the variational parameters $\mu$.

Mean-Field: Choose a fully factorized distribution:

$$Q_\mu(h \mid v) = \prod_j q(h_j \mid v), \qquad q(h_j = 1 \mid v) = \mu_j$$

1. Variational Inference: Maximize the lower bound w.r.t. the variational parameters $\mu$. This yields nonlinear fixed-point equations for the $\mu_j$.

2. MCMC: Apply stochastic approximation to update the model parameters $\theta$.

Almost sure convergence guarantees to an asymptotically stable point.

Fast Inference: learning can scale to millions of examples.
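The nonlinear fixed-point equations can be sketched for a two-hidden-layer DBM, where each hidden unit's mean receives both bottom-up and top-down input. Layer sizes and weights below are illustrative assumptions.

```python
# Sketch of the mean-field fixed-point equations for a 2-hidden-layer DBM:
#   mu1 <- sigmoid(W1' v + W2 mu2)   (bottom-up AND top-down input)
#   mu2 <- sigmoid(W2' mu1)
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(0)
W1 = 0.5 * rng.standard_normal((6, 4))   # toy weights v -> h1
W2 = 0.5 * rng.standard_normal((4, 3))   # toy weights h1 -> h2
v = rng.integers(0, 2, 6).astype(float)  # one observed binary input

mu1, mu2 = np.full(4, 0.5), np.full(3, 0.5)   # initialize marginals at 1/2
for _ in range(50):                            # iterate to a fixed point
    mu1 = sigmoid(W1.T @ v + W2 @ mu2)
    mu2 = sigmoid(W2.T @ mu1)

# mu1, mu2 approximate the posterior marginals q(h_j = 1 | v).
print(mu1.round(3), mu2.round(3))
```

For strongly coupled weights a damped update is often used in practice to keep the iteration from oscillating; with the small toy weights here the plain iteration settles quickly.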
Good Generative Model? Handwritten Characters

[Figure: real data vs. simulated samples of handwritten characters]
Good Generative Model? MNIST Handwritten Digit Dataset
Handwriting Recognition

MNIST Dataset (60,000 examples of 10 digits):

Learning Algorithm                        Error
Logistic regression                       12.0%
K-NN                                      3.09%
Neural Net (Platt 2005)                   1.53%
SVM (Decoste et al. 2002)                 1.40%
Deep Autoencoder (Bengio et al. 2007)     1.40%
Deep Belief Net (Hinton et al. 2006)      1.20%
DBM                                       0.95%

Optical Character Recognition (42,152 examples of 26 English letters):

Learning Algorithm                        Error
Logistic regression                       22.14%
K-NN                                      18.92%
Neural Net                                14.62%
SVM (Larochelle et al. 2009)              9.70%
Deep Autoencoder (Bengio et al. 2007)     10.05%
Deep Belief Net (Larochelle et al. 2009)  9.68%
DBM                                       8.40%

Permutation-invariant version.
Generative Model of 3-D Objects

24,000 examples; 5 object categories; 5 different objects within each category; 6 lighting conditions; 9 elevations; 18 azimuths.
3-D Object Recognition

Learning Algorithm                        Error
Logistic regression                       22.5%
K-NN (LeCun 2004)                         18.92%
SVM (Bengio & LeCun 2007)                 11.6%
Deep Belief Net (Nair & Hinton 2009)      9.0%
DBM                                       7.2%
Pattern Completion

Permutation-invariant version.
Where else can we use generative models?

Data: a Collection of Modalities
• Multimedia content on the web: image + text + audio.
• Product recommendation systems.
• Robotics applications: audio, vision, touch sensors, motor control.

[Figure: example tagged images, e.g. "sunset, pacific ocean, baker beach, seashore, ocean" and "car, automobile"]
Challenges - I

Very different input representations:
• Images: real-valued, dense
• Text: discrete, sparse

Difficult to learn cross-modal features from low-level representations.

[Figure: an image (dense) paired with its tags "sunset, pacific ocean, baker beach, seashore, ocean" (sparse)]
Challenges - II

Noisy and missing data.

[Figure: images with noisy or missing tags: "pentax, k10d, pentaxda50200, kangaroo island, sa, australian sea lion"; "micki krimmel, mickipedia, headshot"; "unseulpixel, naturey"; "<no text>"]
Challenges - II (continued)

For the same images with noisy or missing tags ("pentax, k10d, pentaxda50200, kangaroo island, sa, australian sea lion"; "micki krimmel, mickipedia, headshot"; "unseulpixel, naturey"; "<no text>"), text generated by the model:
• beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves
• portrait, girl, woman, lady, blonde, pretty, gorgeous, expression, model
• night, notte, traffic, light, lights, parking, darkness, low light, nacht, glow
• fall, autumn, trees, leaves, foliage, forest, woods, branches, path
A Simple Multimodal Model
• Use a joint binary hidden layer.
• Problem: Inputs have very different statistical properties.
• Difficult to learn cross-modal features.

[Figure: dense, real-valued image features modeled with a Gaussian model; sparse 1-of-K word counts modeled with a Replicated Softmax]
Multimodal DBM

[Figure: a Deep Boltzmann Machine over two input pathways, a Gaussian model over dense, real-valued image features and a Replicated Softmax over word counts, joined by higher hidden layers with bottom-up + top-down inference]

(Srivastava & Salakhutdinov, NIPS 2012; JMLR 2014)
Text Generated from Images

Given image → Generated tags:
• canada, nature, sunrise, ontario, fog, mist, bc, morning
• insect, butterfly, insects, bug, butterflies, lepidoptera
• graffiti, streetart, stencil, sticker, urbanart, graff, sanfrancisco
• portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy
• dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry
• sea, france, boat, mer, beach, river, bretagne, plage, brittany
Text Generated from Images (continued)

Given image → Generated tags:
• water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop
• portrait, women, army, soldier, mother, postcard, soldiers
• obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally
Generating Text from Images

Samples drawn after every 50 steps of Gibbs updates.
MIR-Flickr Dataset (Huiskes et al., 2010)

• 1 million images along with user-assigned tags, e.g.:
• sculpture, beauty, stone
• nikon, green, light, photoshop, apple, d70
• white, yellow, abstract, lines, bus, graphic
• sky, geotagged, reflection, cielo, bilbao, reflejo
• food, cupcake, vegan
• d80
• anawesomeshot, theperfectphotographer, flash, damniwishidtakenthat, spiritofphotography
• nikon, abigfave, goldstaraward, d80, nikond80
Results
• Logistic regression on the top-level representation.
• Multimodal inputs: labeled 25K examples + 1 million unlabeled.

Learning Algorithm        MAP     Precision@50
Random                    0.124   0.124
LDA [Huiskes et al.]      0.492   0.754
SVM [Huiskes et al.]      0.475   0.758
DBM-Labelled              0.526   0.791
Deep Belief Net           0.638   0.867
Autoencoder               0.638   0.875
DBM                       0.641   0.873

(MAP = Mean Average Precision)
Talk Roadmap
• Basic Building Blocks:
Ø Sparse Coding
Ø Autoencoders
• Deep Generative Models
Ø Restricted Boltzmann Machines
Ø Deep Belief Networks, Deep Boltzmann Machines
Ø Helmholtz Machines / Variational Autoencoders
• Generative Adversarial Networks
• Model Evaluation
Helmholtz Machines
• Hinton, G. E., Dayan, P., Frey, B. J. and Neal, R., Science 1995

[Figure: input data v with hidden layers h1, h2, h3 and weights W1, W2, W3; a generative (top-down) process paired with approximate (bottom-up) inference]

Related work:
• Kingma & Welling, 2014
• Rezende, Mohamed, Wierstra, 2014
• Mnih & Gregor, 2014
• Bornschein & Bengio, 2015
• Tang & Salakhutdinov, 2013
Helmholtz Machines, DBNs, DBMs

[Figure: side-by-side architectures of a Deep Boltzmann Machine, a Helmholtz Machine, and a Deep Belief Network, each with layers v, h1, h2, h3 and weights W1, W2, W3]
Variational Autoencoders (VAEs)
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers:

$$p_\theta(x) = \sum_{h^1, \ldots, h^L} p_\theta(h^L) \, p_\theta(h^{L-1} \mid h^L) \cdots p_\theta(x \mid h^1)$$

• Each term may denote a complicated nonlinear relationship.
• $\theta$ denotes the parameters of the VAE.
• $L$ is the number of stochastic layers.
• Sampling and probability evaluation is tractable for each conditional $p_\theta(h^{l} \mid h^{l+1})$.
VAE: Example
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers: for instance, two stochastic layers with a deterministic layer in between, where each conditional term denotes a one-layer neural net.
• $\theta$ denotes the parameters of the VAE.
• $L$ is the number of stochastic layers.
• Sampling and probability evaluation is tractable for each conditional.
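The ancestral sampling procedure above can be sketched directly: draw the top layer from its prior, then sample each lower layer conditioned on the one above. The two-layer sizes, tanh nonlinearities, and unit-variance Gaussian conditionals below are hypothetical placeholders for the one-layer nets the slide describes.

```python
# Sketch of ancestral sampling through a cascade of stochastic layers:
#   h2 ~ p(h2),  h1 ~ p(h1 | h2),  x ~ p(x | h1),
# with each conditional mean produced by a toy one-layer net.
import numpy as np

rng = np.random.default_rng(0)
W2 = rng.standard_normal((3, 4))   # placeholder net weights for p(h1 | h2)
W1 = rng.standard_normal((4, 5))   # placeholder net weights for p(x  | h1)

def sample_gaussian(mean, std=1.0):
    """Sample from N(mean, std^2 I); probability evaluation is equally easy."""
    return mean + std * rng.standard_normal(mean.shape)

h2 = rng.standard_normal(3)                 # top layer: h2 ~ N(0, I)
h1 = sample_gaussian(np.tanh(h2 @ W2))      # h1 ~ p(h1 | h2)
x = sample_gaussian(np.tanh(h1 @ W1))       # x  ~ p(x  | h1)
print(x.shape)  # (5,)
```

Each conditional here is cheap to sample from and to evaluate, exactly the tractability property the slide requires of every term in the cascade.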
Variational Bound
• The VAE is trained to maximize the variational lower bound:

$$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(h \mid x)} \Big[ \log \frac{p_\theta(x, h)}{q_\phi(h \mid x)} \Big] = \mathcal{L}(x; \theta, \phi)$$

• Trading off the data log-likelihood and the KL divergence from the true posterior.
• Hard to optimize the variational bound with respect to the recognition network $q_\phi$ (high-variance gradient estimates).
• Key idea of Kingma and Welling is to use the reparameterization trick.
Reparameterization Trick
• Assume that the recognition distribution is Gaussian:

$$q_\phi(h \mid x) = \mathcal{N}\big(h;\, \mu(x), \Sigma(x)\big)$$

with mean and covariance computed from the state of the hidden units at the previous layer.
• Alternatively, we can express this in terms of an auxiliary variable:

$$\epsilon \sim \mathcal{N}(0, I), \qquad h = \mu(x) + \Sigma^{1/2}(x)\, \epsilon$$

• The recognition distribution can thus be expressed in terms of a deterministic mapping (a deterministic encoder), where the distribution of $\epsilon$ does not depend on the parameters $\phi$.
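A minimal numerical sketch of the trick with a diagonal Gaussian: the sample is a deterministic function of the parameters, so derivatives with respect to the mean and scale are available exactly, while the noise distribution stays fixed. The particular values of mu and sigma are arbitrary.

```python
# Sketch of the reparameterization trick: h ~ N(mu, sigma^2) rewritten as
# h = mu + sigma * eps with eps ~ N(0, I), so h is deterministic in (mu, sigma).
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])        # arbitrary illustrative parameters
sigma = np.array([0.5, 1.5])

eps = rng.standard_normal((100_000, 2))   # noise does NOT depend on params
h = mu + sigma * eps                      # deterministic map of (mu, sigma)

# The samples have the intended mean/std up to Monte Carlo error, and the
# derivatives dh/dmu = 1 and dh/dsigma = eps are exact, so gradients of any
# expectation over h can flow through the sample by backprop.
print(h.mean(axis=0), h.std(axis=0))
```

Without the trick, the sampling step blocks the gradient path to the recognition parameters; with it, the only stochastic node is parameter-free noise.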
Computing the Gradients
• The gradient w.r.t. the parameters, both recognition and generative:

$$\nabla_{\theta, \phi}\, \mathbb{E}_{q_\phi(h \mid x)}\Big[ \log \frac{p_\theta(x, h)}{q_\phi(h \mid x)} \Big] = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\Big[ \nabla_{\theta, \phi} \log \frac{p_\theta\big(x, h(x, \epsilon)\big)}{q_\phi\big(h(x, \epsilon) \mid x\big)} \Big]$$

• Gradients can be computed by backprop: the mapping $h(x, \epsilon)$ is a deterministic neural net for fixed $\epsilon$.
Importance Weighted Autoencoders
• Can improve on the VAE by using the following k-sample importance weighting of the log-likelihood:

$$\mathcal{L}_k(x) = \mathbb{E}_{h^{(1)}, \ldots, h^{(k)} \sim q_\phi(h \mid x)} \Big[ \log \frac{1}{k} \sum_{i=1}^{k} \frac{p_\theta(x, h^{(i)})}{q_\phi(h^{(i)} \mid x)} \Big]$$

where the $h^{(i)}$ are sampled from the recognition network and $w_i = p_\theta(x, h^{(i)}) / q_\phi(h^{(i)} \mid x)$ are unnormalized importance weights.

(Burda, Grosse, Salakhutdinov, ICLR 2016)
Tighter Lower Bound
• This is a lower bound on the marginal log-likelihood: $\log p_\theta(x) \ge \mathcal{L}_k(x)$.
• Special case of k = 1: same as the standard VAE objective.
• For all k, the lower bounds satisfy:

$$\log p_\theta(x) \ge \mathcal{L}_{k+1}(x) \ge \mathcal{L}_k(x)$$

• Using more samples can only improve the tightness of the bound.
• Moreover, if $p_\theta(x, h) / q_\phi(h \mid x)$ is bounded, then $\mathcal{L}_k(x) \to \log p_\theta(x)$ as $k \to \infty$.
Computing the Gradients
• We can use an unbiased estimate of the gradient via the reparameterization trick:

$$\nabla \mathcal{L}_k(x) \approx \sum_{i=1}^{k} \tilde{w}_i \, \nabla \log w_i, \qquad \tilde{w}_i = \frac{w_i}{\sum_{j=1}^{k} w_j}$$

where the $\tilde{w}_i$ are normalized importance weights.

IWAEs vs. VAEs
• Draw k samples from the recognition network (or, equivalently, k sets of auxiliary variables $\epsilon$) and use the Monte Carlo gradient estimate $\sum_i \tilde{w}_i \nabla \log w_i$.
• Compare this to the VAE's estimate of the gradient, $\frac{1}{k} \sum_i \nabla \log w_i$, which weights all samples equally.
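The k-sample bound and its monotonicity can be checked numerically on a toy model where every density is a tractable Gaussian. The generative model p(h) = N(0,1), p(x|h) = N(h,1) and the mismatched recognition distribution q below are illustrative stand-ins, not the networks from the paper.

```python
# Sketch of the k-sample importance-weighted bound L_k on log p(x),
# estimated by Monte Carlo on a fully tractable Gaussian toy model.
import numpy as np

rng = np.random.default_rng(0)

def log_p_xh(x, h):        # log p(x,h) = log N(h; 0,1) + log N(x; h,1)
    return -0.5 * (h**2 + (x - h)**2) - np.log(2 * np.pi)

def log_q(h, mu, std):     # log q(h|x) = log N(h; mu, std^2)
    return -0.5 * ((h - mu) / std)**2 - np.log(std * np.sqrt(2 * np.pi))

def log_mean_exp(a, axis):  # numerically stable log(mean(exp(a)))
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def L_k(x, k, mu=0.0, std=1.5, n_mc=20_000):
    h = mu + std * rng.standard_normal((n_mc, k))   # k samples from q(h|x)
    log_w = log_p_xh(x, h) - log_q(h, mu, std)      # unnormalized log-weights
    return np.mean(log_mean_exp(log_w, axis=1))     # E[log (1/k) sum_i w_i / 1]

# The bounds tighten as k grows, and all stay below the exact marginal
# log-likelihood log p(x) = log N(1; 0, 2), roughly -1.52.
bounds = [L_k(1.0, k) for k in (1, 2, 8)]
print(np.round(bounds, 3))
```

With a deliberately mismatched q (wrong mean and scale), the gap between L_1 and L_8 is clearly visible; with q equal to the true posterior, all the bounds would coincide with log p(x).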
Motivating Example
• Can we generate images from natural language descriptions?

"A stop sign is flying in blue skies."
"A pale yellow school bus is flying in blue skies."
"A herd of elephants is flying in blue skies."
"A large commercial airplane is flying in blue skies."

(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)
Generating Images from Captions
• Generative Model: a stochastic recurrent network, a chained sequence of Variational Autoencoders with a single stochastic layer.
• Recognition Model: a deterministic recurrent network.

(Gregor et al., 2015)
Flipping Colors
"A yellow school bus parked in the parking lot."
"A red school bus parked in the parking lot."
"A green school bus parked in the parking lot."
"A blue school bus parked in the parking lot."
Novel Scene Compositions
"A toilet seat sits open in the bathroom."
"A toilet seat sits open in the grass field."
Ask Google? [Figure: Google image-search results for comparison; credit: Bloomberg News]