Deep Learning III Unsupervised Learning

Date post: 19-Oct-2021
Deep Learning III Unsupervised Learning Russ Salakhutdinov Machine Learning Department Carnegie Mellon University Canadian Institute of Advanced Research
Transcript
Page 1: Deep Learning III Unsupervised Learning

Deep Learning III Unsupervised Learning

Russ Salakhutdinov

Machine Learning Department Carnegie Mellon University

Canadian Institute of Advanced Research

Page 2: Deep Learning III Unsupervised Learning

Unsupervised Learning

Non-probabilistic Models
Ø  Sparse Coding
Ø  Autoencoders
Ø  Others (e.g. k-means)

Probabilistic (Generative) Models

Explicit Density p(x)

Tractable Models
Ø  Fully observed Belief Nets
Ø  NADE
Ø  PixelRNN

Non-Tractable Models
Ø  Boltzmann Machines
Ø  Variational Autoencoders
Ø  Helmholtz Machines
Ø  Many others…

Implicit Density
Ø  Generative Adversarial Networks
Ø  Moment Matching Networks

Page 3: Deep Learning III Unsupervised Learning

Talk Roadmap

• Basic Building Blocks:
Ø  Sparse Coding
Ø  Autoencoders

• Deep Generative Models
Ø  Restricted Boltzmann Machines
Ø  Deep Belief Networks and Deep Boltzmann Machines
Ø  Helmholtz Machines / Variational Autoencoders

• Generative Adversarial Networks

• Model Evaluation

Page 4: Deep Learning III Unsupervised Learning

DBNs vs. DBMs

[Figure: a Deep Belief Network and a Deep Boltzmann Machine, each with visible layer v, hidden layers h1, h2, h3, and weights W1, W2, W3]

DBNs are hybrid models:
• Inference in DBNs is problematic due to explaining away.
• Only greedy pretraining, no joint optimization over all layers.
• Approximate inference is feed-forward: no bottom-up and top-down.

Page 5: Deep Learning III Unsupervised Learning

Mathematical Formulation

Deep Boltzmann Machine

[Figure: Deep Boltzmann Machine over the input v with hidden layers h1, h2, h3 and weights W1, W2, W3 (the model parameters); arrows labeled Bottom-up and Top-Down]

Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio et al.), Deep Belief Nets (Hinton et al.)

• Bottom-up and Top-down.
• Dependencies between hidden variables.
• All connections are undirected.
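As a sketch of what the formulation refers to, the three-hidden-layer DBM is usually written as the distribution below. Bias terms are omitted, and the superscript layer indexing is an assumed notation rather than something taken from the slide.

```latex
P_\theta(v) = \frac{P^*(v)}{Z(\theta)}
            = \frac{1}{Z(\theta)} \sum_{h^1, h^2, h^3}
              \exp\!\left[ v^\top W^1 h^1 + {h^1}^\top W^2 h^2 + {h^2}^\top W^3 h^3 \right],
\qquad \theta = \{W^1, W^2, W^3\}
```

Because $h^1$ appears in both the first and the second term, its conditional distribution depends on $v$ (bottom-up) and on $h^2$ (top-down) at the same time.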

Page 6: Deep Learning III Unsupervised Learning

Mathematical Formulation

[Figure: three panels labeled Deep Boltzmann Machine, Deep Belief Network, and Neural Network, each drawn from the Input v through hidden layers h1, h2, h3 with weights W1, W2, W3; the neural network panel ends in an Output]

Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio), Deep Belief Nets (Hinton)

Page 7: Deep Learning III Unsupervised Learning

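Mathematical Formulation

[Figure: three panels labeled Deep Boltzmann Machine, Deep Belief Network, and Neural Network, each drawn from the Input v through hidden layers h1, h2, h3 with weights W1, W2, W3; an arrow labeled "inference" is added, and the neural network panel ends in an Output]

Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio), Deep Belief Nets (Hinton)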

Page 8: Deep Learning III Unsupervised Learning

Mathematical Formulation

Deep Boltzmann Machine

[Figure: Deep Boltzmann Machine with visible layer v, hidden layers h1, h2, h3, and weights W1, W2, W3 (the model parameters)]

• Dependencies between hidden variables.

Maximum likelihood learning:

Problem: Both expectations are intractable!

Learning rule for undirected graphical models: MRFs, CRFs, Factor graphs.
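The two expectations can be made explicit with the standard maximum likelihood gradient for an undirected model with hidden variables. The block below is a sketch for the first weight matrix only, with biases omitted; $P_{\text{data}}(h, v) = P_\theta(h \mid v)\, P_{\text{data}}(v)$ is the usual definition of the data-dependent distribution and is assumed here.

```latex
\frac{\partial \log P_\theta(v)}{\partial W^1}
  = \mathbb{E}_{P_{\text{data}}}\!\left[ v\, {h^1}^\top \right]
  - \mathbb{E}_{P_\theta}\!\left[ v\, {h^1}^\top \right]
```

The first term needs the posterior over all hidden layers given $v$, and the second needs samples from the full joint model, which is why neither can be computed exactly in a DBM.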

Page 9: Deep Learning III Unsupervised Learning

Approximate Learning

[Figure: Deep Boltzmann Machine with visible layer v, hidden layers h1, h2, h3, and weights W1, W2, W3]

(Approximate) Maximum Likelihood:

• Both expectations are intractable!

Not factorial any more!

Page 10: Deep Learning III Unsupervised Learning

Approximate Learning

[Figure: Deep Boltzmann Machine with visible layer v (Data), hidden layers h1, h2, h3, and weights W1, W2, W3]

(Approximate) Maximum Likelihood:

Not factorial any more!

Page 11: Deep Learning III Unsupervised Learning

Approximate Learning

[Figure: Deep Boltzmann Machine with visible layer v, hidden layers h1, h2, h3, and weights W1, W2, W3]

(Approximate) Maximum Likelihood:

Not factorial any more!

Variational Inference

Stochastic Approximation (MCMC-based)

Page 12: Deep Learning III Unsupervised Learning

Previous Work

Many approaches for learning Boltzmann machines have been proposed over the last 20 years:
• Hinton and Sejnowski (1983)
• Peterson and Anderson (1987)
• Galland (1991)
• Kappen and Rodriguez (1998)
• Lawrence, Bishop, and Jordan (1998)
• Tanaka (1998)
• Welling and Hinton (2002)
• Zhu and Liu (2002)
• Welling and Teh (2003)
• Yasuda and Tanaka (2009)

Many of the previous approaches were not successful for learning general Boltzmann machines with hidden variables.

Real-world applications: thousands of hidden and observed variables with millions of parameters.

Algorithms based on Contrastive Divergence, Score Matching, Pseudo-Likelihood, Composite Likelihood, MCMC-MLE, and Piecewise Learning cannot handle multiple layers of hidden variables.

Page 13: Deep Learning III Unsupervised Learning

New Learning Algorithm

Conditional: Posterior Inference. Approximate the conditional.

Unconditional: Simulate from the Model. Approximate the joint distribution.

(Salakhutdinov, 2008; NIPS 2009)

Page 14: Deep Learning III Unsupervised Learning

New Learning Algorithm

Conditional (data-dependent): Posterior Inference. Approximate the conditional.

Unconditional (data-independent): Simulate from the Model. Approximate the joint distribution.

Match the data-dependent and data-independent densities.

(Salakhutdinov, 2008; NIPS 2009)

Page 15: Deep Learning III Unsupervised Learning

New Learning Algorithm

Conditional (data-dependent): Posterior Inference with Mean-Field. Approximate the conditional.

Unconditional (data-independent): Simulate from the Model with Markov Chain Monte Carlo. Approximate the joint distribution.

Match the data-dependent and data-independent terms.

Key Idea:
• Data-dependent: Variational Inference, mean-field theory
• Data-independent: Stochastic Approximation, MCMC based

Page 16: Deep Learning III Unsupervised Learning

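Stochastic Approximation

[Figure: a Markov chain over the state (v, h1, h2) at times t=1, t=2, t=3, with an Update step between consecutive times]

Update $\theta_t$ and $x_t$ sequentially, where:

• Generate $x_t$ by simulating from a Markov chain that leaves $P_{\theta_t}$ invariant (e.g. Gibbs or M-H sampler).
• Update $\theta_t$ by replacing the intractable expectation under $P_{\theta_t}$ with a point estimate.

In practice we simulate several Markov chains in parallel. (Robbins and Monro, Ann. Math. Stats, 1951; L. Younes, Probability Theory 1989)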
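The alternation between a Markov chain step and a parameter update can be sketched in a few lines of NumPy. This is a minimal illustration on a bias-free binary RBM rather than a DBM, so the data-dependent term is exact instead of mean-field; the function name `sap_train`, the toy data, and the learning-rate schedule are all assumptions made for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sap_train(data, n_hidden=8, n_chains=20, n_steps=200, lr0=0.05, seed=0):
    """Stochastic approximation for a bias-free binary RBM (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    # Persistent chains: the state x_t is carried over between updates,
    # never reset to the training data.
    v_chain = (rng.random((n_chains, n_visible)) < 0.5).astype(float)
    for t in range(n_steps):
        # Data-dependent statistics (exact for an RBM).
        pos = data.T @ sigmoid(data @ W) / data.shape[0]
        # One Gibbs sweep that leaves the current model distribution invariant.
        h_chain = (rng.random((n_chains, n_hidden)) < sigmoid(v_chain @ W)).astype(float)
        v_chain = (rng.random((n_chains, n_visible)) < sigmoid(h_chain @ W.T)).astype(float)
        # Point estimate of the intractable model expectation.
        neg = v_chain.T @ sigmoid(v_chain @ W) / n_chains
        # Decaying learning rate, in the spirit of Robbins-Monro conditions.
        lr = lr0 / (1.0 + t / 100.0)
        W += lr * (pos - neg)
    return W, v_chain

rng = np.random.default_rng(1)
toy_data = (rng.random((50, 6)) < 0.5).astype(float)  # hypothetical binary data
W, chains = sap_train(toy_data)
```

Running several chains in parallel (here `n_chains=20`) lowers the variance of the point estimate without changing the update rule.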

Page 17: Deep Learning III Unsupervised Learning

Learning Algorithm

Update rule decomposes into a true gradient term plus a perturbation term.

Almost sure convergence guarantees as the learning rate $\alpha_t \to 0$.

Problem: High-dimensional data: the probability landscape is highly multimodal.

Key insight: The transition operator can be any valid transition operator, e.g. Tempered Transitions, Parallel/Simulated Tempering.

Connections to the theory of stochastic approximation and adaptive MCMC.

Markov Chain Monte Carlo

Page 18: Deep Learning III Unsupervised Learning

Posterior Inference: Mean-Field

Variational Inference: Approximate the intractable distribution $P_\theta(h|v)$ with a simpler, tractable distribution $Q_\mu(h|v)$:

Variational Lower Bound

Minimize the KL divergence between the approximating and true distributions with respect to the variational parameters $\mu$.

Page 19: Deep Learning III Unsupervised Learning

Posterior Inference: Mean-Field

Variational Inference: Approximate the intractable distribution $P_\theta(h|v)$ with a simpler, tractable distribution $Q_\mu(h|v)$:

Variational Lower Bound

Variational Inference: Maximize the lower bound w.r.t. the variational parameters $\mu$.

Mean-Field: Choose a fully factorized distribution $Q_\mu(h|v) = \prod_j q(h_j|v)$ with $q(h_j = 1|v) = \mu_j$.

Nonlinear fixed-point equations:
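The mean-field fixed-point equations can be sketched as a simple iteration. The example below assumes a DBM with two hidden layers and no biases (the slides draw three layers), and the weights, sizes, and the name `mean_field` are illustrative choices, not the author's implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mean_field(v, W1, W2, n_iters=50, tol=1e-6):
    """Iterate the mean-field fixed-point equations of a two-hidden-layer DBM."""
    mu1 = np.full(W1.shape[1], 0.5)
    mu2 = np.full(W2.shape[1], 0.5)
    for _ in range(n_iters):
        # h1 combines bottom-up input from v with top-down input from h2.
        new_mu1 = sigmoid(v @ W1 + W2 @ mu2)
        # h2 only receives input from h1 in this two-layer sketch.
        new_mu2 = sigmoid(new_mu1 @ W2)
        delta = max(np.abs(new_mu1 - mu1).max(), np.abs(new_mu2 - mu2).max())
        mu1, mu2 = new_mu1, new_mu2
        if delta < tol:
            break
    return mu1, mu2

rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((6, 4))   # visible-to-h1 weights (hypothetical)
W2 = 0.1 * rng.standard_normal((4, 3))   # h1-to-h2 weights (hypothetical)
v = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
mu1, mu2 = mean_field(v, W1, W2)
```

Each `mu` entry is the approximate posterior probability that the corresponding hidden unit is on; the update for `mu1` shows the bottom-up plus top-down structure that distinguishes DBM inference from a single feed-forward pass.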

Page 20: Deep Learning III Unsupervised Learning

Posterior Inference: Mean-Field

Variational Inference: Approximate the intractable distribution $P_\theta(h|v)$ with a simpler, tractable distribution $Q_\mu(h|v)$:

Variational Lower Bound

Unconditional Simulation: Markov Chain Monte Carlo

1. Variational Inference: Maximize the lower bound w.r.t. the variational parameters.

2. MCMC: Apply stochastic approximation to update the model parameters.

Almost sure convergence guarantees to an asymptotically stable point.

Page 21: Deep Learning III Unsupervised Learning

Posterior Inference: Mean-Field

Variational Inference: Approximate the intractable distribution $P_\theta(h|v)$ with a simpler, tractable distribution $Q_\mu(h|v)$:

Variational Lower Bound

Unconditional Simulation: Markov Chain Monte Carlo

1. Variational Inference: Maximize the lower bound w.r.t. the variational parameters.

2. MCMC: Apply stochastic approximation to update the model parameters.

Almost sure convergence guarantees to an asymptotically stable point.

Fast Inference

Learning can scale to millions of examples.

Page 22: Deep Learning III Unsupervised Learning

Good Generative Model? Handwritten Characters

Page 23: Deep Learning III Unsupervised Learning

Good Generative Model? Handwritten Characters

Page 24: Deep Learning III Unsupervised Learning

Good Generative Model? Handwritten Characters

[Figure: panels labeled Real Data and Simulated]

Page 25: Deep Learning III Unsupervised Learning

Good Generative Model? Handwritten Characters

[Figure: panels labeled Real Data and Simulated]

Page 26: Deep Learning III Unsupervised Learning

Good Generative Model? Handwritten Characters

Page 27: Deep Learning III Unsupervised Learning

Good Generative Model? MNIST Handwritten Digit Dataset

Page 28: Deep Learning III Unsupervised Learning

Handwriting Recognition

MNIST Dataset: 60,000 examples of 10 digits

Learning Algorithm                          Error
Logistic regression                         12.0%
K-NN                                        3.09%
Neural Net (Platt 2005)                     1.53%
SVM (Decoste et al. 2002)                   1.40%
Deep Autoencoder (Bengio et al. 2007)       1.40%
Deep Belief Net (Hinton et al. 2006)        1.20%
DBM                                         0.95%

Optical Character Recognition: 42,152 examples of 26 English letters

Learning Algorithm                          Error
Logistic regression                         22.14%
K-NN                                        18.92%
Neural Net                                  14.62%
SVM (Larochelle et al. 2009)                9.70%
Deep Autoencoder (Bengio et al. 2007)       10.05%
Deep Belief Net (Larochelle et al. 2009)    9.68%
DBM                                         8.40%

Permutation-invariant version.

Page 29: Deep Learning III Unsupervised Learning

Generative Model of 3-D Objects

24,000 examples, 5 object categories, 5 different objects within each category, 6 lighting conditions, 9 elevations, 18 azimuths.

Page 30: Deep Learning III Unsupervised Learning

3-D Object Recognition

Learning Algorithm                          Error
Logistic regression                         22.5%
K-NN (LeCun 2004)                           18.92%
SVM (Bengio & LeCun 2007)                   11.6%
Deep Belief Net (Nair & Hinton 2009)        9.0%
DBM                                         7.2%

Permutation-invariant version.

Pattern Completion

Where else can we use generative models?

Page 31: Deep Learning III Unsupervised Learning

Data: Collection of Modalities

• Multimedia content on the web: image + text + audio.
• Product recommendation systems.
• Robotics applications.

Audio, Vision, Touch sensors, Motor control

[Figure: example images with the tags "sunset, pacificocean, bakerbeach, seashore, ocean" and "car, automobile"]

Page 32: Deep Learning III Unsupervised Learning

Challenges - I

Very different input representations:
• Images: real-valued, dense
• Text: discrete, sparse

[Figure: an image (Dense) next to its text tags "sunset, pacificocean, bakerbeach, seashore, ocean" (Sparse)]

Difficult to learn cross-modal features from low-level representations.

Page 33: Deep Learning III Unsupervised Learning

Challenges - II

Noisy and missing data

[Figure: images paired with their text tags]
• pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion
• mickikrimmel, mickipedia, headshot
• unseulpixel, naturey
• <no text>

Page 34: Deep Learning III Unsupervised Learning

Challenges - II

[Figure: images paired with their original text tags and with the text generated by the model]

Text:
• pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion
• mickikrimmel, mickipedia, headshot
• unseulpixel, naturey
• <no text>

Text generated by the model:
• beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves
• portrait, girl, woman, lady, blonde, pretty, gorgeous, expression, model
• night, notte, traffic, light, lights, parking, darkness, lowlight, nacht, glow
• fall, autumn, trees, leaves, foliage, forest, woods, branches, path

Page 35: Deep Learning III Unsupervised Learning

A Simple Multimodal Model

• Use a joint binary hidden layer.
• Problem: Inputs have very different statistical properties.
• Difficult to learn cross-modal features.

[Figure: a joint hidden layer over a Real-valued input and a 1-of-K input (0 0 1 0 0)]

Page 36: Deep Learning III Unsupervised Learning

Multimodal DBM

[Figure: dense, real-valued image features are modeled with a Gaussian model; word counts (0 0 1 0 0) are modeled with a Replicated Softmax]

(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)

Page 37: Deep Learning III Unsupervised Learning

Multimodal DBM

[Figure: dense, real-valued image features are modeled with a Gaussian model; word counts (0 0 1 0 0) are modeled with a Replicated Softmax]

(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)

Page 38: Deep Learning III Unsupervised Learning

Multimodal DBM

[Figure: dense, real-valued image features are modeled with a Gaussian model; word counts (0 0 1 0 0) are modeled with a Replicated Softmax]

(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)

Page 39: Deep Learning III Unsupervised Learning

Multimodal DBM

[Figure: dense, real-valued image features are modeled with a Gaussian model; word counts (0 0 1 0 0) are modeled with a Replicated Softmax; arrows labeled Bottom-up + Top-down]

(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)

Page 40: Deep Learning III Unsupervised Learning

Text Generated from Images

[Figure: given images paired with generated text]

Generated:
• canada, nature, sunrise, ontario, fog, mist, bc, morning
• insect, butterfly, insects, bug, butterflies, lepidoptera
• graffiti, streetart, stencil, sticker, urbanart, graff, sanfrancisco
• portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy
• dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry
• sea, france, boat, mer, beach, river, bretagne, plage, brittany

Page 41: Deep Learning III Unsupervised Learning

Text Generated from Images

[Figure: given images paired with generated text]

Generated:
• water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop
• portrait, women, army, soldier, mother, postcard, soldiers
• obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally

Page 42: Deep Learning III Unsupervised Learning

Generating Text from Images

Samples drawn after every 50 steps of Gibbs updates

Page 43: Deep Learning III Unsupervised Learning

MIR-Flickr Dataset

(Huiskes et al., 2010)

• 1 million images along with user-assigned tags.

[Figure: example images with their tags]
• sculpture, beauty, stone
• nikon, green, light, photoshop, apple, d70
• white, yellow, abstract, lines, bus, graphic
• sky, geotagged, reflection, cielo, bilbao, reflejo
• food, cupcake, vegan
• d80
• anawesomeshot, theperfectphotographer, flash, damniwishidtakenthat, spiritofphotography
• nikon, abigfave, goldstaraward, d80, nikond80

Page 44: Deep Learning III Unsupervised Learning

Results

• Logistic regression on top-level representation.
• Multimodal Inputs

Learning Algorithm        MAP      Precision@50
Random                    0.124    0.124
LDA [Huiskes et al.]      0.492    0.754
SVM [Huiskes et al.]      0.475    0.758
DBM-Labelled              0.526    0.791
Deep Belief Net           0.638    0.867
Autoencoder               0.638    0.875
DBM                       0.641    0.873

MAP: Mean Average Precision

Labeled 25K examples + 1 Million unlabelled
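For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute Precision@k and average precision for a single class from ranked scores; MAP is the mean of average precision over classes. The scores and labels are made up, and the exact evaluation protocol used for the reported numbers is not given here, so this should be read as a generic illustration.

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of relevant items among the k highest-scoring items."""
    order = np.argsort(-scores)
    return labels[order][:k].mean()

def average_precision(scores, labels):
    """Mean of the precision values measured at each relevant item's rank."""
    order = np.argsort(-scores)
    ranked = labels[order]
    hits = np.cumsum(ranked)
    precisions = hits / np.arange(1, len(ranked) + 1)
    return (precisions * ranked).sum() / ranked.sum()

# Hypothetical classifier scores and binary relevance labels for one class.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
labels = np.array([1, 0, 1, 0, 0])
p_at_2 = precision_at_k(scores, labels, 2)
ap = average_precision(scores, labels)
```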

Page 45: Deep Learning III Unsupervised Learning

Talk Roadmap

• Basic Building Blocks:
Ø  Sparse Coding
Ø  Autoencoders

• Deep Generative Models
Ø  Restricted Boltzmann Machines
Ø  Deep Belief Network, Deep Boltzmann Machines
Ø  Helmholtz Machines / Variational Autoencoders

• Generative Adversarial Networks

• Model Evaluation

Page 46: Deep Learning III Unsupervised Learning

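Helmholtz Machines

• Hinton, G. E., Dayan, P., Frey, B. J. and Neal, R., Science 1995

[Figure: network over the input data v with hidden layers h1, h2, h3 and weights W1, W2, W3; one direction is labeled Generative Process and the other Approximate Inference]

• Kingma & Welling, 2014
• Rezende, Mohamed, Wierstra, 2014
• Mnih & Gregor, 2014
• Bornschein & Bengio, 2015
• Tang & Salakhutdinov, 2013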

Page 47: Deep Learning III Unsupervised Learning

Helmholtz Machines, DBNs, DBMs

[Figure: three panels labeled Helmholtz Machine, Deep Belief Network, and Deep Boltzmann Machine, each with visible layer v, hidden layers h1, h2, h3, and weights W1, W2, W3]

Page 48: Deep Learning III Unsupervised Learning

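Variational Autoencoders (VAEs)

• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers:

$p(x|\theta) = \sum_{h^1, \dots, h^L} p(h^L|\theta)\, p(h^{L-1}|h^L, \theta) \cdots p(x|h^1, \theta)$

Each term may denote a complicated nonlinear relationship.

[Figure: Generative Process running from h3 through h2 and h1 down to the input data v, with weights W3, W2, W1]

• $\theta$ denotes the parameters of the VAE.
• $L$ is the number of stochastic layers.
• Sampling and probability evaluation is tractable for each $p(h^\ell|h^{\ell+1})$.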

Page 49: Deep Learning III Unsupervised Learning

VAE: Example

• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers:

[Figure: a stack of two Stochastic Layers with a Deterministic Layer between them]

This term denotes a one-layer neural net.

• $\theta$ denotes the parameters of the VAE.
• $L$ is the number of stochastic layers.
• Sampling and probability evaluation is tractable for each $p(h^\ell|h^{\ell+1})$.

Page 50: Deep Learning III Unsupervised Learning

Variational Bound

• The VAE is trained to maximize the variational lower bound:

[Figure: network over the input data v with hidden layers h1, h2, h3 and weights W1, W2, W3]

• Hard to optimize the variational bound with respect to the recognition network (high variance).
• Key idea of Kingma and Welling is to use the reparameterization trick.
• Trading off the data log-likelihood and the KL divergence from the true posterior.
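The trade-off between the data log-likelihood and the KL divergence can be read off the usual form of the bound, sketched below. The recognition distribution is written $q(h \mid x, \theta)$, an assumed notation in which $\theta$ collects both generative and recognition parameters.

```latex
\log p(x \mid \theta) \;\ge\; \mathcal{L}(x)
  = \mathbb{E}_{q(h \mid x, \theta)}\!\left[ \log \frac{p(x, h \mid \theta)}{q(h \mid x, \theta)} \right]
  = \log p(x \mid \theta) - \mathrm{KL}\!\left( q(h \mid x, \theta) \,\|\, p(h \mid x, \theta) \right)
```

Maximizing $\mathcal{L}(x)$ therefore raises the log-likelihood while pulling the recognition distribution toward the true posterior; the bound is tight exactly when the two coincide.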

Page 51: Deep Learning III Unsupervised Learning

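Reparameterization Trick

• Assume that the recognition distribution is Gaussian:

$q(h^\ell|h^{\ell-1}, \theta) = \mathcal{N}\big(\mu(h^{\ell-1}, \theta), \Sigma(h^{\ell-1}, \theta)\big)$

with mean and covariance computed from the state of the hidden units at the previous layer.

• Alternatively, we can express this in terms of an auxiliary variable:

$\epsilon^\ell \sim \mathcal{N}(0, I), \quad h^\ell(\epsilon^\ell, h^{\ell-1}, \theta) = \Sigma(h^{\ell-1}, \theta)^{1/2} \epsilon^\ell + \mu(h^{\ell-1}, \theta)$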

Page 52: Deep Learning III Unsupervised Learning

Reparameterization Trick

• Assume that the recognition distribution is Gaussian:

• Or

• The recognition distribution can be expressed in terms of a deterministic mapping (Deterministic Encoder):

The distribution of $\epsilon$ does not depend on $\theta$.
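A minimal NumPy sketch of the reparameterization: a sample from a diagonal Gaussian is written as a deterministic function of the parameters and of noise drawn from a fixed distribution. The diagonal covariance, the `log_var` parameterization, and the specific numbers are assumptions for the example.

```python
import numpy as np

def reparameterize(mu, log_var, eps):
    """Map eps ~ N(0, I) to h ~ N(mu, diag(exp(log_var))) deterministically."""
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                  # hypothetical mean
log_var = np.log(np.array([0.25, 4.0]))     # hypothetical variances 0.25 and 4.0
eps = rng.standard_normal((20000, 2))       # noise that does not depend on the parameters
samples = reparameterize(mu, log_var, eps)
sample_mean = samples.mean(axis=0)
sample_std = samples.std(axis=0)
```

Since `eps` is sampled independently of `mu` and `log_var`, the sample is a differentiable function of both, which is what lets gradients flow through the stochastic layer.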

Page 53: Deep Learning III Unsupervised Learning

Computing the Gradients

• The gradient w.r.t. the parameters, both recognition and generative:

Gradients can be computed by backprop.

The mapping h is a deterministic neural net for fixed $\epsilon$.

Autoencoder

Page 54: Deep Learning III Unsupervised Learning

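Importance Weighted Autoencoders

• Can improve the VAE by using the following k-sample importance weighting of the log-likelihood:

$\mathcal{L}_k(x) = \mathbb{E}_{h_1, \dots, h_k \sim q(h|x)}\left[ \log \frac{1}{k} \sum_{i=1}^{k} \frac{p(x, h_i)}{q(h_i|x)} \right]$

where $h_1, \dots, h_k$ are sampled from the recognition network and the ratios $p(x, h_i)/q(h_i|x)$ are the unnormalized importance weights.

[Figure: network over the input data v with hidden layers h1, h2, h3 and weights W1, W2, W3]

(Burda, Grosse, Salakhutdinov, ICLR 2016)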

Page 55: Deep Learning III Unsupervised Learning

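Importance Weighted Autoencoders

• Can improve the VAE by using the following k-sample importance weighting of the log-likelihood:

$\mathcal{L}_k(x) = \mathbb{E}_{h_1, \dots, h_k \sim q(h|x)}\left[ \log \frac{1}{k} \sum_{i=1}^{k} \frac{p(x, h_i)}{q(h_i|x)} \right]$

• This is a lower bound on the marginal log-likelihood: $\mathcal{L}_k(x) \le \log p(x)$.

• Special case of k=1: same as the standard VAE objective.

• Using more samples improves the tightness of the bound.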
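The effect of the number of samples can be checked numerically on a toy model where the marginal likelihood is known in closed form. Everything in this sketch is an assumption made for illustration: the model $p(h) = \mathcal{N}(0, 1)$, $p(x|h) = \mathcal{N}(h, 1)$, a deliberately too-wide recognition distribution $q(h|x) = \mathcal{N}(x/2, 1)$, and the function name `iwae_bound`.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def iwae_bound(x, k, n_reps, rng):
    """Monte Carlo average of the k-sample bound for the toy model."""
    # Draw k samples from q(h | x) = N(x / 2, 1), repeated n_reps times.
    h = x / 2.0 + rng.standard_normal((n_reps, k))
    # Log unnormalized importance weights: log p(x, h) - log q(h | x).
    log_w = log_normal(h, 0.0, 1.0) + log_normal(x, h, 1.0) - log_normal(h, x / 2.0, 1.0)
    # Log of the average weight, computed stably (log-sum-exp).
    m = log_w.max(axis=1, keepdims=True)
    log_mean_w = m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))
    return log_mean_w.mean()

rng = np.random.default_rng(0)
x = 1.3
log_px = log_normal(x, 0.0, 2.0)   # exact marginal: p(x) = N(0, 2)
l1 = iwae_bound(x, k=1, n_reps=5000, rng=rng)
l50 = iwae_bound(x, k=50, n_reps=5000, rng=rng)
```

With `k=1` the estimate is the standard variational bound and sits below `log_px` by the KL divergence between the recognition distribution and the true posterior; with `k=50` the gap nearly closes even though the recognition distribution is unchanged.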

Page 56: Deep Learning III Unsupervised Learning

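Tighter Lower Bound

• For all k, the lower bounds satisfy: $\log p(x) \ge \mathcal{L}_{k+1}(x) \ge \mathcal{L}_k(x)$

• Using more samples can only improve the tightness of the bound.

• Moreover, if $p(h, x)/q(h|x)$ is bounded, then $\mathcal{L}_k(x) \to \log p(x)$ as $k \to \infty$.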

Page 57: Deep Learning III Unsupervised Learning

Computing the Gradients

• We can use the unbiased estimate of the gradient using the reparameterization trick:

where we define normalized importance weights: $\tilde{w}_i = w_i / \sum_{j=1}^{k} w_j$
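In practice the normalized importance weights are computed from log-weights, since the unnormalized weights can underflow. The sketch below shows that computation as a softmax over log-weights; the values of `log_w` are hypothetical.

```python
import numpy as np

def normalized_weights(log_w):
    """Return w_i / sum_j w_j from unnormalized log-weights, computed stably."""
    shifted = log_w - np.max(log_w)   # subtracting the max leaves the ratios unchanged
    w = np.exp(shifted)
    return w / w.sum()

log_w = np.array([-1050.0, -1051.0, -1053.0])   # hypothetical log-weights
w_tilde = normalized_weights(log_w)
```

Exponentiating `log_w` directly would return zeros here; after the shift, the weights sum to one and each sample's gradient contribution is scaled by its share of the total weight.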

Page 58: Deep Learning III Unsupervised Learning

IWAEs vs. VAEs

• Draw k samples from the recognition network
-  or k sets of auxiliary variables.

• Obtain the following Monte Carlo estimate of the gradient:

• Compare this to the VAE's estimate of the gradient:

Page 59: Deep Learning III Unsupervised Learning

Motivating Example

• Can we generate images from natural language descriptions?

[Figure: images generated from the following captions]
• A stop sign is flying in blue skies
• A pale yellow school bus is flying in blue skies
• A herd of elephants is flying in blue skies
• A large commercial airplane is flying in blue skies

(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Page 60: Deep Learning III Unsupervised Learning

Generating Images from Captions

• Generative Model: Stochastic Recurrent Network, chained sequence of Variational Autoencoders, with a single stochastic layer.

• Recognition Model: Deterministic Recurrent Network.

Stochastic Layer

(Gregor et al., 2015)

Page 61: Deep Learning III Unsupervised Learning

Flipping Colors

[Figure: images generated from the following captions]
• A yellow school bus parked in the parking lot
• A red school bus parked in the parking lot
• A green school bus parked in the parking lot
• A blue school bus parked in the parking lot

Page 62: Deep Learning III Unsupervised Learning

Novel Scene Compositions

• A toilet seat sits open in the bathroom
• A toilet seat sits open in the grass field

Ask Google?

Bloomberg News

