Jérôme Tubiana, Rémi Monasson
Laboratoire de Physique Théorique, École Normale Supérieure
Motivation
GoogleNet Deep Neural Network
Why does this network work? (and not others!)
Restricted Boltzmann Machines
Ackley, Hinton & Sejnowski 1985; Smolensky 1986; Hinton 2002
[Figure: bipartite graph — visible layer (binary r.v.) v_1, v_2, v_3; hidden layer h_1, h_2; couplings W_{\mu,i}]
• Graphical model constituted by two sets of random variables that are coupled together:
P(v, h) = \frac{1}{Z} e^{-E(v, h)}
E(v, h) = -\sum_{i=1}^{N} g_i v_i + \sum_{\mu=1}^{K} U_\mu(h_\mu) - \sum_{\mu, i} W_{\mu,i} v_i h_\mu
(g_i: fields on the visible units; U_\mu: hidden-unit potentials.)
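As a concrete illustration, the energy function above can be evaluated directly. A minimal numpy sketch, assuming Bernoulli hidden units with the common linear choice U_\mu(h_\mu) = -\theta_\mu h_\mu (the sizes, fields g, thresholds theta, and all numerical values are illustrative, not from the slides):

```python
import numpy as np

# Sketch: evaluate E(v, h) for a tiny RBM.
# Assumes Bernoulli hidden units with U_mu(h_mu) = -theta_mu * h_mu.
def energy(v, h, g, theta, W):
    # E(v,h) = -sum_i g_i v_i + sum_mu U_mu(h_mu) - sum_{mu,i} W_{mu,i} v_i h_mu
    return -g @ v - theta @ h - h @ (W @ v)

rng = np.random.default_rng(0)
N, K = 3, 2                      # illustrative sizes
W = rng.normal(size=(K, N))      # couplings W_{mu,i}
g = np.zeros(N)                  # visible fields (zero here for simplicity)
theta = np.zeros(K)              # hidden fields
v = np.array([1.0, 0.0, 1.0])
h = np.array([1.0, 0.0])
E = energy(v, h, g, theta, W)
```

With zero fields, only the coupling term -\sum_{\mu,i} W_{\mu,i} v_i h_\mu contributes.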
• RBMs learn a probability distribution over the visible layer:
P(v) = \int \prod_{\mu=1}^{K} dh_\mu \, P(v, h) = \frac{1}{Z} e^{-H_{eff}(v)}
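The marginalization can be checked by brute force on a tiny RBM. A sketch assuming Bernoulli (binary) hidden units, so the integral becomes a sum; all sizes and parameter values are illustrative:

```python
import numpy as np
from itertools import product

# Brute-force check on a tiny RBM with binary hidden units: the marginal over
# the visible layer is P(v) = (1/Z) * sum_h exp(-E(v, h)).
rng = np.random.default_rng(0)
N, K = 3, 2                          # illustrative sizes
W = rng.normal(size=(K, N))
g = rng.normal(size=N)               # visible fields
theta = rng.normal(size=K)           # hidden fields, U_mu(h) = -theta_mu * h (assumed)

def boltzmann_weight(v, h):
    E = -g @ v - theta @ h - h @ (W @ v)
    return np.exp(-E)

vs = [np.array(c, float) for c in product([0, 1], repeat=N)]
hs = [np.array(c, float) for c in product([0, 1], repeat=K)]
Z = sum(boltzmann_weight(v, h) for v in vs for h in hs)
P_v = {tuple(v): sum(boltzmann_weight(v, h) for h in hs) / Z for v in vs}
```

The resulting P_v is a proper distribution over the 2^N visible configurations.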
Vanilla example
• Suppose a vector of four binary random variables (v_1, v_2, v_3, v_4).
• Observe strong correlation between all pairs of variables.
[Figure: fully connected graph over v_1 … v_4] An Ising model explains the correlations by direct couplings.
[Figure: v_1 … v_4 all connected to a single hidden unit h_1] An RBM explains the correlations by a common input.
Sampling from RBM
[Figure: alternating Gibbs sampling chain v^{(0)} → h^{(0)} → v^{(1)} → h^{(1)} → v^{(2)} → …]
Extract features from data (input layer → hidden layer):
• Compute the hidden layer inputs: x_\mu = \sum_i W_{\mu,i} v_i
• Sample each hidden unit independently: p(h_\mu | x_\mu) \propto e^{-U_\mu(h_\mu) + h_\mu x_\mu}
Reconstruct data from features (hidden layer → input layer):
• Compute the visible layer inputs: y_i = \sum_\mu W_{\mu,i} h_\mu
• Sample each visible unit independently: p(v_i | y_i) \propto e^{v_i (g_i + y_i)}
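The two half-steps above can be sketched in a few lines of numpy. A minimal version assuming Bernoulli (binary) hidden units, so that both conditionals reduce to logistic sampling; sizes and values are illustrative:

```python
import numpy as np

# One alternating Gibbs step for an RBM with binary hidden units.
# W: (K, N) couplings, g: (N,) visible fields, theta: (K,) hidden fields.
def gibbs_step(v, W, g, theta, rng):
    x = W @ v                                    # hidden inputs x_mu = sum_i W_{mu,i} v_i
    p_h = 1.0 / (1.0 + np.exp(-(x + theta)))     # p(h_mu = 1 | x_mu)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    y = W.T @ h                                  # visible inputs y_i = sum_mu W_{mu,i} h_mu
    p_v = 1.0 / (1.0 + np.exp(-(y + g)))         # p(v_i = 1 | y_i)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))                      # K = 2 hidden, N = 3 visible
v = np.array([1.0, 0.0, 1.0])
v, h = gibbs_step(v, W, np.zeros(3), np.zeros(2), rng)
```

Iterating gibbs_step yields the chain v^{(0)} → h^{(0)} → v^{(1)} → … shown in the figure.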
The hidden units' potentials
Gaussian units: h_\mu \in \mathbb{R}, \; U_\mu(h_\mu) = \frac{h_\mu^2}{2}
H_{eff}[v] = -\sum_i g_i v_i - \frac{1}{2} \sum_\mu \Big( \sum_i W_{\mu,i} v_i \Big)^2
Gaussian RBMs learn pairwise couplings (equivalent to the Hopfield model).
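The quadratic H_eff follows from the Gaussian integral \int dh \, e^{-h^2/2 + hx} = \sqrt{2\pi} \, e^{x^2/2}, which can be checked numerically (the value of x is an arbitrary test input):

```python
import numpy as np

# Numerical check of the Gaussian marginalization behind H_eff:
# integral over h of exp(-h^2/2 + h*x) equals sqrt(2*pi) * exp(x^2/2),
# so each Gaussian hidden unit contributes -(1/2) (sum_i W_{mu,i} v_i)^2.
x = 1.3                                  # hidden-unit input (arbitrary test value)
h = np.linspace(-20.0, 20.0, 400001)
dh = h[1] - h[0]
integral = np.sum(np.exp(-h**2 / 2 + h * x)) * dh
closed_form = np.sqrt(2 * np.pi) * np.exp(x**2 / 2)
```

Summing the x^2/2 contributions over all hidden units gives the pairwise (Hopfield-like) term of H_eff.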
Non-Gaussian units:
In general, the effective Hamiltonian is not quadratic and has higher-order couplings. Examples:
• Bernoulli
• ReLU+
Different potentials correspond to different transfer functions.
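The transfer function is the mean hidden activation ⟨h | x⟩ as a function of the input x; it can be computed numerically for each potential. A sketch (the threshold theta, the test input x, and the grids are illustrative assumptions):

```python
import numpy as np

# Transfer function <h | x> (mean hidden activation given input x) for three
# potentials: Gaussian (linear), ReLU+ (soft-thresholded), Bernoulli (logistic).
def mean_h(x, U, h_grid):
    w = np.exp(-U(h_grid) + h_grid * x)          # Boltzmann weight of h given x
    return np.sum(h_grid * w) / np.sum(w)

h_real = np.linspace(-30.0, 30.0, 60001)         # grid for Gaussian units, h in R
h_pos = np.linspace(0.0, 30.0, 30001)            # grid for ReLU+ units, h >= 0
theta, x = 1.0, 2.0                               # illustrative threshold and input

gauss = mean_h(x, lambda h: h**2 / 2, h_real)             # linear: <h|x> = x
relu = mean_h(x, lambda h: h**2 / 2 + theta * h, h_pos)   # soft-thresholded linear
bern = 1.0 / (1.0 + np.exp(-(x - theta)))                 # logistic, U(1) = theta
```

The non-linear (non-Gaussian) cases suppress weak inputs, which is what produces the higher-order couplings in H_eff.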
RBMs learn data representations
Given a configuration (v_1, …, v_N), the hidden-unit activations (h_1, …, h_K) define a representation of the data. Learning representations is a crucial task:
• In neuroscience: sensory information processing (e.g., from sensors to cortical areas).
• In machine learning: the success of learning algorithms depends on data representations. A good representation-learning algorithm extracts features that vary across many different neighboring regions of the input space.
The success of deep learning algorithms lies in their ability to learn abstract representations. The RBM is one of the simplest representation-learning algorithms that can be studied.
Example: MNIST synthetic digits
ReLU+ RBM, K = 400. Log-likelihood: -63 nats.
Gaussian RBM, K = 400. Log-likelihood: -83 nats.
MNIST: 60,000 images of digits of size 28×28.
Learning algorithm: PCD, PT (Tieleman 2008, Desjardins 2010).
MNIST distributed representation
A subset of the features learnt by a ReLU+ RBM, K = 400. Each image represents a weight vector W_\mu.
Each generated handwritten digit image is composed by superposing about 20 elementary strokes.
Different combinations of strokes produce different variants of the same digit.
Phenomenology of learning
Key metrics monitored during learning:
• Log-likelihood: increases. \langle \log p(x|\theta) \rangle_{validation}
• Weight amplitude: increases. \langle \sum_i W_{\mu,i}^2 \rangle_\mu
• Weight sparsity: the weights become sparser. p = fraction of non-zero couplings.
• Average number of active hidden units: increases (after a transient). L = number of active hidden units.
Learning algorithms: PCD & PT (Tieleman 2008, Desjardins 2010).
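The metrics above can all be computed from the weight matrix and a batch of hidden activations. A minimal sketch; the toy W and h below are random placeholders (not trained), and the non-zero cutoff eps is an assumption:

```python
import numpy as np

# Sketch of the weight/activity metrics monitored during learning, computed
# from a (K, N) weight matrix W and a batch of hidden activations h (B, K).
def weight_amplitude(W):
    # mean over hidden units mu of sum_i W_{mu,i}^2
    return np.mean(np.sum(W**2, axis=1))

def weight_sparsity(W, eps=1e-3):
    # p = fraction of non-negligible couplings
    return np.mean(np.abs(W) > eps)

def mean_active_hidden(h, eps=1e-3):
    # average number of strongly activated hidden units per sample
    return np.mean(np.sum(h > eps, axis=1))

rng = np.random.default_rng(1)
W = rng.normal(size=(400, 784)) * 0.01            # toy weights (not trained)
h = np.maximum(rng.normal(size=(10, 400)), 0.0)   # toy ReLU+ activations
metrics = (weight_amplitude(W), weight_sparsity(W), mean_active_hidden(h))
```

Tracking these three numbers over training epochs reproduces the curves summarized on this slide.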
Questions:
• How can a bipartite network generate such data?
• Why do some RBMs work and others don't?
• What mechanism produces distributed representations?
Statistical physics approach: study the properties of a random RBM with prescribed control parameters, and the different phases (the outcome of the algorithm).
Random-weight RBM model
Multitasking associative networks: Agliari et al., PRL 2012.
N → ∞ visible units, K = αN hidden units.
Sparse random weights:
W_{\mu,i} = 0 with probability 1 - p, +1 with probability p/2, -1 with probability p/2.
Visible units: v_i \in \{0, 1\}, field v.
Hidden units: h_\mu \in \mathbb{R}^+ (ReLU+), threshold h.
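Drawing the sparse random weight matrix of this model is straightforward; a sketch (the sizes K, N and the sparsity p are illustrative values):

```python
import numpy as np

# Sketch: draw the sparse random weights of the model,
# W_{mu,i} in {0, +1, -1} with probabilities (1 - p, p/2, p/2).
def sparse_random_weights(K, N, p, rng):
    return rng.choice([0.0, 1.0, -1.0], size=(K, N), p=[1.0 - p, p / 2, p / 2])

rng = np.random.default_rng(0)
W = sparse_random_weights(K=200, N=1000, p=0.05, rng=rng)
frac = np.mean(W != 0)   # empirical sparsity, close to p
```

The sparsity p, together with α = K/N, the visible field, and the hidden threshold, are the control parameters of the phase diagram below.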
The Phases
Glassy phase:
• All the hidden units are weakly activated.
• All states have weak probability.
Compositional phase:
• Several hidden units are strongly activated, and the others are quiet.
• The number of regions with high probability is polynomial in N.
Ferromagnetic phase:
• One hidden unit is strongly activated and the others are weakly activated.
• The number of regions with high probability is linear in N.
[Phase diagram over the control parameters: h ↑, p ↓, α ↑]
Stat mechs of random RBM
• A replica theory computation is performed to estimate the free energy in the zero-temperature limit:
F(\alpha, p, v, h, L)
with: N = number of visible units, \alpha N = number of hidden units, p = weight sparsity, v = fields on the visible units, h = hidden-unit threshold, L = number of active hidden units.
• The optimal number of active hidden units scales as L^\star \propto 1/p.
Validation on MNIST
Numerical experiment: ReLU+ RBMs are trained with a range of L1-like regularizations that control the weight-matrix sparsity.
As the weights get sparser, the number of active hidden units increases.
Conclusions
• Distributed encoding relies on two key properties:
– Non-quadratic potentials (i.e., non-linear transfer functions). They denoise the hidden-layer inputs, allowing for better stability.
– Weight sparsity allows for the activation of many hidden units that detect complementary features. The combinatorics creates a rich output distribution.
• Future:
– Dynamics of learning
– Deep models
Acknowledgements
• Funding:
– École Normale Supérieure
– CNRS: Inphyniti Challenge
• Discussions: A. Dubreuil, L. Posani