Jérôme Tubiana, Rémi Monasson
Laboratoire de Physique Théorique, École Normale Supérieure
Motivation
GoogleNet Deep Neural Network
Why does this network work? (and not others!)
Restricted Boltzmann Machines
Ackley, Hinton & Sejnowski 1985; Smolensky 1986; Hinton 2002
[Figure: bipartite graph — visible layer (binary r.v.) v_1, v_2, v_3; hidden layer h_1, h_2; couplings W_{\mu,i}]
• Graphical model constituted by two sets of random variables that are coupled together:
P(v, h) = \frac{1}{Z} e^{-E(v, h)}
E(v, h) = -\sum_{i=1}^{N} g_i v_i + \sum_{\mu=1}^{K} U_\mu(h_\mu) - \sum_{\mu, i} W_{\mu,i} v_i h_\mu
(g_i: fields on the visible units; U_\mu: hidden-unit potentials.)
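As a concrete illustration, the energy function above can be evaluated directly. A minimal numpy sketch, assuming Bernoulli hidden units with the common linear choice U_\mu(h_\mu) = -\theta_\mu h_\mu (the sizes, fields g, thresholds theta, and all numerical values are illustrative, not from the slides):

```python
import numpy as np

# Sketch: evaluate E(v, h) for a tiny RBM.
# Assumes Bernoulli hidden units with U_mu(h_mu) = -theta_mu * h_mu.
def energy(v, h, g, theta, W):
    # E(v,h) = -sum_i g_i v_i + sum_mu U_mu(h_mu) - sum_{mu,i} W_{mu,i} v_i h_mu
    return -g @ v - theta @ h - h @ (W @ v)

rng = np.random.default_rng(0)
N, K = 3, 2                      # illustrative sizes
W = rng.normal(size=(K, N))      # couplings W_{mu,i}
g = np.zeros(N)                  # visible fields (zero here for simplicity)
theta = np.zeros(K)              # hidden fields
v = np.array([1.0, 0.0, 1.0])
h = np.array([1.0, 0.0])
E = energy(v, h, g, theta, W)
```

With zero fields, only the coupling term -\sum_{\mu,i} W_{\mu,i} v_i h_\mu contributes.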
• RBMs learn a probability distribution over the visible layer:
P(v) = \int \prod_{\mu=1}^{K} dh_\mu \, P(v, h) = \frac{1}{Z} e^{-H_{eff}(v)}
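The marginalization can be checked by brute force on a tiny RBM. A sketch assuming Bernoulli (binary) hidden units, so the integral becomes a sum; all sizes and parameter values are illustrative:

```python
import numpy as np
from itertools import product

# Brute-force check on a tiny RBM with binary hidden units: the marginal over
# the visible layer is P(v) = (1/Z) * sum_h exp(-E(v, h)).
rng = np.random.default_rng(0)
N, K = 3, 2                          # illustrative sizes
W = rng.normal(size=(K, N))
g = rng.normal(size=N)               # visible fields
theta = rng.normal(size=K)           # hidden fields, U_mu(h) = -theta_mu * h (assumed)

def boltzmann_weight(v, h):
    E = -g @ v - theta @ h - h @ (W @ v)
    return np.exp(-E)

vs = [np.array(c, float) for c in product([0, 1], repeat=N)]
hs = [np.array(c, float) for c in product([0, 1], repeat=K)]
Z = sum(boltzmann_weight(v, h) for v in vs for h in hs)
P_v = {tuple(v): sum(boltzmann_weight(v, h) for h in hs) / Z for v in vs}
```

The resulting P_v is a proper distribution over the 2^N visible configurations.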
Vanilla example
• Suppose a vector of four binary random variables (v_1, v_2, v_3, v_4).
• Observe strong correlation between all pairs of variables.
[Figure: fully connected graph over v_1 … v_4] An Ising model explains the correlations by direct couplings.
[Figure: v_1 … v_4 all connected to a single hidden unit h_1] An RBM explains the correlations by a common input.
Sampling from RBM
[Figure: alternating Gibbs sampling chain v^{(0)} → h^{(0)} → v^{(1)} → h^{(1)} → v^{(2)} → …]
Extract features from data (input layer → hidden layer):
• Compute the hidden layer inputs: x_\mu = \sum_i W_{\mu,i} v_i
• Sample each hidden unit independently: p(h_\mu | x_\mu) \propto e^{-U_\mu(h_\mu) + h_\mu x_\mu}
Reconstruct data from features (hidden layer → input layer):
• Compute the visible layer inputs: y_i = \sum_\mu W_{\mu,i} h_\mu
• Sample each visible unit independently: p(v_i | y_i) \propto e^{v_i (g_i + y_i)}
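The two half-steps above can be sketched in a few lines of numpy. A minimal version assuming Bernoulli (binary) hidden units, so that both conditionals reduce to logistic sampling; sizes and values are illustrative:

```python
import numpy as np

# One alternating Gibbs step for an RBM with binary hidden units.
# W: (K, N) couplings, g: (N,) visible fields, theta: (K,) hidden fields.
def gibbs_step(v, W, g, theta, rng):
    x = W @ v                                    # hidden inputs x_mu = sum_i W_{mu,i} v_i
    p_h = 1.0 / (1.0 + np.exp(-(x + theta)))     # p(h_mu = 1 | x_mu)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    y = W.T @ h                                  # visible inputs y_i = sum_mu W_{mu,i} h_mu
    p_v = 1.0 / (1.0 + np.exp(-(y + g)))         # p(v_i = 1 | y_i)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))                      # K = 2 hidden, N = 3 visible
v = np.array([1.0, 0.0, 1.0])
v, h = gibbs_step(v, W, np.zeros(3), np.zeros(2), rng)
```

Iterating gibbs_step yields the chain v^{(0)} → h^{(0)} → v^{(1)} → … shown in the figure.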
The hidden units' potentials
Gaussian units: h_\mu \in \mathbb{R}, \; U_\mu(h_\mu) = \frac{h_\mu^2}{2}
H_{eff}[v] = -\sum_i g_i v_i - \frac{1}{2} \sum_\mu \Big( \sum_i W_{\mu,i} v_i \Big)^2
Gaussian RBMs learn pairwise couplings (equivalent to the Hopfield model).
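The quadratic H_eff follows from the Gaussian integral \int dh \, e^{-h^2/2 + hx} = \sqrt{2\pi} \, e^{x^2/2}, which can be checked numerically (the value of x is an arbitrary test input):

```python
import numpy as np

# Numerical check of the Gaussian marginalization behind H_eff:
# integral over h of exp(-h^2/2 + h*x) equals sqrt(2*pi) * exp(x^2/2),
# so each Gaussian hidden unit contributes -(1/2) (sum_i W_{mu,i} v_i)^2.
x = 1.3                                  # hidden-unit input (arbitrary test value)
h = np.linspace(-20.0, 20.0, 400001)
dh = h[1] - h[0]
integral = np.sum(np.exp(-h**2 / 2 + h * x)) * dh
closed_form = np.sqrt(2 * np.pi) * np.exp(x**2 / 2)
```

Summing the x^2/2 contributions over all hidden units gives the pairwise (Hopfield-like) term of H_eff.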
Non-Gaussian units:
In general, the effective Hamiltonian is not quadratic and has higher-order couplings. Examples:
• Bernoulli
• ReLU+
Different potentials correspond to different transfer functions.
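The transfer function is the mean hidden activation ⟨h | x⟩ as a function of the input x; it can be computed numerically for each potential. A sketch (the threshold theta, the test input x, and the grids are illustrative assumptions):

```python
import numpy as np

# Transfer function <h | x> (mean hidden activation given input x) for three
# potentials: Gaussian (linear), ReLU+ (soft-thresholded), Bernoulli (logistic).
def mean_h(x, U, h_grid):
    w = np.exp(-U(h_grid) + h_grid * x)          # Boltzmann weight of h given x
    return np.sum(h_grid * w) / np.sum(w)

h_real = np.linspace(-30.0, 30.0, 60001)         # grid for Gaussian units, h in R
h_pos = np.linspace(0.0, 30.0, 30001)            # grid for ReLU+ units, h >= 0
theta, x = 1.0, 2.0                               # illustrative threshold and input

gauss = mean_h(x, lambda h: h**2 / 2, h_real)             # linear: <h|x> = x
relu = mean_h(x, lambda h: h**2 / 2 + theta * h, h_pos)   # soft-thresholded linear
bern = 1.0 / (1.0 + np.exp(-(x - theta)))                 # logistic, U(1) = theta
```

The non-linear (non-Gaussian) cases suppress weak inputs, which is what produces the higher-order couplings in H_eff.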
RBMs learn data representations
Given a configuration (v_1, …, v_N), the hidden-unit activations (h_1, …, h_K) define a representation of the data. Learning representations is a crucial task:
• In neuroscience: sensory information processing (e.g., from sensors to cortical areas).
• In machine learning: the success of learning algorithms depends on data representations. A good representation-learning algorithm extracts features that vary across many different neighboring regions of the input space.
The success of deep learning algorithms lies in their ability to learn abstract representations. The RBM is one of the simplest representation-learning algorithms that can be studied.
Example: MNIST synthetic digits
ReLU+ RBM, K = 400. Log-likelihood: -63 nats.
Gaussian RBM, K = 400. Log-likelihood: -83 nats.
MNIST: 60,000 images of digits of size 28×28.
Learning algorithm: PCD, PT (Tieleman 2008, Desjardins 2010).
MNIST distributed representation
A subset of the features learnt by a ReLU+ RBM, K = 400. Each image represents a weight vector W_\mu.
Each generated handwritten digit image is composed by superposing about 20 elementary strokes.
Different combinations of strokes produce different variants of the same digit.
Phenomenology of learning
Key metrics monitored during learning:
• Log-likelihood: increases. \langle \log p(x|\theta) \rangle_{validation}
• Weight amplitude: increases. \langle \sum_i W_{\mu,i}^2 \rangle_\mu
• Weight sparsity: the weights become sparser. p = fraction of non-zero couplings.
• Average number of active hidden units: increases (after a transient). L = number of active hidden units.
Learning algorithms: PCD & PT (Tieleman 2008, Desjardins 2010).
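The metrics above can all be computed from the weight matrix and a batch of hidden activations. A minimal sketch; the toy W and h below are random placeholders (not trained), and the non-zero cutoff eps is an assumption:

```python
import numpy as np

# Sketch of the weight/activity metrics monitored during learning, computed
# from a (K, N) weight matrix W and a batch of hidden activations h (B, K).
def weight_amplitude(W):
    # mean over hidden units mu of sum_i W_{mu,i}^2
    return np.mean(np.sum(W**2, axis=1))

def weight_sparsity(W, eps=1e-3):
    # p = fraction of non-negligible couplings
    return np.mean(np.abs(W) > eps)

def mean_active_hidden(h, eps=1e-3):
    # average number of strongly activated hidden units per sample
    return np.mean(np.sum(h > eps, axis=1))

rng = np.random.default_rng(1)
W = rng.normal(size=(400, 784)) * 0.01            # toy weights (not trained)
h = np.maximum(rng.normal(size=(10, 400)), 0.0)   # toy ReLU+ activations
metrics = (weight_amplitude(W), weight_sparsity(W), mean_active_hidden(h))
```

Tracking these three numbers over training epochs reproduces the curves summarized on this slide.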
Questions:
• How can a bipartite network generate such data?
• Why do some RBMs work and others don't?
• What mechanism produces distributed representations?
Statistical physics approach: study the properties of a random RBM with prescribed control parameters, and the different phases (the outcome of the algorithm).
Random-weight RBM model
Multitasking associative networks: Agliari et al., PRL 2012.
N → ∞ visible units, K = αN hidden units.
Sparse random weights:
W_{\mu,i} = 0 with probability 1 - p, +1 with probability p/2, -1 with probability p/2.
Visible units: v_i \in \{0, 1\}, field v.
Hidden units: h_\mu \in \mathbb{R}^+ (ReLU+), threshold h.
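Drawing the sparse random weight matrix of this model is straightforward; a sketch (the sizes K, N and the sparsity p are illustrative values):

```python
import numpy as np

# Sketch: draw the sparse random weights of the model,
# W_{mu,i} in {0, +1, -1} with probabilities (1 - p, p/2, p/2).
def sparse_random_weights(K, N, p, rng):
    return rng.choice([0.0, 1.0, -1.0], size=(K, N), p=[1.0 - p, p / 2, p / 2])

rng = np.random.default_rng(0)
W = sparse_random_weights(K=200, N=1000, p=0.05, rng=rng)
frac = np.mean(W != 0)   # empirical sparsity, close to p
```

The sparsity p, together with α = K/N, the visible field, and the hidden threshold, are the control parameters of the phase diagram below.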
The Phases
Glassy phase:
• All the hidden units are weakly activated.
• All states have weak probability.
Compositional phase:
• Several hidden units are strongly activated, and the others are quiet.
• The number of regions with high probability is polynomial in N.
Ferromagnetic phase:
• One hidden unit is strongly activated and the others are weakly activated.
• The number of regions with high probability is linear in N.
[Phase diagram over the control parameters: h ↑, p ↓, α ↑]
Stat mechs of random RBM
• A replica theory computation is performed to estimate the free energy in the zero-temperature limit:
F(\alpha, p, v, h, L)
with: N = number of visible units, \alpha N = number of hidden units, p = weight sparsity, v = fields on the visible units, h = hidden-unit threshold, L = number of active hidden units.
• The optimal number of active hidden units scales as L^\star \propto 1/p.
Validation on MNIST
Numerical experiment: ReLU+ RBMs are trained with a range of L1-like regularizations that control the weight-matrix sparsity.
As the weights get sparser, the number of active hidden units increases.
Conclusions
• Distributed encoding relies on two key properties:
– Non-quadratic potentials (i.e., non-linear transfer functions). They denoise the hidden-layer inputs, allowing for better stability.
– Weight sparsity allows for the activation of many hidden units that detect complementary features. The combinatorics creates a rich output distribution.
• Future:
– Dynamics of learning
– Deep models
Acknowledgements
• Funding:
– École Normale Supérieure
– CNRS: Inphyniti Challenge
• Discussions: A. Dubreuil, L. Posani