Fast Computation of Uncertainty in Deep Learning
Mohammad Emtiyaz Khan, RIKEN Center for AI Project, Tokyo, Japan
Joint work with Wu Lin (UBC), Didrik Nielsen (RIKEN), Voot Tangkaratt (RIKEN), Yarin Gal (University of Oxford), Akash Srivastava (University of Edinburgh), and Zuozhu Liu (SUTD, Singapore)
Uncertainty
Uncertainty quantifies the confidence in the prediction of a model, i.e., how much the model does not know.
Example: Which is a Better Fit?
[Figure: histogram of earthquake frequency vs. magnitude, with two candidate fits: blue (57%) and red (43%)]
Real data from Tohoku (Japan). Example taken from Nate Silver's book "The Signal and the Noise".
Uncertainty matters most when the data is scarce and noisy, e.g., in medicine and in robotics.
Outline of the Talk
• Uncertainty is important, e.g., when data are scarce, missing, or unreliable.
• Uncertainty computation is difficult, due to the large models and datasets used in deep learning.
• This talk: fast computation of uncertainty via Bayesian deep learning, with methods that are extremely easy to implement.
Uncertainty in Deep Learning
Why is it difficult to estimate?
A Naïve Method

Generate parameters θ from a prior distribution and pass the data through the neural network:

θ ~ p(θ),    p(D|θ) = ∏_{i=1}^N p(y_i | f_θ(x_i))

Here θ are the parameters, D is the data, and f_θ is the neural network mapping inputs to outputs.
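The naïve method can be sketched in a few lines. Everything below is a toy illustration, not from the slides: `f_theta` is a hypothetical linear "network" with a sigmoid output, and the likelihood is Bernoulli for binary labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N inputs with binary labels (hypothetical, for illustration).
N, D = 30, 2
X = rng.normal(size=(N, D))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def f_theta(theta, X):
    """A minimal 'network': a linear model followed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-X @ theta))

def log_likelihood(theta):
    """log p(D|theta) = sum_i log p(y_i | f_theta(x_i)), Bernoulli likelihood."""
    p = f_theta(theta, X)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Naive method: draw parameters from the prior theta ~ N(0, I)
# and score each sample by its likelihood.
samples = rng.normal(size=(100, D))
scores = np.array([log_likelihood(th) for th in samples])
print("best prior sample:", samples[scores.argmax()])
```

This blind prior sampling scales terribly with dimension, which is why the talk moves to inference methods that use gradients.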
Bayesian Inference

Bayes' rule:

p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ

The integral in the denominator is intractable for deep networks. The resulting posterior distribution can be narrow or wide, reflecting how certain the model is.
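In one dimension the normalizing integral can still be computed on a grid, which makes the mechanics of Bayes' rule concrete. The model below (prior N(0, 1), observations with unit noise) is a made-up conjugate example, chosen so the exact posterior mean (0.7) is known:

```python
import numpy as np

# 1-D toy problem: prior N(0, 1), observations d_i ~ N(theta, 1).
data = np.array([0.8, 1.1, 0.9])
theta = np.linspace(-5, 5, 2001)   # grid over the parameter
dx = theta[1] - theta[0]

log_prior = -0.5 * theta**2
log_lik = sum(-0.5 * (d - theta) ** 2 for d in data)
unnorm = np.exp(log_prior + log_lik)   # p(D|theta) p(theta), up to constants

# The denominator of Bayes' rule: integral of p(D|theta) p(theta) d theta.
Z = np.sum(unnorm) * dx
posterior = unnorm / Z                 # p(theta|D) on the grid
post_mean = np.sum(theta * posterior) * dx
print("posterior mean:", post_mean)
```

The grid has 2001 points in 1-D; for a network with millions of parameters the same grid would need 2001^millions points, which is why the integral is intractable.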
Approximate Bayesian Inference

Variational Inference: approximate the posterior by a Gaussian distribution with mean μ and variance σ²:

min_{μ,σ²} D_KL[ N(θ|μ, σ²) ‖ p(θ|D) ]

Optimize using gradient methods (SGD/Adam): Bayes by Backprop (Blundell et al., 2015), Practical VI (Graves, 2011), Black-box VI (Ranganath et al., 2014), and many more. These methods are computation- and memory-intensive, and require substantial implementation effort.
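The gradient-based VI recipe can be sketched on the same 1-D conjugate toy model as above (exact posterior N(0.7, 0.25)), using the reparameterization trick in the style of Bayes by Backprop; this is an illustrative assumption, not the slides' deep-net setting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: prior N(0, 1), observations d_i ~ N(theta, 1),
# so the exact posterior is N(0.7, 0.25) and the answer can be checked.
data = np.array([0.8, 1.1, 0.9])

def grad_log_joint(theta):
    # d/dtheta [log p(theta) + log p(D|theta)] = -theta + sum_i (d_i - theta)
    return -theta + (data.sum() - data.size * theta)

# Variational parameters of q(theta) = N(theta | mu, sigma^2).
mu, log_sigma = 0.0, 0.0
lr = 0.01
for step in range(2000):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=10)        # reparameterization: theta = mu + sigma*eps
    g = grad_log_joint(mu + sigma * eps)
    mu += lr * g.mean()                                 # ascend the ELBO in mu
    log_sigma += lr * ((g * eps).mean() * sigma + 1.0)  # + 1 from the entropy term
```

Note the extra bookkeeping even in 1-D: a second variational parameter, Monte Carlo samples per step, and a chain rule through σ. This is the implementation effort that Vadam removes.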
Fast Computation of (Approximate) Uncertainty

Approximate the posterior by a Gaussian distribution, and find it by "perturbing" the parameters during backpropagation.
Fast Computation of Uncertainty

Adaptive learning-rate methods (e.g., Adam), for a model with likelihood ∏_{i=1}^N p(y_i | f_θ(x_i)) and prior θ ~ N(θ|0, I):
1. Select a minibatch.
2. Compute the gradient using backpropagation.
3. Compute a scale vector to adapt the learning rate.
4. Take a gradient step:

θ ← θ + learning_rate * gradient / (√scale + 10⁻⁸)
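The four steps can be sketched as follows. The sketch minimizes the negative log-likelihood (so the update uses a minus sign), uses logistic regression on made-up data as a stand-in for the network, and omits Adam's momentum and bias correction for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: logistic regression stands in for the neural net f_theta.
N, D = 100, 2
X = rng.normal(size=(N, D))
y = (X @ np.array([1.0, -1.0]) > 0).astype(float)

def grad_nll(theta, Xb, yb):
    """Minibatch gradient of the negative log-likelihood (the 'backprop' step)."""
    p = 1.0 / (1.0 + np.exp(-Xb @ theta))
    return Xb.T @ (p - yb) / len(yb)

theta = np.zeros(D)
scale = np.zeros(D)                 # moving average of squared gradients
lr, beta = 0.01, 0.999
for step in range(1000):
    idx = rng.choice(N, size=10)                  # 1. select a minibatch
    g = grad_nll(theta, X[idx], y[idx])           # 2. gradient via backprop
    scale = beta * scale + (1 - beta) * g**2      # 3. scale vector
    theta -= lr * g / (np.sqrt(scale) + 1e-8)     # 4. gradient step
```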
Fast Computation of Uncertainty

Variational Adam (Vadam) modifies the steps above, for the same likelihood ∏_{i=1}^N p(y_i | f_θ(x_i)) and prior θ ~ N(θ|0, I):
0. Sample ε from a standard normal distribution and perturb the parameters:
   θ_temp ← θ + ε / √(N * scale + 1)
1. Select a minibatch.
2. Compute the gradient at θ_temp using backpropagation.
3. Compute a scale vector to adapt the learning rate.
4. Take a gradient step:
   θ ← θ + learning_rate * (gradient + θ/N) / (√scale + 1/N)
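A minimal sketch of these steps, under the same illustrative assumptions as the Adam sketch (logistic regression on made-up data in place of the network, descent on the negative log-likelihood, no momentum or bias correction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: logistic regression stands in for the neural net f_theta.
N, D = 100, 2
X = rng.normal(size=(N, D))
y = (X @ np.array([1.0, -1.0]) > 0).astype(float)

def grad_nll(theta, Xb, yb):
    """Minibatch gradient of the negative log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-Xb @ theta))
    return Xb.T @ (p - yb) / len(yb)

theta = np.zeros(D)
scale = np.ones(D)
lr, beta = 0.01, 0.999
for step in range(2000):
    eps = rng.normal(size=D)                       # 0. sample a perturbation
    theta_temp = theta + eps / np.sqrt(N * scale + 1.0)
    idx = rng.choice(N, size=10)                   # 1. select a minibatch
    g = grad_nll(theta_temp, X[idx], y[idx])       # 2. gradient at theta_temp
    scale = beta * scale + (1 - beta) * g**2       # 3. scale vector
    # 4. gradient step; the theta/N term comes from the N(0, I) prior
    theta -= lr * (g + theta / N) / (np.sqrt(scale) + 1.0 / N)

# Posterior samples come for free from the learned Gaussian.
post_samples = theta[None, :] + rng.normal(size=(5, D)) / np.sqrt(N * scale + 1.0)
```

The diff against plain Adam is tiny: one perturbation line, the θ/N prior term, and 1/N in place of 10⁻⁸, which is the sense in which the method is "extremely easy to implement".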
Illustration: Classification

Logistic regression (30 data points, 2-dimensional input). The data is sampled from a Gaussian mixture with 2 components.
Adam vs. Vadam

For both algorithms: minibatch size of 5, learning rate = 0.01, prior precision = 0.01.

[Figure: decision boundaries for Adam, the Vadam mean, and Vadam posterior samples]
Why does this work?
• The algorithm is obtained by replacing "gradients" with "natural gradients"; see our ICML 2018 paper.
• The scaling in the natural gradient is related to the scaling in Newton's method.
• An approximation to the Hessian results in Adam.
• Some caveats: choose small minibatches; better results are obtained with VOGN.
Faster, Simpler, and More Robust

Regression on the Australian-Scale dataset using deep neural nets, for various minibatch sizes.

[Figure: comparison of the existing method (BBVI) with our methods (Vadam and VOGN)]
Faster, Simpler, and More Robust

Results on MNIST digit classification, for various values of the Gaussian prior precision parameter λ.

[Figure: comparison of the existing method (BBVI) with our method (Vadam)]
Deep Reinforcement Learning

No exploration (SGD): reward = 2860. Exploration using Vadam: reward = 5264.
Reduce Overfitting with Vadam

Vadam shows consistent train-test performance, while Adam overfits when N is small. BNN classification on the a1a–a9a datasets.

[Figure: train and test curves for Adam and Vadam on each dataset; the Adam train and test curves separate, while the Vadam curves coincide]
Avoiding Local Minima

An example taken from Casella and Robert's book: Vadam reaches the flat minimum, while gradient descent gets stuck in a local minimum.

Related ideas: optimization by smoothing, Gaussian homotopy/blurring, Entropy-SGD, SGLD, etc.
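The smoothing intuition can be seen numerically: averaging a loss over Gaussian perturbations, as Vadam's weight perturbation does implicitly, suppresses a sharp minimum while preserving a broad one. The 1-D loss below is made up for illustration (it is not the Casella and Robert example):

```python
import numpy as np

def loss(x):
    """A 1-D loss with a sharp global minimum near x = 2 and a broad,
    flat minimum near x = -2 (hypothetical example)."""
    return 0.05 * (x + 2) ** 2 - 1.5 * np.exp(-20 * (x - 2) ** 2)

x = np.linspace(-6, 6, 1201)

def smoothed(sigma, n=4000, seed=0):
    """Monte Carlo estimate of E[loss(x + sigma * eps)], eps ~ N(0, 1)."""
    eps = np.random.default_rng(seed).normal(size=n)
    return np.array([loss(xi + sigma * eps).mean() for xi in x])

sharp = loss(x)
blurred = smoothed(sigma=1.0)
print("minimizer of raw loss:", x[sharp.argmin()])
print("minimizer of smoothed loss:", x[blurred.argmin()])
```

After blurring, the narrow spike near x = 2 contributes little and the minimizer moves to the flat basin near x = -2, mirroring how the perturbed objective steers Vadam toward flat minima.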
Summary
• Uncertainty is important, especially when the data is scarce, missing, or unreliable.
• We can obtain uncertainty cheaply with very little effort via Bayesian deep learning.
• It works reasonably well on our benchmarks.
Open Questions
• Quality of uncertainty estimates: applications to life science? Check out the "Bayesian deep learning" workshop at NIPS 2018.
• Estimating various types of uncertainty: model uncertainty vs. data uncertainty; applications play a big role here.
• Is uncertainty in deep learning useful? Multiple local minima make it difficult to establish.
References

https://emtiyaz.github.io

Thanks!