
Post on 29-Sep-2020


Fast Computation of Uncertainty in Deep Learning

Mohammad Emtiyaz Khan, RIKEN Center for AI Project, Tokyo, Japan

Joint work with Wu Lin (UBC), Didrik Nielsen (RIKEN), Voot Tangkaratt (RIKEN), Yarin Gal (University of Oxford), Akash Srivastava (University of Edinburgh), and Zuozhu Liu (SUTD, Singapore)

Uncertainty

Quantifies the confidence in the prediction of a model, i.e., how much it does not know.

Example: Which is a Better Fit?

[Poll over a plot of frequency vs. magnitude of earthquake, with two candidate fits: Blue (57%) and Red (43%)]

Real data from Tohoku (Japan). Example taken from Nate Silver's book "The Signal and the Noise".

Example: Which is a Better Fit?

[Same plot of frequency vs. magnitude of earthquake]

Uncertainty matters when the data is scarce and noisy, e.g., in medicine and robotics.

Outline of the Talk

• Uncertainty is important – e.g., when data are scarce, missing, unreliable, etc.

• Uncertainty computation is difficult – due to the large models and data used in deep learning.

• This talk: fast computation of uncertainty – Bayesian deep learning, with methods that are extremely easy to implement.

Uncertainty in Deep Learning

Why is it difficult to estimate?

A Naïve Method

Generate the parameters θ of a neural network f_θ (with inputs x_i and outputs y_i) from a prior distribution, and evaluate the likelihood of the data:

    θ ~ p(θ)

    p(D | θ) = ∏_{i=1}^{N} p(y_i | f_θ(x_i))
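This generative view can be sketched in a few lines: a prior-predictive sample for a toy one-hidden-layer network. The architecture, sizes, and Gaussian output noise here are illustrative assumptions, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(theta, x):
    """Toy neural network f_theta: 1 input -> 3 tanh units -> 1 output."""
    w1, b1, w2, b2 = theta[:3], theta[3:6], theta[6:9], theta[9]
    h = np.tanh(np.outer(x, w1) + b1)   # hidden activations, shape (n, 3)
    return h @ w2 + b2                  # network outputs, shape (n,)

x = np.linspace(-1, 1, 5)                       # inputs x_i
theta = rng.standard_normal(10)                 # theta ~ p(theta) = N(0, I)
y = f(theta, x) + 0.1 * rng.standard_normal(5)  # y_i ~ p(y_i | f_theta(x_i))
print(y)
```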

Bayesian Inference

Bayes' rule gives the posterior distribution (which may be narrow or wide):

    p(θ | D) = p(D | θ) p(θ) / ∫ p(D | θ) p(θ) dθ

The integral in the denominator is intractable.
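To see why the denominator is the hard part, here is a sketch of a setting where it is tractable: a single-parameter Gaussian model whose evidence integral can be computed numerically on a grid (the model, prior, and data are illustrative assumptions). For a deep network with millions of parameters, no such grid exists.

```python
import numpy as np

data = np.array([0.9, 1.1, 1.3])
grid = np.linspace(-3, 3, 601)                 # candidate values of theta
prior = np.exp(-0.5 * grid**2)                 # p(theta) = N(0, 1), unnormalized
lik = np.prod(np.exp(-0.5 * (data[:, None] - grid)**2), axis=0)  # p(D | theta)
dx = grid[1] - grid[0]
evidence = np.sum(lik * prior) * dx            # the integral in the denominator
posterior = lik * prior / evidence             # p(theta | D), by Bayes' rule
print(grid[np.argmax(posterior)])              # posterior mode
```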

Approximate Bayesian Inference

Variational Inference: approximate the posterior by a Gaussian distribution with mean μ and variance σ²:

    min_{μ, σ²} KL[ N(θ | μ, σ²) ‖ p(θ | D) ]
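A standard identity (not on the slides, but worth recalling) shows why this objective avoids the intractable evidence integral: expanding the KL with Bayes' rule gives

```latex
\mathrm{KL}\big[\,q(\theta)\,\|\,p(\theta\mid\mathcal{D})\,\big]
  = \mathbb{E}_{q}\big[\log q(\theta) - \log p(\mathcal{D}\mid\theta) - \log p(\theta)\big]
  + \log p(\mathcal{D}),
\qquad q(\theta) = \mathcal{N}(\theta\mid\mu,\sigma^2).
```

Since log p(D) does not depend on (μ, σ²), minimizing the KL is equivalent to maximizing the evidence lower bound (ELBO), E_q[log p(D|θ)] − KL[q(θ) ‖ p(θ)], which can be estimated by sampling from q.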

Optimize using gradient methods (SGD/Adam) – Bayes by Backprop (Blundell et al., 2015), Practical VI (Graves, 2011), Black-Box VI (Ranganath et al., 2014), and many more.

This is computation- and memory-intensive, and requires substantial implementation effort.
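For flavor, here is a minimal sketch of reparameterization-based gradient VI, in the spirit of Bayes by Backprop, for a single Gaussian parameter. The toy model, N(0, 1) prior, learning rate, and step count are all illustrative assumptions; real implementations maintain a mean and variance for every network weight, which is where the cost and effort come from.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=50)   # toy dataset; unit-variance Gaussian model
N = len(data)

mu, log_sigma = 0.0, 0.0               # variational parameters: mean, log std
lr = 0.1
for step in range(5000):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal()
    theta = mu + sigma * eps           # reparameterized sample, theta ~ N(mu, sigma^2)
    # gradient of [-log p(D|theta) - log p(theta)] w.r.t. theta, prior N(0, 1)
    g = -np.sum(data - theta) + theta
    mu -= lr / N * g                   # descend the negative ELBO
    log_sigma -= lr / N * (g * sigma * eps - 1.0)  # "-1.0" is the entropy gradient
print(mu, np.exp(log_sigma))
```

At convergence, mu approaches the exact posterior mean sum(data)/(N+1) and sigma approaches the exact posterior standard deviation 1/sqrt(N+1).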

Fast Computation of (Approximate) Uncertainty

Approximate the posterior by a Gaussian distribution, and find it by "perturbing" the parameters during backpropagation.

Fast Computation of Uncertainty

Adaptive learning-rate method (e.g., Adam), for the model with likelihood ∏_{i=1}^{N} p(y_i | f_θ(x_i)) and prior θ ~ N(θ | 0, I):

1. Select a minibatch
2. Compute the gradient using backpropagation
3. Compute a scale vector to adapt the learning rate
4. Take a gradient step:

    θ ← θ + learning_rate * gradient / (√scale + 10⁻⁸)
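The four steps can be sketched as follows on a toy least-squares model, taking gradient-ascent steps on the log-likelihood. There is no momentum or bias correction here, and the dataset, rates, and loop length are illustrative assumptions rather than the talk's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(100)

theta = np.zeros(2)
scale = np.zeros(2)
lr, beta = 0.02, 0.9
for step in range(2000):
    idx = rng.choice(100, size=10, replace=False)        # 1. select a minibatch
    grad = X[idx].T @ (y[idx] - X[idx] @ theta) / 10     # 2. gradient of the log-likelihood
    scale = beta * scale + (1 - beta) * grad**2          # 3. scale vector (moving avg of grad^2)
    theta = theta + lr * grad / (np.sqrt(scale) + 1e-8)  # 4. adaptive gradient step
print(theta)
```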

Fast Computation of Uncertainty

Same model as before; the modified steps are:

0. Sample ε from a standard normal distribution, and perturb the parameters:

    θ_temp ← θ + ε / √(N * scale + 1)

1. Select a minibatch
2. Compute the gradient at θ_temp using backpropagation
3. Compute a scale vector to adapt the learning rate
4. Take a gradient step:

    θ ← θ + learning_rate * (gradient + θ/N) / (√scale + 1/N)

Variational Adam (Vadam)
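A sketch of the modified loop above on the same toy least-squares model. This follows the recipe on the slide under a gradient-ascent sign convention, so the N(0, I) prior enters the step as −θ/N here; the model, rates, and loop length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
X = rng.standard_normal((N, 2))
y = X @ np.array([1.0, -2.0]) + 0.5 * rng.standard_normal(N)

theta = np.zeros(2)                # mean of the Gaussian approximation
scale = np.zeros(2)
lr, beta, M = 0.02, 0.9, 10
for step in range(3000):
    eps = rng.standard_normal(2)                        # 0. sample eps ~ N(0, I)
    theta_t = theta + eps / np.sqrt(N * scale + 1)      #    perturb the parameters
    idx = rng.choice(N, size=M, replace=False)          # 1. select a minibatch
    grad = X[idx].T @ (y[idx] - X[idx] @ theta_t) / M   # 2. gradient at the perturbed point
    scale = beta * scale + (1 - beta) * grad**2         # 3. scale vector
    theta = theta + lr * (grad - theta / N) / (np.sqrt(scale) + 1.0 / N)  # 4. step
sigma = 1.0 / np.sqrt(N * scale + 1)                    # per-weight posterior std estimate
print(theta, sigma)
```

The only changes relative to the plain Adam-style loop are the perturbation in step 0, the prior term in step 4, and the 1/N in the denominator; the uncertainty estimate sigma falls out of the scale vector for free.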

Illustration: Classification

Logistic regression (30 data points, 2-dimensional input), with data sampled from a Gaussian mixture with 2 components.

Adam vs Vadam

For both algorithms: minibatch size of 5, learning_rate = 0.01, prior precision = 0.01.

[Panels: Adam, Vadam (mean), Vadam (samples)]

Why does this work?

• The algorithm is obtained by replacing "gradients" with "natural gradients" – see our ICML 2018 paper.

• The scaling in the natural gradient is related to the scaling in Newton's method.

• An approximation to the Hessian results in Adam.

• Some caveats: choose small minibatches; better results are obtained with VOGN.

Faster, Simpler, and More Robust

Regression on the Australian-Scale dataset using deep neural nets, for various minibatch sizes.

[Curves: existing method (BBVI), our method (Vadam), our method (VOGN)]

Faster, Simpler, and More Robust

Results on MNIST digit classification (for various values of the Gaussian prior precision parameter λ).

[Curves: existing method (BBVI), our method (Vadam)]

Deep Reinforcement Learning

No exploration (SGD): reward = 2860
Exploration using Vadam: reward = 5264

Reduce Overfitting with Vadam

Vadam shows consistent train–test performance, while Adam overfits when N is small.

[BNN classification on the a1a–a9a datasets; panels show Adam's train and test curves diverging, while Vadam's test and train curves coincide]

Avoiding Local Minima

An example taken from Casella and Robert's book: Vadam reaches the flat minimum, but gradient descent gets stuck at a local minimum.

Related ideas: optimization by smoothing, Gaussian homotopy/blurring, Entropy-SGD, SGLD, etc.

Summary

• Uncertainty is important, especially when the data is scarce, missing, unreliable, etc.

• We can obtain uncertainty cheaply and with very little effort – Bayesian deep learning.

• It works reasonably well on our benchmarks.

Open Questions

• Quality of uncertainty estimates – application to life science? Check out the "Bayesian deep learning" workshop at NIPS 2018.

• Estimating various types of uncertainty – model uncertainty vs. data uncertainty; applications play a big role here.

• Is uncertainty in deep learning useful? Multiple local minima make it difficult to establish.

References

https://emtiyaz.github.io

Thanks!