Fast Computation of Uncertainty in Deep Learning
Mohammad Emtiyaz Khan, RIKEN Center for AI Project, Tokyo, Japan
Joint work with Wu Lin (UBC), Didrik Nielsen (RIKEN), Voot Tangkaratt (RIKEN), Yarin Gal (University of Oxford), Akash Srivastava (University of Edinburgh), and Zuozhu Liu (SUTD, Singapore)
Uncertainty
Uncertainty quantifies the confidence in the prediction of a model, i.e., how much the model does not know.
Example: Which is a Better Fit?
[Figure: histogram of earthquake frequency vs. magnitude, with two candidate fits: blue (57%) and red (43%)]
Real data from Tohoku (Japan). Example taken from Nate Silver's book "The Signal and the Noise".
Uncertainty matters most when the data is scarce and noisy, e.g., in medicine and in robotics.
Outline of the Talk
• Uncertainty is important, e.g., when data are scarce, missing, or unreliable.
• Uncertainty computation is difficult, due to the large models and datasets used in deep learning.
• This talk: fast computation of uncertainty via Bayesian deep learning, with methods that are extremely easy to implement.
Uncertainty in Deep Learning
Why is it difficult to estimate?
A Naïve Method

Generate parameters θ from a prior distribution and pass the data through the neural network:

θ ~ p(θ),    p(D|θ) = ∏_{i=1}^N p(y_i | f_θ(x_i))

Here θ are the parameters, D is the data, and f_θ is the neural network mapping inputs to outputs.
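The naïve method can be sketched in a few lines. Everything below is a toy illustration, not from the slides: `f_theta` is a hypothetical linear "network" with a sigmoid output, and the likelihood is Bernoulli for binary labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N inputs with binary labels (hypothetical, for illustration).
N, D = 30, 2
X = rng.normal(size=(N, D))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def f_theta(theta, X):
    """A minimal 'network': a linear model followed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-X @ theta))

def log_likelihood(theta):
    """log p(D|theta) = sum_i log p(y_i | f_theta(x_i)), Bernoulli likelihood."""
    p = f_theta(theta, X)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Naive method: draw parameters from the prior theta ~ N(0, I)
# and score each sample by its likelihood.
samples = rng.normal(size=(100, D))
scores = np.array([log_likelihood(th) for th in samples])
print("best prior sample:", samples[scores.argmax()])
```

This blind prior sampling scales terribly with dimension, which is why the talk moves to inference methods that use gradients.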
Bayesian Inference

Bayes' rule:

p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ

The integral in the denominator is intractable for deep networks. The resulting posterior distribution can be narrow or wide, reflecting how certain the model is.
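In one dimension the normalizing integral can still be computed on a grid, which makes the mechanics of Bayes' rule concrete. The model below (prior N(0, 1), observations with unit noise) is a made-up conjugate example, chosen so the exact posterior mean (0.7) is known:

```python
import numpy as np

# 1-D toy problem: prior N(0, 1), observations d_i ~ N(theta, 1).
data = np.array([0.8, 1.1, 0.9])
theta = np.linspace(-5, 5, 2001)   # grid over the parameter
dx = theta[1] - theta[0]

log_prior = -0.5 * theta**2
log_lik = sum(-0.5 * (d - theta) ** 2 for d in data)
unnorm = np.exp(log_prior + log_lik)   # p(D|theta) p(theta), up to constants

# The denominator of Bayes' rule: integral of p(D|theta) p(theta) d theta.
Z = np.sum(unnorm) * dx
posterior = unnorm / Z                 # p(theta|D) on the grid
post_mean = np.sum(theta * posterior) * dx
print("posterior mean:", post_mean)
```

The grid has 2001 points in 1-D; for a network with millions of parameters the same grid would need 2001^millions points, which is why the integral is intractable.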
Approximate Bayesian Inference

Variational Inference: approximate the posterior by a Gaussian distribution with mean μ and variance σ²:

min_{μ,σ²} D_KL[ N(θ|μ, σ²) ‖ p(θ|D) ]

Optimize using gradient methods (SGD/Adam): Bayes by Backprop (Blundell et al., 2015), Practical VI (Graves, 2011), Black-box VI (Ranganath et al., 2014), and many more. These methods are computation- and memory-intensive, and require substantial implementation effort.
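The gradient-based VI recipe can be sketched on the same 1-D conjugate toy model as above (exact posterior N(0.7, 0.25)), using the reparameterization trick in the style of Bayes by Backprop; this is an illustrative assumption, not the slides' deep-net setting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: prior N(0, 1), observations d_i ~ N(theta, 1),
# so the exact posterior is N(0.7, 0.25) and the answer can be checked.
data = np.array([0.8, 1.1, 0.9])

def grad_log_joint(theta):
    # d/dtheta [log p(theta) + log p(D|theta)] = -theta + sum_i (d_i - theta)
    return -theta + (data.sum() - data.size * theta)

# Variational parameters of q(theta) = N(theta | mu, sigma^2).
mu, log_sigma = 0.0, 0.0
lr = 0.01
for step in range(2000):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=10)        # reparameterization: theta = mu + sigma*eps
    g = grad_log_joint(mu + sigma * eps)
    mu += lr * g.mean()                                 # ascend the ELBO in mu
    log_sigma += lr * ((g * eps).mean() * sigma + 1.0)  # + 1 from the entropy term
```

Note the extra bookkeeping even in 1-D: a second variational parameter, Monte Carlo samples per step, and a chain rule through σ. This is the implementation effort that Vadam removes.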
Fast Computation of (Approximate) Uncertainty

Approximate the posterior by a Gaussian distribution, and find it by "perturbing" the parameters during backpropagation.
Fast Computation of Uncertainty

Adaptive learning-rate methods (e.g., Adam), for a model with likelihood ∏_{i=1}^N p(y_i | f_θ(x_i)) and prior θ ~ N(θ|0, I):
1. Select a minibatch.
2. Compute the gradient using backpropagation.
3. Compute a scale vector to adapt the learning rate.
4. Take a gradient step:

θ ← θ + learning_rate * gradient / (√scale + 10⁻⁸)
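The four steps can be sketched as follows. The sketch minimizes the negative log-likelihood (so the update uses a minus sign), uses logistic regression on made-up data as a stand-in for the network, and omits Adam's momentum and bias correction for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: logistic regression stands in for the neural net f_theta.
N, D = 100, 2
X = rng.normal(size=(N, D))
y = (X @ np.array([1.0, -1.0]) > 0).astype(float)

def grad_nll(theta, Xb, yb):
    """Minibatch gradient of the negative log-likelihood (the 'backprop' step)."""
    p = 1.0 / (1.0 + np.exp(-Xb @ theta))
    return Xb.T @ (p - yb) / len(yb)

theta = np.zeros(D)
scale = np.zeros(D)                 # moving average of squared gradients
lr, beta = 0.01, 0.999
for step in range(1000):
    idx = rng.choice(N, size=10)                  # 1. select a minibatch
    g = grad_nll(theta, X[idx], y[idx])           # 2. gradient via backprop
    scale = beta * scale + (1 - beta) * g**2      # 3. scale vector
    theta -= lr * g / (np.sqrt(scale) + 1e-8)     # 4. gradient step
```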
Fast Computation of Uncertainty

Variational Adam (Vadam) modifies the steps above, for the same likelihood ∏_{i=1}^N p(y_i | f_θ(x_i)) and prior θ ~ N(θ|0, I):
0. Sample ε from a standard normal distribution and perturb the parameters:
   θ_temp ← θ + ε / √(N * scale + 1)
1. Select a minibatch.
2. Compute the gradient at θ_temp using backpropagation.
3. Compute a scale vector to adapt the learning rate.
4. Take a gradient step:
   θ ← θ + learning_rate * (gradient + θ/N) / (√scale + 1/N)
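A minimal sketch of these steps, under the same illustrative assumptions as the Adam sketch (logistic regression on made-up data in place of the network, descent on the negative log-likelihood, no momentum or bias correction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: logistic regression stands in for the neural net f_theta.
N, D = 100, 2
X = rng.normal(size=(N, D))
y = (X @ np.array([1.0, -1.0]) > 0).astype(float)

def grad_nll(theta, Xb, yb):
    """Minibatch gradient of the negative log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-Xb @ theta))
    return Xb.T @ (p - yb) / len(yb)

theta = np.zeros(D)
scale = np.ones(D)
lr, beta = 0.01, 0.999
for step in range(2000):
    eps = rng.normal(size=D)                       # 0. sample a perturbation
    theta_temp = theta + eps / np.sqrt(N * scale + 1.0)
    idx = rng.choice(N, size=10)                   # 1. select a minibatch
    g = grad_nll(theta_temp, X[idx], y[idx])       # 2. gradient at theta_temp
    scale = beta * scale + (1 - beta) * g**2       # 3. scale vector
    # 4. gradient step; the theta/N term comes from the N(0, I) prior
    theta -= lr * (g + theta / N) / (np.sqrt(scale) + 1.0 / N)

# Posterior samples come for free from the learned Gaussian.
post_samples = theta[None, :] + rng.normal(size=(5, D)) / np.sqrt(N * scale + 1.0)
```

The diff against plain Adam is tiny: one perturbation line, the θ/N prior term, and 1/N in place of 10⁻⁸, which is the sense in which the method is "extremely easy to implement".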
Illustration: Classification

Logistic regression (30 data points, 2-dimensional input). The data is sampled from a Gaussian mixture with 2 components.
Adam vs. Vadam

For both algorithms: minibatch size of 5, learning rate = 0.01, prior precision = 0.01.

[Figure: decision boundaries for Adam, the Vadam mean, and Vadam posterior samples]
Why does this work?
• The algorithm is obtained by replacing "gradients" with "natural gradients"; see our ICML 2018 paper.
• The scaling in the natural gradient is related to the scaling in Newton's method.
• An approximation to the Hessian results in Adam.
• Some caveats: choose small minibatches; better results are obtained with VOGN.
Faster, Simpler, and More Robust

Regression on the Australian-Scale dataset using deep neural nets, for various minibatch sizes.

[Figure: comparison of the existing method (BBVI) with our methods (Vadam and VOGN)]
Faster, Simpler, and More Robust

Results on MNIST digit classification, for various values of the Gaussian prior precision parameter λ.

[Figure: comparison of the existing method (BBVI) with our method (Vadam)]
Deep Reinforcement Learning

No exploration (SGD): reward = 2860. Exploration using Vadam: reward = 5264.
Reduce Overfitting with Vadam

Vadam shows consistent train-test performance, while Adam overfits when N is small. BNN classification on the a1a–a9a datasets.

[Figure: train and test curves for Adam and Vadam on each dataset; the Adam train and test curves separate, while the Vadam curves coincide]
Avoiding Local Minima

An example taken from Casella and Robert's book: Vadam reaches the flat minimum, while gradient descent gets stuck in a local minimum.

Related ideas: optimization by smoothing, Gaussian homotopy/blurring, Entropy-SGD, SGLD, etc.
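The smoothing intuition can be seen numerically: averaging a loss over Gaussian perturbations, as Vadam's weight perturbation does implicitly, suppresses a sharp minimum while preserving a broad one. The 1-D loss below is made up for illustration (it is not the Casella and Robert example):

```python
import numpy as np

def loss(x):
    """A 1-D loss with a sharp global minimum near x = 2 and a broad,
    flat minimum near x = -2 (hypothetical example)."""
    return 0.05 * (x + 2) ** 2 - 1.5 * np.exp(-20 * (x - 2) ** 2)

x = np.linspace(-6, 6, 1201)

def smoothed(sigma, n=4000, seed=0):
    """Monte Carlo estimate of E[loss(x + sigma * eps)], eps ~ N(0, 1)."""
    eps = np.random.default_rng(seed).normal(size=n)
    return np.array([loss(xi + sigma * eps).mean() for xi in x])

sharp = loss(x)
blurred = smoothed(sigma=1.0)
print("minimizer of raw loss:", x[sharp.argmin()])
print("minimizer of smoothed loss:", x[blurred.argmin()])
```

After blurring, the narrow spike near x = 2 contributes little and the minimizer moves to the flat basin near x = -2, mirroring how the perturbed objective steers Vadam toward flat minima.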
Summary
• Uncertainty is important, especially when the data is scarce, missing, or unreliable.
• We can obtain uncertainty cheaply with very little effort via Bayesian deep learning.
• It works reasonably well on our benchmarks.
Open Questions
• Quality of uncertainty estimates: applications to life science? Check out the "Bayesian deep learning" workshop at NIPS 2018.
• Estimating various types of uncertainty: model uncertainty vs. data uncertainty; applications play a big role here.
• Is uncertainty in deep learning useful? Multiple local minima make it difficult to establish.
References

https://emtiyaz.github.io

Thanks!