
CS839: Probabilistic Graphical Models

Lecture 5: Message Passing / Belief Propagation

Theo Rekatsinas

1

Junction Tree

2

• A clique tree for a triangulated graph is referred to as a junction tree

• In junction trees, local consistency implies global consistency. Thus the local message-passing algorithm is (provably) correct.

• Only triangulated graphs have the property that their clique trees are junction trees. Thus if we want local algorithms, we must triangulate

How to triangulate?

3

• Intermediate terms correspond to the cliques that result from elimination
• VE vs MP over a junction tree?

Sketch of the Junction Tree Algorithm

4

• Results in marginal probabilities of all cliques --- solves all queries in a single run
• A generic exact inference algorithm for any GM
• Complexity: exponential in the size of the maximal clique --- a good elimination order often leads to a small maximal clique, and hence a good (i.e., thin) JT
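As a hedged recap (the standard sum-product form; the clique/separator notation C_i and S_{ij} is ours rather than the slide's), the message from clique i to a neighboring clique j and the resulting clique belief are

m_{i \to j}(S_{ij}) = \sum_{C_i \setminus S_{ij}} \psi_i(C_i) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(S_{ki}),
\qquad
\beta_i(C_i) \propto \psi_i(C_i) \prod_{k \in N(i)} m_{k \to i}(S_{ki})

After one inward and one outward sweep over the junction tree, every belief β_i(C_i) equals the marginal p(C_i), which is why a single run answers all clique queries.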

Inference in HMMs

5

• Summing with elimination

• Message passing corresponds to a forward and a backward pass

Inference in HMMs

6

• A junction tree for the HMM

• Forward pass

Inference in HMMs

7

• A junction tree for the HMM

• Backward pass
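To make the forward and backward passes concrete, here is a minimal numpy sketch of forward-backward message passing for a discrete HMM (the variable names and the toy example are ours, not from the slides):

import numpy as np

def forward_backward(pi, A, B, obs):
    """Forward-backward message passing for an HMM.

    pi : (K,) initial state distribution
    A  : (K, K) transition matrix, A[i, j] = p(z_t = j | z_{t-1} = i)
    B  : (K, M) emission matrix,  B[i, o] = p(x_t = o | z_t = i)
    obs: length-T sequence of observation indices
    Returns the posterior marginals p(z_t | x_{1:T}) for every t.
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))   # forward messages (scaled)
    beta = np.zeros((T, K))    # backward messages (scaled)
    scale = np.zeros(T)

    # Forward pass
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= scale[t + 1]

    # Combine messages into posterior marginals
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

# Hypothetical 2-state example
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
print(forward_backward(pi, A, B, obs=[0, 1, 1, 0]))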

CS839: Probabilistic Graphical Models

Lecture 6: Generalized Linear Models (MLE)

Theo Rekatsinas

8

Parameters in Graphical Models

9

• Bayesian network

How do we find these parameters?

Linear Regression as a Bayes Net

10

• Linear regression: D = ((x1, y1), (x2, y2), …, (xn, yn))

• Assume that ε (error term capturing unmodeled effects and random noise) is a Gaussian random variable N(0, σ²)

• Use the Least-Mean-Squares (LMS) algorithm to estimate the parameters.

y_i = \theta^T x_i + \epsilon_i, \qquad x_i \in \mathbb{R}^d, \; y_i \in \mathbb{R}

p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right)
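A minimal sketch of the Least-Mean-Squares updates mentioned above, assuming the Gaussian noise model; the function name and the toy data are ours:

import numpy as np

def lms(X, y, lr=0.01, epochs=50, seed=0):
    """Least-Mean-Squares (Widrow-Hoff) updates for linear regression.

    Stochastic gradient descent on the squared error:
        theta <- theta + lr * (y_i - theta^T x_i) * x_i
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            err = y[i] - X[i] @ theta
            theta += lr * err * X[i]
    return theta

# Hypothetical usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=200)
print(lms(X, y))  # should be close to true_theta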

Logistic Regression (sigmoid classifier) as a GM

11

• The conditional distribution is a Bernoulli:

• We can again use a tailored gradient method, as in linear regression

• But note that p(y|x) belongs to the exponential family and it is a generalized linear model.

p(y \mid x) = \mu(x)^y (1 - \mu(x))^{1-y}, \qquad \mu(x) = \frac{1}{1 + \exp(-\theta^T x)}
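A sketch of that gradient method, filling in the standard Bernoulli log-likelihood (not spelled out on the slide):

\ell(\theta) = \sum_n \left[ y_n \log \mu(x_n) + (1 - y_n) \log\big(1 - \mu(x_n)\big) \right],
\qquad
\nabla_\theta \ell(\theta) = \sum_n \big( y_n - \mu(x_n) \big)\, x_n

so gradient ascent takes steps of the same form as the LMS update used for linear regression.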

Markov Random Fields

12

Restricted Boltzmann Machines

13

Conditional Random Fields

14

• Discriminative

• Doesn't assume that features are independent

• When labeling, feature observations are taken into account

P_\theta(Y \mid X) = \frac{1}{Z(\theta, X)} \exp\left( \sum_c \theta_c f_c(X, Y_c) \right)

Exponential family: a basic building block

15

• For a numeric random variable X, a distribution of the form below is an exponential family distribution with natural (canonical) parameter η

• The function T(x) is a sufficient statistic.
• The function A(η) = log Z(η) is the log normalizer.
• Examples: Bernoulli, multinomial, Gaussian, Poisson, Gamma, Categorical

p(x \mid \eta) = h(x) \exp\left( \eta^T T(x) - A(\eta) \right) = \frac{1}{Z(\eta)} h(x) \exp\left( \eta^T T(x) \right)
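As a quick worked example (ours, not on the slide), the Bernoulli distribution fits this template:

p(x \mid \mu) = \mu^x (1 - \mu)^{1-x} = \exp\left( x \log\frac{\mu}{1-\mu} + \log(1 - \mu) \right)

so \eta = \log\frac{\mu}{1-\mu}, \; T(x) = x, \; A(\eta) = \log(1 + e^{\eta}), \; h(x) = 1.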

Example: Multivariate Gaussian Distribution

16

• For a continuous vector random variable

• Exponential family representation

X \in \mathbb{R}^k
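The slide's derivation did not survive extraction; as a hedged reconstruction, the standard exponential family representation of the multivariate Gaussian is

p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

\eta = \begin{bmatrix} \Sigma^{-1} \mu \\ -\tfrac{1}{2} \mathrm{vec}(\Sigma^{-1}) \end{bmatrix}, \quad
T(x) = \begin{bmatrix} x \\ \mathrm{vec}(x x^T) \end{bmatrix}, \quad
A(\eta) = \tfrac{1}{2} \mu^T \Sigma^{-1} \mu + \tfrac{1}{2} \log |\Sigma|, \quad
h(x) = (2\pi)^{-k/2}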

Example: Multinomial Distribution

17

• For a binary vector random variable x ~ Multi(x | π)
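Again filling in the standard representation (the slide's formulas did not survive extraction), for a one-of-K indicator vector x:

p(x \mid \pi) = \prod_{k=1}^{K} \pi_k^{x_k} = \exp\left( \sum_{k=1}^{K-1} x_k \log\frac{\pi_k}{\pi_K} + \log \pi_K \right)

so \eta_k = \log\frac{\pi_k}{\pi_K}, \; T(x) = [x_1, \ldots, x_{K-1}], \; A(\eta) = \log\left(1 + \sum_{k=1}^{K-1} e^{\eta_k}\right), \; h(x) = 1.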

Why exponential family?

18

• Moment generating property

We can easily compute moments of any exponential family distribution by taking derivatives of the log normalizer A(η)
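Concretely (standard identities, stated here for completeness):

\frac{dA(\eta)}{d\eta} = \mathbb{E}_{p(x \mid \eta)}[T(x)] = \mu, \qquad
\frac{d^2 A(\eta)}{d\eta^2} = \mathrm{Var}[T(x)]

For instance, for the Bernoulli above, \frac{dA}{d\eta} = \frac{e^{\eta}}{1 + e^{\eta}} = \mu.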

Moments vs. canonical parameters

19

• The moment parameters (e.g., μ) can be derived from the natural parameters
• First moment = mean
• Second moment = variance
• Etc.

• A(η) is convex

• Hence, we can invert the relationship and infer the canonical parameters from the moment parameters (a 1-to-1 mapping)
• A distribution in the exponential family can be parametrized not only by η but also by μ

MLE for Exponential Family

20

• For i.i.d. data the log-likelihood is

• We take the derivatives and set them to zero

• We perform moment matching

• We can infer the canonical parameters via the inverse mapping (sketched below)
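A sketch of those steps, using the exponential family form from slide 15:

\ell(\eta; D) = \sum_{n=1}^{N} \log h(x_n) + \eta^T \sum_{n=1}^{N} T(x_n) - N A(\eta)

\frac{\partial \ell}{\partial \eta} = \sum_{n=1}^{N} T(x_n) - N \frac{dA(\eta)}{d\eta} = 0
\;\Rightarrow\;
\hat{\mu}_{MLE} = \frac{1}{N} \sum_{n=1}^{N} T(x_n), \qquad
\hat{\eta}_{MLE} = \psi(\hat{\mu}_{MLE})

where ψ denotes the inverse of the moment mapping μ = dA/dη.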

Sufficiency

21

• For p(x|θ), T(x) is sufficient for θ if there is no information in X regarding θ beyond that in T(x)
• We can throw away X for the purpose of inference w.r.t. θ

• Bayesian view

• Frequentist view

• Neyman factorization theorem: T(x) is sufficient for θ if the density factorizes as sketched below
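The conditions behind these three bullets, filled in from the standard statements (the slide's formulas did not survive extraction):

\text{Bayesian view:}\; p(\theta \mid T(x), x) = p(\theta \mid T(x)), \qquad
\text{Frequentist view:}\; p(x \mid T(x), \theta) = p(x \mid T(x))

\text{Neyman factorization:}\; p(x \mid \theta) = g(T(x), \theta)\, h(x, T(x))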

Examples

22

• Gaussian:

• Multinomial:
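The slide's formulas are missing; as a hedged reconstruction, the usual sufficient statistics for an i.i.d. sample x_1, …, x_N are

\text{Gaussian:}\; T(D) = \left( \sum_n x_n, \; \sum_n x_n x_n^T \right), \qquad
\text{Multinomial:}\; T(D) = \sum_n x_n \;\; \text{(the category counts)}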

Generalized Linear Models

23

• The graphical model:
• Linear regression
• Discriminative linear classification

• Generalized Linear Model
• The observed input x is assumed to enter into the model via a linear combination of its elements.

• The conditional mean μ is represented as a function f(ξ) of ξ, where f is known as the response function.

• The observed output y is assumed to be characterized by an exponential family distribution with conditional mean μ.

\mathbb{E}_p[T] = \mu = f(\theta^T x), \qquad \xi = \theta^T x

Generalized Linear Models

24

MLE for GLIMs with natural response

25

• Log-likelihood

• Derivative of log-likelihood

• Learning for canonical GLIMs
• Stochastic gradient ascent = least mean squares (LMS)
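Spelling these bullets out for a canonical (natural-response) GLIM with \eta_n = \theta^T x_n (a standard derivation, condensed here):

\ell(\theta) = \sum_n \left( \log h(y_n) + \theta^T x_n\, y_n - A(\eta_n) \right), \qquad
\frac{d\ell}{d\theta} = \sum_n (y_n - \mu_n)\, x_n, \quad \mu_n = f(\theta^T x_n)

Stochastic gradient ascent on one sample at a time gives \theta^{(t+1)} = \theta^{(t)} + \rho\, (y_n - \mu_n^{(t)})\, x_n, which is exactly the LMS update.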

Second-order methods

26

• The Hessian matrix

• X is the design matrix and W is computed by calculating the second derivative of A(η_n)
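In symbols (a condensed version of the standard computation):

H = \frac{d^2 \ell}{d\theta\, d\theta^T} = -\sum_n \frac{d\mu_n}{d\eta_n}\, x_n x_n^T = -X^T W X, \qquad
W = \mathrm{diag}\left( \frac{d\mu_1}{d\eta_1}, \ldots, \frac{d\mu_N}{d\eta_N} \right), \quad \frac{d\mu_n}{d\eta_n} = A''(\eta_n)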

Back to Least Squares

27

• Objective function in matrix form

• To minimize this objective we take the derivative and set it to zero
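Written out (a standard derivation, added here for completeness):

J(\theta) = \tfrac{1}{2} (y - X\theta)^T (y - X\theta), \qquad
\nabla_\theta J = X^T X \theta - X^T y = 0 \;\Rightarrow\; \theta^{\ast} = (X^T X)^{-1} X^T y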

Iteratively Reweighted Least Squares

28

• Newton-Raphson method with objective J

• We have

• Update

Iteratively Reweighted Least Squares

29

• Newton-Raphson method with objective J

• We have

• Update

Generic update for any exponential family distribution
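Putting the pieces together, the Newton-Raphson step for any exponential family GLIM becomes a weighted least-squares solve (the standard IRLS form):

\theta^{(t+1)} = \theta^{(t)} - H^{-1} \nabla_\theta \ell
= \left( X^T W^{(t)} X \right)^{-1} X^T W^{(t)} z^{(t)}, \qquad
z^{(t)} = X \theta^{(t)} + \left( W^{(t)} \right)^{-1} (y - \mu^{(t)})

Each iteration refits a weighted least-squares problem with working response z^{(t)}, hence the name.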

Example 1: Logistic Regression

30

• The conditional distribution is a Bernoulli:

• IRLS

p(y \mid x) = \mu(x)^y (1 - \mu(x))^{1-y}, \qquad \mu(x) = \frac{1}{1 + \exp(-\eta(x))}, \qquad \eta = \xi = \theta^T x

\frac{\partial \mu}{\partial \eta} = \mu (1 - \mu), \qquad
W = \begin{bmatrix} \mu_1 (1 - \mu_1) & & \\ & \ddots & \\ & & \mu_N (1 - \mu_N) \end{bmatrix}
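A minimal numpy sketch of IRLS for logistic regression, following the update above; the function name, the tolerance, and the toy data are ours:

import numpy as np

def irls_logistic(X, y, iters=20, tol=1e-8):
    """IRLS (Newton-Raphson) for logistic regression.

    Each iteration solves a weighted least-squares problem:
        theta <- (X^T W X)^{-1} X^T W z,
    with W = diag(mu_n (1 - mu_n)) and working response
        z = X theta + W^{-1} (y - mu).
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        eta = X @ theta
        mu = 1.0 / (1.0 + np.exp(-eta))                # Bernoulli mean
        w = mu * (1.0 - mu)                            # diagonal of W
        z = eta + (y - mu) / np.clip(w, 1e-10, None)   # working response
        WX = X * w[:, None]
        theta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(theta_new - theta)) < tol:
            theta = theta_new
            break
        theta = theta_new
    return theta

# Hypothetical usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_theta))).astype(float)
print(irls_logistic(X, y))  # should be close to true_theta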

Example 2: Linear Regression

31

• The conditional distribution is a Gaussian:

• IRLS
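For the Gaussian case the weights trivialize (a one-line check, not on the slide): μ = η implies dμ/dη = 1, so W = I and the working response is z = Xθ + (y - Xθ) = y. A single IRLS step therefore returns the ordinary least-squares solution

\theta = (X^T X)^{-1} X^T y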

Simple GLIMs are the building blocks of complex BNs

32

CPDs correspond to GLIMs

MLE for general BNs

33

• If we assume the parameters for each CPD are globally independent, and all nodes are fully observed, then the log-likelihood function decomposes into a sum of local terms, one per node

• MLE-based parameter estimation of a GM reduces to local estimation of each GLIM.
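In symbols, for a Bayes net with nodes i, parents π_i, and fully observed samples n (a standard decomposition, condensed here):

\ell(\theta; D) = \log \prod_n \prod_i p(x_{n,i} \mid x_{n, \pi_i}, \theta_i)
= \sum_i \left( \sum_n \log p(x_{n,i} \mid x_{n, \pi_i}, \theta_i) \right)

so each θ_i can be estimated from its node's local term alone, e.g. by the GLIM machinery above.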

Summary

34

• For exponential family distributions, MLE amounts to moment matching

• GLIM:
• Natural response
• Iteratively Reweighted Least Squares as a general algorithm

• GLIMs are building blocks of most practical GMs