
CS839: Probabilistic Graphical Models

Lecture 5: Message Passing / Belief Propagation

Theo Rekatsinas

1

Junction Tree

2

• A clique tree for a triangulated graph is referred to as a junction tree

• In junction trees, local consistency implies global consistency. Thus the local message-passing algorithm is (provably) correct.

• Only triangulated graphs have the property that their clique trees are junction trees. Thus if we want local algorithms, we must triangulate

How to triangulate?

3

• Intermediate terms correspond to the cliques that result from elimination
• VE vs MP over a junction tree?

Sketch of the Junction Tree Algorithm

4

• Results in marginal probabilities of all cliques --- solves all queries in a single run
• A generic exact inference algorithm for any GM
• Complexity: exponential in the size of the maximal clique --- a good elimination order often leads to a small maximal clique, and hence a good (i.e., thin) JT
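As a hedged recap (the standard sum-product form; the clique/separator notation C_i and S_{ij} is ours rather than the slide's), the message from clique i to a neighboring clique j and the resulting clique belief are

m_{i \to j}(S_{ij}) = \sum_{C_i \setminus S_{ij}} \psi_i(C_i) \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(S_{ki}),
\qquad
\beta_i(C_i) \propto \psi_i(C_i) \prod_{k \in N(i)} m_{k \to i}(S_{ki})

After one inward and one outward sweep over the junction tree, every belief β_i(C_i) equals the marginal p(C_i), which is why a single run answers all clique queries.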

Inference in HMMs

5

• Summing with elimination

• Message passing corresponds to a forward and a backward pass

Inference in HMMs

6

• A junction tree for the HMM

• Forward pass

Inference in HMMs

7

• A junction tree for the HMM

• Backward pass
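To make the forward and backward passes concrete, here is a minimal numpy sketch of forward-backward message passing for a discrete HMM (the variable names and the toy example are ours, not from the slides):

import numpy as np

def forward_backward(pi, A, B, obs):
    """Forward-backward message passing for an HMM.

    pi : (K,) initial state distribution
    A  : (K, K) transition matrix, A[i, j] = p(z_t = j | z_{t-1} = i)
    B  : (K, M) emission matrix,  B[i, o] = p(x_t = o | z_t = i)
    obs: length-T sequence of observation indices
    Returns the posterior marginals p(z_t | x_{1:T}) for every t.
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))   # forward messages (scaled)
    beta = np.zeros((T, K))    # backward messages (scaled)
    scale = np.zeros(T)

    # Forward pass
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= scale[t + 1]

    # Combine messages into posterior marginals
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

# Hypothetical 2-state example
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
print(forward_backward(pi, A, B, obs=[0, 1, 1, 0]))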

CS839: Probabilistic Graphical Models

Lecture 6: Generalized Linear Models (MLE)

Theo Rekatsinas

8

Parameters in Graphical Models

9

• Bayesian network

How do we find these parameters?

Linear Regression as a Bayes Net

10

• Linear regression: D = ((x1, y1), (x2, y2), …, (xn, yn))

• Assume that ε (error term capturing unmodeled effects and random noise) is a Gaussian random variable N(0, σ²)

• Use the Least-Mean-Squares (LMS) algorithm to estimate the parameters.

y_i = \theta^T x_i + \epsilon_i, \qquad x_i \in \mathbb{R}^d, \; y_i \in \mathbb{R}

p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right)
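A minimal sketch of the Least-Mean-Squares updates mentioned above, assuming the Gaussian noise model; the function name and the toy data are ours:

import numpy as np

def lms(X, y, lr=0.01, epochs=50, seed=0):
    """Least-Mean-Squares (Widrow-Hoff) updates for linear regression.

    Stochastic gradient descent on the squared error:
        theta <- theta + lr * (y_i - theta^T x_i) * x_i
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            err = y[i] - X[i] @ theta
            theta += lr * err * X[i]
    return theta

# Hypothetical usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=200)
print(lms(X, y))  # should be close to true_theta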

Logistic Regression (sigmoid classifier) as a GM

11

• The conditional distribution is a Bernoulli:

• We can again use a tailored gradient method, as in linear regression

• But note that p(y|x) belongs to the exponential family and it is a generalized linear model.

p(y \mid x) = \mu(x)^y (1 - \mu(x))^{1-y}, \qquad \mu(x) = \frac{1}{1 + \exp(-\theta^T x)}
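A sketch of that gradient method, filling in the standard Bernoulli log-likelihood (not spelled out on the slide):

\ell(\theta) = \sum_n \left[ y_n \log \mu(x_n) + (1 - y_n) \log\big(1 - \mu(x_n)\big) \right],
\qquad
\nabla_\theta \ell(\theta) = \sum_n \big( y_n - \mu(x_n) \big)\, x_n

so gradient ascent takes steps of the same form as the LMS update used for linear regression.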

Markov Random Fields

12

Restricted Boltzmann Machines

13

Conditional Random Fields

14

• Discriminative

• Doesn't assume that features are independent

• When labeling, feature observations are taken into account

P_\theta(Y \mid X) = \frac{1}{Z(\theta, X)} \exp\left( \sum_c \theta_c f_c(X, Y_c) \right)

Exponential family: a basic building block

15

• For a numeric random variable X, a distribution of the form below is an exponential family distribution with natural (canonical) parameter η

• The function T(x) is a sufficient statistic.
• The function A(η) = log Z(η) is the log normalizer.
• Examples: Bernoulli, multinomial, Gaussian, Poisson, Gamma, Categorical

p(x \mid \eta) = h(x) \exp\left( \eta^T T(x) - A(\eta) \right) = \frac{1}{Z(\eta)} h(x) \exp\left( \eta^T T(x) \right)
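As a quick worked example (ours, not on the slide), the Bernoulli distribution fits this template:

p(x \mid \mu) = \mu^x (1 - \mu)^{1-x} = \exp\left( x \log\frac{\mu}{1-\mu} + \log(1 - \mu) \right)

so \eta = \log\frac{\mu}{1-\mu}, \; T(x) = x, \; A(\eta) = \log(1 + e^{\eta}), \; h(x) = 1.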

Example: Multivariate Gaussian Distribution

16

• For a continuous vector random variable

• Exponential family representation

X \in \mathbb{R}^k
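The slide's derivation did not survive extraction; as a hedged reconstruction, the standard exponential family representation of the multivariate Gaussian is

p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

\eta = \begin{bmatrix} \Sigma^{-1} \mu \\ -\tfrac{1}{2} \mathrm{vec}(\Sigma^{-1}) \end{bmatrix}, \quad
T(x) = \begin{bmatrix} x \\ \mathrm{vec}(x x^T) \end{bmatrix}, \quad
A(\eta) = \tfrac{1}{2} \mu^T \Sigma^{-1} \mu + \tfrac{1}{2} \log |\Sigma|, \quad
h(x) = (2\pi)^{-k/2}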

Example: Multinomial Distribution

17

• For a binary vector random variable x ~ Multi(x | π)
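Again filling in the standard representation (the slide's formulas did not survive extraction), for a one-of-K indicator vector x:

p(x \mid \pi) = \prod_{k=1}^{K} \pi_k^{x_k} = \exp\left( \sum_{k=1}^{K-1} x_k \log\frac{\pi_k}{\pi_K} + \log \pi_K \right)

so \eta_k = \log\frac{\pi_k}{\pi_K}, \; T(x) = [x_1, \ldots, x_{K-1}], \; A(\eta) = \log\left(1 + \sum_{k=1}^{K-1} e^{\eta_k}\right), \; h(x) = 1.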

Why exponential family?

18

• Moment generating property

We can easily compute moments of any exponential family distribution by taking derivatives of the log normalizer A(η)
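Concretely (standard identities, stated here for completeness):

\frac{dA(\eta)}{d\eta} = \mathbb{E}_{p(x \mid \eta)}[T(x)] = \mu, \qquad
\frac{d^2 A(\eta)}{d\eta^2} = \mathrm{Var}[T(x)]

For instance, for the Bernoulli above, \frac{dA}{d\eta} = \frac{e^{\eta}}{1 + e^{\eta}} = \mu.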

Moments vs. canonical parameters

19

• The moment parameters (e.g., μ) can be derived from the natural parameters
• First moment = mean
• Second moment = variance
• Etc.

• A(η) is convex

• Hence, we can invert the relationship and infer the canonical parameters from the moment parameters (a 1-to-1 mapping)
• A distribution in the exponential family can be parametrized not only by η but also by μ

MLE for Exponential Family

20

• For i.i.d. data the log-likelihood is

• We take the derivatives and set them to zero

• We perform moment matching

• We can infer the canonical parameters via the inverse mapping (sketched below)
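A sketch of those steps, using the exponential family form from slide 15:

\ell(\eta; D) = \sum_{n=1}^{N} \log h(x_n) + \eta^T \sum_{n=1}^{N} T(x_n) - N A(\eta)

\frac{\partial \ell}{\partial \eta} = \sum_{n=1}^{N} T(x_n) - N \frac{dA(\eta)}{d\eta} = 0
\;\Rightarrow\;
\hat{\mu}_{MLE} = \frac{1}{N} \sum_{n=1}^{N} T(x_n), \qquad
\hat{\eta}_{MLE} = \psi(\hat{\mu}_{MLE})

where ψ denotes the inverse of the moment mapping μ = dA/dη.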

Sufficiency

21

• For p(x|θ), T(x) is sufficient for θ if there is no information in X regarding θ beyond that in T(x)
• We can throw away X for the purpose of inference w.r.t. θ

• Bayesian view

• Frequentist view

• Neyman factorization theorem: T(x) is sufficient for θ if the density factorizes as sketched below
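The conditions behind these three bullets, filled in from the standard statements (the slide's formulas did not survive extraction):

\text{Bayesian view:}\; p(\theta \mid T(x), x) = p(\theta \mid T(x)), \qquad
\text{Frequentist view:}\; p(x \mid T(x), \theta) = p(x \mid T(x))

\text{Neyman factorization:}\; p(x \mid \theta) = g(T(x), \theta)\, h(x, T(x))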

Examples

22

• Gaussian:

• Multinomial:
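The slide's formulas are missing; as a hedged reconstruction, the usual sufficient statistics for an i.i.d. sample x_1, …, x_N are

\text{Gaussian:}\; T(D) = \left( \sum_n x_n, \; \sum_n x_n x_n^T \right), \qquad
\text{Multinomial:}\; T(D) = \sum_n x_n \;\; \text{(the category counts)}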

Generalized Linear Models

23

• The graphical model:
• Linear regression
• Discriminative linear classification

• Generalized Linear Model
• The observed input x is assumed to enter into the model via a linear combination of its elements.

• The conditional mean μ is represented as a function f(ξ) of ξ, where f is known as the response function.

• The observed output y is assumed to be characterized by an exponential family distribution with conditional mean μ.

\mathbb{E}_p[T] = \mu = f(\theta^T x), \qquad \xi = \theta^T x

Generalized Linear Models

24

MLE for GLIMs with natural response

25

• Log-likelihood

• Derivative of log-likelihood

• Learning for canonical GLIMs
• Stochastic gradient ascent = least mean squares (LMS)
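Spelling these bullets out for a canonical (natural-response) GLIM with \eta_n = \theta^T x_n (a standard derivation, condensed here):

\ell(\theta) = \sum_n \left( \log h(y_n) + \theta^T x_n\, y_n - A(\eta_n) \right), \qquad
\frac{d\ell}{d\theta} = \sum_n (y_n - \mu_n)\, x_n, \quad \mu_n = f(\theta^T x_n)

Stochastic gradient ascent on one sample at a time gives \theta^{(t+1)} = \theta^{(t)} + \rho\, (y_n - \mu_n^{(t)})\, x_n, which is exactly the LMS update.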

Second-order methods

26

• The Hessian matrix

• X is the design matrix and W is computed by calculating the second derivative of A(η_n)
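In symbols (a condensed version of the standard computation):

H = \frac{d^2 \ell}{d\theta\, d\theta^T} = -\sum_n \frac{d\mu_n}{d\eta_n}\, x_n x_n^T = -X^T W X, \qquad
W = \mathrm{diag}\left( \frac{d\mu_1}{d\eta_1}, \ldots, \frac{d\mu_N}{d\eta_N} \right), \quad \frac{d\mu_n}{d\eta_n} = A''(\eta_n)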

Back to Least Squares

27

• Objective function in matrix form

• To minimize this objective we take the derivative and set it to zero
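Written out (a standard derivation, added here for completeness):

J(\theta) = \tfrac{1}{2} (y - X\theta)^T (y - X\theta), \qquad
\nabla_\theta J = X^T X \theta - X^T y = 0 \;\Rightarrow\; \theta^{\ast} = (X^T X)^{-1} X^T y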

Iteratively Reweighted Least Squares

28

• Newton-Raphson method with objective J

• We have

• Update

Iteratively Reweighted Least Squares

29

• Newton-Raphson method with objective J

• We have

• Update

Generic update for any exponential family distribution
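Putting the pieces together, the Newton-Raphson step for any exponential family GLIM becomes a weighted least-squares solve (the standard IRLS form):

\theta^{(t+1)} = \theta^{(t)} - H^{-1} \nabla_\theta \ell
= \left( X^T W^{(t)} X \right)^{-1} X^T W^{(t)} z^{(t)}, \qquad
z^{(t)} = X \theta^{(t)} + \left( W^{(t)} \right)^{-1} (y - \mu^{(t)})

Each iteration refits a weighted least-squares problem with working response z^{(t)}, hence the name.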

Example 1: Logistic Regression

30

• The conditional distribution is a Bernoulli:

• IRLS

p(y \mid x) = \mu(x)^y (1 - \mu(x))^{1-y}, \qquad \mu(x) = \frac{1}{1 + \exp(-\eta(x))}, \qquad \eta = \xi = \theta^T x

\frac{\partial \mu}{\partial \eta} = \mu (1 - \mu), \qquad
W = \begin{bmatrix} \mu_1 (1 - \mu_1) & & \\ & \ddots & \\ & & \mu_N (1 - \mu_N) \end{bmatrix}
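A minimal numpy sketch of IRLS for logistic regression, following the update above; the function name, the tolerance, and the toy data are ours:

import numpy as np

def irls_logistic(X, y, iters=20, tol=1e-8):
    """IRLS (Newton-Raphson) for logistic regression.

    Each iteration solves a weighted least-squares problem:
        theta <- (X^T W X)^{-1} X^T W z,
    with W = diag(mu_n (1 - mu_n)) and working response
        z = X theta + W^{-1} (y - mu).
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        eta = X @ theta
        mu = 1.0 / (1.0 + np.exp(-eta))                # Bernoulli mean
        w = mu * (1.0 - mu)                            # diagonal of W
        z = eta + (y - mu) / np.clip(w, 1e-10, None)   # working response
        WX = X * w[:, None]
        theta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(theta_new - theta)) < tol:
            theta = theta_new
            break
        theta = theta_new
    return theta

# Hypothetical usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_theta))).astype(float)
print(irls_logistic(X, y))  # should be close to true_theta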

Example 2: Linear Regression

31

• The conditional distribution is a Gaussian:

• IRLS
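For the Gaussian case the weights trivialize (a one-line check, not on the slide): μ = η implies dμ/dη = 1, so W = I and the working response is z = Xθ + (y - Xθ) = y. A single IRLS step therefore returns the ordinary least-squares solution

\theta = (X^T X)^{-1} X^T y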

Simple GLIMs are the building blocks of complex BNs

32

CPDs correspond to GLIMs

MLE for general BNs

33

• If we assume the parameters for each CPD are globally independent, and all nodes are fully observed, then the log-likelihood function decomposes into a sum of local terms, one per node

• MLE-based parameter estimation of a GM reduces to local estimation of each GLIM.
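In symbols, for a Bayes net with nodes i, parents π_i, and fully observed samples n (a standard decomposition, condensed here):

\ell(\theta; D) = \log \prod_n \prod_i p(x_{n,i} \mid x_{n, \pi_i}, \theta_i)
= \sum_i \left( \sum_n \log p(x_{n,i} \mid x_{n, \pi_i}, \theta_i) \right)

so each θ_i can be estimated from its node's local term alone, e.g. by the GLIM machinery above.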

Summary

34

• For exponential family distributions, MLE amounts to moment matching

• GLIM:
• Natural response
• Iteratively Reweighted Least Squares as a general algorithm

• GLIMs are building blocks of most practical GMs