Lecture 5: Hidden Variables
Page 1: Lecture 5: Hidden Variables

Page 2: Random Variables in Decoding

[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w).]

Page 3: Random Variables in Supervised Learning

[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w).]

Page 4: Hidden Variables are Different

•  We use the term “hidden variable” (or “latent variable”) to refer to something we never see.
   –  Not even in training.
   –  Sometimes we believe they are real.
   –  Sometimes we believe they only approximate reality.

Page 5: Random Variables in Decoding

[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w), latent (Z).]

Page 6: Random Variables in Supervised Learning

[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w), latent (Z).]

Page 7: Latent Variables and Inference

•  Both learning and decoding can still be understood as inference problems.

•  Usually “mixed”:
   –  some variables are getting maximized
   –  some variables are getting summed

Page 8: Word Alignments

•  Since IBM Model 1, word alignments have been the prototypical hidden variable.

•  Ultimately, in translation, we do not care what they are.

•  Current approach: learn the word alignments unsupervised, then fix them to their most likely values.
   –  Then construct models for translation.

•  Alignment on its own: an unsupervised problem.
•  MT on its own: a supervised problem.
•  MT + alignment: a supervised problem with latent variables.
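To make this concrete, here is a minimal sketch of EM for IBM Model 1 alignment; the function, toy bitext, and uniform initialization are illustrative assumptions, not from the lecture:

```python
# A minimal sketch of EM for IBM Model 1 word alignment (illustrative only).
from collections import defaultdict

def ibm1_em(bitext, iterations=10):
    """bitext: list of (source_tokens, target_tokens) sentence pairs."""
    t = defaultdict(lambda: 1.0)  # translation scores t(f | e), uniform init
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        # E step: the posterior over alignments decomposes per target word.
        for src, tgt in bitext:
            for f in tgt:
                z = sum(t[(f, e)] for e in src)       # normalizer
                for e in src:
                    p = t[(f, e)] / z                 # P(f aligns to e)
                    count[(f, e)] += p
                    total[e] += p
        # M step: relative-frequency re-estimation from soft counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

pairs = [("the house".split(), "la maison".split()),
         ("the book".split(), "le livre".split())]
t = ibm1_em(pairs)
```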

Page 9: Alignments in Text-to-Text Problems

•  Wang et al. (2007): “Jeopardy” model for answer ranking in QA.
   –  Align questions to answers.
   –  Similar model for paraphrase detection (Das and Smith, 2009).

Page 10: Latent Annotations in Parsing

•  Treebank categories (N, NN, NP, etc.) are too coarse-grained.
   –  Lexicalization (Collins, Eisner)
   –  Johnson’s (1998) parent annotation
   –  Klein and Manning (2003) parser

•  Treat the true, fine-grained category as hidden, and infer it from data.
   –  Matsuzaki, Petrov, Dreyer, many others.

Page 11: Richer Formalisms

•  Cohn et al. (2009): tree substitution grammar.
   –  Derived tree is observed (output variable).
   –  Derivation tree (segmentation into elementary trees) is hidden.

•  Zettlemoyer and Collins (2005 and later): infer CCG syntax from first-order logical expressions and sentences.

•  Liang et al. (2011): infer semantic representations from text and a database.

Page 12: Topic Models

•  Infer topics (or topic blends) in documents.
•  Latent Dirichlet allocation (Blei et al., 2003) is a great example.
   –  Sometimes augmented with an output variable (Blei and McAuliffe, 2007) – “supervised” LDA.
   –  Many extensions!

Page 13: Unsupervised NLP

•  Clustering (Brown et al., 1992; many more)
•  POS tagging (Merialdo, 1994; many more)
•  Parsing (Pereira and Schabes; Klein and Manning; …)
•  Segmentation (words – Goldwater; discourse – Eisenstein)
•  Morphology
•  Lexical semantics
•  Syntax–semantics correspondences
•  Sentiment analysis
•  Coreference resolution
•  Word, phrase, and tree alignment

Page 14: Supervised or Unsupervised?

•  Depends on the task, not the model.
   –  I say “unsupervised” when the output variables are hidden at training time.

Page 15: Random Variables in Unsupervised Learning

[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w), latent (Z).]

Page 16: Probabilistic View

•  The usual starting point for hidden variables is maximum likelihood.
   –  “Input” and “output” do not matter; only observed/latent.

Page 17: Random Variables in Probabilistic Learning

[Figure: graphical model with nodes visible (V), latent (L), parameters (w).]

Page 18: Empirical Risk View

•  Log-loss:
   –  Equates to maximum marginal likelihood (or MAP, if R(w) is a negated log-prior).
   –  Unlike the loss functions in Lecture 4, this one is not convex!
   –  EM seeks to solve this problem (but it is not the only way).
   –  Regularization decisions are orthogonal.

$$\mathrm{loss}(v; h_w) = -\log p_w(v) = -\log \sum_{\ell} p_w(v, \ell)$$

$$\min_{w \in \mathbb{R}^d} \; \frac{1}{N} \sum_i \mathrm{loss}(v_i; h_w) + R(w)$$
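As a quick numeric illustration (a minimal sketch with an assumed toy distribution, not the lecture's example), the marginal log-loss is a log-sum-exp over latent assignments:

```python
# Marginal log-loss -log sum_l p_w(v, l) for one example (illustrative).
import numpy as np

def marginal_log_loss(log_joint):
    """log_joint: array of log p_w(v, l) over latent assignments l."""
    # logsumexp for numerical stability: log sum_l exp(log p_w(v, l))
    m = log_joint.max()
    return -(m + np.log(np.exp(log_joint - m).sum()))

# Example: three latent assignments with joint probabilities 0.1, 0.2, 0.05.
print(marginal_log_loss(np.log(np.array([0.1, 0.2, 0.05]))))  # -log 0.35
```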

Page 19: Optimizing the Marginal Log-Loss

•  EM as inference
•  EM as optimization
•  Direct optimization

Page 20: Generic EM Algorithm

•  Input: w(0) and observations v1, v2, …, vN
•  Output: learned w
•  t = 0
•  Repeat until w(t) ≈ w(t−1):
   –  E step: $\forall i, \forall \ell: \; q_i^{(t)}(\ell) \leftarrow p_{w^{(t)}}(\ell \mid v_i)$
   –  M step: $w^{(t+1)} \leftarrow \arg\max_w \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_w(v_i, \ell)$
   –  ++t
•  Return w(t)
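Here is a minimal sketch of this loop for a two-component Bernoulli (coin) mixture; the model, names, and data are assumptions for illustration. Both steps have closed forms: the E step computes posteriors and the M step is relative-frequency estimation on soft counts.

```python
# Generic EM loop on a two-coin mixture: each observation is a count of
# heads in n flips; the latent l_i is which coin was used (illustrative).
import numpy as np
from scipy.stats import binom

def em_coins(heads, n, w=(0.3, 0.6), pi=0.5, iterations=20):
    heads = np.asarray(heads)
    w = np.array(w, dtype=float)
    for t in range(iterations):
        # E step: q_i(l) = p_w(l | v_i), the posterior over the latent coin.
        lik = np.stack([binom.pmf(heads, n, w[0]) * pi,
                        binom.pmf(heads, n, w[1]) * (1 - pi)])
        q = lik / lik.sum(axis=0)
        # M step: maximize sum_i sum_l q_i(l) log p_w(v_i, l) in closed form.
        w = (q * heads).sum(axis=1) / (q.sum(axis=1) * n)
        pi = q[0].mean()
    return w, pi

print(em_coins(heads=[9, 8, 2, 1, 7], n=10))
```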

Page 21: MAP Learning as a Graphical Model

[Figure: factor graph over parameters w, latent L, and visible V, with factors exp(−R(w)) = p(w), pw(L), and pw(V | L).]

•  Combined inference (max over w, sum over L) is very hard.
   –  If w were fixed, getting the posterior over L wouldn’t be so bad.
   –  If L were fixed, maximizing over w wouldn’t be so bad.

Page 22: MAP Learning as a Graphical Model

[Figure: the same factor graph over w, L, and V, with factors exp(−R(w)) = p(w), pw(L), and pw(V | L), shown twice, labeled “E step” and “M step”.]

Page 23: Baum-Welch (EM for HMMs) as an Example

•  E step: forward-backward algorithm (on each example).
   –  This is exact marginal inference by variable elimination.
   –  The structure of the graphical model lets us do this by dynamic programming.
   –  The marginals are probabilities of transition and emission events at each position.

•  M step: MLE based on soft event counts.
   –  Relative frequency estimation accomplishes MLE for multinomials.
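A minimal sketch of one Baum-Welch iteration on a single observation sequence follows; the array names and the scaling scheme are assumptions, but the E step is the forward-backward recursions and the M step is relative-frequency estimation on soft counts:

```python
# One Baum-Welch iteration for a discrete HMM (illustrative sketch).
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """obs: int array of emissions; pi: initial dist (K,);
    A: transitions (K, K); B: emissions (K, V). Returns updated (pi, A, B)."""
    T, K = len(obs), len(pi)
    # E step: scaled forward-backward to avoid underflow.
    alpha = np.zeros((T, K)); beta = np.zeros((T, K)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                       # state posteriors per position
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = (alpha[:-1, :, None] * A[None] *      # transition-event posteriors
          (B[:, obs[1:]].T * beta[1:])[:, None, :] / c[1:, None, None])
    # M step: relative-frequency estimation from soft counts.
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for t in range(T):
        B_new[:, obs[t]] += gamma[t]
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new

pi0 = np.array([0.5, 0.5])
A0 = np.array([[0.7, 0.3], [0.4, 0.6]])
B0 = np.array([[0.9, 0.1], [0.2, 0.8]])
pi1, A1, B1 = baum_welch_step(np.array([0, 1, 0, 0, 1]), pi0, A0, B0)
```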

Page 24: Baum-Welch as a Graphical Model

[Figure: the HMM as a factor graph: latent states Y1, …, Yn; visible emissions X1, …, Xn; parameters w with “emit” and “transit” factors and regularizer R.]

Page 25: Active Trail!

[Figure: the same HMM factor graph with the Y’s latent; an active trail runs through the hidden states, connecting the parameters.]

Page 26: No Active Trail in All-Visible Case

[Figure: the same HMM factor graph with both X’s and Y’s visible; no active trail connects the parameters.]

Page 27: Why Latent Variables Make Learning Hard

•  New intuition: parameters that were not interdependent in the fully visible case are now interdependent.

•  It all goes back to active trails.

Page 28: “Viterbi” Learning is “Okay”!

[Figure: the factor graph over w, L, and V, with factors exp(−R(w)) = p(w), pw(L), and pw(V | L).]

•  Approximate joint MAP inference over w and L (most-probable-explanation inference).

•  Loss function: $\mathrm{loss}(v; h_w) = -\max_{\ell} \log p_w(v, \ell)$
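For contrast with the soft EM sketch above, here is a minimal “Viterbi” (hard) EM for the same assumed two-coin mixture: the E step's posterior is replaced by its argmax, a hard assignment per example.

```python
# "Viterbi" (hard) EM for the two-coin mixture (illustrative sketch).
import numpy as np
from scipy.stats import binom

def viterbi_em_coins(heads, n, w=(0.3, 0.6), iterations=20):
    heads = np.asarray(heads); w = np.array(w, dtype=float)
    for _ in range(iterations):
        # "E" step: most probable explanation, a hard assignment per example.
        lik = np.stack([binom.pmf(heads, n, p) for p in w])
        z = lik.argmax(axis=0)
        # M step: MLE on the completed data.
        for l in range(len(w)):
            if (z == l).any():
                w[l] = heads[z == l].sum() / (n * (z == l).sum())
    return w

print(viterbi_em_coins([9, 8, 2, 1, 7], 10))
```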

Page 29: Conditional Models

•  EM is usually closely associated with fully generative approaches.

•  You can do the same things with log-linear models and with conditional models.
   –  Locally normalized models give flexibility without requiring global inference (Berg-Kirkpatrick et al., 2010).
   –  Hidden-variable CRFs (Quattoni et al., 2007) are very powerful.

Page 30: Learning Conditional Hidden-Variable Models

[Figure: two factor graphs. Left: a conditional hidden-variable model over w, L, and Vout given Vin; the distribution over Vin is not modeled. Right: a standard conditional model (e.g., CRF), the same but with no latent L.]

Page 31: Optimization for Hidden Variables

•  We’ve described hidden-variable learning as an inference problem.

•  It is more practical, of course, to think about this as optimization.

•  EM can be understood from an optimization framework as well.

Page 32: EM and Likelihood

$$\Phi(w) = \sum_i \log \sum_{\ell} p_w(v_i, \ell)$$

•  The connection between this goal and the EM procedure is not immediately clear.

Page 33: Optimization View of EM

•  A function of w and the collection of qi.
•  Claim: EM performs coordinate ascent on this function.

$$\sum_i \left[ -\sum_{\ell} q_i(\ell)\log q_i(\ell) + \sum_{\ell} q_i(\ell)\log p_w(\ell \mid v_i) + \log p_w(v_i) \right]$$

Page 34: Optimization View of EM

•  The third term is our actual goal, Φ. It only depends on w (not the qi).

$$\sum_i \left[ -\sum_{\ell} q_i(\ell)\log q_i(\ell) + \sum_{\ell} q_i(\ell)\log p_w(\ell \mid v_i) + \underbrace{\log p_w(v_i)}_{\text{sums to }\Phi(w)} \right]$$

Page 35: Optimization View of EM

•  The latter two terms together are precisely what we maximize in the M step, given the current qi.
   –  This is a concave problem, and we solve it exactly.

$$\sum_i \Bigg[ -\sum_{\ell} q_i(\ell)\log q_i(\ell) + \underbrace{\sum_{\ell} q_i(\ell)\log p_w(\ell \mid v_i) + \log p_w(v_i)}_{=\, \sum_{\ell} q_i(\ell)\log p_w(v_i,\, \ell)} \Bigg]$$

Page 36: Optimization View of EM

•  Concern: is the M step improving the second term at the expense of Φ?
   –  No.

$$\sum_i \Bigg[ -\sum_{\ell} q_i(\ell)\log q_i(\ell) + \underbrace{\sum_{\ell} q_i(\ell)\log p_w(\ell \mid v_i) + \log p_w(v_i)}_{=\, \sum_{\ell} q_i(\ell)\log p_w(v_i,\, \ell)} \Bigg]$$

Page 37: The M Step

•  The second part is also not getting any worse from iteration to iteration:

$$\Phi(w) = \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_w(v_i, \ell) \;-\; \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_w(\ell \mid v_i)$$

The change in the second part between iterations is

$$-\sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_{w^{(t+1)}}(\ell \mid v_i) \;+\; \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_{w^{(t)}}(\ell \mid v_i)$$

$$= -\sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_{w^{(t+1)}}(\ell \mid v_i) \;+\; \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log q_i^{(t)}(\ell) \qquad \text{(the E step set } q_i^{(t)} = p_{w^{(t)}}(\cdot \mid v_i)\text{)}$$

$$= \sum_i D\!\left(q_i^{(t)}(\cdot) \,\middle\|\, p_{w^{(t+1)}}(\cdot \mid v_i)\right) \;\ge\; 0$$

Page 38: The M Step

•  Each M step, once the qi are fixed, maximizes a bound on the log-likelihood Φ.
   –  For fixed qi, this is a concave problem we can solve in closed form in many cases.

•  What about the E step?

Page 39: Optimization View of EM

•  The E step considers the first two terms.
•  It sets each qi equal to the posterior under the current model.

$$\sum_i \Bigg[ \underbrace{-\sum_{\ell} q_i(\ell)\log q_i(\ell) + \sum_{\ell} q_i(\ell)\log p_w(\ell \mid v_i)}_{=\, -D(q_i(\cdot) \,\|\, p_w(\cdot \mid v_i))} + \underbrace{\log p_w(v_i)}_{\text{sums to }\Phi(w)} \Bigg]$$

Page 40: Coordinate Ascent

•  The E step fixes w and solves for the qi.
•  The M step fixes all qi and solves for w.

$$\sum_i \Bigg[ \underbrace{-\sum_{\ell} q_i(\ell)\log q_i(\ell) + \sum_{\ell} q_i(\ell)\log p_w(\ell \mid v_i)}_{=\, -D(q_i(\cdot) \,\|\, p_w(\cdot \mid v_i)) \text{ (E step)}} + \log p_w(v_i) \Bigg]$$

Regrouping, the last two terms give the M-step objective $\sum_{\ell} q_i(\ell)\log p_w(v_i, \ell)$.
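A minimal numeric check of this bound, with assumed toy joint probabilities (not from the lecture): for any q, the bracketed objective lower-bounds the log-likelihood, and the E step's posterior attains it exactly.

```python
# The coordinate-ascent objective F(q, w) = sum_i sum_l q_i(l) [log p_w(v_i, l)
# - log q_i(l)], evaluated by brute force on a toy table (illustrative).
import numpy as np

def F(q, log_joint):
    """q: (N, L) responsibilities; log_joint: (N, L) values log p_w(v_i, l)."""
    return (q * (log_joint - np.log(q))).sum()

joint = np.array([[0.10, 0.20],      # p_w(v_1, l) for l = 1, 2
                  [0.05, 0.15]])     # p_w(v_2, l)
log_joint = np.log(joint)

uniform = np.full((2, 2), 0.5)
posterior = joint / joint.sum(axis=1, keepdims=True)  # E step: q_i = p_w(.|v_i)

print(F(uniform, log_joint))             # a lower bound on the log-likelihood
print(F(posterior, log_joint))           # the E step attains the bound exactly:
print(np.log(joint.sum(axis=1)).sum())   # equals sum_i log p_w(v_i)
```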

Page 41: Things People Forget About EM

•  Multiple random starts (or non-random starts); select using likelihood on development data.

•  Variants may help avoid local optima…
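A minimal sketch of multiple random restarts, reusing em_coins from the sketch above and selecting by held-out likelihood; the data and scoring helper are illustrative assumptions:

```python
# Random restarts for EM, selected by development-set likelihood (sketch).
import numpy as np
from scipy.stats import binom

train_heads, dev_heads, n = [9, 8, 2, 1, 7], [8, 3], 10

def dev_ll(w, pi, heads):
    """Held-out marginal log-likelihood under the two-coin mixture (assumed helper)."""
    mix = pi * binom.pmf(np.asarray(heads), n, w[0]) + \
          (1 - pi) * binom.pmf(np.asarray(heads), n, w[1])
    return np.log(mix).sum()

rng = np.random.default_rng(0)
best = max((em_coins(train_heads, n, w=rng.uniform(0.05, 0.95, 2))
            for _ in range(10)),
           key=lambda wp: dev_ll(wp[0], wp[1], dev_heads))
```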

Page 42: Variants of EM

•  “Online” variants, where we do an E step on one example or a mini-batch of examples, are still coordinate ascent (Neal and Hinton, 1998).

•  Deterministic annealing: flatten out the qi, making the function closer to concave.

•  Stochastic variant: use randomized approximate inference for the E step.

•  “Generalized” EM: improve w, but don’t bother optimizing completely.

Page 43: Direct Optimization

•  An alternative to EM: apply stochastic gradient ascent or quasi-Newton methods directly to Φ.

•  Typically done for Markov-network-like models with features, e.g., latent-variable CRFs.
   –  The gradient is a difference of feature expectations.
   –  Requires marginal inference.
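A minimal brute-force sketch of that gradient for a tiny globally normalized log-linear model (the model, feature table, and data are assumptions for illustration): the clamped expectation conditions on the observed v; the free expectation does not.

```python
# Gradient of the marginal log-likelihood of a log-linear model as a
# difference of feature expectations, by brute-force enumeration (sketch).
import numpy as np

def grad_marginal_ll(w, feats, v_obs):
    """feats: dict mapping (v, l) pairs to feature vectors f(v, l);
    v_obs: list of observed v's."""
    keys = list(feats)
    scores = np.array([w @ feats[k] for k in keys])
    p = np.exp(scores - scores.max()); p /= p.sum()   # p_w(v, l) over all pairs
    free = sum(p_k * feats[k] for p_k, k in zip(p, keys))  # E_{p_w(v,l)}[f]
    grad = np.zeros_like(w)
    for v in v_obs:
        idx = [i for i, (vk, _) in enumerate(keys) if vk == v]
        q = p[idx] / p[idx].sum()                     # p_w(l | v): clamped
        clamped = sum(q_k * feats[keys[i]] for q_k, i in zip(q, idx))
        grad += clamped - free
    return grad

feats = {("a", 0): np.array([1.0, 0.0]), ("a", 1): np.array([0.0, 1.0]),
         ("b", 0): np.array([1.0, 1.0]), ("b", 1): np.array([0.0, 0.0])}
print(grad_marginal_ll(np.array([0.5, -0.5]), feats, v_obs=["a", "b", "a"]))
```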

Page 44: Summary

•  EM: many ways to understand it.
   –  The guarantee: each round will not decrease the likelihood.
   –  That’s about as much as we can say.

•  Sometimes it works.
   –  Smart initializers
   –  Lots of bias inherent in the model structure/assumptions

