Recitation: Bayes Nets and Friends
Lyle Ungar and Tony Liu
Heavily adapted from slides by Mitch Marcus
Transcript
Page 1:

Recitation: Bayes Nets and Friends

Lyle Ungar and Tony Liu

Heavily adapted from slides by Mitch Marcus

Page 2:

Recitation Plan

◆ Naïve Bayes Exercise

◆ LDA Example

◆ Bayes Net Exercises

◆ HMM Example

Page 3:

Recall: Naïve Bayes

Page 4:

Recall: Naïve Bayes

◆ Conditional independence assumption

◆ As MAP estimator (uses prior for smoothing)

⚫ Contrast MLE – what’s the problem?
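To see the problem concretely, here is a minimal sketch (with invented counts for a hypothetical "spam" class) contrasting the MLE estimate of p(word | class) with the MAP estimate under an add-one (Laplace) prior:

    from collections import Counter

    # Invented word counts for one class; "meeting" never occurs with it
    spam_counts = Counter({"free": 30, "win": 15, "meeting": 0})
    vocab = ["free", "win", "meeting"]
    total = sum(spam_counts.values())

    # MLE: an unseen word gets probability 0, so the product
    # p(w1|c) * p(w2|c) * ... is zeroed out by a single unseen word
    p_mle = {w: spam_counts[w] / total for w in vocab}

    # MAP with an add-one prior: every estimate stays strictly positive,
    # so one unseen word cannot veto the whole class
    p_map = {w: (spam_counts[w] + 1) / (total + len(vocab)) for w in vocab}

    print(p_mle["meeting"])   # 0.0  <- the problem with plain MLE
    print(p_map["meeting"])   # 1/48, about 0.021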

Page 5:

Naïve Bayes Exercise

Page 6:

Recall: The LDA Model

Pages 7–11 (one slide, built up incrementally):

Recall: The LDA Model

[Figure: LDA graphical model: topic-word parameters β shared across three example documents, each with topic variables z1…z4 generating words w1…w4]

◆ For each document,

⚫ Choose the topic distribution θ ~ Dirichlet(α)

⚫ For each of the N words wn:

◼ Choose a topic zn ~ Multinomial(θ)

◼ Then choose a word wn ~ Multinomial(βzn)

Where each topic z has a different parameter vector βz for the words
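As a minimal sketch of this generative story (the topic count, vocabulary, and randomly drawn β below are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    n_topics = 3
    vocab = ["ball", "game", "vote", "law"]
    alpha = np.ones(n_topics)                                  # Dirichlet prior on topic mixtures
    beta = rng.dirichlet(np.ones(len(vocab)), size=n_topics)   # one word distribution per topic

    def generate_document(n_words):
        theta = rng.dirichlet(alpha)               # topic distribution for this document
        words = []
        for _ in range(n_words):
            z = rng.choice(n_topics, p=theta)      # choose a topic z ~ Multinomial(theta)
            w = rng.choice(len(vocab), p=beta[z])  # choose a word w ~ Multinomial(beta_z)
            words.append(vocab[w])
        return words

    print(generate_document(4))   # four words, like the z1..z4 / w1..w4 in the figure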

Page 12:

θd ~ Dirichlet(α)

zd,n ~ Multinomial(θd)

wd,n ~ Multinomial(βzd,n)

David Blei 2012: Probabilistic Topic Models (on course website)

Page 13:

LDA Parameter Estimation

◆ Given a corpus of documents, find the parameters α and β which maximize the likelihood of the observed data (words in documents), marginalizing over the hidden variables θ, z

◆ E-step:

⚫ Compute p(θ, z | w, α, β), the posterior of the hidden variables (θ, z) given each document w and parameters α and β

◆ M-step:

⚫ Estimate parameters α and β given the current hidden variable distribution estimates

θ: topic distribution for the document; z: topic for each word in the document

You don't need to know the details; only what is hidden and what is observed, and that EM works here.

Page 14:

LDA Exercise

Page 15:

Recall: Bayes Nets

Page 16:

Recall: Bayes Nets

◆ Local Markov Assumption

⚫ given its parents, each node is conditionally independent of everything except its descendants

◆ Active Trails

◆ D-separation

Page 17:

Active Trails

A trail {X1, X2, ⋯, Xk} in the graph (no cycles) is an active trail if, for each consecutive triplet in the trail, one of the following holds:

◆ Xi−1 → Xi → Xi+1, and Xi is not observed

◆ Xi−1 ← Xi ← Xi+1, and Xi is not observed

◆ Xi−1 ← Xi → Xi+1, and Xi is not observed

◆ Xi−1 → Xi ← Xi+1, and Xi is observed or one of its descendants is observed

Variables connected by active trails are not conditionally independent
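A brute-force sketch of this triplet rule (the DAG is passed as a node-to-parents map; the node names are illustrative):

    def is_active_trail(trail, parents, observed):
        # trail: list of node names; parents: node -> set of its parents in the DAG;
        # observed: set of observed node names
        def descendants(node):
            kids = {n for n, ps in parents.items() if node in ps}
            out = set(kids)
            for k in kids:
                out |= descendants(k)
            return out

        for a, b, c in zip(trail, trail[1:], trail[2:]):
            if a in parents.get(b, set()) and c in parents.get(b, set()):
                # v-structure a -> b <- c: active only if b or a descendant of b is observed
                if b not in observed and not (descendants(b) & observed):
                    return False
            elif b in observed:
                # chain or common cause: observing the middle node blocks the trail
                return False
        return True

    # Classic v-structure X -> Z <- Y ("explaining away"):
    parents = {"x": set(), "y": set(), "z": {"x", "y"}}
    print(is_active_trail(["x", "z", "y"], parents, set()))    # False: collider blocks the trail
    print(is_active_trail(["x", "z", "y"], parents, {"z"}))    # True: observing z activates it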

Page 18:

D-separation

◆ Variables Xi and Xj are independent if there is no active trail between Xi and Xj

⚫ given a set of observed variables O ⊂ {X1, ⋯, Xm}

⚫ O sometimes called a Markov Blanket
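Building on is_active_trail and the parents map from the sketch above, d-separation is then a brute-force check over all simple trails, which is fine for exercise-sized graphs:

    def d_separated(x, y, observed, parents):
        # x and y are d-separated given `observed` iff no simple trail between them is active
        nodes = set(parents).union(*parents.values())
        adj = {n: set() for n in nodes}
        for child, ps in parents.items():
            for p in ps:
                adj[p].add(child)   # a trail may traverse an edge in either direction
                adj[child].add(p)

        def trails(path):
            if path[-1] == y:
                yield path
                return
            for nxt in adj[path[-1]] - set(path):   # simple trails: no repeated nodes
                yield from trails(path + [nxt])

        return not any(is_active_trail(t, parents, observed) for t in trails([x]))

    print(d_separated("x", "y", set(), parents))    # True: X and Y independent a priori
    print(d_separated("x", "y", {"z"}, parents))    # False: observing Z couples X and Y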

Pages 19–22:

Bayes Net Exercises

Page 23:

Recall: Hidden Markov Models

Page 24:

Recall: Hidden Markov Models

◆ Markov assumption

◆ Parameterization

Page 25:

Parameters of an HMM

◆ States: A set of states S = s1,…,sk

◆ Markov transition probabilities: A =

a1,1,a1,2,…,ak,k Each ai,j = p(sj | si) represents the

probability of transitioning from state si to sj.

◆ Emission probabilities: A set B of functions of

the form bi(ot) = p(o|si) giving the probability of

observation ot being emitted by si

◆ Initial state distribution: the probability that si is

a start state
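As a sketch, these four ingredients fit in a small container like the one below (the class name, field names, and array shapes are my own choices, not from the slides):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class HMM:
        states: list        # S = s1, ..., sk
        A: np.ndarray       # k x k matrix with A[i, j] = p(s_j | s_i)
        B: np.ndarray       # k x m matrix with B[i, o] = p(observation o | s_i)
        pi: np.ndarray      # length-k initial state distribution

        def __post_init__(self):
            # every row of A and B, and pi itself, must be a probability distribution
            assert np.allclose(self.A.sum(axis=1), 1.0)
            assert np.allclose(self.B.sum(axis=1), 1.0)
            assert np.isclose(self.pi.sum(), 1.0)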

Page 26:

Markov Model Example

Markov Transition Matrix A (row: today's weather; column: tomorrow's weather):

Today's Weather    Tomorrow: Sunny    Tomorrow: Rainy
Sunny              0.8                0.2
Rainy              0.6                0.4

S1 = [0, 1]

Steady state at [0.75, 0.25]
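The steady-state claim is easy to verify by power iteration; a minimal sketch:

    import numpy as np

    A = np.array([[0.8, 0.2],   # row = today (Sunny, Rainy), column = tomorrow
                  [0.6, 0.4]])

    s = np.array([0.0, 1.0])    # S1 = [0, 1]: start on a rainy day
    for _ in range(50):
        s = s @ A               # propagate one day: s_{t+1} = s_t A
    print(s)                    # -> [0.75 0.25], the steady state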

Page 27:

Hidden Markov Model Example

Markov Transition Matrix A (row: today's weather; column: tomorrow's weather):

Today's Weather    Tomorrow: Sunny    Tomorrow: Rainy
Sunny              0.8                0.2
Rainy              0.6                0.4

Emission Probabilities B (column: today's weather):

                   Sunny    Rainy
Umbrella           0.1      0.8
No Umbrella        0.9      0.2

S1 = [0.5, 0.5]

We observe: (umbrella, no umbrella)

We can ask questions like:

⚫ What is the joint probability of the states (rain, sun) and our observations?
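For that question the joint factors as pi(rainy) * p(umbrella | rainy) * p(sunny | rainy) * p(no umbrella | sunny); a minimal sketch using the tables above:

    pi = {"sunny": 0.5, "rainy": 0.5}
    A = {("sunny", "sunny"): 0.8, ("sunny", "rainy"): 0.2,
         ("rainy", "sunny"): 0.6, ("rainy", "rainy"): 0.4}
    B = {("sunny", "umbrella"): 0.1, ("sunny", "no umbrella"): 0.9,
         ("rainy", "umbrella"): 0.8, ("rainy", "no umbrella"): 0.2}

    states = ["rainy", "sunny"]          # hypothesized hidden states (rain, sun)
    obs = ["umbrella", "no umbrella"]    # what we observed

    p = pi[states[0]] * B[(states[0], obs[0])]
    for t in range(1, len(states)):
        p *= A[(states[t - 1], states[t])] * B[(states[t], obs[t])]
    print(p)   # 0.5 * 0.8 * 0.6 * 0.9 = 0.216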

Page 28:

HMM Exercise

