
An Introduction to Topic Modeling

Daniel W. Peterson

Department of Computer Science, University of Colorado at Boulder

daniel.w.peterson@colorado.edu

April 24, 2013


Latent Semantic Analysis

Documents x Terms matrix: large and sparse

Use SVD to decompose it into three matrices

Keep only the “important” dimensions

Assumptions:

Word order doesn’t matter

Words are orthogonal dimensions in a high-dimensional space

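To make this concrete, here is a minimal numpy sketch of LSA; the toy document-term matrix and the choice of k = 2 are illustrative assumptions, not from the talk:

```python
import numpy as np

# Toy documents x terms count matrix (rows: documents, columns: terms)
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 1],
    [0, 0, 3, 1, 0],
    [0, 1, 1, 2, 0],
], dtype=float)

# SVD decomposes X into three matrices: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k "important" dimensions (largest singular values)
k = 2
doc_vectors = U[:, :k] * s[:k]  # documents in the reduced semantic space
print(doc_vectors)
```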

Probabilistic Latent Semantic Analysis

Documents are generated by a probabilistic process

Structure based on topics

Different topics make different words more likely

Assumptions:

Word order doesn’t matter

Each word is chosen as the result of exactly one topic


Probabilistic Latent Semantic Analysis

N documents

A document is L words long

Each entry has an assignment to one of K topics


Probabilistic Latent Semantic Analysis

How do we choose a topic? We sample from a distribution over topics.

How do we choose a word? We sample from a distribution over words.


Multinomial Distribution

Select one of several possible outcomes

Outcomes may be equally likely (like dice)

OR: some outcomes may be more likely than others (load the dice)

Looks like: a 1 × n vector of probabilities

[x_1, x_2, ..., x_n] such that x_1 + x_2 + ... + x_n = 1, and every x_i > 0

A sample looks like: a number

The outcome of rolling the dice

The probability we get i is given by x_i

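A quick numpy illustration; the probability vector below is made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# A multinomial over 6 outcomes: a loaded die (illustrative probabilities)
x = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.5])
assert np.isclose(x.sum(), 1.0) and (x > 0).all()

# A sample is a single number: the outcome of rolling the die once.
# The probability of getting outcome i is x[i].
outcome = rng.choice(len(x), p=x)
print(outcome)
```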

Probabilistic Latent Semantic Analysis

θ is a distribution over topics in a document

One θ for each document

θ is a 1 × K vector

Sum of θ is 1

φ is a distribution over words in a topic

One φ for each topic

φ is a 1 × W vector

Sum of φ is 1

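As a sketch of the generative story so far, assuming θ and φ are simply given; the sizes and values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

K, W, L = 3, 8, 10                     # topics, vocabulary size, document length
theta = np.array([0.5, 0.3, 0.2])      # 1 x K: topic distribution for one document
phi = rng.random((K, W))               # K x W: one word distribution per topic
phi /= phi.sum(axis=1, keepdims=True)  # each row sums to 1

doc = []
for _ in range(L):
    z = rng.choice(K, p=theta)   # sample a topic from theta
    w = rng.choice(W, p=phi[z])  # sample a word from that topic's phi
    doc.append(w)
print(doc)
```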

Probabilistic Latent Semantic Analysis

Fold θ into the graphical model

Where do θ and φ come from?


Topic Modeling

Sample θ and φ from an appropriate distribution

Dirichlet: a distribution over distributions

Incorporating a Dirichlet prior provides smoothing


Dirichlet Distribution

Takes n parameters α_1, α_2, ..., α_n

Distribution over 1 × n vectors with sum of 1

The α_i are called concentration parameters


Dirichlet Distribution with 2 Parameters

Figure: Image source: Wikipedia


Dirichlet Distribution with 3 Parameters

Figure: Image source: Yee Whye Teh


A Sample from a Dirichlet

A particular 1 × n vector with sum of 1

[x_1, x_2, ..., x_n] such that x_1 + x_2 + ... + x_n = 1

every x_i > 0

A multinomial distribution

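A minimal check with numpy (the concentration parameters are chosen arbitrarily): each draw from a Dirichlet is itself a probability vector, i.e. a multinomial distribution.

```python
import numpy as np

rng = np.random.default_rng(2)

alpha = np.array([2.0, 1.0, 0.5])  # arbitrary concentration parameters
sample = rng.dirichlet(alpha)      # one 1 x n vector

print(sample)        # e.g. [0.62 0.30 0.08]
print(sample.sum())  # 1.0: a valid multinomial distribution
```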

Topic Modeling

Sample θ and φ from a Dirichlet distribution

This is important when we turn the model around:

The Dirichlet distribution is the conjugate prior of the multinomial:

Given a Dirichlet prior and counts of topic assignments, the posterior is also Dirichlet

β and γ are smoothing parameters

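Putting the pieces together, a hedged sketch of the full generative model. Following the slides' notation, β smooths the per-document topic distribution θ and γ smooths the per-topic word distribution φ; all sizes and hyperparameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

N, L, K, W = 5, 12, 3, 10  # documents, words per document, topics, vocabulary
beta, gamma = 0.5, 0.1     # smoothing (concentration) parameters

# One word distribution per topic, drawn from a Dirichlet prior
phi = rng.dirichlet(gamma * np.ones(W), size=K)

corpus = []
for _ in range(N):
    theta = rng.dirichlet(beta * np.ones(K))  # per-document topic distribution
    doc = [rng.choice(W, p=phi[rng.choice(K, p=theta)]) for _ in range(L)]
    corpus.append(doc)
print(corpus[0])
```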

Inference

Generative model explains how the data was created

Inference: trying to guess model parameters


Gibbs Sampling

Hard to determine most likely model parameters

Even relatively likely parameters are hard to find

Can’t sample from the overall distribution: sample a single variable instead

Take a walk through the distribution:

One step (parameter) at a time

Spend more time walking around more likely areas

We can get to likely areas from anywhere

It doesn’t matter where we start!

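The slides don't include one, but a toy example may help: a Gibbs sampler for a 2D Gaussian with correlation ρ (a made-up target, not from the talk), walking one variable at a time using its exact conditional. Note how the deliberately bad starting point stops mattering after burn-in.

```python
import numpy as np

rng = np.random.default_rng(4)
rho = 0.8         # correlation of a toy standard bivariate normal
x, y = 5.0, -5.0  # a deliberately bad starting point

samples = []
for step in range(5000):
    # Each conditional of a bivariate normal is a 1D normal:
    # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))  # one step: resample x
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # one step: resample y
    samples.append((x, y))

kept = np.array(samples[1000:])  # discard burn-in from the bad start
print(kept.mean(axis=0))         # close to (0, 0) despite starting at (5, -5)
```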

Gibbs Sampling in a Topic Model

Start with random assignment of topics

For each <word, document> pair:

Sample θ based on counts and prior

Sample φ based on counts and prior

Choose k based on θ, φ, and w

Repeat the above many times

Smoothing (β and γ) is very important

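A hedged sketch of this loop, using the conjugacy fact from earlier: θ and φ are resampled from their Dirichlet posteriors (prior plus current counts), then each topic assignment k is redrawn. For brevity this version resamples θ and φ once per document rather than per <word, document> pair; the corpus and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy corpus: token ids per document; K topics over a W-word vocabulary
docs = [[0, 1, 1, 2, 0], [3, 4, 5, 3, 4]]
K, W = 2, 6
beta, gamma = 0.5, 0.1  # smoothing parameters, as in the slides

# Start with a random assignment of topics
z = [rng.integers(K, size=len(d)) for d in docs]

for sweep in range(200):  # repeat many times
    # Counts of the current assignment
    n_dk = np.zeros((len(docs), K))  # topic counts per document
    n_kw = np.zeros((K, W))          # word counts per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            n_dk[d, z[d][i]] += 1
            n_kw[z[d][i], w] += 1
    for d, doc in enumerate(docs):
        # Sample theta and phi from Dirichlet posteriors: counts + prior
        theta = rng.dirichlet(n_dk[d] + beta)
        phi = np.array([rng.dirichlet(n_kw[k] + gamma) for k in range(K)])
        for i, w in enumerate(doc):
            p = theta * phi[:, w]  # choose k based on theta, phi, and w
            z[d][i] = rng.choice(K, p=p / p.sum())

print([list(map(int, zi)) for zi in z])
```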

Bayes Rule

P(k | β, X) ∝ P(k | β) P(X | k)

Sampling from a conditional distribution can be broken down into sampling based on the parent nodes (prior, β) and the children (likelihood, X)


Blocked Gibbs Sampling in a Topic Model

Start with random assignment of topics

Repeat many times:

Sample all θ and φ from counts and prior

Choose k for a number of <word, document> pairs

More sampling, less counting


Collapsed Gibbs Sampling in a Topic Model

Integrate out θ and φ

Start with random assignment of topics

For each <word, document> pair:

Sample k directly from counts

Repeat many times

P(z_i = k | z_{-i}, w) ∝ [(n^{(w_i)}_{-i,k} + γ) / (n^{(·)}_{-i,k} + Wγ)] × [(n^{(d_i)}_{-i,k} + β) / (n^{(d_i)}_{-i,·} + Kβ)]

Here n^{(w_i)}_{-i,k} counts assignments of word w_i to topic k, n^{(d_i)}_{-i,k} counts tokens in document d_i assigned to topic k, a dot (·) sums over that index, and the subscript -i means the current token is excluded from the counts.

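A sketch of the collapsed sampler implementing the formula above; only count tables are maintained, since θ and φ have been integrated out. The toy corpus and hyperparameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy corpus: token ids per document; K topics over a W-word vocabulary
docs = [[0, 1, 1, 2, 0], [3, 4, 5, 3, 4], [0, 2, 2, 1, 5]]
K, W = 2, 6
beta, gamma = 0.5, 0.1

# Random initial assignment, plus the count tables the formula needs
z = [list(rng.integers(K, size=len(d))) for d in docs]
n_kw = np.zeros((K, W))          # n^{(w)}_k: times word w is assigned to topic k
n_k = np.zeros(K)                # n^{(.)}_k: total assignments to topic k
n_dk = np.zeros((len(docs), K))  # n^{(d)}_k: count of topic k in document d
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        n_kw[k, w] += 1; n_k[k] += 1; n_dk[d, k] += 1

for sweep in range(500):  # repeat many times
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove this token from the counts: the "-i" in the formula
            n_kw[k, w] -= 1; n_k[k] -= 1; n_dk[d, k] -= 1
            # Word term x document term; the document denominator
            # (len(doc) - 1 + K*beta) is constant in k, so it cancels
            p = (n_kw[:, w] + gamma) / (n_k + W * gamma) * (n_dk[d] + beta)
            k = int(rng.choice(K, p=p / p.sum()))
            z[d][i] = k
            n_kw[k, w] += 1; n_k[k] += 1; n_dk[d, k] += 1

print(z)
```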