Modeling Science
David M. Blei
Department of Computer Science, Princeton University
October 3, 2007
Joint work with John Lafferty (CMU)
D. Blei Modeling Science 1 / 49
Modeling Science
Science, August 13, 1886
water acid disease
milk water blood
food solution cholera
dry experiments bacteria
fed liquid found
cows chemical bacillus
houses action experiments
butter copper organisms
fat crystals bacilli
found carbon cases
made alcohol diseases
contained made germs
wells obtained animal
produced substances koch
poisonous nitrogen made
Science, June 24, 1994
evolution rna disease
evolutionary mrna host
species site bacteria
organisms splicing diseases
biology rnas new
phylogenetic nuclear bacterial
life sequence resistance
origin introns control
diversity messenger strains
groups cleavage infectious
molecular two malaria
animals splice parasites
two sequences parasite
new polymerase tuberculosis
living intron health
• Our data are the issues of Science from 1880 to 2002, courtesy of JSTOR.
• JSTOR is an online archive that scans the original volumes and performs optical character recognition on the scans.
• This process results in 130K documents and 76M words.
• Discover the hidden thematic structure with hierarchical probabilistic models called topic models.
• Use this structure for browsing, search, and similarity assessment.
Discover topics from a corpus
“Genetics”     “Evolution”     “Disease”       “Computers”
human          evolution       disease         computer
genome         evolutionary    host            models
dna            species         bacteria        information
genetic        organisms       diseases        data
genes          life            resistance      computers
sequence       origin          bacterial       system
gene           biology         new             network
molecular      groups          strains         systems
sequencing     phylogenetic    control         model
map            living          infectious      parallel
information    diversity       malaria         methods
genetics       group           parasite        networks
mapping        new             parasites       software
project        two             united          new
sequences      common          tuberculosis    simulations
Annotate unlabeled images
Automatic image annotation: predicted captions for six images (images not reproduced)
• birds nest leaves branch tree
• people market pattern textile display
• sky water tree mountain people
• fish water ocean tree coral
• sky water buildings people mountain
• scotland water flowers hills tree
Model the evolution of topics over time
[Figure: two panels of per-term trends over time, x-axis 1880 to 2000. Panel "Theoretical Physics": RELATIVITY, LASER, FORCE. Panel "Neuroscience": NERVE, OXYGEN, NEURON.]
Model connections between topics
[Figure: graph of topics, each node labeled with its top terms. The edges are not recoverable from the extracted text. Node labels:]
• wild type, mutant, mutations, mutants, mutation
• plants, plant, gene, genes, arabidopsis
• p53, cell cycle, activity, cyclin, regulation
• amino acids, cdna, sequence, isolated, protein
• gene, disease, mutations, families, mutation
• rna, dna, rna polymerase, cleavage, site
• cells, cell, expression, cell lines, bone marrow
• united states, women, universities, students, education
• science, scientists, says, research, people
• research, funding, support, nih, program
• surface, tip, image, sample, device
• laser, optical, light, electrons, quantum
• materials, organic, polymer, polymers, molecules
• volcanic, deposits, magma, eruption, volcanism
• mantle, crust, upper mantle, meteorites, ratios
• earthquake, earthquakes, fault, images, data
• ancient, found, impact, million years ago, africa
• climate, ocean, ice, changes, climate change
• cells, proteins, researchers, protein, found
• patients, disease, treatment, drugs, clinical
• genetic, population, populations, differences, variation
• fossil record, birds, fossils, dinosaurs, fossil
• sequence, sequences, genome, dna, sequencing
• bacteria, bacterial, host, resistance, parasite
• development, embryos, drosophila, genes, expression
• species, forest, forests, populations, ecosystems
• synapses, ltp, glutamate, synaptic, neurons
• neurons, stimulus, motor, visual, cortical
• ozone, atmospheric, measurements, stratosphere, concentrations
• sun, solar wind, earth, planets, planet
• co2, carbon, carbon dioxide, methane, water
• receptor, receptors, ligand, ligands, apoptosis
• proteins, protein, binding, domain, domains
• activated, tyrosine phosphorylation, activation, phosphorylation, kinase
• magnetic, magnetic field, spin, superconductivity, superconducting
• physicists, particles, physics, particle, experiment
• surface, liquid, surfaces, fluid, model
• reaction, reactions, molecule, molecules, transition state
• enzyme, enzymes, iron, active site, reduction
• pressure, high pressure, pressures, core, inner core
• brain, memory, subjects, left, task
• computer, problem, information, computers, problems
• stars, astronomers, universe, galaxies, galaxy
• virus, hiv, aids, infection, viruses
• mice, antigen, t cells, antigens, immune response
Outline
1 Introduction
2 Latent Dirichlet allocation
3 Dynamic topic models
4 Correlated topic models
Probabilistic modeling
• Treat data as observations that arise from a generative probabilistic process that includes hidden variables
  • For documents, the hidden variables reflect the thematic structure of the collection.
• Infer the hidden structure using posterior inference
  • What are the topics that describe this collection?
• Situate new data into the estimated model.
  • How does this query or new document fit into the estimated topic structure?
Intuition behind LDA
Simple intuition: Documents exhibit multiple topics.
Generative process
• Cast these intuitions into a generative probabilistic process
• Each document is a random mixture of corpus-wide topics
• Each word is drawn from one of those topics
• In reality, we only observe the documents
• Our goal is to infer the underlying topic structure
  • What are the topics?
  • How are the documents divided according to those topics?
Graphical models (Aside)
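[Figure: a node Y with edges to X1, X2, . . . , XN, shown as equivalent (≡) to a node Y with an edge to Xn inside a plate labeled N]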
• Nodes are random variables
• Edges denote possible dependence
• Observed variables are shaded
• Plates denote replicated structure
• The structure of the graph defines the pattern of conditional dependence among the random variables
• E.g., this graph corresponds to
$$ p(y, x_1, \ldots, x_N) = p(y) \prod_{n=1}^{N} p(x_n \mid y) $$
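The factorization above can be checked numerically on a toy model. The following is a minimal sketch, assuming binary Y and binary X_n; the probability tables `p_y` and `p_x_given_y` are made-up illustrative numbers, not values from the talk.

```python
import itertools

# Hypothetical toy tables (assumptions for illustration only).
p_y = {0: 0.6, 1: 0.4}
# p_x_given_y[y][x] = p(x_n = x | y); the plate means every n shares this table.
p_x_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}

def joint(y, xs):
    """p(y, x_1, ..., x_N) = p(y) * prod_n p(x_n | y)."""
    prob = p_y[y]
    for x in xs:
        prob *= p_x_given_y[y][x]
    return prob

N = 3
# Summing the joint over every configuration must give 1.
total = sum(
    joint(y, xs)
    for y in (0, 1)
    for xs in itertools.product((0, 1), repeat=N)
)
print(total)
```

Because the plate replicates one conditional table, the model needs only two tables no matter how large N is.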
Latent Dirichlet allocation
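[Figure: plate diagram for LDA with variables α, θd, Zd,n, Wd,n, βk, η and plates N (words), D (documents), K (topics)]
• α: Dirichlet parameter
• θd: Per-document topic proportions
• Zd,n: Per-word topic assignment
• Wd,n: Observed word
• βk: Topics
• η: Topic hyperparameter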
1 Draw each topic βk ∼ Dir(η), for k ∈ {1, . . . , K}.
2 For each document d:
  1 Draw topic proportions θd ∼ Dir(α).
  2 For each word n:
    1 Draw Zd,n ∼ Mult(θd).
    2 Draw Wd,n ∼ Mult(βZd,n).
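The generative process can be simulated directly. This is a minimal sketch using only the Python standard library; the function names, the symmetric scalar hyperparameters, the fixed document length N, and the corpus sizes are assumptions chosen for illustration.

```python
import random

def dirichlet(alpha, rng):
    """Sample from Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def generate_corpus(D, N, K, V, alpha, eta, seed=0):
    """Simulate D documents of N words each from LDA with K topics over V terms."""
    rng = random.Random(seed)
    # Step 1: draw each topic beta_k ~ Dir(eta), a distribution over the vocabulary.
    beta = [dirichlet([eta] * V, rng) for _ in range(K)]
    thetas, docs = [], []
    for _ in range(D):
        # Step 2.1: draw per-document topic proportions theta_d ~ Dir(alpha).
        theta = dirichlet([alpha] * K, rng)
        words = []
        for _ in range(N):
            # Step 2.2.1: draw the topic assignment Z_{d,n} ~ Mult(theta_d).
            z = rng.choices(range(K), weights=theta)[0]
            # Step 2.2.2: draw the word W_{d,n} ~ Mult(beta_{Z_{d,n}}).
            words.append(rng.choices(range(V), weights=beta[z])[0])
        thetas.append(theta)
        docs.append(words)
    return beta, thetas, docs

beta, thetas, docs = generate_corpus(D=5, N=20, K=3, V=10, alpha=0.5, eta=0.5)
print(docs[0])
```

Only `docs` would be observed in practice; `beta` and `thetas` are the hidden structure that inference tries to recover.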
• From a collection of documents, infer
  • Per-word topic assignment zd,n
  • Per-document topic proportions θd
  • Per-corpus topic distributions βk
• Use posterior expectations to perform the task at hand, e.g., information retrieval, document similarity, etc.
• Computing the posterior is intractable:
$$ \frac{p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta_{1:K})}{\int_{\theta} p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n=1}^{K} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta_{1:K})} $$
• Several approximation techniques have been developed.
• Mean field variational methods (Blei et al., 2001, 2003)
• Expectation propagation (Minka and Lafferty, 2002)
• Collapsed Gibbs sampling (Griffiths and Steyvers, 2002)
• Collapsed variational inference (Teh et al., 2006)
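As an illustration of one technique in the list, here is a minimal sketch of collapsed Gibbs sampling for LDA. It is not the variational method used for the results in this talk. The function name, the toy corpus, and the symmetric scalar hyperparameters `alpha` and `eta` are assumptions for illustration.

```python
import random

def collapsed_gibbs(docs, K, V, alpha, eta, iters, seed=0):
    """Resample each topic assignment given all others, with theta and beta integrated out."""
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]      # words in document d assigned to topic k
    nkv = [[0] * V for _ in range(K)]  # times term v is assigned to topic k
    nk = [0] * K                       # total words assigned to topic k
    z = []
    # Random initialization of the assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1; nkv[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][n]
                ndk[d][k] -= 1; nkv[k][w] -= 1; nk[k] -= 1
                # Conditional of z_{d,n} given everything else, up to a constant.
                weights = [
                    (ndk[d][j] + alpha) * (nkv[j][w] + eta) / (nk[j] + V * eta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][n] = k
                ndk[d][k] += 1; nkv[k][w] += 1; nk[k] += 1
    return z, ndk, nkv

# Hypothetical toy corpus: word ids in 0..4.
toy_docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 1, 0]]
z, ndk, nkv = collapsed_gibbs(toy_docs, K=2, V=5, alpha=0.5, eta=0.1, iters=50)
print(ndk)
```

The counts `ndk` and `nkv`, smoothed by `alpha` and `eta` and normalized, give point estimates of the topic proportions and the topics.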
Example inference
• Data: The OCR’ed collection of Science from 1990–2000
  • 17K documents
  • 11M words
  • 20K unique terms (stop words and rare words removed)
• Model: 100-topic LDA model using variational inference.
[Figure: bar chart; x-axis "Topics" (ticks 1 to 96), y-axis "Probability" (0.0 to 0.4)]
Example topics
“Genetics”     “Evolution”     “Disease”       “Computers”
human          evolution       disease         computer
genome         evolutionary    host            models
dna            species         bacteria        information
genetic        organisms       diseases        data
genes          life            resistance      computers
sequence       origin          bacterial       system
gene           biology         new             network
molecular      groups          strains         systems
sequencing     phylogenetic    control         model
map            living          infectious      parallel
information    diversity       malaria         methods
genetics       group           parasite        networks
mapping        new             parasites       software
project        two             united          new
sequences      common          tuberculosis    simulations
LDA discussion
• LDA is a powerful model for
  • Visualizing the hidden thematic structure in large corpora
  • Generalizing new data to fit into that structure
• LDA is a mixed membership model (Erosheva, 2004) that builds on the work of Deerwester et al. (1990) and Hofmann (1999).
  • For document collections and other grouped data, this might be more appropriate than a simple finite mixture
  • See Blei et al., 2003 for a quantitative comparison.
• Modular: It can be embedded in more complicated models.
• General: The data generating distribution can be changed.
• Variational inference is fast; it allows us to analyze large data sets.
• Code to play with LDA is freely available on my website, http://www.cs.princeton.edu/~blei.
LDA and exchangeability
• LDA assumes that documents are exchangeable.
  • I.e., their joint probability is invariant to permutation.
• This is too restrictive.
Documents are not exchangeable
"Infrared Reflectance in Leaf-Sitting Neotropical Frogs" (1977)
"Instantaneous Photography" (1890)
• Documents about the same topic are not exchangeable.
• Topics evolve over time.
Dynamic topic model
• Divide the corpus into sequential slices (e.g., by year).
• Assume each slice’s documents are exchangeable.
  • They are drawn from an LDA model.
• Allow the topic distributions to evolve from slice to slice.
Dynamic topic models
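[Figure: plate diagram of the dynamic topic model. Each time slice has its own LDA structure (α, θd, Zd,n, Wd,n, plates N and D); the slices are linked through the topic chain βk,1, βk,2, . . . , βk,T inside a plate K.]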
Modeling evolving topics
• Use a logistic normal distribution to model topics evolving over time (Aitchison, 1980)
• A state-space model on the natural parameter of the topic multinomial (West and Harrison, 1997)
$$ \beta_{t,k} \mid \beta_{t-1,k} \sim \mathcal{N}(\beta_{t-1,k}, \sigma^2 I) $$
$$ p(w \mid \beta_{t,k}) = \exp\Big\{ \beta_{t,k,w} - \log\Big(1 + \sum_{v=1}^{V-1} \exp\{\beta_{t,k,v}\}\Big) \Big\} $$
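The state-space model can be simulated for a single topic. This is a minimal sketch: the natural parameters take Gaussian random-walk steps and are mapped to word probabilities with the log-normalizer log(1 + Σ exp β). Starting the chain at zero and the values of `T`, `V`, and `sigma` are assumptions for illustration; the slide does not specify an initial state.

```python
import math
import random

def to_probabilities(beta):
    """Map V-1 natural parameters to V word probabilities.

    The last term is the reference with natural parameter 0, so
    p(w) = exp(beta_w - log(1 + sum_v exp(beta_v))).
    """
    shift = max(max(beta), 0.0)  # subtract the max for numerical stability
    exps = [math.exp(b - shift) for b in beta] + [math.exp(-shift)]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_topic(T, V, sigma, seed=0):
    """Random walk beta_t | beta_{t-1} ~ N(beta_{t-1}, sigma^2 I) for one topic."""
    rng = random.Random(seed)
    beta = [0.0] * (V - 1)  # assumed starting point, not from the talk
    path = []
    for _ in range(T):
        beta = [b + rng.gauss(0.0, sigma) for b in beta]
        path.append(to_probabilities(beta))
    return path

path = simulate_topic(T=5, V=4, sigma=0.3)
for t, dist in enumerate(path, start=1):
    print(t, [round(p, 3) for p in dist])
```

A small `sigma` keeps consecutive distributions close, which is what lets a topic drift smoothly from slice to slice.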
Posterior inference
• Our goal is to compute the posterior distribution,
$$ p(\beta_{1:T,1:K}, \theta_{1:T,1:D}, z_{1:T,1:D} \mid w_{1:T,1:D}). $$
• Exact inference is impossible
  • Per-document mixed-membership model
  • Non-conjugacy between p(w | βt,k) and p(βt,k)
• MCMC is not practical for the amount of data.
• Solution: Variational inference
Variational inference
• Define a family of distributions q on the latent variables indexed by free variational parameters.
• Find the member closest in KL(q||p) to the true posterior.
• Equivalently, maximize Jensen’s bound on the marginal likelihood of the data, within the variational family.
• See Jordan et al. (1999) and Wainwright and Jordan (2003).
• (More details at the end of the talk, if you are interested.)
Science data
TECHVIEW: DNA SEQUENCING
Sequencing the Genome, Fast
James C. Mullikin and Amanda A. McMurray
Genome sequencing projects reveal the genetic makeup of an organism by reading off the sequence of the DNA bases, which encodes all of the information necessary for the life of the organism. The base sequence contains four nucleotides-adenine, thymidine, guanosine, and cytosine-which are linked together into long double-helical chains. Over the last two decades, automated DNA sequencers have made the process of obtaining the base-by-base sequence of DNA...
• Analyze JSTOR’s entire collection from Science (1880-2002)
• No reliable punctuation, meta-data, or references
• Restrict to 30K terms that occur more than ten times
• The data are 76M words in 130K documents
Analyzing a document
[Figure: the original article shown next to its inferred topic proportions]
Most likely words from the top topics:
• sequence, genome, genes, sequences, human, gene, dna, sequencing, chromosome, regions, analysis, data, genomic, number
• devices, device, materials, current, high, gate, light, silicon, material, technology, electrical, fiber, power, based
• data, information, network, web, computer, language, networks, time, software, system, words, algorithm, number, internet
Analyzing a topic
Top words of one topic, by decade:
1880: electric, machine, power, engine, steam, two, machines, iron, battery, wire
1890: electric, power, company, steam, electrical, machine, two, system, motor, engine
1900: apparatus, steam, power, engine, engineering, water, construction, engineer, room, feet
1910: air, water, engineering, apparatus, room, laboratory, engineer, made, gas, tube
1920: apparatus, tube, air, pressure, water, glass, gas, made, laboratory, mercury
1930: tube, apparatus, glass, air, mercury, laboratory, pressure, made, gas, small
1940: air, tube, apparatus, glass, laboratory, rubber, pressure, small, mercury, gas
1950: tube, apparatus, glass, air, chamber, instrument, small, laboratory, pressure, rubber
1960: tube, system, temperature, air, heat, chamber, power, high, instrument, control
1970: air, heat, power, system, temperature, chamber, high, flow, tube, design
1980: high, power, design, heat, system, systems, devices, instruments, control, large
1990: materials, high, power, current, applications, technology, devices, design, device, heat
2000: devices, device, materials, current, gate, high, light, silicon, material, technology
Visualizing trends within a topic
[Figure: two panels of per-term trends over time, x-axis 1880 to 2000. Panel "Theoretical Physics": RELATIVITY, LASER, FORCE. Panel "Neuroscience": NERVE, OXYGEN, NEURON.]
Time-corrected document similarity
• Consider the expected Hellinger distance between the topic proportions of two documents,
$$ d_{ij} = \mathrm{E}\Big[ \sum_{k=1}^{K} \big( \sqrt{\theta_{i,k}} - \sqrt{\theta_{j,k}} \big)^2 \,\Big|\, w_i, w_j \Big] $$
• Uses the latent structure to define similarity
• Time has been factored out because the topics associated with the components are different from year to year.
• Similarity based only on topic proportions
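The distance above is straightforward to compute from topic proportions. This minimal sketch evaluates the inner sum exactly as written on the slide (the textbook squared Hellinger distance carries an extra factor of 1/2) and approximates the expectation by averaging over paired samples. The function names and the sample values are hypothetical.

```python
import math

def hellinger_sum(theta_i, theta_j):
    """sum_k (sqrt(theta_i[k]) - sqrt(theta_j[k]))^2, as in the slide's formula."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(theta_i, theta_j))

def expected_distance(samples_i, samples_j):
    """Monte Carlo estimate of d_ij from paired samples of the topic proportions."""
    values = [hellinger_sum(a, b) for a, b in zip(samples_i, samples_j)]
    return sum(values) / len(values)

# Hypothetical samples of topic proportions for two documents (K = 3).
samples_i = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
samples_j = [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]
d_ij = expected_distance(samples_i, samples_j)
print(d_ij)
```

Because the inputs are topic proportions rather than words, two documents from different eras can be close even when they share little vocabulary.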
• The Brain of the Orang (1880)
• Representation of the Visual Field on the Medial Wall of Occipital-Parietal Cortex in the Owl Monkey (1976)
Browser of Science
Quantitative comparison
• Compute the probability of each year’s documents conditional on all the previous years’ documents,
$$ p(w_t \mid w_1, \ldots, w_{t-1}) $$
• Compare exchangeable and dynamic topic models
[Figure: per-word negative log likelihood (y-axis, ticks 10 to 25) by year (x-axis, 1920 to 2000), with one series for LDA and one for the DTM]
Dynamic topic models discussion
• The DTM is a hierarchical model of sequential document collections.
• Exchangeability assumptions should be taken seriously.
• Variational methods allow large-scale posterior inference.
• Examining the latent structure yields useful browsing tools
• Some open issues
  • Model selection: choosing the number of topics
  • Variational inference: what are the hidden assumptions?
The hidden assumptions of the Dirichlet distribution
• The Dirichlet is an exponential family distribution on the simplex, positive vectors that sum to one.
• However, the near independence of components makes it a poor choice for modeling topic proportions.
• An article about fossil fuels is more likely to also be about geology than about genetics.
The logistic normal distribution
• The logistic normal is a distribution on the simplex that can model dependence between components.
• The natural parameters of the multinomial are drawn from a multivariate Gaussian distribution.
$$ X \sim \mathcal{N}_{K-1}(\mu, \Sigma) $$
$$ \theta_i = \exp\Big\{ x_i - \log\Big(1 + \sum_{j=1}^{K-1} \exp\{x_j\}\Big) \Big\} $$
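The effect of Σ can be seen by sampling. This is a minimal sketch for K = 3 topics, so X is two-dimensional: a positive correlation `rho` between x1 and x2 can produce positively correlated proportions θ1 and θ2, something a Dirichlet cannot do because its components are always negatively correlated. The function names and the values of `mu`, `var`, and `rho` are assumptions for illustration.

```python
import math
import random

def sample_logistic_normal(mu, var, rho, rng):
    """Draw theta on the 3-simplex from a logistic normal with a 2-d Gaussian X."""
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    sd = math.sqrt(var)
    # Build correlated Gaussians from independent ones (2x2 Cholesky factor).
    x1 = mu[0] + sd * e1
    x2 = mu[1] + sd * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
    norm = 1.0 + math.exp(x1) + math.exp(x2)
    # theta_i = exp(x_i - log(1 + sum_j exp(x_j))); the third component is the reference.
    return [math.exp(x1) / norm, math.exp(x2) / norm, 1.0 / norm]

def correlation(a, b):
    """Empirical Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(0)
samples = [sample_logistic_normal([0.0, 0.0], 1.0, 0.9, rng) for _ in range(5000)]
corr_12 = correlation([s[0] for s in samples], [s[1] for s in samples])
print(corr_12)
```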
Correlated topic model (CTM)
[Figure: plate diagram for the CTM with variables µ, Σ, ηd, Zd,n, Wd,n, βk and plates N, D, K]
• Draw topic proportions from a logistic normal, where topic occurrences can exhibit correlation.
• Use for:
  • Providing a “map” of topics and how they are related
  • Better prediction via correlated topics
[Figure: graph of topics and their connections, each node labeled with its top terms; the same graph previewed in the introduction under "Model connections between topics"]
Summary
• Topic models provide useful descriptive statistics for analyzing and understanding the latent structure of large text collections.
• More generally, probabilistic graphical models are a useful way to express assumptions about the hidden structure of complicated data.
• Variational methods allow us to perform posterior inference to automatically infer that structure from large data sets.
• Current research
  • Choosing the number of topics
  • Continuous time dynamic topic models
  • Topic models for prediction
  • Inferring the impact of a document
“We should seek out unfamiliar summaries of observational material, and establish their useful properties... And still more novelty can come from finding, and evading, still deeper lying constraints.” (Tukey, 1962)
Diversion: Variational inference
• Let x1:N be observations and z1:M be latent variables
• Our goal is to compute the posterior distribution
$$ p(z_{1:M} \mid x_{1:N}) = \frac{p(z_{1:M}, x_{1:N})}{\int p(z_{1:M}, x_{1:N})\, dz_{1:M}} $$
• For many interesting distributions, the marginal likelihood of the observations is difficult to compute efficiently
Variational inference
• Use Jensen’s inequality to bound the log probability of the observations:
$$ \log p(x_{1:N}) \ge \mathrm{E}_{q_\nu}[\log p(z_{1:M}, x_{1:N})] - \mathrm{E}_{q_\nu}[\log q_\nu(z_{1:M})]. $$
• We have introduced a distribution of the latent variables with free variational parameters ν.
• We optimize those parameters to tighten this bound.
• This is the same as finding the member of the family qν that is closest in KL divergence to p(z1:M | x1:N).
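The bound and its link to the KL divergence can be verified on a tiny discrete model. This is a minimal sketch with a single latent variable z taking three values; the table `joint`, holding p(z, x) at the observed x, is made up for illustration.

```python
import math

# Hypothetical values of p(z, x) for z = 0, 1, 2 at the observed x.
joint = [0.10, 0.25, 0.05]
log_px = math.log(sum(joint))                 # log marginal likelihood log p(x)
posterior = [p / sum(joint) for p in joint]   # exact posterior p(z | x)

def elbo(q):
    """E_q[log p(z, x)] - E_q[log q(z)] for a discrete q."""
    return sum(qz * (math.log(pz) - math.log(qz)) for qz, pz in zip(q, joint) if qz > 0)

def kl(q, p):
    """KL(q || p) for discrete distributions."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)

q = [1 / 3, 1 / 3, 1 / 3]  # an arbitrary member of the variational family
gap = log_px - elbo(q)
print(gap, kl(q, posterior))
```

The gap between log p(x) and the bound equals KL(q || p(z | x)), so tightening the bound and minimizing the KL divergence are the same optimization.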
Mean-field variational inference
• Complexity of optimization is determined by the factorization of qν
• In mean field variational inference we choose qν to be fully factored
$$ q_\nu(z_{1:M}) = \prod_{m=1}^{M} q_{\nu_m}(z_m). $$
• The latent variables are independent.
• Each is governed by its own variational parameter νm.
• In the true posterior they can exhibit dependence (often, this is what makes exact inference difficult).
MFVI and conditional exponential families
• Suppose the distribution of each latent variable conditional on the observations and other latent variables is in the exponential family:
$$ p(z_m \mid z_{-m}, x) = h_m(z_m) \exp\{ g_m(z_{-m}, x)^T z_m - a_m(g_m(z_{-m}, x)) \} $$
• Assume qν is fully factorized, and each factor is in the same exponential family:
$$ q_{\nu_m}(z_m) = h_m(z_m) \exp\{ \nu_m^T z_m - a_m(\nu_m) \} $$
• Variational inference is the following coordinate ascent algorithm
$$ \nu_m = \mathrm{E}_{q_\nu}[ g_m(Z_{-m}, x) ] $$
• Notice the relationship to Gibbs sampling
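The coordinate ascent update can be traced on a model with two binary latent variables. This is a minimal sketch: each conditional is Bernoulli, whose natural parameter is the log-odds, so g1(z2) = log p(z1=1, z2, x) − log p(z1=0, z2, x), and the update sets ν1 to its expectation under q2 (and symmetrically for ν2). The table `P`, the initial values, and the number of sweeps are assumptions for illustration.

```python
import math

# Hypothetical joint p(z1, z2, x) at the observed x, indexed as P[z1][z2].
P = [[0.30, 0.05],
     [0.05, 0.20]]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def g1(z2):
    """Natural parameter (log-odds) of p(z1 | z2, x)."""
    return math.log(P[1][z2]) - math.log(P[0][z2])

def g2(z1):
    """Natural parameter (log-odds) of p(z2 | z1, x)."""
    return math.log(P[z1][1]) - math.log(P[z1][0])

def elbo(m1, m2):
    """E_q[log p(z1, z2, x)] plus the entropy of q, with q_m = Bernoulli(m_m)."""
    q1, q2 = [1.0 - m1, m1], [1.0 - m2, m2]
    energy = sum(q1[a] * q2[b] * math.log(P[a][b]) for a in (0, 1) for b in (0, 1))
    entropy = -sum(q * math.log(q) for q in q1 + q2 if q > 0)
    return energy + entropy

nu1, nu2 = 0.5, -0.5  # arbitrary initialization
history = []
for _ in range(20):
    m2 = sigmoid(nu2)
    nu1 = (1.0 - m2) * g1(0) + m2 * g1(1)   # nu_1 = E_q[g_1(Z_2, x)]
    m1 = sigmoid(nu1)
    nu2 = (1.0 - m1) * g2(0) + m1 * g2(1)   # nu_2 = E_q[g_2(Z_1, x)]
    history.append(elbo(sigmoid(nu1), sigmoid(nu2)))

log_px = math.log(sum(sum(row) for row in P))
print(history[-1], log_px)
```

A Gibbs sampler would draw z1 from the conditional with parameter g1(z2) at the current sample of z2; the variational update instead plugs in the expectation of that same parameter.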
Variational family for the DTM
[Figure: the topic chain βk,1, βk,2, . . . , βk,T, with a variational observation β̂k,t attached to each βk,t]
• Distribution of θ and z is fully factorized (Blei et al., 2003)
• Distribution of {β1,k, . . . , βT,k} is a variational Kalman filter
  • Gaussian state-space model with free observations β̂k,t.
  • Fit observations such that the corresponding posterior over the chain is close to the true posterior.
• Given a document collection, use coordinate ascent on all the variational parameters until the KL converges.
• Yields a distribution close to the true posterior of interest
• Take expectations with respect to the simpler variational distribution