Does MCMC converge? / Postprocessing MCMC output
Multimodality and label switching: a discussion
Christian P. Robert
Université Paris-Dauphine and CREST, INSEE
http://www.ceremade.dauphine.fr/~xian
Workshop on mixtures, ICMS, February 28, 2010
Christian P. Robert Multimodality and label switching: a discussion
Outline
1 Does MCMC converge?
2 Postprocessing MCMC output
Monte Carlo perspective
When, given a target π, an MCMC sampler never visits more than 50% of the support of π, it can be argued that the sampler does not converge!
Two-component normal mixture and Gibbs sampler
Case when both means $\mu_i$ are the only unknowns, with different weights and same variance: identifiable model
[Figure: plot in the ($\mu_1$, $\mu_2$) plane, both axes ranging from −1 to 4]
(C.) Simple MCMC does not work
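The setting above can be sketched as a small Gibbs sampler. This is only an illustration under assumptions that are not in the slides: the weight p = 0.7, the true means (0, 2.5), the N(0, 100) priors on the means, the simulated data, and the function name `gibbs_two_means` are all hypothetical choices.

```python
import numpy as np

def gibbs_two_means(x, p, start, n_iter, tau2=100.0, rng=None):
    """Gibbs sampler for p N(mu1, 1) + (1 - p) N(mu2, 1) with p known.

    Assumed prior (not from the slides): mu_j ~ N(0, tau2), independently.
    Returns an (n_iter, 2) array of simulated (mu1, mu2) values.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.array([p, 1.0 - p])
    mu = np.array(start, dtype=float)
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        # allocation step: P(z_i = 0 | x_i, mu) from the weighted component densities
        logw = np.log(weights) - 0.5 * (x[:, None] - mu) ** 2
        prob_first = 1.0 / (1.0 + np.exp(logw[:, 1] - logw[:, 0]))
        z = (rng.random(x.size) > prob_first).astype(int)
        # mean step: conjugate normal update for each component
        for j in range(2):
            var = 1.0 / ((z == j).sum() + 1.0 / tau2)
            mu[j] = rng.normal(var * x[z == j].sum(), np.sqrt(var))
        chain[t] = mu
    return chain

# simulated data (assumption): 500 points from 0.7 N(0, 1) + 0.3 N(2.5, 1)
rng = np.random.default_rng(1)
from_first = rng.random(500) < 0.7
x = np.where(from_first, rng.normal(0.0, 1.0, 500), rng.normal(2.5, 1.0, 500))

chain_a = gibbs_two_means(x, 0.7, (0.0, 2.5), 1000, rng=rng)  # start near the true means
chain_b = gibbs_two_means(x, 0.7, (2.5, 0.0), 1000, rng=rng)  # start with the labels swapped
print(chain_a[200:].mean(axis=0), chain_b[200:].mean(axis=0))
```

Comparing the two printed averages shows whether both starting points lead to the same region. Whether the second chain stays near the swapped configuration depends on the data and the seed.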
Imposed permutations may miss the mark
While duplicating the MCMC sampler according to all permutations ρ in $S_k$ produces perfect exchangeability [nice!],
it does not bring additional energy to the MCMC sampler
it does not identify other modes (under- or over-fitting)
it does not apply in nearly-but-not exchangeable settings
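A minimal sketch of what imposing the permutations means on stored output, assuming the chain is kept as a T × k array of component parameters (the helper name `symmetrise` and the toy values are hypothetical):

```python
import itertools
import numpy as np

def symmetrise(chain):
    """Stack one copy of the chain per permutation of its k columns."""
    k = chain.shape[1]
    perms = itertools.permutations(range(k))
    return np.concatenate([chain[:, list(p)] for p in perms], axis=0)

# toy output of a sampler that never left one labelling (assumption)
chain = np.array([[0.0, 2.5], [0.1, 2.4]])
duplicated = symmetrise(chain)
print(duplicated.shape)         # k! * T rows
print(duplicated.mean(axis=0))  # identical column means after duplication
```

The duplicated sample is exchangeable by construction, but every row is a relabelled copy of a point the sampler had already visited, so no new region of the parameter space is explored.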
Illustrations
Example (Two-mean Gaussian mixture)
Case of $p\,\mathcal{N}(\mu_1, 1) + (1 - p)\,\mathcal{N}(\mu_2, 1)$ ($p \neq 0.5$)
[Figure: plot in the ($\mu_1$, $\mu_2$) plane, both axes ranging from −1 to 4]
Illustrations (2)
Example (Two-mean Gaussian mixture and outliers)
Same model, but data from 5-component mixture
Illustrations (3)
Example (Outlier Gaussian mixture)
Case of $p\,\mathcal{N}(0, 1) + (1 - p)\,\mathcal{N}(\mu, \sigma^2)$ with $p$ known
Chib’s solution / Nested sampling
Postprocessing issues
When assessing the number k of components via the evidence
$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, d\theta_k,$$
aka the marginal likelihood, label switching is a liability and an uninteresting phenomenon.
Indeed,
$$\int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, d\theta_k = k! \int_{\Theta_k / S_k} \pi_k(\theta_k)\, L_k(\theta_k)\, d\theta_k$$
means that integrating over the restricted space is [more than] ok!
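The k! identity can be checked numerically for k = 2. The sketch below assumes a toy data set, an equal-weight unit-variance two-component mixture, exchangeable N(0, 3²) priors on the means, and a plain grid sum. None of these choices come from the slides.

```python
import math
import numpy as np

x = np.array([-1.2, -0.8, -0.3, 1.9, 2.4, 2.7])  # toy data (assumption)

def integrand(m1, m2):
    """Prior times likelihood for 0.5 N(m1, 1) + 0.5 N(m2, 1),
    with independent N(0, 9) priors on m1 and m2 (assumed)."""
    prior = np.exp(-(m1 ** 2 + m2 ** 2) / 18.0) / (2 * math.pi * 9.0)
    comp1 = np.exp(-0.5 * (x[:, None, None] - m1) ** 2)
    comp2 = np.exp(-0.5 * (x[:, None, None] - m2) ** 2)
    lik = np.prod(0.5 * (comp1 + comp2) / math.sqrt(2 * math.pi), axis=0)
    return prior * lik

grid = np.linspace(-8.0, 8.0, 401)
h = grid[1] - grid[0]
M1, M2 = np.meshgrid(grid, grid, indexing="ij")
values = integrand(M1, M2)

# integral over the whole parameter space
full = values.sum() * h * h
# restricted space: keep m1 < m2, with half weight on the diagonal cells
weights = np.where(M1 < M2, 1.0, np.where(M1 == M2, 0.5, 0.0))
restricted = (values * weights).sum() * h * h

ratio = full / restricted
print(ratio, math.factorial(2))
```

Because the integrand is symmetric in (m1, m2), the ratio matches 2! up to floating-point rounding, whatever the grid resolution.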
Chib’s representation
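Direct application of Bayes’ theorem: given $x \sim f_k(x|\theta_k)$ and $\theta_k \sim \pi_k(\theta_k)$,
$$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\, \pi_k(\theta_k)}{\pi_k(\theta_k|x)}$$
Use of an approximation to the posterior
$$\hat{Z}_k = \hat{m}_k(x) = \frac{f_k(x|\theta_k^*)\, \pi_k(\theta_k^*)}{\hat{\pi}_k(\theta_k^*|x)}.$$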
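The identity holds for every value of θ, which can be verified in a model where the posterior is available in closed form. The sketch below assumes a conjugate setting that is not in the slides: x_i ∼ N(θ, 1) with θ ∼ N(0, τ²). The data and the function names are hypothetical.

```python
import math

def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) at y."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)

def chib_log_evidence(x, theta_star, tau2=4.0):
    """log m(x) = log f(x|theta*) + log pi(theta*) - log pi(theta*|x)."""
    n = len(x)
    post_var = 1.0 / (n + 1.0 / tau2)   # exact conjugate posterior variance
    post_mean = post_var * sum(x)       # exact conjugate posterior mean
    log_lik = sum(log_normal_pdf(xi, theta_star, 1.0) for xi in x)
    log_prior = log_normal_pdf(theta_star, 0.0, tau2)
    log_post = log_normal_pdf(theta_star, post_mean, post_var)
    return log_lik + log_prior - log_post

x = [0.3, 1.1, -0.4, 2.0, 0.7]  # toy data (assumption)
values = [chib_log_evidence(x, t) for t in (-1.0, 0.0, 0.5, 3.0)]
print(values)
```

All four values coincide up to rounding. In practice the posterior density is replaced by an estimate, and θ* is chosen in a high-density region, where that estimate is most stable.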
Case of latent variables
For missing variable z as in mixture models, natural Rao-Blackwell estimate
$$\hat{\pi}_k(\theta_k^*|x) = \frac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)}),$$
where the $z_k^{(t)}$’s are Gibbs sampled latent variables
But convergence impaired by lack of label switching
(C.) Simple MCMC does not work
Compensation for label switching
For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory
$$\pi_k(\theta_k|x) = \pi_k(\rho(\theta_k)|x) = \frac{1}{k!} \sum_{\rho \in S_k} \pi_k(\rho(\theta_k)|x)$$
for all ρ’s in $S_k$, set of all permutations of {1, . . . , k}.
Consequences on numerical approximation, biased by an order k!
Recover the theoretical symmetry by using
$$\tilde{\pi}_k(\theta_k^*|x) = \frac{1}{T\, k!} \sum_{\rho \in S_k} \sum_{t=1}^{T} \pi_k(\rho(\theta_k^*)|x, z_k^{(t)}).$$
[Berkhof, Mechelen, & Gelman, 2003]
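A rough sketch of both estimates for k = 2, under assumptions not taken from the slides: equal known weights, unit variances, N(0, 10) priors on the means, well-separated simulated data, and θ* taken as the average of the simulated means rather than the MAP.

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)
# well-separated simulated data (assumption), so the sampler keeps one labelling
x = np.concatenate([rng.normal(-3.0, 1.0, 20), rng.normal(3.0, 1.0, 20)])
k, tau2, T, burn = 2, 10.0, 2000, 200

def normal_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * math.pi * var)

def conditional_moments(z):
    """Mean and variance of each mu_j given x and the allocations z."""
    counts = np.array([(z == j).sum() for j in range(k)])
    sums = np.array([x[z == j].sum() for j in range(k)])
    var = 1.0 / (counts + 1.0 / tau2)
    return sums * var, var

mu = np.array([x.min(), x.max()])
draws, moments = [], []
for t in range(T + burn):
    # allocation step: equal weights cancel in the conditional probabilities
    logw = -0.5 * (x[:, None] - mu[None, :]) ** 2
    prob = np.exp(logw - logw.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    z = (rng.random(x.size) > prob[:, 0]).astype(int)
    # mean step: conjugate normal update given z
    mean, var = conditional_moments(z)
    mu = rng.normal(mean, np.sqrt(var))
    if t >= burn:
        draws.append(mu)
        moments.append((mean, var))

theta_star = np.mean(draws, axis=0)  # stand-in for the MAP (assumption)

def rao_blackwell(theta):
    """Average over t of pi(theta | x, z^(t))."""
    return np.mean([np.prod(normal_pdf(theta, m, v)) for m, v in moments])

plain = rao_blackwell(theta_star)
perms = list(itertools.permutations(range(k)))
symmetrised = np.mean([rao_blackwell(theta_star[list(p)]) for p in perms])
print(math.log(plain / symmetrised), math.log(math.factorial(k)))
```

When the chain never switches labels, the permuted terms are negligible, so the symmetrised density estimate is smaller than the plain one by a factor close to k! and the log evidence estimate rises by about log(k!).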
Galaxy dataset (k)
Using only the original estimate, with $\theta_k^*$ as the MAP estimator,
$$\log(\hat{m}_k(x)) = -105.1396$$
for k = 3 (based on $10^3$ simulations), while introducing the permutations leads to
$$\log(\hat{m}_k(x)) = -103.3479 = -105.1396 + \log(3!)$$
k                 2        3        4        5        6        7        8
log m_k(x)  −115.68  −103.35  −102.66  −101.93  −102.88  −105.48  −108.44
Estimations of the log marginal likelihoods by the symmetrised Chib’s approximation (based on $10^5$ Gibbs iterations and, for k > 5, 100 permutations selected at random in $S_k$).
[Lee, Marin, Mengersen & Robert, 2008]
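The k = 3 correction and the table can be checked with a few lines of arithmetic. The variable names are illustrative and the numbers are copied from the slide.

```python
import math

# k = 3: uncorrected estimate plus log(k!) gives the symmetrised value
log_m_plain = -105.1396
log_m_sym = log_m_plain + math.log(math.factorial(3))
print(round(log_m_sym, 4))

# log marginal likelihood estimates from the table, indexed by k
log_m = {2: -115.68, 3: -103.35, 4: -102.66, 5: -101.93,
         6: -102.88, 7: -105.48, 8: -108.44}
best_k = max(log_m, key=log_m.get)
# Bayes factor of the best k against its closest competitor, k = 4
bayes_factor_5_vs_4 = math.exp(log_m[5] - log_m[4])
print(best_k, bayes_factor_5_vs_4)
```

The corrected value agrees with the reported −103.3479 up to rounding in the last digit, and the table is maximised at k = 5.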
Comparison between evidence approximations
1 Nested sampling: M = 1000 points, with 10 random walk moves at each step, simulations from the constrained prior and a stopping rule at 95% of the observed maximum likelihood
2 T = $10^4$ MCMC (= Gibbs) simulations producing non-parametric estimates ϕ
3 Monte Carlo estimates $Z_1$, $Z_2$, $Z_3$ using product of two Gaussian kernels
4 Numerical integration based on an 850 × 950 grid [reference value, confirmed by Chib’s]
Comparison (cont’d)
Graph based on a sample of 10 observations for µ = 2 and σ = 3/2 (150 replicas). V1 = nested sampling, V2 = importance sampling, V3 = harmonic mean, V4 = bridge sampling.
[Chopin & Robert, 2010]
Comparison (cont’d)
Graph based on a sample of 50 observations for µ = 2 and σ = 3/2 (150 replicas). V1 = nested sampling, V2 = importance sampling, V3 = harmonic mean, V4 = bridge sampling.
[Chopin & Robert, 2010]
Comparison (cont’d)
Graph based on a sample of 100 observations for µ = 2 and σ = 3/2 (150 replicas). V1 = nested sampling, V2 = importance sampling, V3 = harmonic mean, V4 = bridge sampling.
[Chopin & Robert, 2010]
Comparison (cont’d)
[Lee, Marin, Mengersen & Robert, 2010]