
ICMS Discussion, March 2010

Date post: 02-Jul-2015
Upload: christian-robert
Description:
This is a discussion of the presentations of John Geweke and of Sylvia Frühwirth-Schnatter, during the ICMS conference on March 3-5, 2010, in Edinburgh.
Transcript
Page 1: ICMS Discussion, March 2010

Does MCMC converge? / Postprocessing MCMC output

Multimodality and label switching: a discussion

Christian P. Robert

Université Paris-Dauphine and CREST, INSEE

http://www.ceremade.dauphine.fr/~xian

Workshop on mixtures, ICMS, February 28, 2010

Christian P. Robert Multimodality and label switching: a discussion

Page 2: ICMS Discussion, March 2010


Outline

1 Does MCMC converge?

2 Postprocessing MCMC output

Page 7: ICMS Discussion, March 2010

Monte Carlo perspective

When, given a target π, an MCMC sampler never visits more than 50% of the support of π, it can be argued that the sampler does not converge!

Two-component normal mixture and Gibbs sampler

Case when both means µi are the only unknowns, with different weights and same variance: identifiable model

[Figure: plot in the (µ1, µ2) plane, both axes from −1 to 4]

(C.) Simple MCMC does not work
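
The trapping phenomenon can be explored with a minimal sketch of the Gibbs sampler for this two-mean mixture. Everything specific here is an assumption made for illustration and is not taken from the slides: the N(0, 100) prior on each mean, the simulated data (weights 0.7/0.3, means 0 and 2.5), the function name `gibbs_two_means`, and the two starting points.

```python
import numpy as np

def gibbs_two_means(x, p, n_iter, mu_init, rng, prior_var=100.0):
    """Gibbs sampler for p*N(mu1, 1) + (1 - p)*N(mu2, 1) with p known.

    Assumed prior (not stated on the slide): mu_j ~ N(0, prior_var), independent.
    Returns an (n_iter, 2) array of sampled (mu1, mu2).
    """
    mu = np.array(mu_init, dtype=float)
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Allocation step: probability that each x_i belongs to component 1.
        w1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
        w2 = (1.0 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
        in_first = rng.random(x.size) < w1 / (w1 + w2)
        # Mean step: conjugate normal update for each component.
        for j, members in enumerate((in_first, ~in_first)):
            post_var = 1.0 / (members.sum() + 1.0 / prior_var)
            post_mean = post_var * x[members].sum()
            mu[j] = rng.normal(post_mean, np.sqrt(post_var))
        chain[t] = mu
    return chain

rng = np.random.default_rng(0)
n, p = 500, 0.7
from_first = rng.random(n) < p
x = np.where(from_first, rng.normal(0.0, 1.0, n), rng.normal(2.5, 1.0, n))

# Two starting points: near the generating values and near the swapped labelling.
chain_a = gibbs_two_means(x, p, 2000, (0.0, 2.5), rng)
chain_b = gibbs_two_means(x, p, 2000, (2.5, 0.0), rng)
print("start (0, 2.5):", chain_a[1000:].mean(axis=0))
print("start (2.5, 0):", chain_b[1000:].mean(axis=0))
```

If the two chains report clearly different averages, each one has stayed near the mode closest to its starting point, which is the failure described above. Whether this happens in a given run depends on the data and the seed.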

Page 9: ICMS Discussion, March 2010

Imposed permutations may miss the mark

While duplicating the MCMC sampler according to all permutations ρ in Sk produces perfect exchangeability [nice!],

it does not bring additional energy to the MCMC sampler

it does not identify other modes (under- or over-fitting)

it does not apply in nearly-but-not exchangeable settings

Page 10: ICMS Discussion, March 2010

Illustrations

Example (Two-mean Gaussian mixture)

Case of $p\,\mathcal{N}(\mu_1, 1) + (1 - p)\,\mathcal{N}(\mu_2, 1)$ ($p \neq 0.5$)

[Figure: plot in the (µ1, µ2) plane, both axes from −1 to 4]

Page 11: ICMS Discussion, March 2010

Illustrations (2)

Example (Two-mean Gaussian mixture and outliers)

Same model, but data from 5-component mixture

Page 12: ICMS Discussion, March 2010

Illustrations (3)

Example (Outlier Gaussian mixture)

Case of $p\,\mathcal{N}(0, 1) + (1 - p)\,\mathcal{N}(\mu, \sigma^2)$ with $p$ known

Page 14: ICMS Discussion, March 2010

Chib’s solution / Nested sampling

Postprocessing issues

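When assessing the number k of components via the evidence

$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k,$$

aka the marginal likelihood, label switching is a liability and an uninteresting phenomenon. Indeed,

$$\int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k = k! \int_{\Theta_k/S_k} \pi_k(\theta_k)\, L_k(\theta_k)\, \mathrm{d}\theta_k$$

means that integrating over the restricted space is [more than] ok!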
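
The k! identity can be checked numerically for k = 2 with a small grid computation. This is only a sketch under stated assumptions: an equal-weight mixture 0.5 N(µ1, 1) + 0.5 N(µ2, 1) so that the likelihood is permutation invariant, an exchangeable N(0, 10) prior on each mean, simulated data, and the ordering µ1 < µ2 as the restricted space. None of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data from an equal-weight mixture, so the model is exchangeable.
x = np.concatenate([rng.normal(-1.0, 1.0, 15), rng.normal(2.0, 1.0, 15)])

grid = np.linspace(-8.0, 8.0, 401)
h = grid[1] - grid[0]
m1, m2 = np.meshgrid(grid, grid, indexing="ij")

def log_normal_pdf(y, mean, var):
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mean) ** 2 / var

# log L(mu1, mu2) for 0.5 N(mu1, 1) + 0.5 N(mu2, 1), accumulated over observations.
log_lik = np.zeros_like(m1)
for xi in x:
    log_lik += np.logaddexp(np.log(0.5) + log_normal_pdf(xi, m1, 1.0),
                            np.log(0.5) + log_normal_pdf(xi, m2, 1.0))
# Exchangeable prior (an assumption): mu1, mu2 iid N(0, 10).
log_prior = log_normal_pdf(m1, 0.0, 10.0) + log_normal_pdf(m2, 0.0, 10.0)
integrand = np.exp(log_lik + log_prior)

# Riemann sums over the full square and over the ordered region mu1 < mu2.
# Diagonal cells get weight 1/2 so that the two ordered regions tile the square.
region = np.where(m1 < m2, 1.0, np.where(m1 == m2, 0.5, 0.0))
z_full = integrand.sum() * h * h
z_restricted = (integrand * region).sum() * h * h
print(z_full / z_restricted)
```

Because the integrand is symmetric in (µ1, µ2), the printed ratio should equal 2! = 2 up to rounding error, whatever the accuracy of the grid approximation to the evidence itself.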

Page 18: ICMS Discussion, March 2010

Chib’s representation

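Direct application of Bayes’ theorem: given $x \sim f_k(x|\theta_k)$ and $\theta_k \sim \pi_k(\theta_k)$,

$$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\, \pi_k(\theta_k)}{\pi_k(\theta_k|x)}$$

Use of an approximation to the posterior

$$\widehat{Z}_k = \widehat{m}_k(x) = \frac{f_k(x|\theta_k^*)\, \pi_k(\theta_k^*)}{\hat{\pi}_k(\theta_k^*|x)}.$$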
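
Chib’s identity holds for every value of θ, which is easy to verify when the posterior is available in closed form. The sketch below uses a conjugate toy model chosen for illustration, not the mixture setting of the talk: x_i ~ N(θ, 1) with θ ~ N(0, prior_var). The function name `chib_log_evidence` and the simulated data are hypothetical.

```python
import numpy as np

def log_normal_pdf(y, mean, var):
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mean) ** 2 / var

def chib_log_evidence(x, theta_star, prior_var):
    """log m(x) = log f(x|theta*) + log pi(theta*) - log pi(theta*|x)
    for x_i ~ N(theta, 1) and theta ~ N(0, prior_var)."""
    # Exact conjugate posterior: theta | x ~ N(post_mean, post_var).
    post_var = 1.0 / (x.size + 1.0 / prior_var)
    post_mean = post_var * x.sum()
    log_lik = log_normal_pdf(x, theta_star, 1.0).sum()
    log_prior = log_normal_pdf(theta_star, 0.0, prior_var)
    log_post = log_normal_pdf(theta_star, post_mean, post_var)
    return log_lik + log_prior - log_post

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, 20)
values = [chib_log_evidence(x, t, prior_var=4.0) for t in (-1.0, 0.0, x.mean())]
print(values)
```

The three printed values agree up to rounding, since the exact posterior density is used. In the approximated version the choice of θ* matters only through the quality of the posterior density estimate at that point, which is why a high-density value such as the MAP is a natural choice.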

Page 21: ICMS Discussion, March 2010

Case of latent variables

For missing variable z as in mixture models, natural Rao-Blackwell estimate

$$\hat{\pi}_k(\theta_k^*|x) = \frac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^*|x, z_k^{(t)}),$$

where the $z_k^{(t)}$’s are Gibbs sampled latent variables

But convergence impaired by lack of label switching

(C.) Simple MCMC does not work

Page 23: ICMS Discussion, March 2010

Compensation for label switching

For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory

$$\pi_k(\theta_k|x) = \pi_k(\rho(\theta_k)|x) = \frac{1}{k!} \sum_{\rho \in S_k} \pi_k(\rho(\theta_k)|x)$$

for all ρ’s in $S_k$, set of all permutations of {1, . . . , k}.

Consequences on numerical approximation, biased by an order k!

Recover the theoretical symmetry by using

$$\tilde{\pi}_k(\theta_k^*|x) = \frac{1}{T\, k!} \sum_{\rho \in S_k} \sum_{t=1}^{T} \pi_k(\rho(\theta_k^*)|x, z_k^{(t)}).$$

[Berkhof, van Mechelen & Gelman, 2003]
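
The effect of the symmetrisation can be illustrated on a toy case where the Gibbs sampler never switches labels. The sketch below rests on assumptions of its own: k = 2, equal known weights, unit variances, a N(0, 100) prior on each mean, well-separated simulated data, and the chain average as a stand-in for the MAP. All variable and function names are hypothetical.

```python
import itertools
import math
import numpy as np

def log_normal_pdf(y, mean, var):
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mean) ** 2 / var

def log_sum_exp(a):
    a = np.asarray(a, dtype=float)
    top = a.max()
    return top + np.log(np.exp(a - top).sum())

rng = np.random.default_rng(3)
k, prior_var, n_iter = 2, 100.0, 1000
x = np.concatenate([rng.normal(-3.0, 1.0, 50), rng.normal(3.0, 1.0, 50)])

# Gibbs sampler; store the moments of pi(mu | x, z^(t)) at each sweep.
mu = np.array([-3.0, 3.0])
draws = np.empty((n_iter, k))
cond_mean = np.empty((n_iter, k))
cond_var = np.empty((n_iter, k))
for t in range(n_iter):
    logw = log_normal_pdf(x[:, None], mu[None, :], 1.0)  # equal weights cancel
    prob = np.exp(logw - logw.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    z = (rng.random(x.size) > prob[:, 0]).astype(int)    # allocation, 0 or 1
    for j in range(k):
        cond_var[t, j] = 1.0 / ((z == j).sum() + 1.0 / prior_var)
        cond_mean[t, j] = cond_var[t, j] * x[z == j].sum()
    mu = rng.normal(cond_mean[t], np.sqrt(cond_var[t]))
    draws[t] = mu

theta_star = draws.mean(axis=0)  # stand-in for the MAP used on the slides

def log_cond_density(theta):
    """log pi(theta | x, z^(t)) for every stored sweep t."""
    return log_normal_pdf(theta[None, :], cond_mean, cond_var).sum(axis=1)

# Plain Rao-Blackwell estimate and its average over all k! permutations.
log_post_plain = log_sum_exp(log_cond_density(theta_star)) - math.log(n_iter)
perms = [list(rho) for rho in itertools.permutations(range(k))]
stacked = np.concatenate([log_cond_density(theta_star[rho]) for rho in perms])
log_post_sym = log_sum_exp(stacked) - math.log(n_iter * math.factorial(k))

log_lik = np.logaddexp.reduce(
    math.log(1.0 / k) + log_normal_pdf(x[:, None], theta_star[None, :], 1.0),
    axis=1).sum()
log_prior = log_normal_pdf(theta_star, 0.0, prior_var).sum()
log_m_plain = log_lik + log_prior - log_post_plain
log_m_sym = log_lik + log_prior - log_post_sym
print(log_m_plain, log_m_sym, log_m_sym - log_m_plain)
```

When the chain stays in a single mode, as it does with components this far apart, the permuted terms are negligible and the two log-evidence estimates differ by log(k!), here log 2. With partial label switching the correction would fall somewhere between 0 and log(k!).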

Page 25: ICMS Discussion, March 2010

Galaxy dataset (k)

Using only the original estimate, with $\theta_k^*$ as the MAP estimator,

$$\log(\hat{m}_k(x)) = -105.1396$$

for k = 3 (based on $10^3$ simulations), while introducing the permutations leads to

$$\log(\hat{m}_k(x)) = -103.3479 = -105.1396 + \log(3!)$$

k            2        3        4        5        6        7        8
log mk(x)   -115.68  -103.35  -102.66  -101.93  -102.88  -105.48  -108.44

Estimations of the (log) marginal likelihoods by the symmetrised Chib’s approximation (based on $10^5$ Gibbs iterations and, for k > 5, 100 permutations selected at random in $S_k$).

[Lee, Marin, Mengersen & Robert, 2008]

Page 26: ICMS Discussion, March 2010

Comparison between evidence approximations

1 Nested sampling: M = 1000 points, with 10 random walk moves at each step, simulations from the constrained prior and a stopping rule at 95% of the observed maximum likelihood

2 $T = 10^4$ MCMC (=Gibbs) simulations producing non-parametric estimates ϕ

3 Monte Carlo estimates Z1, Z2, Z3 using product of two Gaussian kernels

4 numerical integration based on 850 × 950 grid [reference value, confirmed by Chib’s]
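
For readers unfamiliar with the first method in the list, here is a minimal sketch of basic nested sampling on a one-dimensional toy problem with a known evidence. It is not the setup of the comparison: the uniform prior on (−5, 5), the likelihood exp(−θ²/2), the exact sampling from the constrained prior (possible here because the constraint is an interval, in place of random walk moves), and the fixed number of steps (in place of the stopping rule) are all simplifying assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(4)
half_width = 5.0  # prior: theta ~ Uniform(-5, 5)

def log_lik(theta):
    return -0.5 * theta ** 2  # unnormalised N(0, 1) likelihood

n_live, n_steps = 500, 3000
live = rng.uniform(-half_width, half_width, n_live)
z_hat, x_prev = 0.0, 1.0
for i in range(1, n_steps + 1):
    worst = np.argmin(log_lik(live))
    l_worst = math.exp(log_lik(live[worst]))
    x_curr = math.exp(-i / n_live)  # deterministic approximation of the prior mass
    z_hat += l_worst * (x_prev - x_curr)
    # The constrained prior {L > l_worst} is Uniform(-r, r) with r = |theta_worst|,
    # so the discarded point is replaced by an exact draw from it.
    r = abs(live[worst])
    live[worst] = rng.uniform(-r, r)
    x_prev = x_curr
# Contribution of the remaining live points.
z_hat += x_prev * np.exp(log_lik(live)).mean()

# Closed form: (1 / 10) * integral of exp(-theta^2 / 2) over (-5, 5).
z_exact = math.sqrt(2.0 * math.pi) * math.erf(half_width / math.sqrt(2.0)) / (2.0 * half_width)
print(z_hat, z_exact)
```

The estimate carries Monte Carlo error from the random shrinkage of the prior mass, so it matches the closed-form value only approximately. In realistic problems the constrained prior cannot be sampled exactly, which is where the random walk moves of the comparison come in.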

Page 27: ICMS Discussion, March 2010

Comparison (cont’d)

Graph based on a sample of 10 observations for µ = 2 and σ = 3/2 (150 replicas). V1 = nested sampling, V2 = importance sampling, V3 = harmonic mean, V4 = bridge sampling.

[Chopin & Robert, 2010]

Page 28: ICMS Discussion, March 2010

Comparison (cont’d)

Graph based on a sample of 50 observations for µ = 2 and σ = 3/2 (150 replicas). V1 = nested sampling, V2 = importance sampling, V3 = harmonic mean, V4 = bridge sampling.

[Chopin & Robert, 2010]

Page 29: ICMS Discussion, March 2010

Comparison (cont’d)

Graph based on a sample of 100 observations for µ = 2 and σ = 3/2 (150 replicas). V1 = nested sampling, V2 = importance sampling, V3 = harmonic mean, V4 = bridge sampling.

[Chopin & Robert, 2010]

Page 30: ICMS Discussion, March 2010

Comparison (cont’d)

[Lee, Marin, Mengersen & Robert, 2010]
