Markov Chain Monte Carlo explained


A brief overview of some of the most famous Markov Chain Monte Carlo methods.


Markov Chain Monte Carlo: theory and worked examples

Dario Digiuni,

academic year 2007/2008

Markov Chain Monte Carlo

• Class of sampling algorithms

• High sampling efficiency

• Sample from a distribution with unknown normalization constant

• Often the only way to solve problems in time polynomial in the number of dimensions

e.g. evaluating the volume of a convex body

MCMC: applications

• Statistical Mechanics

▫ Metropolis-Hastings

• Optimization

▫ Simulated annealing

• Bayesian Inference

▫ Metropolis-Hastings

▫ Gibbs sampling

The Monte Carlo principle

• Sample a set of N independent and identically distributed variables

• Approximation of the target p.d.f. with the empirical expression

… then approximation of the integrals!
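The formulas referred to here (standard Monte Carlo expressions; the slide's equations did not survive the transcript) are the empirical measure and the induced integral estimate:

p_N(x) = \frac{1}{N} \sum_{i=1}^{N} \delta\big(x - x^{(i)}\big), \qquad \int f(x)\, p(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} f\big(x^{(i)}\big)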

Rejection Sampling

Drawbacks:

1. It requires finding the bound M such that p(x) ≤ M q(x)!

2. Low acceptance rate
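A minimal C++ sketch of rejection sampling, assuming an illustrative unnormalized target on [0, 1] and a uniform proposal; the bound M (with p(x) ≤ M·q(x)) must be supplied by the caller, which is exactly drawback 1:

#include <cmath>
#include <random>

// Rejection sampling: draw from an unnormalized p(x) on [0,1] using the
// uniform proposal q = U(0,1). Assumes p(x) <= M * q(x) everywhere.
double rejection_sample(double M, std::mt19937& gen) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    // illustrative target (assumption, not from the slides); any M >= 1 works here
    auto p = [](double x) { return std::exp(-10.0 * (x - 0.5) * (x - 0.5)); };
    while (true) {
        double x = unif(gen);         // candidate from the proposal q
        double u = unif(gen);         // uniform draw for the accept test
        if (u < p(x) / M) return x;   // accept with probability p(x) / (M q(x))
    }                                 // otherwise reject and try again: low acceptance rate
}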

Idea

• I can use the previously sampled value to find the following one

• Exploration of the configuration space by means of Markov Chains:

def.: Markov process

def.: Markov chain
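The definitions pointed to here (standard statements, restored since the slide's formulas did not survive the transcript):

p\big(x^{(i+1)} \mid x^{(i)}, x^{(i-1)}, \dots, x^{(1)}\big) = T\big(x^{(i+1)} \mid x^{(i)}\big)

i.e. the next state depends only on the current one; the Markov chain is the generated sequence x^{(1)}, x^{(2)}, \dots, x^{(N)}, and it is homogeneous when T does not depend on i.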

Invariant distribution

• Stability conditions:

1. Irreducibility = for every state there exists a finite probability of visiting any other state

2. Aperiodicity = there are no loops

• Sufficient condition

1. Detailed balance principle
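The detailed balance condition (standard form, restored): p is invariant for the chain if

p(x)\, T(x \to x') = p(x')\, T(x' \to x) \quad \text{for all } x, x'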

MCMC algorithms are aperiodic, irreducible Markov chains having the target pdf as the invariant distribution

Example

• What is the probability of finding the lift at the ground floor in a three-floor building?

▫ 3-state Markov chain

▫ Lift = random walker

▫ Transition matrix

▫ Looking for the invariant distribution

… burn-in …

Example - 2

• I can apply the matrix T on the right to any of the states, e.g.

• Google’s PageRank:

▫ Websites are the states, T is defined by the number of hyperlinks among them, and the user is the random walker:

The webpages are ranked following the invariant distribution!

~ 50% is the probability of finding the lift at the ground floor

homogeneous Markov chain
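A minimal C++ sketch of the lift example via power iteration; the transition matrix below is an assumed illustration (the slide's actual T did not survive the transcript), chosen so that the ground-floor probability comes out at 50%:

#include <cstdio>

int main() {
    // Assumed 3-state transition matrix (ground, 1st, 2nd floor); rows sum to 1.
    // Illustrative only -- not the matrix from the original slides.
    const double T[3][3] = {
        {0.4, 0.3, 0.3},   // from ground floor
        {0.6, 0.2, 0.2},   // from 1st floor
        {0.6, 0.2, 0.2},   // from 2nd floor
    };
    double pi[3] = {1.0, 0.0, 0.0};            // arbitrary starting distribution
    for (int step = 0; step < 100; ++step) {   // "burn-in": iterate pi <- pi * T
        double next[3] = {0.0, 0.0, 0.0};
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                next[j] += pi[i] * T[i][j];
        for (int j = 0; j < 3; ++j) pi[j] = next[j];
    }
    // prints 0.500 0.250 0.250: ~50% probability of the ground floor
    std::printf("invariant distribution: %.3f %.3f %.3f\n", pi[0], pi[1], pi[2]);
}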

Metropolis-Hastings

• Given the target distribution p(x):

1. Choose a value for u ~ U(0, 1)

2. Sample a candidate x* from the proposal distribution q(x* | x(i))

3. Accept the new value with probability α: if u < α, set x(i+1) = x*, otherwise x(i+1) = x(i)

4. Return to 1
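The acceptance probability in step 3 (standard Metropolis-Hastings formula, restored since the slide's equation did not survive the transcript):

\alpha = \min\!\left(1,\ \frac{p(x^{*})\, q(x^{(i)} \mid x^{*})}{p(x^{(i)})\, q(x^{*} \mid x^{(i)})}\right)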

The ratio is independent of the normalization!

The two q factors are equal in the Metropolis algorithm (symmetric proposal), leaving α = min(1, p(x*) / p(x(i)))

The proposal-plus-acceptance mechanism plays the role of the transition kernel T

M.-H. – Pros and Cons

• Very general sampling method:

▫ I can sample from an unnormalized distribution

▫ It does not require providing an upper bound for the function

• Good performance depends on the choice of the proposal distribution

▫ well-mixing condition (see the sketch below)
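A minimal C++ sketch of the symmetric-proposal (Metropolis) case, sampling an illustrative unnormalized 1-D density; the proposal step size is the tuning knob behind the well-mixing condition:

#include <cmath>
#include <random>
#include <vector>

// Metropolis sampling from an unnormalized density p(x) with a symmetric
// Gaussian random-walk proposal; no normalization constant is needed.
std::vector<double> metropolis(int n, double step, std::mt19937& gen) {
    // illustrative target (assumption, not from the slides); strictly positive
    auto p = [](double x) { return std::exp(-x * x / 2.0) * (2.0 + std::sin(5.0 * x)); };
    std::normal_distribution<double> prop(0.0, step);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> chain;
    double x = 0.0;                           // starting value
    for (int i = 0; i < n; ++i) {
        double cand = x + prop(gen);          // candidate from q(x* | x), symmetric
        double alpha = p(cand) / p(x);        // the q factors cancel (Metropolis)
        if (unif(gen) < alpha) x = cand;      // accept; otherwise keep the old x
        chain.push_back(x);
    }
    return chain;
}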

M.-H. - Example

• In Statistical Mechanics it is important to evaluate the partition function,

e.g. the Ising model

Summing over every possible spin state: in a 10 x 10 x 10 spin cube, I would have to sum over 2^1000 possible states = UNFEASIBLE

MCMC APPROACH:

1. Evaluate the system’s energy

2. Pick up a spin at random and flip it:

1. If energy decreases, this is the new spin configuration

2. If energy increases, this is the new spin configuration with probability exp(-ΔE / kB T) (see the sketch below)
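A minimal C++ sketch of one such spin-flip step, assuming J = kB = 1, periodic boundaries, and a 10 x 10 x 10 lattice (an illustration, not the original code):

#include <cmath>
#include <random>
#include <vector>

// One Metropolis step for a 10x10x10 Ising model (J = kB = 1).
const int L = 10;
std::vector<int> spin(L * L * L, 1);   // all spins up initially

// flat index with periodic wraparound in all three directions
int idx(int x, int y, int z) {
    return ((x + L) % L) * L * L + ((y + L) % L) * L + (z + L) % L;
}

void metropolis_step(double T, std::mt19937& gen) {
    std::uniform_int_distribution<int> pick(0, L - 1);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    int x = pick(gen), y = pick(gen), z = pick(gen);   // pick a spin at random
    int s = spin[idx(x, y, z)];
    int nb = spin[idx(x+1,y,z)] + spin[idx(x-1,y,z)]   // sum of the 6 neighbours
           + spin[idx(x,y+1,z)] + spin[idx(x,y-1,z)]
           + spin[idx(x,y,z+1)] + spin[idx(x,y,z-1)];
    double dE = 2.0 * s * nb;                          // energy change of flipping s
    // accept if energy decreases, otherwise with probability exp(-dE / T)
    if (dE <= 0.0 || unif(gen) < std::exp(-dE / T))
        spin[idx(x, y, z)] = -s;
}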

Simulated Annealing

• It allows one to find the global maximum of a generic pdf

▫ No comparison between the values of local maxima is required

▫ Application to the maximum-likelihood method

• It is a non-homogeneous Markov chain whose invariant distribution keeps changing as follows:
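In the standard formulation (restored here, since the slide's formula did not survive the transcript), at step i the chain targets

p_i(x) \propto p(x)^{1/T_i}, \qquad T_i \to 0

so as the "temperature" T_i decreases, the distribution concentrates ever more sharply on the global maximum of p(x).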

Simulated Annealing: example

• Let us apply the algorithm to a simple, 1-dimensional case

• The optimal cooling scheme is
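The schedule meant here is, in the classical convergence result, logarithmic cooling:

T_i = \frac{C}{\log(1 + i)}

with C a problem-dependent constant; faster (e.g. geometric) schedules are common in practice but lose the formal guarantee.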

Simulated Annealing: Pros and Cons

• The global maximum is uniquely determined

▫ Even if the walker starts next to a local (non-global!) maximum, it converges to the true global maximum

• It requires a good tuning of the parameters
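A minimal C++ sketch of the 1-dimensional example, with an assumed two-peak target and the logarithmic cooling scheme above; the walker starts next to the local maximum and still ends near the global one:

#include <cmath>
#include <random>

// Simulated annealing on an illustrative 1-D function with two maxima.
double anneal(std::mt19937& gen) {
    auto f = [](double x) {
        return std::exp(-(x - 3.0) * (x - 3.0))            // global maximum at x = 3
             + 0.5 * std::exp(-(x + 2.0) * (x + 2.0));     // local maximum at x = -2
    };
    std::normal_distribution<double> prop(0.0, 0.5);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double x = -2.0;                                   // start next to the local maximum
    for (int i = 1; i <= 100000; ++i) {
        double T = 1.0 / std::log(1.0 + i);            // logarithmic cooling scheme
        double cand = x + prop(gen);
        // Metropolis step targeting f(x)^(1/T): the annealed distribution
        double alpha = std::pow(f(cand) / f(x), 1.0 / T);
        if (unif(gen) < alpha) x = cand;
    }
    return x;                                          // ~3.0: the global maximum
}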

Gibbs Sampler

• Optimal method to marginalize multidimensional distributions

• Let us assume we have an n-dimensional vector and that we know all the conditional probability expressions for the pdf

• We take the following proposal distribution:
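The standard Gibbs proposal (restored, since the slide's formula did not survive the transcript): the j-th move resamples only coordinate j from its full conditional,

q\big(x^{*} \mid x^{(i)}\big) = p\big(x_j^{*} \mid x_{-j}^{(i)}\big) \text{ if } x_{-j}^{*} = x_{-j}^{(i)}, \text{ and } 0 \text{ otherwise}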

Gibbs Sampler - 2

• Then:
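Substituting this proposal into the Metropolis-Hastings ratio, the factors cancel (standard derivation, restored): since p(x) = p(x_j | x_{-j}) p(x_{-j}) and x_{-j}* = x_{-j}^{(i)},

\alpha = \min\!\left(1,\ \frac{p(x^{*})\, p\big(x_j^{(i)} \mid x_{-j}^{(i)}\big)}{p\big(x^{(i)}\big)\, p\big(x_j^{*} \mid x_{-j}^{(i)}\big)}\right) = 1

so every proposed move is accepted: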

very efficient method!

Gibbs Sampler – practically

1. Initialize x(0) = (x1(0), …, xn(0))

2. for (i = 0; i < N; i++)

• Sample x1(i+1) ~ p(x1 | x2(i), x3(i), …, xn(i))

• Sample x2(i+1) ~ p(x2 | x1(i+1), x3(i), …, xn(i))

• Sample x3(i+1) ~ p(x3 | x1(i+1), x2(i+1), x4(i), …, xn(i))

• Sample xn(i+1) ~ p(xn | x1(i+1), x2(i+1), …, xn-1(i+1))

fix n-1 coordinates and sample from the resulting pdf
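A minimal C++ sketch of this scheme for an assumed 2-dimensional target, a bivariate normal with correlation rho, whose full conditionals are the well-known univariate normals used below:

#include <cmath>
#include <random>
#include <utility>
#include <vector>

// Gibbs sampling from a bivariate normal with zero means, unit variances and
// correlation rho: each full conditional is N(rho * other, 1 - rho^2).
std::vector<std::pair<double, double>> gibbs(int n, double rho, std::mt19937& gen) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    const double sd = std::sqrt(1.0 - rho * rho);
    double x1 = 0.0, x2 = 0.0;                       // 1. initialize
    std::vector<std::pair<double, double>> chain;
    for (int i = 0; i < n; ++i) {                    // 2. sweep the coordinates
        x1 = rho * x2 + sd * gauss(gen);             // sample x1 ~ p(x1 | x2)
        x2 = rho * x1 + sd * gauss(gen);             // sample x2 ~ p(x2 | x1)
        chain.emplace_back(x1, x2);
    }
    return chain;
}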

Gibbs Sampler – example

• Let us pretend we cannot determine the normalization constant…

… but we can make a comparison with the true marginalized pdf…

Gibbs Sampler – results

• Comparison between Gibbs sampling and true M.-H. sampling from the marginalized pdf

• Good χ² agreement

A complex MCMC application

A radioactive source decays with rate λ1 and a detector records only every k1-th event; then, at the moment tc, the decay rate changes to λ2 and only one event out of k2 is recorded.

Apparently λ1, k1, tc, λ2 and k2 are undetermined.

We wish to find them.

Preparation

• The waiting time for the k-th event in a Poissonian process with rate λ is distributed according to:
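This is the Erlang density (standard result, restored since the slide's formula did not survive the transcript):

p(t) = \frac{\lambda^{k}\, t^{\,k-1}\, e^{-\lambda t}}{(k-1)!}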

• I can sample a large number of events from this pdf, changing the parameters from λ1 and k1 to λ2 and k2 at time tc

• I evaluate the likelihood:
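A sketch of the likelihood meant here, under the setup above: the product of Erlang densities of the recorded inter-event times Δt_i, with the parameters switching at tc,

\mathcal{L}(\lambda_1, k_1, t_c, \lambda_2, k_2) = \prod_i p\big(\Delta t_i;\ \lambda(t_i), k(t_i)\big)

where (λ(t), k(t)) = (λ1, k1) for t < tc and (λ2, k2) afterwards (the interval straddling tc needs separate care, omitted in this sketch).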

Idea

• I assume the log-likelihood to be the invariant distribution!

▫ What are the Markov chain states?

struct State {
    double lambda1, lambda2;   // decay rates before and after tc
    double tc;                 // time at which the decay rate changes
    int k1, k2;                // recording factors (one event out of k is recorded)
    double plog;               // log-likelihood of this parameter set
    State(double la1, double la2, double t, int kk1, int kk2) :
        lambda1(la1), lambda2(la2), tc(t), k1(kk1), k2(kk2) {}
    State() {}
};

Parameter space

Corresponding log-likelihood value

Practically

• I have to find an appropriate proposal distribution to move among the states

▫ Attention: when varying λi and ki I have to prevent the acceptance rate from being too low… but also too high!

• The α ratio is evaluated as the ratio between the final-state and initial-state likelihood values.

• Try to guess the values for λi, ki and tc

• Let the chain evolve for a burn-in time and then record the results.
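A minimal C++ sketch of one such move, assuming the State struct above; logLikelihood() is a hypothetical helper evaluating the Erlang-product likelihood on the recorded data, and the step sizes are the tuning parameters just mentioned:

#include <cmath>
#include <random>

// Hypothetical helper (not in the transcript): log-likelihood of the data under s.
double logLikelihood(const State& s);

State propose_and_accept(const State& cur, std::mt19937& gen) {
    std::normal_distribution<double> step(0.0, 0.1);   // step sizes: tuning parameters
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    State next = cur;
    next.lambda1 = std::fabs(cur.lambda1 + step(gen)); // perturb the rates (kept positive;
    next.lambda2 = std::fabs(cur.lambda2 + step(gen)); // the reflection is a simplification)
    next.tc      = cur.tc + 10.0 * step(gen);          // perturb the change-over time
    // k1, k2 would be moved by occasional +/-1 integer steps (omitted here)
    next.plog = logLikelihood(next);
    // alpha = ratio of final- to initial-state likelihoods = exp(difference of logs)
    double alpha = std::exp(next.plog - cur.plog);
    return (unif(gen) < alpha) ? next : cur;
}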

Results

• Even if the initial guess is quite far from the real values, the random walker converges.

guess: λ1 = 5, λ2 = 5, k1 = 3, k2 = 2

real: λ1 = 1, λ2 = 2, k1 = 1, k2 = 1

Results - 2

• Estimate of the uncertainty

(plots: sampled distributions of λ1 and λ2)

Results - 3

• All the parameters can be determined quickly

guess: tc = 150, real: tc = 300

References

• C. Andrieu, N. de Freitas, A. Doucet and M. I. Jordan, Machine Learning 50 (2003), 5-43.

• G. Casella and E. I. George, The American Statistician 46, 3 (1992), 167-174.

• W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical Recipes, Third Edition, Cambridge University Press, 2007.

• M. Loreti, Teoria degli errori e fondamenti di statistica (Theory of Errors and Foundations of Statistics), Decibel, Zanichelli (1998).

• B. Walsh, Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes for EEB 581.