Posted 19-Jan-2016 by frank-french
A tutorial on Markov Chain Monte Carlo
Problem

I = ∫ g(x) π(x) dx        (e.g. Bayesian inference)

If X1, X2, ..., XN form a Markov chain with stationary probability π, then

I ≈ (1/N) Σ_{i=1}^{N} g(Xi)

MCMC is then the problem of designing a Markov chain with a pre-specified stationary distribution so that the integral I can be accurately approximated in a reasonable amount of time.
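As a sketch of the estimator above in Python: the AR(1) chain and g(x) = x² are my illustration, chosen because the chain's stationary law N(0, 1/(1-a²)) is known exactly, so the ergodic average can be checked against the true integral.

```python
import numpy as np

def ergodic_average(a=0.5, n_steps=200_000, burn_in=1_000, seed=0):
    """Estimate I = E_pi[g(X)] by the ergodic average (1/N) sum g(X_i),
    where X_n = a*X_{n-1} + e_n, e_n ~ N(0,1), is a Markov chain whose
    stationary distribution is pi = N(0, 1/(1-a^2))."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n_steps + burn_in)
    x, total = 0.0, 0.0
    for n, en in enumerate(e):
        x = a * x + en                 # one Markov-chain step
        if n >= burn_in:
            total += x * x             # g(x) = x^2
    return total / n_steps

# the exact stationary second moment is 1/(1 - a^2) = 4/3
print(ergodic_average())
```

With a = 0.5 the average settles near 4/3; the serial correlation of the chain only slows convergence, it does not bias the limit.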
Metropolis et al. (circa 1953)

[Figure: from the current state x, propose y and accept it with probability p (stay at x with probability 1-p); then propose y′ and accept with probability p′, and so on.]

Draw the proposal y from G(y|x) and accept it with probability

p = min(1, π(y)/π(x))
Theorem: Metropolis works for any proposal distribution G such that

G(y|x) = G(x|y)

provided the MC is irreducible and aperiodic.

Proof: min(1, π(y)/π(x)) G(y|x) π(x) is symmetric in x and y, so detailed balance holds.
Note: it also works for general G provided we change p a bit (Hastings' trick).
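A minimal sketch of the scheme in Python, assuming a one-dimensional target and a Gaussian random-walk proposal (which is symmetric, so the Hastings correction drops out). The target exp(-x²/2) is my illustration, not from the slides.

```python
import numpy as np

def metropolis(log_pi, x0=0.0, n_steps=100_000, step=1.0, seed=0):
    """Random-walk Metropolis with symmetric proposal G(y|x) = N(y; x, step^2),
    so the acceptance probability reduces to p = min(1, pi(y)/pi(x))."""
    rng = np.random.default_rng(seed)
    x, lx = x0, log_pi(x0)
    samples = np.empty(n_steps)
    for n in range(n_steps):
        y = x + step * rng.standard_normal()   # symmetric proposal
        ly = log_pi(y)
        if np.log(rng.uniform()) < ly - lx:    # accept w.p. min(1, pi(y)/pi(x))
            x, lx = y, ly
        samples[n] = x
    return samples

# unnormalized target: pi(x) proportional to exp(-x^2/2), i.e. standard normal
s = metropolis(lambda x: -0.5 * x * x)
print(s.mean(), (s**2).mean())   # ≈ 0 and ≈ 1
```

Note that only the ratio π(y)/π(x) is ever used, so π need not be normalized.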
[Figure: two small chain diagrams over states 1, 2, 3 (transition probabilities 1, .5, .5). One chain alternates deterministically between groups of states: Period = 2. The other has states that cannot reach each other: Reducible.]
Let f(w,z) be the joint density with conditionals u(w|z), v(z|w) and marginals g(w), h(z):

∫ u(w|z) h(z) dz = g(w)
∫ v(z|w) g(w) dw = h(z)
Gibbs sampler
Take X = (W,Z) a vector. To sample X it is sufficient to sample cyclically from the conditionals (W|z), (Z|w).

Gibbs is in fact a special case of Metropolis: take the proposals to be the exact conditionals; then the acceptance probability is p = 1, i.e. a proposed move is always accepted.
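A sketch of cyclic sampling from exact conditionals, using a bivariate normal (my illustration) where both conditionals are available in closed form:

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_steps=50_000, seed=0):
    """Gibbs sampler for (W, Z) jointly N(0, [[1, rho], [rho, 1]]):
    cycle through the exact conditionals
    W | Z=z ~ N(rho*z, 1 - rho^2),   Z | W=w ~ N(rho*w, 1 - rho^2)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1 - rho**2)
    w = z = 0.0
    out = np.empty((n_steps, 2))
    for n in range(n_steps):
        w = rho * z + sd * rng.standard_normal()  # draw W | z
        z = rho * w + sd * rng.standard_normal()  # draw Z | w
        out[n] = (w, z)
    return out

xy = gibbs_bivariate_normal()
print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])   # ≈ 0.8
```

Every move is accepted, exactly as the Metropolis view of Gibbs predicts.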
T(g,h) = (g,h): the true marginals (g,h) are a fixed point of the Gibbs update!
Example: entropic inference on Gaussians

Likelihood, entropic prior ∝ exp(-α I(θ, θ′)), and entropic posterior, written in terms of the sufficient statistics.

The conditionals:
(μ | v) is Gaussian
(v | μ) is generalized inverse Gaussian
Gibbs + Metropolis

% init posterior log likelihood
LL = ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
LL1s(1:Nchains,1) = LL;
for t=1:burnin
  % Gibbs step: sample mu from its (Gaussian) conditional given v
  mu = normrnd((v*t1+a1m)./(n*v+a1), 1./(n*v+a1));
  % Metropolis step: sample v from its (gen. inverse Gaussian) conditional
  v = do_metropolis(v, Nmet, n3, beta, a2);
  LL1s(1:Nchains,t+1) = ...
      ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
end

function x = do_metropolis(v, Nmet, n3, t3, a2)
% Metropolis update for v with gamma(n3, t3) proposals, run for
% Nmet steps on each of the Nchains chains; returns the current
% states x (not the last proposals, which may have been rejected).
[Nchains, one] = size(v);
x = v;
accept = 0; reject = 0;
lx = log(x);
lfx = (n3-1)*lx - t3*x - a2./x;        % log target (unnormalized)
for t=1:Nmet
  y = gamrnd(n3, t3, Nchains, 1);      % propose
  ly = log(y);
  lfy = (n3-1)*ly - t3*y - a2./y;
  for c=1:Nchains
    if (lfy(c) > lfx(c)) | (rand(1,1) < exp(lfy(c)-lfx(c)))
      x(c) = y(c); lx(c) = ly(c); lfx(c) = lfy(c);
      accept = accept + 1;
    else
      reject = reject + 1;
    end
  end
end
Convergence: Are we there yet?
Looks OK after the second point.
Mixing is Good Segregation is Bad!
The art of simulation
• Run several chains
• Start at over-dispersed points
• Monitor the log lik.
• Monitor the serial correlations
• Monitor acceptance ratios
• Re-parameterize (to get approx. indep.)
• Re-block (Gibbs)
• Collapse (int. over other pars.)
• Run with troubled pars. fixed at reasonable vals.
• Monitor R-hat
• Monitor mean of score functions
• Monitor coalescence
• Use connections, become EXACT!
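A sketch of the R-hat monitor from the checklist above (the Gelman-Rubin potential scale reduction factor; the toy 4-chain data is my illustration):

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat from an (m, n) array: m chains, n draws each.
    Compares between-chain variance B to within-chain variance W;
    values near 1 suggest the chains have mixed."""
    m, n = chains.shape
    means = chains.mean(axis=1)
    B = n * means.var(ddof=1)               # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
mixed = rng.standard_normal((4, 2000))        # 4 chains on the same target
stuck = mixed + np.arange(4)[:, None] * 3.0   # 4 chains stuck at different modes
print(gelman_rubin(mixed))   # close to 1: good mixing
print(gelman_rubin(stuck))   # far above 1: segregation
```

Mixing is good, segregation is bad: the second set of chains never shares territory, and R-hat flags it immediately.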
Get Connected!
Unnormalized posteriors: q(θ|w), with w = w(x) = vector of suff. stats.

[Figure: a path w(t), t ∈ [0,1], through the space of unnormalized posteriors, connecting q(θ|w(0)) at t = 0 to q(θ|w(1)) at t = 1.]

The tangent along the path is U(θ, t) = Σ_k ẇ_k(t) ∂ log q(θ|w(t)) / ∂w_k, and

log(Z1/Z0) ≈ (1/N) Σ_j U(θ_j, t_j)

where t_j is uniform on [0,1] and θ_j is drawn from π(θ|w(t_j)). U is the average tangent direction along the path. Choice of path is equivalent to choice of prior on [0,1]. The best (min. var.) prior (path) is generalized Jeffreys! Information geodesics are the best paths on the manifold of unnormalized posteriors.
Easy paths:
- geometric
- mixture
- scale
Exact rejection constants are known along the mixture path!
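A sketch of the path-sampling estimate along a geometric path, for a pair of one-dimensional Gaussian-shaped q's (my illustration) chosen so that every intermediate q_t is itself Gaussian and can be sampled exactly; the true answer is log(Z1/Z0) = log 4.

```python
import numpy as np

def path_sampling_logZ(n_draws=200_000, seed=0):
    """Path-sampling estimate of log(Z1/Z0) along the geometric path
    q_t = q0^(1-t) * q1^t between
      q0(x) = exp(-x^2/2)           with Z0 = sqrt(2*pi)
      q1(x) = 2*exp(-(x-1)^2/8)     with Z1 = 2*sqrt(8*pi)
    so the true log(Z1/Z0) = log 4.  Each q_t is Gaussian, so the
    intermediate distributions are sampled exactly; the tangent is
    U = d/dt log q_t = log q1 - log q0."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(size=n_draws)              # t_j uniform on [0, 1]
    prec = (1 - t) + t / 4                     # precision of q_t
    mean = (t / 4) / prec                      # mean of q_t
    x = mean + rng.standard_normal(n_draws) / np.sqrt(prec)
    U = np.log(2) - (x - 1)**2 / 8 + x**2 / 2  # log q1(x) - log q0(x)
    return U.mean()

print(path_sampling_logZ())   # ≈ log 4 ≈ 1.386
```

The uniform prior on t corresponds to the geometric path at constant speed; a better path or prior only reduces the variance, not the mean.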
The Present is trying to be Perfectly Exact
New Exact Math
Most MCs are iterations of random functions:

Let {f_θ : θ ∈ Θ} be a family of functions. Choose n points θ1, ..., θn in Θ independently with some p.m. μ defined on Θ.

Forward iter.:  X0 = x0, X1 = f_θ1(x0), ..., Xn = (f_θn ∘ ... ∘ f_θ1)(x0)
Backward iter.: Y0 = x0, Y1 = f_θ1(x0), ..., Yn = (f_θ1 ∘ f_θ2 ∘ ... ∘ f_θn)(x0)

Xn =d Yn for each fixed n, but as processes {Xn} ≠d {Yn}.
E.g. let a < 1. Take S (the space of states) to be the real line, Θ = {+,-}, μ(+) = μ(-) = 1/2, and f+(x) = a x + 1, f-(x) = a x - 1.

Forward:  Xn = a Xn-1 + en keeps moving all over S.
Backward: Yn = e1 + a e2 + ... + a^{n-1} en + a^n x0, which converges to a single point of S.
(Corresponding frames have the same distribution, but the MOVIES are different.)
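The two iteration orders can be sketched in Python with the maps f±(x) = ax ± 1 from the example, using the same random signs in both orders:

```python
import numpy as np

def forward_backward(a=0.5, n=60, seed=0):
    """Iterate the random maps f_+(x) = a*x + 1, f_-(x) = a*x - 1 with
    the same signs e_1, ..., e_n in both orders:
    forward  X_k = f_{e_k}( ... f_{e_1}(x0))  keeps moving,
    backward Y_k = f_{e_1}( ... f_{e_k}(x0))  settles to a constant."""
    rng = np.random.default_rng(seed)
    e = rng.choice([-1.0, 1.0], size=n)
    x0 = 0.0
    fwd = []
    x = x0
    for s in e:                  # compose each new map on the OUTSIDE
        x = a * x + s
        fwd.append(x)
    bwd = []
    for k in range(1, n + 1):    # compose each new map on the INSIDE
        y = x0
        for s in e[:k][::-1]:    # apply f_{e_k} first, f_{e_1} last
            y = a * y + s
        bwd.append(y)
    return np.array(fwd), np.array(bwd)

fwd, bwd = forward_backward()
print(np.ptp(fwd[-20:]))   # forward keeps fluctuating
print(np.ptp(bwd[-20:]))   # backward has frozen
```

Frame by frame the two sequences have the same distribution, but only the backward movie converges: fresh randomness enters ever deeper inside the composition, damped by a^k.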
Dead Leaves Simulation

[Figure: forward simulation = looking down at the pile; backward simulation = looking up from below.]

http://www.warwick.ac.uk/statsdept/Staff/WSK/
Convergence of backward iterations: Yn = (f_θ1 ∘ f_θ2 ∘ ... ∘ f_θn)(x0) converges when the functions are contracting on average.
Propp & Wilson (www.dbwilson.com)

[Figure: a perfectly equilibrated 2D Ising state at critical T = 529K.]

Coupling from the past: start the chains at t = -M and run Gibbs forward to t = 0, reusing the same random numbers and increasing M until all starting states coalesce.
[Figure: a two-state chain on {0,1}: from 1 move to 0 with probability 1; from 0 move to 0 or 1 with probability .5 each.]

Need backward iterations. The first time to coalescence is not distributed as π: here the chains always coalesce at 0 first, BUT π(0) = 2/3, π(1) = 1/3.
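A sketch of Propp-Wilson coupling from the past for this two-state chain (the doubling schedule and the reuse of random numbers are the standard construction), checking that the output frequency of state 0 is π(0) = 2/3 rather than the "always coalesce at 0" law of the forward coupling:

```python
import numpy as np

def cftp_two_state(rng):
    """Coupling from the past for the chain
    P(1->0) = 1, P(0->0) = P(0->1) = 1/2, with pi = (2/3, 1/3).
    Double M until every starting state coalesces by time 0,
    REUSING the same random numbers u_{-1}, u_{-2}, ... each pass."""
    step = lambda s, u: 0 if s == 1 else (0 if u < 0.5 else 1)
    us = []                                  # u_{-1}, u_{-2}, ... going back
    M = 1
    while True:
        while len(us) < M:
            us.append(rng.uniform())
        vals = {0, 1}
        for k in range(M - 1, -1, -1):       # run from time -M up to time 0
            vals = {step(s, us[k]) for s in vals}
        if len(vals) == 1:                   # all starts agree at time 0
            return vals.pop()
        M *= 2

rng = np.random.default_rng(0)
draws = [cftp_two_state(rng) for _ in range(20_000)]
print(np.mean(np.array(draws) == 0))   # ≈ 2/3, the stationary pi(0)
```

The value reported is the state at time 0, not the state at the moment of coalescence, and that is exactly why the output is π-distributed.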
Not Exactly! Yet.