Markov-Chain Monte-Carlo
Advanced Seminar “Machine Learning”
Sascha Meusel
04.02.2015
Winter Semester 2014/2015
Motivation
What is Markov-Chain Monte-Carlo, and what is it used for?
Many problems are difficult to solve analytically, or have no analytical solution at all. MCMC is a class of algorithms based on Monte Carlo sampling that tackles such problems.
For plain Monte Carlo, the required distributions can be difficult to sample from (e.g. non-Gaussian / non-uniform), but Markov chains can also provide more complex distributions.
A Markov chain is a kind of state machine whose transitions to other states each carry a certain probability. Starting from an initial state, we can compute the probability of being in each state after N transitions → a distribution over states.
Motivation
Example: calculate the volume of a d-dimensional convex body.
Solution with MCMC: formulate a distribution over x ∈ R^d with

p(x) = \begin{cases} 1 & \text{if } x \text{ is inside the body} \\ 0 & \text{else} \end{cases}

Draw N samples x_i uniformly from a d-dimensional bounding box BB in R^d that contains the convex body completely. The volume of the bounding box is known (side_1 · side_2 · ... · side_d), so

\frac{|\{\text{samples inside the body}\}|}{N} \cdot \mathrm{volume}(BB) \approx \mathrm{volume}(\text{body})

In this simple example no Markov chain is needed yet, but there exist more sophisticated MCMC methods that use Markov chains to solve this problem.
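A minimal sketch of this plain Monte Carlo estimate (no Markov chain yet): it uses a d-dimensional unit ball as a stand-in for a convex body whose membership test we can evaluate. Function and variable names are illustrative, not from the slides.

```python
import numpy as np

def mc_volume(inside, low, high, n_samples=100_000, rng=None):
    """Estimate the volume of a body by uniform sampling in its bounding box.

    inside: function mapping an (n_samples, d) array of points to a boolean mask.
    low, high: per-dimension lower/upper bounds of the bounding box BB.
    """
    rng = np.random.default_rng() if rng is None else rng
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    x = rng.uniform(low, high, size=(n_samples, len(low)))  # N samples inside BB
    box_volume = np.prod(high - low)                        # side_1 * side_2 * ... * side_d
    return inside(x).mean() * box_volume                    # |inside body| / N * volume(BB)

# Example: 3-dimensional unit ball inside the box [-1, 1]^3, true volume 4/3 * pi ≈ 4.19.
volume = mc_volume(lambda x: (x ** 2).sum(axis=1) <= 1.0, [-1, -1, -1], [1, 1, 1])
print(f"estimated volume: {volume:.3f}")
```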
Contents
1 Motivation
2 Introduction
  Introduction to Monte-Carlo
  Introduction to Markov-Chains
3 Markov-Chain Monte-Carlo
  Metropolis-Hastings
  Rejection Sampling
  Importance Sampling
  Gibbs sampling
  Hybrid Monte Carlo
  Slice sampling
4 References
Introduction to Monte-Carlo
Task: an expectation value is needed:

E_{p(x)}[f(x)] = \int f(x)\, p(x)\, dx

Problem: there is no analytical solution, or only an expensive one.
Solution: sample from p(x):

E_{p(x)}[f(x)] \approx \hat{f} = \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)}), \qquad x^{(s)} \sim p(x)
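A tiny numerical sketch of this estimator, assuming for illustration that p(x) is a standard normal and f(x) = x^2 (so the exact expectation is 1); none of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
S = 100_000

# x^(s) ~ p(x); p is chosen as a standard normal purely for illustration.
x = rng.standard_normal(S)

# f_hat = (1/S) * sum_s f(x^(s)) approximates E_p[f(x)]; with f(x) = x^2 the exact value is 1.
f_hat = np.mean(x ** 2)
print(f"Monte Carlo estimate: {f_hat:.4f} (exact: 1.0)")
```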
Introduction to Monte-Carlo
Properties:
Unbiased estimator \hat{f}:

E_{p(\{x^{(s)}\})}\left[\hat{f}\right] = \frac{1}{S} \sum_{s=1}^{S} E_{p(x)}[f(x)] = E_{p(x)}[f(x)]

Variance shrinks \propto \frac{1}{S}:

\mathrm{var}_{p(\{x^{(s)}\})}\left[\hat{f}\right] = \frac{1}{S^2} \sum_{s=1}^{S} \mathrm{var}_{p(x)}[f(x)] = \frac{1}{S}\, \mathrm{var}_{p(x)}[f(x)]
Introduction to Markov-Chains
Markov chain on a finite state space:

A stochastic process x^{(i)} \in \mathcal{X} = \{x_1, ..., x_S\} (a sequence of random variables) with
p(x^{(i)} | x^{(i-1)}, ..., x^{(1)}) = T(x^{(i)} | x^{(i-1)})
→ T depends only on the previous state x^{(i-1)}.

Homogeneous Markov chain:
T is invariant for all i, with \sum_{x^{(i)} \in \mathcal{X}} T(x^{(i)} | x^{(i-1)}) = 1 for all i
→ a fixed transition matrix T, so the distribution over states evolves as p_i(x) = T\, p_{i-1}(x).

Given irreducibility and aperiodicity, the chain converges to an invariant distribution p(x) after several steps: p_N(x) = T^N p_0(x).
Introduction to Markov-Chains
Example:

T = \begin{pmatrix} 0 & 0 & 0.6 \\ 1 & 0.1 & 0.4 \\ 0 & 0.9 & 0 \end{pmatrix}, \qquad \text{initial distribution } p_0(x) = \begin{pmatrix} 0.5 \\ 0.2 \\ 0.3 \end{pmatrix}

T_{i,j}: the probability of moving to state i given state j.
p_N(x) = T^N p_0(x), which for large N gives p_N(x) \approx (0.2 \;\; 0.4 \;\; 0.4)^T.
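A quick numerical check of this example (a sketch, not from the slides): repeatedly applying T to p_0 converges to the invariant distribution, which the slide rounds to (0.2, 0.4, 0.4).

```python
import numpy as np

# Column-stochastic transition matrix: T[i, j] = probability of moving to state i from state j.
T = np.array([[0.0, 0.0, 0.6],
              [1.0, 0.1, 0.4],
              [0.0, 0.9, 0.0]])
p0 = np.array([0.5, 0.2, 0.3])   # initial distribution over the three states

p = p0
for _ in range(100):             # p_N = T^N p_0
    p = T @ p
print(p)                         # ≈ [0.22, 0.41, 0.37], i.e. roughly (0.2, 0.4, 0.4)
```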
Introduction to Markov-Chains
Markov chain on a continuous state space:

\int p(x^{(i)})\, K(x^{(i+1)} | x^{(i)})\, dx^{(i)} = p(x^{(i+1)})

Instead of the matrix T, an integral kernel K is used: the conditional density of x^{(i+1)} given x^{(i)}. This is the mathematical description underlying Markov chain algorithms on continuous spaces.
Metropolis-Hastings
Proposal distribution q(x^* | x), with x^* being a sampling candidate and x being the current value.
Target distribution p(x).
Acceptance probability:

A(x^{(i)}, x^*) = \min\left(1, \frac{p(x^*)\, q(x^{(i)} | x^*)}{p(x^{(i)})\, q(x^* | x^{(i)})}\right)

Algorithm:
    initialize x^(0)
    for i = 0 to N-1:
        sample u ~ U[0,1]              // U is the uniform distribution
        sample x* ~ q(x* | x^(i))
        if u < A(x^(i), x*):
            x^(i+1) = x*
        else:
            x^(i+1) = x^(i)
Metropolis-Hastings
Example:
Proposal distribution q(x^* | x^{(i)}) = \mathcal{N}(x^{(i)}, 100).
Bimodal target distribution p(x) \propto 0.3\, e^{-0.2 x^2} + 0.7\, e^{-0.2 (x - 10)^2}.
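A minimal sketch of the Metropolis-Hastings loop above, applied to exactly this example. Since the Gaussian proposal is symmetric, the q terms cancel in the acceptance ratio, and p(x) only needs to be known up to its normalizing constant. The variance of 100 is read as a standard deviation of 10; all names are illustrative.

```python
import numpy as np

def p_unnorm(x):
    """Unnormalized bimodal target: p(x) ∝ 0.3 exp(-0.2 x^2) + 0.7 exp(-0.2 (x - 10)^2)."""
    return 0.3 * np.exp(-0.2 * x ** 2) + 0.7 * np.exp(-0.2 * (x - 10) ** 2)

def metropolis_hastings(n_samples=10_000, proposal_std=10.0, x0=0.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_star = rng.normal(x, proposal_std)            # x* ~ q(x* | x^(i)) = N(x^(i), 100)
        # Symmetric proposal, so A = min(1, p(x*) / p(x^(i))): the q terms cancel.
        accept_prob = min(1.0, p_unnorm(x_star) / p_unnorm(x))
        if rng.uniform() < accept_prob:                  # u ~ U[0,1]
            x = x_star                                   # accept: x^(i+1) = x*
        samples[i] = x                                   # on rejection x^(i+1) = x^(i)
    return samples

samples = metropolis_hastings()
print(samples.mean())   # lies between the modes at 0 and 10, closer to 10 (weight 0.7)
```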
Rejection Sampling
Given: a complex distribution p(x).
Choose a distribution q(x) that we can sample from (e.g. a Gaussian).
Find a factor M such that p(x) ≤ M q(x) for all x, with M < ∞.
Rejection Sampling
Sampling algorithm:

    i := 1
    while i ≤ N:
        sample x^(i) ~ q(x)
        sample u ~ U(0, M q(x^(i)))
        if u < p(x^(i)):
            accept x^(i) as a sample
            i++
        else:
            reject the sample

To avoid too many rejections, M q(x) should be chosen so that it bounds p(x) as tightly as possible.
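A sketch of this rejection sampler, reusing the bimodal target from the Metropolis-Hastings example with a broad Gaussian q(x); the value M = 25 is an assumption chosen with some slack so that p(x) ≤ M q(x) holds everywhere.

```python
import numpy as np

def p_unnorm(x):
    """Unnormalized bimodal target, as in the Metropolis-Hastings example."""
    return 0.3 * np.exp(-0.2 * x ** 2) + 0.7 * np.exp(-0.2 * (x - 10) ** 2)

def q_pdf(x, mu=5.0, sigma=10.0):
    """Proposal density q(x) = N(mu, sigma^2) that we can sample from directly."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

M = 25.0  # slack constant so that p_unnorm(x) <= M * q_pdf(x) for all x

def rejection_sampling(n_samples=5_000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    while len(samples) < n_samples:
        x = rng.normal(5.0, 10.0)             # x ~ q(x)
        u = rng.uniform(0.0, M * q_pdf(x))    # u ~ U(0, M q(x))
        if u < p_unnorm(x):                   # accept if u falls under p(x)
            samples.append(x)
    return np.array(samples)

samples = rejection_sampling()
print(samples.mean())   # roughly 0.3 * 0 + 0.7 * 10 = 7
```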
Importance Sampling
\int f(x)\, p(x)\, dx = \int f(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx \approx \frac{1}{S} \sum_{s=1}^{S} f(x^{(s)})\, \frac{p(x^{(s)})}{q(x^{(s)})}, \qquad x^{(s)} \sim q(x)

\frac{p(x^{(s)})}{q(x^{(s)})} is the importance weight w^{(s)}.

So we can simply sample from q(x) and multiply each sample by its weight w^{(s)} → no rejections.
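A sketch of this weighted estimate, reusing the bimodal target (normalized here so that p is a proper density, as the formula above assumes) and a broad Gaussian as q(x); names and parameters are illustrative.

```python
import numpy as np

def p_pdf(x):
    """Normalized bimodal target: the mixture 0.3 N(0, 2.5) + 0.7 N(10, 2.5)."""
    return (0.3 * np.exp(-0.2 * x ** 2) + 0.7 * np.exp(-0.2 * (x - 10) ** 2)) / np.sqrt(5 * np.pi)

def q_pdf(x, mu=5.0, sigma=10.0):
    """Broad Gaussian proposal N(mu, sigma^2) that we can sample from directly."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
S = 100_000
x = rng.normal(5.0, 10.0, size=S)   # x^(s) ~ q(x)
w = p_pdf(x) / q_pdf(x)             # importance weights w^(s) = p(x^(s)) / q(x^(s))

f = x                               # f(x) = x, so E_p[f] is the mixture mean 0.3*0 + 0.7*10 = 7
estimate = np.mean(f * w)           # (1/S) * sum_s f(x^(s)) w^(s)
print(f"importance sampling estimate: {estimate:.3f} (exact: 7.0)")
```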
Gibbs sampling
Let x be n-dimensional. Also assume that we can compute and sample from the full conditionals

p(x_j | x_1, ..., x_{j-1}, x_{j+1}, ..., x_n) = p(x_j | x_{-j})

With the proposal distribution

q(x^* | x^{(i)}) = \begin{cases} p(x^*_j | x^{(i)}_{-j}) & \text{if } x^*_{-j} = x^{(i)}_{-j} \\ 0 & \text{else} \end{cases}

the acceptance probability becomes

A(x^{(i)}, x^*) = \min\left(1, \frac{p(x^*)\, q(x^{(i)} | x^*)}{p(x^{(i)})\, q(x^* | x^{(i)})}\right) = \min\left(1, \frac{p(x^*_{-j})}{p(x^{(i)}_{-j})}\right) = 1

so every proposal is accepted.

Algorithm:
    initialize x^(0)_{1:n}
    for i = 0 to N-1:
        for j = 1 to n:
            x^(i+1)_j ~ p(x_j | x^(i+1)_1, ..., x^(i+1)_{j-1}, x^(i)_{j+1}, ..., x^(i)_n)
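A minimal Gibbs sketch for a case where these full conditionals are available in closed form. The 2-dimensional Gaussian with correlation ρ = 0.8 is an assumed example, not from the slides, but its conditionals p(x_j | x_{-j}) are themselves Gaussian and easy to sample.

```python
import numpy as np

def gibbs_bivariate_normal(n_samples=10_000, rho=0.8, rng=None):
    """Gibbs sampler for a zero-mean bivariate normal with unit variances and correlation rho.

    Full conditionals: p(x1 | x2) = N(rho * x2, 1 - rho^2), and symmetrically for x2.
    """
    rng = np.random.default_rng() if rng is None else rng
    x1, x2 = 0.0, 0.0                        # initialize x^(0)
    samples = np.empty((n_samples, 2))
    cond_std = np.sqrt(1.0 - rho ** 2)
    for i in range(n_samples):
        x1 = rng.normal(rho * x2, cond_std)  # x1^(i+1) ~ p(x1 | x2^(i))
        x2 = rng.normal(rho * x1, cond_std)  # x2^(i+1) ~ p(x2 | x1^(i+1))
        samples[i] = (x1, x2)
    return samples

samples = gibbs_bivariate_normal()
print(np.corrcoef(samples.T)[0, 1])  # should be close to rho = 0.8
```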
Hybrid Monte Carlo
Also known as Hamiltonian Monte Carlo.
Basic idea: use the gradient of the target distribution.

Simulate a walk through the target distribution as a sphere rolling without friction over a potential-field surface.
Auxiliary variables u ∈ R^{n_x} are therefore needed to store the momentum of the sphere.
The sphere spends more time in areas of lower potential, so those areas correspond to regions of higher density in the target distribution.
Parameters: step size ρ and number of steps per iteration L.
Hybrid Monte Carlo
Algorithm:
    initialize x^(0)
    for i = 0 to N-1:
        sample v ~ U[0,1] and u* ~ N(0, I_{n_x})
        define x_0 = x^(i) and u_0 = u* + ρ ∆(x_0)/2
        for l = 1 to L:
            x_l = x_{l-1} + ρ u_{l-1}
            u_l = u_{l-1} + ρ_l ∆(x_l),  with ρ_l = ρ if l < L, ρ/2 if l = L
        (x^(i+1), u^(i+1)) = (x_L, u_L) if v < A(x^(i), u*), otherwise (x^(i), u*)

with ∆(x) = \frac{\partial}{\partial x} \log p(x), and

A = \min\left(1, \frac{p(x_L)}{p(x^{(i)})} \exp\left(-\tfrac{1}{2}\left(u_L^T u_L - u^{*T} u^*\right)\right)\right)
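A compact sketch of these updates in code, using a standard normal as an assumed target so that ∆(x) = -x; the step size ρ = 0.1 and L = 20 leapfrog steps are arbitrary illustrative choices, not values from the slides.

```python
import numpy as np

def log_p(x):
    """Assumed target: standard normal up to a constant, log p(x) = -0.5 * x^T x."""
    return -0.5 * np.dot(x, x)

def grad_log_p(x):
    """Delta(x) = d/dx log p(x) = -x for the standard normal target."""
    return -x

def hybrid_monte_carlo(n_samples=5_000, dim=2, rho=0.1, L=20, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(dim)                                  # initialize x^(0)
    samples = np.empty((n_samples, dim))
    for i in range(n_samples):
        v = rng.uniform()                              # v ~ U[0,1]
        u_star = rng.standard_normal(dim)              # u* ~ N(0, I)
        x_l = x.copy()
        u_l = u_star + rho * grad_log_p(x_l) / 2.0     # initial momentum half-step
        for l in range(1, L + 1):
            x_l = x_l + rho * u_l                      # position step
            rho_l = rho if l < L else rho / 2.0        # final step is a half-step
            u_l = u_l + rho_l * grad_log_p(x_l)        # momentum step
        # A = min(1, p(x_L)/p(x^(i)) * exp(-(u_L^T u_L - u*^T u*)/2))
        log_A = log_p(x_l) - log_p(x) - 0.5 * (np.dot(u_l, u_l) - np.dot(u_star, u_star))
        if v < np.exp(min(0.0, log_A)):
            x = x_l                                    # accept
        samples[i] = x                                 # on rejection keep x^(i)
    return samples

samples = hybrid_monte_carlo()
print(samples.mean(axis=0), samples.var(axis=0))       # should approach mean 0, variance 1
```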
Slice sampling
Idea: use an auxiliary variable u ∈ R and the extended target distribution

p^*(x, u) = \begin{cases} 1 & \text{if } 0 ≤ u ≤ p(x) \\ 0 & \text{else} \end{cases}

with

\int p^*(x, u)\, du = \int_0^{p(x)} du = p(x)

So we can sample from p^*(x, u) and then simply ignore u. This can also be extended to L auxiliary variables, e.g. when p(x) is proportional to a product of L functions f_l(x), resulting in the following sampler:

    for l = 1 to L:
        sample u^(i)_l ~ U[0, f_l(x^(i-1))]
    sample x^(i) ~ U_{A^(i)}(x), with A^(i) = {x | f_l(x) ≥ u^(i)_l, l = 1, ..., L}
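A sketch of the single-variable case (L = 1, f_1 = p), reusing the bimodal target from before. Sampling uniformly from the slice A^(i) = {x | p(x) ≥ u} is done here by naive rejection within a fixed interval that is assumed to cover all relevant mass, rather than by the stepping-out procedure usually used in practice.

```python
import numpy as np

def p_unnorm(x):
    """Unnormalized bimodal target, as in the Metropolis-Hastings example."""
    return 0.3 * np.exp(-0.2 * x ** 2) + 0.7 * np.exp(-0.2 * (x - 10) ** 2)

def slice_sampling(n_samples=10_000, x0=5.0, lo=-20.0, hi=30.0, rng=None):
    """Single-variable slice sampler; the slice is sampled by rejection within [lo, hi]."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        u = rng.uniform(0.0, p_unnorm(x))     # u^(i) ~ U[0, p(x^(i-1))]
        while True:                           # x^(i) ~ U_A(x), A = {x | p(x) >= u}
            x_prop = rng.uniform(lo, hi)
            if p_unnorm(x_prop) >= u:
                x = x_prop
                break
        samples[i] = x
    return samples

samples = slice_sampling()
print(samples.mean())   # roughly 0.3 * 0 + 0.7 * 10 = 7
```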
Thanks for your attention :-)
References
Andrieu, Christophe; de Freitas, Nando; Doucet, Arnaud; Jordan, Michael I.: An Introduction to MCMC for Machine Learning. In: Machine Learning 50, Kluwer Academic Publishers, 2003, pp. 5-43.
Murray, Iain: Markov chain Monte Carlo. Tutorial at the Machine Learning Summer School, 2009.