Monte Carlo & MCMC
Xin-She Yang

Outline:
Monte Carlo: Estimating π; Buffon's problem; Probability; Monte Carlo; Monte Carlo integration; Quality of sampling; Quasi-Monte Carlo
Pseudorandom: Pseudorandom number generation; Other distributions; Limitations; Multivariate distributions
Markov Chains: Markov chains; A famous Markov chain; Google ...
Monte Carlo Simulations, Sampling and
Markov Chain Monte Carlo
Xin-She Yang
© 2010
Xin-She Yang Monte Carlo & MCMC
Estimating π
How can we estimate π using only a ruler and some matchsticks?
Buffon's Needle Problem

Buffon's needle problem (1733). The probability of a needle crossing a line is

    p = (2/π) · (L/d),

where L = length of the needles and d = spacing of the lines (L ≤ d).
Probability of Crossing a Line

Since p ≈ n/N ≈ 2L/(πd), we have

    π ≈ (2N/n) · (L/d).

Lazzarini (1901): L = 5d/6, N = 3408, n = 1808, so

    π ≈ (2 × 3408)/1808 · (5/6) ≈ 3.14159292.

Too accurate?! Is this right? What happens when n = 1809?

Errors ∼ 1/√N ∼ 2%.
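As a sanity check, the throws are easy to simulate. A minimal sketch in Python (the function name and parameters are ours; the angle is drawn using math.pi, which a real ruler-and-matchsticks experiment would not need):

```python
import math
import random

def buffon_pi(num_throws, L=0.8, d=1.0, seed=1):
    """Estimate pi by simulating Buffon's needle (short-needle case, L <= d)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(num_throws):
        y = rng.uniform(0.0, d / 2.0)         # centre-to-nearest-line distance
        theta = rng.uniform(0.0, math.pi)     # needle angle
        if y <= (L / 2.0) * math.sin(theta):  # the needle crosses a line
            crossings += 1
    # p = n/N ~ 2L/(pi d)  =>  pi ~ 2 L N / (d n)
    return 2.0 * L * num_throws / (d * crossings)

print(buffon_pi(1_000_000))  # fluctuates around 3.14, shrinking like 1/sqrt(N)
```

Running it for increasing N shows the slow 1/√N convergence that makes Lazzarini's eight digits so suspicious.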
Monte Carlo Methods
Everyone has used Monte Carlo methods in some way ...
Measure temperatures, choose a product, ...
Taste soup, wine ...
Monte Carlo Integration
I =
∫
Ωfdv = V
[ 1
N
n∑
i=1
fi
]
+ O(ǫ),
ǫ ∼
√
1N
∑Ni=1 f 2
i − µ2
N∼ O(1/
√N).
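The estimator and its error bar fit in a few lines of Python (a sketch; the test integral ∫₀¹ eˣ dx = e − 1 is just an illustration):

```python
import math
import random

def mc_integrate(f, a, b, N, seed=0):
    """Monte Carlo estimate of the integral of f over [a, b], with error estimate."""
    rng = random.Random(seed)
    V = b - a
    samples = [f(rng.uniform(a, b)) for _ in range(N)]
    mu = sum(samples) / N
    var = sum(s * s for s in samples) / N - mu * mu
    eps = V * math.sqrt(var / N)   # standard error ~ O(1/sqrt(N))
    return V * mu, eps

I, eps = mc_integrate(math.exp, 0.0, 1.0, 100_000)
print(I, eps)   # exact value is e - 1 = 1.71828...
```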
Importance and Quality of the Samples

Higher dimensions – even more challenging!

    I = ∫∫ ... ∫ f(u, v, ..., w) du dv ... dw.

Errors ∼ 1/√N for higher-dimensional integrals.

How should we distribute these sampling points?

Regular grids: E ∼ O(N^(−2/d)), which is not enough in d ≥ 4 dimensions!

Strategies: importance sampling, Latin hypercube, ...

Any other ways?
Quasi-Monte Carlo Methods
In essence, the idea is to distribute (consecutive) sampling points as far away from each other as possible, using quasi-random or low-discrepancy numbers (not pseudo-random) ... Halton, Sobol, van der Corput ...

For example, the van der Corput sequence expresses an integer n in a prime base b,

    n = Σ_{j=0}^m a_j(n) b^j,  a_j ∈ {0, 1, 2, ..., b − 1}.

Then the digits are reversed or reflected about the radix point:

    φ_b(n) = Σ_{j=0}^m a_j(n) / b^(j+1).

For example, n = 0, 1, 2, ..., 15 =⇒ 0, 1/2, 1/4, 3/4, 1/8, ..., 15/16.

Errors ∼ O(1/N) (up to logarithmic factors).
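The radix-reversal can be sketched in a few lines of Python (the function name is ours; digits are peeled off with divmod):

```python
def van_der_corput(n, b=2):
    """n-th element of the van der Corput sequence in base b:
    write n in base b, then reflect the digits about the radix point."""
    phi, denom = 0.0, 1.0
    while n > 0:
        n, a_j = divmod(n, b)   # peel off the next digit a_j
        denom *= b
        phi += a_j / denom      # contributes a_j * b^-(j+1)
    return phi

print([van_der_corput(n) for n in range(8)])
# [0.0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```

Note how each new point lands far from its predecessors, which is exactly the low-discrepancy property.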
Pseudorandom numbers – by deterministic sequences

Uniform distributions (linear congruential generators):

    d_i = (a · d_{i−1} + c) mod m.

Classic IBM generator: a = 65539, c = 0, m = 2^31 (strong correlation!). In fact, the correlation coefficient is 1!

Better choice (old Matlab): a = 7^5 = 16807, c = 0, m = 2^31 − 1 = 2,147,483,647.

If scaled by m, all numbers are in [1/m, (m − 1)/m]. New Matlab: [ε, 1 − ε], ε = 2^−53 ≈ 1.1 × 10^−16.

IEEE 754: a 64-bit double has 53 bits for a signed fraction in base 2 and 11 bits for a signed exponent.
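A sketch of such a generator in Python, defaulting to the 7^5 = 16807 "minimal standard" parameters quoted above (the seed value is arbitrary):

```python
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    """Linear congruential generator: d_i = (a*d_{i-1} + c) mod m, scaled to (0, 1)."""
    d = seed
    while True:
        d = (a * d + c) % m
        yield d / m   # scale into (0, 1)

gen = lcg(seed=12345)
print([next(gen) for _ in range(3)])   # deterministic "random" numbers in (0, 1)
```

Re-seeding with the same value reproduces the same stream, which is exactly why these are pseudo-random, not random.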
Other Distributions

Inverse transform method, rejection method, Mersenne twister, ..., Markov chain Monte Carlo.

Standard normal distribution: p(u) = (1/√(2π)) e^(−u²/2),

CDF:

    Φ(v) = (1/√(2π)) ∫_{−∞}^{v} e^(−u²/2) du = (1/2)[1 + erf(v/√2)],

    v = Φ^(−1)(u) = √2 erf^(−1)(2u − 1).
(Histograms: uniform samples on [0, 1] and the transformed samples following the standard normal distribution.)
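Python's standard library already provides Φ^(−1), so the inverse transform can be demonstrated directly (a sketch; statistics.NormalDist.inv_cdf requires Python 3.8+):

```python
import random
from statistics import NormalDist, fmean, stdev

# Inverse transform: u ~ Unif(0, 1)  =>  v = Phi^{-1}(u) ~ N(0, 1)
rng = random.Random(42)
inv_cdf = NormalDist().inv_cdf
v = [inv_cdf(rng.random()) for _ in range(100_000)]
print(fmean(v), stdev(v))   # should be close to 0 and 1
```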
Transform method: Limitations

    v = Φ^(−1)(u) = √2 erf^(−1)(2u − 1),

    erf^(−1)(x) = (√π/2) [ x + πx³/12 + 7π²x⁵/480 + 127π³x⁷/40320 + ··· ].

Not so easy to calculate!

Sometimes, the inverse may not be possible.
Multivariate Distributions

Bivariate normal distribution:

    p(v1, v2) = (1/(2π)) e^(−(v1² + v2²)/2).

Box-Muller method: from u1, u2 ∼ uniform distributions,

    v1 = √(−2 ln u1) cos(2πu2),  v2 = √(−2 ln u1) sin(2πu2).
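A direct transcription in Python (using 1 − u1 to avoid log 0 is our guard, not part of the original formulas):

```python
import math
import random

def box_muller(rng):
    """Two independent uniforms -> two independent standard normal deviates."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))   # 1 - u1 lies in (0, 1], so log is safe
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

rng = random.Random(0)
v1 = [box_muller(rng)[0] for _ in range(50_000)]
print(sum(v1) / len(v1))   # sample mean, close to 0
```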
Problems

Difficult to calculate the inverse in most cases (sometimes, even impossible!).

Other methods (e.g., the rejection method) are inefficient.
So – the Markov chain Monte Carlo (MCMC) way!
Random Walk down the Markov Chains

Random walk – a drunkard's walk:

    u_{t+1} = µ + u_t + w_t,

where w_t is a random variable and µ is the drift. For example, w_t ∼ N(0, σ²) (Gaussian).
(Plots: a one-dimensional random walk over 500 steps and a two-dimensional random-walk path.)
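The drunkard's walk takes only a few lines to simulate (the Gaussian step w_t ∼ N(0, σ²) follows the example above; the parameter names and seed are ours):

```python
import random

def random_walk(steps, mu=0.0, sigma=1.0, seed=7):
    """Simulate u_{t+1} = mu + u_t + w_t with w_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    u, path = 0.0, [0.0]
    for _ in range(steps):
        u = mu + u + rng.gauss(0.0, sigma)
        path.append(u)
    return path

path = random_walk(500)
print(path[-1])   # end point; its typical magnitude is sigma * sqrt(steps)
```

Setting mu nonzero adds the drift, so the walk wanders around the line u_t ≈ µ t.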
Markov Chains

Markov chain: the next state depends only on the current state and the transition probability.

    P(i, j) ≡ P(V_{t+1} = S_j | V_0 = S_p, ..., V_t = S_i) = P(V_{t+1} = S_j | V_t = S_i),

=⇒ detailed balance

    P_ij π*_i = P_ji π*_j,

where π* is the stationary probability distribution.

Example: Brownian motion

    u_{i+1} = µ + u_i + ε_i,  ε_i ∼ N(0, σ²).
Markov Chains
Monopoly (board games)
Monopoly Animation
A Famous $Billion Markov Chain – PageRank

Google PageRank algorithm (by Page et al., 1997).

Billions of web pages: pages = states, link probability ∼ 1/t, where t ≈ the expected number of clicks.
Googling as a Markov Chain

    Rank_j^{(t+1)} = (1 − α)/N + α Σ_{p_i ∈ Ω(p_j)} Rank_i^{(t)} / B(p_i),

where N = number of pages, Ω(p_j) = the set of pages linking to p_j, B(p_i) = the number of outbound links of page p_i, and α = a ranking (damping) factor (≈ 0.85), with Rank_i^{(t=0)} = 1/N.

Let R = (Rank_1, ..., Rank_N)^T and L(p_i, p_j) = 0 if there is no link, then

=⇒

    R = (1/N) ((1 − α), ..., (1 − α))^T + α L R,

where L is the N × N link matrix [L(p_i, p_j)] with Σ_{i=1}^N L(p_i, p_j) = 1. This is the Google matrix (stochastic, sparse) =⇒ a stationary probability distribution R (updated monthly).
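The update is easy to power-iterate on a toy graph (the three-page link structure below is invented for illustration; α = 0.85 as above, and every page must have at least one outbound link):

```python
def pagerank(in_links, alpha=0.85, iters=100):
    """Power-iterate Rank_j <- (1-alpha)/N + alpha * sum_i Rank_i / B(p_i),
    where in_links[j] lists the pages i linking to page j."""
    N = len(in_links)
    out = [0] * N                  # B(p_i): outbound link counts
    for targets in in_links:
        for i in targets:
            out[i] += 1
    R = [1.0 / N] * N              # Rank_i^{(0)} = 1/N
    for _ in range(iters):
        R = [(1 - alpha) / N + alpha * sum(R[i] / out[i] for i in in_links[j])
             for j in range(N)]
    return R

# Toy web: page 2 links to 0, page 0 links to 1, pages 0 and 1 link to 2
in_links = [[2], [0], [0, 1]]
print(pagerank(in_links))   # ranks sum to 1; page 2 ranks highest
```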
Markov Chain Monte Carlo
Landmarks: the Monte Carlo method (1930s, 1945, from the 1950s), e.g., the Metropolis algorithm (1953) and Metropolis-Hastings (1970).

Markov chain Monte Carlo (MCMC) methods – a class of methods.

They really took off in the 1990s, and are now applied to a wide range of areas: physics, Bayesian statistics, climate change, machine learning, finance, economics, medicine, biology, materials and engineering ...
Metropolis-Hastings

The Metropolis-Hastings algorithm:

1. Begin with any initial θ_0 at time t ← 0 such that p(θ_0) > 0.

2. Generate a candidate sample θ* ∼ q(θ_t, ·) from a proposal distribution.

3. Evaluate the acceptance probability α(θ_t, θ*) given by

    α = min[ p(θ*) q(θ*, θ_t) / ( p(θ_t) q(θ_t, θ*) ), 1 ].

4. Generate a uniformly distributed random number u ∼ Unif[0, 1], and accept θ* if α ≥ u. That is, if α ≥ u then θ_{t+1} ← θ*, else θ_{t+1} ← θ_t.

5. Increase the counter (time) t ← t + 1, and go to step 2.
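The five steps translate almost line for line into Python. This sketch targets a two-mode Gaussian mixture with a symmetric random-walk proposal, so the q terms cancel (the step size and seed are arbitrary choices of ours):

```python
import math
import random

def p(x):
    """Unnormalised target: equal mixture of N(2, 1) and N(-2, 1)."""
    return math.exp(-(x - 2.0) ** 2 / 2.0) + math.exp(-(x + 2.0) ** 2 / 2.0)

def metropolis(n, step=1.0, seed=3):
    rng = random.Random(seed)
    theta, chain = 0.0, []
    for _ in range(n):
        cand = theta + rng.gauss(0.0, step)    # symmetric proposal: q cancels
        alpha = min(p(cand) / p(theta), 1.0)   # acceptance probability
        if rng.random() <= alpha:
            theta = cand                       # accept; otherwise keep theta
        chain.append(theta)
    return chain

chain = metropolis(50_000)
print(sum(chain) / len(chain))   # should be near 0 for a well-mixed chain
```

Note that p only needs to be known up to a normalising constant, since that constant cancels in the acceptance ratio.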
Mixture distribution: A distribution with known mean and variance.

    f(x | µ, σ²) = Σ_{i=1}^K α_i p_i(x | µ_i, σ_i²),  Σ_{i=1}^K α_i = 1.

E.g., α_1 = α_2 = 1/2, µ_1 = 2, µ_2 = −2 and σ_1 = σ_2 = 1.
(Plots: the MCMC sample trace over 10,000 iterations and the resulting bimodal histogram of the mixture.)
When to Stop the Chain

As the MCMC runs, convergence may be reached.

When does a chain converge? When should we stop the chain ...?

Are the samples correlated?
A Long Single Chain or Multiple Short Chains?
When a Markov chain will converge in practice? If it hasconverged, what does it mean?
Is a very long chain really good enough (from statisticalpoint of view)?
How long is long enough?
Are multiple chains better?
How to improve the sampling efficiency and/or mixingproperties ?
Simulated Tempering
Simulated annealing: temperature T goes from high to low. Simulated tempering: raise T to a higher value, then reduce it to a low value.

    π_τ = π(x)^(1/τ),  π_τ → 1 as τ → ∞.

The basic idea is to reduce from a very high τ down to τ_0 = 1; a large τ flattens π ≥ 0 towards a near-uniform distribution.

Tempering: use the flattened (near-uniform) distributions as proposals/candidates to produce high-quality samplings.
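The flattening is easy to see numerically: for a bimodal target (our own example), the mode-to-valley ratio of π(x)^(1/τ) shrinks toward 1 as τ grows, so a tempered chain crosses between modes more easily.

```python
import math

def pi_x(x):
    """Unnormalised bimodal target: mixture of N(2, 1) and N(-2, 1)."""
    return math.exp(-(x - 2.0) ** 2 / 2.0) + math.exp(-(x + 2.0) ** 2 / 2.0)

def tempered(x, tau):
    return pi_x(x) ** (1.0 / tau)   # pi_tau = pi(x)^(1/tau)

# Ratio between a mode (x = 2) and the valley (x = 0) shrinks as tau grows:
for tau in (1, 2, 10):
    print(tau, tempered(2.0, tau) / tempered(0.0, tau))
```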
Sampling: Forward or Backward? Which Way?
Is this the only way?

No! – Coupling from the Past & metaheuristics.

If we go backward along the chain, are there any advantages? If so, how?

Is there a universally efficient sampling tool for drawing samples in general?

No! – The no-free-lunch theorem (Wolpert & Macready, 1997).

The aim of the research is to find the best algorithm(s) for a given/specific problem/distribution.

Also metaheuristics (very promising).
References

Gamerman D., Markov Chain Monte Carlo, Chapman & Hall/CRC (1997).

Corcoran J. and Tweedie R., Perfect sampling ..., J. Stat. Plan. Infer., 104, 297 (2002).

Cox M., Forbes A. B., Harris P. M., Smith I., Classification and solution of regression ..., NPL SSfM Report (2004).

Propp J. and Wilson D., Exact sampling ..., Random Struct. Alg., 9, 223 (1996).

Yang X. S., Nature-Inspired Metaheuristic Algorithms, Luniver Press (2008).

Yang X. S., Introduction to Computational Mathematics, World Scientific (2008).

Yang X. S., Engineering Optimization: An Introduction with Metaheuristic Applications, Wiley (2010).

Acknowledgement: EPSRC, SSfM, NPL, CUED, and the London Mathematical Society.
Thank you!