7/31/2019 MCMC Algorithms - Saquib
MCMC algorithms: Metropolis-
Hastings and its variants
Data Mining Seminar Fall 2012
Nazmus Saquib
Motivation
Metropolis is among the top 10 algorithms in
science and engineering.
Used in statistics, econometrics, physics, and
computer science.
Example: high-dimensional problems such as
computing the volume of a convex body in d
dimensions.
Motivation
Normalizing factor in Bayes Theorem:
Statistical Mechanics: Partition function Z
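The formulas on this slide did not survive extraction; the standard forms of the two quantities named above (a reconstruction, not the slide's exact notation) are:

```latex
% Normalizing factor (evidence) in Bayes' theorem:
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}
                        {\int p(x \mid \theta')\, p(\theta')\, d\theta'}

% Partition function in statistical mechanics
% (sum over all configurations x at temperature T):
Z = \sum_{x} e^{-E(x)/kT}
```

In both cases the denominator/sum is a high-dimensional integral or sum, which motivates sampling methods.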
Back to Monte Carlo
Monte Carlo Simulation:
Draw an i.i.d. set of N samples {x^(i)} from the target p(x).
The empirical average (1/N) Σ f(x^(i)) almost surely converges to E[f(x)] (strong law of large numbers).
The estimation error is characterized using the central limit theorem.
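The estimator above can be sketched in a few lines (the function names here are illustrative, not from the slides):

```python
import random

def monte_carlo_mean(f, sampler, n=100_000):
    """Estimate E[f(X)] by averaging f over n i.i.d. samples of X."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# Example: E[X^2] for X ~ N(0, 1) is exactly 1.
est = monte_carlo_mean(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
```

By the CLT the error shrinks like O(1/sqrt(N)) regardless of dimension, which is the appeal of Monte Carlo.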
Rejection Sampling
Sample from another, easy-to-sample distribution q(x)
that satisfies p(x) ≤ M q(x) for some M < ∞; accept a draw x ~ q with probability p(x) / (M q(x)).
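A minimal sketch of this accept/reject loop, assuming a toy target p(x) = 2x on [0, 1] with a uniform proposal and envelope M = 2 (these choices are illustrative):

```python
import random

def rejection_sample(p, q_sample, q_pdf, M):
    """Draw one sample from p using proposal q with p(x) <= M * q(x)."""
    while True:
        x = q_sample()
        # Accept x with probability p(x) / (M q(x)).
        if random.random() <= p(x) / (M * q_pdf(x)):
            return x

random.seed(0)
# Target p(x) = 2x on [0, 1]; proposal Uniform(0, 1); M = 2 bounds p/q.
samples = [rejection_sample(lambda x: 2 * x, random.random, lambda x: 1.0, 2.0)
           for _ in range(20_000)]
mean = sum(samples) / len(samples)  # true mean of p is 2/3
```

The expected number of proposals per accepted sample is M, which is why a tight envelope matters in high dimensions.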
Importance Sampling
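The formulas on this slide were lost in extraction. The idea is to estimate E_p[f(X)] from samples of a proposal q by weighting each sample with w(x) = p(x)/q(x); a self-normalized sketch (names and densities here are illustrative):

```python
import random, math

def importance_mean(f, p_pdf, q_sample, q_pdf, n=100_000):
    """Estimate E_p[f(X)] from samples of q, weighted by w = p/q."""
    num = den = 0.0
    for _ in range(n):
        x = q_sample()
        w = p_pdf(x) / q_pdf(x)
        num += w * f(x)
        den += w
    return num / den  # self-normalized: unknown constants in p, q cancel

random.seed(0)
# Target p = N(0,1) (unnormalized); proposal q = N(0,2). True E[X^2] = 1.
p = lambda x: math.exp(-x * x / 2)
q_pdf = lambda x: math.exp(-x * x / 8)
q_sample = lambda: random.gauss(0.0, 2.0)
est = importance_mean(lambda x: x * x, p, q_sample, q_pdf)
```

Because the estimator is self-normalized, the normalizing constants of p and q cancel, which matters for the Bayesian setting above.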
Why MCMC?
Plain sampling wastes resources: we need to spend more
time in the tail of the proposal that overlaps with the region of interest E.
MCMC Principles
Even with adaptation, it is often impossible to obtain proposal distributions that are easy to sample from and good approximations at the same time.
A Markov chain is used to explore the state space X.
Transition matrices (kernels) are constructed so that the chain spends more time in the important regions.
MCMC Principles
For any starting point, the chain will converge
to the invariant distribution p(x), as long as T is a stochastic transition matrix that satisfies:
Irreducibility: the graph should be connected.
Aperiodicity: the chain should not get trapped in cycles.
Detailed Balance (reversibility)
Condition: p(x) T(x → x') = p(x') T(x' → x).
One way to design an MCMC sampler is to
satisfy this condition. However, convergence speed plays a more
crucial role in practice.
Spectral Theory and Convergence
(brief review)
Note that p(x) is the left eigenvector of the matrix
T with corresponding eigenvalue 1 (Perron-Frobenius theorem).
The remaining eigenvalues have modulus less than 1.
The second-largest eigenvalue therefore determines the rate of convergence; it should be as small as possible.
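The left-eigenvector statement can be checked numerically by iterating p ← pT on a small chain (the 2-state transition matrix below is a hypothetical example, not from the slides):

```python
# Hypothetical 2-state chain; rows sum to 1 (row-stochastic).
T = [[0.9, 0.1],
     [0.5, 0.5]]

def step(p, T):
    """One step of p_{t+1} = p_t T (row vector times matrix)."""
    return [sum(p[i] * T[i][j] for i in range(len(p)))
            for j in range(len(T[0]))]

p = [1.0, 0.0]            # arbitrary starting distribution
for _ in range(100):
    p = step(p, T)
# The iterates converge to the stationary p solving pT = p,
# here p = (5/6, 1/6); the error decays like |lambda_2|^t = 0.4^t.
```

The eigenvalues of this T are 1 and 0.4, so after t steps the distance to stationarity has shrunk by a factor 0.4^t, illustrating why a small second eigenvalue means fast convergence.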
Application: PageRank (Google)
T = L + E, where L is a large link matrix.
L_(i,j) = normalized number of links from website i to website j.
E = a uniform random matrix of small magnitude added to L to ensure irreducibility and aperiodicity (addition of noise).
p(x_(i+1)) = p(x_i) [L + E]; the ranking is the left eigenvector p = p [L + E].
Transition matrices as kernels: design different kernels to introduce bias etc. to make the results more interesting.
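A toy sketch of this construction on a 3-page web (the link structure and the weight eps are hypothetical, chosen only for illustration):

```python
# Toy PageRank: T = (1 - eps) * L + eps * uniform matrix, then power-iterate.
n = 3
links = {0: [1, 2], 1: [2], 2: [0]}   # page -> pages it links to
eps = 0.15                             # weight of the uniform "noise" matrix E

# Row-stochastic transition matrix.
T = [[eps / n] * n for _ in range(n)]
for i, outs in links.items():
    for j in outs:
        T[i][j] += (1 - eps) / len(outs)

p = [1.0 / n] * n                      # start from the uniform distribution
for _ in range(200):                   # power iteration: p <- p T
    p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
# p now approximates the stationary distribution = the page ranking.
```

Page 2, which is linked from both other pages, ends up with the largest stationary mass; the uniform E term guarantees the chain is irreducible and aperiodic so the iteration converges from any start.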
Mathematical Representation
Based on different kernels, different kinds of
Markov Chain algorithms are possible.
The most celebrated is the Metropolis-Hastings algorithm.
Metropolis-Hastings Algorithm
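The algorithm's pseudocode did not survive extraction. A minimal sketch of the Metropolis-Hastings loop for a 1-D target, using a symmetric Gaussian random-walk proposal so that the q terms in the acceptance ratio cancel (function and parameter names are illustrative):

```python
import random, math

def metropolis_hastings(log_p, x0, proposal_std=1.0, n=50_000):
    """MH chain targeting p; log_p may omit the normalizing constant."""
    x, chain = x0, []
    for _ in range(n):
        x_new = random.gauss(x, proposal_std)   # propose x' ~ q(. | x)
        # Symmetric q cancels: accept with prob min(1, p(x') / p(x)).
        if math.log(random.random()) < log_p(x_new) - log_p(x):
            x = x_new
        chain.append(x)                         # rejected moves repeat x
    return chain

random.seed(0)
# Target: standard normal, via its unnormalized log density -x^2/2.
chain = metropolis_hastings(lambda x: -x * x / 2, x0=0.0)
mean = sum(chain) / len(chain)
```

Note that on rejection the current state is recorded again; dropping rejected steps would bias the chain away from p.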
Metropolis-Hastings Algorithm
(properties)
Kernel:
Rejection Term:
Detailed Balance:
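The formulas for these three properties were lost in extraction; their standard statements in the usual MH notation (a reconstruction, not the slide's exact text) are:

```latex
% Kernel: move via proposal q and acceptance A, or stay put.
K_{MH}(x' \mid x) = q(x' \mid x)\, \mathcal{A}(x, x')
                  + \delta_{x}(x')\, r(x)

% Rejection term: total probability of remaining at x.
r(x) = \int q(x'' \mid x)\,\bigl(1 - \mathcal{A}(x, x'')\bigr)\, dx''

% Acceptance probability.
\mathcal{A}(x, x') = \min\!\left\{1,\;
    \frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}\right\}

% Detailed balance, which K_{MH} satisfies by construction:
p(x)\, K_{MH}(x' \mid x) = p(x')\, K_{MH}(x \mid x')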
Independent Sampler Algorithm
The proposal is independent of the current state: q(x' | x) = q(x').
The algorithm is close to importance sampling, but
now the samples are correlated, since each proposal is accepted or
rejected by comparison with the current sample.
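This special case keeps the full MH acceptance ratio, since q no longer cancels; a sketch under illustrative choices of target and proposal:

```python
import random, math

def independent_mh(log_p, log_q, q_sample, x0, n=50_000):
    """MH where the proposal ignores the current state (independent sampler)."""
    x, chain = x0, []
    for _ in range(n):
        x_new = q_sample()
        # Acceptance ratio: p(x') q(x) / (p(x) q(x')).
        if math.log(random.random()) < (log_p(x_new) - log_p(x)
                                        + log_q(x) - log_q(x_new)):
            x = x_new
        chain.append(x)
    return chain

random.seed(0)
# Target N(0,1); independent proposal N(0,2), wider than the target.
chain = independent_mh(lambda x: -x * x / 2,      # unnormalized log p
                       lambda x: -x * x / 8,      # unnormalized log q
                       lambda: random.gauss(0.0, 2.0), x0=0.0)
mean = sum(chain) / len(chain)
```

As in importance sampling, the proposal should have heavier tails than the target; otherwise the chain can get stuck for long stretches.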
Metropolis Algorithm
Assumes a symmetric random-walk proposal, q(x' | x) = q(x | x'), so the acceptance probability reduces to min{1, p(x')/p(x)}.
Metropolis Algorithm
The normalizing constant of the target distribution
is not required: it cancels in the acceptance ratio.
Parallelization: several independent chains
can be simulated in parallel.
Success or failure depends on the parameters
selected for the proposal distribution.
Simulated Annealing
Global optimization: find the mode of p(x).
Could be estimated from samples by
arg max p(x^(i)) over x^(i), i = 1..N.
Inefficient, because random samples rarely come from the vicinity of the mode (blind sampling, unless the distribution has large probability mass around the mode).
Simulated annealing is a variant of MCMC/Metropolis-Hastings that solves this problem.
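The annealing pseudocode on the following slides was lost in extraction. The idea is to run Metropolis steps on p(x)^(1/T_i) while the temperature T_i decreases, so the chain concentrates on the mode; a sketch with an illustrative logarithmic cooling schedule and toy target:

```python
import random, math

def simulated_annealing(log_p, x0, n=20_000, t0=5.0):
    """Metropolis on p^(1/T) with decreasing temperature; returns best state."""
    x, best = x0, x0
    for i in range(1, n + 1):
        t = t0 / math.log(i + 1)          # one common cooling schedule
        x_new = random.gauss(x, 1.0)
        # Accept with prob min(1, (p(x')/p(x))^(1/t)), via log comparison.
        if math.log(random.random()) * t < log_p(x_new) - log_p(x):
            x = x_new
        if log_p(x) > log_p(best):        # track the best state visited
            best = x
    return best

random.seed(0)
# Toy target: log p(x) proportional to -(x - 3)^2, so the mode is x = 3.
best = simulated_annealing(lambda x: -(x - 3.0) ** 2, x0=-10.0)
```

At high temperature the chain moves almost freely (exploration); as T → 0 it accepts almost only uphill moves, pinning it near the mode.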
Other Methods
Mixture of kernels! Can be very useful when the target distribution has many
peaks. Can incorporate global proposals to explore vast regions
of the state space (the global proposal locks onto peaks).
Local proposals discover finer details (explore the space around peaks).
Gibbs sampling, etc.: Parasaran.
Thank you!