Sampling algorithms * Purpose * Construction * Performance measures
SANDA DEJANIC
EAWAG ZÜRICH, SWITZERLAND
This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no 607000.
What is the probability of the
outcome being 6+6?
How likely is it that we pick the right parameters for our simulation?
Likelihood distribution
Quantifies how well our parameters describe reality
Take into account all available knowledge!
Let's look at the Monty Hall problem
Solution to the Monty Hall problem
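The Monty Hall solution can be checked by simulation; a minimal sketch (the function name and trial count are illustrative choices):

```python
import random

def monty_hall(switch, trials=100_000):
    """Estimate the win probability when staying vs. switching doors."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)        # door hiding the car
        pick = random.randrange(3)       # contestant's first choice
        # the host opens a goat door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # switch to the remaining unopened door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials
```

Switching wins about 2/3 of the time, staying only about 1/3.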
Introducing the prior
Some of the parameters have to lie within certain limits.
Example:
* the width of a water pipe cannot be less than zero
Define prior
Mathematically describe what we know about the prior distribution of all parameters
Common cases:
* uniform distribution
* normal distribution
* distribution taken from literature
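The cases above can be encoded as a log-prior function; the pipe-width bound comes from the earlier example, while the normal prior on a second parameter and its values are purely illustrative assumptions:

```python
import numpy as np

def log_prior(theta):
    """Log-prior for (width, roughness); the roughness prior is illustrative."""
    width, roughness = theta
    if width <= 0.0:                       # physical bound: pipe width cannot be negative
        return -np.inf                     # zero prior probability outside the support
    # illustrative normal prior N(0.05, 0.01) on the roughness parameter
    return -0.5 * ((roughness - 0.05) / 0.01) ** 2
```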
Bayesian Theorem
Constructing
the posterior
distribution
Posterior ∝ Likelihood × Prior
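In log space, Bayes' theorem amounts to adding log-likelihood and log-prior; a minimal sketch (function and argument names are illustrative):

```python
import numpy as np

def log_posterior(theta, log_likelihood, log_prior):
    """Unnormalized log-posterior: log-likelihood plus log-prior."""
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf                     # outside the prior support
    return lp + log_likelihood(theta)
```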
Optimization comes down to finding the maximum of the target distribution (the posterior)
Sampling aims to describe the whole target distribution (the posterior)
Optimization vs Sampling
Optimization calibrates the model and allows us to make predictions
Sampling gives the uncertainty intervals of the prediction
Sometimes describing the posterior is not so easy…
Like exploring a dark room with a flashlight
Developing algorithms to deal with complex posteriors…
MCMC
Markov Chain Monte Carlo
Metropolis algorithm
1 Begin with an initial value θ₀
2 Generate a candidate θ* from a proposal distribution
3 Evaluate the acceptance probability α = min(1, π(θ*) / π(θᵢ))
4 Generate a uniformly distributed random number u from Unif[0,1] and accept θ* if u ≤ α
5 Increase the counter and go to 2
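The steps above can be sketched as a random-walk Metropolis sampler; the Gaussian proposal, step size, and seed are illustrative assumptions:

```python
import numpy as np

def metropolis(log_target, theta0, n_steps=10_000, step=0.5, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal (sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))   # step 1: initial value
    lp = log_target(theta)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        candidate = theta + step * rng.standard_normal(theta.size)   # step 2
        lp_cand = log_target(candidate)
        # steps 3-4: accept with probability min(1, pi(candidate)/pi(theta))
        if np.log(rng.uniform()) < lp_cand - lp:
            theta, lp = candidate, lp_cand
        chain[i] = theta                                             # step 5
    return chain
```

For example, `metropolis(lambda t: -0.5 * float(t @ t), [0.0])` samples a standard normal.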
Convergence guaranteed, but…
In general it can be inefficient, depending on the scale and shape of the posterior
Adaptive algorithms
Learn from the past and adapt the proposal distribution
Reminder:
* a Markov chain updates based only on the previous step
Risk of restricted exploration
Example:
* ignoring some of the modes
Chain Diagnostics
Can we think of a clever way to make sure our posterior is explored entirely?
Ensemble based sampling algorithms
Stretch move vs. Differential evolution
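A sketch of one sequential stretch-move sweep in the spirit of Goodman and Weare (the stretch parameter a = 2 is the common default; this is an illustrative, not an optimized, implementation):

```python
import numpy as np

def stretch_move(walkers, log_target, a=2.0, rng=None):
    """One stretch-move sweep over an ensemble of shape (n_walkers, dim)."""
    rng = rng or np.random.default_rng()
    walkers = walkers.copy()
    n, dim = walkers.shape
    for k in range(n):
        j = rng.choice([i for i in range(n) if i != k])      # complementary walker
        z = ((a - 1.0) * rng.uniform() + 1.0) ** 2 / a       # z ~ g(z) ∝ 1/sqrt(z) on [1/a, a]
        # jump along the line through walker k and the already accepted position j
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        log_ratio = ((dim - 1) * np.log(z)
                     + log_target(proposal) - log_target(walkers[k]))
        if np.log(rng.uniform()) < log_ratio:                # accept/reject
            walkers[k] = proposal
    return walkers
```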
* Many walkers exploring the parameter space
* Learning from each other
* Jumping according to already accepted positions
* Each having its own chain
* Each having its own burn-in phase
* Possibility of running the model on multiple cores
Standard Diagnostics
Chain diagnostics as for Metropolis
Mean
Gelman and Rubin
Marginals
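The Gelman and Rubin statistic can be computed in a few lines; this sketch assumes equal-length scalar chains stacked as rows:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for chains of shape (m, n)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    B = n * means.var(ddof=1)                  # between-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)
```

An R-hat close to 1 indicates that the chains have mixed; values well above 1 signal non-convergence.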
Appraisal of the performance
* Burn-in: moment of convergence
* Robustness: reliable, scalable, available
* Effective sample size
"I am not mean, everybody is just too sensitive…" (Posterior)
Measuring the entropy
Analytically: H = −∫ p(θ) log p(θ) dθ
Empirically: taking the mean log-posterior over the ensemble, H ≈ −⟨log p(θ)⟩
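The empirical estimate is a one-liner over the ensemble's log-posterior values; a sketch (the function name is illustrative):

```python
import numpy as np

def entropy_estimate(log_posterior_samples):
    """Monte Carlo entropy estimate: minus the mean log-posterior of the ensemble."""
    return -np.mean(log_posterior_samples)
```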
Burn-in period
Robustness
Sensitivity to tuning parameters
Effective sample size
Acceptance rate
Correlation
* within the chains
* within the ensemble
Thinning
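A sketch of an autocorrelation-based effective sample size for a single chain (the 0.05 truncation threshold is an illustrative choice; thinning is then simply slicing, e.g. `chain[::10]`):

```python
import numpy as np

def effective_sample_size(chain):
    """ESS of a 1-D chain from its autocorrelation function."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    # normalized autocorrelation at lags 0..n-1
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0                                # integrated autocorrelation time
    for rho in acf[1:]:
        if rho < 0.05:                       # truncate once correlation dies out
            break
        tau += 2.0 * rho
    return n / tau
```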
Packages
Implementation in R and Julia on GitHub
MCMCEnsembleSampler
Original implementation in Python:
emcee
Goodman and Weare
There are many tools…
Use the right one
Sanda Dejanic