Sampling algorithms - University of Sheffield


Sampling algorithms

* Purpose

* Construction

* Performance measures

SANDA DEJANIC

EAWAG ZÜRICH, SWITZERLAND

This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no 607000.

What is the probability of the outcome being 6+6? (For two fair dice: 1/6 × 1/6 = 1/36, since the two throws are independent.)

How likely is it that we pick the right parameters for our simulation?

Likelihood distribution

Quantification of how well our parameters describe reality

Take into account all available knowledge!
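A minimal sketch of what such a likelihood could look like in code, assuming a Gaussian observation error; the names `simulate` and `sigma` are illustrative, not from the talk.

```python
import numpy as np
from scipy import stats

def log_likelihood(theta, y_obs, simulate, sigma=0.1):
    """How well parameters theta describe the observed data y_obs.

    simulate(theta) stands in for a run of the deterministic model;
    sigma is the assumed standard deviation of the observation error.
    """
    y_sim = simulate(theta)
    return np.sum(stats.norm.logpdf(y_obs, loc=y_sim, scale=sigma))
```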

Let's see the Monty Hall problem

Solution to the Monty Hall problem: switching doors wins with probability 2/3 (you win by switching whenever your first pick was wrong), while staying wins with probability 1/3.

Introducing the prior

Some of the parameters have to lie within certain limits.

Example:

* the width of a water pipe cannot be less than zero.

Define prior

Mathematically describe what we know about the prior distribution of all parameters

Common cases (a minimal sketch follows the list):

* uniform distribution

* normal distribution

* distribution taken from literature
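A minimal sketch of these common cases using scipy.stats; the parameter names and numbers are illustrative assumptions, not values from the talk.

```python
from scipy import stats

# Pipe width in metres: must be non-negative, e.g. a uniform prior on [0, 2]
prior_width = stats.uniform(loc=0.0, scale=2.0)

# Roughness coefficient: a normal prior centred on a literature value
prior_roughness = stats.norm(loc=0.013, scale=0.002)

def log_prior(theta):
    """Sum of log prior densities for theta = (width, roughness)."""
    width, roughness = theta
    return prior_width.logpdf(width) + prior_roughness.logpdf(roughness)
```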

Bayes' Theorem

Constructing the posterior distribution:

Posterior ∝ Likelihood × Prior

p(θ | y) ∝ p(y | θ) · p(θ)
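As a minimal sketch, Bayes' theorem translates directly into code on the log scale; `log_prior` and `log_likelihood` are assumed to be functions like the ones sketched above.

```python
import numpy as np

def log_posterior(theta, y_obs, simulate):
    """Unnormalized log p(theta | y): log-prior plus log-likelihood."""
    lp = log_prior(theta)
    if not np.isfinite(lp):        # candidate outside the prior support
        return -np.inf
    return lp + log_likelihood(theta, y_obs, simulate)
```

Working with log densities avoids numerical underflow when likelihood values become very small.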

Optimization comes down to finding the maximum of the target distribution (posterior distribution)

Sampling aims to describe the whole target distribution (posterior distribution)

Optimization vs Sampling

Optimization calibrates the model and allows us to make predictions

Sampling gives the uncertainty intervals of the prediction

Sometimes describing the posterior is not so easy…

Like exploring a dark room with a flashlight

Developing algorithms to deal with complex posteriors…

MCMC

Markov Chain Monte Carlo

Metropolis algorithm

1 Begin with an initial value θ₀

2 Generate a candidate θ* from a proposal distribution centred on the current value θₜ

3 Evaluate the acceptance probability α = min(1, p(θ* | y) / p(θₜ | y)) (for a symmetric proposal)

4 Generate a uniformly-distributed random number u from Unif[0,1] and accept the candidate (θₜ₊₁ = θ*) if u < α; otherwise keep θₜ₊₁ = θₜ

5 Increase the counter and go to 2 (a code sketch of these steps follows)
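A minimal sketch of these steps, assuming an unnormalized log-posterior function `log_post`; the Gaussian proposal scale and chain length are illustrative choices.

```python
import numpy as np

def metropolis(log_post, theta0, n_steps=10000, proposal_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        # step 2: candidate from a symmetric (Gaussian) proposal
        candidate = theta + proposal_scale * rng.standard_normal(theta.size)
        lp_candidate = log_post(candidate)
        # steps 3-4: accept with probability min(1, p(candidate) / p(current))
        if np.log(rng.uniform()) < lp_candidate - lp:
            theta, lp = candidate, lp_candidate
        chain[i] = theta
    return chain
```

With a symmetric proposal the proposal densities cancel, which is why only the posterior ratio appears in the acceptance probability.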

Convergence guaranteed, but…

Generally it can be inefficient due to the scale and shape of the posterior

Adaptive algorithms

Learn from the past and adapt the proposal distribution
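One common way to do this (a sketch of the adaptive-Metropolis idea of Haario et al., not necessarily the exact scheme in the talk) is to rebuild the proposal covariance from the samples collected so far.

```python
import numpy as np

def adapted_proposal_cov(chain_so_far, epsilon=1e-6):
    """Empirical covariance of past samples, slightly inflated for stability."""
    d = chain_so_far.shape[1]
    cov = np.cov(chain_so_far, rowvar=False)
    # 2.38^2 / d is the classic adaptive-Metropolis scaling
    return (2.38 ** 2 / d) * (cov + epsilon * np.eye(d))
```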

Reminder:

* a Markov chain updates based only on the previous step

Possibility of restricted exploration

Example:

* ignoring some of the modes of the posterior

Chain Diagnostics

Can we think of a clever way to make sure our posterior is explored entirely?

Ensemble-based sampling algorithms

Stretch move vs. Differential evolution

Ensemble-based sampling algorithms

* Many walkers exploring the parameter space

* Learning from each other

* Jumping according to already accepted positions

* Each walker having its own chain

* Each walker having its own burn-in phase

* Possibility of running the model on multiple cores
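A minimal sketch of the stretch move mentioned above (Goodman & Weare); a = 2 is the customary stretch parameter and `log_post` is an unnormalized log-posterior as before.

```python
import numpy as np

def stretch_move(log_post, walkers, log_probs, k, a=2.0, rng=None):
    """Propose and possibly accept an update of walker k, in place."""
    rng = rng or np.random.default_rng()
    n_walkers, ndim = walkers.shape
    # pick a different walker from the ensemble
    j = rng.choice([i for i in range(n_walkers) if i != k])
    # draw z from g(z) proportional to 1/sqrt(z) on [1/a, a]
    z = ((a - 1.0) * rng.uniform() + 1.0) ** 2 / a
    candidate = walkers[j] + z * (walkers[k] - walkers[j])
    lp_candidate = log_post(candidate)
    # acceptance probability min(1, z^(ndim-1) * p(candidate) / p(current))
    if np.log(rng.uniform()) < (ndim - 1) * np.log(z) + lp_candidate - log_probs[k]:
        walkers[k], log_probs[k] = candidate, lp_candidate
```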

Standard Diagnostics

Chain diagnostics as for Metropolis:

* Mean

* Gelman and Rubin

* Marginals
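A minimal sketch of the Gelman-Rubin statistic for a single parameter, computed from several chains of equal length after burn-in.

```python
import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (n_chains, n_samples) for one parameter."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled posterior variance estimate
    return np.sqrt(var_hat / W)             # values close to 1 suggest convergence
```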

Appraisal of the performance

* Burn in: moment of convergence

* Robustness: reliable, scalable, available

* Effective sample size

Measuring the entropy of the posterior

Analytically: H = −∫ p(θ | y) log p(θ | y) dθ

Empirically: taking the mean log-posterior from the ensemble

Burn in period
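A minimal sketch of the empirical approach: follow the ensemble-mean log-posterior over iterations and take burn-in as the point where it reaches its plateau; the tolerance is an illustrative choice.

```python
import numpy as np

def burn_in_length(log_post_history, tol=0.05):
    """log_post_history: array (n_iterations, n_walkers) of log-posterior values."""
    mean_lp = log_post_history.mean(axis=1)
    plateau = mean_lp[-len(mean_lp) // 4:].mean()        # level of the final quarter
    off_plateau = np.abs(mean_lp - plateau) > tol * np.abs(plateau)
    # first iteration after which the curve stays within tol of the plateau
    return int(off_plateau.nonzero()[0][-1]) + 1 if off_plateau.any() else 0
```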

Robustness

Sensitivity to tuning parameters

Effective sample size

Acceptance rate

Correlation

◦ within the chains

◦ within the ensemble

Thinning
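A minimal sketch of an effective-sample-size estimate for one parameter of one chain, based on the autocorrelation function; thinning then keeps roughly every tau-th sample.

```python
import numpy as np

def effective_sample_size(x):
    """x: 1-D array of samples from a single chain, after burn-in."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0                       # integrated autocorrelation time
    for rho in acf[1:]:
        if rho < 0:                 # truncate at the first negative autocorrelation
            break
        tau += 2.0 * rho
    return n / tau
```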

Packages

Implementation in R and Julia on GitHub:

* MCMCEnsembleSampler

Original implementation in Python:

* EMCEE (Goodman and Weare)
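A minimal usage sketch of the Python package emcee; the walker count, chain length, and toy posterior are illustrative.

```python
import numpy as np
import emcee

def log_post(theta):
    return -0.5 * np.sum(theta ** 2)          # toy standard-normal posterior

ndim, nwalkers = 2, 32
p0 = 0.1 * np.random.randn(nwalkers, ndim)    # small ball of starting positions

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_post)
sampler.run_mcmc(p0, 2000)

# discard burn-in and thin before using the samples
samples = sampler.get_chain(discard=500, thin=10, flat=True)
print(samples.shape, sampler.acceptance_fraction.mean())
```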

There are many tools…

Use the right one

Sanda Dejanic