• What is Stan? • Why Stan? • Writing a Stan program • Linked package: bayesplot •
Bayesian Statistics
Introduction to Stan
Leonardo Egidi
Academic year 2019/20
Leonardo Egidi Introduction 1 / 52
Origins
Stan is named after Stanislaw Ulam (1909-1984): Manhattan Project, H-bomb experiments in Los Alamos, co-inventor of the Monte Carlo method with John von Neumann.
Outline
1 What is Stan?
2 Why Stan?
3 Writing a Stan program
4 Linked package: bayesplot
What is Stan?
Probabilistic programming language and inference algorithms.
- Stan program
  - declares data and (constrained) parameter variables
  - defines log posterior (or penalized likelihood)
- Stan inference
  - MCMC for full Bayes
  - Variational Bayes for approximate Bayes
  - Optimization for (penalized) MLE
- Stan ecosystem
  - language, math library (C++)
  - interfaces and tools (R, Python, Julia, many more)
  - documentation (example model repo, user guide & reference manual, case studies, R package vignettes)
  - online community (Stan Forums on Discourse)
Why Stan?
Fit rich Bayesian statistical models. Close to the big data philosophy.
- Efficiency
  - Hamiltonian Monte Carlo + NUTS
  - Compiled to C++
- Flexible domain-specific language
- "Freedom-respecting, open-source"
  - documentation and written materials
  - interactive community
  - continuous development
- Interaction with other R packages designed to explore the Stan output.
Who is using Stan?
Biological & physical sciences: clinical trials, epidemiology, genomics, population ecology, entomology, ophthalmology, neurology, agriculture, fisheries, cancer biology, astrophysics & cosmology, molecular biology, oceanography, climatology.
Social sciences: population dynamics, psycholinguistics, social networks, political science, human development, economics.
Many more: sports analytics, public health, publishing, finance, pharma, actuarial, recommender systems, educational testing, materials engineering.
Interfaces
Improving MCMC performance
With Stan, we aim to provide an MCMC implementation that works robustly for as many target distributions as possible.
Gibbs and random walk (RW) Metropolis samplers can be very inefficient and hard to diagnose.
To explore complicated high-dimensional spaces we need to leverage what we know about the geometry of the typical set.
For this reason, Stan relies on Hamiltonian Monte Carlo (HMC).
Only a brief sketch of how HMC works is given here (see the further reading at the end). Stan users, however, can use, analyze and interpret HMC output as if it were standard MCMC output!
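The inefficiency of random walk Metropolis can be seen in a small experiment. The following is a minimal sketch in base R, assuming a standard normal target and two arbitrary proposal step sizes (both are illustrative choices, not taken from the slides); the helper names `rw_metropolis` and `lag1` are hypothetical.

```r
set.seed(3)

# Illustrative target: standard normal, on the log scale
log_target <- function(x) dnorm(x, log = TRUE)

# Random walk Metropolis with a Gaussian proposal of standard deviation `step`
rw_metropolis <- function(n_iter, step) {
  x <- numeric(n_iter)            # chain starts at 0
  accepted <- 0
  for (i in 2:n_iter) {
    proposal <- x[i - 1] + rnorm(1, sd = step)
    log_ratio <- log_target(proposal) - log_target(x[i - 1])
    if (log(runif(1)) < log_ratio) {
      x[i] <- proposal
      accepted <- accepted + 1
    } else {
      x[i] <- x[i - 1]
    }
  }
  list(draws = x, accept_rate = accepted / (n_iter - 1))
}

# Lag-1 autocorrelation of a chain
lag1 <- function(x) cor(x[-1], x[-length(x)])

small    <- rw_metropolis(5000, step = 0.1)   # tiny steps: almost always accepted
moderate <- rw_metropolis(5000, step = 2.4)   # larger steps: more rejections

print(c(accept_small = small$accept_rate, lag1_small = lag1(small$draws)))
print(c(accept_moderate = moderate$accept_rate, lag1_moderate = lag1(moderate$draws)))
```

With tiny steps nearly every proposal is accepted, yet consecutive draws are almost identical, so the chain explores the target very slowly.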
Moving to Hamiltonian Monte Carlo
Once we have built a model, Bayesian computation reduces to evaluating expectations, or integrals.
$$\mathrm{E}_{\pi}(\theta \mid y) = \int \theta \, \pi(\theta \mid y) \, d\theta \qquad (1)$$
How do we compute posterior expectations in practice?
Construct a Markov chain that explores the parameter space. Anything you would want to do if you could write it analytically, you can do to any accuracy with the draws (history) of the chain:
$$\frac{1}{S} \sum_{s=1}^{S} \theta^{(s)} \;\longrightarrow\; \mathrm{E}_{\pi}(\theta \mid y) \quad \text{as } S \to \infty$$
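The convergence of the average of the draws can be checked numerically. The sketch below assumes, purely for illustration, a Beta(3, 9) posterior whose mean is known analytically; it uses independent draws rather than a Markov chain, but the averaging step is the same.

```r
set.seed(123)

# Illustrative assumption: the posterior is Beta(a, b) with a = 3, b = 9
a <- 3
b <- 9
S <- 100000

theta <- rbeta(S, a, b)           # draws theta^(1), ..., theta^(S)
mc_mean <- mean(theta)            # (1/S) * sum of the draws
exact_mean <- a / (a + b)         # analytic posterior mean

# Running average after 10, 100, ..., S draws
running_mean <- cumsum(theta) / seq_len(S)

print(c(monte_carlo = mc_mean, exact = exact_mean))
print(running_mean[c(10, 100, 1000, 10000, S)])
```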
Moving to Hamiltonian Monte Carlo
To be efficient we need to focus computation on the relevant neighborhoods of parameter space. Relevant neighborhoods, however, are defined not by probability density but rather by probability mass.
But exactly which neighborhoods end up contributing most to arbitrary expectations?
The neighborhoods around the maxima of probability distributions feature a lot of probability density but, especially in a large number of dimensions or in long-tailed distributions, they do not feature much volume. In other words, the sliver size dθ tends to be small there.
The Geometry of High-Dimensional Spaces
Consider a rectangular partitioning centered around a distinguished point, such as the mode (example from Betancourt, 2017):
One of the characteristic properties of high-dimensional spaces is that there is much more volume outside any given neighborhood than inside of it!
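A quick calculation makes this concrete. The sketch assumes each axis is split into three equal intervals, so that the central cell is one of 3^D equally sized cells; the choice of three is an illustrative assumption.

```r
# Each axis is split into 3 equal intervals, giving 3^D cells in D dimensions.
# The central cell around the mode is exactly one of them.
D <- c(1, 2, 3, 10, 100)
n_cells <- 3^D
central_fraction <- 1 / n_cells   # share of the total volume in the central cell

print(data.frame(D = D, n_cells = n_cells, central_fraction = central_fraction))
```

Already at D = 10 the central cell holds less than 0.002% of the volume.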
Typical set
Thus, relevant neighborhoods are defined not by probability density but rather by probability mass.
Typical set
Probability mass concentrates on a hypersurface called the typical set that surrounds the mode.
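The typical set can be observed by simulation. The sketch below assumes a standard multivariate normal target in D dimensions (an illustrative choice) and measures how far the draws fall from the mode at the origin.

```r
set.seed(1)
n_draws <- 2000
dims <- c(1, 10, 100, 1000)

# For each dimension D, draw from a standard normal and record the
# Euclidean distance of every draw from the mode (the origin).
radius_summary <- t(sapply(dims, function(D) {
  x <- matrix(rnorm(n_draws * D), nrow = n_draws, ncol = D)
  r <- sqrt(rowSums(x^2))
  c(D = D, mean_radius = mean(r), sd_radius = sd(r),
    min_radius = min(r), sqrt_D = sqrt(D))
}))

print(round(radius_summary, 2))
```

As D grows, the draws sit in a thin shell at distance of about sqrt(D) from the mode, and essentially none of them land near the mode itself, even though the density is highest there.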
Moving to Hamiltonian Monte Carlo
To accurately estimate expectations we need a method for numerically finding and then exploring the typical set.
A Markov transition that targets our desired distribution naturally concentrates towards probability mass. An inherent inefficiency of the Gibbs sampler and of random walk Metropolis-Hastings is their random walk behaviour: the simulations can take a long time zigging and zagging while moving through the target distribution.
HMC borrows ideas from physics to suppress the random walk behaviour of the Metropolis algorithm, thus allowing it to move much more rapidly through the target distribution. The method uses the gradient of the log-posterior distribution, $\frac{d \log \pi(\theta \mid y)}{d\theta}$, to steer the algorithm towards the typical set.
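To show how the gradient enters the algorithm, here is a minimal sketch of basic HMC with a fixed step size and a fixed number of leapfrog steps. It is not the adaptive NUTS variant that Stan runs. The bivariate Gaussian target with correlation 0.9, the tuning values `eps` and `L`, and the function names are all illustrative assumptions.

```r
set.seed(42)

# Illustrative target: zero-mean bivariate Gaussian with correlation rho
rho <- 0.9
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
Sigma_inv <- solve(Sigma)
log_post <- function(theta) -0.5 * sum(theta * (Sigma_inv %*% theta))
grad_log_post <- function(theta) as.vector(-Sigma_inv %*% theta)

# One HMC transition: leapfrog integration followed by a Metropolis correction
hmc_step <- function(theta, eps = 0.15, L = 10) {
  p0 <- rnorm(length(theta))                    # fresh momentum
  q <- theta
  p <- p0 + 0.5 * eps * grad_log_post(q)        # half step for momentum
  for (l in seq_len(L)) {
    q <- q + eps * p                            # full step for position
    if (l < L) p <- p + eps * grad_log_post(q)  # full step for momentum
  }
  p <- p + 0.5 * eps * grad_log_post(q)         # final half step for momentum
  # Accept or reject based on the change in total energy
  log_accept <- (log_post(q) - 0.5 * sum(p^2)) -
    (log_post(theta) - 0.5 * sum(p0^2))
  if (log(runif(1)) < log_accept) q else theta
}

n_iter <- 2000
draws <- matrix(NA_real_, nrow = n_iter, ncol = 2)
theta <- c(0, 0)
for (i in seq_len(n_iter)) {
  theta <- hmc_step(theta)
  draws[i, ] <- theta
}

print(colMeans(draws))      # should be close to (0, 0)
print(cor(draws)[1, 2])     # should be close to rho
```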
When something goes wrong
Under ideal conditions, MCMC estimators converge to the true expectations in a very practical progression.
There are many pathological posterior geometries, however, that spoil these ideal conditions.
Hamiltonian Monte Carlo
Hamiltonian Monte Carlo yields fast and robust exploration of the distributions common in practice.
Hamiltonian Monte Carlo: bivariate Gaussian
Before starting
What is a Bayesian model?
Building a Bayesian model forces us to build a model for how the data is generated.
We often think of this as specifying a prior and a likelihood, as if these are two separate things.
They are not!
Generative models
The philosophy behind Stan is to think generatively.
The model is expressed as a joint probability distribution of observed and unobserved variables, which may be decomposed as follows:
$$p(y, \theta) = p(y \mid \theta) \, \pi(\theta) \qquad (2)$$
The posterior of interest is then proportional to the joint distribution (2):
$$\pi(\theta \mid y) \propto p(y \mid \theta) \, \pi(\theta) \qquad (3)$$
Generative models
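A Bayesian modeller commits to an a priori joint distribution:
$$p(y, \theta) = \underbrace{p(y \mid \theta) \, \pi(\theta)}_{\text{Likelihood} \times \text{Prior}} = \underbrace{\pi(\theta \mid y) \, p(y)}_{\text{Posterior} \times \text{Marginal likelihood}} \qquad (4)$$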
Generative models and vague priors
What is the problem with vague/diffuse priors?
If we use an improper prior, then we do not specify a joint model for our data and parameters.
More importantly, we do not specify a data generating mechanism p(y).
By construction, these priors do not regularize inferences, which is quite often a bad idea.
Proper but diffuse is better than improper, but is still often problematic.
Generative models
If we disallow improper priors, then Bayesian modeling is generative.
In particular, we have a simple way to simulate from p(y): draw θ from the prior π(θ), then draw y from p(y|θ).
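Simulating from p(y) in this generative way takes only a few lines of R. The model below, a Beta(2, 2) prior on a proportion and a binomial likelihood with 10 trials, is a hypothetical example chosen only to keep the sketch short.

```r
set.seed(2020)
n_sims <- 5000
n_obs <- 10

# Hypothetical model: theta ~ Beta(2, 2), y | theta ~ Binomial(n_obs, theta)
theta <- rbeta(n_sims, 2, 2)                      # step 1: draw from the prior
y <- rbinom(n_sims, size = n_obs, prob = theta)   # step 2: draw data given theta

# Empirical approximation of the marginal distribution p(y)
prior_predictive <- table(factor(y, levels = 0:n_obs)) / n_sims
print(round(prior_predictive, 3))
```

With an improper prior step 1 is impossible, which is another way to see that the model would no longer be generative.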
Stan computations
Stan works in logarithmic terms: all the computations are actually done on the log scale. So, for the posterior we have:
$$\log \pi(\theta \mid y) = \log \pi(\theta) + \log p(y \mid \theta) + \text{constant} \qquad (5)$$
Products become sums of logs:
$$p(y \mid \theta) = \prod_{i=1}^{n} p(y_i \mid \theta) \;\longrightarrow\; \log p(y \mid \theta) = \sum_{i=1}^{n} \log p(y_i \mid \theta).$$
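One practical reason for working on the log scale is numerical: a product of many small densities underflows in floating point, while the sum of their logs does not. The sketch below assumes simulated normal data and a fixed parameter value, both chosen only for illustration.

```r
set.seed(7)
n <- 2000
y <- rnorm(n, mean = 1, sd = 2)   # simulated data (illustrative)
theta <- 1                        # parameter value at which we evaluate

# Likelihood as a product of n densities: underflows to exactly 0
lik <- prod(dnorm(y, mean = theta, sd = 2))

# Log-likelihood as a sum of n log densities: a finite number
log_lik <- sum(dnorm(y, mean = theta, sd = 2, log = TRUE))

print(lik)
print(log_lik)
```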
Starting point
We are now going to write a Stan program together:
Open a new empty file in RStudio
Save it as linear_regression.stan
Blocks strategy
Stan programs are organized into blocks:
data block: declares data types, sizes, and constraints. Values are read from the data source and constraints are validated. Evaluated: once.
parameters block: declares parameter types, sizes, and constraints. Evaluated: every log prob evaluation.
transformed parameters block: declares parameters obtained by transforming those declared in the parameters block. Evaluated: every log prob evaluation.
model block: statements defining the posterior density on the log scale. Evaluated: every log prob evaluation.
generated quantities block: declares and defines derived variables: (P)RNGs, predictions, event probabilities, decision making. Constraints are validated. Evaluated: once per draw.
Data block
Parameters' block
Model block
Generated quantities block
Complete Stan model
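As a minimal sketch of what a complete program for linear_regression.stan might look like, the R snippet below stores a Stan program in a string and writes it to that file. The data names `N`, `K`, `X`, `y` match the list passed from R; the parameter names `alpha`, `beta`, `sigma`, the replicate `y_rep` and the priors are illustrative assumptions, not necessarily the model used in the course.

```r
# Stan program for a linear regression, held in an R string.
# Priors and parameter names are illustrative assumptions.
stan_code <- "
data {
  int<lower=0> N;        // number of observations
  int<lower=0> K;        // number of predictors
  matrix[N, K] X;        // predictor matrix
  vector[N] y;           // outcome vector
}
parameters {
  real alpha;            // intercept
  vector[K] beta;        // regression coefficients
  real<lower=0> sigma;   // residual standard deviation
}
model {
  alpha ~ normal(0, 10);                 // prior
  beta ~ normal(0, 10);                  // prior
  sigma ~ cauchy(0, 2.5);                // prior (half-Cauchy via the constraint)
  y ~ normal(alpha + X * beta, sigma);   // likelihood
}
generated quantities {
  vector[N] y_rep;       // posterior predictive replicates
  for (n in 1:N)
    y_rep[n] = normal_rng(alpha + X[n] * beta, sigma);
}
"

# Save the program under the file name used in these slides
writeLines(stan_code, "linear_regression.stan")
```

Each block plays the role described in the blocks strategy: data are read once, parameters and model are evaluated at every log-density evaluation, and generated quantities once per draw.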
Launching the Stan model from R
Now we may launch the Stan program directly in R:
library(rstan)
# passing the data (already stored)
data <- list(N=N, K=K, X=X, y=y)
# fitting the model
fit1 <- stan(
file = 'linear_regression.stan',
data = data,
iter = 2000,
chains = 4)
# extracting the estimates
sims <- extract(fit1)
First example: 8 schools
This example studies coaching effects in eight schools. We denote by $y_{ij}$ the result of the $i$-th test in the $j$-th school. We assume the following model:
$$y_{ij} \sim \mathcal{N}(\theta_j, \sigma_y^2)$$
$$\theta_j \sim \mathcal{N}(\mu, \tau^2)$$
Do some schools perform better/worse according to these coaching effects? Here is the data, already aggregated by school:
schools_dat <- list(J = 8,
y = c(28, 8, -3, 7, -1, 1, 18, 12),
sigma = c(15, 10, 16, 11, 9, 11, 10, 18))
First Stan model: 8 schools
// saved as 8schools.stan
data {
int<lower=0> J; // number of schools
real y[J]; // estimated treatment effects
real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
real mu; // population treatment effect
real<lower=0> tau; // standard deviation in treatment effects
vector[J] eta; // unscaled deviation from mu by school
}
transformed parameters {
vector[J] theta = mu + tau * eta; // school treatment effects
}
model {
eta ~ normal(0,1); // prior
y ~ normal(theta, sigma); //likelihood
}
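The transformed parameters block builds theta from a standard normal eta instead of sampling theta directly. A quick simulation, sketched below with hypothetical fixed values for mu and tau, suggests that theta = mu + tau * eta with eta ~ N(0, 1) has the N(mu, tau^2) distribution required by the model.

```r
set.seed(8)

# Hypothetical fixed values, chosen only for this check
mu <- 5
tau <- 3

eta <- rnorm(100000)        # eta ~ N(0, 1), as in the model block
theta <- mu + tau * eta     # same transformation as in the Stan program

print(c(mean = mean(theta), sd = sd(theta)))   # close to mu and tau
```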
First example: 8 schools
To fit the model and display the estimates, it is sufficient to type the following commands in R (2000 iterations and 4 chains are the defaults):
fit_8schools <- stan(file = '8schools.stan', data = schools_dat)
print(fit_8schools, pars=c("mu", "tau", "theta"))
          mean   sd   2.5%  25%   50%   75% 97.5% n_eff Rhat
mu        7.89 5.04  -2.31 4.74  7.92 11.05 18.05  2352    1
tau       6.70 5.71   0.24 2.61  5.43  9.19 21.16  1480    1
theta[1] 11.36 8.23  -2.25 6.18 10.29 15.46 31.15  3161    1
theta[2]  7.89 6.21  -4.43 3.96  7.83 11.78 20.47  4923    1
theta[3]  6.05 7.59 -10.81 1.92  6.56 10.81 20.25  4057    1
theta[4]  7.60 6.44  -5.36 3.74  7.63 11.73 20.57  5055    1
theta[5]  5.13 6.23  -8.45 1.35  5.60  9.26 16.37  4346    1
theta[6]  5.95 6.68  -8.21 1.99  6.30 10.21 18.21  4313    1
theta[7] 10.62 6.93  -1.58 6.12 10.14 14.53 25.63  3381    1
theta[8]  8.40 7.77  -7.18 3.84  8.26 12.63 25.78  3854    1
Posterior graphical analysis with bayesplot
Once we fit a model, it is vital to check it via graphical inspection. The bayesplot package (for help, see its vignette) is designed for this task.
The package can display:
Posterior uncertainty intervals
Univariate marginal posterior distributions
Bivariate plots
Trace plots
Posterior predictive plots
Posterior graphical analysis with bayesplot
The first step is to save the posterior. Then you have many choices:
library(bayesplot)
posterior <- as.array(fit_8schools)
mcmc_intervals(posterior) # posterior intervals
mcmc_areas(posterior) # posterior areas
mcmc_dens(posterior) # marginal posteriors
mcmc_pairs(posterior) # bivariate plots
mcmc_trace(posterior) # trace plots
With the arguments pars or regex_pars you may select the desired parameters.
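A minimal sketch of parameter selection follows. To stay self-contained it builds a synthetic array of draws (`posterior_demo`, a hypothetical stand-in for `as.array(fit_8schools)`) with the layout iterations x chains x parameters, and it only calls bayesplot if the package is installed.

```r
set.seed(11)

# Synthetic draws standing in for as.array(fit_8schools): 500 iterations,
# 4 chains, 10 parameters. The values are random noise, for illustration only.
param_names <- c("mu", "tau", paste0("theta[", 1:8, "]"))
posterior_demo <- array(
  rnorm(500 * 4 * length(param_names)),
  dim = c(500, 4, length(param_names)),
  dimnames = list(iterations = NULL,
                  chains = paste0("chain:", 1:4),
                  parameters = param_names)
)

# What a regular expression such as "^theta" selects
selected <- grep("^theta", param_names, value = TRUE)
print(selected)

if (requireNamespace("bayesplot", quietly = TRUE)) {
  # pars: exact parameter names
  p1 <- bayesplot::mcmc_intervals(posterior_demo, pars = c("mu", "tau"))
  # regex_pars: all parameters whose name matches the pattern
  p2 <- bayesplot::mcmc_areas(posterior_demo, regex_pars = "^theta")
  print(p1)
  print(p2)
}
```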
Posterior uncertainty intervals
[Figure: posterior intervals for θ1, ..., θ8, τ and µ; horizontal axis from −10 to 30.]
Posterior uncertainty areas
[Figure: posterior areas for θ1, ..., θ8, τ and µ; horizontal axis from −25 to 75.]
Marginal posteriors
[Figure: marginal posterior densities, one panel per parameter: mu, tau, eta[1] to eta[8], theta[1] to theta[8], lp__.]
Marginal posteriors separated for each chain
[Figure: marginal posterior densities drawn separately for chains 1 to 4, one panel per parameter: mu, tau, eta[1] to eta[8], theta[1] to theta[8], lp__.]
Bivariate posterior plots
Trace plots for the Markov chains
[Figure: trace plots over iterations 0 to 1000 for chains 1 to 4, one panel per parameter: mu, tau, eta[1] to eta[8], theta[1] to theta[8], lp__.]
Our challenge with Stan
The Stan shuttle is ready to start! We will learn to:
write simple and more complex models in Stan: lm, glm, hierarchical models.
analyze the posterior summaries.
criticize the model and, if necessary, change or reparametrize it.
Further reading
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., and Riddell, A. (2017). Stan: A Probabilistic Programming Language. Journal of Statistical Software, 76(1).
Further optional reading about Hamiltonian Monte Carlo:
Betancourt, M. (2017). A Conceptual Introduction to Hamiltonian Monte Carlo.