Chapter 4
More than one parameter
Aims:
◃ Moving towards practical applications
◃ Illustrating that computations become quickly involved
◃ Illustrating that frequentist results can be obtained with Bayesian procedures
◃ Illustrating a multivariate (independent) sampling algorithm
Bayesian Biostatistics - Piracicaba 2014 196
4.1 Introduction
• Most statistical models involve more than one parameter to estimate
• Examples:
◃ Normal distribution: mean µ and variance σ2
◃ Linear regression: regression coefficients β0, β1, . . . , βd and residual variance σ2
◃ Logistic regression: regression coefficients β0, β1, . . . , βd
◃ Multinomial distribution: class probabilities θ1, θ2, . . . , θd with Σdj=1 θj = 1
• This requires a joint prior for all parameters, expressing our beliefs about the model parameters
• Aim: derive the posterior for all parameters and their summary measures
Bayesian Biostatistics - Piracicaba 2014 197
• It turns out that in most cases analytical solutions for the posterior will not be possible anymore
• In Chapter 6, we will see that Markov chain Monte Carlo methods are needed for this
• Here, we look at a simple multivariate sampling approach: Method of Composition
Bayesian Biostatistics - Piracicaba 2014 198
4.2 Joint versus marginal posterior inference
• Bayes theorem:
p(θ | y) = L(θ | y) p(θ) / ∫ L(θ | y) p(θ) dθ
◦ Hence, the same expression as before but now θ = (θ1, θ2, . . . , θd)T
◦ Now, the prior p(θ) is multivariate. But often a prior is given for each parameter separately
◦ Posterior p(θ | y) is also multivariate. But we usually look only at the (marginal) posteriors p(θj | y) (j = 1, . . . , d)
• We also need for each parameter: posterior mean, median (and sometimes mode), and credible intervals
Bayesian Biostatistics - Piracicaba 2014 199
• Illustration on the normal distribution with µ and σ2 unknown
• Application: determining 95% normal range of alp (continuation of Example III.6)
• We look at three cases (priors):
◃ No prior knowledge is available
◃ Previous study is available
◃ Expert knowledge is available
• But first, a brief theoretical introduction
Bayesian Biostatistics - Piracicaba 2014 200
4.3 The normal distribution with µ and σ2 unknown
Acknowledging that µ and σ2 are unknown
• Sample y1, . . . , yn of independent observations from N(µ, σ2)
• Joint likelihood of (µ, σ2) given y:
L(µ, σ2 | y) = (1/(2πσ2)^(n/2)) exp[−(1/(2σ2)) Σni=1 (yi − µ)2]
• The posterior is again the product of likelihood and prior, divided by the denominator, which involves an integral
• In this case analytical calculations are possible in 2 of the 3 cases
Bayesian Biostatistics - Piracicaba 2014 201
4.3.1 No prior knowledge on µ and σ2 is available
• Noninformative joint prior p(µ, σ2) ∝ σ−2 (µ and σ2 a priori independent)
• Posterior distribution p(µ, σ2 | y) ∝ (1/σ^(n+2)) exp{−(1/(2σ2)) [(n − 1)s2 + n(ȳ − µ)2]}
[Figure: joint posterior density of (µ, σ2)]
Bayesian Biostatistics - Piracicaba 2014 202
Justification prior distribution
• Most often prior information on several parameters arrives to us for each of the parameters separately and independently ⇒ p(µ, σ2) = p(µ) × p(σ2)
• And, we do not have prior information on µ nor on σ2 ⇒ choice of prior distributions:
[Figure: flat prior for µ and flat prior for log(σ)]
• The chosen priors are called flat priors
Bayesian Biostatistics - Piracicaba 2014 203
• Motivation:
◦ If one is totally ignorant of a location parameter, then it could take any value on the real line with equal prior probability.
◦ If totally ignorant about the scale of a parameter, then it is as likely to lie in the interval 1-10 as in the interval 10-100. This implies a flat prior on the log scale.
• The flat prior p(log(σ)) = c is equivalent to the prior p(σ2) ∝ σ−2
Bayesian Biostatistics - Piracicaba 2014 204
Marginal posterior distributions
Marginal posterior distributions are needed in practice
◃ p(µ | y)
◃ p(σ2 | y)
• Calculation of the marginal posterior distributions involves integration:
p(µ | y) = ∫ p(µ, σ2 | y) dσ2 = ∫ p(µ | σ2, y) p(σ2 | y) dσ2
• Marginal posterior is a weighted sum of conditional posteriors, with weights = uncertainty on the other parameter(s)
Bayesian Biostatistics - Piracicaba 2014 205
Conditional & marginal posterior distributions for the normal case
• Conditional posterior for µ: p(µ | σ2, y) = N(ȳ, σ2/n)
• Marginal posterior for µ: p(µ | y) = tn−1(ȳ, s2/n)
⇒ (µ − ȳ)/(s/√n) ∼ tn−1 (µ is the random variable)
• Marginal posterior for σ2: p(σ2 | y) = Inv-χ2(n − 1, s2) (scaled inverse chi-squared distribution)
⇒ (n − 1)s2/σ2 ∼ χ2n−1 (σ2 is the random variable)
= special case of IG(α, β) with α = (n − 1)/2, β = (n − 1)s2/2
Bayesian Biostatistics - Piracicaba 2014 206
Some t-densities
[Figure: t-densities for various degrees of freedom]
Bayesian Biostatistics - Piracicaba 2014 207
Some inverse-gamma densities
[Figure: inverse-gamma densities for various shape and scale parameters]
Bayesian Biostatistics - Piracicaba 2014 208
Joint posterior distribution
• Joint posterior = multiplication of marginal with conditional posterior
p(µ, σ2 | y) = p(µ | σ2, y) p(σ2 | y) = N(ȳ, σ2/n) × Inv-χ2(n − 1, s2)
• Normal-scaled-inverse chi-squared distribution = N-Inv-χ2(ȳ, n, n − 1, s2)
[Figure: contours of the joint posterior density of (µ, σ2)]
⇒ A posteriori µ and σ2 are dependent
Bayesian Biostatistics - Piracicaba 2014 209
Posterior summary measures and PPD
For µ:
◃ Posterior mean = mode = median = ȳ
◃ Posterior variance = ((n − 1)/(n(n − 3))) s2
◃ 95% equal tail credible and HPD interval:
[ȳ − t(0.025; n − 1) s/√n, ȳ + t(0.025; n − 1) s/√n]
For σ2:
◦ Posterior mean, mode, median, variance, 95% equal tail CI all analytically available
◦ 95% HPD interval is computed iteratively
PPD:
◦ tn−1[ȳ, s2(1 + 1/n)]-distribution
Bayesian Biostatistics - Piracicaba 2014 210
Implications of previous results
Frequentist versus Bayesian inference:
◃ Numerical results are the same
◃ Inference is based on different principles
Bayesian Biostatistics - Piracicaba 2014 211
Example IV.1: SAP study – Noninformative prior
◃ Example III.6: normal range for alp is too narrow
◃ Joint posterior distribution = N-Inv-χ2 (NI prior + likelihood, see before)
◃ Marginal posterior distributions (red curves) for y = 100/√alp
[Figure: marginal posterior densities of µ and σ2]
Bayesian Biostatistics - Piracicaba 2014 212
Normal range for alp:
• PPD for y = t249(7.11, 1.37)-distribution
• 95% normal range for alp = [104.1, 513.2], slightly wider than before
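This back-transformation is easy to verify numerically. A minimal sketch in Python (assuming the PPD is read as a t-distribution with 249 df, location 7.11 and scale 1.37, and recovering alp as (100/y)2; small differences from the slide's interval can arise from sampling versus analytical quantiles):

```python
from scipy.stats import t

# PPD for y = 100/sqrt(alp): t-distribution with 249 df,
# location 7.11 and scale 1.37
df, loc, scale = 249, 7.11, 1.37

# 95% equal-tail interval for y on the transformed scale
y_lo, y_hi = t.ppf([0.025, 0.975], df, loc=loc, scale=scale)

# back-transform: alp = (100 / y)^2 is monotone decreasing, so the bounds swap
alp_lo, alp_hi = (100.0 / y_hi) ** 2, (100.0 / y_lo) ** 2
# alp_lo, alp_hi are close to the slide's [104.1, 513.2]
```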
Bayesian Biostatistics - Piracicaba 2014 213
4.3.2 An historical study is available
• Posterior of historical data can be used as prior to the likelihood of current data
• Prior = N-Inv-χ2(µ0,κ0,ν0,σ20)-distribution (from historical data)
• Posterior = N-Inv-χ2(µ,κ, ν,σ2)-distribution (combining data and N-Inv-χ2 prior)
◃ N-Inv-χ2 is conjugate prior
◃ Again shrinkage of posterior mean towards prior mean
◃ Posterior variance = weighted average of prior variance, sample variance and the distance between prior and sample mean
⇒ posterior variance is not necessarily smaller than prior variance!
• Similar results for posterior measures and PPD as in first case
Bayesian Biostatistics - Piracicaba 2014 214
Example IV.2: SAP study – Conjugate prior
• Prior based on retrospective study (Topal et al., 2003) of 65 ‘healthy’ subjects:
◦ Mean (SD) for y = 100/√alp = 5.25 (1.66)
◦ Conjugate prior = N-Inv-χ2(5.25, 65, 64,2.76)
◦ Note: mean (SD) prospective data: 7.11 (1.4), quite different
◦ Posterior = N-Inv-χ2(6.72, 315, 314, 2.61):
◦ Posterior mean in between prior mean & sample mean, but:
◦ Posterior precision = prior + sample precision
◦ Posterior variance < prior variance and > sample variance
◦ Posterior variance with informative prior > with NI prior
◦ Prior information did not lower posterior uncertainty; reason: conflict of likelihood with prior
Bayesian Biostatistics - Piracicaba 2014 215
Marginal posteriors:
[Figure: marginal posterior densities of µ and σ2]
Red curves = marginal posteriors from informative prior (historical data)
Bayesian Biostatistics - Piracicaba 2014 216
Histograms retro- and prospective data:
[Figure: histograms of 100/√alp — prospective data (likelihood) and retrospective data (informative prior)]
Bayesian Biostatistics - Piracicaba 2014 217
4.3.3 Expert knowledge is available
• Expert knowledge available on each parameter separately
⇒ Joint prior N(µ0, σ02) × Inv-χ2(ν0, τ02), which is not conjugate
• Posterior cannot be derived analytically, but numerical/sampling techniques are available
Bayesian Biostatistics - Piracicaba 2014 218
What now?
Computational problem:
◃ ‘Simplest problem’ in classical statistics is already complicated
◃ Ad hoc solution is still possible, but not satisfactory
◃ There is the need for another approach
Bayesian Biostatistics - Piracicaba 2014 219
4.4 Multivariate distributions
Distributions with a multivariate response:
◃ Multivariate normal distribution: generalization of normal distribution
◃ Multivariate Student’s t-distribution: generalization of location-scale t-distribution
◃ Multinomial distribution: generalization of binomial distribution
Multivariate prior distributions:
◃ N-Inv-χ2-distribution: prior for N(µ, σ2)
◃ Dirichlet distribution: generalization of beta distribution
◃ (Inverse-)Wishart distribution: generalization of the (inverse-)gamma (prior) for covariance matrices (see mixed model chapter)
Bayesian Biostatistics - Piracicaba 2014 220
Example IV.3: Young adult study – Smoking and alcohol drinking
• Study examining life style among young adults
Smoking
Alcohol No Yes
No-Mild 180 41
Moderate-Heavy 216 64
Total 396 105
• Of interest: association between smoking & alcohol-consumption
Bayesian Biostatistics - Piracicaba 2014 221
Likelihood part:
2×2 contingency table = multinomial model Mult(n, θ)
• θ = {θ11, θ12, θ21, θ22} with θ22 = 1 − θ11 − θ12 − θ21, hence Σi,j θij = 1
• y = {y11, y12, y21, y22} with n = Σi,j yij
Mult(n, θ) = (n! / (y11! y12! y21! y22!)) θ11^y11 θ12^y12 θ21^y21 θ22^y22
Bayesian Biostatistics - Piracicaba 2014 222
Dirichlet prior:
Conjugate prior to the multinomial distribution = Dirichlet prior Dir(α)
p(θ | α) = (1/B(α)) Πi,j θij^(αij − 1)
◦ α = {α11, α12, α21, α22}
◦ B(α) = Πi,j Γ(αij) / Γ(Σi,j αij)
⇒ Posterior distribution = Dir(α + y)
• Note:
◦ Dirichlet distribution = extension of the beta distribution to higher dimensions
◦ Marginal distributions of a Dirichlet distribution = beta distributions
Bayesian Biostatistics - Piracicaba 2014 223
Measuring association:
• Association between smoking and alcohol consumption:
ψ = (θ11 θ22) / (θ12 θ21)
• Needed: p(ψ | y), but difficult to derive analytically
• Alternative: replace analytical calculations by a sampling procedure
Bayesian Biostatistics - Piracicaba 2014 224
Analysis of contingency table:
• Prior distribution: Dir(1, 1, 1, 1)
• Posterior distribution: Dir(180+1, 41+1, 216+1, 64+1)
• Sample of 10,000 generated values for the θ parameters
• 95% equal tail CI for ψ: [0.839, 2.014]
• Practically equal to the classically obtained estimate
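The sampling analysis above can be sketched as follows (the seed and number of draws are arbitrary choices, so the interval will vary slightly between runs):

```python
import numpy as np

rng = np.random.default_rng(2014)

# posterior Dir(181, 42, 217, 65): Dir(1,1,1,1) prior + observed counts
alpha_post = np.array([180 + 1, 41 + 1, 216 + 1, 64 + 1])

# 10,000 draws of (theta11, theta12, theta21, theta22)
theta = rng.dirichlet(alpha_post, size=10_000)

# cross-ratio psi = theta11*theta22 / (theta12*theta21), one value per draw
psi = theta[:, 0] * theta[:, 3] / (theta[:, 1] * theta[:, 2])

# 95% equal-tail credible interval, close to the slide's [0.839, 2.014]
lo, hi = np.percentile(psi, [2.5, 97.5])
```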
Bayesian Biostatistics - Piracicaba 2014 225
Posterior distributions:
[Figure: sampled posterior distributions of θ11, θ12, θ21 and ψ]
Bayesian Biostatistics - Piracicaba 2014 226
4.5 Frequentist properties of Bayesian inference
• Not of prime interest for a Bayesian to know the sampling properties of estimators
• However, it is important that the Bayesian approach most often gives the right answer
• What is known?
◃ Theory: posterior is normal for a large sample (BCLT)
◃ Simulations: the Bayesian approach may offer alternative interval estimators with better coverage than classical frequentist approaches
Bayesian Biostatistics - Piracicaba 2014 227
4.6 The Method of Composition
A method to yield a random sample from a multivariate distribution
• Stagewise approach
• Based on factorization of joint distribution into a marginal & several conditionals
p(θ1, . . . , θd | y) = p(θd | y) p(θd−1 | θd, y) . . . p(θ1 | θ2, . . . , θd, y)
• Sampling approach:
◃ Sample θd from p(θd | y)
◃ Sample θd−1 from p(θd−1 | θd, y)
◃ . . .
◃ Sample θ1 from p(θ1 | θ2, . . . , θd, y)
Bayesian Biostatistics - Piracicaba 2014 228
Sampling from posterior when y ∼ N(µ, σ2), both parameters unknown
• Sample first σ2; then, given the sampled value σ̃2, sample µ from p(µ | σ̃2, y)
• Output case 1: No prior knowledge on µ and σ2 on next page
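A minimal sketch of this two-step sampler (the summary statistics n, ȳ and s2 below are illustrative stand-ins, not the SAP data):

```python
import numpy as np

rng = np.random.default_rng(42)

# summary statistics of the observed sample (illustrative values)
n, ybar, s2 = 250, 7.11, 1.37

M = 10_000
# step 1: sigma2 ~ Inv-chi^2(n-1, s^2), generated as (n-1) s^2 / chi^2_{n-1}
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=M)

# step 2: mu | sigma2 ~ N(ybar, sigma2 / n)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# the pairs (mu, sigma2) are draws from the joint posterior p(mu, sigma2 | y)
```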
Bayesian Biostatistics - Piracicaba 2014 229
Sampled posterior distributions:
[Figure: (a) sampled p(σ2 | y), (b) sampled p(µ | y), (c) sampled joint posterior of (µ, σ2), (d) sampled PPD of ỹ]
Bayesian Biostatistics - Piracicaba 2014 230
4.7 Bayesian linear regression models
• Example of a classical multiple linear regression analysis
• Non-informative Bayesian multiple linear regression analysis:
◃ Non-informative prior for all parameters + . . . classical linear regression model
◃ Analytical results are available + method of composition can be applied
Bayesian Biostatistics - Piracicaba 2014 231
4.7.1 The frequentist approach to linear regression
Classical regression model: y =Xβ + ε
. y = n× 1 vector of independent responses
.X = n× (d + 1) design matrix
. β = (d + 1)× 1 vector of regression parameters
. ε = n× 1 vector of random errors ∼ N(0, σ2 I)
Likelihood:
L(β, σ2 | y, X) = (1/(2πσ2)^(n/2)) exp[−(1/(2σ2)) (y − Xβ)T (y − Xβ)]
. MLE = LSE of β: β̂ = (XTX)−1XTy
. Residual sum of squares: S = (y − Xβ̂)T (y − Xβ̂)
. Mean residual sum of squares: s2 = S/(n − d − 1)
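These quantities are straightforward to compute; a small sketch on simulated data (the coefficients and noise level below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# simulated data for illustration: y = 1.0 + 2.0*x + noise
n, d = 100, 1
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])          # n x (d+1) design matrix
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)

# MLE = LSE: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# residual sum of squares S and mean residual sum of squares s^2
resid = y - X @ beta_hat
S = resid @ resid
s2 = S / (n - d - 1)
```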
Bayesian Biostatistics - Piracicaba 2014 232
Example IV.7: Osteoporosis study: a frequentist linear regression analysis
◃ Cross-sectional study (Boonen et al., 1996)
◃ 245 healthy elderly women in a geriatric hospital
◃ Aim: Find determinants for osteoporosis
◃ Average age women = 75 yrs with a range of 70-90 yrs
◃ Marker for osteoporosis = tbbmc (in kg) measured for 234 women
◃ Simple linear regression model: regressing tbbmc on bmi
◃ Classical frequentist regression analysis:
◦ β̂0 = 0.813 (0.12)
◦ β̂1 = 0.0404 (0.0043)
◦ s2 = 0.29, with n − d − 1 = 232
◦ corr(β̂0, β̂1) = −0.99
Bayesian Biostatistics - Piracicaba 2014 233
Scatterplot + fitted regression line:
[Figure: scatterplot of TBBMC (kg) versus BMI (kg/m2) with fitted regression line]
Bayesian Biostatistics - Piracicaba 2014 234
4.7.2 A noninformative Bayesian linear regression model
Bayesian linear regression model = prior information on regression parameters & residual variance + normal regression likelihood
• Noninformative prior for (β, σ2): p(β, σ2) ∝ σ−2
• Notation: omit design matrix X
• Posterior distributions:
p(β, σ2 | y) = N(d+1)(β | β̂, σ2(XTX)−1) × Inv-χ2(σ2 | n − d − 1, s2)
p(β | σ2, y) = N(d+1)(β | β̂, σ2(XTX)−1)
p(σ2 | y) = Inv-χ2(σ2 | n − d − 1, s2)
p(β | y) = Tn−d−1(β | β̂, s2(XTX)−1)
Bayesian Biostatistics - Piracicaba 2014 235
4.7.3 Posterior summary measures for the linear regression model
• Posterior summary measures of
(a) regression parameters β
(b) parameter of residual variability σ2
• Univariate posterior summary measures
◃ The marginal posterior mean (mode, median) of βj = MLE (LSE) β̂j
◃ 95% HPD interval for βj
◃ Marginal posterior mode and mean of σ2
◃ 95% HPD-interval for σ2
Bayesian Biostatistics - Piracicaba 2014 236
Multivariate posterior summary measures
Multivariate posterior summary measures for β
• Posterior mean (mode) of β = β (MLE=LSE)
• 100(1-α)%-HPD region
• Contour probability for H0 : β = β0
Bayesian Biostatistics - Piracicaba 2014 237
Posterior predictive distribution
• PPD of a future observation ỹ at covariate value x: t-distribution
• How to sample?
◃ Directly from t-distribution
◃ Method of Composition
Bayesian Biostatistics - Piracicaba 2014 238
4.7.4 Sampling from the posterior distribution
• Most posteriors can be sampled via standard sampling algorithms
• What about p(β | y) = multivariate t-distribution? How to sample from this distribution? (R function rmvt in package mvtnorm)
• Easy with Method of Composition: Sample in two steps
◃ Sample from p(σ2 | y): scaled inverse chi-squared distribution ⇒ σ2
◃ Sample from p(β | σ2,y) = multivariate normal distribution
Bayesian Biostatistics - Piracicaba 2014 239
Example IV.8: Osteoporosis study – Sampling with Method of Composition
• Sample σ̃2 from p(σ2 | y) = Inv-χ2(σ2 | n − d − 1, s2)
• Sample β from p(β | σ̃2, y) = N(d+1)(β | β̂, σ̃2(XTX)−1)
• Sampled mean regression vector = (0.816, 0.0403)
• 95% equal tail CIs = β0: [0.594, 1.040] & β1: [0.0317, 0.0486]
• Contour probability for H0 : β = 0 is < 0.001
• Marginal posterior of (β0, β1) has a ridge (r(β0, β1) = −0.99)
Bayesian Biostatistics - Piracicaba 2014 240
PPD:
• Distribution of a future observation at bmi = 30
• Sample future observation ỹ from N(µ̃30, σ̃230):
◃ µ̃30 = β̃T(1, 30)T
◃ σ̃230 = σ̃2[1 + (1, 30)(XTX)−1(1, 30)T]
• Sampled mean and standard deviation = 2.033 and 0.282
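The sampling steps can be sketched end-to-end. This uses simulated stand-in data (the osteoporosis data set itself is not reproduced here); given a sampled β̃ and σ̃2, the future observation is drawn from N((1,30)β̃, σ̃2), which is distributionally equivalent to the slide's formulation with β̂:

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in data mimicking the reported fit: tbbmc ~ 0.813 + 0.0404*bmi
n, d = 234, 1
bmi = rng.uniform(20, 40, n)
X = np.column_stack([np.ones(n), bmi])
y = 0.813 + 0.0404 * bmi + rng.normal(0, 0.29, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - d - 1)
XtX_inv = np.linalg.inv(X.T @ X)

M = 5_000
x0 = np.array([1.0, 30.0])
ppd = np.empty(M)
for m in range(M):
    # step 1: sigma2 ~ Inv-chi^2(n-d-1, s^2)
    sigma2 = (n - d - 1) * s2 / rng.chisquare(n - d - 1)
    # step 2: beta | sigma2 ~ N(beta_hat, sigma2 (X'X)^{-1})
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # step 3: future observation at bmi = 30, given the sampled (beta, sigma2)
    ppd[m] = rng.normal(x0 @ beta, np.sqrt(sigma2))
```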
Bayesian Biostatistics - Piracicaba 2014 241
Posterior distributions:
[Figure: (a) sampled posterior of β0, (b) sampled posterior of β1, (c) sampled joint posterior of (β0, β1), (d) sampled PPD of ỹ at bmi = 30]
Bayesian Biostatistics - Piracicaba 2014 242
4.8 Bayesian generalized linear models
Generalized Linear Model (GLIM): extension of the linear regression model to a wide class of regression models
• Examples:
◦ Normal linear regression model: normal distribution for a continuous response, σ2 assumed known
◦ Poisson regression model: Poisson distribution for a count response, log(mean) = linear function of covariates
◦ Logistic regression model: Bernoulli distribution for a binary response, logit of probability = linear function of covariates
Bayesian Biostatistics - Piracicaba 2014 243
4.8.1 More complex regression models
• The multiparameter models considered so far are limited
◃ Weibull distribution for alp?
◃ Censored/truncated data?
◃ Cox regression?
• Postpone to MCMC techniques
Bayesian Biostatistics - Piracicaba 2014 244
Take home messages
• Any practical application involves more than one parameter, hence Bayesian inference is immediately multivariate, even with univariate data.
• A multivariate prior is needed and a multivariate posterior is obtained, but the marginal posterior is the basis for practical inference
• Nuisance parameters:
◃ Bayesian inference: average out the nuisance parameter
◃ Classical inference: profile out (maximize out) the nuisance parameter
• Multivariate independent sampling can be done, if marginals can be computed
• Frequentist properties of Bayesian estimators (with NI priors) often good
Bayesian Biostatistics - Piracicaba 2014 245
Chapter 5
Choosing the prior distribution
Aims:
◃ Review the different principles that lead to a prior distribution
◃ Critically review the impact of the subjectivity of prior information
Bayesian Biostatistics - Piracicaba 2014 246
5.1 Introduction
Incorporating prior knowledge
◃ Unique feature for Bayesian approach
◃ But might introduce subjectivity
◃ Useful in clinical trials to reduce sample size
In this chapter we review different kinds of priors:
◃ Conjugate
◃ Noninformative
◃ Informative
Bayesian Biostatistics - Piracicaba 2014 247
5.2 The sequential use of Bayes theorem
• Posterior of the kth experiment = prior for the (k + 1)th experiment (sequential surgeries)
• In this way, the Bayesian approach can mimic our human learning process
• Meaning of ‘prior’ in prior distribution:
◦ Prior: prior knowledge should be specified independent of the collected data
◦ In RCTs: fix the prior distribution in advance
Bayesian Biostatistics - Piracicaba 2014 248
5.3 Conjugate prior distributions
In this section:
• Conjugate priors for univariate & multivariate data distributions
• Conditional conjugate and semi-conjugate distributions
• Hyperpriors
Bayesian Biostatistics - Piracicaba 2014 249
5.3.1 Conjugate priors for univariate data distributions
• In previous chapters, examples were given whereby the combination of prior with likelihood gives a posterior of the same type as the prior.
• This property is called conjugacy.
• For an important class of distributions (those that belong to the exponential family) there is a recipe to produce the conjugate prior
Bayesian Biostatistics - Piracicaba 2014 250
Table conjugate priors for univariate discrete data distributions
Exponential family member Parameter Conjugate prior
UNIVARIATE CASE
Discrete distributions
Bernoulli Bern(θ) θ Beta(α0, β0)
Binomial Bin(n,θ) θ Beta(α0, β0)
Negative Binomial NB(k,θ) θ Beta(α0, β0)
Poisson Poisson(λ) λ Gamma(α0, β0)
Bayesian Biostatistics - Piracicaba 2014 251
Table conjugate priors for univariate continuous data distributions
Exponential family member Parameter Conjugate prior
UNIVARIATE CASE
Continuous distributions
Normal-variance fixed N(µ, σ2), σ2 fixed µ N(µ0, σ02)
Normal-mean fixed N(µ, σ2), µ fixed σ2 IG(α0, β0) or Inv-χ2(ν0, τ02)
Normal∗ N(µ, σ2) µ, σ2 NIG(µ0, κ0, a0, b0) or N-Inv-χ2(µ0, κ0, ν0, τ02)
Exponential Exp(λ) λ Gamma(α0, β0)
Bayesian Biostatistics - Piracicaba 2014 252
Recipe to choose conjugate priors
p(y | θ) ∈ exponential family:
p(y | θ) = b(y) exp[c(θ)T t(y) + d(θ)]
◦ d(θ), b(y) = scalar functions, c(θ) = (c1(θ), . . . , cd(θ))T
◦ t(y) = d-dimensional sufficient statistic for θ (canonical parameter)
◦ Examples: binomial distribution, Poisson distribution, normal distribution, etc.
For a random sample y = {y1, . . . , yn} of i.i.d. elements:
p(y | θ) = b(y) exp[c(θ)T t(y) + n d(θ)]
◦ b(y) = Πn1 b(yi) & t(y) = Σn1 t(yi)
Bayesian Biostatistics - Piracicaba 2014 253
Recipe to choose conjugate priors
For the exponential family, the class of prior distributions ℑ closed under sampling:
p(θ | α, β) = k(α, β) exp[c(θ)T α + β d(θ)]
◦ α = (α1, . . . , αd)T and β hyperparameters
◦ Normalizing constant: k(α, β) = 1/∫ exp[c(θ)T α + β d(θ)] dθ
Proof of closure:
p(θ | y) ∝ p(y | θ) p(θ)
= exp[c(θ)T t(y) + n d(θ)] exp[c(θ)T α + β d(θ)]
= exp[c(θ)T α∗ + β∗ d(θ)],
with α∗ = α + t(y), β∗ = β + n
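For the Bernoulli likelihood, t(y) = Σ yi, so the update α∗ = α + t(y), β∗ = β + n reduces to the familiar beta-binomial rule; a tiny worked example (the prior Beta(2, 3) and the data are made up for illustration):

```python
# Bernoulli likelihood: sufficient statistic t(y) = sum(y), sample size n.
# Natural conjugate = Beta(a0, b0); closure under sampling gives
# Beta(a0 + sum(y), b0 + n - sum(y))
a0, b0 = 2.0, 3.0
y = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]      # toy data: 7 successes in 10 trials

n, t_y = len(y), sum(y)
a_post, b_post = a0 + t_y, b0 + (n - t_y)

# posterior mean is a weighted combination of prior mean and sample proportion
post_mean = a_post / (a_post + b_post)
```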
Bayesian Biostatistics - Piracicaba 2014 254
Recipe to choose conjugate priors
• Above rule gives the natural conjugate family
• Enlarge the class of priors ℑ by adding extra parameters: conjugate family of priors, again closed under sampling (O'Hagan & Forster, 2004)
• The conjugate prior has the same functional form as the likelihood, obtained byreplacing the data (t(y) and n) by parameters (α and β)
• A conjugate prior is model-dependent, in fact likelihood-dependent
Bayesian Biostatistics - Piracicaba 2014 255
Practical advantages when using conjugate priors
A (natural) conjugate prior distribution for the exponential family is convenient from several viewpoints:
• mathematical
• numerical
• interpretational (convenience prior):
◃ The likelihood of historical data can be easily turned into a conjugate prior.
The natural conjugate distribution = equivalent to a fictitious experiment
◃ For a natural conjugate prior, the posterior mean = weighted combination of the prior mean and the sample estimate
Bayesian Biostatistics - Piracicaba 2014 256
Example V.2: Dietary study – Normal versus t-prior
• Example II.2: IBBENS-2 normal likelihood was combined with a N(328, 100) (conjugate) prior distribution
• Replace the normal prior by a t30(328, 100)-prior
⇒ posterior practically unchanged, but 3 elegant features of normal prior are lost:
◃ Posterior cannot be determined analytically
◃ Posterior is not of the same class as the prior
◃ Posterior summary measures are not obvious functions of the prior and the sample summary measures
Bayesian Biostatistics - Piracicaba 2014 257
5.3.2 Conjugate prior for normal distribution – mean and variance unknown
N(µ, σ2) with µ and σ2 unknown ∈ two-parameter exponential family
• Conjugate = product of a normal prior with inverse gamma prior
• Notation: NIG(µ0, κ0, a0, b0)
Bayesian Biostatistics - Piracicaba 2014 258
Mean known and variance unknown
• For σ2 unknown and µ known :
Natural conjugate is inverse gamma (IG)
Equivalently: scaled inverse-χ2 distribution (Inv-χ2)
Bayesian Biostatistics - Piracicaba 2014 259
5.3.3 Multivariate data distributions
Priors for two popular multivariate models:
• Multinomial model
• Multivariate normal model
Bayesian Biostatistics - Piracicaba 2014 260
Table conjugate priors for multivariate data distributions
Exponential family member Parameter Conjugate prior
MULTIVARIATE CASE
Discrete distributions
Multinomial Mult(n,θ) θ Dirichlet(α0)
Continuous distributions
Normal-covariance fixed N(µ, Σ)-Σ fixed µ N(µ0, Σ0)
Normal-mean fixed N(µ, Σ)-µ fixed Σ IW(Λ0, ν0)
Normal∗ N(µ, Σ) µ, Σ NIW(µ0, κ0, ν0, Λ0)
Bayesian Biostatistics - Piracicaba 2014 261
Multinomial model
Mult(n, θ): p(y | θ) = (n!/(y1! y2! . . . yk!)) Πkj=1 θj^yj ∈ exponential family
Natural conjugate: Dirichlet(α0) distribution
p(θ | α0) = (Γ(Σkj=1 α0j) / Πkj=1 Γ(α0j)) Πkj=1 θj^(α0j − 1)
Properties:
◃ Posterior distribution = Dirichlet(α0 + y)
◃ Beta distribution = special case of a Dirichlet distribution with k = 2
◃ Marginal distributions of the Dirichlet distribution = beta distributions
◃ Dirichlet(1, 1, . . . , 1) = extension of the classical uniform prior Beta(1,1)
Bayesian Biostatistics - Piracicaba 2014 262
Multivariate normal model
The p-dimensional multivariate normal distribution:
p(y1, . . . , yn | µ, Σ) = (1/((2π)^(np/2) |Σ|^(n/2))) exp[−(1/2) Σni=1 (yi − µ)T Σ−1 (yi − µ)]
Conjugates:
◃ Σ known and µ unknown: N(µ0,Σ0) for µ
◃ Σ unknown and µ known: inverse Wishart distribution IW(Λ0, ν0) for Σ
◃ Σ unknown and µ unknown:
Normal-inverse Wishart distribution NIW(µ0, κ0, ν0,Λ0) for µ and Σ
Bayesian Biostatistics - Piracicaba 2014 263
5.3.4 Conditional conjugate and semi-conjugate priors
Example θ = (µ, σ2) for y ∼ N(µ, σ2)
• Conditional conjugate for µ: N(µ0, σ20)
• Conditional conjugate for σ2: IG(α, β)
• Semi-conjugate prior = product of conditional conjugates
• Conjugate priors often cannot be used in WinBUGS, but semi-conjugates are popular
Bayesian Biostatistics - Piracicaba 2014 264
5.3.5 Hyperpriors
Conjugate priors are too restrictive to represent prior knowledge
⇒ Give the parameters of the conjugate prior themselves a prior
Example:
• Prior: θ ∼ Beta(1, 1)
• Instead: θ ∼ Beta(α, β) and α ∼ Gamma(1, 3), β ∼ Gamma(2, 4)
◃ α, β = hyperparameters
◃ Gamma(1, 3)× Gamma(2, 4) = hyperprior/hierarchical prior
• Aim: more flexibility in prior distribution (and useful for Gibbs sampling)
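Sampling from this hierarchical prior proceeds stagewise, exactly as in the Method of Composition (reading Gamma(a, b) as shape-rate is an assumption here, since the slide does not fix the parametrization):

```python
import numpy as np

rng = np.random.default_rng(7)

M = 10_000
# hyperpriors: alpha ~ Gamma(1, 3), beta ~ Gamma(2, 4)
# (numpy uses shape-scale, so scale = 1/rate)
alpha = rng.gamma(shape=1.0, scale=1.0 / 3.0, size=M)
beta_ = rng.gamma(shape=2.0, scale=1.0 / 4.0, size=M)

# marginal (hierarchical) prior for theta: a mixture of Beta(alpha, beta)
# densities, more flexible than any single Beta
theta = rng.beta(alpha, beta_)
```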
Bayesian Biostatistics - Piracicaba 2014 265
5.4 Noninformative prior distributions
Bayesian Biostatistics - Piracicaba 2014 266
5.4.1 Introduction
Sometimes/often researchers cannot or do not wish to make use of prior knowledge ⇒ the prior should reflect this absence of knowledge
• A prior that expresses no knowledge was (initially) called a noninformative (NI) prior
• Central question: What prior reflects absence of knowledge?
◃ Flat prior?
◃ Huge amount of research to find best NI prior
◃ Other terms for NI: non-subjective, objective, default, reference, weak, diffuse,flat, conventional and minimally informative, etc
• Challenge: make sure that posterior is a proper distribution!
Bayesian Biostatistics - Piracicaba 2014 267
5.4.2 Expressing ignorance
• Equal prior probabilities = principle of insufficient reason, principle of indifference, Bayes-Laplace postulate
• Unfortunately . . . a flat prior cannot express ignorance
Bayesian Biostatistics - Piracicaba 2014 268
Ignorance at different scales:
[Figure: flat priors on σ and on σ2, each plotted against both the σ- and the σ2-scale]
Ignorance on σ-scale is different from ignorance on σ2-scale
Bayesian Biostatistics - Piracicaba 2014 269
Ignorance cannot be expressed mathematically
Bayesian Biostatistics - Piracicaba 2014 270
5.4.3 General principles to choose noninformative priors
A lot of research has been spent on the specification of NI priors; the most popular are Jeffreys priors:
• Result of a Bayesian analysis depends on choice of scale for flat prior:
p(θ) ∝ c or p(h(θ)) ≡ p(ψ) ∝ c
• To preserve conclusions when changing scale, Jeffreys suggested a rule to construct priors based on the invariance principle/rule (conclusions do not change when changing scale)
• Jeffreys rule suggests a way to choose the scale on which to take the flat prior
• A Jeffreys rule also exists for more than one parameter (Jeffreys multi-parameter rule)
Bayesian Biostatistics - Piracicaba 2014 271
Examples of Jeffreys priors
• Binomial model: p(θ) ∝ θ−1/2(1 − θ)−1/2 ⇔ flat prior on ψ(θ) = arcsin√θ
• Poisson model: p(λ) ∝ λ−1/2 ⇔ flat prior on ψ(λ) =√λ
• Normal model with σ fixed: p(µ) ∝ c
• Normal model with µ fixed: p(σ2) ∝ σ−2 ⇔ flat prior on log(σ)
• Normal model with µ and σ2 unknown: p(µ, σ2) ∝ σ−2, which reproduces some classical frequentist results!
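The binomial case is easy to check numerically: the square root of the Fisher information I(θ) = n/(θ(1 − θ)) is proportional to a Beta(1/2, 1/2) density (a small sketch; the grid is arbitrary):

```python
import numpy as np
from scipy.stats import beta

# Fisher information of Bin(n, theta) is I(theta) = n / (theta (1 - theta));
# Jeffreys prior ~ sqrt(I(theta)) ~ theta^{-1/2} (1-theta)^{-1/2}
theta = np.linspace(0.05, 0.95, 19)
jeffreys = np.sqrt(1.0 / (theta * (1.0 - theta)))   # unnormalized, n dropped

# compare against the normalized Beta(1/2, 1/2) density:
# a constant ratio (here the normalizing constant pi) confirms proportionality
ratio = jeffreys / beta.pdf(theta, 0.5, 0.5)
```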
Bayesian Biostatistics - Piracicaba 2014 272
5.4.4 Improper prior distributions
• Many NI priors are improper (= AUC is infinite)
• Improper prior is technically no problem when posterior is proper
• Example: Normal likelihood (µ unknown + σ2 known) + flat prior on µ
p(µ | y) = p(y | µ) p(µ) / ∫ p(y | µ) p(µ) dµ = p(y | µ) c / ∫ p(y | µ) c dµ
= (1/(√(2π) σ/√n)) exp[−(n/2) ((µ − ȳ)/σ)2]
• Complex models: difficult to know when an improper prior yields a proper posterior (e.g. variance of the level-2 observations in a Gaussian hierarchical model)
• Interpretation of improper priors?
Bayesian Biostatistics - Piracicaba 2014 273
5.4.5 Weak/vague priors
• For practical purposes it is sufficient that the prior is locally uniform, also called vague or weak
• Locally uniform: prior ≈ constant on the interval outside which the likelihood ≈ zero
• Examples for a N(µ, σ2) likelihood:
◦ µ: N(0, σ02) prior with σ0 large
◦ σ2: IG(ε, ε) prior with ε small ≈ Jeffreys prior
Bayesian Biostatistics - Piracicaba 2014 274
Locally uniform prior
[Figure: locally uniform prior, likelihood and posterior for µ]
Bayesian Biostatistics - Piracicaba 2014 275
Vague priors in software:
• WinBUGS allows only (proper) vague priors (Jeffreys priors are not allowed)
◦ mu ∼ dnorm(0.0,1.0E-6): normal prior with variance 106 (sd = 1000)
◦ tau2 ∼ dgamma(0.001,0.001): inverse gamma prior for the variance with shape = rate = 0.001
• SAS allows improper priors (allows Jeffreys priors)
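The precision parametrization is a common source of confusion; a small sketch of the conversions (plain arithmetic, no BUGS call):

```python
import numpy as np

# WinBUGS parametrizes the normal by the precision tau = 1/variance:
# mu ~ dnorm(0.0, 1.0E-6) is a N(0, 10^6) prior, i.e. sd = 1000
prec = 1.0e-6
prior_var = 1.0 / prec
prior_sd = np.sqrt(prior_var)

# tau ~ dgamma(0.001, 0.001) puts a Gamma(shape 0.001, rate 0.001) prior on
# the precision, equivalently an IG(0.001, 0.001) prior on sigma2 = 1/tau
a, b = 0.001, 0.001
tau_mean = a / b          # prior mean of the precision = 1
tau_var = a / b**2        # prior variance = 1000: very diffuse
```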
Bayesian Biostatistics - Piracicaba 2014 276
Density of log(σ) for σ2 (= 1/τ 2) ∼ IG(ε, ε)
[Figure: density of log(σ) for decreasing values of ε]
Bayesian Biostatistics - Piracicaba 2014 277
Density of log(σ) for σ2 ∼ IG(ε, ε)
[Figure: density of log(σ) for still smaller values of ε]
Bayesian Biostatistics - Piracicaba 2014 278
5.5 Informative prior distributions
Bayesian Biostatistics - Piracicaba 2014 279
5.5.1 Introduction
• In basically all research some prior knowledge is available
• In this section:
◃ Formalize the use of historical data as prior information using the power prior
◃ Review the use of clinical priors, which are prior distributions based on either historical data or on expert knowledge
◃ Priors that are based on formal rules expressing prior skepticism and optimism
• The set of priors representing prior knowledge = subjective or informative priors
• But first, two success stories of how the Bayesian approach helped to find:
◃ a crashed plane
◃ a lost fisherman on the Atlantic Ocean
Bayesian Biostatistics - Piracicaba 2014 280
Locating a lost plane
◃ Statisticians helped locate an Air France plane in 2011, which had been missing for two years, using Bayesian methods
◃ June 2009: Air France flight 447 went missing flying from Rio de Janeiro in Brazil to Paris, France
◃ Debris from the Airbus A330 was found floating on the surface of the Atlantic five days later
◃ After a number of days, the debris would have moved with the ocean current, hence finding the black box is not easy
◃ Existing software (used by the US Coast Guard) did not help
◃ Senior analyst at Metron, Colleen Keller, relied on Bayesian methods to locate the black box in 2011
Bayesian Biostatistics - Piracicaba 2014 281
Photos: members of the Brazilian Frigate Constituicao recovering debris in June 2009; debris from the Air France crash laid out for investigation in 2009; a 2009 infrared satellite image showing weather conditions off the Brazilian coast and the plane search area
Bayesian Biostatistics - Piracicaba 2014 282
Finding a lost fisherman on the Atlantic Ocean
New York Times (30 September 2014)
◃ ". . . if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer
◃ The man owes his life to a once obscure field known as Bayesian statistics - a set of mathematical rules for using new data to continuously update beliefs or existing knowledge
◃ It is proving especially useful in approaching complex problems, including searches like the one the Coast Guard used in 2013 to find the missing fisherman, John Aldridge
◃ But the current debate is about how scientists turn data into knowledge, evidence and predictions. Concern has been growing in recent years that some fields are not doing a very good job at this sort of inference. In 2012, for example, a team at the biotech company Amgen announced that they had analyzed 53 cancer studies and found they could not replicate 47 of them
Bayesian Biostatistics - Piracicaba 2014 283
◃ The Coast Guard has been using Bayesian analysis since the 1970s. The approach lends itself well to problems like searches, which involve a single incident and many different kinds of relevant data, said Lawrence Stone, a statistician for Metron, a scientific consulting firm in Reston, Va., that works with the Coast Guard
Photo caption: The Coast Guard, guided by the statistical method of Thomas Bayes, was able to find the missing fisherman John Aldridge.
Bayesian Biostatistics - Piracicaba 2014 284
5.5.2 Data-based prior distributions
• In previous chapters:
◦ Combined historical data with current data assuming identical conditions
◦ Discounted importance of prior data by increasing variance
• Generalized by power prior (Ibrahim and Chen):
◦ Likelihood of the historical data: L(θ | y0) based on y0 = {y01, . . . , y0n0}
◦ Prior for the historical data: p0(θ | c0)
◦ Power prior distribution:
p(θ | y0, a0) ∝ L(θ | y0)^a0 p0(θ | c0)
with 0 ≤ a0 ≤ 1 (a0 = 0: no accounting for the historical data; a0 = 1: fully accounting)
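For a conjugate case the power prior stays in closed form. A minimal sketch for a binomial proportion θ with a Beta(1, 1) initial prior (the historical counts 30 out of 50 are hypothetical, not data from the course):

```python
import numpy as np
from scipy import stats

def power_prior_beta(y0, n0, a0, a_init=1.0, b_init=1.0):
    """Power prior for a binomial proportion theta.

    The historical likelihood theta^y0 (1-theta)^(n0-y0), raised to the
    power a0 and multiplied by a Beta(a_init, b_init) initial prior,
    is again a Beta distribution (conjugacy is preserved).
    """
    return stats.beta(a_init + a0 * y0, b_init + a0 * (n0 - y0))

# a0 = 0: historical data ignored; a0 = 1: historical data fully used
for a0 in (0.0, 0.5, 1.0):
    prior = power_prior_beta(30, 50, a0)
    print(f"a0={a0}: mean={prior.mean():.3f}, sd={prior.std():.3f}")
```

Intermediate values of a0 shrink the effective historical sample size from n0 to a0·n0, which is one way to read the "discounting" interpretation above.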
Bayesian Biostatistics - Piracicaba 2014 285
5.5.3 Elicitation of prior knowledge
• Elicitation of prior knowledge: turn (qualitative) information from ‘experts’ into probabilistic language
• Challenges:
◃ Most experts have no statistical background
◃ What to ask to construct prior distribution:
◦ Prior mode, median, mean and prior 95% CI?
◦ Description of the prior: quartiles, mean, SD?
◃ Some probability statements are easier to elicit than others
Bayesian Biostatistics - Piracicaba 2014 286
Example V.5: Stroke study – Prior for 1st interim analysis from experts
Prior knowledge on θ (incidence of SICH), elicitation based on:
◦ Most likely value for θ and prior equal-tail 95% CI
◦ Prior belief pk on each of the K intervals Ik ≡ [θk−1, θk) covering [0,1]
[Figure: elicited prior density for θ]
Bayesian Biostatistics - Piracicaba 2014 287
Elicitation of prior knowledge – some remarks
• Community and consensus prior: obtained from a community of experts
• Difficulty in eliciting prior information on more than 1 parameter jointly
• Lack of Bayesian papers based on genuine prior information
Bayesian Biostatistics - Piracicaba 2014 288
Identifiability issues
• An overspecified model is non-identifiable
• When an unidentified parameter is given a NI prior, its posterior is also NI
• The Bayesian approach can make parameters estimable through informative priors, so that the model becomes identifiable
• In the next example, not all parameters can be estimated without extra (prior) information
Bayesian Biostatistics - Piracicaba 2014 289
Example V.6: Cysticercosis study – Estimate prevalence without gold standard
Experiment:
◃ 868 pigs tested in Zambia with Ag-ELISA diagnostic test
◃ 496 pigs showed a positive test
◃ Aim: estimate the prevalence π of cysticercosis in Zambia among pigs
If estimates of the sensitivity α and specificity β are available, then:
π = (p+ + β − 1) / (α + β − 1)
◦ p+ = n+/n = proportion of subjects with a positive test
◦ α and β = estimated sensitivity and specificity
Bayesian Biostatistics - Piracicaba 2014 290
Data:
Table of results:
Test     Disease + (True)    Disease − (True)    Observed
+        πα                  (1− π)(1− β)        n+ = 496
−        π(1− α)             (1− π)β             n− = 372
Total    π                   1− π                n = 868
◃ Only collapsed table is available
◃ Since α and β vary geographically, expert knowledge is needed
Bayesian Biostatistics - Piracicaba 2014 291
Prior and posterior:
• Prior distributions on π (p(π)), α (p(α)) and β (p(β)) are needed
• Posterior distribution:
p(π, α, β | n+, n−) ∝ (n choose n+) [πα + (1− π)(1− β)]^{n+} [π(1− α) + (1− π)β]^{n−} p(π)p(α)p(β)
• WinBUGS was used
Bayesian Biostatistics - Piracicaba 2014 292
Posterior of π:
(a) Uniform priors for π, α and β (no prior information)
(b) Beta(21,12) prior for α and Beta(32,4) prior for β (historical data)
[Figure: posterior densities p(π|y) for case (a) (N = 10000, bandwidth = 0.04473) and case (b) (N = 10000, bandwidth = 0.01542)]
Bayesian Biostatistics - Piracicaba 2014 293
5.5.4 Archetypal prior distributions
• Use of prior information in Phase III RCTs is problematic, except for medical device trials (FDA guidance document)
⇒ Pleas for objective priors in RCTs
• There is a role of subjective priors for interim analyses:
◃ Skeptical prior
◃ Enthusiastic prior
Bayesian Biostatistics - Piracicaba 2014 294
Example V.7: Skeptical priors in a phase III RCT
Tan et al. (2003):
◃ Phase III RCT for treating patients with hepatocellular carcinoma
◃ Standard treatment: surgical resection
◃ Experimental treatment: surgery + adjuvant radioactive iodine (adjuvant therapy)
◃ Planning: recruit 120 patients
Frequentist interim analyses for efficacy were planned:
◃ First interim analysis (30 patients): experimental treatment better (P = 0.01 < 0.029 = P-value threshold of the stopping rule)
◃ But, scientific community was skeptical about adjuvant therapy
⇒ New multicentric trial (300 patients) was set up
Bayesian Biostatistics - Piracicaba 2014 295
Prior to the start of the subsequent trial:
◃ Pretrial opinions of the 14 clinical investigators were elicited
◃ The prior distribution of each investigator was constructed by eliciting the prior belief on the treatment effect (adjuvant versus standard) on a grid of intervals
◃ Average of all priors = community prior
◃ Average of the priors of the 5 most skeptical investigators = skeptical prior
To exemplify the use of the skeptical prior:
◃ Combine skeptical prior with interim analysis results of previous trial
⇒ 1-sided contour probability (in 1st interim analysis) = 0.49
⇒ The first trial would not have been stopped for efficacy
Bayesian Biostatistics - Piracicaba 2014 296
Questionnaire:
Bayesian Biostatistics - Piracicaba 2014 297
Prior of investigators:
Bayesian Biostatistics - Piracicaba 2014 298
Skeptical priors:
Bayesian Biostatistics - Piracicaba 2014 299
A formal skeptical/enthusiastic prior
Formal subjective priors (Spiegelhalter et al., 1994) in normal case:
• Useful in the context of monitoring clinical trials in a Bayesian manner
• θ = true effect of treatment (A versus B)
• Skeptical normal prior: choose mean and variance of p(θ) to reflect skepticism
• Enthusiastic normal prior: choose mean and variance of p(θ) to reflect enthusiasm
• See figure next page & book
Bayesian Biostatistics - Piracicaba 2014 300
Example V.8+9
[Figure: skeptical and enthusiastic normal priors for θ, each with 5% tail probability beyond θa]
Bayesian Biostatistics - Piracicaba 2014 301
5.6 Prior distributions for regression models
Bayesian Biostatistics - Piracicaba 2014 302
5.6.1 Normal linear regression
Normal linear regression model:
yi = xi^T β + εi (i = 1, . . . , n), or in matrix notation: y = Xβ + ε
Bayesian Biostatistics - Piracicaba 2014 303
Priors
• Non-informative priors:
◃ Popular NI prior: p(β, σ2) ∝ σ−2 (Jeffreys multi-parameter rule)
◃ WinBUGS: product of independent N(0, σ0^2) priors (σ0^2 large) + IG(ε, ε) prior (ε small)
• Conjugate priors:
◃ Conjugate NIG prior = N(β0, σ2 Σ0) × IG(a0, b0) (or Inv-χ2(ν0, τ0^2))
• Historical/expert priors:
◃ Prior knowledge on regression coefficients must be given jointly
◃ Elicitation process via distributions at covariate values
◃ Most popular: express prior based on historical data
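As an illustration of the popular NI prior p(β, σ2) ∝ σ−2, the posterior can be sampled in closed form (Method of Composition). A sketch on simulated data, not a data set from the course:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (stand-in for a real data set)
n = 200
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(0, 1.5, n)

# Under p(beta, sigma2) ∝ sigma^-2 the posterior factorizes:
# sigma2 | y ~ Inv-chi2(n-p, s2), beta | sigma2, y ~ N(beta_hat, sigma2 (X'X)^-1)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
p = X.shape[1]
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)

draws = 5000
sigma2 = (n - p) * s2 / rng.chisquare(n - p, draws)   # scaled Inv-chi2 draws
XtX_inv = np.linalg.inv(X.T @ X)
betas = np.array([rng.multivariate_normal(beta_hat, s2_draw * XtX_inv)
                  for s2_draw in sigma2])

print(betas.mean(axis=0).round(3), round(sigma2.mean(), 3))
```

Posterior means recover the generating values (intercept 1.0, slope 0.5, σ2 = 2.25) up to Monte Carlo error.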
Bayesian Biostatistics - Piracicaba 2014 304
5.6.2 Generalized linear models
• In practice the choice of NI priors is much the same as with linear models
• But a too large prior variance may not be best for sampling, e.g. in a logistic regression model
• In SAS: Jeffreys (improper) prior can be chosen
• Conjugate priors are based on fictive historical data
◃ Data augmentation priors & conditional mean priors
◃ Not implemented in classical software, but fictive data can be explicitly added and then standard software can be used
Bayesian Biostatistics - Piracicaba 2014 305
5.7 Modeling priors
Modeling prior: adapt characteristics of the statistical model
• Multicollinearity: appropriate prior avoids inflation of β
• Numerical (separation) problems: appropriate prior avoids inflation of β
• Constraints on parameters: constraint can be put in prior
• Variable selection: prior can direct the variable search
Bayesian Biostatistics - Piracicaba 2014 306
Multicollinearity
Multicollinearity: |X^T X| ≈ 0 ⇒ regression coefficients and standard errors inflated
Ridge regression:
◃ Minimize: (y∗ − Xβ)^T (y∗ − Xβ) + λ β^T β with λ ≥ 0 & y∗ = y − ȳ 1n
◃ Estimate: βR(λ) = (X^T X + λI)^{-1} X^T y∗
= Posterior mode of a Bayesian normal linear regression analysis with:
◃ Normal ridge prior N(0, τ2 I) for β
◃ τ2 = σ2/λ with σ and λ fixed
• Can be easily extended to BGLIM
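The equivalence is easy to verify numerically: the ridge estimate equals the posterior mode (= mean) of the normal posterior under a N(0, (σ2/λ)I) prior with σ fixed. A sketch with simulated, deliberately collinear covariates:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 100, 3, 5.0

X = rng.normal(size=(n, p))
# Make columns nearly collinear to mimic multicollinearity
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=n)
beta_true = np.array([1.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(size=n)
y_star = y - y.mean()                     # centered response

# Ridge estimate: (X'X + lam I)^-1 X' y*
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y_star)

# Same point as the mode (= mean) of the normal posterior obtained
# from a N(0, (sigma2/lam) I) prior, with sigma fixed at 1
sigma2 = 1.0
post_prec = X.T @ X / sigma2 + (lam / sigma2) * np.eye(p)
post_mean = np.linalg.solve(post_prec, X.T @ y_star / sigma2)

assert np.allclose(beta_ridge, post_mean)
print(beta_ridge.round(3))
```

The identity holds for any fixed σ because σ2 cancels from the posterior mean; only the ratio λ = σ2/τ2 matters.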
Bayesian Biostatistics - Piracicaba 2014 307
Numerical (separation) problems
Separation problems in binary regression models: complete separation and quasi-complete separation
Solution: Take weakly informative prior on regression coefficients
[Figure: quasi-complete separation in the (x1, x2)-plane; fits under N(0,100) and Cauchy (Gelman) priors]
Bayesian Biostatistics - Piracicaba 2014 308
Constraints on parameters
Signal-Tandmobiel® study:
• θk = probability of CE among Flemish children in school year k (k = 1, . . . , 6)
• Constraint on parameters: θ1 ≤ θ2 ≤ · · · ≤ θ6
• Solutions:
◃ Prior on θ = (θ1, . . . , θ6)T that maps all θs that violate the constraint to zero
◃ Neglect the values that are not allowed in the posterior (useful when sampling)
Bayesian Biostatistics - Piracicaba 2014 309
Other modeling priors
• LASSO prior (see Bayesian variable selection)
• . . .
Bayesian Biostatistics - Piracicaba 2014 310
5.8 Other regression models
• A great variety of models
• Not considered here: conditional logistic regression model, Cox proportional hazards model, generalized linear mixed effects models
• . . .
Bayesian Biostatistics - Piracicaba 2014 311
Take home messages
• Often prior is dominated by the likelihood (data)
• Prior in RCTs: prior to the trial
• Conjugate priors: convenient mathematically, computationally and from an interpretational viewpoint
• Conditional conjugate priors: heavily used in Gibbs sampling
• Hyperpriors: extend the range of conjugate priors, also important in Gibbs sampling
Bayesian Biostatistics - Piracicaba 2014 312
• Noninformative priors:
◃ do not exist, strictly speaking
◃ in practice vague priors (e.g. locally uniform) are ok
◃ important class of NI priors: Jeffreys priors
◃ be careful with improper priors, they might imply improper posterior
• Informative priors:
◃ can be based on historical data & expert knowledge (but only useful when reflecting the viewpoint of a community of experts)
◃ are useful in clinical trials to reduce sample size
Bayesian Biostatistics - Piracicaba 2014 313
Chapter 6
Markov chain Monte Carlo sampling
Aims:
◃ Introduce the sampling approach(es) that revolutionized the Bayesian approach
Bayesian Biostatistics - Piracicaba 2014 314
6.1 Introduction
◃ Solving the posterior distribution analytically is often not feasible due to the difficulty in determining the integration constant
◃ Computing the integral using numerical integration methods is a practical alternative if only a few parameters are involved
⇒ New computational approach is needed
◃ Sampling is the way to go!
◃ With Markov chain Monte Carlo (MCMC) methods:
1. Gibbs sampler
2. Metropolis-(Hastings) algorithm
MCMC approaches have revolutionized Bayesian methods!
Bayesian Biostatistics - Piracicaba 2014 315
Intermezzo: Joint, marginal and conditional probability
Two (discrete) random variables X and Y
• Joint probability of X and Y: probability that X=x and Y=y happen together
• Marginal probability of X: probability that X=x happens
• Marginal probability of Y: probability that Y=y happens
• Conditional probability of X given Y=y: probability that X=x happens if Y=y
• Conditional probability of Y given X=x: probability that Y=y happens if X=x
Bayesian Biostatistics - Piracicaba 2014 316
Intermezzo: Joint, marginal and conditional probability
IBBENS study: 563 (556) bank employees in 8 subsidiaries of a Belgian bank participated in a dietary study
[Figure: scatter plot of WEIGHT versus LENGTH]
Bayesian Biostatistics - Piracicaba 2014 317
Intermezzo: Joint, marginal and conditional probability
IBBENS study: frequency table
Length
Weight −150 150− 160 160− 170 170− 180 180− 190 190− 200 200− Total
−50 2 12 4 0 0 0 0 18
50− 60 1 25 50 14 0 0 0 90
60− 70 0 12 54 52 13 1 0 132
70− 80 0 5 42 72 34 0 0 153
80− 90 0 0 12 58 32 2 1 105
90− 100 0 0 0 20 18 3 0 41
100− 110 0 0 1 2 7 1 0 11
110− 120 0 0 0 2 2 1 0 5
120− 0 0 0 0 1 0 0 1
Total 3 54 163 220 107 8 1 556
Bayesian Biostatistics - Piracicaba 2014 319
Intermezzo: Joint, marginal and conditional probability
IBBENS study: joint probability
Length
Weight −150 150− 160 160− 170 170− 180 180− 190 190− 200 200− total
−50 2/556 12/556 4/556 0/556 0/556 0/556 0/556 18/556
50− 60 1/556 25/556 50/556 14/556 0/556 0/556 0/556 90/556
60− 70 0/556 12/556 54/556 52/556 13/556 1/556 0/556 132/556
70− 80 0/556 5/556 42/556 72/556 34/556 0/556 0/556 153/556
80− 90 0/556 0/556 12/556 58/556 32/556 2/556 1/556 105/556
90− 100 0/556 0/556 0/556 20/556 18/556 3/556 0/556 41/556
100− 110 0/556 0/556 1/556 2/556 7/556 1/556 0/556 11/556
110− 120 0/556 0/556 0/556 2/556 2/556 1/556 0/556 5/556
120− 0/556 0/556 0/556 0/556 1/556 0/556 0/556 1/556
Total 3/556 54/556 163/556 220/556 107/556 8/556 1/556 1
Bayesian Biostatistics - Piracicaba 2014 320
Intermezzo: Joint, marginal and conditional probability
IBBENS study: marginal probabilities
Length
Weight −150 150− 160 160− 170 170− 180 180− 190 190− 200 200− total
−50 2/556 12/556 4/556 0/556 0/556 0/556 0/556 18/556
50− 60 1/556 25/556 50/556 14/556 0/556 0/556 0/556 90/556
60− 70 0/556 12/556 54/556 52/556 13/556 1/556 0/556 132/556
70− 80 0/556 5/556 42/556 72/556 34/556 0/556 0/556 153/556
80− 90 0/556 0/556 12/556 58/556 32/556 2/556 1/556 105/556
90− 100 0/556 0/556 0/556 20/556 18/556 3/556 0/556 41/556
100− 110 0/556 0/556 1/556 2/556 7/556 1/556 0/556 11/556
110− 120 0/556 0/556 0/556 2/556 2/556 1/556 0/556 5/556
120− 0/556 0/556 0/556 0/556 1/556 0/556 0/556 1/556
Total 3/556 54/556 163/556 220/556 107/556 8/556 1/556 1
Bayesian Biostatistics - Piracicaba 2014 321
Intermezzo: Joint, marginal and conditional probability
IBBENS study: conditional probabilities
◦ Conditional probabilities of weight given length 150−160: −50: 12/54, 50−60: 25/54, 60−70: 12/54, 70−80: 5/54, other rows: 0/54 (total 54/54)
◦ Conditional probabilities of length given weight 50−60: −150: 1/90, 150−160: 25/90, 160−170: 50/90, 170−180: 14/90, other columns: 0/90 (total 90/90)
Bayesian Biostatistics - Piracicaba 2014 322
Intermezzo: Joint, marginal and conditional density
Two (continuous) random variables X and Y
• Joint density of X and Y: density f (x, y)
• Marginal density of X: density f (x)
• Marginal density of Y: density f (y)
• Conditional density of X given Y=y: density f (x|y)
• Conditional density of Y given X=x: density f (y|x)
Bayesian Biostatistics - Piracicaba 2014 323
Intermezzo: Joint, marginal and conditional density
IBBENS study: joint density
Bayesian Biostatistics - Piracicaba 2014 324
Intermezzo: Joint, marginal and conditional density
IBBENS study: marginal densities
Bayesian Biostatistics - Piracicaba 2014 325
Intermezzo: Joint, marginal and conditional density
IBBENS study: conditional densities
Conditional density of
LENGTH GIVEN WEIGHT
Conditional density of
WEIGHT GIVEN LENGTH
Bayesian Biostatistics - Piracicaba 2014 326
6.2 The Gibbs sampler
• Gibbs sampler: introduced by Geman and Geman (1984) in the context of image processing for the estimation of the parameters of the Gibbs distribution
• Gelfand and Smith (1990) introduced Gibbs sampling to tackle complex estimation problems in a Bayesian manner
Bayesian Biostatistics - Piracicaba 2014 327
6.2.1 The bivariate Gibbs sampler
Method of Composition:
• p(θ1, θ2 | y) is completely determined by:
◃ marginal p(θ2 | y)
◃ conditional p(θ1 | θ2,y)
• Split-up yields a simple way to sample from joint distribution
Bayesian Biostatistics - Piracicaba 2014 328
Gibbs sampling:
• p(θ1, θ2 | y) is completely determined by:
◃ conditional p(θ2 | θ1,y)
◃ conditional p(θ1 | θ2,y)
• Property yields another simple way to sample from joint distribution:
◃ Take starting values θ1^0 and θ2^0 (only one is needed)
◃ Given θ1^k and θ2^k at iteration k, generate the (k + 1)-th value according to the iterative scheme:
1. Sample θ1^(k+1) from p(θ1 | θ2^k, y)
2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), y)
Bayesian Biostatistics - Piracicaba 2014 329
Result of Gibbs sampling:
• Chain of vectors: θ^k = (θ1^k, θ2^k)^T, k = 1, 2, . . .
◦ Consists of dependent elements
◦ Markov property: p(θ^(k+1) | θ^k, θ^(k−1), . . . , y) = p(θ^(k+1) | θ^k, y)
• Chain depends on the starting value ⇒ initial portion (burn-in part) must be discarded
• Under mild conditions: sample from the posterior distribution = target distribution
⇒ From k0 on: summary measures calculated from the chain consistently estimate the true posterior measures
Gibbs sampler is called a Markov chain Monte Carlo method
Bayesian Biostatistics - Piracicaba 2014 330
Example VI.1: SAP study – Gibbs sampling the posterior with NI priors
• Example IV.5: sampling from the posterior distribution of the normal likelihood based on 250 alp measurements of ‘healthy’ patients with a NI prior for both parameters
• Now using the Gibbs sampler based on y = 100/√alp
• Determine two conditional distributions:
1. p(µ | σ2, y): N(µ | ȳ, σ2/n)
2. p(σ2 | µ, y): Inv-χ2(σ2 | n, s2_µ) with s2_µ = (1/n) Σ_{i=1}^n (yi − µ)^2
• Iterative procedure: at iteration (k + 1)
1. Sample µ^(k+1) from N(ȳ, (σ2)^k/n)
2. Sample (σ2)^(k+1) from Inv-χ2(n, s2_{µ^(k+1)})
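The two conditional draws translate directly into code. A sketch of the scheme; simulated values stand in for the 250 transformed alp measurements, which are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated stand-in for the 250 transformed alp measurements
y = rng.normal(7.1, 1.4, 250)
n, ybar = len(y), y.mean()

iters, burnin = 1500, 500
mu, sigma2 = 5.0, 1.0                        # arbitrary starting values
chain = np.empty((iters, 2))
for k in range(iters):
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))    # p(mu | sigma2, y)
    s2_mu = np.mean((y - mu) ** 2)
    sigma2 = n * s2_mu / rng.chisquare(n)         # scaled Inv-chi2(n, s2_mu)
    chain[k] = mu, sigma2

post = chain[burnin:]
print(post.mean(axis=0).round(2))
```

A scaled Inv-χ2(ν, s2) draw is obtained as ν·s2 divided by a χ2(ν) draw, which is what the `sigma2` line does.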
Bayesian Biostatistics - Piracicaba 2014 331
Gibbs sampling:
[Figure: four snapshots of the Gibbs steps in the (µ, σ2)-plane]
◦ Sampling from conditional density of µ given σ2
◦ Sampling from conditional density of σ2 given µ
Bayesian Biostatistics - Piracicaba 2014 332
Gibbs sampling path and sample from joint posterior:
[Figure: (a) Gibbs sampling path and (b) sample from the joint posterior in the (µ, σ2)-plane]
◦ Zigzag pattern in the (µ, σ2)-plane
◦ 1 complete step = 2 substeps (blue=genuine element)
◦ Burn-in = 500, total chain = 1,500
Bayesian Biostatistics - Piracicaba 2014 333
Posterior distributions:
[Figure: marginal posterior densities of (a) µ and (b) σ2]
Solid line = true posterior distribution
Bayesian Biostatistics - Piracicaba 2014 334
Example VI.2: Sampling from a discrete × continuous distribution
• Joint distribution: f(x, y) ∝ (n choose x) y^{x+α−1} (1− y)^{n−x+β−1}
◦ x a discrete random variable taking values in {0, 1, . . . , n}
◦ y a continuous random variable on the unit interval
◦ α, β > 0 parameters
• Question: f (x)?
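Both full conditionals are standard here: x | y ~ Binomial(n, y) and y | x ~ Beta(x + α, n − x + β), so the Gibbs sampler is two lines per iteration. A sketch (the values n = 30, α = 2, β = 4 are arbitrary choices, not from the course):

```python
import numpy as np

rng = np.random.default_rng(7)
n, alpha, beta = 30, 2.0, 4.0

iters, burnin = 20000, 500
x, y = 0, 0.5
xs = np.empty(iters, dtype=int)
for k in range(iters):
    x = rng.binomial(n, y)                   # p(x | y) = Binomial(n, y)
    y = rng.beta(x + alpha, n - x + beta)    # p(y | x) = Beta(x+a, n-x+b)
    xs[k] = x

# The marginal f(x) is beta-binomial, with mean n*alpha/(alpha+beta) = 10
print(round(xs[burnin:].mean(), 2))
```

The retained x values estimate the marginal f(x) without ever writing it down analytically, which is the point of the example.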
Bayesian Biostatistics - Piracicaba 2014 335
Marginal distribution:
[Figure: histogram of the sampled x values with the true marginal distribution]
◦ Solid line = true marginal distribution
◦ Burn-in = 500, total chain = 1,500
Bayesian Biostatistics - Piracicaba 2014 336
Example VI.3: SAP study – Gibbs sampling the posterior with I priors
• Example VI.1: now with independent informative priors (semi-conjugate prior)
◦ µ ∼ N(µ0, σ0^2)
◦ σ2 ∼ Inv-χ2(ν0, τ0^2)
• Posterior:
p(µ, σ2 | y) ∝ (1/σ0) e^{−(µ−µ0)^2/(2σ0^2)} × (σ2)^{−(ν0/2+1)} e^{−ν0 τ0^2/(2σ2)} × (1/σ^n) Π_{i=1}^n e^{−(yi−µ)^2/(2σ2)}
∝ Π_{i=1}^n e^{−(yi−µ)^2/(2σ2)} e^{−(µ−µ0)^2/(2σ0^2)} (σ2)^{−((n+ν0)/2+1)} e^{−ν0 τ0^2/(2σ2)}
Bayesian Biostatistics - Piracicaba 2014 337
Conditional distributions:
• Determine two conditional distributions:
1. p(µ | σ2, y) ∝ Π_{i=1}^n e^{−(yi−µ)^2/(2σ2)} e^{−(µ−µ0)^2/(2σ0^2)}, a normal density N(µ_k, σ_k^2) with
σ_k^2 = 1/(n/σ2 + 1/σ0^2) and µ_k = σ_k^2 (n ȳ/σ2 + µ0/σ0^2)
2. p(σ2 | µ, y): Inv-χ2(ν0 + n, [Σ_{i=1}^n (yi − µ)^2 + ν0 τ0^2]/(ν0 + n))
• Iterative procedure: at iteration (k + 1)
1. Sample µ^(k+1) from N(µ_k, σ_k^2), with µ_k and σ_k^2 evaluated at σ2 = (σ2)^k
2. Sample (σ2)^(k+1) from Inv-χ2(ν0 + n, [Σ_{i=1}^n (yi − µ^(k+1))^2 + ν0 τ0^2]/(ν0 + n))
Bayesian Biostatistics - Piracicaba 2014 338
Trace plots:
[Figure: trace plots of (a) µ and (b) σ2]
Bayesian Biostatistics - Piracicaba 2014 339
6.2.2 The general Gibbs sampler
Starting position: θ^0 = (θ1^0, . . . , θd^0)^T
Multivariate version of the Gibbs sampler — iteration (k + 1):
1. Sample θ1^(k+1) from p(θ1 | θ2^k, . . . , θ_{d−1}^k, θd^k, y)
2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), θ3^k, . . . , θd^k, y)
...
d. Sample θd^(k+1) from p(θd | θ1^(k+1), . . . , θ_{d−1}^(k+1), y)
Bayesian Biostatistics - Piracicaba 2014 340
• Full conditional distributions: p(θj | θ1, . . . , θ_{j−1}, θ_{j+1}, . . . , θd, y)
• Also called: full conditionals
• Under mild regularity conditions: θ^k, θ^(k+1), . . . ultimately are observations from the posterior distribution
With the help of advanced sampling algorithms (AR, ARS, ARMS, etc.), sampling the full conditionals is done based on the prior × likelihood
Bayesian Biostatistics - Piracicaba 2014 341
Example VI.4: British coal mining disasters data
◃ British coal mining disasters data set: # severe accidents in British coal mines from 1851 to 1962
◃ Decrease in frequency of disasters from year 40 (+ 1850) onwards?
[Figure: # disasters per year (1850 + year)]
Bayesian Biostatistics - Piracicaba 2014 342
Statistical model:
• Likelihood: Poisson process with a change point at k
◃ yi ∼ Poisson(θ) for i = 1, . . . , k
◃ yi ∼ Poisson(λ) for i = k + 1, . . . , n (n=112)
• Priors
◃ θ: Gamma(a1, b1), (a1 constant, b1 parameter)
◃ λ: Gamma(a2, b2), (a2 constant, b2 parameter)
◃ k: p(k) = 1/n
◃ b1: Gamma(c1, d1), (c1, d1 constants)
◃ b2: Gamma(c2, d2), (c2, d2 constants)
Bayesian Biostatistics - Piracicaba 2014 343
Full conditionals:
p(θ | y, λ, b1, b2, k) = Gamma(a1 + Σ_{i=1}^k yi, k + b1)
p(λ | y, θ, b1, b2, k) = Gamma(a2 + Σ_{i=k+1}^n yi, n − k + b2)
p(b1 | y, θ, λ, b2, k) = Gamma(a1 + c1, θ + d1)
p(b2 | y, θ, λ, b1, k) = Gamma(a2 + c2, λ + d2)
p(k | y, θ, λ, b1, b2) = π(y | k, θ, λ) / Σ_{j=1}^n π(y | j, θ, λ)
with π(y | k, θ, λ) = exp[k(λ − θ)] (θ/λ)^{Σ_{i=1}^k yi}
◦ a1 = a2 = 0.5, c1 = c2 = 0, d1 = d2 = 1
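The five full conditionals map one-to-one onto a Gibbs loop. A sketch run on simulated counts (rate 3 before an assumed change point at year 40, rate 1 after), since the coal mining counts themselves are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated disaster counts: rate 3 before the change point, 1 after
n, k_true = 112, 40
y = np.concatenate([rng.poisson(3.0, k_true), rng.poisson(1.0, n - k_true)])

a1 = a2 = 0.5
c1 = c2 = 0.0
d1 = d2 = 1.0
theta, lam, b1, b2, k = 1.0, 1.0, 1.0, 1.0, n // 2
cum = np.concatenate([[0.0], np.cumsum(y)])   # cum[k] = y_1 + ... + y_k

iters, burnin = 3000, 500
ks = np.empty(iters, dtype=int)
for it in range(iters):
    theta = rng.gamma(a1 + cum[k], 1.0 / (k + b1))          # rate k + b1
    lam = rng.gamma(a2 + cum[n] - cum[k], 1.0 / (n - k + b2))
    b1 = rng.gamma(a1 + c1, 1.0 / (theta + d1))
    b2 = rng.gamma(a2 + c2, 1.0 / (lam + d2))
    # Discrete full conditional of k, computed on the log scale
    j = np.arange(1, n + 1)
    logw = j * (lam - theta) + cum[1:n + 1] * np.log(theta / lam)
    w = np.exp(logw - logw.max())
    k = rng.choice(j, p=w / w.sum())
    ks[it] = k

print(np.bincount(ks[burnin:]).argmax())   # posterior mode of k
```

NumPy's gamma sampler is parameterized by shape and scale, hence the `1.0 / rate` arguments; the log-scale trick for k avoids overflow in exp[k(λ − θ)].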
Bayesian Biostatistics - Piracicaba 2014 344
Posterior distributions:
[Figure: marginal posterior distributions of θ, λ (left) and k (right)]
◦ Posterior mode of k: 1891
◦ Posterior mean of θ/λ = 3.42 with 95% CI = [2.48, 4.59]
Bayesian Biostatistics - Piracicaba 2014 345
Note:
• In most published analyses of this data set, b1 and b2 are given inverse gamma priors. The full conditionals are then also inverse gamma
• The results are almost the same ⇒ our analysis is a sensitivity analysis of the analyses seen in the literature
• Despite the classical full conditionals, the WinBUGS/OpenBUGS samplers for θ and λ are not standard gamma samplers but rather a slice sampler. See Exercise 8.10.
Bayesian Biostatistics - Piracicaba 2014 346
Example VI.5: Osteoporosis study – Using the Gibbs sampler
Bayesian linear regression model with NI priors:
◃ Regression model: tbbmc_i = β0 + β1 bmi_i + εi (i = 1, . . . , n = 234)
◃ Priors: p(β0, β1, σ2) ∝ σ−2
◃ Notation: y = (tbbmc1, . . . , tbbmc234)^T, x = (bmi1, . . . , bmi234)^T
Full conditionals:
p(σ2 | β0, β1, y) = Inv-χ2(n, s2_β)
p(β0 | σ2, β1, y) = N(r_{β1}, σ2/n)
p(β1 | σ2, β0, y) = N(r_{β0}, σ2/x^T x)
with
s2_β = (1/n) Σ (yi − β0 − β1 xi)^2
r_{β1} = (1/n) Σ (yi − β1 xi)
r_{β0} = Σ (yi − β0) xi / x^T x
Bayesian Biostatistics - Piracicaba 2014 347
Comparison with Method of Composition:
Parameter Method of Composition
2.5% 25% 50% 75% 97.5% Mean SD
β0 0.57 0.74 0.81 0.89 1.05 0.81 0.12
β1 0.032 0.038 0.040 0.043 0.049 0.040 0.004
σ2 0.069 0.078 0.083 0.088 0.100 0.083 0.008
Gibbs sampler
2.5% 25% 50% 75% 97.5% Mean SD
β0 0.67 0.77 0.84 0.91 1.10 0.77 0.11
β1 0.030 0.036 0.040 0.042 0.046 0.039 0.0041
σ2 0.069 0.077 0.083 0.088 0.099 0.083 0.0077
◦ Method of Composition = 1,000 independently sampled values
◦ Gibbs sampler: burn-in = 500, total chain = 1,500
Bayesian Biostatistics - Piracicaba 2014 348
Index plot from Method of Composition:
[Figure: index plots of (a) β1 and (b) σ2 from the Method of Composition]
Bayesian Biostatistics - Piracicaba 2014 349
Trace plot from Gibbs sampler:
[Figure: trace plots of (a) β1 and (b) σ2 from the Gibbs sampler]
Bayesian Biostatistics - Piracicaba 2014 350
Trace versus index plot:
Comparison of index plot with trace plot shows:
• σ2: index plot and trace plot similar ⇒ (almost) independent sampling
• β1: trace plot shows slow mixing ⇒ quite dependent sampling
⇒ Method of Composition and Gibbs sampling: similar posterior measures of σ2
⇒ Method of Composition and Gibbs sampling: less similar posterior measures of β1
Bayesian Biostatistics - Piracicaba 2014 351
Autocorrelation:
◃ Autocorrelation of lag 1: correlation of β1^k with β1^(k−1) (k = 1, . . .)
◃ Autocorrelation of lag 2: correlation of β1^k with β1^(k−2) (k = 1, . . .)
. . .
◃ Autocorrelation of lag m: correlation of β1^k with β1^(k−m) (k = 1, . . .)
High autocorrelation:
⇒ burn-in part is larger ⇒ takes longer to forget initial positions
⇒ remaining part needs to be longer to obtain stable posterior measures
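The lag-m autocorrelation can be estimated directly from the chain. A sketch, using an AR(1) series as a stand-in for a slowly mixing sampler (its theoretical lag-m autocorrelation is 0.8^m):

```python
import numpy as np

def autocorr(chain, max_lag=10):
    """Sample autocorrelations of a chain at lags 1..max_lag."""
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    var = x @ x / len(x)
    return np.array([x[m:] @ x[:-m] / (len(x) * var)
                     for m in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
# AR(1) chain mimicking the dependence of a slowly mixing sampler
rho, n = 0.8, 50000
eps = rng.normal(size=n)
z = np.empty(n)
z[0] = eps[0]
for t in range(1, n):
    z[t] = rho * z[t - 1] + eps[t]

print(autocorr(z, 3).round(2))   # theoretical values: 0.8, 0.64, 0.51
```

High autocorrelations like these are exactly the "slow mixing" pattern seen in the β1 trace plot: more iterations are needed for the same posterior precision.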
Bayesian Biostatistics - Piracicaba 2014 352
6.2.3 Remarks∗
• Full conditionals determine joint distribution
• Generate joint distribution from full conditionals
• Transition kernel
Bayesian Biostatistics - Piracicaba 2014 353
6.2.4 Review of Gibbs sampling approaches
Sampling the full conditionals is done via different algorithms depending on:
◃ Shape of full conditional (classical versus general purpose algorithm)
◃ Preference of software developer:
◦ SAS® procedures GENMOD, LIFEREG and PHREG: ARMS algorithm
◦ WinBUGS: variety of samplers
Several versions of the basic Gibbs sampler:
◃ Deterministic- or systematic-scan Gibbs sampler: the d dimensions are visited in a fixed order
◃ Block Gibbs sampler: the d dimensions are split up into m blocks of parameters and the Gibbs sampler is applied to the blocks
Bayesian Biostatistics - Piracicaba 2014 354
Review of Gibbs sampling approaches – The block Gibbs sampler
Block Gibbs sampler:
• Normal linear regression
◃ p(σ2 | β0, β1,y)
◃ p(β0, β1 | σ2,y)
• May speed up convergence considerably, at the expense of more computational time needed at each iteration
• WinBUGS: blocking option on
• SAS® procedure MCMC: allows the user to specify the blocks
Bayesian Biostatistics - Piracicaba 2014 355
6.3 The Metropolis(-Hastings) algorithm
Metropolis-Hastings (MH) algorithm = general Markov chain Monte Carlo technique to sample from the posterior distribution that does not require full conditionals
• Special case: Metropolis algorithm proposed by Metropolis in 1953
• General case: Metropolis-Hastings algorithm proposed by Hastings in 1970
• Became popular only after introduction of Gelfand & Smith’s paper (1990)
• Further generalization: Reversible Jump MCMC algorithm by Green (1995)
Bayesian Biostatistics - Piracicaba 2014 356
6.3.1 The Metropolis algorithm
Sketch of algorithm:
• New positions are proposed by a proposal density q
• Proposed positions will be:
◃ Accepted:
◦ Proposed location has higher posterior probability: with probability 1
◦ Otherwise: with probability proportional to ratio of posterior probabilities
◃ Rejected:
◦ Otherwise
• Algorithm satisfies again Markov property ⇒ MCMC algorithm
• Similarity with AR algorithm
Bayesian Biostatistics - Piracicaba 2014 357
Metropolis algorithm:
Chain is at θ^k ⇒ the Metropolis algorithm samples the value θ^(k+1) as follows:
1. Sample a candidate θ̃ from the symmetric proposal density q(θ̃ | θ^k)
2. The next value θ^(k+1) will be equal to:
• θ̃ with probability α(θ^k, θ̃) (accept proposal)
• θ^k otherwise (reject proposal)
with
α(θ^k, θ̃) = min( r = p(θ̃ | y) / p(θ^k | y), 1 )
The function α(θ^k, θ̃) = probability of a move
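The algorithm in code, with a standard normal density standing in for the (unnormalized) posterior; working with log densities avoids overflow in the ratio r:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(theta):
    """Toy target: standard normal log-density (up to a constant)."""
    return -0.5 * theta ** 2

iters, scale = 20000, 2.4        # proposal sd (a common tuning choice)
theta = 0.0
chain = np.empty(iters)
accepted = 0
for k in range(iters):
    prop = rng.normal(theta, scale)               # symmetric proposal
    log_r = log_post(prop) - log_post(theta)      # log of the ratio r
    if np.log(rng.uniform()) < log_r:             # accept w.p. min(r, 1)
        theta, accepted = prop, accepted + 1
    chain[k] = theta

print(f"acceptance rate = {accepted / iters:.2f}")
print(f"mean = {chain[2000:].mean():.2f}, sd = {chain[2000:].std():.2f}")
```

Note that only the unnormalized density enters through the ratio, which is why prior × likelihood suffices.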
Bayesian Biostatistics - Piracicaba 2014 358
The MH algorithm only requires the product of the prior and the likelihood to sample from the posterior
Bayesian Biostatistics - Piracicaba 2014 359
Example VI.7: SAP study – Metropolis algorithm for NI prior case
Settings as in Example VI.1, now apply Metropolis algorithm:
◃ Proposal density: N(θ^k, Σ) with θ^k = (µ^k, (σ2)^k)^T and Σ = diag(0.03, 0.03)
[Figure: (a) sampling path and (b) sample from the posterior in the (µ, σ2)-plane]
◦ Jumps to any location in the (µ, σ2)-plane
◦ Burn-in = 500, total chain = 1,500
Bayesian Biostatistics - Piracicaba 2014 360
MH-sampling:
[Figure: three snapshots of Metropolis proposals and moves in the (µ, σ2)-plane]
Bayesian Biostatistics - Piracicaba 2014 361
Marginal posterior distributions:
[Figure: marginal posterior densities of (a) µ and (b) σ2]
◦ Acceptance rate = 40%
◦ Burn-in = 500, total chain = 1,500
Bayesian Biostatistics - Piracicaba 2014 362
Trace plots:
[Figure: trace plots of (a) µ and (b) σ2]
◦ Accepted moves = blue color, rejected moves = red color
Bayesian Biostatistics - Piracicaba 2014 363
Second choice of proposal density:
◃ Proposal density: N(θ^k, Σ) with θ^k = (µ^k, (σ2)^k)^T and Σ = diag(0.001, 0.001)
[Figure: (a) sampling path in the (µ, σ2)-plane and (b) posterior density of σ2]
◦ Acceptance rate = 84%
◦ Poor approximation of true distribution
Bayesian Biostatistics - Piracicaba 2014 364
Accepted + rejected positions:
[Figure: accepted and rejected positions for proposal variances 0.03, 0.001 and 0.1]
Bayesian Biostatistics - Piracicaba 2014 365
Problem:
What should the acceptance rate be for a good Metropolis algorithm?
From theoretical work + simulations:
• Acceptance rate: ≈ 45% for d = 1 and ≈ 24% for d > 1
Bayesian Biostatistics - Piracicaba 2014 366
6.3.2 The Metropolis-Hastings algorithm
Metropolis-Hastings algorithm:
Chain is at θ^k ⇒ the Metropolis-Hastings algorithm samples the value θ^(k+1) as follows:
1. Sample a candidate θ̃ from the (asymmetric) proposal density q(θ̃ | θ^k)
2. The next value θ^(k+1) will be equal to:
• θ̃ with probability α(θ^k, θ̃) (accept proposal)
• θ^k otherwise (reject proposal)
with
α(θ^k, θ̃) = min( r = [p(θ̃ | y) q(θ^k | θ̃)] / [p(θ^k | y) q(θ̃ | θ^k)], 1 )
Bayesian Biostatistics - Piracicaba 2014 367
• Reversibility condition: Probability of move from θ to θ = probability of movefrom θ to θ
• Reversible chain: chain satisfying reversibility condition
• Example asymmetric proposal density: q(θ | θk) ≡ q(θ) (Independent MHalgorithm)
• WinBUGS makes use of a univariate MH algorithm to sample from some non-standard full conditionals
Example VI.8: Sampling a t-distribution using Independent MH algorithm
Target distribution: t3(3, 2²)-distribution
(a) Independent MH algorithm with proposal density N(3, 4²)
(b) Independent MH algorithm with proposal density N(3, 2²)
[Figure: histograms of the sampled values for proposals (a) N(3, 4²) and (b) N(3, 2²)]
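Proposal (a) can be mimicked with a short Python sketch (illustrative, not the course code). Only unnormalized log densities are needed, since the normalizing constants cancel in the ratio r:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_t(x, df=3, loc=3.0, scale=2.0):
    # unnormalized log density of the t3(3, 2^2) target
    z = (x - loc) / scale
    return -(df + 1) / 2 * np.log(1 + z * z / df)

def log_q(x, loc=3.0, scale=4.0):
    # unnormalized log density of the N(3, 4^2) proposal of case (a)
    return -0.5 * ((x - loc) / scale) ** 2

theta, chain = 3.0, []
for _ in range(5000):
    cand = rng.normal(3.0, 4.0)        # drawn independently of the current state
    # independent proposal: q(cand | theta) = q(cand), q(theta | cand) = q(theta)
    log_r = log_t(cand) + log_q(theta) - log_t(theta) - log_q(cand)
    if np.log(rng.uniform()) < log_r:  # accept with probability min(r, 1)
        theta = cand
    chain.append(theta)
chain = np.array(chain)
```

The wide N(3, 4²) proposal covers the heavy tails of the target; the narrower N(3, 2²) proposal of case (b) underweights the tails and approximates the target poorly.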
6.3.3 Remarks*
• The Gibbs sampler is a special case of the Metropolis-Hastings algorithm, but the Gibbs sampler is still treated differently
• The transition kernel of the MH-algorithm
• The reversibility condition
• Difference with AR algorithm
6.5. Choice of the sampler
Choice of the sampler depends on a variety of considerations
Example VI.9: Caries study – MCMC approaches for logistic regression
Subset of n = 500 children of the Signal-Tandmobiel® study at the 1st examination:
◃ Research questions:
◦ Do girls have a different risk of developing caries experience (CE) than boys (gender) in the first year of primary school?
◦ Is there an east-west gradient (x-coordinate) in CE?
◃ Bayesian model: logistic regression + N(0, 100²) priors for the regression coefficients
◃ No standard full conditionals
◃ Three algorithms:
◦ Self-written R program: evaluate full conditionals on a grid + ICDF-method
◦ WinBUGS program: multivariate MH algorithm (blocking mode on)
◦ SAS® procedure MCMC: Random-Walk MH algorithm
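As the caries data are not included in the slides, the Random-Walk MH route can be illustrated on simulated data with the same three covariates and N(0, 100²) priors (a Python sketch, not any of the three programs above):

```python
import numpy as np

rng = np.random.default_rng(3)

# simulated stand-in for the caries data: intercept, gender (0/1), x-coordinate
n = 500
X = np.column_stack([np.ones(n),
                     rng.integers(0, 2, n).astype(float),
                     rng.uniform(0.0, 200.0, n)])
beta_true = np.array([-0.6, 0.0, 0.005])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def log_post(beta):
    # logistic log-likelihood + N(0, 100^2) priors on the coefficients
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta))) - np.sum(beta ** 2) / (2 * 100 ** 2)

beta, chain = np.zeros(3), []
prop_sd = np.array([0.1, 0.1, 0.001])   # ad-hoc per-coefficient proposal sd
for _ in range(5000):
    cand = beta + rng.normal(0, prop_sd)
    if np.log(rng.uniform()) < log_post(cand) - log_post(beta):
        beta = cand
    chain.append(beta.copy())
chain = np.array(chain)[1000:]          # discard a burn-in of 1,000
```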
Program   Parameter   Mode      Mean      SD       Median    MCSE
MLE       Intercept   -0.5900             0.2800
          gender      -0.0379             0.1810
          x-coord      0.0052             0.0017
R         Intercept             -0.5880   0.2840   -0.5860   0.0104
          gender                -0.0516   0.1850   -0.0578   0.0071
          x-coord                0.0052   0.0017    0.0052   6.621E-5
WinBUGS   Intercept             -0.5800   0.2810   -0.5730   0.0094
          gender                -0.0379   0.1770   -0.0324   0.0060
          x-coord                0.0052   0.0018    0.0053   5.901E-5
SAS®      Intercept             -0.6530   0.2600   -0.6450   0.0317
          gender                -0.0319   0.1950   -0.0443   0.0208
          x-coord                0.0055   0.0016    0.0055   0.00016
Conclusions:
• Posterior means/medians of the three samplers are close (to the MLE)
• The precision with which the posterior mean was determined (high precision = low MCSE) differs considerably
• The clinical conclusion was the same
⇒ Samplers may have quite a different efficiency
Take home messages
• The two MCMC approaches allow fitting basically any proposed model
• There is no free lunch: computation time can be MUCH longer than with likelihood approaches
• The choice between Gibbs sampling and the Metropolis-Hastings approach depends on computational and practical considerations