Lecture 8 Other MCMC features in MLwiN and the MLwiN->WinBUGS interface.

transcript

Lecture 8

Other MCMC features in MLwiN and the MLwiN->WinBUGS

interface

Lecture Contents

• Convergence diagnostics.

• Starting values.

• MLwiN demonstration.

• MLwiN to WinBUGS interface.

• t distributed residuals example.

Recap of practical session

The practical covered two features of MCMC sampling that we will cover in more detail here:

1. Convergence diagnostics and summary statistics.

2. Choosing starting values for unknown parameters.

Reunion Island Model revisited

We firstly fitted the following model to the dataset:

Trajectories window

• By default we have the last 500 iterations• Clearly some chains mix better than others!

MCMC diagnostics for 0

• We will describe each pane separately

Trace plot

• This graph plots the generated values of the parameter against the iteration number.

• A crude test of mixing is the ‘blue finger’ test.

• This chain doesn’t mix that well but could be worse!

Kernel Density plot

This plot is like a smoothed histogram.Instead of counting the estimates into bins of particular

widths like a histogram, the effect of each iteration is spread around the estimate via a Kernel function e.g. a normal distribution.

This means that at each point we get the sum of the Kernel function parts for each iteration.

The Kernel density plot has a smoothness parameter that can be modified.

Time series diagnostics

Here we have the Auto correlation function (ACF) and partial autocorrelation function (PACF) plots.

The ACF measures how correlated the values in the chain are with their close neighbours. The lag is the distance between the two chains to be compared.

An independent chain will have approximately zero autocorrelation at each lag.

A Markov chain should have a power relationship in the lags i.e. if ACF(1) = then ACF(2) = 2 etc. This is known as an AR(1) process.

The PACF measures discrepancies from such a process and so should normally have values 0 after lag 1.

Accuracy Diagnostics

MLwiN has 2 accuracy diagnostics:• Raftery-Lewis works on quantiles of distribution.• Brooks-Draper works on quoting the mean to n

significant figures. It’s formulae uses the estimate, it’s s.d. and the ACF and it can often give very small or very large values!

Raftery Lewis Diagnostic

This diagnostic is based on the properties of a 2 state Markov chain.

For a given quantile the chain is dichotomized into values that are greater (1) or less than (0) the value of the quantile.

For an independent chain the 0’s and 1’s should appear randomly. If there is large clustering (in time) of 0’s or 1’s then the chain is mixing badly.

Summary Statistics

Three estimates of location are given:

Mean – from the chain.

Mode – from the kernel plot.

Median (50% quantile) – by sorting the chain and finding the middle value.

The SD is calculated from the chain and the other quantiles are used to give (possibly) non-symmetric interval estimates.

The MCSE and ESS will be discussed next.

Monte Carlo Standard Error

The Monte Carlo Standard Error (MCSE) is an indication of how much error is in the estimate due to the fact that MCMC is used.

As the number of iterations increases the MCSE0.

For an independent sampler it equals the SD/n.

However it is adjusted due to the autocorrelation in the chain.

The graph above gives estimates for the MCSE for longer runs.

Effective Sample Size

This quantity gives an estimate of the equivalent number of independent iterations that the chain represents.

This is related to the ACF and the MCSE.

Its formula is:

For this parameter our 5,000 actual iterations are equivalent to only 344 independent iterations!

.)(21 where/1

MCMC diagnostics for 2u

This is the worst mixing parameter with a suggestion of a run of over 100k iterations being required.

MCMC diagnostics for 2u

After 100,000 iterations

Note that now the diagnostics are all satisfied.

Starting Values in MLwiN

By default MLwiN uses the values currently stored for each parameter.

As we usually run MCMC directly after IGLS this means we use IGLS estimates as starting values.

These values are a good place to start and hence burnin will often be short.

This is OK for most multilevel models as we know they should be unimodal.

However it is better to try several starting values to ensure we have convergence.

WinBUGS allows multiple chains to be run which allows multiple chain MCMC diagnostics.

Diagnostics in WinBUGS

WinBUGS uses a multiple chain diagnostic by Brooks, Gelman and Rubin.

This is based on the ratio of between:within chain variance (ANOVA).

WinBUGS produces plots of:• Average 80% interval within-chains (blue) and pooled

80% interval between chains (green) which should converge to stable values.

• Ratio pooled:average interval widths (red) – should converge to 1.0.

• To print values of the diagnostic double click on the plot and press CTRL left hand mouse button.

MCSE in WinBUGS

Accuracy of the posterior estimates can be assessed by the Monte Carlo standard error for each parameter.

If samples were generated independently could estimate SE of mean as

is the sample variance.This will underestimate the true MCSE due to

autocorrelation in the chain.WinBUGS uses a ‘batch means’ method – replace sample

variance S2 by variance of batched means that are assumed independent.

MLwiN replaces actual sample size with effective sample size.

1122 )( where/ i

BUGS history

Bayesian inference Using Gibbs Sampling:• Language for specifying complex Bayesian models.• Constructs object-oriented internal representation of the

model.• Builds up an arbitrary complex model through

specification of local structure.• Simulation from full conditionals using Gibbs sampling• Current version (WinBUGS 1.4) runs in Windows, and

incorporates a script language for running in batch mode.

• ‘Classic’ BUGS available for UNIX but this is an old version.

WinBUGS is freely available from http://www.mrc-bsu.cam.ac.uk

WinBUGS common distributionsExpression Distribution Usage

dbin binomial r ~ dbin(p,n)

dnorm normal x ~ dnorm(mu,tau)

dpois Poisson r ~ dpois(lambda)

dunif uniform x ~ dunif(a,b)

dgamma gamma x ~ dgamma(a,b)

NB: As noted earlier the Normal is parameterised in terms of its mean and precision = 1/variance= 1/sd2.

See ‘Model Specification/The BUGS language: stochastic nodes/ Distributions’ in manual for full syntax.

Note: Functions cannot be used as argument in distributions!! You need to create additional nodes

Some Built in WinBUGS functions

• p <- step(x-.5) : = 1 if x ≥0.5, 0 otherwise hence can be used to monitor when x ≥0.5

• p <- equals(x,.5) : = 1 if x=0.5, 0 otherwise. • tau <- 1/pow(s,2) sets • s <- 1/sqrt(tau) sets • p[i,k] <- inprod(pi[], Lambda[i,,k]) sets

• See ‘Model Specification/Logical nodes’ in manual for full syntax.

j ijkjikp

2/1 s/1s

Graphical Models

Model buildingStatistical modelling of complex systems involve usually

many interconnected random variables.How to we build the connections?Key idea: conditional independenceIt is helpful to represent the modelling process by a graph• Nodes: all random quantities• Links (directed or undirected): association between the

nodesDirected edges: natural ordering of association, “causal”

influenceUndirected edges: symmetric association, correlationThe graph is used to represent a set of conditional

independence statements

Graphical Model: Doodle in WinBUGS

The following is a doodle for the rats example in WinBUGS

for(j IN 1 : T)

for(i IN 1 : N)

sigma.alpha sigma.beta

Y[i, j]

mu[i, j]

beta[i]alpha[i]

tau.betabeta.calpha0alpha.ctau.alpha

The MLwiN to WinBUGS interface

MLwiN has an interface that will generate WinBUGS code for an MLwiN model.

The interface will save the model in two possible versions of WinBUGS and will input back the WinBUGS output in CODA format and use MLwiN diagnostics.

Model Descriptionmodel{# Level 1 definitionfor(i in 1:N) {lncfs[i] ~ dnorm(mu[i],tau)mu[i]<- beta[1] * cons[i] + u2[cow[i]] * cons[i] + u3[herd[i]] * cons[i] }# Higher level definitionsfor (j in 1:n2) {u2[j] ~ dnorm(0,tau.u2)}for (j in 1:n3) {u3[j] ~ dnorm(0,tau.u3)}# Priors for fixed effectsbeta[1] ~ dflat()# Priors for random termstau ~ dgamma(0.001000,0.001000)sigma2 <- 1/tautau.u2 ~ dgamma(0.001000,0.001000)sigma2.u2 <- 1/tau.u2tau.u3 ~ dgamma(0.001000,0.001000)sigma2.u3 <- 1/tau.u3}

WinBUGS describes the model in a mixture of deterministic and stochastic statements.

The interface takes the names from the MLwiN columns and uses some other standard names e.g. beta, sigma2.

Note this means there is some redundancy e.g. cons is included.

Caution: Care must be taken with some column names.

Data description

Note that the data is stored in the file after the word list and between parentheses.

For example:list(N= 3027, n2 = 1575, n3 = 50,

cow = c(1,1,2,2,3,4,5,5,5,5,6,6,6,7,7,7,8,8,8,9,

9,9,9,10,10,10,11,11,12,12,13,14,14,14,14,15,15,15,16,16,

16,17,17,17,17,18,18,18,18,19,20,20,20,21,21,21,22,22,22,22,

23,23,23,23,24,24,25,26,27,27,28,29,29,29,30,30,31,31,32,33,

33,33,33,34,34,34,35,35,35,36,36,37,38,39,39,40,40,40,41,41,

41,41,42,42,42,43,43,43,44,44,44,44,45,45,45,46,46,47,47,47,

….),….

Note some notation e.g. c for vectors borrowed from R and S-plus.

Initial values in WinBUGS

Note that the initial values are also stored in the file after the word list and between parentheses.

For example:list(beta= c(4.217674),

u2 = c( 0.062732,-0.020885,0.054452,0.125001,-0.063034,-0.028254,-0.013445,-0.137141,-0.056420,-0.017268,

0.025402,-0.013377,-0.013452,-0.018461,-0.103801,-0.095923,0.031627,-0.019062,-0.026862,-0.037929,

0.152077,0.008395,-0.015221,0.047518,-0.029718,0.041764,-0.040790,-0.055156,0.048479,-0.036886,

The MLwiN->WinBUGS interface has given the IGLS starting values to WinBUGS.

Running the reunion island dataset in WinBUGS

The model was run for 3 parallel chains using MLwiN starting values and 2 other sets.

We run the 3 chains for 5,000 after a 500 burnin

Even the worst mixing parameter shows overlap in the three chains (see below)

sigma2.u2 chains 1:3

iteration

501 2000 4000

Running the reunion island dataset in WinBUGS

Brooks,Gelman & Rubin Diagnostic

Summary Statistics for the runnode mean sd (2.5%, 97.5%)beta[1] 4.217 0.0192 (4.18,4.255)sigma2 0.135 0.0048 (0.126,0.145)sigma2.u2 0.0193 0.0041 (0.0117,0.0274)sigma2.u3 0.0152 0.0039 (0.0092,0.0244)

sigma2.u2 chains 1:3

iteration

501 2000 4000

Testing the sensitivity to the Normal assumption

When we look at plots of residuals in a multilevel model we often see outlying residuals.

These may be genuine outlying higher level units that do not follow the same distribution or it may be that the Normal distribution is not appropriate.

The Normal is a member of the t family of distributions and other members of the family have longer tails and so may be more appropriate.

We will investigate this in the practical.

Information on the practical

In the practical we will look at how to fit the variance components model for the education dataset using WinBUGS.

We will also examine sensitivity to the Normal assumption by considering instead a t distribution with unknown degrees of freedom and with fixed degrees of freedom (8).

Lecture 8 Other MCMC features in MLwiN and the MLwiN->WinBUGS interface.

Documents