
Longitudinal Data Analysis

Mike Allerhand

This document has been produced for the CCACE short course: Longitudinal Data Analysis. No part of this document may be reproduced, in any form or by any means, without permission in writing from the CCACE. The CCACE is jointly funded by the University of Edinburgh and four of the United Kingdom’s research councils, BBSRC, EPSRC, ESRC, MRC, under the Lifelong Health and Wellbeing initiative. The document was generated using the R knitr package and typeset using MiKTeX, (LaTeX for Windows), with beamer and TikZ packages. © 2016, Dr Mike Allerhand, CCACE Statistician.


Straight-line growth

Wide format

Occasion:  1     2     3     4     5
Measure:   1.75  2.50  3.25  4.00  4.75

Observations at 5 measurement occasions.

Long format

j  y     x
1  1.75  1.00
2  2.50  2.00
3  3.25  3.00
4  4.00  4.00
5  4.75  5.00

[Figure: the measures y plotted against time x.]

Most longitudinal analysis programs require data in “long format”.

Wide format is one row per case, and each row is a complete record. Here there is just one case: measures of something on 5 successive occasions.

Long format is wide format reshaped so that repeated measures of a variable are stacked into a column, here y.

It also needs a column to indicate which measurement occasion, (wave or time-point), each measure belongs to, here j = 1, . . . , 5.

The measurement times are x = 1, . . . , 5. Here x represents units of time. The coding assumes equal time intervals between successive measurements.

The graph shows the measures y plotted against time x.
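The reshaping itself is one call in R. A minimal sketch using base R's reshape function; the data frame and column names are illustrative, not part of the course data:

R: # One case in wide format: one column per occasion.
   wide <- data.frame(id = 1, y.1 = 1.75, y.2 = 2.50, y.3 = 3.25, y.4 = 4.00, y.5 = 4.75)
   # Stack the repeated measures into one column y, with x indicating the occasion.
   long <- reshape(wide, direction = "long",
                   varying = paste0("y.", 1:5),
                   v.names = "y", timevar = "x", idvar = "id")
   long[order(long$x), ]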

Straight-line growth

Equation of the straight line:

yj = β0 + β1xj,   j = 1, . . . , 5

[Figure: a straight line with intercept β0, the value of y at x = 0, and slope β1, the change in y per unit increase of x.]

The straight line is a model of how the measurements y change over time x.

The parameters of the model are the intercept β0 and slope β1. The intercept β0 is y when x = 0. The slope β1 is the change in y per unit x, the change in y when x increases by 1.

This particular model assumes a constant growth rate: y changes by the same amount for ANY unit increase of x. The model does not show how the rate of growth might tail off.

This model is a perfect fit to these data.
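To make the perfect fit concrete, a small sketch fitting the line to the toy data above with R's lm; the data frame name d is illustrative:

R: d <- data.frame(x = 1:5, y = c(1.75, 2.50, 3.25, 4.00, 4.75))
   fit <- lm(y ~ x, data = d)
   coef(fit)      # intercept 1.00, slope 0.75
   cor(d$x, d$y)  # r = 1: a perfect fit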

Regression by ordinary least squares (OLS)

[Figure: left, data lying exactly on a line (r = 1); right, data scattered around a line, with one residual error ej marked (−1 < r < 1).]

Pearson correlation r = 1 indicates a perfect fit to a straight line.

Correlation −1 < r < 1 indicates there is some “residual error”: a straight line is not a perfect fit.

How do we choose the “best” line if no straight line is a perfect fit?

OLS is a procedure for estimating the parameters of a regression model, such as a straight line.

OLS estimates β0 and β1 for the line with the smallest “residual variance”.

The residual variance σ²e is the variance of the residual errors ej around the line.
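The OLS estimates have simple closed forms. A sketch using the toy data d from above; the object names are illustrative:

R: b1 <- cov(d$x, d$y) / var(d$x)    # slope: Sxy / Sxx
   b0 <- mean(d$y) - b1 * mean(d$x)  # intercept: the line passes through (x̄, ȳ)
   c(b0, b1)                         # same as coef(lm(y ~ x, data = d))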

Residual variance

Straight line regression model:

yj = β0 + β1xj + ej,   where β0 + β1xj is the fixed part and ej is the random part.

The residuals ej are assumed to be random measurement errors, as if drawn at random from a normal population with mean 0 and variance σ²e:

ej ∼ N(0, σ²e)

The errors have zero mean because they vary symmetrically around the line. The consequences of that assumption are:

1. Any regression line always passes through the point (x̄, ȳ): ȳ = β0 + β1x̄.

2. If β1 = 0 the intercept β0 is ȳ, the mean response: yj = β0 + ej is a model of the mean.

To see this, take the mean of both sides of y = β0 + β1x + e:

ȳ = (1/n) Σ (β0 + β1x + e)
  = (1/n) Σ β0 + (1/n) Σ β1x + (1/n) Σ e
  = β0 + β1 (1/n) Σ x + (1/n) Σ e

But Σ e = 0 because the errors have zero mean, and (1/n) Σ x = x̄, so:

ȳ = β0 + β1x̄

Therefore the point (x̄, ȳ) always lies on the regression line. If β1 = 0 then β0 = ȳ.

This is also true for multiple regression:

ȳ = β0 + β1x̄1 + β2x̄2 + . . .

Unconditional and conditional models

Unconditional model of the mean:

yj = β0 + ej

[Figure: a flat line at height ȳ, with residual variance σ²e around it.]

Conditional model of the mean:

yj = β0 + β1xj + ej

[Figure: a sloped regression line, with residual variance σ²e around it.]

The unconditional model is intercept-only. There is no slope (it is flat), so the intercept β0 estimates the mean response ȳ. Here the mean response is assumed not to depend upon x.

The conditional model is “conditional” upon an explanatory variable x. Here the mean response is assumed to be different at different values of x.

The intercept β0 estimates the mean response when x = 0. The slope β1 estimates the change in the mean per unit increase of x.

If x is mean-centered so that x̄ = 0, the intercept β0 = ȳ.

In the unconditional model, residual variance σ²e equals response variance, Var(y).

The conditional model has less residual variance: part of Var(y) is explained by x. Var(y) is decomposed into two parts:
a. The part explained by the straight-line relationship with x.
b. The part that is unexplained residual variance σ²e.

Residual variance of 0 would indicate all of Var(y) is explained by x. In that sense the size of the residual variance tells how closely the data fit the regression line, (how well x explains the variation in y).
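A quick sketch of the two models in R, again using the toy data d; the object names are illustrative:

R: m0 <- lm(y ~ 1, data = d)       # unconditional: the intercept estimates the mean response
   m1 <- lm(y ~ x, data = d)       # conditional upon x
   coef(m0); mean(d$y)             # the intercept equals ȳ
   var(resid(m0)); var(resid(m1))  # residual variance falls as x explains Var(y)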

OLS assumptions (1)

Residuals should be:

1. Homoskedastic. The same residual variance σ²e at all x.

2. Un-correlated. Residuals should not depend upon each other.

These assumptions are used to derive a formula for the variance of the slope estimator:

Var(β1) = σ²e / ((n − 1)σ²x)

Its square root is the slope standard error.

[Figure: a regression line enclosed in a box of height σ²e and width σ²x.]

If these assumptions are violated, standard errors will be incorrect. Then confidence intervals and p-values will also be incorrect.

The box around the regression line represents the slope standard error.

The standard error is the standard deviation of the estimator's sampling distribution. Small standard error indicates greater precision: results are more repeatable.

Small standard error, (represented by a long thin box), is given by low residual variance, more degrees-of-freedom, and greater range of x.

Including more explanatory variables in the model does not always improve it. Overfitting a specific sample loses generality.

- Residual variance goes down.
- Standard errors may increase.
  - More parameters to estimate loses degrees-of-freedom.
  - Explanatory variables may confound each other, reducing each variable's unique variance.

Aim for a parsimonious model with acceptable fit.

OLS assumptions (2)

You have to assume a functional form for the relationship. A straight line is not the only model. It may be “mis-specified”.

[Figure: two scatterplots of y against x, contrived to have identical variances and covariance; the left is roughly linear, the right clearly curved.]

These two datasets are contrived to have identical variances and covariance. Correlation is blind to the difference.

“Linear correlation” only knows about straight lines. Fitting a straight-line regression model to both datasets, the estimates, standard errors, and p-values are identical.

You have to compare different models fitted to the same data. Compare their goodness-of-fit and test the difference.

Compared with a straight-line model, a quadratic model is a much better fit for the data on the right. It also has much lower standard errors.

A small dataset

Wide format

      1    2    3    4
 1  108   96  110  122
 2  103  117  127  133
 3   96  107  106  107
 4   84   85   92   99
 5  118  125  125  116
 6  110  107   96   91
 7  129  128  123  128
 8   90   84  101  113
 9   84  104  100   88
10   96  100  103  105
11  105  114  105  112
12  113  117  132  130

Each row is one person's repeated measures. Each column is a measurement occasion.

Long format

 i  j    y
 1  1  108
 1  2   96
 1  3  110
 1  4  122
 2  1  103
 2  2  117
 2  3  127
 .  .  ...
12  1  113
12  2  117
12  3  132
12  4  130

i'th person, j'th measurement occasion.

These data are Table 11.5 in: Maxwell & Delaney (1990) Designing Experiments and Analyzing Data.

12 children were tested at age 30, 36, 42, and 48 months, (McCarthy scale of children's abilities).
1. Is there, on average, systematic growth in ability over time?
2. Is there variability in growth over time?

In wide format each row is a case: one subject's record of observations. Long format is wide format reshaped so that repeated measures are stacked.

Long format needs extra columns for indicator variables. i and j indicate which person and which time-point each measurement belongs to. yij denotes a measurement of the i'th person at the j'th time-point.

Time-points j are repeated within each person i, (and vice versa). Each pair (i, j) is unique because the indices are nested.
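A sketch of this reshape in R, entering the wide table above by columns; the data frame and column names are illustrative:

R: wide <- data.frame(i = 1:12,
                      y.1 = c(108,103, 96, 84,118,110,129, 90, 84, 96,105,113),
                      y.2 = c( 96,117,107, 85,125,107,128, 84,104,100,114,117),
                      y.3 = c(110,127,106, 92,125, 96,123,101,100,103,105,132),
                      y.4 = c(122,133,107, 99,116, 91,128,113, 88,105,112,130))
   long <- reshape(wide, direction = "long", varying = paste0("y.", 1:4),
                   v.names = "y", timevar = "j", idvar = "i")
   long <- long[order(long$i, long$j), ]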

Time as an independent variable

Metrics

- Coded time-points, (eg. 0, 1, 2, 3).
- Chronological age, (eg. 30, 36, 42, 48 months).
- Time since baseline, (eg. 0, 6, 12, 18 months).
- Any meaningful non-decreasing measure.

 i  j    y   x
 1  1  108  30
 1  2   96  36
 1  3  110  42
 1  4  122  48
 2  1  103  30
 2  2  117  36
 2  3  127  42
 .  .  ...  ..
12  1  113  30
12  2  117  36
12  3  132  42
12  4  130  48

Mixed effects models treat time as data. Time enters the model as an independent variable.

Here variable x is time as chronological age in months.

The growth rate is the slope of the response per unit time, (per month).

These data are “strongly balanced”: everyone has the same time-points, the same baseline times and intervals, (here the intervals are all equal, but that is not strictly necessary), and no-one has any missing time-points.
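In R the age variable can be derived from the occasion indicator, a one-line sketch continuing the long data frame above (occasions j = 1, . . . , 4 map to 30, 36, 42, 48 months):

R: long$x <- 24 + 6 * long$j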

Pooled and subject-specific data

[Figure: left, “Pooled data”: all measures y (90-130) plotted against age x (30-48 months); right, “Subject-specific data”: the same points joined person-by-person as a spaghetti plot.]

“Pooled” data are irrespective of grouping by subject.

Subject-specific data are indicated by a “spaghetti plot”: joining the dots that belong to a specific subject.

Pooled and subject-specific data

[Figure: left, “Pooled regression line” fitted to all the data; right, “Subject-specific regression lines”, one per person.]

Subject-specific regression lines often show growth “fan-in” or “fan-out”. Here there is fan-in, (except for some unusual subjects).

If the data are strongly balanced, (same time-points, none missing), the pooled regression line is the average of the subject-specific regression lines.

The intercept of the pooled line is the average subject-specific intercept. The slope is the average subject-specific slope.

Fitting a straight-line model

R: fit = lm(y ∼ x, data)

summary(fit)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 88.5000 11.6707 7.583 1.23e-09

x 0.5000 0.2949 1.695 0.0968

R: fit = lmer(y ∼ x + (x|i), data)

summary(fit)

Random effects:

Groups Name Variance Std.Dev. Corr

i (Intercept) 735.6105 27.1221

x 0.4037 0.6354 -0.90

Residual 34.8167 5.9006

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 88.5000 9.3028 11.0000 9.513 1.21e-06

x 0.5000 0.2231 11.0000 2.241 0.0466


The upper table shows a regression model fitted to pooled data by OLS. The lower table shows a mixed-effects model fitted to subject-specific data by REML, (restricted maximum likelihood).

The coefficients of the pooled analysis are the same as the “fixed effects” of the mixed-effects model, (because these data are strongly balanced). But the standard errors, and hence p-values, are different.

The growth rate (x) is non-significant in the pooled analysis. But its standard errors are incorrect because these data violate OLS assumptions.

It is (just) significant in the mixed-effects model. This is achieved by accounting for individual growth, (blocking on persons).

The mixed-effects model has some additional parameters: the “random effects”. These represent variation around the average effects due to subject-specific differences.

The intercept estimate is the expected response when time x = 0. The intercept variance is the variation in intercepts between subjects.

These things have no meaning for a subject age 0. Centre time to give meaning to the intercept and its variance.
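For reference, a sketch of the assumed R setup behind the lmer calls above: lmer comes from the lme4 package, and the df and p-value columns in the output suggest the lmerTest package is also loaded (an assumption; plain lme4 does not print them):

R: library(lme4)
   library(lmerTest)  # adds Satterthwaite df and p-values to summary()
   fit <- lmer(y ~ x + (x | i), data = long)
   summary(fit)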

Centering time

Centering time gives meaning to the intercept. The “centre” is 0 on a continuous scale. Centre time by subtracting a value from the time variable.

Centre x on the average baseline age: xij = xij − x̄1. Subtract 30 months from each x value.

 i  j    y   x
 1  1  108   0
 1  2   96   6
 1  3  110  12
 1  4  122  18
 2  1  103   0
 2  2  117   6
 2  3  127  12
 .  .  ...  ..
12  1  113   0
12  2  117   6
12  3  132  12
12  4  130  18

Long format makes centering and scaling easy. Subtract a mean, or some substantively meaningful time value close to the mean.

Choose the centre to give meaning to the intercept: for example, the expected response at the average baseline age, at the overall average age, or at some particular age.

Note: if both time x and the response y are mean-centered, (eg. standardized), then the intercept becomes 0, (at the point (x̄, ȳ)).

Centering can change intercept variance and intercept-slope covariance, depending upon fan-in/out of subject-specific slopes.

Centering on a time where fan-in/out is large makes the intercept variance large. Changing the intercept variance also changes the intercept-slope covariance.

Some other reasons for centering time are:

1. It reduces collinearity in quadratic (and higher-order polynomial) models.
2. It can change the size and direction of a TIC direct effect, (if the TIC has a significant interaction with time).
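In long format the centering is one line in R, a sketch continuing the data above:

R: long$x <- long$x - 30  # centre time on the average baseline age (30 months)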

Fitting a straight-line model

R: fit = lm(y ∼ x, data)

summary(fit)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 103.5000 3.3104 31.265 <2e-16

x 0.5000 0.2949 1.695 0.0968

R: fit = lmer(y ∼ x + (x|i), data)

summary(fit)

Random effects:

Groups Name Variance Std.Dev. Corr

i (Intercept) 169.8532 13.0328

x 0.4037 0.6354 -0.41

Residual 34.8167 5.9006

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 103.5000 4.0231 11.0000 25.726 3.54e-11

x 0.5000 0.2231 11.0000 2.241 0.0466


Re-fitting the same models as before. The only difference is that x is now centered on the average baseline age.

Now the intercept represents the expected response at 30 months.

Again the coefficients of the regression model are the same as the fixed effects of the mixed-effects model, but the standard errors, and hence p-values, are different.

Variation in the data

Variance-covariance matrix of repeated measures:

          | σ²1  σ12  σ13  σ14 |
Cov(Y) =  | σ21  σ²2  σ23  σ24 |
          | σ31  σ32  σ²3  σ34 |
          | σ41  σ42  σ43  σ²4 |

Variances on the diagonal. Covariances off the diagonal.

[Figure: scatterplot matrix of the four repeated measures against each other.]

Heteroskedasticity: different variance at different time-points. Serial correlation: non-zero covariance across time.

OLS assumptions: equal variance on the diagonal, 0 covariance off the diagonal.

1. Why are data serially correlated? Because the same panel is measured repeatedly over time. Some individuals' measures are all relatively high, others relatively low. (The more so when there is greater difference between than within persons.) Dependency upon previous observations may also come from “practice effects”.

2. Why are data heteroskedastic? Growth trajectories tend to “fan-in” or “fan-out”. (Typically fan-in during development, fan-out during decline.) This makes the variance of measures different at different time-points.

Highly differential growth leads to independent measures. Consistent growth patterns lead to variance-covariance structure.

The aim is to exploit patterns to account for individual growth and change in the context of many different individuals.
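The observed matrix can be inspected directly in R, a sketch using the wide-format data frame from earlier (names illustrative):

R: round(cov(wide[, paste0("y.", 1:4)]), 1)  # variances on the diagonal, covariances off it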

OLS regression of pooled data

Longitudinal data violate a statistical assumption for OLS regression, (residuals must be IID: Independent and Identically Distributed).

- Heteroskedastic, (residual variance not identical at each time point).
- Serial correlation, (residuals depend upon previous residuals).

Consequence for OLS regression of pooled longitudinal data:

- The estimate of the slope may be correct, (provided the data are strongly balanced).
- The slope standard error is incorrect, (confidence interval and p-value are incorrect).

How to account for the variation in the data?

- Decompose the total variance into between-person and within-person variance.
- Further decompose between-person variance into variance of growth parameters (intercept, slopes).

Linear mixed-effects

Preliminary assumptions:

- Subjects are a random sample of a population. Results are conditional upon the sample. If the sample is random the results are unbiased population estimates.

- Everyone's growth curve has the same functional form. Different people may have different values for the growth parameters. Assuming straight-line growth, for example, different people could have different intercepts and slopes.

A straight line is not the only model for the average person's growth trajectory. It's just the simplest.

Between-person variation

Everyone has a growth curve: for example, a straight line.

Different people have different parameters: for example, different intercepts and slopes.

Two kinds of parameters: the average intercept and slope, and the variation in intercepts and slopes.

[Figure: a panel of subject-specific growth curves, y (90-130) against time (0-18 months).]

Subject-specific means

Each subject has a mean of their own repeated measures.

The grand mean β0 is the average of the subject-specific means.

Each subject's own mean may deviate from the grand mean β0.

[Figure: a panel of subjects' repeated measures over time (0-18 months), with each subject's own mean and the grand mean β0 marked.]

Unconditional model of the mean

An unconditional regression line is a model of the grand mean, (β0 estimates ȳ):

yj = β0 + ej

Suppose the i'th subject's mean deviates from the grand mean by u0i. A model of the i'th subject's mean, incorporating the grand mean, is:

yij = β0 + u0i + eij

Re-write as a 2-level model, where π0i represents the i'th subject's mean:

yij = π0i + eij
π0i = β0 + u0i

The second level is another model of the mean. Its outcome is the subject-specific means, π0i. So its intercept β0 estimates the mean of those means.

π0i are “random effects”: here they are subject-specific means, (the means of each subject's repeated measures). β0 is a “fixed effect”, an average of random effects: here it is the grand mean, the mean of the subject-specific means.

If everyone had the same average there would be no need for random effects. The fixed effects would be ordinary regression coefficients where one size fits all.

β0 is the grand mean in the equation yj = β0 + ej. β0 is also the average of the subject-specific means π0i in the equation π0i = β0 + u0i.

The point of estimating the grand mean as the average of subject-specific means is to divide the total variance into homogeneous subgroups. It is the same idea as ANOVA with a blocking factor in a split-plot design. The aim is to get a more correct estimate of the standard error.

Decomposing variance

yij = β0 + u0i + eij,   eij ∼ N(0, σ²e),   u0i ∼ N(0, σ²0)

Deviations from β0 are divided into two parts: u0i is the deviation of the i'th subject's mean from β0; eij is the deviation of the i'th subject at the j'th time-point from their own mean.

Within and between-person variance:

Var(yij) = Var(β0 + u0i + eij)
         = σ²0 + σ²e + 2Cov(u0i, eij)
         = σ²0 + σ²e

[Figure: one subject's repeated measures, marking the grand mean β0, the subject's own mean π0i, the deviation u0i of π0i from β0, and the residuals eij around π0i.]

σ²0 is between-person variance of the subject-specific means. σ²e is within-person residual variance.

Collectively these are called the variance components.

Between-person variation is composed of deviations u0i of subject-specific means π0i from the grand mean β0.

Within-person variation is composed of deviations eij of a person's scores from their own mean π0i.

Modelling the subject-specific regressions decomposes the total variation into between-person and within-person components. These variance components are independent of each other, (Cov(u0i, eij) = 0).

This decomposition is fundamental to mixed-effects models.

Fitting the unconditional model of the mean

R: fit = lmer(y ∼ 1 + (1|i), data)

summary(fit)

Random effects:

Variance Std.Dev.

(Intercept) 132.78b 11.523

Residual 71.06c 8.429

Fixed effects:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 108.000a 3.542 30.49 5.59e-12

a. β0 Average subject-specific intercept, (average of subjects' means).
b. σ²0 Between-person variation in intercepts, (means).
c. σ²e Average within-person variation.

In the R model formula y ∼ 1 + (1|i), the 1 denotes the intercept.

Read this as: regress y on the intercept, but treat the intercept as a random effect grouped by i.

In other words, calculate intercepts by fitting the model y ∼ 1 individually to the repeated measures of each subject i.

Longitudinal intra-class correlation

Where is most of the variation? Within groups, (people), or between, or somewhere in the middle.

ICC = σ²0 / (σ²0 + σ²e)   The proportion of total variation that is between-persons.

ICC = 0: no variation between-persons, (σ²0 = 0). No difference from regression of pooled data.
ICC = 1: no change within-person, (σ²e = 0). People differ only in their mean level.

ICC = 132.78 / (132.78 + 71.06) = 0.65

65% of the total response variation is due to differences in mean level between-persons.

The purpose of the unconditional model is to decompose variance.

Low ICC (< 0.2) suggests people are very similar, as if one person. There is no advantage to grouping.

High ICC (> 0.8) suggests growth curves are flat and there is little change over time. Then there is little to be gained from repeated measures over time.

Medium ICC, (say between 0.2 and 0.8), suggests there is within-person change over time, and it is also worth grouping by persons to account for variation in change between-persons.
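A sketch of the ICC computed from a fitted lme4 model; the model object name is illustrative:

R: fit0 <- lmer(y ~ 1 + (1 | i), data = long)
   vc <- as.data.frame(VarCorr(fit0))  # intercept variance, then residual variance
   vc$vcov[1] / sum(vc$vcov)           # ICC = σ²0 / (σ²0 + σ²e), about 0.65 here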

Between-person variation in intercepts and slopes

Subject-specific intercepts and slopes.

[Figure: left, each subject's regression line over time (0-18 months) with the average intercept β0 marked; right, deviations from the average for one subject: the deviation u0i from β0, and residuals eij around the subject's own line.]

π0i is the i'th subject's intercept. β0 is the average of the subject-specific intercepts.

The left plot shows each person's subject-specific regression line.

The right plot highlights one person's repeated measures and their subject-specific regression line.

Between-person difference makes each subject-specific regression line deviate from the average. The i'th subject's regression line deviates from the average intercept β0 by u0i, and from the average slope β1 by u1i.

Within-person residuals eij are deviations of a subject's repeated measures from their own regression line.

Straight-line model of the mean conditional upon time

The regression line conditional upon time x is:

yj = β0 + β1xj + ej

A model of the i'th person with subject-specific deviations from the average intercept and slope:

yij = (β0 + u0i) + (β1 + u1i)xij + eij

Re-write as a 2-level model:

Level-1: yij = π0i + π1i xij + eij
Level-2: π0i = β0 + u0i
         π1i = β1 + u1i

The second-level models are again models of means. The outcomes are subject-specific intercepts π0i and slopes π1i. So the intercepts β0 and β1 estimate the mean intercept and slope.

Random effects π0i and π1i are the i'th subject's intercept and slope. Fixed effects β0 and β1 are the averages of the subject-specific intercepts and slopes.

Compared with the unconditional model, this model has more random effects. In the unconditional model π0i were subject-specific means. In the conditional model π0i are subject-specific intercepts and π1i are subject-specific slopes.

To specify the model, you choose which level-1 coefficients you want to treat as random effects. (It doesn't have to be all of them.)

Each random effect has some variance, (due to individual differences). These are collectively called the “variance components”.

The complete set of model parameters includes both the fixed effects and the variance components.

You may mainly be interested in the fixed effects. Then the variance components are “nuisance parameters”: they are used just to decompose variance so that the fixed effects have correct standard errors. Or the variance components may be of interest in their own right.

Variance components

Variance is decomposed by subject-specific deviations into between-person variance:

[u0i]       ( [0]   [ σ²0  σ01 ] )
[u1i] ∼ N   ( [0] , [ σ01  σ²1 ] )

leaving residual within-person variance:

eij ∼ N(0, σ²e)

σ²0 is variance of subject-specific intercepts.
σ²1 is variance of subject-specific slopes.
σ01 is intercept-slope covariance.
σ²e is within-person residual variance.

[Figure: scatterplot of subject-specific slopes against intercepts, centred on (β0, β1).]

The between-person variance components are drawn from a bivariate normal to allow the random effects to covary. Their covariance is an additional variance component. (Generally the variance components include variances and covariances.)

The plot indicates intercept-slope covariance.

Covariance implies a fan-in or fan-out pattern of trajectories. For example, with negative covariance people with higher intercepts have a more negative slope. That suggests fan-in.

When there is fan-in/out, the intercept variance, and hence the intercept-slope covariance, depends upon centering. Slopes converge and cross over at some point. Re-centering can change the size and sign of intercept-slope covariance.

Shrinkage estimators

An efficient estimator for average subject-specific parameters.

- Shrink subject-specific estimates towards their mean. The random effects are the estimates after shrinkage.
- The fixed effects are the averages of the random effects.

The amount a subject shrinks depends upon their reliability:

- The distance to the mean.
- The subject's residual variance.
- The number of non-missing observations of the subject.

Unreliable estimates are shrunk more towards the mean.

- Individuals “borrow strength” from others in that population.
- Unreliable estimates have less influence on the fixed effects and their standard errors.

[Figure: scatterplot of subject-specific slopes against intercepts, with arrows showing shrinkage towards the averages β0 and β1.]

Shrinkage estimators are “efficient” in the statistical sense of having lowest variance in the long run of repeated sampling.

The blue dots on the plot are subject-specific estimates, the grey lines are their averages (β0 and β1), and the arrows show the direction and amount of shrinkage.

Subject-specific estimates are considered unreliable when they are distant from the mean, with large residual errors and missing observations. These are shrunk more. As a result the mean and variance of the whole cloud of points becomes a more reliable estimator of the population.

Shrinkage enables subjects with missing values to contribute, by allowing them to borrow strength from other subjects.
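In R the shrunken estimates can be inspected directly, a sketch assuming the standard model fitted with lme4 as above:

R: ranef(fit)$i  # per-subject deviations (u0i, u1i) after shrinkage
   coef(fit)$i   # per-subject intercepts and slopes: fixed effects plus deviations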

Fitting the standard mixed effects model

R: fit = lmer(y ∼ x + (x|i), data)

summary(fit)

Random effects:

Groups Name Variance Std.Dev. Corr

i (Intercept) 169.8532c 13.0328

x 0.4037d 0.6354 -0.41e

Residual 34.8167f 5.9006

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 103.5000a 4.0231 11.0000 25.726 3.54e-11

x 0.5000b 0.2231 11.0000 2.241 0.0466

a. β0 Average intercept.
b. β1 Average slope.
c. σ²0 Intercept variance.
d. σ²1 Slope variance.
e. σ01 Intercept-slope covariance, (as a correlation coefficient r01).
f. σ²e Average within-person residual variance.

Slope-on-intercept regression coefficient: r01 σ1/σ0 = −0.41 × 0.6354/13.0328 = −0.02


The complete set of model parameters includes both the fixed effects and the variance components. The R summary function reports the variance components as “Random effects” in the upper part of the table.

The R model formula has an implied intercept. It could be written as: y ∼ 1 + x + (1+x|i)

Read as: regress y on the intercept and slope of x, but treat both the intercept and slope as random effects grouped by i, and allow them to covary.

The same model with covariance fixed at 0 could be specified: y ∼ x + (x||i)

The intercept-slope covariance is given as a correlation coefficient. To convert between correlation and covariance: r01 = σ01/(σ0σ1).

Confidence intervals for the variance components are provided by: confint(fit)

Fitting the standard mixed effects model

STATA: . mixed y x || i: x, covariance(unstructured) reml

------------------------------------------------------------------------------

cog | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

x | .5 .2230792 2.24 0.025 .0627728 .9372272

_cons | 103.5 4.023117 25.73 0.000 95.61484 111.3852

------------------------------------------------------------------------------

------------------------------------------------------------------------------

Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]

-----------------------------+------------------------------------------------

id: Unstructured |

var(x) | .403746 .2606852 .113898 1.4312

var(_cons) | 169.854 83.11644 65.09509 443.2035

cov(x,_cons) | -3.37311 3.629633 -10.48706 3.740839

-----------------------------+------------------------------------------------

var(Residual) | 34.81665 10.0507 19.77272 61.30666

------------------------------------------------------------------------------


Different programs have their own syntax for specifying models, and report the same results in their own way.

Stata calls the intercept _cons.

The option covariance(unstructured) specifies that no structural constraints be applied to the variance-covariance of the random effects. Here this allows intercept-slope covariance.

The option reml specifies that parameter estimation use the REML procedure, (restricted maximum likelihood). REML is the default in R.

Fitting the standard mixed effects model

Mplus: VARIABLE: NAMES = i j y x ;

USEVARIABLES = i y x ;

WITHIN = x ;

CLUSTER = i ;

ANALYSIS: TYPE = TWOLEVEL RANDOM ;

MODEL: %WITHIN%

s | y ON x ;

%BETWEEN%

y WITH s ;

Two-Tailed

Estimate S.E. Est./S.E. P-Value

Within Level

Residual Variances

Y 34.823 11.342 3.070 0.002

Between Level

Y WITH

S -2.945 2.399 -1.227 0.220

Means

Y 103.500 3.852 26.870 0.000

S 0.500 0.214 2.341 0.019

Variances

Y 153.647 59.482 2.583 0.010

S 0.354 0.238 1.485 0.138


Here Mplus calls the intercept Y and the slope of x S.

For continuous outcome variables, (as here), Mplus uses FIML (full information maximum likelihood) estimation.

REML versus FIML

These are methods for estimating parameters and fitting models to data. FIML = “full information maximum likelihood”. REML = “restricted maximum likelihood”.

Why REML?

Variance components estimated by FIML are biased (under-estimated) in small samples. This is because the calculation uses the sample regression coefficients β.

REML aims to correct small-sample bias. It estimates variance components by maximizing the likelihood of residuals without using β. The β are calculated afterwards.

REML estimates of variance components are more accurate than FIML in small samples. They become similar in larger samples.

Model comparisons based on likelihood calculated by REML cannot tell a difference in the β. Fixed-effects specification must be tested under FIML.

Program defaults:

R REML (function lmer)

Stata FIML (function mixed)

SAS REML (proc mixed)

Mplus FIML


The REML procedure is analogous to the correction factor (1/(n − 1)) used for estimating population variance from a random sample.

Estimating a population variance is biased in small samples because the calculation uses the sample mean. Estimating population variance components is similarly biased because the calculation uses sample regression coefficients β.

Variance is corrected using n − 1 in the denominator for the average. Variance components are corrected by avoiding β in the REML calculation.

Mplus uses FIML. To specify FIML using R: lmer(y ∼ x + (x|i), data, REML=FALSE)

Fitting a latent growth curve model (LGC)

Mplus: VARIABLE: NAMES = y1 y2 y3 y4 ;

MODEL: i s | y1@0 y2@6 y3@12 y4@18 ;

y1 (err) ;

y2 (err) ;

y3 (err) ;

y4 (err) ;

Two-Tailed

Estimate S.E. Est./S.E. P-Value

S WITH

I -2.947 3.195 -0.922 0.356

Means

I 103.500 3.852 26.870 0.000

S 0.500 0.214 2.341 0.019

Variances

I 153.668 73.024 2.104 0.035

S 0.354 0.230 1.537 0.124

Residual Variances

Y1 34.817 10.051 3.464 0.001

Y2 34.817 10.051 3.464 0.001

Y3 34.817 10.051 3.464 0.001

Y4 34.817 10.051 3.464 0.001


Mplus can be used to fit growth curve models in the structural equation modelling framework.

These are called “latent” growth curve models because the estimated growth parameters, (here intercept and slope), are “latent variables”, (factors).

For equivalent results between LGC and mixed effects:

1. Hold residual variances equal across time-points.
2. Use the same coding for time-points.
3. Fit the mixed effects model using FIML.

Latent growth curve model

[Figure: path diagram. Observed variables y1-y4 load on the intercept factor i with loadings fixed at 1, and on the slope factor s with loadings fixed at 0, 6, 12, 18, so the model-implied outcomes are 1i+0s, 1i+6s, 1i+12s, 1i+18s.]

Squares are observed variables, (outcome at each wave). Circles are latent variables for “growth factors”, equivalent to random effects: i = intercept, s = slope.

Single-headed arrows point to a regression outcome. Four regression equations solved simultaneously:

y1 = i
y2 = i + 6s
y3 = i + 12s
y4 = i + 18s

The regression coefficients (factor loadings) are fixed. They represent time-points coded to contrive a growth curve.

Double-headed arrows are variances or covariances. The arrows at each y are residual variances. The arrow between i and s is covariance.

LGC models versus mixed-effects models

Advantages of mixed-effects models

- Treats time as data in a natural way. Allows individually varying baseline times and intervals.

Advantages of latent growth curve models

- Provides several goodness-of-fit measures.
- Can link multivariate measurement models into a growth model.
- Can link growth models into a multivariate structural model.

The main disadvantage is that LGC models don't treat time as a variable, but as a structural constraint. It is difficult to allow individually varying time-points.

Another disadvantage is that LGC models are more susceptible to convergence problems, for example with missing values, which mixed-effects models handle straightforwardly.

The main advantage is that LGC models are relatively easy to link into more complicated path models.

Multivariate measurement models

[Figure: path diagram. At each of three time-points, three indicators load on a factor (f1, f2, f3); the three factors are linked into a latent growth curve model with factors i and s.]

Each time-point is a multivariate measurement model. These are linked into a latent growth curve model.

The aim of these models is greater reliability through multivariate measurement models. These measure what is common to the set of indicators at each time-point, and reject differential sources of measurement error.

But it is necessary to establish “longitudinal measurement invariance” to be sure the measurement models measure the same thing in the same way at each time-point.

Bivariate (cross-lagged) LGC model

[Figure: path diagram. Two latent growth curve models, one for repeated measures y1-y4 (factors iy, sy) and one for x1-x4 (factors ix, sx), linked by cross-lagged paths γx and γy between the growth factors.]

Two growth processes, each modelled by a LGC model. The models are linked by cross-lagged regressions.

These specify association at the level of growth factors. Is the slope of one process determined by the baseline level of the other process?

Laird-Ware mixed-effects model

General mixed-effects model for the i'th person:

Yi = Xiβ + Ziui + ei

Xi is a design matrix for the fixed effects β; Zi is a design matrix for the random effects ui. The columns of Zi are a subset of the columns of Xi, (your choice of random effects). Zi must contain only TVCs, (time-varying within-subject covariates, such as time itself). The remaining columns of Xi must contain only TICs, (between-subject covariates that are constant over time).

The standard model (random intercepts and slopes) in Laird-Ware form, (no TICs are included, so the columns of Zi are all the columns of Xi):

[yi1]   [1 xi1]        [1 xi1]         [ei1]
[yi2] = [1 xi2] [β0] + [1 xi2] [u0i] + [ei2]
[yi3]   [1 xi3] [β1]   [1 xi3] [u1i]   [ei3]
[yi4]   [1 xi4]        [1 xi4]         [ei4]

        [β0 + β1xi1]   [u0i + u1i xi1]   [ei1]
      = [β0 + β1xi2] + [u0i + u1i xi2] + [ei2]
        [β0 + β1xi3]   [u0i + u1i xi3]   [ei3]
        [β0 + β1xi4]   [u0i + u1i xi4]   [ei4]

yij = β0 + β1xij + u0i + u1i xij + eij
    = (β0 + u0i) + (β1 + u1i)xij + eij

The random terms collected together form the “composite residual” εij:

yij = β0 + β1xij + u0i + u1i xij + eij
    = β0 + β1xij + εij

The composite residuals of a mixed-effects model are more complicated than the independent residuals assumed for an OLS regression model.

They depend upon time x. This gives the residuals a variance-covariance structure.

The composite residual

The standard model has a “composite” residual that depends upon time:

yij = β0 + β1xij + εij,   where εij = u0i + u1i xij + eij

For the standard mixed-effects model:

Residual variance (diagonal elements of the variance-covariance matrix):

Var(εij) = σ²e + σ²0 + 2σ01 xij + σ²1 xij²

Residual covariance between measurement occasions j and j′, (off-diagonal elements):

Cov(εij, εij′) = σ²0 + σ01(xij + xij′) + σ²1 xij xij′

For the general mixed-effects model:

Cov(Yi) = Zi Cov(ui) Zi′ + σ²e In

Random effects in the model induce a residual variance-covariance structure.

Residual variance depends upon time. It is “heteroskedastic”: it may be different at different time-points.

Random effects in a model induce a “correlation structure”, (a pattern of variances and covariances amongst the residuals).

Without random effects the variance-covariance matrix reduces to:

Cov(Yi) = σ²e In = diag(σ²e, . . . , σ²e)

This represents the OLS assumptions:

- Homoskedasticity: identical variances on the diagonal.
- Un-correlated: zero covariances off the diagonal.

Random effects induce structure (patterns) in the variance-covariance matrix. The model-implied correlation structure depends upon your choice of random effects.

The aim is to choose a structure that reflects correlations in the observed data.
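A sketch of the induced structure for the standard model, plugging in the variance components estimated earlier (values taken from the Stata output above):

R: x   <- c(0, 6, 12, 18)                  # centred time-points
   Zi  <- cbind(1, x)                      # random-effects design matrix (intercept, slope)
   G   <- matrix(c(169.854, -3.373,
                   -3.373,   0.404), 2, 2) # Cov(u0i, u1i)
   s2e <- 34.817                           # within-person residual variance
   Zi %*% G %*% t(Zi) + s2e * diag(4)      # Cov(Yi): heteroskedastic and serially correlated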

Time-dependent variance and heteroskedasticity

The induced variance-covariance structure depends upon time. Consequently it can reflect heteroskedasticity.

Residual variance: σ²e + σ²0 + 2σ01 xij + σ²1 xij²
Minimum at: −σ01/σ²1
Curvature: 2σ²1

[Figure: residual variance as a parabolic function of time x (0-18), with its minimum at −σ01/σ²1.]

Either side of the minimum the variance changes monotonically with time. The location of the minimum determines where variance increases or decreases. Increasing variance with time reflects a “fan-out” pattern of growth curves; decreasing variance reflects “fan-in”.

The smaller the slope variance σ²1, the less the curvature and the more homoskedastic the residuals.

The diagonal of Cov(Yi) is the residual variance at different time-points. Homoskedasticity assumes it is the same at all time-points. Heteroskedasticity means it changes over time.

Residual variance depends upon time when the model includes random effects. The form this takes provides some account of heteroskedasticity in the data.

The standard model, (random intercepts and slope of time), induces a parabola. This accounts for the typical fan-in/fan-out patterns of growth curves.

The time location of the minimum variance depends upon the slope variance and the intercept-slope covariance. The slope variance is usually dominant.

Any TVCs added to the model must be added to Zi so they appear at level-1. But the induced variance-covariance also depends upon Zi. Therefore adding further TVCs makes the variance-covariance structure more complex and time-dependent.

Correlation structure

“Correlation structure” is a pattern in the variance-covariance matrix. The pattern in the block of the i'th subject's residuals is assumed the same for all subjects.

“Unstructured” assumes no pattern. All variances and covariances may be different:

          | σ²1  σ12  ...  |
Cov(Yi) = | σ21  σ²2       |
          | ...        σ²n |

“Independence” assumes a strong pattern, (the OLS assumption). All variances are equal and all covariances are 0:

Cov(Yi) = σ²e In = diag(σ²e, . . . , σ²e)

The i'th subject has n repeated measures, j = 1, . . . , n.

Time-dependent variance on the diagonal reflects heteroskedasticity. Time-dependent covariance off the diagonal reflects serial correlation.

Covariance patterns and serial correlation

Two ways to add covariance structure:

1. Your choice of random effects induces a certain correlation structure.

2. Some programs provide options for a range of correlation structures. These aim to account for patterns of serial correlation.

- Independence. The matrix has a diagonal structure. All variances are equal and all covariances are 0.

- Exchangeable, (compound symmetry). All variances are equal, and all covariances are equal.

- Toeplitz. All variances are equal. Covariance is the same across equal time intervals. This leads to a diagonally banded structure.

- AR(1). First-order autoregressive relationship between successive time-points: eij = ρ ei,j−1 + wij. All variances are equal. Covariance decreases as the time interval increases.

- Unstructured. No constraints. Every variance and covariance is free to be estimated.

Correlation structure exploits stable patterns of residual variance-covariance in order to apply constraints and reduce the number of parameters to estimate.

It is a trade-off between model fit and degrees of freedom.

Unstructured correlation may give a better fit, but the model may be unestimable. There may not be enough unique bits of information in the data to estimate all the required parameters.

Parallel slopes model

Regression equation for the i'th person with subject-specific deviations from the average intercept:

yij = (β0 + u0i) + β1xij + eij
    = β0 + β1xij + εij,   where εij = u0i + eij

Var(εij) = σ²e + σ²0
Cov(εij, εij′) = σ²0

Compound symmetry, (sphericity): the residual variance-covariance is not time-dependent. Variance is constant at all time-points (homoskedasticity). Covariance is equal between any pair of time-points.

R: fit = lmer(y ∼ x + (1|i), data)

summary(fit)

Random effects:

Groups Name Variance Std.Dev. Corr

id (Intercept) 136.13 11.668

Residual 57.66 7.593

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 103.5000 3.8350 14.9800 26.99 4.09e-14

x 0.5000 0.1634 35.0000 3.06 0.00423

The parallel slopes model induces “exchangeable” correlation structure. All variances are equal, all covariances (across time-points) are equal:

          | σ²e + σ²0   σ²0        ...       |
Cov(Yi) = | σ²0         σ²e + σ²0            |
          | ...                    σ²e + σ²0 |

Also called “compound symmetry”, (or “sphericity”).

Observations separated in time are assumed to be correlated, but the correlation is assumed to be the same between any pair of time-points regardless of how far apart in time.

Compared with the standard model, (random intercepts and slopes):

- The fixed effects are the same.
- Residual variance within-person is higher. The restriction of parallel slopes does not fit so well: the slope variation has been lumped into the residual variance.
- Standard errors for fixed effects are lower. There are fewer parameters to estimate, (no slope variance or intercept-slope covariance), giving more degrees of freedom.

Model specification

Two sides of model specification:

1. Specify the functional form of the growth model. For example a straight line, or a quadratic curve, etc.

2. Specify the residual variance-covariance. This has two sides:
   a. Choose random effects. Your choice induces a variance-covariance structure.
   b. Specify program options for variance-covariance structure, if provided.

How to choose random effects? This can be guided by model goodness-of-fit and comparison.

Model comparison

Assess random effects specifications by comparing nested models fitted to the same data using FIML.

- AIC, BIC, and log likelihood, (lowest is best).
- Likelihood ratio test, (chi-squared test of difference in goodness-of-fit).

R: fit1 = lmer(y ~ x + (1|i), data, REML=FALSE)    (a)
   fit2 = lmer(y ~ x + (x||i), data, REML=FALSE)   (b)
   fit3 = lmer(y ~ x + (x|i), data, REML=FALSE)    (c)

   anova(fit1, fit2, fit3)

   Df AIC    BIC    logLik  deviance Chisq  Chi Df Pr(>Chisq)
   4  364.93 372.41 -178.47 356.93
   5  363.42 372.78 -176.71 353.42   3.5070 1      0.06111
   6  364.26 375.49 -176.13 352.26   1.1634 1      0.28077

a. Random intercept only (parallel slopes).
b. Independent random intercepts and slopes.
c. Covarying random intercepts and slopes, (unstructured).

Which model fits best?

BIC suggests model (a). AIC suggests model (b). Model comparison by the LR test suggests there is no significant difference between models (a) and (b), or between models (b) and (c).

Conclusion:

If the fixed effects are the main interest and the variance components are nuisance parameters, the random intercept only model (a) might be preferred.

If the variance components are of interest, the random intercept and slope model (b) might be preferred. There is no significant benefit to allowing intercept-slope covariance.

Model comparison

Stata: mixed y x || i:                                  (a)
       estimates store fit1
       mixed y x || i: x                                (b)
       estimates store fit2
       mixed y x || i: x, covariance(unstructured)      (c)
       estimates store fit3
       lrtest fit1 fit2
       lrtest fit2 fit3

Likelihood-ratio test                LR chi2(1) = 3.51
(Assumption: fit1 nested in fit2)    Prob > chi2 = 0.0611

Likelihood-ratio test                LR chi2(1) = 1.16
(Assumption: fit2 nested in fit3)    Prob > chi2 = 0.2808

a. Random intercept only (parallel slopes).
b. Independent random intercepts and slopes.
c. Covarying random intercepts and slopes.

The same model comparison procedure using LR tests in Stata.

Including covariates to explain away residual variance

In the standard 2-level model:

- Level-1 is the within-person or individual level.
- Level-2 is the between-person or group level.

The levels decompose within and between-person variance. Between-person variance is further decomposed into variance of growth parameters.

One kind of variance might be the research interest, the others a “nuisance” to be controlled. Either way, variance is explained by including covariates.

Covariates are classified according to the kinds of variation they can explain.

- Time-varying covariates (TVCs) are variables that change over time, (eg. age). They explain variation within-person.
- Time-invariant covariates (TICs) are variables that are constant over time, (eg. sex). They explain variation between-persons.

Level-1 describes change in the i'th person using variables that change over time.

Level-2 describes differences in change between-persons using time-invariant variables that have different levels for different people.

Time-invariant variables have time-invariant effects. They explain individual differences that are constant over time.

This does not imply there is no differential growth. A straight-line model with random slopes, for example, allows people to grow differently, with a constant difference in their growth rates.

The order of the difference is determined by the model. A quadratic model, for example, allows the growth rate to change but assumes constant 2nd-order difference in curvature.

Longitudinal dataset with a TIC

Wide format

y.1  y.2  y.3  y.4    z
205  217  268  302  137
219  243  279  302  123
142  212  250  289  129
206  230  248  273  125
190  220  229  220   81
165  205  207  263  110
170  182  214  268   99
 96  131  159  213  113
138  156  197  200  104
216  252  274  298   96
180  225  215  249  125
 97  136  168  222  115
...  ...  ...  ...  ...

Willett J. B. (1988) Review of Research in Education, p. 345-422.

Long format

i  j    y  x    z
1  1  205  0  137
1  2  217  1  137
1  3  268  2  137
1  4  302  3  137
2  1  219  0  123
2  2  243  1  123
2  3  279  2  123
2  4  302  3  123
3  1  142  0  129
3  2  212  1  129
3  3  250  2  129
3  4  289  3  129
.  .  ...  .  ...

i'th person, j'th measurement occasion.

A panel of 35 subjects were assessed at baseline for their cognitive function (z). The subjects were given an “opposites naming” task on each of four consecutive days: they were given a long list of words and had to name the opposite of each word as quickly as possible. The data are the count of how many opposites they could name in 10 minutes.

y.1 are the counts of the 35 persons on day 1, and so forth.

The researcher was interested in whether clever people's performance improved at a faster rate. Their baseline cognitive function was assumed not to change over the four days.

These data are strongly balanced: everyone has the same measurement times x, with no missing time-points.

Variable z is a TIC. By definition it does not change over time. It needs only to be measured once in each person, for example at baseline.

TICs in long format must be repeated within-person at each time-point.
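A sketch of attaching a TIC in R: a TIC measured once per person is merged in by the person indicator, so its value repeats at every time-point. The object names and z_baseline are hypothetical placeholders:

R: tic  <- data.frame(i = 1:35, z = z_baseline)  # one row per person; z_baseline hypothetical
   data <- merge(data, tic, by = "i")            # z now repeats within each person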

TICs and residual between-person variance

- A TIC can only explain between-person variation. It cannot explain within-person variation, because it is constant within-person.

- Between-person variation is further decomposed into growth parameters, (eg. intercept and slopes). A TIC can be used to explain some or all of these parts, depending upon how it enters the model.

TIC effect on the intercept only:

yij = (β00 + β01zi + u0i) + (β10 + u1i)xij + eij

Level-1: yij = π0i + π1i xij + eij
Level-2: π0i = β00 + β01zi + u0i
         π1i = β10 + u1i

TIC effects on the intercept and slope:

yij = (β00 + β01zi + u0i) + (β10 + β11zi + u1i)xij + eij

Level-1: yij = π0i + π1i xij + eij
Level-2: π0i = β00 + β01zi + u0i
         π1i = β10 + β11zi + u1i

TICs appear as level-2 covariates.

TICs are assumed to stay constant within subjects. It makes no sense for TICs to have random effects within subjects: they don't change within subjects, so they can't change differently between subjects.

One subject's constant TIC value may be different from another's, so there may be a TIC effect between subjects. For example, sex may have a fixed effect upon the slope: the slope may be different between female and male.

TIC direct effect and cross-level interaction

A TIC effect on the intercept enters the model as a direct (main) effect:

yij = (β00 + β01zi) + β10xij + εij
    = β00 + β10xij + β01zi + εij,   with direct effect β01zi

R: lmer(y ~ 1 + x + z + (1+x|i), data)
   lmer(y ~ x + z + (x|i), data)   # shorthand (implied intercept)

TIC effects on the intercept and slope of time enter the model as a direct effect and a “cross-level” interaction with time, (a product term):

yij = (β00 + β01zi) + (β10 + β11zi)xij + εij
    = β00 + β10xij + β01zi + β11zi xij + εij,   with direct effect β01zi and interaction β11zi xij

R: lmer(y ~ 1 + x + z + z:x + (1+x|i), data)
   lmer(y ~ x * z + (x|i), data)   # shorthand

Error terms are collected into a composite residual εij for convenience.

The models include a random intercept and slope of time (x) and their covariance. The R formula syntax tries to look like the model equation.

The cross-level interaction describes how an individual-level variable, such as the slope of time (at level-1), is moderated by a group-level variable, such as a TIC (at level-2).

Interactions depend upon their constituent direct effects and how they are centered. If the interaction x:z is significant, the effect of x is conditional upon the value z is centered on, and vice versa.

Centering

yij = (β00 + β01zi) + (β10 + β11zi)xij + εij

Re-centering time xij

- Changes the intercept β00 and the TIC effect on the intercept β01.
- Changes the intercept variance and intercept-slope covariance. The direction of change depends upon fan-in/out.

Re-centering the TIC zi

- Changes the intercept β00 and the slope of time β10, but not the TIC effects on the intercept and slope, (β01 and β11).

Mean-centering the TIC

- Preserves the intercept β00 and the slope of time β10 as they were before including the TIC.
- Sets the reference level of z to be the “average” person.

Re-centering time can change the size and sign of the TIC's direct effect β01.

Mean-centering the TIC exactly preserves the intercept and slope only if its distribution is symmetrical.

Mean-centering keeps the TIC in its native units. Standardizing the TIC mean-centres it and scales it into standard deviation units.
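A one-line sketch of standardizing the TIC in R; the data frame name is illustrative:

R: data$z <- as.numeric(scale(data$z))  # mean-centre and scale to SD units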

Centering: direct effects on the intercept

y = (β0 + β2z) + β1x

[Figure: 3D regression surface of y against x and z, with the y-on-x line drawn at a given value of z.]

The regression line becomes a surface when there are two or more independents. Re-centering a variable shifts the surface as a whole along the axis of the variable. Partial regression coefficients are lines on the surface.

The surface is planar if the model contains no product or polynomial terms.

Re-centering z changes the y-on-x intercept β0. The y-on-x slope β1 remains constant.

Centering: interaction effects on the slope

y = (β0 + β2z) + (β1 + β3z)x

[Figure: curved 3D regression surface of y against x and z; the y-on-x slope differs at different values of z.]

A product term causes the regression surface to be curved.

Re-centering z changes both the y-on-x intercept β0 and slope β1.

The relationship between y and x is different for different values of z: z “interacts” with or “moderates” the effect of x upon y.

At different levels of z, (when it is re-centered), the slope of x changes its effect size and may also change sign, (direction).

At a certain level of z, the intercept and slope of x are as they were before z was included. This is the mean of z if it is symmetrically distributed, since its effects balance out.

Hence the intercept and slope of x are effects for the average person, in the sense of average on all unobserved TICs.

TIC direct effect only

yij = (β00 + β01zi) + β10xij + εij

R: lmer(y ~ x + z + (x|i), data)

Random effects:
Groups Name        Variance  Std.Dev. Corr
i      (Intercept) 1302.8(d) 36.09
       x            132.4(e) 11.51    -0.53(f)
       Residual     159.5(g) 12.63
Number of obs: 140, groups: i, 35

Fixed effects:
            Estimate   Std. Error df     t value Pr(>|t|)
(Intercept) 164.374(a) 6.357      30.720 25.857  < 2e-16
x            26.960(b) 2.167      34.000 12.443  3.29e-14
z             7.288(c) 5.312      33.000  1.372  0.179

[Figure: average growth lines at TIC values z̄ and z̄ + σz: the direct effect shifts the intercept β00 only, leaving the slopes parallel.]

Assuming time is centered on the baseline and the TIC is standardized:

a, b. β00, β10 Intercept and slope of the average person's growth curve.
c. β01 Change in intercept β00 per unit of the TIC z.
d, e. σ²0, σ²1 Unexplained intercept and slope variance.
f. σ01 Intercept-slope covariance (as a correlation).
g. σ²e Unexplained within-person variance.

TIC direct effect and cross-level interaction

yij = (β00 + β01zi) + (β10 + β11zi)xij + εij

R: lmer(y ~ x * z + (x|i), data)

Random effects:
Groups Name        Variance  Std.Dev. Corr
id     (Intercept) 1236.4(e) 35.16
       x            107.2(f) 10.36    -0.49(g)
       Residual     159.5(h) 12.63
Number of obs: 140, groups: i, 35

Fixed effects:
            Estimate   Std. Error df     t value Pr(>|t|)
(Intercept) 164.374(a) 6.206      33.000 26.486  < 2e-16
x            26.960(b) 1.994      33.000 13.521  5.33e-15
z            -1.403(c) 6.228      33.000 -0.225  0.8231
x:z           5.349(d) 2.001      33.000  2.673  0.01160

[Figure: average growth lines at TIC values z̄ and z̄ + σz: the interaction changes both the intercept and the slope.]

Assuming time is centered on the baseline and the TIC is standardized:

a, b. β00, β10 Intercept and slope of the average person's growth curve.
c, d. β01, β11 Change in intercept β00 and slope β10 per unit of the TIC z.
e, f. σ²0, σ²1 Unexplained intercept and slope variance.
g. σ01 Intercept-slope covariance (as a correlation).
h. σ²e Unexplained within-person variance.

Pseudo R-squared

There is no overall R-squared.

- Different parts of the model explain different variance components.
- TVCs explain within-person variance at level-1. TICs explain between-person variance at level-2.

A “pseudo R-squared” calculates the change in a particular variance component.

- Compare nested models fitted to the same data.
- Calculate the proportional reduction in unexplained variance accounted for by explanatory variables.
- The pseudo R-squared is an effect size measure for the explanatory variables.

The change in variance explained calculated by a pseudo R-squared can be due to a single explanatory variable, or a block of explanatory variables, with the usual caveats about collinearity.

Pseudo R-squared for the effect of timeR: fit0 = lmer(y ∼ 1 + (1|i), data)

Random effects:

Groups Name Variance Std.Dev.

i (Intercept) 602.8 24.55

Residual 1583.7 39.80

R: fit1 = lmer(y ∼ x + (x|i), data)

Random effects:

Groups Name Variance Std.Dev. Corr

i (Intercept) 1198.8 34.62

time 132.4 11.51 -0.45

Residual 159.5 12.63

(σ2e(fit0) − σ2e(fit1)) / σ2e(fit0) = (1583.7 − 159.5) / 1583.7 = 0.90

R: anova(fit0, fit1)

Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)

fit0 3 1466.1 1475 -730.06 1460.1

fit1 6 1287.4 1305 -637.68 1275.4 184.76 3 < 2.2e-16

57 / 81

90% of within-person variation in repeated measures is explained by change over time.

The first model is unconditional means, the second is means conditional upon time.

The second model has less within-person residual variance σ2e than the unconditional means model because the slope explains how the mean changes conditional upon time.

The anova function carries out a formal test of the difference in goodness-of-fit in terms of the likelihoods.
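The same calculation can be scripted rather than read off the printed output. A minimal sketch using lme4's VarCorr, assuming fit0 and fit1 as above:

R: vc0 = as.data.frame(VarCorr(fit0)) # variance components as a data frame
vc1 = as.data.frame(VarCorr(fit1))
s2e0 = vc0$vcov[vc0$grp == "Residual"] # residual variance, unconditional means
s2e1 = vc1$vcov[vc1$grp == "Residual"] # residual variance, conditional upon time
(s2e0 - s2e1) / s2e0 # pseudo R-squared for time: 0.90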

Pseudo R-squared for the slope effect of a TIC

R: fit1 = lmer(y ∼ x + z + (x|i), data)

Random effects:

Groups Name Variance Std.Dev. Corr

i (Intercept) 1302.8 36.09

x 132.4 11.51 -0.53

Residual 159.5 12.63

R: fit2 = lmer(y ∼ x * z + (x|i), data)

Random effects:

Groups Name Variance Std.Dev. Corr

i (Intercept) 1236.4 35.16

x 107.2 10.36 -0.49

Residual 159.5 12.63

(σ21(fit1) − σ21(fit2)) / σ21(fit1) = (132.4 − 107.2) / 132.4 = 0.19

R: anova(fit1, fit2)

Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)

fit1 7 1287.8 1308.3 -636.88 1273.8

fit2 8 1282.9 1306.4 -633.45 1266.9 6.8603 1 0.008813

58 / 81

19% of between-person variation in slopes is explained by the TIC.

The first model includes the TIC as a direct effect only, the second includes the TIC as a direct effect and interaction with time. In this way model comparison assesses the TIC’s explanatory effect on slope variance with the effect on intercept variance controlled.

The within-person residual is unchanged because the TIC explains only between-person variation.

Note: the proportional reduction in intercept variance can be negative! This is because intercept variance can increase when a TIC direct effect is added, due to intercept-slope covariance and depending upon fan-in/out. Centering time can minimise this.

Prediction from LMM

Subject-specific prediction

R: fit = lmer(y ∼ x + (x|i), data)

predict(fit, data.frame(x=0:3, i=1))

Group-averaged prediction

R: fit = lmer(y ∼ x + (x|i), data)

predict(fit, data.frame(x=0:3), re.form=NA)

R: g = cut(z, breaks=quantile(z), include.lowest=TRUE, labels=1:4)

fit = lmer(y ∼ x * g + (x|i), data)

predict(fit, data.frame(x=0:3, g="4"), re.form=NA)

[Figure: predicted trajectories against Time 0-3: subject-specific (i=1), group-averaged (sample), and group-averaged (4th quartile z).]

59 / 81

The R function predict returns predicted response y-values. These are plotted against the corresponding time x-values.

Subject-specific prediction is the model-predicted response for a specific subject. Here the response of subject ID 1 is predicted at time-points x = 0, 1, 2, 3.

Group-averaged prediction is the model-predicted response averaged over a group of subjects.

In the first example the group is the whole sample and prediction is for the sample-averaged subject. For a linear mixed model this is equivalent to an estimate of the population-averaged subject. (This is not necessarily true for a generalized linear mixed model with a non-linear link function.)

In the second example the group of subjects are in the 4th quartile of the TIC z. We first derive a factor to indicate which quartile of z each subject belongs to. This is used in the model to dummy-out subjects not included in the average.

Balance

Data are:

- Balanced if all cells of the design (here persons) are complete.

- Unbalanced if some cells are missing some observations.

Longitudinal data are:

- Strongly balanced if everyone is measured at the same time-points.

Also called a time-structured design.

- Weakly balanced if people are measured at different times.

Accelerated or staggered baseline designs have individually varying baseline times but similar intervals.

- Unbalanced if some individuals are missing at some measurement occasions.

The pattern of missing observations in unbalanced longitudinal data may be:

- Monotonic, meaning attrition or drop-out.

- Non-monotonic, meaning an intermittent pattern of missing observations.

60 / 81

The terminology “strongly” and “weakly” balanced data comes from Stata.

Ignorable missingness

Missing observations are “ignorable” under maximum likelihood estimation if the data are:

- Missing completely at random (MCAR).

- Missing at random (MAR). “Covariate dependent” and ignorable if the model includes the necessary covariates.

Missing observations cannot be ignored if the data are:

- Missing not at random (MNAR).

There is no test for MNAR missingness.

- Complete-case analysis, (discarding cases with missing values), risks selection bias in the remaining sample.

- Multiple imputation.

- Sensitivity analysis.

61 / 81

MCAR/MAR/MNAR is Rubin’s classification of kinds of missingness.

Missing data are “ignorable” in the sense that they will not bias parameter estimates.

Unbalance affects variance components first. Standard errors are sensitive to missingness. If too much data is missing, programs may provide parameter estimates but report: “std errors cannot be estimated”.

Multiple imputation means average the results of several analyses, each using data with missing values filled-in by some method. For example “value at last wave carried forward”, and other methods.
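For example, a minimal multiple imputation sketch using the mice package, with variables named as in the earlier examples, (pooling mixed model fits also requires the broom.mixed package to be installed):

R: library(mice); library(lme4)
imp = mice(data, m=5, seed=1) # 5 imputed copies of the data
fits = with(imp, lmer(y ∼ x + (x|i))) # fit the LMM to each copy
pool(fits) # pool the estimates by Rubin's rules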

Sensitivity analysis means explore the sensitivity of parameter estimates to data perturbation. For example bootstrap estimate the sampling distribution of parameter estimates using datasets with missing values randomly imputed to the limits of plausibility.

A weakly balanced dataset

id wave age cage read emot anti

1 1 6.830 -3.170 2.100 -0.142 0.979
1 2 8.830 -1.170 2.900 -0.142 2.928
1 3 10.830 0.830 4.500 -0.142 1.629
1 4 12.830 2.830 4.500 -0.142 2.278
2 1 6.500 -3.500 2.300 -1.468 -0.970
2 2 8.500 -1.500 4.500 -1.468 0.329
2 3 10.500 0.500 4.200 -1.468 -0.970
2 4 12.500 2.500 4.600 -1.468 -0.320
3 1 7.420 -2.580 2.300 -0.584 -0.320
3 2 9.420 -0.580 3.800 -0.584 -0.320
3 3 11.420 1.420 4.300 -0.584 0.329
3 4 13.420 3.420 6.200 -0.584 -0.320

... ... ... ... ... ... ...

Curran, P.J. (2007) Comparing three modern approaches to longitudinal data analysis: An examination of a single developmental sample.

Children’s reading ability was tested on 4 occasions at 2-year intervals.

id Indexes people.

wave Indexes time-points.

age Child’s age at testing. The time variable.

cage Child’s age centered on 10 years. (Sample mean age = 9.9 years).

read Reading ability. The dependent variable.

emot Emotional support at home. A standardized TIC.

anti Antisocial behavior. A TVC standardized at baseline.

62 / 81

Growth in children’s reading ability. What is the growth rate? What is the variation in children’s growth rates? Is a child’s rate of growth in reading ability related to the level of emotional support they receive at home?

Variables are easily transformed in long format.

The TIC is standardized by subtracting the sample mean and dividing by the sample standard deviation.

The TVC is standardized at baseline, (the first wave of measurements), by subtracting the baseline mean and dividing by the baseline standard deviation.
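A minimal sketch of these transformations in long format, assuming the columns shown above. For the TIC the mean and SD are computed over one row per person, since its value repeats across waves:

R: zi = data$emot[!duplicated(data$id)] # one emot value per person
data$emot = (data$emot - mean(zi)) / sd(zi) # standardize the TIC on the sample
b = data$wave == 1 # baseline rows
data$anti = (data$anti - mean(data$anti[b])) / sd(data$anti[b]) # standardize the TVC at baseline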

A weakly balanced dataset

N = 221

Age at baseline:

Mean(SD) = 6.89(0.62) years
Range = 6.00 to 8.00 years

Interval between successive measurement occasions:

Mean(SD) = 2.00(0) years

Repeated measures for two individuals:

[Figure: reading ability against age − 10 years; repeated measures for two individuals.]

Individuals are all assumed to have the same form of growth curve as each other over the whole time range.

Times where a person is not observed are treated as ignorable missing values.

Quadratic growth model

y = β0 + β1x + β2x2 + ε

dy/dx = β1 + 2β2x

d2y/dx2 = 2β2

Linear coefficient β1 is the slope where x = 0. Quadratic coefficient β2 is the curvature, (rate of change of the slope).

Positive curvature indicates the curve is convex and has a minimum. Negative curvature indicates the curve is concave and has a maximum. The curve has a turning-point (minimum or maximum) at:

x = −β1/2β2

Predicted growth curve for the average person:

[Figure: predicted quadratic growth curve of reading ability against age − 10 years.]

In a straight-line model the linear term, (the slope), is constant. In a quadratic model the slope depends upon time (x). It changes over time following a curved (parabolic) trajectory.

It is important to centre time to give meaning to the linear coefficient β1. For example, if x is mean-centered the slope β1 represents the rate of change at the average value of x.

Mean-centering time minimises collinearity between the linear and quadratic terms. This makes their standard errors smaller.
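In the running example this centering was done by simple subtraction before fitting:

R: data$cage = data$age - 10 # center age on 10 years, close to the sample mean of 9.9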

Model comparisons

1. Assess fixed effects specification.

R: fit1 = lmer(read ∼ cage + (1|id), data, REML=FALSE)

fit2 = lmer(read ∼ cage + I(cage^2) + (1|id), data, REML=FALSE)

anova(fit1, fit2)

Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)

fit1 4 2192.7 2211.8 -1092.3 2184.7

fit2 5 2065.2 2089.2 -1027.6 2055.2 129.45 1 < 2.2e-16

2. Assess random effects specification.

R: fit1 = lmer(read ∼ cage + I(cage^2) + (1|id), data, REML=FALSE)

fit2 = lmer(read ∼ cage + I(cage^2) + (cage|id), data, REML=FALSE)

fit3 = lmer(read ∼ cage + I(cage^2) + (cage+I(cage^2)|id), data, REML=FALSE)

anova(fit1, fit2, fit3)

Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)

fit1 5 2065.2 2089.2 -1027.61 2055.2

fit2 7 1916.9 1950.4 -951.46 1902.9 152.301 2 < 2.2e-16

fit3 10 1887.7 1935.5 -933.84 1867.7 35.242 3 1.083e-07

65 / 81

Model specification:

1. Which is better, a straight line or a quadratic curve? Compare their goodness-of-fit and test the difference.

Conclusion: the quadratic model has the better fit, (lower BIC and AIC), and the difference in fit between the models is significant, (p < .001).

2. Which terms of the quadratic model should be random effects, (allowed to vary between subjects)?

Conclusion: the model with random intercept, slope, and curvature has the best fit, (lowest BIC and AIC). It fits significantly better than the model with random intercept and slope only, (p < .001).

Fitting the random intercepts model

R: lmer(read ∼ cage + I(cage^2) + (1|id), data, REML=FALSE)

Random effects:

Groups Name Variance Std.Dev.

id (Intercept) 0.6796 0.8244

Residual 0.3474 0.5894

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 4.663325 0.062831 284.50 74.22 <2e-16

cage 0.530944 0.008867 674.50 59.88 <2e-16

I(cage^2) -0.048672 0.004073 663.80 -11.95 <2e-16

Predicted growth curves for the average person and two individuals:

[Figure: reading ability against age − 10 years; average growth curve and two parallel subject-specific curves.]

66 / 81

This model has a random intercept only.

The dashed lines show subject-specific predicted growth curves for two individuals. They are parallel because subjects vary in their intercepts only.

Fitting the random intercepts and slopes model

R: lmer(read ∼ cage + I(cage^2) + (cage+I(cage^2)|id), data, REML=FALSE)

Random effects:

Groups Name Variance Std.Dev. Corr

id (Intercept) 0.9302026 0.96447

cage 0.0152528 0.12350 0.80

I(cage^2) 0.0005453 0.02335 -0.95 -0.56

Residual 0.2272376 0.47669

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 4.658715 0.069222 221.10 67.30 <2e-16

cage 0.530094 0.010942 219.30 48.45 <2e-16

I(cage^2) -0.047094 0.003831 341.00 -12.29 <2e-16

Predicted growth curves for the average person and two individuals:

[Figure: reading ability against age − 10 years; average growth curve and two non-parallel subject-specific curves.]

67 / 81

This model has random intercept, slope, and curvature. Subject-specific predicted growth curves are not parallel because intercept, slope, and curvature are allowed to vary between subjects.

The fixed effects are (close to) the same as the random intercept model. The residual variance is lower because more variance is explained. But standard errors are (mostly) a bit larger because there are more parameters to estimate, (fewer degrees-of-freedom).

Conclusions

Is there growth in reading ability?

The expected reading ability score at age 10 is 4.66 (units of reading ability). The average rate of change in reading ability at age 10 is 0.53 units per year. This rate is decelerating: the slope changes by 2 ∗ −0.047 = −0.094 units per year per year. Reading ability is predicted to reach a maximum at age −0.53/(2 ∗ −0.047) + 10 = 15.6 years.

Is there individual variability in growth?

Variance in reading ability at age 10 is 0.93 (squared units of reading ability). Variance in rate of change in reading ability at age 10 is 0.015. Variation in curvature is small. Intercept-slope correlation is positive, indicating children with higher reading ability at age 10 also have a higher rate of change in reading ability at age 10.

68 / 81
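The turning-point quoted above can be computed directly from the fitted fixed effects. A minimal sketch, assuming fit is the quadratic model with random intercept, slope, and curvature fitted earlier:

R: b = fixef(fit) # named fixed effects: (Intercept), cage, I(cage^2)
-b["cage"] / (2 * b["I(cage^2)"]) # turning-point on the centered age scale: 5.6
-b["cage"] / (2 * b["I(cage^2)"]) + 10 # back to years of age: 15.6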

Quadratic growth model with a TIC

Quadratic regression equation for the i’th person with subject-specific TIC effects on the intercept, slope, and curvature:

yij = (β00 + β01zi + u0i) + (β10 + β11zi + u1i)xij + (β20 + β21zi + u2i)x2ij + eij

The three bracketed terms are π0i, π1i, and π2i respectively.

Level-1: yij = π0i + π1i xij + π2i x2ij + eij

Level-2: π0i = β00 + β01zi + u0i

π1i = β10 + β11zi + u1i

π2i = β20 + β21zi + u2i

69 / 81

A TIC can be used to explain some or all of the between-person variation in random effects, depending upon how it enters the model.

The model equations show the TIC (z) enters the level-2 model for each random effect. It enters as a direct effect on the intercept, and as interactions with the slope and curvature.

Fitting the quadratic model with TIC effects

R: lmer(read ∼ (cage + I(cage^2)) * emot + (cage+I(cage^2)|id), data, REML=FALSE)

Random effects:

Groups Name Variance Std.Dev. Corr

id (Intercept) 0.8849179 0.94070

cage 0.0129180 0.11366 0.79

I(cage^2) 0.0005633 0.02373 -0.93 -0.52

Residual 0.2271207 0.47657

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 4.657160a 0.067717 220.90 68.774 < 2e-16

cage 0.530065b 0.010463 219.80 50.659 < 2e-16

I(cage^2) -0.046579c 0.003827 323.30 -12.171 < 2e-16

emot 0.221029d 0.067730 220.60 3.263 0.00128

cage:emot 0.048090e 0.010412 217.20 4.619 6.61e-06

I(cage^2):emot -0.005222f 0.003793 322.40 -1.377 0.16954

[Figure: predicted reading ability against age − 10 years at emotional support z and z + σz; β01 is the separation at age 10.]

a. b. c. β00 β10 β20 Expected reading ability, rate of change, and curvature for the average 10 year old.

d. e. f. β01 β11 β21 Change in β00 β10 and β20 per standard deviation of emot.

70 / 81

The TIC “emot”, (the level of emotional support at home), is specified as a direct effect and as interactions with the slope and curvature of centered age.

The direct effect and interaction with the slope are significant.

The plot shows the predicted growth curve for the average subject with average level of emotional support, (z), and the corresponding curve if that subject were to have one standard deviation greater emotional support.

(β01 is the size of the direct effect of a standard deviation increase in emotional support, when all else is held constant at 0).

Conclusions

Is growth in reading ability related to emotional support at home?

The expected reading ability for children aged 10 with an average level of emotional support is 4.66 units (of reading ability). At age 10 it is growing by 0.53 units per year, with the growth rate decelerating by 2 ∗ −0.046 = −0.092 units per year per year.

Faster growth in reading ability is significantly associated with higher levels of emotional support. At age 10 the difference in reading ability and its growth rate for children with 1 standard deviation greater emotional support is respectively 0.221 units (p = .001) and 0.048 units (p < .001). The deceleration in growth rate is not significantly different for children with greater support.

71 / 81

Segmented (“broken stick”) growth model

The level-1 model for a segmented growth curve with a single knot at x = k

y = β0 + β1x + β2(x − k)+

(x − k)+ = 0 for x ≤ k, and (x − k)+ = x − k for x > k

The growth curve has two straight-line segments:

y = β0 + β1x, for x ≤ k
y = (β0 − β2k) + (β1 + β2)x, for x > k

β0 and β1 are the intercept and slope of the first segment. β2 is the change in slope between the first and second segment.

72 / 81

The segmented model has straight-line segments joined at change-points called “knots”.

The term (x − k)+ is a dummy variable contrived to break the line into two segments. The parameterisation brings out slope differences between successive segments as model coefficients to test.

The first segment (for x ≤ k) has equation: y = β0 + β1x

The second segment (for x > k) has equation: y = β0 + β1x + β2(x − k) = (β0 − β2k) + (β1 + β2)x

Comparing slopes, β2 represents the change in slope between segments.

More change-points can be added by including more dummy variables.
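For example, a sketch of a three-segment model with hypothetical knots at centered ages −2 and 2, (ages 8 and 12), using the plus(x, k) helper defined on the next slide:

R: lmer(read ∼ cage + plus(cage, -2) + plus(cage, 2) + (cage|id), data)

Each plus term contributes the additive change in slope at its knot.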

Fitting the segmented model

R: plus = function(x, k=0) ifelse(x <= k, 0, x - k) # "plus" function for dummy variable

lmer(read ∼ cage + plus(cage) + (cage|id), data)

Random effects:

Groups Name Variance Std.Dev. Corr

id (Intercept) 0.72389 0.8508

cage 0.01587 0.1260 0.85

Residual 0.25413 0.5041

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 4.76321a 0.06645 329.8 71.68 <2e-16

cage 0.71221b 0.01811 651.2 39.32 <2e-16

plus(cage) -0.35491c 0.02985 567.2 -11.89 <2e-16

[Figure: fitted segmented growth curve of reading ability against age − 10 years, with the knot at age 10.]

a. β0 Expected reading ability for the average 10 year old.

b. β1 Average rate of change in reading ability for children aged between 6 and 10.

c. β2 Change in rate of change in reading ability at age 10.

73 / 81

The “plus” function derives the dummy variable (x − k)+.

The intercept is the intercept of the first segment, (β0). cage is the slope of the first segment, (β1). plus(cage) is the additive change in slope between the first and second segments, (β2).

The model specifies the intercept and slope of the first segment as random effects.
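The slope of the second segment is not printed directly; it is the sum of the two slope coefficients. A minimal sketch, assuming fit holds the segmented model above:

R: b = fixef(fit) # (Intercept), cage, plus(cage)
b["cage"] + b["plus(cage)"] # second-segment slope: 0.712 - 0.355 = 0.357 units per year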

Fitting the segmented model with TIC effects

R: plus = function(x, k=0) ifelse(x <= k, 0, x - k) # "plus" function for dummy variable

lmer(read ∼ (cage + plus(cage)) * emot + (cage|id), data)

Random effects:

Groups Name Variance Std.Dev. Corr

id (Intercept) 0.68644 0.8285

cage 0.01347 0.1161 0.86

Residual 0.25536 0.5053

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 4.76159a 0.06519 334.2 73.045 < 2e-16

cage 0.71054b 0.01782 658.8 39.880 < 2e-16

plus(cage) -0.35237c 0.02982 569.3 -11.815 < 2e-16

emot 0.23124d 0.06511 332.3 3.551 0.000439

cage:emot 0.06798e 0.01860 678.6 3.655 0.000278

plus(cage):emot -0.03872f 0.02954 582.1 -1.311 0.190477

[Figure: predicted segmented growth curves of reading ability against age − 10 years at emotional support z and z + σz.]

a. b. c. Expected reading ability, rate of change, and change in rate of change for the average 10 year old.

d. e. f. Change in a., b., and c., per standard deviation of emot.

74 / 81

The TIC (“emot”) enters the model as a direct effect on the intercept of the first segment, and an interaction with the slope of the first segment and the change in slope between segments at the specified change-point, (age 10).

The interaction with the change in slope is not significant, suggesting a standard deviation greater emotional support does not significantly change the growth rate after age 10.

Using a change-point suggests the benefit of emotional support to reading ability is before age 10.

GLMMs

LMMs (Linear Mixed Models) are for continuous normal responses.

GLMMs (Generalized Linear Mixed Models) allow other kinds of response.

- Binary (0/1, yes/no)

- Ordinal (ordered categories)

- Counts

GLMMs generalize LMMs in the same way that GLMs generalize LMs.

1. Non-normal distribution of residual errors. For example binomial instead of normal (gaussian).

2. A link function to map predictions to the response scale. For example logit instead of identity (do nothing).

75 / 81

Longitudinal dataset with a binary response variable

i y x z

1 1 1 0
1 1 2 0
1 1 3 0
1 1 4 0
1 1 5 0
1 1 6 0
... ... ... ...
54 1 1 1
54 1 2 1
54 0 3 1
54 0 4 1
54 0 5 1
54 0 6 1
... ... ... ...

76 / 81

These data come from: http://www.ats.ucla.edu/stat/stata/library/gee.htm.

45 women with post-natal depression were given a patch to wear. For 17 women the patch was a placebo and for 28 it contained oestrogen. Their depression was tested six times (x) at monthly intervals, (“Edinburgh Postnatal Depression Scale”).

The depression score was dichotomised (y) and coded: 0=not depressed, 1=depressed.

z is a TIC coded: 0=placebo (control), 1=oestrogen (treatment).

Does the probability of post-natal depression change over time? Does it change differently if women are treated with oestrogen patches?

Fitting a logistic straight-line model with a TIC effect

R: fit = glmer(y ∼ x * z + (1|i), data, family=binomial(link="logit"))

cbind(Odds=exp(fixef(fit)), coef(summary(fit)))

Random effects:

Groups Name Variance Std.Dev.

i (Intercept) 6.133 2.476

Fixed effects:

Odds Estimate Std. Error z value Pr(>|z|)

(Intercept) 66.5781 4.1984 0.9891 4.2448 0.0000

x 0.5092 -0.6749 0.1992 -3.3877 0.0007

z 0.0894 -2.4148 1.1523 -2.0955 0.0361

x:z 0.8493 -0.1633 0.2510 -0.6506 0.5153

The estimates are additive effects in units on the scale of the link function. With a logit link they are in log odds units.

Exponentiate (anti-log) to see effects in odds units.

77 / 81

The R function for GLMMs is glmer.

We fit a random intercepts, (parallel slopes), model for comparison with GEE using an “exchangeable” correlation structure.

There is no within-person residual variance. The variance of a binomial variable depends directly upon its mean: for a binary response with mean μ, the variance is μ(1 − μ). The model equation is a conditional mean. It carries the variance, so the model does not need a separate error term.

There is a residual “deviance”. It indicates how well the data meet assumptions about how the variance depends upon the mean. Large residual deviance is called “over-dispersion”.

Interpreting multiplicative effects

Additive effects

logOdds(y) = β0 + β1x + β2z + β3xz

= (β0 + β2z) + (β1 + β3z)x

[Figure: log odds against x; straight lines for z=0 (intercept β0, slope β1) and z=1 (intercept β0 + β2, slope β1 + β3).]

Multiplicative effects

Odds(y) = exp(β0 + β1x + β2z + β3xz)

= (eβ0 · eβ2z) · (eβ1 · eβ3z)^x

[Figure: odds against x; curves for z=0 and z=1, starting at eβ0 and eβ0eβ2 with per-unit-x multipliers eβ1 and eβ1eβ3.]

78 / 81

Change in response log odds per unit time is additive.
z = 0: β0 + β1x, x = 0, 1, 2, . . .
z = 1: (β0 + β2) + (β1 + β3)x

Change in response odds per unit time is multiplicative.
z = 0: eβ0 · (eβ1)^x, x = 0, 1, 2, . . .
z = 1: (eβ0 · eβ2) · (eβ1 · eβ3)^x

The additive effect of a unit time increase is the same at any time. Likewise the multiplicative effect of a unit time increase is the same at any time.

The plots of model predictions help interpret effects.

R: exp( predict(fit, expand.grid(x=0:6, z=0:1), re.form=NA, type="link") )

(Intercept) eβ0 is the baseline odds, (when x = 0 and z = 0).
x eβ1 is the odds multiplier per unit x, (when z = 0).
z eβ2 is the multiplier for eβ0 when z = 1.
x:z eβ3 is the multiplier for eβ1 when z = 1.

Each month the odds of depression diminish by:
100*(1 − eβ1) = 49% for the control group (z = 0).
100*(1 − eβ1 · eβ3) = 56.8% for the treatment group (z = 1).

The rate-of-change between groups differed by a factor of 0.85, (eβ3), not significantly different from 1, the factor of no change, (p = 0.515).
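These percentages can be reproduced from the fitted coefficients. A minimal sketch, assuming fit is the glmer model above:

R: b = fixef(fit) # estimates on the log odds scale
100 * (1 - exp(b["x"])) # ~49% monthly reduction in odds, control group
100 * (1 - exp(b["x"] + b["x:z"])) # ~57% monthly reduction, treatment group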

Prediction from GLMM

Prediction on the response scale.

R: predict(fit, expand.grid(x=0:6, z=0:1), re.form=NA, type="response")

For binary outcome data this means probabilities.

[Figure: predicted probability of depression against x for z=0 and z=1.]

79 / 81

The coefficients are difficult to interpret on the probability scale. Effects on response probability per unit time are not constant. They depend upon time in a non-linear way.

But the model-predicted outcome probability can be plotted against time.
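Equivalently, predictions on the link scale can be mapped to probabilities by hand with the inverse logit; a minimal sketch, assuming fit as above:

R: lp = predict(fit, expand.grid(x=0:6, z=0:1), re.form=NA, type="link") # log odds
plogis(lp) # inverse logit, the same as type="response"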

Fitting a logistic model with GEE

R: glmer(y ∼ x + (1|i), data, family=binomial(link="logit"))

glm(y ∼ x, data, family=binomial(link="logit"))

geeglm(y ∼ x, id=i, data, family=binomial(link="logit"), corstr="exchangeable")

GLMM

Estimate Std. Error z value Pr(>|z|)

(Intercept) 2.9407 0.6587 4.4643 0

x -0.7888 0.1382 -5.7076 0

GLM

Estimate Std. Error z value Pr(>|z|)

(Intercept) 1.2811 0.2711 4.7251 0

x -0.4027 0.0739 -5.4524 0

GEE

Estimate Std.err Wald Pr(>|W|)

(Intercept) 1.270 0.2822 20.2475 0

x -0.350 0.0683 26.2626 0

[Figure: predicted probability of depression against x from the GLMM, GLM, and GEE fits.]

80 / 81

GEE (Generalized Estimating Equations) is a method of fitting models. It does not estimate variance components. It fits a regression model directly, but adjusts the standard errors to account for correlations in the data.

Effects and predictions using GEE are similar to GLM, but not the same as GLMM.

Effects and predictions in LMMs and GLMMs have a subject-specific interpretation. This may be a specific individual, or the average of a group in the sample.

In LMMs it is equivalent to a population-averaged interpretation. In other words, representing the average person in the population.

In GLMMs it is not equivalent to a population-averaged interpretation. This is due to mapping the average of random effects through a non-linear link function. Predictions from a GLMM must be interpreted as subject-specific.

GEE provides a method of accounting for the correlations in longitudinal data while also estimating effects and predictions that have a population-averaged interpretation.

Prediction from GEE

R: fit = geeglm(y ∼ x * z, id=i, data=dat, family=binomial(link="logit"), corstr="exchangeable")

cbind(Odds=exp(coef(fit)), coef(summary(fit)))

predict(fit, expand.grid(x=0:6, z=0:1), type="response")

Coefficients:

Odds Estimate Std.err Wald Pr(>|W|)

(Intercept) 8.7776 2.1722 0.5835 13.8608 0.0002

x 0.7080 -0.3453 0.1328 6.7582 0.0093

z 0.2737 -1.2958 0.6766 3.6682 0.0555

x:z 0.9154 -0.0884 0.1650 0.2869 0.5922

[Figure: GEE-predicted probability of depression against x for z=0 and z=1.]

81 / 81

Specify a “working correlation structure” to account for correlation in the data. The options are:

- independence: All covariances between pairs of observations within-subject are 0. Residuals are independent of each other. (This is the OLS assumption, not generally valid for repeated measures.)

- exchangeable: All variances equal, and all covariances equal. (Compound symmetry.)

- ar1: Covariances between pairs of observations within-subject diminish as the time interval between the pair grows.

- unstructured: No constraints. Every covariance is free to be estimated.

Trade-off model fit against degrees-of-freedom. For example “unstructured” gives a good fit but has to estimate more parameters. The more parameters you need to estimate from the same data, the fewer degrees of freedom you have left to do it with.
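Working correlation structures can be compared informally with the QIC criterion; a sketch assuming a recent version of geepack, (which provides a QIC function for geeglm fits):

R: f.ex = geeglm(y ∼ x * z, id=i, data=dat, family=binomial(link="logit"), corstr="exchangeable")
f.ar = geeglm(y ∼ x * z, id=i, data=dat, family=binomial(link="logit"), corstr="ar1")
QIC(f.ex); QIC(f.ar) # smaller QIC indicates the better fit/complexity trade-off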

The z statistic is the estimate divided by its standard error. This is compared against a normal distribution to derive the p-value. The Wald statistic is the same as the z statistic. An alternative form, (used by the geeglm function), is the square of the estimate divided by the square of its standard error. This is compared against a chi-squared distribution.


Recommended