Mixed Models
See the book Mixed Effects Models and Extensions in Ecology with R by A.F. Zuur et al.
Steps in Analyzing Data
• Data Exploration
– Examine your data for outliers
• Extreme outliers can sometimes be removed from the analysis
• A transformation may reduce the impact of outliers
– Test for collinearity
• Plot explanatory variables against each other
• Calculate correlation coefficients
• If two explanatory variables are extremely highly correlated (|r| > 0.8 or so), then you may want to remove one
• In general, don't use an explanatory variable that is a combination of other explanatory variables (e.g., tail length, snout-vent length, and total length)
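The collinearity checks above can be sketched in R. This is a minimal illustration with simulated data; the variable names (svl, tail_len, total_len) are made up for the example.

```r
# Simulated morphometric data (hypothetical names, for illustration only)
set.seed(1)
svl       <- rnorm(50, mean = 60, sd = 5)   # snout-vent length
tail_len  <- rnorm(50, mean = 40, sd = 4)   # tail length
total_len <- svl + tail_len                 # total length: a combination of the other two!

dat <- data.frame(svl, tail_len, total_len)

# Plot explanatory variables against each other
pairs(dat)

# Calculate correlation coefficients
round(cor(dat), 2)
# total_len is highly correlated with svl and tail_len because it is their
# sum, so it should not be used alongside them in the same model.
```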
Develop a Modeling Philosophy
• Seven possibilities according to Zuur et al.:
– Start with a model with no interactions; if there are patterns in the residuals, investigate why and add interaction terms to improve the model fit
– Use biological knowledge to choose interaction terms to include
– Apply data exploration to see which interactions might be important
– Identify the explanatory variables of most interest and include the interaction terms for these variables
– Only include the main terms and two-way interaction terms
– Only include higher-order interactions (three-way and higher) if you have a good reason
– Include all interactions by default
• Note: if you include an interaction term, you must also include the main terms
Model Selection
• Not all explanatory variables and interactions will be significant. What do I do with the non-significant ones?
– Keep them all (Whitlock and Schluter; good for simple models)
– Drop them one by one based on hypothesis-testing procedures (drop the least significant term, use anova() to compare models)
– Drop them one by one and use a model selection criterion like AIC or BIC to choose the best model
– Specify a priori chosen models and compare these models with each other
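The drop-one strategies above can be sketched in R. The data here are simulated for illustration; x1 and x2 are hypothetical explanatory variables, of which only x1 truly affects the response.

```r
# Simulated data: only x1 truly matters, x2 is noise
set.seed(2)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 2 + 3 * x1 + rnorm(100)

full    <- lm(y ~ x1 + x2)   # model with the candidate term
reduced <- lm(y ~ x1)        # model with the least significant term dropped

# Compare the nested models with an F-test
anova(reduced, full)

# Or compare with a model selection criterion (smaller AIC is better)
AIC(full, reduced)
```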
AIC and BIC
• Akaike Information Criterion (AIC)
• Bayesian Information Criterion (BIC)
• These techniques are aimed at choosing the best model, even when models vary with respect to the number of parameters
• The AIC is more widely used, and we will use it exclusively in this course
• There are alternatives to the AIC; BIC is one, but there are many others
Akaike Information Criterion (AIC)
• Derived from information theory
• AIC = -2*log(L) + 2K
• L is the likelihood, which equals the probability of the data given the model – this term will be related to how well the model fits
• K is the number of parameters
• AIC is a log likelihood penalized for the number of parameters (because adding parameters allows a better fit, all else being equal)
• AIC is useful for comparing two different models, and the model with the smallest AIC is preferred
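The AIC formula above can be checked by hand in R against the built-in AIC() function. The data are simulated for illustration; for a simple linear regression, K = 3 (intercept, slope, and the residual variance).

```r
# Simulated simple regression
set.seed(3)
x <- rnorm(40)
y <- 1 + 2 * x + rnorm(40)
m <- lm(y ~ x)

K      <- attr(logLik(m), "df")             # number of parameters (here 3)
manual <- -2 * as.numeric(logLik(m)) + 2 * K # AIC = -2*log(L) + 2K

manual
AIC(m)   # matches the manual calculation
```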
Model Validation
• After you choose and fit your model, check that it fits correctly. For a linear regression: – Plot residuals against fitted values to assess homogeneity
– Examine the histogram of residuals to check for normality
– Plot residuals against each explanatory variable – there should be no obvious patterns
– Plot residuals against explanatory variables you did not include in the model – if you see a pattern, then you may want to consider including this variable
– Look for unduly influential data points (outliers) and see how much they are influencing the results (by running the model with and without them)
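The validation steps above can be sketched for a fitted linear regression. The data are simulated for illustration, and the Cook's distance cutoff of 4/n used to flag influential points is one common rule of thumb, not the only choice.

```r
# Simulated linear regression
set.seed(4)
x <- runif(80, 0, 10)
y <- 5 + 1.5 * x + rnorm(80)
m <- lm(y ~ x)

res <- resid(m)
plot(fitted(m), res)   # homogeneity: no fan or funnel shape
hist(res)              # normality: roughly bell-shaped
plot(x, res)           # no pattern against each explanatory variable

# Influence: flag points by Cook's distance, then refit without them
# and compare the coefficients
infl <- which(cooks.distance(m) > 4 / length(y))
coef(m)
if (length(infl) > 0) coef(update(m, subset = -infl)) else coef(m)
```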
Some Rules of Thumb Moving into Generalized Models
• Generalized models are designed to overcome some of the shortcomings of linear models – if your data do not have these shortcomings, you can stick with linear models
• Always try to use the simplest model that adequately fits the data in light of the biological question – more complex models will become more and more difficult to interpret
• A problem may have more than one “correct” solution
• We will be introduced to these models, but we will not be able to cover every possible approach in great detail
Problem – My Data Are Not Linear
[Figure: scatterplot of Bioluminescence (0-50) against Depth (1000-5000 m); the relationship is clearly nonlinear]
Generalized Additive Models
• Linear Model: Yi = α + βXi + εi
• Generalized Additive Model: Yi = α + f(Xi) + εi
f(Xi) is a smoothing curve estimated by a LOESS (local regression) smoother or by splines (piecewise polynomial functions), depending on the package
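Both kinds of smoother mentioned above are available in base R. This is a minimal sketch on simulated data; the sine-shaped relationship is made up for illustration.

```r
# Simulated nonlinear relationship
set.seed(5)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)

lo <- loess(y ~ x)          # LOESS: local regression
sp <- smooth.spline(x, y)   # smoothing spline: piecewise polynomials

plot(x, y)
lines(x, predict(lo), col = "red")   # LOESS fit
lines(sp, col = "blue")              # spline fit
```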
Fitting a GAM
[Figure: panels of Bioluminescence (0-50) against Depth (1000-5000 m) showing the data and the fitted GAM smoother]
Code to Fit GAM

setwd("~/Rexamples/Week11")
ISIT <- read.csv("ISIT.csv")
op <- par(mfrow = c(2,2), mar = c(5,4,1,2))
Sources16 <- ISIT$Sources[ISIT$Station == 16]
Depth16 <- ISIT$SampleDepth[ISIT$Station == 16]
plot(Depth16, Sources16, type = "p", xlab = "Depth", ylab = "Bioluminescence")
library(mgcv)
M3 <- gam(Sources16 ~ s(Depth16, fx = FALSE, k = -1, bs = "cr"))
plot(M3, se = TRUE, xlab = "Depth", ylab = "Bioluminescence")
# s() means to use a smoother
# fx=FALSE, k=-1 tells it to use cross-validation to determine the amount of smoothing
# bs="cr" tells it to use a cubic regression spline
GAM graphs
[Figure: Bioluminescence (0-50) against Depth (1000-5000 m): the raw data and the fitted GAM smoother]
Graph with Error Bars

[Figure: Bioluminescence (0-50) against Depth (1000-5000 m) with the fitted GAM curve (solid line) and approximate 95% error bands at ±2 SE (dashed lines)]
M3pred <- predict(M3, se = TRUE, type = "response")
plot(Depth16, Sources16, type = "p", xlab = "Depth", ylab = "Bioluminescence")
I1 <- order(Depth16)
lines(Depth16[I1], M3pred$fit[I1], lty = 1)
lines(Depth16[I1], M3pred$fit[I1] + 2 * M3pred$se[I1], lty = 2)
lines(Depth16[I1], M3pred$fit[I1] - 2 * M3pred$se[I1], lty = 2)
predict() produces predicted values from the model
I1 <- order(Depth16) is used just to order the observations from smallest to largest depth so the line doesn't zig-zag all over
From the predictions: fit gives the expected y for a given x, and se gives the standard error for a given x. Here 2*se is used as an approximate 95% CI
Using GAM for Hypothesis Testing
• Multiple smoothers can be included in the same model
• Hybrid models, with both smoothers and linear or categorical explanatory variables, can also be fit
Hybrid GAM Example
• Example: bioluminescence data from two different locations
• Data: A measure of bioluminescence as a function of depth from each of two places
• The null hypothesis is that the relationship between depth and bioluminescence is the same in both places
Bioluminescence Example
[Figure: two panels, Station 8 and Station 13, plotting Sources (0-40) against Depth (500-3000 m)]
GAM Model
• The model:
– Sourcesi = α + f(Depthi) + factor(Stationi) + εi
• Depth is fit as a smoothed function
• Station is fit as a factor
• The error is normally distributed, N(0,σ2)
GAM Code
library(mgcv)
M4 <- gam(So ~ s(De) + factor(ID), subset = I1)
summary(M4)
anova(M4)
Using a smoother for Depth (De)
ID (which is the name of the station/location) is a factor
summary() output

Family: gaussian
Link function: identity

Formula:
So ~ s(De) + factor(ID)

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    19.198      1.054  18.207  < 2e-16 ***
factor(ID)13  -12.296      1.397  -8.801 7.59e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
        edf Ref.df     F  p-value
s(De) 4.849  5.904 14.77 7.08e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.695   Deviance explained = 71.9%
GCV = 38.802  Scale est. = 35.259  n = 75
Factor is significant, but better to test this with anova()
Smoother is significant, and the number of df provides an indication of how much smoothing was imposed
About 72% of the variation is explained by the model
anova() output
Family: gaussian
Link function: identity

Formula:
So ~ s(De) + factor(ID)

Parametric Terms:
           df     F  p-value
factor(ID)  1 77.46 7.59e-13

Approximate significance of smooth terms:
        edf Ref.df     F  p-value
s(De) 4.849  5.904 14.77 7.08e-12
Here the output is the same as summary because the factor only has two levels. If the factor had more than two levels, anova() would test all levels simultaneously and give an overall p-value.
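To see the distinction, here is a hypothetical sketch with a three-level factor on simulated data (the station names A, B, C and the generating model are made up for illustration): summary() reports one coefficient per non-reference level, while anova() gives a single overall test for the factor.

```r
library(mgcv)

# Simulated data with a three-level station factor
set.seed(6)
station <- factor(rep(c("A", "B", "C"), each = 40))
depth   <- runif(120, 500, 3000)
sources <- 10 - 0.002 * depth +
  c(A = 0, B = 3, C = 6)[as.character(station)] + rnorm(120)

m <- gam(sources ~ s(depth) + station)
summary(m)  # two parametric coefficients: stationB and stationC
anova(m)    # one overall test for station (2 df)
```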
Visualize the Results
[Figure: vis.gam() perspective plot of the linear predictor as a function of De and ID]
> par(mar = c(2,2,2,2))
> vis.gam(M4, theta = 120, color = "heat")
Note that the lines are parallel
Other Validation Steps
[Figure: gam.check(M4) output, four panels: QQ plot of deviance residuals against theoretical quantiles, residuals against the linear predictor, histogram of residuals, and response against fitted values]
> gam.check(M4)
Left column: checks for normality of residuals
Upper right: checks for heterogeneity (heteroscedasticity)
Lower right: ideally should be a straight line
Interaction
• Add an interaction term by adding a second smoother for a subset of the data (say, only station 13)
• This second smoother will be compared with the smoother from the overall dataset
> M5 <- gam(So ~ s(De) + s(De, by = as.numeric(ID==13)) + factor(ID), subset = I1)
> anova(M5)
Interaction

Family: gaussian
Link function: identity

Formula:
So ~ s(De) + s(De, by = as.numeric(ID == 13)) + factor(ID)

Parametric Terms:
           df     F p-value
factor(ID)  1 2.374   0.129

Approximate significance of smooth terms:
                             edf Ref.df      F p-value
s(De)                      8.073  8.608 101.88  <2e-16
s(De):as.numeric(ID == 13) 7.196  8.163  52.93  <2e-16
The interaction is significant, so the relationship between depth and bioluminescence is different between these two stations.
gam.check(M5)
[Figure: gam.check(M5) output, four panels: QQ plot of deviance residuals, residuals against the linear predictor, histogram of residuals, and response against fitted values]
Which Model is Better?
• Examine model validation plots from gam.check(): M5 better
• summary() shows that the second model explains 96.8% of the deviance: M5 better
• Is the interaction significant? Yes: M5 better
• Formally compare the models using AIC
• Use an F-test to compare the models (if they are simple nested models)
Obtaining and Comparing AIC
• AIC(M4)
488.56
• AIC(M5)
345.26
• The model with the interaction term has a much smaller AIC, so it’s the preferred model
Using an F-test
• > anova(M4, M5, test = "F")

Analysis of Deviance Table

Model 1: So ~ s(De) + factor(ID)
Model 2: So ~ s(De) + s(De, by = as.numeric(ID == 13)) + factor(ID)
  Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
1    68.151    2402.90
2    58.231     272.94 9.9198     2130 45.809 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Summary
• GAMs can overcome a major limitation of linear models by fitting a nonlinear function to the relationship between an explanatory variable and the response
• The function is fit using a smoothing algorithm. Many such algorithms exist; we have used splines, which fit local polynomial functions and join them smoothly at breakpoints (knots)
• The p-value is approximate for smoothing splines, so if the p-value is in the range of 0.01-0.10, it should be interpreted with caution.
• Some of the problems for linear regression are also problems for GAMs. The most important problems are non-independence of observations, heterogeneity, and nested data (so technically, the example is not entirely appropriately analyzed here).