Date posted: 11-May-2015 | Category: Technology | Uploaded by: ben-bolker
Precursors GLMMs Results Conclusions References
Generalized linear mixed models for ecologists: coping with non-normal, spatially and temporally correlated data
Ben Bolker
McMaster University, Departments of Mathematics & Statistics and Biology
30 August 2011
Ben Bolker McMaster University Departments of Mathematics & Statistics and Biology
GLMMs
Outline
1 Precursors: Examples; Definitions; ANOVA vs. (G)LMMs
2 GLMMs: Estimation; Inference
3 Results: Coral symbionts; Glycera; Arabidopsis
4 Conclusions
Examples
Coral protection by symbionts
[Figure: number of blocks (y axis, 0 to 10) by number of predation events (x axis, 0 to 2), one panel per symbiont treatment: none, shrimp, crabs, both]
Environmental stress: Glycera cell survival
[Figure: Glycera cell survival (color scale, 0.0 to 1.0) by H2S concentration (0, 0.03, 0.1, 0.32) and copper concentration (0, 33.3, 66.6, 133.3), in panels by osmolality (Osm = 12.8, 22.4, 32, 41.6, 51.2) and oxygen treatment (normoxia, anoxia)]
Arabidopsis response to fertilization & clipping
[Figure: log(1 + fruit set) (y axis, 0 to 5) for unclipped vs. clipped plants; panels: nutrient level (1, 8); color: genotype]
Definitions
Generalized linear models (GLMs)
non-normal data: binary, binomial, count (Poisson/negative binomial)
non-linearity: log/exponential, logit/logistic: link function L
flexibility via linear predictor: L(expected response) = a + b_i + c·x + …
stable, robust, fast, easy to use
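A minimal base-R sketch of the linear-predictor idea, assuming simulated Poisson counts with one categorical predictor (the b_i term) and one continuous covariate (the c·x term). The variable names and effect sizes are invented for illustration.

```r
# Poisson GLM with a log link: log(expected count) = a + b_i + c * x
set.seed(101)
n <- 200
trt <- factor(rep(c("control", "treated"), each = n / 2))  # categorical predictor (b_i)
x <- runif(n)                                              # continuous covariate (c * x)
eta <- 0.5 + 0.8 * (trt == "treated") + 1.2 * x            # linear predictor (log scale)
y <- rpois(n, lambda = exp(eta))                           # inverse link: mean = exp(eta)

fit_glm <- glm(y ~ trt + x, family = poisson(link = "log"))
coef(fit_glm)  # estimates are on the link (log) scale
```

Exponentiating a coefficient gives a multiplicative effect on the expected count.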
Random vs. fixed effects
Fixed effects (FE): interested in specific levels (“treatments”)
Random effects (RE) [2]: interested in the distribution (“blocks”)
  Experimental
  Temporal, spatial
  Genera, species, genotypes
  Individuals (“repeated measures”)
  Inference on population of blocks
  (blocks randomly selected?)
  (large number of blocks [> 5–7]?)
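A sketch of the practical difference, assuming simulated data from a randomized block design. It uses nlme, which ships with R as a recommended package. With block as a fixed effect, the model estimates one coefficient per block. With block as a random effect, it estimates a single among-block variance.

```r
library(nlme)  # lme(), fixef(), VarCorr()
set.seed(102)
n_block <- 12; n_per <- 6
d <- data.frame(block = factor(rep(seq_len(n_block), each = n_per)),
                trt   = factor(rep(c("A", "B"), times = n_block * n_per / 2)))
block_dev <- rnorm(n_block, sd = 1)                  # true block deviations
d$y <- 2 + 0.5 * (d$trt == "B") + block_dev[d$block] + rnorm(nrow(d), sd = 0.5)

# Block as a fixed effect: interest in the specific levels
fit_fixed <- lm(y ~ trt + block, data = d)
# Block as a random effect: interest in the distribution of blocks
fit_random <- lme(y ~ trt, random = ~ 1 | block, data = d)

length(coef(fit_fixed))  # intercept + trt + (n_block - 1) block contrasts
fixef(fit_random)        # intercept + trt only
VarCorr(fit_random)      # among-block and residual variance components
```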
ANOVA vs. (G)LMMs
Mixed models: classical approach [3]
traditional approach to non-independence
nested, randomized block, split-plot . . .
sum-of-squares decomposition/ANOVA: figure out treatment SSQ/df, error SSQ/df
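A base-R sketch of the classical decomposition for a randomized block design, assuming simulated data with three treatments in six blocks. The `Error(block)` term puts block-to-block variation in its own stratum. The treatment F test then uses the within-block residual SSQ/df as its denominator.

```r
set.seed(103)
d <- expand.grid(trt = factor(c("ctrl", "low", "high")),
                 block = factor(paste0("b", 1:6)))
block_eff <- rnorm(6, sd = 1)                        # block-to-block variation
trt_eff <- c(ctrl = 0, low = 1, high = 2)            # assumed treatment effects
d$y <- 10 + unname(trt_eff[as.character(d$trt)]) +
  block_eff[d$block] + rnorm(nrow(d), sd = 0.7)

# Separate error strata: among blocks, and within blocks
fit_aov <- aov(y ~ trt + Error(block), data = d)
summary(fit_aov)
```

This works cleanly only because the design is balanced with a single random factor. That is the restriction listed for ANOVA below.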
You can use an ANOVA if . . .
data are normal (or can be transformed)
responses are linear
design is (nearly) balanced
simple design (single or nested REs) (not crossed REs: e.g. year effects that apply across all spatial blocks)
no spatial or temporal correlation within blocks
“Modern” mixed models
Data still normal(izable), linear, but unbalanced/crossed/correlated
Balance (dispersion of observations around block mean) with (dispersion of block means around overall average)
Good for large, messy data . . . and when variation is interesting
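The balancing can be written as a shrinkage factor k_j = σ_b² / (σ_b² + σ²/n_j). Here σ_b² is the among-block variance, σ² the within-block variance, and n_j the block size. Each block mean is pulled toward the overall average by 1 − k_j. The sketch below treats both variances as known and uses an unweighted grand mean for simplicity. A fitted mixed model would estimate these quantities instead.

```r
# Shrink block means toward the overall average, assuming known
# among-block (sigma2_b) and within-block (sigma2_e) variances
shrink_block_means <- function(y, block, sigma2_b, sigma2_e) {
  block_means <- tapply(y, block, mean)
  n_j <- tapply(y, block, length)
  grand <- mean(block_means)                   # simplification: unweighted grand mean
  k <- sigma2_b / (sigma2_b + sigma2_e / n_j)  # 0 = pool completely, 1 = no shrinkage
  as.numeric(grand + k * (block_means - grand))
}

y <- c(1, 3, 5, 7)
block <- factor(c("a", "a", "b", "b"))
shrink_block_means(y, block, sigma2_b = 1, sigma2_e = 2)  # 3 5 (raw means: 2 6)
```

Noisy blocks (small n_j or large σ²) are shrunk more. With a very large σ_b², the raw block means come back almost unchanged.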
Shrinkage (Arabidopsis)
[Figure: Arabidopsis block estimates: mean (log) fruit set (y axis, −15 to 3) by genotype (x axis, 0 to 25)]
Shrinkage (sparrows)
[Figure: heterozygosity (y axis, 0.68 to 0.80) vs. log(harmonic mean pop size) (x axis, 2.0 to 4.5); legend (island): Hestmannøy, Sleneset, Gjerøy, Indre Kvarøy, Husøy, Selvær, Ytre Kvarøy, Aldra, Myken, Lovund, Onøy, Nesøy, Lurøy, Sundøy]
GLMMs
Data not normal(izable), nonlinear
Standard distributions (Poisson, binomial etc.)
Specific forms of nonlinearity (exponential, logistic etc.)
Conceptually very similar to LMMs, but harder
Challenges
Small # RE levels (<5–6)
Big data (> 1000 observations)
Spatial/temporal correlation structure (in GLMMs)
Unusual distributions of data (in GLMMs)
Estimation
Penalized quasi-likelihood (PQL) [1]
flexible (e.g. handles spatial/temporal correlations)
least accurate: biased for small samples (low counts per block)
SAS PROC GLIMMIX, R MASS:glmmPQL
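A minimal sketch of a PQL fit with MASS:glmmPQL, assuming simulated Poisson counts with a random intercept per block. MASS and nlme both ship with R. The fitted object is an nlme `lme` fit, so the nlme extractors apply.

```r
library(MASS)  # glmmPQL()
library(nlme)  # fixef(), VarCorr() for the fitted lme object
set.seed(104)
n_block <- 20; n_per <- 10
d <- data.frame(block = factor(rep(seq_len(n_block), each = n_per)),
                x = runif(n_block * n_per))
b <- rnorm(n_block, sd = 0.5)                        # block-level random intercepts
d$y <- rpois(nrow(d), lambda = exp(0.5 + 1.0 * d$x + b[d$block]))

fit_pql <- glmmPQL(y ~ x, random = ~ 1 | block,
                   family = poisson(link = "log"), data = d, verbose = FALSE)
fixef(fit_pql)
VarCorr(fit_pql)
```

Here counts per block are moderate. The small-sample bias noted above is expected to matter more with binary data or low counts per block.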
Laplace and Gauss-Hermite quadrature
more accurate than PQL: speed/accuracy tradeoff
lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder), gamlss.mx:gamlssNP, repeated
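To illustrate what the quadrature approximates, the sketch below computes the marginal likelihood of one block of Poisson counts with a random intercept b ~ N(0, σ²). It integrates the conditional likelihood over b using plain (non-adaptive) Gauss-Hermite quadrature, with nodes from the Golub-Welsch eigenvalue method. This is an illustration of the idea, not how any of the packages above implement it; lme4, for example, uses adaptive quadrature centred on the conditional mode. The data values are invented.

```r
# Gauss-Hermite nodes/weights for weight function exp(-t^2) (Golub-Welsch)
gauss_hermite <- function(n) {
  i <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(i, i + 1)] <- sqrt(i / 2)     # Jacobi matrix of Hermite recurrence
  J[cbind(i + 1, i)] <- sqrt(i / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# Conditional likelihood of one block's counts given random intercept b
cond_lik <- function(b, y, beta0) {
  sapply(b, function(bk) prod(dpois(y, lambda = exp(beta0 + bk))))
}

# Change of variables b = sqrt(2) * sigma * t turns the N(0, sigma^2)
# density into the Gauss-Hermite weight exp(-t^2) / sqrt(pi)
marg_lik_gh <- function(y, beta0, sigma, n_nodes) {
  gh <- gauss_hermite(n_nodes)
  sum(gh$weights * cond_lik(sqrt(2) * sigma * gh$nodes, y, beta0)) / sqrt(pi)
}

y <- c(2, 0, 3, 1); beta0 <- 0.3; sigma <- 0.5
reference <- integrate(function(b) cond_lik(b, y, beta0) * dnorm(b, 0, sigma),
                       lower = -5, upper = 5, rel.tol = 1e-8)$value
sapply(c(2, 5, 10, 30), function(k) marg_lik_gh(y, beta0, sigma, k))  # more nodes: slower, closer
reference
```

Increasing the number of nodes shows the speed/accuracy tradeoff directly. A real fit repeats this integral for every block at every optimizer step.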
Bayesian approaches
usually slow but flexible
best confidence intervals
must specify priors, assess convergence
specialized: glmmAK, MCMCglmm [6], INLA
general: BUGS (glmmBUGS, R2WinBUGS, BRugs, WinBUGS, OpenBUGS, R2jags, rjags, JAGS)
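To show what "specify priors, assess convergence" involves, here is a hand-written random-walk Metropolis sampler for a simple Poisson regression. It is not a GLMM and not how the packages above work internally. The Normal(0, 10²) priors, step size, chain length and burn-in are all arbitrary assumptions. Two chains from different starting values give a crude convergence check. Credible intervals then come straight from the samples.

```r
set.seed(105)
x <- runif(100)
y <- rpois(100, exp(0.5 + 1.0 * x))

# Log posterior: Poisson log-likelihood + independent Normal(0, 10^2) priors
log_post <- function(beta) {
  eta <- beta[1] + beta[2] * x
  sum(dpois(y, exp(eta), log = TRUE)) + sum(dnorm(beta, 0, 10, log = TRUE))
}

metropolis <- function(n_iter, start, step = 0.1) {
  draws <- matrix(NA_real_, n_iter, 2, dimnames = list(NULL, c("a", "b")))
  current <- start
  lp_current <- log_post(current)
  accepted <- 0
  for (it in seq_len(n_iter)) {
    proposal <- current + rnorm(2, sd = step)      # symmetric random-walk proposal
    lp_prop <- log_post(proposal)
    if (log(runif(1)) < lp_prop - lp_current) {    # Metropolis acceptance rule
      current <- proposal; lp_current <- lp_prop; accepted <- accepted + 1
    }
    draws[it, ] <- current
  }
  list(draws = draws, accept_rate = accepted / n_iter)
}

chain1 <- metropolis(5000, start = c(0, 0))
chain2 <- metropolis(5000, start = c(2, -1))
keep <- 1001:5000                                  # discard burn-in
colMeans(chain1$draws[keep, ]); colMeans(chain2$draws[keep, ])  # chains should agree
post <- rbind(chain1$draws[keep, ], chain2$draws[keep, ])
apply(post, 2, quantile, probs = c(0.025, 0.5, 0.975))          # 95% credible intervals
```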
Extensions
Overdispersion: variance > expected from statistical model
  Quasi-likelihood: MASS:glmmPQL; overdispersed distributions (e.g. negative binomial): glmmADMB, gamlss.mx:gamlssNP; observation-level random effects (e.g. lognormal-Poisson): lme4, MCMCglmm
Zero-inflation: overabundance of zeros in a discrete distribution
  zero-inflated models: glmmADMB, MCMCglmm; hurdle models: MCMCglmm
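A quick base-R check for both problems after a Poisson fit, assuming simulated negative binomial counts. Overdispersion is measured as the sum of squared Pearson residuals over the residual df, which should be near 1 if the Poisson variance is adequate. Excess zeros are checked by comparing the observed number of zeros with the number expected from the fitted Poisson means.

```r
set.seed(106)
x <- runif(300)
mu <- exp(1 + 0.5 * x)
y <- rnbinom(300, mu = mu, size = 1)   # negative binomial: variance = mu + mu^2
fit_pois <- glm(y ~ x, family = poisson)

# Pearson chi-square / residual df (about 1 if Poisson variance is adequate)
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
dispersion

# Observed zeros vs. zeros expected under the fitted Poisson means
c(observed = sum(y == 0), expected = sum(dpois(0, fitted(fit_pois))))
```

Here the extra zeros come from overdispersion alone. A zero excess by itself does not distinguish zero-inflation from an overdispersed distribution.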
Inference
Wald tests/CIs
Widely available (e.g. summary())
Assume data set is large/well-behaved
Always approximate, sometimes awful; bad for variance estimates
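A Wald interval is simply estimate ± z × standard error, with the standard error taken from the curvature of the log-likelihood at the estimate. The sketch below uses a simulated logistic regression and checks the hand computation against base R's `confint.default`, which also returns Wald intervals.

```r
set.seed(107)
x <- runif(50)
y <- rbinom(50, size = 1, prob = plogis(-0.5 + 2 * x))
fit_logit <- glm(y ~ x, family = binomial)

est <- coef(fit_logit)
se <- sqrt(diag(vcov(fit_logit)))      # standard errors from the information matrix
z <- qnorm(0.975)
wald <- cbind(lower = est - z * se, upper = est + z * se)
wald
confint.default(fit_logit)             # base R's Wald intervals, for comparison
```

Such intervals are always symmetric on the parameter scale. For a variance component near zero, that symmetry can push the lower limit below zero, one reason they behave badly for variances.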
Likelihood ratio tests
Compare models (easy)
Confidence intervals: expensive and rarely available (lme4a for LMMs)
Asymptotic assumption
LMMs: F tests; estimate “equivalent” denominator df? Approximations [8, 13]: doBy:KRmodcomp
Don’t really know what to do for GLMMs
OK if number of observations ≫ number of parameters and large # of blocks . . .
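A base-R sketch of a likelihood ratio test between two nested Poisson GLMs, assuming simulated data. The statistic is twice the log-likelihood difference. Its p-value relies on the asymptotic χ² distribution with df equal to the number of extra parameters. The hand computation is checked against `anova(..., test = "Chisq")`.

```r
set.seed(108)
x1 <- runif(80); x2 <- runif(80)
y <- rpois(80, exp(0.2 + 1.0 * x1))
m0 <- glm(y ~ x1, family = poisson)        # reduced model
m1 <- glm(y ~ x1 + x2, family = poisson)   # full model

lr_stat <- 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))
df_diff <- attr(logLik(m1), "df") - attr(logLik(m0), "df")
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)  # asymptotic chi-square
c(LR = lr_stat, df = df_diff, p = p_value)
anova(m0, m1, test = "Chisq")              # the same test via base R
```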
Information-theoretic approaches
Above issues apply, but are less well understood [4, 5, 7, 11]: AIC is asymptotic too
For comparing models with different REs, or for AICc, what is p?
“Level of focus” issue: what are you trying to predict? (cAIC) [5, 14, 15]
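For a plain GLM, the parameter count is unambiguous, so AIC = −2 log L + 2k and the small-sample AICc [7] can be computed directly. The sketch below takes k from the `df` attribute of `logLik`, using simulated data. With random effects, what k should be is exactly the open question raised above.

```r
# AICc = AIC + 2k(k + 1) / (n - k - 1)
aicc <- function(fit) {
  ll <- logLik(fit)
  k <- attr(ll, "df")      # number of estimated parameters
  n <- nobs(fit)
  -2 * as.numeric(ll) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
}

set.seed(109)
x <- runif(30)
y <- rpois(30, exp(1 + 0.3 * x))
fit <- glm(y ~ x, family = poisson)
c(AIC = AIC(fit), AICc = aicc(fit))  # correction = 2*2*3/27 here (k = 2, n = 30)
```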
Bootstrapping
1 fit null model to data
2 simulate “data” from null model
3 fit null and working model, compute likelihood difference
4 repeat to estimate null distribution
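simulate/refit methods; bootMer in lme4a (LMMs only!), doBy:PBModComp, or “by hand”:
library(lme4)  # simulate() and refit() methods for fitted mixed models
pboot <- function(m0, m1) {  # m0: null model, m1: working model
  s <- simulate(m0)[[1]]     # one simulated response vector from the null model
  2 * (as.numeric(logLik(refit(m1, s))) - as.numeric(logLik(refit(m0, s))))
}
## fm2 (null) and fm1 (working) are lme4 models fitted earlier
null_dist <- replicate(1000, pboot(fm2, fm1))
obs <- 2 * (as.numeric(logLik(fm1)) - as.numeric(logLik(fm2)))
mean(null_dist >= obs)       # parametric bootstrap p-value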
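The lme4 version needs previously fitted models (fm1, fm2). Here is a self-contained sketch of the same four steps using ordinary Poisson GLMs in base R, with simulated data and an arbitrary 500 replicates. Counting the observed statistic as one replicate when computing the p-value is a common convention and an assumption here.

```r
set.seed(110)
dat <- data.frame(x = runif(60))
dat$y <- rpois(60, exp(0.5 + 0.4 * dat$x))
m0 <- glm(y ~ 1, family = poisson, data = dat)   # 1. null model
m1 <- glm(y ~ x, family = poisson, data = dat)   #    working model

lr <- function(fit0, fit1) 2 * (as.numeric(logLik(fit1)) - as.numeric(logLik(fit0)))
obs <- lr(m0, m1)

pboot_glm <- function(fit0, fit1, data) {
  data$y <- simulate(fit0)[[1]]                  # 2. simulate "data" from null model
  lr(update(fit0, data = data), update(fit1, data = data))  # 3. refit both
}
null_dist <- replicate(500, pboot_glm(m0, m1, dat))         # 4. null distribution
p_boot <- mean(c(null_dist, obs) >= obs)
p_boot
```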
Bayesian inference
CIs, prediction intervals etc. computationally “free” after estimation
Post hoc MCMC sampling: (glmmADMB, R2ADMB, lme4:mcmcsamp)
Bottom line
Large data: computation slow, inference easy
Bayesian computation slow, inference easy
Small data: computation fast
Problems with zero variance (blme), correlations = ±1
Bootstrapping for inference?
Coral symbionts
Coral symbionts: comparison of results
[Figure: regression estimates (−6 to 2) for the terms Symbiont, Crab vs. Shrimp and Added symbiont, compared across methods: GLM (fixed), GLM (pooled), PQL, Laplace, AGQ]
Glycera
Glycera fit comparisons
[Figure: effect on survival (logit scale, −60 to 60) for Osm, Cu, H2S, Anoxia and all their two-, three- and four-way interactions, compared across fits: MCMCglmm, glmer(OD:2), glmer(OD), glmmML, glmer]
Glycera: parametric bootstrap results
[Figure: inferred vs. true p value (both axes 0.001 to 0.5), in panels by predictor (Osm, H2S, Cu, Anoxia); series (variable): normal, t7, t14]
Arabidopsis
Arabidopsis results
[Figure: regression estimates (−1.0 to 1.0) for nutrient8, amdclipped, nutrient8:amdclipped, rack2, statusPetri.Plate, statusTransplant]
What about space and/or time?
if in blocks, no problem (crossed random effects) [10]
test residuals, try to fail to reject the null hypothesis of no autocorrelation
if normal (LMM), corStruct in lme, spdep
otherwise . . . spatcounts, geoRglm, geoBUGS, . . . ???
big data [9]
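A base-R sketch of testing residuals for temporal autocorrelation, assuming a simulated time series with a linear trend and AR(1) errors. For simplicity the model is a linear model rather than a GLMM; for a GLMM one would test, for example, Pearson residuals ordered in time. The Ljung-Box test's null hypothesis is no autocorrelation up to the chosen lag.

```r
set.seed(111)
n <- 200
tt <- seq_len(n)                                            # time index
err <- as.numeric(arima.sim(model = list(ar = 0.8), n = n)) # AR(1) errors
y <- 5 + 0.02 * tt + err
fit_lm <- lm(y ~ tt)
r <- residuals(fit_lm)

acf(r, plot = FALSE)$acf[2]                     # lag-1 autocorrelation of residuals
bt <- Box.test(r, lag = 5, type = "Ljung-Box")  # H0: no autocorrelation up to lag 5
bt$p.value
```

Failing to reject here would not prove independence; with few observations the test has little power.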
Primary tools
Special-purpose:
lme4: multiple/crossed REs, (profiling): fast
MCMCglmm: Bayesian, fairly flexible
glmmADMB: negative binomial, zero-inflated etc.
General-purpose:
AD Model Builder (and interfaces)
BUGS/JAGS (and interfaces)
INLA [12]
Tools are getting better, but still not easy!
Info: http://glmm.wikidot.com
Slides: http://www.slideshare.net/bbolker
Acknowledgements
Funding: NSF, NSERC, NCEAS
Data: Josh Banta and Massimo Pigliucci (Arabidopsis); Adrian Stier and Seabird McKeon (coral symbionts); Courtney Kagan, Jocelynn Ortega, David Julian (Glycera)
Co-authors: Mollie Brooks, Connie Clark, Shane Geange, John Poulsen, Hank Stevens, Jada White
[1] Breslow NE, 2004. In DY Lin & PJ Heagerty, eds., Proceedings of the Second Seattle Symposium in Biostatistics: Analysis of Correlated Data, pp. 1–22. Springer. ISBN 0387208623.
[2] Gelman A, 2005. Annals of Statistics, 33(1):1–53. doi:10.1214/009053604000001048.
[3] Gotelli NJ & Ellison AM, 2004. A Primer of Ecological Statistics. Sinauer, Sunderland, MA.
[4] Greven S, 2008. Non-Standard Problems in Inference for Additive and Linear Mixed Models. Cuvillier Verlag, Göttingen, Germany. ISBN 3867274916. URL http://www.cuvillier.de/flycms/en/html/30/-UickI3zKPS,3cEY=/Buchdetails.html?SID=wVZnpL8f0fbc.
[5] Greven S & Kneib T, 2010. Biometrika, 97(4):773–789. URL http://www.bepress.com/jhubiostat/paper202/.
[6] Hadfield JD, Feb. 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660. URL http://www.jstatsoft.org/v33/i02.
[7] Hurvich CM & Tsai CL, Jun. 1989. Biometrika, 76(2):297–307. doi:10.1093/biomet/76.2.297. URL http://biomet.oxfordjournals.org/content/76/2/297.abstract.
[8] Kenward MG & Roger JH, 1997. Biometrics, 53(3):983–997.
[9] Latimer AM, Banerjee S et al., 2009. Ecology Letters, 12(2):144–154.
[10] Ozgul A, Oli MK et al., Apr. 2009. Ecological Applications: A Publication of the Ecological Society of America, 19(3):786–798. ISSN 1051-0761. URL http://www.ncbi.nlm.nih.gov/pubmed/19425439. PMID: 19425439.
[11] Richards SA, 2005. Ecology, 86(10):2805–2814. doi:10.1890/05-0074.
[12] Rue H, Martino S, & Chopin N, 2009. Journal of the Royal Statistical Society, Series B, 71(2):319–392.
[13] Schaalje G, McBride J, & Fellingham G, 2002. Journal of Agricultural, Biological & Environmental Statistics, 7(4):512–524. URL http://www.ingentaconnect.com/content/asa/jabes/2002/00000007/00000004/art00004.
[14] Spiegelhalter DJ, Best N et al., 2002. Journal of the Royal Statistical Society B, 64:583–640.
[15] Vaida F & Blanchard S, Jun. 2005. Biometrika, 92(2):351–370. doi:10.1093/biomet/92.2.351. URL http://biomet.oxfordjournals.org/cgi/content/abstract/92/2/351.
Extras
Spatial and temporal correlation (R-side effects): MASS:glmmPQL (sort of), GLMMarp, INLA; WinBUGS, AD Model Builder
Additive models: amer, gamm4, mgcv, lmeSplines
Ordinal models: ordinal
Population genetics: pedigreemm, kinship
Survival: coxme, kinship, phmm