GLMs and extensions in R

Description:
Talk on generalized linear models in R for the inaugural meeting of the Greater Toronto
Page 1: GLMs and extensions in R

Generalized linear models, and extensions, in R

Ben Bolker

Departments of Mathematics & Statistics and Biology, McMaster University

7 January 2011

Ben Bolker (McMaster University) GLMs in R 7 January 2011 1 / 25

Page 2: GLMs and extensions in R

1 Introduction

2 Example

3 Challenges, tricks, extensions

4 (Extended examples)


Page 3: GLMs and extensions in R

What are generalized linear models?

Modeling framework to solve two common statistical problems:

Non-normal data

Non-linearity (continuous predictors)

... a superset of, and often confused with, “general” linear models (i.e. ANOVA/ANCOVA/regression: SAS PROC GLM)


Page 4: GLMs and extensions in R

GLMs: technical details

Constraints:

Distributions from the exponential family (Normal, Poisson, binomial, Gamma, inverse Gaussian)

Invertible nonlinearities, i.e. there exists a link function that would make the relationship linear (log, logit, probit, inverse, square root, “cauchit”, ...)

Efficient, stable algorithm: iteratively re-weighted least squares (IRLS) / Fisher scoring

standard methods (methods(class="glm")): coef, summary, plot, predict, residuals, vcov, profile, update, confint, simulate, anova, add1/drop1, logLik, AIC, ...

logistic and Poisson regression probably make up 99% of GLMs . . .
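
To make the IRLS idea concrete, here is a minimal sketch for logistic regression (binomial family, logit link) on simulated data. The data, the variable names and the convergence tolerance are assumptions made for illustration; glm() itself does this work through glm.fit, with more careful handling of edge cases.

```r
## Minimal IRLS sketch for a logistic GLM (illustrative only)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * x))  # simulated 0/1 response
X <- cbind(1, x)                                           # design matrix with intercept

beta <- c(0, 0)                                  # starting values
for (i in 1:25) {
  eta <- drop(X %*% beta)                        # linear predictor
  mu  <- plogis(eta)                             # inverse logit link
  w   <- mu * (1 - mu)                           # working weights for the logit link
  z   <- eta + (y - mu) / w                      # working response
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))  # weighted LS step
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(y ~ x, family = binomial)
rbind(irls = beta, glm = coef(fit))              # the two rows should agree closely
```

Each pass is an ordinary weighted least-squares fit of the working response on the predictors, which is why the algorithm is fast and usually stable.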


Page 5: GLMs and extensions in R

Google scholar scraping

[Figure: dot plot of Google Scholar hits (Ghits) per search term, log scale from 10^4 to 10^6]

Search term                  Ghits
binomial+regression          13500
generalized+linear+model     28700
Poisson+regression           39300
logistic+regression         580000


Page 6: GLMs and extensions in R

Example: reed frog predation data

[Figure: fraction killed (0 to 1) vs. initial density (20 to 100)]

Vonesh and Bolker (2005):

> library(emdbook)
> data(ReedfrogFuncresp)
> ## response is a proportion; weights gives the number of trials per row
> glm1 <- glm(Killed/Initial ~ Initial,
              weights=Initial,
              family=binomial,
              data=ReedfrogFuncresp)
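
A binomial GLM can be specified either as a proportion with weights equal to the number of trials, or as a two-column matrix of successes and failures. The sketch below checks that the two forms give the same fit. It uses simulated data with assumed coefficients rather than the reed frog data, so that it runs without emdbook.

```r
## Two equivalent ways to specify a binomial response (simulated data)
set.seed(42)
d <- data.frame(Initial = rep(c(5, 10, 25, 50), each = 4))
d$Killed <- rbinom(nrow(d), size = d$Initial,
                   prob = plogis(-0.1 - 0.01 * d$Initial))

## (1) proportion killed, with the number of trials as weights
fit_prop  <- glm(Killed / Initial ~ Initial, weights = Initial,
                 family = binomial, data = d)
## (2) two-column response: successes, failures
fit_cbind <- glm(cbind(Killed, Initial - Killed) ~ Initial,
                 family = binomial, data = d)

all.equal(coef(fit_prop), coef(fit_cbind))          # same estimates
all.equal(deviance(fit_prop), deviance(fit_cbind))  # same deviance
```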


Page 7: GLMs and extensions in R

Summary

> summary(glm1)

Call:
glm(formula = Killed/Initial ~ Initial, family = binomial, data = ReedfrogFuncresp,
    weights = Initial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.4132  -0.7275   0.4347   1.0120   1.8172

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.094563   0.188952   -0.50  0.61675
Initial     -0.008416   0.002697   -3.12  0.00181 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 47.518  on 15  degrees of freedom
Residual deviance: 37.717  on 14  degrees of freedom
AIC: 98.639

Number of Fisher Scoring iterations: 4


Page 8: GLMs and extensions in R

Diagnostics

[Figure: the four plot(glm1) diagnostic panels: Residuals vs Fitted (residuals vs. predicted values), Normal Q-Q (std. deviance resid. vs. theoretical quantiles), Scale-Location (vs. predicted values), and Residuals vs Leverage (std. Pearson resid. vs. leverage, with Cook's distance contours at 0.5 and 1); labelled points include observations 11, 13 and 16]

diagnostics inherit from plot.lm

overdispersion: residual deviance $\approx \chi^2_{n-p}$ (Venables and Ripley, 2002, p. 209): sum(residuals(glm1, type="pearson")^2) = 34.3: $p \ll 0.05$
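
The overdispersion check can be sketched as follows: compare the Pearson chi-square statistic with a $\chi^2$ distribution on the residual degrees of freedom, and divide it by those degrees of freedom to estimate the dispersion. The data are simulated with extra logit-scale noise (an assumption for illustration, not the reed frog data).

```r
## Overdispersion check for a binomial GLM (simulated, deliberately overdispersed data)
set.seed(101)
Initial <- rep(c(5, 10, 15, 20, 25, 35, 50, 75), each = 2)
n <- length(Initial)                                 # 16 observations
p <- plogis(-0.1 - 0.008 * Initial + rnorm(n, sd = 0.7))  # extra-binomial variation
Killed <- rbinom(n, size = Initial, prob = p)

fit <- glm(cbind(Killed, Initial - Killed) ~ Initial, family = binomial)

X2      <- sum(residuals(fit, type = "pearson")^2)   # Pearson chi-square statistic
rdf     <- df.residual(fit)                          # n - p
p_value <- pchisq(X2, df = rdf, lower.tail = FALSE)  # small p suggests overdispersion
phi_hat <- X2 / rdf                                  # estimated dispersion
c(X2 = X2, df = rdf, p = p_value, phi = phi_hat)
```

For the slide's model the same arithmetic gives 34.3 / 14, about 2.45.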


Page 9: GLMs and extensions in R

Inference

Coefficients: may be hard to communicate (reflect differences on the scale of the linear predictor, e.g. logit/log-odds differences)

Wald statistics: beware the Hauck-Donner effect (Venables and Ripley, 2002, p. 198). Wald CI of slope: stats:::confint.lm(glm1) (-0.0142,-0.0026)

Likelihood ratio test, via anova:

> anova(glm1,test="Chisq") ## OR
> glm0 <- update(glm1, . ~ . - Initial) ## drop Initial, keep the intercept
> anova(glm1,glm0,test="Chisq")

Likelihood profiles (via MASS::profile.glm), profile confidence intervals: MASS:::confint.glm(glm1) (-0.0137,-0.0031). In R 4.4.0 and later these glm methods are provided by stats rather than MASS; confint(glm1) dispatches to the profile method in either case.
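
The three approaches can be compared side by side. This sketch uses simulated data with assumed coefficients. confint.default() gives the normal-approximation Wald interval and confint() the profile interval (on older R versions the latter needs MASS to be installed).

```r
## Wald CI, profile CI and likelihood ratio test for a binomial GLM (simulated data)
set.seed(7)
d <- data.frame(Initial = rep(c(5, 10, 25, 50, 75, 100), each = 3))
d$Killed <- rbinom(nrow(d), size = d$Initial,
                   prob = plogis(-0.1 - 0.01 * d$Initial))

fit1 <- glm(cbind(Killed, Initial - Killed) ~ Initial, family = binomial, data = d)
fit0 <- update(fit1, . ~ . - Initial)                 # intercept-only model

wald_ci <- confint.default(fit1)["Initial", ]         # estimate +/- 1.96 * SE
prof_ci <- suppressMessages(confint(fit1))["Initial", ]  # profile likelihood interval

lrt   <- anova(fit0, fit1, test = "Chisq")            # likelihood ratio test
lrt_p <- pchisq(deviance(fit0) - deviance(fit1), df = 1, lower.tail = FALSE)

rbind(wald = wald_ci, profile = prof_ci)
lrt
```

The p-value in the anova table is the upper tail of a chi-square distribution evaluated at the drop in deviance, which is what lrt_p computes by hand.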


Page 10: GLMs and extensions in R

Estimation issues

Convergence difficulties, especially with non-standard links: set starting values, center/scale variables (?)

Complete separation: brglm, logistf, arm (bayesglm)

Big data: biglm (bigglm)

Many predictors (penalized regression): glmnet, glmpath, penalized (Machine learning task view)
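
As a small illustration of complete separation, the toy data below (an assumption, chosen so that the predictor splits the outcome perfectly) make the maximum likelihood estimate of the slope infinite. Plain glm() stops at a large finite value with an even larger standard error, so the Wald z statistic is close to zero. The packages listed above are the usual remedies and are not called here.

```r
## Complete separation: y is 0 for x <= 5 and 1 for x > 5
x <- 1:10
y <- as.integer(x > 5)

## glm() warns that fitted probabilities are numerically 0 or 1
fit_sep <- suppressWarnings(glm(y ~ x, family = binomial))

est <- coef(fit_sep)["x"]
se  <- summary(fit_sep)$coefficients["x", "Std. Error"]
c(estimate = unname(est), std.error = se, z = unname(est) / se)
```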


Page 11: GLMs and extensions in R

Tricks (within GLM framework)

non-standard link functions:

fitting hyperbolic models of predator attack rates (Michaelis-Menten) via binomial/inverse link (http://emdbolker.wikidot.com/voneshglm)

exponential survivorship models via binomial/log link (Strong et al., 1999; Tiwari et al., 2006)

Gaussian family with log link: fit exponential growth models with constant variance

subtleties with Gamma GLMs and dispersion parameter: V&R MASS online complements, Paul Johnson’s notes

offsets: variation in sampling area/intensity (e.g. strict proportionality)
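
A sketch of the offset trick for a Poisson model: if expected counts are strictly proportional to the sampled area, log(area) enters the linear predictor with its coefficient fixed at 1. The data, the true coefficients (0.3 and 0.5) and the variable names are assumptions for illustration.

```r
## Poisson regression with an offset for sampling area (simulated data)
set.seed(3)
n     <- 500
area  <- runif(n, 0.5, 5)                       # sampling area varies among observations
x     <- rnorm(n)
count <- rpois(n, lambda = area * exp(0.3 + 0.5 * x))  # counts proportional to area

## offset(log(area)) fixes the coefficient of log(area) at 1,
## so the remaining coefficients describe density (count per unit area)
fit_off <- glm(count ~ x + offset(log(area)), family = poisson)
coef(fit_off)                                   # should be near 0.3 and 0.5
```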


Page 12: GLMs and extensions in R

Overdispersion

Quasilikelihood models:

> glmQ <- update(glm1,family="quasibinomial")
> anova(glmQ,test="F")

(φ̂ = 2.45). No likelihood: qAIC requires some contortions

extended GLMs

negative binomial: MASS (glm.nb)

beta-binomial: aod (betabin), gnlm (gnlr), VGAM (vglm), bbmle (mle2)

GLMMs: lognormal-Poisson, logit-normal-binomial

robust estimation (lmtest, sandwich):

> lmtest::coeftest(glm1,vcov=sandwich::sandwich)

See also the vignette for the pscl package.
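
What the quasi-likelihood fit changes can be seen directly: the coefficients are identical to the binomial fit, and the standard errors are inflated by the square root of the estimated dispersion. This sketch uses simulated overdispersed data (an assumption), not the reed frog data.

```r
## Quasibinomial vs. binomial: same estimates, standard errors scaled by sqrt(phi)
set.seed(11)
d <- data.frame(Initial = rep(c(5, 10, 25, 50, 75, 100), each = 3))
d$Killed <- rbinom(nrow(d), size = d$Initial,
                   prob = plogis(-0.1 - 0.01 * d$Initial +
                                 rnorm(nrow(d), sd = 0.5)))  # extra variation

fit_b <- glm(cbind(Killed, Initial - Killed) ~ Initial, family = binomial, data = d)
fit_q <- update(fit_b, family = quasibinomial)

phi_hat <- summary(fit_q)$dispersion                   # Pearson X^2 / residual df
se_b <- summary(fit_b)$coefficients[, "Std. Error"]
se_q <- summary(fit_q)$coefficients[, "Std. Error"]

all.equal(coef(fit_b), coef(fit_q))                    # identical point estimates
all.equal(se_q, se_b * sqrt(phi_hat))                  # SEs inflated by sqrt(phi_hat)
```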


Page 13: GLMs and extensions in R

Extensions

Generalized additive models (Wood, 2006): mgcv, gamlss

Zero-inflated/altered/hurdle models: pscl, VGAM

Beta regression: betareg

Generalized regression models: bbmle, VGAM, gnlm

Random effects (generalized linear mixed models): lme4 and other packages (http://glmm.wikidot.com/faq)


Page 14: GLMs and extensions in R

References

Strong, D.R., Whipple, A.V., et al., 1999. Ecology, 80:2750–2761.

Tiwari, M., Bjorndal, K.A., et al., 2006. Marine Ecology Progress Series, 326:283–293.

Venables, W. and Ripley, B.D., 2002. Modern Applied Statistics with S. Springer, New York, 4th edition.

Vonesh, J.R. and Bolker, B.M., 2005. Ecology, 86(6):1580–1591.

Wood, S.N., 2006. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC.


Page 15: GLMs and extensions in R

Basic ggplot code

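> library(ggplot2)
> ## recent ggplot2: qplot() is deprecated and family= goes in method.args
> ggplot(ReedfrogFuncresp, aes(Initial, Killed/Initial)) +
      geom_point() +
      geom_smooth(method="glm", method.args=list(family=binomial),
                  aes(weight=Initial))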


Page 16: GLMs and extensions in R

Confidence intervals on # killed, by hand

> pframe <- data.frame(Initial=1:100)
> ## predictions and standard errors on the link (logit) scale
> pp <- predict(glm1,newdata=pframe,se.fit=TRUE)
> ## back-transform fit and +/- 1.96 SE to the probability scale
> pmat <- with(pp,plogis(cbind(fit,
                               fit-1.96*se.fit,
                               fit+1.96*se.fit)))
> par(bty="l",las=1)
> with(ReedfrogFuncresp,plot(Initial,Killed/Initial,
                             xlim=c(0,100),ylim=c(0,1),
                             pch=16))
> matlines(pframe$Initial,pmat,lty=c(1,2,2),col=1,type="l")


Page 17: GLMs and extensions in R

Prediction intervals

[Figure: Killed/Initial (0 to 1) vs. Initial (0 to 100), observed and simulated points]

> simhack <- function(params) {
      glmnew <- glm1
      glmnew$coefficients <- params
      ## simulates on PROBABILITY scale
      simulate(glmnew)[[1]]
  }
> set.seed(101)
> params <- MASS::mvrnorm(1000,mu=coef(glm1),
                          Sigma=vcov(glm1))
> sims <- apply(params,1,simhack)
> qmat <- t(apply(sims,1,quantile,
                  c(0.5,0.025,0.975)))

(Constructing the simulated values at Initial densities from 1 to 100 is a bit more work; ideally all simulate methods would have newdata and newparam arguments ...)
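
One way to build prediction intervals at new densities is to skip simulate() and do both steps by hand: draw coefficient vectors from their approximate sampling distribution, then draw binomial outcomes at each new density. This is a sketch on simulated data with assumed coefficients, not the slides' own approach; the names newd, prob and sim_killed are illustrative.

```r
## Prediction intervals at new Initial values, combining parameter
## uncertainty and binomial sampling variation (simulated data)
set.seed(101)
d <- data.frame(Initial = rep(c(5, 10, 15, 20, 25, 35, 50, 75), each = 2))
d$Killed <- rbinom(nrow(d), size = d$Initial,
                   prob = plogis(-0.1 - 0.008 * d$Initial))
fit <- glm(cbind(Killed, Initial - Killed) ~ Initial, family = binomial, data = d)

newd <- data.frame(Initial = 1:100)
X    <- model.matrix(~ Initial, data = newd)            # 100 x 2 design matrix

## 1000 draws from the approximate sampling distribution of the coefficients
params <- MASS::mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))
prob   <- plogis(X %*% t(params))                       # 100 x 1000 probabilities

## one binomial draw per (density, parameter draw); size recycles down columns
sim_killed <- matrix(rbinom(length(prob), size = newd$Initial, prob = prob),
                     nrow = nrow(prob))

## median and 95% interval of the simulated fraction killed at each density
qmat <- t(apply(sim_killed / newd$Initial, 1, quantile,
                c(0.5, 0.025, 0.975)))
head(qmat)
```

The result has the same layout as qmat above (median, lower, upper), so it can be drawn with matlines(newd$Initial, qmat, ...).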


Page 18: GLMs and extensions in R

Alternative display (display, coefplot from arm package)

[Figure: coefplot(glm1): estimate and interval for Initial, x-axis from -0.015 to 0.000]

> display(glm1)
glm(formula = Killed/Initial ~ Initial, family = binomial, data = ReedfrogFuncresp,
    weights = Initial)
            coef.est coef.se
(Intercept) -0.09     0.19
Initial     -0.01     0.00
---
  n = 16, k = 2
  residual deviance = 37.7, null deviance = 47.5 (difference = 9.8)


Page 19: GLMs and extensions in R

Beta-binomial with aod

> library(aod)
> glmBB1 <- betabin(cbind(Killed, Initial-Killed)~Initial,
                    random=~1,
                    data=ReedfrogFuncresp)


Page 20: GLMs and extensions in R

Beta-binomial with bbmle

> library(bbmle)
> glmBB3 <- mle2(Killed~dbetabinom(prob=plogis(logitp),
                                   theta=exp(logtheta),size=Initial),
                 parameters=list(logitp~Initial),
                 data=ReedfrogFuncresp,
                 start=list(logitp=0,logtheta=0))


Page 21: GLMs and extensions in R

Beta-binomial with VGAM

> library(VGAM)
> glmBB4 <- vglm(cbind(Killed,Initial-Killed)~Initial,
                 betabinomial,
                 data=ReedfrogFuncresp)
> coef(glmBB4,matrix=TRUE)


Page 22: GLMs and extensions in R

Beta-binomial with gnlm

> library(gnlm)
> attach(ReedfrogFuncresp) ## no data= argument!
> glmBB2 <- gnlr(cbind(Killed,Initial-Killed),
                 dist="beta binomial",
                 pmu=c(0,0),pshape=0,
                 mu=function(p,linear) plogis(linear),
                 linear=~Initial)
> detach(ReedfrogFuncresp)
> detach("package:gnlm")
> detach("package:rmutil")


Page 23: GLMs and extensions in R

Logit-normal-binomial with lme4

> library(lme4)
> ## one random-effect level per observation
> ReedfrogFuncresp$ID <- 1:nrow(ReedfrogFuncresp)
> glmLNP <- glmer(cbind(Killed,Initial-Killed)~Initial+(1|ID),
                  family=binomial,
                  data=ReedfrogFuncresp)
> summary(glmLNP)


Page 24: GLMs and extensions in R

Alternate link functions for reed frog data

[Figure: fraction killed (0 to 1) vs. initial density (20 to 100), with fits from alternate link functions]


Page 25: GLMs and extensions in R

Comparing overdispersion estimates

[Figure: estimated initial density effect (x-axis, -0.015 to 0.000) by model (y-axis); models shown: binomial Wald, binomial profile, q-binom Wald, sandwich, beta-binomial, LN-binomial]


