Post on 05-Apr-2018
transcript
7/31/2019 Generalized Linear Models-1
Generalized Linear Models
Generalized linear models: Exponential family
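In generalized linear models the response is assumed to follow a probability distribution from the exponential family:

$$f(y \mid \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$$

where $\theta$ is the canonical parameter (it determines the link function), $b(\theta)$ determines the moments, and $a(\phi)$ is the dispersion.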
Generalized linear models: Exponential family
Normal distribution has the form:
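$$f(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}} = e^{-\frac{y^2 - 2y\mu + \mu^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)} = e^{\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)} = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}$$

$$\theta = \mu, \quad b(\theta) = \frac{\mu^2}{2}, \quad a(\phi) = \sigma^2, \quad c(y,\phi) = -\frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)$$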
Generalized linear models: Exponential family
Poisson Distribution has the form:
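$$f(y \mid \mu) = \frac{e^{-\mu}\mu^{y}}{y!} = e^{y\ln(\mu) - \mu - \ln(y!)} = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}$$

$$\theta = \ln(\mu), \quad b(\theta) = \mu, \quad a(\phi) = 1, \quad c(y,\phi) = -\ln(y!)$$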
Generalized linear models: Exponential family
Binomial Distribution has the form:
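$$f(y \mid n, p) = \binom{n}{y} p^{y} (1-p)^{n-y} = e^{y\ln(p) + (n-y)\ln(1-p) + \ln\binom{n}{y}} = e^{y\ln\left(\frac{p}{1-p}\right) + n\ln(1-p) + \ln\binom{n}{y}} = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}$$

$$\theta = \ln\left(\frac{p}{1-p}\right), \quad b(\theta) = -n\ln(1-p), \quad a(\phi) = 1, \quad c(y,\phi) = \ln\binom{n}{y}$$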
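The three decompositions can be checked numerically. The sketch below is only an illustration: the helper names (`expfam_density`, `normal_expfam`, and so on) and the test values are made up for this example, and the binomial uses b(θ) = n ln(1 + exp(θ)), which equals -n ln(1 - p) when θ = ln(p/(1-p)).

```python
import math

def expfam_density(y, theta, b, a_phi, c):
    """Evaluate exp{(y*theta - b(theta)) / a(phi) + c(y, phi)}."""
    return math.exp((y * theta - b(theta)) / a_phi + c(y))

def normal_expfam(y, mu, sigma2):
    # theta = mu, b(theta) = theta^2 / 2, a(phi) = sigma^2
    c = lambda v: -v * v / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)
    return expfam_density(y, mu, lambda t: t * t / 2, sigma2, c)

def poisson_expfam(y, mu):
    # theta = ln(mu), b(theta) = exp(theta), a(phi) = 1, c = -ln(y!)
    c = lambda v: -math.lgamma(v + 1)
    return expfam_density(y, math.log(mu), math.exp, 1.0, c)

def binomial_expfam(y, n, p):
    # theta = ln(p / (1 - p)), b(theta) = n * ln(1 + exp(theta)), a(phi) = 1
    b = lambda t: n * math.log1p(math.exp(t))
    c = lambda v: math.log(math.comb(n, v))
    return expfam_density(y, math.log(p / (1 - p)), b, 1.0, c)

# Textbook forms of the same densities, evaluated at arbitrary test points
normal_pdf = math.exp(-(1.3 - 0.5) ** 2 / (2 * 2.0)) / math.sqrt(2 * math.pi * 2.0)
poisson_pmf = math.exp(-3.5) * 3.5 ** 4 / math.factorial(4)
binomial_pmf = math.comb(10, 3) * 0.3 ** 3 * 0.7 ** 7

normal_gap = abs(normal_expfam(1.3, 0.5, 2.0) - normal_pdf)
poisson_gap = abs(poisson_expfam(4, 3.5) - poisson_pmf)
binomial_gap = abs(binomial_expfam(3, 10, 0.3) - binomial_pmf)
print(normal_gap, poisson_gap, binomial_gap)
```

All three gaps should be at the level of floating-point rounding, which is what "has the form" means on these slides: the same density, only rearranged.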
Generalized linear models: Exponential family
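In generalized linear models the response is assumed to follow a probability distribution from the exponential family:

$$f(y \mid \theta, \phi) = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}$$

The mean, the variance and the canonical link function g follow from b:

$$E(Y) = \mu = b'(\theta), \quad \mathrm{Var}(Y) = b''(\theta)\,a(\phi), \quad g(\mu) = \theta$$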
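The relations E(Y) = b'(θ) and Var(Y) = b''(θ) a(φ) can be illustrated by differentiating b numerically. This is a minimal sketch: the finite-difference helpers `d1` and `d2` and the parameter values are assumptions for the example, not part of the slides.

```python
import math

def d1(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

# Poisson: b(theta) = exp(theta), a(phi) = 1, theta = ln(mu)
mu = 3.5
theta = math.log(mu)
poisson_mean = d1(math.exp, theta)        # should be close to mu
poisson_var = d2(math.exp, theta) * 1.0   # should be close to mu

# Binomial: b(theta) = n * ln(1 + exp(theta)), a(phi) = 1, theta = ln(p / (1 - p))
n, p = 10, 0.3
theta_b = math.log(p / (1 - p))
b_binom = lambda t: n * math.log1p(math.exp(t))
binom_mean = d1(b_binom, theta_b)         # should be close to n * p
binom_var = d2(b_binom, theta_b) * 1.0    # should be close to n * p * (1 - p)

print(poisson_mean, poisson_var, binom_mean, binom_var)
```

The derivatives are approximated rather than derived symbolically, so agreement is only up to the finite-difference error.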
Generalized linear models: Exponential family
Normal distribution has the form:
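$$f(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}} = e^{\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)} = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}$$

$$\theta = \mu, \quad b(\theta) = \frac{\theta^2}{2} = \frac{\mu^2}{2}, \quad a(\phi) = \sigma^2, \quad c(y,\phi) = -\frac{y^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)$$

$$E(Y) = b'(\theta) = \mu, \quad \mathrm{Var}(Y) = b''(\theta)\,a(\phi) = \sigma^2, \quad g(\mu) = \mu$$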
Generalized linear models: Exponential family
Poisson Distribution has the form:
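$$f(y \mid \mu) = \frac{e^{-\mu}\mu^{y}}{y!} = e^{y\ln(\mu) - \mu - \ln(y!)} = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}$$

$$\theta = \ln(\mu), \quad b(\theta) = \exp(\theta) = \mu, \quad a(\phi) = 1, \quad c(y,\phi) = -\ln(y!)$$

$$E(Y) = b'(\theta) = \exp(\theta) = \mu, \quad \mathrm{Var}(Y) = b''(\theta)\,a(\phi) = \exp(\theta) = \mu, \quad g(\mu) = \ln(\mu)$$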
Generalized linear models: Exponential family
Binomial Distribution has the form:
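$$f(y \mid n, p) = \binom{n}{y} p^{y} (1-p)^{n-y} = e^{y\ln\left(\frac{p}{1-p}\right) + n\ln(1-p) + \ln\binom{n}{y}} = e^{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)}$$

$$\theta = \ln\left(\frac{p}{1-p}\right), \quad b(\theta) = -n\ln(1-p) = n\ln(1 + \exp(\theta)), \quad a(\phi) = 1, \quad c(y,\phi) = \ln\binom{n}{y}$$

$$E(Y) = b'(\theta) = np, \quad \mathrm{Var}(Y) = b''(\theta)\,a(\phi) = np(1-p), \quad g(p) = \ln\left(\frac{p}{1-p}\right)$$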
Probability distributions
Normal: (2)
Inverse Gaussian: (3)
Gamma: (4)
Probability distributions
Negative Binomial: (5)
Poisson: (6)
Binomial: (6)
Generalized Linear Models (GLM)
General class of linear models made up of 3 components: Random, Systematic, and Link Function
Random component: Identifies the dependent variable (Y) and its probability distribution
Systematic component: Identifies the set of explanatory variables (X1,...,Xk)
Link function: Identifies a function of the mean that is a linear function of the explanatory variables

$$g(\mu) = \alpha + \beta_1 X_1 + \cdots + \beta_k X_k$$
Generalised linear model
If the distribution of the observations belongs to the exponential family and some function of the expected value of the observations is a linear function of the parameters, then a generalised linear model is used:

$$g(E(\mathbf{y})) = \big(g(E(y_1)), \ldots, g(E(y_n))\big) = \big(g(B'(\theta_1)), \ldots, g(B'(\theta_n))\big) = X\beta$$

Function g is called the link function. Here is a list of popular distributions and the corresponding link functions:
binomial - logit = ln(p/(1-p))
normal - identity
Gamma - inverse
Poisson - log
The most natural choice is θ = Xβ. The optimization for this kind of function is done iteratively.
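As a small illustration of the list above, each link can be paired with its inverse, which maps the linear predictor Xβ back to the mean. The dictionary `LINKS`, the function `predict_mean` and the coefficient values are hypothetical names and numbers chosen for this sketch.

```python
import math

# Each entry is (link g, inverse link g^{-1}) for the distributions listed above
LINKS = {
    "binomial": (lambda p: math.log(p / (1 - p)), lambda eta: 1 / (1 + math.exp(-eta))),
    "normal": (lambda mu: mu, lambda eta: eta),
    "gamma": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "poisson": (math.log, math.exp),
}

def predict_mean(family, x, beta):
    """Mean response g^{-1}(x . beta) for one observation."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor
    _, inverse_link = LINKS[family]
    return inverse_link(eta)

# Hypothetical Poisson model: intercept 0.5, slope 0.25, covariate value 2.0
mean_count = predict_mean("poisson", [1.0, 2.0], [0.5, 0.25])

# Round trip: applying the link and then its inverse returns the starting mean
round_trips = {name: inv(g(0.3)) for name, (g, inv) in LINKS.items()}
print(mean_count, round_trips)
```

Here the linear predictor is 0.5 + 0.25 * 2.0 = 1.0, so the predicted Poisson mean is exp(1.0).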
Likelihood function
$$f(\theta \mid X) = \frac{f(X \mid \theta)\, p(\theta)}{p(X)}$$
Likelihood function
$$L(x_1, x_2, \ldots, x_n; p) = \prod_{i=1}^{n} f_X(x_i; p)$$

$$\ln L(x_1, x_2, \ldots, x_n; p) = \sum_{i=1}^{n} \ln f_X(x_i; p)$$
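A minimal sketch of the log-likelihood as a sum of log densities, assuming for illustration that f_X is a Poisson probability function with mean `mu` and that the sample below was observed (both are assumptions for the example). A coarse grid search then locates the maximum.

```python
import math

def poisson_logpmf(y, mu):
    """ln f(y; mu) for a Poisson distribution."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def log_likelihood(sample, mu):
    """ln L = sum over the sample of ln f(x_i; mu)."""
    return sum(poisson_logpmf(y, mu) for y in sample)

sample = [2, 0, 3, 1, 4]  # hypothetical counts, sample mean 2.0

# Evaluate ln L on a grid of candidate means from 0.5 to 4.49 in steps of 0.01
grid = [0.5 + 0.01 * k for k in range(400)]
best_mu = max(grid, key=lambda mu: log_likelihood(sample, mu))
print(best_mu)
```

For the Poisson distribution the maximiser is the sample mean, so the grid search should land on (approximately) 2.0. A grid is used only to keep the example short; it is not how the maximum is normally found.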
Likelihood function Poisson distribution
$$L = f(y \mid \mu) = \frac{e^{-\mu}\mu^{y}}{y!} = e^{y\log(\mu) - \mu - \log(y!)}$$

$$\ln L = y\log(\mu) - \mu - \log(y!)$$

Newton-Raphson algorithm
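The slide only names the algorithm, so the following is a sketch under stated assumptions rather than the author's procedure: a Poisson regression with log link, one covariate, mean exp(b0 + b1*x), hypothetical data, and the function name `newton_raphson_poisson` invented for the example. Each step adds the inverse information matrix times the score vector to the current estimate.

```python
import math

def newton_raphson_poisson(x, y, iterations=25, tol=1e-10):
    """Maximise the Poisson log-likelihood with mu_i = exp(b0 + b1 * x_i)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of ln L with respect to b0 and b1
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix: minus the matrix of second derivatives of ln L
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information (2 x 2, inverted by hand) times score
        s0 = (i11 * g0 - i01 * g1) / det
        s1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]  # hypothetical counts
b0, b1 = newton_raphson_poisson(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print(b0, b1)
```

At convergence the score equations hold, so the fitted means sum to the observed total. No step-size control is included, which is acceptable for well-behaved data like this but is a simplification.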
Likelihood ratio test
Let us assume that we have a sample of size n, x = (x1, ..., xn), and we want to estimate a parameter vector θ = (θ1, θ2). Both θ1 and θ2 can also be vectors. We want to test the null hypothesis against the alternative one:

$$H_0: \theta_1 = \theta_{10} \quad \text{against} \quad H_1: \theta_1 \neq \theta_{10}$$

Let us assume that the likelihood function is L(x | θ). Then the likelihood ratio test works as follows: 1) maximise the likelihood function under the null hypothesis (i.e. fix the parameter(s) θ1 equal to θ10) and find the value of the likelihood at the maximum; 2) maximise the likelihood under the alternative hypothesis (i.e. unconstrained maximisation) and find the value of the likelihood at the maximum; then find the ratio:

$$w = \frac{L(x \mid \theta_{10}, \tilde{\theta}_2)}{L(x \mid \hat{\theta}_1, \hat{\theta}_2)}$$

where $\tilde{\theta}_2$ is the value of the parameter θ2 after constrained maximisation, and $\hat{\theta}_1, \hat{\theta}_2$ are the values of both parameters after unconstrained maximisation.

w is the likelihood ratio statistic. Tests carried out using this statistic are called likelihood ratio tests. In this case it is clear that:

$$0 \le w \le 1$$

If the value of w is small then the null hypothesis is rejected. If g(w) is the density of the distribution of w, then the critical region can be calculated using:

$$\int_0^{c_\alpha} g(w)\,dw = \alpha$$
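A sketch of the ratio w in the simplest possible setting, assuming a Poisson sample with a single mean parameter and no nuisance parameter θ2 (so the constrained "maximisation" just plugs in the hypothesised value). The sample, the value `mu0` and the use of the asymptotic chi-squared approximation for -2 ln w with 1 degree of freedom (5% critical value 3.841) are assumptions for the example.

```python
import math

def poisson_loglik(sample, mu):
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in sample)

def likelihood_ratio(sample, mu0):
    """w = L(x | mu0) / L(x | mu_hat), with mu_hat the unconstrained maximiser."""
    mu_hat = sum(sample) / len(sample)  # Poisson MLE is the sample mean
    return math.exp(poisson_loglik(sample, mu0) - poisson_loglik(sample, mu_hat))

sample = [2, 0, 3, 1, 4, 2, 5, 3]  # hypothetical counts, sample mean 2.5
w = likelihood_ratio(sample, mu0=1.5)

# Small w is evidence against H0; -2 ln w is compared with a chi-squared quantile
statistic = -2 * math.log(w)
reject = statistic > 3.841
print(w, statistic, reject)
```

Working with -2 ln w avoids having to know the density g(w) exactly, at the cost of relying on a large-sample approximation.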
Deviances
In the linear model, we maximise the likelihood under the full model and under the null hypothesis. The ratio of the values of the likelihood function under the two hypotheses (null and alternative) is related to the F-distribution. The interpretation is how much the variance would increase if we removed part of the model (null hypothesis).

In logistic and log-linear models, the likelihood function is again maximised under the null and alternative hypotheses. Then the logarithm of the ratio of the values of the likelihood under these two hypotheses (it is called the deviance) asymptotically has a chi-squared distribution:

$$\chi^2 \approx 2\left[\log l(y \mid y) - \log l(y \mid \hat{\mu})\right]$$

That is, twice the difference between the maximum achievable log-likelihood and the value of the log-likelihood at the estimated parameters.

That is the reason why in log-linear and logistic regressions it is usual to talk about deviances and chi-squared statistics instead of variances and F-statistics. Analysis based on log-linear and logistic models (in general, for generalised linear models) is usually called analysis of deviance. The reason for this is that chi-squared is related to the deviation between the fitted model and the observations.

Another test is based on Pearson's chi-squared statistic. It asymptotically approaches a chi-squared distribution with n-p degrees of freedom (n is the number of observations and p is the number of parameters):

$$X^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\mathrm{Var}(y_i)}$$
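Both statistics can be written out for a specific distribution. The sketch below assumes a Poisson response, for which the variance equals the mean, and uses hypothetical observed counts and fitted means; the function names are invented for the example.

```python
import math

def poisson_deviance(y, mu):
    """2 * [ln l(y | y) - ln l(y | mu)] for Poisson data."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y * ln(y / mu) is taken as 0 when y == 0
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2 * total

def pearson_chi2(y, mu):
    """Sum of (y_i - mu_i)^2 / Var(y_i), with Var(y_i) = mu_i for Poisson."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

y = [1, 2, 4, 7, 12]               # hypothetical observed counts
mu = [1.2, 2.1, 3.8, 6.9, 12.0]    # hypothetical fitted means

deviance = poisson_deviance(y, mu)
pearson = pearson_chi2(y, mu)
print(deviance, pearson)
```

When the fit is close, as here, the two statistics are usually of similar size; both are zero for a model that reproduces the data exactly.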
Goodness of Fits: Deviance
Deviance = -2[LM - LS]
where LM is the maximum log likelihood of the model of interest and LS is the maximum log likelihood for the most complex model, which has a separate parameter at each explanatory setting (saturated model).
Deviance has approximately a chi-square distribution with df = N-p, where N = number of observations and p = number of parameters (including intercept).
Likelihood ratio test for model comparison between M1 and M0 (M0 is a simpler model than M1):
Likelihood ratio = -2[L0-L1] = -2[L0-LS] - {-2[L1-LS]} = Deviance0 - Deviance1
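The identity between the likelihood ratio statistic and the difference of deviances can be illustrated with a small nested comparison. The sketch assumes Poisson counts in two hypothetical groups: M0 fits one common mean, M1 fits a separate mean per group, and the saturated model fits each observation exactly.

```python
import math

def poisson_loglik(y, mu):
    # y * ln(mu) is taken as 0 when y == 0 (this only matters for the saturated model)
    return sum((yi * math.log(mi) if yi > 0 else 0.0) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def deviance(y, mu):
    """-2 * [L_M - L_S], where the saturated model sets mu_i = y_i."""
    return -2 * (poisson_loglik(y, mu) - poisson_loglik(y, y))

group_a = [3, 5, 4]     # hypothetical counts
group_b = [9, 11, 10]   # hypothetical counts
y = group_a + group_b

mu0 = [sum(y) / len(y)] * len(y)                               # M0: common mean
mu1 = [sum(group_a) / 3] * 3 + [sum(group_b) / 3] * 3          # M1: group means

lr = -2 * (poisson_loglik(y, mu0) - poisson_loglik(y, mu1))
dev_diff = deviance(y, mu0) - deviance(y, mu1)
print(lr, dev_diff)
```

The saturated log likelihood cancels in the difference, which is why the two printed numbers agree. M1 has one more parameter than M0, so the statistic would be referred to a chi-square distribution with 1 degree of freedom.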
Model fit metrics
Covariance matrix for parameters: computed from 2nd partial derivatives of the log-likelihood function
Likelihood ratio test: ratio of max square log likelihood to square log likelihood of null hypothesis
Deviance: measure of how overdetermined the system is; compares the max of the full system to the max of the saturated model (number of parameters equals number of data points)
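Range of plausible models
Likelihood ratio

$$\Lambda = \frac{f(y; b_0, \sigma^2)}{f(y; \hat{b}, \sigma^2)}$$

With $b_0$ the specified model and $\hat{b}$ the best model
Ratio of likelihood of any model to likelihood of best model

$$\Lambda = \frac{\exp\left[-\frac{1}{2}\left(\frac{y - b_0}{\sigma}\right)^2\right]}{\exp\left[-\frac{1}{2}\left(\frac{y - \hat{b}}{\sigma}\right)^2\right]} = \exp\left[-\frac{1}{2}\left(\frac{y - b_0}{\sigma}\right)^2\right] = \exp\left[-\frac{1}{2} z^2\right], \quad z = \frac{y - b_0}{\sigma}$$

The denominator equals 1 because the best model for a single observation is $\hat{b} = y$.

Log-likelihood ratio: $\ln \Lambda = -\frac{1}{2} z^2$, so $z^2 = -2\ln \Lambda$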
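The relation z² = -2 ln Λ can be turned around to give a range of plausible models: all values b0 whose likelihood ratio is at least some cutoff. The sketch assumes a single normal observation with known σ; the cutoff, the data values and the function names are hypothetical.

```python
import math

def likelihood_ratio(y, b0, sigma):
    """Lambda = exp(-z^2 / 2) with z = (y - b0) / sigma."""
    z = (y - b0) / sigma
    return math.exp(-0.5 * z * z)

def plausible_range(y, sigma, min_ratio):
    """All b0 with Lambda >= min_ratio, i.e. |z| <= sqrt(-2 * ln(min_ratio))."""
    z_max = math.sqrt(-2 * math.log(min_ratio))
    return y - z_max * sigma, y + z_max * sigma

# Hypothetical observation y = 10 with sigma = 2; cutoff exp(-2) corresponds to |z| <= 2
cutoff = math.exp(-2.0)
low, high = plausible_range(y=10.0, sigma=2.0, min_ratio=cutoff)
print(low, high)
```

With this cutoff the range is y ± 2σ, and the likelihood ratio at either end of the range equals the cutoff itself.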
Example
Site Longitude Latitude Alt Sl Te Pp V Ndvi Soil Lc S M B P
Nicolás B. -104.7 24.38 1920 2 17 450 5 90 7 8 9 3 79 80
Librado R. -104.26 24.4 2005 2 17 450 7 84 9 8 24 28 31 32
La Ermita -104.33 23.89 2169 10 17 550 6 109 9 11 47 6 13 14
Madero -104.29 24.27 1942 2 17 450 3 74 9 8 16 85 110 111
Castillo N. -104.49 24.34 1923 2 17 450 7 78 7 8 20 58 33 34
Km 188 -104.61 25.38 1501 3 21 350 4 83 2 10 29 28 27 28
Km 130 -104.51 24.99 1733 4 19 450 4 85 3 9 22 20 20 21
Las Huertas -104.29 24.27 1930 2 17 450 3 75 9 8 15 13 20 21
18 de Agosto -104.15 23.95 1866 1 17 450 7 81 9 12 17 10 20 21
El Venado -104.28 23.87 1747 4 17 450 6 83 4 8 20 10 20 21
Km 23 -104.46 24.51 2126 4 15 550 7 83 3 11 16 8 14 15
Km 73 -104.32 25.13 1284 5 21 350 4 78 9 8 1 0 1 2
Rodríguez -104.09 24.32 2064 4 17 550 6 78 2 11 15 0 7 8
Berros - Tuitan -104.27 23.97 1855 4 17 450 6 84 9 9 15 0 0 1
27 de Noviembre -104.49 24.22 1862 2 17 450 5 91 9 8 17 0 31 32
Km 86 -104.64 24.7 1954 6 17 450 3 89 4 8 8 4 10 11
Morcillo -104.7 24.11 1971 3 17 550 8 88 9 10 2 0 5 6
Km 43 -104.47 24.65 1908 3 17 450 7 75 3 9 12 0 3 4
Zarco - Cieneguilla -104.04 24.1 2143 5 15 550 7 82 2 9 10 0 9 10
Berros - Saltito -104.28 23.94 1858 15 17 450 6 83 9 8 6 0 1 2
Km 36 -104.7 24.27 1909 2 17 450 3 86 9 9 0 0 0 1
Zaragoza -104.16 23.87 1856 1 17 450 7 76 9 8 11 15 21 22
Entrada Guadiana -104.34 23.95 1867 8 17 550 6 83 9 11 1 0 2 3
Carlos R. -104.44 24.27 1867 1 17 450 5 88 7 8 15 0 5 6
Km 153 -104.53 25.12 1360 2 21 350 7 81 9 9 8 6 0 1
Km 51 -104.16 25.21 1416 24 21 250 7 79 4 9 1 0 0 1
Km 29 -104.16 25.36 1396 7 21 250 7 71 4 10 0 0 0 1
Km 237 -104.75 25.76 1905 2 17 450 6 79 9 11 5 1 0 1
Km 260 -104.89 25.96 1940 2 17 450 6 82 2 9 0 0 0 1
Km 84 -104.42 25.82 1770 3 19 350 7 75 9 11 0 0 0 1
Francisco Z. -104.07 24.22 2166 3 15 550 6 74 2 8 2 0 0 1
Km 61 -104.29 25.55 1651 7 21 250 7 77 9 9 0 0 0 1
Km 104 -104.58 25.79 1942 1 17 450 6 77 9 10 3 0 1 2
Km 76 -104.28 25.67 1817 5 19 250 4 77 4 10 1 0 0 1
Variable distribution
[Figure: histograms of the variables M, B and P. x-axis: Grasshopper count; y-axis: Frequency]
Correlation
Species Longitude Latitude Altitude Slope Temperature Precipitation Vegetation Ndvi Soil Landcover
M. lakinus 0.09 -0.42a 0.28 -0.10 -0.19 0.34a -0.09 0.57a -0.05 0.03
B. nubilum 0.03 -0.16 0.07 -0.21 -0.05 0.04 -0.24 -0.16 0.10 -0.26
P. nebrascensis -0.04 -0.28 0.17 -0.25 -0.17 0.14 -0.33 0.07 0.13 -0.31
Multicollinearity
Variable Longitude Latitude Altitude Slope Temperature Precipitation Vegetation Ndvi Soil Landcover
Longitude 1.00 -0.37a 0.00 0.32 -0.01 -0.06 0.22 -0.30 0.09 -0.01
Latitude -0.37a 1.00 -0.45a 0.02 0.57a -0.66a -0.02 -0.37a -0.18 0.23
Altitude 0.00 -0.45a 1.00 -0.28 -0.92a 0.79a 0.04 0.30 -0.08 0.09
Slope 0.32 0.02 -0.28 1.00 0.32 -0.30 0.16 0.11 0.12 0.03
Temperature -0.01 0.57 -0.92a 0.32 1.00 -0.85a -0.04 -0.22 0.05 0.04
Precipitation -0.06 -0.64 0.79a -0.30 -0.85a 1.00 0.09 0.39a -0.03 0.08
Vegetation 0.22 -0.02 0.04 0.15 -0.04 0.06 1.00 -0.10 0.04 0.32
Ndvi -0.30 -0.37a 0.30 0.11 -0.22 0.39a -0.10 1.00 0.16 0.06
Soil 0.09 -0.18 -0.08 0.12 0.05 -0.03 0.04 0.16 1.00 -0.02
Landcover -0.01 0.23 0.09 0.04 0.05 0.08 0.32 0.06 -0.02 1.00
Deviance
Species          Model  Link function  Deviance value  d.f.  Value/df
M. lakinus       Gamma  Log            2.244           7     0.335
B. nubilum       Gamma  Log            11.211          9     1.080
P. nebrascensis  Gamma  Log            2.835           7     0.715
Parameter               B        Std. Error  95% Wald CI Lower  95% Wald CI Upper  Wald Chi-Square  df  Sig.
(Intercept)             835.919  62.1403     714.126            957.712            180.960          1   .000
[Temperature=15.00]     -2.627   .5287       -3.663             -1.591             24.692           1   .000
[Temperature=17.00]     -2.889   .6660       -4.195             -1.584             18.822           1   .000
[Temperature=19.00]     -5.630   .5807       -6.768             -4.491             93.982           1   .000
[Temperature=21.00]     0a       .           .                  .                  .                .   .
[Precipitation=250.00]  -4.156   .3781       -4.897             -3.415             120.788          1   .000
[Precipitation=350.00]  0a       .           .                  .                  .                .   .
[Precipitation=450.00]  2.332    .4734       1.404              3.260              24.268           1   .000
[Precipitation=550.00]  0a       .           .                  .                  .                .   .
[Vegetation=3.00]       -3.481   .5117       -4.484             -2.478             46.261           1   .000
[Vegetation=4.00]       -2.383   .8388       -4.027             -.739              8.072            1   .004
[Vegetation=5.00]       -3.402   .5694       -4.518             -2.286             35.696           1   .000
[Vegetation=6.00]       -4.161   .5299       -5.199             -3.122             61.647           1   .000
[Vegetation=7.00]       -4.288   .5890       -5.442             -3.133             52.991           1   .000
[Vegetation=8.00]       0a       .           .                  .                  .                .   .
[Soil=2.00]             -.103    .3156       -.721              .516               .106             1   .745
[Soil=3.00]             1.911    .2833       1.356              2.467              45.522           1   .000
[Soil=4.00]             .488     .1837       .128               .848               7.052            1   .008
[Soil=7.00]             .533     .2793       -.015              1.080              3.638            1   .056
[Soil=9.00]             0a       .           .                  .                  .                .   .
[Landcover=8.00]        .436     .2687       -.090              .963               2.636            1   .104
[Landcover=9.00]        .369     .2951       -.210              .947               1.561            1   .212
[Landcover=10.00]       -.118    .3949       -.892              .656               .089             1   .765
[Landcover=11.00]       1.666    .4357       .812               2.520              14.623           1   .000
[Landcover=12.00]       0a       .           .                  .                  .                .   .
Longitude               8.360    .6374       7.111              9.610              172.047          1   .000
Latitude                1.366    .2438       .889               1.844              31.421           1   .000
Slope                   -.039    .0147       -.067              -.010              6.859            1   .009
Ndvi                    .122     .0113       .100               .144               117.147          1   .000
(Scale)                 .048b    .0122       .029               .079
Dependent Variable: M1
Model: (Intercept), Precipitation, Temperature, Vegetation, Soil, Landcover, Longitude, Latitude, Slope, Ndvi
a. Set to zero because this parameter is redundant.
b. Maximum likelihood estimate.
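The interval and test columns of the table can be reproduced approximately from B and its standard error, assuming the usual Wald definitions: interval B ± 1.96 × SE and chi-square (B / SE)² with 1 degree of freedom. The function name `wald_summary` is made up for this sketch, and small differences from the table are expected because the printed B and SE are rounded.

```python
def wald_summary(b, se, z=1.96):
    """Return (lower, upper, chi_square) for a coefficient and its standard error."""
    return b - z * se, b + z * se, (b / se) ** 2

# Rows taken from the table above
lat_lower, lat_upper, lat_chi2 = wald_summary(1.366, 0.2438)   # Latitude
lon_lower, lon_upper, lon_chi2 = wald_summary(8.360, 0.6374)   # Longitude

print(round(lat_lower, 3), round(lat_upper, 3), round(lat_chi2, 2))
print(round(lon_lower, 3), round(lon_upper, 3), round(lon_chi2, 2))
```

For Latitude this gives an interval close to (.889, 1.844) and a chi-square close to 31.421; for Longitude, close to (7.111, 9.610) and 172.047.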
Residual
Fit