Date posted: 18-Dec-2015
Datamining and statistical learning - lecture 9
Generalized additive models (GAMs)
Some examples of linear models
Proc GAM in SAS
Model selection in GAM
Linear regression models
The inputs can be:
quantitative inputs
functions of quantitative inputs
basis expansions of quantitative inputs
dummy variables
interaction terms
E(Y | X1, ..., Xn) = β0 + β1X1 + ... + βpXp
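As an illustration (not from the lecture), all the input types listed above can enter one ordinary least-squares fit; a minimal NumPy sketch with made-up data and coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=200)        # quantitative input
group = rng.integers(0, 2, size=200)   # class variable coded as a dummy

# Design matrix: intercept, x, a basis expansion (x**2),
# a dummy variable, and an interaction term (x * dummy)
X = np.column_stack([np.ones_like(x), x, x**2, group, x * group])
beta_true = np.array([1.0, 2.0, 0.5, -1.0, 0.3])
y = X @ beta_true + rng.normal(scale=0.1, size=200)

# Least-squares estimate of the coefficients
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

With this little noise the estimated coefficients come out close to the true ones.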
Justification of linear regression models
Many response variables are linearly or almost linearly
related to a set of inputs
Linear models are easy to comprehend and to fit to observed data
Linear regression models are particularly useful when:
• the number of cases is moderate
• data are sparse
• the signal-to-noise ratio is low
Performance of predictors based on:
(i) a simple linear regression model
(ii) a quadratic regression model
when the true expected response is a second order polynomial
in the input
[Figure: true expected response E(y) and fitted values over 0 ≤ x ≤ 4, y-scale 0–18. Left: predictions based on a linear model (yhat). Right: predictions based on a quadratic model (yhat2).]
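This comparison can be reproduced in a few lines; a sketch with simulated data (the coefficients of the true second-order polynomial are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 4, 50)
Ey = 1.0 + 2.0 * x + 0.8 * x**2        # true second-order polynomial mean
y = Ey + rng.normal(scale=0.5, size=x.size)

# Least-squares fits of a first- and a second-degree polynomial
yhat = np.polyval(np.polyfit(x, y, 1), x)
yhat2 = np.polyval(np.polyfit(x, y, 2), x)

sse1 = float(np.sum((y - yhat) ** 2))
sse2 = float(np.sum((y - yhat2) ** 2))
print(sse2 < sse1)   # the correctly specified quadratic model fits better
```

Since the quadratic model nests the linear one, its residual sum of squares is necessarily no larger.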
Logistic regression of multiple purchases
vs first amount spent
[Figure: observed binary response and estimated event probability (0–1) plotted against first amount spent (0–7000).]
Logistic regression for a binary response variable Y
E(Y | X = x) = P(Y = 1 | X = x) = exp(β0 + β1x) / (1 + exp(β0 + β1x))

log [ E(Y | X = x) / (1 − E(Y | X = x)) ] = log [ P(Y = 1 | X = x) / P(Y = 0 | X = x) ] = β0 + β1x

The logit of the expectation of Y given x is a linear function of x
[Figure: the logistic curve E(Y | X = x) plotted against x, rising from near 0 towards 1 over 0 ≤ x ≤ 4.]
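The logistic-model identities can be checked numerically; a minimal sketch (the coefficient values are illustrative, chosen roughly on the scale of the purchase data):

```python
import math

b0, b1 = -3.0, 0.002   # illustrative values of beta0, beta1

def p_event(x):
    """P(Y = 1 | X = x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))"""
    eta = b0 + b1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit(p):
    """log(p / (1 - p))"""
    return math.log(p / (1.0 - p))

# The logit of the event probability is linear in x, so the change over
# an interval of length 1000 equals b1 * 1000 = 2.0
print(round(logit(p_event(1000)) - logit(p_event(0)), 6))
```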
Generalized additive models: some examples
A nonlinear, additive model:
E(Y | X1, ..., Xn) = s1(X1) + ... + sp(Xp)

A mixed linear and nonlinear, additive model:
E(Y | X1, ..., Xn) = s1(X1) + ... + sp(Xp) + Σ_{j=p+1}^{q} βjXj

A mixed linear and nonlinear, additive model with a class variable:
E(Y | X1, ..., Xn, Class = k) = αk + s1(X1) + ... + sp(Xp) + Σ_{j=p+1}^{q} βjXj
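The three forms differ only in which terms enter the additive predictor; a small sketch of evaluating such a predictor (the smooth functions and coefficients below are arbitrary stand-ins, not fitted components):

```python
import math

def s1(x):   # stand-in smooth component, e.g. a seasonal pattern
    return math.sin(2.0 * math.pi * x / 12.0)

def s2(x):   # stand-in smooth component, e.g. a long-term trend
    return 0.1 * (x - 1995.0)

def predict(alpha, x1, x2, beta=0.0, x_lin=0.0):
    """Additive predictor: alpha + s1(x1) + s2(x2) + beta * x_lin."""
    return alpha + s1(x1) + s2(x2) + beta * x_lin

# Additivity: the effect of changing x1 does not depend on x2
d_at_1990 = predict(4.0, 6, 1990) - predict(4.0, 1, 1990)
d_at_2000 = predict(4.0, 6, 2000) - predict(4.0, 1, 2000)
print(abs(d_at_1990 - d_at_2000) < 1e-12)
```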
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
[Figure: two 3D surface plots of total-N concentration (mg N/l, 0–8) by month (Jan–Dec) and year (1989–2001): observed data and fitted model.]
Output:
Total-N conc
Inputs:
Monthly pattern
Trend function
Modelling the concentration of total nitrogen at Lobith on the Rhine:
Extracted additive components
[Figure: month components (linear and smooth) plotted over the months of the year, range −1.0 to 1.5, and year components (linear and smooth) plotted over 1988–2004, range −1.5 to 2.0.]
Weekly mortality and confirmed cases of influenza in Sweden
[Figure: weekly time series 1994–2004 of mortality (left axis, 0–3000) and confirmed cases of influenza (right axis, 0–450).]
Response:
Weekly mortality
Inputs:
Confirmed cases of influenza
Seasonal dummies
Long-term trend
SYNTAX for common GAM models
Type of Model            Syntax                          Mathematical Form
Parametric model         y = param(x);                   β0 + β1x
Nonparametric model      y = spline(x);                  β0 + s(x)
Nonparametric model      y = loess(x);                   β0 + s(x)
Semiparametric model     y = param(x1) spline(x2);       β0 + β1x1 + s(x2)
Additive model           y = spline(x1) spline(x2);      β0 + s1(x1) + s2(x2)
Thin-plate spline model  y = spline2(x1,x2);             β0 + s(x1, x2)
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 1
proc gam data=Mining.Rhine;
   model Nconc = spline(Year) spline(Month);
   output out=addmodel1;
run;

Model 2
proc gam data=Mining.Rhine;
   model Nconc = spline2(Year, Month);
   output out=addmodel2;
run;
Proc GAM – degrees of freedom of the spline components
The degrees of freedom of the spline components are selected by the user or by specifying method=GCV

proc gam data=Mining.Rhine;
   model Nconc = spline(Year, df=3) spline(Month, df=3);
   output out=addmodel1;
run;

• df=3 implies that the same cubic polynomial is valid over the entire range of the input
• Increasing the df-value implies that knots are introduced
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
[Figure: partial predictions from the additive model plotted against observation number (1–157): P_Year (range −0.10 to 0.25) and P_Month (range −1.00 to 1.50).]
proc gam data=Mining.Rhine;
   model Nconc = spline(Year) spline(Month);
   output out=addmodel1;
run;
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
[Figure: surface plot of P_Nconc vs Month and Year.]
Model 1
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 2
df=4
[Figure: surface plot of P_Nconc_2 vs Month and Year.]
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
[Figure: surface plot of P_Nconc_3 vs Month and Year.]
Model 3
df=20
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
The GAM Procedure
Dependent Variable: Nconc
Smoothing Model Component(s): spline(Year) spline(Month)

Summary of Input Data Set
   Number of Observations                 168
   Number of Missing Observations           0
   Distribution                      Gaussian
   Link Function                     Identity

Iteration Summary and Fit Statistics
   Final Number of Backfitting Iterations              2
   Final Backfitting Criterion              1.987193E-30
   The Deviance of the Final Estimate        42.92519322

The local score algorithm converged.
Model 1
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Regression Model Analysis
Parameter Estimates

   Parameter        Estimate   Standard Error   t Value   Pr > |t|
   Intercept       420.69388         19.84413     21.20     <.0001
   Linear(Year)     -0.20824          0.00994    -20.94     <.0001
   Linear(Month)    -0.10461          0.01161     -9.01     <.0001

Smoothing Model Analysis
Analysis of Deviance

   Source           DF        Sum of Squares   Chi-Square   Pr > ChiSq
   Spline(Year)     3.00000         2.527155       9.3609       0.0249
   Spline(Month)    3.00000        51.143931     189.4432       <.0001
Model 1
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Iteration Summary and Fit Statistics
   Final Number of Backfitting Iterations              2
   Final Backfitting Criterion                         0
   The Deviance of the Final Estimate        74.22284569

Regression Model Analysis
Parameter Estimates

   Parameter    Estimate   Standard Error   t Value   Pr > |t|
   Intercept     4.46475          0.05206     85.76     <.0001

Smoothing Model Analysis
Analysis of Deviance

   Source                 DF        Sum of Squares   Chi-Square   Pr > ChiSq
   Spline2(Year Month)    4.00000       162.668070     357.2336       <.0001
Model 2
Generalized additive models: Modelling the concentration of total nitrogen at Lobith on the Rhine
Iteration Summary and Fit Statistics
   Final Number of Backfitting Iterations              2
   Final Backfitting Criterion                         0
   The Deviance of the Final Estimate       36.577160798

Regression Model Analysis
Parameter Estimates

   Parameter    Estimate   Standard Error   t Value   Pr > |t|
   Intercept     4.46475          0.03849    116.01     <.0001

Smoothing Model Analysis
Analysis of Deviance

   Source                 DF         Sum of Squares   Chi-Square   Pr > ChiSq
   Spline2(Year Month)    20.00000       200.313755     805.0412       <.0001
Model 3 (df=20)
Estimation of additive models
- the backfitting algorithm
Model:  E(Y | X1 = x1, ..., Xp = xp) = α + f1(x1) + ... + fp(xp)

1. Initialize:  α̂ = (1/N) Σ_{i=1}^{N} yi ;  f̂j = 0,  j = 1, ..., p

2. Cycle:  j = 1, 2, ..., p, 1, 2, ..., p, ...

      f̂j ← Sj [ { yi − α̂ − Σ_{k≠j} f̂k(xik) },  i = 1, ..., N ]

      f̂j ← f̂j − (1/N) Σ_{i=1}^{N} f̂j(xij)

   until the functions f̂j change less than a prespecified threshold
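The backfitting cycle can be sketched in a few lines of Python, using a crude running-mean scatterplot smoother in place of the spline smoother Sj (the smoother, window size, iteration count, and data below are assumptions for illustration, not what Proc GAM does internally):

```python
import numpy as np

def running_mean_smoother(x, r, window=11):
    """Smooth residuals r against x with a centered moving average."""
    order = np.argsort(x)
    smoothed = np.convolve(r[order], np.ones(window) / window, mode="same")
    out = np.empty_like(r)
    out[order] = smoothed
    return out

def backfit(X, y, n_iter=20):
    N, p = X.shape
    alpha = y.mean()                      # 1. initialize: alpha-hat = mean of y
    f = np.zeros((p, N))                  #    f_j = 0 for all j
    for _ in range(n_iter):               # 2. cycle over j = 1, ..., p
        for j in range(p):
            partial = y - alpha - f.sum(axis=0) + f[j]   # partial residuals
            f[j] = running_mean_smoother(X[:, j], partial)
            f[j] -= f[j].mean()           # center each component function
    return alpha, f

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(500, 2))
y = 3.0 + np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
alpha, f = backfit(X, y)
resid = y - alpha - f.sum(axis=0)
print(resid.var() < y.var())   # the additive fit explains most of the variation
```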
Modelling ln daily electricity consumption as a spline function
of the population-weighted mean temperature in Sweden
proc gam data=sasuser.smhi;
model lnDaily_consumption = spline(Meantemp, df=20);
ID Time;
output out=smhiouttemp pred resid;
run;
[Figure: observed and fitted ln daily electricity consumption (MWh), range 12.2–13.4, plotted against population-weighted temperature (−30 to 30).]
Modelling ln daily electricity consumption as a spline function
of the population-weighted mean temperature in Sweden:
residual analysis
[Figure: residuals (range −0.25 to 0.20) plotted against Julian day (0–400).]
Modelling ln daily electricity consumption in Sweden
- residual analysis
[Figure: residuals (range −0.25 to 0.20) against Julian day (0–400) for two models. Left: spline of temperature. Right: spline of temperature, spline of Julian day, and weekday dummies.]
Modelling ln daily electricity consumption in Sweden
- residual analysis
[Figure: residuals (range −0.25 to 0.20) against Julian day (0–400) for two models. Left: spline of temperature, spline of Julian day, and weekday dummies. Right: splines of contemporaneous and time-lagged weather data, splines of Julian day and time, and weekday and holiday dummies.]
Deviance analysis of the investigated models of
ln daily electricity consumption in Sweden
Model                        Deviance
Temp only                      10.233
Temp, Julian day, weekday       3.822
Final model                     0.742
The residual deviance of a fitted model is minus twice its log-likelihood
If the error terms are normally distributed, the deviance is equal to the
sum of squared residuals
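This equivalence can be verified numerically for a Gaussian model; a minimal sketch with simulated data (deviance computed relative to the saturated model, with the error variance fixed at 1):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = np.linspace(0.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

# Ordinary least-squares fit of a straight line
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
sse = float(np.sum(resid ** 2))

# Gaussian log-likelihoods with sigma = 1; the saturated model sets mu_i = y_i
const = 0.5 * np.log(2.0 * np.pi)
ll_model = -0.5 * sse - n * const
ll_saturated = -n * const
deviance = 2.0 * (ll_saturated - ll_model)

print(np.isclose(deviance, sse))   # True: deviance equals the residual sum of squares
```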
Modelling ln daily electricity consumption in Sweden:
time series plot of residuals
[Figure: time series plot of residuals (range −0.15 to 0.15) against time (0–2000).]