Data mining and statistical learning, lecture 3

Outline

Ordinary least squares regression

Ridge regression


Ordinary least squares regression (OLS)

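Inputs: $x_1, x_2, \dots, x_p$

Response: $y$

Model:

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \text{error}$$

or, in vector form,

$$y = \beta_0 + X^T\beta + \text{error}$$

Terminology:

$\beta_0$: intercept (or bias)

$\beta_1, \dots, \beta_p$: regression coefficients (or weights)

The response variable responds directly and linearly to changes in the inputs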


Least squares regression

Assume that we have observed a training set of data

Case   X1     X2     ...   Xp     Y
1      x11    x21    ...   xp1    y1
2      x12    x22    ...   xp2    y2
3      x13    x23    ...   xp3    y3
...
N      x1N    x2N    ...   xpN    yN

Estimate the coefficients by minimizing the residual sum of squares

$$RSS(\beta) = \sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ji}\beta_j\Big)^2$$

where $x_{ji}$ is the value of input $X_j$ for case $i$.
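The residual sum of squares can be evaluated directly for any candidate coefficients. The following is a minimal sketch; the function name `rss` and the toy data are illustrative assumptions, not part of the lecture.

```python
def rss(beta0, beta, xs, ys):
    """Residual sum of squares for intercept beta0 and coefficient list beta.

    xs is a list of cases, each a list of p input values; ys holds the responses.
    """
    total = 0.0
    for x_row, y in zip(xs, ys):
        prediction = beta0 + sum(b * x for b, x in zip(beta, x_row))
        total += (y - prediction) ** 2
    return total


# Made-up toy data (not from the lecture): y = 1 + 2*x exactly.
xs = [[0.0], [1.0], [2.0]]
ys = [1.0, 3.0, 5.0]
print(rss(1.0, [2.0], xs, ys))  # perfect fit, prints 0.0
print(rss(0.0, [2.0], xs, ys))  # each of the 3 residuals equals 1, prints 3.0
```

Least squares picks the coefficients for which this value is smallest.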


Matrix formulation of OLS regression

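$$RSS(\beta) = \sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ji}\beta_j\Big)^2$$

Differentiating the residual sum of squares and setting the first derivatives equal to zero we obtain

$$X^T(y - X\beta) = 0$$

where

$$X = \begin{pmatrix} 1 & x_{11} & x_{21} & \cdots & x_{p1} \\ 1 & x_{12} & x_{22} & \cdots & x_{p2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1N} & x_{2N} & \cdots & x_{pN} \end{pmatrix}$$

and

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}$$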
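The differentiation step can be written out in matrix notation. This is a sketch that assumes $\beta = (\beta_0, \beta_1, \dots, \beta_p)^T$, so that the intercept is handled by the column of ones in $X$.

```latex
\begin{aligned}
RSS(\beta) &= (y - X\beta)^T (y - X\beta) \\
\frac{\partial RSS}{\partial \beta} &= -2\, X^T (y - X\beta) \\
X^T (y - X\beta) = 0 \;&\Longleftrightarrow\; X^T X \beta = X^T y
\end{aligned}
```

When $X^T X$ is invertible, the last equation has a unique solution.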


Parameter estimates and predictions

Least squares estimates of the parameters

$$\hat\beta = (X^T X)^{-1} X^T y$$

Predicted values

$$\hat y = X\hat\beta = X (X^T X)^{-1} X^T y = Hy$$
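The estimates can be computed by solving the normal equations $X^T X \beta = X^T y$ rather than forming the inverse explicitly. The sketch below uses only the Python standard library; the helper names (`solve`, `ols_fit`, `predict`) and the toy data are assumptions made for illustration, and the solver does not guard against a singular $X^T X$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_fit(xs, ys):
    """Return [beta0, beta1, ..., betap] solving (X^T X) beta = X^T y."""
    design = [[1.0] + list(row) for row in xs]  # prepend the column of ones
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, xs):
    """Predicted values y_hat = X beta_hat."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in xs]


# Made-up toy data (not from the lecture): y = 1 + 2*x1 - x2 without noise.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [1.0, 3.0, 0.0, 2.0, 4.0]
beta_hat = ols_fit(xs, ys)
y_hat = predict(beta_hat, xs)
print(beta_hat)
print(y_hat)
```

Because the toy responses are noise-free, the fit should recover the coefficients 1, 2 and -1 up to rounding error, and the predicted values should match the responses.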



Different sources of inputs


Quantitative inputs

Transformations of quantitative inputs

Numeric or dummy coding of the levels of qualitative inputs

Interactions between variables (e.g. X3 = X1 X2)

Example of dummy coding:

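$$X_1 = \begin{cases} 1, & \text{if Jan} \\ 0, & \text{otherwise} \end{cases}$$

$$X_2 = \begin{cases} 1, & \text{if Feb} \\ 0, & \text{otherwise} \end{cases}$$

$$\vdots$$

$$X_{11} = \begin{cases} 1, & \text{if Nov} \\ 0, & \text{otherwise} \end{cases}$$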
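The month coding above can be sketched as follows. The function name `dummy_code` and the three-letter month labels are assumptions for illustration. December gets no dummy of its own and acts as the reference level; a twelfth dummy would make the columns sum to the column of ones that represents the intercept.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def dummy_code(month):
    """Return [X1, ..., X11]: Xk is 1 if `month` is the kth month, else 0.

    December is the reference level, so it is coded as all zeros.
    """
    return [1 if month == level else 0 for level in MONTHS[:-1]]


print(dummy_code("Jan"))  # X1 = 1, all other dummies 0
print(dummy_code("Dec"))  # reference level: all eleven dummies 0
```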


An example of multiple linear regression

Response variable: Requested price of used Porsche cars (1000 SEK)

Inputs:
X1 = Manufacturing year
X2 = Mileage (km)
X3 = Model (0 or 1)
X4 = Equipment (1, 2, 3)
X5 = Colour (Red, Black, Silver, Blue, Black, White, Green)


Price of used Porsche cars

Response variable: Requested price of used Porsche cars (1000 SEK)

Inputs:
X1 = Manufacturing year
X2 = Mileage (km)

Inputs           Estimated model                                  RSS
Year             Price = -76829 + 38.6 Year                       113030
Mileage          Price = 430.7 - 0.001862 Mileage                 230212
Year, Mileage    Price = -63809 + 32.1 Year - 0.000789 Mileage    92541


Interpretation of multiple regression coefficients

Assume that

$$Y = \beta_0 + \sum_{j=1}^{p} X_j\beta_j + \varepsilon$$

and that the regression coefficients are estimated by ordinary least squares regression

Then the multiple regression coefficient $\hat\beta_j$ represents the additional contribution of $x_j$ to $y$, after $x_j$ has been adjusted for $x_0, x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_p$


Confidence intervals for regression parameters

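Assume that

$$Y = \beta_0 + \sum_{j=1}^{p} X_j\beta_j + \varepsilon$$

where the X-variables are fixed and the error terms are i.i.d. and $N(0, \sigma^2)$

Then

$$\hat\beta_j \pm t_{0.05}(N-p-1)\,\sqrt{v_j}\,\hat\sigma \qquad (95\%)$$

where $v_j$ is the jth diagonal element of $(X^T X)^{-1}$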
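Once the standard error $\hat\sigma\sqrt{v_j}$ is available, the interval is simple arithmetic. The sketch below assumes the critical value is looked up separately and passed in as `t_crit`; the function name and the numbers are illustrative only and are not taken from the lecture.

```python
def confidence_interval(coef, se, t_crit):
    """Return (lower, upper) = coef -/+ t_crit * se.

    se is the standard error sigma_hat * sqrt(v_j); t_crit is the critical
    value of the t distribution with N - p - 1 degrees of freedom.
    """
    half_width = t_crit * se
    return coef - half_width, coef + half_width


# Illustrative numbers only (not from the lecture).
lower, upper = confidence_interval(coef=10.0, se=2.0, t_crit=2.0)
print(lower, upper)  # prints 6.0 14.0
```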


Interpretation of software outputs

Adding new independent variables to a regression model alters at least one of the old regression coefficients unless the columns of the X-matrix are orthogonal, i.e.

$$\sum_{i=1}^{N} x_{ji}\,x_{ki} = 0 \quad \text{for } j \neq k$$

Regression of the price of used Porsche cars vs mileage (km) and manufacturing year

Predictor      Coef          SE Coef      T        P
Constant       430.69        17.42        24.72    0.000
Milage (km)    -0.0018621    0.0002959    -6.29    0.000

Predictor      Coef          SE Coef      T        P
Constant       -63809        6976         -9.15    0.000
Milage (km)    -0.0007894    0.0002222    -3.55    0.001
Year           32.103        3.486        9.21     0.000
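A fitted model is used by plugging input values into the estimated equation. The sketch below takes the coefficients from the second software output (inputs Milage and Year); the function name and the example car are hypothetical and not part of the data set.

```python
# Coefficients copied from the second software output (inputs: Milage and Year).
CONSTANT = -63809.0
COEF_MILAGE = -0.0007894   # change in price (1000 SEK) per km
COEF_YEAR = 32.103         # change in price (1000 SEK) per manufacturing year


def predicted_price(year, milage_km):
    """Predicted requested price in 1000 SEK."""
    return CONSTANT + COEF_YEAR * year + COEF_MILAGE * milage_km


# Hypothetical car (not from the data set): built in 2000, driven 50000 km.
price = predicted_price(2000, 50000)
print(round(price, 2))  # about 357.53, i.e. roughly 358 000 SEK
```

Note that the mileage coefficient is -0.0018621 when mileage is the only input and -0.0007894 once year is added: the two inputs are not orthogonal, so adding one changes the coefficient of the other.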


Stepwise Regression: Price (1000SEK) versus Year, Milage (km), ...

Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15

Step             1         2           3           4
Constant         -76829    -63809      -53285      -52099

Year             38.6      32.1        26.8        26.2
T-Value          11.87     9.21        7.00        6.88
P-Value          0.000     0.000       0.000       0.000

Milage (km)                -0.00079    -0.00066    -0.00062
T-Value                    -3.55       -3.08       -2.88
P-Value                    0.001       0.003       0.006

Model                                  37          27
T-Value                                2.72        1.83
P-Value                                0.009       0.073

Equipment                                          11.0
T-Value                                            1.52
P-Value                                            0.135

S                44.1      40.3        38.2        37.8
R-Sq             70.82     76.11       78.89       79.74
R-Sq(adj)        70.32     75.27       77.76       78.27
Mallows Cp       23.8      11.3        5.7         5.4

The p-value refers to a t-test of the hypothesis that the regression coefficient of the last entered x-variable is zero

Classical statistical model selection techniques are model-based.

In data-mining the model selection is data-driven.


Stepwise Regression: Price (1000SEK) versus Year, Milage (km), ... - model validation by visual inspection of residuals

[Figure: Versus Fits (response is Price (1000SEK)); residual plotted against fitted value]

[Figure: Residuals Versus Milage (km) (response is Price (1000SEK)); residual plotted against Milage (km)]

Residual = Observed - Predicted


The Gram-Schmidt procedure for regression by successive orthogonalization and simple linear regression

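1. Initialize $z_0 = x_0 = 1$

2. For $j = 1, \dots, p$, compute

$$z_j = x_j - \sum_{k=0}^{j-1} \hat\gamma_{kj}\, z_k, \qquad \hat\gamma_{kj} = \frac{\langle z_k, x_j \rangle}{\langle z_k, z_k \rangle}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product (the sum of coordinate-wise products)

3. Regress $y$ on $z_p$ to obtain the multiple regression coefficient $\hat\beta_p$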
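The three steps can be sketched in a few lines of plain Python. The function names and the toy data are assumptions for illustration; the data are noise-free with y = 1 + 2*x1 - x2, so the coefficient of the last input should come out as -1 up to rounding error.

```python
def inner(u, v):
    """Inner product: the sum of coordinate-wise products."""
    return sum(a * b for a, b in zip(u, v))


def last_coefficient_by_orthogonalization(columns, y):
    """Return the multiple regression coefficient of the last column.

    `columns` holds x1, ..., xp as lists of N values; x0 = 1 is added here.
    """
    n = len(y)
    zs = [[1.0] * n]                      # step 1: z0 = x0 = 1
    for x in columns:                     # step 2: j = 1, ..., p
        z = list(x)
        for zk in zs:
            gamma = inner(zk, x) / inner(zk, zk)
            z = [zi - gamma * zki for zi, zki in zip(z, zk)]
        zs.append(z)
    zp = zs[-1]
    return inner(zp, y) / inner(zp, zp)   # step 3: regress y on zp


# Made-up toy data (not from the lecture): y = 1 + 2*x1 - x2 without noise.
x1 = [0.0, 1.0, 0.0, 1.0, 2.0]
x2 = [0.0, 0.0, 1.0, 1.0, 1.0]
y = [1.0, 3.0, 0.0, 2.0, 4.0]
beta_x2 = last_coefficient_by_orthogonalization([x1, x2], y)
beta_x1 = last_coefficient_by_orthogonalization([x2, x1], y)
print(beta_x2, beta_x1)
```

Only the coefficient of the input placed last is obtained this way; reordering the inputs, as in the second call, gives the coefficient of another input.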


Prediction of a response variable using correlated explanatory variables - daily temperatures in Stockholm, Göteborg, and Malmö

[Figure: three scatter plots of daily temperatures, each axis running from -20 to 30: Göteborg temperature versus Stockholm temperature, Malmö temperature versus Stockholm temperature, and Göteborg temperature versus Malmö temperature]


Absorbance records for ten samples of chopped meat

[Figure: absorbance (0.0 to 5.0) versus channel (1 to 100) for Sample_1 to Sample_10]

1 response variable (protein)

100 predictors (absorbance at 100 wavelengths or channels)

The predictors are strongly correlated to each other


Absorbance records for 240 samples of chopped meat

The target is poorly correlated to each predictor

[Figure: scatter plot of protein (%) versus absorbance in channel 50]


Ridge regression

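The ridge regression coefficients minimize a penalized residual sum of squares:

$$\hat\beta^{ridge} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{N} \big(y_i - \beta_0 - \beta_1 x_{1i} - \dots - \beta_p x_{pi}\big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \Big\}$$

or

$$\hat\beta^{ridge} = \arg\min_{\beta} \sum_{i=1}^{N} \big(y_i - \beta_0 - \beta_1 x_{1i} - \dots - \beta_p x_{pi}\big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s$$

Normally, inputs are centred prior to the estimation of regression coefficients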


Matrix formulation of ridge regression for centred inputs

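$$RSS(\lambda) = (y - X\beta)^T (y - X\beta) + \lambda\,\beta^T\beta$$

$$\hat\beta^{ridge} = (X^T X + \lambda I)^{-1} X^T y$$

If the inputs are orthonormal, the ridge estimates are just a scaled version of the least squares estimates:

$$\hat\beta^{ridge} = \gamma\,\hat\beta, \quad \text{where } 0 \le \gamma \le 1 \quad (\text{here } \gamma = 1/(1+\lambda))$$

Shrinking enables estimation of regression coefficients even if the number of parameters exceeds the number of cases

Figure 3.7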
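The closed-form ridge solution can be sketched with the standard library alone. The helper names (`solve`, `ridge_fit`) and the toy data are assumptions for illustration; the inputs are centred inside `ridge_fit`, and the intercept is left out because it is not penalized.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_fit(xs, ys, lam):
    """Ridge coefficients for centred inputs: solve (X^T X + lam I) beta = X^T y."""
    n, p = len(xs), len(xs[0])
    means = [sum(row[j] for row in xs) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in xs]  # centred inputs
    a = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(xc, ys)) for i in range(p)]
    return solve(a, b)


# Made-up toy data (not from the lecture): y = 1 + 2*x1 - x2 without noise.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [1.0, 3.0, 0.0, 2.0, 4.0]
beta_ols = ridge_fit(xs, ys, 0.0)     # lam = 0 reproduces least squares
beta_ridge = ridge_fit(xs, ys, 1.0)   # lam = 1 shrinks the coefficients

# More parameters (p = 3) than cases (N = 2): solvable once lam > 0.
beta_wide = ridge_fit([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]], [1.0, 2.0], 1.0)
print(beta_ols, beta_ridge, beta_wide)
```

With λ = 0 the toy fit returns the least squares slopes 2 and -1; with λ = 1 they shrink to 1.375 and -0.375.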


Ridge regression – pros and cons

Ridge regression is particularly useful if the explanatory variables are strongly correlated with each other.

The variance of the estimated regression coefficients is reduced at the expense of (slightly) biased estimates


The Gauss-Markov theorem

Consider a linear regression model in which:
– the inputs are regarded as fixed
– the error terms are i.i.d. with mean 0 and variance $\sigma^2$.

Then, the least squares estimator of a parameter $a^T\beta$ has variance no bigger than that of any other linear unbiased estimator of $a^T\beta$

Biased estimators may have smaller variance and mean squared error!
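Why a biased estimator can win is seen from the decomposition of the mean squared error. The worked example below is a sketch under stated assumptions: $\hat\theta$ is an unbiased estimator of a scalar $\theta$ with variance $v > 0$, and $\tilde\theta = \gamma\hat\theta$ is a shrunken version of it; these symbols are introduced here for illustration.

```latex
\begin{aligned}
\mathrm{MSE}(\tilde\theta) &= E(\tilde\theta - \theta)^2
  = \mathrm{Var}(\tilde\theta) + \big[E(\tilde\theta) - \theta\big]^2 \\
\tilde\theta = \gamma\hat\theta: \quad
\mathrm{MSE}(\tilde\theta) &= \gamma^2 v + (1 - \gamma)^2 \theta^2 \\
\gamma^{*} = \frac{\theta^2}{\theta^2 + v} < 1: \quad
\mathrm{MSE}(\tilde\theta) &= \frac{\theta^2 v}{\theta^2 + v} < v = \mathrm{MSE}(\hat\theta)
\end{aligned}
```

The best shrinkage factor $\gamma^{*}$ depends on the unknown $\theta$, so this only shows that some amount of shrinkage lowers the mean squared error, not how much to apply in practice.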


SAS code for an ordinary least squares regression

/* Ordinary least squares fit; outest= saves the estimated coefficients */
proc reg data=mining.dailytemperature outest = dtempbeta;
  model daily_consumption = stockholm g_teborg malm_;
run;


SAS code for ridge regression

/* Ridge fits for ridge parameters 0, 1, ..., 10; the estimates are saved in dtempbeta */
proc reg data=mining.dailytemperature outest = dtempbeta ridge=0 to 10 by 1;
  model daily_consumption = stockholm g_teborg malm_;
proc print data=dtempbeta;
run;

_TYPE_  _DEPVAR_           _RIDGE_  _RMSE_   Intercept  STOCKHOLM  G_TEBORG  MALM_
PARMS   Daily_Consumption           30845.8  480268.9   -5364.6    -548.3    -3598.2
RIDGE   Daily_Consumption  0        30845.8  480268.9   -5364.6    -548.3    -3598.2
RIDGE   Daily_Consumption  1        36314.6  462824.0   -2327.8    -2357.6   -2512.6
RIDGE   Daily_Consumption  2        43008.7  450349.7   -1830.1    -1899.4   -2011.6
RIDGE   Daily_Consumption  3        48325.9  442054.5   -1514.3    -1584.8   -1674.9
RIDGE   Daily_Consumption  4        52401.2  436146.6   -1292.7    -1358.6   -1434.4
RIDGE   Daily_Consumption  5        55571.5  431726.2   -1128.0    -1188.6   -1254.1
RIDGE   Daily_Consumption  6        58092.1  428294.6   -1000.8    -1056.3   -1114.1
RIDGE   Daily_Consumption  7        60138.0  425553.4   -899.4     -950.4    -1002.1
RIDGE   Daily_Consumption  8        61829.0  423313.5   -816.7     -863.8    -910.6
RIDGE   Daily_Consumption  9        63248.9  421448.8   -747.9     -791.7    -834.4
RIDGE   Daily_Consumption  10       64457.3  419872.4   -689.8     -730.6    -770.0

Date post:	14-Mar-2016
Upload:	wyatt-witt