  • Multiple Linear Regression

    Sukru Acitas

    Anadolu University, Department of Statistics, 26470 Eskisehir, TURKEY, [email protected]

    ENM310 Experimental Design & Regression Analysis

  • Reference textbook

    ⊲ Montgomery, D. C., Peck, E. A., & Vining, G. G. (2015). Introduction to linear regression analysis. John Wiley & Sons. (Chapter 3)


  • Multiple regression model

    Definition

    A regression model that involves more than one regressor variable is called a multiple regression model.



  • Multiple regression model

    The response y may be related to k regressor or predictor variables.

    Statistical model

    y = β0 + β1x1 + β2x2 + · · · + βkxk + ε (1)

    The parameters βj (j = 0, 1, . . . , k) are called the regression coefficients.

    This model describes a hyperplane in the k-dimensional space of the regressor variables xj.

    The parameter βj represents the expected change in the response y per unit change in xj when all of the remaining regressor variables xi (i ≠ j) are held constant. For this reason the parameters βj are often called partial regression coefficients.

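    As a concrete illustration of the model in Eq. (1) and the partial-coefficient interpretation, here is a minimal NumPy sketch (not part of the original slides; the coefficient values and the seed are made up):

    import numpy as np

    # Simulate y = 1 + 2*x1 - 0.5*x2 + eps and recover the coefficients
    # by least squares (developed on the following slides).
    rng = np.random.default_rng(42)
    n = 500
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 0.3, n)

    X = np.column_stack([np.ones(n), x1, x2])   # column of ones for beta0
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)  # close to (1.0, 2.0, -0.5); the estimate of beta1 is the
                     # expected change in y per unit change in x1, x2 held constant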

  • Multiple regression model

    Note

    Any regression model that is linear in the parameters (the β's) is a linear regression model, regardless of the shape of the surface that it generates.

    Example

    y = β0 + β1x1 + ε

    y = β0 + β1x1 + β2x2 + ε

    y = β0 + β1x1 + β2x1x2 + ε

    y = β0 + β1x1³ + β2x2² + β3x3 + ε

    y = β0 + √(β1x1³) + ε

    y = 1/(β1x1³) + ε

    The first four models are linear in the parameters; the last two are not, since β1 enters through a square root and a reciprocal.

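    The practical consequence of this note: a model that is nonlinear in the x's but linear in the β's can still be fit by ordinary least squares, using transformed columns in the design matrix. A minimal sketch (synthetic data; the coefficients are made up):

    import numpy as np

    # y = b0 + b1*x1^3 + b2*x2^2 + b3*x3 + eps is linear in the parameters,
    # so least squares applies to the transformed regressors directly.
    rng = np.random.default_rng(1)
    n = 200
    x1, x2, x3 = rng.uniform(-1, 1, size=(3, n))
    y = 1.0 + 2.0 * x1**3 - 1.5 * x2**2 + 0.8 * x3 + rng.normal(0, 0.1, n)

    X = np.column_stack([np.ones(n), x1**3, x2**2, x3])  # transformed columns
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)  # close to (1.0, 2.0, -1.5, 0.8)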

  • Estimation of the model parameters

    The method of least squares can be used to estimate the regression coefficients in model (1).

    Suppose that n > k observations are available, and let yi denote the i-th observed response and xij denote the i-th observation or level of regressor xj. The data will appear as in the following table.

    Data for multiple linear regression

    i      y     x1     x2     · · ·   xk
    1      y1    x11    x12    · · ·   x1k
    2      y2    x21    x22    · · ·   x2k
    ⋮      ⋮     ⋮      ⋮              ⋮
    n      yn    xn1    xn2    · · ·   xnk


  • Estimation of the model parameters

    We may write the sample regression model corresponding to Eq. (1) as follows:

    Multiple linear regression model

    yi = β0 + β1xi1 + β2xi2 + · · · + βkxik + εi,  i = 1, 2, . . . , n (2)

       = β0 + ∑_{j=1}^{k} βjxij + εi,  i = 1, 2, . . . , n (3)


  • Estimation of the model parameters

    Assumptions:

    The error term ε in the model has E(ε) = 0 and Var(ε) = σ², and the errors are uncorrelated.

    The regressor variables are fixed (i.e., mathematical or nonrandom) variables, measured without error, and they are uncorrelated.

    When testing hypotheses or constructing CIs, we will have to assume that the error term ε has a normal distribution with mean 0 and variance σ².


  • Estimation of the model parameters

    The least-squares function is

    S(β0, β1, . . . , βk) = ∑_{i=1}^{n} εi² = ∑_{i=1}^{n} ( yi − β0 − ∑_{j=1}^{k} βjxij )² (4)

    ⊲ The function S must be minimized with respect to β0, β1, β2, . . . , βk.


  • Estimation of the model parameters

    The least-squares estimators of β0, β1, β2, . . . , βk must satisfy

    ∂S(β0, β1, . . . , βk)/∂β0 |_{β̂0,β̂1,β̂2,...,β̂k} = −2 ∑_{i=1}^{n} ( yi − β̂0 − ∑_{j=1}^{k} β̂jxij ) = 0 (5)

    and

    ∂S(β0, β1, . . . , βk)/∂βj |_{β̂0,β̂1,β̂2,...,β̂k} = −2 ∑_{i=1}^{n} ( yi − β̂0 − ∑_{j=1}^{k} β̂jxij ) xij = 0,  j = 1, 2, . . . , k (6)


  • Estimation of the model parameters

    Simplifying Eqs. (5) and (6), we obtain the least-squares normal equations:

    Normal equations

    ∑_{i=1}^{n} yi = nβ̂0 + β̂1 ∑_{i=1}^{n} xi1 + · · · + β̂k ∑_{i=1}^{n} xik (7)

    ∑_{i=1}^{n} xi1yi = β̂0 ∑_{i=1}^{n} xi1 + β̂1 ∑_{i=1}^{n} xi1² + · · · + β̂k ∑_{i=1}^{n} xi1xik (8)

    ⋮

    ∑_{i=1}^{n} xikyi = β̂0 ∑_{i=1}^{n} xik + β̂1 ∑_{i=1}^{n} xikxi1 + · · · + β̂k ∑_{i=1}^{n} xik² (9)


  • Estimation of the model parameters

    Note

    There are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solution to the normal equations will be the least-squares estimators β̂0, β̂1, β̂2, . . . , β̂k.

    It is more convenient to deal with multiple regression models if they are expressed in matrix notation.

    This allows a very compact display of the model, data, and results.


  • Matrix form of multiple linear regression model

    In matrix notation, the model given by Eq. (3) is

    Matrix notation

    y = Xβ + ε (10)


  • Matrix form of multiple linear regression model

    y = [y1, y2, . . . , yn]′ is the n × 1 vector of responses,

    X =

        1   x11   x12   · · ·   x1k
        1   x21   x22   · · ·   x2k
        ⋮    ⋮     ⋮     ⋱      ⋮
        1   xn1   xn2   · · ·   xnk

    is the n × p design matrix (p = k + 1),

    β = [β0, β1, β2, . . . , βk]′ is the p × 1 vector of regression coefficients, and ε = [ε1, ε2, . . . , εn]′ is the n × 1 vector of errors.

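    To make the shapes concrete, here is a toy construction of the objects in Eq. (10) with n = 4 observations and k = 2 regressors (the numbers are invented for illustration):

    import numpy as np

    # y = X beta + eps, with X carrying a leading column of ones for beta0.
    X = np.array([[1.0, 0.5, 1.2],
                  [1.0, 1.0, 0.7],
                  [1.0, 1.5, 1.9],
                  [1.0, 2.0, 2.4]])                   # n x p, p = k + 1 = 3
    beta = np.array([1.0, 2.0, -0.5])                 # p x 1 coefficient vector
    eps = np.random.default_rng(0).normal(0, 0.1, 4)  # n x 1 error vector
    y = X @ beta + eps                                # Eq. (10)
    print(y.shape)                                    # (4,)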

  • Least-squares estimation

    We wish to find the vector of least-squares estimators, β̂, that minimizes

    Least-squares function: Matrix form

    S(β) = ∑_{i=1}^{n} εi² = ε′ε = (y − Xβ)′(y − Xβ) (11)


  • Least-squares estimation

    Note that S(β) may be expressed as

    S(β) = y′y − β′X′y − y′Xβ + β′X′Xβ (12)

         = y′y − 2β′X′y + β′X′Xβ (13)

    since y′Xβ is a 1 × 1 matrix (a scalar), so it equals its own transpose β′X′y.


  • Least-squares estimation

    The least-squares estimators must satisfy

    ∂S(β)/∂β |β=β̂ = −2X′y + 2X′Xβ̂ = 0 (14)

    which simplifies to

    Least-squares normal equations

    X′y = X′Xβ̂ (15)


  • Least-squares estimation

    Least-squares estimator

    β̂ = (X′X)⁻¹X′y (16)

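    A minimal sketch of Eq. (16) in NumPy (synthetic data; in practice one solves the normal equations rather than forming the inverse explicitly):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 50, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x p
    beta = np.array([1.0, 2.0, -1.0, 0.5])
    y = X @ beta + rng.normal(0, 0.3, n)

    # Textbook form: beta_hat = (X'X)^{-1} X'y.  Solving X'X beta_hat = X'y
    # is the numerically preferable equivalent.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)  # close to the true beta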

  • Some definitions

    Fitted regression model

    The fitted regression model is given by

    ŷ = Xβ̂ = X(X′X)⁻¹X′y. (17)

    Hat matrix

    The hat matrix is defined as

    H = X(X′X)⁻¹X′ (18)

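    A short numerical check of Eqs. (17)-(18) on synthetic data (the properties asserted below, symmetry, idempotence, and trace(H) = p, are standard facts about the hat matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 30, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = rng.normal(size=n)

    H = X @ np.linalg.solve(X.T @ X, X.T)   # H = X (X'X)^{-1} X', Eq. (18)
    y_hat = H @ y                           # fitted values

    assert np.allclose(H, H.T)              # H is symmetric
    assert np.allclose(H @ H, H)            # H is idempotent
    assert np.allclose(np.trace(H), k + 1)  # trace(H) = p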


  • Some definitions

    Note

    ŷ = Hy (19)

    The hat matrix maps the vector of observed values into a vector of fitted values.

    The hat matrix and its properties play a central role in regression analysis.


  • Some definitions

    Residual

    The difference between the observed value yi and the corresponding fitted value ŷi is the residual ei = yi − ŷi. The n residuals may be conveniently written in matrix notation as

    e = y − ŷ. (20)


  • Some definitions

    Alternative notation for residual

    e = y − Xβ̂ (21)

      = y − Hy (22)

      = (I − H)y (23)

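    A quick check of Eqs. (21)-(23), plus the orthogonality X′e = 0 that follows from the normal equations (15) (synthetic data):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 30, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    H = X @ np.linalg.solve(X.T @ X, X.T)

    e = y - X @ beta_hat                        # Eq. (21)
    assert np.allclose(e, y - H @ y)            # Eq. (22)
    assert np.allclose(e, (np.eye(n) - H) @ y)  # Eq. (23)
    assert np.allclose(X.T @ e, 0)              # X'e = 0 from Eq. (15)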

  • Estimation of σ²

    As in simple linear regression, we may develop an estimator of σ² from the residual sum of squares

    SSRes = ∑_{i=1}^{n} (yi − ŷi)² = ∑_{i=1}^{n} ei² = e′e (24)

    Substituting e = y − Xβ̂, we have

    SSRes = e′e (25)

          = (y − Xβ̂)′(y − Xβ̂) (26)

          = y′y − 2β̂′X′y + β̂′X′Xβ̂ (27)

          = y′y − β̂′X′y (28)

    where the last step uses the normal equations X′Xβ̂ = X′y.


  • Estimation of σ²

    The residual sum of squares has n − p degrees of freedom associated with it, since p parameters are estimated in the regression model.

    The residual mean square is

    MSRes

    MSRes = SSRes / (n − p) (29)

    The expected value of MSRes is σ², so an unbiased estimator of σ² is

    Estimator of σ²

    σ̂² = MSRes. (30)

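    A minimal sketch of Eqs. (24)-(30) on simulated data with known σ (the true value σ = 0.7 is made up, so MSRes should land near 0.49):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 3
    p = k + 1
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0, 0.7, n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    ss_res = e @ e              # Eq. (24): e'e
    ms_res = ss_res / (n - p)   # Eq. (29)
    print(ms_res)               # unbiased estimate of sigma^2, Eq. (30)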

  • Properties of LS estimators

    β̂ is an unbiased estimator of β. That is, E(β̂) = β.

    The variance property of β̂ is expressed by the variance-covariance matrix:

    Variance of β̂

    Var(β̂) = E{(β̂ − β)(β̂ − β)′} = σ²(X′X)⁻¹.

    Var(β̂) is a p × p symmetric matrix.

    The j-th diagonal element of Var(β̂) is the variance of β̂j.

    The (ij)-th off-diagonal element is the covariance between β̂i and β̂j.

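    In practice σ² is unknown, so the covariance matrix is estimated by plugging in σ̂² = MSRes. A sketch (synthetic data):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 2
    p = k + 1
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    sigma2_hat = (e @ e) / (n - p)                  # MSRes
    cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated Var(beta_hat)
    se = np.sqrt(np.diag(cov_beta))                 # se(beta_hat_j)
    print(se)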

  • Hypothesis testing in multiple linear regression

    Once we have estimated the parameters in the model, we face two immediate questions:

    What is the overall adequacy of the model?

    Which specific regressors seem to be important?


  • Test for significance of regression

    The test for significance of regression is a test to determine if there is a linear relationship between the response y and any of the regressor variables x1, x2, . . . , xk. This procedure is often thought of as an overall or global test of model adequacy.

    Hypotheses

    H0 : β1 = β2 = · · · = βk = 0

    H1 : βj ≠ 0 for at least one j


  • Test for significance of regression

    Test Statistic

    To test the null hypothesis H0, the statistic

    F0 = (SSReg/k) / (SSRes/(n − k − 1)) = MSReg/MSRes

    is used. It can be shown that, under H0, F0 has an F distribution with degrees of freedom ν1 = k and ν2 = n − k − 1.


  • Test for significance of regression

    Total sum of squares

    SST = y′y − (∑_{i=1}^{n} yi)² / n

    Regression sum of squares

    SSReg = β̂′X′y − (∑_{i=1}^{n} yi)² / n

    Residual sum of squares

    SSRes = y′y − β̂′X′y


  • Test for significance of regression

    Decomposition of total sum of squares

    SST = SSReg + SSRes



  • Test for significance of regression

    ANOVA Table

    Source        SS       df           MS       F
    Regression    SSReg    k            MSReg    F0
    Residual      SSRes    n − k − 1    MSRes
    Total         SST      n − 1

    Reject H0 : β1 = β2 = · · · = βk = 0 if F0 > Fα,ν1,ν2.

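    A sketch of the overall F test on synthetic data (SciPy is used only for the F tail probability):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, k = 60, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 2.0, 0.0, 0.5]) + rng.normal(0, 1.0, n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    ss_t = y @ y - y.sum() ** 2 / n          # total sum of squares
    ss_res = y @ y - beta_hat @ (X.T @ y)    # residual sum of squares
    ss_reg = ss_t - ss_res                   # SST = SSReg + SSRes

    f0 = (ss_reg / k) / (ss_res / (n - k - 1))
    p_value = stats.f.sf(f0, k, n - k - 1)   # P(F > F0) under H0
    print(f0, p_value)                       # small p-value -> reject H0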


  • R² and Adjusted R²

    Two other ways to assess the overall adequacy of the model are R² and adjusted R², denoted R²Adj.

    In general, R² never decreases when a regressor is added to the model, regardless of the value of the contribution of that variable. Therefore, it is difficult to judge whether an increase in R² is really telling us anything important.

    Adjusted R²

    R²Adj = 1 − (SSRes/(n − p)) / (SST/(n − 1)) (31)

    R²Adj will only increase on adding a variable to the model if the addition of the variable reduces the residual mean square.

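    A small demonstration of the contrast between R² and R²Adj when a pure-noise regressor is added (synthetic data):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 40
    x1 = rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + rng.normal(0, 1.0, n)

    def r2_stats(X, y):
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ beta_hat
        ss_res = e @ e
        ss_t = ((y - y.mean()) ** 2).sum()
        n_obs, p = X.shape
        r2 = 1 - ss_res / ss_t
        r2_adj = 1 - (ss_res / (n_obs - p)) / (ss_t / (n_obs - 1))  # Eq. (31)
        return r2, r2_adj

    X1 = np.column_stack([np.ones(n), x1])
    X2 = np.column_stack([X1, rng.normal(size=n)])  # add an irrelevant regressor
    print(r2_stats(X1, y))  # R^2 cannot decrease from X1 to X2,
    print(r2_stats(X2, y))  # but adjusted R^2 can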

  • Tests on individual regression coefficients

    Adding a variable to a regression model always causes the sum of squares for regression to increase and the residual sum of squares to decrease.

    We must decide whether the increase in the regression sum of squares is sufficient to warrant using the additional regressor in the model.

    The addition of a regressor also increases the variance of the fitted value ŷ, so we must be careful to include only regressors that are of real value in explaining the response.

    Furthermore, adding an unimportant regressor may increase the residual mean square, which may decrease the usefulness of the model.



  • Tests on individual regression coefficients

    Hypotheses

    H0 : βj = 0, j = 1, 2, . . . , k

    H1 : βj ≠ 0.

    Test statistic

    t0 = β̂j / se(β̂j),  j = 1, 2, . . . , k

    Reject H0 : βj = 0 if |t0| > tα/2,n−k−1.

    If H0 : βj = 0 is not rejected, this indicates that the regressor xj can be deleted from the model.

    This is really a partial or marginal test because the regression coefficient β̂j depends on all of the other regressor variables xi (i ≠ j) that are in the model. Thus, this is a test of the contribution of xj given the other regressors in the model.

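    A sketch of the marginal t tests (synthetic data in which β2 = 0, so its test should typically fail to reject):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, k = 80, 2
    p = k + 1
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0, 1.0, n)  # beta2 = 0

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    sigma2_hat = (e @ e) / (n - p)
    se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

    t0 = beta_hat / se
    t_crit = stats.t.ppf(1 - 0.05 / 2, n - k - 1)  # alpha = 0.05
    print(t0, t_crit)  # reject H0: beta_j = 0 when |t0| > t_crit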

  • Thank you :)
