Regression Explained



    Regression Analysis

    There are three kinds of data arrangements: time series, cross-sectional, and panel. Regression can therefore be run on any of the three.

    Based on the number of variables, regression is either bivariate or multivariate.


    Bivariate Regression

    A measure of linear association that investigates a straight-line relationship

    Useful in estimation/forecasting


    Introduction to Regression Analysis

    Regression analysis is used to:

    Predict the value of a dependent variable based on the value of at least one independent variable

    Explain the impact of changes in an independent variable on the dependent variable

    Dependent variable: the variable we wish to explain
    Independent variable: the variable used to explain the dependent variable


    Types of Regression Models

    Positive Linear Relationship

    Negative Linear Relationship

    Relationship NOT Linear

    No Relationship


    Bivariate Linear Regression

    A measure of linear association that investigates a straight-line relationship.

    Y = α + βX + ε, where Y is the dependent variable, X is the independent variable, α and β are two constants to be estimated, and ε is the error (residual) term.


    Y intercept

    An intercepted segment of a line; the point at which the regression line intercepts the Y-axis.


    Slope

    The inclination of a regression line as compared to a base line.


    [Figure: scatter plot of X (roughly 70-190) versus Y (roughly 80-160) with the fitted regression line, marking the actual Y values and the predicted values Ŷ.]

    Regression line: Y = a + bX + e

    Ŷ (Y hat) is used for the predicted value of Y.


    The Least-Square Method

    The criterion of attempting to make the least amount of total error in prediction of Y from X. More technically, the procedure used in the least-squares method generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line.


    The Least-Square Method

    A relatively simple mathematical technique that ensures that the straight line will most closely represent the relationship between X and Y.


    Regression - Least-Squares Method

    eᵢ = Yᵢ - Ŷᵢ (the residual)

    Yᵢ = actual value of the dependent variable
    Ŷᵢ = estimated value of the dependent variable (Y hat)
    n = number of observations

    The line is fitted so that Σ eᵢ² (summed over i = 1 to n) is a minimum, where

    eᵢ² = (yᵢ - ŷᵢ)² = (yᵢ - (a + bxᵢ))²


    Finding out the values of a and b

    b = (n ΣXY - ΣX ΣY) / (n ΣX² - (ΣX)²), or equivalently b = (ΣXY - n X̄ Ȳ) / (ΣX² - n X̄²)

    a = Ȳ - b X̄

    where:
    b = estimated slope of the line (the regression coefficient)
    a = estimated intercept of the Y axis
    Y = dependent variable
    Ȳ = mean of the dependent variable
    X = independent variable
    X̄ = mean of the independent variable
    n = number of observations
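    These formulas translate directly into code. Below is a minimal Python/NumPy sketch (not part of the original slides) that computes b and a from the ten X/Y pairs used later in the deck.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope b and intercept a using the textbook formulas."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
    a = y.mean() - b * x.mean()
    return a, b

x = [3, 10, 11, 15, 22, 22, 23, 28, 28, 35]
y = [40, 35, 30, 32, 19, 26, 24, 22, 18, 6]
print(fit_line(x, y))   # approximately a = 43.7, b = -0.94
```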


    The other method of calculating a and b

    Use of the simultaneous-equation (normal equations) method, where Y is the dependent variable and X is the independent variable:

    ΣY = na + b ΣX
    ΣXY = a ΣX + b ΣX²
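    The same a and b fall out of solving these two equations simultaneously; a minimal Python/NumPy sketch (not part of the original slides):

```python
import numpy as np

x = np.array([3, 10, 11, 15, 22, 22, 23, 28, 28, 35], dtype=float)
y = np.array([40, 35, 30, 32, 19, 26, 24, 22, 18, 6], dtype=float)
n = len(x)

# Normal equations:  sum(Y) = n*a + b*sum(X)   and   sum(XY) = a*sum(X) + b*sum(X^2)
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)
print(a, b)   # same a ≈ 43.7 and b ≈ -0.94 as the direct formulas
```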


    F-Test (Regression)-Goodness of fit

    A procedure to determine whether there is more variability explained by the regression or unexplained by the regression.

    Total deviation = deviation explained by the regression + deviation unexplained by the regression.


    Partitioning the Variance

    Σ(Yᵢ - Ȳ)² = Σ(Ŷᵢ - Ȳ)² + Σ(Yᵢ - Ŷᵢ)²

    Total variation = explained variation + unexplained variation (residual)

    Ȳ = mean of the total group
    Ŷᵢ = value predicted with the regression equation
    Yᵢ = actual value


    Sum of Squares

    SSt = SSr + SSe

    r² = SSr / SSt = 1 - SSe / SSt

    The proportion of variance in Y that is explained by X (or vice versa) is referred to as the coefficient of determination, r². R² can also be calculated by squaring the correlation coefficient r. It is also known as the explained variance.
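    As a quick check of the identity SSt = SSr + SSe, here is a small Python sketch on made-up toy data (the numbers are illustrative only, not from the slides):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # hypothetical toy data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, a = np.polyfit(x, y, 1)                  # least-squares slope and intercept
y_hat = a + b * x

sst = np.sum((y - y.mean()) ** 2)           # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)       # explained variation
sse = np.sum((y - y_hat) ** 2)              # unexplained (residual) variation

print(np.isclose(sst, ssr + sse))           # True: the variation partitions exactly
print(ssr / sst, 1 - sse / sst)             # both give r-squared
```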


    Calculating the Value of R Square

     X    Y
     3   40
    10   35
    11   30
    15   32
    22   19
    22   26
    23   24
    28   22
    28   18
    35    6

    Equation for the line of best fit: y = -0.94x + 43.7

    Correlation: r = -0.94


    The same ten (X, Y) pairs, with columns for the predicted Y value, the error, the squared error, the distance between each Y value and the mean of Y, and the squared distance, to be filled in (see the next slide):

     X    Y
     3   40
    10   35
    11   30
    15   32
    22   19
    22   26
    23   24
    28   22
    28   18
    35    6

    Equation for the line of best fit: y = -0.94x + 43.7


     X    Y   Predicted Y   Error   Error²   Y - mean(Y)   (Y - mean(Y))²
     3   40      40.88        .88     .77        14.8         219.04
    10   35      34.30       -.70     .49         9.8          96.04
    11   30      33.36       3.36   11.29         4.8          23.04
    15   32      29.60      -2.40    5.76         6.8          46.24
    22   19      23.02       4.02   16.16        -6.2          38.44
    22   26      23.02      -2.98    8.88          .8            .64
    23   24      22.08      -1.92    3.69        -1.2           1.44
    28   22      17.38      -4.62   21.34        -3.2          10.24
    28   18      17.38       -.62     .38        -7.2          51.84
    35    6      10.80       4.80   23.04       -19.2         368.65

    Mean of Y: 25.2                 Sum: 91.81                Sum: 855.60

    Equation for the line of best fit: y = -0.94x + 43.7


    To calculate R Squared

    R² = 1 - (sum of squared distances between the actual and predicted Y values) / (sum of squared distances between the actual Y values and their mean)

    R² = 1 - 91.81 / 855.60 = 1 - 0.11 = 0.89
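    A short Python sketch (not part of the original slides) reproduces this R² from the X/Y data above:

```python
import numpy as np

x = np.array([3, 10, 11, 15, 22, 22, 23, 28, 28, 35], dtype=float)
y = np.array([40, 35, 30, 32, 19, 26, 24, 22, 18, 6], dtype=float)

b, a = np.polyfit(x, y, 1)            # fitted line: roughly y = -0.94x + 43.7
y_hat = a + b * x

sse = np.sum((y - y_hat) ** 2)        # ≈ 91.8  (sum of the squared errors)
sst = np.sum((y - y.mean()) ** 2)     # = 855.6 (squared distances from the mean)
print(round(1 - sse / sst, 2))        # ≈ 0.89
```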


     X    Y
     3   40
    10   35
    11   30
    15   32
    22   19
    22   26
    23   24
    28   22
    28   18
    35    6

    r = -0.944

    The value we got for R Squared was 0.89. Here is a short-cut: to find R Squared, square r.

    r² = (-0.944) × (-0.944) = 0.89


    R Squared

    To determine how well the regression line fits the data, we find a value called R-Squared (r²).

    To find r², simply square the correlation. The closer r² is to +1, the better the line fits the data. r² will always be a positive number.


    Understanding the Output of Regression


    Sample Data for House Price Model

    House Price in $1000s (Y)    Square Feet (X)
           245                        1400
           312                        1600
           279                        1700
           308                        1875
           199                        1100
           219                        1550
           405                        2350
           324                        2450
           319                        1425
           255                        1700


    Regression Using Excel

    Tools / Data Analysis / Regression
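    If Excel is not available, a comparable regression output can be produced in Python; a minimal sketch using statsmodels (an assumed alternative, not part of the original slides):

```python
import numpy as np
import statsmodels.api as sm

square_feet = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])   # in $1000s

X = sm.add_constant(square_feet)   # adds the intercept term
model = sm.OLS(price, X).fit()
print(model.summary())             # R Square, ANOVA F test, coefficients, etc.
```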


    Excel Output

    Regression Statistics
    Multiple R          0.76211
    R Square            0.58082
    Adjusted R Square   0.52842
    Standard Error      41.33032
    Observations        10

    ANOVA
                  df   SS           MS           F         Significance F
    Regression     1   18934.9348   18934.9348   11.0848   0.01039
    Residual       8   13665.5652    1708.1957
    Total          9   32600.5000

                   Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
    Intercept      98.24833       58.03348         1.69296   0.12892   -35.57720    232.07386
    Square Feet     0.10977        0.03297         3.32938   0.01039     0.03374      0.18580

    The regression equation is:

    house price = 98.24833 + 0.10977 (square feet)


    Graphical Presentation

    [Figure: house price model scatter plot with the fitted regression line; square feet on the X axis (0-3000), house price in $1000s on the Y axis (0-450).]

    house price = 98.24833 + 0.10977 (square feet)

    Slope = 0.10977
    Intercept = 98.248


    Interpretation of the Intercept, b₀

    b₀ is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values).

    Here, no houses had 0 square feet, so b₀ = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.

    house price = 98.24833 + 0.10977 (square feet)


    Interpretation of the Slope Coefficient, b₁

    b₁ measures the estimated change in the average value of Y as a result of a one-unit change in X.

    Here, b₁ = 0.10977 tells us that the average value of a house increases by 0.10977 × ($1,000) = $109.77, on average, for each additional square foot of size.

    house price = 98.24833 + 0.10977 (square feet)


    Excel Output

    Regression Statistics
    Multiple R          0.76211
    R Square            0.58082
    Adjusted R Square   0.52842
    Standard Error      41.33032
    Observations        10

    ANOVA
                        df   SS           MS           F         Significance F
    Regression (Main)    1   18934.9348   18934.9348   11.0848   0.01039
    Residual (Error)     8   13665.5652    1708.1957
    Total                9   32600.5000

                   Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
    Intercept      98.24833       58.03348         1.69296   0.12892   -35.57720    232.07386
    Square Feet     0.10977        0.03297         3.32938   0.01039     0.03374      0.18580

    R² = SSR / SST = 18934.9348 / 32600.5000 = 0.5808

    58.08% of the variation in house prices is explained by variation in square feet.

    Adjusted R Square: used to test whether an additional independent variable improves the model.


    Standard Error of Estimate

    The standard deviation of the variation of observations around the regression line is estimated by

    s = √( SSE / (n - k - 1) )

    where
    SSE = sum of squares error
    n = sample size
    k = number of independent variables in the model
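    Plugging in the SSE from the Excel output above, a one-line Python check (not in the original slides):

```python
import math

sse, n, k = 13665.5652, 10, 1          # residual sum of squares, sample size, predictors
s = math.sqrt(sse / (n - k - 1))
print(round(s, 2))                     # ≈ 41.33, the "Standard Error" in the output
```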


    The Standard Deviation of the Regression Slope

    The standard error of the regression slope coefficient (b₁) is estimated by

    s_b₁ = s / √( Σ(x - x̄)² ) = s / √( Σx² - (Σx)² / n )

    where:
    s_b₁ = estimate of the standard error of the least-squares slope
    s = √( SSE / (n - 2) ) = sample standard error of the estimate
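    Continuing the sketch, the slope's standard error for the house-price data (Python, not in the original slides):

```python
import numpy as np

square_feet = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], dtype=float)
s = 41.33032                                                    # standard error of the estimate
s_b1 = s / np.sqrt(np.sum((square_feet - square_feet.mean()) ** 2))
print(round(s_b1, 5))                                           # ≈ 0.03297, as in the Excel output
```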


    Regression Statistics
    Multiple R          0.76211
    R Square            0.58082
    Adjusted R Square   0.52842
    Standard Error      41.33032
    Observations        10

    ANOVA
                  df   SS           MS           F         Significance F
    Regression     1   18934.9348   18934.9348   11.0848   0.01039
    Residual       8   13665.5652    1708.1957
    Total          9   32600.5000

                   Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
    Intercept      98.24833       58.03348         1.69296   0.1289    -35.57720    232.0738
    Square Feet     0.10977        0.03297         3.32938   0.0103      0.03374      0.18580

    Thus, the standard error of 41.33 means that a typical house-price prediction from this model is off by about $41,330 (prices are measured in $1000s).


    Inference about the Slope: t Test

    t test for a population slope: is there a linear relationship between X and Y?

    Null and alternative hypotheses:
    H0: β₁ = 0 (no linear relationship)
    H1: β₁ ≠ 0 (a linear relationship does exist)

    Test statistic:

    t = (b₁ - β₁) / s_b₁,    d.f. = n - 2

    where:
    b₁ = sample regression slope coefficient
    β₁ = hypothesized slope
    s_b₁ = estimator of the standard error of the slope
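    A small Python/SciPy sketch (not in the original slides) of this test for the house-price slope:

```python
from scipy import stats

b1, s_b1, n = 0.10977, 0.03297, 10          # slope, its standard error, sample size
t_stat = (b1 - 0) / s_b1                    # hypothesized slope under H0 is 0
df = n - 2
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value
print(round(t_stat, 3), round(p_value, 4))  # ≈ 3.329 and ≈ 0.0104
```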


    Inference about the Slope: t Test (continued)

    House Price in $1000s (Y)    Square Feet (X)
           245                        1400
           312                        1600
           279                        1700
           308                        1875
           199                        1100
           219                        1550
           405                        2350
           324                        2450
           319                        1425
           255                        1700

    Estimated regression equation:

    house price = 98.25 + 0.1098 (sq. ft.)

    The slope of this model is 0.1098.

    Does square footage of the house affect its sales price?


    Inferences about the Slope: t Test Example

    H0: β₁ = 0
    HA: β₁ ≠ 0

                   Coefficients   Standard Error   t Stat    P-value
    Intercept      98.24833       58.03348         1.6929    0.1289
    Square Feet     0.10977        0.03297         3.3293    0.0103

    Test statistic: t = b₁ / s_b₁ = 0.10977 / 0.03297 = 3.329

    d.f. = 10 - 2 = 8

    [Figure: t distribution with rejection regions of α/2 = .025 in each tail; do not reject H0 between the critical values -2.3060 and 2.3060. The test statistic 3.329 falls in the upper rejection region.]

    Decision: reject H0.

    Conclusion: there is sufficient evidence that square footage affects house price.


    Regression Analysis for Description

    Confidence interval estimate of the slope:

    b₁ ± t(α/2) · s_b₁,    d.f. = n - 2

    Excel printout for house prices:

                   Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
    Intercept      98.24833       58.03348         1.69296   0.12892   -35.57720    232.07386
    Square Feet     0.10977        0.03297         3.32938   0.01039     0.03374      0.18580

    At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858).
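    The same interval can be recomputed directly; a Python/SciPy sketch (not in the original slides):

```python
from scipy import stats

b1, s_b1, df = 0.10977, 0.03297, 8
t_crit = stats.t.ppf(0.975, df)                 # ≈ 2.306 for 95% confidence
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(round(lower, 4), round(upper, 4))         # ≈ 0.0337 and 0.1858
```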


    Regression Analysis for Description

                   Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
    Intercept      98.24833       58.03348         1.69296   0.12892   -35.57720    232.07386
    Square Feet     0.10977        0.03297         3.32938   0.01039     0.03374      0.18580

    Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size.

    This 95% confidence interval does not include 0.

    Conclusion: there is a significant relationship between house price and square feet at the .05 level of significance.


    Multiple Regression

    An extension of bivariate regression; multidimensional when three or more variables are involved.

    Simultaneously investigates the effect of two or more independent variables on a single dependent variable.
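    A minimal multiple-regression sketch in Python (the second predictor, bedrooms, is made up purely for illustration and is not from the slides):

```python
import numpy as np
import statsmodels.api as sm

square_feet = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
bedrooms    = np.array([3, 3, 3, 4, 2, 3, 4, 5, 3, 3])      # hypothetical values
price       = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])

X = sm.add_constant(np.column_stack([square_feet, bedrooms]))  # two independent variables + intercept
model = sm.OLS(price, X).fit()
print(model.params)                        # intercept and the two partial slope coefficients
print(model.rsquared, model.rsquared_adj)  # adjusted R² shows whether the extra variable helps
```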

