Interpreting Regression Output in Excel


EXCEL 2007: Multiple Regression

A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis

This January 2009 help sheet gives information on

  Multiple regression using the Data Analysis Add-in.
  Interpreting the regression statistics.
  Interpreting the ANOVA table (often this is skipped).
  Interpreting the regression coefficients table.
  Confidence intervals for the slope parameters.
  Testing for statistical significance of coefficients.
  Testing hypothesis on a slope parameter.
  Testing overall significance of the regressors.
  Predicting y given values of regressors.
  Excel limitations.

There is little extra to know beyond regression with one explanatory variable. The main addition is the F-test for overall fit.

MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN

This requires the Data Analysis Add-in: see Excel 2007: Access and Activating the Data Analysis Add-in (http://cameron.econ.ucdavis.edu/excel/ex01access.html).
The data used are in carsdata.xls (http://cameron.econ.ucdavis.edu/excel/carsdata.xls).

We then create a new variable in cells C2:C6, cubed household size, as a regressor. Then in cell C1 give the heading CUBED HH SIZE.
(It turns out that for these data squared HH SIZE has a coefficient of exactly 0.0, so the cube is used.)

The spreadsheet cells A1:C6 should look like:

We have regression with an intercept and the regressors HH SIZE and CUBED HH SIZE.



The population regression model is: y = β1 + β2 x2 + β3 x3 + u
It is assumed that the error u is independent with constant variance (homoskedastic) - see EXCEL LIMITATIONS at the bottom.

We wish to estimate the regression line: y = b1 + b2 x2 + b3 x3

We do this using the Data Analysis Add-in and Regression.

The only change over one-variable regression is to include more than one column in the Input X Range.

Note, however, that the regressors need to be in contiguous columns (here columns B and C).
If this is not the case in the original data, then columns need to be copied to get the regressors in contiguous columns.
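For readers who want to check the Excel estimates outside Excel, the least squares fit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five data points below are not copied from carsdata.xls but are assumed values chosen to be consistent with the output reported in this help sheet, and the names cars, hh_size, solve and ols are hypothetical. The sketch solves the normal equations (X'X) b = X'y directly, which is fine for a tiny example but is not how Excel is documented to compute the fit.

```python
# Assumed data (not taken from carsdata.xls): chosen to be consistent
# with the regression output reported in this help sheet.
cars = [1, 2, 2, 2, 3]               # y
hh_size = [1, 2, 3, 4, 5]            # x2 = HH SIZE
cubed = [x ** 3 for x in hh_size]    # x3 = CUBED HH SIZE

def solve(a, b):
    """Solve the linear system a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def ols(y, *regressors):
    """Least squares coefficients [b1, b2, ...], with b1 the intercept."""
    rows = [[1.0] + [float(v) for v in vals] for vals in zip(*regressors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

b1, b2, b3 = ols(cars, hh_size, cubed)
print(round(b1, 5), round(b2, 5), round(b3, 5))
```

Passing the regressors as separate lists means they do not need to be adjacent, unlike the contiguous-column requirement of the Excel dialog.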



Hitting OK we obtain

The regression output has three components:

  Regression statistics table
  ANOVA table
  Regression coefficients table

INTERPRET REGRESSION STATISTICS TABLE

This is the following output. Of greatest interest is R Square.

                                 Explanation
Multiple R          0.895828     R = square root of R^2
R Square            0.802508     R^2
Adjusted R Square   0.605016     Adjusted R^2, used if more than one x variable
Standard Error      0.444401     This is the sample estimate of the standard deviation of the error u
Observations        5            Number of observations used in the regression (n)

The above gives the overall goodness-of-fit measures:
R^2 = 0.8025
Correlation between y and y-hat is 0.8958 (when squared gives 0.8025).
Adjusted R^2 = R^2 - (1-R^2)*(k-1)/(n-k) = .8025 - .1975*2/2 = 0.6050.

The standard error here refers to the estimated standard deviation of the error term u.
It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)), where SSE is the residual (error) sum of squares.

It is not to be confused with the standard error of y itself (from descriptive statistics) or with the standard errors of the regression coefficients given below.

R^2 = 0.8025 means that 80.25% of the variation of y_i around ybar (its mean) is explained by the regressors x_2i and x_3i.
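The entries of the regression statistics table can be recomputed from two sums of squares. The sketch below takes Total SS = 2.0 and Residual SS = 0.39498 from the ANOVA output reported further down in this help sheet, with n = 5 and k = 3; the variable names are illustrative only.

```python
import math

n, k = 5, 3              # observations; regressors including the intercept
total_ss = 2.0           # Total SS from the ANOVA table
residual_ss = 0.39498    # Residual SS (less rounded value, as used in the F aside)

r2 = 1 - residual_ss / total_ss                 # R Square
multiple_r = math.sqrt(r2)                      # Multiple R
adj_r2 = r2 - (1 - r2) * (k - 1) / (n - k)      # Adjusted R Square
std_error = math.sqrt(residual_ss / (n - k))    # standard error of the regression

print(f"Multiple R        {multiple_r:.4f}")
print(f"R Square          {r2:.4f}")
print(f"Adjusted R Square {adj_r2:.4f}")
print(f"Standard Error    {std_error:.4f}")
```

Because the inputs are rounded, the results agree with the Excel table only to about four decimal places.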

INTERPRET ANOVA TABLE

An ANOVA table is given. This is often skipped.

              df    SS        MS        F         Significance F
Regression    2     1.6050    0.8025    4.0635    0.1975
Residual      2     0.3950    0.1975
Total         4     2.0

The ANOVA (analysis of variance) table splits the sum of squares into its components.

Total sum of squares = Residual (or error) sum of squares + Regression (or explained) sum of squares.

Thus Σ_i (y_i - ybar)^2 = Σ_i (y_i - yhat_i)^2 + Σ_i (yhat_i - ybar)^2
where yhat_i is the value of y_i predicted from the regression line
and ybar is the sample mean of y.
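The decomposition can be checked numerically. The sketch below is an illustration under two assumptions: the five data points are assumed values consistent with the reported output (they are not copied from carsdata.xls), and the coefficients are the rounded estimates from the coefficients table, so the identity holds only up to rounding error.

```python
# Assumed data, consistent with the reported output (not from carsdata.xls).
cars = [1, 2, 2, 2, 3]        # y
hh_size = [1, 2, 3, 4, 5]     # x2; x3 is its cube

# Rounded least squares estimates from the coefficients table.
b1, b2, b3 = 0.89655, 0.33647, 0.00209

ybar = sum(cars) / len(cars)
yhat = [b1 + b2 * x + b3 * x ** 3 for x in hh_size]

total_ss = sum((y - ybar) ** 2 for y in cars)
residual_ss = sum((y - f) ** 2 for y, f in zip(cars, yhat))
regression_ss = sum((f - ybar) ** 2 for f in yhat)

print(f"Total SS      {total_ss:.4f}")
print(f"Residual SS   {residual_ss:.4f}")
print(f"Regression SS {regression_ss:.4f}")
print(f"Residual + Regression = {residual_ss + regression_ss:.4f}")
```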

For example:
R^2 = 1 - Residual SS / Total SS   (general formula for R^2)
    = 1 - 0.3950 / 2.0             (from data in the ANOVA table)
    = 0.8025                       (which equals R^2 given in the regression statistics table).

The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.
Aside: Excel computes this F as:
F = [Regression SS/(k-1)] / [Residual SS/(n-k)] = [1.6050/2] / [.39498/2] = 4.0635.

The column labeled Significance F has the associated p-value.
Since 0.1975 > 0.05, we do not reject H0 at significance level 0.05.

Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors including the intercept.
Here FDIST(4.0635,2,2) = 0.1975.
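The F statistic and its p-value can be reproduced from the ANOVA sums of squares. Significance F is the upper-tail probability of the F distribution at the computed F (what Excel's FDIST returns). The sketch below avoids any statistics library by using a closed form that holds only when the numerator degrees of freedom equal 2, which happens to be the case here (k-1 = 2); the function name is hypothetical.

```python
n, k = 5, 3
regression_ss = 1.6050   # from the ANOVA table
residual_ss = 0.39498    # from the ANOVA table (less rounded value)

# Overall F statistic: ratio of the two mean squares.
f_stat = (regression_ss / (k - 1)) / (residual_ss / (n - k))

def f_upper_tail_df1_is_2(f, d2):
    """Pr{F > f} for an F(2, d2) variable.

    Special case only: this closed form is valid when the numerator
    degrees of freedom are exactly 2.
    """
    return (1 + 2 * f / d2) ** (-d2 / 2)

significance_f = f_upper_tail_df1_is_2(f_stat, n - k)
print(f"F = {f_stat:.4f}, Significance F = {significance_f:.4f}")
```

For other numerator degrees of freedom a general F distribution routine is needed instead of this shortcut.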

INTERPRET REGRESSION COEFFICIENTS TABLE

The regression output of most interest is the following table of coefficients and associated output:

                 Coefficient   St. error   t Stat    P-value   Lower 95%   Upper 95%
Intercept        0.89655       0.76440     1.1729    0.3616    -2.3924     4.1855
HH SIZE          0.33647       0.42270     0.7960    0.5095    -1.4823     2.1552
CUBED HH SIZE    0.00209       0.01311     0.1594    0.8880    -0.0543     0.0585

Let βj denote the population coefficient of the jth regressor (intercept, HH SIZE and CUBED HH SIZE).

Then

Column "Coefficient" gives the least squares estimates of βj.

Column "Standard error" gives the standard errors (i.e. the estimated standard deviation) of the least squares estimates bj of βj.

Column "t Stat" gives the computed t-statistic for H0: βj = 0 against Ha: βj ≠ 0.
This is the coefficient divided by the standard error. It is compared to a t with (n-k) degrees of freedom where here n = 5 and k = 3.

Column "P-value" gives the p-value for test of H0: βj = 0 against Ha: βj ≠ 0.
This equals Pr{|t| > |t-Stat|} where t is a t-distributed random variable with n-k degrees of freedom and t-Stat is the computed value of the t-statistic given in the previous column.
Note that this p-value is for a two-sided test. For a one-sided test divide this p-value by 2 (also checking the sign of the t-Stat).

Columns "Lower 95%" and "Upper 95%" values define a 95% confidence interval for βj.
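The t Stat and P-value columns can be recomputed from the Coefficient and Standard error columns. The sketch below is an illustration, not Excel's own routine: it uses a closed form for the two-sided p-value that is valid only for a t distribution with 2 degrees of freedom (here n-k = 2), and the names table, results and t_two_sided_p_df2 are hypothetical.

```python
import math

def t_two_sided_p_df2(t):
    """Pr{|T| > |t|} for T ~ t(2). Special case: valid only for 2 degrees of freedom."""
    return 1 - abs(t) / math.sqrt(2 + t * t)

# (coefficient, standard error) pairs from the coefficients table.
table = {
    "Intercept": (0.89655, 0.76440),
    "HH SIZE": (0.33647, 0.42270),
    "CUBED HH SIZE": (0.00209, 0.01311),
}

results = {}
for name, (coef, se) in table.items():
    t_stat = coef / se                       # t Stat = coefficient / standard error
    results[name] = (t_stat, t_two_sided_p_df2(t_stat))
    print(f"{name:15s} t = {t_stat:.4f}  p = {results[name][1]:.4f}")
```

With other degrees of freedom the p-value needs a general t distribution function, such as Excel's TDIST.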

A simple summary of the above output is that the fitted line is

y = 0.8966 + 0.3365*x + 0.0021*z
where x is HH SIZE and z is CUBED HH SIZE.

CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS

95% confidence interval for slope coefficient β2 is from Excel output (-1.4823, 2.1552).

Excel computes this as
b2 ± t_.025(2) × se(b2)
= 0.33647 ± TINV(0.05, 2) × 0.42270
= 0.33647 ± 4.303 × 0.42270
= 0.33647 ± 1.8189
= (-1.4823, 2.1552).
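The interval arithmetic can be sketched as follows. The critical value is obtained from a closed form that applies only to a t distribution with 2 degrees of freedom (n-k = 2 here), so this is a special-case illustration rather than a replacement for TINV; the function names are hypothetical.

```python
import math

def t_crit_df2(alpha):
    """Critical value c with Pr{|T| > c} = alpha for T ~ t(2). Valid only for 2 df."""
    q = 1 - alpha
    return q * math.sqrt(2 / (1 - q * q))

def confidence_interval(b, se, alpha=0.05):
    """Return (lower, upper) = b -/+ critical value * standard error."""
    half_width = t_crit_df2(alpha) * se
    return b - half_width, b + half_width

# HH SIZE coefficient and standard error from the coefficients table.
low, high = confidence_interval(0.33647, 0.42270)
low99, high99 = confidence_interval(0.33647, 0.42270, alpha=0.01)

print(f"t_.025(2) = {t_crit_df2(0.05):.3f}")
print(f"95% CI: ({low:.4f}, {high:.4f})")
print(f"99% CI: ({low99:.4f}, {high99:.4f})")
```

The 99% interval is much wider because with only 2 degrees of freedom the critical value rises from about 4.30 to about 9.92.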

Other confidence intervals can be obtained.
For example, to find 99% confidence intervals: in the Regression dialog box (in the Data Analysis Add-in), check the Confidence Level box and set the level to 99%.

TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")

The coefficient of HH SIZE has estimated standard error of 0.4227, t-statistic of 0.7960 and p-value of 0.5095.
It is therefore statistically insignificant at significance level α = .05 as p > 0.05.

The coefficient of CUBED HH SIZE has estimated standard error of 0.0131, t-statistic of 0.1594 and p-value of 0.8880.
It is therefore statistically insignificant at significance level α = .05 as p > 0.05.

There are 5 observations and 3 regressors (intercept, HH SIZE and CUBED HH SIZE) so we use t(5-3)=t(2).
For example, for HH SIZE p = TDIST(0.796,2,2) = 0.5095.

TEST HYPOTHESIS ON A REGRESSION PARAMETER

Here we test whether HH SIZE has coefficient β2 = 1.0.

Example: H0: β2 = 1.0 against Ha: β2 ≠ 1.0 at significance level α = .05.

Then
t = (b2 - H0 value of β2) / (standard error of b2)
  = (0.33647 - 1.0) / 0.42270
  = -1.570.

Using the p-value approach

p-value = TDIST(1.570, 2, 2) = 0.257. [Here n=5 and k=3 so n-k=2].
Do not reject the null hypothesis at level .05 since the p-value is > 0.05.

Using the critical value approach

We computed t = -1.570.
The critical value is t_.025(2) = TINV(0.05,2) = 4.303. [Here n=5 and k=3 so n-k=2].
So do not reject null hypothesis at level .05 since |t| = |-1.570| = 1.570 < 4.303.
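Both approaches can be checked with a short sketch. As an assumption for illustration, the p-value and the critical value use closed forms that hold only for a t distribution with 2 degrees of freedom (n-k = 2 here); the variable names are hypothetical.

```python
import math

b2, se_b2 = 0.33647, 0.42270   # HH SIZE estimate and standard error
h0_value = 1.0                 # hypothesized value of the slope
alpha = 0.05

t_stat = (b2 - h0_value) / se_b2

# p-value approach: two-sided p-value, closed form valid for t(2) only.
p_value = 1 - abs(t_stat) / math.sqrt(2 + t_stat ** 2)
reject_by_p = p_value < alpha

# Critical value approach: t_.025(2), closed form valid for t(2) only.
q = 1 - alpha
t_crit = q * math.sqrt(2 / (1 - q * q))
reject_by_crit = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}, critical value = {t_crit:.3f}")
print("reject H0" if reject_by_p else "do not reject H0")
```

The two approaches always give the same decision at a given significance level; computing both is only a consistency check.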

OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS

We test H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.

From the ANOVA table the F-test statistic is 4.0635 with p-value of 0.1975.
Since the p-value is not less than 0.05 we do not reject the null hypothesis that the regression parameters are zero at significance level 0.05.
Conclude that the parameters are jointly statistically insignificant at significance level 0.05.

Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors including the intercept.
Here FDIST(4.0635,2,2) = 0.1975.

PREDICTED VALUE OF Y GIVEN REGRESSORS

Consider case where x = 4 in which case CUBED HH SIZE = x^3 = 4^3 = 64.

yhat = b1 + b2 x2 + b3 x3 = 0.89655 + 0.33647*4 + 0.00209*64 = 2.3762
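The prediction can be wrapped in a small helper. This is a sketch only: the function name predict_cars is hypothetical, and the default coefficients are the estimates exactly as printed in the coefficients table (intercept 0.89655, HH SIZE 0.33647, CUBED HH SIZE 0.00209), which give a predicted value of about 2.376 at x = 4.

```python
def predict_cars(hh_size, b1=0.89655, b2=0.33647, b3=0.00209):
    """Predicted y for a given HH SIZE; the cubed regressor is derived from it."""
    return b1 + b2 * hh_size + b3 * hh_size ** 3

yhat = predict_cars(4)
print(f"yhat at HH SIZE = 4: {yhat:.4f}")
```

Deriving the cube inside the function keeps the two regressors consistent, so a caller cannot pass a CUBED HH SIZE that does not match HH SIZE.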

EXCEL LIMITATIONS

Excel restricts the number of regressors (apparently only up to 16 regressors).

Excel requires that all the regressor variables be in adjoining columns.
You may need to move columns to ensure this.
e.g. If the regressors are in columns B and D you need to copy at least one of columns B and D so that they are adjacent to each other.

Excel standard errors and t-statistics and p-values are based on the assumption that the error is independent with constant variance (homoskedastic).
Excel does not provide alternatives, such as heteroskedastic-robust or autocorrelation-robust standard errors and t-statistics and p-values.
More specialized software such as STATA, EVIEWS, SAS, LIMDEP, PC-TSP, ... is needed.

For further information on how to use Excel go to

http://cameron.econ.ucdavis.edu/excel/excel.html

