Regression Continued: Functional Form LIR 832. Topics for the Evening 1. Qualitative Variables 2....

Date post: 30-Dec-2015
Upload: willis-mccormick
Regression Continued: Functional Form LIR 832
Transcript
Page 1: Regression Continued: Functional Form LIR 832. Topics for the Evening 1. Qualitative Variables 2. Non-linear Estimation.

Regression Continued: Functional Form

LIR 832


Topics for the Evening

1. Qualitative Variables

2. Non-linear Estimation


Functional Form

Not all relations among variables are linear.

Our basic linear model:

y = β0 + β1*X1 + β2*X2 + … + βk*Xk + ε


Functional Form

Q: Given that we are using OLS, can we mimic these non-linear forms?

A: We have a small bag of tricks which we can use with OLS.


Functional Form

A first point about functional form: you must have an intercept.

Consider the following case: we estimate a model and test the intercept to determine whether it is significantly different from zero. We are not able to reject the null hypothesis, so we decide to re-estimate the model without an intercept. What is really going on?

Return to our basic model:

y = β0 + β1*X1 + β2*X2 + … + βk*Xk + ε

What are we doing when we remove the intercept? We are imposing β0 = 0:

y = 0 + β1*X1 + β2*X2 + … + βk*Xk + ε


Functional Form

/* Regression without an intercept */

Regression Analysis: weekearn versus years ed

The regression equation is
weekearn = 57.3 years ed

47576 cases used, 7582 cases contain missing values

Predictor      Coef  SE Coef       T      P
Noconstant
years ed    57.3005   0.1541  371.96  0.000

S = 534.450


Functional Form

/* Regression with an intercept */

Regression Analysis: weekearn versus years ed

The regression equation is
weekearn = - 485 + 87.5 years ed

47576 cases used, 7582 cases contain missing values

Predictor     Coef  SE Coef       T      P
Constant   -484.57    18.18  -26.65  0.000
years ed    87.492    1.143   76.54  0.000

S = 530.510   R-Sq = 11.0%   R-Sq(adj) = 11.0%


Functional Form

Consequences of forcing through zero:

Unless the intercept is really zero, we are going to bias both the intercept and the slope coefficients.

Remember that we calculate the intercept so that the line passes through the point of means. This assures that Σε = 0.

If we impose 0 as the intercept, the line may not pass through the point of means and the sum of the errors may not equal zero. This biases the coefficients and leads to incorrect estimates of the standard errors of the βs.

Never suppress the intercept, even if your theory suggests that it is not necessary.
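
The point about the sum of the errors can be checked on made-up data. The sketch below is only an illustration: the data are synthetic (loosely patterned on the earnings regression, with a true intercept far from zero), and the helper names `fit_with_intercept` and `fit_through_origin` are hypothetical.

```python
import random

def fit_with_intercept(x, y):
    """Simple OLS with an intercept: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_through_origin(x, y):
    """OLS with the intercept forced to zero: returns the slope."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

random.seed(0)
# Synthetic data (an assumption, not the course data set).
x = [random.uniform(8, 21) for _ in range(500)]
y = [-485 + 87.5 * xi + random.gauss(0, 100) for xi in x]

b0, b1 = fit_with_intercept(x, y)
b1_origin = fit_through_origin(x, y)

# Mean residual under each specification.
mean_resid_intercept = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y)) / len(x)
mean_resid_origin = sum(yi - b1_origin * xi for xi, yi in zip(x, y)) / len(x)

print("slope with intercept:   ", round(b1, 2))
print("slope through origin:   ", round(b1_origin, 2))
print("mean residual (with):   ", round(mean_resid_intercept, 6))
print("mean residual (without):", round(mean_resid_origin, 2))
```

With an intercept the residuals average to zero up to rounding error; forcing the line through the origin leaves a non-zero mean residual and pulls the slope away from the value used to generate the data.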


Functional Form

/* What About Those Residuals? */

Descriptive Statistics: RESI1, RESI2

Variable      N    N*   Mean  SE Mean   StDev   Minimum       Q1   Median      Q3  Maximum
RESI1     47576  7582  -8.67     2.45  534.38  -1180.31  -359.12  -122.21  218.59  2311.61
RESI2     47576  7582   0.00     2.43  530.50  -1329.77  -340.32  -107.62  237.69  2494.26


Functional Form

Returning to the issue of non-linearity…

In our basic model:

β = ΔY/ΔX = change in Y for a one-unit change in X

Consider the effect of education on base salary…


Functional Form

Descriptive Statistics: years ed, Exp

Variable      N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
years ed  55158   0  15.734  0.00941   2.211    1.000  14.000  16.000  18.000   21.000
Exp       55107  51  21.644   0.0496  11.640   0.0000  13.000  22.000  30.000   76.000

Regression Analysis: weekearn versus years ed

The regression equation is
weekearn = - 485 + 87.5 years ed

47576 cases used, 7582 cases contain missing values

Predictor     Coef  SE Coef       T      P
Constant   -484.57    18.18  -26.65  0.000
years ed    87.492    1.143   76.54  0.000

S = 530.510   R-Sq = 11.0%   R-Sq(adj) = 11.0%


Functional Form

Now create a graph in MINITAB:

1. Work in a new worksheet.
2. Create values for years of education, 0 - 21.
3. Use the calculator to create the predicted weekly earnings.
4. Use the scatterplot graphing function.
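
The same worksheet can be sketched outside MINITAB. The Python below is an illustrative stand-in, not the course procedure; it uses the coefficients from the regression output above, and the function name `predicted_weekearn` is hypothetical.

```python
# Coefficients copied from the MINITAB output (weekearn versus years ed).
INTERCEPT = -484.57
SLOPE = 87.492

def predicted_weekearn(years_ed):
    """Predicted weekly earnings from the linear model."""
    return INTERCEPT + SLOPE * years_ed

# Years of education 0 through 21, as in the worksheet steps.
predictions = [(ed, predicted_weekearn(ed)) for ed in range(0, 22)]

for ed, earn in predictions:
    print(f"{ed:2d}  {earn:9.2f}")
```

One thing the table makes visible: the fitted line predicts negative weekly earnings for fewer than about six years of education, a reminder that the straight line is only an approximation over the range of the data.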


Functional Form

Every year of education increases earnings by $87.49!


Functional Form

Q: How do we estimate non-linear relations?

A: We can use log transforms of variables to measure relations between variables as percentages rather than units.

What is a log? What is a log transform?

Take any number, say 10. Then calculate b such that 10 = 2.71828^b. Then b is the log of 10. In this case b = 2.302585.

You can do this on your calculator, in a spreadsheet, or in MINITAB.


Functional Form

As your text shows:

ln(100) = 4.605: 100 = 2.71828^b with b = 4.605
ln(1000) = 6.908: 1000 = 2.71828^b with b = 6.908
ln(10,000) = 9.210: 10,000 = 2.71828^b with b = 9.210
ln(1,000,000) = 13.816: 1,000,000 = 2.71828^b with b = 13.816

We typically do not write 2.71828; rather, we substitute e, the natural base (there are also base 10 logs). So… 10 = e^2.302585

Some nice properties of log functions:

ln(X*Y) = ln(X) + ln(Y)
ln(X^2) = 2*ln(X)


Functional Form

This property made it possible to manipulate very large numbers very easily, and it provides the foundation for slide rules and many modern computer calculations.

Consider: 1,212,345*375,282. A real mess to do by hand.

Now consider the following transformation of this problem:

ln(1,212,345*375,282)
= ln(1,212,345) + ln(375,282)
= 14.008067 + 12.83543
= 26.8435

Taking the antilog: 2.71828^26.8435 = antilog(26.8435) ≈ 4.55 × 10^11, which agrees with the exact product, 454,971,256,290, up to the rounding in the logs.
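
A quick check of the log trick, using only Python's standard `math` module (the variable names are illustrative):

```python
import math

a, b = 1_212_345, 375_282

# Add the logs, then take the antilog (exponentiate).
log_sum = math.log(a) + math.log(b)
product_via_logs = math.exp(log_sum)

# Direct integer multiplication for comparison.
exact_product = a * b
rel_err = abs(product_via_logs - exact_product) / exact_product

print("sum of logs:     ", round(log_sum, 4))
print("antilog of sum:  ", product_via_logs)
print("exact product:   ", exact_product)
print("relative error:  ", rel_err)
```

The antilog of the summed logs reproduces the product to within floating-point error; with logs rounded to a few decimals, as on a slide rule, the answer is only approximate.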


Functional Form

The Shell presentation has an equation associated with an upward curve of:

Earnings = 62988*x^0.2676

Or…

y = β0*X^β1

We cannot estimate this in its current form using regression, but think about taking the log of each side:

ln(y) = ln(β0*X^β1)
ln(y) = ln(β0) + ln(X^β1)
ln(y) = ln(β0) + β1*ln(X)

So, if we take the log of each side, we get a linear equation that we can estimate!
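
As a rough illustration of why the transformation works, the sketch below generates noise-free points from Earnings = 62988*x^0.2676, regresses ln(y) on ln(x), and recovers both parameters. The x values and the helper `fit_line` are assumptions made for the example.

```python
import math

def fit_line(x, y):
    """Simple OLS with an intercept: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical x values; y follows the curve exactly (no error term).
xs = [1, 2, 5, 10, 20, 40]
ys = [62988 * x ** 0.2676 for x in xs]

# Regress ln(y) on ln(x): the intercept is ln(β0), the slope is β1.
ln_b0, b1 = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
b0 = math.exp(ln_b0)

print("recovered β0:", round(b0, 2))
print("recovered β1:", round(b1, 4))
```

With real data there is an error term, so the recovered values are estimates rather than exact.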


Functional Form

Consider the following equation (single log equation):

ln(weekearn) = β0 + β1*YearsEd + ε

The interpretation of the coefficient on years of education, multiplied by 100, is now the approximate % change in base salary for a 1 year change in education.

How to do this in MINITAB:

1. Calculate the log of weekly earnings.
2. Estimate the regression as…


Functional Form

Regression Analysis: ln week earn versus years ed

The regression equation is
ln week earn = 4.87 + 0.109 years ed

47576 cases used, 7582 cases contain missing values

Predictor      Coef   SE Coef       T      P
Constant    4.86646   0.02382  204.33  0.000
years ed   0.108980  0.001497   72.78  0.000

S = 0.694967   R-Sq = 10.0%   R-Sq(adj) = 10.0%

Analysis of Variance

Source             DF       SS      MS        F      P
Regression          1   2558.4  2558.4  5297.03  0.000
Residual Error  47574  22977.3     0.5
Total           47575  25535.6


Functional Form

Now we find that an additional year of education results in about a 10.9% increase in salary (the coefficient is 0.108980).

The interpretation is different from the linear model.

R-squared is different between the linear and log models:

Linear: R-squared = 11.0%
Log: R-squared = 10.0%

Does this mean the fit of the log model is worse than the linear model?

No. We cannot compare the two because we have transformed the dependent variable, which fundamentally alters its variance.
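
A note on the percentage reading of the coefficient: 100*β1 is an approximation that works well for small coefficients. The exact percent change implied by a one-unit change in X is 100*(e^β1 - 1). A minimal sketch using the coefficient from the output above (variable names are illustrative):

```python
import math

# Coefficient on years ed from the single log regression.
b1 = 0.108980

approx_pct = 100 * b1                    # the usual quick reading
exact_pct = 100 * (math.exp(b1) - 1)     # exact change implied by the model

print("approximate % change:", round(approx_pct, 2))
print("exact % change:      ", round(exact_pct, 2))
```

The two readings are close here; the gap widens as the coefficient grows.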


Functional Form

Descriptive Statistics: weekearn, ln week earn

Variable          N    N*    Mean  SE Mean   StDev  Minimum      Q1  Median       Q3  Maximum
weekearn      47576  7582  894.53     2.58  562.22     0.01  519.00  769.23  1153.00  2884.61
ln week earn  47576  7582  6.5843  0.00336  0.7326  -4.6052  6.2519  6.6454   7.0501    7.967

What Does the Log Model Look Like?

How to create a prediction in MINITAB & graph:

1. Use the regression equation to create the estimated log wage from the years of education data.
2. Exponentiate the predicted value using the MINITAB calculator.
3. Graph predicted wage against years of education.
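
The prediction steps can be sketched in Python as a stand-in for the MINITAB calculator. The coefficients come from the single log regression output; the function name `predicted_weekearn` is hypothetical. One caveat worth stating: exponentiating a predicted log wage gives a typical (median-like) wage rather than the mean wage, so treat the numbers as a picture of the curve's shape.

```python
import math

# Coefficients copied from the ln week earn regression output.
B0 = 4.86646
B1 = 0.108980

def predicted_weekearn(years_ed):
    """Exponentiate the predicted log wage to get back to dollars."""
    return math.exp(B0 + B1 * years_ed)

predictions = [(ed, predicted_weekearn(ed)) for ed in range(0, 22)]

for ed, earn in predictions:
    print(f"{ed:2d}  {earn:9.2f}")
```

Each additional year multiplies the prediction by the same factor, so the curve bends upward instead of rising by a fixed dollar amount.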


Functional Form

What is the equation underlying this model?

Model of growth (such as compound interest)…
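
The slide's own equation did not survive in this transcript. A sketch of the standard derivation, assuming the single log model estimated earlier, is:

```latex
\ln(y) = \beta_0 + \beta_1 X + \varepsilon
\quad\Longrightarrow\quad
y = e^{\beta_0}\, e^{\beta_1 X}\, e^{\varepsilon}
  = A\,(1 + r)^{X}\, e^{\varepsilon},
\qquad A = e^{\beta_0},\quad r = e^{\beta_1} - 1 .
```

Here A plays the role of the starting amount and r the growth rate per unit of X, as in compound interest. With the estimates above (β0 = 4.86646, β1 = 0.108980), A is roughly 130 and r is roughly 0.115.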


Functional Form

Now let's try another approach, taking the log of both sides (double log equation):

The interpretation of the coefficient on JEP is now the % change in base salary for a 1 % change in JEP.

Note that this is an elasticity (which you will discuss in 809 in talking about supply and demand – the elasticity of labor demand with respect to the wage is the % change in the demand for labor for a 1% change in the wage).


Functional Form

Regression Analysis: ln week earn versus ln ed

The regression equation is

ln week earn = 2.13 + 1.62 ln ed

47576 cases used, 7582 cases contain missing values

Predictor Coef SE Coef T P

Constant 2.12844 0.06203 34.32 0.000

ln ed 1.62142 0.02254 71.93 0.000

S = 0.695775 R-Sq = 9.8% R-Sq(adj) = 9.8%
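
To see the elasticity reading in numbers, the sketch below raises years of education by 1% and reports the implied % change in predicted earnings. It uses the coefficients from the double log output above; the starting value of 16 years and the function name are assumptions for illustration.

```python
import math

# Coefficients copied from the ln week earn versus ln ed output.
B0 = 2.12844
B1 = 1.62142

def predicted_weekearn(years_ed):
    """Predicted weekly earnings from the double log model."""
    return math.exp(B0 + B1 * math.log(years_ed))

base_ed = 16  # hypothetical starting point
pct_change = 100 * (predicted_weekearn(base_ed * 1.01) / predicted_weekearn(base_ed) - 1)

print("% change in earnings for a 1% change in education:", round(pct_change, 3))
```

The result is close to the coefficient itself, and it is the same whatever starting value is chosen, which is what a constant elasticity means.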


Functional Form

What is going on graphically? What are we really doing?


Functional Form

Q: How do we choose?

A: Prior work and theory.

Is it sensible to measure as a linear model, or does one of these non-linear forms make better sense?

Example: Thinking of the relationship between education and wages:

wage = β0 + β1*Years_of_Education

ln(wage) = β0 + β1*Years_of_Education

ln(wage) = β0 + β1*ln(Years_of_Education)


Functional Form

What does prior work indicate?

We typically use a log wage equation rather than a wage equation because…

It turns out the error term is approximately normally distributed in a log wage equation.
Results are more readily compared across models, as the coefficient is not dependent on the scaling of the variable.
Comparing the effect of education in percentage terms frees us from the effect of inflation and alternative currencies.


Functional Form

A more general non-linear form (The Polynomial Form)

Problem: Do we really believe that you get an additional 0.723% in weekly earnings for each year you get older? Hardly makes it worth getting older.


Functional Form

Regression Analysis: ln(wkern) versus age, gender, edattain

The regression equation is

ln(wkern) = 2.41 + 0.00723 age - 0.368 gender + 0.105 edattain

47576 cases used 7582 cases contain missing values

Predictor Coef SE Coef T P

Constant 2.41075 0.06470 37.26 0.000

age 0.0072344 0.0002669 27.11 0.000

gender -0.368278 0.006115 -60.22 0.000

edattain 0.105032 0.001491 70.45 0.000

S = 0.6626 R-Sq = 18.2% R-Sq(adj) = 18.2%

This model remains linear in ln(weekly earnings): each one-year increase in age raises predicted earnings by about 0.7%.


Functional Form

It would be more reasonable to believe we will get a relationship which looks like: Why?


Functional Form

How do we mimic this? Consider estimating the following linear regression:

ln(wkern) = β0 + β1*age + β2*age^2 + β3*gender + β4*edattain + ε

Notice that age enters twice, first as a linear term and then as a square.

What does this model look like with real data?


Functional Form

Regression Analysis: ln(wkern) versus age, age2, gender, edattain

The regression equation is

ln(wkern) = 0.927 + 0.104 age - 0.00113 age2 - 0.376 gender + 0.0948 edattain

47576 cases used 7582 cases contain missing values

Predictor Coef SE Coef T P

Constant 0.92706 0.06640 13.96 0.000

age 0.103919 0.001547 67.17 0.000

age2 -0.00112565 0.00001776 -63.37 0.000

gender -0.376012 0.005874 -64.01 0.000

edattain 0.094822 0.001441 65.82 0.000

S = 0.6363 R-Sq = 24.6% R-Sq(adj) = 24.6%


Functional Form

Note that we now have two coefficients on Age:

Age    0.103919
Age2  -0.00112565

We know that the first term indicates that for each additional year our weekly earnings rise by 10.39%. But how do we chart out the second term, so that we have the full effect of age on earnings?


Functional Form

The effect of an additional year on earnings (formula for a polynomial model):

If our model is: y = β0 + β1*X + β2*X^2 + …

Then ΔY/ΔX = β1 + 2*β2*X

First issue: look at the prediction of ln weekly earnings based on age (leave all other variables at their mean).


Functional Form

What about the ‘marginal effect’ of age? What is the effect on income of getting an additional year older?

Obviously it varies with how old you are. Things are pretty good when you are young.

Two ways of obtaining this:

1. Calculate the difference in the total effect of age for any two years.

Age 22   1.741
Age 21   1.686
Diff     0.055 or +5.5%


Functional Form

2. Alternatively, use the polynomial formula:


Functional Form

What is the increase in earnings at age 21? 0.103919 - 0.0022513*21 = 0.056642

What about age 25? 0.103919 - 0.0022513*25 = 0.0476365

What about age 50? (Class work)

Note that the effect of an additional year of age is no longer constant; it depends on how old you are.
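
Both ways of getting the marginal effect can be sketched in a few lines. The coefficients are the age and age2 estimates from the regression output; the function names are hypothetical.

```python
# Coefficients on age and age2 from the polynomial log wage regression.
B_AGE = 0.103919
B_AGE2 = -0.00112565

def age_effect(age):
    """Total contribution of age to predicted ln weekly earnings."""
    return B_AGE * age + B_AGE2 * age ** 2

def marginal_effect(age):
    """Polynomial formula: β1 + 2*β2*X."""
    return B_AGE + 2 * B_AGE2 * age

# Method 1: difference in the total effect between two adjacent ages.
diff_21_22 = age_effect(22) - age_effect(21)

# Method 2: the derivative formula at a given age.
for age in (21, 25):
    print(age, round(marginal_effect(age), 6))
print("difference, age 21 to 22:", round(diff_21_22, 4))

# Age at which the marginal effect reaches zero.
turning_point = -B_AGE / (2 * B_AGE2)
print("marginal effect is zero near age", round(turning_point, 1))
```

The two methods give slightly different numbers because the first measures the change over a whole year while the second is the slope at a single point. Calling `marginal_effect(50)` answers the class work question.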


Functional Form

The gains to aging are greatest when you are youngest. They decline steadily as you age. By age fifty your earnings are falling as you get older (oops!).

A couple of points about polynomial and functional forms:

Polynomial forms have the strength of letting the data tell you if the relationship is linear or not. If it is, the coefficient on X^2 will be 0 or very close to it.

You cannot compare R-squared across log and non-log forms because the log transform changes the dependent variable and the sum of squares. You can compare it between linear and non-linear (polynomial) forms that share the same dependent variable.


Recap on Functional Form

Not all relationships are linear.

Regression allows us to estimate non-linear models and to let the data tell us whether we should be using a non-linear form:

Single and double log transforms
Polynomial form


MultiCollinearity

Issue: What happens when two variables contain the same, or almost the same, information? This condition is called multicollinearity.


Perfect MultiCollinearity Is Not a Problem

Try putting both a Male and Female dummy variable in a wage equation


Base Regression: Earnings=F(age, Education)

Regression Analysis: weekearn versus years ed, age

The regression equation is
weekearn = - 707 + 83.5 years ed + 6.87 age

Predictor     Coef  SE Coef       T      P
Constant   -706.63    19.24  -36.73  0.000
years ed    83.463    1.137   73.38  0.000
age         6.8717   0.2118   32.45  0.000

S = 524.739 R-Sq = 12.9% R-Sq(adj) = 12.9%


Now Put Male & Female Into Model

Regression Analysis: weekearn versus years ed, age, Male, Female

* Female is highly correlated with other X variables

* Female has been removed from the equation.


The Regression

The regression equation is
weekearn = - 720 + 76.4 years ed + 6.29 age + 319 Male

Predictor     Coef  SE Coef       T      P
Constant   -720.28    18.35  -39.25  0.000
years ed    76.432    1.089   70.16  0.000
age         6.2874   0.2021   31.11  0.000
Male       318.522    4.625   68.87  0.000

S = 500.391 R-Sq = 20.8% R-Sq(adj) = 20.8%


Male & Female Contain the Same Information

Correlations: Male, Female

Pearson correlation of Male and Female = -1.000

P-Value = *
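
Why the software drops one dummy can be seen directly: Male and Female always add up to the constant column, so one of the three carries no new information. A small sketch with a made-up sample (the data and the helper `pearson` are assumptions for illustration):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

male = [1, 0, 0, 1, 1, 0, 1, 0]       # hypothetical sample of 8 people
female = [1 - m for m in male]        # Female is just 1 - Male
constant = [1] * len(male)            # the intercept column

# Male + Female reproduces the constant in every row.
sums_to_constant = all(m + f == c for m, f, c in zip(male, female, constant))
corr = pearson(male, female)

print("Male + Female equals the constant:", sums_to_constant)
print("correlation of Male and Female:   ", corr)
```

With an intercept in the model, only one of the two dummies can be estimated, which is what the MINITAB message reports.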


What If Several Variables Contain the Same Information

Regression Analysis: weekearn versus age, years ed, Female, NE, MW, S, W

* W is highly correlated with other X variables
* W has been removed from the equation.

The regression equation is
weekearn = - 392 + 6.25 age + 75.9 years ed - 318 Female + 47.7 NE - 18.2 MW - 20.3 S

47576 cases used, 7582 cases contain missing values

Predictor      Coef  SE Coef       T      P
Constant    -392.10    19.21  -20.42  0.000
age          6.2532   0.2019   30.98  0.000
years ed     75.895    1.089   69.67  0.000
Female     -318.406    4.619  -68.93  0.000
NE           47.658    6.768    7.04  0.000
MW          -18.155    6.594   -2.75  0.006
S           -20.323    6.317   -3.22  0.001

S = 499.701 R-Sq = 21.0% R-Sq(adj) = 21.0%


What Are the Regional Dummies Correlated With?

Descriptive Statistics: NE, MW, S, W

Variable      N  N*     Mean  SE Mean    StDev  Minimum       Q1  Median
NE        55158   0  0.22310  0.00177  0.41633  0.00000  0.00000
MW        55158   0  0.23873  0.00182  0.42631  0.00000  0.00000
S         55158   0  0.29211  0.00194  0.45474  0.00000  0.00000
W         55158   0  0.24606  0.00183  0.43072  0.00000  0.00000


Imperfect MultiCollinearity

Two or more variables contain similar but not identical information


Log Wage Regression

      Source |       SS         df       MS            Number of obs =  156130
-------------+------------------------------           F( 11,156118) = 4227.42
       Model |  11630.4798      11  1057.31635         Prob > F      =  0.0000
    Residual |  39046.5066  156118  .250108934         R-squared     =  0.2295
-------------+------------------------------           Adj R-squared =  0.2294
       Total |  50676.9864  156129  .324584071         Root MSE      =  .50011

------------------------------------------------------------------------------
     lnwage3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0712402   .0005528   128.87   0.000     .0701567    .0723237
        age2 |  -.0007535   6.58e-06  -114.54   0.000    -.0007664   -.0007406
      female |  -.1999096   .0025452   -78.54   0.000    -.2048982   -.1949211
     married |   .0947973   .0028481    33.28   0.000      .089215    .1003796
       black |  -.1314511   .0043814   -30.00   0.000    -.1400385   -.1228637
       other |  -.0063689   .0057833    -1.10   0.271    -.0177041    .0049663
          NE |   .0328108   .0038223     8.58   0.000     .0253191    .0403024
     Midwest |    .007487   .0036482     2.05   0.040     .0003367    .0146373
       South |  -.0204817   .0035696    -5.74   0.000     -.027478   -.0134854
    city1mil |   .1440377   .0026054    55.28   0.000     .1389312    .1491443
      union2 |   .1358151   .0037783    35.95   0.000     .1284097    .1432205
       _cons |   .9784856   .0107005    91.44   0.000     .9575129     .999458


Switch CBC for Union

      Source |       SS         df       MS            Number of obs =  156130
-------------+------------------------------           F( 11,156118) = 4242.43
       Model |  11662.2696      11  1060.20633         Prob > F      =  0.0000
    Residual |  39014.7168  156118  .249905307         R-squared     =  0.2301
-------------+------------------------------           Adj R-squared =  0.2301
       Total |  50676.9864  156129  .324584071         Root MSE      =  .49991

------------------------------------------------------------------------------
     lnwage3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0710808   .0005528   128.59   0.000     .0699974    .0721642
        age2 |   -.000752   6.58e-06  -114.34   0.000    -.0007649   -.0007391
      female |  -.2003086   .0025431   -78.77   0.000     -.205293   -.1953242
     married |   .0946468    .002847    33.24   0.000     .0890668    .1002269
       black |  -.1321203   .0043799   -30.17   0.000    -.1407048   -.1235358
       other |  -.0061873    .005781    -1.07   0.284    -.0175179    .0051434
          NE |    .033546   .0038197     8.78   0.000     .0260595    .0410324
     Midwest |   .0079032   .0036465     2.17   0.030      .000756    .0150503
       South |  -.0200437    .003568    -5.62   0.000    -.0270369   -.0130504
    city1mil |   .1442921   .0026043    55.41   0.000     .1391878    .1493965
        cbc2 |   .1363582   .0036181    37.69   0.000     .1292668    .1434495
       _cons |   .9799436   .0106968    91.61   0.000     .9589782    1.000909
------------------------------------------------------------------------------


Use Union & CBC

      Source |       SS         df       MS            Number of obs =  156130
-------------+------------------------------           F( 12,156117) = 3889.14
       Model |  11662.8996      12  971.908303         Prob > F      =  0.0000
    Residual |  39014.0867  156117  .249902872         R-squared     =  0.2301
-------------+------------------------------           Adj R-squared =  0.2301
       Total |  50676.9864  156129  .324584071         Root MSE      =   .4999

------------------------------------------------------------------------------
     lnwage3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0710741   .0005528   128.58   0.000     .0699907    .0721575
        age2 |  -.0007519   6.58e-06  -114.32   0.000    -.0007648    -.000739
      female |  -.2001837   .0025443   -78.68   0.000    -.2051704   -.1951969
     married |   .0946413    .002847    33.24   0.000     .0890612    .1002213
       black |  -.1321795     .00438   -30.18   0.000    -.1407643   -.1235947
       other |  -.0061938    .005781    -1.07   0.284    -.0175244    .0051367
          NE |   .0333811   .0038211     8.74   0.000     .0258919    .0408703
     Midwest |   .0078341   .0036468     2.15   0.032     .0006864    .0149817
       South |  -.0199589   .0035684    -5.59   0.000    -.0269529   -.0129649
    city1mil |   .1442482   .0026044    55.39   0.000     .1391436    .1493528
      union2 |   .0175444   .0110493     1.59   0.112    -.0041121    .0392008
        cbc2 |   .1205632   .0105851    11.39   0.000     .0998166    .1413098
       _cons |   .9800641    .010697    91.62   0.000     .9590982     1.00103


Consequences of MultiCollinearity

Estimates remain unbiased
Variances and standard errors increase
Computed t-scores fall
Estimates will be very sensitive to specification
Overall fit of the model (r-square) will be unaffected
Predictions are also unaffected
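
The rise in standard errors can be reproduced on synthetic data. In the sketch below, `x2` is built to carry almost the same information as `x1` (loosely mimicking union and CBC in the wage regressions). The data, the coefficients used to generate them, and the helper names are all assumptions for illustration.

```python
import math
import random

def se_slope_single(x, y):
    """Standard error of the slope when y is regressed on x alone (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    ssr = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return math.sqrt(ssr / (n - 2) / sxx)

def se_first_slope_pair(x1, x2, y):
    """Standard error of the x1 coefficient when y is regressed on x1 and x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    ssr = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(d1, d2, dy))
    # Var(b1) = sigma^2 * s22 / det for the two-regressor case.
    return math.sqrt(ssr / (n - 3) * s22 / det)

random.seed(2)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.1) for a in x1]          # nearly a copy of x1
y = [1 + 0.5 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

se_alone = se_slope_single(x1, y)
se_joint = se_first_slope_pair(x1, x2, y)

print("SE of x1 coefficient, x1 only:  ", round(se_alone, 4))
print("SE of x1 coefficient, x1 and x2:", round(se_joint, 4))
```

Adding the near-duplicate regressor inflates the standard error on `x1` several times over, the same pattern as the union coefficient when cbc2 enters the wage regression.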


What Is the Issue

Where there is MultiCollinearity, we need to be careful about interpreting results.
Results can be misleading about the effect of variables.


Detecting Collinearity

High correlation between variables
Issue: multiple variables are collectively collinear (region example)

Variance Inflation Factor:

1. Regress each explanatory variable on all other explanatory variables
2. Calculate

VIF_i = 1 / (1 - R_i^2)

where R_i^2 is the R-squared from the regression of explanatory variable i on the others.
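
The two steps can be sketched in plain Python. This is a minimal illustration on synthetic data, not a replacement for the VIF option in a statistics package; the helper names (`solve`, `r_squared`, `vif`) and the data are assumptions.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(y, columns):
    """R-squared from regressing y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def vif(columns, i):
    """Step 1: regress column i on the others. Step 2: 1 / (1 - R^2)."""
    others = columns[:i] + columns[i + 1:]
    return 1 / (1 - r_squared(columns[i], others))

random.seed(1)
n = 1000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.3) for a in x1]   # strongly related to x1
x3 = [random.gauss(0, 1) for _ in range(n)]   # unrelated to the others

vifs = [vif([x1, x2, x3], i) for i in range(3)]
for name, value in zip(("x1", "x2", "x3"), vifs):
    print(name, round(value, 2))
```

The two related variables get large VIFs while the unrelated one stays close to 1. A perfectly collinear set (such as all four region dummies plus an intercept) would make the denominator zero, so the VIF is undefined in that case.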


How Do We Calculate the VIF?

Regression Analysis: age versus years ed, Female, NE, MW, S, W

* W is highly correlated with other X variables
* W has been removed from the equation.

The regression equation is
age = 35.8 + 0.480 years ed - 1.59 Female + 0.098 NE - 0.617 MW - 0.204 S

Predictor      Coef  SE Coef       T      P
Constant    35.7977   0.3712   96.43  0.000
years ed    0.47978  0.02241   21.41  0.000
Female     -1.59360  0.09896  -16.10  0.000
NE           0.0979   0.1443    0.68  0.498
MW          -0.6174   0.1416   -4.36  0.000
S           -0.2044   0.1349   -1.52  0.130

S = 11.5764 R-Sq = 1.5% R-Sq(adj) = 1.5%


It’s a Different Story with Regional Variables

Regression Analysis: NE versus age, years ed, Female, MW, S, W

The regression equation is
NE = 1.00 + 0.000000 age + 0.000000 years ed + 0.000000 Female - 1.00 MW - 1.00 S - 1.00 W

Predictor        Coef     SE Coef  T  P
Constant      1.00000     0.00000  *  *
age        0.00000000  0.00000000  *  *
years ed   0.00000000  0.00000000  *  *
Female     0.00000000  0.00000000  *  *
MW           -1.00000     0.00000  *  *
S            -1.00000     0.00000  *  *
W            -1.00000     0.00000  *  *

S = 0 R-Sq = 100.0% R-Sq(adj) = 100.0%


CBC Has A High VIF

. reg cbc2 age age2 female married black other NE Midwest South city1mil union2

      Source |       SS         df       MS            Number of obs =  161792
-------------+------------------------------           F( 11,161780) =       .
       Model |  18165.9762      11  1651.45238         Prob > F      =  0.0000
    Residual |  2301.31742  161780  .014224981         R-squared     =  0.8876
-------------+------------------------------           Adj R-squared =  0.8876
       Total |  20467.2936  161791  .126504525         Root MSE      =  .11927

------------------------------------------------------------------------------
        cbc2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0013903   .0001288    10.80   0.000     .0011379    .0016426
        age2 |  -.0000133   1.53e-06    -8.72   0.000    -.0000163   -.0000103
      female |   .0025409   .0005963     4.26   0.000     .0013722    .0037096
     married |   .0013089   .0006676     1.96   0.050     4.52e-07    .0026174
       black |   .0063441    .001032     6.15   0.000     .0043214    .0083668
       other |  -.0016395   .0013597    -1.21   0.228    -.0043046    .0010255
          NE |  -.0043777    .000895    -4.89   0.000    -.0061319   -.0026234
     Midwest |  -.0027157   .0008563    -3.17   0.002    -.0043941   -.0010374
       South |  -.0041338   .0008356    -4.95   0.000    -.0057716   -.0024961
    city1mil |  -.0018596   .0006102    -3.05   0.002    -.0030555   -.0006636
      union2 |   .9811512   .0008888  1103.92   0.000     .9794092    .9828932
       _cons |   -.013585   .0025048    -5.42   0.000    -.0184943   -.0086757
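
Plugging the reported R-squared values into VIF = 1 / (1 - R^2) gives the numbers behind the slide titles. The sketch below uses the R-squared of 0.8876 from this cbc2 regression and the (rounded) 1.5% from the earlier age regression; the function name `vif_from_r2` is hypothetical.

```python
import math

def vif_from_r2(r2):
    """Variance inflation factor from an auxiliary-regression R-squared."""
    if r2 >= 1:
        return math.inf  # perfect collinearity: the VIF is unbounded
    return 1 / (1 - r2)

vif_age = vif_from_r2(0.015)    # age regressed on the other variables
vif_cbc = vif_from_r2(0.8876)   # cbc2 regressed on the other variables
vif_ne = vif_from_r2(1.0)       # NE regressed on all other region dummies

print("VIF for age: ", round(vif_age, 3))
print("VIF for cbc2:", round(vif_cbc, 2))
print("VIF for NE:  ", vif_ne)
```

Age is barely inflated, cbc2 is inflated close to ninefold, and the regional dummy with R-squared of 100% has no finite VIF.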


What To Do About MultiCollinearity

Do Nothing
Get More Data
  We had 156,000 observations for the wage regressions
Drop the Redundant Variable
  Care needed in interpretation


Compare Specification Issues

                 Omitted                            Extraneous                 MultiCollinearity
Added Variable   Right signed & large in magnitude  Coefficient close to zero  Right or wrong signed
Significance     Highly significant                 Non-significant            Weak or n.s.
Other Coef       Change sign                        Little change              Possibly change sign
Significance     Remains significant                Little change              Becomes weak or n.s.
R-square         Increase a lot                     Little change              Little change
New Sample       Little difference                  Little difference          Unstable estimates

