Lecture 13. Dummy variables - University of Southern...

Date post: 28-Aug-2019
Upload: duongnhan
Page 1: Lecture 13. Dummy variables - University of Southern ...ridder/Lnotes/Undeconometrics/Transparanten/Lecture 13.pdf · dummy/indicator variable D is a variable that has two values:

Lecture 13. Dummy variables

Types of variables

• Continuous (income, height, weight, etc.)
• Discrete (gender, season, points scored, etc.)

Continuous variables have

• an origin, i.e. the value 0
• a unit of measurement

Both are often obvious, e.g. price in US$. In a regression, both the origin and the unit of measurement can be changed.
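A minimal sketch of what changing the origin and unit does to a simple regression, assuming simulated data (the numbers are illustrative, not from the lecture). If price $X$ is re-measured in cents above a 10-dollar baseline, $X' = 100(X - 10)$, then $Y = \alpha + \beta X = (\alpha + 10\beta) + (\beta/100) X'$, so OLS should return exactly these transformed coefficients.

```python
import numpy as np

def ols(x_cols, y):
    """OLS with an intercept; returns [intercept, slope(s)...]."""
    X = np.column_stack([np.ones(len(y))] + list(x_cols))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated (hypothetical) data: price x in US$ and an outcome y
rng = np.random.default_rng(0)
x = rng.uniform(10, 50, size=200)
y = 3.0 + 0.5 * x + rng.normal(0, 1, size=200)

a, b = ols([x], y)

# New origin (10 dollars) and new unit (cents)
x_new = 100 * (x - 10)
a_new, b_new = ols([x_new], y)

print(b_new, b / 100)       # slope is divided by 100
print(a_new, a + b * 10)    # intercept is now the prediction at x = 10 dollars
```

The fit itself (fitted values, residuals, $R^2$) is unchanged; only the meaning, and therefore the values, of the coefficients change.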

Discrete variables: three types

• Counts, e.g. number of runs scored
• Ordinal, e.g. agree/neutral/disagree
• Nominal/categorical, e.g. gender

With counts, both the origin and the unit of measurement are obvious. Continuous variables and counts together are called quantitative variables.

With ordinal variables there is no origin and no unit of measurement, but there is an order. With nominal variables there is no unit of measurement, no origin and not even an order. Ordinal and nominal variables are called qualitative variables.

Discrete variables can be

• the dependent variable
• an independent variable

If the dependent variable is discrete, various problems arise. For example, in $Y = \alpha + \beta X + u$ the random error $u$ cannot be a continuous variable and hence cannot have a normal distribution.

In this lecture we consider qualitative variables as independent variables in linear regression models.

To use a qualitative variable as an independent variable in a linear regression $Y = \alpha + \beta X + u$, we must first attach numerical values to its categories. Dummy (indicator) variables are very useful for this. A dummy/indicator variable $D$ is a variable that takes only two values: 0 and 1.

Consider gender with categories female and male. We could choose

(1) $D_i = 0$ if $i$ is female, $D_i = 1$ if $i$ is male

or

(2) $D_i^* = 0$ if $i$ is male, $D_i^* = 1$ if $i$ is female

Because the labels are arbitrary, the choice should not make a difference. Note that 0 is not an origin and 1 is not a unit of measurement. They are just labels, and we could have used –2 and 99 instead (but that is not a convenient choice).

The category with label 0 is called the control or reference category (I prefer reference category). Now consider the regression model $Y = \alpha + \beta D + u$ with $D$ as in (1) and $Y$ monthly salary. What is the interpretation of $\alpha$ and $\beta$?

If assumption 2 of the CLR model holds, then $E(u \mid D=0) = E(u \mid D=1) = 0$ and hence

$$E(Y \mid D=0) = \alpha + E(u \mid D=0) = \alpha$$
$$E(Y \mid D=1) = \alpha + \beta + E(u \mid D=1) = \alpha + \beta$$

with

$E(Y \mid D=0)$ the average monthly salary of female employees (reference category)

$E(Y \mid D=1)$ the average monthly salary of male employees

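This suggests for the OLS estimators $\hat\alpha, \hat\beta$:

$$\hat\alpha = \bar Y_{\text{female}}, \qquad \hat\alpha + \hat\beta = \bar Y_{\text{male}}$$

and hence $\hat\beta = \bar Y_{\text{male}} - \bar Y_{\text{female}}$. The intercept is the average for the reference category.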
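The claim that OLS on a single dummy reproduces the two group averages can be checked numerically. This is a sketch on simulated salaries (hypothetical numbers, not the lecture's sample), using numpy's least-squares solver.

```python
import numpy as np

# Simulated (hypothetical) salaries: D = 1 for male, 0 for female
rng = np.random.default_rng(1)
d = np.array([0] * 23 + [1] * 26)
y = np.where(d == 1, 2100.0, 1500.0) + rng.normal(0, 300, size=d.size)

# Regress Y on a constant and D
X = np.column_stack([np.ones(d.size), d])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

ybar_female = y[d == 0].mean()
ybar_male = y[d == 1].mean()
print(alpha_hat, ybar_female)             # intercept = mean of the reference group
print(beta_hat, ybar_male - ybar_female)  # coefficient of D = difference in means
```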

Example: a sample of 49 employees with $n_{\text{male}} = 26$, $n_{\text{female}} = 23$, $\bar Y_{\text{male}} = 2086.93$, $\bar Y_{\text{female}} = 1518.70$. Compare with the regression results: $\hat\alpha = 1518.70$, $\hat\beta = 568.23$. Advantage of regression: it gives a direct confidence interval for, and test of, the salary difference between male and female employees.
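Two checks on this example, sketched in Python. First, the arithmetic: $2086.93 - 1518.70 = 568.23$, matching $\hat\beta$. Second, the "direct test" from the regression is the familiar pooled-variance two-sample t-test in disguise. The second part uses simulated data with the same group sizes (an assumption; the lecture's individual observations are not available).

```python
import numpy as np

# Arithmetic check with the lecture's group means
ybar_male, ybar_female = 2086.93, 1518.70
print(round(ybar_male - ybar_female, 2))  # equals beta-hat in the lecture

# Simulated (hypothetical) data with 26 men and 23 women
rng = np.random.default_rng(2)
d = np.array([1] * 26 + [0] * 23)
y = 1500 + 570 * d + rng.normal(0, 400, size=d.size)

# Regression t-statistic for the coefficient of D
n, k = d.size, 2
X = np.column_stack([np.ones(n), d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
s2 = resid @ resid / (n - k)              # estimate of the error variance
cov = s2 * np.linalg.inv(X.T @ X)         # OLS covariance matrix
t_reg = coef[1] / np.sqrt(cov[1, 1])

# Pooled-variance two-sample t-statistic
y1, y0 = y[d == 1], y[d == 0]
sp2 = ((len(y1) - 1) * y1.var(ddof=1) + (len(y0) - 1) * y0.var(ddof=1)) / (n - 2)
t_pooled = (y1.mean() - y0.mean()) / np.sqrt(sp2 * (1 / len(y1) + 1 / len(y0)))
print(t_reg, t_pooled)                    # the two statistics coincide
```

The equality holds because the residual variance of the dummy regression is exactly the pooled within-group variance, and $[(X'X)^{-1}]_{22} = 1/n_1 + 1/n_0$.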

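If we replace $D$ by $D^*$, i.e. now 0 indicates male and 1 female, we have the regression model $Y = \alpha^* + \beta^* D^* + u$ and

$$E(Y \mid D^*=0) = \alpha^*, \qquad E(Y \mid D^*=1) = \alpha^* + \beta^*$$

and hence $\hat\alpha^* = \bar Y_{\text{male}}$ and $\hat\beta^* = \bar Y_{\text{female}} - \bar Y_{\text{male}}$.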

For the OLS estimates we find $\hat\alpha^* = 2086.92$ and $\hat\beta^* = -568.23$. Note that $\hat\beta^* = -\hat\beta$ and that the standard error is identical, so tests and confidence intervals give the same conclusion.
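A sketch of the relabeling argument, assuming simulated salaries and string labels (both hypothetical). Coding (1) and coding (2) are built from the same labels, and since $D^* = 1 - D$ the two regressions are reparametrizations of each other: $\hat\beta^* = -\hat\beta$ with the same standard error, and $\hat\alpha^* = \hat\alpha + \hat\beta$.

```python
import numpy as np

def ols_with_se(x, y):
    """Regress y on a constant and x; return (coefficients, standard errors)."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef, se

# Simulated (hypothetical) labels and salaries
rng = np.random.default_rng(3)
gender = np.array(["female"] * 23 + ["male"] * 26)
y = np.where(gender == "male", 2100.0, 1500.0) + rng.normal(0, 300, size=gender.size)

d = (gender == "male").astype(float)         # coding (1): female is the reference
d_star = (gender == "female").astype(float)  # coding (2): male is the reference

coef, se = ols_with_se(d, y)
coef_s, se_s = ols_with_se(d_star, y)
print(coef_s[1], -coef[1])           # beta* = -beta
print(se_s[1], se[1])                # identical standard errors
print(coef_s[0], coef[0] + coef[1])  # alpha* = alpha + beta (mean for men)
```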

Is the result a proof of gender discrimination? Why (not)?

Now consider two dummy variables:

$D_{1i} = 0$ if $i$ is female, $D_{1i} = 1$ if $i$ is male

and

$D_{2i} = 0$ if $i$ is nonwhite, $D_{2i} = 1$ if $i$ is white

We consider the following models:

(1) $Y = \beta_1 + \beta_2 D_1 + \beta_3 D_2 + u$

(2) $Y = \beta_1 + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_1 D_2 + u$

We consider the salary difference between men and women by ethnicity.

In model (1)

$$E(Y \mid D_1=1, D_2=0) - E(Y \mid D_1=0, D_2=0) = E(Y \mid D_1=1, D_2=1) - E(Y \mid D_1=0, D_2=1) = \beta_2$$

Restriction: the salary difference is the same for whites and nonwhites.

In model (2)

$$E(Y \mid D_1=1, D_2=0) - E(Y \mid D_1=0, D_2=0) = \beta_2$$

and

$$E(Y \mid D_1=1, D_2=1) - E(Y \mid D_1=0, D_2=1) = \beta_2 + \beta_4$$

Estimation results: the gender salary difference appears only for whites. Also, the race difference appears only for men. Model (2) has an interaction term $D_1 D_2$.
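With two binary variables, model (2) has four coefficients for four cells, so it is saturated: OLS reproduces the four cell means exactly. The sketch below uses simulated data with made-up cell means (not the lecture's estimates) to show that $\hat\beta_2$ is the gender gap among nonwhites and $\hat\beta_2 + \hat\beta_4$ the gender gap among whites.

```python
import numpy as np

# Simulated (hypothetical) data: D1 = 1 male, D2 = 1 white; 30 observations per cell
rng = np.random.default_rng(4)
d1 = np.repeat([0, 1, 0, 1], 30)
d2 = np.repeat([0, 0, 1, 1], 30)
cell_mean = {(0, 0): 1400, (1, 0): 1450, (0, 1): 1500, (1, 1): 2000}
y = np.array([cell_mean[(int(a), int(b))] for a, b in zip(d1, d2)], dtype=float)
y += rng.normal(0, 100, size=d1.size)

# Model (2): Y = b1 + b2*D1 + b3*D2 + b4*D1*D2 + u
X = np.column_stack([np.ones(d1.size), d1, d2, d1 * d2])
b1, b2, b3, b4 = np.linalg.lstsq(X, y, rcond=None)[0]

# Sample means of the four cells
m = {(a, b): y[(d1 == a) & (d2 == b)].mean() for a in (0, 1) for b in (0, 1)}
print(b2, m[(1, 0)] - m[(0, 0)])       # gender gap among nonwhites
print(b2 + b4, m[(1, 1)] - m[(0, 1)])  # gender gap among whites
```

Model (1) drops $D_1 D_2$ and therefore forces the two gaps to be equal; testing $H_0: \beta_4 = 0$ in model (2) tests that restriction.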

Next, we consider qualitative variables with more than two categories. Examples: state of residence, level of education, income category (a grouped continuous variable).

$S = 0$ if no high school diploma

$S = 1$ if high school diploma, but no college degree

$S = 2$ if college degree

Using $S$ in this way is a bad idea (why?)

Instead we introduce two dummy variables:

$S_1 = 1$ if high school diploma, but no college degree; $S_1 = 0$ otherwise

and

$S_2 = 1$ if college degree; $S_2 = 0$ otherwise

Note: the reference group has no high school diploma.

Regression model $Y = \beta_1 + \beta_2 S_1 + \beta_3 S_2 + u$. Now

$\beta_1$ is the average of $Y$ for the reference group (no high school diploma)

$\beta_1 + \beta_2$ is the average of $Y$ for the group with a high school diploma, but no college degree

$\beta_1 + \beta_3$ is the average of $Y$ for the group with a college degree
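A sketch contrasting the two approaches on simulated incomes (hypothetical numbers chosen so that the two education steps have unequal effects). The dummy model recovers the three group averages; putting $S$ itself in the regression forces the step from 0 to 1 and the step from 1 to 2 to have the same effect.

```python
import numpy as np

# Simulated (hypothetical) incomes by education level S in {0, 1, 2}
rng = np.random.default_rng(5)
s = np.repeat([0, 1, 2], 40)
y = np.array([1000.0, 1200.0, 2000.0])[s] + rng.normal(0, 100, size=s.size)

# Dummy model: Y = b1 + b2*S1 + b3*S2 + u, reference group S = 0
s1 = (s == 1).astype(float)
s2 = (s == 2).astype(float)
X = np.column_stack([np.ones(s.size), s1, s2])
b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
means = [y[s == g].mean() for g in (0, 1, 2)]
print(b1, means[0])
print(b1 + b2, means[1])
print(b1 + b3, means[2])

# Using S as a single regressor imposes equal steps 0 -> 1 and 1 -> 2
c0, c1 = np.linalg.lstsq(np.column_stack([np.ones(s.size), s]), y, rcond=None)[0]
fitted = c0 + c1 * np.array([0, 1, 2])
print(np.diff(fitted))  # both steps equal c1
```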

How do you test

• Education has no impact on income
• The return (in income) to having a college degree is 0

Give $H_0$ and indicate which test you want to use.

Define $S_3 = 1$ if no high school diploma, $S_3 = 0$ otherwise. Consider the regression model $Y = \beta_1 + \beta_2 S_1 + \beta_3 S_2 + \beta_4 S_3 + u$. Why can the coefficients of this model not be estimated?
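A small numerical illustration of the problem, assuming a toy sample of education levels (hypothetical). Every observation falls in exactly one of the three groups, so $S_1 + S_2 + S_3 = 1$, which equals the constant column: the regressors are perfectly collinear and $X'X$ cannot be inverted.

```python
import numpy as np

s = np.repeat([0, 1, 2], 5)           # toy (hypothetical) education levels
s1 = (s == 1).astype(float)           # high school diploma, no college degree
s2 = (s == 2).astype(float)           # college degree
s3 = (s == 0).astype(float)           # no high school diploma
X = np.column_stack([np.ones(s.size), s1, s2, s3])

print(np.allclose(s1 + s2 + s3, 1.0))        # True: the dummies sum to the constant
print(np.linalg.matrix_rank(X), X.shape[1])  # 3 4: X has rank 3 with 4 columns
```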

This is called the dummy variable trap.

Example: monthly salary and type of work, with
• Maint = maintenance work
• Crafts = works in crafts
• Clerical = clerical work

The reference category is professional. Interpret the constant and the other coefficients.
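Building the dummies for a categorical variable amounts to creating one 0/1 column per category except the reference category, which avoids the dummy variable trap. The helper `make_dummies` below is hypothetical (written for this illustration), and the job labels are made up.

```python
import numpy as np

def make_dummies(labels, reference):
    """Hypothetical helper: one 0/1 column per category except `reference`."""
    labels = np.asarray(labels)
    categories = [c for c in dict.fromkeys(labels.tolist()) if c != reference]
    return {c: (labels == c).astype(float) for c in categories}

jobs = ["Professional", "Maint", "Crafts", "Clerical", "Maint", "Professional"]
dummies = make_dummies(jobs, reference="Professional")
print(sorted(dummies))            # ['Clerical', 'Crafts', 'Maint']
print(dummies["Maint"].tolist())  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

pandas offers `get_dummies` for the same task; with `drop_first=True` it drops the first category, which need not be the reference category you intend.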

Combining quantitative and qualitative independent variables

Consider the model $Y = \beta_1 + \beta_2 D + \beta_3 X + u$ with $Y$ the log of monthly salary, $D$ gender (1 for men, 0 for women) and $X$ education (in years of schooling). In the relation between $Y$ and $X$ the intercept is $\beta_1$ for women and $\beta_1 + \beta_2$ for men (see figure). Estimation results (what is the interpretation of the coefficient of gender?) Note that the gender difference is not due to a difference in the level of education.
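A sketch of this parallel-lines model on simulated data (the coefficient values 7.0, 0.15 and 0.08 are hypothetical, not the lecture's estimates). Both groups share the slope $\beta_3$; only the intercept shifts by $\beta_2$. Because $Y$ is a log, the usual reading of $\beta_2$ is a proportional salary gap at equal schooling, roughly $100\beta_2$ percent, or $100(e^{\beta_2} - 1)$ percent on the exact scale.

```python
import numpy as np

# Simulated (hypothetical) data: log salary y, gender dummy d (1 = male), schooling x
rng = np.random.default_rng(6)
n = 500
d = rng.integers(0, 2, size=n).astype(float)
x = rng.integers(8, 21, size=n).astype(float)
y = 7.0 + 0.15 * d + 0.08 * x + rng.normal(0, 0.05, size=n)

X = np.column_stack([np.ones(n), d, x])
b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# Two parallel lines: intercept b1 (women) and b1 + b2 (men), common slope b3
print(b1, b1 + b2, b3)

# Proportional gap implied by the gender coefficient in a log-salary model
pct_gap = 100 * (np.exp(b2) - 1)
print(pct_gap)
```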

Consider two other models:

(3) $Y = \beta_1 + \beta_2 X + \beta_3 DX + u$

In this model the intercept is the same but the slope differs between men and women (see figure). For women the slope is $\beta_2$; for men the slope is $\beta_2 + \beta_3$.

(4) $Y = \beta_1 + \beta_2 D + \beta_3 X + \beta_4 DX + u$

In this model both the slope and the intercept differ.

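Model for women: $Y = \beta_1 + \beta_3 X + u$, and for men: $Y = (\beta_1 + \beta_2) + (\beta_3 + \beta_4) X + u$. This amounts to splitting the sample and estimating two separate regressions.

OLS estimates

Advantage of the dummy approach: tests, e.g. of equal slopes ($H_0: \beta_4 = 0$), come directly from a single regression.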
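A sketch showing that model (4), estimated on the pooled sample, gives exactly the same lines as two separate regressions, assuming simulated data (all coefficient values are hypothetical). The pooled version is what makes a direct test of $H_0: \beta_4 = 0$ possible.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated (hypothetical) data: d = 1 male, x = years of schooling, y = log salary
rng = np.random.default_rng(7)
n = 300
d = rng.integers(0, 2, size=n).astype(float)
x = rng.uniform(8, 20, size=n)
y = 7.0 + 0.2 * d + (0.06 + 0.03 * d) * x + rng.normal(0, 0.1, size=n)

# Model (4) on the pooled sample: Y = b1 + b2*D + b3*X + b4*D*X + u
b1, b2, b3, b4 = ols(np.column_stack([np.ones(n), d, x, d * x]), y)

# Separate regressions for women and men
w, m = d == 0, d == 1
a_w, s_w = ols(np.column_stack([np.ones(w.sum()), x[w]]), y[w])
a_m, s_m = ols(np.column_stack([np.ones(m.sum()), x[m]]), y[m])

print(np.allclose([b1, b3], [a_w, s_w]))            # True: same line for women
print(np.allclose([b1 + b2, b3 + b4], [a_m, s_m]))  # True: same line for men
```

The point estimates coincide; the standard errors can differ slightly, because the pooled regression uses a single error variance for both groups.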
