
Dummy Variables

Date post: 14-Apr-2017
Dummy Variables (K.R. Shanmugam, Madras School of Economics)
Page 1: Dummy Variables

Dummy Variables (K.R. Shanmugam, Madras School of Economics)

Page 2: Dummy Variables

Introduction

Consider a simple two-variable regression: $Y_i = \alpha + \beta X_i + u_i$

where Y is earnings or wages and X is job experience.

Data set: data on 50 employees.

Page 3: Dummy Variables

Earnings Equation Results

Wages = 685.9993 + 129.7805 * Experience

Dependent Variable: WAGES
Method: Least Squares
Date: 03/16/07   Time: 20:50
Sample: 1 50
Included observations: 50

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                685.9993     99.41455      6.900391   0.0000
EXPER            129.7805     10.76783      12.05261   0.0000

R-squared             0.751637    Mean dependent var      1796.920
Adjusted R-squared    0.746463    S.D. dependent var      523.0871
S.E. of regression    263.3874    Akaike info criterion   14.02431
Sum squared resid     3329900.    Schwarz criterion       14.10079
Log likelihood       -348.6077    F-statistic             145.2654
Durbin-Watson stat    1.554684    Prob(F-statistic)       0.000000

Page 4: Dummy Variables

Regression with intercept only

Suppose everybody has the same experience, that is, experience is a constant.

Then $R^2 = 0$ and the intercept = 1796.92. (What is this? It is the sample mean of wages, the "Mean dependent var" in the output above.)
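The intercept-only result can be checked numerically. The sketch below uses synthetic wages (an assumption for illustration, not the 50-employee data set from the slides) and shows that regressing on a constant alone returns the sample mean with an R-squared of zero.

```python
import numpy as np

# Synthetic wages, purely illustrative (NOT the data set used in the slides).
rng = np.random.default_rng(0)
wages = rng.normal(loc=1800.0, scale=500.0, size=50)

# Design matrix containing only a constant column.
X = np.ones((wages.size, 1))
coef, *_ = np.linalg.lstsq(X, wages, rcond=None)

intercept = coef[0]
fitted = X @ coef

# R-squared = 1 - SSR / SST.
ss_res = np.sum((wages - fitted) ** 2)
ss_tot = np.sum((wages - wages.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("intercept:", intercept)
print("sample mean:", wages.mean())
print("R-squared:", r_squared)
```

With no regressor that varies across observations, the best least-squares prediction for everyone is the average wage, so the model explains none of the variation around that average.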

Page 5: Dummy Variables

Different Segments of Sample

In the data set, 25 respondents are male and the remaining 25 are female.

So the sample has two groups or segments. Can we treat them the same? No. In general, labor markets for different groups may differ; e.g., in agriculture, women receive lower wages than men.

Page 6: Dummy Variables

What Are the Average Wages?

Male average wages = 1867.68
Female average wages = 1726.16
There appears to be gender (sex) discrimination. What should we do?
Option 1: Analyze the male and female samples separately.
Option 2: Analyze them jointly, taking the gender difference into account.

Page 7: Dummy Variables

Option 1: Separate Analysis

What is the difference?

Dependent Variable: WAGES
Method: Least Squares
Date: 03/16/07   Time: 20:51
Sample: 1 25
Included observations: 25

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                610.0101     128.9632      4.730111   0.0001
EXPER            129.7849     13.89676      9.339215   0.0000

R-squared             0.791328    Mean dependent var      1726.160
Adjusted R-squared    0.782256    S.D. dependent var      519.2505
S.E. of regression    242.2984    Akaike info criterion   13.89484
Sum squared resid     1350295.    Schwarz criterion       13.99235
Log likelihood       -171.6854    F-statistic             87.22095
Durbin-Watson stat    1.769131    Prob(F-statistic)       0.000000

Dependent Variable: WAGES
Method: Least Squares
Date: 03/16/07   Time: 20:51
Sample: 26 50
Included observations: 25

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                757.5909     145.1904      5.217912   0.0000
EXPER            130.2921     15.80774      8.242302   0.0000

R-squared             0.747074    Mean dependent var      1867.680
Adjusted R-squared    0.736077    S.D. dependent var      527.8150
S.E. of regression    271.1568    Akaike info criterion   14.11989
Sum squared resid     1691099.    Schwarz criterion       14.21740
Log likelihood       -174.4986    F-statistic             67.93554
Durbin-Watson stat    1.641091    Prob(F-statistic)       0.000000

Page 8: Dummy Variables

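Earnings functions for male and female separately

[Figure: wages plotted against experience, with one fitted line per group: FEMALE (intercept 610.01, slope about 130) and MALE (intercept 757.59, slope about 130).]

Slope is the same, but the intercept is different!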

Page 9: Dummy Variables

Option 2: Joint Analysis

How do we take the gender difference into account? Gender is a qualitative factor and not readily quantifiable.

Solution: a dummy variable, a specially constructed variable that represents the gender difference.

Implicit assumption: the regression lines for the different groups differ only in intercept and have the same slope coefficient.

Page 10: Dummy Variables

Option 2: Use of a dummy variable

Dummy variable: definition. A variable that we create artificially to incorporate the effect of a factor that is not readily quantifiable.

That is, dummy variables are a device for incorporating into the regression model factors that are not readily quantified, such as region, time, occupation and ownership.

Page 11: Dummy Variables

How do we create a Dummy?

For our case, D1 = Gender = 1 if the respondent is male; 0 if the respondent is female.

(It takes the value 1 for some observations to indicate the presence of a group/category and 0 for the remaining observations.)

A dummy is also called an indicator variable, binary variable, categorical variable, dichotomous variable, or qualitative variable.

Page 12: Dummy Variables

Option 2: Single Model with Dummy

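Regression model: $Y_i = \alpha_1 + \alpha_2 D_{1i} + \beta X_i + u_i$

Estimated relationship for the two groups:

$E(Y_i \mid X_i, D_{1i} = 0) = \alpha_1 + \beta X_i$ (for female)

$E(Y_i \mid X_i, D_{1i} = 1) = \alpha_1 + \alpha_2 + \beta X_i$ (for male)

That is, the slope is the same for both groups. The intercept varies: the original intercept ($\alpha_1$) is the intercept for female (the base group, with dummy value zero), and the intercept for male is $\alpha_1 + \alpha_2$.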
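A minimal sketch of this single-equation approach in Python follows. The data are simulated, and the parameter values (600, 150, 130) and variable names are assumptions chosen only for illustration; they are not the estimates or the data from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Simulated regressors (illustrative only).
exper = rng.uniform(1.0, 15.0, size=n)
male = np.repeat([0.0, 1.0], n // 2)   # dummy: 0 = female (base group), 1 = male

# Assumed "true" parameters used only to generate the simulated wages.
true_intercept, true_gap, true_slope = 600.0, 150.0, 130.0
wages = (true_intercept + true_gap * male + true_slope * exper
         + rng.normal(0.0, 50.0, size=n))

# OLS with a constant, the dummy, and experience.
X = np.column_stack([np.ones(n), male, exper])
a1_hat, a2_hat, b_hat = np.linalg.lstsq(X, wages, rcond=None)[0]

intercept_female = a1_hat            # base group: dummy = 0
intercept_male = a1_hat + a2_hat     # dummy = 1 shifts the intercept

print("female intercept:", intercept_female)
print("male intercept:  ", intercept_male)
print("common slope:    ", b_hat)
```

The dummy column only shifts the fitted line up or down for one group; both groups share the single slope estimate, which is exactly the implicit assumption of this specification.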

Page 13: Dummy Variables

Diagrammatic Explanation

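[Figure: Y plotted against X with two parallel lines, $\alpha_1 + \beta X$ for the base group (female) and $\alpha_1 + \alpha_2 + \beta X$ for male; the vertical gap between the lines is $\alpha_2$.]

The constant term $\alpha_1$ is the intercept for the base group; $\alpha_1 + \alpha_2$ is the intercept for male; and $\alpha_2$, the coefficient of the dummy variable, measures the difference in intercept.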

Page 14: Dummy Variables

Option 2: Estimation Results

Wages = 607.86 + 151.92 GENDER + 130.03 Exper
t-statistics: (5.907) (2.111) (12.503); $R^2$ = 0.773

Since the coefficient of the dummy is significant at the 5% level (p = 0.0401), a gender differential exists!

Dependent Variable: WAGES
Method: Least Squares
Date: 03/16/07   Time: 20:50
Sample: 1 50
Included observations: 50

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                607.8644     102.9013      5.907259   0.0000
EXPER            130.0344     10.40046      12.50275   0.0000
GENDER           151.9227     71.95553      2.111342   0.0401

R-squared             0.773152    Mean dependent var      1796.920
Adjusted R-squared    0.763499    S.D. dependent var      523.0871
S.E. of regression    254.3842    Akaike info criterion   13.97369
Sum squared resid     3041432.    Schwarz criterion       14.08841
Log likelihood       -346.3423    F-statistic             80.09379
Durbin-Watson stat    1.721055    Prob(F-statistic)       0.000000
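The reported coefficients can be turned into group intercepts and predicted wages with simple arithmetic. The sketch below copies the three coefficients from the output above; the function name and the 10-year experience level are illustrative choices, not part of the original analysis.

```python
# Coefficients copied from the estimation output above.
c = 607.8644          # constant: intercept for the base group (female)
b_gender = 151.9227   # GENDER = 1 if male, 0 if female
b_exper = 130.0344    # slope on experience, common to both groups

intercept_female = c
intercept_male = c + b_gender


def predicted_wage(exper, gender):
    """Fitted wage for a given experience level and gender dummy (0 or 1)."""
    return c + b_gender * gender + b_exper * exper


# Illustrative comparison at 10 years of experience.
gap = predicted_wage(10, 1) - predicted_wage(10, 0)

print("female intercept:", intercept_female)
print("male intercept:  ", intercept_male)
print("male minus female wage at 10 years:", gap)
```

Because the slope is common to both groups, the predicted gap equals the dummy coefficient at every experience level.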

Page 15: Dummy Variables

Alternative Way: Changing base

Define Gender1 = D2 = 1 if female; 0 if male

Dependent Variable: WAGES
Method: Least Squares
Date: 03/24/07   Time: 23:48
Sample: 1 50
Included observations: 50

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                759.7872     102.1789      7.435854   0.0000
EXPER            130.0344     10.40046      12.50275   0.0000
GENDER1         -151.9227     71.95553     -2.111342   0.0401

R-squared             0.773152    Mean dependent var      1796.920
Adjusted R-squared    0.763499    S.D. dependent var      523.0871
S.E. of regression    254.3842    Akaike info criterion   13.97369
Sum squared resid     3041432.    Schwarz criterion       14.08841
Log likelihood       -346.3423    F-statistic             80.09379
Durbin-Watson stat    1.721055    Prob(F-statistic)       0.000000
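Why the two outputs differ only in the constant and in the sign of the dummy coefficient can be seen from a short derivation. The notation is assumed here for illustration: $\alpha_1$ is the intercept, $\alpha_2$ the coefficient on the male dummy $D_1$, and $\beta$ the slope in the male-dummy model. Since $D_2 = 1 - D_1$:

```latex
\begin{aligned}
Y_i &= \alpha_1 + \alpha_2 D_{1i} + \beta X_i + u_i \\
    &= \alpha_1 + \alpha_2 (1 - D_{2i}) + \beta X_i + u_i \\
    &= (\alpha_1 + \alpha_2) - \alpha_2 D_{2i} + \beta X_i + u_i
\end{aligned}
```

With the reported estimates, 607.8644 + 151.9227 = 759.7871, which matches the new constant 759.7872 up to rounding, and the dummy coefficient becomes -151.9227. The slope, the standard error of the dummy coefficient and all fit statistics are unchanged, because the two specifications describe the same fitted model.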

Page 16: Dummy Variables

Alternative Way: Both Dummies

Suppose we define two dummy variables:

D1 = 1 if male, 0 if female
D2 = 1 if female, 0 if male

The regression equation can then be specified as: $Y_i = \alpha_1 D_{1i} + \alpha_2 D_{2i} + \beta X_i + u_i$, with D1 = Gender and D2 = Gender1.

What is the difference? The overall intercept term is missing. Why?

Page 17: Dummy Variables

Both Dummies

Dependent Variable: WAGES
Method: Least Squares
Date: 03/16/07   Time: 21:28
Sample: 1 50
Included observations: 50

Variable      Coefficient   Std. Error   t-Statistic   Prob.
EXPER            130.0344     10.40046      12.50275   0.0000
GENDER           759.7872     102.1789      7.435854   0.0000
GENDER1          607.8644     102.9013      5.907259   0.0000

R-squared             0.773152    Mean dependent var      1796.920
Adjusted R-squared    0.763499    S.D. dependent var      523.0871
S.E. of regression    254.3842    Akaike info criterion   13.97369
Sum squared resid     3041432.    Schwarz criterion       14.08841
Log likelihood       -346.3423    Durbin-Watson stat      1.721055

Page 18: Dummy Variables

Dummy Variable Trap

• If we include a constant term together with dummies for both categories, we face perfect multicollinearity: D1 + D2 = 1 for every observation, so the columns of the X matrix are linearly dependent.
• This is known as the dummy variable trap.
• To avoid the dummy variable trap, we can either drop the dummy for one category, as in the earlier case, or include dummies for all categories without an intercept term.
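The linear dependence behind the trap can be checked by comparing the rank of the design matrix with its number of columns. The sketch below uses made-up experience values and group assignments (assumptions for illustration only).

```python
import numpy as np

n = 50
male = np.repeat([1.0, 0.0], n // 2)     # D1 = 1 if male
female = 1.0 - male                      # D2 = 1 if female
exper = np.linspace(1.0, 15.0, n)        # illustrative experience values
const = np.ones(n)

# Constant plus both dummies: the trap (const = male + female).
X_trap = np.column_stack([const, male, female, exper])
# Remedy 1: constant plus one dummy.
X_drop_one = np.column_stack([const, male, exper])
# Remedy 2: both dummies, no constant.
X_no_const = np.column_stack([male, female, exper])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_drop_one = np.linalg.matrix_rank(X_drop_one)
rank_no_const = np.linalg.matrix_rank(X_no_const)

for name, X, rank in [("trap", X_trap, rank_trap),
                      ("drop one dummy", X_drop_one, rank_drop_one),
                      ("no constant", X_no_const, rank_no_const)]:
    print(f"{name}: {X.shape[1]} columns, rank {rank}")
```

A design matrix whose rank is below its column count has no unique least-squares solution, which is why the estimation fails (or a regressor is dropped) when the constant and all category dummies are included together.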

Page 19: Dummy Variables

Rule

• With an overall intercept, use m-1 dummies for m groups or categories; without an intercept, use m dummies for m groups.
• If there is no intercept, the coefficients of the dummy variables measure the intercepts for the respective groups.
• Wages = 759.79 Gender + 607.86 Gender1 + 130.03 X, with t-statistics (7.44) (5.91) (12.50)

Page 20: Dummy Variables

Several Categories: Suppose we want to control for the season when we analyze umbrella sales.

$Sales_t = \alpha + \beta p_t + \gamma_1 D_1 + \gamma_2 D_2 + \gamma_3 D_3 + u_t$

where D1 = 1 if 1st quarter; 0 otherwise
D2 = 1 if 2nd quarter; 0 otherwise
D3 = 1 if 3rd quarter; 0 otherwise

(or) $Sales_t = \beta p_t + \gamma_1 D_1 + \gamma_2 D_2 + \gamma_3 D_3 + \gamma_4 D_4 + u_t$

where D4 = 1 if 4th quarter; 0 otherwise
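Constructing the quarterly dummies for both specifications can be sketched as follows. The five years of quarter labels are hypothetical, and the array names are illustrative.

```python
import numpy as np

# Hypothetical quarter labels for 5 years of quarterly observations.
quarters = np.tile([1, 2, 3, 4], 5)

# With an intercept: m - 1 = 3 dummies; the 4th quarter is the base group.
D_with_intercept = np.column_stack(
    [(quarters == q).astype(float) for q in (1, 2, 3)]
)

# Without an intercept: m = 4 dummies, one per quarter.
D_without_intercept = np.column_stack(
    [(quarters == q).astype(float) for q in (1, 2, 3, 4)]
)

print(D_with_intercept[:4])
print(D_without_intercept[:4])
```

In the first layout a 4th-quarter observation has zeros in all three dummy columns, so its intercept is the overall constant; in the second layout every row has exactly one 1, which is why a constant must not be added alongside it.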

Page 21: Dummy Variables

Several Qualitative Variables

Suppose there are two qualitative factors: sex and race.

Define dummy variables as:
Gender = 1 if male; 0 otherwise
Race = 1 if white; 0 if black

Page 22: Dummy Variables

With Two Qualitative Factors

Dependent Variable: WAGES
Method: Least Squares
Date: 03/16/07   Time: 20:58
Sample: 1 50
Included observations: 50

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C                488.9543     102.5994      4.765664   0.0000
EXPER            129.6110     9.591059      13.51373   0.0000
GENDER           168.2290     66.56434      2.527314   0.0150
RACE             204.2518     67.05229      3.046157   0.0038

R-squared             0.811231    Mean dependent var      1796.920
Adjusted R-squared    0.798920    S.D. dependent var      523.0871
S.E. of regression    234.5626    Akaike info criterion   13.82994
Sum squared resid     2530902.    Schwarz criterion       13.98290
Log likelihood       -341.7485    F-statistic             65.89459
Durbin-Watson stat    1.433962    Prob(F-statistic)       0.000000

C is the intercept for the base group on both factors: female and black.

Intercept for male (and black) = C + 168.23; intercept for white (and female) = C + 204.25.
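The intercepts for all four gender-by-race groups follow from the reported coefficients by addition. The sketch below copies the coefficients from the output above; note that the model has no interaction term, so the male-and-white intercept is simply the sum of both shifts, which is an assumption of this specification rather than something the data confirm.

```python
# Coefficients copied from the regression output above.
c = 488.9543          # constant: base group (female, black)
b_gender = 168.2290   # GENDER = 1 if male
b_race = 204.2518     # RACE = 1 if white

# Intercept for every (gender, race) combination of the two dummies.
intercepts = {
    (gender, race): c + b_gender * gender + b_race * race
    for gender in (0, 1)
    for race in (0, 1)
}

gender_label = {0: "female", 1: "male"}
race_label = {0: "black", 1: "white"}
for (gender, race), value in intercepts.items():
    print(f"{gender_label[gender]}, {race_label[race]}: {value:.4f}")
```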

Page 23: Dummy Variables

With 3 Qualitative Factors

Example 3: Consumption function analysis.

Suppose there are three qualitative factors: sex, age of the household head and education level of the head.

Define dummy variables as:
D1 = 1 if sex is male; 0 otherwise
D2 = 1 if age < 25; 0 otherwise
D3 = 1 if age between 25 and 50; 0 otherwise
D4 = 1 if high school education; 0 otherwise
D5 = 1 if H.Sc., degree and above; 0 otherwise

Page 24: Dummy Variables

Example 3: Base or Reference Groups

Sex: Female

Age: Above 50 years

Education: Below High School

Regression model: $C_t = \alpha + \beta Y_t + \gamma_1 D_1 + \gamma_2 D_2 + \gamma_3 D_3 + \gamma_4 D_4 + \gamma_5 D_5 + u_t$

$\alpha$ is the intercept term for the base group: a female head of household, aged above 50 years, with education below high school.

Page 25: Dummy Variables

Intercepts for Other Groups (all other factors at their base category):

$\alpha + \gamma_1$: male household head

$\alpha + \gamma_2$: age less than 25 years

$\alpha + \gamma_3$: age between 25 and 50 years

$\alpha + \gamma_4$: high school education

$\alpha + \gamma_5$: above high school education

If the household head is male, aged 40 years, with high school education, what is the intercept? $\alpha + \gamma_1 + \gamma_3 + \gamma_4$
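The answer can be verified by substituting the dummy values for this household into the model. The notation is assumed here for illustration: $\alpha$ is the base-group intercept, $\beta$ the coefficient on income $Y_t$, and $\gamma_1, \dots, \gamma_5$ the coefficients on D1 to D5. A male head aged 40 with high school education has D1 = 1, D2 = 0, D3 = 1, D4 = 1, D5 = 0, and with a zero-mean error term:

```latex
\begin{aligned}
E(C_t \mid Y_t,\ D_1 = 1,\ D_2 = 0,\ D_3 = 1,\ D_4 = 1,\ D_5 = 0)
  &= \alpha + \beta Y_t + \gamma_1 (1) + \gamma_2 (0) + \gamma_3 (1) + \gamma_4 (1) + \gamma_5 (0) \\
  &= (\alpha + \gamma_1 + \gamma_3 + \gamma_4) + \beta Y_t
\end{aligned}
```

Only the dummies that equal 1 contribute to the intercept; the slope on income is the same for every group.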

