Multiple Regression Analysis (web.uvic.ca/~mfarnham/345/T7notes.pdf)

Multiple Regression Analysis

y = β0 + β1x1 + β2x2 + . . . + βkxk + u

5. Dummy Variables

0

Dummy Variables

!  A dummy variable (or “indicator variable”) is a variable that takes on the value 1 or 0

Examples:
!  male (=1 if male, 0 otherwise)
!  south (=1 if in the south, 0 otherwise)
!  year70 (=1 if year=1970, 0 otherwise)
!  Dummy variables are also called binary variables, for obvious reasons

1

A Dummy Independent Variable

!  Dummy variables can be quite useful in regression analysis

!  Consider the simple model with one continuous variable (x) and one dummy (d)

!  y = β0 + δ0d + β1x + u
!  We can interpret d as an intercept shift
!  If d = 0, then y = β0 + β1x + u
!  If d = 1, then y = (β0 + δ0) + β1x + u
!  Think of the case where d = 0 as the base group (or reference group)

2
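The intercept-shift interpretation can be checked with a quick simulation. This is a minimal sketch, not from the notes: the true parameter values and data are made up, and OLS is done directly with least squares. The fitted coefficients recover the base-group intercept, the shift δ0, and the shared slope:

```python
import numpy as np

# Hypothetical simulated data: true beta0 = 2, delta0 = 3, beta1 = 0.5
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0, 10, n)
d = rng.integers(0, 2, n)            # dummy: 0 = base group, 1 = shifted group
u = rng.normal(0, 0.1, n)
y = 2.0 + 3.0 * d + 0.5 * x + u

# OLS via least squares on the design matrix [1, d, x]
X = np.column_stack([np.ones(n), d, x])
b0, delta0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# The two groups share the slope b1; their intercepts differ by delta0
print(b0, delta0, b1)
```

Refitting with d = 0 observations only would give the base-group line y = β0 + β1x; the dummy lets one regression estimate both lines at once.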

Example of δ0 > 0

[Figure: two lines with the same slope β1: y = β0 + β1x for d = 0, and y = (β0 + δ0) + β1x for d = 1; β0 is the d = 0 intercept, and the vertical gap between the lines is δ0.]

3

Dummy Trap

!  Suppose we include a dummy variable called female (=1 if female, 0 otherwise)

!  We could not also include a dummy variable called male (=1 if male, 0 otherwise)

!  This is because for all i, female = 1 − male; perfect collinearity

!  Intuitively it wouldn’t make sense to have both; we only need 2 intercepts for the 2 groups

!  In general, with n groups use n-1 dummies

4
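The dummy trap shows up directly in the design matrix. A small illustrative sketch (the data are made up): with an intercept plus both female and male, the columns are perfectly collinear and the matrix loses rank; dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical data where male = 1 - female for every observation
female = np.array([1, 0, 1, 1, 0, 0])
male = 1 - female

# Intercept + female + male: the columns satisfy female + male = intercept
X_trap = np.column_stack([np.ones(6), female, male])
# Drop one dummy (n - 1 = 1 dummy for 2 groups): no collinearity
X_ok = np.column_stack([np.ones(6), female])

print(np.linalg.matrix_rank(X_trap))  # 2, not 3: perfect collinearity
print(np.linalg.matrix_rank(X_ok))    # 2: full column rank
```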

Dummies With log(y)

!  Not surprisingly, when y is in log form the coefficients have a percentage interpretation

!  Suppose the model is: log(y) = β0 + δ0d + β1x + u

!  100·δ0 is (very roughly) the percentage change in y between the two groups, holding x fixed

Example:
!  Regression of log of housing prices on a dummy for “heritage” and square feet

5

Interpreting coefficient on dummy with logs

Actually, it’s not correct to say that 100·δ0 is the percent change in y associated with a 1-unit change in d. Because the model is non-linear, we have to think of the RHS parameters as partial derivatives of the LHS w.r.t. the RHS variables of interest.

But when we take a derivative, we are asking what the change in the LHS is as we make a small change in one of the RHS variables, holding the others constant.

Changing a dummy variable from 0 to 1 is not a small change. It’s a discrete jump.

6

Interpreting coefficient on dummy with log y

Changing d from 0 to 1 leads to a 100·(exp(δ0) − 1)% change in y. Changing d from 1 to 0 leads to a 100·(exp(−δ0) − 1)% change in y.

For instance, if δ0 = 0.2, this implies that changing d from 0 to 1 leads to a 22% increase in y, and changing d from 1 to 0 leads to an 18% decrease in y.

For small values of δ0, the “100·δ0%” approximation will be close. Not so for larger values.

7
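The exact percentage changes can be computed directly. A short sketch (the helper names are ours, not from the notes) that reproduces the δ0 = 0.2 example and shows the approximation working for small δ0:

```python
import math

# Exact % change in y in a log(y) model when the dummy switches
def pct_up(delta0):
    """d: 0 -> 1"""
    return 100 * (math.exp(delta0) - 1)

def pct_down(delta0):
    """d: 1 -> 0"""
    return 100 * (math.exp(-delta0) - 1)

# delta0 = 0.2: the 100*delta0% = 20% approximation misses in both directions
print(round(pct_up(0.2), 1))    # 22.1: a 22% increase, not 20%
print(round(pct_down(0.2), 1))  # -18.1: an 18% decrease, not 20%

# For small delta0 the approximation is close
print(round(pct_up(0.02), 2))   # 2.02: close to 100*0.02 = 2%
```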

Interpreting coefficient on dummy with log y

Things get slightly more complicated when we obtain an estimate of δ0. You can’t just plug the OLS estimate of δ0 into the formula on the previous slide, as this will produce a biased estimate of the effect of changing d from 0 to 1. Instead, for an estimate of the effect of changing d from 0 to 1, use the formula:

100·(exp(δ̂0 − ½·(se(δ̂0))²) − 1)%

For more on quirks involving dummy variables, see http://davegiles.blogspot.ca/2011/03/dummies-for-dummies.html

8
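The correction is easy to apply once you have the estimate and its standard error. A minimal sketch; the numbers below are hypothetical estimates, not from any real regression:

```python
import math

def pct_effect(delta_hat, se_delta_hat):
    """Estimated % effect of d: 0 -> 1 in a log(y) model,
    using the slide's bias-adjusted formula."""
    return 100 * (math.exp(delta_hat - 0.5 * se_delta_hat ** 2) - 1)

# Hypothetical: delta_hat = 0.2 with standard error 0.1
print(round(pct_effect(0.2, 0.1), 2))  # 21.53: adjusted estimate
print(round(pct_effect(0.2, 0.0), 2))  # 22.14: naive plug-in (se ignored)
```

The adjustment shrinks the naive plug-in estimate; the larger the standard error, the bigger the shrinkage.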

Dummies for Multiple Categories

1. Several dummy variables can be included in the same equation, e.g., y = β0 + δ0d0 + δ1d1 + β1x + u

!  Here the base case is whatever is implied by d0 = d1 = 0 (i.e. when all the dummy variables are zero)

Example: log(hprice)= β0 + δ0heritage + δ1pool + β1sqft + u

!  The base case is a non-heritage house with no pool
!  δ0 gives information about the effect of heritage status (vs. not) on house price; δ1 gives us information about how a pool changes house value

9

Dummies for Multiple Categories

2. We can use dummy variables to control for one variable with more than two categories

!  Suppose everyone in your data is either a HS dropout, HS grad only, or college grad (at least)

!  To compare HS and college grads to HS dropouts, include 2 dummy variables

hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if college grad, 0 otherwise

10

Multiple Categories (cont)

!  Any categorical variable can be turned into a set of dummy variables
!  However, because the base group is represented by the intercept, if there are n categories there should be n − 1 dummy variables
Example: Could turn # of bedrooms into dummies
!  If there are a lot of categories, it may make sense to group some of them together
Example: Age is usually divided into age categories
!  In fact, can convert continuous variables as well

11
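Building n − 1 dummies from a categorical variable can be done by hand. A minimal sketch with made-up data, using the education example and the first level as the base group:

```python
import numpy as np

# Hypothetical categorical variable with 3 levels; base group = "dropout"
educ = np.array(["dropout", "hsgrad", "colgrad", "hsgrad", "dropout"])
levels = ["dropout", "hsgrad", "colgrad"]   # first level is the base group

# n - 1 dummies: one 0/1 column per non-base level
dummies = np.column_stack([(educ == lev).astype(int) for lev in levels[1:]])

# Columns are [hsgrad, colgrad]; dropouts get [0, 0] and live in the intercept
print(dummies.tolist())  # [[0, 0], [1, 0], [0, 1], [1, 0], [0, 0]]
```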

Interactions Among Dummies

!  Interacting dummies is like subdividing the group
Example: have dummies for male, as well as hsgrad and colgrad
!  Could include the new variables male*hsgrad and male*colgrad
!  We now have a total of 5 dummy variables for 6 categories:

Base group: female HS dropouts
hsgrad: female HS grad
colgrad: female college grad
male: male HS dropout
male*hsgrad: male HS grad
male*colgrad: male college grad

12

More on Dummy Interactions

Formally, the model is
y = β0 + δ1male + δ2hsgrad + δ3colgrad + δ4male*hsgrad + δ5male*colgrad + β1x + u

then, for example:
1. If male = 0 and hsgrad = 0 and colgrad = 0
!  y = β0 + β1x + u
2. If male = 0 and hsgrad = 1 and colgrad = 0
!  y = β0 + δ2 + β1x + u = (β0 + δ2) + β1x + u
3. If male = 1 and hsgrad = 0 and colgrad = 1
!  y = β0 + δ1 + δ3 + δ5 + β1x + u = (β0 + δ1 + δ3 + δ5) + β1x + u

13

Other Interactions with Dummies

!  Can also consider interacting a dummy variable, d, with a continuous variable, x

!  y = β0 + δ0d + β1x + δ1d*x + u
!  If d = 0, then y = β0 + β1x + u
!  If d = 1, then y = (β0 + δ0) + (β1 + δ1)x + u
!  i.e. the dummy variable allows both the intercept and the slope to change depending on whether it is “switched” on or off

!  Example: suppose δ0>0 and δ1<0

14
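A simulation sketch of the intercept-and-slope shift (true parameters are made up, with δ0 > 0 and δ1 < 0 as in the example). The d = 1 group's intercept and slope are recovered as sums of coefficients:

```python
import numpy as np

# Hypothetical truth: beta0 = 1, delta0 = 2 (> 0), beta1 = 1.5, delta1 = -0.5 (< 0)
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0, 10, n)
d = rng.integers(0, 2, n)
y = 1.0 + 2.0 * d + 1.5 * x + (-0.5) * d * x + rng.normal(0, 0.1, n)

# Design matrix [1, d, x, d*x]: dummy shifts intercept, interaction shifts slope
X = np.column_stack([np.ones(n), d, x, d * x])
b0, delta0, b1, delta1 = np.linalg.lstsq(X, y, rcond=None)[0]

# For d = 1 the intercept is b0 + delta0 and the slope is b1 + delta1
print(b0 + delta0, b1 + delta1)
```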

Example of δ0 > 0 and δ1 < 0

[Figure: the d = 0 line y = β0 + β1x and the d = 1 line y = (β0 + δ0) + (β1 + δ1)x; the d = 1 line has a higher intercept but a flatter slope, so the two lines eventually cross.]

15

Testing for Differences Across Groups

!  Sometimes we will want to test whether a regression function (intercept and slopes) differs across groups

Example: Wage Equation log(wage)=β0 + β1educ + β2exper + u

!  We might think that the intercept and slopes will differ for men and women (e.g. discrimination), so

log(wage)=β0 + δ0male + β1educ + δ1male*educ + β2exper + δ2male*exper + u

16

Testing Across Groups (cont)

!  This can be thought of as simply testing for the joint significance of the dummy and its interactions with all other x variables

!  So, we could estimate the model with all the interactions (unrestricted) and without (restricted) and form the F statistic

!  However, forming all of the interaction variables etc. can be unwieldy

!  There is an alternative way to form the F statistic 17

The Chow Test

!  It turns out that we can compute the proper F statistic without estimating the unrestricted model
!  In the general model with an intercept and k continuous variables we can test for differences across two groups as follows:
1. Run the restricted model for group one and get SSR1, then for group two and get SSR2
2. Run the restricted model for all to get SSR, then

F = [SSR − (SSR1 + SSR2)] / (SSR1 + SSR2) × [n − 2(k + 1)] / (k + 1)

18
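The Chow statistic needs only the three SSRs, exactly as the steps above describe. A sketch with simulated data in which the two groups genuinely differ (all numbers made up), so the F statistic comes out large:

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

# Hypothetical data: two groups with different intercepts and slopes (k = 1)
rng = np.random.default_rng(2)
n1, n2, k = 100, 120, 1
x1 = rng.uniform(0, 10, n1)
y1 = 1.0 + 0.5 * x1 + rng.normal(0, 1, n1)
x2 = rng.uniform(0, 10, n2)
y2 = 3.0 + 1.0 * x2 + rng.normal(0, 1, n2)

# Step 1: fit the model separately per group; Step 2: fit it on the pooled sample
X1 = np.column_stack([np.ones(n1), x1])
X2 = np.column_stack([np.ones(n2), x2])
Xp = np.vstack([X1, X2])
yp = np.concatenate([y1, y2])
ssr1, ssr2, ssr_pooled = ssr(X1, y1), ssr(X2, y2), ssr(Xp, yp)

n = n1 + n2
F = ((ssr_pooled - (ssr1 + ssr2)) / (ssr1 + ssr2)) * ((n - 2 * (k + 1)) / (k + 1))
print(F)  # large: the pooled (restricted) fit is much worse than the separate fits
```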

The Chow Test (continued)

!  Notice that the Chow test is really just a simple F test for exclusion restrictions, but we’ve realized that SSRur = SSR1 + SSR2

!  Unrestricted model is exactly the same as running the restricted model for the groups separately

!  Note, we have k + 1 restrictions (each of the slope coefficients and the intercept)

!  Note also, the unrestricted model would estimate 2 different intercepts and 2 different slope coefficients, so the denominator df is n – 2k – 2

19

Linear Probability Model

!  Binary variables can also be introduced on the LHS of an equation

!  e.g. y = 1 if employed, 0 otherwise
!  In this setting the βj’s have a different interpretation
!  As long as E(u|x) = 0, then
!  E(y|x) = β0 + β1x1 + … + βkxk , or
!  P(y = 1|x) = β0 + β1x1 + … + βkxk
!  The probability of “success” is a linear function of the x’s

20

Linear Probability Model (cont)

!  So, the interpretation of βj is the change in the probability of success (y=1) when xj changes, holding the other x’s fixed

!  If xj is education and y is as above, βj is the change in the probability of being employed when education increases by 1 year

!  Predicted y is the predicted probability of success
!  One potential problem is that our predictions can be outside [0,1]
!  How can a probability be greater than 1?

21

Linear Probability Model (cont)

!  Even without predictions outside of [0,1], we may estimate marginal effects that are greater than 1 or less than −1

!  Usually best to use values of independent variables near the sample means

!  This model will violate assumption of homoskedasticity

!  Problem?
!  Standard errors are not valid (correctable)
!  Despite drawbacks, it’s usually a good place to start when y is binary

22
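A tiny worked LPM (data entirely made up) shows both the interpretation of the slope and how fitted “probabilities” can escape [0, 1]:

```python
import numpy as np

# Hypothetical data: y = 1 if employed, regressed on years of education by OLS
educ = np.array([8.0, 10.0, 12.0, 14.0, 16.0, 18.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

X = np.column_stack([np.ones(6), educ])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# b1 = change in P(employed) per extra year of education
yhat = b0 + b1 * educ
print(round(b1, 3))        # 0.129: ~12.9 percentage points per year
print(round(yhat[-1], 3))  # 1.143: a fitted "probability" above 1
print(round(yhat[0], 3))   # -0.143: and one below 0
```

Predictions are sensible near the sample mean of educ but leave [0, 1] at the extremes, which is exactly the drawback the slide describes.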

Nonlinear Alternatives to LPM

!  If you go on to practice econometrics with limited dependent variables (such as dummies), you will learn about some non-linear models that may be more appropriate
  e.g. logit, probit models
  These are a little more complicated to work with than the LPM (marginal effects differ from coefficient estimates, as in other nonlinear models)

  Advantages: They restrict predicted values to lie between 0 and 1; don’t have built-in heteroskedasticity like LPM

  EViews and other packages can estimate such models 23

Policy Analysis Using Dummy Variables

!  Policy analysts often use dummies to measure the effect of a government program
  e.g., What’s the effect of raising the minimum wage?
  A simple before/after comparison could be done like this:
  Take a sample of low-wage firms (say, fast food restaurants) in a state or province that is changing its minimum wage law.
  Sample restaurants before the minimum wage increase, then again after the minimum wage increase (could use same restaurants or different ones, as long as sampling is random)
  Estimate a model of staffing levels, where Post is a dummy indicating whether the observation is for a firm in the post-increase regime or the pre-increase regime (=1 if post)

24

#employeesit = β0 + β1Postit + uit

Policy Analysis Using Dummy Variables

!  If the zero conditional mean assumption holds, then β̂1 gives us the effect of increasing the minimum wage on employment levels at these firms. What sign do you expect?
  Do you see any reason the zero conditional mean assumption might fail?
  What if something else changes as the policy regime changes?
  Economy gets better or worse.
  Taxes on business change
  Immigration increases size of local labour force
  Any of these changes, if not controlled for, would be captured in u and would be correlated with Post: a violation of the zero conditional mean assumption
  Is there a way (other than controlling for all possible things that might change simultaneously with the minimum wage law) we can deal with this problem?

25

Policy Analysis Using Dummy Variables

!  It would be nice if we had a “control group”: a bunch of similar firms existing under similar circumstances that didn’t experience the minimum wage increase.
  How about fast food restaurants on the other side of the state/provincial border?
  Similar economic circumstances
  Similar labour supply issues

!  NJ=1 if firm is in NJ; =0 if firm is in Pennsylvania !  Post*NJ is an interaction of the two dummies

26

#employeesit = β0 + β1Postit + β2NJ + β3Post *NJit + uit

Policy Analysis Using Dummy Variables

!  What this analysis does is look at the effect of an increase in the minimum wage in NJ, using fast-food restaurants in Pennsylvania as a control group.
  We have to assume that other determinants of employment that change at the same time as the minimum wage change are occurring in both states and with equal magnitudes (recessions, booms, etc.)
  One way to reassure ourselves (other than looking at data on the local economies) is to look at employment trends in FF restaurants in both states. If they tend to move together closely, that will provide some reassurance about our “identifying assumption”

27

Policy Analysis Using Dummy Variables

!  Note that the ideal experiment would let us observe NJ at one point in time, simultaneously in 2 states of the world
  The state with no minimum wage increase
  The state with minimum wage increase
!  Unfortunately we can’t observe these parallel universes; with this approach we’re trying to do the next best thing
  We’re inferring a counterfactual (a parallel universe where NJ experiences no minimum wage increase) by observing what happens in PA
  The counterfactual for NJ assumes NJ follows the same trend as PA; then we just compare the actual (with min wage increase) outcome in NJ to the (inferred) counterfactual for NJ
  This gives us our estimate of the effect of increasing the minimum wage.

28

Policy Analysis Using Dummy Variables

!  Focus on this picture and see if you can deduce for yourself what the effect of a minimum wage increase in NJ is (assuming common trends in employment in PA and NJ).

29

Policy Analysis Using Dummy Variables

30

#employeesit = β0 + β1Postit + β2NJ + β3Post*NJit + uit

[Figure: employment over time for PA and NJ, before and after the minimum wage increase. β̂0 is the PA pre-increase level, β̂2 the NJ−PA gap, β̂1 the common change over time, and β̂3 the extra change in NJ. In the picture drawn, β̂1, β̂2 < 0 and β̂0, β̂3 > 0.]

Policy Analysis Using Dummy Variables

#employeesit = β0 + β1Postit + β2NJ + β3Post*NJit + uit

!  Given this model, average employment at a firm is
  β0 in PA before the min wage increase
  β0 + β1 in PA after the min wage increase
  β0 + β2 in NJ before the min wage increase
  β0 + β1 + β2 + β3 in NJ after the min wage increase
!  Change in employment in NJ: β1 + β3
!  Change in employment in PA: β1
  Note: This is the counterfactual change in employment for NJ
!  Diff-in-Diff (ΔNJ − ΔPA): β3

31
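The diff-in-diff algebra can be verified numerically. The four cell means below are made up; the point is that β̂3 from the regression with Post, NJ, and Post*NJ equals the difference of the two changes:

```python
import numpy as np

# Hypothetical cell means (made-up employment levels)
means = {("PA", "pre"): 23.3, ("PA", "post"): 21.2,
         ("NJ", "pre"): 20.4, ("NJ", "post"): 21.0}

change_nj = means[("NJ", "post")] - means[("NJ", "pre")]  # beta1 + beta3
change_pa = means[("PA", "post")] - means[("PA", "pre")]  # beta1: counterfactual for NJ
did = change_nj - change_pa                               # beta3

# The regression on [1, Post, NJ, Post*NJ] fits the 4 cell means exactly
rows = [(s, t) for s in ("PA", "NJ") for t in ("pre", "post")]
X = np.array([[1, t == "post", s == "NJ", (t == "post") and (s == "NJ")]
              for s, t in rows], dtype=float)
y = np.array([means[(s, t)] for s, t in rows])
b0, b1, b2, b3 = np.linalg.solve(X, y)  # 4 equations, 4 coefficients

print(round(did, 2), round(b3, 2))  # identical: the DiD is the interaction coefficient
```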

Policy Analysis Using Dummy Variables

!  This is called a “difference in difference” analysis. β̂3 gives the effect of the minimum wage.
!  Card and Krueger (1994) studied a minimum wage law change in NJ in 1992, using fast food restaurants in NJ and eastern PA as the sample
  Famously found no effect of minimum wage on employment
  Some follow-up studies have disputed their findings, though not much evidence of a large effect of the minimum wage on employment
  One hypothesis: minimum wage was so low to begin with that it was close to the equilibrium (market-clearing) wage.
  Remains a point of empirical study and argument.

32

Policy Analysis Using Dummy Variables

!  We can denote a generic diff-in-diff specification as

Outcomeit = β0 + β1Postit + β2Treat + β3Post*Treatit + uit

!  It’s not hard to think of dozens of cases in policy analysis where this type of approach could be used
  Effect of changes in welfare generosity on
    employment
    fertility
    marriage
    school performance of kids of recipients
  Effect of change in divorce laws on
    divorce
    child outcomes
    domestic violence incidents

33

Policy Analysis Using Dummy Variables

!  Note the crucial assumption: Common trends
  If employment in NJ and PA always moves together; or (more generally) if the outcome variable in Treat=1 vs. Treat=0 always moves together, then this is likely a valid assumption
  If employment in NJ and PA often moves together but sometimes randomly deviates (for reasons other than changes in minimum wage laws), then the assumption is more suspect
  Does the minimum wage really increase employment?
  Seems unlikely theoretically
  A more sensible conclusion would be either that the minimum wage has no effect or a small effect on employment; OR something else that changed in NJ (or PA) at the same time as the minimum wage law change is driving their finding (i.e., their results are subject to bias)

34

Caveats on Policy Analysis When Those Studied Choose Whether to be Treated

!  Consider a job training program that unemployed workers have the option of signing up for

!  If individuals choose whether to participate in a program, this may lead to a self-selection problem

!  e.g. Those that apply for job training may be more diligent or may have had to qualify

35

Self-selection Problems

!  If we can control for everything that is correlated with both participation and the outcome of interest then it’s not a problem

!  Often, though, there are unobservables that are correlated with participation

!  In this case, the estimate of the program effect is biased, and we don’t want to set policy based on it!

36

