Econometrics Ch6 Applications

Date post: 03-Nov-2015
Category:
Upload: mihaela-sirianu
View: 4 times
Download: 0 times
Share this document with a friend
Description:
econometrics
49
 Martin Luther University of Halle-Wittenberg Department of Economics Chair of Econometrics Econometrics Lecture 6. Appli cations Summer 2015 1/49
Transcript
  • Martin Luther University of Halle-Wittenberg, Department of Economics

    Chair of Econometrics

    Econometrics Lecture

    6. Applications

    Summer 2015

    1 / 49

  • Key questions and objectives

    This chapter focuses on the following key questions:

    How does changing the units of measurement of variables affect the OLS regression results (OLS intercept, slope estimates, standard errors, t statistics, F statistics, and confidence intervals)?

    How can we specify an appropriate functional form relationship between the explained and explanatory variables?

    How can we obtain confidence intervals for a prediction from the OLS regression line?

    2 / 49

  • Applications

    6 Applications
      6.1 Effects of data scaling on OLS statistics
      6.2 Functional form specification
        6.2.1 Using logarithmic functional forms
        6.2.2 Models with quadratics
        6.2.3 Models with interaction terms
      6.3 Goodness-of-fit and selection of regressors
        6.3.1 Adjusted R-squared
        6.3.2 Selection of regressors
      6.4 Prediction
        6.4.1 Confidence intervals for predictions
        6.4.2 Predicting y when ln y is the dependent variable

    3 / 49

  • Applications

    Effects of data scaling on OLS statistics


    4 / 49

  • Applications

    Effects of data scaling on OLS statistics

    6.1 Effects of data scaling on OLS statistics

    In general, the coefficients, standard errors, confidence intervals, t statistics, and F statistics change in ways that preserve all measured effects and testing outcomes when variables are rescaled.

    Data scaling is often used to reduce the number of zeros after a decimal point in an estimated coefficient.

    Example: birth weight and cigarette smoking

    Regression model:

    bwght = β0 + β1 cigs + β2 faminc,  (6.1)

    where

    bwght = child birth weight, in ounces.
    cigs = no. of cigs smoked by the pregnant mother, per day.

    faminc = annual family income, in thousands of dollars

    5 / 49

  • Applications

    Effects of data scaling on OLS statistics


    The estimates of this equation, obtained using the data in BWGHT.RAW, are given in the first column of Table 6.1. Standard errors are listed in parentheses. The estimate on cigs says that if a woman smoked 5 more cigarettes per day, birth weight is predicted to be about .4634(5) = 2.317 ounces less. The t statistic on cigs is −5.06, so the variable is very statistically significant.

    Now, suppose that we decide to measure birth weight in pounds, rather than in ounces. Let bwghtlbs = bwght/16 be birth weight in pounds. What happens to our OLS statistics if we use this as the dependent variable in our equation? It is easy to find the effect on the coefficient estimates by simple manipulation of equation (6.1). Divide this entire equation by 16:

    bwght/16 = β̂0/16 + (β̂1/16) cigs + (β̂2/16) faminc.

    Since the left-hand side is birth weight in pounds, it follows that each new coefficient will be the corresponding old coefficient divided by 16. To verify this, the regression of bwghtlbs on cigs and faminc is reported in column (2) of Table 6.1. Up to four digits, the intercept and slopes in column (2) are just those in column (1) divided by 16. For example, the coefficient on cigs is now −.0289; this means that if cigs were higher by five, birth weight would be .0289(5) = .1445 pounds lower. In terms of ounces, we have .1445(16) = 2.312, which is slightly different from the 2.317 we obtained earlier due to rounding error. The point is, once the effects are transformed into the same units, we get exactly the same answer, regardless of how the dependent variable is measured.

    What about statistical significance? As we expect, changing the dependent variable from ounces to pounds has no effect on how statistically important the independent variables are. The standard errors in column (2) are 16 times smaller than those in column (1). A few quick calculations show that the t statistics in column (2) are indeed identical to the t statistics in column (1). The endpoints for the confidence intervals in column (2) are just the endpoints in column (1) divided by 16. This is because the CIs change by the same factor as the standard errors. [Remember that the 95% CI here is β̂j ± 1.96 se(β̂j).]

    Table 6.1: Effects of Data Scaling

    Dependent Variable:      (1) bwght         (2) bwghtlbs      (3) bwght

    Independent Variables:
    cigs                     -.4634 (.0916)    -.0289 (.0057)         --
    packs                         --                --            -9.268 (1.832)
    faminc                    .0927 (.0292)     .0058 (.0018)     .0927 (.0292)
    intercept                116.974 (1.049)   7.3109 (.0656)   116.974 (1.049)

    Observations             1,388             1,388             1,388
    R-Squared                .0298             .0298             .0298
    SSR                      557,485.51        2,177.6778        557,485.51
    SER                      20.063            1.2539            20.063


    Source: Wooldridge (2013), Table 6.1

    6 / 49

  • Applications

    Effects of data scaling on OLS statistics

    Conversion of the dependent variable:
      All OLS estimates change. But once the effects are transformed into the same units, we get exactly the same answer, regardless of how the dependent variable is measured.
      Standard errors and confidence intervals change.
      Residuals and SSR change.
      Statistical significance is not affected: t and p values remain unchanged.
      R-squared is not affected.

    Conversion of an explanatory variable affects only its coefficient and standard error.

    Question: in the birth weight equation, suppose that faminc is measured in dollars rather than in thousands of dollars. Thus, define the variable fincdol = 1,000·faminc. How will the OLS statistics change when fincdol is substituted for faminc? Do you think it is better to measure income in dollars or in thousands of dollars?
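
    These scaling results can be checked directly in Stata (a minimal sketch, assuming the Wooldridge birth-weight data set bwght.dta, with the variables from equation 6.1, is loaded):

    . * rescale the dependent variable and one regressor
    . gen bwghtlbs = bwght/16        // birth weight in pounds
    . gen fincdol  = 1000*faminc     // family income in dollars
    . reg bwght    cigs faminc       // original units, column (1) of Table 6.1
    . reg bwghtlbs cigs faminc       // all coefficients and SEs are divided by 16
    . reg bwght    cigs fincdol      // only the income coefficient and SE change (divided by 1,000)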

    7 / 49

  • Applications

    Effects of data scaling on OLS statistics

    If the dependent variable appears in logarithmic form, changing the unit of measurement does not affect the slope coefficients:

    Conversion: ln(c·yi) = ln c + ln yi,  c > 0.
    New intercept: β̂0(new) = β̂0(old) + ln c.

    Similarly, changing the unit of measurement of any explanatory variable xj, where ln(xj) appears in the regression, only affects the intercept.

    Conversion: ln(c·xij) = ln c + ln xij,  c > 0.
    New intercept: β̂0(new) = β̂0(old) − β̂j ln c.
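
    The intercept adjustment for a rescaled log regressor follows directly from substituting ln(c·xj) − ln c for ln xj (a short worked step):

    β0 + βj ln xj = β0 + βj[ln(c·xj) − ln c] = (β0 − βj ln c) + βj ln(c·xj),

    so the slope on ln(c·xj) is unchanged and only the intercept shifts by −βj ln c.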

    8 / 49

  • Applications

    Functional form specification


    9 / 49

  • Applications

    Functional form specification

    6.2.1 Using logarithmic functional forms

    Example: housing prices and air pollution

    Estimated equation:

    ln(price) = 9.23 − .718 ln(nox) + .306 rooms  (6.7)

    (0.19)        (.066)            (.019)

    The coefficient β̂1 is the elasticity of price with respect to nox: if nox increases by 1%, price is predicted to fall by .718%, ceteris paribus.

    The coefficient β̂2 is the semi-elasticity of price with respect to rooms. It is the change in ln(price) when Δrooms = 1. When multiplied by 100, this is the approximate percentage change in price: one more room increases price by about 30.6%.

    The approximation error occurs because, as the change in ln y becomes larger and larger, the approximation %Δy ≈ 100·Δln(y) becomes more and more inaccurate.

    10 / 49

  • Applications

    Functional form specification

    For the exact interpretation, consider the general estimated model:

    ln(y) = β̂0 + β̂1 ln(x1) + β̂2 x2.

    Holding x1 fixed, we have Δln(y) = β̂2 Δx2. Exact percentage change:

    %Δŷ = 100·[exp(β̂2 Δx2) − 1],  (6.8)

    where the multiplication by 100 turns the proportionate change into a percentage change.

    When Δx2 = 1,

    %Δŷ = 100·[exp(β̂2) − 1].  (6.9)

    In the housing price example,

    %Δprice = 100·[exp(.306) − 1] = 35.8%, which is notably larger than the approximate percentage change, 30.6%.
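
    The exact adjustment in (6.9) is easy to verify numerically, e.g. in Stata:

    . display 100*(exp(.306) - 1)    // exact: about 35.8
    . display 100*.306               // approximation: 30.6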

    11 / 49

  • Applications

    Functional form specification

    Adjustment in 6.8 is not as crucial for small percentage changes.

    β̂2       Approximate: 100·β̂2     Exact: 100·[exp(β̂2) − 1]
    0.05              5                        5.13
    0.10             10                       10.52
    0.15             15                       16.18
    0.20             20                       22.14
    0.30             30                       34.99
    0.50             50                       64.87

    Advantages of using logarithmic variables:
      Appealing interpretations.
      When y > 0, models using ln y as the dependent variable often satisfy the CLM assumptions more closely than models using the level of y.
      Taking the log of a variable often narrows its range (e.g. monetary values, such as firms' annual sales). Narrowing the range of the dependent and independent variables can make OLS estimates less sensitive to outliers.

    12 / 49

  • Applications

    Functional form specification

    Using explanatory variables that are measured as percentages:

    ln(wage) = 0.3 − 0.05 unemployment rate
    ln(wage) = 0.3 − 0.05 ln(unemployment rate)

    The first equation says that an increase in the unemployment rate by one percentage point (e.g. a change from 8 to 9) decreases wages by about 5%.

    The second equation says that an increase in the unemployment rate by one percent (e.g. a change from 8 to 8.08) decreases wages by about 0.05%.

    Limitations of logarithms: logs cannot be used if a variable takes on zero or negative values. Sometimes, ln(1 + y) is used. However, this approach is acceptable only when the data on y contain relatively few zeros. Alternatives are Tobit and Poisson models.

    13 / 49

  • Applications

    Functional form specification

    6.2.2 Models with quadratics

    Quadratic functions are also used often to capture decreasing or increasing marginal effects.

    Example:

    ŷ = β̂0 + β̂1 x + β̂2 x²,  (6.10)

    where y = wage and x = exper.

    Interpretation: the effect of x on y depends on the value of x.

    Δŷ ≈ (β̂1 + 2β̂2 x)Δx,  so  Δŷ/Δx ≈ β̂1 + 2β̂2 x.  (6.11)

    Typically, we might plug in the average value of x in the sample, or some other interesting values, such as the median or the lower and upper quartile values.

    14 / 49

  • Applications

    Functional form specification

    Example: wage regression

    Estimated equation:

    wage = 3.73 + .298 exper − .0061 exper²  (6.12)

    Equation 6.12 implies that exper has a diminishing effect on wage.
      The first year of experience is worth about $.298 (roughly 30 cents) per hour.
      The second year of experience is worth less: .298 − 2(.0061)(1) ≈ .286.
      In going from 10 to 11 years of experience, wage is predicted to increase by about .298 − 2(.0061)(10) ≈ .176.
      The turning point (or maximum of the function) is achieved at the coefficient on x over twice the absolute value of the coefficient on x²:

    x* = |β̂1/(2β̂2)| = .298/[2(.0061)] ≈ 24.4.  (6.13)
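
    A minimal Stata sketch of equations (6.11)-(6.13), assuming a data set (e.g. Wooldridge's wage1.dta) with the variables wage and exper in memory:

    . gen expersq = exper^2
    . reg wage exper expersq
    . * marginal effect of experience after 1 and after 10 years, eq. (6.11)
    . display _b[exper] + 2*_b[expersq]*1
    . display _b[exper] + 2*_b[expersq]*10
    . * turning point of the quadratic, eq. (6.13)
    . display -_b[exper]/(2*_b[expersq])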

    15 / 49

  • Applications

    Functional form specification


    This estimated equation implies that exper has a diminishing effect on wage. The first year of experience is worth roughly 30 cents per hour ($.298). The second year of experience is worth less [about .298 − 2(.0061)(1) ≈ .286, or 28.6 cents, according to the approximation in (6.11) with Δx = 1]. In going from 10 to 11 years of experience, wage is predicted to increase by about .298 − 2(.0061)(10) ≈ .176, or 17.6 cents. And so on.

    When the coefficient on x is positive and the coefficient on x² is negative, the quadratic has a parabolic shape. There is always a positive value of x where the effect of x on y is zero; before this point, x has a positive effect on y; after this point, x has a negative effect on y. In practice, it can be important to know where this turning point is.

    In the estimated equation (6.10) with β̂1 > 0 and β̂2 < 0, the turning point (or maximum of the function) is always achieved at the coefficient on x over twice the absolute value of the coefficient on x²:

    x* = |β̂1/(2β̂2)|.  [6.13]

    In the wage example, x* = exper* is .298/[2(.0061)] ≈ 24.4. (Note how we just drop the minus sign on .0061 in doing this calculation.) This quadratic relationship is illustrated in Figure 6.1.

    In the wage equation (6.12), the return to experience becomes zero at about 24.4 years. What should we make of this? There are at least three possible explanations. First, it may be that few people in the sample have more than 24 years of experience, and so the part of the curve to the right of 24 can be ignored. The cost of using a quadratic to capture diminishing effects is that the quadratic must eventually turn around. If this point is beyond all but a small percentage of the people in the sample, then this is not of much concern.

    [Figure 6.1: Quadratic relationship between the fitted wage and exper; the fitted wage rises from 3.73 at exper = 0 to a maximum of about 7.37 at exper = 24.4.]


    Source: Wooldridge (2013), Figure 6.1

    16 / 49

  • Applications

    Functional form specification

    Example: effects of pollution on housing prices

    ln(price) = β0 + β1 ln(nox) + β2 ln(dist) + β3 rooms + β4 rooms² + β5 stratio + u.

    . reg lprice lnox ldist c.rooms##c.rooms stratio

    Source | SS df MS Number of obs = 506

    -------------+------------------------------ F( 5, 500) = 151.77

    Model | 50.9872385 5 10.1974477 Prob > F = 0.0000

    Residual | 33.5949865 500 .067189973 R-squared = 0.6028

    -------------+------------------------------ Adj R-squared = 0.5988

    Total | 84.582225 505 .167489554 Root MSE = .25921

    ---------------------------------------------------------------------------------

    lprice | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    ----------------+----------------------------------------------------------------

    lnox | -.901682 .1146869 -7.86 0.000 -1.12701 -.6763544

    ldist | -.0867814 .0432807 -2.01 0.045 -.1718159 -.001747

    rooms | -.545113 .1654541 -3.29 0.001 -.870184 -.2200419

    |

    c.rooms#c.rooms | .0622612 .012805 4.86 0.000 .037103 .0874194

    |

    stratio | -.0475902 .0058542 -8.13 0.000 -.059092 -.0360884

    _cons | 13.38548 .5664731 23.63 0.000 12.27252 14.49844

    ---------------------------------------------------------------------------------

    17 / 49

  • Applications

    Functional form specification

    Interpretation: what is the effect of rooms on ln(price)?

    Because the coefficient on rooms is negative and the coefficient on rooms² is positive, this equation implies that, at low values of rooms, an additional room has a negative effect on ln(price).

    At some point, the effect becomes positive, and the quadratic shape means that the semi-elasticity of price with respect to rooms is increasing as rooms increases.

    Turnaround value of rooms:

    x* = .5451/[2(.0623)] ≈ 4.4
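
    After the factor-variable regression above, the turnaround value and the changing partial effect can also be obtained directly in Stata (a sketch; margins reports the effect on ln price, so multiply by 100 for a percentage):

    . reg lprice lnox ldist c.rooms##c.rooms stratio
    . display -_b[rooms]/(2*_b[c.rooms#c.rooms])      // turnaround value of rooms
    . margins, dydx(rooms) at(rooms=(5 6 7))          // semi-elasticity at 5, 6 and 7 rooms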

    18 / 49

  • Applications

    Functional form specification


    and so

    %Δprice ≈ 100·[−.545 + 2(.062)·rooms]·Δrooms = (−54.5 + 12.4·rooms)·Δrooms.

    Thus, an increase in rooms from, say, five to six increases price by about −54.5 + 12.4(5) = 7.5%; the increase from six to seven increases price by roughly −54.5 + 12.4(6) = 19.9%. This is a very strong increasing effect.

    The strong increasing effect of rooms on log(price) in this example illustrates an important lesson: one cannot simply look at the coefficient on the quadratic term (in this case, .062) and declare that it is too small to bother with based only on its magnitude. In many applications with quadratics the coefficient on the squared variable has one or more zeros after the decimal point: after all, this coefficient measures how the slope is changing as x (rooms) changes. A seemingly small coefficient can have practically important consequences, as we just saw. As a general rule, one must compute the partial effect and see how it varies with x to determine if the quadratic term is practically important. In doing so, it is useful to compare the changing slope implied by the quadratic model with the constant slope obtained from the model with only a linear term. If we drop rooms² from the equation, the coefficient on rooms becomes about .255, which implies that each additional room, starting from any number of rooms, increases median price by about 25.5%. This is very different from the quadratic model, where the effect becomes 25.5% at rooms ≈ 6.45 but changes rapidly as rooms gets smaller or larger. For example, at rooms = 7, the return to the next room is about 32.3%.

    [Figure 6.2: log(price) as a quadratic function of rooms, with the minimum (turning point) at rooms = 4.4.]


    Source: Wooldridge (2013), Figure 6.2

    19 / 49

  • Applications

    Functional form specification

    Only five of the 506 communities in the sample have houses averaging 4.4 rooms or less, about 1% of the sample. Hence, the quadratic to the left of 4.4 can, for practical purposes, be ignored.

    To the right of 4.4, we see that adding another room has an increasing effect on the percentage change in price:

    %Δprice ≈ 100·[−.545 + 2(.062)·rooms]·Δrooms = (−54.5 + 12.4·rooms)·Δrooms

    An increase in rooms from, say, five to six increases price by about −54.5 + 12.4(5) = 7.5%.

    An increase from six to seven increases price by −54.5 + 12.4(6) = 19.9%.

    20 / 49

  • Applications

    Functional form specification

    If the coefficients on the level and squared terms have the same sign (either both positive or both negative) and the explanatory variable is nonnegative, then there is no turning point for values x > 0.

    Quadratic functions may also be used to allow for a nonconstant elasticity.

    Example:

    ln(price) = β0 + β1 ln(nox) + β2 [ln(nox)]² + ... + u.  (6.15)

    The elasticity depends on the level of nox:

    %Δprice ≈ [β1 + 2β2 ln(nox)] %Δnox.  (6.16)

    Further (higher-order) polynomial terms can be included in regression models:

    y = β0 + β1 x + β2 x² + β3 x³ + β4 x⁴ + u.

    21 / 49

  • Applications

    Functional form specification

    6.2.3 Models with interaction terms

    Sometimes, the partial effect, elasticity, or semi-elasticity of the dependent variable with respect to an explanatory variable depends on the magnitude of another explanatory variable.

    Example: in the model

    price = β0 + β1 sqrft + β2 bdrms + β3 sqrft·bdrms + β4 bthrms + u

    the partial effect of bdrms on price is

    Δprice/Δbdrms = β2 + β3 sqrft.  (6.17)

    Interaction effect between square footage and number of bedrooms: if β3 > 0, then an additional bedroom yields a higher increase in housing price for larger houses.
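
    A sketch of how such an interaction can be estimated and evaluated in Stata, assuming a housing data set with the variables price, sqrft, bdrms, and bthrms (e.g. an HPRICE1-style file):

    . reg price c.sqrft##c.bdrms bthrms
    . * partial effect of bdrms at the sample mean of sqrft, eq. (6.17)
    . su sqrft, meanonly
    . display _b[bdrms] + _b[c.sqrft#c.bdrms]*r(mean)
    . * or directly:
    . margins, dydx(bdrms) atmeans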

    22 / 49

  • Applications

    Functional form specification

    Example: did returns to education change between 1978 and 1985?

    Consider the following wage regression:

    ln(wage) = β1 + β2 y85 + β3 educ + β4 (y85 × educ) + ... + u.

    Returns to education are:

    ∂ln(wage)/∂educ = β3 + β4 y85 = β3 if y85 = 0;  β3 + β4 if y85 = 1.

    23 / 49

  • Applications

    Functional form specification

    Source | SS df MS Number of obs = 1084

    -------------+------------------------------ F( 8, 1075) = 99.80

    Model | 135.992074 8 16.9990092 Prob > F = 0.0000

    Residual | 183.099094 1075 .170324738 R-squared = 0.4262

    -------------+------------------------------ Adj R-squared = 0.4219

    Total | 319.091167 1083 .29463635 Root MSE = .4127

    ------------------------------------------------------------------------------

    lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    y85 | .1178062 .1237817 0.95 0.341 -.125075 .3606874

    educ | .0747209 .0066764 11.19 0.000 .0616206 .0878212

    y85educ | .0184605 .0093542 1.97 0.049 .000106 .036815

    [output omitted]

    _cons | .4589329 .0934485 4.91 0.000 .2755707 .642295

    ------------------------------------------------------------------------------

    Returns to education in 1978: 7.47%.
    Returns to education in 1985: (.0747 + .0185) × 100 = 9.32%.
    Returns to education increased between 1978 and 1985 by β̂4 = .0185, i.e. by 1.85 percentage points.
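
    The 1985 return and its standard error can be obtained directly from the estimates above with lincom (a sketch, assuming the regression above is the active estimation):

    . lincom educ + y85educ     // return to education in 1985: about .093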

    24 / 49

  • Applications

    Goodness-of-fit and selection of regressors


    25 / 49

  • Applications

    Goodness-of-fit and selection of regressors

    6.3.1 Adjusted R-squared

    R-squared is the proportion of the total sample variation in y that is explained by x1, x2, ..., xk.

    The size of R-squared does not affect unbiasedness.

    R-squared never decreases when additional explanatory variables are added to the model because SSR never goes up (and usually falls) as more variables are added:

    R² = 1 − SSR/SST.

    The adjusted R-squared imposes a penalty for adding additional independent variables to a model:

    R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)] = 1 − σ̂² / [SST/(n − 1)].  (6.21)

    26 / 49

  • Applications

    Goodness-of-fit and selection of regressors

    SSR/(n − k − 1) can go up or down when a new independent variable is added to a regression.

    If we add a new independent variable to a regression equation, R̄² increases if, and only if, the t statistic on the new variable is greater than one in absolute value.

    It holds that

    R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1).  (6.22)

    R̄² can be negative, indicating a very poor model fit relative to the number of degrees of freedom.
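
    As a quick illustration of (6.22), using the numbers from column (1) of Table 6.1 (R² = .0298, n = 1,388, k = 2):

    . display 1 - (1 - .0298)*(1388 - 1)/(1388 - 2 - 1)    // adjusted R-squared, about .028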

    27 / 49

  • Applications

    Goodness-of-fit and selection of regressors

    Adjusted R-squared can be used to choose between nonnested models. (Two equations are nonnested models when neither equation is a special case of the other.)

    Example: explaining major league baseball players' salaries

    Model 1: ln(salary) = β0 + β1 yrs + β2 games + β3 bavg + β4 hrunsyr + u

    R̄² = .6211

    Model 2: ln(salary) = β0 + β1 yrs + β2 games + β3 bavg + β4 rbisyr + u

    R̄² = .6226

    Based on the adjusted R-squared, there is a very slight preference for the model with rbisyr.

    28 / 49

  • Applications

    Goodness-of-fit and selection of regressors

    Example: explaining R&D intensity

    Model 1: rdintens = β0 + β1 ln(sales) + u

    R² = .061, R̄² = .030

    Model 2: rdintens = β0 + β1 sales + β2 sales² + u

    R² = .148, R̄² = .090

    The first model captures a diminishing return by including sales in logarithmic form; the second model does this by using a quadratic. Thus, the second model contains one more parameter than the first.

    Neither R² nor R̄² can be used to choose between different functional forms for the dependent variable.

    29 / 49

  • Applications

    Goodness-of-fit and selection of regressors

    6.3.2 Selection of regressors

    A long regression (i.e. with many explanatory variables) is more likely to have a ceteris paribus interpretation than a short regression.

    Furthermore, a long regression generates more precise estimates of the coefficients on the variables included in a short regression because these covariates lead to a smaller residual variance.

    However, it is also possible to control for too many variables in a regression analysis (over controlling).

    30 / 49

  • Applications

    Goodness-of-fit and selection of regressors

    Example: impact of state beer taxes on traffic fatalities

    Idea: a higher tax on beer will reduce alcohol consumption, and likewise drunk driving, resulting in fewer traffic fatalities.

    Model to measure the ceteris paribus effect of taxes on fatalities:

    fatalities = β0 + β1 tax + β2 miles + β3 percmale + β4 perc16_21 + ...,

    where

    miles = total miles driven.
    percmale = percentage of the state population that is male.
    perc16_21 = percentage of the population between ages 16 and 21, and so on.

    The model does not include a variable measuring per capita beer consumption. Are we committing an omitted variables error?

    No, because controlling for beer consumption would imply that we measure the difference in fatalities due to a one percentage point increase in tax, holding beer consumption fixed. This is not interesting.

    31 / 49

  • Applications

    Prediction


    32 / 49

  • Applications

    Prediction

    6.4.1 Confidence intervals for predictions

    (a) CI for E(y | x1, ..., xk) (the average value of y for the subpopulation with a given set of covariates)

    Predictions are subject to sampling variation because they are obtained using the OLS estimators.

    Estimated equation:

    ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk.  (6.27)

    Plugging in particular values of the independent variables, we obtain a prediction for y. The parameter we would like to estimate is:

    θ0 = β0 + β1 c1 + β2 c2 + ... + βk ck  (6.28)
       = E(y | x1 = c1, x2 = c2, ..., xk = ck).

    The estimator of θ0 is

    θ̂0 = β̂0 + β̂1 c1 + β̂2 c2 + ... + β̂k ck.  (6.29)

    33 / 49

  • Applications

    Prediction

    The uncertainty in this prediction is represented by a confidence interval for θ0.

    With a large df, we can construct a 95% confidence interval for θ0 using the rule of thumb θ̂0 ± 2·se(θ̂0).

    How do we obtain the standard error of θ̂0?
    Trick: write β0 = θ0 − β1 c1 − β2 c2 − ... − βk ck. Plug this into

    y = β0 + β1 x1 + β2 x2 + ... + βk xk + u.

    This gives

    y = θ0 + β1(x1 − c1) + β2(x2 − c2) + ... + βk(xk − ck) + u.  (6.30)

    That is, we run a regression where we subtract the value cj from each observation on xj.

    The predicted value and its standard error are obtained from the intercept in regression 6.30.

    34 / 49

  • Applications

    Prediction

    Example: confidence interval for predicted college GPA

    Estimation results for predicting college GPA:

    Source | SS df MS Number of obs = 4137

    -------------+------------------------------ F( 4, 4132) = 398.02

    Model | 499.030504 4 124.757626 Prob > F = 0.0000

    Residual | 1295.16517 4132 .313447524 R-squared = 0.2781

    -------------+------------------------------ Adj R-squared = 0.2774

    Total | 1794.19567 4136 .433799728 Root MSE = .55986

    ------------------------------------------------------------------------------

    colgpa | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    sat | .0014925 .0000652 22.89 0.000 .0013646 .0016204

    hsperc | -.0138558 .000561 -24.70 0.000 -.0149557 -.0127559

    hsize | -.0608815 .0165012 -3.69 0.000 -.0932328 -.0285302

    hsizesq | .0054603 .0022698 2.41 0.016 .0010102 .0099104

    _cons | 1.492652 .0753414 19.81 0.000 1.344942 1.640362

    ------------------------------------------------------------------------------

    Note: definition of variables: colgpa = GPA after fall semester, sat = combined SAT score, hsperc = high school percentile (from top), hsize = size of graduating class (in 100s).

    35 / 49

  • Applications

    Prediction

    What is the predicted college GPA when sat = 1,200, hsperc = 30, and hsize = 5 (which means 500)?

    Define a new set of independent variables: sat0 = sat − 1,200, hsperc0 = hsperc − 30, hsize0 = hsize − 5, and hsizesq0 = hsize² − 25.

    Source | SS df MS Number of obs = 4137

    -------------+------------------------------ F( 4, 4132) = 398.02

    Model | 499.030503 4 124.757626 Prob > F = 0.0000

    Residual | 1295.16517 4132 .313447524 R-squared = 0.2781

    -------------+------------------------------ Adj R-squared = 0.2774

    Total | 1794.19567 4136 .433799728 Root MSE = .55986

    ------------------------------------------------------------------------------

    colgpa | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    sat0 | .0014925 .0000652 22.89 0.000 .0013646 .0016204

    hsperc0 | -.0138558 .000561 -24.70 0.000 -.0149557 -.0127559

    hsize0 | -.0608815 .0165012 -3.69 0.000 -.0932328 -.0285302

    hsizesq0 | .0054603 .0022698 2.41 0.016 .0010102 .0099104

    _cons | 2.700075 .0198778 135.83 0.000 2.661104 2.739047

    ------------------------------------------------------------------------------
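
    The centered regression above can be reproduced with commands along these lines (a sketch, assuming a GPA data set such as gpa2.dta with the variables colgpa, sat, hsperc, and hsize is loaded):

    . gen hsizesq  = hsize^2
    . gen sat0     = sat - 1200
    . gen hsperc0  = hsperc - 30
    . gen hsize0   = hsize - 5
    . gen hsizesq0 = hsizesq - 25
    . reg colgpa sat0 hsperc0 hsize0 hsizesq0
    . * the intercept is the predicted GPA; its standard error gives the 95% CI
    . display _b[_cons] - 1.96*_se[_cons], _b[_cons] + 1.96*_se[_cons]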

    36 / 49

  • Applications

    Prediction

    The variance of the prediction is smallest at the mean values of the xj (because the variance of the intercept estimator is smallest when each explanatory variable has zero sample mean).

    (b) CI for a particular unit from the population: prediction interval

    In forming a confidence interval for an unknown outcome on y, we must account for the variance in the unobserved error.
      Let y⁰ be the value for an individual not in our original sample.
      Let x1⁰, x2⁰, ..., xk⁰ be the new values of the independent variables.
      Let u⁰ be the unobserved error.
      Model for observation (y⁰, x1⁰, ..., xk⁰):

    y⁰ = β0 + β1 x1⁰ + β2 x2⁰ + ... + βk xk⁰ + u⁰.  (6.33)

    Prediction:
    ŷ⁰ = β̂0 + β̂1 x1⁰ + β̂2 x2⁰ + ... + β̂k xk⁰.

    Prediction error:
    ê⁰ = y⁰ − ŷ⁰ = (β0 + β1 x1⁰ + β2 x2⁰ + ... + βk xk⁰) + u⁰ − ŷ⁰.  (6.34)

    37 / 49

  • Applications

    Prediction

    The expected prediction error is zero, E(ê⁰) = 0, because the β̂j are unbiased (so ŷ⁰ has expectation equal to the systematic part of y⁰) and u⁰ has zero mean.

    The variance of the prediction error is the sum of the variances because u⁰ and ŷ⁰ are uncorrelated:

    Var(ê⁰) = Var(ŷ⁰) + Var(u⁰) = Var(ŷ⁰) + σ².  (6.35)

    There are two sources of variation in ê⁰:
      1. Sampling error in ŷ⁰, which arises because we have estimated the βj; it decreases with the sample size.
      2. σ², the variance of the error in the population; it does not change with the sample size.

    Standard error of ê⁰:

    se(ê⁰) = {[se(ŷ⁰)]² + σ̂²}^(1/2).  (6.36)

    38 / 49

  • Applications

    Prediction

    It holds that ê⁰/se(ê⁰) has a t distribution with n − k − 1 degrees of freedom.

    Therefore,

    P[ −t(α/2) ≤ ê⁰/se(ê⁰) ≤ t(α/2) ] = 1 − α

    P[ −t(α/2) ≤ (y⁰ − ŷ⁰)/se(ê⁰) ≤ t(α/2) ] = 1 − α

    P[ ŷ⁰ − t(α/2)·se(ê⁰) ≤ y⁰ ≤ ŷ⁰ + t(α/2)·se(ê⁰) ] = 1 − α

    39 / 49

  • Applications

    Prediction

    Example: prediction interval (for GPA) for any particular student

    Source | SS df MS Number of obs = 4137

    -------------+------------------------------ F( 4, 4132) = 398.02

    Model | 499.030503 4 124.757626 Prob > F = 0.0000

    Residual | 1295.16517 4132 .313447524 R-squared = 0.2781

    -------------+------------------------------ Adj R-squared = 0.2774

    Total | 1794.19567 4136 .433799728 Root MSE = .55986

    ------------------------------------------------------------------------------

    colgpa | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    sat0 | .0014925 .0000652 22.89 0.000 .0013646 .0016204

    hsperc0 | -.0138558 .000561 -24.70 0.000 -.0149557 -.0127559

    hsize0 | -.0608815 .0165012 -3.69 0.000 -.0932328 -.0285302

    hsizesq0 | .0054603 .0022698 2.41 0.016 .0010102 .0099104

    _cons | 2.700075 .0198778 135.83 0.000 2.661104 2.739047

    ------------------------------------------------------------------------------

    se(ê⁰) = [(.020)² + (.560)²]^(1/2) ≈ .560.
    Prediction interval: 2.70 ± 1.96 × .560 = [1.60, 3.80].
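
    The interval can be reproduced from the numbers in the output above:

    . display sqrt(.020^2 + .560^2)                  // se of the prediction error
    . display 2.70 - 1.96*.560, 2.70 + 1.96*.560     // approximately [1.60, 3.80]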

    40 / 49

  • Applications

    Prediction

    6.4.2 Predicting y when ln y is the dependent variable

    Given the OLS estimators, we can predict ln y for any value of the explanatory variables:

    ln y = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk  (predicted value of ln y).  (6.39)

    How to predict y?
      N.B.: exponentiating the predicted value of ln y does not give a valid prediction of y. In fact, it systematically underestimates the expected value of y.
      It can be shown that

    E(y|x) = exp(σ²/2) · exp(β0 + β1 x1 + β2 x2 + ... + βk xk),

    where σ² is the variance of u.

    41 / 49

  • Applications

    Prediction

    Hence, the prediction of y is:

    ŷ = exp(σ̂²/2) · exp(ln y),  (6.40)

    where ln y is the predicted value from (6.39) and σ̂² is the unbiased estimator of σ².

    The prediction in 6.40 relies on the normality of the error term, u.
    How to obtain a prediction that does not rely on normality?
    General model:

    E(y|x) = α0 · exp(β0 + β1 x1 + β2 x2 + ... + βk xk),  (6.41)

    where α0 is the expected value of exp(u).

    Given an estimate α̂0, we can predict y as

    ŷ = α̂0 · exp(ln y).  (6.42)

    42 / 49

  • Applications

    Prediction

    First approach to estimate α0: a consistent but not unbiased smearing estimate is

    α̂0 = n⁻¹ Σ(i=1 to n) exp(ûi).  (6.43)

    Second approach to estimate α0:
      Define mi = exp(β0 + β1 xi1 + β2 xi2 + ... + βk xik).
      Replace the βj with their OLS estimates to obtain m̂i = exp(ln yi), the exponentiated fitted values.
      Estimate a simple regression of yi on m̂i without an intercept. The slope estimate is a consistent but not unbiased estimate of α0.

    With a consistent estimate of α0, the prediction for y can be calculated as α̂0 · exp(ln y).

    43 / 49

  • Applications

    Prediction

    Example: predicting CEO salaries

    Model:

    ln(salary) = β0 + β1 ln(sales) + β2 ln(mktval) + β3 ceoten + u

    Estimation results:

    Source | SS df MS Number of obs = 177

    -------------+------------------------------ F( 3, 173) = 26.91

    Model | 20.5672434 3 6.85574779 Prob > F = 0.0000

    Residual | 44.0789697 173 .254791732 R-squared = 0.3182

    -------------+------------------------------ Adj R-squared = 0.3063

    Total | 64.6462131 176 .367308029 Root MSE = .50477

    ------------------------------------------------------------------------------

    lsalary | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    lsales | .1628545 .0392421 4.15 0.000 .0853995 .2403094

    lmktval | .109243 .0495947 2.20 0.029 .0113545 .2071315

    ceoten | .0117054 .0053261 2.20 0.029 .001193 .0222178

    _cons | 4.503795 .2572344 17.51 0.000 3.996073 5.011517

    ------------------------------------------------------------------------------

    44 / 49

  • Applications

    Prediction

    The smearing estimate for α0 is:

    . predict uhat, res

    . gen euhat = exp(uhat)

    . su euhat

    Variable | Obs Mean Std. Dev. Min Max

    -------------+--------------------------------------------------------

    euhat | 177 1.135661 .6970541 .0823372 6.378018

    45 / 49

  • Applications

    Prediction

    The regression estimate for α0 is:

    . predict lsalary_hat

    (option xb assumed; fitted values)

    . gen m_hat = exp(lsalary_hat)

    . reg salary m_hat, nocons

    Source | SS df MS Number of obs = 177

    -------------+------------------------------ F( 1, 176) = 562.39

    Model | 147352711 1 147352711 Prob > F = 0.0000

    Residual | 46113901 176 262010.801 R-squared = 0.7616

    -------------+------------------------------ Adj R-squared = 0.7603

    Total | 193466612 177 1093031.71 Root MSE = 511.87

    ------------------------------------------------------------------------------

    salary | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    m_hat | 1.116857 .0470953 23.71 0.000 1.023912 1.209801

    ------------------------------------------------------------------------------

    46 / 49

  • Applications

    Prediction

    Prediction for sales = 5,000 (which means $5 billion because sales is in millions), mktval = 10,000 (or $10 billion), and ceoten = 10:

    predicted ln(salary) = 4.503 + 0.163·ln(5,000) + 0.109·ln(10,000) + 0.012·10 = 7.013.

    Naive prediction: exp(7.013) = 1110.983.
    Prediction using the smearing estimate: 1.136 · exp(7.013) = 1262.076.
    Prediction using the regression estimate: 1.117 · exp(7.013) = 1240.967.
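
    These predictions can be reproduced from the estimates above (a sketch using the reported coefficients and the two estimates of α0):

    . * predicted ln(salary) at sales = 5,000, mktval = 10,000, ceoten = 10
    . display 4.503795 + .1628545*ln(5000) + .109243*ln(10000) + .0117054*10
    . display exp(7.013)           // naive prediction
    . display 1.136*exp(7.013)     // prediction with the smearing estimate
    . display 1.117*exp(7.013)     // prediction with the regression estimate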

    47 / 49

  • Key terms

    Key terms

    adjusted R-squared, interaction effect, nonnested models, over controlling, prediction error, prediction interval, predictions, quadratic functions, smearing estimate, variance of the prediction error

    48 / 49

  • References

    References

    Textbook: Chapter 6 in Wooldridge (2013).

    Further readings: Chapters 8 and 9 in Stock and Watson (2012); Chapters 6 and 10 in Hill et al. (2001).

    Hill, R. C., Griffiths, W. E., and Judge, G. G. (2001). Undergraduate Econometrics. John Wiley & Sons, New York.

    Stock, J. H. and Watson, M. W. (2012). Introduction to Econometrics.Pearson, Boston.

    Wooldridge, J. M. (2013). Introductory Econometrics: A Modern Approach.Cengage Learning, Mason, OH.

    49 / 49
