
Regression III: Non-Normality and Heteroskedasticity

Dave Armstrong

University of Wisconsin – Milwaukee, Department of Political Science

e: [email protected]
w: www.quantoid.net/ICPSRR.html

Goals of this Lecture

• Discuss methods for detecting non-normality, non-constant error variance, and nonlinearity
• Each of these reflects a problem with the specification of the model
• Discuss various ways that transformations can be used to remedy these problems

Outline

• Non-Normal Errors
  • Assessing Non-normality
• Non-constant Error Variance
  • Assessing Non-constant Error Variance
  • Testing for Non-constant Error Variance
  • Fixing Non-constant Error Variance: With a Model
  • Fixing Non-constant Error Variance: Without a Model

Non-normally distributed errors

• The least-squares fit is based on the conditional mean
• The mean is not a good measure of center for either a highly skewed distribution or a multi-modal distribution
• Non-normality does not produce bias in the coefficient estimates, but it does have two important consequences:
  • It poses problems for efficiency, i.e., the OLS standard errors are no longer the smallest; weighted least squares (WLS) is more efficient
  • Standard errors can be biased, i.e., confidence intervals and significance tests may lead to wrong conclusions; robust standard errors can compensate for this problem
• Transformations can often remedy the heavy-tailed problem
• Re-specification of the model, i.e., including a missing discrete predictor, can sometimes fix a multi-modal problem

Distribution of the Residuals Example: Inequality Data

• Quantile comparison plots and density estimates of the residuals from a model are useful for assessing normality
• The density estimate of the studentized residuals clearly shows a positive skew, and the possibility of a grouping of cases to the right

[Figure: density estimate of rstudent(mod1); x-axis: rstudent(mod1), y-axis: probability density function]

library(sm)
Weakliem <- read.table("http://www.quantoid.net/files/reg3/weakliem.txt")
mod1 <- lm(secpay ~ log(gdp), data=Weakliem)
sm.density(rstudent(mod1), model="normal")

Assessing Unusual Cases

• A quantile comparison plot can give us a sense of which observations depart from normality.
• We can see that the points with the biggest departure are the Czech Republic and Slovakia.

[Figure: studentized residuals from mod1 plotted against t quantiles]

library(car)
qqPlot(mod1)

Studentized Residuals after Removing the Czech Republic and Slovakia

[Figure: quantile comparison plot and density estimate of the studentized residuals from mod2]

W <- Weakliem[-c(21,22,24,25,49), ]
mod2 <- lm(secpay ~ log(gdp), data=W)
qqPlot(mod2, simulate=T, labels=FALSE)
sm.density(rstudent(mod2), model="normal")


Non-constant Error Variance

• Also called heteroskedasticity
• An important assumption of the least-squares regression model is that the variance of the errors around the regression surface is everywhere the same: V(E) = V(Y | x_1, ..., x_k) = σ²
• Non-constant error variance does not cause biased estimates, but it does pose problems for efficiency, and the usual formulas for standard errors are inaccurate
• OLS estimates are inefficient because they give equal weight to all observations, regardless of the fact that those with large residuals contain less information about the regression
• Two types of non-constant error variance are relatively common:
  • Error variance increases as the expectation of Y increases
  • There is a systematic relationship between the errors and one of the X's


Assessing Non-constant Error Variance

• Direct examination of the data is usually not helpful in assessing non-constant error variance, especially if there are many predictors. Instead, we look to the residuals to uncover the distribution of the errors.
• It is also not helpful to plot Y against the residuals E, because there is a built-in correlation between Y and E: Y = Ŷ + E
• The least-squares fit ensures that the correlation between Ŷ and E is 0, so a plot of the residuals against the fitted values (a residual plot) can help us uncover non-constant error variance.
• The pattern of changing spread is often more easily seen by plotting the squared studentized residuals E*_i² against Ŷ
• If the values of Y are all positive, we can use a spread-level plot:
  • plot log|E*_i| (the log spread) against log Ŷ (the log level)
  • the slope b of the regression line fit to this plot suggests the variance-stabilizing transformation Y^(p), with p = 1 − b
(A minimal R sketch of this calculation follows below.)
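Below is a minimal sketch of the spread-level calculation, assuming mod1 from the earlier slides and strictly positive fitted values; spreadLevelPlot() in the car package automates the same idea.

log_spread <- log(abs(rstudent(mod1)))      # log of the absolute studentized residuals
log_level  <- log(fitted(mod1))             # log of the fitted values
plot(log_level, log_spread)
b <- coef(lm(log_spread ~ log_level))[2]    # slope of the fitted line
1 - b                                       # suggested power transformation of Y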

Assessing Heteroskedasticity: Example - Inequality Data

[Figure: studentized residuals vs fitted values for mod2]

• In the residual plot, we see the familiar "fanning out" of the data, i.e., the variance of the residuals increases as the fitted values get larger

plot(fitted.values(mod2), rstudent(mod2),
     main="Studentized Residuals vs Fitted Values")
abline(h=0, lty=2)


Testing for Non-Constant Error Variance (1)

• Assume that a discrete X (or combination of X's) partitions the data into m groups.
• Let Y_ij denote the ith of the n_j outcome-variable scores in group j
• Within-group sample variances are then calculated as follows:

    S²_j = Σ_{i=1}^{n_j} (Y_ij − Ȳ_j)² / (n_j − 1)

• We could then compare these within-group sample variances to see if they differ (a small sketch follows below)
• If the distribution of the errors is non-normal, however, tests that examine S²_j directly are not valid because the mean is not a good summary of the data
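As an illustrative sketch only (the grouping used here is an assumption, not a variable from the slides' data), the within-group variances of the residuals can be computed and compared like this:

e <- residuals(mod1)
g <- cut(fitted(mod1), breaks = 3)   # an assumed discrete grouping for illustration
tapply(e, g, var)                    # within-group sample variances S^2_j
bartlett.test(e ~ g)                 # one classical comparison of the S^2_j (sensitive to non-normality)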

Testing for Non-Constant Error Variance (2): Score Test

• A score test for the null hypothesis that all of the error variances σ²_i are the same provides a better alternative

1. We start by calculating the standardized squared residuals:

    U_i = E_i² / σ̂² = E_i² / (Σ E_i² / n)

2. Regress the U_i on all of the explanatory variables X, finding the fitted values Û_i:

    U_i = η₀ + η₁ X_i1 + ··· + η_p X_ip + ω_i

3. The score test statistic, which is distributed as χ² with p degrees of freedom, is:

    S₀² = Σ (Û_i − Ū)² / 2

(A hand-computed sketch of these steps follows; the next slide shows the automated version.)
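A minimal sketch of the three steps in R, assuming mod2 from the earlier slides; ncvTest() on the next slide carries out the same test.

E <- residuals(mod2)
U <- E^2 / mean(E^2)                        # standardized squared residuals
aux <- lm(U ~ model.matrix(mod2) - 1)       # regress U on the model's regressors
S2_0 <- sum((fitted(aux) - mean(U))^2) / 2  # score statistic
pchisq(S2_0, df = 1, lower.tail = FALSE)    # p = 1 regressor in this model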

R-script testing for non-constant error variance

• The ncvTest function in the car library provides a simple way to carry out the score test
• The result below shows that the non-constant error variance is statistically significant

ncvTest(mod2, data=W)

## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 6.025183    Df = 1     p = 0.01410317

ncvTest(mod2, var.formula=~log(gdp), data=W)

## Non-constant Variance Score Test
## Variance formula: ~ log(gdp)
## Chisquare = 6.025183    Df = 1     p = 0.01410317


Weighted least squares (1)

• If the error variances are proportional to a particular X (i.e., the error variances are known up to a constant of proportionality σ²_ε, so that V(ε_i) = σ²_ε / w_i), weighted least squares provides a good alternative to OLS
• WLS minimizes the weighted sum of squares Σ w_i E_i², giving greater weight to observations with smaller variance
• The WLS maximum-likelihood estimators are defined as:

    β̂ = (X′WX)⁻¹ X′Wy
    σ̂²_ε = Σ w_i E_i² / n

• The estimated asymptotic covariance matrix of the estimators is:

    V̂(β̂) = σ̂²_ε (X′WX)⁻¹

• Here W is a square diagonal matrix with the individual weights w_i on the diagonal and zeros elsewhere
(A small sketch of the matrix formula appears below.)
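A minimal sketch, assuming the Weakliem data from the earlier slides and the illustrative weights w_i = 1/log(gdp) used on the following slides, checking the matrix formula against R's built-in weighted fit.

w <- 1 / log(Weakliem$gdp)                     # illustrative weights
X <- model.matrix(~ log(gdp), data = Weakliem)
y <- Weakliem$secpay
W_mat <- diag(w)                               # diagonal weight matrix
b_wls <- solve(t(X) %*% W_mat %*% X) %*% t(X) %*% W_mat %*% y
cbind(b_wls, coef(lm(secpay ~ log(gdp), data = Weakliem, weights = w)))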

Weighted Least Squares Example: Inequality Data

• The "fanning" pattern in the residual plot for the inequality model indicates that the error variance is proportional to the gini.
• We could then proceed to estimate a WLS using the weight gini.

WLS Example: Inequality Data (2)

In R, we simply add a weights argument to the lm function:

mod.wls <- update(mod2, weight=I(1/log(gdp)), data=Weakliem)
summary(mod.wls)

##
## Call:
## lm(formula = secpay ~ log(gdp), data = Weakliem, weights = I(1/log(gdp)))
##
## Weighted Residuals:
##       Min        1Q    Median        3Q       Max
## -0.046450 -0.029308 -0.005446  0.015373  0.137529
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.70123    0.15870   4.419 5.82e-05 ***
## log(gdp)     0.05491    0.01756   3.127  0.00302 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03849 on 47 degrees of freedom
## Multiple R-squared:  0.1722, Adjusted R-squared:  0.1546
## F-statistic:  9.78 on 1 and 47 DF,  p-value: 0.003025

Interpreting WLS Results

• It is important to remember that the parameters in the WLS model are estimators of β, just like the OLS parameters are (they are simply more efficient in the presence of heteroskedasticity).
• Thus interpretation takes the same form as it does with the OLS parameters.
• The R² is less interesting here because we are explaining variance in Life Expectancy × log(GDP/capita), rather than Life Expectancy.

Generalized Least Squares (1)

• Sometimes we do not know the relationship between x_i and var(u_i | x_i).
• In this case, we can use a Feasible GLS (FGLS) model.
• FGLS estimates the weight from the data. That weight is then used in a WLS fashion.

GLS: Steps

1. Regress y on x_i and obtain the residuals û_i.
2. Create log(û_i²) by squaring and then taking the natural log of the OLS residuals from step 1.
3. Run a regression of log(û_i²) on x_i and obtain the fitted values ĝ_i.
4. Generate ĥ_i = exp(ĝ_i).
5. Estimate the WLS of y on x_i with weights 1/ĥ_i.

FGLS Example: Inequality Data

W2 <- Weakliem[-c(25,49), ]
mod1.ols <- lm(secpay ~ gini*democrat, data=W2)
aux.mod1 <- update(mod1.ols, log(resid(mod1.ols)^2) ~ .)
h <- exp(predict(aux.mod1))
mod.fgls <- update(mod1.ols, weight=1/h)
with(summary(mod.fgls), printCoefmat(coefficients))

##                 Estimate Std. Error t value  Pr(>|t|)
## (Intercept)    0.9619416  0.0476415 20.1913 < 2.2e-16 ***
## gini           0.0044149  0.0013294  3.3210  0.001836 **
## democrat       0.4662928  0.0806427  5.7822 7.577e-07 ***
## gini:democrat -0.0102999  0.0022211 -4.6374 3.288e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Modeling the Variance

Sometimes the variance of the observations around y is of substantive interest, rather than being a nuisance. If it is just a nuisance, FGLS is perfectly reasonable; otherwise, modeling the variance in a more interpretable way is perhaps better.

• We need to use MLE to simultaneously estimate models of the conditional variance and mean of the distribution of y.

The model is:

    y_i ∼ N(x_i β, σ_i²)    (1)

Here σ² is also indexed by i, and we can model it as:

    σ_i² = e^(z_i γ)    (2)

(A sketch of this likelihood appears below; the next slide uses an existing implementation.)
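A minimal sketch of that joint likelihood, written as an assumed illustration rather than the MLhetreg code used on the next slide; it reuses mod1.ols and W2 from the FGLS example.

# Log-likelihood for a mean model X %*% beta and variance model exp(Z %*% gamma)
hetreg_ll <- function(theta, y, X, Z) {
  beta  <- theta[1:ncol(X)]
  gamma <- theta[-(1:ncol(X))]
  sum(dnorm(y, mean = X %*% beta, sd = sqrt(exp(Z %*% gamma)), log = TRUE))
}
X <- model.matrix(mod1.ols)     # mean-model design (intercept included)
Z <- X                          # variance-model design
fit <- optim(c(coef(mod1.ols), rep(0, ncol(Z))), hetreg_ll,
             y = W2$secpay, X = X, Z = Z,
             method = "BFGS", control = list(fnscale = -1), hessian = TRUE)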

Heteroskedastic Regression in R

Charles Franklin wrote a library of ML techniques in R, including heteroskedastic regression. We can do the following:

source("http://www.quantoid.net/files/reg3/mlhetreg.r")
X <- model.matrix(mod1.ols)[,-1]
Z <- X
het.mod <- MLhetreg(W2$secpay, X, Z)
summary(het.mod)

##
## Heteroskedastic Linear Regression
##
## Estimated Parameters
##                  Estimate Std. Error z-value  Pr(>|z|)
## Constant        0.9844411  0.0491585 20.0259 < 2.2e-16 ***
## gini            0.0037411  0.0015136  2.4717 0.0134456 *
## democrat        0.4474368  0.0753627  5.9371 2.901e-09 ***
## gini:democrat  -0.0097254  0.0021095 -4.6103 4.020e-06 ***
## ZConstant      -9.9427564  1.4019902 -7.0919 1.323e-12 ***
## gini            0.1038281  0.0359335  2.8894 0.0038592 **
## democrat        7.1919603  1.8088570  3.9760 7.009e-05 ***
## gini:democrat  -0.1847861  0.0495987 -3.7256 0.0001948 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: 109.9483
##
## Wald Test for Heteroskedasticity
## Wald statistic: 16.16162 with 3 degrees of freedom
## p = 0.001050662

Table: Comparing Models

                  OLS      FGLS     HetReg
(Intercept)       0.941    0.962    0.984
                 (0.060)  (0.048)  (0.049)
gini              0.005    0.004    0.004
                 (0.002)  (0.001)  (0.002)
democrat          0.486    0.466    0.447
                 (0.088)  (0.081)  (0.075)
gini:democrat    -0.011   -0.010   -0.010
                 (0.002)  (0.002)  (0.002)

Variance Model

(Intercept)    -9.943  (1.402)
gini            0.104  (0.036)
democrat        7.192  (1.809)
gini:democrat  -0.185  (0.050)

[Figure: predicted residual variance as a function of gini, shown separately for democracies and non-democracies]


Robust Standard Errors (1)

• Robust standard errors can be calculated to compensate for an unknown pattern of non-constant error variance
• Robust standard errors require fewer assumptions about the model than WLS (which is better if there is increasing error variance in the level of Y)
• Robust standard errors do not change the OLS coefficient estimates or solve the inefficiency problem, but they do give more accurate p-values.
• There are several methods for calculating heteroskedasticity-consistent standard errors (known variously as White, Eicker or Huber standard errors), but most are variants on the method originally proposed by White (1980).

Robust Standard Errors (2): White's Standard Errors

• The covariance matrix of the OLS estimator is:

    V(b) = (X′X)⁻¹ X′ΣX (X′X)⁻¹ = (X′X)⁻¹ X′V(y)X (X′X)⁻¹

• If the assumptions of normality and homoskedasticity are satisfied, V(y) = σ²_ε I_n and the variance simplifies to:

    V(b) = σ²_ε (X′X)⁻¹

• In the presence of non-constant error variance, however, V(y) contains unequal variances (and possibly nonzero covariances)
• In these cases, White suggests a consistent estimator of the variance that constrains Σ to a diagonal matrix containing only the squared residuals

Robust Standard Errors (3): White's Standard Errors

• The heteroskedasticity-consistent covariance matrix (HCCM) estimator is then:

    V(b) = (X′X)⁻¹ X′ΦX (X′X)⁻¹

  where Φ = diag(e_i²) and the e_i are the OLS residuals
• This is what is known as HC0, White's (1980) original recipe. (A hand computation follows below.)
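A minimal sketch computing the HC0 matrix by hand for mod2 from the earlier slides; sandwich::vcovHC(mod2, type = "HC0") returns the same estimator.

X <- model.matrix(mod2)
e <- residuals(mod2)
XtXinv <- solve(crossprod(X))                           # (X'X)^{-1}
V_hc0  <- XtXinv %*% t(X) %*% diag(e^2) %*% X %*% XtXinv
sqrt(diag(V_hc0))                                       # HC0 standard errors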


Hat Values

Other HCCMs use the "hat values", which are the diagonal elements of X(X′X)⁻¹X′.

• These give a sense of how far each observation is from the mean of the X's.
• Below is a figure that shows two hypothetical X variables; the plotting symbols are proportional in size to the hat value.

[Figure: scatterplot of two hypothetical X variables with point size proportional to the hat value]

Other HCCMs

MacKinnon and White (1985) considered three alternatives: HC1, HC2 and HC3, each of which offers a different method for finding Φ.

• HC1: (N / (N − K)) × HC0
• HC2: Φ = diag( e_i² / (1 − h_ii) ), where h_ii = x_i (X′X)⁻¹ x_i′
• HC3: Φ = diag( e_i² / (1 − h_ii)² )

(These are computed by hand below.)
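A sketch extending the HC0 computation above (reusing X, e, XtXinv and V_hc0) to HC1-HC3; sandwich::vcovHC() implements the same corrections.

hv <- hatvalues(mod2)                                    # hat values h_ii
n  <- nrow(X); k <- ncol(X)
V_hc1 <- n / (n - k) * V_hc0
V_hc2 <- XtXinv %*% t(X) %*% diag(e^2 / (1 - hv))   %*% X %*% XtXinv
V_hc3 <- XtXinv %*% t(X) %*% diag(e^2 / (1 - hv)^2) %*% X %*% XtXinv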

HC4 Standard Errors

• HC3 standard errors have been shown to outperform the alternatives in small samples
• HC3 standard errors can still fail to generate the appropriate Type I error rate when outliers are present.
• HC4 standard errors can produce the appropriate test statistics even in the presence of outliers:

    Φ = diag( e_i² / (1 − h_ii)^δ_i )

• δ_i = min{ 4, n·h_ii / p }, with n = number of observations and p = number of parameters in the model
• HC4 outperforms HC3 in the presence of influential observations, but not in other situations.

HC4m Standard Errors

• HC4 standard errors are not universally better than the others; as Cribari-Neto and da Silva (2011) show, HC4 SEs have relatively poor performance when there are many regressors and when the maximal leverage point is extreme.
• Cribari-Neto and da Silva propose a modified HC4 estimator, called HC4m, where, as above,

    Φ = diag( e_i² / (1 − h_ii)^δ_i )

• and here, δ_i = min{ γ₁, n·h_ii / p } + min{ γ₂, n·h_ii / p }
• They find that the best values of the γ parameters are γ₁ = 1 and γ₂ = 1.5.

HC5

• HC5 standard errors are also supposed to provide different discounting than the HC4 and HC4m estimators. The HC5 standard errors are operationalized as:

    Φ = diag( e_i² / (1 − h_ii)^δ_i )

• and here, δ_i = min{ n·h_ii / p, max{ 4, n·k·h_max / p } }, with k = 0.7.
• For observations with bigger hat values, their residuals get increased in size, thus (generally) increasing the standard error.

(A sketch of these δ_i calculations follows; they feed the plots on the next slides.)
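A sketch of the discount factors δ_i implied by the formulas above for HC4, HC4m (γ₁ = 1, γ₂ = 1.5) and HC5 (k = 0.7), written as an assumed reconstruction using the hat values from mod1.ols.

hv <- hatvalues(mod1.ols)
n  <- length(hv)
p  <- length(coef(mod1.ols))
d_hc4  <- pmin(4, n * hv / p)
d_hc4m <- pmin(1, n * hv / p) + pmin(1.5, n * hv / p)
d_hc5  <- pmin(n * hv / p, pmax(4, 0.7 * n * max(hv) / p))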

Deltas for the Inequality Model

[Figure: δ_i as a function of the hat value h for HC4, HC4m and HC5]

Denominator of HCC for Inequality Data

[Figure: (1 − h_i)^δ_i as a function of the hat value h for HC4, HC4m and HC5]

Discounts for Inequality Data

[Figure: e_i² / (1 − h_i)^δ_i plotted against the residuals e for HC4, HC4m and HC5]

Function for lm output that includes Robust Standard Errors

• I modified the summary.lm() function to allow an argument for the robust standard error type.
• This just makes a call to vcovHC() from the sandwich package.

source("http://www.quantoid.net/files/reg3/summary.lm.robust.r")
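A standard alternative (an assumed aside, not part of the slides' code): lmtest::coeftest() combined with a covariance matrix from sandwich::vcovHC() gives the same kind of robust coefficient table.

library(lmtest)
library(sandwich)
coeftest(mod2, vcov. = vcovHC(mod2, type = "HC4m"))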

Summary of DHS Model

##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  0.532959   0.128776  4.1387 0.0001643 ***
## log(gdp)     0.070409   0.013988  5.0334  9.55e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Output for robust.se

summary.lm.robust(mod2, typeHC="HC4m")

##
## Call:
## lm(formula = secpay ~ log(gdp), data = W)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.12860 -0.06641 -0.00516  0.05650  0.19135
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.53296    0.11318   4.709 2.72e-05 ***
## log(gdp)     0.07041    0.01307   5.388 2.99e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0787 on 42 degrees of freedom
## Multiple R-squared:  0.3763, Adjusted R-squared:  0.3614
## F-statistic: 25.34 on 1 and 42 DF,  p-value: 9.55e-06

Comparison

[Figure: t-statistics for (Intercept), gini, democrat and gini:democrat under HC0, HC1, HC2, HC3, HC4, HC4m and HC5 standard errors]

Robust Standard Errors (4)

• Since the HCCM is found without a formal model of the heteroskedasticity, relying instead only on the regressors and residuals from the OLS fit for its computation, it can be easily adapted to many applications
• If you choose to use HCCMs, the latest simulation advice suggests that HC4 SEs generally perform best (better than other White-type HCCMs and bootstrap intervals of various types).

Cautions
• Robust standard errors should not be seen as a substitute for careful model specification. In particular, if the pattern of heteroskedasticity is known, it can often be more effectively corrected, and the model more efficiently estimated, using WLS.

Comparison of Robust and Classical Standard Errors

King and Roberts (2014) suggest that when robust and classical standard errors diverge, it is not an indication that robust SEs should be used, but that the model is mis-specified.

• A formal test can tell whether the mis-specification is bad enough.

GIM Test

The classical variance-covariance matrix of the parameters in MLE is the negative inverse of the Hessian:

• V_c(β̂) = −P⁻¹

The robust variance-covariance matrix is given by:

• V_r(β̂) = P⁻¹ M P⁻¹, where M is the square of the gradient.
• V_r(β̂) = V_c(β̂) when M = −P

A test of model mis-specification comes from evaluating E(M + P) = 0 and obtaining sampling-variance estimates of the test statistic with a parametric bootstrap.

GIM Test in R

mod2a <- update(mod2, data=W2)
source("http://www.quantoid.net/files/reg3/bootstrapim.normal.r")
library(maxLik) # for the numericGradient function

bs.out <- bootstrapIM.normal(formula(mod2a), W, 100, 100)
bs.out

## $stat
##          [,1]
## [1,] 21.29937
##
## $pval
## [1] 0.2574257

Test Results

The test suggests that:
• We do have heteroskedasticity according to ncvTest().
• The difference between robust SEs and classical SEs is not sufficiently big that the choice between the two is meaningful.

Bootstrapping

Remember, we said that the bootstrap doesn't require us to make distributional assumptions:
• We don't have to make the assumption of constant variance in the error distribution.
• However, the bootstrap we've used isn't exactly the right one for this particular situation.
• The wild bootstrap has been shown to perform well in this situation.
• It deals with uncertainty in a slightly different manner.

Percentile-t Intervals with the Wild Bootstrap

1. Draw t*_i from a population with zero mean and unit variance (e.g., the Rademacher distribution or the standard normal).
2. Generate:

    y*_i = x_i β̂ + t*_i · e_i / √(1 − h_i)

3. Compute β̂* = (X′X)⁻¹ X′y* and z* = (β̂* − β̂) / SE(β̂*), where SE(β̂*) comes from an HCCM of your choice. (Repeat steps 1-3 many (R) times.)
4. Calculate the δ₁ = (1 − α/2) and δ₂ = (α/2) quantiles of z*. The CI is

    ( β̂ − δ₁·SE(β̂), β̂ − δ₂·SE(β̂) )

(A minimal R sketch of these steps appears after this list.)
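A minimal sketch of the four steps for a single coefficient, written as an assumed illustration (function name and HC3 choice are mine); the hcci package's Tboot() on the next slide automates this.

library(sandwich)
wild_t_ci <- function(mod, which = 2, R = 999, alpha = 0.05) {
  e  <- residuals(mod)
  hv <- hatvalues(mod)
  X  <- model.matrix(mod)
  b  <- coef(mod)[which]
  se <- sqrt(diag(vcovHC(mod, type = "HC3")))[which]
  z  <- replicate(R, {
    t_star <- sample(c(-1, 1), length(e), replace = TRUE)      # Rademacher draws
    y_star <- fitted(mod) + t_star * e / sqrt(1 - hv)           # step 2
    m_star <- lm(y_star ~ X - 1)                                # refit on the same design
    (coef(m_star)[which] - b) / sqrt(diag(vcovHC(m_star, type = "HC3")))[which]
  })
  q <- quantile(z, c(1 - alpha / 2, alpha / 2))                 # step 4 quantiles
  c(lower = b - q[1] * se, upper = b - q[2] * se)
}
# e.g., wild_t_ci(mod2, which = 2) for the second coefficient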

Wild Bootstrap Intervals in R

library(devtools)
install_github("davidaarmstrong/hcci")
library(hcci)
Weakliem <- read.table("http://www.quantoid.net/files/reg3/weakliem.txt")
Weakliem <- Weakliem[-c(25,49), ]
mod2 <- lm(secpay ~ gini*democrat, data=Weakliem)
out <- Tboot(mod2, J=1000)
out

##
## Bootstrap-t Intervals using the Single Bootstrap
## 8501 bootstrap iterations
##               Estimate  Lower  Upper
## (Intercept)      0.941  0.851  1.031
## gini             0.005  0.002  0.008
## democrat         0.486  0.288  0.682
## gini:democrat   -0.011 -0.016 -0.005

Comparative Studies of HCCIs

• Long and Ervin (2000) found that screening tests have very little power, so if it looks like there might be a problem, use HCCMs.
• Long and Ervin suggest using HC3 (which was the state of the art at the time).
• Cribari-Neto and Lima (2009) show that HC4 dominates all of the other options (though HC4m and HC5 were not tested), including the bootstrap.
• The bootstrap intervals don't do too poorly relative to the other options, but HC0-HC3 are all quite bad in many circumstances.

Coverage Percentages for HCCMs (N=20)

[Figure: empirical coverage of nominal 95% intervals for constant-variance and HC1-HC5 standard errors; N=20, x ∼ t₃; panels by error distribution (normal, skew, fat) and degree of heteroskedasticity (λ = 1, 9, 49)]

Coverage Percentages for HCCMs (N=60)

[Figure: same layout as the previous slide, with N=60, x ∼ t₃]

Coverage Percentages for HCCMs (N=100)

[Figure: same layout as the previous slides, with N=100, x ∼ t₃]

Coverage Percentages for Bootstrap-t Intervals

[Figure: empirical coverage of nominal 95% bootstrap-t intervals]

Conclusions

• Heteroskedasticity, while not bias-inducing, can cause problems with the efficiency of the model.
• Tests of heteroskedasticity only have sufficient power when n is large (e.g., > 250).
• If the errors are found to be heteroskedastic, there are a number of potential fixes that all require various assumptions:
  1. If the heteroskedasticity is proportional to a single variable, weighted least squares can be used.
  2. A model of the variance can be obtained through FGLS (if the variance model is a "nuisance") or heteroskedastic regression (if the model of the variance is substantively interesting).
  3. If heteroskedasticity is thought to exist and no suitable functional form of the variance can be found, then robust standard errors could work. There is little difference in empirical coverage rates between HC3, HC4, HC4m and HC5 standard errors in simulation. That said, HC4 may be marginally better.

Readings

Today: Heteroskedasticity
• Fox (2008), Chapters 12 & 13
• Fox and Weisberg (2011), Chapters 3 & 6
• Long and Ervin (2000)
• Harvey (1976)
• Cribari-Neto (2004); Cribari-Neto, Souza and Vasconcellos (2007); Cribari-Neto and da Silva (2011)

Tomorrow: Model Selection
• Fox (2008), Chapter 22
• Leamer (1983)
• Leamer and Leonard (1983)
• Box (1976); Box and Hunter (1962)
• Freedman (1991a,b); Berk (1991); Blalock (1991); Mason (1991)
• Miller (2002); Breiman (1992); Breiman and Spector (1992)
• Burnham and Anderson (2004)

References

Berk, Richard A. 1991. "Toward a Methodology for Mere Mortals." Sociological Methodology 21:315–324.
Blalock, Hubert M. 1991. "Are There Really Any Constructive Alternatives to Causal Modeling?" Sociological Methodology 21:325–335.
Box, George E. P. 1976. "Science and Statistics." Journal of the American Statistical Association 71(356):791–799.
Box, George E. P. and William G. Hunter. 1962. "A Useful Method for Model-Building." Technometrics 4(3):301–318.
Breiman, Leo. 1992. "The Little Bootstrap and Other Methods for Dimensionality Selection in Regression: X-Fixed Prediction Error." Journal of the American Statistical Association 87(419):738–754.
Breiman, Leo and Philip Spector. 1992. "Submodel Selection and Evaluation in Regression: The X-Random Case." International Statistical Review 60(3):291–319.
Burnham, Kenneth P. and David R. Anderson. 2004. "Multimodel Inference: Understanding AIC and BIC in Model Selection." Sociological Methods and Research 33(2):261–304.
Cribari-Neto, Francisco. 2004. "Asymptotic Inference Under Heteroskedasticity of Unknown Form." Computational Statistics & Data Analysis 45:215–233.
Cribari-Neto, Francisco, Tatiene C. Souza and Klaus L. P. Vasconcellos. 2007. "Inference Under Heteroskedasticity and Leveraged Data." Communications in Statistics 36(10):1877–1888.
Cribari-Neto, Francisco and Wilton Bernardino da Silva. 2011. "A New Heteroskedasticity-Consistent Covariance Matrix Estimator for the Linear Regression Model." Advances in Statistical Analysis 95(1):129–146.
Fox, John. 2008. Applied Regression Analysis and Generalized Linear Models, 2nd edition. Thousand Oaks, CA: Sage.
Fox, John and Sanford Weisberg. 2011. An R Companion to Applied Regression, 2nd edition. Thousand Oaks, CA: Sage.
Freedman, David A. 1991a. "A Rejoinder to Berk, Blalock and Mason." Sociological Methodology 21:353–358.
Freedman, David A. 1991b. "Statistical Models and Shoe Leather." Sociological Methodology 21:291–313.
Harvey, Andrew C. 1976. "Estimating Regression Models with Multiplicative Heteroskedasticity." Econometrica 44(3):461–465.
Leamer, Edward E. 1983. "Let's Take the Con Out of Econometrics." The American Economic Review 73(1):31–43.
Leamer, Edward E. and Herman Leonard. 1983. "Reporting the Fragility of Regression Estimates." The Review of Economics and Statistics 65(2):306–317.
Long, J. Scott and Laurie H. Ervin. 2000. "Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model." The American Statistician 54(3):217–224.
Mason, William M. 1991. "Freedman Is Right as Far as He Goes, but There Is More, and It's Worse. Statisticians Could Help." Sociological Methodology 21:337–357.
Miller, Alan. 2002. Subset Selection in Regression, 2nd edition. Boca Raton, FL: Chapman & Hall/CRC.