Generalized Regression Model
Based on Greene’s Note 15 (Chapter 8)
Generalized Regression Model
Setting: The classical linear model assumes that E[ε|X] = 0 and Var[ε|X] = σ²I. That is, observations are uncorrelated and all are drawn from a distribution with the same variance. The generalized regression (GR) model allows the variances to differ across observations and allows correlation across observations.
Implications
• The assumption that Var[ε] = σ²I is used to derive the result Var[b] = σ²(X'X)⁻¹. If it is not true, then the use of s²(X'X)⁻¹ to estimate Var[b] is inappropriate.
• The assumption was used to derive most of our test statistics, so they must be revised as well.
• Least squares gives each observation a weight of 1/n. But, if the variances are not equal, then some observations are more informative than others.
• Least squares is based on simple sums, so the information that one observation might provide about another is never used.
GR Model
• The generalized regression model: y = Xβ + ε, E[ε|X] = 0, Var[ε|X] = σ²Ω. Regressors are well behaved. We consider some examples with trace(Ω) = n. (This is a normalization with no content.)
• Leading cases:
– Simple heteroscedasticity
– Autocorrelation
– Panel data and heterogeneity more generally.
Least Squares
• Still unbiased. (The proof did not rely on Ω.)
• For consistency, we need the true variance of b:
Var[b|X] = E[(b − β)(b − β)'|X]
= (X'X)⁻¹ E[X'εε'X|X] (X'X)⁻¹
= σ² (X'X)⁻¹ X'ΩX (X'X)⁻¹.
Divide all four terms by n. If the middle one converges to a finite matrix of constants, we have the result, so we need to examine
(1/n)X'ΩX = (1/n) Σᵢ Σⱼ ωᵢⱼ xᵢxⱼ'.
This will be another assumption of the model.
• Asymptotic normality? Easy for the heteroscedasticity case, very difficult for the autocorrelation case.
Robust Covariance Matrix
• Robust estimation: how do we estimate Var[b|X] = σ² (X'X)⁻¹ X'ΩX (X'X)⁻¹ for the LS b?
• Note the distinction between estimating σ²Ω, an n by n matrix, and estimating the much smaller matrix σ²X'ΩX = σ² Σᵢ Σⱼ ωᵢⱼ xᵢxⱼ'.
• The two standard tools of modern applied econometrics:
– The White estimator
– Newey–West.
The White Estimator
Est.Var[b] = (X'X)⁻¹ [Σᵢ eᵢ² xᵢxᵢ'] (X'X)⁻¹, sum over i = 1, …, n.
Use σ̂² = e'e/n and ω̂ᵢ = eᵢ²/σ̂², so Ω̂ = diag(ω̂₁, …, ω̂ₙ); note tr(Ω̂) = n.
Est.Var[b] = (1/n) (X'X/n)⁻¹ (σ̂² X'Ω̂X/n) (X'X/n)⁻¹
Does (1/n)(σ̂² X'Ω̂X − σ² X'ΩX) → 0?
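As a concrete illustration (not from Greene's notes: the simulated data, seed, and variable names are our own), a minimal numpy sketch of the White estimator next to the conventional s²(X'X)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -1.0])
# Heteroscedasticity: error variance grows with the second regressor
eps = rng.normal(size=n) * np.exp(0.5 * X[:, 1])
y = X @ beta + eps

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                 # least squares
e = y - X @ b                         # residuals

# White: Est.Var[b] = (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1
S = (X * (e**2)[:, None]).T @ X
V_white = XtX_inv @ S @ XtX_inv

# Conventional estimator, inappropriate here: s^2 (X'X)^-1
s2 = e @ e / (n - K)
V_conv = s2 * XtX_inv
```

Comparing the diagonals of V_white and V_conv shows how misleading the conventional standard errors can be under heteroscedasticity.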
Newey-West Estimator
Heteroscedasticity component (diagonal elements):
S₀ = (1/n) Σₜ eₜ² xₜxₜ'
Autocorrelation component (off-diagonal elements):
S₁ = (1/n) Σₗ Σₜ wₗ eₜ e_{t−l} (xₜ x_{t−l}' + x_{t−l} xₜ'),  l = 1, …, L;  t = l+1, …, n
wₗ = 1 − l/(L+1) = "Bartlett weight"
Est.Var[b] = (1/n) (X'X/n)⁻¹ [S₀ + S₁] (X'X/n)⁻¹
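A numpy sketch of the Newey-West computation with Bartlett weights, assuming simulated AR(1) errors and a small truncation lag L; data and names are illustrative, not Greene's:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):                       # AR(1) errors, rho = 0.6
    eps[t] = 0.6 * eps[t - 1] + u[t]
y = X @ np.array([1.0, 0.5]) + eps

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

L = 4                                       # truncation lag (our choice)
S = (X * (e**2)[:, None]).T @ X / n         # S0: diagonal (White) part
for l in range(1, L + 1):
    w = 1 - l / (L + 1)                     # Bartlett weight
    G = (X[l:] * (e[l:] * e[:-l])[:, None]).T @ X[:-l] / n
    S += w * (G + G.T)                      # off-diagonal part, symmetrized
V_nw = n * XtX_inv @ S @ XtX_inv            # Est.Var[b]
```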
Generalized Least Squares
A transformation of the model: P = Ω^(-1/2), so P'P = Ω⁻¹.
Py = PXβ + Pε, or y* = X*β + ε*.
Why? E[ε*ε*'|X*] = P E[εε'|X*] P' = P E[εε'|X] P' = σ²PΩP' = σ² Ω^(-1/2) Ω Ω^(-1/2) = σ²Ω⁰ = σ²I.
Generalized Least Squares
Aitken theorem. The Generalized Least Squares estimator, GLS.
Py = PXβ + Pε, or y* = X*β + ε*, with E[ε*ε*'|X*] = σ²I.
Use ordinary least squares in the transformed model. It satisfies the Gauss–Markov theorem:
b* = (X*'X*)⁻¹X*'y*
Generalized Least Squares
Efficient estimation of β and, by implication, the inefficiency of least squares b.
β̂ = (X*'X*)⁻¹X*'y*
= (X'P'PX)⁻¹ X'P'Py
= (X'Ω⁻¹X)⁻¹ X'Ω⁻¹y
≠ b. β̂ is efficient, so by construction, b is not.
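The claim that OLS on the transformed data and the direct formula (X'Ω⁻¹X)⁻¹X'Ω⁻¹y give the same estimator is easy to verify numerically. A minimal sketch, assuming a known diagonal Ω and our own simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
omega = np.exp(rng.normal(size=n))          # known heteroscedastic variances
Omega = np.diag(omega)
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(omega)

P = np.diag(1 / np.sqrt(omega))             # P = Omega^(-1/2), so P'P = Omega^-1
ys, Xs = P @ y, P @ X                       # transformed model y* = X*b + e*
b_transformed = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

Oinv = np.linalg.inv(Omega)                 # direct GLS formula
b_direct = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

assert np.allclose(b_transformed, b_direct)  # identical by construction
```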
Asymptotics for GLS
Asymptotic distribution of GLS. (NOTE: We apply the full set of results of the classical model to the transformed model.)
• Unbiasedness
• Consistency – "well behaved data"
• Asymptotic distribution
• Test statistics
Unbiasedness
β̂ = (X'Ω⁻¹X)⁻¹X'Ω⁻¹y = β + (X'Ω⁻¹X)⁻¹X'Ω⁻¹ε
E[β̂|X] = β + (X'Ω⁻¹X)⁻¹X'Ω⁻¹ E[ε|X]
= β if E[ε|X] = 0
Consistency
Use mean square convergence.
Var[β̂|X] = (σ²/n) (X'Ω⁻¹X/n)⁻¹ → 0?
Requires (1/n)X'Ω⁻¹X to be "well behaved": it must either converge to a constant matrix or diverge.
Heteroscedasticity case:
(1/n)X'Ω⁻¹X = (1/n) Σᵢ (1/ωᵢᵢ) xᵢxᵢ'
Autocorrelation case:
(1/n)X'Ω⁻¹X = (1/n) Σᵢ Σⱼ ωⁱʲ xᵢxⱼ'  (ωⁱʲ = elements of Ω⁻¹). n² terms. Convergence is unclear.
Asymptotic Normality
√n(β̂ − β) = (X'Ω⁻¹X/n)⁻¹ (1/√n) X'Ω⁻¹ε
Converge to normal with a stable variance O(1)?
• (1/n)X'Ω⁻¹X → a constant matrix?
• (1/√n)X'Ω⁻¹ε → a mean to which we can apply the central limit theorem?
Heteroscedasticity case:
(1/√n)X'Ω⁻¹ε = (1/√n) Σᵢ (1/ωᵢᵢ) xᵢεᵢ. Var[εᵢ/ωᵢᵢ] = σ²/ωᵢᵢ; xᵢ is just data.
Apply Lindeberg–Feller. (Or assume εᵢ/ωᵢᵢ is a draw from a common distribution with mean 0 and fixed variance – some recent treatments.)
Autocorrelation case?
Asymptotic Normality (Cont.)
For the autocorrelation case:
(1/√n) X'Ω⁻¹ε = (1/√n) Σᵢ Σⱼ ωⁱʲ xᵢεⱼ
Does the double sum converge? Uncertain. Requires the elements of Ω⁻¹ to become small as the distance between i and j increases. (It has to resemble the heteroscedasticity case.)
Test Statistics (Assuming Known Ω)
• With known Ω, apply all familiar results to the transformed model.
• With normality, t and F statistics apply to least squares based on Py and PX
• With asymptotic normality, use Wald statistics and the chi-squared distribution, still based on the transformed model.
Generalized (Weighted) Least Squares: Heteroscedasticity
Var[ε] = σ² diag(ω₁, ω₂, …, ωₙ)
Ω^(-1/2) = diag(1/√ω₁, 1/√ω₂, …, 1/√ωₙ)
β̂ = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹y = [Σᵢ (1/ωᵢ) xᵢxᵢ']⁻¹ [Σᵢ (1/ωᵢ) xᵢyᵢ]
σ̂² = [Σᵢ (yᵢ − xᵢ'β̂)²/ωᵢ] / (n − K)
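The weighted-sum form of the estimator can be sketched as follows (illustrative simulated data, with the ωᵢ assumed known; all names are our own):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
omega = rng.uniform(0.5, 4.0, size=n)       # known variance weights
beta = np.array([2.0, -1.0])
y = X @ beta + rng.normal(size=n) * np.sqrt(omega)

w = 1.0 / omega
# b = [sum (1/w_i) x_i x_i']^-1 [sum (1/w_i) x_i y_i]
A = sum(w[i] * np.outer(X[i], X[i]) for i in range(n))
c = sum(w[i] * X[i] * y[i] for i in range(n))
b_gls = np.linalg.solve(A, c)

# sigma^2-hat = [sum (y_i - x_i'b)^2 / omega_i] / (n - K)
resid = y - X @ b_gls
s2 = np.sum(resid**2 / omega) / (n - K)
```

The observation-by-observation sums above are, of course, just the matrix product (X'Ω⁻¹X)⁻¹X'Ω⁻¹y written out.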
Autocorrelation
ε_t = ρε_{t-1} + u_t  ('First order autocorrelation.' How does this come about?)
Assume −1 < ρ < 1. Why?
u_t = 'nonautocorrelated white noise'
ε_t = ρε_{t-1} + u_t  (the autoregressive form)
= ρ(ρε_{t-2} + u_{t-1}) + u_t
= … (continue to substitute)
= u_t + ρu_{t-1} + ρ²u_{t-2} + ρ³u_{t-3} + …  (the moving average form)
(Some observations about modeling time series.)
Autocorrelation
Var[ε_t] = Var[u_t + ρu_{t-1} + ρ²u_{t-2} + …]
= Σᵢ₌₀ ρ²ⁱ σᵤ²
= σᵤ²/(1 − ρ²)
An easier way: since Var[ε_t] = Var[ε_{t-1}] and u_t is uncorrelated with ε_{t-1},
Var[ε_t] = ρ²Var[ε_{t-1}] + Var[u_t] + 2ρ Cov[ε_{t-1}, u_t]
= ρ²Var[ε_t] + σᵤ² + 0,
so Var[ε_t] = σᵤ²/(1 − ρ²).
Autocovariances
Continuing…
Cov[ε_t, ε_{t-1}] = Cov[ρε_{t-1} + u_t, ε_{t-1}]
= ρ Cov[ε_{t-1}, ε_{t-1}] + Cov[u_t, ε_{t-1}]
= ρ Var[ε_{t-1}] = ρσᵤ²/(1 − ρ²)
Cov[ε_t, ε_{t-2}] = Cov[ρε_{t-1} + u_t, ε_{t-2}]
= ρ Cov[ε_{t-1}, ε_{t-2}] + Cov[u_t, ε_{t-2}]
= ρ Cov[ε_t, ε_{t-1}] = ρ²σᵤ²/(1 − ρ²), and so on.
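A quick simulation check of these autocovariances, with our own (illustrative) choices of ρ and σᵤ, confirms that the empirical moments track ρᵏσᵤ²/(1 − ρ²):

```python
import numpy as np

rng = np.random.default_rng(4)
rho, sigma_u, T = 0.7, 1.0, 100_000
u = rng.normal(scale=sigma_u, size=T)
eps = np.zeros(T)
for t in range(1, T):                        # simulate e_t = rho*e_{t-1} + u_t
    eps[t] = rho * eps[t - 1] + u[t]

var_theory = sigma_u**2 / (1 - rho**2)       # Var[e_t] from the derivation
# empirical autocovariances Cov[e_t, e_{t-k}] for k = 0, 1, 2, 3
gamma = [np.mean(eps[k:] * eps[:T - k]) for k in range(4)]
```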
Autocorrelation Matrix
σ²Ω = [σᵤ²/(1 − ρ²)] ×
[ 1         ρ         ρ²        …  ρ^(T-1) ]
[ ρ         1         ρ         …  ρ^(T-2) ]
[ ρ²        ρ         1         …  ρ^(T-3) ]
[ …         …         …         …  …       ]
[ ρ^(T-1)   ρ^(T-2)   ρ^(T-3)   …  1       ]
(Note, trace(Ω) = n as required.)
Generalized Least Squares
Ω^(-1/2) =
[ √(1 − ρ²)   0    0   …  0 ]
[ −ρ          1    0   …  0 ]
[ 0          −ρ    1   …  0 ]
[ …           …    …   …  … ]
[ 0           0    0   −ρ  1 ]
Ω^(-1/2) y =
[ √(1 − ρ²) y₁ ]
[ y₂ − ρy₁     ]
[ y₃ − ρy₂     ]
[ …            ]
[ y_T − ρy_{T-1} ]
The Autoregressive Transformation
y_t = x_t'β + ε_t,  ε_t = ρε_{t-1} + u_t
ρy_{t-1} = ρx_{t-1}'β + ρε_{t-1}
y_t − ρy_{t-1} = (x_t − ρx_{t-1})'β + (ε_t − ρε_{t-1})
y_t − ρy_{t-1} = (x_t − ρx_{t-1})'β + u_t
(Where did the first observation go?)
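A sketch of the transformation with ρ treated as known. Keeping the first observation, scaled by √(1 − ρ²), gives the Prais–Winsten version; dropping it gives Cochrane–Orcutt. Data, seed, and names are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
T, rho = 500, 0.6
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
u = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):                        # AR(1) disturbances
    eps[t] = rho * eps[t - 1] + u[t]
y = X @ np.array([1.0, 2.0]) + eps

# Rows 2..T: y_t - rho*y_{t-1} = (x_t - rho*x_{t-1})'beta + u_t
# Row 1: rescaled by sqrt(1 - rho^2) so the transformed errors are homoscedastic
ys = np.concatenate([[np.sqrt(1 - rho**2) * y[0]], y[1:] - rho * y[:-1]])
Xs = np.vstack([np.sqrt(1 - rho**2) * X[0], X[1:] - rho * X[:-1]])
b_gls = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)   # OLS on transformed data = GLS
```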
Unknown Ω
• The problem (of course) is that Ω is unknown. For now, we will consider two methods of estimation:
– Two step, or feasible estimation. Estimate θ first, then do GLS. Emphasize - same logic as White and Newey-West. We don't need to estimate Ω itself. We need to find a matrix that behaves the same as (1/n)X'Ω⁻¹X.
– Properties of the feasible GLS estimator
• Maximum likelihood estimation of θ, σ², and β all at the same time.
– Joint estimation of all parameters. Fairly rare. Some generalities…
– We will examine two applications: Harvey's model of heteroscedasticity and Beach-MacKinnon on the first order autocorrelation model
Specification
• Ω must be specified first.
• A full unrestricted Ω contains n(n+1)/2 - 1 parameters. (Why minus 1? Remember, tr(Ω) = n, so one element is determined.)
• Ω is generally specified in terms of a few parameters. Thus, Ω = Ω(θ) for some small parameter vector θ. It becomes a question of estimating θ.
• Examples:
Heteroscedasticity: Harvey's Model
• Var[εᵢ | X] = σ² exp(θ'zᵢ)
• Cov[εᵢ, εⱼ | X] = 0
e.g.: zᵢ = firm size
e.g.: zᵢ = a set of dummy variables (e.g., countries) (The groupwise heteroscedasticity model.)
• σ²Ω = diagonal[exp(θ₀ + θ'zᵢ)], θ₀ = log(σ²)
AR(1) Model of Autocorrelation
σ²Ω = [σᵤ²/(1 − ρ²)] ×
[ 1         ρ         ρ²        …  ρ^(T-1) ]
[ ρ         1         ρ         …  ρ^(T-2) ]
[ ρ²        ρ         1         …  ρ^(T-3) ]
[ …         …         …         …  …       ]
[ ρ^(T-1)   ρ^(T-2)   ρ^(T-3)   …  1       ]
Two Step Estimation
The general result for estimation when Ω is estimated.
GLS uses [X'Ω⁻¹X]⁻¹ X'Ω⁻¹y, which converges in probability to β.
We seek a vector which converges to the same thing that this does. Call it "Feasible GLS" or FGLS, based on [X'Ω̂⁻¹X]⁻¹ X'Ω̂⁻¹y.
The object is to find a set of parameters such that
[X'Ω̂⁻¹X]⁻¹ X'Ω̂⁻¹y − [X'Ω⁻¹X]⁻¹ X'Ω⁻¹y → 0.
Feasible GLS
For FGLS estimation, we do not seek an estimator Ω̂ such that Ω̂ − Ω → 0. This makes no sense, since Ω̂ is n×n and does not "converge" to anything. We seek a matrix Ω̂ such that
(1/n)X'Ω̂⁻¹X − (1/n)X'Ω⁻¹X → 0.
For the asymptotic properties, we will also require that
(1/n)X'Ω̂⁻¹ε − (1/n)X'Ω⁻¹ε → 0.
Note in this case, these are two random vectors, which we require to converge to the same random vector.
Two Step FGLS
(Theorem 8.5) To achieve full efficiency, we do not need an efficient estimate of the parameters in Ω, only a consistent one. Why?
Harvey's Model
Examine Harvey's model once again.
Methods of estimation:
Two step FGLS: Use the least squares residuals to estimate θ, then use {X'[Ω(θ̂)]⁻¹X}⁻¹ X'[Ω(θ̂)]⁻¹y to estimate β.
Full maximum likelihood estimation. Estimate all parameters simultaneously.
A handy result due to Oberhofer and Kmenta - the “zig-zag” approach.
Examine a model of groupwise heteroscedasticity.
Harvey's Model for Groupwise Heteroscedasticity
Groupwise sample: yᵢg, xᵢg, …
G groups, each with Ng observations.
Var[εᵢg] = σg²
Let dⱼ = 1 if the observation is in group j, 0 else = group dummy variable.
Var[εᵢg] = σ₁² exp(θ₂d₂ + … + θGdG)
Var₁ = σ₁², Var₂ = σ₁² exp(θ₂), and so on.
Estimating Variance Components
• OLS is still consistent:
• Est.Var₁ = e₁'e₁/N₁ estimates σ₁²
• Est.Var₂ = e₂'e₂/N₂ estimates σ₁² exp(θ₂)
• Estimator of θ₂ is ln[(e₂'e₂/N₂)/(e₁'e₁/N₁)]
• (1) Now use FGLS – weighted least squares
• Recompute residuals using WLS slopes
• (2) Recompute variance estimators
• Iterate to a solution… between (1) and (2)
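The iteration between steps (1) and (2) can be sketched in a few lines of numpy. The group structure, seed, and true variances below are our own illustration, not Greene's example:

```python
import numpy as np

rng = np.random.default_rng(6)
G, Ng = 3, 200
sigma2 = np.array([1.0, 4.0, 9.0])          # true group variances (illustrative)
group = np.repeat(np.arange(G), Ng)
n = G * Ng
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(sigma2[group])

v = np.ones(G)                              # start from OLS (equal weights)
for _ in range(20):
    w = 1.0 / v[group]
    # (1) weighted least squares given the current group variance estimates
    b = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y * w))
    e = y - X @ b
    # (2) re-estimate each group's variance from the current residuals
    v = np.array([np.mean(e[group == g] ** 2) for g in range(G)])
```

At convergence, b is the groupwise FGLS estimator and v holds the estimated group variances.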