EFFICIENCY GAINS DUE TO USING MISSING DATA PROCEDURES
IN REGRESSION MODELS
Th.E. Nijman
F.C. Palm
November 1986
In the chapter on "Economic Data Issues" of the Handbook of Econometrics,
Griliches (1986) analyzes the asymptotic variance of an estimator of a
regression coefficient using imputations for the missing regressor values
and compares it with that of an estimation procedure based on the complete
observations only. His derivation of an expression for the relative
efficiency is incorrect. In this note, we give the correct result and
show that the relative efficiency of three estimators designed to handle
incomplete samples depends on parameters that have a straightforward
statistical interpretation. In terms of a gain of asymptotic efficiency,
the use of these estimators is equivalent to the observation of a percentage
of the values which are actually missing. This percentage depends on
three $R^2$ measures only, which can be straightforwardly computed in applied
work. It should therefore be easy in practice to check whether it is
worthwhile to use a more elaborate estimator.
The authors thank Professor Z. Griliches for his comments on an earlier version of this note.
Department of Econometrics, Tilburg University, P.O.B. 90153, 5000 LE Tilburg, The Netherlands.
Department of Economics, University of Limburg, P.O.B. 616, 6200 MD Maastricht, The Netherlands.
Griliches (1986) considers the following regression model

$$y_i = \beta x_i + \gamma z_i + \varepsilon_i, \qquad \varepsilon_i \sim IN(0, \sigma^2), \tag{1}$$

and

$$x_i = \delta z_i + v_i, \qquad v_i \sim IN(0, \sigma_v^2), \tag{2}$$

where the regressors $x_i$ and $z_i$ are assumed to be independent of the
corresponding disturbances $\varepsilon_i$ and $v_i$. Actually, the normality
assumption is not made by Griliches, but it will be required below
for maximum likelihood (ML) estimation and it does not affect the
results for the other estimators. The variables $y_i$ and $z_i$ are observed
for $i = 1, \ldots, N_1 + N_2$, whereas $x_i$ is observed for $i = 1, \ldots, N_1$ only.

Besides the OLS estimator of the regression of $y_i$ on $x_i$ and $z_i$ for
$i = 1, \ldots, N_1$ only, denoted by $\hat\beta_a$ and $\hat\gamma_a$, Griliches (1986) considers
an estimation procedure in which the missing $x_i$'s are replaced by
$\hat\delta_a z_i$, where $\hat\delta_a$ is the OLS estimate of $\delta$ in (2) using the first $N_1$
observations. The estimate $\hat\gamma_{a+b}$ is subsequently computed by OLS on

$$y_i - \hat\beta_a \hat x_i = \gamma z_i + w_i, \tag{3}$$

where $w_i = \varepsilon_i + \lambda_i \beta v_i + \lambda_i(\beta\delta - \hat\beta_a \hat\delta_a) z_i$, with $\hat x_i = x_i$ and
$\lambda_i = 0$ if $i \le N_1$, and $\hat x_i = \hat\delta_a z_i$ and $\lambda_i = 1$ otherwise. It
is straightforward to show that $\hat\gamma_{a+b}$ can be alternatively computed
by OLS of $y_i$ on $\hat x_i$ and $z_i$:

$$y_i = \beta \hat x_i + \gamma z_i + \{\varepsilon_i + \lambda_i \beta v_i + \beta \lambda_i (\delta - \hat\delta_a) z_i\}. \tag{4}$$

Contrary to what is stated by Griliches (1986), the contribution of
$\beta \lambda_i (\delta - \hat\delta_a) z_i$ to the asymptotic variance of $\hat\gamma_{a+b}$ is not negligible
if $\operatorname{plim} N_2 N^{-1} = \lambda > 0$ for $N \to \infty$, with $N = N_1 + N_2$.
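The equivalence between the two-step computation in (3) and pooled OLS on (4) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names (`ols2`, `imputation_estimates`, `simulate`), the parameter values and the simulated data are hypothetical and do not come from Griliches (1986) or from this note. Regressions are run without intercept, as in (1) and (2).

```python
import random


def ols2(y, x, z):
    """OLS of y on (x, z) without intercept; returns (beta_hat, gamma_hat)."""
    sxx = sum(a * a for a in x)
    szz = sum(c * c for c in z)
    sxz = sum(a * c for a, c in zip(x, z))
    sxy = sum(a * b for a, b in zip(x, y))
    szy = sum(c * b for c, b in zip(z, y))
    det = sxx * szz - sxz * sxz
    beta_hat = (szz * sxy - sxz * szy) / det
    gamma_hat = (sxx * szy - sxz * sxy) / det
    return beta_hat, gamma_hat


def imputation_estimates(y, x_obs, z):
    """x_obs holds the first N1 values of x; y and z hold all N1 + N2 values."""
    n1 = len(x_obs)
    # Complete-case OLS on the first N1 observations.
    beta_a, gamma_a = ols2(y[:n1], x_obs, z[:n1])
    # OLS estimate of delta in (2) from the complete observations.
    delta_a = (sum(a * c for a, c in zip(x_obs, z[:n1]))
               / sum(c * c for c in z[:n1]))
    # Replace the missing x_i by delta_a * z_i.
    x_hat = list(x_obs) + [delta_a * c for c in z[n1:]]
    # Two-step route (3): regress y_i - beta_a * x_hat_i on z_i, all observations.
    resid = [b - beta_a * a for b, a in zip(y, x_hat)]
    gamma_two_step = sum(c * r for c, r in zip(z, resid)) / sum(c * c for c in z)
    # Pooled route (4): OLS of y_i on x_hat_i and z_i, all observations.
    beta_pooled, gamma_pooled = ols2(y, x_hat, z)
    return {
        "beta_a": beta_a,
        "gamma_a": gamma_a,
        "gamma_two_step": gamma_two_step,
        "beta_pooled": beta_pooled,
        "gamma_pooled": gamma_pooled,
    }


def simulate(n1=200, n2=100, beta=1.0, gamma=1.0, delta=0.5, seed=0):
    """Draw hypothetical data from (1)-(2) with unit variances; x is kept for i <= n1."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n1 + n2)]
    x = [delta * c + rng.gauss(0.0, 1.0) for c in z]
    y = [beta * a + gamma * c + rng.gauss(0.0, 1.0) for a, c in zip(x, z)]
    return y, x[:n1], z


y, x_obs, z = simulate()
estimates = imputation_estimates(y, x_obs, z)
for name, value in estimates.items():
    print(f"{name:>15s} = {value:.6f}")
```

Up to rounding error, `beta_pooled` coincides with `beta_a` and `gamma_pooled` with `gamma_two_step`: the residual of $\hat x_i$ regressed on $z_i$ is zero for every imputed observation, so the imputed observations carry no information on $\beta$.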
As the three components of the disturbance of (4) are independent,
the large sample distribution of $\hat\gamma_{a+b}$ is given by

$$\sqrt{N}\,(\hat\gamma_{a+b} - \gamma) \overset{a}{\sim} N(0, V)$$

with

$$V = \operatorname{plim} N (X'X)^{-1}\{X'\Omega X + X'W \beta^2 \Sigma W'X\}(X'X)^{-1}, \tag{5}$$

where $X$ is the matrix of regressors in (4), $W$ is a vector with typical
element $\lambda_i z_i$, $\Omega$ is a diagonal matrix with typical element $\sigma^2 + \lambda_i \beta^2 \sigma_v^2$,
and $\Sigma$ is the asymptotic variance of $\hat\delta_a$.
After some algebra, we get

$$V = \{(1 - r_{xz}^2)^{-1} + \lambda (U^{-1} - 2)\}\,\frac{\sigma^2}{(1 - \lambda)\,\sigma_z^2}, \tag{6}$$

where $\sigma_z^2 = \operatorname{plim} N^{-1} \sum_{i=1}^{N} z_i^2$ (assumed to exist), $U = \sigma^2 (\beta^2 \sigma_v^2 + \sigma^2)^{-1}$ and
$r_{xz}^2$ is the theoretical $R^2$ of the regression (2) of $x$ on $z$. The result
in (6) has been obtained by Gouriéroux and Monfort [1981, expression
(11) on p. 583].
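One way to see where an expression of the form (6) comes from is the following first-order expansion. It is a sketch, not the derivation used in the note: it assumes that $z_i$, $v_i$ and $\varepsilon_i$ are mutually independent with finite second moments, that $N_1/N \to 1 - \lambda$, and the labels $T_1, \ldots, T_4$ are introduced here for illustration only.

```latex
\begin{align*}
\hat\gamma_{a+b} - \gamma \;\approx\;&
  \underbrace{\frac{\sum_{i=1}^{N} z_i \varepsilon_i}{N \sigma_z^2}}_{T_1}
  + \underbrace{\frac{\beta \sum_{i > N_1} z_i v_i}{N \sigma_z^2}}_{T_2}
  - \underbrace{\frac{\delta \sum_{i \le N_1} v_i \varepsilon_i}{N_1 \sigma_v^2}}_{T_3 \text{ (from } \hat\beta_a)}
  - \underbrace{\frac{\beta \lambda \sum_{i \le N_1} z_i v_i}{N_1 \sigma_z^2}}_{T_4 \text{ (from } \hat\delta_a)} \\[4pt]
N \operatorname{Var}(T_1) \to \frac{\sigma^2}{\sigma_z^2}, \quad
& N \operatorname{Var}(T_2) \to \frac{\lambda \beta^2 \sigma_v^2}{\sigma_z^2}, \quad
N \operatorname{Var}(T_3) \to \frac{\delta^2 \sigma^2}{(1 - \lambda) \sigma_v^2}, \quad
N \operatorname{Var}(T_4) \to \frac{\lambda^2 \beta^2 \sigma_v^2}{(1 - \lambda) \sigma_z^2} \\[4pt]
\text{With } \frac{\beta^2 \sigma_v^2}{\sigma^2} = U^{-1} - 1
& \text{ and } \frac{\delta^2 \sigma_z^2}{\sigma_v^2} = \frac{r_{xz}^2}{1 - r_{xz}^2},
\text{ and the four terms uncorrelated:} \\[4pt]
V &= \frac{\sigma^2}{\sigma_z^2}
  \left\{ 1 + \frac{\lambda}{1 - \lambda}\,(U^{-1} - 1)
        + \frac{r_{xz}^2}{(1 - \lambda)(1 - r_{xz}^2)} \right\} \\
  &= \frac{\sigma^2}{(1 - \lambda)\,\sigma_z^2}
  \left\{ (1 - r_{xz}^2)^{-1} + \lambda\,(U^{-1} - 2) \right\}.
\end{align*}
```

Dropping $T_4$, the term due to the estimation error in $\hat\delta_a$, replaces $\lambda/(1-\lambda)$ by $\lambda$ in the first line for $V$, which is how neglecting that term changes the condition for an efficiency gain.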
The relative efficiency of $\hat\gamma_{a+b}$ with respect to $\hat\gamma_a$ is

$$\mathrm{Eff}(\hat\gamma_{a+b}) = \frac{\mathrm{Avar}(\sqrt{N}\,\hat\gamma_{a+b})}{\mathrm{Avar}(\sqrt{N}\,\hat\gamma_a)} = 1 + \lambda (U^{-1} - 2)(1 - r_{xz}^2). \tag{7}$$

According to (7), using imputed values as in (3) leads to a gain of
efficiency over using the complete observations only if $U > \tfrac{1}{2}$,
which is more stringent than the condition $U > (1 - \lambda)/(2 - \lambda)$ given by
Griliches (1986). Both conditions require that the unpredictable part
of $x$ from $z$ is not too important relative to $\sigma^2$, the overall noise
level of (1). As

$$U = \frac{1 - r_{y \cdot xz}^2}{1 - r_{yz}^2}, \tag{8}$$

where $r_{y \cdot xz}^2$ and $r_{yz}^2$ denote the theoretical $R^2$'s of a regression of $y$ on
$x$ and $z$ and on $z$ only, respectively, it is obvious that a sufficient
condition for an efficiency gain is $r_{y \cdot xz}^2 < \tfrac{1}{2}$, i.e. the predictable part of
$y$ is small.
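A short check of (8) and of the sufficient condition can be written as follows. This is a sketch: the symbol $m_y$ for the limiting second moment of $y$ is introduced here for illustration only and does not appear in the note.

```latex
\begin{align*}
y_i &= (\beta\delta + \gamma) z_i + \beta v_i + \varepsilon_i
  \quad\Longrightarrow\quad
  m_y = \operatorname{plim} N^{-1} \sum_{i=1}^{N} y_i^2
      = (\beta\delta + \gamma)^2 \sigma_z^2 + \beta^2 \sigma_v^2 + \sigma^2, \\
1 - r_{y \cdot xz}^2 &= \frac{\sigma^2}{m_y}, \qquad
1 - r_{yz}^2 = \frac{\beta^2 \sigma_v^2 + \sigma^2}{m_y}, \\
\frac{1 - r_{y \cdot xz}^2}{1 - r_{yz}^2}
  &= \frac{\sigma^2}{\beta^2 \sigma_v^2 + \sigma^2} = U, \\
r_{y \cdot xz}^2 < \tfrac{1}{2}
  &\;\Longrightarrow\; U \ge 1 - r_{y \cdot xz}^2 > \tfrac{1}{2}
  \qquad (\text{since } 0 < 1 - r_{yz}^2 \le 1).
\end{align*}
```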
As noted by Griliches (1986) and others, an efficiency gain is assured
if (4) is estimated by a generalized least squares (GLS) method which
takes the correlation structure of the disturbance in (4) into account.
Again, the term $\beta \lambda_i (\delta - \hat\delta_a) z_i$ cannot be neglected (see Palm and Nijman
(1982) and Nijman and Palm (1985)). Alternatively, the fully efficient
ML estimator can be computed, e.g. using the convenient reparametrisation
suggested by Gouriéroux and Monfort (1981). From their results, the
relative efficiency of the GLS and ML estimators with respect to that of $\hat\gamma_a$
can be obtained:

$$\mathrm{Eff}(\hat\gamma_{GLS}) = 1 - \lambda U (1 - r_{xz}^2) \tag{9}$$

and

$$\mathrm{Eff}(\hat\gamma_{ML}) = 1 - \lambda U (1 - r_{xz}^2) - 2 \lambda U (1 - U)\, r_{xz}^2. \tag{10}$$
The relative efficiency in (7), (9) and (10) only depends on the three
magnitudes $\lambda$, $U$ and $r_{xz}^2$. Equation (9) indicates that in terms of a gain
of asymptotic efficiency, the use of GLS is equivalent to the observation
of $100\, U (1 - r_{xz}^2)$ % of the values of $x_i$ that are actually missing. Similar
expressions can be obtained from (7) and (10) for $\hat\gamma_{a+b}$ and $\hat\gamma_{ML}$
respectively. The values in Table 1 illustrate this result.
TABLE 1. Percentage of missing observations that are regained by the
use of missing data procedures instead of the complete data only.

Here $U = (1 - r_{y \cdot xz}^2)/(1 - r_{yz}^2)$; the last three columns give the gain in percentage points for each estimator.

  $U$     $r_{xz}^2$     $\hat\gamma_{a+b}$     $\hat\gamma_{GLS}$     $\hat\gamma_{ML}$
  .3      .2             -106                   24                     32
  .3      .8              -27                    6                     40
  .6      .2               27                   48                     58
  .6      .8                7                   12                     50
  .9      .2               71                   72                     76
  .9      .8               18                   18                     32
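The entries of Table 1 follow from (7), (9) and (10) by writing each relative efficiency as $1 - \lambda p$ and reporting $100\,p$. The sketch below recomputes them; the function names `regained_share` and `relative_efficiency` are hypothetical, chosen for illustration. The computed values agree with the table after rounding, except that the first entry comes out at about -106.7 where the table prints -106.

```python
def regained_share(U, r2_xz):
    """Share p of the missing x-values 'regained', from Eff = 1 - lambda * p.

    Returns (imputation, gls, ml), based on (7), (9) and (10) respectively.
    A negative share means the estimator is less efficient than complete-case OLS.
    """
    imputation = -(1.0 / U - 2.0) * (1.0 - r2_xz)   # from (7)
    gls = U * (1.0 - r2_xz)                         # from (9)
    ml = gls + 2.0 * U * (1.0 - U) * r2_xz          # from (10)
    return imputation, gls, ml


def relative_efficiency(lam, U, r2_xz):
    """Relative efficiencies (7), (9), (10) for a missing fraction lam."""
    return tuple(1.0 - lam * share for share in regained_share(U, r2_xz))


print("  U   r2_xz   a+b      GLS     ML")
for U in (0.3, 0.6, 0.9):
    for r2_xz in (0.2, 0.8):
        imp, gls, ml = regained_share(U, r2_xz)
        print(f"{U:4.1f}  {r2_xz:4.1f}  {100 * imp:7.1f}  {100 * gls:6.1f}  {100 * ml:6.1f}")
```

In applied work the same functions can be called with sample $R^2$ values in place of the theoretical ones, e.g. `regained_share((1 - R2_y_xz) / (1 - R2_yz), R2_xz)`, where the three `R2_*` names stand for estimates supplied by the user.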
Note that a good fit in (2), yielding a "good proxy" for the missing
values of $x_i$, does not imply that a large part of the missing information
on $x_i$ can be recovered, because of the induced multicollinearity between
$\hat x_i$ and $z_i$ in (4). Especially when $r_{xz}^2$ is small, the efficiency gain
obtained by using the appropriate estimators can be substantial. The
value of $U$ is crucial for the efficiency of $\hat\gamma_{a+b}$. The loss of efficiency
can be important when $U < \tfrac{1}{2}$. This loss increases as $r_{xz}^2$ decreases.
Finally, if $U$ is close to one, i.e. $x_i$ is not very important in
explaining $y$ in equation (1), all three approaches which take into
account the incomplete data yield about equally efficient estimators.
References
Gouriéroux, C., and A. Monfort (1981), "On the problem of missing data in linear models", Review of Economic Studies, 48, 579-586.
Griliches, Z. (1986), "Economic data issues", in Z. Griliches and M.D. Intriligator, eds., Handbook of Econometrics, North-Holland, Amsterdam, 1466-1514.
Nijman, Th.E., and F.C. Palm (1985), "Consistent estimation of a regression model with incompletely observed exogenous variable", Netherlands Central Bureau of Statistics, unpublished paper.
Palm, F.C., and Th.E. Nijman (1982), "Linear regression using both temporally aggregated and temporally disaggregated data", Journal of Econometrics, 19, 333-343.