
ROBUST TESTS FOR HETEROSKEDASTICITY AND AUTOCORRELATION USING SCORE FUNCTION

Anil K. Bera
Department of Economics
University of Illinois
Champaign, IL 61820
USA

Pin T. Ng
Department of Economics
University of Houston
Houston, TX 77204-5882
USA

Key Words and Phrases: Nonparametric statistics; score function; specification tests; heteroskedasticity; autocorrelation; robustness

ABSTRACT

The standard Lagrange multiplier test for heteroskedasticity was originally developed assuming normality of the disturbance term [see Godfrey (1978b) and Breusch and Pagan (1979)]. The resulting test therefore depends heavily on the normality assumption. Koenker (1981) suggests a studentized form which is robust to nonnormality, but this approach seems limited because no general procedure is available that transforms a given test into a robust one. Following Bickel (1978), we use a different approach to take account of nonnormality. Our tests are based on the score function, defined as the negative derivative of the log-density function with respect to the underlying random variable. To implement the tests we use a nonparametric estimate of the score function. Our robust test for heteroskedasticity is obtained by running a regression of the product of the score function and the ordinary least squares residuals on some exogenous variables which are thought to be causing the heteroskedasticity. We also use our procedure to develop a robust test for autocorrelation, which can be computed by regressing the score function on the lagged ordinary least squares residuals and the independent variables. Finally, we carry out an extensive Monte Carlo study which demonstrates that our proposed tests have superior finite sample properties compared to the standard tests.

1 Introduction

Conventional model specification tests are performed with some parametric, usually Gaussian, assumptions on the stochastic process generating a model. These parametric specification tests have the drawback of having incorrect sizes, suboptimal power, or even being inconsistent when any of the parametric specifications of the stochastic process is incorrect [see Box (1953), Tukey (1960), Bickel (1978) and Koenker (1981) for theoretical arguments, and Bera and Jarque (1982), Bera and McKenzie (1986), and Davidson and MacKinnon (1983) for Monte Carlo evidence]. In this paper, we use a nonparametric estimate of the score function to develop tests for heteroskedasticity and autocorrelation which are robust to distributional misspecifications.

The importance of the score function, defined as ψ(x) = -[log f(x)]' = -f'(x)/f(x), where f(x) is the probability density function of a random variable, to robust statistical procedures has been sporadically mentioned, implicitly or explicitly, throughout the past few decades [see, e.g., Hampel (1973, 1974), Bickel (1978), Koenker (1982), Joiner and Hall (1983), Manski (1984), and Cox (1985)]. Only during the past decade have numerous works been done on nonparametric estimation of the score function [see Stone (1975), Csörgő and Révész (1983), Manski (1984), Cox (1985), Cox and Martin (1988), and Ng (1994, 1991)]. These facilitate our development of nonparametric specification tests using the score function without making any explicit parametric assumption on the underlying distribution. We therefore expect our procedures to be immune to the loss of power and incorrect sizes caused by distributional misspecifications.

The use of the score function in the context of model specification testing is not new.
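As a concrete illustration (not part of the paper), the score function of a known density can be computed directly from its definition; for the N(0, σ²) density, ψ(x) = x/σ². A minimal numerical check in Python, using a central difference for f'(x):

```python
import math

def normal_pdf(x, sigma=1.0):
    # density of N(0, sigma^2)
    return math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def score_numeric(pdf, x, h=1e-6):
    # psi(x) = -f'(x)/f(x), with f'(x) approximated by a central difference
    fprime = (pdf(x + h) - pdf(x - h)) / (2 * h)
    return -fprime / pdf(x)

# For N(0, sigma^2) the score is exactly x / sigma^2
x, sigma = 1.5, 2.0
approx = score_numeric(lambda t: normal_pdf(t, sigma), x)
exact = x / sigma ** 2
print(abs(approx - exact) < 1e-6)  # True
```

The linear score x/σ² is exactly what makes normal-theory tests non-robust: heavy-tailed densities have bounded, even receding, scores in the tails, a point the paper returns to in Section 5.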
Robustifying the procedures of Anscombe (1961) and Anscombe and Tukey (1963), Bickel (1982) derives test statistics for nonlinearity and heteroskedasticity which implicitly use the score function [see also Pagan and Pak (1991)]. In this paper, we follow the Lagrange multiplier test procedure and derive test statistics which turn out to be functions of the score function.

Our nonparametric test for heteroskedasticity is obtained by running a regression of the product of the score function and the ordinary least squares residuals on some exogenous variables which are thought to be causing the heteroskedasticity. The nonparametric autocorrelation test is performed by regressing the score function on the lagged residuals and the independent variables, which may include lagged dependent variables. We also show in the paper that when the normality assumption is true, our tests for heteroskedasticity and autocorrelation reduce to the familiar Breusch and Pagan (1979) or Godfrey (1978b) tests for heteroskedasticity and the Breusch (1978) or Godfrey

(1978a) tests for autocorrelation, respectively.

We perform an extensive Monte Carlo study which demonstrates that our proposed tests have superior finite sample properties compared to the standard tests when the innovation deviates from normality, while still retaining comparable performance under normal innovations.

The model and the test statistics are introduced and defined in Section 2. In Section 3, we derive the one-directional test statistics for heteroskedasticity and autocorrelation. Section 4 gives a brief review of existing score function estimation techniques and a description of the score estimator used in the Monte Carlo study. The finite sample performances of the conventional test statistics and our proposed nonparametric tests are reported in Section 5.

2 The Model and the Test Statistics

2.1 The Model

In order to compare our findings with those of previous studies, we consider the following general model, which incorporates various deviations from the classical linear regression model:

    γ(L) y_i = x_i'β + u_i,  i = 1, ..., n    (1)
    φ(L) u_i = ε_i    (2)

where y_i is a dependent variable, x_i is a k x 1 vector of non-stochastic explanatory variables, β is a k x 1 vector of unknown parameters, and γ(L) and φ(L) are polynomials in the lag operator with

    γ(L) = 1 - Σ_{j=1}^m γ_j L^j,   φ(L) = 1 - Σ_{j=1}^p φ_j L^j.

The normalized innovation term is defined as z_i = ε_i/σ_i. The innovation ε_i is independently distributed and has a symmetric probability density function f_ε(ε_i) = (1/σ_i) f_z(ε_i/σ_i) with the location parameter assumed to be zero and the scale parameter taking the form

    σ_i = √h(v_i'α)

in which v_i is a q x 1 vector of fixed variables having one as its first element, α' = (α_1, α_2') is a q x 1 vector of unknown parameters, and h is a known, smooth, positive function with continuous first derivative. The score function of the innovation ε_i is defined as

    ψ_ε(ε_i) = -f_ε'(ε_i)/f_ε(ε_i) = -(1/σ_i) f_z'(z_i)/f_z(z_i) = (1/σ_i) ψ_z(ε_i/σ_i).    (3)
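For concreteness, the scale specification σ_i = √h(v_i'α) can be simulated directly. The sketch below (purely illustrative; the choice h = exp, the parameter values, and the design are assumptions, not the paper's) generates data from the model with a single regressor that also drives the scale:

```python
import random
import math

random.seed(0)

# Hypothetical design: k = 2 regressors, q = 2 scale variables (first element one)
n = 200
beta = [1.0, 0.5]
alpha = [math.log(25.0), 0.1]   # h(v'alpha) = exp(v'alpha); alpha_2 = 0 gives homoskedasticity

y, X, V = [], [], []
for i in range(n):
    x = [1.0, random.gauss(10, 5)]
    v = [1.0, x[1]]                        # scale driven by the second regressor
    sigma_i = math.sqrt(math.exp(v[0] * alpha[0] + v[1] * alpha[1]))
    u = sigma_i * random.gauss(0, 1)       # epsilon_i = sigma_i * z_i with z_i ~ N(0,1)
    y.append(x[0] * beta[0] + x[1] * beta[1] + u)
    X.append(x)
    V.append(v)
```

The exponential choice of h guarantees positivity of the variance; any known smooth positive h with one as the first element of v_i fits the paper's framework.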

Model (1) and (2) can be written more compactly as

    y_i = Y_i'γ + x_i'β + u_i    (4)
    u_i = U_i'φ + ε_i    (5)

where

    Y_i = (y_{i-1}, ..., y_{i-m})',   U_i = (u_{i-1}, ..., u_{i-p})',
    γ = (γ_1, ..., γ_m)',   φ = (φ_1, ..., φ_p)'.

In matrix form the model is

    y = Yγ + Xβ + u = Wδ + u
    u = Uφ + ε

where y = (y_1, ..., y_n)', and Y, X, U and W are the matrices with rows Y_i', x_i', U_i' and W_i' = (Y_i', x_i') respectively, with δ = (γ', β')'.

2.2 Test Statistics

Most conventional hypothesis testing utilizes the likelihood ratio (LR), Wald (W), or Lagrange multiplier (LM) principle. Each has its own appeal. The LR test is favorable when a computer package conveniently produces the constrained and unconstrained likelihoods. The Wald test is preferable when the unrestricted MLE is easier to estimate. In model specification tests, LM is the preferred principle, since the null hypotheses can usually be written as restricting a subset of the parameters of interest to zero, and the restricted MLE becomes the OLS estimator for the classical normal linear model.

Even though our nonparametric approach to specification tests does not lead to the OLS estimator for the restricted MLE under the null hypothesis, we will demonstrate that the LM test can still use the OLS or some other consistent estimators, and that specification tests can be performed conveniently through most of the popular computer packages. For this reason, we concentrate solely on deriving the LM test statistics in this paper.

Let l_i(θ) be the log-density of the i-th observation, where θ is an s x 1 vector of parameters. The log-likelihood function for the n independent observations is then

    l ≡ l(θ) = Σ_{i=1}^n l_i(θ).

The hypothesis to be tested is

    H_0 : h(θ) = 0

where h(θ) is an r x 1 vector function of θ with r ≤ s. We denote H(θ) = ∂h(θ)/∂θ' and assume that rank(H) = r, i.e., there are no redundant restrictions. The LM statistic is given by

    LM = d̃' Ĩ^{-1} d̃

where d ≡ d(θ) = ∂l/∂θ is the score vector,

    I ≡ I(θ) = Var[d(θ)] = -E[∂²l/∂θ∂θ'] = E[(∂l/∂θ)(∂l/∂θ')]

is the information matrix, and the tildes indicate that the quantities are evaluated at the restricted MLE of θ. Under H_0, LM is asymptotically distributed as χ²_r.

3 Specification Tests

The usual one-directional specification tests of the model given by (1) and (2) in Section 2.1 involve testing the following hypotheses:

1. Homoskedasticity (H): H_0 : α_2 = 0, assuming φ = 0.
2. Serial Independence (I): H_0 : φ = 0, assuming α_2 = 0.

3.1 Test for Heteroskedasticity

Breusch and Pagan (1979) derived the LM test statistic for testing the presence of heteroskedasticity under the normality assumption. Here we provide the full derivation of the LM statistic, since the situation is somewhat different due to the nonparametric specification of the innovation distribution.

Assuming φ = 0, the p.d.f. of the stochastic process specified in Section 2.1 can be written as (1/σ_i) f_z(u_i/σ_i). We shall partition the vector of parameters of model (4) and (5) into

    θ = (δ', α_1, α_2')' = (θ_1', θ_2')'

with θ_1 = (δ', α_1)' and θ_2 = α_2.

The log-likelihood function is then given by

    l(θ) = Σ_{i=1}^n {log f_z(u_i/σ_i) - log σ_i}
         = Σ_{i=1}^n { log f_z[u_i/√h(v_i'α)] - (1/2) log[h(v_i'α)] }.

The score vector under H_0 becomes

    ∂l(θ)/∂γ |_θ̃ = Σ_{i=1}^n -{f_z'(ũ_i/σ̃)/f_z(ũ_i/σ̃)}(1/σ̃) Y_i = Σ_{i=1}^n ψ_z(ũ_i/σ̃)(1/σ̃) Y_i = 0

    ∂l(θ)/∂β |_θ̃ = Σ_{i=1}^n -{f_z'(ũ_i/σ̃)/f_z(ũ_i/σ̃)}(1/σ̃) x_i = Σ_{i=1}^n ψ_z(ũ_i/σ̃)(1/σ̃) x_i = 0

    ∂l(θ)/∂α |_θ̃ = (1/2) Σ_{i=1}^n { -[f_z'(ũ_i/σ̃)/f_z(ũ_i/σ̃)] (ũ_i/σ̃³) h'(α̃_1) - h'(α̃_1)/σ̃² } v_i
                  = [h'(α̃_1)/(2σ̃²)] Σ_{i=1}^n v_i [ψ_z(ũ_i/σ̃)(ũ_i/σ̃) - 1]

where σ̃² = h(α̃_1), ũ_i = y_i - Y_i'γ̃ - x_i'β̃, and α̃_1, γ̃ and β̃ are the restricted MLE obtained as solutions to the above first order conditions.

If we partition the information matrix into

    I = [ I_11  I_12 ]
        [ I_21  I_22 ]

corresponding to θ = (θ_1', θ_2')', we can see that

    I_12 = I_21' = -E[∂²l/∂θ_1∂θ_2']
         = E{ (1/2) Σ_{i=1}^n [h'(v_i'α)/σ_i²] [ψ_z(u_i/σ_i)(1/σ_i) W_i' + (u_i/σ_i²) ψ_z'(u_i/σ_i) W_i'] v_i' }.    (6)

With the assumption of a symmetric p.d.f. for u_i, I_12 = I_21' = 0, since both terms in (6) are odd functions. The lower right partition of I is given by

    I_22 = Var[d_2(θ)] = Var[∂l(θ)/∂θ_2].

Letting c_i = (1/2) h'(v_i'α)/σ_i² and g_i = ψ_z(u_i/σ_i)(u_i/σ_i) = ψ_ε(u_i) u_i, we get

    d_2(θ) = Σ_{i=1}^n c_i v_i (g_i - 1)

from the first order conditions. This gives us

    I_22 = Σ_{i=1}^n c_i² v_i Var(g_i) v_i'.

Denoting σ_g² = Var(g_i), we have

    I_22 = σ_g² Σ_{i=1}^n c_i² v_i v_i'.

We can estimate σ_g² by the consistent estimator

    σ̂_g² = Σ_{i=1}^n g_i²/n - (Σ_{i=1}^n g_i/n)²

and get

    σ̂_g̃² = σ̂_g²|_θ̃ = Σ_{i=1}^n g̃_i²/n - (Σ_{i=1}^n g̃_i/n)² = Σ_{i=1}^n g̃_i²/n - 1

since Σ_{i=1}^n g̃_i = n from the first order condition for α_1. Let g̃ = (g̃_1, ..., g̃_n)', let V be the n x q matrix with rows v_i', and let 1 be the n x 1 vector of ones.

Since the information matrix is block diagonal, the LM statistic for testing

    H_0 : [0 ⋮ I] (α_1, α_2')' = 0

can be written as

    LM_H = d̃_2' Ĩ_22^{-1} d̃_2
         = (1/σ_g̃²) (g̃ - 1)' V (V'V)^{-1} V' (g̃ - 1)
         = (1/σ_g̃²) { g̃'V(V'V)^{-1}V'g̃ - 2g̃'1 + n }
         = (1/σ_g̃²) { g̃'V(V'V)^{-1}V'g̃ - g̃'1(1'1)^{-1}1'g̃ }.
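Numerically, the quadratic form above is n times the centered R² from regressing g on V, which is how the paper later operationalizes the test. The sketch below verifies this identity under an illustrative normal-score plug-in ψ(u) = u/σ² (so g_i = u_i²/σ̂², the Breusch-Pagan special case); the data and design are assumptions for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
v = rng.normal(10, 5, n)
V = np.column_stack([np.ones(n), v])    # v_i has one as its first element
u = rng.normal(0, 5, n)                 # residuals under homoskedasticity (H0)

# Illustrative plug-in: normal score psi(u) = u / sigma^2, so g_i = u_i^2 / sigma^2
sigma2 = u @ u / n
g = u ** 2 / sigma2
sg2 = (g ** 2).mean() - g.mean() ** 2   # estimate of Var(g_i)

# Quadratic form (1/sg2) (g - 1)' V (V'V)^{-1} V' (g - 1)
r = g - 1.0
P = V @ np.linalg.inv(V.T @ V) @ V.T
LM = (r @ P @ r) / sg2

# Same number as n * (centered R^2) from regressing g on V
coef = np.linalg.lstsq(V, g, rcond=None)[0]
R2 = 1 - ((g - V @ coef) ** 2).sum() / ((g - g.mean()) ** 2).sum()
print(abs(LM - n * R2) < 1e-8)  # True
```

The identity holds because the column of ones lies in the column space of V and, with this plug-in, the sample mean of g is exactly one, so centering the regression reproduces the (g - 1) deviations.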

If we substitute σ̂_g̃² for σ_g̃² in LM_H, we get

    LM̂_H = (1/σ̂_g̃²) { g̃'V(V'V)^{-1}V'g̃ - g̃'1(1'1)^{-1}1'g̃ }.

Neither LM_H nor LM̂_H is feasible, because the score function ψ of the innovation is unknown and hence prevents us from solving for the restricted MLE α̃_1, γ̃, and β̃. To obtain a feasible version of the LM_H statistic, let γ̂ and β̂ be any weakly consistent estimators, e.g. the OLS estimators, of γ and β respectively, and let ψ̂ be a weakly consistent estimator of the true score function ψ over the interval [û_(1), û_(n)]. Here û_(1) and û_(n) are the extreme order statistics of the consistent residuals. Denoting ĝ_i = ψ̂(û_i) û_i and σ̂_ĝ² = Σ_{i=1}^n ĝ_i²/n - 1, we define our operational form of the LM statistic as

    LM̂̂_H = (1/σ̂_ĝ²) { ĝ'V(V'V)^{-1}V'ĝ - ĝ'1(1'1)^{-1}1'ĝ } = nR²

where R² is the centered coefficient of determination from running a regression of ĝ on V.

We now demonstrate that under the null hypothesis, LM̂̂_H is asymptotically distributed as χ²_{q-1}. Since LM_H is the standard Lagrange multiplier statistic, under H_0, LM_H converges in distribution to χ²_{q-1}. Under homoskedasticity we are in an i.i.d. setup, and hence σ̂_g̃² - σ_g̃² = o_p(1); under the null, LM_H and LM̂_H therefore have the same asymptotic distribution. Next we show the asymptotic equivalence of LM̂_H and LM̂̂_H. First, we note that under H_0, û_i - ũ_i = o_p(1). Since ψ is a continuous function, ψ(û_i) - ψ(ũ_i) = o_p(1). With ψ̂ being consistent over [û_(1), û_(n)], ĝ_i - g̃_i = o_p(1), and therefore σ̂_ĝ² - σ̂_g̃² = o_p(1). Next we consider the numerators of LM̂_H and LM̂̂_H. These numerators are based on the OLS regressions of, respectively, g̃ and ĝ on V. Let us denote η̃ = (V'V)^{-1}V'g̃ and η̂ = (V'V)^{-1}V'ĝ. Now, denoting d = ĝ - g̃, we have

    η̂ - η̃ = (V'V/n)^{-1} (V'd/n) = (V'V/n)^{-1} (1/n) Σ_{i=1}^n v_i d_i.

Cox (1985, p.276) showed that |d_i| = |ĝ_i - g̃_i| = O_p(n^{-λ}) for 0 < λ < 1/2. Therefore (1/n^{1-λ}) Σ_{i=1}^n |d_i| = O_p(1). Suppose |v_1|, ..., |v_n| are bounded by m < ∞; then we can write

    (1/n) |Σ_{i=1}^n v_i d_i| ≤ (1/n) Σ_{i=1}^n |v_i d_i|

    ≤ (m/n) Σ_{i=1}^n |d_i| = (m/n^λ) (1/n^{1-λ}) Σ_{i=1}^n |d_i| = o_p(1).

This establishes that LM̂_H and LM̂̂_H are asymptotically equivalent, and hence, under H_0, LM̂̂_H converges in distribution to χ²_{q-1}.

Several interesting special cases can easily be derived from LM̂̂_H by assuming different specifications for f_ε(ε_i). For example, under the normality assumption on f_ε(ε_i), ψ(u_i) = u_i/σ², and LM̂̂_H - LM_BP converges in probability to zero, where LM_BP is the LM statistic for testing heteroskedasticity in Breusch and Pagan (1979). If f_ε(ε_i) is a double exponential distribution [Box and Tiao (1973, p.157)], LM̂̂_H asymptotically becomes Glejser's (1969) statistic, which regresses |u_i| on v_i [see Pagan and Pak (1991)]. Finally, for the logistic innovation, our LM̂̂_H statistic is obtained by regressing u_i (e^{u_i} - 1)/(e^{u_i} + 1) on v_i. Note that the score functions for the double exponential and logistic distributions are bounded, and therefore the latter two tests might perform better for fat-tailed distributions.

3.2 Test for Serial Correlation

Given the model specified by (4) and (5) along with the assumption α_2 = 0, the null hypothesis of serial independence is

    H_0 : φ = 0.

Writing

    θ = (θ_1', θ_2')'   with   θ_1 = σ_1,   θ_2 = (γ', β', φ')',

our model for testing serial independence can be written as

    y_i = q_i(W_i, U_i, θ_2) + ε_i    (7)

where θ_2 is an (m + k + p) x 1 vector and the ε_i are i.i.d. with symmetric p.d.f. f_ε(ε_i) = (1/σ_1) f_z(ε_i/σ_1), in which σ_1 is the scale parameter.

The log-likelihood function is

    l(θ) = Σ_{i=1}^n {log f_z(ε_i/σ_1) - log σ_1}

and the first order conditions for the restricted MLE are

    ∂l/∂γ |_θ̃ = Σ_{i=1}^n ψ_z(ũ_i/σ̃)(1/σ̃) Y_i = 0
    ∂l/∂β |_θ̃ = Σ_{i=1}^n ψ_z(ũ_i/σ̃)(1/σ̃) x_i = 0
    ∂l/∂φ |_θ̃ = Σ_{i=1}^n ψ_z(ũ_i/σ̃)(1/σ̃) Ũ_i

where the tildes again denote quantities evaluated at the restricted MLE, ũ_i = y_i - Y_i'γ̃ - x_i'β̃, and Ũ_i = (ũ_{i-1}, ..., ũ_{i-p})'.

With the symmetry assumption on f_ε(ε_i), as before, it can easily be proved that

    E[∂²l(θ)/∂θ_2∂θ_1] = 0.

We therefore only need to evaluate d_2 and I_22 if we are testing for restrictions on θ_2. Denote by Q the n x (m + k + p) matrix with i-th row ∂q_i(W_i, U_i, θ_2)/∂θ_2', and by ψ the n x 1 vector with elements ψ_i = (1/σ_1) ψ_z(ε_i/σ_1) = ψ_ε(ε_i). We have

    d_2 = Σ_{i=1}^n -[f_z'(ε_i/σ_1)/f_z(ε_i/σ_1)](1/σ_1) Q_i = Q'ψ

and

    E[d_2 d_2'] = Q' E(ψψ') Q = σ_ψ² Q'Q

where σ_ψ² = E(ψ_i²). The LM_I statistic for testing H_0 : φ = 0 is given by

    LM_I = ψ̃'Q̃(Q̃'Q̃)^{-1}Q̃'ψ̃ / σ_ψ̃²

where σ_ψ̃² = E(ψ_i²)|_θ̃. Letting σ̂_ψ̃² = ψ̃'ψ̃/n be the consistent estimator of σ_ψ̃², we have

    LM̂_I = ψ̃'Q̃(Q̃'Q̃)^{-1}Q̃'ψ̃ / σ̂_ψ̃².

As with the test for heteroskedasticity, neither LM_I nor LM̂_I is feasible. To obtain a feasible version of the LM test, let θ̂_2 be any weakly consistent estimator of θ_2, and let ψ̂ be a weakly consistent estimator of the true score function

ψ over the interval [ε̂_(1), ε̂_(n)]. Let ε̂_i = y_i - q_i(W_i, U_i, θ̂_2), let ψ̂ be the n x 1 vector with elements ψ̂_i = ψ̂(ε̂_i), let Q̂ be the n x (m + k + p) matrix with i-th row ∂q_i(W_i, U_i, θ̂_2)/∂θ_2', and let σ̂_ψ̂² = ψ̂'ψ̂/n. Then the feasible LM statistic for testing serial independence in model (7) is given by

    LM̂̂_I = ψ̂'Q̂(Q̂'Q̂)^{-1}Q̂'ψ̂ / σ̂_ψ̂² = nR²

where R² is the uncentered coefficient of determination from regressing ψ̂ on Q̂.

Notice that the n x (m + k + p) matrix Q above has rows Q_i = (Y_i', x_i', U_i'). This facilitates the following simpler LM statistic:

    LM̂̂_I = ψ̂'Û [Û'Û - Û'W(W'W)^{-1}W'Û]^{-1} Û'ψ̂ / σ̂_ψ̂² = nR²    (8)

where R² is the uncentered coefficient of determination from regressing ψ̂ on Û and W, due to the orthogonality given in the first order condition on the score vector under H_0. A well known alternative for computing the LM̂̂_I statistic is to regress ψ̂ on Û and W and test the significance of the coefficients of Û. Following arguments similar to those in the heteroskedasticity case, we can show that under serial independence, LM̂̂_I converges in distribution to χ²_p.

As in the case of LM̂̂_H, several interesting special cases can be obtained from LM̂̂_I. Under the normality assumption we have ψ̂_i = ε̂_i/σ̂², and LM̂̂_I - LM_BG converges in probability to zero, where LM_BG is the LM statistic for testing autocorrelation in Breusch (1978) and Godfrey (1978a). The test can be performed by regressing ε̂ on Û and W. When the density of the innovation is double exponential, our test is performed by regressing sign(ε̂_i) on Û_i and W_i. This is similar to the sign test for randomness of a process. If the innovation has a logistic density, our LM̂̂_I test is equivalent to regressing (e^{ε̂_i} - 1)/(e^{ε̂_i} + 1) on Û_i and W_i.

4 Score Function Estimation

The score function as defined in (3) plays an important role in many aspects of statistics. It can be used for data exploration, for Fisher information estimation, and for the construction of adaptive estimators of semiparametric econometric models in robust econometrics [see e.g. Cox and Martin (1988) and Ng (1994)].
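For illustration (not from the paper), in the normal-score special case the serial-independence statistic reduces to the familiar Breusch-Godfrey nR² regression of the residual on its lags and the regressors. A sketch assuming one lagged residual, with the presample residual set to zero (a common convention, assumed here):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.normal(10, 5, n)
W = np.column_stack([np.ones(n), x])                  # regressors (no lagged y in this sketch)
y = W @ np.array([1.0, 0.5]) + rng.normal(0, 5, n)    # serially independent errors (H0)

b = np.linalg.lstsq(W, y, rcond=None)[0]
e = y - W @ b                                          # OLS residuals

# Normal-score plug-in: psi_hat is proportional to the residual itself,
# and the proportionality constant cancels in the uncentered R^2.
U = np.column_stack([np.concatenate([[0.0], e[:-1]])])  # one lagged residual, presample = 0
Z = np.column_stack([U, W])
c = np.linalg.lstsq(Z, e, rcond=None)[0]
R2_uncentered = ((Z @ c) @ e) / (e @ e)                 # psi'Z(Z'Z)^{-1}Z'psi / psi'psi
LM = n * R2_uncentered                                  # asymptotically chi^2_p under H0
```

Equivalently, one can regress e on Z and test the significance of the coefficient on the lagged residual, the "well known alternative" mentioned above.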
Here we use the score function to construct the nonparametric test statistics LM̂̂_H and LM̂̂_I.

Most existing score function estimators are constructed by computing the negative logarithmic derivative of some kernel based density estimator [see

e.g. Stone (1975), Manski (1984), and Cox and Martin (1988)]. Csörgő and Révész (1983) suggested a nearest-neighbor approach. Modifying the approach suggested in Cox (1985), Ng (1994) implemented an efficient algorithm to compute the smoothing spline score estimator that solves

    min_{ψ ∈ H²[a,b]}  ∫ (ψ² - 2ψ') dF_n + λ ∫ (ψ''(x))² dx    (9)

where H²[a,b] = {ψ : ψ, ψ' are absolutely continuous, and ∫_a^b [ψ''(x)]² dx < ∞}. The objective function (9) is the (penalized) empirical analogue of minimizing the following mean-squared error:

    ∫ (ψ - ψ_0)² dF_0 = ∫ (ψ² - 2ψψ_0) dF_0 + ∫ ψ_0² dF_0    (10)

in which ψ_0 is the unknown true score function and the equality is due to the fact that, under some mild regularity conditions [see Cox (1985)],

    ∫ ψ_0 φ dF_0 = -∫ f_0'(x) φ(x) dx = ∫ φ' dF_0.

Since the second term on the right-hand side of (10) is independent of ψ, minimizing the mean-squared error may focus exclusively on the first term. Minimizing (9) yields a balance between "fidelity-to-data", measured by the mean-squared error term, and smoothness, represented by the second term. As in any nonparametric score function estimator, the smoothing spline score estimator has a penalty parameter λ to choose. The penalty parameter merely controls the tradeoff between fidelity-to-data and smoothness of the estimated score function. An automatic penalty parameter choice mechanism is suggested and implemented in Ng (1994) through robust information criteria [see Ng (1991) for FORTRAN source code].

The performance of kernel based score estimators depends very much on using the correct kernel, one that reflects the underlying true distribution generating the stochastic process, besides choosing the correct window width. The right choice of kernel becomes even more important for observations in the tails, where density is low, since few observations appear in the tail to help smooth things out.
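A minimal sketch of the kernel-based approach mentioned above, forming ψ̂(x) = -f̂'(x)/f̂(x) from a Gaussian-kernel density estimate (bandwidth and sample size are illustrative choices, not the paper's):

```python
import math
import random

random.seed(2)

def kernel_score(data, x, h):
    # psi_hat(x) = -f_hat'(x)/f_hat(x) with a Gaussian kernel of bandwidth h
    n = len(data)
    f, fprime = 0.0, 0.0
    for xi in data:
        t = (x - xi) / h
        k = math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
        f += k / (n * h)
        fprime += -t * k / (n * h * h)   # derivative of the kernel: K'(t) = -t K(t)
    return -fprime / f

# Standard normal sample: the true score is psi(x) = x
data = [random.gauss(0, 1) for _ in range(5000)]
est = kernel_score(data, 1.0, 0.3)
# est is close to the smoothed score x/(1 + h^2), up to sampling noise
```

The smoothing bias visible here (the estimate tracks x/(1+h²) rather than x) is one instance of the kernel-choice and bandwidth sensitivity that the paper discusses next.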
This sensitivity to correct kernel choice is further amplified in score function estimation, where higher derivatives of the density are involved [see Ng (1995)]. It is found in Ng (1995) that the smoothing spline score estimator, which finds its theoretical justification in an explicit statistical decision criterion, i.e. minimizing the mean-squared error, is more robust to distributional variations than ad hoc estimators such as the kernel based estimators. We therefore use it to construct our nonparametric test statistics.

Since no estimator can estimate the tails of the score function accurately, some form of trimming is needed in the tails, where observations are too scarce to smooth things out. Cox (1985) showed that the smoothing spline score

estimator achieves uniformly weak consistency over a bounded finite support [a_0, b_0] which contains the observations x_1, ..., x_n. Denoting the solution to (9) by ψ*(x), the score estimator used in constructing our nonparametric statistics LM̂̂_H and LM̂̂_I given in Section 3 takes the form

    ψ̂(x) = ψ*(x) if x_(1) ≤ x ≤ x_(n), and 0 otherwise.    (11)

5 Small Sample Performances

All the results on the LM statistics discussed earlier are valid only asymptotically. We would therefore like to study the finite sample behavior of the various statistics in this section. We are interested in the closeness of the distributions of the statistics under the null, H_0, to the asymptotic χ² distributions, the estimates of the probabilities of Type-I error, as well as the estimated powers. The LM statistics involved in this simulation are LM*_H [given in Godfrey (1978b), and Breusch and Pagan (1979)], LM*_I [given in Breusch (1978), and Godfrey (1978a)], LM̂̂_H and LM̂̂_I. The closeness of the distributions under the null to the asymptotic χ² distributions is measured by the Kolmogorov-Smirnov statistic; the estimated probabilities of Type-I error are measured by the proportion of rejections in the replications when the asymptotic χ² significance values are used; and the estimated powers are measured by the number of times the test statistics exceed the corresponding empirical significance points divided by the total number of replications.

We use the simulation models of Bera and Jarque (1982) and Bera and McKenzie (1986), so that our results can be compared with their prior findings. The linear regression model is given by

    y_i = Σ_{j=1}^4 x_{ij} β_j + u_i

where x_{i1} = 1, the x_{i2} are random variates from N(10, 25), the x_{i3} from the uniform U(7.5, 12.5), and the x_{i4} from χ²_10. The regression matrix X remains the same from one replication to another. Serially correlated (I) errors are generated by the first order autoregressive (AR) process u_i = φu_{i-1} + ε_i, where |φ| < 1.
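The design above can be sketched as follows (the β values, seed, and the φ = 0.3 setting are illustrative; the paper does not report β, and the χ²_10 draw is built from 10 squared standard normals):

```python
import random

random.seed(4)
n, phi = 100, 0.3
beta = [1.0, 1.0, 1.0, 1.0]   # illustrative coefficient values

# Regressors: constant, N(10,25), U(7.5,12.5), chi-square with 10 d.f.
X = [[1.0,
      random.gauss(10, 5),
      random.uniform(7.5, 12.5),
      sum(random.gauss(0, 1) ** 2 for _ in range(10))]
     for _ in range(n)]

# AR(1) errors u_i = phi * u_{i-1} + eps_i with N(0,25) innovations
u, prev = [], 0.0
for _ in range(n):
    prev = phi * prev + random.gauss(0, 5)
    u.append(prev)

y = [sum(xj * bj for xj, bj in zip(X[i], beta)) + u[i] for i in range(n)]
```

Holding X fixed across replications, as the paper does, isolates the sampling variability of the test statistics in the error process.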
As in Bera and Jarque (1982) and Bera and McKenzie (1986), the level of autocorrelation is categorized into 'weak' and 'strong' by setting φ = φ_1 = 0.3 and φ = φ_2 = 0.7, respectively. Heteroskedastic (H) errors are generated by E(ε_i) = 0 and V(ε_i) = σ_i² = 25 + λv_i, where √v_i ~ N(10, 25) and λ is the parameter that determines the degree of heteroskedasticity, with λ = λ_1 = 0.25 and λ = λ_2 = 0.85 representing 'weak' and 'strong' heteroskedasticity respectively. In order to study the robustness of the various test statistics to distributional

deviations from the conventional Gaussian innovation assumption, the non-normal (N) disturbances used are (1) the Student's t distribution with five degrees of freedom, t_5, which represents moderately thick-tailed distributions; (2) the log-normal, log, which represents asymmetric distributions; (3) the beta distribution with shape parameters 7 and 7, B(7,7), which represents distributions with bounded support; (4) the 50% normal mixture, NM, of the two normal distributions N(-3, 1) and N(3, 1), which represents bi-modal distributions; (5) the beta distribution with shape parameters 3 and 11, B(3,11), which represents asymmetric distributions with bounded support; and (6) the contaminated normal, CN, which is the standard normal N(0,1) with 5% contamination from N(0,9), and which attempts to capture contamination in a real life situation. All distributions are normalized to have variance 25 under H_0. Figure 1 presents the score functions of all the above distributions. Notice from Figure 1 that distributions with thicker tails than the normal have receding scores in the tails, while those with thinner tails than the normal have progressive scores in the tails.

The experiments are performed for sample sizes N = 25, 50, and 100. The number of replications is 250. The Kolmogorov-Smirnov statistics for the various LM statistics are reported in TABLE I.
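The normalization to variance 25 is a simple rescaling; for instance, since Var(t_df) = df/(df - 2), a t_5 draw is multiplied by √(25/(5/3)). A sketch (the t_5 variate is built from its normal/chi-square representation):

```python
import math
import random

random.seed(3)

def t5_var25():
    # t_5 variate via N(0,1) / sqrt(chi2_5 / 5), rescaled to variance 25
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(5))
    t = z / math.sqrt(chi2 / 5)
    return t * math.sqrt(25.0 / (5.0 / 3.0))   # Var(t_5) = 5/3

sample = [t5_var25() for _ in range(20000)]
var = sum(s * s for s in sample) / len(sample)
# var is close to 25, up to simulation noise
```

The same recipe applies to the other disturbances: compute the distribution's variance analytically, then scale by the square root of 25 over that variance.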


TABLE I
Kolmogorov-Smirnov statistics under H_0

Disturbance                     Sample Size
Distribution   Statistic      25        50        100
N(0,25)        LM*_H        .0450     .0429     .0510
               LM*_I        .0457     .0361     .0380
               LM̂̂_H        .0734     .0504     .0288
               LM̂̂_I        .0440     .0398     .0420
t_5            LM*_H        .0754     .1385     .1167
               LM*_I        .0707     .0351     .0660
               LM̂̂_H        .0444     .0436     .0674
               LM̂̂_I        .0454     .0293     .0706
log            LM*_H        .1787     .3005*    .4767*
               LM*_I        .0676     .0680     .0522
               LM̂̂_H        .0440     .0394     .0371
               LM̂̂_I        .0511     .0568     .0714
B(7,7)         LM*_H        .0512     .0504     .0620
               LM*_I        .0390     .0452     .0365
               LM̂̂_H        .0399     .0653     .0472
               LM̂̂_I        .0333     .0607     .0336
NM             LM*_H        .2372     .2837*    .3546*
               LM*_I        .0453     .0242     .0470
               LM̂̂_H        .0386     .0514     .0424
               LM̂̂_I        .0333     .0509     .0276
B(3,11)        LM*_H        .0393     .0817     .0379
               LM*_I        .0721     .0539     .0457
               LM̂̂_H        .0487     .0987     .0301
               LM̂̂_I        .0947     .0685     .0496
CN             LM*_H        .0464     .1104     .1685*
               LM*_I        .0396     .0444     .0849
               LM̂̂_H        .0447     .0416     .0387
               LM̂̂_I        .0539     .0450     .0906

* significant at the 1% level.

The 5% critical values for the Kolmogorov-Smirnov statistic for sample sizes of 25, 50, and 100 are .2640, .1884 and .1340 respectively, while the 1% critical values for 25, 50 and 100 observations are .3166, .2260, and .1608 respectively [Pearson and Hartley (1966)]. In TABLE I, the Kolmogorov-

Smirnov statistics that are significant at the 1% level are marked with an asterisk. From TABLE I, it is clear that no significant departure from the asymptotic χ²_1 distribution can be concluded at either the 5% or the 1% level of significance for any of the LM statistics under N(0,25), B(7,7), and B(3,11). The departure from the χ² distribution becomes more noticeable for LM*_H as the sample size gets bigger when the disturbance term follows the log, NM or CN distributions. This is illustrated in Figure 2 for the log and Figure 3 for the NM disturbance terms; both sample sizes equal 100. Both figures are plots of the nonparametric adaptive kernel density estimates of LM*_H and LM̂̂_H [see Silverman (1986) for details of adaptive kernel density estimation]. We can see that LM*_H has a thinner tail under NM, and a thicker tail under log, than the asymptotic χ²_1 distribution. This suggests that under the null hypothesis of homoskedasticity and serial independence, the distribution of the conventional LM statistic for testing heteroskedasticity deviates from the χ² distribution as the distribution of the disturbance term departs further from the normal distribution in shape, while our nonparametric heteroskedasticity test statistic is more robust to these distributional deviations. From Figures 2 and 3, it is clear that at the tails the distributions of LM̂̂_H and χ²_1 are very close. To maintain the correct size of a test statistic, only the tail of its distribution matters. As we will see later in TABLE II, the true Type-I error probabilities of LM̂̂_H are very close to the nominal level of 10%. Both the LM*_I and LM̂̂_I statistics seem to be much less sensitive to distributional deviations in the disturbance term.

The estimated probabilities of Type-I error for the LM statistics are reported in TABLE II. The estimated probabilities are the proportions of the replications for which the estimated LM statistics exceed the asymptotic 10% critical values of the χ²_1 distributions.
Since the number of replications is 250, the standard errors of the estimated probabilities of Type-I error are no bigger than √(0.5(1 - 0.5)/250) ≈ 0.032.

TABLE II
Estimated probabilities of Type-I error (nominal level 10%)

Disturbance                     Sample Size
Distribution   Statistic      25       50       100
N(0,25)        LM*_H        .080     .116     .112
               LM*_I        .064     .092     .096
               LM̂̂_H        .108     .128     .112
               LM̂̂_I        .076     .092     .100
t_5            LM*_H        .108     .208     .200
               LM*_I        .108     .088     .104
               LM̂̂_H        .108     .088     .068
               LM̂̂_I        .116     .092     .104
log            LM*_H        .248     .388     .544
               LM*_I        .084     .080     .060
               LM̂̂_H        .112     .068     .108
               LM̂̂_I        .100     .136     .104
B(7,7)         LM*_H        .076     .072     .068
               LM*_I        .084     .124     .100
               LM̂̂_H        .116     .100     .100
               LM̂̂_I        .072     .124     .096
NM             LM*_H        .016     .016     .000
               LM*_I        .144     .116     .064
               LM̂̂_H        .120     .084     .108
               LM̂̂_I        .112     .104     .084
B(3,11)        LM*_H        .088     .124     .104
               LM*_I        .076     .080     .104
               LM̂̂_H        .128     .100     .112
               LM̂̂_I        .072     .076     .100
CN             LM*_H        .144     .204     .228
               LM*_I        .100     .064     .140
               LM̂̂_H        .088     .092     .100
               LM̂̂_I        .092     .068     .144

From TABLE II, it is obvious that the Type-I error probabilities for our nonparametric test statistics, LM̂̂_H and LM̂̂_I, are very close to the nominal 10% level under almost all sample sizes and distributions. On the other hand, the true sizes for LM*_H can be very high. For example, when the distribution

is log, for a sample of size 100, LM*_H rejects the true null hypothesis of homoskedasticity 54% of the time. When the distribution is t_5 or CN, LM*_H also over-rejects, though less severely. As we noted while discussing the implications of Figure 2, over-rejection occurs because the distribution of LM*_H has a much thicker tail when the normality assumption is violated. On the other hand, the effect of the NM distribution on LM*_H is quite the opposite: LM*_H has a thinner tail than χ²_1, as noted in Figure 3, resulting in very low Type-I error probabilities. The Type-I error probabilities for LM̂̂_H are, in contrast, very close to the nominal significance level of 10%.

As we observed in TABLE I, LM*_I is not as sensitive to departures from normality as LM*_H, and hence the deviations from the 10% Type-I error probability of LM*_I are not as severe as those of LM*_H. These findings are consistent with those of Bera and Jarque (1982) and Bera and McKenzie (1986), in which the LM*_H and LM*_I tests have incorrect Type-I error probabilities under log and t_5 when the asymptotic critical values of the χ² distribution are used.

Given the above results that the estimated probabilities of Type-I error for the various LM statistics differ, it is only appropriate to compare the estimated powers of the LM statistics using the simulated critical values. The 100α% simulated critical values are the (1 - α) sample quantiles of the estimated LM statistics. The estimated powers of the LM statistics are, hence, the number of times the statistics exceed the (1 - α) sample quantiles divided by the total number of replications. The α used in our replications is 10%. The standard errors of the estimated powers are again at most 0.032. The estimated powers for N = 50 and 100 are presented in TABLES III and IV respectively.

TABLE III
Estimated powers at the 10% level, N = 50

Disturbance              Alternatives: H_1
Distribution  Statistic  H(λ1)  H(λ2)  I(φ1)  I(φ2)  HI(λ1,φ1)  HI(λ1,φ2)  HI(λ2,φ1)  HI(λ2,φ2)
N(0,25)       LM*_H      .592   .832   .104   .084    .548       .292       .740       .372
              LM*_I      .112   .112   .552   .996    .576       .996       .564       .992
              LM̂̂_H      .524   .760   .100   .088    .456       .272       .688       .376
              LM̂̂_I      .116   .108   .568   .976    .568       .968       .552       .968
t_5           LM*_H      .448   .608   .096   .052    .388       .132       .540       .224
              LM*_I      .112   .120   .504   1.00    .524       1.00       .536       1.00
              LM̂̂_H      .432   .608   .104   .096    .396       .236       .600       .348
              LM̂̂_I      .132   .140   .504   .984    .532       .972       .544       .960
log           LM*_H      .220   .292   .084   .032    .164       .072       .268       .084
              LM*_I      .108   .116   .584   1.00    .572       1.00       .576       1.00
              LM̂̂_H      .600   .752   .124   .132    .472       .220       .656       .272
              LM̂̂_I      .092   .076   .748   .940    .716       .956       .656       .960
B(7,7)        LM*_H      .660   .896   .100   .060    .624       .252       .828       .448
              LM*_I      .108   .092   .528   1.00    .552       1.00       .564       1.00
              LM̂̂_H      .648   .852   .120   .088    .640       .276       .788       .424
              LM̂̂_I      .108   .092   .500   .996    .524       .992       .564       .988
NM            LM*_H      .960   .996   .148   .284    .916       .500       .992       .720
              LM*_I      .100   .096   .540   .984    .536       .992       .548       .992
              LM̂̂_H      .896   .956   .176   .156    .824       .352       .928       .548
              LM̂̂_I      .104   .088   .844   .980    .744       .992       .564       .988
B(3,11)       LM*_H      .588   .844   .108   .092    .556       .264       .772       .404
              LM*_I      .092   .116   .572   .992    .608       .996       .612       1.00
              LM̂̂_H      .604   .848   .108   .124    .588       .324       .784       .496
              LM̂̂_I      .116   .120   .560   .956    .572       .988       .616       .988
CN            LM*_H      .396   .692   .088   .064    .400       .180       .600       .276
              LM*_I      .104   .112   .524   .992    .560       .988       .548       .992
              LM̂̂_H      .488   .708   .104   .104    .448       .264       .636       .388
              LM̂̂_I      .112   .132   .524   .968    .544       .968       .564       .964

TABLE IV
Estimated powers of the tests, N = 100

                         Disturbance Alternatives: H1
Distribution  Statistic  H1(α1)  H1(α2)  H1(ρ1)  H1(ρ2)  H1(α1,ρ1)  H1(α1,ρ2)  H1(α2,ρ1)  H1(α2,ρ2)
N(0,25)       LM*_H      .840    .988    .092    .060    .804       .412       .968       .664
              LM*_I      .124    .132    .864    1.00    .848       1.00       .848       1.00
              LM̂_H      .808    .976    .100    .072    .784       .408       .952       .640
              LM̂_I      .120    .132    .852    1.00    .848       1.00       .848       1.00
t5            LM*_H      .688    .916    .080    .040    .636       .300       .876       .492
              LM*_I      .080    .068    .828    1.00    .860       1.00       .876       1.00
              LM̂_H      .764    .952    .144    .116    .700       .448       .884       .648
              LM̂_I      .084    .084    .828    1.00    .876       .992       .896       .996
log           LM*_H      .256    .364    .088    .008    .184       .052       .300       .080
              LM*_I      .108    .112    .912    1.00    .904       1.00       .900       1.00
              LM̂_H      .880    .972    .136    .124    .764       .368       .928       .540
              LM̂_I      .116    .080    .988    1.00    .996       .992       .996       1.00
B(7,7)        LM*_H      .928    .996    .104    .100    .896       .532       .980       .816
              LM*_I      .120    .120    .852    1.00    .832       1.00       .844       1.00
              LM̂_H      .900    .992    .096    .088    .848       .492       .964       .748
              LM̂_I      .116    .120    .848    .996    .828       1.00       .848       1.00
NM            LM*_H      1.00    1.00    .212    .308    1.00       .792       1.00       .944
              LM*_I      .104    .112    .908    1.00    .892       1.00       .896       1.00
              LM̂_H      .992    .996    .084    .096    .960       .512       .988       .780
              LM̂_I      .112    .092    1.00    1.00    .984       1.00       .972       1.00
B(3,11)       LM*_H      .856    .984    .080    .052    .792       .444       .968       .712
              LM*_I      .112    .108    .884    1.00    .864       1.00       .876       1.00
              LM̂_H      .884    .984    .064    .080    .792       .448       .960       .712
              LM̂_I      .116    .092    .900    1.00    .880       .996       .880       1.00
CN            LM*_H      .596    .828    .108    .032    .548       .220       .780       .376
              LM*_I      .068    .072    .760    1.00    .752       1.00       .752       1.00
              LM̂_H      .648    .896    .120    .108    .588       .340       .848       .496
              LM̂_I      .088    .092    .776    1.00    .784       .996       .800       1.00

First we note that the estimated powers of the parametric tests LM*_H and LM*_I are similar to those reported in Bera and Jarque (1982), and Bera and McKenzie (1986). Regarding the powers of our nonparametric tests LM̂_H and LM̂_I, we observe that they are comparable to their parametric counterparts for the N(0,25), B(7,7), B(3,11) and NM disturbances. In particular, when the disturbance distribution is normal, for which LM*_H and LM*_I are designed to

perform best, we observe very little loss of power in using LM̂_H and LM̂_I. On the other hand, LM̂_H substantially outperforms its parametric counterpart when the disturbance term follows a lognormal distribution. To see the difference between the performances of LM*_H and LM̂_H, consider the lognormal distribution with sample size 50. LM*_H has "optimal" power of .832 for the alternative H1(α2) with normal disturbances. However, the estimated power of LM*_H falls to .292 when the disturbance distribution is lognormal. When we further contaminate the data with strong autocorrelation, that is, under H1(α2,ρ2), the estimated power is merely .084, even less than the size of the test. The estimated powers of LM̂_H in the above three situations are respectively .760, .752 and .272. The power does decline with gradual contamination, but not as drastically as that of LM*_H. For the t5 and CN disturbances, the advantage of the nonparametric LM̂_H becomes more pronounced as the sample size gets bigger and the nonparametric efficiency begins to show up. Note that all the distributions under which LM̂_H outperforms LM*_H, namely t5, log and CN, have thicker tails than the normal distribution, while the B(7,7) and B(3,11) distributions, under which LM*_H is comparable to LM̂_H, have thinner tails than the normal distribution. The NM distribution, which has the same tail behavior as the normal distribution, does not deteriorate the power of LM*_H substantially even though the distribution of LM*_H deviates quite remarkably from the χ² under H0, as we noticed in Figure 3. As we noted in Figure 1, thick-tailed distributions like t5 and CN have receding scores in the tails while thin-tailed distributions have progressive scores in the tails. 
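The receding-versus-progressive distinction follows directly from the definition of the score, ψ(u) = −f'(u)/f(u). The sketch below uses the standard closed-form scores for the normal and Student-t densities; it is illustrative only, since the paper estimates the score nonparametrically rather than from a known density.

```python
import numpy as np

def score_normal(u, sigma=1.0):
    # -f'(u)/f(u) for N(0, sigma^2): linear ("progressive"), unbounded in the tails
    return u / sigma**2

def score_t(u, nu=5):
    # -f'(u)/f(u) for Student's t with nu degrees of freedom:
    # (nu + 1) u / (nu + u^2), which "recedes" toward zero in the tails
    return (nu + 1) * u / (nu + u**2)

u = np.array([1.0, 3.0, 10.0, 50.0])
print(score_normal(u))   # keeps growing: 1, 3, 10, 50
print(score_t(u))        # recedes: 1.0, ~1.29, ~0.57, ~0.12
```

The t5 score peaks near u = sqrt(5) and then falls back toward zero, so extreme observations receive little weight; the normal score grows without bound, which is why normality-based procedures are dominated by thick-tailed outliers.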
It is exactly the thick-tailed distributions that cause problems for conventional statistical methods, and it is these thick-tailed distributions that robust procedures are designed to deal with.

The parametric LM*_I, however, seems to be less sensitive to distributional deviations of the innovation and, hence, there are no drastic differences between LM*_I and LM̂_I even for severe departures from the normal distribution such as t5, log and CN.

As was indicated above, both the LM*_H and LM̂_H statistics for testing heteroskedasticity are not robust to misspecification of serial independence. The powers of both tests drop when severe serial correlation is present in the disturbances. The effect of serial correlation is, however, more serious for LM*_H. For instance, when the sample size is 100 and the distribution is t5, the estimated power of LM*_H falls by .424 (= .916 − .492) as we move from H1(α2) to H1(α2,ρ2). For LM̂_H, on the other hand, the power loss is .304 (= .952 − .648). This pattern is observed for almost all distributions. The powers of LM*_I and LM̂_I are, however, more robust to violations of the

maintained assumption of homoskedasticity. This is easily seen by comparing the powers of LM*_I and LM̂_I under three sets of alternatives: (i) H1(ρ1) and H1(ρ2); (ii) H1(α1,ρ1) and H1(α1,ρ2); and (iii) H1(α2,ρ1) and H1(α2,ρ2). Nevertheless, this suggests that joint tests or a multiple comparison procedure in the spirit of Bera and Jarque (1982) would make our tests for heteroskedasticity more robust to violations of the maintained serial-independence assumption. Furthermore, adopting a nonparametric conditional mean instead of the linear conditional mean model [see, e.g., Lee (1992)], or even a nonparametric conditional median specification [see, e.g., Koenker, Ng and Portnoy (1994)], would make our test statistics more robust to misspecification of the conditional structural model. These extensions will be reported in future work.

Our simulation results indicate that, under homoskedasticity and serial independence, the distribution of our nonparametric LM statistic for testing heteroskedasticity is closer to the asymptotic χ² distribution than that of its parametric counterpart for all distributions under investigation. The parametric LM statistic for testing autocorrelation is, nevertheless, much less sensitive to departures from the normality assumption and hence fares as well as its nonparametric counterpart. The estimated Type-I error probabilities of the nonparametric LM statistics for testing both heteroskedasticity and autocorrelation are also much closer to the nominal 10% value. The superiority of our nonparametric LM test for heteroskedasticity becomes more prominent as the sample size increases and as the severity of the departure from normality (measured roughly by the thickness of the tails) increases. 
Therefore, we may conclude that our nonparametric test statistics are robust to distributional misspecification and will be useful in empirical work.

ACKNOWLEDGEMENTS

The authors would like to thank Alok Bhargava, Roger Koenker, Paul Newbold, Steve Portnoy, Frank Vella and, especially, Bill Brown for their helpful comments. Part of the paper was completed while the first author was visiting CentER, Tilburg University. The second author would also like to acknowledge financial support from the President's Research and Scholarship Fund of the University of Houston.

BIBLIOGRAPHY

Anscombe, F.J. (1961). "Examination of residuals," in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, CA, 1-36.

Anscombe, F.J. and Tukey, J.W. (1963). "Analysis of residuals," Technometrics, 5, 141-160.

Bera, A. and Jarque, C. (1982). "Model specification tests: A simultaneous approach," Journal of Econometrics, 20, 59-82.

Bera, A. and McKenzie, C. (1986). "Alternative forms and properties of the score test," Journal of Applied Statistics, 13, 13-25.

Bickel, P. (1978). "Using residuals robustly I: Tests for heteroscedasticity, nonlinearity," The Annals of Statistics, 6, 266-291.

Box, G.E.P. (1953). "Non-normality and tests on variances," Biometrika, 40, 318-335.

Box, G.E.P. and Tiao, G.C. (1973). Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, MA.

Breusch, T.S. (1978). "Testing for autocorrelation in dynamic linear models," Australian Economic Papers, 17, 334-355.

Breusch, T.S. and Pagan, A.R. (1979). "A simple test for heteroscedasticity and random coefficient variation," Econometrica, 47, 1287-1294.

Cox, D. (1985). "A penalty method for nonparametric estimation of the logarithmic derivative of a density function," Annals of the Institute of Statistical Mathematics, 37, 271-288.

Cox, D. and Martin, D. (1988). "Estimation of score functions," Technical Report, University of Washington, Seattle, WA.

Csörgő, M. and Révész, P. (1983). "An N.N.-estimator for the score function," in: Proceedings of the First Easter Conference on Model Theory, Seminarbericht Nr. 49, Humboldt-Universität zu Berlin, Berlin, 62-82.

Davidson, R. and MacKinnon, J. (1983). "Small sample properties of alternative forms of the Lagrange multiplier test," Economics Letters, 12, 269-275.

Glejser, H. (1969). "A new test for heteroscedasticity," Journal of the American Statistical Association, 64, 316-323.

Godfrey, L.G. (1978a). "Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables," Econometrica, 46, 1293-1301.

Godfrey, L.G. (1978b). "Testing for multiplicative heteroscedasticity," Journal of Econometrics, 8, 227-236.

Hampel, F. (1974). "The influence curve and its role in robust estimation," Journal of the American Statistical Association, 69, 383-393.

Joiner, B. and Hall, D. (1983). "The ubiquitous role of f'/f in efficient estimation of location," The American Statistician, 37, 128-133.

Koenker, R. (1981). "A note on studentizing a test for heteroscedasticity," Journal of Econometrics, 17, 107-112.

Koenker, R. (1982). "Robust methods in econometrics," Econometric Reviews, 1, 214-255.

Koenker, R., Ng, P. and Portnoy, S. (1994). "Quantile smoothing splines," Biometrika, 81, 673-680.

Lee, B. (1992). "A heteroskedasticity test robust to conditional mean misspecification," Econometrica, 60, 159-171.

Manski, C. (1984). "Adaptive estimation of non-linear regression models," Econometric Reviews, 3, 145-194.

Ng, P. (1991). "Computing smoothing spline score estimator," Discussion paper, University of Houston, Houston, TX.

Ng, P. (1994). "Smoothing spline score estimation," SIAM Journal on Scientific Computing, 15, 1003-1025.

Ng, P. (1995). "Finite sample properties of adaptive regression estimators," Econometric Reviews, forthcoming.

Pagan, A.R. and Pak, Y. (1991). "Tests for heteroskedasticity," Working paper, University of Rochester, Rochester, NY.

Pearson, E.S. and Hartley, H.O. (1966). Biometrika Tables for Statisticians, Vol. 2, Cambridge University Press, Cambridge, England.

Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York.

Stone, C. (1975). "Adaptive maximum likelihood estimators of a location parameter," The Annals of Statistics, 3, 267-284.

Tukey, J.W. (1960). "A survey of sampling from contaminated distributions," in: I. Olkin, ed., Contributions to Probability and Statistics, Stanford University Press, Stanford, CA.


