
Time-Series–Cross-Section Data: What Have We Learned in the Last Few Years?

Nathaniel Beck
Department of Political Science
University of California, San Diego
La Jolla, CA
[email protected]
http://weber.ucsd.edu/~nbeck1

January 29, 2001

Thanks to Geoffrey Garrett for supplying the data used here. This article is based on work done jointly with Jonathan N. Katz, and all uses of "we" indicate Katz and myself. This is a revised version of a paper presented at a conference on the Analysis of Repeated Cross Section, Nijmegen, The Netherlands, June 15, 2000. Draft of January 29, 2001.

Abstract

This article treats the analysis of "time-series–cross-section" (TSCS) data. Such data consists of repeated observations on a series of fixed units. Examples of such data are annual observations on the political economy of OECD nations in the post-war era. TSCS data is distinguished from "panel" data in that asymptotics are in the number of repeated observations, not the number of units.

The article begins by treating the complications of TSCS data in an "old-fashioned" manner, that is, as a nuisance which causes estimation difficulties. It claims that TSCS data should be analyzed via ordinary least squares with "panel correct standard errors" rather than generalized least squares methods. Dynamics should be modeled via a lagged dependent variable or, if appropriate, a single equation error correction model.

The article then treats more modern issues, in particular the modeling of spatial effects and heterogeneity. It also claims that heterogeneity should be assessed with "panel cross-validation" as well as more standard tests. The article concludes with a discussion of estimation in the presence of a binary dependent variable.

Keywords: Panel Data, Robust Standard Errors, Dynamics, Spatial Models, Heterogeneity, Random Coefficients, Binary Dependent Variable

1 Time-Series–Cross-Section Data

Time-series–cross-section (TSCS) data is one type of repeated observation data that is commonly analyzed in political science and related disciplines. TSCS data is common in the analysis of data where repeated observations (often annual) are made on the same fixed political units (usually states or countries).

While there are many applications, the prototypical application is the study of political economy, and in particular the impact of political arrangements on economic performance in advanced industrial societies. Here I use an example due to Garrett (1998), who examines the political economy of government economic policy and performance in 14 OECD nations from 1966 to 1990. In particular, he is interested in whether labor organization and political partisanship affect economic policy and/or performance, and whether the impacts of those variables have changed over time (whether "globalization" has limited the impact of domestic political arrangements).

Other applications have more or fewer repeated observations. Alvarez, Garrett, and Lange (1991), the study that initially sparked our (all uses of "we" or "our" indicate joint work with Jonathan Katz) interest, observed 16 OECD nations for only 14 years. While there is no strict lower limit to the number of repeated observations, we need enough repeated observations for some averaging operations to make sense. Thus, for example, our simulations use a minimum of 15 repeated observations. There is no reason in principle that the observations need be annual, but they typically are. While quarterly or monthly data would increase the number of repeated observations, such data is often not meaningful in the political economy context. I return to this issue in the discussion of dynamics. There is no upper limit to the number of repeated observations we can study, and more repeated observations simply improve the performance of TSCS estimators.

While the 14 units studied by Garrett (see also Hall and Franzese, 1998; Hicks and Kenworthy, 1998; Iversen, 1998; Montanari, 1999; Pampel, 1996; Radcliff and Davis, 2000) is typical, other researchers have studied the 50 American states (Fiorina, 1994; Fording, 1997; Hollingsworth, Hanneman, Hage, and Ragin, 1996; Smith, 1997; Su, Kamlet, and Mowery, 1993) or the 100 or so nations that we have good data on (Blanton, 2000; Burkhart and Lewis-Beck, 1994; Gasiorowski, 2000; Grier and Tullock, 1989; Poe, Tate, and Keith, 1999). Applications may be to any number of subgovernmental units, such as the 64 Louisiana parishes (counties) studied by Giles and Hertz (1994). The critical issue, as we shall see in Section 2, is that the units be fixed and not sampled, and that inference be conditional on the observed units.

TSCS data has also become of interest in International Relations (IR).


Many quantitative IR researchers use a "dyad-year" design (Maoz and Russett, 1993; Oneal and Russett, 1997), where pairs of nations are observed annually for long periods of time (ranging from 40 to over 100 years). The dependent variable of interest in these studies is often the binary indicator of whether a dyad was in conflict in a given year. Binary dependent variables cause special problems.

I discuss the characteristics of TSCS data in Section 2, and use that characterization to discuss the modeling of unit specific effects in Section 3. I then discuss "old-fashioned" estimation issues in Section 4, with the suggested estimation strategy proposed in Beck and Katz (1995) discussed in Section 5. These sections restrict themselves to static models; dynamics are discussed in Section 6. The paper then turns to current issues of interest. Section 7 considers spatial effects and the detection and modeling of heterogeneity, while Section 8 considers TSCS models with a binary dependent variable. The final section offers a brief conclusion.

2 Characterizing TSCS Data

TSCS data can be characterized by

y_{i,t} = x_{i,t}\beta + \epsilon_{i,t}, \qquad i = 1, \dots, N;\ t = 1, \dots, T \tag{1}

where x_{i,t} is a K-vector of exogenous variables and observations are indexed by both unit (i) and time (t). Let Ω be the NT × NT covariance matrix of the errors with typical element E(ε_{i,t}ε_{j,s}). (This assumes a "rectangular" structure of the data; the assumption is purely for notational convenience, and we could easily allow each unit to have T_i observations.) I assume, until Section 8, that the dependent variable, y, is continuous (at least in the sense of social science, where 7-point scales and the like are treated as continuous). Given the nature of typical TSCS data, I will often refer to the units as countries and the time periods as years, but the discussion generalizes to any dataset that is TSCS.

Equation 1 hides as much as it reveals. In particular, it does not distinguish "panel" data from TSCS data. Panel data is repeated cross-section data, but the units are sampled and they are typically only observed a few times. TSCS units are fixed; there is no sampling scheme for the units, and any "resampling" experiment must keep the units fixed and only resample complete units (Freedman and Peters, 1984). In panel data the observed "people" are of no interest per se, with all inferences of interest being to the underlying population that was sampled, rather than being conditional on the observed sample. TSCS data is exactly the opposite; all inferences of interest are conditional on the observed units. For TSCS data we cannot even contemplate a thought experiment of resampling a new "Germany," although we can contemplate observing a new draw of German data for some year.

The difference between TSCS and panel data has both theoretical and practical consequences; these consequences go hand in hand. Theoretically, all asymptotics for TSCS data are in T; the number of units is fixed and asymptotic arguments must be based on the N observed units. We can, however, contemplate what might happen as T → ∞, and methods can be theoretically justified based on their large-T behavior.

Panel data has just the opposite characteristic. However many waves a panel has, that number is fixed by the design, and there can be no justification of methods by an appeal to asymptotics in T. There are, however, reasonable asymptotics in N, as sample sizes can be thought of as getting larger and larger.

Many common panel methods are justified by asymptotics in N. In particular, the currently popular "generalized estimating equation" (GEE) approach of Liang and Zeger (1986) is only known to have good properties as N becomes large. Thus while it might be a very useful method for panel data, there is no reason to believe that GEE is a good approach for TSCS data.

Many of the methods we propose require that T be large enough so that averages over the T time periods for each unit make sense. We also use standard time series methods to model the dynamics of TSCS data; this is only possible when T is not small. Panel data methods, conversely, are constructed to deal with small T's; one would not attempt to use a lagged dependent variable when one has only three repeated observations per unit!

Thus, TSCS methods are justified by asymptotics in T, and typically require a reasonably large T to be useful. Again, there is no hard and fast minimum T for TSCS methods to work, but one ought to be suspicious of TSCS methods used for, to pick a round number, T < 10. TSCS methods, on the other hand, do not require a large N, although a large N is typically not harmful. Thus estimation on 14 OECD nations does not violate any assumption which justifies a (correct) TSCS method, and estimation on thousands of fixed units via TSCS methods is perfectly acceptable (though it might be numerically difficult). Panel methods have the opposite characteristic. They are designed for, and work well with, very small T's (three, or perhaps even two), but require a large N for the theoretical properties of the estimators to have any practical consequences. Panel estimators are also designed to avoid practical issues that arise from the large N that characterizes panel data. This convergence of theoretical justification and practical requirements can be most easily seen by considering the modeling of unit specific effects.


3 Fixed vs. Random Effects

Equation 1 assumes that all units map the covariates into the dependent variable identically. The simplest way to drop this assumption is to allow each unit to have its own intercept, that is, to add a term α_i to Equation 1. This can be done either by adding a series of unit dummy variables to the specification ("fixed effects") or by assuming that the α_i are draws from an (invariably normal) distribution, which we must assume is independent of the distribution of the ε_{i,t} ("random effects").

Both practical considerations and theory rule out fixed effects for common panel data. In practice, fixed effects use up 1/T of our degrees of freedom, far too high a price to pay to model different intercepts when T is very small. Theoretically, fixed effects are subject to the famous Neyman and Scott (1948) "incidental parameters" problem, where the number of parameters in the model is unbounded as N → ∞. Finally, inference in panel models is never conditional on the observed sample; we seldom want to predict what respondent 17 will do in the next election. As Hsiao (1986, 41–7) clearly shows, random effects are appropriate if one wants to make inferences to an underlying population from which the observed sample was drawn.

The situation is reversed for TSCS data. With a reasonably large T (say over 10), fixed effects do not use up an absurd number of degrees of freedom. As T gets bigger, we get better and better estimates of the fixed effects. And since asymptotics are in T and not N, there is no incidental parameters problem. Furthermore, inferences in TSCS models are conditional on the observed units; we are not going to draw a new "Germany," and all thought experiments are conditional on the Germany that was observed. Thus if we want to predict German economic growth under some conditions, we should use the intercept estimated via fixed effects in that prediction. And finally, of course, the use of fixed effects does not require us to make the often untenable assumption that the effects are uncorrelated with the other independent variables; such an assumption is required for typical random effects models. Thus fixed effects are, in principle, the appropriate way to estimate unit effects in TSCS data.

As T grows large, it makes little difference whether we use a fixed or random effects model to estimate Equation 1. As is well known (see any econometrics text, such as Greene, 1999, 568–70), as T gets large, random effects converges to fixed effects. To see this, note that random effects is just a linear combination of the fixed effects estimator and the "between units" estimator (which averages up all observations for each unit and then performs least squares on the N "cross-sectional" observations). The weight on the between units estimator is

\lambda = \frac{\sigma^2_\epsilon}{\sigma^2_\epsilon + T\sigma^2_\alpha}.


As T gets large, λ goes to zero, so that the random effects estimator converges to its fixed effects counterpart.
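To get a feel for the magnitudes, here is a quick plug-in; the equal variance components are purely my illustrative assumption, while T = 25 is the Garrett data's panel length:

\lambda = \frac{\sigma^2_\epsilon}{\sigma^2_\epsilon + 25\,\sigma^2_\epsilon} = \frac{1}{26} \approx .04,

so even at this moderate T the between units estimator receives only about 4% of the weight.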

A Practical Example

The argument above is based on asymptotics. What happens in real research situations, where T is not small but not infinite? How do fixed and random effects estimators compare in such a situation? Here I compare the results for one particular model, a model of the political economy of economic growth in the OECD nations (Garrett, 1998). This comparison is obviously only illustrative, though some interesting points emerge.

The Garrett model uses data on 14 OECD nations observed from 1966 to 1990, yielding T = 25. I work with his simplest model, which assumes that growth in GDP is a linear function of lagged growth (GDPL), oil dependence (OIL), overall OECD GDP growth weighted for each country by its trade with the other OECD nations (DEMAND), the proportion of cabinet posts occupied by left parties (LEFT), the degree of centralized labor bargaining as a measure of corporatism (LABOR), and the product of the latter two variables (LEFTxLABOR). The first three variables control for economic factors which should affect economic performance; the latter three are political variables used to test the "social democratic corporatist" model of economic performance, which argues that nations with congruent political and labor bargaining institutions perform best: nations with politically powerful left parties and centralized labor bargaining, or nations with politically powerful right parties and decentralized bargaining, have congruent institutions. (Both the theory and data are spelled out in Garrett's book.) Dummy variables to mark important time periods (labelled with the prefix PER and the time interval) were also included in the model.

To get some feel for the data, the growth in GDP has a mean of about 3.3%, with a standard deviation of 2.4% and a range of -4.3% to 12.8%. The labor organization variable ranges, in principle, from 0 to 5, with a mean of 3 and a standard deviation of 1; the LEFT variable ranges from 0 to 3.5, with a mean of 2 and a standard deviation of 1. Both of these variables were standardized by Garrett.

Garrett estimated this model with fixed effects; these estimates are in the first two columns of Table 1. (All estimations were done using Stata 7; earlier releases incorrectly estimate some standard errors that are relevant in later sections.) The same model was reestimated with a standard random effects specification; results are in the last two columns of Table 1. Standard hypothesis tests show that either fixed or random effects are required in this model.

Table 1: Comparison of fixed and random effects estimates of Garrett model of economic growth in 14 OECD nations, 1966–1990

                 Fixed Effects       Random Effects
Variable         β        SE         β        SE
GDPL             .14      .05        .16      .06
OIL            -6.62     6.26      -4.44     5.47
DEMAND           .64      .11        .55      .11
LABOR           -.13      .62       -.64      .38
LEFT            -.68      .42       -.78      .39
LEFTxLABOR       .23      .16        .23      .14
PER6673         1.41      .55       1.42      .55
PER7479          .04      .56        .06      .56
PER8084         -.54      .58       -.56      .58
PER8690         -.14      .55       -.15      .55
CONSTANT        2.39     1.36       3.42     1.12

With the exception of the LABOR coefficient, the estimates based on fixed and random effects are, as expected, similar. I return to the one difference in the next paragraph. As expected, trade-weighted OECD demand affects domestic growth; both the OPEC oil crisis and the various period effects work as expected. Since the dependent variable is a growth rate, the autoregressive coefficient is modest. Most importantly, as the social democratic corporatist theory predicts, growth is highest in right controlled governments that have decentralized labor bargaining, and in left controlled governments that have centralized labor bargaining, with the worst performance coming from left governments with decentralized bargaining and right governments with centralized bargaining. A country with a completely right government and completely decentralized bargaining will grow about .6% faster than a similar country with completely centralized bargaining; a country with a completely left government and completely centralized bargaining will grow almost 3% faster than a similar country with completely decentralized bargaining. Interestingly, the best combination is the social democratic corporatist one, a left government and centralized bargaining. Countries in this situation grow about 1% faster than do their right wing non-corporatist counterparts. All of these estimates have large standard errors, but the interaction between LEFT and LABOR is statistically significant (when standard errors are correctly computed, as we shall see in Section 5). This positive interaction is the major prediction of the social democratic corporatist theory.

Why the big difference in the estimated coefficient of LABOR? While LABOR shows some intra-country variation (else its coefficient could not be estimated in a fixed effects setup), it varies very little over time. Thus it is highly correlated with the country effects, and so in the fixed effects model its effect is estimated very imprecisely. If we were really interested in the impact of LABOR on the growth of GDP, then the fixed effects estimates would be very problematic. (Since the random effects are assumed orthogonal to the effect of LABOR, their inclusion obviously has almost no impact on its estimated coefficient.) At that point, we would need to trade off the gain in precision from using fixed effects against the cost of not being able to assess the effect of LABOR (Beck and Katz, 2001). But in this case, interest focuses on the LEFTxLABOR interaction, with the linear term just included to ensure that the model really is picking up an interaction effect. Thus the two sets of results are equivalent for any intended purpose, and so we can just use the more theoretically suitable fixed effects model. Later estimations all use a fixed effects specification.

The estimated effects from the two models are in Table 2. It does appear that the estimated effects are different. But it must be remembered that the random effects are estimated assuming that they are orthogonal to the other covariates, while the fixed effects are estimated without such an assumption. Thus, just as the estimated coefficient of LABOR varies depending on whether we use fixed or random effects, so will the estimated effects themselves. Since the effects in the two models are defined differently (based on whether we want to assume independence of the effects and covariates), it is not surprising that the estimated effects differ. But since both sets of effects are estimated by construction to ensure that mean country GDP is correctly predicted, the two sets of effects, correctly interpreted, must, by construction, be interpretively identical; that is, both must lead to identical predictions of mean country GDP. Thus, once again, there is no reason not to work with a fixed effects estimator for TSCS data.

4 Estimation Issues: Old-fashioned Problems and Solutions

The fixed effects model in Table 1 was estimated by ordinary least squares (OLS). This is optimal so long as the Gauss-Markov assumptions hold. Several of the Gauss-Markov assumptions are often suspect for TSCS data. An "old fashioned" approach is to treat these violations as a nuisance and correct for them using feasible generalized least squares ("FGLS"). I use the term old-fashioned because this perspective views violations of the Gauss-Markov assumptions as an estimation nuisance rather than something to be modeled. The modern perspective, at least in time series, is to regard these "violations" as interesting features to be modeled, not swept under the rug (Hendry and Mizon, 1978).


Table 2: Comparison of estimated fixed and random effects by country, Garrett model of economic growth in 14 OECD nations, 1966–1990

               Fixed Effect        Random Effect
Country^a      β        SE         β
Canada        -.05      .56        .11
UK           -1.32      .74       -.71
Netherlands  -1.65      .71      -1.10
Belgium      -1.83     1.02       -.80
France         .30      .79       -.08
Germany       -.98     1.19        .15
Austria      -1.56     1.83        .22
Italy         -.18      .98        .69
Finland       -.39     1.24        .80
Sweden       -1.98     1.56       -.45
Norway       -1.18     1.24        .06
Denmark      -1.85     1.41       -.46
Japan         2.45      .58       2.53

SD: Effects                        .77
^a US effect as baseline


The situation for TSCS data is a bit unlike that for single time series, in that the FGLS approach is capable of doing considerable harm with TSCS data; this is because it is possible to estimate some error properties of TSCS data that cannot be estimated with either a single time series or cross section. I therefore consider the old-fashioned methods primarily as a warning of what can go wrong and for historical reasons.

Since I deal with dynamic issues in Section 6, let me assume for expository purposes until that section that the errors are temporally independent. The assumption of no dynamics is usually false. But in most cases it is possible to separate dynamic issues from cross-sectional ones, and so I proceed in that way.

The Gauss-Markov assumption is that the stochastic process generating the errors appears "spherical," that is, each of the ε_{i,t} are independent and identically distributed ("iid") so that

E(\epsilon_{i,t}\epsilon_{j,s}) =
\begin{cases}
\sigma^2 & \text{if } i = j \text{ and } s = t\\
0 & \text{otherwise.}
\end{cases}

The OLS standard errors will be wrong if the errors show any of the following:

- panel heteroskedasticity

- contemporaneous correlation of the errors

- serially correlated errors

Putting aside serially correlated errors (for this section only), TSCS models with contemporaneously correlated and panel-heteroskedastic errors have Ω, the covariance matrix of the errors, as an NT × NT block diagonal matrix with an N × N matrix of contemporaneous covariances, Σ (having typical element E(ε_{i,t}ε_{j,t})), along the block diagonal. This follows from the assumption that the error process can be characterized by

E(\epsilon_{i,t}\epsilon_{j,s}) =
\begin{cases}
\sigma^2_i & \text{if } i = j \text{ and } s = t\\
\sigma_{i,j} & \text{if } i \neq j \text{ and } s = t\\
0 & \text{otherwise.}
\end{cases}

Note that the data provide T sets of residuals to estimate Σ. Thus FGLS could be used to estimate Equation 1 with the panel heteroskedastic and contemporaneously correlated error matrix. Such a procedure first does OLS, uses the OLS residuals to estimate Σ, and then uses the standard FGLS formulae to estimate model parameters and standard errors. This procedure was first described by Parks (1967) and then popularized in Kmenta's (1986) text, so it is usually known as Parks or Parks-Kmenta.
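To make the mechanics concrete, here is a minimal numpy sketch of the Parks procedure under the assumptions above (rectangular data, stacked unit by unit); the function name and stacking convention are mine, and the sketch is offered as a warning exhibit rather than a recommendation. Note that the estimated Σ is singular whenever T < N, one symptom of how much the procedure asks of the data.

```python
import numpy as np

def parks_fgls(y, X, N, T):
    """Parks-Kmenta FGLS for contemporaneously correlated,
    panel-heteroskedastic (temporally independent) errors.
    y is (N*T,), X is (N*T, k), rows grouped unit by unit."""
    # Round 1: OLS, whose residuals consistently estimate the errors.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = (y - X @ beta_ols).reshape(N, T)       # row i = unit i's residuals
    Sigma = (e @ e.T) / T                      # N x N contemporaneous covariances
    # Under unit-major stacking, Omega = Sigma kron I_T.
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
    A = X.T @ Omega_inv @ X
    beta_fgls = np.linalg.solve(A, X.T @ Omega_inv @ y)
    se = np.sqrt(np.diag(np.linalg.inv(A)))    # the (wildly optimistic) FGLS SEs
    return beta_fgls, se
```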


Unlike many common FGLS applications, this procedure requires estimating an enormous number of parameters for the error covariances. Note that the FGLS formulae assume that the parameters of Σ are known, not estimated. In Beck and Katz (1995), we showed that the properties of the Parks estimator for typical TSCS T's were very bad, and that, in particular, the estimated standard errors could underestimate variability by 50% to 200%, depending on T. Our conclusion is that the Parks-Kmenta estimator simply should not be used. Table 3 shows why many researchers liked Parks-Kmenta; it gives the nice t-ratios that are so prized by journal editors. The Parks-Kmenta estimator of the basic Garrett model is in the third set of columns of the table. Note that with a T of 25, standard errors are anywhere between 50% and 100% smaller than corresponding OLS standard errors. While this may make it easier to publish, the simple fact is that the Parks-Kmenta standard errors are wrong, and, worse, they are wrong in the direction of being wildly optimistic.

Some researchers, noting this problem, avoided Parks-Kmenta, but still used FGLS to correct for panel heteroskedasticity. In this model, all error covariances between different units are assumed to be zero, but each unit has its own error variance, σ²_i. Panel heteroskedastic errors thus yield

E(\epsilon_{i,t}\epsilon_{j,s}) =
\begin{cases}
\sigma^2_i & \text{if } i = j \text{ and } s = t\\
0 & \text{otherwise.}
\end{cases}

This differs from simple heteroskedasticity in that error variances are constant within a unit.

This model appears to avoid the craziness of Parks-Kmenta, since only N error parameters need be estimated (and N is typically not enormous for TSCS studies). The FGLS correction for panel heteroskedasticity proceeds, as usual, by a first round of OLS, with a second round of weighted OLS, with weights inversely proportional to the estimated σ_i for each unit (these σ_i estimated in the obvious manner).

While our simulations do not show that FGLS for panel heteroskedasticity has horrible properties (at least for reasonable N and T), we do feel that FGLS for panel heteroskedasticity is very problematic. This is because the weights used in the procedure are simply how well the observations for a unit fit the original OLS regression plane. Thus the second round of FGLS simply downweights the observations for a country if that country does not fit the OLS regression plane well. Thus on the second round, fit will be good! In other, non-TSCS, applications, the correction for heteroskedasticity is theoretical, and does not simply downweight poorly fitting observations. The closest analog to the FGLS procedure for TSCS data would be to run a cross-sectional regression and then weight each observation by the inverse of its residual. While such a procedure would yield nice R²'s and t's, it would be an odd procedure. In Beck and Katz (1996b), we reanalyzed a study of Burkhart and Lewis-Beck (1994) which analyzed economic growth in over 100 countries, and found that three quarters of the weight in the second round regression came from only 20 advanced industrial societies that fit the first round regression well. The FGLS correction for panel heteroskedasticity does less harm in the Garrett example (the middle columns of Table 3), but it does seem wrong to use a procedure which weights observations by how well they fit a prior regression.
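A sketch of this correction makes the objection transparent: the second-round weights are nothing but inverse first-round fit, so poorly fitting countries are simply downweighted. Again the function name and stacking convention are mine.

```python
import numpy as np

def panel_het_fgls(y, X, N, T):
    """FGLS for panel heteroskedasticity: reweight each unit's
    observations by the inverse of that unit's OLS residual SD."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = (y - X @ beta_ols).reshape(N, T)
    sigma = np.sqrt((e ** 2).mean(axis=1))   # sigma_i, "estimated in the obvious manner"
    w = np.repeat(1.0 / sigma, T)            # weight 1/sigma_i for every obs of unit i
    beta_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta_fgls
```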

Table 3: Comparison of OLS and FGLS estimates of Garrett model of economic growth in 14 OECD nations, 1966–1990

              OLS/PCSE           Panel Het.         Cont. Corr.
Variable      β       PCSE       β       SE         β       SE
GDPL          .14     .07        .10     .05        .08     .05
OIL         -6.62    6.42      -5.92    5.79      -4.71    3.12
DEMAND        .64     .16        .60     .10        .72     .07
LABOR        -.13     .56       -.44     .52       -.03     .29
LEFT         -.68     .31       -.57     .27       -.65     .13
LEFTxLABOR    .23     .12        .23     .14        .23     .06
PER6673      1.41     .74       1.56     .49       1.85     .40
PER7479       .04     .77        .29     .49        .52     .40
PER8084      -.54     .80       -.51     .51       -.18     .42
PER8690      -.14     .76       -.02     .49        .13     .40
CONSTANT     2.39    1.36       2.90    1.19       1.84     .79

All models estimated with fixed effects.

5 Panel Correct Standard Errors

The results of the previous section are negative: Parks-Kmenta has very poor properties, and the FGLS correction for panel heteroskedasticity is, in my view, inherently flawed. But this does not mean that OLS is a good estimator for TSCS data; the errors are likely, after all, to show both panel heteroskedasticity and contemporaneous correlation. (Standard likelihood ratio tests indicate that the Garrett residuals suffer from statistically significant panel heteroskedasticity and contemporaneous correlation of the errors, with the null of spherical errors rejected with a P value under .01 for both alternative hypotheses.) Under these conditions, OLS is still consistent, though it is inefficient, and the OLS standard errors may be wrong.


(It is often said that violations of the Gauss-Markov assumptions lead to incorrect standard errors, but this is true only if the error variance matrix is dependent on the X'X matrix, which is tested for by whether the unit squares and cross-products of the residuals are related to the squares and cross-products of the independent variables.) While inefficiency may be an important issue, it is easy to at least compute "panel correct standard errors" (PCSEs), which correctly measure the sampling variability of the OLS estimates, β̂, even with panel heteroskedastic and contemporaneously correlated errors.

The usual OLS formula for the standard errors may be misleading for TSCS data. The correct formula is given by the square roots of the diagonal terms of

\mathrm{Cov}(\hat\beta) = (X'X)^{-1}\left\{X'\Omega X\right\}(X'X)^{-1}.

OLS estimates this by

\widehat{\mathrm{Cov}}(\hat\beta) = (X'X)^{-1}\left\{\frac{\sum_i \sum_t e_{i,t}^2}{NT - k}\, X'X\right\}(X'X)^{-1} \tag{2}

which then simplifies to the usual OLS estimate of the variance-covariance matrix of the estimates (the e's are OLS residuals). OLS standard errors are incorrect insofar as the middle terms (in braces) in the two equations differ.

For TSCS models with contemporaneously correlated and panel heteroskedastic (but temporally independent) errors, Ω is a block diagonal matrix. To estimate Equation 2 we need an estimate of these diagonal blocks, Σ. Since the OLS estimates of Equation 1 are consistent, we can use the OLS residuals from that estimation for this purpose. Let e_{i,t} be the OLS residual for unit i at time t. We can estimate a typical element of Σ by

\hat\Sigma_{i,j} = \frac{\sum_{t=1}^{T} e_{i,t}\,e_{j,t}}{T}.

Letting E denote the T × N matrix of the OLS residuals, we can estimate Ω by

\hat\Omega = \frac{E'E}{T} \otimes I_T

where ⊗ is the Kronecker product. We can then compute "panel correct standard errors" (PCSEs) by taking the square root of the diagonal elements of

(X'X)^{-1}X'\left\{\frac{E'E}{T} \otimes I_T\right\}X(X'X)^{-1}.

These still use the OLS estimates of β but provide correct reports of the variability of those estimates. It should be noted that PCSEs are different from White's (1980) heteroskedasticity consistent standard errors. The latter deal only with ordinary heteroskedasticity, not panel heteroskedasticity, and do not account for contemporaneous correlation of the errors.
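Pulling the pieces together, here is a minimal numpy sketch of the computation, directly following the formulas above (rectangular data, unit-major stacking; as the next paragraph notes, canned implementations exist in standard software):

```python
import numpy as np

def ols_with_pcse(y, X, N, T):
    """OLS coefficients with panel corrected standard errors,
    following the formulas in the text."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    E = (y - X @ beta).reshape(N, T).T          # T x N matrix of OLS residuals
    Omega = np.kron((E.T @ E) / T, np.eye(T))   # (E'E)/T kron I_T
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ X.T @ Omega @ X @ XtX_inv   # sandwich; PCSEs are its root diagonals
    return beta, np.sqrt(np.diag(cov))
```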

Simulations reported in Beck and Katz (1995) show that PCSEs accurately portray the sampling variability of the OLS β̂. Even when there is no panel heteroskedasticity or contemporaneous correlation of the errors, the PCSEs are within a few percent of the OLS standard errors; but when the TSCS structure of the data leads to incorrect standard errors, the PCSEs are still very accurate. Thus there is no cost to computing PCSEs, and some potential gain. They are, in addition, easy to compute and implemented in standard software (such as Stata).

The second column of Table 3 shows the PCSEs for the Garrett model (the OLS estimates of β are identical to those in Table 1). While the PCSEs are not tremendously different from their OLS counterparts, they do differ by about one third. Thus, for example, the OLS standard errors overstate our uncertainty about the critical LEFTxLABOR interaction term; with correct standard errors, we can clearly reject the null hypothesis that this interaction has no effect on economic growth.

Our simulations leave no doubt that the PCSEs are superior to the OLS standard errors, and so lead to more correct inference. While they do not deal with the potential inefficiency of OLS, they do not cause the harm that the FGLS estimators cause. Obviously it would be good to model the variables which lead to the error complications, just as it would be good to model the causes of serial correlation in time series models. But in practice this can be hard to do, so OLS with PCSEs provides a nice, practical solution to statistical problems common in TSCS data. But this is still an old-fashioned solution (though at least one that works); the complications of TSCS data are still a nuisance which impedes correct estimation. It is better to correct these problems than to ignore them, but it would be still better to explicitly model the interesting special properties of TSCS data. I begin with dynamics.

6 Models with Temporally Dependent Observations

So far I have dealt with only cross-sectional issues. I now turn to problems caused by dynamics. Obviously TSCS data will show dynamics. The old-fashioned treatment is to think of these dynamics as a nuisance, that is, to model them as serially correlated errors which must be corrected by FGLS (Prais-Winsten or the like). Since this correction only requires the estimation of one serial correlation parameter, it does not have bad properties. But it is not consistent with modern time series analysis.


Modern time series analysts model the dynamics directly as part of the specification. The simplest form of this is the use of lagged dependent variables, but, depending on the data, other forms, such as single equation error correction (Davidson, Hendry, Srba, and Yeo, 1978), may be tried. Whatever we can do for time series, we can do for TSCS data. The only limitation is that typical TSCS data is annual, and we typically observe fewer time points per unit than we do for single time series analysis.

So, following the identical argument for time series, we can usually replace serially correlated error models with models involving a lagged dependent variable. This simplifies other estimation issues so long as the error process, conditional on the lagged dependent variable being in the specification, is temporally independent. It is easy to test for this independence via a standard Lagrange multiplier test. All the specifications of the Garrett model use a single lag of the dependent variable, which seems adequate. The Garrett data appear to be well modeled by stationary time series methods, and so no attempt was made to investigate error correction models of the growth of GDP. (Note that Garrett does not model the level of GDP, which would require non-stationary methods.)
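The text does not say which Lagrange multiplier test; the Breusch-Godfrey-style version sketched below is one standard choice, adapted to the panel layout (unit-major stacking and zero pre-sample lagged residuals are my conventions, not the article's):

```python
import numpy as np

def lm_serial_test(y, X, N, T):
    """LM test for remaining serial correlation, conditional on the
    lagged dependent variable already being among the columns of X.
    Regress the OLS residuals on X plus the within-unit lagged
    residual; under the null of temporal independence, NT * R^2 is
    asymptotically chi-squared with one degree of freedom."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = (y - X @ beta).reshape(N, T)
    e_lag = np.zeros_like(e)
    e_lag[:, 1:] = e[:, :-1]                      # lag residuals within units only
    Z = np.column_stack([X, e_lag.ravel()])
    g, *_ = np.linalg.lstsq(Z, e.ravel(), rcond=None)
    u = e.ravel() - Z @ g
    r2 = 1.0 - u.var() / e.ravel().var()
    return N * T * r2                             # compare to chi2(1), e.g. 3.84 at .05
```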

It should be noted that panel analysts have shied away from the use of lagged dependent variables (at least with fixed effects) because of what is known as Hurwicz (1950) bias. Hurwicz showed that the estimates of an autoregressive term are always biased downwards if the model includes a constant term; the fixed effects serve as a series of constant terms. But, as is well known, Hurwicz bias is of order 1/T, and so is not a problem for typical TSCS data, and disappears asymptotically as T grows.

The lagged dependent variable approach has also been criticized because it may lead to inconsistent parameter estimates in the presence of serially correlated errors (Maddala, 1998). As noted above, this is seldom a problem in practice, and it is an easy problem to test for. Analysts of single time series have typically attempted to directly model the dynamics, rather than thinking of them as an estimation nuisance (Hendry and Mizon, 1978). There is no reason that TSCS analysts should not proceed in the same manner.

7 Modeling the Features of TSCS Data

Just as modern time series analysts directly model dynamics, it is possible to model other interesting features of TSCS data. The direct modeling of these features eliminates any of the old-fashioned nuisance issues. More importantly, the more modern approach allows for a better understanding of the properties of TSCS data. Two critical issues that are current areas of research are 1) allowing for heterogeneous units and 2) the introduction of spatial ideas. I begin with the latter.


Spatial modeling in TSCS data

There may be some relationship between units, with a bigger relationship between nearby units. Nearby could be meant in a geographical sense, or it could indicate that nations are "nearby" if they engage in substantial trade. Spatial econometrics (Anselin, 1988) is a vibrant area, and it presents many technical issues. It turns out to be easy to use spatial ideas in TSCS data.

Spatial econometricians model two types of spatial dependence. The errors of nearby units may be correlated (spatial autocorrelation), with the degree of correlation being a function of nearness; alternatively, a "spatial lag," that is, the weighted sum of the dependent variable of all other units, with weights proportional to nearness, may be added to the specification. If one has only cross-sectional data, either of these presents formidable technical challenges. But with TSCS data, if one can assume that neighbors have an effect only with a lag, then, so long as the errors are temporally independent, one can add this spatial lag to Equation 1 and still use OLS.

I only discuss spatial lags, since spatial autocorrelation seems to me to be an odd assumption. The latter assumes that the observational errors for unit i are related to those of unit j in proportion to their closeness, but that there are no other ties between i and j. Thus if country j grows rapidly because of some measured variable, that growth will have no impact on country i, but if it grows rapidly because of some unmeasured variable (what else is the error term?), then it will have an effect. This seems wrong. Thus I will simply assume that the y_{j,t-1} have an effect on y_{i,t}, where the size of the effect is prespecified by some "nearness" weighting matrix (given a priori). The temporal lag of the spatial lag term is critical, since it means that including spatial lags causes no econometric problems so long as the observations are (conditionally) temporally independent.

In fact, this lagged spatial lag has already been included in the Garrett specification I have been estimating. His DEMAND variable is a trade weighted average (for each country) of the growth of GDP in all other OECD countries. Its inclusion in the model reflects the two obvious ideas that when the world does better, any nation will likely do better, and that how much better that nation does reflects how much better other nations are doing, weighted by the importance of those other nations to any given economy. As is typical in spatial econometrics, these weights are specified a priori, here by the proportion of national GDP accounted for by dyadic trade. It is difficult to imagine a TSCS model that somehow does not use spatial ideas. But spatial thinking need not cause any econometric complications for TSCS modelers.
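As an illustration of the data construction, here is a sketch of a DEMAND-style regressor: a temporally lagged spatial lag built from an a priori weights matrix. The names are mine, not Garrett's.

```python
import numpy as np

def lagged_spatial_lag(y, W):
    """Temporally lagged spatial lag: the value for unit i in year t is
    sum_j W[i, j] * y[j, t-1], with W an a priori N x N nearness matrix
    (e.g., dyadic trade shares, W[i, i] = 0) and y an N x T array."""
    N, T = y.shape
    lag = np.full((N, T), np.nan)
    lag[:, 1:] = W @ y[:, :-1]   # year t uses every other unit's year t-1 value
    return lag                   # first year undefined; drop it in estimation
```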

Finally, analysts can combine the old-fashioned PCSE approach with the more modern spatial analysis to assess whether the spatially lagged dependent variable accounts for the observed spatial correlation of the errors. If this is the case, then the difference between the OLS standard errors and the PCSEs should decrease when the spatially lagged dependent variable is included in the model, since the only reason that OLS standard errors and PCSEs differ is that the errors show some unspecified spatial correlation pattern. Looking at the Garrett estimates, the ratio of OLS standard errors to their corresponding PCSEs is similar whether or not we include the DEMAND variable in the specification. This indicates that DEMAND is not explaining the spatial correlation of the residuals, and so further spatial analysis is required. Since the PCSEs differ from the OLS standard errors by only 20% or so, the threat to inference of this ignored spatial component is not great.

Assessing Heterogeneity via Cross-Validation

Political scientists seem to subscribe to one of two views on heterogeneity: the world is either completely heterogeneous, and scholars should analyze countries in isolation; or the world is completely homogeneous, and one should freely estimate models like Equation 1, which assume complete homogeneity (pooling). Of course researchers never assume a totally homogeneous world; they restrict analysis to OECD nations, or developing nations, or Latin nations, or whatever. But once they restrict the set of nations, they typically assume that all nations that meet the restriction follow exactly the same specification.

One way to assess this is via cross-validation (Stone, 1974). The simplest form of cross-validation is to leave out one observation, fit a regression with all the others, and then "predict" the left out observation. For all but very small data sets this procedure is useless, since the "prediction errors" converge to the OLS residuals as N increases. But cross-validation works well for TSCS data if, instead of leaving out one observation at a time, we leave out one unit at a time. We can then compare specifications by seeing how they perform in terms of mean absolute (or squared) "prediction" error, or we can see if any units are predicted less well than the others. I go through such an exercise in Table 4.
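A minimal sketch of the leave-one-unit-out procedure (unit-major stacking again assumed; with no unit effects in X this mirrors how the Table 4 numbers are described):

```python
import numpy as np

def unit_cross_validation(y, X, N, T):
    """Panel cross-validation: drop one unit at a time, fit OLS on the
    remaining units, and 'predict' the omitted unit. Returns each
    unit's mean absolute prediction error."""
    errors = np.empty(N)
    for i in range(N):
        keep = np.ones(N * T, dtype=bool)
        keep[i * T:(i + 1) * T] = False           # leave out unit i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        pred = X[~keep] @ beta
        errors[i] = np.abs(y[~keep] - pred).mean()
    return errors
```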

In that table, we see that typical mean absolute forecast errors range from 1.2 to 2 percent in the growth in GDP, except for Japan, which has a forecast error of 3.2 percent. Thus Japan clearly fits the basic specification much less well than any other OECD nation. A researcher might be well advised to drop the Japanese data from the Garrett specification. This makes particular sense since the Japanese political economy differs in many ways from other OECD political economies. I would be less sanguine if a country like the Netherlands had the worst forecast errors. One always must be careful that one is not getting good fits by simply dropping units that fit poorly. But in this case, both cross-validation and knowledge of the OECD nations lead me to argue that Japan should not be included in the analysis. Dropping Japan from the analysis increases the impact of the critical LEFTxLABOR interaction term by about 20%.

Table 4: Out of sample forecast errors (OLS) by country for Garrett model of economic growth in 14 OECD nations^a, 1966–1990

Country        Mean absolute error
US             1.9
Canada         1.7
UK             1.7
Netherlands    1.6
Belgium        1.6
France         1.2
Germany        1.4
Austria        1.3
Italy          1.7
Finland        2.0
Sweden         1.2
Norway         1.5
Denmark        1.7
Japan          3.2

^a No unit effects

Random Coefficients Models

The random coefficients model (RCM) is an interesting compromise between assuming complete homogeneity and complete heterogeneity. This model is the same as the Bayesian hierarchical model. Western (1998) has a full discussion of this model in the context of TSCS data.

The RCM is a compromise between estimating the fully pooled Equation 1 and a fully unpooled estimate, that is, a separate OLS for each unit. There is not enough data for the latter (that is, separate OLS estimations will have huge standard errors), but the former requires the very strong assumption of complete pooling. The RCM uses the idea of "borrowing strength" (Gelman, Carlin, Stern, and Rubin, 1995). This Bayesian notion "shrinks" each of the individual unit OLS estimates back to the overall (pooled) estimate. RCMs just generalize random effects from the intercept to all parameters of interest. The model can be estimated via classical maximum likelihood (Pinheiro and Bates, 2000), FGLS (Swamy, 1971), or modern Bayesian methods (Western, 1998). While the connection between "mixed estimators" and empirical Bayes is well known (Judge, Griffiths, Hill, Lütkepohl, and Lee, 1985, ch. 3), many analysts using random effects or RCMs treat the problem as a classical estimation problem, using Theil and Goldberger's (1961) classical "mixed" estimation method which incorporates non-sample information into classical regression models. But we can better understand what is going on with RCM (and random effects) models by thinking of them as empirical Bayes methods.

The RCM is thus

y_{i,t} = x_{i,t}\beta_i + \epsilon_{i,t} \tag{3}

where the β_i ~ N(β, Γ). Note that if we only allow the intercept to be random, this reduces to the usual random effects model. If we write β_i = β + ν_i where ν_i ~ N(0, Γ), we get

y_{i,t} = x_{i,t}\beta + \{x_{i,t}\nu_i + \epsilon_{i,t}\}

where the latter term (in braces) is just a complicated error process. The model requires the identifying assumption that the randomness in the ν_i is independent of the randomness in the ε_{i,t}.

The RCM can be made more useful by allowing the β_i to be functions of other unit variables, z_i, which allows for modeling differential effects as a function of differing institutions. (Note: the z's are time invariant, so they only measure properties of units.) This is particularly important in comparative politics, where we might expect that the effect of some x on the dependent variable is contingent on structural features that vary from country to country. As an example, the Garrett model asserts that the effect of having a left government is contingent on the type of labor bargaining in each country. We can then write:

\beta_i = z_i\gamma + \beta + \nu_i. \tag{4}

Substituting Equation 4 into Equation 3, we see that this model is just an interaction model with random coefficients on the linear terms only. Thus while Equation 4 is extremely useful, the introduction of interactions alone causes no interesting estimation problems (still allowing OLS/PCSEs). Thus whether or not analysts use random coefficients, they clearly should consider modeling the coefficients as functions of time-invariant unit variables.
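Writing the substitution out makes the point concrete: plugging Equation 4 into Equation 3 gives

y_{i,t} = x_{i,t}\beta + (x_{i,t} z_i)\gamma + \{x_{i,t}\nu_i + \epsilon_{i,t}\},

an ordinary interaction specification whose only nonstandard feature is the compound error term in braces.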

As noted, the RCM can be estimated classically (in the mixed sense) or in a fully Bayesian manner. Most econometricians have followed Swamy (1971) and estimated the RCM via FGLS; our unpublished research (Beck and Katz, 1996a; see also Maddala and Hu, 1996) indicates that common FGLS implementations have poor finite sample performance. Pinheiro and Bates (2000) provide both a maximum likelihood and a restricted maximum likelihood implementation in both R and S-Plus. The complete RCM, as specified by Swamy, asks a lot of the data. In particular, it assumes that all random coefficients are drawn from a K-variate multivariate normal distribution, which means that an enormous number of parameters in the variance-covariance matrix of that distribution must be estimated. A more useful assumption for practical purposes, and the one made by Western, is that each of the β_k are generated independently, dramatically reducing the number of parameters in the model. The model can be simplified further by assuming that some of the β_k are fixed. These simplifications are almost certainly necessary if we are going to have any chance of meaningfully estimating the model.

How useful is the RCM likely to be in practice? It is most useful to approach this question by thinking of the RCM as a shrinkage estimator. If the individual estimates of β_i are completely shrunk back to the overall β, there is little gain to using the RCM over the fully pooled estimator; on the other hand, if there is almost no shrinkage, the RCM provides little gain over unit by unit OLS. In any shrinkage estimator, the degree of shrinkage is a function of the heterogeneity of the unit by unit estimates and the information contained in those estimates. Heterogeneity is assessed by the standard F-statistic which tests the homogeneity of the unit by unit OLS estimates (that is, which compares the sums of squared errors from the completely pooled and unit by unit estimations). The information content in the individual OLS unit estimates is a function of the number of observations per unit, T, and the number of coefficients estimated, K. Maddala and Hu (1996) show that the shrinkage factor is c/F, where

c = \frac{(N-1)K - 2}{NT - NK + 2}

and F is the appropriate F-statistic.

How much shrinkage will we get in typical TSCS data? For the Garrett model I have been using, F = 1.97, so the RCM shrinks the unit OLS estimates back to the pooled estimate by 18%. (This assumes we are not allowing the period dummies to vary.) But note what happens if we only allow, as suggested, a few coefficients of interest to vary. The degree of shrinkage declines, for large T, more or less linearly with K. Thus if we only allow the effect of LEFT to vary randomly, the shrinkage factor declines to about 2% (depending on how heterogeneous that one estimated coefficient is in the unit by unit OLS estimates). Thus the RCM yields little improvement over the unit by unit estimates when those estimates are relatively heterogeneous; but when the unit by unit estimates are relatively homogeneous, the simple fully pooled estimator is usually superior.
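To see where the 18% figure comes from, here is the arithmetic; K = 7 (the six substantive regressors plus the constant) is my reading of the specification, given that the period dummies are held fixed:

c = \frac{(14-1)\cdot 7 - 2}{14\cdot 25 - 14\cdot 7 + 2} = \frac{89}{254} \approx .35, \qquad \frac{c}{F} = \frac{.35}{1.97} \approx .18.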

The RCM may be most important for estimating standard errors. The unit by unit estimates have high standard errors, both because each is estimated with only T observations and because of multicollinearity issues. The RCM's "borrowing of strength" may have its most important consequence in providing more realistic standard errors for each of the unit effects. The relative accuracy of RCM standard errors for the unit effects has not been well studied, and so for now the superiority of RCM standard errors for the unit effects must be considered a conjecture.

Thus, what is clearly a very important approach for panel data is a much less interesting approach for TSCS data. This should not blind us to considering heterogeneous models, nor should it lead us to blindly accept the fully pooled model. But the RCM does not appear to completely solve the interesting problem of modeling heterogeneity for TSCS data, though it may provide more accurate assessments of the standard errors of the parameters of interest.

8 Models with Binary Dependent Variables

So far we have assumed a continuous dependent variable, but what if it is binary (or, worse yet, polychotomous, count, or tobit)? This is a common problem in the panel data literature, wherein we may inquire about whether people voted in repeated interviews. It causes well known problems, especially if one wants to include fixed or random effects (see the discussion in any econometrics text, such as Greene, 1999, 837–41). This is also an active area of research among biometricians, who have many repeated trials with a binary outcome variable. But we must remember that binary TSCS (BTSCS) data is not panel data, and so the problems and solutions are different.

BTSCS analysts invariably use some variant of a logit or probit model. Unlike the panel case, there is no reason that fixed effects cannot be included in the specification (since asymptotics are in T, not N). Thus what is a very difficult issue for panel analysts is a relatively simple one for BTSCS analysts.

While effects are thus easier for BTSCS data than for binary panel data, modeling the dynamics of such data is hard. Many researchers use ordinary probit or logit, but ignoring the dynamics can lead to highly optimistic standard errors. This problem is most severe when the errors are correlated and the independent variables trend. Beck and Katz (1997) present simulation results showing that, in the presence of severe trending and autocorrelation, standard errors can be off by 50% or more.


In Beck, Katz, and Tucker (1998) we proposed one solution that works if BTSCS data is really event history data; I provide a quick summary of that line of inquiry here. Event history methods model the time between events. IR scholars working on the causes of international conflict have often used a BTSCS setup, where pairs of nations (dyads) are observed yearly; the dependent variable in such studies is the presence or absence of conflict, with independent variables marking structural properties of the dyad, most notably whether or not the dyadic partners are democracies or whether they engage in substantial trade (see, for example, Maoz and Russett, 1993 or Oneal and Russett, 1997 for examples of this type of analysis). This dyad-year data was typically analyzed using ordinary probit or logit, and hence subject to the problems described above.

The information in these dyad-year data sets can be well summarized by the time between conflicts. Thus we can use event history methods to model the BTSCS dyad-year data; these methods invariably allow for temporal dependence of the observations. As is well known, event history models can be analyzed using either discrete or continuous methods, and there is a one to one relationship between continuous and discrete methods; every continuous method has a discrete counterpart and vice versa (Sueyoshi, 1995).

To be more specific, it is easy to visualize BTSCS dyad-year data as grouped duration data suitable for a grouped Cox (1972) proportional hazards semi-parametric estimation. The discrete version of the grouped duration Cox model is a complementary log-log model with a series of time dummy variables added to the specification; these dummy variables mark time from the previous event, not calendar time. It is hard in practice to see much difference between the complementary log-log specification and the more common logit or probit. We thus recommend that researchers with data similar to the IR dyad-year conflict data do logit or probit, but add the temporal dummies to their specification.
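As an illustration of the bookkeeping this recommendation requires, here is a sketch of the "time since the last conflict" counter for a single dyad, from which the temporal dummies (or a cubic spline basis) are built; the function name and the zero-initialization at the start of the sample (which ignores left-censoring) are mine.

```python
import numpy as np

def peace_years(conflict):
    """conflict: 0/1 array over years for one dyad. Returns the count
    of years since the last conflict and one dummy per count value;
    the dummies (not the raw count) go into the logit specification."""
    T = len(conflict)
    count = np.zeros(T, dtype=int)
    for t in range(1, T):
        count[t] = 0 if conflict[t - 1] else count[t - 1] + 1
    dummies = (count[:, None] == np.arange(count.max() + 1)).astype(int)
    return count, dummies
```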

In Table 5 I compare a discrete grouped time logit analysis presented in Beck, Katz, and Tucker (1998) with an ordinary logit analysis of Oneal and Russett (1997). Duration is measured both with dummy variables and with a natural cubic spline in time; it matters little which is used. The major consequence of including the temporal variables is that the pacific effects of international trade disappear.

Table 5: Comparison of ordinary logit and grouped duration analyses

                      Ordinary        Grouped Duration
                      Logit      Logit       Logit       Cloglog
                                 Dummy^a     Spline      Dummy^b
Variable              I          II          III         IV
Democracy             -0.50      -0.55       -0.54       -0.49
                      (0.07)     (0.08)      (0.08)      (0.07)
Economic Growth       -2.23      -1.15       -1.15       -0.81
                      (0.85)     (0.92)      (0.92)      (0.76)
Alliance              -0.82      -0.47       -0.47       -0.43
                      (0.08)     (0.09)      (0.09)      (0.08)
Contiguous             1.31       0.70        0.69        0.55
                      (0.08)     (0.09)      (0.09)      (0.08)
Capability Ratio      -0.31      -0.30       -0.30       -0.30
                      (0.04)     (0.04)      (0.04)      (0.04)
Trade                -66.13     -12.67      -12.88      -12.50
                     (13.44)    (10.50)     (10.51)      (9.96)
Constant              -3.29      -0.94       -0.96       -1.11
                      (0.08)     (0.09)      (0.09)      (0.08)
Peace Years                                  -1.82
                                             (0.11)
Spline(1)^c                                   -.24
                                             (0.03)
Spline(2)^c                                   -.08
                                             (0.01)
Spline(3)^c                                   -.01
                                            (0.003)

Log Likelihood       -3477.6    -2554.7     -2582.9     -2554.1
df                    20983      20036       20979       20949

N = 20990. Standard errors in parentheses.
^a 31 temporal dummy variables in specification not shown; 3 dummy variables and 916 observations dropped due to outcomes being perfectly predicted.
^b 34 temporal dummy variables in specification not shown.
^c Coefficients of Peace Years cubic spline segments.

What if the BTSCS data does not look like event history data (that is, one has lots of movement between zero and one, so that the time until a one does not really do a good job of characterizing the information)? At that point one could go back to the dynamic latent variable setup of a probit model, writing

\begin{aligned}
y^*_{i,t} &= x_{i,t}\beta + \phi y^*_{i,t-1} + \epsilon_{i,t}\\
y_{i,t} &= 1 \text{ if } y^*_{i,t} > 0\\
y_{i,t} &= 0 \text{ if } y^*_{i,t} \le 0.
\end{aligned}

This model is difficult to estimate by standard methods, but is estimable using Markov Chain Monte Carlo methods. As this method becomes more standard, it will probably be the case that all dynamic discrete dependent variable models will be estimated this way.

9 Conclusion

Katz and I became interested in TSCS data in about 1993 (through our interest in cross-validation). TSCS models were becoming of great interest in political science, since they appeared to allow students of comparative politics (broadly defined) to use powerful statistical methods that had been the province of students of American politics (typically studying voting behavior via large N surveys). Most studies either ignored TSCS issues or treated those issues as a nuisance, using an FGLS estimation method to treat those nuisances. These FGLS procedures either have poor statistical properties or seem dangerous on other grounds. One reason that TSCS data is of interest is that the richness of the data allows us to do many things; but many of those things should not be done.

By now most political science articles appear to use our recommended methodology of OLS estimates of β coupled with panel correct standard errors, with dynamics modeled via a lagged dependent variable. Obviously this seems to me like a reasonable way to estimate the fully pooled TSCS model. My one hope is that researchers will take the error correction dynamic model seriously, though in practice stationary models have appeared to perform adequately. The use of (single equation) error correction models does not require researchers to leave the simple OLS world.

Now that estimation issues have been dealt with, interest should focus on specification. Presumably political scientists have a comparative advantage in specification, not estimation! One current area of research is modeling TSCS data with a binary dependent variable; this is a particularly hot topic in the study of international conflict. While this arena can be thought of as one of high technique, the important issues are issues of specification: that is, do the BTSCS data look like event history data, or should they be modeled with a lagged latent variable in a dynamic probit setup? Two other pressing issues are the incorporation of spatial effects and the modeling of heterogeneity. The former does not appear hard to do, and many researchers might take pleasure in noting that they have been estimating spatial models all along. Heterogeneity appears to be a more difficult problem, though "high tech" approaches such as Bayesian random coefficient models should not distract from the use of simpler ideas, such as cross-validation (a minimal sketch of which appears below). In the end, all these issues are primarily issues of specification, not (difficult) estimation.
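One simple way to act on that suggestion is leave-one-unit-out cross-validation for the pooled model: refit by OLS with each unit held out in turn and ask whether any unit is predicted notably worse than the rest. The sketch below is a minimal implementation under that assumption; the function name is mine.

```python
import numpy as np

def unit_cross_validation(X, y, unit_ids):
    """Leave-one-unit-out cross-validation for a pooled OLS TSCS model.

    Returns each unit's out-of-sample mean squared error; unusually large
    values flag units that the fully pooled specification fits badly.
    """
    errors = {}
    for u in np.unique(unit_ids):
        train, test = unit_ids != u, unit_ids == u
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errors[u] = float(np.mean((y[test] - X[test] @ beta) ** 2))
    return errors
```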

References

Alvarez, R. M., G. Garrett, and P. Lange (1991), Government partisanship, labor organization and macroeconomic performance, American Political Science Review 85, 539–56.

Anselin, L. (1988), Spatial Econometrics: Methods and Models, Kluwer Academic, Boston.

Beck, N. and J. N. Katz (1995), What to do (and not to do) with time-series cross-section data, American Political Science Review 89, 634–47.

Beck, N. and J. N. Katz (1996a), Lumpers and splitters united: The random coefficients model, paper presented at the Annual Meeting of the Society for Political Methodology, Ann Arbor, July.

Beck, N. and J. N. Katz (1996b), Nuisance vs. substance: Specifying and estimating time-series–cross-section models, Political Analysis 6, 1–36.

Beck, N. and J. N. Katz (1997), The analysis of binary time-series–cross-section data and/or the democratic peace, paper presented at the Annual Meeting of the Political Methodology Group, Columbus, OH.

Beck, N. and J. N. Katz (2001), Throwing out the baby with the bath water: A comment on Green, Kim and Yoon, International Organization.

Beck, N., J. N. Katz, and R. Tucker (1998), Taking time seriously: Time-series–cross-section analysis with a binary dependent variable, American Journal of Political Science 42, 1260–88.

Blanton, S. L. (2000), Promoting human rights and democracy in the developing world: US rhetoric versus US arms exports, American Journal of Political Science 44, 123–31.

Burkhart, R. and M. Lewis-Beck (1994), Comparative democracy: The economic development thesis, American Political Science Review 88, 903–10.

Cox, D. R. (1972), Regression models and life tables, Journal of the Royal Statistical Society, Series B 34, 187–220.


Davidson, J., D. Hendry, F. Srba, and S. Yeo (1978), Econometric modelling of the aggregate time-series relationship between consumers' expenditure and income in the United Kingdom, Economic Journal 88, 661–92.

Fiorina, M. (1994), Divided government in the American states: A byproduct of legislative professionalism?, American Political Science Review 88, 304–16.

Fording, R. (1997), The conditional effect of violence as a political tactic: Mass insurgency, welfare generosity, and electoral context in the American states, American Journal of Political Science 41, 1–29.

Freedman, D. and S. Peters (1984), Bootstrapping a regression equation: Some empirical results, Journal of the American Statistical Association 79, 97–106.

Garrett, G. (1998), Partisan Politics in the Global Economy, Cambridge University Press, New York.

Gasiorowski, M. J. (2000), Democracy and macroeconomic performance in underdeveloped countries: An empirical analysis, Comparative Political Studies 33, 319–49.

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (1995), Bayesian Data Analysis, Chapman & Hall, London.

Giles, M. and K. Hertz (1994), Racial threat and partisan identification, American Political Science Review 88, 317–26.

Greene, W. (1999), Econometric Analysis, Prentice Hall, Upper Saddle River, N.J., 4th ed.

Grier, K. and G. Tullock (1989), An empirical analysis of cross-national growth, 1951–80, Journal of Monetary Economics 24, 259–76.

Hall, P. and R. Franzese (1998), Mixed signals: Central bank independence, coordinated wage bargaining, and European Monetary Union, International Organization 52, 505–35.

Hendry, D. and G. Mizon (1978), Serial correlation as a convenient simplification, not a nuisance: A comment on a study of the demand for money by the Bank of England, Economic Journal 88, 549–63.

Hicks, A. and L. Kenworthy (1998), Cooperation and political economic performance in affluent democratic capitalism, American Journal of Sociology 103, 1631–72.


Hollingsworth, R., R. Hanneman, J. Hage, and C. Ragin (1996), The effect of human capital and state intervention on the performance of medical systems, Social Forces 75, 459–84.

Hsiao, C. (1986), Analysis of Panel Data, Cambridge University Press, New York.

Hurwicz, L. (1950), Least-squares bias in time series, in T. Koopmans (ed.), Statistical Inference in Dynamic Economic Models, Wiley, New York, 365–83.

Iversen, T. (1998), Wage bargaining, central bank independence and the real effects of money, International Organization 52, 469–504.

Judge, G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee (1985), The Theory and Practice of Econometrics, Wiley, New York, 2nd ed.

Kmenta, J. (1986), Elements of Econometrics, Macmillan, New York, 2nd ed.

Liang, K.-Y. and S. L. Zeger (1986), Longitudinal data analysis using generalized linear models, Biometrika 73, 13–22.

Maddala, G. (1998), Recent developments in dynamic econometric modeling: A personal viewpoint, Political Analysis 7, 59–87.

Maddala, G. S. and W. Hu (1996), The pooling problem, in L. Mátyás and P. Sevestre (eds.), The Econometrics of Panel Data, Kluwer Academic, Dordrecht, 307–22, 2nd ed.

Maoz, Z. and B. M. Russett (1993), Normative and structural causes of democratic peace, 1946–1986, American Political Science Review 87, 639–56.

Montanari, I. (1999), From family assistance to support for spouses and children: Economic aid for families, 1950–1990 in 18 countries, Sociologisk Forskning Suppl. S, 218–52.

Neyman, J. and E. L. Scott (1948), Consistent estimates based on partially consistent observations, Econometrica 16, 1–32.

Oneal, J. R. and B. M. Russett (1997), The classical liberals were right: Democracy, interdependence, and conflict, 1950–1985, International Studies Quarterly 41, 267–94.

Pampel, F. (1996), Cohort size and age-specific suicide rates: A contingent relationship, Demography 33, 341–55.


Parks, R. (1967), Efficient estimation of a system of regression equations when disturbances are both serially and contemporaneously correlated, Journal of the American Statistical Association 62, 500–509.

Pinheiro, J. C. and D. M. Bates (2000), Mixed-Effects Models in S and S-PLUS, Springer, New York.

Poe, S. C., C. N. Tate, and L. C. Keith (1999), Repression of the human right to personal integrity revisited: A global cross-national study covering the years 1976–1993, International Studies Quarterly 43, 291–313.

Radcliff, B. and P. Davis (2000), Labor organization and electoral participation in industrial democracies, American Journal of Political Science 44, 132–41.

Smith, K. (1997), Explaining variation in state-level homicide rates: Does crime policy pay?, Journal of Politics 59, 350–67.

Stone, M. (1974), Cross-validatory choice and assessment of statistical predictions, Journal of the Royal Statistical Society, Series B 36, 111–33.

Su, T., M. Kamlet, and D. Mowery (1993), Modeling United States budgetary and fiscal policy outcomes: A disaggregated, systemwide perspective, American Journal of Political Science 37, 213–45.

Sueyoshi, G. T. (1995), A class of binary response models for grouped duration data, Journal of Applied Econometrics 10, 411–31.

Swamy, P. A. V. B. (1971), Statistical Inference in Random Coefficient Models, Springer-Verlag, New York.

Theil, H. and A. S. Goldberger (1961), On pure and mixed estimation in economics, International Economic Review 2, 65–78.

Western, B. (1998), Causal heterogeneity in comparative research: A Bayesian hierarchical modelling approach, American Journal of Political Science 42, 1233–59.

White, H. (1980), A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817–38.
