Grouped Effects Estimators in Fixed Effects Models

C. Alan Bester and Christian B. Hansen∗

April 2009

Abstract.

We consider estimation of nonlinear panel data models with common and individual specific parameters. Fixed effects estimators are known to suffer from the incidental parameters problem, which can lead to large biases in estimates of common parameters. Pooled estimators, which ignore heterogeneity across individuals, are also generally inconsistent. We assume that individuals in our data are grouped on multiple levels. These groups may be based on some external classification (for example, SIC codes), geographic location (census tract, county, state, etc.), or perhaps based on observable right hand side variables, and may be nested (hierarchical) or non-nested. We consider “group effects” estimators, where individual specific parameters are assumed common across groups at some level. We provide conditions under which group effects estimates of common parameters are asymptotically unbiased and normal. Our conditions suggest a tradeoff between two sources of bias, one due to incidental parameters and the other due to misspecification of unobserved heterogeneity. Our findings suggest that one may wish to control for heterogeneity at the group level even when individual specific effects are present. These findings are confirmed in a Monte Carlo study and illustrated in two empirical examples.

Keywords: Fixed Effects, Panel Data, Hierarchical Models

JEL Codes: C10, C13, C23

1. Introduction

Panel data is widely used in empirical economics. Such data allows researchers to control for unobservable, time invariant individual-level heterogeneity that, according to economic theory, may be related to covariates of interest. Such heterogeneity arises, for example, with household-specific willingness to pay for a given product, which may be correlated with income, or firm-specific policies, which may be related to capital structure.

∗ The University of Chicago, Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA.

We suppose that the model to be estimated is known up to a finite dimensional common parameter and another finite dimensional parameter that may be specific to each individual but is assumed constant over time. In a linear model, individual specific effects are often treated as parameters to be estimated, an approach referred to as fixed effects estimation. Using fixed effects allows researchers to make inference about common parameters while placing very little structure on the distribution of unobservable heterogeneity. However, this approach may be problematic in nonlinear or dynamic models. As noted by Neyman and Scott (1948), noise in the estimation of individual level effects when the time dimension of the panel is short will in general contaminate estimates of the common parameters, a phenomenon generally referred to as the incidental parameters problem.

Recently a number of papers have studied econometric properties of fixed effects (hereafter FE) estimators, with the explicit aim of characterizing biases arising from the incidental parameters problem.² These papers work with asymptotic sequences where N, the number of individuals, and T, the number of time periods in the panel, both go to infinity, so that individual specific parameters are consistently estimable. However, they show that for nonlinear models, estimation of individual level effects introduces biases in the common parameters of order 1/T, implying that fixed effects estimators will generally perform badly when T is small. These papers propose to estimate the 1/T bias directly and remove the estimated bias from common parameters and other objects of interest (e.g., marginal effects). In simulations, these bias corrections provide dramatic MSE improvements over the uncorrected fixed effects estimator in moderate-T panels. However, these bias corrected estimators may still perform badly when the time dimension is short. Further, as we emphasize below, inference based on these estimators tends to suffer from severe size distortions unless T is at least an appreciable fraction of N.

Another common approach, broadly termed random effects, places restrictions on the distribution of unobserved heterogeneity, either assuming independence between observables and unobservables, or assuming unobservables are drawn from a distribution defined up to a finite dimensional parameter, or both. An extreme example is a pooled estimator, which ignores heterogeneity entirely. When these assumptions about the distribution of unobserved heterogeneity are satisfied, the resulting estimators can perform extremely well even in very short-T panels. Unfortunately, economic theory often implies dependence between observed quantities and unobservables and rarely suggests a parametric form for this dependence. In random effects approaches, misspecifying the distribution of unobservables may result in inconsistency of estimates for common parameters. Random effects estimators that involve integration over a specified distribution of unobservables are also often computationally burdensome except in very simple cases. Fixed effects approaches are therefore often preferred in empirical applications despite their potentially poor finite sample properties.

² See, e.g., Hahn and Kuersteiner (2002). An excellent survey of this literature is provided by Arellano and Hahn (2005). Further citations are provided in Section 1.1.

This paper considers settings where individuals may be grouped at different levels. This type of hierarchical setup is actually quite common in economics. For example, households may be grouped at the school district, county, or state level, while firms may be grouped according to coarse or fine industry classifications using SIC codes to a given number of digits. Individuals may also be grouped based on other observable information. In finance, for example, one often considers firms sorted into 25 groups based on quintiles of their size and market-to-book ratios. We consider “grouped effects” estimators, which estimate model parameters treating individual specific effects as if they are constant within groups at a particular level. Such estimators may be naturally thought of as intermediate to pooled and fixed effects estimators. They may be thought of as a particular type of random effects estimator, since they restrict the distribution of unobserved individual level effects and their relationship with observables.

We consider an asymptotic sequence where N and T go to infinity jointly. We show that grouped effects estimates suffer from two sources of bias. The first is due to incidental parameters, and is of order $1/(N_g T)$ since each group-level parameter is estimated using a total of $N_g T$ observations, where $N_g$ is the number of individuals in group g. The second arises from model misspecification, in the sense that individual specific heterogeneity is incorrectly assumed to be constant within groups of individuals. We provide conditions on the sampling scheme and the behavior of unobservables within groups such that the group effects estimator is asymptotically unbiased and normal and show that this asymptotic framework leads to useful insights in practice. These conditions suggest a tradeoff between the two sources of bias. We study this tradeoff in a Monte Carlo study, where we find that it plays a crucial role in determining finite sample properties of estimates of structural parameters. We find that grouped effects estimators can offer large gains in finite samples relative to fixed effects approaches, even in situations where individual effects vary significantly within groups.

The key conditions involve the rate of growth for the number of groups within which individuals are grouped and the rate at which the error from projecting the true individual specific effects onto the groups goes to zero. To obtain interesting asymptotic results, we require that the number of groups increases more slowly than information accumulates about common parameters (more slowly than $\sqrt{NT}$) and that squared approximation error goes to zero more quickly than information accumulates about common parameters. Satisfying these conditions will necessarily involve placing restrictions on unobserved heterogeneity. The asymptotic environment we use suggests that our asymptotic results will be most useful in environments where researchers may not have perfect information on how individuals are grouped but have ex ante beliefs or information about potential grouping structures that allows a significant portion of the variation in unobserved effects to be captured by the set of group effects.

It is important to note that analyzing our grouped effects estimators does require assumptions about the distribution of unobserved effects. However, these assumptions are very different from those employed in typical random effects estimators. We neither assume independence between observables and unobservables nor impose a parametric form for the distribution of unobserved effects. We motivate these assumptions in economically meaningful ways through examples, including two empirical applications. In addition, our simulations suggest that two commonly used information criteria may be useful in deciding between grouping schemes. We also note this approach is computationally tractable and easily implemented in standard econometrics software. Taken together, we believe these results offer a useful way to think about models with unobservable individual-level effects that will perform quite well in practice with economic data.

1.1. Related Work

Many papers in econometrics have noted biases which arise in nonlinear and/or dynamic panel data models with incidental parameters; see, for example, the early papers of Nickell (1981) and Heckman (1981). In certain special cases, estimators of common parameters have been developed that do not depend on unobserved effects. In rare examples where a sufficient statistic for the unobserved effect is available, common parameters may be estimated by conditional maximum likelihood as in Anderson (1970). For certain models, estimators have been proposed that do not depend on unobserved effects; see, e.g., Manski (1987), Honore (1992), Honore and Kyriazidou (2000), and several examples discussed in Wooldridge (2002). In general, however, one must either estimate the unobserved parameter(s) for each individual or rely on the ‘random effects assumptions’ described above.

The approach in this paper is related to the recent large-N and T literature for panel data models, for example, Lancaster (2002), Hahn and Kuersteiner (2002, 2004), Arellano (2003), Hahn and Newey (2004), or Woutersen (2005), among many others.³ These papers consider fixed effects estimators with N and T going to infinity jointly, and propose a correction that removes the 1/T bias resulting from incidental parameters. The resulting bias corrected estimators have been shown in simulations to offer substantial MSE improvements over uncorrected fixed effects estimators. Arellano and Hahn (2005) provide an excellent survey of this literature.

³ See also Fernandez-Val (2005), Carro (2006), and Bester and Hansen (2007).

Perhaps the best known random effects assumption is independence between observables and unobservables; see Hausman and Taylor (1981) for a very general discussion of identification and estimation in the linear model and Wooldridge (2002) for a discussion of this type of restriction in many commonly employed models. Honore and Lewbel (2002) and Lewbel (2005) employ a weaker version of this restriction, where a single regressor is assumed independent of unobservables. Nonparametric random effects estimators are proposed by Lin and Carroll (2000) and Ullah and Roy (1998), with general properties of such estimators established in Henderson and Ullah (2005). These approaches rely on independence between unobservables and observed covariates and on some degree of additive separability, neither of which is assumed in this paper.

Another common restriction is that unobservables depend upon observables only through a linear index, as in Mundlak (1978), Chamberlain (1980), and Wooldridge (2005). These and many other papers also specify the distribution of unobservables up to a finite dimensional parameter, which may then be estimated jointly with the common structural parameters, e.g., by maximum likelihood. Models where the distribution of unobservables is parametric are an important special case of hierarchical models, discussed by Lindley and Smith (1972) and Raudenbush and Bryk (2002), which have a long tradition in Bayesian statistics. Several more recent papers, including Chen and Khan (2007), Gayle and Viauroux (2007), and Bester and Hansen (2008), exploit index restrictions to obtain identification of common parameters and marginal effects of covariates with panel data in very general semi- and nonparametric settings.

Surprisingly few papers have studied the behavior of estimators in panel data models when assumptions about the distribution of unobserved heterogeneity are violated. Baltagi (1992) considers misspecification of error components models, and shows that omitting a component may lead to inconsistency. Matyas and Blanchard (1998) conduct a simulation study to assess the impact of misspecification on commonly used estimators in linear panel data models. In a recent paper, Arellano and Bonhomme (2009) consider the impact of misspecification in parametric random effects models. When the distribution of unobservables does not depend upon observables, they show that the bias in common parameters depends on the Kullback-Leibler distance between the true distribution of unobservables and its best approximation in the class of models considered.⁴

A recent alternative and promising approach, pursued in Honore and Tamer (2006), Chernozhukov, Hong, and Tamer (2004), and Chernozhukov, Fernandez-Val, Hahn, and Newey (2009), is to take T as fixed, but impose no restrictions on the distribution of unobservables. In general, in this setting, common parameters and marginal effects of covariates are not point-identified but may be restricted to a potentially informative set in the parameter space. This approach is extremely interesting, but at present appears to pose a challenging computational problem in all but quite simple models.

⁴ Their paper conjectures the result would remain true when the distribution of unobservables depends on x. However, given that their main results are based on an asymptotic sequence where N and T → ∞ jointly, this would require approximation of a conditional density where the conditioning argument is increasing in dimension with the sample size.

Our approach is related to the panel structure model and estimator proposed by Sun (2005), who also studies a panel data model based on grouping individuals. Sun (2005) assumes the model is linear with Gaussian errors and that individuals are perfectly classified within a finite number of groups. He then treats group membership as unobserved to the researcher. Our approach applies to general nonlinear models and is based on a sequence of observable grouping structures that classify individuals at coarser and finer levels. Importantly, none of these grouping structures is assumed to perfectly classify individuals: for any finite N and T, individual specific effects will differ within groups. Whether our results can be extended to a setting where group structure is estimated by the researcher is a question we leave for future work.

The remainder of the paper is organized as follows. Section 2 describes our modeling framework and estimators and provides general examples. Section 3 presents asymptotic theory, and Section 4 presents a brief Monte Carlo study. Section 5 provides empirical examples, and Section 6 concludes.

2. Model and Examples

Denote observed data as $w_{it}$, where $i = 1, \ldots, N$ indexes individuals and $t = 1, \ldots, T_i$ indexes time. We consider a panel data model defined by an objective function

$$Q_{NT}(\theta, \alpha_1, \ldots, \alpha_N) = \frac{1}{\sum_i T_i} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \varphi(w_{it}, \theta, \alpha_i);$$

i.e., the model is known up to a finite dimensional common parameter, $\theta$, and a set of individual specific parameters $\alpha_i$. For simplicity, in the body of the paper we focus on the case $\varphi = \log f$, where $f$ is a density function with respect to some measure and $f(w, \theta_0, \alpha_{i0})$ is the p.d.f. of $w_{it}$, $\alpha_i$ is a scalar, and the panel is balanced, $T_i \equiv T$, so $\sum_i T_i = NT$. All assumptions and theorems are restated and proofs given for general M-estimators in the unbalanced case in the appendix. The fixed effects (hereafter FE) maximum likelihood estimator is then defined as

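$$\left(\hat\theta_{FE}, \{\hat\alpha_i\}\right) = \arg\max_{\theta, \{\alpha_i\}_{i=1}^{N}} \sum_{i=1}^{N} Q_{iT}(\theta, \alpha_i), \quad \text{where} \quad Q_{iT}(\theta, \alpha) = \frac{1}{T} \sum_{t=1}^{T} \log f(w_{it}, \theta, \alpha).$$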

We also suppose the researcher has available a sequence of grouping schemes, which we represent for a given N and T by a collection of index sets,

$$I^{NT}_g = \{\, i : \text{individual } i \text{ belongs to group } g \,\}, \quad g = 1, \ldots, G_{NT}. \tag{2.1}$$

As we suggest below in examples, these groups may be based on $w_{it}$, or on other observable information such as classification of firms based on industry groupings or households based on geographic locations. For a given N, T, we have $G_{NT}$ groups consisting of $N_g$ individuals each. The indexing by NT is due to groups potentially changing as the panel grows along either or both dimensions; for readability, we will drop this indexing for the remainder of the paper. Again for simplicity, in the body of the paper we suppose that groups are equal in size, so that in the balanced panel case we have $N_g = N/G_{NT}$.⁵ As an alternative to the FE estimator, we suppose the researcher considers the group effects (hereafter GE) estimator,

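$$\left(\hat\theta_{G}, \{\hat\gamma_g\}\right) = \arg\max_{\theta, \{\gamma_g\}_{g=1}^{G_{NT}}} \sum_{g=1}^{G_{NT}} Q_{gT}(\theta, \gamma_g), \quad \text{where} \quad Q_{gT}(\theta, \gamma) = \frac{1}{N_g} \sum_{i \in I_g} \sum_{t=1}^{T} \log f(w_{it}, \theta, \gamma).$$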

Note that the GE estimator solves an optimization problem with the same objective function as the FE estimator, subject to the linear constraints that $\alpha_i = \gamma_g$ for all $i \in I_g$ and all $1 \le g \le G_{NT}$. It is obvious, but important to note, that the grouped effects estimator nests the fixed effects estimator when $G_{NT} = N$ and $N_g = 1$ for all g, and the pooled estimator when $N_g = N$ and $G_{NT} = 1$.
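The nesting property can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a linear model with additive effects (where group intercepts can be concentrated out by within-group demeaning), whereas the paper's interest is in nonlinear likelihood models. The function name `grouped_effects_ols` and the simulated design are hypothetical and not taken from the paper.

```python
import numpy as np

def grouped_effects_ols(y, x, groups):
    """Slope from least squares with one intercept per group.

    y, x   : 1-D arrays of stacked observations (length N*T)
    groups : 1-D integer array, group label of each observation
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    _, idx = np.unique(groups, return_inverse=True)
    counts = np.bincount(idx)
    # Demeaning within groups is equivalent to including group dummies.
    y_dm = y - (np.bincount(idx, weights=y) / counts)[idx]
    x_dm = x - (np.bincount(idx, weights=x) / counts)[idx]
    return float(x_dm @ y_dm / (x_dm @ x_dm))

rng = np.random.default_rng(0)
N, T = 40, 5
ind = np.repeat(np.arange(N), T)            # individual label per observation
alpha = rng.normal(size=N)                  # individual effects (simulated)
x = alpha[ind] + rng.normal(size=N * T)     # regressor correlated with effects
y = 1.0 * x + alpha[ind] + rng.normal(size=N * T)

theta_fe = grouped_effects_ols(y, x, ind)                             # G = N, N_g = 1
theta_pooled = grouped_effects_ols(y, x, np.zeros(N * T, dtype=int))  # G = 1, N_g = N
theta_ge = grouped_effects_ols(y, x, ind // 4)                        # G = 10, N_g = 4
print(theta_fe, theta_ge, theta_pooled)
```

Passing individual labels reproduces the dummy-variable (FE) regression, and passing a single label reproduces pooled least squares with an intercept; any coarser labelling gives an intermediate GE estimator.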

2.1. Two Sources of Bias

To understand the large sample behavior of the FE and GE estimators, it is useful to concentrate individual- and group-level effects out of the problem. To this end, we define

$$\hat\alpha_{iT}(\theta) = \arg\max_{\alpha} Q_{iT}(\theta, \alpha) \quad \text{and} \quad \hat\theta_{FE} = \arg\max_{\theta} \sum_{i=1}^{N} Q_{iT}(\theta, \hat\alpha_{iT}(\theta)).$$

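Note that, for a given finite T, due to sampling error we will in general have $\hat\alpha_{iT}(\theta_0) \ne \alpha_{i0}$. Therefore, with T fixed and $N \to \infty$, we will have

$$\hat\theta_{FE} \xrightarrow{p} \theta_T \quad \text{where} \quad \theta_T \equiv \arg\max_{\theta} \lim_{N \to \infty} \sum_{i=1}^{N} \sum_{t=1}^{T} E\left[\log f(w_{it}, \theta, \hat\alpha_{iT}(\theta))\right]$$

and in general $\theta_T \ne \theta_0$. This is the source of the incidental parameters problem noted by Neyman and Scott (1948).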

⁵ Like the balanced panel assumption, we drop the assumption of equally sized groups in the appendix.

For fixed T, we may view $\hat\theta_{FE}$ as the solution to a misspecified problem, in the sense that if one replaces $\hat\alpha_{iT}(\theta)$ with

$$\bar\alpha_{iT}(\theta) = \arg\max_{\alpha} E\left[Q_{iT}(\theta, \alpha)\right],$$

one would have $\theta_T = \theta_0$. That is, noise in estimation of individual specific parameters causes estimates of common parameters to be inconsistent. Intuitively, one gets bias terms of order 1/T since each individual specific effect is estimated using T observations. Because the problem is nonlinear, these bias terms enter the probability limit of $\hat\theta_{FE}$, leading to inconsistency when T is fixed.
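The order 1/T bias can be seen in a small simulation. The sketch below uses the classic Neyman and Scott (1948) design as an assumed example (it is not worked out in the text): $w_{it} \sim N(\alpha_i, \sigma^2)$ with common parameter $\theta = \sigma^2$. The FE maximum likelihood estimate of $\sigma^2$ replaces each $\alpha_i$ with the individual sample mean, and its probability limit for fixed T is $\sigma^2 (T-1)/T$. The function name and the simulation settings are hypothetical.

```python
import numpy as np

def fe_mle_variance(w):
    """FE maximum likelihood estimate of the common variance.

    w : array of shape (N, T); row i has its own mean alpha_i,
        which is estimated by the row average.
    """
    resid = w - w.mean(axis=1, keepdims=True)
    return float(np.mean(resid ** 2))

rng = np.random.default_rng(1)
sigma2 = 1.0
N = 20_000
alpha = rng.normal(size=(N, 1))       # individual effects (simulated)

estimates = {}
for T in (2, 5, 20):
    w = alpha + np.sqrt(sigma2) * rng.normal(size=(N, T))
    estimates[T] = fe_mle_variance(w)
    # Compare the estimate with the fixed-T probability limit sigma2*(T-1)/T.
    print(T, estimates[T], sigma2 * (T - 1) / T)
```

Even with a very large N, the estimate stays close to $\sigma^2 (T-1)/T$ rather than $\sigma^2$, and the gap shrinks at rate 1/T as T grows.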

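The same heuristic argument may be applied to the grouped effects estimator. Define

$$\hat\gamma_{gT}(\theta) = \arg\max_{\gamma} Q_{gT}(\theta, \gamma) \quad \text{and} \quad \hat\theta_{G} = \arg\max_{\theta} \sum_{g=1}^{G_{NT}} Q_{gT}(\theta, \hat\gamma_{gT}(\theta)),$$

where, as above, $N_g$ is the number of individuals in group g.⁶ Like the fixed effects estimator, one obtains $\hat\theta_G$ as the solution to a misspecified problem. With T fixed and $N \to \infty$, we have

$$\hat\theta_{G} \xrightarrow{p} \theta_T \quad \text{where} \quad \theta_T = \arg\max_{\theta} \lim_{N \to \infty} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T} E\left[\log f(w_{it}, \theta, \hat\gamma_{gT}(\theta))\right],$$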

and again in general $\theta_T \ne \theta_0$. We show below that, via an expansion of $\hat\gamma_{gT}(\theta)$ around $\bar\gamma_{gT}(\theta) = \arg\max_{\gamma} E\left[Q_{gT}(\theta, \gamma)\right]$, the grouped effects estimator also suffers from ‘incidental parameters bias’ that is of order $1/(N_g T) = G_{NT}/(NT)$, as each group level effect is being estimated with $N_g T$ observations. Depending on the behavior of $N_g$ as N and T increase, it is clear that the incidental parameters bias in $\hat\theta_G$ is potentially of (much) smaller order than that in $\hat\theta_{FE}$. Here, however, there is an additional source of bias: replacing $\hat\gamma_{gT}(\theta)$ with $\bar\gamma_{gT}(\theta)$ does not give $\theta_0$. This happens because in general individual effects will differ within groups; i.e., we will not have $\bar\gamma_{gT}(\theta) = \bar\alpha_{iT}(\theta)$. We consider the discrepancy between individual effects within groups under the sup norm,

$$\xi_{NT} = \sup_{g} \sup_{(i,j) \in I_g : i \ne j} \left|\alpha_{i0} - \alpha_{j0}\right|,$$

and show in Section 3 that the second source of bias is closely related to $\xi_{NT}$.
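In a simulation, where the true effects are known, the sup-norm discrepancy can be computed directly. The helper below is a minimal sketch; its name and the toy numbers are hypothetical, and the quantity is infeasible in applications because the $\alpha_{i0}$ are unobserved.

```python
import numpy as np

def max_within_group_discrepancy(alpha0, groups):
    """Largest |alpha_i0 - alpha_j0| over pairs i, j in the same group."""
    alpha0 = np.asarray(alpha0, dtype=float)
    groups = np.asarray(groups)
    gaps = []
    for g in np.unique(groups):
        a = alpha0[groups == g]
        # Within a group, the largest pairwise gap is max minus min.
        gaps.append(a.max() - a.min())
    return float(max(gaps))

alpha0 = np.array([0.1, 0.3, 1.0, 1.4, 2.0, 2.1])
xi_grouped = max_within_group_discrepancy(alpha0, [0, 0, 1, 1, 2, 2])
xi_fe = max_within_group_discrepancy(alpha0, np.arange(6))      # N_g = 1
xi_pooled = max_within_group_discrepancy(alpha0, np.zeros(6))   # G = 1
print(xi_grouped, xi_fe, xi_pooled)
```

With one individual per group the discrepancy is zero by construction, and with a single group it equals the full range of the effects; informative groupings fall in between.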

⁶ In this heuristic argument, we ignore the fact that $I_g$ and hence the definition of $\bar\gamma_{gT}$ changes with N, as well as the fact that $\hat\gamma_{gT}(\theta) \xrightarrow{p} \bar\gamma_{gT}(\theta) = \arg\max_{\gamma} E\left[Q_{gT}(\theta, \gamma)\right]$ with T fixed and $N \to \infty$ for many potential grouping structures. We do note that this provides the possibility of $N \to \infty$, T fixed inference in the grouped effects setup when groups are such that $\bar\gamma_{gT}(\theta) = \bar\alpha_{iT}(\theta)$ for all i and g.

2.2. Examples of Restrictions on Unobservables

As part of our assumptions in Section 3, we will need to place restrictions on the behavior of individual effects $\alpha_{i0}$ within groups. Here we present two characterizations of data generating processes that are compatible with our assumptions. Most importantly, note that the restrictions placed on the data generating process are quite different from those of most ‘random effects’ estimators. In particular, both examples will allow for fairly general dependence between observables and unobservables, and neither assumes a parametric form for the distribution of $\alpha_i$. Though they are equivalent, we believe it intuitively helpful to discuss the two examples below separately. In both examples, $g = 1, 2, \ldots, G_{NT}$ will index groups in a given grouping scheme, while $m = 1, 2, \ldots$ will index ‘levels’ at which the data can be grouped.

Example 1. Suppose groups can be represented by a sequence of matrices, $D_{NT}$, where for a given N and T, $D_{NT} \in \mathbb{R}^{N \times G_{NT}}$ with typical element $[D_{NT}]_{i,g} = 1(i \in I_g)$. These group membership matrices may be generated, for example, by grouping individuals according to quantiles, quintiles, deciles, etc. of a certain observable or set of observables, or by grouping according to other observable information such as SIC codes to a given number of digits. Consider the infeasible regression of individual level effects on group indicators,

$$(\alpha_{10}, \ldots, \alpha_{N0})' = D_{NT}\beta + \nu_{NT}, \tag{2.2}$$

and let $R^2_{NT}$ be the R-squared of this regression. For $\xi_{NT} \to 0$, we must have that the error sum of squares in this regression goes to zero as $G_{NT}$ increases, or equivalently that $R^2_{NT} \to 1$. It turns out that a key ingredient in understanding the bias in $\hat\theta_G$ will be the rate at which this occurs.

For comparison, it is useful to consider a very simple benchmark where $\alpha_i \sim$ i.i.d. $N(0, 1)$ and the sequence of (in this case non-nested) groupings is formed for each N, T by assigning each individual i at random to one of $G_{NT}$ equally sized groups. Though in this case groupings contain no information about individual-specific effects, it is an easy exercise to show that $R^2_{NT} = O(G_{NT}/N)$, so that the R-squared of the regression above goes to one linearly with the number of groups. That is, with uninformative groups, we must have $G_{NT}/N \to 1$, essentially implying that one must run fixed effects to keep this source of bias small. As we show below, in order for $\hat\theta_G$ to be asymptotically unbiased, we will need that groupings contain information about $\alpha_i$.
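The benchmark can be reproduced in a few lines. The sketch below computes the R-squared of the infeasible regression (2.2) for random, uninformative groups and, for contrast, for groups formed from the ordering of the true effects. The second grouping is an assumption made purely for illustration (it sorts on the unobserved $\alpha_i$ itself, which no researcher could do); the function name and the values of N and G are hypothetical.

```python
import numpy as np

def r_squared_on_groups(alpha0, groups):
    """R-squared from regressing alpha0 on a full set of group indicators."""
    alpha0 = np.asarray(alpha0, dtype=float)
    _, idx = np.unique(groups, return_inverse=True)
    group_means = np.bincount(idx, weights=alpha0) / np.bincount(idx)
    ess = np.sum((alpha0 - group_means[idx]) ** 2)   # error sum of squares
    tss = np.sum((alpha0 - alpha0.mean()) ** 2)      # total sum of squares
    return float(1.0 - ess / tss)

rng = np.random.default_rng(2)
N, G = 10_000, 100
alpha0 = rng.normal(size=N)

# Uninformative grouping: equally sized groups assigned at random.
random_groups = rng.permutation(N) % G

# Informative grouping (infeasible): equally sized groups by rank of alpha0.
sorted_groups = np.empty(N, dtype=int)
sorted_groups[np.argsort(alpha0)] = np.arange(N) // (N // G)

r2_random = r_squared_on_groups(alpha0, random_groups)
r2_sorted = r_squared_on_groups(alpha0, sorted_groups)
print(r2_random, G / N, r2_sorted)
```

With random assignment the R-squared is on the order of G/N, so it approaches one only as the number of groups approaches N; with informative groups it is close to one for a number of groups far smaller than N.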

Example 2. Suppose unobservables have an error components structure,

$$\alpha_{i0} = \sum_{m=1}^{\infty} \lambda_m \eta_{g_m(i)}, \tag{2.3}$$

where $g_m(i)$ is a sequence of group assignment functions mapping $\{1, 2, \ldots, N\} \mapsto \{1, \ldots, G_{NT}\}$, with $g_m(i) = g_m(j) \Leftrightarrow i, j \in I^m_g$ for some g. Each $\eta_k$ is assumed i.i.d. with unit variance, and for simplicity is assumed to have compact support. The number of groups is determined by $M_{NT}$, the level at which the researcher chooses to hold individual specific heterogeneity constant (which is a function of the panel size), with $m = 0$ corresponding to pooling over the entire sample ($G_{NT} = 1$). For example, suppose that every time m is increased by one, new groups are formed by dividing each current group into k equal sized blocks of observations. Ignoring integer problems, this gives $G_{NT} = \min\{k^{M_{NT}}, N\}$. In this setting, the behavior of $\xi_{NT}$ may be characterized by the behavior of $\lambda_m$ as $m \to \infty$, since with compactly supported $\eta$ we have

$$(i, j) \in I^m_g \;\Rightarrow\; \left|\alpha_{i0} - \alpha_{j0}\right| \le \Delta \sum_{m = M_{NT}+1}^{\infty} \lambda_m.$$

Note that $\lambda_m$ may be thought of as the standard deviation of the error component at group level m. At a minimum, $\lambda_m$ will need to be summable in order for $\xi_{NT} \xrightarrow{p} 0$, which requires that the variance of group-level errors goes to zero as one moves to finer grouping schemes.
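The nested construction and the bound on within-group discrepancies can be sketched as follows. Several ingredients are assumptions made for illustration only: the infinite sum in (2.3) is truncated at a finite number of levels, $\eta$ is drawn uniform on $[-\sqrt{3}, \sqrt{3}]$ (unit variance, compact support), $\Delta$ is taken to be the diameter of that support, and $\lambda_m = 0.5^m$. All function names are hypothetical.

```python
import numpy as np

def simulate_nested_effects(k, levels, lam, rng):
    """Draw alpha_i0 = sum_{m=1}^{levels} lam[m-1] * eta_{g_m(i)}, N = k**levels.

    Level-m groups are the k**m consecutive blocks of size k**(levels - m),
    so each level splits every group of the previous level into k blocks.
    """
    N = k ** levels
    i = np.arange(N)
    alpha0 = np.zeros(N)
    labels = {0: np.zeros(N, dtype=int)}          # m = 0: pooling, G = 1
    for m in range(1, levels + 1):
        g_m = i // k ** (levels - m)              # group assignment at level m
        eta = rng.uniform(-np.sqrt(3), np.sqrt(3), size=k ** m)
        alpha0 += lam[m - 1] * eta[g_m]
        labels[m] = g_m
    return alpha0, labels

def xi(alpha0, g):
    """Largest within-group range of alpha0 for integer labels g."""
    hi = np.full(g.max() + 1, -np.inf)
    lo = np.full(g.max() + 1, np.inf)
    np.maximum.at(hi, g, alpha0)
    np.minimum.at(lo, g, alpha0)
    return float(np.max(hi - lo))

rng = np.random.default_rng(3)
k, levels = 2, 12
lam = 0.5 ** np.arange(1, levels + 1)             # assumed summable decay
alpha0, labels = simulate_nested_effects(k, levels, lam, rng)
delta = 2 * np.sqrt(3)                            # diameter of the support of eta

results = []
for M in range(levels + 1):
    bound = delta * lam[M:].sum()                 # Delta * sum over m > M of lam_m
    results.append((M, xi(alpha0, labels[M]), bound))
    print(results[-1])
```

Individuals in the same level-M group share the components at levels 1 through M, so their effects differ only through the finer components; this is why the discrepancy is controlled by the tail sum of $\lambda_m$ and shrinks as M grows.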

Note that both examples allow for dependence between observables and unobservables. For example, group-level averages of covariates may differ substantially, such as mean levels of income across counties or states. In addition, groups may be formed by sorting individuals based on some covariates, such as sorting of firms into quintiles based on size and book-to-market. Furthermore, neither assumes ‘perfect classification’ of individuals, in the sense that unobservables are allowed to differ within groups for any finite $G_{NT}$. As will be shown below, the relative magnitude of biases due to incidental parameters and misspecification of individual level heterogeneity is determined by the rate at which information about unobservables accumulates as groups are added.

3. Asymptotic Theory

The main theorems in our paper establish consistency and asymptotic normality of $\hat\theta_G$ under assumptions about the sampling environment and the behavior of unobservables within the observable group structure. As in the previous section, assumptions are stated for the balanced panel case where groups are equally sized and $\alpha_i$ are scalar. All proofs are given in the appendix.⁷

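Throughout this section we define $E_{it} W_{it} = \int W_{it}\, dF_{it}$ as the expectation with respect to the marginal distribution $F_{it}$ associated with individual i at time t, $E_i W = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E_{it} W_{it}$, and $E W = \lim_{N,T \to \infty} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} E_{it} W_{it}$. For the vector $x = (x_1, \ldots, x_m) \in \mathbb{R}^m$, let $\|x\|_1 = \sum_{j=1}^{m} |x_j|$. Let $\varphi_{it}(\theta, \alpha) \equiv \varphi(w_{it}, \theta, \alpha)$ where $\theta \in \Theta \subset \mathbb{R}^k$ and $\alpha \in A \subset \mathbb{R}$. We write $\alpha_0(i)$ to denote a sequence $\{\alpha_{i0}\}_{i \in \mathbb{N}}$ whose elements are in $A$, and let $\ell^\infty(A)$ be the space of such sequences equipped with the norm $\|\alpha_0\| = \sup_{i \in \mathbb{N}} |\alpha_0(i)|$. Finally, define a norm over $\mathbb{R}^m \times \ell^\infty(A)$ as $\|x, \alpha_0\|_{1,\infty} = \|x\|_1 + \sup_{i \in \mathbb{N}} |\alpha_0(i)|$. Denote by $B_\varepsilon(x)$ the ball of radius $\varepsilon$ around $x \in \mathbb{R}^{k+1}$ in the norm $\|\cdot\|_1$, and by $B_\varepsilon(x, \alpha_0(\cdot))$ the ball of radius $\varepsilon$ around $(x, \alpha_0(\cdot)) \in \mathbb{R}^k \times \ell^\infty(A)$ in the norm $\|\cdot\|_{1,\infty}$. For $u \in \mathbb{R}^{k+1}$, define the differentiation operator

$$D^u \varphi(\theta, \alpha) = \frac{\partial^{\|u\|_1} \varphi}{\partial\theta_1^{u_1} \cdots \partial\theta_k^{u_k}\, \partial\alpha^{u_{k+1}}}.$$

⁷ In the appendix, assumptions are restated to apply to unbalanced panels, unequally sized groups, and $\dim(\alpha_i) \ge 1$, and proofs are given under these more general assumptions.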

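Assumption C. Assume that $\Theta \times A \subset \mathbb{R}^{k+1}$ is compact in $\|\cdot\|_1$. As $N \to \infty$ and $T \to \infty$ jointly, which we denote by $N, T \to \infty$, the following hold:

(i) $\{w_{it}, \alpha_0(i)\}$ are independent across i. For each i, $\{w_{it}\}$ is strong mixing with mixing coefficients $a_i(m)$, and $\exists\, \tau \in 2\mathbb{N}$ and $r > \tau$ such that $\sup_i |a_i(m)| \le C m^{\frac{(1-\tau) r}{r - \tau} - \varepsilon}$, where $0 < C < \infty$ and $\varepsilon > 0$.

(ii) $\exists\, M(w_{it})$ with $\sup_{i,t} E_{it} M(w_{it})^{\tau + \varepsilon} \le \Delta < \infty$ such that, $\forall\, (\theta, \alpha), (\tilde\theta, \tilde\alpha) \in \Theta \times A$ and $\|u\|_1 \le 3$, $|D^u \varphi_{it}(\theta, \alpha) - D^u \varphi_{it}(\tilde\theta, \tilde\alpha)| \le M(w_{it}) \|(\theta, \alpha) - (\tilde\theta, \tilde\alpha)\|_1$ and $\sup_{\Theta \times A} |D^u \varphi_{it}(\theta, \alpha)| < M(w_{it})$.

(iii) For each i and $\varepsilon_1 > 0$, and for each $\varepsilon_2 > 0$,

$$\lim_{T \to \infty} E_i \varphi_{it}(\theta_0, \alpha_{i0}) - \sup_{(\theta, \alpha) \notin B_{\varepsilon_1}(\theta_0, \alpha_{i0})} \lim_{T \to \infty} E_i \varphi_{it}(\theta, \alpha) > 0,$$

$$\lim_{N,T \to \infty} \sum_{i,t} E_{it} \varphi_{it}(\theta_0, \alpha_{i0}) - \sup_{(\theta, \alpha) \notin B_{\varepsilon_2}(\theta_0, \alpha_0(\cdot))} \lim_{N,T \to \infty} \sum_{i,t} E_{it} \varphi_{it}(\theta, \alpha) > 0.$$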

Assumption C consists of standard conditions used to verify consistency of M-estimators and is sufficient to establish consistency of the FE estimator. Parts (i) and (ii) are mixing and moment conditions on the data, respectively, while part (iii) assumes unique maximization of the population objective function as $T \to \infty$ for each i and as $N, T \to \infty$. Our next assumption is about the sequence of grouping schemes that define the GE estimator. Though used to verify consistency, we state this assumption separately for discussion purposes.

Assumption G’. There exists a sequence of partitions of the data into groups such that for each N, T, we have $G_{NT}$ equally sized groups defined by index sets of the form (2.1). The number of individuals per group is $N_g = \sum_i 1(i \in I_g)$, and we define $\bar N_g = \frac{1}{G_{NT}} \sum_g N_g \equiv N/G_{NT}$. We define $\xi_{NT} = \sup_g \sup_{i,j \in I_g} |\alpha_0(i) - \alpha_0(j)|$. As $N, T \to \infty$,

(i) $\left(\frac{N_g - 1}{N_g}\right) \xi_{NT} \to 0$

Assumption G’ is a high level condition which asserts that either the maximum discrepancy between individuals within groups goes to zero as the sample size increases, or that the data are eventually grouped at the individual level.⁸ We show below how the former condition can be verified in the two examples discussed in Section 2.2. We now state our basic consistency result for $\hat\theta_G$.

Proposition 1. Under Assumptions C and G’, $\left(\hat\theta_G, \{\hat\gamma_g\}\right) \xrightarrow{p} \left(\theta_0, \{\alpha_{i0}\}\right)$, where convergence is in the norm $\|\cdot\|_{1,\infty}$.

Proposition 1 is proven in the appendix under general conditions including unbalanced panels and vector $\alpha_i$. Note that under Assumptions C and G’, we can consistently estimate both common parameters and individual specific parameters for each i with the grouped effects estimator. This result follows from the fact that the fixed effects estimator is consistent as $T \to \infty$ regardless of N for sensible models such that Assumption C(iii) is satisfied and that Assumption G’ implies that one is eventually running fixed effects or unobserved effects are “close enough” within groups to be well-estimated by group effects. Note that in a fixed-T environment, the fixed effects and grouped effects estimators would generally have different probability limits and the grouped effects estimator could remain consistent with sufficient “smoothness” in the individual effects within the grouping structure.

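⁸ Recall that with $N_g = 1$ for all g, the researcher is running fixed effects.

We now state additional assumptions that allow us to establish asymptotic normality of $\hat\theta_G$. Before proceeding, it is useful to define

$$H^{\alpha\alpha}_g\left(\theta, \{\alpha_i\}_{i \in I_g}\right) = \frac{1}{N_g T} \sum_{i \in I_g} \sum_{t=1}^{T} \left(\frac{\partial^2 \varphi_{it}(\theta, \alpha_i)}{\partial\alpha^2}\right),$$

$$H^{\theta\theta}\left(\theta, \{\alpha_i\}_{i=1}^{N}\right) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left(\frac{\partial^2 \varphi_{it}(\theta, \alpha_i)}{\partial\theta\, \partial\theta'}\right),$$

and write $H^{\theta\theta}_{g0} = H^{\theta\theta}_g(\theta_0, \{\alpha_{i0}\})$, and similarly for $H^{\alpha\alpha}_{g0}$. Quantities such as $H^{\alpha\theta}_g$ are defined analogously, with superscripts denoting differentiation of $\varphi$ and subscript g denoting averaging only at the group level. Also define the Hessian for the common parameter $\theta$ in the problem with $\alpha_i$ concentrated out,

$$J_{NT} = \frac{1}{G_{NT}} \sum_{g=1}^{G_{NT}} \left[H^{\theta\theta}_{g0} - H^{\theta\alpha}_{g0} \left(H^{\alpha\alpha}_{g0}\right)^{-1} H^{\alpha\theta}_{g0}\right].$$

Finally define $S^*_{it} = u^\theta_{it} - H^{\theta\alpha}_{g0} \left(H^{\alpha\alpha}_{g0}\right)^{-1} u^\alpha_{it}$, where $u^\theta_{it} = \frac{\partial \varphi_{it}(\theta_0, \alpha_{i0})}{\partial\theta} - E_{it} \frac{\partial \varphi_{it}(\theta_0, \alpha_{i0})}{\partial\theta}$, i.e., the difference between the score for $\theta$ and its expectation, and $u^\alpha_{it}$ is the same quantity for $\alpha$.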

Assumption G. Assumption G’ holds, with G’(i) replaced by (i) below, and the additional conditions listed below.

(i) $\sqrt{NT} \left(\frac{N_g - 1}{N_g}\right) \xi^2_{NT} \to 0$

(ii) $G_{NT}/\sqrt{NT} \to 0$

(iii) $|dF_{it}(w) - dF_{jt}(w)| < C(w)\,|\alpha_0(i) - \alpha_0(j)|$ for some $C$ with $\sup_{i,t} C(w)/dF_{it}(w) \le M(w)$.

Establishing asymptotic normality requires strengthening Assumption G’ in two important ways, corresponding to the two sources of bias discussed in Section 2.1. First, we require either the square of the discrepancy between $\alpha_0(i)$ within groups in the sup norm or the ratio $(N_g - 1)/N_g$ to decrease to zero faster than $\sqrt{NT}$ grows. This strengthening of Assumption G’ is required so that the ‘pooling bias’ caused by individual effects differing within groups does not enter the asymptotic distribution of $\hat\theta_G$. The $(N_g - 1)/N_g$ term in G(i) means that in the case where groups are not sufficiently informative, the researcher may still be able to satisfy Assumption G(i) by running fixed effects. Second, Assumption G(ii) stipulates that the number of groups grows more slowly than $\sqrt{NT}$, which ensures that bias due to incidental parameters does not enter the asymptotic distribution of the GE estimator. Recalling that the FE estimator sets $N_g = 1$ and $G_{NT} = N$, we see that fixed effects estimators satisfy G(i)-(ii) when $N/\sqrt{NT} \to 0$, or in other words T grows faster than N. This is a well known necessary condition (cf. Hahn and Kuersteiner (2002) for the dynamic linear model and Hahn and Newey (2004) for the general nonlinear case) for the asymptotic distribution of $\hat\theta_{FE}$ to be correctly centered. Assumption G(iii) is a technical condition on smoothness of individual specific marginal distributions in the parameter $\alpha_i$.⁹

When Ng > 1, there is a natural tradeoff between Assumptions G(i) and G(ii). As we shall see in examples below, they provide lower and upper bounds, respectively, on the growth rate of $G_{NT}$. This tradeoff mirrors the tradeoff between bias due to pooling and bias due to incidental parameters. It will be the focus of simulation experiments and empirical examples in Sections 4 and 5, where we argue that understanding this tradeoff is critical to understanding the finite sample properties of group effects estimators.

9 Note that, in the likelihood case, Assumption C(ii) requires Lipschitz continuity of the log likelihood and its derivatives, while G(iii) requires Lipschitz continuity of the likelihood in α. For convenience, we assume that in the case where A is continuous, $F_{it}$ has a density so that $|dF_{it} - dF_{jt}|$ may be interpreted in the obvious way.

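Assumption N. As N → ∞ and T → ∞ jointly, the following hold.

(i) $\frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{T} E_{it}\left[ \frac{\partial \phi_{it}(\theta_0, \alpha_{i0})}{\partial \theta} \right] \to 0$ and $\sup_g \left| \frac{1}{\sqrt{N_g T}} \sum_{i \in I_g} \sum_{t=1}^{T} E_{it}\left[ \frac{\partial \phi_{it}(\theta_0, \alpha_{i0})}{\partial \alpha} \right] \right| \to 0$.

(ii) Let $\Omega_{iT} = \mathrm{Var}\left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} S^{*}_{it} \right)$, and let $\lambda_{iT}$ be the minimum eigenvalue of $\Omega_{iT}$. Assume $\inf_i \inf_T \lambda_{iT} > 0$ and that $\Omega = \lim_{N,T \to \infty} \frac{1}{N} \sum_{i=1}^{N} \Omega_{iT}$ exists.

(iii) $\lim_{N,T \to \infty} H^{\theta\theta}(\theta, \{\alpha_0(i)\})$ exists for all $(\theta, \{\alpha_0(i)\}) \in \Theta \times \ell^{\infty}(A)$. $\lim_{N,T \to \infty} H^{\alpha\alpha}_{g0}$ and $\lim_{N,T \to \infty} H^{\theta\alpha}_{g0}$ exist for all g.

(iv) Letting $\lambda_g$ be the minimum eigenvalue of $H^{\alpha\alpha}_{g0}$, we have $\inf_g \lambda_g \ge \delta > 0$, where δ does not depend on N, T.

(v) $J = \lim_{N,T \to \infty} J_{NT}$ exists and has minimum eigenvalue $\lambda_J \ge \delta > 0$.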

Assumption N consists of standard conditions used to verify asymptotic normality of M-estimators. With the exception of allowing Ng ≥ 1 in N(i) and the g subscripts in N(iii) and N(iv), these conditions are essentially identical to those required to establish asymptotic normality of the FE estimator. Notice that Assumptions C, G, and N allow for very general dependence in the time series direction.

Proposition 2. Under Assumptions C, G, and N, $\sqrt{NT} \, (\theta_G - \theta_0) \xrightarrow{d} N(0, J^{-1} \Omega J^{-1})$.

Proposition 2 establishes that θG is asymptotically normal and unbiased, and is proven in the appendix under more general conditions. It is worth comparing the result in Proposition 2 to similar results for the fixed effects estimator as in, for example, Hahn and Kuersteiner (2004). Under an asymptotic sequence where $N/T \to c < \infty$, we have $\sqrt{NT} \, (\theta_{FE} - \theta_0) \xrightarrow{d} N(cB, J^{-1} \Omega J^{-1})$, where cB is bias resulting from incidental parameters. We see that the fixed effects estimator and the grouped effects estimator are both asymptotically normal with the same variance but different centering under this sequence. Specifically, the fixed effects estimator is biased due to incidental parameters while the grouped effects estimator, which exploits “smoothness” in the underlying unobserved effects, is not. Exploiting the assumed smoothness allows the grouped effects estimator to remain correctly centered and asymptotically normal in situations where fixed effects or bias-corrected fixed effects are dominated by bias or may even be divergent when centered around the true parameter value and normalized by the sample size. We provide additional discussion below in the context of our illustrations.

The key assumptions underlying Proposition 2 are Assumptions G(i) and G(ii), which are related to the two sources of bias discussed in Section 2.1. Note that Assumption G(ii) is purely about how quickly the number of groups may increase as observations are added to the sample. Assumption G(i) is implicitly a restriction on the data generating process, as it presumes it is possible to group individuals so that the maximum within group discrepancy in the unobservable effects goes to zero sufficiently quickly as groups are added. We discuss both of these restrictions in the context of the two examples introduced in Section 2.2.

Example 1 (continued). As above, let $R^2_{NT}$ be the R-squared of the (infeasible) regression (2.2) of individual specific effects $\alpha_{i0}$ on the $G_{NT}$ dummy variables indicating group membership. Suppose that, as individuals are added to the panel, the researcher chooses to add groups at rate $G_{NT} = N^{\delta}$.

Suppose the sampling scheme is such that $N = O(T^{\rho})$. In this case, Assumption G(ii), which controls bias due to incidental parameters, requires that $T^{\rho\delta} / T^{(1+\rho)/2} \to 0$, or equivalently $\delta < \frac{1}{2}(1 + \rho^{-1})$. That is, the faster N increases relative to T, the slower groups must be added to avoid incurring asymptotic biases due to noisy estimates of group-level effects.

It is instructive to compare Assumption G(ii) with the conditions required for consistency of the fixed effects estimator. Since $G_{NT} = N$ for the fixed effects estimator, Assumption G(ii) would imply N/T → 0, that is, T increases faster than N (or equivalently ρ < 1), which is a well known condition for asymptotic unbiasedness of θFE. When T and N increase at the same rate (ρ = 1), we have $\sqrt{NT} \, (\theta_{FE} - \theta_0) \xrightarrow{d} N(cB, J^{-1} \Omega J^{-1})$. That is, the asymptotic distribution of the fixed effects estimator is incorrectly centered, with bias cB arising due to the incidental parameters problem where $N/T \to c < \infty$. Hahn and Kuersteiner (2004) propose an estimate of this bias and show that the resulting bias corrected fixed effects estimator is asymptotically unbiased when ρ = 1. Under the additional assumption that $w_{it}$ is i.i.d. for each i, Hahn and Newey (2004) show that a similar bias corrected fixed effects estimator is asymptotically unbiased when ρ < 3.

Returning to θG, assume that as groups are added, $\xi_{NT} = O(G_{NT}^{-\kappa/2})$, so that $1 - R^2_{NT} = O(G_{NT}^{-\kappa})$. Recall that, in an example where groups are uninformative, we have that $R^2_{NT}$ behaves like $G_{NT}/N$. For consistency, Assumption G’(i) requires that either κ > 0 or $N_g \to 1$; that is, either groups contain some information about unobservable effects or the researcher will eventually run fixed effects.10 For θG to be asymptotically normal and unbiased, Assumption G(i) requires that $\frac{1}{2}(1 + \rho) < \kappa\delta\rho$, or equivalently $\kappa\delta > \frac{1}{2}(1 + \rho^{-1})$. We see immediately that, in order for Assumptions G(i) and G(ii) to be compatible, we must have κ > 1. In the case where $\delta = \rho^{-1}$, that is, $G_{NT}$ grows at the same rate as T, the requirement κ > 1 amounts to saying that the error from a projection of unobservables onto group dummies is smaller than the sampling error, $1/\sqrt{T}$, that would arise from estimating each unobservable effect separately.

Consider the case ρ = 1, where θFE is asymptotically biased, but the bias correction proposed by Hahn and Kuersteiner (2004) results in an asymptotically unbiased estimator. In this case, Assumption G(ii) simply requires δ < 1, and Assumption G(i) requires $\delta > 1/\kappa$. That is, to avoid bias due to incidental parameters appearing in the asymptotic distribution of θG, we must have $G_{NT}$ growing slower than N, and similarly to avoid biases due to misspecification of heterogeneity we must have $G_{NT}$ growing faster than $N^{1/\kappa}$. As groups become less informative (κ ↓ 1), we are forced to add groups at a rate very close to the rate at which individuals are added to the panel, and with κ = 1, both fixed and group effects estimators are asymptotically biased.11

Finally consider the case ρ = 3, where even with bias correction and $w_{it}$ assumed i.i.d. for each i, fixed effects estimators are still biased asymptotically. Assuming for simplicity that κ = 4/3, Assumptions G(i) and G(ii) require that $\frac{1}{2} < \delta < \frac{2}{3}$. In other words, it is possible for θG to be asymptotically unbiased in a setting where, to our knowledge, any estimator that attempts to estimate all individual-level parameters will be biased asymptotically.
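The bounds on δ in Example 1 can be checked numerically. The following is a minimal sketch, assuming the rates $G_{NT} = N^{\delta}$, $N = O(T^{\rho})$ and $\xi_{NT} = O(G_{NT}^{-\kappa/2})$ used in the example and ruling out the case Ng → 1; the function names `delta_bounds` and `is_compatible` are illustrative and do not appear in the original analysis.

```python
from fractions import Fraction


def delta_bounds(rho, kappa):
    """Return (lower, upper) bounds on delta for G_NT = N**delta, N = O(T**rho).

    upper: Assumption G(ii) requires delta < (1 + 1/rho) / 2.
    lower: Assumption G(i) requires kappa * delta > (1 + 1/rho) / 2.
    """
    rho, kappa = Fraction(rho), Fraction(kappa)
    upper = (1 + 1 / rho) / 2
    lower = upper / kappa
    return lower, upper


def is_compatible(rho, kappa):
    """True when some delta satisfies both bounds (equivalent to kappa > 1)."""
    lower, upper = delta_bounds(rho, kappa)
    return lower < upper


print(delta_bounds(1, 2))               # rho = 1 with an illustrative kappa = 2
print(delta_bounds(3, Fraction(4, 3)))  # the rho = 3, kappa = 4/3 case in the text
print(is_compatible(1, 1))              # kappa = 1 leaves no admissible delta
```

For ρ = 3 and κ = 4/3 the function returns the interval (1/2, 2/3) stated above, and for ρ = 1 it returns (1/κ, 1).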

Example 2 (continued). Consider the error components structure (2.3) with a sequence of groupings indexed by m, where m = 0 denotes the entire sample (G = 1). Suppose that for each m, the (m + 1)st grouping scheme is obtained by dividing each current group into k equally sized subgroups. Letting $M_{NT}$ be the level at which the researcher decides to group individuals, we have $G_{NT} = \min\{k^{M_{NT}}, N\}$. Assumption G(ii) implies that, in order for biases due to incidental parameters to go to zero, we must have $\log NT - 2 M_{NT} \log k \to \infty$. In other words, when groups increase this quickly as one moves up hierarchical levels, $M_{NT}$ must grow like the logarithm of $\sqrt{NT}$.12 We can still have Assumption G(i) satisfied in this case; for example, when the standard deviation of error components at hierarchical level m behaves like $\lambda_m = O(e^{-\omega m})$, Assumption G(i) requires that $4\omega M_{NT} - \log NT \to \infty$, which is compatible with G(ii) when $\omega > \frac{1}{2} \log k$.

10 For the remaining discussion we rule out the case Ng → 1, so that bias due to pooling depends on $\xi_{NT}$.

11 In theory, it is possible to bias correct θG as well. We do not pursue this extension here, as the theory and our simulations suggest this will only improve the performance of group effects estimators when the grouping structure contains very little information about αi, in which case we would advocate the bias corrected fixed effects estimator.

12 Note that if $M_{NT}$ is chosen such that $G_{NT}$ grows faster than N, we assume the researcher runs fixed effects, and G(ii) again reduces to $\log NT - 2 \log N \to \infty$, i.e., T grows faster than N.

This setting is essentially equivalent to the previous example. To see this, recognize that $\sum_{m = M_{NT}+1}^{\infty} \lambda_m^2 = O(e^{-2\omega M_{NT}})$. Since here $M_{NT} = \log G_{NT} / \log k$, the residual sum of squares from regression of the $\alpha_{i0}$ on a set of group level dummy variables would be at most $O(G_{NT}^{-2\omega/\log k})$. The requirement for G(i) and G(ii) to be compatible here is therefore sufficient to guarantee that the R-squared of the regression (2.2) approaches one at a rate faster than $G_{NT}^{-1}$, as we discussed above for Example 1. We could then set $M_{NT} = \log N^{\delta}$ and proceed to derive restrictions on δ in the same fashion as before.

4. Monte Carlo

The previous section provides an asymptotic framework in which we can show that the grouped-effects estimator of the common parameters in the model presented in Section 2 is asymptotically normal and unbiased. This asymptotic approximation relies on two important restrictions that control the two sources of bias discussed in Section 2.1. First, groups must be added sufficiently slowly to control bias due to accumulation of incidental parameters. Second, the error from a projection of unobservables onto group-level dummies must go to zero sufficiently quickly to control bias due to pooling of individuals with different unobservable effects into groups. Our results suggest that group effects estimators will perform better in finite sample situations in which a researcher has reasonably good a priori information about unobservables, specifically in the form of a grouping structure where variation in unobservables is well-explained by classifying individuals into a number of groups that is reasonably small relative to the sample size.

In this section, we complement the asymptotic analysis of the previous section with simulation evidence regarding grouped effect estimators’ performance relative to fixed effects and bias-corrected fixed effects implemented using the correction of Hahn and Kuersteiner (2004). The simulation results are obtained within the context of a simple probit model in which unobserved effects are generated according to a hierarchical model. Let $\mathcal{G} = \{5, 10, 20, 50, 100, 200, 500\}$, and let $G_m$ refer to the mth element of $\mathcal{G}$. For each $G_m \in \mathcal{G}$ and all $g \in \{1, \ldots, G_m\}$, set $N_{g,m} \equiv N/G_m$, and let $g_m(i) = \sum_g g \, 1\bigl((g - 1) N_{g,m} < i \le g N_{g,m}\bigr)$. We generate data from the model

$$x_{it}, u_{it}, \varepsilon_{it}, \eta_{g_m,m}, \nu_i \sim N(0, 1) \ \text{i.i.d.}$$

$$y^{*}_{it} = \alpha_i + \beta x_{it} + \sigma_{\varepsilon} \varepsilon_{it}$$

$$y_{it} = 1(y^{*}_{it} > 0)$$

$$\alpha_i = \sum_{m=1}^{7} \kappa_m \eta_{g_m(i),m} + \sigma_{\nu} \nu_i$$

$$\sigma^2_{\varepsilon} = \sum_{m=1}^{7} \kappa_m^2 + \sigma_{\nu}^2$$

for $T \in \{2, 8\}$, $N \in \{200, 1000\}$, $m \in \{1, \ldots, 7\}$ and $g_m \in \{1, \ldots, G_m\}$. We report estimates of the common parameter, β, for several configurations of the other parameters described below, in each case relative to a population value β0 = 1.13
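The data generating process above can be sketched in a few lines. This is an illustrative sketch rather than the code used for the reported simulations: it assumes zero-based individual indices, assigns individuals to consecutive blocks at each level as in the definition of $g_m(i)$, and takes $\sigma^2_{\varepsilon} = \sum_m \kappa_m^2 + \sigma_{\nu}^2$. The names `group_index` and `simulate_panel` are hypothetical.

```python
import math
import random

GROUP_COUNTS = [5, 10, 20, 50, 100, 200, 500]  # number of groups at each level m


def group_index(i, n, n_groups):
    """Zero-based g_m(i): consecutive blocks of N / G_m individuals."""
    return (i * n_groups) // n


def simulate_panel(n, t, kappa, sigma_nu, beta=1.0, seed=0):
    """Draw (y, x, alpha) from the hierarchical probit design.

    kappa has one loading per level in GROUP_COUNTS; sigma_nu scales the
    purely individual component of alpha_i.
    """
    rng = random.Random(seed)
    # Group-level draws eta_{g,m} for every level of the hierarchy.
    eta = [[rng.gauss(0.0, 1.0) for _ in range(g_m)] for g_m in GROUP_COUNTS]
    # Assumed scaling: error variance equals the variance of alpha_i.
    sigma_eps = math.sqrt(sum(k * k for k in kappa) + sigma_nu ** 2)
    y, x, alpha = [], [], []
    for i in range(n):
        a_i = sigma_nu * rng.gauss(0.0, 1.0)
        for m, g_m in enumerate(GROUP_COUNTS):
            a_i += kappa[m] * eta[m][group_index(i, n, g_m)]
        x_i = [rng.gauss(0.0, 1.0) for _ in range(t)]
        y_i = [int(a_i + beta * x_it + sigma_eps * rng.gauss(0.0, 1.0) > 0)
               for x_it in x_i]
        y.append(y_i)
        x.append(x_i)
        alpha.append(a_i)
    return y, x, alpha


# Example: N = 200, T = 2, loadings on the first four levels only.
y, x, alpha = simulate_panel(200, 2, (1, .5, .25, .1, 0, 0, 0), sigma_nu=0.0)
```

Setting `sigma_nu=0.0` removes the individual-level component, so every individual in the same finest group with a nonzero loading shares the same value of `alpha`.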

By varying the parameters in $\alpha_i = \sum_{m=1}^{7} \kappa_m \eta_{g_m(i),m} + \sigma_{\nu} \nu_i$, we can control the strength of the relationship between a proposed grouping scheme and the true unobserved individual-specific effects. We consider three different specifications which we term “hierarchical”, “mixed”, and “random effects”. In the hierarchical design, we set σν = 0 and set κ = (1, .5, .25, .1, 0, 0, 0) when N = 200 and κ = (1, .5, .25, .1, .05, .02, 0) when N = 1000. In this case, all of the heterogeneity is being generated by group level effects and $R^2$ quickly increases to one as groups are added down the hierarchical structure. We consider this a baseline, best-case scenario in which one would expect the grouped effects approach to work very well. In the mixed model, we set σν = .25 and κ = (1, .75, .5, 0, 0, 0, 0). In this case, there is variation in the unobserved effects at the individual level, and the $R^2$ of the regression of the true unobserved effects on group dummies increases to one very slowly after the first three levels of the hierarchy are controlled for, though the $R^2$ is .967 at that point. We expect the group effects estimator to also work quite well in this case since, for the sample sizes considered, the specification error should be small relative to sampling variation, which is intuitively what Assumption G(i) requires. We believe this is a very empirically relevant case and is representative of a situation in which grouping is not perfect but is quite informative about the underlying structure of unobserved effects. The random effects specification sets σν = 1 and κj = 0 for all j. Here groups are uninformative about the true unobserved effects, and $R^2$ will only approach one if one uses a large number of groups such that $G_{NT}/N \approx 1$. In this case, fixed effects should dominate the grouped effects estimator for any grouping structure that is not arbitrarily close to fixed effects for large enough T. However, for small and moderate T, it is not obvious that one estimator should outperform the other.
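The $R^2$ values quoted for the three designs can be reproduced at the population level with a short calculation. The sketch below assumes that controlling for the first few levels of the hierarchy absorbs exactly the variance contributed by those levels, and it ignores sampling variation; `population_r2` is a hypothetical helper, not part of the original study.

```python
def population_r2(kappa, sigma_nu, levels):
    """Share of Var(alpha_i) explained by the first `levels` hierarchy levels."""
    explained = sum(k * k for k in kappa[:levels])
    total = sum(k * k for k in kappa) + sigma_nu ** 2
    return explained / total


# Mixed design: three informative levels plus individual noise.
print(round(population_r2((1, .75, .5, 0, 0, 0, 0), 0.25, 3), 3))
# Hierarchical design (N = 200): four levels explain everything.
print(population_r2((1, .5, .25, .1, 0, 0, 0), 0.0, 4))
# Random effects design: groups explain nothing.
print(population_r2((0, 0, 0, 0, 0, 0, 0), 1.0, 7))
```

In the mixed design the explained variance is 1 + .5625 + .25 = 1.8125 out of 1.875, which matches the .967 reported above.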

We report simulation results for the grouped effect estimator for a variety of numbers of groups. With N = 200, we construct grouped effects estimators using each number of groups in the set {5, 10, 20, 50, 100}; and when N = 1000, we construct grouped effects estimators using each number of groups in the set {5, 10, 20, 50, 100, 200, 500}. We also report results from pooled probit, which obviously corresponds to a grouped effects estimator with G = 1, fixed effects, which is the grouped-effects estimator with G = N, and bias-corrected fixed effects using the bias-correction of Hahn and Kuersteiner (2004). All results are based on 1000 Monte Carlo replications.

13 Note that in this example, the parameter β is identified only up to scale. We normalize σε = 1 in the estimation and rescale estimates by σε so that estimates are always compared to a true value β0 = 1. Note that $\sigma^2_{\varepsilon}$ is scaled so that the conditional variance of $y^{*}_{it}$ given $x_{it}$ is equal to the variance of αi, so the bias of the pooled probit estimator is similar in magnitude across all designs.

An important practical problem is group selection. We consider two commonly used information criteria that may be useful in deciding between grouping schemes. Specifically, we consider group selection based on AIC and BIC.14 We consider AIC and BIC because they are extremely simple to compute and are commonly employed in other areas for model selection. The simulation results presented below also suggest that they may be useful in choosing grouping structures that deliver reasonable finite sample performance across all of the designs considered. A more formal analysis of group selection is an interesting and important extension of our present results but beyond the scope of the current paper.
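Selection among grouping schemes by information criteria reduces to a one-line comparison once each scheme has been fit. The sketch below is illustrative only: it uses the BIC defined in footnote 14, the standard AIC penalty of twice the parameter count, and made-up log likelihood values; `select_grouping` and the entries of `fits` are hypothetical.

```python
import math


def aic(loglik, k_star):
    """AIC = -2 log(likelihood) + 2 K*."""
    return -2.0 * loglik + 2.0 * k_star


def bic(loglik, k_star, n_obs):
    """BIC = -2 log(likelihood) + K* log(NT), with n_obs = NT."""
    return -2.0 * loglik + k_star * math.log(n_obs)


def select_grouping(fits, n_obs, criterion="aic"):
    """fits maps a scheme label to (loglik, k_star); return the minimizer."""
    def score(item):
        loglik, k_star = item[1]
        if criterion == "aic":
            return aic(loglik, k_star)
        return bic(loglik, k_star, n_obs)
    return min(fits.items(), key=score)[0]


# Made-up fits for N = 200, T = 2: one slope plus G group effects.
fits = {"G=1": (-260.0, 2), "G=20": (-215.0, 21), "FE": (-150.0, 201)}
print(select_grouping(fits, n_obs=400, criterion="aic"))
print(select_grouping(fits, n_obs=400, criterion="bic"))
```

With these made-up numbers AIC picks the intermediate scheme while the heavier BIC penalty picks pooled probit, which illustrates how the two criteria can disagree.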

4.1. Simulation Results

We report results from our simulation experiments in Tables 1-3. Table 1 gives the results for the hierarchical design, Table 2 for the mixed design, and Table 3 for the random effects design. In each table, the column label indicates which grouping scheme was used. “FE” and “FE-BC” respectively denote fixed effects and bias-corrected fixed effects. Columns labeled “1”-“500” use the grouped effect estimator with the corresponding number of groups. Columns labeled “AIC” and “BIC” provide results based on using the estimator that respectively minimizes AIC or BIC in each simulation iteration. For each estimator, we report bias, root mean squared error (RMSE), the fraction of simulation replications where the estimator was chosen by AIC (AIC %), the fraction of simulation replications where the estimator was chosen by BIC (BIC %), size of 5% level tests based on clustering at the individual level ($\text{SIZE}_N$), and size of 5% level tests using five clusters corresponding to the coarsest grouping scheme ($\text{SIZE}_5$).

Looking first at Table 1, we see a number of interesting results. The grouped effect estimators, with the exception of the simple pooled probit with G = 1, uniformly dominate fixed effects and bias-corrected fixed effects on all reported criteria. We also see that, as theory would suggest, the bias and RMSE of fixed effects and bias-corrected fixed effects decrease with T but are roughly invariant as N increases. Note, however, that inference based on the fixed effects estimator deteriorates as N increases for a given T. This phenomenon has been noted in the literature but is often ignored in practice. Intuitively, while it is true that the bias and RMSE of fixed effects estimators become ‘small’ when T is large, for inference to be reliable, this bias, which behaves like 1/T, must be small relative to sampling error in the common parameters, which behaves like $1/\sqrt{NT}$. Even with bias correction, inference based on fixed effects estimators may suffer from substantial size distortions when T is small relative to N. The grouped effects estimators, by making use of “smoothness” of the unobserved effects in the grouping structure, avoid this problem and tend to have small bias that does not dominate the sampling error even for small T and large N. We also see that using standard errors clustered at a broad level offers some robustness in terms of size of tests relative to grouping at the individual level when less than optimal numbers of groups are considered.

14 We recognize that there is some ambiguity in defining BIC in the present context. We choose to use BIC = −2 log(likelihood) + K∗ log(NT), where K∗ is the total number of parameters in the model including individual or grouped effects and NT is the total number of observations. Note that we use K∗ and NT based on the complete data set, including observations that would be dropped in, say, obtaining fixed effects estimates because they are perfectly predicted by the fixed effects. We also note that using log(NT) as the penalty may over-penalize complexity depending on the specifics of the problem but is a common choice; see, e.g., StataCorp (2007) Reference A-H pp. 169-173.
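The deterioration of fixed effects inference as N grows for fixed T follows from comparing the two rates just mentioned. As a rough order-of-magnitude sketch that ignores all constants, the ratio of a 1/T bias to a $1/\sqrt{NT}$ sampling error equals $\sqrt{N/T}$; the helper name `bias_to_sampling_error` is hypothetical.

```python
import math


def bias_to_sampling_error(n, t):
    """Ratio of a 1/T bias to a 1/sqrt(N*T) sampling error, i.e. sqrt(N/T)."""
    return (1.0 / t) / (1.0 / math.sqrt(n * t))


# Sample sizes used in the simulation designs.
for n in (200, 1000):
    for t in (2, 8):
        print(f"N={n:5d} T={t}: ratio = {bias_to_sampling_error(n, t):.2f}")
```

For fixed T the ratio grows with N, so the bias occupies a larger share of the confidence interval width even though the bias itself does not change.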

While the grouped effects estimators dominate fixed effects and pooled probit, we do see considerable variation in the performance of the grouped effects estimators across grouping schemes. This suggests that group selection may play an important role in determining the finite sample performance of the estimator. The simulation results do show that both AIC and BIC are useful for group selection in this design. Using either AIC or BIC to select the grouping structure produces an estimator with good bias and MSE properties as well as tests that have close to correct size. BIC seems to do slightly better than AIC in the small N setting, but both perform very well across the board in this design.

Results from the mixed design, reported in Table 2, are quite similar to those from the pure hierarchical design discussed above. We once again see that fixed effects and bias-corrected fixed effects are uniformly dominated by the grouped effect estimators across all criteria considered, and we again see that the performance of fixed effects based inference deteriorates rapidly as N increases for fixed T. Unlike in the previous case, there is a significant individual specific component that is not absorbed by the grouping schemes in this design. This has little effect on the overall results because, after controlling for group-level effects, the variation in individual effects within groups is small relative to sampling error. It is true that for N and T large enough, the specification error incurred by estimating a group effects model with group structure as we are using in the simulation would result in a breakdown of our asymptotic approximation; but we see strong evidence that the approximation is quite good in the sample sizes considered. This good performance highlights the usefulness of our approach. Importantly, we view this asymptotic environment not as a literal description of the sampling process, but as a means to think about estimation and inference in circumstances where a researcher has reasonable but imperfect information about predictability of unobserved effects given group membership. Finally, we also see that AIC and BIC are both effective in choosing estimators with good estimation and inference properties in this design.

The final set of results is from a Gaussian random effects model in which the individual specific parameters are not predictable within any of our grouping schemes. This is clearly a worst-case type scenario for our approach in that the only way specification error can be made small is by having the number of groups be approximately equal to N. It is thus not surprising that the group-effect estimators are no longer uniformly dominant within this design. Nor is it particularly surprising that none of the considered estimators do very well. In the experiments with T = 8, bias-corrected fixed effects is comparable, though slightly inferior, to the best considered grouped effect estimator in terms of bias and RMSE and better in terms of inference properties. With T = 2, it is clearly best to use a grouping scheme with fewer than N groups despite the obvious specification error. Looking at the results displayed in the table and extrapolating, it also appears that there would be grouping schemes that would be preferred to bias-corrected fixed effects that did not fall within the support of grouping schemes that we considered. In this case, there is also a clear and strong distinction between AIC and BIC in terms of group selection, with AIC outperforming BIC in each case except the small sample with N = 200 and T = 2, where AIC is superior in terms of bias and size of tests but BIC produces a smaller MSE estimator.

Overall, the simulation results are quite favorable for the grouped effect approach. Sensible grouping strategies with fewer than N groups outperform fixed effects and bias-corrected fixed effects in almost every case considered, with the only exceptions being cases with T = 8 in the random effects design where the grouped effects assumptions are grossly violated. The resulting estimators also tend to have good bias and MSE properties and to perform relatively well in terms of inference. It is also encouraging that AIC and BIC are successful in helping to choose a grouping scheme with reasonable finite sample properties in the hierarchical and mixed designs and that AIC chooses essentially the best available estimator in the random effects design, though none of the estimators we consider performs particularly well in this case. The results clearly show that grouped effect estimators may significantly outperform fixed effects estimators and suggest that a more formal and systematic treatment of data-dependent group selection may yield interesting results.

5. Empirical Examples

In this section, we apply the grouped effects strategy in two empirical examples. In the first, we consider the association of firm cash flow, asset tangibility, size, net worth, and market to book to whether a firm has a line of credit as in Sufi (2009). The goal of the analysis is to isolate direct effects of these variables on firm access to a line of credit to provide insight into the types of market friction that may make lines of credit a poor liquidity substitute for cash for some firms. This analysis is complicated by the presence of firm level characteristics, such as corporate governance, that are both related to a firm’s cash flows and possible credit constraints and are difficult to observe or measure. Simply including firm specific effects to account for such factors is complicated by the binary nature of the dependent variable and the short time span available. In this example, we argue that firms that have similar realizations of observable variables over the sample period may have similar values of unobservables and construct groups based on partitioning the independent variables. In the second example, we follow Roulstone (2006) in studying the extent to which corporate insiders trade on short-term information about how future earnings surprises will affect their firm’s share price. Again this analysis is complicated by unobserved firm level characteristics that may influence the incentives faced by firm executives. This heterogeneity in incentives might lead one to want to control for firm specific effects. However, one might also wish to focus on a narrow time window to avoid results being influenced by changes in the regulatory environment. Again, we believe the grouped effects approach is appealing here as the relevant firm specific unobservables are likely relatively constant within sets of firms in the same industry and/or with similar realized characteristics over the sample period.

5.1. Bank Lines of Credit

The dependent variable in our first application is a binary variable which is one if a firm has access to a bank line of credit. Understanding factors associated with firms’ having access to bank lines of credit is an important ingredient to understanding firms’ corporate finance decisions and what types of market frictions may exist in the market for firm credit. Specifically, there is a theoretical and empirical literature on firm cash holdings that argues that firms that are constrained in the credit market should retain cash to be able to pursue investment opportunities in periods in which they are unable to raise sufficient external financing, though the literature is silent on what types of market frictions may lead to these constraints; see Almeida, Campello, and Weisbach (2004) and Faulkender and Wang (2006) for recent examples. There is also a theoretical literature that argues that bank lines of credit are a financial product designed to overcome exactly the types of market frictions discussed in the cash literature; see, for example, Holmstrom and Tirole (1998). It thus seems useful to try to understand which factors are associated with a firm’s having a line of credit. This question has been addressed in a recent paper, Sufi (2009), which we largely follow here. Our brief analysis complements the detailed analysis in Sufi (2009) by allowing for firm-specific heterogeneity.

For our analysis, we estimate models of the form

P(yit = 1|xit,Firm = i) = Φ(x′itβ + γt + αg(i))

where Φ(·) is the standard normal distribution function, $x_{it}$ is a vector of observed characteristics for firm i at time t, $\gamma_t$ is a time specific effect, and $\alpha_{g(i)}$ is an unobserved group-specific effect. The vector $x_{it}$ consists of EBITDA scaled by non-cash total assets to measure firm cash flow, tangible assets scaled by non-cash assets, the natural logarithm of non-cash total assets to measure firm size, net worth scaled by non-cash assets, the market to book ratio, and a vector of dummies for one-digit SIC classification. The firm characteristics are motivated by the theoretical literature mentioned above and are meant to be associated with firms facing a high cost of external finance relative to internal finance. The empirical specification is identical to Sufi (2009) with the exception of our additional term $\alpha_{g(i)}$. Our results also differ in that our sample period is different; we use annual firm level data from 2002-2003 (T = 2).15 This provides us with a sample of 3648 firms and a total of 7034 observations. Further details regarding data sources and data construction may be found in Sufi (2009).16

We consider a variety of specifications for $\alpha_{g(i)}$ ranging from pooled probit, with $\alpha_{g(i)} = \alpha$ for all i, to fixed effects probit with $\alpha_{g(i)} = \alpha_i$. For the intermediate grouping schemes, we consider forming groups based on realized x’s. Grouping based on the x’s is motivated by the simple belief that firms that have similar observables will also have similar values for unobserved firm-specific heterogeneity; similar beliefs have been used fruitfully in other areas of economics; see, for example, Altonji, Elder, and Taber (2005). Specifically, we form groups based on the within-firm sample averages of the observed x’s by putting two firms into the same group if their average x’s fall into the same percentile regions. For example, a grouping scheme with four groups may be based on whether the average tangible assets and net worth for a particular firm during the sample period are above or below the median within-firm average tangible assets and median within-firm average net worth across all firms. The first of four groups in such a scheme would be comprised of firms whose average tangible assets and average net worth during the sample period are both above the sample medians of within-firm average tangible assets and within-firm average net worth across all firms, with the remaining three groups constructed in the obvious fashion. We consider a variety of such grouping schemes ranging between four groups and a potential 3125 groups.17 Motivated by our simulation results, we use AIC to choose among the various grouping schemes, including fixed effects and pooled probit.18

15 Sufi (2009) uses annual data from 1996 to 2003.

16 We thank Amir Sufi for kindly providing us with the data used in this example.

17 The potential 3125 groups comes from considering all possible cells using all five variables split at the quintiles. Not surprisingly, many of these cells are empty in the actual example.
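The four-group median split described above can be illustrated with a small sketch. It is not the code used for the reported estimates: the function `median_split_groups`, the variable names `tangible` and `net_worth`, and the toy data are all hypothetical, and finer schemes would replace the median by terciles, quartiles, or quintiles.

```python
from statistics import mean, median


def median_split_groups(panel, variables):
    """Label each firm by whether its within-firm averages exceed the medians.

    panel maps a firm id to a list of per-period observations (dicts).
    Returns firm id -> tuple of 0/1 flags, one per variable, which defines
    up to 2 ** len(variables) groups.
    """
    averages = {
        firm: {v: mean(obs[v] for obs in rows) for v in variables}
        for firm, rows in panel.items()
    }
    cutoffs = {v: median(avg[v] for avg in averages.values()) for v in variables}
    return {
        firm: tuple(int(avg[v] > cutoffs[v]) for v in variables)
        for firm, avg in averages.items()
    }


# Toy panel with T = 2 observations per firm (made-up values).
panel = {
    "A": [{"tangible": 0.8, "net_worth": 0.6}, {"tangible": 0.9, "net_worth": 0.7}],
    "B": [{"tangible": 0.7, "net_worth": 0.1}, {"tangible": 0.6, "net_worth": 0.2}],
    "C": [{"tangible": 0.2, "net_worth": 0.5}, {"tangible": 0.1, "net_worth": 0.9}],
    "D": [{"tangible": 0.3, "net_worth": 0.2}, {"tangible": 0.2, "net_worth": 0.3}],
}
groups = median_split_groups(panel, ["tangible", "net_worth"])
print(groups)
```

Each distinct tuple is one cell of the grouping scheme; cells that no firm falls into are simply absent, which is how a scheme with many potential groups can end up with fewer non-empty groups.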

In Table 4, we report the results from the exercise for pooled probit, fixed effects and bias-corrected fixed effects probit using the correction of Hahn and Kuersteiner (2004), and the AIC minimizing group data estimator with reported standard errors clustered at the firm level.19 Looking across the table, we see that the fixed effects point estimates are quite different from the pooled probit and grouped data estimates. We see that there is relatively little difference between the fixed effect and bias-corrected fixed effect point estimates as we would expect from our simulation as well as the asymptotic theory underlying bias-corrected fixed effects. Relative to the other estimates, the fixed effects estimates are also incredibly imprecise. Using our T = 2 simulation results as a rough guide would suggest that both the fixed effects and bias-corrected fixed effects suffer from substantial bias.

Comparing pooled probit to the grouped effect estimator, we see that the signs all agree and that the coefficients are not of wildly different magnitudes. We do see that in some cases the estimates and precision of the grouped effects estimator are different from pooled probit in important ways. For example, one would conclude there is a significant association between tangible assets and having a line of credit and between net worth and having a line of credit at usual significance levels using pooled probit but would not based on the grouped effects estimator. We also see the coefficients on EBITDA and market to book from the grouped effects estimator are substantially attenuated relative to pooled probit, though both remain statistically significant at conventional levels. Grouped effects estimates of average marginal effects20 are also generally different from pooled probit, suggesting smaller marginal effects of the covariates on the probability of having a line of credit than predicted by pooled probit. As such, our preferred estimates are broadly consistent with the results and findings of Sufi (2009), though the estimated effects are somewhat smaller than he reports, suggesting that the findings of his paper are fairly robust to the presence of firm-specific heterogeneity.

18 In total, we considered 44 different grouping schemes formed from various splits on the x-variables. Details are available upon request.

19 Using AIC, we select a model with 432 potential groups based on splitting tangible assets, EBITDA, and market to book into thirds and net worth and size at the quartiles. This results in a total of 405 non-empty groups.

20 We calculate the average marginal effect of variable $x_j$ as $\frac{1}{NT}\sum_{i}\sum_{t} \beta_j \phi(x_{it}'\beta + \gamma_t + \alpha_{g(i)})$ where φ(·) is the standard normal density function.
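The grouping scheme described in footnote 19 can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated stand-ins, the helper names `quantile_bins` and `group_ids` are hypothetical, and the paper does not say exactly how the split points were computed. The sketch shows why tercile splits on three variables and quartile splits on two give 3 × 3 × 3 × 4 × 4 = 432 potential groups, of which only some are non-empty in a given sample.

```python
import numpy as np

def quantile_bins(x, n_bins):
    """Assign each value to one of n_bins quantile bins, labelled 0..n_bins-1."""
    # Interior cut points, e.g. the two terciles when n_bins == 3.
    cuts = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(cuts, x, side="right")

def group_ids(columns, bins_per_column):
    """Combine per-variable bin labels into one integer group label."""
    gid = np.zeros(len(columns[0]), dtype=int)
    for x, b in zip(columns, bins_per_column):
        gid = gid * b + quantile_bins(np.asarray(x), b)
    return gid

rng = np.random.default_rng(0)
n_firms = 2000  # assumed sample size, for illustration only
# Simulated stand-ins for tangible assets, EBITDA, market to book,
# net worth, and size (in that order).
cols = [rng.normal(size=n_firms) for _ in range(5)]
bins = [3, 3, 3, 4, 4]

gid = group_ids(cols, bins)
n_potential = int(np.prod(bins))   # number of cells in the full cross-classification
n_nonempty = len(np.unique(gid))   # cells that actually contain a firm
print(n_potential, n_nonempty)
```

With real data the covariates are correlated, so some cells are empty, which is consistent with the paper reporting fewer non-empty groups than potential groups.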

5.2. Insider Trading

In our second application, we consider the relationship between corporate insiders’ decisions to buy or sell own-company stock and insiders’ private information about how future earnings surprises will affect the firm’s share price. There is evidence that insiders trade profitably on nonpublic information prior to takeovers,21 but empirical evidence on the relationship between insider trading and earnings announcements is mixed. Studying the relationship between insider trading and earnings announcements is complicated by the changing regulatory environment, which makes it difficult to use long time series data sets. In addition, it is a priori quite plausible that there would be unobserved firm specific factors, such as corporate governance and earnings management, that would be related to both insider trading activity and the behavior of the firm’s share price around earnings announcement dates. We attempt to address these concerns in this exercise by limiting ourselves to a very short time span where the regulatory regime is hopefully fairly constant and controlling for firm specific heterogeneity using grouped effects schemes based on 4-digit SIC code and information about market value of equity, institutional ownership, and asset turnover. Our brief analysis also provides a useful complement to closely related work of Roulstone (2006), who examines the quantity of insider trading using OLS with firm-specific fixed effects and a Tobit model without firm-specific effects.

For our analysis, we estimate separate probit models for insider buys, defined as an indicator variable which is one if there were any purchases of own-company stock by corporate insiders (defined as top officers and directors) during the period starting one day after the prior quarter’s earnings announcement and ending one day before the current quarter’s earnings announcement, and insider sales, defined similarly to insider buys. For either buys or sales, the estimated model takes the form

$$P(y_{it} = 1 \mid x_{it}, \text{Firm} = i) = \Phi(x_{it}'\beta + \gamma_t + \alpha_{g(i)})$$

where Φ(·) is the standard normal distribution function, $x_{it}$ is a vector of observed characteristics for firm i at time t, $\gamma_t$ is a time specific effect, and $\alpha_{g(i)}$ is an unobserved group-specific effect. The vector $x_{it}$ consists of the cumulative abnormal return over the three days -1, 0, and 1 relative to the earnings announcement date ($CAR_{it}$), $CAR_{it}^2$, $CAR_{it-1}$, $CAR_{it+1}$, unexpected earnings22 ($UE_{it}$), $UE_{it}^2$, $UE_{it-1}$, $UE_{it+1}$ and a vector of other control variables.23 For our analysis, we focus on the variables CAR, CAR², UE, and UE², which are meant to capture the news of the earnings announcement, and note from the timing we are asking whether current news forecasts past insider trading activity. Our empirical specification is similar to Roulstone (2006) with the exception of our additional term $\alpha_{g(i)}$ and slight changes in the control variables. Our results also differ in that our sample period is different; we use quarterly firm level data from 1999 (T = 4).24 Our total sample consists of 18,527 observations across 5582 firms. Further details regarding data sources and data construction may be found in Roulstone (2006).25

21 See, e.g., Meulbroek (1992).

22 Unexpected earnings is defined as either I/B/E/S-reported actual earnings minus the mean analyst forecast of earnings or as actual earnings less actual earnings four quarters previously if there is no analyst forecast. Unexpected earnings are scaled by stock price ten days prior to the earnings announcement.
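As a minimal sketch of how the probit index with time and group effects translates into average marginal effects (the quantity defined in footnote 20), one could write the following. The function name `average_marginal_effects` and its argument layout are assumptions made for illustration, and the parameter values are taken as given rather than estimated here.

```python
import math
import numpy as np

def norm_pdf(z):
    """Standard normal density, evaluated elementwise."""
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def average_marginal_effects(X, beta, gamma_t, alpha_g, t_idx, g_idx):
    """Average over observations of beta_j * phi(x_it'beta + gamma_t + alpha_g(i)).

    X       : (n_obs, k) covariate matrix, one row per firm-quarter
    beta    : (k,) common coefficients
    gamma_t : (n_periods,) time effects
    alpha_g : (n_groups,) group effects
    t_idx   : (n_obs,) period index of each row
    g_idx   : (n_obs,) group index g(i) of each row
    """
    index = X @ beta + gamma_t[t_idx] + alpha_g[g_idx]
    return beta * norm_pdf(index).mean()

# Tiny check: with a zero index everywhere, each effect is beta_j * phi(0).
X = np.zeros((4, 2))
beta = np.array([1.0, -2.0])
ame = average_marginal_effects(
    X, beta,
    gamma_t=np.zeros(2), alpha_g=np.zeros(1),
    t_idx=np.array([0, 1, 0, 1]), g_idx=np.zeros(4, dtype=int),
)
print(ame)
```

Because the density is evaluated at the full index, the estimated group effects enter the marginal effects even though interest centers on β.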

As in the previous example, we considered a variety of different grouping schemes and selected among them using AIC. In addition to pooled probit and fixed effects probit, we considered groups based on one-, two-, three-, and four-digit SIC code as well as groups formed by interacting SIC code with splits on the control variables market value of equity, institutional ownership, and asset turnover. Groups using the x’s are constructed as in the previous example, and we refer the reader to the discussion there for details.26
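The selection step can be sketched as below, assuming the standard definition AIC = 2k − 2 log L, where k counts all estimated parameters including the group effects. The scheme names and the log-likelihood values are made up purely for illustration and are not results from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * log_likelihood

def select_by_aic(candidates):
    """candidates maps scheme name -> (log_likelihood, n_params).

    Returns the AIC-minimizing scheme and the full AIC table.
    """
    table = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(table, key=table.get)
    return best, table

# Hypothetical fits: finer grouping raises the likelihood but costs parameters.
candidates = {
    "pooled": (-1000.0, 10),
    "sic4": (-900.0, 60),
    "fixed_effects": (-850.0, 500),
}
best, table = select_by_aic(candidates)
print(best, table)
```

In this made-up example the intermediate scheme wins, which mirrors the interior minimum of AIC between pooling and fixed effects that the text describes.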

We report estimation results from pooled probit, fixed effects and bias-corrected fixed effects probit using the correction of Hahn and Kuersteiner (2004), and the AIC-minimizing group effects estimator with reported standard errors clustered at the firm level in Table 5.27 As in the previous example, we see that the signs of the estimates generally line up, though there are important differences in magnitudes. For insider buys, we see that fixed effects is AIC preferred to pooled probit, which is not true in any of the other examples. In all cases, we do see that there is a distinct interior minimum of AIC away from both fixed effects and pooling.

Looking first at the results for insider buys, we see that there is robust evidence across all specifications that insider purchases of own-company stock are related to information in future earnings announcements as measured by CAR and UE. Intuitively, if insiders are able to trade profitably on private information, we should expect the marginal effect of each variable on the probability of insider trading activity to be positive in the case of buys and negative for sells. For CAR and CAR², the estimated effects are statistically strong and of the theoretically expected sign across all reported models; and for UE, the signs are as theoretically expected though the estimate of the coefficient on the second-order term is not statistically strong. There are rather substantial differences in magnitudes between the fixed effect and grouped effect estimates, which our simulation results would suggest are largely influenced by small sample bias in the fixed effects estimates. The grouped effects estimates strongly indicate that there is a moderate but statistically strong relationship between insider buying of own-company stock and price moves around future earnings announcements, which clearly suggests that insiders are making trading decisions based on non-public information about company performance.

23 The other control variables are turnover of company stock, an indicator for whether any analysts follow the firm, the number of analysts following the firm, the return on the firm’s stock from six months to two days prior to the announcement day minus the market return over the same period, the return on the firm’s stock over the period two days to six months after the announcement minus the market return over the same period, the percentage of institutional ownership, the natural logarithm of the firm’s market value of equity ten days prior to the announcement, and the book to market ratio.

24 Roulstone (2006) uses all available quarterly data from 1980 to 2002.

25 We thank Darren Roulstone for providing us with the data used in our analysis.

26 In this example, we considered 23 different grouping schemes. Details are available upon request.

27 In this case, AIC is minimized by simply grouping on the four-digit SIC level for both buys and sales, which results in a total of 425 groups.

The results are much more muddled regarding insider selling. The difficulty in finding an effect on insider sales is not surprising and is consistent with the existing literature. For example, Roulstone (2006) argues that the impact of earnings announcements on insider selling may be hard to identify due to liquidity trades (e.g., selling by insiders for portfolio rebalancing purposes). In our results, we see that the estimated average marginal effects of both CAR and UE are economically small across all estimators considered. Again the fixed effects point estimates are quite different from the grouped effects estimates, and the fixed effects estimates are the only estimates which suggest a statistically significant effect of CAR or UE on insider sales. Using our simulations and the theory of fixed effects estimates with small T suggests that this association is likely spurious. Looking at the grouped effects estimates, one could not rule out that there is no relationship or a (theoretically) wrong-signed relationship between insider sales and the earnings variables at any conventional significance level.

Overall, our preferred group effects results are consistent with a small but quite robust effect of future earnings announcement returns on insiders’ trading decisions. The results are qualitatively quite similar to those reported in Roulstone (2006), who uses the same basic data over a much longer time span and with a very different specification that does not control for firm effects. Thus, our results complement and further strengthen the results and analysis in that paper.

6. Conclusion

This paper has analyzed group effects estimators for nonlinear panel data models with a finite dimensional common parameter and time invariant individual specific effects that are unobserved to the econometrician. Group effects estimators hold individual-level heterogeneity constant according to an observed grouping structure, and may be thought of as intermediate to pooled and fixed effects estimators. We provided conditions under which group effects estimators of the common parameter are asymptotically unbiased. These conditions suggest a tradeoff between two sources of asymptotic bias, one due to the well known incidental parameters problem suffered by fixed effects estimators, and another arising from discrepancies in individual level effects within groups, that is, misspecification in the structure of unobservable heterogeneity. We illustrated this tradeoff in two examples, and a set of simulations that suggest group effects estimators may perform significantly better in finite samples relative to pooling or fixed effects. We also considered the group effects approach in empirical studies of firm lines of credit and insider trading.

The results in this paper may be extended in several interesting ways. First, following Sun (2005), one can consider a setup where the group structure is unobservable or only partially observable to the econometrician. Second, one may wish to consider bias correction of group effects estimators. Correction of biases due to incidental parameters should follow from similar approaches for fixed effects estimators, e.g., Hahn and Newey (2004) and Hahn and Kuersteiner (2004). Finally, though AIC and BIC seem to perform well in selecting grouping schemes in our simulations and empirical examples, we have not yet found a formal justification for this procedure. Our results suggest these may be interesting avenues for future research.

7. Appendix

For a random variable $W_{it}$, let $E_{it}[W_{it}] = \int W_{it}\,dF_{it}$ be the expectation with respect to the marginal distribution of the data for individual i at time t, and let $E_i W = \frac{1}{T_i}\sum_{t=1}^{T_i} E_{it}[W_{it}]$ where $T_i$ is the number of observations for individual i. Let $T = \frac{1}{N}\sum_{i=1}^{N} T_i$. Throughout, α and |α| denote a real scalar (or vector) and its absolute value (sum of absolute values of its elements) while α(·) and $\|\alpha(\cdot)\| = \sup_{i\in\mathbb{N}} |\alpha(i)|$ denote a sequence of real numbers (vectors) and its supremum norm. For a real vector θ, define $\|(\theta, \alpha(\cdot))\| = |\theta| + \|\alpha(\cdot)\|$.

Assumption 1. Let $N, T \to \infty$ indicate that $N \to \infty$ and each $T_i \to \infty$ jointly in such a way that $T_i/T \to \rho_i$ and $\sup_i |T_i/T - \rho_i| \to 0$ where $\inf_i \rho_i \ge \delta > 0$ and $\sup_i \rho_i \le \Delta < \infty$. For notational readability, we index elements of a sequence indexed by N and $(T_1, \ldots, T_N)$ only by NT.

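Assumption 2. $w_{it}, \alpha_0(i)$ are independent across i. For each i, $w_{it}$ is a strong mixing sequence with mixing coefficient $a_i(m) = \sup_t \sup_{B_1 \in \mathcal{B}^i_{-\infty,t},\, B_2 \in \mathcal{B}^i_{t+m,\infty}} |P(B_1 \cap B_2) - P(B_1)P(B_2)|$ where $\mathcal{B}^i_{-\infty,t} = \sigma(w_{it}, w_{it-1}, w_{it-2}, \ldots)$ and $\mathcal{B}^i_{t,\infty} = \sigma(w_{it}, w_{it+1}, w_{it+2}, \ldots)$, and there exists a $\tau \in 2\mathbb{N}$ and $r > \tau$ such that $\sup_i |a_i(m)| \le C m^{\frac{(1-\tau)r}{r-\tau} - \varepsilon}$ for some C > 0 and some ε that satisfies ε − δ > 0 for some δ > 0.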

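Assumption 3. Let $\varphi(w_{it}; \theta, \alpha)$ be a function indexed by the parameters θ ∈ Θ and α ∈ A where Θ and A are compact, convex subsets of $\mathbb{R}^k$ and $\mathbb{R}^p$ respectively, and let $\varphi_{it}(\theta, \alpha) \equiv \varphi(w_{it}; \theta, \alpha)$. Assume $\varphi_{it}(\theta, \alpha)$ is continuous in θ and α. Let $\theta_0 \in \mathrm{int}(\Theta)$ and $\alpha_0(i) \in \mathrm{int}(A)$ for each i = 1, ..., N be such that for each i and η > 0,

$$\lim_{T_i\to\infty} E_i[\varphi(\theta_0, \alpha_0(i))] - \sup_{(\theta,\alpha):\,|(\theta,\alpha)-(\theta_0,\alpha_0(i))|>\eta}\ \lim_{T_i\to\infty} E_i[\varphi(\theta, \alpha)] > 0.$$

Also assume that for each η > 0,

$$\lim_{N,T\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i} E_{it}[\varphi_{it}(\theta_0, \alpha_0(i))] - \sup_{(\theta,\alpha(\cdot)):\,\|(\theta,\alpha(\cdot))-(\theta_0,\alpha_0(\cdot))\|>\eta}\ \lim_{N,T\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i} E_{it}[\varphi_{it}(\theta, \alpha(i))] > 0$$

where $\|(\theta, \alpha(\cdot))\| = \sum_{j=1}^{k} |\theta_j| + \sup_i \sum_{j=1}^{p} |\alpha_j(i)|$.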

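Assumption 4. Let $v = (v_1, \ldots, v_k)'$ and $u = (u_1, \ldots, u_p)'$ be vectors of nonnegative integers. Define

$$D^{(v,u)}\varphi_{it}(\theta, \alpha) = \frac{\partial^{|v|+|u|}\varphi_{it}(\theta,\alpha)}{\partial\theta_1^{v_1}\cdots\partial\theta_k^{v_k}\,\partial\alpha_1^{u_1}\cdots\partial\alpha_p^{u_p}}.$$

Assume there exists a function $M(w_{it})$ such that $|D^{(v,u)}\varphi_{it}(\theta_2, \alpha_2) - D^{(v,u)}\varphi_{it}(\theta_1, \alpha_1)| \le M(w_{it})\,\|(\theta_2, \alpha_2) - (\theta_1, \alpha_1)\|_{ind}$ for all $(\theta_1, \alpha_1), (\theta_2, \alpha_2) \in \Theta \times A$ and $|v| + |u| \le 3$ and that $\sup_{(\theta,\alpha)\in\Theta\times A} \|D^{(v,u)}\varphi_{it}(\theta, \alpha)\| \le M(w_{it})$ for $|v| + |u| \le 3$. Assume that $\sup_{i,t} E_{it}[M(w_{it})^{\tau+\varepsilon}] \le \Delta < \infty$ for some ε that satisfies ε − δ > 0 for some δ > 0.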

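Assumption 5. Suppose that there exists a sequence of partitions such that for each $N, T_1, \ldots, T_N$ the data are partitioned into $G_{NT}$ groups consisting of $\sum_{i\in I_g} T_i$ observations for $g = 1, \ldots, G_{NT}$ where $I_g = \{i : \text{individual } i \text{ belongs to group } g\}$. Let $N_g$ denote the number of elements in the set $I_g$, and let $T_g = \frac{1}{N_g}\sum_{i\in I_g} T_i$. Let $\bar{N}_g = \frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}} N_g$ and assume $\sup_g |N_g/\bar{N}_g - \zeta_g| \to 0$ where $\sup_g \zeta_g \le \Delta < \infty$ and $\inf_g \zeta_g \ge \delta > 0$. Let $\xi_{NT} = \sup_g \sup_{i,j\in I_g} \max_{1\le s\le p} |\alpha_{0,s}(i) - \alpha_{0,s}(j)|$ where $\alpha_{0,s}(\cdot)$ is the sth element of vector $\alpha_0(\cdot)$.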

vector α0(·). Assume(

supgNg−1Ng

)ξNT → 0 such that

Ng−1Ng

)ξ2NT → 0 as N,T → ∞.

Note that indexing of g by NT is suppressed throughout the paper for readability; the objects $I_g$, $T_g$, $N_g$, etc. are defined with respect to the partition for a given sample size N, T.
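To make the two quantities in Assumption 5 concrete, the following sketch computes the within-group heterogeneity measure ξ_NT and the size factor sup_g (N_g − 1)/N_g for a given partition. It assumes the true effects α0(i) are known, which is only possible in a simulation; the function name `xi_and_size_factor` is hypothetical.

```python
import numpy as np

def xi_and_size_factor(alpha0, group):
    """Return (xi, factor) for individual effects alpha0 and group labels.

    xi     : largest within-group, coordinate-wise gap |alpha0_s(i) - alpha0_s(j)|
    factor : largest (N_g - 1) / N_g over groups
    """
    alpha0 = np.asarray(alpha0, dtype=float)
    if alpha0.ndim == 1:
        alpha0 = alpha0[:, None]          # treat scalar effects as p = 1
    group = np.asarray(group)
    xi, factor = 0.0, 0.0
    for g in np.unique(group):
        a = alpha0[group == g]
        # The largest pairwise gap in each coordinate equals max minus min.
        xi = max(xi, float((a.max(axis=0) - a.min(axis=0)).max()))
        n_g = a.shape[0]
        factor = max(factor, (n_g - 1) / n_g)
    return xi, factor

xi, factor = xi_and_size_factor([0.0, 0.1, 1.0, 1.3, 1.2], [0, 0, 1, 1, 1])
print(xi, factor)
```

Singleton groups contribute zero to both quantities, so the fixed effects partition trivially has ξ_NT = 0, while pooling makes ξ_NT as large as the full range of the individual effects.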

Assumption 6. |dFit(wit)− dFjt(wit)| ≤ C(wit)|α0(i)− α0(j)| with supi,t C(wit)/dFit(wit) ≤M(wit).

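Assumption 7. Let $\lambda_g$ be the minimum eigenvalue of $\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i} E_{it}\left[\frac{\partial^2\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\alpha\,\partial\alpha'}\right]$. Assume that $\inf_g \lambda_g \ge \delta > 0$ where δ does not depend on N or $(T_1, \ldots, T_N)$.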

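Assumption 8. Suppose $H^{\theta\theta}(\theta, \alpha(\cdot)) = \lim_{N,T\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i} E_{it}\left[\frac{\partial^2\varphi_{it}(\theta,\alpha(i))}{\partial\theta\,\partial\theta'}\right]$ exists for all $(\theta, \alpha(\cdot)) \in \Theta \times (\times_{i=1}^{\infty} A)$ and let $H^{\theta\theta}_0 = H^{\theta\theta}(\theta_0, \alpha_0(\cdot))$. Let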

JNT =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

[∂2ϕit(θ0, α0(i))

∂θ∂θ′

]− κgEit

[∂2ϕit(θ0, α0(i))

∂α∂θ′

∑i∈Ig

Ti∑t=1

[∂2ϕit(θ0, α0(i))

∂θ∂α′

] 1NgTg

∑i∈Ig

Ti∑t=1

[∂2ϕit(θ0, α0(i))

∂α∂α′

and suppose J = limN,T→∞ JNT exists and has minimum eigenvalue λJ ≥ δ > 0.

Assumption 9. $G_{NT}\,(\inf_g N_gT_g)^{-\tau/2} \to 0$ and $G_{NT}/\sqrt{NT} \to 0$ as $N, T \to \infty$.

Assumption 10. For i ∈ Ig, let S∗it =√ρi(uθit − κguαit

)where uθit = ∂ϕjt(θ0,α0(j))

∂θ − Eit[∂ϕjt(θ0,α0(j))

]and uαit = ∂ϕjt(θ0,α0(j))

∂α − Eit[∂ϕjt(θ0,α0(j))

]. Let Ωi,Ti = Var

(1√Ti

∑Tit=1 S

), and let λi,Ti be the minimum

eigenvalue of Ωi,Ti . Assume that infi infTi λi,Ti > 0 and that Ω = limN,T→∞ 1N

∑Ni=1 Ωi,Ti exists.

Assumption 11. 1√NT

∑Ni=1

∑Tit=1Eit

[∂ϕit(θ0,α0(i))

]→ 0, 1√

∑Ni=1

∑Tit=1 κgEit

]→ 0, and

supg ‖ 1√NgTg

∑i∈Ig

∑Tit=1Eit

]‖ → 0.

7.1. Consistency

Lemma 1. Under Assumptions 1, 2, and 4, $\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i} \left(\varphi_{it}(\theta, \alpha(i)) - E_{it}[\varphi_{it}(\theta, \alpha(i))]\right) \xrightarrow{p} 0$.

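Proof. Note that

$$\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{T_i}{T}-\rho_i\right)\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right) + \frac{1}{N}\sum_{i=1}^{N}\rho_i\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right).$$

We then have

$$E\left|\frac{1}{N}\sum_{i=1}^{N}\left(\frac{T_i}{T}-\rho_i\right)\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right)\right| \le \frac{1}{N}\sum_{i=1}^{N}\left|\frac{T_i}{T}-\rho_i\right|\frac{1}{T_i}\sum_{t=1}^{T_i} E\left|\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right|$$

$$\le \frac{1}{N}\sum_{i=1}^{N}\left|\frac{T_i}{T}-\rho_i\right|\frac{1}{T_i}\sum_{t=1}^{T_i}\Delta \le \Delta \sup_i\left|\frac{T_i}{T}-\rho_i\right| \to 0$$

where the first inequality follows from the triangle inequality, the second from Assumptions 2 and 4, and the convergence to zero from Assumption 1. Thus,

$$\frac{1}{N}\sum_{i=1}^{N}\left(\frac{T_i}{T}-\rho_i\right)\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right) = o_p(1).$$

Abusing notation to define $Z_{iT} = \rho_i\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right)$, the conclusion follows from Lemmas 1 and 3 in Hansen (2007).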

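Lemma 2. Let γ = (θ, α(·)) and ‖γ‖ be as in Assumption 3. Let $\Gamma = \Theta \times (\times_{i=1}^{\infty} A)$. Under Assumptions 1, 2, and 4, $\left|\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\tilde\theta, \tilde\alpha(i)) - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\theta, \alpha(i))\right| \le B_{NT}\,\|\tilde\gamma - \gamma\|$ for $\tilde\gamma, \gamma \in \Gamma$ and $B_{NT} = O_p(1)$.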

Proof. ∣∣∣∣∣ 1NT

N∑i=1

Ti∑t=1

ϕit(θ, α(i)) − 1NT

N∑i=1

Ti∑t=1

ϕit(θ, α(i))

∣∣∣∣∣≤ 1NT

N∑i=1

Ti∑t=1

M(wit)‖(θ, α(i))− (θ, αi)‖ind

≤ 1NT

N∑i=1

Ti∑t=1

M(wit)

k∑j=1

|θj |+ supi

p∑j=1

|αj(i)|

N∑i=1

Ti∑t=1

M(wit)

)‖γ − γ‖

where the first inequality follows from the triangle inequality and Lipschitz condition in Assumption 4.Defining BNT =

∑Ni=1

∑Tit=1M(wit)

), Assumption 2 and 4 can be used to show BNT = Op(1) using

standard arguments.

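Lemma 3. Let $\Gamma_G = \{\gamma \in \Gamma : |\alpha(i) - \alpha(j)| = 0,\ \forall g,\ \forall i, j \in I_g\}$ for $I_g$ defined in Assumption 5. For $\gamma = (\theta^*, \alpha^*(\cdot)) \in \Gamma$ and $\gamma_G = (\theta^*, \alpha_G(\cdot)) \in \Gamma_G$ where $\alpha_G(i) = \frac{1}{N_g}\sum_{j\in I_g}\alpha^*(j)$, we have $\|\gamma - \gamma_G\| \to 0$ under Assumption 5.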

Proof. Under Assumption 5, we have(

supgNg−1Ng

)ξNT → 0. The conclusion then follows from

‖γ − γG‖ = supi

p∑j=1

|α∗(i)− αG(i)|

= supi

p∑j=1

|α∗j (i)−1Ng

∑s∈Ig

α∗j (s)|

= supi

p∑j=1

∑s 6=i∈Ig

(α∗j (i)− α∗j (s))|

≤ supi

p∑j=1

∑s6=i∈Ih

|α∗j (i)− α∗j (s)|

= supi

∑s6=i∈Ih

p∑j=1

|α∗j (i)− α∗j (s)|

≤ supi

∑s6=i∈Ih

Ng − 1Ng

)ξNT → 0.
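The key inequality behind Lemma 3 can be checked numerically in the scalar case (p = 1): replacing each individual effect by its group average moves it by at most (N_g − 1)/N_g times the within-group range. The sketch below uses simulated effects and a hypothetical helper `group_mean_gap`; it illustrates the bound and is not part of the proof.

```python
import numpy as np

def group_mean_gap(alpha, group):
    """Compare sup_i |alpha(i) - group mean| with sup_g (N_g - 1)/N_g * range_g."""
    alpha = np.asarray(alpha, dtype=float)
    group = np.asarray(group)
    gap, bound = 0.0, 0.0
    for g in np.unique(group):
        a = alpha[group == g]
        n_g = a.size
        gap = max(gap, float(np.abs(a - a.mean()).max()))
        bound = max(bound, (n_g - 1) / n_g * float(a.max() - a.min()))
    return gap, bound

rng = np.random.default_rng(1)
group = rng.integers(0, 20, size=500)           # 20 hypothetical groups
# Effects that vary mostly between groups, with small within-group noise.
alpha = group / 10.0 + 0.05 * rng.normal(size=500)
gap, bound = group_mean_gap(alpha, group)
print(gap, bound)
```

As the within-group noise shrinks, both numbers go to zero, which is the sense in which the grouped parameter space approximates the true effects under Assumption 5.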

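Proposition 1. Let $\hat\gamma_G = (\hat\theta, \hat\alpha_G(\cdot)) = \arg\max_{(\theta,\alpha(\cdot))\in\Gamma_G} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\theta, \alpha(i))$ for

$$\Gamma_G = \{\gamma \in \Gamma : |\alpha_G(i) - \alpha_G(j)| = 0\ \ \forall\, i, j \in I_g\}$$

for $I_g$ defined in Assumption 5. If Assumptions 1, 2, 3, 4, and 5 are satisfied, then $\hat\gamma_G \xrightarrow{p} \gamma_0 = (\theta_0, \alpha_0(\cdot))$.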

Proof. We prove the result by verifying the conditions of Newey and Powell (2003) Lemma A1. We note that $\Gamma = \Theta \times (\times_{i=1}^{\infty} A)$ is compact for the norm ‖·‖ defined in Assumption 3. The conditions of Newey and Powell (2003) Lemma A2 are therefore satisfied using Lemmas 1 and 2. Lemma A2 of Newey and Powell (2003) implies conditions (i) and (ii) of Newey and Powell (2003) Lemma A1, and Lemma 3 above implies condition (iii). Thus, the conditions of Newey and Powell (2003) Lemma A1 are satisfied, and the conclusion follows.

7.2. Asymptotic Normality

In the following, we let αg(θ) = arg maxα∈A 1NgTg

∑i∈Ig

∑Tit=1 ϕit(θ, α) and use αg = αg(θ0) ∈ A. We

similarly use θ = arg maxθ∈Θ1NT

∑GNTg=1

∑i∈Ig

∑Tit=1 ϕit(θ, αg(θ)) and note that the estimators (θ, α(θ))

obtained in this way are numerically identical to the solution to

(θ, α1, ...αG) = arg maxθ∈Θ,α1∈A,...,αG∈A

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕit(θ, αg).

Throughout the following we use superscripts to denote partial differentiation; e.g. ϕαit = ∂ϕit∂α and

ϕθαit = ∂2ϕit∂θ∂α′ . We also let αg = 1

∑i∈Ig ωiα0(i) for weight

∑i∈Ig

Ti∑t=1

Ejt [ϕααit (θ0, α0(j))]

−1(Ti∑t=1

for some j ∈ Ig.

7.2.1. Expansion for αg

By definition, αg = arg maxα∈A 1NgTg

∑i∈Ig

∑Tit=1 ϕit(θ, α) which implies that

∑i∈Ig

Ti∑t=1

ϕαit(θ0, αg).(7.1)

Expanding (7.1) about αg = αg yields

∑i∈Ig

Ti∑t=1

ϕαit(θ0, αg) +1

∑i∈Ig

Ti∑t=1

ϕααit (θ0, αg)(αg − αg)(7.2)

where αg are intermediate values satisfying |αg − αg| ≤ |αg − αg|. Further expanding (7.2), we obtain

∑i∈Ig

Ti∑t=1

ϕαit(θ0, α0(i)) +1

∑i∈Ig

Ti∑t=1

ϕααit (θ0, αg(i))(αg − α0(i))(7.3)

∑i∈Ig

Ti∑t=1

ϕααit (θ0, αg)(αg − αg)

where αg(i) are intermediate values satisfying |αg(i)− α0(i)| ≤ |αg − α0(i)|.

It follows from (7.3) with some addition and subtraction that

− 1NgTg

∑i∈Ig

Ti∑t=1

ϕααit (θ0, αg)(αg − αg)

∑i∈Ig

Ti∑t=1

(ϕαit(θ0, α0(i))− Eit [ϕαit(θ0, α0(i))])(7.4)

∑i∈Ig

Ti∑t=1

Eit [ϕαit(θ0, α0(i))](7.5)

∑i∈Ig

Ti∑t=1

(ϕααit (θ0, αg(i))− Eit [ϕααit (θ0, αg(i))]) (αg − α0(i))(7.6)

∑i∈Ig

Ti∑t=1

(Eit [ϕααit (θ0, αg(i))]− Eit [ϕααit (θ0, αg(j))]) (αg − α0(i))(7.7)

∑i∈Ig

Ti∑t=1

(Eit [ϕααit (θ0, αg(j))]− Ejt [ϕααit (θ0, αg(j))]) (αg − α0(i))(7.8)

∑i∈Ig

Ti∑t=1

(Ejt [ϕααit (θ0, αg(j))]− Ejt [ϕααit (θ0, α0(j))]) (αg − α0(i))(7.9)

∑i∈Ig

Ti∑t=1

(Ejt [ϕααit (θ0, α0(j))]) (αg − α0(i))(7.10)

where j ∈ Ig. Also, note that

∑i∈Ig

Ti∑t=1

(Ejt [ϕααit (θ0, α0(j))]) (αg − α0(i))

∑i∈Ig

Ti∑t=1

− 1NgTg

∑i∈Ig

(Ti∑t=1

(Ejt [ϕααit (θ0, α0(j))])

)α0(i)

where the last equality follows from substituting in the definition of αg. Letting the expressions given in displays (7.4)-(7.9) be denoted as ψg1-ψg6 respectively, we can then write

αg − αg = −[Hααg

(6∑j=1

ψgj)(7.11)

Hααg =

∑i∈Ig

Ti∑t=1

ϕααit (θ0, αg).(7.12)

7.2.2. Expansion for θ

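We start by noting that we can totally differentiate the identity $0 = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha}_{it}(\theta, \hat\alpha_g(\theta))$ to obtain

$$\frac{\partial\hat\alpha_g(\theta)}{\partial\theta} = -H^{\alpha\alpha}_g(\theta, \hat\alpha_g(\theta))^{-1}\,H^{\alpha\theta}_g(\theta, \hat\alpha_g(\theta)) \tag{7.13}$$

where

$$H^{\alpha\alpha}_g(\theta, \alpha_g) = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\theta, \alpha_g), \tag{7.14}$$

$$H^{\alpha\theta}_g(\theta, \alpha_g) = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\theta}_{it}(\theta, \alpha_g). \tag{7.15}$$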
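For readers who want the intermediate step, the implicit-function argument behind (7.13) can be written out as follows. This is a sketch under the assumptions that the group-level first-order condition holds as an identity in θ, that the group-level Hessian in α is invertible at the point of evaluation, and that the 1/(N_g T_g) normalization is used for both Hessian blocks (any common normalization cancels in the final expression).

```latex
% First-order condition defining the concentrated group effect, for every theta:
0 = \frac{1}{N_g T_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}
    \varphi^{\alpha}_{it}\bigl(\theta,\hat\alpha_g(\theta)\bigr).
% Differentiate both sides with respect to theta using the chain rule:
0 = \underbrace{\frac{1}{N_g T_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}
      \varphi^{\alpha\theta}_{it}\bigl(\theta,\hat\alpha_g(\theta)\bigr)}_{H^{\alpha\theta}_g}
  + \underbrace{\frac{1}{N_g T_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}
      \varphi^{\alpha\alpha}_{it}\bigl(\theta,\hat\alpha_g(\theta)\bigr)}_{H^{\alpha\alpha}_g}
    \,\frac{\partial\hat\alpha_g(\theta)}{\partial\theta}.
% Solve for the derivative of the concentrated effect:
\frac{\partial\hat\alpha_g(\theta)}{\partial\theta}
  = -\bigl(H^{\alpha\alpha}_g\bigr)^{-1} H^{\alpha\theta}_g .
```

Here the derivative is read as the p × k Jacobian of the group effect with respect to the common parameter.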

From the definition of θ and the assumed differentiability, we also have

0 =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθit(θ, αg(θ)) +1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕαit(θ, αg(θ))

∂αg(θ)∂θ

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθit(θ, αg(θ))(7.16)

since the definition of αg(θ) implies∑i∈Ig

∑Tit=1 ϕ

αit(θ, αg(θ)) = 0.

Expanding (7.16) about θ = θ0 with θ an intermediate value yields

0 =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθit(θ0, αg(θ0))

GNT∑g=1

∑i∈Ig

Ti∑t=1

[ϕθθit (θ, αg(θ))− ϕθαit (θ, αg(θ))Hαα

g (θ, αg(θ))−1Hαθg (θ, αg(θ))

] (θ − θ0)

from which we obtain

θ − θ0 = −J−1 1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθit(θ0, αg(θ0))(7.17)

J =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

[ϕθθit (θ, αg(θ))− ϕθαit (θ, αg(θ))Hαα

g (θ, αg(θ))−1Hαθg (θ, αg(θ))

].(7.18)

We now expand the term 1NT

∑GNTg=1

∑i∈Ig

∑Tit=1 ϕ

θit(θ0, αg(θ0)) in (7.17) about αg(θ0) = αg similarly

to the expansion in Section 7.2.1 above to obtain

θ − θ0 = −J−1

7∑j=1

Bj −1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθαit (θ0, αg)(Hααg )−1(

6∑j=1

(7.19)

B1 =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

(ϕθit(θ0, α0(i))− Eit[ϕθit(θ0, α0(i))]

)(7.20)

B2 =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

Eit[ϕθit(θ0, α0(i))],(7.21)

B3 =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

(ϕθαit (θ0, αg(i))− Eit[ϕθαit (θ0, αg(i))]

)(αg − α0(i)),(7.22)

B4 =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

(Eit[ϕθαit (θ0, αg(i))]− Eit[ϕθαit (θ0, αg(j))]

)(αg − α0(i)),(7.23)

B5 =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

(Eit[ϕθαit (θ0, αg(j))]− Ejt[ϕθαit (θ0, αg(j))]

)(αg − α0(i)),(7.24)

B6 =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

(Ejt[ϕθαit (θ0, αg(j))]− Ejt[ϕθαit (θ0, α0(j))]

)(αg − α0(i)),(7.25)

B7 =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

Ejt[ϕθαit (θ0, α0(j))](αg − α0(i)),(7.26)

and αg(·) is a sequence intermediate values satisfying ‖αg(·)−α0(·)‖ ≤ ‖αg−α0(·)‖. Finally, we have B7 = 0since

∑i∈Ig

∑Tit=1 Ejt[ϕθαit (θ0, α0(j))](αg−α0(i)) = 0 for each g = 1, ..., GNT as was demonstrated in Section

7.2.1.

7.2.3. Preliminary Lemmas

Lemma 4. Under Assumptions 1-5 and 7, supg supi∈Ig ‖αg − α0(i)‖ ≤ CNT

Ng−1Ng

)ξNT for some

CNT = O(1).

Proof. We have αg − α0(i) = 1Ng

∑j∈Ig ωjα0(j) − α0(i) = 1

∑j∈Ig ωjα0(j) − 1

∑j∈Ig ωjα0(i) =

∑j∈Ig:j 6=i ωj(α0(j)− α0(i)). Thus,

supi∈Ig‖αg − α0(i)‖ ≤ sup

gsupi∈Ig

∑j∈Ig:j 6=i

‖ωj‖‖α0(i)− α0(j)‖

≤ supg

∑j∈Ig :j 6=i

supi∈Ig‖ωi‖ sup

i,j∈Ig‖α0(i)− α0(j)‖

≤ supg

Ng − 1Ng

supi,j∈Ig

‖α0(i)− α0(j)‖ supg

supi∈Ig‖ωi‖

≤ C(p)ξNT supg

Ng − 1Ng

supi∈Ig‖ωi‖

where C(p) depends only on the dimension of α and the norm.

It remains to be shown that supg supi∈Ig ‖ωi‖ ≤ CNT where CNT = O(1).

‖ωi‖ ≤ ‖

∑i∈Ig

Ti∑t=1

Ejt[ϕααit (θ0, α0(j))]

‖‖Ti∑t=1

Ejt[ϕααit (θ0, α0(j))]‖

≤ TiTg

∆‖

∑i∈Ig

Ti∑t=1

Ejt[ϕααit (θ0, α0(j))]

≤ TiTgC∆ = CNT

for some C <∞ where the second inequality follows from Assumption 4 and the last from Assumption 7. Itfollows from Assumption 1 that supg supi∈Ig

= supg supi∈IgTi/TTg/T

= O(1). The conclusion follows.

Lemma 5. Under Assumptions 1-5 and 8, 1NT

∑GNTg=1

∑i∈Ig

∑Tit=1 ϕ

θθit (θ, αg(θ))

p−→ Hθθ0 .

Proof. Letting Γ = Θ× (×∞i=1A) as in Lemma 2 and (θ, α(·)) = γ ∈ Γ, we have

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθθit (θ, α(i))p−→ Hθθ(θ, α(·))

under Assumptions 1, 2, 4, and 8 by argument similar to those in Lemma 1. By arguments similar to thoseused to demonstrate Lemma 2, we also have that∣∣∣∣∣∣ 1

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθθit (θ, α(i)) − 1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθθit (θ∗, α∗(i))

∣∣∣∣∣∣≤

GNT∑g=1

∑i∈Ig

Ti∑t=1

M(wit)

‖γ − γ∗‖with 1

∑GNTg=1

∑i∈Ig

∑Tit=1M(wit) = Op(1) under the conditions of the Lemma. It follows that

supγ∈Γ

∣∣∣∣∣∣ 1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθθit (θ, α(i))−Hθθ(θ, α(·))

∣∣∣∣∣∣ p−→ 0

by Newey and Powell (2003) Lemma A2.

Thus, defining αθg(·) : N→ A such that αθg(i) = arg minα∈A∑i∈Ig

∑Tit=1 ϕit(θ, α), we have∣∣∣∣∣∣ 1

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθθit (θ, αg(θ))−Hθθ0

∣∣∣∣∣=

∣∣∣∣∣∣ 1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθθit (θ, αg(θ))−Hθθ(θ, αθg(·)) +Hθθ(θ, αθg(·))−Hθθ0

∣∣∣∣∣∣≤ sup

γ∈Γ

∣∣∣∣∣∣ 1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθθit (θ, αg(θ))−Hθθ(θ, αθg(·))

∣∣∣∣∣∣+∣∣∣Hθθ(θ, αθg(·))−Hθθ

∣∣∣p−→ 0

where the convergence in probability follows from the argument above and Proposition 1.

Lemma 6. Suppose $\sup_g |v_{gT} - v_{g0}| \xrightarrow{p} 0$ and $\inf_g v_{g0} \ge \delta > 0$. Then, $\sup_g |v_{gT}^{-1} - v_{g0}^{-1}| \xrightarrow{p} 0$.

Proof. For given η∗ > 0 and ε∗ > 0,

Pr(supg|v−1gT − v

−1g0 | > η∗) = Pr(sup

g|v−1gT − v

−1g0 | > η∗| sup

g|v−1gT − v

−1g0 | ≤ δ − ξ)

× Pr(supg|v−1gT − v

−1g0 | ≤ δ − ξ)

+ Pr(supg|v−1gT − v

−1g0 | > η∗| sup

g|v−1gT − v

−1g0 | > δ − ξ)

−1g0 | > δ − ξ) for some 0 < ξ < δ

≤ Pr(supg|v−1gT − v

−1g0 | > η∗| sup

g|v−1gT − v

−1g0 | ≤ δ − ξ)

−1g0 | ≤ δ − ξ) + (1)(ε∗/2) for N , T large enough

= Pr(supg|v−2gT (vgT − vg0)| > η∗| sup

g|v−1gT − v

−1g0 | ≤ δ − ξ)

−1g0 | ≤ δ − ξ) + ε∗/2 for vgT an intermediate value

≤ Pr(supg|vgT − vg0| > ξ2η∗| sup

g|v−1gT − v

−1g0 | ≤ δ − ξ)

−1g0 | ≤ δ − ξ) + ε∗/2

= Pr(ξ2η∗ < supg|vgT − vg0| < δ − ξ) + ε∗/2

≤ Pr(supg|vgT − vg0| > ξ2η∗) + ε∗/2

≤ ε∗/2 + ε∗/2 = ε∗ for N , T large enough.

Lemma 7. Let $Y_{iT} = \sum_{t=1}^{T_i} y_{it}$ have $E[Y_{iT}] = 0$ and suppose the $y_{it}$ satisfy $\sup_{i,t} E_{it}\|y_{it}\|^{2c+\varepsilon} \le \Delta < \infty$ for $c \in \mathbb{N}$ and are α-mixing with mixing coefficients of size $(1-k)r/(r-k)$ uniformly in i where $k \in 2\mathbb{N}$, $k \ge 2c$, $r > k$. Under the sequence given in Assumption 1, $E\|\sum_{i=1}^{N} Y_{iT}\|^{\tau} = O((NT)^{\tau/2})$ for $\tau \le 2c$.

Proof. $E\|\sum_{i=1}^{N} Y_{iT}\|^{\tau} \le \left(E\|\sum_{i=1}^{N} Y_{iT}\|^{2c}\right)^{\tau/2c} \le \left(\Delta N^{c-1}\sum_{i=1}^{N} E\|Y_{iT}\|^{2c}\right)^{\tau/2c} \le \left(\Delta N^{c}(\sup_i T_i)^{c}\right)^{\tau/2c} = \left(\Delta N^{c}T^{c}(\sup_i T_i/T)^{c}\right)^{\tau/2c} = O((NT)^{\tau/2})\,O(1)$ where the second inequality follows from the Marcinkiewicz-Zygmund inequality and the third inequality from straightforward modifications of standard arguments such as those in Doukhan (1994) or Kim (1994).

Lemma 8. max1≤g≤GNT ‖ 1NgTg

∑i∈Ig

∑Tit=1(ϕααit (θ, αg(θ))−Eit[ϕααit (θ0, α0(i))])‖ p−→ 0 under Assumptions

1-5 and 9.

Proof. Consider

max1≤g≤GNT

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

Eit[ϕααit (θ, αg(θ))− ϕααit (θ0, α0(i))]

∥∥∥∥∥∥≤ max

1≤g≤GNT

∑i∈Ig

Ti∑t=1

∥∥∥Eit[ϕααit (θ, αg(θ))− ϕααit (θ0, α0(i))]∥∥∥

∑i∈Ig

Ti∑t=1

supi,t

Eit[M(wit)]

(‖γ − γ0‖)

≤ ∆‖γ − γ0‖ ≤ ∆‖γ − γ0‖p−→ 0

using Proposition 1 and the definition of θ as an intermediate value.

max1≤g≤GNT

sup(θ,α)∈Θ×A

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

(ϕααit (θ, α)− Eit[ϕααit (θ, α)])

∥∥∥∥∥∥ > η

≤GNT∑g=1

sup(θ,α)∈Θ×A

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥ > η

.Let ε > 0 be such that 2ε supi,t Eit[M(wit)] < η/3. Divide Γg = Θ × A into subsets Γ1, ...,Γm(ε) such that‖(θ, α) − (θ∗, α∗)‖ < ε whenever (θ, α) and (θ∗, α∗) are in the same subset. Let (θj , αj) denote some pointin Γj for each j. Then

sup(θ,α)∈Θ×A

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥= max

(θ,α)∈Γj

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥which implies

sup(θ,α)∈Θ×A

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥ > η

≤m(ε)∑j=1

sup(θ,α)∈Γj

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥ > η

.For (θ, α) ∈ Γj ,∥∥∥∥∥∥ 1

∑i∈Ig

Ti∑t=1

∥∥∥∥∥=

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

(ϕααit (θj , αj)− Eit[ϕααit (θj , αj)])

∑i∈Ig

Ti∑t=1

(ϕααit (θ, α)− ϕααit (θj , αj))

∑i∈Ig

Ti∑t=1

(Eit[ϕααit (θj , αj)]− Eit[ϕααit (θ, α)])

∥∥∥∥∥∥≤

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥+

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

(ϕααit (θ, α)− ϕααit (θj , αj))

∥∥∥∥∥∥+

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

(Eit[ϕααit (θj , αj)]− Eit[ϕααit (θ, α)])

∥∥∥∥∥∥39

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥+

∑i∈Ig

Ti∑t=1

‖ϕααit (θ, α)− ϕααit (θj , αj)‖

∑i∈Ig

Ti∑t=1

‖Eit[ϕααit (θj , αj)]− Eit[ϕααit (θ, α)]‖

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥+

∑i∈Ig

Ti∑t=1

M(wit) ‖(θ, α)− (θj , αj)‖

∑i∈Ig

Ti∑t=1

Eit[M(wit)] ‖(θj , αj)− (θ, α)‖

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥+

∑i∈Ig

Ti∑t=1

(M(wit)− Eit[M(wit)]) ‖(θ, α)− (θj , αj)‖

∑i∈Ig

Ti∑t=1

Eit[M(wit)] ‖(θj , αj)− (θ, α)‖

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥+ ε

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

(M(wit)− Eit[M(wit)])

∥∥∥∥∥∥+η

sup(θ,α)∈Γj

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥ > η

≤ Pr

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥+ ε

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥ > 2η3

≤ Pr

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥ > η

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥ > 2η3ε

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥τ

)τ+ Pr

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥τ

(2η3ε

)τ= O((NgT )−τ/2)

by the Markov inequality and standard results for mixing sequences as in Doukhan (1994) or Kim (1994) as demonstrated in Lemma 7.

It follows that Pr[max1≤g≤GNT sup(θ,α)∈Θ×A

∥∥∥ 1NgTg

∑i∈Ig

∑Tit=1 (ϕααit (θ, α)− Eit[ϕααit (θ, α)])

∥∥∥ > η]≤

∆∑GNTg=1 (NgT )−τ/2 ≤ ∆GNT (infg NgT )−τ/2 → 0 under Assumption 9. Thus,

max1≤g≤GNT

sup(θ,α)∈Θ×A

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥ p−→ 0.

Finally, we have

max1≤g≤GNT

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

(ϕααit (θ, αg(θ))− Eit[ϕααit (θ0, α0(i))])

∥∥∥∥∥∥≤ max

1≤g≤GNT

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

(ϕααit (θ, αg(θ))− Eit[ϕααit (θ, αg(θ))])

∥∥∥∥∥∥+ max

1≤g≤GNT

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

(Eit[ϕααit (θ, αg(θ))]− Eit[ϕααit (θ0, α0(i))])

∥∥∥∥∥∥≤ max

1≤g≤GNTsup

(θ,α)∈Θ×A

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

∥∥∥∥∥∥+ max

1≤g≤GNT

∥∥∥∥∥∥ 1NgTg

∑i∈Ig

Ti∑t=1

(Eit[ϕααit (θ, αg(θ))]− Eit[ϕααit (θ0, α0(i))])

∥∥∥∥∥∥p−→ 0

by the previous arguments.

Lemma 9. Under Assumptions 1-5, 7, and 9,

max1≤g≤GNT

∑i∈Ig

Ti∑t=1

ϕααit (θ, αg(θ))

∑i∈Ig

Ti∑t=1

Eit[ϕααit (θ0, α0(i))]

‖ p−→ 0.

Proof. The result is immediate given convergence in Lemma 8 and Assumption 7. See Lemma 6.

Lemma 10. max1≤g≤GNT ‖ 1NgTg

∑i∈Ig

∑Tit=1(ϕθαit (θ, αg(θ))−Eit[ϕθαit (θ0, α0(i))])‖ p−→ 0 under Assumptions

1-5 and 9.

Proof. The proof proceeds similarly to the proof of Lemma 8 and is omitted.

Lemma 11. Under Assumptions 1-5, 7, and 9,∥∥∥∥ 1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθαit (θ, α(θ))

∑i∈Ig

Ti∑t=1

ϕααit (θ, α(θ))

−1 1NgTg

∑i∈Ig

Ti∑t=1

ϕαθit (θ, α(θ))

∑i∈Ig

Ti∑t=1

Eit[ϕθαit (θ0, α0(i))]

∑i∈Ig

Ti∑t=1

Eit[ϕααit (θ0, α0(i))]

−1 1NgTg

∑i∈Ig

Ti∑t=1

Eit[ϕαθit (θ0, α0(i))]

∥∥∥∥∥∥∥

p−→ 0.

Proof. For notational simplicity, we write ϕθαit = ϕθαit (θ, α(θ)), ϕααit = ϕααit (θ, α(θ)), ϕθαit = ϕθαit (θ0, α0(i)),and ϕααit = ϕααit (θ0, α0(i)). Also define Hθα

NT = 1NgTg

∑i∈Ig

∑Tit=1 ϕ

θαit , Hαα

NT = 1NgTg

∑i∈Ig

∑Tit=1 ϕ

ααit ,

HθαNT = 1

∑i∈Ig

∑Tit=1 Eit[ϕθαit ], and Hαα

NT = 1NgTg

∑i∈Ig

∑Tit=1 Eit[ϕααit ]. We have that∥∥∥∥ 1

GNT∑g=1

[(NgTg)Hθα

(HααNT

HαθNT − (NgTg)Hθα (Hαα

NT )−1HαθNT ]

∥∥∥∥=

∥∥∥∥∥ 1NT

GNT∑g=1

[(NgTg)Hθα

((HααNT

− (HααNT )−1

)HαθNT

GNT∑g=1

[(NgTg)Hθα

NT (HααNT )−1

(HαθNT −Hαθ

GNT∑g=1

[(NgTg)

(HθαNT −Hθα

)(Hαα

NT )−1HαθNT

]∥∥∥∥∥42

≤ max1≤g≤GNT NgTgNgT

max1≤g≤GNT

∥∥∥∥(HααNT

− (HααNT )−1

∥∥∥∥ 1GNT

GNT∑g=1

∑i∈Ig

Ti∑t=1

M(wit)

max1≤g≤GNT

∥∥∥HαθNT −Hαθ

∥∥∥ 1GNT

GNT∑g=1

∑i∈Ig

Ti∑t=1

M(wit)

+∆δ

max1≤g≤GNT

∥∥∥HαθNT −Hαθ

∥∥∥]

=max1≤g≤GNT NgTg

op(1)1

GNT∑g=1

∑i∈Ig

Ti∑t=1

M(wit)

+op(1)1

GNT∑g=1

∑i∈Ig

Ti∑t=1

M(wit) + op(1)

using Lemmas 9 and 10. Under Assumptions 1 and 5, we have max1≤g≤GNT NgTg

NgT= O(1). We also

have E∥∥∥ 1GNT

∑GNTg=1

∑i∈Ig

∑Tit=1M(wit)

∥∥∥ ≤ ∆ and E∥∥∥∥ 1GNT

∑GNTg=1

∑i∈Ig

∑Tit=1M(wit)

)2∥∥∥∥ ≤

1(NgT )2

∑i∈Ig

∑Tit=1

∑j∈Ig

∑Tis=1(E[M(wit)2]E[M(wjs)2])1/2 ≤ ∆ under Assumption 4 which gives

∑GNTg=1

∑i∈Ig

∑Tit=1M(wit)

= Op(1) and 1GNT

∑GNTg=1

∑i∈Ig

∑Tit=1M(wit) = Op(1). The

conclusion then follows.

Lemma 12. If Assumptions 1-5 and 7-9 are satisfied, $\hat J \xrightarrow{p} J$ for $\hat J$ in equation (7.18) and J defined in Assumption 8, and $\hat J^{-1} \xrightarrow{p} J^{-1}$.

Proof. The first result is immediate from Lemmas 5 and 11, and the second follows immediately from the continuous mapping theorem under the eigenvalue condition in Assumption 8.

Lemma 13. Under Assumptions 1-4, 7-8, and 10, $\sqrt{NT}\left(B_1 - \frac{1}{NT}\sum_{g=1}^{G_{NT}} (N_g T_g)\,\kappa_g \psi_{g1}\right) \xrightarrow{d} N(0,\Omega)$ for $\kappa_g$ and $\Omega$ defined in Assumption 8.

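Proof. Let $U_{it} = u^{\theta}_{it} - \kappa_g u^{\alpha}_{it}$. Note that $\sup_i E\left\| \frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} U_{it} \right\|^2 \le C \sup_{i,t,k} E[U_{it,k}^2]$ for some $C < \infty$, where $U_{it,k}$ is the $k$th element of the vector $U_{it}$, follows from standard results for mixing sequences; see, e.g., Doukhan (1994) or Kim (1994). The bound $\sup_{i,t,k} E[U_{it,k}^2] \le \sup_{i,t} E[M(w_{it})^2] + 2 \sup_{i,t} |\kappa_g| E[M(w_{it})^2] + \sup_{i,t} \kappa_g^2 E[M(w_{it})^2] \le \Delta(1 + 2 \sup_g |\kappa_g| + \sup_g \kappa_g^2)$ follows under Assumptions 2 and 4. The bounds $\sup_g |\kappa_g| \le \Delta/\delta$ and $\sup_g \kappa_g^2 \le \Delta^2/\delta^2$ also follow under Assumptions 4 and 7. It thus follows that $\sup_i E\left\| \frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} U_{it} \right\|^2 \le C$ for some constant $C < \infty$.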

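Consider

$$
\begin{aligned}
E\left\| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T_i} U_{it} - \sqrt{\rho_i}\, \frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} U_{it} \right) \right\|^2
&= E\left\| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( \frac{\sqrt{T_i}}{\sqrt{T}} - \sqrt{\rho_i} \right) \frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} U_{it} \right\|^2 \\
&= \frac{1}{N} \sum_{i=1}^{N} \left( \frac{\sqrt{T_i}}{\sqrt{T}} - \sqrt{\rho_i} \right)^2 E\left\| \frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} U_{it} \right\|^2 \\
&\le C \sup_i \left( \sqrt{\frac{T_i}{T}} - \sqrt{\rho_i} \right)^2 ,
\end{aligned}
$$

where the second equality follows from independence across $i$ and the inequality from the previous argument.

Assumption 1 also gives that $\sup_i \left( \sqrt{T_i/T} - \sqrt{\rho_i} \right)^2 \to 0$, so

$$
\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T_i} U_{it} - \sqrt{\rho_i}\, \frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} U_{it} \right) = o_p(1).
$$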

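Abusing notation and defining $Y_{iT} = \sqrt{\rho_i}\, \frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} U_{it} = \frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} S^{*}_{it}$, we have $\frac{1}{\sqrt{N}} \sum_{i=1}^{N} Y_{iT} \xrightarrow{d} N(0,\Omega)$ as in Hansen (2007) Lemma 2. The conclusion then follows by noting that

$$
\begin{aligned}
\sqrt{NT}\left( B_1 - \frac{1}{NT} \sum_{g=1}^{G_{NT}} (N_g T_g)\,\kappa_g \psi_{g1} \right)
&= \frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{T_i} U_{it} \\
&= \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T_i} U_{it} - \sqrt{\rho_i}\, \frac{1}{\sqrt{T_i}} \sum_{t=1}^{T_i} U_{it} \right) + \frac{1}{\sqrt{N}} \sum_{i=1}^{N} Y_{iT}.
\end{aligned}
$$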

Lemma 14. Under Assumptions 1-5 and 7, $B_3 = O_p\left( \dfrac{\xi_{NT} \sup_g \frac{N_g - 1}{N_g}}{\sqrt{NT}} \right)$.

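Proof. Recall that $B_3 = \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \left( \phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(i)) - E_{it}[\phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(i))] \right)(\alpha_g - \alpha_0(i))$, and to conserve notation, let $z_{it} = \phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(i)) - E_{it}[\phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(i))]$. Note that $E[z_{it}] = 0$ and that $E\left[ \left\| \sum_{t=1}^{T_i} z_{it} \right\|^2 \right] \le T_i \Delta$ under Assumptions 2 and 4, so we have

$$
\begin{aligned}
E[\|B_3\|^2]
&= \frac{1}{(NT)^2}\, E\left[ \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \sum_{h=1}^{G_{NT}} \sum_{j \in I_h} \sum_{s=1}^{T_j} (\alpha_g - \alpha_0(i))'\, z_{it}'\, z_{js}\, (\alpha_h - \alpha_0(j)) \right] \\
&= \frac{1}{(NT)^2} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} (\alpha_g - \alpha_0(i))'\, E\left[ \left( \sum_{t=1}^{T_i} z_{it} \right)' \left( \sum_{t=1}^{T_i} z_{it} \right) \right] (\alpha_g - \alpha_0(i)) \\
&\le \frac{1}{(NT)^2} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \|\alpha_g - \alpha_0(i)\|^2\, E\left\| \sum_{t=1}^{T_i} z_{it} \right\|^2
\end{aligned}
$$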

≤ 1(NT )2

GNT∑g=1

∑i∈Ig

Ng − 1Ng

C2NTTiξ

supgNg−1Ng

C2NT ξ

where the last inequality follows from Lemma 4. The conclusion is then immediate.

Lemma 15. Under Assumptions 1-5 and 7, $B_4 = O\left( \left( \sup_g \frac{N_g - 1}{N_g} + \left( \sup_g \frac{N_g - 1}{N_g} \right)^2 \right) \xi_{NT}^2 \right)$.

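Proof. Recall that

$$
\begin{aligned}
B_4 &= \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \left( E_{it}[\phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(i))] - E_{it}[\phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(j))] \right)(\alpha_g - \alpha_0(i)) \\
&= \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \left( E_{it}[\phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(i)) - \phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(j))] \right)(\alpha_g - \alpha_0(i)).
\end{aligned}
$$

Then,

$$
\begin{aligned}
\|B_4\|
&\le \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \left( E_{it}\|\phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(i)) - \phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(j))\| \right) \|\alpha_g - \alpha_0(i)\| \\
&\le \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} E_{it}[M(w_{it})]\, \|\alpha_g(i) - \alpha_g(j)\|\, \|\alpha_g - \alpha_0(i)\| \\
&= \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} E_{it}[M(w_{it})]\, \|(\alpha_g(i) - \alpha_0(i)) - (\alpha_g(j) - \alpha_0(j)) - (\alpha_0(j) - \alpha_0(i))\|\, \|\alpha_g - \alpha_0(i)\|
\end{aligned}
$$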

≤ ∆CNT

Ng − 1Ng

)ξNT + C(p)ξNT

)
where the second inequality follows under Assumption 4 and the last inequality follows from Lemma 4 using the triangle inequality and the definition of αg(i).

Lemma 16. Under Assumptions 1-7, $B_5 = O\left( \left( \sup_g \frac{N_g - 1}{N_g} \right) \xi_{NT}^2 \right)$.

Proof. Note that under Assumption 6

‖Eit[ϕθαit (θ0, αg(j))]− Ejt[ϕθαit (θ0, αg(j))]‖ = ‖∫W

ϕθαit (θ0, αg(j))(dFit − dFjt)‖

≤∫W

‖ϕθαit (θ0, αg(j))‖‖(dFit − dFjt)‖

≤∫W

M(w)C(w)‖α0(i)− α0(j)‖

≤(∫

M(w)2dFit

)‖α0(i)− α0(j)‖

≤ ∆‖α0(i)− α0(j)‖.

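It then follows that $\|B_5\| \le \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \Delta\, \|\alpha_0(i) - \alpha_0(j)\|\, \|\alpha_g - \alpha_0(i)\| \le \Delta \left( \sup_g \frac{N_g - 1}{N_g} \right) \xi_{NT}^2$.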

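Lemma 17. Under Assumptions 1-5 and 7, $B_6 = O\left( \left( \sup_g \frac{N_g - 1}{N_g} \right)^2 \xi_{NT}^2 \right)$.

Proof. $B_6 = \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \left( E_{jt}[\phi^{\theta\alpha}_{it}(\theta_0, \alpha_g(j)) - \phi^{\theta\alpha}_{it}(\theta_0, \alpha_0(j))] \right)(\alpha_g - \alpha_0(i))$, so it follows from the Lipschitz condition and moment bounds in Assumption 4 that

$$
\|B_6\| \le \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \Delta\, \|\alpha_g(j) - \alpha_0(j)\|\, \|\alpha_g - \alpha_0(i)\|
\le \Delta\, \frac{1}{NT} \sum_{g=1}^{G_{NT}} \sum_{i \in I_g} \sum_{t=1}^{T_i} \|\alpha_g - \alpha_0(i)\|^2
= O\left( \left( \sup_g \frac{N_g - 1}{N_g} \right)^2 \xi_{NT}^2 \right)
$$

using Lemma 4 and the fact that $\|\alpha_g(j) - \alpha_0(j)\| \le \|\alpha_g - \alpha_0(i)\|$.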
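The factor (Ng − 1)/Ng that appears in Lemmas 14 to 17 can be illustrated numerically. The sketch below is a toy check, not the paper's Lemma 4: it assumes, purely for illustration, that the group-level value is the simple average of scalar individual effects within the group, and it takes ξ to be the largest pairwise gap inside the group. The function names are hypothetical.

```python
def max_member_gap(alphas):
    """Largest |group mean - alpha_i| over the members of one group."""
    if not alphas:
        raise ValueError("group must contain at least one member")
    center = sum(alphas) / len(alphas)
    return max(abs(center - a) for a in alphas)


def heterogeneity_bound(alphas):
    """((N_g - 1) / N_g) * xi, where xi is the largest pairwise gap."""
    if not alphas:
        raise ValueError("group must contain at least one member")
    n_g = len(alphas)
    xi = max(alphas) - min(alphas)  # for scalars, the largest pairwise distance
    return (n_g - 1) / n_g * xi


# Illustrative individual effects for one group (made-up numbers).
group = [0.0, 1.0, 4.0]
gap = max_member_gap(group)
bound = heterogeneity_bound(group)
print(f"max gap = {gap:.4f}, bound = {bound:.4f}")
```

Under the averaging assumption the bound holds because the mean minus any one member equals (1/Ng) times the sum of the Ng − 1 differences to the other members. A singleton group gives a bound of zero, which matches the intuition that grouping at the individual level introduces no misspecification of heterogeneity.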

In the following lemmas, let

Hθαg =

∑i∈Ig

Ti∑t=1

ϕθαit (θ0, αg),

Hααg =

∑i∈Ig

Ti∑t=1

ϕααit (θ0, αg),

Hθαg =

∑i∈Ig

Ti∑t=1

Eit[ϕθαit (θ0, αg)],

Hααg =

∑i∈Ig

Ti∑t=1

Eit[ϕααit (θ0, αg)],

Hθαg =

∑i∈Ig

Ti∑t=1

Eit[ϕθαit (θ0, α0(i))], and

Hααg =

∑i∈Ig

Ti∑t=1

Eit[ϕααit (θ0, α0(i))].

Note that

GNT∑g=1

∑i∈Ig

Ti∑t=1

ϕθαit (θ0, αg)(Hααg )−1(

6∑j=1

ψgj) =1NT

GNT∑g=1

(NgTg)Hθαg (Hαα

g )−1(6∑j=1

and that

GNT∑g=1

(NgTg)Hθαg (Hαα

g )−1(6∑j=1

ψgj) =1NT

GNT∑g=1

(NgTg)Hθαg (Hαα

g )−1(6∑j=1

GNT∑g=1

(NgTg)(Hθαg −Hθα

g )(Hααg )−1(

6∑j=1

GNT∑g=1

(NgTg)Hθαg

[(Hαα

g )−1 − (Hααg )−1

6∑j=1

Lemma 18. Under Assumptions 1-7, 9, and 11, 1√NT

∑GNTg=1 (NgTg)Hθα

g (Hααg )−1(

∑6j=2 ψgj) = op(1).

Proof. 1NT

g (Hααg )−1(ψg2) = 1

∑GNTg=1

∑i∈Ig

∑Tit=1 κgEit[ϕ

αit(θ0, α0(i))] = 1√

NTo(1) by

Assumption 11.

GNT∑g=1

(NgTg)Hθαg (Hαα

g )−1(ψg3) =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

κg(ϕααit (θ0, αg(i))− Eit[ϕααit (θ0, αg(i))])(αg − α0(i))

GNT∑g=1

∑i∈Ig

(Ti∑t=1

)(αg − α0(i))

with zit = κg(ϕααit (θ0, αg(i)) − Eit[ϕααit (θ0, αg(i))]). Then E∥∥∥ 1NT

∑GNTg=1

∑i∈Ig

(∑Tit=1 zit

)(αg − α0(i))

∥∥∥2

≤∆C2

NT ξ2NT

Ng−1Ng

NT by an argument similar to that used in Lemma 14 from which it follows that

GNT∑g=1

(NgTg)Hθαg (Hαα

g )−1(ψg3) = Op

supgNg−1Ng

)√NT

For the next term, we have

GNT∑g=1

(NgTg)Hθαg (Hαα

g )−1(ψg4) =1NT

GNT∑g=1

∑i∈Ig

Ti∑t=1

κgEit[ϕααit (θ0, αg(i))− ϕααit (θ0, αg(j))](αg − α0(i)),

‖ 1NT

GNT∑g=1

(NgTg)Hθαg (Hαα

g )−1(ψg4)‖

≤ (∆2/δ)CNT

Ng − 1Ng

)ξNT + C(p)ξNT

(((supg

Ng − 1Ng

)2)ξ2NT

by an argument similar to that in Lemma 15 using that ‖κg‖ ≤ ∆/δ.

From an argument similar to that used in Lemma 16, it follows that ‖Eit[ϕααit (θ0, αg(j))]−Ejt[ϕααit (θ0, αg(j))]‖ ≤∆‖α0(i)−α0(j)‖. It then follows that 1

g (Hααg )−1(ψg5) ≤ 1

∑GNTg=1

∑i∈Ig

∑Tit=1(∆2/δ)‖α0(i)−

α0(j)‖‖αg − α0(i)‖ = O((

supgNg−1Ng

)ξ2NT

Finally, we have 1NT

g (Hααg )−1(ψg5) = 1

∑GNTg=1

∑i∈Ig

∑Tit=1 κgEjt[ϕ

ααit (θ0, αg(j)) −

ϕααit (θ0, α0(j))](αg − α0(i)) = O

((supg

Ng−1Ng

)using the same argument as in Lemma 17.

It then follows that 1√NT

g (Hααg )−1(

∑6j=2 ψgj) = op(1).

Lemma 19. Under Assumptions 1-7, 9, and 11, 1√NT

∑GNTg=1 (NgTg)(Hθα

g − Hθαg )(Hαα

g )−1(∑6j=1 ψgj) =

op(1).

Proof.

GNT∑g=1

g )(Hααg )−1(

6∑j=1

ψgj) =1NT

GNT∑g=1

(NgTg)(Hθαg − Hθα

g )(Hααg )−1(

6∑j=1

GNT∑g=1

g )(Hααg )−1(

6∑j=1

ψgj).

(i) Considering the first term, we have

E‖ 1NT

GNT∑g=1

g )(Hααg )−1ψg1‖ ≤

GNT∑g=1

(NgTg)(E‖Hθαg − Hθα

g ‖2E‖ψg1‖2)1/2

≤ 1NT

GNT∑g=1

(NgTg)(

∆NgTg

= CGNTNT

where the first inequality is from Cauchy-Schwarz and the second from arguments as in Lemma 7. Thus,1√NT

g )−1ψg1p−→ 0 under Assumption 9.

(ii) Next

E‖ 1√NT

GNT∑g=1

g )(Hααg )−1ψg2‖

≤ 1√NT

GNT∑g=1

(E‖√NgTg(Hθα

g − Hθαg )‖2

‖√NgTgψg2‖

≤ 1√NT

supg‖√NgTgψg2‖GNT

= o(1)GNT√NT

where the first inequality follows from the triangle and Cauchy-Schwarz inequalities and Assumption 7, and the second inequality follows from Assumption 11 and Lemma 7. Convergence to zero is guaranteed by Assumption 9.

(iii) 1NT

g )−1ψg3 = Op

(GNTNT

Ng−1Ng

)follows in a similar fash-

ion to (i) using an argument similar to that used in Lemma 14 to show that E‖√NgTgψg3‖2 ≤ C

Ng−1Ng

ξ2NT .

‖ψg4‖ = ‖ 1NgTg

∑i∈Ig

Ti∑t=1

(Eit [ϕααit (θ0, αg(i))]− Eit [ϕααit (θ0, αg(j))]) (αg − α0(i))‖

≤ ∆CNT

Ng − 1Ng

)ξNT + C(p)ξNT

)≡ ΞNT

follows from an argument similar to that in Lemma 15. We then have

E‖ 1NT

GNT∑g=1

g )(Hααg )−1ψg4‖

≤ 1NT

GNT∑g=1

ΞNTδ

(NgTg)(

E‖Hθαg − Hθα

g ‖2)1/2

≤ 1NT

GNT∑g=1

∆ΞNTδ

(NgTg)1/2 =GNT√NT

∆ΞNTδ

supg(NgTg)1/2

Under the assumptions, GNT√NT→ 0,

√NTΞNT → 0, and supg(NgTg)1/2

NgT= O(1), so 1√

g −Hθαg )(Hαα

g )−1ψg4p−→ 0.

1√NT

GNT∑g=1

g )(Hααg )−1ψg5

p−→ 0

and1√NT

GNT∑g=1

g )(Hααg )−1ψg6

p−→ 0

follow from arguments similar to those used in (iv) by making use of

∑i∈Ig

Ti∑t=1

(Eit [ϕααit (θ0, αg(j))]− Ejt [ϕααit (θ0, αg(j))]) (αg − α0(i))‖

≤ ∆(

Ng − 1Ng

)ξ2NT and

∑i∈Ig

Ti∑t=1

(Ejt [ϕααit (θ0, αg(j))]− Ejt [ϕααit (θ0, α0(j))]) (αg − α0(i))‖

≤ ∆(

Ng − 1Ng

which follow respectively from arguments similar to those in Lemmas 16 and 17.

(vi) For the remaining terms, we have∥∥∥∥∥ 1NT

GNT∑g=1

g )(Hααg )−1 (

6∑j=1

∥∥∥∥∥∥≤ 1NT

(NgTg)∥∥∥Hθα

g −Hθαg

∥∥∥∥∥(Hααg )−1

∥∥∥∥∥∥∥∥6∑j=1

∥∥∥∥∥∥≤ C

(NgTg)

∑i∈Ig

Ti∑t=1

Eit[M(wit)]

‖αg − αg‖∥∥∥∥∥∥

6∑j=1

∥∥∥∥∥∥≤ C

(NgTg)‖αg − αg‖

∥∥∥∥∥∥6∑j=1

∥∥∥∥∥∥=

(NgTg)

∥∥∥∥∥∥[(Hαα

g )−1 − (Hααg )−1

] 6∑j=1

ψgj + (Hααg )−1

6∑j=1

∥∥∥∥∥∥∥∥∥∥∥∥

6∑j=1

∥∥∥∥∥∥≤ C

1≤g≤GNT

∥∥∥(Hααg )−1 − (Hαα

g )−1∥∥∥GNT∑g=1

(NgTg)

∥∥∥∥∥∥6∑j=1

∥∥∥∥∥∥∥∥∥∥∥∥

6∑j=1

∥∥∥∥∥∥

GNT∑g=1

(NgTg)

∥∥∥∥∥∥6∑j=1

∥∥∥∥∥∥∥∥∥∥∥∥

6∑j=1

∥∥∥∥∥∥ .From arguments identical to those used to verify Lemma 9, we can show

max1≤g≤GNT

∥∥∥(Hααg )−1 − (Hαα

g )−1∥∥∥ p−→ 0;

so it suffices to show 1√NT

∑GNTg=1 (NgTg)

∥∥∥∑6j=1 ψgj

∥∥∥∥∥∥∑6j=1 ψgj

∥∥∥ p−→ 0.

(vii) From the triangle inequality,∥∥∥∑6

j=1 ψgj

∥∥∥ ≤∑6j=1 ‖ψgj‖. From (iv) and (v), we have ‖ψg4‖ ≤ ΞNT ,

‖ψg5‖ ≤ ∆(

supgNg−1Ng

)ξ2NT , and ‖ψg6‖ ≤ ∆

Ng−1Ng

ξ2NT . Let ΥNT = max‖ψg4‖, ‖ψg5‖, ‖ψg6‖ and

note that√NTΥNT → 0. Then

∑6j=1 ‖ψgj‖ ≤

∑3j=1 ‖ψgj‖+ 3ΥNT . We also have

‖ψg2‖ =1√NgTg

‖ 1√NgTg

∑i∈Ig

Ti∑t=1

Eit[ϕαit(θ0, α0(i))]‖ ≤ 1√NgTg

where BNT = supg ‖ 1√NgTg

∑i∈Ig

∑Tit=1 Eit[ϕαit(θ0, α0(i))]‖ = o(1). Finally, we have E‖ψg1‖2 ≤ ∆

NgTgas in

(i) and E‖ψg3‖2 ≤C(

supgNg−1Ng

)2ξ2NT

NgTgas in (iii). Putting all of this together gives

∥∥∥∥∥∥6∑j=1

∥∥∥∥∥∥∥∥∥∥∥∥

6∑j=1

∥∥∥∥∥∥ ≤ E

(‖ψg1‖+ ‖ψg3‖+1√NgTg

BNT + 3ΥNT

[‖ψg1‖2 + 2‖ψg1‖‖ψg3‖+ ‖ψg3‖2 +

2√NgTg

‖ψg1‖BNT

+2√NgTg

‖ψg3‖BNT +1

NgTgB2NT + 6‖ψg1‖ΥNT + 6‖ψg3‖ΥNT

+6√NgTg

BNTΥNT + 9Υ2NT

≤ ∆NgTg

+2(C∆)1/2

Ng−1Ng

NgTg+C(

supgNg−1Ng

+2∆1/2BNTNgTg

+2C1/2

Ng−1Ng

)ξNTBNT

NgTg+B2NT

+6∆1/2ΥNT√

6C1/2(

supgNg−1Ng

)ξNTΥNT√

6BNTΥNT√NgTg

+ 9Υ2NT

=∆ + bNTNgTg

+ΥNT (6∆1/2 + cNT )√

NgTg+ 9Υ2

where bNT → 0 and cNT → 0.

(viii) Using (vii) yields

∣∣∣∣∣ 1√NT

GNT∑g=1

(NgTg)

∥∥∥∥∥∥6∑j=1

∥∥∥∥∥∥∥∥∥∥∥∥

6∑j=1

∥∥∥∥∥∥∣∣∣∣∣∣ =

1√NT

GNT∑g=1

(NgTg)E

∥∥∥∥∥∥6∑j=1

∥∥∥∥∥∥∥∥∥∥∥∥

6∑j=1

∥∥∥∥∥∥

≤ 1√NT

GNT∑g=1

(NgTg)

(∆ + bNTNgTg

+ΥNT (6∆1/2 + cNT )√

NgTg+ 9Υ2

=GNT (∆ + bNT )√

ΥNT (6∆1/2 + cNT )√NT

GNT∑g=1

√NgTg + 9

√NTΥ2

= o(1) +ΥNT (6∆1/2 + cNT )√

GNT∑g=1

√NgTg + o(1)

≤ ΥNT (6∆1/2 + cNT )√NT

GNT∑g=1

NgTg + o(1)

=√NTΥNT (6∆1/2 + cNT ) + o(1) = o(1).

(ix) The conclusion of the lemma follows by combining (i)-(viii).
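The last inequality in step (viii) rests on a simple counting fact. As a short worked step, assuming every group contains at least one observation so that $N_g T_g \ge 1$, and that group sizes add up to the full sample, $\sum_g N_g T_g = NT$ (which is how these quantities are used throughout this section):

```latex
\sqrt{N_g T_g} \le N_g T_g \quad\text{for every } g
\quad\Longrightarrow\quad
\sum_{g=1}^{G_{NT}} \sqrt{N_g T_g} \le \sum_{g=1}^{G_{NT}} N_g T_g = NT ,

\frac{\Upsilon_{NT}\,(6\Delta^{1/2} + c_{NT})}{\sqrt{NT}} \sum_{g=1}^{G_{NT}} \sqrt{N_g T_g}
\le \frac{\Upsilon_{NT}\,(6\Delta^{1/2} + c_{NT})}{\sqrt{NT}}\, NT
= \sqrt{NT}\,\Upsilon_{NT}\,(6\Delta^{1/2} + c_{NT}) .
```

The right-hand side is $o(1)$ because $\sqrt{NT}\,\Upsilon_{NT} \to 0$, as noted in step (vii), while $6\Delta^{1/2} + c_{NT}$ stays bounded.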

Lemma 20. Under Assumptions 1-7, 9, and 11,

1√NT

GNT∑g=1

(NgTg)Hθαg

[(Hαα

g )−1 − (Hααg )−1

6∑j=1

ψgj) = op(1).

Proof. From a mean value expansion of (Hααg )−1 about Hαα

g = Hααg , we have (Hαα

g )−1 − (Hααg )−1 =

−(Hαα∗g )−1(Hαα

g −Hααg )(Hαα∗

g )−1 where ‖Hαα∗g −Hαα

g ‖ ≤ ‖Hααg −Hαα

g ‖. Expanding further yields

(Hααg )−1 − (Hαα

g )−1 = −(Hααg )−1(Hαα

g −Hααg )(Hαα

g )−1

− (Hααg )−1(Hαα

g −Hααg )[(Hαα∗

g )−1 − (Hααg )−1]

− [(Hαα∗g )−1 − (Hαα

g )−1](Hααg −Hαα

g )(Hααg )−1

− [(Hαα∗g )−1 − (Hαα

g )−1](Hααg −Hαα

g )[(Hαα∗g )−1 − (Hαα

g )−1].

Plugging this expression into

1√NT

GNT∑g=1

(NgTg)Hθαg

[(Hαα

g )−1 − (Hααg )−1

6∑j=1

and making use of max1≤g≤GNT ‖(Hαα∗g )−1 − (Hαα

g )−1‖ p−→ 0 which can be demonstrated as in Lemma9 using that ‖Hαα∗

g − Hααg ‖ ≤ ‖Hαα

g − Hααg ‖, the result follows by an argument similar to that used to

demonstrate Lemma 19 with Hθαg , Hθα

g , and Hθαg replaced by Hαα

g , Hααg , and Hαα

g .
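The expansion of a difference of inverses used in the proof of Lemma 20 can be checked numerically. The sketch below uses generic 2x2 matrices A and B (labels and numbers introduced here for illustration, not the paper's Hessian blocks). It verifies the exact identity A⁻¹ − B⁻¹ = −A⁻¹(A − B)B⁻¹ and shows that the leading term −B⁻¹(A − B)B⁻¹ leaves a remainder that shrinks quadratically in the size of A − B.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def sub(x, y):
    """Elementwise difference x - y."""
    return [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]


def negate(x):
    return [[-v for v in row] for row in x]


def max_abs(x):
    """Largest absolute entry, used as a simple matrix norm."""
    return max(abs(v) for row in x for v in row)


def inverse_gap_terms(a, b):
    """Return (A^-1 - B^-1, -A^-1 (A-B) B^-1, -B^-1 (A-B) B^-1)."""
    diff = sub(a, b)
    exact = sub(inv2(a), inv2(b))
    identity = negate(matmul(matmul(inv2(a), diff), inv2(b)))
    first_order = negate(matmul(matmul(inv2(b), diff), inv2(b)))
    return exact, identity, first_order


def remainder(eps):
    """Size of exact minus first-order term for a perturbation of size eps."""
    b = [[2.0, 0.5], [0.5, 1.0]]
    a = [[2.0 + eps, 0.5 + 0.3 * eps], [0.5 + 0.3 * eps, 1.0 - 0.5 * eps]]
    exact, _, first_order = inverse_gap_terms(a, b)
    return max_abs(sub(exact, first_order))


print(remainder(1e-2), remainder(1e-3))
```

Shrinking the perturbation by a factor of ten shrinks the remainder by roughly a factor of one hundred, which is why the higher-order terms in the four-term expansion are negligible once the two matrices are uniformly close.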

7.2.4. Main Result

Proposition 2. Under Assumptions 1-11, $\sqrt{NT}(\hat{\theta} - \theta_0) \xrightarrow{d} J^{-1} N(0, \Omega)$.

Proof. The result is an immediate consequence of the expansions derived in Sections 7.2.1 and 7.2.2 and Lemmas 12-20.
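To show how the limiting distribution in Proposition 2 translates into inference, here is a minimal sketch for a scalar common parameter. It assumes plug-in estimates of J and Ω are already available (how to estimate them is not covered in this section), and the function and argument names are hypothetical. In the scalar case the asymptotic variance of √NT(θ̂ − θ0) is Ω/J².

```python
import math
from statistics import NormalDist


def asymptotic_ci(theta_hat, j_hat, omega_hat, n_obs, level=0.95):
    """Normal-approximation interval for a scalar theta.

    theta_hat : point estimate
    j_hat     : plug-in estimate of J (assumed nonzero)
    omega_hat : plug-in estimate of Omega (assumed nonnegative)
    n_obs     : total number of observations, NT
    """
    if j_hat == 0:
        raise ValueError("j_hat must be nonzero")
    if omega_hat < 0 or n_obs <= 0:
        raise ValueError("omega_hat must be >= 0 and n_obs > 0")
    avar = omega_hat / (j_hat ** 2)          # J^{-1} Omega J^{-1} for scalars
    std_err = math.sqrt(avar / n_obs)        # scale by 1 / sqrt(NT)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return theta_hat - z * std_err, theta_hat + z * std_err, std_err


# Made-up inputs, for illustration only.
lower, upper, se = asymptotic_ci(theta_hat=0.5, j_hat=2.0, omega_hat=4.0, n_obs=10_000)
print(lower, upper, se)
```

For a vector parameter the same logic applies with the matrix sandwich J⁻¹ΩJ⁻¹′ in place of Ω/J². The interval is centered correctly only when the conditions of the proposition hold, in particular when the bias terms handled in Lemmas 14 to 20 are asymptotically negligible.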

References

Almeida, H., M. Campello, and M. Weisbach (2004): “The Cash Flow Sensitivity of Cash,” Journal of Finance, 59, 1777–1804.

Altonji, J., T. Elder, and C. Taber (2005): “Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools,” Journal of Political Economy, 113, 151–184.

Andersen, E. (1970): “Asymptotic Properties of Conditional Maximum Likelihood Estimators,” Journal of the Royal Statistical Society, Series B, 32(2), 283–301.

Arellano, M. (2003): “Discrete Choice with Panel Data,” Investigaciones Economicas, 27(3), 423–458.

Arellano, M., and S. Bonhomme (2009): “Robust Priors in Nonlinear Panel Data Models,” forthcoming Econometrica.

Arellano, M., and J. Hahn (2005): “Understanding Bias in Nonlinear Panel Models: Some Recent Developments,” Invited Lecture, Econometric Society World Congress, London.

Baltagi, B. (1992): “Specification Issues,” in The Econometrics of Panel Data, ed. by Matyas, and Sevestre. Kluwer Academic Publishers.

Bester, A. C., and C. Hansen (2007): “A Penalty Function Approach to Bias Reduction in Nonlinear Panel Models with Fixed Effects,” forthcoming Journal of Business and Economic Statistics.

Bester, C. A., and C. B. Hansen (2008): “Identification of Marginal Effects in a Correlated Random Effects Model,” forthcoming Journal of Business and Economic Statistics.

Carro, J. M. (2006): “Estimating Dynamic Panel Data Discrete Choice Models,” forthcoming Journal of Econometrics.

Chamberlain, G. (1980): “Analysis of Covariance with Qualitative Data,” Review of Economic Studies, 47, 225–238.

Chen, S., and S. Khan (2007): “Semiparametric Estimation of Nonstationary Censored Panel Data Models with Time-Varying Factor Loads,” forthcoming, Econometric Theory.

Chernozhukov, V., H. Hong, and E. Tamer (2004): “Inference on Identified Parameter Sets in Econometric Models,” MIT Working Paper.

Chernozhukov, V., I. Fernandez-Val, J. Hahn, and W. Newey (2009): “Identification and Estimation of Marginal Effects in Nonlinear Panel Models,” Working Paper, Department of Economics, MIT.

Doukhan, P. (1994): Mixing: Properties and Examples, vol. 85 of Lecture Notes in Statistics (Springer-Verlag). New York: Springer-Verlag, Editors S. Fienberg, J. Gani, K. Krickeberg, I. Olkin, and N. Wermuth.

Faulkender, M., and R. Wang (2006): “Corporate Financial Policy and the Value of Cash,” Journal of Finance, 61, 1957–1990.

Fernandez-Val, I. (2005): “Estimation of Structural Parameters and Marginal Effects in Binary Choice Panel Data Models with Fixed Effects,” Mimeo.

Gayle, G.-L., and C. Viauroux (2007): “Root-N Consistent Semiparametric Estimators of a Dynamic Panel Sample Selection Model,” Journal of Econometrics, 141(1), 179–212.

Hahn, J., and G. Kuersteiner (2004): “Bias Reduction for Dynamic Nonlinear Panel Models with Fixed Effects,” Mimeo.

Hahn, J., and G. M. Kuersteiner (2002): “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both N and T Are Large,” Econometrica, 70(4), 1639–1657.

Hahn, J., and W. K. Newey (2004): “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,” Econometrica, 72(4), 1295–1319.

Hansen, C. B. (2007): “Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data when T is Large,” Journal of Econometrics, 141, 597–620.

Hausman, J. A., and W. E. Taylor (1981): “Panel Data and Unobservable Individual Effects,” Econometrica, 49(6), 1377–1398.

Heckman, J. J. (1981): “The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic Process,” in Structural Analysis of Discrete Panel Data with Econometric Applications, ed. by C. F. Manski, and D. McFadden. Elsevier: North-Holland.

Henderson, D. J., and A. Ullah (2005): “A Nonparametric Random Effects Estimator,” Economics Letters, 88(3), 403–407.

Holmstrom, B., and J. Tirole (1998): “Private and Public Supply of Liquidity,” Journal of Political Economy, 106, 1–40.

Honore, B. E. (1992): “Trimmed LAD and Least Squares Estimation of Truncated and Censored Models with Fixed Effects,” Econometrica, 60(3), 533–565.

Honore, B. E., and E. Kyriazidou (2000): “Panel Data Discrete Choice Models with Lagged Dependent Variables,” Econometrica, 68(4), 839–874.

Honore, B. E., and A. Lewbel (2002): “Semiparametric Binary Choice Panel Data Models Without Strictly Exogenous Regressors,” Econometrica, 70, 2053–2063.

Honore, B. E., and E. Tamer (2006): “Bounds on the Parameters in Panel Dynamic Discrete Choice Models,” Econometrica, 74(3), 611–632.

Kim, T. Y. (1994): “Moment Bounds for Non-Stationary Dependent Sequences,” Journal of Applied Probability, 31, 731–742.

Lancaster, T. (2002): “Orthogonal Parameters and Panel Data,” Review of Economic Studies, 69, 647–666.

Lewbel, A. (2005): “Simple Endogenous Binary Choice and Selection Panel Model Estimators,” mimeo.

Lin, X., and R. J. Carroll (2000): “Nonparametric function estimation for clustered data when the predictor is measured without/with error,” Journal of the American Statistical Association, 95, 520–534.

Lindley, D. V., and A. F. M. Smith (1972): “Bayes Estimates for the Linear Model,” Journal of the Royal Statistical Society, Series B, 34, 1–41.

Manski, C. (1987): “Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data,” Econometrica, 55(2), 357–362.

Matyas, L., and P. Blanchard (1998): “Misspecified heterogeneity in panel data models,” Statistical Papers, 39, 1–27.

Meulbroek, L. K. (1992): “An Empirical Analysis of Illegal Insider Trading,” Journal of Finance, 47(5), 1661–1699.

Mundlak, Y. (1978): “On the Pooling of Time Series and Cross Section Data,” Econometrica, 46, 69–85.

Newey, W. K., and J. L. Powell (2003): “Instrumental Variable Estimation of Nonparametric Models,” Econometrica, 71, 1565–1578.

Neyman, J., and E. L. Scott (1948): “Consistent Estimates Based on Partially Consistent Observations,” Econometrica, 16(1), 1–32.

Nickell, S. (1981): “Biases in Dynamic Models with Fixed Effects,” Econometrica, 49(6), 1417–1426.

Raudenbush, S. W., and A. S. Bryk (2002): Hierarchical Linear Models: Applications and Data Analysis Methods. Thousand Oaks: Sage Publications, second edn.

Roulstone, D. T. (2006): “Insider Trading and the Information Content of Earnings Announcements,” Chicago GSB working paper.

StataCorp (2007): Stata Statistical Software: Release 10. College Station, TX: StataCorp LP.

Sufi, A. (2009): “Bank Lines of Credit in Corporate Finance: An Empirical Analysis,” Review of Financial Studies, 22, 1057–1088.

Sun, Y. X. (2005): “Estimation and Inference in Panel Structure Models,” working paper, UCSD.

Ullah, A., and N. Roy (1998): “Nonparametric and Semiparametric Econometrics of Panel Data,” in Handbook of Applied Economic Statistics, ed. by A. Ullah, and D. E. A. Giles, vol. 1. Marcel Dekker: New York, NY.

Wooldridge, J. M. (2002): Econometric Analysis of Cross Section and Panel Data. Cambridge, Massachusetts: The MIT Press.

Wooldridge, J. M. (2005): “Unobserved Heterogeneity and Estimation of Average Partial Effects,” in Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. by D. W. K. Andrews, and J. H. Stock. Cambridge University Press.

Woutersen, T. (2005): “Robustness against Incidental Parameters and Mixing Distributions,” Mimeo.

Table 1. Simulation Results for Pure Hierarchical Model

[Table body not recoverable from the extracted text. Panels: A. N=200, T=2; B. N=200, T=8; C. N=1000, T=2; D. N=1000, T=8. Columns: FE, FE-BC, and a set of grouping schemes. Row labels that survive include Bias, BIC %, and Size.]

Table 2. Simulation Results for Mixed Model

[Table body not recoverable from the extracted text. Panels: A. N=200, T=2; B. N=200, T=8; C. N=1000, T=2; D. N=1000, T=8. Columns: FE, FE-BC, and a set of grouping schemes. Row labels that survive include Bias, BIC %, and Size.]

Table 3. Simulation Results for Pure Individual Effect Model

[Table body not recoverable from the extracted text. Panels: A. N=200, T=2; B. N=200, T=8; C. N=1000, T=2; D. N=1000, T=8. Columns: FE, FE-BC, and a set of grouping schemes. Row labels that survive include Bias, BIC %, and Size.]

Table 4. Results from Bank Lines of Credit Example

                                           Descriptive Statistics     Probit Coef. Estimates
Variable                                   Mean       Std. Dev.       Pooled     FE         AIC
[EBITDA/(assets-cash)]t-1                  -0.016     0.372           0.639      0.206      0.408
                                                                      (0.070)    (0.727)    (0.084)
                                                                      [0.146]    [0.005]    [0.075]
                                                                                 {0.192}
[Tangible assets/(assets-cash)]t-1         0.328      0.237           0.283      -1.041     0.245
                                                                      (0.104)    (1.644)    (0.207)
                                                                      [0.064]    [-0.025]   [0.045]
                                                                                 {-0.984}
[ln(assets-cash)]t-1                       4.901      2.388           0.158      0.491      0.198
                                                                      (0.013)    (0.289)    (0.030)
                                                                      [0.036]    [0.012]    [0.036]
                                                                                 {0.469}
[Net worth, cash adjusted]t-1              0.424      0.252           -0.350     -1.619     -0.221
                                                                      (0.100)    (0.967)    (0.150)
                                                                      [-0.080]   [-0.040]   [-0.041]
                                                                                 {-1.537}
[Market to book, cash adjusted]t-1         2.406      2.866           -0.071     0.119      -0.026
                                                                      (0.009)    (0.065)    (0.009)
                                                                      [-0.016]   [0.003]    [-0.005]
                                                                                 {0.113}
AIC                                                                   5817.8     7905.6     5441.4

Note: The table reports descriptive statistics and estimated probit coefficients for key independent variables. The dependent variable in the analysis is whether a firm has a bank line of credit. The mean of the dependent variable is 0.795. We use two years of data (2002-2003), giving a total of 7034 observations on 3648 firms. Columns labeled "Probit Coef. Estimates" provide estimation results from a pooled probit model (Pooled), a model which includes a full set of firm dummies (FE), and the AIC-minimizing grouped-effect estimator (AIC), which in this case makes use of 405 groups. The numbers in parentheses below the point estimates are estimated standard errors clustered at the firm level. The numbers in brackets give estimated average marginal effects. For the FE estimator, bias-corrected FE results based on Hahn and Kuersteiner (2004) are provided in braces.
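The comparison in the last row of Table 4 can be reproduced mechanically. The sketch below assumes the standard definition AIC = 2k − 2·loglik (the section does not restate the criterion) and simply picks the specification with the smallest reported value. The function names and dictionary labels are hypothetical; the three AIC values are the ones reported in Table 4.

```python
def aic(log_likelihood, n_params):
    """Standard Akaike information criterion: 2k - 2 * log-likelihood."""
    if n_params < 0:
        raise ValueError("n_params must be nonnegative")
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(aic_by_spec):
    """Return the label of the specification with the smallest AIC."""
    if not aic_by_spec:
        raise ValueError("need at least one specification")
    return min(aic_by_spec, key=aic_by_spec.get)


# AIC values as reported in the last row of Table 4.
table4_aic = {
    "Pooled": 5817.8,
    "FE": 7905.6,
    "Grouped (405 groups)": 5441.4,
}

best = select_by_aic(table4_aic)
print(best)
```

Pooled and FE can be read as the two extreme groupings (one group and one group per firm), so the criterion is trading off fit against the number of group effects estimated. Here the intermediate grouping has the lowest value.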

Table 5. Results from Insider Trading Example

                                           Descriptive Statistics     Probit Coef. Estimates
Variable                                   Mean       Std. Dev.       Pooled     FE         AIC

A. Insider Buys (Mean of Buys = 0.300)

Cumulative Abnormal Returns                0.008      0.114           0.306      0.716      0.336
                                                                      (0.100)    (0.229)    (0.103)
(Cumulative Abnormal Returns)^2                                       -0.297     -0.617     -0.231
                                                                      (0.159)    (0.190)    (0.122)
                                                                                 {-0.537}
CAR Average Marginal Effect                                           0.103      0.041      0.107
Unexpected Earnings                        -0.006     0.698           0.287      0.286      0.281
                                                                      (0.096)    (0.208)    (0.094)
(Unexpected Earnings)^2                                               -0.089     -0.078     -0.009
                                                                      (0.046)    (0.058)    (0.040)
                                                                                 {-0.071}
UE Average Marginal Effect                                            0.098      0.035      0.090
AIC                                                                   22232      22028      21766

B. Insider Sells (Mean of Sells = 0.276)

Cumulative Abnormal Returns                0.008      0.114           -0.099     -0.644     -0.137
                                                                      (0.107)    (0.256)    (0.112)
                                                                                 {-0.560}
(Cumulative Abnormal Returns)^2                                       -0.031     0.203      -0.030
                                                                      (0.081)    (0.367)    (0.091)
                                                                                 {0.165}
CAR Average Marginal Effect                                           -0.029     0.019      -0.038
Unexpected Earnings                        -0.006     0.698           0.014      0.223      0.033
                                                                      (0.076)    (0.139)    (0.084)
(Unexpected Earnings)^2                                               0.0003     0.0032     0.0006
                                                                      (0.0009)   (0.0015)   (0.0009)
                                                                                 {0.0031}
UE Average Marginal Effect                                            0.004      0.016      0.009
AIC                                                                   19268      20478      18891

Note: The table reports descriptive statistics and estimated probit coefficients for key independent variables. The dependent variable for the results in Panel A is an indicator for whether there were any insider buys of own company stock during the quarter, and the dependent variable in Panel B is an indicator for whether there were any insider sells of own company stock during the quarter. We use quarterly data from 1999, giving a total of 19,716 observations on 5877 firms. Columns labeled "Probit Coef. Estimates" provide estimation results from a pooled probit model (Pooled), a model which includes a full set of firm dummies (FE), and the AIC-minimizing grouped-effect estimator (AIC), which uses 425 groups in both the buy results and the sell results. The numbers in parentheses below the point estimates are estimated standard errors clustered at the firm level. For the FE estimator, bias-corrected FE parameter estimates based on Hahn and Kuersteiner (2004) are provided in braces.
