Grouped Effects Estimators in Fixed Effects Models
C. Alan Bester and Christian B. Hansen∗
April 2009
Abstract.
We consider estimation of nonlinear panel data models with common and individual specific parameters. Fixed effects estimators are known to suffer from the incidental parameters problem, which can lead to large biases in estimates of common parameters. Pooled estimators, which ignore heterogeneity across individuals, are also generally inconsistent. We assume that individuals in our data are grouped on multiple levels. These groups may be based on some external classification (for example, SIC codes), geographic location (census tract, county, state, etc.), or perhaps based on observable right hand side variables, and may be nested (hierarchical) or non-nested. We consider “group effects” estimators, where individual specific parameters are assumed common within groups at some level. We provide conditions under which group effects estimates of common parameters are asymptotically unbiased and normal. Our conditions suggest a tradeoff between two sources of bias, one due to incidental parameters and the other due to misspecification of unobserved heterogeneity. Our findings suggest that one may wish to control for heterogeneity at the group level even when individual specific effects are present. These findings are confirmed in a Monte Carlo study and illustrated in two empirical examples.
Keywords: Fixed Effects, Panel Data, Hierarchical Models
JEL Codes: C10, C13, C23
1. Introduction
Panel data is widely used in empirical economics. Such data allows researchers to control for unobservable, time invariant individual-level heterogeneity that, according to economic theory, may be related to covariates of interest. Such heterogeneity arises, for example, with household-specific willingness to pay for a given product, which may be correlated with income, or firm-specific policies, which may be related to capital structure.
∗ The University of Chicago, Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA.
We suppose that the model to be estimated is known up to a finite dimensional common parameter and another finite dimensional parameter that may be specific to each individual but is assumed constant over time. In a linear model, individual specific effects are often treated as parameters to be estimated, an approach referred to as fixed effects estimation. Using fixed effects allows researchers to make inference about common parameters while placing very little structure on the distribution of unobservable heterogeneity. However, this approach may be problematic in nonlinear or dynamic models. As noted by Neyman and Scott (1948), noise in the estimation of individual level effects when the time dimension of the panel is short will in general contaminate estimates of the common parameters, a phenomenon generally referred to as the incidental parameters problem.
Recently a number of papers have studied econometric properties of fixed effects (hereafter FE) estimators, with the explicit aim of characterizing biases arising from the incidental parameters problem.² These papers work with asymptotic sequences where $N$, the number of individuals, and $T$, the number of time periods in the panel, both go to infinity, so that individual specific parameters are consistently estimable. However, they show that for nonlinear models, estimation of individual level effects introduces biases in the common parameters of order $1/T$, implying that fixed effects estimators will generally perform badly when $T$ is small. These papers propose to estimate the $1/T$ bias directly and remove the estimated bias from common parameters and other objects of interest (e.g., marginal effects). In simulations, these bias corrections provide dramatic MSE improvements over the uncorrected fixed effects estimator in moderate-$T$ panels. However, these bias corrected estimators may still perform badly when the time dimension is short. Further, as we emphasize below, inference based on these estimators tends to suffer from severe size distortions unless $T$ is at least an appreciable fraction of $N$.
Another common approach, broadly termed random effects, places restrictions on the distribution of unobserved heterogeneity, either assuming independence between observables and unobservables, or assuming unobservables are drawn from a distribution defined up to a finite dimensional parameter, or both. An extreme example is a pooled estimator, which ignores heterogeneity entirely. When these assumptions about the distribution of unobserved heterogeneity are satisfied, the resulting estimators can perform extremely well even in very short-$T$ panels. Unfortunately, economic theory often implies dependence between observed quantities and unobservables and rarely suggests a parametric form for this dependence. In random effects approaches, misspecifying the distribution of unobservables may result in inconsistency of estimates for common parameters. Random effects estimators that involve integration over a specified distribution of unobservables are also often computationally burdensome except in very simple cases. Fixed effects approaches are therefore often preferred in empirical applications despite their potentially poor finite sample properties.

2 See, e.g., Hahn and Kuersteiner (2002). An excellent survey of this literature is provided by Arellano and Hahn (2005). Further citations are provided in Section 1.1.
This paper considers settings where individuals may be grouped at different levels. This type of hierarchical setup is actually quite common in economics. For example, households may be grouped at the school district, county, or state level, while firms may be grouped according to coarse or fine industry classifications using SIC codes to a given number of digits. Individuals may also be grouped based on other observable information. In finance, for example, one often considers firms sorted into 25 groups based on quintiles of their size and market-to-book ratios. We consider “grouped effects” estimators, which estimate model parameters treating individual specific effects as if they are constant within groups at a particular level. Such estimators may naturally be thought of as intermediate to pooled and fixed effects estimators. They may also be thought of as a particular type of random effects estimator, since they restrict the distribution of unobserved individual level effects and their relationship with observables.
We consider an asymptotic sequence where $N$ and $T$ go to infinity jointly. We show that grouped effects estimates suffer from two sources of bias. The first is due to incidental parameters, and is of order $1/(N_g T)$ since each group-level parameter is estimated using a total of $N_g T$ observations, where $N_g$ is the number of individuals in group $g$. The second arises from model misspecification, in the sense that individual specific heterogeneity is incorrectly assumed to be constant within groups of individuals. We provide conditions on the sampling scheme and the behavior of unobservables within groups such that the group effects estimator is asymptotically unbiased and normal, and show that this asymptotic framework leads to useful insights in practice. These conditions suggest a tradeoff between the two sources of bias. We study this tradeoff in a Monte Carlo study, where we find that it plays a crucial role in determining finite sample properties of estimates of structural parameters. We find that grouped effects estimators can offer large gains in finite samples relative to fixed effects approaches, even in situations where individual effects vary significantly within groups.
The key conditions involve the rate of growth of the number of groups and the rate at which the error from projecting the true individual specific effects onto the groups goes to zero. To obtain interesting asymptotic results, we require that the number of groups increases more slowly than information accumulates about common parameters (more slowly than $\sqrt{NT}$) and that the squared approximation error goes to zero more quickly than information accumulates about common parameters. Satisfying these conditions will necessarily involve placing restrictions on unobserved heterogeneity. The asymptotic environment we use suggests that our asymptotic results will be most useful in environments where researchers may not have perfect information on how individuals are grouped but have ex ante beliefs or information about potential grouping structures that allow a significant portion of the variation in unobserved effects to be captured by the set of group effects.
It is important to note that analyzing our grouped effects estimators does require assumptions about the distribution of unobserved effects. However, these assumptions are very different from those employed in typical random effects estimators. We neither assume independence between observables and unobservables nor impose a parametric form for the distribution of unobserved effects. We motivate these assumptions in economically meaningful ways through examples, including two empirical applications. In addition, our simulations suggest that two commonly used information criteria may be useful in deciding between grouping schemes. We also note that this approach is computationally tractable and easily implemented in standard econometrics software. Taken together, we believe these results offer a useful way to think about models with unobservable individual-level effects that will perform quite well in practice with economic data.
1.1. Related Work
Many papers in econometrics have noted biases which arise in nonlinear and/or dynamic panel data models with incidental parameters; see, for example, the early papers of Nickell (1981) and Heckman (1981). In certain special cases, estimators of common parameters have been developed that do not depend on unobserved effects. In rare examples where a sufficient statistic for the unobserved effect is available, common parameters may be estimated by conditional maximum likelihood as in Anderson (1970). For certain other models, estimators have been proposed that do not depend on unobserved effects; see, e.g., Manski (1987), Honore (1992), Honore and Kyriazidou (2000), and several examples discussed in Wooldridge (2002). In general, however, one must either estimate the unobserved parameter(s) for each individual or rely on the ‘random effects assumptions’ described above.
The approach in this paper is related to the recent large-$N$ and $T$ literature for panel data models, for example, Lancaster (2002), Hahn and Kuersteiner (2002, 2004), Arellano (2003), Hahn and Newey (2004), or Woutersen (2005), among many others.³ These papers consider fixed effects estimators with $N$ and $T$ going to infinity jointly, and propose a correction that removes the $1/T$ bias resulting from incidental parameters. The resulting bias corrected estimators have been shown in simulations to offer substantial MSE improvements over uncorrected fixed effects estimators. Arellano and Hahn (2005) provide an excellent survey of this literature.
3 See also Fernandez-Val (2005), Carro (2006), and Bester and Hansen (2007).
Perhaps the best known random effects assumption is independence between observables and unobservables; see Hausman and Taylor (1981) for a very general discussion of identification and estimation in the linear model and Wooldridge (2002) for a discussion of this type of restriction in many commonly employed models. Honore and Lewbel (2002) and Lewbel (2005) employ a weaker version of this restriction, where a single regressor is assumed independent of unobservables. Nonparametric random effects estimators are proposed by Lin and Carroll (2000) and Ullah and Roy (1998), with general properties of such estimators established in Henderson and Ullah (2005). These approaches rely on independence between unobservables and observed covariates and on some degree of additive separability, neither of which is assumed in this paper.
Another common restriction is that unobservables depend upon observables only through a linear index, as in Mundlak (1978), Chamberlain (1980), and Wooldridge (2005). These and many other papers also specify the distribution of unobservables up to a finite dimensional parameter, which may then be estimated jointly with the common structural parameters, e.g., by maximum likelihood. Models where the distribution of unobservables is parametric are an important special case of hierarchical models, discussed by Lindley and Smith (1972) and Raudenbush and Bryk (2002), which have a long tradition in Bayesian statistics. Several more recent papers, including Chen and Khan (2007), Gayle and Viauroux (2007), and Bester and Hansen (2008), exploit index restrictions to obtain identification of common parameters and marginal effects of covariates with panel data in very general semi- and nonparametric settings.
Surprisingly few papers have studied the behavior of estimators in panel data models when assumptions about the distribution of unobserved heterogeneity are violated. Baltagi (1992) considers misspecification of error components models, and shows that omitting a component may lead to inconsistency. Matyas and Blanchard (1998) conduct a simulation study to assess the impact of misspecification on commonly used estimators in linear panel data models. In a recent paper, Arellano and Bonhomme (2009) consider the impact of misspecification in parametric random effects models. When the distribution of unobservables does not depend upon observables, they show that the bias in common parameters depends on the Kullback-Leibler distance between the true distribution of unobservables and its best approximation in the class of models considered.⁴
A recent alternative and promising approach, pursued in Honore and Tamer (2006), Chernozhukov, Hong, and Tamer (2004), and Chernozhukov, Fernandez-Val, Hahn, and Newey (2009), is to take $T$ as fixed but impose no restrictions on the distribution of unobservables. In general, in this setting, common parameters and marginal effects of covariates are not point-identified but may be restricted to a potentially informative set in the parameter space. This approach is extremely interesting, but at present appears to present a challenging computational problem in all but quite simple models.

4 Their paper conjectures that the result would remain true when the distribution of unobservables depends on $x$. However, given that their main results are based on an asymptotic sequence where $N$ and $T \to \infty$ jointly, this would require approximation of a conditional density where the conditioning argument is increasing in dimension with the sample size.
Our approach is related to the panel structure model and estimator proposed by Sun (2005), who also studies a panel data model based on grouping individuals. Sun (2005) assumes the model is linear with Gaussian errors and that individuals are perfectly classified within a finite number of groups. He then treats group membership as unobserved to the researcher. Our approach applies to general nonlinear models and is based on a sequence of observable grouping structures that classify individuals at coarser and finer levels. Importantly, none of these grouping structures is assumed to perfectly classify individuals: for any finite $N$ and $T$, individual specific effects will differ within groups. Whether our results can be extended to a setting where group structure is estimated by the researcher is a question we leave for future work.
The remainder of the paper is organized as follows. Section 2 describes our modeling framework and estimators and provides general examples. Section 3 presents asymptotic theory, and Section 4 presents a brief Monte Carlo study. Section 5 provides empirical examples, and Section 6 concludes.
2. Model and Examples
Denote observed data as $w_{it}$, where $i = 1, \dots, N$ indexes individuals and $t = 1, \dots, T_i$ indexes time. We consider a panel data model defined by an objective function
$$Q_{NT}(\theta, \alpha_1, \dots, \alpha_N) = \frac{1}{\sum_i T_i} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \varphi(w_{it}, \theta, \alpha_i);$$
i.e., the model is known up to a finite dimensional common parameter, $\theta$, and a set of individual specific parameters $\alpha_i$. For simplicity, in the body of the paper we focus on the case $\varphi = \log f$, where $f$ is a density function with respect to some measure and $f(w, \theta_0, \alpha_{i0})$ is the p.d.f. of $w_{it}$, $\alpha_i$ is a scalar, and the panel is balanced, $T_i \equiv T$, so $\sum_i T_i = NT$. All assumptions and theorems are restated, and proofs given, for general M-estimators in the unbalanced case in the appendix. The fixed effects maximum likelihood estimator is then defined as
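$$\left(\hat\theta_{FE}, \{\hat\alpha_i\}_{i=1}^{N}\right) = \arg\max_{\theta, \{\alpha_i\}_{i=1}^{N}} \frac{1}{N} \sum_{i=1}^{N} Q_{iT}(\theta, \alpha_i), \quad \text{where } Q_{iT}(\theta, \alpha) = \frac{1}{T} \sum_{t=1}^{T} \log f(w_{it}, \theta, \alpha).$$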
We also suppose the researcher has available a sequence of grouping schemes, which we represent for a given $N$ and $T$ by a collection of index sets,

$$I_g^{NT} = \{\, i : \text{individual } i \text{ belongs to group } g \,\}, \quad g = 1, \dots, G_{NT}. \tag{2.1}$$
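One way to produce nested index sets of the form (2.1) from an external classification is to truncate industry codes to a given number of digits, as mentioned in the introduction. The sketch below is only an illustration: the function `index_sets` and the six firm codes are hypothetical. Each extra digit splits existing groups, so the schemes are nested; the groups are not equally sized, which the body of the paper assumes only for simplicity.

```python
from collections import defaultdict

def index_sets(codes, digits):
    """Return {group label: [individual indices]} using the first `digits` characters."""
    groups = defaultdict(list)
    for i, code in enumerate(codes):
        groups[code[:digits]].append(i)
    return dict(groups)

sic = ["2011", "2013", "2086", "3571", "3572", "3674"]  # hypothetical firm codes
for d in (1, 2, 3, 4):
    # coarser grouping (few digits) -> finer grouping (all digits = one firm per group here)
    print(d, index_sets(sic, d))
```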
As we suggest below in examples, these groups may be based on $w_{it}$, or on other observable information such as classification of firms based on industry groupings or households based on geographic locations. For a given $N, T$, we have $G_{NT}$ groups consisting of $N_g$ individuals each. The indexing by $NT$ is due to groups potentially changing as the panel grows along either or both dimensions; for readability, we will drop this indexing for the remainder of the paper. Again for simplicity, in the body of the paper we suppose that groups are equal in size, so that in the balanced panel case we have $N_g = N/G_{NT}$.⁵ As an alternative to the FE estimator, we suppose the researcher considers the group effects (hereafter GE) estimator,
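$$\left(\hat\theta_G, \{\hat\gamma_g\}_{g=1}^{G_{NT}}\right) = \arg\max_{\theta, \{\gamma_g\}_{g=1}^{G_{NT}}} \frac{1}{G_{NT}} \sum_{g=1}^{G_{NT}} Q_{gT}(\theta, \gamma_g), \quad \text{where } Q_{gT}(\theta, \gamma) = \frac{1}{N_g} \sum_{i \in I_g} \frac{1}{T} \sum_{t=1}^{T} \log f(w_{it}, \theta, \gamma).$$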
Note that the GE estimator solves an optimization problem with the same objective function as the FE estimator, subject to the linear constraints that $\alpha_i = \gamma_g$ for all $i \in I_g$ and all $1 \le g \le G_{NT}$. It is obvious, but important to note, that the grouped effects estimator nests the fixed effects estimator when $G_{NT} = N$ and $N_g = 1$ for all $g$, and the pooled estimator when $N_g = N$ and $G_{NT} = 1$.
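The nesting can be checked numerically in the simplest case. The sketch below assumes, for illustration only, a linear Gaussian model $y_{it} = \theta x_{it} + \alpha_i + \varepsilon_{it}$, which is not a model taken from the paper. There, maximizing over $\gamma_g$ amounts to demeaning within groups, so $\hat\theta_G$ is least squares on group-demeaned data. The function `ge_linear` and the simulated design are hypothetical.

```python
import numpy as np

def ge_linear(y, x, groups):
    """Grouped-effects estimate of theta in y_it = theta*x_it + gamma_g + e_it.

    y, x: arrays of shape (N, T); groups: length-N integer array of group labels.
    Profiling out gamma_g (a group-level intercept) is equivalent to demeaning
    y and x over all N_g * T observations in each group.
    """
    y_dm = np.empty_like(y, dtype=float)
    x_dm = np.empty_like(x, dtype=float)
    for g in np.unique(groups):
        rows = groups == g
        y_dm[rows] = y[rows] - y[rows].mean()
        x_dm[rows] = x[rows] - x[rows].mean()
    return float((x_dm * y_dm).sum() / (x_dm ** 2).sum())

rng = np.random.default_rng(0)
N, T, theta0 = 200, 5, 1.0
alpha = rng.normal(size=N)
x = alpha[:, None] + rng.normal(size=(N, T))      # regressor correlated with alpha_i
y = theta0 * x + alpha[:, None] + rng.normal(size=(N, T))

theta_fe = ge_linear(y, x, np.arange(N))          # G_NT = N, N_g = 1: fixed effects
theta_pool = ge_linear(y, x, np.zeros(N, int))    # G_NT = 1, N_g = N: pooled (with intercept)
print(theta_fe, theta_pool)
```

Because $x_{it}$ is built to depend on $\alpha_i$, the pooled estimate absorbs that dependence and is biased upward, while the FE limit of the same function is not.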
2.1. Two Sources of Bias
To understand the large sample behavior of the FE and GE estimators, it is useful to concentrate the individual- and group-level effects out of the problem. To this end, we define
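$$\hat\alpha_{iT}(\theta) = \arg\max_{\alpha} Q_{iT}(\theta, \alpha) \quad \text{and} \quad \hat\theta_{FE} = \arg\max_{\theta} \frac{1}{N} \sum_{i=1}^{N} Q_{iT}(\theta, \hat\alpha_{iT}(\theta)).$$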
Note that, for a given finite $T$, due to sampling error we will in general have $\hat\alpha_{iT}(\theta_0) \neq \alpha_{i0}$. Therefore, with $T$ fixed and $N \to \infty$, we will have
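$$\hat\theta_{FE} \xrightarrow{p} \theta_T, \quad \text{where } \theta_T \equiv \arg\max_{\theta} \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} E\left[\log f(w_{it}, \theta, \hat\alpha_{iT}(\theta))\right],$$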
and in general $\theta_T \neq \theta_0$. This is the source of the incidental parameters problem noted by Neyman and Scott (1948).
5 Like the balanced panel assumption, we drop the assumption of equally sized groups in the appendix.
For fixed $T$, we may view $\hat\theta_{FE}$ as the solution to a misspecified problem, in the sense that if one replaces $\hat\alpha_{iT}(\theta)$ with

$$\bar\alpha_{iT}(\theta) = \arg\max_{\alpha} E\left[Q_{iT}(\theta, \alpha)\right],$$

one would have $\theta_T = \theta_0$. That is, noise in estimation of individual specific parameters causes common parameters to be inconsistent. Intuitively, one gets bias terms of order $1/T$ since each individual specific effect is estimated using $T$ observations. Because the problem is nonlinear, these bias terms enter the probability limit of $\hat\theta_{FE}$, leading to inconsistency when $T$ is fixed.
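The classic Neyman and Scott (1948) example makes the $1/T$ order explicit. In the illustrative model $y_{it} \sim N(\alpha_i, \sigma^2)$ with common parameter $\theta = \sigma^2$ (our choice of example), concentrating out $\alpha_i$ gives $\hat\alpha_{iT} = \bar y_i$, and the FE maximum likelihood estimator of $\sigma^2$ converges to $\sigma^2 (T-1)/T$ as $N \to \infty$ with $T$ fixed, a bias of exactly $-\sigma^2/T$. The function name `fe_sigma2` is hypothetical.

```python
import numpy as np

def fe_sigma2(y):
    """FE MLE of sigma^2 in y_it ~ N(alpha_i, sigma^2); y has shape (N, T).

    The concentrated estimate of alpha_i is the individual mean, so the MLE
    averages squared deviations from individual means, dividing by N*T.
    """
    resid = y - y.mean(axis=1, keepdims=True)
    return float((resid ** 2).mean())

rng = np.random.default_rng(1)
N, sigma2 = 50_000, 1.0
for T in (2, 5, 20):
    alpha = rng.normal(size=N)
    y = alpha[:, None] + rng.normal(scale=np.sqrt(sigma2), size=(N, T))
    print(T, round(fe_sigma2(y), 3), "probability limit:", sigma2 * (T - 1) / T)
```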
The same heuristic argument may be applied to the grouped effects estimator. Define
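$$\hat\gamma_{gT}(\theta) = \arg\max_{\gamma} Q_{gT}(\theta, \gamma) \quad \text{and} \quad \hat\theta_G = \arg\max_{\theta} \frac{1}{G_{NT}} \sum_{g=1}^{G_{NT}} Q_{gT}(\theta, \hat\gamma_{gT}(\theta)),$$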
where, as above, $N_g$ is the number of individuals in group $g$.⁶ Like the fixed effects estimator, one obtains $\hat\theta_G$ as the solution to a misspecified problem. With $T$ fixed and $N \to \infty$, we have
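$$\hat\theta_G \xrightarrow{p} \theta_T, \quad \text{where } \theta_T = \arg\max_{\theta} \lim_{N \to \infty} \frac{1}{G_{NT}} \sum_{g=1}^{G_{NT}} \frac{1}{N_g T} \sum_{i \in I_g} \sum_{t=1}^{T} E\left[\log f(w_{it}, \theta, \hat\gamma_{gT}(\theta))\right],$$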
and again in general $\theta_T \neq \theta_0$. We show below that, via an expansion of $\hat\gamma_{gT}(\theta)$ around $\bar\gamma_{gT}(\theta) = \arg\max_{\gamma} E[Q_{gT}(\theta, \gamma)]$, the grouped effects estimator also suffers from ‘incidental parameters bias’ that is of order $1/(N_g T) = G_{NT}/(NT)$, as each group level effect is being estimated with $N_g T$ observations. Depending on the behavior of $N_g$ as $N$ and $T$ increase, the incidental parameters bias in $\hat\theta_G$ is potentially of (much) smaller order than that in $\hat\theta_{FE}$. Here, however, there is an additional source of bias: replacing $\hat\gamma_{gT}(\theta)$ with $\bar\gamma_{gT}(\theta)$ does not give $\theta_0$. This happens because in general individual effects will differ within groups; i.e., we will not have $\bar\gamma_{gT}(\theta) = \bar\alpha_{iT}(\theta)$. We consider the discrepancy between individual effects within groups under the sup norm,
$$\xi_{NT} = \sup_{g} \sup_{(i,j) \in I_g :\, i \neq j} \left| \alpha_{i0} - \alpha_{j0} \right|,$$
and show in Section 3 that the second source of bias is closely related to $\xi_{NT}$.
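The same Neyman and Scott toy model (our illustration, not the paper's) shows both sources of bias at once. Suppose, hypothetically, that $\alpha_i = \sin(2\pi z_i)$ for an observable $z_i$, and that $G_{NT}$ equally sized groups are formed from adjacent values of $z_i$. The GE estimate of $\sigma^2$ demeans within groups, and for fixed $\{\alpha_i\}$ its expected bias is $-\sigma^2 G_{NT}/(NT)$, the incidental parameters term of order $1/(N_g T)$, plus the average within-group variance of $\alpha_i$, the misspecification term governed by $\xi_{NT}$. The names `ge_sigma2` and `predicted_bias` are hypothetical.

```python
import numpy as np

def ge_sigma2(y, groups):
    """Grouped-effects MLE of sigma^2: squared deviations from group means (all N_g*T obs)."""
    resid = np.empty_like(y, dtype=float)
    for g in np.unique(groups):
        rows = groups == g
        resid[rows] = y[rows] - y[rows].mean()
    return float((resid ** 2).mean())

def predicted_bias(alpha, groups, T, sigma2):
    """E[ge_sigma2] - sigma2 for fixed alpha: -sigma2/(N_g T) plus within-group variance."""
    bias = 0.0
    for g in np.unique(groups):
        a = alpha[groups == g]
        n_g = a.size
        # group g holds n_g of the N individuals, so it gets weight n_g / N
        bias += n_g * (-sigma2 / (n_g * T) + a.var())
    return bias / alpha.size

rng = np.random.default_rng(2)
N, T, sigma2 = 4000, 4, 1.0
z = np.sort(rng.uniform(size=N))            # hypothetical observable used for grouping
alpha = np.sin(2 * np.pi * z)               # effects vary smoothly in z, not constant within groups
y = alpha[:, None] + rng.normal(scale=np.sqrt(sigma2), size=(N, T))

for G in (1, 10, 100, 1000, N):
    groups = np.repeat(np.arange(G), N // G)   # G equally sized groups of adjacent z values
    est_bias = ge_sigma2(y, groups) - sigma2
    print(G, round(est_bias, 4), round(predicted_bias(alpha, groups, T, sigma2), 4))
```

Coarse groupings give a positive bias dominated by pooling, $G_{NT} = N$ reproduces the FE bias $-\sigma^2/T$, and intermediate groupings can keep both terms small.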
6 In this heuristic argument, we ignore the fact that $I_g$, and hence the definition of $\hat\gamma_{gT}$, changes with $N$, as well as the fact that $\hat\gamma_{gT}(\theta) \xrightarrow{p} \bar\gamma_{gT}(\theta) = \arg\max_{\gamma} E[Q_{gT}(\theta, \gamma)]$ with $T$ fixed and $N \to \infty$ for many potential grouping structures. We do note that this provides the possibility of $N \to \infty$, $T$ fixed inference in the grouped effects setup when groups are such that $\bar\gamma_{gT}(\theta) = \bar\alpha_{iT}(\theta)$ for all $i$ and $g$.
2.2. Examples of Restrictions on Unobservables
As part of our assumptions in Section 3, we will need to place restrictions on the behavior of individual effects $\alpha_{i0}$ within groups. Here we present two characterizations of data generating processes that are compatible with our assumptions. Most importantly, note that the restrictions placed on the data generating process are quite different from those of most ‘random effects’ estimators. In particular, both examples allow for fairly general dependence between observables and unobservables, and neither assumes a parametric form for the distribution of $\alpha_i$. Although the two examples are equivalent, we believe it is intuitively helpful to discuss them separately. In both examples, $g = 1, 2, \dots, G_{NT}$ will index groups in a given grouping scheme, while $m = 1, 2, \dots$ will index ‘levels’ at which the data can be grouped.
Example 1. Suppose groups can be represented by a sequence of matrices, $D_{NT}$, where for a given $N$ and $T$, $D_{NT} \in \mathbb{R}^{N \times G_{NT}}$ with typical element $[D_{NT}]_{i,g} = 1(i \in I_g)$. These group membership matrices may be generated, for example, by grouping individuals according to quantiles, quintiles, deciles, etc. of a certain observable or set of observables, or by grouping according to other observable information such as SIC codes to a given number of digits. Consider the infeasible regression of individual level effects on group indicators,
$$(\alpha_{10}, \dots, \alpha_{N0})' = D_{NT}\beta + \nu_{NT}, \tag{2.2}$$
and let $R^2_{NT}$ be the R-squared of this regression. For $\xi_{NT} \to 0$, we must have that the error sum of squares in this regression goes to zero as $G_{NT}$ increases, or equivalently that $R^2_{NT} \to 1$. It turns out that a key ingredient in understanding the bias in $\hat\theta_G$ will be the rate at which this occurs.
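A minimal sketch of the infeasible regression (2.2), under hypothetical assumptions: $\alpha_i = \tanh(2\bar x_i)$ is a smooth function of an individual-level observable $\bar x_i$, and groups are equally sized quantile bins of $\bar x_i$. It builds $D_{NT}$ as in the text and computes $R^2_{NT}$ by least squares; no separate intercept is needed because the columns of $D_{NT}$ sum to a vector of ones. The helpers `membership_matrix`, `quantile_groups`, and `group_r2` are illustrative names.

```python
import numpy as np

def membership_matrix(labels, G):
    """D_NT with [D]_{i,g} = 1(i in I_g), for integer labels in 0..G-1."""
    D = np.zeros((labels.size, G))
    D[np.arange(labels.size), labels] = 1.0
    return D

def quantile_groups(v, G):
    """Assign each individual to one of G equally sized quantile bins of v."""
    ranks = np.argsort(np.argsort(v))       # ranks 0..N-1
    return ranks * G // v.size

def group_r2(alpha, D):
    """R-squared from regressing alpha on the group indicators in D."""
    beta, *_ = np.linalg.lstsq(D, alpha, rcond=None)
    resid = alpha - D @ beta
    dev = alpha - alpha.mean()
    return float(1.0 - resid @ resid / (dev @ dev))

rng = np.random.default_rng(3)
N = 10_000
xbar = rng.normal(size=N)                   # hypothetical individual-level observable
alpha = np.tanh(2 * xbar)                   # effects depend on xbar, so groups are informative
r2 = {G: group_r2(alpha, membership_matrix(quantile_groups(xbar, G), G)) for G in (5, 25, 100)}
print(r2)
```

With $N = 10{,}000$ the bins for $G_{NT} = 5, 25, 100$ are nested, so $R^2_{NT}$ can only rise as groups are refined; how fast it approaches one is the rate the text refers to.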
For comparison, it is useful to consider a very simple benchmark where $\alpha_i \sim$ i.i.d. $N(0, 1)$ and the sequence of (in this case non-nested) groupings $\{I_g^{NT}\}$ is formed for each $N, T$ by assigning each individual $i$ at random to one of $G_{NT}$ equally sized groups. Because in this case the groupings contain no information about individual-specific effects, it is an easy exercise to show that $R^2_{NT} = O_p(G_{NT}/N)$, so that the R-squared of the regression above goes to one only linearly with the number of groups. That is, with uninformative groups, we must have $G_{NT}/N \to 1$, essentially implying that one must run fixed effects to keep this source of bias small. As we show below, in order for $\hat\theta_G$ to be asymptotically unbiased, we will need groupings that contain information about $\alpha_i$.
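The random-grouping benchmark can be checked by simulation. For i.i.d. normal $\alpha_i$ and a fixed assignment, the between-group and within-group sums of squares are independent scaled chi-squares with $G_{NT}-1$ and $N-G_{NT}$ degrees of freedom, so $R^2_{NT} \sim \text{Beta}((G_{NT}-1)/2, (N-G_{NT})/2)$ with mean $(G_{NT}-1)/(N-1)$, consistent with the $O_p(G_{NT}/N)$ rate. The sketch below (function name `group_r2` is hypothetical) computes $R^2_{NT}$ as between-group over total sum of squares, which equals the R-squared of regression (2.2).

```python
import numpy as np

def group_r2(alpha, labels):
    """R-squared of alpha on group indicators = between-group SS / total SS."""
    counts = np.bincount(labels)
    sums = np.bincount(labels, weights=alpha)
    between = (sums ** 2 / counts).sum() - alpha.sum() ** 2 / alpha.size
    total = ((alpha - alpha.mean()) ** 2).sum()
    return float(between / total)

rng = np.random.default_rng(4)
N, reps = 1000, 200
means = {}
for G in (10, 100, 500):
    draws = [group_r2(rng.normal(size=N),                                   # alpha_i iid N(0,1)
                      rng.permutation(np.repeat(np.arange(G), N // G)))     # random equal groups
             for _ in range(reps)]
    means[G] = float(np.mean(draws))
    print(G, round(means[G], 3), "vs (G-1)/(N-1) =", round((G - 1) / (N - 1), 3))
```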
Example 2. Suppose unobservables have an error components structure,
$$\alpha_{i0} = \sum_{m=1}^{\infty} \lambda_m \eta_{g_m(i)}, \tag{2.3}$$
where $g_m(i)$ is a sequence of group assignment functions mapping $\{1, 2, \dots, N\} \to \{1, \dots, G_{NT}\}$, with $g_m(i) = g_m(j) \Leftrightarrow i, j \in I_g^m$ for some $g$. Each $\eta_k$ is assumed i.i.d. with unit variance, and for simplicity is assumed to have compact support. The number of groups is determined by $M_{NT}$, the level at which the researcher chooses to hold individual specific heterogeneity constant (which is a function of the panel size), with $m = 0$ corresponding to pooling over the entire sample ($G_{NT} = 1$). For example, suppose that every time $m$ is increased by one, new groups are formed by dividing each current group into $k$ equal sized blocks of observations. Ignoring integer problems, this gives $G_{NT} = \min\{k^{M_{NT}}, N\}$. In this setting, the behavior of $\xi_{NT}$ may be characterized by the behavior of $\lambda_m$ as $m \to \infty$, since with compactly supported $\eta$ we have
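$$(i, j) \in I_g^{M_{NT}} \;\Rightarrow\; |\alpha_{i0} - \alpha_{j0}| \le \Delta \sum_{m=M_{NT}+1}^{\infty} \lambda_m,$$

where $\Delta$ bounds the width of the support of $\eta$.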
Note that $\lambda_m$ may be thought of as the standard deviation of the error component at group level $m$. At a minimum, $\lambda_m$ will need to be summable in order for $\xi_{NT} \xrightarrow{p} 0$, which requires that the variance of group-level errors goes to zero as one moves to finer grouping schemes.
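A minimal simulation of the error components structure (2.3), under hypothetical choices that are ours: $k = 2$-way splits, $\lambda_m = \rho^m$, $\eta$ uniform on $[-\sqrt{3}, \sqrt{3}]$ (unit variance, compact support, so $\Delta = 2\sqrt{3}$), and the sum truncated at the level $M_{\max}$ where every group is a single individual ($N = k^{M_{\max}}$, with $\lambda_m = 0$ beyond it). It computes $\xi_{NT}$ at each grouping level and checks the bound above; `xi_at_level` is an illustrative name.

```python
import numpy as np

k, M_max, rho = 2, 12, 0.6
N = k ** M_max
half_width = np.sqrt(3.0)              # Uniform(-sqrt3, sqrt3) has unit variance
Delta = 2 * half_width                 # width of the support of eta
rng = np.random.default_rng(5)

idx = np.arange(N)
alpha = np.zeros(N)
for m in range(1, M_max + 1):
    g_m = idx // (N // k ** m)         # g_m(i): block of size N / k^m containing i
    eta = rng.uniform(-half_width, half_width, size=k ** m)
    alpha += rho ** m * eta[g_m]       # lambda_m = rho^m (hypothetical choice)

def xi_at_level(alpha, M, k):
    """xi_NT when grouping at level M: the largest within-group range of alpha."""
    blocks = alpha.reshape(k ** M, -1)  # each row is one level-M group
    return float((blocks.max(axis=1) - blocks.min(axis=1)).max())

for M in (0, 2, 4, 8, M_max):
    bound = Delta * sum(rho ** m for m in range(M + 1, M_max + 1))
    print(M, k ** M, round(xi_at_level(alpha, M, k), 4), "<=", round(bound, 4))
```

Because $\lambda_m$ decays geometrically here, both $\xi_{NT}$ and its bound shrink by roughly a factor $\rho$ per level.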
Note that both examples allow for dependence between observables and unobservables. For example, group-level averages of covariates may differ substantially, such as mean levels of income across counties or states. In addition, groups may be formed by sorting individuals based on some covariates, such as sorting of firms into quintiles based on size and book-to-market. Furthermore, neither example assumes ‘perfect classification’ of individuals, in the sense that unobservables are allowed to differ within groups for any finite $G_{NT}$. As will be shown below, the relative magnitude of biases due to incidental parameters and misspecification of individual level heterogeneity is determined by the rate at which information about unobservables accumulates as groups are added.
3. Asymptotic Theory
The main theorems in our paper establish consistency and asymptotic normality of $\hat\theta_G$ under assumptions about the sampling environment and the behavior of unobservables within the observable group structure. As in the previous section, assumptions are stated for the balanced panel case where groups are equally sized and $\alpha_i$ are scalar. All proofs are given in the appendix.⁷
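Throughout this section we define $E_{it} W_{it} = \int W_{it} \, dF_{it}$ as the expectation with respect to the marginal distribution $F_{it}$ associated with individual $i$ at time $t$, $E_i W = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E_{it} W_{it}$, and $E W = \lim_{N,T \to \infty} \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} E_{it} W_{it}$. For the vector $x = (x_1, \dots, x_m) \in \mathbb{R}^m$, let $\|x\|_1 = \sum_{j=1}^{m} |x_j|$. Let $\varphi_{it}(\theta, \alpha) \equiv \varphi(w_{it}, \theta, \alpha)$, where $\theta \in \Theta \subset \mathbb{R}^k$ and $\alpha \in A \subset \mathbb{R}$. We write $\alpha_0(i)$ to denote a sequence $\{\alpha_{i0}\}_{i \in \mathbb{N}}$ whose elements are in $A$, and let $\ell^\infty(A)$ be the space of such sequences equipped with the norm $\|\alpha_0\| = \sup_{i \in \mathbb{N}} |\alpha_0(i)|$. Finally, define a norm over $\mathbb{R}^m \times \ell^\infty(A)$ as $\|x, \alpha_0\|_{1,\infty} = \|x\|_1 + \sup_{i \in \mathbb{N}} |\alpha_0(i)|$. Denote by $B_\varepsilon(x)$ the ball of radius $\varepsilon$ around $x \in \mathbb{R}^{k+1}$ in the norm $\|\cdot\|_1$, and by $B_\varepsilon(x, \alpha_0(\cdot))$ the ball of radius $\varepsilon$ around $(x, \alpha_0(\cdot)) \in \mathbb{R}^k \times \ell^\infty(A)$ in the norm $\|\cdot\|_{1,\infty}$. For a multi-index $u = (u_1, \dots, u_{k+1})$ of nonnegative integers, define the differentiation operator $D^u \phi(\theta, \alpha) = \partial^{\|u\|_1} \phi / \partial\theta_1^{u_1} \cdots \partial\theta_k^{u_k} \, \partial\alpha^{u_{k+1}}$.

7 In the appendix, assumptions are restated to apply to unbalanced panels, unequally sized groups, and $\dim(\alpha_i) \ge 1$, and proofs are given under these more general assumptions.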
Assumption C. Assume that $\Theta \times A \subset \mathbb{R}^{k+1}$ is compact in $\|\cdot\|_1$. As $N \to \infty$ and $T \to \infty$ jointly, which we denote by $N, T \to \infty$, the following hold:
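(i) $\{w_{it}, \alpha_0(i)\}$ are independent across $i$. For each $i$, $w_{it}$ is strong mixing with mixing coefficients $a_i(m)$, and $\exists\, \tau \in 2\mathbb{N}$ and $r > \tau$ such that $\sup_i |a_i(m)| \le C m^{\frac{(1-\tau) r}{r - \tau} - \varepsilon}$, where $0 < C < \infty$ and $\varepsilon > 0$.
(ii) $\exists\, M(w_{it})$ with $\sup_{i,t} E_{it} M(w_{it})^{\tau + \varepsilon} \le \Delta < \infty$ such that, $\forall\, (\theta, \alpha), (\tilde\theta, \tilde\alpha) \in \Theta \times A$ and $\|u\|_1 \le 3$, $|D^u \varphi_{it}(\theta, \alpha) - D^u \varphi_{it}(\tilde\theta, \tilde\alpha)| \le M(w_{it}) \|(\theta, \alpha) - (\tilde\theta, \tilde\alpha)\|_1$ and $\sup_{\Theta \times A} |D^u \varphi_{it}(\theta, \alpha)| < M(w_{it})$.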
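(iii) For each $i$ and $\varepsilon_1 > 0$, and for each $\varepsilon_2 > 0$,

$$\lim_{T \to \infty} E_i \varphi_{it}(\theta_0, \alpha_{i0}) - \sup_{(\theta, \alpha) \notin B_{\varepsilon_1}(\theta_0, \alpha_{i0})} \lim_{T \to \infty} E_i \varphi_{it}(\theta, \alpha) > 0,$$

$$\lim_{N,T \to \infty} \sum_{i,t} E_{it} \varphi_{it}(\theta_0, \alpha_{i0}) - \sup_{(\theta, \alpha) \notin B_{\varepsilon_2}(\theta_0, \alpha_0(\cdot))} \lim_{N,T \to \infty} \sum_{i,t} E_{it} \varphi_{it}(\theta, \alpha) > 0.$$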
Assumption C consists of standard conditions used to verify consistency of M-estimators that are sufficient to establish consistency of the FE estimator. Parts (i) and (ii) are mixing and moment conditions on the data, respectively, while part (iii) assumes unique maximization of the population objective function as $T \to \infty$ for each $i$ and as $N, T \to \infty$. Our next assumption is about the sequence of grouping schemes that define the GE estimator. Though it is used to verify consistency, we state this assumption separately for discussion purposes.
Assumption G’. There exists a sequence of partitions of the data into groups such that for each $N, T$, we have $G_{NT}$ equally sized groups defined by index sets of the form (2.1). The number of individuals per group is $N_g = \sum_i 1(i \in I_g)$, and we define $\bar N_g = \frac{1}{G_{NT}} \sum_g N_g \equiv N/G_{NT}$. We define $\xi_{NT} = \sup_g \sup_{i,j \in I_g} |\alpha_0(i) - \alpha_0(j)|$. As $N, T \to \infty$,

(i) $\left(\frac{\bar N_g - 1}{\bar N_g}\right) \xi_{NT} \to 0$.
Assumption G’ is a high level condition which asserts that either the maximum discrepancy between individuals within groups goes to zero as the sample size increases, or that the data are eventually grouped at the individual level.⁸ We show below how the former condition can be verified in the two examples discussed in Section 2.2. We now state our basic consistency result for $\hat\theta_G$.
Proposition 1. Under Assumptions C and G’, $\left(\hat\theta_G, \{\hat\gamma_g\}\right) \xrightarrow{p} \left(\theta_0, \{\alpha_{i0}\}\right)$, where convergence is in the norm $\|\cdot\|_{1,\infty}$.
Proposition 1 is proven in the appendix under general conditions including unbalanced panels and vector $\alpha_i$. Note that under Assumptions C and G’, we can consistently estimate both common parameters and individual specific parameters for each $i$ with the grouped effects estimator. This result follows from the fact that the fixed effects estimator is consistent as $T \to \infty$ regardless of $N$ for sensible models such that Assumption C(iii) is satisfied, and that Assumption G’ implies that either one is eventually running fixed effects or unobserved effects are “close enough” within groups to be well-estimated by group effects. Note that in a fixed-$T$ environment, the fixed effects and grouped effects estimators would generally have different probability limits, and the grouped effects estimator could remain consistent with sufficient “smoothness” in the individual effects within the grouping structure.
We now state additional assumptions that allow us to establish asymptotic normality of $\hat\theta_G$. Before proceeding, it is useful to define
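$$H_g^{\alpha\alpha}\left(\theta, \{\alpha_i\}_{i \in I_g}\right) = \frac{1}{N_g T} \sum_{i \in I_g} \sum_{t=1}^{T} E_{it}\left(\frac{\partial^2 \varphi_{it}(\theta, \alpha_i)}{\partial \alpha^2}\right),$$

$$H^{\theta\theta}\left(\theta, \{\alpha_i\}_{i=1}^{N}\right) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} E_{it}\left(\frac{\partial^2 \varphi_{it}(\theta, \alpha_i)}{\partial \theta \, \partial \theta'}\right),$$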
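and write $H_{g0}^{\theta\theta} = H_g^{\theta\theta}(\theta_0, \{\alpha_{i0}\})$, and similarly for $H_{g0}^{\alpha\alpha}$. Quantities such as $H_g^{\alpha\theta}$ are defined analogously, with superscripts denoting differentiation of $\varphi$ and subscript $g$ denoting averaging only at the group level. Also define the Hessian for the common parameter $\theta$ in the problem with $\alpha_i$ concentrated out,

$$J_{NT} = \frac{1}{G_{NT}} \sum_{g=1}^{G_{NT}} \left[ H_{g0}^{\theta\theta} - H_{g0}^{\theta\alpha} \left(H_{g0}^{\alpha\alpha}\right)^{-1} H_{g0}^{\alpha\theta} \right].$$

8 Recall that with $N_g = 1$ for all $g$, the researcher is running fixed effects.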
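Finally define $S_{it}^* = u_{it}^\theta - H_{g0}^{\theta\alpha} \left(H_{g0}^{\alpha\alpha}\right)^{-1} u_{it}^\alpha$, where $u_{it}^\theta = \frac{\partial \varphi_{it}(\theta_0, \alpha_{i0})}{\partial \theta} - E_{it} \frac{\partial \varphi_{it}(\theta_0, \alpha_{i0})}{\partial \theta}$, i.e., the difference between the score for $\theta$ and its expectation, and $u_{it}^\alpha$ is the same quantity for $\alpha$.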
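The concentrated Hessian $J_{NT}$ is an average of Schur complements. The sketch below uses randomly generated negative definite $(k+1) \times (k+1)$ matrices as stand-ins for the group Hessians $H_{g0}$ (hypothetical inputs, not estimates from any model) and checks the standard block-inverse identity: each bracketed term in $J_{NT}$ equals the inverse of the $\theta\theta$ block of $H_{g0}^{-1}$. Since $\alpha$ is scalar, $(H_{g0}^{\alpha\alpha})^{-1}$ is a division. The name `concentrated_hessian` is illustrative.

```python
import numpy as np

def concentrated_hessian(H_tt, H_ta, H_aa):
    """J_NT = (1/G) sum_g [H_tt_g - H_ta_g (H_aa_g)^{-1} H_at_g] for scalar alpha.

    H_tt: (G, k, k); H_ta: (G, k); H_aa: (G,). Uses H_at_g = H_ta_g' (symmetric Hessian).
    """
    schur = H_tt - np.einsum("gi,gj->gij", H_ta, H_ta) / H_aa[:, None, None]
    return schur.mean(axis=0)

rng = np.random.default_rng(6)
G, k = 4, 3
blocks = []
for _ in range(G):
    A = rng.normal(size=(k + 1, k + 1))
    blocks.append(-(A @ A.T + (k + 1) * np.eye(k + 1)))   # negative definite stand-in
H = np.stack(blocks)                       # (G, k+1, k+1): theta first, alpha last
J = concentrated_hessian(H[:, :k, :k], H[:, :k, k], H[:, k, k])

# Block-inverse check: each Schur complement is the inverse of the theta block of H_g^{-1}.
J_check = np.mean([np.linalg.inv(np.linalg.inv(Hg)[:k, :k]) for Hg in H], axis=0)
print(np.allclose(J, J_check))
```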
Assumption G. Assumption G’ holds, with G’(i) replaced by (i) below, together with the additional conditions listed below.
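(i) $\sqrt{NT} \left(\frac{\bar N_g - 1}{\bar N_g}\right) \xi_{NT}^2 \to 0$
(ii) $G_{NT}/\sqrt{NT} \to 0$
(iii) $|dF_{it}(w) - dF_{jt}(w)| < C(w) |\alpha_0(i) - \alpha_0(j)|$ for some $C$ with $\sup_{i,t} C(w)/dF_{it}(w) \le M(w)$.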
Establishing asymptotic normality requires strengthening Assumption G’ in two important ways, corresponding to the two sources of bias discussed in Section 2.1. First, either the square of the sup-norm discrepancy between $\alpha_0(i)$ within groups or the ratio $(\bar N_g - 1)/\bar N_g$ must be small enough that their product vanishes faster than $1/\sqrt{NT}$. This strengthening of Assumption G’ is required so that the ‘pooling bias’ caused by individual effects differing within groups does not enter the asymptotic distribution of $\hat\theta_G$. The $(\bar N_g - 1)/\bar N_g$ term in G(i) means that in the case where groups are not sufficiently informative, the researcher may still be able to satisfy Assumption G(i) by running fixed effects. Second, Assumption G(ii) stipulates that the number of groups grows more slowly than $\sqrt{NT}$, which ensures that bias due to incidental parameters does not enter the asymptotic distribution of the GE estimator. Recalling that the FE estimator sets $N_g = 1$ and $G_{NT} = N$, we see that fixed effects estimators satisfy G(i)-(ii) when $N/\sqrt{NT} = \sqrt{N/T} \to 0$, or in other words when $T$ grows faster than $N$. This is a well known necessary condition (cf. Hahn and Kuersteiner (2002) for the dynamic linear model and Hahn and Newey (2004) for the general nonlinear case) for the asymptotic distribution of $\hat\theta_{FE}$ to be correctly centered. Assumption G(iii) is a technical condition on smoothness of individual specific marginal distributions in the parameter $\alpha_i$.⁹
When $N_g > 1$, there is a natural tradeoff between Assumptions G(i) and G(ii). As we shall see in the examples below, they provide lower and upper bounds, respectively, on the growth rate of $G_{NT}$.
9 Note that, in the likelihood case, Assumption C(ii) requires Lipschitz continuity of the log likelihood and its derivatives, while G(iii) requires Lipschitz continuity of the likelihood in $\alpha$. For convenience, we assume that in the case where A is continuous, $F_{it}$ has a density, so that $|dF_{it} - dF_{jt}|$ may be interpreted in the obvious way.
This tradeoff between the growth rates permitted by G(i) and G(ii) is precisely the tradeoff between bias due to pooling and bias due to incidental parameters. It is the focus of the simulation experiments and empirical examples in Sections 4 and 5, where we argue that understanding it is critical to understanding the finite sample properties of group effects estimators.
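Assumption N. As $N \to \infty$ and $T \to \infty$ jointly, the following hold:
(i) $\frac{1}{\sqrt{NT}} \sum_{i=1}^{N} \sum_{t=1}^{T} E_{it} \frac{\partial \phi_{it}(\theta_0, \alpha_{i0})}{\partial \theta} \to 0$ and $\sup_g \left| \frac{1}{\sqrt{N_g T}} \sum_{i \in I_g} \sum_{t=1}^{T} E_{it} \frac{\partial \phi_{it}(\theta_0, \alpha_{i0})}{\partial \alpha} \right| \to 0$.
(ii) Let $\Omega_{iT} = \mathrm{Var}\left(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} S^*_{it}\right)$, and let $\lambda_{iT}$ be the minimum eigenvalue of $\Omega_{iT}$. Assume $\inf_i \inf_T \lambda_{iT} > 0$ and that $\Omega = \lim_{N,T \to \infty} \frac{1}{N} \sum_{i=1}^{N} \Omega_{iT}$ exists.
(iii) $\lim_{N,T \to \infty} H^{\theta\theta}(\theta, \alpha_0(i))$ exists for all $(\theta, \alpha_0(i)) \in \Theta \times \ell^{\infty}(A)$. $\lim_{N,T \to \infty} H^{\alpha\alpha}_{g0}$ and $\lim_{N,T \to \infty} H^{\theta\alpha}_{g0}$ exist for all $g$.
(iv) Letting $\lambda_g$ be the minimum eigenvalue of $H^{\alpha\alpha}_{g0}$, we have $\inf_g \lambda_g \ge \delta > 0$, where $\delta$ does not depend on $N, T$.
(v) $J = \lim_{N,T \to \infty} J_{NT}$ exists and has minimum eigenvalue $\lambda_J \ge \delta > 0$.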
Assumption N consists of standard conditions used to verify asymptotic normality of M-estimators. With the exception of allowing $N_g \ge 1$ in N(i) and the $g$ subscripts in N(iii) and N(iv), these conditions are essentially identical to those required to establish asymptotic normality of the FE estimator. Notice that Assumptions C, G, and N allow for very general dependence in the time series direction.
Proposition 2. Under Assumptions C, G, and N,

$$\sqrt{NT}\left(\hat{\theta}_G - \theta_0\right) \xrightarrow{d} N\left(0, J^{-1} \Omega J^{-1}\right).$$
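Computationally, $J_{NT}$ is an average over groups of Schur complements, and the limiting variance in Proposition 2 is the sandwich $J^{-1}\Omega J^{-1}$. The sketch below (Python with NumPy, not the authors' code) assumes the per-group Hessian blocks $H^{\theta\theta}_{g0}$, $H^{\theta\alpha}_{g0}$ and $H^{\alpha\alpha}_{g0}$ have already been evaluated and stacked into arrays, and that $H^{\alpha\theta}_{g0}$ is the transpose of $H^{\theta\alpha}_{g0}$; the function names are hypothetical.

```python
import numpy as np

def concentrated_hessian(H_tt, H_ta, H_aa):
    """J_NT = (1/G) * sum_g [H_tt_g - H_ta_g (H_aa_g)^{-1} H_at_g].

    H_tt: (G, p, p), H_ta: (G, p, q), H_aa: (G, q, q) stacked group blocks.
    H_at_g is taken to be the transpose of H_ta_g (an assumption of this sketch).
    """
    H_tt = np.asarray(H_tt, dtype=float)
    H_ta = np.asarray(H_ta, dtype=float)
    H_aa = np.asarray(H_aa, dtype=float)
    H_at = np.swapaxes(H_ta, 1, 2)            # (G, q, p)
    # Solve H_aa_g X_g = H_at_g group by group rather than inverting H_aa_g.
    X = np.linalg.solve(H_aa, H_at)           # (G, q, p)
    schur = H_tt - H_ta @ X                   # (G, p, p): one Schur complement per group
    return schur.mean(axis=0)                 # average over the G groups

def sandwich_variance(J, Omega):
    """Asymptotic variance J^{-1} Omega J^{-1} of sqrt(NT)(theta_G - theta_0)."""
    J_inv = np.linalg.inv(J)
    return J_inv @ Omega @ J_inv
```

In practice the population blocks would be replaced by sample analogues evaluated at the estimates; the sketch only shows the algebra behind $J_{NT}$ and the sandwich form.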
Proposition 2 establishes that $\hat{\theta}_G$ is asymptotically normal and unbiased; it is proven in the appendix under more general conditions. It is worth comparing the result in Proposition 2 to similar results for the fixed effects estimator as in, for example, Hahn and Kuersteiner (2004). Under an asymptotic sequence where $N/T \to c < \infty$, we have $\sqrt{NT}\left(\hat{\theta}_{FE} - \theta_0\right) \xrightarrow{d} N\left(cB, J^{-1} \Omega J^{-1}\right)$, where $cB$ is the bias resulting from incidental parameters. We see that the fixed effects estimator and the grouped effects estimator are both asymptotically normal with the same variance but different centering under this sequence. Specifically, the fixed effects estimator is biased due to incidental parameters while the grouped effects estimator, which exploits “smoothness” in the underlying unobserved effects, is not. Exploiting the assumed smoothness allows the grouped effects estimator to remain correctly centered and asymptotically normal in situations where fixed effects or bias-corrected fixed effects are dominated by bias, or may even diverge when centered around the true parameter value and normalized by the sample size. We provide additional discussion below in the context of our illustrations.
The key assumptions underlying Proposition 2 are Assumptions G(i) and G(ii), which are related to the two sources of bias discussed in Section 2.1. Note that Assumption G(ii) is purely about how quickly the number of groups may increase as observations are added to the sample. Assumption G(i) is implicitly a restriction on the data generating process, as it presumes it is possible to group individuals so that the maximum within-group discrepancy in the unobservable effects goes to zero sufficiently quickly as groups are added. We discuss both of these restrictions in the context of the two examples introduced in Section 2.2.
Example 1 (continued). As above, let $R^2_{NT}$ be the R-squared of the (infeasible) regression (2.2) of the individual-specific effects $\alpha_{i0}$ on the $G_{NT}$ dummy variables indicating group membership. Suppose that, as individuals are added to the panel, the researcher chooses to add groups at rate $G_{NT} = N^{\delta}$.
Suppose the sampling scheme is such that $N = O(T^{\rho})$. In this case, Assumption G(ii), which controls bias due to incidental parameters, requires that $T^{\rho\delta}/T^{\frac{1}{2}(1+\rho)} \to 0$, or equivalently $\delta < \frac{1}{2}\left(1 + \rho^{-1}\right)$. That is, the faster N increases relative to T, the more slowly groups must be added to avoid incurring asymptotic biases due to noisy estimates of group-level effects.
It is instructive to compare Assumption G(ii) with the conditions required for the asymptotic distribution of the fixed effects estimator to be correctly centered. Since $G_{NT} = N$ for the fixed effects estimator, Assumption G(ii) would imply $N/T \to 0$, that is, T increases faster than N (or equivalently $\rho < 1$), which is a well-known condition for asymptotic unbiasedness of $\hat{\theta}_{FE}$. When T and N increase at the same rate ($\rho = 1$), we have $\sqrt{NT}\left(\hat{\theta}_{FE} - \theta_0\right) \xrightarrow{d} N\left(cB, J^{-1} \Omega J^{-1}\right)$. That is, the asymptotic distribution of the fixed effects estimator is incorrectly centered, with bias $cB$ arising from the incidental parameters problem when $N/T \to c < \infty$. Hahn and Kuersteiner (2004) propose an estimate of this bias and show that the resulting bias-corrected fixed effects estimator is asymptotically unbiased when $\rho = 1$. Under the additional assumption that $w_{it}$ is i.i.d. for each i, Hahn and Newey (2004) show that a similar bias-corrected fixed effects estimator is asymptotically unbiased when $\rho < 3$.
Returning to $\hat{\theta}_G$, assume that as groups are added, $\xi_{NT} = O(G_{NT}^{-\kappa/2})$, so that $1 - R^2_{NT} = O(G_{NT}^{-\kappa})$. Recall that, in an example where groups are uninformative, $R^2_{NT}$ behaves like $G_{NT}/N$. For consistency, Assumption G’(i) requires that either $\kappa > 0$ or $N_g \to 1$; that is, either groups contain some information about unobservable effects or the researcher will eventually run fixed effects.10 For $\hat{\theta}_G$ to be asymptotically normal and unbiased, Assumption G(i) requires (since $\sqrt{NT}\,\xi^2_{NT} = O(T^{\frac{1}{2}(1+\rho) - \kappa\delta\rho})$) that $\frac{1}{2}(1 + \rho) < \kappa\delta\rho$, or equivalently $\kappa\delta > \frac{1}{2}\left(1 + \rho^{-1}\right)$. Combining this with the upper bound on $\delta$ from G(ii), we see immediately that, in order for Assumptions G(i) and G(ii) to be compatible, we must have $\kappa > 1$. In the case where $\delta = \rho^{-1}$, that is, $G_{NT}$ grows at the same rate as T, the requirement $\kappa > 1$ amounts to saying that the error from a projection of unobservables onto group dummies is smaller than the sampling error, $1/\sqrt{T}$, that would arise from estimating each unobservable effect separately.
Consider the case $\rho = 1$, where $\hat{\theta}_{FE}$ is asymptotically biased but the bias correction proposed by Hahn and Kuersteiner (2004) results in an asymptotically unbiased estimator. In this case, Assumption G(ii) simply requires $\delta < 1$, and Assumption G(i) requires $\delta > 1/\kappa$. That is, to avoid bias due to incidental parameters appearing in the asymptotic distribution of $\hat{\theta}_G$, we must have $G_{NT}$ growing more slowly than N, and, similarly, to avoid biases due to misspecification of heterogeneity we must have $G_{NT}$ growing faster than $N^{1/\kappa}$. As groups become less informative ($\kappa \downarrow 1$), we are forced to add groups at a rate very close to the rate at which individuals are added to the panel, and with $\kappa = 1$, both fixed and group effects estimators are asymptotically biased.11
Finally, consider the case $\rho = 3$, where even with bias correction and $w_{it}$ assumed i.i.d. for each i, fixed effects estimators are still biased asymptotically. Assuming for simplicity that $\kappa = 4/3$, Assumptions G(i) and G(ii) require that $\frac{1}{2} < \delta < \frac{2}{3}$. In other words, it is possible for $\hat{\theta}_G$ to be asymptotically unbiased in a setting where, to our knowledge, any estimator that attempts to estimate all individual-level parameters will be biased asymptotically.
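The rate conditions in Example 1 reduce to an interval for $\delta$. As a minimal sketch, assuming $N \asymp T^{\rho}$, $G_{NT} = N^{\delta}$, $1 - R^2_{NT} = O(G_{NT}^{-\kappa})$ and $(N_g - 1)/N_g$ bounded away from zero, the hypothetical helper below returns the open interval of $\delta$ allowed by G(i) and G(ii). It uses exact rational arithmetic so that the cases $\rho = 1$ and $\rho = 3$ discussed above can be checked directly.

```python
from fractions import Fraction

def delta_bounds(kappa, rho):
    """Open interval (lower, upper) of delta allowed by G(i) and G(ii) in Example 1.

    Assumes N ~ T**rho, G_NT = N**delta, and 1 - R^2_NT = O(G_NT**(-kappa)).
    G(ii): rho*delta < (1 + rho)/2        <=> delta < (1 + 1/rho)/2
    G(i):  kappa*delta*rho > (1 + rho)/2  <=> delta > (1 + 1/rho)/(2*kappa)
    Returns None when no delta satisfies both (which happens exactly when kappa <= 1).
    """
    kappa, rho = Fraction(kappa), Fraction(rho)
    upper = (1 + 1 / rho) / 2     # incidental-parameter bound from G(ii)
    lower = upper / kappa         # pooling-bias bound from G(i)
    return (lower, upper) if lower < upper else None
```

For example, `delta_bounds(Fraction(4, 3), 3)` gives the interval from 1/2 to 2/3 found above for $\rho = 3$, `delta_bounds(2, 1)` gives 1/2 to 1 for $\rho = 1$, and `delta_bounds(1, 1)` returns `None`, reflecting that with $\kappa = 1$ no growth rate for the number of groups works.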
Example 2 (continued). Consider the error components structure (2.3) with a sequence of groupings indexed by m, where m = 0 denotes the entire sample (G = 1). Suppose that for each m, the (m+1)st grouping scheme is obtained by dividing each current group into k equally sized subgroups. Letting $M_{NT}$ be the level at which the researcher decides to group individuals, we have $G_{NT} = \min\{k^{M_{NT}}, N\}$. Assumption G(ii) implies that, in order for biases due to incidental parameters to go to zero, we must have $\log NT - 2M_{NT}\log k \to \infty$. In other words, when groups increase this quickly as one moves up hierarchical levels, $M_{NT}$ must grow more slowly than $\log_k \sqrt{NT}$, the base-k logarithm of $\sqrt{NT}$.12 We can still have Assumption G(i) satisfied in this case; for example, when the standard deviation of error components at hierarchical level m behaves like $\lambda_m = O(e^{-\omega m})$, the squared discrepancy left after grouping at level $M_{NT}$ is of order $e^{-2\omega M_{NT}}$, so Assumption G(i) requires that $4\omega M_{NT} - \log NT \to \infty$, which is compatible with G(ii) when $\omega > \frac{1}{2}\log k$.

10 For the remaining discussion we rule out the case $N_g \to 1$, so that bias due to pooling depends on $\xi_{NT}$.
11 In theory, it is possible to bias correct $\hat{\theta}_G$ as well. We do not pursue this extension here, as the theory and our simulations suggest this will only improve the performance of group effects estimators when the grouping structure contains very little information about $\alpha_i$, in which case we would advocate the bias-corrected fixed effects estimator.
12 Note that if $M_{NT}$ is chosen such that $G_{NT}$ grows faster than N, we assume the researcher runs fixed effects, and G(ii) again reduces to $\log NT - 2\log N \to \infty$, i.e., T grows faster than N.
This setting is essentially equivalent to the previous example. To see this, note that $\sum_{m=M_{NT}+1}^{\infty} \lambda_m^2 = O(e^{-2\omega M_{NT}})$ and that here $M_{NT} = \log G_{NT}/\log k$, so the residual sum of squares from a regression of the $\alpha_{i0}$ on a set of group-level dummy variables would be at most $O(G_{NT}^{-2\omega/\log k})$. The requirement for G(i) and G(ii) to be compatible here, $\omega > \frac{1}{2}\log k$ or equivalently $2\omega/\log k > 1$, is therefore sufficient to guarantee that $1 - R^2_{NT}$ for the regression (2.2) vanishes faster than $G_{NT}^{-1}$, i.e., $\kappa > 1$, as we discussed above for Example 1. We could then set $M_{NT} = \log N^{\delta}/\log k$, so that $G_{NT} = N^{\delta}$, and proceed to derive restrictions on $\delta$ in the same fashion as before.
4. Monte Carlo
The previous section provides an asymptotic framework in which we can show that the grouped effects estimator of the common parameters in the model presented in Section 2 is asymptotically normal and unbiased. This asymptotic approximation relies on two important restrictions that control the two sources of bias discussed in Section 2.1. First, groups must be added sufficiently slowly to control bias due to the accumulation of incidental parameters. Second, the error from a projection of unobservables onto group-level dummies must go to zero sufficiently quickly to control bias due to pooling individuals with different unobservable effects into groups. Our results suggest that group effects estimators will perform better in finite sample situations in which a researcher has reasonably good a priori information about unobservables, specifically in the form of a grouping structure where variation in unobservables is well explained by classifying individuals into a number of groups that is reasonably small relative to the sample size.
In this section, we complement the asymptotic analysis of the previous section with simulation evidence regarding the performance of grouped effects estimators relative to fixed effects and bias-corrected fixed effects implemented using the correction of Hahn and Kuersteiner (2004). The simulation results are obtained within the context of a simple probit model in which unobserved effects are generated according to a hierarchical model. Let $\mathcal{G} = \{5, 10, 20, 50, 100, 200, 500\}$, and let $G_m$ refer to the mth element of $\mathcal{G}$. For each $G_m \in \mathcal{G}$ and all $g \in \{1, \ldots, G_m\}$, set $N_{g,m} \equiv N/G_m$, and let $g_m(i) = \sum_g g\,\mathbf{1}\left((g-1)N_{g,m} < i \le g N_{g,m}\right)$. We generate data from the model
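$$\begin{aligned}
&x_{it}, u_{it}, \varepsilon_{it}, \eta_{g_m,m}, \nu_i \sim N(0,1) \text{ i.i.d.} \\
&y^*_{it} = \alpha_i + \beta x_{it} + \sigma_\varepsilon \varepsilon_{it} \\
&y_{it} = \mathbf{1}\left(y^*_{it} > 0\right) \\
&\alpha_i = \sum_{m=1}^{7} \kappa_m \eta_{g_m(i),m} + \sigma_\nu \nu_i \\
&\sigma^2_\varepsilon = \sum_m \kappa_m^2 + \sigma_\nu^2
\end{aligned}$$

for $T \in \{2, 8\}$, $N \in \{200, 1000\}$, $m \in \{1, \ldots, 7\}$, and $g_m \in \{1, \ldots, G_m\}$. We report estimates of the common parameter, $\beta$, for several configurations of the other parameters described below, in each case relative to a population value $\beta_0 = 1$.13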
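A minimal sketch of this data generating process in Python with NumPy follows; it is an illustration, not the code used for the reported tables. It assumes $g_m(i) = \lceil i\,G_m/N \rceil$, which is what the indicator sum above evaluates to, and uses 0-based group labels. The R-squared helper computes the fit of the (infeasible) regression of $\alpha_i$ on a full set of group dummies, whose fitted values are the group means. All function names are hypothetical.

```python
import numpy as np

G_LEVELS = (5, 10, 20, 50, 100, 200, 500)   # G_1, ..., G_7 from the text

def group_index(N, G):
    """0-based g_m(i) for i = 1..N: consecutive blocks of (possibly fractional) size N/G."""
    i = np.arange(1, N + 1)
    return (i * G + N - 1) // N - 1         # integer form of ceil(i / (N/G)) - 1

def simulate_alpha(N, kappa, sigma_nu, rng):
    """alpha_i = sum_m kappa_m * eta_{g_m(i), m} + sigma_nu * nu_i."""
    alpha = sigma_nu * rng.standard_normal(N)
    for G, k_m in zip(G_LEVELS, kappa):
        eta = rng.standard_normal(G)        # one shock per group at this level
        alpha += k_m * eta[group_index(N, G)]
    return alpha

def simulate_panel(N, T, beta, kappa, sigma_nu, rng):
    """Draw (y, x, alpha) from the probit design; y and x have shape (N, T)."""
    alpha = simulate_alpha(N, kappa, sigma_nu, rng)
    sigma_eps = np.sqrt(np.sum(np.square(kappa)) + sigma_nu ** 2)
    x = rng.standard_normal((N, T))
    eps = rng.standard_normal((N, T))
    y_star = alpha[:, None] + beta * x + sigma_eps * eps
    return (y_star > 0).astype(int), x, alpha

def r2_on_groups(alpha, labels):
    """R^2 from regressing alpha on a full set of group dummies (fitted values = group means)."""
    counts = np.bincount(labels)
    means = np.bincount(labels, weights=alpha) / np.maximum(counts, 1)
    resid = alpha - means[labels]
    tss = np.sum((alpha - alpha.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / tss
```

In the N = 200 hierarchical design, the 100-group labels (pairs of consecutive individuals) are nested in every coarser level with nonzero $\kappa_m$, so the R-squared at that level equals one, while the 5-group R-squared is strictly below one.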
By varying the parameters in $\alpha_i = \sum_{m=1}^{7} \kappa_m \eta_{g_m(i),m} + \sigma_\nu \nu_i$, we can control the strength of the relationship between a proposed grouping scheme and the true unobserved individual-specific effects. We consider three different specifications, which we term “hierarchical”, “mixed”, and “random effects”. In the hierarchical design, we set $\sigma_\nu = 0$ and set $\kappa = (1, .5, .25, .1, 0, 0, 0)$ when N = 200 and $\kappa = (1, .5, .25, .1, .05, .02, 0)$ when N = 1000. In this case, all of the heterogeneity is generated by group-level effects and $R^2$ quickly increases to one as groups are added down the hierarchical structure. We consider this a baseline, best-case scenario in which one would expect the grouped effects approach to work very well. In the mixed model, we set $\sigma_\nu = .25$ and $\kappa = (1, .75, .5, 0, 0, 0, 0)$. In this case, there is variation in the unobserved effects at the individual level, and the $R^2$ of the regression of the true unobserved effects on group dummies increases to one very slowly after the first three levels of the hierarchy are controlled for, though the $R^2$ is already .967 at that point. We expect the group effects estimator to also work quite well in this case since, for the sample sizes considered, the specification error should be small relative to sampling variation, which is intuitively what Assumption G(i) requires. We believe this is a very empirically relevant case, representative of a situation in which grouping is not perfect but is quite informative about the underlying structure of unobserved effects. The random effects specification sets $\sigma_\nu = 1$ and $\kappa_j = 0$ for all j. Here groups are uninformative about the true unobserved effects, and $R^2$ will only approach one if one uses a large number of groups such that $G_{NT}/N \approx 1$. In this case, for large enough T, fixed effects should dominate the grouped effects estimator for any grouping structure that is not arbitrarily close to fixed effects. However, for small and moderate T, it is not obvious that one estimator should outperform the other.
We report simulation results for the grouped effects estimator for a variety of numbers of groups. With N = 200, we construct grouped effects estimators using each number of groups in the set {5, 10, 20, 50, 100}; when N = 1000, we construct grouped effects estimators using each number of groups in the set {5, 10, 20, 50, 100, 200, 500}. We also report results from pooled probit, which corresponds to a grouped effects estimator with G = 1; fixed effects, which is the grouped effects estimator with G = N; and bias-corrected fixed effects using the bias correction of Hahn and Kuersteiner (2004). All results are based on 1000 Monte Carlo replications.

13 Note that in this example, the parameter $\beta$ is identified only up to scale. We normalize $\sigma_\varepsilon = 1$ in the estimation and rescale estimates by $\sigma_\varepsilon$ so that estimates are always compared to a true value $\beta_0 = 1$. Note that $\sigma^2_\varepsilon$ is scaled so that the variance of $y^*_{it}$ conditional on $x_{it}$ and $\alpha_i$ is equal to the variance of $\alpha_i$, so the bias of the pooled probit estimator is similar in magnitude across all designs.
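A grouped effects probit can be written as a single likelihood with one intercept per group, so pooled probit and fixed effects probit are the two extreme labelings. The sketch below is an illustration under stated assumptions, not the authors' implementation: it maximizes the probit log-likelihood with SciPy's `minimize` (BFGS with an analytic gradient), omits time effects, and applies no rescaling by $\sigma_\varepsilon$ as described in footnote 13. With fixed effects and T = 2, some individuals' outcomes are perfectly predicted and their intercepts diverge; this sketch makes no attempt to drop such observations.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def grouped_probit(y, x, labels):
    """Probit MLE of P(y_it = 1) = Phi(beta * x_it + alpha_{g(i)}) with one intercept per group.

    y, x: (N, T) arrays; labels: (N,) integers 0..G-1 giving g(i).
    labels all zero gives pooled probit; labels = arange(N) gives fixed effects probit.
    Returns (beta_hat, alpha_hat, maximized log-likelihood).
    """
    N, T = y.shape
    G = int(labels.max()) + 1
    g = np.repeat(labels, T)                  # group of each (i, t), matching ravel order
    yv = y.ravel().astype(float)
    xv = x.ravel().astype(float)
    q = 2.0 * yv - 1.0                        # +1 if y = 1, -1 if y = 0

    def negloglik_and_grad(params):
        beta, alpha = params[0], params[1:]
        z = q * (beta * xv + alpha[g])
        mills = np.exp(norm.logpdf(z) - norm.logcdf(z))   # phi(z)/Phi(z), stable in the tails
        d_index = -q * mills                               # derivative of -log Phi(z) w.r.t. the index
        grad = np.empty_like(params)
        grad[0] = np.sum(d_index * xv)
        grad[1:] = np.bincount(g, weights=d_index, minlength=G)
        return -np.sum(norm.logcdf(z)), grad

    res = minimize(negloglik_and_grad, np.zeros(G + 1), jac=True, method="BFGS")
    return res.x[0], res.x[1:], -res.fun
```

Pooled probit is `grouped_probit(y, x, np.zeros(N, dtype=int))` and fixed effects probit is `grouped_probit(y, x, np.arange(N))`. With N = 1000 the latter has 1001 parameters, for which a Newton method exploiting the block structure of the Hessian would typically be faster than BFGS.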
An important practical problem is group selection. We consider two commonly used information criteria that may be useful in deciding between grouping schemes: AIC and BIC.14 We consider AIC and BIC because they are extremely simple to compute and are commonly employed in other areas for model selection. The simulation results presented below also suggest that they may be useful in choosing grouping structures that deliver reasonable finite sample performance across all of the designs considered. A more formal analysis of group selection is an interesting and important extension of our present results but is beyond the scope of the current paper.
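Selecting a grouping by information criterion only requires the maximized log-likelihood and the parameter count of each candidate fit. The sketch below uses the BIC of footnote 14, $-2\log L + K^*\log(NT)$, and the standard AIC, $-2\log L + 2K^*$; the AIC formula is our assumption, since the text does not spell it out. The function names are hypothetical.

```python
import math

def aic(loglik, n_params):
    """Standard AIC: -2 log L + 2 K* (assumed form)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """BIC as in footnote 14: -2 log L + K* log(NT), with K* including all group effects
    and NT counting every observation, even ones a fixed effects fit would drop."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def select_grouping(fits, n_obs, criterion="aic"):
    """fits maps a grouping label to (maximized log-likelihood, K*); returns the minimizing label."""
    if criterion == "aic":
        score = lambda ll, k: aic(ll, k)
    elif criterion == "bic":
        score = lambda ll, k: bic(ll, k, n_obs)
    else:
        raise ValueError("criterion must be 'aic' or 'bic'")
    return min(fits, key=lambda name: score(*fits[name]))
```

With made-up fits `{"G=1": (-1000.0, 2), "G=10": (-950.0, 11), "G=100": (-850.0, 101)}` and NT = 400, AIC selects `"G=100"` while BIC's heavier penalty selects the coarser `"G=10"`; the numbers are purely illustrative.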
4.1. Simulation Results
We report results from our simulation experiments in Tables 1-3. Table 1 gives the results for the hierarchical design, Table 2 for the mixed design, and Table 3 for the random effects design. In each table, the column label indicates which grouping scheme was used. “FE” and “FE-BC” denote fixed effects and bias-corrected fixed effects, respectively. Columns labeled “1”-“500” use the grouped effects estimator with the corresponding number of groups. Columns labeled “AIC” and “BIC” provide results based on using the estimator that minimizes AIC or BIC, respectively, in each simulation iteration. For each estimator, we report bias, root mean squared error (RMSE), the fraction of simulation replications where the estimator was chosen by AIC (AIC %), the fraction of simulation replications where the estimator was chosen by BIC (BIC %), the size of 5% level tests based on clustering at the individual level (SIZE_N), and the size of 5% level tests using five clusters corresponding to the coarsest grouping scheme (SIZE_5).
Looking first at Table 1, we see a number of interesting results. The grouped effects estimators, with the exception of the simple pooled probit with G = 1, uniformly dominate fixed effects and bias-corrected fixed effects on all reported criteria. We also see that, as theory would suggest, the bias and RMSE of fixed effects and bias-corrected fixed effects decrease with T but are roughly invariant as N increases. Note, however, that inference based on the fixed effects estimator deteriorates as N increases for a given T. This phenomenon has been noted in the literature but is often ignored in practice. Intuitively, while it is true that the bias and RMSE of fixed effects estimators become ‘small’ when T is large, for inference to be reliable this bias, which behaves like 1/T, must be small relative to the sampling error in the common parameters, which behaves like $1/\sqrt{NT}$. Even with bias correction, inference based on fixed effects estimators may suffer from substantial size distortions when T is small relative to N. The grouped effects estimators, by making use of “smoothness” of the unobserved effects in the grouping structure, avoid this problem and tend to have small bias that does not dominate the sampling error even for small T and large N. We also see that using standard errors clustered at a broad level offers some robustness in terms of size of tests, relative to clustering at the individual level, when less than optimal numbers of groups are considered.

14 We recognize that there is some ambiguity in defining BIC in the present context. We choose to use $BIC = -2\log(\text{likelihood}) + K^*\log(NT)$, where $K^*$ is the total number of parameters in the model including individual or grouped effects and NT is the total number of observations. Note that we use $K^*$ and NT based on the complete data set, including observations that would be dropped in, say, obtaining fixed effects estimates because they are perfectly predicted by the fixed effects. We also note that using log(NT) as the penalty may over-penalize complexity depending on the specifics of the problem but is a common choice; see, e.g., StataCorp (2007) Reference A-H pp. 169-173.
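The comparison of bias and sampling error can be made concrete with a back-of-the-envelope calculation. Assuming, purely for illustration, that the bias is exactly $b/T$ and the standard error exactly $s/\sqrt{NT}$ for constants $b$ and $s$, their ratio is $(b/s)\sqrt{N/T}$. The hypothetical helper below evaluates it for the (N, T) pairs used in the Monte Carlo.

```python
import math

def bias_to_se_ratio(N, T, b=1.0, s=1.0):
    """Ratio of a bias of size b/T to a standard error of size s/sqrt(N*T).

    b and s are hypothetical constants; the ratio equals (b/s) * sqrt(N/T).
    """
    return (b / T) / (s / math.sqrt(N * T))

for N in (200, 1000):
    for T in (2, 8):
        print(f"N={N:5d} T={T}: bias/SE ~ {bias_to_se_ratio(N, T):.1f}")
```

With b = s the ratio is 10 for (N, T) = (200, 2) and 5 for (200, 8), but roughly 22.4 for (1000, 2) and 11.2 for (1000, 8): holding T fixed, the bias looms larger relative to the noise as N grows, which is why tests centered on fixed effects estimates over-reject more for larger N.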
While the grouped effects estimators dominate fixed effects and pooled probit, we do see considerable variation in the performance of the grouped effects estimators across grouping schemes. This suggests that group selection may play an important role in determining the finite sample performance of the estimator. The simulation results do show that both AIC and BIC are useful for group selection in this design. Using either AIC or BIC to select the grouping structure produces an estimator with good bias and MSE properties as well as tests that have close to correct size. BIC seems to do slightly better than AIC in the small N setting, but both perform very well across the board in this design.
Results from the mixed design, reported in Table 2, are quite similar to those from the pure hierarchical design discussed above. We once again see that fixed effects and bias-corrected fixed effects are uniformly dominated by the grouped effects estimators across all criteria considered, and we again see that the performance of fixed effects based inference deteriorates rapidly as N increases for fixed T. Unlike in the previous case, there is a significant individual-specific component that is not absorbed by the grouping schemes in this design. This has little effect on the overall results because, after controlling for group-level effects, the variation in individual effects within groups is small relative to sampling error. It is true that for N and T large enough, the specification error incurred by estimating a group effects model with the group structure we use in the simulation would result in a breakdown of our asymptotic approximation, but we see strong evidence that the approximation is quite good in the sample sizes considered. This good performance highlights the usefulness of our approach. Importantly, we view this asymptotic environment not as a literal description of the sampling process, but as a means to think about estimation and inference in circumstances where a researcher has reasonable but imperfect information about the predictability of unobserved effects given group membership. Finally, we also see that AIC and BIC are both effective in choosing estimators with good estimation and inference properties in this design.
The final set of results is from a Gaussian random effects model in which the individual-specific parameters are not predictable within any of our grouping schemes. This is clearly a worst-case scenario for our approach, in that the only way specification error can be made small is by having the number of groups be approximately equal to N. It is thus not surprising that the grouped effects estimators are no longer uniformly dominant within this design. Nor is it particularly surprising that none of the considered estimators do very well. In the experiments with T = 8, bias-corrected fixed effects is comparable, though slightly inferior, to the best considered grouped effects estimator in terms of bias and RMSE, and better in terms of inference properties. With T = 2, it is clearly best to use a grouping scheme with fewer than N groups despite the obvious specification error. Looking at the results displayed in the table and extrapolating, it also appears that there would be grouping schemes, outside the set of grouping schemes we considered, that would be preferred to bias-corrected fixed effects. In this case, there is also a clear and strong distinction between AIC and BIC in terms of group selection, with AIC outperforming BIC in each case except the small sample with N = 200 and T = 2, where AIC is superior in terms of bias and size of tests but BIC produces an estimator with smaller MSE.
Overall, the simulation results are quite favorable for the grouped effects approach. Sensible grouping strategies with fewer than N groups outperform fixed effects and bias-corrected fixed effects in almost every case considered, the only exceptions being the cases with T = 8 in the random effects design, where the grouped effects assumptions are grossly violated. The resulting estimators also tend to have good bias and MSE properties and to perform relatively well in terms of inference. It is also encouraging that AIC and BIC are successful in helping to choose a grouping scheme with reasonable finite sample properties in the hierarchical and mixed designs, and that AIC chooses essentially the best available estimator in the random effects design, though none of the estimators we consider performs particularly well in this case. The results clearly show that grouped effects estimators may significantly outperform fixed effects estimators, and they suggest that a more formal and systematic treatment of data-dependent group selection may yield interesting results.
5. Empirical Examples
In this section, we apply the grouped effects strategy in two empirical examples. In the first, we consider the association of firm cash flow, asset tangibility, size, net worth, and market to book with whether a firm has a line of credit, as in Sufi (2009). The goal of the analysis is to isolate the direct effects of these variables on firm access to a line of credit, in order to provide insight into the types of market friction that may make lines of credit a poor liquidity substitute for cash for some firms. This analysis is complicated by the presence of firm-level characteristics, such as corporate governance, that are both related to a firm’s cash flows and possible credit constraints and are difficult to observe or measure. Simply including firm-specific effects to account for such factors is complicated by the binary nature of the dependent variable and the short time span available. In this example, we argue that firms that have similar realizations of observable variables over the sample period may have similar values of unobservables, and we construct groups based on partitioning the independent variables. In the second example, we follow Roulstone (2006) in studying the extent to which corporate insiders trade on short-term information about how future earnings surprises will affect their firm’s share price. Again, this analysis is complicated by unobserved firm-level characteristics that may influence the incentives faced by firm executives. This heterogeneity in incentives might lead one to want to control for firm-specific effects. However, one might also wish to focus on a narrow time window to avoid results being influenced by changes in the regulatory environment. Again, we believe the grouped effects approach is appealing here, as the relevant firm-specific unobservables are likely relatively constant within sets of firms in the same industry and/or with similar realized characteristics over the sample period.
5.1. Bank Lines of Credit
The dependent variable in our first application is a binary variable that is one if a firm has access to a bank line of credit. Understanding the factors associated with firms having access to bank lines of credit is an important ingredient in understanding firms’ corporate finance decisions and what types of frictions may exist in the market for firm credit. Specifically, there is a theoretical and empirical literature on firm cash holdings arguing that firms that are constrained in the credit market should retain cash to be able to pursue investment opportunities in periods in which they are unable to raise sufficient external financing, though this literature is silent on what types of market frictions may lead to these constraints; see Almeida, Campello, and Weisbach (2004) and Faulkender and Wang (2006) for recent examples. There is also a theoretical literature arguing that bank lines of credit are a financial product designed to overcome exactly the types of market frictions discussed in the cash literature; see, for example, Holmstrom and Tirole (1998). It thus seems useful to try to understand which factors are associated with a firm’s having a line of credit. This question has been addressed in a recent paper, Sufi (2009), which we largely follow here. Our brief analysis complements the detailed analysis in Sufi (2009) by allowing for firm-specific heterogeneity.
For our analysis, we estimate models of the form

$$P(y_{it} = 1 \mid x_{it}, \text{Firm} = i) = \Phi\left(x_{it}'\beta + \gamma_t + \alpha_{g(i)}\right),$$

where $\Phi(\cdot)$ is the standard normal distribution function, $x_{it}$ is a vector of observed characteristics for firm i at time t, $\gamma_t$ is a time-specific effect, and $\alpha_{g(i)}$ is an unobserved group-specific effect. The vector $x_{it}$ consists of EBITDA scaled by non-cash total assets to measure firm cash flow, tangible assets scaled by non-cash assets, the natural logarithm of non-cash total assets to measure firm size, net worth scaled by non-cash assets, the market to book ratio, and a vector of dummies for one-digit SIC classification. The firm characteristics are motivated by the theoretical literature mentioned above and are meant to be associated with firms facing a high cost of external finance relative to internal finance. The empirical specification is identical to Sufi (2009) with the exception of our additional term $\alpha_{g(i)}$. Our results also differ in that our sample period is different; we use annual firm-level data from 2002-2003 (T = 2).15 This provides us with a sample of 3648 firms and a total of 7034 observations. Further details regarding data sources and data construction may be found in Sufi (2009).16
We consider a variety of specifications for $\alpha_{g(i)}$, ranging from pooled probit, with $\alpha_{g(i)} = \alpha$ for all i, to fixed effects probit, with $\alpha_{g(i)} = \alpha_i$. For the intermediate grouping schemes, we consider forming groups based on the realized x’s. Grouping based on the x’s is motivated by the simple belief that firms that have similar observables will also have similar values of unobserved firm-specific heterogeneity; similar beliefs have been used fruitfully in other areas of economics (see, for example, Altonji, Elder, and Taber (2005)). Specifically, we form groups based on the within-firm sample averages of the observed x’s, putting two firms into the same group if their average x’s fall into the same percentile regions. For example, a grouping scheme with four groups may be based on whether the average tangible assets and net worth for a particular firm during the sample period are above or below the median within-firm average tangible assets and median within-firm average net worth across all firms. The first of four groups in such a scheme would comprise firms whose average tangible assets and average net worth during the sample period are both above the sample medians of within-firm average tangible assets and within-firm average net worth across all firms, with the remaining three groups constructed in the obvious fashion. We consider a variety of such grouping schemes, ranging from four groups to a potential 3125 groups.17 Motivated by our simulation results, we use AIC to choose among the various grouping schemes, including fixed effects and pooled probit.18

15 Sufi (2009) uses annual data from 1996 to 2003.
16 We thank Amir Sufi for kindly providing us with the data used in this example.
17 The potential 3125 groups come from considering all possible cells using all five variables split at the quintiles. Not surprisingly, many of these cells are empty in the actual example.
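The percentile-cell grouping described above can be sketched as follows, assuming the within-firm averages have already been computed and stored one row per firm. The helper is hypothetical: it splits each column at its sample quantiles (a value exactly at a cut point goes to the upper cell, an arbitrary tie rule of this sketch), encodes each combination of bins as a single integer, and relabels the non-empty cells consecutively, so empty cells simply never appear.

```python
import numpy as np

def quantile_cell_groups(firm_means, n_bins):
    """Group firms by the quantile cells of their within-firm covariate averages.

    firm_means: (n_firms, p) array, one row of sample-period averages per firm.
    n_bins: length-p sequence, e.g. (2, 2) for median splits of two variables.
    Returns consecutive integer labels for the non-empty cells.
    """
    firm_means = np.asarray(firm_means, dtype=float)
    combined = np.zeros(firm_means.shape[0], dtype=np.int64)
    for j, b in enumerate(n_bins):
        col = firm_means[:, j]
        cuts = np.quantile(col, np.linspace(0.0, 1.0, b + 1)[1:-1])  # interior cut points
        code = np.searchsorted(cuts, col, side="right")              # bin 0..b-1; ties go up
        combined = combined * b + code                               # mixed-radix cell id
    _, labels = np.unique(combined, return_inverse=True)             # drop empty cells
    return labels.ravel()
```

Splitting five variables at their quintiles gives $5^5 = 3125$ potential cells, and splitting three variables into thirds and two at the quartiles gives $3^3 \cdot 4^2 = 432$, matching the counts quoted in footnotes 17 and 19; the labels returned here would then be passed to a grouped effects probit as the $g(i)$.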
In Table 4, we report the results from this exercise for pooled probit, fixed effects and bias-corrected fixed effects probit using the correction of Hahn and Kuersteiner (2004), and the AIC-minimizing grouped effects estimator, with standard errors clustered at the firm level.19 Looking across the table, we see that the fixed effects point estimates are quite different from the pooled probit and grouped effects estimates. There is relatively little difference between the fixed effects and bias-corrected fixed effects point estimates, as we would expect from our simulation as well as from the asymptotic theory underlying bias-corrected fixed effects. Relative to the other estimates, the fixed effects estimates are also extremely imprecise. Using our T = 2 simulation results as a rough guide suggests that both the fixed effects and bias-corrected fixed effects estimates suffer from substantial bias.
Comparing pooled probit to the grouped effects estimator, we see that the signs all agree and that the coefficients are not of wildly different magnitudes. We do see that in some cases the estimates and precision of the grouped effects estimator differ from pooled probit in important ways. For example, using pooled probit one would conclude that there is a significant association, at usual significance levels, between tangible assets and having a line of credit and between net worth and having a line of credit, but one would not based on the grouped effects estimator. We also see that the coefficients on EBITDA and market to book from the grouped effects estimator are substantially attenuated relative to pooled probit, though both remain statistically significant at conventional levels. Grouped effects estimates of average marginal effects20 also generally differ from pooled probit, suggesting smaller marginal effects of the covariates on the probability of having a line of credit than predicted by pooled probit. As such, our preferred estimates are broadly consistent with the results and findings of Sufi (2009), though the estimated effects are somewhat smaller than he reports, suggesting that the findings of his paper are fairly robust to the presence of firm-specific heterogeneity.
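The average marginal effect in footnote 20 is a simple average of scaled normal densities. A minimal sketch follows, assuming the observations are stacked in long format with integer time and group indices; averaging over all stacked rows is our assumption for how the 1/NT normalization applies to the unbalanced sample. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def average_marginal_effects(X, beta, gamma_t, alpha_g, time_idx, group_idx):
    """(1/NT) * sum_{i,t} beta_j * phi(x_it' beta + gamma_t + alpha_{g(i)}), for every j.

    X: (n_obs, p) stacked observations; time_idx, group_idx: (n_obs,) integer indices
    into gamma_t and alpha_g. NT is taken to be the number of stacked rows.
    """
    index = X @ beta + gamma_t[time_idx] + alpha_g[group_idx]
    return beta * norm.pdf(index).mean()   # one average marginal effect per coefficient
```

Because every coefficient is multiplied by the same average density, the ratio of any two average marginal effects equals the ratio of the corresponding coefficients; differences between pooled probit and grouped effects marginal effects therefore reflect both the coefficients and the fitted indices.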
¹⁸ In total, we considered 44 different grouping schemes formed from various splits on the x-variables. Details are available upon request.
¹⁹ Using AIC, we select a model with 432 potential groups based on splitting tangible assets, EBITDA, and market to book into thirds and net worth and size at the quartiles. This results in a total of 405 non-empty groups.
²⁰ We calculate the average marginal effect of variable $x_j$ as $\frac{1}{NT}\sum_i\sum_t \beta_j\,\phi(x_{it}'\beta + \gamma_t + \alpha_{g(i)})$, where $\phi(\cdot)$ is the standard normal density function.
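The average marginal effect in footnote 20 is a plain average over observations of $\beta_j\phi(\cdot)$. The sketch below computes it for given parameter values; it is a minimal illustration rather than the code behind Table 4, the function and argument names are hypothetical, and the numbers in the usage lines are made up.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def average_marginal_effect(j, beta, x, gamma, alpha, group):
    """Average over all (i, t) of beta_j * phi(x_it' beta + gamma_t + alpha_g(i)).

    x[i][t]  : covariate vector (list of floats) for firm i in period t
    gamma[t] : time effect for period t
    alpha[g] : group effect for group g
    group[i] : group label g(i) of firm i (hypothetical data layout)
    """
    total, n_obs = 0.0, 0
    for i, x_i in enumerate(x):
        for t, x_it in enumerate(x_i):
            index = sum(b * v for b, v in zip(beta, x_it)) + gamma[t] + alpha[group[i]]
            total += beta[j] * std_normal_pdf(index)
            n_obs += 1
    # n_obs equals NT in a balanced panel; with unbalanced data this averages over observations
    return total / n_obs


# Toy usage with made-up numbers: two firms in different groups, two periods, two covariates.
beta = [0.5, -0.2]
x = [[[0.0, 1.0], [1.0, 0.0]], [[0.5, 0.5], [0.0, 0.0]]]
ame_0 = average_marginal_effect(0, beta, x, gamma=[0.0, 0.1],
                                alpha={"a": -0.3, "b": 0.2}, group=["a", "b"])
```

Because $\phi$ peaks at zero, each average marginal effect is bounded in absolute value by $|\beta_j|\phi(0)$ and has the sign of $\beta_j$.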
5.2. Insider Trading
In our second application, we consider the relationship between corporate insiders' decisions to buy or sell own-company stock and insiders' private information about how future earnings surprises will affect the firm's share price. There is evidence that insiders trade profitably on nonpublic information prior to takeovers,²¹ but empirical evidence on the relationship between insider trading and earnings announcements is mixed. Studying this relationship is complicated by the changing regulatory environment, which makes it difficult to use long time series data sets. In addition, it is a priori quite plausible that unobserved firm-specific factors, such as corporate governance and earnings management, are related both to insider trading activity and to the behavior of the firm's share price around earnings announcement dates. We attempt to address these concerns by limiting ourselves to a very short time span, over which the regulatory regime is hopefully fairly constant, and by controlling for firm-specific heterogeneity using grouped effects schemes based on 4-digit SIC code and information about market value of equity, institutional ownership, and asset turnover. Our brief analysis also provides a useful complement to the closely related work of Roulstone (2006), who examines the quantity of insider trading using OLS with firm-specific fixed effects and a Tobit model without firm-specific effects.
For our analysis, we estimate separate probit models for insider buys and insider sales. Insider buys are defined as an indicator variable equal to one if there were any purchases of own-company stock by corporate insiders (defined as top officers and directors) during the period starting one day after the prior quarter's earnings announcement and ending one day before the current quarter's earnings announcement; insider sales are defined similarly. For either buys or sales, the estimated model takes the form
$$P(y_{it} = 1 \mid x_{it}, \text{Firm} = i) = \Phi(x_{it}'\beta + \gamma_t + \alpha_{g(i)})$$
where $\Phi(\cdot)$ is the standard normal distribution function, $x_{it}$ is a vector of observed characteristics for firm $i$ at time $t$, $\gamma_t$ is a time-specific effect, and $\alpha_{g(i)}$ is an unobserved group-specific effect. The vector $x_{it}$ consists of the cumulative abnormal return over the three days -1, 0, and 1 relative to the earnings announcement date ($CAR_{it}$), $CAR^2_{it}$, $CAR_{i,t-1}$, $CAR_{i,t+1}$, unexpected earnings²²
²¹ See, e.g., Meulbroek (1992).
²² Unexpected earnings is defined either as I/B/E/S-reported actual earnings minus the mean analyst forecast of earnings or, if there is no analyst forecast, as actual earnings less actual earnings four quarters previously. Unexpected earnings are scaled by the stock price ten days prior to the earnings announcement.
($UE_{it}$), $UE^2_{it}$, $UE_{i,t-1}$, $UE_{i,t+1}$, and a vector of other control variables.²³ For our analysis, we focus on the variables CAR, CAR², UE, and UE², which are meant to capture the news of the earnings announcement; note that, given the timing, we are asking whether current news forecasts past insider trading activity. Our empirical specification is similar to Roulstone (2006) with the exception of our additional term $\alpha_{g(i)}$ and slight changes in the control variables. Our results also differ in that our sample period is different: we use quarterly firm-level data from 1999 (T = 4).²⁴ Our total sample consists of 18,527 observations across 5,582 firms. Further details regarding data sources and data construction may be found in Roulstone (2006).²⁵
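The grouped effects probit above is estimated by maximum likelihood over $(\beta, \gamma, \alpha_1, \ldots, \alpha_G)$. As a minimal sketch (pure Python, hypothetical names, not the authors' code), the log-likelihood for given parameter values can be written as follows. Giving every firm the same group label reproduces pooled probit, and giving each firm its own label reproduces fixed effects probit.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def grouped_probit_loglik(y, x, beta, gamma, alpha, group):
    """sum_i sum_t log P(y_it | x_it) under P(y_it = 1) = Phi(x_it' beta + gamma_t + alpha_g(i)).

    y[i][t] in {0, 1}, x[i][t] a list of covariates, gamma[t] a time effect,
    alpha[g] a group effect and group[i] = g(i); firms in a group share alpha[g].
    """
    ll = 0.0
    for i, (y_i, x_i) in enumerate(zip(y, x)):
        for t, (y_it, x_it) in enumerate(zip(y_i, x_i)):
            index = sum(b * v for b, v in zip(beta, x_it)) + gamma[t] + alpha[group[i]]
            # 1 - Phi(z) = Phi(-z) avoids cancellation when Phi(z) is close to one
            prob = std_normal_cdf(index if y_it == 1 else -index)
            ll += math.log(max(prob, 1e-300))
    return ll
```

An optimizer would then maximize this function; the number of free $\alpha_g$ is what the choice of grouping scheme controls.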
As in the previous example, we considered a variety of different grouping schemes and selected among them using AIC. In addition to pooled probit and fixed effects probit, we considered groups based on one-, two-, three-, and four-digit SIC code, as well as groups formed by interacting SIC code with splits on the control variables market value of equity, institutional ownership, and asset turnover. Groups using the x's are constructed as in the previous example, and we refer the reader to the discussion there for details.²⁶
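Footnote 19 describes groups formed by splitting covariates into thirds or quartiles and interacting the splits, and this example also interacts such splits with SIC codes. One way to sketch that construction is below. It assumes each firm is summarized by a single value per variable (for example, a time average), which is our assumption rather than something stated in the text; the helper names are hypothetical.

```python
import math
from bisect import bisect_right


def quantile_cutpoints(values, n_bins):
    """Interior cut points that split `values` into n_bins bins of roughly equal size."""
    s = sorted(values)
    return [s[(len(s) * k) // n_bins] for k in range(1, n_bins)]


def make_group_labels(rows, bins_per_var, categorical=()):
    """Return one group label (a tuple) per row.

    rows         : list of dicts, one per firm (firm-level values of the x's)
    bins_per_var : {variable: number of quantile bins}, e.g. 3 for thirds, 4 for quartiles
    categorical  : variables used as-is (e.g. an SIC code) and interacted with the bins
    """
    cuts = {v: quantile_cutpoints([r[v] for r in rows], k) for v, k in bins_per_var.items()}
    labels = []
    for r in rows:
        cat_part = tuple(r[c] for c in categorical)
        bin_part = tuple(bisect_right(cuts[v], r[v]) for v in bins_per_var)
        labels.append(cat_part + bin_part)
    return labels


def count_groups(bins_per_var, labels):
    """(potential cells from the quantile splits alone, number of non-empty groups)."""
    return math.prod(bins_per_var.values()), len(set(labels))
```

With thirds on three variables and quartiles on two, `math.prod([3, 3, 3, 4, 4])` gives the 432 potential cells mentioned in footnote 19; only cells that actually contain firms become groups.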
We report estimation results from pooled probit, fixed effects and bias-corrected fixed effects probit using the correction of Hahn and Kuersteiner (2004), and the AIC-minimizing grouped effects estimator, with reported standard errors clustered at the firm level, in Table 5.²⁷ As in the previous example, the signs of the estimates generally line up, though there are important differences in magnitudes. For insider buys, fixed effects is AIC-preferred to pooled probit, which is not true in any of the other examples. In all cases, we do see a distinct interior minimum of AIC away from both fixed effects and pooling.
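Selecting a grouping scheme by AIC amounts to comparing $-2\log L + 2k$ across candidate schemes, where $k$ counts the common parameters plus the group effects of each scheme. A minimal sketch with a BIC variant ($-2\log L + k\log n$) for comparison follows; the candidate log-likelihoods and parameter counts in the usage lines are hypothetical and are not taken from Table 5.

```python
import math


def information_criterion(loglik, n_params, n_obs=None, kind="aic"):
    """AIC = -2 log L + 2k; BIC = -2 log L + k log(n)."""
    if kind == "aic":
        return -2.0 * loglik + 2.0 * n_params
    if kind == "bic":
        return -2.0 * loglik + math.log(n_obs) * n_params
    raise ValueError(f"unknown criterion: {kind}")


def select_scheme(candidates, kind="aic", n_obs=None):
    """candidates: {scheme name: (maximized log-likelihood, number of parameters)}."""
    scores = {name: information_criterion(ll, k, n_obs, kind)
              for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical numbers chosen only to show an interior minimum under AIC.
candidates = {"pooled": (-1000.0, 10), "sic4": (-900.0, 60), "fixed effects": (-850.0, 300)}
best, scores = select_scheme(candidates)
```

With these toy values AIC picks the intermediate scheme, while BIC's heavier penalty on the many group effects favors pooling; the two criteria need not agree in practice.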
Looking first at the results for insider buys, we see robust evidence across all specifications that insider purchases of own-company stock are related to information in future earnings announcements as measured by CAR and UE. Intuitively, if insiders are able to trade profitably on private information, we should expect the marginal effect of each variable on
²³ The other control variables are turnover of company stock, an indicator for whether any analysts follow the firm, the number of analysts following the firm, the return on the firm's stock from six months to two days prior to the announcement day minus the market return over the same period, the return on the firm's stock over the period two days to six months after the announcement minus the market return over the same period, the percentage of institutional ownership, the natural logarithm of the firm's market value of equity ten days prior to the announcement, and the book to market ratio.
²⁴ Roulstone (2006) uses all available quarterly data from 1980 to 2002.
²⁵ We thank Darren Roulstone for providing us with the data used in our analysis.
²⁶ In this example, we considered 23 different grouping schemes. Details are available upon request.
²⁷ In this case, AIC is minimized by simply grouping at the four-digit SIC level for both buys and sales, which results in a total of 425 groups.
the probability of insider trading activity to be positive in the case of buys and negative in the case of sales. For CAR and CAR², the estimated effects are statistically strong and of the theoretically expected sign across all reported models; for UE, the signs are as theoretically expected, though the estimate of the coefficient on the second-order term is not statistically strong. There are rather substantial differences in magnitudes between the fixed effects and grouped effects estimates, which our simulation results suggest are largely driven by small-sample bias in the fixed effects estimates. The grouped effects estimates strongly indicate a moderate but statistically strong relationship between insider buying of own-company stock and price moves around future earnings announcements, which clearly suggests that insiders are making trading decisions based on non-public information about company performance.
The results are much more muddled regarding insider selling. The difficulty in finding an effect on insider sales is not surprising and is consistent with the existing literature. For example, Roulstone (2006) argues that the impact of earnings announcements on insider selling may be hard to identify due to liquidity trades (e.g., selling by insiders for portfolio rebalancing purposes). In our results, the estimated average marginal effects of both CAR and UE are economically small across all estimators considered. Again, the fixed effects point estimates are quite different from the grouped effects estimates, and the fixed effects estimates are the only estimates that suggest a statistically significant effect of CAR or UE on insider sales. Our simulations and the theory of fixed effects estimators with small T suggest that this association is likely spurious. Looking at the grouped effects estimates, one could not rule out, at any conventional significance level, either no relationship or a (theoretically) wrong-signed relationship between insider sales and the earnings variables.
Overall, our preferred group effects results are consistent with a small but quite robust effect of future earnings announcement returns on insiders' trading decisions. The results are qualitatively quite similar to those reported in Roulstone (2006), who uses the same basic data over a much longer time span and with a very different specification that does not control for firm effects. Thus, our results complement and further strengthen the results and analysis in that paper.
6. Conclusion
This paper has analyzed group effects estimators for nonlinear panel data models with a finite-dimensional common parameter and time-invariant individual-specific effects that are unobserved by the econometrician. Group effects estimators hold individual-level heterogeneity constant according to an observed grouping structure and may be thought of as intermediate between pooled and fixed effects estimators. We provided conditions under which group effects estimators of the common
parameter are asymptotically unbiased. These conditions suggest a tradeoff between two sources of asymptotic bias: one due to the well-known incidental parameters problem suffered by fixed effects estimators, and another arising from discrepancies in individual-level effects within groups, that is, misspecification of the structure of unobservable heterogeneity. We illustrated this tradeoff in two examples and in a set of simulations suggesting that group effects estimators may perform significantly better in finite samples than pooling or fixed effects. We also considered the group effects approach in empirical studies of firm lines of credit and insider trading.
The results in this paper may be extended in several interesting ways. First, following Sun (2005), one can consider a setup where the group structure is unobservable or only partially observable to the econometrician. Second, one may wish to consider bias correction of group effects estimators. Correction of biases due to incidental parameters should follow from approaches similar to those for fixed effects estimators, e.g., Hahn and Newey (2004) and Hahn and Kuersteiner (2004). Finally, though AIC and BIC seem to perform well in selecting grouping schemes in our simulations and empirical examples, we have not yet found a formal justification for this procedure. Our results suggest these may be interesting avenues for future research.
7. Appendix
For a random variable $W_{it}$, let $E_{it}[W_{it}] = \int W_{it}\,dF_{it}$ be the expectation with respect to the marginal distribution of the data for individual $i$ at time $t$, and let $E_iW = \frac{1}{T_i}\sum_{t=1}^{T_i}E_{it}[W_{it}]$, where $T_i$ is the number of observations for individual $i$. Let $T = \frac{1}{N}\sum_{i=1}^{N}T_i$. Throughout, $\alpha$ and $|\alpha|$ denote a real scalar (or vector) and its absolute value (the sum of the absolute values of its elements), while $\alpha(\cdot)$ and $\|\alpha(\cdot)\| = \sup_{i\in\mathbb{N}}|\alpha(i)|$ denote a sequence of real numbers (vectors) and its supremum norm. For a real vector $\theta$, define $\|(\theta,\alpha(\cdot))\| = |\theta| + \|\alpha(\cdot)\|$.
Assumption 1. Let $N, T \to \infty$ indicate that $N \to \infty$ and each $T_i \to \infty$ jointly in such a way that $T_i/T \to \rho_i$ and $\sup_i|T_i/T - \rho_i| \to 0$, where $\inf_i\rho_i \geq \delta > 0$ and $\sup_i\rho_i \leq \Delta < \infty$. For notational readability, we index elements of a sequence indexed by $N$ and $(T_1, \ldots, T_N)$ only by $NT$.
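Assumption 2. $w_{it}$ and $\alpha_0(i)$ are independent across $i$. For each $i$, $w_{it}$ is a strong mixing sequence with mixing coefficient $a_i(m) = \sup_t\sup_{B_1\in\mathcal{B}^i_{-\infty,t},\,B_2\in\mathcal{B}^i_{t+m,\infty}}|P(B_1\cap B_2) - P(B_1)P(B_2)|$, where $\mathcal{B}^i_{-\infty,t} = \sigma(w_{it}, w_{it-1}, w_{it-2}, \ldots)$ and $\mathcal{B}^i_{t,\infty} = \sigma(w_{it}, w_{it+1}, w_{it+2}, \ldots)$, and there exist $\tau \in 2\mathbb{N}$ and $r > \tau$ such that $\sup_i|a_i(m)| \leq C m^{\frac{(1-\tau)r}{r-\tau} - \varepsilon}$ for some $C > 0$ and some $\varepsilon$ that satisfies $\varepsilon - \delta > 0$ for some $\delta > 0$.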
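Assumption 3. Let $\varphi(w_{it};\theta,\alpha)$ be a function indexed by the parameters $\theta\in\Theta$ and $\alpha\in\mathcal{A}$, where $\Theta$ and $\mathcal{A}$ are compact, convex subsets of $\mathbb{R}^k$ and $\mathbb{R}^p$ respectively, and let $\varphi_{it}(\theta,\alpha) \equiv \varphi(w_{it};\theta,\alpha)$. Assume $\varphi_{it}(\theta,\alpha)$ is continuous in $\theta$ and $\alpha$. Let $\theta_0\in\operatorname{int}(\Theta)$ and $\alpha_0(i)\in\operatorname{int}(\mathcal{A})$ for each $i = 1, \ldots, N$ be such that for each $i$ and $\eta > 0$,
$$\lim_{T_i\to\infty}E_i[\varphi(\theta_0,\alpha_0(i))] - \sup_{(\theta,\alpha):\,|(\theta,\alpha)-(\theta_0,\alpha_0(i))|>\eta}\ \lim_{T_i\to\infty}E_i[\varphi(\theta,\alpha)] > 0.$$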
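Also assume that for each $\eta > 0$,
$$\lim_{N,T\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}E_{it}[\varphi_{it}(\theta_0,\alpha_0(i))] - \sup_{(\theta,\alpha(\cdot)):\,\|(\theta,\alpha(\cdot))-(\theta_0,\alpha_0(\cdot))\|>\eta}\ \lim_{N,T\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}E_{it}[\varphi_{it}(\theta,\alpha(i))] > 0,$$
where $\|(\theta,\alpha(\cdot))\| = \sum_{j=1}^{k}|\theta_j| + \sup_i\sum_{j=1}^{p}|\alpha_j(i)|$.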
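Assumption 4. Let $v = (v_1, \ldots, v_k)'$ and $u = (u_1, \ldots, u_p)'$ be vectors of nonnegative integers. Define
$$D^{(v,u)}\varphi_{it}(\theta,\alpha) = \frac{\partial^{|v|+|u|}\varphi_{it}(\theta,\alpha)}{\partial\theta_1^{v_1}\cdots\partial\theta_k^{v_k}\,\partial\alpha_1^{u_1}\cdots\partial\alpha_p^{u_p}}.$$
Assume there exists a function $M(w_{it})$ such that $|D^{(v,u)}\varphi_{it}(\theta_2,\alpha_2) - D^{(v,u)}\varphi_{it}(\theta_1,\alpha_1)| \leq M(w_{it})\|(\theta_2,\alpha_2)-(\theta_1,\alpha_1)\|_{ind}$ for all $(\theta_1,\alpha_1), (\theta_2,\alpha_2)\in\Theta\times\mathcal{A}$ and $|v|+|u|\leq 3$, and that $\sup_{(\theta,\alpha)\in\Theta\times\mathcal{A}}\|D^{(v,u)}\varphi_{it}(\theta,\alpha)\| \leq M(w_{it})$ for $|v|+|u|\leq 3$. Assume that $\sup_{i,t}E_{it}[M(w_{it})^{\tau+\varepsilon}] \leq \Delta < \infty$ for some $\varepsilon$ that satisfies $\varepsilon - \delta > 0$ for some $\delta > 0$.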
Assumption 5. Suppose that there exists a sequence of partitions such that for each $N, T_1, \ldots, T_N$ the data are partitioned into $G_{NT}$ groups consisting of $\sum_{i\in I_g}T_i$ observations for $g = 1, \ldots, G_{NT}$, where $I_g = \{i : \text{individual } i \text{ belongs to group } g\}$. Let $N_g$ denote the number of elements in the set $I_g$, and let $T_g = \frac{1}{N_g}\sum_{i\in I_g}T_i$. Let $\bar N_g = \frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}N_g$ and assume $\sup_g|N_g/\bar N_g - \zeta_g| \to 0$, where $\sup_g\zeta_g \leq \Delta < \infty$ and $\inf_g\zeta_g \geq \delta > 0$. Let $\xi_{NT} = \sup_g\sup_{i,j\in I_g}\max_{1\leq s\leq p}|\alpha_{0,s}(i) - \alpha_{0,s}(j)|$, where $\alpha_{0,s}(\cdot)$ is the $s$th element of the vector $\alpha_0(\cdot)$. Assume
$$\left(\sup_g\frac{N_g-1}{N_g}\right)\xi_{NT} \to 0 \quad\text{such that}\quad \sqrt{NT}\left(\sup_g\frac{N_g-1}{N_g}\right)\xi_{NT}^2 \to 0 \quad\text{as } N, T\to\infty.$$
Note that indexing of $g$ by $NT$ is suppressed throughout the paper for readability; the objects $I_g$, $T_g$, $N_g$, etc. are defined with respect to the partition for a given sample size $N, T$.
Assumption 6. $|dF_{it}(w_{it}) - dF_{jt}(w_{it})| \leq C(w_{it})|\alpha_0(i) - \alpha_0(j)|$ with $\sup_{i,t}C(w_{it})/dF_{it}(w_{it}) \leq M(w_{it})$.
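Assumption 7. Let $\lambda_g$ be the minimum eigenvalue of $\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}\left[\frac{\partial^2\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\alpha\,\partial\alpha'}\right]$. Assume that $\inf_g\lambda_g \geq \delta > 0$, where $\delta$ does not depend on $N$ or $(T_1, \ldots, T_N)$.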
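Assumption 8. Suppose $H^{\theta\theta}(\theta,\alpha(\cdot)) = \lim_{N,T\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}E_{it}\left[\frac{\partial^2\varphi_{it}(\theta,\alpha(i))}{\partial\theta\,\partial\theta'}\right]$ exists for all $(\theta,\alpha(\cdot))\in\Theta\times(\times_{i=1}^{\infty}\mathcal{A})$, and let $H^{\theta\theta}_0 = H^{\theta\theta}(\theta_0,\alpha_0(\cdot))$. Let
$$J_{NT} = \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(E_{it}\left[\frac{\partial^2\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\theta\,\partial\theta'}\right] - \kappa_gE_{it}\left[\frac{\partial^2\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\alpha\,\partial\theta'}\right]\right)$$
for
$$\kappa_g = \left(\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}\left[\frac{\partial^2\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\theta\,\partial\alpha'}\right]\right)\left(\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}\left[\frac{\partial^2\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\alpha\,\partial\alpha'}\right]\right)^{-1},$$
and suppose $J = \lim_{N,T\to\infty}J_{NT}$ exists and has minimum eigenvalue $\lambda_J \geq \delta > 0$.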
Assumption 9. $G_{NT}\left(\inf_gN_gT_g\right)^{-\tau/2} \to 0$ and $G_{NT}/\sqrt{NT} \to 0$ as $N, T\to\infty$.
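Assumption 10. For $i\in I_g$, let $S^*_{it} = \sqrt{\rho_i}\,(u^\theta_{it} - \kappa_gu^\alpha_{it})$, where $u^\theta_{it} = \frac{\partial\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\theta} - E_{it}\left[\frac{\partial\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\theta}\right]$ and $u^\alpha_{it} = \frac{\partial\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\alpha} - E_{it}\left[\frac{\partial\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\alpha}\right]$. Let $\Omega_{i,T_i} = \operatorname{Var}\left(\frac{1}{\sqrt{T_i}}\sum_{t=1}^{T_i}S^*_{it}\right)$, and let $\lambda_{i,T_i}$ be the minimum eigenvalue of $\Omega_{i,T_i}$. Assume that $\inf_i\inf_{T_i}\lambda_{i,T_i} > 0$ and that $\Omega = \lim_{N,T\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Omega_{i,T_i}$ exists.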
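Assumption 11. $\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T_i}E_{it}\left[\frac{\partial\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\theta}\right] \to 0$, $\frac{1}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\kappa_gE_{it}\left[\frac{\partial\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\alpha}\right] \to 0$, and $\sup_g\left\|\frac{1}{\sqrt{N_gT_g}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}\left[\frac{\partial\varphi_{it}(\theta_0,\alpha_0(i))}{\partial\alpha}\right]\right\| \to 0$.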
7.1. Consistency
Lemma 1. Under Assumptions 1, 2, and 4,
$$\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right) \xrightarrow{p} 0.$$
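Proof. Note that
$$
\begin{aligned}
\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right)
&= \frac{1}{N}\sum_{i=1}^{N}\left(\frac{T_i}{T}-\rho_i\right)\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right)\\
&\quad + \frac{1}{N}\sum_{i=1}^{N}\rho_i\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right).
\end{aligned}
$$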
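We then have
$$
\begin{aligned}
E\left|\frac{1}{N}\sum_{i=1}^{N}\left(\frac{T_i}{T}-\rho_i\right)\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right)\right|
&\leq \frac{1}{N}\sum_{i=1}^{N}\left|\frac{T_i}{T}-\rho_i\right|\frac{1}{T_i}\sum_{t=1}^{T_i}E\left|\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right|\\
&\leq \frac{1}{N}\sum_{i=1}^{N}\sup_i\left|\frac{T_i}{T}-\rho_i\right|\frac{1}{T_i}\sum_{t=1}^{T_i}\Delta\\
&= \Delta\sup_i\left|\frac{T_i}{T}-\rho_i\right| \to 0,
\end{aligned}
$$
where the first inequality follows from the triangle inequality, the second from Assumptions 2 and 4, and the convergence to zero from Assumption 1. Thus,
$$\frac{1}{N}\sum_{i=1}^{N}\left(\frac{T_i}{T}-\rho_i\right)\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right) = o_p(1).$$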
Abusing notation to define $Z_{iT} = \rho_i\frac{1}{T_i}\sum_{t=1}^{T_i}\left(\varphi_{it}(\theta,\alpha(i)) - E_{it}[\varphi_{it}(\theta,\alpha(i))]\right)$, the conclusion follows from Lemmas 1 and 3 in Hansen (2007).
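Lemma 2. Let $\gamma = (\theta,\alpha(\cdot))$ and $\|\gamma\|$ be as in Assumption 3. Let $\Gamma = \Theta\times(\times_{i=1}^{\infty}\mathcal{A})$. Under Assumptions 1, 2, and 4,
$$\left|\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha(i)) - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\tilde\theta,\tilde\alpha(i))\right| \leq B_{NT}\|\gamma - \tilde\gamma\|$$
for $\gamma, \tilde\gamma\in\Gamma$ and $B_{NT} = O_p(1)$.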
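Proof.
$$
\begin{aligned}
\left|\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha(i)) - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\tilde\theta,\tilde\alpha(i))\right|
&\leq \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}M(w_{it})\|(\theta,\alpha(i)) - (\tilde\theta,\tilde\alpha(i))\|_{ind}\\
&\leq \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}M(w_{it})\left(\sum_{j=1}^{k}|\theta_j - \tilde\theta_j| + \sup_i\sum_{j=1}^{p}|\alpha_j(i) - \tilde\alpha_j(i)|\right)\\
&= \left(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}M(w_{it})\right)\|\gamma - \tilde\gamma\|,
\end{aligned}
$$
where the first inequality follows from the triangle inequality and the Lipschitz condition in Assumption 4. Defining $B_{NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}M(w_{it})$, Assumptions 2 and 4 can be used to show $B_{NT} = O_p(1)$ using standard arguments.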
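Lemma 3. Let $\Gamma_G = \{\gamma\in\Gamma : |\alpha(i) - \alpha(j)| = 0,\ \forall g,\ \forall i,j\in I_g\}$ for $I_g$ defined in Assumption 5. For $\gamma = (\theta^*,\alpha^*(\cdot))\in\Gamma$ and $\gamma_G = (\theta^*,\alpha_G(\cdot))\in\Gamma_G$, where $\alpha_G(i) = \frac{1}{N_g}\sum_{j\in I_g}\alpha^*(j)$ for $i\in I_g$, we have $\|\gamma - \gamma_G\| \to 0$ under Assumption 5.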
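Proof. Under Assumption 5, we have $\left(\sup_g\frac{N_g-1}{N_g}\right)\xi_{NT} \to 0$. Writing $g = g(i)$ for the group containing $i$, the conclusion then follows from
$$
\begin{aligned}
\|\gamma - \gamma_G\| &= \sup_i\sum_{j=1}^{p}|\alpha^*_j(i) - \alpha_{G,j}(i)|
= \sup_i\sum_{j=1}^{p}\left|\alpha^*_j(i) - \frac{1}{N_g}\sum_{s\in I_g}\alpha^*_j(s)\right|\\
&= \sup_i\sum_{j=1}^{p}\left|\frac{1}{N_g}\sum_{s\in I_g,\,s\neq i}\left(\alpha^*_j(i) - \alpha^*_j(s)\right)\right|
\leq \sup_i\sum_{j=1}^{p}\frac{1}{N_g}\sum_{s\in I_g,\,s\neq i}|\alpha^*_j(i) - \alpha^*_j(s)|\\
&= \sup_i\frac{1}{N_g}\sum_{s\in I_g,\,s\neq i}\sum_{j=1}^{p}|\alpha^*_j(i) - \alpha^*_j(s)|
\leq \sup_i\frac{1}{N_g}\sum_{s\in I_g,\,s\neq i}p\,\xi_{NT}\\
&\leq p\left(\sup_g\frac{N_g-1}{N_g}\right)\xi_{NT} \to 0.
\end{aligned}
$$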
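The bound in the proof above can be checked numerically. The sketch below (hypothetical names, simulated effects) computes $\|\gamma - \gamma_G\|$ for equal-weight group averages, the within-group discrepancy $\xi$, and the bound $p\,(\sup_g (N_g-1)/N_g)\,\xi$. Singleton groups, i.e. fixed effects, give zero averaging error, while coarser groups trade a larger $\xi$ for fewer group effects.

```python
import random


def averaging_error_and_bound(alpha, groups):
    """Return (||alpha - alpha_G||, xi, p * sup_g((N_g - 1)/N_g) * xi) for one partition.

    alpha  : list of p-dimensional effects, alpha[i] = [alpha_1(i), ..., alpha_p(i)]
    groups : partition of the individuals, as a list of lists of indices
    """
    p = len(alpha[0])
    dist = xi = ratio = 0.0
    for members in groups:
        n_g = len(members)
        center = [sum(alpha[i][s] for i in members) / n_g for s in range(p)]
        for i in members:
            # sup over i of the l1 distance between alpha(i) and its group mean alpha_G(i)
            dist = max(dist, sum(abs(alpha[i][s] - center[s]) for s in range(p)))
            for j in members:
                # xi: largest within-group discrepancy in any single component
                xi = max(xi, max(abs(alpha[i][s] - alpha[j][s]) for s in range(p)))
        ratio = max(ratio, (n_g - 1) / n_g)
    return dist, xi, p * ratio * xi


random.seed(0)
alpha = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(12)]
pairs = [list(range(k, k + 2)) for k in range(0, 12, 2)]   # six groups of two
dist, xi, bound = averaging_error_and_bound(alpha, pairs)
```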
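Proposition 1. Let $\hat\gamma_G = (\hat\theta,\hat\alpha_G(\cdot)) = \arg\max_{(\theta,\alpha(\cdot))\in\Gamma_G}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha(i))$ for
$$\Gamma_G = \{\gamma\in\Gamma : |\alpha(i) - \alpha(j)| = 0\ \forall\, i,j\in I_g\}$$
for $I_g$ defined in Assumption 5. If Assumptions 1, 2, 3, 4, and 5 are satisfied, then $\hat\gamma_G \xrightarrow{p} \gamma_0 = (\theta_0,\alpha_0(\cdot))$.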
Proof. We prove the result by verifying the conditions of Newey and Powell (2003) Lemma A1. We note that $\Gamma = \Theta\times(\times_{i=1}^{\infty}\mathcal{A})$ is compact for the norm $\|\cdot\|$ defined in Assumption 3. The conditions of Newey and Powell (2003) Lemma A2 are therefore satisfied using Lemmas 1 and 2. Lemma A2 of Newey and Powell (2003) implies conditions (i) and (ii) of Newey and Powell (2003) Lemma A1, and Lemma 3 above implies condition (iii). Thus, the conditions of Newey and Powell (2003) Lemma A1 are satisfied, and the conclusion follows.
7.2. Asymptotic Normality
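In the following, we let $\hat\alpha_g(\theta) = \arg\max_{\alpha\in\mathcal{A}}\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha)$ and use $\hat\alpha_g = \hat\alpha_g(\theta_0)\in\mathcal{A}$. We similarly use $\hat\theta = \arg\max_{\theta\in\Theta}\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\hat\alpha_g(\theta))$ and note that the estimators $(\hat\theta,\hat\alpha(\hat\theta))$ obtained in this way are numerically identical to the solution to
$$(\hat\theta,\hat\alpha_1,\ldots,\hat\alpha_G) = \arg\max_{\theta\in\Theta,\,\alpha_1\in\mathcal{A},\ldots,\alpha_G\in\mathcal{A}}\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha_g).$$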
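The equivalence between profiling out the $\alpha_g$ and maximizing jointly can be illustrated in a case with closed forms. Assuming a linear Gaussian criterion $\varphi_{it}(\theta,\alpha) = -(y_{it} - \theta x_{it} - \alpha)^2/2$ (our illustrative choice, not the paper's model), $\hat\alpha_g(\theta)$ is the group mean of $y - \theta x$ and the profiled $\hat\theta$ is the within-group slope. The sketch compares it with joint maximization by coordinate ascent on simulated data; all names are hypothetical.

```python
import random
from collections import defaultdict


def group_means(values, g):
    sums, counts = defaultdict(float), defaultdict(int)
    for v, k in zip(values, g):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


def profile_theta(y, x, g):
    """argmax of sum phi(theta, alpha_hat_g(theta)) with alpha_hat_g(theta) = mean_g(y - theta x)."""
    xbar, ybar = group_means(x, g), group_means(y, g)
    num = sum((xi - xbar[k]) * (yi - ybar[k]) for xi, yi, k in zip(x, y, g))
    den = sum((xi - xbar[k]) ** 2 for xi, k in zip(x, g))
    return num / den


def joint_theta(y, x, g, iters=500):
    """Maximize over (theta, alpha_1, ..., alpha_G) jointly by coordinate ascent."""
    theta = 0.0
    for _ in range(iters):
        alpha = group_means([yi - theta * xi for xi, yi in zip(x, y)], g)
        theta = sum(xi * (yi - alpha[k]) for xi, yi, k in zip(x, y, g)) / sum(xi * xi for xi in x)
    return theta


# Simulated toy data: 20 groups of 10 observations, regressor correlated with the group effect.
random.seed(1)
g = [k for k in range(20) for _ in range(10)]
effect = {k: random.gauss(0.0, 1.0) for k in range(20)}
x = [random.gauss(0.0, 1.0) + 0.3 * effect[k] for k in g]
y = [1.5 * xi + effect[k] + random.gauss(0.0, 0.5) for xi, k in zip(x, g)]
```

Each coordinate-ascent step is a linear contraction in $\theta$ whose fixed point is the within-group slope, so the two estimates agree up to floating-point error.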
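Throughout the following we use superscripts to denote partial differentiation, e.g., $\varphi^{\alpha}_{it} = \frac{\partial\varphi_{it}}{\partial\alpha}$ and $\varphi^{\theta\alpha}_{it} = \frac{\partial^2\varphi_{it}}{\partial\theta\,\partial\alpha'}$. We also let $\bar\alpha_g = \frac{1}{N_g}\sum_{i\in I_g}\omega_i\alpha_0(i)$ for the weight
$$\omega_i = \left(\frac{1}{N_g}\sum_{k\in I_g}\sum_{t=1}^{T_k}E_{jt}[\varphi^{\alpha\alpha}_{kt}(\theta_0,\alpha_0(j))]\right)^{-1}\left(\sum_{t=1}^{T_i}E_{jt}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))]\right)$$
for some $j\in I_g$.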
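With scalar $\alpha$, the weights $\omega_i$ are ratios of expected Hessian contributions, so $\bar\alpha_g$ is a Hessian-weighted average of the $\alpha_0(i)$; this is what makes term (7.10) below vanish. A minimal numeric sketch, assuming scalar $\alpha$ and made-up Hessian values $h_i$ (negative, as for a concave criterion); the function names are hypothetical.

```python
def hessian_weighted_center(alpha0, h):
    """Scalar-alpha omega_i and bar-alpha_g for one group.

    alpha0 : alpha_0(i) for the members of the group
    h      : h[i] = sum_t E_jt[phi^{aa}_it(theta_0, alpha_0(j))], member i's expected Hessian
    """
    n_g = len(alpha0)
    mean_h = sum(h) / n_g
    omega = [h_i / mean_h for h_i in h]                 # the weights average to one
    alpha_bar = sum(w * a for w, a in zip(omega, alpha0)) / n_g
    return omega, alpha_bar


def term_7_10_numerator(alpha0, h, alpha_bar):
    """sum_i h_i * (bar-alpha_g - alpha_0(i)), the numerator of term (7.10)."""
    return sum(h_i * (alpha_bar - a) for h_i, a in zip(h, alpha0))


alpha0, h = [0.0, 1.0, 3.0], [-1.0, -2.0, -1.0]
omega, alpha_bar = hessian_weighted_center(alpha0, h)
```

Here $\bar\alpha_g = \sum_i h_i\alpha_0(i)/\sum_i h_i = 1.25$, which differs from the unweighted mean $4/3$ used in Lemma 3.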
7.2.1. Expansion for $\hat\alpha_g$
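By definition, $\hat\alpha_g = \arg\max_{\alpha\in\mathcal{A}}\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi_{it}(\theta_0,\alpha)$, which implies that
$$0 = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha}_{it}(\theta_0,\hat\alpha_g). \tag{7.1}$$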
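Expanding (7.1) about $\hat\alpha_g = \bar\alpha_g$ yields
$$0 = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha}_{it}(\theta_0,\bar\alpha_g) + \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g)(\hat\alpha_g - \bar\alpha_g) \tag{7.2}$$
where $\tilde\alpha_g$ are intermediate values satisfying $|\tilde\alpha_g - \bar\alpha_g| \leq |\hat\alpha_g - \bar\alpha_g|$. Further expanding (7.2), we obtain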
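$$
\begin{aligned}
0 &= \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha}_{it}(\theta_0,\alpha_0(i)) + \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\theta_0,\check\alpha_g(i))(\bar\alpha_g - \alpha_0(i))\\
&\quad + \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g)(\hat\alpha_g - \bar\alpha_g) \qquad (7.3)
\end{aligned}
$$
where $\check\alpha_g(i)$ are intermediate values satisfying $|\check\alpha_g(i) - \alpha_0(i)| \leq |\bar\alpha_g - \alpha_0(i)|$.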
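It follows from (7.3), with some addition and subtraction, that
$$
\begin{aligned}
-\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g)(\hat\alpha_g - \bar\alpha_g)
&= \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(\varphi^{\alpha}_{it}(\theta_0,\alpha_0(i)) - E_{it}[\varphi^{\alpha}_{it}(\theta_0,\alpha_0(i))]\right) \qquad (7.4)\\
&\quad + \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\varphi^{\alpha}_{it}(\theta_0,\alpha_0(i))] \qquad (7.5)\\
&\quad + \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(\varphi^{\alpha\alpha}_{it}(\theta_0,\check\alpha_g(i)) - E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\check\alpha_g(i))]\right)(\bar\alpha_g - \alpha_0(i)) \qquad (7.6)\\
&\quad + \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\check\alpha_g(i))] - E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\check\alpha_g(j))]\right)(\bar\alpha_g - \alpha_0(i)) \qquad (7.7)\\
&\quad + \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\check\alpha_g(j))] - E_{jt}[\varphi^{\alpha\alpha}_{it}(\theta_0,\check\alpha_g(j))]\right)(\bar\alpha_g - \alpha_0(i)) \qquad (7.8)\\
&\quad + \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(E_{jt}[\varphi^{\alpha\alpha}_{it}(\theta_0,\check\alpha_g(j))] - E_{jt}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))]\right)(\bar\alpha_g - \alpha_0(i)) \qquad (7.9)\\
&\quad + \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{jt}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))](\bar\alpha_g - \alpha_0(i)) \qquad (7.10)
\end{aligned}
$$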
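where $j\in I_g$. Also, note that
$$
\begin{aligned}
&\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{jt}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))](\bar\alpha_g - \alpha_0(i))\\
&= \frac{1}{T_g}\left(\frac{1}{N_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{jt}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))]\right)\bar\alpha_g - \frac{1}{N_gT_g}\sum_{i\in I_g}\left(\sum_{t=1}^{T_i}E_{jt}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))]\right)\alpha_0(i)\\
&= 0,
\end{aligned}
$$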
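where the last equality follows from substituting in the definition of $\bar\alpha_g$. Letting the expressions given in displays (7.4)-(7.9) be denoted as $\psi_{g1}, \ldots, \psi_{g6}$ respectively, we can then write
$$\hat\alpha_g - \bar\alpha_g = -\left[H^{\alpha\alpha}_g\right]^{-1}\left(\sum_{j=1}^{6}\psi_{gj}\right) \tag{7.11}$$
where
$$H^{\alpha\alpha}_g = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\theta_0,\tilde\alpha_g). \tag{7.12}$$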
7.2.2. Expansion for $\hat\theta$
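We start by noting that we can totally differentiate the identity $0 = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha}_{it}(\theta,\hat\alpha_g(\theta))$ to obtain
$$\frac{\partial\hat\alpha_g(\theta)}{\partial\theta} = -H^{\alpha\alpha}_g(\theta,\hat\alpha_g(\theta))^{-1}H^{\alpha\theta}_g(\theta,\hat\alpha_g(\theta)) \tag{7.13}$$
where
$$H^{\alpha\alpha}_g(\theta,\alpha_g) = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\theta,\alpha_g) \tag{7.14}$$
and
$$H^{\alpha\theta}_g(\theta,\alpha_g) = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\theta}_{it}(\theta,\alpha_g). \tag{7.15}$$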
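Equation (7.13) is the implicit function theorem applied to the group score. As an illustration we swap in a logit criterion (not the probit used in the applications) for one group with scalar $\theta$ and $\alpha$, where $\varphi^{\alpha\alpha}_{it} = -w_{it}$ and $\varphi^{\alpha\theta}_{it} = -w_{it}x_{it}$ with $w_{it} = \Lambda_{it}(1-\Lambda_{it})$; the normalization $1/(N_gT_g)$ cancels in the ratio. The sketch compares (7.13) with a finite-difference derivative of $\hat\alpha_g(\theta)$; the data and function names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def alpha_hat(theta, y, x, iters=50):
    """Group effect maximizing sum_t [y log L + (1 - y) log(1 - L)], L = logistic(theta x + alpha)."""
    a = 0.0
    for _ in range(iters):
        p = [logistic(theta * xi + a) for xi in x]
        score = sum(yi - pi for yi, pi in zip(y, p))   # sum of phi^alpha over the group
        info = sum(pi * (1.0 - pi) for pi in p)       # equals -H^{alpha alpha}, unnormalized
        a += score / info                              # Newton step
    return a


def dalpha_dtheta(theta, y, x):
    """Right-hand side of (7.13): -(H^{alpha alpha})^{-1} H^{alpha theta} at alpha_hat(theta)."""
    a = alpha_hat(theta, y, x)
    w = [logistic(theta * xi + a) * (1.0 - logistic(theta * xi + a)) for xi in x]
    h_aa = -sum(w)
    h_at = -sum(wi * xi for wi, xi in zip(w, x))
    return -h_at / h_aa


y = [1, 0, 1, 1, 0, 0, 1, 0]
x = [0.5, -1.0, 1.2, 0.3, -0.4, 0.8, -0.2, -1.5]
theta, h = 0.7, 1e-5
numeric = (alpha_hat(theta + h, y, x) - alpha_hat(theta - h, y, x)) / (2 * h)
```

For the logit case the derivative is minus a $w$-weighted mean of $x$, so raising $\theta$ shifts $\hat\alpha_g$ to offset the change in the average index.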
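From the definition of $\hat\theta$ and the assumed differentiability, we also have
$$
\begin{aligned}
0 &= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta}_{it}(\hat\theta,\hat\alpha_g(\hat\theta)) + \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha}_{it}(\hat\theta,\hat\alpha_g(\hat\theta))\left.\frac{\partial\hat\alpha_g(\theta)}{\partial\theta}\right|_{\hat\theta}\\
&= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta}_{it}(\hat\theta,\hat\alpha_g(\hat\theta)) \qquad (7.16)
\end{aligned}
$$
since the definition of $\hat\alpha_g(\theta)$ implies $\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha}_{it}(\theta,\hat\alpha_g(\theta)) = 0$.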
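Expanding (7.16) about $\hat\theta = \theta_0$, with $\tilde\theta$ an intermediate value, yields
$$
\begin{aligned}
0 &= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta}_{it}(\theta_0,\hat\alpha_g(\theta_0))\\
&\quad + \left(\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left[\varphi^{\theta\theta}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - \varphi^{\theta\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta))H^{\alpha\alpha}_g(\tilde\theta,\hat\alpha_g(\tilde\theta))^{-1}H^{\alpha\theta}_g(\tilde\theta,\hat\alpha_g(\tilde\theta))\right]\right)(\hat\theta - \theta_0)
\end{aligned}
$$
from which we obtain
$$\hat\theta - \theta_0 = -\hat J^{-1}\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta}_{it}(\theta_0,\hat\alpha_g(\theta_0)) \tag{7.17}$$
where
$$\hat J = \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left[\varphi^{\theta\theta}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - \varphi^{\theta\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta))H^{\alpha\alpha}_g(\tilde\theta,\hat\alpha_g(\tilde\theta))^{-1}H^{\alpha\theta}_g(\tilde\theta,\hat\alpha_g(\tilde\theta))\right]. \tag{7.18}$$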
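Together, (7.16) and (7.18) say that the profiled criterion has gradient $\sum\varphi^{\theta}_{it}(\theta,\hat\alpha_g(\theta))$ and Hessian $\hat J$, up to the $1/NT$ normalization. The sketch below checks this for the same logit stand-in (again not the paper's probit) with two groups: it differentiates the envelope-form score numerically and compares the result with the concentrated Hessian $\sum_g[H^{\theta\theta} - H^{\theta\alpha}(H^{\alpha\alpha})^{-1}H^{\alpha\theta}]$. The data and helper names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def alpha_hat(theta, y, x, iters=50):
    """Newton solution of sum_t (y - logistic(theta x + alpha)) = 0 for one group."""
    a = 0.0
    for _ in range(iters):
        p = [logistic(theta * xi + a) for xi in x]
        a += sum(yi - pi for yi, pi in zip(y, p)) / sum(pi * (1.0 - pi) for pi in p)
    return a


def profile_score(theta, groups):
    """Envelope form (7.16): sum of phi^theta evaluated at (theta, alpha_hat_g(theta))."""
    total = 0.0
    for y, x in groups:
        a = alpha_hat(theta, y, x)
        total += sum((yi - logistic(theta * xi + a)) * xi for yi, xi in zip(y, x))
    return total


def concentrated_hessian(theta, groups):
    """Unnormalized J-hat: sum_g [H^tt - H^ta (H^aa)^{-1} H^at] at (theta, alpha_hat_g(theta))."""
    total = 0.0
    for y, x in groups:
        a = alpha_hat(theta, y, x)
        w = [logistic(theta * xi + a) * (1.0 - logistic(theta * xi + a)) for xi in x]
        h_tt = -sum(wi * xi * xi for wi, xi in zip(w, x))
        h_ta = -sum(wi * xi for wi, xi in zip(w, x))
        h_aa = -sum(w)
        total += h_tt - h_ta * h_ta / h_aa
    return total


groups = [([1, 0, 1, 1, 0, 0, 1, 0], [0.5, -1.0, 1.2, 0.3, -0.4, 0.8, -0.2, -1.5]),
          ([0, 1, 0, 1, 1, 0], [1.1, -0.3, 0.4, 0.9, -0.7, 0.2])]
theta, h = 0.7, 1e-5
numeric = (profile_score(theta + h, groups) - profile_score(theta - h, groups)) / (2 * h)
```

By the Cauchy-Schwarz inequality each group's contribution is non-positive, consistent with the concentrated criterion being concave in this logit example.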
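We now expand the term $\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta}_{it}(\theta_0,\hat\alpha_g(\theta_0))$ in (7.17) about $\hat\alpha_g(\theta_0) = \bar\alpha_g$, similarly to the expansion in Section 7.2.1 above, to obtain
$$\hat\theta - \theta_0 = -\hat J^{-1}\left[\sum_{j=1}^{7}B_j - \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\alpha}_{it}(\theta_0,\tilde\alpha_g)\left(H^{\alpha\alpha}_g\right)^{-1}\left(\sum_{j=1}^{6}\psi_{gj}\right)\right] \tag{7.19}$$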
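where
$$
\begin{aligned}
B_1 &= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(\varphi^{\theta}_{it}(\theta_0,\alpha_0(i)) - E_{it}[\varphi^{\theta}_{it}(\theta_0,\alpha_0(i))]\right), \qquad (7.20)\\
B_2 &= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\varphi^{\theta}_{it}(\theta_0,\alpha_0(i))], \qquad (7.21)\\
B_3 &= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(\varphi^{\theta\alpha}_{it}(\theta_0,\check\alpha_g(i)) - E_{it}[\varphi^{\theta\alpha}_{it}(\theta_0,\check\alpha_g(i))]\right)(\bar\alpha_g - \alpha_0(i)), \qquad (7.22)\\
B_4 &= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(E_{it}[\varphi^{\theta\alpha}_{it}(\theta_0,\check\alpha_g(i))] - E_{it}[\varphi^{\theta\alpha}_{it}(\theta_0,\check\alpha_g(j))]\right)(\bar\alpha_g - \alpha_0(i)), \qquad (7.23)\\
B_5 &= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(E_{it}[\varphi^{\theta\alpha}_{it}(\theta_0,\check\alpha_g(j))] - E_{jt}[\varphi^{\theta\alpha}_{it}(\theta_0,\check\alpha_g(j))]\right)(\bar\alpha_g - \alpha_0(i)), \qquad (7.24)\\
B_6 &= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(E_{jt}[\varphi^{\theta\alpha}_{it}(\theta_0,\check\alpha_g(j))] - E_{jt}[\varphi^{\theta\alpha}_{it}(\theta_0,\alpha_0(j))]\right)(\bar\alpha_g - \alpha_0(i)), \qquad (7.25)\\
B_7 &= \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{jt}[\varphi^{\theta\alpha}_{it}(\theta_0,\alpha_0(j))](\bar\alpha_g - \alpha_0(i)), \qquad (7.26)
\end{aligned}
$$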
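and $\check\alpha_g(\cdot)$ is a sequence of intermediate values satisfying $\|\check\alpha_g(\cdot) - \alpha_0(\cdot)\| \leq \|\bar\alpha_g - \alpha_0(\cdot)\|$. Finally, we have $B_7 = 0$ since $\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{jt}[\varphi^{\theta\alpha}_{it}(\theta_0,\alpha_0(j))](\bar\alpha_g - \alpha_0(i)) = 0$ for each $g = 1, \ldots, G_{NT}$, as was demonstrated in Section 7.2.1.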
7.2.3. Preliminary Lemmas
Lemma 4. Under Assumptions 1-5 and 7, $\sup_g\sup_{i\in I_g}\|\bar\alpha_g - \alpha_0(i)\| \leq C_{NT}\left(\sup_g\frac{N_g-1}{N_g}\right)\xi_{NT}$ for some $C_{NT} = O(1)$.
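Proof. We have $\bar\alpha_g - \alpha_0(i) = \frac{1}{N_g}\sum_{j\in I_g}\omega_j\alpha_0(j) - \alpha_0(i) = \frac{1}{N_g}\sum_{j\in I_g}\omega_j\alpha_0(j) - \frac{1}{N_g}\sum_{j\in I_g}\omega_j\alpha_0(i) = \frac{1}{N_g}\sum_{j\in I_g:\,j\neq i}\omega_j(\alpha_0(j) - \alpha_0(i))$. Thus,
$$
\begin{aligned}
\sup_g\sup_{i\in I_g}\|\bar\alpha_g - \alpha_0(i)\| &\leq \sup_g\sup_{i\in I_g}\frac{1}{N_g}\sum_{j\in I_g:\,j\neq i}\|\omega_j\|\,\|\alpha_0(i) - \alpha_0(j)\|\\
&\leq \sup_g\frac{1}{N_g}\sum_{j\in I_g:\,j\neq i}\sup_{i\in I_g}\|\omega_i\|\sup_{i,j\in I_g}\|\alpha_0(i) - \alpha_0(j)\|\\
&\leq \sup_g\frac{N_g-1}{N_g}\,\sup_g\sup_{i,j\in I_g}\|\alpha_0(i) - \alpha_0(j)\|\,\sup_g\sup_{i\in I_g}\|\omega_i\|\\
&\leq C(p)\,\xi_{NT}\sup_g\frac{N_g-1}{N_g}\,\sup_g\sup_{i\in I_g}\|\omega_i\|
\end{aligned}
$$
where $C(p)$ depends only on the dimension of $\alpha$ and the norm.
It remains to be shown that $\sup_g\sup_{i\in I_g}\|\omega_i\| \leq C_{NT}$ where $C_{NT} = O(1)$.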
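$$
\begin{aligned}
\|\omega_i\| &\leq \left\|\left(\frac{1}{N_g}\sum_{k\in I_g}\sum_{t=1}^{T_k}E_{jt}[\varphi^{\alpha\alpha}_{kt}(\theta_0,\alpha_0(j))]\right)^{-1}\right\|\,\left\|\sum_{t=1}^{T_i}E_{jt}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))]\right\|\\
&\leq \frac{T_i}{T_g}\Delta\left\|\left(\frac{1}{N_gT_g}\sum_{k\in I_g}\sum_{t=1}^{T_k}E_{jt}[\varphi^{\alpha\alpha}_{kt}(\theta_0,\alpha_0(j))]\right)^{-1}\right\|\\
&\leq \frac{T_i}{T_g}C\Delta = C_{NT}
\end{aligned}
$$
for some $C < \infty$, where the second inequality follows from Assumption 4 and the last from Assumption 7. It follows from Assumption 1 that $\sup_g\sup_{i\in I_g}\frac{T_i}{T_g} = \sup_g\sup_{i\in I_g}\frac{T_i/T}{T_g/T} = O(1)$. The conclusion follows.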
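Lemma 5. Under Assumptions 1-5 and 8, $\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\theta}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) \xrightarrow{p} H^{\theta\theta}_0$.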
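Proof. Letting $\Gamma = \Theta\times(\times_{i=1}^{\infty}\mathcal{A})$ as in Lemma 2 and $(\theta,\alpha(\cdot)) = \gamma\in\Gamma$, we have
$$\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\theta}_{it}(\theta,\alpha(i)) \xrightarrow{p} H^{\theta\theta}(\theta,\alpha(\cdot))$$
under Assumptions 1, 2, 4, and 8 by arguments similar to those in Lemma 1. By arguments similar to those used to demonstrate Lemma 2, we also have that
$$\left|\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\theta}_{it}(\theta,\alpha(i)) - \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\theta}_{it}(\theta^*,\alpha^*(i))\right| \leq \left(\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it})\right)\|\gamma - \gamma^*\|$$
with $\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it}) = O_p(1)$ under the conditions of the Lemma. It follows that
$$\sup_{\gamma\in\Gamma}\left|\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\theta}_{it}(\theta,\alpha(i)) - H^{\theta\theta}(\theta,\alpha(\cdot))\right| \xrightarrow{p} 0$$
by Newey and Powell (2003) Lemma A2.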
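Thus, defining $\alpha^{\theta}_g(\cdot): \mathbb{N}\to\mathcal{A}$ such that $\alpha^{\theta}_g(i) = \arg\max_{\alpha\in\mathcal{A}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi_{it}(\theta,\alpha)$, we have
$$
\begin{aligned}
\left|\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\theta}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - H^{\theta\theta}_0\right|
&= \left|\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\theta}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - H^{\theta\theta}(\tilde\theta,\alpha^{\tilde\theta}_g(\cdot)) + H^{\theta\theta}(\tilde\theta,\alpha^{\tilde\theta}_g(\cdot)) - H^{\theta\theta}_0\right|\\
&\leq \sup_{\gamma\in\Gamma}\left|\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\theta}_{it}(\theta,\alpha(i)) - H^{\theta\theta}(\theta,\alpha(\cdot))\right| + \left|H^{\theta\theta}(\tilde\theta,\alpha^{\tilde\theta}_g(\cdot)) - H^{\theta\theta}_0\right| \xrightarrow{p} 0
\end{aligned}
$$
where the convergence in probability follows from the argument above and Proposition 1.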
Lemma 6. Suppose $\sup_g|v_{gT} - v_{g0}| \xrightarrow{p} 0$ and $\inf_g v_{g0} \geq \delta > 0$. Then $\sup_g|v_{gT}^{-1} - v_{g0}^{-1}| \xrightarrow{p} 0$.
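Proof. For given $\eta^* > 0$ and $\varepsilon^* > 0$, fix some $0 < \xi < \delta$ and let $\mathcal{E}_{NT} = \{\sup_g|v_{gT} - v_{g0}| \leq \delta - \xi\}$. On $\mathcal{E}_{NT}$ we have $v_{gT} \geq v_{g0} - (\delta - \xi) \geq \xi$ for every $g$, so any value $\tilde v_{gT}$ between $v_{gT}$ and $v_{g0}$ also satisfies $\tilde v_{gT} \geq \xi$. Then
$$
\begin{aligned}
\Pr\left(\sup_g|v_{gT}^{-1} - v_{g0}^{-1}| > \eta^*\right)
&= \Pr\left(\sup_g|v_{gT}^{-1} - v_{g0}^{-1}| > \eta^* \,\middle|\, \mathcal{E}_{NT}\right)\Pr(\mathcal{E}_{NT}) + \Pr\left(\sup_g|v_{gT}^{-1} - v_{g0}^{-1}| > \eta^* \,\middle|\, \mathcal{E}_{NT}^c\right)\Pr(\mathcal{E}_{NT}^c)\\
&\leq \Pr\left(\sup_g|v_{gT}^{-1} - v_{g0}^{-1}| > \eta^* \,\middle|\, \mathcal{E}_{NT}\right)\Pr(\mathcal{E}_{NT}) + (1)(\varepsilon^*/2) \quad\text{for } N, T \text{ large enough}\\
&= \Pr\left(\sup_g|\tilde v_{gT}^{-2}(v_{gT} - v_{g0})| > \eta^* \,\middle|\, \mathcal{E}_{NT}\right)\Pr(\mathcal{E}_{NT}) + \varepsilon^*/2 \quad\text{for } \tilde v_{gT} \text{ an intermediate value}\\
&\leq \Pr\left(\sup_g|v_{gT} - v_{g0}| > \xi^2\eta^* \,\middle|\, \mathcal{E}_{NT}\right)\Pr(\mathcal{E}_{NT}) + \varepsilon^*/2\\
&= \Pr\left(\xi^2\eta^* < \sup_g|v_{gT} - v_{g0}| \leq \delta - \xi\right) + \varepsilon^*/2\\
&\leq \Pr\left(\sup_g|v_{gT} - v_{g0}| > \xi^2\eta^*\right) + \varepsilon^*/2\\
&\leq \varepsilon^*/2 + \varepsilon^*/2 = \varepsilon^* \quad\text{for } N, T \text{ large enough.}
\end{aligned}
$$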
Lemma 7. Let $Y_{iT} = \sum_{t=1}^{T_i}y_{it}$ have $E[Y_{iT}] = 0$ and suppose the $y_{it}$ satisfy $\sup_{i,t}E_{it}\|y_{it}\|^{2c+\varepsilon} \leq \Delta < \infty$ for $c\in\mathbb{N}$ and are $\alpha$-mixing with mixing coefficients of size $(1-k)r/(r-k)$ uniformly in $i$, where $k\in 2\mathbb{N}$, $k \geq 2c$, $r > k$. Under the sequence given in Assumption 1, $E\|\sum_{i=1}^{N}Y_{iT}\|^{\tau} = O((NT)^{\tau/2})$ for $\tau \leq 2c$.
Proof. $E\|\sum_{i=1}^{N}Y_{iT}\|^{\tau} \leq \left(E\|\sum_{i=1}^{N}Y_{iT}\|^{2c}\right)^{\tau/2c} \leq \left(\Delta N^{c-1}\sum_{i=1}^{N}E\|Y_{iT}\|^{2c}\right)^{\tau/2c} \leq \left(\Delta N^{c}(\sup_iT_i)^{c}\right)^{\tau/2c} = \left(\Delta N^{c}T^{c}(\sup_iT_i/T)^{c}\right)^{\tau/2c} = O((NT)^{\tau/2})O(1)$, where the second inequality follows from the Marcinkiewicz-Zygmund inequality and the third inequality from straightforward modifications of standard arguments such as those in Doukhan (1994) or Kim (1994).
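Lemma 8. $\max_{1\leq g\leq G_{NT}}\left\|\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(\varphi^{\alpha\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))]\right)\right\| \xrightarrow{p} 0$ under Assumptions 1-5 and 9.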
Proof. Throughout this proof, write $\langle f_{it}\rangle_g = \frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}f_{it}$ for the within-group average, and let $D_g(\theta,\alpha) = \langle\varphi^{\alpha\alpha}_{it}(\theta,\alpha) - E_{it}[\varphi^{\alpha\alpha}_{it}(\theta,\alpha)]\rangle_g$. Consider
$$
\begin{aligned}
\max_{1\leq g\leq G_{NT}}\left\|\left\langle E_{it}[\varphi^{\alpha\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - \varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))]\right\rangle_g\right\|
&\leq \max_{1\leq g\leq G_{NT}}\left\langle\left\|E_{it}[\varphi^{\alpha\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - \varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))]\right\|\right\rangle_g\\
&\leq \max_{1\leq g\leq G_{NT}}\left\langle\sup_{i,t}E_{it}[M(w_{it})]\right\rangle_g\|\tilde\gamma - \gamma_0\|\\
&\leq \Delta\|\tilde\gamma - \gamma_0\| \leq \Delta\|\hat\gamma - \gamma_0\| \xrightarrow{p} 0
\end{aligned}
$$
using Proposition 1 and the definition of $\tilde\theta$ as an intermediate value.

Also,
$$\Pr\left(\max_{1\leq g\leq G_{NT}}\sup_{(\theta,\alpha)\in\Theta\times\mathcal{A}}\|D_g(\theta,\alpha)\| > \eta\right) \leq \sum_{g=1}^{G_{NT}}\Pr\left(\sup_{(\theta,\alpha)\in\Theta\times\mathcal{A}}\|D_g(\theta,\alpha)\| > \eta\right).$$
Let $\varepsilon > 0$ be such that $2\varepsilon\sup_{i,t}E_{it}[M(w_{it})] < \eta/3$. Divide $\Gamma_g = \Theta\times\mathcal{A}$ into subsets $\Gamma_1, \ldots, \Gamma_{m(\varepsilon)}$ such that $\|(\theta,\alpha) - (\theta^*,\alpha^*)\| < \varepsilon$ whenever $(\theta,\alpha)$ and $(\theta^*,\alpha^*)$ are in the same subset. Let $(\theta_j,\alpha_j)$ denote some point in $\Gamma_j$ for each $j$. Then
$$\sup_{(\theta,\alpha)\in\Theta\times\mathcal{A}}\|D_g(\theta,\alpha)\| = \max_j\sup_{(\theta,\alpha)\in\Gamma_j}\|D_g(\theta,\alpha)\|,$$
which implies
$$\Pr\left(\sup_{(\theta,\alpha)\in\Theta\times\mathcal{A}}\|D_g(\theta,\alpha)\| > \eta\right) \leq \sum_{j=1}^{m(\varepsilon)}\Pr\left(\sup_{(\theta,\alpha)\in\Gamma_j}\|D_g(\theta,\alpha)\| > \eta\right).$$
For $(\theta,\alpha)\in\Gamma_j$,
$$
\begin{aligned}
\|D_g(\theta,\alpha)\| &= \left\|D_g(\theta_j,\alpha_j) + \left\langle\varphi^{\alpha\alpha}_{it}(\theta,\alpha) - \varphi^{\alpha\alpha}_{it}(\theta_j,\alpha_j)\right\rangle_g + \left\langle E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_j,\alpha_j)] - E_{it}[\varphi^{\alpha\alpha}_{it}(\theta,\alpha)]\right\rangle_g\right\|\\
&\leq \|D_g(\theta_j,\alpha_j)\| + \left\langle\|\varphi^{\alpha\alpha}_{it}(\theta,\alpha) - \varphi^{\alpha\alpha}_{it}(\theta_j,\alpha_j)\|\right\rangle_g + \left\langle\|E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_j,\alpha_j)] - E_{it}[\varphi^{\alpha\alpha}_{it}(\theta,\alpha)]\|\right\rangle_g\\
&\leq \|D_g(\theta_j,\alpha_j)\| + \langle M(w_{it})\rangle_g\|(\theta,\alpha) - (\theta_j,\alpha_j)\| + \langle E_{it}[M(w_{it})]\rangle_g\|(\theta_j,\alpha_j) - (\theta,\alpha)\|\\
&= \|D_g(\theta_j,\alpha_j)\| + \langle M(w_{it}) - E_{it}[M(w_{it})]\rangle_g\|(\theta,\alpha) - (\theta_j,\alpha_j)\| + 2\langle E_{it}[M(w_{it})]\rangle_g\|(\theta_j,\alpha_j) - (\theta,\alpha)\|\\
&\leq \|D_g(\theta_j,\alpha_j)\| + \varepsilon\left\|\langle M(w_{it}) - E_{it}[M(w_{it})]\rangle_g\right\| + \frac{\eta}{3}.
\end{aligned}
$$
So
$$
\begin{aligned}
\Pr\left(\sup_{(\theta,\alpha)\in\Gamma_j}\|D_g(\theta,\alpha)\| > \eta\right)
&\leq \Pr\left(\|D_g(\theta_j,\alpha_j)\| + \varepsilon\left\|\langle M(w_{it}) - E_{it}[M(w_{it})]\rangle_g\right\| > \frac{2\eta}{3}\right)\\
&\leq \Pr\left(\|D_g(\theta_j,\alpha_j)\| > \frac{\eta}{3}\right) + \Pr\left(\left\|\langle M(w_{it}) - E_{it}[M(w_{it})]\rangle_g\right\| > \frac{\eta}{3\varepsilon}\right)\\
&= \Pr\left(\|D_g(\theta_j,\alpha_j)\|^{\tau} > \left(\frac{\eta}{3}\right)^{\tau}\right) + \Pr\left(\left\|\langle M(w_{it}) - E_{it}[M(w_{it})]\rangle_g\right\|^{\tau} > \left(\frac{\eta}{3\varepsilon}\right)^{\tau}\right) = O((N_gT)^{-\tau/2})
\end{aligned}
$$
by the Markov inequality and standard results for mixing sequences as in Doukhan (1994) or Kim (1994), as demonstrated in Lemma 7.

It follows that
$$\Pr\left(\max_{1\leq g\leq G_{NT}}\sup_{(\theta,\alpha)\in\Theta\times\mathcal{A}}\|D_g(\theta,\alpha)\| > \eta\right) \leq \Delta\sum_{g=1}^{G_{NT}}(N_gT)^{-\tau/2} \leq \Delta G_{NT}\left(\inf_gN_gT\right)^{-\tau/2} \to 0$$
under Assumption 9. Thus,
$$\max_{1\leq g\leq G_{NT}}\sup_{(\theta,\alpha)\in\Theta\times\mathcal{A}}\|D_g(\theta,\alpha)\| \xrightarrow{p} 0.$$
Finally, we have
$$
\begin{aligned}
\max_{1\leq g\leq G_{NT}}\left\|\left\langle\varphi^{\alpha\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))]\right\rangle_g\right\|
&\leq \max_{1\leq g\leq G_{NT}}\left\|D_g(\tilde\theta,\hat\alpha_g(\tilde\theta))\right\| + \max_{1\leq g\leq G_{NT}}\left\|\left\langle E_{it}[\varphi^{\alpha\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta))] - E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))]\right\rangle_g\right\|\\
&\leq \max_{1\leq g\leq G_{NT}}\sup_{(\theta,\alpha)\in\Theta\times\mathcal{A}}\|D_g(\theta,\alpha)\| + \max_{1\leq g\leq G_{NT}}\left\|\left\langle E_{it}[\varphi^{\alpha\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta))] - E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))]\right\rangle_g\right\| \xrightarrow{p} 0
\end{aligned}
$$
by the previous arguments.
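Lemma 9. Under Assumptions 1-5, 7, and 9,
$$\max_{1\leq g\leq G_{NT}}\left\|\left(\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta))\right)^{-1} - \left(\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))]\right)^{-1}\right\| \xrightarrow{p} 0.$$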
Proof. The result is immediate given convergence in Lemma 8 and Assumption 7. See Lemma 6.
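Lemma 10. $\max_{1\leq g\leq G_{NT}}\left\|\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\left(\varphi^{\theta\alpha}_{it}(\tilde\theta,\hat\alpha_g(\tilde\theta)) - E_{it}[\varphi^{\theta\alpha}_{it}(\theta_0,\alpha_0(i))]\right)\right\| \xrightarrow{p} 0$ under Assumptions 1-5 and 9.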
Proof. The proof proceeds similarly to the proof of Lemma 8 and is omitted.
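Lemma 11. Under Assumptions 1-5, 7, and 9,
$$
\begin{aligned}
\Bigg\|&\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\theta\alpha}_{it}(\tilde\theta,\hat\alpha(\tilde\theta))\left(\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\alpha}_{it}(\tilde\theta,\hat\alpha(\tilde\theta))\right)^{-1}\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\varphi^{\alpha\theta}_{it}(\tilde\theta,\hat\alpha(\tilde\theta))\\
&- \frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\varphi^{\theta\alpha}_{it}(\theta_0,\alpha_0(i))]\left(\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\varphi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))]\right)^{-1}\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\varphi^{\alpha\theta}_{it}(\theta_0,\alpha_0(i))]\Bigg\| \xrightarrow{p} 0.
\end{aligned}
$$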
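Proof. For notational simplicity, we write $\hat\phi^{\theta\alpha}_{it}=\phi^{\theta\alpha}_{it}(\hat\theta,\hat\alpha(\hat\theta))$, $\hat\phi^{\alpha\alpha}_{it}=\phi^{\alpha\alpha}_{it}(\hat\theta,\hat\alpha(\hat\theta))$, $\phi^{\theta\alpha}_{it}=\phi^{\theta\alpha}_{it}(\theta_0,\alpha_0(i))$, and $\phi^{\alpha\alpha}_{it}=\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))$. Also define $\hat H^{\theta\alpha}_{NT}=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\hat\phi^{\theta\alpha}_{it}$, $\hat H^{\alpha\alpha}_{NT}=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\hat\phi^{\alpha\alpha}_{it}$, $H^{\theta\alpha}_{NT}=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\phi^{\theta\alpha}_{it}]$, and $H^{\alpha\alpha}_{NT}=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\phi^{\alpha\alpha}_{it}]$. We have that
$$
\begin{aligned}
&\Big\|\frac{1}{NT}\sum_{g=1}^{G_{NT}}\Big[(N_gT_g)\hat H^{\theta\alpha}_{NT}(\hat H^{\alpha\alpha}_{NT})^{-1}\hat H^{\alpha\theta}_{NT}-(N_gT_g)H^{\theta\alpha}_{NT}(H^{\alpha\alpha}_{NT})^{-1}H^{\alpha\theta}_{NT}\Big]\Big\|\\
&=\Big\|\frac{1}{NT}\sum_{g=1}^{G_{NT}}\Big[(N_gT_g)\hat H^{\theta\alpha}_{NT}\big((\hat H^{\alpha\alpha}_{NT})^{-1}-(H^{\alpha\alpha}_{NT})^{-1}\big)\hat H^{\alpha\theta}_{NT}\Big]
+\frac{1}{NT}\sum_{g=1}^{G_{NT}}\Big[(N_gT_g)\hat H^{\theta\alpha}_{NT}(H^{\alpha\alpha}_{NT})^{-1}\big(\hat H^{\alpha\theta}_{NT}-H^{\alpha\theta}_{NT}\big)\Big]\\
&\qquad+\frac{1}{NT}\sum_{g=1}^{G_{NT}}\Big[(N_gT_g)\big(\hat H^{\theta\alpha}_{NT}-H^{\theta\alpha}_{NT}\big)(H^{\alpha\alpha}_{NT})^{-1}H^{\alpha\theta}_{NT}\Big]\Big\|\\
&\le\max_{1\le g\le G_{NT}}\frac{N_gT_g}{N_gT}\Big[\max_{1\le g\le G_{NT}}\big\|(\hat H^{\alpha\alpha}_{NT})^{-1}-(H^{\alpha\alpha}_{NT})^{-1}\big\|\frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}\Big(\frac{1}{N_gT}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it})\Big)^2\\
&\qquad+\frac{1}{\delta}\max_{1\le g\le G_{NT}}\big\|\hat H^{\alpha\theta}_{NT}-H^{\alpha\theta}_{NT}\big\|\frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}\frac{1}{N_gT}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it})
+\frac{\Delta}{\delta}\max_{1\le g\le G_{NT}}\big\|\hat H^{\alpha\theta}_{NT}-H^{\alpha\theta}_{NT}\big\|\Big]\\
&=\max_{1\le g\le G_{NT}}\frac{N_gT_g}{N_gT}\Big[o_p(1)\frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}\Big(\frac{1}{N_gT}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it})\Big)^2+o_p(1)\frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}\frac{1}{N_gT}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it})+o_p(1)\Big]
\end{aligned}
$$
using Lemmas 9 and 10. Under Assumptions 1 and 5, we have $\max_{1\le g\le G_{NT}}\frac{N_gT_g}{N_gT}=O(1)$. We also have $E\big[\frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}\frac{1}{N_gT}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it})\big]\le\Delta$ and
$$
E\Big[\frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}\Big(\frac{1}{N_gT}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it})\Big)^2\Big]\le\frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}\frac{1}{(N_gT)^2}\sum_{i\in I_g}\sum_{t=1}^{T_i}\sum_{j\in I_g}\sum_{s=1}^{T_j}\big(E[M(w_{it})^2]E[M(w_{js})^2]\big)^{1/2}\le\Delta
$$
under Assumption 4, which gives $\frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}\big(\frac{1}{N_gT}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it})\big)^2=O_p(1)$ and $\frac{1}{G_{NT}}\sum_{g=1}^{G_{NT}}\frac{1}{N_gT}\sum_{i\in I_g}\sum_{t=1}^{T_i}M(w_{it})=O_p(1)$. The conclusion then follows.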
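The first equality in the proof of Lemma 11 rests on a purely algebraic three-term decomposition, $AB^{-1}C-ab^{-1}c=A(B^{-1}-b^{-1})C+Ab^{-1}(C-c)+(A-a)b^{-1}c$, with $A,B,C$ in the roles of $\hat H^{\theta\alpha}_{NT},\hat H^{\alpha\alpha}_{NT},\hat H^{\alpha\theta}_{NT}$ and $a,b,c$ in the roles of their population counterparts. The Python sketch below is an illustrative check, not part of the paper: it verifies the identity in exact rational arithmetic for arbitrary 2×2 matrices. The helper names (`mul`, `inv2`, `telescoping_gap`) and the test matrices are hypothetical choices.

```python
from fractions import Fraction


def mat(rows):
    """Build a matrix (list of rows) with exact rational entries."""
    return [[Fraction(x) for x in row] for row in rows]


def mul(X, Y):
    """Matrix product X @ Y for list-of-rows matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inv2(X):
    """Inverse of a nonsingular 2x2 matrix."""
    (p, q), (r, s) = X
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def telescoping_gap(A, B, C, a, b, c):
    """LHS minus RHS of the three-term decomposition used in Lemma 11."""
    lhs = sub(mul(mul(A, inv2(B)), C), mul(mul(a, inv2(b)), c))
    term1 = mul(mul(A, sub(inv2(B), inv2(b))), C)  # change in the inverse Hessian
    term2 = mul(mul(A, inv2(b)), sub(C, c))        # change in the alpha-theta block
    term3 = mul(mul(sub(A, a), inv2(b)), c)        # change in the theta-alpha block
    return sub(lhs, add(add(term1, term2), term3))


# Arbitrary test matrices (B and b must be invertible).
A, B, C = mat([[1, 2], [3, 4]]), mat([[5, 1], [1, 3]]), mat([[2, 0], [1, 1]])
a, b, c = mat([[1, 1], [0, 2]]), mat([[4, 1], [1, 2]]), mat([[1, 1], [1, 0]])

print(telescoping_gap(A, B, C, a, b, c))  # every entry is Fraction(0, 1)
```

Exact fractions are used so that the check is an identity test rather than a floating-point approximation.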
Lemma 12. If Assumptions 1-5 and 7-9 are satisfied, then $\hat J\xrightarrow{p}J$ for $\hat J$ in equation (7.18) and $J$ defined in Assumption 8, and $\hat J^{-1}\xrightarrow{p}J^{-1}$.
Proof. The first result is immediate from Lemmas 5 and 11, and the second follows from the continuous mapping theorem under the eigenvalue condition in Assumption 8.
Lemma 13. Under Assumptions 1-4, 7-8, and 10,
$$
\sqrt{NT}\Big(B_1-\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)\kappa_g\psi_{g1}\Big)\xrightarrow{d}N(0,\Omega)
$$
for $\kappa_g$ and $\Omega$ defined in Assumption 8.
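Proof. Let $U_{it}=u^{\theta}_{it}-\kappa_gu^{\alpha}_{it}$. Note that $\sup_iE\big\|\frac{1}{\sqrt{T_i}}\sum_{t=1}^{T_i}U_{it}\big\|^2\le C\sup_{i,t,k}E[U_{it,k}^2]$ for some $C<\infty$, where $U_{it,k}$ is the $k$th element of the vector $U_{it}$; this follows from standard results for mixing sequences; see, e.g., Doukhan (1994) or Kim (1994). Under Assumptions 2 and 4,
$$
\sup_{i,t,k}E[U_{it,k}^2]\le\sup_{i,t}E[M(w_{it})^2]+2\sup_{i,t}|\kappa_g|E[M(w_{it})^2]+\sup_{i,t}\kappa_g^2E[M(w_{it})^2]\le\Delta\big(1+2\sup_g|\kappa_g|+\sup_g\kappa_g^2\big).
$$
The bounds $\sup_g|\kappa_g|\le\Delta/\delta$ and $\sup_g\kappa_g^2\le\Delta^2/\delta^2$ follow directly from Assumptions 4 and 7. It thus follows that $\sup_iE\big\|\frac{1}{\sqrt{T_i}}\sum_{t=1}^{T_i}U_{it}\big\|^2\le C$ for some constant $C<\infty$.

Consider
$$
\begin{aligned}
E\Big\|\frac{1}{\sqrt N}\sum_{i=1}^N\Big(\frac{1}{\sqrt T}\sum_{t=1}^{T_i}U_{it}-\sqrt{\rho_i}\frac{1}{\sqrt{T_i}}\sum_{t=1}^{T_i}U_{it}\Big)\Big\|^2
&=E\Big\|\frac{1}{\sqrt N}\sum_{i=1}^N\Big(\frac{\sqrt{T_i}}{\sqrt T}-\sqrt{\rho_i}\Big)\frac{1}{\sqrt{T_i}}\sum_{t=1}^{T_i}U_{it}\Big\|^2\\
&=\frac{1}{N}\sum_{i=1}^N\Big(\frac{\sqrt{T_i}}{\sqrt T}-\sqrt{\rho_i}\Big)^2E\Big\|\frac{1}{\sqrt{T_i}}\sum_{t=1}^{T_i}U_{it}\Big\|^2
\le C\sup_i\Big(\sqrt{\frac{T_i}{T}}-\sqrt{\rho_i}\Big)^2,
\end{aligned}
$$
where the second equality follows from independence across $i$ and the inequality from the previous argument. Assumption 1 also gives $\sup_i\big(\sqrt{T_i/T}-\sqrt{\rho_i}\big)^2\to0$, so
$$
\frac{1}{\sqrt N}\sum_{i=1}^N\Big(\frac{1}{\sqrt T}\sum_{t=1}^{T_i}U_{it}-\sqrt{\rho_i}\frac{1}{\sqrt{T_i}}\sum_{t=1}^{T_i}U_{it}\Big)=o_p(1).
$$
Abusing notation and defining $Y_{iT}=\sqrt{\rho_i}\frac{1}{\sqrt{T_i}}\sum_{t=1}^{T_i}U_{it}=\frac{1}{T_i}\sum_{t=1}^{T_i}S^*_{it}$, we have $\frac{1}{\sqrt N}\sum_{i=1}^NY_{iT}\xrightarrow{d}N(0,\Omega)$ as in Hansen (2007), Lemma 2. The conclusion then follows by noting that
$$
\sqrt{NT}\Big(B_1-\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)\kappa_g\psi_{g1}\Big)=\frac{1}{\sqrt{NT}}\sum_{i=1}^N\sum_{t=1}^{T_i}U_{it}
=\frac{1}{\sqrt N}\sum_{i=1}^N\Big(\frac{1}{\sqrt T}\sum_{t=1}^{T_i}U_{it}-\sqrt{\rho_i}\frac{1}{\sqrt{T_i}}\sum_{t=1}^{T_i}U_{it}\Big)+\frac{1}{\sqrt N}\sum_{i=1}^NY_{iT}.
$$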
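The key step in the proof of Lemma 13 rewrites each summand as $\big(\sqrt{T_i/T}-\sqrt{\rho_i}\big)T_i^{-1/2}\sum_tU_{it}$, so its contribution is controlled by $\sup_i(\sqrt{T_i/T}-\sqrt{\rho_i})^2$. The minimal Python sketch below illustrates that algebra. The values of $U_{it}$ are made up, and the rule $T_i=\mathrm{round}(\rho_iT)$ used to show the weight shrinking is a hypothetical stand-in for Assumption 1, whose exact form is not reproduced here.

```python
import math


def direct_gap(U, T, rho):
    """(1/sqrt(T)) * sum(U) - sqrt(rho) * (1/sqrt(T_i)) * sum(U), with T_i = len(U)."""
    s = sum(U)
    return s / math.sqrt(T) - math.sqrt(rho) * s / math.sqrt(len(U))


def factored_gap(U, T, rho):
    """(sqrt(T_i / T) - sqrt(rho)) * (1/sqrt(T_i)) * sum(U): the rewritten form."""
    T_i = len(U)
    return (math.sqrt(T_i / T) - math.sqrt(rho)) * sum(U) / math.sqrt(T_i)


def weight(T, rho):
    """(sqrt(T_i / T) - sqrt(rho))**2 under the hypothetical rule T_i = round(rho * T)."""
    T_i = max(1, round(rho * T))
    return (math.sqrt(T_i / T) - math.sqrt(rho)) ** 2


U = [0.4, -1.2, 0.3, 0.9, -0.1, 0.5, -0.7]  # made-up U_it for one individual
print(direct_gap(U, 10, 0.5), factored_gap(U, 10, 0.5))  # equal up to rounding
print([weight(T, 1 / 3) for T in (10, 1_000, 1_000_000)])  # shrinks toward 0
```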
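Lemma 14. Under Assumptions 1-5 and 7, $B_3=O_p\Big(\frac{\xi_{NT}\sup_g\frac{N_g-1}{N_g}}{\sqrt{NT}}\Big)$.
Proof. Recall that
$$
B_3=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\big(\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(i))-E_{it}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(i))]\big)(\alpha_g-\alpha_0(i)),
$$
and, to conserve notation, let $z_{it}=\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(i))-E_{it}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(i))]$. Note that $E[z_{it}]=0$ and that $E\big[\big\|\sum_{t=1}^{T_i}z_{it}\big\|^2\big]\le T_i\Delta$ under Assumptions 2 and 4, so we have
$$
\begin{aligned}
E[\|B_3\|^2]&=\frac{1}{(NT)^2}E\Big[\operatorname{trace}\Big(\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\sum_{h=1}^{G_{NT}}\sum_{j\in I_h}\sum_{s=1}^{T_j}(\alpha_g-\alpha_0(i))'z_{it}'z_{js}(\alpha_h-\alpha_0(j))\Big)\Big]\\
&=\frac{1}{(NT)^2}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\operatorname{trace}\Big((\alpha_g-\alpha_0(i))'E\Big[\Big(\sum_{t=1}^{T_i}z_{it}\Big)'\Big(\sum_{t=1}^{T_i}z_{it}\Big)\Big](\alpha_g-\alpha_0(i))\Big)\\
&\le\frac{1}{(NT)^2}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\|\alpha_g-\alpha_0(i)\|^2E\Big\|\sum_{t=1}^{T_i}z_{it}\Big\|^2\\
&\le\frac{1}{(NT)^2}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\Delta\Big(\sup_g\frac{N_g-1}{N_g}\Big)^2C_{NT}^2T_i\xi_{NT}^2
=\frac{\Delta\big(\sup_g\frac{N_g-1}{N_g}\big)^2C_{NT}^2\xi_{NT}^2}{NT},
\end{aligned}
$$
where the last inequality follows from Lemma 4. The conclusion is then immediate.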
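Lemma 15. Under Assumptions 1-5 and 7, $B_4=O\Big(\Big(\sup_g\frac{N_g-1}{N_g}+\Big(\sup_g\frac{N_g-1}{N_g}\Big)^2\Big)\xi^2_{NT}\Big)$.
Proof. Recall that
$$
B_4=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\big(E_{it}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(i))]-E_{it}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(j))]\big)(\alpha_g-\alpha_0(i))
=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(i))-\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(j))](\alpha_g-\alpha_0(i)).
$$
Then
$$
\begin{aligned}
\|B_4\|&\le\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\big(E_{it}\|\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(i))-\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(j))\|\big)\|\alpha_g-\alpha_0(i)\|\\
&\le\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[M(w_{it})]\,\|\alpha_g(i)-\alpha_g(j)\|\,\|\alpha_g-\alpha_0(i)\|\\
&=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[M(w_{it})]\,\|(\alpha_g(i)-\alpha_0(i))-(\alpha_g(j)-\alpha_0(j))-(\alpha_0(j)-\alpha_0(i))\|\,\|\alpha_g-\alpha_0(i)\|\\
&\le\Delta C_{NT}\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi_{NT}\Big(2C_{NT}\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi_{NT}+C(p)\xi_{NT}\Big),
\end{aligned}
$$
where the second inequality follows under Assumption 4 and the last inequality follows from Lemma 4, using the triangle inequality and the definition of $\alpha_g(i)$.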
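Lemma 16. Under Assumptions 1-7, $B_5=O\Big(\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi^2_{NT}\Big)$.
Proof. Note that under Assumption 6
$$
\begin{aligned}
\|E_{it}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(j))]-E_{jt}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(j))]\|&=\Big\|\int_{\mathcal W}\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(j))(dF_{it}-dF_{jt})\Big\|\\
&\le\int_{\mathcal W}\|\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(j))\|\,\|dF_{it}-dF_{jt}\|\\
&\le\int_{\mathcal W}M(w)C(w)\|\alpha_0(i)-\alpha_0(j)\|\\
&\le\Big(\int_{\mathcal W}M(w)^2dF_{it}\Big)\|\alpha_0(i)-\alpha_0(j)\|\\
&\le\Delta\|\alpha_0(i)-\alpha_0(j)\|.
\end{aligned}
$$
It then follows that
$$
\|B_5\|\le\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Delta\|\alpha_0(i)-\alpha_0(j)\|\,\|\alpha_g-\alpha_0(i)\|\le\Delta\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi^2_{NT}.
$$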
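Lemma 17. Under Assumptions 1-5 and 7, $B_6=O\Big(\Big(\sup_g\frac{N_g-1}{N_g}\Big)^2\xi^2_{NT}\Big)$.
Proof. Since
$$
B_6=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{jt}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g(j))-\phi^{\theta\alpha}_{it}(\theta_0,\alpha_0(j))](\alpha_g-\alpha_0(i)),
$$
it follows from the Lipschitz condition and the moment bounds in Assumption 4 that
$$
\|B_6\|\le\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\Delta\|\alpha_g(j)-\alpha_0(j)\|\,\|\alpha_g-\alpha_0(i)\|\le\Delta\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\|\alpha_g-\alpha_0(i)\|^2=O\Big(\Big(\sup_g\frac{N_g-1}{N_g}\Big)^2\xi^2_{NT}\Big),
$$
using Lemma 4 and the fact that $\|\alpha_g(j)-\alpha_0(j)\|\le\|\alpha_g-\alpha_0(i)\|$.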
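In the following lemmas, let
$$
\hat H^{\theta\alpha}_g=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g),\qquad
\hat H^{\alpha\alpha}_g=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g),
$$
$$
\tilde H^{\theta\alpha}_g=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g)],\qquad
\tilde H^{\alpha\alpha}_g=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g)],
$$
$$
H^{\theta\alpha}_g=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\phi^{\theta\alpha}_{it}(\theta_0,\alpha_0(i))],\quad\text{and}\quad
H^{\alpha\alpha}_g=\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(i))].
$$
Note that
$$
\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\phi^{\theta\alpha}_{it}(\theta_0,\alpha_g)(\hat H^{\alpha\alpha}_g)^{-1}\Big(\sum_{j=1}^6\psi_{gj}\Big)=\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)\hat H^{\theta\alpha}_g(\hat H^{\alpha\alpha}_g)^{-1}\Big(\sum_{j=1}^6\psi_{gj}\Big)
$$
and that
$$
\begin{aligned}
\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)\hat H^{\theta\alpha}_g(\hat H^{\alpha\alpha}_g)^{-1}\Big(\sum_{j=1}^6\psi_{gj}\Big)
&=\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\Big(\sum_{j=1}^6\psi_{gj}\Big)\\
&\quad+\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\Big(\sum_{j=1}^6\psi_{gj}\Big)\\
&\quad+\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)\hat H^{\theta\alpha}_g\big[(\hat H^{\alpha\alpha}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big]\Big(\sum_{j=1}^6\psi_{gj}\Big).
\end{aligned}
$$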
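Lemma 18. Under Assumptions 1-7, 9, and 11, $\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\big(\sum_{j=2}^6\psi_{gj}\big)=o_p(1)$.
Proof. First,
$$
\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\psi_{g2}=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\kappa_gE_{it}[\phi^{\alpha}_{it}(\theta_0,\alpha_0(i))]=\frac{1}{\sqrt{NT}}o(1)
$$
by Assumption 11.

Next,
$$
\begin{aligned}
\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\psi_{g3}&=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\kappa_g\big(\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(i))-E_{it}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(i))]\big)(\alpha_g-\alpha_0(i))\\
&=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\Big(\sum_{t=1}^{T_i}z_{it}\Big)(\alpha_g-\alpha_0(i))
\end{aligned}
$$
with $z_{it}=\kappa_g\big(\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(i))-E_{it}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(i))]\big)$. Then
$$
E\Big\|\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\Big(\sum_{t=1}^{T_i}z_{it}\Big)(\alpha_g-\alpha_0(i))\Big\|^2\le\frac{\Delta C^2_{NT}\xi^2_{NT}\big(\sup_g\frac{N_g-1}{N_g}\big)^2}{NT}
$$
by an argument similar to that used in Lemma 14, from which it follows that
$$
\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\psi_{g3}=O_p\Big(\frac{\xi_{NT}\sup_g\frac{N_g-1}{N_g}}{\sqrt{NT}}\Big).
$$
For the next term, we have
$$
\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\psi_{g4}=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\kappa_gE_{it}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(i))-\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(j))](\alpha_g-\alpha_0(i)),
$$
so
$$
\begin{aligned}
\Big\|\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\psi_{g4}\Big\|
&\le(\Delta^2/\delta)C_{NT}\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi_{NT}\Big(C_{NT}\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi_{NT}+C(p)\xi_{NT}\Big)\\
&=O\Big(\Big(\sup_g\frac{N_g-1}{N_g}+\Big(\sup_g\frac{N_g-1}{N_g}\Big)^2\Big)\xi^2_{NT}\Big)
\end{aligned}
$$
by an argument similar to that in Lemma 15, using $\|\kappa_g\|\le\Delta/\delta$.

From an argument similar to that used in Lemma 16, it follows that $\|E_{it}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(j))]-E_{jt}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(j))]\|\le\Delta\|\alpha_0(i)-\alpha_0(j)\|$. It then follows that
$$
\Big\|\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\psi_{g5}\Big\|\le\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}(\Delta^2/\delta)\|\alpha_0(i)-\alpha_0(j)\|\,\|\alpha_g-\alpha_0(i)\|=O\Big(\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi^2_{NT}\Big).
$$
Finally, we have
$$
\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\psi_{g6}=\frac{1}{NT}\sum_{g=1}^{G_{NT}}\sum_{i\in I_g}\sum_{t=1}^{T_i}\kappa_gE_{jt}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(j))-\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))](\alpha_g-\alpha_0(i))=O\Big(\Big(\sup_g\frac{N_g-1}{N_g}\Big)^2\xi^2_{NT}\Big)
$$
using the same argument as in Lemma 17.

It then follows that $\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)H^{\theta\alpha}_g(H^{\alpha\alpha}_g)^{-1}\big(\sum_{j=2}^6\psi_{gj}\big)=o_p(1)$.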
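Lemma 19. Under Assumptions 1-7, 9, and 11, $\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\big(\sum_{j=1}^6\psi_{gj}\big)=o_p(1)$.
Proof. We have
$$
\begin{aligned}
\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\Big(\sum_{j=1}^6\psi_{gj}\Big)
&=\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\Big(\sum_{j=1}^6\psi_{gj}\Big)\\
&\quad+\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)(\tilde H^{\theta\alpha}_g-H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\Big(\sum_{j=1}^6\psi_{gj}\Big).
\end{aligned}
$$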
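(i) Considering the first term, we have
$$
\begin{aligned}
E\Big\|\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\psi_{g1}\Big\|
&\le\frac{1}{NT}\sum_{g=1}^{G_{NT}}\frac{1}{\delta}(N_gT_g)\big(E\|\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g\|^2E\|\psi_{g1}\|^2\big)^{1/2}\\
&\le\frac{1}{NT}\sum_{g=1}^{G_{NT}}\frac{1}{\delta}(N_gT_g)\Big(\frac{\Delta}{N_gT_g}\frac{\Delta}{N_gT_g}\Big)^{1/2}=C\frac{G_{NT}}{NT},
\end{aligned}
$$
where the first inequality is from Cauchy-Schwarz and the second from arguments as in Lemma 7. Thus, $\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\psi_{g1}\xrightarrow{p}0$ under Assumption 9.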
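(ii) Next,
$$
\begin{aligned}
E\Big\|\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\psi_{g2}\Big\|
&\le\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}\frac{1}{\delta}\Big(E\big\|\sqrt{N_gT_g}(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)\big\|^2\Big)^{1/2}\big\|\sqrt{N_gT_g}\,\psi_{g2}\big\|\\
&\le\frac{1}{\sqrt{NT}}\sup_g\big\|\sqrt{N_gT_g}\,\psi_{g2}\big\|\,G_{NT}\frac{\Delta}{\delta}=o(1)\frac{G_{NT}}{\sqrt{NT}}\to0,
\end{aligned}
$$
where the first inequality follows from the triangle and Cauchy-Schwarz inequalities and Assumption 7, and the second inequality follows from Assumption 11 and Lemma 7. Convergence to zero is guaranteed by Assumption 9.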
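(iii) The result
$$
\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\psi_{g3}=O_p\Big(\frac{G_{NT}}{NT}\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi_{NT}\Big)
$$
follows in a similar fashion to (i), using an argument similar to that used in Lemma 14 to show that $E\|\sqrt{N_gT_g}\,\psi_{g3}\|^2\le C\big(\sup_g\frac{N_g-1}{N_g}\big)^2\xi^2_{NT}$.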
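(iv) We have
$$
\begin{aligned}
\|\psi_{g4}\|&=\Big\|\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\big(E_{it}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(i))]-E_{it}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(j))]\big)(\alpha_g-\alpha_0(i))\Big\|\\
&\le\Delta C_{NT}\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi_{NT}\Big(C_{NT}\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi_{NT}+C(p)\xi_{NT}\Big)\equiv\Xi_{NT},
\end{aligned}
$$
which follows from an argument similar to that in Lemma 15. We then have
$$
\begin{aligned}
E\Big\|\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\psi_{g4}\Big\|
&\le\frac{1}{NT}\sum_{g=1}^{G_{NT}}\frac{\Xi_{NT}}{\delta}(N_gT_g)\big(E\|\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g\|^2\big)^{1/2}\\
&\le\frac{1}{NT}\sum_{g=1}^{G_{NT}}\frac{\Delta\,\Xi_{NT}}{\delta}\sup_g(N_gT_g)^{1/2}=\frac{G_{NT}}{\sqrt{NT}}\,\frac{\Delta\,\Xi_{NT}}{\delta}\,\frac{\sup_g(N_gT_g)^{1/2}}{N_gT}.
\end{aligned}
$$
Under the assumptions, $\frac{G_{NT}}{\sqrt{NT}}\to0$, $\sqrt{NT}\,\Xi_{NT}\to0$, and $\frac{\sup_g(N_gT_g)^{1/2}}{N_gT}=O(1)$, so $\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\psi_{g4}\xrightarrow{p}0$.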
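(v) The results
$$
\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\psi_{g5}\xrightarrow{p}0
\quad\text{and}\quad
\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)(\hat H^{\theta\alpha}_g-\tilde H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\psi_{g6}\xrightarrow{p}0
$$
follow from arguments similar to those used in (iv) by making use of
$$
\|\psi_{g5}\|=\Big\|\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\big(E_{it}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(j))]-E_{jt}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(j))]\big)(\alpha_g-\alpha_0(i))\Big\|\le\Delta\Big(\sup_g\frac{N_g-1}{N_g}\Big)\xi^2_{NT}
$$
and
$$
\|\psi_{g6}\|=\Big\|\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}\big(E_{jt}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_g(j))]-E_{jt}[\phi^{\alpha\alpha}_{it}(\theta_0,\alpha_0(j))]\big)(\alpha_g-\alpha_0(i))\Big\|\le\Delta\Big(\sup_g\frac{N_g-1}{N_g}\Big)^2\xi^2_{NT},
$$
which follow, respectively, from arguments similar to those in Lemmas 16 and 17.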
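(vi) For the remaining terms, we have
$$
\begin{aligned}
\Big\|\frac{1}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)(\tilde H^{\theta\alpha}_g-H^{\theta\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\Big(\sum_{j=1}^6\psi_{gj}\Big)\Big\|
&\le\frac{1}{NT}\sum_g(N_gT_g)\big\|\tilde H^{\theta\alpha}_g-H^{\theta\alpha}_g\big\|\,\big\|(H^{\alpha\alpha}_g)^{-1}\big\|\,\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\\
&\le\frac{C}{NT}\sum_g(N_gT_g)\Big(\frac{1}{N_gT_g}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[M(w_{it})]\Big)\|\hat\alpha_g-\alpha_g\|\,\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\\
&\le\frac{C}{NT}\sum_g(N_gT_g)\|\hat\alpha_g-\alpha_g\|\,\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\\
&=\frac{C}{NT}\sum_g(N_gT_g)\Big\|\big[(\hat H^{\alpha\alpha}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big]\sum_{j=1}^6\psi_{gj}+(H^{\alpha\alpha}_g)^{-1}\sum_{j=1}^6\psi_{gj}\Big\|\,\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\\
&\le\frac{C}{NT}\max_{1\le g\le G_{NT}}\big\|(\hat H^{\alpha\alpha}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big\|\sum_{g=1}^{G_{NT}}(N_gT_g)\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\,\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\\
&\quad+\frac{C}{NT}\sum_{g=1}^{G_{NT}}(N_gT_g)\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\,\Big\|\sum_{j=1}^6\psi_{gj}\Big\|.
\end{aligned}
$$
From arguments identical to those used to verify Lemma 9, we can show
$$
\max_{1\le g\le G_{NT}}\big\|(\hat H^{\alpha\alpha}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big\|\xrightarrow{p}0;
$$
so it suffices to show $\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)\big\|\sum_{j=1}^6\psi_{gj}\big\|\,\big\|\sum_{j=1}^6\psi_{gj}\big\|\xrightarrow{p}0$.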
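(vii) From the triangle inequality, $\big\|\sum_{j=1}^6\psi_{gj}\big\|\le\sum_{j=1}^6\|\psi_{gj}\|$. From (iv) and (v), we have $\|\psi_{g4}\|\le\Xi_{NT}$, $\|\psi_{g5}\|\le\Delta\big(\sup_g\frac{N_g-1}{N_g}\big)\xi^2_{NT}$, and $\|\psi_{g6}\|\le\Delta\big(\sup_g\frac{N_g-1}{N_g}\big)^2\xi^2_{NT}$. Let $\Upsilon_{NT}=\max\{\|\psi_{g4}\|,\|\psi_{g5}\|,\|\psi_{g6}\|\}$ and note that $\sqrt{NT}\,\Upsilon_{NT}\to0$. Then $\sum_{j=1}^6\|\psi_{gj}\|\le\sum_{j=1}^3\|\psi_{gj}\|+3\Upsilon_{NT}$. We also have
$$
\|\psi_{g2}\|=\frac{1}{\sqrt{N_gT_g}}\Big\|\frac{1}{\sqrt{N_gT_g}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\phi^{\alpha}_{it}(\theta_0,\alpha_0(i))]\Big\|\le\frac{1}{\sqrt{N_gT_g}}B_{NT},
$$
where $B_{NT}=\sup_g\big\|\frac{1}{\sqrt{N_gT_g}}\sum_{i\in I_g}\sum_{t=1}^{T_i}E_{it}[\phi^{\alpha}_{it}(\theta_0,\alpha_0(i))]\big\|=o(1)$. Finally, we have $E\|\psi_{g1}\|^2\le\frac{\Delta}{N_gT_g}$ as in (i) and $E\|\psi_{g3}\|^2\le\frac{C(\sup_g\frac{N_g-1}{N_g})^2\xi^2_{NT}}{N_gT_g}$ as in (iii). Putting all of this together gives
$$
\begin{aligned}
E\Big[\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\,\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\Big]&\le E\Big(\|\psi_{g1}\|+\|\psi_{g3}\|+\frac{1}{\sqrt{N_gT_g}}B_{NT}+3\Upsilon_{NT}\Big)^2\\
&=E\Big[\|\psi_{g1}\|^2+2\|\psi_{g1}\|\|\psi_{g3}\|+\|\psi_{g3}\|^2+\frac{2}{\sqrt{N_gT_g}}\|\psi_{g1}\|B_{NT}+\frac{2}{\sqrt{N_gT_g}}\|\psi_{g3}\|B_{NT}+\frac{1}{N_gT_g}B^2_{NT}\\
&\qquad+6\|\psi_{g1}\|\Upsilon_{NT}+6\|\psi_{g3}\|\Upsilon_{NT}+\frac{6}{\sqrt{N_gT_g}}B_{NT}\Upsilon_{NT}+9\Upsilon^2_{NT}\Big]\\
&\le\frac{\Delta}{N_gT_g}+\frac{2(C\Delta)^{1/2}\big(\sup_g\frac{N_g-1}{N_g}\big)\xi_{NT}}{N_gT_g}+\frac{C\big(\sup_g\frac{N_g-1}{N_g}\big)^2\xi^2_{NT}}{N_gT_g}+\frac{2\Delta^{1/2}B_{NT}}{N_gT_g}\\
&\qquad+\frac{2C^{1/2}\big(\sup_g\frac{N_g-1}{N_g}\big)\xi_{NT}B_{NT}}{N_gT_g}+\frac{B^2_{NT}}{N_gT_g}+\frac{6\Delta^{1/2}\Upsilon_{NT}}{\sqrt{N_gT_g}}+\frac{6C^{1/2}\big(\sup_g\frac{N_g-1}{N_g}\big)\xi_{NT}\Upsilon_{NT}}{\sqrt{N_gT_g}}+\frac{6B_{NT}\Upsilon_{NT}}{\sqrt{N_gT_g}}+9\Upsilon^2_{NT}\\
&=\frac{\Delta+b_{NT}}{N_gT_g}+\frac{\Upsilon_{NT}(6\Delta^{1/2}+c_{NT})}{\sqrt{N_gT_g}}+9\Upsilon^2_{NT},
\end{aligned}
$$
where $b_{NT}\to0$ and $c_{NT}\to0$.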
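Step (vii) expands the square of $\|\psi_{g1}\|+\|\psi_{g3}\|+B_{NT}/\sqrt{N_gT_g}+3\Upsilon_{NT}$ into ten terms before bounding each one. The short Python check below, with arbitrary illustrative inputs, confirms that the ten terms listed in the display add up to that square. It checks only this algebra, not the probabilistic bounds; the function names are hypothetical.

```python
import math


def expanded_square(p1, p3, B, n, ups):
    """Ten-term expansion of (p1 + p3 + B/sqrt(n) + 3*ups)**2 as written in step (vii).

    p1, p3 stand for ||psi_g1||, ||psi_g3||; B for B_NT; n for N_g*T_g; ups for Upsilon_NT.
    """
    r = 1.0 / math.sqrt(n)
    return (p1 ** 2 + 2 * p1 * p3 + p3 ** 2
            + 2 * r * p1 * B + 2 * r * p3 * B + B ** 2 / n
            + 6 * p1 * ups + 6 * p3 * ups
            + 6 * r * B * ups + 9 * ups ** 2)


def direct_square(p1, p3, B, n, ups):
    """The square itself, computed directly."""
    return (p1 + p3 + B / math.sqrt(n) + 3 * ups) ** 2


args = (0.3, 0.2, 0.05, 50, 0.01)  # arbitrary illustrative values
print(expanded_square(*args), direct_square(*args))  # agree up to rounding
```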
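(viii) Using (vii) yields
$$
\begin{aligned}
E\Big|\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\,\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\Big|
&=\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)E\Big[\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\,\Big\|\sum_{j=1}^6\psi_{gj}\Big\|\Big]\\
&\le\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)\Big(\frac{\Delta+b_{NT}}{N_gT_g}+\frac{\Upsilon_{NT}(6\Delta^{1/2}+c_{NT})}{\sqrt{N_gT_g}}+9\Upsilon^2_{NT}\Big)\\
&=\frac{G_{NT}(\Delta+b_{NT})}{\sqrt{NT}}+\frac{\Upsilon_{NT}(6\Delta^{1/2}+c_{NT})}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}\sqrt{N_gT_g}+9\sqrt{NT}\,\Upsilon^2_{NT}\\
&=o(1)+\frac{\Upsilon_{NT}(6\Delta^{1/2}+c_{NT})}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}\sqrt{N_gT_g}+o(1)\\
&\le\frac{\Upsilon_{NT}(6\Delta^{1/2}+c_{NT})}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}N_gT_g+o(1)\\
&=\sqrt{NT}\,\Upsilon_{NT}(6\Delta^{1/2}+c_{NT})+o(1)=o(1).
\end{aligned}
$$
(ix) The conclusion of the lemma follows by combining (i)-(viii).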
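Lemma 20. Under Assumptions 1-7, 9, and 11,
$$
\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)\hat H^{\theta\alpha}_g\big[(\hat H^{\alpha\alpha}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big]\Big(\sum_{j=1}^6\psi_{gj}\Big)=o_p(1).
$$
Proof. From a mean value expansion of $(\hat H^{\alpha\alpha}_g)^{-1}$ about $\hat H^{\alpha\alpha}_g=H^{\alpha\alpha}_g$, we have $(\hat H^{\alpha\alpha}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}=-(H^{\alpha\alpha*}_g)^{-1}(\hat H^{\alpha\alpha}_g-H^{\alpha\alpha}_g)(H^{\alpha\alpha*}_g)^{-1}$, where $\|H^{\alpha\alpha*}_g-H^{\alpha\alpha}_g\|\le\|\hat H^{\alpha\alpha}_g-H^{\alpha\alpha}_g\|$. Expanding further yields
$$
\begin{aligned}
(\hat H^{\alpha\alpha}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}=&-(H^{\alpha\alpha}_g)^{-1}(\hat H^{\alpha\alpha}_g-H^{\alpha\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\\
&-(H^{\alpha\alpha}_g)^{-1}(\hat H^{\alpha\alpha}_g-H^{\alpha\alpha}_g)\big[(H^{\alpha\alpha*}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big]\\
&-\big[(H^{\alpha\alpha*}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big](\hat H^{\alpha\alpha}_g-H^{\alpha\alpha}_g)(H^{\alpha\alpha}_g)^{-1}\\
&-\big[(H^{\alpha\alpha*}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big](\hat H^{\alpha\alpha}_g-H^{\alpha\alpha}_g)\big[(H^{\alpha\alpha*}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big].
\end{aligned}
$$
Plugging this expression into
$$
\frac{1}{\sqrt{NT}}\sum_{g=1}^{G_{NT}}(N_gT_g)\hat H^{\theta\alpha}_g\big[(\hat H^{\alpha\alpha}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\big]\Big(\sum_{j=1}^6\psi_{gj}\Big)
$$
and making use of $\max_{1\le g\le G_{NT}}\|(H^{\alpha\alpha*}_g)^{-1}-(H^{\alpha\alpha}_g)^{-1}\|\xrightarrow{p}0$, which can be demonstrated as in Lemma 9 using $\|H^{\alpha\alpha*}_g-H^{\alpha\alpha}_g\|\le\|\hat H^{\alpha\alpha}_g-H^{\alpha\alpha}_g\|$, the result follows by an argument similar to that used to demonstrate Lemma 19, with $\hat H^{\theta\alpha}_g$, $\tilde H^{\theta\alpha}_g$, and $H^{\theta\alpha}_g$ replaced by $\hat H^{\alpha\alpha}_g$, $\tilde H^{\alpha\alpha}_g$, and $H^{\alpha\alpha}_g$.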
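Two algebraic facts sit behind the proof of Lemma 20. The first is the exact identity $B^{-1}-b^{-1}=-B^{-1}(B-b)b^{-1}$ for invertible matrices. The second is the four-term expansion obtained by writing $(H^{\alpha\alpha*}_g)^{-1}=(H^{\alpha\alpha}_g)^{-1}+D$ inside $-(H^{\alpha\alpha*}_g)^{-1}(\hat H^{\alpha\alpha}_g-H^{\alpha\alpha}_g)(H^{\alpha\alpha*}_g)^{-1}$. The Python sketch below checks both in exact arithmetic for 2×2 matrices. It verifies the expansion for an arbitrary invertible stand-in `Bstar` (here the midpoint of the two matrices, a hypothetical choice). It does not establish the mean value form itself, and all helper names are illustrative.

```python
from fractions import Fraction
from functools import reduce


def mat(rows):
    """Matrix (list of rows) with exact rational entries; strings like "9/2" are allowed."""
    return [[Fraction(x) for x in row] for row in rows]


def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def neg(X):
    return [[-x for x in row] for row in X]


def inv2(X):
    """Inverse of a nonsingular 2x2 matrix."""
    (p, q), (r, s) = X
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def resolvent_gap(B, b):
    """B^{-1} - b^{-1} + B^{-1}(B - b)b^{-1}, which is zero for invertible B and b."""
    return add(sub(inv2(B), inv2(b)), mul(mul(inv2(B), sub(B, b)), inv2(b)))


def four_term_gap(B, b, Bstar):
    """-(B*)^{-1}(B - b)(B*)^{-1} minus its four-term expansion with D = (B*)^{-1} - b^{-1}."""
    bi, diff = inv2(b), sub(B, b)
    D = sub(inv2(Bstar), bi)
    lhs = neg(mul(mul(inv2(Bstar), diff), inv2(Bstar)))
    terms = [mul(mul(bi, diff), bi), mul(mul(bi, diff), D),
             mul(mul(D, diff), bi), mul(mul(D, diff), D)]
    return sub(lhs, neg(reduce(add, terms)))


B = mat([[5, 1], [1, 3]])              # stands in for the sample Hessian
b = mat([[4, 1], [1, 2]])              # stands in for its population counterpart
Bstar = mat([["9/2", 1], [1, "5/2"]])  # hypothetical intermediate point (midpoint)
print(resolvent_gap(B, b), four_term_gap(B, b, Bstar))  # all entries are zero
```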
7.2.4. Main Result
Proposition 2. Under Assumptions 1-11, $\sqrt{NT}(\hat\theta-\theta_0)\xrightarrow{d}J^{-1}N(0,\Omega)$.
Proof. The result is an immediate consequence of the expansions derived in Sections 7.2.1 and 7.2.2 and Lemmas 12-20.
References
Almeida, H., M. Campello, and M. Weisbach (2004): "The Cash Flow Sensitivity of Cash," Journal of Finance, 59, 1777–1804.
Altonji, J., T. Elder, and C. Taber (2005): "Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools," Journal of Political Economy, 113, 151–184.
Andersen, E. B. (1970): "Asymptotic Properties of Conditional Maximum Likelihood Estimators," Journal of the Royal Statistical Society, Series B, 32(2), 283–301.
Arellano, M. (2003): "Discrete Choice with Panel Data," Investigaciones Economicas, 27(3), 423–458.
Arellano, M., and S. Bonhomme (2009): "Robust Priors in Nonlinear Panel Data Models," forthcoming, Econometrica.
Arellano, M., and J. Hahn (2005): "Understanding Bias in Nonlinear Panel Models: Some Recent Developments," Invited Lecture, Econometric Society World Congress, London.
Baltagi, B. (1992): "Specification Issues," in The Econometrics of Panel Data, ed. by Matyas and Sevestre. Kluwer Academic Publishers.
Bester, A. C., and C. Hansen (2007): "A Penalty Function Approach to Bias Reduction in Nonlinear Panel Models with Fixed Effects," forthcoming, Journal of Business and Economic Statistics.
Bester, C. A., and C. B. Hansen (2008): "Identification of Marginal Effects in a Correlated Random Effects Model," forthcoming, Journal of Business and Economic Statistics.
Carro, J. M. (2006): "Estimating Dynamic Panel Data Discrete Choice Models," forthcoming, Journal of Econometrics.
Chamberlain, G. (1980): "Analysis of Covariance with Qualitative Data," Review of Economic Studies, 47, 225–238.
Chen, S., and S. Khan (2007): "Semiparametric Estimation of Nonstationary Censored Panel Data Models with Time-Varying Factor Loads," forthcoming, Econometric Theory.
Chernozhukov, V., H. Hong, and E. Tamer (2004): "Inference on Identified Parameter Sets in Econometric Models," MIT Working Paper.
Chernozhukov, V., I. Fernandez-Val, J. Hahn, and W. Newey (2009): "Identification and Estimation of Marginal Effects in Nonlinear Panel Models," Working Paper, Department of Economics, MIT.
Doukhan, P. (1994): Mixing: Properties and Examples, vol. 85 of Lecture Notes in Statistics, series editors S. Fienberg, J. Gani, K. Krickeberg, I. Olkin, and N. Wermuth. New York: Springer-Verlag.
Faulkender, M., and R. Wang (2006): "Corporate Financial Policy and the Value of Cash," Journal of Finance, 61, 1957–1990.
Fernandez-Val, I. (2005): "Estimation of Structural Parameters and Marginal Effects in Binary Choice Panel Data Models with Fixed Effects," Mimeo.
Gayle, G.-L., and C. Viauroux (2007): "Root-N Consistent Semiparametric Estimators of a Dynamic Panel Sample Selection Model," Journal of Econometrics, 141(1), 179–212.
Hahn, J., and G. Kuersteiner (2004): "Bias Reduction for Dynamic Nonlinear Panel Models with Fixed Effects," Mimeo.
Hahn, J., and G. M. Kuersteiner (2002): "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both N and T Are Large," Econometrica, 70(4), 1639–1657.
Hahn, J., and W. K. Newey (2004): "Jackknife and Analytical Bias Reduction for Nonlinear Panel Models," Econometrica, 72(4), 1295–1319.
Hansen, C. B. (2007): "Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data when T is Large," Journal of Econometrics, 141, 597–620.
Hausman, J. A., and W. E. Taylor (1981): "Panel Data and Unobservable Individual Effects," Econometrica, 49(6), 1377–1398.
Heckman, J. J. (1981): "The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic Process," in Structural Analysis of Discrete Panel Data with Econometric Applications, ed. by C. F. Manski and D. McFadden. Elsevier: North-Holland.
Henderson, D. J., and A. Ullah (2005): "A Nonparametric Random Effects Estimator," Economics Letters, 88(3), 403–407.
Holmstrom, B., and J. Tirole (1998): "Private and Public Supply of Liquidity," Journal of Political Economy, 106, 1–40.
Honore, B. E. (1992): "Trimmed LAD and Least Squares Estimation of Truncated and Censored Models with Fixed Effects," Econometrica, 60(3), 533–565.
Honore, B. E., and E. Kyriazidou (2000): "Panel Data Discrete Choice Models with Lagged Dependent Variables," Econometrica, 68(4), 839–874.
Honore, B. E., and A. Lewbel (2002): "Semiparametric Binary Choice Panel Data Models Without Strictly Exogenous Regressors," Econometrica, 70, 2053–2063.
Honore, B. E., and E. Tamer (2006): "Bounds on the Parameters in Panel Dynamic Discrete Choice Models," Econometrica, 74(3), 611–632.
Kim, T. Y. (1994): "Moment Bounds for Non-Stationary Dependent Sequences," Journal of Applied Probability, 31, 731–742.
Lancaster, T. (2002): "Orthogonal Parameters and Panel Data," Review of Economic Studies, 69, 647–666.
Lewbel, A. (2005): "Simple Endogenous Binary Choice and Selection Panel Model Estimators," mimeo.
Lin, X., and R. J. Carroll (2000): "Nonparametric Function Estimation for Clustered Data When the Predictor Is Measured without/with Error," Journal of the American Statistical Association, 95, 520–534.
Lindley, D. V., and A. F. M. Smith (1972): "Bayes Estimates for the Linear Model," Journal of the Royal Statistical Society, Series B, 34, 1–41.
Manski, C. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55(2), 357–362.
Matyas, L., and P. Blanchard (1998): "Misspecified Heterogeneity in Panel Data Models," Statistical Papers, 39, 1–27.
Meulbroek, L. K. (1992): "An Empirical Analysis of Illegal Insider Trading," Journal of Finance, 47(5), 1661–1699.
Mundlak, Y. (1978): "On the Pooling of Time Series and Cross Section Data," Econometrica, 46, 69–85.
Newey, W. K., and J. L. Powell (2003): "Instrumental Variable Estimation of Nonparametric Models," Econometrica, 71, 1565–1578.
Neyman, J., and E. L. Scott (1948): "Consistent Estimates Based on Partially Consistent Observations," Econometrica, 16(1), 1–32.
Nickell, S. (1981): "Biases in Dynamic Models with Fixed Effects," Econometrica, 49(6), 1417–1426.
Raudenbush, S. W., and A. S. Bryk (2002): Hierarchical Linear Models: Applications and Data Analysis Methods, second edn. Thousand Oaks: Sage Publications.
Roulstone, D. T. (2006): "Insider Trading and the Information Content of Earnings Announcements," Chicago GSB working paper.
StataCorp (2007): Stata Statistical Software: Release 10. College Station, TX: StataCorp LP.
Sufi, A. (2009): "Bank Lines of Credit in Corporate Finance: An Empirical Analysis," Review of Financial Studies, 22, 1057–1088.
Sun, Y. X. (2005): "Estimation and Inference in Panel Structure Models," working paper, UCSD.
Ullah, A., and N. Roy (1998): "Nonparametric and Semiparametric Econometrics of Panel Data," in Handbook of Applied Economic Statistics, ed. by A. Ullah and D. E. A. Giles, vol. 1. New York, NY: Marcel Dekker.
Wooldridge, J. M. (2002): Econometric Analysis of Cross Section and Panel Data. Cambridge, Massachusetts: The MIT Press.
Wooldridge, J. M. (2005): "Unobserved Heterogeneity and Estimation of Average Partial Effects," in Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. by D. W. K. Andrews and J. H. Stock. Cambridge University Press.
Woutersen, T. (2005): "Robustness against Incidental Parameters and Mixing Distributions," Mimeo.
Table 1. Simulation Results for Pure Hierarchical Model

                        Grouping Scheme
        FE      FE-BC   500     200     100     50      20      10      5       1       AIC     BIC
A. N=200, T=2
Bias    1.161   1.075   *       *       0.546   0.220   0.082   0.031   -0.035  -0.241  0.094   -0.008
RMSE    1.278   1.193   *       *       0.625   0.284   0.159   0.129   0.126   0.276   0.216   0.124
AIC %   0       *       *       *       1.6     14.3    19.3    51.8    12.9    0.1     *       *
BIC %   0       *       *       *       0       0       0.1     22.9    73.3    3.7     *       *
Size N  0.812   0.731   *       *       0.593   0.296   0.103   0.073   0.096   0.669   0.146   0.079
Size 5  0.564   0.492   *       *       0.312   0.167   0.080   0.056   0.075   0.368   0.108   0.066
B. N=200, T=8
Bias    0.202   0.127   *       *       0.091   0.042   0.017   -0.004  -0.053  -0.248  0.013   -0.008
RMSE    0.220   0.151   *       *       0.115   0.076   0.062   0.059   0.081   0.271   0.062   0.059
AIC %   0       *       *       *       0       0.3     66.6    32.5    0.6     0       *       *
BIC %   0       *       *       *       0       0       0.3     76.5    22.8    0.4     *       *
Size N  0.708   0.315   *       *       0.285   0.116   0.062   0.064   0.215   0.930   0.062   0.068
Size 5  0.486   0.219   *       *       0.194   0.095   0.074   0.070   0.127   0.551   0.073   0.074
C. N=1000, T=2
Bias    1.024   0.941   0.488   0.151   0.068   0.029   0.009   -0.009  -0.058  -0.257  0.007   -0.012
RMSE    1.041   0.959   0.502   0.167   0.090   0.062   0.053   0.052   0.081   0.278   0.053   0.053
AIC %   0       *       0       0       0.2     0.5     77.7    21      0.6     0       *       *
BIC %   0       *       0       0       0       0       0.5     81.8    17.6    0.1     *       *
Size N  1.000   1.000   1.000   0.651   0.229   0.093   0.053   0.049   0.259   0.958   0.051   0.052
Size 5  0.989   0.981   0.903   0.421   0.161   0.070   0.048   0.054   0.137   0.609   0.049   0.056
D. N=1000, T=8
Bias    0.197   0.123   0.088   0.033   0.017   0.006   0.001   -0.013  -0.060  -0.253  0.002   -0.004
RMSE    0.201   0.129   0.094   0.043   0.031   0.026   0.025   0.028   0.070   0.272   0.025   0.026
AIC %   0       *       0       0       0.1     5.2     94.4    0.3     0       0       *       *
BIC %   0       *       0       0       0       0       46.1    52.9    1       0       *       *
Size N  1.000   0.946   0.866   0.235   0.097   0.055   0.044   0.074   0.643   0.995   0.047   0.052
Size 5  1.000   0.946   0.866   0.235   0.097   0.055   0.044   0.074   0.643   0.995   0.047   0.052
Table 2. Simulation Results for Mixed Model

                        Grouping Scheme
        FE      FE-BC   500     200     100     50      20      10      5       1       AIC     BIC
A. N=200, T=2
Bias    1.150   1.061   *       *       0.481   0.184   0.066   -0.004  -0.092  -0.251  0.102   -0.029
RMSE    1.265   1.177   *       *       0.563   0.260   0.157   0.132   0.157   0.284   0.228   0.137
AIC %   0       *       *       *       2.8     18.1    47.4    29.1    2.5     0.1     *       *
BIC %   0       *       *       *       0       0       0.4     49.7    45.3    4.6     *       *
Size N  0.751   0.682   *       *       0.482   0.202   0.083   0.059   0.183   0.650   0.127   0.091
Size 5  0.514   0.444   *       *       0.249   0.127   0.070   0.063   0.116   0.375   0.092   0.081
B. N=200, T=8
Bias    0.187   0.116   *       *       0.074   0.020   0.001   -0.037  -0.112  -0.262  0.000   -0.027
RMSE    0.208   0.143   *       *       0.105   0.070   0.064   0.073   0.133   0.281   0.064   0.070
AIC %   0       *       *       *       0       0.2     95.4    4.4     0       0       *       *
BIC %   0       *       *       *       0       0       18.6    74.5    6.9     0       *       *
Size N  0.584   0.238   *       *       0.174   0.064   0.056   0.117   0.490   0.960   0.056   0.090
Size 5  0.405   0.189   *       *       0.130   0.077   0.067   0.102   0.266   0.694   0.066   0.090
C. N=1000, T=2
Bias    1.033   0.946   0.445   0.128   0.051   0.011   -0.001  -0.038  -0.109  -0.260  0.000   -0.022
RMSE    1.052   0.966   0.461   0.147   0.081   0.059   0.056   0.068   0.127   0.277   0.061   0.062
AIC %   0       *       0.1     0.3     0.1     0.3     97.1    2.1     0       0       *       *
BIC %   0       *       0       0       0       0       30.8    64.7    4.5     0       *       *
Size N  1.000   0.998   0.993   0.458   0.152   0.056   0.046   0.128   0.555   0.971   0.050   0.085
Size 5  0.992   0.978   0.838   0.305   0.113   0.080   0.071   0.107   0.285   0.693   0.074   0.085
D. N=1000, T=8
Bias    0.186   0.115   0.074   0.018   0.001   -0.013  -0.012  -0.042  -0.111  -0.261  -0.012  -0.012
RMSE    0.190   0.121   0.082   0.035   0.029   0.032   0.031   0.053   0.121   0.275   0.031   0.031
AIC %   0       *       0       0       0       0       99.9    0.1     0       0       *       *
BIC %   0       *       0       0       0       0       95.4    4.6     0       0       *       *
Size N  0.999   0.866   0.658   0.097   0.064   0.093   0.086   0.370   0.911   1.000   0.086   0.090
Size 5  0.947   0.614   0.444   0.085   0.053   0.074   0.072   0.224   0.512   0.856   0.072   0.074
Table 3. Simulation Results for Pure Individual Effect Model

                        Grouping Scheme
        FE      FE-BC   500     200     100     50      20      10      5       1       AIC     BIC
A. N=200, T=2
Bias    1.090   1.009   *       *       0.258   -0.123  -0.246  -0.272  -0.283  -0.292  0.237   -0.292
RMSE    1.195   1.114   *       *       0.346   0.170   0.262   0.285   0.295   0.303   0.342   0.303
AIC %   0       *       *       *       92.2    6.7     0.1     0       0.2     0.8     *       *
BIC %   0       *       *       *       0       0       0       0.1     0.1     99.8    *       *
Size N  0.826   0.738   *       *       0.262   0.282   0.775   0.865   0.908   0.933   0.269   0.933
Size 5  0.534   0.461   *       *       0.125   0.164   0.596   0.716   0.762   0.805   0.132   0.805
B. N=200, T=8
Bias    0.210   0.132   *       *       -0.121  -0.223  -0.267  -0.281  -0.287  -0.292  0.132   -0.291
RMSE    0.225   0.152   *       *       0.133   0.228   0.271   0.284   0.290   0.295   0.152   0.294
AIC %   100     *       *       *       0       0       0       0       0       0       *       *
BIC %   0       *       *       *       0       0       0       0.5     2.9     96.6    *       *
Size N  0.783   0.354   *       *       0.685   0.995   1.000   1.000   1.000   1.000   0.354   1.000
Size 5  0.568   0.269   *       *       0.434   0.943   0.990   0.995   0.998   0.999   0.269   0.999
C. N=1000, T=2
Bias    1.006   0.927   0.233   -0.173  -0.245  -0.270  -0.284  -0.288  -0.290  -0.292  0.233   -0.292
RMSE    1.023   0.944   0.252   0.179   0.248   0.273   0.286   0.290   0.293   0.294   0.252   0.294
AIC %   0       *       100     0       0       0       0       0       0       0       *       *
BIC %   0       *       0       0       0       0       0       0       0.1     99.9    *       *
Size N  1.000   1.000   0.788   0.959   1.000   1.000   1.000   1.000   1.000   1.000   0.788   1.000
Size 5  0.992   0.979   0.463   0.795   0.996   1.000   1.000   1.000   1.000   1.000   0.463   1.000
D. N=1000, T=8
Bias    0.208   0.130   -0.124  -0.240  -0.268  -0.281  -0.288  -0.290  -0.292  -0.293  0.130   -0.292
RMSE    0.211   0.134   0.126   0.240   0.268   0.281   0.289   0.291   0.292   0.293   0.134   0.293
AIC %   100     *       0       0       0       0       0       0       0       0       *       *
BIC %   0       *       0       0       0       0       0       0.2     1.5     98.3    *       *
Size N  1.000   0.983   0.999   1.000   1.000   1.000   1.000   1.000   1.000   1.000   0.983   1.000
Size 5  0.995   0.849   0.961   1.000   1.000   1.000   1.000   1.000   1.000   1.000   0.849   1.000
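The Bias, RMSE, AIC %, BIC %, and Size rows in Tables 1-3 are Monte Carlo summaries over simulation replications. The Python sketch below shows one standard way to compute such summaries, but the definitions are assumptions on our part. Bias and RMSE are taken as the mean and root mean square of $\hat\theta-\theta_0$. Size is taken as the rejection frequency of a nominal 5% two-sided t-test of $H_0:\theta=\theta_0$. AIC % and BIC % are taken as the percentage of replications in which each grouping scheme is selected, which is consistent with those rows summing to 100. This section does not specify how the Size N and Size 5 rows differ, so the sketch simply takes a generic standard error as input. The function names and toy inputs are hypothetical.

```python
import math
from statistics import NormalDist

CRIT_5PCT = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, about 1.96


def mc_summary(estimates, std_errors, theta0):
    """Bias, RMSE, and rejection rate of H0: theta = theta0 across replications."""
    n = len(estimates)
    errors = [est - theta0 for est in estimates]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    rejections = sum(abs(e / se) > CRIT_5PCT for e, se in zip(errors, std_errors))
    return bias, rmse, rejections / n


def selection_shares(chosen, candidates):
    """Percent of replications in which each candidate grouping was selected."""
    n = len(chosen)
    return {c: 100.0 * sum(ch == c for ch in chosen) / n for c in candidates}


# Toy inputs, not taken from the tables.
bias, rmse, size = mc_summary([1.1, 0.9, 1.3, 0.7], [0.1] * 4, theta0=1.0)
shares = selection_shares(["20", "10", "20", "5"], ["20", "10", "5", "1"])
print(round(bias, 12), round(rmse, 4), size)  # 0.0 0.2236 0.5
print(shares)  # {'20': 50.0, '10': 25.0, '5': 25.0, '1': 0.0}
```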
Table 4. Results from Bank Lines of Credit Example

                                        Descriptive Statistics   Probit Coef. Estimates
                                        Mean      Std. Dev.      Pooled     FE          AIC
[EBITDA/(assets-cash)]t-1               -0.016    0.372          0.639      0.206       0.408
                                                                 (0.070)    (0.727)     (0.084)
                                                                 [0.146]    [0.005]     [0.075]
                                                                            {0.192}
[Tangible assets/(assets-cash)]t-1      0.328     0.237          0.283      -1.041      0.245
                                                                 (0.104)    (1.644)     (0.207)
                                                                 [0.064]    [-0.025]    [0.045]
                                                                            {-0.984}
[ln(assets-cash)]t-1                    4.901     2.388          0.158      0.491       0.198
                                                                 (0.013)    (0.289)     (0.030)
                                                                 [0.036]    [0.012]     [0.036]
                                                                            {0.469}
[Net worth, cash adjusted]t-1           0.424     0.252          -0.350     -1.619      -0.221
                                                                 (0.100)    (0.967)     (0.150)
                                                                 [-0.080]   [-0.040]    [-0.041]
                                                                            {-1.537}
[Market to book, cash adjusted]t-1      2.406     2.866          -0.071     0.119       -0.026
                                                                 (0.009)    (0.065)     (0.009)
                                                                 [-0.016]   [0.003]     [-0.005]
                                                                            {0.113}
AIC                                                              5817.8     7905.6      5441.4

Note: The table reports descriptive statistics and estimated probit coefficients for key independent variables. The dependent variable in the analysis is whether a firm has a bank line of credit. The mean of the dependent variable is 0.795. We use two years of data (2002-2003), giving a total of 7034 observations on 3648 firms. Columns labeled "Probit Coef. Estimates" provide estimation results from a pooled probit model (Pooled), a model that includes a full set of firm dummies (FE), and the AIC-minimizing grouped-effect estimator (AIC), which in this case uses 405 groups. The numbers in parentheses below the point estimates are estimated standard errors clustered at the firm level. The numbers in brackets give estimated average marginal effects. For the FE estimator, bias-corrected FE results based on Hahn and Kuersteiner (2004) are provided in braces.
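The bracketed entries in Table 4 are estimated average marginal effects. For a continuous regressor $x_k$ in a probit model with index $x'\beta$, the usual estimator averages $\phi(x_i'\beta)\beta_k$ over observations, where $\phi$ is the standard normal density. The Python sketch below implements that textbook formula on hypothetical data. It is not the paper's code. It does not handle how firm effects enter the index for the FE estimator, nor regressors that also enter squared (as in Table 5), where the derivative picks up an extra term.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_ame(X, beta, k):
    """Average marginal effect of continuous regressor k in a probit model.

    Averages d P(y = 1 | x) / d x_k = phi(x'beta) * beta_k over the rows of X.
    """
    total = 0.0
    for x in X:
        index = sum(xj * bj for xj, bj in zip(x, beta))
        total += normal_pdf(index)
    return beta[k] * total / len(X)


# Hypothetical design: an intercept column and one regressor.
X = [[1.0, -0.5], [1.0, 0.0], [1.0, 0.8]]
beta = [0.9, 0.6]
print(probit_ame(X, beta, k=1))
```

Because $\phi$ never exceeds about 0.399, a probit average marginal effect is always a fraction of the corresponding coefficient. This is consistent with the bracketed values in Table 4 being smaller in magnitude than the coefficients above them.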
Table 5. Results from Insider Trading Example

                                        Descriptive Statistics   Probit Coef. Estimates
                                        Mean      Std. Dev.      Pooled     FE          AIC
A. Insider Buys (Mean of Buys = 0.300)
Cumulative Abnormal Returns             0.008     0.114          0.306      0.716       0.336
                                                                 (0.100)    (0.229)     (0.103)
                                                                            {0.623}
(Cumulative Abnormal Returns)^2                                  -0.297     -0.617      -0.231
                                                                 (0.159)    (0.190)     (0.122)
                                                                            {-0.537}
CAR Average Marginal Effect                                      0.103      0.041       0.107
Unexpected Earnings                     -0.006    0.698          0.287      0.286       0.281
                                                                 (0.096)    (0.208)     (0.094)
                                                                            {0.253}
(Unexpected Earnings)^2                                          -0.089     -0.078      -0.009
                                                                 (0.046)    (0.058)     (0.040)
                                                                            {-0.071}
UE Average Marginal Effect                                       0.098      0.035       0.090
AIC                                                              22232      22028       21766

B. Insider Sells (Mean of Sells = 0.276)
Cumulative Abnormal Returns             0.008     0.114          -0.099     -0.644      -0.137
                                                                 (0.107)    (0.256)     (0.112)
                                                                            {-0.560}
(Cumulative Abnormal Returns)^2                                  -0.031     0.203       -0.030
                                                                 (0.081)    (0.367)     (0.091)
                                                                            {0.165}
CAR Average Marginal Effect                                      -0.029     0.019       -0.038
Unexpected Earnings                     -0.006    0.698          0.014      0.223       0.033
                                                                 (0.076)    (0.139)     (0.084)
                                                                            {0.211}
(Unexpected Earnings)^2                                          0.0003     0.0032      0.0006
                                                                 (0.0009)   (0.0015)    (0.0009)
                                                                            {0.0031}
UE Average Marginal Effect                                       0.004      0.016       0.009
AIC                                                              19268      20478       18891

Note: The table reports descriptive statistics and estimated probit coefficients for key independent variables. The dependent variable for the results in Panel A is an indicator for whether there were any insider buys of own-company stock during the quarter, and the dependent variable in Panel B is an indicator for whether there were any insider sells of own-company stock during the quarter. We use quarterly data from 1999, giving a total of 19,716 observations on 5877 firms. Columns labeled "Probit Coef. Estimates" provide estimation results from a pooled probit model (Pooled), a model that includes a full set of firm dummies (FE), and the AIC-minimizing grouped-effect estimator (AIC), which uses 425 groups for both the buy and the sell results. The numbers in parentheses below the point estimates are estimated standard errors clustered at the firm level. For the FE estimator, bias-corrected FE parameter estimates based on Hahn and Kuersteiner (2004) are provided in braces.
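The AIC column in Tables 4 and 5 is the grouped-effect specification chosen by minimizing the Akaike information criterion, and the AIC row reports the criterion for each fitted model. The Python sketch below uses the standard formula $\mathrm{AIC}=2k-2\log L$ (smaller is better) and then picks the minimum among the AIC values reported in the two tables. The helper names are illustrative. The parameter count $k$ that the paper uses for each grouping is not stated in this section.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def pick_min_aic(aic_by_model):
    """Label of the specification with the smallest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# AIC values as reported in Tables 4 and 5.
reported = {
    "bank lines of credit": {"Pooled": 5817.8, "FE": 7905.6, "AIC (405 groups)": 5441.4},
    "insider buys": {"Pooled": 22232, "FE": 22028, "AIC (425 groups)": 21766},
    "insider sells": {"Pooled": 19268, "FE": 20478, "AIC (425 groups)": 18891},
}
for example, table in reported.items():
    print(example, "->", pick_min_aic(table))
```

In all three examples the grouped specification has the smallest reported AIC. In the insider-sells example the full set of firm dummies has a higher AIC than even the pooled model.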