Estimating Dynamic Economic Models with Fixed Effects∗

PRELIMINARY AND INCOMPLETE

Jeppe Druedahl† Thomas H. Jørgensen‡ Dennis Kristensen§

January 30, 2017

Abstract

We propose a novel approach to estimate dynamic economic models with heterogeneous agents from observed behavior. The estimator is non-parametric in the sense that it does not impose any restrictions on the distribution of heterogeneous parameters. We derive the asymptotic behavior of the estimator, and Monte Carlo results show that the proposed estimator works well even in relatively short panels. We apply our method to estimate a model of intertemporal consumption allocation allowing for heterogeneity in time preferences using high quality Danish longitudinal register data. We find substantial heterogeneity in preferences within educational strata, and the distributions of estimated preferences suggest more mass at high values of discount factors for high-skilled households. Finally, we use the estimated household-specific preferences to show that households who have never had unemployment insurance are also less patient and less risk averse than other households. (JEL: C14, C51, D91)

Keywords: Heterogeneity, Dynamic Economic Models, Structural Estimation, Intertemporal asset allocation.

∗We thank Bo E. Honoré, Elena Manresa, Christopher Carroll, Mette Ejrnæs, Lutz Hendricks, Rasmus Søndergaard Pedersen, Søren Leth-Petersen, Claus Thustrup Kreiner and Anders Munk-Nielsen for fruitful discussions and suggestions. The project also benefited from seminar participants at Princeton and Copenhagen. Financial support from the Danish Council for Independent Research in Social Sciences is gratefully acknowledged (FSE, grant no. 4091-00040 and 5052-00086B). Part of this research was carried out while Jørgensen was visiting Princeton University in the fall of 2015. Jørgensen thanks Bo E. Honoré for his exceptional hospitality and effort in organizing the stay. An earlier draft was circulated under the title “Heterogeneous Preferences and Wealth Inequality”.

†Department of Economics, University of Copenhagen, Øster Farimagsgade 5, Building 26, DK-1353 Copenhagen K, Denmark. E-mail: [email protected]. Website: http://econ.ku.dk/druedahl.

‡Department of Economics, University College London, Gower Street, London, United Kingdom. E-mail: [email protected]. Webpage: www.tjeconomics.com.

§Department of Economics, University College London, Gower Street, London, United Kingdom. E-mail: [email protected]. Website: https://sites.google.com/site/econkristensen.


1 Introduction

Economic agents are recurrently found to be heterogeneous in terms of ex ante characteristics such as abilities and preferences. Experiments and surveys have, for example, repeatedly provided evidence of substantial preference heterogeneity.1 This heterogeneity, furthermore, often has important positive and normative implications. Heterogeneity in patience and risk aversion is, for example, important in explaining wealth inequality in excess of income inequality2 and for understanding asset price puzzles.3 Furthermore, it might have large effects on the form and level of optimal taxation.4

We propose a novel non-parametric approach to estimate dynamic economic models with heterogeneous agents from panel data on observed choices. We exploit that systematic variation in observed choices, beyond what a given economic model and measurement error can explain, is evidence of heterogeneity. Our estimator of both homogeneous and heterogeneous parameters is simple to implement and does not impose any distributional assumptions on the heterogeneous parameters. It can furthermore be used to estimate models with both discrete and continuous choices, though we focus on the latter.

For concreteness, imagine that the goal is to estimate a consumption-saving model in the spirit of Deaton (1991), allowing for heterogeneity in a single preference parameter, and that we have access to panel data on N households, indexed by i, for which we observe market resources $m_{it}$ and consumption $c_{it}$ in $T_i$ periods, indexed by t. Further denote the model-implied optimal level of consumption by $c^\star(m_{it}; \theta, \gamma_i)$, where θ is a vector of homogeneous parameters, and $\gamma_i$ is a household-specific heterogeneous preference parameter.

In order to estimate θ and the household-specific values of $\gamma_i$, our main assumption is that the distribution of $\gamma_i$ can be well-approximated by a discrete distribution with support Γ. Our estimator can therefore be thought of as a grouped fixed effects estimator, where the distribution of $\gamma_i$ is uncovered as the histogram of the household-specific values of $\gamma_i$, as originally suggested by Kamakura (1991). Postponing the discussion of how to choose Γ in empirical applications, and assuming instead that it is known, the homogeneous parameters in θ and the group membership of each household, $j_i \in \{1, \dots, J\}$, can be estimated using e.g. nonlinear least squares as

1 See e.g. Barsky, Juster, Kimball and Shapiro (1997), Beetsma and Schotman (2001), Holt and Laury (2005), Andersen, Harrison, Lau and Rutström (2008, 2010), Guiso and Paiella (2008), Kimball, Sahm and Shapiro (2008, 2009), Dohmen, Falk, Huffman, Sunde, Schupp and Wagner (2011), Andreoni and Sprenger (2012) and Finke and Huston (2013).

2 See e.g. Krusell and Smith (1997, 1998), Hendricks (2007), Cagetti and De Nardi (2008), Cozzi (2014), Carroll, Slacalek and Tokuoka (2014) and De Nardi (2015).

3 See e.g. Guvenen (2006, 2009) and Gârleanu and Panageas (2015).

4 See e.g. Kocherlakota (2010) and Farhi and Werning (2012).


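$$(\hat\theta, \hat j_1, \hat j_2, \dots, \hat j_N) = \arg\min_{\theta, j_1, j_2, \dots, j_N} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \left( c_{it} - c^\star(m_{it}; \theta, \gamma^{j_i}) \right)^2$$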

Estimating the distribution of $\gamma_i$ then boils down to finding the weights on each element in Γ, $\omega = \{\omega_1, \dots, \omega_J\}$. These weights can be found by simple population averages, $\hat\omega_k = \frac{1}{N} \sum_{i=1}^{N} 1\{\hat j_i = k\}$.

Traditional fixed effect (FE) estimators, which do not assume finite support of $\gamma_i$, suffer from an incidental parameter problem known to cause a substantial bias in nonlinear panel models (see e.g. Hahn and Newey, 2004). Hahn and Moon (2010) argue that because the classification parameters are super-consistent when assuming finite support, the incidental parameter problem of our estimator should be less pronounced. Furthermore, unlike random coefficient models, our estimator allows for arbitrary correlation between heterogeneous parameters and other model elements. Finally, a major computational advantage of our estimator is that, conditional on θ, the J solutions to the economic model, $c^\star(\cdot)$, can be pre-computed. The model thus does not need to be re-solved when estimating the N group memberships. This substantially reduces the computational time required to evaluate the criterion function, and makes many and potentially dense points in Γ feasible.
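The pre-computation argument can be illustrated with a minimal sketch. It assumes a balanced panel and uses a hypothetical closed-form stand-in, `solve_model`, in place of a costly dynamic programming solution; the function names, the toy policy function and all numbers are illustrative assumptions rather than part of the paper. For a given θ the model is "solved" once per node on a grid of market resources, and the N group memberships are then found by interpolation and a simple minimum.

```python
import numpy as np

def solve_model(m_grid, theta, gamma):
    """Hypothetical stand-in for a costly model solution: returns c*(m) on m_grid."""
    return gamma * m_grid ** theta

def assign_groups(c_obs, m_obs, theta, nodes, m_grid):
    """NLLS group assignment using J pre-computed policy functions."""
    policies = [solve_model(m_grid, theta, g) for g in nodes]   # J model solutions
    ssr = np.empty((len(nodes), c_obs.shape[0]))
    for k, policy in enumerate(policies):
        # Interpolate the pre-computed policy at the observed market resources.
        c_star = np.interp(m_obs.ravel(), m_grid, policy).reshape(m_obs.shape)
        ssr[k] = ((c_obs - c_star) ** 2).sum(axis=1)            # sum over periods
    j_hat = ssr.argmin(axis=0)                                  # best node per household
    weights = np.bincount(j_hat, minlength=len(nodes)) / len(j_hat)
    return j_hat, weights

# Toy data: three nodes, small additive measurement error (illustrative values).
rng = np.random.default_rng(0)
N, T, theta_true = 500, 8, 0.5
nodes = np.array([0.6, 0.8, 1.0])
j_true = rng.integers(0, len(nodes), size=N)
m_obs = rng.uniform(1.0, 3.0, size=(N, T))
c_obs = nodes[j_true][:, None] * m_obs ** theta_true + rng.normal(0.0, 0.01, size=(N, T))

m_grid = np.linspace(1.0, 3.0, 200)
j_hat, weights = assign_groups(c_obs, m_obs, theta_true, nodes, m_grid)
print("share correctly classified:", (j_hat == j_true).mean())
print("estimated weights:", weights)
```

The cost of the assignment step grows with J times the number of observations, but the number of model solutions per trial value of θ stays at J regardless of N.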

Monte Carlo estimation results show that the proposed estimator has good finite sample properties. Specifically, we test its ability to uncover heterogeneous time preferences in the canonical buffer-stock consumption model pioneered by Deaton (1991, 1992) and Carroll (1992, 1997). Assuming that observed consumption is contaminated with multiplicative log-normal measurement error, we formulate a maximum likelihood (ML) version of our estimator. We find that the estimator performs well even with relatively few time periods and substantial measurement error in consumption. We also find that while misspecifying the number and placement of the fixed nodes in Γ naturally affects the performance of the estimator, the estimated distribution of heterogeneous parameters is very close to the truth. We find a substantial bias in homogeneous parameters when the number of nodes is incorrect, however. We show how the panel jackknife approach of Hahn and Newey (2004) and Dhaene and Jochmans (forthcoming) can substantially reduce the bias in the homogeneous parameters.

To illustrate the empirical applicability of our proposed estimator, we also estimate the buffer-stock consumption model on Danish administrative register data allowing for heterogeneous time preferences and/or heterogeneous CRRA coefficients. This model was first structurally estimated in Gourinchas and Parker (2002) and Cagetti (2003), assuming homogeneous preferences within occupational and educational strata respectively. We are the first to estimate the model with a non-parametric distribution of preference heterogeneity. Our results suggest that there is substantial heterogeneity within educational strata. Across educational strata we find that the estimated distributions of discount factors and CRRA coefficients are shifted towards higher values for high skilled households. Alan and Browning (2010), the only comparable study estimating heterogeneous preference parameters using observational data, find similar results using the Panel Study of Income Dynamics (PSID).

Our explicit estimation of group memberships allows us to perform post-estimation analyses of the different preference groups. Specifically, we use the estimated household-specific preferences to show that households who have never had unemployment insurance are less patient and less risk averse than other households. This suggests that the estimated preferences align with economic intuition.

After discussing the related literature below, the paper proceeds as follows. Section 2 formulates and presents the proposed estimator in general notation. Section 4 presents the Monte Carlo estimation results. In section 5, we report the estimation results from our empirical application. Finally, we conclude in section 6.

1.1 Relation to Existing Estimators

Our proposed estimator is closely related to two recent strands of literature. Firstly, Bajari, Fox and Ryan (2007) and Fox, Kim, Ryan and Bajari (2011) suggest a similar histogram strategy of fixing a discrete grid of the heterogeneous parameters when estimating discrete choice models with random coefficients.5 Fox, Kim and Yang (2015) provide formal justification for their non-parametric approach. The assumption in this existing literature is that the coefficients are random, and they thus seek to estimate the population weights on each fixed node, ω. In the case of discrete (or discretized) choice models, this estimator can be formulated as a constrained least squares problem, which is easy to implement and is ensured to have a unique global optimum. Unfortunately, the estimator is much more complex for continuous choice models because it generally requires solving a highly non-linear optimization problem with all the population weights as variables.6 All dynamic models with random coefficients, furthermore, face the initial conditions problem (see e.g. Heckman, 1981).

The second strand of literature, closely related to our proposed estimator, is the grouped fixed effect (GFE) estimator proposed by Hahn and Moon (2010), Bonhomme and Manresa (2015) and Bester and Hansen (forthcoming).7 In these papers, both the placement of groups (i.e. the values in Γ) and the group membership of each observational unit are explicitly estimated. However, as discussed in Bonhomme and Manresa (2015), estimating the group placements can in practice imply problems with multiple local optima and non-convergence.8 This implies that the GFE estimator is mostly useful when heterogeneity is suspected to be in the form of a small number of “sufficiently” distinct groups across an unknown domain. Our estimator, by contrast, focuses on the case of pervasive heterogeneity on a well-known domain.

5 Like our estimator, this facilitates pre-computation of the model solution for all J types (for given θ). Ackerberg (2009) likewise proposed a combination of importance sampling and change of variables to reduce simulation based estimation time by “pre-computing” the solution only over relevant objects.

6 The constrained least squares formulation of the estimator can, as shown by Nevo, Turner and Williams (forthcoming), be recovered for continuous choices in a method of moments version where all the moments are restricted to be linear in the population weights.

7 See also the related studies by Lin and Ng (2012) and Ando and Bai (2016).

In a broader context, our estimator is also related to a large literature on estimation of mixture models. A particularly popular estimator in this class is the non-parametric maximum likelihood estimator (NPMLE) proposed by Heckman and Singer (1984), among others. These types of estimators often formulate an expected likelihood function where both the group placements and weights are to be estimated. As for the GFE estimator, the simultaneous estimation of weights and nodes can result in multiple local optima and problems of convergence. The common approach to numerically maximize the log expected likelihood function is to apply the expectation-maximization (EM) algorithm (Dempster, Laird and Rubin, 1977). Unfortunately, the EM algorithm has a slow convergence rate and thus requires many evaluations of the likelihood function, which can be very time consuming if the estimator nests a numerical solution of a dynamic economic model (Pilla and Lindsay, 2001). Empirical applications have therefore been restricted to cases with a few distinct groups.

In the specific context of estimation of heterogeneous time and risk preferences, our paper is also closely related to Alan and Browning (2010) and Alan, Browning and Ejrnæs (2014). They propose a synthetic residual estimation (SRE) approach, where the distance between observed and simulated consumption data is minimized conditional on fully parametric distributions of preference heterogeneity and the assumption that all households, irrespective of their individual preferences, draw Euler residuals from a mixture of two log-normal distributions.9 The main benefit of the SRE estimator is that it does not require a full specification of the income process, or ever solving the model, but on the other hand it relies on very restrictive parametric assumptions, which our estimator avoids.

2 A Fixed Grouped Fixed Effects Estimator

In this section, we state the proposed estimator in general notation, while we later turn to a concrete example in our Monte Carlo study. We consider a structural model, which for unit i (individual, household, firm etc.) at time t has state variables $s_{it}$ and choice variables $c_{it}$, and implies optimal choices $c^\star_{it} \equiv c^\star(s_{it}; \theta, \gamma_i)$, where θ ∈ Θ is a set of homogeneous parameters, and $\gamma_i$ is a vector of unit-specific parameters. Here $c^\star_{it}$ could be a vector of optimal discrete and continuous choice variables in a dynamic economic model. We wish to estimate θ and $\gamma_i$ using an (unbalanced) panel of N units observed for $T_i$ (potentially non-consecutive) periods, where we in each period observe all the states, $s^{obs}_{it}$, and a non-empty subset of the choices, $c^{obs}_{it}$, potentially contaminated with measurement error.

8 In a certain sense the results in the GFE papers can be seen as providing formal justification for cluster analysis.

9 Note that while the mean Euler residual (in the absence of borrowing constraints) is independent of preferences, higher order moments are generally not. This is the case even if the distribution of pooled Euler residuals across heterogeneous households is well approximated by a mixture of two log-normals (as found in Alan and Browning, 2010).

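The fixed effects (FE) estimator is given by

$$\hat\theta^{FE} = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \hat\gamma^{FE}_i(\theta)); \theta\right) \tag{2.1}$$

$$\hat\gamma^{FE}_i(\theta) = \arg\min_{\gamma_i \in \mathbb{R}} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \gamma_i); \theta\right) \tag{2.2}$$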

where g(·) is some criterion function. The FE problem has N + dim(θ) parameters to be solved for. Especially when it is time consuming to evaluate $c^\star(\cdot)$ (e.g. by stochastic dynamic programming), this estimator might seem infeasible. We propose an alternative approximate estimator that aims at limiting the computational burden of FE estimation of structural dynamic economic models.

Our approach is to formulate a discrete approximation of the continuous FE estimator in (2.1) in which $\gamma_i$ is restricted to take on only a finite number of values, $\gamma_i \in \Gamma = \{\gamma^1, \dots, \gamma^J\}$. We think of the number of nodes, J, as a function of the data but suppress the dependence throughout. Below we supply an approach to estimate the number of nodes in applications. Our proposed fixed group fixed effects estimator (FGFE) is then

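$$\hat\theta = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \hat\gamma_i(\theta)); \theta\right) \tag{2.3}$$

$$\hat\gamma_i(\theta) = \arg\min_{\gamma_i \in \Gamma} \sum_{t=1}^{T_i} g\left(c^{obs}_{it}, c^\star(s^{obs}_{it}; \theta, \gamma_i); \theta\right) \tag{2.4}$$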

Let $j = (j_1, \dots, j_N)$ denote the vector of group memberships and $\mathcal{J} \equiv \{1, 2, \dots, J\}$ the set of potential group memberships. The group membership is then estimated as $\hat j_i = \sum_{k=1}^{J} k \, 1\{\hat\gamma_i(\theta) = \gamma^k\}$, where $\sum_{k=1}^{J} 1\{\hat\gamma_i(\theta) = \gamma^k\} = 1$.

A key advantage of our proposed estimator is that for a given guess of θ, we can pre-compute the J solutions to the economic model for the various values in Γ, and estimate the N group membership parameters independently across units from equation (2.4). The population weights on each element in Γ, $\omega = \{\omega_j\}_{j=1}^{J}$, can subsequently be estimated by

$$\hat\omega_k = \frac{1}{N} \sum_{i=1}^{N} 1\{\hat j_i = k\}, \quad \forall k \in \mathcal{J}. \tag{2.5}$$
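The nested structure of the estimator (an outer search over θ, an inner unit-by-unit search over the nodes, and weights from group shares) can be sketched as follows. The sketch assumes a balanced panel, a squared-error criterion for g, a grid search over a scalar θ in place of a numerical optimizer, and a hypothetical linear `model`; none of these choices are taken from the paper.

```python
import numpy as np

def profile_criterion(theta, c_obs, s_obs, nodes, model):
    """Concentrated criterion: each unit is assigned its best node given theta."""
    # One model evaluation per node; `model(s, theta, gamma)` is a hypothetical c*.
    c_star = np.stack([model(s_obs, theta, g) for g in nodes])   # (J, N, T)
    ssr = ((c_obs[None] - c_star) ** 2).sum(axis=2)              # (J, N)
    j_hat = ssr.argmin(axis=0)                                   # inner step per unit
    return ssr.min(axis=0).mean(), j_hat

def fgfe(c_obs, s_obs, nodes, theta_grid, model):
    """Outer step: pick the theta on a grid with the smallest concentrated criterion."""
    results = [profile_criterion(th, c_obs, s_obs, nodes, model) for th in theta_grid]
    best = int(np.argmin([crit for crit, _ in results]))
    j_hat = results[best][1]
    weights = np.bincount(j_hat, minlength=len(nodes)) / len(j_hat)  # group shares
    return theta_grid[best], j_hat, weights

def linear_model(s, theta, gamma):
    """Toy model used only for illustration."""
    return theta * s + gamma

rng = np.random.default_rng(1)
N, T = 300, 10
nodes = np.linspace(-0.3, 0.3, 7)
j_true = rng.integers(0, len(nodes), size=N)
s_obs = rng.uniform(0.0, 2.0, size=(N, T))
c_obs = linear_model(s_obs, 0.7, nodes[j_true][:, None]) + rng.normal(0.0, 0.05, size=(N, T))

theta_grid = np.linspace(0.5, 0.9, 41)
theta_hat, j_hat, weights = fgfe(c_obs, s_obs, nodes, theta_grid, linear_model)
print("theta_hat:", theta_hat)
print("weights:", weights.round(2))
```

In an application the grid search over θ would typically be replaced by a numerical optimizer, with `profile_criterion` as its objective.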


The estimator easily handles situations where $\omega_k = 0$ for some k. This is not the case for estimators where $\gamma^k$ is also estimated. Even if all weights are always strictly positive in the true optimum, a trial value of $\gamma^k$ with $\omega_k = 0$ implies that the objective function does not change with $\gamma^k$, severely complicating the optimizer’s decision on how to proceed. Indeed, Bonhomme and Manresa (2015) report significant problems with finding the global maximum, which might be due to a high-dimensional problem with many flat regions.10

2.1 Estimating the Number of Nodes, J

We propose a split-panel cross-validation approach to choose the number of nodes, J, in applications. For a given guess of J, imagine splitting the panel into I non-overlapping partitions along the time dimension. For each partition, ι, we can estimate $\hat\theta^J_\iota$ and $\hat j^J_\iota = (\hat j^J_{1,\iota}, \dots, \hat j^J_{N,\iota})$ and use these estimated parameters to calculate the sum of squared prediction errors for the time periods not used in estimation (denoted with subscript −ι),

$$E_\iota(J) \equiv N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \varepsilon_{it,-\iota}\left(\hat\theta^J_\iota, \gamma^{\hat j^J_{i,\iota}}\right)^2.$$

Choosing the J that minimizes the mean squared error,

$$\hat J = \arg\min_{J \in \mathbb{N}} \frac{1}{I} \sum_{\iota=1}^{I} E_\iota(J),$$

provides an estimate of the number of nodes and domain that trades off the bias and variance of the estimator.

Another way to estimate the number of nodes is a successive approximation approach, similar to that suggested by Fernández-Villaverde, Rubio-Ramírez and Santos (2006) to determine the degree of accuracy of a numerical solution method required for the approximate likelihood function to be a good approximation of the exact likelihood. In particular, the decrease in the estimated objective function can be used as a metric to determine when to stop adding nodes. While this metric is simple to compute, choosing when to stop is somewhat arbitrary.

Other alternatives have been proposed in different strands of literature. A popular approach to determine the number of latent factors in factor analysis or the number of clusters in cluster analysis is to use information criteria, such as BIC or AIC (see e.g. Milligan and Cooper, 1985; Bai and Ng, 2002; and Bonhomme and Manresa, 2015).

10 This identification problem is not unique to the GFE estimator. The same identification problem is also inherent in the random coefficient estimator proposed by Heckman and Singer (1984) and heavily used in empirical applications.


3 Asymptotic Theory [To come]

Under the assumption that the number of nodes and the placement of these nodes are known (i.e. known G), the estimator studied in Hahn and Moon (2010) is similar to the type of estimator we consider. Specifically, their estimator focuses on the estimation of dynamic discrete games of firm behavior with potentially multiple, but a finite number of, equilibria. In their setup, each market is observed over several time periods and they assume that the equilibrium played in a market is time-invariant. Translating their setup into our framework, the equilibrium played in a market is the unobserved heterogeneity ($\gamma_i$) in our framework and the assumption of finitely many equilibria is equivalent to our finite support assumption on Γ.

Hahn and Moon (2010) show that the estimator is consistent as N and T both go to infinity and that the probability of correct classification converges to one even when the number of time periods observed, T, grows significantly slower than the number of units, N. Specifically, they show that for many typical settings, the incidental parameter problem of standard FE estimators (unrestricted support of $\gamma_i$) vanishes as long as T grows as some log function of N. Finally, they show that the estimator of the homogeneous parameters, θ, is asymptotically normal and that inference is not affected by the classification parameters due to their fast rate of convergence.

• Consistency

• Normality (as. var as FE)

• Convergence rates ($N^{-1}$, $T^{-1}$, $J^{-1}$)

– Bias reduction works when T/J → 0 as T, J → ∞

– Asymptotic distribution of the bias reduced estimator

4 Monte Carlo Experiments

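We here illustrate the finite sample properties of our proposed FGFE estimator. We study two examples based on the data generating processes (DGPs):

$$\text{DGP1:} \quad y_{it} = \rho y_{it-1} + \alpha_i + \varepsilon_{it} \tag{4.1}$$

$$\text{DGP2:} \quad y_{it} = \frac{\exp(\rho y_{it-1} + \alpha_i + \varepsilon_{it})}{1 + \exp(\rho y_{it-1} + \alpha_i + \varepsilon_{it})} \tag{4.2}$$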

where $\varepsilon_{it} \sim N(0, 0.01)$ across all simulations.

For each of the 200 Monte Carlo replications we simulate N = 2,000 individuals for $T \in \{10, 20, 30\}$ periods and apply our estimator using $J \in \{5, 10, 50\}$ equally spaced nodes. We simulate data letting $y_{i0} = 0$ and ρ = 0.95, and draw $\alpha_i$ from a normal distribution with mean zero and variance 0.1, truncated to the interval [−0.3, 0.3]. We assume that researchers know the domain of the unit-specific coefficients, $\alpha_i$, but not their values. In turn, the researcher wishes to uncover one homogeneous parameter, ρ, and a vector of heterogeneous parameters, $\alpha = (\alpha_1, \dots, \alpha_N)$.
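A minimal simulation sketch of the two DGPs is given below. It assumes that 0.01 and 0.1 are variances (so the standard deviations are 0.1 and √0.1), that the variance of 0.1 refers to the normal distribution before truncation, and that truncation is implemented by rejection sampling; the function names are hypothetical.

```python
import numpy as np

def draw_truncated_normal(rng, size, sd, lo, hi):
    """Rejection sampler: keep N(0, sd^2) draws that fall inside [lo, hi]."""
    out = np.empty(0)
    while out.size < size:
        draws = rng.normal(0.0, sd, size=2 * size)
        out = np.concatenate([out, draws[(draws >= lo) & (draws <= hi)]])
    return out[:size]

def simulate_panel(rng, N, T, rho=0.95, nonlinear=False):
    """Simulate DGP1 (linear) or DGP2 (logistic transform) with y_i0 = 0.

    Returns y of shape (N, T + 1), where column 0 holds y_i0, and the true alpha_i.
    """
    alpha = draw_truncated_normal(rng, N, np.sqrt(0.1), -0.3, 0.3)
    y = np.zeros((N, T + 1))
    for t in range(1, T + 1):
        index = rho * y[:, t - 1] + alpha + rng.normal(0.0, 0.1, size=N)
        # exp(x) / (1 + exp(x)) written in the equivalent form 1 / (1 + exp(-x)).
        y[:, t] = 1.0 / (1.0 + np.exp(-index)) if nonlinear else index
    return y, alpha

rng = np.random.default_rng(0)
y_lin, alpha_lin = simulate_panel(rng, N=2000, T=10)
y_nl, alpha_nl = simulate_panel(rng, N=2000, T=10, nonlinear=True)
print(y_lin.shape, y_nl.shape)
```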

Table 4.1: Monte Carlo Results: ρ, Linear Model.

                  Avg. Abs. Bias               MC Std.
               baseline  bias reduced    baseline  bias reduced
T = 10
  FE            0.0550     0.0054         0.0035     0.0053
  FGFE
    J = 5       0.0587     0.0102         0.0061     0.0118
    J = 10      0.0561     0.0061         0.0037     0.0067
    J = 50      0.0550     0.0056         0.0036     0.0054
T = 20
  FE            0.0194     0.0029         0.0017     0.0022
  FGFE
    J = 5       0.0212     0.0062         0.0041     0.0074
    J = 10      0.0199     0.0039         0.0022     0.0040
    J = 50      0.0195     0.0029         0.0017     0.0022
T = 30
  FE            0.0116     0.0018         0.0010     0.0012
  FGFE
    J = 5       0.0132     0.0203         0.0035     0.0247
    J = 10      0.0119     0.0027         0.0016     0.0031
    J = 50      0.0116     0.0018         0.0010     0.0013

Notes: Columns 1 and 2 report the average absolute bias of the baseline and the bias-reduced estimates of ρ across the Monte Carlo replications. Columns 3 and 4 report the standard deviation across the replications. All results are for the linear DGP1.

Table 4.2 shows that increasing the number of nodes reduces the average mean squared error and the average MC standard error significantly.


Table 4.2: Monte Carlo Results: $\alpha_i$, Linear Model.

                     Avg. MSE                Avg. MC Std.
               baseline  bias reduced    baseline  bias reduced
T = 10
  FE            0.0260     0.0228         0.1240     0.1097
  FGFE
    J = 5       0.0283     0.0247         0.1332     0.1181
    J = 10      0.0265     0.0232         0.1261     0.1116
    J = 50      0.0260     0.0228         0.1241     0.1097
T = 20
  FE            0.0231     0.0205         0.1152     0.1030
  FGFE
    J = 5       0.0254     0.0224         0.1245     0.1121
    J = 10      0.0236     0.0208         0.1171     0.1048
    J = 50      0.0231     0.0205         0.1152     0.1031
T = 30
  FE            0.0222     0.0201         0.1116     0.1014
  FGFE
    J = 5       0.0245     0.0256         0.1215     0.1255
    J = 10      0.0227     0.0205         0.1136     0.1032
    J = 50      0.0222     0.0201         0.1117     0.1014

Notes: Columns 1 and 2 report the average mean square error of the baseline and the bias-reduced estimates of $\{\alpha_i\}_{i=1}^{N}$ across the Monte Carlo replications. Columns 3 and 4 report the average standard deviation across the replications. All results are for the linear DGP1.

Tables 4.1 and 4.2 report the estimation results related to the homogeneous and heterogeneous parameters, respectively. The first column in Table 4.1 reports the average absolute bias and the second column reports the average absolute bias after split-panel jackknife bias reduction. The third and fourth columns report the standard deviation across MC runs for the baseline and bias-reduced estimates. We also report the standard FE estimates for reference. The FE estimator is feasible in this setup because the computational time associated with evaluating the models in both DGPs is rather low. In situations where evaluating the model at a set of parameter values is time consuming, as in many structural dynamic economic models, implementing the standard FE estimator may easily be infeasible.

As our theory suggests, the FGFE estimator converges towards the FE estimator, and using 50 nodes delivers almost identical results. The split-panel jackknife bias reduction reduces the incidental parameter bias significantly (while increasing the variance slightly in our finite samples).
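The half-panel form of the split-panel jackknife combines the full-panel estimate with the two half-panel estimates as two times the full-panel estimate minus the average of the half-panel estimates. The sketch below applies this to DGP1 using the standard within (FE) estimator of ρ as a stand-in for the estimators in the tables. The function names are hypothetical, and the $\alpha_i$ are drawn from a uniform distribution purely to keep the example short, which is a simplification relative to the truncated normal used in the experiments.

```python
import numpy as np

def within_rho(y):
    """Within (FE) estimate of rho; y has shape (N, T + 1), column 0 is the initial value."""
    lag, cur = y[:, :-1], y[:, 1:]
    lag_d = lag - lag.mean(axis=1, keepdims=True)   # remove unit-specific means
    cur_d = cur - cur.mean(axis=1, keepdims=True)
    return (lag_d * cur_d).sum() / (lag_d ** 2).sum()

def half_panel_jackknife(y, estimator):
    """Bias-reduced estimate: 2 * full-panel estimate - mean of the two half-panel estimates."""
    T = y.shape[1] - 1
    h = T // 2
    full = estimator(y)
    first = estimator(y[:, : h + 1])    # periods 1..h, initial value y_0
    second = estimator(y[:, h:])        # periods h+1..T, initial value y_h
    return 2.0 * full - 0.5 * (first + second)

rng = np.random.default_rng(0)
N, T, rho = 2000, 10, 0.95
alpha = rng.uniform(-0.3, 0.3, size=N)   # simplification: uniform, not truncated normal
y = np.zeros((N, T + 1))
for t in range(1, T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(0.0, 0.1, size=N)

print("baseline:    ", within_rho(y))
print("bias reduced:", half_panel_jackknife(y, within_rho))
```

Because the correction only needs the estimator as a callable, the same function can wrap any estimator of the homogeneous parameters, at the cost of two additional estimations on the half panels.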


Table 4.3: Monte Carlo Results: ρ, Nonlinear Model.

                  Avg. Abs. Bias               MC Std.
               baseline  bias reduced    baseline  bias reduced
T = 10
  FE            0.0569     0.0369         0.0110     0.0147
  FGFE
    J = 5       0.0529     0.0910         0.0573     0.1123
    J = 10      0.0544     0.0476         0.0253     0.0500
    J = 50      0.0570     0.0372         0.0117     0.0163
T = 20
  FE            0.0447     0.0221         0.0101     0.0124
  FGFE
    J = 5       0.0458     0.0846         0.0560     0.1036
    J = 10      0.0383     0.0549         0.0333     0.0687
    J = 50      0.0452     0.0244         0.0112     0.0158
T = 30
  FE            0.0374     0.0165         0.0096     0.0114
  FGFE
    J = 5       0.0506     0.0838         0.0597     0.1060
    J = 10      0.0314     0.0546         0.0328     0.0669
    J = 50      0.0372     0.0178         0.0105     0.0149

Notes: Columns 1 and 2 report the average absolute bias of the baseline and the bias-reduced estimates of ρ across the Monte Carlo replications. Columns 3 and 4 report the standard deviation across the replications. All results are for the nonlinear DGP2.


Table 4.4: Monte Carlo Results: $\alpha_i$, Nonlinear Model.

                     Avg. MSE                Avg. MC Std.
               baseline  bias reduced    baseline  bias reduced
T = 10
  FE            0.0233     0.0239         0.1063     0.1122
  FGFE
    J = 5       0.0252     0.0295         0.1194     0.1371
    J = 10      0.0237     0.0249         0.1089     0.1174
    J = 50      0.0233     0.0240         0.1064     0.1124
T = 20
  FE            0.0213     0.0212         0.1034     0.1056
  FGFE
    J = 5       0.0234     0.0270         0.1163     0.1299
    J = 10      0.0216     0.0231         0.1068     0.1149
    J = 50      0.0213     0.0213         0.1035     0.1059
T = 30
  FE            0.0208     0.0207         0.1022     0.1036
  FGFE
    J = 5       0.0233     0.0266         0.1160     0.1292
    J = 10      0.0211     0.0225         0.1055     0.1125
    J = 50      0.0209     0.0207         0.1023     0.1038

Notes: Columns 1 and 2 report the average mean square error of the baseline and the bias-reduced estimates of $\{\alpha_i\}_{i=1}^{N}$ across the Monte Carlo replications. Columns 3 and 4 report the average standard deviation across the replications. All results are for the nonlinear DGP2.

4.1 Choosing the Number of Nodes

We here implement a simple half-panel cross-validation approach. This closely follows the bias-correction approach above and aims at preserving any time-series properties of the actual data. In particular, we split the simulated data into two sub-samples, where the first contains the first T/2 time periods for all individuals and the second contains the remaining T/2 time periods for all individuals. We then estimate the model parameters on each sub-sample to get $(\hat\rho^J_\iota, \hat\alpha^J_{i,\iota})$, ι = 1, 2. We then calculate the squared prediction error for each sub-sample using the estimates from the other, left-out sub-sample (illustrated here for the linear DGP1),

$$E_1(J) = N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T/2} \left(y_{it} - \hat\rho^J_2 y_{it-1} - \hat\alpha^J_{i,2}\right)^2$$

$$E_2(J) = N^{-1} \sum_{i=1}^{N} \sum_{t=T/2+1}^{T} \left(y_{it} - \hat\rho^J_1 y_{it-1} - \hat\alpha^J_{i,1}\right)^2$$

and estimate J as

$$\hat J = \arg\min_{J \in \mathcal{J}} \frac{1}{2}\left(E_1(J) + E_2(J)\right) \tag{4.3}$$

where we restrict the number of nodes to be in a subset of the natural numbers, namely $\mathcal{J} = \{10, 11, \dots, 100\}$.
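The half-panel cross-validation criterion can be sketched for the linear DGP1 as follows. The sketch assumes equally spaced nodes on the known domain [−0.3, 0.3], a grid search over ρ instead of a numerical optimizer, a smaller sample and a much shorter candidate list for J than in the experiments; all function names are hypothetical.

```python
import numpy as np

def fit_fgfe_linear(y, nodes, rho_grid):
    """Grid-search FGFE fit of y_it = rho * y_it-1 + alpha_i + eps_it.

    y has shape (N, T + 1); column 0 is the initial observation.
    """
    lag, cur = y[:, :-1], y[:, 1:]
    best_crit, best_rho, best_alpha = np.inf, None, None
    for rho in rho_grid:
        resid = cur - rho * lag                                  # (N, T)
        ssr = ((resid[:, :, None] - nodes) ** 2).sum(axis=1)     # (N, J)
        idx = ssr.argmin(axis=1)                                 # best node per unit
        crit = ssr.min(axis=1).mean()
        if crit < best_crit:
            best_crit, best_rho, best_alpha = crit, rho, nodes[idx]
    return best_rho, best_alpha

def prediction_error(y, rho, alpha):
    """Mean over units of the summed squared one-step prediction errors."""
    resid = y[:, 1:] - rho * y[:, :-1] - alpha[:, None]
    return (resid ** 2).sum(axis=1).mean()

def cv_error(y, J, rho_grid, lo=-0.3, hi=0.3):
    """Half-panel cross-validation criterion (E_1(J) + E_2(J)) / 2."""
    h = (y.shape[1] - 1) // 2
    first, second = y[:, : h + 1], y[:, h:]
    nodes = np.linspace(lo, hi, J)
    rho1, alpha1 = fit_fgfe_linear(first, nodes, rho_grid)
    rho2, alpha2 = fit_fgfe_linear(second, nodes, rho_grid)
    e1 = prediction_error(first, rho2, alpha2)    # first half, second-half estimates
    e2 = prediction_error(second, rho1, alpha1)   # second half, first-half estimates
    return 0.5 * (e1 + e2)

rng = np.random.default_rng(0)
N, T = 500, 10
draws = rng.normal(0.0, np.sqrt(0.1), size=4 * N)
alpha = draws[np.abs(draws) <= 0.3][:N]           # truncated-normal draws by rejection
y = np.zeros((N, T + 1))
for t in range(1, T + 1):
    y[:, t] = 0.95 * y[:, t - 1] + alpha + rng.normal(0.0, 0.1, size=N)

rho_grid = np.linspace(0.6, 1.1, 51)
candidates = [5, 10, 20, 40]                      # illustrative; the text uses 10, ..., 100
errors = [cv_error(y, J, rho_grid) for J in candidates]
J_hat = candidates[int(np.argmin(errors))]
print(dict(zip(candidates, np.round(errors, 4))), "chosen J:", J_hat)
```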

Figures 4.1 and 4.2 plot the histograms of the estimated number of groups across all 200 Monte Carlo runs, together with the average estimated number of groups, for $T \in \{10, 20, 30\}$. Figure 4.1 reports results for the linear DGP1 and Figure 4.2 reports results for the nonlinear DGP2.

Figure 4.1: Estimated Number of Groups, J. Linear model, DGP1.

[Figure: three histogram panels, (a) T = 10, (b) T = 20, (c) T = 30. Horizontal axis: number of groups, J (10 to 100). Vertical axis: share. The legend reports the average estimated number of groups: 43 for T = 10, 54 for T = 20 and 60 for T = 30.]

Notes: The figure reports the distribution (and average) of the estimated number of groups using the cross-validation criterion (4.3) for varying sample sizes. All results are based on the linear DGP1.

Figure 4.2: Estimated Number of Groups, J. Nonlinear model, DGP2.

[Figure: three histogram panels, (a) T = 10, (b) T = 20, (c) T = 30. Vertical axis: share. The legend reports the average estimated number of groups: 73 for T = 10, 82 for T = 20 and 85 for T = 30.]

Notes: The figure reports the distribution (and average) of the estimated number of groups using the cross-validation criterion (4.3) for varying sample sizes. All results are based on the nonlinear DGP2.


As our theory suggests [REF to BIAS RATE], when the number of time periods increases, the average optimal number of groups increases. In particular, the average estimated number of groups is 43, 54 and 60 when using 10, 20 and 30 time periods, respectively. While there is significant dispersion across MC runs, the average estimated number of groups is reassuringly close to the 50 groups that delivered almost identical results to the standard FE estimator in Tables 4.1–4.4 above. For the nonlinear model, the number of optimal groups is slightly higher in our setting.

5 An Empirical Application to Danish Data

In this section, we apply our proposed estimator to Danish administrative register data. We estimate by maximum likelihood the canonical buffer-stock consumption model of Deaton (1991, 1992) and Carroll (1992, 1997), assuming that (imputed) consumption is contaminated with mean-one multiplicative log-normal error with variance $\sigma^2_\eta$. Qualitatively similar results from a non-linear least squares (NLLS) estimator and a robust Huber-type estimator, which do not rely on the distributional assumption on the measurement error, are reported in Appendix D in the online supplemental material. Appendix C in the supplemental material also contains a Monte Carlo study of the ability of the proposed estimator to estimate this type of model calibrated to the Danish data.

We first estimate the model under the assumption of fully homogeneous preferences, and then in turn allow for heterogeneity in the discount factor, β. We include in the supplemental material alternative results from letting ρ be heterogeneous. In both cases, we find substantial preference heterogeneity within educational strata, and a clear improvement in the model’s predictive power when allowing for either type of heterogeneity. The results also align well with economic intuition, with e.g. high skilled households being more patient. In a post-estimation analysis, we additionally show that households with no unemployment insurance are estimated to have more mass at lower discount factors (and lower relative risk aversion coefficients). The NPGFE estimation routine converged in less than five minutes, illustrating that the proposed estimator is also applicable to more complex and computationally demanding models.


5.1 Data

We use high quality Danish administrative registers covering the entire population in the period 1987-1996.11 All information is based on third party reports with little additional self-reporting. All self-reporting is moreover subject to possible auditing, giving reliable longitudinal information on household characteristics, assets, liabilities and income.

Household income includes all monetary income net of all taxes, except any income related to ownership of financial assets. Transfers, such as child benefits and unemployment benefits, are also included to ensure that disposable income accurately measures the flow of resources available for consumption. Net wealth consists of stocks, bonds, bank deposits, cars, boats, house value for home owners and mortgage deeds net of total liabilities. The house value is assessed by the tax authorities for tax purposes. Pension wealth is not observed in the registers and thus not included in the wealth measure.

Household consumption is not observed in the registers and is, therefore, imputed using a simple budget approach, $C_t = \tilde Y_t - \Delta A_t$, where $\tilde Y_t = Y_t + r \cdot A_t$ is disposable income, $A_t$ is end-of-period net wealth, r is the real rate of return, and $\Delta A_t$ thus proxies savings. A very similar imputation method is evaluated on Danish data in Browning and Leth-Petersen (2003) and found to produce a reasonable approximation. The resulting consumption measure will, however, include some durables, e.g. home appliances. All variables are deflated with the official consumer price index.
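As a small numerical illustration of the budget approach, the sketch below imputes consumption for one household. It assumes that disposable income is income plus r times end-of-period net wealth and that savings are the year-on-year change in end-of-period net wealth, so the first observed year is lost. The function name and all numbers are hypothetical.

```python
import numpy as np

def impute_consumption(income, wealth, r):
    """Budget-identity imputation C_t = (Y_t + r * A_t) - (A_t - A_{t-1}).

    income, wealth: values over consecutive years for one household.
    The first year is dropped because it has no lagged wealth.
    """
    income = np.asarray(income, dtype=float)
    wealth = np.asarray(wealth, dtype=float)
    disposable = income[1:] + r * wealth[1:]   # disposable income in years 2, 3, ...
    savings = np.diff(wealth)                  # A_t - A_{t-1}
    return disposable - savings

# Three years of (made-up) income and end-of-period net wealth.
consumption = impute_consumption([100.0, 100.0, 100.0], [50.0, 60.0, 55.0], r=0.02)
print(consumption)
```

Any measurement error in the wealth variable enters imputed consumption one-for-one through the savings term, which is one reason the sample is trimmed of extreme observations.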

We restrict attention to stable married or cohabiting couples in which the husband is between age 25 and 59. This is to mitigate issues regarding educational and retirement choices. To increase homogeneity of households, we restrict the spousal age difference to be no more than five years, and require that no one in the household ever becomes self-employed, is out of the labor market, or retires before age 59. To limit the effect of errors in the imputation procedure on our estimates of preference heterogeneity, we trim extreme observations from our sample and require that we have data for at least 5 years.12 In total this leaves us with an unbalanced panel of 317,793 households observed in at most 9 time periods with a total of 2,994,679 household-time observations. Households are classified as high skilled if either member holds at least a bachelor’s degree (86,713 households are denoted as high skilled).

11We begin in 1987 to be able to consistently match individuals into couples, and we end with 1996 because the Danish wealth tax was abolished in this year. Information on, e.g., cars and boats was not collected in subsequent years, leading to a break in the wealth measure from 1996 to 1997.

12Further details on the data are provided in appendix B.

5.2 Model

Our application builds on the incomplete markets model pioneered by Deaton (1991, 1992) and Carroll (1992, 1997). For completeness we give a short description of the model here. We consider unitary households indexed by $i$ with heterogeneous preferences who work for $T_r$ periods, then retire and eventually die at the end of period $T$. The recursive form of the household’s problem is

$$V_{it}(P_{it}, M_{it}) = \max_{C_{it} \geq 0} \frac{C_{it}^{1-\rho}}{1-\rho} + \beta_i E_t\left[V_{it+1}(P_{it+1}, M_{it+1})\right] \tag{5.1}$$

subject to the inter-temporal budget constraint

$$M_{it+1} = R A_{it} + Y_{it} \tag{5.2}$$

$$A_{it} = M_{it} - C_{it} \tag{5.3}$$

where $A_{it}$ is end-of-period assets, $M_{it}$ is beginning-of-period market resources, $Y_{it}$ is income, and $R$ is the gross rate of return. Consumers are allowed to be net-borrowers up to a fraction of their permanent income $P_{it}$. End-of-period wealth thus has to satisfy

$$A_{it} \geq -\lambda_t P_{it}, \quad \lambda_t = \begin{cases} 0 & t \geq T_r \\ \lambda & \text{else} \end{cases} \tag{5.4}$$

where we restrict retirees not to be net borrowers ($\lambda_t = 0$, $t \geq T_r$).

In the beginning of each period, households receive a stochastic income

$$Y_{it} = P_{it}\xi_{it}, \quad \xi \sim \log\mathcal{N}(-0.5\sigma^2_\xi, \sigma^2_\xi) \tag{5.5}$$

$$P_{it} = G_t P_{it-1}\psi_{it}, \quad \psi \sim \log\mathcal{N}(-0.5\sigma^2_\psi, \sigma^2_\psi) \tag{5.6}$$

where $G_t$ is an age-dependent gross growth rate of permanent income, $\psi_{it}$ is a mean-one permanent shock to income, and $\xi_{it}$ is a mean-one transitory shock to income. We assume that income is constant post retirement, $Y_{it} = \kappa P_{iT}$, $t \geq T$, where $\kappa$ is the replacement rate in retirement.
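The income process in (5.5)-(5.6) can be illustrated with a short simulation. The sketch below is only an illustration and not the code used for the paper: the function names `mean_one_lognormal` and `simulate_income`, the constant growth rate and the parameter values are assumptions. Setting the mean of the log shock to $-0.5\sigma^2$ is what makes the shock itself have mean one.

```python
import math
import random


def mean_one_lognormal(rng, sigma):
    """Draw a log-normal shock with E[shock] = 1 (log-mean = -0.5 * sigma^2)."""
    return math.exp(rng.gauss(-0.5 * sigma ** 2, sigma))


def simulate_income(p0, growth, sigma_psi, sigma_xi, periods, seed=0):
    """Simulate (P_t, Y_t) for one household following (5.5)-(5.6).

    growth is a list of gross growth rates G_t, one per simulated period.
    """
    rng = random.Random(seed)
    p = p0
    path = []
    for t in range(periods):
        psi = mean_one_lognormal(rng, sigma_psi)  # permanent shock
        xi = mean_one_lognormal(rng, sigma_xi)    # transitory shock
        p = growth[t] * p * psi                   # P_t = G_t * P_{t-1} * psi_t
        y = p * xi                                # Y_t = P_t * xi_t
        path.append((p, y))
    return path


# Illustrative values only (not the calibration used in the paper).
for p, y in simulate_income(1.0, [1.02] * 5, 0.1, 0.1, 5):
    print(round(p, 3), round(y, 3))
```

With both standard deviations set to zero the simulated path collapses to deterministic growth, which is a quick way to check the implementation.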

We denote the model-implied optimal level of consumption for a household aged $t$ with resources $M_{it}$ and permanent income $P_{it}$ by $C^\star_{it} = C^\star_t(M_{it}, P_{it}; \theta, \beta_i)$ where $\theta = (\rho, R, \lambda, \sigma^2_\xi, \sigma^2_\psi)$. The model is solved using the endogenous grid method (EGM) proposed by Carroll (2006).13

13We use 300 discrete points to approximate the consumption function and 82 Gauss-Hermite quadrature points to approximate expectations with respect to future transitory and permanent income shocks.

5.3 Calibrations

In addition to the parameters of the income process estimated below, we fix several other parameters of the model before turning to estimation. In particular, we choose an interest rate of $R = 1.03$, similar to the long run real return on 10 year Danish government bonds, which over the period 1987-2007 was 3.8 percent. The same interest rate is used in, e.g., Gourinchas and Parker (2002). Based on an informal inspection of the observed consumption behavior of households in debt, we furthermore set the borrowing constraint to be binding at 30 percent of permanent income ($\lambda = 0.30$). Kaplan (2012) estimates an almost identical placement of the credit constraint using the PSID. Finally, we set the replacement rate in retirement to 90 percent, $\kappa = 0.9$, based on The Danish Ministry of Finance (2003) and assume that households retire at age 60 ($T_r = 60$) and die at age 75 ($T = 75$).

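Following the approach in Meghir and Pistaferri (2004), we estimate the transitory and permanent income shock variances for each education group separately using

$$\sigma^2_\psi = \text{cov}\left(\Delta\varepsilon_{it}, \sum_{k=0}^{2} \Delta\varepsilon_{i,t-1+k}\right) \tag{5.7}$$

$$\sigma^2_\xi = -\text{cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t+1}) \tag{5.8}$$

where $\varepsilon_{it}$ is the residual for household $i$ in period $t$ from a regression of log household income on a full set of age dummies, i.e.

$$\log(Y_{it}) = \text{cons} + \sum_{j=25}^{59} \alpha^{age}_j \mathbf{1}_{age_{it}=j} + \varepsilon_{it} \tag{5.9}$$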
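The moment conditions (5.7)-(5.8) can be sketched in a few lines. This is a minimal illustration under assumptions that are not in the text: a balanced panel, residuals already computed, pooling over all households and periods, and hypothetical function names (`covariance`, `shock_variances`, `simulate_residuals`). The simulated residuals are a random walk plus iid noise in logs, for which the two covariances recover the permanent and transitory variances.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def shock_variances(residuals):
    """Estimate (sigma2_psi, sigma2_xi) from per-household residual lists.

    residuals: list of lists [eps_i1, ..., eps_iT] with T >= 4.
    """
    d_now, d_sum, d_next = [], [], []
    for eps in residuals:
        d = [eps[t] - eps[t - 1] for t in range(1, len(eps))]  # first differences
        for t in range(1, len(d) - 1):
            d_now.append(d[t])
            d_sum.append(d[t - 1] + d[t] + d[t + 1])  # sum over k = 0, 1, 2
            d_next.append(d[t + 1])
    sigma2_psi = covariance(d_now, d_sum)    # equation (5.7)
    sigma2_xi = -covariance(d_now, d_next)   # equation (5.8)
    return sigma2_psi, sigma2_xi


def simulate_residuals(n_households, n_periods, sd_psi, sd_xi, seed=0):
    """Hypothetical test data: log permanent random walk plus transitory noise."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_households):
        p, eps = 0.0, []
        for _t in range(n_periods):
            p += rng.gauss(0.0, sd_psi)
            eps.append(p + rng.gauss(0.0, sd_xi))
        panel.append(eps)
    return panel


panel = simulate_residuals(20000, 6, sd_psi=0.06, sd_xi=0.07)
s2_psi, s2_xi = shock_variances(panel)
print(round(s2_psi, 4), round(s2_xi, 4))  # should be close to 0.0036 and 0.0049
```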

The results are reported in table 5.1. The income variances of Danish households are smaller than those typically estimated for the US. As argued in Jørgensen (forthcoming), this is most likely due to i) a generous social welfare system, ii) progressive taxation, iii) a relatively high “minimum wage”, and iv) register data typically being less noisy than the survey data commonly used. We find that high skilled households are subject to both larger transitory shocks and larger permanent shocks.

Table 5.1: Income Shock Variances

                              Low skilled          High skilled
                              Est     (s.e.)       Est     (s.e.)
$\sigma^2_\psi \cdot 10^3$    2.86    (0.05)       3.56    (0.09)
$\sigma^2_\xi \cdot 10^3$     3.20    (0.10)       5.19    (0.24)

Notes: The income shock variances are estimated based on the approach proposed in Meghir and Pistaferri (2004).

The growth in income is estimated by re-arranging the income process such that

$$G_t = \exp\left(\frac{1}{N}\sum_{i=1}^{N} \Delta\log Y_{it} + \frac{1}{2}\sigma^2_\psi\right) \tag{5.10}$$
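The correction term in (5.10) follows from the mean-one normalization of the shocks: since $E[\log\psi_{it}] = -0.5\sigma^2_\psi$ and the transitory shocks have the same expected log level in every period, the expected log income growth is $\log G_t - 0.5\sigma^2_\psi$. A minimal sketch of the point estimate for one age, with a hypothetical function name and illustrative inputs, is:

```python
import math


def gross_growth(dlog_income, sigma2_psi):
    """Point estimate of G_t from (5.10) for one age group.

    dlog_income: list of log income changes across households at that age.
    sigma2_psi: variance of the log permanent shock.
    """
    mean_growth = sum(dlog_income) / len(dlog_income)
    return math.exp(mean_growth + 0.5 * sigma2_psi)


# Illustrative input: log growth consistent with G = 1.03 and the low skilled
# permanent variance from table 5.1 (2.86e-3).
sigma2 = 0.00286
changes = [math.log(1.03) - 0.5 * sigma2] * 4
print(round(gross_growth(changes, sigma2), 4))
```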

A smoothed growth rate $G_t$ is obtained using a third degree polynomial in age. The results are reported in figure 5.1. Permanent income, $P_{it}$, is found by applying the Kalman filter on the time series of log income for each household (the resulting life cycle profile is shown in appendix B in the online supplemental material).

Figure 5.1: Gross Income Growth Rates, Gt. Panels: (a) Low skilled; (b) High skilled. Horizontal axis: age. Series: point estimates and smoothed growth rates.

5.4 A Maximum Likelihood Estimator

We follow the typical assumption that consumption is observed with multiplicative iid log-normal measurement error (G&P?) with mean one and variance $\sigma^2_\eta$, i.e.

$$C^{obs}_{it} = C^\star_{it}\eta_{it}, \quad \log\eta \sim \mathcal{N}(-0.5\sigma^2_\eta, \sigma^2_\eta) \tag{5.11}$$

The mean-corrected log-differences in observed and predicted consumption thus follow a Gaussian distribution, i.e.

$$\varepsilon_{it}(\rho, \beta_i) \equiv c^{obs}_{it} - c^\star_{it} + 0.5\sigma^2_\eta, \quad \varepsilon_{it}(\rho, \beta_i) \sim \mathcal{N}(0, \sigma^2_\eta) \tag{5.12}$$

where lowercase letters denote log-transformed variables, e.g., $c^\star_{it} = \log C^\star_t(M^{obs}_{it}, P^{obs}_{it}; \theta, \beta_i)$.

Note that, for simplicity, we ignore here that permanent income is not observed in the data and is instead an estimate stemming from the Kalman filter. Alternatively, the approach proposed in Jørgensen and Kristensen (2017) could be adopted here as well. We have not pursued that strategy further here.

The mean log likelihood function is then

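$$L(\rho, \sigma_\eta, \{\beta_i\}_{i=1}^N) = \frac{1}{N}\sum_{i=1}^{N} \ell_i(\rho, \sigma_\eta, \beta_i)$$

where $\ell_i(\rho, \sigma_\eta, \beta_i)$ is the log-likelihood contribution associated with household $i$,

$$\ell_i(\rho, \sigma_\eta, \beta_i) = -\frac{1}{2}\left(T_i \log(2\pi\sigma^2_\eta) + \sum_{t=1}^{T_i} \varepsilon_{it}(\rho, \beta_i)^2 \sigma^{-2}_\eta\right)$$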
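The household-level log-likelihood contribution is straightforward to evaluate once model-implied consumption is available. The sketch below assumes that the predicted consumption levels are supplied from elsewhere (e.g. from a model solution); the function names `residuals` and `loglik_contribution` are hypothetical and the inputs are illustrative.

```python
import math


def residuals(c_obs, c_model, sigma_eta):
    """Mean-corrected log differences: log(C_obs) - log(C_model) + 0.5 * sigma_eta^2."""
    return [math.log(co) - math.log(cm) + 0.5 * sigma_eta ** 2
            for co, cm in zip(c_obs, c_model)]


def loglik_contribution(c_obs, c_model, sigma_eta):
    """Gaussian log-likelihood contribution of one household with T_i periods."""
    eps = residuals(c_obs, c_model, sigma_eta)
    t_i = len(eps)
    s2 = sigma_eta ** 2
    return -0.5 * (t_i * math.log(2.0 * math.pi * s2)
                   + sum(e * e for e in eps) / s2)


# Illustrative numbers only: two periods, predicted consumption of one.
print(loglik_contribution([1.1, 0.9], [1.0, 1.0], sigma_eta=0.3))
```

The mean log likelihood is then the average of these contributions over households.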

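We discretize the discount factors, $\{\beta_j\}_{j=1}^J$, into $J$ equally spaced nodes and our FGFE MLE solves

$$(\hat{\rho}, \hat{\sigma}_\eta) = \arg\max_{\rho, \sigma_\eta \in \mathbb{R}_+} L(\rho, \sigma_\eta, \{\beta_{\hat{j}_i}\}_{i=1}^N) \tag{5.13}$$

$$\hat{j}_i(\rho, \sigma_\eta) = \arg\max_{j_i \in \{1, \ldots, J\}} \ell_i(\rho, \sigma_\eta, \beta_{j_i}), \quad \forall i = 1, \ldots, N \tag{5.14}$$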

where the group memberships are estimated through classification maximum likelihood (Bryant and Williamson, 1978).
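The classification step in (5.14) amounts to a grid search over the nodes for each household. The sketch below illustrates only that inner step for fixed homogeneous parameters. It is not the estimation code: `toy_consumption` is a hypothetical stand-in for the model solution $C^\star_t$ (a cake-eating rule in which consumption is a share $1-\beta$ of resources), and the grid and data are made up for the example.

```python
import math


def toy_consumption(m, beta):
    """Hypothetical stand-in for the model-implied consumption function."""
    return (1.0 - beta) * m


def loglik(c_obs, m_obs, beta, sigma_eta):
    """Household log-likelihood for a candidate discount factor."""
    s2 = sigma_eta ** 2
    total = 0.0
    for c, m in zip(c_obs, m_obs):
        eps = math.log(c) - math.log(toy_consumption(m, beta)) + 0.5 * s2
        total += math.log(2.0 * math.pi * s2) + eps * eps / s2
    return -0.5 * total


def classify(households, nodes, sigma_eta):
    """For each household, return the index of the likelihood-maximizing node."""
    return [max(range(len(nodes)),
                key=lambda j: loglik(c, m, nodes[j], sigma_eta))
            for c, m in households]


# 25 equally spaced nodes on [0.75, 0.99] (illustrative grid).
nodes = [0.75 + j * (0.99 - 0.75) / 24 for j in range(25)]
sigma_eta = 0.3
m_obs = [1.0, 1.5, 2.0, 2.5]
households = []
for true_j in (10, 20):
    # Consumption generated at the node, with the measurement error at its log mean.
    c_obs = [toy_consumption(m, nodes[true_j]) * math.exp(-0.5 * sigma_eta ** 2)
             for m in m_obs]
    households.append((c_obs, m_obs))

print(classify(households, nodes, sigma_eta))  # prints [10, 20]
```

The outer step (5.13) would wrap this classification in a numerical optimizer over the homogeneous parameters, re-solving the model for each trial value.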

The MLE of the measurement error variance, $\sigma^2_\eta$, is biased because the estimator does not recognize the reduced degrees of freedom from the estimation of $\beta_i$ and the homogeneous parameters in $\theta$. We, therefore, estimate a re-parameterized parameter, $\sigma^2_\eta = \sigma^2_\eta (N_T - N - (\dim(\theta) - 1))/N_T$, where $\sigma^2_\eta$ is the “standard” biased measurement error variance and $N_T \equiv \sum_{i=1}^{N} T_i$. This parametrization corrects for the dimensionality of the homogeneous parameters and the estimation of the $N$ classification parameters in equation (5.14).

5.5 Estimation Results

The estimation results are presented in table 5.2 for both educational groups.

Columns (1) and (4) report the homogeneous estimates for low and high skilled households. Surprisingly, we here find that low skilled households with a discount factor of 0.966 are much more patient than high skilled households with a discount factor of 0.933. On the other hand, high skilled households have a CRRA coefficient of 7.3, far exceeding the CRRA coefficient of 2.1 for low skilled households. As discussed above in relation to the problems in identifying β and ρ separately, the total saving motive might thus still be stronger for the high skilled than for the low skilled households, due to both the higher risk aversion and the lower inter-temporal elasticity of substitution.

Columns (2) and (5) in table 5.2 report estimation results when β is allowed to be heterogeneous, and figure 5.2 shows the estimated distributions of β. We allow for a large domain with β ∈ [0.75, 1.05] and discretize it into J = 100 bins. We find very little mass

Table 5.2: Estimated Preferences.

                    Low skilled                      High skilled
                    Hom.          FGFE               Hom.          FGFE
                    (1)           (2)                (3)           (4)
β                   0.966         0.961 †            0.938         0.968 †
                    (0.000)       [0.033]            (0.001)       [0.030]
ρ                   2.119         1.490              7.256         1.393
                    (0.080)       {1.969}            (0.065)       {1.772}
                                  (0.001)                          (0.002)
ση                  0.333         0.335              0.352         0.352
                    (0.000)       {0.340}            (0.000)       {0.361}
                                  (0.000)                          (0.001)
L                   −2.502        −1.984             −2.916        −2.355
N                   168315        168315             62057         62057
Obs.                1341582       1341582            490694        490694
J                   −             100                −             100
β ∈                 R+            [0.75, 1.05]       R+            [0.75, 1.05]

Notes: Robust asymptotic standard errors in brackets. Clustered on the individual level. Split-panel jackknife bias reduced estimates reported in curly brackets (Dhaene and Jochmans, forthcoming).
† Reported are the estimated mean of the respective heterogeneous distribution with the standard deviation of the distribution in square brackets.
‡ The number of nodes refers here to the number of points in the interpolation object. The household-specific estimates are allowed to be continuous in the domain.

on the boundary, indicating that our chosen domain is large enough. In contrast to the results for the homogeneous specification, we now find that the high skilled are somewhat more patient than the low skilled; the mean of the distribution for the high skilled is 0.968, while it is 0.961 for the low skilled. In figure 5.2 we see that the distribution of discount factors is shifted to the right for the high skilled compared to the low skilled, while the shapes of the distributions are quite similar across the two educational groups. Both distributions are relatively symmetric, but with a somewhat fat left tail.

The CRRA coefficients are found to be relatively low and similar across the educational groups; ρ = 1.49 for low skilled, and ρ = 1.39 for high skilled. The estimates increase to around 1.97 and 1.77 for low and high skilled, respectively, when applying the split-panel jackknife bias reduction approach proposed in Dhaene and Jochmans (forthcoming). The estimated homogeneous parameters and the means of heterogeneous distributions are thus in line with existing estimates. See e.g. Cagetti (2003); Gourinchas and Parker (2002) and Alan, Attanasio and Browning (2009).

Looking at the improvement in the log-likelihood function when allowing β to be heterogeneous compared to the homogeneous case, we can calculate log-likelihood ratios of 174,374 and 69,628 for low and high skilled, respectively. Assuming that these LR-statistics are $\chi^2_{N-1}$ with $N - 1$ degrees of freedom ($N$ classification parameters versus 1 homogeneous parameter), we get p-values of zero, suggesting that the estimated heterogeneity is significant both economically and statistically.14
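The reported statistics can be reproduced from table 5.2. Because L in the table is a mean over households, the likelihood ratio is twice the difference in mean log likelihoods multiplied by N. The sketch below does this arithmetic; the z-score is an added rough check (an assumption, not part of the text) based on the mean and variance of a chi-square distribution, used here instead of an exact p-value.

```python
import math


def lr_statistic(n, mean_loglik_hom, mean_loglik_het):
    """Likelihood ratio: 2 * N * (mean log-lik. heterogeneous - homogeneous)."""
    return 2.0 * n * (mean_loglik_het - mean_loglik_hom)


def normal_approx_z(lr, dof):
    """Rough z-score: a chi-square with dof degrees has mean dof, variance 2*dof."""
    return (lr - dof) / math.sqrt(2.0 * dof)


lr_low = lr_statistic(168315, -2.502, -1.984)
lr_high = lr_statistic(62057, -2.916, -2.355)
print(round(lr_low), round(lr_high))  # prints 174374 69628

# Both z-scores are far in the right tail, consistent with p-values of zero.
print(normal_approx_z(lr_low, 168315 - 1), normal_approx_z(lr_high, 62057 - 1))
```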

Figure 5.2: Estimated β Distributions, FGFE. Panels: (a) Probability distribution; (b) Cumulative distribution. Horizontal axis: β from 0.75 to 1.05. Series: low skilled and high skilled.

Notes: The figure reports the estimated share of the Danish estimation sample in each of the J fixed groups as the empirical PDF in the left panel and the CDF in the right panel.

Although we find evidence of substantial preference heterogeneity, our estimated distributions, especially of discount factors, are narrower than those found in Alan and Browning (2010), which is the only comparable study using observational data. In the case of β, they find a spread between the 90th and 10th percentile of 0.143 for the low skilled and 0.134 for the high skilled. We estimate spreads of 0.050 and 0.046, respectively.

Our results are broadly in line with experimental evidence from Denmark. In particular, the distribution of β seems consistent with the evidence reported in Andersen, Harrison, Lau and Rutström (2010). In general it should be noted that the dispersion of estimates found in the experimental and survey-based literature is large15, but that our results are probably in the lower end in terms of the estimated degree of heterogeneity.

14This is not a formal test. The χ2 distribution and the degrees of freedom are at best an approximation. Furthermore, for the “test” to be nested, we should include the estimated homogeneous β as a node in the discrete domain when estimating the distribution of discount factors. Since we view this “test” as an informal assessment of the importance of heterogeneity, we have not pursued this any further.

15See footnote 1 for references.

5.6 Preferences and Unemployment Insurance

An interesting feature of the proposed estimator is that it explicitly groups households according to their (estimated) preferences. We can thus easily compare the distributions of preferences across groups of households divided according to various characteristics.16 In particular, we observe whether anyone in the household has unemployment insurance. In order to construct a time invariant grouping, we denote households as having no unemployment insurance if they have never been observed as having unemployment insurance, and as having unemployment insurance otherwise.

Figure 5.3 shows the estimated cumulative distributions of discount factors for low and high skilled households divided by unemployment insurance take up. The group without any unemployment insurance is relatively small, and the resulting plots are noisy, but we see a noticeable difference in the discount factor distributions across the two groups, especially for high skilled households. Specifically, as economic intuition would suggest, we find that households who have never had unemployment insurance are associated with a relatively lower valuation of the future compared to the group of households in which at least one member has had unemployment insurance at some point.

Figure 5.3: Estimated Preference Distributions for Sub-Groups. Panels: (a) Low skilled; (b) High skilled. Horizontal axis: β from 0.75 to 1.05.

Notes: The figure reports the estimated share of the Danish estimation sample in each of the J fixed groups as the empirical PDF in the left panel and the CDF in the right panel. Results are split based on whether at least one household member has ever been observed to have unemployment insurance or not.

While the reported correlations do not have a causal interpretation and we have ignored estimation uncertainty related to the group membership of each household for convenience, this analysis shows how the proposed estimator can be used to investigate ex ante preference heterogeneity across household characteristics and produce meaningful results.

16While the estimated preferences are associated with estimation uncertainty, we abstract from that important point here.

• Update the figure to include legend!

6 Discussion

Our non-parametric grouped fixed-effects estimator was shown to offer a computationally simple and efficient approach to estimate dynamic economic models with unrestricted heterogeneity from panel data on observed choices. In a Monte Carlo study we showed that it has good finite sample properties, and specifically that it can uncover the distribution of preference heterogeneity from consumption choices using a standard life-cycle consumption model. The estimator’s empirical applicability was shown by estimating a similar life-cycle consumption saving model on Danish administrative data allowing for heterogeneous time preferences and/or heterogeneous CRRA coefficients. These results indicated a large degree of preference heterogeneity, where differences across education groups, for example, aligned well with economic intuition.

Interesting avenues for future work include both an investigation of the asymptotic properties of the estimator and applications of the estimator to more complex dynamic economic models with multiple continuous and discrete choices. Building on the current application, it would, for example, be interesting to estimate a more general life-cycle saving model with non-parametric heterogeneity affecting not just consumption choices, but also portfolio and retirement choices.

A Model Details

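Proposition 1. The optimal end-of-period asset choice satisfies

$$A_t \geq \underline{A}_t = \begin{cases} 0 & \text{if } t = T \\ -\min\{\Omega_t, \lambda_t\} \cdot \Gamma_t & \text{if } t < T \end{cases}$$

where

$$\Lambda_t \equiv \begin{cases} R^{-1} \cdot \Gamma_t \cdot \xi & \text{if } t = T - 1 \\ R^{-1} \cdot \left[\min\{\Lambda_{T-1}, \lambda\} + \xi\right] \cdot \Gamma_t & \text{if } t < T - 1 \end{cases}$$

$$\Gamma_t \equiv G_t \cdot \psi$$

Proof. Let $E_t[\bullet]$ denote the worst-case expectation operator given information $t$. Note that any $M_T \leq 0$ implies that the household cannot choose a $C_t > 0$ such that $A_t \leq 0$. Consequently

$$\lim_{M_t \to 0} V_T(\bullet, M_T) = \lim_{C_t \to 0} \frac{C_t^{1-\rho}}{1-\rho} = -\infty$$

which the household wants to avoid at any cost. Therefore we have

$$E_{T-1}[M_T - A_T] > 0 \leftrightarrow$$
$$E_{T-1}[R \cdot A_{T-1} + Y_T] > 0 \leftrightarrow$$
$$R \cdot A_{T-1} + \Gamma_T \cdot \xi \cdot P_{T-1} > 0 \leftrightarrow$$
$$A_{T-1} > -R^{-1} \cdot \Gamma_T \cdot \xi \cdot P_{T-1}$$

Combining this with the exogenous borrowing constraint we get

$$A_{T-1} > -\min\{\Lambda_{T-1}, \lambda\} \cdot P_{T-1}$$

Similar arguments further imply

$$E_{T-2}[M_{T-1} - \min\{\Lambda_{T-1}, \lambda\} \cdot P_{T-1}] > 0 \leftrightarrow$$
$$E_{T-2}[R \cdot A_{T-2} + Y_{T-1}] > -\min\{\Lambda_{T-1}, \lambda\} \cdot E_{T-2}[P_{T-1}] \leftrightarrow$$
$$R \cdot A_{T-2} + G_{T-1} \cdot \psi \cdot \xi \cdot P_{T-2} > -\min\{\Lambda_{T-1}, \lambda\} \cdot G_{T-1} \cdot \psi \cdot P_{T-2} \leftrightarrow$$
$$A_{T-2} > -\underbrace{R^{-1} \cdot \left[\min\{\Lambda_{T-1}, \lambda\} + \xi\right] \cdot \Gamma_{T-1}}_{=\Lambda_{T-2}} \cdot P_{T-2}.$$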

B Data

B.1 Income Definitions

In the Danish income registers, we have the following income variables:

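$$\underbrace{\texttt{DISPON\_NY}}_{\text{disposable income}} = \texttt{SAMLINK\_NY} - \underbrace{\texttt{SKATMVIALT\_NY}}_{\text{taxes}} - \underbrace{\texttt{QRENTUD2}}_{\text{interest payments}} - \underbrace{\texttt{UNDERHOL} + \texttt{TBKONTHJ}}_{\text{alimony + returned benefits}}$$

$$\underbrace{\texttt{SAMLINK\_NY}}_{\text{total income}} = \texttt{PERINDKIALT} + \underbrace{\texttt{OVSKEJD02\_NY} + \texttt{OVERSKEJD07}}_{\text{imputed rental value}}$$

$$\underbrace{\texttt{PERINDKIALT}}_{\text{total monetary income}} = \underbrace{\texttt{RENTEINDK}}_{\text{interest income}} + \underbrace{\texttt{PEROEVRIGFORMUE}}_{\text{other property income}} + \underbrace{\texttt{ERHVERVSINDK(\_GL)}}_{\text{wages and profits}} + \underbrace{\texttt{OVERFORSINDK}}_{\text{public transfers}} + \underbrace{\texttt{RESUINK(\_GL)}}_{\text{other income}}$$

We define

$$Y^{gross}_{it} \equiv \texttt{PERINDKIALT}$$
$$Y^{asset}_{it} \equiv \texttt{RENTEINDK} + \texttt{PEROEVRIGFORMUE}$$
$$Y^{nonasset}_{it} \equiv \texttt{PERINDKIALT} - Y^{asset}_{it}$$
$$Y^{transfers}_{it} \equiv \texttt{OVERFORSINDK}$$
$$\varsigma_{it} \equiv \texttt{SKATMVIALT\_NY}$$
$$Y^{nom}_{it} \equiv \begin{cases} Y^{gross}_{it} - \varsigma_{it} & \text{if } \left|\frac{Y^{asset}_{it}}{Y^{gross}_{it}}\right| < 0.1 \\ (1 - \tau_{it}) \cdot Y^{nonasset}_{it} & \text{else} \end{cases}$$

where $i$ is for a couple, $t$ is for observation year, and $Y^{nom}_{it}$ is after-tax monetary income from all sources, except financial assets. To approximate the after tax earnings of households with substantial income from financial assets, we use the tax rate $\tau_{it} \equiv \varsigma_{it} / Y^{gross}_{it}$ of households without substantial income from financial assets, but with a similar level of non-asset income (specifically we use twenty bins of $Y^{nonasset}_{it}$).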
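The case distinction in the definition of after-tax income can be written as a small function. This is a sketch under assumptions: the function name and arguments are hypothetical, gross income is assumed to be non-zero, and the tax rate of comparable households (`tau_similar`) is taken as given rather than computed from the twenty bins of non-asset income.

```python
def after_tax_income(y_gross, y_asset, taxes, tau_similar, threshold=0.1):
    """After-tax monetary income excluding income from financial assets.

    y_gross: total monetary income (assumed non-zero).
    y_asset: interest income plus other property income.
    taxes: total taxes paid.
    tau_similar: average tax rate of households with little asset income and
        a similar level of non-asset income (supplied by the caller here).
    """
    if abs(y_asset / y_gross) < threshold:
        # Little asset income: subtract actual taxes from gross income.
        return y_gross - taxes
    # Substantial asset income: tax non-asset income at the comparable rate.
    return (1.0 - tau_similar) * (y_gross - y_asset)


print(after_tax_income(100.0, 5.0, 30.0, tau_similar=0.3))    # prints 70.0
print(after_tax_income(100.0, 40.0, 30.0, tau_similar=0.25))  # prints 45.0
```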

B.2 Data Construction

We construct our variables as follows:

1. Couples are constructed using EFALLE (from BEF) (before 1987 we only have C_FAELLE_ID from FAIN).

2. Birth year and gender are based on FOED_DAG and KOEN (from BEF) or, if not available, ALDER and KOEN (from FAIN). Couple age is the age of the male.

3. Wealth $A^{nom}_{it}$ is the total net wealth excluding pensions (FORM and FORMREST_NY05 (after 1996) from INDH) adjusted upwards with 10 percent of the value of any owned properties $H^{nom}_{ikt}$ (KOEJD or if missing EJENDOMSVURDERING from INDH).

4. Self-Employment is coded as PSTILL≤ 20 (from IDAP).

5. Not in the labor market is coded as PSTILL= 90 (from IDAP).

6. Retirement is coded as PSTILL in 50, 55, 92, 93, 94 (from IDAP).

7. Student is coded as PSTILL = 91 (from IDAP).

8. A couple is coded as high-skilled if at least one of them has ≥ 180 months of education (using HFPRIA from UDDA); otherwise it is coded as low-skilled.

We additionally calculate nominal cash-on-hand and imputed consumption as

$$M^{nom}_{it} \equiv R \cdot A^{nom}_{i,t-1} + Y^{nom}_{it} \tag{B.1}$$

$$C^{nom}_{it} \equiv M^{nom}_{it} - A^{nom}_{it} \tag{B.2}$$

All variables are subsequently deflated with the consumer price index.
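The two accounting identities (B.1)-(B.2) translate directly into code. The sketch below uses a hypothetical function name and made-up numbers, and leaves out the deflation step.

```python
def impute_consumption(assets, income, gross_return):
    """Cash-on-hand and imputed consumption from (B.1)-(B.2).

    assets: [A_0, A_1, ..., A_T], end-of-period nominal net wealth.
    income: [Y_1, ..., Y_T], nominal after-tax income.
    Returns a list of (M_t, C_t) for t = 1, ..., T.
    """
    out = []
    for t in range(1, len(assets)):
        m = gross_return * assets[t - 1] + income[t - 1]  # (B.1)
        c = m - assets[t]                                  # (B.2)
        out.append((m, c))
    return out


# Made-up household: wealth rises, then falls.
for m, c in impute_consumption([100.0, 110.0, 105.0], [50.0, 50.0], 1.03):
    print(round(m, 2), round(c, 2))
```

Any error in measured wealth carries over one-for-one to imputed consumption, which is why the sample selection below trims extreme observations.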

B.3 Sample Selection

We use the following iterative selection criteria:

1. Our baseline sample is all unique couples, where the male is older than 18 and is in the income registers at some point between 1987 and 1996 (both included).

2. Both partners are between age 25 and 59 (both included).

3. The age difference is not larger than 5 years.

4. Neither of them is ever self-employed or not in the labor market (see definition in sub-section B.2).

5. No information is used when or before any of them are students (see definition in sub-section B.2).

6. Neither of them retires before age 59 (see definition in sub-section B.2).

7. Education information is not missing for both partners.

8. We remove all households with fewer than 5 observations satisfying:

(a) $M_{it}/Y_{it}$, $C_{it}/Y_{it}$, $A_{it}/Y^{raw}_{it}$ and $Y_{it}$ are not below the 1st percentile or above the 99th percentile by age-year bins.

(b) $m_{it} \equiv M_{it}/P_{it} \geq -\lambda$

(c) $a_{it} \equiv A_{it}/P_{it} \geq -\lambda$

(d) $c_{it} \equiv C_{it}/P_{it} < 0.3$

Additionally we do not use information for any of the periods where the above requirements are not satisfied.

Table B.1 shows how the sample size is affected by these choices.

Table B.1: Sample Selection

                                             Unique Couples    Observations
1. Baseline                                  1,935,069         12,869,391
2. Age between 25 and 59                     1,142,433         8,542,785
3. Age difference ≤ 5 years                  1,040,074         6,862,197
4. Never self-employed                       657,926           4,207,377
5. Not students                              626,302           4,117,788
6. Not retired before age 59                 624,944           3,990,007
7. Education information not missing         617,334           3,951,504
8. ≥ 5 “non-extreme” observations            230,372           1,832,276
   hereof high-skilled                       62,057            490,694

B.4 Life Cycle Profiles

In order to calculate life-cycle profiles, we need to detrend across cohorts. We do so in two steps. First we run the following regression separately for each education group

$$\log(Y_{it}) = \text{cons} + \sum_{j=25}^{59} \alpha^{age}_j \mathbf{1}_{age_{it}=j} + \sum_{k=1987}^{1996} \alpha^{year}_k \mathbf{1}_{year_{it}=k} + \varepsilon_{it} \tag{B.3}$$

Secondly, the education specific trend growth rate of income is derived as the constant from a regression of the first-differenced year dummy coefficients on no covariates, i.e.

$$\Delta\alpha^{year}_t = (G - 1) + \varepsilon_t \tag{B.4}$$

Finally all monetary variables are detrended relative to a 25 year old in 1996 by dividing through by the factor $G^{birthyear_{it}-1996-25}$, and normalized by subsequently dividing through by the mean income of an unskilled household of age 25.

Figures B.1-B.4 show the resulting life-cycle profiles.

Figure B.1: Life Cycle Profiles - Yt. Panels: (a) Low Skilled - Percentiles (10th, 25th, 50th, 75th, 90th); (b) High Skilled - Percentiles; (c) Low Skilled - Mean by Birthyear; (d) High Skilled - Mean by Birthyear. Horizontal axis: age.

Figure B.2: Life Cycle Profiles - At. Panels: (a) Low Skilled - Percentiles (10th, 25th, 50th, 75th, 90th); (c) Low Skilled - Mean by Birthyear; (d) High Skilled - Mean by Birthyear. Horizontal axis: age.

Figure B.3: Life Cycle Profiles - mt. Panels: (a) Low Skilled - Percentiles. Horizontal axis: age.

Figure B.4: Life Cycle Profiles - ct. Panels: (a) Low Skilled - Percentiles. Horizontal axis: age.

C A Monte Carlo Study: Buffer Stock Model

In this section, we investigate the finite sample properties of our proposed estimator applied to estimate the model of interest in the application in Section 5.

In each of the 50 Monte Carlo runs conducted here, we first simulate $N$ households for 35 periods (from age 25 through 59) with household-specific discount factors $\beta_i$ and initial draws of wealth and permanent income from log-normal distributions with, respectively, means of 0.1 and 1 and variances of 0.2 and 0.4. We draw individual discount factors from a normal distribution with mean 0.98 and standard deviation 0.02 and truncate the distribution such that $\beta_i \in [0.8, 1.1]$ for all $i$. In our simulations, no observations were truncated. When estimating, we fix the domain to be $\beta \in [0.8, 1.1]$ and define the finite support as $\mathcal{B}_J = \{0.8 + j(1.1 - 0.8)/J\}_{j=0}^{J-1}$.

We calibrate the model using the fairly standard values reported in table C.1. The estimation sample is then constructed by randomly picking $T$ adjacent periods for each household to be used. Consumption is finally multiplied with random draws of log-normal measurement error as in equation (5.11). We use this simulated data to estimate $(\rho, \sigma_\eta, j)$ solving (5.13)–(5.14) for each MC run.
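The finite support and the truncated draws of discount factors can be generated as follows. This is an illustrative sketch: the function names are hypothetical, and redrawing values outside the interval is one assumed way of truncating (with mean 0.98 and standard deviation 0.02 a redraw is essentially never triggered, in line with the remark that no observations were truncated).

```python
import random


def beta_grid(lower, upper, n_nodes):
    """Finite support {lower + j * (upper - lower) / J : j = 0, ..., J - 1}."""
    return [lower + j * (upper - lower) / n_nodes for j in range(n_nodes)]


def draw_betas(n, mean, std, lower, upper, seed=0):
    """Draw discount factors from a normal, redrawing values outside the bounds."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        b = rng.gauss(mean, std)
        if lower <= b <= upper:
            draws.append(b)
    return draws


grid = beta_grid(0.8, 1.1, 50)
betas = draw_betas(1000, mean=0.98, std=0.02, lower=0.8, upper=1.1)
print(len(grid), grid[0], round(grid[1] - grid[0], 4))
print(round(min(betas), 3), round(max(betas), 3))
```

Note that the support as defined excludes the upper end point 1.1, since the index runs from 0 to J − 1.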

Table C.1: Calibrated Parameters for Monte Carlo Results.

ρ      κ      σξ     σψ     Gt      R       λ
2.5    0.5    0.1    0.1    1.02    1.04    0.3

Tables C.2 and C.3 report MC results when using $J \in \{50, 100, 200\}$ nodes to approximate the true continuous distribution of preferences. Results for $N = 100{,}000$, $T \in \{10, 30\}$, and $\sigma_\eta = 0.10$ are reported. We estimate, besides the heterogeneous discount factors, the CRRA coefficient, ρ, using a nonlinear least squares criterion,

$$\hat{\rho} = \arg\min_{\rho > 0} \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} \left(C_{it}/C^\star_t(M_{it}, P_{it}; \hat{\beta}_i(\rho), \rho) - 1\right)^2$$

$$\hat{\beta}_i(\rho) = \arg\min_{\beta_i \in \mathcal{B}_J} \sum_{t=1}^{T} \left(C_{it}/C^\star_t(M_{it}, P_{it}; \beta_i, \rho) - 1\right)^2$$

using the fact that measurement error has mean one.

Table C.2: Monte Carlo Results: ρ, Buffer Stock Model.

              Avg. Abs. Bias           MC Std.
              (1)        (2)           (3)        (4)
T = 10
  J = 50      1.0578     0.6503        0.0428     0.0923
  J = 100     0.5044     0.2598        0.0416     0.0880
  J = 200     0.2544     0.2648        0.0495     0.0758
T = 30
  J = 50      0.4059     0.0759        0.0136     0.0347
  J = 100     0.2385     0.1130        0.0111     0.0224
  J = 200     0.1873     0.1580        0.0101     0.0215

Notes: Columns 1 and 2 report the average absolute bias of the baseline and the bias-reduced estimates of ρ across the Monte Carlo replications. Columns 3 and 4 report the standard deviation across the replications. All results are for the Buffer Stock model.

Table C.3: Monte Carlo Results: βi, Buffer Stock Model.

              Avg. MSE                 Avg. MC Std.
              (1)        (2)           (3)        (4)
T = 10
  J = 50      0.0003     0.0002        0.0113     0.0099
  J = 100     0.0001     0.0001        0.0092     0.0085
  J = 200     0.0001     0.0001        0.0084     0.0085
T = 30
  J = 50      0.0000     0.0000        0.0048     0.0043
  J = 100     0.0000     0.0000        0.0042     0.0040
  J = 200     0.0000     0.0000        0.0041     0.0040

Notes: Columns 1 and 2 report the average mean square error of the baseline and the bias-reduced estimates of $\{\beta_i\}_1^N$ across the Monte Carlo replications. Columns 3 and 4 report the average standard deviation across the replications. All results are for the Buffer Stock model.

D Results from Alternative Estimators [UPDATE]

We here report in Table [REF] the estimation results from alternative estimators. In particular, we show estimated parameters from a non-linear least squares (NLLS) estimator,

$$L(\theta, \{\gamma_i\}_{i=1}^N) = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T_i} \varepsilon_{it}(\theta, \gamma_i)^2$$

where

$$\varepsilon_{it}(\theta, \gamma_i) = C^{obs}_{it}/C^\star_{it} - 1$$

and a pseudo Huber loss function,

$$L(\theta, \{\gamma_i\}_{i=1}^N) = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T_i} \delta^2\left(\sqrt{1 + (\varepsilon_{it}(\theta, \gamma_i)/\delta)^2} - 1\right)$$

with δ = 0.1 being a dampening parameter such that the estimates are more robust to outliers.
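The difference between the two criteria is easiest to see by evaluating the per-observation losses directly. In the sketch below (function names are hypothetical), the pseudo Huber loss behaves like half the squared residual for small residuals and grows roughly linearly, at rate δ, for large ones, which is what limits the influence of outliers.

```python
import math


def squared_loss(eps):
    """Per-observation NLLS loss."""
    return eps * eps


def pseudo_huber(eps, delta=0.1):
    """Per-observation pseudo Huber loss with dampening parameter delta."""
    return delta ** 2 * (math.sqrt(1.0 + (eps / delta) ** 2) - 1.0)


for eps in (0.001, 0.1, 1.0, 10.0):
    print(eps, squared_loss(eps), pseudo_huber(eps))
```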

References

Ackerberg, D. A. (2009): “A new use of importance sampling to reduce computational burden in simulation estimation,” Quantitative Marketing and Economics, 7(4), 343–376.

Alan, S., O. Attanasio and M. Browning (2009): “Estimating Euler equations with noisy data: two exact GMM estimators,” Journal of Applied Econometrics, 24(2), 309–324.

Alan, S. and M. Browning (2010): “Estimating Intertemporal Allocation Parameters using Synthetic Residual Estimation,” The Review of Economic Studies, 77(4), 1231–1261.

Alan, S., M. Browning and M. Ejrnæs (2014): “Income and Consumption: a Micro Semi-structural Analysis with Pervasive Heterogeneity,” .

Andersen, S., G. W. Harrison, M. I. Lau and E. E. Rutström (2008): “Eliciting Risk and Time Preferences,” Econometrica, 76(3), 583–618.

Andersen, S., G. W. Harrison, M. I. Lau and E. E. Rutström (2010): “Preference heterogeneity in experiments: Comparing the field and laboratory,” Journal of Economic Behavior & Organization, 73(2), 209–224.

Ando, T. and J. Bai (2016): “Panel Data Models with Grouped Factor Structure Under Unknown Group Membership,” Journal of Applied Econometrics, 31(1), 163–191.

Andreoni, J. and C. Sprenger (2012): “Risk Preferences Are Not Time Preferences,” The American Economic Review, 102(7), 3357–3376.

Bai, J. and S. Ng (2002): “Determining the Number of Factors in Approximate Factor Models,” Econometrica, 70(1), 191–221.

Bajari, P., J. T. Fox and S. P. Ryan (2007): “Linear Regression Estimation of Discrete Choice Models with Nonparametric Distributions of Random Coefficients,” The American Economic Review, 97(2), 459–463.

Barsky, R. B., F. T. Juster, M. S. Kimball and M. D. Shapiro (1997): “Preference Parameters and Behavioral Heterogeneity: An Experimental Approach in the Health and Retirement Study,” The Quarterly Journal of Economics, 112(2), 537–579.

Beetsma, R. M. W. J. and P. C. Schotman (2001): “Measuring Risk Attitudes in a Natural Experiment: Data from the Television Game Show Lingo,” The Economic Journal, 111(474), 821–848.

Bester, C. A. and C. B. Hansen (forthcoming): “Grouped effects estimators in fixed effects models,” Journal of Econometrics.

Bonhomme, S. and E. Manresa (2015): “Grouped Patterns of Heterogeneity in Panel Data,” Econometrica, 83(3), 1147–1184.

Browning, M. and S. Leth-Petersen (2003): “Imputing consumption from income and wealth information,” The Economic Journal, 113(488), F282–F301.

Bryant, P. and J. A. Williamson (1978): “Asymptotic Behaviour of Classification Maximum Likelihood Estimates,” Biometrika, 65(2), 273–281.

Cagetti, M. (2003): “Wealth Accumulation Over the Life Cycle and Precautionary Savings,” Journal of Business & Economic Statistics, 21(3), 339–353.

Cagetti, M. and M. De Nardi (2008): “Wealth Inequality: Data and Models,” Macroeconomic Dynamics, 12(S2), 285–313.

Carroll, C. D. (1992): “The buffer-stock theory of saving: Some macroeconomic evidence,” Brookings Papers on Economic Activity, 2, 61–156.

Carroll, C. D. (1997): “Buffer-Stock Saving and the Life Cycle/Permanent Income Hypothesis,” The Quarterly Journal of Economics, 112(1), 1–55.

Carroll, C. D. (2006): “The method of endogenous gridpoints for solving dynamic stochastic optimization problems,” Economics Letters, 91(3), 312–320.

Carroll, C. D., J. Slacalek and K. Tokuoka (2014): “The Distribution of Wealth and the Marginal Propensity to Consume,” .

Cozzi, M. (2014): “Risk aversion heterogeneity, risky jobs and wealth inequality,” Queen’s Economics Department Working Paper, No. 1286.

De Nardi, M. (2015): “Quantitative Models of Wealth Inequality: A Survey,” NBER Working Paper 21106.

Deaton, A. (1991): “Saving and liquidity constraints,” Econometrica, 59(5), 1221–1248.

Deaton, A. (1992): Understanding Consumption. Oxford University Press.

Dempster, A. P., N. M. Laird and D. B. Rubin (1977): “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.

Dhaene, G. and K. Jochmans (forthcoming): “Split-panel jackknife estimation of fixed-effect models,” Review of Economic Studies.

Dohmen, T., A. Falk, D. Huffman, U. Sunde, J. Schupp and G. G. Wagner (2011): “Individual Risk Attitudes: Measurement, Determinants, and Behavioral Consequences,” Journal of the European Economic Association, 9(3), 522–550.

Farhi, E. and I. Werning (2012): “Capital taxation: Quantitative explorations of the inverse Euler equation,” Journal of Political Economy, 120(3), 398–445.

Fernández-Villaverde, J., J. F. Rubio-Ramírez and M. S. Santos (2006): “Convergence properties of the likelihood of computed dynamic models,” Econometrica, 74(1), 93–119.

Finke, M. S. and S. J. Huston (2013): “Time preference and the importance of saving for retirement,” Journal of Economic Behavior & Organization, 89, 23–34.

Fox, J. T., K. i. Kim, S. P. Ryan and P. Bajari (2011): “A simple estimator for the distribution of random coefficients,” Quantitative Economics, 2(3), 381–418.

Fox, J. T., K. i. Kim and C. Yang (2015): “A simple nonparametric approach to estimating the distribution of random coefficients in structural models,” Discussion paper.

Gârleanu, N. and S. Panageas (2015): “Young, Old, Conservative, and Bold: The Implications of Heterogeneity and Finite Lives for Asset Pricing,” Journal of Political Economy, 123(3), 670–685.

Gourinchas, P.-O. and J. A. Parker (2002): “Consumption over the life cycle,” Econometrica, 70(1), 47–89.

Guiso, L. and M. Paiella (2008): “Risk Aversion, Wealth, and Background Risk,” Journal of the European Economic Association, 6(6), 1109–1150.

Guvenen, F. (2006): “Reconciling conflicting evidence on the elasticity of intertemporal substitution: A macroeconomic perspective,” Journal of Monetary Economics, 53(7), 1451–1472.

Guvenen, F. (2009): “A Parsimonious Macroeconomic Model for Asset Pricing,” Econometrica, 77(6), 1711–1750.

Hahn, J. and H. R. Moon (2010): “Panel Data Models with Finite Number of Multiple Equilibria,” Econometric Theory, 26(3), 863–881.

Hahn, J. and W. K. Newey (2004): “Jackknife and analytical bias reduction for nonlinear panel models,” Econometrica, 72, 1295–1319.

Heckman, J. (1981): “The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time–Discrete Data Stochastic Process,” in Structural Analysis of Discrete Panel Data with Econometric Applications, ed. by C. F. Manski and D. McFadden, pp. 179–195. Cambridge, MA: MIT Press.

Heckman, J. and B. Singer (1984): “A method for minimizing the impact of distributional assumptions in econometric models for duration data,” Econometrica: Journal of the Econometric Society, pp. 271–320.

Hendricks, L. (2007): “How important is discount rate heterogeneity for wealth inequality?,” Journal of Economic Dynamics and Control, 31(9), 3042–3068.

Holt, C. A. and S. K. Laury (2005): “Risk Aversion and Incentive Effects: New Data without Order Effects,” The American Economic Review, 95(3), 902–904.

Jørgensen, T. H. (forthcoming): “Life-Cycle Consumption and Children: Evidence from a Structural Estimation,” Oxford Bulletin of Economics and Statistics.

Jørgensen, T. H. and D. Kristensen (2017): “Simple Estimation of Microeconometric Models with Latent Dynamic Variables,” unpublished working paper, University College London.

Kamakura, W. A. (1991): “Estimating flexible distributions of ideal-points with external analysis of preferences,” Psychometrika, 56(3), 419–431.

Kaplan, G. (2012): “Inequality and the life cycle,” Quantitative Economics, 3.

Kimball, M. S., C. R. Sahm and M. D. Shapiro (2008): “Imputing risk tolerance from survey responses,” Journal of the American Statistical Association, 103(483), 1028–1038.

Kimball, M. S., C. R. Sahm and M. D. Shapiro (2009): “Risk Preferences in the PSID: Individual Imputations and Family Covariation,” American Economic Review, 99(2), 363–68.

Kocherlakota, N. R. (2010): The New Dynamic Public Finance. Princeton University Press.

Krusell, P. and A. A. Smith (1997): “Income and wealth heterogeneity, portfolio choice, and equilibrium asset returns,” Macroeconomic Dynamics, 1(02), 387–422.

Krusell, P. and A. A. Smith (1998): “Income and wealth heterogeneity in the macroeconomy,” Journal of Political Economy, 106(5), 867–896.

Lin, C.-C. and S. Ng (2012): “Estimation of Panel Data Models with Parameter Heterogeneity when Group Membership is Unknown,” Journal of Econometric Methods, 1(1), 42–55.

Meghir, C. and L. Pistaferri (2004): “Income variance dynamics and heterogeneity,” Econometrica, 72(1), 1–32.

Milligan, G. W. and M. C. Cooper (1985): “An examination of procedures for determining the number of clusters in a data set,” Psychometrika, 50(2), 159–179.

Nevo, A., J. L. Turner and J. W. Williams (forthcoming): “Usage-Based Pricing and Demand for Residential Broadband,” Econometrica.

Pilla, R. S. and B. G. Lindsay (2001): “Alternative EM methods for nonparametric finite mixture models,” Biometrika, 88(2), 535–550.

The Danish Ministry of Finance (2003): Ældres sociale vilkår. (In Danish).
