Maximum Simulated Likelihood and Expectation-Maximization Methods to Estimate Random Coefficients Logit with Panel Data

Elisabetta Cherchi and Cristian Angelo Guevara

E. Cherchi, Department of Transport, Technical University of Denmark, Bygningstorvet 116 B, 2800 Kgs. Lyngby, Denmark. C. A. Guevara, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de los Andes, San Carlos de Apoquindo 2200, Santiago, Chile. Corresponding author: C. A. Guevara, [email protected].

Transportation Research Record: Journal of the Transportation Research Board, No. 2302, Transportation Research Board of the National Academies, Washington, D.C., 2012, pp. 65–73. DOI: 10.3141/2302-07

The random coefficients logit model allows a more realistic representation of agents’ behavior. However, the estimation of that model may involve simulation, which may become impractical with many random coefficients because of the curse of dimensionality. In this paper, the traditional maximum simulated likelihood (MSL) method is compared with the alternative expectation-maximization (EM) method, which does not require simulation. Previous literature had shown that for cross-sectional data, MSL outperforms the EM method in the ability to recover the true parameters and estimation time and that EM has more difficulty in recovering the true scale of the coefficients. In this paper, the analysis is extended from cross-sectional data to the less volatile case of panel data to explore the effect on the relative performance of the methods with several realizations of the random coefficients. In a series of Monte Carlo experiments, evidence suggested four main conclusions: (a) efficiency increased when the true variance–covariance matrix became diagonal, (b) EM was more robust to the curse of dimensionality in regard to efficiency and estimation time, (c) EM did not recover the true scale with cross-sectional or with panel data, and (d) EM systematically attained more efficient estimators than the MSL method. The results imply that if the purpose of the estimation is only to determine the ratios of the model parameters (e.g., the value of time), the EM method should be preferred. For all other cases, MSL should be used.

A key assumption required for the derivation of the traditional logit model is that the indirect utility that individuals derive from the attributes is deterministic and equal among prespecified classes of the population. In turn, the random coefficients logit model relaxes this behaviorally unrealistic assumption by considering that the coefficients of the utility are instead distributed across the population. The mixed multinomial logit (MMNL) is the most popular model to account for random coefficients. Its strength is in its simple theoretical formulation, which features a mixture of the standard multinomial logit (MNL) integrated out over the density of the coefficients. The most widely used technique to estimate the MMNL model is the maximum simulated likelihood (MSL) method [see, e.g., Train (1)].

Although practical, simulation techniques such as the MSL show some drawbacks. One problem is that, for a finite number of draws, the simulated log likelihood is biased downward, which biases the MSL estimators. In particular, the MSL estimators are consistent and efficient only if the number of draws rises faster than the square root of the sample size (1). In addition, the maximization of the simulated likelihood function might present empirical identification problems. In some cases, the model might not be analytically identifiable, but the use of a small number of draws in simulation may result in a false empirical identification whose estimators are meaningless (2). In many other practical cases, the analytical model might be identifiable, but the sample may contain very little information so that the log likelihood function is almost flat and therefore parameters are not uniquely identifiable. This problem, known as lack of empirical identification (3), can be properly revealed only if a sufficiently large number of draws are used (2).

The lack of empirical identification can arise even when the model includes only one random parameter, because it depends on the data at hand; but it is more likely to occur when the number of random coefficients increases [see Cherchi and Ortúzar for a theoretical discussion (4)] and when a full variance–covariance matrix is considered. The lack of empirical identification due to the large dimensionality of the model is usually known as the “curse of dimensionality.” The problem results from the fact that the number of draws required for estimation grows exponentially with the number of parameters, which quickly makes simulation impractical. Using data sets with repeated observations from the same individual helps in estimating random parameters, so the curse of dimensionality is less of a problem; however, it is not eliminated. The reason is that the simulation variance is related to the sampling variance (5). As discussed by Sándor and Train, when the sampling variance is large the log likelihood function is too flat near the maximum, complicating simulation (6).

Several papers report problems of estimating random parameters by using MSL with cross-sectional data, even without correlation; empirical examples of random coefficients logit models estimated with repeated choices [usually in the form of stated preference (SP) experiments] showed that results can be either good or poor. For example, Hess et al. showed no problems related to the curse of dimensionality when a model was estimated with 16 independent random coefficients by using an SP data set of 500 individuals answering up to 15 choice tasks each (7). In that case a total of 32 parameters could be estimated with 500 quasi-random number sequences. Similarly, Train reports a successful estimation of a model with seven parameters, all randomly distributed and correlated, for an SP data set with 361 individuals answering 12 choice tasks each (8). However, Meijer and Rouwendal, using an SP data set with 235 individuals each answering on average 12.5 choice tasks, found that when all the parameters were allowed to be random, the random coefficients diverged to infinity, although their model included only four attributes (9). They pointed out that the problem was due to the values of the random coefficients that make the choice (almost) deterministic and therefore render the model unstable. The authors suggested (but did not test) that a parameterization should be specified with a full set of random coefficients. Revelt and Train used an SP data set with 401 individuals each answering 12 choice tasks and seven attributes, but the authors estimated a model with only six random correlated parameters (10). The cost parameter was fixed because when all coefficients are allowed to vary in the population, identification is empirically difficult (11). There are a few other empirical works that consider correlation but only among a subset of coefficients (12, 13).

There are some alternative methods to estimate MMNL. For example, Train proposed a recursive method based on an expectation-maximization (EM) algorithm in which the maximum likelihood estimators for the parameters that characterize the density function are computed as the maximization of the sample average of the weighted moments over R draws of the conditional distribution (13). Because the method does not require computing derivatives, the numerical problems in the computation of the Fisher information matrix are overcome. Alternatively, Harding and Hausman proposed to approximate the probability integral of the MMNL with a Laplace approximation on the basis of a Taylor expansion around an optimal coefficient for each observation (this method is called HH) (14). Avoiding the simulation burden of calculating the integral, the HH method does not need a number of draws that grows exponentially with the number of dimensions.

Guevara et al. have shown cross-sectional Monte Carlo evidence suggesting that MSL outperforms HH in practice (15). Similarly, Cherchi and Guevara have shown cross-sectional Monte Carlo evidence suggesting that MSL also outperforms EM, and that MSL and EM outperform HH in the ability to recover the true parameters and in estimation time (16). The authors also found that EM had more problems recovering the true scale of the model coefficients. The Monte Carlo experiments performed by Cherchi and Guevara showed that all coefficients were estimated with the same scale for EM, but this scale was different from the scale assumed in the data generation process (16). Although the substitution rates among attributes and the aggregate elasticities are usually retained when the true scale is not recovered (17), in some situations this may cause a reduction in efficiency and may complicate forecasting (18). Cherchi and Guevara mention that the inability of EM to recover the scale parameter may indicate that EM is more sensitive to challenging situations, whereas MSL might be able to extract more information from them (16).

In this paper the analysis of Cherchi and Guevara is extended by assessing the relative performance of MSL and EM methods in the estimation of the random coefficients logit model, but now under a panel data framework (16). The goal is to explore the impact of having several realizations of the random coefficients on the relative performance of the methods, and on the ability to recover the scale parameter. In particular, this paper compares the performance of the EM and MSL methods under a wide set of parametrical variations and tests, not only the ability of these two methods to overcome the curse of dimensionality, but also the quality of the estimated parameters, that is, their capability to reproduce the true phenomenon, including recovering the true scale. The HH method is not considered in this analysis because the HH method cannot take any advantage of the panel data setting and because it has already been shown that HH was largely outperformed by MSL and EM for cross-sectional data (16).

Three reasons motivate studying the performance of the methods under a panel data framework. The first is that, as previously stated, the literature had usually considered the panel data case for the random coefficients logit problem [see, e.g., Hess et al. (7), Train (8), Meijer and Rouwendal (9), Revelt and Train (10), and Ruud (11)]. The second reason is that curse of dimensionality problems are less likely to occur (but are not ruled out) when there is more than one piece of information for estimating individuals’ tastes, which is the case when panel data are used. Consequently, it is hypothesized that the conclusions about the relative performance of the methods may differ under these alternative frameworks. The third reason is to test whether the problem in recovering the true scale when EM is used will disappear or attenuate when panel data are used.

This research seeks to shed light on the question of whether the MSL method, which is the tool mostly used so far to estimate random coefficient models with panel data, may be outperformed by the EM method proposed by Train (13). To achieve this goal a set of Monte Carlo experiments was set up by varying systematically the number of estimated parameters and the structure of the parameters’ variance–covariance matrix. The choices were simulated as panel data by considering that the coefficients were random across individuals, but were repeated across repeated observations for the same individual.

Finally, Bhat (19) and Bhat and Sidharthan (20) recently proposed the novel maximum approximated composite marginal likelihood (MACML) method to estimate MMNL and provided some evidence suggesting that the MACML method can overcome the estimation problems of simulation-based procedures. The comparison of the MACML with EM, and potentially also with the HH method, exceeds the scope of the present paper, but constitutes an obvious next step in the analysis of methods to estimate the MMNL model.

The rest of the paper is organized as follows. The next section reviews the formulation of the random coefficients model and the derivation of the MMNL model and then focuses on the methods to estimate the MMNL. In particular, the MSL method, which is the method most typically used in the presence of random coefficients, is described, and the novel EM method is illustrated as well. The following section describes the Monte Carlo experiments and presents the results obtained from the different estimation methods. The final section summarizes the conclusions.

Random Coefficients Logit Models: Formulation and Estimation

Theoretical Background

Consider the problem of modeling the behavior of an individual n facing a sequence of discrete choice situations s. The utility that individuals derive from each chosen alternative i is usually measured by means of observable attributes weighted by the degree of importance (preference) that individuals place on each attribute. The preference for each attribute varies, in general, across the population, depending mainly on the individual socioeconomic and demographic characteristics and also for reasons reflecting individual preferences and concerns (21). Random coefficients models allow accounting for the proportion of individual heterogeneity that the modeler is not able to explain through systematic variations in taste coefficients.

Let Unis be the utility that individual n associates to alternative i in the choice situation s. The random coefficients specification can then be written as

$$U_{nis} = x_{nis}^{T}\beta_n + \varepsilon_{nis} = x_{nis}^{T}(b + \delta_n) + \varepsilon_{nis} \tag{1}$$

where

xTnis = transpose of a column vector formed by K explanatory variables of model, which typically includes level-of-service and socioeconomic variables as well as alternative specific constants, but can of course be extended to include any other variables relevant for the phenomenon as well as any nonlinear effect;

εnis = error term that accounts for the discrepancy between true and measured phenomenon;

βn = column vector of K coefficients that can take any desired joint distribution f(β|θ) across individuals;

θ = coefficients of the joint distribution, with mean b, and variance–covariance matrix Σ; and

δn = vector of individual coefficient perturbations from mean b.

If the error term εnis in Equation 1 has the usual Extreme Value 1 (EV1) type independent and identical distribution with location zero and scale µ, the typical random coefficients MMNL is obtained (22). In this case, the probability Lni(βn) that individual n would choose a particular sequence of S alternatives i = {i1, . . . , iS}, one for each choice situation s, for a specific realization of βn, would correspond to

$$L_{ni}(\beta_n) = \prod_{s=1}^{S} \frac{e^{\mu x_{nis}^{T}\beta_n}}{\sum_{j \in C_n} e^{\mu x_{njs}^{T}\beta_n}} \tag{2}$$

where Cn is the choice set of alternatives faced by individual n and j is an index that denotes each element in the set Cn. The scale µ cannot be identified and needs therefore to be fixed for estimation.

Therefore, because βn is a random variable, the probability Pni that individual n will choose a particular sequence of alternatives i is the expected value Eβ(Lni) integrating over the density f (β|θ) of the random coefficients β:

$$P_{ni} = E_{\beta}(L_{ni}) = \int L_{ni}(\beta)\, f(\beta \mid \theta)\, d\beta \tag{3}$$

which is the MMNL expression for panel data.
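The conditional sequence probability in Equation 2 can be sketched in a few lines of code. The block below is only an illustration, not the authors' implementation: the function name `sequence_probability`, the nested-list data layout, and the toy attribute values are assumptions made for this example.

```python
import math

def sequence_probability(beta, x, chosen, mu=1.0):
    """Logit probability of one individual's whole choice sequence (Equation 2).

    beta   : list of K coefficients (one realization for this individual)
    x      : x[s][j] is the list of K attributes of alternative j in situation s
    chosen : chosen[s] is the index of the alternative chosen in situation s
    mu     : EV1 scale, fixed by the analyst because it is not identified
    """
    prob = 1.0
    for x_s, i_s in zip(x, chosen):
        # Scaled systematic utilities mu * x' beta for every alternative in the choice set
        v = [mu * sum(b * a for b, a in zip(beta, x_j)) for x_j in x_s]
        v_max = max(v)  # subtract the maximum before exponentiating, for numerical stability
        expv = [math.exp(u - v_max) for u in v]
        prob *= expv[i_s] / sum(expv)
    return prob

# Hypothetical data: two choice situations, two alternatives, two attributes
x = [[[1.0, 2.0], [2.0, 1.0]],
     [[0.5, 1.5], [1.5, 0.5]]]
p = sequence_probability([-1.0, -1.0], x, chosen=[0, 1])
```

Because the same realization of the coefficients enters every choice situation, the product over s is taken before any integration over β; this is what distinguishes the panel expression in Equation 3 from its cross-sectional counterpart.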

Estimating Random Coefficients: MSL Method

The most common way to estimate coefficients in MMNL models is through simulation. In this case the integral in Equation 3 is approximated by the simulated probability (SPni), shown in Equation 4, which is an average probability computed over R pseudo-randomly drawn repetitions βnr of the vector coefficients of each individual (1).

$$SP_{ni} = \frac{1}{R} \sum_{r=1}^{R} L_{ni}(\beta_{nr}) \tag{4}$$

The simulated probability SPni is an unbiased estimator of Pni by construction; its variance decreases as R increases, and it is strictly positive and smooth (twice differentiable) in the coefficients β and θ and the variables xni. Smoothness facilitates the numerical search for the maximum of the likelihood function but an estimation using MMNL still presents many challenges. The simulated log likelihood function is consistent only when the number of draws increases with the sample size; it is simulated with a downward bias under a finite number of draws. In fact, a sufficiently large number of draws is recommended when the MMNL model is estimated [see, e.g., Walker (3) and Ben-Akiva and Bolduc (23)], as well as a sensitivity test to verify the stability of parameter estimates as the number of draws increases (24).
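As a rough illustration of Equation 4 and of the sensitivity check on the number of draws, the sketch below averages the sequence probability over R draws. It assumes independent normal coefficients and plain pseudo-random draws to stay short; the helper names and the toy data are hypothetical. The downward bias of the simulated log likelihood follows from Jensen's inequality: the logarithm is concave, so the expected value of log SPni cannot exceed log Pni.

```python
import math
import random

def sequence_probability(beta, x, chosen):
    """Equation 2 with mu normalized to 1: logit probability of a choice sequence."""
    prob = 1.0
    for x_s, i_s in zip(x, chosen):
        v = [sum(b * a for b, a in zip(beta, x_j)) for x_j in x_s]
        v_max = max(v)  # stabilizes the exponentials
        expv = [math.exp(u - v_max) for u in v]
        prob *= expv[i_s] / sum(expv)
    return prob

def simulated_probability(b, sd, x, chosen, R, rng):
    """Equation 4: average of the sequence probability over R draws of beta.

    Assumption for this sketch: coefficients are independent normals with
    means b and standard deviations sd (no covariance terms).
    """
    total = 0.0
    for _ in range(R):
        beta_r = [rng.gauss(m, s) for m, s in zip(b, sd)]
        total += sequence_probability(beta_r, x, chosen)
    return total / R

# Hypothetical individual: two situations, two alternatives, two attributes
x = [[[1.0, 2.0], [2.0, 1.0]], [[0.5, 1.5], [1.5, 0.5]]]
chosen = [0, 1]
b, sd = [-1.0, -1.0], [0.5, 0.5]

# Sensitivity check: watch how the simulated log probability moves as R grows
for R in (10, 100, 1000):
    sp = simulated_probability(b, sd, x, chosen, R, random.Random(42))
    print(R, round(math.log(sp), 4))
```

In applied work quasi-random sequences are often used instead of pseudo-random draws; that substitution would change only how `beta_r` is generated.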

Estimating random coefficients with full variance–covariance matrix can be problematic because the variance–covariance matrix needs to be positive definite at each iteration of the search algorithm of the optimization method used to estimate the model. This can be achieved by estimating the elements of a Cholesky decomposition of the variance–covariance matrix, instead of the matrix itself. However, using a Cholesky decomposition does not guarantee the ability to gather the variance–covariance matrix of the estimators because when maximum likelihood methods are used, they are estimated from the information matrix or the Hessian of the likelihood function. Even though the estimators of the Cholesky decomposition might be used to build a proper estimator of the variance–covariance matrix, if the likelihood function is too flat in the vicinity of the convergence point, gathering the standard errors of such estimators might be impossible.
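The Cholesky device can be made concrete with a small sketch. The function below, whose name and argument layout are assumptions for illustration, maps K(K + 1)/2 unconstrained numbers to a lower-triangular factor C and returns Σ = CCᵀ, which is symmetric and positive semidefinite for any input and positive definite whenever all diagonal elements of C are nonzero.

```python
def covariance_from_cholesky(params, K):
    """Build Sigma = C C^T from the K*(K+1)/2 free elements of a lower-triangular C.

    params is read row by row: C[0][0], C[1][0], C[1][1], C[2][0], ...
    An optimizer can move these numbers freely without ever producing an
    invalid (non positive semidefinite) variance-covariance matrix.
    """
    if len(params) != K * (K + 1) // 2:
        raise ValueError("expected K*(K+1)/2 parameters")
    C = [[0.0] * K for _ in range(K)]
    it = iter(params)
    for r in range(K):
        for c in range(r + 1):
            C[r][c] = next(it)
    sigma = [[sum(C[r][k] * C[c][k] for k in range(K)) for c in range(K)]
             for r in range(K)]
    return C, sigma

# Hypothetical values for K = 2 random coefficients
C, sigma = covariance_from_cholesky([0.5, -0.2, 0.4], K=2)
```

The standard errors an optimizer reports then refer to the elements of C, not to Σ, which is the difficulty the paragraph above describes.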

In addition, when all coefficients are allowed to vary in the population and even more when a full variance–covariance matrix is specified, identification might be empirically more difficult for a problem of scale. As pointed out by Revelt and Train, if the stochastic portion of utility is dominated by the random coefficients δ such that the ε term has little influence, the scaling of utility by the variance of the ε term becomes unstable and additional scaling is needed (10). At an extreme, where the extreme-value term has no influence (i.e., zero variance), the simulated probability becomes an accept/reject simulator and scaling of the remaining utility is required.

Estimating Random Coefficients: EM Method

The EM algorithm consists of a recursive procedure in which, at each iteration (t), the unconditional moments (θt) of the random coefficients are computed as the expected value of the moments of the conditional distributions in the previous iteration (M(θt−1)) (13). The procedure is iterated until θt = θt−1, within a tolerance. Because, at the true parameters (θ*), the expected value of the moments of the conditional distributions M(θ*) equals the unconditional moments (θ*), the iteration procedure guarantees that the estimated parameters (θ̂t) are the true parameters, within a tolerance.

Given a vector β of random coefficients normally distributed with density function f(β|θ), the unconditional moments of β, by definition, are equal to the conditional moments of the distribution integrating out their density function:

$$\theta = \int_{\beta} m(\beta)\, f(\beta \mid \theta)\, d\beta \tag{5}$$

where m(β) is a vector including the conditional moments, namely, the mean and the elements of the lower portion of (ββT). Equation 5 holds only when the distribution of β is normal. An extension to other distributions is provided in Train (1, 13).

In an MMNL setting, the unconditional moments can be computed by using the Bayes theorem on the conditional density, which states that

$$h(\beta \mid y_n, x_n, \theta) = \frac{L(y_n \mid \beta, x_n)\, f(\beta \mid \theta)}{P(y_n \mid x_n, \theta)} \tag{6}$$

where

yn = observed choice of individual n,

L(yn|β, xn) = conditional probability shown in Equation 2 that individual n would choose a particular sequence of alternatives i, and

P(yn|xn, θ) = unconditional probability in Equation 3.

The conditional moments are then equal to

$$\theta = \int_{\beta} m(\beta)\, h(\beta \mid y_n, x_n, \theta)\, d\beta \tag{7}$$

and their expectation computed for a sample (N) of the population is then given by

$$M(\theta) = \frac{1}{N} \sum_{n=1}^{N} \int_{\beta} m(\beta)\, \frac{L(y_n \mid \beta, x_n)}{P(y_n \mid x_n, \theta)}\, f(\beta \mid \theta)\, d\beta \tag{8}$$

Equation 8 can be solved by simulation:

$$M(\theta) = \frac{1}{NR} \sum_{n=1}^{N} \sum_{r=1}^{R} w_{nr}\, m(\beta_{nr}) \tag{9}$$

where βnr is a realization of β for each draw r and each individual n and the weight (wnr) is the simulated unconditional moment computed for each draw r and each individual n.

$$w_{nr} = \frac{L(y_n \mid \beta_{nr}, x_n)}{\frac{1}{R} \sum_{r'=1}^{R} L(y_n \mid \beta_{nr'}, x_n)} \tag{10}$$

and the recursion consists in computing the unconditional moments (θt) at each iteration (t) as the average weight of the moments at the previous iteration (t − 1). This means, first, that Equation 9 is used to simulate the expectation of the unconditional moment (M(θt−1)) at time (t − 1) and, second, that the unconditional moments (θt) are computed at time t as θt = M(θt−1).
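The recursion in Equations 9 and 10 can be sketched for the simplest possible case: one normally distributed coefficient and two alternatives. Everything in the block is an assumption made for illustration (function names, the toy panel, the number of draws, the starting values, and the stopping tolerance); it is not the code used in the paper, and the full method works with a vector of coefficients and a full variance–covariance matrix.

```python
import math
import random

def sequence_likelihood(beta, x, chosen):
    """Equation 2 for a single attribute with mu = 1: probability of the observed sequence."""
    prob = 1.0
    for x_s, i_s in zip(x, chosen):
        v = [beta * a for a in x_s]
        v_max = max(v)
        expv = [math.exp(u - v_max) for u in v]
        prob *= expv[i_s] / sum(expv)
    return prob

def em_step(mean, var, data, R, rng):
    """One iteration theta_t = M(theta_{t-1}) for a scalar normal coefficient."""
    sd = math.sqrt(var)
    sum_b = sum_bb = 0.0
    for x, chosen in data:
        draws = [rng.gauss(mean, sd) for _ in range(R)]      # beta_nr from f(beta | theta_{t-1})
        like = [sequence_likelihood(b, x, chosen) for b in draws]
        avg_like = sum(like) / R                              # simulated P(y_n | x_n, theta)
        for b, l in zip(draws, like):
            w = l / avg_like                                  # weight w_nr, Equation 10
            sum_b += w * b                                    # weighted first moment
            sum_bb += w * b * b                               # weighted second moment
    n_terms = len(data) * R
    new_mean = sum_b / n_terms                                # Equation 9, mean
    new_var = sum_bb / n_terms - new_mean ** 2                # Equation 9, second moment minus mean^2
    return new_mean, new_var

# Hypothetical panel: 60 individuals, 9 binary choices each, true beta ~ N(-1, 0.3)
rng = random.Random(1)
data = []
for _ in range(60):
    beta_n = rng.gauss(-1.0, math.sqrt(0.3))                 # fixed within the individual
    x = [[rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)] for _ in range(9)]
    chosen = []
    for x_s in x:
        p0 = 1.0 / (1.0 + math.exp(beta_n * (x_s[1] - x_s[0])))  # logit probability of alternative 0
        chosen.append(0 if rng.random() < p0 else 1)
    data.append((x, chosen))

mean, var = 0.0, 1.0                                          # arbitrary starting values
for t in range(10):
    new_mean, new_var = em_step(mean, var, data, R=100, rng=rng)
    converged = abs(new_mean - mean) < 1e-2 and abs(new_var - var) < 1e-2
    mean, var = new_mean, new_var
    if converged:
        break
```

Within each individual the weights sum to R, so the update is a weighted average of the draws and the updated variance cannot become negative. No derivatives of the likelihood are needed at any point, which is the property highlighted in the text.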

Monte Carlo Experiments

Experimental Setting

In this section a Monte Carlo experiment is developed to compare the performance of MSL and EM methods in estimating random coefficient models with a full variance–covariance matrix. The experimental setting used is similar to the one used by Cherchi and Guevara, but differs principally in that a panel data structure is now considered in which the realizations of the random coefficients are repeated a given number of times for each individual (16).

In particular, the data sets were generated as if a sample of individuals would have answered a typical SP experiment, in which interviewees are asked to make a series of hypothetical choices between two alternatives. The data set generated in Cherchi and Guevara was used as the reference alternative (i.e., as if they were the real attributes revealed by each individual), and the SP design was built pivoting around each individual real value, by using an orthogonal design (16). Five SP experiments were generated including two, four, six, eight, and 10 attributes, respectively, each one at three levels. For four and more variables only main effects were considered. Therefore nine choice sets were considered in the experiment with four attributes, 18 in the experiment with six attributes, and 27 in the experiments with eight and 10 attributes. For the experiment with two attributes, a full factorial design with nine choice sets was considered instead, with the only purpose being to have a number of individual realizations comparable with the other data sets. Because the repetitions in the SP experiments increase the dimension of the sample, only a subsample of the data set generated in Cherchi and Guevara was actually used to generate the current database (16). Moreover, only two alternatives were considered to simplify the SP design. Table 1 summarizes the characteristics of the data sets generated, including the number of individuals and pseudoindividuals (i.e., the number of individuals multiplied by the number of choice sets) for each data set generated.

Following Williams and Ortúzar, a collection of samples was simulated according to the following choice process (25):

• All individuals have two alternatives available.

• Linear-in-the-parameters utility functions are specified.

• All coefficients are randomly distributed and fully correlated across individuals.

• Each random coefficient varies among individuals but not within the sequence of choice situations faced by each individual.

• An EV1 error is added to the utility functions, independently distributed between alternatives and among the choice set.
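The choice process listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy attribute values are hypothetical, the coefficient vector is passed in already drawn, and the EV1 errors are generated by inverting the Gumbel distribution function with the scale of 1.2825 that the paper reports for the error terms.

```python
import math
import random

def ev1_draw(rng, mu):
    """One EV1 (Gumbel) error with location zero and scale mu, by inversion."""
    u = rng.random()
    while u <= 0.0:          # rng.random() can return 0.0; avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u)) / mu

def simulate_individual(beta, x, rng, mu=1.2825):
    """Simulate one individual's sequence of choices.

    beta : the individual's coefficients, held fixed over all choice situations
    x    : x[s][j] is the attribute list of alternative j in situation s
    A fresh EV1 error is added for every alternative in every situation, and
    the alternative with the highest utility is chosen.
    """
    chosen = []
    for x_s in x:
        utilities = [sum(b * a for b, a in zip(beta, x_j)) + ev1_draw(rng, mu)
                     for x_j in x_s]
        chosen.append(max(range(len(x_s)), key=utilities.__getitem__))
    return chosen

# Hypothetical individual facing nine binary choice situations with two attributes
rng = random.Random(7)
x = [[[rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)] for _ in range(2)]
     for _ in range(9)]
choices = simulate_individual([-1.0, -1.0], x, rng)
```

Holding `beta` fixed inside `simulate_individual` while redrawing the EV1 term is what gives the simulated data their panel structure.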

Several Monte Carlo experiments were performed systematically varying the number of estimated parameters and the structure of the parameters’ variance–covariance matrix. The number of estimated parameters across experiments was successively two, four, six, eight, and 10. To analyze the effect of the structure of the variance–covariance matrix, the diagonal elements were kept equal to 0.3 in all experiments, but the off-diagonal elements were varied across experiments by successively using the values 0.05, 0.10, 0.15, 0.20, and 0.25.

TABLE 1 Characteristics of Samples Generated

No. of Attributes   No. of Choice Sets   No. of Individuals   No. of Pseudoindividuals
2                   9                    2,000                18,000
4                   9                    2,000                18,000
6                   18                   2,000                36,000
8                   27                   1,000                27,000
10                  27                   1,000                27,000

Note: No. = number.

Inside each experiment the covariance among any pair of parameters was the same (i.e., all equal to 0.05, and all equal to 0.1, and so on), which guarantees consistency throughout the experiments when the number of random coefficients is increased. The data were also generated to guarantee that the choices were almost equally split between the two alternatives available (approximately 50% for each alternative). Following Cherchi and Guevara the random coefficients were drawn from a normal distribution with mean of −1 and variance of 0.30 (16).
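A possible way to generate coefficients with this structure is sketched below: a variance–covariance matrix with 0.3 on the diagonal and a common covariance elsewhere is factored by a hand-written Cholesky routine, and correlated normal draws with mean −1 are obtained from independent standard normals. The function names are assumptions for this example and only the standard library is used; this is not necessarily how the authors generated their data.

```python
import math
import random

def equicovariance(K, variance=0.3, covariance=0.05):
    """K x K matrix with `variance` on the diagonal and `covariance` elsewhere."""
    return [[variance if r == c else covariance for c in range(K)] for r in range(K)]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(L[r][k] * L[c][k] for k in range(c))
            if r == c:
                L[r][c] = math.sqrt(a[r][r] - s)
            else:
                L[r][c] = (a[r][c] - s) / L[c][c]
    return L

def draw_coefficients(K, covariance, rng, mean=-1.0):
    """One draw of K correlated normal coefficients: mean + L z with z ~ N(0, I)."""
    L = cholesky(equicovariance(K, covariance=covariance))
    z = [rng.gauss(0.0, 1.0) for _ in range(K)]
    return [mean + sum(L[r][k] * z[k] for k in range(r + 1)) for r in range(K)]

# The most demanding configuration in the experiments: 10 coefficients, covariance 0.25
sigma = equicovariance(10, covariance=0.25)
L = cholesky(sigma)
beta_n = draw_coefficients(10, 0.25, random.Random(2012))
```

With a common covariance below the variance of 0.3, the matrix stays positive definite for every configuration used in the experiments, so the square roots in the factorization are always well defined.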

The number of random parameters and the degree of covariance influence the capability to estimate the model and, more generally, the capability to reproduce the true parameters. It is then crucial to generate the data so as to be able to control for each effect separately, that is, to be able to increase the number of random coefficients while keeping the covariance between any additional pair of parameters the same. Keeping the correlation constant across pairs of coefficients allows one to systematically disentangle these effects. It was found that both effects were relevant for the capacity of estimating the model and for recovering the true model parameters.

For the EV1 error terms, a scale parameter (µ) equal to 1.2825 was used. Each sample was generated 50 times with the same characteristics but varying the seed used to generate the random coefficients and the EV1 error terms. When simulated data are used, it is important to control for the weight of the EV1 component in the total random utility. As mentioned previously, the reason is that when the stochastic portion of utility is dominated by the random coefficients, scaling of utility by the variance of the EV1 term becomes unstable. The weight of the EV1 component in the stochastic portion of the random utility varies with the number and with the covariance of the random coefficients specified in the utility. In the SP simulated data, the average share of the stochastic portion of utility due to the variance of the EV1 term across the repetitions is at least 38% for cases with low covariance and two random coefficients and at least 10% for cases with high covariance and 10 random coefficients.
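A short Python sketch can make the role of the scale concrete. Under the usual logit convention, an EV1 (Gumbel) term with scale parameter µ has variance π²/(6µ²), so µ = 1.2825 gives a variance very close to 1. The helper names below and the variance passed to `ev1_share` are hypothetical; the share actually obtained in the experiments depends on the attribute values, which are not reproduced here.

```python
import math

import numpy as np

MU = 1.2825  # EV1 scale parameter used in the data generation


def ev1_variance(mu):
    """Variance of an EV1 (Gumbel) error term with scale parameter mu."""
    return math.pi ** 2 / (6.0 * mu ** 2)


def ev1_share(random_part_variance, mu=MU):
    """Share of the stochastic utility variance due to the EV1 term.

    `random_part_variance` is the variance contributed by the random
    coefficients (a hypothetical input in this sketch).
    """
    ev1 = ev1_variance(mu)
    return ev1 / (ev1 + random_part_variance)


# NumPy parameterizes the Gumbel distribution by 1/mu.
rng = np.random.default_rng(0)
draws = rng.gumbel(loc=0.0, scale=1.0 / MU, size=200_000)

print(ev1_variance(MU))  # analytical variance, close to 1
print(draws.var())       # simulated variance, also close to 1
print(ev1_share(1.0))    # equal variances give a share close to 0.5
```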

Assessment of EM and MSL Estimation Methods

With the data sets generated under the assumptions described previously, parameters were estimated by using the EM and MSL methods. With these estimators, the performance of the methods was analyzed in regard to the effect of the correlation between the random coefficients, their robustness to the curse of dimensionality, their ability to recover the true scale, and their estimation times.

Simulated data offer the great advantage over real data that the true value of the parameters is known. It is then possible to control for the ability of each estimation method to correctly reproduce the true parameters. To measure this, the bias, the t-test against the true value, and the mean squared error of each estimated parameter were computed.

The bias offers a comparative measure of the efficacy of each method in recovering the true parameters. The bias corresponds to the difference between the average of the estimated parameter across repetitions and its true value. For proper comparison with their respective true values, the estimators of the mean of each coefficient and of the elements of the Cholesky decomposition were divided by 1.2825.

The t-test against the respective true value of each model parameter complements the measure of efficacy of the bias. The t-test allows formally testing the null hypothesis that the estimators are equal to their respective true values with some level of confidence. This t-test was computed as the ratio between the bias and the standard deviation of the estimated parameter across the 50 repetitions of the sample.

The mean squared error offers a comparative measure of the efficiency of each method in recovering the true parameters. This measure corresponds to the sum of the square of the bias and the variance of each estimator. The variance of the estimators was obtained from their sampling distribution across the repetitions.
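The three measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assess` is hypothetical, the bias is taken as estimate minus true value, and the sample variance uses `ddof=1`, a detail the text does not specify. The input numbers are made up for the example and are not results from the experiments.

```python
import numpy as np

SCALE = 1.2825  # estimates are divided by this value before comparison


def assess(estimates, true_value, scale=SCALE):
    """Bias, t-test against the true value, and MSE across repetitions."""
    rescaled = np.asarray(estimates, dtype=float) / scale
    bias = rescaled.mean() - true_value  # efficacy
    std = rescaled.std(ddof=1)           # spread across repetitions (assumed ddof=1)
    return {
        "bias": bias,
        "t_test": bias / std,            # ratio of bias to standard deviation
        "mse": bias ** 2 + std ** 2,     # efficiency
    }


# Hypothetical estimates of a coefficient whose true value is -1.
example = SCALE * np.array([-1.08, -0.88, -0.98, -1.03, -0.93])
result = assess(example, true_value=-1.0)
print(result)
```

In the experiments each measure would be computed from the 50 repetitions of a sample; an absolute t-test value below 1.96 means the null hypothesis of equality with the true value is not rejected at the 95% level.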

Analogously to the estimation with real data, the expectation of the log likelihood across the repetitions is also reported. This value measures the overall fit of the model to the data and shows how the model would look in a real situation in which the true phenomenon is not known. The results are summarized in Tables 2 and 3, which report the results obtained by using two and four variables, respectively. Only the results with two and four coefficients are reported for reasons of space; comments are made in the text about the results with six, eight, and 10 variables. For the experiment with two variables, the estimators obtained for the whole range of correlations considered in the experiment are reported; for the experiment with four variables, only the results obtained when the correlation was 0.25, 0.15, and 0.05 are reported. Both tables report the estimators of the mean of the utility coefficients and the estimators of the elements of the lower triangular Cholesky factor of the variance–covariance matrix of the random coefficients. Finally, for all cases the number of draws used in the estimation was increased to check the stability of the estimated results.
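Because the tables report Cholesky elements rather than covariances, it can help to see how one maps to the other. The Python sketch below takes the average MSL estimates reported in Table 2 for a true covariance of 0.25, divides them by 1.2825, and rebuilds the implied variance–covariance matrix. The variable names are assumptions made for this illustration, which is not part of the original analysis.

```python
import numpy as np

SCALE = 1.2825  # EV1 scale used in the data generation

# Average MSL estimates from Table 2, true covariance 0.25
# (Ch11, Ch21, Ch22), in the scaled units in which they are reported.
chol_scaled = np.array([[0.6814, 0.0],
                        [0.5586, 0.3533]])

# Remove the scale, then rebuild the covariance matrix as L @ L.T.
chol = chol_scaled / SCALE
implied_cov = chol @ chol.T

# True matrix used in that experiment and its lower triangular factor.
true_cov = np.array([[0.30, 0.25],
                     [0.25, 0.30]])
true_chol = np.linalg.cholesky(true_cov)

print(np.round(implied_cov, 4))       # covariance implied by the estimates
print(np.round(chol - true_chol, 4))  # element-wise bias of the Cholesky factor
```

The lower triangular entries of the last matrix agree with the Bias entries reported for Ch11, Ch21, and Ch22 in that column of Table 2 (−0.0164, −0.0209, and −0.0273).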

Impact of Degree of Covariance Between Random Coefficients

To begin, the effect of the degree of covariance among the random coefficients on the efficacy and efficiency of the MSL and EM estimation methods is analyzed. When the model considers only two variables, results in Tables 2 and 3 show that there is a small improvement in efficacy when the correlation decreases. The reason is that the average of the absolute value of the bias, among all the estimators of each experiment, decreases slightly when the degree of covariance between the random coefficients decreases. This modest effect becomes even less clear when the number of variables increases to four and more. This relatively inconclusive effect with panel data seems to contradict the findings with cross-sectional data (16), which suggests that further investigation is needed to clarify this point.

However, results suggest that there is a relative increase in efficiency as correlation between the random coefficients decreases. This increased efficiency can be noted in that the average mean squared error of all of the estimators decreases with the degree of covariance. This effect is relatively mixed when the model considers only two variables and tends to be more marked for larger numbers of variables. An equivalent result was found for cross-sectional data (16).

Impact of Curse of Dimensionality

The second issue analyzed is the performance of the methods in regard to the problem of the curse of dimensionality. This issue can be assessed by comparing the efficacy (bias) and efficiency (mean squared error) of the methods as the number of variables grows.

The experiments for MSL showed that the efficacy is reduced to an important extent when the number of variables estimated increases. The average of the absolute value of the bias is almost doubled when the model considers four variables, compared with the model with two variables. For more than four variables, the increase in the bias is also observed, but it is much less marked. However, this effect tends to vanish when a correlation of 0.05 between the random coefficients is considered.

The effect of the curse of dimensionality in the efficacy of the EM method is the opposite of the one obtained with MSL. For EM, the average of the absolute value of the bias shrinks as the number of variables increases. This shrinking value suggests that the EM method is more robust to the curse of dimensionality and confirms the result found for cross-sectional data (16).

Besides the effect on efficiency and efficacy, it was found that the estimation times grew considerably as the number of variables grew, which is also evidence of the problem of the curse of dimensionality. This is further analyzed at the end of this section.

Finally, although cases were found for which the model could not be estimated with a large number of variables, there were far fewer such cases than in the experiments considered by Guevara et al. (15) or Cherchi and Guevara (16). This arguably occurs because this experiment considered a binary logit instead of a trinomial logit, and a panel data structure instead of a cross-sectional structure. Another possibility is that the simulated experiment was better controlled this time.

Comparison Between MSL and EM Methods

In regard to the relative comparison between the MSL and EM methods, EM showed a better log likelihood than MSL for all experiments considered. This indicates that, leaving aside potential problems in recovering the scale, which can obscure the comparison based on the bias, the EM estimators are objectively better than the MSL estimators.

In regard to the bias, MSL is the method that achieves the best results. This finding can be verified either by comparing the number of estimators with smaller bias between methods, or by comparing the averages for equivalent experiments. However, the relative superiority of MSL in regard to the bias tends to vanish when the correlation between the random coefficients decreases.

In regard to efficiency, the results for MSE with two variables are mixed, with no clear advantage for MSL or for EM. In turn, in the case of four variables, EM is clearly superior to MSL when the correlation is high, but the relative position is inverted when the correlation decreases. A similar, but less marked, effect was found with more than four variables.

Recovery of True Scale for EM Method

Cherchi and Guevara found that the EM method has some problems in recovering the true scale of the model in a cross-sectional experiment (16). In this section, the persistence of this finding is studied for the case of panel data. Tables 2 and 3 show that none of the t-tests against the true values of the parameters rejects the null hypothesis that they are equal, at the 95% confidence level. This finding could be interpreted as indicating that the problem of recovering the true scale with the EM method, found with cross-sectional data, is not present in this panel data experiment. However, although the null hypothesis that the estimators are equal to their true values is not rejected for either MSL or EM, it is accepted with considerably greater confidence in the MSL experiments, for which the t-tests are much closer to zero. This fact suggests that the problem of recovering the scale with the EM method is still present when panel data are used, but it is less pronounced than in the cross-sectional experiments [see Cherchi and Guevara (16)]. This hypothesis is reinforced by the facts that (a) the bias tends to be larger for EM but, at the same time, the likelihood is also larger in this case and (b) the estimators of MSL tend to be systematically larger than those of EM. The experiments show that the difference in scale between MSL and EM seems to be reduced as the correlation between the random coefficients is reduced and when the number of variables in the model increases. EM seems to have problems, however, in recovering the average of the random parameters but performs better than MSL in recovering the elements of the Cholesky matrix.

TABLE 2 Assessment of MSL and EM with Two Variables

             True Covariance = 0.25              True Covariance = 0.20              True Covariance = 0.15              True Covariance = 0.10              True Covariance = 0.05
Parameter    Value    t-Test   Bias     MSE      Value    t-Test   Bias     MSE      Value    t-Test   Bias     MSE      Value    t-Test   Bias     MSE      Value    t-Test   Bias     MSE

MSL Method
β1 (a)       −1.2802  0.0369   0.0018   0.002    −1.2840  −0.0220  −0.0011  0.002    −1.2903  −0.1169  −0.0060  0.002    −1.2756  0.1036   0.0054   0.004    −1.2752  0.1080   0.0057   0.003
β2 (a)       −1.2767  0.0921   0.0046   0.002    −1.2823  0.0040   0.0002   0.002    −1.2865  −0.0592  −0.0031  0.001    −1.2762  0.0932   0.0050   0.004    −1.2849  −0.0337  −0.0018  0.002
Ch11 (b)     0.6814   −0.1778  −0.0164  0.006    0.7047   0.0183   0.0017   0.012    0.6853   −0.1335  −0.0134  0.009    0.7052   0.0207   0.0021   0.012    0.6888   −0.0995  −0.0107  0.012
Ch21 (b)     0.5586   −0.3715  −0.0209  0.013    0.4570   −0.3013  −0.0088  0.028    0.3448   −0.0118  −0.0050  0.026    0.2481   −0.1594  0.0108   0.040    0.0993   −0.2205  −0.0139  0.037
Ch22 (b)     0.3533   −0.1723  −0.0273  0.011    0.4996   −0.0643  −0.0187  0.004    0.6074   −0.0323  −0.0007  0.003    0.6481   0.0668   −0.0111  0.006    0.6692   −0.0853  −0.0183  0.014
LL/Obs. (c)  −5.9415                             −5.9376                             −5.9294                             −5.9286                             −5.9192

EM Method
β1 (a)       −1.1996  1.3248   0.0647   0.007    −1.2120  1.1124   0.0550   0.006    −1.2260  0.8682   0.0441   0.005    −1.2428  0.5844   0.0310   0.006    −1.2391  0.6315   0.0339   0.004
β2 (a)       −1.1974  1.3410   0.0664   0.008    −1.2106  1.1274   0.0561   0.006    −1.2210  0.9301   0.0480   0.005    −1.2452  0.5410   0.0291   0.005    −1.2477  0.4984   0.0272   0.003
Ch11 (b)     0.5735   −0.9354  −0.1005  0.015    0.6068   −0.7108  −0.0746  0.012    0.5914   −0.7769  −0.0866  0.011    0.6297   −0.5092  −0.0568  0.009    0.6186   −0.5527  −0.0654  0.008
Ch21 (b)     0.4022   −0.9138  −0.1428  0.027    0.3967   −0.3530  −0.0558  0.009    0.2356   −0.4833  −0.0901  0.018    0.1290   −0.4210  −0.0820  0.024    −0.0280  −0.3309  −0.1131  0.019
Ch22 (b)     0.4083   0.2218   0.0156   0.001    0.4431   −0.9634  −0.0628  0.005    0.5510   −0.6692  −0.0448  0.006    0.6183   −0.4017  −0.0343  0.006    0.6234   −0.4242  −0.0540  0.005
LL/Obs. (c)  −5.2765                             −5.2741                             −5.2678                             −5.2671                             −5.2607

Note: MSE = mean squared error; t-Test = t-test against the true value.
(a) β1 and β2 = estimators of the mean of the random coefficients.
(b) Ch11, Ch21, and Ch22 = elements of the lower triangular Cholesky factor of the variance–covariance matrix of the random coefficients.
(c) LL/Obs. = log likelihood per observation for estimated model.

The source of this unexplained change of scale that occurs with EM, and its possible effects on efficiency and forecasting, have not been reported in previous literature. Although a formal theoretical explanation for the change of scale cannot be provided, two plausible hypotheses can be formulated: (a) it can be attributed to the higher sensitivity of EM to identification problems (this hypothesis was suggested by Kenneth Train in an e-mail exchange) and (b) the iterative nature of the EM method is reflected in the addition of exogenous white noise to the structural equation of the choice model, increasing the variance of the actual error term and, therefore, decreasing the empirical scale [see, e.g., Guevara and Ben-Akiva (26)]. The study of these hypotheses is left for further research.

Estimation Time

In regard to estimation time, it was found first that, as expected, it grows exponentially with the number of variables. It was also found that the EM method was considerably slower than MSL for a small number of variables, but the order was inverted when eight and 10 variables were considered. In addition, it was found that estimation times for this panel data experiment were much smaller than those obtained for the cross-sectional experiments considered by Cherchi and Guevara (16). Finally, in regard to the effect of the degree of covariance on the estimation time, evidence was found suggesting that, for MSL, the larger the covariance, the smaller the estimation time. However, an opposite effect was found in the case of the EM method. These effects were more evident as the number of variables in the model increased.

For the model with two variables, MSL estimation times of each repetition were on the order of seconds for all degrees of covariance. In the case of EM, estimation times were about 3 min, with a small reduction when correlation was 0.25. For the model with four variables, MSL estimation times were about 30 s for small correlations, and jumped up to 2 min for a correlation of 0.25. In turn, EM took about 15 min for the lowest covariance and only about 4 min for the highest covariance. For six variables, MSL took about 6 min, whereas EM took about 17 min. But then, for eight variables MSL took about 15 min, and EM took about 10 min. Finally, for 10 variables MSL took on average 50 min for each repetition, whereas EM took only 20 min.

Conclusion

In this paper, the analysis of Cherchi and Guevara (16) was extended by assessing the relative performance of MSL and EM methods in the estimation of the random coefficients logit model under a panel data framework. The goal was to explore the effect of having several realizations of the random coefficients on the relative performance of the methods. In particular, Monte Carlo experiments were used to compare the performance of the EM and MSL methods under a wide set of parametrical variations and to test not only the ability of these two methods to overcome the curse of dimensionality, but also the quality of the estimated parameters, that is, their capability to reproduce the true phenomenon, including recovering the true scale.


Concerning the effect of the structure of the variance–covariance matrix of the random coefficients, evidence was found that there is an increase in efficiency for both methods as the true variance–covariance matrix becomes more diagonal. A similar result was found for the estimators’ efficacy, but the evidence was much less conclusive in that case.

In regard to the overall efficacy and efficiency of the methods, it was found that MSL had a systematically smaller bias than EM, but larger mean squared error and poorer adjustment to the data, in regard to the log likelihood. This seemingly contradictory result is explained by the finding that EM has difficulties in recovering the true scale of the model with cross-sectional data and with panel data. The source for this unexplained change of scale and its possible effects on efficiency and forecasting are left for further research.

Concerning the curse of dimensionality, the evidence showed that the EM method is more robust to this problem than the MSL method. The efficacy and efficiency of the EM method are less affected when the number of variables is increased. Furthermore, EM was even found to be faster than MSL when the model had eight or more variables.

In summary, the overall recommendation on which method to use can be stated as follows: If the purpose of the estimation is only to determine the ratios of the model parameters (e.g., the value of time), the EM should be preferred because it is more efficient and estimates the ratios correctly. However, if the purpose of the estimation is to use the model for forecasting, MSL should be preferred because EM is unable (in general) to recover the scale of the model and that inability affects the forecasting properties of the model.

TABLE 3 Assessment of MSL and EM with Four Variables

             True Covariance = 0.25              True Covariance = 0.15              True Covariance = 0.05
Parameter    Value    t-Test   Bias     MSE      Value    t-Test   Bias     MSE      Value    t-Test   Bias     MSE

MSL Method
β1 (a)       −1.2874  −0.0887  −0.004   0.0019   −1.2914  −0.1578  −0.007   0.0017   −1.2761  0.1122   0.005    0.0020
β2 (a)       −1.2788  0.0725   0.003    0.0012   −1.2870  −0.0826  −0.003   0.0025   −1.2855  −0.0543  −0.002   0.0017
β3 (a)       −1.2751  0.1323   0.006    0.0020   −1.2869  −0.0751  −0.003   0.0026   −1.2803  0.0379   0.002    0.0024
β4 (a)       −1.2824  0.0031   0.000    0.0014   −1.2858  −0.0609  −0.002   0.0016   −1.2908  −0.1546  −0.006   0.0014
Ch11 (b)     0.7141   0.1607   0.010    0.0031   0.7069   0.0956   0.006    0.0055   0.7076   0.1163   0.008    0.0047
Ch21 (b)     0.5765   −0.0565  −0.005   0.0072   0.3448   −0.0725  −0.007   0.0104   0.1211   −0.0344  −0.004   0.0132
Ch22 (b)     0.3678   −0.1667  −0.017   0.0200   0.6009   −0.0368  −0.002   0.0036   0.6959   0.0099   0.001    0.0026
Ch31 (b)     0.5487   −0.3074  −0.030   0.0119   0.3348   −0.0850  −0.010   0.0127   0.0970   −0.1596  −0.019   0.0161
Ch32 (b)     0.1489   −0.1515  −0.024   0.0537   0.1627   −0.1857  −0.023   0.0152   0.0857   −0.2106  −0.022   0.0157
Ch33 (b)     0.2901   −0.3118  −0.045   0.0302   0.5613   −0.1124  −0.010   0.0138   0.6671   −0.1999  −0.015   0.0049
Ch41 (b)     0.5874   0.0986   0.008    0.0063   0.3598   0.1021   0.010    0.0099   0.1206   −0.0285  −0.003   0.0106
Ch42 (b)     0.1704   −0.0689  −0.010   0.0362   0.2252   0.1564   0.018    0.0093   0.1176   0.1386   0.014    0.0078
Ch43 (b)     0.0417   −0.4523  −0.054   0.0436   0.0978   −0.3522  −0.042   0.0249   0.0571   −0.2163  −0.023   0.0137
Ch44 (b)     0.1118   −1.2036  −0.173   0.0673   0.4865   −0.6962  −0.057   0.0269   0.6521   −0.2597  −0.017   0.0034
LL/Obs.      −1.2412                             −1.2385                             −1.2357

EM Method
β1 (a)       −1.2542  0.5055   0.022    0.0018   −1.2596  0.4003   0.018    0.0018   −1.2575  0.4229   0.020    0.0029
β2 (a)       −1.2608  0.4198   0.017    0.0013   −1.2679  0.2757   0.011    0.0021   −1.2746  0.1437   0.006    0.0019
β3 (a)       −1.2499  0.5399   0.025    0.0019   −1.2615  0.3404   0.016    0.0028   −1.2650  0.2739   0.014    0.0033
β4 (a)       −1.2534  0.5336   0.023    0.0017   −1.2582  0.4326   0.019    0.0020   −1.2747  0.1336   0.006    0.0016
Ch11 (b)     0.6535   −0.5667  −0.037   0.0027   0.6320   −0.7564  −0.052   0.0069   0.6539   −0.4875  −0.035   0.0081
Ch21 (b)     0.5812   −0.0093  −0.001   0.0015   0.3592   0.0383   0.004    0.0086   0.1120   −1.7179  −0.189   0.0565
Ch22 (b)     0.3086   −0.3882  −0.063   0.0042   0.5438   −0.5812  −0.047   0.0046   0.6693   0.8076   0.051    0.0067
Ch31 (b)     0.5661   −0.1442  −0.017   0.0018   0.3574   0.0601   0.008    0.0082   0.0888   −1.4990  −0.202   0.0660
Ch32 (b)     0.1413   −0.1099  −0.030   0.0012   0.1500   −0.2167  −0.033   0.0106   0.0680   −0.8030  −0.097   0.0275
Ch33 (b)     0.2694   −0.2947  −0.061   0.0038   0.4825   −0.5911  −0.072   0.0075   0.5987   0.1970   0.019    0.0079
Ch41 (b)     0.5799   0.0235   0.002    0.0017   0.3466   −0.0069  −0.001   0.0049   0.0902   −1.7711  −0.201   0.0603
Ch42 (b)     0.1466   −0.1287  −0.029   0.0012   0.2256   0.1359   0.018    0.0063   0.1194   −0.5432  −0.065   0.0174
Ch43 (b)     0.0850   −0.0800  −0.021   0.0006   0.0772   −0.3456  −0.058   0.0108   0.0104   −1.3360  −0.110   0.0284
Ch44 (b)     0.2611   −0.4448  −0.056   0.0033   0.4736   −0.7371  −0.067   0.0064   0.6180   0.5611   0.046    0.0075
LL/Obs.      −1.1202                             −1.1177                             −1.1152

Note: t-Test = t-test against the true value.
(a) β3 and β4 = estimators of the mean of random coefficients.
(b) Ch31, Ch32, Ch33, Ch41, Ch42, Ch43, and Ch44 = elements of the lower triangular Cholesky factor of the variance–covariance matrix of the random coefficients.


Acknowledgments

The authors thank Kenneth Train for providing the software code for the MSL and EM algorithms and for his comments on some of the results obtained with EM. This research was partially supported by Fondo de Ayuda a la Investigación of the Universidad de los Andes, and by Fondecyt of Chile.

References

1. Train, K. Discrete Choice Methods with Simulation. Cambridge University Press, Cambridge, United Kingdom, 2009.

2. Chiou, L., and J. L. Walker. Masking Identification in the Discrete Choice Model Under Simulation Methods. Journal of Econometrics, Vol. 141, No. 2, 2007, pp. 683–703.

3. Walker, J. L. Extended Discrete Choice Models: Integrated Framework, Flexible Error Structures, and Latent Variables. PhD dissertation. Massachusetts Institute of Technology, Cambridge, 2001.

4. Cherchi, E., and J. de D. Ortúzar. Can Mixed Logit Infer the Actual Data Generating Process? Some Implications for Environmental Assessment. Transportation Research Part D, Vol. 15, No. 7, 2010, pp. 428–442.

5. McFadden, D. A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration. Econometrica, Vol. 57, No. 5, 1989, pp. 995–1026.

6. Sándor, Z., and K. Train. Quasi-Random Simulation of Discrete Choice Models. Transportation Research Part B, Vol. 38, No. 4, 2004, pp. 313–327.

7. Hess, S., K. Train, and J. W. Polak. On the Use of a Modified Latin Hypercube Sampling (MLHS) Method in the Estimation of a Mixed Logit Model for Vehicle Choice. Transportation Research Part B, Vol. 40, No. 2, 2006, pp. 147–163.

8. Train, K. Recreation Demand Models with Taste Variation. Land Economics, Vol. 74, No. 2, 1998, pp. 230–239.

9. Meijer, E., and J. Rouwendal. Measuring Welfare Effects in Models with Random Coefficients. Journal of Applied Econometrics, Vol. 21, No. 2, 2006, pp. 227–244.

10. Revelt, D., and K. Train. Mixed Logit with Repeated Choices. Review of Economics and Statistics, Vol. 80, No. 4, 1998, pp. 647–657.

11. Ruud, P. Simulation of the Multinomial Probit Model: An Analysis of Covariance Matrix Estimation. Working paper. Department of Economics, University of California, Berkeley, 1996.

12. Hensher, D. A., and W. H. Greene. The Mixed Logit Model: The State of Practice. Transportation, Vol. 30, No. 2, 2003, pp. 133–176.

13. Train, K. A Recursive Estimator for Random Coefficient Models. Working paper. Department of Economics, University of California, Berkeley, 2007.

14. Harding, M. C., and J. Hausman. Using a Laplace Approximation to Estimate the Random Coefficients Logit Model by Nonlinear Least Squares. International Economic Review, Vol. 48, No. 4, 2007, pp. 1311–1328.

15. Guevara, C. A., E. Cherchi, and M. Moreno. Estimating Random Coefficient Logit Models with Full Covariance Matrix: Comparing Performance of Mixed Logit and Laplace Approximation Methods. In Transportation Research Record: Journal of the Transportation Research Board, No. 2132, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 87–94.

16. Cherchi, E., and C. A. Guevara. A Monte Carlo Experiment to Analyse the Curse of Dimensionality in Estimating Random Coefficients Models with a Full Variance–Covariance Matrix. Transportation Research Part B, Vol. 46, No. 2, 2012, pp. 321–332.

17. Cramer, J. Robustness of Logit Analysis: Unobserved Heterogeneity and Mis-specified Disturbances. Oxford Bulletin of Economics and Statistics, Vol. 69, No. 4, 2007, pp. 545–555.

18. Guevara, C. Endogeneity and Sampling of Alternatives in Spatial Choice Models. PhD dissertation. Massachusetts Institute of Technology, Cambridge, 2010.

19. Bhat, C. R. The Maximum Approximated Composite Marginal Likelihood (MACML) Estimation of Mixed Unordered Response Choice Models. Presented at 90th Annual Meeting of the Transportation Research Board, Washington, D.C., 2011.

20. Bhat, C. R., and R. Sidharthan. A Simulation Evaluation of the Maximum Approximated Composite Marginal Likelihood (MACML) Estimator for Mixed Cross-Sectional and Panel Unordered Response Choice Models. Presented at 90th Annual Meeting of the Transportation Research Board, Washington, D.C., 2011.

21. Tversky, A. Elimination by Aspects: A Theory of Choice. Psychological Review, Vol. 79, No. 4, 1972, pp. 281–299.

22. McFadden, D., and K. Train. Mixed MNL Models for Discrete Response. Journal of Applied Econometrics, Vol. 15, No. 5, 2000, pp. 447–470.

23. Ben-Akiva, M., and D. Bolduc. Multinomial Probit with a Logit Kernel and a General Parametric Specification of the Covariance Structure. Working paper. Massachusetts Institute of Technology, Cambridge, 2006.

24. Walker, J. Mixed Logit (or Logit Kernel) Model: Dispelling Misconceptions of Identification. In Transportation Research Record: Journal of the Transportation Research Board, No. 1805, Transportation Research Board of the National Academies, Washington, D.C., 2002, pp. 86–98.

25. Williams, H. C. W. L., and J. de D. Ortúzar. Behavioural Theories of Dispersion and the Mis-specification of Travel Demand Models. Transportation Research Part B, Vol. 16, No. 39, 1982, pp. 167–219.

26. Guevara, C. A., and M. Ben-Akiva. Change of Scale and Forecasting with the Control-Function Method in Logit Models. Transportation Science (forthcoming).

The Transportation Demand Forecasting Committee peer-reviewed this paper.

