Taste variation in discrete choice models


ANDREW CHESHER, University College London

J. M. C. SANTOS SILVA, ISEG, Universidade Tecnica de Lisboa

First version received April 1995; final version accepted June 2001 (Eds.)

This paper develops an extension of the classical multinomial logit model which approximates a class of models obtained when there is uncontrolled taste variation across agents and choices in addition to the stochastic noise inherent in the logit model. Unlike semiparametric and parametric alternatives, the extended logit model is easy to estimate even when there are many potential choices. Unlike parametric alternatives, it does not require the specification of a distribution of varying tastes. The extended logit model can give a quick indication of the impact of taste variation on estimates and it generates estimates of the covariances of the taste shifters. It can be used as an exploratory device en route to the construction of a model incorporating a particular form of random taste variation and it can be used to determine whether such effort is required at all. When the amount of taste variation is not excessive the approximate model can be adequate itself. The model nests the conventional logit model, which leads to a misspecification diagnostic. A method for estimating the model using conventional logit model software is proposed, asymptotic properties of estimators are derived and an application is presented.

1. INTRODUCTION

The simplicity and ease of use of the logit model for multiple discrete choices make it a frequent choice in applied econometric work, but it is well known that this simplicity is bought at a cost. In particular, thinking in terms of its classical “stochastic utility” maximization genesis, the model is only obtained when a strong and potentially objectionable assumption is made, namely that the stochastic (really partially observed) utilities are, conditional on functions of observed covariates, independently and identically distributed with extreme value distributions. In practice we might expect taste variation and other additional dispersion due to omitted and mis-measured covariates to cause these assumptions to fail.

This paper provides an extended version of the multinomial logit model which permits a degree of overdispersion. Unlike some of the other approaches to this problem, this extended model, the heterogeneity adjusted logit model, does not require the specification of the distribution of the additional taste and other variation. It is easy to estimate, in contrast to some other alternatives to the logit model. The conventional logit model is nested within the heterogeneity adjusted logit model and a misspecification test is a direct by-product.

Review of Economic Studies (2002) 69, 147–168 0034-6527/02/00060147$02.00

© 2002 The Review of Economic Studies Limited

1.1. The heterogeneity adjusted logit model

In a conventional polytomous logit model for choice among $I$ alternatives with alternative specific covariates $X = [X_1, \dots, X_I]$, the probability that alternative $i$ is chosen conditional on $X = x \equiv [x_1, \dots, x_I]$ is

$$p(i|x,\beta) = \frac{\exp(x_i'\beta)}{\sum_{j=1}^{I}\exp(x_j'\beta)}. \tag{1}$$

Here $x_i$ and $\beta$ are $K$ element vectors. We shall refer to $x_i'\beta$ as the index for choice $i$. When there is unobserved variation in tastes for alternatives and no repeated observations on individuals, econometric analysis must be based on choice probabilities marginal with respect to taste variation. These depend intimately on the multivariate distribution of taste shifters about which economic theory has nothing to say. A nonparametric attack is foiled by the curse of dimensionality when more than a few choices are available. We return to these points in Section 1.2.
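As a concrete illustration of equation (1), the following minimal Python sketch computes the logit choice probabilities for one agent. The function name `logit_probs` and the covariate values are hypothetical and chosen only for illustration.

```python
import math

def logit_probs(x, beta):
    """Choice probabilities of equation (1) for one agent.

    x is a list of I covariate vectors (one per alternative), beta a K-vector.
    """
    index = [sum(a * b for a, b in zip(xi, beta)) for xi in x]
    top = max(index)                      # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in index]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data: I = 3 alternatives, K = 2 covariates per alternative.
x = [[1.0, 0.5], [0.0, 1.0], [-1.0, 0.0]]
beta = [2.0, -2.0]
p = logit_probs(x, beta)
print([round(v, 4) for v in p])
```

Alternatives 2 and 3 have the same index here, so they receive the same probability.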

We offer a way around this impasse by focusing on situations in which taste variation is in some sense limited and show in Section 2 that in this case the required marginal choice probabilities are then approximately

$$g(i|x,\beta,\Omega) = \frac{\exp\Big(x_i'\beta + \sum_{s=1}^{I}\sum_{\substack{t=s \\ s,t\neq i^*}}^{I}\omega_{st}\,z_i^{st}(x,\beta)\Big)}{\sum_{j=1}^{I}\exp\Big(x_j'\beta + \sum_{s=1}^{I}\sum_{\substack{t=s \\ s,t\neq i^*}}^{I}\omega_{st}\,z_j^{st}(x,\beta)\Big)} \tag{2}$$

for arbitrary choice² of $i^*$. We call the polytomous logit model that employs these probabilities the Heterogeneity Adjusted Logit (HAL) model.

The added variables $z_i^{st}(x,\beta)$ in the HAL model choice indices, whose values vary across alternatives, have a very simple form. For all $s$ and $t$ they are zero when $i = i^*$. For $i \neq i^*$, when $s = t$, $z_i^{ss}(x,\beta)$ is $\tfrac{1}{2} - p(s|x,\beta)$ for $i = s$ and zero for $i \neq s$. When $s \neq t$, $z_i^{st}(x,\beta)$ is $-p(t|x,\beta)$ for $i = s$, $-p(s|x,\beta)$ for $i = t$ and zero when both $i \neq s$ and $i \neq t$. Each parameter $\omega_{st}$ can be interpreted as a covariance of normalized random variables capturing across individual variation in tastes for alternatives $s$ and $t$. For $s = t$ these parameters can be interpreted as taste variances. Choosing a particular value of $i^*$ in (2) is equivalent to measuring taste variation relative to taste for alternative $i^*$.
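The description of the added variables can be turned into a short Python sketch of the HAL probabilities in (2). This is only an illustration under stated assumptions: alternatives are numbered from 0, the indices `v[i]` stand for $x_i'\beta$, and the names `added_variable`, `hal_probs` and all numerical values are hypothetical.

```python
import math

def softmax(v):
    top = max(v)
    e = [math.exp(a - top) for a in v]
    total = sum(e)
    return [a / total for a in e]

def added_variable(i, s, t, p, base):
    """z_i^{st} as described in the text; p holds the logit probabilities."""
    if i == base:
        return 0.0
    if s == t:
        return 0.5 - p[s] if i == s else 0.0
    if i == s:
        return -p[t]
    if i == t:
        return -p[s]
    return 0.0

def hal_probs(v, omega, base):
    """Approximate probabilities g given indices v[i] = x_i' beta.

    omega maps pairs (s, t), s <= t, both different from base,
    to the taste (co)variance parameters omega_st.
    """
    p = softmax(v)
    aug = [v[i] + sum(w * added_variable(i, s, t, p, base)
                      for (s, t), w in omega.items())
           for i in range(len(v))]
    return softmax(aug)

v = [1.0, 0.0, -1.0]                      # hypothetical choice indices
omega = {(0, 0): 0.5, (0, 1): 0.2, (1, 1): 0.3}
g = hal_probs(v, omega, base=2)
print([round(a, 4) for a in softmax(v)])
print([round(a, 4) for a in g])
```

With an empty `omega` the function returns the conventional logit probabilities, which mirrors the nesting of the logit model within the HAL model.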

Since the added variables are non-linear functions of unknown parameters, estimation of the heterogeneity adjusted logit model appears to require use of general non-linear optimization software rather than off-the-shelf estimation tools. In fact a simple two step procedure using conventional logit model estimation software is available. This involves, at the second stage, conventional maximum likelihood logit estimation of $\beta$ and the $\omega_{st}$'s using the approximate probabilities in (2), with the added variables evaluated at the first stage maximum likelihood (ML) estimates of $\beta$ obtained by estimating a logit model using the choice probabilities given in (1).

1. This specification can include covariates which are alternative invariant characteristics of agents carrying alternative specific coefficients, if interactions of alternative specific indicators with these alternative invariant covariates are included among the elements of $x$.

2. The added variables, $z_i^{st}$, and the interpretation of the $\omega_{st}$ parameters vary with choice of $i^*$, as described shortly, but the approximate choice probabilities $g(i|x,\beta,\Omega)$ are invariant with respect to $i^*$.


1.2. Alternative approaches

There are many ways of bringing taste heterogeneity into the logit framework, all involving bringing additional stochastic terms into the choice indices. One possibility is to specify a parametric joint probability distribution for the added terms, then derive choice probabilities marginal with respect to these terms and maximize the resulting marginal likelihood function. Examples include Gonul and Srinivasan (1993) and Postorino (1993) who use multivariate normal distributions allowing dependence across alternatives, Chintagunta, Jain and Vilcassim (1991) who use normal and gamma distributions assuming independence across alternatives and Steckel and Vanhonacker (1988) who use gamma distributions (focusing on the exponential special case for most of their analysis), also assuming independence across alternatives.

McFadden and Train (2000) provide further references to examples of the use of mixed multinomial logit (MMNL) models and develop an interesting property of MMNL models, namely their ability to approximate any choice probabilities arising from any random utility maximization mechanism. Brownstone, Bunch and Train (2000) describe an interesting recent application in which the flexibility in stochastic structure afforded by MMNL models is exploited to allow information in stated and revealed preference data to be brought to bear on the estimation of common parameters.

A problem with all of these parametric approaches is that there are few multivariate distributions which allow a full range of correlation structures across alternatives. Dependence across alternatives is likely to be an important feature of taste heterogeneity in multiple discrete choice problems since alternatives may well differ in their degree of similarity to one another. An additional difficulty is that in models with many alternatives there is a potentially difficult high dimensional numerical integration to be done. Even after the recent intensive research effort in the area of simulation inference (see the survey in Hajivassiliou and Ruud (1994) and, for a recent application in discrete choice modelling, Stern (1994)) this is still not routine. So an easy to use alternative such as that presented here is likely to be of use, at least during exploratory analysis. Even if computational problems become trivial there is no guidance from economic theory concerning the distributional form to use in an exact parametric analysis.

An alternative is to take a semiparametric approach and leave the joint distribution of the varying tastes for alternatives unspecified, estimating a mass point model as described in Laird (1978) and in an econometric duration data analysis context in Heckman and Singer (1984). Examples of this approach are described in Montgomery, Richards and Braun (1986), Kamakura and Russell (1989), Agresti (1993), Follmann and Lambert (1989), Formann (1992) and Jain, Vilcassim and Chintagunta (1994). Even with fully observed data there are enormous difficulties in nonparametric estimation of multivariate densities (see the discussion in Section 4.5 of Silverman (1986)) so the prospects for this approach when there are multiple sources of heterogeneity are poor. In practice it is often found that estimated distributions have unrealistically few points of support.³ Here, by working with an approximation to the model incorporating heterogeneity, we are able to leave the joint distribution for the added taste variables unspecified yet still allow simple and fast estimation even in models with many choices. Small (1994), in a different setting, takes a similar approach to us to provide an extended version of the conventional logit model which approximates the nested logit model.

3. For example Jain et al. (1994) report a maximum of six points of support in a variety of four choice problems in which, depending on the case considered, there are from three to nine sources of heterogeneity. Clearly in some of these cases the estimated joint distributions are degenerate.

There are three main reasons for using the logit model as the basis for model construction. First, as is well known,⁴ it arises when, conditional on observed characteristics and among a population with identical tastes, utilities associated with choices are independently distributed with extreme value distributions. These distributions are attractive as a basis for stochastic modelling here because they arise as the limiting distribution of maxima. The utility attached to each alternative can be thought of as the maximum attainable utility across a set of similar alternatives. The second reason is that the logit model is so widely used as a model for discrete choices that it is of interest to model the impact of departures from this model. Our extension of the logit model nests the classical logit model in a class of models which admit taste variation. This prompts a misspecification diagnostic and, with a single additional logit estimation, produces estimates corrected for the effect of taste variation and estimates of the variances and covariances of varying tastes. The third reason is that any discrete choice model derived from random utility maximization can be approximated arbitrarily well by a mixed multinomial logit model (McFadden and Train (2000)).

An alternative to the method proposed here is to estimate a semiparametric version of the discrete choice model leaving the relationship between the choice indices and the choice probabilities unspecified (Klein and Spady (1993), Lee (1995)). However semiparametric estimation is difficult in problems with many choices and there is the disadvantage that the resulting estimates do not have a direct structural interpretation. Further, most semiparametric approaches to multiple discrete choice model estimation make a “multiple linear index” assumption. This assumption is inappropriate when there is variation in tastes for characteristics of alternatives, a situation for which we develop an approximation in Section 4.

The model proposed here is intended to be of use when each individual yields a single response. If panel data are available then a conditional likelihood solution is available⁵ (Chamberlain (1980)) and the multinomial logit form can be dispensed with too (Manski (1987), Honore and Kyriazidou (2000)).

One use of the HAL model is to give practitioners a quick and easy way to assess the potential importance of taste variation in discrete choice models by providing a simple specification test and an immediate view of the impact of taste variation on the structural parameters of interest. If the methods described here indicate that taste variation is likely to be important then further investigation using one of the approaches outlined above may be useful.

The remainder of the paper is organized as follows. In Section 2 an approximation to choice probabilities in the presence of taste variation is derived and properties of the Quasi-Maximum-Likelihood estimator implied by the associated approximate likelihood function are studied. Section 3 presents the misspecification diagnostic that arises on testing whether augmentation of the logit model to allow for the effects of taste variation is necessary. Section 4 extends the scope of the HAL model to cases in which there is variation in tastes for characteristics of alternatives as well as in simple preferences for alternatives and Section 5 concludes.

4. See McFadden (1973, 1976 and 1984).

5. The approximation based method proposed here can be extended to panel data contexts (see Chesher and Santos Silva (1995)). One might wish to pursue this (a) if coefficients on constant-within-individual covariates were of interest and (b) to provide additional information on coefficients which are estimated inaccurately by the conditional maximum likelihood estimator.

2. HETEROGENEITY

2.1. Approximate choice probabilities

The conventional polytomous logit model is extended by supposing that for each alternative, $i$, utility, $U_i$, is given by

$$U_i = x_i'\beta + \gamma_i'\eta + \varepsilon_i \tag{3}$$

where the unobserved $\eta = [\eta_j]$ varies across individuals. Here $\gamma_i'$ is the $i$-th row of $\Gamma = [\gamma_{st}] = \sigma\bar\Gamma$ where $\sigma = \operatorname{tr}(\Gamma\Gamma')^{1/2}$, and $\Gamma$ and the normalized $\bar\Gamma$ are $I \times I$ lower triangular constant matrices. The vector $\eta$ is an $I$ element random vector with mean zero and unit covariance matrix conditional on $x$. The alternative specific term $\gamma_i'\eta = \xi_i$, say, can be regarded as capturing taste variation and perhaps the net effect of omitted covariates,⁶ and the $\varepsilon_i$'s have mutually independent extreme value distributions, independent of $x = [x_1', \dots, x_I']'$ and $\xi = [\xi_i] = \Gamma\eta$. Let $\Omega = [\omega_{st}] = \Gamma\Gamma'$, which has trace $\sigma^2$, denote the covariance matrix of $\xi$ conditional on $x$. Then the choice probabilities conditional on $x$ and $\xi$ are as follows:

$$\tilde p(i|x,\eta,\beta,\Gamma) = \frac{\exp(x_i'\beta + \gamma_i'\eta)}{\sum_{j=1}^{I}\exp(x_j'\beta + \gamma_j'\eta)}. \tag{4}$$

Since $\eta$ is not observed, inference about $\beta$ is often based on choice probabilities marginal with respect to $\eta$ given $x$, that is on the expected values of these conditional choice probabilities with respect to the distribution of $\eta$ given $x$. Since a flexible parametric specification of this distribution is problematic, and a nonparametric specification leads to insurmountable estimation problems when there are many choices, we now develop an approximation to the marginal probabilities.

Assume that the conditional absolute third moments of $\eta$ given $x$ are bounded by finite valued functions of $x$. The approximate choice probabilities marginal with respect to $\eta$ are obtained by expanding $\tilde p(i|x,\eta,\beta,\Gamma)$ in (4) in a second order Taylor series in the non-zero elements of $\Gamma$ around $\Gamma = 0$, and marginalizing the resulting approximation. The derivatives of $\tilde p(i|x,\eta,\beta,\Gamma)$ at $\Gamma = 0$ are as follows:

$$\partial_{\gamma_{su}}\,\tilde p(i|x,\eta,\beta,\Gamma)\big|_{\Gamma=0} = p^{s}(i|x,\beta)\,\eta_u, \tag{5}$$

$$\partial^2_{\gamma_{su}\gamma_{tv}}\,\tilde p(i|x,\eta,\beta,\Gamma)\big|_{\Gamma=0} = p^{st}(i|x,\beta)\,\eta_u\eta_v. \tag{6}$$

In these expressions appear the first and second derivatives, with respect to the choice indices, of the choice probability, $p(i|x,\beta)$, given in (1). These derivatives are as follows:

$$p^{s}(i|x,\beta) = p(i|x,\beta)\{\delta_{is} - p(s|x,\beta)\}, \tag{7}$$

$$p^{st}(i|x,\beta) = p(i|x,\beta)\{\delta_{is}\delta_{it} - \delta_{is}\,p(t|x,\beta) - \delta_{it}\,p(s|x,\beta) - \delta_{st}\,p(s|x,\beta) + 2p(s|x,\beta)p(t|x,\beta)\}. \tag{8}$$

6. We assume that $E[\xi|x]$ and $\Omega$ are independent of $x$. If taste variation or omitted covariates are $x$-dependent then the coefficient $\beta$ in the extended model captures the sensitivity of choice probabilities arising directly from the observed covariates and from their influence on unobserved heterogeneity.

Here $\delta_{is} = 1_{[i=s]}$ is the Kronecker delta. It follows that the choice probability marginal with respect to $\eta$ given $x$ can be written

$$\bar p(i|x,\beta,\Omega) = \int\!\cdots\!\int\Big\{p(i|x,\beta) + \sum_{s}\sum_{u}\gamma_{su}\eta_u\,p^{s}(i|x,\beta) + \frac{1}{2}\sum_{s}\sum_{t}\sum_{u}\sum_{v}\gamma_{su}\eta_u\gamma_{tv}\eta_v\,p^{st}(i|x,\beta) + O(\sigma^3)\Big\}\,dF(\eta|x), \tag{9}$$

in which the identity $\tilde p(i|x,\eta,\beta,0) = p(i|x,\beta)$ and equations (5) and (6) are used, and in summations, here and later, unless noted, the range for each index is $\{1,\dots,I\}$. Upon integrating term by term, the linear terms in $\gamma_{su}$ vanish because $\eta$ has zero mean, and terms in $\gamma_{su}\gamma_{tv}$ with $u \neq v$ vanish because the elements of $\eta$ are uncorrelated. Then, noting that $\sum_{u=1}^{I}\gamma_{su}\gamma_{tu} = \omega_{st}$, we arrive at the following approximation to the choice probabilities marginal with respect to alternative specific tastes.

$$\bar p(i|x,\beta,\Omega) = p(i|x,\beta) + \frac{1}{2}\sum_{s}\sum_{t}\omega_{st}\,p^{st}(i|x,\beta) + O(\sigma^3). \tag{10}$$

The boundedness of the 3rd absolute moments of $\eta$ given $x$ is sufficient to ensure that the approximation errors are $O(\sigma^3)$. Taking the expansion in (9) one term further it can be seen that if the elements of $\eta$ are symmetrically distributed in the sense that $E[\eta_u\eta_v\eta_w|x] = 0$ for all $u, v, w$ and if the 4th moments of $\eta$ are finite, the remainder terms in (10) will be $O(\sigma^4)$.
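A small numerical check can make the quality of approximation (10) tangible. The Python sketch below is an illustration under assumptions that are not in the paper: the taste shifters are given independent components equal to plus or minus one with equal probability (mean zero, unit covariance, symmetric), so the exact marginal probability can be computed by enumeration; the indices, the lower triangular `shape` matrix and the `scale` values are hypothetical. Note that `scale` multiplies an unnormalized matrix, so it is proportional to, not equal to, the trace-based scale of the text.

```python
import itertools
import math

def softmax(v):
    top = max(v)
    e = [math.exp(a - top) for a in v]
    total = sum(e)
    return [a / total for a in e]

def second_derivs(p, i):
    """p^{st}(i) of equation (8) for all s, t."""
    n = len(p)
    d = lambda a, b: 1.0 if a == b else 0.0
    return [[p[i] * (d(i, s) * d(i, t) - d(i, s) * p[t] - d(i, t) * p[s]
                     - d(s, t) * p[s] + 2.0 * p[s] * p[t])
             for t in range(n)] for s in range(n)]

def approx_marginal(v, gamma):
    """Right-hand side of (10) without the remainder term."""
    n = len(v)
    p = softmax(v)
    omega = [[sum(gamma[s][u] * gamma[t][u] for u in range(n))
              for t in range(n)] for s in range(n)]
    out = []
    for i in range(n):
        pst = second_derivs(p, i)
        out.append(p[i] + 0.5 * sum(omega[s][t] * pst[s][t]
                                    for s in range(n) for t in range(n)))
    return out

def exact_marginal(v, gamma):
    """Average of (4) over eta with independent +/-1 components."""
    n = len(v)
    points = list(itertools.product([-1.0, 1.0], repeat=n))
    acc = [0.0] * n
    for eta in points:
        xi = [sum(gamma[i][u] * eta[u] for u in range(n)) for i in range(n)]
        q = softmax([v[i] + xi[i] for i in range(n)])
        acc = [a + b / len(points) for a, b in zip(acc, q)]
    return acc

v = [1.0, 0.0, -1.0]                                          # hypothetical indices
shape = [[1.0, 0.0, 0.0], [0.3, 0.5, 0.0], [0.0, 0.0, 0.2]]   # lower triangular
for scale in (0.4, 0.2, 0.1):
    gamma = [[scale * g for g in row] for row in shape]
    exact = exact_marginal(v, gamma)
    approx = approx_marginal(v, gamma)
    err_logit = max(abs(a - b) for a, b in zip(exact, softmax(v)))
    err_approx = max(abs(a - b) for a, b in zip(exact, approx))
    print(scale, err_logit, err_approx)
```

In this setup the error of the plain logit probabilities shrinks with the square of the scale, while the error of (10) shrinks much faster, in line with the symmetric-distribution remark above.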

This approximation shows the local effect on logit choice probabilities of uncontrolled variation in alternative specific tastes. Interestingly the approximation does not depend on the form of the distribution of the heterogeneity terms but a higher order approximation would bring into the expression third and higher order cumulants of the distribution of tastes. We propose using this approximation as the basis for estimation (e.g. by maximum likelihood) of multinomial logit models when across individual taste heterogeneity is a potential problem. The resulting estimators can be expected to be less affected by taste heterogeneity than conventional estimators which ignore heterogeneity. This is examined further in Section 2.5. However, as (10) stands it cannot be used as the basis for estimation by conventional methods because it does not define a proper probability model. The approximate probabilities do sum to one, because with $c_s = x_s'\beta$

$$\sum_{i=1}^{I} p^{st}(i|x,\beta) = \frac{\partial^2}{\partial c_s\,\partial c_t}\sum_{i=1}^{I} p(i|x,\beta) = 0, \tag{11}$$

but they do not in general lie in the unit interval. A simple adjustment produces a proper probability model, as we now show.
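The derivative formula (8) and the adding-up property (11) can be checked numerically. The following Python sketch compares (8) with central finite differences of the logit probabilities with respect to the choice indices; the index values and the step size `h` are hypothetical choices made for this illustration.

```python
import math

def softmax(c):
    top = max(c)
    e = [math.exp(a - top) for a in c]
    total = sum(e)
    return [a / total for a in e]

def p_st(p, i, s, t):
    """Second derivative of p(i) with respect to indices c_s and c_t, equation (8)."""
    d = lambda a, b: 1.0 if a == b else 0.0
    return p[i] * (d(i, s) * d(i, t) - d(i, s) * p[t] - d(i, t) * p[s]
                   - d(s, t) * p[s] + 2.0 * p[s] * p[t])

def fd_second(c, i, s, t, h=1e-4):
    """Central finite-difference estimate of the same second derivative."""
    def f(ds, dt):
        shifted = list(c)
        shifted[s] += ds
        shifted[t] += dt
        return softmax(shifted)[i]
    return (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h * h)

c = [0.3, -0.2, 1.1]          # hypothetical choice indices c_i = x_i' beta
p = softmax(c)
n = len(c)
triples = [(i, s, t) for i in range(n) for s in range(n) for t in range(n)]
max_gap = max(abs(p_st(p, i, s, t) - fd_second(c, i, s, t)) for i, s, t in triples)
max_sum = max(abs(sum(p_st(p, i, s, t) for i in range(n)))
              for s in range(n) for t in range(n))
print(max_gap, max_sum)
```

Both printed numbers should be close to zero: the first confirms (8) up to finite-difference error, the second confirms that the second derivatives sum to zero across alternatives as in (11).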

2.2. A proper approximation

To obtain a proper probability model which approximates the class of heterogeneous logit models we find functions of $x$ and $\beta$, $r_i^{st}(x,\beta)$, to augment the choice index so that the resulting approximate choice probabilities

$$g(i|x,\beta,\Omega) = \frac{\exp\big(x_i'\beta + \sum_{s}\sum_{t}\omega_{st}\,r_i^{st}\big)}{\sum_{j}\exp\big(x_j'\beta + \sum_{s}\sum_{t}\omega_{st}\,r_j^{st}\big)} \tag{12}$$

have first order Taylor series expansions in $\Omega$ around $\Omega = 0$ (equivalently, second order expansion in $\Gamma$ around $\Gamma = 0$) identical to (10) to order $O(\sigma^3)$. By construction these choice probabilities will be components of a proper probability distribution (i.e. they lie in the unit interval and sum to one) and the error, $\bar p(i|x,\beta,\Omega) - g(i|x,\beta,\Omega)$, will be of order $O(\sigma^3)$. They can therefore serve as approximations to choice probabilities in the presence of taste variation while providing a model amenable to conventional statistical analysis.

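Expanding (12) gives

$$g(i|x,\beta,\Omega) = p(i|x,\beta) + \sum_{s}\sum_{t}\omega_{st}\,p(i|x,\beta)\Big\{r_i^{st} - \sum_{j} r_j^{st}\,p(j|x,\beta)\Big\} + O(\sigma^3).$$

Choosing

$$r_i^{st}(x,\beta) = \tfrac{1}{2}\,p^{st}(i|x,\beta)/p(i|x,\beta) \tag{13}$$

and noting that

$$\sum_{j} r_j^{st}(x,\beta)\,p(j|x,\beta) = \frac{1}{2}\sum_{j} p^{st}(j|x,\beta) = 0,$$

it follows on comparing with (10) that the functions $r_i^{st}(x,\beta) = \tfrac{1}{2}\,p^{st}(i|x,\beta)/p(i|x,\beta)$ have the necessary attributes. This leads to the following, proper approximation:

$$g(i|x,\beta,\Omega) = \frac{\exp\big(x_i'\beta + \tfrac{1}{2}\sum_{s}\sum_{t}\omega_{st}\,p^{st}(i|x,\beta)/p(i|x,\beta)\big)}{\sum_{j}\exp\big(x_j'\beta + \tfrac{1}{2}\sum_{s}\sum_{t}\omega_{st}\,p^{st}(j|x,\beta)/p(j|x,\beta)\big)}. \tag{14}$$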
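To see why the adjustment matters, the following Python sketch evaluates the Taylor form (10), without its remainder, and the proper form (14) at the same parameter values. The numbers are hypothetical and deliberately extreme in the first case (a taste variance of 20 on one alternative), chosen only to show that (10) can leave the unit interval while (14) cannot.

```python
import math

def softmax(v):
    top = max(v)
    e = [math.exp(a - top) for a in v]
    total = sum(e)
    return [a / total for a in e]

def ratio_terms(p, omega):
    """sum_s sum_t omega_st * p^{st}(i) / p(i) for each i, using (8)."""
    n = len(p)
    d = lambda a, b: 1.0 if a == b else 0.0
    return [sum(omega[s][t] * (d(i, s) * d(i, t) - d(i, s) * p[t] - d(i, t) * p[s]
                               - d(s, t) * p[s] + 2.0 * p[s] * p[t])
                for s in range(n) for t in range(n))
            for i in range(n)]

def taylor_probs(v, omega):
    """Equation (10) without the remainder term."""
    p = softmax(v)
    q = ratio_terms(p, omega)
    return [p[i] * (1.0 + 0.5 * q[i]) for i in range(len(v))]

def proper_probs(v, omega):
    """Equation (14)."""
    p = softmax(v)
    q = ratio_terms(p, omega)
    return softmax([v[i] + 0.5 * q[i] for i in range(len(v))])

def one_variance(w):
    """Omega with taste variance w on the first alternative only."""
    return [[w, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

v = [math.log(6.0), 0.0, 0.0]        # logit probabilities 0.75, 0.125, 0.125
t_big, g_big = taylor_probs(v, one_variance(20.0)), proper_probs(v, one_variance(20.0))
t_small, g_small = taylor_probs(v, one_variance(0.1)), proper_probs(v, one_variance(0.1))
print(t_big, g_big)
print(t_small, g_small)
```

With the large variance the first element of `t_big` is negative although the elements still sum to one, whereas `g_big` is a valid probability vector. With the small variance the two forms are nearly indistinguishable.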

2.3. Identification and normalization

Only $I(I-1)/2$ of the $I(I+1)/2$ distinct elements of $\Omega$ can be identified. This is because choice probabilities conditional on $\xi$ and $x$ are unaffected if, for any $i^* \in \{1,\dots,I\}$, the scalar $\xi_{i^*}$ is subtracted from the index for each choice. This has the effect of bringing a row and column of zeros into the covariance matrix of the resulting added stochastic terms. In what follows we normalize by setting $\omega_{i^*j} = 0$ for all $j \in \{1,\dots,I\}$ and some $i^*$. The form of the added variables $r_i^{st}(x,\beta)$ is then very simple if their values are normalized so that they are measured relative to the values taken for the alternative $i^*$ that is used in normalizing the covariance parameters. Define

$$z_i^{st}(x,\beta) = r_i^{st}(x,\beta) - r_{i^*}^{st}(x,\beta).$$

Then for $s, t \neq i^*$, using (13)

$$z_i^{st}(x,\beta) = \tfrac{1}{2}\big(\delta_{is}\delta_{it} - \delta_{is}\,p(t|x,\beta) - \delta_{it}\,p(s|x,\beta)\big), \tag{15}$$

so that the approximate choice probabilities may be written as follows.

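$$g(i|x,\beta,\Omega) = \frac{\exp\Big(x_i'\beta + \sum_{s=1}^{I}\sum_{\substack{t=s \\ s,t\neq i^*}}^{I}\omega_{st}\,z_i^{st}(x,\beta)\Big)}{\sum_{j=1}^{I}\exp\Big(x_j'\beta + \sum_{s=1}^{I}\sum_{\substack{t=s \\ s,t\neq i^*}}^{I}\omega_{st}\,z_j^{st}(x,\beta)\Big)}. \tag{16}$$

The values taken by the normalized added variables are, from (15), as follows.

$$z_i^{st}(x,\beta) = \begin{cases} \tfrac{1}{2} - p(t|x,\beta), & i = s \cap i = t; \\ -p(t|x,\beta), & i = s \cap i \neq t; \\ -p(s|x,\beta), & i \neq s \cap i = t; \\ 0, & i \neq s \cap i \neq t. \end{cases} \tag{17}$$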
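The normalization can be checked numerically: with a symmetric covariance matrix whose row and column for the base alternative are zero, the probabilities from (14) and from (16) with the added variables in (17) should coincide, because the two augmented indices differ only by a term common to all alternatives. The Python sketch below illustrates this with hypothetical indices and covariance values, numbering alternatives from 0.

```python
import math

def softmax(v):
    top = max(v)
    e = [math.exp(a - top) for a in v]
    total = sum(e)
    return [a / total for a in e]

def g14(v, omega):
    """Equation (14): indices augmented by sum_s sum_t omega_st r_i^{st}."""
    p, n = softmax(v), len(v)
    d = lambda a, b: 1.0 if a == b else 0.0
    aug = []
    for i in range(n):
        r_sum = sum(omega[s][t] * 0.5 * (d(i, s) * d(i, t) - d(i, s) * p[t]
                                         - d(i, t) * p[s] - d(s, t) * p[s]
                                         + 2.0 * p[s] * p[t])
                    for s in range(n) for t in range(n))
        aug.append(v[i] + r_sum)
    return softmax(aug)

def z17(i, s, t, p):
    """Normalized added variables of equation (17)."""
    if i == s and i == t:
        return 0.5 - p[t]
    if i == s:
        return -p[t]
    if i == t:
        return -p[s]
    return 0.0

def g16(v, omega, base):
    """Equation (16): sum over s <= t, both different from the base alternative."""
    p, n = softmax(v), len(v)
    keep = [a for a in range(n) if a != base]
    aug = [v[i] + sum(omega[s][t] * z17(i, s, t, p)
                      for s in keep for t in keep if t >= s)
           for i in range(n)]
    return softmax(aug)

v = [0.4, -0.3, 0.9, 0.0]                 # hypothetical indices, I = 4
base = 3                                  # i* is the last alternative
omega = [[0.30, 0.10, -0.05, 0.0],
         [0.10, 0.20, 0.02, 0.0],
         [-0.05, 0.02, 0.25, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
a, b = g14(v, omega), g16(v, omega, base)
print(max(abs(x - y) for x, y in zip(a, b)))
```

The off-diagonal entries of (17) are twice those of (15) because (16) sums each pair $s < t$ once, whereas the double sum in (14) visits both $(s,t)$ and $(t,s)$.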

2.4. Discussion

The first order effect of uncontrolled taste variation can be captured by bringing additional terms into the choice indices. These are functions of the indices for all alternatives and take values which vary across alternatives. In an $I$ choice model, uncontrolled taste variation brings $I(I-1)/2$ additional terms, one for each variance and covariance of the $I-1$ taste shifters that remain once utilities are measured relative to a base alternative.

It follows that, unlike the conventional logit model for discrete choice, the heterogeneity adjusted logit model does not possess the Independence from Irrelevant Alternatives (IIA) property in the sense that the odds on choosing alternative $i$ relative to alternative $j$, $O_{i:j}(x)$, can depend on elements of $x$ other than $x_i$ and $x_j$. This is a good feature because, while a particular agent's deterministic choice between two alternatives is plausibly modelled as being independent of the presence and nature of other alternatives, the stochastic version of the IIA property which applies in the conventional logit model is far less attractive. It implies, for example, that if a particular alternative, $k$, is replicated $m$ times, then the probability that an alternative $j \neq k$ is chosen is

$$p(j|x) = \Big(m\,O_{k:j}(x) + \sum_{i \neq k} O_{i:j}(x)\Big)^{-1}$$

which passes to zero as $m$ becomes large. This seems unreasonable, because if the $m$ alternatives of type $k$ are truly identical then the probability of making a choice other than one of type $k$ should be constant as $m$ increases.
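The replication argument is easy to reproduce. The Python sketch below computes the probability of choosing alternative $j$ in a conventional logit model in two ways, from the odds formula in the text and from a choice set that literally contains $m$ copies of alternative $k$. The indices and function names are hypothetical, and alternatives are numbered from 0.

```python
import math

def softmax(v):
    top = max(v)
    e = [math.exp(a - top) for a in v]
    total = sum(e)
    return [a / total for a in e]

def prob_j_by_odds(v, j, k, m):
    """(m * O_{k:j} + sum_{i != k} O_{i:j})^{-1} with logit odds O_{i:j} = exp(v_i - v_j)."""
    odds = lambda i: math.exp(v[i] - v[j])
    return 1.0 / (m * odds(k) + sum(odds(i) for i in range(len(v)) if i != k))

def prob_j_by_replication(v, j, k, m):
    """Same probability from a logit model whose choice set holds m copies of k."""
    others = [i for i in range(len(v)) if i != k]
    expanded = [v[i] for i in others] + [v[k]] * m
    return softmax(expanded)[others.index(j)]

v = [0.5, 0.0, 1.0]      # hypothetical choice indices
j, k = 0, 2
for m in (1, 10, 100, 1000):
    print(m, prob_j_by_odds(v, j, k, m), prob_j_by_replication(v, j, k, m))
```

The two columns agree, and the probability of choosing $j$ falls towards zero as the number of identical copies of $k$ grows.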

This problem does not arise in the HAL model because it approximates models in which there is taste variation, which may be highly correlated amongst similar alternatives. McFadden, Tye and Train (1977, p. 41) remark:

“Violations [of the IIA property] may be traced to the multinomial logit model assumption that the unobserved utility component is independent across alternatives and independent of the observed attributes.”

In the HAL model the assumption of independence across alternatives is relaxed, allowing modelling of choices in which alternatives display differing degrees of similarity. Nested logit models provide an alternative approach to modelling such choices. One advantage of the HAL model is that, unlike the nested logit model, it can capture negative correlations across alternatives. Another is that it does not require a priori specification of a nesting (or dependence) structure. A third is that the HAL model can capture variance inhomogeneity, as shown in the example later.

One way to view the HAL model is as a parsimoniously parameterized member of the class of universal logit models (McFadden, Tye and Train (1977)) with the advantage over some other members of the class that its parameters have a direct structural interpretation, the elements of $\beta$ as coefficients in a conditional choice model given tastes and the elements of $\Omega$ as normalized variances and covariances of alternative specific taste shifters.

Suppose there is prior knowledge or belief that tastes for certain pairs of alternatives are uncorrelated. The implications of this for the variances and covariances of the normalized covariance matrix $\Omega$ can be deduced and the appropriate restrictions placed on the elements $\omega_{st}$ that appear in (16). This may lead to the omission of certain of the variables $z_i^{st}(x,\beta)$. In the binary logit model the only identifiable function of $\Omega$ is the variance of the difference between the taste shifters for the two alternatives and, normalizing say with respect to choice 2, the added variable takes the values $\tfrac{1}{2} - p(1|x,\beta)$ under alternative 1 and zero under alternative 2.

2.5. Estimation

We consider quasi-maximum likelihood (QML) estimators obtained using (16) as if they were exact choice probabilities. Let $L^a(\theta)$ be the log likelihood function for a logit model in which choice probabilities are given by (16) with $\theta = (\beta,\Omega)$, let $\theta^0 = (\beta^0,\Omega^0)$ be the data generating value of $\theta$, and let $\tilde\theta = (\tilde\beta,\tilde\Omega) = \arg\max L^a(\theta)$. When there is in fact no taste variation $L^a(\theta)$ is correctly specified and $\tilde\theta$ will consistently estimate $(\beta^0, 0)$. When there is uncontrolled taste variation $L^a(\theta)$ is a local approximation to the true, unknown, log likelihood function and so $\tilde\theta$ will not be a consistent estimator of $\theta^0$. However it can be expected to improve on the conventional ML estimator which ignores taste variation altogether.

Under the conditions of Theorem 6.4 in White (1994), now assumed to apply, the first order asymptotic distribution of the QML estimator is given by:

$$\sqrt{N}(\tilde\theta - \theta^a) = N(0, V(\theta^a)) + o_p(1), \tag{18}$$

where $\theta^a = \operatorname{plim}_{N\to\infty}\tilde\theta$. Because of the approximations being used, $\theta^a$ is not in general equal to $\theta^0$, the data generating value of $\theta$, but, as is shown in Appendix 1, $\theta^a = \theta^0 + O(\sigma^3)$, so that $\tilde\theta$ improves on the ML estimator in the conventional logit model which generally has asymptotic bias of larger order, $O(\sigma^2)$.

The asymptotic covariance matrix of the QML estimator in (18) has the standard “sandwich” form

$$V(\theta^a) = \Big[\operatorname*{plim}_{N\to\infty} N^{-1}L^a_{\theta\theta'}(\theta^a)\Big]^{-1}\Big[\operatorname*{plim}_{N\to\infty} N^{-1}L^a_{\theta}(\theta^a)L^a_{\theta'}(\theta^a)\Big]\Big[\operatorname*{plim}_{N\to\infty} N^{-1}L^a_{\theta\theta'}(\theta^a)\Big]^{-1}, \tag{19}$$

with probabilities induced by the exact (not the approximate) choice probabilities. These probabilities are unknown without a complete specification of the distribution of tastes but $V(\theta^a)$ can be consistently estimated by replacing probability limits by sample averages of log likelihood derivative contributions evaluated at the QML estimator. We show in Appendix 2 that

$$V(\theta^a) = I^a(\theta^a)^{-1} + O(\sigma^3), \tag{20}$$

where $I^a(\theta^a)$ is the information matrix of the HAL model regarded as an exact model for choice probabilities. This means that standard software packages will produce approximately correct standard errors when the HAL model is estimated. However, if a substantial amount of taste variation is suspected, use of the “sandwich” estimator is recommended.

The additional non-linearity involving $\beta$ caused by the introduction of the functions $z_i^{st}(x,\beta)$ in the choice probabilities (16) at first sight prohibits the use of conventional logit estimation software when estimating the HAL model. However, as we show in Appendix 3, estimators with the same $O(\sigma^3)$ inconsistency as $\tilde\theta$ can be obtained using a two step procedure involving maximum likelihood estimation of a conventional logit model at each step. At the first step the logit model neglecting taste variation is estimated by maximum likelihood giving the estimate $\hat\beta$. At the second step a logit model with probabilities given in (16) is estimated with the added variables $z_i^{st}(x,\hat\beta)$ calculated using the first stage estimates of $\beta$. Only the coefficients on the original and added variables are estimated at the second stage, the values that the added variables take being fixed during the second stage estimation.⁷
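The two step procedure can be sketched in plain Python. Everything specific in this example is an assumption made for illustration: the simulated data (three alternatives, one covariate, true coefficient 1, normally distributed taste variation on the first alternative only), the restriction to a single added variance, and the small Newton routine `fit_logit`, which stands in for the conventional logit software mentioned in the text. Second step standard errors, which would need the adjustment noted in the paper, are not computed.

```python
import math
import random

def probs(W, theta):
    """Logit probabilities for one agent; W[i] holds the regressors of alternative i."""
    v = [sum(w * th for w, th in zip(row, theta)) for row in W]
    top = max(v)
    e = [math.exp(a - top) for a in v]
    total = sum(e)
    return [a / total for a in e]

def loglik(data, theta):
    return sum(math.log(probs(W, theta)[y]) for W, y in data)

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logit(data, start, iters=25):
    """Conventional logit ML: Newton steps, halved until the log likelihood does not fall."""
    theta, P = list(start), len(start)
    for _ in range(iters):
        grad = [0.0] * P
        info = [[0.0] * P for _ in range(P)]
        for W, y in data:
            p = probs(W, theta)
            mean = [sum(p[i] * W[i][a] for i in range(len(W))) for a in range(P)]
            for a in range(P):
                grad[a] += W[y][a] - mean[a]
                for b in range(P):
                    info[a][b] += (sum(p[i] * W[i][a] * W[i][b] for i in range(len(W)))
                                   - mean[a] * mean[b])
        step, current, scale = solve(info, grad), loglik(data, theta), 1.0
        while scale > 1e-8:
            trial = [th + scale * s for th, s in zip(theta, step)]
            if loglik(data, trial) >= current:
                theta = trial
                break
            scale /= 2.0
    return theta

random.seed(0)
sample = []
for _ in range(400):                                   # hypothetical simulated agents
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    taste = [random.gauss(0.0, 1.0), 0.0, 0.0]         # taste variation on alternative 0 only
    q = probs([[x[i], taste[i]] for i in range(3)], [1.0, 1.0])
    u = random.random()
    sample.append((x, 0 if u < q[0] else (1 if u < q[0] + q[1] else 2)))

# Step 1: conventional logit ignoring taste variation.
step1 = [([[xv] for xv in x], y) for x, y in sample]
beta_hat = fit_logit(step1, [0.0])

# Step 2: add z_i^{00} from (17), evaluated at beta_hat and then held fixed.
step2 = []
for x, y in sample:
    p = probs([[xv] for xv in x], beta_hat)
    z = [0.5 - p[0], 0.0, 0.0]                         # base alternative is the last one
    step2.append(([[x[i], z[i]] for i in range(3)], y))
beta_tilde, omega_00 = fit_logit(step2, beta_hat + [0.0])
print(beta_hat, beta_tilde, omega_00)
```

Starting the second step at the first step estimate with a zero coefficient on the added variable means the second step log likelihood can never end below the first step value, which is the sense in which the conventional logit model is nested.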

3. A TEST TO DETECT TASTE VARIATION

The HAL model reduces to the conventional logit model when $\Omega = 0$. So, a test of the hypothesis $H_0: \Omega = 0$ provides a potentially useful misspecification diagnostic sensitive to heterogeneity not captured in the simple logit model for multiple discrete choices. Since the HAL model provides a local approximation to a wide class of heterogeneous logit models, a score test developed from the HAL model will be identical to score tests developed from each member of this class (Davidson and MacKinnon (1987)), possessing the usual large sample optimality properties of such tests. Let $x_{in}$ and $x_n$ denote respectively the values of the covariates $x_i$ and $x = [x_1, \dots, x_I]$ at realization $n$.

The log likelihood function for choice indicators $Y_{in}$ and the model with choice probabilities defined by (16) is

$$L_N(\beta,\Omega) = \sum_{n=1}^{N}\sum_{i=1}^{I} Y_{in}\Big(x_{in}'\beta + \sum_{s=1}^{I}\sum_{\substack{t=s \\ s,t\neq i^*}}^{I}\omega_{st}\,z_i^{st}(x_n,\beta)\Big) - \sum_{n=1}^{N}\ln\Big(\sum_{i=1}^{I}\exp\Big(x_{in}'\beta + \sum_{s=1}^{I}\sum_{\substack{t=s \\ s,t\neq i^*}}^{I}\omega_{st}\,z_i^{st}(x_n,\beta)\Big)\Big)$$

from which the average score for $\omega_{st}$ at $\Omega = 0$ is

$$S_N(\omega_{st},\beta) = \frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{I}\big(Y_{in} - p(i|x_n,\beta)\big)\,z_i^{st}(x_n,\beta). \tag{21}$$

This leads to an omitted variable score test in which the omitted variables involve unknown parameters, evaluated at the conventional logit model estimates when the test is computed. Let $z^*_{in}$ contain the added variables corresponding to the variances and covariances chosen as the subject of the test. The average information matrix associated with the appropriately reduced HAL model evaluated at $\Omega = 0$ is

$$I^*(\beta) = \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{I}\sum_{j=1}^{I}\big(\delta_{ij}\,p(i|x_n,\beta) - p(i|x_n,\beta)\,p(j|x_n,\beta)\big)\begin{bmatrix} x_{in}x_{jn}' & x_{in}z_{jn}^{*\prime} \\ z^*_{in}x_{jn}' & z^*_{in}z_{jn}^{*\prime} \end{bmatrix}$$

from which it is clear that the variance matrix of the score test is unaffected⁸ by using $\hat\beta$ in place of $\beta$ in calculating $z^*_{in}$.

7. Estimated standard errors reported at the second step by conventional MNL software need to be adjusted to allow for the use of $\hat\beta$ in constructing the second stage regressors $z_i^{st}(x,\hat\beta)$ as described in Section 6.2 of Newey and McFadden (1994).

Writing $z_i^{st}(x_n,\beta)$ in (21) in terms of choice probabilities using (17) and manipulating the result, the average score for $\omega_{st}$ can be written as follows:

$$S_N(\omega_{st},\beta) = \frac{1}{N}\sum_{n=1}^{N}\big(Y_{sn} - p(s|x_n,\beta)\big)\big(Y_{tn} - p(t|x_n,\beta)\big) - \frac{1}{N}\sum_{n=1}^{N}\big(\delta_{st}\,p(s|x_n,\beta) - p(s|x_n,\beta)\,p(t|x_n,\beta)\big). \tag{22}$$

So, the score test for $H_0: \Omega = 0$ can be viewed either as measuring the correlation between residuals $Y_{in} - p(i|x_n,\beta)$ and the $I(I-1)/2$ candidate omitted variables $z_i^{st}(x_n,\beta)$ (equation (21)) or as measuring the difference between the sample covariances of these residuals and their covariances evaluated under the null (equation (22)). In the latter form it is clear that the score test statistic is a member of the class of Information Matrix test statistics (White (1982)). This accords with the interpretation of the Information Matrix test as a test to detect the presence of neglected heterogeneity (Chesher (1984)). A special case of this test is studied by McFadden (1987) and a related test appears in McFadden and Train (2000).
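The two readings of the score can be compared numerically. The Python sketch below computes the average score in the residual-times-added-variable form of (21) and in the covariance-difference form of (22) for a handful of hypothetical fitted probabilities and observed choices (alternatives numbered from 0). In this sketch the two forms coincide for $s \neq t$; for $s = t$ they agree up to a factor of two, which rescales the score without changing a test based on it. That factor is an observation about the conventions used in this sketch rather than a statement taken from the paper.

```python
def z17(i, s, t, p):
    """Normalized added variables of equation (17)."""
    if i == s and i == t:
        return 0.5 - p[t]
    if i == s:
        return -p[t]
    if i == t:
        return -p[s]
    return 0.0

def score_21(Y, P, s, t):
    """Average over n of sum_i (Y_in - p_in) z_i^{st}, as in (21)."""
    total = 0.0
    for y, p in zip(Y, P):
        total += sum(((1.0 if y == i else 0.0) - p[i]) * z17(i, s, t, p)
                     for i in range(len(p)))
    return total / len(Y)

def score_22(Y, P, s, t):
    """Sample covariance of residuals minus its value under the null, as in (22)."""
    total = 0.0
    for y, p in zip(Y, P):
        rs = (1.0 if y == s else 0.0) - p[s]
        rt = (1.0 if y == t else 0.0) - p[t]
        null_cov = (p[s] if s == t else 0.0) - p[s] * p[t]
        total += rs * rt - null_cov
    return total / len(Y)

# Hypothetical fitted logit probabilities and observed choices for N = 4 agents, I = 3.
P = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.1, 0.7, 0.2], [0.4, 0.4, 0.2]]
Y = [0, 2, 1, 1]
print(score_21(Y, P, 0, 1), score_22(Y, P, 0, 1))
print(score_21(Y, P, 0, 0), score_22(Y, P, 0, 0))
```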

3.1. Monte Carlo results

Design. This section reports the results of a simulation study comparing the performance of the HAL test and two other types of test designed to detect departures from the IIA property. A three choice model is employed to generate the data for the study with two covariates varying across alternatives, no alternative specific intercepts, and choice probabilities specified as in (4) with $\beta' = [2, -2]$ and with $\xi = \Gamma\eta \sim N[0,\Omega]$. To allow easy manipulation of the experimental conditions the matrix $\Omega$ (prior to normalization) is written as a function of three scalars, $r$, $h$ and $M$, as follows:

$$\Omega = \frac{M}{1-M}\begin{bmatrix} 1+h & r\sqrt{1+h} & 0 \\ r\sqrt{1+h} & 1 & 0 \\ 0 & 0 & 1-h \end{bmatrix}.$$

Here $r$ is the correlation between tastes for alternatives 1 and 2 which are both independent of tastes for alternative 3. The extent to which $h$ differs from zero measures the inhomogeneity in variances of tastes across alternatives, and $M$ controls the proportion of noise in the model that is due to taste variation, i.e. that comes from the taste shifters. A 100 point covariate design was used in the experiments, replicated an appropriate number of times, design points being obtained as independent realizations of $N(0, \sigma^2)$ variates with $\sigma = \pi/\sqrt{6(1-M)}$, which forces the proportion of variation due to covariate variation to be constant as $M$ is varied. When $M = 0$ the classical logit specification is correct. There are 5000 replications in each experiment and samples of size 1000 and 2000 are used. Covariate values were kept fixed across all replications. We study size and power of the tests, reporting percentages of rejections of the simple MNL specification when first order asymptotic approximate 5% critical values are used.

8. The terms which arise in the information matrix when $\Omega \neq 0$ due to the appearance of $\hat\beta$ in $z^{*}_{in}$ vanish when $\Omega$ is set to zero.
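The design can be sketched in code as follows. This is a simplified illustration under stated assumptions, not the authors' simulation program: taste shifters are drawn as normal variates, covariates are drawn afresh for every observation rather than from a replicated 100 point design, the normalization of the covariance matrix is omitted, and the names `taste_cov` and `simulate_choices` are hypothetical.

```python
import numpy as np

def taste_cov(M, h, r):
    """Covariance matrix of the three taste shifters as a function of M, h and r."""
    base = np.array([
        [1.0 + h, r * np.sqrt(1.0 + h), 0.0],
        [r * np.sqrt(1.0 + h), 1.0, 0.0],
        [0.0, 0.0, 1.0 - h],
    ])
    return M / (1.0 - M) * base

def simulate_choices(N, M, h, r, beta=(2.0, -2.0), seed=0):
    """Draw covariates, taste shifters and extreme value noise; return (X, choices)."""
    rng = np.random.default_rng(seed)
    sd_x = np.pi / np.sqrt(6.0 * (1.0 - M))            # covariate standard deviation
    X = rng.normal(0.0, sd_x, size=(N, 3, 2))          # two covariates per alternative
    shifters = rng.multivariate_normal(np.zeros(3), taste_cov(M, h, r), size=N)
    noise = rng.gumbel(size=(N, 3))                    # logit (extreme value) component
    utility = X @ np.asarray(beta) + shifters + noise
    return X, utility.argmax(axis=1)

X, y = simulate_choices(N=1000, M=1/3, h=2/3, r=0.5)
print(np.bincount(y, minlength=3) / len(y))            # empirical choice shares
```

Setting `M=0` removes the taste shifters, so the generated data then follow the classical logit specification.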

Tests. Three types of test are studied: (a) the HAL test described above, (b) Hausman and McFadden (1984) choice set partition tests and (c) tests directed against nested logit model alternatives.

Each test can be computed as an efficient score test for omission of constructed variables, $w_i$, from the indices for choices $i \in \{1, \dots, I\}$ (McFadden (1987)). Specifically, the tests are computed as $N(I-1)R^2$ where $R^2$ is the unadjusted squared multiple correlation coefficient from OLS estimation of a single auxiliary “regression”. In this “regression” the $NI$ values taken by the “dependent variable” are

$$\frac{Y_{in} - p(i|x_n;\hat\beta)}{\sqrt{p(i|x_n;\hat\beta)}}, \qquad n \in \{1, \dots, N\},\; i \in \{1, \dots, I\}.$$

The values of the “explanatory variables” for a particular pair $(n, i)$ are

$$(x_{in} - \bar x_n)\sqrt{p(i|x_n;\hat\beta)} \quad \text{and} \quad (w_{in} - \bar w_n)\sqrt{p(i|x_n;\hat\beta)},$$

where

$$\bar x_n = \sum_i x_{in}\, p(i|x_n;\hat\beta), \qquad \bar w_n = \sum_i w_{in}\, p(i|x_n;\hat\beta),$$

and $\hat\beta$ is the ML estimator of the coefficients in the standard MNL model (1).

In the HAL test the constructed “omitted variables” $w^{HAL}_i$ are those defined in equation (17). In the three choice case considered here the HAL test statistic is distributed as $\chi^2(3) + O_p(N^{-1})$ when the heterogeneity free MNL model is a correct specification.

In the three choice case there are three possible Hausman and McFadden tests, one for each possible choice set partition. We denote these by HM$j$ where $j$ indicates the excluded alternative. The constructed “omitted variables” for HM$j$ are as follows (see McFadden (1987), p. 71):

$$w^{HMj}_{in} = \begin{cases} 0 & i = j, \\ x_{in} - \dfrac{\sum_{s \neq j} x_{sn}\, p(s|x_n;\beta)}{\sum_{t \neq j} p(t|x_n;\beta)} & i \neq j. \end{cases}$$

When the standard MNL model (1) is the correct specification these HM$j$ statistics are distributed as $\chi^2(2) + O_p(N^{-1})$. We also consider a portmanteau test (McFadden (1987)) which tests simultaneously for omission of constructed regressors generated by each of the three choice set partitions. This statistic, denoted by HMA, is here approximately $\chi^2(6)$ when the MNL model is the correct specification.

The nested logit alternative can be specified in three ways in the three choice case, depending on which pair of alternatives is determined in a sub-choice. For the three tests, denoted NL$j$ where $j$ indicates the top-level choice for which there are no sub-choices, the constructed “omitted variables” are as follows:$^{9}$

$$w^{NLj}_{in} = \begin{cases} 0 & i = j, \\ -\ln\left(\dfrac{p(i|x_n;\beta)}{\sum_{s \neq j} p(s|x_n;\beta)}\right) & i \neq j. \end{cases}$$

When the MNL model is the correct specification, these tests are distributed as $\chi^2(1) + O_p(N^{-1})$. Here too there is available a portmanteau test obtained by including as potential omitted variables the variables used for each test NL$i$. This test statistic, denoted NLA, is approximately $\chi^2(3)$ when the MNL model is correct.
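The regression-based computation described above can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the code used for the study: the unadjusted $R^2$ is taken to be the uncentred one, the minus sign in the nested logit variable follows the form given above, the probabilities would in practice be evaluated at the ML estimate, and the names `aux_regression_stat`, `hm_variables` and `nl_variables` are hypothetical.

```python
import numpy as np

def mnl_probs(X, beta):
    """Logit choice probabilities; X has shape (N, I, K)."""
    v = X @ beta
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def hm_variables(X, P, j):
    """Choice set partition variables: zero for alternative j, deviation of x_i from
    its probability-weighted mean over the remaining alternatives otherwise."""
    keep = np.arange(P.shape[1]) != j
    Pk = P[:, keep]
    xbar = (X[:, keep, :] * Pk[:, :, None]).sum(axis=1) / Pk.sum(axis=1, keepdims=True)
    W = X - xbar[:, None, :]
    W[:, j, :] = 0.0
    return W

def nl_variables(P, j):
    """Nested logit variables: zero for alternative j, minus the log of the
    probability of i conditional on not choosing j otherwise."""
    keep = np.arange(P.shape[1]) != j
    W = -np.log(P / P[:, keep].sum(axis=1, keepdims=True))
    W[:, j] = 0.0
    return W[:, :, None]

def aux_regression_stat(Y, P, X, W):
    """N(I-1)R^2 from the auxiliary regression, with R^2 uncentred (an assumption)."""
    N, I = Y.shape
    root_p = np.sqrt(P)
    dep = ((Y - P) / root_p).ravel()
    Z = np.concatenate([X, W], axis=2)
    Zbar = (Z * P[:, :, None]).sum(axis=1, keepdims=True)   # probability-weighted means
    regs = ((Z - Zbar) * root_p[:, :, None]).reshape(N * I, -1)
    coef, *_ = np.linalg.lstsq(regs, dep, rcond=None)
    fitted = regs @ coef
    return N * (I - 1) * (fitted @ fitted) / (dep @ dep)

# Illustrative use on data from a correctly specified logit model (true beta used for brevity)
rng = np.random.default_rng(1)
N, I, K = 1000, 3, 2
X = rng.normal(size=(N, I, K))
beta = np.array([2.0, -2.0])
P = mnl_probs(X, beta)
Y = np.eye(I)[(X @ beta + rng.gumbel(size=(N, I))).argmax(axis=1)]
print(aux_regression_stat(Y, P, X, hm_variables(X, P, j=0)))
print(aux_regression_stat(Y, P, X, nl_variables(P, j=2)))
```

A portmanteau statistic is obtained in the same way by concatenating the constructed variables for all three partitions (or nestings) along the last axis before calling `aux_regression_stat`.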

Size and Power. Table 1 shows Monte Carlo estimates of the exact size of the tests when first order approximate 5% critical values are used.$^{10}$ For these experiments data were generated with $M = 0$. At these sample sizes the first order approximate critical values seem accurate for all of the tests and we proceed without introducing any finite sample size correction.

TABLE 1

Size: Percentage of rejections of correctly specified logit model using first order approximate 5% critical values

| N | HMA | HM1 | HM2 | HM3 | NLA | NL1 | NL2 | NL3 | HAL |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 5.3 | 5.4 | 5.1 | 4.7 | 5.0 | 5.0 | 4.9 | 5.0 | 5.0 |
| 2000 | 5.1 | 4.2 | 5.4 | 4.4 | 4.7 | 4.2 | 5.0 | 4.6 | 4.8 |

Table 2 shows Monte Carlo estimates of the power of the tests when a heterogeneous logit model generates the data. All combinations of $M \in \{1/3, 2/3\}$, $h \in \{0, 2/3\}$ and $r \in \{-0.5, 0, 0.5\}$ are considered. A notable feature of the results is that the HM$j$ and NL$j$ tests' performance is sensitive to the way in which the test is formulated. The portmanteau HM and NL tests perform quite well but are usually dominated by other tests within their classes. In only six cases is the HAL test dominated by any of the other eight tests considered and then usually by only a small margin.

When $h = r = 0$ only the HAL test has power significantly exceeding size and this only by a small amount. This is perhaps not surprising since in this case only the extreme value assumption is incorrect, $\varepsilon_i$ in (3) still being identically and independently distributed across choices. Maintaining independence but introducing variance inhomogeneity ($r = 0$, $h = 2/3$) leads to increased power and again the HAL test dominates though NL1 now comes a close second. Introducing correlation across choices ($r \neq 0$) the HAL test still usually dominates; however, for $r = 0.5$ and $h = 0$, which introduces positive correlation between utilities for choices 1 and 2 while maintaining variance homogeneity, the test NL3 does a little better, as might be anticipated since this test is designed for situations in which choices 1 and 2 form a sub-choice the result of which is compared with choice 3.

9. See McFadden (1987), p. 74.

10. All computations in this and in the following section were performed using TSP 4.5, Hall and Cummins (1999).

TABLE 2

Powers: Percentage of rejections of misspecified logit model using first order approximate 5% critical values

| M | h | r | N | HMA | HM1 | HM2 | HM3 | NLA | NL1 | NL2 | NL3 | HAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1/3 | 0 | −0.5 | 1000 | 6.5 | 6.3 | 5.8 | 8.9 | 7.9 | 6.3 | 6.0 | 9.3 | 11.9 |
| 1/3 | 0 | −0.5 | 2000 | 9.5 | 6.7 | 7.1 | 11.9 | 12.9 | 8.1 | 7.5 | 13.8 | 20.4 |
| 1/3 | 0 | 0 | 1000 | 4.3 | 4.5 | 4.6 | 5.1 | 4.1 | 4.2 | 4.3 | 4.5 | 8.5 |
| 1/3 | 0 | 0 | 2000 | 4.4 | 4.3 | 4.8 | 5.1 | 5.0 | 3.9 | 4.5 | 4.8 | 11.7 |
| 1/3 | 0 | 0.5 | 1000 | 6.2 | 5.7 | 5.8 | 9.5 | 7.9 | 5.9 | 6.3 | 12.6 | 11.7 |
| 1/3 | 0 | 0.5 | 2000 | 9.8 | 6.0 | 6.8 | 6.8 | 14.8 | 6.6 | 8.6 | 24.4 | 21.3 |
| 1/3 | 2/3 | −0.5 | 1000 | 18.5 | 16.0 | 6.6 | 26.3 | 27.5 | 24.0 | 6.0 | 35.0 | 32.3 |
| 1/3 | 2/3 | −0.5 | 2000 | 39.1 | 32.3 | 8.7 | 45.6 | 54.8 | 45.1 | 8.5 | 58.8 | 61.5 |
| 1/3 | 2/3 | 0 | 1000 | 7.2 | 8.8 | 4.9 | 9.3 | 8.7 | 12.2 | 3.8 | 10.4 | 13.3 |
| 1/3 | 2/3 | 0 | 2000 | 11.0 | 14.6 | 5.4 | 12.0 | 16.7 | 21.4 | 4.2 | 13.9 | 23.7 |
| 1/3 | 2/3 | 0.5 | 1000 | 5.1 | 5.0 | 7.0 | 5.6 | 5.6 | 5.6 | 7.6 | 5.4 | 9.4 |
| 1/3 | 2/3 | 0.5 | 2000 | 6.6 | 5.5 | 9.5 | 6.8 | 8.7 | 6.5 | 11.4 | 7.4 | 13.9 |
| 2/3 | 0 | −0.5 | 1000 | 14.8 | 8.7 | 9.1 | 19.3 | 20.9 | 11.5 | 10.3 | 23.9 | 31.1 |
| 2/3 | 0 | −0.5 | 2000 | 32.0 | 15.2 | 15.5 | 32.2 | 45.9 | 19.9 | 17.4 | 41.9 | 61.3 |
| 2/3 | 0 | 0 | 1000 | 4.3 | 4.6 | 4.6 | 5.1 | 4.7 | 4.1 | 4.1 | 4.3 | 13.2 |
| 2/3 | 0 | 0 | 2000 | 6.7 | 4.6 | 5.3 | 5.5 | 9.3 | 4.2 | 4.0 | 5.2 | 28.4 |
| 2/3 | 0 | 0.5 | 1000 | 16.7 | 8.0 | 9.0 | 33.4 | 28.5 | 9.1 | 12.1 | 47.3 | 38.9 |
| 2/3 | 0 | 0.5 | 2000 | 43.4 | 10.7 | 14.8 | 66.8 | 63.7 | 14.1 | 22.4 | 79.5 | 73.4 |
| 2/3 | 2/3 | −0.5 | 1000 | 69.3 | 58.6 | 11.3 | 74.0 | 84.4 | 73.5 | 12.4 | 85.7 | 86.9 |
| 2/3 | 2/3 | −0.5 | 2000 | 96.8 | 92.2 | 21.0 | 96.0 | 99.4 | 96.8 | 22.2 | 98.9 | 99.3 |
| 2/3 | 2/3 | 0 | 1000 | 21.5 | 27.2 | 5.7 | 21.4 | 31.7 | 39.2 | 3.7 | 28.0 | 42.7 |
| 2/3 | 2/3 | 0 | 2000 | 47.4 | 56.5 | 7.2 | 35.9 | 66.0 | 70.2 | 3.9 | 48.0 | 75.6 |
| 2/3 | 2/3 | 0.5 | 1000 | 8.7 | 7.1 | 14.1 | 7.7 | 14.1 | 9.8 | 18.7 | 9.0 | 23.6 |
| 2/3 | 2/3 | 0.5 | 2000 | 21.2 | 12.6 | 25.4 | 13.0 | 32.8 | 17.7 | 36.2 | 16.8 | 49.6 |

This experiment suggests that in typical microeconometric sample sizes the HAL test has reasonable power and usually performs well by comparison with the other statistics considered here. As the number of choices increases the HM and NL tests quickly become infeasible if the full range of choice set partition and nesting is to be considered. In this situation it seems likely that the HAL test provides a useful portmanteau test, the associated HAL model providing a structural interpretation to departures from the conventional MNL specification.

4. VARIATION IN TASTES FOR CHARACTERISTICS OF ALTERNATIVES AND AN APPLICATION

So far we have accommodated variation in tastes for alternatives, equivalently, random variation across individuals in coefficients on alternative specific indicators in the choice indices. Variation in tastes for characteristics of alternatives$^{11}$ is equivalent to random variation across alternatives and individuals in coefficients on covariates. In some applications this is a useful extension of the conventional logit model. For example in a travel mode choice problem it allows for variation across individuals in the value of time. The resulting model is heteroskedastic in the sense that the extent of random variation around the choice indices depends upon covariate values. In the application set out in Section 4.2 we re-analyse a data set in which such heteroskedastic variation has been found.

11. This is the response heterogeneity discussed by Jain, Vilcassim and Chintagunta (1994).

4.1. HAL model choice probabilities

Variation in tastes for characteristics of alternatives is captured by writing the choice probabilities as

$$p(i|x, \upsilon; \beta) = \frac{\exp(x_i'\beta + x_i'\upsilon_i)}{\sum_j \exp(x_j'\beta + x_j'\upsilon_j)}, \qquad (23)$$

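where each $\upsilon_i$, $i = 1, \dots, I$, is now a $K$ element vector with $E[\upsilon_i|x] = 0$ and $\operatorname{cov}[\upsilon_i, \upsilon_j|x] = \Omega_{ij}$. Let $\Omega$ denote the set of matrices $\{\Omega_{st}\}_{s,t=1,\dots,I}$. Proceeding as in Section 2 leads to the approximate choice probabilities

$$\bar p(i|x; \beta, \Omega) = p(i|x;\beta) + \frac{1}{2} \sum_{s} \sum_{t} x_s'\Omega_{st}x_t\, p^{st}(i|x;\beta) + o(\sigma_{st}),$$

where $\sigma_{st} = \operatorname{tr}(\Omega_{st})$. Conversion to a proper probability model is done as before, leading to the HAL model choice probabilities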

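$$g(i|x; \beta, \Omega) = \frac{\exp\Bigl(x_i'\beta + \sum_{s=1}^{I} \sum_{\substack{t=s \\ s,t \neq i^{*}}}^{I} x_s'\Omega_{st}x_t\, z^{st}_i(x;\beta)\Bigr)}{\sum_{j=1}^{I} \exp\Bigl(x_j'\beta + \sum_{s=1}^{I} \sum_{\substack{t=s \\ s,t \neq i^{*}}}^{I} x_s'\Omega_{st}x_t\, z^{st}_j(x;\beta)\Bigr)}, \qquad (24)$$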

in which the variables $z^{st}_i(x;\beta)$ are exactly as in (17) in Section 2.2. Note that the conventional multiple linear index restriction does not apply here.

The model in Section 2, in which $\upsilon_i$ is a scalar, is a special case of the analysis of this section obtained by regarding $\upsilon_i$ in (23) as composed of zeros except for a single element corresponding to a unit element in each $x_i$. This is equivalent to setting each $K \times K$ matrix $\Omega_{st}$ to zero except for a single diagonal element with value $\omega_{st}$ corresponding to an intercept term in each vector $x_i$. Thus equation (24) is identical to equation (16) in Section 2.1 with $\omega_{st}$ replaced by $x_s'\Omega_{st}x_t$.

In some common specifications not all the normalized covariances can be identified. For example suppose there are alternative specific coefficients allowing characteristics of individuals (e.g. household income, age and family size), constant across alternatives, to appear in choice indices. Since the covariates ($x_{s\kappa}$ and $x_{t\kappa}$, say, in the indexes for choices $s$ and $t$) associated with these coefficients have to appear as interactions with alternative specific indicator variables if they are not to cancel out from the numerator and denominator of (1), cross-products $x_{s\kappa}x_{t\kappa}$ are zero for $s \neq t$, leading to the disappearance of the $(\kappa, \kappa)$ terms in each quadratic form in (24) involving $\Omega_{st}$ for $s \neq t$.
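To see what the exact probabilities behind (23) look like, they can be approximated by simulation. The sketch below is only an illustration: it assumes normally distributed taste shifters, which the HAL model itself does not require, it stacks the $I$ shifter vectors into one vector with covariance matrix `omega` of dimension $IK \times IK$ (the blocks of which play the role of the matrices $\Omega_{st}$), and the function name `mixed_probs` is hypothetical.

```python
import numpy as np

def mixed_probs(x, beta, omega, draws=20000, seed=0):
    """Monte Carlo average of the logit probabilities in (23) over normal taste shifters.

    x     : (I, K) covariates for one individual
    beta  : (K,) coefficients
    omega : (I*K, I*K) covariance of the stacked shifter vectors (normality is assumed)
    """
    I, K = x.shape
    rng = np.random.default_rng(seed)
    u = rng.multivariate_normal(np.zeros(I * K), omega, size=draws).reshape(draws, I, K)
    # Index for alternative i in each draw: x_i'beta + x_i'u_i
    index = x @ beta + np.einsum("ik,dik->di", x, u)
    index = index - index.max(axis=1, keepdims=True)
    e = np.exp(index)
    return (e / e.sum(axis=1, keepdims=True)).mean(axis=0)

x = np.array([[1.0, 0.5], [0.2, -0.3], [0.0, 1.0]])
beta = np.array([2.0, -2.0])
print(mixed_probs(x, beta, np.zeros((6, 6)), draws=10))   # no taste variation: plain logit
print(mixed_probs(x, beta, 0.5 * np.eye(6)))               # independent shifters, variance 0.5
```

In this setup the variance of the random part of index $i$ is $x_i'\Omega_{ii}x_i$, which changes with the covariates; this is the sense in which the model is heteroskedastic.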

4.2. Application

We study the data$^{12}$ used by Dubin and Zeng (1991) and Dubin (1998) which record choice among three types of still camera: (1) 35 mm, (2) instant, and (3) other formats, and covariates defined in Table 3. We write the utility of owning a camera of type $i$ as

$$U_i = V_i + \varepsilon_i = \beta_{0i} + \beta_{1i}\,INC + \beta_{2i}\,C + \beta_{3i}\,ED + \beta_{4i}\,M + \beta_{5i}\,HSIZE + \beta_{6}\,NCAM_i + \varepsilon_i,$$

which differs slightly from the specification of Dubin and Zeng, who do not include the variable HSIZE in the indices. They found evidence of heteroskedasticity and specified $\operatorname{var}(\varepsilon_i) \propto e^{(-2\theta\,HSIZE)}$, leading to a logit model with the following choice probabilities:

$$p(i) = \frac{\exp\bigl(V_i\, e^{(\theta\,HSIZE)}\bigr)}{\sum_j \exp\bigl(V_j\, e^{(\theta\,HSIZE)}\bigr)}. \qquad (25)$$

This permits heteroskedasticity across households, but within households and across alternatives it restricts utilities to be uncorrelated with equal variances. The HAL model used here is specified so as to introduce heteroskedasticity driven by the covariate HSIZE by defining each matrix $\Omega_{st}$ in (24) as diagonal with just one non-zero element, $\omega_{st}$, associated with the HSIZE variable in the covariate vector.
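The effect of the scale term in (25) can be illustrated with a short sketch. The indices used here are made-up numbers chosen for illustration, `theta` is a label adopted in this sketch for the coefficient on HSIZE in the scale term, and `het_logit_probs` is a hypothetical helper name.

```python
import numpy as np

def het_logit_probs(V, hsize, theta):
    """Choice probabilities of the form (25): indices V (N, I) scaled by exp(theta * HSIZE)."""
    scaled = V * np.exp(theta * hsize)[:, None]
    scaled = scaled - scaled.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(scaled)
    return e / e.sum(axis=1, keepdims=True)

# Two hypothetical households with identical indices but different household sizes
V = np.array([[1.0, 0.0, -1.0],
              [1.0, 0.0, -1.0]])
hsize = np.array([1.0, 5.0])
print(het_logit_probs(V, hsize, theta=-0.275))
```

With a negative scale coefficient, the larger household's indices are shrunk towards zero, so its choice probabilities are closer to uniform; this corresponds to a larger error variance relative to the systematic part of utility.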

TABLE 3

Definition of covariates

| Covariate | Description |
|---|---|
| INC | Household income ($000) |
| C | 1 if the household has children, 0 otherwise |
| ED | Years of education of the head of household |
| M | 1 if the head of household is married, 0 otherwise |
| HSIZE | No. of household members |
| NCAMi | No. of cameras of type i in household in the previous year |

Table 4 shows the results of estimating a standard logit model, the heteroskedastic logit model and the HAL model. The heteroskedastic logit model produces a sizeable estimate: $\hat\theta = -0.275$ which is 2.83 times its estimated asymptotic standard error, suggesting that the variances of all the utilities increase with household size. The estimated variances and covariances produced using the HAL model suggest that there is significant across household variation in the impact of household size on utilities. The covariance term is positive and significantly different from zero,$^{13}$ suggesting a squared correlation across utilities for alternatives 1 and 2 around 0.5 when these are measured relative to utility for alternative 3. The estimated variance term associated with utility for alternative 2 relative to alternative 3 is larger than that associated with utility 1 measured relative to utility from alternative 3.

12. The data which relate to 875 U.S. households observed in 1982–1983 were gathered by National Family Opinion Inc.

13. The $N(0, 1)$ based p-value here is 0.001.

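TABLE 4

Parameter estimates and “t” statistics (absolute “t” statistics in parentheses)

| Alternative | Covariate | Simple logit | Het. logit | HAL |
|---|---|---|---|---|
| 35 mm | CONSTANT | −1.33 (3.63) | −2.37 (2.93) | −1.54 (3.23) |
| 35 mm | INC | 0.04 (4.56) | 0.08 (3.32) | 0.05 (4.21) |
| 35 mm | C | 0.31 (1.25) | 1.13 (1.41) | 0.47 (1.37) |
| 35 mm | ED | 0.07 (2.83) | 0.16 (2.74) | 0.08 (2.60) |
| 35 mm | M | 0.15 (0.66) | 0.79 (1.22) | 0.23 (0.77) |
| 35 mm | HSIZE | −0.30 (3.11) | −1.01 (1.82) | −0.37 (2.16) |
| Instant | CONSTANT | −0.89 (2.09) | 0.09 (0.08) | −0.49 (0.60) |
| Instant | INC | 0.01 (1.17) | 0.03 (1.08) | 0.02 (1.51) |
| Instant | C | −0.45 (1.39) | 0.59 (0.54) | 0.14 (0.24) |
| Instant | ED | 0.02 (0.59) | 0.04 (0.71) | 0.03 (0.80) |
| Instant | M | −0.30 (1.26) | 0.57 (0.61) | 0.08 (0.17) |
| Instant | HSIZE | 0.07 (0.06) | −1.48 (1.37) | −0.68 (1.21) |
| All | NCAM | 0.19 (3.06) | 0.46 (2.84) | 0.25 (3.06) |
| | $\theta$ | | −0.275 (2.83) | |
| $\Omega$ | $\omega_{11}$ | | | 0.11 (3.06) |
| $\Omega$ | $\omega_{12}$ | | | 0.13 (3.27) |
| $\Omega$ | $\omega_{22}$ | | | 0.30 (3.19) |
| | Log L | −855.3 | −852.2 | −851.1 |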
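The squared correlation quoted in the discussion of Table 4 can be checked directly from the reported HAL estimates. The short calculation below uses only figures from the table; the differences in maximized log likelihoods are included as simple arithmetic and are not statistics reported in the text.

```python
# HAL estimates of the (normalized) variance and covariance terms from Table 4
w11, w12, w22 = 0.11, 0.13, 0.30

# Implied squared correlation between the HSIZE taste shifters for alternatives 1 and 2
sq_corr = w12 ** 2 / (w11 * w22)
print(round(sq_corr, 3))          # about 0.51, in line with "around 0.5"

# Twice the log likelihood gain over the simple logit model (arithmetic only)
logl_simple, logl_het, logl_hal = -855.3, -852.2, -851.1
gain_het = 2 * (logl_het - logl_simple)
gain_hal = 2 * (logl_hal - logl_simple)
print(round(gain_het, 1), round(gain_hal, 1))
```

A squared correlation below one indicates that, in this application, the estimated variance and covariance terms are proper.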

5. CONCLUDING REMARKS

The heterogeneity adjusted logit model presented here approximates a class of models in which tastes for alternatives and for characteristics of alternatives vary across individuals. The mixed multinomial logit models that make up this class are interesting because they do not in general possess the IIA property and they can provide good approximations to any choice model consistent with random utility maximization.

The HAL model does not require specification of a joint distribution of the heterogeneous taste shifters yet is easier to estimate than many models in which this distribution is left unspecified and treated nonparametrically, especially when the number of choices is large. Estimates of variances and covariances of taste shifters are produced together with estimates of coefficients of an underlying model in which tastes are constant. The model nests the conventional multinomial logit model and a score test examining whether the added variance and covariance parameters are zero yields a misspecification diagnostic which is a member of the class of Information Matrix tests. The model allows development of approximations to the impact of taste variation on probability limits of estimators obtained using a conventional logit model, after the style of Kiefer and Skoog (1984). An example is given in Chesher and Santos Silva (1995). An example of a microeconometric application of the HAL model is given by Eymann and Ronning (1997).

Experience with the HAL model suggests that it produces better quality estimates of coefficients on covariates than of the variances and covariances of varying tastes. One problem that can arise in practice is that the QML estimator can produce improper variance and covariance estimates, implying a correlation outside $[-1, 1]$. Sometimes this is just due to inaccurate estimation of these parameters. Sometimes the estimates are well determined and then the result may point to some other type of misspecification than the taste variation. In such cases estimation of the HAL has usefully highlighted an inadequacy in the conventional logit specification but it has not provided a structural interpretation of the departure from the specification.

When the HAL model suggests that there is substantial taste variation it may be worth proceeding to estimate a model in which there is a parametric or nonparametric specification of the distribution of varying tastes. In that case the heterogeneity adjusted logit model of this paper can give an idea of the nature of the taste variation that is present, suggesting simplifying restrictions on its form, thereby easing the problems that arise when specifying and estimating high dimensional joint distributions. When taste variation is only small, estimates obtained using the heterogeneity adjusted logit model may be adequate.

APPENDIX 1: ORDERS OF INCONSISTENCY

We consider the probability limit of the QML estimator obtained with a log likelihood function constructed using the approximate choice probabilities (2) and outline an argument showing that the difference between this pseudo-true value and the value of the data generating parameter vector is of the same order as the difference between the approximate and exact choice probabilities, that is, that

$$\gamma^{a} - \gamma^{0} = O(\sigma^{3})$$

where $\gamma$ is a vector containing the $K$ elements of $\beta$ and $I(I-1)/2$ identifiable functions of elements of $\Omega$ and $\gamma^{0}$ is the data generating value of $\gamma$.

The required probability limit, or pseudo-true value, $\gamma^{a}$, is the solution to

$$\lim_{N\to\infty} N^{-1} E\bigl[L^{a}_{\gamma}(\gamma^{a}) \,\big|\, \gamma = \gamma^{0}\bigr] = 0$$

which we assume to be unique. Here $L^{a}(\cdot)$ is the approximate log likelihood function constructed using the approximate choice probabilities and expectation is taken using the exact choice probabilities under taste variation.

In this discrete choice problem, the pseudo-true value, $\gamma^{a}$, is the solution of the following system of equations:

$$\lim_{N\to\infty} N^{-1} \sum_{n=1}^{N} \sum_{i=1}^{I} \frac{\bar p(i|x_n;\gamma^{0})}{g(i|x_n;\gamma^{a})}\, g_{\gamma}(i|x_n;\gamma^{a}) = 0. \qquad (A1.1)$$

Here $g(i|x;\gamma)$ and $\bar p(i|x;\gamma)$ are respectively approximate and exact choice probabilities written in terms of $\gamma$, $g_{\gamma}(i|x;\gamma^{a})$ is the vector of derivatives, $\nabla_{\gamma}\, g(i|x;\gamma)|_{\gamma=\gamma^{a}}$, and $x_n$ denotes the value of the covariates $x = [x_1, \dots, x_I]$ at realization $n$.

By construction the approximate choice probabilities have the property

$$g(i|x;\gamma^{0}) = \bar p(i|x;\gamma^{0}) + O(\sigma^{3}),$$

where at the data generating value �, � ¼ trace ð�� 0Þ1=2, � ¼ �� , and we consider variations in � for fixed ��.

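Substituting in (A1.1) gives

$$\lim_{N\to\infty} N^{-1} \sum_{n=1}^{N} \sum_{i=1}^{I} \frac{g(i|x_n;\gamma^{0})}{g(i|x_n;\gamma^{a})}\, g_{\gamma}(i|x_n;\gamma^{a}) = O(\sigma^{3}). \qquad (A1.2)$$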

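Write the left-hand side of (A1.2) as $a(\gamma^{0}, \gamma^{a})$ and suppose that $a(\gamma^{0}, \gamma^{a}) = c$ has a unique solution for $\gamma^{a}$ which is differentiable with respect to $c$. We write this solution $\gamma^{a} = b(\gamma^{0}, c)$ and consider a Taylor series expansion of $\gamma^{a}$ around $c = 0$ in order to show that $c = O(\sigma^{3})$ implies that $\gamma^{a} - \gamma^{0} = O(\sigma^{3})$.

The approximate probabilities are proper in the sense that for all $\gamma$, $\sum_{i=1}^{I} g(i|x_n;\gamma) = 1$. Therefore $\sum_{i=1}^{I} g_{\gamma}(i|x_n;\gamma) = 0$ which implies that $a(\gamma^{0}, \gamma^{0}) = 0$, and so $b(\gamma^{0}, 0) = \gamma^{0}$. We now require $\nabla_{c}\, b(\gamma^{0}, c)|_{c=0}$. First note that

$$\nabla_{\gamma^{a}}\, a(\gamma^{0}, \gamma^{a})\, d\gamma^{a} = dc$$

and therefore

$$d\gamma^{a} = \bigl[\nabla_{\gamma^{a}}\, a(\gamma^{0}, \gamma^{a})\bigr]^{-1} dc$$

from which it follows that

$$\nabla_{c}\, b(\gamma^{0}, c)|_{c=0} = \bigl[\nabla_{\gamma^{a}}\, a(\gamma^{0}, \gamma^{a})|_{\gamma^{a}=\gamma^{0}}\bigr]^{-1},$$

as long as the inverse matrix exists. The matrix to be inverted is as follows.

$$\nabla_{\gamma^{a}}\, a(\gamma^{0}, \gamma^{a})|_{\gamma^{a}=\gamma^{0}} = -\lim_{N\to\infty} N^{-1} \sum_{n=1}^{N} \sum_{i=1}^{I} g(i|x_n;\gamma^{0}) \left[\frac{g_{\gamma}(i|x_n;\gamma^{0})}{g(i|x_n;\gamma^{0})}\right] \left[\frac{g_{\gamma}(i|x_n;\gamma^{0})}{g(i|x_n;\gamma^{0})}\right]' = -I^{a}(\gamma^{0}).$$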

The matrix $I^{a}(\gamma^{0})$ is the Information Matrix for the HAL model if the HAL model were exact, not approximate. This is required to be non-singular at the data generating value of $\gamma$ if $\gamma^{a} = b(\gamma^{0}, c)$ is to be differentiable with respect to $c$.

Finally, collecting results, the first order Taylor series expansion of $b(\gamma^{0}, c)$ around $c = 0$ is

$$b(\gamma^{0}, c) = \gamma^{0} - I^{a}(\gamma^{0})^{-1} c + o(c) \qquad (A1.3)$$

and (A1.2) implies $\gamma^{a} = b(\gamma^{0}, O(\sigma^{3}))$. Therefore setting $c = O(\sigma^{3})$ in (A1.3) gives the required result, namely

$$\gamma^{a} = \gamma^{0} + O(\sigma^{3}).$$

APPENDIX 2: APPROXIMATE VARIANCE OF THE HAL ESTIMATOR

We now show that the asymptotic covariance matrix of $\tilde\gamma^{a}_{N}$ is equal to $I^{a}(\gamma^{a})^{-1} + O(\sigma^{3})$ where $I^{a}(\gamma^{a})$ is the information matrix obtained for the approximate model assuming that data are generated by that model, that is,

$$I^{a}(\gamma^{a}) = -\lim_{N\to\infty} N^{-1} \sum L^{a}_{\gamma\gamma}(\gamma^{a}) \exp(L^{a}(\gamma^{a})),$$

where here and later in this Appendix summation is across all possible values of the choice indicators.

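From White (1994), under the conditions imposed there, we have

$$\sqrt{N}\,(\tilde\gamma^{a}_{N} - \gamma^{a}) = N(0, V(\gamma^{a})) + O_p(N^{-1/2}),$$

$$V(\gamma^{a}) = \bar E\bigl[-L^{a}_{\gamma\gamma}(\gamma^{a})\bigr]^{-1}\, \bar E\bigl[L^{a}_{\gamma}(\gamma^{a})\, L^{a}_{\gamma}(\gamma^{a})'\bigr]\, \bar E\bigl[-L^{a}_{\gamma\gamma}(\gamma^{a})\bigr]^{-1}. \qquad (A2.1)$$

Here $\bar E[\,\cdot\,]$ denotes $\lim_{N\to\infty} N^{-1} E[\,\cdot\,]$ and expectation is conditional on the covariate design and taken with respect to the true model with $\gamma = \gamma^{0}$. So, for example,

$$\bar E\bigl[-L^{a}_{\gamma\gamma}(\gamma^{a})\bigr] = -\lim_{N\to\infty} N^{-1} \sum L^{a}_{\gamma\gamma}(\gamma^{a}) \exp(L(\gamma^{0})). \qquad (A2.2)$$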

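The order of magnitude of the approximation error in the approximate choice probabilities and the result of Appendix 1 imply

$$\exp(L(\gamma^{0})) = \exp(L^{a}(\gamma^{0}) + O(\sigma^{3})) = \exp(L^{a}(\gamma^{a}) + O(\sigma^{3})) = \exp(L^{a}(\gamma^{a}))\,(1 + O(\sigma^{3})),$$

and substituting in (A2.2)

$$\bar E\bigl[-L^{a}_{\gamma\gamma}(\gamma^{a})\bigr] = -\lim_{N\to\infty} N^{-1} \sum L^{a}_{\gamma\gamma}(\gamma^{a}) \exp(L^{a}(\gamma^{a})) + O(\sigma^{3}) = I^{a}(\gamma^{a}) + O(\sigma^{3}). \qquad (A2.3)$$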

By a similar argument, and noting that $L^{a}(\cdot)$ is a proper log likelihood function for which the Information Matrix identity holds,

$$\bar E\bigl[L^{a}_{\gamma}(\gamma^{a})\, L^{a}_{\gamma}(\gamma^{a})'\bigr] = I^{a}(\gamma^{a}) + O(\sigma^{3}). \qquad (A2.4)$$

The result follows on substituting (A2.3) and (A2.4) in (A2.1).

APPENDIX 3: LARGE SAMPLE PROPERTIES OF THE 2-STEP APPROXIMATION TO THE HAL ESTIMATOR

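Let $L(\beta, \Omega; b)$ be a conventional logit model log likelihood function in which choice probabilities are:

$$h(i|x; \beta, \Omega; b) = \frac{\exp\Bigl(x_i'\beta + \sum_{s=1}^{I} \sum_{\substack{t=s \\ s,t \neq i^{*}}}^{I} \omega_{st}\, z^{st}_i(x; b)\Bigr)}{\sum_{j=1}^{I} \exp\Bigl(x_j'\beta + \sum_{s=1}^{I} \sum_{\substack{t=s \\ s,t \neq i^{*}}}^{I} \omega_{st}\, z^{st}_j(x; b)\Bigr)}$$

which is as in (16) except that the functions $z^{st}_i(x;\beta)$ are evaluated at $\beta = b$ and $b$ is regarded as fixed. Thus $h(i|x; \beta, \Omega; \beta) = g(i|x; \beta, \Omega)$. Let $\check\gamma = (\check\beta, \check\Omega)$ maximize this likelihood function when $b = \hat\beta$ which is the conventional logit ML estimator obtained when taste variation is ignored and choice probabilities are $g(i|x; \beta, 0)$. We now show that the probability limits of $\check\gamma$ and $\tilde\gamma$ differ by $O(\sigma^{3})$.

The estimator $\check\gamma$ is the solution to the following estimating equations.

$$L_{\beta_k}(\check\beta, \check\Omega; \hat\beta) = 0, \qquad 1 \le k \le K \qquad (A3.1)$$

$$L_{\omega_{st}}(\check\beta, \check\Omega; \hat\beta) = 0, \qquad 1 \le s \le t \le I;\; s, t \neq i^{*} \qquad (A3.2)$$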

Let $\operatorname{plim}_{N\to\infty}(\hat\beta) = \bar\beta$ be the pseudo-true value of the MLE obtained ignoring taste variation. The choice probabilities used in the simple MNL likelihood function that produces $\hat\beta$ differ from the true choice probabilities by order $O(\sigma^{2})$ and an argument similar to that in Appendix 1 leads to

$$\bar\beta = \beta^{0} + O(\sigma^{2}), \qquad (A3.3)$$

where $\beta^{0}$ is the true data generating value of $\beta$, and, on noting that $\omega_{st} = O(\sigma^{2})$, to

$$\omega_{st}\, z^{st}_i(x; \bar\beta) = \omega_{st}\, z^{st}_i(x; \beta^{0}) + O(\sigma^{3}). \qquad (A3.4)$$

The solution to (A3.1) and (A3.2) has the same probability limit as its solution when $\bar\beta$ appears in place of $\hat\beta$. With that replacement these are then the estimating equations from another approximation to the log likelihood function employing choice probabilities with error, given the result in (A3.4), of order $O(\sigma^{3})$. The argument of Appendix 1 then implies that $\operatorname{plim}_{N\to\infty} \check\gamma = \gamma^{0} + O(\sigma^{3})$ and so that $\check\gamma$ and $\tilde\gamma$ have probability limits differing by $O(\sigma^{3})$.

Acknowledgements. We are grateful to participants at the European Meeting of the Econometric Society, Brussels, 1992, and to Montezuma Dumangane, James Heckman and Quang Vuong for helpful comments, to Jeffrey Dubin, Richard Blundell and Peter Dolton for providing data used in developing the methods set out here, to Gerard Austin, Simon Peters and Richard Spady, for advice on computations, to Chris Orme who found an error in our original demonstration of the order of inconsistency of the QMLE based on the HAL model, and to an editor and two referees. We gratefully acknowledge financial support, as follows: Andrew Chesher, from ESRC grant H519255009 as part of the Analysis of Large and Complex Datasets initiative and ESRC grant R008237386; João Santos Silva from INVOTAN (Grant No 24/a/89/PO) and from Fundação para a Ciência e a Tecnologia, programme Praxis XXI. Ekaterina Ametistova provided valuable research assistance.

REFERENCES

AGRESTI, A. (1993), “Distribution-Free Fitting of Logit-Models with Random Effects for Repeated Categorical Responses”, Statistics in Medicine, 12, 1969–1987.

BROWNSTONE, D., BUNCH, D. S. and TRAIN, K. (2000), “Joint Mixed Logit Models of Stated and Revealed Preferences for Alternative-Fuel Vehicles”, Transportation Research Part B: Methodological, 34, 315–338.

CHAMBERLAIN, G. (1980), “Analysis of Covariance with Qualitative Data”, Review of Economic Studies, 47, 225–238.

CHESHER, A. D. (1984), “Testing for Neglected Heterogeneity”, Econometrica, 52, 865–872.

CHESHER, A. D. and SANTOS SILVA, J. M. C. (1995), “Taste Variation in Discrete Choice Models” (University of Bristol Department of Economics Discussion Paper No. 95/397).

CHINTAGUNTA, P. K., JAIN, D. C. and VILCASSIM, N. J. (1991), “Investigating Heterogeneity in Brand Preferences in Logit Models for Panel Data”, Journal of Marketing Research, 28, 417–428.

DAVIDSON, R. and MACKINNON, J. G. (1987), “Implicit Alternatives and the Local Power of Test Statistics”, Econometrica, 55, 1305–1329.

DUBIN, J. A. and ZENG, L. (1991), “The Heterogeneous Logit Model” (California Institute of Technology, Social Science Working Paper 759).

DUBIN, J. A. (1998), Studies in Consumer Demand: Econometric Methods Applied to Market Data (Boston, MA: Kluwer Academic Publishers).

EYMANN, A. and RONNING, G. (1997), “Microeconometric Models of Tourists' Destination Choice”, Regional Science and Urban Economics, 27, 735–761.

FOLLMANN, D. A. and LAMBERT, D. (1989), “Generalizing Logistic Regression by Nonparametric Mixing”, Journal of the American Statistical Association, 84, 295–300.

FORMANN, A. K. (1992), “Linear Logistic Latent Class Analysis for Polytomous Data”, Journal of the American Statistical Association, 87, 476–486.

GONUL, F. and SRINIVASAN, K. (1993), “Modeling Unobserved Heterogeneity in Multinomial Logit Models: Methodological and Managerial Implications”, Marketing Science, 12, 213–229.

HAJIVASSILOU, V. A. and RUUD, P. A. (1994), “Classical Estimation Methods for LDV Models Using Simulation”, in R. F. Engle and D. L. McFadden (eds.), Handbook of Econometrics, Vol. 4, Chap. 40 (Amsterdam: Elsevier Science BV).

HALL, B. H. and CUMMINS, C. (1999), Time Series Processor Version 4.5 User's Guide (Palo Alto, CA: TSP International).

HECKMAN, J. J. and SINGER, B. (1984), “A Method of Minimizing the Impact of Distributional Assumptions in Econometric Models of Duration Data”, Econometrica, 52, 271–320.

HONORE, B. E. and KYRIAZIDOU, E. (2000), “Panel Data Discrete Choice Models with Lagged Dependent Variables”, Econometrica, 68, 839–874.

JAIN, D. C., VILCASSIM, N. J. and CHINTAGUNTA, P. K. (1994), “A Random-Coefficients Logit Brand-Choice Model”, Journal of Business and Economic Statistics, 12, 317–328.

KAMAKURA, W. A. and RUSSELL, G. J. (1989), “A Probabilistic Choice Model for Market Segmentation and Elasticity Structure”, Journal of Marketing Research, 26, 379–390.

KIEFER, N. and SKOOG, G. (1984), “Local Misspecification Analysis”, Econometrica, 52, 873–885.

KLEIN, R. W. and SPADY, R. H. (1993), “An Efficient Semiparametric Estimator for Binary Response Models”, Econometrica, 61, 387–422.

LAIRD, N. (1978), “Nonparametric Maximum Likelihood Estimation of a Mixing Distribution”, Journal of the American Statistical Association, 73, 805–811.

LEE, L-F. (1995), “Semiparametric Maximum Likelihood Estimation of Polychotomous and Sequential Choice Models”, Journal of Econometrics, 65, 381–428.

MANSKI, C. F. (1987), “Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data”, Econometrica, 55, 357–362.

MCFADDEN, D. L. (1973), “A Conditional Logit Analysis of Qualitative Choice Behavior”, in P. Zarembka (ed.), Frontiers in Econometrics (New York: Academic Press), 105–142.

MCFADDEN, D. L. (1976), “Quantal Choice Analysis: A Survey”, Annals of Economic and Social Measurement, 5, 363–390.

MCFADDEN, D. L. (1984), “Qualitative Response Models”, in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, Vol. 2 (Amsterdam: North-Holland).

MCFADDEN, D. L. (1987), “Regression-Based Specification Tests for the Multinomial Logit Model”, Journal of Econometrics, 34, 63–82.

MCFADDEN, D. L. and TRAIN, K. (2000), “Mixed MNL Models for Discrete Response”, Journal of Applied Econometrics, 15, 447–470.

MCFADDEN, D. L., TYE, W. B. and TRAIN, K. (1977), “An Application of Diagnostic Tests for the Independence from Irrelevant Alternatives Property of the Multinomial Logit Model”, Transportation Research Record, 637, 39–45.

MONTGOMERY, M. R., RICHARDS, T. and BRAUN, H. I. (1986), “Child Health, Breast Feeding, and Survival in Malaysia: A Random-Effects Logit Approach”, Journal of the American Statistical Association, 81, 297–309.

NEWEY, W. K. and MCFADDEN, D. L. (1994), “Large Sample Estimation and Hypothesis Testing”, in R. F. Engle and D. L. McFadden (eds.), Handbook of Econometrics, Vol. 4, Chap. 36 (Amsterdam: Elsevier Science BV).

POSTORINO, M. N. (1993), “A Comparative Analysis of Different Specifications of Modal Choice Models in an Urban Area”, European Journal of Operational Research, 71, 288–302.

SILVERMAN, B. W. (1986), Density Estimation for Statistics and Data Analysis (London: Chapman and Hall).

SMALL, K. A. (1994), “Approximate Generalized Extreme Value Models of Discrete Choice”, Journal of Econometrics, 62, 351–382.

STECKEL, J. H. and VANHONACKER, W. R. (1988), “A Heterogeneous Conditional Logit Model of Choice”, Journal of Business and Economic Statistics, 6, 391–398.

STERN, S. (1994), “Two Dynamic Discrete Choice Estimation Problems and Simulation Method Solutions”, Review of Economics and Statistics, 76, 695–702.

WHITE, H. (1982), “Maximum Likelihood Estimation of Misspecified Models”, Econometrica, 50, 1–25.

WHITE, H. (1994), Estimation, Inference and Specification Analysis (Cambridge: Cambridge University Press).
