This article was downloaded by: [103.24.77.60] On: 25 January 2017, At: 17:49 Publisher: Institute for Operations Research and the Management Sciences (INFORMS) INFORMS is located in Maryland, USA Management Science Publication details, including instructions for authors and subscription information: http://pubsonline.informs.org On Theoretical and Empirical Aspects of Marginal Distribution Choice Models Vinit Kumar Mishra, Karthik Natarajan, Dhanesh Padmanabhan, Chung-Piaw Teo, Xiaobo Li To cite this article: Vinit Kumar Mishra, Karthik Natarajan, Dhanesh Padmanabhan, Chung-Piaw Teo, Xiaobo Li (2014) On Theoretical and Empirical Aspects of Marginal Distribution Choice Models. Management Science 60(6):1511-1531. http://dx.doi.org/10.1287/ mnsc.2014.1906 Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval, unless otherwise noted. For more information, contact [email protected]. The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service. Copyright © 2014, INFORMS Please scroll down for article—it is on subsequent pages INFORMS is the largest professional society in the world for professionals in the fields of operations research, management science, and analytics. For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org

MANAGEMENT SCIENCE
Vol. 60, No. 6, June 2014, pp. 1511–1531
ISSN 0025-1909 (print), ISSN 1526-5501 (online)
http://dx.doi.org/10.1287/mnsc.2014.1906

© 2014 INFORMS

On Theoretical and Empirical Aspects of Marginal Distribution Choice Models

Vinit Kumar Mishra
Department of Business Analytics, University of Sydney Business School, New South Wales 2006, Australia, [email protected]

Karthik Natarajan
Engineering Systems and Design, Singapore University of Technology and Design, Singapore 138682, [email protected]

Dhanesh Padmanabhan
General Motors Research and Development–India Science Lab, Bangalore 560066, India, [email protected]

Chung-Piaw Teo
Department of Decision Sciences, National University of Singapore Business School, Singapore 117591, [email protected]

Xiaobo Li
Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455, [email protected]

In this paper, we study the properties of a recently proposed class of semiparametric discrete choice models (referred to as the marginal distribution model (MDM)), by optimizing over a family of joint error distributions with prescribed marginal distributions. Surprisingly, the choice probabilities arising from the family of generalized extreme value models, of which the multinomial logit model is a special case, can be obtained from this approach, despite the difference in assumptions on the underlying probability distributions. We use this connection to develop flexible and general choice models to incorporate consumer and product level heterogeneity in both partworths and scale parameters in the choice model. Furthermore, the extremal distributions obtained from the MDM can be used to approximate the Fisher’s information matrix to obtain reliable standard error estimates of the partworth parameters, without having to bootstrap the method. We use simulated and empirical data sets to test the performance of this approach. We evaluate the performance against the classical multinomial logit, mixed logit, and a machine learning approach that accounts for partworth heterogeneity. Our numerical results indicate that MDM provides a practical semiparametric alternative to choice modeling.

Keywords: discrete choice model; convex optimization; machine learning; applied probability
History: Received April 1, 2012; accepted December 18, 2013, by Eric Bradlow, special issue on business analytics. Published online in Articles in Advance May 5, 2014.

1. Introduction

Conjoint analysis is used in practice to determine how consumers choose among products and services, based on the utility maximization framework. It allows companies to decompose consumers' preferences into partworths (or utilities), associated with each level of each attribute of products and services. Since the early work of McFadden (1974) on the logit based choice model, and the subsequent introduction of the generalized extreme value models (see McFadden 1977, 1978), discrete choice models have been used extensively in many areas in economics, marketing, and transportation research.

In the simplest discrete choice model, the utility of a consumer $i \in \mathcal{I} = \{1, 2, \ldots, I\}$ for an alternative $k \in \mathcal{K} = \{1, 2, \ldots, K\}$ is decomposed into a sum of a deterministic and a random utility given by $U_{ik} = V_{ik} + \epsilon_{ik}$. The first component $V_{ik}$ is a function of the consumer's preference weights $\boldsymbol{\beta}_i$, expressed as $V_{ik}(\boldsymbol{\beta}_i)$, that represents the deterministic utility obtained from the observed attributes of the alternative in the choice task. Typically, $\boldsymbol{\beta}_i$ is the vector of the consumer’s preference weights (or partworths) for the vector of attributes of alternative $k$ offered to consumer $i$, denoted by $\mathbf{x}_{ik}$, with $V_{ik}(\boldsymbol{\beta}_i) = \boldsymbol{\beta}_i' \mathbf{x}_{ik}$. The second component $\epsilon_{ik}$ denotes the random and unobservable or idiosyncratic term in the utility.

Prediction of consumer $i$’s choice for alternative $k$ is done by evaluating the choice probability

$$P_{ik} := P\Bigl(U_{ik} \ge \max_{l \in \mathcal{K}} U_{il}\Bigr) = P\Bigl(V_{ik} + \epsilon_{ik} \ge \max_{l \in \mathcal{K}} (V_{il} + \epsilon_{il})\Bigr).$$


The computation involves an evaluation of a multidimensional integral, and closed-form solutions are available only for certain classes of distributions. This includes the generalized extreme value (GEV) family, which is derived under the assumption that the error terms follow a generalized extreme value distribution. The multinomial logit (MNL) and nested logit (NestL) models are well-known members of this family. The choice probabilities do not have closed-form solutions for most other distributions.
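As a rough illustration of the choice probability defined above, the following sketch estimates $P(V_{ik} + \epsilon_{ik} \ge \max_l (V_{il} + \epsilon_{il}))$ by simulation for i.i.d. standard Gumbel errors, the case in which the closed-form MNL formula is available for comparison. The utilities, sample size, seed, and function names are illustrative assumptions, not values from the paper.

```python
import math
import random


def mnl_probabilities(V):
    """Closed-form MNL probabilities exp(V_k) / sum_l exp(V_l)."""
    m = max(V)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in V]
    s = sum(w)
    return [x / s for x in w]


def simulated_probabilities(V, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(V_k + eps_k >= max_l (V_l + eps_l))
    with i.i.d. standard Gumbel errors (an assumed error model)."""
    rng = random.Random(seed)
    counts = [0] * len(V)
    for _ in range(n_draws):
        # If E ~ Exp(1), then -log(E) is standard Gumbel; the max() guards log(0).
        utilities = [v - math.log(max(rng.expovariate(1.0), 1e-300)) for v in V]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]


V = [0.0, 0.5, 1.0]  # illustrative deterministic utilities (assumed)
print(mnl_probabilities(V))
print(simulated_probabilities(V))
```

The two printed vectors should agree up to simulation noise; for error distributions without a closed form, only the simulated route is available.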

In practice, an adequate modeling of consumer heterogeneity is important for accurate choice prediction. To account for taste variation among the consumers, the mixed logit (MixL) model (see, e.g., Train 2009, Allenby and Rossi 1999) assumes that the partworth parameter vector $\boldsymbol{\beta}_i$ is sampled from a distribution $\boldsymbol{\beta}_0 + \boldsymbol{\epsilon}_i^a$. In this way, the utility function $U_{ik} = (\boldsymbol{\beta}_0 + \boldsymbol{\epsilon}_i^a)' \mathbf{x}_{ik} + \epsilon_{ik}$ captures consumer taste variation across the attributes. By integrating over the density, the choice probabilities under the MixL model are derived as

$$P_{ik} = \int P_{ik}(\boldsymbol{\epsilon}_i^a)\, g(\boldsymbol{\epsilon}_i^a)\, d\boldsymbol{\epsilon}_i^a,$$

where $P_{ik}(\boldsymbol{\epsilon}_i^a)$ is the choice probability for given $\boldsymbol{\epsilon}_i^a$, and $g(\cdot)$ denotes the probability density of $\boldsymbol{\epsilon}_i^a$. For instance, when the error terms $\epsilon_{ik}$ are independent and identically distributed (i.i.d.) Gumbel, the MNL formula applies with

$$P_{ik}(\boldsymbol{\epsilon}_i^a) = \frac{e^{(\boldsymbol{\beta}_0 + \boldsymbol{\epsilon}_i^a)' \mathbf{x}_{ik}}}{\sum_{l \in \mathcal{K}} e^{(\boldsymbol{\beta}_0 + \boldsymbol{\epsilon}_i^a)' \mathbf{x}_{il}}}.$$

At the same time, $g(\cdot)$ is typically assumed to be a continuous unimodal distribution such as the multivariate normal distribution. The choice probabilities and parameters for the MixL model are then estimated using simulation techniques.
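The simulation step mentioned above can be sketched as follows: the MixL integral is approximated by averaging the MNL probabilities over random draws of the taste deviation. This is a minimal sketch that assumes independent normal components (a diagonal covariance), which is only one possible choice of $g(\cdot)$; the function names, attribute values, and number of draws are hypothetical.

```python
import math
import random


def mnl_probabilities(V):
    """MNL probabilities for a vector of deterministic utilities."""
    m = max(V)
    w = [math.exp(v - m) for v in V]
    s = sum(w)
    return [x / s for x in w]


def mixl_probabilities(beta0, sd, X, n_draws=2000, seed=0):
    """Approximate P_ik = integral of P_ik(eps_a) g(eps_a) d eps_a by averaging
    MNL probabilities over draws eps_a ~ N(0, diag(sd^2)) (assumed density g)."""
    rng = random.Random(seed)
    total = [0.0] * len(X)
    for _ in range(n_draws):
        beta = [b + rng.gauss(0.0, s) for b, s in zip(beta0, sd)]
        V = [sum(bj * xj for bj, xj in zip(beta, x)) for x in X]
        for k, p in enumerate(mnl_probabilities(V)):
            total[k] += p
    return [t / n_draws for t in total]


# Illustrative attribute vectors x_ik for three alternatives (assumed values).
X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(mixl_probabilities(beta0=[1.0, 0.5], sd=[0.5, 0.5], X=X))
```

Setting every entry of `sd` to zero removes the taste variation, and the average collapses to the plain MNL probabilities at `beta0`.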

An alternate way to deal with consumer taste variation is to use a regularization technique commonly employed in machine learning. Evgeniou et al. (2007) propose a convex optimization formulation in the discrete choice setting as an alternative to the MixL model. Their approach is based on augmenting the conditional log-likelihood objective with a regularization term that pools information among the consumers and shrinks the individual partworths toward the population mean. This is analogous to estimating the model using the likelihood function $P_{ik}(\boldsymbol{\epsilon}_i^a)\, g(\boldsymbol{\epsilon}_i^a)$.

In all these approaches, the error terms $\epsilon_{ik}$ are essentially i.i.d. Gumbel, so that the choice probabilities $P_{ik}(\boldsymbol{\epsilon}_i^a)$ can be expressed in closed form. However, this implicitly assumes that the independence of irrelevant alternatives (IIA) property holds at the individual level, even though it does not hold at the population level. Steenburgh and Ainslie (2010) recently pointed out that the MixL model has undesirable properties, in that it restricts the type of substitution patterns that one can expect in the population, as long as the individual IIA property holds. Louviere and Meyer (2007) (see also Fiebig et al. 2010) have also argued that much of the taste heterogeneity encountered in many practical choice tasks can be better described as scale heterogeneity, by assuming that consumers have different scales for the idiosyncratic error terms. They modeled the random $\boldsymbol{\beta}_i$ by $\sigma_i^a \boldsymbol{\beta}_0$, where $\boldsymbol{\beta}_0$ is independent of $i$, and $\sigma_i^a$ is a random scale term for each consumer. Note that in these models, the idiosyncratic errors $\epsilon_{ik}$ for the alternatives, after the scaling by $\sigma_i^a$, are still i.i.d. Gumbel, to ensure tractability of the choice model.

[Figure 1. Plots of the Ratings for Two Movies. Two panels, each with axis ticks running from 0.8 to 5.2.]

These assumptions, however, are not realistic for many choice tasks, especially for the case when one of the alternatives is presented as a no-choice or opt-out option. In these cases, the assumption that the idiosyncratic error distribution for the opt-out option is identical to the rest of the product options is often too strong. This assumption is also not valid for many product evaluations. For instance, using the data from the GroupLens Research page,¹ the ratings for two randomly selected movies (by 452 and 136 users, respectively) in the data set are as shown in Figure 1. Clearly, the scales of the ratings for the two movies are different, as the spreads of the ratings for the two movies differ. If ratings are indicative of the utilities associated with the movies, it is then essential to account for the difference in scales in the idiosyncratic error for choice modeling in movie selections.

There have been other attempts to investigate the benefits of incorporating heteroskedastic error distributions into choice models. Most of these studies build on the heteroskedastic extreme value (HEV) model introduced by Bhat (1995), based on the assumption of independent extreme value error distributions with nonidentical scale. Since a closed-form expression for the choice probabilities under HEV is not available, Bhat (1995) developed a Gaussian quadrature technique to estimate the parameters of HEV using the maximum likelihood method.

¹ See http://grouplens.org/datasets/movielens/#attachments (accessed November 2013).

In this paper, we build on a recently proposed class of semiparametric choice models that allow the idiosyncratic error terms to have different scales across products/alternatives and are computationally tractable. In a way, this approach extends the nested logit model, where the scale associated with nests may be different. More importantly, our approach can be combined with the machine learning approach discussed earlier to handle both partworth and scale heterogeneity across consumers and products/attributes. In fact, incorporating regularization terms often speeds up the convergence rate of the estimation procedure. Our approach has the added computational advantage that the calibration problem can be solved using standard nonlinear optimization packages. This approach is along the line of work in the semiparametric and nonparametric literature (see Manski 1975, Farias et al. 2013), in that we do not assume that the distribution of the idiosyncratic error terms is completely specified. For these models, the standard errors of the estimates are often obtained via a bootstrapping approach. In our approach, the marginals in MDM can be used to approximate the Fisher’s information matrix to obtain reliable standard error estimates of the partworth parameters directly.

The structure of the rest of the paper and our main contributions are as follows:

In §2, we introduce the marginal distribution model (MDM) and establish its connection with the generalized extreme value model. We show that the GEV choice probabilities can be obtained as an instance of MDM using generalized exponential distributions. In a special case, this implies that MDM with identical exponential distributions generates the MNL choice probabilities. This is surprising because MDM is obtained without the assumption of independence, whereas the MNL model imposes the assumption of independence among the error terms. In fact, we show that under appropriate choice of marginals, there is a one-to-one correspondence between all choice probabilities in the unit simplex and the deterministic components of the utilities. As an illustration, the choice probabilities for a distributionally robust choice model (with known first and second moments for the marginals) can be derived from MDM using t-distributions.

In §3, we study the parameter estimation problem under the MDM using the maximum log-likelihood approach. This estimation problem is known to be convex for only a few special cases. We show that under a linear utility specification, the estimation problem for MDM is convex in partworths under appropriate conditions for several classes of marginal distributions. This includes the MNL and NestL results as special cases. We show further that the method can be used to find the asymptotic variance of maximum likelihood estimators under MDM.

In §§4 and 5, we test the performance of MDM using simulated data and empirical conjoint choice data on safety features in automobiles. For companies such as General Motors, understanding consumer preferences for these features is important as it provides key insights on packaging features for current and future vehicles. We use three types of models (MNL with and without heterogeneous partworth preferences, mixed logit with random partworths, and instances of MDM) to analyze the data. We find that despite the nonconvexity of the calibration problem for MDM when both scale and partworth parameters need to be estimated, the optimization problem can be solved fairly efficiently using standard nonlinear optimization packages. The ability to account for scale heterogeneity among consumers using MDM is shown to be useful, because it improves the out-of-sample hit rate by around 20%, in comparison to the simplest MNL model. The out-of-sample predictions in this data set also indicate that MDM with consumer scale heterogeneity performs as well as, if not better than, MNL with consumer partworth heterogeneity, despite using far fewer variables and at a fraction of the computational time. We also propose a way to derive standard error estimates for partworths based on the MDM approach, and compare its performance with the standard mixed logit.

2. Choice Prediction Under MDM

In this section, we introduce the marginal distribution model and establish useful properties for the choice probabilities obtained from this model.

2.1. Marginal Distribution Choice Models

Natarajan et al. (2009) recently proposed a semiparametric approach to choice modeling using limited information on the joint distribution of the random utilities. Under this model, the choice prediction is performed as follows. The modeler assumes that the consumer $i$ is utility maximizing while making a choice among the alternatives $k \in \mathcal{K}$ (with utilities $U_{ik}$) and solves

$$Z(U_i) = \max\Bigl\{ \sum_{k \in \mathcal{K}} U_{ik} y_{ik} : \sum_{k \in \mathcal{K}} y_{ik} = 1,\; y_{ik} \in \{0, 1\} \;\; \forall k \in \mathcal{K} \Bigr\}. \tag{1}$$

The key difference in the marginal distribution model lies in the evaluation of choice probabilities. Rather than assuming a probability distribution $\theta$ of the random utility vector $U_i = (U_{i1}, \ldots, U_{iK})$, the choice probabilities are evaluated for a probability distribution $\theta^*$ of the random utility vector $U_i$ that satisfies certain prespecified conditions (e.g., fixed marginal distributional information or marginal moment information) and is extremal with respect to the consumer expected welfare. An extremal distribution here refers to the assumption that $\theta^*$ is chosen from a set of probability distributions $\Theta$ so that it maximizes the consumer expected welfare:

$$\max_{\theta \in \Theta} E_{\theta}(Z(U_i)). \tag{2}$$

The extremal distribution is thus given by

$$\theta^* = \arg\max_{\theta \in \Theta} E_{\theta}(Z(U_i)). \tag{3}$$

Given marginal distributions in the description of $\Theta$, the framework of “copula modeling” from probability theory (see Sklar 1959, Nelsen 2006) provides a natural way in which univariate marginal distributions can be combined to create multivariate distributions. Besides the popular use of copulas in financial risk management, Danaher and Smith (2011) recently identified the flexibility of a copula modeling approach in marketing applications. Examples of copulas used in marketing applications include the Gaussian copula (see Danaher and Smith 2011, Seetharman et al. 2005) and the Sarmanov copula (see Schweidel et al. 2008, Park and Fader 2004, Danaher 2007). Our choice of the extremal distribution in (3) gives rise to joint random variables, which are commonly referred to in the copula theory literature as countermonotonic random variables (see Nelsen 2006, Weiss 1986).²

When $\Theta$ denotes the family of probability distributions with prescribed marginals, we obtain the marginal distribution model. When $\Theta$ denotes the family of all probability distributions with prescribed first and second marginal moments, we obtain the marginal moment model (MMM). Under the extremal joint distribution $\theta^*$ of the random utility vector,

$$E_{\theta^*}(Z(U_i)) = E_{\theta^*}\Bigl( \sum_{k \in \mathcal{K}} U_{ik}\, y^*_{ik}(U_i) \Bigr) = \sum_{k \in \mathcal{K}} E_{\theta^*}\bigl(U_{ik} \mid y^*_{ik}(U_i) = 1\bigr)\, P_{\theta^*}\bigl(y^*_{ik}(U_i) = 1\bigr),$$

where $y^*_{ik}(U_i)$ is the optimal value of decision variable $y_{ik}$ in (1), which is random because of the random coefficients $U_{ik}$, and $P_{\theta^*}(y^*_{ik}(U_i) = 1)$ is the choice probability of the $k$th alternative for consumer $i$ under the extremal distribution. For a detailed discussion on these models, the reader is referred to Natarajan et al. (2009), where the model is derived and its application to discrete choice modeling is provided.

² For $K = 2$, countermonotonic random variables are generated from a proper copula, and for $K > 2$ the countermonotonic random variables are generated from a proper quasi-copula. A detailed discussion on this topic can be found in Nelsen (2006).

For the connection to the persistency problem, see Bertsimas et al. (2006). The key result of Natarajan et al. (2009) for MDM is given in the following theorem.

Theorem 1 (Natarajan et al. 2009). For consumer $i$, assume that the marginal distribution $F_{ik}(\cdot)$ of the error term $\epsilon_{ik}$ is a continuous distribution for all $k \in \mathcal{K}$. The following concave maximization problem solves (2):

$$\max_{P_i} \Bigl\{ \sum_{k \in \mathcal{K}} \Bigl( V_{ik} P_{ik} + \int_{1 - P_{ik}}^{1} F_{ik}^{-1}(t)\, dt \Bigr) : \sum_{k \in \mathcal{K}} P_{ik} = 1,\; P_{ik} \ge 0 \;\; \forall k \in \mathcal{K} \Bigr\} \tag{4}$$

and the choice probabilities under an extremal distribution $\theta^*$ of (3) are given by the optimal solution vector $P_i^*$ to (4).

Under MDM, the optimality conditions yield the choice probabilities as

$$P^*_{ik} = 1 - F_{ik}(\lambda_i - V_{ik}), \tag{5}$$

where the Lagrange multiplier $\lambda_i$ satisfies the following normalization condition:

$$\sum_{k \in \mathcal{K}} P^*_{ik} = \sum_{k \in \mathcal{K}} \bigl(1 - F_{ik}(\lambda_i - V_{ik})\bigr) = 1. \tag{6}$$

Natarajan et al. (2009) provide an explicit characterization of the extremal distribution $\theta^*$.³
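Conditions (5) and (6) suggest a simple numerical recipe: because the left-hand side of (6) is nonincreasing in $\lambda_i$, the multiplier can be located by a one-dimensional search, after which (5) gives the probabilities. The sketch below uses bisection and, purely as an illustrative assumption, logistic marginals; the function names and utility values are hypothetical and this is not the estimation code used in the paper.

```python
import math


def logistic_cdf(x):
    """Logistic CDF, used here only as an illustrative continuous marginal."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mdm_choice_probabilities(V, cdfs, iterations=200):
    """Solve (6) for lambda_i by bisection, then apply (5).

    V    : deterministic utilities V_ik for one consumer (K >= 2)
    cdfs : one marginal CDF F_ik per alternative
    """
    def excess(lam):
        # Left-hand side of (6) minus 1; nonincreasing in lam.
        return sum(1.0 - F(lam - v) for v, F in zip(V, cdfs)) - 1.0

    # Expand a bracket [lo, hi] with excess(lo) >= 0 >= excess(hi).
    lo, step = min(V), 1.0
    while excess(lo) < 0.0:
        lo -= step
        step *= 2.0
    hi, step = max(V), 1.0
    while excess(hi) > 0.0:
        hi += step
        step *= 2.0

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [1.0 - F(lam - v) for v, F in zip(V, cdfs)], lam


V = [0.0, 1.0, 2.0]  # illustrative utilities (assumed values)
P, lam = mdm_choice_probabilities(V, [logistic_cdf] * len(V))
print(P, lam)
```

Passing a different CDF for each alternative is what allows the marginals, and hence the scales, to differ across alternatives.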

This result has several immediate implications. Take for instance $F_{ik}(\epsilon) = 1 - e^{-\epsilon}$ for $\epsilon \ge 0$. Solving the optimality conditions (5) and (6) for MDM, we obtain the MNL choice probabilities:

$$P_{ik} = \frac{e^{V_{ik}}}{\sum_{l \in \mathcal{K}} e^{V_{il}}}.$$

In fact, the formulation in Theorem 1 in this case is exactly the entropy-type utility maximization problem proposed by Anderson et al. (1988) to derive the MNL choice probabilities via a “representative consumer” model. The extremal distribution $\theta^*$ obtained from the MDM provides an alternative explanation to the representative consumer model of Anderson et al. (1988).
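For readers who want the intermediate steps, the passage from (5) and (6) to the MNL formula under the unit exponential marginal can be written out as follows. This is a sketch of the algebra only, and it relies on $\lambda_i \ge V_{ik}$ for every $k$, which holds at the solution because $\ln \sum_{l} e^{V_{il}} \ge V_{ik}$.

```latex
\begin{align*}
\sum_{k \in \mathcal{K}} \bigl(1 - F_{ik}(\lambda_i - V_{ik})\bigr)
  &= \sum_{k \in \mathcal{K}} e^{-(\lambda_i - V_{ik})}
   = e^{-\lambda_i} \sum_{k \in \mathcal{K}} e^{V_{ik}} = 1
  \quad\Longrightarrow\quad
  \lambda_i = \ln \sum_{l \in \mathcal{K}} e^{V_{il}}, \\
P^*_{ik} &= 1 - F_{ik}(\lambda_i - V_{ik}) = e^{V_{ik} - \lambda_i}
  = \frac{e^{V_{ik}}}{\sum_{l \in \mathcal{K}} e^{V_{il}}}.
\end{align*}
```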

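³ This distribution is a finite mixture distribution of the form $\theta^*(\boldsymbol{\epsilon}_i) = \sum_{k \in \mathcal{K}} P^*_{ik} f_{ik}(\boldsymbol{\epsilon}_i)$, where $P^*_{ik}$ satisfies the conditions (5) and (6) and the density functions of the mixture components are generated independently from the density functions of the error terms $f_{ik}(\cdot)$ as follows:

$$f_{ik}(\boldsymbol{\epsilon}_i) = \frac{f_{ik}(\epsilon_{ik})\, I\{\epsilon_{ik} \ge \lambda_i - V_{ik}\}}{1 - F_{ik}(\lambda_i - V_{ik})} \prod_{l \in \mathcal{K}:\, l \ne k} \frac{f_{il}(\epsilon_{il})\, I\{\epsilon_{il} \le \lambda_i - V_{il}\}}{F_{il}(\lambda_i - V_{il})} \quad \text{for } k \in \mathcal{K}.$$

It is clear from this construction that the joint distribution under MDM is highly correlated.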


An important consideration that a modeler faces in the selection of a discrete choice model is that the utility model selected must be capable of generating all possible choice probabilities in the unit simplex. Hofbauer and Sandholm (2002) and Norets and Takahashi (2013) have shown that under mild assumptions on the joint distribution of the error terms, the mapping from the deterministic components of the utilities to the set of choice probabilities is surjective. Hence, any vector of choice probabilities in the unit simplex can be obtained by selecting suitable $V_{ik}$. We show in the next theorem that under mild assumptions on the marginal density function, MDM also satisfies desirable properties.

Theorem 2. Set $V_{i1} = 0$. Assume MDM with error terms $\epsilon_{ik}$, $k \in \mathcal{K}$, that have a strictly increasing continuous marginal distribution $F_{ik}(\cdot)$ defined either on a semi-infinite support $[\underline{\epsilon}_{ik}, \infty)$ or an infinite support $(-\infty, \infty)$. Let $\Delta_{K-1}$ be the $K - 1$ dimensional simplex of choice probabilities:

$$\Delta_{K-1} = \Bigl\{ P_i = (P_{i1}, \ldots, P_{iK}) : \sum_{k \in \mathcal{K}} P_{ik} = 1,\; P_{ik} \ge 0 \;\; \forall k \in \mathcal{K} \Bigr\}.$$

Let $\Phi(V_{i2}, \ldots, V_{iK}) : \Re^{K-1} \to \Delta_{K-1}$ be a mapping from the deterministic components of the utilities to the choice probabilities under MDM. Then $\Phi$ is a bijection between $\Re^{K-1}$ and the interior of the simplex $\Delta_{K-1}$.

Proof. See the appendix.

This result shows that with the error distribution $F_{ik}(\cdot)$ fixed, the deterministic utility component $V_i$, with $V_{i1}$ fixed at zero, is identifiable from the choice probabilities $P_i$. In the appendix, we provide additional conditions under which the deterministic utility component and the parameters of the error distribution $F_{ik}(\cdot)$ are simultaneously identifiable from the choice probabilities $P_i$.
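One way to see the identifiability claim concretely is to invert (5) directly. With $V_{i1} = 0$, the first alternative gives $\lambda_i = F_{i1}^{-1}(1 - P_{i1})$, and every other utility follows from $V_{ik} = \lambda_i - F_{ik}^{-1}(1 - P_{ik})$. The sketch below does this for logistic marginals, which are an illustrative assumption rather than the paper's choice, with hypothetical function names.

```python
import math


def logistic_cdf(x):
    """Logistic CDF (illustrative marginal with support on the whole real line)."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_quantile(u):
    """Inverse of the logistic CDF for u strictly between 0 and 1."""
    return math.log(u / (1.0 - u))


def utilities_from_probabilities(P):
    """Recover (V, lam) with V[0] = 0 from interior choice probabilities P,
    by inverting P_k = 1 - F(lam - V_k) alternative by alternative."""
    lam = logistic_quantile(1.0 - P[0])  # from P_1 = 1 - F(lam - 0)
    V = [lam - logistic_quantile(1.0 - p) for p in P]
    return V, lam


P = [0.2, 0.3, 0.5]  # any point in the interior of the simplex (assumed values)
V, lam = utilities_from_probabilities(P)
recovered = [1.0 - logistic_cdf(lam - v) for v in V]
print(V, recovered)
```

The recovered probabilities reproduce the input, and the construction fails only on the boundary of the simplex, where the quantile is unbounded, which matches the restriction to the interior in Theorem 2.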

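2.2. Distributionally Robust Model with Known First and Second Moments

In a similar spirit, when error terms $\epsilon_{ik}$ have mean zero and variance $\sigma_{ik}^2$, but the exact distribution is not known, the choice probabilities under the marginal moment model can be found by solving the following concave maximization problem:

$$\max_{P_i} \Bigl\{ \sum_{k \in \mathcal{K}} \Bigl( V_{ik} P_{ik} + \sigma_{ik} \sqrt{P_{ik}(1 - P_{ik})} \Bigr) : \sum_{k \in \mathcal{K}} P_{ik} = 1,\; P_{ik} \ge 0 \;\; \forall k \in \mathcal{K} \Bigr\}. \tag{7}$$

In this case, the optimality conditions generate the choice probabilities

$$P^*_{ik} = \frac{1}{2} \Bigl( 1 + \frac{V_{ik} - \lambda_i}{\sqrt{(V_{ik} - \lambda_i)^2 + \sigma_{ik}^2}} \Bigr), \tag{8}$$

where $\lambda_i$ satisfies the following normalization condition:

$$\sum_{k \in \mathcal{K}} P^*_{ik} = \sum_{k \in \mathcal{K}} \frac{1}{2} \Bigl( 1 + \frac{V_{ik} - \lambda_i}{\sqrt{(V_{ik} - \lambda_i)^2 + \sigma_{ik}^2}} \Bigr) = 1. \tag{9}$$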

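We show that this choice formula can be obtained from MDM using t-distributions as the marginal distributions. Let

$$F_{ik}(\epsilon) = \frac{1}{2} \Bigl( 1 + \frac{\epsilon}{\sqrt{\sigma_{ik}^2 + \epsilon^2}} \Bigr) \tag{10}$$

denote the distribution function of $\epsilon_{ik}$. From the MDM choice probability (5), we have

$$P_{ik} = 1 - F_{ik}(\lambda_i - V_{ik}) = 1 - \frac{1}{2} \Bigl( 1 + \frac{\lambda_i - V_{ik}}{\sqrt{\sigma_{ik}^2 + (\lambda_i - V_{ik})^2}} \Bigr) = \frac{1}{2} \Bigl( 1 + \frac{V_{ik} - \lambda_i}{\sqrt{\sigma_{ik}^2 + (\lambda_i - V_{ik})^2}} \Bigr).$$

This is precisely the choice probability obtained from solving the MMM.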
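The equivalence just derived can be checked numerically. The sketch below solves the normalization condition (9) by bisection, evaluates (8), and compares the result with $1 - F_{ik}(\lambda_i - V_{ik})$ for the distribution function in (10). The utilities, standard deviations, and function names are illustrative assumptions, and at least two alternatives are assumed.

```python
import math


def mmm_probability(v, lam, sigma):
    """Equation (8): 0.5 * (1 + (v - lam) / sqrt((v - lam)^2 + sigma^2))."""
    d = v - lam
    return 0.5 * (1.0 + d / math.sqrt(d * d + sigma * sigma))


def t_type_cdf(eps, sigma):
    """Equation (10): the marginal distribution function of the error term."""
    return 0.5 * (1.0 + eps / math.sqrt(sigma * sigma + eps * eps))


def solve_mmm(V, sigmas, iterations=200):
    """Find lambda_i satisfying (9) by bisection and return (P, lambda_i)."""
    def excess(lam):
        return sum(mmm_probability(v, lam, s) for v, s in zip(V, sigmas)) - 1.0

    # Bracket the root: excess is decreasing in lam.
    lo, step = min(V), 1.0
    while excess(lo) < 0.0:
        lo -= step
        step *= 2.0
    hi, step = max(V), 1.0
    while excess(hi) > 0.0:
        hi += step
        step *= 2.0

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [mmm_probability(v, lam, s) for v, s in zip(V, sigmas)], lam


V = [0.0, 0.5, 1.0]       # illustrative utilities (assumed)
sigmas = [1.0, 2.0, 0.5]  # illustrative standard deviations (assumed)
P, lam = solve_mmm(V, sigmas)
via_mdm = [1.0 - t_type_cdf(lam - v, s) for v, s in zip(V, sigmas)]
print(P, via_mdm)
```

Both lists agree to floating-point precision, since the two expressions differ only by the sign symmetry of the distribution in (10).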

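2.3. Generalized Extreme Value Model

The MDM choice probabilities as given by (5) and (6) are quite general in the sense that different choices of the marginal distributions $F_{ik}$ of error terms $\epsilon_{ik}$ lead to different choice probabilities. We show next that this model can be related to the GEV model, namely, all GEV probabilities can be obtained from MDM.

Suppose

$$F_{ik}(\epsilon) = 1 - e^{-\epsilon}\, G_{ik}(e^{V_{i1}}, \ldots, e^{V_{iK}}; \boldsymbol{\gamma}) \quad \text{for } \epsilon \ge \ln\bigl(G_{ik}(e^{V_{i1}}, \ldots, e^{V_{iK}}; \boldsymbol{\gamma})\bigr). \tag{11}$$

Note that $F_{ik}(\cdot)$ is a valid distribution function for $\epsilon_{ik}$ under the assumptions on $G_{ik}(\cdot)$ listed in McFadden (1978), and $\boldsymbol{\gamma}$ are given parameters for the distributions. In this case, the Lagrange multiplier satisfies the following condition:

$$\sum_{k \in \mathcal{K}} P_{ik} = \sum_{k \in \mathcal{K}} \bigl(1 - F_{ik}(\lambda_i - V_{ik})\bigr) = e^{-\lambda_i} \times \sum_{k \in \mathcal{K}} e^{V_{ik}}\, G_{ik}(e^{V_{i1}}, \ldots, e^{V_{iK}}; \boldsymbol{\gamma}) = 1.$$

Solving this equation, we get

$$\lambda_i = \ln\Bigl( \sum_{k \in \mathcal{K}} e^{V_{ik}}\, G_{ik}(e^{V_{i1}}, \ldots, e^{V_{iK}}; \boldsymbol{\gamma}) \Bigr).$$

The consumer $i$’s choice probability for alternative $k$ is then given by

$$P_{ik} = \frac{e^{V_{ik} + \ln G_{ik}(e^{V_{i1}}, \ldots, e^{V_{iK}}; \boldsymbol{\gamma})}}{\sum_{l \in \mathcal{K}} e^{V_{il} + \ln G_{il}(e^{V_{i1}}, \ldots, e^{V_{iK}}; \boldsymbol{\gamma})}}. \tag{12}$$

These are exactly the choice probabilities obtained from the GEV model. Note that

$$\frac{P_{ik}}{P_{il}} = \frac{e^{V_{ik}}}{e^{V_{il}}} \cdot \frac{G_{ik}(e^{V_{i1}}, \ldots, e^{V_{iK}}; \boldsymbol{\gamma})}{G_{il}(e^{V_{i1}}, \ldots, e^{V_{iK}}; \boldsymbol{\gamma})}.$$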

Hence, by specifying appropriate choices of $G_{ik}(\cdot)$, we can overcome the limitation of the IIA property inherent in MNL. For instance, when $G_{ik}(e^{V_{i1}}, \ldots, e^{V_{iK}}; \boldsymbol{\gamma}) = \sum_{m \in \mathcal{K}} \delta_{km} V_{im}$ for all $k$, we have

$$\frac{P_{ik}}{P_{il}} = \frac{e^{V_{ik}}}{e^{V_{il}}} \cdot \frac{\sum_{m} \delta_{km} V_{im}}{\sum_{m} \delta_{lm} V_{im}}.$$

If $\delta_{km} > \delta_{lm}$, then whenever $V_{im}$ increases, $P_{ik}/P_{il}$ increases too. This captures the substitution pattern for the case when product $m$ takes away more shares from product $l$ than from product $k$. Hence, the GEV models developed this way need not satisfy the IIA property.
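The derivation in this subsection can be traced numerically. The sketch below takes the utilities and the already evaluated values of $G_{ik}(\cdot)$ as inputs, computes $\lambda_i$ and the probabilities in (12), and checks them against $1 - F_{ik}(\lambda_i - V_{ik})$ with $F_{ik}$ as in (11). The numerical values of `G` are arbitrary positive numbers chosen for illustration, and the function names are hypothetical.

```python
import math


def gev_mdm_probabilities(V, G):
    """Probabilities (12) and multiplier lambda_i for one consumer.

    V : deterministic utilities V_ik
    G : positive values of G_ik(e^{V_i1}, ..., e^{V_iK}; gamma), one per alternative
    """
    a = [v + math.log(g) for v, g in zip(V, G)]
    m = max(a)  # log-sum-exp for numerical stability
    lam = m + math.log(sum(math.exp(x - m) for x in a))
    return [math.exp(x - lam) for x in a], lam


def tail_from_11(eps, g):
    """1 - F_ik(eps) from (11), valid for eps >= ln(g)."""
    return math.exp(-eps) * g


V = [0.0, 0.5, 1.0]  # illustrative utilities (assumed)
G = [1.0, 2.0, 0.5]  # illustrative values of G_ik (assumed)
P, lam = gev_mdm_probabilities(V, G)
print(P, [tail_from_11(lam - v, g) for v, g in zip(V, G)])
```

With every entry of `G` equal to one, the output reduces to the MNL probabilities; unequal entries change the odds ratios between alternatives, which is the mechanism that relaxes IIA.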

3. Estimation Under MDM

In this section, we provide results on the parameter estimation problem under the MDM by exploiting first-order optimality conditions. Since the choice probabilities of several classical choice models can be recovered from MDM, this provides a new approach to calibrate the parameters in these models.

3.1. Convex Calibration Under MDM

McFadden (1974) showed that the maximum log-likelihood problem under MNL is a convex optimization problem. For the nested logit model, Daganzo and Kusnic (1993) showed that the problem is convex in partworth parameters for a choice of scale parameters if the mean utility is linear in the partworths. Note that in the NestL model, scale parameters are also estimated and the problem is not jointly convex in partworth parameters and scale parameters.

In the following theorem we present a general convexity result for MDM. We assume that the population is homogeneous (i.e., consumer partworths $\boldsymbol{\beta}_i = \boldsymbol{\beta}$), and $V_{ik}(\boldsymbol{\beta})$ is affine in $\boldsymbol{\beta}$. The calibration problem is to estimate the location parameters $\boldsymbol{\beta}$.

Theorem 3. The maximum log-likelihood problem under MDM can be formulated as a convex optimization problem when the tail distribution $\bar{F}_{ik}(x) := 1 - F_{ik}(x)$ of the error terms satisfies the following two conditions: for $x$ in the domain, (i) $\bar{F}_{ik}(x)$ is log concave and (ii) $\bar{F}_{ik}(x)$ is convex.

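Proof. The maximum log-likelihood problem under the MDM assuming homogeneous consumer partworths $\boldsymbol{\beta}_i = \boldsymbol{\beta}$ can be formulated as the optimization problem

$$\begin{aligned} \max_{\boldsymbol{\lambda}, \boldsymbol{\beta}} \quad & \sum_{i \in \mathcal{I}} \sum_{k \in \mathcal{K}} z_{ik} \ln\bigl(1 - F_{ik}(\lambda_i - \boldsymbol{\beta}' \mathbf{x}_{ik})\bigr) \\ \text{s.t.} \quad & \sum_{k \in \mathcal{K}} \bigl(1 - F_{ik}(\lambda_i - \boldsymbol{\beta}' \mathbf{x}_{ik})\bigr) = 1, \quad i \in \mathcal{I}, \end{aligned} \tag{13}$$

where $z_{ik} = 1$ if the consumer $i \in \mathcal{I}$ chooses alternative $k \in \mathcal{K}$ and 0 otherwise. It is easy to see that the constraints in (13) can be relaxed to

$$\sum_{k \in \mathcal{K}} \bigl(1 - F_{ik}(\lambda_i - \boldsymbol{\beta}' \mathbf{x}_{ik})\bigr) \le 1, \tag{14}$$

since both the objective function and the left-hand side of the unique constraint involving $\lambda_i$ are decreasing in $\lambda_i$. In the optimal solution, $\lambda_i$ is hence chosen to be the value such that

$$\sum_{k \in \mathcal{K}} \bigl(1 - F_{ik}(\lambda_i - \boldsymbol{\beta}' \mathbf{x}_{ik})\bigr) = 1.$$

Under the assumption that $\bar{F}_{ik}(x) = 1 - F_{ik}(x)$ is both log concave and convex, the MLE in (13), with the constraints replaced by (14), is a convex optimization problem. $\square$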

Note that $\bar{F}_{ik}(x)$ is log concave if $f_{ik}(x)$ is log concave. Furthermore, convexity of $\bar{F}_{ik}(x)$ implies that $f_{ik}(x)$, and hence $\log(f_{ik}(x))$, is nonincreasing. Hence, any density function with $\log(f(x))$ concave and nonincreasing will satisfy the conditions of the theorem. This includes $f(x) = e^{-x}$ as a special case, which corresponds to the well-known MNL choice model.
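For the special case $f(x) = e^{-x}$, the structure of (13) and (14) can be illustrated in a few lines. With $\bar{F}(x) = e^{-x}$, the objective term is $\ln \bar{F}(\lambda_i - \boldsymbol{\beta}'\mathbf{x}_{ik}) = \boldsymbol{\beta}'\mathbf{x}_{ik} - \lambda_i$, and the tight constraint gives $\lambda_i = \ln \sum_k e^{\boldsymbol{\beta}'\mathbf{x}_{ik}}$. The sketch below evaluates the resulting log-likelihood on a tiny synthetic data set (invented purely for illustration) and spot-checks concavity in $\boldsymbol{\beta}$ along one segment; it is not the estimation procedure of the paper.

```python
import math


def log_likelihood(beta, X, choices):
    """Objective of (13) for F(x) = 1 - exp(-x), with lambda_i eliminated.

    X[i][k]    : attribute vector x_ik of alternative k for consumer i
    choices[i] : index k of the chosen alternative (z_ik = 1)
    """
    total = 0.0
    for x_i, c in zip(X, choices):
        V = [sum(b * xj for b, xj in zip(beta, x)) for x in x_i]
        m = max(V)
        # lambda_i that makes constraint (14) hold with equality
        lam = m + math.log(sum(math.exp(v - m) for v in V))
        total += V[c] - lam  # ln(1 - F(lam - V_ic)) = V_ic - lam
    return total


# Synthetic data: two consumers, three alternatives, two attributes (assumed).
X = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    [[0.5, 1.0], [1.0, 0.5], [0.0, 0.0]],
]
choices = [0, 1]

b1, b2 = [0.0, 0.0], [2.0, -1.0]
mid = [0.5 * (u + v) for u, v in zip(b1, b2)]
print(log_likelihood(b1, X, choices), log_likelihood(mid, X, choices),
      log_likelihood(b2, X, choices))
```

The value at the midpoint is at least the average of the two endpoint values, as concavity requires; a spot check of this kind does not replace the proof, but it is a cheap sanity test when coding other marginals.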

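3.2. Nested Logit

In the classical nested logit model, we have the GEV function

$$
G_{ik}\bigl(y_1,\dots,y_K;\ \boldsymbol{\gamma} = \{\gamma_r\}_r\bigr) = \sum_{r=1}^{R}\Bigl(\sum_{k\in B_r} y_k^{1/\gamma_r}\Bigr)^{\gamma_r},
$$

where the variables are partitioned into $r = 1,\dots,R$ blocks, each with $B_r$ elements. The model is known to be consistent with utility maximizing behavior for $\gamma_r \in (0,1]$. Let $B_r(k)$ denote the block containing element $k$. The estimation problem for this case reduces to

$$
\begin{aligned}
(\text{NestL})\quad \max_{\boldsymbol{\lambda},\boldsymbol{\beta}}\quad & \sum_{i\in\mathcal{I}}\sum_{k\in\mathcal{K}} z_{ik}\Biggl(-\lambda_i + \boldsymbol{\beta}'x_{ik} + \ln\Bigl(\sum_{l\in B_r(k)} e^{(1/\gamma_r)(\boldsymbol{\beta}'x_{il} - \boldsymbol{\beta}'x_{ik})}\Bigr)^{\gamma_r - 1}\Biggr)\\
\text{s.t.}\quad & \sum_{k\in\mathcal{K}} \Bigl(\sum_{l\in B_r(k)} e^{(1/\gamma_r)(\boldsymbol{\beta}'x_{il} - \boldsymbol{\beta}'x_{ik})}\Bigr)^{\gamma_r - 1} \times e^{-\lambda_i + \boldsymbol{\beta}'x_{ik}} \le 1,\quad i\in\mathcal{I}.
\end{aligned}
\tag{15}
$$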

Using the theory of constrained optimization under MDM, this approach provides an alternative proof of the result in Daganzo and Kusnic (1993).

Theorem 4. The estimation problem (15) for nested logit is a convex optimization problem if $\gamma_r \in (0,1]$.

Proof. See the appendix.

3.3. Marginal Exponential Model

As a generalization, we consider a heterogeneous version of MDM in this section where the exponential marginal distributions potentially have different scale parameters. This model is inspired by the heteroskedastic extreme value model of Bhat (1995). Consider an instance of MDM, which we call the marginal exponential model (MEM), where the scale parameters for each alternative and each consumer are given by $\alpha_{ik} > 0$ (possibly different) and the marginal distribution is given as

$$
F_{ik}(\epsilon) = 1 - e^{-\alpha_{ik}\epsilon}\quad\text{for } \epsilon \ge 0. \tag{16}
$$


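Thus, MEM captures heteroskedasticity in the error terms as in the HEV model. For given scale parameters $\alpha_{ik}$, the estimation problem under MEM reduces to the following convex formulation:

$$
\begin{aligned}
(\text{MEM})\quad \max_{\boldsymbol{\lambda},\boldsymbol{\beta}}\quad & \sum_{i\in\mathcal{I}}\sum_{k\in\mathcal{K}} z_{ik}\,\alpha_{ik}(\boldsymbol{\beta}'x_{ik} - \lambda_i)\\
\text{s.t.}\quad & \sum_{k\in\mathcal{K}} e^{\alpha_{ik}(\boldsymbol{\beta}'x_{ik} - \lambda_i)} \le 1,\quad i\in\mathcal{I}.
\end{aligned}
\tag{17}
$$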

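However, when the scale parameters also need to be estimated, the maximum log-likelihood problem is no longer jointly convex in the $(\boldsymbol{\beta},\boldsymbol{\lambda})$ and $\boldsymbol{\alpha}$ variables. In this case, the problem is formulated as

$$
\begin{aligned}
\max_{\boldsymbol{\lambda},\boldsymbol{\beta},\boldsymbol{\alpha}}\quad & \sum_{i\in\mathcal{I}}\sum_{k\in\mathcal{K}} z_{ik}\,\alpha_{ik}(\boldsymbol{\beta}'x_{ik} - \lambda_i)\\
\text{s.t.}\quad & \sum_{k\in\mathcal{K}} e^{\alpha_{ik}(\boldsymbol{\beta}'x_{ik} - \lambda_i)} = 1,\quad i\in\mathcal{I},\\
& \boldsymbol{\alpha}\in C.
\end{aligned}
\tag{18}
$$

For model identification and to avoid overfitting, it is necessary to restrict the possible values of the scale parameters $\boldsymbol{\alpha}$ to be chosen from an appropriate set $C$.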
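To make the role of $\lambda_i$ in (17) and (18) concrete, the following is a minimal sketch (not the authors' AMPL implementation) that, for one consumer with given mean utilities and scale parameters, finds the $\lambda_i$ satisfying the normalization constraint by bisection and returns the MEM choice probabilities $e^{\alpha_{ik}(\boldsymbol{\beta}'x_{ik} - \lambda_i)}$. The function name `mem_probabilities` and the numerical inputs are hypothetical.

```python
import numpy as np


def mem_probabilities(v, alpha, n_iter=200):
    """Return (lam, probs) with sum_k exp(alpha_k * (v_k - lam)) = 1.

    v: mean utilities beta' x_ik for one consumer (hypothetical inputs).
    alpha: positive scale parameters alpha_ik for the same consumer.
    """
    v = np.asarray(v, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # The constraint's left-hand side is decreasing in lam.
    # At lam = max(v) it is >= 1; at the upper bracket every term is <= 1/K.
    lo = v.max()
    hi = v.max() + np.log(len(v)) / alpha.min()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.exp(alpha * (v - mid)).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, np.exp(alpha * (v - lam))


# Heteroskedastic example: the log-likelihood term for a chosen
# alternative k is alpha_k * (v_k - lam), i.e. log(probs[k]).
lam, probs = mem_probabilities([1.0, 0.5, -0.2, 0.0], [0.5, 0.5, 0.5, 2.5])
```

With all scale parameters equal to 1, the same routine returns the MNL probabilities, which is a convenient sanity check.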

3.4. Handling Partworth and Scale Heterogeneity

A key issue in discrete choice modeling is to handle heterogeneity in the data. Evgeniou et al. (2007) proposed a class of convex optimization formulations in the discrete choice setting to handle this issue, which can be used as an alternative to the mixed logit model (Allenby and Rossi 1999). Their technique is inspired by regularization techniques from statistics. By relaxing the assumption that the consumers come from a homogeneous population with a common $\boldsymbol{\beta}$, Evgeniou et al. (2007) allow for consumers to have their individual partworths $\boldsymbol{\beta}_i$. We refer to the model in Evgeniou et al. (2007) as $\beta$-HetMNL. For $\beta$-HetMNL, the log-likelihood objective is supplemented with a regularization term that pools information among the consumers and shrinks the individual partworths toward the population mean. The convex optimization problem for the $\beta$-HetMNL model is formulated as

$$
\begin{aligned}
(\beta\text{-HetMNL})\quad \max_{\boldsymbol{\beta}_i,\boldsymbol{\beta}_0,D}\quad & \Biggl\{\sum_{i\in\mathcal{I}}\sum_{k\in\mathcal{K}} z_{ik}\ln\Biggl(\frac{e^{\boldsymbol{\beta}_i'x_{ik}}}{\sum_{l\in\mathcal{K}} e^{\boldsymbol{\beta}_i'x_{il}}}\Biggr) - \eta\sum_{i\in\mathcal{I}} (\boldsymbol{\beta}_i - \boldsymbol{\beta}_0)'D^{-1}(\boldsymbol{\beta}_i - \boldsymbol{\beta}_0)\Biggr\}\\
\text{s.t.}\quad & D \succeq 0,\quad \operatorname{trace}(D) = T,
\end{aligned}
\tag{19}
$$

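where the parameter $\eta \ge 0$ captures the trade-off between the log-likelihood criterion and the regularization term. For a fixed value of $\eta$, this problem is convex. The value of $\eta$ is set with cross-validation techniques that could use either out-of-sample data or, alternatively, bootstrapping methods with in-sample data. The matrix variable $D$ is constrained to be a positive semidefinite matrix with trace fixed to an arbitrary constant $T$ (say 1). This matrix is related to the covariance matrix of partworths. A natural extension is to replace the MNL choice probabilities in the above formulation using choice probabilities from MDM. In particular, if we consider the setting where the scale parameters vary across different alternatives with $\alpha_{ik} = \alpha_k$ for each consumer, we can replace the MNL model with MEM to obtain the following $\beta$-HetMEM model:

$$
\begin{aligned}
(\beta\text{-HetMEM})\quad \max_{\boldsymbol{\lambda},\boldsymbol{\beta}_i,\boldsymbol{\beta}_0,\alpha_k,D}\quad & \Biggl\{\sum_{i\in\mathcal{I}}\sum_{k\in\mathcal{K}} z_{ik}\,\alpha_k(\boldsymbol{\beta}_i'x_{ik} - \lambda_i) - \eta\sum_{i\in\mathcal{I}} (\boldsymbol{\beta}_i - \boldsymbol{\beta}_0)'D^{-1}(\boldsymbol{\beta}_i - \boldsymbol{\beta}_0)\Biggr\}\\
\text{s.t.}\quad & \sum_{k\in\mathcal{K}} e^{\alpha_k(\boldsymbol{\beta}_i'x_{ik} - \lambda_i)} = 1,\quad i\in\mathcal{I},\\
& D \succeq 0,\quad \operatorname{trace}(D) = T,\\
& \boldsymbol{\alpha}\in C.
\end{aligned}
\tag{20}
$$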

The $\beta$-HetMEM model has $K$ additional scale parameters in $\alpha_k$ compared with the $\beta$-HetMNL model and it overcomes the constant scale assumption across alternatives in $\beta$-HetMNL. When the scale parameters are identical for different $k$, it reduces to the $\beta$-HetMNL model.

A key feature of MEM is that it has additional flexibility to capture heterogeneity in the scale parameters at the consumer and attribute level. The importance of incorporating scale heterogeneity in the random component of the utilities has been observed by several researchers including Bhat (1995), Louviere (1988), Louviere and Swait (2010), Louviere et al. (2002), Salisbury and Feinberg (2010), and Fiebig et al. (2010). We propose a new class of choice models with scale heterogeneity based on MEM to address this issue. Our model is inspired by the optimization formulation of Evgeniou et al. (2007), but differs in that we assume that $\boldsymbol{\beta}$ is common across the consumers, but allow for the scale parameters to have different values for consumers across products/alternatives. We term this model the $\alpha$-HetMEM model. The optimization problem under $\alpha$-HetMEM is formulated as

$$
\begin{aligned}
(\alpha\text{-HetMEM})\quad \max_{\boldsymbol{\lambda},\boldsymbol{\beta},\alpha_{ik},\alpha_k}\quad & \Biggl\{\sum_{i\in\mathcal{I}}\sum_{k\in\mathcal{K}} z_{ik}\,\alpha_{ik}(\boldsymbol{\beta}'x_{ik} - \lambda_i) - \eta\sum_{i\in\mathcal{I}}\sum_{k\in\mathcal{K}} (\alpha_{ik} - \alpha_k)^2\Biggr\}\\
\text{s.t.}\quad & \sum_{k\in\mathcal{K}} e^{\alpha_{ik}(\boldsymbol{\beta}'x_{ik} - \lambda_i)} = 1,\quad i\in\mathcal{I},\\
& \boldsymbol{\alpha}\in C.
\end{aligned}
\tag{21}
$$


The parameter $\eta$ captures the trade-off between the log-likelihood criterion and the regularization term for the scale parameters. Once again for model identification, we add in constraints that require $\boldsymbol{\alpha}$ to be chosen from a set $C$.

Note that ($\alpha$-HetMEM) has significantly fewer parameters than ($\beta$-HetMEM) and ($\beta$-HetMNL) if the number of alternatives is much smaller than the number of attributes defining the alternatives, i.e., $|\{\alpha_{ik}\}| \ll |\{\boldsymbol{\beta}_i\}|$. In the data set obtained from General Motors (see §5) this is indeed the case since the choice tasks involve consumers selecting one from a few safety feature packages that are created from combinations of many attributes. Our computational results with the empirical data set from General Motors also indicate that ($\alpha$-HetMEM) can in this case explain much of the taste variation with far fewer variables, as compared with ($\beta$-HetMEM) and ($\beta$-HetMNL). This helps confirm the usefulness of capturing scale heterogeneity in choice models.
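As a rough illustration of this comparison, the parameter counts reported in Table 2 for the General Motors data (500 consumers, $A = 71$ coded attributes) can be reproduced under one plausible accounting. The breakdown below is an assumption inferred from the reported totals, not a decomposition stated in the text: individual partworths, a population mean, and the entries of $D$ for the partworth-heterogeneity model; common partworths, one free scale per consumer, and one population-level scale for the scale-heterogeneity model.

$$
\begin{aligned}
P_{\text{partworth het., MNL}} &= \underbrace{500\times 71}_{\boldsymbol{\beta}_i} + \underbrace{71}_{\boldsymbol{\beta}_0} + \underbrace{71^2}_{D} = 35{,}500 + 71 + 5{,}041 = 40{,}612,\\
P_{\text{scale het., MEM}} &= \underbrace{71}_{\boldsymbol{\beta}} + \underbrace{500}_{\text{one scale per consumer}} + \underbrace{1}_{\text{population scale}} = 572.
\end{aligned}
$$

Under this accounting the scale-heterogeneity model uses roughly 1.4% of the parameters of the partworth-heterogeneity model.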

3.5. Asymptotic Variance of the Maximum Log-Likelihood Estimators (MLE)

In this section, we derive a methodology to obtain standard error estimates of the partworth parameters in MDM. We assume that the population is homogeneous in consumer partworths. Let $\boldsymbol{\beta}^*$ be the maximum log-likelihood estimator and $\boldsymbol{\beta}_0$ be the true partworth parameter vector. From the theory of maximum likelihood estimation, it is well known that under regularity conditions (see footnote 4), the maximum likelihood estimator has the following asymptotic properties:

(a) Consistency. $\boldsymbol{\beta}^*$ converges in probability to $\boldsymbol{\beta}_0$.

(b) Asymptotic normality. As the sample size increases, the distribution of $\boldsymbol{\beta}^*$ approaches that of a Gaussian random vector with mean $\boldsymbol{\beta}_0$ and covariance matrix $[I(\boldsymbol{\beta}_0)]^{-1}$, where $I(\boldsymbol{\beta}_0) = -E_0[\nabla^2_{\boldsymbol{\beta}_0} \ln L(\boldsymbol{\beta}_0)]$, with $\ln L(\boldsymbol{\beta}_0)$ denoting the log-likelihood value at $\boldsymbol{\beta}_0$. When the form of the information matrix $I(\boldsymbol{\beta}_0)$ is not available, the following estimator of the information matrix can be used:

$$
\hat{I}(\boldsymbol{\beta}^*) = -\bigl[\nabla^2_{\boldsymbol{\beta}} \ln L(\boldsymbol{\beta})\bigr]_{\boldsymbol{\beta}^*}. \tag{22}
$$

We assume that the log-likelihood function in (13) satisfies the standard regularity conditions. We now evaluate second-order partial derivatives of the log-likelihood function under the MDM at the MLE. For models such as mixed logit and multinomial probit, the evaluation of these partial derivatives is done numerically by manipulating the first-order derivatives of the simulated log likelihood. The estimation is not only computationally challenging, but is also prone to errors. In the case of MDM, analytical expressions can be derived for the second-order partial derivatives. The maximum likelihood problem (13) involves normalization constraints, and additional Lagrange multipliers that need to be accounted for in deriving analytical expressions for second-order partial derivatives. The key step involves deriving partial derivatives of $\lambda_i$ with respect to the MLE using the normalization constraints. A step-wise method for finding the analytical expressions of second-order partial derivatives for general distribution functions $F_{ik}$ is outlined next.

4 Details of these regularity conditions can be found in Greene (2011) and references provided therein. These conditions would typically require at least the existence of MLE in the interior of the feasibility set, and a well-behaved $1 - F_{ik}(\lambda_i - V_{ik}(\boldsymbol{\beta}))$, which is continuous and twice differentiable as a function of estimated parameters, etc. Given that MDM replicates GEV of which nested logit and MNL are special cases, for a large class of extreme choice distributions under MDM, consistency and efficiency of MLE follow for these models (see, e.g., McFadden 1974 for MNL and Brownstone and Small 1989 for nested logit).

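We present the method when only the partworth parameters $\boldsymbol{\beta}$ are estimated. The log likelihood is

$$
\ln L(\boldsymbol{\beta}) = \sum_{i\in\mathcal{I}}\sum_{k\in\mathcal{K}} z_{ik}\ln\bigl(1 - F_{ik}(\lambda_i - V_{ik}(\boldsymbol{\beta}))\bigr), \tag{23}
$$

where the variables $\lambda_i$ are the solutions to the equation

$$
\sum_{k\in\mathcal{K}} \bigl(1 - F_{ik}(\lambda_i - V_{ik}(\boldsymbol{\beta}))\bigr) = 1\quad \forall\, i\in\mathcal{I}. \tag{24}
$$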

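The following steps can be used to evaluate the second-order partial derivatives of the log-likelihood function at the MLE $\boldsymbol{\beta}^*$.

Step 1. For each $i\in\mathcal{I}$ differentiate constraint (24) with respect to $\boldsymbol{\beta}$. We get the following expression:

$$
\nabla_{\boldsymbol{\beta}}\lambda_i = \frac{\sum_{k\in\mathcal{K}} f_{ik}(\lambda_i - V_{ik})\,\nabla_{\boldsymbol{\beta}}V_{ik}}{\sum_{k\in\mathcal{K}} f_{ik}(\lambda_i - V_{ik})}.
$$

Note that $V_{ik}$ is a function of $\boldsymbol{\beta}$ and $f_{ik}$ are the density functions of $F_{ik}$. The above vector can be differentiated again to find analytical expressions for $\nabla^2_{\boldsymbol{\beta}}\lambda_i$.

Step 2. For $i\in\mathcal{I}$ and $k\in\mathcal{K}$, choice probabilities are $P_{ik} = 1 - F_{ik}(\lambda_i - V_{ik})$. Differentiating these probabilities with respect to $\boldsymbol{\beta}$, we get the gradient

$$
\nabla_{\boldsymbol{\beta}}P_{ik} = -f_{ik}(\lambda_i - V_{ik})\,(\nabla_{\boldsymbol{\beta}}\lambda_i - \nabla_{\boldsymbol{\beta}}V_{ik}).
$$

Differentiating again, we get the matrix of second partial derivatives as

$$
\nabla^2_{\boldsymbol{\beta}}P_{ik} = -\Bigl[f'_{ik}(\lambda_i - V_{ik})\,(\nabla_{\boldsymbol{\beta}}\lambda_i - \nabla_{\boldsymbol{\beta}}V_{ik})(\nabla_{\boldsymbol{\beta}}\lambda_i - \nabla_{\boldsymbol{\beta}}V_{ik})' + f_{ik}(\lambda_i - V_{ik})\,(\nabla^2_{\boldsymbol{\beta}}\lambda_i - \nabla^2_{\boldsymbol{\beta}}V_{ik})\Bigr].
$$

These expressions can easily be derived using first- and second-order derivatives of $\lambda_i$ found in Step 1. Note that, in the case when $V_{ik}$ are linear in parameters, $\nabla^2_{\boldsymbol{\beta}}V_{ik} = 0$.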


Step 3. The final step involves differentiating the log-likelihood (23) with respect to $\boldsymbol{\beta}$ twice to find the expression for second-order partial derivatives of the log likelihood:

$$
\nabla^2_{\boldsymbol{\beta}}\ln L(\boldsymbol{\beta}) = \sum_{i\in\mathcal{I}}\sum_{k\in\mathcal{K}} \frac{z_{ik}}{P_{ik}}\Biggl(\nabla^2_{\boldsymbol{\beta}}P_{ik} - \frac{\nabla_{\boldsymbol{\beta}}P_{ik}\,\nabla_{\boldsymbol{\beta}}P_{ik}'}{P_{ik}}\Biggr).
$$

Using expressions derived in Step 2, it is possible to evaluate this at the MLE $\boldsymbol{\beta}^*$.

These values can thus be used to find standard errors of estimators in MDM. In our computational experiments, with additional constraints imposed on the scale parameters, we found that standard error estimates of the scale parameters were much less reliable. Hence, we restrict our attention to finding standard error estimates for the $\boldsymbol{\beta}$ parameters in this paper.
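Steps 1 to 3 can be sketched in code for the simplest case. The block below is an illustrative sketch, not the authors' implementation: it assumes linear utilities and unit-rate exponential marginals (the MNL special case), for which $\lambda_i$ has a closed form, and it obtains $\nabla^2_{\boldsymbol{\beta}}\lambda_i$ by differentiating the Step 1 identity once more (a derivation of ours, valid when $\nabla^2_{\boldsymbol{\beta}}V_{ik} = 0$). All function names and the random data are hypothetical. The result is compared with the textbook MNL Hessian as a consistency check.

```python
import numpy as np


def mdm_hessian_exponential(beta, X, z):
    """Hessian of ln L via Steps 1-3, assuming F(u) = 1 - exp(-u).

    X has shape (n, K, A): attributes x_ik; z has shape (n, K): one-hot choices.
    """
    n, K, A = X.shape
    H = np.zeros((A, A))
    for i in range(n):
        V = X[i] @ beta                        # V_ik = beta' x_ik (linear)
        lam = np.log(np.exp(V).sum())          # solves (24) in closed form here
        u = lam - V
        f, f_prime = np.exp(-u), -np.exp(-u)   # density and its derivative at u
        # Step 1: gradient and Hessian of lambda_i
        grad_lam = (f[:, None] * X[i]).sum(axis=0) / f.sum()
        d = grad_lam[None, :] - X[i]           # rows: grad(lambda_i) - grad(V_ik)
        dd = d[:, :, None] * d[:, None, :]     # outer products, shape (K, A, A)
        hess_lam = -(f_prime[:, None, None] * dd).sum(axis=0) / f.sum()
        # Step 2: derivatives of P_ik = 1 - F(u) = exp(-u)
        P = np.exp(-u)
        grad_P = -f[:, None] * d
        hess_P = -(f_prime[:, None, None] * dd + f[:, None, None] * hess_lam[None])
        # Step 3: accumulate (z_ik / P_ik) * (hess P_ik - grad P_ik grad P_ik' / P_ik)
        for k in range(K):
            H += z[i, k] / P[k] * (hess_P[k] - np.outer(grad_P[k], grad_P[k]) / P[k])
    return H


def mnl_hessian(beta, X):
    """Textbook MNL Hessian: -sum_i sum_k P_ik (x_ik - xbar_i)(x_ik - xbar_i)'."""
    H = np.zeros((X.shape[2], X.shape[2]))
    for Xi in X:
        P = np.exp(Xi @ beta)
        P /= P.sum()
        c = Xi - P @ Xi
        H -= (P[:, None] * c).T @ c
    return H


# Hypothetical data; beta is an arbitrary point here, not a fitted MLE.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4, 3))
beta = np.array([1.0, -0.5, 2.0])
z = np.eye(4)[rng.integers(0, 4, size=200)]
H = mdm_hessian_exponential(beta, X, z)
std_err = np.sqrt(np.diag(np.linalg.inv(-H)))  # standard errors as implied by (22)
```

For other marginals, only the lines defining `lam`, `f`, `f_prime`, and `P` would change; `lam` would then have to be found numerically from (24).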

4. Computational Experiments Using Simulated Data

In this section, we test for the empirical identification of the partworth and scale parameters in MEM using a small-scale simulation study with alternative specific scale parameters $\alpha_{ik} = \alpha_k$. Consider a choice set with alternatives indexed as $\mathcal{K} = \{1,2,3,4\}$, where alternative 4 is the outside option. The utility of the four alternatives for consumer $i$ is defined as

$$
\begin{aligned}
U_{ik} &= \mathrm{ASC}_k + \beta_1 x_{ik1} + \beta_2 x_{ik2} + \beta_3 x_{ik3} + \epsilon_{ik}\quad \forall\, i\in\mathcal{I},\ \forall\, k\in\{1,2,3\},\\
U_{i4} &= \epsilon_{i4}\quad \forall\, i\in\mathcal{I}.
\end{aligned}
$$

Figure 2 Comparison of True and Estimated Parameters from MEM
[Figure: scatter plots of estimated versus true values of the ASC, $\boldsymbol{\beta}$, and $\boldsymbol{\alpha}$ parameters, with one column of panels each for 5,000, 10,000, 15,000, and 20,000 simulations.]

The constants $\mathrm{ASC}_k$ denote the alternative specific constants for the first three alternatives and $\boldsymbol{\beta} = (\beta_1,\beta_2,\beta_3)$ denotes the partworths for three attributes common to the alternatives. The attribute values $x_{ik} = (x_{ik1}, x_{ik2}, x_{ik3})$ were generated from random independent draws uniformly chosen in the interval $[0,1]$. The choice tasks were simulated for 15 instances of randomly generated parameters as follows: alternative specific constants $\mathrm{ASC}_k$ were chosen randomly in the range $[-1,1]$, partworths $\boldsymbol{\beta}$ were generated randomly in the range $[-2,2]$, and scale parameters $\boldsymbol{\alpha} = (\alpha_1,\alpha_2,\alpha_3)$ were generated randomly in the range $[0.04, 2]$. The scale parameter for the outside option was set to 1. For a given instance of the parameters, we simulated the error terms from the extremal distribution in MEM (see §2) for $I = 5{,}000$, $I = 10{,}000$, $I = 15{,}000$, and $I = 20{,}000$ consumers. Using the choices made by the consumers in the simulation, we estimated the parameters using the following log-likelihood estimation model based on (18):

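$$
\begin{aligned}
\max_{\boldsymbol{\lambda},\mathrm{ASC},\boldsymbol{\beta},\boldsymbol{\alpha}}\quad & \Biggl\{\sum_{i\in\mathcal{I}}\sum_{k\in\{1,2,3\}} z_{ik}\,\alpha_k(\mathrm{ASC}_k + \boldsymbol{\beta}'x_{ik} - \lambda_i) + \sum_{i\in\mathcal{I}} z_{i4}(-\lambda_i)\Biggr\}\\
\text{s.t.}\quad & \sum_{k\in\{1,2,3\}} e^{\alpha_k(\mathrm{ASC}_k + \boldsymbol{\beta}'x_{ik} - \lambda_i)} + e^{-\lambda_i} = 1,\quad i\in\mathcal{I},\\
& 0 \le \alpha_k \le 2,\quad k\in\{1,2,3\}.
\end{aligned}
\tag{25}
$$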

The optimization problems were coded in AMPL and solved using the LOQO solver. The scatter plots of the true and estimated parameters for the 15 sets of parameters are shown in Figure 2. From Figure 2, we observe that as the number of simulated choices increases, the estimated parameters clearly get closer to the true parameters. If one knows the true values of the $\alpha_k$ parameters, then from our previous results, the estimation problem is convex, and hence one would be guaranteed to converge to the true solution. Since we also estimate the scale parameters for the alternatives, the optimization problem is nonconvex. The results indicate that, although the problem is nonconvex, the LOQO solver in conjunction with an appropriate optimization formulation is able to identify the $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ parameters fairly well in this example.

Figure 3 Comparison of True Parameters in MNL with Different Scale Parameters and Estimated Parameters from MEM
[Figure: scatter plots of estimated versus true values of the ASC, $\boldsymbol{\beta}$, and $\boldsymbol{\alpha}$ parameters for 20,000 simulations.]

To test the robustness of the parameter estimates, we simulated choices by varying the underlying distribution of the error terms. Results in §2 imply that the MEM choice probabilities with exponential marginals are the same as the MNL choice probabilities with Gumbel distributions for the identical scale parameter case. In our simulations, we generate heteroskedastic error distributions using independent Gumbel random variables with $F_{ik}(\epsilon) = e^{-e^{-\alpha_k \epsilon}}$ and simulate choices. Using the optimization formulation for MEM, we then estimate the parameters. The results are provided in Figure 3 for 20,000 simulated choices. The results indicate that the model is fairly effective in identifying the $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ parameters despite model misspecification. For the alternative specific constants, we found that in four of the 45 parameters there was a fairly big difference between the actual values of the ASC parameters and the estimated ASC parameters. A more in-depth study of these cases showed that this corresponds to alternatives with very low scale parameters $\alpha_k \le 0.1$. In this case, the presence of the product term $\alpha_k \mathrm{ASC}_k$ in the estimation model results in more extreme estimates of the alternative specific constant in the optimization formulation.
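The heteroskedastic Gumbel simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: errors are drawn by inverting $F(\epsilon) = e^{-e^{-\alpha_k\epsilon}}$, each consumer picks the alternative with the highest utility, and the function name `simulate_gumbel_choices` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_gumbel_choices(V, alpha, rng):
    """Simulate utility-maximizing choices with independent Gumbel errors.

    V: (n, K) mean utilities; alpha: (K,) scale parameters.
    Each error has CDF F(eps) = exp(-exp(-alpha_k * eps)).
    """
    U = rng.uniform(size=V.shape)
    eps = -np.log(-np.log(U)) / alpha      # inverse-CDF sampling
    return np.argmax(V + eps, axis=1)      # index of the chosen alternative


rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(size=(n, 3, 3))            # attributes in [0, 1], as in Section 4
asc = np.array([0.3, -0.4, 0.8])           # hypothetical ASC values in [-1, 1]
beta = np.array([1.5, -1.0, 0.5])          # hypothetical partworths in [-2, 2]
alpha = np.array([0.6, 1.2, 1.8, 1.0])     # outside option scale fixed at 1
V = np.concatenate([asc + x @ beta, np.zeros((n, 1))], axis=1)
choices = simulate_gumbel_choices(V, alpha, rng)
```

When all scale parameters equal 1 and the mean utilities are the same for every consumer, the empirical choice frequencies should be close to the MNL probabilities, which gives a quick check of the sampler.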

5. Computational Experiments Using Empirical Data

In this section, we apply the models developed in this paper to empirical data provided by General Motors. The choice-based conjoint data were collected by General Motors to understand consumers' trade-offs for various vehicle features. We were provided conjoint choice data for a total of 500 consumers. Each consumer performed 19 choice tasks. The data were collected using the CBC/Web system of the choice-based conjoint from Sawtooth Software (2008). There were 19 feature attributes and one "price" attribute. Each attribute has between 2 and 11 levels. In each choice task, three alternatives (feature packages) were shown based on a partial profile conjoint experiment (see Chrzan and Elrod 1995), where only information on 10 attributes was shown at a time, with the "price" attribute being shown in all tasks. In each choice task, the consumer either chose one of the three alternatives or the outside option. The latter is referred to as the "no choice alternative." (See Figure 4 for a sample choice task.)

The complete list of product attributes and the attribute level codes are shown in Table 1. Although the full description of each of the attributes and their levels were shown to consumers, we use simple codes to represent them in Table 1. As an illustration, the product attribute NS refers to "navigation system." Level NS1 refers to "standard navigation system," NS2 refers to "navigation system with curve notification," NS3 refers to "navigation system with speed advisor," NS4 refers to "navigation system with curve notification and speed advisor," whereas NS5 refers to "not present." For the "no choice alternative," since there is no information on the attributes, we use zero to denote the mean utility of the alternative.

Apart from the feature package attributes, the data also contained information regarding consumers' demographic profiles such as age, education, income, gender, and percentage of time the consumer drives at night.

5.1. Classes of Choice Models

We first compare the estimation and prediction results on the General Motors choice data using MNL, NestL, and MEM. We compare the three models, both with and without heterogeneity. We estimated the parameters for MNL, NestL, and MEM by maximizing the log-likelihood function. For each respondent $i\in\mathcal{I} = \{1,\dots,500\}$, we use the first 12 of the 19 choice tasks performed by the consumer to estimate the parameters of the models. Thus, each consumer $i$ performs $j\in\mathcal{J} = \{1,\dots,12\}$ choice tasks in the calibration step. This corresponds to a total of 6,000 in-sample data points (consumer-choice task pairs). The set of available alternatives for each choice task is $k\in\mathcal{K} = \{1,\dots,4\}$, where the fourth alternative is used to represent the "no choice" alternative. The product attribute vector for consumer $i$ in choice task $j$ for alternative $k$ is denoted by $x_{ijk}$ (see footnote 5). This is coded as effect-type dummy variables, where each element of $x_{ijk}$ was set to 1 if the corresponding attribute level was present in the choice task and 0 otherwise. The total number of attributes in this data set was $A = 71$. We use the calibrated model to predict the choices for the remaining seven tasks for each consumer. This corresponded to a total of 3,500 out-of-sample data points (consumer-choice task pairs).

Figure 4 A Sample Choice Task

Which of the following packages would you prefer the most? Choose by clicking one of the buttons below.

| Package 1 | Package 2 | Package 3 |
|---|---|---|
| Full speed range adaptive cruise control | Adaptive cruise control | Full speed range adaptive cruise control |
| - | Navigation system with curve notification & speed advisor | Traditional navigation system |
| Traditional back up aid | Rear vision system | - |
| Lane departure warning | - | - |
| Front collision warning | - | Front collision warning |
| Side head air bags | Side body air bags | Side body & head air bags |
| Emergency notification with pictures | Emergency notification | - |
| - | - | Night vision with pedestrian detection |
| - | Head up display | Head up display |
| Option package price: $3,000 | Option package price: $500 | Option package price: $12,000 |

None: I wouldn't purchase any of these packages

Table 1 Attribute and Level Codes

| Serial no. | Attribute name | Attribute code | No. of levels | Level codes |
|---|---|---|---|---|
| 1 | Cruise control | CC | 3 | CC1, CC2, CC3 |
| 2 | Go notifier | GN | 2 | GN1, GN2 |
| 3 | Navigation system | NS | 5 | NS1, NS2, NS3, NS4, NS5 |
| 4 | Backup aids | BU | 6 | BU1, BU2, BU3, BU4, BU5, BU6 |
| 5 | Front park assist | FA | 2 | FA1, FA2 |
| 6 | Lane departure | LD | 3 | LD1, LD2, LD3 |
| 7 | Blind zone alert | BZ | 3 | BZ1, BZ2, BZ3 |
| 8 | Front collision warning | FC | 2 | FC1, FC2 |
| 9 | Front collision protection | FP | 4 | FP1, FP2, FP3, FP4 |
| 10 | Rear collision protection | RP | 2 | RP1, RP2 |
| 11 | Parallel park aids | PP | 3 | PP1, PP2, PP3 |
| 12 | Knee air bags | KA | 2 | KA1, KA2 |
| 13 | Side air bags | SC | 4 | SC1, SC2, SC3, SC4 |
| 14 | Emergency notification | TS | 3 | TS1, TS2, TS3 |
| 15 | Night vision system | NV | 3 | NV1, NV2, NV3 |
| 16 | Driver assisted adjustments | MA | 4 | MA1, MA2, MA3, MA4 |
| 17 | Low speed braking assist | LB | 4 | LB1, LB2, LB3, LB4 |
| 18 | Adaptive front lighting | AF | 3 | AF1, AF2, AF3 |
| 19 | Head up display | HU | 2 | HU1, HU2 |
| 20 | Price | Price | 11 | $500, 1,000, 1,500, 2,000, 2,500, 3,000, 4,000, 5,000, 7,500, 10,000, 12,000 |

The comparison study was carried out between three classes of models:

(a) Consumer homogeneity. In the simplest model, we assume that all the consumers are homogeneous in their partworth preferences and scales of their idiosyncratic errors for different options. Under MNL, the maximum log-likelihood problem is formulated as a convex optimization problem:

$$
(\text{MNL})\quad \max_{\boldsymbol{\beta}}\ \sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}} z_{ijk}\ln\Biggl(\frac{e^{\boldsymbol{\beta}'x_{ijk}}}{\sum_{l\in\mathcal{K}} e^{\boldsymbol{\beta}'x_{ijl}}}\Biggr), \tag{26}
$$

where $z_{ijk} = 1$ if the consumer $i\in\mathcal{I}$ in choice task $j\in\mathcal{J}$ chooses the alternative $k\in\mathcal{K}$ and 0 otherwise.

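Under MEM, the marginal distribution of each error term is modeled as an exponential distribution: $P(\epsilon_{ijk} \le t) = 1 - e^{-\alpha_k t}$. The parameter $\alpha_k$ in this case corresponds to the scale parameter for each of the four choices $k\in\mathcal{K}$. Since the alternatives shown to the consumers varied with tasks, we consider the simplest possible model where the scale parameters for the first three alternatives are assumed to be identical ($\alpha_1 = \alpha_2 = \alpha_3$), but the scale parameter for the no choice alternative can be different. The maximum log-likelihood problem in this case is

$$
\begin{aligned}
(\text{MEM})\quad \max_{\boldsymbol{\lambda},\boldsymbol{\beta},\alpha_k}\quad & \sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}} z_{ijk}\,\alpha_k(\boldsymbol{\beta}'x_{ijk} - \lambda_{ij})\\
\text{s.t.}\quad & \sum_{k\in\mathcal{K}} e^{\alpha_k(\boldsymbol{\beta}'x_{ijk} - \lambda_{ij})} = 1,\quad i\in\mathcal{I},\ j\in\mathcal{J},\\
& \sum_{k\in\mathcal{K}} \alpha_k = K,\\
& \alpha_1 = \alpha_2 = \alpha_3,\\
& \alpha_k \ge \mathrm{TOL},\quad k\in\mathcal{K}.
\end{aligned}
\tag{27}
$$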

The constraint $\sum_{k\in\mathcal{K}} \alpha_k = K$ serves as a normalization constraint. In our experiment, we used $K = 4$ to ensure that the sum of the scale parameters equals the number of alternatives. The constraints $\alpha_k \ge \mathrm{TOL}$ ensure that the scale parameters are strictly positive. In the experiments, we set $\mathrm{TOL} = 0.1$.

5 In the earlier sections, we have assumed that each consumer performs only one choice task, and hence we have omitted the index $j$ in our earlier discussion.

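In the nested logit model, it is natural to form the first three options as a nest, while the outside option forms another nest. The maximum log-likelihood problem is thus

$$
\begin{aligned}
(\text{NestL})\quad \max_{\boldsymbol{\beta},\gamma}\quad & \sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}\Biggl(\sum_{k=1}^{3} z_{ijk}\ln\Biggl(\frac{e^{\boldsymbol{\beta}'x_{ijk}/\gamma}\bigl(\sum_{l=1}^{3} e^{\boldsymbol{\beta}'x_{ijl}/\gamma}\bigr)^{\gamma-1}}{\bigl(\sum_{l=1}^{3} e^{\boldsymbol{\beta}'x_{ijl}/\gamma}\bigr)^{\gamma} + 1}\Biggr) - z_{ij4}\ln\Biggl(\Bigl(\sum_{l=1}^{3} e^{\boldsymbol{\beta}'x_{ijl}/\gamma}\Bigr)^{\gamma} + 1\Biggr)\Biggr)\\
\text{s.t.}\quad & \mathrm{TOL} \le \gamma \le \mathrm{UB}.
\end{aligned}
\tag{28}
$$

Note that the upper bound (UB) is imposed since the value of $\gamma$ must be within a particular range for the model to be consistent with utility-maximizing behavior (see, e.g., Train 2009). In our experiments, we set $\mathrm{UB} = 1.8$ and the lower bound TOL to 0.1 as in MEM.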
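The choice probabilities inside the logarithms of (28) can be sketched as below. This is a minimal illustration assuming the two-nest structure described above (three inside alternatives in one nest, the outside option alone with mean utility zero); the function name `nested_logit_probs` and the inputs are hypothetical, and the nest parameter is written `gamma` to match the notation used here.

```python
import numpy as np


def nested_logit_probs(v, gamma):
    """Choice probabilities for three nested alternatives plus an outside option.

    v: mean utilities beta' x_ijk of the three inside alternatives.
    gamma: nest (scale) parameter of the inside nest.
    Returns an array of length 4; the last entry is the outside option.
    """
    v = np.asarray(v, dtype=float)
    s = np.exp(v / gamma).sum()                 # sum_l exp(v_l / gamma)
    denom = s ** gamma + 1.0                    # inside nest plus outside option
    inside = np.exp(v / gamma) * s ** (gamma - 1.0) / denom
    outside = 1.0 / denom
    return np.append(inside, outside)


p = nested_logit_probs([1.0, 0.5, -0.2], gamma=0.6)
```

Setting `gamma = 1` collapses the nest, and the probabilities reduce to the MNL probabilities over the four alternatives.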

(b) Consumer partworth heterogeneity. The second class of models uses a regularization approach to model consumer partworth heterogeneity. In $\beta$-HetMNL, the log-likelihood objective is supplemented with a regularization term that pools information among the consumers and shrinks the individual partworths toward the population mean. The convex optimization problem for the $\beta$-HetMNL model is formulated as

$$
\begin{aligned}
(\beta\text{-HetMNL})\quad \max_{\boldsymbol{\beta}_i,\boldsymbol{\beta}_0,D}\quad & \Biggl\{\sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}} z_{ijk}\ln\Biggl(\frac{e^{\boldsymbol{\beta}_i'x_{ijk}}}{\sum_{l\in\mathcal{K}} e^{\boldsymbol{\beta}_i'x_{ijl}}}\Biggr) - \eta\sum_{i\in\mathcal{I}} (\boldsymbol{\beta}_i - \boldsymbol{\beta}_0)'D^{-1}(\boldsymbol{\beta}_i - \boldsymbol{\beta}_0)\Biggr\}\\
\text{s.t.}\quad & D \succeq 0,\quad \operatorname{trace}(D) = T,
\end{aligned}
\tag{29}
$$

where the parameter $\eta \ge 0$ captures the trade-off between the log-likelihood criterion and the regularization term. In our experiments, without loss of generality, we set $T$ equal to the number of attributes. For a fixed value of $\eta$, this problem is convex. We found the value of $\eta$ using cross-validation with additional out-of-sample data. We vary $\eta$ over a sufficiently wide interval to find the value of $\eta$ that provides the best out-of-sample performance in terms of log likelihood. The matrix $D$ is obtained using the sequential optimization approach described in Evgeniou et al. (2007).

An extension of this model is to incorporate partworth heterogeneity into the new MEM. The optimization problem under $\beta$-HetMEM is formulated as

$$
\begin{aligned}
(\beta\text{-HetMEM})\quad \max_{\boldsymbol{\lambda},\boldsymbol{\beta}_i,\boldsymbol{\beta}_0,\alpha_k,D}\quad & \Biggl\{\sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}} z_{ijk}\,\alpha_k(\boldsymbol{\beta}_i'x_{ijk} - \lambda_{ij}) - \eta\sum_{i\in\mathcal{I}} (\boldsymbol{\beta}_i - \boldsymbol{\beta}_0)'D^{-1}(\boldsymbol{\beta}_i - \boldsymbol{\beta}_0)\Biggr\}\\
\text{s.t.}\quad & \sum_{k\in\mathcal{K}} e^{\alpha_k(\boldsymbol{\beta}_i'x_{ijk} - \lambda_{ij})} = 1,\quad i\in\mathcal{I},\ j\in\mathcal{J},\\
& \sum_{k\in\mathcal{K}} \alpha_k = 4,\\
& \alpha_1 = \alpha_2 = \alpha_3,\\
& \alpha_k \ge \mathrm{TOL},\quad k\in\mathcal{K},\\
& D \succeq 0,\quad \operatorname{trace}(D) = T.
\end{aligned}
\tag{30}
$$

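(c) Consumer scale heterogeneity. The third class of models assumes that $\boldsymbol{\beta}$ is common across the consumers but allows for the scale parameter to vary across consumers and alternatives. In our data set, since the first three alternatives vary across different choice tasks, we simplify the model by assuming that $\alpha_{i1} = \alpha_{i2} = \alpha_{i3}$, but allow for the scale of the no choice alternative $\alpha_{i4}$ to be different. We term this model $\alpha$-HetMEM. The optimization problem under $\alpha$-HetMEM is formulated as

$$
\begin{aligned}
(\alpha\text{-HetMEM})\quad \max_{\boldsymbol{\lambda},\boldsymbol{\beta},\alpha_{ik},\alpha_1}\quad & \Biggl\{\sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}} z_{ijk}\,\alpha_{ik}(\boldsymbol{\beta}'x_{ijk} - \lambda_{ij}) - \eta\sum_{i\in\mathcal{I}} (\alpha_{i1} - \alpha_1)^2\Biggr\}\\
\text{s.t.}\quad & \sum_{k\in\mathcal{K}} e^{\alpha_{ik}(\boldsymbol{\beta}'x_{ijk} - \lambda_{ij})} = 1,\quad i\in\mathcal{I},\ j\in\mathcal{J},\\
& \sum_{k\in\mathcal{K}} \alpha_{ik} = 4,\quad i\in\mathcal{I},\\
& \alpha_{i1} = \alpha_{i2} = \alpha_{i3},\quad i\in\mathcal{I},\\
& \alpha_{ik} \ge \mathrm{TOL},\quad i\in\mathcal{I},\ k\in\mathcal{K}.
\end{aligned}
\tag{31}
$$

The parameter $\eta$ captures the trade-off between the log-likelihood criterion and the regularization term for the scale parameters. Once again, we set $\eta$ using cross-validation.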

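Parallel to the $\alpha$-HetMEM, one can also allow the scale parameters to vary among different consumers using the classical MNL formula. Interestingly, a straightforward implementation of this scale heterogeneous model does not improve the performance of the model. We can also incorporate scale heterogeneity into the nested logit model. We term this model $\gamma$-HetNest. We implement this model as a benchmark for the $\alpha$-HetMEM model. The optimization problem under $\gamma$-HetNest is formulated as

$$
\begin{aligned}
(\gamma\text{-HetNest})\quad \max_{\boldsymbol{\beta},\gamma_i,\gamma}\quad & \Biggl\{\sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}\Biggl(\sum_{k=1}^{3} z_{ijk}\ln\Biggl(\frac{e^{\boldsymbol{\beta}'x_{ijk}/\gamma_i}\bigl(\sum_{l=1}^{3} e^{\boldsymbol{\beta}'x_{ijl}/\gamma_i}\bigr)^{\gamma_i-1}}{\bigl(\sum_{l=1}^{3} e^{\boldsymbol{\beta}'x_{ijl}/\gamma_i}\bigr)^{\gamma_i} + 1}\Biggr) - z_{ij4}\ln\Biggl(\Bigl(\sum_{l=1}^{3} e^{\boldsymbol{\beta}'x_{ijl}/\gamma_i}\Bigr)^{\gamma_i} + 1\Biggr)\Biggr) - \eta\sum_{i\in\mathcal{I}} (\gamma_i - \gamma)^2\Biggr\}\\
\text{s.t.}\quad & \mathrm{TOL} \le \gamma_i \le \mathrm{UB},\quad i\in\mathcal{I}.
\end{aligned}
\tag{32}
$$

The parameter $\eta$ again captures the trade-off between the log-likelihood criterion and the regularization term for the scale parameters.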

5.2. Results

We compared the methods using four different criteria:

• Log likelihood (LL). The fit log likelihood (in sample) and the predicted log likelihood (out of sample) were used to compare the fit and prediction of the choice models.

• Akaike information criterion (AIC). AIC is a measure of the relative goodness of fit of the model and is derived based on information entropy concepts. AIC describes the trade-off between the accuracy and complexity of the model and is defined as AIC = 2P − LL, where P is the number of parameters to estimate in the model, and LL is the log likelihood for the estimated model. A lower value of AIC indicates a more preferred model. Thus, AIC rewards not only the goodness of fit, but also includes a penalty that increases with the number of parameters estimated.

• Hit rate. The hit rate is defined as the fraction of choice tasks in which the alternative chosen by the consumer coincides with the alternative with the highest predicted probability. Hit rates were evaluated for both the in-sample and out-of-sample data.

• Computational time. The computational time that was required to solve the different optimization models was used to compare the models. The optimization routines were coded in AMPL and solved using the LOQO solver.
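Two of these criteria are simple enough to sketch directly. The snippet below is an illustration only: it uses the AIC definition given above (AIC = 2P − LL, which differs from the more common convention 2P − 2LL) and the stated hit-rate definition. The function names and the small probability array are hypothetical; the LL and P values in the example are the MNL figures from Table 2.

```python
import numpy as np


def aic_as_defined(log_likelihood, n_params):
    """AIC following the definition in the text: 2P - LL."""
    return 2 * n_params - log_likelihood


def hit_rate(pred_probs, chosen):
    """Fraction of tasks where the most probable alternative was the one chosen.

    pred_probs: (n_tasks, K) predicted choice probabilities.
    chosen: (n_tasks,) indices of the alternatives actually chosen.
    """
    predicted = np.argmax(pred_probs, axis=1)
    return float(np.mean(predicted == np.asarray(chosen)))


# MNL row of Table 2: LL = -7,218.2283 with P = 71 parameters.
aic_mnl = aic_as_defined(-7218.2283, 71)

# Toy predictions for three choice tasks with four alternatives each.
probs = np.array([[0.6, 0.2, 0.1, 0.1],
                  [0.1, 0.3, 0.4, 0.2],
                  [0.25, 0.25, 0.3, 0.2]])
rate = hit_rate(probs, [0, 1, 2])
```

Under the 2P − LL definition, the penalty for a parameter is twice as large relative to the log-likelihood term as under the common convention, so comparisons across models with very different parameter counts are more sensitive to model size.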

The results are provided in Tables 2 and 3. Although we have tested the models for a wide range of $\eta$, for each model, we report only the results with the value of $\eta$ selected by cross-validation. Several observations can be made from the results:

(a) The models with homogeneous $\boldsymbol{\beta}$ perform roughly the same in both in-sample and out-of-sample experiments, in terms of LL and hit rate (around 46% to 51%).

(b) Incorporating heterogeneity in the consumerpartworths significantly improves the in-sample perfor-mance over the homogeneous partworth models—both

Dow

nloa

ded

from

info

rms.

org

by [

103.

24.7

7.60

] on

25

Janu

ary

2017

, at 1

7:49

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.

Page 15: INFORMS is located in Maryland, USA Publisher: Institute ...natarajan_karthik/wp-content/uploads/2017/01/On...Vinit Kumar Mishra, Karthik Natarajan, Dhanesh Padmanabhan, Chung-Piaw

Mishra et al.: Marginal Distribution Choice Models1524 Management Science 60(6), pp. 1511–1531, © 2014 INFORMS

Table 2 Fit Results for In-Sample Data

Method                 Log likelihood (LL)   Parameters (P)   AIC           Time (seconds)   Hit rate

Consumer homogeneity
MNL                    −7,218.2283           71               7,360.2283    7.627            0.4682
MEM                    −7,217.6459           72               7,361.6459    44.705           0.4677
NestL                  −7,217.8022           72               7,361.8022    40.925           0.4677

Consumer partworth heterogeneity
β-HetMNL (γ = 4)       −3,725.5119           40,612           84,949.5119   211.713          0.8223
β-HetMEM (γ = 4)       −3,725.6659           40,613           84,951.6659   357.723          0.8241

Consumer scale heterogeneity MEM
α-HetMEM (γ = 6)       −5,457.2608           572              6,601.2608    12.728           0.6035

Consumer scale heterogeneity nested logit
α-HetNest (γ = 0.7)    −6,100.1974           572              7,244.1974    54.38            0.5075

Note. Results are indicated for the values of γ selected by out-of-sample cross-validation.

log likelihood and hit rates improve drastically for the β-HetMNL and β-HetMEM models. The in-sample hit rate is as high as 82%, whereas it is about 62% in out-of-sample experiments. The performance of β-HetMNL and β-HetMEM is very similar, indicating again that adding consumer invariant scale heterogeneity does not improve the β-HetMNL model. Note that these approaches are more computationally intensive, because significantly more parameters need to be estimated for model specification. As suggested in Evgeniou et al. (2007), one also needs to sequentially optimize over D and β. Table 4 shows the results of the algorithm in 10 steps of our implementation. We note that a good choice of D can improve both the in-sample and out-of-sample performances significantly. However, the procedure takes more time to solve nearer the optimal choice of D. The reader is referred to Argyriou et al. (2008) for possible techniques to speed up convergence of these methods. The AIC values are the largest for this approach, because of the large number of parameters needed for the calibration problem.

(c) By incorporating respondent heterogeneity in the scale parameters, we obtain the best performance

Table 3 Prediction Results for Out-of-Sample Data

Method                 Log likelihood (LL)   Hit rate

Consumer homogeneity
MNL                    −4,048.6055           0.512
MEM                    −4,048.8774           0.5111
NestL                  −4,048.7111           0.5100

Consumer partworth heterogeneity
β-HetMNL (γ = 4)       −3,256.0218           0.6194
β-HetMEM (γ = 4)       −3,259.6518           0.6180

Consumer scale heterogeneity MEM
α-HetMEM (γ = 6)       −3,201.4232           0.6206

Consumer scale heterogeneity nested logit
α-HetNest (γ = 0.7)    −3,592.2794           0.5491

Note. Results are indicated for the values of γ selected by out-of-sample cross-validation.

for out-of-sample experiments. Far fewer parameters are needed in the α-HetMEM model (with 572 scale parameters), compared to β-HetMNL (with more than 40,000 parameters corresponding to customer partworth terms). This is reflected in the AIC criterion, where the α-HetMEM outperforms the β-HetMNL and β-HetMEM models significantly. The α-HetMEM also has the best out-of-sample log likelihood with LL = −3,201.4232 (for γ = 6). In terms of hit rates, capturing scale heterogeneity provides similar values to the β-HetMEM and β-HetMNL models in out-of-sample experiments. In the appendix, we consider alternative methods that incorporate scale heterogeneity in the MNL framework. The results indicate that α-HetMEM is still the most preferred model.

(d) In terms of running times, the MNL runs in the least time followed by the α-HetMEM and then the β-HetMNL and β-HetMEM models. This is surprising since α-HetMEM is a nonconvex optimization problem and β-HetMNL is convex. However, this can be partly explained by the fact that α-HetMEM has far fewer variables compared with β-HetMNL. Note that α-HetMEM runs faster than MEM, indicating that the convex regularization terms are effective in speeding up the convergence of the algorithm.

A natural extension based on these results is to build a more sophisticated consumer heterogeneity

Table 4 Results for β-HetMNL Model in γ = 4 Case

γ   Step   Fit LL        Fit hit rate   Predict LL     Predict hit rate   Time (seconds)

4   1      −4,802.0127   0.7905         −3,652.9236    0.5986             15.699
4   2      −4,092.8787   0.8210         −3,364.9695    0.6254             11.958
4   3      −3,891.1753   0.8178         −3,293.8904    0.6211             11.903
4   4      −3,827.1519   0.8178         −3,276.5331    0.6194             12.623
4   5      −3,789.0618   0.8200         −3,269.1673    0.6174             14.311
4   6      −3,763.5864   0.8227         −3,264.8488    0.6174             16.450
4   7      −3,747.0119   0.8225         −3,261.75611   0.6189             16.515
4   8      −3,736.4616   0.8233         −3,259.3701    0.6174             19.658
4   9      −3,729.7236   0.8232         −3,257.5152    0.6180             36.525
4   10     −3,725.5119   0.8223         −3,256.0216    0.6194             60.321

model that accounts for both partworth and scale heterogeneity as in Fiebig et al. (2010). However, the problem becomes computationally intractable because of the difficulty in calibrating the D matrix for this nonconvex problem. To get around such difficulty, we used only the simplest case where D was set to the identity matrix and tested the performance of our model, by allowing all or some of the β parameters to be heterogeneous. The problem is formulated as follows:

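(α, β-HetMEM)

\[
\begin{aligned}
\max_{\boldsymbol{\lambda},\,\boldsymbol{\beta}_i,\,\alpha_{ik},\,\alpha_1}\quad
& \sum_{i\in I}\sum_{j\in J}\sum_{k\in K} z_{ijk}\,\alpha_{ik}\bigl(\boldsymbol{\beta}_i' x_{ijk}-\lambda_{ij}\bigr)
 - \gamma_1\sum_{i\in I}(\alpha_{i1}-\alpha_1)^2
 - \gamma_2\sum_{i\in I}\lVert \boldsymbol{\beta}_i-\boldsymbol{\beta}_0\rVert^2 \\
\text{s.t.}\quad
& \sum_{k\in K} e^{\alpha_{ik}(\boldsymbol{\beta}_i' x_{ijk}-\lambda_{ij})} = 1, \quad i\in I,\ j\in J, \\
& \sum_{k\in K} \alpha_{ik} = 4, \quad i\in I, \\
& \alpha_{i1}=\alpha_{i2}=\alpha_{i3}, \quad i\in I, \\
& \alpha_{ik}\ge \mathrm{TOL}, \quad i\in I,\ k\in K.
\end{aligned}
\tag{33}
\]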

To avoid overfitting with many variables, we test the performance again using only a partial set of β to depend on consumer demographics: we allow partworths corresponding to the levels of attributes BZ, FP, RP, PP, and NV to differ among different consumers. All other partworth parameters are consumer invariant. From Table 5, we can see that the above modification yields only a slight improvement in the in-sample and out-of-sample performances. For this data set, modeling alpha and beta heterogeneity together did not provide significant improvement on the performance of α-HetMEM in out-of-sample experiments. This suggests that in this data set it is important to model the scale carefully by allowing for consumer heterogeneity in their perception of the outside option. Using the scale heterogeneity models (α-HetMEM) in this data set provides a significant improvement in terms of out-of-sample performance and better results than partworth heterogeneity models (β-HetMNL or β-HetMEM). Furthermore, it requires only slightly more computational time than the classical MNL model,

Table 5 Results for α, β-HetMEM

γ1   γ2   Fit LL        Fit hit rate   Predict LL    Predict hit rate

5    10   −5,311.8948   0.614          −3,198.9593   0.6202
4    10   −5,285.8087   0.6155         −3,212.5870   0.6194
6    10   −5,350.9046   0.615          −3,208.0522   0.62
5    9    −5,300.6860   0.6157         −3,199.0236   0.6203
5    11   −5,321.1340   0.6128         −3,198.9718   0.62

Table 6 Robustness to Varying Splits of Data Set

        MNL                               α-HetMEM

Split   Predict LL     Predict hit rate   Predict LL    Predict hit rate   TOL   γ

15:4    −2,304.20266   0.5235             −1,823.1053   0.6315             0.1   4
12:7    −4,048.6055    0.5120             −3,201.4232   0.6206             0.1   6
9:10    −5,850.9678    0.5036             −4,803.7499   0.6000             0.1   5
6:13    −7,723.9843    0.4835             −6,597.5596   0.5746             0.1   4
3:16    −9,800.66147   0.45512            −9,284.8399   0.51487            0.1   30
3:16    −9,800.66147   0.45512            −9,126.9715   0.5220             0.2   8
3:16    −9,800.66147   0.45512            −8,915.5699   0.5246             0.5   4

but improves upon its hit rate performance by more than 20%.

We finally test for the robustness of the results by splitting the data set in different ratios and comparing the homogeneous MNL model with α-HetMEM. Out of the 19 choice tasks, we split the calibration to prediction set sizes in ratios of 15:4, 12:7 (the original split), 9:10, 6:13, and 3:16. The results are provided in Table 6. From the table, we see that the improvements obtained over the simple MNL model are fairly robust, in that the α-HetMEM improves the out-of-sample hit rate by roughly 15% to 20%.
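The relative hit-rate improvements quoted above can be recomputed from the out-of-sample hit rates in Table 6. The sketch below is illustrative only: the variable and function names are assumptions, and it uses the rows with TOL = 0.1.

```python
# (split, MNL predict hit rate, alpha-HetMEM predict hit rate), rows of Table 6 with TOL = 0.1
table6_rows = [
    ("15:4", 0.5235, 0.6315),
    ("12:7", 0.5120, 0.6206),
    ("9:10", 0.5036, 0.6000),
    ("6:13", 0.4835, 0.5746),
    ("3:16", 0.45512, 0.51487),
]


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    return (improved - baseline) / baseline


improvements = {
    split: relative_improvement(mnl_hr, hetmem_hr)
    for split, mnl_hr, hetmem_hr in table6_rows
}

for split, gain in improvements.items():
    print(f"{split}: {100 * gain:.1f}% relative improvement in hit rate")
```

The percentages here are relative to the MNL hit rate, not differences in percentage points.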

5.3. Standard Error Estimation: Comparison with Mixed Logit

For the second set of experiments, we test the accuracy of the standard error estimation from MDM. For nonparametric methods such as β-HetMNL, such estimates need to be computed using bootstrap methods. For MDM, the optimality conditions and results from §3.5 can be used to compute standard error estimates. We use the MMM class from MDM to test the accuracy of the standard error estimation. Note that we only need to specify the means and variances of the marginal distributions in the MMM model. We compare the performance of MMM with the mixed logit model. Note that mixed logit requires extensive simulation for parameter estimation as well as choice probabilities calculation. We used the R software to implement the mixed logit model, and LOQO to solve the MMM.

The demographic attributes for consumer i are denoted by d_i. We model them as continuous variables. We consider three demographic variables in our experiments: (1) income, (2) age, and (3) night (the percentage of time the consumer drives during night time), and use the following utility specification for the two models:

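\[
\begin{aligned}
U_{ik} &= (\boldsymbol{\beta}+\boldsymbol{\epsilon}_a)' x_{ijk} + \boldsymbol{\beta}_d' d_i + \epsilon_{ik}, \quad i\in I,\ j\in J,\ k\in\{1,2,3\}, \\
U_{ij4} &= \epsilon_{ij4}, \quad i\in I,
\end{aligned}
\tag{34}
\]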

where ε_a models the taste variation in β. We assume also that the demographic variables affect the utility of the first three alternatives, but not the fourth

(cf. Hoffman and Duncan 1988). Under the mixed logit model we assume that individual members of ε_a are i.i.d. normal with zero means, and ε_ik are i.i.d. extreme value type-I distributed. For this mixed logit model, β and variances of ε_a are estimated by maximizing a simulated likelihood function generated through pseudo random number generations for ε_a. Under MMM, the mean of utility for each choice is β′x_ijk + β_d′d_i, and the variance is x_ijk′ Σ x_ijk + π²/6. With these, the MMM probabilities can be used for estimation and prediction. We refer to MMM with random coefficients as "mixed MMM."
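As a small illustration of the two moments that MMM needs, the following sketch computes the mean β′x + β_d′d and the variance x′Σx + π²/6 for one alternative. It assumes Σ denotes the covariance matrix of the random taste terms; the function names and all numerical inputs are hypothetical and not taken from the data set.

```python
import math
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def utility_moments(
    beta: Sequence[float],
    beta_d: Sequence[float],
    x: Sequence[float],
    d: Sequence[float],
    sigma: List[List[float]],
) -> Tuple[float, float]:
    """Mean and variance of utility under the mixed specification described above.

    mean     = beta'x + beta_d'd
    variance = x' Sigma x + pi^2 / 6   (pi^2/6 is the variance of a standard Gumbel error)
    """
    mean = dot(beta, x) + dot(beta_d, d)
    sigma_x = [dot(row, x) for row in sigma]
    variance = dot(x, sigma_x) + math.pi ** 2 / 6
    return mean, variance


# Hypothetical inputs: two attribute levels, one demographic variable.
mean, variance = utility_moments(
    beta=[1.0, 2.0],
    beta_d=[0.5],
    x=[1.0, 0.0],
    d=[2.0],
    sigma=[[0.04, 0.0], [0.0, 0.09]],
)
```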

We used the same setup as before to conduct the second experiment. We allow partworths corresponding to the levels of attributes BZ, FP, RP, PP, and NV to be random, while partworths of other attributes were assumed to be deterministic. We have performed the same experiment on other selections of product attributes, and the results are largely identical. We report the results for this experiment merely because the LL obtained from these models is the best among all experiments performed.

Tables A.1 and A.2 in the appendix show the parameter estimates corresponding to attribute levels, and the associated standard errors, from simulation (for ML) and from the MDM approach (for MMM) based on results in §3.5. A total of 74 parameters were estimated. It is interesting that the mixed MMM can generate maximum likelihood estimators that are fairly similar to the mixed logit, a model that is widely used in practice and well supported in theory. This seems to be true in general for the MLE of the mean partworths of attribute levels. Furthermore, based on the standard errors calculated, both models identify the same set of partworth attribute levels that are significant. Figures 5 and 6 show the scatter plots of the partworths and the corresponding p-values, under both the MMM and MLE models.

Figure 5 The p-Value Estimation [scatter plot of p(MixL) against p(MMM); both axes run from 0 to 1.0]

Figure 6 Partworth Estimates [scatter plot of MLE(MixL) against MLE(MMM); both axes run from −4.0 to 1.0]

5.4. Managerial Insights

Our experiments on the vehicle features data provide some interesting managerial insights. For example, we observe that a technically advanced feature is not always preferred by the consumer over the less advanced one. For the feature attribute navigation system (NS), the levels NS3 and NS4 are technically more advanced as compared with NS1, the former two having extra features on top of the latter. Interestingly, the value of partworth estimates for NS1 is higher than NS3 and NS4. A similar observation holds for attribute lane departure (LD) as well. There can be several reasons for consumers to prefer a less technically advanced product over a technically more advanced one. It may be due to the lack of prior experience with such new technologies, or because consumers prefer human in-the-loop kind of technologies. Another interesting implication of this observation is that a bundle of two attributes is not always desired by consumers. It is possible, for example, that consumers prefer an attribute when presented in the absence of another desired attribute more than a bundle of the two. Our studies also reveal that explicitly showing "none" for the attribute front collision protection (FP) would have a significant negative utility compared to not showing the attribute at all.

The relative preferences across attributes also provide useful insights on willingness to pay for various attribute levels. For example, the MNL as well as the MMM suggest that NV2, NS1, NS4, SC3, and BU4 are the top five attribute levels in terms of highest partworth estimates and consequently highest willingness to pay. Such insights can be used for evaluating proposed feature packages on vehicles and for determining the required pricing for these packages to obtain a balance between demand and profits.

The test implementations of the mixed models suggest there can be significant heterogeneity in preferences of certain attributes like blind zone alert and parallel park aids, and it might be prudent to look at underlying taste variations among the consumers.

6. Conclusion

In this paper, we have provided new theoretical and empirical insights into the semiparametric approach introduced in Natarajan et al. (2009) for discrete choice modeling. Using appropriate choices of marginal distributions, we can reconstruct many interesting choice probability formulas such as MNL, nested logit, and GEV. This allows us to estimate and calibrate the parameters in these models using standard optimization tools, from which the convex nature of many classical log-likelihood maximization models follows easily. The numerical results using a set of conjoint data confirm the efficacy of this approach. By explicitly incorporating consumer heterogeneity into the choice model, we develop a new parsimonious choice model called α-HetMEM and show that it is possible to obtain good out-of-sample performance by allowing for the scale parameter of the outside option to be potentially different across respondents. Capturing scale heterogeneity with a simple model in this data set provides significant improvements over other representations of heterogeneity. In a separate experiment using the same set of data, the mixed form of MMM essentially picks up the same set of significant parameters as the mixed logit model, albeit at a fraction of the computational effort needed.

The insights obtained in this study are potentially useful for other settings. In most operations management problems involving choice models, the outside option (i.e., "no purchase") is invariably one of the choices available. This paper proposes a parsimonious and yet computationally tractable method to incorporate this phenomenon into choice prediction, allowing for consumer level heterogeneity in the idiosyncratic error distribution across product options. The performance of this method on an empirical data set is encouraging: the α-HetMEM model retains the simplicity of the MNL method, and yet it can improve the out-of-sample hit rate by close to 20%. A natural extension is to examine the performance of this approach in a revenue management setting, where the MNL has been used predominantly because of its ease of use.

Although we restrict our discussion to discrete choice models, it should be noted that the model of Natarajan et al. (2009) applies to general 0-1 optimization problems. This would allow us to extend some of the results in this paper to estimate the partworths of more complicated choice tasks, when there are trade-offs and constraints in the selection of products (such as in the assignment problem, and the shortest path problem). This is potentially useful also for noncompensatory choice modeling, if we impose an additional constraint (with random parameters) into the MDM problem. This can be used to model the ad hoc screening process used by consumers in choosing products to evaluate, and will entail generalizing the MDM model into the case where the objective and constraints are both randomly generated.

Another interesting issue is to understand how the identification for the α and β parameters may be affected by the dimensionality of the problem, measured by the number of alternatives as well as number of attributes in the problem. What is the effect of dimensionality on the quality of the identification? We leave this and other issues for future research.

Acknowledgments

The authors thank Eric Bradlow (special issue coeditor), the associate editor, and two anonymous reviewers for their valuable comments and detailed suggestions on improving this paper. The authors thank the staff from the former General Motors India Laboratory for initiating this project. The research in this paper was funded by the NUS-GM research [Grant R-314-000-087-597]. The research of Vinit Kumar Mishra and Xiaobo Li was also partly funded by the SUTD-MIT International Design Center [Grant IDG31300105] on "Optimization for Complex Discrete Choice."

Appendix

Proof of Theorem 2

Proof. Under MDM, the choice probabilities are given as

\[
P_{ik} = 1 - F_{ik}(\lambda_i - V_{ik}) \quad \forall k \in K,
\]

where λ_i is the solution to the normalization condition:

\[
\sum_{k\in K} \bigl(1 - F_{ik}(\lambda_i - V_{ik})\bigr) = 1.
\]

Under the assumption that F_ik(·) is strictly increasing on its support, it is clear that a unique value of the multiplier λ_i satisfies the normalization condition. For the semi-infinite support, this value of λ_i will lie in the range \((\max_{k\in K}(V_{ik} + \underline{\epsilon}_{ik}), \infty)\), where \(\underline{\epsilon}_{ik}\) denotes the lower end point of the support of F_ik(·), and for the infinite support, the value of λ_i will lie in the range (−∞, ∞). Hence, the choice probabilities strictly satisfy P_ik ∈ (0, 1) with ∑_k P_ik = 1. To complete the proof, we show that for choice probabilities in the interior of the simplex, there is a unique set of values for the utilities (V_i2, ..., V_iK) with V_i1 = 0 and the multiplier λ_i. From the set of equalities, we get

\[
\lambda_i = F_{i1}^{-1}(1 - P_{i1})
\]

and

\[
V_{ik} = F_{i1}^{-1}(1 - P_{i1}) - F_{ik}^{-1}(1 - P_{ik}) \quad \forall k \in K.
\]

Hence, there is a one-to-one mapping between the set of deterministic utilities in ℝ^{K−1} and the set of choice probabilities in the interior of the unit simplex. □
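The normalization condition in the proof can be made concrete with a small numerical sketch. It assumes unit-rate exponential marginals on [0, ∞), for which the MDM probabilities reduce to the MNL formula; the bisection routine, its name, and the utility values are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mdm_probabilities(
    v: Sequence[float],
    survival: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-12,
) -> Tuple[float, List[float]]:
    """Solve sum_k survival(lam - v_k) = 1 for lam by bisection.

    `survival` is 1 - F and must be nonincreasing; [lo, hi] must bracket the root.
    """
    def excess(lam: float) -> float:
        return sum(survival(lam - vk) for vk in v) - 1.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid  # probabilities still sum to more than 1: increase lam
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, [survival(lam - vk) for vk in v]


def exp_survival(e: float) -> float:
    """1 - F(e) for a unit-rate exponential distribution supported on [0, inf)."""
    return 1.0 if e <= 0 else math.exp(-e)


v = [0.0, 0.5, -1.0, 2.0]  # hypothetical deterministic utilities
# For this marginal the multiplier lies in (max_k v_k, max_k v_k + ln K].
lam, probs = mdm_probabilities(v, exp_survival, lo=max(v), hi=max(v) + math.log(len(v)))

# Closed-form MNL probabilities for comparison.
denom = sum(math.exp(vk) for vk in v)
mnl_probs = [math.exp(vk) / denom for vk in v]
```

In this special case the multiplier equals the log-sum-exp of the utilities, so `probs` and `mnl_probs` agree up to the bisection tolerance.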

The exponential marginal distribution for MDM that recreates the MNL choice probabilities, the generalized exponential distribution for MDM that recreates GEV choice probabilities, and the t-distribution for MDM that recreates the MMM choice probabilities satisfy the conditions in Theorem 2.

Identifiability of V_i and F_ik

Assume that V_ik is an affine function of the attributes of the kth alternative, i.e., V_ik = β′x_ik. Let 𝓕 be a prespecified family

of distributions containing possible error distributions. We would like to answer the following question:

Are there distinct solutions (β^a, {F^a_ik(·)}) and (β^b, {F^b_ik(·)}), with F^a_ik(·), F^b_ik(·) ∈ 𝓕, such that

\[
1 - F^a_{ik}\bigl(\lambda^a_i(x) - (\beta^a)' x_{ik}\bigr) = 1 - F^b_{ik}\bigl(\lambda^b_i(x) - (\beta^b)' x_{ik}\bigr) \quad \text{for all } k,
\]

and

\[
\sum_{k\in K}\Bigl(1 - F^a_{ik}\bigl(\lambda^a_i(x) - (\beta^a)' x_{ik}\bigr)\Bigr) = \sum_{k\in K}\Bigl(1 - F^b_{ik}\bigl(\lambda^b_i(x) - (\beta^b)' x_{ik}\bigr)\Bigr) = 1,
\]

for some λ^a_i(x), λ^b_i(x), given x_ik?

If we fix (β^a)′x_i1 = (β^b)′x_i1 = 0 and standardize F^a_i1(·) = F^b_i1(·), then we have 1 − F^a_i1(λ^a_i(x)) = 1 − F^b_i1(λ^b_i(x)), and hence λ^a_i(x) = λ^b_i(x). If the family 𝓕 is chosen in such a way that F^a_ik(λ_i(x) − (β^a)′x_ik) = F^b_ik(λ_i(x) − (β^b)′x_ik) for all x_ik only if F^a_ik(·) = F^b_ik(·) and β^a = β^b, then we would be able to identify β and F_ik(·) from the choice probabilities. For instance, when 𝓕 = {F_ik(x) = 1 − e^{−α_ik x} : α_ik > 0}, then F^a_ik(λ_i(x) − (β^a)′x_ik) = F^b_ik(λ_i(x) − (β^b)′x_ik) holds if and only if α^a_ik(λ_i(x) − (β^a)′x_ik) = α^b_ik(λ_i(x) − (β^b)′x_ik) for all x_ik. Since the Lagrange multiplier λ_i(x) cannot be a linear function of x, this is possible only if α^a_ik = α^b_ik and β^a = β^b.

Proof of Theorem 4

We rewrite problem (15) in the following manner.

\[
\begin{aligned}
\max_{\beta,\,\lambda}\quad
& \sum_{i\in I}\sum_{k\in K} z_{ik}\Bigl(-\lambda_i + \beta' x_{ik} + (\theta_r - 1)\ln\Bigl(\sum_{l\in B_r(k)} e^{(1/\theta_r)(\beta' x_{il} - \beta' x_{ik})}\Bigr)\Bigr) \\
\text{s.t.}\quad
& \sum_{r=1}^{R}\Bigl(\sum_{l\in B_r} e^{(\beta' x_{il})/\theta_r}\Bigr)^{\theta_r} \times e^{-\lambda_i} \le 1, \quad i\in I.
\end{aligned}
\tag{35}
\]

Since θ ∈ (0, 1] and ln ∑_i e^{x_i} is convex in x, the objective function of this problem is concave. To show that the feasible region is convex, it suffices to show that the function g(x) = (∑_{i∈K} e^{x_i/a})^a is convex in x for a ∈ (0, 1]. The Hessian matrix ∇²g(x) of this function is

\[
\nabla^2 g(x) = \frac{(\mathbf{1}'z)^{a-2}}{a}\Bigl((a-1)\,zz' + (\mathbf{1}'z)\,\mathrm{diag}(z)\Bigr),
\]

where z = (e^{x_1/a}, e^{x_2/a}, ...), diag(z) is the diagonal matrix with the elements of z on the diagonal, and 1 is the vector of ones. Note that (1′z)^{a−2}/a > 0 for a ∈ (0, 1]. Furthermore,

\[
(a-1)\,zz' + (\mathbf{1}'z)\,\mathrm{diag}(z) = a\,zz' + \bigl(\mathbf{1}'z\,\mathrm{diag}(z) - zz'\bigr).
\]

Clearly, zz′ ⪰ 0. Furthermore, 1′z diag(z) − zz′ ⪰ 0 due to the Cauchy–Schwarz inequality. The convexity of g(x) follows from the positive-semidefiniteness of the Hessian ∇²g(x). □
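The convexity claim for g(x) = (∑_i e^{x_i/a})^a can be spot-checked numerically. The sketch below is a sanity check rather than a proof: it samples random pairs of points and records the largest violation of the midpoint inequality g((x + y)/2) ≤ (g(x) + g(y))/2. The function names, sampling ranges, and trial counts are assumptions.

```python
import math
import random
from typing import Sequence


def g(x: Sequence[float], a: float) -> float:
    """g(x) = (sum_i exp(x_i / a)) ** a, the function whose convexity is used in the proof."""
    return sum(math.exp(xi / a) for xi in x) ** a


def max_midpoint_violation(a: float, dim: int = 4, trials: int = 2000, seed: int = 0) -> float:
    """Largest value of g(midpoint) - average of g at the end points over random pairs.

    For a convex function this is never positive (up to rounding error).
    """
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        x = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        y = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        mid = [(xi + yi) / 2.0 for xi, yi in zip(x, y)]
        worst = max(worst, g(mid, a) - 0.5 * (g(x, a) + g(y, a)))
    return worst


violations = {a: max_midpoint_violation(a) for a in (0.2, 0.5, 1.0)}
```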

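Heterogeneity in MNL Type Models

Note that there are several ways in which one could introduce heterogeneity into the scale parameters in MNL type models. If we simply allow each β_i to be α_i β for some value α_i and fixed β, and add a regularization term as before to penalize deviation of α_i from some number α_0, then the calibration problem indeed becomes much simpler than β-HetMNL. The model is

(α-HetMNL)

\[
\begin{aligned}
\max_{\beta,\,\alpha_i,\,\alpha_0}\quad
& \sum_{i\in I}\sum_{j\in J}\sum_{k\in K} z_{ijk}\,\ln\Bigl(\frac{e^{\alpha_i \beta' x_{ijk}}}{\sum_{l\in K} e^{\alpha_i \beta' x_{ijl}}}\Bigr) - \gamma\sum_{i\in I}(\alpha_i - \alpha_0)^2 \\
\text{s.t.}\quad
& \mathrm{DOWN} \le \alpha_i \le \mathrm{UP}, \quad i\in I.
\end{aligned}
\tag{36}
\]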

This model is indeed similar to α-HetMEM, except that its emphasis is on the scale of every consumer, whereas in α-HetMEM, the emphasis is on the perception of the consumer in the outside option. The results are provided in the following table.

DOWN   UP   γ     Fit LL        Fit HR   Predict LL    Predict HR

0.1    4    0     −6,253.1878   0.4561   −3,621.6659   0.5082
0.1    4    0.5   −6,328.4420   0.462    −3,617.4705   0.5128
0.1    4    1     −6,360.9836   0.465    −3,622.7267   0.5148
0.1    2    0     −6,305.7746   0.4597   −3,617.2259   0.5126
0.1    2    0.5   −6,328.5502   0.462    −3,617.5081   0.5128
0.1    2    1     −6,360.9836   0.465    −3,622.7266   0.5148
1      2    0     −6,816.7917   0.4706   −3,822.9386   0.5094
1      2    0.5   −6,819.5745   0.4705   −3,822.8581   0.5097
1      2    1     −6,828.4274   0.4706   −3,827.5302   0.5097

By varying the values of γ, UP, and DOWN to test the performance, the results show that the prediction performance (LL ≈ −3,620) lies roughly halfway between MNL (LL = −4,048.6055) and α-HetMEM (LL = −3,201.4232). Hence, this version of α-HetMNL does not perform as well.
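To make the scale-heterogeneous MNL calibration concrete, the sketch below evaluates a penalized log likelihood of the kind described in this subsection: a logit likelihood in which consumer i's partworths are scaled by α_i, minus a quadratic penalty on the deviation of α_i from α_0. It only evaluates the objective and does not optimize it. The data layout, the function name, and the name `gamma` for the regularization weight are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def scaled_mnl_objective(
    choices: List[List[int]],        # choices[i][j]: index of the alternative chosen by consumer i in task j
    x: List[List[List[List[float]]]],  # x[i][j][k]: attribute vector of alternative k
    beta: Sequence[float],
    alpha: Sequence[float],          # alpha[i]: scale of consumer i
    alpha0: float,
    gamma: float,
) -> float:
    """Log likelihood with consumer-specific scales minus gamma * sum_i (alpha_i - alpha0)^2."""
    log_lik = 0.0
    for i, tasks in enumerate(x):
        for j, alternatives in enumerate(tasks):
            scaled = [alpha[i] * dot(beta, xk) for xk in alternatives]
            m = max(scaled)  # subtract the maximum for a numerically stable log-sum-exp
            log_denominator = m + math.log(sum(math.exp(s - m) for s in scaled))
            log_lik += scaled[choices[i][j]] - log_denominator
    penalty = gamma * sum((a - alpha0) ** 2 for a in alpha)
    return log_lik - penalty


# Hypothetical toy data: one consumer, one task, two alternatives, one attribute.
toy_x = [[[[1.0], [0.0]]]]
toy_choices = [[0]]
value_no_penalty = scaled_mnl_objective(toy_choices, toy_x, beta=[0.0], alpha=[1.0], alpha0=1.0, gamma=0.5)
value_with_penalty = scaled_mnl_objective(toy_choices, toy_x, beta=[0.0], alpha=[2.0], alpha0=1.0, gamma=0.5)
```

In practice the bounds DOWN ≤ α_i ≤ UP would be handled by the solver; they are omitted here because the sketch only evaluates the objective.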

An alternative approach is to use the HEV model with Gumbel random variables with different scales. However, since the HEV model does not possess closed-form choice probabilities, it is computationally challenging to try and estimate the heteroskedastic parameter for every consumer with regularization terms. The simulated experiments in §4 indicate that the α-HetMEM is fairly good at capturing the parameters coming from the HEV model. The last model that we experimented with was a modified MNL (MMNL) in the spirit of our α-HetMEM. The calibration under this problem is

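(α-HetMMNL)

\[
\begin{aligned}
\max_{\beta,\,\alpha_{ik},\,\alpha_1}\quad
& \sum_{i\in I}\sum_{j\in J}\sum_{k\in K} z_{ijk}\,\ln\Bigl(\frac{e^{\alpha_{ik} \beta' x_{ijk}}}{\sum_{l\in K} e^{\alpha_{il} \beta' x_{ijl}}}\Bigr) - \gamma\sum_{i\in I}(\alpha_{i1} - \alpha_1)^2 \\
\text{s.t.}\quad
& \sum_{k\in K} \alpha_{ik} = 4, \quad i\in I, \\
& \alpha_{i1} = \alpha_{i2} = \alpha_{i3}, \quad i\in I, \\
& \alpha_{ik} \ge \mathrm{TOL}, \quad i\in I,\ k\in K.
\end{aligned}
\tag{37}
\]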

Although this model has similarities to the α-HetMEM model, the performance is not as good. The following table provides the results:

γ     Fit LL        Fit HR   Predict LL    Predict HR

0.1   −6,350.2662   0.4615   −3,623.0763   0.514
0.5   −6,354.1941   0.462    −3,620.1574   0.5134
2     −6,399.0739   0.4661   −3,632.3006   0.5134

Again the results indicate that both the in-sample and out-of-sample performances are similar to the α-HetMNL model, which is not as good as α-HetMEM.

Comparison of Mixed MMM and Mixed Logit

Table A.1 Estimation Results for Mixed MMM and Mixed Logit Model-I

Mixed MMM                                                            Mixed logit

Parameter   MLE          Std. error    t-value        Pr > |t|       Parameter   MLE         Std. error   t-value   Pr > |t|

cc1***      0.398662     0.09054572    4.40288069     1.08705E−05    cc1***      0.32574     0.076165     4.2768    0.00001896
cc2***      0.339607     0.088727307   3.827536451    0.000130769    cc2***      0.27439     0.074179     3.699     0.0002164
cc3***      0.36051      0.089652707   4.021183641    5.86277E−05    cc3***      0.30695     0.074385     4.1265    0.00003683
gn1         0.115812     0.079975188   1.448099121    0.147642378    gn1         0.086939    0.068643     1.2665    0.2053253
gn2         0.0803683    0.080173937   1.002424269    0.316179766    gn2         0.052636    0.068456     0.7689    0.4419485
ns1***      0.540169     0.102108237   5.290160884    1.26607E−07    ns1***      0.45493     0.084169     5.405     6.48E−08
ns2         0.0358746    0.103254123   0.34743988     0.728273249    ns2         0.032526    0.086865     0.3744    0.708079
ns3         0.151385     0.106853961   1.416746734    0.156609696    ns3         0.13628     0.088261     1.5441    0.1225607
ns4***      0.487241     0.103093018   4.726226927    2.3403E−06     ns4***      0.42011     0.083152     5.0523    4.364E−07
ns5         −0.124014    0.104530081   −1.186395326   0.235513907    ns5         −0.098734   0.089872     −1.0986   0.2719379
bu1*        0.247324     0.109618994   2.256214822    0.024093575    bu1*        0.22245     0.090652     2.4539    0.0141332
bu2**       0.315278     0.107954458   2.920472248    0.00350823     bu2**       0.28619     0.090938     3.1471    0.0016488
bu3***      0.432823     0.109196941   3.963691632    7.46698E−05    bu3***      0.38938     0.090135     4.3199    0.00001561
bu4***      0.432593     0.112279847   3.852810733    0.000118006    bu4***      0.38393     0.089911     4.2701    0.00001954
bu5**       0.314371     0.112985732   2.782395571    0.005413044    bu5**       0.29076     0.091585     3.1748    0.0014993
bu6         −0.0833997   0.112864517   −0.738936399   0.459974964    bu6         −0.085001   0.092655     −0.9174   0.3589369
fa1***      0.279762     0.080407507   3.479301999    0.000506349    fa1***      0.24716     0.068751     3.595     0.0003243
fa2         0.0608508    0.080027742   0.760371324    0.447062992    fa2         0.052879    0.069081     0.7655    0.4439982
ld1***      0.361308     0.089857016   4.02092143     5.86928E−05    ld1***      0.32315     0.075978     4.2532    0.00002108
ld2**       0.283853     0.0882024     3.218200412    0.001296941    ld2**       0.2294      0.073996     3.1002    0.0019341
ld3         0.133014     0.089058556   1.493556662    0.135344917    ld3         0.10749     0.076837     1.3989    0.1618506
bz1*        0.29458      0.118661916   2.482515115    0.013073408    bz1***      0.31615     0.075622     4.1806    0.00002907
bz2**       0.344085     0.109854458   3.132189694    0.001743554    bz2***      0.3754      0.078809     4.7635    0.000001903
bz3         −0.0233921   0.107942655   −0.216708586   0.828442936    bz3         −0.01938    0.084974     −0.2281   0.819593
fc1***      0.367185     0.079603264   4.612687736    4.05878E−06    fc1***      0.31237     0.068204     4.5799    0.000004651
fc2         0.101607     0.081161943   1.25190448     0.210654216    fc2         0.077122    0.06974      1.1059    0.2687908
fp1         0.204973     0.122518399   1.672997699    0.094380675    fp1**       0.24487     0.08885      2.756     0.0058511
fp2***      0.448341     0.11362379    3.945837389    8.04433E−05    fp2***      0.38893     0.081675     4.7619    0.000001917
fp3         0.199766     0.124837987   1.600202033    0.109607222    fp3*        0.21386     0.087592     2.4416    0.0146223
fp4         −0.223402    0.121143679   −1.844107772   0.065217457    fp4         −0.17036    0.10591      −1.6085   0.1077203
rp1*        0.235401     0.093065203   2.529420163    0.011450898    rp1***      0.23578     0.070084     3.3642    0.0007676
rp2         0.00658772   0.100388209   0.065622448    0.947680636    rp2         0.031849    0.070654     0.4508    0.6521464

*p < 0.05; **p < 0.01; ***p < 0.001.

Table A.2 Estimation Results for Mixed MMM and Mixed Logit Model-II

Mixed MMM                                                            Mixed logit

Parameter   MLE          Std. error    t-value        Pr > |t|       Parameter   MLE          Std. error   t-value   Pr > |t|

pp1***      0.405399     0.106963117   3.790082148    0.000152096    pp1***      0.32397      0.078736     4.1147    0.00003878
pp2*        0.234565     0.107242461   2.187239996    0.028764062    pp2**       0.23175      0.081958     2.8276    0.0046897
pp3         0.0948428    0.112283625   0.844671693    0.398328359    pp3         0.11533      0.076031     1.5169    0.1292989
ka1***      0.333277     0.080610221   4.134426055    3.6077E−05     ka1***      0.29162      0.070169     4.156     0.00003239
ka2**       0.209422     0.080676056   2.595838351    0.009459391    ka2**       0.18243      0.069632     2.6199    0.0087961
sc1***      0.339823     0.095637405   3.55324363     0.000383482    sc1***      0.28495      0.080582     3.5362    0.000406
sc2**       0.301116     0.096262222   3.1280807      0.001768074    sc2**       0.26195      0.07968      3.2875    0.0010108
sc3***      0.549228     0.096989831   5.662737987    1.55991E−08    sc3***      0.47159      0.08033      5.8707    4.34E−09
sc4         0.00829448   0.095891568   0.086498533    0.931073052    sc4         −0.0030269   0.081662     −0.0371   0.9704321
ts1***      0.3833       0.089413697   4.286815273    1.84136E−05    ts1***      0.32855      0.075344     4.3606    0.00001297
ts2***      0.457622     0.089413218   5.1180576      3.18456E−07    ts2***      0.39905      0.075332     5.2972    1.176E−07
ts3*        0.177221     0.089631421   1.977219572    0.048063213    ts3*        0.16714      0.076673     2.1799    0.0292661
nv1***      0.467919     0.105160324   4.44957737     8.76185E−06    nv1***      0.42944      0.077168     5.565     2.621E−08
nv2***      0.677672     0.104949418   6.457129644    1.15211E−10    nv2***      0.5896       0.078425     7.5179    5.573E−14
nv3         −0.157573    0.114562655   −1.375430765   0.169050066    nv3         −0.083872    0.093388     −0.8981   0.3691336
ma1         0.159165     0.09554434    1.665875755    0.09579116     ma1         0.12147      0.080026     1.5179    0.1290386
ma2         0.0664014    0.097047423   0.684216004    0.493865605    ma2         0.047416     0.080955     0.5857    0.5580738
ma3***      0.34282      0.096013298   3.57054709     0.000359068    ma3***      0.28601      0.079795     3.5843    0.000338

Table A.2 (Continued)

              Mixed MMM                                                   Mixed logit
Parameter     MLE            Std. error    t-value         Pr > |t|       MLE          Std. error   t-value    Pr > |t|

ma4           0.110704       0.096065334   1.152382398     0.249210581    0.079377     0.080246     0.9892     0.3225843
lb1           0.0965966      0.094774553   1.019225061     0.308137852    0.070988     0.07929      0.8953     0.3706311
lb2           0.143061       0.097531122   1.466824095     0.142477121    0.10881      0.080141     1.3578     0.1745426
lb3           0.169312       0.0950662     1.780990502     0.07496535     0.13181      0.080465     1.6382     0.1013889
lb4           −0.0722387     0.095989094   −0.752571955    0.451737149    −0.074834    0.080996     −0.9239    0.3555237
af1***        0.365297       0.087662821   4.167068719     3.12944E−05    0.30735      0.074125     4.1464     0.00003377
af2**         0.247988       0.088843965   2.791275682     0.005266925    0.21493      0.074818     2.8728     0.0040691
af3           −0.0235715     0.091693143   −0.257069386    0.797134159    −0.026228    0.077915     −0.3366    0.7364008
hu1***        0.349939       0.079554898   4.398710952     1.10808E−05    0.29532      0.068467     4.3133     0.00001609
hu2           0.0993129      0.080882392   1.227867995     0.219545426    0.07005      0.06983      1.0032     0.3157862
Price2***     −0.732407      0.130178037   −5.626194849    1.92683E−08    −0.57239     0.083242     −6.8762    6.147E−12
Price3***     −1.1495        0.131429762   −8.746116392    2.84051E−18    −0.92979     0.087734     −10.5978   <2.2e−16
Price4***     −1.47071       0.135820887   −10.8283051     4.53197E−27    −1.203       0.093321     −12.8908   <2.2e−16
Price5***     −1.73634       0.138530497   −12.53399097    1.37502E−35    −1.456       0.097447     −14.9411   <2.2e−16
Price6***     −1.84914       0.140580221   −13.15362851    5.69824E−39    −1.5448      0.099941     −15.4569   <2.2e−16
Price7***     −2.18123       0.151474296   −14.40000093    3.11346E−46    −1.8345      0.10464      −17.5316   <2.2e−16
Price8***     −2.37265       0.153351597   −15.47196149    5.79821E−53    −2.0085      0.10753      −18.6783   <2.2e−16
Price9***     −2.47772       0.161114507   −15.37862761    2.32731E−52    −2.0826      0.11033      −18.8756   <2.2e−16
Price10***    −2.98843       0.184297331   −16.21526465    6.88454E−58    −2.5157      0.12562      −20.0261   <2.2e−16
Price11***    −3.53154       0.21556974    −16.38235495    5.03245E−59    −2.8943      0.13505      −21.4314   <2.2e−16
Price12***    −3.44198       0.20769941    −16.57192957    2.51211E−60    −2.8617      0.13181      −21.7101   <2.2e−16
agea***       −0.01825794    0.001024016   −17.82974051    2.64459E−69    −0.016952    0.0023411    −7.2412    4.448E−13
nighta        0.002489172    0.005281256   0.471321934     0.637428252    0.0023613    0.0017368    1.3595     0.1739863
incomea_n**   1.85482E−06    7.08457E−07   2.61811118      0.008864352    1.6895E−06   6.2719E−07   2.6938     0.007065

*p < 0.05; **p < 0.01; ***p < 0.001.
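The t-values and significance markers in Table A.2 can be checked directly from the reported estimates, since each t-value is the MLE divided by its standard error. The following is a minimal sketch and not the authors' code: the function names are hypothetical, the three rows are read from the Mixed MMM columns, and the two-sided p-value uses a standard normal approximation. That approximation is an assumption. The reported p-values may come from a t-distribution, so agreement is expected only to a few decimal places.

```python
import math


def t_value(estimate, std_error):
    """Wald-type t-statistic: the estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation (assumed here)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def stars(p):
    """Significance markers following the footnote of Table A.2."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# (MLE, standard error) pairs read from the Mixed MMM columns of Table A.2.
rows = {
    "ma4": (0.110704, 0.096065334),
    "af2": (0.247988, 0.088843965),
    "hu1": (0.349939, 0.079554898),
}

for name, (mle, se) in rows.items():
    t = t_value(mle, se)
    p = two_sided_p_normal(t)
    print(f"{name:4s} t = {t:.4f}  p = {p:.3g}  {stars(p)}")
```

The same check applies to the Mixed logit columns by substituting the corresponding MLE and standard error pairs.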

ReferencesAllenby GM, Rossi PE (1999) Marketing models of consumer hetero-

geneity. J. Econometrics 89(1–2):57–78.Anderson SP, De Palma A, Thisse J-F (1988) A representative

consumer theory of the logit model. Internat. Econom. Rev.29(3):461–466.

Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task featurelearning. Machine Learn. 73(3):243–272.

Bertsimas D, Natarajan K, Teo CP (2006) Persistence in discreteoptimization under data uncertainty. Math. Programming, Ser. B108(2–3):251–274.

Bhat CR (1995) A heteroscedastic extreme value model of intercitymode choice. Transportation Res. Part B 29(6):471–483.

Brownstone D, Small KAK (1989) Efficient estimation of nested logitmodels. J. Bus. Econom. Statist., Amer. Statist. Assoc. 7(1):67–74.

Chrzan K, Elrod T (1995) Choice-based approach for large numbersof attributes. Marketing News 29(1):20–24.

Daganzo CF, Kusnic M (1993) Two properties of the nested logitmodel. Transportation Sci. 27(4):395–400.

Danaher PJ (2007) Modeling page views across multiple websiteswith an application to Internet reach and frequency prediction.Marketing Sci. 26(3):422–437.

Danaher PJ, Smith MS (2011) Modeling multivaraite distribu-tions using copulas: Applications in marketing. Marketing Sci.30(1):4–21.

Evgeniou T, Pontil M, Toubia O (2007) A convex optimizationapproach to modeling consumer heterogeneity in conjointestimation. Marketing Sci. 26(6):805–818.

Farias VF, Jagabathula S, Shah D (2013) A nonparametric approachto modeling choice with limited data. Management Sci. 59(2):305–322.

Fiebig DG, Keane MP, Louviere J, Wasi N (2010) The generalizedmultinomial logit model: Accounting for scale and coefficientheterogeneity. Marketing Sci. 29(3):393–421.

Greene WH (2011) Econometric Analysis, 7th ed. (Prentice Hall, UpperSaddle River, NJ).

Hofbauer J, Sandholm WH (2002) On the global convergence ofstochastic fictitious play. Econometrica 70(6):2265–2294.

Hoffman SD, Duncan GJ (1988) Multinomial and conditionallogit discrete-choice models in demography. Demography 25(3):415–427.

Louviere J (1988) Conjoint analysis modelling of stated preferences.A review of theory, methods, recent developments and externalvalidity. J. Transport Econom. Policy 22(1):93–119.

Louviere J, Meyer RJ (2007) Formal choice models of informalchoices: What choice modeling research can (and can’t) learnfrom behavioral theory. Malhotra NK, ed. Review of MarketingResearch (M. E. Sharpe, New York), 3–32.

Louviere J, Swait J (2010) Discussion of “alleviating the constantstochastic variance assumption in decision research: Theory,measurement and experimental test.” Marketing Sci. 29(1):18–22.

Louviere J, Street S, Carson R, Ainslie A, Deshazo JR, Cameron T,Hensher D, Kohn R, Marley T (2002) Dissecting the randomcomponent of utility. Marketing Lett. 13(3):177–193.

Manski CF (1975) Maximum score estimation of the stochastic utilitymodel of choice. Zarembka P, ed. J. Econometrics 3:205–228.

McFadden D (1974) Conditional logit analysis of qualitative choicebehavior. Zarembka P, ed. Frontiers in Econometrics (AcademicPress, New York), 105–142.

McFadden D (1977) A closed-form multinomial choice model with-out the independence from irrelevant alternatives restrictions.Working Paper 7703. Urban Travel Demand Forecasting Project,Institute of Transportation Studies, University of California,Berkeley, Berkeley.

McFadden D (1978) Modelling the choice of residential location.Karlqvist A, Lundqvist L, Snickars F, Weibull J, eds. Spatial Inter-action Theory and Planning Models (North Holland, Amsterdam),75–96.

Natarajan K, Song M, Teo C-P (2009) Persistency model and itsapplications in choice modeling. Management Sci. 55(3):453–469.

Nelsen RB (2006) An Introduction to Copulas, 2nd ed., Springer Seriesin Statistics (Springer, New York).

Dow

nloa

ded

from

info

rms.

org

by [

103.

24.7

7.60

] on

25

Janu

ary

2017

, at 1

7:49

. Fo

r pe

rson

al u

se o

nly,

all

righ

ts r

eser

ved.

Page 22: INFORMS is located in Maryland, USA Publisher: Institute ...natarajan_karthik/wp-content/uploads/2017/01/On...Vinit Kumar Mishra, Karthik Natarajan, Dhanesh Padmanabhan, Chung-Piaw

Mishra et al.: Marginal Distribution Choice ModelsManagement Science 60(6), pp. 1511–1531, © 2014 INFORMS 1531

Norets A, Takahashi S (2013) On the surjectivity of the mappingbetweeen utilities and choice probabilities. Quant. Econom. 4(1):149–155.

Park Y-H, Fader PS (2004) Modeling browsing behavior at multiplewebsites. Marketing Sci. 23(2):280–303.

Salisbury LC, Feinberg FM (2010) Alleviating the constant stochasticvariance assumption in decision research: Theory, measurementand experimental test. Marketing Sci. 29(1):1–17.

Sawtooth Software (2008) CBC v6.0. Sawtooth Software Inc., Sequim,WA. http://www.sawtoothsoftware.com/download/techpap/cbctech.pdf.

Schweidel DA, Fader PS, Bradlow ET (2008) A bivariate timingmodel of consumer acquisition and retention. Marketing Sci.27(5):829–843.

Seetharman PB, Chib S, Ainslie A, Boatwright P, Chan T, Gupta S,Mehta N, Rao V, Strijnev A (2005) Models of multi-categorychoice behavior. Marketing Lett. 16(3/4):239–254.

Sklar A (1959) Fonctions de répartition á n dimensions et leursmarges. Publ. Inst. Statist. Univ. Paris 8:229–231.

Steenburgh TJ, Ainslie AS (2010) Substitution patterns of theranndom coefficients logit. HBS Marketing Unit WorkingPaper 10-053, Harvard Business School, Boston. http://ssrn.com/abstract=1535329.

Train K (2009) Discrete Choice Methods with Simulation, Second ed.(Cambridge University Press, Cambridge, UK).

Weiss G (1986) Stochastic bounds on distributions of optimal valuefunctions with applications to PERT, network flows and reliability.Oper. Res. 34(4):595–605.
