CONSUMER DECISIONS ON SHARE OF WALLET, AUTOMOBILE SEARCH,
AND ONLINE PRODUCT REVIEWS
by
Sungha Jang

APPROVED BY SUPERVISORY COMMITTEE:

Brian T. Ratchford, Chair
Ashutosh Prasad, Co-Chair
B.P.S. Murthi
Gonca Soysal
Copyright 2011
Sungha Jang
All Rights Reserved
To my parents, Judeok Jang and Boksun Kim
CONSUMER DECISIONS ON SHARE OF WALLET, AUTOMOBILE SEARCH,
AND ONLINE PRODUCT REVIEWS
by
SUNGHA JANG, B.A., M.B.A.
DISSERTATION
Presented to the Faculty of
The University of Texas at Dallas
in Partial Fulfillment
of the Requirements
for the Degree of
DOCTOR OF PHILOSOPHY IN
MANAGEMENT SCIENCE
THE UNIVERSITY OF TEXAS AT DALLAS
May, 2011
UMI Number: 3450462
Copyright 2011 by ProQuest LLC. All rights reserved. This edition of the work is protected against
unauthorized copying under Title 17, United States Code.
ProQuest LLC 789 East Eisenhower Parkway
P.O. Box 1346 Ann Arbor, MI 48106-1346
ACKNOWLEDGEMENTS
I started the long journey of studying marketing keeping in mind my mother’s saying that ‘you
can travel a long distance by asking other people’. As I write these acknowledgments to
my dissertation, I am deeply indebted to the many people who have guided me on this journey.
I have benefited greatly from the aid of my advisor, Dr. Brian Ratchford. His belief and
encouragement raised my passion and ability and were vital to the successful completion of this
thesis. I am honored to have worked under his insightful advice. I also offer special thanks to my
co-advisor, Dr. Ashutosh Prasad. His thoughtful and constructive advice has shown me the way of
a researcher. I could not have completed my dissertation without his guidance.
I wish to thank the two other committee members as well. I owe special thanks to Dr. B.P.S. Murthi
for his academic advice and considerate care throughout the program. I thank Dr. Gonca Soysal
for her interest and helpful comments on my research. I must also express my appreciation to the
marketing faculty at the University of Texas at Dallas. I am especially grateful to Dr. Ram Rao
and Dr. Nanda Kumar for their passionate and instructive guidance on research.
Finally, I express my greatest gratitude to my parents, who always guide me with wisdom. With
their trust, encouragement, and love, I am able to finish this long journey to a marketing Ph.D.
March, 2011
CONSUMER DECISIONS ON SHARE OF WALLET, AUTOMOBILE SEARCH,
AND ONLINE PRODUCT REVIEWS
Publication No. ___________________
Sungha Jang, Ph.D. The University of Texas at Dallas, 2011
ABSTRACT
Supervising Professor: Brian T. Ratchford

The objective of the three essays is to understand consumers’ decisions on allocating budget to credit
card expenditures, using information sources for automobile purchases, and combining online
product reviews with their prior knowledge.
In the first essay, we examine how consumers allocate their budget to multiple firms and
categories. As expenditures are simultaneous and censored, we propose a Bayesian estimation of
a simultaneous equations Tobit model with latent classes. Our approach, which takes into account
expenditure interrelationships and consumer heterogeneity, yields more accurate predictions
of the size and share of wallet, which firms can use for better segmenting and targeting.
In the second essay, we examine the interdependency between various information sources,
segment consumers based on their search patterns, and compare search results by segment.
We find that online search and offline search affect each other and that low external
search segments choose American brands and get lower discounts, while high external search
segments choose foreign brands and get better price deals. Our results give managers
guidelines on which media to use for providing information and which search segments
to target.
In the third essay, we examine the effects and value of online product reviews on the purchase
decision process. In our approach, consumers combine product reviews with their prior
perceived quality to construct a posterior perceived quality, which affects the consideration
set and choice decisions. Our findings show that consumers use product reviews mainly in the
consideration set stage and that their updating method is consistent with Bayesian updating. We also
compute the monetary value of each component of the product reviews. Our results have
managerial implications: product review providers should display all components of
consumer reviews from the beginning of the search, and manufacturers should keep consumers’
perceived quality high by managing prior quality at all times and should encourage
satisfied customers to write good reviews.
TABLE OF CONTENTS
Acknowledgements
Abstract
List of Tables
List of Figures
CHAPTER 1 CONSUMER SPENDING PATTERNS ACROSS FIRMS AND CATEGORIES: APPLICATION TO THE SIZE AND SHARE OF WALLET
    ABSTRACT
    INTRODUCTION
    LITERATURE REVIEW
    MODEL AND METHODOLOGY
    DATA AND ESTIMATION
    RESULTS
    CONCLUSION
    APPENDIX
    REFERENCES
CHAPTER 2 SEARCH PATTERNS, SEARCH-BASED SEGMENTATION AND SEARCH RESULTS OF AUTOMOBILE PURCHASERS
    ABSTRACT
    INTRODUCTION
    LITERATURE REVIEW
    MODEL AND METHODOLOGY
    DATA AND ESTIMATION
    RESULTS
    CONCLUSION
    APPENDIX
    REFERENCES
CHAPTER 3 HOW CONSUMERS USE PRODUCT REVIEWS IN THE PURCHASE DECISION PROCESS
    ABSTRACT
    INTRODUCTION
    LITERATURE REVIEW
    MODEL AND ESTIMATION
    SURVEY AND DATA
    RESULTS
    CONCLUSION
    REFERENCES
    APPENDIX
VITA
LIST OF TABLES
Table 1.1. Descriptive Statistics
Table 1.2. Model Fits and Reduction Percent Compared to the Benchmark Model (M1)
Table 1.3. Parameter Estimates
Table 1.4. Prediction of the Size and Share of Wallet
Table 2.1. Information Sources in the Automobile Purchases
Table 2.2. Comparison of Studies Related to Automobile Purchases
Table 2.3. Descriptive Statistics of Major Variables
Table 2.4. Results of Principal Component Analysis
Table 2.5. Interrelationship of the External Search Sources
Table 2.6. Effects of the Exogenous Variables
Table 2.7. Description of Segments
Table 2.8. Results of ANCOVA
Table 3.1. Competing Specifications
Table 3.2. Hotel Information
Table 3.3. Log Marginal Likelihood of Models
Table 3.4. Parameter Estimates of Model 10
Table 3.5. Monetary Value of a Unit Increase in Product Reviews Components
LIST OF FIGURES
Figure 1.1. Conceptual Model of Factors to Affect Expenditures
Figure 2.1. Conceptual Model
Figure 2.2. Search Based Segments (S1 to S9) and Their Search Times in Hours
Figure 2.3. Correspondence between Search-based Segments and Brand Choices
Figure 3.1. Survey Screen on Some Hotel Information
Figure 3.2. Survey Procedure
CHAPTER 1
CONSUMER SPENDING PATTERNS ACROSS FIRMS AND CATEGORIES:
APPLICATION TO THE SIZE AND SHARE OF WALLET
Sungha Jang
School of Management, Department of Marketing, SM32
The University of Texas at Dallas
800 West Campbell Road
Richardson, Texas 75080-3021
ABSTRACT
Firms need to know consumers’ expenditures across firms and categories to predict their size-
and share-of-wallet. In this study, we model consumers’ expenditures allowing for three features:
(1) interrelationship in consumers’ spending across multiple firms and categories, called
simultaneity; (2) data censoring that occurs when consumers do not spend in certain categories;
and (3) consumer heterogeneity in spending patterns. To handle these, we propose a
simultaneous equations Tobit model with latent classes. The model is estimated by Bayesian
methods using credit card expenditure data. Two segments are identified. One segment, the
‘habituals,’ covers 76% of consumers, who show habitual usage patterns. The remaining
segment, the ‘adaptives,’ allocates its budget based on income and demographics. We discuss the
interrelationship of expenditures across firms and categories by segments. The findings suggest
that firms need to take heterogeneity and inter-related expenditures into account for accurate
prediction of size- and share-of-wallet.
Keywords: Share of wallet; Customer heterogeneity; Structural model; Simultaneous equations
Tobit model; Bayesian estimation.
INTRODUCTION
Share of wallet is defined as the percentage of a customer’s total category expenditure (i.e., size
of wallet) that is captured by the firm. It is an informative metric for customers’ untapped
potential, for the effectiveness of marketing activities, and for competitive benchmarking. It has
been used as a loyalty measure (e.g., Bowman and Narayandas 2004), as a segmentation criterion
(e.g., Reinartz and Kumar 2003), and is known to have a positive impact on profits (e.g.,
Reinartz, Thomas and Kumar 2005). However, as Du, Kamakura and Mela (2007) emphasize,
calculating the share of wallet requires information about customers’ expenditures at competing
firms as well as at one’s own firm, information that is often unavailable. In the absence of
information on expenditures at competitors, a model to predict them needs to be constructed. This
issue motivates the present research.
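As a minimal illustration of the definition above, share of wallet can be computed once expenditures at both the focal firm and its competitors are known or predicted. The function name and the spending figures below are hypothetical and are not taken from the data used in this study.

```python
def share_of_wallet(focal_spend: float, competitor_spend: float) -> float:
    """Fraction of total category spending (size of wallet) captured by the focal firm."""
    size_of_wallet = focal_spend + competitor_spend
    if size_of_wallet == 0:
        # No spending in the category: the share is undefined; 0.0 is returned
        # here purely as a convention for this sketch.
        return 0.0
    return focal_spend / size_of_wallet


# Hypothetical customer: 300 spent at the focal firm, 900 at competitors.
example_share = share_of_wallet(300.0, 900.0)  # 300 / (300 + 900)
```

In practice the competitor expenditure is the unobserved quantity, which is why a prediction model is needed before this ratio can be formed.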
It may be informative to consider the impact of cross-firm and cross-category
expenditures on share of wallet calculations because studies show that consumer purchases in a
category are related to their purchases in other categories (e.g., Iyengar, Ansari and Gupta 2003;
Li, Sun and Wilcox 2005). However, these studies do not have share of wallet calculation as a
goal, and do not consider consumers’ purchases from competing firms. In the existing share of
wallet literature, either a single category is considered (e.g., Chen and Steckel 2005; Zheng,
Fader, and Padmanabhan 2009) or the relationship between inter-category expenditures is
indirectly examined through the error term (e.g., Du, Kamakura and Mela 2007).
In this paper, we develop a structural model of budget allocation to study the
interrelationship of expenditures across firms and categories, given current and past expenditures
in multiple categories at a focal firm and competing firms. We consider the possibility that
spending patterns can vary for different consumers and control for unobserved heterogeneity
using a latent class approach. Estimating this model presents methodological issues such as
simultaneity and censoring, which lead to more complexity compared to the typical problem of
determining the size and share of wallet in a single category that is found in the literature.
Simultaneity occurs because expenditures across firms and categories are interrelated due to
consumers’ budget constraint. Censoring occurs when consumers do not spend in every category
at every firm.
To resolve the simultaneity and censoring problems, we propose a Bayesian estimation
method for estimating a simultaneous equations Tobit model, which represents an important
contribution of the paper. The approach is to estimate the coefficients of the endogenous variables
(expenditures) with a Metropolis-Hastings algorithm, which addresses simultaneity, and, for zero
expenditures, to impute the latent values from a truncated normal distribution conditional on the
non-zero expenditures, which addresses censoring. Multiplying the endogenous variables by their
coefficients yields a seemingly unrelated regressions (SUR) model. With this SUR model, we estimate
the coefficients of the exogenous variables and the error covariance with Gibbs samplers. As Carlin
and Louis (2000) and Koop (2003) point out, few Bayesian studies use a simultaneous equations
Tobit model. Our approach can therefore serve as an alternative for future studies.
The empirical application, in common with several other papers in the literature, deals
with financial services offerings. We use credit card data for a focal bank and a composite of
competing banks, both of which offer services in the two categories that we consider, namely,
making purchases and taking cash advances on the bank credit card. We refer to these two
categories as purchases and cash advances, respectively. Given these data, we examine the
following research questions: (1) What consumer segments can be identified? (2) How do
consumer spending patterns, especially the interrelationship of expenditures, differ by segment? (3)
Can we better predict the size and share of wallet by allowing for interrelationship and heterogeneity,
compared to benchmark models? We answer these questions through the estimation results of
our model. We find that the proposed model has better fit and prediction for the size and share of
wallet than the benchmark models on both the estimation and validation samples.
The paper proceeds as follows: The next section is a review of the literature. In the model
and methodology section, we propose the simultaneous equations Tobit model and the estimation
method. Then, we describe the data set and present the results. Finally, we provide the
conclusions, managerial implications, and directions for future research.
LITERATURE REVIEW
This study is related to two research areas: the relationship of purchases across categories and the
size and share of wallet. We consider each in turn.
First, the relationship of purchases across categories is well studied in the context of
cross-selling. For example, Kamakura et al. (2003) showed relationships in purchases of 22 financial
services at a focal bank and its competitors. From this, they predicted the possibility of
customers purchasing services that they did not yet own. Li et al. (2005) found a sequential order
of buying financial products and predicted opportunities of cross-selling products based on this.
Similarly, Iyengar et al. (2003) present a model to better understand and predict consumers’
purchases in a category given purchases in other categories. In general, the cross-selling
literature shows that purchases across categories are interrelated and that this knowledge can
help to predict purchases in other categories. However, the existing literature is for categories
within a firm and does not examine interrelationship between firms, which we address in this
study.
Second, we examine the share of wallet literature by further dividing it into three broad
streams of research. The first stream deals with the role and consequences of share of wallet.
Regarding its role, share of wallet has been treated as a loyalty measure (Bowman and
Narayandas 2004), as a segmentation criterion (Reinartz and Kumar 2003), and as an attrition detector
(Malthouse and Wang 1998). Regarding its consequences, the positive effects of share of wallet
on profitability and relationship duration have been investigated (e.g., Reinartz and Kumar 2003;
Reinartz, Thomas and Kumar 2005).
The second stream examines the antecedents of share of wallet. For example, Bowman
and Narayandas (2004) and Cooil, Keiningham, Aksoy and Hsu (2007) analyze the impact of
customer satisfaction on share of wallet. Baumann, Burton and Elliott (2005) and Verhoef (2003)
identify customers’ characteristics and transaction patterns and firms’ marketing activities that
are associated with share of wallet. These findings are helpful for firms to understand who their
low share of wallet consumers are and improve the relationship with such customers.
The third stream predicts expenditures at competing firms and calculates share of
wallet based on the predicted expenditures. Often, expenditure data are available only for the
focal firm and either unavailable or partially available for the competitors. Then, a prediction
model for the unavailable expenditures is required. This is the literature against which we are
most closely positioned. We proceed to discuss it in more detail.
Chen and Steckel (2005) calculate share of wallet of a credit card firm by inferring
consumers’ behavior at competing firms. Specifically, they use the number of grocery purchases
with the focal firm’s credit card and infer the number of purchases that customers might have
made from competitors using Markov switching matrices. Based on the length of inter-purchase
times, they calculate share of wallet without information on competitors. Zheng et al. (2009)
calculate share of wallet for five online retailers (selling apparel, wireless services, books, office
supplies, and travel) by using market level data on competitors that are publicly available. They
use a limited information NBD/Dirichlet model for calculating share of wallet. Du et al. (2007)
use imputed balances outside the focal bank to simultaneously predict category ownership, total
amount, and the focal firm’s share in 10 categories of a focal bank using a multivariate factor
analytic model.
The previous literature focused on share of wallet in a single category (e.g., Chen and
Steckel 2005), share of wallet at the firm level (e.g., Zheng et al. 2009), or independent share of
wallet in several categories (Du et al. 2007). However, based on the results from the cross-selling
literature, it is obvious that firms should consider the relationships between purchases in multiple
categories, which we consider in this paper. Therefore, our methodological contribution extends
the previous literature by including the interrelationship of expenditures across multiple
categories at multiple firms.
MODEL AND METHODOLOGY
Consider J firms that sell products in C categories to a market of consumers. The consumers also
have an outside option that represents all consumption categories not provided by the J firms.
The firms are indexed by $j = 1, \ldots, J$, the categories by $c = 1, \ldots, C$, and the outside option by 0.
We assume that consumers maximize their concave utility functions by allocating budget W
across these $M$ ($= C \times J$) firm-category combinations and the outside option. This is expressed by

(1) $$\max_{y} \; U(y) = b'y + y'Ay/2 \quad \text{s.t.} \quad 1_{M+1}'y = W,$$

where $y = (y_0, y_{11}, y_{12}, \ldots, y_{1C}, y_{21}, \ldots, y_M)'$ is the vector of expenditures, b is an (M+1)-
dimensional vector, A is an $(M+1) \times (M+1)$ negative definite matrix, and $1_{M+1}$ is an (M+1)-
dimensional vector of ones. Note that expenditures are nonnegative, and that b and A can be
interpreted as proportional to the mean and the variance-covariance matrix of the unit return to
the expenditures (Amemiya, Saito and Shimono 1993).
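The first-order conditions of this problem can be sketched as follows. The sketch assumes the quadratic utility is written as $U(y) = b'y + y'Ay/2$ with A negative definite, and the multiplier $\lambda$ on the budget constraint is notation introduced here for illustration; the full derivation is the one referred to in Appendix A.

```latex
% Lagrangian of the budget allocation problem (lambda is illustrative notation)
\mathcal{L}(y, \lambda) = b'y + \tfrac{1}{2}\, y'Ay + \lambda \left( W - 1_{M+1}'y \right)

% Kuhn-Tucker conditions with nonnegative expenditures
b + Ay - \lambda 1_{M+1} \le 0, \qquad
y \ge 0, \qquad
y' \left( b + Ay - \lambda 1_{M+1} \right) = 0, \qquad
1_{M+1}'y = W
```

For any firm-category combination with positive spending, the corresponding inequality holds with equality, so marginal utility equals $\lambda$; for a combination with zero spending, marginal utility at zero is at most $\lambda$. This pattern of equalities and inequalities is what gives rise to a censored (Tobit-type) system.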
Ransom (1987) and Amemiya et al. (1993) point out that the Kuhn-Tucker conditions for
the quadratic utility with a budget constraint yield the simultaneous equations Tobit model
introduced by Amemiya (1974). We provide the derivation in Appendix A. Thus, for each
individual i, we have a simultaneous equations Tobit model given in structural form by:

(2) $$\Gamma y_i^* = X_i \beta + \varepsilon_i, \quad \text{where } \varepsilon_i \sim N(0, \Sigma).$$
The vector $y_i^*$ of latent utility from expenditures contains M endogenous variables and
thus M equations indexed by $m = 1, \ldots, M$. For consumer i and expenditures of the m-th
firm-category combination, the relationship between the expenditure utility ($y_{mi}^*$) and the observed
expenditure ($y_{mi}$) is given by

$$y_{mi} = \begin{cases} y_{mi}^* & \text{if } y_{mi}^* > 0 \\ 0 & \text{if } y_{mi}^* \le 0 \end{cases}$$
That is, consumers’ spending equals their expenditure utility provided it is higher than a
threshold, which is scaled to zero, and they spend nothing in categories that yield negative utility.
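The censoring rule can be illustrated with a short simulation. This is only a sketch: the latent utilities are drawn from a standard normal distribution purely for illustration, and the names `observe` and `censoring_rate` are hypothetical.

```python
import random


def observe(latent_utility: float) -> float:
    """Map latent expenditure utility to observed expenditure (censored at zero)."""
    return latent_utility if latent_utility > 0 else 0.0


rng = random.Random(0)  # arbitrary seed, for reproducibility of the sketch only
latent = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # illustrative latent utilities
observed = [observe(v) for v in latent]

# Share of observations recorded as zero expenditure.
censoring_rate = sum(1 for v in observed if v == 0.0) / len(observed)
```

With a latent distribution centered at zero, roughly half of the observations are censored, which is the same order of magnitude as the censoring reported later for cash advances.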
Matrix $X_i$ contains the vectors of exogenous variables (e.g., consumer characteristics) in
the following form.

$$X_i = \begin{pmatrix} x_{1i}' & 0 & \cdots & 0 \\ 0 & x_{2i}' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{Mi}' \end{pmatrix},$$

where $x_{mi}$ is a $k_m$-vector containing the i-th observation of the vector of explanatory variables in the
m-th equation. In each $x_{mi}$ vector, there are variables common to all M equations and a unique
variable in each m-th equation for exclusion restrictions to identify the system.
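The block-diagonal layout of the exogenous-variable matrix described above can be sketched as follows. The helper name `build_block_diagonal` and the numbers (an intercept, an age value, and one equation-specific past expenditure per equation) are hypothetical and serve only to show the shape of the matrix.

```python
def build_block_diagonal(rows):
    """Place each equation's regressor vector in its own column block.

    rows[m] is the regressor vector of equation m (length k_m); the result is
    an M x K nested list with K = k_1 + ... + k_M and zeros elsewhere.
    """
    total_cols = sum(len(r) for r in rows)
    matrix = []
    offset = 0
    for r in rows:
        line = [0.0] * total_cols
        line[offset:offset + len(r)] = r  # same-length slice assignment
        matrix.append(line)
        offset += len(r)
    return matrix


common = [1.0, 37.0]               # hypothetical common regressors: intercept, age
past = [5.0, 0.0, 2.5, 1.0]        # hypothetical equation-specific regressors
rows = [common + [p] for p in past]
X_i = build_block_diagonal(rows)   # 4 equations, 3 regressors each: 4 x 12
```

Each row of the matrix multiplies only its own block of the stacked coefficient vector, so the common variables can have different coefficients in each equation.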
$\Gamma$ is an $M \times M$ matrix whose diagonal elements are one and whose off-diagonal elements
are coefficients of other endogenous variables. This matrix expresses the interrelationship of the
endogenous variables. $\beta$ is a $K \times 1$ vector of coefficients, where $K = \sum_{m=1}^{M} k_m$. The vector $\beta$
shows the impact of the exogenous variables on the endogenous variables. Finally, $\varepsilon_i$ follows a
normal distribution $N(0, \Sigma)$.
Estimating the structural model given by Equation 2 presents issues of simultaneity and
censoring. Simultaneity occurs because each endogenous variable affects the other endogenous
variables, expressed through the matrix $\Gamma$. The standard solution, described in, e.g., Cameron
and Trivedi (2005, p. 561) for two simultaneous equations, is to first obtain the reduced form and
estimate it as a Tobit model, and then to replace the regressors $y_i^*$ in the structural model by their
reduced form predictions and proceed with regression.
In contrast, we adopt an alternative Bayesian approach. To solve the simultaneity
problem, we estimate $\Gamma$ from the original structural form. To solve the censoring issue, if $y_{mi}$ is
zero, we impute $y_{mi}^*$ using a truncated multivariate normal distribution conditional on the other
non-zero endogenous variables. If we multiply the endogenous variables by $\Gamma$ (i.e., $\Gamma y_i^*$) and derive
new variables, namely $\tilde{y}_i^*$ ($= \Gamma y_i^*$), the original model converts into a SUR model (i.e.,
$\tilde{y}_i^* = X_i \beta + \varepsilon_i$). Thereafter, the estimation method of $\beta$ and $\Sigma$ is the same as that of a SUR
model.
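The transformation into a SUR model amounts to one matrix-vector product per consumer. The sketch below writes `gamma` for the matrix of endogenous-variable coefficients (ones on the diagonal) and uses a two-equation system with made-up numbers; the names and values are illustrative assumptions.

```python
def matvec(matrix, vector):
    """Multiply a nested-list matrix by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


# Hypothetical 2 x 2 coefficient matrix of the endogenous variables:
# ones on the diagonal, cross-equation coefficients off the diagonal.
gamma = [[1.0, -0.2],
         [0.3, 1.0]]

y_star = [2.0, -1.0]               # hypothetical latent utilities for one consumer
y_tilde = matvec(gamma, y_star)    # transformed dependent variables of the SUR model
# First element:  1.0 * 2.0 + (-0.2) * (-1.0) = 2.2
# Second element: 0.3 * 2.0 + 1.0 * (-1.0)    = -0.4
```

Given a draw of the coefficient matrix, the transformed variables depend only on the exogenous variables and the error term, which is why standard SUR steps can then be applied to the remaining parameters.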
There are two advantages of our Bayesian approach. First, compared to maximum
likelihood estimation, it is more feasible in estimating models with many endogenous variables.
This is because if we estimate a simultaneous equations Tobit model by the maximum likelihood
method (e.g., Maddala, 1986), the numerical approximation (e.g., the GHK simulator) to the
multivariate normal integral in the likelihood becomes less accurate as the number of equations
increases. Even if we estimate the model by two-step maximum likelihood estimation (e.g.,
Murphy and Topel, 1985), with more than two equations it is not easy to correct the
covariance matrix of the estimators in the second step. In contrast, our approach involves
neither multiple integrals nor variance correction, as we sequentially draw parameters given the other
parameters.
The second advantage of the proposed approach, compared to other Bayesian methods, is
that it can estimate all types of simultaneous equations models: recursive,
just-identified, and over-identified models. Li (1998) infers a recursive model with a limited
dependent variable. Our approach can easily analyze a recursive model by setting a proper
restriction on the matrix $\Gamma$. Regarding just-identified models, Koop (2003) describes an
estimation method that first takes the reduced form, estimates it as a SUR model, and then recovers the
parameters of the structural form by transformation. Yang, Narayan and Assael (2006) use a
similar approach. However, this method is not applicable to over-identified models because
recovering the structural parameters from the reduced form raises an identification issue.
In contrast, our approach is applicable even to over-identified models, as we do not recover
parameters from a reduced form but separately draw the parameters of the endogenous and exogenous
variables.
Note that we adopt a latent class model in order to capture unobserved heterogeneity. On
every iteration, we determine the membership of the latent classes and estimate parameters by
latent class, so that each class has a different $\Gamma$, $\beta$, and $\Sigma$. Details of the Markov Chain Monte Carlo
(MCMC) algorithms are provided in Appendix B. A summary is presented below.
Step A. Draw the membership of the latent class s
Step B. For the latent class $s = 1, \ldots, S$
- Step B1. Take a candidate draw $\tilde{\gamma}_s^*$ using the posterior simulator. Calculate the posterior
probability at $\tilde{\gamma}_s^*$ and $\tilde{\gamma}_s^{(r-1)}$, where r denotes the r-th iteration. Accept the candidate draw with
the Metropolis-Hastings acceptance probability (see Appendix B)
- Step B2. With the new draw $\tilde{\gamma}_s^{(r)}$, derive $\Gamma_s^{(r)}$ and calculate $\tilde{y}_s^{*(r)} = \Gamma_s^{(r)} y_s^{*(r-1)}$
- Step B3. Draw $\beta_s^{(r)}$ and $\Sigma_s^{(r)}$ using a Gibbs sampler for a SUR model
- Step B4. Draw $y_s^{*(r)}$ given all parameters $\Gamma_s^{(r)}$, $\beta_s^{(r)}$ and $\Sigma_s^{(r)}$
Step C. Repeat Steps A through B for R times
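Two building blocks of the steps above can be sketched in isolation: the accept/reject decision of Step B1 and the truncated normal draw used to impute a censored latent value in Step B4. This is a simplified, univariate sketch and not the algorithm of Appendix B: the acceptance rule assumes a symmetric proposal, and the conditional mean and standard deviation passed to the imputation function are taken as given, whereas in the full model they would come from conditioning on the other equations. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def mh_accept(log_post_candidate, log_post_current, rng):
    """Metropolis accept/reject on the log scale, assuming a symmetric proposal."""
    log_ratio = log_post_candidate - log_post_current
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return math.log(u) <= min(0.0, log_ratio)


def draw_truncated_below_zero(mean, sd, rng):
    """Draw from N(mean, sd^2) truncated to (-inf, 0] by inverse-CDF sampling."""
    dist = NormalDist(mean, sd)
    upper = dist.cdf(0.0)              # probability mass at or below zero
    u = rng.random() * upper           # uniform on [0, upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its domain
    return min(dist.inv_cdf(u), 0.0)   # guard against the clamping edge case


rng = random.Random(1)  # arbitrary seed for the sketch
# Impute latent utilities for observations with zero expenditure,
# using a hypothetical conditional mean of 0.5 and standard deviation of 1.0.
imputed = [draw_truncated_below_zero(0.5, 1.0, rng) for _ in range(200)]
```

A candidate with a higher posterior value than the current draw is always accepted under this rule, while a much lower one is almost always rejected; the imputed values are nonpositive by construction, matching the censoring rule.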
DATA AND ESTIMATION
Data
The empirical application uses data on credit card expenditures in two categories,
purchases and cash advances, which are the main services of the credit card industry. The data
consist of expenditures in these two categories at the focal bank and all its competitors
aggregated together. Therefore, we assume that there are two banks: the focal bank and a second,
composite bank.
The dataset has the current three-month average expenditures and past three-month
average expenditures both at the focal bank and competitors. Besides expenditure information,
we have demographic information such as age, gender, income, salaried person indicator (1 if a
consumer is salaried, 0 if self-employed), and credit scores.
In Table 1.1.A, we provide descriptive statistics for the demographic variables but
omit expenditures due to confidentiality requirements. Recall that if latent utility is negative,
expenditures are observed as zero. In Table 1.1.B, the censoring percentages of the current and past
expenditures are presented.
We discarded some consumers who provided information that seemed suspect (i.e.,
reported too low or too high income) or showed abnormal credit card use (i.e., were delinquent
or had average monthly expenditure that exceeded a quarter of their annual income). After
cleaning the data, the final dataset had 6617 observations. We randomly selected 5610 (or 85%)
observations for estimation and kept the remaining 1007 for validation.
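A random split of this kind can be sketched as follows. The sample sizes are those stated above; the seed, the use of integer identifiers, and the variable names are assumptions made for illustration and do not reproduce the actual assignment of consumers.

```python
import random

N_TOTAL = 6617        # observations after cleaning
N_ESTIMATION = 5610   # observations used for estimation

rng = random.Random(2011)          # arbitrary seed, an assumption of this sketch
ids = list(range(N_TOTAL))         # hypothetical consumer identifiers

# Sample the estimation set without replacement; the rest is held out.
estimation_ids = set(rng.sample(ids, N_ESTIMATION))
validation_ids = [i for i in ids if i not in estimation_ids]
```

The held-out observations play no role in estimation, so prediction accuracy on them gives an out-of-sample check of the model.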
Table 1.1. Descriptive Statistics

A) Demographics

Variable                    Mean (or %)    SD      Min    Max
Age                         37.3           8.1     22     77
Male (Dummy)                0.7            0.5     0      1
Log of Income               29.1           18.4    3.0    197.6
Salaried Person (Dummy)     0.9            0.3     0      1
Credit Score                6.4            0.7     3.7    7.8

B) Censoring percent of expenditures variables

Expenditures        Current    Past
Purchase (C)        6.4%       6.5%
Cash Advance (C)    46.8%      47.8%
Purchase (F)        12.3%      16.9%
Cash Advance (F)    59.3%      58.8%

The subscript C represents competing banks and F represents the focal bank.
Model Estimation
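A conceptual view of the model is depicted in Figure 1.1. With the variables in the
dataset, we construct a simultaneous equations Tobit model with four equations (2 banks × 2
categories) as shown in Equation 3.

(3) $$\begin{pmatrix} y_{11}^* \\ y_{12}^* \\ y_{21}^* \\ y_{22}^* \end{pmatrix} = \begin{pmatrix} \beta_{11}'X \\ \beta_{12}'X \\ \beta_{21}'X \\ \beta_{22}'X \end{pmatrix} + \begin{pmatrix} 0 & \gamma_{1,12} & \gamma_{1,21} & \gamma_{1,22} \\ \gamma_{2,11} & 0 & \gamma_{2,21} & \gamma_{2,22} \\ \gamma_{3,11} & \gamma_{3,12} & 0 & \gamma_{3,22} \\ \gamma_{4,11} & \gamma_{4,12} & \gamma_{4,21} & 0 \end{pmatrix} \begin{pmatrix} y_{11}^* \\ y_{12}^* \\ y_{21}^* \\ y_{22}^* \end{pmatrix} + \begin{pmatrix} \delta_{11}\, y_{11,Past} \\ \delta_{12}\, y_{12,Past} \\ \delta_{21}\, y_{21,Past} \\ \delta_{22}\, y_{22,Past} \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{pmatrix}$$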
Figure 1.1. Conceptual Model of Factors to Affect Expenditures
Note that we re-arranged the terms in Equation 2 for better interpretation. Thus, the
$\Gamma$ matrix in Equation 2, which represents the interrelationship between endogenous variables, is
converted into the $\Lambda$-matrix on the right hand side of Equation 3. The $\lambda$’s in the $jc$-th equation
are interpreted as effects of the latent utility from other expenditures ($y^*_{-jc}$) on the latent utility
from the $jc$-th expenditure ($y^*_{jc}$).
In Equation 3, $y^*_{jc}$ is the latent utility from expenditure in category $c$ at firm $j$,
where $j=1$ (competitors) or 2 (the focal firm) and $c=1$ (purchases) or 2 (cash advances). Equation
3 shows that $y^*_{jc}$ is affected by all other endogenous variables $y^*_{-jc}$ as well as common
demographic information $X$. The past three-month expenditures, $y^{Past}_{jc}$, are included in the
equation where the corresponding expenditure is the endogenous variable and are used for the
exclusion restrictions. Values of the past expenditures are assumed to be known in the current
period.
We estimate the parameters using Metropolis-Hastings within a Gibbs sampler and
impute the latent endogenous variables $y^*_{jc}$ using data augmentation. We made 20,000 draws, of
which the first 10,000 were discarded as a ‘burn-in’ period. We checked that the algorithm
converged by inspecting the trace and the distribution of the draws for stability. The
acceptance rates were around 0.4. We kept every 10th draw to report the parameter estimates
and calculate the conditional expenditures at competitors.
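The burn-in and thinning bookkeeping described above can be sketched as follows. The function name is hypothetical, and the exact offset at which thinning starts after burn-in is an assumption, since the text only states that every 10th draw was kept.

```python
def retained_draw_indices(n_draws=20_000, burn_in=10_000, thin=10):
    """0-based indices of the draws kept after burn-in and thinning.

    Assumption: the first retained draw is the `thin`-th draw after burn-in.
    """
    return list(range(burn_in + thin - 1, n_draws, thin))


kept = retained_draw_indices()
print(len(kept))           # number of retained draws
print(kept[0], kept[-1])   # first and last retained index
```

Under these settings, 1,000 draws remain for reporting the parameter estimates and for computing the conditional expenditures.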
Benchmark Model
We use the following multivariate Tobit model as a benchmark model to compare with
the proposed model.
\[
y^*_i = X_i \delta + \varepsilon_i, \quad \text{where } \varepsilon_i \sim N(0, \Omega)
\]
The benchmark model is thus a reduced form of our structural model that does not consider the
interrelationship of endogenous variables and heterogeneity of consumer preferences. In this
model the utility from expenditures is affected by demographics and past expenditures. The
vector $\delta$ is the combined effect of $\Gamma$ and $\beta$ in Equation 2. For the exogenous matrix $X_i$, we
use the same variables as the proposed model.
RESULTS
In this section we discuss the model fit and the parameter estimates. Using model fit statistics,
the number of the latent classes is determined. Then, for each latent class, we interpret the
interrelationship between expenditures across firms and categories, which is our main focus.
After this, we interpret effects of exogenous variables including demographics and past
expenditures. Finally, we present the prediction results for the estimation and validation sample.
Model Comparison
We estimate the proposed model (denoted S) for different numbers of latent segments.
We also estimate the benchmark model (denoted M). Variables whose 95% posterior intervals do
not contain zero are in bold.
In Table 1.2, we present several model fit statistics including marginal likelihood, AIC,
BIC, and DIC. For the proposed model, we find that a 2-segment solution, denoted S2, works
best. The BIC of S2 (30,551) is a reduction of 29% from the BIC of S1 (43,120), while the
decrease beyond two segments is much smaller (less than a 5% difference). Other statistics also
indicate that S2 is preferred. In S2, the segment sizes are 76% and 24%.
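The reduction percentages can be reproduced from the fit statistics reported in Table 1.2. The sketch below is only a check on the arithmetic; the dictionary layout and function name are illustrative choices, and the numbers are copied from the table.

```python
# Fit statistics as reported in Table 1.2 (M1 is the baseline model).
FIT = {
    "M1": {"marginal_ll": -21700, "aic": 42950, "bic": 43200, "dic": 107280},
    "M2": {"marginal_ll": -13733, "aic": 30452, "bic": 30956, "dic": 68386},
    "S1": {"marginal_ll": -21610, "aic": 42790, "bic": 43120, "dic": 106810},
    "S2": {"marginal_ll": -13274, "aic": 29888, "bic": 30551, "dic": 67101},
}


def reduction_pct(value, baseline):
    """Percent change of `value` relative to `baseline` (negative = reduction)."""
    return 100.0 * (value / baseline - 1.0)


for model in ("M2", "S1", "S2"):
    row = {k: round(reduction_pct(v, FIT["M1"][k]), 1) for k, v in FIT[model].items()}
    print(model, row)

# BIC of S2 relative to S1, the comparison quoted in the text.
s2_vs_s1_bic = reduction_pct(FIT["S2"]["bic"], FIT["S1"]["bic"])
print(round(s2_vs_s1_bic, 1))
```

Relative to S1, the BIC of S2 falls by about 29%, which matches the figure quoted above.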
From Table 1.2, we can see the benefit of including expenditure interrelationships and
consumer heterogeneity by comparing S2 and S1 against the benchmark models M1 and M2.
First, both S2 and S1 show better model fit than M1. Furthermore, comparing the main model S2
against a benchmark model with the same number of latent classes (M2) shows that modeling
Table 1.2. Model Fits and Reduction Percent Compared to the Benchmark Model (M1)

                      Model Fit                                 Reduction %
Model  Segment Size   Marginal LL  AIC      BIC      DIC        Marginal LL  AIC      BIC      DIC
M1     100%           -21,700      42,950   43,200   107,280    -            -        -        -
M2     63/37%         -13,733      30,452   30,956   68,386     -36.7%       -29.1%   -28.3%   -36.3%
S1     100%           -21,610      42,790   43,120   106,810    -0.4%        -0.4%    -0.2%    -0.4%
S2     76/24%         -13,274      29,888   30,551   67,101     -38.8%       -30.4%   -29.3%   -37.5%
interrelationships is important because the BIC in S2 (30,551) is smaller than the BIC in
M2 (30,956). Second, with respect to consumer heterogeneity, S2 and M2 show a large increase in
likelihood and a large decrease in information criteria compared to M1. That is, there is heterogeneity in
consumer spending patterns and including it substantially improves model fit.
Segment Characteristics
Hereafter, we focus on the results for S2. We profile the two segments in our proposed
model S2 by examining parameter estimates of each segment in Table 1.3. The larger segment
(76%) shows inertia in usage pattern over time since the coefficients of past expenditures across
firms and categories are high (over 0.7), possibly because allocation preferences across firms and
categories are already in equilibrium. Meanwhile, the effects of other factors, including the
interrelationships, are weaker than in the other segment. In contrast, the smaller
segment (24%) shows less dependence on past expenditures (the coefficients are around 0.4) and
Table 1.3. Parameter Estimates

A) Expenditures at competing banks

                                                          Habitual Segment    Adaptive Segment
LHS               RHS                                     Estimates  SD       Estimates  SD
Purchases (C)     Exogenous         Intercept             0.02       0.10     2.01       0.80
                                    Age                   -0.001     0.001    -0.01      0.01
                                    Male (D)              -0.01      0.01     -0.09      0.08
                                    Log of Income         0.05       0.01     0.59       0.08
                                    Salaried Person (D)   -0.04      0.02     -0.37      0.11
                                    Credit Score          0.01       0.01     -0.31      0.12
                                    Past Expenditure      0.79       0.01     0.33       0.02
                  Other Endogenous  Purchase C            -          -        -          -
                                    Cash Advance C        0.03       0.005    0.03       0.03
                                    Purchase F            -0.08      0.02     -0.30      0.04
                                    Cash Advance F        -0.02      0.01     -0.31      0.05
Cash Advances (C) Exogenous         Intercept             5.05       0.26     10.19      0.91
                                    Age                   -0.004     0.003    0.001      0.01
                                    Male (D)              0.06       0.04     -0.01      0.13
                                    Log of Income         -0.11      0.04     -0.28      0.13
                                    Salaried Person (D)   0.14       0.06     0.43       0.16
                                    Credit Score          -0.82      0.04     -1.60      0.13
                                    Past Expenditure      0.94       0.01     0.46       0.03
                  Other Endogenous  Purchase C            0.12       0.02     0.39       0.08
                                    Cash Advance C        -          -        -          -
                                    Purchase F            -0.13      0.05     -0.36      0.11
                                    Cash Advance F        0.07       0.02     0.28       0.09
Table 1.3 continued.

B) Expenditures at the focal bank

                                                          Habitual Segment    Adaptive Segment
LHS               RHS                                     Estimates  SD       Estimates  SD
Purchases (F)     Exogenous         Intercept             0.03       0.06     0.95       0.42
                                    Age                   -0.001     0.001    -0.01      0.003
                                    Male (D)              -0.003     0.01     0.00       0.05
                                    Log of Income         0.03       0.01     0.31       0.06
                                    Salaried Person (D)   -0.002     0.01     -0.15      0.06
                                    Credit Score          0.000      0.01     -0.12      0.06
                                    Past Expenditure      0.75       0.02     0.39       0.02
                  Other Endogenous  Purchase C            -0.03      0.01     -0.13      0.03
                                    Cash Advance C        -0.01      0.004    -0.09      0.02
                                    Purchase F            -          -        -          -
                                    Cash Advance F        0.003      0.01     0.06       0.04
Cash Advances (F) Exogenous         Intercept             1.57       0.14     4.05       0.58
                                    Age                   -0.001     0.001    0.00       0.004
                                    Male (D)              0.04       0.02     0.02       0.06
                                    Log of Income         -0.08      0.02     -0.02      0.06
                                    Salaried Person (D)   -0.01      0.03     0.04       0.08
                                    Credit Score          -0.26      0.02     -0.65      0.09
                                    Past Expenditure      1.03       0.02     0.58       0.04
                  Other Endogenous  Purchase C            -0.10      0.02     -0.17      0.04
                                    Cash Advance C        0.02       0.01     -0.02      0.03
                                    Purchase F            0.19       0.03     0.02       0.05
                                    Cash Advance F        -          -        -          -
Table 1.3 continued.

C) Derivatives implied by coefficients of endogenous variables

Label                              Category       Derivative                            Habitual Segment  Adaptive Segment
(1) Across category within bank    Purchases      $\partial PUR_C/\partial CASH_C$      0.03              0.03
                                   Purchases      $\partial PUR_F/\partial CASH_F$      0.003             0.06
                                   Cash Advances  $\partial CASH_C/\partial PUR_C$      0.12              0.39
                                   Cash Advances  $\partial CASH_F/\partial PUR_F$      0.19              0.02
(2) Across bank within category    Purchases      $\partial PUR_C/\partial PUR_F$       -0.08             -0.30
                                   Purchases      $\partial PUR_F/\partial PUR_C$       -0.03             -0.13
                                   Cash Advances  $\partial CASH_C/\partial CASH_F$     0.07              0.28
                                   Cash Advances  $\partial CASH_F/\partial CASH_C$     0.02              -0.02
(3) Across bank and category       Purchases      $\partial PUR_C/\partial CASH_F$      -0.02             -0.31
                                   Purchases      $\partial PUR_F/\partial CASH_C$      -0.01             -0.09
                                   Cash Advances  $\partial CASH_C/\partial PUR_F$      -0.13             -0.36
                                   Cash Advances  $\partial CASH_F/\partial PUR_C$      -0.10             -0.17

- The variable CASH represents expenditure in the cash advances category and PUR represents expenditure in the purchases category.
- The subscript C represents competing banks and F represents the focal bank.
more reliance on factors such as income. In addition, expenditures in this segment are much
more interrelated. Thus, we label the larger segment the habitual segment and the smaller
segment the adaptive segment because its expenditures depend on income.
The characteristics of the segments in demographics and expenditure patterns are as
follows. Compared to the habitual segment, consumers in the adaptive segment are slightly older
(39.1 vs. 36.8), comprise more males (70.9% vs. 66.2%), have fewer salaried persons (83.9% vs.
89.5%), and have more income. The adaptive segment spends, on average, 2.3 times more in the
purchases category and 1.4 times more in the cash advances category. However, there are no
differences between the segments in credit score or in share of wallet in either category. In the
following subsection, we discuss the effects of all factors by segment.
Effects of Endogenous Variables
Next, we interpret the coefficients of endogenous variables, which are presented in terms
of the derivatives implied by the coefficients in Table 1.3.C. These summarize how the
preference for purchases category is associated with the preference for cash advances category
and vice versa, across firms and categories.
Across categories within the same bank (Label 1), the interrelationships of the preference
between categories are positive and asymmetric. Specifically, in both segments, the increased
preference for cash advances is not associated with the preference for purchases, except for the
habitual segment at competing banks, which shows a weak positive impact ($\partial PUR_C/\partial CASH_C = 0.03$). In
contrast, the increased preference for purchases is associated with the increased preference for
cash advances. At competitors, the coefficients are significant for both segments
($\partial CASH_C/\partial PUR_C = 0.12$ and $0.39$ for the habitual and the adaptive segment, respectively). At
the focal bank, the coefficient is significant only for the habitual segment
($\partial CASH_F/\partial PUR_F = 0.19$). From the results, we can see that the purchases category is
more important for inducing cross-selling than the cash advances category and that the interrelationship
across categories takes place more strongly at competitors.
Across banks within the same category (Label 2), the interrelationships of the preference
for different banks depend on the category. In the purchases category, increased preference at the
focal bank is associated with decreased preference at competitors ($\partial PUR_C/\partial PUR_F = -0.08$ and
$-0.30$ for the habitual and adaptive segment, respectively). Similarly, the increased preference at
competitors is associated with the decreased preference at the focal bank ($\partial PUR_F/\partial PUR_C =
-0.03$ and $-0.13$, respectively). The negative interrelationship in the purchases
category may arise because consumers keep a balance between expenditures at two different banks
due to the limited budget.
Interestingly, cash advances respond somewhat differently. The increased preference for
cash advances at the focal bank is associated with increased preference for cash advances at
competing firms ($\partial CASH_C/\partial CASH_F = 0.07$ and $0.28$ for each segment, respectively), but the
reverse is not so strong ($\partial CASH_F/\partial CASH_C = 0.02$ for the habitual segment). Evidently, those
who use the focal firm for cash advances tend to seek cash advances elsewhere as well, possibly
because they cannot satisfy their cash demand only at the focal bank. Therefore, if the focal bank
can increase the limit of cash advances without risk, there are business opportunities in the cash
advances category. However, those who use competing firms for cash advances are less likely to
use the focal firm possibly because there are a number of competing banks that can be sources of
cash advances.
Finally, we compare the interrelationships of preferences across bank and across category
(Label 3). In general, these interrelationships are negative. Increased preference for
cash advances at the focal bank by the adaptive segment is associated with reduced preference
for purchases at competing banks ($\partial PUR_C/\partial CASH_F = -0.31$). Similarly, increased preference
for cash advances at competing banks is associated with reduced preference for purchases at the
focal bank ($\partial PUR_F/\partial CASH_C = -0.01$ and $-0.09$, respectively). With respect to the cash advances
category, increased preference for purchases at the focal bank is associated with reduced
preference for cash advances at competing banks ($\partial CASH_C/\partial PUR_F = -0.13$ and $-0.36$), while
increased preference for purchases at competing banks is associated with reduced preference for
cash advances at the focal bank ($\partial CASH_F/\partial PUR_C = -0.10$ and $-0.17$, respectively). The
results show that the interrelationships of expenditures also occur across bank and across
category, which cannot be discovered by a within-bank and within-category examination.
In summary, the results show that there are interrelationships between expenditures
across banks and categories. Overall, the interrelationships are positive across categories at the
same bank and in the cash advances category across banks. Therefore, banks can use the
positive interrelationships for cross-selling at the same bank, or offer more cash advances in order to
keep consumers using the bank’s own cash advances. Meanwhile, interrelationships are
negative in the purchases category across banks and across both bank and category. It should be noted that
those interrelationships vary by segment. The adaptive segment shows stronger
interrelationships than the habitual segment. Therefore, it is necessary for firms to distinguish the
segments and implement a different marketing mix for each based on their usage patterns. For example, it
could be more effective to target the adaptive segment for cross-selling.
Effects of Exogenous Variables
From Tables 1.3.A and 1.3.B, we found large differences in the effects of exogenous
variables between the two segments. Above all, expenditures of the habitual segment are
heavily affected by past expenditures while the expenditures of the adaptive segment are more
affected by demographics. In particular, the differences in the coefficients of income, the salaried person
indicator, and credit score are salient. We briefly present the impacts of exogenous variables by
segment.
First, we start with the habitual consumers. Age, male, and salaried person do not have an
impact on the preferences for both purchase and cash advance categories at both firms. However,
income has a positive impact on preferences for the purchase category ($0.05$ and $0.03$ for
competing banks and the focal bank respectively, and hereafter in this subsection) and a negative
impact on preferences for the cash advance category ($-0.11$, $-0.08$). Credit score has negative
impacts only on the preference for the cash advance category ($-0.82$, $-0.26$). Finally, the
impact of past expenditures is so strongly positive ($0.74$ to $1.03$) that this segment is named
the habitual segment.
Second, with respect to the adaptive segment, age has a negative impact only on the
preference for the purchase category ($-0.01$) and male does not have an impact on either
category. Income has a positive impact on the preference for purchase ($0.59$, $0.31$) and a
negative impact on the preference for the cash advance category at competing banks ($-0.28$).
Salaried person has a negative impact on the preference for purchase ($-0.37$, $-0.15$) and a
positive impact on the preference for cash advance at competing banks ($0.43$). Credit score has
negative impacts on both the preference for purchase ($-0.31$, $-0.12$) and the preference for
cash advance ($-1.60$, $-0.65$). The coefficients of past expenditures ($0.33$ to $0.58$) are about
half those for the habitual segment.
Overall, credit card usage is less affected by age or gender and more affected by income.
Higher income consumers tend to spend more in the purchases category and less in the cash
advances category. Salaried persons show various usage patterns depending on firms and
categories. Finally, consumers with high credit score are less likely to use the cash advances
category.
Prediction of the Size and Share of Wallet
After finding consumer spending patterns across firms and categories, we utilize this
knowledge for predicting the size and share of wallet. We predict expenditures only at competing
banks, conditional on expenditures at the focal bank, given that the firm has access to
its own transactions with consumers. The size of wallet for a category is the sum of
expenditures at the focal firm and expected expenditures at competing firms. The share of wallet
is calculated as
\[
\text{Share of Wallet} = \frac{y_{focal}}{\text{Size of Wallet}} = \frac{y_{focal}}{y_{focal} + E(y_{competitors} \mid y_{focal})}.
\]
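As a minimal sketch of the size and share definitions above, the two quantities can be computed as follows. The function names are hypothetical, and returning `None` when the size of wallet is zero is an assumption made here for illustration, since the share is undefined in that case.

```python
def size_of_wallet(y_focal, expected_y_competitors):
    """Focal-firm expenditure plus expected expenditure at competing firms."""
    return y_focal + expected_y_competitors


def share_of_wallet(y_focal, expected_y_competitors):
    """Focal-firm share of the category wallet; None if the wallet is empty."""
    size = size_of_wallet(y_focal, expected_y_competitors)
    if size <= 0:
        return None  # assumption: share is treated as undefined for a zero wallet
    return y_focal / size


# Illustrative values only: 30 spent at the focal bank, 70 expected elsewhere.
print(size_of_wallet(30.0, 70.0))
print(share_of_wallet(30.0, 70.0))
```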
After the burn-in period, we calculate the expected expenditures at competing banks.
Note that we derive expected expenditures from the latent utility from expenditures at competing
banks conditional on observed expenditures at the focal bank. As the analytical calculation
method is complicated, we use a numerical approach using Monte Carlo integration, which is
explained in detail in Appendix C.
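The idea of Monte Carlo integration can be illustrated in a deliberately simplified setting. The sketch below assumes a single latent utility that is normally distributed and an expenditure equal to the latent utility censored at zero; it does not condition on focal-bank expenditures and is not the procedure of Appendix C. All names and parameter values are hypothetical.

```python
import math
import random


def expected_censored_mc(mu, sigma, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[max(0, y*)] for y* ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += max(0.0, rng.gauss(mu, sigma))
    return total / n_draws


def expected_censored_exact(mu, sigma):
    """Closed form for the same expectation: mu * Phi(mu/sigma) + sigma * phi(mu/sigma)."""
    z = mu / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return mu * cdf + sigma * pdf


print(expected_censored_mc(0.5, 1.0))
print(expected_censored_exact(0.5, 1.0))
```

In this univariate case a closed form exists, which makes it a convenient check on the simulation; the numerical approach becomes useful in the multivariate conditional case, where the analytical calculation is complicated.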
The prediction results of our proposed model (S2) and the benchmark model (M1) are
presented in Table 1.4.A for the estimation sample and Table 1.4.B for the validation sample. We
present the Mean Absolute Error (MAE) of the size and share of wallet between the real
expenditures and expected expenditures. For the estimation sample, we present the results by
segment to see how much we can improve the prediction by segmentation. In each draw, we
sample the segment membership from a multinomial distribution whose probabilities are the
prior membership probabilities updated by the likelihood conditional on observed expenditures. We
allocate each consumer to the segment for which their membership frequency is higher.
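The membership update and the final allocation can be sketched as follows. The function names, the likelihood values, and the membership draws are hypothetical; the tie-breaking behavior (the first label encountered wins) is an assumption, as the text does not address ties.

```python
from collections import Counter


def posterior_membership_probs(prior_probs, likelihoods):
    """Prior membership probabilities updated by segment-specific likelihoods."""
    weights = [p * l for p, l in zip(prior_probs, likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]


def allocate_segments(membership_draws):
    """Assign each consumer to the segment drawn most often across retained draws.

    membership_draws[r][i] is the segment label of consumer i in draw r.
    """
    n_consumers = len(membership_draws[0])
    allocation = []
    for i in range(n_consumers):
        counts = Counter(draw[i] for draw in membership_draws)
        allocation.append(counts.most_common(1)[0][0])
    return allocation


# Illustration: priors of 76% / 24% and made-up likelihood values.
probs = posterior_membership_probs([0.76, 0.24], [0.5, 1.0])
print(probs)

# Three retained draws for two consumers (made-up labels).
draws = [
    ["habitual", "adaptive"],
    ["habitual", "adaptive"],
    ["adaptive", "adaptive"],
]
print(allocate_segments(draws))
```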
The results show that, in general, the proposed model considering the interrelationship
and heterogeneity shows better performance in predicting both the size and share of wallet,
especially in the habitual segment. This is because the benchmark model M1 does not distinguish the
interrelationships and the different preferences of consumers. Although the coefficients of past
expenditures in M1 are on average 0.67, its prediction is worse than that for the adaptive segment, which
does not heavily depend on the past usage patterns. Specifically, in the purchases category, the
reduction in MAE for the habitual segment in model S2 compared to model M1 is 20.4% for the
size and 22.2% for the share of wallet. However, in the adaptive segment, the reduction is
marginal for the size (2.7%) and does not take place for the share of wallet. In the cash
advances category, the reduction in MAE for the habitual segment in model S2 is 25.3% for the
Table 1.4. Prediction of the Size and Share of Wallet

A) The estimation sample

Category        Type    Segment     N      M1 (MAE)   S2 (MAE)   Reduction (%)
Purchases       Size    Habitual    4412   0.341      0.271      -20.4%
                        Adaptive    1143   0.971      0.944      -2.7%
                Share   Habitual    4395   0.134      0.104      -22.2%
                        Adaptive    1141   0.121      0.127      5.0%
Cash Advances   Size    Habitual    4412   0.482      0.360      -25.3%
                        Adaptive    1143   0.899      0.806      -10.3%
                Share   Habitual    2675   0.139      0.123      -11.4%
                        Adaptive    695    0.143      0.140      -1.6%

B) The validation sample

Category        Type    Past Expenditure   N      M1 (MAE)   S2 (MAE)   Reduction (%)
Purchases       Size    Observed           995    0.519      0.506      -2.6%
                        Predicted          994    0.689      0.694      0.7%
                Share   Observed           991    0.132      0.123      -6.4%
                        Predicted          990    0.164      0.163      -0.3%
Cash Advances   Size    Observed           995    0.566      0.483      -14.7%
                        Predicted          994    0.874      0.816      -6.6%
                Share   Observed           612    0.152      0.145      -4.5%
                        Predicted          612    0.181      0.177      -2.4%
size and 11.4% for the share. In addition, in the adaptive segment, the decrease in MAE is
10.3% for the size and 1.6% for the share of wallet. In summary, the proposed model predicts
better for the habitual segment by correctly estimating the effects of
factors, mainly the past expenditures, and predicts as well as the
simple model M1 for the adaptive segment, though the latter segment does not heavily rely on the past expenditures.
We also present the prediction results of the validation sample in Table 1.4.B. In practice,
it may be difficult to use past expenditures at competing banks as exogenous variables.
Therefore, for past expenditures at competing banks, we use both observed ones assumed to be
available and predicted ones estimated by a Tobit model in which we regress past expenditures at
competing banks on demographics and past expenditures at the focal bank. To calculate expected
expenditures at competitors, we use every 10th draw from the posterior distribution. As we
cannot decide the segment membership in the validation sample, in each iteration each consumer is
randomly assigned to the habitual or the adaptive segment with probabilities of 76% and 24%,
respectively.
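The random assignment used for the validation sample can be sketched as below. The function name and the use of a seeded generator are illustrative assumptions; only the 76% and 24% probabilities and the validation sample size of 1007 come from the text.

```python
import random


def assign_segments(n_consumers, p_habitual=0.76, rng=None):
    """Randomly assign each consumer to a segment for one iteration."""
    rng = rng or random.Random()
    return [
        "habitual" if rng.random() < p_habitual else "adaptive"
        for _ in range(n_consumers)
    ]


rng = random.Random(2011)          # seed chosen arbitrarily for reproducibility
one_iteration = assign_segments(1007, rng=rng)
share_habitual = one_iteration.count("habitual") / len(one_iteration)
print(round(share_habitual, 3))
```

Because the assignment is redrawn in every iteration, the realized share of habitual consumers fluctuates around 76% rather than equaling it exactly.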
The results show that the performance of the proposed model S2 is generally better than
the performance of the benchmark model M1 in that there are decreases of MAE in prediction of
the size and share for both categories. It is notable that the better performance of S2 holds even
when we use the predicted past expenditures at competitors. Considering the fact that a reduced-
form model usually predicts better than a structural model, it is meaningful that our proposed
structural model outperforms the benchmark. Therefore, the results in the validation sample
firmly show that it is necessary to consider the interrelationships and heterogeneous preferences
when firms predict the size and share of wallet.
CONCLUSION
Summary
The purpose of this paper is to better understand consumers’ spending patterns across
firms and categories in order to better predict the size and share of wallet. It considers the
interrelationship between expenditures and heterogeneity in preference, which have not been
addressed in the previous literature. Consumers’ utility maximization problem with respect to
their expenditures subject to the budget constraint derives the simultaneous equations Tobit
model. To estimate the model, we propose a Bayesian estimation method. That is, using the
MCMC algorithms, we estimate the coefficients of endogenous variables, impute the latent
endogenous variables, and estimate the coefficients of exogenous variables and variance matrix
in sequence.
With this approach, we sought to answer several research questions raised in the
introduction. The first issue was what segments we can identify. We find two consumer
segments; the habitual segment and the adaptive segment. The former consists of consumers
whose current expenditures are closely related to past expenditures, possibly because their
budget allocation preference is stable. The latter segment consists of consumers whose current
expenditures across firms and categories are strongly interrelated and are affected by their
income.
The second issue was about differences in consumer spending patterns by segment,
especially the interrelationship of expenditures. We find that the interrelationships differ
across segments mainly in magnitude. In general, within-bank expenditures in the purchase and
cash advance categories positively affect each other. Within category, expenditures at the focal
bank and competing banks affect each other negatively in the purchases category but positively in
the cash advances category. Across both banks and categories, we find generally negative usage
patterns. For example, purchases at one bank are negatively related to cash advances at the other
bank and vice versa.
The third issue was whether we can better predict the size and share of wallet by
considering the interrelationship of expenditures and customer heterogeneity. We compared the
size and share of wallet from our proposed model with those from the benchmark model. The
proposed model generally has lower prediction errors in both the size and share of wallet than the
benchmark model. We especially find that the prediction error reduction in the habitual segment
is large, possibly because the effects of past expenditures on the current expenditures are more
accurately estimated. In conclusion, our empirical findings show that it is important to consider
the interrelationships between expenditures and consumer heterogeneity to better predict the size
and share of wallet.
Managerial Implications
Our findings provide managers with guidelines through utilizing the interrelationship of
expenditures and heterogeneity between segments. First, from the significance and magnitude of
the interrelationships, firms can accurately implement cross-selling. For example, the increased
preference for purchases is associated with the increased preference for cash advances while the
converse is not true. Therefore, a cross-selling strategy of promoting purchases category first and
then cash advance category can be applied.
Second, managers should understand that expenditures in some categories at the focal
bank positively affect expenditure in the same categories at competing banks. For example, in
the cash advance category, expenditures at the focal bank increase expenditures at competing
banks. Without considering the reason of this positive impact, managers may unnecessarily
overspend on competitive promotions and advertising. If they can find reasons (e.g., the low
limit of cash advances) and take actions (e.g., increase the limit or offer a loan), there may be an
opportunity to capture consumers’ whole budget in the category.
Third, managers should pay attention to the adaptive segment. This segment has a larger
size of wallet than the habitual segment but the focal bank’s share is not larger. As the
interrelationships of this segment across firms are largely negative, if the focal bank can attract
this segment more, the bank can gain a higher share of wallet from the increased expenditures at
the focal bank and decreased expenditures at competing banks. Therefore, managers need to take
care of this segment and provide incentives so that they can achieve a higher share from
consumers with a larger wallet.
We recommend the implementation of our approach as follows: Managers need to get
information on customers’ expenditures at competing firms for a sample of customers. It is
necessary to obtain this information at least once in order to estimate the parameters and check
whether the prediction is correct. After the managers obtain the coefficients from the sample,
they can predict expenditures for the out-of-sample customers by multiplying the exogenous
variables with the coefficients. Finally, they can predict the size and share of wallet with the
predicted expenditures at competing firms conditional on the expenditures at the firm. The
method to estimate a sample of customers and apply the coefficients to out-of-sample for
prediction is found in many studies (e.g., Iyengar et al. 2003).
If the managers use the past expenditures at competing firms for the exogenous variables
for the exclusion restrictions, it would be necessary to predict those expenditures for the out-of-
sample customers. Using a multivariate Tobit model, managers can regress the past expenditures
at competing firms on other variables available at the firm for the sample of customers. Then,
they can predict those expenditures for the out-of-sample customers using the coefficients
obtained from the model. For the periods after estimation, the managers can use the expenditures
previously predicted by our model as the past expenditures in the current period.
Limitation and Future Research
As the data is limited to demographic information and past expenditures, we saw the
impacts of only these variables on the current expenditures and not, for example, other marketing
mix effects. For example, we might speculate that if competitors increase advertising or provide
promotions, consumers may increase expenditures at competing firms and decrease expenditures
at the focal firm. Thus, future research using richer data sets could investigate the effects of
marketing mix. If panel data is available, it would also be worth investigating the change of the
interrelationship, size, and share of wallet over time, and its possible drivers.
APPENDIX
A. Derivation of a Simultaneous Equations Tobit Model
We present the derivation of a simultaneous equations Tobit model from the utility
maximization problem with binding non-negativity constraints, following previous research
(Amemiya et al. 1993; Ransom 1987).
We assume that consumers maximize a quadratic utility function by allocating their
budget across M firm-category combinations and the outside option. This is expressed by
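\[
\max_{y} \; U(y) = b'y + y'Ay/2 \quad \text{s.t.} \quad 1_{M+1}'y \le W, \tag{A1}
\]
where $y$ is a vector of non-negative expenditures, $y = (y_0, y_{11}, y_{12}, \ldots, y_{1C}, y_{21}, \ldots, y_M)'$, $b$ is an
$(M+1)$-dimensional vector, $A$ is an $(M+1) \times (M+1)$ negative definite matrix, and $1_{M+1}$ is an $(M+1)$-
dimensional vector of ones.
The Lagrangean function is $L = b'y + y'Ay/2 + \lambda (W - 1_{M+1}'y)$ and the necessary and
sufficient Kuhn-Tucker conditions for a constrained maximum are
\[
\frac{\partial L}{\partial y_m} \le 0, \quad y_m \ge 0, \quad y_m \frac{\partial L}{\partial y_m} = 0,
\]
\[
\frac{\partial L}{\partial \lambda} \ge 0, \quad \lambda \ge 0, \quad \lambda \frac{\partial L}{\partial \lambda} = 0,
\]
where $m = 0, \ldots, M$. That is,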
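\[
\frac{\partial U}{\partial y_m} \le \lambda \;\; (\text{with equality if } y_m > 0) \quad \text{and} \quad W - 1_{M+1}'y \ge 0. \tag{A2}
\]
We assume that $y_0 > 0$ and consequently, $\partial U/\partial y_0 = \lambda$ and $W - 1_{M+1}'y = 0$ because of
complementary slackness. Therefore, Equation A2 can be rewritten as
\[
\frac{\partial U}{\partial y_m} \le \frac{\partial U}{\partial y_0} \;\; (\text{with equality if } y_m > 0) \quad \text{and} \quad 1_{M+1}'y = W, \quad \text{where } m = 1, \ldots, M. \tag{A3}
\]
We partition the matrix and vectors as
\[
y = \begin{bmatrix} y_0 \\ \bar{y} \end{bmatrix}, \quad
b = \begin{bmatrix} b_0 \\ \bar{b} \end{bmatrix}, \quad
A = \begin{bmatrix} a_0 & \bar{a}' \\ \bar{a} & \bar{A} \end{bmatrix}, \quad \text{and} \quad
a = \begin{bmatrix} a_0 \\ \bar{a} \end{bmatrix},
\]
where $y_0$, $b_0$, and $a_0$ are scalars.
In matrix form, Equation A3 can be written as
\[
\bar{b} + [\,\bar{a} \;\; \bar{A}\,]\, y - 1_M (b_0 + a'y) \le 0. \tag{A4}
\]
Using the identity $y_0 = W - 1_M'\bar{y}$, we express the Kuhn-Tucker conditions in Equation A4 as
\[
G\bar{y} + \mu \le 0, \tag{A5}
\]
where $G = \bar{A} - \bar{a} 1_M' - 1_M \bar{a}' + a_0 1_M 1_M'$ and $\mu = \bar{b} + W\bar{a} - b_0 1_M - W a_0 1_M$.
Note that $\mu$ contains the stochastic elements in the form of $(u_m - u_0)$ if we set up $b$ with
a typical element $b_m = b_{m0} + u_m$, where $b_{m0}$ is a deterministic marginal utility and $u_m$
represents individual differences in marginal utility among consumers. In a general form, $\mu$
could be made to depend on individuals’ characteristics (exogenous variables) and error terms. A
typical $m$-th equation in the Kuhn-Tucker conditions in Equation A5 could thus be written as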
\[
\sum_{j=1}^{J} \gamma_{mj} y_j + \sum_{k=1}^{K} \beta_{mk} x_{mk} + \varepsilon_m
\begin{cases}
= 0 & \text{if } y_m > 0 \\
\le 0 & \text{if } y_m = 0
\end{cases}
\]
where $x_{mk}$ are the individuals’ characteristics and $\varepsilon_m$ is an error term resulting from differences
in marginal utility.
An alternative way of writing the conditions would be
\[
-\gamma_{mm} y_m =
\begin{cases}
\sum_{j \ne m}^{J} \gamma_{mj} y_j + \sum_{k=1}^{K} \beta_{mk} x_{mk} + \varepsilon_m & \text{if RHS} > 0 \\
0 & \text{if RHS} \le 0
\end{cases}
\]
which is a typical expression of a simultaneous equations Tobit model. That is, the Kuhn-Tucker
conditions for the optimal expenditure convert to the estimation problem of a
simultaneous equations Tobit model. After standardization of the parameters, we derive a
simultaneous equations Tobit model as
\[
\Gamma Y = X\beta + \varepsilon, \tag{A6}
\]
where $Y$ represents a vector of endogenous variables which could be censored at zero, $X$
represents exogenous individuals’ characteristics, and $\varepsilon$ is a vector of error terms following a
multivariate normal distribution.
B. MCMC Algorithms for Model Estimation
Equation 2 is the main equation to estimate. Our approach is to sequentially draw $\Gamma$, $\beta$,
$\Sigma$, and $y^*_i$. We first explain the basic estimation method at the aggregate level in
subsections 1 through 5. Then, in subsection 6, we explain how to extend the
basic estimation method to the latent class model.
1. Likelihood function
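To calculate the likelihood conditional on other parameters, we require the distribution of
$y^*_i$, which is derived from Equation 2. As $\varepsilon_i$ follows a multivariate normal distribution, we re-
arrange the equation and denote the function, $g^{-1}(y^*_i)$, as follows.
\[
g^{-1}(y^*_i) = \varepsilon_i = \Gamma y^*_i - X_i\beta \tag{B1}
\]
By the transformation technique, we get the distribution of $y^*_i$:
\[
\begin{aligned}
f(y^*_i) &= f[g^{-1}(y^*_i)] \cdot \operatorname{abs}\!\left( \left| \frac{\partial g^{-1}(y^*_i)}{\partial y^{*\prime}_i} \right| \right) \\
&= f[\Gamma y^*_i - X_i\beta] \cdot \operatorname{abs}(|\Gamma|) \\
&= \frac{1}{(2\pi)^{M/2} |\Sigma|^{1/2}} \exp\!\left\{ -\tfrac{1}{2} (\Gamma y^*_i - X_i\beta)' \Sigma^{-1} (\Gamma y^*_i - X_i\beta) \right\} \cdot \operatorname{abs}(|\Gamma|) \\
&= \frac{1}{(2\pi)^{M/2} |\Gamma^{-1}\Sigma\Gamma^{-1\prime}|^{1/2}} \exp\!\left\{ -\tfrac{1}{2} (y^*_i - \Gamma^{-1}X_i\beta)' (\Gamma^{-1}\Sigma\Gamma^{-1\prime})^{-1} (y^*_i - \Gamma^{-1}X_i\beta) \right\}
\end{aligned}
\tag{B2}
\]
That is, a multivariate Normal distribution, $y^*_i \sim N(\Gamma^{-1}X_i\beta, \Gamma^{-1}\Sigma\Gamma^{-1\prime})$, is obtained.
Assuming that each observation is independent, we can calculate the likelihood for all
observations by $f(y^*) = \prod_{i=1}^{N} f(y^*_i)$.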
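2. Estimation of Coefficients of Endogenous Variables ($\Gamma$)
The matrix $\Gamma$ is estimated using a random walk chain Metropolis-Hastings method. As
the diagonal elements of $\Gamma$ are 1, we need to estimate only the off-diagonal elements. Let $\tilde{\gamma}$
denote the vector consisting of off-diagonal elements of $\Gamma$, where the dimension of $\tilde{\gamma}$ is
$\tilde{K} \times 1$ with $\tilde{K} = M^2 - M$. We use a normal prior, i.e., $\tilde{\gamma} \sim N(\underline{\tilde{\gamma}}, \underline{\tilde{\Omega}})$, and generate candidate draws
according to $\tilde{\gamma}^{*} = \tilde{\gamma}^{(r-1)} + z$, where $z \sim N(0, \Sigma_z)$ and $r$ denotes the $r$-th iteration. We assume
$\Sigma_z = \tilde{c}\, I_{\tilde{K}}$ and choose the value of $\tilde{c}$ so that the acceptance probability is around 40%,
following the general rule (Koop 2003). Therefore, the candidate $\tilde{\gamma}^{*}$ is drawn from a
multivariate normal distribution such as
$\tilde{\gamma}^{*} \sim N(\tilde{\gamma}^{(r-1)},\ \tilde{c}\, I_{\tilde{K}})$
Using the prior of $\tilde{\gamma}$ and the likelihood, we calculate the posterior probability of $\tilde{\gamma}$ as
follows.
(B3) $p(\tilde{\gamma} \mid y^*, \beta, \Sigma) \propto p(\tilde{\gamma})\, f(y^*)$
With Equation B3, we calculate the acceptance probability as
(B4) $\alpha(\tilde{\gamma}^{(r-1)}, \tilde{\gamma}^{*}) = \min\left[\dfrac{p(\tilde{\gamma} = \tilde{\gamma}^{*} \mid y^*, \beta, \Sigma)}{p(\tilde{\gamma} = \tilde{\gamma}^{(r-1)} \mid y^*, \beta, \Sigma)},\ 1\right]$.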
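The random walk chain step can be sketched in a few lines. The code below is a generic illustration, not the sampler used in this study: `log_post` stands for any function returning the log of the right-hand side of Equation B3, and `step_sd` plays the role of $\sqrt{\tilde{c}}$. Both names are hypothetical.

```python
import math
import random

def rw_metropolis(log_post, start, step_sd, n_iter, seed=0):
    """Random walk chain Metropolis-Hastings for a parameter vector.

    log_post: callable returning log prior + log likelihood (up to a constant).
    step_sd:  standard deviation of each increment, i.e. sqrt(c) when Sigma_z = c * I.
    Returns the list of draws and the realized acceptance rate.
    """
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(current)
    draws, accepted = [], 0
    for _ in range(n_iter):
        # Candidate = previous draw + normal increment
        candidate = [g + rng.gauss(0.0, step_sd) for g in current]
        cand_lp = log_post(candidate)
        # Acceptance probability min(posterior ratio, 1), on the log scale
        log_alpha = min(0.0, cand_lp - current_lp)
        if math.log(1.0 - rng.random()) < log_alpha:
            current, current_lp = candidate, cand_lp
            accepted += 1
        draws.append(list(current))
    return draws, accepted / n_iter
```

In practice one would rerun short chains with different values of `step_sd` until the returned acceptance rate is near the targeted level.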
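3. Estimation of Coefficients of Exogenous Variables and Error Covariance ($\beta$ and $\Sigma$)
Once we get a new draw of $\tilde{\gamma}^{(r)}$, we construct $\Gamma^{(r)}$ by re-arranging $\tilde{\gamma}^{(r)}$. Finally, we
calculate $\tilde{y}_i^* = \Gamma y_i^*$ (hereafter, we suppress the iteration number $r$ for simplicity). We now stack
all the observations together as
$\tilde{y}^* = (\tilde{y}_1^{*\prime}, \dots, \tilde{y}_N^{*\prime})'$, $X = (X_1', \dots, X_N')'$, $\varepsilon = (\varepsilon_1', \dots, \varepsilon_N')'$
and write
(B5) $\tilde{y}^* = X\beta + \varepsilon$.
We assume that $\varepsilon_i$ follows $N(0, \Sigma)$ and $\varepsilon$ follows $N(0, I_N \otimes \Sigma)$.
As Equation B5 is a SUR model, we can estimate $\beta$ and $\Sigma$ by using a Gibbs sampler
with standard Normal-Wishart priors. Specifically, we use a normal prior $\beta \sim N(\underline{\beta}, \underline{\Omega})$ and a
Wishart prior $\Sigma^{-1} \sim W(\underline{v}, \underline{V})$.
The posterior of $\beta$ conditional on $\tilde{y}^*$ and $\Sigma^{-1}$ is $\beta \mid \tilde{y}^*, \Sigma^{-1} \sim N(\bar{\beta}, \bar{\Omega})$,
where $\bar{\Omega} = \left(\underline{\Omega}^{-1} + \sum_{i=1}^{N} X_i'\Sigma^{-1}X_i\right)^{-1}$ and $\bar{\beta} = \bar{\Omega}\left(\underline{\Omega}^{-1}\underline{\beta} + \sum_{i=1}^{N} X_i'\Sigma^{-1}\tilde{y}_i^*\right)$. The posterior for $\Sigma^{-1}$
conditional on $\tilde{y}^*$ and $\beta$ is $\Sigma^{-1} \mid \tilde{y}^*, \beta \sim W(\bar{v}, \bar{V})$, where $\bar{v} = N + \underline{v}$ and
$\bar{V} = \left[\underline{V}^{-1} + \sum_{i=1}^{N} (\tilde{y}_i^* - X_i\beta)(\tilde{y}_i^* - X_i\beta)'\right]^{-1}$.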
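The two conditional draws of this Gibbs step can be sketched as follows. This is an illustration under stated assumptions rather than the original code: the function names and array shapes are hypothetical, NumPy is assumed, and the Wishart draw uses the sum-of-outer-products construction, which requires integer degrees of freedom.

```python
import numpy as np

def draw_beta(y_t, X, Sigma_inv, beta0, Omega0_inv, rng):
    """Draw beta | y~*, Sigma^{-1} from its normal conditional posterior.

    Assumed shapes: y_t (N, M) holds Gamma y_i*; X (N, M, K).
    """
    XtS = np.einsum("imk,mn->ikn", X, Sigma_inv)            # X_i' Sigma^{-1}
    prec = Omega0_inv + np.einsum("ikn,inl->kl", XtS, X)     # posterior precision
    cov = np.linalg.inv(prec)
    cov = 0.5 * (cov + cov.T)                                # guard against round-off asymmetry
    mean = cov @ (Omega0_inv @ beta0 + np.einsum("ikn,in->k", XtS, y_t))
    return rng.multivariate_normal(mean, cov)

def draw_sigma_inv(y_t, X, beta, v0, V0_inv, rng):
    """Draw Sigma^{-1} | y~*, beta from W(v0 + N, [V0^{-1} + sum_i e_i e_i']^{-1})."""
    resid = y_t - X @ beta                                   # (N, M) residuals
    V_bar = np.linalg.inv(V0_inv + resid.T @ resid)
    V_bar = 0.5 * (V_bar + V_bar.T)
    dof = v0 + y_t.shape[0]                                  # must be an integer here
    # Wishart(dof, V_bar) as a sum of dof outer products of N(0, V_bar) vectors
    z = rng.multivariate_normal(np.zeros(V_bar.shape[0]), V_bar, size=dof)
    return z.T @ z
```

Alternating the two functions, each time passing in the latest draw of the other block, gives one sweep of the sampler for Equation B5.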
4. Data Augmentation
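Now, we address the censoring issue. After getting all parameters ($\Gamma$, $\beta$ and $\Sigma$), we can
impute $y_i^*$ in the following way. (1) If all elements in $y_i$ are positive, there is no need to impute.
(2) If all elements in $y_i$ are zero, we draw $y_i^*$ from the multivariate truncated normal
distribution, $MVTN_{(-\infty,0)}(\Gamma^{-1}X_i\beta,\ \tilde{\Sigma})$, where $\tilde{\Sigma} = \Gamma^{-1}\Sigma\Gamma^{-1\prime}$. (3) If some elements of $y_i$ are zero,
we draw the latent values from a conditional multivariate truncated normal distribution.
Denote the zero $y_{im}$'s as a vector $y_u$ (unknown $y_{im}^*$'s) and the non-zero $y_{im}$'s as a
vector $y_k$ (known $y_{im}^*$'s). Then, we impute $y_u^*$ from
$MVTN_{(-\infty,0)}(\mu_{u|k},\ \tilde{\Sigma}_{u|k})$,
where $\mu_{u|k} = (\Gamma^{-1}X_i\beta)_u + \tilde{\Sigma}_{uk}\tilde{\Sigma}_{kk}^{-1}\left(y_k - (\Gamma^{-1}X_i\beta)_k\right)$ and $\tilde{\Sigma}_{u|k} = \tilde{\Sigma}_{uu} - \tilde{\Sigma}_{uk}\tilde{\Sigma}_{kk}^{-1}\tilde{\Sigma}_{uk}'$. Note that
$(\Gamma^{-1}X_i\beta)_u$ is a vector of elements of $\Gamma^{-1}X_i\beta$ that correspond to the unknown $y_{im}^*$'s, while
$(\Gamma^{-1}X_i\beta)_k$ is a vector of elements of $\Gamma^{-1}X_i\beta$ that correspond to the known $y_{im}^*$'s. Similarly,
$\tilde{\Sigma}_{uu}$ is a covariance matrix between the unknown $y_{im}^*$'s, while $\tilde{\Sigma}_{kk}$ is a covariance matrix between
the known $y_{im}^*$'s. In addition, $\tilde{\Sigma}_{uk}$ is a covariance matrix between the unknown $y_{im}^*$'s and the known $y_{im}^*$'s.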
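To make case (3) concrete, the sketch below computes the conditional mean and covariance of the unknown block and then draws one censored element from a normal distribution truncated to non-positive values. The function names and the inverse-CDF draw are illustrative assumptions; with several censored elements, one common approach (not necessarily the one used here) is to cycle through them one at a time with the same univariate draw.

```python
import numpy as np
from statistics import NormalDist

def conditional_moments(mu, cov, known_idx, unknown_idx, y_known):
    """Mean and covariance of the unknown block of a normal vector given the known block."""
    S_uk = cov[np.ix_(unknown_idx, known_idx)]
    S_kk = cov[np.ix_(known_idx, known_idx)]
    S_uu = cov[np.ix_(unknown_idx, unknown_idx)]
    w = S_uk @ np.linalg.inv(S_kk)
    mu_c = mu[unknown_idx] + w @ (y_known - mu[known_idx])
    cov_c = S_uu - w @ S_uk.T
    return mu_c, cov_c

def draw_negative_normal(mean, sd, rng):
    """One draw from N(mean, sd^2) truncated to (-inf, 0], by inverting the CDF."""
    std = NormalDist()
    upper = std.cdf((0.0 - mean) / sd)         # probability mass below zero
    u = rng.uniform(0.0, upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep inv_cdf strictly inside (0, 1)
    return min(mean + sd * std.inv_cdf(u), 0.0)
```

Here `mu` would be $\Gamma^{-1}X_i\beta$ and `cov` would be $\tilde{\Sigma}$ for the observation being imputed.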
5. Prior Distribution
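We use diffuse settings for the priors on parameters as follows. Coefficients of
endogenous variables: $\tilde{\gamma} \sim MVN(0_{\tilde{K}},\ 10^{3} I_{\tilde{K}})$. Coefficients of exogenous variables:
$\beta \sim MVN(0_{K},\ 10^{3} I_{K})$. Variance-covariance matrix: $\Sigma^{-1} \sim W(\underline{v}, \underline{V})$, where $\underline{v} = M + 3$ and
$\underline{V} = (1/\underline{v}) I_M$. Variance of the random walk chain: $\tilde{c} = 10^{-5}$, which makes $\Sigma_z = 10^{-5} I_{\tilde{K}}$.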
6. The latent class membership
The steps for estimating the latent class model are to determine the class of each observation in
each iteration and then run the estimation of Equation 2 within the given class. Given a latent
segment $s$, Equation 2 for consumer $i$ belonging to segment $s$ can be expressed as
$\Gamma_s y_{is}^* = X_{is}\beta_s + \varepsilon_{is}$, where $\varepsilon_{is} \sim N(0, \Sigma_s)$, and the likelihood function given other parameters is
$f(y_i^* \mid p, e_i, G, B, V) = \sum_{s=1}^{S} e_{is}\, f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)$,
where $e_i = (e_{i1}, \dots, e_{iS})'$ and $e_{is} = 1$ if consumer $i$ belongs to segment $s$ and $e_{is} = 0$ if not. In
addition, $G = (\Gamma_1, \dots, \Gamma_S)'$, $B = (\beta_1, \dots, \beta_S)'$ and $V = (\Sigma_1, \dots, \Sigma_S)'$.
$p$ is a vector of the probabilities that a consumer belongs to each class $s$ in the mixture.
That is, $p = (p_1, \dots, p_S)'$ and $p_s = P(e_{is} = 1)$. For the prior of $p$, we set up a Dirichlet distribution
$p \sim D(\underline{\alpha})$, where $\underline{\alpha} = 1_S$ and $1_S$ is an $S$-vector of ones. As there is an identification problem in
the mixture model, we impose a labeling restriction by drawing $p$ from an ordered Dirichlet,
which orders the consecutive class probabilities $p_{s-1}$ and $p_s$ for $s = 2, \dots, S$. The posterior distribution of $p$ is
$p \sim D(\bar{\alpha})$, where $\bar{\alpha} = \underline{\alpha} + \sum_{i=1}^{N} e_i$.
We draw $e_i$ from the multinomial distribution, $e_i \sim M(1, p)$. The posterior distribution of $e_i$ is
expressed as
$e_i \sim M\left(1,\ \left[\dfrac{p_1 f(y_i^* \mid \Gamma_1, \beta_1, \Sigma_1)}{\sum_{s=1}^{S} p_s f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)},\ \dots,\ \dfrac{p_S f(y_i^* \mid \Gamma_S, \beta_S, \Sigma_S)}{\sum_{s=1}^{S} p_s f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)}\right]\right)$.
After the membership $s$ is determined, we select the observations belonging to class $s$ (i.e.,
$y_{is}^*$ and $X_{is}$), sequentially draw the other parameters within the given class $s$, and repeat the steps
through the last class.
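The membership step can be illustrated with a short sketch. It assumes the class-specific log densities $\log f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)$ have already been computed; the function names are hypothetical, and the Dirichlet draw below does not impose the ordering restriction described above.

```python
import math
import random

def membership_probs(p, log_dens):
    """Posterior class probabilities p_s f_s / sum_s p_s f_s, computed from log densities."""
    logs = [math.log(ps) + ld for ps, ld in zip(p, log_dens)]
    top = max(logs)                                  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

def draw_membership(probs, rng):
    """Draw the indicator vector e_i ~ M(1, probs)."""
    s = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1 if j == s else 0 for j in range(len(probs))]

def draw_class_shares(alpha, memberships, rng):
    """Draw p ~ D(alpha + sum_i e_i) using normalized gamma variates (no ordering imposed)."""
    counts = [sum(e[s] for e in memberships) for s in range(len(alpha))]
    g = [rng.gammavariate(a + n, 1.0) for a, n in zip(alpha, counts)]
    total = sum(g)
    return [v / total for v in g]
```

For example, with equal class shares and class densities of 0.2 and 0.6, `membership_probs` returns probabilities of 0.25 and 0.75.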
C. Monte Carlo Integration
We calculate expected expenditures at competing banks given that the focal bank observes
consumers’ expenditures at the focal bank itself. To do this, first, we need to consider consumers’ usage
patterns at the focal bank (i.e., $y_{21}$ and $y_{22}$) such that (1) they use both categories, (2) they use
one of the categories, and (3) they use neither category. Second, we need to convert the latent
utility from expenditures (i.e., $E(y_{11}^* \mid y_{21}, y_{22})$ and $E(y_{12}^* \mid y_{21}, y_{22})$) to expected consumer
spending (i.e., $E(y_{11} \mid y_{21}, y_{22})$ and $E(y_{12} \mid y_{21}, y_{22})$). We extend the expectation logic of a
univariate Tobit model to multivariate and conditional expectation. The distribution of the latent
utility is
(C1) $y_i^* \sim MVN(\Gamma^{-1}X_i\beta,\ \Gamma^{-1}\Sigma\Gamma^{-1\prime})$
The expected expenditures in two categories at competitors are calculated as follows.
(C2)
$$
\begin{aligned}
E[(y_{11}, y_{12})] &= \sum_{k=1}^{4} P[(y_{11}, y_{12}) \in R_k]\; E[(y_{11}, y_{12}) \mid (y_{11}, y_{12}) \in R_k] \\
&= \sum_{k=1}^{4} P[(y_{11}^*, y_{12}^*) \in R_k^*]\; E[(y_{11}^*, y_{12}^*) \mid (y_{11}^*, y_{12}^*) \in R_k^*],
\end{aligned}
$$
where $R_k$ means a possible range in which actual expenditures at competing banks lie. Specifically,
each range is defined as $R_1: (y_{11} = 0, y_{12} = 0)$, $R_2: (y_{11} = 0, y_{12} = r_2)$, $R_3: (y_{11} = r_1, y_{12} = 0)$, and
$R_4: (y_{11} = r_1, y_{12} = r_2)$, where $r_1 > 0$ and $r_2 > 0$. $R_k^*$ means a possible range in which the latent utility
from expenditures at competing banks lies. Corresponding to $R_k$, each $R_k^*$ is defined as follows:
$R_1^*: (y_{11}^* \le 0, y_{12}^* \le 0)$, $R_2^*: (y_{11}^* \le 0, y_{12}^* > 0)$, $R_3^*: (y_{11}^* > 0, y_{12}^* \le 0)$, and $R_4^*: (y_{11}^* > 0, y_{12}^* > 0)$.
Given that expenditures at the focal bank are known, we can calculate expected
expenditures at competing banks conditional on expenditures at the focal bank.
(C3)
$$
\begin{aligned}
E[(y_{11}, y_{12}) \mid (y_{21}, y_{22}) \in S_l]
&= E[(y_{11}, y_{12}) \mid (y_{21}^*, y_{22}^*) \in S_l^*] \\
&= \sum_{k=1}^{4} P[(y_{11}^*, y_{12}^*) \in R_k^* \mid (y_{21}^*, y_{22}^*) \in S_l^*]\; E[(y_{11}^*, y_{12}^*) \mid (y_{11}^*, y_{12}^*) \in R_k^* \text{ and } (y_{21}^*, y_{22}^*) \in S_l^*],
\end{aligned}
$$
where $S_l$ means observed expenditures at the focal bank and is one of $S_1: (y_{21} = 0, y_{22} = 0)$,
$S_2: (y_{21} = 0, y_{22} = s_2)$, $S_3: (y_{21} = s_1, y_{22} = 0)$, or $S_4: (y_{21} = s_1, y_{22} = s_2)$, where $s_1 > 0$ and
$s_2 > 0$. $S_l^*$ means a possible range in which the latent utility from expenditures at the focal bank
lies. Corresponding to $S_l$, each $S_l^*$ is defined as follows: $S_1^*: (y_{21}^* \le 0, y_{22}^* \le 0)$,
$S_2^*: (y_{21}^* \le 0, y_{22}^* = s_2)$, $S_3^*: (y_{21}^* = s_1, y_{22}^* \le 0)$, and $S_4^*: (y_{21}^* = s_1, y_{22}^* = s_2)$.
As it is difficult to calculate Equation C3 analytically, we use Monte Carlo integration
and describe the steps as follows.
Step 1. Randomly draw $(y_{11}^*, y_{12}^*)$ $N$ times. In the case of $S_1: (y_{21} = 0, y_{22} = 0)$, we draw
from the multivariate normal distribution in Equation C1. In other cases, we draw from a
multivariate normal distribution conditional on the positive values of $y_{21}$ and $y_{22}$.
Step 2. For the probability part in Equation C3, calculate the ratio of the number of draws with
$(y_{11}^*, y_{12}^*) \in R_k^*$ and $(y_{21}^*, y_{22}^*) \in S_l^*$ to the number of draws with $(y_{21}^*, y_{22}^*) \in S_l^*$.
Step 3. For the expectation part, calculate the average of the draws $(y_{11}^*, y_{12}^*)$ that belong to
$R_k^*$ given $S_l^*$.
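The three steps can be sketched for the case in which both focal-bank expenditures are positive ($S_4$), so that the competitor latent values are drawn from a conditional normal distribution. Everything named below is hypothetical, NumPy is assumed, and the element ordering $(y_{11}^*, y_{12}^*, y_{21}^*, y_{22}^*)$ is an assumption. The code also adopts one reading of the conversion from latent utility to spending, namely that a non-positive latent value counts as zero spending within its region.

```python
import numpy as np

def draw_competitor_latent(mu, cov, focal, n, rng):
    """Step 1 for case S_4: draw (y11*, y12*) given observed positive (y21, y22).

    mu and cov are ordered as (y11*, y12*, y21*, y22*).
    """
    w = cov[:2, 2:] @ np.linalg.inv(cov[2:, 2:])
    mu_c = mu[:2] + w @ (focal - mu[2:])
    cov_c = cov[:2, :2] - w @ cov[2:, :2]
    cov_c = 0.5 * (cov_c + cov_c.T)          # guard against round-off asymmetry
    return rng.multivariate_normal(mu_c, cov_c, size=n)

def expected_spending(draws):
    """Steps 2 and 3: region shares and within-region averages, combined as in Equation C3.

    Assumption: non-positive latent values are counted as zero spending.
    """
    pos = draws > 0
    regions = {
        "R1": ~pos[:, 0] & ~pos[:, 1],
        "R2": ~pos[:, 0] & pos[:, 1],
        "R3": pos[:, 0] & ~pos[:, 1],
        "R4": pos[:, 0] & pos[:, 1],
    }
    spending = np.where(pos, draws, 0.0)
    total = np.zeros(2)
    probs = {}
    for name, mask in regions.items():
        probs[name] = mask.mean()            # Step 2: share of draws falling in R_k*
        if mask.any():
            total += probs[name] * spending[mask].mean(axis=0)   # Step 3
    return probs, total
```

Because the draws are already conditioned on the focal-bank pattern, the ratio in Step 2 reduces to the share of draws that fall in each region.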
REFERENCES
Amemiya, Takeshi (1974), “Multivariate Regression and Simultaneous Equation Models when the Dependent Variables Are Truncated Normal,” Econometrica, 42 (6), 999-1012.
Amemiya, Takeshi, Makoto Saito, and Keiko Shimono (1993), “A Study of Household Investment Patterns in Japan: An Application of Generalized Tobit Model,” The Economic Studies Quarterly, 44 (1), 13-28.
Baumann, Chris, Suzan Burton, and Greg Elliott (2005), “Determinants of Customer Loyalty and Share of Wallet in Retail Banking,” Journal of Financial Services Marketing, 9 (3), 231-48.
Bowman, Douglas and Das Narayandas (2004), “Linking Customer Management Effort to Customer Profitability in Business Markets,” Journal of Marketing Research, 41 (November), 433-47.
Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Carlin, Bradley P. and Thomas A. Louis (2000), Bayes and Empirical Bayes Methods for Data Analysis. Boca Raton: Chapman & Hall.
Chen, Yuxin and Joel H. Steckel (2005), “Modeling Credit Card 'Share of Wallet': Solving the Incomplete Information Problem,” working paper, New York University, NY.
Cooil, Bruce, Timothy L. Keiningham, Lerzan Aksoy, and Michael Hsu (2007), “A Longitudinal Analysis of Customer Satisfaction and Share of Wallet: Investigating the Moderating Effect of Customer Characteristics,” Journal of Marketing, 71 (January), 67-83.
Du, Rex Yuxing, Wagner A. Kamakura, and Carl F. Mela (2007), “Size and Share of Customer Wallet,” Journal of Marketing, 71 (April), 94-113.
Iyengar, Raghuram, Asim Ansari, and Sunil Gupta (2003), “Leveraging Information Across Categories,” Quantitative Marketing and Economics, 1 (4), 425-65.
Kamakura, Wagner, Michel Wedel, Fernando de Rosa, and Jose Afonso Mazzon (2003), “Cross-Selling through Database Marketing: A Mixed Data Factor Analyzer for Data Augmentation and Prediction,” International Journal of Research in Marketing, 20 (1), 45-65.
Koop, Gary (2003), Bayesian Econometrics. Hoboken, NJ: Wiley.
Li, Kai (1998), “Bayesian Inference in a Simultaneous Equation Model with Limited Dependent Variables,” Journal of Econometrics, 85 (2), 387-400.
Li, Shibo, Baohong Sun, and Ronald T. Wilcox (2005), “Cross-Selling Sequentially Ordered Products: An Application to Consumer Banking Services,” Journal of Marketing Research, 42 (May), 233-39.
Maddala, G. S. (1986), Limited-Dependent and Qualitative Variables in Econometrics. New York, NY: Cambridge University Press.
Malthouse, Edward C. and Paul Wang (1998), “Database Segmentation Using Share of Customer,” Journal of Database Marketing, 6 (3), 239-52.
Murphy, K. M. and R. H. Topel (1985), “Estimation and Inference in Two-Step Econometric Models,” Journal of Business and Economic Statistics, 20 (1), 88-97.
Ransom, Michael R. (1987), “A Comment on Consumer Demand Systems with Binding Non-negativity Constraints,” Journal of Econometrics, 34, 355-59.
Reinartz, Werner J. and V. Kumar (2003), “The Impact of Customer Relationship Characteristics on Profitable Lifetime Duration,” Journal of Marketing, 67 (January), 77-99.
Reinartz, Werner J., Jacquelyn S. Thomas, and V. Kumar (2005), “Balancing Acquisition and Retention Resources to Maximize Customer Profitability,” Journal of Marketing, 69 (January), 63-79.
Verhoef, Peter C. (2003), “Understanding the Effect of Customer Relationship Management Efforts on Customer Retention and Customer Share Development,” Journal of Marketing, 67 (October), 30-45.
Yang, Sha, Vishal Narayan, and Henry Assael (2006), “Estimating the Interdependence of Television Program Viewership Between Spouses: A Bayesian Simultaneous Equation Model,” Marketing Science, 25 (4), 336-49.
Zheng, Zhiqiang, Peter S. Fader, and Balaji Padmanabhan (2009), “Inferring Competitive Measures Using Augmented Site-Centric Data,” working paper, University of Texas at Dallas, TX.
CHAPTER 2
SEARCH PATTERNS, SEARCH-BASED SEGMENTATION AND SEARCH RESULTS
OF AUTOMOBILE PURCHASERS
Sungha Jang
School of Management, Department of Marketing, SM32
The University of Texas at Dallas
800 West Campbell Road
Richardson, Texas 75080-3021
ABSTRACT
Consumers often search several information sources when making purchase decisions. In this
paper, we study how time spent searching one source is interrelated with time spent searching
other sources using data on new automobile purchases. In this category, information sources
include different offline sources, Internet websites, spouse and internal search. We build a
structural model assuming that consumers allocate their search time across several information
sources to maximize utility and estimate the relationships among information sources. Then, we
segment consumers based on their search preferences and examine brand choices and price-
related results. We find that consumers use information sources in a complementary manner and
that the dealer is still a prominent source. We also find that at the segment level, except for two
segments, an inverted U-shaped relationship exists between internal and external search. Brand
choice analysis reveals that low external search is associated with a choice of American brands.
Finally, segments achieve different price negotiation times and discounts but display similar
satisfaction with price paid. Based on these results, we provide recommendations to automakers.
Keywords: Search; Segmentation; Structural model; Simultaneous equations Tobit model;
Brand choice.
INTRODUCTION
When consumers search for information to make purchase decisions, they often use several
information sources (Ratchford et al. 2003, 2007). For example, when purchasing an automobile,
consumers may do internal search based on their past experiences and satisfaction and do
external search by talking with acquaintances, reading independent reviews or going to
manufacturer and dealer sources either on the Internet or offline. If married, they will probably
also ask their spouses. Search can affect consumers’ brand choices and their ability to negotiate
better prices (Ratchford et al. 2003). Therefore, it is necessary to understand consumers’ search
patterns, identify different segments of searchers, and examine their search results. This
understanding can help to budget the billions of advertising and promotional dollars spent by
automobile manufacturers and dealers more effectively.
While a number of research papers deal with search across multiple information sources,
this paper addresses some issues that have received less attention. Consider, for example, the
following two observations from automobile search:
First, despite the proliferation of information sources and an average sticker price of
about $27,000, half the consumers in our dataset reported an external search time of less than
eight hours. A possible explanation for this may be that these consumers rely less on external
sources because they possess information from internal search. Therefore, it would be useful to
see how consumers allocate their time across the different external and internal information sources.
Our proposed structural model allows us to consider the relationship between the entire set of
information sources including internal, offline, the Internet and spouses, unlike existing studies.
The model builds on work that studied the relationship between internal search, or product
knowledge, and external search but with fewer sources and limited interrelationships (e.g., Rao
and Sieben 1992; John, Scott and Bettman 1986). Similarly, our proposed model extends the
studies by Ratchford et al. (2003) and Klein and Ford (2003) that examined the impact of the
Internet on offline sources only, by looking at the reverse impact as well. Thus, this paper
contributes to extending the literature on search by examining interrelationships of all
information sources that consumers use.
Second, we observe that 40% of the consumers in the dataset did not use the Internet for
search even in 2005, although the Internet was widely available by then (i.e., 93% of respondents
had some access to the Internet) and studies by Ratchford et al. (2003, 2007) have found that in
automobile purchases, Internet search has substituted significantly for traditional sources. A
possible explanation is that different individuals search differently, with some segments having
lower search preference for the Internet than others. This possibility could be studied by a
segmentation analysis based on search preference. This paper examines search-based
segmentation of automobile buyers. The prevailing benchmark is Furse, Punj, and Stewart
(1984), who classified buyers into six segments, but prior to the emergence of the Internet as an
information source. It will be relevant to compare our segmentation results against Furse et al.
(1984) and see what changes have occurred. On the topic of segments, while the effects of search
on price-related outcomes such as the price discount have been examined (e.g., Viswanathan,
Kuruzovich, Gosain, and Agarwal 2007), brand choice for search-based segments has been
largely unexplored; we undertake it in this paper. Therefore, the search-based segmentation
contributes to the literature by comparing similarities and differences between segments before
and after the Internet. In addition, the relationship between search-based segments and brands
shows why firms need different approaches to providing their consumers with the necessary
information.
We next provide a brief overview of the model and results of this paper. The information
sources in our dataset are listed in Table 2.1.
Table 2.1. Information Sources in the Automobile Purchases Category

                                  Offline sources             Internet sources
Internal Search
External Search   Personal        Friends / Relative          Online Chat sites
                  Independent     Consumer Report             Third Party websites, Independent websites
                  Manufacturer    Brochures, Advertisement    Manufacturer websites
                  Dealer          Dealer, Showroom            Dealer websites
                  Experiential    Test-driving                N/A
Spousal Search

We assume that consumers have limited discretionary time in which to search for information
and do outside activities. They further allocate their search time among different external sources
based on their preferences to maximize their utility. Due to the limited search time, an increase in
the search time on a source can lead to an increase or decrease in the search time on other
sources. These changes may depend on consumer characteristics and product attributes. We build
a structural model and estimate the impact of each information source and consumers’
characteristics on other information sources. Figure 2.1 depicts a conceptual view of these
relationships. In the model, positive search time is the revealed preference, whereas in case of
zero search, we estimate the latent preferences for the external information sources by data
augmentation.
Next, we segment consumers based on their estimated search preferences for information
sources. We compare and contrast the segments by their brand choices as well as their price-
related outcomes (e.g., negotiation time, discount and final price satisfaction). The relationship
between segments and brand choice is insightful for developing the proper communication
strategies for automobile makers.
Figure 2.1. Conceptual Model
Exogenous variables: internal search, internal search squared, spouse search, spouse search × male, year 2003, year 2005, male, age, education years, hourly wage, log of sticker price.
Endogenous variables: personal, independent (I*), manufacturer (I*), dealer (I*), independent, manufacturer, dealer, test-driving.
*: I represents the Internet source.
Our main results are as follows: First, we find that the interrelationships between
preferences for information sources in automobile purchases are generally positive. However,
consumers who prefer to spend more time with the dealer spend less time on other information
sources including Internet sources. Another finding is that Internet search is generally associated
with increased offline search except for search on manufacturer websites, which substitutes for
search in offline independent, manufacturer, and dealer sources. In addition, the Internet
sources are complementary to each other. We also find that internal search
reduces external search, perhaps explaining why some customers have short search times, but we
do not find an overall inverted U-shaped relationship between internal search and each external
information source in this part of the analysis. Finally, the buyer’s external search increases with
spouse search.
Second, we identify nine segments of automobile buyers using the individual search
preference for each source. Segment-wise, the relationship between internal and external search
is, in general, inverted U-shaped, indicating that consumers with low or high internal search do
less external search than those with moderate internal search. But two segments do not exhibit
this pattern. In one there is low internal search but high external search, while the other has
moderate internal search but very high external search. In addition, when we overlay our
segments on those of Furse et al. (1984), several differences emerge, which may be expected
given that our segmentation is influenced by Internet usage. For example, the proportion of
searchers relying on acquaintances has decreased while the proportion of independent searchers
has increased due to easy access to information via the Internet.
Third, we examine the brand choices of the segments. Overall, the low external search
segments tend to choose American brands, while high external search segments, which use the
Internet, tend to choose Japanese or European brands. This result indicates that manufacturers
need to consider different advertising strategies (e.g., enhancing the brand loyalty vs. providing
more information) depending on their customers’ search patterns. For the price-related results,
we find that the different segments have different negotiation times, discount amounts, and
discount rates but the same satisfaction levels with the price paid.
The rest of this paper is organized as follows: In the next section, we discuss the relevant
literature and our contribution. After this, we introduce the search time allocation model and the
estimation methodology. We then describe the dataset and present empirical results. Finally, we
provide conclusions, managerial implications, and directions for future research.
LITERATURE REVIEW
This study is related to four research streams: search time allocation, relationships of information
sources, segmentation of automobile buyers, and the outcomes of search. Table 2.2 describes the
positioning of this study in comparison to other closely related studies on automobile purchases.
Regarding, first, search time allocation, Hauser, Urban, and Weinberg (1993) propose an
allocation model of search time given a budget constraint. They calculate the value of positive
and negative information on four different sources, viz., showroom, interview, articles, and
advertisements. They find that the values of sources are different in the search order, i.e.,
showroom and interviews had higher values than other sources. Differences from the current
research are that their study was conducted in a laboratory setting, with fewer information
sources and with limited search time, and that the interrelationship between different sources was
not tested. Ratchford et al. (2003) propose a utility maximization model in which consumers
decide on the total search time and allocate it between different sources by taking into account
the gains and losses from search. Their focus is on the impact of the Internet on the share of
offline sources in searching. Our paper extends their examination of the effect of the Internet by
differentiating between search on different types of Internet websites (i.e., independent,
manufacturer, and dealer) and for different segments of consumers. These different websites
have different characteristics. That is, independent websites (e.g., Consumer Reports) cover
many brands and have prices and users’ opinions. Manufacturer websites provide detailed
information on their own models, and dealer websites provide selling prices and transaction
Table 2.2. Comparison of Studies Related to Automobile Purchases
Studies compared: Furse et al. (1984); Hauser et al. (1993); Ratchford et al. (2003); Zettelmeyer et al. (2006); This Study (2010)
1. Purpose
Furse et al. (1984): Classify the buyers into segments from the perspective of the buyers and dealers
Hauser et al. (1993): Find a benefit/cost model of how consumers allocate time and of the value of sources
Ratchford et al. (2003): Study the effects of Internet search on the offline sources and total time
Zettelmeyer et al. (2006): Study how the Internet reduces the price
This Study (2010): Study search patterns, segments, and the search results
2. Data
Furse et al. (1984): Survey of recent buyers
Hauser et al. (1993): Experiment
Ratchford et al. (2003): Survey of recent buyers
Zettelmeyer et al. (2006): Survey of recent buyers and real transaction data
This Study (2010): Survey of recent buyers
3. Model Cluster analysis Allocation of time on source j:
Tttts
tvtv
J
jj
J
jjj
10
100
..
)()(max
))/exp(1()( jjjjj ttv
v: value t: search time
Allocation of time on source j:
jj
jjj
tw
taSg
MaxB
))]ln((exp[1
g: search gain S: prior information t: search time
Regression:
iiii
i
SDXprice )ln(
X: transaction data D: demographics S: survey response
Allocation of time on source j (y vector):
2/'')( AyyybyUy: a vector of time spent on source j’s
4. Variables a. Personal b. Independent c. Manufacturer d. Dealer e. Experiential f. Indep. (I*) g. Manuf. (I) h. Dealer (I) i. Internal j. Spouse
√ √ √ √ √
√ (indirect) √
√ √ √
√ √ √ √ √
√ (aggregated) √ (aggregated) √ (aggregated) √ (indirect)
√ √ √ √ √ √ √
√ √ √ √ √ √ √ √ √ √
5. Relationship among search sources
√ (share of j) √
6. Heterogeneity in searchers √ (clusters) √
7. Search results a. Brand b. Price
√ (Partial)
√
√
√ √
*: I represents the Internet source.
services. Therefore, different websites potentially have different effects on consumer search
behavior. Ratchford et al. (2007) extend the model in Ratchford et al. (2003), allowing more
general assumptions, but do not distinguish different types of websites. We also examine the
interrelationships between information sources, which are not fully dealt with in these papers and
fall in the next research stream.
The second research stream deals with interrelationships between offline search, Internet
search, internal search, and spouse search. The impact of the Internet on the offline sources has
been investigated. For example, Ratchford et al. (2003, 2007) find that Internet search
moderately reduces the share of third-party print and severely reduces the share of the dealer.
Similarly, Klein and Ford (2003) find that active shoppers who were planning to purchase a car
within the next six months in 2000 increased the share of Internet sources but decreased the
share of dealer visits compared to buyers who had already bought in the previous year, 1999. In
contrast, there has been little study of the effects of offline sources on Internet search. In our data,
even in 2005, almost half the consumers did not search the Internet when purchasing an
automobile. Possibly those consumers rely on offline sources. Therefore, we examine how
offline sources affect Internet search, which is absent in existing automobile studies.
There are several studies on the relationship between internal search, measured as
knowledge and experience, and external search (e.g., Guo 2001), but the theory is still unsettled.
Moorthy, Ratchford, and Talukdar (1997) find that the amount of search follows an inverted
U-shape in experience, which means that external search is low for consumers with low or high
knowledge, and high for those with moderate knowledge. They explain that consumers with low
experience are not able to make fine distinctions between alternatives and therefore have little
incentive to search, and consumers with high experience have relatively little uncertainty about
alternatives and thus do not externally search a lot, while consumers with an intermediate
experience have partially differentiated brand perceptions and hence a greater incentive to
search. In contrast, Rao and Sieben (1992) find a U-shaped relation between prior product
knowledge and search amount. Some researchers report a positive relationship between
subjective knowledge and search (e.g., Srinivasan and Ratchford 1991) while others find a
negative relationship between product familiarity and search (e.g., Russo and Leclerc 1994).
We test for the inverted U-shaped relationship in two ways, first by looking at the significance of
the quadratic form of internal search in parameter estimation and second by looking at external
search time by the segments sorted by the degree of internal search. Segment level analysis helps
to resolve some of the above inconsistencies. Separately, we also examine search time with the
spouse. Yang, Narayan, and Assael (2006) find that spouses have a positive impact on their
partners’ viewership of TV programs. Because the husband and wife are likely to visit dealers
and perform other search activities together, one would expect to find a positive relationship
between spouses in automobile purchases.
The third stream of interest is the segmentation of automobile buyers. Furse et al. (1984)
identified six segments from the perspective of buyers and dealers. However, segmentation of
automobile buyers needs to be revisited because many consumers currently search on the
Internet, which was unavailable at the time of Furse et al.’s study. In addition, their paper did not
focus on the brand choice or price-related outcomes for different search-based segments.
The fourth literature stream deals with the results of search. Much of it focuses on the
effects of search on price discounts. For example, Zettelmeyer et al. (2006) examine how search
on the different types of websites reduces the price paid. Viswanathan et al. (2007) find that price
information from Internet search reduces the paid price but that product information from
Internet search increases it. However, it is also informative to understand the relationship
between search and brand choices, as we shall show.
To conclude, our study extends the existing literature by investigating the
interrelationship of a large number of information sources used in automobile search, segmenting
consumers on the basis of their search preference across multiple sources, which includes
Internet sources, and examining the results of search on brand choices as well as price-related
outcomes, by segment.
MODEL AND METHODOLOGY
As shown in Table 2.1, information sources used by automobile buyers can be grouped into:
internal search, various external search sources, and spouse. We will assume that internal search
and spouse search are less time-constrained for buyers than external search. Therefore,
consumers maximize their utility by allocating a limited time across external sources, with these
allocations possibly influenced by internal search, spouse search, consumer demographics, and
product attributes.
We set up the utility maximization problem as follows: Consumers allocate
$y_j$, $j = 1, \dots, J$, of their time to search in each of $J$ external information sources, and time $y_0$ on
other activities including work and leisure. They maximize utility by allocating a total time of $T$.
This is expressed by
(1) $\max_{y_0, y_1, y_2, \dots, y_J} U(y_0, y_1, y_2, \dots, y_J)$ s.t. $y_0 + \sum_{j=1}^{J} y_j = T$.
As in previous studies, the functional form for utility is selected such that the marginal gains
from search are diminishing (e.g., Hauser et al. 1993 and Ratchford et al. 2003). We use a
quadratic utility function, which has this property, specifically,
$U(y) = b'y + y'Ay/2$,
where $y$ is the vector of search times and other activities, $y = (y_0, y_1, y_2, \dots, y_J)'$, $b$ is a $(J+1)$-
dimensional vector, and $A$ is a $(J+1) \times (J+1)$ negative definite matrix. Here, $b$ and $A$ can be
interpreted as proportional to the mean and the variance-covariance matrix of the unit return to
search time and other activities (Amemiya, Saito and Shimono 1993).
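A small numerical sketch shows how a negative definite $A$ produces diminishing marginal gains. The values of `b` and `A` are hypothetical, and the sign convention $U(y) = b'y + y'Ay/2$ is an assumption of this sketch.

```python
def utility(y, b, A):
    """Quadratic utility U(y) = b'y + y'Ay/2 for plain Python lists."""
    linear = sum(bi * yi for bi, yi in zip(b, y))
    quad = sum(y[i] * A[i][j] * y[j] for i in range(len(y)) for j in range(len(y)))
    return linear + 0.5 * quad

def marginal_utility(y, b, A):
    """Gradient b + Ay, valid when A is symmetric."""
    return [bi + sum(aij * yj for aij, yj in zip(row, y)) for bi, row in zip(b, A)]

b = [4.0, 3.0]                       # hypothetical mean returns for two sources
A = [[-2.0, 0.5], [0.5, -1.0]]       # hypothetical symmetric, negative definite matrix

print(marginal_utility([0.0, 0.0], b, A))   # [4.0, 3.0]
print(marginal_utility([1.0, 0.0], b, A))   # [2.0, 3.5]
```

After one unit of time on the first source, its own marginal gain falls from 4.0 to 2.0, while the positive off-diagonal element raises the marginal gain of the second source, which is the kind of complementary relationship the off-diagonal terms can capture.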
For this type of maximization problem of a quadratic utility and linear constraint,
Ransom (1987) and Amemiya et al. (1993) have proved that the Kuhn-Tucker conditions yield a
simultaneous equations Tobit model, first introduced by Amemiya (1974). That is, for each
consumer $i$, the structural form of a simultaneous equations Tobit model is given by
(2) $\Gamma y_i^* = X_i\beta + \varepsilon_i$,
where the elements of the vector of endogenous variables $y_i^*$ affect each other and are affected by a vector of
exogenous variables $X_i$ and a vector of error terms $\varepsilon_i$. Note that other activities $y_0$ and the total
time $T$ are absorbed into error terms and coefficients of variables during transformation.
The endogenous vector *iy can be defined as the vector of latent search preference for J
information sources. For consumer i and information source j, the relationship between the
search preference and the search time is given by,
000
*
**
ij
ijijij yif
yifyy
where *ijy is consumer i’s search preference for the information source j and ijy is the observed
search time on it. That is, we assume that search time on the information source j is related to the
search preference for the source j. If the search preference for an information source exceeds a
threshold utility, which we scale to zero, the consumer searches that information source and the
observed positive search time is treated as the preference itself. However, if the search
preference for the source is less than the threshold, the consumer does not search that source and
the observed time is zero.
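The censoring rule above can be sketched in a few lines. The latent values below are hypothetical, and the function name `observed_search_time` is introduced only for illustration.

```python
def observed_search_time(latent_preference, threshold=0.0):
    """Map a latent search preference to observed search time.

    Preferences above the threshold (scaled to zero) are observed as is;
    preferences at or below it are observed as zero search time.
    """
    return latent_preference if latent_preference > threshold else 0.0


# Hypothetical latent preferences of one consumer for four sources
latent = [2.4, -0.7, 0.0, 1.1]
observed = [observed_search_time(v) for v in latent]
share_used = sum(1 for t in observed if t > 0) / len(observed)
```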
The matrix $X_i$ contains the vectors of exogenous variables (e.g., demographics and
product attributes) for all the information sources in the following form.
$X_i = \begin{pmatrix} x_{i1}' & 0 & \cdots & 0 \\ 0 & x_{i2}' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{iJ}' \end{pmatrix}$,
where $x_{ij}$ is a $k_j$-vector containing the i-th observation of the vector of explanatory variables in
the j-th equation. Each $x_{ij}$ vector contains variables common to all $J$ equations and a variable
unique to equation j. We use the unique variables as exclusion restrictions to identify the
system.
$\Gamma$ is a $J \times J$ matrix whose diagonal elements are one and off-diagonal elements are
coefficients of other endogenous variables. This matrix expresses the interrelationship of the
endogenous variables. $\beta$ is a $K \times 1$ vector of coefficients of the form $(\beta_1', \ldots, \beta_J')'$, where
$K = \sum_{j=1}^{J} k_j$. The vector $\beta$ shows the impact of the exogenous variables on the endogenous
variables. Finally, $\varepsilon_i$ follows a normal distribution $N(0, \Sigma)$.
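The block-diagonal layout of the exogenous variables can be sketched as follows. The example assumes two equations with three regressors each (an intercept, one common variable, and one unique helpfulness score); all numbers and the helper names `build_X` and `matvec` are hypothetical.

```python
def build_X(x_rows):
    """Stack the per-equation regressor vectors x_ij into a J x K block-diagonal matrix."""
    K = sum(len(x) for x in x_rows)
    X, start = [], 0
    for x in x_rows:
        row = [0.0] * K
        row[start:start + len(x)] = x  # x_ij' occupies its own column block
        X.append(row)
        start += len(x)
    return X


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * w for m, w in zip(row, v)) for row in M]


# Hypothetical consumer: intercept, common variable, unique helpfulness score
x_i = [[1.0, 0.5, 4.0],   # equation 1
       [1.0, 0.5, 2.0]]   # equation 2
beta = [0.2, 1.0, 0.5,    # coefficients of equation 1
        -0.1, 0.8, 0.3]   # coefficients of equation 2
X_i = build_X(x_i)
index = matvec(X_i, beta)  # one linear index per equation
```

Because each row of the matrix is zero outside its own block, the product returns one linear index per equation, using only that equation's coefficients.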
Estimating the model involves simultaneity and censoring issues. Simultaneity occurs
since, due to limited time, change in the allocation of search time to one source leads to changes
in all the others. The censoring issue occurs because consumers do not use all information
sources. For example, in our data, almost half the respondents did not use the Internet. To handle
these issues in estimation, we adopt the method of Jang, Prasad, and Ratchford (2010). This
method, briefly described in the Appendix, is to sequentially draw $\Gamma$, $\beta$, $y_i^*$, and $\Sigma$ given
other parameters using MCMC algorithms.
Next, we classify automobile buyers into segments. To segment, we use the latent search
preference $y_i^*$ for each information source, the internal and spouse search times, and a dummy
variable for whether the consumer used the Internet or not. To handle both continuous variables
and a discrete variable (the Internet usage dummy), a two-step clustering method is used. We
then profile the segments using the variables used in the clustering analysis and adding
demographic and transactional variables.
Finally, we investigate the search results by segment. To find the relationship between
the segments and their brand choices, we use a correspondence analysis and multinomial logit
analysis. Correspondence analysis is an exploratory data analysis technique for the graphical
display of contingency tables (e.g., see Hoffman and Franke 1986). Besides brand choices, we
also look at the price-related outcomes (negotiation time with the dealer, discount amount,
discount rate, and the final price satisfaction) using ANCOVA.
DATA AND ESTIMATION
Data Description
The database contains consumers’ search behavior on automobiles purchased in 2001,
2003, and 2005; the data on purchases in 2001 was used by Ratchford et al. (2003, 2007). The
more recent datasets, which follow the same general format and survey procedure, have not been
used in studies to date. Details of the data collection procedures are given in Ratchford et al.
(2003, 2007). Table 2.3 shows the descriptive statistics of some important variables.
Automobile buyers reported their search time on various information sources and
the helpfulness of each source. They also provided responses about their previous purchase
experience, from which we construct the internal search variable, and about spouse search time.
In addition, they provided information about their new car, such as brand and price. For time-variant
monetary variables such as hourly wage, sticker price, or discount, we took 2005 as the base year
and adjusted the monetary values in 2001 and 2003 by inflation factors of 10% and
6%, respectively.
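A minimal sketch of this adjustment is given below. It assumes that the inflation factors are applied multiplicatively (a 2001 amount is scaled by 1.10 and a 2003 amount by 1.06); the source does not spell out the exact formula, and the names used here are hypothetical.

```python
# Assumed cumulative inflation to the 2005 base year, by survey year
INFLATION_TO_2005 = {2001: 0.10, 2003: 0.06, 2005: 0.00}


def to_2005_dollars(amount, survey_year):
    """Express a monetary value from a given survey year in 2005 dollars."""
    return amount * (1.0 + INFLATION_TO_2005[survey_year])


# Hypothetical sticker prices reported in each survey year
adjusted_2001 = to_2005_dollars(20000.0, 2001)  # about 22,000
adjusted_2003 = to_2005_dollars(20000.0, 2003)  # about 21,200
adjusted_2005 = to_2005_dollars(20000.0, 2005)  # unchanged
```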
To see the interrelationships between search preference for different information sources
and the effects of other exogenous factors on them, we ran the simultaneous equations Tobit
model given by Equation 2. For simplified interpretation, we rearrange the terms as,
Table 2.3. Descriptive Statistics of Major Variables

A) Search Time

                                  Search Time (hour)              Helpfulness*
Information Sources             Mean     SD       User (%)      Mean     SD
Offline    Personal             1.51    (2.52)     69.7%        3.67    (1.98)
           Independent          1.81    (3.20)     63.8%        3.43    (2.05)
           Manufacturer         1.51    (2.40)     75.1%        3.32    (1.71)
           Dealer               2.66    (2.68)    100.0%        4.65    (1.44)
           Test Driving         1.53    (2.85)     88.3%        5.15    (1.90)
Internet   Independent          1.07    (2.96)     36.6%        2.12    (1.74)
           Manufacturer         1.08    (3.06)     45.4%        2.81    (2.18)
           Dealer               0.45    (1.30)     28.2%        2.19    (1.86)
Buyer Total Search             11.61   (11.40)    100.0%
Spouse                          3.94    (6.99)     63.5%

*: Helpfulness is on a 1-7 scale where 7 is very helpful.

B) Demographics

Variables                       Mean (or %)    SD
Age                                46.88      (13.02)
Male (Dummy variable)               0.54       (0.50)
Married (Dummy variable)            0.70       (0.46)
Education Years                    15.40       (2.82)
Hourly Wage ($)                    23.36      (19.36)
Sticker Price ($)               27164.76    (7752.15)
Discount ($)                     2864.34    (2565.89)
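(3) $\begin{pmatrix} y_{i1}^* \\ y_{i2}^* \\ \vdots \\ y_{i8}^* \end{pmatrix} = \begin{pmatrix} Xc_i' \beta_1 \\ Xc_i' \beta_2 \\ \vdots \\ Xc_i' \beta_8 \end{pmatrix} + \begin{pmatrix} \delta_1 \, helpful_{i1} \\ \delta_2 \, helpful_{i2} \\ \vdots \\ \delta_8 \, helpful_{i8} \end{pmatrix} + \begin{pmatrix} 0 & \gamma_{12} & \cdots & \gamma_{18} \\ \gamma_{21} & 0 & \cdots & \gamma_{28} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{81} & \gamma_{82} & \cdots & 0 \end{pmatrix} \begin{pmatrix} y_{i1}^* \\ y_{i2}^* \\ \vdots \\ y_{i8}^* \end{pmatrix} + \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{i8} \end{pmatrix}$.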
Note that the matrix $\Gamma$ in Equation 2, which represents the interrelationship between endogenous
variables, is converted into the $\gamma$-matrix on the right-hand side of Equation 3. Therefore, the $\gamma$’s in
the j-th equation are interpreted as the effects of search preference for other information sources
($y_{i,-j}^*$) on search preference for the j-th information source ($y_{ij}^*$).
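The step from Equation 2 to Equation 3 can be written out as a short derivation. The symbols $\Gamma$ (coefficient matrix of the endogenous variables, with unit diagonal), $\beta$, and $\varepsilon_i$ are notational assumptions made for this sketch.

```latex
\begin{align*}
\Gamma y_i^* &= X_i \beta + \varepsilon_i \\
y_i^* &= X_i \beta + (I - \Gamma)\, y_i^* + \varepsilon_i
\end{align*}
```

Because the diagonal elements of $\Gamma$ are one, the matrix $I - \Gamma$ has a zero diagonal, so the right-hand side of each equation contains only the search preferences for the other sources. Its off-diagonal elements are the negatives of the corresponding elements of $\Gamma$.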
Search preferences for external information sources make up the endogenous variables
($y_{ij}^*$). The sources can be divided into offline and Internet sources. Search preferences for offline
sources include personal (talking to a friend or relative, $y_{i1}^*$), independent (reading magazines
such as Consumer Reports, $y_{i2}^*$), manufacturer (reading brochures, $y_{i3}^*$), dealer (visiting the
dealer, $y_{i4}^*$), and experiential (test-driving, $y_{i5}^*$) sources. Search preferences for Internet sources
include visiting independent websites ($y_{i6}^*$), manufacturer websites ($y_{i7}^*$), and dealer websites
($y_{i8}^*$). Therefore, there are eight endogenous variables corresponding to the search times on eight
external information sources.
For the common exogenous variables, $Xc_i$, we include internal search and spouse search.
We include the square of internal search in order to see whether there is an inverted U-shaped
relationship between internal search and external search. We also consider an interaction effect
between spouse search and gender to see whether men consult less with their spouse than
women. To control for observable consumer heterogeneity, we include age, male, education years,
hourly wage, and log of sticker price. In particular, education years and hourly wage are
significantly related to search patterns, as more educated people may search less because of their
higher search productivity and higher income people may search less due to higher opportunity
costs. To capture unobservable heterogeneity over years, we use year dummies for 2003 and
2005. For exclusion restrictions, we use the helpfulness of each information
source.
As internal search cannot be directly observed, it is operationalized indirectly. Of the
available variables in the data, according to Bettman (1979), the degree of internal search is
related to whether the consumers buy from the same maker, satisfaction with the previous
product, and the number of purchases in 10 years. Furthermore, in the existing automobile
studies, experience and knowledge, which induce internal search, are measured by total
purchases, satisfaction with the previous car, or time since last purchase (Moorthy et al. 1997;
Punj and Staelin 1983; Srinivasan and Ratchford 1991). We conduct a principal component
analysis in order to construct a measure from these variables, which represents internal search.
Table 2.4 shows the results of the principal component analysis. The first principal
component has weights of 0.53 to 0.62 on each variable, capturing the common characteristic of
each variable, and it explains 44% of the original information. Thus, we use the first principal
component as the measure of internal search.
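How a first principal component and its share of explained variance are obtained can be sketched with power iteration on a correlation matrix. The 3 x 3 correlation matrix below is hypothetical (it is not the correlation matrix of the three variables in the data), and the function name is introduced only for illustration.

```python
import math


def first_principal_component(corr, iterations=500):
    """Return the leading eigenvector (unit length) and eigenvalue of a
    symmetric correlation matrix, using power iteration."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(corr[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    eigenvalue = sum(v[r] * sum(corr[r][c] * v[c] for c in range(n)) for r in range(n))
    return v, eigenvalue


# Hypothetical correlations among three standardized variables
corr = [[1.00, 0.20, 0.15],
        [0.20, 1.00, 0.10],
        [0.15, 0.10, 1.00]]
weights, eigenvalue = first_principal_component(corr)
proportion_explained = eigenvalue / len(corr)  # share of total variance
```

A respondent's score on the component is then the weighted sum of his or her standardized variable values, using `weights`.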
We exclude a few respondents who were outliers, namely, those reporting search times
on each offline external source exceeding 30 hours, or spouse search times of over 50 hours, or
zero time with the dealer. After excluding outliers, the number of respondents was 647 in 2001,
510 in 2003, and 550 in 2005, making up 37.5%, 30.1%, and 32.4% of the sample, respectively.
Table 2.4. Results of Principal Component Analysis

Variables                                   PC1*     PC2      PC3
Same maker indicator (Dummy variable)       0.62    -0.13    -0.77
Previous car satisfaction                   0.57    -0.60     0.56
Number of new car purchases in 10 years     0.53     0.79     0.30
Proportion Explained                        44%      29%      27%

*: PCk means the k-th principal component.
For the missing values in certain variables for some respondents, we replaced them with the
mean of the rest of the respondents in the same year.
Estimation
We estimate the parameters of Equation 2 with a Metropolis-Hastings step within a Gibbs
sampler and impute the latent endogenous variables $y_j^*$ using a data augmentation method. The
MCMC algorithms follow Jang et al. (2010). We review the estimation
method and the choice of priors in the Appendix.
The MCMC algorithms are run twice. In the first run, we took 25,000 draws and
discarded the first 10,000 as a ‘burn-in’ period. The remaining 15,000 draws were used to
calculate the variance of the coefficients of the endogenous variables, which was used for drawing
candidates in the second run. The second run made 25,000 draws, of which the first 10,000 were
discarded and the remaining 15,000 were used to calculate the posterior distributions of all the
coefficients. We verified that the algorithm converged by checking that the draws showed stable
trends and by inspecting their distributions. The acceptance rate for the second run was about 0.47. For $y_i^*$, we
saved the last 1,000 imputed values and averaged them.
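The two-run scheme can be illustrated with a scaled-down sketch. This is not the Metropolis-Hastings within Gibbs sampler of Jang et al. (2010): it is a one-parameter random-walk Metropolis sampler on a hypothetical stand-in posterior, shown only to make the roles of the burn-in period and of the first-run variance concrete.

```python
import math
import random
import statistics


def log_target(theta):
    # Hypothetical stand-in posterior: normal with mean 2 and standard deviation 1
    return -0.5 * (theta - 2.0) ** 2


def random_walk_mh(n_draws, proposal_sd, rng, start=0.0):
    """Random-walk Metropolis sampler; returns the draws and the acceptance rate."""
    draws, accepted = [], 0
    current, current_lp = start, log_target(start)
    for _ in range(n_draws):
        candidate = current + rng.gauss(0.0, proposal_sd)
        candidate_lp = log_target(candidate)
        # Accept with probability min(1, target(candidate) / target(current))
        if rng.random() < math.exp(min(0.0, candidate_lp - current_lp)):
            current, current_lp = candidate, candidate_lp
            accepted += 1
        draws.append(current)
    return draws, accepted / n_draws


rng = random.Random(0)
N_DRAWS, BURN_IN = 25000, 10000

# First run: discard the burn-in, use the rest to scale the candidate distribution
first_run, _ = random_walk_mh(N_DRAWS, proposal_sd=1.0, rng=rng)
tuned_sd = statistics.pstdev(first_run[BURN_IN:])

# Second run: candidates drawn with the first-run scale; keep post-burn-in draws
second_run, acceptance_rate = random_walk_mh(N_DRAWS, proposal_sd=tuned_sd, rng=rng)
kept = second_run[BURN_IN:]
posterior_mean = statistics.mean(kept)
```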
RESULTS
The estimation results of Equation 3 are reported in three parts. The first part deals with the
estimation of the simultaneous equations Tobit model, which reveals the interrelationships
between search preferences in different sources and the effects of exogenous variables. The
second part deals with the results of the segmentation analysis, which classifies consumers based
on their search patterns. The third part deals with examining the segments’ brand choices and
price-related outcomes. In the results, we emphasize interpretation of the statistically significant
variables whose 95% posterior intervals do not contain zero.
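The significance criterion used here can be sketched as a check on retained posterior draws. The draws below are hypothetical evenly spaced values rather than sampler output, and the simple index-based quantile rule is an assumption of this sketch.

```python
def posterior_interval(draws, level=0.95):
    """Equal-tailed posterior interval taken from the sorted draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    lower_index = int(tail * (len(ordered) - 1))
    upper_index = (len(ordered) - 1) - lower_index
    return ordered[lower_index], ordered[upper_index]


def excludes_zero(draws, level=0.95):
    """True when the posterior interval lies entirely on one side of zero."""
    lower, upper = posterior_interval(draws, level)
    return lower > 0.0 or upper < 0.0


# Hypothetical draws: one coefficient clearly positive, one centered near zero
positive_draws = [0.10 + 0.001 * k for k in range(1000)]
centered_draws = [-0.50 + 0.001 * k for k in range(1000)]
positive_is_significant = excludes_zero(positive_draws)
centered_is_significant = excludes_zero(centered_draws)
```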
Relationships between Information Sources
Recall that in the simultaneous equations Tobit model of Equation 3, the $\gamma$-matrix on
the right-hand side represents the effect on search preference for a given source of search
preference for other sources. The vector $\beta$ represents the effects of internal search, spouse
search, and other observable demographics on search preferences. We begin by discussing
results from the estimation of the $\gamma$-matrix, given in Table 2.5, dividing them into effects of
search preference for Internet sources and effects of search preference for offline sources on each
information source. After that, we discuss the results from the estimation of $\beta$, given in
Table 2.6.
Effects of the Internet sources. The upper right side of Table 2.5 shows the effects of
search preference for Internet sources on search preference for offline sources. Overall, search
Table 2.5. Interrelationship of the External Search Sources
(Rows are the LHS sources; columns are the RHS sources.)

                        Offline Sources                                                            Internet Sources
LHS \ RHS               Personal       Independent    Manufacturer   Dealer         Experiential   Independent    Manufacturer   Dealer
Offline   Personal      -              0.127 (0.06)   -0.05 (0.13)   -1.384 (0.26)  0.288 (0.06)   0.069 (0.06)   -0.042 (0.09)  0.253 (0.11)
          Independent   0.176 (0.06)   -              0.315 (0.11)   -0.521 (0.12)  -0.016 (0.09)  0.222 (0.05)   -0.198 (0.08)  0.163 (0.11)
          Manufacturer  0.107 (0.04)   0.16 (0.03)    -              -0.634 (0.23)  0.197 (0.08)   0.043 (0.03)   -0.06 (0.03)   0.162 (0.05)
          Dealer        0.025 (0.03)   0.03 (0.03)    -0.006 (0.05)  -              0.189 (0.03)   0.07 (0.03)    -0.055 (0.03)  0.072 (0.03)
          Experiential  0.071 (0.07)   0.066 (0.07)   -0.073 (0.11)  -1.605 (0.22)  -              0.174 (0.06)   -0.031 (0.08)  0.04 (0.1)
Internet  Independent   -0.101 (0.12)  0.346 (0.1)    -0.055 (0.16)  -2.209 (0.46)  0.407 (0.12)   -              0.155 (0.11)   0.228 (0.15)
          Manufacturer  0.108 (0.07)   -0.148 (0.08)  0.14 (0.21)    0.092 (0.35)   0.028 (0.15)   0.204 (0.05)   -              0.209 (0.09)
          Dealer        0.07 (0.04)    0.087 (0.04)   -0.063 (0.08)  -0.537 (0.24)  0.034 (0.1)    0.041 (0.03)   0.193 (0.03)   -

Note: Variables in bold are significant at the 95% level and numbers in parentheses are standard deviations.
preference for independent websites and dealer websites is positively correlated with search preference for
offline information sources, while search preference for manufacturer websites is negatively
correlated with it. Specifically, search preference for independent websites is positively correlated with
search preference for offline independent sources ($\gamma = 0.222$), dealer sources ($\gamma = 0.07$), and
the experiential source ($\gamma = 0.174$). Increased search preference for dealer websites is associated
with increased search preference for the offline personal, manufacturer, and dealer sources
($\gamma = 0.253$, 0.162, and 0.072, respectively). These complementary effects may
be driven by consumers’ desire to verify online search information, or to use it more
effectively, with offline search information. In contrast, search preference for manufacturer
websites is negatively correlated with search preference for offline independent sources
($\gamma = -0.198$), manufacturer sources ($\gamma = -0.06$), and dealer sources ($\gamma = -0.055$). The
substitution pattern is meaningful not only because it shows which Internet source substitutes
for traditional offline information sources but also because it implies that manufacturers can
reduce consumers’ extended offline search by providing proper information on their websites.
The interrelationships between Internet sources also show complementary effects (lower
right side of Table 2.5). Search preference for independent websites is positively correlated with
search preference for manufacturer websites ($\gamma = 0.204$). Search preferences for manufacturer
websites and dealer websites are positively correlated with each other ($\gamma = 0.193$ and $\gamma = 0.209$,
respectively). Though there is no direct association between search preference for independent
websites and dealer websites, there is an indirect association because search preference for
independent websites is associated with search preference for manufacturer websites, which is
in turn associated with search preference for dealer websites. Thus, we can conclude that
Internet users tend to search all types of websites together.
Effects of the offline sources. The interrelationships of the offline sources in the upper left
side of Table 2.5 are positive in general, i.e., consumers use the offline information sources
as complements. Search preference for personal sources is positively correlated
with search preference for independent sources and manufacturer sources ($\gamma = 0.176$ and 0.107).
Search preference for independent sources is also positively correlated with search preference for personal
sources and manufacturer sources ($\gamma = 0.127$ and 0.16), while search preference for
manufacturer sources is positively correlated with search preference for independent sources only
($\gamma = 0.315$). Finally, search preference for experiential sources (test-driving) is positively correlated with
search preference for personal, manufacturer, and dealer sources ($\gamma = 0.288$, 0.197, and 0.189).
Therefore, it can be concluded that consumers prefer to search multiple offline information
sources.
An exception is search preference for offline dealer sources, whose effect on search
preference for other offline sources is strongly negative ($\gamma = -1.384$, $-0.521$, $-0.634$, and
$-1.605$ for the personal, independent, manufacturer, and experiential sources, respectively).
That is, if consumers have a higher preference for searching at offline dealer sources, their
preference for searching other sources decreases. It is interesting that search preference for
offline dealer sources is also negatively correlated with search preference for experiential
sources. One reason may be that offline dealer sources include looking around the showroom,
which gives consumers product information (e.g., style, seat comfort, etc.) without
driving. The implication of these findings is that the dealer can supply effective and
comprehensive information to consumers.
Effects of search preference for offline sources on search preference for Internet sources
vary depending on the offline source (lower left side of Table 2.5). Search preference for
offline independent sources is positively correlated with search preference for independent websites and
dealer websites ($\gamma = 0.346$ and 0.087). Similarly, search preference for experiential sources is
positively correlated with search preference for independent websites ($\gamma = 0.407$). An explanation for these
positive effects may be that consumers tend to seek information across offline and Internet
sources. It is notable that the effect of offline independent sources is positive on independent
websites but negative on manufacturer websites ($\gamma = -0.148$). Recalling that the effect of
independent websites on offline independent sources is positive, while the effect of
manufacturer websites is negative, offline independent sources appear to compete not against
their Internet version but against manufacturer websites.
It is also found that the effects of search preference for offline dealer sources are negative for
search preference for independent websites ($\gamma = -2.209$) and dealer websites ($\gamma = -0.537$).
Thus, though the Internet is widely available and popular for searching, it has not replaced the
traditionally important information source, the dealer, for a large proportion of buyers. However,
it is notable that search preference for offline dealer sources is not negatively correlated with
search preference for manufacturer websites. Together with the result that search preference for
manufacturer websites replaces search preference for offline sources, the absence of an effect of search
preference for offline dealer sources on search preference for manufacturer websites shows that
manufacturers’ website information plays an important role in consumers’ search behavior.
The findings from the interrelationships between information sources can be summarized
as follows. First, many positive coefficients show that consumers prefer to use various
information sources. Second, negative effects of search preference for offline dealer sources on
search preference for other offline and online sources show that the dealer is still an important
source in the Internet era. Third, manufacturer websites are the most important Internet sources
in consumer search: search preference for manufacturer websites is negatively correlated
with search preference for offline sources, and it is not reduced by search preference for
offline dealer sources, which generally reduces search preference for other sources.
Fourth, the Internet versions of information sources do not
necessarily replace their traditional counterparts, as there are positive interrelationships between
some of them (e.g., independent sources). Rather, different information sources in
different formats can be in competition. For example, search preference for the offline
independent source is negatively correlated with search preference for manufacturer websites
and vice versa.
Effects of internal search and spouse search. The effects of exogenous variables are
given in Table 2.6. Here, we discuss the effects of internal search, spouse search, and other
exogenous variables. We find in Table 2.6 that the extent of internal search significantly reduces
search preference for external sources except offline manufacturer sources, independent
websites, and dealer websites. The reduction is especially large in the offline personal and
experiential sources ($\beta = -0.398$ and $-0.497$, respectively). Thus, consumers who have a high level of internal
search reduce external search. However, the coefficients of the squared internal search are not
Table 2.6. Effects of the Exogenous Variables

A) Effects of exogenous variables on search preference for offline sources

                        Offline Sources
Exogenous \ Endogenous  Personal        Independent     Manufacturer    Dealer          Test Driving
Intercept               0.211 (5.34)    -9.878 (4.18)   1.236 (3.28)    -1.891 (2.52)   -12.173 (5.71)
Internal                -0.398 (0.13)   -0.281 (0.1)    -0.122 (0.09)   -0.172 (0.06)   -0.497 (0.13)
Internal²               -0.044 (0.09)   -0.07 (0.07)    -0.018 (0.06)   -0.066 (0.04)   -0.059 (0.09)
Spouse                  0.158 (0.03)    0.091 (0.02)    0.152 (0.03)    0.092 (0.01)    0.195 (0.04)
Spouse × Male           0.048 (0.04)    0.017 (0.03)    -0.042 (0.02)   0.025 (0.02)    0.072 (0.04)
Year 2003               -0.161 (0.31)   0.615 (0.26)    -0.107 (0.21)   -0.149 (0.15)   0.377 (0.35)
Year 2005               -0.389 (0.31)   0.238 (0.25)    -0.521 (0.2)    -0.016 (0.15)   0.586 (0.34)
Male                    -0.755 (0.31)   0.004 (0.26)    0.086 (0.21)    -0.22 (0.15)    -0.82 (0.34)
Age                     -0.023 (0.07)   -0.043 (0.06)   -0.016 (0.05)   0.018 (0.04)    0.048 (0.08)
Education years         -0.084 (0.05)   -0.042 (0.04)   -0.098 (0.04)   -0.06 (0.03)    -0.05 (0.06)
Hourly wage             -0.013 (0.01)   -0.01 (0.01)    -0.002 (0.01)   -0.009 (0.004)  -0.02 (0.01)
Log of sticker price    0.301 (0.51)    0.891 (0.39)    0.096 (0.32)    0.435 (0.23)    1.228 (0.54)
Helpfulness             1.02 (0.08)     1.12 (0.06)     0.571 (0.05)    0.175 (0.02)    0.94 (0.09)

B) Effects of exogenous variables on search preference for online sources

                        Online Sources
Exogenous \ Endogenous  Independent     Manufacturer    Dealer
Intercept               -16.19 (8.36)   -7.74 (5.4)     2.808 (3.52)
Internal                -0.345 (0.21)   -0.275 (0.14)   -0.131 (0.09)
Internal²               -0.123 (0.14)   -0.041 (0.09)   -0.105 (0.06)
Spouse                  0.253 (0.06)    -0.006 (0.05)   0.065 (0.03)
Spouse × Male           0.036 (0.06)    0.056 (0.04)    0.029 (0.02)
Year 2003               1.665 (0.53)    0.901 (0.35)    -0.491 (0.23)
Year 2005               0.978 (0.51)    0.878 (0.35)    0.141 (0.22)
Male                    -0.333 (0.51)   0.086 (0.34)    -0.031 (0.22)
Age                     0.061 (0.12)    0.042 (0.08)    0.037 (0.05)
Education years         0.099 (0.08)    0.065 (0.06)    -0.061 (0.04)
Hourly wage             -0.006 (0.01)   0.009 (0.01)    -0.01 (0.01)
Log of sticker price    1.195 (0.79)    0.052 (0.51)    -0.399 (0.34)
Helpfulness             1.782 (0.15)    1.194 (0.1)     0.752 (0.05)

Note: Variables in bold are significant at the 95% level and numbers in parentheses are standard deviations.
significant. Therefore, we do not find any evidence of the inverted U-shaped relationship
between internal search and external search in parameter estimation.
The effect of spouse search is positive on search preference for all the external sources
except for manufacturer websites (the $\beta$’s range between 0.065 and 0.253). Note that there is no
interaction effect between spouse search and gender for any information source. Therefore,
regardless of the gender of buyers, it seems that buyers and their spouses search together.
Effects of other exogenous variables. There are various effects of other exogenous
variables in Table 2.6. We find that the year of the survey had an effect on search in external
sources. Compared to the base year 2001, consumers searched more in offline independent
sources in 2003 ($\beta = 0.615$) and searched less in offline manufacturer sources in 2005
($\beta = -0.521$). Among Internet sources, consumers searched more on independent websites
($\beta = 1.665$ in 2003) and manufacturer websites ($\beta = 0.901$ in 2003 and 0.878 in 2005).
However, they reduced their search time on dealer websites in 2003 ($\beta = -0.491$). The positive
sum of coefficients reflects the increase in total external search time in 2003 and 2005 compared
to the base year 2001.
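The sum of the year coefficients can be checked directly from Table 2.6. The sketch below adds all eight point estimates for each year dummy, whether or not they are individually significant; treating the sum this way is an assumption about how the statement above was computed.

```python
# (Year 2003, Year 2005) coefficients from Table 2.6, one pair per source
YEAR_EFFECTS = {
    "offline personal": (-0.161, -0.389),
    "offline independent": (0.615, 0.238),
    "offline manufacturer": (-0.107, -0.521),
    "offline dealer": (-0.149, -0.016),
    "test driving": (0.377, 0.586),
    "independent websites": (1.665, 0.978),
    "manufacturer websites": (0.901, 0.878),
    "dealer websites": (-0.491, 0.141),
}

sum_2003 = sum(pair[0] for pair in YEAR_EFFECTS.values())
sum_2005 = sum(pair[1] for pair in YEAR_EFFECTS.values())
# Both sums are positive: about 2.650 for 2003 and about 1.895 for 2005
```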
The results in Table 2.6 also show the effects of demographics and product attributes.
Male buyers searched less on offline personal sources ($\beta = -0.755$) and experiential sources
($\beta = -0.82$). However, age has no effect on search preference for any information source.
With respect to search cost-related variables, consumers with more years of education had a lower
search preference for offline manufacturer sources and dealer sources ($\beta = -0.098$ and
$-0.06$), and consumers with a higher hourly wage searched less in offline dealer sources and
experiential sources ($\beta = -0.009$ and $-0.02$). It is also found that a high sticker price increases
search preference for offline independent sources and experiential sources ($\beta = 0.891$ and 1.228,
respectively). Finally, as might be expected, the effect of the helpfulness of each information
source is positive.
Search Based Segments
Our second part of results is obtained from the segmentation analysis. By estimating
Equation 2, we obtained the latent search preference ($y_{ij}^*$) on each external information source.
We then segment the consumers in terms of search related variables including the latent search
preferences, internal search and spouse search, as well as a dummy variable of whether the
customer used the Internet or not. As mentioned previously, we used a two-step cluster analysis
for handling both continuous variables related to search and the discrete dummy variable. We
selected the number of clusters based on BIC and the size of the segments. We used a nine-
segment solution because the decrease in BIC is marginal after nine segments and none of the
segments is too small to be managerially relevant. Figure 2.2 shows the search times across the
segments.
In Figure 2.2, the x-axis represents the extent of the internal search and the y-axis
represents the offline, the Internet, and spouse search times. The figure lists the segments from
S1 to S9 for reference and gives the size of each segment. We sorted the nine segments by the
degree of internal search. The first three segments are low internal search segments (i.e., the
levels of internal search lie between -1.21 and -0.44), the next three segments are moderate
internal search segments (the levels of internal search lie between -0.24 and 0.05), and the last
Segments            Segment Label
S1 (273, 16.1%)     Lowest internal searcher
S2 (192, 11.3%)     Lowest external searcher
S3 (247, 14.6%)     Low internal but moderate offline searcher
S4 (55, 3.2%)       Highest external searcher
S5 (114, 6.7%)      High searcher in online/offline sources
S6 (75, 4.4%)       High searcher in offline sources
S7 (286, 16.9%)     Moderate searcher
S8 (189, 11.1%)     High internal and low external searcher
S9 (266, 15.7%)     Most experienced and loyal searcher
Figure 2.2. Search Based Segments (S1 to S9) and Their Search Times in Hours
three segments are high internal search segments (the levels of internal search lie between 0.35
and 1.20). Note that some segments use the Internet (S1, S4, S5, S7, and S8) while other
segments do not use the Internet (S2, S3, S6, and S9).
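The segment sizes and shares reported with Figure 2.2 can be cross-checked with a few lines of arithmetic. The segment sizes and the Internet-usage grouping come from the text; the variable names are illustrative.

```python
SEGMENT_SIZES = {
    "S1": 273, "S2": 192, "S3": 247, "S4": 55, "S5": 114,
    "S6": 75, "S7": 286, "S8": 189, "S9": 266,
}
INTERNET_SEGMENTS = {"S1", "S4", "S5", "S7", "S8"}

total = sum(SEGMENT_SIZES.values())
# Percentage share of each segment, rounded to one decimal as in the text
shares = {s: round(100.0 * n / total, 1) for s, n in SEGMENT_SIZES.items()}
internet_users = sum(n for s, n in SEGMENT_SIZES.items() if s in INTERNET_SEGMENTS)
internet_share = 100.0 * internet_users / total
```

The segments that use the Internet account for a little over half of the buyers, which agrees with the earlier observation that almost half of the respondents did not use the Internet.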
Figure 2.2 reveals two interesting results that are new to the literature. First, we find, at
the segment level, the inverted U-shaped relationship between the internal search and external
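[Figure 2.2 chart: "Search Time by Segments". x-axis: Internal Search, with segment levels -1.21 (S1), -0.79 (S2), -0.44 (S3), -0.24 (S4), -0.18 (S5), 0.05 (S6), 0.35 (S7), 1.03 (S8), and 1.20 (S9); y-axis: External Search Time (hours); series: Offline, Internet, Spouse; segment sizes: S1 (273), S2 (192), S3 (247), S4 (55), S5 (114), S6 (75), S7 (286), S8 (189), S9 (266).]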
search if we put aside segments S1 and S4. That is, the level of external search increases across the
low internal search segments up to segments S5 and S6 (the segments with moderate internal
search) and then decreases in the high internal search segments. It is worth remarking that the
theoretical inverted U-shaped relationship, consistent with the view of Moorthy et al. (1997), is obtained at
the segment level, but not in aggregate. The other interesting result is the two off-pattern segments,
S1 and S4. Members of S1, the lowest internal search segment, do moderate external search
(12.2 hours). The reason could be that these consumers want to compensate for their lack of
knowledge by external search. Segment S4 is a niche segment (3.2%) with moderate internal
search (-0.24) but long external search (43.9 hours). The reasons could be that its members are efficient
enough to process more external information or that they enjoy external search based on their
current internal knowledge. Uncovering these segments is important because, depending on their
size, they can mask the inverted U-shaped relationship. If segments S1 and S4 are relatively
large, they may make the aggregate relationship between internal and external search appear negative.
In Table 2.7, we provide a labeling and profiling of the segments by looking at their
search patterns and descriptive characteristics including demographics. After this, we also
discuss how our segmentation results compare against those of Furse et al. (1984).
Segment S1 (size n=273, 16.1%) is the second largest and is characterized by the lowest
internal search. Though their internal search level is the lowest, their external search is moderate
(12.2 hours). Segment S2 (n=192, 11.3%) also does low internal search and is characterized by
the lowest external search time (3.48 hours) and no Internet usage. Segment S3 (n=247, 14.6%)
consists of moderate offline external searchers. This segment shows a higher level of internal
search and external search than S2.
Segment S4 (n=55, 3.2%) is characterized by the highest external search time (43.9
hours). It is notable that their external search is extremely high relative to the degree of their
internal search. Segment S5 (n=114, 6.7%) and Segment S6 (n=75, 4.4%) consist of consumers
who use offline sources for a long time (21.2 and 25.5 hours, respectively) and whose spouse
search is also high (15.1 and 16.1 hours, respectively). The differences between the two segments
are that S5 uses Internet sources while S6 does not and that S6 spends the longest time on
test-driving (7.39 hours).
Segments S7 (n=286, 16.9%), S8 (n=189, 11.1%), and S9 (n=266, 15.7%) do relatively
high internal search. As the degree of internal search increases across these segments, the external search
time decreases. In particular, S9 does the highest internal search but has very low external search time
(3.9 hours). In addition, this segment does not use the Internet at all.
Results relating segments and demographics are as follows. Older consumers are more
likely to belong to high internal search and low external search segments compared to younger
consumers. Females with low internal search or males with high internal search are likely to
belong to low external search segments. Employed consumers do not necessarily belong to lower
search segments than those who are unemployed. Highly educated consumers do not necessarily
belong to the high internal search segments but are likely to belong to the high external search
segments. Consumers with high hourly wages do not seem to reduce their search time because
some high wage segments search more than low wage ones.
We overlay our segments on those of Furse et al. (1984) in Table 2.7. We find that S1 matches
their cluster, Self-Reliant Shopper, in that they spend a certain amount of time but do not involve
other people much. S2 matches their cluster, Purchase Pal Assisted, who are the least
experienced car shoppers and get help from others. S3 and S7 are similar to their Moderate
Search cluster. S4 matches their High Search cluster of consumers spending the greatest
amount of time in search activity. S5 and S6 are similar to their cluster, Retail Shopper, which
involves many decision makers, especially the wife, in the search process. S8 and S9 match their
cluster, Low Search, of those who have prior purchase experience but spend less time.
Roughly, therefore, search based segments in the Internet era match those in the pre-Internet era.
Some differences, however, are also found in the Internet era. For example, while the
proportion of Purchase Pal Assisted shoppers has decreased (19% vs. 11.3%), the proportion of
Self-Reliant Shoppers has increased (12% vs. 16.1%). In addition, a new segment has emerged
(S5, 6.7%), which is similar to their Retail Shopper cluster but uses the Internet as an
additional information source. These changes possibly occurred because many consumers now obtain
information directly from the Internet without other people’s help, which may even have given
rise to a new searcher type.
Search Results
The third and final part of results pertains to the effects of search on brand choices and
price-related outcomes. First, we look at the brand choices of the search-based segments, which
is new to the literature. We examine this relationship first with a correspondence analysis and
then with a multinomial logit model. Then, we look at price-related outcomes such as price
negotiation time, discount amount and rate, and final price satisfaction by segment.
Search-based segments and their brand choices. We group the individual automobile
brands by country of origin. Where a country has many brands, we classify them
by manufacturer, depending on the number of observations. The final brand groups are Chevy
(20.1%), GM low brands (Pontiac and Saturn, 9.6%), GM high brands (Cadillac, GMC, and
Oldsmobile, 10.2%), Ford (17.4%), Chrysler (11.2%), Toyota/Honda (14.2%), other Japanese
brands (e.g., Nissan and Mazda, 4.6%), EU brands (4.7%), and Korean brands (3.6%).

Table 2.7. Description of Segments

S1 (273, 16.1%): Lowest internal searcher
  Search pattern: Their internal search is the lowest. They seem to make up for their lack of knowledge with moderate external search.
  Demographics: The youngest segment (average age 39.1 years). They are highly educated (16.3 years), mostly employed (88%), and earn an hourly wage of $25.1. They buy cars for the first time or change models.
  Similar group in Furse et al. (1984): Self-Reliant Shopper (12%)

S2 (192, 11.3%): Lowest external searcher
  Search pattern: Their internal search is low, and their external search is also low. They are the lowest searchers and do not use the Internet.
  Demographics: Average age is 49.1 years. The female proportion is higher (54%). The marriage rate is 64%. Their education level (14.4 years) and hourly wage ($19.3) are lower than those of others. They do not have much experience in automobile purchases.
  Similar group in Furse et al. (1984): Purchase Pal Assisted (19%)

S3 (247, 14.6%): Low internal but moderate offline searcher
  Search pattern: Their internal search is relatively low, but their external search is greater than that of the other low internal search segment (S2). They do not use the Internet.
  Demographics: Average age is 50.2 years. They have fewer years of education (14.6 years), a lower employment rate (66%), and a lower hourly wage ($20.1) than others. They do not have much purchase experience.
  Similar group in Furse et al. (1984): Moderate Searcher (32%)

S4 (55, 3.2%): Highest external searcher
  Search pattern: They do moderate internal search but extremely high external search. This is a niche segment.
  Demographics: Average age is 42.5 years. Fewer are married (60%), but they are more educated (16.5 years), more often employed (82%), and better paid ($27) than others.
  Similar group in Furse et al. (1984): High Searcher (5%)

S5 (114, 6.7%): High searcher in online/offline sources
  Search pattern: They do high external search in various sources, including spouses.
  Demographics: Average age is 43.6 years. Half are female (51%). Most are married (80%) and employed (87%).
  Similar group in Furse et al. (1984): Retail Shopper (5%)

S6 (75, 4.4%): High searcher in offline sources
  Search pattern: They do high external search but do not use the Internet. They spend a long time test-driving and get the most help from spouses.
  Demographics: Average age is 50.7 years. Half are female (51%). Most are married (85%). They have a lower employment rate (68%) and a lower hourly wage ($17.6).
  Similar group in Furse et al. (1984): Retail Shopper (5%)

S7 (286, 16.9%): Moderate searcher
  Search pattern: They use various sources at a moderate level. They are the largest segment.
  Demographics: Average age is 44.5 years. Most are employed (87%) and highly paid ($26). The proportion of males, marriage rate, and education level are average.
  Similar group in Furse et al. (1984): Moderate Searcher (32%)

S8 (189, 11.1%): High internal and low external searcher
  Search pattern: They do high internal search but low external search.
  Demographics: Average age is 48.8 years. Higher proportion of males (65%). They have more years of education (16.3 years), a higher employment rate (84%), and a higher hourly wage ($31). They have purchased 3.16 cars in 10 years, and 72% of them buy from the same maker.
  Similar group in Furse et al. (1984): Low Searcher (26%)

S9 (266, 15.7%): Most experienced and loyal searcher
  Search pattern: Their internal search is the highest and their external search is low. They do not use the Internet.
  Demographics: Average age is 52.7 years. More are married (80%), but they have a lower employment rate (66%) and a lower hourly wage ($20.3). They are so loyal to automakers that 80% of them buy the same brands. They are the most experienced in purchasing cars (3.32 in 10 years).
  Similar group in Furse et al. (1984): Low Searcher (26%)
We look at the graphical relationship between the search-based segments and automobile
brands using a correspondence analysis. The result is in Figure 2.3. In the correspondence
analysis, we chose two dimensions. The first dimension explains 69.2% of the original
information and the second explains 14.4%. From the perspective of the segments, the main
dimension (x-axis) appears related to Internet usage because the Internet using segments (S1, S4,
S5, S7, and S8) are located on the right side and the rest of the segments are located on the left
side. From the perspective of the brands, the main dimension appears related to the brand origin
because American brands are located together on the left side while foreign brands are on the
right side.
By combining the results of the search-based segments and their brand choices, we can
see the relationship between them. The most salient result is that the low and moderate search
segments (S2, S3, S7, S8, and S9) correspond to the American brands while the high search
segments (S1, S4, and S5) correspond to all the Japanese and EU brands. Segment S6, the high
offline search segment, is close to Chrysler and Korean brands.
In addition to the correspondence analysis, we confirmed the relationship between the
segment membership and brand choices by using a multinomial logit model. We set up the
multinomial logit model as follows.
$$P(\text{brand}_i = l) = \frac{\exp(x_i \beta_l)}{\sum_{l'=1}^{L} \exp(x_i \beta_{l'})},$$
where $i$ indexes consumers, $l$ indexes brands, and $x_i$ denotes the independent variables, including
segment dummies, demographics, and product-related data. For brevity, we report only the main
results of this analysis. The logit results are, in general, similar to those of the correspondence
analysis. Setting Chevy as the reference category, compared to S9, the segments S1, S4, and S5
are more likely to choose Toyota/Honda (coefficients = 2.33, 1.83, and 1.91, respectively), other
Japanese brands (coefficients = 1.98 and 1.53 for S1 and S5), or EU brands (coefficients = 2.14,
1.68, and 1.72, respectively). S3 is more likely to choose Chrysler (coefficient = 0.85) or Korean
brands (coefficient = 1.35), and S6 is more likely to choose Chrysler (coefficient = 1.17). However,
the brand choices of S2, S7, and S8 do not differ from those of S9, as those segments belong to
the low external search segments and are likely to choose American brands.

Figure 2.3. Correspondence between Search-based Segments and Brand Choices
[Figure 2.3 plots segments S1 to S9 and the nine brand groups in two dimensions. Horizontal axis: American-Foreign; vertical axis: Sales Volume. Plot title: Correspondence between Segments and Brands.]
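As an illustration of how choice probabilities follow from a multinomial logit of this kind, the following minimal Python sketch computes brand probabilities for a single consumer. The function name, the covariates, and the coefficient values are hypothetical and are not the estimates reported in this study; the reference brand is assumed to have coefficients normalized to zero.

```python
import math

def mnl_probabilities(x, betas):
    """Return P(brand = l) for one consumer under a multinomial logit.

    x     : list of covariates for the consumer (hypothetical values)
    betas : one coefficient list per brand; the reference brand is all zeros
    """
    utilities = [sum(b_k * x_k for b_k, x_k in zip(beta, x)) for beta in betas]
    # Subtract the maximum utility before exponentiating for numerical stability.
    u_max = max(utilities)
    exp_u = [math.exp(u - u_max) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

# Hypothetical example: an intercept plus one segment dummy, three brands.
x_consumer = [1.0, 1.0]
betas = [
    [0.0, 0.0],   # reference brand (coefficients normalized to zero)
    [-0.5, 2.0],  # brand favored by this hypothetical segment
    [-1.0, 0.5],
]
probs = mnl_probabilities(x_consumer, betas)
```

A positive coefficient on a segment dummy raises the odds of choosing that brand relative to the reference brand, which is how the segment coefficients reported above can be read.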
The close relationship between search-based segments and their brand choices
demonstrates that it is important for automakers to choose proper communication media for their
customers. That is, American brands, whose customers are high internal and low external
searchers, might consider spending more on building consumer loyalty and satisfaction. In
contrast, foreign brands should provide more information to satisfy their consumers’ information
needs. As the foreign brands are strongly associated with Internet users, they should enhance
their Internet-based advertising and communications.
Search based segments and price-related outcomes. Finally, we look at price-related
outcomes for the different segments. To see the differences by segment, we run an ANCOVA, in
which the dependent variables are the price negotiation time with the dealer, discount amount,
discount rate (discount amount over the sticker price), and the final price satisfaction. The main
independent variable is the segment variable and the covariates are age, male indicator, marriage
indicator, employment indicator, education level in years, hourly wage, sticker price, and the
brands. Table 2.8 shows the F-test results which test for mean differences in the dependent
variables by segment.
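To make the logic of the F-test concrete, the sketch below computes a one-way F statistic for differences in group means. This is a deliberate simplification: it omits the covariate adjustment that the ANCOVA performs, and the function name and toy data are hypothetical rather than taken from the survey.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way comparison of group means.

    Simplified illustration: no covariates are adjusted for, unlike an ANCOVA.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical negotiation times (hours) for three toy segments.
toy_hours = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_f(toy_hours)
```

A large F statistic relative to its reference distribution indicates that the segment means differ by more than within-segment variation alone would suggest.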
We find that there is a difference (i.e., we can reject the null hypothesis of mean equality)
in the price negotiation time with the dealer for different segments (F=17.32, p-value<0.01). In
general, the high external search segments (S4, S5, and S6) spend a longer time negotiating
with the dealer (around 3 hours), while the low external search segments (S2, S8, and S9) and
the moderate external search segments (S1, S3, and S7) spend a shorter time (around 1 and 1.5
hours, respectively). Because the negotiation takes place at the dealer,
high external searchers seem to spend more time with the dealer when they visit the dealer to
shop. The discount amount also differs by segment (F=2.41, p-value=0.01). Overall, the high
offline search segments (S4, S5, and S6) and the high internal search segments (S7, S8, and S9)
get average discounts of about $3,000, while the others receive average discounts of about
$2,500 to $2,700. The discount rate shows similar differences (F=2.44, p-value=0.01). Segments
S4 through S9 get a discount of about 10.7% to 11.6%, but the other segments receive a discount
of about 9.4% to 10.5%.
Interestingly, however, even though price negotiation times, discount amount, and
discount rate are different for different segments, the final price satisfaction, on average 5.4 out
of 7, is not different across the segments (F=0.84, p-value=0.56). Considering that every segment
ends up with a similar satisfaction level, the different search patterns appear to be the
outcomes of consumers’ best search effort to maximize their utility given their current
knowledge, productivity in different search sources, or spousal help.
Table 2.8. Results of ANCOVA
(Means by segment; standard deviations in parentheses)

Segment  Segment Name                               Negotiation Time (hours)  Discount ($)  Discount Rate  Final Price Satisfaction
S1       Lowest internal searcher                   1.35 (1.34)               2467 (2536)   0.094 (0.086)  5.28 (1.18)
S2       Lowest external searcher                   0.92 (0.86)               2462 (2272)   0.100 (0.091)  5.32 (1.44)
S3       Low internal / moderate offline searcher   1.60 (1.76)               2691 (2237)   0.105 (0.085)  5.33 (1.27)
S4       Highest external searcher                  2.32 (2.51)               2959 (2788)   0.107 (0.092)  5.38 (0.97)
S5       High searcher in online/offline sources    3.38 (3.59)               3144 (2777)   0.113 (0.093)  5.20 (1.45)
S6       High searcher in offline sources           3.16 (4.30)               3034 (2780)   0.110 (0.092)  5.37 (1.20)
S7       Moderate searcher                          1.53 (2.12)               3168 (2642)   0.116 (0.089)  5.43 (1.22)
S8       High internal and low external searcher    1.07 (1.04)               3192 (2758)   0.114 (0.091)  5.55 (1.21)
S9       Most experienced and loyal searcher        1.25 (3.36)               2950 (2582)   0.108 (0.092)  5.59 (1.22)
F statistic                                         17.32                     2.41          2.44           0.84
p-value                                             0.00                      0.01          0.01           0.57
CONCLUSION
Summary
The objectives of this paper were to find out the relationships between search sources in a
comprehensive manner, segment the buyers based on their search patterns, and examine the
search results for each segment. We consider the entire range of information sources that buyers
consult in automobile purchases including internal search, offline search sources, Internet
sources, and spouse search. By analyzing the data on automobile purchases in 2001, 2003, and
2005, we find some interesting results that extend the results from the previous studies.
First, we find that, in general, search preference for each information source is positively
associated with the others. The generally positive interrelationship occurs within the offline and
the Internet sources and across the offline and the Internet sources, implying that consumers
complementarily use all information sources. However, search preference for the dealer sources
and internal search reduce search preference for all information sources. It is notable that search
preference for dealer sources significantly reduces search preference for the Internet sources.
This finding extends previous results that looked at the effects of the Internet on offline sources
but not the reverse effects.
Second, we identify nine segments based on consumers’ search patterns. The segments
are profiled based on the extent of their internal search, Internet and offline search time, and
spouse search time. Several of the segments correspond to those of Furse et al. (1984) obtained
prior to the Internet. At the segment level, we find the inverted U-shaped relationship between
internal search and external search. That is, low and high internal searchers are low external
searchers while moderate internal searchers are high on external search. We also find that two
segments do not conform to the inverted U-shaped relationship; one has low internal search but
moderate external search and the other has moderate internal search but extremely high external
search. Though the latter segment is small in size, the presence of two such segments suggests
one reason why the inverted U-shaped relationship may be hard to find at the aggregate level.
Finally, we examine the outcomes of search, focusing on brand choice. The results show
that segments with low external search are associated with purchase of American brands while
segments with high external search correspond to Japanese and EU brands. These results are
notable in that the relationship between search and brand choices is identified for the first time.
In addition, we find that though the price-related outcomes are different for different segments,
final price satisfaction levels are similar across segments.
In conclusion, our study extends the search literature by providing some new insights
including the effect of offline search on Internet search, the identification of search-based
segments, the relationship of internal and external search at the segment level, and the search
segments’ brand choices. We discuss how automakers might utilize these results next.
Managerial Implications
Our results have some practical implications for dealers and automakers. First, the dealer
is still a powerful and efficient information source for consumers in the Internet era. The more
time consumers prefer to spend with the dealer, the less time they prefer to spend with other
information sources. This result qualifies the results in previous studies about the role of the
Internet in reducing the search time with the dealer. Automakers should carefully select and train
dealers, maintaining a good relationship with them not only for the final sales but also for
providing information to consumers.
Second, automakers can identify their positioning and their competitors’ positioning in
terms of consumers’ search patterns. The results show that American brands and Japanese brands
are close to the other brands of their countries. EU brands are close to Japanese brands, maybe
being perceived as foreign country brands, while Korean brands are positioned in a distinct
location. That is, competition occurs between brands of the same country group. Thus,
automakers could focus on their differentiation from other brands from the same country.
Third, automakers can develop efficient communication strategies based on the
relationship of the search segments and their brand choices. For example, because customers of
American brands are low external searchers, American brands might implement advertising
campaigns that build brand image and loyalty. As Japanese and EU brands are associated with
higher external and Internet search, they should enhance information delivery through their own
websites, from which consumers can acquire information and which can substitute for offline
information sources. Korean brands should provide more information to convince high
search consumers. However, they also have to work to reduce the distance between their position
and that of other foreign brands in order to be perceived as one of them.
Limitations and Future Research
If researchers have more information, they can understand consumers’ search patterns
better. First, consideration sets can affect the search patterns. If consumers are considering those
brands with which they are familiar and have experience, they are less likely to conduct long
searches because of high internal search. Yet, if they are considering new automakers, they
would have to search more to obtain the necessary information. Therefore, future studies should
consider ways to include the effect of consideration sets. Second, the sequence of search can help
determine if some information sources initiate or stop further search. This might give some
insights into which information sources are more important in different stages of search. Third,
our dataset does not cover the 2008-2009 period, which saw turmoil and bankruptcies in the
automobile industry, changes in product lines, elimination of dealers, government intervention,
and a recession. It would be interesting to see whether these events have altered consumers’
search patterns in automobile purchases.
APPENDIX
MCMC Algorithms for Model Estimation
(1) Estimating the parameters of endogenous variables ($\Gamma$)
The matrix $\Gamma$ is estimated using a random walk chain Metropolis-Hastings method. As
the diagonal elements of $\Gamma$ are one, we need to estimate only the off-diagonal elements. Let
$\tilde{\gamma}$ denote the vector consisting of the off-diagonal elements of $\Gamma$, where the dimension of
$\tilde{\gamma}$ is $\tilde{K} \times 1$ with $\tilde{K} = J^2 - J$. In this study $J$ is 8. We use a diffuse normal prior, i.e.,
$\tilde{\gamma} \sim N(\mu_{\tilde{\gamma}}, \Sigma_{\tilde{\gamma}})$, where $\mu_{\tilde{\gamma}} = 0_{\tilde{K}}$ and $\Sigma_{\tilde{\gamma}} = 10^{4} I_{\tilde{K}}$.
We generate candidate draws according to $\tilde{\gamma}^{*} = \tilde{\gamma}^{(s-1)} + z$, where $z \sim N(0, \Sigma_z)$ and $s$
denotes the s-th iteration. To find the proper $\Sigma_z$, we ran the MCMC algorithms twice, following
Koop (2003). In the first run, we assume $\Sigma_z = \tilde{c}_1 I_{\tilde{K}}$ and randomly assign $\tilde{c}_1 = 10^{-5}$ (80%) and
$\tilde{c}_1 = 5 \times 10^{-5}$ (20%). After we get the variance of $\tilde{\gamma}$, denoted as $\hat{\Sigma}_z$, in the second run, we
reassume that $\Sigma_z = \tilde{c}_2 \hat{\Sigma}_z$ and randomly assign $\tilde{c}_2 = 10^{-2}$ (80%) and $\tilde{c}_2 = 5 \times 10^{-2}$ (20%).
As there are many parameters in $\tilde{\gamma}$, we draw and accept the new candidates equation by
equation. For example, let $\tilde{\gamma}_j$ denote the coefficient vector of the endogenous variables in
the j-th equation. To determine $\tilde{\gamma}_j^{(s)}$, we draw $\tilde{\gamma}_j^{*} = \tilde{\gamma}_j^{(s-1)} + z_j$ given $\tilde{\gamma}_{-j}^{(s)}$, where the subscript $j$
denotes the components related to the j-th equation. By comparing the posterior probabilities at
$\tilde{\gamma}_j^{(s-1)}$ and $\tilde{\gamma}_j^{*}$, we decide which draw to use at the s-th iteration and repeat the process for j=1 to
J. Estimating $\tilde{\gamma}$ equation by equation helps in obtaining a proper acceptance rate.
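The equation-by-equation random walk step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the estimation code used in this study: the target density is a toy stand-in (independent standard normals), the block structure and step size are hypothetical, and the two-run tuning of the proposal variance is not shown.

```python
import math
import random

def log_posterior(theta):
    # Toy target (an assumption for illustration): independent standard normals.
    return -0.5 * sum(t * t for t in theta)

def blockwise_rw_mh(log_post, theta0, blocks, step_sd, n_iter, rng):
    """Random-walk Metropolis-Hastings that updates one block of parameters at a time."""
    theta = list(theta0)
    current_lp = log_post(theta)
    draws = []
    accepted = 0
    for _ in range(n_iter):
        for block in blocks:
            # Perturb only the coefficients in this block; keep the others fixed.
            candidate = list(theta)
            for k in block:
                candidate[k] = theta[k] + rng.gauss(0.0, step_sd)
            cand_lp = log_post(candidate)
            # Symmetric proposal: accept with probability min(1, posterior ratio).
            if math.log(1.0 - rng.random()) < cand_lp - current_lp:
                theta, current_lp = candidate, cand_lp
                accepted += 1
        draws.append(list(theta))
    return draws, accepted / (n_iter * len(blocks))

rng = random.Random(0)
blocks = [[0, 1], [2, 3]]  # hypothetical: two "equations" with two coefficients each
draws, acc_rate = blockwise_rw_mh(log_posterior, [0.0] * 4, blocks, 1.0, 2000, rng)
```

Updating a small block at a time keeps each proposal low-dimensional, which is one reason an equation-by-equation scheme tends to yield workable acceptance rates.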
(2) Estimating the parameters of exogenous variables ($\beta$) and covariance matrix ($\Sigma$)
We estimate $\beta$ and $\Sigma$ by using a Gibbs sampler with standard Normal-Wishart priors.
Specifically, we use a normal prior $\beta \sim N(\mu_{\beta}, \Sigma_{\beta})$, where $\mu_{\beta} = 0_K$ and $\Sigma_{\beta} = 10^{4} I_K$, and a
Wishart prior $\Sigma^{-1} \sim W(v, V)$, where $v = J + 3$ and $V = (1/v) I_J$.
(3) Imputing $y^{*}$
After getting all parameters ($\Gamma$, $\beta$, and $\Sigma$), we can impute $y_i^{*}$. If all elements in $y_i$ are
positive, there is no need to impute. If all elements in $y_i$ are zero, we draw $y_i^{*}$ from the
multivariate truncated normal distribution, $MVTN_{(-\infty, 0)}(\Gamma^{-1} X_i \beta, \Gamma^{-1} \Sigma \Gamma^{-1\prime})$. If some elements of
$y_i$ are zero, we draw the latent values from a conditional multivariate truncated normal
distribution.
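The imputation step can be illustrated in one dimension. The sketch below draws a latent value from a normal distribution truncated to the negative half-line by inverting the CDF. It is a univariate simplification of the multivariate draw described above; the function name and the mean and standard deviation are hypothetical.

```python
import random
from statistics import NormalDist

def draw_truncated_below_zero(mean, sd, rng):
    """Draw from N(mean, sd^2) truncated to (-inf, 0] by inverting the CDF.

    Intended for cases where the mass below zero is not vanishingly small;
    otherwise the clamp on u below would distort the draw.
    """
    dist = NormalDist(mean, sd)
    upper = dist.cdf(0.0)                # probability mass below zero
    u = rng.random() * upper             # uniform on [0, upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1) for inv_cdf
    # Guard against tiny positive values caused by floating-point rounding.
    return min(dist.inv_cdf(u), 0.0)

rng = random.Random(1)
latent = [draw_truncated_below_zero(0.5, 1.0, rng) for _ in range(1000)]
```

In the multivariate case, the same idea is typically applied element by element, drawing each latent value from its conditional truncated normal given the others.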
REFERENCES
Amemiya, Takeshi (1974), “Multivariate Regression and Simultaneous Equation Models when the Dependent Variables Are Truncated Normal,” Econometrica, 42 (6), 999-1012.
________, Makoto Saito, and Keiko Shimono (1993), “A Study of Household Investment Patterns in Japan: An Application of Generalized Tobit Model,” The Economic Studies Quarterly, 44 (1), 13-28.
Bettman, James R. (1979), Information Processing Theory of Consumer Choice. Reading, MA: Addison-Wesley.
Furse, David H., Girish N. Punj, and David W. Stewart (1984), “Typologies of Individual Search Strategies Among Purchasers of New Automobiles,” Journal of Consumer Research, 10 (March), 417-31.
Guo, Chiquan (2001), “A Review on Consumer External Search: Amount and Determinants,” Journal of Business and Psychology, 15 (3), 505-19.
Hauser, John, Glen Urban, and Bruce Weinberg (1993), “How Consumers Allocate Their Time When Searching for Information,” Journal of Marketing Research, 30 (November), 452-66.
Hoffman, Donna L. and George R. Franke (1986), “Correspondence Analysis: Graphical Representation of Categorical Data in Marketing Research,” Journal of Marketing Research, 23 (August), 213-27.
Jang, Sungha, Ashutosh Prasad, and Brian T. Ratchford (2010), “Consumer Spending Patterns across Firms and Categories: Application to the Size and Share of Wallet,” working paper, University of Texas at Dallas, TX.
John, Deborah Roedder, Carol A. Scott, and James R. Bettman (1986), “Sampling Data for Covariation Assessment: The Effect of Prior Beliefs on Search Patterns,” Journal of Consumer Research, 13 (June), 38-47.
Klein, Lisa R. and Gary T. Ford (2003), “Consumer Search for Information in the Digital Age: An Empirical Study of Prepurchase Search for Automobiles,” Journal of Interactive Marketing, 17 (3), 29-49.
Koop, Gary (2003), Bayesian Econometrics. Hoboken, NJ: Wiley.
Moorthy, K. Sridhar, Brian T. Ratchford, and Debabrata Talukdar (1997), “Consumer Information Search Revisited: Theory and Empirical Analysis,” Journal of Consumer Research, 23 (March), 263-77.
Punj, Girish N. and Richard Staelin (1983), “A Model of Consumer Information Search Behavior for New Automobiles,” Journal of Consumer Research, 9 (March), 366-80.
Ransom, Michael R. (1987), “A Comment on Consumer Demand Systems with Binding Non-negativity Constraints,” Journal of Econometrics, 34, 355-59.
Rao, Akshay and Wanda Sieben (1992), “The Effect of Prior Knowledge on Price Acceptability and the Type of Information Examined,” Journal of Consumer Research, 19 (September), 256-70.
Ratchford, Brian T., Myung-Soo Lee, and Debabrata Talukdar (2003), “The Impact of the Internet on Information Search for Automobiles,” Journal of Marketing Research, 40 (May), 193-209.
Ratchford, Brian T., Debabrata Talukdar, and Myung-Soo Lee (2007), “The Impact of the Internet on Consumers’ Use of Information Sources for Automobiles: A Re-Inquiry,” Journal of Consumer Research, 34 (June), 111-19.
Russo, J. Edward and France LeClerc (1994), “An Eye-Fixation Analysis of Choice Processes for Consumer Nondurables,” Journal of Consumer Research, 21 (September), 274-90.
Srinivasan, Narasimhan and Brian T. Ratchford (1991), “An Empirical Test of a Model of External Search for Automobiles,” Journal of Consumer Research, 18 (2), 233-42.
Viswanathan, Siva, Jason Kuruzovich, Sanjay Gosain, and Ritu Agarwal (2007), “Online Infomediaries and Price Discrimination: Evidence from the Automotive Retailing Sector,” Journal of Marketing, 71 (July), 89-107.
Yang, Sha, Vishal Narayan, and Henry Assael (2006), “Estimating the Interdependence of Television Program Viewership Between Spouses: A Bayesian Simultaneous Equation Model,” Marketing Science, 25 (4), 336-49.
Zettelmeyer, Florian, Fiona Scott Morton, and Jorge Silva-Risso (2006), “How the Internet Lowers Prices: Evidence from Matched Survey and Automobile Transaction Data,” Journal of Marketing Research, 43 (May), 168-81.
CHAPTER 3
HOW CONSUMERS USE PRODUCT REVIEWS
IN THE PURCHASE DECISION PROCESS
Sungha Jang
School of Management, Department of Marketing, SM32
The University of Texas at Dallas
800 West Campbell Road
Richardson, Texas 75080-3021
ABSTRACT
Several studies have found a positive effect of product reviews on sales at the aggregate level.
This paper, however, uses individual level data to examine the influence of product reviews in
different stages of the consumer’s purchase decision process. Specifically, a two-stage model
consisting of consideration set formation and choice is posited, where information from product
reviews can be incorporated at each stage. The model is estimated using an online panel study
about hotel choice. We find that: (1) Consumers use product reviews more in the consideration
set stage and less in the choice stage; (2) Bayesian updating of prior perceived quality explains
better how consumers use product reviews compared to two competing updating methods; (3)
The monetary value of a unit increase in the mean of product reviews can be computed – in the
case of the hotel study we find that it is equivalent to a price decrease of $57. Our results suggest
that managers should make product reviews available from the beginning of the search process,
show all components of product reviews (i.e., mean, number, and variance), and focus on
satisfying customers and encouraging them to write reviews.
Keywords: Product reviews; Bayesian updating; Consideration sets; Multivariate Probit; Choice
models; Bayesian estimation.
INTRODUCTION
Consumers frequently rely on the opinion of other consumers, such as product experts,
acquaintances, or online users, before they make their purchase decisions. The easy availability
of online product reviews has facilitated this behavior. Reflecting the surge in product reviews
usage, a number of recent papers have investigated the effects of product reviews on aggregate
sales. In general, more favorable product reviews are found to lead to higher sales (e.g.,
Chevalier and Mayzlin 2006). However, at the individual level, the use of product reviews in
different stages of the purchase decision process is relatively unexplored. The two stages that we
consider are the consideration set stage, where products are selected for further evaluation, and
the choice stage, where a final product is chosen from the consideration set. The motivation of
this paper is thus to examine in what stage, and how, consumers use product reviews in the
purchase decision process.
For a clearer explanation of the type of information that product reviews provide and how
consumers use product reviews, consider the following scenario:
M is planning a first trip to Cancun. Online information leads to short-listing the Marriott
hotel, because M has had good experiences with the Marriott brand, and the Fiesta
Americana hotel, because it is highly rated. Furthermore, after considering that the Marriott
has 30 online product reviews, averaging 3.5/5, and the Fiesta Americana has 270 reviews,
averaging 4.7/5, M still chooses the Marriott because prior experience carries
more weight.
From this, we see that an individual consumer’s use of product reviews can be a
balancing act. Even granting the finding in the literature that favorable product reviews
positively affect aggregate sales, individual consumers need not always select the
highest rated product.
To explore how product reviews influence the purchase decision process, we organize the
study into three research questions:
(1) Are product reviews used in the consideration set stage, the choice stage, or both stages,
and to what extent?
(2) How is information from product reviews incorporated with prior experience or prior
perceived quality in the different stages?
(3) What is the value of each component of product reviews (i.e., mean, number, and
variance) expressed ideally in monetary terms?
A possible hypothesis is that consumers use prior perceived quality in the consideration
set stage and use product reviews in the choice stage by updating their prior perceived quality.
The rationale for this would be that in the consideration set stage, consumers are thought to apply
simple criteria that narrow all alternatives down to a subset while minimizing search effort
(e.g., Gilbride and Allenby 2004). Therefore, consumers are likely to use their current knowledge
about quality, called prior perceived quality. In contrast, in the choice stage, consumers carefully
consider detailed information and incorporate other people’s opinions on quality, i.e., product
reviews, with their prior perceived quality. As a result, they get an updated knowledge of quality,
called posterior perceived quality. The updating method could be Bayesian or something else.
Finally, posterior perceived quality affects the choice.
We empirically test the above hypothesis and several competing specifications in the
context of making a hotel choice online. Several previous studies have also looked at hotel
choice and this is likely to be a high involvement task with extended problem solving (e.g.,
Vermeulen and Seegers 2009).
Our findings contribute to the current theory on the effect of product reviews in three
ways. First, we find that product reviews have separate effects in different stages of the purchase
decision process. The results show that product reviews update prior perceived quality in the
consideration set stage, which affects inclusion into the consideration set. Therefore, product
reviews, assuming that they are positive, should be made available from the beginning of the
purchase process.
Second, we find that consumers integrate product reviews information in a manner
consistent with Bayesian updating. To see this, we compared the performance of Bayesian
updating with two other updating heuristics. In Bayesian updating, consumers combine their
prior perceived quality with product reviews data, including mean, number, and variance,
resulting in posterior perceived quality. In the heuristic updating methods, consumers either
replace their prior perceived quality by product reviews, or use the average of prior perceived
quality and product reviews.
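The three updating rules compared above can be sketched as follows. The Bayesian rule is written here as a standard normal-normal conjugate update, which is an assumed functional form for illustration and not necessarily the exact specification estimated in this paper; the function names and input values are hypothetical.

```python
def bayesian_update(prior_mean, prior_var, review_mean, review_var, n_reviews):
    """Normal-normal conjugate update of perceived quality (assumed functional form).

    Requires prior_var > 0 and review_var > 0.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n_reviews / review_var  # more reviews, lower variance: more weight
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * review_mean)
    return post_mean, post_var

def replace_update(prior_mean, review_mean):
    """Heuristic 1: discard the prior and adopt the review mean."""
    return review_mean

def average_update(prior_mean, review_mean):
    """Heuristic 2: simple average of prior perceived quality and review mean."""
    return 0.5 * (prior_mean + review_mean)

# Hypothetical inputs on a 1-5 quality scale.
post_mean, post_var = bayesian_update(3.0, 1.0, 5.0, 1.0, 1)
```

Under this form, the posterior moves toward the review mean as the number of reviews grows or their variance shrinks, whereas the two heuristics ignore the number and variance of reviews altogether.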
Third, we obtain the monetary value of each component of the product reviews. In the
consideration set stage, we examine how large an increase in the mean, number, or variance of
product reviews has the same effect on the utility level and consideration set composition
as a given price decrease. Specifically, we find that a unit increase in the mean
consumer review is worth $57, a unit increase in the number of reviews is not worth much, and a
unit increase in the variance of product reviews has a value that depends on the difference
between the mean of product reviews and prior perceived quality.
The paper proceeds as follows: In the next section, we provide a review of the literature.
Then, we describe the two-stage choice model and the estimation method. In addition, we
explain how consumers update prior perceived quality. After this, we describe the survey and
data followed by the results. Finally, we give the conclusions and directions for future research.
LITERATURE REVIEW
Our paper is based on several research streams that deal with the effect of product reviews on
sales, consideration set formation and choice models, and Bayesian updating.
Several papers have examined the effects of product reviews on sales for various product
categories. For example, Chevalier and Mayzlin (2006) find that the number and mean of
product reviews are positively related to online book sales. Clemons, Gao, and Hitt (2006) find
that mean and variance of beer brand review ratings are positively related to the beer brand’s
sales growth rate. Liu (2006) studies the dynamics of word-of-mouth and the box office revenue
for movies and finds that current period word-of-mouth affects next period box office revenue
but also that current period box office revenue affects next period word-of-mouth. Also studying
the movie industry, Sun (2009) finds that the variance of product reviews has an influence on
consumer decisions. Specifically, low review scores with large variance are less negatively
interpreted by consumers because consumers might assume a mismatch occurred between the
unhappy consumers and the product.
Though they show that product reviews affect sales, current studies focus on the
relationship at the aggregate level and do not explain how product reviews affect the purchase
decision process of individual consumers. This study focuses on the latter point. For example,
consumers may already hold a prior perceived quality about the product and use product reviews
to update it, or even completely replace it with the product reviews. Furthermore, product
reviews might affect different stages of the purchase decision process differently. We discuss this
next.
A second relevant research stream deals with consideration set formation and choice.
There is clear evidence that consumers form consideration sets as part of the decision making
process. For a review of this literature, see Roberts and Lattin (1997). The rationale for forming a
consideration set is that consumers do not find it cost-effective to process information on all the
brands available. That is, the consideration set stage is less effortful while the choice stage is
more comprehensive (Gilbride and Allenby 2004). Empirical work shows that choice can be
predicted more accurately by a two-stage process involving consideration set formation rather
than a one-stage process (Gensch 1987).
In the different stages of the two-stage process, consumers may have different
information about product attributes or apply different weights on the same product attribute. For
example, Andrews and Srinivasan (1995) find that the effect of price is negative in the
consideration set stage but can be positive in the choice stage. This might occur because
consumers select only affordable products in the consideration set stage, while in the choice
stage, higher price is associated with higher quality. Similarly, Allenby and Ginter (1995) find
that in-store displays and features influence consideration set formation whereas merchandising
support information affects choice. Consumers might also have or use different information
about product attributes in the two stages. It may be that consumers use only prior perceived
quality to construct the consideration set from a large number of products. In the choice stage,
however, consumers may search product reviews and incorporate information from different
sources. Our model allows for all these possibilities.
A third related research stream concerns how consumers combine current knowledge
(i.e., prior perceived quality) with new information (i.e., product reviews). Both before and after
Erdem and Keane (1996), who model consumer learning about brand attributes and the updating
of consumer uncertainty over time, several papers have modeled the consumer learning process using
Bayesian updating. For example, Mehta, Rajiv, and Srinivasan (2003) examine how consumers
update their prior perceived quality from initial transactions while buying and experiencing
products. They draw consumers’ posterior perceived quality and use it to explain consumers’
consideration set and choice.
Our approach is similar to Mehta et al. (2003). We apply Bayesian updating of prior
perceived quality by product reviews. This contributes in two ways to the existing literature on
product reviews. First, we take into account prior knowledge as an information source, which
extant approaches do not. Second, the Bayesian updating can use the mean, number, and
variance of the product reviews and examine the values of those components of product reviews,
whereas the extant approach does not use all of these components at the same time. It is not,
however, necessary that consumers use Bayesian updating at all, and they may use a simpler
method. We examine if they exclude prior perceived quality and use only the product reviews, or
if they use the average of prior perceived quality and product reviews. We evaluate which of
these methods is most consistent with the outcomes.
In summary, this study contributes to the current research on product reviews in three
ways. First, we extend the understanding of the effects of product reviews in the decision process
by examining at which stage, consideration set formation or choice, product reviews
affect consumers’ decisions. Second, we extend the understanding of how consumers combine product
reviews with prior perceived quality by comparing different updating methods. Third, we evaluate the
monetary values of the components of product reviews and rank their importance.
MODEL AND ESTIMATION
In this section, we develop the two-stage model of consideration set formation and choice. At
each stage, consumers have utilities from perceived quality, price, and other product
characteristics. We use four types of perceived quality (viz. prior perceived quality, product
reviews, average of prior perceived quality and product reviews, and Bayesian updating
perceived quality). Listed in order of complexity, these are defined as follows:
- Prior perceived quality is consumers’ perceived quality before they look at (or if they look
at but ignore) product reviews. We can measure the mean and variance of prior perceived
quality by directly asking the respondents.
- Product reviews are equal to the perceived quality if consumers completely adopt other
consumers’ evaluations about product quality.
- Average of prior perceived quality and product reviews can be calculated. This is the
perceived quality if consumers use this simple method to update prior perceived quality
with product reviews.
- Bayesian updating perceived quality is prior perceived quality updated with product
reviews in a Bayesian manner. We calculate Bayesian updating perceived quality using
Bayes’ rule.
We examine which type of perceived quality consumers use at each stage by comparing
the fit of models that specify different types of perceived quality in the two stages. Note that
a priori there is no certainty that any of these types will provide a better fit than another,
given that they are not nested. Table 3.1 lists the models we test and briefly describes their
characteristics.
Table 3.1. Competing Specifications

Model      Perceived Quality in                 Perceived Quality in
           Consideration Set Stage ($q^C$)      Choice Stage ($q^F$)
Model 1    Prior                                Prior
Model 2    Prior                                Reviews
Model 3    Prior                                Average of Prior and Reviews
Model 4    Prior                                Bayesian updating
Model 5    Reviews                              Reviews
Model 6    Reviews                              Average of Prior and Reviews
Model 7    Reviews                              Bayesian updating
Model 8    Average of Prior and Reviews         Average of Prior and Reviews
Model 9    Average of Prior and Reviews         Bayesian updating
Model 10   Bayesian updating                    Bayesian updating

Characteristics:
- Models 1 to 4: Consumers use only prior perceived quality in the consideration set stage. They may or may not update it after looking at product reviews.
- Models 5 to 7: Consumers use product reviews in the consideration set stage and may or may not update it.
- Models 8 and 9: Consumers incorporate prior perceived quality and reviews.
- Model 10: Consumers update in the Bayesian manner.
For example, in Model 1, consumers use prior perceived quality in both stages. In Model
2, consumers use prior perceived quality in the consideration set stage but use product reviews in
the choice stage. Model 4, which indicates that consumers use prior perceived quality in the
consideration set stage and use Bayesian updating perceived quality in the choice stage, has
some theoretical support (e.g., Gilbride and Allenby 2004). The rationale is that consumers do
not spend much effort and time on looking at detailed product reviews of all products to form a
consideration set, but that they would go through detailed product reviews in the choice stage by
looking at not only the mean but also the number and variance of product reviews.
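The ten specifications in Table 3.1 appear to be every pairing in which the choice-stage measure is at least as complex as the consideration-stage measure. A minimal Python sketch of that enumeration follows; the short labels and the names QUALITY_TYPES and MODELS are ours and serve only as illustration.

```python
from itertools import combinations_with_replacement

# Perceived-quality types, listed in the order of complexity used in the text.
QUALITY_TYPES = ["Prior", "Reviews", "Average", "Bayesian"]

# Each pair is (consideration-stage type, choice-stage type); the choice-stage
# type is never simpler than the consideration-stage type.
MODELS = {
    f"Model {k}": pair
    for k, pair in enumerate(combinations_with_replacement(QUALITY_TYPES, 2), start=1)
}

for name, (q_c, q_f) in MODELS.items():
    print(f"{name}: consideration={q_c}, choice={q_f}")
```

Under this reading, Model 4 is the pair (Prior, Bayesian) and Model 10 is the pair (Bayesian, Bayesian).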
Consideration set stage
In the consideration set stage, consumers evaluate the utility of each product for inclusion
in the consideration set. The utility of each product is given by a multi-attribute model (e.g.,
Andrews and Srinivasan 1995). Thus, for individual i, the utility in the consideration set stage for
product j, where $j = 1, \ldots, J$, is denoted $z^*_{ij}$ and expressed as follows:
(1) $z^*_{ij} = \alpha_{0j} + \alpha_1 q^C_{ij} + \alpha_2 p_{ij} + \alpha_3 x_{ij} + \varepsilon_{ij}$,
where $q^C_{ij}$ is the perceived quality of product j in the consideration set stage, $p_{ij}$ is the ratio of the
price of product j to consumer i’s willingness to pay, and $x_{ij}$ is a vector of other product
attributes. The error term $\varepsilon_{ij}$ is assumed to be normally distributed, $\varepsilon_{ij} \sim N(0, \Sigma_{zz})$. Finally,
$\alpha_{0j}$ is a product-specific intercept for product j and $(\alpha_1, \alpha_2, \alpha_3)$ is the vector of coefficients of the covariates.
Note that we look at perceived quality, in whichever of its four types is used, as capturing
elements of the expected hotel utility other than star ratings and price. This seems reasonable because
when both a 5-star and a 3-star hotel receive a 3/5 rating, one would still suppose that the 5-star
hotel provides higher overall utility.
We assume that consumer i includes product j in the consideration set if $z^*_{ij} > 0$ and
excludes it if $z^*_{ij} \le 0$. Therefore, the relationship between the consideration set utility $z^*_{ij}$ and the
observed decision $z_{ij}$ of whether consumer i includes product j is given by
$$z_{ij} = \begin{cases} 1 & \text{if } z^*_{ij} > 0, \\ 0 & \text{if } z^*_{ij} \le 0, \end{cases}$$
where $z_{ij} = 1$ if consumer i includes product j and $z_{ij} = 0$ otherwise.
Consumer i’s consideration set $C_i$ is thus the vector $(z_{i1}, z_{i2}, \ldots, z_{iJ})$ and it is related to the
vector of utilities $Z^*_i = (z^*_{i1}, z^*_{i2}, \ldots, z^*_{iJ})$. The distribution of $Z^*_i$, following Edwards and Allenby
(2003), is
(2) $Z^*_i \sim MVN(X^C_i \alpha, \Sigma_{zz})$,
where $X^C_i$ is the matrix of product attributes across J products in the consideration set $C_i$,
$\alpha = (\alpha_{01}, \ldots, \alpha_{0J}, \alpha_1, \alpha_2, \alpha_3)$ is the vector of coefficients and $\Sigma_{zz}$ is the variance-covariance
matrix of error terms. We estimate the parameters using a multivariate Probit model because
consumers decide for each of the J alternatives whether it should be included or not.
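The inclusion rule of the consideration set stage can be sketched in Python as follows. This is an illustration only: it draws independent standard normal errors rather than errors with a full variance-covariance matrix, and all numeric inputs, function names, and the two extra attributes (awareness, experience) in the example are hypothetical.

```python
import random

def consideration_utility(intercept, coefs, quality, price_ratio, attrs, error):
    """Latent utility: intercept + a1*quality + a2*price_ratio + a3.attrs + error."""
    a1, a2, a3 = coefs
    return (intercept + a1 * quality + a2 * price_ratio
            + sum(a * x for a, x in zip(a3, attrs)) + error)

def consideration_set(utilities):
    """Indicators z_ij: 1 if the latent utility is positive, else 0."""
    return [1 if z > 0 else 0 for z in utilities]

# Hypothetical inputs for one consumer and three hotels.
rng = random.Random(0)
intercepts = [-0.2, -0.5, -0.9]
coefs = (0.05, -0.10, [0.25, 0.08])   # quality, price ratio, (awareness, experience)
quality = [4.1, 4.7, 2.4]
price_ratio = [0.7, 1.7, 0.4]         # price divided by willingness to pay
attrs = [[1, 0], [1, 1], [0, 0]]

z_star = [
    consideration_utility(b0, coefs, q, p, x, rng.gauss(0.0, 1.0))
    for b0, q, p, x in zip(intercepts, quality, price_ratio, attrs)
]
print(consideration_set(z_star))
```

Each product is evaluated separately, which is why the stage is a multivariate rather than a multinomial Probit.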
Choice stage
After consumer i forms consideration set iC , each product in the set is further evaluated
in the choice stage and the one with highest utility is chosen. As before, we assume that utilities
in the choice stage are multi-attribute functions of the perceived quality, the ratio of price to
willingness to pay, and other product attributes. Though the consumer has information on the
same product attributes, it is possible that the consumer weights them differently in the choice
stage because that is a different task than consideration set formation (e.g., Andrews and
Srinivasan 1995). Therefore, we allow different coefficients on the product attributes. The utility
of individual i from product j is expressed as
(3) $y^*_{ij} = \beta_{0j} + \beta_1 q^F_{ij} + \beta_2 p_{ij} + \beta_3 x_{ij} + \eta_{ij}$,
where $y^*_{ij}$ is the utility relative to the outside option, whose utility is scaled to zero, and other
variables are as before except that a different type of perceived quality $q^F_{ij}$ can be used.
Furthermore, the parameters for product attributes as well as the error structure, $\eta_{ij} \sim N(0, \Sigma_{yy})$,
are different from the consideration set stage.
The choice rule that relates the utility $y^*_{ij}$ to the observed choice $y_{ij}$ depends on whether
product j belongs to consideration set $C_i$ or not:
If $j \in C_i$, then $y_{ij} = 1$ if $\max_k(y^*_{ik}, 0) = y^*_{ij}$, and $y_{ij} = 0$ otherwise; in particular, $y_{ij} = 0$ for every j if $\max_k(y^*_{ik}) < 0$.
If $j \notin C_i$, then $y_{ij} = 0$.
Here k represents products in the consideration set. Thus, $y_{ij} = 1$ indicates that consumer i
included product j in the consideration set $C_i$ in the first stage and then chose it in the choice
stage.
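The choice rule can be sketched in Python as follows. The function names are ours, and the outside option is represented by None, which is an illustrative convention rather than part of the model.

```python
def choose(y_star, in_set):
    """Return the index of the chosen product, or None for the outside option.

    y_star: latent choice utilities for all J products.
    in_set: 0/1 indicators from the consideration set stage.
    """
    candidates = [j for j, z in enumerate(in_set) if z == 1]
    if not candidates:
        return None
    best = max(candidates, key=lambda j: y_star[j])
    # The outside option has utility zero, so a non-positive maximum means no purchase.
    return best if y_star[best] > 0 else None

def choice_indicators(y_star, in_set):
    """Indicators y_ij: 1 for the chosen product, 0 for all others."""
    chosen = choose(y_star, in_set)
    return [1 if j == chosen else 0 for j in range(len(y_star))]

# Product 1 has the highest utility but is not in the consideration set.
print(choice_indicators([0.5, 2.0, 1.0], [1, 0, 1]))
```

Products outside the consideration set can never be chosen, whatever their choice-stage utility.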
We can find the distribution of the vector of utilities $Y^*_i = (y^*_{i1}, y^*_{i2}, \ldots, y^*_{iJ})$ as follows.
(4) $Y^*_i \sim MVN(X^F_i \beta, \Sigma_{yy})$,
where $X^F_i$ is the matrix of product attributes across J products, $\beta = (\beta_{01}, \ldots, \beta_{0J}, \beta_1, \beta_2, \beta_3)$ is a
vector of coefficients and $\Sigma_{yy}$ is the variance-covariance matrix of error terms. We estimate
parameters using a multinomial Probit model because consumers choose a specific product from
the consideration set.
Bayesian updating and alternative heuristics
Consumers can change their prior perceived quality after looking at product reviews.
With Bayesian updating, consumers construct a posterior perceived quality by combining prior
perceived quality and product reviews. We describe the Bayesian updating method below.
Let $q_{ij}$ denote consumer i’s prior perceived quality on product j and assume that it
follows a normal distribution,
(5) $q_{ij} \sim N(W_{0ij}, \sigma^2_{0ij})$.
Suppose there are $n_j$ other consumers who have experienced product j and provided
product reviews $r_{jl}$ $(l = 1, \ldots, n_j)$, and those consumers are believed to be representative buyers
and unbiased. They may have experienced different quality because of different consumption
situations. For example, consumers of the same hotel may have had different employee
interactions, room service or seasons. The consumer experiences with quality are assumed to be
normally distributed around the intrinsic quality $Q_j$ with variance $\sigma^2_{Q_j}$. That is,
$r_{j1}, \ldots, r_{jn_j} \sim N(Q_j, \sigma^2_{Q_j})$.
However, the intrinsic quality $Q_j$ is not known to any consumer. From consumer i’s
perspective, his or her prior perceived quality $q_{ij}$ is an indicator of $Q_j$, meaning that the
consumer thinks that the quality of product j is $q_{ij}$ and that other people received quality
experiences centered at $q_{ij}$. Therefore, before looking at product reviews, the consumer believes
the distribution of product reviews is as follows:
$r_{j1}, \ldots, r_{jn_j} \sim N(q_{ij}, \sigma^2_{q_{ij}})$.
Finally, the consumer updates his or her perceived quality after looking at product
reviews and has a new distribution for the posterior perceived quality. The posterior distribution
of the quality mean given the variance and product reviews is
(6) $q_{ij} \mid \sigma^2_{q_{ij}}, r_{j1}, \ldots, r_{jn_j} \sim N(W_{1ij}, \sigma^2_{1ij})$,
where
$$W_{1ij} = \frac{W_{0ij}/\sigma^2_{0ij} + n_j \bar{r}_j/\sigma^2_{q_{ij}}}{1/\sigma^2_{0ij} + n_j/\sigma^2_{q_{ij}}} \quad \text{and} \quad \sigma^2_{1ij} = \frac{1}{1/\sigma^2_{0ij} + n_j/\sigma^2_{q_{ij}}}.$$
Note that $\bar{r}_j$ and $n_j$ are the
mean and number of product reviews on product j, respectively.
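The update of the quality mean can be illustrated with a short Python sketch of the standard normal-normal conjugate formulas, treating the review variance as known for this step. The function name is ours; the review statistics are those of Fiesta Americana (H6) in Table 3.2, while the prior mean and variance are hypothetical.

```python
def update_quality(prior_mean, prior_var, review_mean, n_reviews, review_var):
    """Posterior mean and variance of perceived quality (normal-normal update).

    review_var is the variance of individual reviews around perceived quality,
    treated as known for this step.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n_reviews / review_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * review_mean)
    return post_mean, post_var

# Review statistics for H6 from Table 3.2; the prior values are hypothetical.
w1, var1 = update_quality(prior_mean=4.0, prior_var=1.0,
                          review_mean=4.90, n_reviews=69, review_var=0.37)
print(round(w1, 3), round(var1, 5))
```

The posterior mean is a precision-weighted average, so many reviews with low variance pull it close to the review mean, while a confident prior (small prior variance) keeps it close to the prior.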
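For the variance of product reviews, $\sigma^2_{q_{ij}}$, we use a vague Gamma prior on the precision $1/\sigma^2_{q_{ij}}$
and obtain a posterior distribution of
(7) $1/\sigma^2_{q_{ij}} \mid q_{ij}, r_{j1}, \ldots, r_{jn_j} \sim \mathrm{Gamma}\left(\frac{n_j}{2}, \frac{n_j s^2_{ij}}{2}\right)$,
where $s^2_{ij} = \frac{1}{n_j - 1} \sum_{l=1}^{n_j} (r_{jl} - q_{ij})^2$ is consumer i’s posterior variance of product reviews on product
j, given his or her posterior perceived quality is $q_{ij}$.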
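A draw of the review variance from its conditional posterior can be sketched in Python as follows. The sketch assumes a shape-rate parameterization with shape $n_j/2$ and rate $n_j s^2_{ij}/2$; this parameterization, the function names, and the example ratings are our assumptions. Python's random.gammavariate takes a shape and a scale, so the rate is inverted.

```python
import random

def review_dispersion(reviews, quality):
    """Dispersion of the reviews around the consumer's perceived quality."""
    n = len(reviews)
    return sum((r - quality) ** 2 for r in reviews) / (n - 1)

def draw_review_variance(reviews, quality, rng):
    """Draw the review variance given perceived quality.

    Assumes the precision is Gamma(shape=n/2, rate=n*s2/2).
    random.gammavariate takes (shape, scale), so scale = 1/rate.
    """
    n = len(reviews)
    s2 = review_dispersion(reviews, quality)
    precision = rng.gammavariate(n / 2.0, 2.0 / (n * s2))
    return 1.0 / precision

# Hypothetical ratings on a 1-5 scale and a hypothetical perceived quality.
rng = random.Random(1)
ratings = [5, 4, 4, 3, 5, 2, 4, 5]
print(draw_review_variance(ratings, quality=4.0, rng=rng))
```

Alternating this draw with the update of the quality mean gives a two-block Gibbs step for perceived quality and review variance.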
In the Bayesian updating method, it should be noted that the parameters of the posterior
perceived quality distribution consist of the parameters of the prior perceived quality distribution
and consumer review components. That is, the distribution of the posterior perceived quality
mean is Normal with mean $W_{1ij}$ and variance $\sigma^2_{1ij}$, where $W_{1ij}$ consists of parameters of prior
perceived quality ($W_{0ij}$ and $\sigma^2_{0ij}$) and product reviews components ($\bar{r}_j$ and $n_j$) and $\sigma^2_{1ij}$ consists
of $\sigma^2_{0ij}$ and $n_j$. Similarly, the inverse of the posterior perceived quality variance, $1/\sigma^2_{q_{ij}}$, follows a gamma
distribution with parameters consisting of prior perceived quality $q_{ij}$ and product reviews
components ($r_{jl}$ and $n_j$). That is, the posterior perceived quality is affected not only by the
characteristics of product reviews ($\bar{r}_j$, $n_j$ and $s^2_j$), but also by the characteristics of a consumer’s
prior perceived quality ($W_{0ij}$ and $\sigma^2_{0ij}$).
As alternatives to Bayesian updating, the heuristic update methods we consider are (1)
product reviews and (2) the average of prior perceived quality and product reviews. As the
format of product reviews in our dataset is a 1-5 scale, which is common, we use multinomial
random draws for product reviews.
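The two heuristics can be sketched in Python as follows. Drawing a rating from the 1-5 review histogram is our reading of the multinomial draw; the function names and the example histogram are hypothetical.

```python
import random

def draw_review(histogram, rng):
    """Draw one rating from the 1-5 review histogram (a multinomial draw).

    histogram: counts of 1-, 2-, 3-, 4-, and 5-star reviews.
    """
    ratings = [1, 2, 3, 4, 5]
    return rng.choices(ratings, weights=histogram, k=1)[0]

def reviews_only(prior_quality, review_draw):
    """Heuristic (1): adopt the review information and ignore the prior."""
    return float(review_draw)

def average_heuristic(prior_quality, review_draw):
    """Heuristic (2): simple average of prior perceived quality and reviews."""
    return (prior_quality + review_draw) / 2.0

# Hypothetical histogram: counts of 1- to 5-star reviews for one hotel.
rng = random.Random(0)
draw = draw_review([2, 3, 10, 30, 55], rng)
print(reviews_only(3.5, draw), average_heuristic(3.5, draw))
```

Unlike Bayesian updating, neither heuristic weights the reviews by their number or variance.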
Note that regardless of the updating method, we assume that consumers update their
prior perceived quality only for those products that they include in the consideration set. For
products excluded from the consideration set, we use the same perceived quality as in the
consideration set stage.
Estimation of the two-stage model
We next discuss the estimation. We assume that the utilities in the consideration set stage
and the choice stage are interrelated and cast the two utility functions into a system of equations
as follows.
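(8) $\begin{pmatrix} Z^*_i \\ Y^*_i \end{pmatrix} = \begin{pmatrix} X^C_i & 0 \\ 0 & X^F_i \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \begin{pmatrix} \varepsilon_i \\ \eta_i \end{pmatrix}$,
where $Z^*_i$ is the vector of utilities from the consideration set stage and $Y^*_i$ is the vector of
utilities from the choice stage. We assume that
$$\begin{pmatrix} \varepsilon_i \\ \eta_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_{zz} & \Sigma_{zy} \\ \Sigma_{zy}' & \Sigma_{yy} \end{pmatrix} \right),$$
where $\Sigma_{zz}$ and $\Sigma_{yy}$ are the variance-covariance matrices of $\varepsilon_i$ and $\eta_i$, respectively, and $\Sigma_{zy}$ is the covariance matrix
between $\varepsilon_i$ and $\eta_i$.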
Thus our two-stage model is given by Equation (8). It allows different parameters for
consideration utility $Z^*_i$ and choice utility $Y^*_i$, which is a flexible representation (Gilbride and
Allenby 2004). Moreover, by looking at $\Sigma_{zy}$, we can see the relationship between consideration set
utility and choice utility, which is relatively unexamined in the literature because it is empirically
difficult to model correlation between the two stages (Nierop, Bronnenberg, Paap, Wedel, and
Franses 2010).
Equation (8) is in the form of a seemingly unrelated regression (SUR) model, and the estimation method for the parameters $\alpha$,
$\beta$, and $\Sigma$ is discussed in several places (e.g., Koop 2003). The difference from a standard SUR
model is that we need to draw $Z^*_i$ and $Y^*_i$ by data augmentation using the observed consideration set and
choice. The full Bayesian MCMC algorithm, including the data augmentation procedure, is given
in the Appendix.
The probability that consumer i chooses product j from consideration set $C_i$ in the choice
stage can be written as
$P(y_i = j) = P(y_i = j \mid C_i)\, P(C_i)$.
Here, $P(C_i)$ is the probability of the observed consideration set $C_i$ and it is calculated as
follows:
$P(C_i) = P(c_{i1}, c_{i2}, \ldots, c_{iJ}) = P(d_{i1} z^*_{i1} > 0, d_{i2} z^*_{i2} > 0, \ldots, d_{iJ} z^*_{iJ} > 0)$,
where $c_{ij} = 1$ if consumer i includes product j in the consideration set and $c_{ij} = 0$ if not, and $d_{ij} = 1$ if
consumer i includes product j in the consideration set and $d_{ij} = -1$ if not. For example, if consumer
i includes products 1 and 2 out of three products {1,2,3}, the probability of consideration set $C_i$ is
calculated as $P(C_i) = P(c_{i1} = 1, c_{i2} = 1, c_{i3} = 0) = P(z^*_{i1} > 0, z^*_{i2} > 0, z^*_{i3} < 0)$.
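The sign-vector construction can be illustrated with a short Python sketch. To keep it exact and self-contained, the sketch assumes independent errors with unit variance, under which the joint probability factors into a product of normal CDF terms; the model itself allows correlated errors, so this is a simplification, and the function names and mean utilities are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sign_vector(c):
    """d_ij = 1 for included products (c_ij = 1) and -1 for excluded ones."""
    return [1 if c_j == 1 else -1 for c_j in c]

def consideration_prob_independent(mean_utils, c):
    """P(d_i1 z*_i1 > 0, ..., d_iJ z*_iJ > 0) for independent unit-variance errors."""
    prob = 1.0
    for mu, d in zip(mean_utils, sign_vector(c)):
        prob *= normal_cdf(d * mu)
    return prob

# Three products; the consumer includes products 1 and 2 and excludes product 3.
print(consideration_prob_independent([0.3, 0.1, -0.8], [1, 1, 0]))
```

With correlated errors the product form no longer holds and the orthant probability has to be simulated.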
Furthermore, $P(y_i = j \mid C_i)$ is the probability that consumer i finally chooses product j
given the consideration set $C_i$ and it is calculated as follows:
$P(y_i = j \mid C_i) = P(y^*_{ij} > y^*_{ik} \ \forall k \ne j \mid j, k \in C_i)$,
where $y_i$ is consumer i’s choice among J products. We calculate this probability using the GHK
estimator (Keane 1994; Hajivassiliou et al. 1996).
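To make the conditional choice probability concrete, the following Python sketch uses a crude frequency simulator in place of the GHK estimator: it draws utilities, records which product in the consideration set has the highest positive utility, and reports the observed frequencies. It assumes independent standard normal errors, and the function name and inputs are hypothetical. It is far less efficient than GHK and is meant only to show what is being estimated.

```python
import random

def simulate_choice_probs(mean_utils, in_set, n_draws=20000, seed=0):
    """Frequency simulator for the conditional choice probabilities.

    Returns a dict mapping each product index in the consideration set
    (and None for the outside option) to its simulated probability.
    """
    rng = random.Random(seed)
    members = [j for j, z in enumerate(in_set) if z == 1]
    if not members:
        return {None: 1.0}
    counts = {j: 0 for j in members}
    counts[None] = 0
    for _ in range(n_draws):
        draws = {j: mean_utils[j] + rng.gauss(0.0, 1.0) for j in members}
        best = max(draws, key=draws.get)
        # The outside option (utility zero) wins when no drawn utility is positive.
        counts[best if draws[best] > 0 else None] += 1
    return {j: c / n_draws for j, c in counts.items()}

# Product 2 has the highest mean utility but is outside the consideration set.
print(simulate_choice_probs([1.0, 0.0, 2.0], [1, 1, 0]))
```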
Equation (8) involves a multivariate Probit model for consideration set and a multinomial
Probit model for the choice. As these are discrete choice models, we cannot identify all the
parameters in Equation (8). Following Edwards and Allenby (2003), we navigate in the
unidentified parameter space in estimation but report parameters which are divided by the
corresponding variances.
SURVEY AND DATA
To collect data to estimate the model, we conducted an online survey about hotel choice. To
make the task realistic, we used a specific tourist destination, Cancun, Mexico. We
constructed the survey website based on real hotel names, star ratings, product reviews and
prices taken from an online travel site, Travelocity.com. The survey page allowed respondents to
access product reviews in both the consideration set stage and the choice stage if they wished. We
presented respondents with search results similar to what they would see on online sites.
Respondents could read the hotel descriptions and get more information about hotel amenities by
clicking on the hotel name. However, compared to online sites such as Travelocity.com and
Hotels.com, we simplified the presentation in three ways. First, we picked a subset of 10 hotels
whose star ratings and product reviews were available. These are shown in Table 3.2 and
augmented by the last two columns which show, from the survey results, the percentage of
respondents who included them in their consideration sets and final choice decisions.
Second, we used the numerical ratings but not the textual descriptions of product reviews
because it is not easy to quantify these descriptions. Third, we presented respondents with the
number of reviews, the mean, and a histogram, as shown in Figure 3.1. In comparison,
Travelocity.com does not provide the histogram, but other websites such as Amazon.com do, and
we include it so that respondents have information about variance in reviews.
Table 3.2. Hotel Information

                                                Product reviews
Hotel                         Stars  Price ($)  Mean   Number  Variance  Consideration  Choice
H1. Royal Solaris Cancun       3.5     152      3.52    175      1.24       29.3%         7.0%
H2. Dreams Cancun Resort       4.5     229      4.70    308      0.57       32.0%         7.8%
H3. GR Solaris Cancun          4.0     173      3.54    209      1.20       23.0%         2.8%
H4. InterContinental           4.0      93      4.10     27      1.15       46.6%        17.4%
H5. Riu Palace Las Americas    4.0     198      4.62    179      0.60       31.1%         6.3%
H6. Fiesta Americana           5.0     168      4.90     69      0.37       37.6%        15.2%
H7. JW Marriott Cancun         5.0     139      3.80     29      2.33       50.6%        23.1%
H8. Hotel Sotavento            2.5      51      2.35     11      1.67       18.4%         1.9%
H9. Imperial Las Perlas        2.0      32      3.02     11      2.08       19.7%         5.9%
H10. Holiday Inn Express       3.0      60      2.45     12      1.42       36.7%        12.7%
Figure 3.1. Survey Screen on Some Hotel Information
Survey Procedure
First, we described the purpose of the survey and asked respondents about their travel
experience. Second, we measured respondents’ prior knowledge about each hotel brand by
asking questions on awareness, stay experience, and perceived quality before they looked at
product reviews. Third, we presented hotel descriptions and product reviews and asked
respondents to shortlist hotels for further consideration if they wished (i.e., the consideration set).
Fourth, we presented only hotels that respondents chose in the previous stage and asked them to
choose one hotel for their stay. Finally, we asked some demographic questions and how
respondents used product reviews in the consideration set and the choice stage. The survey
procedure is summarized in Figure 3.2.
Figure 3.2. Survey Procedure
We measured prior perceived quality with respect to mean and variance. We asked
respondents about the perceived quality of hotels on a scale of 1.0 to 5.0 with increments of 0.5.
This perceived quality is the mean of prior perceived quality. For the variance, we asked respondents
to rate their confidence in the quality evaluation on a 1-to-10 scale, as its inverse is related to the variance.
Descriptive Statistics
We recruited 771 respondents from an online survey company, TRCHOME.com. We
dropped 75 respondents who completed the survey too rapidly, indicating that they were
uninvolved in the task. Demographics are as follows: The average age was 46.5 years (s.d.=12.4)
and 73% were female. Based on the zip codes, the respondents were spread across the US.
Respondents were familiar with online shopping (4.48 out of 5, s.d.=0.9) and online hotel
booking (3.71 out of 5, s.d.=1.4). 17% of them had stayed in Cancun for several days. Their
usual budget for hotels was $125 per night on average.
Respondents formed consideration sets with an average of 3.2 hotels. In general, hotels in
the consideration set were globally recognized hotel brands. That is, the percentages of respondents
including Marriott, InterContinental, and Holiday Inn were 50.6%, 46.6%, and 36.7%, respectively. In the
choice stage, the top three hotels were the Marriott (23.1%), InterContinental (17.4%), and Fiesta
Americana (15.2%).
RESULTS
Model Comparison
We tested the ten models given in Table 3.1 that use the four types of perceived quality in
the different stages. For example, Model 4 hypothesizes that consumers use prior perceived
quality in the consideration set stage and posterior perceived quality updated in the Bayesian
manner in the choice stage. Since there is some support for this, we refer to it as the proposed
model. We compared the different models by their log marginal likelihoods, computed with the importance
sampling method of Newton and Raftery (1994). Note that all models have the same number of
parameters. Comparing the models, we can address: (1) In what stage consumers use product
reviews, and (2) how consumers use product reviews – i.e., ignore them, completely adopt them
or incorporate them with prior perceived quality in the Bayesian manner or heuristic manner.
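As an illustration of how a log marginal likelihood can be obtained from MCMC output, the sketch below implements the basic harmonic mean estimator associated with Newton and Raftery (1994). Whether the basic or a stabilized variant was used here is not stated, so this choice, the function name, and the example values are assumptions.

```python
import math

def log_marginal_likelihood(log_likelihoods):
    """Harmonic mean estimate of the log marginal likelihood.

    log_likelihoods: log-likelihood of the data at each posterior draw.
    Computes log S - logsumexp(-loglik_s) in a numerically stable way.
    """
    neg = [-ll for ll in log_likelihoods]
    m = max(neg)
    log_sum = m + math.log(sum(math.exp(v - m) for v in neg))
    return math.log(len(neg)) - log_sum

# Hypothetical log-likelihood values at three posterior draws.
print(log_marginal_likelihood([-3320.0, -3325.0, -3330.0]))
```

The estimate is dominated by the draws with the lowest likelihood, which is a known source of instability for this estimator.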
Table 3.3 shows the log marginal likelihoods of unconditional choice $P(y_i = j)$,
consideration set $P(C_i)$, and conditional choice $P(y_i = j \mid C_i)$ for each model. The numbers in
parentheses are the ranks of the likelihoods, where smaller rank means a better model.
Table 3.3. Log Marginal Likelihood of Models

           Perceived Quality in      Perceived Quality in      Log Marginal Likelihood
Model      Consideration Set         Choice Stage ($q^F$)      $P(y_i = j)$    $P(C_i)$       $P(y_i = j \mid C_i)$
           Stage ($q^C$)
Model 1    Prior                     Prior                     -3391.0 (6)     -2700.9 (7)    -761.5 (10)
Model 2    Prior                     Reviews                   -3390.2 (5)     -2647.5 (3)    -752.1 (6)
Model 3    Prior                     Average*                  -3389.0 (4)     -2689.3 (5)    -753.6 (8)
Model 4    Prior                     Bayesian                  -3385.4 (3)     -2697.8 (6)    -752.6 (7)
Model 5    Reviews                   Reviews                   -3392.2 (7)     -2689.2 (4)    -741.3 (2)
Model 6    Reviews                   Average                   -3381.2 (2)     -2636.0 (2)    -745.0 (3)
Model 7    Reviews                   Bayesian                  -3456.4 (10)    -2750.6 (10)   -748.7 (5)
Model 8    Average                   Average                   -3405.1 (8)     -2721.8 (9)    -747.2 (4)
Model 9    Average                   Bayesian                  -3416.8 (9)     -2715.0 (8)    -734.7 (1)
Model 10   Bayesian                  Bayesian                  -3323.0 (1)     -2592.2 (1)    -759.2 (9)
*: Average represents average of prior perceived quality and product reviews.

The comparison of log marginal likelihoods of unconditional choice reveals that Model
10, in which consumers are specified as using Bayesian updating of perceived quality in both
stages, is the best fitting model (log marginal likelihood is -3323.0). This means that consumers
use product reviews in the consideration set stage and update prior perceived quality with
product reviews in the Bayesian manner. Note that the Bayes factors between the best model
(Model 10) and the other models are large enough to support Model 10 as the true model. When
we separately look at the log marginal likelihoods of consideration set and conditional choice in
Model 10, we see that Model 10 is the best fitting model for the consideration set stage, but
it ranks only ninth for conditional choice. This reveals that the best fit for the consideration set
leads to the best model in the two-stage decision process.
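The size of the Bayes factors can be read directly from the unconditional-choice column of Table 3.3, since the log Bayes factor between two models is the difference of their log marginal likelihoods. The short Python sketch below carries out this arithmetic; the variable names are ours.

```python
# Log marginal likelihoods of unconditional choice, copied from Table 3.3.
LOG_ML = {
    "Model 1": -3391.0, "Model 2": -3390.2, "Model 3": -3389.0, "Model 4": -3385.4,
    "Model 5": -3392.2, "Model 6": -3381.2, "Model 7": -3456.4, "Model 8": -3405.1,
    "Model 9": -3416.8, "Model 10": -3323.0,
}

best = max(LOG_ML, key=LOG_ML.get)

# Log Bayes factor of the best model against each rival.
log_bf = {m: LOG_ML[best] - v for m, v in LOG_ML.items() if m != best}

# Rank 1 is the highest log marginal likelihood.
ranks = {m: r for r, m in
         enumerate(sorted(LOG_ML, key=LOG_ML.get, reverse=True), start=1)}

print(best, round(log_bf["Model 6"], 1), ranks["Model 4"])
```

Even against the second-best model (Model 6), the log Bayes factor in favor of Model 10 is about 58.2, far above conventional thresholds for strong evidence.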
The second best model is Model 6, in which consumers use product reviews in the
consideration set stage and the average of prior perceived quality and product reviews in the
choice stage. This result shows that even product reviews themselves and the simple heuristic
have some utility towards describing the consumer decision process.
Among models, Model 4 was in accordance with the theory that the consideration set
stage is less effortful while the choice stage is more comprehensive (Gilbride and Allenby 2004).
However, our findings show that Model 4 is the third best model. While not the best, this
relatively high rank indicates that this model performs reasonably. Yet, the evidence outlined in
Table 3.3 does indicate that Bayesian updating is employed at the consideration stage, indicating
that respondents exerted the effort needed to integrate the information available at the
consideration stage with their prior beliefs.
In conclusion, product reviews turn out to be a critical factor in the decision process in
consideration set formation and all components of product reviews are used in the Bayesian
manner. Therefore, it is useful for firms to proactively manage product reviews such as by
displaying them saliently, or encouraging users to provide reviews, or having complaint redressal
mechanisms for unhappy consumers rather than having them air grievances online.
Parameter Estimation
Table 3.4 shows the parameter estimates from the consideration set stage and choice
stage of Model 10 whose marginal likelihood is the highest. In this model, Bayesian updating
perceived quality is employed as the measure of product quality. In the consideration set stage,
the hotel-specific intercept is closely associated with the possibility of the hotel being included in
the consideration set. For example, the intercept of JW Marriott has the highest value and it ranks
at the top in the consideration percentage. In contrast, the intercepts of Hotel Sotavento and
Imperial Las Perlas, which rank at the bottom, have quite negative values of about -0.9 each. Thus,
the hotel-specific intercepts represent consumers’ tendency to consider hotels based on
intrinsic hotel characteristics other than product attributes such as quality and price.
Table 3.4. Parameter Estimates of Model 10

                                       Consideration Set Model         Choice Model
Variables                              Mean      2.5%     97.5%        Mean      2.5%     97.5%
H1. Royal Solaris                     -0.576*   -0.856   -0.376       -0.490    -1.394    0.134
H2. Dreams Resort                     -0.510*   -1.026   -0.175       -0.767    -2.360    0.218
H3. GR Solaris                        -0.740*   -0.971   -0.570       -1.054    -2.165    0.212
H4. InterContinental                  -0.220*   -0.559   -0.008        0.080    -0.443    0.661
H5. Riu Palace Las Americas           -0.549*   -1.007   -0.274       -0.154    -1.138    0.755
H6. Fiesta Americana                  -0.401*   -0.728   -0.143        0.938*    0.280    1.644
H7. JW Marriott                       -0.161    -0.427    0.090        0.278    -0.536    0.926
H8. Hotel Sotavento                   -0.944*   -1.121   -0.764       -1.157    -3.314    0.153
H9. Imperial Las Perlas               -0.915*   -1.058   -0.773       -0.826*   -1.814   -0.076
H10. Holiday Inn                      -0.478*   -0.636   -0.338       -1.184*   -2.357   -0.481
Bayesian Updating Perceived Quality    0.054*    0.005    0.115        0.019    -0.045    0.089
Price to WTP                          -0.104*   -0.189   -0.048       -0.223*   -0.338   -0.147
Awareness                              0.252*    0.138    0.421        0.423*    0.180    0.676
Experience                             0.075    -0.058    0.210        0.038    -0.227    0.289
Parameters marked with * are significant at the 95% level.
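Significance in Table 3.4 can be checked from the reported 2.5% and 97.5% bounds, reading a parameter as significant at the 95% level when this interval excludes zero. This reading is our assumption about how significance was determined. The Python sketch below applies it to the choice model columns.

```python
# (2.5%, 97.5%) bounds for the choice model, copied from Table 3.4.
CHOICE_INTERVALS = {
    "H1. Royal Solaris": (-1.394, 0.134),
    "H2. Dreams Resort": (-2.360, 0.218),
    "H3. GR Solaris": (-2.165, 0.212),
    "H4. InterContinental": (-0.443, 0.661),
    "H5. Riu Palace Las Americas": (-1.138, 0.755),
    "H6. Fiesta Americana": (0.280, 1.644),
    "H7. JW Marriott": (-0.536, 0.926),
    "H8. Hotel Sotavento": (-3.314, 0.153),
    "H9. Imperial Las Perlas": (-1.814, -0.076),
    "H10. Holiday Inn": (-2.357, -0.481),
    "Bayesian Updating Perceived Quality": (-0.045, 0.089),
    "Price to WTP": (-0.338, -0.147),
    "Awareness": (0.180, 0.676),
    "Experience": (-0.227, 0.289),
}

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

significant = [name for name, (lo, hi) in CHOICE_INTERVALS.items()
               if excludes_zero(lo, hi)]
print(significant)
```

In the choice model this flags three hotel intercepts (Fiesta Americana, Imperial Las Perlas, Holiday Inn) together with the price ratio and awareness.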
The product attributes also affect consideration set membership. The coefficient of
Bayesian updating perceived quality is positive (0.054). Therefore, hotels with high
Bayesian updating perceived quality are more likely to be included in the consideration set. The
negative coefficient of the ratio of price to willingness to pay (-0.104) shows that hotels
with a higher ratio of price to willingness to pay are less likely to be included. In addition,
as the coefficient of hotel brand awareness is positive (0.252), it is more likely that well-known
hotels are included in the consideration set. However, the coefficient of hotel experience
at other places is not significant, possibly because most of the respondents had not stayed at
hotels in the survey. In the case of Holiday Inn, around 50% of respondents have stayed at one of its
chain hotels, but it seems that Holiday Inn, a relatively low quality hotel, is less attractive as a
resort hotel to respondents.
In the choice stage, the hotel-specific intercepts are significant only for some hotels. The
positive coefficient of Fiesta Americana (0.938) means that if this hotel is included in the
consideration set, it is likely to be finally chosen. In contrast, the negative coefficients of
Imperial Las Perlas and Holiday Inn (-0.826 and -1.184, respectively) mean that even if
they are included in the consideration set, those hotels are much less likely to be finally chosen.
Other than those three hotels, there are no hotel-specific effects in the choice stage. This could be
because consumers have already considered the hotel-specific effects at the consideration set
stage.
The effects of other product attributes are different in the two stages. In the choice stage,
unlike in the consideration set stage, the Bayesian updating perceived quality is not significant.
That is, after consumers consider alternatives with respect to quality in the consideration set
stage, they do not consider quality any more. Possibly, they consider hotels with similar quality
levels. The coefficients of the ratio of price to willingness to pay and awareness have the same
signs as those in the consideration set stage (-0.223 and 0.423, respectively). Thus, among
hotels in the consideration set, hotels with a higher ratio of price to willingness to pay or less
well-known hotels are less likely to be chosen. Finally, the experience variable is not significant in the
choice stage either.
In summary, hotel-specific characteristics affect both consideration set and choice. The
significance of Bayesian updating perceived quality means that product reviews play an
important role in the consideration set stage. But it is notable that consumers do not consider
quality again in the choice stage. Overall, a lower price and greater awareness increase the probability of
being included in the consideration set and of being chosen.
Values of Bayesian Updating Perceived Quality and Product Reviews
We compute the monetary value of a unit increase in Bayesian updating perceived quality
and in product reviews. Our approach is to compute the changes in Bayesian updating
perceived quality and in price that induce the same change in the consideration set utility. We
use the coefficients of Bayesian updating perceived quality and of the ratio of price to willingness
to pay. As Bayesian updating perceived quality consists of prior perceived quality and product
reviews, we can then derive the value of a unit increase in product reviews by the chain rule.
The coefficient of Bayesian updating perceived quality, $\partial z^*_{ij} / \partial W_{1ij} = 0.054$, means
that a unit increase in Bayesian updating perceived quality increases the consideration set utility
by 0.054. The coefficient of the ratio of price to willingness to pay, $\partial z^*_{ij} / \partial p_{ij} = -0.104$,
means that a unit decrease in the ratio increases the utility by 0.104. Therefore, a one-unit increase in
Bayesian updating perceived quality brings as much utility change as a 0.52 (= 0.054/0.104) unit
decrease in the ratio of price to willingness to pay. That is, for an individual i across all products,

$$\frac{\partial z^*_{ij}}{\partial W_{1ij}} = (-0.52)\,\frac{\partial z^*_{ij}}{\partial p_{ij}},$$

where $z^*_{ij}$ is the utility of including product j in the consideration set, $W_{1ij}$ is the expectation of
Bayesian updating perceived quality, and $p_{ij}$ is the ratio of price to willingness to pay. As $p_{ij}$ consists of
price $p_j$ and willingness to pay $WTP_i$, it is the case that

$$dp_{ij} = d\left(\frac{p_j}{WTP_i}\right) = \frac{1}{WTP_i}\,dp_j.$$

Thus, a 0.52 unit decrease in $p_{ij}$ is equivalent to a $0.52\,WTP_i$ unit decrease in price, as

$$dp_{ij} = -0.52 \;\Rightarrow\; dp_j = -0.52\,WTP_i,$$

where $WTP_i$ is the willingness to pay of individual i and is constant across hotels. In our dataset,
the average of $0.52\,WTP_i$ over all respondents is $70.6, which indicates that a one-unit
increase in Bayesian updating perceived quality is worth $70.6, in that both a one-unit increase in
Bayesian updating perceived quality and a price decrease of $70.6 result in the same utility
change.
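The conversion from coefficients to dollars can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above: the two coefficients are the ones reported in the text, while the willingness-to-pay values and the function names are hypothetical and are not taken from the survey data.

```python
# Coefficients reported in the text for the consideration set stage.
COEF_QUALITY = 0.054       # Bayesian updating perceived quality
COEF_PRICE_RATIO = -0.104  # ratio of price to willingness to pay


def price_ratio_equivalent(coef_quality, coef_price_ratio):
    """Decrease in the price/WTP ratio that matches a one-unit quality increase."""
    return coef_quality / abs(coef_price_ratio)


def dollar_value_of_quality(wtp):
    """Price cut (in dollars) equivalent to a one-unit quality increase."""
    return price_ratio_equivalent(COEF_QUALITY, COEF_PRICE_RATIO) * wtp


# Hypothetical willingness-to-pay values in dollars (assumed for illustration).
example_wtp = [100.0, 135.0, 180.0]
values = [dollar_value_of_quality(w) for w in example_wtp]
average_value = sum(values) / len(values)

print(round(price_ratio_equivalent(COEF_QUALITY, COEF_PRICE_RATIO), 2))
print([round(v, 1) for v in values], round(average_value, 1))
```

Because the equivalent ratio change is the same for everyone, the dollar value differs across respondents only through their willingness to pay, which is why the text reports an average over respondents.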
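Next, we calculate the monetary values of each component of product reviews using the
monetary value of Bayesian updating perceived quality. Based on Equation 6, we set up the
expectation of Bayesian updating perceived quality as

$$W_{1ij} = \frac{(1/\sigma_{0ij}^2)\,W_{0ij} + (n_j/s_j^2)\,r_j}{(1/\sigma_{0ij}^2) + (n_j/s_j^2)}$$

after replacing $q_{ij}^2$ by $s_j^2$, which summarizes the variance of product reviews. Here $W_{0ij}$ is the
prior perceived quality and $\sigma_{0ij}^2$ denotes its variance. By multiplying the
monetary value of Bayesian updating perceived quality and the derivatives of $W_{1ij}$ with respect
to each component of product reviews, we can calculate the monetary values of product reviews
as follows.

Component of Product Reviews: Monetary Value (= Necessary Price Change × Derivative)

Mean $(r_j)$:

$$\frac{\partial z^*_{ij}}{\partial r_j} = \frac{\partial z^*_{ij}}{\partial W_{1ij}}\,\frac{\partial W_{1ij}}{\partial r_j} \;\Rightarrow\; (0.52\,WTP_i)\,\frac{n_j/s_j^2}{1/\sigma_{0ij}^2 + n_j/s_j^2}$$

Number $(n_j)$:

$$\frac{\partial z^*_{ij}}{\partial n_j} = \frac{\partial z^*_{ij}}{\partial W_{1ij}}\,\frac{\partial W_{1ij}}{\partial n_j} \;\Rightarrow\; (0.52\,WTP_i)\,\frac{(r_j - W_{0ij})\,(1/\sigma_{0ij}^2 s_j^2)}{(1/\sigma_{0ij}^2 + n_j/s_j^2)^2}$$

Variance $(s_j^2)$:

$$\frac{\partial z^*_{ij}}{\partial s_j^2} = \frac{\partial z^*_{ij}}{\partial W_{1ij}}\,\frac{\partial W_{1ij}}{\partial s_j^2} \;\Rightarrow\; (0.52\,WTP_i)\,\frac{-(r_j - W_{0ij})\,(n_j/\sigma_{0ij}^2 s_j^4)}{(1/\sigma_{0ij}^2 + n_j/s_j^2)^2}$$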
Note that even though the monetary value of a one-unit change in the posterior, $0.52\,WTP_i$, is
product-invariant, the monetary values of product reviews are product-variant because the prior
perceived quality and product reviews are product-variant.
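As a rough check on the chain-rule step, the following Python sketch computes the precision-weighted posterior mean and its derivatives with respect to the mean, number, and variance of reviews, and scales them by $0.52\,WTP_i$. The function names, the symbol `var0` for the prior variance, and all numeric inputs are assumptions made for illustration; they are not estimates from the dissertation.

```python
def posterior_mean(w0, var0, r, n, s2):
    """Precision-weighted average of prior quality w0 and review mean r."""
    a = 1.0 / var0   # prior precision
    b = n / s2       # precision of the review mean
    return (a * w0 + b * r) / (a + b)


def posterior_derivatives(w0, var0, r, n, s2):
    """Analytic derivatives of the posterior mean w.r.t. r, n, and s2."""
    a = 1.0 / var0
    b = n / s2
    denom = (a + b) ** 2
    d_mean = b / (a + b)
    d_number = a * (r - w0) / (s2 * denom)
    d_variance = -a * n * (r - w0) / (s2 ** 2 * denom)
    return d_mean, d_number, d_variance


def monetary_values(w0, var0, r, n, s2, wtp, ratio=0.52):
    """Dollar value of a unit increase in each review component."""
    scale = ratio * wtp
    return tuple(scale * d for d in posterior_derivatives(w0, var0, r, n, s2))


# Hypothetical inputs: prior quality 3.5, prior variance 1.0,
# 20 reviews with mean 4.2 and variance 1.5, willingness to pay $135.
example = dict(w0=3.5, var0=1.0, r=4.2, n=20.0, s2=1.5)
print(monetary_values(wtp=135.0, **example))
```

With a review mean above the prior, the sketch gives a positive value for the number of reviews and a negative value for the variance, and the signs flip when the review mean falls below the prior.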
Table 3.5 shows the monetary values of a unit increase in the mean, number, and variance
of product reviews by hotels.
Table 3.5. Monetary Value of a Unit Increase in Product Review Components

Monetary Value of Unit Increase in Product Reviews ($)

| Hotel | Mean $(r_j)$ | Number $(n_j)$ | Variance $(s_j^2)$ | $(r_j - W_{0ij})$ |
|---|---|---|---|---|
| H1. Royal Solaris | 67.6 | -0.003 | 0.60 | -0.030 |
| H2. Dreams Resort | 69.7 | 0.004 | -1.10 | 0.931 |
| H3. GR Solaris | 68.1 | -0.002 | 0.45 | -0.058 |
| H4. InterContinental | 56.2 | 0.160 | -4.32 | 0.605 |
| H5. Riu Palace Las Americas | 69.1 | 0.012 | -2.16 | 0.983 |
| H6. Fiesta Americana | 68.2 | 0.058 | -3.99 | 0.941 |
| H7. JW Marriott | 45.8 | -0.093 | 2.71 | -0.261 |
| H8. Hotel Sotavento | 40.6 | -0.310 | 3.41 | -0.302 |
| H9. Imperial Las Perlas | 39.6 | 0.343 | -4.12 | 0.728 |
| H10. Holiday Inn | 43.8 | -0.667 | 8.68 | -0.589 |
Some observations are that (1) a unit increase in the mean of product reviews is the most
valuable, a unit increase in the variance is the second most valuable, and a unit increase in the
number of product reviews is not very valuable, and (2) values of product reviews vary across hotels.
Regarding the value of a unit increase in the mean consumer review, the average is $57
with a maximum of $69.7 for Dreams Resort (H2) and a minimum of $39.6 for Imperial
Las Perlas (H9). The average value implies that a unit increase in the mean of product reviews
brings as much utility increase in the consideration set stage as a price decrease of $57. Therefore,
raising the mean of product reviews is an alternative to an undesirable price cut as a way to be
included in the consideration set.
Interestingly, however, the value of a unit change in the number of product reviews is not
high and its sign is inconsistent across hotels. The maximum value is $0.34 per review for
Imperial Las Perlas (H9) and the minimum value is -$0.67 for Holiday Inn (H10). The different signs
result from the difference between the mean of product reviews and prior perceived quality
$(r_j - W_{0ij})$. If the mean of product reviews is higher than prior perceived quality (e.g., H9),
consumers may interpret a larger number of product reviews positively. However, if the mean
of product reviews is lower than prior perceived quality (e.g., H10), consumers may have doubts
about the quality of those hotels, and a larger number of reviews reinforces those doubts.
A unit increase in the variance of product reviews has moderate value, with large differences
across hotels. For example, its maximum value is $8.68 for Holiday Inn (H10) and its minimum
value is -$4.32 for InterContinental (H4). Again, the different signs result from the difference
between the mean of product reviews and prior perceived quality $(r_j - W_{0ij})$, but the interpretation
is not the same. If the mean of product reviews is higher than prior perceived quality (e.g., H4),
high variance possibly makes consumers think that even though the overall quality is high, some
consumers experienced low quality, in line with their own low prior perceived quality. So,
the value of high variance is negative. However, if the mean of product reviews is lower than
prior perceived quality (e.g., H10), consumers may regard large variance as consumer
heterogeneity and positively infer that some consumers experienced high quality, in line
with their own high prior perceived quality.
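The summary figures and sign patterns discussed above can be recomputed directly from Table 3.5. The sketch below is a plain arithmetic check; it assumes that the four numeric columns of the table are, in order, the values of the mean, number, and variance of reviews, followed by the difference between the review mean and prior perceived quality.

```python
# Table 3.5 rows: (mean value, number value, variance value, r_j - W_0ij).
# The column order is an assumption based on the discussion in the text.
TABLE_3_5 = {
    "H1": (67.6, -0.003, 0.60, -0.030),
    "H2": (69.7, 0.004, -1.10, 0.931),
    "H3": (68.1, -0.002, 0.45, -0.058),
    "H4": (56.2, 0.160, -4.32, 0.605),
    "H5": (69.1, 0.012, -2.16, 0.983),
    "H6": (68.2, 0.058, -3.99, 0.941),
    "H7": (45.8, -0.093, 2.71, -0.261),
    "H8": (40.6, -0.310, 3.41, -0.302),
    "H9": (39.6, 0.343, -4.12, 0.728),
    "H10": (43.8, -0.667, 8.68, -0.589),
}


def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


average_mean_value = sum(row[0] for row in TABLE_3_5.values()) / len(TABLE_3_5)
highest_mean_hotel = max(TABLE_3_5, key=lambda h: TABLE_3_5[h][0])
lowest_mean_hotel = min(TABLE_3_5, key=lambda h: TABLE_3_5[h][0])

# The value of the number of reviews shares the sign of (r_j - W_0ij),
# while the value of the variance has the opposite sign.
number_sign_matches = all(sign(r[1]) == sign(r[3]) for r in TABLE_3_5.values())
variance_sign_opposes = all(sign(r[2]) == -sign(r[3]) for r in TABLE_3_5.values())

print(round(average_mean_value, 1), highest_mean_hotel, lowest_mean_hotel)
print(number_sign_matches, variance_sign_opposes)
```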
In summary, using the estimation results for Bayesian updating perceived quality and
price, we find that the value of a unit increase in Bayesian updating perceived quality is $70.6
and that the monetary values of product reviews vary with the hotel and with the difference
between the mean of product reviews and prior perceived quality. In particular, the value of a
unit increase in the mean of product reviews is around $57 on average.
CONCLUSION
Summary
The objective of this paper is to study in which stages of the purchase decision process
consumers use product reviews and how they incorporate product reviews with their prior
perceived quality. We also evaluate how valuable product reviews are in monetary terms. We
used four types of perceived quality (viz. prior perceived quality, product reviews, average of
prior perceived quality and product reviews, and Bayesian updating perceived quality) in a two-
stage choice model in order to understand consumers’ decision processes when product reviews
are available.
The best fitting model (Model 10) shows that consumers use Bayesian updating
perceived quality in the consideration set stage. This means that consumers use product reviews
from the consideration set stage and the update method is consistent with the Bayesian manner,
by which consumers update prior perceived quality using the information components of product
reviews. These components are the mean of product reviews, their number and variance.
The estimation results in the two-stage choice model are summarized as follows: In the
consideration set stage, intrinsic hotel effects are high for well-known international hotel brands
such as Marriott but low for local hotels such as Hotel Sotavento. Hotels with high Bayesian
updating perceived quality are more likely to be included in the consideration set while hotels
with high price are less likely to be included. It is also shown that awareness is important for
hotels to be included.
In the choice stage, the results show that intrinsic hotel effects and Bayesian updating
perceived quality become much less important. Rather, price and awareness play a significant
role. Consumers consider hotels with a similar quality level in the consideration set stage but
once they construct consideration sets consisting of the similar quality hotels, they put more
weight on prices and awareness.
Finally, we compute the monetary values of the components of product reviews. We find
that a unit increase in the mean of product reviews is worth $57 on average. That is, by
improving the mean of product reviews, hotels become more likely to be included in the
consideration set at the same price, or can be considered just as often without reducing prices.
Our findings also show that the number of product reviews is less important, while the variance
of product reviews can have positive or negative monetary value depending on the difference
between the mean of product reviews and prior perceived quality.
Managerial Implication
There are several managerial implications of our study, both for retail managers who present
product reviews of different manufacturers’ products and for manufacturers themselves. First, the
result that consumers use product reviews in the consideration set stage but less so in the choice
stage provides a guide for how to display product reviews. Since product reviews are important
from the consideration set stage, managers may need to give consumers easy access to product
reviews from the beginning of the search. The methods would include showing product reviews
in the list of first search results, or allowing consumers to sort the search results by the
components of product reviews. Then, consumers could actively use product reviews from the
consideration set. The managerial implication to manufacturers is that they need to have their
product quality good enough to be included in the consideration set because once consumers
construct the consideration set, quality is not a choice criterion any more but price still is.
Therefore, as shown in our results of the choice model, they should be aware of more price
competition between manufacturers within the similar quality level.
Second, consumers’ Bayesian updating shows that retailers and manufacturers need to be
concerned about all components of product reviews (i.e., the mean, number, and variance), as all
of the components are used to update prior perceived quality. Particularly, it is recommended for
retailers to provide variance information, perhaps by using a histogram, as well as the mean and
the number which are commonly presented. Manufacturers should note that a high mean of
product reviews is much more important for determining Bayesian updating perceived quality
and eventually consideration set formation than a larger number of product reviews. Thus, it is
beneficial for manufacturers to provide encouragement to consumers who have positive
experiences with their products in order to have them write good product reviews and to handle
grievances of unhappy consumers proactively. In other words, manufacturers may need to
concentrate on motivating satisfied consumers more than increasing the number of product
reviews.
Third, regardless of the strong effects of product reviews, it is important to manage
consumers’ prior perceived quality and awareness at all times. Prior perceived quality is directly
related to Bayesian updating perceived quality and indirectly mediates the effects of the number
and variance of product reviews on Bayesian updating perceived quality. Therefore, if
manufacturers constantly maintain high prior perceived quality by brand positioning or
advertising, they may be able to negate the effects of bad product reviews, which are sometimes
inevitable.
Limitation and Future Research
A limitation of the research is that it uses only the numerical summary of product reviews,
whereas consumers also get product information from review passages and, on some sites, from
the percentage of consumers who rate a review as helpful or unhelpful. Furthermore, consumers
can deliberately search for positive phrases to include alternatives quickly or for negative
phrases to eliminate alternatives. Therefore, quantifying and analyzing descriptive
passages would give new insights. In further research, researchers can also utilize product reviews on
subcategories. In the hotel example, consumers may refer to detailed evaluations of hotel
service, gym or pool, hotel condition, room cleanliness, or room comfort. As consumers consult
information on different subcategories depending on products or purchase situations (e.g.,
vacation, business, or family trip), models which consider detailed information would be useful.
APPENDIX
MCMC Algorithms
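Equation 8 is the main equation to estimate. Our approach is to sequentially draw the
coefficient vectors of the two stages, the error covariance matrix $\Sigma$, $Z_i^*$, and $Y_i^*$. We first
stack all equations into vectors and matrices as

$$U_i^* = \begin{pmatrix} Z_i^* \\ Y_i^* \end{pmatrix}, \quad
X_i = \begin{pmatrix} X_i^C & 0 \\ 0 & X_i^F \end{pmatrix}, \quad
B = \begin{pmatrix} B^C \\ B^F \end{pmatrix}, \quad
e_i = \begin{pmatrix} e_i^C \\ e_i^F \end{pmatrix},$$

where $B^C$ and $B^F$ denote the coefficient vectors of the consideration set stage and the choice
stage, and $e_i^C$ and $e_i^F$ denote the corresponding error vectors. We then stack all the observations
together as

$$U^* = \begin{pmatrix} U_1^* \\ \vdots \\ U_N^* \end{pmatrix}, \quad
X = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \quad \text{and} \quad
e = \begin{pmatrix} e_1 \\ \vdots \\ e_N \end{pmatrix},$$

and write

(A1) $U^* = XB + e$,

where $e$ is $N(0, \Omega)$ and $\Omega$ is a block-diagonal matrix given by $\Omega = I_N \otimes \Sigma$.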
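As Equation A1 is a SUR model, we can estimate $B$ and $\Sigma$ by using a Gibbs sampler
with standard Normal-Wishart priors. Specifically, we use a normal prior $B \sim N(\underline{B}, \underline{\Sigma}_B)$ and a
Wishart prior $\Sigma^{-1} \sim W(\underline{v}, \underline{V})$.
The posterior of $B$ conditional on $U^*$ and $\Sigma^{-1}$ is $B \mid U^*, \Sigma^{-1} \sim N(\bar{B}, \bar{\Sigma}_B)$, where

$$\bar{B} = \bar{\Sigma}_B \Big( \underline{\Sigma}_B^{-1} \underline{B} + \sum_{i=1}^{N} X_i' \Sigma^{-1} U_i^* \Big)
\quad \text{and} \quad
\bar{\Sigma}_B = \Big( \underline{\Sigma}_B^{-1} + \sum_{i=1}^{N} X_i' \Sigma^{-1} X_i \Big)^{-1}.$$

The posterior for $\Sigma^{-1}$ conditional on $U^*$ and $B$ is $\Sigma^{-1} \mid U^*, B \sim W(\bar{v}, \bar{V})$, where $\bar{v} = N + \underline{v}$ and

$$\bar{V} = \Big[ \underline{V}^{-1} + \sum_{i=1}^{N} (U_i^* - X_i B)(U_i^* - X_i B)' \Big]^{-1}.$$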
Now, we estimate $Z_i^*$ and $Y_i^*$ using data augmentation. First, we augment $Z_i^*$ given
$Y_i^*$ and the other parameters from a multivariate normal distribution

(A2) $Z_i^* \sim MVN(X_i^C B^C \mid Y_i^*,\ \tilde{\Sigma}_{zz})$,

where $X_i^C B^C \mid Y_i^*$ is the expectation of $Z_i^*$ conditional on $Y_i^*$, with $B^C$ the coefficient vector of
the consideration set stage, and $\tilde{\Sigma}_{zz}$ is the variance-covariance
matrix of error terms in the consideration set stage conditional on the other variance-covariance
matrices $(\Sigma_{yy}, \Sigma_{zy})$. We draw a positive $z_{ij}^*$ if consumer i includes product j in the consideration
set and a negative $z_{ij}^*$ if consumer i does not include product j in the consideration set.
Second, we augment $Y_i^*$ given $Z_i^*$ and the other parameters from a multivariate normal
distribution

(A3) $Y_i^* \sim MVN(X_i^F B^F \mid Z_i^*,\ \tilde{\Sigma}_{yy})$,

where $X_i^F B^F \mid Z_i^*$ is the expectation of $Y_i^*$ conditional on $Z_i^*$, with $B^F$ the coefficient vector of
the choice stage, and $\tilde{\Sigma}_{yy}$ is the variance-covariance
matrix of error terms in the choice stage conditional on the variance-covariance matrices
$(\Sigma_{zz}, \Sigma_{zy})$.
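The conditional mean and covariance used in A2 and A3 follow from the standard formula for a partitioned multivariate normal. The NumPy sketch below shows that formula for the draw of the consideration-stage utilities given the choice-stage utilities; swapping the roles of the two blocks gives the other direction. The function name and the numeric inputs are assumptions for illustration.

```python
import numpy as np


def conditional_mvn(mu_z, mu_y, S_zz, S_zy, S_yy, y):
    """Mean and covariance of Z given Y = y when (Z, Y) is jointly normal."""
    gain = S_zy @ np.linalg.inv(S_yy)
    cond_mean = mu_z + gain @ (y - mu_y)
    cond_cov = S_zz - gain @ S_zy.T
    return cond_mean, cond_cov


# Hypothetical one-product example: unit variances and covariance 0.5.
cond_mean, cond_cov = conditional_mvn(
    mu_z=np.array([0.0]),
    mu_y=np.array([0.0]),
    S_zz=np.array([[1.0]]),
    S_zy=np.array([[0.5]]),
    S_yy=np.array([[1.0]]),
    y=np.array([2.0]),
)
print(cond_mean, cond_cov)
```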
We draw $y_{ij}^*$ according to whether product j is in the consideration set and whether it is
finally chosen from the consideration set. For products in the consideration set, we augment $y_{ij}^*$
from a sub-distribution of the distribution in A3, consisting of the mean vector and
variance-covariance matrix of products in the consideration set. We impose the restrictions that $y_{ij}^*$ is the
highest for the finally chosen product among products in the consideration set and that each $y_{ij}^*$
is negative if the consumer chooses the outside option (No reservation). For products not
included in the consideration set, we augment $y_{ij}^*$ from a sub-distribution of the distribution in
A3, consisting of the mean vector and covariance matrix of products not included in the
consideration set. Unlike the products in the consideration set, however, we do not impose a
restriction on the size and sign of $y_{ij}^*$.
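The sign restrictions described above amount to drawing from a truncated normal distribution. The sketch below shows one simple way to do this for a single latent utility, using inverse-CDF sampling from the Python standard library. It is an illustration only: it covers the univariate sign restriction (positive if the product is considered, negative otherwise), not the ordering restriction for the chosen product, and inverse-CDF sampling loses accuracy far in the tails. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def draw_truncated_normal(mean, sd, lower=float("-inf"), upper=float("inf"), rng=random):
    """Draw from N(mean, sd^2) restricted to [lower, upper] by inverse CDF."""
    dist = NormalDist(mean, sd)
    p_low = 0.0 if lower == float("-inf") else dist.cdf(lower)
    p_high = 1.0 if upper == float("inf") else dist.cdf(upper)
    u = p_low + (p_high - p_low) * rng.random()
    # Keep u strictly inside (0, 1) so that inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Clamp to the bounds to guard against rounding error near the edges.
    return min(max(dist.inv_cdf(u), lower), upper)


def draw_consideration_utility(mean, sd, considered, rng=random):
    """Latent utility that is non-negative if considered, non-positive if not."""
    if considered:
        return draw_truncated_normal(mean, sd, lower=0.0, rng=rng)
    return draw_truncated_normal(mean, sd, upper=0.0, rng=rng)


rng = random.Random(0)
considered_draws = [draw_consideration_utility(-0.5, 1.0, True, rng) for _ in range(1000)]
excluded_draws = [draw_consideration_utility(0.5, 1.0, False, rng) for _ in range(1000)]
print(min(considered_draws), max(excluded_draws))
```

In a multivariate sampler the same idea would be applied element by element, with `mean` and `sd` taken from the conditional distribution of each utility given the others.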
VITA
Sungha Jang received a Bachelor of Economics with a major in statistics in 1998 and a Master of
Business Administration concentrating on Marketing in 2001 from Korea University, Seoul,
Korea. He will be awarded the Doctor of Philosophy in Management Science specializing in
Marketing in May, 2011 at the University of Texas at Dallas. Prior to joining the Ph.D. program,
he worked for Experian Korea as a senior consultant in the field of credit risk management.