
Electronic copy available at: http://ssrn.com/abstract=2675331

A Framework for Modeling How Consumers Form Online Search Queries

Jia Liu∗ and Olivier Toubia†

Columbia University

October 16, 2015

Abstract

We explore how consumers form online search queries, and in particular the link between consumers’ information needs and their search queries. Our goal is to provide a framework for the development of search models that can infer consumers’ information needs from their queries. The semantic relationships between queries and results differentiate query formation from traditional, discrete-choice based search. Accordingly, our specific research questions are as follows: (i) Are consumers able to leverage semantic relationships between queries and results when forming online search queries? (ii) How should researchers represent these semantic relationships? (iii) What are consumers’ beliefs on these semantic relationships? Using an experiment in which information needs are manipulated exogenously, we find that consumers have the ability to formulate queries that leverage semantic relationships. Consequently, models of search query formation should capture consumers’ beliefs on a set of semantic relationships, which capture the probability that any query will activate any set of words. Fortunately, we show that these semantic relationships may be approximated parsimoniously by functions of asymmetric activation probabilities at the word level. We find that consumers’ beliefs are biased upwards, and that they are not asymmetric enough.

Keywords: online search engines, search models, information needs, preferences, semantic relationships

∗Jia Liu is a Ph.D. Candidate in Marketing, Columbia Business School. Email: [email protected]. †Olivier Toubia is Glaubinger Professor of Business, Graduate School of Business, Columbia University. Email: [email protected]


1 Introduction

Over the past decade, search engines like Google have become one of the primary tools consumers use when searching for products, services, or content. According to a report by Fleishman-Hillard (2012), 89% of consumers visit Google, Bing or other search engines to find information

prior to making purchases. For major purchases like cars, 81% of consumers go online before

heading out to the store, and spend an average of 79 days gathering information (GE, 2013). That

is probably why search marketing spending is expected to reach $31.62 billion in the U.S. alone in 2015 (Statista, 2015a).

The relevance of search results and of search-related advertising and targeting is a function

of how well the content presented to a consumer matches their underlying information needs.

Therefore, it is essential for search engines and the firms that use them to be able to correctly

infer consumers’ information needs based on their search queries. For example, the amount that

a marketer should be willing to bid on a particular keyword is a function of how well their content

matches with the information needs of consumers who usually type that keyword. More generally,

search queries contain valuable information about users’ preferences (Pirolli, 2007), which users

tend to provide frequently, voluntarily and truthfully. As such, online search queries have the

potential to be leveraged in revealed preference frameworks.

A large literature in marketing (and economics) has linked preferences and utility to search,

in situations where search is performed by a series of discrete choices (e.g., purchases, clicks).

While text-based search may be viewed as a special case of discrete-choice search, some aspects

of text-based search are not captured by traditional search models. In particular, search queries and

online content are semantically related to each other. For example, consider a consumer typing the

following query: “affordable sedan made in America.” Inferring this consumer’s information needs

based on the query may not be as straightforward as concluding that they are simply interested in

affordable sedans made in America. For example, it might be possible that the most important

attributes for this consumer are in fact safety, comfort, and made in America, and that affordability

is of lesser importance. This consumer might have decided to type the query “affordable sedan


made in America” because they believe that cars made in America are generally safe (but that

the reverse is not necessarily true) and comfortable (again, the reverse may not be true), but not

necessarily affordable. In that case, the consumer anticipated that they would find relevant search

results (i.e., results that match their information needs) efficiently (i.e., with short queries) by only

including “made in America” and “affordable” in their queries, but not “safe” or “comfortable,”

although these are important attributes. In other words, the consumer may have leveraged the

semantic relationships between queries and results when formulating their query.

Before proceeding further, we define “information needs” and “semantic relationships” more

precisely in our context. The Information Retrieval (IR) literature defines information needs as

“topics about which the user desires to know more” (Manning et al. (2008), Page 5). Such topics

could be constructed as functions of individual words, using for example natural language processing tools such as Latent Dirichlet Allocation (Blei et al., 2003; Tirunillai and Tellis, 2014). In

that case information needs would be expressed as functions of topics, which themselves would be

expressed as functions of words, i.e., information needs would be indirectly expressed as functions

of words. More generally, information needs may be captured by utility functions that are specified

over a dictionary of relevant words.

In this paper, we define “semantic relationships” based on word occurrence on the web, following the general approach in the information retrieval literature (Khoo and Na, 2006). Semantic

relationships in our context are objectively defined probabilities of finding specific sets of words

in the top results of a given search engine, given a specific search query.1 Consumers hold beliefs

on these probabilities, which are approximations of the true, empirical probabilities. We note that

according to this definition, semantic relationships are specific to individual search engines. In this

paper we focus on Google, which as of April 2015 had an 88.44% global market share (Statista, 2015b).

We also note that this definition is different from the typical definition of semantic relationships

in cognitive psychology, which relates to spreading activation from memory (Collins and Loftus,

1975; Raaijmakers and Shiffrin, 1981).

1 If information needs are represented by topics instead of words, then the relevant semantic relationships may describe the link between the topic distribution in the search query and the topic distribution in the search results.


We make a distinction between two non-mutually exclusive ways of selecting words to form a search query. Preference-based search consists of selecting the most valuable words for inclusion in the query, i.e., the words that are most strongly related to the consumer’s information

needs. In our previous example, preference-based search would lead to the inclusion of “safe” and

“comfortable” in the query. However, semantic relationships may be leveraged to reach valuable

information with shorter queries. As argued above, this may lead a consumer to form the query

“affordable sedan made in America,” with the anticipation that the search results will be

likely to contain information about cars that are also safe and comfortable. We label this approach

to forming search queries as semantic-based. Semantic-based search enables consumers to shorten

their queries by leveraging semantic relationships between queries and results. This is consistent

with empirical evidence that users tend to form short queries (Jansen et al., 2000; Spink et al., 2001;

Kamvar and Baluja, 2006; Jansen et al., 2009). This is also consistent with a boundedly rational

view of users, whereby forming longer queries is cognitively costly (Ruthven, 2003; Azzopardi

et al., 2013).

Our motivation for this paper is the question of whether and how consumers leverage semantic

relationships when forming online search queries. This question has clear implications for the development of models that leverage online search queries as a source of information on consumers’

preferences. Indeed, if consumers do not engage in semantic-based search, their information needs

may be learned directly and relatively easily from their queries. However, if semantic-based search

is relevant, it becomes necessary to understand how consumers translate their information needs

into search queries, in order to be able to infer information needs from queries. In that case, leveraging the information revealed by consumers in their search queries requires “reverse engineering”

queries to infer the underlying information needs.

Developing search models that are able to infer information needs from online queries is an

ambitious endeavor, which should probably be tackled by multiple researchers across multiple

papers. In this paper, we attempt to provide a framework for the development of such models,

by improving our basic understanding of the link between information needs and search queries.


Given this background, our specific research questions are as follows: (i) Are consumers able to

leverage semantic relationships between queries and results when forming online search queries?

(ii) How should researchers represent these semantic relationships? (iii) What are consumers’

beliefs on these semantic relationships?

The rest of the paper is organized as follows. Section 2 reviews relevant research. Section

3 introduces relevant definitions and notations. Section 4 presents the experimental design for

Study 1. Section 5 addresses our first research question. Section 6 addresses our second research

question. Sections 7 and 8 address our third research question: Section 7 compares the estimated

users’ beliefs from Study 1 to the true activation probabilities; Section 8 describes Study 2 in

which we measure these beliefs directly and compare them to the truth. Section 9 concludes and

integrates our results into a framework for modeling query formation.

2 Relevant Literature

2.1 Search Models

Our paper is related to the large literature in marketing that has studied how preferences drive

search, and modeled the search behavior of utility-maximizing agents (Erdem et al., 2005; Hui

et al., 2009; Park and Chung, 2009; Dzyabura, 2013; Yang et al., 2015). Some of this literature

has even studied search in the context of search engines (Jeziorski and Segal, 2010; Kim et al.,

2010; Shi and Trusov, 2013). However, in this literature search is typically expressed by discrete

choices among items such as products or links. Consequently, the link between preferences (or

information needs) and text-based online search queries has largely been ignored in marketing. As

discussed above, text-based search is not a straightforward special case of discrete-choice search in

which consumers would select from a very large universe of queries. In particular, search words are

semantically related to each other and to the search results, which creates a rich set of dependencies

between queries and their results.

Such semantic relationships between words have been used in the cognitive science literature


to predict users’ navigation path on the web, under the framework of Information Foraging Theory

(Pirolli and Card, 1999; Fu and Pirolli, 2007; Wu et al., 2014). In these models, users develop

an updated assessment of a website’s relevance after reviewing its content, and adjust their page-viewing strategies based on their ongoing evaluation of the website’s utility and their own search

cost. The assessment is assumed to be a function of the association strength between different

words. Because users’ actual beliefs on these association strengths are unobservable, these researchers usually assume users’ beliefs are the same as the actual semantic relationships obtained

from online text corpora. Our research differs from these studies in two main ways. First, we focus

on text-based query formation behavior, whereas these models only study discrete search behavior

(e.g., clicking). Second, rather than assuming that users leverage semantic relationships and have

correct beliefs, we test whether users leverage semantic relationships in their queries, study the

accuracy of their beliefs, and identify systematic ways in which beliefs deviate from the truth.

2.2 Information Retrieval

A large body of research on online search queries comes from the Information Retrieval (IR)

literature, which has focused primarily on the problem of finding the most relevant documents

given a query (e.g., Salton and McGill (1986); Manning et al. (2008)). The IR literature was

developed well before web search engines existed, for situations in which professionals trained

in the art of phrasing queries searched over a collection of documents whose style and structure

they understood well. In such situations, queries tend to be well-formed and reflect information

needs accurately. Accordingly, this literature has typically focused more on optimizing information

retrieval systems using a query as input, than on understanding the process by which users form

their queries. However, with web search engines, the link between information needs and queries

may not be as direct, and it is not as well understood (Santos et al., 2015). In contrast to the

traditional approach in the IR literature, in this research we treat search engines as “black boxes”

and focus on the behavior of the end users. In particular, we attempt to understand how consumers

form online search queries based on their information needs and based on their beliefs on how


the search engine operates. Accordingly, we only review here a few areas within the IR literature

which are most relevant to our research.

2.2.1 Descriptive Studies on Online Queries

There is a considerable body of descriptive research on online queries in the IR literature. It has

been shown consistently that most queries are a list of one or more nouns; on average the length

of a query is two to three terms; and at least 80% of queries contain three terms or fewer (Kamvar

and Baluja, 2006; Jansen et al., 2000; Spink et al., 2001; Jansen et al., 2009). The average number

of queries per user session has been found to be between two and five, depending on the study and

the search task (Jansen et al., 2000; Spink et al., 2001; Wu et al., 2014). The underlying reason

could be that forming long queries is costly for users in terms of time, cognitive effort, physical

typing, and so on (Ruthven, 2003; Azzopardi et al., 2013). Importantly, these findings suggest that

it is reasonable for us to focus on short queries with nouns in this research.

Researchers have also proposed different ways to categorize the intent of queries. The first

and most popular categorization of online queries was proposed by Broder (2002) who defined

three broad classes: informational, navigational, and transactional. Informational search involves

looking for a specific fact or topic; navigational search seeks to locate a specific web site; and transactional search usually involves looking for information related to a particular product or service.

Jansen et al. (2008) found that about 80% of queries are informational, about 10% are navigational,

and under 10% are transactional. Rose and Levinson (2004) refine Broder’s taxonomy by introducing the concept of “Resource” search, where the user’s goal is to obtain a resource available

on web pages (e.g., files, movies, etc.). Although these empirical studies provide valuable insights

into what the intent is behind user queries, none of them has suggested a systematic framework for

modeling the query formation process so that more precise inferences could be derived.


2.2.2 IR Models

Models of information retrieval, in general, predict and explain which documents are most

relevant given a search query (Roelleke, 2013). Popular traditional IR models include models

based on the probability of relevance framework such as BM25 (Robertson and Zaragoza, 2009),

and language models such as Unigram (Zhai, 2008). Others have used topic modeling or other

language models to map queries and documents onto topics, and retrieve documents whose topic

distribution matches that in the query (Kurland and Lee, 2004; Wei and Croft, 2006).

We note a few simplifying assumptions about words in queries and documents often made by

IR models. One is the bag-of-words approach, which ignores the position or order of words in

queries and pages. This assumption is commonly used, because it is extremely hard to develop

a model of relevance with positional information without exploding the number of parameters,

and position information has been shown to have surprisingly little effect on retrieval accuracy

(Robertson and Zaragoza, 2009). Other approximations which relate to the independence of words

in queries and/or pages will be introduced more formally in Section 6. In the IR literature, these

approximations are usually assumed to be valid for convenience. In contrast, we will test the

validity of our proposed approximations in Section 6 using the actual semantic relationships on

Google.

We also note that the IR literature has recognized that search queries may not be perfectly representative of the users’ information needs, and has proposed solutions to this problem, for example

by increasing the diversity of the search results (Santos et al., 2015) or by combining information

across clusters of queries and documents (Kurland and Lee, 2004). However, as mentioned above,

IR models are typically focused on optimizing the search results given a query. In contrast, we attempt to understand how consumers translate their information needs into search queries, in order

to pave the way for the development of search models that can accommodate text-based search. To

the best of our knowledge, this type of research has not been conducted in the IR literature.


2.2.3 Semantic Relationships

Several researchers in IR have considered semantic relationships between queries and documents, typically in order to improve retrieval performance (Li and Xu, 2013). For instance,

Ruthven (2003) explores the use of semantic relationships for query expansion, by identifying new

words that are semantically related to the information the user is presumably seeking, and adding

these new words to existing queries. The effort to leverage semantic relationships took a new dimension with the advent of the “semantic web” (Berners-Lee et al., 2001) that is meant to identify

semantic relationships between web pages, people, organizations, places, etc. Guha et al. (2003)

use the term “semantic search” to refer to a system that leverages the semantic web to improve

traditional web searching. For example, their system leverages semantic relationships between

search terms to disambiguate queries by inferring the exact semantic meaning of each word in a

query. See Mangold (2007) for a review of semantic search in IR systems.

In sum, some descriptive research has been published in the IR literature that identifies various types of queries and reports statistics related to online search queries. Previous research has

leveraged semantic relationships to improve the efficiency of search engines, but has not focused

on exploring whether users also leverage such relationships when forming queries. In general,

the IR literature focuses primarily on improving the results of IR systems, taking queries as input. In contrast, we focus on the end users and attempt to understand how they form their search

queries, given their information needs and their beliefs on how search engines operate. As such, by

“semantic-based search” we refer to the strategies employed by users when forming their queries,

not to systems designed to improve the efficiency of search engines. Therefore, our definition of

“semantic-based search” should not be confused with the definition of “semantic search” in IR.


3 Definitions and Notations

We assume consumers derive value from consuming the content of webpages, based on how

well these pages satisfy their information needs. As argued earlier, the value of a webpage l for

a user may be written as a function of the words that are present in the page. In practice, such a function is likely to be very complex and highly non-linear. For example, some words complement

each other, some words are substitutes, some words have different meanings or value based on

which other words are present, etc. In particular, words may be mapped onto topics using natural

language processing tools such as Latent Dirichlet Allocation (Blei et al., 2003; Tirunillai and

Tellis, 2014). That is, the value of a page to a user may be specified as a function of the topics on

the page, which may themselves be specified as functions of the words on the page.

In this paper, we exogenously select simple functions that link words to value, instead of relying

on richer approaches such as topic modeling. This allows us to have no uncertainty on the correct

specification of the value of a page as a function of its text. This also allows us to manipulate

information needs exogenously, in ways that are easy to explain to users. In particular, we choose

a very simple value function that is linear and additive in a set of dummy variables indicating which

words are present in the page. This assumption is not critical to our analysis, and our results and

conclusions are easily generalizable to alternative specifications.

Let g = {t_1, t_2, ..., t_W} denote the set of relevant words. We denote by β_j the value of word t_j

for a given user. In our experiment, this value is determined by us and communicated to the user.

The value of a webpage l for the user is as follows:

$$v(l \mid \beta) = \sum_{t_j \in g} \beta_j \, I(t_j \in l) \qquad (1)$$

where I(t_j ∈ l) indicates whether webpage l contains word t_j.
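To make Equation (1) concrete, the following minimal Python sketch computes the value of a page from the words it contains; the function and variable names are illustrative, not part of the paper’s implementation.

```python
# Minimal sketch of Equation (1): the value of a page is the sum of the values
# of the relevant words it contains. Words and dollar values are hypothetical.
def page_value(page_words, beta):
    """beta maps each relevant word to its value; page_words is the set of
    words found on the page."""
    return sum(value for word, value in beta.items() if word in page_words)

beta = {"fruit": 2.0, "salad": 1.0, "chicken": 2.0}      # example values as in Study 1
print(page_value({"chicken", "salad", "recipe"}, beta))  # 3.0
```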

A search query q is defined as any ordered subset of the words in g. A user who would only

use preference-based search would simply form queries by selecting the words with the highest

β_j. In contrast, modeling semantic-based search requires capturing the probability that any given


query may retrieve webpages with any set of relevant words. In general, when evaluating search

results from a query, users tend to focus on the top results (Narayanan and Kalyanam, 2011; Shi

and Trusov, 2013; Yoganarasimhan, 2015). Accordingly, we focus on the top K = 10 results from

query q. For each possible subset of words s⊆ g, we define φq→s as the probability that a random

webpage l from the top K results of query q contains exactly all the words in set s. We say that a

webpage contains a word when the word appears anywhere on the actual page itself, not just the

page title or snippet displayed on the search engine result page. We refer to φq→s as the probability

of “activating” the words in s using query q. The expected value of each top search result l from

query q can be written as:

$$E(v(l) \mid \beta, q) = \sum_{s \subseteq g} \phi_{q \to s} \sum_{t_j \in s} \beta_j \qquad (2)$$
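A sketch of Equation (2), assuming the activation probabilities φ_{q→s} are stored in a dictionary keyed by (query, word set), with the query represented as any hashable object such as a tuple of words; the data layout is a hypothetical choice, not the authors’ implementation.

```python
from itertools import chain, combinations

def subsets(g):
    """All subsets of the relevant word set g (a list of words)."""
    return chain.from_iterable(combinations(g, r) for r in range(len(g) + 1))

def expected_result_value(query, g, beta, phi):
    """Equation (2): expected value of one random top-K result of `query`.
    `phi[(query, s)]` is the probability that such a result contains exactly
    the words in frozenset `s` (hypothetical dictionary layout)."""
    return sum(phi.get((query, frozenset(s)), 0.0) * sum(beta[w] for w in s)
               for s in subsets(g))
```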

Even if the value function above is linear and additive, the number of possible queries and the

number of subsets of g both grow exponentially with the number of words in g. In particular, with

W words there are 2^W possible subsets of g. Therefore, in Section 6, we will introduce and test

some assumptions that allow approximating the relevant semantic relationships using functions

of a more parsimonious set of parameters. Before investing in building such a representation, we

will first verify that it is indeed necessary to capture semantic relationships when modeling and

studying user queries, by providing model-free evidence for semantic-based search in Section 5.

4 Design of Study 1

Addressing our first research question requires distinguishing semantic-based search from preference-based search empirically. As mentioned above, doing so requires either a modeling framework that

links information needs to query formation, or an appropriate set of exogenous variations. In order

to inform the development of such a modeling framework, in Study 1 we opt for the second option

and explore the relevance and existence of semantic-based search using experimental data. In par-

ticular, we develop a “search query game”, an experimental paradigm that allows us to manipulate

information needs exogenously.


4.1 Search Query Game

We designed our paradigm with the following specifications in mind: (i) the relevant words g

and their values β should be set exogenously and provided to participants; (ii) participants should

be asked to form queries based on g and β; (iii) the game should reflect the fact that creating

longer search queries is costly to users; (iv) the game should be incentive-aligned, i.e., participants’

payment should be a function of the “value” (as defined based on β) of the results of their queries

and the length of their queries; (v) the value of a query should be independent of the particular

computer on which the game is played; (vi) in order to focus exclusively on query formation, any

other type of search behavior such as evaluating results and clicking on links should be excluded

from the game; (vii) the game should capture the essence of query formation on search engines;

(viii) the game should be easy to explain to participants.

Taking these into consideration, we developed a search query game that asks each participant

to form search queries on Google to win a cash bonus. This game is played in independent tasks.

In each task a participant is given a set of three words g = (t_1, t_2, t_3). Each word t_j is randomly assigned a monetary value β_j, which is either high ($2) or low ($1). The participant is asked

to form one search query based on the three given words, i.e., decide which word(s) to use and

in what order. For each query, we consider the pages associated with the top K=10 results. We

compute the value of each page based on the words it contains. For example, suppose the three

words in a task are “fruit”, “salad”, and “chicken”, and their respective values are $2, $1, and $2. A

webpage that contains “chicken” and “salad” would have a value of $3. The score associated with

each query is the value of its best result (among the top 10 results), minus the cost of the query in

dollars. This cost simply equals the number of words in the query. In our previous example, the

query “salad” has a cost of $1, and the query “chicken fruit” has a cost of $2. This mimics the

various costs reviewed above that are associated with longer queries. Participants are informed that

their queries will be run automatically in the background and that the webpages associated with

the top 10 results will be scanned. The actual instructions of the game to participants are displayed

in Appendix 1. To ensure that participants understand the instructions, they are given a short quiz


after reading the instructions. Participants proceed to the game only after having answered all quiz

questions correctly.
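For illustration, the scoring rule of the game could be computed along the following lines; the page contents below are hypothetical placeholders, not the actual scanned results.

```python
def query_score(query_words, top_pages, beta):
    """Score of a query in the search query game: value of the best page among
    the top 10 results, minus $1 per word in the query.
    `top_pages` is a list of sets of words found on each result page."""
    best_page_value = max(sum(beta[w] for w in beta if w in page) for page in top_pages)
    return best_page_value - len(query_words)

beta = {"fruit": 2, "salad": 1, "chicken": 2}
pages = [{"chicken", "salad"}, {"fruit"}]      # hypothetical scanned pages
print(query_score(["salad"], pages, beta))     # 3 - 1 = 2
```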

One obvious approach for participants is to form a query with all three words, which costs $3

and has the highest value (assuming all three words will be found on at least one page). However,

participants may be able to reduce the cost of the query and maintain its value by leveraging the

semantic relationships among these words. Moreover, picking a word with low value has zero

marginal return to a participant using preference-based search only (both its value and its cost are

equal to $1). However, if a low-value word has a positive probability of activating high-value

words, then picking a low-value word may have a positive marginal return to a user engaging in

semantic-based search. Figure 1 shows an example in which a participant is asked to search for

“milk”, “cheese”, and “tea”. In Figure 1(a), the participant is forming their query by deciding

which words to use and in which order. In this case, although “cheese” and “tea” are worth more

than “milk”, only “milk” has a strong association with both of the other two words. This implies

that forming the one-word query “milk” may achieve the highest score.

After submitting a search query, on the next page the participant is shown the URL of the link

with the highest score, the list of words that are found on that page, and the score for this task.2

For example, Figure 1(b) is the result after a participant submits the query “tea cheese” in which

they pick the two words with a higher value, consistent with preference-based search. Figure 1(c)

displays the result after submitting the query “milk,” consistent with semantic-based search.

[Insert Figure 1 Here]

Recall that we count whether the word appears anywhere on the actual webpage associated

with the search result, not just the title and snippet provided by Google. Also, it does not matter

how many times each word appears on the webpage associated with a result, as long as it appears

at least once. Participants are not allowed to use any other website while playing the game. We

enforced this by running the study in a lab in which we could observe and control the sites accessed

by participants.

2 Sometimes there are multiple links that give the same maximum score for a query. In this case, we only present one of them.


By manipulating information needs and costs exogenously, our query formation game allows

us to distinguish preference-based search from semantic-based search empirically. However, the setting

of the game is somewhat artificial. Therefore, we view this game as an instrument for testing the

existence of semantic-based search by consumers, but not for measuring the extent to which con-

sumers engage in semantic-based search in real life. An analogy may be made to the experimental

economics literature. Games such as the dictator game are used in this literature to show that

individuals have the potential to behave in ways that are inconsistent with maximizing their own

economic well being, although these games do not quantify the extent of such behavior in real life.

4.2 Methods

We chose nouns as our words. We selected these words mainly from the food domain, because

this is a very common domain on which we expect all participants to have at least some knowledge.

Each participant completed 10 tasks in a random order, i.e., 10 rounds of the game. We formed 10

overlapping sets of three words using the following 14 unique words: caffeine, cake, candy, cheese,

drink, Easter, egg, fish, ketchup, milk, pizza, sugar, tea, tomato. In the study, we randomized

the order in which the words were displayed to participants in each task, in order to avoid any

potential ordering effect. We also varied the word values (β_1, β_2, β_3) by selecting randomly (with

equal probabilities) one of the four sets for each of the 10 tasks and each participant: ($2,$2,$2),

($1,$2,$2), ($2,$1,$2), and ($2,$2,$1).

We formed these 10 sets of words so that different types of queries would be optimal across

tasks. In Table 1, we present the 10 sets, along with their corresponding optimal queries. There are

seven tasks in which there exists one “trigger” word that can activate the other two words in the

search result. For these cases, forming a query using the “trigger” word alone is the only optimal

query, irrespective of the set of word values. The words in the remaining three tasks have weaker

semantic relationships with each other, and forming queries using two words is optimal in these

cases. In these three tasks, different queries may be optimal based on the particular set of word

values, and more than one query may be optimal for a given set of values. Note that the same word may be a “trigger” word in one task and a “non-trigger” word in another task.

[Insert Table 1 Here]

Before running the study, we ran on Google all possible queries that may be formed in each

task, and downloaded the source code of the web pages related to the top 10 results associated with

each query. We scanned all the pages to identify which of the target words were included in each

page. We ran all queries on a single computer to ensure that the results given to participants during

the game would not be dependent on the computer on which the query was run. We used these

results during the game, i.e., we did not actually run any query during the game.3

The score in each task in the study can range from $2 to $5. For each participant, we randomly

chose at the end of the game the score from one of the 10 tasks and paid that amount as a bonus

to the participant, in addition to a $3 show-up fee. After participants finished the game, we also

collected demographic variables, measures of domain knowledge and search experience. Prior

research has shown that these factors might influence how users form their queries on the web

(Holscher and Strube, 2000; Hsieh-Yee, 2001). However, we did not find significant variations on

these measures, which may be because our participants all had similar levels of knowledge and

search experience. Therefore, we did not use these variables in our analysis.

5 Evidence for Semantic-Based Search

We obtained results from N=108 participants recruited at a large university in the northeast of

the United States. We first calculate the total score across the 10 tasks for each participant, and

compare it to the best achievable score for that participant. Figure 2 displays the histogram of

participants’ percentage deviation from the optimal score. The large variation suggests there is

heterogeneity in participants’ query choices. We also report the average score across participants

for each task (i.e., set of words) and each round (i.e., position of the task in the game) in Figure 3. Figure 3(a) indicates that performance varies across tasks. Figure 3(b) shows very stable performance over rounds, which suggests that participants did not learn over time.

3 We also re-ran these queries using different computers, and the optimal queries and results were mostly consistent.

[Insert Figures 2 and 3 Here]

We then analyze the actual queries formed by participants. Table 2 summarizes the distribution

of the length of participants’ queries, crossed with whether the query is optimal. Participants were

most likely to form queries with two words (56%), followed by one word (30%) and three (14%).

Overall, 24% of the queries were optimal. Queries were more likely to be optimal conditional on

having one word: 65% of the one-word queries were optimal. These observations suggest that at

least some participants were able to leverage the semantic relationships between words to increase

their score, and were able to recognize some cases in which a single-word query was optimal.

Additional evidence in support for semantic-based search may be found by looking specifically at

words that were valued at $1. Recall that with probability 0.25, all three words were valued at $2,

and with probability 0.75, two words were valued at $2 and one was valued at $1. We find that in

21% of the cases in which one word was assigned a value of $1, participants formed a one-word

query containing the $1 word. In these situations, the participants favored the $1 word over both

$2 words, which would not be optimal under pure preference-based search. Similarly, when one

word was assigned a value of $1, 32% of the two-word queries contained the $1 word, which was

favored over the third word valued at $2.

However, one may wonder whether this pattern of results may be the result of participants

forming queries randomly. We find that participants formed shorter queries in tasks in which the

optimal query had only one word. The average query length was 1.76 when the optimal query had

one word, vs. 2.01 when the optimal query had two words (p-value < 0.01). Such a pattern of results

is not consistent with participants forming queries completely randomly.

To further test for the existence of semantic-based search, we compare how frequently participants used each word when its value was $2 versus $1, depending on whether it was optimal to

use the word. We find that among all cases in which a word was valued at $1 and it was optimal

to use this word, the word was actually used in 65.19% of the queries. This proportion dropped

to 47.17% among cases in which a word was valued at $1 and it was NOT optimal to use it. A


Chi-square test reveals that these two proportions are significantly different (p-value < 0.01), confirming that at least some consumers have the ability to leverage semantic relationships in search.

However, the fact that the proportions are quite far from 100% and 0% respectively also suggests

that participants did not leverage semantic relationships to their full potential. Among all cases in

which a word was valued at $2 and it was optimal to use this word, the word was used in 63.95%

of the queries. This proportion dropped only slightly to 61.42% when considering cases in which

a word was valued at $2 and it was NOT optimal to use the word. The difference in proportions is

not significant (p-value= 0.20). The fact that the use of $2 words was not significantly affected by

whether it was optimal to use them suggests that despite the existence of semantic-based search,

preference-based search played a large role in driving query formation in our data.
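The proportion comparisons above amount to chi-square tests on 2×2 contingency tables; a sketch with hypothetical cell counts chosen only to match the reported proportions (the actual counts are not given in the text):

```python
from scipy.stats import chi2_contingency

# Hypothetical counts reproducing the reported proportions (65.19% vs. 47.17%).
# Rows: it was optimal / not optimal to use the $1 word;
# columns: the word was used / not used in the query.
table = [[352, 188],   # 352 / 540 = 65.19% used when optimal
         [250, 280]]   # 250 / 530 = 47.17% used when not optimal
chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value)
```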

Finally, we can compare how likely the same word was to be used when it was the “trigger”

(i.e., when it could activate the other two words) vs. not. In our design, five words were used in

two different tasks, and were triggers in only one of these tasks. For two out of these five words,

we observe a significant increase in the probability of being included in the query when the word

is a trigger vs. not (candy: 73% vs. 37%, p-value< 0.01; Easter: 94% vs. 69%, p-value< 0.01).

However there was no significant difference for two words (sugar: 43% vs. 37%, p-value= 0.40;

tomato: 35% vs. 35%, p-value= 1.00), and one word was actually significantly less likely to

be used when it was a trigger (cake: 57% vs. 67%, p-value< 0.05). This further suggests that

although semantic-based search exists, it may not be as prominent as is optimal. This also suggests

that consumers may have erroneous beliefs on semantic relationships, which we will address in

Sections 7 and 8.

[Insert Table 2 Here]

To sum up, we find that performance varies across participants and tasks, but not over time. The

behavior we observe suggests that participants are able to leverage semantic relationships between

words, at least to some extent. However, the choice of whether a given word should be included

in a query seems to have been largely driven by the value of this word, which is consistent with

preference-based search being dominant in our data. Because our study uses a somewhat artificial


lab setting, we do not claim that the extent to which consumers use semantic-based search in

the real world is the same as in our study. Instead, we view our results as proof of existence

that semantic-based search by consumers is relevant. We believe this evidence should be enough

to convince researchers and practitioners that they should consider semantic-based search when

building models that link queries to information needs.

6 Parsimonious Representations of Semantic Relationships

Our results so far suggest that it would be unreasonable for researchers or practitioners to build

models of query formation, or to attempt to learn consumers’ information needs from their queries,

without modeling how these queries are influenced by consumers’ beliefs on the relevant semantic

relationships.

These semantic relationships are captured by the φ parameters in the framework introduced in

Section 3, where φ_{q→s} is the probability that a randomly selected result from the top K results from

query q contains exactly the words in set s ⊆ g. For a search task with three words, we have 24

unknown φ parameters, which are displayed in Table 3. Each row contains the probability of each

possible outcome for a given query, and the sum within each row is one. Because we find that the

top K results always contain the words in the search query (at least in our data), some outcomes

happen with probability zero or one for certain queries. Moreover, the empty set (i.e., no word is

found on the webpage) is an outcome that happens with 0 probability, and it is therefore omitted.

Unfortunately, the number of φ’s grows exponentially with the number of relevant words.

Moreover, these parameters are unique to each domain. Therefore measuring them systematically

would be very computationally costly, and estimating consumers’ beliefs on these probabilities on

a large scale would be practically infeasible. In this section, we address this problem by discussing

three possible approximations of these semantic relationships, which are based on the IR literature

and which we test on actual data from Google. Note that in this section we focus on approximating

the true semantic relationships. We will study consumers’ beliefs on these semantic relationships


in Sections 7 and 8.

[Insert Table 3 Here]

6.1 Independence in Pages, Independence in Queries, and Symmetry Approximations

The first approximation, “independence in pages”, simplifies the semantic relationships from

being from queries to sets of words to being from queries to individual words. Let a_{q→j} denote the activation probability of query q on word t_j. It is defined as the probability of observing word t_j on the webpage of a random top result when submitting q. The number of possible a_{q→j}’s is much smaller than the number of possible φ_{q→s}’s. The “independence in pages” approximation assumes

that the occurrence of a word in a page is independent of the other words on that same page. In that

case, φ_{q→s} may be approximated as the product of the probabilities that each word t_j ∈ s appears in the page and that each word in g \ s does not appear in the page, i.e.,

$$\phi_{q \to s} \approx \prod_{j \in s} a_{q \to j} \prod_{k \in g \setminus s} (1 - a_{q \to k}). \qquad (3)$$

A similar assumption has been used in the traditional IR models. For example, the probability of

relevance framework computes the relevance between a query and a document by assuming that the

document is represented as a vector of different features which are independent events (Robertson

and Zaragoza, 2009).
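A minimal sketch of the “independence in pages” approximation in Equation (3); the activation probabilities below are made up for illustration.

```python
def phi_independence_in_pages(s, g, a_q):
    """Equation (3): approximate the probability that a top result of query q
    contains exactly the words in set `s`, assuming words occur independently
    within a page. `a_q[w]` is the query-to-word activation probability a_{q->w}."""
    prob = 1.0
    for w in g:
        prob *= a_q[w] if w in s else (1.0 - a_q[w])
    return prob

# Illustrative activation probabilities for a three-word task (hypothetical):
a_q = {"milk": 1.0, "cheese": 0.7, "tea": 0.4}
print(phi_independence_in_pages({"milk", "cheese"}, a_q.keys(), a_q))  # 1.0 * 0.7 * 0.6 = 0.42
```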

The second approximation, “independence in queries,” simplifies the activation probabilities

further from being from queries to individual words, to being from individual words to individual

words. It assumes that the activation probabilities from different words to the same target word are

independent from each other, and that the order of these words in a query does not matter. This

is similar to the bag-of-words approach that is commonly used in IR models (see Section 2.2.2), combined with the assumption that different terms within a query are independent of each other, which for example has been assumed in the probability of relevance framework (Robertson and Zaragoza, 2009) and the language model Unigram (Zhai, 2008). Mathematically, the “independence in queries” approximation implies:

$$a_{q \to j} \approx 1 - \prod_{t_k \in q} (1 - a_{k \to j}), \qquad (4)$$

where the activation probability a_{k→j} is defined from word t_k to word t_j, i.e., it is the probability that a top result contains word t_j given the one-word query t_k. The intuition behind Equation (4) is

that the probability of activating a target word equals the probability that at least one of the words

in the query activates this word.
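A corresponding sketch of the “independence in queries” approximation in Equation (4), again with hypothetical word-level activation probabilities.

```python
def activation_independence_in_queries(query, target, a):
    """Equation (4): probability that `query` activates word `target`, assuming
    word-to-word activations are independent and word order is ignored.
    `a[(k, j)]` is the word-level activation probability a_{k->j}."""
    miss = 1.0
    for k in query:
        miss *= 1.0 - a[(k, target)]
    return 1.0 - miss

# Hypothetical word-level activation probabilities:
a = {("cheese", "milk"): 0.8, ("tea", "milk"): 0.5}
print(activation_independence_in_queries(["cheese", "tea"], "milk", a))  # 1 - 0.2 * 0.5 = 0.9
```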

Given both the “independence in pages” and “independence in queries” approximations, the semantic relationships relevant to query formation can be specified based on a directed graph where

nodes represent words, and edges represent asymmetric activation probabilities between pairs of

words. These approximations are extremely convenient, as they reduce the number of parameters

dramatically. With W words, the number of relevant semantic relationships is in the order of 2^{2W} (the number of possible queries is in the order of 2^W and the number of possible sets of words is 2^W), i.e., it grows exponentially with W. With the two independence assumptions, these semantic relationships may be expressed as functions of only W(W − 1) asymmetric activation probabilities.

That is, the number of asymmetric activation probabilities at the word level grows only polynomially with W. For example, for a domain with three words, we only need six parameters (a_{1→2}, a_{2→1}, a_{1→3}, a_{3→1}, a_{2→3}, a_{3→2}). Here we assume that a_{k→k} = 1, i.e., all top K results from a single-word query contain that word (which is true in our data). This assumption may be relaxed, in which case the number of asymmetric activation probabilities would be W^2.

The third and last approximation, “symmetry,” simplifies the activation probabilities further by

assuming that they are symmetric, i.e., a_{j→j′} ≈ a_{j′→j}. If all three approximations were valid, the relevant semantic relationships could be approximated by functions of only W(W − 1)/2 parameters.

Note that as far as we know, this assumption has not been explicitly used in traditional IR models.


6.2 Empirical Test of Approximations

We test the three approximations described above using the true semantic relationships on

Google from Study 1. Specifically, we compute φ_{q→s}, the actual proportion of the top 10 results

from query q that contain exactly the words in set s.

To test the “independence in pages” approximation, we compute a_{q→j}, the proportion of the top 10 results from query q that contain word t_j, for all query-word combinations in Study 1. We estimate a linear regression model where the dependent variable is the true φ_{q→s}, and the regressor is the approximation φ_{q→s}(ind pages) = ∏_{j∈s} a_{q→j} ∏_{k∈g\s} (1 − a_{q→k}), including an intercept.

Because the dependent variable is constrained to be between 0 and 1, we constrain the coefficients

to be between 0 and 1, and their sum to be no greater than 1. None of these constraints are binding, and we ignore these constraints when computing confidence intervals on the coefficients. To

compute confidence intervals, we take into account the fact that observations from the same task

are not independent from each other. For instance, φ_{{1,2}→3} is likely to be correlated with φ_{{1}→3}.

This kind of correlation between observations within a group is also called an intraclass correlation, which will cause the standard errors of the estimates from regular ordinary least squares to be biased. We correct for this using clustered robust standard errors (Rogers, 1994). We find that the estimated model has an intercept of 0.0131 (p-value < 0.02) and a slope of 0.9552 (p-value < 0.01). The R² of the linear model is 0.907. Therefore, we can conclude that at least in this dataset, the “independence in pages” approximation seems to be valid.
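The regression test described above could be reproduced along the following lines, assuming the true φ’s, their approximations, and a task identifier are available as arrays (simulated placeholders are used below); statsmodels’ clustered standard errors stand in for the Rogers (1994) correction, and, as in the text, the non-binding [0,1] constraints are ignored.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data standing in for the Study 1 measurements:
# phi_true would hold the observed phi_{q->s}, phi_approx the independence-in-pages
# approximation, and task_id which of the 10 tasks each observation comes from.
rng = np.random.default_rng(0)
n = 240
task_id = np.repeat(np.arange(10), n // 10)
phi_approx = rng.uniform(0, 1, n)
phi_true = np.clip(phi_approx + rng.normal(0, 0.05, n), 0, 1)

X = sm.add_constant(phi_approx)
ols = sm.OLS(phi_true, X).fit(cov_type="cluster", cov_kwds={"groups": task_id})
print(ols.params)     # intercept and slope (the paper reports 0.0131 and 0.9552)
print(ols.rsquared)   # the paper reports R^2 = 0.907
```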

Given that the “independence in pages” approximation appears to be valid, next we consider

the approximated probabilities based on both the “independence in pages” and the “independence

in queries” approximations. The approximated semantic relationships become φ_{q→s}(ind pages + ind queries) = ∏_{j∈s} a_{q→j} ∏_{k∈g\s} (1 − a_{q→k}), where a_{q→j} = 1 − ∏_{t_k∈q} (1 − a_{k→j}) and a_{k→j} is the proportion of the top 10 results from query t_k that contain word t_j. We then repeat the same regression analysis as above, using φ_{q→s}(ind pages + ind queries) instead of φ_{q→s}(ind pages). We find that the fitted regression line has an intercept of 0.0181 (p-value = 0.40) and a slope of 0.9418 (p-value < 0.01). The R² of the linear model is 0.782. These results suggest that the approximation


φ_{q→s}(ind pages + ind queries) also fits the truth well. Therefore, at least in this dataset, the “independence in pages” and “independence in queries” approximations appear to be jointly valid.

Therefore, the parameters needed to capture the semantic relationships may be reduced to a

much smaller set of pairwise activation probabilities at the word level. For these simple relationships, we finally explore the “symmetry” approximation. For each pair of words t_j and t_j′, we compare a_{j→j′} to a_{j′→j}. With three pairs per task, we have 30 pairs to compare in total. A paired two-sample t-test would not be appropriate here, because the labeling of j vs. j′ is arbitrary, i.e., observations are not naturally split into two samples. Instead, we compare the maximum activation probability max{a_{j→j′}, a_{j′→j}} to an activation probability that is randomly selected between a_{j→j′} and a_{j′→j}. In other words, we consider a hypothetical user who would need to select one word

in order to activate both, and compare the performance achieved when using the optimal word vs.

choosing one word randomly (where each word has an equal probability of being chosen). We use

a bootstrapping approach, where at each iteration we randomly draw 30 pairs of words with replacement. We compute the average activation probability for the two samples (max{a_{j→j′}, a_{j′→j}} vs. the randomly-selected one) at each iteration. Figure 4 displays the sample distribution with

1,000 bootstrapping iterations. We see a large difference between the optimal sample (solid line)

and the random sample (dashed line). This suggests that it would be incorrect to model semantic

relationships based on symmetric activation probabilities.

[Insert Figure 4 Here]
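A sketch of the bootstrap comparison described above; the array of directed activation probabilities is assumed to be available, and all names are illustrative.

```python
import numpy as np

def bootstrap_symmetry_test(pairs, n_iter=1000, seed=0):
    """Compare the mean of max(a_{j->j'}, a_{j'->j}) to the mean of a randomly
    chosen direction, over bootstrap resamples of the word pairs.
    `pairs` has shape (n_pairs, 2) with the two directed activation
    probabilities for each pair (30 pairs in Study 1)."""
    rng = np.random.default_rng(seed)
    pairs = np.asarray(pairs)
    max_means, rand_means = [], []
    for _ in range(n_iter):
        idx = rng.integers(0, len(pairs), size=len(pairs))   # resample with replacement
        sample = pairs[idx]
        max_means.append(sample.max(axis=1).mean())
        picks = rng.integers(0, 2, size=len(sample))          # random direction per pair
        rand_means.append(sample[np.arange(len(sample)), picks].mean())
    return np.array(max_means), np.array(rand_means)
```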

In sum, we find that the “independence in pages” and “independence in queries” approximations seem to be valid, but not the “symmetry” approximation. This means that the set of semantic relationships may be approximated as functions of asymmetric activation probabilities between individual words. Imposing such structure on the semantic relationships enables us to parameterize the semantic relationships more parsimoniously, reducing the dimensionality from being in the order of 2^{2W} to W(W − 1). This will help us estimate consumers’ beliefs on these relationships in the next section. We note that while we focused on the top K = 10 search results per query in

this analysis, we also tested these approximations based on the top 30 and 50 results, and reached


the same conclusions. Details are available from the authors. We also note that we focus on short

queries in this paper (which have been shown to be more common, see Section 2.2), and these

approximations may need to be tested again on longer queries.

7 Consumers’ Beliefs on Activation Probabilities

The previous section suggested that the complex set of semantic relationships relevant to

semantic-based search may be parameterized parsimoniously based on asymmetric activation probabilities at the level of individual words, rather than sets of words. The last key step toward being

able to build models of query formation is to specify consumers’ beliefs on these asymmetric activation probabilities. In particular, if we find that consumers’ beliefs are close to the truth, it may be possible to build models that simply assume consumers’ beliefs are correct, which would address the issue of empirically identifying beliefs from information needs. Alternatively, if we find

that beliefs deviate from the truth in some systematic ways, it may be possible to build models

that express consumers’ beliefs as parsimonious functions of the truth, and that jointly estimate

consumers’ information needs and the (small) set of parameters on which their beliefs depend.

In this section we explore this issue by estimating users’ beliefs based on their query choices

in Study 1. Empirically identifying beliefs from information needs is not an issue in this particular

study, because we manipulated information needs exogenously. In the next section, we report the

results of another experiment in which we measured these beliefs directly.

7.1 Estimating Users’ Beliefs Based on Their Queries: Empirical approach

In order to estimate participants’ beliefs on activation probabilities in Study 1, we specify

a choice model that captures query formation as an outcome of the participant maximizing their

expected payoff from each task, given their beliefs on activation probabilities. In this choice model,

the “utility” of a query, U(q | β_i, {a_{j→j′}}, ρ), is the expected value of the best score among the top K = 10 results retrieved by the query, given the (known) preference vector β_i, the user’s belief on the set of activation probabilities {a_{j→j′}}, and a risk parameter ρ > 0 (ρ = 1 implies risk neutrality,

ρ < 1 risk aversion, and ρ > 1 risk seeking). This utility is derived based on Equation (2), which

specified the expected value of one random top result retrieved by a query. That is, we derive the

closed-form expression for the expected value of the best result, given the expected value of each

result. Details are provided in Appendix 2.

Based on this utility function, we model the formation of a search query in our study using a

multinomial logit model. The probability of choosing query q among Q possible queries can then

be expressed as:

\[
\Pr(q \mid \beta_i, \{a_{j\to j'}\}, \rho, \mu) \;=\; \frac{\exp\big(\mu\, U(q \mid \beta_i, \{a_{j\to j'}\}, \rho)\big)}{\sum_{q'=1}^{Q} \exp\big(\mu\, U(q' \mid \beta_i, \{a_{j\to j'}\}, \rho)\big)} \qquad (5)
\]

where µ is a logit scale parameter. In our data, the parameters β_i are known as they were selected by us and communicated to the participants. The only parameters to estimate are the logit scale parameter µ, the risk parameter ρ, and the beliefs {a_{j→j′}}.

Each pair of words appeared in only one task. Therefore, the likelihood function is separable

in the parameters {a_{j→j′}}, i.e., each parameter enters into the likelihood corresponding to one

task only. Accordingly, we estimate the model for each of the 10 tasks separately. Because each

participant played each task only once, we impose some structure on the heterogeneity across

participants in order to estimate the beliefs at the individual level. We capture heterogeneity across

participants using a latent class approach. For a given task, the likelihood function with S segments

of beliefs can be expressed as:

\[
L(\Theta) \;=\; \prod_{i=1}^{I} \sum_{s=1}^{S} \pi_s \left[ \prod_{q=1}^{Q} \Pr\big(q \mid \beta_i, \{a^s_{j\to j'}\}, \rho, \mu\big)^{d_{iq}} \right] \qquad (6)
\]

where I is the number of participants, π_s is the share of segment s (for identification purposes we assume π_s is decreasing in s, i.e., the first segment has the largest share), d_{iq} ∈ {0,1} denotes whether participant i formed query q, and {a^s_{j→j′}} are the beliefs in segment s. For each task, the


beliefs are captured by six parameters a_{j→j′} for j ≠ j′ ∈ {1,2,3}. We estimate the parameters by maximizing the above likelihood function, while constraining all the parameters a ∈ [0,1].
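For concreteness, the sketch below illustrates how the likelihood in Equation (6) can be evaluated and maximized for one task. It is an illustration only, not the code used in the study: the helper utilities (which maps a candidate set of beliefs and ρ to the I × Q matrix of query utilities derived in Appendix 2), the choice vector choices, the starting values theta0, and the parameter bounds are all hypothetical placeholders.

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, utilities, choices, S):
    """Negative log-likelihood of the latent-class logit in Equation (6) for one task.

    theta packs, for each segment s, the six beliefs a_{j->j'}, followed by the shared
    risk parameter rho, the scale mu, and S-1 unconstrained segment-share parameters.
    choices is a length-I integer array with the index of each participant's query."""
    n_beliefs = 6
    I = len(choices)
    beliefs = [theta[s * n_beliefs:(s + 1) * n_beliefs] for s in range(S)]
    rho = theta[S * n_beliefs]
    mu = theta[S * n_beliefs + 1]
    shares = np.exp(np.append(theta[S * n_beliefs + 2:], 0.0))
    shares = shares / shares.sum()                      # segment shares pi_s
    lik = np.zeros(I)
    for s in range(S):
        U = utilities(beliefs[s], rho)                  # (I, Q) matrix of query utilities
        logits = mu * U
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p = p / p.sum(axis=1, keepdims=True)            # choice probabilities, Equation (5)
        lik += shares[s] * p[np.arange(I), choices]     # mix over latent segments
    return -np.log(lik).sum()

# Estimation for one task: maximize the likelihood with a in [0,1]; the bounds on rho
# and mu below are illustrative, and segments are relabeled afterwards so that pi_1 is largest.
# bounds = [(0.0, 1.0)] * (6 * S) + [(0.01, 3.0), (0.01, 20.0)] + [(-10.0, 10.0)] * (S - 1)
# fit = minimize(neg_log_likelihood, theta0, args=(utilities, choices, S),
#                method="L-BFGS-B", bounds=bounds)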

7.2 Estimation Results

We use the AIC to select the optimal number of segments for each task. In Table 4, we report the

AIC for up to three segments. We can see that S = 1 is optimal for tasks 4, 5, and 10, whereas S = 2

is optimal for all the remaining tasks. We present the parameter estimates from the best model for

each task in Table 5. It seems that overall, participants tend to be slightly risk-averse. For the seven

tasks with two segments, Segment 1 (the larger segment) has relatively weaker beliefs (i.e., the a

parameters have lower values) and a share around 65-85%; and Segment 2 has stronger beliefs. We

also compare the estimated beliefs on activation probabilities for the same pair of words in both

directions. It seems that for some pairs of words (e.g., Task 4 Segment 1) the beliefs are highly

asymmetric, while for others (e.g., Task 1 Segment 1) the beliefs are closer to being symmetric.

[Insert Tables 4 and 5 Here]

7.3 Accuracy of Consumers’ Beliefs and Systematic Biases

We now turn to the question of whether consumers’ beliefs are accurate, and whether there

exist systematic biases in these beliefs. Based on the estimates in Table 5, we calculate the posterior

estimate of each participant’s belief on each activation probability in each task. This gives us 6,480

observations: 6 activation probabilities per task for 10 tasks from a total of 108 participants. The

distribution across all observations of the difference between the estimated belief and the truth is

presented in Figure 5. Positive values mean that participants’ beliefs are larger than the truth. The

distribution is slightly right-tailed, with a mean of 0.0753 and a standard deviation of 0.2450.

[Insert Figure 5 Here]
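For reference, one standard way to compute such posterior estimates in a latent-class model (a sketch with hypothetical names and array shapes, not the study code) is to weight each segment's beliefs by the participant's posterior probability of belonging to that segment, which is proportional to π_s times the likelihood of the participant's observed query under segment s:

import numpy as np

def posterior_beliefs(pi, choice_prob, segment_beliefs):
    """Posterior belief estimates for one task, following the standard latent-class logic.

    pi:              (S,) estimated segment shares
    choice_prob:     (I, S) likelihood of each participant's observed query under each segment
    segment_beliefs: (S, 6) estimated beliefs a^s_{j->j'} for each segment
    Returns an (I, 6) array of posterior belief estimates, one row per participant."""
    weights = pi[None, :] * choice_prob                      # unnormalized posterior membership
    weights = weights / weights.sum(axis=1, keepdims=True)   # P(segment s | participant i's query)
    return weights @ segment_beliefs                         # posterior-weighted segment beliefs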

Next we explore systematic biases in consumers’ beliefs. Aitchison (2012) suggests that hu-

mans form semantic relationships from their mental lexicons which are developed through edu-

cation, experience, and context, based on word similarity in meaning or sound. Hence, if two


words share strong similarity, consumers will form strong associations between them, no matter

which one is the target word in a directional relationship. This suggests that consumers’ beliefs

on activation probabilities may not be asymmetric enough, which is consistent with mental dis-

tance approaches in cognitive psychology (Shepard, 1962). Mental distance approaches assume

that concepts may be represented as points within a mental space. Similarity between concepts is

related to the distance between them in that space, which is symmetric. For example, if a consumer

feels that Coke is similar to Pepsi, then it follows that they must feel Pepsi to be equally similar

to Coke. However, other theories, such as featural approaches (Tversky, 1977), suggest that con-

sumers’ representations are asymmetric. For example, Johnson (1986) found that a consumer may

find Coke to be very similar to Pepsi and at the same time Pepsi to be less similar to Coke. In sum,

we may expect consumers’ beliefs to have some level of asymmetry, but to be less asymmetric

than the truth.

To test and quantify such bias, we estimate a model that specifies user i's belief a_{i,j→j′} as a function of the truth a_{j→j′} and the relationship in the opposite direction a_{j′→j}. This model is developed based on the truth and bias (T&B) model of judgment proposed by West and Kenny (2011). In addition to the potential bias toward a_{j′→j}, we allow for bias toward the ends of the scale. Accordingly, we build the following constrained linear model, where consumers are indexed by i:

\[
a_{i,j\to j'} = b_0 \times 0 + b_1 \times 1 + b_{\text{truth}} \times a_{j\to j'} + b_{\text{opposite}} \times a_{j'\to j} + \varepsilon_{ijj'} . \qquad (7)
\]

In this model, the parameters b_0 and b_1 capture directional bias toward 0 and 1, respectively; the parameter b_truth denotes what West and Kenny (2011) refer to as the truth force; and the parameter b_opposite corresponds to what they refer to as a bias force. The random error term ε_{ijj′} is assumed to have mean zero. We impose the following constraints: b_0 + b_1 + b_truth + b_opposite = 1, and all the coefficients are non-negative. These constraints guarantee that the fitted beliefs fall within [0,1] in expectation.

We estimate this model by minimizing the sum of squared residuals subject to the above con-

straint on the parameters, based on the beliefs estimated in Study 1. We treat the estimation prob-

lem as a constrained quadratic programming problem. As quadratic programming can only give


us a point estimate, we use bootstrapping to estimate the standard errors of the coefficients. One

issue here again is that observations are not independent. Specifically, all observations related to

the same pair of words are correlated with each other. In order to preserve such nested correlation

structure in resampling, we adopt block bootstrapping for clustered data (Field and Welsh, 2007;

Ren et al., 2010). We sample among the 30 pairs of words with replacement, and for each pair

consider all observations across participants in both directions. We then obtain the solution to the

quadratic programming for this resampled data. Our inference is based on the parameter estimates

from 1,000 bootstrap iterations. We find that

\[
a_{i,j\to j'} = 0.1051 + 0.7442\, a_{j\to j'} + 0.1005\, a_{j'\to j} + \varepsilon_{ijj'},
\]

where the intercept 0.1051 corresponds to b_1, and b_0 = 0.0502 (the four coefficients sum to one). All the coefficients are statistically significant (p-value < 0.01). These results indicate that participants' beliefs have significant directional bias toward both ends of the scale,

especially toward 1, which is consistent with the observations in Figure 5. While the truth force

plays the dominant role in the model, the bias force toward the activation probability in the opposite

direction also seems to have a significant impact on consumers' beliefs. The residuals of the fitted model have a mean of -0.0012 and a standard deviation of 0.2669. The mean squared error is 0.0712, and the mean absolute value of the residuals is 0.1881.
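To make this procedure explicit, the following sketch (again illustrative rather than the study code) fits the constrained regression of Equation (7) with a quadratic-programming-style solver and computes block-bootstrap standard errors. The data layout, with one block of observations per unordered pair of words holding the columns [belief, truth, opposite], is an assumption on our part.

import numpy as np
from scipy.optimize import minimize

def fit_tb_model(y, truth, opposite):
    """Constrained least squares for Equation (7):
    y = b0*0 + b1*1 + b_truth*truth + b_opposite*opposite + error,
    subject to b0 + b1 + b_truth + b_opposite = 1 and all coefficients >= 0."""
    X = np.column_stack([np.zeros_like(truth), np.ones_like(truth), truth, opposite])
    sse = lambda b: np.sum((y - X @ b) ** 2)
    res = minimize(sse, x0=np.full(4, 0.25), method="SLSQP",
                   bounds=[(0.0, 1.0)] * 4,
                   constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}])
    return res.x                                   # [b0, b1, b_truth, b_opposite]

def block_bootstrap(pair_blocks, n_iter=1000, seed=0):
    """Block bootstrap over pairs of words: resample the pairs with replacement and
    keep all observations of a drawn pair together.
    pair_blocks is a list of arrays, one per unordered pair, with columns [belief, truth, opposite]."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_iter):
        idx = rng.integers(0, len(pair_blocks), size=len(pair_blocks))
        sample = np.vstack([pair_blocks[i] for i in idx])
        draws.append(fit_tb_model(sample[:, 0], sample[:, 1], sample[:, 2]))
    return np.asarray(draws)                       # column-wise std gives bootstrap standard errors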

8 Study 2

Our findings so far may be summarized as follows. Consumers leverage semantic relationships

between words at least to some extent when forming search queries. Moreover, the relevant se-

mantic relationships may be simplified as functions of asymmetric activation probabilities between

individual words. When compared to the true activation probabilities, consumers’ beliefs appear

to be slightly biased upward, and to be not asymmetric enough. In Study 2 we test this last re-

sult further by measuring consumers’ beliefs directly (in an incentive-aligned manner), instead of

inferring them from query choices as in Study 1.


8.1 Design

We measured participants’ beliefs on activation probabilities directly, for example by asking

them to “estimate how many of the top 10 search results from the query egg on Google contain the

word fish”. Again, we told participants that a search result contains the word if it appears on the

actual page, and not just the description provided by Google. Participants chose a number between

0 and 10 as their answers, i.e., they entered their best guess of the number of results containing

the target word. We formed 30 pairs of words using the same words as in Study 1, giving us 60

possible questions. We chose the pairs of words based on the true activation probabilities, to have a

large range of a_{j→j′} and a large range of |a_{j→j′} − a_{j′→j}| across pairs. Each participant answered 30

questions that were randomly selected from the pool. After participants answered all 30 questions,

we presented them with the correct answers for all these questions (all at once). The correct answer

was derived using the same approach as in Study 1. At the end of the survey, we collected demographics,

and also asked participants to describe how they made decisions for these questions. This study

was incentive-aligned. In addition to a $3 show-up fee, each participant won a $0.20 cash bonus

for each correct answer.

8.2 Results

We conducted this study in a lab experiment with N=206 participants. We first analyze partic-

ipants’ introspective statements about how they selected their answers. The most commonly used

words in these statements include “relationship”, “correlations”, “related”, “association”, “simi-

larity”, and “match.” Very few participants mentioned directionality as a factor. Hence, it seems

that participants made choices by relying mostly on the general association and similarity between

words, rather than the directional relationship. This finding is consistent with our previous findings

that consumers’ beliefs on activation probabilities are not asymmetric enough.

Among the 60 questions we selected, 0 is the correct answer for about one third. Hence,

choosing 0 could be an attractive strategy for participants in this study. However, we found that

all participants gave at least three different answers across questions. Therefore, no participant


blindly selected 0 as their answer to all questions.

We now compare participants’ beliefs to the true activation probabilities. The distribution

(across participants and pairs) of the difference between participants’ beliefs and the truth is dis-

played in Figure 6. The distribution has a mean of 0.1297 and a standard deviation of 0.3005. Note

that the difference has a larger positive mean than was found in Study 1, i.e., the upward bias is

more severe. We again estimate the T&B model (7) based on participants’ answers in Study 2,

using quadratic programming and block bootstrapping as we did in Study 1. The estimation results below are based on 1,000 bootstrap iterations:

\[
a_{i,j\to j'} = 0.1549 + 0.5481\, a_{j\to j'} + 0.2960\, a_{j'\to j} + \varepsilon_{ijj'}
\]

and b_0 = 0.0010. All the coefficient estimates are statistically significant (p-value < 0.01). The fitted error term has a mean of -0.0076 and a standard deviation of 0.2818. The mean squared error is 0.0795, and the mean absolute value of the residuals is 0.2338. We can see that there is a larger bias toward 1 compared to Study 1, which is consistent with what we observed in Figure 6. Compared to Study 1, we also find an even stronger effect of the opposite relationship a_{j′→j} on participants'

beliefs, which is also consistent with participants’ self-statements on how they made decisions.

Therefore, the results of Study 2 are directionally similar to those in Study 1, although the bias was

more severe in Study 2.

[Insert Figure 6 Here]

9 Conclusions

To the best of our knowledge, our study is the first to document semantic-based search by

consumers on online search engines. We find that consumers do not necessarily form queries that

simply reflect their information needs. Rather, they have the ability to leverage semantic rela-

tionships in order to improve the efficiency of their queries. We show that the relevant semantic

relationships may be approximated parsimoniously as functions of asymmetric activation proba-


bilities at the word level. We further show that consumers’ beliefs on these activation probabilities

tend to be biased upward, and that they are not asymmetric enough.

The combination and integration of these results suggest a framework for building models of

query formation. In particular:

• Models of query formation should capture semantic-based search by consumers.

• In order to capture semantic-based search, it is necessary to capture consumers’ beliefs on

a set of relevant semantic relationships. The set of relevant semantic relationships grows

exponentially with the number of relevant words.

• These relevant semantic relationships may be approximated as functions of asymmetric ac-

tivation probabilities at the level of individual words, based on the "independence in pages" and "independence in queries" approximations. This set of asymmetric activation probabili-

ties grows only polynomially with the number of relevant words.

• Consumers’ beliefs on the asymmetric activation probabilities may be specified as a function

of an intercept, the true activation probabilities, and the activation probabilities in the other

direction.

This proposed framework, by parameterizing beliefs very parsimoniously, allows building

models that empirically identify information needs from beliefs, at least parametrically. In this

paper we modeled consumers’ beliefs based on the truth and bias (T&B) model (West and Kenny,

2011). Future research may explore alternative modeling frameworks and identify additional co-

variates that influence consumers’ beliefs.

More generally, we hope our research will facilitate the development of marketing search mod-

els that go beyond discrete choices to incorporate text-based search. In particular, in today’s en-

vironment search is primarily text-based, and marketing models of search should be adapted to

capture this reality. It is important to keep in mind that despite being ubiquitous, online search

queries are often only one type of behavior over the whole search path. Therefore, future search

models in marketing may combine text-based search with discrete-choice based search.


Future research may also explore the extent to which models of query formation need to be

specific to each search engine. We focused on Google as it is by far the most common search

engine with a worldwide market share of 88.44% (Statista, 2015b). However, if consumers adjust

their query formation strategies from one search engine to the other, semantic-based search may

be more or less relevant across search engines. In addition, the approximations of the semantic

relationships that we tested on Google would need to be verified on alternative search engines, and

the bias in consumers’ beliefs on activation probabilities may vary across search engines. Similarly,

our research could be extended to other online websites that have a search function, e.g., YouTube

and Amazon.

Finally, future research may combine our approach with more complex language models. In

our analysis, for practical reasons we assumed that the value of a webpage was a linear and additive

function of dummy variables capturing the presence of individual words. The framework outlined

above is compatible with any alternative specification. However, one challenge with field data is

that the number of relevant words may be very large, giving rise to an unmanageable number of

parameters to estimate. This dimensionality may be reduced, using for example topic modeling

(Blei et al., 2003; Tirunillai and Tellis, 2014). In that case, semantic relationships and activation

probabilities may be specified at the level of topics rather than individual words, using a similar

approach as the one outlined above. In particular, the relevant semantic relationships would de-

scribe the link between the topic distribution in the search query and the topic distribution in the

search results; and activation probabilities may be replaced with parameters that capture how the

weight on a given topic in a query influences the weight of another topic in the search results.
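As a rough illustration of this idea (entirely a sketch on our part, with a made-up topic-level activation matrix and arbitrary numbers), the topic analogue of activation probabilities could be represented as a matrix that maps the topic weights of a query to the expected topic weights of its results:

import numpy as np

# Hypothetical topic-level activation matrix A: entry A[k, l] captures how much
# weight on topic k in the query contributes to the weight of topic l in the results.
A = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.1, 0.6]])

query_topics = np.array([0.6, 0.3, 0.1])   # topic distribution of a query (e.g., from LDA)
result_topics = query_topics @ A           # implied topic distribution of the results
result_topics /= result_topics.sum()       # renormalize to a proper distribution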

Going back to the IR literature, such ideas can also be used to extend the existing semantic

matching algorithms (Kurland and Lee, 2004; Wei and Croft, 2006), by relaxing the assumption

that the topic distribution in a query is a direct representation of the user’s information needs.

Instead, the topic distribution that captures the user’s true underlying information needs may be

inferred from the query, and documents may be found that match that topic distribution rather than

the query’s topic distribution.


References

Aitchison, J. (2012). Words in the mind: An introduction to the mental lexicon. John Wiley &

Sons.

Azzopardi, L., D. Kelly, and K. Brennan (2013). How query cost affects search behavior. Pro-

ceedings of the 36th international ACM SIGIR conference on Research and development in

information retrieval, 23–32.

Berners-Lee, T., J. Hendler, O. Lassila, et al. (2001). The semantic web. Scientific american 284(5),

28–37.

Blei, D. M., A. Y. Ng, and M. I. Jordan (2003). Latent dirichlet allocation. the Journal of machine

Learning research 3, 993–1022.

Broder, A. (2002). A taxonomy of web search. In ACM Sigir forum, Volume 36, pp. 3–10. ACM.

Collins, A. M. and E. F. Loftus (1975). A spreading-activation theory of semantic processing.

Psychological review 82(6), 407.

Dzyabura, D. (2013). The role of changing utility in product search. Available at SSRN 2202904.

Erdem, T., M. P. Keane, T. S. Oncu, and J. Strebel (2005). Learning about computers: An analysis

of information search and technology choice. Quantitative Marketing and Economics 3(3),

207–247.

Field, C. A. and A. H. Welsh (2007). Bootstrapping clustered data. Journal of the Royal Statistical

Society: Series B (Statistical Methodology) 69(3), 369–390.

Fleishman-Hillard (2012). 2012 fleishman-hillard digital influence index. Available at

http://www.harrisinteractive.com.

Fu, W.-T. and P. Pirolli (2007). Snif-act: A cognitive model of user navigation on the world wide

web. Human–Computer Interaction 22(4), 355–412.


GE (2013). Ge capital retail bank major purchase shopper study. Available from

http://www.businesswire.com/ .

Guha, R., R. McCool, and E. Miller (2003). Semantic search. In Proceedings of the 12th interna-

tional conference on World Wide Web, pp. 700–709. ACM.

Holscher, C. and G. Strube (2000). Web search behavior of internet experts and newbies. Computer

networks 33(1), 337–346.

Hsieh-Yee, I. (2001). Research on web search behavior. Library & Information Science Re-

search 23(2), 167–185.

Hui, S. K., E. T. Bradlow, and P. S. Fader (2009). Testing behavioral hypotheses using an in-

tegrated model of grocery store shopping path and purchase behavior. Journal of consumer

research 36(3), 478–493.

Jansen, B. J., D. Booth, and B. Smith (2009). Using the taxonomy of cognitive learning to model

online searching. Information Processing & Management 45(6), 643–663.

Jansen, B. J., D. L. Booth, and A. Spink (2008). Determining the informational, navigational, and

transactional intent of web queries. Information Processing & Management 44(3), 1251–1266.

Jansen, B. J., A. Spink, A. Pfaff, and A. Goodrum (2000). Web query structure: Implications for ir

system design. In Proceedings of the 4th World Multiconference on Systemics, Cybernetics and

Informatics (SCI 2000), pp. 169–176.

Jeziorski, P. and I. Segal (2010). What makes them click: Empirical analysis of consumer de-

mand for search advertising. Technical report, Working papers//the Johns Hopkins University,

Department of Economics.

Johnson, M. D. (1986). Consumer similarity judgments: A test of the contrast model. Psychology

& Marketing 3(1), 47–60.


Kamvar, M. and S. Baluja (2006). A large scale study of wireless search behavior: Google mobile

search. In Proceedings of the SIGCHI conference on Human Factors in computing systems, pp.

701–709. ACM.

Khoo, C. S. and J.-C. Na (2006). Semantic relations in information science. Annual Review of

Information Science and Technology (40), 157–228.

Kim, J. B., P. Albuquerque, and B. J. Bronnenberg (2010). Online demand under limited consumer

search. Marketing Science 29(6), 1001–1023.

Kurland, O. and L. Lee (2004). Corpus structure, language models, and ad hoc information re-

trieval. In Proceedings of the 27th annual international ACM SIGIR conference on Research

and development in information retrieval, pp. 194–201. ACM.

Li, H. and J. Xu (2013). Semantic matching in search. Foundation and Trends in Informational

Retrieval 7(5), 343–469.

Mangold, C. (2007). A survey and classification of semantic search approaches. International

Journal of Metadata, Semantics and Ontologies 2(1), 23–34.

Manning, C. D., P. Raghavan, and H. Schutze (2008). Introduction to information retrieval, Vol-

ume 1. Cambridge university press Cambridge.

Narayanan, S. and K. Kalyanam (2011). Measuring position effects in search advertising: A

regression discontinuity approach. Technical report, Working Paper.

Park, J. and H. Chung (2009). Consumers travel website transferring behaviour: analysis using

clickstream data-time, frequency, and spending. The Service Industries Journal 29(10), 1451–

1463.

Pirolli, P. and S. Card (1999). Information foraging. Psychological review 106(4), 643.

Pirolli, P. L. (2007). Information foraging theory: Adaptive interaction with information. Oxford

University Press.


Raaijmakers, J. G. and R. M. Shiffrin (1981). Search of associative memory. Psychological re-

view 88(2), 93.

Ren, S., H. Lai, W. Tong, M. Aminzadeh, X. Hou, and S. Lai (2010). Nonparametric bootstrapping

for hierarchical data. Journal of Applied Statistics 37(9), 1487–1498.

Robertson, S. and H. Zaragoza (2009). The probabilistic relevance framework: BM25 and beyond.

Now Publishers Inc.

Roelleke, T. (2013). Information retrieval models: Foundations and relationships. Synthesis Lec-

tures on Information Concepts, Retrieval, and Services 5(3), 1–163.

Rogers, W. (1994). Regression standard errors in clustered samples. Stata technical bulletin 3(13).

Rose, D. E. and D. Levinson (2004). Understanding user goals in web search. In Proceedings of

the 13th international conference on World Wide Web, pp. 13–19. ACM.

Ruthven, I. (2003). Re-examining the potential effectiveness of interactive query expansion. In

Proceedings of the 26th annual international ACM SIGIR conference on Research and develop-

ment in informaion retrieval, pp. 213–220. ACM.

Salton, G. and M. J. McGill (1986). Introduction to modern information retrieval.

Santos, R. L., C. Macdonald, and I. Ounis (2015). Search result diversification. Foundations and

Trends in Information Retrieval 9(1), 1–90.

Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown

distance function. i. Psychometrika 27(2), 125–140.

Shi, S. W. and M. Trusov (2013). The path to click: Are you on it? working paper.

Spink, A., D. Wolfram, M. B. Jansen, and T. Saracevic (2001). Searching the web: The public and

their queries. Journal of the American society for information science and technology 52(3),

226–234.


Statista (2015a). Digital marketing spending in the united states from 2014 to 2019. Avail-

able from http://www.statista.com/statistics/275230/us-interactive-marketing-spending-growth-

from-2011-to-2016-by-segment.

Statista (2015b). Global market share of search engines 2010-2015. Available from

http://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/ .

Tirunillai, S. and G. J. Tellis (2014). Mining marketing meaning from online chatter: Strategic

brand analysis of big data using latent dirichlet allocation. Journal of Marketing Research 51(4),

463–479.

Tversky, A. (1977). Features of similarity. Psychological Review 84(4), 327–352.

Wei, X. and W. B. Croft (2006). Lda-based document models for ad-hoc retrieval. In Proceed-

ings of the 29th annual international ACM SIGIR conference on Research and development in

information retrieval, pp. 178–185. ACM.

West, T. V. and D. A. Kenny (2011). The truth and bias model of judgment. Psychological

review 118(2), 357.

Wu, W.-C., D. Kelly, and A. Sud (2014). Using information scent and need for cognition to under-

stand online search behavior. In Proceedings of the 37th international ACM SIGIR conference

on Research & development in information retrieval, pp. 557–566. ACM.

Yang, L., O. Toubia, and M. G. De Jong (2015). A bounded rationality model of information

search and choice in preference measurement. Journal of Marketing Research 52(2), 166–183.

Yoganarasimhan, H. (2015). Search personalization using machine learning. Available at SSRN

2590020.

Zhai, C. (2008). Statistical language models for information retrieval. Synthesis Lectures on

Human Language Technologies 1(1), 1–141.


Tables

Table 1: Search Tasks in Study 1 and Optimal Queries

Task   t1        t2         t3        Optimal Query
1      candy     caffeine   sugar     "candy"
2      fish      tea        tomato    two words*
3      milk      cheese     tea       "milk"
4      Easter    candy      egg       "Easter"
5      tomato    drink      pizza     "tomato"
6      Easter    caffeine   ketchup   two words*
7      sugar     cake       pizza     "sugar"
8      egg       candy      drink     two words*
9      cake      cheese     Easter    "cake"
10     ketchup   cake       tomato    "ketchup"

Note: For the seven tasks in which the optimal query has a single word, this trigger word is labeled as t1. In the study, words were always shown to participants in a random order. * indicates that the optimal query depends on the value assigned to each word.

Table 2: Number of Queries with Different Lengths and Optimality in Study 1

Query Length    Not Optimal   Optimal   Row Total   Percentage
1               180           149       329         30%
2               498           106       604         56%
3               147           0         147         14%
Column Total    825           255       1,080
Percentage      76%           24%                   100%


Table 3: Original Parameterization of Semantic Relationships for a Task g = (t1, t2, t3)

Query       {t1}      {t2}      {t3}      {t1,t2}           {t1,t3}           {t2,t3}           {t1,t2,t3}
t1          φ_{1→1}   0         0         φ_{1→{1,2}}       φ_{1→{1,3}}       0                 φ_{1→{1,2,3}}
t2          0         φ_{2→2}   0         φ_{2→{1,2}}       0                 φ_{2→{2,3}}       φ_{2→{1,2,3}}
t3          0         0         φ_{3→3}   0                 φ_{3→{1,3}}       φ_{3→{2,3}}       φ_{3→{1,2,3}}
t1 t2       0         0         0         φ_{(1,2)→{1,2}}   0                 0                 φ_{(1,2)→{1,2,3}}
t2 t1       0         0         0         φ_{(2,1)→{1,2}}   0                 0                 φ_{(2,1)→{1,2,3}}
t1 t3       0         0         0         0                 φ_{(1,3)→{1,3}}   0                 φ_{(1,3)→{1,2,3}}
t3 t1       0         0         0         0                 φ_{(3,1)→{1,3}}   0                 φ_{(3,1)→{1,2,3}}
t2 t3       0         0         0         0                 0                 φ_{(2,3)→{2,3}}   φ_{(2,3)→{1,2,3}}
t3 t2       0         0         0         0                 0                 φ_{(3,2)→{2,3}}   φ_{(3,2)→{1,2,3}}
t1 t2 t3    0         0         0         0                 0                 0                 1
t1 t3 t2    0         0         0         0                 0                 0                 1
t2 t1 t3    0         0         0         0                 0                 0                 1
t2 t3 t1    0         0         0         0                 0                 0                 1
t3 t1 t2    0         0         0         0                 0                 0                 1
t3 t2 t1    0         0         0         0                 0                 0                 1

Note: with three words, there are 15 possible queries (ordered sets of words) and 7 possible unordered sets of words to be found on the result pages. φ_{q→s} is the probability of activating exactly the words in s with query q.

Table 4: AIC for Different Tasks and Numbers of Segments in Study 1

# Segments \ Task     1     2     3     4     5     6     7     8     9    10
1                   510   555   532   384   469   565   487   531   550   454
2                   493   515   515   419   471   504   472   509   542   464
3                   502   526   529   423   497   517   487   559   555   470


Table 5: Estimates of Participants' Beliefs for Each Task in Study 1

             Task 1   Task 2   Task 3   Task 4   Task 5   Task 6   Task 7   Task 8   Task 9   Task 10
Segment 1
 a_{1→2}      0.000    0.056    0.047    0.870    0.999    0.070    0.023    0.055    0.086    0.061
 a_{2→1}      0.019    0.010    0.095    0.020    0.077    0.073    0.152    0.034    0.023    0.003
 a_{1→3}      0.115    0.027    0.092    0.879    0.019    0.049    0.086    0.007    0.036    0.997
 a_{3→1}      0.084    0.042    0.081    0.075    0.821    0.000    0.000    0.041    0.128    0.127
 a_{2→3}      0.088    0.017    0.006    0.144    0.117    0.025    0.028    0.071    0.085    0.099
 a_{3→2}      0.130    0.013    0.067    0.000    0.178    0.001    0.061    0.016    0.042    0.078
Segment 2
 a_{1→2}      0.130    0.269    0.694      -        -      0.278    0.964    0.148    0.111      -
 a_{2→1}      0.123    0.998    0.997      -        -      0.642    0.915    0.430    0.992      -
 a_{1→3}      0.985    0.436    0.201      -        -      0.428    0.909    0.961    0.960      -
 a_{3→1}      0.178    0.283    0.995      -        -      0.999    0.215    0.999    0.994      -
 a_{2→3}      0.989    0.079    0.097      -        -      0.163    0.842    0.244    0.105      -
 a_{3→2}      0.544    0.399    0.103      -        -      0.101    0.999    0.122    0.097      -
 π_1          0.668    0.774    0.715      -        -      0.762    0.843    0.788    0.783      -
 ρ            0.896    1.035    0.904    0.838    0.863    0.938    0.934    0.985    0.882    1.014
 µ            6.930    4.629    5.793    3.113    3.168    6.671    6.384    5.586    6.385    2.860

Note: Segment 2 beliefs and π_1 are reported only for the seven tasks for which the two-segment model was selected (see Table 4).


Figures


Figure 1: Search Query Game Interface in Study 1

Figure (a) is the game interface where a participant forms a search query given the set of words, their values ($1 or $2 per word) and costs ($1 per word). The participant decides which word(s) to use and in which order. Figures (b) and (c) show the screens the participant will see after submitting the queries "tea cheese" (b) and "milk" (c). The participant is shown the search result with the highest score (score = value - cost), the list of relevant words found on its webpage, and the corresponding score.


Figure 2: Distribution across Participants of Deviation from Optimal Total Score - Study 1

We calculate the total score across the 10 tasks for each participant, and compare it to the best achievable score for that participant.


Figure 3: Average Performance across Tasks and Rounds in Study 1

We compute the average score across participants for each task (i.e., set of words) and each round (i.e., position of the task). Figure (a) indicates that performance varies across tasks. Figure (b) shows very stable performance over rounds, which suggests that participants did not learn over time.


Figure 4: Testing the Symmetry of Activation Probabilities in Study 1

For each pair of words, we compare the maximum activation probability max{a_{j→j′}, a_{j′→j}} to an activation probability that is randomly selected between a_{j→j′} and a_{j′→j}. We use a bootstrapping approach (with 1,000 iterations), where at each iteration we randomly draw 30 pairs of words with replacement. We compute the average activation probability for the two samples (max{a_{j→j′}, a_{j′→j}} vs. the randomly-selected one) at each iteration. We see a large difference between the optimal sample (solid line) and the random sample (dashed line). This suggests that it would be incorrect to model semantic relationships based on symmetric activation probabilities.


Figure 5: Distribution across Participants and Pairs of Words of Deviation from Truth - Study 1

Based on the estimates in Table 5, we calculate the posterior estimates of each participant's beliefs on each activation probability in each task, and compare the estimated beliefs to the truth.

Figure 6: Distribution across Participants and Pairs of Words of Deviation from Truth - Study 2

We compare participants’ beliefs (measured directly) to the true activation probabilities.


Appendix 1: Instruction Page for Search Query Game


Appendix 2: Choice Model Derivations

Note that the following analysis is specific to a given task g.

Let Y_q = {y_1, ..., y_K} be random variables that denote the value of the top K results retrieved by query q. Each result contains one of the seven possible sets of words shown in Table 3. Given the (known) preference vector β_i for a consumer i, we can calculate the value v_s(β_i) corresponding to each possible set s of words, and order these values from the smallest to the largest as {v_[1](β_i), v_[2](β_i), ..., v_[N](β_i)}, where N is the number of unique values among the seven cases (N is less than 7 if some sets of words have the same value). The expected utility from query q is

therefore written as a function of the expected score of its best result, i.e.,

\[
U(q \mid \beta_i, \{a_{j\to j'}\}, \rho) \;=\; \sum_{n=1}^{N} \Pr\big(\max\{y_1,\ldots,y_K\} = v_{[n]}(\beta_i) \mid q, \{a_{j\to j'}\}\big)\,\big(v_{[n]}(\beta_i) - c(q)\big)^{\rho} \qquad (8)
\]

where a_{j→j′} is the consumer's belief on the activation probability from word t_j to word t_{j′}, ρ > 0 is a risk parameter (ρ = 1 implies risk neutrality, ρ < 1 risk aversion, and ρ > 1 risk seeking), and the cost c(q) equals the length of query q (recall that in our game the cost of a query was equal to its number of words).

In order to compute the expected utility from query q, we need to compute Pr(max{y_1, ..., y_K} = v_[n](β_i) | q, {a_{j→j′}}) for each possible value v_[n](β_i). That is, we need to compute the probability distribution of the score of the best result from query q.

First, based on the "independence in pages" and "independence in queries" approximations, we express consumers' beliefs on the semantic relationships at the level of sets of words {φ_{q→s}} as a function of the beliefs at the word level {a_{j→j′}}. This approximation is given by Equations (3) and (4). Let f_n denote the probability that a random top result from query q has value v_[n](β_i) based on these beliefs (let f_0 = 0). We compute this probability by simply summing φ_{q→s} over the sets s whose value v_s(β_i) equals v_[n](β_i), i.e.:

\[
f_n \;=\; \sum_{s} \phi_{q\to s}\, I\big(v_s(\beta_i) = v_{[n]}(\beta_i)\big) \qquad (9)
\]


We can then write the cumulative distribution function of max{y_1, ..., y_K} as:

\[
\Pr\big(\max\{y_1,\ldots,y_K\} \le v_{[n]}(\beta_i) \mid q, \{a_{j\to j'}\}\big) \;=\; \Big(\sum_{i \le n} f_i\Big)^{K} \qquad (10)
\]

That is, the probability that the best search result has value less than or equal to v_[n](β_i) is the probability that all K search results have value less than or equal to v_[n](β_i). Given this cumulative distribution function, we can now derive the probability that the best result has exactly a value of v_[n](β_i) as:

\[
\Pr\big(\max\{y_1,\ldots,y_K\} = v_{[n]}(\beta_i) \mid q, \{a_{j\to j'}\}\big) \;=\; \Big(\sum_{i \le n} f_i\Big)^{K} - \Big(\sum_{i \le n-1} f_i\Big)^{K} \qquad (11)
\]

Plugging Equation (11) into Equation (8) provides a closed-form expression of the expected

utility from each possible query.
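To make this closed form concrete, the following minimal sketch (our own illustration; the function name and the example inputs are hypothetical, and the outcome probabilities φ_{q→s} are assumed to have already been computed from the beliefs {a_{j→j′}} via Equations (3) and (4)) evaluates Equations (9)-(11) and then Equation (8):

import numpy as np

def expected_utility(phi, values, cost, rho=1.0, K=10):
    """Expected utility of a query, Equation (8), from the distribution of its best result.

    phi:    probabilities phi_{q->s} of activating each possible set of words s
    values: the value v_s(beta_i) of each of those sets for consumer i
    cost:   c(q), the number of words in the query (scores are assumed non-negative)"""
    phi, values = np.asarray(phi), np.asarray(values)
    v_sorted = np.unique(values)                                # v_[1] < ... < v_[N]
    f = np.array([phi[values == v].sum() for v in v_sorted])    # Equation (9)
    cdf = np.cumsum(f) ** K                                     # Equation (10)
    p_max = np.diff(np.concatenate(([0.0], cdf)))               # Equation (11)
    return np.sum(p_max * (v_sorted - cost) ** rho)             # Equation (8)

# Illustrative (hypothetical) inputs for a one-word query in a three-word task:
# phi = [0.60, 0.25, 0.10, 0.05]      # probabilities of the four sets the query can activate
# values = [1.0, 3.0, 2.0, 4.0]       # v_s(beta_i) of those sets
# print(expected_utility(phi, values, cost=1, rho=0.9))

The resulting utility can then be evaluated for every candidate query and plugged into the choice probabilities of Equation (5).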
