
The Use of Query Reformulation to Predict Future User Actions

Date post: 21-Jan-2015
Upload: jim-jansen
Transcript
Page 1: The Use of Query Reformulation to Predict Future User Actions

Using Query Reformulation for User Profiling

Jim Jansen

College of Information Sciences and Technology The Pennsylvania State University

[email protected]

Interested in how much descriptive information we can generate about people by leveraging search log data.

Page 2: The Use of Query Reformulation to Predict Future User Actions

What Did We Find Out?

We can tell quite a lot about a user!

When combined with other information, query reformulation is a revealing searching characteristic.

Page 3: The Use of Query Reformulation to Predict Future User Actions

The State of Web Search

Why search data is important

Page 4: The Use of Query Reformulation to Predict Future User Actions

The Power of Search and the Web

Sources: comScore, U.S., Feb. ’06; Stanford Institute for the Quantitative Study of Society, Nov. ’05

• Search is the top online activity

• Search drives over 7 billion monthly queries in the U.S.

• Online activity has a huge impact on people’s daily lives:
– 70 minutes less with family
– 30 minutes less TV
– 8.5 minutes less sleep

Page 5: The Use of Query Reformulation to Predict Future User Actions

Analysis of Search Marketplace

comScore Core Search Report*, Aug 2009 vs. Sept 2009
Total U.S. – Home/Work/University Locations
Source: comScore qSearch 2.0

Share of Searches (%)

Core Search Entity     Aug-09    Sep-09    Point Change (Sep-09 vs. Aug-09)
Total Core Search      100.0%    100.0%    N/A
Google Sites            64.6%     64.9%    0.3
Yahoo! Sites            19.3%     18.8%    -0.5
Microsoft Sites          9.3%      9.4%    0.1
Ask Network              3.9%      3.9%    0.0
AOL LLC Network          3.0%      3.0%    0.0

* Based on the five major search engines including partner searches and cross-channel searches. Searches for mapping, local directory, and user-generated video sites that are not on the core domain of the five search engines are not included in the core search numbers.

Holding fairly stable over the last year or so, albeit with some Bing flux

Page 6: The Use of Query Reformulation to Predict Future User Actions

Search Logs

• Contain the trace data recorded when a person visits the search engine, submits a query, views results, etc.

• On one hand, logs have been criticized for not being rich enough (i.e., only have behaviors but not the ‘why’ factors)

• On the other hand, logs have been criticized for recording too much about us (i.e., logging a lot of personal information about a person)

How much can we learn about a person from the data stored in search logs?

Specifically, how rich a searcher profile can we build of what a person is doing and why they are doing it, and how well can we predict what they are going to do next?

Page 7: The Use of Query Reformulation to Predict Future User Actions

An illustrative example

Page 8: The Use of Query Reformulation to Predict Future User Actions

How much can we tell from a single query?

• ASIS&T is an acronym for the American Society for Information Science and Technology

• Good probability that this user is an academic, a researcher, a librarian, or a student in one of these disciplines

• Leveraging demographic information:
– 57 percent female / 43 percent male probability
– 66.2 percent chance this user works in the information science field
– 55.6 percent probability this user has a master’s degree

Page 9: The Use of Query Reformulation to Predict Future User Actions

How much can we tell from a single query?

• Leveraging demographic information (cont’d):
– 32.3 percent probability this user has a doctorate
– 53 percent likelihood this user works in academia

• Using IP, we can locate the geographical area

• Based on time, could infer that:
– this person is searching for the conference’s schedule (if the query is submitted prior to the meeting) for travel
– or looking for presentations or papers from the meeting (if the query is submitted after the conference)

Theoretically, we can tell a lot!

However, with billions of queries per month, we can’t do the analysis by hand like this example.

To develop user profiles, we need automated methods.

Research Question - How complete a profile can one develop for a Web search engine user from search log data? [(a) what the user is doing, (b) what the user is interested in, and (c) what the user intends to do]

Page 10: The Use of Query Reformulation to Predict Future User Actions

Specific aspects with automated methods …

Location – where the user is at
Geographical interest – where the user is going
Topical interest – what the user is interested in
Topical complexity – how motivated is the user
Content desires – Info, Nav, Transactional
Commercial intent – eCommerce related
Purchase intent – getting ready to buy
Potential to click on a link – will user click on link
Gender – demographic targeting/personalization
User identification – specific user targeting

Page 11: The Use of Query Reformulation to Predict Future User Actions

Automated methods using query reformulation

Location
Geographical interest
Topical interest
Topical complexity – n-grams pattern analysis
Content desires
Commercial intent
Purchase intent
Potential to click on a link
Gender
User identification

Page 12: The Use of Query Reformulation to Predict Future User Actions

Where to get the full story?

The methodological implementation is reported in the paper in your ASIS&T proceedings:

Jansen, B.J., Zhang, M., Booth, B., Park, D., Zhang, Y., Kathuria, A. and Bonner, P. (2009) To What Degree Can Log Data Profile a Web Searcher? Proceedings of the American Society for Information Science and Technology 2009 Annual Meeting. Vancouver, British Columbia. 6-11 November.

Page 13: The Use of Query Reformulation to Predict Future User Actions

Topical Complexity

The number of queries a user submits in a session on a topic can tell us many things:
– the complexity of the topic
– the user’s motivation for the need
– a prediction of future action
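
As a rough illustration of this measure, the Python sketch below counts queries per user session. The record layout (user_id, session_id, query) and the sample rows are assumptions made for illustration, not the format of the log used in the study, and no attempt is made to split a session by topic.

```python
from collections import Counter

def queries_per_session(records):
    """Count queries for each (user_id, session_id) pair.

    `records` is an iterable of (user_id, session_id, query) tuples;
    this layout is hypothetical, not the study's actual log schema.
    """
    return Counter((user, session) for user, session, _query in records)

# Made-up sample data, for illustration only.
sample = [
    ("u1", "s1", "asis&t"),
    ("u1", "s1", "asis&t 2009 schedule"),
    ("u1", "s1", "asis&t 2009 vancouver hotels"),
    ("u2", "s1", "weather"),
]

for (user, session), n in queries_per_session(sample).items():
    print(user, session, n)
```

A session with many queries, such as the first one here, would be a candidate for a more complex topic or a more motivated searcher; where to draw that line is left open.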

Page 14: The Use of Query Reformulation to Predict Future User Actions

Information Searching

• Probabilistic user modeling
– increasingly important area
– allows computer systems to adapt to users

• Algorithmic techniques typically employ state models
– Simple Bayesian Classifier
– Markov Modeling
– n-grams

Note: not always ‘informational’ anymore. Many times people are searching for ‘other things’. Rose & Levinson (2004); Jansen, Booth, & Spink (2008).

Page 15: The Use of Query Reformulation to Predict Future User Actions

Illustration of Probabilistic User Modeling Using n-grams

User    Search State Transitions
1       ABCF
2       ABCDE
3       ABCDE
4       A
5       AC

Predictive Pattern    Next State?    Accuracy
AB                    C              100%
BC                    D              66%
CD                    E              100%
A                     B              60%
C                     D              40%

Given these states … how accurately can we predict these?
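
The illustration above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation from the paper: each session is a string of single-letter states, an end-of-session marker (`END`, a hypothetical name) is appended so that stopping counts as an outcome, and accuracy is the share of pattern occurrences followed by the most common next state.

```python
from collections import Counter, defaultdict

END = "$"  # hypothetical marker for "session ended"

def next_state_table(sessions, order):
    """Map each pattern of `order` states to a Counter of what followed it."""
    table = defaultdict(Counter)
    for session in sessions:
        padded = session + END
        for i in range(len(padded) - order):
            table[padded[i:i + order]][padded[i + order]] += 1
    return table

def best_prediction(table, pattern):
    """Return (most likely next state, share of occurrences), or None if unseen."""
    outcomes = table.get(pattern)
    if not outcomes:
        return None
    state, hits = outcomes.most_common(1)[0]
    return state, hits / sum(outcomes.values())

# The five user sessions from the slide.
sessions = ["ABCF", "ABCDE", "ABCDE", "A", "AC"]

bigrams = next_state_table(sessions, 2)
unigrams = next_state_table(sessions, 1)
for table, pattern in [(bigrams, "AB"), (bigrams, "BC"), (bigrams, "CD"),
                       (unigrams, "A"), (unigrams, "C")]:
    print(pattern, best_prediction(table, pattern))
```

Under this counting convention the patterns AB, BC, CD and A give C, D, E and B with shares of 3/3, 2/3, 2/2 and 3/5. For the single-state pattern C the sketch counts D in 2 of the 4 sessions that contain C, so its figure need not match the 40% on the slide, which may rest on a different denominator.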

Page 16: The Use of Query Reformulation to Predict Future User Actions

Example Using Search Log

• ~ 965,000 searching sessions

• ~ 1,500,000 queries

• 8 states focusing on query reformulation

• Similar results for other aspects of searching

• See - Qiu (1993), Jansen (2005), Jansen & McNeese (2006)

• Maybe ‘states’ are not the correct paradigm?

[Figure: Accuracy of Prediction (y-axis, 0.1 to 0.6) by Order of the Model (x-axis: 0, 1st, 2nd, 3rd, 4th). Data labels as extracted: 0.28, 0.40, 0.47, 0.44, 0.44, 0.60.]

Drop out rate (folks who don’t submit a query): ~40%

Jansen, B. J., Booth, D. L., & Spink, A. (2009). Patterns of query modification during Web searching. Journal of the American Society for Information Science and Technology.

10% improvement from 1st to 2nd order: okay, but would like to do better
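
One way such a comparison across model orders could be run is sketched below in Python. It is an assumption-laden illustration, not the procedure behind the reported numbers: sessions are strings of single-letter state codes, the train/test split and the toy data are invented, and a prediction counts as a miss when the context was never seen in training.

```python
from collections import Counter, defaultdict

def train(sessions, order):
    """Count next-state frequencies for every context of `order` states."""
    table = defaultdict(Counter)
    for s in sessions:
        for i in range(order, len(s)):
            table[s[i - order:i]][s[i]] += 1
    return table

def accuracy(train_sessions, test_sessions, order):
    """Share of test transitions whose next state is predicted correctly."""
    table = train(train_sessions, order)
    hits = total = 0
    for s in test_sessions:
        for i in range(order, len(s)):
            context = s[i - order:i]
            total += 1
            if context in table:
                guess = table[context].most_common(1)[0][0]
                hits += guess == s[i]
    return hits / total if total else 0.0

# Invented toy data; real input would be state-coded search sessions.
train_sessions = ["ABCD", "ABCD", "ABCE", "A"]
test_sessions = ["ABCD"]

for order in range(3):
    print(order, accuracy(train_sessions, test_sessions, order))
```

With order 0 the context is empty, so the model always guesses the most frequent state overall; higher orders condition on more of the preceding states but see fewer examples of each context.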

Page 17: The Use of Query Reformulation to Predict Future User Actions

User Profiling Framework

• Classify user aspects into two levels: internal and external.

• Internal aspects refer to attributes of the users themselves.

• External aspects relate to the behavior or interest of the users.

• Internal and external aspects interact: external aspects can be inferred from internal aspects, and external aspects reflect internal aspects.

Page 18: The Use of Query Reformulation to Predict Future User Actions

Thank you! (open for questions and further discussion)

Jim Jansen

College of Information Sciences and Technology The Pennsylvania State University

[email protected]

Page 19: The Use of Query Reformulation to Predict Future User Actions

Search logs have some common fields, such as time, queries, results, etc.

We can enrich the log with additional fields.
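
A small Python sketch of what enriching a log record might look like follows. Every field name here is hypothetical: the slide names only time, queries and results as common fields, and the two derived fields (query length in terms, seconds since the same user's previous query) are examples chosen for illustration rather than the fields used in the study.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LogRecord:
    # Common fields (names are assumptions).
    user_id: str
    timestamp: datetime
    query: str
    results_returned: int
    # Enriched fields, filled in by enrich().
    query_length: Optional[int] = None
    secs_since_prev: Optional[float] = None

def enrich(records):
    """Add derived fields in place and return the records in time order."""
    last_seen = {}
    ordered = sorted(records, key=lambda r: r.timestamp)
    for r in ordered:
        r.query_length = len(r.query.split())
        prev = last_seen.get(r.user_id)
        if prev is not None:
            r.secs_since_prev = (r.timestamp - prev).total_seconds()
        last_seen[r.user_id] = r.timestamp
    return ordered

# Made-up sample rows.
log = [
    LogRecord("u1", datetime(2009, 11, 6, 9, 0, 0), "asis&t", 10),
    LogRecord("u1", datetime(2009, 11, 6, 9, 0, 45), "asis&t 2009 schedule", 10),
]
for rec in enrich(log):
    print(rec.query, rec.query_length, rec.secs_since_prev)
```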

