Contextual IR
Naama Kraus
Slides are based on the papers:
– Searching with Context. Kraft, Chang, Maghoul, Kumar
– Context-Sensitive Query Auto-Completion. Bar-Yossef and Kraus
The Problem (recap)
• User queries are an imperfect description of their information needs
• Examples:
– Ambiguous queries: jaguar
– General queries: haifa
– Terminology differences (synonyms) between user and corpus: stars - planets
Contextual IR
• Leverage context to better understand the user's information need
• Context types
– Short-term context
• Current time and location, recent queries, recent page visits, currently viewed page, recent tweets, recent e-mails …
– Long-term context (user profile/model)
• Long-term search history, user interests, user demographics (gender, education …), e-mails, desktop files …
Today's focus: short-term context
Example
• Query: jaguar
• Context: a recently viewed page
• Document retrieval – use context to disambiguate the query
Searching with Context
Kraft, Chang, Maghoul, Kumar, WWW'06
Searching with Context
• Goal: improve document retrieval
• Capture the user's recent context
– A piece of text
– Extract terms from a page the user is currently viewing, a file the user is currently editing, …
• Proposes three different methods
– Query rewriting (QR)
• Add terms to the user's original query
– Rank biasing (RB)
• Re-rank results
– Iterative filtering meta-search (IFM)
• Generate sub-queries and aggregate results
Query Rewriting
• Send one simple query to a standard search engine
• Append the top context terms to the original query
– AND semantics
– Parameter: how many terms to add
• Example:
– Query q; weighted context-term vector (a b c d e)
– Terms are ranked by their weight
– Q_new = (q a b) for parameter value 2
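The QR step above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; the function name and the numeric weights are assumptions:

```python
def rewrite_query(query, context_weights, k=2):
    """Append the k highest-weighted context terms to the query (AND semantics)."""
    top = sorted(context_weights, key=context_weights.get, reverse=True)[:k]
    return " ".join([query] + top)

# Context vector (a b c d e), weights in descending order as on the slide:
context = {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.3, "e": 0.1}
print(rewrite_query("q", context, k=2))  # -> 'q a b'
```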
Rank-Biasing
• Send a complex query that contains ranking instructions to the search engine
• Does not change the original result set, only the ranking
• New query definition: <q> = <selection=cat> <optional=persian,2.0>
– Selection terms – the original query terms; must appear in results
– Optional terms – context terms, with a boost factor that influences the ranking only
• The boost is a function of the term's weight
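A toy sketch of the RB semantics, assuming a simple additive-boost scoring; the function name and scoring form are illustrative, not the engine's actual query language:

```python
def rank_bias(docs, selection, optional):
    """docs: list of (doc_id, text). selection: required terms (define the
    result set). optional: {term: boost} context terms that bias ranking only."""
    results = []
    for doc_id, text in docs:
        words = set(text.lower().split())
        if not all(t in words for t in selection):
            continue  # selection terms must appear
        boost = sum(b for t, b in optional.items() if t in words)
        results.append((doc_id, boost))
    # Same result set; order is biased by the optional-term boosts
    return [d for d, _ in sorted(results, key=lambda x: -x[1])]

docs = [("d1", "cat food"), ("d2", "persian cat breeds"), ("d3", "dog park")]
print(rank_bias(docs, ["cat"], {"persian": 2.0}))  # -> ['d2', 'd1']
```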
Iterative Filtering Meta-Search
• Intuition: "explore" different ways to express an information need
• Algorithm outline
– Generate sub-queries
– Send them to a search engine
– Aggregate the results
Sub-query Generation
• Use a query template
• Example:
– Query q; context = (a, b, c)
– Sub-queries:
• q a, q b, q c
• q a b, q b c
• q a b c
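One way to read the template above is sliding windows of consecutive context terms (note q a c is absent from the example). A sketch under that assumption:

```python
def sub_queries(query, context):
    """Yield the query combined with every window of consecutive context terms."""
    for width in range(1, len(context) + 1):
        for i in range(len(context) - width + 1):
            yield " ".join([query] + context[i:i + width])

print(list(sub_queries("q", ["a", "b", "c"])))
# -> ['q a', 'q b', 'q c', 'q a b', 'q b c', 'q a b c']
```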
Ranking and Filtering
• Issue the k sub-queries to a standard search engine
• Obtain results
• Challenge: how to combine, rank, and filter the results?
• Use rank aggregation techniques
Rank Averaging
• A rank aggregation method (one out of many …)
• Given: k lists of top results
• Assign a score to each position in a list
– E.g., 1 to the first position, 2 to the second position, …
• For each document, average its scores over the k lists
• The final list is ordered by the average scores
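A minimal sketch of rank averaging. Documents missing from a list are scored as position len(list) + 1; that penalty is an assumption, since the slide does not specify how absences are handled:

```python
def rank_average(lists):
    """Average each document's position (1 = first) over the k lists."""
    all_docs = {d for lst in lists for d in lst}
    totals = dict.fromkeys(all_docs, 0)
    for lst in lists:
        pos = {doc: i for i, doc in enumerate(lst, start=1)}
        for d in all_docs:
            totals[d] += pos.get(d, len(lst) + 1)  # assumed penalty for absence
    k = len(lists)
    return sorted(all_docs, key=lambda d: totals[d] / k)

lists = [["d1", "d2", "d3"], ["d2", "d1"], ["d2", "d3", "d1"]]
print(rank_average(lists))  # -> ['d2', 'd1', 'd3']
```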
Context-Sensitive Query Auto-Completion
Z. Bar-Yossef and N. Kraus, WWW’11
Query Auto-Completion
An integral part of the user's search experience
Use Cases
• Predict the user's intended query
– Save her keystrokes
• Assist the user in formulating her information need
Motivating Example
• I am attending WWW 2011
• I need some information about Hyderabad
• Typed: hyderabad
– Current completions: hyderabad airport, hyderabad history, hyderabad maps, hyderabad india, hyderabad hotels
– Desired completion: hyderabad www
MostPopular is not always good enough
• User queries follow a power-law distribution, with a heavy tail of unpopular queries
• MostPopular is likely to mis-predict when given a small number of keystrokes
MostPopular Completion
• For the prefix "hy", MostPopular suggests: hydroxycut, hyperbola, hyundai, hyatt

Nearest Completion
• Idea: leverage recent query context
• Intuition: the user's intended query is similar to her context query
– Requires a similarity measure between queries (refer to the paper)
• Given the context query "www 2011", the prefix "hy" completes to: hyderabad, hyderabad airport, hyderabad maps, hyderabad india
Nearest Completion: Framework
• Offline:
1. Expand completions
2. Index the candidate completions in a repository
• Online:
1. Expand the context query
2. Nearest-neighbors search for similar completions
3. Return the top k context-related completions
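The online steps can be sketched as below. A plain bag-of-words cosine stands in for the paper's richer expansion-based similarity measure, and all names and candidate data are illustrative:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two whitespace-tokenized strings."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_completion(prefix, context, candidates, k=3):
    """Rank prefix-matching candidates by similarity to the context query."""
    matches = [c for c in candidates if c.startswith(prefix)]
    return sorted(matches, key=lambda c: -cosine(context, c))[:k]

candidates = ["hyderabad airport", "hyderabad www", "hydroxycut", "hyundai"]
print(nearest_completion("hy", "www 2011", candidates, k=2))
# -> ['hyderabad www', 'hyderabad airport']
```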
HybridCompletion
• Problem: if the context queries are irrelevant to the current query, NearestCompletion fails to predict the user's query
• Solution: HybridCompletion – a combination of highly popular and highly context-similar completions
– Completions that are both popular and context-similar get promoted
• hybscore(q) = c · Z[simscore(q)] + (1 − c) · Z[popscore(q)], c ∈ [0,1]
– A convex combination; Z denotes score standardization
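A sketch of the convex combination, assuming Z is z-score standardization over the candidate set; the candidate names and raw scores are made up for illustration:

```python
from statistics import mean, pstdev

def zscores(xs):
    """Standardize a list of scores (z-scores); all-equal lists map to 0."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s if s else 0.0 for x in xs]

def hybrid_rank(candidates, sim, pop, c=0.5):
    """hybscore(q) = c * Z[simscore(q)] + (1 - c) * Z[popscore(q)]."""
    zs = zscores([sim[q] for q in candidates])
    zp = zscores([pop[q] for q in candidates])
    hyb = {q: c * zs[i] + (1 - c) * zp[i] for i, q in enumerate(candidates)}
    return sorted(candidates, key=lambda q: -hyb[q])

cands = ["hyderabad www", "hyderabad airport", "hyatt"]
sim = {"hyderabad www": 0.9, "hyderabad airport": 0.7, "hyatt": 0.0}  # context similarity
pop = {"hyderabad www": 10, "hyderabad airport": 500, "hyatt": 900}   # raw popularity
print(hybrid_rank(cands, sim, pop))  # 'hyderabad airport' wins: strong on both signals
```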
MostPopular, Nearest, and Hybrid (1)
MostPopular, Nearest, and Hybrid (2)
Anecdotal Examples

Context: french flag; current query: italian flag
– MostPopular: internet, im help, irs, ikea, internet explorer
– Nearest: italian flag, itunes and french, ireland, italy, irealand
– Hybrid: internet, italian flag, itunes and french, im help, irs

Context: neptune; current query: uranus
– MostPopular: ups, usps, united airlines, usbank, used cars
– Nearest: uranus, uranas, university, university of chic…, ultrasound
– Hybrid: uranus, uranas, ups, united airlines, usps

Context: improving acer laptop battery; current query: bank of america
– MostPopular: bank of america, bankofamerica, best buy, bed bath and b…
– Nearest: battery powered …, battery plus cha…
– Hybrid: bank of america, best buy, battery powered …