Date posted: 19-Dec-2015
MULTIPLE INTENTS RE-RANKING
By: Yossi Azar, Iftah Gamzu, Xiaoxin Yin
In Proceedings of STOC 2009, pp. 669-678
Presented by: Bhawana Goel
WEB SEARCH AND RANKING
Ranking of search results on the basis of:
Hyperlink structure of the web
Content of the web page
User’s location
Not much research on the user’s “intent”
INTENT
Same query, different intents: “computer science at A&M”
Information about computer science department at A&M
Information about admission to computer science department at A&M
PROBLEM STATEMENT
20% of web queries are ambiguous
Different user types have different intents
Goal is to minimize the average effort of browsing through the search results
Re-rank the web results
TYPES OF INTENTS
Navigational: first result is relevant
Informational: all the results are relevant
Complex: first and third results are relevant
OVERVIEW
Each user type has its own profile vector over a subset of relevant pages, e.g. <1,0…0>, <0,0…1>, <1,1…1>
The elements of the vector correspond to positions, not to particular pages
The order of the result pages in the vector is irrelevant and is determined by the search engine
The vector depicts intention: the type of query need
The vector depicts the proportion of users: <1,0,0> is one user, <100,0,0> is 100 users
CALCULATION OF USER EFFORT
Navigational (<1,0,0>): 2*1 = 2
Informational (<1,1,1>): 2*1 + 4*1 + 5*1 = 11
Complex (<0.4,0.4,0.2>): 2*0.4 + 4*0.4 + 5*0.2 = 3.4
[Figure: ranked result list with the profile vectors]
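The three calculations above are dot products between the positions of the relevant pages and a profile vector. A minimal sketch, assuming the relevant pages sit at positions 2, 4 and 5 as in the slide arithmetic; the function name `user_effort` is hypothetical:

```python
def user_effort(positions, profile):
    """Effort of one user type: sum of position * profile weight."""
    if len(positions) != len(profile):
        raise ValueError("positions and profile must have the same length")
    return sum(p * w for p, w in zip(positions, profile))


# Positions of the relevant pages, taken from the slide arithmetic.
positions = [2, 4, 5]

navigational = user_effort(positions, [1, 0, 0])
informational = user_effort(positions, [1, 1, 1])
complex_intent = user_effort(positions, [0.4, 0.4, 0.2])
```

The navigational and informational values are exact integers; the complex value is a float, so compare it with a tolerance.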
PROBLEM FORMULATION
Form a weighted hypergraph with:
Vertices = web results
Hyperedges = user types
Weights = user profiles
[Figure: hypergraph over the ranked web results with hyperedges e1 and e2]
Overhead of e1: (2,4,5) * <15,20,25> = 2*15 + 4*20 + 5*25 = 235
Overhead of e2: (1,2,3) * <1,0,0> = 1
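The overhead of a ranking can be sketched as follows: each hyperedge looks up the positions of its vertices in the ranking, sorts them, and takes the dot product with its weight vector. The page names `p1` to `p5` and the function name `overhead` are hypothetical; the memberships are chosen only so that e1 lands on positions 2, 4, 5 and e2 on positions 1, 2, 3, as in the slide.

```python
def overhead(ranking, hyperedges):
    """Return the overhead of every hyperedge under the given ranking.

    ranking:    list of pages, best position first
    hyperedges: name -> (set of pages, weight vector of the same size)
    """
    position = {page: i + 1 for i, page in enumerate(ranking)}
    result = {}
    for name, (pages, weights) in hyperedges.items():
        # Positions at which this user type meets its relevant pages.
        ranks = sorted(position[p] for p in pages)
        result[name] = sum(r * w for r, w in zip(ranks, weights))
    return result


ranking = ["p1", "p2", "p3", "p4", "p5"]
hyperedges = {
    "e1": ({"p2", "p4", "p5"}, [15, 20, 25]),
    "e2": ({"p1", "p2", "p3"}, [1, 0, 0]),
}
costs = overhead(ranking, hyperedges)
total = sum(costs.values())
```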
SPECIAL CASES
All user profiles are of type <1,0,…0>
It is a case of the min-sum set cover problem
It is NP-hard
Greedy achieves an approximation ratio of 4
[Figure: example sets over the elements A, B, C, F, G, I]
Greedy: pick the element that covers the largest number of uncovered sets.
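The greedy rule can be sketched as below. The sets `S1` to `S5` are a made-up example, not the one from the slide figure, and ties are broken alphabetically, which is an assumption. A set counts as covered at the step where its first element is picked, and the cost is the sum of those steps.

```python
def greedy_min_sum_set_cover(sets):
    """Greedy ordering for min-sum set cover.

    sets: name -> non-empty set of elements.
    Returns (order of picked elements, sum of cover times).
    """
    uncovered = {name: set(members) for name, members in sets.items()}
    elements = sorted(set().union(*uncovered.values()))
    order, cost = [], 0
    while uncovered:
        # Element hitting the most uncovered sets; first one wins ties.
        best = max(elements, key=lambda x: sum(x in s for s in uncovered.values()))
        step = len(order) + 1
        newly_covered = [n for n, s in uncovered.items() if best in s]
        cost += step * len(newly_covered)
        for n in newly_covered:
            del uncovered[n]
        order.append(best)
        elements.remove(best)
    return order, cost


example_sets = {
    "S1": {"A", "B"},
    "S2": {"A", "C"},
    "S3": {"A", "F"},
    "S4": {"B", "G"},
    "S5": {"G", "I"},
}
order, cost = greedy_min_sum_set_cover(example_sets)
```

Here A covers three sets at step 1 and G covers the remaining two at step 2; elements that cover nothing new are left out of the returned order.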
SPECIAL CASES
All user profiles are of type <0,0,…1>
It is a case of the minimum-latency set cover problem
It is NP-hard
It has an e-approximation algorithm
CASE 1: NON-INCREASING WEIGHT VECTORS
Weight vectors are non-increasing
Generalization of the min-sum set cover problem
Greedy weight reduction algorithm
Approximation ratio of 4
[Figure: example over vertices A to G with weight vectors (4,1,0), (3,0) and (2,2,0); the ranking built so far reads A, then A F]
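One reading of the greedy weight reduction algorithm, as a sketch: the weight of an unranked vertex is the sum, over the hyperedges containing it, of the next unused entry of each weight vector, and the vertex with the largest weight is ranked next. The hyperedge memberships below are assumptions made for illustration (the figure's memberships are not recoverable), and ties are broken alphabetically.

```python
def greedy_weight_reduction(hyperedges):
    """hyperedges: name -> (set of vertices, weight vector of the same size).

    Returns (ranking, total cost of the ranking).
    """
    covered = {name: 0 for name in hyperedges}  # vertices of each edge ranked so far
    remaining = sorted(set().union(*(verts for verts, _ in hyperedges.values())))
    order, cost = [], 0

    def gain(v):
        # Weight removed by ranking v next: the next entry of every edge containing v.
        return sum(w[covered[n]] for n, (verts, w) in hyperedges.items() if v in verts)

    while remaining:
        best = max(remaining, key=gain)  # first maximal vertex wins ties
        step = len(order) + 1
        cost += step * gain(best)
        for n, (verts, _) in hyperedges.items():
            if best in verts:
                covered[n] += 1
        order.append(best)
        remaining.remove(best)
    return order, cost


# Weight vectors from the slide; memberships are hypothetical.
example = {
    "e1": ({"A", "B", "C"}, [4, 1, 0]),
    "e2": ({"D", "E"}, [3, 0]),
    "e3": ({"A", "F", "G"}, [2, 2, 0]),
}
order, cost = greedy_weight_reduction(example)
```

With these memberships the first pick is A (weight 4 + 2 = 6), and the later picks differ from the slide figure because the memberships differ.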
GREEDY ALGORITHM IN GENERAL CASE
Greedy weight reduction algorithm does not work in the general case
Approximation ratio is unbounded
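Example instance: k hyperedges with weight vector <1,0> and one hyperedge with weight vector <0,w>, where w = k^2
OPT = 2w + (3 + 4 + … + (k+2)) = Θ(k^2)
ALG = (1 + 2 + … + k) + (k+2)w = Θ(k^3)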
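The two cost expressions can be evaluated directly to see the gap grow with k. A small sketch that only encodes the slide's formulas with w = k^2; the function names are hypothetical:

```python
def opt_cost(k):
    """2w + (3 + 4 + ... + (k+2)) with w = k^2."""
    w = k ** 2
    return 2 * w + sum(range(3, k + 3))


def alg_cost(k):
    """(1 + 2 + ... + k) + (k+2)w with w = k^2."""
    w = k ** 2
    return sum(range(1, k + 1)) + (k + 2) * w


# The ratio ALG/OPT keeps growing as k grows.
ratios = [alg_cost(k) / opt_cost(k) for k in (10, 100, 1000)]
```

For k = 10 this gives OPT = 275 and ALG = 1255, and the ratio grows roughly linearly in k after that.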
CASE 2: ARBITRARY WEIGHT VECTORS
HARMONIC INTERPOLATION ALGORITHM
Greedy algorithm takes only local maxima into account
Apply greedy algorithm on harmonically interpolated weight vectors
It provides knowledge about future weight reduction potentials of hyperedges
Example instance: k × <1,0> and <0,w>, where <0,w> is interpolated to <w/2,w>
ALG = 2w + (3 + 4 + … + (k+2)), which matches OPT
HARMONIC INTERPOLATION
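For a hyperedge e with weight vector w(e) = (w(e)_1, …, w(e)_r), the interpolated vector ŵ(e) has entries
$$\hat{w}^{(e)}_i = \sum_{j=i}^{r} \frac{w^{(e)}_j}{j-i+1}, \qquad i = 1, \dots, r$$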
Algorithm
Phase I:
1. Calculate the harmonic interpolation of the weight vector of every hyperedge e ∈ E
Phase II (greedy weight reduction algorithm):
2. Calculate the weight of each vertex according to the interpolated weight vectors
3. Select the vertex with maximum weight
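The two phases can be sketched end to end on the bad instance for plain greedy. Several things here are assumptions: the interpolated entry at position i sums w_j/(j-i+1) over all j ≥ i (for an indicator vector this gives <w/j,…,w/2,w,0,…>), the hyperedges are vertex-disjoint, and ties are broken alphabetically. The names `a1`, `b1`, `x`, `y` and `big` are hypothetical. The ranking is chosen with the interpolated weights but always evaluated with the original ones.

```python
def harmonic_interpolation(weights):
    """Assumed rule: entry i is the sum of w_j / (j - i + 1) over j >= i."""
    r = len(weights)
    return [sum(weights[j] / (j - i + 1) for j in range(i, r)) for i in range(r)]


def greedy_order(hyperedges):
    """Greedy weight reduction: rank the vertex with the largest next-entry sum."""
    covered = {n: 0 for n in hyperedges}
    remaining = sorted(set().union(*(verts for verts, _ in hyperedges.values())))
    order = []

    def gain(v):
        return sum(w[covered[n]] for n, (verts, w) in hyperedges.items() if v in verts)

    while remaining:
        best = max(remaining, key=gain)
        for n, (verts, _) in hyperedges.items():
            if best in verts:
                covered[n] += 1
        order.append(best)
        remaining.remove(best)
    return order


def cost(order, hyperedges):
    """Total overhead of a ranking under the given (original) weight vectors."""
    position = {v: t for t, v in enumerate(order, start=1)}
    total = 0
    for verts, w in hyperedges.values():
        times = sorted(position[v] for v in verts)
        total += sum(t * wi for t, wi in zip(times, w))
    return total


k = 5
w = k ** 2
# k hyperedges with <1,0> and one hyperedge with <0,w>, all vertex-disjoint.
hyperedges = {f"e{i}": ({f"a{i}", f"b{i}"}, [1, 0]) for i in range(1, k + 1)}
hyperedges["big"] = ({"x", "y"}, [0, w])
interpolated = {n: (verts, harmonic_interpolation(wv))
                for n, (verts, wv) in hyperedges.items()}

plain_cost = cost(greedy_order(hyperedges), hyperedges)      # plain greedy
interp_cost = cost(greedy_order(interpolated), hyperedges)   # Phase I + Phase II
```

With the interpolated vector <w/2,w>, the vertices x and y are ranked first and the cost equals 2w + (3 + … + (k+2)). Plain greedy postpones them; because of the alphabetical tie-breaking used here, its cost is even larger than the (1 + … + k) + (k+2)w figure quoted for the greedy algorithm.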
ANALYSIS OF HARMONIC INTERPOLATION ALGORITHM
Use indicator vectors: <0,0,…w…0,0>, where only one entry is non-zero
Harmonic interpolation of an indicator vector with w at position j: <w/j,…w/2,w,…0>
Notation:
(e,i): a potential pair
w(e,i): weight of the potential pair
t: the time when (e,i) is covered
Penalty of a step = remaining harmonic weight / weight covered
Have to minimize: ∑_{t≥1} ∑_{(e,i) covered at time t} w(e,i) × t
OPTIMAL SOLUTION HISTOGRAM
Create a histogram with number of columns = number of potential pairs, width of a column = w(e,i) and height of the column = t(e,i)
It is monotonically increasing
[Figure: histogram of the optimal solution; horizontal axis: potential pairs, vertical axis: time]
HISTOGRAM FOR ALGORITHMIC SOLUTION
Histogram with number of columns = number of potential pairs, width of a column = ŵ(e,i) and height of the column = penalty of the step
It is not monotonic
APPROXIMATION RATIO
Reduce the width of the ALG histogram by a factor of 2Hr and its height by a factor of 2
The new histogram fits completely inside the optimal solution histogram
ALG/(4Hr) <= OPT, i.e. ALG <= 4Hr × OPT
[Figure label: ALG/4]
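The final step can be written out as a short derivation. This is a sketch that takes the shrinking claim from the slide as given, writes ALG and OPT for the areas of the two histograms, and reads Hr as the r-th harmonic number H_r; taking r to be the largest hyperedge size is an assumption based on the paper's setting.

```latex
% Shrinking the width by 2H_r and the height by 2 divides the area by 4H_r.
% The shrunk histogram fits inside the optimal one, so its area is at most OPT.
\[
  \frac{\mathrm{ALG}}{2H_r \cdot 2} \le \mathrm{OPT}
  \quad\Longrightarrow\quad
  \mathrm{ALG} \le 4H_r \cdot \mathrm{OPT},
  \qquad
  H_r = \sum_{i=1}^{r} \frac{1}{i} \le 1 + \ln r .
\]
```

Under these assumptions the approximation ratio is O(log r).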