Sandeep Pandey (1), Sourashis Roy (2), Christopher Olston (1), Junghoo Cho (2), Soumen Chakrabarti (3)
(1) Carnegie Mellon, (2) UCLA, (3) IIT Bombay
Shuffling a Stacked Deck
The Case for Partially Randomized Ranking of Search Engine Results
@Carnegie Mellon Databases
Popularity as a Surrogate for Quality
Search engines want to measure the “quality” of pages
Quality is hard to define and measure
Various “popularity” measures are used in ranking
– e.g., in-links, PageRank, user traffic
[Figure: ranked list of search results]
Relationship Between Popularity and Quality
Popularity: depends on the number of users who “like” a page
– relies on both quality and awareness of the page
Popularity is different from quality
– But strongly correlated when awareness is large
[Figure: users; users aware of page p; users who like page p]
Problem
Popularity/quality correlation weak for young pages
– Even if of high quality, may not (yet) be popular due to lack of user awareness
Plus, process of gaining popularity inhibited by “entrenchment effect”
– [Cho et al. WWW’04], [Chakrabarti et al. SODA’05], [Mowshowitz et al. Communication’02], and many others
Entrenchment Effect
Search engines show entrenched (already-popular) pages at the top
Users discover pages via search engines; tend to focus on top results
[Figure: ranked result list (1, 2, 3, 4, 5, 6, …); user attention falls on the entrenched pages at the top, with new unpopular pages further down]
Outline
Problem introduction
Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation
Summary
Alternative Approaches to Counteract Entrenchment Effect
Weight links to young pages more
– [Baeza-Yates et al. SPIRE ’02]
– Proposed an age-based variant of PageRank
Extrapolate quality based on increase in popularity
– [Cho et al. SIGMOD ’05]
– Proposed an estimate of quality based on the derivative of popularity
Our Approach: Randomized Rank Promotion
Select random (young) pages to promote to good rank positions
Rank position to promote to is chosen at random
[Figure: example result list before promotion (1, 2, 3, ..., 500, 501, ...) and after promotion (1, 500, 2, 499, 501, ..., 3)]
Our Approach: Randomized Rank Promotion
Consequence: Users visit promoted pages; improves ability to estimate quality via popularity
Compared with previous approaches:
• Does not rely on temporal measurements (+)
• Sub-optimal (-)
Exploration/Exploitation Tradeoff
Exploration/Exploitation tradeoff
– exploit known high-quality pages by assigning good rank positions
– explore quality of new pages by promoting them in rank
Existing search engines only exploit (to our knowledge)
Possible Objectives for Rank Promotion
Fairness
– Give each page an equal chance to become popular
– Incentive for search engines to be fair?
Quality (our choice)
– Maximize quality of search results seen by users (in aggregate)
– Quality of page p: extent to which users “like” p
– Q(p) ∈ [0,1]
Quality-Per-Click Metric (QPC)
V(p,t) : number of visits made to page p at time t through search engine
QPC : average quality of pages viewed by users, amortized over time
$$QPC = \frac{\sum_{t} \sum_{p} V(p,t) \, Q(p)}{\sum_{t} \sum_{p} V(p,t)}$$
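As a minimal sketch of the QPC definition above, the following computes the visit-weighted average quality over a finite window of time steps. The function name, the dictionary layout, and the example page names and numbers are hypothetical and chosen only for illustration; the slides define only V(p,t), Q(p), and QPC.

```python
from typing import Mapping, Sequence


def quality_per_click(visits: Sequence[Mapping[str, float]],
                      quality: Mapping[str, float]) -> float:
    """Visit-weighted average quality over a finite window.

    visits[t][p] plays the role of V(p, t); quality[p] plays the role of Q(p).
    """
    weighted = 0.0  # sum over t and p of V(p, t) * Q(p)
    total = 0.0     # sum over t and p of V(p, t)
    for visits_at_t in visits:
        for page, v in visits_at_t.items():
            weighted += v * quality[page]
            total += v
    if total == 0:
        raise ValueError("no visits recorded, QPC is undefined")
    return weighted / total


# Hypothetical data: two time steps, two pages.
visits = [
    {"old_page": 90, "new_page": 10},
    {"old_page": 50, "new_page": 50},
]
quality = {"old_page": 0.4, "new_page": 0.9}
qpc = quality_per_click(visits, quality)
print(qpc)
```

In this toy example, QPC rises as more visits shift toward the higher-quality page, which is the effect rank promotion aims for.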
Desiderata for Randomized Rank Promotion
Want ability to:
– Control exploration/exploitation tradeoff
– “Select” certain pages as candidates for promotion
– “Protect” certain pages from demotion
[Figure: example result list before promotion (1, 2, 3, ..., 500, 501, ...) and after promotion (1, 500, 2, 499, 501, ..., 3)]
Randomized Rank Promotion Scheme
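[Figure: the set of pages W is split into the promotion pool Wm and the remainder W-Wm. Pages in the promotion pool are put in random order to form list Lm; the remainder is ordered by popularity to form list Ld.]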
Randomized Rank Promotion Scheme
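[Figure: merging Ld and Lm into the result list, shown for k = 3 and r = 0.5. The first k-1 positions are taken from Ld; each later position is taken from the promotion list Lm with probability r, or from the remainder list Ld with probability 1-r.]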
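The merge of the two lists can be sketched as follows. This is one reading of the slide's diagram, not a definitive implementation: it assumes the first k-1 results come from the popularity-ordered list Ld, and that each later position is filled from the randomly ordered promotion list Lm with probability r and from Ld otherwise, falling back to whichever list still has pages. The function and parameter names are hypothetical.

```python
import random
from typing import Hashable, Iterable, List, Optional, Sequence


def randomized_rank_promotion(by_popularity: Sequence[Hashable],
                              promotion_pool: Iterable[Hashable],
                              k: int, r: float,
                              rng: Optional[random.Random] = None) -> List[Hashable]:
    """Merge Ld (popularity order) and Lm (random order) into one result list."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    rng = rng or random.Random()
    pool = set(promotion_pool)
    # Lm: promotion-pool pages in random order.
    l_m = [p for p in by_popularity if p in pool]
    rng.shuffle(l_m)
    # Ld: all remaining pages, kept in popularity order.
    l_d = [p for p in by_popularity if p not in pool]

    result = l_d[:k - 1]          # positions 1 .. k-1 are never randomized
    d, m = len(result), 0         # next unread index in Ld and in Lm
    while d < len(l_d) or m < len(l_m):
        # Take from Lm with probability r, or always once Ld is exhausted.
        take_promoted = m < len(l_m) and (d >= len(l_d) or rng.random() < r)
        if take_promoted:
            result.append(l_m[m])
            m += 1
        else:
            result.append(l_d[d])
            d += 1
    return result


# Hypothetical example: pages 1..10 by popularity, pages 9 and 10 in the pool.
pages = list(range(1, 11))
print(randomized_rank_promotion(pages, [9, 10], k=3, r=0.5, rng=random.Random(0)))
```

With r = 0 the pool pages simply trail the popularity ordering, and with r = 1 they are all placed starting at rank k, so r acts as the exploration knob and k protects the top results.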
Parameters
Promotion pool (Wm)
– Uniform rank promotion: give an equal chance to each page
– Selective rank promotion: exclusively target zero-awareness pages
Start rank (k)
– rank to start randomization from
Degree of randomization (r)
– controls the tradeoff between exploration and exploitation
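The two promotion-pool choices can be illustrated with a short sketch. It assumes per-page awareness values are available, and it models "an equal chance to each page" as independent inclusion with a fixed probability; that probability (`inclusion_prob`) and the function name are hypothetical and are not given on the slide.

```python
import random
from typing import Hashable, List, Mapping, Optional


def select_promotion_pool(awareness: Mapping[Hashable, float],
                          mode: str,
                          inclusion_prob: float = 0.1,
                          rng: Optional[random.Random] = None) -> List[Hashable]:
    """Choose the promotion pool Wm from pages with known awareness."""
    rng = rng or random.Random()
    if mode == "uniform":
        # Every page gets the same chance of entering the pool (assumed
        # here to be an independent coin flip per page).
        return [p for p in awareness if rng.random() < inclusion_prob]
    if mode == "selective":
        # Only pages that no user is aware of yet are candidates.
        return [p for p, a in awareness.items() if a == 0]
    raise ValueError("mode must be 'uniform' or 'selective'")


# Hypothetical awareness values for four pages.
awareness = {"a": 0.8, "b": 0.0, "c": 0.3, "d": 0.0}
print(select_promotion_pool(awareness, "selective"))
```

Selective promotion concentrates the exploration budget on pages whose quality cannot yet be estimated from popularity, whereas uniform promotion spreads it over all pages.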
Tuning the Parameters
Objective: maximize quality-per-click (QPC)
Two ways to tune
– Real-world experiment
– Analytical modeling
Popularity Evolution Cycle
[Figure: cycle linking four quantities for page p at time t]
Popularity P(p,t) → Rank R(p,t), via FPR(P(p,t))
Rank R(p,t) → Visit rate V(p,t), via FRV(R(p,t))
Visit rate V(p,t) → Awareness A(p,t), via FVA(V(p,t))
Awareness A(p,t) → Popularity P(p,t), via FAP(A(p,t))
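One pass around the cycle can be sketched as a small deterministic simulation. The slide names the four functions but does not define them here, so every functional form below is an assumption made for illustration: popularity is taken as awareness times quality (FAP), rank is the position when sorting by popularity (FPR), visits follow a power law in rank with exponent -3/2 (FRV), and each visit is treated as coming from a user chosen uniformly at random (FVA). The function name and the example numbers are hypothetical.

```python
from typing import List


def evolve_one_step(awareness: List[float], quality: List[float],
                    n_users: int, total_visits: float) -> List[float]:
    """Advance awareness A(p, t) by one time step around the cycle."""
    n_pages = len(quality)
    # Awareness -> popularity (assumed FAP): P = A * Q.
    popularity = [a * q for a, q in zip(awareness, quality)]
    # Popularity -> rank (FPR): rank 1 is the most popular page.
    order = sorted(range(n_pages), key=lambda p: popularity[p], reverse=True)
    # Rank -> visit rate (assumed FRV): power law, normalized to total_visits.
    norm = sum(i ** -1.5 for i in range(1, n_pages + 1))
    new_awareness = list(awareness)
    for rank, page in enumerate(order, start=1):
        visits = total_visits * rank ** -1.5 / norm
        # Visit rate -> awareness (assumed FVA): a given user stays unaware
        # only if none of the visits in this step was theirs.
        still_unaware = (1.0 - awareness[page]) * (1.0 - 1.0 / n_users) ** visits
        new_awareness[page] = 1.0 - still_unaware
    return new_awareness


# Hypothetical community: page 0 has the best quality but zero awareness.
quality = [0.9, 0.5, 0.2]
awareness = [0.0, 0.5, 0.5]
for _ in range(50):
    awareness = evolve_one_step(awareness, quality, n_users=100, total_visits=10.0)
print([round(a, 3) for a in awareness])
```

Because the unknown high-quality page starts at the bottom rank, it receives the smallest share of visits, which is the entrenchment effect this model is meant to capture.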
Deriving Popularity Evolution Curve
[Figure: popularity evolution curve; x-axis: time (t), y-axis: popularity P(p,t)]
Next step: derive formula for popularity evolution curve
Assumptions
– Number of pages constant
– Pages are created and retired according to a Poisson process with rate parameter λ
– Quality distribution of pages is stationary
Deriving Popularity Evolution Curve
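Doing the steady state analysis, we get

$$f(a_i \mid q) = \frac{\lambda}{\bigl(\lambda + F_{RV}(F_{PR}(0))\bigr)\,(1 - a_i)} \prod_{j=1}^{i} \frac{F_{RV}(F_{PR}(a_{j-1} \cdot q))}{\lambda + F_{RV}(F_{PR}(a_j \cdot q))}$$

$$F_{PR}(x) = 1 + \sum_{p \in P} \; \sum_{i = 1 + \lfloor m \cdot x / Q(p) \rfloor}^{m} f\!\left(\frac{i}{m} \,\Big|\, Q(p)\right)$$

$$F_{RV}(x) = \frac{v \cdot x^{-3/2}}{\sum_{i=1}^{n} i^{-3/2}}$$

Here λ is the rate parameter of the Poisson process for page creation and retirement, and a_i = i/m is the i-th awareness level.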
[Equation not recoverable from the extracted text; the surviving symbols are E, f, x and q]
Use Popularity Evolution Model to Tune Parameters
Model of popularity evolution process (see paper)
– Complex dynamic process
– To study, we combine approximate analysis with simulation
Next step: use model to tune rank promotion scheme
– Parameters: k, r and Wm
– Objective: maximize QPC
Tuning: Promotion Pool (Wm)
[Figure: comparison of no promotion, uniform promotion, and selective promotion, with k=1 and r=0.2]
Tuning: k and r
Maximize QPC (quality-per-click)
Avoid excessive “junk”
Preserve #1 result for navigational searches
Model of the Web
Web = collection of multiple disjoint topic-specific communities (e.g., “Linux”, “Squash”, etc.)
[Figure: communities labeled “Squash” and “Linux”]
A community is made up of a set of pages, interested users and related queries
Robustness Across Different Web Communities
[Figure: four plots of quality-per-click (y-axis, 0 to 1) against # pages (1E+03 to 1E+06), # users (1E+02 to 1E+06), page lifetime (0.75 to 5.25), and visit rate (1E+01 to 1E+07)]
Summary
Entrenchment effect hurts search result quality
Solution: randomized rank promotion
Model of Web evolution and QPC metric
– Used to tune & evaluate randomized rank promotion
Results:
– New high-quality pages become popular much faster
– Aggregate search result quality significantly improved