Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms
Onur Küçüktunç1,2 Erik Saule1
Kamer Kaya1 Ümit V. Çatalyürek1,3
WWW 2013, May 13–17, 2013, Rio de Janeiro, Brazil.
1Dept. Biomedical Informatics 2Dept. of Computer Science and Engineering
3Dept. of Electrical and Computer Engineering The Ohio State University
Kucuktunc et al. “Diversified Recommendation on Graphs: Pitfalls, Measures, and Algorithms”, WWW’13 2/25
Outline
• Problem definition – Motivation – Result diversification algorithms
• How to measure diversity
– Classical relevance and diversity measures – Bicriteria optimization?! – Combined measures
• Best Coverage method
– Complexity, submodularity – A greedy solution, relaxation
• Experiments
Problem definition
G = (V,E), Q ⊆ V, R ⊂ V

• Online shopping (product co-purchasing graph)
  – seeds: one product, previous purchases, page visit history
  – output: product recommendations ("you might also like…")
• Academic search (paper-to-paper citation graph)
  – seeds: a paper/field of interest, a set of references, the researcher himself/herself
  – output: references for related work
• Collaboration network
  – output: new collaborators
• Social (friendship network)
  – seeds: the user himself/herself, a set of people
  – output: friend recommendations ("you might also know…")
Let G = (V,E) be an undirected graph. Given a set of m seed nodes Q = {q1, …, qm} s.t. Q ⊆ V, and a parameter k, return the top-k items which are relevant to the ones in Q, but diverse among themselves, covering different aspects of the query.
Problem definition

Let G = (V,E) be an undirected graph. Given a set of m seed nodes Q = {q1, …, qm} s.t. Q ⊆ V, and a parameter k, return the top-k items which are relevant to the ones in Q, but diverse among themselves, covering different aspects of the query.
• We assume that the graph itself is the only information we have, and no categories or intents are available
– no comparisons to intent-aware algorithms [Agrawal09, Welch11, etc.]
– but we will compare against intent-aware measures
• Relevance scores are obtained with Personalized PageRank (PPR) [Haveliwala02]
p*(v) = 1/m if v ∈ Q, and 0 otherwise.
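The PPR scoring above can be sketched with a simple power iteration: the restart vector p* puts mass 1/m on each of the m seed nodes. The graph, damping factor, and tolerance below are illustrative choices, not the paper's exact settings.

```python
# Minimal sketch of Personalized PageRank (PPR) with uniform restart on Q.
def personalized_pagerank(adj, seeds, d=0.85, tol=1e-10, max_iter=1000):
    """adj: {node: [neighbors]} undirected adjacency; seeds: query set Q."""
    nodes = list(adj)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    pi = dict(restart)  # start from the restart distribution
    for _ in range(max_iter):
        # teleport with probability (1 - d), follow an edge with probability d
        new = {v: (1 - d) * restart[v] for v in nodes}
        for u in nodes:
            deg = len(adj[u])
            if deg:
                share = d * pi[u] / deg
                for v in adj[u]:
                    new[v] += share
        if sum(abs(new[v] - pi[v]) for v in nodes) < tol:
            pi = new
            break
        pi = new
    return pi

# Tiny example: a path graph 0-1-2-3, query Q = {0}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pi = personalized_pagerank(adj, {0})
```

Scores concentrate near the seeds and sum to one, so they can be read directly as the relevance vector π used throughout the talk.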
Result diversification algorithms
• GrassHopper [Zhu07] – ranks the graph k times
• turns the highest-ranked vertex into a sink node at each iteration
[Figure 1 (from GrassHopper [Zhu07]): (a) A toy data set. (b) The stationary distribution π reflects centrality; the item with the largest probability is selected as the first item g1. (c) The expected number of visits v to each node after g1 becomes an absorbing state. (d) After both g1 and g2 become absorbing states. Note the diversity in g1, g2, g3, as they come from different groups.]

Items at group centers have higher probabilities, and tighter groups have overall higher probabilities. However, the stationary distribution does not address diversity at all. If we were to rank the items by their stationary distribution, the top list would be dominated by items from the center group in Figure 1(b). Therefore we only use the stationary distribution to find the first item, and use a method described in the next section to rank the remaining items.

Formally, we first define an n × n raw transition matrix P̃ by normalizing the rows of W: P̃_ij = w_ij / Σ_{k=1}^{n} w_ik, so that P̃_ij is the probability that the walker moves to j from i. We then make the walk a teleporting random walk P by interpolating each row with the user-supplied initial distribution r:

    P = λP̃ + (1 − λ)·1rᵀ,   (1)

where 1 is an all-1 vector, and 1rᵀ is the outer product. If λ < 1 and r does not have zero elements, our teleporting random walk P is irreducible (it is possible to go from any state to any state by teleporting), aperiodic (the walk can return to a state after any number of steps), and all states are positive recurrent (the expected return time to any state is finite); the walk is thus ergodic (Grimmett and Stirzaker, 2001). Therefore P has a unique stationary distribution π = Pᵀπ. We take the state with the largest stationary probability to be the first item g1 in GrassHopper ranking: g1 = argmax_{1≤i≤n} π_i.

2.3 Ranking the Remaining Items

As mentioned earlier, the key idea of GrassHopper is to turn ranked items into absorbing states. We first turn g1 into an absorbing state. Once the random walk reaches an absorbing state, the walk is absorbed and stays there. It is no longer informative to compute the stationary distribution of an absorbing Markov chain, because the walk will eventually be absorbed. Nonetheless, it is useful to compute the expected number of visits to each node before absorption. Intuitively, nodes strongly connected to g1 will have many fewer visits by the random walk, because the walk tends to be absorbed soon after visiting them. In contrast, groups of nodes far away from g1 still allow the random walk to linger among them, and thus have more visits. In Figure 1(c), once g1 becomes an absorbing node (represented by a circle 'on the floor'), the center group is no longer the most prominent: nodes in this group have fewer visits than the left group. Note that the y-axis now shows the number of visits instead of probability. GrassHopper selects as the second item g2 the one with the largest expected number of visits in this absorbing Markov chain. This naturally inhibits items similar to g1 and encourages diversity. In Figure 1(c), the item near the center of the left group is selected as g2. Once g2 is selected, it is converted into an absorbing state, too. This is shown in Figure 1(d). The right group now becomes the most prominent, since both the left and center groups contain an absorbing state. The next item g3 in the ranking will come from the right group. Also note that the range of the y-axis is smaller…
(Annotations: highest-ranked vertex → R = {g1}; g1 turned into a sink node, highest-ranked in the next step → R = {g1, g2}; then R = {g1, g2, g3}.)
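The expected-visits step behind GrassHopper's sink-node trick can be sketched via the fundamental-matrix identity vᵀ = rᵀ(I − Q)⁻¹, where Q is the transition matrix restricted to not-yet-selected (transient) states. This is a minimal illustration, not GrassHopper's actual implementation; the chain and numbers below are made up.

```python
# Expected number of visits to each transient state before absorption,
# solved as (I - Q)^T v = r with plain Gaussian elimination (stdlib only).
def expected_visits(P, absorbing, start):
    """P: row-stochastic matrix (list of lists); absorbing: set of indices;
    start: initial distribution over all states.
    Returns expected visit counts (None for absorbing states)."""
    trans = [i for i in range(len(P)) if i not in absorbing]
    idx = {s: j for j, s in enumerate(trans)}
    n = len(trans)
    # A = (I - Q)^T restricted to transient states
    A = [[(1.0 if i == j else 0.0) - P[trans[j]][trans[i]] for j in range(n)]
         for i in range(n)]
    b = [start[s] for s in trans]
    for col in range(n):  # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        v[r] = (b[r] - sum(A[r][c] * v[c] for c in range(r + 1, n))) / A[r][r]
    return [v[idx[i]] if i in idx else None for i in range(len(P))]

# 3-state toy chain: state 2 is absorbing, the walk starts at state 0.
P = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]
v = expected_visits(P, {2}, [1.0, 0.0, 0.0])
```

The state with the largest expected visit count would be the next pick; turning it absorbing and re-solving yields the greedy re-ranking loop described above.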
Result diversification algorithms
• GrassHopper [Zhu07] – ranks the graph k times
• turns the highest-ranked vertex into a sink node at each iteration
• DivRank [Mei10] – based on vertex-reinforced random walks (VRRW)
• adjusts the transition matrix based on the number of visits to the vertices (rich-gets-richer mechanism)
[Figure: an illustration of diverse ranking – sample graph; weighting with PPR; diverse weighting.]
Result diversification algorithms
• GrassHopper [Zhu07] – ranks the graph k times
• turns the highest-ranked vertex into a sink node at each iteration
• DivRank [Mei10] – based on vertex-reinforced random walks (VRRW)
• adjusts the transition matrix based on the number of visits to the vertices (rich-gets-richer mechanism)
• Dragon [Tong11] – based on optimizing the goodness measure
• punishes the score when two neighbors are included in the results
Measuring diversity
Relevance measures
• Normalized relevance: rel(S) = Σ_{v∈S} π_v / Σ_{i=1}^{k} π_i
• Difference ratio: diff(S, S̄) = 1 − |S ∩ S̄| / |S|
• nDCG: nDCG_k = (π_{s_1} + Σ_{i=2}^{k} π_{s_i} / log2 i) / (π_1 + Σ_{i=2}^{k} π_i / log2 i)

Diversity measures
• ℓ-step graph density: dens_ℓ(S) = Σ_{u,v∈S, u≠v} d_ℓ(u,v) / (|S| × (|S| − 1))
• ℓ-expansion ratio: σ_ℓ(S) = |N_ℓ(S)| / n,
  where N_ℓ(S) = S ∪ {v ∈ (V − S) : ∃u ∈ S, d(u,v) ≤ ℓ}
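The ℓ-expansion ratio above needs only BFS distances. A minimal sketch on a toy graph (names and graph are illustrative; d_ℓ for the density measure is omitted since its exact form follows the paper's definition):

```python
# sigma_l(S) = |N_l(S)| / n, where N_l(S) adds every vertex within
# distance l of some member of S.
from collections import deque

def bfs_dist(adj, src):
    """Unweighted shortest-path distances from src."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def expansion_set(adj, S, l):
    """N_l(S): S plus every vertex within distance l of some u in S."""
    N = set(S)
    for u in S:
        d = bfs_dist(adj, u)
        N |= {v for v, dv in d.items() if dv <= l}
    return N

def expansion_ratio(adj, S, l):
    return len(expansion_set(adj, S, l)) / len(adj)

# Toy path graph 0-1-2-3-4
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

Spreading S across the graph (e.g. {0, 4} instead of {0, 1}) raises σ_ℓ, which is exactly what the measure rewards.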
Bicriteria optimization measures
• aggregate a relevance and a diversity measure • [Carbonell98]
• [Li11]
• [Vieira11]
• max-sum diversification, max-min diversification, k-similar diversification set, etc. [Gollapudi09]
f_MMR(S) = (1 − λ) Σ_{v∈S} π_v − λ Σ_{u∈S} max_{v∈S, v≠u} sim(u,v)

f_L(S) = Σ_{v∈S} π_v + λ |N(S)| / n

f_MSD(S) = (k − 1)(1 − λ) Σ_{v∈S} π_v + 2λ Σ_{u∈S} Σ_{v∈S, v≠u} div(u,v)
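The MMR-style objective above is typically optimized greedily: repeatedly pick the item maximizing (1 − λ)·relevance minus λ·(max similarity to already-selected items). A minimal sketch with made-up relevance and similarity values:

```python
# Greedy MMR-flavored selection (Carbonell98-style), illustrative inputs.
def mmr_select(rel, sim, k, lam=0.5):
    """rel: {item: relevance}; sim: {frozenset({u, v}): similarity}."""
    def pair_sim(u, v):
        return sim.get(frozenset((u, v)), 0.0)
    S = []
    candidates = set(rel)
    while candidates and len(S) < k:
        def score(u):
            # penalty: similarity to the closest already-selected item
            penalty = max((pair_sim(u, v) for v in S), default=0.0)
            return (1 - lam) * rel[u] - lam * penalty
        best = max(candidates, key=score)
        S.append(best)
        candidates.remove(best)
    return S

# 'a' and 'b' are near-duplicates, so after picking 'a' the method
# prefers the less relevant but dissimilar 'c'.
rel = {'a': 1.0, 'b': 0.9, 'c': 0.5}
sim = {frozenset(('a', 'b')): 0.95}
```

Note the pitfall the talk is building toward: this objective mixes two measures that each ignore the other criterion.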
Bicriteria optimization is not the answer
• Objective: diversify top-10 results • Two query-oblivious algorithms:
– top-x% + random (x ∈ {90, 75, 50, 25})
– top-x% + greedy-σ2
Bicriteria optimization is not the answer • normalized relevance and 2-step graph density
• evaluating result diversification as a bicriteria optimization problem with – a relevance measure that ignores diversity, and – a diversity measure that ignores relevancy.
[Plots: rel vs. dens2 and rel vs. σ2 (arrows mark the "better" direction) for top-90/75/50/25% + random and "All random", and for top-90/75/50/25% + greedy-σ2 and "All greedy-σ2".]
A better measure? Combine both
• We need a combined measure that tightly integrates both relevance and diversity aspects of the result set
• goodness [Tong11]
– downside: highly dominated by relevance
f_G(S) = 2 Σ_{i∈S} π_i − d Σ_{i,j∈S} A(j,i) π_j − (1 − d) Σ_{j∈S} π_j Σ_{i∈S} p*(i)

(first term: max-sum relevance; the remaining terms penalize the score when two results share an edge)
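The goodness formula, as reconstructed on this slide, is straightforward to evaluate; the sketch below uses a toy adjacency matrix and made-up scores, not Dragon's actual implementation.

```python
# Dragon's goodness measure f_G(S): max-sum relevance minus a penalty
# when two selected results share an edge. All inputs are illustrative.
def goodness(S, pi, A, pstar, d=0.85):
    """S: list of selected indices; pi: relevance scores; A: adjacency
    (as used by the measure); pstar: restart distribution; d: damping."""
    relevance = 2 * sum(pi[i] for i in S)
    edge_pen = d * sum(A[j][i] * pi[j] for i in S for j in S)
    mass_pen = (1 - d) * sum(pi[j] for j in S) * sum(pstar[i] for i in S)
    return relevance - edge_pen - mass_pen

# Two adjacent results: the shared edge costs part of the score.
pi = [0.5, 0.3]
A = [[0, 1], [1, 0]]
pstar = [0.5, 0.5]
```

With d close to 1 the relevance terms dominate the penalty, which is the "downside" noted above.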
Proposed measure: l-step expanded relevance
• a combined measure of – l-step expansion ratio (σ2) – relevance scores (π)
• quantifies: relevance of the covered region of the graph
• we do a sanity check with this new measure

ℓ-step expanded relevance:

exprel_ℓ(S) = Σ_{v∈N_ℓ(S)} π_v

where N_ℓ(S) is the ℓ-step expansion set of the result set S, and π holds the PPR scores of the items in the graph.
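The measure above combines coverage and relevance in one number: sum the PPR scores over the ℓ-step expansion set. A minimal sketch on a toy path graph with made-up scores:

```python
# exprel_l(S) = sum of pi_v over the l-step expansion set N_l(S).
from collections import deque

def n_l(adj, S, l):
    """l-step expansion set: S plus all vertices within distance l of S
    (multi-source BFS truncated at depth l)."""
    dist = {u: 0 for u in S}
    q = deque(S)
    while q:
        u = q.popleft()
        if dist[u] == l:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return set(dist)

def exprel(adj, pi, S, l):
    return sum(pi[v] for v in n_l(adj, S, l))

# Path graph 0-1-2-3 with illustrative PPR scores.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ppr = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
```

Note how {0, 3} scores higher than {0} alone: covering more of the relevant region is rewarded, but only in proportion to the PPR mass it reaches.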
[Plot: exprel2 vs. k ∈ {5, 10, 20, 50, 100} for top-90/75/50/25% + random, "All random", top-90/75/50/25% + greedy-σ2, and "All greedy-σ2".]
Correlations of the measures
[Correlation matrix of the relevance and diversity measures]
goodness is dominated by the relevancy measures
exprel has no high correlations with other relevance or diversity measures
Proposed algorithm: Best Coverage
• Can we use ℓ-step expanded relevance as an objective function?
• Define: the exprelℓ-diversified top-k ranking problem (DTRℓ)
• Complexity: a generalization of the weighted maximum coverage problem – NP-hard!
  – but exprelℓ is a submodular function (Lemma 4.2)
  – a greedy solution (Algorithm 1) that selects the item with the highest marginal utility at each step is the best possible polynomial-time approximation (proof based on [Nemhauser78])
• Relaxation: computes BestCoverage on highest ranked vertices to improve runtime
exprelℓ-diversified top-k ranking (DTRℓ):

    S* = argmax_{S′⊆V, |S′|=k} exprelℓ(S′)

Marginal utility:

    g(v, S) = Σ_{v′ ∈ Nℓ({v}) − Nℓ(S)} π_{v′}

ALGORITHM 1: BestCoverage
    Input: k, G, π, ℓ
    Output: a list of recommendations S
    S = ∅
    while |S| < k do
        v* ← argmax_v g(v, S)
        S ← S ∪ {v*}
    return S

ALGORITHM 2: BestCoverage (relaxed)
    Input: k, G, π, ℓ
    Output: a list of recommendations S
    S = ∅
    Sort(V) w.r.t. π_i, non-increasing
    S1 ← V[1..k′], i.e., the top-k′ vertices, where k′ = k·Δ̄^ℓ
    ∀v ∈ S1, g(v) ← g(v, ∅)
    ∀v ∈ S1, c(v) ← Uncovered
    while |S| < k do
        v* ← argmax_{v∈S1} g(v)
        S ← S ∪ {v*}
        S2 ← Nℓ({v*})
        for each v′ ∈ S2 do
            if c(v′) = Uncovered then
                S3 ← Nℓ({v′})
                ∀u ∈ S3, g(u) ← g(u) − π_{v′}
                c(v′) ← Covered
    return S
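Algorithm 1 can be sketched directly from the definitions: at each step, pick the vertex whose ℓ-step neighborhood adds the most uncovered PPR mass. By submodularity of exprelℓ, this greedy is a (1 − 1/e)-approximation. The toy graph and scores below are illustrative, and this naive version recomputes neighborhoods rather than maintaining the g-values incrementally as Algorithm 2 does.

```python
# Minimal sketch of greedy BestCoverage (Algorithm 1).
from collections import deque

def neighborhood(adj, seeds, l):
    """N_l(seeds): multi-source BFS truncated at depth l."""
    dist = {u: 0 for u in seeds}
    q = deque(seeds)
    while q:
        u = q.popleft()
        if dist[u] == l:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return set(dist)

def best_coverage(adj, pi, k, l=1):
    S, covered = [], set()
    for _ in range(k):
        def gain(v):
            # marginal utility g(v, S): PPR mass newly covered by v
            return sum(pi[w] for w in neighborhood(adj, [v], l) - covered)
        v_star = max((v for v in adj if v not in S), key=gain)
        S.append(v_star)
        covered |= neighborhood(adj, [v_star], l)
    return S

# Two star graphs: the greedy picks one center from each component.
adj = {0: [1, 2], 1: [0], 2: [0], 3: [4, 5], 4: [3], 5: [3]}
pi = {0: 0.3, 1: 0.1, 2: 0.1, 3: 0.2, 4: 0.1, 5: 0.1}
```

After selecting the first center, every vertex in its star has zero marginal gain, so the second pick jumps to the other component: coverage enforces diversity.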
Experiments
• 5 target application areas, 5 graphs from SNAP
• Queries generated based on 3 scenario types – one random vertex – random vertices from one area of interest – multiple vertices from multiple areas of interest
Dataset           |V|      |E|      δ̄     D   D90%  CC
amazon0601        403.3K   3.3M     16.8  21  7.6   0.42
ca-AstroPh        18.7K    396.1K   42.2  14  5.1   0.63
cit-Patents       3.7M     16.5M    8.7   22  9.4   0.09
soc-LiveJournal1  4.8M     68.9M    28.4  18  6.5   0.31
web-Google        875.7K   5.1M     11.6  22  8.1   0.60
Results – relevance
• Methods should trade off relevance for better diversity
• Normalized relevance of the top-k set is always 1
• Dragon always returns results having 70% of their items in common with the top-k set, with more than 80% relevance score
[Plots: rel vs. k ∈ {5, 10, 20, 50, 100} on amazon0601 and soc-LiveJournal1 (combined scenarios) for PPR (top-k), GrassHopper, Dragon, PDivRank, CDivRank, k-RLM, GSparse, BC1, BC2, BC1 (relaxed), and BC2 (relaxed).]
Results – coverage
• l-step expansion ratio (σ2) gives the graph coverage of the result set: better coverage = better diversity
• BestCoverage and DivRank variants, especially BC2 and PDivRank, have the highest coverage
[Plots: σ2 vs. k on amazon0601 and ca-AstroPh (combined scenarios) for the same algorithms. Figure 3: Coverage (σ2) of the algorithms with varying k. BestCoverage and DivRank variants have the highest coverage on the graph, while Dragon, GSparse, and k-RLM have coverages similar to top-k.]
Results – expanded relevance
• combined measure for relevance and diversity
• BestCoverage variants and GrassHopper perform better
• Although PDivRank gives the highest coverage on the amazon graph, it fails to cover the relevant parts!
[Plots: exprel2 vs. k on amazon0601 and soc-LiveJournal1 (combined scenarios). Figure 4: Expanded relevance (exprel2) with varying k. BC1 and BC2 variants mostly score the best. Although PDivRank gives the highest coverage on amazon0601 (Fig. 3), it fails to cover the relevant parts.]
Results – efficiency
• BC1 always performs better, with a running time lower than DivRank and GrassHopper
• BC1 (relaxed) offers reasonable diversity, with very little overhead on top of the PPR computation
[Plots: running time (sec, log scale) vs. k ∈ {5, 10, 20, 50, 100} on ca-AstroPh and soc-LiveJournal1 (combined scenarios) and on cit-Patents and web-Google (scenario 1) for all algorithms; BC1 (relaxed) adds only a slight overhead on top of the PPR computation.]
Results – intent aware experiments
• evaluation of intent-oblivious algorithms against intent-aware measures
• two measures – group coverage [Li11] – S-recall [Zhai03]
• cit-Patents dataset has the categorical information – 426 class labels, belonging to 36 subtopics
Results – intent aware experiments
• group coverage [Li11] – How many different groups are covered by the results? – omits the actual intent of the query
• top-k results are not diverse enough
• AllRandom results cover the largest number of groups
• PDivRank and BC2 follow
[Plots: (a) class coverage and (b) subtopic coverage vs. k ∈ {5, 10, 20, 50, 100} on cit-Patents for PPR (top-k), Dragon, PDivRank, CDivRank, k-RLM, BC1, BC2, BC1 (relaxed), BC2 (relaxed), and AllRandom.]
[Figure 8: Intent-aware results on the cit-Patents dataset with scenario-3 queries: (a) class coverage, (b) subtopic coverage, (c) topic coverage, (d) S-recall on classes, (e) S-recall on subtopics, (f) S-recall on topics; algorithms compared are PPR (top-k), Dragon, PDivRank, CDivRank, k-RLM, BC1, BC2, BC1 (relaxed), BC2 (relaxed), and AllRandom.]
level topics[7]. Here we present an evaluation of the intent-oblivious algorithms against intent-aware measures. This evaluation provides a validation of the diversification techniques with external measures, such as group coverage [14] and S-recall [23].

The intents of a query set Q are extracted by collecting the classes, subtopics, and topics of each seed node. Since our aim is to evaluate the results based on the coverage of different groups, we only use scenario-3 queries, which represent multiple interests.

One measure we are interested in is group coverage as a diversity measure [14]. It computes the number of groups covered by the result set, and is defined on classes, subtopics, and topics based on the intended level of granularity. However, this measure omits the actual intent of a query, assuming that the intent is given by the classes of the seed nodes.

Subtopic recall (S-recall) has been defined as the percentage of relevant subtopics covered by the result set [23]. It has also been redefined as Intent-Coverage [25], and used in the experiments of [22]. The S-recall of a result set S based on the set of intents of the query I is computed as

    S-recall(S, I) = (1/|I|) Σ_{i∈I} B_i(S),   (18)

where B_i(S) is a binary variable indicating whether intent i is found in the results.

We give the results of group coverage and S-recall on classes, subtopics, and topics in Figure 8. The algorithms GrassHopper and GSparse are not included in the results since they perform worse than PPR. The results of AllRandom are included to give a comparison between the top-k relevant set (PPR) and items chosen randomly.

As the group coverage plots show, the top-k ranked items of PPR do not have the necessary diversity in the result set; hence, the number of groups covered by these items is the lowest of all. On the other hand, a randomized method brings irrelevant items from the search space without considering their relevance to the user query. The results of all of the diversification algorithms reside between those two extremes, where PDivRank covers the most, and Dragon covers the least number of groups.

However, the S-recall index measures whether a covered group was actually useful or not. Obviously, AllRandom scores the lowest, as it dismisses the actual query (one may omit the S-recall on topics since there are only 6 groups at this granularity level). Among the algorithms, BC2 variants and BC1 score the best, while BC1 (relaxed) and DivRank variants have similar S-recall scores, even though BC1 (relaxed) is a much faster algorithm than any DivRank variant (see Figure 7).

[7] Available at: http://data.nber.org/patents/

6. CONCLUSIONS AND FUTURE WORK

In this paper, we address the problem of evaluating result diversification as a bicriteria optimization problem with a relevance measure that ignores diversity and a diversity measure that ignores relevance to the query. We show this by running query-oblivious algorithms on two commonly used combinations of objectives. Next, we argue that a result diversification algorithm should be evaluated under a measure which tightly integrates the query in its value, and we present a new measure called expanded relevance. Investigating various quality indices by computing their pairwise correlations, we also show that this new measure has no direct correlation with any other measure. In the second part of the paper, we analyze the complexity of finding the solution that maximizes the expanded relevance of the results, and, based on the submodularity property of the objective, we present a greedy algorithm called BestCoverage and its efficient relaxation. We experimentally show that the relaxation carries no significant harm to the expanded relevance of the solution. As future work, we plan to investigate the behavior of the exprelℓ measure on social networks with ground-truth communities.

Acknowledgments: This work was supported in parts by the DOE grant DE-FC02-06ER2775 and by the NSF grants CNS-0643969, OCI-0904809, and OCI-0904802.
Results – intent aware experiments • S-recall [Zhai03], Intent-coverage [Zhu11]
– percentage of relevant subtopics covered by the result set – the intent is given with the classes of the seed nodes
• AllRandom brings irrelevant items from the search space
• top-k results do not have the necessary diversity
• BC2 variants and BC1 perform better than DivRank
• BC1 (relaxed) and DivRank score similarly, but BC1 (relaxed) is much faster
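S-recall, as in Equation (18), is just the fraction of the query's intents that appear among the results' labels. A minimal sketch; the intent labels below are made up for illustration:

```python
# S-recall(S, I) = |covered intents| / |I|.
def s_recall(result_intents, query_intents):
    """result_intents: {item: set of intent labels}; query_intents: set I."""
    covered = set()
    for intents in result_intents.values():
        covered |= intents & query_intents  # only query intents count
    return len(covered) / len(query_intents)

# Three results covering 2 of the query's 4 intents ('ml' is off-query).
results = {'p1': {'db'}, 'p2': {'ir', 'ml'}, 'p3': {'db'}}
query = {'db', 'ir', 'graphics', 'theory'}
```

Unlike plain group coverage, off-query labels (here 'ml') earn nothing, which is why AllRandom scores low on S-recall despite covering many groups.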
Conclusions
• Result diversification should not be evaluated as a bicriteria optimization problem with – a relevance measure that ignores diversity, and – a diversity measure that ignores relevancy
• l-step expanded relevance is a simple measure that combines both relevance and diversity
• BestCoverage, a greedy solution that maximizes exprelℓ, is a (1−1/e)-approximation of the optimal solution
• BestCoverage variants perform better than the others, and the relaxation is extremely efficient
• goodness in DRAGON is dominated by relevancy • DivRank variants implicitly optimize expansion ratio
Thank you
• For more information visit • http://bmi.osu.edu/hpc
• Research at the HPC Lab is funded by