Local Ranking Problem on the BrowseGraph
Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco
“when the centrality-like ranks computed on a local graph differ from the ones computed on the global graph”
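A minimal sketch of the effect, assuming a plain power-iteration PageRank over weighted edges. The toy graph, the node names, and the damping factor of 0.85 are illustrative assumptions, not data from the talk. The local graph keeps only the edges among `a`, `b`, `c`, so the path `a -> x -> c` that feeds `c` in the global graph is invisible locally.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges maps (source, target) -> transition weight.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def induced_subgraph(edges, nodes):
    """Keep only the edges whose endpoints are both in `nodes`."""
    return {(s, t): w for (s, t), w in edges.items() if s in nodes and t in nodes}


def ordering(rank, nodes):
    """Nodes sorted from highest to lowest score."""
    return sorted(nodes, key=lambda n: -rank[n])


# Hypothetical global graph: most of a's traffic reaches c through x.
global_edges = {
    ("a", "b"): 1, ("b", "a"): 1, ("c", "a"): 1,
    ("a", "x"): 3, ("x", "c"): 1,
}
local_nodes = {"a", "b", "c"}

global_order = ordering(pagerank(global_edges), local_nodes)
local_order = ordering(pagerank(induced_subgraph(global_edges, local_nodes)), local_nodes)
print("global:", global_order)
print("local: ", local_order)
```

On this toy graph the two orderings disagree on `b` and `c`, which is exactly the situation the definition describes.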
Local Ranking Problem
- Bressan et al. in WWW 2013, “The Power of Local Information in PageRank”
- Bar-Yossef and Mashiach in CIKM 2008, “Local Approximation of PageRank and reverse PageRank”
- Chen et al. in CIKM 2004, “Local Methods for Estimating PageRank Values”
The BrowseGraph
user session
BrowseGraph
“a graph where nodes are webpages and edges are browsing transitions”
user navigation (e.g. Flickr)
construction
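The construction can be sketched as follows, assuming each user session is available as an ordered list of visited pages. The session format and the page names are hypothetical; the talk does not specify how sessions are stored.

```python
from collections import Counter


def build_browse_graph(sessions):
    """Return {(source_page, target_page): transition_count}."""
    transitions = Counter()
    for pages in sessions:
        # Each consecutive pair of pageviews is one browsing transition.
        for src, dst in zip(pages, pages[1:]):
            transitions[(src, dst)] += 1
    return dict(transitions)


# Hypothetical sessions: ordered lists of visited pages.
sessions = [
    ["/home", "/photo/1", "/photo/2"],
    ["/home", "/photo/1"],
    ["/photo/2", "/photo/1"],
]
browse_graph = build_browse_graph(sessions)
for (src, dst), count in sorted(browse_graph.items()):
    print(src, "->", dst, count)
```

Nodes are the pages that appear in at least one transition, and the count on each edge is the weight used by weighted centrality metrics.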
Centrality Metrics applied to the BrowseGraph
Increasing popularity in recent years
- Chiarandini et al. in ICWSM 2013, “Leveraging browsing patterns for topic discovery and photostream recommendation”
- Trevisiol et al. in SIGIR 2012, “Image ranking based on user browsing behavior”
- Liu et al. in CIKM 2011, “User browsing behavior-driven web crawling”
Provide higher-quality rankings compared to standard hyperlink graphs
- Y. Liu et al. in SIGIR 2008, “BrowseRank: letting web users vote for page importance”
Local Ranking Problem on the BrowseGraph: WHY?
Image Ranking in Flickr (SIGIR 2012)
We compared different ranking approaches on the BrowseGraph (PageRank and BrowseRank among others)
How much could our ranking vary if we had more information (i.e. nodes)?
BrowseGraph and ReferrerGraphs
ReferrerGraphs: Domain-dependent BrowseGraphs
Construct different BrowseGraphs based on the referrer domain
Recommend news articles following the ReferrerGraphs
BrowseGraph
Twitter ReferrerGraph
Facebook ReferrerGraph
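A hedged sketch of how ReferrerGraphs could be derived, assuming each session carries the URL of the external page it arrived from. The `(referrer_url, pages)` record format, the `www.` stripping rule, and the example URLs are illustrative assumptions.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def referrer_domain(referrer_url):
    """Host name of the referrer, without a leading 'www.'."""
    host = urlparse(referrer_url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def build_referrer_graphs(sessions):
    """sessions: iterable of (referrer_url, [page, page, ...]).

    Returns {domain: {(source_page, target_page): transition_count}}.
    """
    graphs = defaultdict(Counter)
    for referrer_url, pages in sessions:
        domain = referrer_domain(referrer_url)
        for src, dst in zip(pages, pages[1:]):
            graphs[domain][(src, dst)] += 1
    return {domain: dict(edges) for domain, edges in graphs.items()}


# Hypothetical sessions entering a news site from two referrers.
sessions = [
    ("https://twitter.com/some_user/status/1", ["/news/a", "/news/b"]),
    ("https://www.facebook.com/", ["/news/a", "/news/c", "/news/b"]),
    ("https://twitter.com/other_user", ["/news/a", "/news/b"]),
]
referrer_graphs = build_referrer_graphs(sessions)
for domain in sorted(referrer_graphs):
    print(domain, referrer_graphs[domain])
```

Each ReferrerGraph is a subgraph of the full BrowseGraph, which is what makes it a natural "local graph" for the Local Ranking Problem.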
Can we rely on centrality-based algorithms to infer news importance?
Local Ranking Problem on the BrowseGraph
Study of the LRP on the BrowseGraph by incrementally expanding the local graph (“Growing Rings” experiment)
How to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
Discover the referrer domain when it is not available (not discussed in the presentation; please see the paper)
Social Networks, Search Engines, News Homepage
Yahoo News BrowseGraph
~500M pageviews
Local Ranking Problem on the BrowseGraph
1. Construct the BrowseGraph (our “global graph”)
2. Construct the ReferrerGraphs (our “local graphs”)
Cross-distance Kendall-tau among common nodes (min overlap 1k)
In general the similarities are very low (<0.3): different content or different users’ interests
Search engines are the most similar (>0.5)
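The comparison on common nodes can be sketched as below. This is a quadratic-time tau-a that ignores tied pairs in the numerator; which Kendall-tau variant the authors used is not stated here, so treat that choice, the `min_overlap` parameter name, and the toy scores as assumptions.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b, min_overlap=2):
    """Kendall tau (tau-a) between two score dicts, over their common nodes.

    Returns None when fewer than min_overlap nodes are shared.
    """
    common = sorted(set(scores_a) & set(scores_b))
    if len(common) < max(min_overlap, 2):
        return None
    concordant = discordant = 0
    for u, v in combinations(common, 2):
        sign = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(common) * (len(common) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical PageRank scores from two ReferrerGraphs.
graph_a = {"p1": 0.4, "p2": 0.3, "p3": 0.2, "p4": 0.1}
graph_b = {"p1": 0.1, "p2": 0.5, "p3": 0.3, "p5": 0.1}

tau = kendall_tau(graph_a, graph_b)
print(tau)
```

Only `p1`, `p2`, and `p3` are shared here. With the slide's threshold one would pass `min_overlap=1000`, and pairs of graphs with a smaller overlap would be skipped.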
Subgraph Comparison
1. For each ReferrerGraph
2. Compare the PageRank values with the global ones (Kendall-tau)
3. Expand with the next neighborhood of nodes
4. Iterate until the Kendall-tau is close to 1
Growing Rings Experiment
Study of the LRP on the BrowseGraph by incrementally expanding the local graph
K(local+0, global) ~0.307
K(local+1, global) ~0.524
K(local+2, global) ~0.740
K(local+3, global) ~0.912
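The expansion step alone can be sketched as follows. It assumes the "next neighborhood" means every node one hop away in either direction, which the slides do not spell out. The PageRank and Kendall-tau computations performed at each step are left out, and the toy edge list is hypothetical.

```python
def expand_ring(global_edges, local_nodes):
    """Add every node one hop away from local_nodes (in either direction)."""
    ring = set()
    for src, dst in global_edges:
        if src in local_nodes and dst not in local_nodes:
            ring.add(dst)
        elif dst in local_nodes and src not in local_nodes:
            ring.add(src)
    return set(local_nodes) | ring


def growing_rings(global_edges, local_nodes, max_steps=10):
    """Yield the node set after each expansion: local+0, local+1, ..."""
    current = set(local_nodes)
    yield current
    for _ in range(max_steps):
        expanded = expand_ring(global_edges, current)
        if expanded == current:
            break  # nothing left to add
        current = expanded
        yield current


# Hypothetical global graph given as (source, target) pairs.
global_edges = [("a", "b"), ("b", "c"), ("d", "c"), ("d", "e")]
rings = [sorted(nodes) for nodes in growing_rings(global_edges, {"b"})]
for step, nodes in enumerate(rings):
    print(f"local+{step}:", nodes)
```

In the experiment, each yielded node set would induce a subgraph of the global BrowseGraph whose PageRank is compared with the global one, giving the K(local+i, global) values listed above.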
Growing Rings Experiment
- Referrer-based (RB): the 7 ReferrerGraphs (Facebook, Twitter, Reddit, Homepage, Yahoo, Google, Bing)
- Same-size referrer-based (SRB): to measure the impact of the graph size
- Random (R): 7 random graphs reflecting the size of the original RB graphs
Hypothesis 1: adding all the nodes means adding more information, and should therefore lead to faster convergence (Boldi et al. [6] in the paper)
Hypothesis 2: the most representative nodes bring less noise and therefore quicker convergence (Cho et al. [13] in the paper)
How does the expansion influence convergence if only a few of the most representative nodes are selected?
Growing Rings Experiment with Selection of Nodes
[Figure legend: 5, 10, 30, 50, 100 selected nodes]
- Fewer, more representative nodes lead to a better estimation of PageRank values in the first iteration
- In the long run, expansions with the highest number of nodes show the best convergence
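A sketch of a selective expansion step. The slides do not define "most representative", so this example assumes it means the neighboring nodes with the largest total transition weight to or from the current local graph. That criterion, the function name, and the toy graph are all hypothetical.

```python
from collections import Counter


def expand_with_top_k(global_edges, local_nodes, k):
    """Add only the k neighboring nodes most strongly connected to local_nodes.

    global_edges maps (source, target) -> transition count.
    """
    strength = Counter()
    for (src, dst), weight in global_edges.items():
        if src in local_nodes and dst not in local_nodes:
            strength[dst] += weight
        elif dst in local_nodes and src not in local_nodes:
            strength[src] += weight
    # Ties are broken by node name to keep the result deterministic.
    ranked = sorted(strength, key=lambda n: (-strength[n], n))
    return set(local_nodes) | set(ranked[:k])


# Hypothetical weighted global graph around a single local node "a".
global_edges = {("a", "x"): 5, ("y", "a"): 2, ("a", "z"): 1, ("x", "w"): 9}

top2 = expand_with_top_k(global_edges, {"a"}, k=2)
full = expand_with_top_k(global_edges, {"a"}, k=100)
print(sorted(top2), sorted(full))
```

With a large `k` the step degenerates into the full one-hop expansion, which matches the range of settings (5 to 100 nodes) compared in the experiment.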
Growing Rings Expansion with Selected Nodes
About 1 or 2 expansion steps can be enough to estimate the PageRank scores of the global graph
Predicting Kendall-tau Distance
Can we estimate the “distance” between the local and global PageRank considering only information available in the local graph?
Hypothesis: some structural properties of the graph could be good proxies for the Kendall-tau distance between local and global ranks.
Training Set Construction
Predicting Kendall-tau Distance
ReferrerGraph
Jackknife resampling (1%, 5%, 10%, 20%)
homepage
Kendall-tau distance between ReferrerGraph and reduced subgraphs
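One way to sketch the resampling step, assuming the percentages are the share of nodes removed at random from a ReferrerGraph. That reading, the number of repeats, and the synthetic ring graph are assumptions. Each reduced subgraph would then be paired with its Kendall-tau distance from the full ReferrerGraph to form one training instance.

```python
import random


def reduced_subgraph(edges, drop_fraction, rng):
    """Remove a random fraction of nodes and every edge touching them."""
    nodes = sorted({n for edge in edges for n in edge})
    n_drop = max(1, round(len(nodes) * drop_fraction))
    dropped = set(rng.sample(nodes, n_drop))
    return {
        (s, t): w
        for (s, t), w in edges.items()
        if s not in dropped and t not in dropped
    }


def training_subgraphs(edges, fractions=(0.01, 0.05, 0.10, 0.20), repeats=3, seed=0):
    """Return a list of (drop_fraction, reduced_edges) pairs."""
    rng = random.Random(seed)
    return [
        (fraction, reduced_subgraph(edges, fraction, rng))
        for fraction in fractions
        for _ in range(repeats)
    ]


# Synthetic stand-in for a ReferrerGraph: a directed ring of 100 pages.
referrer_graph = {(i, (i + 1) % 100): 1 for i in range(100)}
samples = training_subgraphs(referrer_graph)
print(len(samples), [len(edges) for _, edges in samples])
```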
- Size and Connectivity (S): basic statistics
- Assortativity (A): tendency of nodes with a certain degree to be linked with nodes of similar degree
- Degree (D): statistics on the degree distribution
- Weighted degree (W): same as degree, but considering the weights on the edges (transitions)
- Local PageRank (P): statistics on the PageRank values
- Closeness centralization (C): statistics on the distance (number of hops)
• A. Barrat et al. in Cambridge Univ. Press 2008, “Dynamical Processes on Complex Networks”
• S. Wasserman and K. Faust in Cambridge Univ. Press 1994, “Social Network Analysis: Methods and Applications”
Predicting Kendall-tau Distance
We compute 62 structural graph metrics for each training instance
Extract Structural Properties of each Graph
Regression analysis (RF) with five-fold cross-validation over 10 iterations
Weighted degree: most predictive features; better than using all the features
Assortativity: less predictive power; too many features and too little training data?
Predicting Kendall-tau Distance
Most important features in weighted degree:
features based on the distribution of in- and out-degree:
- very straightforward to compute
- information always available in the local graph
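A minimal sketch of such features, assuming simple summary statistics (mean, median, standard deviation, maximum) over the weighted in- and out-degree of each node. The exact statistics among the 62 metrics are not listed here, so the feature names and the toy graph are illustrative.

```python
from statistics import mean, median, pstdev


def summarize(values, prefix):
    """Summary statistics of one degree distribution."""
    return {
        f"{prefix}_mean": mean(values),
        f"{prefix}_median": median(values),
        f"{prefix}_std": pstdev(values),
        f"{prefix}_max": max(values),
    }


def weighted_degree_features(edges):
    """edges maps (source, target) -> transition count."""
    nodes = sorted({n for edge in edges for n in edge})
    in_weight = dict.fromkeys(nodes, 0)
    out_weight = dict.fromkeys(nodes, 0)
    for (src, dst), w in edges.items():
        out_weight[src] += w
        in_weight[dst] += w
    features = summarize(list(in_weight.values()), "w_in")
    features.update(summarize(list(out_weight.values()), "w_out"))
    return features


# Hypothetical local graph with transition counts as edge weights.
local_graph = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2}
features = weighted_degree_features(local_graph)
for name, value in features.items():
    print(name, value)
```

Everything here is computed from the local graph alone, in one pass over its edges.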
Predicting Kendall-tau Distance
Can we estimate the distance between the local and global PageRank considering only information available in the local graph?
YES. With just a few structural properties of the local graph.
Summary
How the LRP behaves on the BrowseGraph: expanding the local graph with whole neighborhoods (“Growing Rings” experiment) or with the most representative nodes (“Growing Rings with Selection of Nodes”)
It is possible to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph