Presentation @SIGIR2015

Local Ranking Problem on the BrowseGraph

Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco

“when the centrality-like ranks computed on a local graph differ from the ones on the global graph”



- Bressan et al. in WWW 2013, “The Power of Local Information in PageRank”
- Bar-Yossef and Mashiach in CIKM 2008, “Local Approximation of PageRank and reverse PageRank”
- Chen et al. in CIKM 2004, “Local Methods for Estimating PageRank Values”


The BrowseGraph

“a graph where nodes are webpages and edges are browsing transitions”

[Figure: construction of the BrowseGraph from user sessions (user navigation, e.g. Flickr)]

Centrality Metrics applied to the BrowseGraph

Increasing popularity in recent years
- Chiarandini et al. in ICWSM 2013, “Leveraging browsing patterns for topic discovery and photostream recommendation”
- Trevisiol et al. in SIGIR 2012, “Image ranking based on user browsing behavior”
- Liu et al. in CIKM 2011, “User browsing behavior-driven web crawling”

Provide higher-quality rankings compared to standard hyperlink graphs
- Y. Liu et al. in SIGIR 2008, “BrowseRank: letting web users vote for page importance.”

Local Ranking Problem on the BrowseGraph: WHY?

Image Ranking in Flickr (SIGIR 2012)

We compared different ranking approaches on the BrowseGraph (PageRank and BrowseRank, among others)

How much could our ranking vary if we had more information (i.e. nodes)?

BrowseGraph and ReferrerGraphs

ReferrerGraphs: Domain-dependent BrowseGraph

Construct different BrowseGraphs based on the referrer domain

Recommend news articles following the ReferrerGraphs

[Figure: the BrowseGraph, the Twitter ReferrerGraph, and the Facebook ReferrerGraph]

Can we rely on centrality-based algorithms to infer news importance?

Local Ranking Problem on the BrowseGraph

Study of the LRP on the BrowseGraph by incrementally expanding the local graph (“Growing Rings” experiment)

How to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph

Discover the referrer domain when it is not available (not discussed in the presentation; please see the paper)

Social Networks, Search Engines, News Homepage

Yahoo News BrowseGraph

~500M pageviews

1. Construct the BrowseGraph (our “global graph”)
2. Construct the ReferrerGraphs (our “local graphs”)
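The two construction steps can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the session layout (a referrer URL plus an ordered list of visited pages), the function name `build_graphs`, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_graphs(sessions):
    """Return (browse_graph, referrer_graphs) as weighted edge dictionaries.

    Each session is (referrer_url, [page, page, ...]); this record layout is
    an assumption for the example, not the format used by the authors.
    """
    browse_graph = defaultdict(int)                          # (src, dst) -> transitions
    referrer_graphs = defaultdict(lambda: defaultdict(int))  # domain -> edge weights
    for referrer_url, pages in sessions:
        domain = urlparse(referrer_url).netloc or "unknown"
        # Every pair of consecutive pageviews is one browsing transition.
        for src, dst in zip(pages, pages[1:]):
            browse_graph[(src, dst)] += 1             # global graph
            referrer_graphs[domain][(src, dst)] += 1  # local graph of this referrer
    return browse_graph, referrer_graphs


# Hypothetical sessions, for illustration only.
sessions = [
    ("https://twitter.com/some_user", ["news/a", "news/b", "news/c"]),
    ("https://www.facebook.com/", ["news/a", "news/b"]),
    ("https://twitter.com/other_user", ["news/b", "news/c"]),
]
browse_graph, referrer_graphs = build_graphs(sessions)
print(dict(browse_graph))
print({domain: dict(edges) for domain, edges in referrer_graphs.items()})
```

Each ReferrerGraph is a subgraph of the BrowseGraph by construction, which is what makes it a natural "local graph" for the LRP.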


Very different dimensions

Subgraph Comparison

Very well connected (even Reddit, the smallest one)

Cross-distance Kendall-tau among common nodes (min overlap 1k)

In general the similarities are very low (<0.3), possibly due to different content or different user interests

Search engines are the most similar (>0.5)
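A minimal sketch of the pairwise comparison, assuming each ReferrerGraph has already been scored (for example with PageRank) into a node-to-score dictionary. The function name, the toy scores, and the tie handling are assumptions; the quadratic loop is only meant for illustration, and a library routine would be preferable at the 1k+ node scale mentioned above.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b, min_overlap=2):
    """Kendall tau-a over the nodes present in both score dictionaries.

    Returns None when fewer than `min_overlap` nodes are shared. Tied pairs
    count as neither concordant nor discordant, which is a simplification.
    """
    common = sorted(set(scores_a) & set(scores_b))
    if len(common) < max(min_overlap, 2):
        return None
    concordant = discordant = 0
    for u, v in combinations(common, 2):
        sign = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(common) * (len(common) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical scores for two ReferrerGraphs sharing the nodes a, b, c.
twitter = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.1}
facebook = {"a": 0.2, "b": 0.5, "c": 0.3, "e": 0.4}
tau = kendall_tau(twitter, facebook)
print(tau)
```

Only the shared nodes enter the comparison, so two graphs with little overlap are skipped rather than compared on a handful of pages.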


1. For each ReferrerGraph
2. Compare the PageRank values with the global one (Kendall-tau)
3. Expand with the next neighborhood of nodes
4. Iterate until the Kendall-tau is close to 1

Growing Rings Experiment

Study of the LRP on the BrowseGraph by incrementally expanding the local graph

K(local+0, global) ~0.307
K(local+1, global) ~0.524
K(local+2, global) ~0.740
K(local+3, global) ~0.912
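The expansion loop can be sketched in plain Python as below. Several choices here are assumptions rather than details from the slides: the PageRank parameters (damping 0.85, 50 power iterations), treating a "ring" as every node one hop away in either direction, and computing the Kendall-tau only over the original local nodes. The toy graph is hypothetical.

```python
from itertools import combinations


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on the subgraph induced by `nodes`."""
    nodes = sorted(set(nodes))
    node_set = set(nodes)
    out = {n: {} for n in nodes}
    for (src, dst), weight in edges.items():
        if src in node_set and dst in node_set:
            out[src][dst] = weight
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        nxt = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for src, targets in out.items():
            total = sum(targets.values())
            for dst, weight in targets.items():
                nxt[dst] += damping * rank[src] * weight / total
        rank = nxt
    return rank


def kendall_tau(a, b, nodes):
    """Kendall tau-a between two score dictionaries, restricted to `nodes`."""
    nodes = sorted(nodes)
    score = 0
    for u, v in combinations(nodes, 2):
        s = (a[u] - a[v]) * (b[u] - b[v])
        score += (s > 0) - (s < 0)
    return score / (len(nodes) * (len(nodes) - 1) / 2)


def growing_rings(edges, local_nodes, max_steps=5):
    """Return [K(local+0, global), K(local+1, global), ...]."""
    global_rank = pagerank({n for edge in edges for n in edge}, edges)
    targets = set(local_nodes)  # nodes whose ranking is being evaluated
    local = set(local_nodes)
    taus = []
    for _ in range(max_steps + 1):
        taus.append(kendall_tau(pagerank(local, edges), global_rank, targets))
        # Next ring: every node one hop away from the current local graph.
        ring = {n for (s, d) in edges for n in (s, d)
                if s in local or d in local} - local
        if not ring:
            break
        local |= ring
    return taus


# Hypothetical weighted graph; the local graph starts as {a, b, c}.
edges = {
    ("a", "b"): 3, ("b", "c"): 2, ("c", "a"): 1, ("c", "d"): 2,
    ("d", "b"): 1, ("d", "e"): 3, ("e", "f"): 2, ("f", "d"): 1,
    ("f", "g"): 2, ("g", "h"): 1, ("h", "f"): 2,
}
taus = growing_rings(edges, {"a", "b", "c"})
print(taus)
```

Once the rings cover the whole graph the local and global rankings coincide, so the last value is 1; the interesting part is how quickly the earlier values approach it.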


Referrer-based (RB): the 7 ReferrerGraphs (Facebook, Twitter, Reddit, Homepage, Yahoo, Google, Bing)

Same-size referrer-based (SRB): to measure the impact of the graph size

Random (R): 7 random graphs reflecting the size of the original RB graphs

[Figure panels: ReferrerGraphs (RB), same-size ReferrerGraphs (SRB), Random (R)]

Hypothesis 1: adding all the nodes means adding more information, and should therefore lead to faster convergence (Boldi et al. [6] in the paper)

Hypothesis 2: the most representative nodes bring less noise and therefore quicker convergence (Cho et al. [13] in the paper)

How does the expansion influence convergence if only a few, more representative nodes are selected?

Growing Rings Experiment with Selection of Nodes

[Figure legend, selected nodes: 5, 10, 30, 50, 100]

Fewer, more representative nodes lead to a better estimation of PageRank values in the first iteration

In the long run, expansions with the highest number of nodes show the best convergence

[Figure panels: Growing Rings Expansion; Growing Rings Expansion with Selected Nodes]

~1 or 2 steps can be enough to estimate the PageRank score of the global graph

Predicting Kendall-tau Distance

Can we estimate the “distance” between the local and global PageRank considering only information available in the local graph?

Hypothesis: some structural properties of the graph could be good proxies for the Kendall-tau distance between the local and global ranks.

Training Set Construction


ReferrerGraph (homepage)

Jackknife resampling (1%, 5%, 10%, 20%)

Kendall-tau distance between ReferrerGraph and reduced subgraphs
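The resampling step might look like the sketch below. The slides only list the removal percentages, so removing nodes (rather than edges), the number of repeats per percentage, and the name `reduced_subgraphs` are assumptions. Each reduced subgraph would then be paired with the Kendall-tau distance between its PageRank and that of the full ReferrerGraph to form one training instance.

```python
import random


def reduced_subgraphs(edges, fractions=(0.01, 0.05, 0.10, 0.20), repeats=3, seed=0):
    """Yield (fraction, reduced_edges) with a random share of nodes removed.

    Removing nodes (not edges) and the repeat count are assumptions made
    for this example; edges is a {(src, dst): weight} dictionary.
    """
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    for fraction in fractions:
        n_removed = max(1, round(fraction * len(nodes)))
        for _ in range(repeats):
            removed = set(rng.sample(nodes, n_removed))
            # Keep only the transitions whose endpoints both survive.
            kept = {(s, d): w for (s, d), w in edges.items()
                    if s not in removed and d not in removed}
            yield fraction, kept


# Toy ReferrerGraph: a weighted ring over 100 hypothetical pages.
edges = {(f"p{i}", f"p{(i + 1) % 100}"): i + 1 for i in range(100)}
samples = list(reduced_subgraphs(edges))
print(len(samples))
```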

Size and Connectivity (S): basic statistics
Assortativity (A): tendency of nodes with a certain degree to be linked with nodes of similar degree
Degree (D): statistics on the degree distribution
Weighted degree (W): same as degree, but considering the weights on edges (transitions)
Local PageRank (P): statistics on the PageRank values
Closeness centralization (C): statistics on the distance (number of hops)

• A. Barrat et al. in Cambridge Univ. Press 2008, “Dynamical Processes on Complex Networks”
• S. Wasserman and K. Faust in Cambridge Univ. Press 1994, “Social Network Analysis: Methods and Applications”


We compute 62 structural graph metrics for each training instance
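As an illustration of the Size (S), Degree (D), and Weighted degree (W) groups, the sketch below computes a handful of summary statistics from a weighted edge dictionary. The feature names and the choice of mean, standard deviation, and maximum are assumptions; the actual set of 62 metrics is described in the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def degree_features(edges):
    """Summary statistics of plain and weighted in-/out-degree.

    edges is a {(src, dst): weight} dictionary; feature names are illustrative.
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    in_w, out_w = defaultdict(float), defaultdict(float)
    nodes = set()
    for (src, dst), weight in edges.items():
        nodes.update((src, dst))
        out_deg[src] += 1
        in_deg[dst] += 1
        out_w[src] += weight
        in_w[dst] += weight
    features = {"num_nodes": len(nodes), "num_edges": len(edges)}
    tables = [("in_degree", in_deg), ("out_degree", out_deg),
              ("weighted_in_degree", in_w), ("weighted_out_degree", out_w)]
    for name, table in tables:
        # Nodes with no matching edges contribute a degree of 0.
        values = [table[n] for n in sorted(nodes)]
        features[f"{name}_mean"] = mean(values)
        features[f"{name}_std"] = pstdev(values)
        features[f"{name}_max"] = max(values)
    return features


# Hypothetical local graph with transition counts as weights.
features = degree_features({("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2})
print(features)
```

All of these quantities depend only on the local graph, which is the property the prediction task relies on.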

Extract Structural Properties of each Graph


Regression Analysis (RF) in a five-fold CV over 10 iterations

Weighted degree: the most predictive feature group, better than using all the features

Assortativity: less predictive power (too many features and too little training data?)



Most important features in the weighted degree group:

features based on the distribution of in- and out-degree:
- very straightforward to compute
- information always available in the local graph

Can we estimate the distance between the local and global PageRank considering only information available in the local graph?

YES. With just a few structural features of the local graph.

Summary

How the LRP behaves on the BrowseGraph: expanding the local graph with the whole neighborhoods (“Growing Rings” experiment) or with the most representative nodes (“Growing Rings with Selection of Nodes”)

It is possible to estimate the “distance” between the local and global PageRank exploiting the structural properties of the local graph


Thanks.