
Date post: 09-Jul-2020
Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs

J. Feldman*, S. Lattanzi*, S. Leonardi°, V. Mirrokni*

*Google Research, °Sapienza U. Rome

Alessandro Epasto
Motivation

● Recommendation Systems:
  ● Bipartite graphs with Users and Items.
  ● Identify similar users and suggest relevant items.
  ● Concrete example: The AdWords case.

● Two key observations:
  ● Items belong to different categories.
  ● Graphs are often lopsided.


Modeling the Data as a Bipartite Graph

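[Figure: bipartite graph with Millions of Advertisers (e.g., Nike Store New York) on one side and Billions of Queries (e.g., Soccer Shoes, Soccer Ball) on the other; edges carry costs (2$, 3$, 4$, 1$, 5$, 2$); Hundreds of Labels (e.g., Retailers, Apparel, Sport Equipment).]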
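A minimal sketch of how such a graph might be held in memory, assuming each query carries exactly one label. The edge list, the second advertiser name, the costs, and the label assignments are hypothetical and only mirror the flavor of the figure.

```python
from collections import defaultdict

# Hypothetical toy data: (advertiser, query, cost, label of the query).
EDGES = [
    ("Nike Store New York", "Soccer Shoes", 2.0, "Apparel"),
    ("Nike Store New York", "Soccer Ball", 3.0, "Sport Equipment"),
    ("Other Store", "Soccer Ball", 4.0, "Sport Equipment"),
]

def build_graph(edges):
    """Return advertiser -> query -> cost adjacency and a query -> label map."""
    adj = defaultdict(dict)
    label_of = {}
    for advertiser, query, cost, label in edges:
        adj[advertiser][query] = cost
        label_of[query] = label
    return dict(adj), label_of

def restrict(adj, label_of, labels):
    """Keep only the edges whose query carries one of the given labels."""
    return {a: {q: w for q, w in nbrs.items() if label_of[q] in labels}
            for a, nbrs in adj.items()}

adj, label_of = build_graph(EDGES)
sport = restrict(adj, label_of, {"Sport Equipment"})
```

Restricting by label is the operation that makes the problem multi-categorical: a ranking may be requested for any subset of labels.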


Personalized PageRank

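[Figure: graph with the seed node v and another node u.]

For a node v (the seed) and a probability alpha: at each step the random walk jumps back to v with probability alpha, and otherwise moves to a random neighbor of the current node.

The stationary distribution assigns a similarity score to each node in the graph w.r.t. node v.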
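The definition can be sketched as a plain power iteration. This is a textbook formulation, not code from the talk; the function name, the toy graph, and the default alpha = 0.15 are assumptions chosen for illustration.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=200):
    """Power iteration for PPR on a weighted graph {node: {neighbor: weight}}.

    Assumes every node has at least one outgoing edge.
    """
    scores = {node: 0.0 for node in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        nxt[seed] += alpha  # restart mass; the scores always sum to 1
        for node, mass in scores.items():
            total = sum(adj[node].values())
            for nbr, w in adj[node].items():
                nxt[nbr] += (1.0 - alpha) * mass * w / total
        scores = nxt
    return scores

# Hypothetical undirected toy graph: advertisers a1, a2 and queries q1, q2.
GRAPH = {
    "a1": {"q1": 1.0, "q2": 1.0},
    "a2": {"q2": 1.0},
    "q1": {"a1": 1.0},
    "q2": {"a1": 1.0, "a2": 1.0},
}
ppr = personalized_pagerank(GRAPH, seed="a1")
```

Each iteration shrinks the error by a factor of (1 - alpha), so a few hundred iterations are more than enough for a toy graph.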


The Problem

[Figure: the same graph as in "Modeling the Data as a Bipartite Graph": Millions of Advertisers, Billions of Queries, Hundreds of Labels.]


Other Applications

● General approach applicable to several contexts:
  ● Users, Movies, Genres: find similar users and suggest movies.
  ● Authors, Papers, Conferences: find related authors and suggest papers to read.

Semi-Formal Problem Definition

[Figure: bipartite graph of Advertisers and Queries with node A highlighted, plus a "Labels:" legend.]

Goal: Find the nodes most “similar” to A.


How to Define Similarity?

● We address the computation of several node similarity measures:
  ● Neighborhood based: Common neighbors, Jaccard Coefficient, Adamic-Adar.
  ● Paths based: Katz.
  ● Random Walk based: Personalized PageRank.

● Experimental question: which measure is useful?

● Algorithmic questions:
  ● Can it scale to huge graphs?
  ● Can we compute it in real-time?
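For reference, the three neighborhood-based measures can be written in a few lines using their standard definitions. The toy data and function names below are hypothetical; two advertisers are compared through the queries they share.

```python
import math

# Hypothetical toy data: advertiser -> set of queries it is linked to.
QUERIES = {
    "A": {"q1", "q2", "q3"},
    "B": {"q2", "q3"},
    "C": {"q3", "q4"},
}

def query_degrees(nbrs):
    """Number of advertisers adjacent to each query."""
    deg = {}
    for qs in nbrs.values():
        for q in qs:
            deg[q] = deg.get(q, 0) + 1
    return deg

def common_neighbors(nbrs, u, v):
    return len(nbrs[u] & nbrs[v])

def jaccard(nbrs, u, v):
    union = nbrs[u] | nbrs[v]
    return len(nbrs[u] & nbrs[v]) / len(union) if union else 0.0

def adamic_adar(nbrs, u, v):
    """Sum of 1/log(degree) over shared queries (a shared query has degree >= 2)."""
    deg = query_degrees(nbrs)
    return sum(1.0 / math.log(deg[q]) for q in nbrs[u] & nbrs[v])

def rank(nbrs, seed, score):
    """Order all other advertisers by decreasing similarity to the seed."""
    others = [v for v in nbrs if v != seed]
    return sorted(others, key=lambda v: score(nbrs, seed, v), reverse=True)
```

For example, `rank(QUERIES, "A", jaccard)` orders the other advertisers by their Jaccard similarity to A.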


Our Contribution

● Reduce and Aggregate: general approach to induce real-time similarity rankings in multi-categorical bipartite graphs, that we apply to several similarity measures.

● Theoretical guarantees for the precision of the algorithms.

● Experimental evaluation with real world data.

Challenges

● Our graphs are too big (billions of nodes) even for very large-scale MapReduce systems.

● MapReduce is not real-time.

● We cannot pre-compute the rankings for each subset of labels.


Reduce and Aggregate

Reduce: Given the bipartite graph and a category, construct a graph with only A-side nodes that preserves the ranking on the entire graph.

Aggregate: Given a node v in A and the reduced graphs of the categories of interest, determine the ranking for v.

[Figure: three numbered panels, 1) to 3), each drawn on the nodes a, b, c.]
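The two phases can be illustrated with the simplest measure. Common-neighbor counts are additive over disjoint categories, so in this sketch Reduce stores one advertiser-only graph of shared-query counts per category and Aggregate sums the graphs of the requested categories. The data, the category names, and both function names are hypothetical; other measures, such as Personalized PageRank, need more than a plain sum.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: (advertiser, query, category); one category per query.
EDGES = [
    ("a", "q1", "red"), ("b", "q1", "red"),
    ("a", "q2", "yellow"), ("b", "q2", "yellow"), ("c", "q2", "yellow"),
    ("a", "q3", "blue"), ("c", "q3", "blue"),
]

def reduce_step(edges):
    """Precompute, per category, an advertiser-only graph of shared-query counts."""
    advertisers_of = defaultdict(set)
    category_of = {}
    for adv, query, cat in edges:
        advertisers_of[query].add(adv)
        category_of[query] = cat
    reduced = defaultdict(lambda: defaultdict(int))
    for query, advs in advertisers_of.items():
        for u, v in combinations(sorted(advs), 2):
            reduced[category_of[query]][(u, v)] += 1
    return reduced

def aggregate_step(reduced, seed, categories):
    """At query time, sum the per-category counts and rank the other advertisers."""
    scores = defaultdict(int)
    for cat in categories:
        for (u, v), count in reduced[cat].items():
            if seed == u:
                scores[v] += count
            elif seed == v:
                scores[u] += count
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

REDUCED = reduce_step(EDGES)
ranking = aggregate_step(REDUCED, "a", ["red", "yellow"])
```

The query-side nodes never appear at query time: only the small per-category graphs over advertisers are read.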

Reduce (Precomputation)

[Figure: the Advertisers-Queries graph alongside three separate sets of Precomputed Rankings.]


Aggregate (Run Time)

[Figure: two sets of Precomputed Rankings are combined into the Ranking of Red + Yellow for node A.]


Reduce for Personalized PageRank

● Markov Chain state aggregation theory (Simon and Ando, ’61; Meyer ’89, etc.).

● 750x reduction in the number of nodes while correctly preserving the PPR distribution on the entire graph.

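[Figure: nodes X and Y on Side A connected through Side B, next to the reduced graph that contains only the Side A nodes X and Y.]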
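One way to see why a Side A-only graph can preserve PPR: in a bipartite graph, every second step of the walk lands back on Side A. Writing M for the two-step transition matrix A -> B -> A, the Side A scores satisfy pi_A = alpha * e_v + (1 - alpha)^2 * pi_A * M, which is PPR on M with restart probability alpha' = 2*alpha - alpha^2, rescaled by 1 / (2 - alpha). The sketch below checks this identity numerically on a hypothetical toy graph. It is a derivation for the plain restart walk and is not claimed to be the exact construction used in the talk.

```python
def ppr(trans, seed, alpha, iters=400):
    """PPR by power iteration; trans[u][v] is the transition probability u -> v."""
    x = {u: 0.0 for u in trans}
    x[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in trans}
        nxt[seed] = alpha
        for u, mass in x.items():
            for v, p in trans[u].items():
                nxt[v] += (1.0 - alpha) * mass * p
        x = nxt
    return x

def normalize(weights):
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical weighted bipartite graph: Side A nodes X, Y, Z; Side B nodes p, q.
A_TO_B = {"X": {"p": 2.0, "q": 1.0}, "Y": {"p": 1.0}, "Z": {"q": 3.0}}

def reduce_to_side_a(a_to_b):
    """Two-step transition probabilities between Side A nodes (A -> B -> A)."""
    b_to_a = {}
    for a, nbrs in a_to_b.items():
        for b, w in nbrs.items():
            b_to_a.setdefault(b, {})[a] = w
    reduced = {a: {} for a in a_to_b}
    for a, nbrs in a_to_b.items():
        for b, p_ab in normalize(nbrs).items():
            for a2, p_ba in normalize(b_to_a[b]).items():
                reduced[a][a2] = reduced[a].get(a2, 0.0) + p_ab * p_ba
    return reduced, b_to_a

alpha = 0.15
reduced, B_TO_A = reduce_to_side_a(A_TO_B)
full = {a: normalize(n) for a, n in A_TO_B.items()}
full.update({b: normalize(n) for b, n in B_TO_A.items()})
full_scores = ppr(full, "X", alpha)
reduced_scores = ppr(reduced, "X", 2 * alpha - alpha ** 2)
```

Because the rescaling factor is the same for every Side A node, the ranking of Side A nodes is identical on the full and on the reduced graph.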


Run-time Aggregation


Koury et al. Aggregation-Disaggregation Algorithm

Step 1: Partition the Markov chain into DISJOINT subsets (A and B in the figure).

Step 2: Approximate the stationary distribution on each subset independently (π_A, π_B).

Step 3: Consider the transition between subsets (P_AA, P_AB, P_BA, P_BB).

Step 4: Aggregate the distributions (π'_A, π'_B). Repeat until convergence.
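The four steps can be sketched as a simplified iterative aggregation-disaggregation loop. This variant uses a single power-iteration step to smooth the disaggregated vector, which is an assumption made for brevity and may differ from the exact scheme of Koury et al. The transition matrix, the block choice, and all function names are hypothetical.

```python
def vec_mat(x, P):
    """Row vector times matrix."""
    return [sum(x[i] * P[i][j] for i in range(len(P))) for j in range(len(P[0]))]

def stationary(C, iters=2000):
    """Stationary vector of a small chain via lazy power iteration (avoids periodicity)."""
    x = [1.0 / len(C)] * len(C)
    for _ in range(iters):
        step = vec_mat(x, C)
        x = [0.5 * a + 0.5 * b for a, b in zip(x, step)]
    return x

def iad(P, blocks, sweeps=60):
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(sweeps):
        # Step 2: current estimate of the distribution inside each block.
        phi = []
        for blk in blocks:
            mass = sum(pi[i] for i in blk)
            phi.append({i: pi[i] / mass for i in blk})
        # Step 3: transition probabilities between blocks (coupling matrix).
        m = len(blocks)
        C = [[sum(phi[I][i] * P[i][j] for i in blocks[I] for j in blocks[J])
              for J in range(m)] for I in range(m)]
        xi = stationary(C)
        # Step 4: recombine the block distributions, then smooth with one step of P.
        z = [0.0] * n
        for I, blk in enumerate(blocks):
            for i in blk:
                z[i] = xi[I] * phi[I][i]
        pi = vec_mat(z, P)
    return pi

# Hypothetical 4-state chain with two tightly knit blocks, A = {0, 1} and B = {2, 3}.
P = [
    [0.50, 0.40, 0.06, 0.04],
    [0.30, 0.60, 0.02, 0.08],
    [0.05, 0.05, 0.20, 0.70],
    [0.01, 0.09, 0.60, 0.30],
]
BLOCKS = [[0, 1], [2, 3]]
pi = iad(P, BLOCKS)
```

The true stationary distribution is a fixed point of each sweep: if `pi` is already stationary, the coupling matrix returns the exact block masses and the smoothing step leaves the vector unchanged.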


Aggregation in PPR

Precompute the stationary distributions individually: π_A on subset A and π_B on subset B.

The two subsets are not disjoint!

[Figure: nodes X and Y, shown with subset A, with subset B, and with both subsets together.]


Our Approach

● The algorithm is based only on the reduced graphs with Advertiser-Side nodes.

● The aggregation algorithm is scalable and converges to the correct distribution.

[Figure: nodes X and Y in each of the two reduced graphs, with distributions π_A and π_B.]
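A small check of why the reduced graphs alone can suffice, assuming each query belongs to exactly one category. Under that assumption, the two-step (advertiser -> query -> advertiser) transition probabilities of the union graph are a per-advertiser weighted mix of the per-category ones, with weights given by the advertiser's share of edge weight in each category. This is not the aggregation algorithm of the talk, which works from precomputed stationary distributions; it is only a baseline with hypothetical data and names.

```python
def normalize(weights):
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def two_step(a_to_b):
    """Advertiser -> query -> advertiser transition probabilities."""
    b_to_a = {}
    for a, nbrs in a_to_b.items():
        for b, w in nbrs.items():
            b_to_a.setdefault(b, {})[a] = w
    out = {a: {} for a in a_to_b}
    for a, nbrs in a_to_b.items():
        for b, p_ab in normalize(nbrs).items():
            for a2, p_ba in normalize(b_to_a[b]).items():
                out[a][a2] = out[a].get(a2, 0.0) + p_ab * p_ba
    return out

def combine(reduced_graphs, weight_totals):
    """reduced_graphs[i][a][a2]: two-step probability within category i;
    weight_totals[i][a]: total weight of a's edges into category i."""
    nodes = list(reduced_graphs[0])
    combined = {a: {} for a in nodes}
    for a in nodes:
        share = normalize({i: w[a] for i, w in enumerate(weight_totals)})
        for i, g in enumerate(reduced_graphs):
            for a2, p in g[a].items():
                combined[a][a2] = combined[a].get(a2, 0.0) + share[i] * p
    return combined

# Hypothetical per-category edge weights (advertiser -> query -> weight).
RED = {"X": {"r1": 2.0}, "Y": {"r1": 1.0}, "Z": {}}
YELLOW = {"X": {"y1": 1.0}, "Y": {}, "Z": {"y1": 3.0}}

graphs = [RED, YELLOW]
reduced = [two_step(g) for g in graphs]
totals = [{a: sum(n.values()) for a, n in g.items()} for g in graphs]
combined = combine(reduced, totals)

# Reference: reduce the union of the two categories directly.
UNION = {a: {**RED[a], **YELLOW[a]} for a in RED}
direct = two_step(UNION)
```

Here `combined` is built from advertiser-side data only, yet it matches `direct`, the reduction of the full Red + Yellow graph. Any walk-based score computed on one is therefore the same on the other.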


Experimental Evaluation

● We experimented with publicly available and proprietary datasets:
  ● Query-Ads graph from Google AdWords: > 1.5 billion nodes, > 5 billion edges.
  ● DBLP Author-Papers and Patent Inventor-Inventions graphs.
  ● Ground-Truth clusters of competitors in Google AdWords.


Patent Graph

[Figure: Precision vs Recall on the Patent graph (x-axis: Recall, 0 to 1; y-axis: Precision, 0 to 1). Series: Inter, Jaccard, Adamic-Adar, Katz, PPR.]
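A precision vs recall curve of this kind can be computed from a ranked list and a ground-truth set as follows. This is the standard definition; the talk does not spell out its exact evaluation protocol, and the ranking and ground truth below are hypothetical.

```python
def precision_recall_curve(ranking, relevant):
    """Return one (recall, precision) point after each position of a ranked list."""
    points, hits = [], 0
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points

# Hypothetical ranking produced by a similarity measure, and its ground truth.
curve = precision_recall_curve(["b", "x", "c", "y"], {"b", "c"})
```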


Google AdWords

[Figure: Precision (y-axis) vs Recall (x-axis) on the Google AdWords graph.]


Conclusions and Future Work

● It is possible to compute several similarity scores on very large bipartite graphs in real-time with good accuracy.

● Future work could focus on the case where categories are not disjoint.


Thank you for your attention

Reduction to the Query Side

This is the larger side of the graph.

[Figure: nodes X and Y with distributions π_A and π_B.]


Convergence after One Iteration

[Figure: Kendall-Tau Correlation (y-axis: Kendall-Tau, 0 to 1) by Position (k) = 10, 20, 30, 40, 50, All. Series: DBLP, Patent, Query-Ads (cost).]
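A sketch of how a Kendall-Tau correlation at position k might be computed, assuming the comparison is restricted to the top-k items of the exact ranking. That restriction, the tau-a variant used here, and the toy scores are assumptions; the talk does not state its exact definition.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b, items):
    """Kendall tau-a between two score dicts restricted to `items` (ties count as 0)."""
    concordant = discordant = 0
    for u, v in combinations(items, 2):
        sign = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 1.0

def tau_at_k(exact, approx, k):
    """Compare the approximate scores on the top-k items of the exact ranking."""
    top = sorted(exact, key=exact.get, reverse=True)[:k]
    return kendall_tau(exact, approx, top)

# Hypothetical exact scores and scores after one aggregation iteration.
EXACT = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
APPROX = {"a": 0.38, "b": 0.31, "c": 0.12, "d": 0.19}
```

A value of 1 means the two rankings order every pair of the compared items identically.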


Convergence

[Figure: Approximation Error vs # Iterations (x-axis: Iterations, 0 to 20; y-axis: 1 - Cosine Similarity, ticks from 1e-06 to 0.001). Series: DBLP (1 - Cosine), Patent (1 - Cosine).]
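The error measure on the y-axis, 1 - cosine similarity between an approximate and an exact score vector, can be computed as below. The function name and the example vectors are hypothetical.

```python
import math

def one_minus_cosine(x, y):
    """1 - cos(x, y); 0 when the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm

# Hypothetical exact and approximate score vectors over the same four nodes.
error = one_minus_cosine([0.4, 0.3, 0.2, 0.1], [0.38, 0.31, 0.12, 0.19])
```

Since cosine similarity ignores overall scale, this measure is insensitive to a constant rescaling of the scores.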

