
Fast katz-presentation

Date post: 11-Jun-2015
Upload: david-gleich
Description:
Fast Katz and Commuters: Quadrature Rules and Sparse Linear Solvers for Link Prediction Heuristics

Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches approximate all pairwise relationships simultaneously. We are interested in two narrower problems: computing the score for a single pair of nodes, and finding the top-k nodes with the best scores from a given source node. For the pairwise problem, we introduce an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph, similar to algorithms used in personalized PageRank computing. To test scalability and accuracy, we experiment with three real-world networks and find that our algorithms run in milliseconds to seconds without any preprocessing.
Transcript
Page 1: Fast katz-presentation

Tweet along @dgleich

FAST KATZ AND COMMUTERS

Quadrature Rules and Sparse Linear Solvers

for Link Prediction Heuristics

David F. Gleich

Sandia National Labs

la/opt seminar

October 14th 2010

With Pooya Esfandiar, Francesco Bonchi, Chen Greif,

Laks V. S. Lakshmanan, and Byung-Won On

David F. Gleich (Sandia) ICME la/opt seminar 1 / 50

Page 2: Fast katz-presentation

MAIN RESULTS – SLIDE ONE

$A$ – adjacency matrix

$L$ – Laplacian matrix

Katz score: $K_{i,j} = \left[ (I - \alpha A)^{-1} - I \right]_{i,j}$

Commute time: $C_{i,j} = \mathrm{vol}(G) \left( L^{\dagger}_{i,i} + L^{\dagger}_{j,j} - 2 L^{\dagger}_{i,j} \right)$

Page 3: Fast katz-presentation

MAIN RESULTS – SLIDE TWO

For Katz

Compute one $K_{i,j}$ fast

Compute top $K_{i,:}$ fast

For Commute

Compute one $C_{i,j}$ fast

For almost commute

Compute the top-k scores fast

Page 4: Fast katz-presentation

MAIN RESULTS – SLIDE THREE

Page 5: Fast katz-presentation

OUTLINE

Why study these measures?

Katz Rank and Commute Time

How else do people compute them?

Quadrature rules for pairwise scores

Sparse linear system solves for top-k

As many results as we have time for…

Page 6: Fast katz-presentation

WHY? LINK PREDICTION

Liben-Nowell and Kleinberg (2003, 2006) found that path-based link prediction was more efficient

Neighborhood-based

Path-based

Page 7: Fast katz-presentation

NOTE

All graphs are undirected

All graphs are connected

Page 8: Fast katz-presentation

LEO KATZ

Page 9: Fast katz-presentation

NOT QUITE, WIKIPEDIA

$A$: adjacency, $P$: random walk

PageRank: $(I - \alpha P)^{-1}$

Katz: $(I - \alpha A)^{-1}$

These are equivalent if $G$ has constant degree

Page 10: Fast katz-presentation

WHAT KATZ ACTUALLY SAID

Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometrika 18(1):39-43

“we assume that each link independently has the same probability of being effective” …

“we conceive a constant $\alpha$, depending on the group and the context of the particular investigation, which has the force of a probability of effectiveness of a single link. A k-step chain then, has probability $\alpha^k$ of being effective.”

“We wish to find the column sums of the matrix”

Page 11: Fast katz-presentation

A MODERN TAKE

The Katz score (node-based) is

$k = \left[ (I - \alpha A)^{-1} - I \right] e$

The Katz score (edge-based) is

$K_{i,j} = \left[ (I - \alpha A)^{-1} - I \right]_{i,j}$

Page 12: Fast katz-presentation

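RETURNING TO THE MATRIX

$K = \alpha A + \alpha^2 A^2 + \cdots = \sum_{\ell=1}^{\infty} \alpha^{\ell} A^{\ell}$

Carl Neumann: $(I - \alpha A)^{-1} = \sum_{\ell=0}^{\infty} \alpha^{\ell} A^{\ell}$

$K = (I - \alpha A)^{-1} - I$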
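A minimal sketch of the Neumann series identity, assuming the standard Katz definition $K = \sum_{\ell \ge 1} \alpha^{\ell} A^{\ell}$. It sums the series on a three-node path graph and checks the closed form by verifying $(I - \alpha A)(K + I) = I$. The graph, the value of `alpha`, and the helper names `matmul` and `katz_series` are illustrative choices, not part of the talk.

```python
def matmul(X, Y):
    """Dense matrix product for small list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def katz_series(A, alpha, terms):
    """Partial sum  sum_{l=1}^{terms} alpha^l A^l  of the Katz matrix."""
    n = len(A)
    K = [[0.0] * n for _ in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # (alpha A)^0
    aA = [[alpha * a for a in row] for row in A]
    for _ in range(terms):
        power = matmul(power, aA)  # now (alpha A)^l
        K = [[K[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return K


# Path graph 0 - 1 - 2 (illustrative); max degree is 2, so alpha < 1/2 is safe.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
alpha = 0.25
K = katz_series(A, alpha, terms=60)

# Closed-form check: (I - alpha A)(K + I) should be the identity.
n = len(A)
I_minus_aA = [[float(i == j) - alpha * A[i][j] for j in range(n)] for i in range(n)]
K_plus_I = [[K[i][j] + float(i == j) for j in range(n)] for i in range(n)]
residual = matmul(I_minus_aA, K_plus_I)
```

Truncating the loop after 3 to 5 terms gives the cheap approximation that other approaches use; the residual then equals $I - (\alpha A)^{m+1}$ for $m$ terms, which shows how the error depends on $\alpha$.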

Page 13: Fast katz-presentation

Carl Neumann

I’ve heard the Neumann series called the “von Neumann” series more than I’d like! In fact, the von Neumann kernel of a graph should be named the “Neumann” kernel!

Wikipedia page

Page 14: Fast katz-presentation

PROPERTIES OF KATZ’S MATRIX

$K$ is symmetric

$K$ exists when $\alpha < 1/\|A\|_2$

$I - \alpha A$ is sym. pos. def. when $\alpha < 1/\|A\|_2$

Note that $\alpha <$ 1/max-degree suffices

Page 15: Fast katz-presentation

COMMUTE TIME

Consider a uniform random walk on a graph $G$

$H_{i,j}$ = expected number of steps to reach node $j$ starting from node $i$ (also called the hitting time from node i to j, or the first transition time)

$C_{i,j} = H_{i,j} + H_{j,i}$

Page 16: Fast katz-presentation

SKIPPING DETAILS

$L = D - A$: graph Laplacian

$C_{i,j} = \mathrm{vol}(G) \left( L^{\dagger}_{i,i} + L^{\dagger}_{j,j} - 2 L^{\dagger}_{i,j} \right)$

$e$ (the vector of all ones) is the only null-vector
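A small sketch of the Laplacian formulation, assuming the standard identity $C_{i,j} = \mathrm{vol}(G)\,(e_i - e_j)^T L^{\dagger} (e_i - e_j)$ for a connected, undirected graph. Instead of forming the pseudoinverse, it solves $L x = e_i - e_j$ with one node "grounded" (its row and column removed); because $e_i - e_j$ is orthogonal to the null-vector, the grounded solution gives the same value. The helper names `solve` and `commute_time` and the test graph are assumptions made for illustration.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[r]] + [float(b[r])] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def commute_time(A, i, j):
    """Commute time between nodes i and j of a connected undirected graph."""
    n = len(A)
    deg = [sum(row) for row in A]
    vol = sum(deg)  # vol(G) = sum of degrees
    L = [[(deg[r] if r == c else 0) - A[r][c] for c in range(n)] for r in range(n)]
    b = [0.0] * n
    b[i] += 1.0
    b[j] -= 1.0
    keep = range(n - 1)  # ground the last node: drop its row and column
    x = solve([[L[r][c] for c in keep] for r in keep], [b[r] for r in keep]) + [0.0]
    return vol * (x[i] - x[j])


# Path graph 0 - 1 - 2: vol(G) = 4 and the end-to-end effective resistance is 2.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
c02 = commute_time(A, 0, 2)
c01 = commute_time(A, 0, 1)
```

On this path graph the hitting times are $H_{0,2} = H_{2,0} = 4$, so the result agrees with $C_{0,2} = H_{0,2} + H_{2,0} = 8$.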

Page 17: Fast katz-presentation

WHAT DO OTHER PEOPLE DO?

1) Just work with the linear algebra formulations

2) For Katz, truncate the Neumann series to a few (3-5) terms (I’m searching for this ref.)

3) Use low-rank approximations from EVD(A) or EVD(L)

4) For commute, use Johnson-Lindenstrauss inspired random sampling

5) Approximately decompose into smaller problems

Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009, Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007

Page 18: Fast katz-presentation

THE PROBLEM

All of these techniques are preprocessing based because most people’s goal is to compute all the scores.

We want to avoid preprocessing the graph.

There are a few caveats here! For example, one could solve the system instead of looking for the matrix inverse

Page 19: Fast katz-presentation

WHY NO PREPROCESSING?

The graph is constantly changing as I rate new movies.

Page 20: Fast katz-presentation

WHY NO PREPROCESSING?

Top-k predicted “links” are movies to watch!

Pairwise scores give user similarity

Page 21: Fast katz-presentation

PAIRWISE ALGORITHMS

Katz: $K_{i,j} = e_i^T (I - \alpha A)^{-1} e_j - \delta_{i,j}$

Commute: $C_{i,j} = \mathrm{vol}(G)\, (e_i - e_j)^T L^{\dagger} (e_i - e_j)$

Golub and Meurant to the rescue!

Page 22: Fast katz-presentation

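MMQ - THE BIG IDEA

Quadratic form: $z^T f(A)\, z$ (think $z^T A^{-1} z$)

Weighted sum: $\sum_{i=1}^{n} f(\lambda_i)\, \tilde{z}_i^2$ ($A$ is s.p.d., use the EVD $A = Q \Lambda Q^T$ with $\tilde{z} = Q^T z$)

Stieltjes integral: $\int_{l}^{u} f(\lambda)\, d\omega(\lambda)$ (“a tautology”)

Quadrature approximation: $\sum_{j=1}^{k} \omega_j f(\eta_j)$

Matrix equation: $\|z\|^2\, e_1^T f(T_k)\, e_1$ (Lanczos)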

Page 23: Fast katz-presentation

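LANCZOS

Given $A$ and $z$, $k$ steps of the Lanczos method produce

$A Q_k = Q_k T_k + \beta_{k+1} q_{k+1} e_k^T$ and $Q_k^T Q_k = I$ (in exact arithmetic)

where $Q_k = [q_1, \ldots, q_k]$ with $q_1 = z / \|z\|$, and $T_k$ is symmetric tridiagonal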

Page 24: Fast katz-presentation

PRACTICAL LANCZOS

Only need to store the last 2 vectors in $Q_k$

Updating requires O(matvec) work

$Q_k$ is not orthogonal in floating-point arithmetic
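The points above can be sketched as a short Lanczos loop that keeps only the current and previous basis vectors and performs one matrix-vector product per step. This is a textbook version without reorthogonalization, written for a small dense symmetric matrix; the function name `lanczos`, the return convention (diagonal and off-diagonal of $T_k$), and the test matrix are assumptions for illustration.

```python
import math


def lanczos(A, z, k):
    """Run up to k Lanczos steps on symmetric A starting from z.

    Returns (diag, off): the diagonal and off-diagonal entries of the
    tridiagonal matrix T_k. Only two basis vectors are kept in memory.
    """
    n = len(z)
    norm = math.sqrt(sum(v * v for v in z))
    q = [v / norm for v in z]   # current Lanczos vector
    q_prev = [0.0] * n          # previous Lanczos vector
    beta = 0.0
    diag, off = [], []
    for step in range(k):
        # One matvec per step: this is the O(matvec) update cost.
        w = [sum(A[r][c] * q[c] for c in range(n)) for r in range(n)]
        a = sum(wi * qi for wi, qi in zip(w, q))
        w = [wi - a * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        diag.append(a)
        beta = math.sqrt(sum(v * v for v in w))
        if step == k - 1 or beta < 1e-14:  # finished, or invariant subspace found
            break
        off.append(beta)
        q_prev, q = q, [v / beta for v in w]
    return diag, off


# Adjacency matrix of the path graph 0 - 1 - 2, started from e_0.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
diag, off = lanczos(A, [1.0, 0.0, 0.0], 3)
```

Starting from $e_0$ on a path graph, each step reaches one more node, so here $T_3$ reproduces the adjacency matrix itself (zero diagonal, unit off-diagonals).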

Page 25: Fast katz-presentation

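MMQ PROCEDURE

Goal: bounds $\underline{b} \le b \le \overline{b}$ on $b = z^T f(A)\, z$, with $f(x) = 1/x$

Given: $A$, $z$, and eigenvalue bounds $l \le \lambda_{\min}(A)$, $\lambda_{\max}(A) \le u$

1. Run k-steps of Lanczos on $A$ starting with $z$

2. Compute $\overline{T}_k$, $T_k$ with an additional eigenvalue at $u$, set $\underline{b} = \|z\|^2\, e_1^T f(\overline{T}_k)\, e_1$ (corresponds to a Gauss-Radau rule, with u as a prescribed node)

3. Compute $\underline{T}_k$, $T_k$ with an additional eigenvalue at $l$, set $\overline{b} = \|z\|^2\, e_1^T f(\underline{T}_k)\, e_1$ (corresponds to a Gauss-Radau rule, with l as a prescribed node)

4. Output $\underline{b}$, $\overline{b}$ as lower and upper bounds on b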
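The procedure above can be sketched for $f(x) = 1/x$ as follows. This is a hedged illustration of the standard Golub and Meurant Gauss-Radau construction, not the authors' code: after $k$ Lanczos steps, the tridiagonal matrix is extended by one row whose diagonal entry is chosen so that the prescribed node becomes an eigenvalue. All function names (`lanczos`, `tridiag_solve`, `radau_estimate`) and the test problem (the Katz system $I - \alpha A$ on an 8-node path graph with Gershgorin bounds $0.5$ and $1.5$) are assumptions.

```python
import math


def lanczos(matvec, z, k):
    """k Lanczos steps; returns diag (k), off (k-1) and the next off-diagonal.

    Assumes no breakdown (every intermediate norm is nonzero).
    """
    nrm = math.sqrt(sum(v * v for v in z))
    q, q_prev, beta = [v / nrm for v in z], [0.0] * len(z), 0.0
    diag, off = [], []
    for step in range(k):
        w = matvec(q)
        a = sum(wi * qi for wi, qi in zip(w, q))
        w = [wi - a * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        diag.append(a)
        beta = math.sqrt(sum(v * v for v in w))
        if step < k - 1:
            off.append(beta)
            q_prev, q = q, [v / beta for v in w]
    return diag, off, beta


def tridiag_solve(diag, off, rhs):
    """Solve T x = rhs for symmetric tridiagonal T (Thomas algorithm, no pivoting)."""
    n = len(diag)
    d, r = list(diag), list(rhs)
    for i in range(1, n):
        m = off[i - 1] / d[i - 1]
        d[i] -= m * off[i - 1]
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - off[i] * x[i + 1]) / d[i]
    return x


def radau_estimate(diag, off, beta_k, node, znorm2):
    """Gauss-Radau estimate of z^T A^{-1} z with one prescribed quadrature node."""
    k = len(diag)
    # Solve (T_k - node*I) delta = beta_k^2 e_k; the new diagonal entry is node + delta_k.
    delta = tridiag_solve([d - node for d in diag], off, [0.0] * (k - 1) + [beta_k ** 2])
    ext_diag = diag + [node + delta[-1]]
    ext_off = off + [beta_k]
    return znorm2 * tridiag_solve(ext_diag, ext_off, [1.0] + [0.0] * k)[0]


n, alpha = 8, 0.25  # illustrative problem size and Katz parameter


def matvec(x):
    """y = (I - alpha*A) x for the path graph on n nodes."""
    y = list(x)
    for i in range(n - 1):
        y[i] -= alpha * x[i + 1]
        y[i + 1] -= alpha * x[i]
    return y


z = [1.0] + [0.0] * (n - 1)
diag, off, beta_k = lanczos(matvec, z, 3)
lower = radau_estimate(diag, off, beta_k, node=1.5, znorm2=1.0)  # node at u
upper = radau_estimate(diag, off, beta_k, node=0.5, znorm2=1.0)  # node at l
exact = tridiag_solve([1.0] * n, [-alpha] * (n - 1), z)[0]       # reference value
```

With only three Lanczos steps the two estimates already bracket the reference value tightly; increasing the step count narrows the gap further, which is what makes the gap usable as a stopping criterion.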

Page 26: Fast katz-presentation

PRACTICAL MMQ

Increase k to become more accurate

Bad eigenvalue bounds yield worse results

$\overline{T}_k$ and $\underline{T}_k$ are easy to compute

$T_k^{-1}$ is not required; we can iteratively update the LU factorization of $T_k$

Page 27: Fast katz-presentation

PRACTICAL MMQ

Page 28: Fast katz-presentation

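ONE LAST STEP FOR KATZ

Katz: $K_{i,j} = e_i^T (I - \alpha A)^{-1} e_j - \delta_{i,j}$

For $i \ne j$ this is a bilinear form, not a quadratic form, so use

$u^T f(A)\, v = \frac{1}{4} \left[ (u + v)^T f(A) (u + v) - (u - v)^T f(A) (u - v) \right]$

with $u = e_i$, $v = e_j$, and bound each quadratic form with MMQ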
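A brief sketch of that last step, assuming the usual polarization identity that rewrites a bilinear form as the difference of two quadratic forms. Each quadratic form is evaluated here with a dense solve purely as a stand-in; in the pairwise algorithm each one would instead be bracketed by quadrature bounds. The names `solve`, `quad_form_inv`, and `katz_pair` and the test graph are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[r]] + [float(b[r])] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def quad_form_inv(M, z):
    """z^T M^{-1} z via one dense solve (stand-in for quadrature bounds)."""
    x = solve(M, z)
    return sum(zi * xi for zi, xi in zip(z, x))


def katz_pair(A, alpha, i, j):
    """Katz score K_ij from two quadratic forms with (I - alpha A)^{-1}."""
    n = len(A)
    M = [[(1.0 if r == c else 0.0) - alpha * A[r][c] for c in range(n)]
         for r in range(n)]
    plus = [float(r == i) + float(r == j) for r in range(n)]   # e_i + e_j
    minus = [float(r == i) - float(r == j) for r in range(n)]  # e_i - e_j
    bilinear = 0.25 * (quad_form_inv(M, plus) - quad_form_inv(M, minus))
    return bilinear - (1.0 if i == j else 0.0)


# Path graph 0 - 1 - 2 with alpha = 0.25 (illustrative values).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
k02 = katz_pair(A, 0.25, 0, 2)
k01 = katz_pair(A, 0.25, 0, 1)
```

For this graph $\det(I - \alpha A) = 1 - 2\alpha^2 = 7/8$, which gives $K_{0,2} = 1/14$ and $K_{0,1} = 2/7$ by hand.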

Page 29: Fast katz-presentation

TOP-K ALGORITHM FOR KATZ

Approximate $x$

$(I - \alpha A)\, x = e_i$

where $e_i$ is sparse

Keep $x$ sparse too

Ideally, don’t “touch” all of $A$

Page 30: Fast katz-presentation

INSPIRATION - PAGERANK

Approximate $x$

$(I - \alpha P)\, x = (1 - \alpha)\, e_i$

where $e_i$ is sparse

Keep $x$ sparse too? YES!

Ideally, don’t “touch” all of $P$? YES!

McSherry WWW2005, Berkhin 2007, Andersen et al. FOCS2008 – Thanks to Reid Andersen for telling me McSherry did this too.

Page 31: Fast katz-presentation

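THE ALGORITHM - MCSHERRY

For $Ax = b$ (here $A$ is a general system matrix, not the adjacency matrix)

Start with the Richardson iteration

$x^{(k+1)} = x^{(k)} + \left( b - A x^{(k)} \right)$

Rewrite

$x^{(k+1)} = x^{(k)} + r^{(k)}$, where $r^{(k)} = b - A x^{(k)}$

Richardson converges if $\rho(I - A) < 1$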
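A minimal sketch of the plain Richardson iteration with unit step, applied to the Katz system $(I - \alpha A)x = e_0$ on a three-node path graph. The function name `richardson`, the iteration count, and the test graph are illustrative assumptions; for this system the iteration matrix is $\alpha A$, so convergence needs $\rho(\alpha A) < 1$.

```python
def richardson(matvec, b, iters):
    """Richardson iteration x <- x + r with r = b - M x, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)  # residual for x = 0
    for _ in range(iters):
        x = [xi + ri for xi, ri in zip(x, r)]
        r = [bi - yi for bi, yi in zip(b, matvec(x))]
    return x, r


alpha = 0.25
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2


def katz_matvec(x):
    """y = (I - alpha*A) x for the small dense adjacency matrix A."""
    n = len(x)
    return [x[i] - alpha * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


x, r = richardson(katz_matvec, [1.0, 0.0, 0.0], 80)
```

Every step adds the whole residual to the iterate, so every step costs a full matrix-vector product; the sparse variant replaces that with an update of a single component.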

Page 32: Fast katz-presentation

THE ALGORITHM

Note $b$ is sparse.

If $x^{(0)} = 0$, then $r^{(0)} = b$ is sparse.

Idea

only add one component of $r^{(k)}$ to $x^{(k)}$

Page 33: Fast katz-presentation

THE ALGORITHM

For $Ax = b$

Init: $x^{(0)} = 0$, $r^{(0)} = b$

$x^{(k+1)} = x^{(k)} + r_j^{(k)} e_j$

$r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j$

How to pick $j$?

Page 34: Fast katz-presentation

THE ALGORITHM FOR KATZ

For $(I - \alpha A)\, x = e_i$

Init: $x^{(0)} = 0$, $r^{(0)} = e_i$

$x^{(k+1)} = x^{(k)} + r_j^{(k)} e_j$

$r^{(k+1)} = r^{(k)} - r_j^{(k)} e_j + \alpha\, r_j^{(k)} A e_j$

Pick $j$ as max $r_j^{(k)}$

Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Andersen et al. FOCS2008 for more
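The update rule can be sketched with sparse dictionaries and a max-heap. This is an illustrative version under stated assumptions, not the authors' released code: the graph is an adjacency-list dict, the heap uses lazy deletion (stale entries are skipped when popped), and the stopping rule is a simple residual tolerance. The names `katz_topk`, `tol`, and `max_steps` are hypothetical.

```python
import heapq


def katz_topk(adj, alpha, source, k, tol=1e-10, max_steps=100000):
    """Approximate top-k Katz neighbors of `source` by pushing the largest residual.

    Solves (I - alpha*A) x = e_source while touching only the nodes it reaches.
    """
    x = {}                 # sparse solution
    r = {source: 1.0}      # sparse residual, r = e_source for x = 0
    heap = [(-1.0, source)]  # max-heap via negated residual values
    steps = 0
    while heap and steps < max_steps:
        neg, j = heapq.heappop(heap)
        rj = r.get(j, 0.0)
        if rj != -neg:     # stale entry: r[j] changed after this push
            continue
        if rj < tol:       # largest residual is small enough
            break
        x[j] = x.get(j, 0.0) + rj   # x <- x + r_j e_j
        r[j] = 0.0                  # r <- r - r_j e_j ...
        for u in adj[j]:            # ... + alpha r_j A e_j
            r[u] = r.get(u, 0.0) + alpha * rj
            heapq.heappush(heap, (-r[u], u))
        steps += 1
    scores = {v: s for v, s in x.items() if v != source}
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return ranked, x


# Path graph 0 - 1 - 2 as adjacency lists (illustrative).
adj = {0: [1], 1: [0, 2], 2: [1]}
top, x = katz_topk(adj, alpha=0.25, source=0, k=2)
```

Because $\alpha > 0$ and the adjacency entries are nonnegative, every residual entry stays nonnegative, so the largest entry is also the largest in magnitude. Each push touches only one node's neighbors, which is why the total work can stay below that of a single full matrix-vector product.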

Page 35: Fast katz-presentation

CONVERGENCE?

If you pick $j$ as the maximum element, we can show this is convergent if Richardson converges. This proof requires the system matrix to be symmetric positive definite.

Page 36: Fast katz-presentation

RESULTS - DATA

All unweighted, connected graphs

Page 37: Fast katz-presentation

RESULTS – KATZ ALPHAS

Easy $\alpha$

Hard $\alpha$

Page 38: Fast katz-presentation

PAIRWISE RESULTS

Katz upper and lower bounds

Katz error convergence

Commute-time upper and lower bounds

Commute-time error convergence

For the arXiv graph here

Page 39: Fast katz-presentation

KATZ BOUND CONVERGENCE

Page 40: Fast katz-presentation

KATZ ERROR CONVERGENCE

Page 41: Fast katz-presentation

COMMUTE BOUND CONVERG.

Page 42: Fast katz-presentation

COMMUTE ERROR CONVERG.

Page 43: Fast katz-presentation

TOP-K RESULTS

Katz set convergence

Katz order convergence

For arXiv graph

Page 44: Fast katz-presentation

KATZ SET CONVERGENCE

Page 45: Fast katz-presentation

KATZ ORDER CONVERGENCE

Page 46: Fast katz-presentation

CONCLUSIONS

These algorithms are faster than many alternatives.

For pairwise commute, stopping criteria are simpler

For top-k, we often need less than 1 matvec for good enough results

Page 47: Fast katz-presentation

WARTS

Stopping criteria on our top-k algorithm can be a bit hairy

The top-k approach doesn’t work right for commute time

Page 48: Fast katz-presentation

TODO

Try on Netflix data!

Explore our “almost commute” measure more

Page 49: Fast katz-presentation

F-MEASURE

Page 50: Fast katz-presentation

By AngryDogDesign on DeviantArt

Preprint available by request

Slides should be online soon

Code is online already

stanford.edu/~dgleich/publications/2010/codes/fast-katz

