Date post: 13-Dec-2014
Upload: jonathan
Extrapolation Methods for Accelerating PageRank Computations Sepandar D. Kamvar Taher H. Haveliwala Christopher D. Manning Gene H. Golub Stanford University
Page 1: mm

Extrapolation Methods for Accelerating PageRank Computations

Sepandar D. Kamvar

Taher H. Haveliwala

Christopher D. Manning

Gene H. Golub

Stanford University

Page 2: mm

Motivation

Problem: Speed up PageRank.

Motivation: Personalization, “Freshness”.

Note: PageRank computations don’t get faster as computers do.

[Figure: two result lists for the search "Giants"; one ranks "The Official Site of the San Francisco Giants" first, the other ranks "The Official Site of the New York Giants" first.]

Page 3: mm

Outline

Definition of PageRank

Computation of PageRank

Convergence Properties

Outline of Our Approach

Empirical Results

[Thumbnails: example graph with ranks 0.4, 0.2, 0.4; the iteration "Repeat: $x^{(k+1)} = Ax^{(k)}$"; components along $u_1, \dots, u_5$.]

Page 4: mm

Link Counts

[Figure: two home pages, Sep’s Home Page and Taher’s Home Page, each with two inlinks. One is linked by 2 important pages, the other by 2 unimportant pages. Linking pages shown: Yahoo!, CNN, DB Pub Server, CS361.]

Page 5: mm

Definition of PageRank

The importance of a page is given by the importance of the pages that link to it.

$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$$

where $x_i$ is the importance of page $i$, $B_i$ is the set of pages $j$ that link to page $i$, $N_j$ is the number of outlinks from page $j$, and $x_j$ is the importance of page $j$.

Page 6: mm

Definition of PageRank

[Figure: example graph with nodes Yahoo!, CNN, DB Pub Server, Taher, and Sep. Link weights shown: 1/2, 1/2, 1, 1. Importance values shown: 0.1, 0.1, 0.1, 0.05, 0.25.]

Page 7: mm

PageRank Diagram

Initialize all nodes to rank $x_i^{(0)} = \frac{1}{n}$.

[Figure: three nodes, each with rank 0.333.]

Page 8: mm

PageRank Diagram

Propagate ranks across links (multiplying by link weights).

[Figure: values carried along the links: 0.167, 0.167, 0.333, 0.333.]

Page 9: mm

PageRank Diagram

$$x_i^{(1)} = \sum_{j \in B_i} \frac{1}{N_j} x_j^{(0)}$$

[Figure: node ranks after the first update: 0.333, 0.5, 0.167.]

Page 10: mm

PageRank Diagram

[Figure: values carried along the links in the next propagation: 0.167, 0.167, 0.5, 0.167.]

Page 11: mm

PageRank Diagram

$$x_i^{(2)} = \sum_{j \in B_i} \frac{1}{N_j} x_j^{(1)}$$

[Figure: node ranks after the second update: 0.5, 0.333, 0.167.]

Page 12: mm

PageRank Diagram

After a while…

$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$$

[Figure: node ranks 0.4, 0.4, 0.2.]
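The numbers in the PageRank diagrams are consistent with a three-page graph in which page a links to b and c, b links to c, and c links to a. That graph is an assumption, since the figure itself did not survive in this transcript; the page names and the `propagate` helper are hypothetical. A minimal sketch of the propagate-and-sum update under that assumption:

```python
def propagate(ranks, outlinks):
    """One update: each page j sends ranks[j] / N_j along each of its outlinks."""
    new_ranks = {page: 0.0 for page in ranks}
    for j, targets in outlinks.items():
        share = ranks[j] / len(targets)      # link weight 1/N_j times x_j
        for i in targets:
            new_ranks[i] += share            # sum over pages j that link to i
    return new_ranks


# Assumed example graph (not recoverable from the slides): a -> b, a -> c, b -> c, c -> a
outlinks = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1.0 / len(outlinks) for page in outlinks}   # x_i^(0) = 1/n

history = [ranks]
for _ in range(100):
    ranks = propagate(ranks, outlinks)
    history.append(ranks)

for k in (0, 1, 2, 100):
    print(k, {page: round(value, 3) for page, value in history[k].items()})
```

On this assumed graph the first two updates give ranks (0.333, 0.167, 0.5) and (0.5, 0.167, 0.333) for (a, b, c), and the iteration settles near (0.4, 0.2, 0.4), matching the values listed for the diagrams.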

Page 13: mm

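Computing PageRank

Initialize:

$$x_i^{(0)} = \frac{1}{n}$$

Repeat until convergence:

$$x_i^{(k+1)} = \sum_{j \in B_i} \frac{1}{N_j} x_j^{(k)}$$

where $x_i$ is the importance of page $i$, $B_i$ is the set of pages $j$ that link to page $i$, $N_j$ is the number of outlinks from page $j$, and $x_j$ is the importance of page $j$.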

Page 14: mm

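Matrix Notation

$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$$

[Figure: the sum drawn as one row of $P^T$ (entries 0, .2, 0, .3, 0, 0, .1, .4, 0, .1) multiplied by the vector $x$ to give one entry of $x$.]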

Page 15: mm

Matrix Notation

[Figure: the same row of $P^T$ (entries 0, .2, 0, .3, 0, 0, .1, .4, 0, .1) multiplied by the vector $x$.]

Find $x$ that satisfies:

$$x = P^T x$$

Page 16: mm

Power Method

Initialize:

$$x^{(0)} = \left(\frac{1}{n}, \dots, \frac{1}{n}\right)^T$$

Repeat until convergence:

$$x^{(k+1)} = P^T x^{(k)}$$

Page 17: mm

A side note

PageRank doesn’t actually use $P^T$. Instead, it uses $A = cP^T + (1-c)E^T$.

So the PageRank problem is really:

Find $x$ that satisfies: $x = Ax$

not:

Find $x$ that satisfies: $x = P^T x$

Page 18: mm

Power Method

And the algorithm is really . . .

Initialize:

$$x^{(0)} = \left(\frac{1}{n}, \dots, \frac{1}{n}\right)^T$$

Repeat until convergence:

$$x^{(k+1)} = A x^{(k)}$$
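A rough sketch of this iteration with $A = cP^T + (1-c)E^T$ follows. The slides do not define $E$ or fix a value of $c$ here, so the code assumes that $E$ spreads rank uniformly (every entry $1/n$), that every page has at least one outlink, and that $c = 0.85$; the graph and the `pagerank` function name are hypothetical.

```python
def pagerank(outlinks, c=0.85, tol=1e-10, max_iter=1000):
    """Power method for x = A x with A = c P^T + (1 - c) E^T.

    Illustrative assumptions: every page has at least one outlink,
    and E is uniform (each entry 1/n).
    """
    pages = list(outlinks)
    n = len(pages)
    x = {p: 1.0 / n for p in pages}            # x^(0) = (1/n, ..., 1/n)^T
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # (1 - c) E^T x: a uniform share, because the entries of x sum to 1
        new = {p: (1.0 - c) / n for p in pages}
        # c P^T x: page j passes c * x_j / N_j along each of its outlinks
        for j, targets in outlinks.items():
            share = c * x[j] / len(targets)
            for i in targets:
                new[i] += share
        residual = sum(abs(new[p] - x[p]) for p in pages)   # L1 change
        x = new
        if residual < tol:
            break
    return x, iteration


# Hypothetical three-page graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks, iterations = pagerank(graph)
print(iterations, ranks)
```

Setting `c=1.0` recovers the plain $x^{(k+1)} = P^T x^{(k)}$ iteration on this graph.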

Page 19: mm

Outline

Definition of PageRank

Computation of PageRank

Convergence Properties

Outline of Our Approach

Empirical Results

[Thumbnails: example graph with ranks 0.4, 0.2, 0.4; the iteration "Repeat: $x^{(k+1)} = Ax^{(k)}$"; components along $u_1, \dots, u_5$.]

Page 20: mm

Power Method

Express $x^{(0)}$ in terms of eigenvectors of $A$:

$$x^{(0)} = u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \alpha_4 u_4 + \alpha_5 u_5$$

Page 21: mm

Power Method

$$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \alpha_3 \lambda_3 u_3 + \alpha_4 \lambda_4 u_4 + \alpha_5 \lambda_5 u_5$$

Page 22: mm

Power Method

$$x^{(2)} = u_1 + \alpha_2 \lambda_2^2 u_2 + \alpha_3 \lambda_3^2 u_3 + \alpha_4 \lambda_4^2 u_4 + \alpha_5 \lambda_5^2 u_5$$

Page 23: mm

Power Method

$$x^{(k)} = u_1 + \alpha_2 \lambda_2^k u_2 + \alpha_3 \lambda_3^k u_3 + \alpha_4 \lambda_4^k u_4 + \alpha_5 \lambda_5^k u_5$$

Page 24: mm

Power Method

$$x^{(\infty)} = u_1$$

[Figure: the component along $u_1$ is 1; the components along $u_2, \dots, u_5$ have vanished.]

Page 25: mm

Why does it work?

Imagine our $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors $u_i$:

$$A u_i = \lambda_i u_i$$

Then, you can write any $n$-dimensional vector as a linear combination of the eigenvectors of $A$:

$$x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$$

Page 26: mm

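Why does it work?

From the last slide:

$$x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$$

To get the first iterate, multiply $x^{(0)}$ by $A$:

$$
\begin{aligned}
x^{(1)} &= A x^{(0)} \\
        &= A u_1 + \alpha_2 A u_2 + \dots + \alpha_n A u_n \\
        &= \lambda_1 u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n
\end{aligned}
$$

First eigenvalue is 1, and $\lambda_2, \dots, \lambda_n$ are all less than 1 in magnitude:

$$\lambda_1 = 1; \quad 1 > |\lambda_2| \ge \dots \ge |\lambda_n|$$

Therefore:

$$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$$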

Page 27: mm

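Power Method

$$x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$$

$$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$$

$$x^{(2)} = u_1 + \alpha_2 \lambda_2^2 u_2 + \dots + \alpha_n \lambda_n^2 u_n$$

[Figure: components of $x^{(0)}$, $x^{(1)}$, and $x^{(2)}$ along $u_1, \dots, u_5$.]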

Page 28: mm

Convergence

$$x^{(k)} = u_1 + \alpha_2 \lambda_2^k u_2 + \dots + \alpha_n \lambda_n^k u_n$$

The smaller $|\lambda_2|$, the faster the convergence of the Power Method.

[Figure: components of $x^{(k)}$ along $u_1, \dots, u_5$.]
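To make the dependence on $\lambda_2$ concrete, the toy sketch below counts power-method steps on a hypothetical 2×2 column-stochastic matrix whose eigenvalues are exactly 1 and $\lambda_2$. The matrix, the starting vector, the tolerance, and the function name are all illustrative choices, not taken from the slides.

```python
def iterations_to_converge(lambda2, tol=1e-6):
    """Count power-method steps until the L1 distance to u1 drops below tol.

    Toy 2x2 column-stochastic matrix with eigenvalues 1 and lambda2;
    its eigenvectors are (1, 1) and (1, -1).
    """
    p = (1.0 - lambda2) / 2.0
    A = [[1.0 - p, p],
         [p, 1.0 - p]]
    u1 = [0.5, 0.5]                  # dominant eigenvector, entries sum to 1
    x = [1.0, 0.0]                   # start: u1 + 0.5 * (1, -1)
    k = 0
    while abs(x[0] - u1[0]) + abs(x[1] - u1[1]) >= tol:
        x = [A[0][0] * x[0] + A[0][1] * x[1],
             A[1][0] * x[0] + A[1][1] * x[1]]
        k += 1
    return k


fast = iterations_to_converge(0.5)   # small second eigenvalue
slow = iterations_to_converge(0.9)   # second eigenvalue close to 1
print(fast, slow)
```

In this example the error shrinks by a factor of $\lambda_2$ per step, so the run with $\lambda_2 = 0.9$ needs several times as many iterations as the run with $\lambda_2 = 0.5$.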

Page 29: mm

Our Approach

Estimate the components of the current iterate in the directions of the second and third eigenvectors ($u_2$ and $u_3$), and eliminate them.

[Figure: components of the iterate along $u_1, \dots, u_5$.]

Page 30: mm

Why this approach?

For traditional problems:

A is smaller, often dense.

$\lambda_2$ is often close to $\lambda_1$, making the power method slow.

In our problem:

A is huge and sparse.

More importantly, $\lambda_2$ is small.[1]

Therefore, the Power Method is actually much faster than other methods.

[1] “The Second Eigenvalue of the Google Matrix,” dbpubs.stanford.edu/pub/2003-20.

Page 31: mm

Using Successive Iterates

[Figure: the iterate $x^{(0)}$ and the dominant eigenvector $u_1$, with components along $u_1, \dots, u_5$.]

Page 32: mm

Using Successive Iterates

[Figure: the iterates $x^{(0)}$ and $x^{(1)}$ and the dominant eigenvector $u_1$, with components along $u_1, \dots, u_5$.]

Page 33: mm

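Using Successive Iterates

[Figure: the iterates $x^{(0)}$, $x^{(1)}$, and $x^{(2)}$ and the dominant eigenvector $u_1$, with components along $u_1, \dots, u_5$.]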

Page 34: mm

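Using Successive Iterates

[Figure: the iterates $x^{(0)}$, $x^{(1)}$, and $x^{(2)}$ and the dominant eigenvector $u_1$, with components along $u_1, \dots, u_5$.]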

Page 35: mm

Using Successive Iterates

[Figure: the iterates $x^{(0)}$ and $x^{(1)}$ and the extrapolated vector $x' = u_1$, with components along $u_1, \dots, u_5$.]

Page 36: mm

How do we do this?

Assume $x^{(k)}$ can be written as a linear combination of the first three eigenvectors ($u_1$, $u_2$, $u_3$) of $A$.

Compute an approximation to the $\{u_2, u_3\}$ components, and subtract it from $x^{(k)}$ to get $x^{(k)\prime}$.

Page 37: mm

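Assume

Assume that $x^{(k)}$ can be represented by the first 3 eigenvectors of $A$:

$$x^{(k)} = u_1 + \alpha_2 u_2 + \alpha_3 u_3$$

$$x^{(k+1)} = A x^{(k)} = u_1 + \alpha_2 \lambda_2 u_2 + \alpha_3 \lambda_3 u_3$$

$$x^{(k+2)} = u_1 + \alpha_2 \lambda_2^2 u_2 + \alpha_3 \lambda_3^2 u_3$$

$$x^{(k+3)} = u_1 + \alpha_2 \lambda_2^3 u_2 + \alpha_3 \lambda_3^3 u_3$$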

Page 38: mm

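Linear Combination

Let’s take some linear combination of these 3 iterates:

$$
\begin{aligned}
\beta_1 x^{(k+1)} + \beta_2 x^{(k+2)} + \beta_3 x^{(k+3)}
  ={}& \beta_1 (u_1 + \alpha_2 \lambda_2 u_2 + \alpha_3 \lambda_3 u_3) \\
  &+ \beta_2 (u_1 + \alpha_2 \lambda_2^2 u_2 + \alpha_3 \lambda_3^2 u_3) \\
  &+ \beta_3 (u_1 + \alpha_2 \lambda_2^3 u_2 + \alpha_3 \lambda_3^3 u_3)
\end{aligned}
$$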

Page 39: mm

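Rearranging Terms

We can rearrange the terms to get:

$$
\begin{aligned}
\beta_1 x^{(k+1)} + \beta_2 x^{(k+2)} + \beta_3 x^{(k+3)}
  ={}& (\beta_1 + \beta_2 + \beta_3)\, u_1 \\
  &+ (\beta_1 \alpha_2 \lambda_2 + \beta_2 \alpha_2 \lambda_2^2 + \beta_3 \alpha_2 \lambda_2^3)\, u_2 \\
  &+ (\beta_1 \alpha_3 \lambda_3 + \beta_2 \alpha_3 \lambda_3^2 + \beta_3 \alpha_3 \lambda_3^3)\, u_3
\end{aligned}
$$

Goal: Find $\beta_1, \beta_2, \beta_3$ so that the coefficients of $u_2$ and $u_3$ are 0, and the coefficient of $u_1$ is 1.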
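One way to state this goal explicitly, assuming $\alpha_2$ and $\alpha_3$ are nonzero so that they can be divided out, is as three linear conditions on the $\beta$'s:

```latex
\begin{aligned}
\beta_1 + \beta_2 + \beta_3 &= 1 \\
\beta_1 \lambda_2 + \beta_2 \lambda_2^2 + \beta_3 \lambda_2^3 &= 0 \\
\beta_1 \lambda_3 + \beta_2 \lambda_3^2 + \beta_3 \lambda_3^3 &= 0
\end{aligned}
```

Because $\lambda_2$ and $\lambda_3$ are not known in advance, the $\beta$'s cannot be read off this system directly; they have to be estimated from the iterates themselves.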

Page 40: mm

Summary

We make an assumption about the current iterate.

Solve for dominant eigenvector as a linear combination of the next three iterates.

We use a few iterations of the Power Method to “clean it up”.
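The sketch below illustrates these steps on a hypothetical three-page graph. The slides do not say how the coefficients are computed, so the least-squares fit on differences of iterates used here is one possible reconstruction under the three-eigenvector assumption, not necessarily the authors' implementation (see the paper for that). The graph, the damping value 0.85, the uniform $E$, and all function names are assumptions.

```python
def step(x, damping=0.85):
    """One step x -> A x on an assumed graph a -> b, a -> c, b -> c, c -> a,
    with A = c P^T + (1 - c) E^T and uniform E (illustrative assumptions)."""
    a, b, c = x
    teleport = (1.0 - damping) / 3.0 * (a + b + c)
    return [damping * c + teleport,
            damping * (a / 2.0) + teleport,
            damping * (a / 2.0 + b) + teleport]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def quadratic_extrapolation(x0, x1, x2, x3):
    """Estimate u1 from four successive iterates x0 -> x1 -> x2 -> x3."""
    y1 = [p - q for p, q in zip(x1, x0)]
    y2 = [p - q for p, q in zip(x2, x0)]
    y3 = [p - q for p, q in zip(x3, x0)]
    # Fit g1*y1 + g2*y2 = -y3 in the least-squares sense (2x2 normal equations).
    a11, a12, a22 = dot(y1, y1), dot(y1, y2), dot(y2, y2)
    r1, r2 = -dot(y1, y3), -dot(y2, y3)
    det = a11 * a22 - a12 * a12
    g1 = (r1 * a22 - a12 * r2) / det
    g2 = (a11 * r2 - a12 * r1) / det
    g3 = 1.0
    # Unnormalized weights on x1, x2, x3 that cancel the u2 and u3 components.
    b1, b2, b3 = g1 + g2 + g3, g2 + g3, g3
    estimate = [b1 * p + b2 * q + b3 * r for p, q, r in zip(x1, x2, x3)]
    total = sum(estimate)            # rescale so the weights sum to 1
    return [v / total for v in estimate]


x0 = [1.0 / 3.0] * 3
x1 = step(x0)
x2 = step(x1)
x3 = step(x2)
estimate = quadratic_extrapolation(x0, x1, x2, x3)

# "Clean up" with a few ordinary power-method steps.
for _ in range(3):
    estimate = step(estimate)
print(estimate)
```

With only three pages the three-eigenvector assumption holds exactly, so the extrapolated vector already agrees with the converged ranks up to rounding; on a large graph it would only be an approximation, which is why the clean-up iterations matter.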

Page 41: mm

Outline

Definition of PageRank

Computation of PageRank

Convergence Properties

Outline of Our Approach

Empirical Results

[Thumbnails: example graph with ranks 0.4, 0.2, 0.4; the iteration "Repeat: $x^{(k+1)} = Ax^{(k)}$"; components along $u_1, \dots, u_5$.]

Page 42: mm

Results

Quadratic Extrapolation speeds up convergence. Extrapolation was only used 5 times!

Page 43: mm

Results

Extrapolation dramatically speeds up convergence for high values of c (c = .99).

Page 44: mm

Take-home message

Speeds up PageRank by a fair amount, but not by enough for true Personalized PageRank.

Ideas are useful for further speedup algorithms.

Quadratic Extrapolation can be used for a whole class of problems.

Page 45: mm

The End

Paper available at http://dbpubs.stanford.edu/pub/2003-16

