Using Adaptive Methods for Updating/Downdating PageRank
Gene H. Golub, Stanford University SCCM
Joint work with Sep Kamvar, Taher Haveliwala
Slide 2: Motivation
Problem: Compute PageRank after the Web has changed slightly.
Motivation: "freshness"
Note: Since the Web is growing, PageRank computations don't get faster as computers do.
Slide 3: Outline
[Thumbnails: power-method iteration x^(k+1) = A x^(k) converging to (0.4, 0.2, 0.4); MFlops bar chart (axis 0 to 400)]
Definition of PageRank
Computation of PageRank
Convergence Properties
Outline of Our Approach
Empirical Results
Slide 4: Link Counts
[Diagram: one page linked by 2 important pages, another linked by 2 unimportant pages; example pages shown: Martin's Home Page, Gene's Home Page, Yahoo!, Iain Duff's Home Page, George W. Bush, Donald Rumsfeld]
Slide 5: Definition of PageRank
The importance of a page is given by the importance of the pages that link to it:
x_i = Σ_{j ∈ B_i} x_j / N_j
where x_i is the importance of page i, B_i is the set of pages j that link to page i, N_j is the number of outlinks from page j, and x_j is the importance of page j.
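As a minimal sketch of this recurrence (the 3-page link structure below is invented for illustration), the importance values can be computed by repeated substitution:

```python
# Sketch of the recurrence x_i = sum_{j in B_i} x_j / N_j on a made-up web.
links = {0: [1, 2], 1: [2], 2: [0]}     # page -> pages it links to
n = 3
x = [1.0 / n] * n                       # start from uniform importance
for _ in range(100):                    # iterate the recurrence
    new = [0.0] * n
    for j, outs in links.items():
        for i in outs:
            new[i] += x[j] / len(outs)  # page j spreads x_j over its N_j outlinks
    x = new
print(x)                                # approaches (0.4, 0.2, 0.4)
```

Each pass spreads every page's current importance evenly over its outlinks; the values settle at the fixed point of the recurrence.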
Slide 8: PageRank Diagram
[Diagram: ranks propagated across links, multiplying by link weights; node values 0.167, 0.167, 0.333, 0.333]
Slide 13: Matrix Notation
In matrix form, the recurrence x_i = Σ_{j ∈ B_i} x_j / N_j reads x = P^T x, where P is the link matrix with P_{ji} = 1/N_j if page j links to page i, and 0 otherwise.
[Numerical example: P^T multiplied by an importance vector]

Slide 14: Matrix Notation
Find x that satisfies: x = P^T x
[Same numerical example]
Slide 15: Eigenvalue Distribution
The matrix P^T has several eigenvalues on the unit circle. This makes power-method-like algorithms less effective.
Slide 16: Rank-1 Correction
PageRank doesn't actually use P^T. Instead, it uses A = cP^T + (1-c)E^T. E is a rank-1 matrix, and in general c = 0.85. This ensures a unique solution and fast convergence. For the matrix A, λ_2 = c.¹
¹ From "The Second Eigenvalue of the Google Matrix" (http://dbpubs.stanford.edu/pub/2003-20)
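A minimal sketch of the correction (the 3-page link matrix P is invented for illustration):

```python
import numpy as np

def google_matrix(P, c=0.85):
    """A = c*P^T + (1-c)*E^T, with E the rank-1 matrix of uniform rows."""
    n = P.shape[0]
    E = np.ones((n, n)) / n        # rank 1: every entry is 1/n
    return c * P.T + (1 - c) * E.T

# Made-up 3-page web: 0 links to 1 and 2, 1 links to 2, 2 links to 0.
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
A = google_matrix(P)

# Columns of A still sum to 1, so lambda_1 = 1 and the power method applies.
print(sorted(abs(np.linalg.eigvals(A)), reverse=True))
```

Because E is stochastic and rank 1, the cited result gives |λ_2(A)| ≤ c for any stochastic P.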
Slide 17: Outline
Definition of PageRank
Computation of PageRank
Convergence Properties
Outline of Our Approach
Empirical Results
[Thumbnails: eigenvector components u_1, ..., u_5; power-method iteration "Repeat: x^(k+1) = A x^(k)" converging to (0.4, 0.2, 0.4)]
Slide 24: Why does it work?
Imagine our n × n matrix A has n distinct eigenvectors u_i, so that A u_i = λ_i u_i.
Then you can write any n-dimensional vector as a linear combination of the eigenvectors of A:
x^(0) = u_1 + α_2 u_2 + ... + α_n u_n
[Figure: bar chart of the components of x^(0) along u_1, ..., u_5]
Slide 25: Why does it work?
From the last slide: x^(0) = u_1 + α_2 u_2 + ... + α_n u_n
To get the first iterate, multiply x^(0) by A. The first eigenvalue is λ_1 = 1, and λ_2, ..., λ_n are all less than 1 in magnitude. Therefore:
x^(1) = A x^(0) = A u_1 + α_2 A u_2 + ... + α_n A u_n
x^(1) = u_1 + α_2 λ_2 u_2 + ... + α_n λ_n u_n
Slide 26: Power Method
x^(0) = u_1 + α_2 u_2 + ... + α_n u_n
x^(1) = u_1 + α_2 λ_2 u_2 + ... + α_n λ_n u_n
x^(2) = u_1 + α_2 λ_2^2 u_2 + ... + α_n λ_n^2 u_n
[Figure: bar charts of the components along u_1, ..., u_5; the components along u_2, ..., u_5 shrink by a factor λ_i at each iteration]
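The iteration on this slide can be sketched in a few lines (the 3-by-3 column-stochastic matrix is invented for illustration):

```python
import numpy as np

# Power method sketch: components along u_2, ..., u_n decay like lambda_i^k.
A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])   # made-up column-stochastic matrix

x = np.ones(3) / 3                # x^(0): uniform start vector
for _ in range(60):
    x = A @ x                     # x^(k+1) = A x^(k); the sum of x stays 1

# Only the u_1 component survives, so x settles on the dominant eigenvector.
print(x)                          # ≈ [0.4, 0.2, 0.4]
```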
Slide 27: Outline
Definition of PageRank
Computation of PageRank
Convergence Properties
Outline of Our Approach
Empirical Results
[Thumbnails: eigenvector components u_1, ..., u_5; power-method iteration "Repeat: x^(k+1) = A x^(k)" converging to (0.4, 0.2, 0.4)]
Slide 28: Convergence
The smaller |λ_2|, the faster the convergence of the Power Method.
x^(k) = u_1 + α_2 λ_2^k u_2 + ... + α_n λ_n^k u_n
[Figure: components along u_2, ..., u_5 scaled by λ_i^k]
Slide 29: Quadratic Extrapolation (joint work with Kamvar and Haveliwala)
Estimate the components of the current iterate in the directions of the second and third eigenvectors, and eliminate them.
[Figure: components along u_1, ..., u_5]
Slide 30: Facts that work in our favor
For traditional problems: A is smaller, often dense; λ_2 is often close to λ_1, making the power method slow.
In our problem, A is huge and sparse. More importantly, λ_2 is small.¹
¹ "The Second Eigenvalue of the Google Matrix" (dbpubs.stanford.edu/pub/2003-20)
Slide 31: How do we do this?
Assume x^(k) can be written as a linear combination of the first three eigenvectors (u_1, u_2, u_3) of A.
Compute an approximation to the {u_2, u_3} components, and subtract it from x^(k) to get x^(k)'.
Slide 32: Sequence Extrapolation
A classical and important field in numerical analysis: techniques for accelerating the convergence of slowly convergent infinite series and integrals.
Slide 33: Example: the Aitken Δ²-process
Suppose A_n = A + aλ^n + r_n, where r_n = bμ^n + o(min{1, |μ|^n}), with a, b, λ, μ all nonzero and |λ| > |μ|. It can be shown that
S_n = (A_n A_{n+2} − A_{n+1}²) / (A_n − 2A_{n+1} + A_{n+2})
satisfies, as n goes to infinity,
|S_n − A| / |A_n − A| = O((|μ|/|λ|)^n) = o(1).
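A sketch of the Δ²-process on a made-up model series A_n = A + aλ^n + bμ^n (dropping the o(·) remainder; all constants below are invented):

```python
# Aitken's Delta^2 process on a model series A_n = A + a*lam^n + b*mu^n.
def aitken(a0, a1, a2):
    """Extrapolated limit estimate S_n from three consecutive terms."""
    return (a0 * a2 - a1 * a1) / (a0 - 2 * a1 + a2)

A, a, lam, b, mu = 2.0, 1.0, 0.5, 0.1, 0.2   # invented constants, |lam| > |mu|
seq = [A + a * lam**n + b * mu**n for n in range(8)]

for n in range(3):
    plain = abs(seq[n] - A)                              # error of A_n itself
    extrap = abs(aitken(seq[n], seq[n + 1], seq[n + 2]) - A)
    print(n, plain, extrap)                              # extrapolated error is far smaller
```

The extrapolated estimate is closer to the limit A than A_n itself by roughly the factor (|μ|/|λ|)^n, as the bound on the slide predicts.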
Slide 34: In other words…
Assuming a certain pattern for the series is helpful in accelerating convergence.
We can apply this component-wise in order to get a better estimate of the eigenvector.
Slide 35: Another approach
Assume x^(k) can be represented by three eigenvectors of A:
x^(k) = u_1 + α_2 u_2 + α_3 u_3
x^(k+1) = A x^(k) = u_1 + α_2 λ_2 u_2 + α_3 λ_3 u_3
x^(k+2) = u_1 + α_2 λ_2^2 u_2 + α_3 λ_3^2 u_3
x^(k+3) = u_1 + α_2 λ_2^3 u_2 + α_3 λ_3^3 u_3
Slide 36: Linear Combination
We take some linear combination of these 3 iterates:
β_1 x^(k+1) + β_2 x^(k+2) + β_3 x^(k+3)
= β_1 (u_1 + α_2 λ_2 u_2 + α_3 λ_3 u_3)
+ β_2 (u_1 + α_2 λ_2^2 u_2 + α_3 λ_3^2 u_3)
+ β_3 (u_1 + α_2 λ_2^3 u_2 + α_3 λ_3^3 u_3)
Slide 37: Rearranging Terms
We can rearrange the terms to get:
β_1 x^(k+1) + β_2 x^(k+2) + β_3 x^(k+3)
= (β_1 + β_2 + β_3) u_1
+ α_2 (β_1 λ_2 + β_2 λ_2^2 + β_3 λ_2^3) u_2
+ α_3 (β_1 λ_3 + β_2 λ_3^2 + β_3 λ_3^3) u_3
Goal: Find β_1, β_2, β_3 so that the coefficients of u_2 and u_3 are 0, and the coefficient of u_1 is 1.
Slide 40: Estimating the coefficients
Procedure 1: Set β_1 = 1 and solve the least-squares problem.
Procedure 2: Use the SVD to compute the coefficients of the characteristic polynomial.
Slide 42: Take-home message
Quadratic Extrapolation estimates the components of the current iterate in the directions of the second and third eigenvectors, and subtracts them off.
It achieves a significant speedup, and the ideas are useful for further speedup algorithms.
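The procedure can be sketched as follows. This is a reconstruction from the derivation in the preceding slides, not the authors' code: the least-squares step works on differences of iterates so that the u_1 component drops out, and the test matrix with eigenvalues 1, 0.8, 0.6 is invented.

```python
import numpy as np

def quadratic_extrapolation(x0, x1, x2, x3):
    """Estimate u_1 from four successive iterates, assuming
    x0 = u_1 + a2*u_2 + a3*u_3 and lambda_1 = 1."""
    # Differences remove the u_1 component entirely.
    y1, y2, y3 = x1 - x0, x2 - x0, x3 - x0
    # Least-squares fit g1*y1 + g2*y2 + y3 ≈ 0 recovers (up to scaling) the
    # quadratic with roots lambda_2, lambda_3, i.e. the coefficients of the
    # characteristic polynomial with the leading coefficient fixed at 1.
    g, *_ = np.linalg.lstsq(np.column_stack([y1, y2]), -y3, rcond=None)
    b1, b2, b3 = g[0] + g[1] + 1.0, g[1] + 1.0, 1.0
    x = b1 * x1 + b2 * x2 + b3 * x3   # u_2 and u_3 components cancel here
    return x / x.sum()                # renormalize so the entries sum to 1

# Invented test matrix with eigenvalues 1, 0.8, 0.6; columns of V are u_1, u_2, u_3.
V = np.array([[1.0, 1.0, 0.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
A = V @ np.diag([1.0, 0.8, 0.6]) @ np.linalg.inv(V)

x0 = V @ np.array([1.0, 0.5, 0.3])    # exactly u_1 + 0.5 u_2 + 0.3 u_3
x1 = A @ x0; x2 = A @ x1; x3 = A @ x2
print(quadratic_extrapolation(x0, x1, x2, x3))   # ≈ u_1/sum(u_1) = [0.25, 0.5, 0.25]
```

In practice the result is then cleaned up with a few ordinary power iterations, as the summary slide notes.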
Slide 43: Summary of this part
We make an assumption about the current iterate.
We solve for the dominant eigenvector as a linear combination of the next three iterates.
We use a few iterations of the Power Method to "clean it up".
Slide 44: Outline
[Thumbnail: power-method iteration x^(k+1) = A x^(k) converging to (0.4, 0.2, 0.4)]
Definition of PageRank
Computation of PageRank
Convergence Properties
Outline of Our Approach
Empirical Results
Slide 49: Updates
Use the previous vector as a start vector. The speedup is not that great. Why? The old pages converge quickly, but the new pages still take long to converge.
But if you use Adaptive PageRank, you save the computation on the old pages.
Slide 50: Outline
[Thumbnails: MFlops bar chart (axis 0 to 400); power-method iteration "Repeat: x^(k+1) = A x^(k)" converging to (0.4, 0.2, 0.4)]
Definition of PageRank
Computation of PageRank
Convergence Properties
Outline of Our Approach
Empirical Results
Slide 51: Empirical Results
[Bar chart: MFlops (0 to 400) for 3 update algorithms on the Stanford Web (n = 700,000): full PageRank, PR using last month's PR, and APR using last month's PR]
Slide 52: Take-home message
Simply not recomputing the PageRank of pages that have converged after an update speeds up PageRank by a factor of 2.
Slide 53: An Arnoldi/SVD approach (joint work with C. Greif)
Perform Arnoldi (of degree k << n) on A.
Compute the SVD of the (k+1)-by-k unreduced Hessenberg matrix, after first subtracting the augmented identity matrix from it.
Combine the columns of the Arnoldi basis Q using the null vector of H.
Use the resulting vector as the new guess for the Arnoldi procedure.
Repeat until satisfied.
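The steps above can be sketched as follows (a reconstruction under the stated assumptions, not the authors' implementation; the 10-page test matrix is randomly generated):

```python
import numpy as np

def arnoldi(A, q0, k):
    """k steps of Arnoldi: returns Q (n x (k+1)) with orthonormal columns and
    the (k+1) x k unreduced Hessenberg matrix H, so that A Q_k = Q_{k+1} H."""
    n = len(q0)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = q0 / np.linalg.norm(q0)
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

def refine(A, x, k=4):
    """One outer iteration: Arnoldi, then the SVD of H minus the augmented
    identity; the right singular vector of the smallest singular value gives
    the (approximate) null vector used to combine the Arnoldi vectors."""
    Q, H = arnoldi(A, x, k)
    I_aug = np.vstack([np.eye(k), np.zeros((1, k))])  # identity padded with a zero row
    _, _, Vt = np.linalg.svd(H - I_aug)
    x_new = Q[:, :k] @ Vt[-1]             # combine columns of the Arnoldi basis
    return x_new / x_new.sum()            # renormalize so the entries sum to 1

# Randomly generated column-stochastic test matrix (n = 10).
rng = np.random.default_rng(0)
M = rng.random((10, 10))
A = M / M.sum(axis=0)

x = np.ones(10) / 10
for _ in range(8):                        # repeat until satisfied
    x = refine(A, x)
print(np.linalg.norm(A @ x - x))          # residual shrinks toward machine precision
```

Since ||A Q_k v − Q_k v|| = ||(H − I_aug) v||, the smallest singular vector gives the best eigenvector guess for eigenvalue 1 within the Krylov subspace, which is exactly where knowing the largest eigenvalue pays off.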
Slide 54: Advantages
We *do* take advantage of knowing the largest eigenvalue (as opposed to most general-purpose eigensolver packages).
Computing the corresponding eigenvector does not rely on prohibitive inversions or decompositions. (The matrix is BIG!)
Orthogonalizing "feels right" from a numerical linear algebra point of view.
Smooth convergence behavior. Overhead is minimal.