Hui xie 591r_presentation

The Effect of New Links on Google PageRank

By Hui Xie

Apr , 07

Computing PageRank

Matrix representation

Let P be an n×n matrix and pij be the entry at the i-th row and j-th column.

If page i has k>0 outgoing links

pij = 1/k if page i has a link to page j

pij = 0 if there is no link from i to j

If page i has no outgoing links

pij = 1/n for j = 1,…,n

Google matrix

• $G = cP + (1-c)\frac{1}{n} e e^T$, where $e = (1,\dots,1)^T$

• G is a stochastic matrix: $Ge = e$

• There exists a unique column vector π such that

$\pi^T G = \pi^T, \quad \pi^T e = 1$

• $\pi^T = \frac{1-c}{n} e^T (I - cP)^{-1}$
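The definitions above can be turned into a small computation. The following is a minimal sketch, not the presenter's code: it assumes pages are numbered from 0 instead of 1, uses a hypothetical 4-page link structure, and relies on plain power iteration with c = 0.85. The helper names `hyperlink_matrix` and `pagerank` are illustrative.

```python
def hyperlink_matrix(links, n):
    """Row-stochastic P; links[i] lists the pages that page i links to."""
    P = []
    for i in range(n):
        out = sorted(set(links.get(i, [])))
        if out:
            row = [0.0] * n
            for j in out:
                row[j] = 1.0 / len(out)   # p_ij = 1/k for each linked page
        else:
            row = [1.0 / n] * n           # no outgoing links: p_ij = 1/n
        P.append(row)
    return P


def pagerank(P, c=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for pi^T G = pi^T with G = cP + (1-c)(1/n) e e^T."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # Because pi^T e = 1, pi^T G = c * pi^T P + (1 - c)/n * e^T.
        new = [c * sum(pi[i] * P[i][j] for i in range(n)) + (1 - c) / n
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical web: page 3 has no outgoing links.
links = {0: [1, 2], 1: [2], 2: [0]}
P = hyperlink_matrix(links, 4)
pi = pagerank(P)
```

The teleportation term never requires forming G explicitly, which is why the iteration only touches P.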

Discrete Time Markov Chains

• A sequence of random variables {Xn} is called a Markov chain if it has the Markov property: $P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$

• States are usually labeled {(0,)1,2,…}

• State space can be finite or infinite

Transition Probability

• Probability to jump from state i to state j: $p_{ij} = P(X_{n+1} = j \mid X_n = i)$

• Assume the chain is stationary (time-homogeneous): $p_{ij}$ does not depend on the time n

• Transition probability matrix:

P = (pij)

• Two state MC:

Side Topic: Markov Chains

• A discrete-time stochastic process is a sequence of random variables {X0, X1, …, Xn, …}, where the indices 0, 1, …, n, … are discrete points in time.

• A Markov chain is a discrete-time stochastic process defined over a finite (or countably infinite) set of states S in terms of a matrix P of transition probabilities.

• Memorylessness property: for a Markov chain, $\Pr[X_{t+1} = j \mid X_0 = i_0, X_1 = i_1, \dots, X_t = i] = \Pr[X_{t+1} = j \mid X_t = i]$

Side Topic: Markov Chains

• Let $\pi_i(t)$ be the probability of being in state i at time step t.

• Let $\pi(t) = [\pi_0(t), \pi_1(t), \dots]$ be the vector of probabilities at time t.

• For an initial probability distribution $\pi(0)$, the probabilities at time n are $\pi(n) = \pi(0) P^n$

• A probability distribution π is stationary if $\pi = \pi P$

• $P(X_{m+n} = j \mid X_m = i) = P(X_n = j \mid X_0 = i) = P^n(i,j)$
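As an illustration of π(n) = π(0) P^n, the sketch below propagates a distribution through a two-state chain and compares it with the stationary distribution. The jump probabilities `a` and `b` are hypothetical values chosen for the example, and the function names are illustrative.

```python
def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_at(dist0, P, n_steps):
    """pi(n) = pi(0) P^n, computed by repeated vector-matrix products."""
    dist = list(dist0)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# Hypothetical two-state chain: a = P(0 -> 1), b = P(1 -> 0).
a, b = 0.3, 0.1
P = [[1 - a, a],
     [b, 1 - b]]

pi_100 = distribution_at([1.0, 0.0], P, 100)

# For a two-state chain the stationary distribution solves pi = pi P directly.
stationary = [b / (a + b), a / (a + b)]
```

Starting from state 0 with certainty, the distribution after 100 steps is already indistinguishable from the stationary one, because the deviation shrinks by the factor 1 - a - b at every step.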

Absorbing Markov chain

Define a discrete-time absorbing Markov chain {Xt, t = 0,1,…} with the state space {0,1,…,n}, where transitions between the states 1,…,n are governed by the matrix cP, and the state 0 is absorbing.

The transition matrix is

$\begin{pmatrix} 1 & 0 \\ (1-c)e & cP \end{pmatrix}$

Random walk interpretation

• Walk starts at a uniformly chosen web page

• At each step, if currently at page p:

• With probability α (the parameter c above), go to a uniformly chosen outneighbor of p

• With probability 1 - α, stop
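The walk described above can be simulated directly. This is a rough sketch under stated assumptions: a hypothetical 3-page web with no dangling pages, pages numbered from 0, α = 0.85, and a fixed random seed. It counts how often each page is visited before the walk stops; with a uniform start, the share of visits per page estimates that page's PageRank.

```python
import random


def absorbing_walk(links, n, alpha, rng):
    """Run one walk until it stops; return the number of visits per page."""
    visits = [0] * n
    page = rng.randrange(n)            # uniformly chosen starting page
    while True:
        visits[page] += 1
        if rng.random() >= alpha:      # with probability 1 - alpha, stop
            return visits
        page = rng.choice(links[page])  # uniformly chosen outneighbor


# Hypothetical link structure: 0 -> {1, 2}, 1 -> 2, 2 -> 0.
links = {0: [1, 2], 1: [2], 2: [0]}
n, alpha, runs = 3, 0.85, 100_000
rng = random.Random(0)

totals = [0] * n
for _ in range(runs):
    for j, v in enumerate(absorbing_walk(links, n, alpha, rng)):
        totals[j] += v

visit_share = [t / sum(totals) for t in totals]  # estimate of PageRank
mean_length = sum(totals) / runs                 # close to 1 / (1 - alpha)
```

The walk length is geometric, so its mean should come out near 1/(1 - α), which gives a quick sanity check on the simulation.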

• Let Nj be the total number of visits to state j before absorption, including the visit at time t = 0 if X0 is j. Formally,

$N_j = \sum_{t=0}^{\infty} 1\{X_t = j\}, \quad j = 1,\dots,n.$

• Then $z_{ij} = [(I - cP)^{-1}]_{ij} = E(N_j \mid X_0 = i)$

• Let qij be the probability of reaching the state j before absorption if the initial state is i (for i = j, the probability of returning to j). Then we have

• Theorem. Let X denote a Markov chain with state space E. The total number of visits to a state j ∈ E under the condition that the chain starts in state i is given by

$P(N_j = m \mid X_0 = j) = q_{jj}^{m-1}(1 - q_{jj}), \quad m \ge 1,$

and, for i ≠ j,

$P(N_j = m \mid X_0 = i) = \begin{cases} 1 - q_{ij}, & m = 0 \\ q_{ij}\, q_{jj}^{m-1}(1 - q_{jj}), & m \ge 1 \end{cases}$

Corollary. For all i, j ∈ E, the relations $z_{ii} = (1 - q_{ii})^{-1}$ and $z_{ij} = q_{ij} z_{jj}$ (i ≠ j) hold.
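The corollary follows by taking expectations of the distributions in the theorem, using the identification of z with the expected number of visits. A short derivation, written out as a sketch:

```latex
\begin{align*}
z_{jj} = E(N_j \mid X_0 = j)
  &= \sum_{m=1}^{\infty} m\, q_{jj}^{m-1} (1 - q_{jj})
   = \frac{1 - q_{jj}}{(1 - q_{jj})^{2}}
   = \frac{1}{1 - q_{jj}}, \\
z_{ij} = E(N_j \mid X_0 = i)
  &= q_{ij} \sum_{m=1}^{\infty} m\, q_{jj}^{m-1} (1 - q_{jj})
   = q_{ij}\, z_{jj}, \qquad i \neq j.
\end{align*}
```

Both lines use the geometric-series identity $\sum_{m \ge 1} m x^{m-1} = (1-x)^{-2}$ for $|x| < 1$, which applies because $q_{jj} \le c < 1$.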

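Outgoing links from i do not affect qji for any j ≠ i.

So by changing its outgoing links, a page can control its PageRank only up to multiplication by the factor $z_{ii} = 1/(1 - q_{ii})$.

A return to page i takes at least two steps, each of which survives with probability c, so $0 \le q_{ii} \le c^2$ and $1 \le z_{ii} \le (1 - c^2)^{-1} \approx 3.6$ for c = 0.85.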
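The bounds on zii can be checked on a small example. The sketch below is illustrative only: it assumes a hypothetical 3-page web (pages numbered from 0, no self-links, no dangling pages), computes Z = (I - cP)^{-1} exactly with Python's `fractions` module, and recovers qii from zii = 1/(1 - qii). The `inverse` helper is a plain Gauss-Jordan elimination written for this example.

```python
from fractions import Fraction


def inverse(A):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


c = Fraction(85, 100)
P = [[0, 1, 0],   # page 0 -> page 1
     [1, 0, 0],   # page 1 -> page 0
     [1, 0, 0]]   # page 2 -> page 0 (no page links to page 2)
n = len(P)

A = [[Fraction(int(i == j)) - c * P[i][j] for j in range(n)] for i in range(n)]
Z = inverse(A)

# q_ii recovered from z_ii = 1 / (1 - q_ii).
q_return = [1 - 1 / Z[i][i] for i in range(n)]

# PageRank from the closed form pi^T = (1-c)/n * e^T Z.
pi = [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]
```

In this example page 0 sits in a two-step cycle and attains the upper bound (q = c², z = 400/111 ≈ 3.6), while page 2 can never be revisited and attains the lower bound (q = 0, z = 1).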

Rank-one update of Google PageRank

• Page 1 with k0 old links has k1 newly created links to pages 2,…,k1+1

• Let k = k0 + k1, and let $p_1^T$ be the first row of matrix P

• Updated hyperlink matrix, where $e_i$ is the i-th unit column vector:

$\tilde{P} = P + \frac{k_1}{k} e_1 u^T, \qquad u^T = \frac{1}{k_1} \sum_{i=2}^{k_1+1} e_i^T - p_1^T$
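A quick way to see that the update only rescales the first row and spreads the freed weight over the new targets is to apply it to a small matrix. The sketch below is an illustration with hypothetical data: pages are numbered from 0 (so index 0 is the slides' page 1), the function name `rank_one_update` is made up for the example, and exact fractions are used so the result can be compared entry by entry.

```python
from fractions import Fraction


def rank_one_update(P, k0, new_links):
    """Return P + (k1/k) e_1 u^T, where page 0 gains links to new_links."""
    n = len(P)
    k1 = len(new_links)
    k = k0 + k1
    p1 = P[0]
    # u^T = (1/k1) * sum of unit rows for the new targets, minus the old first row.
    u = [Fraction(int(j in new_links), k1) - p1[j] for j in range(n)]
    P_new = [list(row) for row in P]
    P_new[0] = [p1[j] + Fraction(k1, k) * u[j] for j in range(n)]
    return P_new


# Hypothetical 4-page web: page 0 has one old link (to page 3).
P = [[0, 0, 0, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0]]

# Page 0 adds links to pages 1 and 2, so k0 = 1, k1 = 2, k = 3.
P_new = rank_one_update(P, k0=1, new_links=[1, 2])
```

After the update the first row puts weight 1/3 on each of the three linked pages, exactly what rebuilding the row from the full link list would give, and every other row is unchanged.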

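• According to (9), the ranking of page 1 increases when

$\frac{c}{k_1} \sum_{i=2}^{k_1+1} z_{i1} > z_{11} - 1$

Since $z_{11} = 1/(1 - q_{11})$ and $z_{i1} = q_{i1} z_{11}$ for i > 1,

the above is equivalent to

$\frac{c}{k_1} \sum_{i=2}^{k_1+1} q_{i1} > q_{11}$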

Hence, page 1 increases its ranking when it refers to pages that are characterized by a high value of qi1. These must be the pages that refer to page 1 or at least belong to the same Web community. By a Web community we mean a set of Web pages that a surfer can reach from one to another in a relatively small number of steps.

• With

$\pi_j = \frac{1-c}{n} \sum_{k=1}^{n} z_{kj},$

the PageRank of page j (j ≠ 1) increases if

$\frac{c}{k_1} \sum_{i=2}^{k_1+1} z_{ij} > z_{1j}$

• If several new links are added, then the PageRank of page j might actually decrease even if this page receives one of the new links.

Such a situation occurs when most of the newly created links point to “irrelevant” pages.

• For instance, let j = 2 and assume that there is no hyperlink path from pages 3,…,k1+1 to page 2. Then zi2 is close to zero for i = 3,…,k1+1, and the PageRank of page 2 will increase only if (c/k1)z22 > z12, which is not necessarily true, especially if z12 and k1 are large.
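The scenario in this example can be reproduced numerically. The following sketch rests on several assumptions: a hypothetical 5-page web, pages numbered from 0 (index 0 is the slides' page 1 and index 1 is page 2), c = 0.85, and Z approximated by iterating Z ← I + cPZ instead of inverting I - cP. It evaluates the condition for page 2 and compares the PageRank of that page before and after the new links are added.

```python
def visits_matrix(P, c, iters=500):
    """Approximate Z = (I - cP)^{-1} by iterating Z <- I + c P Z."""
    n = len(P)
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    Z = [row[:] for row in I]
    for _ in range(iters):
        Z = [[I[i][j] + c * sum(P[i][k] * Z[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
    return Z


def pagerank_from(Z, c):
    """pi_j = (1-c)/n * sum_k z_kj."""
    n = len(Z)
    return [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]


c = 0.85
P = [[0, 0, 0, 0, 1],   # page 0 -> page 4 (old link, k0 = 1)
     [1, 0, 0, 0, 0],   # page 1 -> page 0
     [0, 0, 0, 1, 0],   # page 2 -> page 3 ("irrelevant" pages with
     [0, 0, 1, 0, 0],   # page 3 -> page 2  no path to page 1)
     [0, 1, 0, 0, 0]]   # page 4 -> page 1
new_links = [1, 2, 3]   # page 0 adds k1 = 3 links
k1 = len(new_links)

Z = visits_matrix(P, c)
j = 1
lhs = c / k1 * sum(Z[i][j] for i in new_links)
rhs = Z[0][j]

P_new = [row[:] for row in P]
P_new[0] = [0, 0.25, 0.25, 0.25, 0.25]   # page 0 now has k = 4 links

before = pagerank_from(Z, c)[j]
after = pagerank_from(visits_matrix(P_new, c), c)[j]
```

Here `lhs < rhs`, so the condition fails, and `after < before`: the page receives one of the new links yet loses PageRank, because the old link already led to it in two steps and most of the new links point into a part of the web that never leads back.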

Asymptotic analysis

• Let $\tau_j = \min\{n \in \mathbb{N} : X_n = j\}$ be the stopping time of the first visit to the state j

• Let $M_{ij} = E(\tau_j \mid X_0 = i)$ be the average time needed to reach j starting from i (mean first passage time)

• Consider a page i = 1,…,n and assume that i has links to pages i1,…,ik distinct from i. Further, let mij(c) be the mean first passage time from page i to page j for the Google transition matrix G with parameter c.

Optimal Linking Strategy

• $\pi_i = \frac{1}{m_{ii}(c)}$

• Outgoing links from i do not affect mji(c) for any j ≠ i. Thus, by linking from i to j, one can only alter k and the choice of linked pages i1,…,ik, which means that the owner of page i has very little control over its PageRank. The best that he can do is to link only to one page j* such that

$m_{j^* i}(c) = \min_j \{ m_{ji}(c) \}$

Note that (surprisingly) the PageRank of j* plays no role here.

• Theorem. The optimal linking strategy for a Web page is to have only one outgoing link pointing to a Web page with a shortest mean first passage time back to the original page.
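The theorem can be illustrated on a toy example. The sketch below makes several assumptions: a hypothetical 4-page web in which every page has outgoing links, pages numbered from 0, c = 0.85, and simple fixed-point iterations instead of a linear solver. For page 0 it tries each single outgoing link, computes the resulting PageRank, and compares the best choice with the page that has the shortest mean first passage time back to page 0.

```python
def google_matrix(links, n, c):
    """G = cP + (1-c)(1/n) e e^T; every page is assumed to have outlinks."""
    G = [[(1 - c) / n] * n for _ in range(n)]
    for i, out in links.items():
        for j in out:
            G[i][j] += c / len(out)
    return G


def pagerank(G, iters=2000):
    """Stationary distribution of G by power iteration."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    return pi


def passage_times_to(G, target, iters=5000):
    """m[j]: mean first passage time from j to target, for j != target,
    from the fixed point m_j = 1 + sum_{k != target} G[j][k] * m_k."""
    n = len(G)
    m = [0.0] * n
    for _ in range(iters):
        m = [0.0 if j == target else
             1.0 + sum(G[j][k] * m[k] for k in range(n) if k != target)
             for j in range(n)]
    return m


c, n, me = 0.85, 4, 0
others = {1: [0], 2: [1], 3: [2]}   # hypothetical links of the other pages
candidates = (1, 2, 3)

# PageRank of page 0 when its only outgoing link points to j.
rank_if_linking_to = {
    j: pagerank(google_matrix({me: [j], **others}, n, c))[me]
    for j in candidates
}

# Passage times to page 0 do not depend on page 0's own links.
G1 = google_matrix({me: [1], **others}, n, c)
m = passage_times_to(G1, me)

best_by_rank = max(rank_if_linking_to, key=rank_if_linking_to.get)
best_by_passage = min(candidates, key=lambda j: m[j])

# Mean return time of page 0 when it links to page 1; pi_0 = 1 / m_00.
return_time = 1.0 + sum(G1[me][k] * m[k] for k in range(n) if k != me)
```

Both criteria pick page 1, the only page that links straight back to page 0, and `1 / return_time` agrees with the PageRank obtained by power iteration.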

Conclusions

• Our main conclusion is that a Web page cannot significantly manipulate its PageRank by changing its outgoing links.

• Furthermore, keeping a logical hyperlink structure and linking to a relevant Web community is the most sensible and rewarding policy.

Date post:	02-Jul-2015
Category:	Technology
Upload:	ramiro