  • PageRank, HyperLex and Chu-Liu-Edmonds

    SNLP 2014

    CSE, IIT Kharagpur

    November 14, 2014


  • What is a random walk?


  • Probability Distributions

    $x_t(i)$: probability that the surfer is at node $i$ at time $t$

    $x_{t+1}(i) = \sum_j (\text{probability of being at node } j) \cdot \Pr(j \to i) = \sum_j x_t(j)\, P(j, i)$

    In matrix form: $x_{t+1} = x_t P = x_{t-1} P^2 = \ldots = x_0 P^{t+1}$

    What if the surfer keeps walking for a long time?

    Stationary Distribution: when the distribution does not change anymore, i.e. $x_{T+1} = x_T$. For well-behaved graphs, this does not depend on the start distribution.

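To make the update $x_{t+1} = x_t P$ concrete, here is a minimal numpy sketch, assuming a small hand-built row-stochastic transition matrix (the values are invented for illustration):

```python
import numpy as np

# Hypothetical 3-node transition matrix: row i holds Pr(i -> j).
# Each row sums to 1, so P is row-stochastic.
P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.5, 0.5, 0.0]])

x = np.array([1.0, 0.0, 0.0])  # surfer starts at node 0

# Apply x_{t+1} = x_t P repeatedly; for a well-behaved chain this
# converges to the stationary distribution regardless of the start.
for t in range(100):
    x = x @ P

print(x)  # approximately the stationary distribution
```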

  • What is a stationary distribution?

    Stationary distribution at a node is related to the amount of time a random walker spends visiting that node

    Probability distribution at a node can be written as $x_{t+1} = x_t P$

    For the stationary distribution (say $v_0$), we have $v_0 = v_0 P$

    This is the left eigenvector of the transition matrix

    Does a stationary distribution always exist? Is it unique? Yes, if the graph is well-behaved.

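The condition $v_0 = v_0 P$ can be checked numerically: the stationary distribution is a left eigenvector of $P$ with eigenvalue 1, which numpy can find from the right eigenvectors of $P^T$. A sketch, reusing the hypothetical matrix above:

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.5, 0.5, 0.0]])

# Left eigenvectors of P are right eigenvectors of P transposed.
eigvals, eigvecs = np.linalg.eig(P.T)
k = int(np.argmin(np.abs(eigvals - 1.0)))  # eigenvalue closest to 1
v0 = np.real(eigvecs[:, k])
v0 = v0 / v0.sum()                         # normalize into a distribution

assert np.allclose(v0 @ P, v0)             # v0 = v0 P holds
print(v0)
```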

  • Well behaved graphs

    Irreducible: there is a path from every node to every other node.


  • Well behaved graphs

    Aperiodic: the GCD of all cycle lengths is 1. This GCD is also called the period.

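Both properties can be checked directly; a small sketch using networkx, on an invented graph whose cycles have lengths 3 and 2, so its period is gcd(3, 2) = 1:

```python
import networkx as nx

# 0 -> 1 -> 2 -> 0 is a 3-cycle; 1 -> 2 -> 1 is a 2-cycle.
G = nx.DiGraph([(0, 1), (1, 2), (2, 0), (2, 1)])

print(nx.is_strongly_connected(G))  # irreducible: every node reaches every other
print(nx.is_aperiodic(G))           # aperiodic: gcd of cycle lengths is 1
```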

  • PageRank (Page and Brin, 1998)

    Basic Intuition: a webpage is important if other important pages point to it

    $v(i) = \sum_{j \to i} \frac{v(j)}{\deg_{out}(j)}$

    $v$ is the stationary distribution of the Markov chain

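The recurrence $v(i) = \sum_{j \to i} \frac{v(j)}{\deg_{out}(j)}$ is the same as running the chain whose transition matrix spreads each page's score evenly over its out-links. A sketch, assuming a hypothetical 4-page link graph:

```python
import numpy as np

# A[i, j] = 1 if page i links to page j (invented link structure).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Dividing each row by its out-degree gives P(j, i) = 1 / deg_out(j)
# for every link j -> i, so v = vP encodes the recurrence above.
P = A / A.sum(axis=1, keepdims=True)
print(P)
```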

  • Irreducibility and Aperiodicity

    How to guarantee this for a web graph?

    At any time-step the random surfer

    jumps (teleports) to any other node with probability $c$

    jumps to its direct neighbors with total probability $1 - c$

    $P' = (1 - c)P + cU$, where $U_{ij} = \frac{1}{n}\ \forall\, i, j$

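A sketch of the smoothing step, continuing with the invented link matrix from above; 0.15 is a commonly used teleport probability, not a value fixed by the slides:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)

c = 0.15                         # teleport probability
n = P.shape[0]
U = np.full((n, n), 1.0 / n)     # U_ij = 1/n for all i, j

# Every entry of P' is now positive, so the chain is irreducible
# and aperiodic by construction.
P_prime = (1 - c) * P + c * U
print(P_prime)
```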

  • Computing PageRank: The Power Method

    Start with any distribution $x_0$, e.g. uniform distribution

    Algorithm: multiply $x_0$ by increasing powers of $P$ until convergence

    After one step, $x_1 = x_0 P$; after $k$ steps, $x_k = x_0 P^k$

    Regardless of where we start, we eventually reach the steady state $v_0$

    Equation covered during summarization:

    $I_j = \sum_{k \neq j} I_k\, M_{k,j} + \frac{1}{|S|}$

    Can you verify that it has the equivalent effect?

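Putting the pieces together, a minimal power-method sketch (the link matrix and teleport constant are invented for illustration):

```python
import numpy as np

def pagerank_power(P, c=0.15, tol=1e-10, max_iter=1000):
    """Iterate x_{t+1} = x_t P' on P' = (1-c)P + cU until convergence."""
    n = P.shape[0]
    P_prime = (1 - c) * P + c * np.full((n, n), 1.0 / n)
    x = np.full(n, 1.0 / n)                 # start from the uniform distribution
    for _ in range(max_iter):
        x_next = x @ P_prime
        if np.abs(x_next - x).sum() < tol:  # distribution has stabilized
            return x_next
        x = x_next
    return x

A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(pagerank_power(A / A.sum(axis=1, keepdims=True)))
```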

  • PageRank: Example

    [Figures 1-5: a worked power-iteration example across five slides, reproduced from the Introduction to Information Retrieval slides]

  • HyperLex

    Key Idea: Word Sense Induction. Instead of using dictionary-defined senses, extract the senses from the corpus itself.

    These corpus senses or uses correspond to clusters of similar contexts for a word.


  • HyperLex

    Detecting Root Hubs: different uses of a target word form highly interconnected bundles (or high density components)

    In each high density component, one of the nodes (the hub) has a higher degree than the others.

    Step 1: Construct the co-occurrence graph, G.

    Step 2: Arrange the nodes in G in decreasing order of degree.

    Step 3: Select the node from G which has the highest frequency. This node will be the hub of the first high density component.

    Step 4: Delete this hub and all its neighbors from G.

    Step 5: Repeat Steps 3 and 4 to detect the hubs of the other high density components (see the sketch below).

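A sketch of Steps 2-5, assuming a weighted networkx co-occurrence graph; the frequency and weight thresholds HyperLex uses to decide when to stop accepting hubs are omitted here:

```python
import networkx as nx

def detect_root_hubs(G, num_hubs):
    """Repeatedly take the highest-degree node as a root hub, then delete
    it and all of its neighbors from the graph (Steps 3 and 4)."""
    G = G.copy()
    hubs = []
    while G.number_of_nodes() > 0 and len(hubs) < num_hubs:
        hub = max(G.degree, key=lambda pair: pair[1])[0]  # highest-degree node
        hubs.append(hub)
        G.remove_nodes_from([hub] + list(G.neighbors(hub)))
    return hubs
```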

  • Step 1: Constructing graph

    For each word to be disambiguated, collect a corpus consisting of paragraphs where the word occurs

    Build a co-occurrence graph such that
      - nodes correspond to the words in the text
      - two words in the same paragraph are said to co-occur and are connected with edges
      - edge weight is assigned using the relative frequency of co-occurrence:

    $w_{ij} = 1 - \max\{P(w_i \mid w_j),\, P(w_j \mid w_i)\}$, where $P(w_i \mid w_j) = \frac{freq_{ij}}{freq_j}$

    Words which always occur together receive a weight of 0; words rarely co-occurring receive a weight close to 1.

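A sketch of the weight computation, treating each paragraph as a bag of words (tokenization, stop-word filtering, and frequency cut-offs are omitted):

```python
from collections import Counter
from itertools import combinations
import networkx as nx

def build_cooccurrence_graph(paragraphs):
    """w_ij = 1 - max(P(wi|wj), P(wj|wi)), with P(wi|wj) = freq_ij / freq_j."""
    word_freq, pair_freq = Counter(), Counter()
    for para in paragraphs:
        words = sorted(set(para.split()))   # each paragraph counts a word once
        word_freq.update(words)
        pair_freq.update(combinations(words, 2))
    G = nx.Graph()
    for (wi, wj), fij in pair_freq.items():
        # Words that always co-occur get weight 0; rare co-occurrence -> near 1.
        weight = 1.0 - max(fij / word_freq[wj], fij / word_freq[wi])
        G.add_edge(wi, wj, weight=weight)
    return G

G = build_cooccurrence_graph(["the bar serves beer", "the bar of the window"])
print(G.edges(data=True))
```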

  • Steps 2-5

    Step 2: Arrange the nodes in G in decreasing order of degree.

    Step 3: Select the node from G which has the highest frequency. This node will be the hub of the first high density component.

    Step 4: Delete this hub and all its neighbors from G.

    Step 5: Repeat Steps 3 and 4 to detect the hubs of the other high density components.


  • HyperLex: Detecting Root Hubs


  • Delineating Components

    Each of the hubs is linked to the target node with edges of weight 0.

    Attach each node to the root hub closest to it.

    The distance between two nodes is measured as the smallest sum of weights of the edges on the paths linking them.

    Compute a minimum spanning tree over G, taking the target word as the root.

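A sketch of the delineation step, assuming the weighted co-occurrence graph and the detected hubs from above:

```python
import networkx as nx

def delineate_components(G, target, hubs):
    """Link each root hub to the target with a zero-weight edge, take a
    minimum spanning tree, then split it at the target: each remaining
    subtree hangs off one root hub and forms one sense component."""
    H = G.copy()
    for hub in hubs:
        H.add_edge(target, hub, weight=0.0)
    mst = nx.minimum_spanning_tree(H, weight="weight")
    mst.remove_node(target)   # cutting the root leaves one subtree per hub
    return list(nx.connected_components(mst))
```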

  • Disambiguation

    Let $W = (w_1, w_2, \ldots, w_i, \ldots, w_n)$ be a context in which $w_i$ is an instance of our target word.

    Suppose $w_i$ has $k$ hubs in its minimum spanning tree.

    A score vector $s$ is associated with each $w_j \in W\ (j \neq i)$, such that $s_k$ represents the contribution of the $k$th hub:

    $s_k = \frac{1}{1 + d(h_k, w_j)}$ if $h_k$ is an ancestor of $w_j$, and $s_k = 0$ otherwise.

    All score vectors associated with all $w_j \in W\ (j \neq i)$ are summed up. The hub which receives the maximum score is chosen as the most appropriate sense.

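A sketch of the scoring loop. `ancestors` and `distance` are assumed helpers (not defined in the slides) that answer ancestor queries and path lengths in the minimum spanning tree built earlier:

```python
def choose_sense(context_words, target_index, hubs, ancestors, distance):
    """Sum s_k = 1 / (1 + d(h_k, w_j)) over all context words w_j whose
    MST ancestors include hub h_k; return the hub with the highest score."""
    scores = [0.0] * len(hubs)
    for j, w in enumerate(context_words):
        if j == target_index:
            continue                      # skip the target word itself
        for k, h in enumerate(hubs):
            if h in ancestors(w):         # assumed helper: MST ancestors of w
                scores[k] += 1.0 / (1.0 + distance(h, w))  # assumed helper
    return max(range(len(hubs)), key=lambda k: scores[k])
```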

  • Chu-Liu-Edmonds Algorithm

    Basic Idea: starting from all possible connections, find the maximum spanning tree.


  • Constructing the graph

    Directed Graph

    For each sentence $x$, define the directed graph $G_x = (V_x, E_x)$ given by

    $V_x = \{x_0 = \text{root},\, x_1, \ldots, x_n\}$

    $E_x = \{(i, j) : i \neq j,\ (i, j) \in [0:n] \times [1:n]\}$

    $G_x$ is a graph with
      - the sentence words and the dummy root symbol as vertices,
      - a directed edge between every pair of distinct words, and
      - a directed edge from the root symbol to every word

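A sketch of the edge set: every vertex points to every word, no vertex points back into the root, and self-loops are excluded:

```python
from itertools import product

def sentence_graph(sentence):
    """Vertices: 0 (dummy root) plus word positions 1..n.
    Edges: (i, j) with i in [0:n], j in [1:n], i != j."""
    n = len(sentence)
    vertices = list(range(n + 1))
    edges = [(i, j) for i, j in product(range(n + 1), range(1, n + 1)) if i != j]
    return vertices, edges

vertices, edges = sentence_graph(["John", "saw", "Mary"])
print(edges)  # 9 arcs for "John saw Mary" (n^2 arcs in general)
```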

  • Chu-Liu-Edmonds Algorithm

    Chu-Liu-Edmonds Algorithm: each vertex in the graph greedily selects the incoming edge with the highest weight.

    If a tree results, it must be a maximum spanning tree. If not, there must be a cycle:
      - identify the cycle and contract it into a single vertex,
      - recalculate edge weights going into and out of the cycle.

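A sketch of the first two steps (greedy selection and cycle detection); contraction and the recursive call are omitted. The arc scores are chosen to be consistent with the "John saw Mary" example on the following slides (root → saw → John scores 40, root → John → saw scores 29, John → Mary is 3, saw → Mary is 30); the remaining values are filled in for illustration:

```python
def greedy_select(n, score):
    """For each non-root vertex j pick the highest-scoring incoming arc.
    score[(i, j)] is the weight of arc i -> j; vertex 0 is the root."""
    return {j: max((w, i) for (i, jj), w in score.items() if jj == j)[1]
            for j in range(1, n + 1)}

def find_cycle(parent):
    """Follow parent pointers from each vertex; revisiting a vertex on the
    current walk means a cycle. The full algorithm would contract the
    cycle, recompute arc scores, and recurse on the smaller graph."""
    for start in parent:
        seen, v = set(), start
        while v in parent:                 # the root has no parent entry
            if v in seen:
                cycle, u = [v], parent[v]
                while u != v:
                    cycle.append(u)
                    u = parent[u]
                return cycle
            seen.add(v)
            v = parent[v]
    return None

# Indices: 1 = John, 2 = saw, 3 = Mary; 0 = root.
score = {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (1, 3): 3,
         (2, 1): 30, (2, 3): 30, (3, 1): 11, (3, 2): 0}
parent = greedy_select(3, score)
print(parent)              # {1: 2, 2: 1, 3: 2}: John and saw form a cycle
print(find_cycle(parent))  # [1, 2]
```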

  • Chu-Liu-Edmonds Algorithm

    x = John saw Mary

    Build the directed graph


  • Chu-Liu-Edmonds Algorithm

    Find the highest scoring incoming arc for each vertex

    If this is a tree, then we have found the MST.


  • Chu-Liu-Edmonds Algorithm

    If not a tree, identify the cycle and contract it into a single vertex.

    Recalculate arc weights going into and out of the cycle.


  • Chu-Liu-Edmonds Algorithm

    Outgoing arc weights: equal to the max over the outgoing arcs of all vertices in the cycle

    e.g., John → Mary is 3 and saw → Mary is 30.


  • Chu-Liu-Edmonds Algorithm

    Incoming arc weights: equal to the weight of the best spanning tree that includes the head of the incoming arc and all nodes in the cycle

    root → saw → John is 40; root → John → saw is 29


  • Chu-Liu-Edmonds Algorithm

    Calling the algorithm again on the contracted graph:

    This is a tree and the MST for the contracted graph

    Go back up the recursive calls and reconstruct the final graph


  • Chu-Liu-Edmonds Algorithm

    The edge from $w_{js}$ (the contracted vertex) to Mary came from saw.

    The edge from root to $w_{js}$ represented the tree root → saw → John.
