  • PageRank, HyperLex and Chu-Liu-Edmonds

    SNLP 2014

    CSE, IIT Kharagpur

    November 14, 2014


  • What is a random walk?


  • Probability Distributions

    $x_t(i)$: probability that the surfer is at node $i$ at time $t$

    $x_{t+1}(i) = \sum_j (\text{probability of being at node } j) \cdot \Pr(j \to i) = \sum_j x_t(j)\, P(j, i)$

    In matrix form: $x_{t+1} = x_t P = x_{t-1} P^2 = \ldots = x_0 P^{t+1}$

    What if the surfer keeps walking for a long time?

    Stationary Distribution: when the distribution does not change anymore, i.e. $x_{T+1} = x_T$. For well-behaved graphs, this does not depend on the start distribution.

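To make the update $x_{t+1} = x_t P$ concrete, here is a minimal numpy sketch, assuming a small hand-built row-stochastic transition matrix (the values are invented for illustration):

```python
import numpy as np

# Hypothetical 3-node transition matrix: row i holds Pr(i -> j).
# Each row sums to 1, so P is row-stochastic.
P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.5, 0.5, 0.0]])

x = np.array([1.0, 0.0, 0.0])  # surfer starts at node 0

# Apply x_{t+1} = x_t P repeatedly; for a well-behaved chain this
# converges to the stationary distribution regardless of the start.
for t in range(100):
    x = x @ P

print(x)  # approximately the stationary distribution
```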

  • What is a stationary distribution?

    Stationary distribution at a node is related to the amount of time a random walker spends visiting that node

    Probability distribution at a node can be written as $x_{t+1} = x_t P$

    For the stationary distribution (say $v_0$), we have $v_0 = v_0 P$

    This is the left eigenvector of the transition matrix

    Does a stationary distribution always exist? Is it unique? Yes, if the graph is well-behaved.

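The condition $v_0 = v_0 P$ can be checked numerically: the stationary distribution is a left eigenvector of $P$ with eigenvalue 1, which numpy can find from the right eigenvectors of $P^T$. A sketch, reusing the hypothetical matrix above:

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.5, 0.5, 0.0]])

# Left eigenvectors of P are right eigenvectors of P transposed.
eigvals, eigvecs = np.linalg.eig(P.T)
k = int(np.argmin(np.abs(eigvals - 1.0)))  # eigenvalue closest to 1
v0 = np.real(eigvecs[:, k])
v0 = v0 / v0.sum()                         # normalize into a distribution

assert np.allclose(v0 @ P, v0)             # v0 = v0 P holds
print(v0)
```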

  • Well behaved graphs

    Irreducible: there is a path from every node to every other node.


  • Well behaved graphs

    Aperiodic: the GCD of all cycle lengths is 1. This GCD is also called the period.

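Both properties can be checked directly; a small sketch using networkx, on an invented graph whose cycles have lengths 3 and 2, so its period is gcd(3, 2) = 1:

```python
import networkx as nx

# 0 -> 1 -> 2 -> 0 is a 3-cycle; 1 -> 2 -> 1 is a 2-cycle.
G = nx.DiGraph([(0, 1), (1, 2), (2, 0), (2, 1)])

print(nx.is_strongly_connected(G))  # irreducible: every node reaches every other
print(nx.is_aperiodic(G))           # aperiodic: gcd of cycle lengths is 1
```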

  • PageRank (Page and Brin, 1998)

    Basic Intuition: a webpage is important if other important pages point to it

    $v(i) = \sum_{j \to i} \frac{v(j)}{\deg_{out}(j)}$

    $v$ is the stationary distribution of the Markov chain

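The recurrence $v(i) = \sum_{j \to i} \frac{v(j)}{\deg_{out}(j)}$ is the same as running the chain whose transition matrix spreads each page's score evenly over its out-links. A sketch, assuming a hypothetical 4-page link graph:

```python
import numpy as np

# A[i, j] = 1 if page i links to page j (invented link structure).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Dividing each row by its out-degree gives P(j, i) = 1 / deg_out(j)
# for every link j -> i, so v = vP encodes the recurrence above.
P = A / A.sum(axis=1, keepdims=True)
print(P)
```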

  • Irreducibility and Aperiodicity

    How to guarantee this for a web graph?

    At any time-step the random surfer

    jumps (teleports) to any other node with probability $c$

    jumps to its direct neighbors with total probability $1 - c$

    $P' = (1 - c)P + cU$, where $U_{ij} = \frac{1}{n}\ \forall\, i, j$

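A sketch of the smoothing step, continuing with the invented link matrix from above; 0.15 is a commonly used teleport probability, not a value fixed by the slides:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)

c = 0.15                         # teleport probability
n = P.shape[0]
U = np.full((n, n), 1.0 / n)     # U_ij = 1/n for all i, j

# Every entry of P' is now positive, so the chain is irreducible
# and aperiodic by construction.
P_prime = (1 - c) * P + c * U
print(P_prime)
```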

  • Computing PageRank: The Power Method

    Start with any distribution $x_0$, e.g. uniform distribution

    Algorithm: multiply $x_0$ by increasing powers of $P$ until convergence

    After one step, $x_1 = x_0 P$; after $k$ steps, $x_k = x_0 P^k$

    Regardless of where we start, we eventually reach the steady state $v_0$

    Equation covered during summarization:

    $I_j = \sum_{k \neq j} I_k\, M_{k,j} + \frac{1}{|S|}$

    Can you verify that it has the equivalent effect?

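Putting the pieces together, a minimal power-method sketch (the link matrix and teleport constant are invented for illustration):

```python
import numpy as np

def pagerank_power(P, c=0.15, tol=1e-10, max_iter=1000):
    """Iterate x_{t+1} = x_t P' on P' = (1-c)P + cU until convergence."""
    n = P.shape[0]
    P_prime = (1 - c) * P + c * np.full((n, n), 1.0 / n)
    x = np.full(n, 1.0 / n)                 # start from the uniform distribution
    for _ in range(max_iter):
        x_next = x @ P_prime
        if np.abs(x_next - x).sum() < tol:  # distribution has stabilized
            return x_next
        x = x_next
    return x

A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(pagerank_power(A / A.sum(axis=1, keepdims=True)))
```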

  • PageRank: Example

    [Figures 1-5: a worked power-iteration example across five slides, reproduced from the Introduction to Information Retrieval slides]

  • HyperLex

    Key Idea: Word Sense Induction. Instead of using dictionary-defined senses, extract the senses from the corpus itself.

    These corpus senses or uses correspond to clusters of similar contexts for a word.


  • HyperLex

    Detecting Root Hubs: different uses of a target word form highly interconnected bundles (or high density components)

    In each high density component, one of the nodes (the hub) has a higher degree than the others.

    Step 1: Construct the co-occurrence graph, G.

    Step 2: Arrange the nodes in G in decreasing order of degree.

    Step 3: Select the node from G which has the highest frequency. This node will be the hub of the first high density component.

    Step 4: Delete this hub and all its neighbors from G.

    Step 5: Repeat Steps 3 and 4 to detect the hubs of the other high density components (see the sketch below).

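A sketch of Steps 2-5, assuming a weighted networkx co-occurrence graph; the frequency and weight thresholds HyperLex uses to decide when to stop accepting hubs are omitted here:

```python
import networkx as nx

def detect_root_hubs(G, num_hubs):
    """Repeatedly take the highest-degree node as a root hub, then delete
    it and all of its neighbors from the graph (Steps 3 and 4)."""
    G = G.copy()
    hubs = []
    while G.number_of_nodes() > 0 and len(hubs) < num_hubs:
        hub = max(G.degree, key=lambda pair: pair[1])[0]  # highest-degree node
        hubs.append(hub)
        G.remove_nodes_from([hub] + list(G.neighbors(hub)))
    return hubs
```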

  • Step 1: Constructing graph

    For each word to be disambiguated, collect a corpus consisting of paragraphs where the word occurs

    Build a co-occurrence graph such that
      - nodes correspond to the words in the text
      - two words in the same paragraph are said to co-occur and are connected with edges
      - edge weight is assigned using the relative frequency of co-occurrence:

    $w_{ij} = 1 - \max\{P(w_i \mid w_j),\, P(w_j \mid w_i)\}$, where $P(w_i \mid w_j) = \frac{freq_{ij}}{freq_j}$

    Words which always occur together receive a weight of 0; words rarely co-occurring receive a weight close to 1.

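A sketch of the weight computation, treating each paragraph as a bag of words (tokenization, stop-word filtering, and frequency cut-offs are omitted):

```python
from collections import Counter
from itertools import combinations
import networkx as nx

def build_cooccurrence_graph(paragraphs):
    """w_ij = 1 - max(P(wi|wj), P(wj|wi)), with P(wi|wj) = freq_ij / freq_j."""
    word_freq, pair_freq = Counter(), Counter()
    for para in paragraphs:
        words = sorted(set(para.split()))   # each paragraph counts a word once
        word_freq.update(words)
        pair_freq.update(combinations(words, 2))
    G = nx.Graph()
    for (wi, wj), fij in pair_freq.items():
        # Words that always co-occur get weight 0; rare co-occurrence -> near 1.
        weight = 1.0 - max(fij / word_freq[wj], fij / word_freq[wi])
        G.add_edge(wi, wj, weight=weight)
    return G

G = build_cooccurrence_graph(["the bar serves beer", "the bar of the window"])
print(G.edges(data=True))
```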

  • Steps 2-5

    Step 2: Arrange the nodes in G in decreasing order of degree.

    Step 3: Select the node from G which has the highest frequency. This node will be the hub of the first high density component.

    Step 4: Delete this hub and all its neighbors from G.

    Step 5: Repeat Steps 3 and 4 to detect the hubs of the other high density components.


  • HyperLex: Detecting Root Hubs


  • Delineating Components

    Each of the hubs is linked to the target node with edges of weight 0.

    Attach each node to the root hub closest to it.

    The distance between two nodes is measured as the smallest sum of weights of the edges on the paths linking them.

    Compute a minimum spanning tree over G, taking the target word as the root.

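A sketch of the delineation step, assuming the weighted co-occurrence graph and the detected hubs from above:

```python
import networkx as nx

def delineate_components(G, target, hubs):
    """Link each root hub to the target with a zero-weight edge, take a
    minimum spanning tree, then split it at the target: each remaining
    subtree hangs off one root hub and forms one sense component."""
    H = G.copy()
    for hub in hubs:
        H.add_edge(target, hub, weight=0.0)
    mst = nx.minimum_spanning_tree(H, weight="weight")
    mst.remove_node(target)   # cutting the root leaves one subtree per hub
    return list(nx.connected_components(mst))
```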

  • Disambiguation

    Let $W = (w_1, w_2, \ldots, w_i, \ldots, w_n)$ be a context in which $w_i$ is an instance of our target word.

    Suppose $w_i$ has $k$ hubs in its minimum spanning tree.

    A score vector $s$ is associated with each $w_j \in W\ (j \neq i)$, such that $s_k$ represents the contribution of the $k$th hub:

    $s_k = \frac{1}{1 + d(h_k, w_j)}$ if $h_k$ is an ancestor of $w_j$, and $s_k = 0$ otherwise.

    All score vectors associated with all $w_j \in W\ (j \neq i)$ are summed up. The hub which receives the maximum score is chosen as the most appropriate sense.

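A sketch of the scoring loop. `ancestors` and `distance` are assumed helpers (not defined in the slides) that answer ancestor queries and path lengths in the minimum spanning tree built earlier:

```python
def choose_sense(context_words, target_index, hubs, ancestors, distance):
    """Sum s_k = 1 / (1 + d(h_k, w_j)) over all context words w_j whose
    MST ancestors include hub h_k; return the hub with the highest score."""
    scores = [0.0] * len(hubs)
    for j, w in enumerate(context_words):
        if j == target_index:
            continue                      # skip the target word itself
        for k, h in enumerate(hubs):
            if h in ancestors(w):         # assumed helper: MST ancestors of w
                scores[k] += 1.0 / (1.0 + distance(h, w))  # assumed helper
    return max(range(len(hubs)), key=lambda k: scores[k])
```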

  • Chu-Liu-Edmonds Algorithm

    Basic Idea: starting from all possible connections, find the maximum spanning tree.


  • Constructing the graph

    Directed Graph

    For each sentence $x$, define the directed graph $G_x = (V_x, E_x)$ given by

    $V_x = \{x_0 = \text{root},\, x_1, \ldots, x_n\}$

    $E_x = \{(i, j) : i \neq j,\ (i, j) \in [0:n] \times [1:n]\}$

    $G_x$ is a graph with
      - the sentence words and the dummy root symbol as vertices,
      - a directed edge between every pair of distinct words, and
      - a directed edge from the root symbol to every word

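A sketch of the edge set: every vertex points to every word, no vertex points back into the root, and self-loops are excluded:

```python
from itertools import product

def sentence_graph(sentence):
    """Vertices: 0 (dummy root) plus word positions 1..n.
    Edges: (i, j) with i in [0:n], j in [1:n], i != j."""
    n = len(sentence)
    vertices = list(range(n + 1))
    edges = [(i, j) for i, j in product(range(n + 1), range(1, n + 1)) if i != j]
    return vertices, edges

vertices, edges = sentence_graph(["John", "saw", "Mary"])
print(edges)  # 9 arcs for "John saw Mary" (n^2 arcs in general)
```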

  • Chu-Liu-Edmonds Algorithm

    Chu-Liu-Edmonds Algorithm: each vertex in the graph greedily selects the incoming edge with the highest weight.

    If a tree results, it must be a maximum spanning tree. If not, there must be a cycle:
      - identify the cycle and contract it into a single vertex,
      - recalculate edge weights going into and out of the cycle.

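A sketch of the first two steps (greedy selection and cycle detection); contraction and the recursive call are omitted. The arc scores are chosen to be consistent with the "John saw Mary" example on the following slides (root → saw → John scores 40, root → John → saw scores 29, John → Mary is 3, saw → Mary is 30); the remaining values are filled in for illustration:

```python
def greedy_select(n, score):
    """For each non-root vertex j pick the highest-scoring incoming arc.
    score[(i, j)] is the weight of arc i -> j; vertex 0 is the root."""
    return {j: max((w, i) for (i, jj), w in score.items() if jj == j)[1]
            for j in range(1, n + 1)}

def find_cycle(parent):
    """Follow parent pointers from each vertex; revisiting a vertex on the
    current walk means a cycle. The full algorithm would contract the
    cycle, recompute arc scores, and recurse on the smaller graph."""
    for start in parent:
        seen, v = set(), start
        while v in parent:                 # the root has no parent entry
            if v in seen:
                cycle, u = [v], parent[v]
                while u != v:
                    cycle.append(u)
                    u = parent[u]
                return cycle
            seen.add(v)
            v = parent[v]
    return None

# Indices: 1 = John, 2 = saw, 3 = Mary; 0 = root.
score = {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (1, 3): 3,
         (2, 1): 30, (2, 3): 30, (3, 1): 11, (3, 2): 0}
parent = greedy_select(3, score)
print(parent)              # {1: 2, 2: 1, 3: 2}: John and saw form a cycle
print(find_cycle(parent))  # [1, 2]
```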

  • Chu-Liu-Edmonds Algorithm

    x = John saw Mary

    Build the directed graph


  • Chu-Liu-Edmonds Algorithm

    Find the highest scoring incoming arc for each vertex

    If this is a tree, then we have found the MST.


  • Chu-Liu-Edmonds Algorithm

    If not a tree, identify the cycle and contract it into a single vertex.

    Recalculate arc weights going into and out of the cycle.


  • Chu-Liu-Edmonds Algorithm

    Outgoing arc weights: equal to the max over the outgoing arcs of all vertices in the cycle

    e.g., John → Mary is 3 and saw → Mary is 30.


  • Chu-Liu-Edmonds Algorithm

    Incoming arc weights: equal to the weight of the best spanning tree that includes the head of the incoming arc and all nodes in the cycle

    root → saw → John is 40; root → John → saw is 29


  • Chu-Liu-Edmonds Algorithm

    Calling the algorithm again on the contracted graph:

    This is a tree and the MST for the contracted graph

    Go back up the recursive calls and reconstruct the final graph


  • Chu-Liu-Edmonds Algorithm

    The edge from $w_{js}$ (the contracted vertex) to Mary came from saw.

    The edge from root to $w_{js}$ represented the tree root → saw → John.
