
Theoretical Justification for Popular Link Prediction Heuristics

Date post: 23-Feb-2016
Purnamrita Sarkar (Carnegie Mellon), Deepayan Chakrabarti (Yahoo! Research), Andrew W. Moore (Google, Inc.)
Transcript

Slide 1
Theoretical Justification for Popular Link Prediction Heuristics
Purnamrita Sarkar (Carnegie Mellon), Deepayan Chakrabarti (Yahoo! Research), Andrew W. Moore (Google, Inc.)

Slide 2: Link Prediction
Which pair of nodes {i,j} should be connected? Variant: node i is given.
Friend suggestion in Facebook: should Facebook suggest Alice to Bob as a future friend?
[Figure: Bob and Alice]

Here is your current friend list; I want to suggest friends for you. This is a link prediction problem. Here are the movies you have liked; I want to suggest new movies to you. This is also a link prediction problem. There are a variety of such problems.

Slide 3: Link Prediction
Which pair of nodes {i,j} should be connected? Variant: node i is given.
[Figure: Alice, Bob and Charlie]

Movie recommendation in Netflix: should Netflix suggest this movie to Alice?

Slide 4: Link Prediction
[Figure: a graph of Paper #1, Paper #2 and words (SVM, margin, maximum, classification, large scale) with paper-has-word and paper-cites-paper edges]
Is paper #1 relevant to the query "SVM"? Relevance search in databases.

Slide 5: Link Prediction

Classifying handwritten digits: are these two digits the same? (Zhu et al., 2003)

Slide 6: Typical Approach
Link prediction problems rely on homophily: similar nodes are more likely to be connected.

Use a graph-based proximity measure between the query node q and the other nodes, then predict a link between q and the highest-ranking node that is not already connected to it.
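A minimal sketch of this ranking step, assuming the graph is stored as a dict of adjacency sets and using minus the hop count (from a breadth-first search) as the proximity score. The names `hop_distances` and `predict_link` and the toy graph are hypothetical; any other proximity measure could be passed in as the score table.

```python
from collections import deque

def hop_distances(adj, q):
    """Breadth-first search: number of hops from q to every node reachable from q."""
    dist = {q: 0}
    queue = deque([q])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def predict_link(adj, q, scores):
    """Return the highest-scoring node that is neither q nor already linked to q.

    `scores` maps node -> proximity to q (higher = closer); nodes without a
    score (e.g. unreachable ones) are never proposed.
    """
    candidates = [v for v in scores if v != q and v not in adj[q]]
    return max(candidates, key=scores.get, default=None)

# Hypothetical toy graph: an undirected path alice - bob - charlie - dave.
adj = {
    "alice": {"bob"},
    "bob": {"alice", "charlie"},
    "charlie": {"bob", "dave"},
    "dave": {"charlie"},
}
proximity = {v: -d for v, d in hop_distances(adj, "alice").items()}
print(predict_link(adj, "alice", proximity))  # charlie: the closest node not yet linked
```

Swapping the hop-count table for any other proximity score leaves `predict_link` unchanged, which is the point of the "typical approach": the heuristics differ only in how they rank candidates.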

Slide 7: Link Prediction Heuristics
Predict a link between the pair of nodes with the minimum number of hops, or with the maximum number of common neighbors (length-2 paths).

[Figure: Alice, Bob and Charlie with common friends of 8 and 1000 followers; prolific common friends give less evidence, less prolific ones give much more evidence]
The Adamic/Adar score gives more weight to low-degree common neighbors.

In general, this question is answered using heuristics. For example, predict the pair connected via the minimum number of hops, or the pair with the maximum number of common neighbors; Facebook in fact mentions the number of common neighbors in its friend suggestions. Often it is important to look at the features of the common neighbors. A very prolific common neighbor gives much less information about the similarity between two nodes, whereas a less prolific common neighbor indicates that the nodes are likely to be part of a tight niche. The Adamic/Adar score weights the more popular common neighbors less.

Slide 8: Link Prediction Heuristics
Predict a link between the pair of nodes with the minimum number of hops, with the maximum number of common neighbors (length-2 paths), with a larger Adamic/Adar score, or with more short paths (e.g., length-3 paths).
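The heuristics listed above can be written as scores on an undirected graph stored as a dict of adjacency sets. This is a sketch under that assumption: the Adamic/Adar weight used here is the usual 1/log(degree) of each shared neighbor, and a truncated Katz sum is shown as one example of an ensemble-of-short-paths measure. The function names, the decay `beta`, the cutoff `max_len` and the toy graph are hypothetical.

```python
import math

def common_neighbors(adj, i, j):
    """Number of length-2 paths i - k - j, i.e. shared neighbours."""
    return len(adj[i] & adj[j])

def adamic_adar(adj, i, j):
    """Shared neighbours weighted by 1/log(degree): prolific neighbours count less.
    A shared neighbour of two distinct nodes has degree >= 2, so log(degree) > 0."""
    return sum(1.0 / math.log(len(adj[k])) for k in adj[i] & adj[j])

def walks_of_length_3(adj, i, j):
    """Number of length-3 walks i - a - b - j (the (i, j) entry of A^3)."""
    return sum(1 for a in adj[i] for b in adj[a] if j in adj[b])

def katz(adj, i, j, beta=0.05, max_len=5):
    """Truncated Katz score: sum over l <= max_len of beta**l * (#walks of length l from i to j)."""
    walks = {i: 1}  # walks of length 0 start and end at i
    total = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for u, count in walks.items():
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0) + count
        walks = nxt
        total += beta ** length * walks.get(j, 0)
    return total

# Hypothetical graph: Alice and Bob share a prolific friend ("hub", degree 6)
# and a niche friend ("niche", degree 2); f1..f4 are the hub's other friends.
adj = {
    "alice": {"hub", "niche"},
    "bob": {"hub", "niche"},
    "hub": {"alice", "bob", "f1", "f2", "f3", "f4"},
    "niche": {"alice", "bob"},
    **{f: {"hub"} for f in ("f1", "f2", "f3", "f4")},
}
print(common_neighbors(adj, "alice", "bob"))  # 2
print(common_neighbors(adj, "alice", "f1"))   # 1 (only the hub)
print(adamic_adar(adj, "alice", "bob"), adamic_adar(adj, "alice", "f1"))
print(katz(adj, "alice", "bob"))
```

In this toy graph the niche friend contributes 1/log 2 to the Adamic/Adar score while the hub contributes only 1/log 6, matching the intuition on the slide that a less prolific common friend is stronger evidence.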

Common neighbors can be extended to the number of length-3 or length-4 paths. In fact, some measures examine the ensemble of short paths between two nodes.

Slide 9: Previous Empirical Studies*
[Figure: link prediction accuracy of Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
How do we justify these observations? Especially if the graph is sparse.

There has been a lot of work on link prediction. This figure summarizes the common trends in these empirical studies; it is not taken from an actual experiment.

Slide 10: Approach
Link prediction problems rely on homophily: similar nodes are more likely to be connected.

Different heuristics are trying to predict this underlying or latent nearness of nodes.

Easier to encode this by using a latent-space model for generating links.

Slide 11: Link Prediction: Generative Model
Raftery et al.'s model: nodes are uniformly distributed in a latent space (a unit-volume universe), and points close in this space are more likely to be connected. The problem of link prediction is to find the nearest neighbor who is not currently linked to the node, which is equivalent to inferring distances in the latent space.

We will use the latent space model proposed by Raftery et al. The model assumes that the points are uniformly distributed in some latent space, and points close in this space are more likely to be connected. The link prediction problem is then simply to find the nearest neighbor in this Euclidean space that is not yet connected. Hence, we have to infer the distances in the latent space.

Slide 12: Link Prediction: Generative Model
[Figure: linking probability plotted against distance; higher probability of linking for nearby points]
Two sources of randomness:

Point positions: uniform in D-dimensional space

Linkage probability: logistic with parameters α and r
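The two sources of randomness can be simulated directly. This sketch assumes the unit-volume universe is the unit hypercube [0, 1]^D and that the logistic link has the form P(link | d) = 1 / (1 + exp(α(d − r))), which equals 1/2 at d = r and exceeds 1/2 inside the radius. The function names and example parameter values are hypothetical.

```python
import math
import random

def link_probability(d, r, alpha):
    """Logistic link probability 1 / (1 + exp(alpha * (d - r))).

    Equals 1/2 at d == r and exceeds 1/2 for d < r; larger alpha makes the
    drop steeper, approaching the step function 1[d <= r] as alpha grows.
    """
    z = alpha * (d - r)
    if z > 700:  # math.exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def sample_graph(n, dim, r, alpha, seed=0):
    """Draw n points uniformly in [0, 1]^dim, then link each pair independently."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # Euclidean distance (Python 3.8+)
            if rng.random() < link_probability(d, r, alpha):
                edges.add((i, j))
    return points, edges

points, edges = sample_graph(n=200, dim=2, r=0.1, alpha=50)
print(len(edges), "edges")
for alpha in (5, 50, 500):  # the curve sharpens around d = r as alpha grows
    print(alpha, [round(link_probability(d, 0.1, alpha), 3) for d in (0.05, 0.1, 0.15)])
```

The last loop illustrates why the analysis later replaces the logistic by a step function: for large α the probability is already close to 1 just inside r and close to 0 just outside it.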

α, r and D are known. The radius r marks where linking becomes more likely than not; α determines the steepness.

Note that there are two sources of randomness. Point positions are uniform, and the probability of linking is a logistic function of the pairwise distance with two parameters, r and α. When a pair is within distance r, the probability of linking is higher, in this case above 1/2.

Slide 13: Problem Statement
[Figure: the generative model and the link prediction heuristics each answer "most likely neighbor of node i?" (node a, node b); compare]
A few properties: we can justify the empirical observations, and we also offer some new prediction algorithms.

Here is the problem statement.

Slide 14: Previous Empirical Studies*
[Figure: link prediction accuracy of Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
Especially if the graph is sparse.

Slide 15: Common Neighbors
$\Pr_2(i,j) = \Pr(\text{common neighbor} \mid d_{ij})$

Product of two logistic probabilities, integrated over a volume determined by $d_{ij}$. As $\alpha \to \infty$, the logistic becomes a step function, which is much easier to analyze!

The math will essentially carry over from this case to the general case, with some looseness in the bounds.

Slide 16: Common Neighbors
Everyone has the same radius r.

Empirical Bernstein; bounds on distance. $V(r)$ = volume of a ball of radius r in D dimensions.

$\eta$ = number of common neighbors; unit-volume universe.

Slide 17: Common Neighbors
OPT = node closest to i
MAX = node with the maximum number of common neighbors with i
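OPT and MAX can be compared on a graph simulated from the step-function version of the model (α → ∞). This sketch assumes D = 2, the unit square as the unit-volume universe, every node with the same radius r (so i and j are linked iff d_ij ≤ r), and only nodes not already linked to i as candidates. `opt_vs_max` is a hypothetical helper, and ties in the common-neighbor count are broken arbitrarily.

```python
import math
import random

def opt_vs_max(n, r, dim=2, seed=0):
    """Return (d_OPT, d_MAX) for query node 0 in a step-function latent-space graph."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n)]
    i = 0  # query node
    dist_i = [math.dist(pts[i], p) for p in pts]
    neighbors = [k for k in range(n) if k != i and dist_i[k] <= r]
    candidates = [j for j in range(n) if dist_i[j] > r]  # nodes not yet linked to i
    if not candidates:
        return None
    # eta[j]: common neighbours of i and j, i.e. neighbours k of i with d_kj <= r.
    eta = {j: sum(1 for k in neighbors if math.dist(pts[k], pts[j]) <= r)
           for j in candidates}
    opt = min(candidates, key=lambda j: dist_i[j])  # needs the latent distances
    best = max(candidates, key=eta.get)             # uses only the observed graph
    return dist_i[opt], dist_i[best]

for n in (200, 2000):
    print(n, opt_vs_max(n, r=0.1))
```

OPT requires the hidden positions, while MAX only counts common neighbors in the observed graph; the gap between the two returned distances is the quantity the theorem bounds.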

Theorem:

$d_{OPT} \le d_{MAX} \le d_{OPT} + 2\left[\epsilon / V(1)\right]^{1/D}$

where $\epsilon = c_1\sqrt{\mathrm{var}_N / N} + c_2/(N-1)$

D=dimensionality

The bound holds w.h.p. (with high probability): common neighbors is an asymptotically optimal heuristic as $N \to \infty$.

Slide 18: Common Neighbors: Distinct Radii
Node k has radius $r_k$.

$i \to k$ if $d_{ik} \le r_k$ (directed graph); $r_k$ captures the popularity of node k.
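The directed rule can be sketched as follows, assuming points uniform in the unit square and a hypothetical two-level choice of radii. It shows that a node with a larger radius $r_k$ collects more in-links, which is the sense in which $r_k$ captures popularity. `directed_edges` is a hypothetical helper name.

```python
import math
import random

def directed_edges(points, radii):
    """Edge i -> k whenever d(i, k) <= radii[k], i.e. i falls inside k's ball."""
    n = len(points)
    return {(i, k) for i in range(n) for k in range(n)
            if i != k and math.dist(points[i], points[k]) <= radii[k]}

rng = random.Random(0)
n = 300
points = [[rng.random(), rng.random()] for _ in range(n)]
radii = [0.05 if k < n // 2 else 0.2 for k in range(n)]  # hypothetical radii
edges = directed_edges(points, radii)

in_degree = [0] * n
for _, k in edges:
    in_degree[k] += 1
small = sum(in_degree[: n // 2]) / (n // 2)
large = sum(in_degree[n // 2:]) / (n - n // 2)
print(f"mean in-degree: radius 0.05 -> {small:.1f}, radius 0.2 -> {large:.1f}")
```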

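Slide 19
Type 1: $i \leftarrow k \rightarrow j$ (k lies within $r_i$ of i and within $r_j$ of j): $A(r_i, r_j, d_{ij})$
Type 2: $i \rightarrow k \leftarrow j$ (i and j both lie within $r_k$ of k): $A(r_k, r_k, d_{ij})$

Slide 20: Type 2 Common Neighbors
Example graph: $N_1$ nodes of radius $r_1$ and $N_2$ nodes of radius $r_2$; $r_1$ …
$\eta_1 \sim \mathrm{Bin}[N_1, A(r_1, r_1, d_{ij})]$
$\eta_2 \sim \mathrm{Bin}[N_2, A(r_2, r_2, d_{ij})]$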
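For D = 2, $A(r_a, r_b, d)$ can be read as the area of the lens where two discs of radii $r_a$ and $r_b$, with centers d apart, overlap; when the discs lie inside the unit square, that area is the chance that a uniformly placed node falls in both. The sketch below assumes that reading of A, places i and j near the center of the unit square so boundary effects vanish, and compares simulated Type 2 counts with their binomial means $N_1 A(r_1, r_1, d_{ij})$ and $N_2 A(r_2, r_2, d_{ij})$. All names and numbers are hypothetical.

```python
import math
import random

def lens_area(ra, rb, d):
    """Area of the intersection of two discs with radii ra, rb and centres d apart."""
    if d >= ra + rb:
        return 0.0
    if d <= abs(ra - rb):
        return math.pi * min(ra, rb) ** 2  # smaller disc lies inside the larger
    ca = max(-1.0, min(1.0, (d * d + ra * ra - rb * rb) / (2 * d * ra)))
    cb = max(-1.0, min(1.0, (d * d + rb * rb - ra * ra) / (2 * d * rb)))
    # Area of the kite formed by the two centres and the two crossing points.
    kite = 0.5 * math.sqrt((-d + ra + rb) * (d + ra - rb) * (d - ra + rb) * (d + ra + rb))
    return ra * ra * math.acos(ca) + rb * rb * math.acos(cb) - kite

def type2_count(n_nodes, radius, d_ij, rng):
    """Count random nodes k (radius `radius`) with i -> k and j -> k, i.e. both within `radius` of k."""
    i = (0.5 - d_ij / 2, 0.5)
    j = (0.5 + d_ij / 2, 0.5)
    count = 0
    for _ in range(n_nodes):
        k = (rng.random(), rng.random())
        if math.dist(i, k) <= radius and math.dist(j, k) <= radius:
            count += 1
    return count

rng = random.Random(0)
N1, r1 = 5000, 0.05  # hypothetical low-radius group
N2, r2 = 5000, 0.2   # hypothetical high-radius group
d_ij = 0.06
for n_k, r_k in ((N1, r1), (N2, r2)):
    mean = n_k * lens_area(r_k, r_k, d_ij)
    print(f"radius {r_k}: simulated {type2_count(n_k, r_k, d_ij, rng)}, binomial mean {mean:.1f}")
```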

