
The Link Prediction Problem for Social Networks David Liben-Nowell, MIT Jon Kleinberg, Cornell...

Date post: 13-Jan-2016

The Link Prediction Problem for Social Networks

David Liben-Nowell, MIT
Jon Kleinberg, Cornell

Saswat Mishra sxm111131


Summary

The “Link Prediction Problem”

Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future?

Based on “proximity” of nodes in a network


Introduction

Natural examples of social networks:

Nodes = people/entities
Edges = interaction/collaboration

Nodes                          Edges
Scientists in a discipline     Co-authors of a paper
Employees in a large company   Working on a project
Business leaders               Serve together on a board


Motivation

Understanding how social networks evolve

The link prediction problem: given a snapshot of a social network at time t, we seek to accurately predict the edges that will be added to the network during the interval (t, t’)


Why?

To suggest interactions or collaborations that haven’t yet been utilized within an organization

To monitor terrorist networks - to deduce possible interaction between terrorists (without direct evidence)

Used in Facebook and LinkedIn to suggest friends

Open Question: How does Facebook do it?

(friends of friends, same school, manually…)


Motivation

Co-authorship network for scientists

Scientists who are “close” in the network will have common colleagues and circles, so they are likely to collaborate

Caveat: scientists who have never collaborated might do so in the future; this is hard to predict

Goal: make that intuitive notion precise; understand which measures of “proximity” lead to accurate predictions

[Figure: example graph on nodes A, B, C, D]


Goals

Present measures of proximity

Understand relative effectiveness of network proximity measures (adapted from graph theory, CS, social sciences)

Show empirically that prediction by proximity outperforms random predictions by a factor of 40 to 50

Show empirically that subtle measures outperform more direct measures


Data and Experimental Setup

Co-authorship network (G) from “author list” of the physics e-Print arXiv (www.arxiv.org)

Took 5 such networks from 5 sections of the arXiv

[Figure: two example graphs, one on nodes A, B, C, D and one on nodes A, B, C]

Core: set of authors who have at least 3 papers in both the training and test intervals

G[1994,1996] = Gcollab = (A, Eold)
Enew = new collaborations (edges)

Training interval [1994,1996], Ktraining = 3

Test interval [1997,1999], Ktest = 3
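The setup above can be sketched in a few lines. This is only an illustration under stated assumptions: the input format (a list of `(year, authors)` tuples) and the names `coauthor_edges` and `split_network` are hypothetical, and the Core is computed from paper counts as the slide states it.

```python
from itertools import combinations

def coauthor_edges(papers, start, end):
    """Return (co-author edge set, papers-per-author counts) for years in [start, end]."""
    edges, counts = set(), {}
    for year, authors in papers:
        if start <= year <= end:
            unique = sorted(set(authors))
            for a in unique:
                counts[a] = counts.get(a, 0) + 1
            # one undirected edge per co-author pair, stored as a sorted tuple
            for u, v in combinations(unique, 2):
                edges.add((u, v))
    return edges, counts

def split_network(papers, train=(1994, 1996), test=(1997, 1999), k_train=3, k_test=3):
    """Return (E_old, E_new restricted to Core, Core)."""
    e_old, n_train = coauthor_edges(papers, *train)
    e_test, n_test = coauthor_edges(papers, *test)
    # Core: at least k_train papers in training and k_test papers in test
    core = {a for a in n_train
            if n_train[a] >= k_train and n_test.get(a, 0) >= k_test}
    # new collaborations: pairs that co-author in test but not in training
    e_new = {(u, v) for u, v in e_test - e_old if u in core and v in core}
    return e_old, e_new, core
```

Predictors are scored only on new edges between Core authors, which is why `e_new` is filtered by `core` here.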


Data


Methods for Link Prediction

Take the input graph during training period Gcollab

Pick a pair of nodes (x, y)
Assign a connection weight score(x, y)
Make a list in descending order of score

score is a measure of proximity
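The procedure above can be sketched as a generic ranking function. The toy graph, the function names, and the choice to score only pairs that are not already connected are assumptions made for illustration; any proximity measure can be passed in as `score`.

```python
from itertools import combinations

# Hypothetical toy graph as an adjacency dict (edges A-B, A-D, B-C, C-D, C-E)
toy_graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C"},
    "E": {"C"},
}

def rank_candidate_pairs(graph, score):
    """Score every not-yet-connected pair and return the pairs, best first."""
    nodes = sorted(graph)
    candidates = [(x, y) for x, y in combinations(nodes, 2) if y not in graph[x]]
    return sorted(candidates, key=lambda pair: score(graph, *pair), reverse=True)

def common_neighbors(graph, x, y):
    """Stand-in proximity measure: number of shared neighbors."""
    return len(graph[x] & graph[y])

ranked = rank_candidate_pairs(toy_graph, common_neighbors)
```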

Any ideas for measures?


Proximity Measures for Link Prediction


Graph distance & Common Neighbors

Graph distance: (negated) length of the shortest path between x and y

Example scores: (A, C) = -2, (C, D) = -2, (A, E) = -3

Common neighbors: A and C have 2 common neighbors, so they are more likely to collaborate

[Figures: two example graphs on nodes A, B, C, D, E]
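Both measures are short to write down. A minimal sketch follows, assuming an unweighted, undirected graph stored as an adjacency dict; the toy graph is hypothetical and is not the one drawn on the slide.

```python
from collections import deque

# Hypothetical toy graph (edges A-B, A-D, B-C, C-D, C-E)
toy_graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C"},
    "E": {"C"},
}

def graph_distance_score(graph, x, y):
    """Negated shortest-path length from x to y, found by breadth-first search."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return -dist[u]
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return float("-inf")  # y is unreachable from x

def common_neighbors_score(graph, x, y):
    """Number of nodes adjacent to both x and y."""
    return len(graph[x] & graph[y])
```

The distance is negated so that closer pairs get higher scores, consistent with ranking in descending order of score.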


Jaccard’s coefficient and Adamic / Adar

Jaccard’s coefficient: same as common neighbors, adjusted for degree

Adamic / Adar: weighting rarer neighbors more heavily

[Figure: example graph on nodes A, B, C, D, E]
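The formulas on this slide did not survive extraction. The sketch below assumes the standard definitions: Jaccard is the size of the intersection of the two neighbor sets divided by the size of their union, and Adamic/Adar sums 1 / log(degree of z) over the common neighbors z. The toy graph is hypothetical.

```python
import math

# Hypothetical toy graph (edges A-B, A-D, B-C, C-D, C-E)
toy_graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C"},
    "E": {"C"},
}

def jaccard_score(graph, x, y):
    """Shared neighbors divided by all neighbors of either node."""
    union = graph[x] | graph[y]
    return len(graph[x] & graph[y]) / len(union) if union else 0.0

def adamic_adar_score(graph, x, y):
    """Sum over common neighbors z of 1 / log(degree of z).

    A common neighbor of two distinct nodes has degree >= 2, so the log is positive.
    """
    return sum(1.0 / math.log(len(graph[z])) for z in graph[x] & graph[y])
```

A common neighbor with few connections contributes more to the Adamic/Adar sum than a hub does, which is the "rarer neighbors" weighting named above.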


Preferential Attachment

Probability that a new collaboration involves x is proportional to |Γ(x)|, the number of current neighbors of x

score(x, y) := |Γ(x)| · |Γ(y)|
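As a sketch, assuming the score is the product of the two nodes' current neighbor counts (the formula itself is missing from the extracted slide), with a hypothetical toy graph:

```python
# Hypothetical toy graph (edges A-B, A-D, B-C, C-D, C-E)
toy_graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C"},
    "E": {"C"},
}

def preferential_attachment_score(graph, x, y):
    """Product of the number of current neighbors of x and of y."""
    return len(graph[x]) * len(graph[y])
```

Unlike the neighborhood-overlap measures, this score ignores how x and y are positioned relative to each other and depends only on their degrees.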


Considering all paths: Katz

Katz: measure that sums over the collection of paths, exponentially damped by length (to count short paths heavily)

β is chosen to be a very small value (for dampening)

[Figure: example graph on nodes A, B, C, D, E]
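A minimal sketch of the damped sum, assuming the usual Katz form: score(x, y) is the sum over lengths l >= 1 of β^l times the number of length-l paths from x to y. Two things here are assumptions for illustration: the sum is truncated at a hypothetical `max_len` instead of being computed in closed form, and "paths" are counted as walks (nodes may repeat), as in the matrix-power formulation.

```python
# Hypothetical toy graph (edges A-B, A-D, B-C, C-D, C-E)
toy_graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C"},
    "E": {"C"},
}

def katz_score(graph, x, y, beta=0.05, max_len=6):
    """Truncated Katz score: sum of beta**l * (number of length-l walks from x to y)."""
    walks = {x: 1}  # walks[v] = number of walks of the current length from x to v
    score = 0.0
    for length in range(1, max_len + 1):
        extended = {}
        for u, count in walks.items():
            for v in graph[u]:
                extended[v] = extended.get(v, 0) + count
        walks = extended
        score += (beta ** length) * walks.get(y, 0)
    return score
```

With a small β the first non-zero term dominates, so the ranking stays close to graph distance with ties broken by the number of short paths.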


Hitting time, PageRank

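Hitting time: H_{x,y} is the expected number of steps for a random walk starting at x to reach y; score(x, y) = -H_{x,y}

Commute time: score(x, y) = -(H_{x,y} + H_{y,x})

If y has a large stationary probability π_y, H_{x,y} is small for every x. To counterbalance, we can normalize: score(x, y) = -H_{x,y} · π_y

PageRank (rooted at x): to cut down on long random walks, the walk returns to x with probability α at every step; score(x, y) is the stationary probability of y under this walk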
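The rooted PageRank variant can be sketched with plain power iteration. This covers only that variant, not hitting or commute time. The toy graph, the default α = 0.15, and the fixed iteration count are illustrative assumptions; the sketch also assumes every node has at least one neighbor.

```python
# Hypothetical toy graph (edges A-B, A-D, B-C, C-D, C-E)
toy_graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C"},
    "E": {"C"},
}

def rooted_pagerank(graph, x, alpha=0.15, iterations=100):
    """Approximate stationary distribution of a walk that restarts at x.

    At each step the walk jumps back to x with probability alpha, otherwise
    it moves to a uniformly random neighbor of the current node.
    """
    prob = {node: 0.0 for node in graph}
    prob[x] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[x] += alpha  # total mass is 1, so alpha of it restarts at x
        for u, p in prob.items():
            share = (1.0 - alpha) * p / len(graph[u])
            for v in graph[u]:
                nxt[v] += share
        prob = nxt
    return prob

scores_from_a = rooted_pagerank(toy_graph, "A")
```

Here `scores_from_a[y]` plays the role of score(A, y); nodes that are many steps away from the root receive little probability mass because the walk keeps restarting.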


SimRank

Defined recursively: two nodes are similar to the extent that they are joined to similar neighbors
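The recursive formula is missing from the extracted slide. The sketch below assumes the usual SimRank recursion: similarity(x, x) = 1, and for x ≠ y the similarity is γ times the average similarity over all pairs (a, b) with a a neighbor of x and b a neighbor of y. The decay γ = 0.8, the iteration count, and the toy graph are illustrative assumptions.

```python
# Hypothetical toy graph (edges A-B, A-D, B-C, C-D, C-E)
toy_graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C"},
    "E": {"C"},
}

def simrank(graph, gamma=0.8, iterations=10):
    """Iterate the SimRank recursion from the identity; returns {(u, v): similarity}."""
    nodes = list(graph)
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        nxt = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    nxt[(u, v)] = 1.0
                elif not graph[u] or not graph[v]:
                    nxt[(u, v)] = 0.0  # a node without neighbors is similar to nothing
                else:
                    total = sum(sim[(a, b)] for a in graph[u] for b in graph[v])
                    nxt[(u, v)] = gamma * total / (len(graph[u]) * len(graph[v]))
        sim = nxt
    return sim

similarity = simrank(toy_graph)
```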


Low-rank approximation

Treat the graph as an adjacency matrix

Compute the rank-k matrix Mk (noise reduction)
x is a row, y is a row, score(x, y) = inner product of rows r(x) and r(y)

Example adjacency matrix:

     A  B  C
A    -  1  0
B    1  -  1
C    0  1  -
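A sketch of the inner-product variant, assuming NumPy is available and that Mk is obtained by truncating the singular value decomposition of the adjacency matrix. The function name, the choice k = 2, and the zero diagonal for the three-node example are illustrative assumptions.

```python
import numpy as np

def low_rank_scores(adjacency, k):
    """Return the matrix of inner products between rows of the rank-k approximation."""
    u, s, vt = np.linalg.svd(adjacency)
    # keep only the k largest singular values and their vectors
    m_k = (u[:, :k] * s[:k]) @ vt[:k, :]
    return m_k @ m_k.T

# Path graph A - B - C, with zeros on the diagonal (no self-loops)
adjacency = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
scores = low_rank_scores(adjacency, k=2)
```

This matrix already has rank 2, so k = 2 reproduces it exactly; on a large co-authorship matrix a small k discards the weaker singular directions, which is the noise-reduction step.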


Unseen bigrams and Clustering

Unseen bigrams: derived from language modeling. Estimating the frequency of unseen bigrams: pairs of words (nodes here) that co-occur in a test corpus but not in the training corpus

Clustering: deleting tenuous edges in Gcollab through a clustering procedure and running predictors on the “cleaned-up” subgraph


Results

The results are presented as:

1. Factor improvement of proposed predictors over:
   Random predictor
   Graph distance predictor
   Common neighbors predictor

2. Relative performance vs. the above predictors

3. Common predictions
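The factor improvement over the random predictor can be sketched as follows. The evaluation protocol here is an assumption, not something stated on the slide: a predictor is credited for the first n pairs of its ranked list, where n is the number of new edges, and a random predictor is expected to be correct with probability n divided by the number of candidate pairs. The function name and the example data are hypothetical.

```python
def factor_improvement(ranked_pairs, new_edges):
    """Accuracy of the top-n predictions divided by expected random accuracy.

    ranked_pairs: all candidate pairs, best first.
    new_edges: pairs that actually became edges in the test interval.
    """
    n = len(new_edges)
    actual = {frozenset(edge) for edge in new_edges}  # ignore pair orientation
    hits = sum(1 for pair in ranked_pairs[:n] if frozenset(pair) in actual)
    accuracy = hits / n
    random_accuracy = n / len(ranked_pairs)
    return accuracy / random_accuracy

# Hypothetical ranked list of 5 candidate pairs and 2 true new edges
example_ranking = [("A", "C"), ("B", "D"), ("B", "E"), ("D", "E"), ("A", "E")]
example_new_edges = [("C", "A"), ("D", "E")]
example_factor = factor_improvement(example_ranking, example_new_edges)
```

In this example one of the top two predictions is correct (accuracy 0.5) against an expected random accuracy of 2/5.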


Factor Improvement of different measures


Factor Improvement - meta approaches


Relative performance vs. Random Predictions


vs. graph distance predictor, vs. common neighbors predictor


Common Predictions


Conclusions

No single clear winner

Many outperform the random predictor => there is useful information in the network topology

Katz, clustering, and low-rank approximation perform consistently well

Some simple measures, e.g. common neighbors and Adamic/Adar, also perform well


Critique

Even the best predictor (Katz on gr-qc) is correct on only 16% of predictions

How good is that?

All collaborations are treated equally. Perhaps treating recent collaborations as more important than older ones would help?


References

Lada A. Adamic and Eytan Adar. Friends and neighbors on the web. Social Networks, 25(3):211–230, July 2003.

A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A, 311(3–4):590–614, 2002.

Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, 1998.

Rodrigo De Castro and Jerrold W. Grossman. Famous trails to Paul Erdős. Mathematical Intelligencer, 21(3):51–63, 1999.


Question

Question???


Thank You

