
Link Prediction in Co-Authorship Network

Date post: 24-Feb-2016
Page 1: Link Prediction in  Co-Authorship Network

LINK PREDICTION IN CO-AUTHORSHIP NETWORK

Le Nhat Minh (A0074403N)

Supervisor: Dongyuan Lu

Page 2: Link Prediction in  Co-Authorship Network

Introduction

• Link prediction
  • Predict future connections within the scope of the network
• Co-authorship network
  • A network of collaborations among researchers, scientists, and academic writers

Page 3: Link Prediction in  Co-Authorship Network

Introduction

• Potential applications
  • Recommend experts or groups of researchers to an individual researcher.

Page 4: Link Prediction in  Co-Authorship Network

Outline

• Problem Background
• Related Work
• Workflow
• Conclusion
• Result Analysis
• Research Plan

Page 5: Link Prediction in  Co-Authorship Network

Problem Background

• What connects researchers together?
• Given an instance of a co-authorship network:
  • A researcher is connected to another if they have collaborated on at least one paper.

[Slide navigation: Problem Background, Related Work, Workflow, Conclusion]

[Figure: example network with nodes X and Y; recoverable labels: X 2001, Y 2004]

Page 6: Link Prediction in  Co-Authorship Network

Problem Background

• How to predict the link?
• Based on criteria:
  • Co-authorship network topology
  • Researcher’s personal information
  • Researcher’s papers
• Boost link prediction performance
  • Recommended links should be genuinely relevant to the authors’ interests, or at least be feasible collaborations for the researchers.

Page 7: Link Prediction in  Co-Authorship Network

Related Work

• Link prediction problems in social networks
  • Liben-Nowell, D., & Kleinberg, J., 2007
  • Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S., 2013
• In a social network, interactions among users are very dynamic, with:
  • Creation of new links within a few days
  • Deletion or replacement of existing links
• The two networks present different features
  • Characteristics of an individual researcher: citations, affiliations, institutions, ...
  • Characteristics of a person: marital status, age, workplace, …

Page 8: Link Prediction in  Co-Authorship Network

• Three mainstream approaches for link prediction:
  • Similarity based estimation
    • Liben-Nowell, D., & Kleinberg, J., 2007
  • Maximum likelihood estimation
    • Murata, T., & Moriyasu, S., 2008
    • Guimerà, R., & Sales-Pardo, M., 2009
  • Supervised learning model
    • Pavlov, M., & Ichise, R., 2007
    • Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006

Page 9: Link Prediction in  Co-Authorship Network

Similarity Based Estimation

• Use metrics to estimate the proximity of each pair of researchers
• Rank the pairs of researchers by those proximities
• The top-ranked pairs are the likely recommendations.
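The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the presentation's implementation: the toy graph, the function names `rank_pairs` and `shared_neighbours`, and the choice of scoring function are all assumptions made for the example.

```python
from itertools import combinations


def rank_pairs(graph, score, top_k):
    """Rank unconnected node pairs by a proximity score, highest first."""
    candidates = [
        (x, y)
        for x, y in combinations(sorted(graph), 2)
        if y not in graph[x]  # only pairs without an existing link
    ]
    scored = [((x, y), score(graph, x, y)) for x, y in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def shared_neighbours(graph, x, y):
    """Placeholder proximity metric: number of shared neighbours."""
    return len(graph[x] & graph[y])


# Hypothetical toy network: author -> set of co-authors.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
    "e": set(),
}

recommendations = rank_pairs(toy, shared_neighbours, top_k=2)
print(recommendations[0])  # (('a', 'd'), 2)
```

Any of the similarity metrics on the following slides could be passed in place of `shared_neighbours`, since the ranking step only needs a function that maps a pair of nodes to a number.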

Page 10: Link Prediction in  Co-Authorship Network

Similarity Based Estimation

• Network structure based measurement

Some conventions:

$S_{XY}$: similarity between nodes X and Y
$\Gamma(X)$: set of neighbours of X
$\Gamma(Y)$: set of neighbours of Y
$k(X) = |\Gamma(X)|$: degree of node X
$k(Y) = |\Gamma(Y)|$: degree of node Y

Page 11: Link Prediction in  Co-Authorship Network

Similarity Based Estimation

• Common Neighbor:

$S_{XY} = |\Gamma(X) \cap \Gamma(Y)|$

[Figure: example with nodes X and Y]
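As a small worked example of the common-neighbour score, the sketch below stores an undirected graph as a dictionary of neighbour sets. The graph and the node names are hypothetical and chosen only for illustration.

```python
def common_neighbours(graph, x, y):
    """S_XY = size of the intersection of the two neighbour sets."""
    return len(graph[x] & graph[y])


# Hypothetical toy graph (not from the presentation): node -> neighbours.
graph = {
    "X": {"A", "B", "C"},
    "Y": {"B", "C", "D"},
    "A": {"X"},
    "B": {"X", "Y"},
    "C": {"X", "Y"},
    "D": {"Y"},
}

cn_score = common_neighbours(graph, "X", "Y")
print(cn_score)  # 2, because X and Y share the neighbours B and C
```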

Page 12: Link Prediction in  Co-Authorship Network

Similarity Based Estimation

• Jaccard’s coefficient:

$S_{XY} = \dfrac{|\Gamma(X) \cap \Gamma(Y)|}{|\Gamma(X) \cup \Gamma(Y)|}$

[Figure: example with nodes X and Y]
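A minimal sketch of Jaccard's coefficient on a hypothetical toy graph follows. Returning 0.0 when both neighbour sets are empty is a convention assumed here, since the ratio is undefined in that case.

```python
def jaccard(graph, x, y):
    """Shared neighbours divided by all distinct neighbours of x and y."""
    union = graph[x] | graph[y]
    if not union:
        return 0.0  # assumed convention for two isolated nodes
    return len(graph[x] & graph[y]) / len(union)


# Hypothetical toy graph: node -> set of neighbours.
graph = {
    "X": {"A", "B", "C"},
    "Y": {"B", "C", "D"},
    "A": {"X"},
    "B": {"X", "Y"},
    "C": {"X", "Y"},
    "D": {"Y"},
}

jaccard_score = jaccard(graph, "X", "Y")
print(jaccard_score)  # 0.5: two shared neighbours out of four distinct ones
```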

Page 13: Link Prediction in  Co-Authorship Network

Similarity Based Estimation

• Preferential Attachment:

$S_{XY} = k(X) \times k(Y)$

[Figure: example with nodes X and Y]
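Preferential attachment only needs the two degrees, as the following sketch on a hypothetical toy graph shows.

```python
def preferential_attachment(graph, x, y):
    """S_XY = k(X) * k(Y), the product of the two node degrees."""
    return len(graph[x]) * len(graph[y])


# Hypothetical toy graph: node -> set of neighbours.
graph = {
    "X": {"A", "B", "C"},
    "Y": {"B", "C", "D"},
    "A": {"X"},
    "B": {"X", "Y"},
    "C": {"X", "Y"},
    "D": {"Y"},
}

pa_score = preferential_attachment(graph, "X", "Y")
print(pa_score)  # 9, since both X and Y have degree 3
```

Unlike the two previous metrics, this score can be non-zero for pairs with no shared neighbour at all.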

Page 14: Link Prediction in  Co-Authorship Network

Similarity Based Estimation

• Adamic/Adar:

$S_{XY} = \sum_{Z \in \Gamma(X) \cap \Gamma(Y)} \dfrac{1}{\log k(Z)}$

[Figure: example with nodes X, Y and Z]
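The Adamic/Adar score can be sketched as below. The slide does not state the base of the logarithm, so the natural logarithm is assumed here; the toy graph is hypothetical.

```python
import math


def adamic_adar(graph, x, y):
    """Sum of 1 / log k(Z) over the common neighbours Z of x and y."""
    # In a simple undirected graph every common neighbour has degree >= 2,
    # so log k(Z) is strictly positive.
    return sum(1.0 / math.log(len(graph[z])) for z in graph[x] & graph[y])


# Hypothetical toy graph: node -> set of neighbours.
graph = {
    "X": {"A", "B", "C"},
    "Y": {"B", "C", "D"},
    "A": {"X"},
    "B": {"X", "Y"},
    "C": {"X", "Y"},
    "D": {"Y"},
}

aa_score = adamic_adar(graph, "X", "Y")
```

Here the common neighbours B and C both have degree 2, so the score equals 2 / ln 2. A common neighbour with many links would contribute less, which is the point of the weighting.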

Page 15: Link Prediction in  Co-Authorship Network

Similarity Based Estimation

• Shortest Path:
  • Defined as the minimum number of edges connecting two nodes.
• PageRank:
  • A random walk on the graph assigns each node the probability that it is reached. The proximity between a pair of nodes can be determined by the sum of the two nodes’ PageRank scores.
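Both global measures can be illustrated with standard-library Python. This is a sketch under stated assumptions: the damping factor of 0.85, the fixed iteration count, and the helper names are choices made for the example and do not come from the slides.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration for PageRank on an undirected graph."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for neighbour in neighbours:
                    new_rank[neighbour] += share
            else:
                # Isolated node: spread its rank evenly over all nodes.
                for other in graph:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


def pagerank_proximity(rank, x, y):
    """Pair proximity as the sum of the two PageRank scores."""
    return rank[x] + rank[y]


# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
rank = pagerank(path)
print(shortest_path_length(path, "a", "d"))  # 3
```

For shortest path a smaller value means closer nodes, so a ranking would typically sort by the negated distance.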

Page 16: Link Prediction in  Co-Authorship Network

Maximum Likelihood Estimation

• Predefine specific rules for the network
• Requires prior knowledge of the network
• The likelihood of any non-connected pair is calculated according to those rules.

Page 17: Link Prediction in  Co-Authorship Network

Supervised Learning Model

• Construct multi-dimensional feature vectors
• Feed these vectors to classifiers to optimize a target function (training the model)
• Link prediction becomes a binary classification problem
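To make the binary-classification framing concrete, the sketch below turns node pairs into feature vectors with 0/1 labels. The three features, the helper names, and the way positive pairs are supplied are assumptions for illustration; the resulting rows could be handed to any classifier.

```python
def feature_vector(graph, x, y):
    """Three topological features for the pair (x, y)."""
    common = graph[x] & graph[y]
    union = graph[x] | graph[y]
    return [
        len(common),                                  # common neighbours
        len(common) / len(union) if union else 0.0,   # Jaccard's coefficient
        len(graph[x]) * len(graph[y]),                # preferential attachment
    ]


def build_dataset(graph, pairs, positive_pairs):
    """Return (rows, labels); label 1 marks a pair listed in positive_pairs."""
    rows = [feature_vector(graph, x, y) for x, y in pairs]
    labels = [1 if frozenset((x, y)) in positive_pairs else 0 for x, y in pairs]
    return rows, labels


# Hypothetical toy network and labels.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
    "e": set(),
}
pairs = [("a", "d"), ("a", "e")]
positives = {frozenset(("a", "d"))}

rows, labels = build_dataset(toy, pairs, positives)
print(rows)    # [[2, 1.0, 4], [0, 0.0, 0]]
print(labels)  # [1, 0]
```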

Page 18: Link Prediction in  Co-Authorship Network

Supervised Learning Model

• Related work (Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006) using:
  • Decision Tree
  • SVM (Linear Kernel)
  • K nearest neighbor
  • Multilayer Perceptron
  • Naive Bayes
  • Bagging
• Combine many classifiers (Pavlov, M., & Ichise, R., 2007)
  • Decision stump + AdaBoost
  • Decision Tree + AdaBoost
  • SMO + AdaBoost

Page 19: Link Prediction in  Co-Authorship Network

Summary

• Similarity based estimation
  • Does not perform very well
• Maximum likelihood
  • Depends on the network
• Supervised learning model
  • Performs better than similarity based estimation

Page 20: Link Prediction in  Co-Authorship Network

Workflow

[Figure: workflow diagram; recoverable labels: Classifier, Model, Features]

Page 21: Link Prediction in  Co-Authorship Network

Graph Description

• Co-authorship graph:
  • Undirected graph G(V, E)
  • Node or vertex (author)
    • Author ID
    • Author name
  • Link or edge (co-authorship)
    • Pair of author IDs
    • List of publication years, each followed by the paper title (Ex: 2004: ”Introduction to …”)

Page 22: Link Prediction in  Co-Authorship Network

Setting up data

• The dataset is split into two time spans: 2000 – 2010 and 2010 – 2013
• The first is for training, the latter is for testing.
• Currently, there are 134,307 researchers in the 2000 – 2013 network.
• Authors who do not appear in the testing period are removed, leaving 104,265 researchers.
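The time-based split can be sketched as follows. Because the two spans on the slide share the year 2010, the code assumes that papers dated before a `split_year` go to training and the rest to testing; that boundary rule, the record layout, and the function name are assumptions. The sketch also keeps only test links that did not already exist in training, since only new collaborations count in the testing period.

```python
def split_by_year(papers, split_year):
    """Split (author_a, author_b, year) records into training and test links."""
    train, test = set(), set()
    for a, b, year in papers:
        edge = frozenset((a, b))
        if year < split_year:
            train.add(edge)
        else:
            test.add(edge)
    # Only collaborations that did not exist during training are "new".
    new_links = test - train
    # Keep only authors who appear in the testing period.
    active = {author for edge in test for author in edge}
    train_kept = {edge for edge in train if edge <= active}
    return train_kept, new_links, active


# Hypothetical records, not taken from the dataset.
papers = [
    ("a", "b", 2003),
    ("b", "c", 2008),
    ("d", "e", 2005),
    ("a", "b", 2011),
    ("a", "c", 2012),
]
train_links, new_links, active_authors = split_by_year(papers, split_year=2010)
```

In this toy data the authors d and e never publish in the testing period, so their link is dropped, and the repeated a-b collaboration is not counted as a new link.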

Page 23: Link Prediction in  Co-Authorship Network

Setting up data

• Choose a subset from the 104,265 researchers
• Experiment on 937 researchers

                        2000-2010    2010-2013
Real Network
  No. of nodes          104,265      104,265
  No. of links          413,691      35,558
Experiment Network
  No. of nodes          937          937
  No. of links          3,093        57

Page 24: Link Prediction in  Co-Authorship Network

Baseline Features

• Extract features from the network structure:
  • Local similarity
    • Common Neighbor
    • Adamic/Adar
    • Preferential Attachment
    • Jaccard’s coefficient
  • Global similarity
    • Shortest Path
    • PageRank

Page 25: Link Prediction in  Co-Authorship Network

Baseline Features

• Feature for co-authorship network
  • Keyword matching (Cohen, S., & Ebel, L., 2013): a suggested metric that measures textual relevancy using a TF-IDF based function.
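The slide does not give the exact function, so the following is only one plausible TF-IDF based relevancy score, not the metric of Cohen and Ebel: each author is represented by the keywords of their papers, and two authors are compared by the cosine similarity of their TF-IDF vectors. The keyword lists and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(keywords_by_author):
    """Map each author to a sparse TF-IDF vector over their keywords."""
    n_authors = len(keywords_by_author)
    doc_freq = Counter(
        term for terms in keywords_by_author.values() for term in set(terms)
    )
    vectors = {}
    for author, terms in keywords_by_author.items():
        counts = Counter(terms)
        vectors[author] = {
            term: (count / len(terms)) * math.log(n_authors / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical keyword lists.
keywords = {
    "a": ["graph", "link", "prediction"],
    "b": ["link", "prediction", "social"],
    "c": ["protein", "folding"],
}
vectors = tfidf_vectors(keywords)
sim_ab = cosine(vectors["a"], vectors["b"])
sim_ac = cosine(vectors["a"], vectors["c"])
```

Authors a and b share two keywords and get a positive score, while a and c share none and score zero.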

Page 26: Link Prediction in  Co-Authorship Network

Proposed Features

Productivity of the authors
Observe the “history” of an author
For example, at a particular node A:

i    T_i     n    m
0    2000    3    1
1    2004    4    2
2    2005    6    2
3    2006    7    3

n: No. of shared papers
m: No. of collaborators

Increments between consecutive periods:
  T0 to T1: Δn = 1, Δm = 1
  T1 to T2: Δn = 2, Δm = 0
  T2 to T3: Δn = 1, Δm = 1
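The per-period increments in this example can be reproduced with a short sketch. The snapshot values are read from the slide; the tuple layout and the function name are assumptions made for the example.

```python
def increments(history):
    """Differences (years elapsed, new papers, new collaborators) per period."""
    return [
        (t1 - t0, n1 - n0, m1 - m0)
        for (t0, n0, m0), (t1, n1, m1) in zip(history, history[1:])
    ]


# Snapshots for node A as (year T_i, shared papers n, collaborators m).
history_a = [(2000, 3, 1), (2004, 4, 2), (2005, 6, 2), (2006, 7, 3)]

deltas = increments(history_a)
print(deltas)  # [(4, 1, 1), (1, 2, 0), (1, 1, 1)]
```

Each tuple pairs the length of a period with the growth in papers and collaborators during it, which are the quantities a history-based productivity score would combine.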

Page 27: Link Prediction in  Co-Authorship Network

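Proposed Features

Productivity of the authors
Observe the “history” of an author
The “productivity” of node A:

[Equation garbled in extraction: $P(A)$ is a sum over periods $i \ge 0$ built from the increments $n_{T_{i+1}} - n_{T_i}$ and $m_{T_{i+1}} - m_{T_i}$, the period lengths $T_{i+1} - T_i$, and the weight $\alpha$; the exact arrangement of these terms is not recoverable.]

α: a constant that assigns the weight of each time period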

Page 28: Link Prediction in  Co-Authorship Network

Training set

• Set up training data
  • With n nodes, there are $\frac{n(n-1)}{2}$ possible links.
  • Among those, separate two kinds of links:
    • Positive links: links that appear in the training years.
    • Negative links: the remaining links, which do not exist in the training years.
  Note: avoid biased training by balancing the number of instances with true and false labels.
• Classify all the non-existent links
• Compare with the testing data
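A minimal sketch of this setup is shown below. Balancing by random undersampling of the larger class is an assumption, since the slide does not say how the balance is achieved; the toy nodes, links, and function names are hypothetical.

```python
import random
from itertools import combinations


def possible_links(n):
    """Number of unordered node pairs among n nodes."""
    return n * (n - 1) // 2


def balanced_training_pairs(nodes, train_links, seed=0):
    """Return equally sized positive/negative samples plus all negatives."""
    all_pairs = [frozenset(pair) for pair in combinations(sorted(nodes), 2)]
    positives = [pair for pair in all_pairs if pair in train_links]
    negatives = [pair for pair in all_pairs if pair not in train_links]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    size = min(len(positives), len(negatives))
    return rng.sample(positives, size), rng.sample(negatives, size), negatives


# Hypothetical toy data: 5 authors and 3 training links.
nodes = ["a", "b", "c", "d", "e"]
train_links = {frozenset(("a", "b")), frozenset(("b", "c")), frozenset(("c", "d"))}

pos_sample, neg_sample, candidates = balanced_training_pairs(nodes, train_links)
print(possible_links(937))  # 438516
```

For the 937-node experiment network this gives 438,516 possible pairs; subtracting the 3,093 training links leaves 435,423 non-existent links to classify, which equals the total of the four confusion matrix cells reported in the experimental results.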

Page 29: Link Prediction in  Co-Authorship Network

Experimental Results

• Measurement of performance
  • Precision: $P = \dfrac{26}{5588 + 26} \approx 0.005$
  • Recall: $R = \dfrac{26}{31 + 26} \approx 0.456$
  • Harmonic mean: $F_1 = \dfrac{2 \times 26}{2 \times 26 + 5588 + 31} \approx 0.009$
• New links to predict: 57 links

Confusion matrix (rows: actual, columns: prediction):

               Predicted True Link    Predicted False Link
True Link      26                     31
False Link     5,588                  429,778
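The three measures can be recomputed directly from the confusion matrix counts. The function name is an assumption; the counts are the ones reported on the slide.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return precision, recall, f1


# Counts from the confusion matrix: 26 correct links, 5,588 false alarms, 31 misses.
precision, recall, f1 = precision_recall_f1(tp=26, fp=5588, fn=31)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.005 0.456 0.009
```

The large gap between recall and precision reflects the class imbalance: only 57 of the 435,423 candidate pairs are true new links, so even a small false-positive rate produces thousands of wrong recommendations.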

Page 30: Link Prediction in  Co-Authorship Network

Result Analysis

• Possible reasons
  • Features
  • Small set of data (sampling problem)
  • Instances of the negative links used for training

Page 31: Link Prediction in  Co-Authorship Network

Research Plan

• Use a weighted graph with parameters:
  • No. of papers
  • No. of neighbors
  • No. of citations
• Focus on features that specifically target the co-authorship network:
  • Citations
  • Institutions
• Enlarge the experiment dataset size

Thank you

Page 32: Link Prediction in  Co-Authorship Network

References

• Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211-230.
• Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M. (2006). Link prediction using supervised learning. In SDM’06: Workshop on Link Analysis, Counter-terrorism and Security.
• Liben-Nowell, D., & Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019-1031.
• Pavlov, M., & Ichise, R. (2007). Finding Experts by Link Prediction in Co-authorship Networks. FEWS, 290, 42-55.
• Murata, T., & Moriyasu, S. (2008). Link prediction based on structural properties of online social networks. New Generation Computing, 26(3), 245-257.
• Guimerà, R., & Sales-Pardo, M. (2009). Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences, 106(52), 22073-22078.
• Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S. (2013). An Evolutionary Algorithm Approach to Link Prediction in Dynamic Social Networks. arXiv preprint arXiv:1304.6257.
• Cohen, S., & Ebel, L. (2013). Recommending collaborators using keywords. In Proceedings of the 22nd International Conference on World Wide Web Companion, 959-962.

Page 33: Link Prediction in  Co-Authorship Network

• Links per year in the training set outnumber links per year in the testing set:
  • In the testing period, only “new” collaborations are considered.
  • Any collaboration between researchers who already have a link is disregarded.

                2000-2010    2010-2013
No. of nodes    937          937
No. of links    3,093        57

Page 34: Link Prediction in  Co-Authorship Network

Results with different classifiers

Precision is the positive predictive value, recall is the hit rate, and F1 is the harmonic mean of the two.

Classifier               Precision (%)    Recall (%)    F1 (%)
Decision Tree            0.3              24.6          0.5
SMO                      0.5              45.6          0.9
Bagging                  0.4              28.1          0.7
Naive Bayes              0.2              77.2          0.3
Multilayer Perceptron    0.4              47.3          0.8

Page 35: Link Prediction in  Co-Authorship Network

Proposed Feature

• The reason for proposing this feature:
  • Keep track of each researcher’s tendency
  • Give a “bonus” to researchers who tend to collaborate with “new” colleagues rather than “old” ones
  • Also give a high score to prolific researchers (based on the number of published papers)

Page 36: Link Prediction in  Co-Authorship Network

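Stochastic Block Model

• Guimerà, R., & Sales-Pardo, M., 2009

$$L(A \mid M) = \prod_{\alpha \le \beta} Q_{\alpha\beta}^{\,l_{\alpha\beta}} \left(1 - Q_{\alpha\beta}\right)^{r_{\alpha\beta} - l_{\alpha\beta}}$$

$r_{\alpha\beta}$: No. of pairs of nodes such that one node is in $\alpha$ and the other is in $\beta$
$l_{\alpha\beta}$: No. of edges between groups $\alpha$, $\beta$
$Q_{\alpha\beta}$: probability that two nodes in groups $\alpha$, $\beta$ are connected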

Page 37: Link Prediction in  Co-Authorship Network

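Stochastic Block Model

[Figure: example network with nodes 1 to 7; recoverable labels: X, Y]

$M = \{\{1, 2, 3\}, \{4, 5, 6, 7\}\}$

$L(A \mid M) = \left(\frac{1}{6}\right)^{1} \left(\frac{5}{6}\right)^{5} \times \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{10}$

The reliability of an individual link is:

$$R_{xy} = L(A_{xy} = 1 \mid A) = \frac{\int L(A_{xy} = 1 \mid M)\, L(A \mid M)\, p(M)\, dM}{\int L(A \mid M')\, p(M')\, dM'}$$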
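The block-model likelihood for a single fixed partition can be sketched as follows. Setting each connection probability to its maximum-likelihood value l / r is an assumption of this sketch, and the edge list is hypothetical; only the partition {{1, 2, 3}, {4, 5, 6, 7}} comes from the slide. The reliability integral over all partitions is not attempted here.

```python
import math
from itertools import combinations_with_replacement


def sbm_log_likelihood(groups, edges):
    """Log-likelihood of an undirected graph under a fixed partition."""
    edge_set = {frozenset(edge) for edge in edges}
    membership = {node: g for g, members in enumerate(groups) for node in members}
    log_l = 0.0
    for a, b in combinations_with_replacement(range(len(groups)), 2):
        if a == b:
            size = len(groups[a])
            r = size * (size - 1) // 2               # pairs inside one group
        else:
            r = len(groups[a]) * len(groups[b])      # pairs across two groups
        l = sum(
            1 for edge in edge_set
            if sorted(membership[node] for node in edge) == [a, b]
        )
        if l == 0 or l == r:
            continue  # with Q = l / r the factor Q^l (1 - Q)^(r - l) equals 1
        q = l / r     # assumed maximum-likelihood estimate of Q
        log_l += l * math.log(q) + (r - l) * math.log(1.0 - q)
    return log_l


groups = [{1, 2, 3}, {4, 5, 6, 7}]
# Hypothetical edges: a fully linked first group, one edge inside the
# second group, and two edges between the groups.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (3, 4), (3, 5)]

log_likelihood = sbm_log_likelihood(groups, edges)
```

With these edges the second group has 1 edge out of 6 possible pairs and the two groups are joined by 2 edges out of 12 possible pairs, so both probabilities are 1/6 and the likelihood is (1/6)^3 (5/6)^15.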

