Date post: | 04-Apr-2018 |
Upload: | casey-robinson |
7/30/2019 DRX Final Presentation Slides
Performance Analysis of Hadoop Link Prediction
Problem Statement
In a network $G = (V, E, X)$, for a particular user $v_s$ and a set of candidates $C$ to which $v_s$ may create a link, find a predictive function
$f : (V, E, X, v_s, C) \to Y$
where $Y = \{y_1, y_2, \ldots, y_{|C|}\}$ is a set of inferred results for whether user $v_s$ would create links with users in $C$.
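As one concrete reading of the problem statement, the sketch below implements a possible f in Java. It assumes a common-neighbors score, which the slides do not specify, and it ignores the attribute set X. The names `LinkPredictorSketch`, `undirected`, and `predict` are hypothetical and chosen only for illustration.

```java
import java.util.*;

final class LinkPredictorSketch {
    // Build an undirected adjacency map (V, E) from an edge list.
    static Map<Integer, Set<Integer>> undirected(int[][] edges) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    // Assumed scoring rule: y_c = number of neighbors shared by vs and candidate c.
    static Map<Integer, Integer> predict(Map<Integer, Set<Integer>> adj,
                                         int vs, List<Integer> candidates) {
        Set<Integer> sourceNeighbors = adj.getOrDefault(vs, Set.of());
        Map<Integer, Integer> y = new LinkedHashMap<>();
        for (int c : candidates) {
            int shared = 0;
            for (int w : adj.getOrDefault(c, Set.of())) {
                if (sourceNeighbors.contains(w)) {
                    shared++;
                }
            }
            y.put(c, shared);
        }
        return y;
    }

    public static void main(String[] args) {
        int[][] edges = {{1, 2}, {1, 3}, {2, 3}, {3, 4}, {4, 5}};
        Map<Integer, Set<Integer>> adj = undirected(edges);
        // User 1 shares neighbor 3 with user 4 and nothing with user 5.
        System.out.println(predict(adj, 1, List.of(4, 5))); // prints {4=1, 5=0}
    }
}
```

A higher score here is read as a more likely link; any other scoring rule could be substituted without changing the shape of f.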
Challenges
Real networks are large
  > 1 billion users on Facebook (Oct. 2012)
  > 500 million users on Twitter (Jul. 2012)
  > 175 million users on LinkedIn (Jun. 2012)
Big data makes prediction even slower
Our Solution
Divide Adjacency list
Distributed computing
Hadoop
Smaller Problems
Map Reduce
Data Intensive Scie
[Figure: MapReduce data flow. Input splits 0, 1, and 2 each go through map and sort; the sorted outputs go through merge and reduce.]
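The map, sort, merge, and reduce stages can be imitated in plain Java to show how data moves between them. The sketch below is an in-memory simulation, not the Hadoop API. The class name `MapReduceFlowSketch`, the choice of vertex-degree counting as the job, and the use of a `TreeMap` in place of a true merge of sorted runs are all assumptions made for illustration.

```java
import java.util.*;

final class MapReduceFlowSketch {
    // Map + sort: emit (vertex, 1) for both endpoints of each edge in one split,
    // then sort the emitted pairs by key.
    static List<int[]> mapAndSort(int[][] split) {
        List<int[]> out = new ArrayList<>();
        for (int[] e : split) {
            out.add(new int[]{e[0], 1});
            out.add(new int[]{e[1], 1});
        }
        out.sort(Comparator.comparingInt(kv -> kv[0]));
        return out;
    }

    // Merge + reduce: group the values of all map outputs by key (the TreeMap
    // stands in for merging sorted runs), then sum the values of each key.
    static SortedMap<Integer, Integer> mergeAndReduce(List<List<int[]>> mapOutputs) {
        SortedMap<Integer, List<Integer>> merged = new TreeMap<>();
        for (List<int[]> output : mapOutputs) {
            for (int[] kv : output) {
                merged.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        SortedMap<Integer, Integer> reduced = new TreeMap<>();
        merged.forEach((key, values) ->
                reduced.put(key, values.stream().mapToInt(Integer::intValue).sum()));
        return reduced;
    }

    public static void main(String[] args) {
        // Three input splits of a small edge list.
        int[][][] splits = {{{1, 2}, {1, 3}}, {{2, 3}}, {{3, 4}}};
        List<List<int[]>> mapOutputs = new ArrayList<>();
        for (int[][] split : splits) {
            mapOutputs.add(mapAndSort(split));
        }
        System.out.println(mergeAndReduce(mapOutputs)); // prints {1=2, 2=2, 3=3, 4=1}
    }
}
```

Each split is processed independently, which is what lets the map stage run in parallel across machines; only the merge needs to see all map outputs for a given key.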
Link Prediction Framework
[Figure: pipeline stages: Prepare, Vertex Num, Split Data, Probe Edge Num, D, LP Score, Probe Score, Non-Exist Score, AUC]
Algorithm Design
[Figure: example graph on vertices 1 to 7 with edge list 1 2, 1 3, 1 4, 2 3, 2 4, 2 6, 3 4, 4 5, 5 6, 5 7, 6 7, and the adjacency lists built from it (1: 2,3,4; 2: 3,4,6; 3: 4; 4: 5; 5: 6,7; 6: 7). Stages shown: Mapper, Reducer, Mapper.]
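The first two stages (Mapper, then Reducer) turn an edge list into adjacency lists. The following Java sketch simulates that step in memory using the example graph from the slide. It is not Hadoop code: the class `AdjacencyListSketch` and its methods are hypothetical, and keying each edge by its smaller endpoint is an assumption inferred from the adjacency lists shown. The final Mapper stage is not reproduced because the slide text does not make its output clear.

```java
import java.util.*;

final class AdjacencyListSketch {
    // Mapper (simulated): emit (smaller endpoint, larger endpoint) for one edge.
    static int[] map(int u, int v) {
        return new int[]{Math.min(u, v), Math.max(u, v)};
    }

    // Reducer (simulated): collect all values that share a key into a sorted set.
    static SortedMap<Integer, SortedSet<Integer>> reduce(List<int[]> pairs) {
        SortedMap<Integer, SortedSet<Integer>> adjacency = new TreeMap<>();
        for (int[] kv : pairs) {
            adjacency.computeIfAbsent(kv[0], k -> new TreeSet<>()).add(kv[1]);
        }
        return adjacency;
    }

    // Run the mapper over every edge, then hand all emitted pairs to the reducer.
    static SortedMap<Integer, SortedSet<Integer>> build(int[][] edges) {
        List<int[]> emitted = new ArrayList<>();
        for (int[] e : edges) {
            emitted.add(map(e[0], e[1]));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        int[][] edges = {
            {1, 2}, {5, 6}, {1, 3}, {1, 4}, {2, 3}, {2, 4},
            {3, 4}, {2, 6}, {4, 5}, {6, 7}, {5, 7}
        };
        // prints {1=[2, 3, 4], 2=[3, 4, 6], 3=[4], 4=[5], 5=[6, 7], 6=[7]}
        System.out.println(build(edges));
    }
}
```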
Data Sets
Name          Nodes       Edges        Relative
HepPh         12,008      237,010      1x
ND Web        325,729     1,497,134    7.14x
Live Journal  4,847,571   68,993,773   357.78x
Time Breakdown
Which step(s)?
[Figure: bar chart of time (% of total), 0 to 80, for the HepPh, ND Web, and Live Journal data sets]
Machine Specification
26 Nodes
32 GB RAM
12x2 TB SATA disks (4 dedicated to Hadoop stor
2x8-core Intel Xeon E5620 CPUs @ 2.40 GHz
Gigabit Ethernet
Monitoring Tools
Resource  Command
CPU       iostat -c 1
Disk      iostat -d 1
Network   netstat -c -I
Disk
[Figure: blocks read (1 kB blocks) versus time (s); labels: LP Score, AUC]
Network
[Figure: data received (Mb/s) versus time (s); labels: LP Score, AUC]
n = 13000000
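import java.util.Random;

int n = 13000000;
// Score arrays; they are filled elsewhere before this loop runs.
double[] left = new double[n];
double[] right = new double[n];
Random rand = new Random();
Random rand1 = new Random();
int n1 = 0, n2 = 0;   // n1: left score wins; n2: scores tie
int m = 3 * n;        // number of sampled comparisons
for (int i = 0; i < m; i++) {
    int index1 = rand.nextInt(n);
    int index2 = rand1.nextInt(n);
    double leftScore = left[index1];
    double rightScore = right[index2];
    if (leftScore > rightScore) {
        n1++;
    } else if (Math.abs(leftScore - rightScore) < 1E-6) {
        n2++;
    }
}
double AUC = (n1 + 0.5 * n2) / m;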
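The sampling loop above can be packaged as a small runnable method to check its behavior on tiny inputs. This is a sketch under assumptions: the class `AucSamplingSketch`, the `auc` method, and the fixed seed are hypothetical, and a single `Random` is used for both indices. Unlike the slide code, it tests for a tie before testing which score is larger, so near-equal scores always count as ties.

```java
import java.util.Random;

final class AucSamplingSketch {
    // Estimate AUC from m random (left, right) score comparisons.
    static double auc(double[] left, double[] right, int m, long seed) {
        Random rand = new Random(seed);
        int n1 = 0; // comparisons where the left score is larger
        int n2 = 0; // comparisons where the two scores are (almost) equal
        for (int i = 0; i < m; i++) {
            double leftScore = left[rand.nextInt(left.length)];
            double rightScore = right[rand.nextInt(right.length)];
            if (Math.abs(leftScore - rightScore) < 1E-6) {
                n2++;
            } else if (leftScore > rightScore) {
                n1++;
            }
        }
        return (n1 + 0.5 * n2) / m;
    }

    public static void main(String[] args) {
        // Every left score beats every right score, so each sample counts toward n1.
        System.out.println(auc(new double[]{0.9, 0.8}, new double[]{0.1, 0.2}, 1000, 42L)); // prints 1.0
        // All scores are equal, so each sample is a tie.
        System.out.println(auc(new double[]{0.5, 0.5}, new double[]{0.5, 0.5}, 1000, 42L)); // prints 0.5
    }
}
```

Because the loop touches two random array positions per iteration, its memory accesses have little locality once the arrays are as large as n = 13000000.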