Date posted: 11-Feb-2017
Category: Data & Analytics
Uploaded by: werner-hoffmann
Compression-based Graph Mining Exploiting
Structure Primitives
Seminar Exploratory Data Analysis
Werner Hoffmann
19.06.2015
Jing Feng, Xiao He, Nina Hubig, Christian Böhm and Claudia Plant
Outline
- What?
- Why?
- How?
- Conclusion!
Context [1]
Graphs:
- unweighted
- undirected
- modelled as adjacency matrix
- sparse
Social Media Data
Is Facebook sparse?
-> 1.4 x 10^9 nodes ¹
-> on average 340 friends² per node
-> about 238 x 10^9 edges (1.4 x 10^9 x 340 / 2, since each friendship connects two nodes)
-> possible edges: about 0.98 x 10^18
=> only about 0.000024% of all possible edges exist
Yes, Facebook is very sparse
¹ https://en.wikipedia.org/wiki/Facebook
² http://www.statista.com/statistics/232499/americans-who-use-social-networking-sites-several-times-per-day/
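The sparsity estimate can be checked with a few lines of Python. This is a minimal sketch that assumes every friendship is one undirected edge counted once; the function name `graph_density` is made up for this illustration.

```python
def graph_density(n_nodes, avg_degree):
    """Return (edges, possible_edges, density) for an undirected simple graph."""
    # Each edge contributes to the degree of two nodes, hence the division by 2.
    n_edges = n_nodes * avg_degree / 2
    # Number of unordered node pairs: n * (n - 1) / 2.
    possible_edges = n_nodes * (n_nodes - 1) / 2
    return n_edges, possible_edges, n_edges / possible_edges


# Node count and average degree taken from the slide.
edges, possible, density = graph_density(1.4e9, 340)
print(f"edges: {edges:.3g}, possible: {possible:.3g}, density: {density:.3g}")
```

The density simplifies to avg_degree / (n_nodes - 1), so it is roughly 2.4e-07 here, which supports the conclusion that the graph is very sparse.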
Why sparse?
http://www.twolfanger.de/wp-content/uploads/2013/06/Degree-Network.png
Instagram network Werner
51 friends, 13 edges [4]
Goal
Find values for the transitivity [2] and the hubness of a graph
What is the benefit of knowing the structure of a graph?
- deeper insights into the graph
- lossless compression is possible
- link prediction
- number of clusters
- graph partitioning
Basic regular substructures
- Triangles: transitivity
- Stars: hubness
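To make the link between triangles and transitivity concrete, here is a small sketch that uses the standard definition from [2]: three times the number of triangles divided by the number of connected triples. The function name and the example graphs are assumptions for illustration. Hubness is not computed, because the slides give no formula for it.

```python
from itertools import combinations


def transitivity(edges):
    """Fraction of connected triples (a - center - b) that are closed by an edge a - b."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triples = 0
    closed = 0
    for nbrs in adj.values():
        # Every pair of neighbours of a node forms one connected triple.
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in adj[a]:
                # Each triangle is counted once per corner, i.e. three times.
                closed += 1
    return closed / triples if triples else 0.0


star = [("A", x) for x in "BCDEF"]          # pure hub, no triangles
clique = list(combinations("ABCDE", 2))     # every triple is a triangle
print(transitivity(star), transitivity(clique))
```

A pure star has transitivity 0 and a fully connected mesh has transitivity 1, which matches the idea that the two primitives sit at opposite ends of this measure.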
Characteristics of CXprime (Compression-based eXploiting Primitives)
- based on Minimum Description Length [3]¹
- no input parameters (unsupervised)
- clustering is k-means-like
¹ https://en.wikipedia.org/wiki/Minimum_description_length
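As a rough illustration of the Minimum Description Length idea, the sketch below estimates how many bits are needed to write down which node pairs are connected. It assumes each possible edge is coded independently with an optimal code for the overall edge probability. This is an assumption made for illustration, not the coding scheme of CXprime, and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def adjacency_bits(n_nodes, n_edges):
    """Approximate bits needed to code all node pairs of an undirected graph."""
    pairs = n_nodes * (n_nodes - 1) // 2
    p = n_edges / pairs  # fraction of node pairs that are connected
    return pairs * binary_entropy(p)


print(adjacency_bits(4, 3))     # half of the 6 pairs are edges
print(adjacency_bits(5, 10))    # full mesh on 5 nodes
print(adjacency_bits(100, 50))  # sparse graph
```

Under this model a very regular graph (all pairs connected, or almost none) is cheap to describe, which is why regular structure allows compression.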
Three different ways of coding
- edge
- hub (or star)
- mesh (or triangle)
Coding Example Hub
Edge coding: G={(A,B);(A,C);(A,D);(A,E);(A,F)}
Hub coding: G={HUB(A|B,C,D,E,F)}
Coding Example Mesh
Edge coding: G={(A,B);(A,C);(A,D);(A,E);(B,C);(B,D);(B,E);(C,D);(C,E);(D,E)}
Hub coding: G={HUB(A|B,C,D,E);HUB(B|C,D,E);HUB(C|D,E);HUB(D|E)}
Mesh coding: G={M(A,B,C,D,E)}
Coding Example Hub, coded as meshes
Edge coding: G={(A,B);(A,C);(A,D);(A,E);(A,F)}
Hub coding: G={HUB(A|B,C,D,E,F)}
Mesh coding: G={M(A,B);M(A,C);M(A,D);M(A,E);M(A,F)}
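The coding examples can be replayed in a short Python sketch. It counts the node symbols written in each coding as a crude stand-in for code length; this counting rule, the grouping of hub edges by their smaller endpoint, and all function names are assumptions for illustration and not the cost function from the paper.

```python
from itertools import combinations


def edge_coding(edges):
    """Plain edge list: one (u, v) entry per edge."""
    return sorted({tuple(sorted(e)) for e in edges})


def hub_coding(edges):
    """One (center, spoke, ...) entry per hub; edges are grouped by their smaller endpoint."""
    spokes = {}
    for u, v in edge_coding(edges):
        spokes.setdefault(u, []).append(v)
    return [(center, *nbrs) for center, nbrs in sorted(spokes.items())]


def mesh_coding(edges):
    """One entry with all nodes if the graph is a full mesh, else one 2-node mesh per edge."""
    pairs = edge_coding(edges)
    nodes = sorted({n for pair in pairs for n in pair})
    if pairs == list(combinations(nodes, 2)):
        return [tuple(nodes)]
    return pairs


def symbol_counts(edges):
    """Number of node symbols written by each of the three codings."""
    codings = {"edge": edge_coding, "hub": hub_coding, "mesh": mesh_coding}
    return {name: sum(len(entry) for entry in code(edges)) for name, code in codings.items()}


star = [("A", x) for x in "BCDEF"]
clique = list(combinations("ABCDE", 2))
for graph in (star, clique):
    counts = symbol_counts(graph)
    print(counts, "shortest:", min(counts, key=counts.get))
```

For the star the hub coding is shortest (6 symbols against 10), and for the full mesh the mesh coding is shortest (5 symbols against 14 for hubs and 20 for edges).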
Outcomes 1
After coding the graph once with the star coding and once with the
triangle coding, you can compare the two code lengths. The shorter
coding shows which basic structure is more common in the graph.
All possible connections of three nodes
Outcomes 2
If you always use the shortest of the three possible codings,
you get an overall minimal coding of the graph. The graph is
then clustered into areas of hubs and areas of triangles.
Critique
- No example of what the coding actually looks like
- The given probabilities are not reproducible
Summary
The results reported in the paper are very good. The compression rate is extremely high compared to other graph compression algorithms, and the clustering results look convincing.
Thanks for your attention
[1] J. Feng, X. He, N. Hubig, C. Böhm and C. Plant, “Compression-based Graph Mining Exploiting Structure Primitives,” Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 181–190, IEEE, 2013.
[2] T. Schank and D. Wagner, “Approximating clustering coefficient and transitivity,” J. Graph Algorithms Appl., vol. 9, no. 2, pp. 265–275, 2005.
[3] J. Rissanen, “An introduction to the MDL principle,” Helsinki Institute for Information Technology, Tech. Rep., 2005.
[4] Python, Pyplot, Instagram API