Compression-based Graph Mining Exploiting Structure Primitives

Post on 11-Feb-2017

Seminar: Exploratory Data Analysis

Werner Hoffmann

19.06.2015

Jing Feng, Xiao He, Nina Hubig, Christian Böhm and Claudia Plant

Outline

- What?
- Why?
- How?
- Conclusion!

Context [1]

Graphs:
- unweighted
- undirected
- modelled as adjacency matrix
- sparse
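As a small illustration of this graph model, the following sketch builds the adjacency matrix of an unweighted, undirected graph and measures how sparse it is. The helper names `adjacency_matrix` and `density` and the four-node example are assumptions made for this sketch; they do not come from the paper.

```python
def adjacency_matrix(nodes, edges):
    """Return the 0/1 adjacency matrix of an unweighted, undirected graph."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        matrix[i][j] = 1  # unweighted: an entry is either 0 or 1
        matrix[j][i] = 1  # undirected: the matrix is symmetric
    return matrix


def density(matrix):
    """Fraction of possible node pairs that are connected (upper triangle only)."""
    n = len(matrix)
    ones = sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))
    return ones / (n * (n - 1) / 2)


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C")]
matrix = adjacency_matrix(nodes, edges)
for row in matrix:
    print(row)
print("density:", density(matrix))
```

A sparse graph is one where this density is close to zero, i.e. the matrix consists almost entirely of zeros.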

Social Media Data

Is Facebook sparse?
- 1.4 x 10^9 nodes ¹
- on average 340 friends ² per node
- about 238 x 10^9 edges (1.4 x 10^9 x 340 / 2, because every friendship connects two nodes)
- possible edges: about 0.98 x 10^18 (n(n-1)/2)
=> only about 0.000024% of all possible edges exist

Yes, Facebook is very sparse.

¹ https://en.wikipedia.org/wiki/Facebook
² http://www.statista.com/statistics/232499/americans-who-use-social-networking-sites-several-times-per-day/
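The sparsity estimate can be recomputed from the two input figures (1.4 x 10^9 nodes, 340 friends per node on average). This is a minimal sketch that counts each friendship once and takes n(n-1)/2 as the number of possible undirected edges. The function name `undirected_density` is an assumption made for this illustration.

```python
def undirected_density(num_nodes, average_degree):
    """Estimate edge count, possible edge count and density of an undirected graph."""
    # Every edge contributes to the degree of two nodes, hence the division by 2.
    edges = num_nodes * average_degree / 2
    possible = num_nodes * (num_nodes - 1) / 2
    return edges, possible, edges / possible


edges, possible, density = undirected_density(1.4e9, 340)
print(f"edges    ~ {edges:.3g}")
print(f"possible ~ {possible:.3g}")
print(f"density  ~ {density:.3g} ({density * 100:.2g}% of all possible edges)")
```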

Werner's Instagram network

51 friends, 13 edges [4]

Goal

Find values for the transitivity [2] and the hubness of a graph.
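For the transitivity part of this goal, a common definition is three times the number of triangles divided by the number of connected triples (paths of length two). The sketch below computes this directly from an edge list. It is a plain illustration of that definition, not the compression-based measure of the paper, and the function name `transitivity` is chosen for this example.

```python
from itertools import combinations


def transitivity(edges):
    """Return 3 * triangles / connected triples for an undirected edge list."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    triples = 0  # paths of length two, counted once per centre node
    closed = 0   # triples whose two end nodes are also connected
    for centre, adjacent in neighbours.items():
        for a, b in combinations(adjacent, 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1
    # Each triangle is counted three times in `closed`, once per centre node.
    return closed / triples if triples else 0.0


star = [("A", leaf) for leaf in "BCDEF"]
clique = list(combinations("ABCDE", 2))
print("star:  ", transitivity(star))
print("clique:", transitivity(clique))
```

A pure star contains no triangle, while in a fully connected graph every triple is closed, so the two examples mark the two ends of the scale.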

What is the benefit of knowing the structure of a graph?

- deeper insights into the graph

- lossless compression is possible

- link prediction

- number of clusters

- graph partitioning

Basic regular substructures

- Triangles: transitivity
- Stars: hubness

Characteristics of CXprime (Compression-based eXploiting Primitives)

- based on the Minimum Description Length (MDL) principle [3] ¹
- no input parameters (unsupervised)
- clustering is k-means-like

¹ https://en.wikipedia.org/wiki/Minimum_description_length
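As background, the MDL principle [3] in its general two-part form prefers the model that gives the shortest total description of the data. The symbols M (model), D (data) and L (code length in bits) are generic notation and do not reproduce the specific coding scheme of CXprime.

```latex
M^{*} = \arg\min_{M} \bigl( L(M) + L(D \mid M) \bigr)
```

Here L(M) is the cost of describing the model itself and L(D | M) is the cost of describing the data once the model is known. For a graph, D would be the adjacency matrix.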

Three different ways of coding

- edge

- hub (or star)

- mesh (or triangle)

Coding Example Hub

Edge coding:
G={(A,B);(A,C);(A,D);(A,E);(A,F)}

Hub coding:
G={HUB(A|B,C,D,E,F)}
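The step from an edge list to a hub notation can be sketched with a simple greedy procedure: repeatedly pick the node with the most remaining edges and group those edges under it. This greedy choice, the alphabetical tie-breaking and the helper names `hub_coding` and `format_hubs` are assumptions for illustration; the paper may select hubs differently. The sketch assumes a graph without self-loops.

```python
def hub_coding(edges):
    """Greedily group undirected edges into (hub, spokes) pairs."""
    remaining = {frozenset(edge) for edge in edges}
    hubs = []
    while remaining:
        degree = {}
        for edge in remaining:
            for node in edge:
                degree[node] = degree.get(node, 0) + 1
        # Highest remaining degree first; ties are broken alphabetically.
        hub = min(degree, key=lambda node: (-degree[node], node))
        spokes = sorted(next(iter(edge - {hub})) for edge in remaining if hub in edge)
        hubs.append((hub, spokes))
        remaining = {edge for edge in remaining if hub not in edge}
    return hubs


def format_hubs(hubs):
    """Render the hubs in the G={HUB(...)} notation used on the slides."""
    return "G={" + ";".join(f"HUB({hub}|{','.join(spokes)})" for hub, spokes in hubs) + "}"


star = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F")]
print(format_hubs(hub_coding(star)))
```

Applied to a fully connected graph on A to E, the same procedure yields a chain of shrinking hubs, one per node except the last.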

Coding Example Mesh

Edge coding:
G={(A,B);(A,C);(A,D);(A,E);(B,C);(B,D);(B,E);(C,D);(C,E);(D,E)}

Hub coding:
G={HUB(A|B,C,D,E);HUB(B|C,D,E);HUB(C|D,E);HUB(D|E)}

Mesh coding:
G={M(A,B,C,D,E)}
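The mesh notation is lossless because a fully connected node set determines all of its edges. The sketch below checks that a node set really is fully connected before writing it as a mesh, and expands a mesh back into its edge list. The helper names `is_mesh`, `mesh_coding` and `expand_mesh` are hypothetical and only serve this example.

```python
from itertools import combinations


def is_mesh(nodes, edges):
    """True if every pair of the given nodes is connected by an edge."""
    present = {frozenset(edge) for edge in edges}
    return all(frozenset(pair) in present for pair in combinations(nodes, 2))


def mesh_coding(nodes, edges):
    """Write a fully connected node set in the G={M(...)} notation."""
    if not is_mesh(nodes, edges):
        raise ValueError("node set is not fully connected")
    return "G={M(" + ",".join(sorted(nodes)) + ")}"


def expand_mesh(nodes):
    """Recover the edge list that a mesh over the given nodes stands for."""
    return list(combinations(sorted(nodes), 2))


edges = expand_mesh("ABCDE")
print(len(edges), "edges:", edges)
print(mesh_coding("ABCDE", edges))
```

Removing a single edge from the list makes `is_mesh` return False, in which case the node set cannot be written as one mesh.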

Coding Example Hub

Edge coding:
G={(A,B);(A,C);(A,D);(A,E);(A,F)}

Hub coding:
G={HUB(A|B,C,D,E,F)}

Mesh coding:
G={M(A,B);M(A,C);M(A,D);M(A,E);M(A,F)}

Outcomes 1

After coding the graph once with star coding and once with triangle coding, you can see which coding is smaller and therefore which basic structure is more common.

All possible connections of three nodes
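The figure for this slide is not part of the transcript. As a stand-in, the following sketch enumerates every way three labelled nodes can be connected and groups the results by number of edges. The labels in `LABELS` are descriptive names chosen for this sketch, not terminology taken from the paper.

```python
from itertools import combinations, product

NODES = ("A", "B", "C")
PAIRS = list(combinations(NODES, 2))  # (A,B), (A,C), (B,C)

# Descriptive labels chosen for this sketch, keyed by number of edges.
LABELS = {0: "no connection", 1: "single edge", 2: "open triple (star)", 3: "triangle"}


def three_node_configurations():
    """Group all 2^3 edge subsets on three labelled nodes by their edge count."""
    by_edge_count = {}
    for present in product((0, 1), repeat=len(PAIRS)):
        edges = [pair for pair, keep in zip(PAIRS, present) if keep]
        by_edge_count.setdefault(len(edges), []).append(edges)
    return by_edge_count


configs = three_node_configurations()
for count, graphs in sorted(configs.items()):
    print(f"{count} edges, {LABELS[count]}: {len(graphs)} labelled variant(s)")
```

The two-edge case is the smallest star and the three-edge case is the triangle, which are the two primitives the coding schemes build on.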

Outcomes 2

If you always use the shortest of the three possible codings, you get an overall minimal coding of the graph. The graph is then clustered into areas of hubs and areas of triangles.
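Choosing the shortest of the three codings can be illustrated with a toy cost model that charges one unit per node identifier written down. This cost model is an assumption made purely for illustration; the paper derives its code lengths from MDL, which is not reproduced here. The star and clique are the two example graphs from the coding slides.

```python
# Toy cost model (an assumption for illustration): one unit per node
# identifier written down. This is NOT the paper's MDL code length.
def edge_cost(edges):
    return 2 * len(edges)


def hub_cost(hubs):
    return sum(1 + len(spokes) for _hub, spokes in hubs)


def mesh_cost(meshes):
    return sum(len(members) for members in meshes)


star = {
    "edge": edge_cost([("A", leaf) for leaf in "BCDEF"]),
    "hub": hub_cost([("A", list("BCDEF"))]),
    "mesh": mesh_cost([("A", leaf) for leaf in "BCDEF"]),
}
clique = {
    "edge": edge_cost([(u, v) for i, u in enumerate("ABCDE") for v in "ABCDE"[i + 1:]]),
    "hub": hub_cost([("A", list("BCDE")), ("B", list("CDE")), ("C", list("DE")), ("D", ["E"])]),
    "mesh": mesh_cost([tuple("ABCDE")]),
}


def cheapest(costs):
    """Return the name of the coding with the smallest toy cost."""
    return min(costs, key=costs.get)


for name, costs in (("star", star), ("clique", clique)):
    print(name, costs, "->", cheapest(costs))
```

Even under this crude measure the star is cheapest as a hub and the clique is cheapest as a mesh, which is the intuition behind assigning each region of the graph to the primitive that compresses it best.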

Criticism

- no example of what the coding actually looks like

- the given probabilities are not reproducible

Summary

The results reported in the paper are very good. The compression rate is much higher than that of other graph compression algorithms, and the clustering results look convincing.

Thanks for your attention

[1] J. Feng, X. He, N. Hubig, C. Böhm and C. Plant, “Compression-based graph mining exploiting structure primitives,” in 2013 IEEE 13th International Conference on Data Mining (ICDM), pp. 181–190, IEEE, 2013.

[2] T. Schank and D. Wagner, “Approximating clustering coefficient and transitivity,” J. Graph Algorithms Appl., vol. 9, no. 2, pp. 265–275, 2005.

[3] J. Rissanen, “An introduction to the MDL principle,” Helsinki Institute for Information Technology, Tech. Rep., 2005.

[4] Python, Pyplot, Instagram API
