Compression-based Graph Mining Exploiting Structure Primitives

Seminar: Exploratory Data Analysis

Werner Hoffmann

19.06.2015

Jing Feng, Xiao He, Nina Hubig, Christian Böhm and Claudia Plant

Outline

- What?

- Why?

- How?

- Conclusion!

Context [1]

Graphs:

- unweighted

- undirected

- modelled as adjacency matrix

- sparse

Social Media Data

Is Facebook sparse?

-> 1.4 x 10^9 nodes ¹

-> on average 340 friends ² per node

-> about 238 x 10^9 edges (1.4 x 10^9 x 340 / 2, since every friendship is shared by two nodes)

-> possible edges: about 0.98 x 10^18

=> only about 0.000024% of all possible edges exist

Yes, Facebook is very sparse.

¹ https://en.wikipedia.org/wiki/Facebook

² http://www.statista.com/statistics/232499/americans-who-use-social-networking-sites-several-times-per-day/
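The sparsity estimate can be recomputed from the two quoted figures. The sketch below assumes a simple undirected graph in which every friendship counts as one edge; the function name `density` is chosen here for illustration only.

```python
def density(num_nodes, avg_degree):
    """Return (edges, possible_edges, density) for a simple undirected graph."""
    # Every edge adds 1 to the degree of two nodes, so halve the degree sum.
    edges = num_nodes * avg_degree / 2
    # Number of unordered node pairs.
    possible = num_nodes * (num_nodes - 1) / 2
    return edges, possible, edges / possible


# Figures quoted on the slide: 1.4 x 10^9 nodes, 340 friends on average.
edges, possible, d = density(1_400_000_000, 340)
print(f"edges: {edges:.3g}  possible: {possible:.3g}  density: {d:.3g}")
```

Under these assumptions only a fraction of roughly 2.4 x 10^-7 of all possible edges exists, which supports the conclusion that the graph is very sparse.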

http://www.twolfanger.de/wp-content/uploads/2013/06/Degree-Network.png

Werner's Instagram network

51 friends, 13 edges [4]

Find values for transitivity [2] and hubness of a graph
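As an illustration of the first quantity, transitivity in the sense of [2] is the fraction of connected triples (paths of length two) whose end points are also connected. The sketch below is a plain-Python version for small graphs; it is not the measure computed by CXprime, and the function name is an assumption made for this example.

```python
from itertools import combinations


def transitivity(edges):
    """Fraction of connected triples that are closed into a triangle."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triples = 0  # paths a - center - b
    closed = 0   # such paths where a and b are also linked
    for center, neighbours in adj.items():
        for a, b in combinations(sorted(neighbours), 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    # Each triangle is counted once per center, i.e. three times,
    # so closed / triples equals 3 * triangles / triples.
    return closed / triples if triples else 0.0


triangle = [("A", "B"), ("B", "C"), ("A", "C")]
star = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F")]
print(transitivity(triangle), transitivity(star))
```

A triangle is fully transitive, while a star contains many connected triples but no closed one.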

What is the benefit of knowing the structure of a graph?

- deeper insights into the graph

- lossless compression is possible

- link prediction

- number of clusters

- graph partitioning

Basic regular substructures

Triangles: transitivity

Stars: hubness

Characteristics of CXprime (Compression-based eXploiting Primitives)

Based on Minimum Description Length (MDL) [3]¹

No input parameters (unsupervised)

Clustering is k-means-like

¹ https://en.wikipedia.org/wiki/Minimum_description_length
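To give a feeling for what "description length" means, the sketch below computes the number of bits an ideal entropy coder would need for a block of binary adjacency-matrix entries. This is a generic MDL-style illustration and an assumption of this write-up, not the coding scheme defined in [1]; the function name `entropy_bits` is hypothetical.

```python
import math


def entropy_bits(ones, cells):
    """Bits needed to code `cells` binary entries of which `ones` are 1."""
    if ones == 0 or ones == cells:
        return 0.0  # a constant block carries no information
    p = ones / cells
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return cells * h


# 10 nodes -> 45 possible undirected edges.
print(entropy_bits(5, 45))   # sparse block: cheap to describe
print(entropy_bits(22, 45))  # half-filled block: close to 45 bits
```

The more regular a block is (almost empty or almost full), the fewer bits it needs, which is why regular substructures help compression.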

Three different ways of coding

- edge

- hub (or star)

- mesh (or triangle)

Coding Example Hub

G={(A,B);(A,C);(A,D);(A,E);(A,F)}

G={HUB(A|B,C,D,E,F)}
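The rewrite from an edge list to the hub notation can be sketched in a few lines. The helper below only mimics the notation used on this slide; `hub_code` is a hypothetical name and the actual encoding in [1] may look different.

```python
def hub_code(edges):
    """Return 'HUB(center|leaves)' if all edges share exactly one node, else None."""
    if not edges:
        return None
    common = set(edges[0])
    for u, v in edges[1:]:
        common &= {u, v}
    if len(common) != 1:
        return None  # no unique center, so this is not a star
    center = common.pop()
    leaves = sorted(v if u == center else u for u, v in edges)
    return f"HUB({center}|{','.join(leaves)})"


star = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F")]
print("G={" + hub_code(star) + "}")
```

The center is written once instead of once per edge, which is where the saving comes from.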

Coding Example Mesh

G={(A,B);(A,C);(A,D);(A,E);(B,C);(B,D);(B,E);(C,D);(C,E);(D,E)}

G={HUB(A|B,C,D,E);HUB(B|C,D,E);HUB(C|D,E);HUB(D|E)}

G={M(A,B,C,D,E)}
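The mesh notation applies when every pair of the listed nodes is connected. A minimal check for that condition, again only mimicking the slide notation and using the hypothetical name `mesh_code`, could look like this:

```python
from itertools import combinations


def mesh_code(edges):
    """Return 'M(nodes)' if the edges form a complete graph on 3+ nodes, else None."""
    nodes = sorted({n for edge in edges for n in edge})
    if len(nodes) < 3:
        return None
    present = {frozenset(edge) for edge in edges}
    required = {frozenset(pair) for pair in combinations(nodes, 2)}
    if present != required:
        return None  # at least one pair of nodes is not connected
    return f"M({','.join(nodes)})"


clique = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"),
          ("B", "D"), ("B", "E"), ("C", "D"), ("C", "E"), ("D", "E")]
print("G={" + mesh_code(clique) + "}")
```

Ten edges collapse into a single primitive that lists each of the five nodes once.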

Coding Example Hub

G={(A,B);(A,C);(A,D);(A,E);(A,F)}

G={HUB(A|B,C,D,E,F)}

G={M(A,B);M(A,C);M(A,D);M(A,E);M(A,F)}

Outcomes 1

After encoding the graph once with the star coding and once with the triangle coding, you can compare the two results: the shorter one shows which basic structure is more common.
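The comparison can be illustrated with the toy notation from the coding examples. The sketch below counts how many node labels each coding has to write down; this label count is a crude stand-in chosen for this example and is not the MDL coding cost used in [1].

```python
import re


def label_count(coding):
    """Count node labels, i.e. tokens that directly follow '(', '|' or ','."""
    return len(re.findall(r"(?<=[(|,])\w+", coding))


# Star graph from the hub example.
star_edges = "G={(A,B);(A,C);(A,D);(A,E);(A,F)}"
star_hub = "G={HUB(A|B,C,D,E,F)}"
star_mesh = "G={M(A,B);M(A,C);M(A,D);M(A,E);M(A,F)}"

# Complete graph from the mesh example.
clique_hub = "G={HUB(A|B,C,D,E);HUB(B|C,D,E);HUB(C|D,E);HUB(D|E)}"
clique_mesh = "G={M(A,B,C,D,E)}"

for name, coding in [("star/edges", star_edges), ("star/hub", star_hub),
                     ("star/mesh", star_mesh), ("clique/hub", clique_hub),
                     ("clique/mesh", clique_mesh)]:
    print(name, label_count(coding))
```

For the star, the hub coding is the shortest; for the complete graph, the mesh coding is. The shorter coding therefore points to the dominant primitive.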

All possible connections of three nodes

Outcomes 2

If you always choose the shortest of the three possible codings, you get an overall minimal description of the graph. The graph is thereby clustered into areas of hubs and areas of triangles.

Criticism

- No example of what the coding actually looks like

- The given probabilities are not reproducible

Summary

The results reported in the paper are very good: the compression rate is much higher than that of the other graph compression algorithms it is compared with, and the clustering results look convincing.

Thanks for your attention

[1] J. Feng, X. He, N. Hubig, C. Böhm, and C. Plant, “Compression-based graph mining exploiting structure primitives,” in Data Mining (ICDM), 2013 IEEE 13th International Conference on. IEEE, 2013, pp. 181–190.

[2] T. Schank and D. Wagner, “Approximating clustering coefficient and transitivity,” J. Graph Algorithms Appl., vol. 9, no. 2, pp. 265–275, 2005.

[3] J. Rissanen, “An introduction to the MDL principle,” Helsinki Institute for Information Technology, Tech. Rep., 2005.

[4] Python, Pyplot, Instagram API
