Date post: | 17-Jul-2015 |
Category: | Technology |
G R A P H T H E O R Y I N P R A C T I S E
D A V I D S I M O N S @ S W A M W I T H T U R T L E S
W H O A M I ?
David Simons
@SwamWithTurtles
github.com/SwamWithTurtles
Technical Lead at Softwire and part-time hacker
Statistician in a past life
M Y P A S S I O N
T O S E E D A T A D O N E R I G H T
W H AT I S D ATA D O N E R I G H T ?
Choosing the right database;
Using the right mathematical and statistical techniques to leverage its power
S Q L
SQL has had 40 years of academic set theory applied to it
Let's do the same with Neo4j!
T O D AY
Concepts in Graph Theory
Theory;
Use Cases;
Implementation Details
Reward: What shape is the internet?
G R A P H T H E O R Y
W H A T I S A G R A P H ?
(Example graph taken from Jim Webber's Dr. Who dataset)
{ (V, E) : V = [n], E ⊆ V^(2) }
Made up of two parts, V and E
V, the vertex set, is a set of n items
E, the edge set, is made up of pairs of elements of V (ordered and not necessarily distinct)
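The definition above can be written down almost literally in code. This is a minimal sketch in plain Python, not Neo4j code; the names `is_graph` and `out_neighbours` and the example edges are made up for illustration.

```python
# A graph as a vertex set V and an edge set E.
# Vertices are labelled 1..n, matching V = [n]. Edges are ordered pairs,
# so (1, 2) and (2, 1) are different edges and (3, 3) is a self-loop.
n = 4
V = set(range(1, n + 1))
E = {(1, 2), (2, 3), (3, 1), (3, 3), (4, 1)}  # illustrative data only


def is_graph(vertices, edges):
    """Check that every edge joins two elements of the vertex set."""
    return all(u in vertices and v in vertices for (u, v) in edges)


def out_neighbours(edges, vertex):
    """Vertices reachable from `vertex` along a single edge."""
    return {v for (u, v) in edges if u == vertex}
```

Because the pairs are ordered and need not be distinct, this models a directed graph that allows self-loops.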
W H A T I S G R A P H I C A L M O D E L L I N G ?
G I V I N G R E A L W O R L D M E A N I N G S T O V A N D E
B R I D G E S A T K Ö N I G S B E R G
V = bits of land
E = bridges
E L E C T I O N D ATA
V = elections, constituencies, years, politicians and parties
E = (e.g.) member of, held in, stood in
W H E R E D O E S N E O 4 J F I T I N ?
Stores both the vertex set and the edge set as first class objects:
Queryable
Can store properties
Typed
W H Y L E A R N T H E T H E O R Y ?
Tells us what we can do
Lets us draw on many years of academic research
Gives us a common language
T H E B R E A K D O W N
C A S E S T U D Y
A G R A P H O F T H E B R I T I S H I S L E S
{ (V, E) : V = Places of Interest, E = Places that are connected }
Places shown on the map: London, Land's End, Oxford, York, St. Ives
P L A N A R I T Y
A planar graph is one that can be drawn on paper without its edges crossing
There are simple theorems (e.g. Kuratowski's theorem) that tell you when a graph is planar
Used for planning construction of roads
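One cheap check that follows from Euler's formula: a simple planar graph with v >= 3 vertices has at most 3v - 6 edges. The sketch below uses that bound as a necessary condition only. It can rule planarity out, but it cannot confirm it, and the function name `may_be_planar` is an illustrative choice.

```python
from itertools import combinations


def may_be_planar(num_vertices, edges):
    """Necessary (not sufficient) planarity test for a simple graph.

    Returns False when the graph has too many edges to be planar.
    True only means the edge count does not rule planarity out.
    """
    # Ignore self-loops and treat (u, v) and (v, u) as the same edge.
    distinct = {frozenset(e) for e in edges if e[0] != e[1]}
    if num_vertices < 3:
        return True
    return len(distinct) <= 3 * num_vertices - 6


k5 = list(combinations(range(5), 2))  # complete graph on 5 vertices
k4 = list(combinations(range(4), 2))  # complete graph on 4 vertices
```

`may_be_planar(5, k5)` is False because 10 edges exceed the bound of 9, while `may_be_planar(4, k4)` is True. The complete bipartite graph K3,3 passes this test even though it is not planar, which is why a full planarity test needs more than edge counting.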
C O N N E C T I V I T Y
A graph is connected if there is a path between any two points
A graph is k-connected if you need to remove at least k vertices to stop it being connected
Used for infrastructure robustness studies
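Both definitions can be checked directly on small graphs. The following is a sketch, assuming an undirected graph given as a vertex collection and a list of pairs; `survives_removal` is a brute-force helper invented here, and it is only practical for small inputs.

```python
from collections import deque
from itertools import combinations


def is_connected(vertices, edges):
    """True if there is a path between every pair of vertices (undirected)."""
    vertices = set(vertices)
    if not vertices:
        return True
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        # Edges touching a vertex outside `vertices` are ignored.
        if u in vertices and v in vertices:
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary vertex
        current = queue.popleft()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen == vertices


def survives_removal(vertices, edges, k):
    """Brute force: is the graph still connected after deleting any k vertices?"""
    vertices = set(vertices)
    return all(
        is_connected(vertices - set(removed), edges)
        for removed in combinations(vertices, k)
    )
```

In the slide's terms, a graph that survives the removal of any k - 1 vertices needs at least k removals to disconnect it. A four-vertex ring, for example, survives any single removal but not every pair of removals.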
S PA N N I N G T R E E
A tree is a connected graph with no cycles (loops)
A spanning tree is a tree within a graph that connects every vertex
Used to ensure resources can flow through a network
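A spanning tree can be found with a breadth-first search, keeping only the edge that first reaches each vertex. This is a sketch under stated assumptions: the graph is undirected, every edge endpoint appears in the vertex list, and the road links between the places are invented for the example.

```python
from collections import deque


def spanning_tree(vertices, edges):
    """Return the edges of a BFS spanning tree, or None if disconnected."""
    vertices = list(vertices)
    if not vertices:
        return []
    adjacency = {v: [] for v in vertices}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    root = vertices[0]
    seen = {root}
    tree = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                tree.append((current, neighbour))  # edge that first reaches it
                queue.append(neighbour)
    return tree if len(seen) == len(vertices) else None


places = ["London", "Oxford", "York", "St. Ives", "Land's End"]
roads = [  # hypothetical connections, for illustration only
    ("London", "Oxford"), ("London", "York"), ("Oxford", "York"),
    ("Oxford", "St. Ives"), ("St. Ives", "Land's End"),
]
tree = spanning_tree(places, roads)
```

A spanning tree on n vertices always has n - 1 edges, so here `tree` keeps four of the five roads.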
G R A P H T H E O R Y
C O L O U R I N G
M A T H E M A T I C I A N S
W E L I K E T H E S I M P L E T H I N G S I N L I F E
M A T H E M A T I C I A N S
C O L O U R I N G I N
Take your graph (V, E)
Vertex Colouring
Assign every vertex a colour such that no two adjacent vertices have the same colour.
T H A T ' S A L L V E R Y W E L L
W H Y ?
O R G A N I S I N G S P O R T S T O U R N A M E N T S
Graph Model
V = all matches that must be played
E = two matches that share a team
Two vertices the same colour => they can be played simultaneously
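The graph model above can be sketched with a greedy colouring, where each colour is a time slot. Greedy colouring is a stand-in technique chosen here for brevity; it always gives a valid schedule but not necessarily the fewest slots. The team names and the helper `greedy_colouring` are illustrative.

```python
from itertools import combinations, count


def greedy_colouring(vertices, edges):
    """Give each vertex the smallest colour not used by a coloured neighbour."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    colour = {}
    for vertex in vertices:
        taken = {colour[n] for n in adjacency[vertex] if n in colour}
        colour[vertex] = next(c for c in count() if c not in taken)
    return colour


teams = ["A", "B", "C", "D"]
matches = list(combinations(teams, 2))  # V: every pairing must be played
conflicts = [                           # E: two matches share a team
    (m1, m2) for m1, m2 in combinations(matches, 2) if set(m1) & set(m2)
]
slots = greedy_colouring(matches, conflicts)
```

Matches with the same value in `slots` involve four different teams and can be played at the same time.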
O T H E R U S E S
Mobile Phone Tower frequency assignment
V = mobile phone towers
E = towers so close their waves will interfere
Colours = frequencies
Solving Sudokus
V = Squares on a Sudoku grid
E = Knowledge that they must be different numbers
Colours = numbers 1 to 9
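The Sudoku model can be built explicitly: 81 vertices, with an edge wherever two squares share a row, column or 3x3 box. This sketch only constructs the graph and checks a completed grid against it; it is not a solver, and the function names are illustrative.

```python
from itertools import combinations


def sudoku_graph():
    """Vertices are the 81 squares; edges join squares that must differ."""
    squares = [(row, col) for row in range(9) for col in range(9)]
    edges = set()
    for a, b in combinations(squares, 2):
        same_row = a[0] == b[0]
        same_col = a[1] == b[1]
        same_box = (a[0] // 3, a[1] // 3) == (b[0] // 3, b[1] // 3)
        if same_row or same_col or same_box:
            edges.add((a, b))
    return squares, edges


def is_valid_fill(grid, edges):
    """`grid` maps each square to a number 1..9, i.e. a 9-colouring."""
    return all(grid[a] != grid[b] for a, b in edges)
```

Each square is adjacent to 20 others (8 in its row, 8 in its column and 4 more in its box), giving 810 edges in total. Solving a puzzle then means extending a partial colouring to a full one that satisfies `is_valid_fill`.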
Avoiding Deadlocks in Neo4j on Z-Platform
http://watch.neo4j.org/video/74870401
N O J AVA F R A M E W O R K Y E T !
G R A P H T H E O R Y
R A N D O M G R A P H S
B U T W A I T
R A N D O M N E S S S E E M S S C A R Y
It can be!
Someone should do a talk about that
https://www.youtube.com/watch?v=rV9dqR0P0lQ
A graph with a fixed number of vertices, whose edges are generated non-deterministically
R A N D O M G R A P H S S T I L L H A V E U S E C A S E S
U S E C A S E S
S T U B B E D T E S T D A T A
Suppose you have a method that coloured the vertices of a graph
How could you test that?
Stubbed data set -> apply method -> assert that:
* every node has a colour
* no two adjacent nodes share a colour
Randomly generated data set -> apply method -> assert that:
* every node has a colour
* no two adjacent nodes share a colour
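The random-data version of the test might look like the sketch below. `colour_vertices` is only a stand-in for whatever method is under test (here a simple greedy colouring), and the graph sizes, trial count and seed are arbitrary assumptions.

```python
import random
from itertools import combinations, count


def colour_vertices(vertices, edges):
    """Stand-in for the method under test: a simple greedy colouring."""
    colour = {}
    for vertex in vertices:
        taken = {colour[b] for a, b in edges if a == vertex and b in colour}
        taken |= {colour[a] for a, b in edges if b == vertex and a in colour}
        colour[vertex] = next(c for c in count() if c not in taken)
    return colour


def random_graph(n, p, rng):
    """n vertices; each possible edge is included with probability p."""
    vertices = list(range(n))
    edges = [pair for pair in combinations(vertices, 2) if rng.random() < p]
    return vertices, edges


def is_proper_colouring(colour, vertices, edges):
    """The two assertions from the slide, as one predicate."""
    every_node_coloured = set(colour) == set(vertices)
    return every_node_coloured and all(colour[a] != colour[b] for a, b in edges)


def passes_on_random_graphs(method, trials=200, seed=0):
    """Apply `method` to many random graphs and check each result."""
    rng = random.Random(seed)
    for _ in range(trials):
        vertices, edges = random_graph(rng.randint(1, 12), rng.random(), rng)
        if not is_proper_colouring(method(vertices, edges), vertices, edges):
            return False
    return True
```

Fixing the seed keeps a failing run reproducible, which matters once the test data is no longer hand-written.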
U S E C A S E S
S I M U L A T I O N A L G O R I T H M S
"Solving a problem by performing a large number of trial runs and inferring a solution from the collective results of the trial runs."
- N A S D A Q . C O M
W H Y S I M U L AT I O N ?
Modelling underlying randomness
Underlying question is impossible (or hard) to solve
Trying to model something of which we cannot have full knowledge
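As an illustration of the idea, the sketch below estimates how likely a random graph is to be connected by generating many of them and counting. The question, the function names and the default trial count are assumptions chosen for the example; the random graph used is one where every possible edge appears independently with probability p.

```python
import random
from itertools import combinations


def random_graph_is_connected(n, p, rng):
    """One trial run: build a random graph and test connectivity."""
    parent = list(range(n))  # union-find forest over the vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in combinations(range(n), 2):
        if rng.random() < p:  # this edge exists in the trial graph
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v
                components -= 1
    return components == 1


def estimate_connected_probability(n, p, trials=10_000, seed=0):
    """Fraction of trial runs in which the random graph was connected."""
    rng = random.Random(seed)
    hits = sum(random_graph_is_connected(n, p, rng) for _ in range(trials))
    return hits / trials
```

The answer is an estimate whose accuracy improves with the number of trials, which is the trade-off a simulation makes in exchange for not needing an exact formula.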
A N D
It's possible to use randomness and always be correct
cf. Probabilistic Combinatorics by Paul Erdős
H O W C A N W E A C C O M P L I S H I T I N N E O 4 J ?
I N T H E O R Y
D I Y
I N P R A C T I S E
G R A P H A W A R E
#1 Neo4j Consultancy
Open-sourced a lot of projects under GPL3 including:
TimeTree
Reco
Algorithms
A graph with a fixed number of vertices, whose edges are generated non-deterministically
E R D Ő S - R É N Y I
Take a graph with n vertices;
For each pair of vertices, randomly connect them with probability p
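The two steps above translate almost directly into code. This is a plain-Python sketch, not the GraphAware implementation; the function name `erdos_renyi` and the optional `seed` parameter are illustrative.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Return (vertices, edges) of a random graph on n vertices.

    Each of the n * (n - 1) / 2 possible undirected edges is included
    independently with probability p.
    """
    rng = random.Random(seed)
    vertices = list(range(n))
    edges = [(u, v) for u, v in combinations(vertices, 2) if rng.random() < p]
    return vertices, edges
```

Every vertex gets roughly the same number of neighbours, about p * (n - 1) on average, so this model does not produce a few heavily connected hubs.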
B U T I W A N T T O M O D E L D A T A A B O U T K E V I N B A C O N
B U T I W A N T T O M O D E L D A T A A B O U T S P R E A D O F H I V
B U T I W A N T T O M O D E L D A T A A B O U T S C A L E F R E E N E T W O R K S
S C A L E F R E E N E T W O R K S
As the system grows, we have:
A small number of highly connected hubs
A large number of sparsely connected nodes
Example | Hubs | Sparse nodes
Actor co-workers | Blockbuster stars, like Kevin Bacon | Drama college graduates #1828, #1829, #1830
Spread of HIV | Patriarchs | Less privileged society members
Chemical reactions | Catalysts | Inert chemicals
S C A L E F R E E N E T W O R K S
B A R A B Á S I - A L B E R T
Take a graph with 2 (connected) vertices
Add vertices one at a time, so that each new vertex is more likely to attach to a node that already has many connections
Repeat until you have n vertices
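A minimal sketch of this growth process, assuming each new vertex attaches to exactly one existing vertex, chosen with probability proportional to its current degree. The function name and `seed` parameter are illustrative, and this is not the GraphAware implementation.

```python
import random


def barabasi_albert(n, seed=None):
    """Grow a graph to n vertices by preferential attachment.

    Starts from two connected vertices. Each new vertex attaches to one
    existing vertex chosen with probability proportional to its degree.
    """
    if n < 2:
        raise ValueError("need at least the two starting vertices")
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Every vertex appears in this list once per edge it touches, so a
    # uniform draw from it is a degree-proportional draw over vertices.
    endpoints = [0, 1]
    for new_vertex in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new_vertex, target))
        endpoints.extend([new_vertex, target])
    return list(range(n)), edges
```

Early, well-connected vertices keep attracting new neighbours, which is what produces a small number of hubs alongside many sparsely connected nodes.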
R E M E M B E R Y O U R R E W A R D
B U T I W A N T T O M O D E L D A T A A B O U T T H E I N T E R N E T
O V E R V I E W
Looking at graph theory can give us a common language
Utilising techniques means we don't have to solve problems from scratch each time (e.g. colouring, simulation)
The internet looks like Kevin Bacon's career
A N Y Q U E S T I O N S ?
@ S W A M W I T H T U R T L E S
S W A M W I T H T U R T L E S . C O M