Ghislain Fourny
Big Data Fall 2019
14. Graph Databases
pinkyone / 123RF Stock Photo, tovovan / 123RF Stock Photo
Why graph databases?
The NoSQL paradigms
Key-value stores
Triple stores
Column stores
Document stores
Relational databases...
[Figure: two entities connected by a relationship]
... have expensive joins!
... are not that efficient at relationships!
We already know how to partly solve this, though:
3NF → 0NF (denormalization)
... but it has its limits, too!
Traversals...
... translate into multiple joins!
Reverse traversals...
... need even more indices!
What if links were more "direct"?
Index-free adjacency
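To make the contrast concrete, here is a minimal Python sketch (all names and data are hypothetical). In the relational style, finding a node's neighbours means searching an edge table; with index-free adjacency, each node holds direct references to its neighbours, so a traversal step just follows a pointer.

```python
from dataclasses import dataclass, field

# Relational style: edges live in a separate table of (source, target) rows.
edge_table = [("a", "b"), ("b", "c"), ("a", "c")]

def neighbours_via_table(node_id):
    # Every traversal step searches the edge table (a join, in SQL terms).
    return [t for (s, t) in edge_table if s == node_id]

@dataclass(eq=False)
class Node:
    name: str
    # Index-free adjacency: the node stores direct references to its neighbours.
    out: list = field(default_factory=list)

a, b, c = Node("a"), Node("b"), Node("c")
a.out += [b, c]
b.out.append(c)

def two_hops(node):
    # Follow references twice; no table or index is consulted.
    return [n2.name for n1 in node.out for n2 in n1.out]

print(neighbours_via_table("a"))  # ['b', 'c']
print(two_hops(a))                # ['c']
```

The cost of a step in the second version depends only on the number of neighbours, not on the total number of edges stored.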
Graphs
Graphs: ingredients
Nodes, Edges
Graphs: nodes
Graphs: edges
Graphs: directed graph
Graphs: undirected graph
Graph representation: adjacency list
[Figure: a directed graph with nodes A, B, C]
Node | Edges
A    | [ ]
B    | [ A, C ]
C    | [ A ]
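A small Python sketch of the adjacency list above, assuming each entry lists the edges leaving that node. Following edges forward is one lookup; following them backwards needs either a full scan or a second, reversed list.

```python
# Adjacency list from the slide (assumption: entries are outgoing edges).
adjacency = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
}

def successors(node):
    # Forward step: a single dictionary lookup.
    return adjacency[node]

def predecessors(node):
    # Backward step: without a second structure, every list must be scanned.
    return [src for src, targets in adjacency.items() if node in targets]

def reverse(adj):
    # Building the reversed adjacency list once makes backward steps cheap too.
    rev = {n: [] for n in adj}
    for src, targets in adj.items():
        for t in targets:
            rev[t].append(src)
    return rev

print(successors("B"))     # ['A', 'C']
print(predecessors("A"))   # ['B', 'C']
print(reverse(adjacency))  # {'A': ['B', 'C'], 'B': [], 'C': ['B']}
```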
Graph representation: adjacency matrix
[Figure: a directed graph with nodes A, B, C]
  | A B C
A | 0 1 1
B | 0 0 0
C | 0 1 0
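A minimal Python sketch of building an adjacency matrix from an edge list, for a hypothetical graph on nodes P, Q, R, assuming the convention that cell (i, j) is 1 when there is an edge from node i to node j. Testing for one edge is a single cell lookup, while listing a node's neighbours scans a whole row, and the matrix always takes n × n cells.

```python
nodes = ["P", "Q", "R"]                         # hypothetical graph
edges = [("P", "Q"), ("P", "R"), ("R", "Q")]

index = {name: i for i, name in enumerate(nodes)}

# matrix[i][j] == 1 iff there is an edge from nodes[i] to nodes[j]
matrix = [[0] * len(nodes) for _ in nodes]
for src, dst in edges:
    matrix[index[src]][index[dst]] = 1

def has_edge(src, dst):
    # Constant-time edge test: one cell lookup.
    return matrix[index[src]][index[dst]] == 1

def successors(src):
    # Listing neighbours scans a full row, even when it is mostly zeros.
    row = matrix[index[src]]
    return [nodes[j] for j, bit in enumerate(row) if bit]

for name, row in zip(nodes, matrix):
    print(name, *row)
# P 0 1 1
# Q 0 0 0
# R 0 1 0
```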
Graph representation: incidence matrix
[Figure: a directed graph with nodes A, B, C and edges 1, 2, 3]
Nodes \ Edges |  1   2   3
A             |  1   1   0
B             | -1   0  -1
C             |  0  -1   1
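A Python sketch of an incidence matrix (one row per node, one column per edge) for the same kind of small hypothetical graph. Sign conventions vary; this sketch assumes +1 marks the node an edge leaves and -1 the node it enters, and it does not handle self-loops.

```python
nodes = ["P", "Q", "R"]                         # hypothetical graph
edges = [("P", "Q"), ("P", "R"), ("R", "Q")]    # edge k is edges[k]

# Assumed convention: +1 where the edge leaves, -1 where it enters.
# (A self-loop would overwrite its own +1; not handled here.)
incidence = [[0] * len(edges) for _ in nodes]
for k, (src, dst) in enumerate(edges):
    incidence[nodes.index(src)][k] = 1
    incidence[nodes.index(dst)][k] = -1

def endpoints(k):
    # Recover edge k by scanning its column for the +1 and the -1.
    col = [row[k] for row in incidence]
    return nodes[col.index(1)], nodes[col.index(-1)]

for name, row in zip(nodes, incidence):
    print(name, *row)
# P 1 1 0
# Q -1 0 -1
# R 0 -1 1
```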
Labeled property graphs: ingredients
Nodes, Edges, Properties, Labels
Property graph
Properties
Name: Einstein
First name: Albert
Profession: Physicist
Labeled graph
Labels on nodes
Names (labels) on relationships
Labeled property graph
Node with properties and label
Label: Person
Name: Einstein
First name: Albert
Profession: Physicist
In Switzerland
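One way to picture the ingredients in code: a minimal Python sketch of a labeled property graph, where nodes carry a set of labels plus key/value properties and relationships carry a name (type), endpoints and their own properties. The class names, the Country label and the IN relationship name are hypothetical; "In Switzerland" is modelled here as a relationship, which is only one possible reading.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # key/value pairs

@dataclass
class Relationship:
    rel_type: str                                    # the relationship's name
    source: Node
    target: Node
    properties: dict = field(default_factory=dict)

def has_label(node, label):
    return label in node.labels

einstein = Node({"Person"}, {"Name": "Einstein",
                             "First name": "Albert",
                             "Profession": "Physicist"})
# Hypothetical: "In Switzerland" modelled as a relationship to a Country node.
switzerland = Node({"Country"}, {"Name": "Switzerland"})
in_switzerland = Relationship("IN", einstein, switzerland)
```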
Graph database
Graph databases: families
Property graphs
Triple stores (RDF)
Graph databases: native or not
Native graph database vs. graph stored in an RDBMS, a document store, ...
Source | Target | Name
Alice  | Bob    | knows
Eve    | Bob    | eavesdrop
Eve    | Alice  | eavesdrop
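In a non-native store, each traversal step becomes a lookup in an edge table like the one above. A minimal Python sketch using that table's rows, where a linear scan stands in for what an RDBMS would do with a join (possibly helped by indices); the function names are hypothetical.

```python
# The edge table from the slide, stored as plain rows (source, target, name).
edges = [
    ("Alice", "Bob", "knows"),
    ("Eve", "Bob", "eavesdrop"),
    ("Eve", "Alice", "eavesdrop"),
]

def targets(source, name):
    # Forward step: find rows with a given source and edge name.
    return [t for s, t, n in edges if s == source and n == name]

def sources(target, name):
    # Reverse step: the same search, filtering on the other column.
    return [s for s, t, n in edges if t == target and n == name]

def two_hop(start, first, second):
    # A two-step traversal is a self-join of the table with itself.
    return [t2 for mid in targets(start, first) for t2 in targets(mid, second)]

print(sources("Bob", "eavesdrop"))           # ['Eve']
print(two_hop("Eve", "eavesdrop", "knows"))  # ['Bob']
```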
RDF
Triple-based graph
RDF: one triple
Subject: ETH Zürich
Property: Is located in
Object: Switzerland
IRI
http://www.ethz.ch/#school
http://www.example.com/Switzerland
Literal (includes XML Schema types!)
Examples: Foo, 2012-12-16, 3.1415926535
Blank Node
Example: ETH Zürich is built on ground, a blank node (a resource without an IRI), which is a subset of Switzerland
What can appear where?
           | Subject | Property | Object
IRI        | yes     | yes      | yes
Literal    | no      | no       | yes
Blank node | yes     | no       | yes
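The table above can be read as a well-formedness check on triples. A minimal Python sketch, with hypothetical classes standing in for the three kinds of RDF terms:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

def is_valid_triple(s, p, o):
    # Subject: IRI or blank node. Property: IRI only. Object: any term.
    return (isinstance(s, (IRI, BlankNode))
            and isinstance(p, IRI)
            and isinstance(o, (IRI, Literal, BlankNode)))

eth = IRI("http://www.ethz.ch/#self")
located = IRI("http://www.example.com/geography#isLocatedIn")
ch = IRI("http://www.example.com/Switzerland")

print(is_valid_triple(eth, located, ch))             # True
print(is_valid_triple(Literal("Foo"), located, ch))  # False: literal subject
print(is_valid_triple(eth, BlankNode("b0"), ch))     # False: blank property
```

A generalized graph would simply drop the two position-specific checks.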
Generalized Graphs
           | Subject | Property | Object
IRI        | yes     | yes      | yes
Literal    | yes     | yes      | yes
Blank node | yes     | yes      | yes
Syntax
RDF Formats
- RDF/XML
- Turtle
- JSON-LD
- RDFa
- N-Triples
RDF/XML
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:geo="http://www.example.com/geography#">
  <rdf:Description rdf:about="http://www.ethz.ch/#self">
    <geo:isLocatedIn
        rdf:resource="http://www.example.com/Switzerland"/>
    <geo:population>20000</geo:population>
  </rdf:Description>
</rdf:RDF>
RDF/XML: subject, property, object
Subject: the rdf:about attribute, http://www.ethz.ch/#self
Property: the element name, e.g. geo:isLocatedIn
Object: the rdf:resource attribute (an IRI, here http://www.example.com/Switzerland) or the element's text content (a literal, as for geo:population)
geo:isLocatedIn stands for http://www.example.com/geography#isLocatedIn
RDF/XML
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:geo="http://www.example.com/geography#">
  <rdf:Description rdf:about="http://www.ethz.ch/#self">
    <geo:isLocatedIn
        rdf:resource="http://www.example.com/Switzerland"/>
    <rdf:type
        rdf:resource="http://www.example.com/geography#school"/>
    <geo:population>8000000</geo:population>
  </rdf:Description>
</rdf:RDF>
RDF/XML
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:geo="http://www.example.com/geography#">
  <geo:school rdf:about="http://www.ethz.ch/#self">
    <geo:isLocatedIn
        rdf:resource="http://www.example.com/Switzerland"/>
    <geo:population>20000</geo:population>
  </geo:school>
</rdf:RDF>
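To see which triples these RDF/XML snippets denote, here is a deliberately simplified Python reader built on the standard `xml.etree.ElementTree` module. It only handles the shapes used on these slides (one level of property elements, objects given by `rdf:resource` or by text), and it treats a typed node element such as `geo:school` as an extra `rdf:type` triple. A real RDF/XML parser covers far more of the syntax.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def iri(tag):
    # ElementTree writes namespaced names as "{namespace}local".
    ns, local = tag[1:].split("}")
    return ns + local

def triples(xml_text):
    root = ET.fromstring(xml_text)
    out = []
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        if node.tag != f"{{{RDF}}}Description":
            # Typed node element: the element name is the rdf:type.
            out.append((subject, RDF + "type", iri(node.tag)))
        for prop in node:
            obj = prop.get(f"{{{RDF}}}resource")
            # IRI object if rdf:resource is present, otherwise a literal.
            out.append((subject, iri(prop.tag), obj if obj is not None else prop.text))
    return out

doc = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:geo="http://www.example.com/geography#">
  <geo:school rdf:about="http://www.ethz.ch/#self">
    <geo:isLocatedIn rdf:resource="http://www.example.com/Switzerland"/>
    <geo:population>20000</geo:population>
  </geo:school>
</rdf:RDF>"""

for t in triples(doc):
    print(t)
```

The same function yields the same three triples for the `rdf:Description` form with an explicit `rdf:type` element, which is why the two notations are interchangeable.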
JSON-LD
{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "geo": "http://www.example.com/geography#"
  },
  "@id": "http://www.ethz.ch/#self",
  "@type": "geo:school",
  "geo:isLocatedIn": { "@id": "http://www.example.com/Switzerland" },
  "geo:population": 8000000
}
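A hand-rolled Python sketch of how a flat JSON-LD node like this one maps to triples: compact IRIs such as `geo:population` are expanded through `@context`, `@type` becomes `rdf:type`, values wrapped as `{"@id": ...}` become IRIs and bare JSON values become literals. The embedded document uses `@type` and `{"@id": ...}` so that the type and the location are read as IRIs rather than strings. Only this flat shape is handled; a conforming JSON-LD processor deals with nesting, arrays, `@vocab`, and much more.

```python
import json

doc = json.loads("""{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "geo": "http://www.example.com/geography#"
  },
  "@id": "http://www.ethz.ch/#self",
  "@type": "geo:school",
  "geo:isLocatedIn": { "@id": "http://www.example.com/Switzerland" },
  "geo:population": 8000000
}""")

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def expand(term, context):
    # Expand "prefix:suffix" if the prefix is defined in the context.
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return term

def to_triples(node):
    ctx = node["@context"]
    s = node["@id"]
    out = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            out.append((s, RDF_TYPE, expand(value, ctx)))
        elif isinstance(value, dict) and "@id" in value:
            out.append((s, expand(key, ctx), expand(value["@id"], ctx)))  # IRI
        else:
            out.append((s, expand(key, ctx), value))                      # literal
    return out

for t in to_triples(doc):
    print(t)
```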
Turtle
@prefix geo: <http://www.example.com/geography#> .
@prefix countries: <http://www.example.com/> .
@prefix eth: <http://www.ethz.ch/#> .
eth:self geo:isLocatedIn countries:Switzerland .
eth:self geo:population 8000000 .
Turtle
@prefix geo: <http://www.example.com/geography#> .
@prefix countries: <http://www.example.com/> .
@prefix eth: <http://www.ethz.ch/#> .
eth:self geo:isLocatedIn countries:Switzerland ;
         geo:population 8000000 .
Turtle
@prefix geo: <http://www.example.com/geography#> .
@prefix countries: <http://www.example.com/> .
@prefix eth: <http://www.ethz.ch/#> .
eth:self geo:isLocatedIn countries:Switzerland ,
                         countries:Europe ;
         geo:population 8000000 .
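The `;` (same subject, next predicate) and `,` (same subject and predicate, next object) abbreviations can be unfolded mechanically. A Python sketch that holds the last Turtle example as a nested dictionary (a hypothetical in-memory form, not a Turtle parser), expands the prefixed names and prints one N-Triples line per triple. A bare integer such as 8000000 is an `xsd:integer` literal in Turtle, which N-Triples spells out in full.

```python
PREFIXES = {
    "geo": "http://www.example.com/geography#",
    "countries": "http://www.example.com/",
    "eth": "http://www.ethz.ch/#",
}

def iri(prefixed):
    # Expand a prefixed name like "geo:population" into a full IRI.
    prefix, _, local = prefixed.partition(":")
    return "<" + PREFIXES[prefix] + local + ">"

def term(value):
    # Integers become typed literals; strings are treated as prefixed names.
    if isinstance(value, int):
        return f'"{value}"^^<http://www.w3.org/2001/XMLSchema#integer>'
    return iri(value)

# What "s p1 o1 , o2 ; p2 o3 ." abbreviates: one subject, several
# predicates (";"), and several objects per predicate (",").
description = {
    "eth:self": {
        "geo:isLocatedIn": ["countries:Switzerland", "countries:Europe"],
        "geo:population": [8000000],
    }
}

def to_ntriples(desc):
    lines = []
    for s, preds in desc.items():
        for p, objects in preds.items():
            for o in objects:
                lines.append(f"{iri(s)} {iri(p)} {term(o)} .")
    return lines

for line in to_ntriples(description):
    print(line)
```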
Querying
Querying paradigms
Classical declarative querying, or query by example?
Two languages
Cypher, SPARQL
[Figure: ETH Zürich, Is located in, Switzerland]
Querying labeled property graphs by example
[Figure: a labeled graph whose relationships are named A and B]
Cypher pattern
(alpha)-[:A]->(beta)-[:B]->(gamma)
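Matching a pattern like this amounts to finding every binding of alpha, beta, gamma to nodes such that the named relationships exist. A minimal Python sketch over a hypothetical edge list, extending partial paths one hop at a time. Cypher additionally never matches the same relationship twice within one pattern; that check is omitted here.

```python
# Hypothetical graph: (source, relationship name, target)
edges = [
    ("n1", "A", "n2"),
    ("n2", "B", "n3"),
    ("n2", "B", "n4"),
    ("n4", "A", "n5"),
    ("n5", "A", "n1"),
]

def match_chain(types):
    # Bindings for (x0)-[:types[0]]->(x1)-[:types[1]]->... as tuples of node ids.
    paths = [(s, t) for s, r, t in edges if r == types[0]]
    for rel in types[1:]:
        # Extend each partial path by one matching outgoing relationship.
        paths = [p + (t,) for p in paths
                 for s, r, t in edges if s == p[-1] and r == rel]
    return paths

# (alpha)-[:A]->(beta)-[:B]->(gamma)
print(match_chain(["A", "B"]))  # [('n1', 'n2', 'n3'), ('n1', 'n2', 'n4')]
```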
Cypher pattern: anchoring a label
(alpha)
-[:A]->(beta:yellow)
-[:B]->(gamma)
Cypher pattern: filtering a property
(alpha {name: 'Einstein' })
-[:A]->(beta)
-[:B]->(gamma)
Cypher pattern: anchoring and filtering
(alpha)
-[:A]->(beta)
-[:B]->(gamma:blue {name: 'ETH'})
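Labels and properties inside a node pattern act as filters on the candidate nodes. A Python sketch of `(alpha)-[:A]->(beta)-[:B]->(gamma:blue {name: 'ETH'})` over hypothetical data, where each node stores a set of labels and a property map:

```python
# Hypothetical nodes: id -> (labels, properties)
nodes = {
    "n1": ({"Person"}, {"name": "Einstein"}),
    "n2": ({"yellow"}, {}),
    "n3": ({"blue"}, {"name": "ETH"}),
    "n4": ({"blue"}, {"name": "EPFL"}),
}
edges = [("n1", "A", "n2"), ("n2", "B", "n3"), ("n2", "B", "n4")]

def node_ok(node_id, label=None, props=None):
    # A node pattern (x:label {key: value}) constrains labels and properties.
    labels, properties = nodes[node_id]
    if label is not None and label not in labels:
        return False
    return all(properties.get(k) == v for k, v in (props or {}).items())

def match(gamma_label=None, gamma_props=None):
    # (alpha)-[:A]->(beta)-[:B]->(gamma:gamma_label {gamma_props})
    return [(a, b, c)
            for a, r1, b in edges if r1 == "A"
            for b2, r2, c in edges if r2 == "B" and b2 == b
            if node_ok(c, gamma_label, gamma_props)]

print(match())                         # [('n1', 'n2', 'n3'), ('n1', 'n2', 'n4')]
print(match("blue", {"name": "ETH"}))  # [('n1', 'n2', 'n3')]
```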
Cypher pattern: right to left
(alpha)
-[:A]->(beta)
-[:B]->(gamma)<-[:B]-(delta)
Cypher pattern: variable repetition
(alpha)
-[:A]->(beta)
-[:B]->(gamma)<-[:B]-(delta)
-[:B]->(alpha)
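Reusing the variable alpha at the end of the pattern forces the path to close into a cycle: delta must point back to the very node bound to alpha. A Python sketch over a hypothetical graph, which also shows the right-to-left step `<-[:B]-` (delta is the source of a B relationship into gamma). Cypher's rule against matching one relationship twice is not enforced here.

```python
edges = [  # hypothetical graph: (source, relationship name, target)
    ("a", "A", "b"),
    ("b", "B", "c"),
    ("d", "B", "c"),
    ("d", "B", "a"),
    ("x", "A", "b"),
]

def match_cycle():
    # (alpha)-[:A]->(beta)-[:B]->(gamma)<-[:B]-(delta)-[:B]->(alpha)
    results = []
    for alpha, r1, beta in edges:
        if r1 != "A":
            continue
        for s2, r2, gamma in edges:
            if r2 != "B" or s2 != beta:
                continue
            # "<-[:B]-": delta is the SOURCE of a B edge into gamma.
            for delta, r3, t3 in edges:
                if r3 != "B" or t3 != gamma:
                    continue
                # Same variable name: delta must point back to the SAME alpha.
                if (delta, "B", alpha) in edges:
                    results.append((alpha, beta, gamma, delta))
    return results

print(match_cycle())  # [('a', 'b', 'c', 'd')]
```

Node x also has an A relationship to b, but no B relationship leads back to it, so it does not appear.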
Cypher pattern: variable length path
(alpha)
-[*1..4]->(beta)
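A Python sketch of what `-[*1..4]->` asks for: every node reachable from alpha by a path of one to four relationships, of any name. The graph is hypothetical and the search is breadth-first, level by level. Cypher returns one row per matching path; this sketch only collects the distinct end nodes.

```python
from collections import deque

edges = {  # hypothetical graph: node -> successors (any relationship name)
    "a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": [],
}

def reachable(start, min_hops=1, max_hops=4):
    # End nodes of paths with min_hops..max_hops relationships from start.
    found = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth >= min_hops:
            found.add(node)
        if depth < max_hops:          # the upper bound guarantees termination
            for nxt in edges[node]:
                frontier.append((nxt, depth + 1))
    return found

print(sorted(reachable("a")))  # ['b', 'c', 'd', 'e']
```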
Cypher pattern: MATCH clause
MATCH (alpha {name: 'Einstein' })-[:A]->(beta)-[:B]->(gamma)
RETURN gamma
Cypher pattern: WHERE clause
MATCH (alpha {name: 'Einstein' })-[:A]->(beta)-[:B]->(gamma)
RETURN gamma
is equivalent to
MATCH (alpha)-[:A]->(beta)-[:B]->(gamma)
WHERE alpha.name = 'Einstein'
RETURN gamma
Cypher pattern: CREATE clause
CREATE (einstein:Scientist {name: 'Einstein', first: 'Albert' }),
(eth:University {name: 'ETH Zurich' }),
(einstein)-[:VISITED]->(eth)
Other clauses
WITH
DELETE
MERGE
FOREACH
SET
UNION
START
Querying RDF: SPARQL
PREFIX geo: <http://www.example.com/geography#>
PREFIX countries: <http://www.example.com/>
SELECT ?s
WHERE { ?s geo:isLocatedIn countries:Switzerland }
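Answering this query means matching the triple pattern against every triple and collecting the values bound to `?s`. A Python sketch over hypothetical data (the `epfl:` and `mit:` subjects are invented, and prefixed names are kept as plain strings for brevity):

```python
# Hypothetical data, prefixed names kept as plain strings.
triples = {
    ("eth:self", "geo:isLocatedIn", "countries:Switzerland"),
    ("epfl:self", "geo:isLocatedIn", "countries:Switzerland"),
    ("mit:self", "geo:isLocatedIn", "countries:USA"),
}

def is_var(term):
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    # Extend binding so that pattern equals triple; None if impossible.
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding

def select(pattern):
    rows = []
    for t in sorted(triples):
        b = match_pattern(pattern, t, {})
        if b is not None:
            rows.append(b)
    return rows

print(select(("?s", "geo:isLocatedIn", "countries:Switzerland")))
# [{'?s': 'epfl:self'}, {'?s': 'eth:self'}]
```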
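SPARQL
PREFIX geo: <http://www.example.com/geography#>
PREFIX countries: <http://www.example.com/>
# ':' is bound to a hypothetical namespace so that the query parses
PREFIX : <http://www.example.com/education#>
SELECT ?s
WHERE {
?s geo:isLocatedIn countries:Switzerland .
?s :deliversDiplom :bachelor .
}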
SPARQL
PREFIX geo: <http://www.example.com/geography#>
PREFIX countries: <http://www.example.com/>
SELECT ?s
WHERE {
?s geo:isLocatedIn ?c .
?c geo:isInContinent geo:America .
}
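Two triple patterns that share the variable `?c` are joined on it. A Python sketch of evaluating such a basic graph pattern with a nested-loop join, one pattern at a time, over hypothetical data:

```python
triples = [  # hypothetical data
    ("eth:self", "geo:isLocatedIn", "countries:Switzerland"),
    ("mit:self", "geo:isLocatedIn", "countries:USA"),
    ("ubc:self", "geo:isLocatedIn", "countries:Canada"),
    ("countries:USA", "geo:isInContinent", "geo:America"),
    ("countries:Canada", "geo:isInContinent", "geo:America"),
    ("countries:Switzerland", "geo:isInContinent", "geo:Europe"),
]

def extend(pattern, triple, binding):
    # Unify one triple pattern with one triple under an existing binding.
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:  # variable already bound differently
                return None
        elif p != t:
            return None
    return new

def evaluate(patterns):
    # Join the patterns on their shared variables (nested-loop join).
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := extend(pattern, t, b)) is not None]
    return bindings

rows = evaluate([("?s", "geo:isLocatedIn", "?c"),
                 ("?c", "geo:isInContinent", "geo:America")])
print([row["?s"] for row in rows])  # ['mit:self', 'ubc:self']
```

The assignment expression (`:=`) requires Python 3.8 or later.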
SPARQL
PREFIX geo: <http://www.example.com/geography#>
PREFIX countries: <http://www.example.com/>
PREFIX : <http://www.example.com/education#>
SELECT ?s
WHERE {
?s geo:isLocatedIn countries:Switzerland .
?s :deliversDiplom :bachelor
}
LIMIT 10
SPARQL
PREFIX geo: <http://www.example.com/geography#>
PREFIX countries: <http://www.example.com/>
PREFIX : <http://www.example.com/education#>
SELECT ?s ?name
WHERE {
?s geo:isLocatedIn countries:Switzerland .
?s :deliversDiplom :bachelor .
?s :hasName ?name .
}
ORDER BY ?name
LIMIT 10
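ORDER BY and LIMIT are solution modifiers: they act on the solutions after the WHERE clause has been matched. A Python sketch following the SPARQL 1.1 evaluation order (ORDER BY, then projection, then LIMIT; DISTINCT and OFFSET are left out), with hypothetical solutions standing in for real query results:

```python
# Hypothetical solutions for the WHERE clause: one dict per solution.
solutions = [
    {"?s": "u3", "?name": "Gamma School"},
    {"?s": "u1", "?name": "Alpha School"},
    {"?s": "u2", "?name": "Beta School"},
]

def apply_modifiers(rows, order_by, limit, projection):
    # ORDER BY first, then projection (SELECT ?s ?name), then LIMIT.
    ordered = sorted(rows, key=lambda r: r[order_by])
    projected = [{v: r[v] for v in projection} for r in ordered]
    return projected[:limit]

print(apply_modifiers(solutions, "?name", 2, ["?s", "?name"]))
```

Because LIMIT comes last, `LIMIT 10` without `ORDER BY` returns some ten solutions in no guaranteed order.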
Architecture (Neo4j)
No sharding
Document stores don't like joins.
Graph databases don't like shards.
Why? Fast traversal
Master-slave architecture
[Figure: one master and several slaves]
Data replication
The master synchronizes its data to the slaves.
Data replication (full)
Every slave holds a complete copy of the graph, kept in sync with the master.
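A toy Python sketch of full master-slave replication, assuming the master keeps an append-only log of writes that each slave replays in order. All classes are hypothetical and Neo4j's actual replication protocol is not modelled; the point is only that every replica ends up with the whole data set, so any of them can serve reads.

```python
class Replica:
    def __init__(self):
        self.log = []   # write operations applied so far, in order
        self.data = {}  # full copy of the (toy) graph data

    def apply(self, op):
        key, value = op
        self.data[key] = value
        self.log.append(op)

class Master(Replica):
    def __init__(self, slaves):
        super().__init__()
        self.slaves = slaves

    def write(self, key, value):
        # Writes are applied on the master first ...
        self.apply((key, value))

    def synchronize(self):
        # ... then each slave replays the operations it has not seen yet.
        for slave in self.slaves:
            for op in self.log[len(slave.log):]:
                slave.apply(op)

slaves = [Replica() for _ in range(3)]
master = Master(slaves)
master.write("node:1", {"name": "Einstein"})
master.synchronize()
print(all(s.data == master.data for s in slaves))  # True
```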
Read scale-up
Reads can be served by any slave, so adding slaves adds read capacity.
Writes
Write to the master, or write to a slave
Caching and pages
Fixed-size records
Index-free adjacency
Label storage
[Figure: a node with the labels Person, Jedi, Geek]
Properties storage
[Figure: a node with the properties name: Einstein and first-name: Albert]
Relationship storage
[Figure: nodes connected by relationships named A and B]
Each relationship record stores:
Source, Target
s-previous, s-next: the previous and next relationships of the source node
t-previous, t-next: the previous and next relationships of the target node
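With these four pointers, the relationships of each node form a doubly linked list threaded through the relationship records. A minimal Python sketch, assuming (hypothetically) that a node only remembers its first relationship, that new relationships are prepended to both chains, and that a list index plays the role of the record id; self-loops are not handled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RelRecord:
    source: int
    target: int
    rel_type: str
    s_prev: Optional[int] = None  # previous/next relationship of the source
    s_next: Optional[int] = None
    t_prev: Optional[int] = None  # previous/next relationship of the target
    t_next: Optional[int] = None

rels = []       # relationship "store": record id = list index
first_rel = {}  # node id -> id of the first relationship in its chain

def add_relationship(source, target, rel_type):
    # Self-loops (source == target) are not supported by this sketch.
    rid = len(rels)
    rec = RelRecord(source, target, rel_type)
    rels.append(rec)
    # Prepend the new record to the chains of both endpoints.
    for node, is_source in ((source, True), (target, False)):
        old = first_rel.get(node)
        if is_source:
            rec.s_next = old
        else:
            rec.t_next = old
        if old is not None:
            # Back-pointer from the previous head of this node's chain.
            if rels[old].source == node:
                rels[old].s_prev = rid
            else:
                rels[old].t_prev = rid
        first_rel[node] = rid
    return rid

def relationships_of(node):
    # Walk the chain, choosing s_next or t_next depending on the node's
    # role in each record; no index is consulted.
    rid = first_rel.get(node)
    while rid is not None:
        rec = rels[rid]
        yield rid
        rid = rec.s_next if rec.source == node else rec.t_next

add_relationship(1, 2, "A")   # record 0
add_relationship(1, 3, "B")   # record 1
add_relationship(3, 2, "A")   # record 2
print(list(relationships_of(1)))  # [1, 0]
print(list(relationships_of(2)))  # [2, 0]
```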
Typical sizes
Node: 9 bytes
Relationship: 33 bytes
Relationship name: 5 bytes
Property: 33 bytes
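Fixed-size records are what make lookups by id cheap: the position of a record in its store file is the id times the record size. A Python sketch of that arithmetic using the node and relationship sizes listed above (the function names are hypothetical):

```python
# Record sizes from the "Typical sizes" list, in bytes.
NODE_RECORD = 9
RELATIONSHIP_RECORD = 33

def offset(record_id, record_size):
    # Fixed-size records: the id alone determines where a record starts,
    # so no index is needed to find it.
    return record_id * record_size

def store_size(count, record_size):
    return count * record_size

print(offset(1000, NODE_RECORD))                   # 9000
print(store_size(1_000_000, RELATIONSHIP_RECORD))  # 33000000
```

So a million relationships occupy about 33 MB (decimal megabytes) in the relationship store, before properties are counted.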
Semantics
RDF has no semantics
Schwiiz, Schoggi, Chuchichäschtli (Swiss German for "Switzerland", "chocolate", "little kitchen cupboard")
RDF Schema
Class, Property
Classes
rdfs:Resource
rdfs:Class
rdf:Property
rdfs:Literal
rdfs:Datatype
rdf:HTML
rdf:XMLLiteral
Properties
On any resources: rdf:type, rdfs:label, rdfs:comment
On properties: rdfs:range, rdfs:domain, rdfs:subPropertyOf
On classes: rdfs:subClassOf
Self-awareness
rdfs:Resource rdf:type rdfs:Resource
rdfs:Class rdfs:subClassOf rdfs:Resource
rdf:type rdfs:range rdfs:Class
rdfs:subClassOf rdf:type rdf:Property
Simple Entailment (RDF semantics)
An interpretation I satisfies a graph E when I(E) = true.
OWL
(In principle) standalone
(Much) more powerful than RDF(S)
<xml/>
OWL and description logic / AI
Entailment (and Syllogisms)
Major premise: All men are mortal.
Minor premise: Socrates is a man.
Conclusion: Therefore, Socrates is mortal.
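The syllogism is exactly the kind of conclusion that RDFS entailment draws from `rdfs:subClassOf`. A Python sketch of forward chaining with two RDFS entailment rules (rdfs9: instances of a class are instances of its superclasses; rdfs11: `rdfs:subClassOf` is transitive), applied until no new triple appears. The `ex:` names are hypothetical, and the full RDFS rule set is much larger.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(triples):
    # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    # rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    facts = set(triples)
    while True:
        new = set()
        for (a, p, b) in facts:
            if p != SUBCLASS:
                continue
            for (x, q, y) in facts:
                if q == SUBCLASS and x == b:
                    new.add((a, SUBCLASS, y))
                if q == RDF_TYPE and y == a:
                    new.add((x, RDF_TYPE, b))
        if new <= facts:  # fixpoint reached: nothing new can be derived
            return facts
        facts |= new

kb = {
    ("ex:Man", SUBCLASS, "ex:Mortal"),   # All men are mortal.  (major premise)
    ("ex:Socrates", RDF_TYPE, "ex:Man"),  # Socrates is a man.   (minor premise)
}
print(("ex:Socrates", RDF_TYPE, "ex:Mortal") in rdfs_closure(kb))  # True
```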
More graph databases...
Trees...
... and Graphs