
10. Graph Databases

Date post: 28-Jul-2015
Upload: fabio-fumarola
Page 1: 10. Graph Databases

Graph Oriented Databases

Bye-bye, go wave bye-bye

Dr. Fabio Fumarola

Page 2: 10. Graph Databases

Outline
• Introduction
• The Lack of Relationships in RDBMS and NoSQL
• Graph Databases: Features
• Relations
• Query Language
• Data Modeling with Graphs
• Conclusions

Page 3: 10. Graph Databases

Introduction
• We live in a connected world
• Everything is connected: social networks, biology, bioinformatics
• The NoSQL databases analyzed so far store data using aggregates
• Here we compare graph databases with relational databases and aggregate-oriented NoSQL stores for storing graph data

Page 4: 10. Graph Databases

Three Facts
1. Relational Databases Lack Relationships
2. NoSQL Databases Also Lack Relationships
3. Graph Databases Embrace Relationships

Page 5: 10. Graph Databases

Relational Databases Lack Relationships

• For decades we have tried to accommodate connected, semi-structured datasets inside relational databases.
• But:
– relational databases are designed to codify tabular structures;
– they struggle to model the ad hoc, exceptional relationships that occur in the real world.

Page 6: 10. Graph Databases

Relational Databases Lack Relationships
• In a relational database, relationships exist only as joins between tables
• But we want to model the semantics of the relationships that connect the real world
• As outlier data multiplies:
1. the structure of the dataset becomes more complex and less uniform;
2. the relational data becomes more complex and less uniform (large join tables, sparsely populated rows, many null values)

Page 7: 10. Graph Databases

Example of customer-centric orders
• Complex joins
• Foreign key constraints
• Sparse tables with null values
• Reciprocal queries are costly: “What products did a customer buy?” is relatively cheap, while the reverse, “Which customers bought this product?”, is expensive

Page 8: 10. Graph Databases

NoSQL Databases Also Lack Relationships

• Key-value, document, and column-oriented stores hold sets of disconnected documents/values/columns
• One well-known strategy for adding relationships is to embed an aggregate’s identifier inside a field belonging to another aggregate
• But this requires joins at the application level
• Some NoSQL stores offer some notion of navigability, but it is expensive for complex joins
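The following minimal Python sketch makes the application-level join concrete. Two hypothetical in-memory dictionaries stand in for aggregate collections. An order embeds only its customer’s identifier, so the application has to perform the second lookup itself.

```python
# Two hypothetical aggregate collections, e.g. documents in a key-value store.
customers = {
    "c1": {"name": "Alice"},
    "c2": {"name": "Bob"},
}
orders = {
    "o1": {"customer_id": "c1", "items": ["espresso beans"]},
    "o2": {"customer_id": "c2", "items": ["ice cream"]},
    "o3": {"customer_id": "c1", "items": ["ice cream", "espresso beans"]},
}


def orders_with_customer_names():
    """Application-level join: follow each embedded identifier with a second lookup."""
    result = []
    for order_id, order in sorted(orders.items()):
        customer = customers[order["customer_id"]]  # the "join" happens here, in app code
        result.append((order_id, customer["name"]))
    return result


print(orders_with_customer_names())  # [('o1', 'Alice'), ('o2', 'Bob'), ('o3', 'Alice')]
```

The store only knows that `customer_id` is a string. Its meaning as a relationship exists solely in application code.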

Page 9: 10. Graph Databases

Example of aggregate oriented orders

• Some properties are references to foreign aggregates
• These relationships are not first-class citizens
• They are not intended as real relationships

Page 10: 10. Graph Databases

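Example of a Small Social Network
• It’s easy to find a user’s immediate friends
• But friendship isn’t always symmetric: if Alice lists Bob as a friend, Bob need not list Alice
• To find who lists a given user as a friend, we need a brute-force scan across the whole dataset looking for friends entries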
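Here is a minimal Python sketch of this asymmetry. It assumes a hypothetical `users` dictionary in which each record lists only the friends that user has declared.

```python
# Hypothetical store: each record lists only the friends that user declared.
users = {
    "Alice": {"friends": ["Bob", "Zach"]},
    "Bob": {"friends": ["Alice", "Zach"]},
    "Zach": {"friends": []},  # Zach declares no one, yet others list him
}


def friends_of(name):
    """Cheap: read a single record."""
    return users[name]["friends"]


def who_lists_as_friend(name):
    """Costly: the reverse direction is not stored, so every record is scanned."""
    return sorted(u for u, rec in users.items() if name in rec["friends"])


print(friends_of("Zach"))           # []
print(who_lists_as_friend("Zach"))  # ['Alice', 'Bob']
```

Answering the reverse question touches every record, which is the brute-force scan described above.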

Page 11: 10. Graph Databases

Graph Databases Embrace Relationships

• The previous examples have dealt with implicitly connected data
• We infer semantic dependencies between entities
• We model the data based on these connections
• Our application has to navigate this flat, disconnected data and deal with slow queries
• In contrast, in the graph world, connected data is stored as connected data

Page 12: 10. Graph Databases

Example Social Network

• The node user:Bob is a vertex with the property Bob
• We also see relationships, which are edges:
– Boss_of
– Friend_of
– Married_to

Page 13: 10. Graph Databases

Graph DB Features

Page 14: 10. Graph Databases

CAP Theorem


Page 15: 10. Graph Databases

Consistency
• Since graph DBs operate on connected nodes, most of them do not scale well when nodes are distributed across servers.
• There are solutions supporting distribution:
– Neo4j uses one master and several slaves
– OrientDB uses MVCC (multi-version concurrency control) for distributed, eventually consistent data structures
– TitanDB partitions data using HBase or Cassandra

Page 16: 10. Graph Databases

Transactions
• Most graph DBs are ACID-compliant
• Before performing an operation, we have to start a transaction.
• If we don’t wrap operations in a transaction, we get an exception.
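A toy store can illustrate the transaction rule. `TinyGraph` and `NotInTransactionError` are hypothetical names, not any product’s API. The sketch models only atomicity: writes are buffered and then either committed together or discarded. It does not model the full ACID guarantees of a real database.

```python
from contextlib import contextmanager


class NotInTransactionError(Exception):
    """Hypothetical error raised when writing outside a transaction."""


class TinyGraph:
    """Toy store that accepts writes only inside a transaction."""

    def __init__(self):
        self.nodes = {}
        self._pending = None  # buffered writes of the open transaction, or None

    @contextmanager
    def transaction(self):
        self._pending = {}
        try:
            yield self
            self.nodes.update(self._pending)  # commit: apply all buffered writes together
        finally:
            self._pending = None  # if the block raised, the buffer is simply discarded

    def create_node(self, node_id, **props):
        if self._pending is None:
            raise NotInTransactionError("start a transaction before writing")
        self._pending[node_id] = props


g = TinyGraph()
with g.transaction():
    g.create_node("bob", name="Bob")
print(g.nodes)  # {'bob': {'name': 'Bob'}}

try:
    g.create_node("ann", name="Ann")  # no transaction open
except NotInTransactionError as err:
    print("rejected:", err)
```

If an exception escapes the `with` block, the commit line is skipped, so none of that transaction’s writes become visible.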

Page 17: 10. Graph Databases

Availability
• Neo4j, from version 1.8, achieves availability by providing replicated slaves.
• InfiniteGraph, FlockDB, and TitanDB provide distributed storage of the nodes.
• Neo4j uses ZooKeeper to keep track of the last transaction IDs persisted on each slave node and of the current master node.

Page 18: 10. Graph Databases

Going Into Relations


Page 19: 10. Graph Databases

Relations
• Relations in a graph naturally form paths.
• Querying or traversing the graph involves following a path.
• A query on the graph is also known as traversing the graph.
• One advantage is that we can change the traversal requirements without changing the nodes and edges.

Page 20: 10. Graph Databases

Relations
• In graph databases, traversal operations are highly efficient.
• In the book Neo4j in Action, Partner and Vukotic perform an experiment comparing a relational store and Neo4j.

Page 21: 10. Graph Databases

Relations

• At depth two (friend-of-friend), both the relational DB and the graph DB perform well enough
• But at depth three it is clear that the relational DB can no longer cope
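The following minimal sketch runs a depth-limited friend-of-friend query as a breadth-first traversal over a hypothetical adjacency dictionary. Each hop is a direct neighbour lookup. In a relational schema, by contrast, each extra level typically costs another self-join on the friendship table.

```python
from collections import deque

# Hypothetical friendship graph: person -> people they are connected to.
friends = {
    "Ann": ["Bob", "Cid"],
    "Bob": ["Dan"],
    "Cid": ["Dan", "Eve"],
    "Dan": ["Fay"],
    "Eve": [],
    "Fay": [],
}


def friends_at_depth(graph, start, depth):
    """People whose shortest distance from `start` is exactly `depth` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        person, d = frontier.popleft()
        if d == depth:
            result.add(person)
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, []):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, d + 1))
    return sorted(result)


print(friends_at_depth(friends, "Ann", 2))  # ['Dan', 'Eve']
print(friends_at_depth(friends, "Ann", 3))  # ['Fay']
```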

Page 22: 10. Graph Databases

Relations

• Both aggregate stores and relational databases perform poorly because of index lookups.
• Graphs, on the other hand, use index-free adjacency to ensure that traversing connected data is extremely fast.
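This Python sketch contrasts the two access paths under simplifying assumptions. A sorted list of `(person_id, friend_id)` rows, searched with `bisect`, stands in for a B-tree-indexed join table. Plain object references stand in for index-free adjacency. It illustrates the idea only and makes no claim about any product’s storage format.

```python
from bisect import bisect_left

# Relational style: a FRIENDSHIP join table, kept sorted like a B-tree index.
rows = sorted([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])


def friends_via_index(person_id):
    """One O(log n) search over the whole table per hop, then a short range scan."""
    i = bisect_left(rows, (person_id,))  # (p,) sorts before every (p, x) row
    found = []
    while i < len(rows) and rows[i][0] == person_id:
        found.append(rows[i][1])
        i += 1
    return found


# Graph style: each node holds direct references to its neighbours.
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.out = []  # neighbour objects; no global index is consulted during traversal


nodes = {i: Node(i) for i in range(1, 6)}  # id lookup used only to build and pick a start
for src, dst in rows:
    nodes[src].out.append(nodes[dst])


def friends_via_adjacency(node):
    """Cost depends on this node's degree, not on the total size of the graph."""
    return [n.id for n in node.out]


print(friends_via_index(1))             # [2, 3]
print(friends_via_adjacency(nodes[1]))  # [2, 3]
```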

Page 23: 10. Graph Databases

Relations: Another Case Study
• Let us consider the purchase history of a user as connected data.
• If we notice that users who buy strawberry ice cream also buy espresso beans, we can start to recommend those beans to users who normally buy only the ice cream.
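Here is a minimal co-purchase recommender over a hypothetical `purchases` mapping. It walks from a customer to the products they bought, back to other customers who bought the same products, and then on to products the first customer lacks. The `min_support` threshold is an illustrative parameter, not a standard one.

```python
from collections import Counter

# Hypothetical purchase graph: customer -> set of products (BOUGHT relationships).
purchases = {
    "ann": {"strawberry ice cream", "espresso beans"},
    "bob": {"strawberry ice cream", "espresso beans", "waffles"},
    "cid": {"strawberry ice cream"},
    "dee": {"espresso beans"},
}


def recommend(customer, min_support=2):
    """Products bought by at least `min_support` customers who share a purchase with `customer`."""
    mine = purchases[customer]
    counts = Counter()
    for other, theirs in purchases.items():
        if other != customer and mine & theirs:  # customer -> product <- other
            counts.update(theirs - mine)          # other -> products customer lacks
    return sorted(p for p, c in counts.items() if c >= min_support)


print(recommend("cid"))                 # ['espresso beans']
print(recommend("cid", min_support=1))  # ['espresso beans', 'waffles']
```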

Page 24: 10. Graph Databases

Relations and Recommendations
• The previous example was a one-dimensional recommendation.
• We can join our graph with graphs from other domains.
• For example, we can ask to find:
– “all the flavors of ice cream liked by people who live near a user, and enjoy espresso, but dislike Brussels sprouts.”
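The multi-domain query can be sketched over a hypothetical set of `(subject, RELATIONSHIP, object)` triples. The relationship names `LIVES_NEAR`, `LIKES`, and `DISLIKES`, and all of the sample data, are invented for illustration.

```python
# Hypothetical multi-domain graph stored as (subject, RELATIONSHIP, object) triples.
edges = {
    ("ann", "LIVES_NEAR", "zoe"), ("bob", "LIVES_NEAR", "zoe"), ("cid", "LIVES_NEAR", "zoe"),
    ("ann", "LIKES", "espresso"), ("bob", "LIKES", "espresso"), ("cid", "LIKES", "espresso"),
    ("ann", "DISLIKES", "brussels sprouts"), ("bob", "DISLIKES", "brussels sprouts"),
    ("cid", "LIKES", "brussels sprouts"),
    ("ann", "LIKES", "pistachio"), ("bob", "LIKES", "vanilla"), ("cid", "LIKES", "mango"),
}
ICE_CREAM_FLAVORS = {"pistachio", "vanilla", "mango", "strawberry"}


def ice_cream_for(user):
    """Flavors liked by people near `user` who like espresso but dislike Brussels sprouts."""
    near = {s for (s, rel, o) in edges if rel == "LIVES_NEAR" and o == user}
    matching = {
        p for p in near
        if (p, "LIKES", "espresso") in edges and (p, "DISLIKES", "brussels sprouts") in edges
    }
    return sorted(
        o for (s, rel, o) in edges
        if s in matching and rel == "LIKES" and o in ICE_CREAM_FLAVORS
    )


print(ice_cream_for("zoe"))  # ['pistachio', 'vanilla']
```

Each condition is a hop from a neighbouring person into another domain (beverages, vegetables, ice cream), so adding a new criterion means adding one more relationship check.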

Page 25: 10. Graph Databases

Relations and Patterns
• We can use relations to query graph patterns
• Such pattern-matching queries are:
– extremely difficult to write in SQL
– laborious to write against aggregate stores
• In both cases they tend to perform very poorly
• On the other hand, graph databases are optimized for this kind of query

Page 26: 10. Graph Databases

Query Language


Page 27: 10. Graph Databases

Query Language
• Graph DBs support query languages such as Gremlin, Cypher, and SPARQL
• Gremlin is a DSL for traversing graphs
• It can traverse any graph database that implements the Blueprints interface

Page 28: 10. Graph Databases

1. Indexing: Nodes and Edges
• Indexes are necessary to find the starting node from which to begin a traversal.
• How indexes work:
– they can index properties of nodes and edges;
– additions to an index are done in transactions.
• The retrieved nodes can then be used as starting points for queries
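The following toy example shows an index used only to find the start node. `IndexedGraph` is a hypothetical class that keeps a `(property, value) -> node ids` index next to its adjacency lists. Transactions are omitted for brevity, and property values are assumed to be hashable.

```python
from collections import defaultdict


class IndexedGraph:
    """Toy store with a property index beside plain adjacency lists."""

    def __init__(self):
        self.props = {}                # node id -> properties
        self.adj = defaultdict(list)   # node id -> outgoing neighbour ids
        self.index = defaultdict(set)  # (key, value) -> node ids

    def add_node(self, node_id, **props):
        self.props[node_id] = props
        for key, value in props.items():
            self.index[(key, value)].add(node_id)  # keep the index in step with writes

    def add_edge(self, src, dst):
        self.adj[src].append(dst)

    def lookup(self, key, value):
        """Index lookup: used to find where a traversal starts."""
        return sorted(self.index.get((key, value), ()))


g = IndexedGraph()
g.add_node(1, name="Bob")
g.add_node(2, name="Ann")
g.add_node(3, name="Cid")
g.add_edge(1, 2)
g.add_edge(1, 3)

start = g.lookup("name", "Bob")[0]                 # one index hit to find the start node
print([g.props[n]["name"] for n in g.adj[start]])  # ['Ann', 'Cid']
```

After the start node is found, the traversal follows adjacency lists and never consults the index again.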

Page 29: 10. Graph Databases

2. Querying In- and Out-Relationships
• Given a node, we can query for both its incoming and its outgoing relationships.
• We can apply directional filters when querying for relationships.
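Here is a small sketch of directional filtering over a hypothetical list of relationships. The direction names `OUTGOING`, `INCOMING`, and `BOTH` are chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    start: str
    label: str
    end: str


rels = [
    Rel("alice", "FRIEND_OF", "bob"),
    Rel("carol", "FRIEND_OF", "alice"),
    Rel("alice", "BOSS_OF", "dave"),
]


def relationships(node, direction="BOTH", label=None):
    """Relationships touching `node`, filtered by direction and, optionally, by label."""
    def keep(r):
        if label is not None and r.label != label:
            return False
        if direction == "OUTGOING":
            return r.start == node
        if direction == "INCOMING":
            return r.end == node
        return node in (r.start, r.end)  # BOTH
    return [r for r in rels if keep(r)]


print([r.end for r in relationships("alice", "OUTGOING")])     # ['bob', 'dave']
print([r.start for r in relationships("alice", "INCOMING")])   # ['carol']
print(len(relationships("alice", "BOTH", label="FRIEND_OF")))  # 2
```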

Page 30: 10. Graph Databases

3. Querying Breadth-First and Depth-First
• Graph databases are really powerful for querying incoming and outgoing relationships.
• Moreover, we can make the traverser go top-down or sideways through the graph by using:
– BREADTH_FIRST or
– DEPTH_FIRST
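A single loop can produce both orders; only the container changes. A FIFO queue gives breadth-first order (sideways, level by level). A LIFO stack gives depth-first order (top-down, one branch at a time). The tree and the `order` parameter values are illustrative.

```python
from collections import deque

# Hypothetical hierarchy: node -> children.
tree = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}


def traverse(graph, start, order="BREADTH_FIRST"):
    """Visit nodes reachable from `start` in breadth-first or depth-first order."""
    visited, result = set(), []
    pending = deque([start])
    while pending:
        # FIFO (popleft) sweeps level by level; LIFO (pop) dives down one branch first.
        node = pending.popleft() if order == "BREADTH_FIRST" else pending.pop()
        if node in visited:
            continue
        visited.add(node)
        result.append(node)
        children = graph.get(node, [])
        # For DFS, push children reversed so they are popped in their listed order.
        pending.extend(children if order == "BREADTH_FIRST" else reversed(children))
    return result


print(traverse(tree, "root"))                 # ['root', 'a', 'b', 'a1', 'a2', 'b1']
print(traverse(tree, "root", "DEPTH_FIRST"))  # ['root', 'a', 'a1', 'a2', 'b', 'b1']
```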

Page 31: 10. Graph Databases

4. Querying Paths
• Other useful features of graph databases are:
– finding paths between two nodes
– determining whether there are multiple paths
– finding the shortest path
• Many graph DBs use algorithms such as Dijkstra’s algorithm to find shortest paths.
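The sketch below covers both operations on a hypothetical weighted `roads` graph. `shortest_path` is a textbook Dijkstra implementation that assumes non-negative weights. `all_paths` enumerates simple paths, which shows whether more than one path exists. Neither is taken from a specific product.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on non-negative weights; returns (cost, path) or (inf, [])."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def all_paths(graph, source, target, path=None):
    """Enumerate simple paths (no repeated node) from source to target."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found = []
    for neighbour in graph.get(source, {}):
        if neighbour not in path:
            found.extend(all_paths(graph, neighbour, target, path))
    return found


roads = {"A": {"B": 4, "C": 1}, "C": {"B": 2, "D": 7}, "B": {"D": 1}}
print(shortest_path(roads, "A", "D"))  # (4, ['A', 'C', 'B', 'D'])
print(len(all_paths(roads, "A", "D")))  # 3
```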

Page 32: 10. Graph Databases

5. Querying Paths
• Finally, with graph DBs it is possible to use the MATCH operator (for example, in Cypher)
• MATCH is used for matching patterns in relationships
• WHERE filters on the properties of a node or relationship
• RETURN specifies what to get in the result set
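To show what each clause contributes, the Cypher query `MATCH (a:User {name: 'Bob'})-[:FRIEND_OF]->(b:User) WHERE b.age > 30 RETURN b.name` can be emulated step by step in Python. The data and the `match_where_return` helper are hypothetical, and node labels are omitted. A real engine plans and executes such a query very differently.

```python
# Hypothetical data; every node is treated as a User.
people = {
    "bob": {"name": "Bob", "age": 40},
    "ann": {"name": "Ann", "age": 35},
    "cid": {"name": "Cid", "age": 25},
    "dee": {"name": "Dee", "age": 50},
}
rels = [("bob", "FRIEND_OF", "ann"), ("bob", "FRIEND_OF", "cid"), ("bob", "MARRIED_TO", "dee")]


def match_where_return(start_name, rel_type, predicate, project):
    # MATCH: bind `a` by a property value, then bind `b` through a relationship of rel_type
    bindings = [
        (a, b)
        for a, props in people.items() if props["name"] == start_name
        for (s, t, b) in rels if s == a and t == rel_type
    ]
    # WHERE: keep only bindings whose `b` properties satisfy the predicate
    kept = [(a, b) for a, b in bindings if predicate(people[b])]
    # RETURN: project the requested values
    return [project(people[b]) for _, b in kept]


print(match_where_return("Bob", "FRIEND_OF", lambda b: b["age"] > 30, lambda b: b["name"]))
# ['Ann']
```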

Page 33: 10. Graph Databases

Data Modeling with Graphs: Intro

Page 34: 10. Graph Databases

Data Modeling with Graphs

How do we model the world in graph terms?

Page 35: 10. Graph Databases

How do we model the world in graph terms?
• Formalizing the base model
• Enriching the model
• Testing the model

Page 36: 10. Graph Databases

Formalization of the base model
• Modeling is an abstracting activity motivated by a particular need or goal
• We model in order to define structures that can be manipulated.
• There is no natural representation of the world the way it “really is”;
• there are just many purposeful selections, abstractions, and simplifications that are useful for satisfying a particular goal

Page 37: 10. Graph Databases

Formalization of the base model
• Graph data modeling is different from many other techniques.
• There is a close affinity between logical and physical models.
• In relational databases we start from a logical model and arrive at the physical model.
• With graph databases, this gap shrinks considerably.

Page 38: 10. Graph Databases

The Graph Model
• A property graph is made up of nodes, relationships, and properties.

Page 39: 10. Graph Databases

Publishing Messages
• We organize messages in order

Page 40: 10. Graph Databases

Nodes
Nodes contain properties
• Think of nodes as documents that store properties in the form of arbitrary key-value pairs.
• The keys are strings and the values are arbitrary data types.

Page 41: 10. Graph Databases

Relationships
Relationships connect and structure nodes.
• A relationship always has a direction, a label, a start node, and an end node; there are no dangling relationships.
• Together, a relationship’s direction and label add semantic clarity to the structuring of nodes.

Page 42: 10. Graph Databases

Relationships: Attributes
Like nodes, relationships can also have properties.
• The ability to add properties to relationships is particularly useful for:
– providing additional metadata for graph algorithms,
– adding additional semantics to relationships (including quality and weight),
– constraining queries at runtime.
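The property graph model described on the last three slides can be sketched with Python dataclasses (Python 3.9+ for the `dict[str, Any]` annotations). The class names are hypothetical. The point is that a relationship has a label, a direction (start to end), two mandatory endpoints, and properties of its own.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: str
    properties: dict[str, Any] = field(default_factory=dict)  # arbitrary key-value pairs


@dataclass
class Relationship:
    label: str   # e.g. "FRIEND_OF"; with the direction, it carries the semantics
    start: Node  # direction runs start -> end
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)  # e.g. weight, quality


class PropertyGraph:
    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.relationships: list[Relationship] = []

    def add_node(self, node_id, **props):
        node = Node(node_id, dict(props))
        self.nodes[node_id] = node
        return node

    def relate(self, start_id, label, end_id, **props):
        # Both endpoints must exist: no dangling relationships.
        if start_id not in self.nodes or end_id not in self.nodes:
            raise KeyError("both start and end nodes must exist")
        rel = Relationship(label, self.nodes[start_id], self.nodes[end_id], dict(props))
        self.relationships.append(rel)
        return rel


g = PropertyGraph()
g.add_node("bob", name="Bob")
g.add_node("ann", name="Ann")
g.add_node("cid", name="Cid")
g.relate("bob", "FRIEND_OF", "ann", weight=0.9)
g.relate("bob", "FRIEND_OF", "cid", weight=0.2)

# A relationship property constrains the query at runtime: only strong friendships.
strong = [r.end.properties["name"] for r in g.relationships
          if r.label == "FRIEND_OF" and r.properties.get("weight", 0) > 0.5]
print(strong)  # ['Ann']
```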

Page 43: 10. Graph Databases

Modeling Steps: Outline
• The initial stage of modeling is similar to the first stage of many other data modeling techniques, that is:
– to understand and agree on the entities in the domain,
– how they interrelate,
– and the rules that govern their state transitions

Page 44: 10. Graph Databases

Describe the Model in Terms of the Application’s Needs
• Agile user stories provide a concise means for expressing an outside-in, user-centered view of the application’s needs.
• Here’s an example of a user story for a book review web application:
– AS A reader who likes a book,
– I WANT to know which books other readers who like the same book have liked,
– SO THAT I can find other books to read.

Page 45: 10. Graph Databases

Describe the Model in Terms of the Application’s Needs
• This story expresses a user need, which motivates the shape and content of our data model.
• From a data modeling point of view:
– the AS A clause establishes a context comprising two entities, a reader and a book, plus the LIKES relationship that connects them;
– the I WANT clause exposes more LIKES relationships, and more entities: other readers and other books.

Page 46: 10. Graph Databases

Describe the Model in Terms of the Application’s Needs
• The entities and relationships identified by analyzing the user story quickly translate into a simple data model
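The user story’s query can be sketched over a hypothetical set of `(reader, book)` LIKES pairs. Excluding books the reader already likes is a design choice made here, not part of the story.

```python
# Hypothetical LIKES relationships as (reader, book) pairs.
likes = {
    ("alice", "Dune"), ("alice", "Hyperion"),
    ("bob", "Dune"), ("bob", "Foundation"),
    ("carol", "Emma"),
}


def books_liked_by_similar_readers(reader, book):
    """Books liked by other readers who also like `book`, minus those `reader` already likes."""
    # (reader)-[:LIKES]->(book)<-[:LIKES]-(other)
    others = {r for (r, b) in likes if b == book and r != reader}
    # (other)-[:LIKES]->(other_book)
    already = {b for (r, b) in likes if r == reader}
    return sorted({b for (r, b) in likes if r in others and b not in already})


print(books_liked_by_similar_readers("alice", "Dune"))  # ['Foundation']
```

Each clause of the story maps onto one hop of the pattern, which is a sign that the model fits the application’s needs.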

Page 47: 10. Graph Databases

Modeling Rationale
• Use nodes to represent entities
• Use relationships both:
– to express the connections between entities, and
– to establish semantic context for each entity
• Use relationship direction to further clarify relationship semantics

Page 48: 10. Graph Databases

Describe the Model: Guidelines
• Use node properties
– to represent entity attributes, plus any necessary entity metadata, such as timestamps, version numbers, etc.
• Use relationship properties
– to express the strength, weight, or quality of a relationship, plus any necessary relationship metadata, such as timestamps, version numbers, etc.

Page 49: 10. Graph Databases

Modeling Temporal Relations as Nodes
• When two or more domain entities interact for a period of time, a fact emerges
• We represent these facts as separate nodes
• In the following examples we show how we might model facts and actions using intermediate nodes.

Page 50: 10. Graph Databases

Example: Employment
Ian was employed as an engineer at Neo Technology
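Here is a minimal sketch of the intermediate-node pattern for this employment fact. The relationship names `EMPLOYMENT` and `EMPLOYER` and the node ids are hypothetical. The intermediate `Employment` node is where properties of the fact itself would be stored, such as the role, or start and end dates.

```python
# Hypothetical nodes and relationships; "job1" is the intermediate fact node.
nodes = {
    "ian": {"kind": "Person", "name": "Ian"},
    "neo": {"kind": "Company", "name": "Neo Technology"},
    "job1": {"kind": "Employment", "role": "engineer"},
}
rels = [
    ("ian", "EMPLOYMENT", "job1"),
    ("job1", "EMPLOYER", "neo"),
]


def employment_facts(person_id):
    """Collect (role, company name) by traversing person -> employment -> company."""
    facts = []
    for s, label, job in rels:
        if s == person_id and label == "EMPLOYMENT":
            for s2, label2, company in rels:
                if s2 == job and label2 == "EMPLOYER":
                    facts.append((nodes[job]["role"], nodes[company]["name"]))
    return facts


print(employment_facts("ian"))  # [('engineer', 'Neo Technology')]
```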

Page 51: 10. Graph Databases

Example: Performance
William Hartnell played the Doctor in the story The Sensorites

Page 52: 10. Graph Databases

Example: Emailing
Ian emailed Jim and copied in Alistair

Page 53: 10. Graph Databases

Example: Timeline Tree

A timeline tree showing the broadcast dates for four episodes of a TV program
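A timeline tree can be sketched as nested dictionaries: year, then month, then day, then episodes. The episode names and dates below are hypothetical placeholders, not the broadcast dates shown in the figure. A range query walks the tree in order and can skip whole year subtrees.

```python
# Hypothetical data: year -> month -> day -> [episodes]; each level is a tree node.
timeline = {}


def add_episode(year, month, day, episode):
    timeline.setdefault(year, {}).setdefault(month, {}).setdefault(day, []).append(episode)


def episodes_between(start, end):
    """Walk year -> month -> day in order, keeping days within [start, end] (y, m, d) tuples."""
    found = []
    for year in sorted(timeline):
        if not (start[0] <= year <= end[0]):
            continue  # prune a whole year subtree
        for month in sorted(timeline[year]):
            for day in sorted(timeline[year][month]):
                if start <= (year, month, day) <= end:
                    found.extend(timeline[year][month][day])
    return found


add_episode(2020, 12, 28, "Episode 1")
add_episode(2021, 1, 4, "Episode 2")
add_episode(2021, 1, 11, "Episode 3")
add_episode(2021, 1, 18, "Episode 4")
print(episodes_between((2021, 1, 1), (2021, 1, 12)))  # ['Episode 2', 'Episode 3']
```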

Page 54: 10. Graph Databases

Example: Linked List
A doubly linked list representing a time-ordered series of events
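A doubly linked list of events can be sketched with a dataclass whose `next` and `prev` fields play the role of the two relationship directions. The names are illustrative. `repr` and comparison are disabled on the links to avoid infinite recursion through the cycle of references.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    name: str
    # NEXT / PREVIOUS links; excluded from repr and == to avoid endless recursion.
    next: Optional["Event"] = field(default=None, repr=False, compare=False)
    prev: Optional["Event"] = field(default=None, repr=False, compare=False)


def append(tail, name):
    """Create an event after `tail`, wiring both directions; returns the new tail."""
    node = Event(name, prev=tail)
    if tail is not None:
        tail.next = node
    return node


def walk(node, forward=True):
    """Follow NEXT links (forward) or PREVIOUS links (backward), collecting names."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.next if forward else node.prev
    return names


head = tail = append(None, "logged in")
tail = append(tail, "viewed item")
tail = append(tail, "purchased")
print(walk(head))                 # ['logged in', 'viewed item', 'purchased']
print(walk(tail, forward=False))  # ['purchased', 'viewed item', 'logged in']
```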

Page 55: 10. Graph Databases

Iterative and Incremental

• We develop the data model feature by feature, user story by user story
• This ensures that we identify the relationships our application will use to query the graph
• Through iterative and incremental delivery of application features, we arrive at a correct model that provides the right abstraction

Page 56: 10. Graph Databases

Data Modeling: Enrich
• The next step diverges from the relational data modeling methodology
• Instead of transforming a domain model’s graph-like representation into tables, we enrich it.
• That is, for each entity in our domain, “we ensure that we’ve captured both the properties and the connections to neighboring entities necessary to support our application goals”.

Page 57: 10. Graph Databases

Data Modeling: Enrich
• Remember, the domain model is not totally aligned with reality.
• It is a purposeful abstraction of those aspects of our domain relevant to our application goals.
• By enriching our domain graph with additional properties and relationships, we effectively produce a graph model aligned with our application’s data needs.

Page 58: 10. Graph Databases

Data Modeling: Enrich
In graph terms, we are ensuring that:
• each node has the appropriate properties;
• every node is in the correct semantic context.
We do this by creating named and directed (and often attributed) relationships between the nodes to capture the structural aspects of the domain.

Page 59: 10. Graph Databases

Data Modeling: Test
• The next step is to test how suitable the model is for answering realistic queries
• Although graph DBs are great at supporting evolving structures, there are some design decisions to consider
• By reviewing the domain model and the resulting graph model at this early stage, we can avoid these pitfalls.

Page 60: 10. Graph Databases

Data Modeling: Test
• In practice, there are two techniques we can apply here.
• The first, and simplest, is just to check that the graph reads well.
• We pick a start node and then follow relationships to other nodes, reading each node’s role and each relationship’s name as we go.
• Doing so should produce sensible sentences.

Page 61: 10. Graph Databases

Data Modeling: Test
• The second technique is to consider the queries we’ll run on the graph.
• To validate that the graph supports the kinds of queries we expect to run on it, we must describe those queries.
• Given a described query, if we can easily write it in Cypher or Gremlin, we can be more certain that the graph meets the needs of our domain.

Page 62: 10. Graph Databases

Conclusions


Page 63: 10. Graph Databases

Avoid Anti-Patterns
• In the general case, don’t encode entities into relationships.
• It’s also important to realize that graphs are a naturally additive structure.
• It’s quite natural to add facts, in terms of domain entities and how they interrelate, by adding nodes and relationships.
• If we model in accordance with the questions we want to ask of our data, an accurate representation of the domain will emerge.

Page 64: 10. Graph Databases

When to Use
• Connected data
• Routing, dispatch, and location-based services
• Recommendation engines

Page 65: 10. Graph Databases

When Not to Use
• When you need to update all or a subset of entities, for example in analytics
• In situations where you need to apply operations that work on the global graph
• When you don’t know the starting point of your query

