
Austin Data Geeks - Why relationships are cool but join sucks

Date post: 01-Dec-2014
Upload: orient-technologies
Description:
Slides
96
(c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License 1 www.orientechnologies.com Luca Garulli Founder and CEO @Orient Technologies Ltd Author of OrientDB www.twitter.com/lgarulli Switching from Relational to Graph Model
Transcript
Page 1: Austin Data Geeks - Why relationships are cool but join sucks

www.orientechnologies.com

Luca Garulli, Founder and CEO @Orient Technologies Ltd, Author of OrientDB

www.twitter.com/lgarulli

Switching from Relational to Graph Model

Page 2: Austin Data Geeks - Why relationships are cool but join sucks

1979: First Relational DBMS available as product

2009: NoSQL movement

Page 3: Austin Data Geeks - Why relationships are cool but join sucks

1979: First Relational DBMS available as product

2009: NoSQL movement

30 yrs is a long time!

Page 4: Austin Data Geeks - Why relationships are cool but join sucks

Before 2009, teams of developers always wanted a say in selecting:

Operating System
Programming Language
Middleware (App-Servers)

What about the Database?

Page 5: Austin Data Geeks - Why relationships are cool but join sucks

One of the main reasons RDBMS users resist switching to NoSQL is related to the complexity of their RDBMS model:

NoSQL products are great for Big Data and big scale, but...

Page 6: Austin Data Geeks - Why relationships are cool but join sucks


…can it handle complexity?

Page 7: Austin Data Geeks - Why relationships are cool but join sucks

What is the NoSQL answer for managing complex domains?

Key-Value stores?
Column-based?
Document databases?
Graph databases!

No relationship support

Page 8: Austin Data Geeks - Why relationships are cool but join sucks

Why don’t most NoSQL products support relationships between entities?

Page 9: Austin Data Geeks - Why relationships are cool but join sucks

To understand why, let’s see how a Relational DBMS manages them

Page 10: Austin Data Geeks - Why relationships are cool but join sucks

Domain: the super minimal “Selling App”

Registry system: Customer, Address
Order system: Order, Stock

Page 11: Austin Data Geeks - Why relationships are cool but join sucks

Domain: the super minimal “Selling App”

Registry system: Customer, Address
Order system: Order, Stock

How does a Relational DBMS manage the Customer-Address relationship?

Page 12: Austin Data Geeks - Why relationships are cool but join sucks

Relational World: 1-1 Relationships

JOIN Customer.Address -> Address.Id

Customer (Id is the primary key, Address is a foreign key)

Id  Name   Address
10  Luca   34
11  Jill   44
34  John   54
56  Mark   66
88  Steve  68

Address (Id is the primary key)

Id  Location
34  Rome
44  London
54  Moscow
66  New Mexico
68  Palo Alto

Page 13: Austin Data Geeks - Why relationships are cool but join sucks

Relational World: 1-N Relationships

Inverse JOIN Address.Customer -> Customer.Id

Customer

Id  Name
10  Luca
11  Jill
34  John
56  Mark
88  Steve

Address

Id  Customer  Location
24  10        Rome
33  10        London
44  34        Moscow
66  56        Cologne
68  88        Palo Alto

Page 14: Austin Data Geeks - Why relationships are cool but join sucks

Relational World: N-M Relationships

Additional table with 2 JOINs: (1) CustomerAddress.Id -> Customer.Id and (2) CustomerAddress.Address -> Address.Id

Customer

Id  Name
10  Luca
11  Jill
34  John
56  Mark
88  Steve

Address

Id  Location
24  Rome
33  London
44  Moscow
66  Cologne
68  Palo Alto

CustomerAddress

Id  Address
10  24
10  33
34  44

Page 15: Austin Data Geeks - Why relationships are cool but join sucks


What’s wrong with the Relational Model?

Page 16: Austin Data Geeks - Why relationships are cool but join sucks

The JOIN is evil!

Customer

Id  Name
10  Luca
11  Jill
34  John
56  Mark
88  Steve

Address

Id  Location
24  Rome
33  London
44  Moscow
66  Cologne
68  Palo Alto

CustomerAddress

Id  Address
10  24
10  33
34  24

These JOINs are all executed every time you traverse a relationship
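To make the cost concrete, here is a minimal Python sketch of resolving the N-M relationship above by key lookups. The rows come from the slide; using Python dicts as stand-ins for indexed tables and the helper name `locations_of` are assumptions made for illustration, not how any particular RDBMS executes a join.

```python
# Toy tables copied from the slide. The dicts stand in for indexed tables;
# a real RDBMS would use its own index structure (an assumption here).
customers = {10: "Luca", 11: "Jill", 34: "John", 56: "Mark", 88: "Steve"}
addresses = {24: "Rome", 33: "London", 44: "Moscow", 66: "Cologne", 68: "Palo Alto"}
customer_address = [(10, 24), (10, 33), (34, 24)]  # (customer Id, Address)


def locations_of(customer_name):
    """Resolve customer -> locations the relational way, counting key lookups."""
    lookups = 0
    locations = []
    for customer_id, address_id in customer_address:  # scan the join table
        lookups += 1  # JOIN 1: look up the customer row by key
        if customers[customer_id] != customer_name:
            continue
        lookups += 1  # JOIN 2: look up the address row by key
        locations.append(addresses[address_id])
    return locations, lookups


locations, lookups = locations_of("Luca")
print(locations, lookups)
```

Every call repeats the same lookups, even though the relationships themselves have not changed since the rows were inserted.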

Page 17: Austin Data Geeks - Why relationships are cool but join sucks

A JOIN means searching for a key in another table

The usual way to improve performance is to index all the keys

Indexing speeds up searches, but slows down inserts, updates and deletes

Page 18: Austin Data Geeks - Why relationships are cool but join sucks

So a JOIN is essentially a lookup into an index

This is done for every single join!

If you traverse hundreds of relationships, you’re executing hundreds of JOINs

Page 19: Austin Data Geeks - Why relationships are cool but join sucks

Index lookup: is it really that fast?

Page 20: Austin Data Geeks - Why relationships are cool but join sucks

Index Lookup: how does it work?

Level 1: A-Z
Level 2: A-L | M-Z

Imagine an address book where we want to find Luca’s phone number

Page 21: Austin Data Geeks - Why relationships are cool but join sucks

Index Lookup: how does it work?

Level 1: A-Z
Level 2: A-L | M-Z
Level 3: A-D | E-L | M-R | S-Z

Index algorithms are all similar and based on balanced trees

Page 22: Austin Data Geeks - Why relationships are cool but join sucks

Index Lookup: how does it work?

Level 1: A-Z
Level 2: A-L | M-Z
Level 3: A-D | E-L | M-R | S-Z
Level 4: A-B | C-D | E-G | H-L

Page 23: Austin Data Geeks - Why relationships are cool but join sucks

Index Lookup: how does it work?

Level 1: A-Z
Level 2: A-L | M-Z
Level 3: A-D | E-L | M-R | S-Z
Level 4: A-B | C-D | E-G | H-L
Level 5: E-F | G | H-J | K-L

Page 24: Austin Data Geeks - Why relationships are cool but join sucks

Index Lookup: how does it work?

Level 1: A-Z
Level 2: A-L | M-Z
Level 3: A-D | E-L | M-R | S-Z
Level 4: A-B | C-D | E-G | H-L
Level 5: E-F | G | H-J | K-L
Leaf: Luca

Found! This lookup took 5 steps. The number of steps grows with the logarithm of the number of indexed records: a balanced binary tree over a million records is about 20 levels deep.
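The address-book walk can be sketched in Python. This is a minimal illustration, assuming a sorted list searched by bisection stands in for the balanced tree; real database indexes are usually B-trees with many keys per node, so they need fewer levels. `lookup_steps` is a hypothetical helper, not part of any database API.

```python
import math


def lookup_steps(sorted_keys, target):
    """Binary search that returns how many comparisons it took to find target."""
    low, high, steps = 0, len(sorted_keys) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_keys[mid] == target:
            return steps
        if sorted_keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    raise KeyError(target)


for size in (1_000, 1_000_000):
    keys = list(range(size))
    # The last key is one of the slowest to reach; the bound is ceil(log2(size + 1)).
    print(size, lookup_steps(keys, size - 1), math.ceil(math.log2(size + 1)))
```

The step count grows slowly with the number of keys, but it is paid again on every lookup, and a query that joins many rows performs many lookups.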

Page 25: Austin Data Geeks - Why relationships are cool but join sucks

Can you imagine how many steps a lookup operation takes for an index with billions of records?

Page 26: Austin Data Geeks - Why relationships are cool but join sucks

And this JOIN is executed for every involved table, multiplied by all scanned records!

Page 27: Austin Data Geeks - Why relationships are cool but join sucks

Querying more tables can easily produce millions of JOINs/Lookups!

Here’s the rule: more entries = more lookup steps = slower queries

Page 28: Austin Data Geeks - Why relationships are cool but join sucks

This is why the query performance of your database suffers as the database becomes bigger, and bigger, and bigger!

Page 29: Austin Data Geeks - Why relationships are cool but join sucks

What about Document Databases like MongoDB?

Page 30: Austin Data Geeks - Why relationships are cool but join sucks

How MongoDB manages relationships:

{
  "_id" : "292846512",
  "type" : "Order",
  "number" : 1223,
  "customer" : "123456789"
}

Page 31: Austin Data Geeks - Why relationships are cool but join sucks

MongoDB uses the same RDBMS approach:

it stores the _id of the connected documents. At run-time, it looks up the _id by using an index.

Page 32: Austin Data Geeks - Why relationships are cool but join sucks


Is there a better way to manage relationships?

Page 33: Austin Data Geeks - Why relationships are cool but join sucks

“A graph database is any storage system that provides index-free adjacency”

- Marko Rodriguez (author of TinkerPop Blueprints)

Page 34: Austin Data Geeks - Why relationships are cool but join sucks


How does a GraphDB manage index-free relationships?

Page 35: Austin Data Geeks - Why relationships are cool but join sucks

Every developer knows the Relational Model, but who knows the Graph one?

Page 36: Austin Data Geeks - Why relationships are cool but join sucks

Back to school: Graph Theory crash course

Page 37: Austin Data Geeks - Why relationships are cool but join sucks

Basic Graph

Luca --Likes--> Austin

Page 38: Austin Data Geeks - Why relationships are cool but join sucks

Property Graph Model*

Vertex Luca
  name: Luca
  surname: Garulli
  company: Orient Technologies

Vertex Austin
  population: 1,900,000

Edge Likes (Luca -> Austin)
  since: 2014

Vertices and Edges can have properties

Edges are directed

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

Page 39: Austin Data Geeks - Why relationships are cool but join sucks

1-N relationships

Luca --Likes (since: 2014)--> Austin
Luca --Visited (when: [2013, 2014])--> Austin

An Edge connects only 2 vertices

Use multiple edges to represent 1-N and N-M relationships

Page 40: Austin Data Geeks - Why relationships are cool but join sucks

Graph Example

Vertices: Luca, Austin, Austin Data Geeks
Edges: Likes, Visited, IsMemberOf, Hosts

Page 41: Austin Data Geeks - Why relationships are cool but join sucks


Congrats! This is your diploma in «Graph Theory»

Page 42: Austin Data Geeks - Why relationships are cool but join sucks

Graph theory is so simple, yet so powerful

Page 43: Austin Data Geeks - Why relationships are cool but join sucks

Let’s go back to the Graph Stuff

How does OrientDB manage relationships?

Page 44: Austin Data Geeks - Why relationships are cool but join sucks

OrientDB: traverse a relationship

Luca (vertex), RID = #13:35
  label : ‘Customer’
  name : ‘Luca’

Rome (vertex), RID = #13:100
  label : ‘City’
  name : ‘Rome’

The Record ID (RID) is the physical position

Page 45: Austin Data Geeks - Why relationships are cool but join sucks

OrientDB: traverse a relationship

Luca (vertex), RID = #13:35
  out : [#14:54]
  label : ‘Customer’
  name : ‘Luca’

Lives (edge), RID = #14:54
  out : [#13:35]
  in : [#13:100]
  label : ‘Lives’

Rome (vertex), RID = #13:100
  in : [#14:54]
  label : ‘City’
  name : ‘Rome’

The Edge’s RID is saved inside both vertices, as «out» and «in»

Page 46: Austin Data Geeks - Why relationships are cool but join sucks

OrientDB: traverse -> outgoing

Luca (vertex), RID = #13:35
  out : [#14:54]
  label : ‘Customer’
  name : ‘Luca’

Lives (edge), RID = #14:54
  out : [#13:35]
  in : [#13:100]
  label : ‘Lives’

Rome (vertex), RID = #13:100
  in : [#14:54]
  label : ‘City’
  name : ‘Rome’

Page 47: Austin Data Geeks - Why relationships are cool but join sucks

OrientDB: traverse <- incoming

Luca (vertex), RID = #13:35
  out : [#14:54]
  label : ‘Customer’
  name : ‘Luca’

Lives (edge), RID = #14:54
  out : [#13:35]
  in : [#13:100]
  label : ‘Lives’

Rome (vertex), RID = #13:100
  in : [#14:54]
  label : ‘City’
  name : ‘Rome’
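The record layout above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the nested dict `clusters` stands in for on-disk clusters addressed by position, the helpers `rid`, `load`, `outgoing` and `incoming` are hypothetical names, and the edge stores a single RID per end instead of a list.

```python
# A RID such as "#13:35" means cluster 13, position 35. The nested dict is a
# stand-in for positional storage (an assumption made for this sketch).
clusters = {13: {}, 14: {}}


def rid(text):
    """Split a RID like '#13:35' into (cluster, position)."""
    cluster, position = text.lstrip("#").split(":")
    return int(cluster), int(position)


def load(text):
    """Fetch a record directly by its position; no key search is involved."""
    cluster, position = rid(text)
    return clusters[cluster][position]


clusters[13][35] = {"label": "Customer", "name": "Luca", "out": ["#14:54"], "in": []}
clusters[13][100] = {"label": "City", "name": "Rome", "out": [], "in": ["#14:54"]}
clusters[14][54] = {"label": "Lives", "out": "#13:35", "in": "#13:100"}


def outgoing(vertex_rid):
    """Follow each outgoing edge to the vertex on its «in» end."""
    return [load(load(edge)["in"]) for edge in load(vertex_rid)["out"]]


def incoming(vertex_rid):
    """Follow each incoming edge back to the vertex on its «out» end."""
    return [load(load(edge)["out"]) for edge in load(vertex_rid)["in"]]


print([v["name"] for v in outgoing("#13:35")])
print([v["name"] for v in incoming("#13:100")])
```

Each hop reads two records by position (the edge, then the vertex), so the cost of a hop does not depend on how many other records the database holds.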

Page 48: Austin Data Geeks - Why relationships are cool but join sucks

A GraphDB handles relationships as a physical LINK to the record, assigned only when the edge is created

VS

RDBMS computes the relationship every time you query a database

Isn’t that crazy?!

Page 49: Austin Data Geeks - Why relationships are cool but join sucks

This means jumping from an O(log N) algorithm to a near O(1)

With OrientDB, the traversing time is not affected by database size!

This is huge in the BigData age

Page 50: Austin Data Geeks - Why relationships are cool but join sucks

OrientDB: an Open Source (Apache licensed) document-graph NoSQL DBMS

Page 51: Austin Data Geeks - Why relationships are cool but join sucks

OrientDB in the Blueprints micro-benchmark, on common hw, with a hot cache, traverses 29.6 million records in less than 5 seconds

about 6 million nodes traversed per sec!

Do not try this at home with an RDBMS*!

* unless you live in Google’s server farm

Page 52: Austin Data Geeks - Why relationships are cool but join sucks

Create the graph in SQL

$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.6.1 (www.orientdb.org)
Type 'help' to display all the commands supported.

orientdb> create vertex Customer set name = 'Luca'
Created vertex #13:35 in 0.03 secs

orientdb> create vertex Address set name = 'Rome'
Created vertex #13:100 in 0.02 secs

orientdb> create edge Lives from #13:35 to #13:100
Created edge #14:54 in 0.02 secs

Page 53: Austin Data Geeks - Why relationships are cool but join sucks

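Create the graph in Java

import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

// Open (or create) a local graph database
Graph graph = new OrientGraph("local:/tmp/db/graph");

// Create the two vertices and set their properties
Vertex luca = graph.addVertex("class:Customer");
luca.setProperty("name", "Luca");

Vertex rome = graph.addVertex("class:Address");
rome.setProperty("name", "Rome");

// Connect them with a "Lives" edge going from Luca to Rome
Edge edge = luca.addEdge("Lives", rome);

graph.shutdown();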

Page 54: Austin Data Geeks - Why relationships are cool but join sucks

Query the graph in SQL

orientdb> select in('Lives') from Address where name = 'Rome'

---+------+---------+------+----------+----+
  #| RID  |@class   |label |out_Lives |in  |
---+------+---------+------+----------+----+
  0| 13:35|Customer |Luca  |[#14:54]  |    |
---+------+---------+------+----------+----+
1 item(s) found. Query executed in 0.007 sec(s).

in('Lives') returns the incoming vertices

Page 55: Austin Data Geeks - Why relationships are cool but join sucks

More on query power

orientdb> select sum( out('Order').total ) from Customer where name = 'Luca'

orientdb> traverse both('Friend') from Customer while $depth <= 7

orientdb> select from ( traverse both('Friend') from Customer while $depth <= 7 ) where @class = 'Customer' and city.name = 'Austin'

Page 56: Austin Data Geeks - Why relationships are cool but join sucks

Query vs traversal

With a well-connected database in the form of a Super Graph, you can cross records instead of querying them!

All you need is a few “Root Vertices” to start traversing

Page 57: Austin Data Geeks - Why relationships are cool but join sucks

Query vs traversal

Diagram vertices: Customers, Special Customers, Stocks; Luca, Mark, Jill; Order 2332, Order 8834; White Soap

This is a root vertex

Page 58: Austin Data Geeks - Why relationships are cool but join sucks

Root Vertices can be enriched by Meta Graphs to decorate Graphs with additional information and make the retrieval easier/faster

Page 59: Austin Data Geeks - Why relationships are cool but join sucks

Temporal based Meta Graph

Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 09:00, Hour 9/4/2013 10:00

Orders: Order 2332, Order 2333, Order 2334

Page 60: Austin Data Geeks - Why relationships are cool but join sucks

Location based Meta Graph

Location -> Country Italy -> Region Lazio -> State RM -> City Rome, City Fiumicino

Orders: Order 2332, Order 2333, Order 2334

Page 61: Austin Data Geeks - Why relationships are cool but join sucks

Mix & Merge graphs

Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 09:00, Hour 9/4/2013 10:00

Location -> Country Italy -> Region Lazio -> State RM -> City Rome, City Fiumicino

Orders: Order 2332, Order 2333, Order 2334

Page 62: Austin Data Geeks - Why relationships are cool but join sucks

Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 09:00, Hour 9/4/2013 10:00

Location -> Country Italy -> Region Lazio -> State RM -> City Rome, City Fiumicino

Orders: Order 2332, Order 2333, Order 2334

Get all the orders sold in “Rome” on 9/4/2013 at 10:00

Page 63: Austin Data Geeks - Why relationships are cool but join sucks

Start from Calendar, look for Hour 10:00

Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 10:00

Page 64: Austin Data Geeks - Why relationships are cool but join sucks

Start from Calendar, look for Hour 10:00

Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 10:00

Found 2 Orders, now filter by incoming edges

Page 65: Austin Data Geeks - Why relationships are cool but join sucks

Start from Calendar, look for Hour 10:00

Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 10:00

Only “Order 2333” has incoming connections with “Rome”
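The two-step traversal (follow the hour's edges, then filter by each order's incoming edges) can be sketched in Python. The slides confirm only that the 10:00 hour reaches two orders and that Order 2333 is the one linked to Rome; every other edge below, and the names `hour_out`, `order_in` and `orders_at`, are assumptions made for the example.

```python
# Outgoing edges stored on the Hour vertices of the Calendar meta graph.
hour_out = {
    "Hour 9/4/2013 09:00": ["Order 2332"],
    "Hour 9/4/2013 10:00": ["Order 2333", "Order 2334"],
}

# Incoming edges stored on each Order vertex (partly assumed, see above).
order_in = {
    "Order 2332": {"Hour 9/4/2013 09:00", "City Fiumicino"},
    "Order 2333": {"Hour 9/4/2013 10:00", "City Rome"},
    "Order 2334": {"Hour 9/4/2013 10:00", "City Fiumicino"},
}


def orders_at(hour, city):
    """Start from the Hour vertex, then keep orders that the city also points to."""
    candidates = hour_out[hour]  # step 1: traverse from the Calendar side
    return [order for order in candidates if city in order_in[order]]  # step 2


print(orders_at("Hour 9/4/2013 10:00", "City Rome"))  # ['Order 2333']
```

Only the orders attached to the chosen hour are ever touched; the rest of the orders are never scanned.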

Page 66: Austin Data Geeks - Why relationships are cool but join sucks

Or start from Location, look for Rome

Location -> Country Italy -> Region Lazio -> State RM -> City Rome

Page 67: Austin Data Geeks - Why relationships are cool but join sucks

Start from Location, look for Rome

Location -> Country Italy -> Region Lazio -> State RM -> City Rome

Orders: Order 2332, Order 2333, Order 2334

Page 68: Austin Data Geeks - Why relationships are cool but join sucks

Recommendation system

Vertices: Luca, Jill, Enrico
Edges: Friend (x2)

Page 69: Austin Data Geeks - Why relationships are cool but join sucks

Recommendation system

People: Luca, Jill, Enrico
Restaurants: Salt Lick, Hut’s Burgers, Pappasito's, Franklin’s BBQ
Edges: Friend (x2), Eats (x5)

Page 70: Austin Data Geeks - Why relationships are cool but join sucks

Recommendation system

People: Luca, Jill, Enrico
Restaurants: Salt Lick, Hut’s Burgers, Pappasito's, Franklin’s BBQ
Edges: Friend (x2), Eats (x4)

select both('Friend') from Person where name = 'Luca'

Page 71: Austin Data Geeks - Why relationships are cool but join sucks

Recommendation system

People: Luca, Jill, Enrico
Restaurants: Da Carlone, La Mediterranea, Meridionale, Eaitaly, Salt Lick, Hut’s Burgers, Pappasito's, Franklin’s BBQ
Edges: Friend (x2), Eats (x4)

select both('Friend').out('Eats') from Person where name = 'Luca'

Page 72: Austin Data Geeks - Why relationships are cool but join sucks

Recommendation system

People: Luca, Jill, Enrico
Restaurants: Salt Lick, Hut’s Burgers, Pappasito's, Franklin’s BBQ
Edges: Friend (x2), Eats (x4)

select both('Friend').out('Eats') from Person where name = 'Luca'
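What the both('Friend').out('Eats') query does can be mimicked in Python as two hops over edge lists. The Friend edges follow the slide; which person eats at which restaurant is not recoverable from the transcript, so the `eats` assignments below are hypothetical, as are the helper names `both`, `out` and `friends_eat`.

```python
# Friend edges from the slide (direction is ignored by both()).
friend = [("Luca", "Jill"), ("Luca", "Enrico")]

# Eats edges: a hypothetical assignment, for illustration only.
eats = [
    ("Luca", "Salt Lick"),
    ("Jill", "Hut's Burgers"),
    ("Jill", "Pappasito's"),
    ("Enrico", "Pappasito's"),
    ("Enrico", "Franklin's BBQ"),
]


def both(vertex, edges):
    """Neighbours reached through an edge in either direction."""
    neighbours = []
    for source, target in edges:
        if source == vertex:
            neighbours.append(target)
        elif target == vertex:
            neighbours.append(source)
    return neighbours


def out(vertex, edges):
    """Neighbours reached through outgoing edges only."""
    return [target for source, target in edges if source == vertex]


def friends_eat(person):
    """Two hops: person -> friends -> places the friends eat at, without duplicates."""
    places = []
    for friend_name in both(person, friend):
        for place in out(friend_name, eats):
            if place not in places:
                places.append(place)
    return places


print(friends_eat("Luca"))
```

The scans over the edge lists are a simplification; in a graph database each vertex would hold its own edge references, as in the RID example earlier in the deck.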

Page 73: Austin Data Geeks - Why relationships are cool but join sucks

Let’s move like a Spider on the web

Page 74: Austin Data Geeks - Why relationships are cool but join sucks


Page 75: Austin Data Geeks - Why relationships are cool but join sucks

OrientDB is a Graph - Document DBMS

It provides the flexibility and speed of a document store combined with the advanced relationship features of a graph database

Page 76: Austin Data Geeks - Why relationships are cool but join sucks

Installation in 5 minutes

Download
Unzip
Run

You only need a JVM installed to run the server

Page 77: Austin Data Geeks - Why relationships are cool but join sucks


200,000 documents per second

(no index, multi-threads, on commodity hw)

Page 78: Austin Data Geeks - Why relationships are cool but join sucks

Schema-less

schema is not mandatory, relaxed model, collect heterogeneous documents all together

Page 79: Austin Data Geeks - Why relationships are cool but join sucks

Schema-full

schema with constraints on fields and validation rules

Customer.age > 17
Customer.address not null
Customer.surname is mandatory
Customer.email matches '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
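To show what these four rules check, here is a minimal Python sketch that validates a customer document against them. It is not OrientDB's schema API: the function name `violations` is hypothetical, the email pattern is applied case-insensitively (an assumption, since the pattern itself lists only upper-case letters), and the age and email rules are enforced only when the field is present.

```python
import re

# Pattern copied from the slide; IGNORECASE is an assumption so that
# lower-case addresses are accepted.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)


def violations(customer):
    """Return the list of rules that a customer document breaks."""
    problems = []
    if not customer.get("surname"):
        problems.append("surname is mandatory")
    if customer.get("address") is None:
        problems.append("address must not be null")
    if "age" in customer and not customer["age"] > 17:
        problems.append("age must be > 17")
    if "email" in customer and not EMAIL.fullmatch(customer["email"]):
        problems.append("email has an invalid format")
    return problems


ok = {"surname": "Garulli", "address": "Rome", "age": 30, "email": "luca@example.com"}
bad = {"age": 16, "address": None, "email": "not-an-email"}
print(violations(ok))
print(violations(bad))
```

In schema-full mode the database applies checks of this kind on every write, so invalid documents are rejected before they are stored.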

Page 80: Austin Data Geeks - Why relationships are cool but join sucks

Schema-mixed

schema with mandatory and optional fields + constraints = the best of schema-less and schema-full modes

Page 81: Austin Data Geeks - Why relationships are cool but join sucks

ACID Transactions

db.begin();
try {
  // your code
  ...
  db.commit();
} catch( Exception e ) {
  db.rollback();
}

Page 82: Austin Data Geeks - Why relationships are cool but join sucks

SQL

select * from employee where name like '%Jay%'

Page 83: Austin Data Geeks - Why relationships are cool but join sucks

Why reinvent yet another language when almost all developers already know SQL?

OrientDB uses SQL but extends it, adding new operators for graph manipulation

Page 84: Austin Data Geeks - Why relationships are cool but join sucks

For most of the queries a programmer uses every day, SQL is simpler, more readable and more compact than Scripting (Map/Reduce)

Page 85: Austin Data Geeks - Why relationships are cool but join sucks

Native JSON

ODocument doc = new ODocument().fromJSON(
  "{ '@rid' : '26:10', '@class' : 'Developer', 'name' : 'Luca', 'surname' : 'Garulli', 'out' : [ #10:33, #10:232 ] }" );

Page 86: Austin Data Geeks - Why relationships are cool but join sucks

Feature matrix

FEATURES | ORIENTDB | MONGODB | NEO4J | MYSQL (RDBMS)

Operational Database X X X
Graph Database X X
Document Database X X
Object-Oriented Concepts X
Schema-full, Schema-less, Schema mix X
User and Role & Record Level Security X
Record Level Locking X X X
SQL X X
ACID Transaction X X X
Relationships (Linked Documents) X X X
Custom Data Types X X X
Embedded Documents X X
Multi-Master Zero Configuration Replication X
Sharding X X
Server Side Functions X X X
Native HTTP Rest/JSON X X
Embeddable with No Restrictions X

Page 87: Austin Data Geeks - Why relationships are cool but join sucks

Free

Open Source Apache 2 license: free for any purpose, even commercial

Page 88: Austin Data Geeks - Why relationships are cool but join sucks


Show time!

Page 89: Austin Data Geeks - Why relationships are cool but join sucks

Ready to try OrientDB?

Start from the ETL

Page 90: Austin Data Geeks - Why relationships are cool but join sucks

OrientDB ETL

{
  "extractor" : {
    "jdbc": {
      "driver": "com.mysql.jdbc.Driver",
      "url": "jdbc:mysql://localhost/mysqlcrm",
      "userName": "root",
      "userPassword": "",
      "query": "select * from Client"
    }
  },
  "transformers" : [
    { "vertex": { "class": "Client" } }
  ],
  "loader" : {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/orientdbcrm",
      "dbAutoCreate": true
    }
  }
}

Page 91: Austin Data Geeks - Why relationships are cool but join sucks


./oetl.sh mydb.json

Page 92: Austin Data Geeks - Why relationships are cool but join sucks

Drivers

Contribute by building or improving a library!

Page 93: Austin Data Geeks - Why relationships are cool but join sucks

Professional Services by Orient Technologies LTD (London, UK)

Development Support
Production Support
Training
Consultancy

Page 94: Austin Data Geeks - Why relationships are cool but join sucks

We’re looking for worldwide partners

Write to [email protected]

Page 95: Austin Data Geeks - Why relationships are cool but join sucks

http://www.orientechnologies.com/why-orientdb

Page 96: Austin Data Geeks - Why relationships are cool but join sucks

www.orientechnologies.com

Thanks!

Luca Garulli, Founder and CEO

www.twitter.com/lgarulli

