Introduction to graph databases GraphDays

Post on 12-Jul-2015


Graph All The Things: Introduction to Graph Databases

Neo4j GraphDays 2014, Chicago

Philip Rathle, VP of Products, Neo4j

@prathle #neo4j

[Figure: diagram with labeled percentages: A 3.3%, B 38.4%, C 34.3%, D 3.8%, E 8.1%, F 3.9%, and five unlabeled items at 1.8% each]

INDUSTRY TRENDS: GRAPHS TRANSFORMED CONSUMER WEB

Use of Relationship Information in The Consumer Web

Ref: http://www.gartner.com/id=2081316

Interest Graph

Payment Graph

Intent Graph

Mobile Graph

Consumer Web Giants Depend on Five Graphs: Gartner’s “5 Graphs”

Social Graph

GARTNER’S 5 GRAPHS OF CONSUMER WEB: SUSTAINABLE COMPETITIVE DIFFERENTIATION COMES FROM MASTERING 5 GRAPHS

Key-Value (Riak, Redis, membase)

  0x235C   Philip
  0xCD21   Neo4j Chicago
  0x2014   [PPR,RB,NL]
  0x3821   [CHI, SFO, BOS]
  0x3890   B75DD108A

Column Family (Cassandra, HBase)

  Key      Name            UID   Members       Groups          Photo
  0x235C   Philip          PPR   -             CHI, SFO, BOS   B75DD108A893A
  0xCD21   Neo4j Chicago   CHI   PPR, RB, NL   -               218758D88E901

Document DB (MongoDB, CouchDB)

  0x235C   {name:Philip, UID: PPR, Groups: [CHI,SFO,BOS]}
  0xCD21   {name:Neo4j Chicago, UID: PPR, Members:[PPR,RB,NL], where:{city:Chicago, State: IL}}

Graph DB (Neo4j)

  NI    name:Neo4j Chicago, UID: CHI, Photo: 218758D88E901
  ABK   name:Philip, UID: PPR, Photo: B75DD108A893A
  MEMBER since: 2011

UNLOCKING THE POTENTIAL OF RELATIONSHIPS IN DATA

A GRAPH DATABASE IS PURPOSE-BUILT FOR:

When your business depends on Relationships in Data

THE PROPERTY GRAPH MODEL

Ann -Loves-> Dan

(Ann) -[:LOVES]-> (Dan)

(:Person {name:"Ann"}) -[:LOVES]-> (:Person {name:"Dan"})

Node (label, property)   Relationship (type)   Node (label, property)
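
The building blocks named on this slide (nodes with labels and properties, relationships with a type) can be sketched in a few lines of Python. This is only an illustrative in-memory model, not a description of how Neo4j stores data; the class and field names (`Node`, `Relationship`, `labels`, `properties`, `rel_type`) are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: a set of labels plus key/value properties."""
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection from one node to another."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


# (:Person {name:"Ann"}) -[:LOVES]-> (:Person {name:"Dan"})
ann = Node(frozenset({"Person"}), {"name": "Ann"})
dan = Node(frozenset({"Person"}), {"name": "Dan"})
loves = Relationship(ann, "LOVES", dan)
```

Relationships carry their own `properties` dictionary in this sketch, since the model lets relationships hold properties too (for example `since: 2011` on a MEMBER relationship).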

CYPHER

Query: Whom does Ann love?

MATCH (:Person {name:"Ann"})-[:LOVES]->(whom)
RETURN whom
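
To make the query concrete, here is a rough Python sketch of what the MATCH ... RETURN pattern asks for: start from nodes labeled Person whose name is Ann, follow outgoing LOVES relationships, and return the node at the other end. The dictionary-and-tuple graph layout and the `whom_does_love` helper are assumptions for illustration only; this is not how Cypher is executed.

```python
# Nodes: id -> (labels, properties). Relationships: (start_id, type, end_id).
nodes = {
    1: ({"Person"}, {"name": "Ann"}),
    2: ({"Person"}, {"name": "Dan"}),
}
relationships = [(1, "LOVES", 2)]


def whom_does_love(name):
    """Mimic MATCH (:Person {name: name})-[:LOVES]->(whom) RETURN whom."""
    results = []
    for start_id, rel_type, end_id in relationships:
        labels, props = nodes[start_id]
        if rel_type == "LOVES" and "Person" in labels and props.get("name") == name:
            # Collect the properties of the node bound to `whom`.
            results.append(nodes[end_id][1])
    return results


print(whom_does_love("Ann"))
```

The sketch scans every relationship, which is fine for two nodes but is exactly the kind of work a graph database is designed to avoid by following relationships directly from the starting node.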

UNDER THE HOOD

MATCH (:Person {name:"Ann"})-[:LOVES]->(whom)
RETURN whom

cypher
native graph processing
native storage

BUSINESS & PROJECT IMPACT

#1: EASIER TO UNDERSTAND COMPLEX MODELS

“Find all sushi restaurants in NYC that my friends like”

“Find all direct reports and how many they manage, up to 3 levels down”

#2: EASIER TO EXPRESS COMPLEX QUERIES

Example HR Query:

MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total

(SELECT T.directReportees AS directReportees, sum(T.count) AS count
 FROM (
   SELECT manager.pid AS directReportees, 0 AS count
   FROM person_reportee manager
   WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
   UNION
   SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count
   FROM person_reportee manager
   WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
   GROUP BY directReportees
   UNION
   SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count
   FROM person_reportee manager
   JOIN person_reportee reportee ON manager.directly_manages = reportee.pid
   WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
   GROUP BY directReportees
   UNION
   SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count
   FROM person_reportee manager
   JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
   JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
   WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
   GROUP BY directReportees
 ) AS T
 GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
 FROM (
   SELECT manager.directly_manages AS directReportees, 0 AS count
   FROM person_reportee manager
   WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
   UNION
   SELECT reportee.pid AS directReportees, count(reportee.directly_manages) AS count
   FROM person_reportee manager
   JOIN person_reportee reportee ON manager.directly_manages = reportee.pid
   WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
   GROUP BY directReportees
   UNION
   SELECT L1Reportees.pid AS directReportees, count(L2Reportees.directly_manages) AS count
   FROM person_reportee manager
   JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
   JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
   WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
   GROUP BY directReportees
 ) AS T
 GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
 FROM (
   SELECT reportee.directly_manages AS directReportees, 0 AS count
   FROM person_reportee manager
   JOIN person_reportee reportee ON manager.directly_manages = reportee.pid
   WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
   GROUP BY directReportees
   UNION
   SELECT L2Reportees.pid AS directReportees, count(L2Reportees.directly_manages) AS count
   FROM person_reportee manager
   JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
   JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
   WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
   GROUP BY directReportees
 ) AS T
 GROUP BY directReportees)
UNION
(SELECT L2Reportees.directly_manages AS directReportees, 0 AS count
 FROM person_reportee manager
 JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
 JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
 WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
)

SAME QUERY IN SQL ( ! ! )
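
The variable-length patterns in the Cypher HR query (`*0..3` and `*1..3`) amount to a depth-bounded traversal. The sketch below mimics that idea in Python over a small, made-up org chart. It assumes the hierarchy is a tree (each person has one manager), so every reachable person is found along exactly one path; the `manages` data and the `within` and `report_totals` helpers are hypothetical names for this illustration.

```python
from collections import deque

# Hypothetical org chart: manager -> people they directly manage.
manages = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol"],
    "Carol": ["Dave"],
}


def within(start, min_hops, max_hops):
    """People reachable from `start` via min_hops..max_hops MANAGES hops."""
    found = []
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth >= min_hops:
            found.append(person)
        if depth < max_hops:
            for report in manages.get(person, []):
                queue.append((report, depth + 1))
    return found


def report_totals(boss):
    """Roughly (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report)."""
    totals = {}
    for sub in within(boss, 0, 3):
        count = len(within(sub, 1, 3))
        if count:  # like MATCH, skip people with no reports at all
            totals[sub] = count
    return totals


print(report_totals("John Doe"))
```

The depth bounds are plain parameters here, just as they are in the Cypher pattern, whereas the SQL version needs an extra self-join and UNION branch for each additional level.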

PERFORMANCE AT SCALE

RDBMS/Other vs. Native Graph Database

[Figure: Response Time (y-axis) vs. Connectedness of Data Set (x-axis)]

RDBMS / Other NoSQL: # Hops: 0-2, Degree: < 3, Size: Thousands
Neo4j: # Hops: Tens to Hundreds, Degree: Thousands+, Size: Billions+

1000x faster

#3: PERFORMANCE

DATABASE   # PEOPLE    QUERY TIME (MS)
MySQL      1,000       2,000
Neo4j      1,000       2
Neo4j      1,000,000   2

Business Impact: Move Faster

“The whole design, development, QA, and release process for CrunchBase Events was a total of 2 weeks.”

“The ability to iterate that quickly is a mammoth step up for us. In CrunchBase 1.0 (MySQL), it probably would have taken 2 months.”

- Kurt Freytag, CTO, CrunchBase

[Figure: chart of Total Dollar Amount and Transaction Count, with two regions marked "Investigate"]

Business Impact: Invent Faster

“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code.”

- Volker Pacher, Senior Developer, eBay

Business Impact: Run Faster

Neo Technology, Inc Confidential

Real-Time/ OLTP

Offline/ Batch

Connected Queries Enable Real-Time Analytics

GRAPHS ARE TRANSFORMING THE WORLD

Core Industries & Use Cases

Industries:
WEB / ISV
Financial Services
Telecommunications
Health Care & Life Sciences
Web Social, HR & Recruiting
Media & Publishing
Energy, Services, Automotive, Gov’t, Logistics, Education, Gaming, Other

Use cases:
Network & Data Center Management
Master Data Management
Social
GEO
Finance
Recommendations
Identity & Access Mgmt
Search & Discovery
BI, CRM, Impact Analysis, Fraud Detection, Resource Optimization, etc.

Neo4j Adoption Snapshot

GRAPH DATABASES - THE FASTEST GROWING DBMS CATEGORY

Source: http://db-engines.com/en/ranking/graph+dbms

[Figure: % of Enterprises using Graph Databases (y-axis, 0% to 30%) by year (x-axis: 2011, 2014, 2017); data labels: 0%, 2.5%, 25%]

“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”

Sources
• Forrester TechRadar™: Enterprise DBMS, Feb 13 2014 (http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801)
• Dataversity Mar 31 2014: “Deconstructing NoSQL: Analysis of a 2013 Survey on the Use, Production and Assessment of NoSQL Technologies in the Enterprise” (http://www.dataversity.net)
• Neo Technology customer base in 2011 and 2014
• Estimation of other graph vendors’ customer base in 2011 and 2014 based on best available intelligence

“25% of survey respondents said they plan to use Graph databases in the future.”

GRAPH DATABASES - POWERING THE ENTERPRISE

Ref: Gartner, ‘IT Market Clock for Database Management Systems, 2014,’ September 22, 2014, https://www.gartner.com/doc/2852717/it-market-clock-database-management

“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”

GRAPH DATABASES - CAN TRANSFORM YOUR BUSINESS

SUMMARY

When your business depends on Relationships in Data

Your Mission:

Connect.