RDBMS to Graph Webinar

Posted on 14-Apr-2017


RDBMS TO GRAPH

Webinar: Live from San Mateo, March 9, 2016

Data used to be stored like this: punch tape. Or punch cards. A horrible way to read and understand data, and impossible to easily index, cross-reference, or eliminate inconsistencies.

Then we started storing data in tables, and “relational” databases.

Sometimes those tables are human-readable.

But as soon as you normalize the data to eliminate duplication and inconsistencies, many fields start referencing auto-generated numerical foreign keys. And your data becomes difficult to understand and maintain without complicated JOIN queries.

[Diagram: three account holders linked through shared credit cards, bank accounts, an address, phone numbers, an SSN and unsecured loans]

Enter Graph Databases. The future is now.

Graph Databases, like Neo4j, store data in a much more logical way. A way that represents the real world, and prioritizes the representation, discoverability and maintainability of data relationships.

Intuitiveness, Speed, Agility

Intuitiveness

Speed

“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10-100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”

- Volker Pacher, Senior Developer

“Minutes to milliseconds” performance: queries up to 1000x faster than RDBMS or other NoSQL databases.


A Naturally Adaptive Model + A Query Language Designed for Connectedness = Agility

Cypher: a Typical Complex SQL Join vs. the Same Query using Cypher

MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total
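A minimal pure-Python sketch of what this query computes may help. Assumptions: the org chart is hypothetical (only "John Doe" comes from the query), it is a tree, and it lives in a plain dict rather than in Neo4j. `[:MANAGES*0..3]` collects everyone 0 to 3 hops below the boss (the boss included), and `count(report)` counts the people 1 to 3 hops below each of them.

```python
# Hypothetical org chart: manager -> direct reports. Assumed to be a tree.
MANAGES = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol", "Dave"],
    "Carol": ["Erin"],
    "Erin": ["Frank"],
}


def within_hops(graph, start, min_hops, max_hops):
    """People reachable from start by min_hops..max_hops MANAGES hops.

    On a tree each person is reached by exactly one path, so this list
    has one entry per matching path, like the rows Cypher would produce.
    """
    found = []
    frontier = [start]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            found.extend(frontier)
        # Move one MANAGES hop further down the hierarchy.
        frontier = [child for person in frontier for child in graph.get(person, [])]
    return found


def report_counts(graph, boss_name):
    """Mirror of: (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report)."""
    totals = {}
    for sub in within_hops(graph, boss_name, 0, 3):
        reports = within_hops(graph, sub, 1, 3)
        if reports:  # MATCH yields no row for a subordinate without reports
            totals[sub] = len(reports)
    return totals


print(report_counts(MANAGES, "John Doe"))
# {'John Doe': 5, 'Alice': 4, 'Carol': 2, 'Erin': 1}
```

Frank sits four levels below John Doe, so he is outside both ranges for John Doe but still counts toward Alice, Carol and Erin. On a tree, counting matched paths and counting people agree; on a general graph Cypher would count one row per matching path.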

Project Impact

• Less time writing queries: more time understanding the answers, leaving time to ask the next question
• Less time debugging queries: more time writing the next piece of code, and improved quality of the overall code base
• Code that's easier to read: faster ramp-up for new project members, improved maintainability & troubleshooting

ABOUT ME
• Developed web apps for 5 years, including e-commerce, business workflow and more

• Worked at Google for 8 years on Google Apps, Cloud Platform

• Technologies: Python, Java, BigQuery, Oracle, MySQL, OAuth

ryan@neo4j.com @ryguyrg

NEO4J USE CASES

Real Time Recommendations

Master Data Management

Fraud Detection

Identity & Access Management

Graph Based Search

Network & IT-Operations


GRAPH THINKING: Real Time Recommendations

[Diagram: a recommendation graph built from VIEWED and BOUGHT relationships]

Real-time recommendations come down to finding the relationships that are relevant for recommending a product or a service, which is exactly why Walmart is using Neo4j.

“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.”

Marcos Wada, Software Developer, Walmart


GRAPH THINKING: Master Data Management

[Diagram: an organization graph with MANAGES, LEADS, REGION and COLLABORATES relationships]

Master Data Management is about bringing together all the entities inside and outside an organization in order to understand the relationships between them.

Neo4j is at the heart of Cisco HMP: it is used for governance, as the single source of truth, and as a one-stop shop for all of Cisco's hierarchies.


Cisco uses Neo4j to power its content management, resources and knowledge-base articles for use by sales teams. It also powers product recommendations, to make sure customers get the full value of Cisco's offerings.

Although this project is focused on sales teams, another group has used Neo4j to power all of their helpdesk content.


GRAPH THINKING: Master Data Management

[Diagram: a helpdesk graph of Solution, SupportCase, KnowledgeBaseArticle and Message nodes]

Neo4j is the heart of Cisco’s Helpdesk Solution too.


GRAPH THINKING: Fraud Detection

[Diagram: a fraud graph with OPENED_ACCOUNT, HAS, IS_ISSUED and LIVES relationships]

Discovering fraud is another use case that is particularly well suited to graphs, because it's all about finding fraudulent patterns. Here we work with the top banks and insurance companies as well as many governments.

“Graph databases offer new methods of uncovering fraud rings and other sophisticated scams with a high-level of accuracy, and are capable of stopping advanced fraud scenarios in real-time.”

Gorka Sadowski, Cyber Security Expert
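Fraud rings often show up as account holders who share identifiers such as addresses, phone numbers or SSNs. One simple way to surface them is to group holders into connected components over those shared identifiers. The sketch below does that with a union-find structure; the data and the `min_size` threshold are hypothetical, and it illustrates the idea rather than how Neo4j or any bank actually detects fraud.

```python
# Hypothetical data: account holder -> identifiers used when opening accounts.
HOLDERS = {
    "Holder 1": {"address:12 Main St", "phone:555-0100"},
    "Holder 2": {"address:12 Main St", "ssn:SSN-2"},
    "Holder 3": {"phone:555-0199", "ssn:SSN-2"},
    "Holder 4": {"address:9 Elm Rd", "phone:555-0142"},
}


def find(parent, x):
    """Return the representative of x's group (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the groups containing a and b."""
    parent[find(parent, a)] = find(parent, b)


def suspected_rings(holders, min_size=2):
    """Group holders linked, directly or transitively, by a shared identifier."""
    parent = {h: h for h in holders}
    first_seen = {}  # identifier -> first holder observed using it
    for holder, identifiers in holders.items():
        for ident in identifiers:
            if ident in first_seen:
                union(parent, holder, first_seen[ident])
            else:
                first_seen[ident] = holder
    groups = {}
    for holder in holders:
        groups.setdefault(find(parent, holder), set()).add(holder)
    return [group for group in groups.values() if len(group) >= min_size]


print(suspected_rings(HOLDERS))
```

Holder 1 and Holder 3 share no identifier directly, but each shares one with Holder 2, so all three land in one group; Holder 4 stays on its own.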


GRAPH THINKING: Graph Based Search


[Diagram: a digital-asset graph with PUBLISH, INCLUDE, CREATE, CAPTURE, IN, SOURCE and USES relationships]

Uses Neo4j to manage the digital assets inside its next-generation in-flight entertainment system.


[Diagram: a network graph with BROWSES, CONNECTS, BRIDGES, ROUTES, POWERS, HOSTS and QUERIES relationships]

GRAPH THINKING: Network & IT-Operations

Dependency analysis, root cause analysis

Uses Neo4j for network topology analysis for big telco service providers
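Dependency analysis and root cause analysis both come down to walking the topology graph. A minimal sketch, assuming a hypothetical topology stored as a dict from each component to the components that depend on it (the names are made up, and this is not a Neo4j API): `impacted_by` finds everything downstream of a failure, and `root_cause_candidates` keeps the components whose failure alone would explain all observed symptoms.

```python
from collections import deque

# Hypothetical topology: component -> components that depend on it.
DEPENDENTS = {
    "power-feed-A": ["router-1", "server-2"],
    "router-1": ["switch-1"],
    "switch-1": ["server-1"],
    "server-1": ["web-app"],
    "server-2": ["db"],
    "db": ["web-app"],
    "web-app": [],
}


def impacted_by(graph, failed):
    """Dependency analysis: everything downstream of a failed component (BFS)."""
    seen = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(failed)
    return seen


def root_cause_candidates(graph, affected):
    """Root cause analysis: components whose failure alone explains every symptom."""
    return {c for c in graph if affected <= (impacted_by(graph, c) | {c})}


print(sorted(impacted_by(DEPENDENTS, "router-1")))
# ['server-1', 'switch-1', 'web-app']
```

For example, `root_cause_candidates(DEPENDENTS, {"db", "server-1"})` leaves only `power-feed-A`, the one component upstream of both symptoms.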


GRAPH THINKING: Identity And Access Management


[Diagram: an identity graph with TRUSTS, ID, AUTHENTICATES, OWNS and CAN_READ relationships]

Think of organizational hierarchies. No longer is it just a tree.

UBS was the recipient of the 2014 Graphie Award for “Best Identity And Access Management App”.
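Once hierarchies stop being trees (people belong to several groups, and groups nest inside other groups), an access check becomes a question about paths. A sketch under stated assumptions: the `MEMBER_OF` relationship and all names are hypothetical (only `OWNS` and `CAN_READ` appear on the slide), and ownership is assumed to imply read access.

```python
from collections import deque

# Hypothetical identity graph as (start, relationship type, end) triples.
# MEMBER_OF is an assumption; OWNS and CAN_READ appear on the slide.
EDGES = [
    ("alice", "MEMBER_OF", "engineering"),
    ("engineering", "MEMBER_OF", "staff"),
    ("staff", "CAN_READ", "wiki"),
    ("engineering", "OWNS", "repo"),
    ("bob", "MEMBER_OF", "staff"),
]


def groups_of(edges, principal):
    """Every group reachable from principal through one or more MEMBER_OF hops."""
    seen, queue = set(), deque([principal])
    while queue:
        node = queue.popleft()
        for start, rel, end in edges:
            if start == node and rel == "MEMBER_OF" and end not in seen:
                seen.add(end)
                queue.append(end)
    return seen


def can_read(edges, principal, resource):
    """True if principal or any of its groups CAN_READ or OWNS the resource.

    Treating ownership as implying read access is an assumption of this sketch.
    """
    holders = {principal} | groups_of(edges, principal)
    return any(start in holders and rel in ("CAN_READ", "OWNS") and end == resource
               for start, rel, end in edges)


print(can_read(EDGES, "alice", "wiki"), can_read(EDGES, "bob", "repo"))
# True False
```

Alice can read the wiki through engineering and staff, while bob cannot read the repo because he only belongs to staff.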


Neo4j Adoption by Selected Verticals: Software, Financial Services, Retail, Media & Broadcasting, Social Networks, Telecom, Healthcare

AGENDA
• Use Cases
• SQL Pains
• Building a Neo4j Application
• Moving from RDBMS -> Graph Models
• Walk through an Example
• Creating Data in Graphs
• Querying Data

I hired this kid for all the handwriting you'll see throughout the presentation. So, don't blame me.

SQL

Day in the Life of an RDBMS Developer

Let’s explore how your SQL developer works today.

They work with data in tables.

Here’s a table of people and where they're from, their hair color and the university they attended.

This table is fairly natural, but it duplicates values across multiple rows. If you want to change the name of a university or a country, you'd have to update every row that contains it.

So instead, you'd create a separate table for the countries, with an ID that uniquely identifies each one. This is your primary key.

This also lets you store additional properties for each country, such as its leader.

Now, you use that ID to reference the country in the people table - a foreign key.

And you’d want to normalize the university table as well.

And use the university ID to reference it. Now your table is a lot less readable.

So, we see this set of 3 tables with arrows indicating references between primary keys and foreign keys, used in JOINs.
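The normalization steps above can be sketched in a few lines of Python: each distinct country and university gets an auto-generated integer ID, and the people rows keep only those foreign keys. The rows are hypothetical.

```python
# Hypothetical denormalized rows: (name, country, hair, university).
PEOPLE = [
    ("Alice", "USA", "brown", "Yale"),
    ("Bob", "UK", "red", "Yale"),
    ("Chen", "USA", "black", "MIT"),
]


def build_lookup(values):
    """Assign an auto-generated integer ID to each distinct value, in order seen."""
    ids = {}
    for value in values:
        if value not in ids:
            ids[value] = len(ids) + 1
    return ids


country_ids = build_lookup(row[1] for row in PEOPLE)
uni_ids = build_lookup(row[3] for row in PEOPLE)

# Lookup tables keyed by primary key, so a rename touches exactly one entry.
country_table = {cid: name for name, cid in country_ids.items()}
uni_table = {uid: name for name, uid in uni_ids.items()}

# The people table now stores foreign keys instead of names.
people = [(name, country_ids[country], hair, uni_ids[uni])
          for name, country, hair, uni in PEOPLE]

print(people)
# [('Alice', 1, 'brown', 1), ('Bob', 2, 'red', 1), ('Chen', 1, 'black', 2)]
```

Renaming Yale now means changing `uni_table[1]` once, but reading a person's university requires a lookup through the foreign key, which is exactly the work the JOINs have to do.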

SELECT p.name, c.country, c.leader, p.hair, u.name, u.pres, u.state
FROM people p
LEFT JOIN country c ON c.ID = p.country
LEFT JOIN uni u ON p.uni = u.id
WHERE u.state = 'CT'

Your SQL looks like this. And this is only a super simple JOIN across 3 tables; I've often had to work with 10+ tables being JOINed.
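To try the query, Python's built-in `sqlite3` module is enough. This sketch assumes the table and column names used by the query (`people`, `country`, `uni`); the rows, leader and president names are placeholders, and `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# In-memory database with the three normalized tables; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, country TEXT, leader TEXT);
CREATE TABLE uni (id INTEGER PRIMARY KEY, name TEXT, pres TEXT, state TEXT);
CREATE TABLE people (name TEXT, country INTEGER REFERENCES country(id),
                     hair TEXT, uni INTEGER REFERENCES uni(id));
INSERT INTO country VALUES (1, 'USA', 'Leader A'), (2, 'UK', 'Leader B');
INSERT INTO uni VALUES (1, 'Yale', 'President X', 'CT'),
                       (2, 'MIT', 'President Y', 'MA');
INSERT INTO people VALUES ('Alice', 1, 'brown', 1),
                          ('Bob', 2, 'red', 1),
                          ('Chen', 1, 'black', 2);
""")

# The query from the slide, plus ORDER BY so the output order is predictable.
rows = conn.execute("""
SELECT p.name, c.country, c.leader, p.hair, u.name, u.pres, u.state
FROM people p
LEFT JOIN country c ON c.ID = p.country
LEFT JOIN uni u ON p.uni = u.id
WHERE u.state = 'CT'
ORDER BY p.name
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Because `u.state` is filtered in the WHERE clause, people without a university are discarded anyway, so the second LEFT JOIN effectively behaves like an inner join here.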


And your SQL developer? All she's thinking about is joins.


All day long.


And they’re keeping her up at night as well.

Meanwhile, it's expensive to find data, so we add indexes to make it easier. But what if we have to do an index lookup for each and every JOIN, and we have a dozen JOINs? That's expensive.

What's the solution? Denormalize! But now the data is hard to maintain and keep consistent.

SQL Pains
• Complex to model and store relationships
• Performance degrades with increases in data
• Queries get long and complex
• Maintenance is painful

Graph Gains
• Easy to model and store relationships
• Performance of relationship traversal remains constant with growth in data size
• Queries are shortened and more readable
• Adding additional properties and relationships can be done on the fly, with no migrations

John Resig, who you may know as the creator of jQuery, loves Neo4j because it simplifies life.

What does this Graph look like?

So you’ve seen what tables look like. How do graphs make this better?

CYPHER

[Diagram: Ann and Dan, two nodes connected by a LOVES relationship]

The obligatory “Ann Loves Dan” example

Property Graph Model

CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})

[Diagram: the statement annotated with its parts: two NODEs, each with a LABEL and a PROPERTY, and the LOVES relationship between them]
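The building blocks annotated on the slide (nodes with labels and properties, and typed, directed relationships) can be mirrored with two small Python dataclasses. This is only an illustration of the model; the class and field names are hypothetical and say nothing about how Neo4j stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set       # e.g. {"Person"}
    properties: dict  # e.g. {"name": "Dan"}


@dataclass
class Relationship:
    rel_type: str     # e.g. "LOVES"
    start: Node       # relationships are directed: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


# Mirrors: CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
dan = Node({"Person"}, {"name": "Dan"})
ann = Node({"Person"}, {"name": "Ann"})
loves = Relationship("LOVES", dan, ann)

print(f'{loves.start.properties["name"]} {loves.rel_type} {loves.end.properties["name"]}')
# Dan LOVES Ann
```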

The whiteboard model is the physical model.

MATCH (p:Person)-[:WENT_TO]->(u:Uni),
      (p)-[:LIVES_IN]->(c:Country),
      (u)-[:LED_BY]->(l:Leader),
      (u)-[:LOCATED_IN]->(s:State)
WHERE s.abbr = 'CT'
RETURN p.name, c.country, c.leader, p.hair, u.name, l.name, s.abbr
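The same question can be answered by following relationships instead of joining tables. A minimal sketch, assuming hypothetical data stored as triples plus an adjacency index keyed by (node, relationship type); to stay short it drops hair, leader and the `LED_BY` hop. Each hop is a dictionary lookup, which conveys the idea of traversal, though it is not how Neo4j's query planner or storage actually work.

```python
# Hypothetical graph data as (start, relationship type, end) triples.
TRIPLES = [
    ("Alice", "WENT_TO", "Yale"), ("Alice", "LIVES_IN", "USA"),
    ("Bob", "WENT_TO", "Yale"), ("Bob", "LIVES_IN", "UK"),
    ("Chen", "WENT_TO", "MIT"), ("Chen", "LIVES_IN", "USA"),
    ("Yale", "LOCATED_IN", "CT"), ("MIT", "LOCATED_IN", "MA"),
]

# Adjacency index: (node, relationship type) -> end nodes.
OUT = {}
for start, rel, end in TRIPLES:
    OUT.setdefault((start, rel), []).append(end)


def hop(node, rel):
    """Follow one relationship type out of a node."""
    return OUT.get((node, rel), [])


def people_at_unis_in(state):
    """(p)-[:WENT_TO]->(u)-[:LOCATED_IN]->(s {abbr: state}), (p)-[:LIVES_IN]->(c)."""
    people = sorted({start for start, rel, _ in TRIPLES if rel == "WENT_TO"})
    return [(p, u, c)
            for p in people
            for u in hop(p, "WENT_TO") if state in hop(u, "LOCATED_IN")
            for c in hop(p, "LIVES_IN")]


print(people_at_unis_in("CT"))
# [('Alice', 'Yale', 'USA'), ('Bob', 'Yale', 'UK')]
```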

David Meza, chief knowledge architect at NASA, had this to say.

How do you use Neo4j?

Create your graph model, load your data, query your data.

How do you use Neo4j?

Querying can be done in the Neo4j Browser.


How do you use Neo4j?

Language Drivers: JavaScript, Java, Ruby, .NET, Python, PHP, Haskell, Go

Native Server-Side Extensions

Need to get every last ounce of performance? You can write server-side extensions in Java.

Architectural Options

[Diagram: an Application and End Users backed by a Neo4j graph database cluster (data storage and business rules execution); a Data Scientist running ad hoc analysis; data mining and aggregation through bulk analytic infrastructure (Hadoop, EDW…) and other databases (relational, NoSQL, Hadoop)]

RDBMS to Graph Options

[Diagram: three options (Migrate All Data, Migrate Subset, Duplicate Subset), each showing an Application sending All Queries, or Graph Queries and Non-Graph Queries, to a Relational Database and/or a Graph Database holding All Data or Non-Graph Data]

FROM RDBMS TO GRAPHS

Northwind

Northwind - the canonical RDBMS Example

( )-[:TO]->(Graph)

( )-[:IS_BETTER_AS]->(Graph)

Starting with the ER Diagram

Locate the Foreign Keys

Drop the Foreign Keys

Find the JOIN Tables

(Simple) JOIN Tables Become Relationships

Attributed JOIN Tables -> Relationships with Properties
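A sketch of the conversion for an attributed JOIN table, assuming rows shaped like Northwind's order details (the values are made up): the two foreign keys become the endpoints of an `INCLUDES` relationship, the type used in the import statements later on, and the remaining columns become relationship properties. A simple JOIN table is the same minus the properties dict.

```python
# Illustrative rows from an attributed JOIN table such as Northwind's order
# details: (OrderID, ProductID, UnitPrice, Quantity). Values are made up.
ORDER_DETAILS = [
    (1001, 11, 14.0, 12),
    (1001, 42, 9.8, 10),
    (1002, 14, 18.6, 9),
]


def join_rows_to_relationships(rows):
    """Turn each JOIN-table row into an (Order)-[:INCLUDES {...}]->(Product) edge.

    The two foreign keys become the relationship's endpoints; the remaining
    columns become properties on the relationship.
    """
    return [
        (("Order", order_id), "INCLUDES", ("Product", product_id),
         {"unitPrice": unit_price, "quantity": quantity})
        for order_id, product_id, unit_price, quantity in rows
    ]


for rel in join_rows_to_relationships(ORDER_DETAILS):
    print(rel)
```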

Querying a Subset Today

As a Graph

QUERYING THE GRAPH

using openCypher

Declarative query language. Easy to learn for someone familiar with languages like SQL, but optimized for graphs, and quickly readable.

Property Graph Model

CREATE (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})

[Diagram: Steven -REPORTS_TO-> Andrew, with each NODE's LABEL and PROPERTY annotated]

Who do people report to?
MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN *

Who do people report to?

Results can be returned as nodes and relationships

Who do people report to?
MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN e.employeeID AS managerID, e.firstName AS managerName, sub.employeeID AS employeeID, sub.firstName AS employeeName;

or alternatively as a table.

Who do people report to?

Who does Robert report to?

MATCH p=(e:Employee)<-[:REPORTS_TO]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN p

Who does Robert report to?

What is Robert’s reporting chain?

MATCH p=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN p

But the power of the graph is in the ability to query arbitrary-length paths. See the asterisk in [:REPORTS_TO*].
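What the asterisk buys you can be sketched in plain Python: walk `REPORTS_TO` hops until someone has no manager, recording one path per hop count, which matches the rows the variable-length pattern returns. The edges are hypothetical, using names from the slides (Robert, Steven, Andrew).

```python
# Hypothetical REPORTS_TO edges, employee -> manager (names as used in the slides).
REPORTS_TO = {
    "Robert": "Steven",
    "Steven": "Andrew",
}


def reporting_paths(reports_to, employee):
    """Every path employee -> manager -> ... of length 1 or more.

    Mirrors the rows matched by (e)<-[:REPORTS_TO*]-(sub {firstName: employee}):
    one path per hop count, stopping at someone with no manager.
    """
    paths = []
    path = [employee]
    while path[-1] in reports_to:
        manager = reports_to[path[-1]]
        if manager in path:  # guard against a cyclic hierarchy
            break
        path = path + [manager]
        paths.append(path)
    return paths


for p in reporting_paths(REPORTS_TO, "Robert"):
    print(" -> ".join(p))
# Robert -> Steven
# Robert -> Steven -> Andrew
```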

What is Robert’s reporting chain?

Who's the Big Boss?
MATCH (e:Employee)
WHERE NOT (e)-[:REPORTS_TO]->()
RETURN e.firstName AS bigBoss

Who’s the Big Boss?

Product Cross-Selling
MATCH (choc:Product {productName: 'Chocolade'})<-[:INCLUDES]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:INCLUDES]->(other:Product)
RETURN employee.firstName, other.productName, COUNT(DISTINCT o2) AS count
ORDER BY count DESC
LIMIT 5;
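The logic of this query can be reproduced in plain Python to make its semantics explicit. One subtlety: a single MATCH pattern cannot use the same relationship twice, so `o2` is never the Chocolade order the row started from. The orders below are hypothetical, and ties are broken by name only to make the sketch deterministic.

```python
# Hypothetical orders: order id -> (employee first name, products in the order).
ORDERS = {
    1: ("Nancy", {"Chocolade", "Chai"}),
    2: ("Nancy", {"Chai", "Tofu"}),
    3: ("Nancy", {"Tofu"}),
    4: ("Janet", {"Chai"}),
    5: ("Andrew", {"Chocolade"}),
    6: ("Nancy", {"Chai", "Tofu"}),
}


def cross_sell(orders, product, limit=5):
    """Rank (employee, other product) pairs by the number of distinct o2 orders.

    For each order containing `product`, look at the *other* orders sold by
    the same employee: one MATCH pattern cannot use the same SOLD relationship
    twice, so o2 is never the order the row started from.
    """
    hits = {}  # (employee, other product) -> ids of distinct o2 orders
    for order_id, (employee, items) in orders.items():
        if product not in items:
            continue
        for o2_id, (seller, o2_items) in orders.items():
            if seller != employee or o2_id == order_id:
                continue
            for other in o2_items:
                hits.setdefault((employee, other), set()).add(o2_id)
    # ORDER BY count DESC; ties broken by name so the output is deterministic.
    ranked = sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(emp, other, len(ids)) for (emp, other), ids in ranked[:limit]]


print(cross_sell(ORDERS, "Chocolade"))
# [('Nancy', 'Tofu', 3), ('Nancy', 'Chai', 2)]
```

Andrew drops out because his only order is the Chocolade order itself, and Janet never sold Chocolade.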

Product Cross-Selling

(ASIDE ON GRAPH COMPUTE)

Optimized for OLTP, but can be used for graph compute: either with built-in functions, or server-side extensions, or via exporting data to Spark/GraphX for analysis.

Shortest Path Between Airports
MATCH p = shortestPath((a:Airport {code: "SFO"})-[*0..2]->(b:Airport {code: "MSO"}))
RETURN p

Example using built-in algorithms. Dijkstra is also available for weighted paths.
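Both ideas can be sketched in plain Python: a breadth-first search that finds the fewest-hops path within a hop limit (like `[*0..2]`), and Dijkstra's algorithm for weighted paths. The route map and its weights are made up, and neither function is Neo4j's implementation.

```python
import heapq
from collections import deque

# Hypothetical route map: airport -> {neighbour: distance}. Not real distances.
ROUTES = {
    "SFO": {"SEA": 680, "DEN": 970},
    "SEA": {"MSO": 900},
    "DEN": {"MSO": 600},
    "MSO": {},
}


def fewest_hops(graph, start, goal, max_hops):
    """Unweighted shortest path by BFS, capped at max_hops (cf. [*0..2])."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_hops:
            continue  # this path already uses every allowed hop
        for nxt in graph.get(path[-1], {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def dijkstra(graph, start, goal):
    """Lowest total weight path; assumes non-negative weights."""
    heap = [(0, start, [start])]
    best = {start: 0}
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale entry: a cheaper route to node was found later
        for nxt, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return None


print(fewest_hops(ROUTES, "SFO", "MSO", 2))  # ['SFO', 'SEA', 'MSO']
print(dijkstra(ROUTES, "SFO", "MSO"))        # (1570, ['SFO', 'DEN', 'MSO'])
```

Both routes are two hops, so the unweighted search returns whichever it reaches first, while Dijkstra picks the cheaper one.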

(END ASIDE ON GRAPH COMPUTE)

POWERING AN APP

Simple App


Simple Python Code


LOADING OUR DATA

CSV

CSV files for Northwind


3 Steps to Creating the Graph

Import nodes, create indexes, import relationships

Importing Nodes

// Create customers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/customers.csv" AS row
CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID, fax: row.Fax, phone: row.Phone});

// Create products
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
CREATE (:Product {productName: row.ProductName, productID: row.ProductID, unitPrice: toFloat(row.UnitPrice)});

// Create suppliers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/suppliers.csv" AS row
CREATE (:Supplier {companyName: row.CompanyName, supplierID: row.SupplierID});

// Create employees
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
CREATE (:Employee {employeeID: row.EmployeeID, firstName: row.FirstName, lastName: row.LastName, title: row.Title});

// Create categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/categories.csv" AS row
CREATE (:Category {categoryID: row.CategoryID, categoryName: row.CategoryName, description: row.Description});

// Create orders
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MERGE (order:Order {orderID: row.OrderID})
  ON CREATE SET order.shipName = row.ShipName;

Creating Indexes

CREATE INDEX ON :Product(productID);
CREATE INDEX ON :Product(productName);
CREATE INDEX ON :Category(categoryID);
CREATE INDEX ON :Employee(employeeID);
CREATE INDEX ON :Supplier(supplierID);
CREATE INDEX ON :Customer(customerID);
CREATE INDEX ON :Customer(companyName);

Creating Relationships

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (customer:Customer {customerID: row.CustomerID})
MERGE (customer)-[:PURCHASED]->(order);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[pu:INCLUDES]->(product)
  ON CREATE SET pu.unitPrice = toFloat(row.UnitPrice), pu.quantity = toFloat(row.Quantity);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (category:Category {categoryID: row.CategoryID})
MERGE (product)-[:PART_OF]->(category);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
MATCH (employee:Employee {employeeID: row.EmployeeID})
MATCH (manager:Employee {employeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);
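The three steps (import nodes, create indexes, import relationships) can be mimicked in plain Python to show what each one contributes. This is a sketch, not how Neo4j or LOAD CSV works internally: the CSV snippets are made up (only the column names come from the import statements), and a dict keyed by ID stands in for an index.

```python
import csv
import io

# Tiny made-up CSV extracts using the column names from the LOAD CSV statements.
CUSTOMERS_CSV = """CustomerID,CompanyName
C001,Acme Foods
C002,Blue Cafe
"""
ORDERS_CSV = """OrderID,CustomerID
1,C001
2,C001
2,C001
3,C002
4,C999
"""


def load_graph(customers_csv, orders_csv):
    # Step 1: import nodes. Keying each dict by ID also plays the role of
    # step 2, an index for finding nodes by ID.
    customers = {row["CustomerID"]: {"companyName": row["CompanyName"]}
                 for row in csv.DictReader(io.StringIO(customers_csv))}
    order_rows = list(csv.DictReader(io.StringIO(orders_csv)))
    orders = {row["OrderID"]: {"orderID": row["OrderID"]} for row in order_rows}

    # Step 3: import relationships by looking up both endpoints by ID.
    # A row whose customer is missing is skipped, as a failed MATCH would be,
    # and a set keeps duplicate rows from creating duplicate edges, like MERGE.
    purchased = set()
    for row in order_rows:
        if row["CustomerID"] in customers:
            purchased.add((row["CustomerID"], row["OrderID"]))
    return customers, orders, purchased


customers, orders, purchased = load_graph(CUSTOMERS_CSV, ORDERS_CSV)
print(sorted(purchased))
# [('C001', '1'), ('C001', '2'), ('C002', '3')]
```

Order 4 still becomes a node, since orders are created before relationships, but it gets no PURCHASED edge because customer C999 does not exist.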

High-Performance Loading: neo4j-import

4.58 million things and their relationships… loads in 100 seconds!

WRAPPING UP

“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”

Volker Pacher, Senior Developer

THANK YOU!

Ryan Boyd @ryguyrg ryan@neo4j.com

Thank you for listening!