Post on 27-Jan-2015
BETTING THE COMPANY
(LITERALLY) ON A
GRAPH DATABASE
TIPS, TRICKS, AND LESSONS LEARNED
Aseem Kishore
Jan 2013
START user=node(1), other=node(2)
MATCH (user) -[r1:has|wants]-> (thing) <-[r2:has|wants]- (other)
WHERE TYPE(r1) <> TYPE(r2)
RETURN TYPE(r1), TYPE(r2), thing
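To make the query above concrete, here is a minimal Python sketch of what it computes: the things that one user has and the other wants, in either direction. The names `relationships` and `complementary_matches` and the sample data are hypothetical, and this is an in-memory illustration, not how Neo4j evaluates Cypher.

```python
# Hypothetical in-memory data: (start node, relationship type, end node).
relationships = [
    ("alice", "has", "bike"),
    ("alice", "wants", "tent"),
    ("bob", "wants", "bike"),
    ("bob", "has", "tent"),
    ("alice", "has", "kayak"),
    ("bob", "has", "kayak"),
]

def complementary_matches(rels, user, other):
    """Return (type_r1, type_r2, thing) triples where both users point at
    the same thing through has/wants relationships of different types."""
    allowed = {"has", "wants"}
    results = []
    for start1, type1, thing1 in rels:
        if start1 != user or type1 not in allowed:
            continue
        for start2, type2, thing2 in rels:
            same_thing = start2 == other and thing2 == thing1
            # Mirrors WHERE TYPE(r1) <> TYPE(r2) in the query.
            if same_thing and type2 in allowed and type1 != type2:
                results.append((type1, type2, thing1))
    return results

matches = complementary_matches(relationships, "alice", "bob")
```

The kayak is excluded because both users "have" it, so the two relationship types are equal.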
SO…
JUST WHAT IS A
GRAPH DATABASE?
# adjacency list:
nodes = List<Node>
neighbors = Map<Node, List<Node>>
neighbors[node1].add(node2)

# adjacency matrix:
nodes = List<Node>
connections = Map<Node, Map<Node, bool>>
connections[node1][node2] = true
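The two pseudocode representations above can be written as runnable Python. This is a small sketch with made-up node names; the helper functions `list_neighbors` and `matrix_neighbors` are added only to show how each structure is read back.

```python
from collections import defaultdict

nodes = ["a", "b", "c"]

# Adjacency list: each node maps to the list of its neighbors.
neighbors = defaultdict(list)
neighbors["a"].append("b")
neighbors["a"].append("c")

# Adjacency matrix: every (node, node) pair maps to a boolean.
connections = {u: {v: False for v in nodes} for u in nodes}
connections["a"]["b"] = True
connections["a"]["c"] = True

def list_neighbors(node):
    """Read neighbors straight from the adjacency list."""
    return list(neighbors[node])

def matrix_neighbors(node):
    """Scan one matrix row; cost grows with the total number of nodes."""
    return [v for v in nodes if connections[node][v]]
```

The list stores only the edges that exist, while the matrix stores an entry for every pair of nodes.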
“By definition, a graph database is any storage system that provides index-free adjacency.”
“This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary.”
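As an illustration of index-free adjacency, here is a minimal Python sketch in which each node holds direct references to its neighbors. The `Node` class and the sample names are hypothetical; a real graph database stores such pointers on disk, which this sketch does not attempt to model.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.adjacent = []  # direct references to neighboring Node objects

    def connect(self, other):
        self.adjacent.append(other)

alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.connect(bob)
bob.connect(carol)

# Two hops with no lookup structure: just follow the references.
friends_of_friends = [
    fof.name for friend in alice.adjacent for fof in friend.adjacent
]
```

Each hop costs the same regardless of how many nodes exist in total, because no global index is consulted.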
QUERYING
1. Start somewhere
2. Traverse elsewhere
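The two steps above can be sketched in Python as a lookup followed by a breadth-first traversal. The `graph`, `index_by_name`, and `traverse` names are hypothetical, and breadth-first order is just one possible traversal strategy.

```python
from collections import deque

# Hypothetical graph: node id -> ids of adjacent nodes.
graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
# Hypothetical index, used only to find the starting point.
index_by_name = {"root": 1}

def traverse(start, max_depth):
    """Breadth-first traversal returning node ids in visit order."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return order

start = index_by_name["root"]   # 1. start somewhere
visited = traverse(start, 2)    # 2. traverse elsewhere
```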
QUERYING IN NEO4J
1. Start somewhere
   Root node
   ID directly (file offset)
   Lucene index
2. Traverse elsewhere
   Traversal APIs
   Cypher patterns
   Built-in graph algos (Dijkstra, A*, etc.)
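For readers unfamiliar with the graph algorithms named above, here is a minimal Python sketch of Dijkstra's shortest-path algorithm over a small weighted graph. It is a generic textbook version with made-up data, not Neo4j's built-in implementation.

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

graph = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2, "d": 5},
    "c": {"d": 1},
    "d": {},
}
distances = dijkstra(graph, "a")
```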
NEO4J USAGE
Embedded mode (Java API)
Server mode (REST API)
Cypher query language (both)
OUR USAGE
NODE.JS
+
REST API
+
CYPHER
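A rough idea of what sending Cypher over REST involves, sketched in Python rather than the Node.js used in the talk. It assumes a local server on Neo4j's default HTTP port 7474 and the legacy `/db/data/cypher` endpoint from the Neo4j 1.x era; the helper name `build_cypher_request` and the sample query are hypothetical. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumed server location and legacy Cypher endpoint.
CYPHER_URL = "http://localhost:7474/db/data/cypher"

def build_cypher_request(query, params):
    """Build (but do not send) a POST request carrying a Cypher query."""
    body = json.dumps({"query": query, "params": params}).encode("utf-8")
    return urllib.request.Request(
        CYPHER_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )

request = build_cypher_request(
    "START user=node({id}) MATCH (user)-[:has]->(thing) RETURN thing",
    {"id": 1},
)
# urllib.request.urlopen(request) would send it to a running server.
```

Passing values through `params` instead of concatenating them into the query string keeps the query text constant across calls.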
NEO4J EDITIONS
Community edition
  Single instance
  Offline backup
Advanced edition
  Meh
Enterprise edition
  Multi-instance cluster!
  Online backup!
NEO4J SCALING
Master-slave replication
Cache-based sharding
Feature-based polyglot'ing
64B limit on nodes, rels, props
  But can be easily upped; just flipping some bits
  100 props/node (high) ⇒ 640M nodes
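The capacity estimate above is simple division, shown here as a small Python sketch. The constant names are hypothetical; the figures are the ones on the slide.

```python
PROPERTY_LIMIT = 64_000_000_000  # "64B" limit from the slide
PROPS_PER_NODE = 100             # the slide's deliberately high estimate

# If properties run out first, this is how many nodes fit.
max_nodes = PROPERTY_LIMIT // PROPS_PER_NODE
```

With fewer properties per node, the property limit allows proportionally more nodes.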
OKAY...
LET'S TALK ABOUT
WHAT WE LEARNED
WHAT WE LEARNED
Unique, expressive relationship types
Cache stats where possible
First-class objects ⇒ nodes, not rels
Capture history through event nodes
Connected data ⇒ nodes, not props
Maintain linked lists for O(1) queries
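Three of the lessons above (event nodes, cached stats, linked lists) can be combined in one in-memory Python sketch. The `UserNode` and `EventNode` classes and the event names are hypothetical; in a graph database the `previous` and `latest_event` links would be relationships between nodes.

```python
class EventNode:
    """One history entry, linked to the next-older event."""
    def __init__(self, kind, previous):
        self.kind = kind
        self.previous = previous

class UserNode:
    def __init__(self, name):
        self.name = name
        self.latest_event = None  # head of the linked list of events
        self.event_count = 0      # cached stat, updated on every write

    def record(self, kind):
        # Prepend the new event, so the newest is always one hop away.
        self.latest_event = EventNode(kind, self.latest_event)
        self.event_count += 1

    def recent(self, limit):
        """Walk back from the head; cost depends on limit, not history size."""
        kinds, event = [], self.latest_event
        while event is not None and len(kinds) < limit:
            kinds.append(event.kind)
            event = event.previous
        return kinds

user = UserNode("alice")
for kind in ["signed_up", "added_item", "traded_item"]:
    user.record(kind)
```

Reading the latest event or the event count touches a single node, at the price of a little extra work on each write.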
NEO4J ROADMAP
Overhaul of indexing API
Relationship type grouping
Socket and/or binary protocol
Automatic sharding?
THANKS!
TWITTER: @ASEEMK
GITHUB: @ASEEMK
EMAIL: ASEEM.KISHORE@GMAIL.COM
Questions?