1st UIM-GDB - Connections to the Real World

Post on 10-May-2015

Connections to the Real World
Graph Databases and Applications

Achim Friedland <achim@graph-database.org>, Aperis GmbH
1st University-Industrial Meeting on Graph Databases, 7.-8. Feb. 2011, Barcelona, Spain

Let’s change our point of view...

Welcome to the customer side... ;)

www.graph-database.org

The Graph Representation Problem

Adjacency matrix vs. Incidence matrix vs. Adjacency list vs. Edge list vs. Classes, Index-based vs. Index-free Adjacency, Dense vs. Sparse graphs, On-disk vs. In-memory graphs, All-Indexed vs. Specific-Index-Creation, directed vs. undirected edges, hypergraphs?, hierarchical graphs?, dynamic graphs?

• Different levels of expressivity
• Sometimes very application specific
• Hard to optimize a single one for every use-case
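To make the representation choices above concrete, here is a minimal Python sketch that derives an adjacency list and an adjacency matrix from the same edge list. The toy graph and the function names are assumptions made for illustration; they do not come from any particular graph database.

```python
from collections import defaultdict

# Hypothetical toy graph: directed edges as (source, target) pairs.
edge_list = [(0, 1), (0, 2), (1, 2), (2, 0)]
num_vertices = 3


def to_adjacency_list(edges):
    """Map each source vertex to the list of its target vertices."""
    adjacency = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    return dict(adjacency)


def to_adjacency_matrix(edges, n):
    """Build an n x n matrix where matrix[s][t] counts the edges s -> t."""
    matrix = [[0] * n for _ in range(n)]
    for source, target in edges:
        matrix[source][target] += 1
    return matrix


adjacency_list = to_adjacency_list(edge_list)
adjacency_matrix = to_adjacency_matrix(edge_list, num_vertices)
```

The matrix needs space for every vertex pair, which suits dense graphs; the list only stores edges that exist, which suits sparse graphs. Neither is the best choice for every use case.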

The GraphDB Vendor Problem

• Multiple APIs from different vendors
• Unknown internal graph representation
• Unclear design goals
• Community involvement?

Step 1) Define a common API

The Property-Graph Model

The most common graph model within the NoSQL GraphDB space

• directed: Each edge has a source and destination vertex
• attributed: Vertices and edges carry key/value pairs
• edge-labeled: The label denotes the type of relationship
• multi-graph: Multiple edges between any two vertices allowed
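The four properties of the model (directed, attributed, edge-labeled, multi-graph) can be sketched in a few lines of Python. This is only an illustration under assumptions: the class and method names (`PropertyGraph`, `add_vertex`, `add_edge`, `out_edges`) are hypothetical and do not mirror any vendor API, and the "Colleagues" edge is invented to show the multi-graph case.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Vertex:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)  # attributed


@dataclass
class Edge:
    id: int
    source: Vertex   # directed: every edge has a source ...
    target: Vertex   # ... and a destination vertex
    label: str       # edge-labeled: the type of relationship
    properties: Dict[str, Any] = field(default_factory=dict)  # attributed


class PropertyGraph:
    """Hypothetical in-memory property graph, keyed by vertex and edge id."""

    def __init__(self):
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, vertex_id, **properties):
        vertex = Vertex(vertex_id, dict(properties))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, edge_id, source, target, label, **properties):
        # Edges are stored by their own id, so several edges may connect
        # the same pair of vertices (multi-graph).
        edge = Edge(edge_id, source, target, label, dict(properties))
        self.edges[edge_id] = edge
        return edge

    def out_edges(self, vertex, label=None):
        """Edges leaving `vertex`, optionally restricted to one label."""
        return [e for e in self.edges.values()
                if e.source is vertex and (label is None or e.label == label)]


graph = PropertyGraph()
alice = graph.add_vertex(1, name="Alice", age=21)
bob = graph.add_vertex(2, name="Bob", age=23)
graph.add_edge(1, alice, bob, "Friends", since="2009/09/21")
graph.add_edge(2, alice, bob, "Colleagues")  # invented second edge between the same vertices
```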

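Example property graph:
Vertex (vertex properties): Id: 1, name: Alice, age: 21
Vertex (vertex properties): Id: 2, name: Bob, age: 23
Edge between the two vertices (edge label: Friends; edge properties): since: 2009/09/21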

Property-Graph Constraints?

• Vertex type vs. vertex interfaces?
• Edge label/type vs. edge interfaces?
• Vertex<->Edge constraints?
• Extension: Undirected Edges?
• Extension: Hyperedges?
• Extension: Semantic graphs?
• Extension: Dynamic graphs?

A Property Graph Model Interface for Java and .NET

// Use a class-based in-memory graph
var graph = new InMemoryGraph();

// Create two vertices and attach their properties
var v1 = graph.AddVertex(new VertexId(1));
var v2 = graph.AddVertex(new VertexId(2));
v1.SetProperty("name", "Alice");
v1.SetProperty("age", 21);
v2.SetProperty("name", "Bob");
v2.SetProperty("age", 23);

// Connect them with a labeled edge that carries its own property
var e1 = graph.AddEdge(v1, v2, new EdgeId(1), "Friends");
e1.SetProperty("since", "2009/09/21");

structured data (XML, JSON)

Supported datatypes?

• Strings
• Integers
• DateTime?
• byte[]?
• structured data like XML/JSON?
• List<...>
• ...

Step 2) Declarative ways for querying

Querying a Graph Database

• Programmatic / API
  • From any programming language, Pipes, ...
  • Synchronous or asynchronous
  • Allow bypassing all optimizations
  • Do not try to be smarter than the application developer

• Ad hoc / Explorative
  • Gremlin aka. “high-level pipes”?
  • sones GQL, OrientDB QL aka. “SQL style”?
  • Pattern matching aka. “SPARQL style”?
  • Easy embedding of domain-specific query languages?

A data flow framework for property graph models

ISideEffectPipe<in S, out E, out T> : IEnumerator<E>, IEnumerable<E>
• S: Source Elements
• E: Emitted Elements
• T: Side Effect

Pipeline<S, E>
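The idea of a pipe that consumes source elements, emits elements, and exposes a side effect can be sketched with a plain Python iterator. The class below is a hypothetical analogue written for illustration, not a port of the .NET interface; the side effect chosen here (a running count) and the name "Carol" are assumptions.

```python
class CountingPipe:
    """Passes elements through unchanged and counts them as a side effect."""

    def __init__(self, source):
        self._source = iter(source)  # S: the source elements
        self.side_effect = 0         # T: the side effect, here a running count

    def __iter__(self):
        return self

    def __next__(self):
        element = next(self._source)  # raises StopIteration when exhausted
        self.side_effect += 1
        return element                # E: the emitted element


pipe = CountingPipe(["Alice", "Bob", "Carol"])
emitted = list(pipe)
```

Because the pipe is itself an iterator, it can be consumed lazily and the side effect can be read at any point during or after the traversal.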

Create complex pipes by combining pipes into pipelines

S (Source Elements) -> pipe1<S,A> -> pipe2<A,B> -> pipe3<B,E> -> E (Emitted Elements)
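Combining pipes works because each pipe consumes exactly what the previous one emits. A minimal Python sketch of that composition, using generators, might look as follows. The helper names (`map_pipe`, `filter_pipe`, `pipeline`) are hypothetical and only illustrate the chaining idea.

```python
from functools import reduce


def map_pipe(function):
    """Pipe that emits function(x) for every source element x."""
    def pipe(source):
        return (function(x) for x in source)
    return pipe


def filter_pipe(predicate):
    """Pipe that emits only the source elements satisfying the predicate."""
    def pipe(source):
        return (x for x in source if predicate(x))
    return pipe


def pipeline(*pipes):
    """Chain pipes so that each one consumes what the previous one emits."""
    def run(source):
        return reduce(lambda stream, pipe: pipe(stream), pipes, iter(source))
    return run


# int -> int -> int -> str: the emitted type of one pipe is the source type of the next.
double_then_keep_large = pipeline(
    map_pipe(lambda x: x * 2),
    filter_pipe(lambda x: x > 4),
    map_pipe(str),
)
result = list(double_then_keep_large([1, 2, 3, 4]))
```

Nothing is computed until the result is iterated, so a long pipeline never materializes intermediate collections.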

// Friends-of-a-friend
var pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
var pipe2 = new LabelFilterPipe("Friends", ComparisonFilter.EQUALS);
var pipe3 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
var pipe4 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
var pipe5 = new LabelFilterPipe("Friends", ComparisonFilter.EQUALS);
var pipe6 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
var pipe7 = new PropertyPipe("name");

// Chain the seven steps and start the traversal at vertex 1
var pipeline = new Pipeline(pipe1, pipe2, pipe3, pipe4, pipe5, pipe6, pipe7);
pipeline.SetSource(new SingleEnumerator(graph.GetVertex(new VertexId(1))));

g:id-v(1)/outE[@label='Friends']/inV/outE[@label='Friends']/inV/@name

A “perl”-style Ad Hoc query language for graphs

FROM User u SELECT u.Friends.Friends.name WHERE u.Id = 1

sones GQL

A “SQL”-style Ad Hoc query language for graphs

Step 3) Query result formats

Query Result Formats

• Graphs
  • A query result (QR) may be queried over and over again
  • A QR may be stored/cached as a graph
  • But again: (too) many graph representations available

• Other data structures
  • If the result is just a list, why convert it to a graph?
  • Simple for programming languages
  • Much more complicated for query languages

Query Result Formats

• Reduced 2-tier architecture (GraphDB -> Client)
  • Higher performance
  • Avoids relational architecture anti-patterns

• Link-aware, self-describing hypermedia (see Neo4J)
  • e.g. ATOM, XML + XLINK, RDFa

• User-defined/application-specific protocols
  • e.g. serve HTML/GEXF directly (see CouchDB)
  • Allows creating powerful embedded applications

Step 4) Accessing remote graphs

An HTTP/REST interface for property graphs

• rexster server
  • Exposes a graph via HTTP/REST
  • Vertices and edges are REST resources
  • Available for Neo4J and OrientDB; InfiniteGraph announced

• rexster client
  • Accessing remote graphs

Common CRUD operations...

What about other HTTP verbs?

• PATCH for applying small changes?
• NEIGHBORS?
• EXPLORE (more neighbors...)
• SHORTESTPATH
• CENTRALITY

Default resource representation: JSON

curl -H Accept:application/json http://localhost:8182/graph1/vertices/1
{
  "version" : "0.1",
  "results" : {
    "_type" : "vertex",
    "_id" : "1",
    "name" : "Alice",
    "age" : 21
  },
  "query_time" : 0.014235
}
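On the client side, a response like the one above can be handled with a few lines of Python. The sketch below parses the JSON body and separates the underscore-prefixed fields from the remaining key/value pairs. Treating the underscore prefix as a marker for metadata is an assumption made for this example, and `split_vertex` is a hypothetical helper, not part of any client library.

```python
import json

# The response body from the slide, embedded as a string so the example is self-contained.
response_body = '''
{ "version" : "0.1",
  "results" : { "_type" : "vertex", "_id" : "1", "name" : "Alice", "age" : 21 },
  "query_time" : 0.014235 }
'''


def split_vertex(document):
    """Separate underscore-prefixed fields from the other properties (assumed convention)."""
    results = document["results"]
    metadata = {k: v for k, v in results.items() if k.startswith("_")}
    properties = {k: v for k, v in results.items() if not k.startswith("_")}
    return metadata, properties


metadata, properties = split_vertex(json.loads(response_body))
```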

Advanced HTTP/REST concepts

• HTTP caching support?
• HTTP Authentication support?
• Conditional PUT/POST requests?
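As one illustration of what HTTP caching support could mean for a vertex resource, the following Python sketch answers a GET with an `ETag` and returns `304 Not Modified` when the client's `If-None-Match` value still matches. It is a simplified model under stated assumptions: the functions `etag_for` and `conditional_get` are hypothetical, the hash-based ETag scheme is just one possible choice, and real `If-None-Match` handling also covers value lists and `*`. It does not describe how any existing server behaves.

```python
import hashlib
import json


def etag_for(resource):
    """Derive an ETag from a canonical JSON serialization (one possible scheme)."""
    canonical = json.dumps(resource, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(canonical).hexdigest() + '"'


def conditional_get(resource, if_none_match=None):
    """Return (status, headers, body) for a GET with an optional If-None-Match value."""
    etag = etag_for(resource)
    if if_none_match == etag:
        return 304, {"ETag": etag}, None  # the client's cached copy is still valid
    return 200, {"ETag": etag}, json.dumps(resource)


vertex = {"_id": "1", "name": "Alice", "age": 21}
status_first, headers, body = conditional_get(vertex)
status_cached, _, _ = conditional_get(vertex, headers["ETag"])
vertex["age"] = 22  # the resource changes, so the old ETag no longer matches
status_changed, _, _ = conditional_get(vertex, headers["ETag"])
```

The same ETag could back conditional writes: a PUT carrying `If-Match` would be rejected when the stored resource has changed since the client last read it.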

The GraphDB Graph...

• Neo4J for GIS
• InfoGrid for WebApps
• In-Memory for Caching
• OrientDB for Documents
• OrientDB for Ad Hoc
• TinkerGraph & Gremlin for Ad Hoc
• Neo4J for HA
• InfiniteGraph for Clustering

Questions?

http://www.graph-database.org
http://www.twitter.com/graphdbs