Traversing Graph Databases with Gremlin

Post on 06-May-2015


description

A discussion of Blueprints, Pipes, and Gremlin. The presentation's second half was a live Gremlin tutorial/demo.

transcript


Marko A. Rodriguez, Graph Systems Architect, http://markorodriguez.com

NoSQL New York City Meetup – May 16, 2011

Gremlin: G = (V,E)

May 10, 2011

Thank You Sponsors

Short Slideshow + Live Demo = This Presentation

TinkerPop Productions [1]

[1] TinkerPop (http://tinkerpop.com), Marko A. Rodriguez (http://markorodriguez.com), Peter Neubauer (http://www.linkedin.com/in/neubauer), Joshua Shinavier (http://fortytwo.net/), Pavel Yaskevich (https://github.com/xedin), Darrick Wiebe (http://ofallpossibleworlds.wordpress.com/), Stephen Mallette (http://stephen.genoprime.com/), Alex Averbuch (http://se.linkedin.com/in/alexaverbuch)

TinkerPop Graph Stack

[Figure: the layers of the stack: Graph Database (e.g. TinkerGraph), Generic Interface, Traversal Engine, Traversal Language, Graph-to-Object Mapper, and RESTful Server (e.g. http://host:8182/graph/vertices/1).]

TinkerPop Graph Stack – Focus

[Figure: the same stack, focused on the Generic Interface, Traversal Engine, and Traversal Language layers.]

Blueprints -> Pipes -> Groovy -> Gremlin

To understand Gremlin, it's important to understand Groovy, Blueprints, and Pipes.


Gremlin is a Domain Specific Language

• Gremlin 0.7+ uses Groovy as its host language. [2]

• Gremlin 0.7+ takes advantage of Groovy's meta-programming, dynamic typing, and closure properties.

• Groovy is largely a superset of Java and, as such, natively exposes the full JDK to Gremlin.

[2] Groovy is available at http://groovy.codehaus.org/.

Gremlin is for Property Graphs

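[Figure: an example property graph with six vertices (ids 1-6) and six edges (ids 7-12).]

Vertices:

• 1: name = "marko", age = 29
• 2: name = "vadas", age = 27
• 3: name = "lop", lang = "java"
• 4: name = "josh", age = 32
• 5: name = "ripple", lang = "java"
• 6: name = "peter", age = 35

Edges:

• 7: 1 -knows-> 2, weight = 0.5
• 8: 1 -knows-> 4, weight = 1.0
• 9: 1 -created-> 3, weight = 0.4
• 10: 4 -created-> 5, weight = 1.0
• 11: 4 -created-> 3, weight = 0.4
• 12: 6 -created-> 3, weight = 0.2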

A graph is composed of vertices (nodes), edges (links), and properties (keys/values).
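The property-graph model described above can be sketched in plain Java. This is only an illustration of the data model: the class and method names (`PropertyGraphSketch`, `Element`, `namesOfNeighbors`, and so on) are invented for this sketch and are not the Blueprints API. The data is the marko/vadas/josh corner of the example graph.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these classes are NOT the Blueprints API.
class PropertyGraphSketch {

    // Vertices and edges are both "elements": an id plus key/value properties.
    static class Element {
        final String id;
        final Map<String, Object> properties = new LinkedHashMap<>();
        Element(String id) { this.id = id; }
    }

    static class Vertex extends Element {
        final List<Edge> outEdges = new ArrayList<>();
        final List<Edge> inEdges = new ArrayList<>();
        Vertex(String id) { super(id); }
    }

    // An edge is directed (out vertex -> in vertex) and carries a label.
    static class Edge extends Element {
        final Vertex outVertex;
        final Vertex inVertex;
        final String label;
        Edge(String id, Vertex outVertex, Vertex inVertex, String label) {
            super(id);
            this.outVertex = outVertex;
            this.inVertex = inVertex;
            this.label = label;
            outVertex.outEdges.add(this);
            inVertex.inEdges.add(this);
        }
    }

    static Vertex person(String id, String name, int age) {
        Vertex v = new Vertex(id);
        v.properties.put("name", name);
        v.properties.put("age", age);
        return v;
    }

    // Builds vertices 1, 2, 4 and edges 7, 8 of the example graph; returns marko.
    static Vertex buildMarko() {
        Vertex marko = person("1", "marko", 29);
        Vertex vadas = person("2", "vadas", 27);
        Vertex josh = person("4", "josh", 32);
        new Edge("7", marko, vadas, "knows").properties.put("weight", 0.5);
        new Edge("8", marko, josh, "knows").properties.put("weight", 1.0);
        return marko;
    }

    // Names of the vertices reached from v over out-edges with the given label.
    static List<String> namesOfNeighbors(Vertex v, String label) {
        List<String> names = new ArrayList<>();
        for (Edge e : v.outEdges) {
            if (e.label.equals(label)) {
                names.add((String) e.inVertex.properties.get("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesOfNeighbors(buildMarko(), "knows")); // [vadas, josh]
    }
}
```

Keeping both `outEdges` and `inEdges` on each vertex is one possible design choice; it makes traversal in either direction a local lookup rather than a scan over all edges.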

Gremlin is for Blueprints-enabled Graph Databases

• Blueprints can be seen as the JDBC for property graph databases. [3]

• Provides a collection of interfaces for graph database providers to implement.

• Provides tests to ensure the operational semantics of any implementation are correct.

• Provides numerous "helper utilities" to make working with graph databases easy.

[3] Blueprints is available at http://blueprints.tinkerpop.com.

A Blueprints Detour - Implementations

TinkerGraph

A Blueprints Detour - Ouplementations

JUNG: Java Universal Network/Graph Framework

Gremlin Compiles Down to Pipes

• Pipes is a data flow framework for evaluating lazy graph traversals. [4]

• A Pipe extends both Iterator and Iterable, and Pipes can be chained together to create processing pipelines.

• A Pipe can be embedded into another Pipe to allow for nested processing.

[4] Pipes is available at http://pipes.tinkerpop.com.

A Pipes Detour - Chained Iterators

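• This Pipeline takes objects of type A and turns them into objects of type D.

[Figure: A -> Pipe1 -> B -> Pipe2 -> C -> Pipe3 -> D. The three pipes are wrapped in a single Pipeline that consumes A objects and emits D objects.]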

Pipe<A,D> pipeline = new Pipeline<A,D>(Pipe1<A,B>, Pipe2<B,C>, Pipe3<C,D>);
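The idea of chaining iterators so that nothing is computed until a result is requested can be sketched with the JDK alone. The sketch below does not use the Pipes library: `MapStage` is a hypothetical stand-in for a pipe, and the concrete types (A = Integer, B = Integer, C = Double, D = String) are chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustrative stand-in for chained pipes; not the Pipes library itself.
class ChainedIteratorsSketch {

    // One stage: an Iterator<E> that pulls from an Iterator<S> only on demand.
    static class MapStage<S, E> implements Iterator<E> {
        private final Iterator<S> starts;
        private final Function<S, E> fn;

        MapStage(Iterator<S> starts, Function<S, E> fn) {
            this.starts = starts;
            this.fn = fn;
        }

        @Override public boolean hasNext() { return starts.hasNext(); }
        @Override public E next() { return fn.apply(starts.next()); }
    }

    // Three stages wired together: A -> B -> C -> D.
    static Iterator<String> pipeline(Iterator<Integer> starts, AtomicInteger firstStageCalls) {
        Iterator<Integer> bs = new MapStage<Integer, Integer>(starts, a -> {
            firstStageCalls.incrementAndGet(); // records when stage 1 actually runs
            return a * a;
        });
        Iterator<Double> cs = new MapStage<Integer, Double>(bs, b -> b / 2.0);
        return new MapStage<Double, String>(cs, c -> "value=" + c);
    }

    // Pulls every remaining element out of the iterator.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Iterator<String> ds = pipeline(Arrays.asList(1, 2, 3).iterator(), calls);
        System.out.println("before pulling: " + calls.get()); // 0, nothing computed yet
        System.out.println(drain(ds));                        // [value=0.5, value=2.0, value=4.5]
        System.out.println("after pulling: " + calls.get());  // 3
    }
}
```

Because each stage only asks its predecessor for one element at a time, building the pipeline costs nothing; work happens when the consumer calls `next()`.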

A Pipes Detour - Simple Example

“What are the names of the people that marko knows?”

[Figure: a graph of four vertices (A, B, C, D) carrying the names marko, peter, pavel, and gremlin, connected by two "knows" edges and two "created" edges.]

A Pipes Detour - Simple Example

Pipe<Vertex,Edge> pipe1 = new OutEdgesPipe("knows");          // vertex -> its outgoing "knows" edges
Pipe<Edge,Vertex> pipe2 = new InVertexPipe();                 // edge -> the vertex it points to
Pipe<Vertex,String> pipe3 = new PropertyPipe<String>("name"); // vertex -> its "name" property
Pipe<Vertex,String> pipeline = new Pipeline<Vertex,String>(pipe1, pipe2, pipe3);
pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A")));

[Figure: the same four-vertex graph, annotated with the three pipes in order: OutEdgesPipe("knows"), InVertexPipe(), PropertyPipe("name").]

HINT: The benefit of Gremlin is that this Java verbosity is reduced to g.v('A').outE('knows').inV.name.
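To make the three stages of this traversal concrete without the Pipes library, here is a rough JDK-only sketch in which `java.util.stream` operations stand in for the three pipes. The edge list is an assumption: the transcript does not say which vertex carries which name beyond A being marko, so the sketch assumes A knows B (peter) and C (pavel), and that both created D (gremlin). The `Edge` class and `namesKnownBy` are invented for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Streams stand in for pipes here; this is not the Pipes API.
class KnowsTraversalSketch {

    static class Edge {
        final String out;   // id of the tail vertex
        final String label;
        final String in;    // id of the head vertex
        Edge(String out, String label, String in) {
            this.out = out;
            this.label = label;
            this.in = in;
        }
    }

    // ASSUMPTION: this vertex-to-name mapping and edge layout are guessed from the figure.
    static final Map<String, String> NAMES = new HashMap<>();
    static final List<Edge> EDGES = Arrays.asList(
            new Edge("A", "knows", "B"),
            new Edge("A", "knows", "C"),
            new Edge("B", "created", "D"),
            new Edge("C", "created", "D"));

    static {
        NAMES.put("A", "marko");
        NAMES.put("B", "peter");
        NAMES.put("C", "pavel");
        NAMES.put("D", "gremlin");
    }

    static List<String> namesKnownBy(String startId) {
        return Stream.of(startId)
                // plays the role of OutEdgesPipe("knows"): vertex -> outgoing "knows" edges
                .flatMap(v -> EDGES.stream()
                        .filter(e -> e.out.equals(v) && e.label.equals("knows")))
                // plays the role of InVertexPipe(): edge -> head vertex
                .map(e -> e.in)
                // plays the role of PropertyPipe("name"): vertex -> name
                .map(NAMES::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(namesKnownBy("A")); // [peter, pavel]
    }
}
```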

A Pipes Detour - Pipes Library

[ FILTERS ]

AndFilterPipe

BackFilterPipe

CollectionFilterPipe

ComparisonFilterPipe

DuplicateFilterPipe

FutureFilterPipe

ObjectFilterPipe

OrFilterPipe

RandomFilterPipe

RangeFilterPipe

[ GRAPHS ]

OutEdgesPipe

InEdgesPipe

OutVertexPipe

InVertexPipe

IdFilterPipe

IdPipe

LabelFilterPipe

LabelPipe

PropertyFilterPipe

PropertyPipe

[ SIDEEFFECTS ]

AggregatorPipe

GroupCountPipe

CountPipe

SideEffectCapPipe

[ UTILITIES ]

GatherPipe

PathPipe

ScatterPipe

Pipeline

...

A Pipes Detour - Creating Pipes

public class NumCharsPipe extends AbstractPipe<String,Integer> {
  public Integer processNextStart() {
    String word = this.starts.next();
    return word.length();
  }
}

When extending the base class AbstractPipe<S,E>, all that is required is an implementation of processNextStart().
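Why a single method suffices can be seen from a simplified stand-in for the base class. The sketch below is an assumption about how such a base class could work, not the actual AbstractPipe source: `SketchPipe` and the `lengths` helper are hypothetical names. The base class turns one "produce the next result or throw NoSuchElementException" method into a full Iterator and Iterable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class CustomPipeSketch {

    // Simplified, hypothetical stand-in for AbstractPipe<S,E>.
    static abstract class SketchPipe<S, E> implements Iterator<E>, Iterable<E> {
        protected Iterator<S> starts;
        private E nextEnd;
        private boolean available = false;

        public void setStarts(Iterator<S> starts) { this.starts = starts; }

        // Subclasses return the next result, or throw NoSuchElementException
        // once the upstream iterator is exhausted.
        protected abstract E processNextStart() throws NoSuchElementException;

        @Override public boolean hasNext() {
            if (available) return true;
            try {
                nextEnd = processNextStart(); // compute one element ahead
                available = true;
                return true;
            } catch (NoSuchElementException e) {
                return false;
            }
        }

        @Override public E next() {
            if (!hasNext()) throw new NoSuchElementException();
            available = false;
            return nextEnd;
        }

        @Override public Iterator<E> iterator() { return this; }
    }

    // Same logic as the slide's NumCharsPipe, on top of the stand-in base class.
    static class NumCharsPipe extends SketchPipe<String, Integer> {
        @Override protected Integer processNextStart() {
            String word = this.starts.next(); // throws when no words remain
            return word.length();
        }
    }

    static List<Integer> lengths(String... words) {
        NumCharsPipe pipe = new NumCharsPipe();
        pipe.setStarts(Arrays.asList(words).iterator());
        List<Integer> out = new ArrayList<>();
        for (Integer n : pipe) {
            out.add(n);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lengths("tinkerpop", "gremlin", "pipes")); // [9, 7, 5]
    }
}
```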

Now onto Gremlin proper...

The Gremlin Architecture

[Figure: the Gremlin architecture, with Neo4j, OrientDB, and DEX shown as underlying graph databases.]

The Many Ways of Using Gremlin

• Gremlin has a REPL to be run from the shell.

• Gremlin can be natively integrated into any Groovy class.

• Gremlin can be interacted with indirectly through Java, via Groovy.

• Gremlin has a JSR 223 ScriptEngine as well.

Pipe = Step: 3 Generic Steps

In general, a Pipe is a "step" in Gremlin. Gremlin is founded on a collection of atomic steps. The syntax is generally seen as step.step.step. There are 3 categories of steps.

• transform: map the input to some output. S → T

  - outE, inV, paths, copySplit, fairMerge, ...

• filter: either output the input or not. S → (S ∪ ∅)

  - back, except, uniqueObject, andFilter, orFilter, ...

• sideEffect: output the input and yield a side-effect. S → S

  - aggregate, groupCount, ...
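The three step categories map loosely onto familiar stream operations, which the following sketch uses as an analogy. It is not Gremlin and not the Pipes API: `peek` plays the sideEffect role (a group count), `filter` the filter role, and `map` the transform role. The class name and the choice of data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// An analogy using java.util.stream; not Gremlin itself.
class StepCategoriesSketch {

    static List<Integer> run(List<String> names, Map<Character, Integer> firstLetterCounts) {
        return names.stream()
                // sideEffect (S -> S): pass every name through, counting first letters on the side
                .peek(n -> firstLetterCounts.merge(n.charAt(0), 1, Integer::sum))
                // filter (S -> S or nothing): keep only names that match the pattern
                .filter(n -> n.matches("M.*"))
                // transform (S -> T): map each name to its length
                .map(String::length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = new TreeMap<>();
        List<Integer> lengths = run(Arrays.asList("Mattias", "Marko", "Peter", "Matt"), counts);
        System.out.println(lengths); // [7, 5, 4]
        System.out.println(counts);  // {M=3, P=1}
    }
}
```

Note that the side effect sees all four names because it sits before the filter; moving it after the filter would count only the names that survive.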

Abstract/Derived/Inferred Adjacency

[Figure: a chain of low-level steps (outE, inV, outE, loop(2){..}, back(3){it.salary}) bracketed together and named as one derived relation: friend_of_a_friend_who_earns_less_than_friend_at_work.]

• outE, inV, etc. are low-level graph speak (the domain is the graph).

• codeveloper is high-level domain speak (the domain is software development). [5]

[5] In this way, Gremlin can be seen as a DSL (domain-specific language) for creating DSLs for your graph applications. Gremlin's domain is "the graph." Build languages for your domain on top of Gremlin (e.g. "software development domain").

Explicit Graph

[Figure: panels labeled Developer Created Graph, Software Imports Graph, Friendship Graph, Employment Graph, Developer Imports Graph, Friend-Of-A-Friend Graph, Friends at Work Graph, and Developer's FOAF Import Graph.]

You need not make derived graphs explicit; you can compute them at runtime. Moreover, you can generate them locally rather than globally (e.g. "Marko's friends from work relations").

This concept is related to automated reasoning and whether reasoned relations are inserted into the explicit graph or computed at query time.
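As a rough illustration of computing a derived relation at query time, the sketch below stores only explicit "friend" edges and derives friend-of-a-friend for a single person on demand, without ever writing FOAF edges back. The friendship data and all names in the code are hypothetical, invented for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DerivedEdgesSketch {

    // Explicit graph: person -> direct friends. Hypothetical data.
    static final Map<String, List<String>> FRIEND = new HashMap<>();

    static {
        FRIEND.put("marko", Arrays.asList("josh", "peter"));
        FRIEND.put("josh", Arrays.asList("vadas", "marko"));
        FRIEND.put("peter", Arrays.asList("vadas", "stephen"));
    }

    // Derived relation, computed locally (for one person) and lazily (when asked).
    // No friend-of-a-friend edge is ever stored in FRIEND.
    static Set<String> friendOfAFriend(String person) {
        Set<String> result = new TreeSet<>();
        for (String friend : FRIEND.getOrDefault(person, Collections.<String>emptyList())) {
            for (String foaf : FRIEND.getOrDefault(friend, Collections.<String>emptyList())) {
                if (!foaf.equals(person)) { // two hops back to the start is not a FOAF
                    result.add(foaf);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(friendOfAFriend("marko")); // [stephen, vadas]
    }
}
```

The trade-off is the usual one between materializing inferred edges (fast reads, more storage, stale results after updates) and recomputing them per query (always current, more work per read).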

Integrating the JDK (Java API)

• Groovy is the host language for Gremlin. Thus, the JDK is natively available in a path expression.

gremlin> v.out('friend'){it.name.matches('M.*')}.name

==>Mattias

==>Marko

==>Matt

...

gremlin> x.out('rated').transform{JSONParser.get(it.uri).stars}.mean()

...

gremlin> y.outE('rated').filter{TextAnalysis.isElated(it.review)}.inV

...

Time for a demo

marko$ gremlin
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin>

Gremlin: G = (V,E)

http://gremlin.tinkerpop.com

http://groups.google.com/group/gremlin-users

http://tinkerpop.com

Graph Bootcamp

http://markorodriguez.com/services/workshops/graph-bootcamp/

• June 23-24th in Chicago, Illinois.

• August X-Yth in Denver, Colorado.

• Book a private gig for your organization.