
Gremlin: A Graph-Based Programming Language

Page 1: Gremlin: A Graph-Based Programming Language

Gremlin
G = (V, E)

A Graph-Based Programming Language

Marko A. Rodriguez
T-5, Center for Nonlinear Studies
Los Alamos National Laboratory
http://markorodriguez.com

http://gremlin.tinkerpop.com

February 25, 2010

Page 2: Gremlin: A Graph-Based Programming Language

Abstract

Gremlin is a Turing-complete, graph-based programming language developed for key/value-pair multi-relational graphs called property graphs. Gremlin makes extensive use of XPath 1.0 to support complex graph traversals. Connectors exist to various graph databases and frameworks. This language has application in the areas of graph query, analysis, and manipulation.

Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010

Page 3: Gremlin: A Graph-Based Programming Language

Acknowledgements

• Marko A. Rodriguez [http://markorodriguez.com] designed, developed, tested, and documented Gremlin.

• Peter Neubauer [http://www.linkedin.com/in/neubauer] aided in the design and the evangelizing of Gremlin.

• Pavel Yaskevich [http://github.com/xedin] aided in the development of user defined functions in Gremlin.

• Joshua Shinavier [http://fortytwo.net] provided initial conceptual support for Gremlin.

• Ketrina Yim [http://csillustrated.berkeley.edu] designed the logo for Gremlin.

• Gremlin-Users Group [http://groups.google.com/group/gremlin-users] provided much direction in the design and implementation of Gremlin.

Page 4: Gremlin: A Graph-Based Programming Language

Outline

• Introduction to Graphs and Graph Software

• Basic Gremlin Concepts

• Gremlin Language Description

• Advanced Gremlin Concepts

• Conclusions


Page 6: Gremlin: A Graph-Based Programming Language

What is a Graph?

• A graph (network) is composed of a collection of vertices (dots) and edges (lines).

There are many types of graphs: directed/undirected, weighted, attributed, etc.

[Figure: examples of graph types: multi, weighted, directed, undirected, edge-labeled, vertex-labeled, vertex-attributed (e.g. type="person", name="emil"), edge-attributed (e.g. created=2-01-09, modified=2-11-09), hyper, pseudo, regular, half-edge, semantic, resource description framework.]

Page 7: Gremlin: A Graph-Based Programming Language

Why Use a Graph?

• A graph is a very general data structure that can be used to model various systems.

  - A graph can model the structure of transportation, technological, bibliographic, etc. systems.

  - A graph can model a list, a map, a tree, etc.

• There are numerous graph algorithms that are defined independent of the domain of the graph model.

• There are numerous graph databases, frameworks, packages, etc. that aid in the creation, manipulation, and analysis of graphs.

Page 8: Gremlin: A Graph-Based Programming Language

Graph Databases, Frameworks, and Packages

• Neo4j Graph Database [http://neo4j.org]

• AllegroGraph Quad Store [http://www.franz.com/agraph]

• HyperGraphDB [http://www.kobrix.com/hgdb.jsp]

• Java Universal Network/Graph Framework [http://jung.sourceforge.net]

• OpenRDF Sesame Framework [http://www.openrdf.org]

• InfoGrid Graph Database [http://infogrid.org]

• Filament Graph Toolkit [http://filament.sourceforge.net]

• OWLim Semantic Repository [http://www.ontotext.com/owlim]

• Sones Graph Database [http://www.sones.com]

• NetworkX Graph Toolkit [http://networkx.lanl.gov]

• iGraph Toolkit [http://igraph.sourceforge.net]

• Blueprints Graph API [http://blueprints.tinkerpop.com]

• ... and many more.


Page 9: Gremlin: A Graph-Based Programming Language

What Makes Gremlin Different?

• Gremlin is a domain specific language for working with graphs.

• Gremlin is not an application programming interface (API).

• Gremlin makes use of various graph databases, frameworks, packages.

• Gremlin is a language that currently has a virtual machine implementation written in Java.

• What can be succinctly expressed in Gremlin is verbose/clumsy to express in general purpose languages such as Java, Python, Ruby, etc.

• Gremlin allows one to map single-relational graph analysis algorithms over to the multi-relational domain.

Page 10: Gremlin: A Graph-Based Programming Language

Single-Relational Graphs

• In single-relational graphs, all edges have the same meaning (e.g. all edges are either friendship, kinship, worksWith, knows, etc.).

  - G = (V, E ⊆ (V × V))

• Most graph algorithms are defined for single-relational graphs (e.g. centrality/ranking, clustering/community detection, etc.).

[Figure: three vertices (person-a, person-b, person-c) connected by unlabeled directed edges.]

NOTE: These types of graphs are also known as directed, vertex-labeled graphs.

Page 11: Gremlin: A Graph-Based Programming Language

Multi-Relational Graphs

• In multi-relational graphs, edges can have different meanings.

  - G = (V, E ⊆ (V × V), ω : E → Σ*)

• Most graph software is designed for multi-relational graphs (e.g. arbitrary objects as vertices and edges, knowledge-based reasoning systems, etc.).

[Figure: three vertices (person-a, book-b, book-c) connected by directed edges labeled read, cites, and authored.]

NOTE: These types of graphs are also known as directed, vertex/edge-labeled graphs.
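The definition above can be made concrete with a small sketch. It is written in Python rather than Gremlin, and the assignment of labels to edges is an assumption for illustration (the slide's figure only shows that the labels read, cites, and authored exist). The `project` helper is a hypothetical name; it shows one simple way to recover a single-relational graph by keeping only the edges with a given label.

```python
# G = (V, E, omega): E is a set of (tail, head) pairs and omega labels each edge.
V = {"person-a", "book-b", "book-c"}
E = {("person-a", "book-b"), ("person-a", "book-c"), ("book-b", "book-c")}

# Assumed labelling, for illustration only.
omega = {
    ("person-a", "book-b"): "authored",
    ("person-a", "book-c"): "read",
    ("book-b", "book-c"): "cites",
}


def project(label):
    """Return the single-relational edge set {e in E : omega(e) = label}."""
    return {edge for edge in E if omega[edge] == label}


print(project("cites"))  # {('book-b', 'book-c')}
```

Filtering by one label is the simplest projection; the later slides argue for richer, path-based definitions of adjacency.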

Page 12: Gremlin: A Graph-Based Programming Language

Gremlin and Multi-Relational Graphs

• Gremlin provides a means to elegantly map single-relational graph analysis algorithms over to the multi-relational graph domain.

• Gremlin provides an elegant way to do automated reasoning in multi-relational graphs using path expressions.

These two points form the primary thesis of this presentation.

Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), 29–41, doi:10.1016/j.joi.2009.06.004, LA-UR-08-03931, http://arxiv.org/abs/0806.2274, December 2009.

Page 13: Gremlin: A Graph-Based Programming Language

Property Graphs

• Gremlin works with a type of multi-relational graph called a property graph.

  - Vertices and edges are labeled with unique identifiers.
  - Edges are directed, labeled, and can form loops.
  - Multiple edges of the same label can exist for the same vertex pair.
  - Vertices and edges can have any number of key/value pair properties/attributes.

Property graphs are a relatively general graph structure that can be constrained to model other graph structures, though a property-based hypergraph would be the most general (see HyperGraphDB and the JUNG API).

Page 14: Gremlin: A Graph-Based Programming Language

Property Graphs

[Figure: the example property graph used throughout these slides.]

Vertices:
  1: name = "marko", age = 29
  2: name = "vadas", age = 27
  3: name = "lop", lang = "java"
  4: name = "josh", age = 32
  5: name = "ripple", lang = "java"
  6: name = "peter", age = 35

Edges:
  7: 1 -knows-> 2, weight = 0.5
  8: 1 -knows-> 4, weight = 1.0
  9: 1 -created-> 3, weight = 0.4
  10: 4 -created-> 5, weight = 1.0
  11: 4 -created-> 3, weight = 0.4
  12: 6 -created-> 3, weight = 0.2
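For readers who want to experiment outside Gremlin, the example property graph can be sketched as two plain Python dictionaries. This is only an illustrative encoding, not how Blueprints stores graphs; the endpoints of edges 10 and 12 follow the standard TinkerPop toy graph, and the helper names `out_edges` and `in_edges` are hypothetical.

```python
# Vertices: id -> key/value properties.
VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
    5: {"name": "ripple", "lang": "java"},
    6: {"name": "peter", "age": 35},
}

# Edges: id -> (out/tail vertex, label, in/head vertex, properties).
EDGES = {
    7: (1, "knows", 2, {"weight": 0.5}),
    8: (1, "knows", 4, {"weight": 1.0}),
    9: (1, "created", 3, {"weight": 0.4}),
    10: (4, "created", 5, {"weight": 1.0}),
    11: (4, "created", 3, {"weight": 0.4}),
    12: (6, "created", 3, {"weight": 0.2}),
}


def out_edges(vertex_id):
    """Ids of the edges leaving vertex_id, in ascending id order."""
    return sorted(e for e, (tail, _, _, _) in EDGES.items() if tail == vertex_id)


def in_edges(vertex_id):
    """Ids of the edges arriving at vertex_id, in ascending id order."""
    return sorted(e for e, (_, _, head, _) in EDGES.items() if head == vertex_id)


print(out_edges(1))  # [7, 8, 9]
print(in_edges(3))   # [9, 11, 12]
print([EDGES[e][3]["weight"] for e in out_edges(1)])  # [0.5, 1.0, 0.4]
```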


Page 16: Gremlin: A Graph-Based Programming Language

Gremlin System Architecture

[Figure: architecture stack showing the Gremlin Console, the Gremlin ScriptEngine, and the underlying stores NativeStore, TinkerGraph, and Neo4j.]

• The Gremlin console is a scripting environment which allows for the dynamic evaluation of Gremlin code.

• Gremlin implements JSR 223, which allows Gremlin to also be used within the Java language and thus, as a virtual machine directly accessible to Java applications. Popular JSR 223 implementations include Jython, JRuby, and Groovy. For a fine list of implementations see https://scripting.dev.java.net.

• Blueprints is a set of interfaces for abstract data structures such as graphs and documents. Implementations of these interfaces exist for various data management systems.

• There exist many graph data management systems that span various graph data models (e.g. edge labeled graphs, RDF graphs, hypergraphs, etc.).

Page 17: Gremlin: A Graph-Based Programming Language

“Hello World” in the Gremlin Console

marko$ ./gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin>
gremlin> concat('goodbye', ' ', 'self')
==>goodbye self

Page 18: Gremlin: A Graph-Based Programming Language

Simple Traversals in Gremlin

[Figure: the example property graph.]

gremlin> $_ := g:key('name','marko')
==>v[1]
gremlin> .
==>v[1]
gremlin> ./outE
==>e[7][1-knows->2]
==>e[9][1-created->3]
==>e[8][1-knows->4]
gremlin> ./outE/@weight
==>0.5
==>0.4
==>1.0

./outE/@weight: “Get the current object(s). Then get the outgoing edges of those objects. Then get the weights of those edges.”

$_ is a reserved variable meaning the root list of objects.

Page 19: Gremlin: A Graph-Based Programming Language

Simple Traversals in Gremlin

[Figure: the example property graph.]

gremlin> .
==>v[1]
gremlin> ./outE[@label='created']/inV
==>v[3]
gremlin> $_ := $_last
==>v[3]
gremlin> ./@name
==>lop
gremlin> g:map(.)
==>name=lop
==>lang=java

./outE[@label='created']/inV: “Get the current object(s). Then get the outgoing edges of those objects, where their labels equal ‘created’. Then get the incoming vertices of those ‘created’ edges.”

$_last is a reserved variable meaning the last value evaluated.

Page 20: Gremlin: A Graph-Based Programming Language

Simple Traversals in Gremlin

[Figure: the example property graph.]

./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name
==>vadas

Page 21: Gremlin: A Graph-Based Programming Language

Simple Traversals in Gremlin

./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name

1. .: Get the current object(s).

2. outE[@label='knows']: Get the outgoing edges of the current object(s), where their labels equal ‘knows’.

3. inV[matches(@name,'va.{3}') and @age > 21]: Get the incoming vertices of those ‘knows’ edges, where the names of those vertices are 5 characters long, start with ‘va’, and whose age is greater than 21.

4. @name: Get the name of those particular incoming vertices.
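The four steps above can be mirrored in ordinary Python, which may help readers unfamiliar with XPath-style paths. This is a sketch under assumptions: the graph is the example property graph encoded as plain Python data, `knows_va_names` is a hypothetical helper, and `re.fullmatch` is used because the slide describes the pattern as matching the whole 5-character name.

```python
import re

VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
}

# (out/tail vertex, label, in/head vertex) for the edges leaving vertex 1.
EDGES = [(1, "knows", 2), (1, "knows", 4), (1, "created", 3)]


def knows_va_names(start):
    names = []
    for tail, label, head in EDGES:              # outE
        if tail != start or label != "knows":    # [@label='knows']
            continue
        props = VERTICES[head]                   # inV
        # [matches(@name,'va.{3}') and @age > 21]
        if re.fullmatch(r"va.{3}", props["name"]) and props.get("age", 0) > 21:
            names.append(props["name"])          # @name
    return names


print(knows_va_names(1))  # ['vadas']
```

The nested conditionals are exactly what the single path expression hides, which is the succinctness argument the slides make.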

Page 22: Gremlin: A Graph-Based Programming Language

Knowledge-Based Reasoning

• Blueprints implements the Sesame SAIL interfaces and thus, Gremlin can be used over the many Resource Description Framework (RDF) triple/quad stores. In such cases, RDF is modeled as a property graph where the named graph component is the @ng edge property.

• Gremlin makes use of the Sesame SAIL SPARQL engine to allow for queries based on graph-pattern matching.

gremlin> sail:sparql('SELECT ?x ?y WHERE { ?x foaf:knows ?y }')
==>{y=v[http://ex.com#2], x=v[http://ex.com#1]}
==>{y=v[http://ex.com#4], x=v[http://ex.com#1]}

• Gremlin is useful for knowledge-based reasoning using path expressions.

Page 23: Gremlin: A Graph-Based Programming Language

Reasoning as Defining New Types of Adjacency

[Figure: the example graph (marko, josh, vadas, peter, lop, ripple) with its knows and created edges, plus inferred co-developer edges.]

For these “co-developer” examples, we will use vertex 1 (marko) as the source of the reasoning process.

• Graph-based reasoning is the process of making explicit what is implicit in the graph.

• A reasoner takes a graph G and a collection of graph-patterns (i.e. transformation/rewrite rules) and creates a new graph G′ (usually, G ⊂ G′). G′ has new relationships/edges and thus, new definitions of vertex adjacency.

• Example: The co-developers of person A are those people who have created the same software as person A and who are themselves not person A (as person A has created the same software as him or herself).

Page 24: Gremlin: A Graph-Based Programming Language

The Co-Developers of Marko A. Rodriguez in SPARQL

[Figure: the example property graph annotated with the query variables ?x, ?y, and ?z.]

SELECT ?x WHERE {
  marko created ?y .
  ?z created ?y .
  FILTER (?z != marko) .
  ?z name ?x
}

This query would return: josh and peter.

Page 25: Gremlin: A Graph-Based Programming Language

The Co-Developers of Marko A. Rodriguez in Gremlin

[Figure: the example graph with its knows and created edges, plus inferred co-developer edges.]

gremlin> ./@name
==>marko
gremlin> ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name
==>josh
==>peter

Page 26: Gremlin: A Graph-Based Programming Language

The Co-Developers of Marko A. Rodriguez in Gremlin

./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name

1. .: Get the current object(s) (i.e. vertex 1, denoting Marko).

2. outE[@label='created']: Get the outgoing edges of the Marko vertex, where their labels equal ‘created’.

3. inV: Get the incoming (i.e. head) vertices of those ‘created’ edges.

4. inE[@label='created']: Get the incoming edges of those vertices, where their labels equal ‘created’.

5. outV[g:except($_)]: Get the outgoing (i.e. tail) vertices of those ‘created’ edges, where those vertices are not the Marko vertex.

6. @name: Get the name of those non-Marko vertices.
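The same six steps can be traced in a short Python sketch over the example graph. The data encoding and the `co_developers` helper are illustrative assumptions, not part of Gremlin; edges are listed in edge-id order so the result order matches the console output shown earlier.

```python
NAMES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}

# (out/tail vertex, label, in/head vertex), listed in edge-id order 7..12.
EDGES = [
    (1, "knows", 2),
    (1, "knows", 4),
    (1, "created", 3),
    (4, "created", 5),
    (4, "created", 3),
    (6, "created", 3),
]


def co_developers(start):
    # Steps 1-3: follow outgoing 'created' edges to the created projects.
    projects = [h for t, label, h in EDGES if t == start and label == "created"]
    result = []
    for project in projects:
        # Steps 4-5: follow incoming 'created' edges back, skipping the start vertex.
        for t, label, h in EDGES:
            if h == project and label == "created" and t != start:
                result.append(t)
    return result


print(co_developers(1))                      # [4, 6]
print([NAMES[v] for v in co_developers(1)])  # ['josh', 'peter']
```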

Page 27: Gremlin: A Graph-Based Programming Language

Defining Co-Developers in Gremlin

path co-developer
  ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]
end

Once defined, you can use it like any other path segment.

gremlin> ./co-developer
==>v[4]
==>v[6]
gremlin> ./co-developer/@name
==>josh
==>peter

Page 28: Gremlin: A Graph-Based Programming Language

Defining Co-Developers in Java

import java.util.ArrayList;
import java.util.List;

// Vertex and Edge come from Blueprints; Path is Gremlin's interface for user-defined paths.
public class CoDeveloperPath implements Path {
    public List invoke(Object root) {
        if (root instanceof Vertex) {
            // Collect everything the root vertex has created.
            List<Vertex> projects = new ArrayList<Vertex>();
            for (Edge edge : ((Vertex) root).getOutEdges()) {
                if (edge.getLabel().equals("created")) {
                    projects.add(edge.getInVertex());
                }
            }
            // Collect the other creators of those projects.
            List<Vertex> coDevelopers = new ArrayList<Vertex>();
            for (Vertex project : projects) {
                for (Edge edge : project.getInEdges()) {
                    if (edge.getLabel().equals("created") && !edge.getOutVertex().equals(root)) {
                        coDevelopers.add(edge.getOutVertex());
                    }
                }
            }
            return coDevelopers;
        } else {
            return null;
        }
    }
}


Page 30: Gremlin: A Graph-Based Programming Language

Gremlin Type System

[Figure: the Gremlin type hierarchy.]

object
  graph
  element
    vertex
    edge
  boolean
  number
  string
  list
  map

Page 31: Gremlin: A Graph-Based Programming Language

Predefined Paths and Properties

[Figure: a fragment of the example graph (vertices 1, 3, and 4; edges 8, 9, and 11) annotated with: vertex 4 id, vertex 4 properties (name = "josh", age = 32), vertex 1 out edges, vertex 3 in edges, edge 9 label, edge 9 in vertex, edge 9 out vertex, and edge 9 id.]

object       property  description                              example
graph        V         the vertex iterator of the graph         $g/V
graph        E         the edge iterator of the graph           $g/E
vertex/edge  @id       the identifier of the element            $v/@id
vertex       outE      the outgoing edges of the vertex         $v/outE
vertex       inE       the incoming edges of the vertex         $v/inE
vertex       bothE     both in and out edges of the vertex      $v/bothE
edge         outV      the outgoing tail vertex of the edge     $e/outV
edge         inV       the incoming head vertex of the edge     $e/inV
edge         bothV     both in and out vertices of the edge     $e/bothV
edge         @label    the label of the edge                    $e/@label

Page 32: Gremlin: A Graph-Based Programming Language

Predefined Functions

g:assign(), g:unassign(), g:id(), g:key(), g:add-v(), g:add-e(), g:remove-ve(), g:idx-all(), g:add-idx(),
g:remove-idx(), g:load(), g:save(), g:clear(), g:close(), g:keys(), g:values(), g:map(), g:get(), g:op-value(),
g:list(), g:dedup(), g:union(), g:intersect(), g:difference(), g:retain(), g:except(), g:remove(), g:get(), g:op-value(),
g:sort(), g:map(), g:keys(), g:values(), g:rand-nat(), g:rand-real(), g:prob(), g:cont(), g:halt(), g:type(),
g:print(), g:time(), g:p(), g:to-json(), g:from-json(), ...

There are over 70 predefined functions. See the following for a description of each.

http://wiki.github.com/tinkerpop/gremlin/core-function-library

http://wiki.github.com/tinkerpop/gremlin/gremlin-function-library

Page 33: Gremlin: A Graph-Based Programming Language

Working With Non-Graph Types

gremlin> 1.2 + 6
==>7.2
gremlin> 'this is a string'
==>this is a string
gremlin> true() or false()
==>true
gremlin> g:map('marko','lanl','peter','neotech','josh','rpi')
==>marko=lanl
==>peter=neotech
==>josh=rpi
gremlin> g:list('graphs','hockey','motorcycles',6)
==>graphs
==>hockey
==>motorcycles
==>6.0

Page 34: Gremlin: A Graph-Based Programming Language

Working With Non-Graph Types

gremlin> $m := g:map('hobbies',g:list('hockey','graphs'),'location', g:map('state','new mexico', 'city', 'santa fe', 'zipcode', 87501), 'age', 30)
==>location={zipcode=87501.0, state=new mexico, city=santa fe}
==>age=30.0
==>hobbies=[hockey, graphs]
gremlin> $m/@age
==>30.0
gremlin> $m/@hobbies[2]
==>graphs
gremlin> $m/@location/@city
==>santa fe

Page 35: Gremlin: A Graph-Based Programming Language

Variables

• Variables in Gremlin are prefixed with a $ character.

• There is a collection of reserved variables that all begin with $_.

  - $_ is the root list of objects.
  - $_last is the last result evaluated by the evaluator.
  - $_g is the “working graph” to reduce typing with graph functions.

gremlin> $x := 1
==>1.0
gremlin> $y := 2
==>2.0
gremlin> $x + $y
==>3.0

Page 36: Gremlin: A Graph-Based Programming Language

Language Statements

Variable Assignment

gremlin> $i := 1 + 5
==>6.0
gremlin> $i
==>6.0

If/Else

gremlin> if true()
           $i := 1
         else
           $i := 2
         end
==>1.0

Repeat

gremlin> $i := 0
==>0.0
gremlin> repeat 10
           $i := $i + 1
         end
==>10.0

While

gremlin> $i := 'g'
==>g
gremlin> while not(matches($i, 'ggg'))
           $i := concat($i,'g')
         end
==>ggg

Page 37: Gremlin: A Graph-Based Programming Language

Language Statements

Foreach

gremlin> $i := 0
==>0.0
gremlin> foreach $j in 1 | 2 | 3
           $i := $i + $j
         end
==>6.0

Function

gremlin> func ex:hello($name)
           concat('hello ', $name)
         end
gremlin> ex:hello('pavel')
==>hello pavel

Path

gremlin> path friend_name
           ./outE[@label='knows']/inV/@name
         end
gremlin> ./friend_name
==>vadas
==>josh

You can define functions and paths in native Gremlin (as demonstrated above) or in Java.

Page 38: Gremlin: A Graph-Based Programming Language

XPath Filters

• Use [ ] filters to filter objects in a path expression (i.e. “such that” or “where”).

• The evaluated result of [ ] must be a number or boolean.

  - If it is a number, it is treated as the position within an array (i.e. list).
  - If it is a boolean, it is treated as whether to include or exclude the object from the next path in the sequence.

gremlin> ./outE[@label='knows']
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> ./outE[@label='knows' and @weight>0.5]/inV[@age<21 or @name='josh'][true()][1]
==>v[4]
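The two filter behaviours (boolean test versus 1-based position) can be imitated in a few lines of Python. This is a simplified sketch, not the XPath evaluation model: the hypothetical `apply_filter` helper takes either a predicate function (the boolean case) or an integer (the positional case), and the data is the slice of the example graph reachable from vertex 1.

```python
def apply_filter(items, flt):
    """Mimic a [ ] filter: a callable keeps matching items, an int selects a 1-based position."""
    if callable(flt):
        return [item for item in items if flt(item)]
    return items[flt - 1:flt] if flt >= 1 else []


# Outgoing edges of vertex 1 (marko), each with its head vertex inlined.
OUT_EDGES_OF_MARKO = [
    {"id": 7, "label": "knows", "weight": 0.5, "inV": {"id": 2, "name": "vadas", "age": 27}},
    {"id": 8, "label": "knows", "weight": 1.0, "inV": {"id": 4, "name": "josh", "age": 32}},
    {"id": 9, "label": "created", "weight": 0.4, "inV": {"id": 3, "name": "lop"}},
]

# ./outE[@label='knows' and @weight>0.5]
edges = apply_filter(OUT_EDGES_OF_MARKO, lambda e: e["label"] == "knows" and e["weight"] > 0.5)
# /inV[@age<21 or @name='josh']
vertices = apply_filter(
    [e["inV"] for e in edges],
    lambda v: v.get("age", float("inf")) < 21 or v["name"] == "josh",
)
vertices = apply_filter(vertices, lambda v: True)  # [true()] keeps everything
vertices = apply_filter(vertices, 1)               # [1] keeps the first element

print([v["id"] for v in vertices])  # [4]
```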


Page 40: Gremlin: A Graph-Based Programming Language

A Grateful Dead Dataset

2,500 concerts
35,000 songs played
600 songs
30 years
11 members
1 band
... the Grateful Dead.

Page 41: Gremlin: A Graph-Based Programming Language

A Grateful Dead Dataset

• Vertices denote songs and artists.

  - type: “song” or “artist”.
  - name: name of song or artist.
  - performances: number of times song was played in concert.
  - song_type: whether the song was a “cover” or “original”.

• Edges denote followed_by, sung_by, written_by.

  - weight: number of times a song was followed by another song over all concerts played.

Rodriguez, M.A., Gintautas, V., Pepe, A., “A Grateful Dead Analysis: The Relationship Between Concert and Listening Behavior,” First Monday, 14(1), University of Illinois at Chicago Library, http://arxiv.org/abs/0807.2466, January 2009.

NOTE: A portion of the raw dataset courtesy of Mark Leone http://www.cs.cmu.edu/~mleone/gdead/setlists.html

Page 42: Gremlin: A Graph-Based Programming Language

A Grateful Dead Dataset

[Figure: a setlist excerpt and its graph encoding.]

Setlist: Stanley Theater, Pittsburgh, PA (11/30/79), 2nd Set: Scarlet Begonias, Fire on the Mountain, Passenger, Terrapin Station, ...

Graph encoding: song vertices 1 (name="Scarlet.."), 2 (name="Fire on.."), 3 (name="Pass.."), and 4 (name="Terrap..") are chained by followed_by edges (the weights 239, 1, and 2 appear in the figure). Artist vertices 5 (name="Garcia"), 6 (name="Lesh"), and 7 (name="Hunter") are reached from the songs by sung_by and written_by edges.

Page 43: Gremlin: A Graph-Based Programming Language

A Grateful Dead Dataset – Load Data/Basic Stats

gremlin> g:load('data/graph-example-2.xml')
==>true
gremlin> count($_g/V)
==>809.0
gremlin> count($_g/E)
==>8049.0

Page 44: Gremlin: A Graph-Based Programming Language

A Grateful Dead Dataset – Out-Degree of Each Vertex

gremlin> $degrees := g:map()
gremlin> foreach $v in $_g/V
           $degrees[@name=$v/@name] := count($v/outE)
         end

Page 45: Gremlin: A Graph-Based Programming Language

A Grateful Dead Dataset – Out-Degree of Each Vertex

gremlin> g:sort($degrees, 'value', true())
==>PLAYING IN THE BAND=96.0
==>SUGAR MAGNOLIA=92.0
==>PROMISED LAND=89.0
==>GOOD LOVING=87.0
==>NOT FADE AWAY=86.0
==>I KNOW YOU RIDER=85.0
==>CASSIDY=83.0
==>DEAL=82.0
==>JACK STRAW=81.0
==>ONE MORE SATURDAY NIGHT=81.0
==>EL PASO=80.0
==>MEXICALI BLUES=79.0
...

Page 46: Gremlin: A Graph-Based Programming Language

A Grateful Dead Dataset – Inspecting Single Vertex

gremlin> $v := g:key('name','CHINA DOLL')[1]
==>v[129]
gremlin> g:map($v)
==>name=CHINA DOLL
==>song_type=original
==>performances=114
==>type=song
gremlin> $v/outE[@label='sung_by']/inV/@name
==>Garcia

Page 47: Gremlin: A Graph-Based Programming Language

A Grateful Dead Dataset – Inspecting Single Vertex

gremlin> $v/outE[@label='followed_by']/inV/@name
==>BIG RIVER
==>THROWING STONES
==>SAMSON AND DELILAH
==>TRUCKING
==>CASEY JONES
==>HIGH TIME
...
gremlin> $v/outE[@label='followed_by']/@weight
==>2
==>8
==>1
==>2
==>1
==>1
...

Page 48: Gremlin: A Graph-Based Programming Language

Introduction to PageRank

• The remainder of this section will discuss the PageRank algorithm and its application to multi-relational graphs.

• The arguments made and the examples presented generalize to all other single-relational graph algorithms. However, for the sake of brevity and consistency, only PageRank will be discussed.

Page 49: Gremlin: A Graph-Based Programming Language

Introduction to Matrix-Based PageRank

• PageRank is a centrality measure based on the primary eigenvector of a modified version of a graph. Let \(A \in \mathbb{R}_{+}^{|V| \times |V|}\) denote the adjacency matrix representing the graph.

• In order to ensure positive real values in the eigenvector, the graph must be strongly connected. PageRank induces strong connectivity by overlaying a low probability (defined by \(\alpha \in [0, 1]\), usually 0.15) “teleportation” graph over the original graph. Let \(B \in \left\{\frac{1}{|V|}\right\}^{|V| \times |V|}\) denote a teleportation adjacency matrix where every vertex is connected to every vertex with equal probability (i.e. every entry of \(B\) is \(1/|V|\)).

  - \(C = (1 - \alpha) A + \alpha B\), where \(C \in \mathbb{R}_{+}^{|V| \times |V|}\)

  - \(\lambda = \lambda C\), where \(\lambda \in \mathbb{R}_{+}^{|V|}\) is the PageRank vector over \(V\).
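A minimal way to see these equations at work is power iteration: start from a uniform vector and repeatedly multiply by C until it stops changing. The Python sketch below assumes that A is row-normalized into transition probabilities (the slide does not state this explicitly) and that a vertex without out-edges is treated as linking uniformly to every vertex. The function name `pagerank` and the three-vertex test graph are hypothetical.

```python
def pagerank(adjacency, alpha=0.15, iterations=100):
    """Approximate lambda with lambda = lambda C, where C = (1 - alpha) A + alpha B."""
    n = len(adjacency)
    # Row-normalize A; rows without out-edges become uniform (an assumption).
    A = []
    for row in adjacency:
        total = sum(row)
        A.append([x / total for x in row] if total else [1.0 / n] * n)
    # Every entry of B is 1 / n, so alpha * B contributes alpha / n to each entry.
    C = [[(1 - alpha) * A[i][j] + alpha / n for j in range(n)] for i in range(n)]
    lam = [1.0 / n] * n
    for _ in range(iterations):
        # One step of lambda <- lambda C (row vector times matrix).
        lam = [sum(lam[i] * C[i][j] for i in range(n)) for j in range(n)]
    return lam


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
ranks = pagerank([[0, 1, 1], [0, 0, 1], [1, 0, 0]])
print(ranks)
```

In this small graph vertex 1 receives the least rank, since its only in-link is half of vertex 0's outgoing probability.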

Page 50: Gremlin: A Graph-Based Programming Language

Introduction to Random Walk-Based PageRank

• PageRank can be implemented by a random walk.

• Create a vertex counter map, m : V → N+.

• Place a walker on a random vertex in V. Denote the walker’s current vertex i ∈ V.

1. Increment the vertex counter by 1 (i.e. m(i) ← m(i) + 1).
2. With probability 1 − α, the walker chooses a random adjacent vertex.
3. With probability α, the walker chooses a random vertex in V (teleportation).
4. Rinse and repeat until m reaches a stationary probability distribution (continually normalize m if you want a probability distribution).

• We will use this random walk model in the Gremlin examples to follow.
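The walk described above is short enough to sketch in Python. The sketch assumes α is the teleportation probability, as in the matrix formulation where α is usually 0.15, and it also teleports when the current vertex has no adjacent vertices. The function name, the `adjacent` callback, and the toy graph are illustrative assumptions.

```python
import random


def random_walk_pagerank(vertices, adjacent, alpha=0.15, steps=10000, seed=42):
    """Count visits of a random walker; `adjacent(v)` returns the list of neighbours of v."""
    rng = random.Random(seed)
    counts = {v: 0 for v in vertices}
    current = rng.choice(vertices)
    for _ in range(steps):
        counts[current] += 1                     # m(i) <- m(i) + 1
        neighbours = adjacent(current)
        if neighbours and rng.random() >= alpha:
            current = rng.choice(neighbours)     # follow a random adjacent vertex
        else:
            current = rng.choice(vertices)       # teleport to a random vertex in V
    return counts


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
GRAPH = {0: [1, 2], 1: [2], 2: [0]}
visits = random_walk_pagerank(list(GRAPH), lambda v: GRAPH[v])
total = sum(visits.values())
print({v: c / total for v, c in visits.items()})  # normalized visit frequencies
```

Because adjacency is passed in as a function, the same walker can be reused with any definition of "adjacent", which is the idea the following slides develop.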

Page 51: Gremlin: A Graph-Based Programming Language

PageRank over Multi-Relational Graphs

• PageRank was designed for single-relational graphs (i.e. where all edges have the same meaning).

• In a multi-relational graph, what does it mean to find the centrality of a vertex when vertices can be related by various types of edges? For example, if there exists “socializes with” and “met once”, then the person who “met once” many people could be the most centrally located in the graph. Also, what if your graph has more than just “person”-type vertices (e.g. cars, pets, buildings, articles, etc.) and “person”-type edges (e.g. owns, walks, livesAt, cites, etc.)?

Page 52: Gremlin: A Graph-Based Programming Language

PageRank over Multi-Relational Graphs

• Calculating single-relational PageRank would yield Person as the most central vertex.

• You can boolean filter certain edge labels (e.g. ignore type edges; in such cases, you would have the centrality scores over the knows social graph).

• However, what if you only wanted to traverse knows edges if and only if the adjacent vertex knows more than 10 other people?

• In the end, you want complete control (universal computability) over the paths that the traverser/walker can take through a graph.

[Figure: person vertices (Herbert, Johan, Marko, Josh, Jen, ...) linked to one another by knows edges, each with a type edge to a single Person vertex.]

Page 53: Gremlin: A Graph-Based Programming Language

PageRank over Multi-Relational Graphs

• In multi-relational graphs, the meaning of your graph algorithm’s results is defined by your definition of adjacency.

• With respect to random walk-based PageRank, define the path that the walker should take. That path is the definition of adjacency.

• The stationary probability distribution created from this walk yields a path-dependent centrality.

• Thus, in a multi-relational graph, there are many types of PageRanks that can be calculated: one for each type of path defined for a walker.

Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, Knowledge-Based Systems, 21(7), 727–739, http://arxiv.org/abs/0803.4355, October 2008.

Page 54: Gremlin: A Graph-Based Programming Language

PageRank over “Garcia Followed By” SubGraph

• Define a path that will go from song-to-song by “followed_by” edges and only traverse songs that are “sung_by” Jerry Garcia.

(./outE[@label='followed_by']/inV/outE[@label='sung_by']/inV[@name='Garcia']/../..)[g:rand-nat()]

[Figure: from the current song (.), followed_by edges lead to songs A, B, C, and D; their sung_by edges lead to artists named "Garcia", "Garcia", and "Weir". The /../.. steps walk back from a matching artist to its song, and g:rand-nat() selects one of the remaining songs at random.]
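To make the path concrete, here is a small Python sketch of the same notion of adjacency. The data is a hypothetical fragment loosely based on the figure (song A followed by B, C, and D, with B and C sung by Garcia and D by Weir); it is not taken from the real dataset, and the function names are illustrative.

```python
import random

# Hypothetical fragment: song A is followed by songs B, C and D.
FOLLOWED_BY = {"A": ["B", "C", "D"], "B": [], "C": [], "D": []}
SUNG_BY = {"B": "Garcia", "C": "Garcia", "D": "Weir"}


def garcia_followed_by(song):
    """Songs that follow `song` and are sung by Garcia: the path's definition of adjacency."""
    return [nxt for nxt in FOLLOWED_BY.get(song, []) if SUNG_BY.get(nxt) == "Garcia"]


def step(song, rng):
    """Pick one adjacent song at random (the role of g:rand-nat()), or None if there is none."""
    candidates = garcia_followed_by(song)
    return rng.choice(candidates) if candidates else None


print(garcia_followed_by("A"))  # ['B', 'C']
```

A walker that moves with `step` and teleports when it returns `None` would produce a centrality restricted to this "Garcia followed by" subgraph.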

Page 55: Gremlin: A Graph-Based Programming Language

PageRank over “Garcia Followed By” SubGraph

path garcia-followed_by
  (./outE[@label='followed_by']/inV/outE[@label='sung_by']/inV[@name='Garcia']/../..)[g:rand-nat()]
end

$m := g:map()
$alpha := 0.15
$_ := g:key('type', 'song')[g:rand-nat()]
repeat 2500
  $_ := ./garcia-followed_by
  if count($_) > 0
    g:op-value('+',$m,$_[1]/@name, 1.0)
  end
  if g:rand-real() < $alpha or count($_) = 0
    $_ := g:key('type', 'song')[g:rand-nat()]
  end
end

Page 56: Gremlin: A Graph-Based Programming Language

PageRank over “Garcia Followed By” SubGraph

gremlin> g:sort($m,'value',true())
==>CRAZY FINGERS=98.0
==>HES GONE=85.0
==>CHINA CAT SUNFLOWER=79.0
==>BERTHA=76.0
==>UNCLE JOHNS BAND=74.0
==>TERRAPIN STATION=72.0
==>GOING DOWN THE ROAD FEELING BAD=71.0
==>WHARF RAT=71.0
==>EYES OF THE WORLD=65.0
==>COLD RAIN AND SNOW=62.0
==>SHIP OF FOOLS=58.0
==>RAMBLE ON ROSE=53.0
==>CASEY JONES=51.0
==>DARK STAR=47.0
==>DEAL=46.0
...

Page 57: Gremlin: A Graph-Based Programming Language

Universal Computation in Paths

path path-name
  # any arbitrary computation can occur here
end

• A path definition can be used to define adjacencies.

  - Adjacency can be expressed as anything that can be computed by a Turing machine.

  - Path definitions are used to create “semantically meaningful” results from single-relational graph algorithms applied to multi-relational graphs.

  - Path definitions make explicit what is implicit in the structure of the graph. This has applications to knowledge-based reasoning.

• A path definition can perform any arbitrary computation.

  - Path definitions can check/set vertex/edge properties.

  - Path definitions can create new vertices and edges.

  - Path definitions can call/define functions.

This allows fine grained control over how your traverser/walker moves through a graph.


Page 59: Gremlin: A Graph-Based Programming Language

The Current Gremlin EcoSystems

• Webling: Web console for Gremlin (developed by Pavel Yaskevich w/ funding from Neo Technology)

• Project Gargamel: Distributed Graph Computing (uses Linked Process and Gremlin)

• ReXster: A Graph-Based Recommender Engine

Page 60: Gremlin: A Graph-Based Programming Language

Thank You

Please enjoy Gremlin at http://gremlin.tinkerpop.com ...

My homepage is http://markorodriguez.com.
Please feel free to contact me with any questions or comments.