A gremlin in my graph confoo 2014

Post on 08-May-2015

description

Neo4j comes with enhanced connectivity of data and a whiteboard-friendly paradigm. It also brings a gremlin into your code: one of the supported graph query languages offers a refreshing look at how one can search for data in a vast and interconnected web of data. Gremlin provides an abstraction layer that makes it easy to express your business logic without fighting with the code. It may even change your mind about object-oriented programming.

transcript

A gremlin in your graph

Montreal, Québec, Canada, February 28th, 2014

What is gremlin?

G : graph, or the dataset

V : Vertices, or nodes or objects

E : Edges, or links or relations

graph database, http://www.neo4j.org/

Speaker

Damien Seguy

dams@php.net

Exakat : Expert services in PHP

Yes we take questions

?

Graph Database

==>
==>          \,,,/
==>          (o o)
==> -----oOOo-(_)-oOOo-----
==>
==> Available variables:
==>   g = (neo4jgraph[EmbeddedGraphDatabase [data/graph.db]]
==> , null)
==>   out = (java.io.PrintStream@398a3257
==> , null)
gremlin>

http://www.neo4j.org/

http://localhost:7474/Console -> Gremlin

Console in web browser!

g is the graph where the nodes live

V (as in g.V) represents all the vertices

Nodes always have an id

g.v(1) => v(1)

Welcome to the graph

g.v(1).id => 1
g.v(1).name => ext/datetime
g.v(1).version => null

Properties

Graph is schemaless

g.v(2).map => {name=timezonedb, version=2013.9}

Node discovery

map is convenient to discover the graph

In the graph

Only objects and relations

Vertices have id and properties

Edges have id, label and properties

g.v(1).out => v(2) v(3)

g.v(1).in => v(4)

g.v(1).both => v(2) v(3) v(4)
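The three navigation steps above can be modeled in a few lines of plain Python. This is only a sketch of the idea, not Gremlin itself: the edge list, the labels and their directions are made up so that the results match the slide, and the helper names `out`, `in_` and `both` are hypothetical.

```python
# Toy directed graph: each edge is (source id, label, target id).
# The ids, labels and directions are made up for illustration.
EDGES = [
    (4, "WROTE", 1),
    (1, "HAS", 2),
    (1, "HAS", 3),
]

def out(v, label=None):
    """Vertices reached by following edges that leave v."""
    return [dst for src, lbl, dst in EDGES if src == v and label in (None, lbl)]

def in_(v, label=None):
    """Vertices reached by following edges that arrive at v."""
    return [src for src, lbl, dst in EDGES if dst == v and label in (None, lbl)]

def both(v, label=None):
    """Neighbors in either direction: outgoing first, then incoming."""
    return out(v, label) + in_(v, label)

print(out(1))   # [2, 3]
print(in_(1))   # [4]
print(both(1))  # [2, 3, 4]
```

Passing a label, as in `in_(1, "WROTE")`, keeps only the edges with that label, which is the filtering shown a few slides later.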

Edges

Directed graph

g.v(1).inE.label => WROTE WROTE WROTE HAS

g.v(1).inE.id => 2348

PECL database

A database of PHP extensions and their authors.

The extensions are stored in categories

http://pecl.php.net/

Gephi

g.v(1).in('WROTE').name =>
Derick Rethans
Hannes Magnusson
Jeremy Mikola

g.v(1).in('WROTE','HAS').name => /* same as previous plus */ DB

Following edges

g.v(2).out('WROTE').in('WROTE').name =>
Hannes Magnusson
Jeremy Mikola
Derick Rethans

Collaborators

g.v(2).out('WROTE').in('WROTE').except(g.v(2)).name =>
Hannes Magnusson
Jeremy Mikola

Collaborators
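As a rough illustration of the out / in / except chain, here is a plain-Python sketch. The author and extension names are placeholders rather than PECL data, and `collaborators` is a hypothetical helper. Unlike the Gremlin query, it also removes duplicates along the way.

```python
# Placeholder data: author -> extensions they wrote.
WROTE = {
    "alice": ["ext/a", "ext/b"],
    "bob": ["ext/a"],
    "carol": ["ext/b"],
    "dave": ["ext/c"],
}

def collaborators(author):
    """Walk author -> extensions -> authors, then drop the starting author."""
    found = []
    for ext in WROTE[author]:                  # like out('WROTE')
        for other, exts in WROTE.items():      # like in('WROTE')
            # 'other != author' plays the role of except();
            # 'other not in found' removes duplicates.
            if ext in exts and other != author and other not in found:
                found.append(other)
    return found

print(collaborators("alice"))  # ['bob', 'carol']
print(collaborators("dave"))   # []
```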

Intro recap

vertices and edges : basic building blocks

in and out (and both) : navigation

except(), in('label') : filtering

Traversing the graph

Means reading information in the graph

Traversing involves listing nodes then following links until all conditions are met

The graph contains Vertices and Edges. Is there anything else ?

PECL database

Authors, Extensions, Categories

Count authors

count()

All vertices are created equal

Count contributors

Too many!

g.V.out('wrote').count()

=> 5

Arrays or pipes

g.V.out('wrote')[1]
==> v(12)

g.V.out('wrote')[1..2].name
==> ext/xdebug
==> ext/mongo

Count contributors

g.V.in('wrote').unique().count()
==> 3

Gremlin functions

Pipe level function
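The gap between the two counts comes from the traversal emitting one result per edge rather than one per author. A minimal Python sketch with made-up edges (five of them, written by three placeholder authors, to mirror the numbers on the slides):

```python
# Made-up 'wrote' edges: (author, extension).
WROTE_EDGES = [
    ("alice", "ext/a"),
    ("alice", "ext/b"),
    ("bob", "ext/b"),
    ("carol", "ext/c"),
    ("carol", "ext/a"),
]

# Following every edge yields one author per edge, duplicates included.
authors_per_edge = [author for author, _ in WROTE_EDGES]
print(len(authors_per_edge))       # 5

# Removing duplicates before counting gives the number of contributors.
print(len(set(authors_per_edge)))  # 3
```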

in, out, unique, count

Node level function

map, has, filter{}

Value level

{property}

// making name UPPERCASE
g.v(79).name.toUpperCase();
==> EXT/GEARMAN

// size of the name's string
g.v(130).name.toList().size();
==> 13

// extracting words in a string
g.v(146).transform{ it.name.tokenize(); }
==> [Johann-Peter, Hartmann]

Property level

http://groovy.codehaus.org/Documentation

Vertex level

g.v(130).map;
==> {name=Ben Ramsey}

g.v(14).propertyKeys
==> name

g.v(12).setProperty('ext_nb', g.v(12).out('wrote').count());

Collaborating

Adding collaborators to the graph

except() produces a pipe! g.addEdge doesn't accept it

g.addEdge(g.v(1), g.v(1).out('WROTE').in('WROTE').except(g.v(1)), 'COLLABORATE');

==> No signature of method: com.tinkerpop.gremlin.groovy.GremlinGroovyPipeline.except() is applicable for argument types: (com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex) values: [v[1]]

==> Possible solutions: except(java.util.Collection), select(), next(), reset(), cap(), toSet()

Collaborating

Adding collaborators

wonderful world of closures

g.addEdge(g.v(1), g.v(1).out('WROTE').in('WROTE').except(g.v(1)).next(), 'COLLABORATE');

g.v(1).out('WROTE').in('WROTE').except(g.v(1)).each{ g.addEdge(g.v(1), it, 'COLLABORATE') }

Working with pipes

Pipe functions often accept a closure

A closure is written between {} and uses 'it' as the current node

Closures often have a default behavior, so they are sometimes implicit

Filtering

Filter links by label (in/out/both('label'))

Filter nodes with has('property', 'value') or hasNot('property', null)

Filter authors with more than 14 extensions

g.V.in('WROTE').filter{ it.out('WROTE').count() > 14 }
==> Ilia Alshanetsky, Wez Furlong, Sara Golemon

Filter allows us to work within the pipe

Filtering

List of contributors with extensions in more than two categories

g.V.in('wrote').filter{ it.out('wrote').in('has').unique().count() > 2 }.name
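The closure passed to filter runs a small traversal of its own for each author. A plain-Python sketch of the same idea follows; the authors, extensions and category assignments are placeholders, and `category_count` is a hypothetical helper.

```python
# Placeholder data, not taken from PECL.
WROTE = {
    "alice": ["ext/a", "ext/b", "ext/c"],
    "bob": ["ext/a", "ext/d"],
}
CATEGORY = {
    "ext/a": "Database",
    "ext/b": "Text",
    "ext/c": "Images",
    "ext/d": "Database",
}

def category_count(author):
    """Number of distinct categories covered by the author's extensions."""
    return len({CATEGORY[ext] for ext in WROTE[author]})

# Keep only the authors whose extensions span more than two categories.
busy = [author for author in WROTE if category_count(author) > 2]
print(busy)  # ['alice']
```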

Filter longer than query ?

GroupCount

groupCount(m)

Number of categories by author

g.V.out('wrote').in('has').groupCount(m)

Apparently counted but who is v(274) ?

==> v[380]=1
==> v[379]=1
==> v[378]=2
==> v[273]=2
==> v[274]=2
==> v[272]=2
==> v[173]=3
==> v[240]=2
==> v[301]=1
==> v[300]=1

GroupCount

g.V.out('has').in('wrote').groupCount(m){it.name}

==> Zeev Suraski=1
==> Zak Greant=1
==> Georg Richter=2
==> Warren Read=2
==> Jay Kint=2
==> shekhar joshi=2
==> Scott MacVicar=3
==> Esen Sagynov=2
==> Andi Gutmans=1
==> Christopher Jones=1
==> Ard Biesheuvel=1
==> Grant Croker=1
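To make the role of the map m and of the key closure more concrete, here is a small Python sketch using `collections.Counter`. The edges are invented placeholders. The counter tallies how many times each author is reached by the traversal, keyed by name instead of by vertex.

```python
from collections import Counter

# Invented edges: category HAS extension, author WROTE extension.
HAS = [("Database", "ext/a"), ("Database", "ext/d"), ("Text", "ext/b")]
WROTE = [("alice", "ext/a"), ("alice", "ext/b"), ("bob", "ext/d")]

m = Counter()
for category, ext in HAS:             # like out('has')
    for author, written in WROTE:     # like in('wrote')
        if written == ext:
            m[author] += 1            # like groupCount(m){it.name}

print(dict(m))  # {'alice': 2, 'bob': 1}
```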

GroupCount

Count of categories, without PHP standard distribution

The second closure counts elements (default +1)

g.V.has('wrote').in('has').groupCount(m){it.name}{
  if (!(it.name in ['mysql', 'timezonedb', 'gd', 'dbase' /* ....*/])) { it.b + 1; }
}

Pipes

array notation

closure usage

useful function for pipes :

groupCount, groupBy{key}{value}{mapreduce}, orderMap, [n..m] operator

More on http://gremlindocs.com/

Graph modifications

Gremlin allows graph modification

Adding a type property to authors

g.V.in('WROTE').each{ it.setProperty('type', 'author'); }

Updating on the way

sideEffect runs a closure, but keeps running the query

Adding type to extensions AND categories in the same query

g.V.as('ext')
   .in('HAS')
   .sideEffect{ it.setProperty('type', 'Category'); }
   .back('ext')
   .sideEffect{ it.setProperty('type', 'Extension'); }

back tracking

back( 'name' ) : goes back to the vertex or edge that was named with as('name')

back( n ) : goes back n steps (vertices or edges)

Makes it possible to check a branch, come back and check another branch

Manipulating vertices

g.addVertex(id or null, [property:value,...]);
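One way to picture as(), sideEffect and back() is a loop that keeps a reference to the starting element while it visits a neighbor. The Python sketch below does this with invented data; the `props` dictionary stands in for vertex properties and is an assumption of this example, not part of Gremlin.

```python
# Invented data: category HAS extension; props stands in for vertex properties.
HAS = [("Database", "ext/a"), ("Text", "ext/b")]
props = {"Database": {}, "Text": {}, "ext/a": {}, "ext/b": {}}

for category, ext in HAS:
    # as('ext') remembers the extension; in('HAS') moves to its category.
    props[category]["type"] = "Category"   # first sideEffect
    # back('ext') returns to the remembered extension.
    props[ext]["type"] = "Extension"       # second sideEffect

print(props["Database"])  # {'type': 'Category'}
print(props["ext/a"])     # {'type': 'Extension'}
```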

g.addEdge(origin vertex, destination vertex, label, [property:value...]);

g.removeVertex(vertex);

g.removeEdge(edge);

Application to OOP?

Gremlin goes beyond class specifics

g.v(1).out('WROTE').in('HAS').unique().count()

Gremlin generalizes the navigation

$total = array();
$author = new Author(1);
foreach ($author->getExtensions() as $ext) {
    $total[$ext->getCategory()] = true;
}
return count($total);

Thanks

Dams@php.net

http://www.slideshare.net/dseguy/

on the http://confoo.ca/

Kevin Bacon

Suggest collaborators to authors ?

Authors who worked with the author's collaborators, but not with the author, are recommendations

g.V.has('name', 'contrib').sideEffect{init = it}.out('wrote').in('wrote').except(init).has('name', 'contrib2').path
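The recommendation rule can be sketched in plain Python as a two-hop search. Everything here is illustrative: the data is made up, and `collaborators` and `recommendations` are hypothetical helpers rather than Gremlin steps.

```python
# Made-up data: author -> set of extensions they wrote.
WROTE = {
    "alice": {"ext/a"},
    "bob": {"ext/a", "ext/b"},
    "carol": {"ext/b"},
    "dave": {"ext/c"},
}

def collaborators(author):
    """Authors sharing at least one extension with the given author."""
    return {other for other, exts in WROTE.items()
            if other != author and exts & WROTE[author]}

def recommendations(author):
    """Collaborators of collaborators who never worked with the author."""
    direct = collaborators(author)
    second_hop = set()
    for colleague in direct:
        second_hop |= collaborators(colleague)
    return second_hop - direct - {author}

print(sorted(recommendations("alice")))  # ['carol']
```

In this toy data carol shares ext/b with bob, who shares ext/a with alice, so carol is suggested to alice; dave has no link to anyone and is never suggested.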