Date posted: 08-May-2015
Category: Technology
Uploaded by: seguy-damien
A gremlin in your graph
Montreal, Québec, Canada, February 28th, 2014
What is gremlin?
G : graph, or the dataset
V : Vertices, or nodes or objects
E : Edges, or links or relations
Yes we take questions
?
Graph Database
==>
==>          \,,,/
==>          (o o)
==> -----oOOo-(_)-oOOo-----
==>
==> Available variables:
==>   g = (neo4jgraph[EmbeddedGraphDatabase [data/graph.db]], null)
==>   out = (java.io.PrintStream@398a3257, null)
gremlin>
http://www.neo4j.org/
http://localhost:7474/Console -> Gremlin
Console in web browser!
g is the graph where the nodes live
V represents all the vertices; v(id) picks a single vertex by its id
Nodes always have an id
g.v(1) => v(1)
Welcome to the graph
g.v(1).id => 1
g.v(1).name => ext/datetime
g.v(1).version => null
Properties
Graph is schemaless
g.v(2).map => {name=timezonedb, version=2013.9}
Node discovery
map is convenient to discover the graph
In the graph
Only objects and relations
Vertices have id and properties
Edges have id, label and properties
g.v(1).out => v(2) v(3)
g.v(1).in => v(4)
g.v(1).both => v(2) v(3) v(4)
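The three navigation steps above can be modelled outside Gremlin. The following is a minimal Python sketch, not Gremlin itself, assuming a toy edge list. The vertex ids are borrowed from the slide, while the edge labels and the function names `out`, `in_` and `both` are assumptions made for illustration.

```python
# Toy directed graph: each edge is (source, label, target).
# Ids come from the slide; the labels are hypothetical.
EDGES = [
    (1, "HAS", 2),
    (1, "HAS", 3),
    (4, "WROTE", 1),
]

def out(v):
    """Vertices reached by following the edges that leave v."""
    return [target for source, _, target in EDGES if source == v]

def in_(v):
    """Vertices with an edge pointing at v ('in' is a Python keyword)."""
    return [source for source, _, target in EDGES if target == v]

def both(v):
    """Neighbours in either direction: outgoing first, then incoming."""
    return out(v) + in_(v)

print(out(1))   # [2, 3]
print(in_(1))   # [4]
print(both(1))  # [2, 3, 4]
```

The direction of each edge is what makes `out` and `in_` differ; `both` simply ignores it.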
Edges
Directed graph
g.v(1).inE.label => WROTE WROTE WROTE HAS
g.v(1).inE.id => 2348
PECL database
Database of PHP extension authors.
The extensions are stored in categories
http://pecl.php.net/
Gephi
g.v(1).in('WROTE').name => Derick Rethans, Hannes Magnusson, Jeremy Mikola
g.v(1).in('WROTE','HAS').name => /* same as previous plus */ DB
Following edges
g.v(2).out('WROTE').in('WROTE').name => Hannes Magnusson, Jeremy Mikola, Derick Rethans
Collaborators
g.v(2).out('WROTE').in('WROTE').except(g.v(2)).name => Hannes Magnusson, Jeremy Mikola
Collaborators
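The collaborator query walks out along WROTE, comes back in along WROTE, then drops the starting author. A minimal Python sketch of that idea follows; it is a model, not Gremlin, and the author names, extension names and the `collaborators` helper are all hypothetical.

```python
# Hypothetical data: author -> set of extensions they wrote.
WROTE = {
    "alice": {"ext/a", "ext/b"},
    "bob": {"ext/a"},
    "carol": {"ext/b"},
    "dave": {"ext/c"},
}

def collaborators(author):
    """Mimic out('WROTE').in('WROTE').except(author) on the toy data."""
    extensions = WROTE[author]                 # out('WROTE')
    co_authors = {
        other
        for other, exts in WROTE.items()
        if exts & extensions                   # in('WROTE')
    }
    co_authors.discard(author)                 # except(author)
    return sorted(co_authors)

print(collaborators("alice"))  # ['bob', 'carol']
print(collaborators("dave"))   # []
```

Without the final `discard`, the author would show up in their own collaborator list, which is what the first query on the slide returns.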
Intro recap
vertices and edges : basic blocks
in and out (and both) : navigation
except(), in('label') : filtering
Traversing the graph
Means reading information in the graph
Traversing involves listing nodes then following links until all conditions are met
The graph contains vertices and edges. Is there anything else?
PECL database
Authors, Extensions, Categories
Count authors
count()
All vertices are created equal
Count contributors
Too many!
g.V.out('wrote').count() => 5
Arrays or pipes
g.V.out('wrote')[1] => v(12)
g.V.out('wrote')[1..2].name => ext/xdebug ext/mongo
Count contributors
g.V.in('wrote').unique().count() ==> 3
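Why the first count is too high: walking every edge yields one result per edge, so an author with two extensions is counted twice. A minimal Python sketch of the difference, assuming a hypothetical edge list (the names are invented, only the totals mirror the slides):

```python
# Hypothetical WROTE edges as (author, extension) pairs.
WROTE_EDGES = [
    ("alice", "ext/a"),
    ("alice", "ext/b"),
    ("bob", "ext/a"),
    ("carol", "ext/b"),
    ("carol", "ext/c"),
]

# Following every WROTE edge back to its author gives one hit per edge.
authors_per_edge = [author for author, _ in WROTE_EDGES]
print(len(authors_per_edge))       # 5

# Deduplicating before counting gives the number of distinct contributors.
print(len(set(authors_per_edge)))  # 3
```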
Gremlin functions
Pipe level function
in, out, unique, count
Node level function
map, has, filter{}
Value level
{property}
// making name UPPERCASE
g.v(79).name.toUpperCase(); => EXT/GEARMAN
// size of the name's string
g.v(130).name.toList().size(); => 13
// extracting words in a string
g.v(146).transform{ it.name.tokenize(); } => [Johann-Peter, Hartmann]
Property level
http://groovy.codehaus.org/Documentation
Vertex level
g.v(130).map; => {name=Ben Ramsey}
g.v(14).propertyKeys => name
g.v(12).setProperty('ext_nb', g.v(12).out('wrote').count());
Collaborating
Adding collaborators to the graph
except() produces a pipe! g.addEdge() doesn't accept it
g.addEdge(g.v(1), g.v(1).out('WROTE').in('WROTE').except(g.v(1)), 'COLLABORATE');
==> No signature of method: com.tinkerpop.gremlin.groovy.GremlinGroovyPipeline.except() is applicable for argument types: (com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex) values: [v[1]]
==> Possible solutions: except(java.util.Collection), select(), next(), reset(), cap(), toSet()
Collaborating
Adding collaborator
Adding collaborators
wonderful world of closures
g.addEdge(g.v(1), g.v(1).out('WROTE').in('WROTE').except(g.v(1)).next(), 'COLLABORATE');
g.v(1).out('WROTE').in('WROTE').except(g.v(1)).each{ g.addEdge(g.v(1), it, 'COLLABORATE') }
Working with pipes
Pipe functions often accept a closure
A closure is written between {} and uses 'it' as the current node
Closures often have a default behavior, so they are sometimes invisible
Filtering
Filter links by label (in/out/both('label'))
Filter nodes with has('property', 'value') or hasNot('property', null)
Filter authors with more than 14 extensions
g.V.in('WROTE').filter{ it.out('WROTE').count() > 14 } => Ilia Alshanetsky, Wez Furlong, Sara Golemon
Filter allows us to work within the pipe
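The filter closure is a predicate evaluated for each element in the pipe: elements for which it returns true stay, the others are dropped. A minimal Python sketch of the same pattern, assuming hypothetical authors and a hypothetical `prolific` helper (this models the idea, it is not Gremlin):

```python
# Hypothetical data: author -> list of extensions they wrote.
WROTE = {
    "alice": ["ext/a", "ext/b", "ext/c"],
    "bob": ["ext/a"],
    "carol": ["ext/b", "ext/d"],
}

def prolific(threshold):
    """Keep authors with strictly more than `threshold` extensions."""
    return sorted(
        author
        for author, extensions in WROTE.items()
        if len(extensions) > threshold   # the predicate, like filter{...}
    )

print(prolific(1))  # ['alice', 'carol']
print(prolific(3))  # []
```

Note the strict comparison: as with `> 14` on the slide, an author sitting exactly at the threshold is not kept.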
Filtering
List of contributors with extensions in more than two categories
g.V.in('wrote').filter{ it.out('wrote').in('has').unique().count() > 2 }.name
Filter longer than the query?
GroupCount
groupCount(m)
Number of categories by author
g.V.out('wrote').in('has').groupCount(m)
Apparently counted but who is v(274)?
==> v[380]=1
==> v[379]=1
==> v[378]=2
==> v[273]=2
==> v[274]=2
==> v[272]=2
==> v[173]=3
==> v[240]=2
==> v[301]=1
==> v[300]=1
GroupCount
g.V.out('has').in('wrote').groupCount(m){it.name}
==> Zeev Suraski=1
==> Zak Greant=1
==> Georg Richter=2
==> Warren Read=2
==> Jay Kint=2
==> shekhar joshi=2
==> Scott MacVicar=3
==> Esen Sagynov=2
==> Andi Gutmans=1
==> Christopher Jones=1
==> Ard Biesheuvel=1
==> Grant Croker=1
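The difference between the two groupCount outputs is only the key used for counting: the vertex itself, or the value returned by a closure. A minimal Python sketch of that behaviour, assuming a hypothetical list of vertex ids reaching the end of the pipe and hypothetical names (`group_count` is an illustrative helper, not a Gremlin function):

```python
from collections import Counter

# Hypothetical vertex ids -> names, and the ids arriving at the end of a pipe.
NAMES = {272: "alice", 273: "bob", 274: "carol"}
PIPE = [272, 274, 273, 274, 272, 274]

def group_count(items, key=lambda item: item):
    """Count items per key; by default the item itself is the key."""
    counts = Counter()
    for item in items:
        counts[key(item)] += 1
    return dict(counts)

print(group_count(PIPE))                 # {272: 2, 274: 3, 273: 1}
print(group_count(PIPE, key=NAMES.get))  # {'alice': 2, 'carol': 3, 'bob': 1}
```

Passing a key function plays the role of `{it.name}`: the counts are identical, only the labels become readable.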
GroupCount
Count of categories, without PHP standard distribution
The second closure counts elements (default +1)
g.V.has('wrote').in('has').groupCount(m){it.name}{ if (!(it.name in ['mysql', 'timezonedb', 'gd', 'dbase' /* ....*/])) { it.b + 1; } }
Pipes
array notation
closure usage
useful functions for pipes :
groupCount, groupBy{key}{value}{mapreduce}, orderMap, [n..m] operator
More on http://gremlindocs.com/
Graph modifications
Gremlin allows graph modification
Adding a type property to authors
g.V.in('WROTE').each{ it.setProperty('type', 'author'); }
Updating on the way
sideEffect runs a closure, but keeps running the query
Adding type to extensions AND categories in the same query
g.V.as('ext').in('HAS').sideEffect{ it.setProperty('type', 'Category'); }.back('ext').sideEffect{ it.setProperty('type', 'Extension'); }
Backtracking
back( 'name' ) : goes back to the vertex or edge that was named with as('name')
back( n ) : goes back n steps (vertices or edges)
Makes it possible to check a branch, come back and check another branch
Manipulating vertices
g.addVertex(id or null, [property:value,...]);
g.addEdge(origin vertex, destination vertex, label, [property:value...]);
g.removeVertex(vertex);
g.removeEdge(edge);
Application to OOP?
Gremlin goes beyond class specifics
g.v(1).out('WROTE').in('HAS').unique().count()
Gremlin generalizes the navigation
$total = array();
$author = new Author(1);
foreach ($author->getExtensions() as $ext) {
    $total[$ext->getCategory()] = true;
}
return count($total);
Thanks
http://www.slideshare.net/dseguy/
on the http://confoo.ca/
Kevin Bacon
Suggest collaborators to authors?
Authors who worked with the author's collaborators, but not with the author, are recommendations
g.V.has('name', 'contrib').sideEffect{init = it}.out('wrote').in('wrote').except(init).has('name', 'contrib2').path
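The recommendation rule can be spelled out in two hops: collect the author's collaborators, collect their collaborators, then remove everyone the author already knows. A minimal Python sketch of that rule, assuming hypothetical authors and extensions; `collaborators` and `recommendations` are illustrative helpers, not part of Gremlin:

```python
# Hypothetical data: author -> set of extensions they wrote.
WROTE = {
    "alice": {"ext/a"},
    "bob": {"ext/a", "ext/b"},
    "carol": {"ext/b", "ext/c"},
    "dave": {"ext/c"},
}

def collaborators(author):
    """Authors sharing at least one extension with `author`."""
    return {
        other
        for other, exts in WROTE.items()
        if other != author and exts & WROTE[author]
    }

def recommendations(author):
    """Collaborators of collaborators who have not worked with `author`."""
    direct = collaborators(author)
    second_hop = set()
    for collaborator in direct:
        second_hop |= collaborators(collaborator)
    return sorted(second_hop - direct - {author})

print(recommendations("alice"))  # ['carol']
print(recommendations("dave"))   # ['bob']
```

Removing both the direct collaborators and the author is what keeps the list to genuinely new suggestions.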