
Storing and manipulating graphs in HBase

Date post: 12-Nov-2014
Upload: dan-lynn
Storing and Manipulating Graphs in HBase Dan Lynn [email protected] @danklynn
Transcript
Page 1: Storing and manipulating graphs in HBase

Storing and Manipulating Graphs in HBase

Dan Lynn

[email protected]

@danklynn

Keeps Contact Information Current and Complete

Based in Denver, Colorado

CTO & Co-Founder

Turn Partial Contacts Into Full Contacts

Refresher: Graph Theory

- Vertex
- Edge

Social Networks

Tweets

(Diagram: vertices @danklynn, @xorlev and the tweet “#HBase rocks”; edges labeled author, follows and retweeted.)

Web Links

(Diagram: vertices http://fullcontact.com/blog/ and http://techstars.com/, linked by the anchor <a href="...">TechStars</a>.)

Why should you care?

- Vertex influence (PageRank)
  - Social influence
  - Network bottlenecks
- Identifying communities
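PageRank is listed above as a measure of vertex influence, and it can be computed by power iteration over an adjacency list. What follows is a minimal single-machine sketch, not code from the talk. Several details are assumptions:

- the damping factor of 0.85;
- the fixed iteration count;
- spreading rank from vertices with no outbound edges uniformly;
- the class and method names.

```java
import java.util.Arrays;
import java.util.List;

class PageRankSketch {
    // Power-iteration PageRank. adjacency.get(v) lists the vertices v links to;
    // vertices are numbered 0..n-1.
    static double[] pageRank(List<List<Integer>> adjacency, double damping, int iterations) {
        int n = adjacency.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int v = 0; v < n; v++) {
                List<Integer> out = adjacency.get(v);
                if (out.isEmpty()) {
                    // No outbound edges: this rank is later spread evenly over all vertices.
                    danglingMass += rank[v];
                } else {
                    double share = rank[v] / out.size();
                    for (int w : out) {
                        next[w] += share;
                    }
                }
            }
            for (int v = 0; v < n; v++) {
                next[v] = (1.0 - damping) / n + damping * (next[v] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Vertex 0 is linked to by both other vertices, so it ends up ranked highest.
        List<List<Integer>> adjacency = List.of(List.of(1), List.of(0), List.of(0));
        System.out.println(Arrays.toString(pageRank(adjacency, 0.85, 50)));
    }
}
```

On a Hadoop cluster, the same update could run as one MapReduce pass per iteration, or in a Pregel-style system. The loop above only shows the arithmetic.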

Storage Options

neo4j

- Very expressive querying (e.g. Gremlin)
- Transactional
- Data must fit on a single machine :-(

FlockDB

- Scales horizontally
- Very fast
- No multi-hop query support :-(

RDBMS (e.g. MySQL, Postgres, et al.)

- Transactional
- Huge amounts of JOINing :-(

HBase

- Massively scalable
- Data model well-suited
- Multi-hop querying?

Modeling Techniques

Adjacency Matrix

(Diagram: vertices 1, 2 and 3, each connected to the other two.)

   1 2 3
1  0 1 1
2  1 0 1
3  1 1 0

- Can use vectorized libraries
- Requires O(n²) memory, where n = number of vertices
- Hard(er) to distribute
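To make the O(n²) cost concrete, here is a short sketch that builds the matrix above for an undirected graph. The 0-based vertex numbering and the names are illustrative assumptions.

```java
import java.util.Arrays;

class AdjacencyMatrixSketch {
    // Builds an n x n matrix for an undirected graph: cell [i][j] is 1 when
    // vertices i and j share an edge. Storage is n * n cells no matter how
    // many edges exist.
    static int[][] build(int n, int[][] edges) {
        int[][] matrix = new int[n][n];
        for (int[] e : edges) {
            matrix[e[0]][e[1]] = 1;
            matrix[e[1]][e[0]] = 1; // undirected: mirror across the diagonal
        }
        return matrix;
    }

    // Checking for an edge is a single array access.
    static boolean connected(int[][] matrix, int a, int b) {
        return matrix[a][b] == 1;
    }

    public static void main(String[] args) {
        // Vertices 1, 2, 3 from the slide become indices 0, 1, 2.
        int[][] m = build(3, new int[][] {{0, 1}, {0, 2}, {1, 2}});
        for (int[] row : m) {
            System.out.println(Arrays.toString(row));
        }
        // prints:
        // [0, 1, 1]
        // [1, 0, 1]
        // [1, 1, 0]
    }
}
```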

Adjacency List

(Diagram: vertices 1, 2 and 3, each connected to the other two.)

1: 2, 3
2: 1, 3
3: 1, 2
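An adjacency list stores only the edges that exist, so its size grows with the number of edges rather than with n². Below is a minimal in-memory sketch for the same three vertices; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AdjacencyListSketch {
    // Maps each vertex to the list of its neighbors (undirected edges are
    // recorded on both endpoints).
    static Map<Integer, List<Integer>> build(int[][] edges) {
        Map<Integer, List<Integer>> adjacency = new TreeMap<>();
        for (int[] e : edges) {
            adjacency.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adjacency.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        // Same triangle as the slide: 1-2, 1-3, 2-3.
        Map<Integer, List<Integer>> adjacency = build(new int[][] {{1, 2}, {1, 3}, {2, 3}});
        adjacency.forEach((v, neighbors) -> System.out.println(v + " " + neighbors));
        // prints:
        // 1 [2, 3]
        // 2 [1, 3]
        // 3 [1, 2]
    }
}
```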

Adjacency List Design in HBase

(Diagram: vertices e:[email protected], t:danklynn and p:+13039316251, each connected to the other two.)

row key              “edges” column family
e:[email protected]   p:+13039316251= ...   t:danklynn= ...
p:+13039316251       t:danklynn= ...       e:[email protected]= ...
t:danklynn           e:[email protected]= ...   p:+13039316251= ...
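In the table above, each edge is stored twice, once in each endpoint's row, and the neighbor's key is the column qualifier. The sketch below models that layout in memory, with sorted maps standing in for HBase's sorted row keys and qualifiers. Some details are assumptions:

- It does not use the HBase client.
- The example email key is hypothetical.
- The string edge values (`w=1.0`) are placeholders for whatever the real "..." values hold.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

class AdjacencyTableSketch {
    // row key -> (column qualifier in the "edges" family -> edge value).
    // TreeMaps keep keys sorted, as HBase does for row keys and qualifiers.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Stores an undirected edge in both endpoints' rows, so either vertex's
    // edges can be fetched with a single row lookup.
    void addEdge(String a, String b, String value) {
        rows.computeIfAbsent(a, k -> new TreeMap<>()).put(b, value);
        rows.computeIfAbsent(b, k -> new TreeMap<>()).put(a, value);
    }

    // Stand-in for a Get on one row restricted to the "edges" family.
    NavigableMap<String, String> edgesOf(String vertexKey) {
        return rows.getOrDefault(vertexKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        AdjacencyTableSketch table = new AdjacencyTableSketch();
        // Hypothetical keys following the slide's type-prefix convention.
        table.addEdge("e:dan@example.com", "t:danklynn", "w=1.0");
        table.addEdge("e:dan@example.com", "p:+13039316251", "w=1.0");
        table.addEdge("p:+13039316251", "t:danklynn", "w=1.0");
        for (String row : table.rows.keySet()) {
            System.out.println(row + " -> " + table.edgesOf(row));
        }
    }
}
```

The trade-off here is that every edge is written twice. In return, "all edges of vertex X" becomes a single-row read, which is the access pattern the next slides rely on.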

Custom Writables

package org.apache.hadoop.io;

public interface Writable {
    void write(java.io.DataOutput dataOutput) throws java.io.IOException;
    void readFields(java.io.DataInput dataInput) throws java.io.IOException;
}

java

import java.io.DataInput
import java.io.DataOutput
import org.apache.hadoop.io.Writable

class EdgeValueWritable implements Writable {
    EdgeValue edgeValue

    // serialize the edge weight
    void write(DataOutput dataOutput) {
        dataOutput.writeDouble(edgeValue.weight)
    }

    // read fields back in the same order write() emitted them
    void readFields(DataInput dataInput) {
        double weight = dataInput.readDouble()
        edgeValue = new EdgeValue(weight)
    }

    // ...
}

groovy
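`write()` and `readFields()` must handle fields in the same order, so a round trip through a byte buffer is a quick way to check a Writable. The sketch below mirrors the Groovy class in plain Java without a Hadoop dependency. Note the following:

- `EdgeValue` is a hypothetical stand-in that holds only a weight.
- A real class would implement `org.apache.hadoop.io.Writable` instead.
- Hadoop instantiates Writables reflectively, so a real implementation also needs a no-argument constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class EdgeWeightRoundTrip {
    // Hypothetical stand-in for the slide's EdgeValue: just a weight.
    static final class EdgeValue {
        final double weight;
        EdgeValue(double weight) { this.weight = weight; }
    }

    // Same shape as a Writable: write fields out, read them back in the same order.
    static final class EdgeValueWritable {
        EdgeValue edgeValue;

        void write(DataOutput out) throws IOException {
            out.writeDouble(edgeValue.weight);
        }

        void readFields(DataInput in) throws IOException {
            edgeValue = new EdgeValue(in.readDouble());
        }
    }

    // Serializes a weight into a byte buffer, then deserializes it into a fresh object.
    static double roundTrip(double weight) {
        try {
            EdgeValueWritable original = new EdgeValueWritable();
            original.edgeValue = new EdgeValue(weight);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));

            EdgeValueWritable copy = new EdgeValueWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy.edgeValue.weight;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(0.75)); // prints 0.75
    }
}
```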

Don’t get fancy with byte[]

class EdgeValueWritable implements Writable {
    EdgeValue edgeValue

    byte[] toBytes() {
        // use strings if you can help it
    }

    static EdgeValueWritable fromBytes(byte[] bytes) {
        // use strings if you can help it
    }
}

groovy
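The slide advises using strings. One hedged way to fill in `toBytes`/`fromBytes` along those lines is to store the weight as UTF-8 text. The sketch assumes a hypothetical `EdgeValue` that holds only a weight. The plain-text encoding is a choice made here because such values stay human-readable when scanned from the HBase shell.

```java
import java.nio.charset.StandardCharsets;

class EdgeValueStringCodec {
    // Hypothetical edge value carrying only a weight.
    record EdgeValue(double weight) {}

    // Encodes the weight as decimal text, e.g. 0.5 -> "0.5".
    static byte[] toBytes(EdgeValue value) {
        return Double.toString(value.weight()).getBytes(StandardCharsets.UTF_8);
    }

    // Parses the text back; Double.toString output round-trips exactly through parseDouble.
    static EdgeValue fromBytes(byte[] bytes) {
        return new EdgeValue(Double.parseDouble(new String(bytes, StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new EdgeValue(0.5));
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints 0.5
        System.out.println(fromBytes(bytes).weight());                 // prints 0.5
    }
}
```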

Querying by vertex

def get = new Get(vertexKeyBytes)
get.addFamily(edgesFamilyBytes)

Result result = table.get(get)
result.noVersionMap.each { family, data ->
    // construct edge objects as needed
    // data is a Map<byte[], byte[]> of destination vertex -> edge value
}
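Inside the `each` loop above, each column qualifier is a destination vertex and each value is the encoded edge. The sketch below constructs edge objects from such a map. It makes these assumptions:

- Keys and weights are UTF-8 strings.
- A `TreeMap` ordered by unsigned byte comparison (`Arrays.compareUnsigned`, Java 9+) stands in for the map HBase returns.
- The `Edge` type and the example keys are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class EdgeDecoding {
    // Hypothetical edge object built from one column of the "edges" family.
    record Edge(String source, String destination, double weight) {}

    // Turns the qualifier -> value pairs of one row into Edge objects.
    static List<Edge> toEdges(String source, NavigableMap<byte[], byte[]> data) {
        List<Edge> edges = new ArrayList<>();
        for (var entry : data.entrySet()) {
            String destination = new String(entry.getKey(), StandardCharsets.UTF_8);
            double weight = Double.parseDouble(new String(entry.getValue(), StandardCharsets.UTF_8));
            edges.add(new Edge(source, destination, weight));
        }
        return edges;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Qualifiers ordered by unsigned byte comparison, as HBase orders them.
        NavigableMap<byte[], byte[]> data = new TreeMap<byte[], byte[]>(Arrays::compareUnsigned);
        data.put(utf8("t:danklynn"), utf8("1.0"));
        data.put(utf8("p:+13039316251"), utf8("0.5"));
        toEdges("e:dan@example.com", data).forEach(System.out::println);
    }
}
```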

Adding edges to a vertex

def put = new Put(vertexKeyBytes)
put.add(
    edgesFamilyBytes,
    destinationVertexBytes,
    edgeValue.toBytes() // your own implementation here
)

// if writing directly
table.put(put)

// if using TableReducer
context.write(NullWritable.get(), put)

Distributed Traversal / Indexing

(Diagram: vertices e:[email protected], t:danklynn and p:+13039316251.)

- Pivot vertex: p:+13039316251
- MapReduce over outbound edges
- Emit vertices and edge data grouped by the pivot: the pivot is the reduce key, and its neighbors are the “out” vertex and the “in” vertex
- Reducer emits a higher-order edge: e:[email protected] to t:danklynn
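The pivot step can be simulated in memory. The following is a sketch of a single pass, not the author's MapReduce code:

- The "map" phase groups every edge under each of its endpoints, so each pivot sees all of its neighbors.
- The "reduce" phase pairs up neighbors that share a pivot and emits a two-hop edge for each pair.

The `Edge` record, the method names and the example keys are hypothetical. The graph is treated as undirected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PivotJoinSketch {
    record Edge(String from, String to) {}

    // "Map" phase: group each undirected edge under both endpoints, so the
    // reduce key (pivot) sees every neighbor it is connected to.
    static Map<String, List<String>> groupByPivot(List<Edge> edges) {
        Map<String, List<String>> byPivot = new TreeMap<>();
        for (Edge e : edges) {
            byPivot.computeIfAbsent(e.from(), k -> new ArrayList<>()).add(e.to());
            byPivot.computeIfAbsent(e.to(), k -> new ArrayList<>()).add(e.from());
        }
        return byPivot;
    }

    // "Reduce" phase: any two neighbors of the same pivot are two hops apart
    // through it, so emit a higher-order edge between them.
    static List<Edge> reduce(Map<String, List<String>> byPivot) {
        List<Edge> higherOrder = new ArrayList<>();
        for (List<String> neighbors : byPivot.values()) {
            for (int i = 0; i < neighbors.size(); i++) {
                for (int j = i + 1; j < neighbors.size(); j++) {
                    higherOrder.add(new Edge(neighbors.get(i), neighbors.get(j)));
                }
            }
        }
        return higherOrder;
    }

    public static void main(String[] args) {
        // Hypothetical keys: e and t are each connected only to the pivot p.
        List<Edge> edges = List.of(
                new Edge("e:dan@example.com", "p:+13039316251"),
                new Edge("p:+13039316251", "t:danklynn"));
        System.out.println(reduce(groupByPivot(edges)));
        // prints [Edge[from=e:dan@example.com, to=t:danklynn]]
    }
}
```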

Iterations 0 through 3 (diagrams): from iteration 2 onward, reuse edges created during previous iterations.

hops requires only

iterations
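Reusing edges from earlier iterations is what lets the traversal reach far with few passes. The simulation below is a sketch, not the slides' MapReduce job. It works on a path graph on a single machine and assumes that each iteration joins every pair of edges found so far (original plus higher-order) through every pivot, keeping the shortest hop count. Under that assumption, the longest known edge doubles with each iteration.

```java
import java.util.Arrays;

class IterativeTraversalSketch {
    static final int UNKNOWN = Integer.MAX_VALUE;

    // Returns the longest hop count known after each iteration on the path
    // graph 0 - 1 - ... - (n-1). Each iteration joins every pair of edges from
    // the previous iteration that share a pivot vertex.
    static int[] maxHopsPerIteration(int n, int iterations) {
        int[][] dist = new int[n][n];
        for (int[] row : dist) {
            Arrays.fill(row, UNKNOWN);
        }
        for (int v = 0; v + 1 < n; v++) { // iteration 0: the original edges
            dist[v][v + 1] = 1;
            dist[v + 1][v] = 1;
        }
        int[] maxHops = new int[iterations + 1];
        maxHops[0] = longest(dist);
        for (int it = 1; it <= iterations; it++) {
            int[][] previous = new int[n][];
            for (int i = 0; i < n; i++) {
                previous[i] = dist[i].clone(); // join against last iteration's edges only
            }
            for (int pivot = 0; pivot < n; pivot++) {
                for (int u = 0; u < n; u++) {
                    for (int w = 0; w < n; w++) {
                        if (u == w || previous[pivot][u] == UNKNOWN || previous[pivot][w] == UNKNOWN) {
                            continue;
                        }
                        // Higher-order edge u - w through the pivot; keep the shortest.
                        dist[u][w] = Math.min(dist[u][w], previous[pivot][u] + previous[pivot][w]);
                    }
                }
            }
            maxHops[it] = longest(dist);
        }
        return maxHops;
    }

    // Longest hop count among all known edges.
    static int longest(int[][] dist) {
        int best = 0;
        for (int[] row : dist) {
            for (int d : row) {
                if (d != UNKNOWN) {
                    best = Math.max(best, d);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(maxHopsPerIteration(9, 3))); // prints [1, 2, 4, 8]
    }
}
```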

Tips / Gotchas

Do implement your own comparator

public static class Comparator extends WritableComparator {

    public Comparator() {
        // pass the key class to WritableComparator
        super(VertexKeyWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // .....
    }
}

static {
    // register the raw comparator for VertexKeyWritable keys
    WritableComparator.define(VertexKeyWritable.class, new VertexKeyWritable.Comparator());
}

java
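A raw comparator orders serialized keys without deserializing them. The sketch below shows the kind of byte-level comparison a `compare(b1, s1, l1, b2, s2, l2)` body might perform. It assumes keys should sort in unsigned lexicographic byte order, which is the order HBase uses for row keys. Hadoop's static `WritableComparator.compareBytes` performs the same comparison; the hand-written loop only shows what it does. Keys serialized with a length prefix would need their offsets adjusted past that prefix first.

```java
class RawKeyComparison {
    // Compares b1[s1 .. s1+l1) with b2[s2 .. s2+l2) as unsigned bytes, left to
    // right; a key that is a prefix of a longer key sorts first.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff; // treat bytes as 0..255, not -128..127
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] x = {(byte) 0x10, (byte) 0xff};
        byte[] y = {(byte) 0x10, (byte) 0x01};
        // 0xff is 255 unsigned, so x sorts after y even though (byte) 0xff is -1 in Java.
        System.out.println(compare(x, 0, x.length, y, 0, y.length) > 0); // prints true
    }
}
```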

MultiScanTableInputFormat

MultiScanTableInputFormat.setTable(conf, "graph");
MultiScanTableInputFormat.addScan(conf, new Scan());
job.setInputFormatClass(MultiScanTableInputFormat.class);

java

TableMapReduceUtil

TableMapReduceUtil.initTableReducerJob("graph", MyReducer.class, job);

java

Elastic MapReduce

(Diagram: HFiles → SequenceFiles → copy to S3 → SequenceFiles → Elastic MapReduce → HFiles → HBase)

Write HFiles from the job:

HFileOutputFormat.configureIncrementalLoad(job, outputTable)

Load the HFiles into HBase:

$ hadoop jar hbase-VERSION.jar completebulkload

Additional Resources

- Google Pregel: BSP-based graph processing system
- Apache Giraph: Implementation of Pregel for Hadoop
- MultiScanTableInputFormat example
- Apache Mahout: Distributed machine learning on Hadoop

