Post on 02-Dec-2014
AURELIUS THINKAURELIUS.COM
Titan:db Scaling Relationship Data with C*
Matthias Broecheler @mbroecheler September XI, MMXIV
#CassandraSummit
Storing relationship data in Cassandra entails either data denormalization or pointer chasing inside the application, which reduces developer productivity, is error-prone, and is slow due to lack of optimization. Titan:db exposes a property graph data model directly atop Cassandra, which makes storing and querying relationship data fast, easy, and scalable to huge graphs. This talk demonstrates how Titan's features enable complex, multi-relational databases in Cassandra and discusses customer use cases for recommendation and personalization engines.
Multi-Relational Data Structure
Graph
Titan = Cassandra + Graph
Titan 0.5
Cassandra
Linear scalability
fault tolerance
open source
multi datacenter
high performance
Key ColumnA ColumnB ColumnC ColumnD ColumnE ColumnF
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
User Product
CREATE INDEX ON User.username, User.email, Product.productid
CREATE INDEX ON username(User), email(User), productid(Product)
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
Buy
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
What did ‘matt’ buy?
Application level join
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
Buy
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
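The application level join on this slide can be sketched in plain Python. The dictionaries and the list below stand in for the User, Product, and Buy tables; the names `users`, `products`, `buys`, and `products_bought_by` are illustrative assumptions, not Cassandra or Titan APIs.

```python
# Plain-Python stand-ins for the three tables on the slide (not a Cassandra client).
users = {
    "matt": {"email": "matt@", "password": "12345"},
    "john": {"email": "john@", "password": "qwerty"},
    "billy": {"email": "billy@", "password": "abcde"},
}
products = {
    52235: {"name": "cup", "price": 12.55},
    42215: {"name": "spoon", "price": 7.22},
    24529: {"name": "knife", "price": 5.32},
}
# Buy join table: (username, productid, time) rows.
buys = [
    ("matt", 52235, "9/5/14"),
    ("billy", 42215, "8/7/14"),
]


def products_bought_by(username):
    """Join Buy against Product by hand, one lookup per matching row."""
    if username not in users:
        return []
    product_ids = [pid for (user, pid, _time) in buys if user == username]
    return [products[pid]["name"] for pid in product_ids]


print(products_bought_by("matt"))
```

Every new question of this kind needs another hand-written function like `products_bought_by`, which is the productivity cost the talk attributes to application level joins.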
What did ‘matt’ buy?
g.V.has('username','matt').out('buy')
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
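To make the property graph model concrete, here is a minimal in-memory sketch of vertices with properties and labeled edges, with helper functions that mirror the `has` and `out` steps of the query. The `Vertex` class and the `add_edge`, `has`, and `out` helpers are hypothetical and only imitate the shape of the traversal; they are not Titan's implementation.

```python
from collections import defaultdict


class Vertex:
    def __init__(self, **props):
        self.props = props
        self.out_edges = defaultdict(list)  # label -> [(edge_props, target)]
        self.in_edges = defaultdict(list)   # label -> [(edge_props, source)]


def add_edge(source, label, target, **props):
    # Record the edge on both endpoints so it can be walked in either direction.
    source.out_edges[label].append((props, target))
    target.in_edges[label].append((props, source))


def has(vertices, key, value):
    # A graph database would answer this from an index; the sketch scans.
    return [v for v in vertices if v.props.get(key) == value]


def out(vertices, label):
    return [target for v in vertices for (_props, target) in v.out_edges[label]]


matt = Vertex(username="matt", email="matt@", password="12345")
cup = Vertex(productid=52235, name="cup", price=12.55)
add_edge(matt, "buy", cup, time="9/5/14")

graph = [matt, cup]
bought = out(has(graph, "username", "matt"), "buy")
print([v.props["name"] for v in bought])
```

The relationship lives on the vertex itself, so following it is a lookup in `out_edges` rather than a scan of a separate join table.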
What did ‘matt’ recently buy?
Application level join
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
Buy
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV
What did ‘matt’ recently buy?
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
What did ‘matt’ recently buy?
slow
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
What did ‘matt’ recently buy?
Rewrite join logic
username time productid
matt 9/5/14 52235
billy 8/7/14 42215
billy 8/7/14 42215
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
What did ‘matt’ recently buy?
CREATE INDEX ON buy edges by time OUT direction
g.V.has('username','matt').outE('buy').orderBy('time',DESC)[0..9].inV
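The idea behind indexing buy edges by time can be sketched as an edge list that stays sorted per vertex, so the most recent purchases are a slice rather than a full sort. `SortedEdgeList` is a hypothetical name, and the two extra purchases are invented for illustration; the real on-disk layout in Titan and Cassandra is not shown in the slides.

```python
import bisect
from datetime import date


class SortedEdgeList:
    """Out-edges of one vertex for one label, kept ordered by time."""

    def __init__(self):
        self._edges = []  # sorted ascending by (time, productid)

    def add(self, time, productid):
        bisect.insort(self._edges, (time, productid))

    def most_recent(self, limit=10):
        # Newest first; reads only the tail of the sorted list.
        if limit <= 0:
            return []
        return [pid for (_time, pid) in reversed(self._edges[-limit:])]


matt_buys = SortedEdgeList()
matt_buys.add(date(2014, 9, 5), 52235)
matt_buys.add(date(2014, 8, 7), 42215)   # invented purchase
matt_buys.add(date(2014, 9, 1), 24529)   # invented purchase

print(matt_buys.most_recent(2))
```

The default `limit=10` corresponds to the `[0..9]` range in the query.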
productid username time
52235 matt 9/5/14
42215 billy 8/7/14
42215 billy 8/7/14
Who bought ‘52235’?
More application joins
productid time username
52235 9/5/14 matt
42215 8/7/14 billy
42215 8/7/14 billy
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
g.V.has('productid',52235).in('buy')
CREATE INDEX ON buy edges by time IN direction
Who bought ‘52235’?
Product join tables won’t scale
username productid time
matt 52235 9/5/14
billy 42215 8/7/14
billy 42215 8/7/14
username time productid
matt 9/5/14 52235
billy 8/7/14 42215
billy 8/7/14 42215
username email password
matt matt@ 12345
john john@ qwerty
billy billy@ abcde
productid name price
52235 cup 12.55
42215 spoon 7.22
24529 knife 5.32
productid username time
52235 matt 9/5/14
42215 billy 8/7/14
42215 billy 8/7/14
productid time username
52235 9/5/14 matt
42215 8/7/14 billy
42215 8/7/14 billy
User Product
productid: 52235 name: cup price: 12.55
username: matt email: matt@ password: 12345
buy time: 9/5/14
PARTITION Product Vertices
Token Ring (BOP)
Edge Cut
- Assigns ids to map vertices into “optimal” token range
- Maintains virtual partitions
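One way to picture id assignment for an edge cut: if the partition number sits in the high bits of a vertex id, ids from one partition sort next to each other under an order-preserving partitioner, and the number of edges crossing partitions can be counted directly. The bit widths and the helpers `make_vertex_id`, `partition_of`, and `edge_cut` are assumptions made for this sketch and do not describe Titan's actual id format.

```python
PARTITION_BITS = 4   # assumed width, for illustration only
COUNTER_BITS = 20    # assumed width, for illustration only


def make_vertex_id(partition, counter):
    """Put the partition in the high bits so ids of one partition sort together."""
    assert 0 <= partition < (1 << PARTITION_BITS)
    assert 0 <= counter < (1 << COUNTER_BITS)
    return (partition << COUNTER_BITS) | counter


def partition_of(vertex_id):
    return vertex_id >> COUNTER_BITS


def edge_cut(edges):
    """Count edges whose endpoints live in different partitions."""
    return sum(1 for (a, b) in edges if partition_of(a) != partition_of(b))


matt = make_vertex_id(partition=1, counter=1)
cup = make_vertex_id(partition=1, counter=2)
billy = make_vertex_id(partition=2, counter=1)
spoon = make_vertex_id(partition=2, counter=2)

co_located = [(matt, cup), (billy, spoon)]   # buyers placed with their products
scattered = [(matt, spoon), (billy, cup)]    # same vertices, edges cross partitions
print(edge_cut(co_located), edge_cut(scattered))
```

A partitioner aims for placements like `co_located`, where traversals stay inside one token range.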
Vertical Partitioning = divide communities
Vertex Cut
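A vertex cut addresses the opposite case: a single vertex, such as a popular product, has so many edges that its adjacency list is spread over several partitions. The sketch below splits the buyers of one product by hashing the buyer name. `NUM_PARTITIONS`, `partition_for`, `cut_vertex`, and `all_buyers` are hypothetical, and the additional buyers are invented for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed cluster size for the sketch


def partition_for(username):
    # Stable hash so the example behaves the same on every run.
    return zlib.crc32(username.encode("utf-8")) % NUM_PARTITIONS


def cut_vertex(in_edges):
    """Split one vertex's incoming edges into per-partition slices."""
    slices = defaultdict(list)
    for username in in_edges:
        slices[partition_for(username)].append(username)
    return dict(slices)


def all_buyers(slices):
    """Reading the whole vertex means gathering every slice."""
    return sorted(u for part in slices.values() for u in part)


buyers_of_cup = ["matt", "john", "billy"]  # john and billy are invented buyers
slices = cut_vertex(buyers_of_cup)
print(all_buyers(slices))
```

Writes for a hot vertex are spread over the cluster, at the price of a gather step when the full adjacency list is needed.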
Combined Graph Partitioning
Database
Datastore
Transactions
v = g.V.has('username','matt').has('password','12345')
p = g.V.has('productid',52235)
e = v.addEdge('buy',p)
e.setProperty('time','9/11/2014')
o = g.addVertex([orderid:242343])
o.addEdge('buyer',v)
o.addEdge('product',p)
g.commit()
unit of work
Atomicity Consistency
Isolation Durability
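The unit-of-work idea can be sketched as a transaction object that buffers changes and applies them all at once on commit, or not at all. The `Transaction` class below is a toy model built on a Python list; it illustrates the all-or-nothing behaviour only and says nothing about how Titan implements it on Cassandra.

```python
class Transaction:
    """Buffers writes and applies them to the store only on commit."""

    def __init__(self, store):
        self._store = store
        self._pending = []

    def add_edge(self, source, label, target, **props):
        self._pending.append((source, label, target, props))

    def commit(self):
        # Stage on a copy first so a failure leaves the store untouched.
        staged = list(self._store)
        for edge in self._pending:
            if edge[0] is None or edge[2] is None:
                raise ValueError("edge endpoint missing")
            staged.append(edge)
        self._store[:] = staged
        self._pending = []

    def rollback(self):
        self._pending = []


edges = []
tx = Transaction(edges)
tx.add_edge("matt", "buy", 52235, time="9/11/2014")
tx.add_edge("order-242343", "buyer", "matt")
tx.add_edge("order-242343", "product", 52235)
print(len(edges))  # nothing is visible before commit
tx.commit()
print(len(edges))
```

The three edges mirror the buy, buyer, and product edges created in the slide's transaction.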
Transaction Consistency
u = g.addVertex([username:'matt'])
p = g.V.has('username','senior')
u.addEdge('father',p)
p.setProperty('surname','Jones')
g.commit()
Locks acquired to ensure consistency constraints are enforced
• Index Uniqueness
• Multiplicity Constraints
• Cardinality Constraints
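As an illustration of why a lock is needed for index uniqueness, the sketch below guards the check-then-write step so two writers cannot both claim the same username. `UniqueIndex` and `claim` are hypothetical, and an in-process `threading.Lock` stands in for whatever distributed locking the database performs.

```python
import threading


class UniqueIndex:
    """Enforces one vertex per key value, using a lock around check-then-write."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def claim(self, value, vertex_id):
        with self._lock:
            owner = self._entries.get(value)
            if owner is not None and owner != vertex_id:
                raise ValueError(f"{value!r} is already taken")
            self._entries[value] = vertex_id


usernames = UniqueIndex()
usernames.claim("matt", vertex_id=1)
try:
    usernames.claim("matt", vertex_id=2)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Without the lock, two concurrent writers could both see the value as free and both write it.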
Polyglot Data Architecture
© Jay Kreps @ LinkedIn
Transaction modifications
logged
Consumers
Titan Event Framework
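The pattern on these slides, where transaction modifications are logged and consumers read them, can be sketched as an append-only log with one read offset per consumer. `TransactionLog`, `poll`, and the consumer names are assumptions for illustration, not the Titan Event Framework API.

```python
class TransactionLog:
    """Append-only log; each consumer tracks its own read position."""

    def __init__(self):
        self._entries = []
        self._offsets = {}

    def append(self, changes):
        self._entries.append(tuple(changes))

    def poll(self, consumer):
        # Return everything this consumer has not seen yet, then advance it.
        start = self._offsets.get(consumer, 0)
        batch = self._entries[start:]
        self._offsets[consumer] = len(self._entries)
        return batch


log = TransactionLog()
log.append([("addEdge", "matt", "buy", 52235)])
log.append([("setProperty", "senior", "surname", "Jones")])

search_batch = log.poll("search-indexer")   # hypothetical consumer
log.append([("addVertex", "order-242343")])
later_batch = log.poll("search-indexer")
analytics_batch = log.poll("analytics")     # hypothetical consumer
print(len(search_batch), len(later_batch), len(analytics_batch))
```

Each downstream system in a polyglot architecture reads the same log at its own pace, so a slow consumer does not hold back the others.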
Use Cases
http://arli.us/magazinaluiza
Security
Fraud
http://arli.us/cisco-sec1
© Sean York @ Pearson Education
http://bit.ly/WPTitanSEAGraph
http://arli.us/musicgraphintro
Music Graph
Knowledge Graph
TitanDB.io
Relationships + Cassandra