ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Date post: 14-Jan-2017
Upload: 8kdata-technology
ToroDB: Next Generation Java MongoDB compatible NoSQL & SQL Database Álvaro Hernández <[email protected]>
Transcript
Page 1: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB:

Next Generation Java, MongoDB-compatible

NoSQL & SQL Database

Álvaro Hernández <[email protected]>

Page 2: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB @NoSQLonSQL

About *8Kdata*

● Research & Development in databases

● Consulting, Training and Support in PostgreSQL

● Java Developers, JavaSpecialists.eu, JCrete.org

● About myself: CTO at 8Kdata: @ahachete, http://linkd.in/1jhvzQ3

www.8kdata.com

Page 3: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Say you want…

A database with:

● Great functionality

● Consistency (ACID)

● Reliability

● SQL

Page 4: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

… and then also want

A database with:

● NoSQL (MongoDB)

● “Schema-less”

● Scalability

Page 5: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Fear no more! You can now have both SQL and NoSQL!

Page 6: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

DEMO!

Page 7: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Page 8: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB in one slide

● Document-oriented, JSON, NoSQL db

● Open source (AGPL). Written in Java

● MongoDB compatibility (wire protocol level)

● Uses PostgreSQL as a storage backend

Page 9: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Mapping unstructured data to relational

Page 10: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB storage internals

{
  "name": "ToroDB",
  "data": {
    "a": 42,
    "b": "hello world!"
  },
  "nested": {
    "j": 42,
    "deeper": {
      "a": 21,
      "b": "hello"
    }
  }
}

Page 11: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB storage internals

The document is split into the following subdocuments:

{ "name": "ToroDB", "data": {}, "nested": {} }

{ "a": 42, "b": "hello world!"}

{ "j": 42, "deeper": {}}

{ "a": 21, "b": "hello"}
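
The split shown above can be sketched in Java. This is only an illustration, assuming a document is modeled as nested `Map<String, Object>` values; the names `SubdocumentSplitter` and `split` are hypothetical and not taken from ToroDB's source, and the bookkeeping that maps each subdocument to a table and a structure is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the splitting step; not ToroDB's actual code. */
public final class SubdocumentSplitter {

    private SubdocumentSplitter() {}

    /**
     * Splits a nested document into flat subdocuments, parent before children.
     * Each nested document is replaced by an empty map in its parent.
     */
    public static List<Map<String, Object>> split(Map<String, Object> document) {
        List<Map<String, Object>> out = new ArrayList<>();
        collect(document, out);
        return out;
    }

    private static void collect(Map<String, Object> document,
                                List<Map<String, Object>> out) {
        Map<String, Object> flat = new LinkedHashMap<>();
        out.add(flat); // register the parent before descending into children
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Leave an empty placeholder where the nested document was.
                flat.put(entry.getKey(), new LinkedHashMap<String, Object>());
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                collect(child, out);
            } else {
                flat.put(entry.getKey(), value);
            }
        }
    }
}
```

Applied to the example document, this yields four subdocuments in the same order as listed above: the root, `data`, `nested`, and `deeper`.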

Page 12: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB storage internals

select * from demo.t_3
┌─────┬───────┬────────────────────────────┬────────┐
│ did │ index │ _id                        │ name   │
├─────┼───────┼────────────────────────────┼────────┤
│ 0   │ ¤     │ \x5451a07de7032d23a908576d │ ToroDB │
└─────┴───────┴────────────────────────────┴────────┘

select * from demo.t_1
┌─────┬───────┬────┬──────────────┐
│ did │ index │ a  │ b            │
├─────┼───────┼────┼──────────────┤
│ 0   │ ¤     │ 42 │ hello world! │
│ 0   │ 1     │ 21 │ hello        │
└─────┴───────┴────┴──────────────┘

select * from demo.t_2
┌─────┬───────┬────┐
│ did │ index │ j  │
├─────┼───────┼────┤
│ 0   │ ¤     │ 42 │
└─────┴───────┴────┘

Page 13: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB storage internals

select * from demo.structures
┌─────┬────────────────────────────────────────────────────────────────────────────┐
│ sid │ _structure                                                                 │
├─────┼────────────────────────────────────────────────────────────────────────────┤
│ 0   │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │
└─────┴────────────────────────────────────────────────────────────────────────────┘

select * from demo.root;
┌─────┬─────┐
│ did │ sid │
├─────┼─────┤
│ 0   │ 0   │
└─────┴─────┘

Page 14: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

How data is stored in schema-less

Data normalization

Page 15: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

This is how we store in ToroDB

Page 16: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Advantages over MongoDB

Page 17: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB: native SQL

Page 18: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Mix-and-match relational & NoSQL

● Use the same database for both your relational data and ToroDB

● Just use separate schemas (if you will)

● Don't write to ToroDB data or metadata tables

● Query with SQL, do joins, whatever!

Page 19: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Atomic operations

● MongoDB has no support for atomic bulk insert/update/delete operations

● Not even with $isolated:

“Prevents a write operation that affects multiple documents from yielding to other reads or writes […] You can ensure that no client sees the changes until the operation completes or errors out. The $isolated isolation operator does not provide “all-or-nothing” atomicity for write operations.”

http://docs.mongodb.org/manual/reference/operator/update/isolated/

Page 20: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

“Clean” reads

Oh really?

Page 21: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

“Clean” reads

http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior

“MongoDB will allow clients to read the results of a write operation before the write operation returns.”

“If the mongod terminates before the journal commits, even if a write returns successfully, queries may have read data that will not exist after the mongod restarts.”

Thus, MongoDB suffers from dirty reads. But let's call them just “tainted reads”.

Page 22: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

“Clean” reads

What about $snapshot? Nope:

“The snapshot() does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.”

http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors

Cursors in ToroDB run in repeatable read, read-only mode:

globalCursorDataSource.setTransactionIsolation("TRANSACTION_REPEATABLE_READ");
globalCursorDataSource.setReadOnly(true);
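
For readers less familiar with these settings, the same two properties can be applied to a single connection with the standard `java.sql.Connection` API. This is a minimal sketch, not ToroDB's code: the class `CursorConnections` and the method `configureForCursor` are hypothetical names, and turning off auto-commit is an added assumption so that several statements share one transaction.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper; shows the plain JDBC equivalent of the settings above. */
public final class CursorConnections {

    private CursorConnections() {}

    /**
     * Prepares a fresh connection for cursor-style reads: every statement in
     * the transaction sees the same snapshot, and writes are rejected.
     */
    public static Connection configureForCursor(Connection connection)
            throws SQLException {
        // Assumption: keep one transaction open across the cursor's statements.
        connection.setAutoCommit(false);
        connection.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
        connection.setReadOnly(true);
        return connection;
    }
}
```

These calls should be made before the first statement runs, since changing the isolation level in the middle of a transaction is implementation-defined in JDBC.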

Page 23: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Replication & horizontal scalability (aka sharding)

Page 24: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB v0.4

● ToroDB works as a secondary (slave) of a MongoDB master, or of another slave (chained replication)

● Implements the full replication protocol (not just a tailable query on the oplog)

● Replicates from MongoDB to PostgreSQL

Page 25: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Write scalability (sharding)

● MongoDB's sharding API not implemented yet (roadmap: ToroDB 0.8)

● Will use MongoDB's mongos without modification, as well as config servers

● Currently we implement sharding at the db level, using backends such as Greenplum

Page 26: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

ToroDB: The software

Page 27: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

The software

Written in Java. v0.40 requires Java 7; v1.0 will require Java 8.

Tested with Oracle and IBM JVMs. Anyone from Azul here today? ;)

Distributed as a JAR file (actually, wrapped with shell executables). Future: also EAR to deploy.

Page 28: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Standing on the shoulders of giants

And also PostgreSQL, Greenplum, JDBC

Page 29: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Very modular source code

● The app: 20+ modules (Maven)

● Some of them are individually reusable

● Several abstraction layers:
➔ D2R (Document 2 Relational)
➔ KVDocument (KV docs abstraction)
➔ Database/backend (relational)

Page 30: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

MongoWP

● MongoWP is our implementation of MongoDB's wire protocol

● Based on Netty, an excellent, async and high performance NIO framework

● Callback interface for any MongoDB-based “middleware” implementation (ToroDB, proxy...)
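
To give an idea of what a wire protocol implementation has to decode first, here is a hedged sketch of parsing the standard MongoDB message header: four little-endian 32-bit integers (message length, request ID, response-to, opcode). It uses `java.nio.ByteBuffer` to stay self-contained, whereas MongoWP is built on Netty; the class name `MsgHeader` and its fields are hypothetical and do not reflect MongoWP's actual API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical holder for the 16-byte MongoDB wire protocol message header. */
public final class MsgHeader {

    public static final int HEADER_LENGTH = 16;

    public final int messageLength; // total message size in bytes, header included
    public final int requestId;     // identifier chosen by the sender
    public final int responseTo;    // requestId of the message being answered
    public final int opCode;        // request type, e.g. 2004 for OP_QUERY

    private MsgHeader(int messageLength, int requestId, int responseTo, int opCode) {
        this.messageLength = messageLength;
        this.requestId = requestId;
        this.responseTo = responseTo;
        this.opCode = opCode;
    }

    /** Reads the header from the buffer's current position without moving it. */
    public static MsgHeader parse(ByteBuffer buffer) {
        if (buffer.remaining() < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least 16 bytes for a header");
        }
        // The wire protocol is little-endian; work on a duplicate so the
        // caller's position and byte order are left untouched.
        ByteBuffer le = buffer.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int messageLength = le.getInt();
        int requestId = le.getInt();
        int responseTo = le.getInt();
        int opCode = le.getInt();
        return new MsgHeader(messageLength, requestId, responseTo, opCode);
    }
}
```

A middleware built on such a parser would then dispatch on `opCode` to the matching callback.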

Page 31: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Architecture

Page 32: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Executor engines

Page 33: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Performance

Page 34: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

OLTP Benchmark: YCSB

● Amazon c3.8xlarge
➔ 32 virtual CPUs
➔ 60 GB RAM
➔ 2 x 320 GB SSD

● YCSB 0.5.0, only inserts, 10 minutes

● WriteConcern {w:1, fsync: true}

● Batch size 1000, 1 and 4 threads

● MongoDB 3.2 WiredTiger

● ToroDB 0.40, Oracle Java 8, PostgreSQL 9.5 (shared_buffers: 15GB, effective_cache: 45GB)

Page 35: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

OLTP Benchmark: YCSB

Page 36: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

● Amazon reviews dataset: “Image-based recommendations on styles and substitutes”, J. McAuley, C. Targett, J. Shi, A. van den Hengel, SIGIR, 2015

● AWS c4.xlarge (4 vCPU, 8 GB RAM), 4K IOPS SSD EBS

● 4x shards, 3x config; 4x segments GP

● 83M records, 65GB plain json

Data Analytics Benchmark

Page 37: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Disk usage

Storage requirements: MongoDB vs ToroDB on Greenplum

● Mongo 3.0, WT, Snappy

● GP columnar, zlib level 9

[Bar chart: table size, index size and total size in bytes (axis from 0 to 80,000,000,000) for Mongo and ToroDB on GP]

Page 38: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Queries: which one is easier?

SELECT count( distinct( "reviewerID" )) FROM reviews;

db.reviews.aggregate([
  { $group: { _id: "$reviewerID" } },
  { $group: { _id: 1, count: { $sum: 1 } } }
])

Page 39: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Queries: which one is easier?

SELECT "reviewerName", count(*) AS reviews
FROM reviews
GROUP BY "reviewerName"
ORDER BY reviews DESC
LIMIT 10;

db.reviews.aggregate([
  { $group: { _id: '$reviewerName', r: { $sum: 1 } } },
  { $sort: { r: -1 } },
  { $limit: 10 }
], { allowDiskUse: true })

Page 40: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Query times

Query duration (s): MongoDB vs ToroDB on Greenplum

3 different queries. Q3 on MongoDB: aggregate fails.

Query   MongoDB (s)   ToroDB on GP (s)   Speedup
Q1      969           35                 27.95
Q2      1007          13                 74.87
Q3      fails         31                 -

Page 41: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Tips & Tricks

Lessons Learned

Page 42: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Visitor pattern for document manipulation

● Lots of types, and values of each of those types, to manage: strings, integers, arrays

● Lots of cases we have to handle in the code:
➔ transformation from document to table data structure
➔ transformation to internal query language
➔ ...

Page 43: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Visitor pattern for document manipulation

● Smaller methods

● Somewhere in the deepest class you can see a huge if {} else if {} ... else {}

● Safely add new types

Compiler will tell us if we forget to implement some visitor
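
A minimal sketch of the idea, assuming a tiny value model with only strings, integers and arrays. All type names here (`Value`, `ValueVisitor`, `JsonRenderer`, and so on) are hypothetical and do not come from ToroDB's source. Adding a new value type means adding a `visit` method to the interface, and every existing visitor then fails to compile until it handles the new type.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical value model used only to illustrate the visitor pattern. */
public final class VisitorSketch {

    private VisitorSketch() {}

    /** One visit method per value type. */
    public interface ValueVisitor<R> {
        R visit(StringValue value);
        R visit(IntValue value);
        R visit(ArrayValue value);
    }

    public interface Value {
        <R> R accept(ValueVisitor<R> visitor);
    }

    public static final class StringValue implements Value {
        public final String text;
        public StringValue(String text) { this.text = text; }
        @Override public <R> R accept(ValueVisitor<R> visitor) { return visitor.visit(this); }
    }

    public static final class IntValue implements Value {
        public final int number;
        public IntValue(int number) { this.number = number; }
        @Override public <R> R accept(ValueVisitor<R> visitor) { return visitor.visit(this); }
    }

    public static final class ArrayValue implements Value {
        public final List<Value> elements;
        public ArrayValue(Value... elements) { this.elements = Arrays.asList(elements); }
        @Override public <R> R accept(ValueVisitor<R> visitor) { return visitor.visit(this); }
    }

    /** Example visitor: renders a value as JSON text without any type checks. */
    public static final class JsonRenderer implements ValueVisitor<String> {
        @Override public String visit(StringValue value) {
            // Escapes only backslashes and double quotes, to keep the sketch short.
            return "\"" + value.text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        @Override public String visit(IntValue value) {
            return Integer.toString(value.number);
        }
        @Override public String visit(ArrayValue value) {
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < value.elements.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(value.elements.get(i).accept(this));
            }
            return sb.append(']').toString();
        }
    }
}
```

Each transformation (to table rows, to an internal query language, and so on) would be one more visitor class, each made of small methods like the ones in `JsonRenderer`.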

Page 44: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Tools used to monitor performance

Oracle Java Mission Control

● Great tool in general, low impact on performance. Gives A LOT of information on memory allocation, exceptions thrown, etc.

● But quite bad at measuring the time spent in methods, as it ignores time spent in native code and I/O

● Very coarse-grained

Page 45: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Tools used to monitor performance

VisualVM

● Very fine-grained

● By default, measures time spent in native code and I/O

● Impact on performance:
➔ Configurable, but high in general
➔ The performance impact seems to be heterogeneous (some methods are penalized more than others)

Page 46: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Document keys & maps

● ToroDB uses HashMaps whose keys are the JSON keys

● Every lookup on a HashMap has to execute equals on the key

● Each key is a String, and String#equals is O(1) when both Strings are the same object, but O(n) when they are equal but not the same object

● As a result, we were spending much more time than expected on HashMap lookups

● We use a pool of keys that guarantees that if two keys are equal, they are the same object

● Cons: some time is spent on the pool of keys, as it is basically a map
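
A key pool of this kind can be sketched in a few lines. This is an illustration under assumptions, not ToroDB's implementation: the class name `KeyPool` and the method `canonical` are hypothetical, and a `ConcurrentHashMap` is assumed as the backing map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical pool that returns one shared instance per distinct key. */
public final class KeyPool {

    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /**
     * Returns the canonical instance for the given key. Equal keys always map
     * to the same object, so later String#equals calls between pooled keys
     * succeed on the identity check.
     */
    public String canonical(String key) {
        String existing = pool.putIfAbsent(key, key);
        return existing != null ? existing : key;
    }
}
```

The lookup in the pool is itself a map operation, which is the cost mentioned in the last bullet; it pays off when each key is pooled once and then used in many document lookups.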

Page 47: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Dealing with memory pressure

● ToroDB has to deal with memory pressure when the MongoDB clients produce requests faster than the SQL backend can handle them

● This is especially important when the client is using the async drivers

● Ideal solution: make the backend faster
➔ Especially by adding async behaviour
➔ But it requires a new non-JDBC driver => Phoebe

● Practical solution: use a back pressure mechanism so that the client is only as fast as the backend can be
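
One common way to get back pressure is to cap the number of requests that may be pending at once, so the thread that accepts client requests blocks when the backend falls behind. The sketch below shows that approach with a `Semaphore`; it is an assumption about how such a mechanism could look, not a description of ToroDB's own mechanism, and the name `BackPressureGate` is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

/** Hypothetical gate that limits how many requests are pending on the backend. */
public final class BackPressureGate {

    private final Semaphore permits;
    private final ExecutorService backend;

    public BackPressureGate(int maxInFlight, ExecutorService backend) {
        this.permits = new Semaphore(maxInFlight);
        this.backend = backend;
    }

    /**
     * Hands a request to the backend. Blocks the caller while maxInFlight
     * requests are still pending, which slows the producer down to the
     * backend's pace instead of queueing requests in memory.
     */
    public void submit(Runnable request) throws InterruptedException {
        permits.acquire();
        try {
            backend.execute(() -> {
                try {
                    request.run();
                } finally {
                    permits.release(); // free the slot once the request is done
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // the task was rejected and will never run
            throw e;
        }
    }

    /** Number of additional requests that could be submitted without blocking. */
    public int availablePermits() {
        return permits.availablePermits();
    }
}
```

With this design the amount of memory held by pending requests is bounded by `maxInFlight`, whatever the speed of the clients.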

Page 48: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Chasing performance problems

● It is important to monitor the hotspots

● We found some parts of our code that were correct, but very inefficient:
➔ Some of them were errors (some analyses were executed twice in different parts of the code)
➔ Some operations that we considered fast enough were executed so many times that it was critical to reimplement them in a more performant way

Page 49: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Download, clone, PR, star it!

https://github.com/torodb/torodb

Check our FAQ:

https://github.com/torodb/torodb/wiki/FAQ

Page 50: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database

Page 51: ToroDB: Next Generation Java, MongoDB-compatible, NoSQL & SQL Database
