+ All Categories
Home > Documents > ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation...

ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation...

Date post: 21-May-2020
Category:
Upload: others
View: 0 times
Download: 0 times
Share this document with a friend
51
ToroDB internals How to create a NoSQL db on top of SQL @NoSQLonSQL Álvaro Hernández <[email protected]>
Transcript
Page 1: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB internalsHow to create a NoSQL db

on top of SQL

@NoSQLonSQL

Álvaro Hernández <[email protected]>

Page 2: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

About *8Kdata*

● Research & Development in databases

● Consulting, Training and Support in PostgreSQL

● Founders of PostgreSQL España, 5th largest PUG in the world (>500 members as of today)

● About myself: CTO at 8Kdata:@ahachetehttp://linkd.in/1jhvzQ3

www.8kdata.com

Page 3: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB in one slide

● Document-oriented, JSON, NoSQL db

● Open source (AGPL)

● MongoDB compatibility (wire protocol level)

● Uses PostgreSQL as a storage backend

Page 4: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

How to make a NoSQL database

● Varying schema (aka “schema-less”)

● Document-oriented API

● Replication

● Horizontal scalability

Page 5: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Varying schema(aka “schema-less”)

Page 6: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB's Document 2 Relational transformation

● Data is stored in tables. No blobs

● JSON documents are split by hierarchy levels into “subdocuments”, which contain no nested structures. Each subdocument level is stored separately

● Subdocuments are classified by “type”. Each “type” maps to a different table

Page 7: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

● A “structure” table keeps the subdocument “schema”

● Keys in JSON are mapped to attributes, which retain the original name

● Tables are created dinamically and transparently to match the exact types of the documents

ToroDB's Document 2 Relational transformation

Page 8: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB storage internals

{ "name": "ToroDB", "data": { "a": 42, "b": "hello world!" }, "nested": { "j": 42, "deeper": { "a": 21, "b": "hello" } }}

Page 9: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB storage internals

The document is split into the following subdocuments:

{ "name": "ToroDB", "data": {}, "nested": {} }

{ "a": 42, "b": "hello world!"}

{ "j": 42, "deeper": {}}

{ "a": 21, "b": "hello"}

Page 10: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB storage internals

select * from demo.t_3┌─────┬───────┬────────────────────────────┬────────┐│ did │ index │ _id │ name │├─────┼───────┼────────────────────────────┼────────┤│ 0 │ ¤ │ \x5451a07de7032d23a908576d │ ToroDB │└─────┴───────┴────────────────────────────┴────────┘select * from demo.t_1┌─────┬───────┬────┬──────────────┐│ did │ index │ a │ b │├─────┼───────┼────┼──────────────┤│ 0 │ ¤ │ 42 │ hello world! ││ 0 │ 1 │ 21 │ hello │└─────┴───────┴────┴──────────────┘select * from demo.t_2┌─────┬───────┬────┐│ did │ index │ j │├─────┼───────┼────┤│ 0 │ ¤ │ 42 │└─────┴───────┴────┘

Page 11: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB storage internals

select * from demo.structures┌─────┬────────────────────────────────────────────────────────────────────────────┐│ sid │ _structure │├─────┼────────────────────────────────────────────────────────────────────────────┤│ 0 │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │└─────┴────────────────────────────────────────────────────────────────────────────┘

select * from demo.root;┌─────┬─────┐│ did │ sid │├─────┼─────┤│ 0 │ 0 │└─────┴─────┘

Page 12: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Why not jsonb?

● jsonb is really cool

● But keys (metadata) are repeated most of the time (storage, I/O savings)

● Docuements are not classified (normalized)

● ToroDB does all that

Page 13: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Metadata repetition

{ “a”: 1, “b”: 2 }{ “a”: 3 }{ “a”: 4, “c”: 5 }{ “a”: 6, “b”: 7 }{ “b”: 8 }{ “a”: 9, “b”: 10 }{ “a”: 11, “b”: 12, “j”: 13 }{ “a”: 14, “c”: 15 }

Counting “document types” in collections of millions: at most, 1000s of different types

Page 14: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB storage and I/O savings

29% - 68% storage required,compared to Mongo 2.6

Page 15: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB: query “by structure”

● ToroDB is effectively partitioning by type

● Structures (schemas, partitioning types) are cached in ToroDB memory

● Queries only scan a subset of the data

● Negative queries are served directly from memory

Page 16: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

How data is stored in schema-less

Data normalization

Page 17: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

This is how we store in ToroDB

Page 18: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Document oriented-API

Page 19: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

MongoDB WP and API

● ToroDB speaks MongoDB protocol natively, and implements Mongo API

● All MongoDB tools, drivers, GUIs, whould work as is on ToroDB

● Offer a widely used API on top of your current infrastructure

Page 20: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

● NoSQL is trying to get back to SQL

● ToroDB is SQL native!

● Insert with Mongo, query with SQL!

● Powerful PostgreSQL SQL: window functions, recursive queries, hypothetical aggregates, lateral joins, CTEs, etc

ToroDB: native SQL

Page 21: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB: native SQL

Page 22: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

db.toroviews.insert({id:5, person: {

name: "Alvaro", surname: "Hernandez",contact: { email: "[email protected]", verified: true }}

})

db.toroviews.insert({id:6, person: {

name: "Name", surname: "Surname", age: 31,contact: { email: "[email protected]" }}

})

ToroDB VIEWs

Page 23: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

torodb$ select * from toroviews.person ;┌─────┬───────────┬────────┬─────┐│ did │ surname │ name │ age │├─────┼───────────┼────────┼─────┤│ 0 │ Hernandez │ Alvaro │ ¤ ││ 1 │ Surname │ Name │ 31 │└─────┴───────────┴────────┴─────┘(2 rows)

torodb$ select * from toroviews."person.contact";┌─────┬──────────┬────────────────────────┐│ did │ verified │ email │├─────┼──────────┼────────────────────────┤│ 0 │ t │ [email protected] ││ 1 │ ¤ │ [email protected] │└─────┴──────────┴────────────────────────┘(2 rows)

ToroDB VIEWs

Page 24: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

VIEWs, ToroDB from any SQL tool

Page 25: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Replication

Page 26: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Replication protocol choice

● ToroDB is based on PostgreSQL

● PostgreSQL has either binary streaming replication (async or sync) or logical replication

● MongoDB has logical replication

● ToroDB uses MongoDB's protocol

Page 27: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

MongoDB's replication protocol

● Every change is recorded in JSON documents, idempotent format (collection: local.oplog.rs)

● Slaves pull these documents from master (or other slaves) asynchronously

● Changes are applied and feedback is sent upstream

Page 28: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

MongoDB replication implementation

https://github.com/stripe/mosql

Page 29: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

ToroDB v0.4● ToroDB works as a secondary slave of a MongoDB master (or slave, chained rep)

● Implements the full replication protocol (not as an oplog tailable query)

● Replicates from Mongo to a PostgreSQL

● Open source github.com/torodb/torodb (repl branch, version 0.4-SNAPSHOT)

Page 30: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

MongoDB's oplog & tailable cursors

Oplog is a system, capped collection which contains idempotent DML commands

Tailable cursors are “open queries” where data is streamed as it appears on the database

Both are used a lot (replication, Meteor...)

Page 31: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

How to do oplog & tailable with PG?

Thanks to 9.4's logical decoding, it's easy

Replication (master) is just plain logical decoding, transformed to MongoDB's json format

Tailable cursors would be logical decoding plus filtering (to make sure streamed data matches query filters)

Page 32: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Horizontal scalability(aka sharding)

Page 33: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Write scalability(sharding)

● MongoDB's sharding API not implemented yet (roadmap: ToroDB 0.8)

● Will use MongoDB's mongos without modification, as well as config servers

● That might change in the future (pg_shard?)

Page 34: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Horizontal scalability(storage level)

● Another non-exclusive option is to have ToroDB store data in a distributed database

● Requires a distributed database like GreenPlum, CitusDb or RedShift

● Paired with replication as a slave:DW in NoSQL enabler

Page 35: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

● GreenPlum was open sourced Oct/15

● ToroDB already runs on GP!CitusDB 5.0 port coming soon!

● Goal: ToroDB v0.4 replicates from a MongoDB master onto Toro-Greenplum and runs the reporting queries using distributed SQL

ToroDB on GP: SQL DW for MongoDB

Page 36: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

● Amazon reviews datasetImage-based recommendations on styles and substitutesJ. McAuley, C. Targett, J. Shi, A. van den HengelSIGIR, 2015

● AWS c4.xlarge (4vCPU, 8GB RAM) 4KIOPS SSD EBS

● 4x shards, 3x config; 4x segments GP

● 83M records, 65GB plain json

Benchmark

Page 37: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Disk usage

Mongo 3.0, WT, SnappyGP columnar, zlib level 9

table size index size total size0

10000000000

20000000000

30000000000

40000000000

50000000000

60000000000

70000000000

80000000000

Storage requirements

MongoDB vs ToroDB on Greenplum

Mongo

ToroDB on GP

byt

es

Page 38: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

SELECT count( distinct( "reviewerID" ))FROM reviews;

Queries: which one is easier?

db.reviews.aggregate([{ $group: { _id: "reviewerID"}},{ $group: {_id: 1, count: { $sum: 1}}}])

Page 39: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

SELECT "reviewerName", count(*) as reviews FROM reviews GROUP BY "reviewerName" ORDER BY reviews DESC LIMIT 10;

Queries: which one is easier?

db.reviews.aggregate([ { $group : { _id : '$reviewerName', r : { $sum : 1 } } }, { $sort : { r : -1 } }, { $limit : 10 } ], {allowDiskUse: true})

Page 40: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Query times

3 different queriesQ3 on MongoDB: aggregate fails

27,95 74,87 00

200

400

600

800

1000

1200

9691007

035 13 31

Query duration (s)

MongoDB vs ToroDB on Greenplum

MongoDB

ToroDB on GP

speedup

seco

nd

s

Page 41: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Doing what MongoDBcan't offer

Page 42: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Atomic operations

● There is no support for atomic bulk insert/update/delete operations

● Not even with $isolated:“Prevents a write operation that affects multiple documents from yielding to other reads or writes […] You can ensure that no client sees the changes until the operation completes or errors out. The $isolated isolation operator does not provide “all-or-nothing” atomicity for write operations.”http://docs.mongodb.org/manual/reference/operator/update/isolated/

Page 43: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Cheap single-node durability

● Without journaling, MongoDB is not durable nor crash-safe

● MongoDB requires “j: true” for true single-node durability. But who guarantees its consistent usage? Who uses it by default?j:true creates I/O storms equivalent to SQL CHECKPOINTs

Page 44: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

“Clean” reads

Oh really?

Page 45: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

“Clean” reads

http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior

“MongoDB will allow clients to read the results of a write operation before the write operation returns.”

“If the mongod terminates before the journal commits, even if a write returns successfully, queries may have read data that will not exist after the mongod restarts.”

Thus, MongoDB suffers from dirty reads. But let's call them just “tainted reads”.

Page 46: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

“Clean” reads

What about $snapshot? Nope:

“The snapshot() does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.”

http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors

Cursors in ToroDB run in repeatable read, read-only mode:globalCursorDataSource.setTransactionIsolation("TRANSACTION_REPEATABLE_READ"); globalCursorDataSource.setReadOnly(true);

Page 47: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

Replication dirty and stale reads(“call me maybe”)

Page 48: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

MongoDB's dirty and stale reads (repl)

Dirty readsA primary in minority accepts a write that other clients see, but it later steps down, write is rolled back (fixed in 3.2?)

Stale readsA primary in minority serves a value that ought to be current, but a newer value was written to the other primary in minority

Page 49: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Data discoverability, SQL connectors

● They are two of the major announcements for MongoDB 3.2

● To discover data, MongoDB samples data. ToroDB: just look at table structures! (and join with root if you want a count)

● SQL connectors: native, no emulation

Page 50: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

ToroDB @NoSQLonSQL

Download, clone, PR, star it!

https://github.com/torodb/torodb

Page 51: ToroDB internals€¦ · “MongoDB will allow clients to read the results of a write operation before the write operation returns.” “If the mongod terminates before the journal

Recommended