From SQL to MongoDB

Date post: 14-Apr-2017
Upload: nuxeo
Nuxeo: from SQL to MongoDB
Florent Guillaume, Director of R&D, Nuxeo
2014-07-03

The Nuxeo Model

[Architecture diagram: inside the Nuxeo Platform, the Repository stores and reads Documents (metadata) through a cache and a persistence engine, either VCS on an SQL DB or DBS on MongoDB (insert, update, select); BLOBs go through the BlobStore to the filesystem (FS).]

Nuxeo Core — Rich Documents

• Scalars

• Strings, Integers, Floats, Booleans, Dates

• Binary blobs (stored using separate BinaryStore service)

• Arrays of scalars

• Complex properties (sub-documents)

• Lists of complex properties

• System properties

• Id, type, facets, lifecycle state, ACL, version flags...

Nuxeo Core — Rich Documents

• Scalar properties and arrays

• dc:title = "My Document"

• dc:contributors = ["bob", "pete", "mary"]

• dc:created = 2014-07-03T12:15:07+0200

• ecm:uuid = 52a7352b-041e-49ed-8676-328ce90cc103

• ecm:primaryType = "MyFile"

• ecm:majorVersion = 2, ecm:minorVersion = 0

• ecm:isLatestMajorVersion = true, ecm:isLatestVersion = false

Nuxeo Core — Rich Documents

• Complex properties and lists of them

• primaryAddress = { street = "1 rue René Clair", zip = "75018", city = "Paris", country = "France" }

• files = [

• { name = "doc.txt", length = 1234, mime-type = "text/plain", data = 0111fefdc8b14738067e54f30e568115 }

• { name = "doc.pdf", length = 29344, mime-type = "application/pdf", data = 20f42df3221d61cb3e6ab8916b248216 }

]

Nuxeo Core — Rich Operations

• CRUD

• Create

• Retrieve

• Update

• Delete

• Move

• Copy

• ... but in a Hierarchy

Nuxeo Core — Rich Features

• Security based on ACLs and inheritance

• block bob for Write, allow members for Read

• Proxies (multi-filing)

• Versioning

• Placeless documents (versions, tags, relations...)

• Facets (dynamic typing)

• Locking

• Search (NXQL)

• SELECT * FROM File WHERE files/*/name = 'doc.txt'

Nuxeo Core — Hierarchy

• Parent-child relationship

• Recursion

• Find all the children to change something

• Lifecycle state

• Security

• Search on a subset of the hierarchy

• ... AND ecm:path STARTSWITH '/workspaces/receipts'

SQL vs DBS/MongoDB

Storage — SQL

• Stores data in a set of JOINed tables

• Star schema, around the main hierarchy

• Lists as JOINed table with item/pos

• Complex properties as sub-documents (children)

• Lists of complex properties as ordered sub-documents

• Id generated by application or database

• String / native UUID / serial integer
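
To make the star schema concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`hierarchy`, `dublincore`, `dc_contributors`, `pos`, `item`) are illustrative assumptions, not Nuxeo's actual DDL: scalar properties live in one row per document and are JOINed to the hierarchy row, while a list property is stored as one row per item and read back in order with `ORDER BY pos`.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Tiny star schema around a central hierarchy table (illustrative names only)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE hierarchy (id TEXT PRIMARY KEY, parentid TEXT, name TEXT, primarytype TEXT);
        -- one row per document for its simple (scalar) properties
        CREATE TABLE dublincore (id TEXT PRIMARY KEY, title TEXT);
        -- one row per list item, ordered by pos
        CREATE TABLE dc_contributors (id TEXT, pos INTEGER, item TEXT);
    """)
    db.execute("INSERT INTO hierarchy VALUES ('doc1', 'root', 'my-doc', 'MyFile')")
    db.execute("INSERT INTO dublincore VALUES ('doc1', 'My Document')")
    db.executemany(
        "INSERT INTO dc_contributors VALUES ('doc1', ?, ?)",
        [(0, "bob"), (1, "pete"), (2, "mary")],
    )
    return db

def load_document(db: sqlite3.Connection, doc_id: str) -> dict:
    """Reassemble a document from the hierarchy row plus its property tables."""
    primary_type, title = db.execute(
        """SELECT h.primarytype, d.title
           FROM hierarchy h JOIN dublincore d ON d.id = h.id
           WHERE h.id = ?""",
        (doc_id,),
    ).fetchone()
    contributors = [
        item
        for (item,) in db.execute(
            "SELECT item FROM dc_contributors WHERE id = ? ORDER BY pos", (doc_id,)
        )
    ]
    return {"ecm:primaryType": primary_type, "dc:title": title, "dc:contributors": contributors}

db = build_demo_db()
print(load_document(db, "doc1"))
```

Every additional list or complex property adds another table, which is why a search touching many properties turns into many JOINs.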

Storage — SQL (base hierarchy)

Storage — SQL (simple props)

Storage — SQL (complex props)

Storage — MongoDB

• Standard JSON documents

• Property names fully prefixed

• Lists as arrays of scalars

• Complex properties as sub-documents

• Complex lists as arrays of sub-documents

• Id generated by MongoDB

• Counter using findAndModify, $inc and returnNew
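
The counter can be sketched as follows. `find_and_modify_command` builds the raw `findAndModify` server command; its `new` option is what the old Java driver exposes as `returnNew` (pymongo offers the same through `find_one_and_update(..., return_document=ReturnDocument.AFTER)`). `FakeCounters` is a hypothetical in-memory stand-in for the server so the example runs without MongoDB, and the `counters` collection and `seq` field are assumed names.

```python
import threading

def find_and_modify_command(counter_name: str) -> dict:
    """Raw server command: atomically increment and return the *new* document."""
    return {
        "findAndModify": "counters",  # hypothetical collection name
        "query": {"_id": counter_name},
        "update": {"$inc": {"seq": 1}},
        "new": True,     # return the document after the update (returnNew)
        "upsert": True,  # create the counter on first use
    }

class FakeCounters:
    """In-memory stand-in for the server side of the command (illustration only)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # the server applies the command atomically

    def run(self, cmd: dict):
        key = cmd["query"]["_id"]
        with self._lock:
            doc = self._docs.get(key)
            created = doc is None
            if created:
                if not cmd.get("upsert"):
                    return None
                doc = self._docs[key] = {"_id": key}
            before = None if created else dict(doc)
            for field, delta in cmd["update"]["$inc"].items():
                doc[field] = doc.get(field, 0) + delta
            return dict(doc) if cmd.get("new") else before

counters = FakeCounters()
ids = [counters.run(find_and_modify_command("hierarchy"))["seq"] for _ in range(3)]
print(ids)  # [1, 2, 3]
```

Because the increment and the read of the new value happen in one atomic step, two concurrent callers can never receive the same number.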

Storage — MongoDB

"ecm:id": "52a7352b-041e-49ed-8676-328ce90cc103",
"dc:title": "My Document",
"dc:contributors": ["bob", "pete", "mary"],
"dc:created": ISODate("2014-07-03T12:15:07+0200"),
"ecm:primaryType": "MyFile",
"ecm:majorVersion": NumberLong(2),
"ecm:minorVersion": NumberLong(0),
"ecm:isLatestMajorVersion": true,
"ecm:isLatestVersion": false,

Storage — MongoDB

primaryAddress: { street: "1 rue René Clair", zip: "75018", city: "Paris", country: "France" },
files: [
  { name: "doc.txt", length: 1234, "mime-type": "text/plain", data: "0111fefdc8b14738067e54f30e568115" },
  { name: "doc.pdf", length: 29344, "mime-type": "application/pdf", data: "20f42df3221d61cb3e6ab8916b248216" }
],
"ecm:acp": [
  { name: "local", acl: [
    { grant: false, perm: "Write", user: "bob" },
    { grant: true, perm: "Read", user: "pete" },
    { grant: true, perm: "Read", user: "members" }
  ] }
]

Hierarchy — SQL

• Parent-child relationship

• hierarchy.parentid column

• Recursion optimized through ancestors table

• For each document list all its ancestors

• Maintained by database triggers (create, delete, move, copy)

• Alternative for PostgreSQL: array column with all ancestors
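
A rough sketch of an ancestors table maintained by a database trigger, again with `sqlite3` and hypothetical table names. It stores one row per (document, ancestor) pair, which is one possible layout and not necessarily Nuxeo's, and it only handles creation; move, copy and delete would need triggers of their own. Finding a whole subtree then becomes a single indexed lookup instead of a recursive walk.

```python
import sqlite3

SCHEMA = """
CREATE TABLE hierarchy (id TEXT PRIMARY KEY, parentid TEXT);
-- one row per (document, ancestor) pair
CREATE TABLE ancestors (id TEXT NOT NULL, ancestor TEXT NOT NULL);
CREATE INDEX ancestors_ancestor ON ancestors (ancestor);

-- on create: ancestors = the parent plus the parent's own ancestors
CREATE TRIGGER hierarchy_insert AFTER INSERT ON hierarchy
WHEN NEW.parentid IS NOT NULL
BEGIN
    INSERT INTO ancestors (id, ancestor) VALUES (NEW.id, NEW.parentid);
    INSERT INTO ancestors (id, ancestor)
        SELECT NEW.id, ancestor FROM ancestors WHERE id = NEW.parentid;
END;
"""

def descendants(db: sqlite3.Connection, doc_id: str) -> list:
    """Every document below doc_id, found with one lookup instead of recursion."""
    rows = db.execute("SELECT id FROM ancestors WHERE ancestor = ? ORDER BY id", (doc_id,))
    return [row[0] for row in rows]

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO hierarchy (id, parentid) VALUES (?, ?)", [
    ("root", None),
    ("workspaces", "root"),
    ("receipts", "workspaces"),
    ("r1", "receipts"),
    ("r2", "receipts"),
    ("templates", "root"),
])
print(descendants(db, "workspaces"))  # ['r1', 'r2', 'receipts']
```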

Hierarchy — SQL

Hierarchy — MongoDB

• Parent-child relationship

• ecm:parentId field

• Recursion optimized through ecm:ancestorIds array

• Maintained by framework (create, delete, move, copy)
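
The same idea in the document model can be sketched with a plain dict standing in for the collection (an assumption, so it runs without a server). Each document's `ecm:ancestorIds` is computed from its parent at creation, and a move rewrites the prefix of that array for the whole subtree. In MongoDB, the query `{"ecm:ancestorIds": id}` matches any document whose array contains `id`, which `find_descendants` mimics.

```python
from typing import Dict, List, Optional

def create(store: Dict[str, dict], doc_id: str, parent_id: Optional[str]) -> dict:
    parent = store[parent_id] if parent_id is not None else None
    doc = {
        "ecm:id": doc_id,
        "ecm:parentId": parent_id,
        # the parent's ancestors followed by the parent itself
        "ecm:ancestorIds": parent["ecm:ancestorIds"] + [parent_id] if parent is not None else [],
    }
    store[doc_id] = doc
    return doc

def find_descendants(store: Dict[str, dict], ancestor_id: str) -> List[str]:
    # like {"ecm:ancestorIds": ancestor_id}: equality on an array field
    # matches documents whose array contains the value
    return sorted(d["ecm:id"] for d in store.values() if ancestor_id in d["ecm:ancestorIds"])

def move(store: Dict[str, dict], doc_id: str, new_parent_id: str) -> None:
    doc, new_parent = store[doc_id], store[new_parent_id]
    # best-effort cycle check; without transactions a concurrent move can still slip through
    if doc_id == new_parent_id or doc_id in new_parent["ecm:ancestorIds"]:
        raise ValueError("move would create a cycle")
    old_prefix_len = len(doc["ecm:ancestorIds"])
    new_prefix = new_parent["ecm:ancestorIds"] + [new_parent_id]
    doc["ecm:parentId"] = new_parent_id
    # rewrite the ancestor prefix of the moved document and its whole subtree
    for d in store.values():
        if d is doc or doc_id in d["ecm:ancestorIds"]:
            d["ecm:ancestorIds"] = new_prefix + d["ecm:ancestorIds"][old_prefix_len:]

store: Dict[str, dict] = {}
create(store, "00000000", None)
create(store, "18ba9e90", "00000000")
create(store, "afb488e7", "18ba9e90")
create(store, "52a7352b", "afb488e7")
print(store["52a7352b"]["ecm:ancestorIds"])  # ['00000000', '18ba9e90', 'afb488e7']
```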

Hierarchy — MongoDB

"ecm:parentId": "afb488e7",

"ecm:ancestorIds": ["00000000", "18ba9e90", "afb488e7"],

Proxies — SQL

• Reference to target document

• proxies.targetid column

• Holds only hierarchy-based information, no content

• Parent, name, ACL...

• Additional JOIN during search

Proxies — MongoDB

• Copy of the target document

• ecm:proxyTargetId field

• Target document knows who's pointing to it

• ecm:proxyIds field

• Maintained by framework

• Copy needs to be kept up to date when target changes

• Maintained by framework
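
A minimal sketch of that proxy bookkeeping, on a dict standing in for the collection. Which fields stay specific to the proxy (`LOCAL_KEYS`, including the `ecm:name` field) is an assumption modeled on the hierarchy-based information that SQL proxies hold (parent, name, ACL); everything else is copied from the target and copied again whenever the target changes.

```python
import copy

# Fields that belong to the proxy itself rather than to the target's content (assumption).
LOCAL_KEYS = {"ecm:id", "ecm:parentId", "ecm:ancestorIds", "ecm:name", "ecm:acp",
              "ecm:proxyTargetId", "ecm:proxyIds"}

def create_proxy(store, proxy_id, target_id, parent_id, name):
    target = store[target_id]
    # the proxy starts as a copy of the target's content
    proxy = {k: copy.deepcopy(v) for k, v in target.items() if k not in LOCAL_KEYS}
    proxy.update({"ecm:id": proxy_id, "ecm:parentId": parent_id, "ecm:name": name,
                  "ecm:proxyTargetId": target_id})
    store[proxy_id] = proxy
    # the target remembers who points to it
    target.setdefault("ecm:proxyIds", []).append(proxy_id)
    return proxy

def update_document(store, doc_id, changes):
    doc = store[doc_id]
    doc.update(changes)
    # keep every proxy's copy of the content in sync with the target
    content = {k: v for k, v in changes.items() if k not in LOCAL_KEYS}
    for proxy_id in doc.get("ecm:proxyIds", []):
        store[proxy_id].update(copy.deepcopy(content))

store = {"doc1": {"ecm:id": "doc1", "ecm:parentId": "ws", "ecm:name": "my-doc",
                  "dc:title": "My Document"}}
create_proxy(store, "px1", "doc1", "section", "my-doc")
update_document(store, "doc1", {"dc:title": "Renamed"})
print(store["px1"]["dc:title"])  # Renamed
```

Copying the content makes searches on proxies a plain query, at the cost of extra writes each time the target changes.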

Proxies — Semantics

• What to do when:

• Target removed (→ forbid)

• Proxy removed

• Proxy + target removed at the same time (→ ok)

• Target copied

• Proxy copied (→ new proxy to original target)

• Proxy + target copied at the same time (todo)
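
Two of these rules, sketched on the same kind of dict store (a hypothetical stand-in for the collection, using the `ecm:proxyIds` and `ecm:proxyTargetId` fields from the previous slide): removing a target is refused unless all its proxies are removed in the same operation, and copying a proxy creates a new proxy to the original target.

```python
import copy

def check_removal(store, ids_to_remove):
    """Refuse to remove a target while a proxy outside the removed set still points to it."""
    removing = set(ids_to_remove)
    for doc_id in removing:
        for proxy_id in store[doc_id].get("ecm:proxyIds", []):
            if proxy_id not in removing:
                raise ValueError(f"cannot remove {doc_id}: proxy {proxy_id} still points to it")

def copy_proxy(store, proxy_id, new_id, new_parent_id):
    """Copying a proxy yields a new proxy to the same target, not a copy of the target."""
    new_proxy = copy.deepcopy(store[proxy_id])
    new_proxy.update({"ecm:id": new_id, "ecm:parentId": new_parent_id})
    target = store[new_proxy["ecm:proxyTargetId"]]
    target.setdefault("ecm:proxyIds", []).append(new_id)
    store[new_id] = new_proxy
    return new_proxy

store = {
    "t1": {"ecm:id": "t1", "ecm:parentId": "ws", "ecm:proxyIds": ["p1"]},
    "p1": {"ecm:id": "p1", "ecm:parentId": "section", "ecm:proxyTargetId": "t1"},
}
check_removal(store, ["t1", "p1"])  # proxy + target together: allowed
copy_proxy(store, "p1", "p2", "other-section")
```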

Security — SQL

• Generic ACP stored in acls table

• Precomputed Read ACLs needed for search

• Ordered list of identities having access, with blocking

• ["Management", "Supervisors", "-Temps", "bob"]

• Read ACLs are given an identifier

• Which identities have access to which Read ACL is precomputed

• Maintained by database triggers

• Search matches using JOIN
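
One possible reading of the ordered Read ACL with blocking, as a sketch: entries are checked in order, the first one matching any of the user's identities decides, and a leading `-` marks a blocked identity. The first-match rule, the numeric ACL ids and the `read_acl_id` document field are assumptions for illustration; `accessible_acl_ids` plays the role of the precomputed identity-to-Read-ACL mapping that the search JOINs against.

```python
def read_allowed(read_acl, principals):
    """The first entry matching one of the user's principals wins; "-X" blocks X."""
    for entry in read_acl:
        blocked = entry.startswith("-")
        name = entry[1:] if blocked else entry
        if name in principals:
            return not blocked
    return False

# Hypothetical precomputed Read ACLs, keyed by their identifier.
READ_ACLS = {
    1: ["Management", "Supervisors", "-Temps", "bob"],
    2: ["members"],
}

def accessible_acl_ids(principals):
    """What the precomputation stores for a set of identities: the Read ACL ids they can see."""
    return {acl_id for acl_id, acl in READ_ACLS.items() if read_allowed(acl, principals)}

def search(docs, principals):
    # stands in for the JOIN between documents and the precomputed mapping
    allowed = accessible_acl_ids(principals)
    return [d["id"] for d in docs if d["read_acl_id"] in allowed]

docs = [{"id": "a", "read_acl_id": 1}, {"id": "b", "read_acl_id": 2}]
print(search(docs, {"bob", "members"}))           # ['a', 'b']
print(search(docs, {"bob", "Temps", "members"}))  # ['b']
```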

Security — SQL

Security — SQL

Security — MongoDB

• Generic ACP stored in ecm:acp field

• Precomputed Read ACLs needed for search

• Simple set of identities having access

• ecm:racl: ["Management", "Supervisors", "bob"]

• Semantic restrictions on blocking

• Maintained by framework

• Search matches if intersection

• {"ecm:racl": {"$in": ["bob", "members", "Everyone"]}}
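
A sketch of how such a set could be derived from the ACP and then matched. The flattening rules are simplifying assumptions, not Nuxeo's exact algorithm: only the literal `Read` permission is considered, the first ACE mentioning a principal decides for it, and denying Read to `Everyone` blocks everything after it. `matches` performs the same non-empty-intersection test as the `$in` query above.

```python
def compute_racl(acp):
    """Flatten an ordered ACP (local ACL first, then inherited ones) into a set of principals."""
    racl, decided = set(), set()
    for acl in acp:
        for ace in acl["acl"]:
            if ace["perm"] != "Read":
                continue  # simplification: only the literal Read permission counts
            user = ace["user"]
            if user in decided:
                continue  # an earlier ACE already decided for this principal
            decided.add(user)
            if ace["grant"]:
                racl.add(user)
            elif user == "Everyone":
                return racl  # blocking: ACEs after this point are ignored
    return racl

def matches(racl, principals):
    """Same test as {"ecm:racl": {"$in": principals}}: the sets intersect."""
    return not racl.isdisjoint(principals)

acp = [{"name": "local", "acl": [
    {"grant": False, "perm": "Write", "user": "bob"},
    {"grant": True, "perm": "Read", "user": "pete"},
    {"grant": True, "perm": "Read", "user": "members"},
]}]
racl = compute_racl(acp)
print(sorted(racl))                                   # ['members', 'pete']
print(matches(racl, ["bob", "members", "Everyone"]))  # True
```

An ordered exception such as `-Temps` in the SQL version has no equivalent in a plain set: here a deny only stops that principal from being granted later, while a user who is also in a granted group still matches, which is one of the semantic restrictions on blocking.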

Search — SQL

• Translated from NXQL to SQL

• JOIN of all required star/list/complex properties tables

• Additional UNION + JOINs for proxies

• Additional JOIN for security

• Can have correlations (reuse same JOIN)

• Fulltext index(es) on fulltext.simpletext / fulltext.binarytext columns

Search — MongoDB

• Translated from NXQL to MongoDB syntax

• Proxies queried directly

• Security queried by set intersection

• One fulltext index for ecm:fulltextSimple / ecm:fulltextBinary fields

• Some limitations

Search — MongoDB Limitations

• Only one fulltext search per query, restrictions on position

• No generic boolean NOT, must be pushed down as negative operators

• Search is field/value based

• No multi-field operators (title = description, expirationDate > modificationDate)

• No multi-field arithmetic (amount + bonus < 1000)

• Subdocument correlation with $elemMatch is less generic than full JOINs
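
A minimal sketch of the NOT push-down, assuming the NXQL WHERE clause has already been parsed into nested tuples (the parser is out of scope and the tuple format is hypothetical). NOT is never emitted as a generic operator: De Morgan's laws swap AND and OR, equality and IN flip to `$ne`/`$nin` (and back), and range operators are wrapped in MongoDB's field-level `$not`. NXQL list wildcards such as `files/*/name` become dotted paths (`files.name`), which MongoDB matches against any element of the array.

```python
MONGO_OPS = {"=": "$eq", "<>": "$ne", "<": "$lt", "<=": "$lte",
             ">": "$gt", ">=": "$gte", "IN": "$in", "NOT IN": "$nin"}
# operators that have a direct negative counterpart
NEGATIONS = {"=": "<>", "<>": "=", "IN": "NOT IN", "NOT IN": "IN"}

def field(path):
    """NXQL "files/*/name" -> MongoDB "files.name" (the wildcard spans list items)."""
    return ".".join(part for part in path.split("/") if part != "*")

def translate(expr, negate=False):
    """expr is ("AND"|"OR", e1, e2, ...), ("NOT", e) or (op, path, value)."""
    op = expr[0]
    if op == "NOT":
        return translate(expr[1], not negate)  # push the negation down
    if op in ("AND", "OR"):
        if negate:  # De Morgan
            op = "OR" if op == "AND" else "AND"
        key = "$and" if op == "AND" else "$or"
        return {key: [translate(e, negate) for e in expr[1:]]}
    path, value = expr[1], expr[2]
    if negate and op in NEGATIONS:
        op, negate = NEGATIONS[op], False
    condition = {MONGO_OPS[op]: value}
    if negate:  # remaining range operators use the field-level $not
        condition = {"$not": condition}
    return {field(path): condition}

print(translate(("=", "files/*/name", "doc.txt")))
print(translate(("NOT", ("AND", ("=", "dc:title", "Draft"),
                                ("<", "ecm:majorVersion", 1)))))
```

Each leaf compares one field with a literal value, so comparisons between two fields (`title = description`) or arithmetic over fields simply have no place in this output, which matches the limitations listed above.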

Transactions — SQL

• Standard SQL database capabilities

• Atomic commit

• Two-phase commit (prepare/commit) also usable, although costly

• Rollback

• Transient data is data modified in the database but not yet committed

• Transient data is visible alongside committed data for retrieval and search

Transactions — MongoDB

• No atomic commit beyond a single document

• Commit using a big batch of create/delete/update accumulated in-memory

• Not atomic, others can see partial state

• No transient space

• Emulate transient space in-memory, flush at commit time

• All accesses and searches must check the transient space as well as MongoDB
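
A minimal sketch of the emulated transient space, with a plain dict standing in for the MongoDB collection (an assumption so it runs without a server). Writes go to an in-memory overlay, every read and search consults the overlay before the database, commit flushes the overlay as one batch, and rollback just drops it.

```python
import copy

class TransientSession:
    def __init__(self, db):
        self.db = db           # committed state (stand-in for MongoDB)
        self.transient = {}    # documents created or modified in this transaction
        self.deleted = set()   # ids deleted in this transaction

    def get(self, doc_id):
        if doc_id in self.deleted:
            return None
        if doc_id in self.transient:
            return self.transient[doc_id]
        doc = self.db.get(doc_id)
        return copy.deepcopy(doc) if doc is not None else None

    def create(self, doc):
        self.deleted.discard(doc["ecm:id"])
        self.transient[doc["ecm:id"]] = doc

    def update(self, doc_id, changes):
        doc = self.get(doc_id)
        if doc is None:
            raise KeyError(doc_id)
        doc.update(changes)
        self.transient[doc_id] = doc

    def delete(self, doc_id):
        self.transient.pop(doc_id, None)
        self.deleted.add(doc_id)

    def search(self, predicate):
        """Merge both layers, the overlay taking precedence over committed data."""
        hits = {i for i, d in self.db.items()
                if i not in self.transient and i not in self.deleted and predicate(d)}
        hits |= {i for i, d in self.transient.items() if predicate(d)}
        return sorted(hits)

    def commit(self):
        # with MongoDB this would be one bulk write; it is not atomic,
        # so other readers may observe a partially applied batch
        for doc_id in self.deleted:
            self.db.pop(doc_id, None)
        self.db.update(self.transient)
        self._reset()

    def rollback(self):
        self._reset()  # nothing reached the database, so just forget the overlay

    def _reset(self):
        self.transient.clear()
        self.deleted.clear()

db = {"a": {"ecm:id": "a", "dc:title": "Old"}}
s = TransientSession(db)
s.update("a", {"dc:title": "New"})
s.create({"ecm:id": "b", "dc:title": "New"})
print(s.search(lambda d: d["dc:title"] == "New"))  # ['a', 'b']
print(db["a"]["dc:title"])                          # Old
```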

Transactions — MongoDB

• No rollback

• Rollback by dropping the in-memory transient space

• Operations involving several documents in relation

• Move, delete, copy, ancestors or recursion checks

• Using transient space + MongoDB for them is too complex

• Flush to MongoDB before doing them (commit)

• Must be able to be rolled back if needed (transaction compensation)

• Others can see state that may later turn out to be invalid
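
For the multi-document operations flushed to MongoDB right away, rollback has to be emulated with compensating actions. A minimal sketch, on a dict standing in for the collection, with a hypothetical `CompensationLog` that records an undo step for each write and replays them in reverse order on rollback; between the write and its compensation, other sessions can see the intermediate state.

```python
import copy

class CompensationLog:
    """Remembers how to undo each write already sent to the database."""

    def __init__(self):
        self._undo = []

    def record(self, undo_action):
        self._undo.append(undo_action)

    def rollback(self):
        # compensate in reverse order of application
        while self._undo:
            self._undo.pop()()

    def commit(self):
        self._undo.clear()

def move(db, log, doc_id, new_parent_id):
    old_parent_id = db[doc_id]["ecm:parentId"]
    db[doc_id]["ecm:parentId"] = new_parent_id  # visible to other sessions immediately

    def undo():
        db[doc_id]["ecm:parentId"] = old_parent_id

    log.record(undo)

def copy_document(db, log, doc_id, new_id, new_parent_id):
    new_doc = copy.deepcopy(db[doc_id])
    new_doc.update({"ecm:id": new_id, "ecm:parentId": new_parent_id})
    db[new_id] = new_doc
    log.record(lambda: db.pop(new_id, None))

db = {
    "ws": {"ecm:id": "ws", "ecm:parentId": None},
    "archive": {"ecm:id": "archive", "ecm:parentId": None},
    "doc1": {"ecm:id": "doc1", "ecm:parentId": "ws"},
}
before = copy.deepcopy(db)
log = CompensationLog()
move(db, log, "doc1", "archive")
copy_document(db, log, "doc1", "doc2", "ws")
log.rollback()
print(db == before)  # True
```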

MongoDB — Restrictions

• Eventual consistency and no transactions

• Prevents strong checks

• Duplicate name in a folder

• Move creating cycles

• Remove target before proxy

• Create document in a deleted folder

• Prevents full consistency of hierarchical processing

• Read ACLs, quotas

• Needs background jobs that check consistency
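
A background consistency job could look roughly like this: scan the documents and report the anomalies listed above that a non-transactional store cannot rule out at write time. It works on a list of dicts rather than a live collection, and `ecm:name` as the field holding a document's name in its folder is an assumption.

```python
from collections import Counter

def parent_chain_loops(by_id, doc_id):
    """Follow ecm:parentId upwards; revisiting a document means a cycle."""
    seen = set()
    current = doc_id
    while current is not None and current in by_id:
        if current in seen:
            return True
        seen.add(current)
        current = by_id[current].get("ecm:parentId")
    return False

def check_consistency(docs):
    by_id = {d["ecm:id"]: d for d in docs}
    problems = []
    # two documents with the same name in the same folder
    names = Counter((d["ecm:parentId"], d.get("ecm:name")) for d in docs if d.get("ecm:parentId"))
    for (parent_id, name), count in names.items():
        if count > 1:
            problems.append(f"duplicate name {name!r} in {parent_id}")
    for d in docs:
        doc_id, parent_id = d["ecm:id"], d.get("ecm:parentId")
        target_id = d.get("ecm:proxyTargetId")
        if target_id is not None and target_id not in by_id:
            problems.append(f"proxy {doc_id} points to missing target {target_id}")
        if parent_id is not None and parent_id not in by_id:
            problems.append(f"{doc_id} is in missing folder {parent_id}")
        if parent_chain_loops(by_id, doc_id):
            problems.append(f"parent chain of {doc_id} loops")
    return problems

docs = [
    {"ecm:id": "root"},
    {"ecm:id": "a", "ecm:parentId": "root", "ecm:name": "report"},
    {"ecm:id": "b", "ecm:parentId": "root", "ecm:name": "report"},
    {"ecm:id": "p", "ecm:parentId": "root", "ecm:name": "link", "ecm:proxyTargetId": "gone"},
    {"ecm:id": "c", "ecm:parentId": "deleted-folder", "ecm:name": "orphan"},
    {"ecm:id": "x", "ecm:parentId": "y", "ecm:name": "x"},
    {"ecm:id": "y", "ecm:parentId": "x", "ecm:name": "y"},
]
for problem in check_consistency(docs):
    print(problem)
```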

MongoDB — Features

• Bulk operations

• Map-reduce for aggregations

• Quotas / count / folder content last modified

• Conditional updates

• Locks

• Prevent dirty writes

• GridFS to store binaries

• Sharding
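
Conditional updates put the check and the write into a single atomic server-side operation: the write happens only if the filter still matches. The sketch below emulates that with a hypothetical `FakeCollection` whose `update_one` only mimics the idea of a filtered update and returns a modified count (it is not pymongo), and uses it for a lock that succeeds only when nobody holds it and for a save that fails if the document's change token moved since it was read. The `ecm:lockOwner` and `ecm:changeToken` field names are assumptions.

```python
import threading

class FakeCollection:
    """Hypothetical stand-in for a collection: a filtered update, atomic per document."""

    def __init__(self, docs):
        self.docs = {d["_id"]: d for d in docs}
        self._lock = threading.Lock()

    def update_one(self, filter_, set_fields):
        """Apply set_fields only if every filter field matches; return the modified count."""
        with self._lock:
            doc = self.docs.get(filter_["_id"])
            # None also matches a missing field, like {"field": null} in MongoDB
            if doc is None or any(doc.get(k) != v for k, v in filter_.items()):
                return 0
            doc.update(set_fields)
            return 1

def lock_document(coll, doc_id, owner):
    """Take the lock only if nobody holds it."""
    return coll.update_one({"_id": doc_id, "ecm:lockOwner": None}, {"ecm:lockOwner": owner}) == 1

def save_document(coll, doc_id, read_token, changes):
    """Write only if the change token is still the one we read (no dirty write)."""
    new_fields = {**changes, "ecm:changeToken": read_token + 1}
    return coll.update_one({"_id": doc_id, "ecm:changeToken": read_token}, new_fields) == 1

coll = FakeCollection([{"_id": "doc1", "ecm:changeToken": 0, "dc:title": "A"}])
print(lock_document(coll, "doc1", "bob"), lock_document(coll, "doc1", "pete"))  # True False
print(save_document(coll, "doc1", 0, {"dc:title": "B"}))  # True
print(save_document(coll, "doc1", 0, {"dc:title": "C"}))  # False: the token is now 1
```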

DBS — Future Work

Future Work

• DBS used for more services

• Directories / Vocabularies / User database

• Audit log

• DBS for other backends

• Elasticsearch

• Redis

• PostgreSQL / JSON

• Other...

Thanks!

We're Hiring!

