Date post: | 09-Jun-2015 |
Category: | Technology |
Upload: | daniel-coupal |
Semi Formal Model for Document Oriented Databases
Daniel Coupal
Universia.com
1
Agenda
1. Why Have a Model?
2. Modeling Steps
3. Capturing the Model
4. Tools
2
Why Have a Model?
• Documentation, common language
• Repeatable process
• Abstraction from database implementations
• Support for tools
• A document DB is supposed to be “schemaless”!
• No! Having a schema is a good thing. Needing to declare everything is the problem.
3
What if you have many apps?
Info about the schema is in the code of Application A.
Application B wants to read the data in the DB. Where is the description of what it can read, write, ...?
4
Why did we choose NoSQL?
• Rewards
  • Huge amounts of data
  • Cheap hardware
  • Blazing fast
• Compromises
  • No joins, no transactions, less integrity
  • Less mature technology
  • Fewer tools
6
Tradeoff between Performance and Data Integrity
NoSQL Little Secrets
• No experience maintaining databases and apps over the years, which is the most expensive activity in software development.
• Not all of today's vendors will still be there in a few years.
• What if your DB is not maintained anymore?
• What if there is a better DB available?
7
NoSQL State of the Art
• Designing by Example
• Used in most tutorials
• Works well on small examples, like blogs
• A database with more tables needs a better way to capture the design
8
{
    "_id" : ObjectId("508d27069cc1ae293b36928d"),
    "title" : "This is the title",
    "body" : "This is the body text.",
    "tags" : [ "chocolate", "spleen", "piano", "spatula" ],
    "created_date" : ISODate("2012-10-28T12:41:39.110Z"),
    "author_id" : ObjectId("508d280e9cc1ae293b36928e"),
    "category_id" : ObjectId("508d29709cc1ae293b369295"),
    "comments" : [
        {
            "subject" : "This is comment 1",
            "body" : "This is the body of comment 1.",
            "author_id" : ObjectId("508d345f9cc1ae293b369296"),
            "created_date" : ISODate("2012-10-28T13:34:23.929Z")
        },
        {
            "subject" : "This is comment 2",
            "body" : "This is the body of comment 2.",
            "author_id" : ObjectId("508d34739cc1ae293b369297"),
            "created_date" : ISODate("2012-10-28T13:34:43.192Z")
        }
    ]
}
9
NoSQL State of the Art
Complex ER Diagram
10
Northwind ER Diagram
11
Northwind Doc Diagram
11 tables fit in those 5 collections.
No need for:
- CustomerCustomerDemographics
- EmployeeTerritories
because they are N-to-N relationships and don’t contain any data.
[Diagram: Products, Suppliers, Orders, Employees, Customers, Customer Demographics, Shippers, OrderDetails, Region, Categories, Territories]
12
That was a bad example...
• Why?
• With a document database, you don’t model data as your first step!
• Data is modeled based on the usage
• SQL’s model-first approach is tuned for no app in particular, which tends to give poor performance for every app. NoSQL does the opposite.
14
Modeling Steps
            SQL                       NoSQL
Goal        general usage             current usage
Answer to   what answers do I have?   what questions do I have?
Step 1      model data                write queries
Step 2      write application         add indexes
Step 3      write queries             model data
Step 4      add indexes               write application
15
Step 1: Write Queries
• Basic fields to retrieve
• Frequency of the query, requested speed
• Criticality of the query for the system
• Design notes
➡ Sort the queries by importance
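As a rough illustration of this step, the attributes listed above can be captured per query and then sorted. This is only a sketch: the `QuerySpec` fields, the 1-to-3 criticality scale, the tie-breaking order, and every entry except REQ002 are assumptions made for the example, not part of the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuerySpec:
    id: str
    name: str
    fields: List[str]   # basic fields to retrieve
    per_day: int        # frequency of the query
    max_ms: float       # requested speed
    criticality: int    # assumed scale: 1 = nice to have, 3 = system-critical
    notes: str = ""     # design notes


def sort_by_importance(queries):
    # Most critical first; ties go to the more frequent query,
    # then to the one with the tighter latency budget.
    return sorted(queries, key=lambda q: (-q.criticality, -q.per_day, q.max_ms))


queries = [
    QuerySpec("REQ001", "Get order by id", ["OrderID", "CustomerID"], 5000, 10, 3),
    QuerySpec("REQ002", "Get product by name", ["ProductName"], 20000, 2, 3),
    QuerySpec("REQ003", "List customers by contact", ["ContactName"], 200, 50, 1),
]

ranked = [q.id for q in sort_by_importance(queries)]
print(ranked)
```

The ranked list is what drives the next steps: indexes and document layout are chosen for the queries at the top first.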
16
Step 2: Add Indexes
• Which indexes do you need for the queries to go fast?
• Attributes of your indexes
17
Step 3: Model Data
• List the collections
• How many documents per collection?
➡ NoSQL is all about size and performance, no?
• Attributes on the collections (capped, ...)
• List the fields, their types, constraints
➡ Only for the important fields
18
Step 4: Write Application
• Integration code/driver/queries/database
• Balance between using the product functionality and isolating the layer that deals with the database.
• Interesting new tools to normalize to a common query language: JSONiq, BigSQL, ...
19
Capturing the Model
• JSON is a cool format!
• Your document database is a cool storage facility!
• Language for the model: JSON Schema
  • supports things like: types, cardinality, references, acceptable values, ...
20
JSON Schema
{
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" }
    ]
}

{
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "city": { "type": "string" },
                "streetAddress": { "type": "string" }
            }
        },
        "phoneNumber": {
            "type": "array",
            "items": {
                "properties": {
                    "number": { "type": "string" },
                    "type": { "type": "string" }
                }
            }
        }
    }
}
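To make the relation between the document and its schema concrete, here is a minimal sketch of a checker for the schema above. It is a hand-rolled subset that only understands `type`, `properties` and `items`, written with the standard library only; it is not a real JSON Schema implementation, and the function name `validate` is an assumption for the example.

```python
# Mapping from JSON Schema type names to Python types (subset).
TYPES = {
    "object": dict, "array": list, "string": str,
    "number": (int, float), "integer": int,
    "boolean": bool, "null": type(None),
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance conforms."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, TYPES[expected])
        # bool is a subclass of int in Python, so exclude it from numeric types.
        if expected in ("number", "integer") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    errors = []
    if isinstance(instance, dict):
        # Only fields declared in the schema are checked; others are ignored.
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                errors += validate(instance[name], sub, f"{path}.{name}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += validate(item, schema["items"], f"{path}[{i}]")
    return errors


schema = {
    "type": "object",
    "properties": {
        "address": {"type": "object", "properties": {
            "city": {"type": "string"}, "streetAddress": {"type": "string"}}},
        "phoneNumber": {"type": "array", "items": {"properties": {
            "number": {"type": "string"}, "type": {"type": "string"}}}},
    },
}
doc = {
    "address": {"streetAddress": "21 2nd Street", "city": "New York"},
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}

print(validate(doc, schema))
print(validate({"address": {"city": 10}}, schema))
```

Fields that the schema does not mention are simply skipped, which matches the idea that you should not have to declare everything.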
21
Model: Query
• Use:
  • the native DB notation
  • or SQL (everyone can read SQL)
• Avoid joins!!!
• Example:
  • Product by ProductID, ProductName, SupplierID
  • Order by OrderID, CustomerID, ContactName
  • Customer by CustomerID, ContactName, OrderID
22
Example
23
{
    "id" : "REQ002",
    "name" : "Get product by name",
    "n" : "20000/day", "t" : "2 ms",
    "notes" : [
        "User asking about a product availability by product name"
    ],
    "sqlquery" : "select * from product where product.ProductName = 'abcde'",
    "mongoquery" : {
        "ProductName" : "abcde"
    }
}
Model: Index
• Again, use the native DB notation
• Example:
  • Product.ProductID, .ProductName, .SupplierID
  • Order.OrderID, .CustomerID, .ContactName
  • Customer.CustomerID, .ContactName, .OrderID
• Why is it useful, when it looks so trivial?
  • Once it is written down, a tool can validate it or create estimates
24
Example
25
{
    "id" : "REQ002",
    "name" : "Get product by name",
    "n" : "20000/day", "t" : "2 ms",
    "notes" : [
        "User asking about a product availability by product name"
    ],
    "sqlquery" : "select * from product where product.ProductName = 'abcde'",
    "mongoquery" : {
        "ProductName" : "abcde"
    },
    "index" : {
        "collection" : "Products",
        "field" : "ProductName"
    }
}
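The kind of tool hinted at above (validate the model, create estimates) could start as small as the sketch below. It assumes the requirement format of the example, reads "n" as executions per day and "t" as milliseconds per execution, and treats an index as a single field; `check_requirement` and `leading_number` are hypothetical helper names.

```python
import re


def leading_number(text):
    """Extract the leading number from strings such as "20000/day" or "2 ms"."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"no leading number in {text!r}")
    return float(match.group(1))


def check_requirement(req):
    """Return (problems, estimated busy seconds per day) for one requirement."""
    problems = []
    index = req.get("index")
    if index is None:
        problems.append("no index declared")
    else:
        # Assumption: a single-field index covers only that one query field.
        for field in req["mongoquery"]:
            if field != index["field"]:
                problems.append(f"query field {field} is not covered by the index")
    # Assumption: "n" is executions per day and "t" is milliseconds per execution.
    busy_seconds = leading_number(req["n"]) * leading_number(req["t"]) / 1000.0
    return problems, busy_seconds


req002 = {
    "id": "REQ002",
    "name": "Get product by name",
    "n": "20000/day", "t": "2 ms",
    "mongoquery": {"ProductName": "abcde"},
    "index": {"collection": "Products", "field": "ProductName"},
}

problems, busy_seconds = check_requirement(req002)
print(problems, busy_seconds)
```

Summing the estimate over all requirements gives a first, very coarse picture of the load the model implies.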
Model: Data
• Collection
• One JSON-Schema document per collection
• Fields for collection and database
• Optionally, add a version number
26
Example for ‘Orders’
27
{
    "database" : "northwind",
    "collection" : "Orders",
    "version" : 1,
    "type": "object",
    "$schema": "http://json-schema.org/draft-03/schema",
    "id": "http://jsonschema.net",
    "properties": {
        "CustomerID": {
            "type": "string",
            "id": "http://jsonschema.net/CustomerID"
        },
        "Details": {
            "type": "array",
            "id": "http://jsonschema.net/Details",
            "items": {
                "type": "object",
                "id": "http://jsonschema.net/Details/0",
                "required": [ "ProductID", "Quantity" ],
                "properties": {
                    "ProductID": {
                        "type": "number",
                        "id": "http://jsonschema.net/Details/0/ProductID"
                    },
                    "Quantity": {
                        "type": "number"
                    },
...
Simpler...
28
{
    "database" : "northwind",
    "collection" : "Orders",
    "version" : 1,
    "type": "object",
    "properties": {
        "CustomerID": { "type": "string" },
        "Details": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "ProductID": { "type": "number" },
                    "Quantity": { "type": "number" },
...
Model: Versioning
• Each modified version of a collection is a new document
• db.<database>.find({"version": 2})
➡ shows all collections for version ‘2’ of the schema for the DB.
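The same lookup can be sketched outside the database on a plain list of schema documents. The list contents below are made up for the example (only the `database`, `collection` and `version` fields come from the slides), and `schemas_for_version` is a hypothetical helper name.

```python
def schemas_for_version(schema_docs, version):
    """Map collection name -> schema document for one version of the model."""
    return {d["collection"]: d for d in schema_docs if d.get("version") == version}


# One schema document per collection and per version (illustrative content only).
schema_docs = [
    {"database": "northwind", "collection": "Orders", "version": 1, "type": "object"},
    {"database": "northwind", "collection": "Orders", "version": 2, "type": "object"},
    {"database": "northwind", "collection": "Products", "version": 2, "type": "object"},
]

v2 = schemas_for_version(schema_docs, 2)
print(sorted(v2))
```

Because a modified collection gets a new document instead of an update in place, older versions stay available for comparison.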
29
Partial Schema
• Example: you only want to validate the ‘version’ field, whose values are stored sometimes as ‘string’ and sometimes as ‘number’
30
JSON Schema:
{
    "type": "object",
    "properties": {
        "version": {
            "type": "string"
        }
    }
}
JSON:
{ "version": 1.0, ...},
{ "version": "1.0.1", ...}
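Before writing such a partial schema, it can help to see which types a field actually takes in the stored documents. The following is a small sketch on in-memory documents; `field_type_census` is a hypothetical helper, and the type names it reports are Python's (`float`, `str`), not JSON Schema's.

```python
from collections import Counter


def field_type_census(docs, field):
    """Count the Python type names found under `field` across documents."""
    return Counter(type(d[field]).__name__ for d in docs if field in d)


docs = [
    {"version": 1.0, "title": "a"},
    {"version": "1.0.1", "title": "b"},
    {"title": "c"},  # documents without the field are ignored
]

census = field_type_census(docs, "version")
# Documents that the partial schema ("version" must be a string) would reject.
offenders = [d for d in docs if "version" in d and not isinstance(d["version"], str)]
print(dict(census), len(offenders))
```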
Tools
• Get some JSON Schema from JSON:
• http://www.jsonschema.net/
• Validate your schema
• http://jsonschemalint.com/
• https://github.com/dcoupal/godbtools.git
• Validate/edit JSON
• http://jsonlint.com/ or RoboMongo
• Import SQL into NoSQL
• Pentaho, Talend
31
Tools considerations
• NoSQL often relies on data being in RAM. Scanning all your data can turn the “hot” dataset in memory “cold”.
• Running incremental validations works better; make sure you have timestamps on insertions and updates.
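An incremental validation pass might look like the sketch below, which only rechecks documents touched since the previous run. The `updated_at` field name, the in-memory list standing in for a collection, and the `is_valid` callback are assumptions for the example; against a real database the filter would presumably be a query on an indexed timestamp field.

```python
from datetime import datetime, timezone


def validate_incrementally(docs, last_run, is_valid):
    """Check only documents touched after last_run; return (checked count, failing ids)."""
    touched = [d for d in docs if d["updated_at"] > last_run]
    failing = [d["_id"] for d in touched if not is_valid(d)]
    return len(touched), failing


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


docs = [
    {"_id": 1, "version": 1.0, "updated_at": utc(2015, 6, 1)},
    {"_id": 2, "version": "1.0.1", "updated_at": utc(2015, 6, 8)},
    {"_id": 3, "version": 2, "updated_at": utc(2015, 6, 9)},
]

checked, failing = validate_incrementally(
    docs, utc(2015, 6, 5), lambda d: isinstance(d["version"], str)
)
print(checked, failing)
```

Document 1 is also invalid but is not rescanned, since it was already covered by an earlier run; that is the trade that keeps the pass from touching the whole dataset.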
32
Document Validator
33
[Diagram: Schema (JSON Schema) and Collection (JSON) feed into the Validator]
“Eventual Integrity”
• NoSQL databases have eventual consistency
• With tools that validate and fix the data according to a set of rules, we get “eventual integrity”
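One way to picture “validate and fix according to a set of rules” is a list of (name, check, fix) triples applied to each document. The rule below, which coerces a numeric `version` to a string, is purely illustrative and not a rule from the talk; `repair` and `RULES` are hypothetical names.

```python
def repair(doc, rules):
    """Apply each rule's fix when its check fails; return (fixed copy, applied rule names)."""
    fixed = dict(doc)  # shallow copy, the input document is left untouched
    applied = []
    for name, is_valid, fix in rules:
        if not is_valid(fixed):
            fixed = fix(fixed)
            applied.append(name)
    return fixed, applied


# Illustrative rule set: each entry is (name, check, fix).
RULES = [
    (
        "version-is-string",
        lambda d: "version" not in d or isinstance(d["version"], str),
        lambda d: {**d, "version": str(d["version"])},
    ),
]

fixed, applied = repair({"_id": 1, "version": 1.0}, RULES)
print(fixed, applied)
```

Run periodically over recently changed documents, a pass like this moves the data toward the schema over time, which is the sense of “eventual integrity” here.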
34
Tools to be developed
• UI to manipulate a schema graphically
• More Complete Validators:
• constraints
• relationships
• Per language library to validate inserted/updated documents
35
Conclusion: Take Aways
• Design in this order: queries, indexes, data, application.
• Capture your model outside the application.
• Not having a schema is not a good thing! Use the attribute ‘schemaless’ wisely!
36
            NoSQL
Goal        current usage
Answer to   what questions do I have?
Step 1      write queries
Step 2      add indexes
Step 3      model data
Step 4      write application