Post on 01-Nov-2014
distilled
Boris Trofimov, Team Lead @ Sigma Ukraine
@b0ris_1
btrofimoff@gmail.com
Agenda
● Part 1. Why NoSQL
– SQL benefits and criticism
– NoSQL challenge
● Part 2. MongoDB
– Overview
– Console and query example
– Java Integration
– Data consistency
– Scaling
– Tips
Part 1. Why NoSQL
Relational DBMS Benefits
SQL
● Simplicity
● Uniform representation
● Runtime schema modifications
SELECT DISTINCT p.LastName, p.FirstName
FROM Person.Person AS p
JOIN HumanResources.Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
WHERE 5000.00 IN (SELECT Bonus FROM Sales.SalesPerson AS sp WHERE e.BusinessEntityID = sp.BusinessEntityID);
Strong schema definition
Strong consistency
SQL features like Foreign and Primary Keys, Unique fields
ACID (atomicity, consistency, isolation, durability) transactions
Business transactions ~ system transactions
RDBMS Criticism
Big gap between domain and relational model
Performance Issues
JOIN minimization
Choosing the right transaction strategy
Query optimization
Consistency costs too much
Normalization impact
Performance issues
Schema migration issues
Consistency issues
Reinventing the wheel
Involving external tools like DBDeploy
Scaling options
Consistency issues
Poor scaling options
SQL Opposition
● Object Databases by OMG
● ORM
● ?
No SQL Yes
● Transactionless in the usual sense
● Schemaless, no migration
● Closer to domain
● Focused on aggregates
● Truly scalable
NoSQL Umbrella
Key-Value Databases
Column-Family Databases
Document-oriented Databases
Graph-oriented Databases
Aggregate oriented Databases
● Document databases implement the idea of an aggregate-oriented database.
● An aggregate is the atom of storage
● Aggregate-oriented databases are closer to the application domain.
● Operations on a single aggregate are atomic
● An aggregate can be replicated or sharded efficiently
● Major question: to embed or not to embed
Relations vs Aggregates
// in customers
{"id":1,"name":"Medvedev","billingAddress":[{"city":"Moscow"}]}
// in orders
{"id":99,"customerId":1,"orderItems":[ { "productId":47, "price": 444.45, "productName": "iPhone 5" } ],"shippingAddress":[{"city":"Moscow"}],"orderPayment":[ { "ccinfo":"1000-1000-1000-1000", "txnId":"abelif879rft", "billingAddress": {"city": "Moscow"} } ]}
Relational Model Document Model
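One way to read the two sample documents: order items and addresses are embedded in the order aggregate, while the customer is only referenced through customerId. The sketch below uses plain in-memory JavaScript objects, with no MongoDB involved; `orderTotal` and `customerOf` are hypothetical helper names introduced for illustration.

```javascript
// Sample aggregates modeled on the slide (plain objects, no database).
const customers = [
  { id: 1, name: "Medvedev", billingAddress: [{ city: "Moscow" }] }
];
const orders = [
  {
    id: 99,
    customerId: 1,
    orderItems: [{ productId: 47, price: 444.45, productName: "iPhone 5" }],
    shippingAddress: [{ city: "Moscow" }]
  }
];

// Embedded data: everything needed is inside the aggregate, no second lookup.
function orderTotal(order) {
  return order.orderItems.reduce((sum, item) => sum + item.price, 0);
}

// Referenced data: the application resolves customerId with a second lookup.
function customerOf(order) {
  return customers.find((c) => c.id === order.customerId);
}

console.log(orderTotal(orders[0]));      // 444.45
console.log(customerOf(orders[0]).name); // Medvedev
```

The embedded items come back with the order in a single read; the referenced customer costs one more lookup, which is the trade-off behind "to embed or not to embed".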
Part 2. MongoDB
MongoDB Basics
MongoDB is a document-oriented DBMS
MongoDB is a client-server DBMS
MongoDB = Collections + Indexes
JSON/JavaScript is the main language for access
Collections
Simple creation (on first insert).
Two documents from the same collection might be completely different
[Diagram: a collection consists of a Name, Documents, and Indexes]
Document
{
  "fullName" : "Fedor Buhankin",
  "course" : 5,
  "univercity" : "ONPU",
  "faculty" : "IKS",
  "_id" : { "$oid" : "5071c043cc93742e0d0e9cc7" },
  "homeAddress" : "Ukraine, Odessa 23/34",
  "averageAssessment" : 5,
  "subjects" : [ "math", "literature", "drawing", "psychology" ]
}
Identifier (_id)
Body in JSON (internally BSON)
● Any part of the document can be indexed
● Max document size is 16 MB
● Major bricks: scalar value, map and list
MongoDB Console
Query Examples
// in customers
{"id":1,"name":"Medvedev","billingAddress":[{"city":"Moscow"}]}
// in orders
{"id":99,"customerId":1,"orderItems":[ { "productId":47, "price": 444.45, "productName": "iPhone 5" } ],"shippingAddress":[{"city":"Moscow"}],"orderPayment":[ { "ccinfo":"1000-1000-1000-1000", "txnId":"abelif879rft", "billingAddress": {"city": "Moscow"} } ]}
SELECT * FROM ORDERS;
db.orders.find()
Simple Select
SELECT * FROM ORDERS WHERE customerId = 1;
db.orders.find( {"customerId":1} )
Simple Condition
SELECT * FROM orders WHERE customerId > 1
db.orders.find({ "customerId" : { $gt: 1 } } );
Simple Comparison
SELECT * FROM orders WHERE customerId = 1 AND orderDate is not NULL
db.orders.find( { customerId:1, orderDate : { $exists : true } } );
AND Condition
SELECT * FROM orders WHERE customerId = 100 OR orderDate is not NULL
db.orders.find( { $or:[ {customerId:100}, {orderDate : { $exists : true }} ] } );
OR Condition
SELECT orderId, orderDate FROM orders WHERE customerId = 1
db.orders.find({customerId:1},{orderId:1,orderDate:1})
Select fields
SELECT * FROM Orders WHERE Orders.id IN (
SELECT id FROM orderItem WHERE productName LIKE '%iPhone%')
db.orders.find( {"orderItems.productName":/.*iPhone.*/} )
Inner select
SELECT * FROM orders WHERE orderDate is NULL
db.orders.find( { orderDate : { $exists : false } } );
NULL checks
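The query documents on the preceding slides all follow one pattern: a field name mapped either to a literal value or to an operator object such as { $gt: 1 }. The sketch below is a toy in-memory matcher in plain JavaScript that mimics this pattern for equality, $gt, $exists and $or on top-level fields only. `matches` and `find` are hypothetical helpers and the sample data is made up; this is not how MongoDB itself is implemented.

```javascript
// Toy matcher: NOT MongoDB's implementation, top-level fields only.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") {
      // $or takes a list of sub-queries; at least one must match.
      return cond.some((sub) => matches(doc, sub));
    }
    const value = doc[key];
    if (cond !== null && typeof cond === "object") {
      // Operator object, e.g. { $gt: 1 } or { $exists: true }.
      return Object.entries(cond).every(([op, arg]) => {
        if (op === "$gt") return value > arg;
        if (op === "$exists") return (key in doc) === arg;
        throw new Error("unsupported operator: " + op);
      });
    }
    return value === cond; // plain equality
  });
}

// Made-up sample data: one order without and one with an orderDate.
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 2, orderDate: "2014-11-01" }
];

// Stand-in for db.orders.find(query), returning only the matching ids.
const find = (query) => orders.filter((doc) => matches(doc, query)).map((o) => o.id);

console.log(find({ customerId: 1 }));                   // matches order 99
console.log(find({ customerId: { $gt: 1 } }));          // matches order 100
console.log(find({ orderDate: { $exists: false } }));   // matches order 99
console.log(find({ $or: [{ customerId: 1 }, { orderDate: { $exists: true } }] })); // matches both
```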
More examples
• db.orders.find().sort({ "id" : 1 }).skip(20).limit(10)
• db.orders.count({ "orderItems.price" : { $gt: 444 } })
• db.orders.find( { orderItems: { "productId":47, "price": 444.45, "productName": "iPhone 5" } } );
• db.orders.find()._addSpecial( "$comment" , "this is tagged query" )
Queries between collections
● Remember, MongoDB = no JOINs
● Approach 1: perform multiple queries (lazy loading)
● Approach 2: use the MapReduce framework
● Approach 3: use the Aggregation Framework
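The first approach (multiple queries) might look like the sketch below. It runs against in-memory arrays rather than a MongoDB server, and the data is made up. Strict lazy loading would issue the second query per order on first access; this variant batches the lookups into one $in-style query instead, which is an assumption of the sketch rather than something the slide prescribes.

```javascript
// In-memory stand-ins for two collections (no MongoDB involved).
const customers = [
  { id: 1, name: "Medvedev" },
  { id: 2, name: "Buhankin" }
];
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 2 },
  { id: 101, customerId: 1 }
];

// Query 1: fetch the orders (stands in for db.orders.find()).
const foundOrders = orders.slice();

// Query 2: fetch every referenced customer in one round trip
// (stands in for db.customers.find({ id: { $in: ids } })).
const ids = [...new Set(foundOrders.map((o) => o.customerId))];
const foundCustomers = customers.filter((c) => ids.includes(c.id));

// Stitch the two result sets together in application code.
const byId = new Map(foundCustomers.map((c) => [c.id, c]));
const joined = foundOrders.map((o) => ({
  orderId: o.id,
  customerName: byId.get(o.customerId).name
}));

console.log(joined);
```

The join happens in the application, so it costs two round trips regardless of how many orders are returned.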
Map Reduce Framework
● Is used to perform complex grouping over collection documents
● Can operate across multiple collections
● Uses the MapReduce pattern
● Uses the JavaScript language
● Supports sharded environments
● The result is similar to materialized views
Map Reduce Concept
[Diagram: map is launched for every input element a1 … an and produces b1 … bn; reduce is then launched and merges b1 … bn into a single result c.]
f_map : A → B
f_reduce : B[] → C
How it works
● Implement MAP function
● Implement REDUCE function
● Execute MAP func: mark each document with a specific color
● Execute REDUCE func: merge each colored set into a single element
[Diagram: Input → MAP → REDUCE → Output (Collection X)]
Count the orders for each customer
db.customers_orders.remove({});
mapUsers = function() {
    emit( this.customerId, {count: 1, customerId: this.customerId} );
};
reduce = function(key, values) {
    var result = {count: 0, customerId: key};
    values.forEach(function(value) { result.count += value.count; });
    return result;
};
// run over orders: customerId is a field of the order documents
db.orders.mapReduce(mapUsers, reduce, {"out": {"replace": "customers_orders"}});
Output: [ {count:123, customerId:1}, {count:33, customerId:2} ]
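To see the two phases without a server, the same count can be traced with a toy runner in plain JavaScript. `runMapReduce` is a hypothetical helper written for this sketch, and the sample orders are made up. Two simplifications are assumed: `emit` is passed to the map function as an argument (in the mongo shell it is a global), and reduce is called exactly once per key, whereas MongoDB may call it several times for the same key.

```javascript
// Toy map/reduce runner: groups emitted values by key, then reduces each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // MAP phase: `this` is bound to the current document, as in the shell.
  docs.forEach((doc) => map.call(doc, emit));
  // REDUCE phase: one call per key over all values emitted for that key.
  return [...groups].map(([key, values]) => reduce(key, values));
}

// Made-up sample data.
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 2 },
  { id: 101, customerId: 1 }
];

const mapOrders = function (emit) {
  emit(this.customerId, { count: 1, customerId: this.customerId });
};
const reduceCounts = function (key, values) {
  const result = { count: 0, customerId: key };
  values.forEach((value) => { result.count += value.count; });
  return result;
};

const output = runMapReduce(orders, mapOrders, reduceCounts);
console.log(output); // customer 1 has 2 orders, customer 2 has 1
```

Because MongoDB may re-run reduce on partial results, a reduce function should return a value of the same shape as the values it receives, as `reduceCounts` does here.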
Aggregation and Aggregation Framework
● Simplifies the most common MapReduce operations, like group by criteria
● Restriction on pipeline size is 16 MB
● Supports sharded environments (Aggregation Framework only)
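As an illustration of "group by criteria", the per-customer order count can be written as a one-stage pipeline. The pipeline document uses the real $group and $sum operators; the evaluator `runGroup` is a hypothetical toy that only understands this one shape (`_id: "$field"` plus a `count: { $sum: 1 }` accumulator) and runs on made-up in-memory data.

```javascript
// Pipeline as it could be passed to db.orders.aggregate(pipeline).
const pipeline = [
  { $group: { _id: "$customerId", count: { $sum: 1 } } }
];

// Toy evaluator for a single $group stage with a `count: { $sum: <number> }` accumulator.
function runGroup(docs, stage) {
  const spec = stage.$group;
  const field = spec._id.slice(1); // "$customerId" -> "customerId"
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field];
    counts.set(key, (counts.get(key) || 0) + spec.count.$sum);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

// Made-up sample data.
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 2 },
  { id: 101, customerId: 1 }
];

const result = runGroup(orders, pipeline[0]);
console.log(result); // customer 1 -> 2 orders, customer 2 -> 1 order
```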
Indexes
● Anything might be indexed
● Indexes improve performance
● Implementation uses B-trees
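To illustrate why an index helps, the sketch below looks a document up through a sorted list of (key, position) entries using binary search instead of scanning every document. The sorted array is a deliberate simplification standing in for a B-tree, keys are assumed to be unique, and `buildIndex` and `lookup` are hypothetical names.

```javascript
// Made-up sample data, stored in arbitrary order.
const customers = [
  { id: 3, name: "Kaktus" },
  { id: 1, name: "Medvedev" },
  { id: 2, name: "Buhankin" }
];

// Build the "index": (key, position) pairs sorted by key.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ key: doc[field], position }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search over the sorted entries: O(log n) comparisons instead of O(n).
function lookup(index, docs, key) {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key === key) return docs[index[mid].position];
    if (index[mid].key < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined; // no document with this key
}

const nameIndex = buildIndex(customers, "name");
console.log(lookup(nameIndex, customers, "Medvedev")); // the document with id 1
console.log(lookup(nameIndex, customers, "Nobody"));   // undefined
```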
Access via API
import com.mongodb.*;
import java.util.*;

// Connect to a local mongod with default settings
Mongo m = new Mongo();
// or: Mongo m = new Mongo("localhost");
// or: Mongo m = new Mongo("localhost", 27017);
// or, to connect to a replica set, supply a seed list of members:
// Mongo m = new Mongo(Arrays.asList(new ServerAddress("localhost", 27017), new ServerAddress("localhost", 27018), new ServerAddress("localhost", 27019)));
DB db = m.getDB("mydb");
DBCollection coll = db.getCollection("customers");

// Insert a customer with an embedded list of billing addresses
List<DBObject> list = new ArrayList<DBObject>();
list.add(new BasicDBObject("city", "Odessa"));
BasicDBObject doc = new BasicDBObject();
doc.put("name", "Kaktus");
doc.put("billingAddress", list);
coll.insert(doc);
Use Official MongoDB Java Driver (just include mongo.jar)
Closer to Domain model
● Morphia http://code.google.com/p/morphia/
● Spring Data for MongoDB http://www.springsource.org/spring-data/mongodb
Major features:
● Type-safe POJO centric model
● Annotation-based mapping behavior
● Good performance
● DAO templates
● Simple criteria
Example with Morphia
@Entity("Customers")
class Customer {
    @Id ObjectId id; // auto-generated, if not set (see ObjectId)
    @Indexed String name; // value types are automatically persisted
    List<Address> billingAddress; // by default fields are @Embedded
    Key<Customer> bestFriend; // reference to external document
    @Reference List<Customer> partners = new ArrayList<Customer>(); // refs are stored and loaded automatically
    // ... getters and setters

    // Lifecycle methods -- Pre/PostLoad, Pre/PostPersist...
    @PostLoad void postLoad(DBObject dbObj) { ... }
}

Morphia morphia = new Morphia();
morphia.map(Customer.class);
Datastore ds = morphia.createDatastore(new Mongo(), "tempDB");
Key<Customer> newCustomer = ds.save(new Customer("Kaktus",...));
Customer customer = ds.find(Customer.class).field("name").equal("Medvedev").get();
To embed or not to embed
● Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents.
● Embedded documents are good when you want the entire document and the size of the document is predictable. Embedded documents provide excellent performance.
Schema migration
● Schemaless
● Main focus is how the application will behave when a new field has been added
● Incremental migration technique (version field)
Use cases:
– removing fields
– renaming fields
– refactoring an aggregate
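The incremental migration technique can be sketched as follows: each document carries a version field, and the application upgrades old documents lazily when it reads them. The field name `schemaVersion`, the `upgrades` table and the `upgrade` function are assumptions made for this sketch; the step shown covers the "renaming fields" use case by renaming `fullName` to `name`.

```javascript
const CURRENT_VERSION = 2;

// One upgrade step per version. Step 1 -> 2 renames `fullName` to `name`.
const upgrades = {
  1: (doc) => {
    const { fullName, ...rest } = doc;
    return { ...rest, name: fullName, schemaVersion: 2 };
  }
};

// Upgrade a document at read time; documents without a version field
// are treated as version 1.
function upgrade(doc) {
  let current = { schemaVersion: 1, ...doc };
  while (current.schemaVersion < CURRENT_VERSION) {
    current = upgrades[current.schemaVersion](current);
  }
  return current;
}

const oldDoc = { _id: 1, fullName: "Fedor Buhankin" };          // written by the old code
const newDoc = { _id: 2, name: "Kaktus", schemaVersion: 2 };    // written by the new code

console.log(upgrade(oldDoc)); // now has `name` and schemaVersion 2, no `fullName`
console.log(upgrade(newDoc)); // already current, returned unchanged in content
```

Old and new documents can coexist in the same collection, so no bulk migration is required before the new application version is deployed.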
Data Consistency
● Transactional consistency
– domain design should take aggregate atomicity into account
● Replication consistency
– take the inconsistency window into account (sticky sessions)
● Eventual consistency
● Accept the CAP theorem
– it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability and partition tolerance.
Scaling
Scaling options
● Autosharding
● Master-Slave replication
● Replica Set clusterization
● Sharding + Replica Set
Sharding
● MongoDB supports autosharding
● Just specify shard key and pattern
● Sharding increases write throughput
● Major way of scaling the system
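The role of the shard key can be illustrated with a range-based router: each chunk owns a range of key values, and a document goes to the shard that owns the chunk containing its key. The chunk boundaries, the shard names and `routeByShardKey` are all made up for this sketch; in MongoDB the chunk table is maintained automatically.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" }
];

// Route a document to the shard whose chunk range contains its shard key value.
function routeByShardKey(doc, shardKey) {
  const value = doc[shardKey];
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  return chunk.shard;
}

console.log(routeByShardKey({ id: 99, customerId: 1 }, "customerId"));   // shard0
console.log(routeByShardKey({ id: 7, customerId: 150 }, "customerId"));  // shard1
console.log(routeByShardKey({ id: 8, customerId: 200 }, "customerId"));  // shard2
```

Writes for different key ranges land on different shards, which is why sharding scales write load.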
Master-Slave replication
● One master, many slaves
● Slaves might be hidden or can be used for reads
● Master-Slave replication increases read capacity and provides reliability
Replica Set clusterization
● The replica set automatically elects a primary (master)
● The primary propagates the same state to all replicas
● Limitation: at most 12 nodes
● WriteConcern option
● Benefits:
– Failover and reliability
– Distributing read load
– Maintenance without downtime
Sharding + ReplicaSet
● Allows building a huge, scalable database with failover
MongoDB Criticism
● Data loss reports on heavy-write configurations
● Atomic operations over multiple documents are not supported
When not to use
● Heavy cross-document atomic operations
● Queries against varying aggregate structure
Tips
● Do not use autoincrement ids
● Short field names are preferred
● By default DAO methods are async
● Think twice about collection design
● Use atomic modifications for a document
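The last tip refers to update modifiers such as $set and $inc, which describe a change to a single document instead of sending a whole replacement, so the server can apply the change atomically and the client avoids a read-modify-write cycle. The sketch below mimics the two modifiers on a plain in-memory object; `applyUpdate` is a hypothetical toy and the collection name `students` in the comment is an assumption.

```javascript
// Toy applier for two update operators; NOT MongoDB's implementation.
function applyUpdate(doc, update) {
  const result = { ...doc };
  // $set: assign the given value to each listed field.
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  // $inc: add the given delta to each listed field (missing fields start at 0).
  for (const [field, delta] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + delta;
  }
  return result;
}

const student = { fullName: "Fedor Buhankin", course: 5, averageAssessment: 5 };

// Shell equivalent (assumed collection name):
// db.students.update({ fullName: "Fedor Buhankin" }, update)
const update = { $inc: { course: 1 }, $set: { faculty: "IKS" } };

const updated = applyUpdate(student, update);
console.log(updated.course);  // 6
console.log(updated.faculty); // IKS
```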
Out of scope
● MapReduce options
● Indexes
● Capped collections
Further reading
http://www.mongodb.org
Kyle Banker, MongoDB in Action
Pramod J. Sadalage and Martin Fowler, NoSQL Distilled
Thank you!