Post on 02-Jul-2018
NoSQL Systems > Timeline
2003 Memcached
2006 Google BigTable
2007 Amazon Dynamo, HBase
2008 Cassandra, CouchDB
2009 Project Voldemort, Redis, Riak, MongoDB
30/05/2011 Sistemi NoSQL 2
NoSQL Systems > Memcached
• What is memcached?
  – Caching system intended to alleviate database load.
  – In-memory key-value store for small chunks of data.
• Extremely successful
  – Facebook, Yahoo, Wikipedia, eBay, Digg, …
Memcached > How does it work
Super simple!
v = memcachedClient.get(key);
if (v == NULL) {
    v = db.query( SOME SLOW QUERY );
    memcachedClient.set(key, v);
}
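The snippet above is the classic cache-aside pattern. Below is a minimal runnable Python sketch of the same control flow; `FakeMemcached` and `SlowDatabase` are hypothetical in-process stand-ins (a real client library would talk to a memcached server over the network).

```python
class FakeMemcached:
    """Hypothetical dict-backed stand-in for a memcached client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._store[key] = value


class SlowDatabase:
    """Hypothetical database; `calls` counts how often the slow query runs."""

    def __init__(self):
        self.calls = 0

    def query(self, key):
        self.calls += 1  # pretend this is SOME SLOW QUERY
        return f"row-for-{key}"


def cached_lookup(cache, db, key):
    v = cache.get(key)
    if v is None:  # miss: go to the database...
        v = db.query(key)
        cache.set(key, v)  # ...and populate the cache for the next reader
    return v


cache, db = FakeMemcached(), SlowDatabase()
first = cached_lookup(cache, db, "user:42")
second = cached_lookup(cache, db, "user:42")
print(first, second, db.calls)
```

Only the first lookup reaches the database; the second is served from the cache. Note that with this shape a query that legitimately returns `None` is never cached and hits the database every time.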
Key-value cache:
1. Keys are hashed.
2. The hash table spans an arbitrary number of servers.
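A minimal sketch of how a client can spread one logical hash table over several servers, assuming the simplest scheme (hash the key, take it modulo the server count). The server names are hypothetical, and the per-server dicts stand in for separate memcached processes.

```python
import hashlib

# Hypothetical server list; real deployments configure this in the client.
SERVERS = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]


def server_for(key, servers=SERVERS):
    """Pick a server by hashing the key (stable across processes, unlike hash())."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


# Each server holds only its own slice of the overall hash table.
shards = {s: {} for s in SERVERS}


def dist_set(key, value):
    shards[server_for(key)][key] = value


def dist_get(key):
    return shards[server_for(key)].get(key)


for i in range(10):
    dist_set(f"key{i}", i)
print({s: len(d) for s, d in shards.items()})
```

Servers never talk to each other here; the client alone decides where a key lives. Plain modulo hashing remaps most keys when a server is added or removed, which is why many clients use consistent hashing instead.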
NoSQL Systems > Google BigTable
• BigTable is a distributed storage system for managing structured data that is designed to scale to a very large size.
• Petabytes of data across thousands of commodity servers.
• Built on top of the Google File System (GFS).
Google BigTable > Data Model
Row Id | ColumnFamily1 | ColumnFamily2 | … | ColumnFamilyN
rowid1 | qualifier1 = “abc”, qualifier2 = “def”, qualifier3 = “123”, … | qualifier1 = “xyz”, qualifier5 = “fgh” | … | …
rowid2 | … | | |
…
• Column families are the only things defined in the schema.
• Qualifiers are added dynamically.
• Simple queries:
  – Get a row by key
  – Get a range of rows by (start key, end key)
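The data model and the two query types can be sketched as a sorted map from row key to column family to qualifier to value. This is a toy under stated assumptions: the class name is hypothetical, range scans are taken as half-open [start key, end key), and BigTable's per-cell timestamps are left out.

```python
import bisect


class MiniTable:
    """Toy BigTable-style table: row key -> column family -> qualifier -> value.
    Column families are fixed when the table is created; qualifiers appear on write.
    (Real BigTable cells are also versioned by timestamp; omitted here.)"""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}
        self.sorted_keys = []  # row keys kept in lexicographic order

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            bisect.insort(self.sorted_keys, row)
            self.rows[row] = {f: {} for f in self.families}
        self.rows[row][family][qualifier] = value  # qualifier created on demand

    def get(self, row):
        """Get a row by key (None if absent)."""
        return self.rows.get(row)

    def scan(self, start_key, end_key):
        """Get rows with start_key <= key < end_key, in key order."""
        lo = bisect.bisect_left(self.sorted_keys, start_key)
        hi = bisect.bisect_left(self.sorted_keys, end_key)
        return [(k, self.rows[k]) for k in self.sorted_keys[lo:hi]]


table = MiniTable(["ColumnFamily1", "ColumnFamily2"])
table.put("rowid1", "ColumnFamily1", "qualifier1", "abc")
table.put("rowid1", "ColumnFamily2", "qualifier5", "fgh")
table.put("rowid2", "ColumnFamily1", "qualifier2", "def")
print([k for k, _ in table.scan("rowid1", "rowid2")])
```

Keeping rows sorted by key is what makes range scans cheap, and it is also what lets the table be split into contiguous key ranges later.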
Google BigTable > Data Model > Example
• Student–Course relationship
  – 1 student → many courses
  – 1 course → many students
Student: id (PK), name, email, birthdate
Course: id (PK), title, description, teacher_id
Student2Course: student_id, course_id
Google BigTable > Data Model > Example
• De-normalized data
• Single key-space
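One way to picture the de-normalized, single key-space version of the Student/Course schema is sketched below. The row-key prefixes (`student:`, `course:`), the family names (`info`, `courses`, `students`) and all sample values are hypothetical choices for illustration.

```python
# Normalized rows, as in the relational schema above (sample values are made up).
students = {1: {"name": "Alice", "email": "alice@example.org", "birthdate": "1990-01-01"}}
courses = {10: {"title": "Databases", "description": "Intro to DBs", "teacher_id": 7}}
student2course = [(1, 10)]


def denormalize(students, courses, enrollments):
    """Fold three tables into one key-space of BigTable-style rows."""
    table = {}
    for sid, s in students.items():
        table[f"student:{sid}"] = {"info": dict(s), "courses": {}}
    for cid, c in courses.items():
        table[f"course:{cid}"] = {"info": dict(c), "students": {}}
    for sid, cid in enrollments:
        # The many-to-many link is stored twice, once on each side, so either
        # lookup is a single row read instead of a join.
        table[f"student:{sid}"]["courses"][str(cid)] = courses[cid]["title"]
        table[f"course:{cid}"]["students"][str(sid)] = students[sid]["name"]
    return table


rows = denormalize(students, courses, student2course)
print(rows["student:1"]["courses"])
```

The price of avoiding joins is duplication: renaming a course now means updating every student row that copied its title.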
Google BigTable > Infrastructure
• Partition model: sharding on the row key
  – Data is divided into tablets.
  – Each tablet is defined by the range of row keys it is responsible for (start key – end key).
  – Each tablet is served by one tablet server at a time.
  – Each tablet server may serve (i.e., hold the lock for) many tablets.
• A distributed lock service called Chubby
  – Manages the lifecycle of tablet servers
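Range partitioning means that finding the tablet for a row key is a search over sorted tablet start keys: the key belongs to the last tablet whose start key is less than or equal to it. A minimal sketch, assuming a hypothetical three-tablet layout and server names; Chubby and locking are not modeled.

```python
import bisect

# Hypothetical layout: tablet i covers [TABLET_STARTS[i], TABLET_STARTS[i + 1]).
# The first tablet starts at "" so every row key belongs to some tablet.
TABLET_STARTS = ["", "g", "p"]
# One serving tablet server per tablet; a server ("ts-1") may hold several tablets.
TABLET_SERVER = {"": "ts-1", "g": "ts-2", "p": "ts-1"}


def tablet_for(row_key):
    """Start key of the tablet whose range contains row_key."""
    i = bisect.bisect_right(TABLET_STARTS, row_key) - 1
    return TABLET_STARTS[i]


def tablet_server_for(row_key):
    return TABLET_SERVER[tablet_for(row_key)]


for key in ["apple", "house", "zebra"]:
    print(key, "->", tablet_for(key), tablet_server_for(key))
```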
Google BigTable > Infrastructure
• Three-level hierarchy to store tablet locations
  – Analogous to a B+ tree
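A rough sketch of the three levels, following the BigTable paper: a Chubby file points to the root tablet, the root tablet indexes the METADATA tablets, and each METADATA tablet records where user tablets are served. All tablet names, key ranges and servers below are hypothetical, and real METADATA row keys encode the table id and end row, which this toy omits.

```python
import bisect


def lookup(index, key):
    """index: list of (start_key, target) sorted by start_key; return the
    target whose range contains key."""
    starts = [start for start, _ in index]
    return index[bisect.bisect_right(starts, key) - 1][1]


# Level 1: a (hypothetical) Chubby file holding the root tablet's location.
chubby_root_location = "root"

tablets = {
    # Level 2: the root tablet maps key ranges to METADATA tablets.
    "root": [("", "meta-1"), ("m", "meta-2")],
    # Level 3: METADATA tablets map key ranges to the servers of user tablets.
    "meta-1": [("", "ts-1"), ("f", "ts-2")],
    "meta-2": [("m", "ts-3"), ("t", "ts-1")],
}


def locate(row_key):
    meta_tablet = lookup(tablets[chubby_root_location], row_key)
    return lookup(tablets[meta_tablet], row_key)


print(locate("apple"), locate("hello"), locate("zoo"))
```

As in a B+ tree, each level narrows the key range; in BigTable, clients also cache tablet locations, so most requests skip the walk entirely.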
[Figure: Master Server and Tablet Servers]
Google BigTable > Infrastructure
[Figure: a Client, the Master Server and several Tablet Servers, with request and response arrows]
• Strong consistency
  – Only one tablet server is responsible for a given piece of data.
  – Replication is handled by the GFS layer.
• Trade-off with availability
  – If a tablet server fails, its portion of the data is temporarily unavailable until a new server is assigned.
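The consistency/availability trade-off can be sketched with a toy model in which each tablet has exactly one serving server and the data itself lives in shared storage (standing in for GFS). Class and method names are hypothetical; failure detection and the master's reassignment logic are reduced to two method calls.

```python
class TabletService:
    """Toy model (hypothetical names): every tablet has exactly one serving
    tablet server, and tablet data lives in shared storage standing in for GFS."""

    def __init__(self, assignment):
        self.assignment = dict(assignment)  # tablet -> tablet server
        self.alive = set(self.assignment.values())
        self.storage = {t: {} for t in self.assignment}  # survives server crashes

    def _owner(self, tablet):
        server = self.assignment[tablet]
        if server not in self.alive:
            # No other server may serve this tablet, so requests fail
            # instead of risking inconsistent answers.
            raise RuntimeError(f"{tablet} unavailable: {server} is down")
        return server

    def write(self, tablet, key, value):
        self._owner(tablet)
        self.storage[tablet][key] = value

    def read(self, tablet, key):
        self._owner(tablet)
        return self.storage[tablet].get(key)

    def fail(self, server):
        self.alive.discard(server)

    def reassign(self, tablet, new_server):
        """Stands in for the master assigning the tablet to a live server."""
        self.alive.add(new_server)
        self.assignment[tablet] = new_server


svc = TabletService({"t1": "ts-1", "t2": "ts-2"})
svc.write("t1", "row", "v")
svc.fail("ts-1")  # t1 is now unavailable; t2 is unaffected
svc.reassign("t1", "ts-3")
print(svc.read("t1", "row"))
```

Because a single server owns each tablet, readers never see two diverging copies; the cost is the window between `fail` and `reassign`, during which that tablet cannot be read or written.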
NoSQL Systems > Amazon Dynamo
“An extra tenth of a second in response time will cost us 1% in sales” – Amazon
• Dynamo: Highly available key‐value store
• Challenge: reliability at massive scale
  – Tens of millions of customers.
  – Tens of thousands of servers.
Amazon Dynamo > Data Model
• Binary objects (i.e., blobs) identified by unique keys
• Query model:
  – Simple read and write operations on data items identified by primary key
  – No operation spans multiple data items
Amazon Dynamo > Infrastructure
• Partitioning similar to P2P systems (Chord, Pastry, etc.)
  – Keys are hashed.
  – The range of the hash function is treated as a circular space (ring).
  – Each node is responsible for a region of the ring.
  – Distributed Hash Table (DHT)
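A minimal consistent-hashing sketch of the ring described above. Assumptions: node positions come from hashing hypothetical node names with MD5 (Dynamo also uses MD5 for keys), and Dynamo's virtual nodes are left out. A key is owned by the first node whose position is larger than the key's, wrapping around the ring.

```python
import bisect
import hashlib


def ring_position(name):
    """Hash a key or node name onto the ring (128-bit MD5 output space)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes):
        # Nodes sit on the ring at the hash of their (hypothetical) names.
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.points = [p for p, _ in self.ring]

    def coordinator(self, key):
        """First node clockwise from the key's position, wrapping at the end."""
        i = bisect.bisect_right(self.points, ring_position(key)) % len(self.ring)
        return self.ring[i][1]


ring = HashRing(["node-A", "node-B", "node-C", "node-D"])
print(ring.coordinator("cart:alice"))
```

Unlike modulo hashing, removing a node only moves the keys that node owned; every other key keeps its coordinator.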
[Figure: hash ring showing key A and nodes labelled N=1, N=2, N=3]
NoSQL Systems > Amazon Dynamo
[Figure: a key hashing to “AE107FB…” on the ring]
• Each node is responsible for the region of the ring between itself and its N-th predecessor.
• N is configured per Dynamo instance.
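The replication rule can be checked on a toy ring: a key is stored on its coordinator plus the next N − 1 nodes clockwise, so each node ends up holding exactly the keys between itself and its N-th predecessor. The ring size and node positions below are hypothetical small integers chosen to keep the arithmetic visible.

```python
import bisect

RING_SIZE = 100  # toy ring; Dynamo's ring is the 128-bit MD5 output space
NODES = {"A": 10, "B": 40, "C": 70, "D": 90}  # hypothetical node positions


def preference_list(key_pos, n):
    """Coordinator (first node with position > key_pos, wrapping) plus the
    next n - 1 nodes clockwise: the N hosts that store this key."""
    ring = sorted((pos, name) for name, pos in NODES.items())
    points = [p for p, _ in ring]
    start = bisect.bisect_right(points, key_pos % RING_SIZE)
    return [ring[(start + j) % len(ring)][1] for j in range(min(n, len(ring)))]


# With N = 3, node A ends up storing every key between itself and its
# 3rd predecessor, B (positions 40..99 and 0..9).
stored_on_a = [k for k in range(RING_SIZE) if "A" in preference_list(k, 3)]
print(preference_list(50, 3), len(stored_on_a))
```

With N = 3 and four nodes, each key lives on three distinct hosts, and each node covers 70 of the 100 toy positions.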
NoSQL Systems > Amazon Dynamo
• Replication
  – Each data item is replicated at N hosts.
• Eventual consistency
  – Updates are propagated to replicas asynchronously.
  – The system eventually reaches a consistent state.
• Trade-off between consistency and availability
  – The number of replicas is crucial.
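A toy model of asynchronous propagation: a write lands on one replica immediately and is queued for the others, so reads from other replicas can be stale until the queue drains. All names are hypothetical, and conflict handling is deliberately left out (Dynamo itself tracks versions with vector clocks and can return conflicting versions to the client).

```python
from collections import deque


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy model: each write is applied to one replica at once and queued
    for the rest; propagate() stands in for background replication."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = deque()

    def put(self, key, value, via=0):
        self.replicas[via].data[key] = value  # acknowledged right away
        for i, r in enumerate(self.replicas):
            if i != via:
                self.pending.append((r, key, value))

    def get(self, key, via):
        return self.replicas[via].data.get(key)  # may be stale

    def propagate(self):
        while self.pending:
            r, key, value = self.pending.popleft()
            r.data[key] = value

    def converged(self, key):
        return len({repr(r.data.get(key)) for r in self.replicas}) == 1


store = EventuallyConsistentStore(["a", "b", "c"])
store.put("cart", ["book"], via=0)
print(store.get("cart", 1), store.converged("cart"))  # stale read
store.propagate()
print(store.get("cart", 1), store.converged("cart"))
```

The write is available immediately even if other replicas are slow or unreachable; the cost is the stale-read window before `propagate()` runs.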
Case Study > Facebook Messages
• Real-time, reliable messaging system that combines chat, messages and emails
• 135+ billion messages per month
• Two main usage patterns:
  – A small set of temporal data that tends to be volatile
  – An ever-growing set of data that rarely gets accessed
• Candidate systems:
  – MySQL
  – Apache Cassandra
  – Apache HBase
Facebook Messages > MySQL
• Attractive choice:
  + Facebook's core infrastructure is MySQL-based (it is essentially a giant LAMP application)
  + The Facebook team has extensive experience running and managing MySQL
• But…
  – MySQL clustering is hard to maintain (and scale)
  – MySQL performance suffers with large indexes and data sets
Facebook Messages > Apache HBase
• BigTable's open-source clone
  – Extensible record store
  – Strong consistency
• Availability trade-off
• Part of the Hadoop ecosystem
  – Built on top of HDFS
  – Integrates with Hive, ZooKeeper, etc.
Facebook Messages > Apache Cassandra
• Marriage between BigTable and Dynamo
  – Data model: extensible record store (BigTable)
  – Infrastructure: Distributed Hash Table (Dynamo)
• Eventual consistency
• High availability
• Developed by Facebook itself
  – To power the (old) Inbox Search feature
Facebook Messages > Evaluation results
• MySQL was soon discarded
• HBase vs Cassandra
System | Data model | Consistency model | Availability
HBase | Extensible record store | Strong consistency | Replicas managed by HDFS; region servers are single points of failure
Cassandra | Extensible record store | Eventual consistency | No single point of failure
• HBase won
  – Strong consistency is a better match for real-time systems
NoSQL Systems > Overview
• We have seen:
  – Extensible record stores
    • BigTable, HBase, Cassandra
  – Key-value stores
    • Dynamo
• There's more to it!
  – Document stores
NoSQL Systems > Document stores
• Systems that store collections of documents
• What is a document?
  – Generally, an object with a number of fields, whose values can be scalars, lists, or nested documents as well
  – e.g., XML, JSON
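A small illustration of the definition, using Python's standard `json` module: one hypothetical article document whose fields hold a scalar, a list and a nested document, plus a helper that classifies each top-level field.

```python
import json

# A hypothetical article document: scalar, list and nested-document field values.
doc = {
    "_id": "article-123",
    "title": "NoSQL at the Guardian",
    "published": "2011-05-30",
    "tags": ["databases", "nosql"],
    "author": {"name": "Jane Doe", "email": "jane@example.org"},
}


def field_kinds(document):
    """Classify each top-level field as scalar, list or nested document."""
    kinds = {}
    for key, value in document.items():
        if isinstance(value, dict):
            kinds[key] = "document"
        elif isinstance(value, list):
            kinds[key] = "list"
        else:
            kinds[key] = "scalar"
    return kinds


encoded = json.dumps(doc)  # the document round-trips as a single JSON string
print(field_kinds(json.loads(encoded)))
```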
Guardian.co.uk > 2005‐09
Modern Java application:
  – Strong model in Java
  – Oracle RDBMS
  – Database abstracted with ORM
Problems: increasing complexity
  – Complex Hibernate binding (10,000+ lines of XML config)
  – Lots of optimisations
  – Complex caching strategy
  – Load becoming an issue
  – …
Guardian.co.uk > 2009‐10
• Introduce yet more caching (Memcached)
• Decouple applications from the DB by building APIs
  – Power APIs using scalable technologies (Apache Solr)
  – JSON results
[Figure: DB load]
Guardian.co.uk > 2009‐10
Three models now (a headache):
  – RDBMS tables
  – Java objects
  – JSON API
The JSON model is very simple:
  – Multiple domain objects expressed in a single doc
  – Can be designed in a forward-extensible way
Guardian.co.uk > 2009‐10
[Figure: Article and Tags]
What if the JSON API were the primary model?
  – CouchDB
  – MongoDB
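A sketch of what "the JSON API as the primary model" can look like: the article and its tags live in one document, and readers touch only the fields they know, so newer writers can add fields without breaking them. The document shape, ids and field names are hypothetical, not the Guardian's actual API format.

```python
import json

# Hypothetical article document: the article and its tags live in one JSON doc,
# instead of being spread across RDBMS tables, Java objects and an API format.
article_json = """
{
  "id": "world/2011/may/30/example",
  "headline": "Example headline",
  "tags": [
    {"id": "technology/nosql", "type": "keyword"},
    {"id": "profile/jane-doe", "type": "contributor"}
  ]
}
"""


def tag_ids(doc):
    return [t["id"] for t in doc.get("tags", [])]


def render_summary(doc):
    # Read only the fields we know; fields added later are simply ignored,
    # which is what keeps the format forward-extensible.
    return f'{doc["headline"]} [{", ".join(tag_ids(doc))}]'


article = json.loads(article_json)
article["standfirst"] = "A field added by a newer writer"  # old readers still work
print(render_summary(article))
```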
NoSQL Systems > MongoDB vs CouchDB
Feature | CouchDB | MongoDB
Data model | Collections of JSON docs | Collections of BSON docs
Queries | Low-level query language | Rich, declarative query language
Consistency model | Eventual consistency | Strong consistency (tunable, though)
Replication | Master-master | Master-slave
Scalability | Through replication | Sharding
• MongoDB was chosen:
  – Can easily express complex queries
  – Good if you come from an RDBMS
  – No need for extreme scalability (where CouchDB shines)
NoSQL Systems > Links and References
• Rick Cattell – Scalable SQL and NoSQL Data Stores
• R. Cattell, M. Stonebraker – Ten Rules for Scalable Performance in “Simple Operation” Datastores
• M. Stonebraker – SQL vs NoSQL Databases
• A. Popescu – MyNoSQL blog
• Chang et al. – Google BigTable
• DeCandia et al. – Amazon Dynamo
• We have encountered:
  – Cassandra – cassandra.apache.org
  – HBase – hbase.apache.org
  – CouchDB – couchdb.apache.org
  – MongoDB – http://www.mongodb.org
  – Memcached – memcached.org