Date post: | 18-Jan-2015 |
Category: |
Technology |
Upload: | sergio-bossa |
Scale Your Database And Be Happy
Sergio Bossa
@sbtourist
Spring Framework Italian Meeting 2009
Sergio Bossa - http://www.linkedin.com/in/sergiob
About Me
➔ Software architect and engineer
  ➔ Gioco Digitale (online gambling and casinos)
➔ Open Source enthusiast
  ➔ Terracotta Messaging (http://forge.terracotta.org)
  ➔ Actorom (http://code.google.com/p/actorom/)
  ➔ Terrastore (coming soon…)
➔ (Micro-)Blogger
  ➔ http://twitter.com/sbtourist
  ➔ http://sbtourist.blogspot.com
Premise #1
Database ≠ Relational Database
Premise #2
Relational Databases Are Not Dead
Premise #3
You'll never hear the word NoSQL here
Scaling Your Database … what?
● Scaling used as a loose term here.
● Scale to handle heterogeneous data.
● Scale to handle more data.
● Scale to handle more load.
● Scale to handle topology changes due to:
  ● Unplanned growth.
  ● Unpredictable failures.
Scaling Your Database … why?
● Scaling the way you handle your data is going to be more and more important.
  ● Business is moving toward data-centric applications.
  ● Let's call them “social”.
● Interest is toward efficient ways of:
  ● Storing …
  ● Serving …
  ● Analyzing …
  ● Data!
Scaling Your Relational Database
Replication
● Master-slave replication.
  ● One (and only one) master database.
  ● One or more slaves.
  ● All writes go to the master.
    ● Replicated to slaves.
  ● Reads are balanced among master and slaves.
● Major issues:
  ● Single point of failure.
  ● Single bottleneck (every write goes through the master).
  ● Static topology.
Replication
● Master-master replication.
  ● Two or more masters.
  ● Writes and reads can go to any master node.
  ● Writes are replicated among masters.
● Major issues:
  ● Limited performance and scalability (due to quorum).
  ● Complexity.
  ● Static topology.
Partitioning
● Vertical partitioning.
  ● Put tables belonging to different functional areas on different database nodes.
  ● Scale your data and load by function.
  ● Move joins to the application level.
● Major issues:
  ● No longer truly relational.
  ● Limited scalability (what if a functional area grows too much?).
Partitioning
● Horizontal partitioning.
  ● Split tables by key and put partitions (shards) on different nodes.
  ● Scale your data and load by key.
  ● Move joins to the application level.
  ● Needs some kind of routing.
● Major issues:
  ● No longer truly relational.
  ● Limited scalability (what if you need to rebalance?).
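The routing and rebalancing points can be made concrete with a minimal sketch. It assumes hash-modulo placement and a hypothetical `shard_for` helper; the slides do not name a routing scheme, so this is only one possible choice.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard index with a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Illustrative keys only; not taken from the slides.
keys = [f"user:{i}" for i in range(1000)]
before = {k: shard_for(k, 4) for k in keys}   # 4 shards
after = {k: shard_for(k, 5) for k in keys}    # one shard added
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys change shard when going from 4 to 5 shards")
```

With modulo placement, adding a single shard relocates most keys, which is the rebalancing pain the slide hints at.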
Caching
● Put a cache in front of your database.
  ● Distribute it.
  ● Write-through for scaling reads.
  ● Write-behind for scaling reads and writes.
● Saves you a lot of pain, but ...
  ● “Only” scales read/write load.
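A minimal sketch of the two write policies, assuming a plain `dict` stands in for the database and that the class names `WriteThroughCache` and `WriteBehindCache` are hypothetical. It is single-process and ignores eviction, distribution and failure handling.

```python
class WriteThroughCache:
    """Every write goes to the database synchronously, then to the cache."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}

    def put(self, key, value):
        self.db[key] = value       # the database still absorbs every write
        self.cache[key] = value

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]   # cache miss: load from the database
        return self.cache[key]


class WriteBehindCache(WriteThroughCache):
    """Writes hit the cache only; the database is updated later, in batches."""

    def __init__(self, db: dict):
        super().__init__(db)
        self.dirty = set()         # keys written but not yet persisted

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

Write-through only takes reads off the database; write-behind also takes writes off it, at the price of a window in which unflushed data exists only in the cache.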
Still left out ...
● We didn't scale our data model.
  ● Still bound to the relational data model.
● We didn't scale our topology.
  ● Still static.
  ● Hard to add nodes for handling growth.
  ● Hard to tolerate nodes leaving due to failures.
Non Relational Databases, coming...
Friends or Foes?
We come in peace.
To help our old friend: the relational database.
Requirements
● Flexible data model.
● Extreme reliability.
● Scale as you need.
  ● Scale with unplanned changes in the data model.
  ● Scale with unplanned growth in data size.
  ● Scale with unplanned growth in load.
Data Model
● Column oriented (hybrid).
  ● Group by columns.
  ● Hybrid: group by keys and column families.
● Dynamically add columns.
  ● Different key-identified values may have different numbers of columns.
● Efficiently access the same group of columns (column family).
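As a rough illustration of the hybrid model, the sketch below uses nested dictionaries (row key, then column family, then column). The row keys, family names and values are invented for the example, and `read_family` is a hypothetical helper, not the API of any product.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

table["user:1"]["profile"]["name"] = "Ada"
table["user:1"]["profile"]["city"] = "London"
table["user:2"]["profile"]["name"] = "Linus"   # this row has no "city" column
table["user:1"]["stats"]["logins"] = 42        # a column added on the fly

def read_family(row_key: str, family: str) -> dict:
    """Fetch all columns of one family for one row in a single lookup."""
    return dict(table[row_key][family])

print(read_family("user:1", "profile"))
print(read_family("user:2", "profile"))
```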
Data Model
● Document oriented.
  ● Group by named collections.
  ● Identify by key.
  ● Store a schema-less document.
    ● JSON.
    ● XML.
    ● Whatever ...
● Dynamically update your data model by simply changing your documents.
● Efficiently access whole documents.
Data Model
● Key/Value oriented.
  ● Group by named collections.
  ● Identify by key.
  ● Store an opaque value (whatever).
● Maybe the ancestor of modern non-relational databases.
Data Partitioning
● Consistent Hashing.
  ● Nodes mapped on a ring space of integers.
    ● Each node mapped on multiple locations.
    ● Each node owns a range of integers.
  ● Keys assigned to integers in the ring space.
    ● Stored on the owner node.
  ● Joining/leaving nodes only affect the partition they're mapped to.
    ● Hence, key rebalancing is limited to that specific range (efficient).
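The ring described above can be sketched in a few lines. This is a simplified illustration, not any product's implementation: the `HashRing` class, the MD5 hash and the default of 64 ring locations per node are all assumptions made for the example.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, replicas=64):
        self.replicas = replicas   # ring locations per node (assumed value)
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        """The owner is the first ring position after the key's hash, wrapping around."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to node-d")
```

Only keys that now fall in the new node's ranges move; every other key stays where it was.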
Data Consistency
● Strict (ACID) Consistency.
  ● All nodes ...
  ● At every point in time ...
  ● Hold a consistent view of the stored data.
● Reads and writes can be executed on every node.
  ● Results will always be consistent and up-to-date.
● Due to the CAP Theorem you will sacrifice one of:
  ● Availability.
  ● Partition tolerance.
Data Consistency
● Eventual (BASE) Consistency.
  ● N: number of nodes you want to replicate to.
  ● W: number of nodes that must acknowledge a write for it to succeed.
  ● R: number of nodes that must answer a read for it to succeed.
● W < N
  ● Nodes not receiving the write may get that value later.
● R < N
  ● Nodes not holding the read value are ignored.
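A toy model of the N/W/R scheme, assuming in-memory replicas and explicit version numbers supplied by the caller. `Replica`, `quorum_write` and `quorum_read` are hypothetical names, and a real store would also repair the stale replica.

```python
class Replica:
    def __init__(self):
        self.up = True
        self.store = {}   # key -> (version, value)

def quorum_write(replicas, w, key, version, value):
    """Write to every reachable replica; succeed if at least W acknowledged."""
    acks = 0
    for rep in replicas:
        if rep.up:
            rep.store[key] = (version, value)
            acks += 1
    return acks >= w

def quorum_read(replicas, r, key):
    """Need answers from at least R replicas; return the newest value seen."""
    answers = [rep.store[key] for rep in replicas if rep.up and key in rep.store]
    if len(answers) < r:
        raise RuntimeError("fewer than R replicas answered")
    version, value = max(answers, key=lambda answer: answer[0])
    return value

replicas = [Replica() for _ in range(3)]                    # N = 3
replicas[2].up = False                                      # one replica misses the write
write_ok = quorum_write(replicas, 2, "cart", 1, ["book"])   # W = 2
replicas[2].up = True                                       # back online, but stale
latest = quorum_read(replicas, 2, "cart")                   # R = 2
print(write_ok, latest)
```

Choosing R + W > N makes every read quorum overlap every write quorum, which is the usual way to trade some availability back for fresher reads.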
Data Consistency
● Eventual (BASE) Consistency.
  ● High read/write availability.
    ● Works even when some nodes fail to read and write values.
  ● Partition tolerance.
    ● Works even when some nodes cannot be reached anymore.
● Due to the CAP Theorem you are sacrificing consistency.
Data Versioning
● Vector Clocks.
  ● List of (node, counter) values associated with each object version.
  ● Every time a given object is read by a node, all its vector clocks are transferred.
  ● Every time a given object is written back by a node, the counter for that node is incremented.
● A vector clock can express causal ordering.
● A vector clock can express branching.
● Read-time reconciliation (read repair).
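A small sketch of how vector clocks express ordering and branching, assuming clocks are plain dictionaries from node name to counter. The helper names (`increment`, `descends`, `compare`, `merge`) are chosen for the example.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter bumped (a write by node)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    if a == b:
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"   # neither descends from the other: a branch

def merge(a: dict, b: dict) -> dict:
    """Clock for a reconciled version: element-wise maximum of both clocks."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}

v1 = increment({}, "A")    # written by node A
v2 = increment(v1, "B")    # B updates v1
v3 = increment(v1, "C")    # C updates v1 independently of B
print(compare(v2, v1), compare(v2, v3))
```

Here `v2` causally follows `v1`, while `v2` and `v3` are concurrent branches that a reader has to reconcile, for example into a version stamped with `merge(v2, v3)`.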
Data Versioning
● Other...
  ● Multi-Version Concurrency Control.
    ● Each read/write operation works on a consistent snapshot.
  ● Optimistic concurrency.
    ● Write operations succeed only if their version is the current one.
  ● Last Wins (optionally with timestamps).
    ● Last write operation wins.
    ● Optionally, with the highest timestamp.
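The optimistic concurrency rule can be sketched as a compare-and-set on a version number. `VersionedStore` and `VersionConflict` are hypothetical names, and the store is a single in-memory dictionary.

```python
class VersionConflict(Exception):
    pass

class VersionedStore:
    def __init__(self):
        self._data = {}   # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key is reported as version 0."""
        return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Write only if the caller saw the current version."""
        current_version, _ = self.get(key)
        if expected_version != current_version:
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1

store = VersionedStore()
version, _ = store.get("profile")                   # two clients both read version 0
store.put("profile", "from client A", version)      # first writer succeeds
try:
    store.put("profile", "from client B", version)  # second writer holds a stale version
    conflict = False
except VersionConflict:
    conflict = True
print(conflict, store.get("profile"))
```

A last-write-wins policy would skip the version check and let the second write silently replace the first.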
Data Recovery
● Hinted Handoff.
  ● Writes to unavailable nodes get directed to “secondary” nodes.
  ● Secondary nodes get a hint about the original destination node.
  ● When the node is available again, the secondary node sends back the value.
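A minimal sketch of the hand-off flow, assuming two in-memory nodes and that hinted writes are kept in a separate list on the secondary. `Node`, `write` and `hand_off` are illustrative names only.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []   # (intended node name, key, value) held for other nodes

def write(key, value, primary: Node, secondary: Node) -> None:
    """Write to the primary, or park the write on the secondary with a hint."""
    if primary.up:
        primary.data[key] = value
    else:
        secondary.hints.append((primary.name, key, value))

def hand_off(secondary: Node, recovered: Node) -> None:
    """Replay hinted writes to a node that is reachable again."""
    remaining = []
    for target, key, value in secondary.hints:
        if target == recovered.name and recovered.up:
            recovered.data[key] = value
        else:
            remaining.append((target, key, value))
    secondary.hints = remaining

primary, secondary = Node("n1"), Node("n2")
primary.up = False
write("k", "v", primary, secondary)   # n1 is down: n2 keeps the write plus a hint
primary.up = True
hand_off(secondary, primary)          # n1 is back: n2 sends the value over
print(primary.data, secondary.hints)
```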
Data Recovery
● Merkle Trees.
  ● For nodes missing a large number of values (e.g. after disaster recovery).
  ● Nodes exchange a tree composed of:
    ● Leaves, each containing the hash of a value hosted by the node.
    ● Parents, each containing the hash of their children.
  ● Updated values are recovered by comparing hashes and reading back from healthy nodes.
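The comparison step can be sketched as follows. The sketch assumes both nodes hold the same keys in the same order and that the number of values is a power of two; `build_levels` and `diff_leaves` are hypothetical helpers, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib

def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(values):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [_hash(v.encode("utf-8")) for v in values]
    levels = [level]
    while len(level) > 1:
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def diff_leaves(a_levels, b_levels):
    """Walk down from the root, descending only into subtrees whose hashes differ."""
    suspects = [0]
    for depth in range(len(a_levels) - 1, 0, -1):
        differing = [i for i in suspects if a_levels[depth][i] != b_levels[depth][i]]
        suspects = [child for i in differing for child in (2 * i, 2 * i + 1)]
    return [i for i in suspects if a_levels[0][i] != b_levels[0][i]]

healthy = ["a", "b", "c", "d"]
stale = ["a", "b", "old-c", "d"]
out_of_date = diff_leaves(build_levels(healthy), build_levels(stale))
print(out_of_date)   # indexes of the values to read back from the healthy node
```

When the roots match nothing else is compared, so two nodes in sync only pay for a single hash comparison.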
Membership
● Master-based.
  ● Registry-like.
  ● Membership information maintained and broadcast by one or more master nodes.
● Consistent.
● No SPOF with active/passive master.
● Prone to partitioning failures.
Membership
● Gossip-based.
  ● Peer-to-Peer.
  ● Membership information is randomly spread among nodes.
    ● Each node picks one or more nodes, sending them its own topology view.
● All nodes will eventually reach a consistent view of the cluster topology.
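A toy simulation of push-style gossip, assuming each node contacts exactly one random peer per round and that views are sets of node names. The function names and the fixed random seed are choices made for the example; real protocols also gossip failure suspicions and use timeouts.

```python
import random

def gossip_round(views: dict, rng: random.Random) -> None:
    """Each node pushes its membership view to one randomly chosen peer."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        views[peer] |= views[node]

def rounds_to_converge(num_nodes: int, seed: int = 42, max_rounds: int = 1000):
    rng = random.Random(seed)
    names = [f"n{i}" for i in range(num_nodes)]
    views = {name: {name} for name in names}   # each node starts knowing only itself
    full = set(names)
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == full for view in views.values()):
            return round_no
    return None   # did not converge within max_rounds

rounds = rounds_to_converge(16)
print(f"16 nodes agreed on the cluster topology after {rounds} rounds")
```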
Data Analysis
● The importance of data locality.
● A distributed system is built by:
  ● Moving data toward its behavior.
  ● ... or ...
  ● Moving behavior toward its data.
● An efficient distributed system is built by:
  ● Moving behavior toward its data.
Data Analysis
● Map-Reduce.
  ● Map data analysis and computation tasks toward the data itself.
  ● Reduce results.
● No need to move data around.
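The idea can be sketched in a single process, with a dictionary of per-node partitions standing in for a cluster. The node names and values are invented for the example, and the map step is assumed to run where each partition lives.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each node holds its own slice of the data.
partitions = {
    "node-a": ["red", "blue", "red"],
    "node-b": ["blue", "green"],
    "node-c": ["red"],
}

def map_partition(values):
    """Runs next to the data; only a small partial result leaves the node."""
    return Counter(values)

def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Combine two partial results."""
    return left + right

partials = [map_partition(values) for values in partitions.values()]
total = reduce(reduce_counts, partials, Counter())
print(total)
```

Only the per-node counters cross the network, never the raw values.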
Use Cases (1)
● Runtime data.
  ● “Runtime” vs. “Transactional”.
  ● Not all data needs complex relations.
  ● Not all data needs to be persisted forever.
    ● That is, everything regarding the current “runtime” state.
    ● User session and everything related.
● Put the “runtime” state into your N-RDBMS.
● When the “runtime” state turns into “transactional”, put it into your RDBMS.
Use Cases (2)
● Hot spots.
● For read-intensive data:
  ● Use your N-RDBMS as a primary database for reads.
  ● Use your RDBMS as a primary database for writes and load data into the N-RDBMS from a background thread.
● For read/write-intensive data:
  ● Use your N-RDBMS as a primary database for writes and reads.
  ● Put your data in your RDBMS from a background thread (if needed).
Use Cases (3)
● Intense data computations.
  ● When the relational model doesn't efficiently represent your data ...
  ● And join operations are just too expensive ...
● N-RDBMS come to the rescue!
  ● Providing more efficient data representation/storage.
  ● Providing grid-style computations (e.g. Map-Reduce).
Products (1)
● MongoDB
  ● http://www.mongodb.org
  ● Document-based.
    ● (Binary) JSON.
  ● Support for indexes and object queries.
  ● Full support for master-slave replication.
  ● Alpha support for sharding.
  ● ACID (except in failure scenarios during replication).
Products (2)
● Cassandra
  ● http://incubator.apache.org/cassandra/
  ● Column-based (hybrid).
    ● Keys.
    ● Column Families.
    ● Columns.
    ● Super-Columns.
  ● Support for ordered range queries.
  ● Fully distributed.
    ● Peer-to-Peer.
    ● Eventually consistent.
Products (3)
● Voldemort
  ● http://project-voldemort.com
  ● Key/Value.
    ● Pluggable data serialization.
  ● No support for queries.
  ● Fully distributed.
    ● Peer-to-Peer.
    ● Eventually consistent.
Products (4)
● Riak
  ● http://riak.basho.com/
  ● Document-based.
    ● JSON.
    ● Links.
  ● Support for Map-Reduce.
  ● Fully distributed.
    ● Peer-to-Peer.
    ● Eventually consistent.
      ● With runtime dynamic tuning.
Final words
● Know how to scale your relational database.
  ● Don't dismiss it just to follow the hype.
● Know how non-relational databases scale.
  ● There are many choices around.
● Know your use cases.
  ● Make sensible decisions.
● Enjoy!
  ● And be happy!
Thank you!
Q&A