Scalable Databases - From Relational Databases To Polyglot Persistence

Sergio Bossa – sergio.bossa@gmail.com
Javaday IV – Rome – 30 January 2010

SCALABLE DATABASES
From Relational Databases To Polyglot Persistence

Sergio Bossa
sergio.bossa@gmail.com
http://twitter.com/sbtourist

About Me
● Software architect and engineer
  ● Gioco Digitale (online gambling and casinos)
● Open Source enthusiast
  ● Terracotta Messaging (http://forge.terracotta.org)
  ● Terrastore (http://code.google.com/p/terrastore)
  ● Actorom (http://code.google.com/p/actorom)
● (Micro-)Blogger
  ● http://twitter.com/sbtourist
  ● http://sbtourist.blogspot.com

Five fallacies of data-centric systems

Data model is static.
Data volume is predictable.
Data access load is predictable.
Database topology doesn't change.
Database never fails.

Scalable databases in action

● Scaling your database as a way to solve the fallacies above.
● Scale to handle heterogeneous data.
● Scale to handle more data.
● Scale to handle more load.
● Scale to handle topology changes due to:
  ● Unplanned growth.
  ● Unpredictable failures.

Scaling Relational Databases

Master-Slave replication
● Master - Slave replication.
  ● One (and only one) master database.
  ● One or more slaves.
  ● All writes go to the master.
    ● Replicated to slaves.
  ● Reads are balanced among master and slaves.
● Major issues:
  ● Single point of failure.
  ● Single point of bottleneck.
  ● Static topology.
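
The routing rules on this slide can be sketched in a few lines. This is only an illustration: plain dicts stand in for database nodes, `MasterSlaveRouter` is a hypothetical name, and replication is done synchronously for simplicity.

```python
import itertools


class MasterSlaveRouter:
    """Routes writes to the single master and balances reads across all nodes."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        # Reads rotate over master and slaves, as described on the slide.
        self._read_cycle = itertools.cycle([self.master] + self.slaves)

    def write(self, key, value):
        # Every write goes to the master first ...
        self.master[key] = value
        # ... and is then replicated to each slave (synchronously here;
        # real replication is usually asynchronous).
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        node = next(self._read_cycle)
        return node.get(key)


router = MasterSlaveRouter(master={}, slaves=[{}, {}])
router.write("user:1", "alice")
reads = [router.read("user:1") for _ in range(3)]
```

Both major issues are visible in the sketch: `write` has exactly one entry point, and the node list is fixed at construction time.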

Master-Master replication
● Master - Master replication.
  ● One or more masters.
  ● Writes and reads can go to any master node.
  ● Writes are replicated among masters.
● Major issues:
  ● Limited performance and scalability (typically due to 2PC).
  ● Complexity.
  ● Static topology.

Vertical partitioning
● Vertical partitioning.
  ● Put tables belonging to different functional areas on different database nodes.
  ● Scale your data and load by function.
  ● Move joins to the application level.
● Major issues:
  ● No more truly relational.
  ● What if a functional area grows too much?
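
"Move joins to the application level" could look roughly like the sketch below. The two functional areas (users and orders), their fields, and the function name are all hypothetical; in-memory structures stand in for separate database nodes.

```python
# Two hypothetical functional areas, each living on its own database node.
users_db = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders_db = [
    {"order_id": 10, "user_id": 1, "total": 30},
    {"order_id": 11, "user_id": 2, "total": 15},
    {"order_id": 12, "user_id": 1, "total": 5},
]


def orders_with_user_names(users, orders):
    """Application-level join: one lookup per area, merged in memory."""
    joined = []
    for order in orders:
        user = users.get(order["user_id"], {})
        joined.append({**order, "user_name": user.get("name")})
    return joined


report = orders_with_user_names(users_db, orders_db)
```

The database no longer enforces the relation between the two areas, which is what "no more truly relational" refers to.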

Horizontal partitioning
● Horizontal partitioning.
  ● Split tables by key and put partitions (shards) on different nodes.
  ● Scale your data and load by key.
  ● Move joins to the application level.
  ● Needs some kind of routing.
● Major issues:
  ● No more truly relational.
  ● What if your partition grows too much?
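
A minimal sketch of the "some kind of routing" mentioned above, assuming simple hash-modulo placement (one possible scheme among several). `ShardRouter` is a hypothetical name and dicts stand in for shard nodes.

```python
import hashlib


class ShardRouter:
    """Maps each key to one of a fixed number of shards."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash keeps a key on the same shard across processes
        # (Python's built-in hash() is randomized per process for strings).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)


router = ShardRouter(shard_count=4)
for i in range(100):
    router.put(f"user:{i}", i)
sizes = [len(shard) for shard in router.shards]
```

With modulo placement, changing `shard_count` remaps most keys, so growing a partition means moving data around.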

Caching
● Put a cache in front of your database.
  ● Distribute.
  ● Write-through for scaling reads.
  ● Write-behind for scaling reads and writes.
● Saves you a lot of pain, but ...
  ● “Only” scales read/write load.
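
The difference between write-through and write-behind can be sketched as follows. The class names are hypothetical, a dict stands in for the database, and an explicit `flush()` call replaces the background writer a real cache would use.

```python
class WriteThroughCache:
    """Writes hit the cache and the database in the same call."""

    def __init__(self, database):
        self.database = database
        self.cache = {}

    def put(self, key, value):
        self.cache[key] = value
        self.database[key] = value  # synchronous: write load is not reduced

    def get(self, key):
        if key not in self.cache and key in self.database:
            self.cache[key] = self.database[key]  # fill on a miss
        return self.cache.get(key)


class WriteBehindCache(WriteThroughCache):
    """Writes hit the cache only; the database is updated later in batches."""

    def __init__(self, database):
        super().__init__(database)
        self.pending = {}

    def put(self, key, value):
        self.cache[key] = value
        self.pending[key] = value  # repeated writes to a key coalesce

    def flush(self):
        self.database.update(self.pending)
        flushed = len(self.pending)
        self.pending.clear()
        return flushed


db = {}
cache = WriteBehindCache(db)
for n in range(5):
    cache.put("counter", n)
before_flush = dict(db)
flushed = cache.flush()
```

Five writes to the same key reach the database as a single update, which is how write-behind also scales writes. The price is that data not yet flushed is lost if the cache fails.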

Did we solve our fallacies?
● We tried, but ...
  ● Still bound to the relational model.
  ● Replication only covers a few use cases.
  ● Partitioning is hard.
  ● Caching is good, but not definitive.
  ● ...
● Can we do any better?

It's Not Only SQL

NOSQL Characteristics
● Main characterizing traits:
  ● Data Model.
  ● Data Processing.
  ● Consistency Model.
  ● Scale Out.

Data Model (1)
● Column-family based.
● Structure:
  ● Key-identified rows with a sparse number of columns.
  ● Columns grouped in families.
  ● Multiple families for the same key.
● Highlights:
  ● Dynamically add and remove columns.
  ● Efficiently access columns in the same group (column family).
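
As a rough mental model only, a column-family layout can be pictured as nested maps: row key, then family, then column. The row keys, family names, and columns below are hypothetical, and real stores add persistence, ordering, and timestamps on top.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

# Rows are sparse: each key stores only the columns it actually has.
table["user:1"]["profile"]["name"] = "alice"
table["user:1"]["profile"]["city"] = "Rome"
table["user:1"]["activity"]["last_login"] = "2010-01-30"
table["user:2"]["profile"]["name"] = "bob"

# Columns can be added and removed at runtime, without a schema change.
table["user:2"]["profile"]["nickname"] = "bobby"
del table["user:1"]["profile"]["city"]

# Reading one family touches only the columns grouped in it.
profile_of_user_1 = dict(table["user:1"]["profile"])
```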

Data Model (2)
● Document based.
● Structure:
  ● Key-identified documents.
  ● Schema-less (but optionally constrained).
    – JSON, XML ...
● Highlights:
  ● Dynamically change inner documents structure.
  ● Efficiently access documents as a unit.

Data Model (3)
● Graph based.
● Structure:
  ● Nodes to represent your data.
  ● Relations as meaningful links between nodes.
  ● Properties to enrich both.
● Highlights:
  ● Rich data model.
  ● Efficient, fast, traversal of nodes and relations.
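
The node/relation/property structure can be illustrated with a small traversal. All names here are hypothetical, and the linear scan over `relations` is only for brevity: a graph database would follow per-node adjacency instead.

```python
from collections import deque

# Nodes and relations both carry properties (all names are hypothetical).
nodes = {
    "alice": {"type": "person"},
    "bob": {"type": "person"},
    "carol": {"type": "person"},
}
relations = [
    ("alice", "KNOWS", "bob", {"since": 2008}),
    ("bob", "KNOWS", "carol", {"since": 2009}),
]


def reachable(start, relation_type):
    """Breadth-first traversal following outgoing relations of one type."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for source, kind, target, _props in relations:
            if source == current and kind == relation_type and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


friends_of_friends = reachable("alice", "KNOWS")
```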

Data Model (4)
● Key-Value based.
● Structure:
  ● Key-identified opaque values.
● Highlights:
  ● Great flexibility.
  ● Fast reads/writes for single entries.

Data Processing
● Several options:
  ● Map/Reduce.
  ● Predicates.
  ● Range Queries.
  ● ...
● One common principle:
  ● Move processing toward related data.
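
"Move processing toward related data" can be sketched with a word count in Map/Reduce style. The partitions and function names are hypothetical, and everything runs in one process here: in a real cluster `map_partition` would execute on the node that owns the partition.

```python
from collections import Counter
from functools import reduce

# Each hypothetical node holds its own partition of the documents.
partitions = [
    ["nosql scales out", "sql scales up"],
    ["nosql is not only sql"],
]


def map_partition(documents):
    """Runs next to the data: counts words within a single partition."""
    counts = Counter()
    for document in documents:
        counts.update(document.split())
    return counts


def reduce_counts(left, right):
    """Merges two partial results; only these small counters travel."""
    return left + right


partials = [map_partition(p) for p in partitions]
word_counts = reduce(reduce_counts, partials, Counter())
```

Only the per-partition counters cross the network, not the documents themselves.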

Consistency Model (1)
● Strict Consistency.
  ● All nodes ...
  ● At every point in time ...
  ● See a consistent view of the stored data.
    – Per-key consistency.
    – Multi-key consistency.

Consistency Model (2)
● Eventual Consistency.
  ● Only a subset of all nodes ...
  ● At a specific point in time ...
  ● See a consistent view of the stored data.
    – Other nodes will serve stale data.
    – Other nodes will eventually get the updates.
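
A toy model of the behaviour described above, assuming a single replica acknowledges each write and an explicit `propagate()` call stands in for background replication. The class and its methods are hypothetical.

```python
class EventuallyConsistentStore:
    """One replica acknowledges the write; the others catch up later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.backlog = []  # updates not yet applied to every replica

    def write(self, key, value):
        self.replicas[0][key] = value
        self.backlog.append((key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Stand-in for background replication or anti-entropy.
        for key, value in self.backlog:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.backlog.clear()


store = EventuallyConsistentStore(replica_count=3)
store.write("cart", ["book"])
stale = store.read(2, "cart")   # replica 2 has not seen the write yet
store.propagate()
fresh = store.read(2, "cart")   # now every replica agrees
```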

Scale Out (1)
● Master-based.
  ● Membership managed and broadcast by masters.
  ● Data consistency guaranteed by masters.
  ● No SPOF with active/passive masters.
  ● No SPOB with active/active masters or cluster-cluster replication.
  ● Prone to partitioning failures.

Scale Out (2)
● Peer-to-peer.
  ● Membership is maintained through multicast or gossip-based protocols.
  ● Data consistency is maintained through quorum protocols.
  ● Easier to scale.
  ● Harder to maintain consistency.
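
One common form of quorum protocol requires W + R > N, so that every read set overlaps every write set. The sketch below assumes that rule. `QuorumStore` is hypothetical, and the single global version counter is a simplification (real systems typically use vector clocks or timestamps).

```python
class QuorumStore:
    """N replicas; a write needs W acknowledgements, a read asks R replicas."""

    def __init__(self, n, w, r):
        if w + r <= n:
            raise ValueError("need W + R > N for read and write sets to overlap")
        self.replicas = [{} for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # Only the first W replicas are updated, simulating slow peers.
        for replica in self.replicas[: self.w]:
            replica[key] = (self.version, value)

    def read(self, key):
        # Ask the last R replicas and keep the highest version seen.
        answers = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        return max(answers)[1] if answers else None


store = QuorumStore(n=3, w=2, r=2)
store.write("k", "v1")
store.write("k", "v2")
value = store.read("k")
```

The read deliberately asks the replicas least likely to be up to date. Because the two sets share at least one replica, the latest version is still found.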

NOSQL Use Cases
● Use cases revolve around the following kinds of data:
  ● Rich.
  ● Runtime.
  ● Hot Spot.
  ● Massive.
  ● Computational.
● Do not use the same product for all cases.
● Pick multiple products for different use cases.

NOSQL Products - Cassandra
● Cassandra (http://incubator.apache.org/cassandra)
● Data Model:
  ● Column-family based.
● Data Processing:
  ● Range queries, Predicates.
● Consistency:
  ● Eventual consistency.
● Scalability:
  ● Peer-to-peer, gossip based.

NOSQL Products - Mongo DB
● Mongo DB (http://www.mongodb.org)
● Data Model:
  ● Document based (JSON).
● Data Processing:
  ● Map/Reduce, SQL-like queries.
● Consistency:
  ● Per-document strict consistency.
● Scalability:
  ● Replication, partitioning (alpha).

NOSQL Products - Neo4j
● Neo4j (http://neo4j.org)
● Data Model:
  ● Graph based.
● Data Processing:
  ● Path traversal, Index-based search.
● Consistency:
  ● Strict consistency.
● Scalability:
  ● Replication.

NOSQL Products - Riak
● Riak (http://riak.basho.com)
● Data Model:
● Data Processing:
  ● Map/Reduce.
● Consistency:

NOSQL Products - Terrastore
● Terrastore (http://code.google.com/p/terrastore)
● Data Model:
● Data Processing:
  ● Range queries, Predicates.
● Consistency:
  ● Per-document strict consistency.
● Scalability:
  ● Master-based.

NOSQL Products - Voldemort
● Voldemort (http://project-voldemort.com)
● Data Model:
  ● Key-Value.
● Data Processing:
  ● None.
● Consistency:

NOSQL Products and Use Cases

Final words
● A New World.
  ● New paradigms.
  ● New use cases.
  ● New products.
● Don't dismiss the old stuff.
  ● Relational databases still have their place.
● Embrace change.
  ● May the NOSQL power be with you.
● Let the Polyglot Persistence era begin!
