Polyglot Persistence - Two Great Tastes That Taste Great Together

Post on 15-Jan-2015

description

The days of the relational database being a one-stop shop for all of your persistence needs are over. Although NoSQL databases address some issues that can’t be addressed by relational databases, the opposite is true as well. The relational database offers an unparalleled feature set and rock-solid stability. One cannot overestimate the importance of using the right tool for the job, and for some jobs, one tool is not enough. This talk focuses on the strengths and weaknesses of both relational and NoSQL databases, the benefits and challenges of polyglot persistence, and examples of polyglot persistence in the wild. These slides were presented at WindyCityDB 2010.

transcript

Polyglot Persistence: Two Great Tastes That Taste Great Together!

John Wood
john_p_wood@yahoo.com

@johnpwood

About Me

● Software Developer at Interactive Mediums
● Primarily work on a web application that allows our customers to engage and interact with their customers
● Writing code for about 15 years
● Tinkering with NoSQL for about 1.5 years
● Have a NoSQL solution that has been running in production for a year

You Now Have A Choice

The RDBMS Is No Longer The Default Choice

● Can be very difficult to scale horizontally
● Schemas can be difficult to maintain and migrate
● For some applications, the data integrity features of the RDBMS are an unnecessary overhead
● Data constraints and JOINs can be expensive at runtime

NoSQL Databases Have Stepped Up To Address These Issues

● Schema-less
● Little to no data integrity enforcement
● Self-contained data
● Eventually consistent
● Easy to scale horizontally to add processing power and storage

But The RDBMS Is Far From Dead

● Incredibly mature, and battle tested
● Immediate and constant consistency
● Integrity of data is enforced
● Efficient use of storage space if data normalized properly
● Supported by everyone and everything (tools, frameworks, libraries, etc)
● Incredibly flexible and powerful query language
● Help is plentiful and easy to find

Choice is good...right?

Decisions, Decisions...

You Don't Have to Choose

“You've got your chocolate in my peanut butter!”

Polyglot Persistence

pol●y●glot - Adjective: Knowing or using several languages

per●sist●ence - Noun: The continued or prolonged existence of something

Polyglot Persistence: The continued or prolonged existence of something using several languages

Polyglot Persistence: The continued or prolonged existence of something using several databases (replacing "languages")

“Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand.” - Scott Leberknight, October, 2008

http://www.nearinfinity.com/blogs/scott_leberknight/polyglot_persistence.html

Why On Earth Would You Want To Do This?

CAP Theorem

http://en.wikipedia.org/wiki/CAP_theorem

http://blog.nahurst.com/visual-guide-to-nosql-systems

Compromise

Consistency and Data Integrity + Scalability and Flexibility

Support A Wide Range of Storage Requirements

Get The Job Done Faster, With Better Quality

DB Doesn't Just Stand For Database

Don't Swim Upstream

Possible Use Cases

Use A NoSQL Database For A Particular Application Feature

Use A NoSQL Database For Speedy Batch Processing

Use A NoSQL Database For Distributed Logging

Use A NoSQL Database For Large Tables

Use An RDBMS For Reporting

Sounds Great! What's The Catch?

Difficult For Data In Different Databases To Interact

You Now Have To Decide Where To Store Data

Increased Application And Deployment Complexity

Additional Administrative Responsibilities

Training

What Will This Do To My Beautiful Code?

It's All About The Layers

class User < ActiveRecord::Base
end

class ContestEntry < CouchRest::ExtendedDocument
  property :entry_number
end

class User < ActiveRecord::Base
  def contest_entries
    ContestEntry.entries_for_user(self.id)
  end
end

class ContestEntry < CouchRest::ExtendedDocument
  property :entry_number
  property :user_id

  def self.entries_for_user(user_id)
    # Execute your view to fetch the contest entries
  end

  def user
    User.find_by_id(user_id)
  end
end
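The layering idea on this slide can be tried out without either database. The following is a minimal sketch, assuming plain in-memory stand-ins for both stores: the `create` helpers, the hash used as the relational "table", and the array used as the document store are all hypothetical and are not part of ActiveRecord or CouchRest. It only illustrates how each model hides its own store while exposing the other side of the relationship through an ordinary method.

```ruby
# In-memory stand-ins for the two stores. A real application would back
# User with a relational database and ContestEntry with a document store.
class User
  attr_reader :id, :name

  @records = {} # hypothetical stand-in for a relational table

  class << self
    # Hypothetical helper: store a row keyed by primary key.
    def create(id:, name:)
      @records[id] = new(id, name)
    end

    def find_by_id(id)
      @records[id]
    end
  end

  def initialize(id, name)
    @id = id
    @name = name
  end

  # Cross-store "association": delegate to the document-side class.
  def contest_entries
    ContestEntry.entries_for_user(id)
  end
end

class ContestEntry
  attr_reader :entry_number, :user_id

  @documents = [] # hypothetical stand-in for a document database

  class << self
    # Hypothetical helper: save a document.
    def create(entry_number:, user_id:)
      entry = new(entry_number, user_id)
      @documents << entry
      entry
    end

    # Stand-in for querying a view keyed by user_id.
    def entries_for_user(user_id)
      @documents.select { |doc| doc.user_id == user_id }
    end
  end

  def initialize(entry_number, user_id)
    @entry_number = entry_number
    @user_id = user_id
  end

  # Resolve the owning user through the relational-side class.
  def user
    User.find_by_id(user_id)
  end
end

User.create(id: 1, name: "Alice")
ContestEntry.create(entry_number: 101, user_id: 1)
ContestEntry.create(entry_number: 102, user_id: 2)
```

Callers only ever see `user.contest_entries` and `entry.user`; which database answers each call stays inside the model layer, so the rest of the code does not need to know that two stores are involved.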

Additional Options Available

So, Who Is Actually Doing This?

● Primary MySQL database with a backup
● A few very large tables, containing 5M – 30M rows each, and growing quickly
● Increasing query execution time
● Some pages on the web app were timing out
● Increasing database migration time
● Rigid schema of the RDBMS was preventing some planned features from moving forward

● Brought in a consultant to help us optimize our MySQL setup
● Optimized slow queries
● Added some indexes
● Offloaded some work to the backup database
● Considered the use of summary tables for statistics

+

● Migrated old data from large tables to CouchDB
● Using CouchDB views to aggregate summary data
● Data is imported and views are updated nightly
● Queries for statistics now very fast
● Using Lucene (via couchdb-lucene) for full text searching
● Taking full advantage of CouchDB's schema-less nature in several new application features
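The view-based aggregation mentioned above follows a map/reduce pattern: a map step emits a key and value for each document, and a reduce step combines the values that share a key. The sketch below imitates that pattern in plain Ruby over an in-memory array. It is an illustration only, not CouchDB's view API (CouchDB views are typically written in JavaScript), and the document fields `campaign` and `status` are assumptions rather than the schema used in the talk.

```ruby
# Illustrative documents; the field names are assumptions.
MESSAGES = [
  { "campaign" => "spring", "status" => "delivered" },
  { "campaign" => "spring", "status" => "failed" },
  { "campaign" => "summer", "status" => "delivered" },
  { "campaign" => "spring", "status" => "delivered" }
].freeze

# Map step: emit one [key, value] row per document.
# The key is a [campaign, status] pair and the value is a count of 1.
def map_message(doc)
  [[[doc["campaign"], doc["status"]], 1]]
end

# Reduce step: combine all values emitted for a single key.
def reduce_counts(values)
  values.sum
end

# Run map over every document, group the emitted rows by key, then reduce.
def build_summary(docs)
  rows = docs.flat_map { |doc| map_message(doc) }
  rows.group_by(&:first).transform_values do |pairs|
    reduce_counts(pairs.map(&:last))
  end
end

SUMMARY = build_summary(MESSAGES)
```

Because the summary is computed ahead of time (nightly, in the setup described above), a statistics query becomes a lookup by key such as `SUMMARY[["spring", "delivered"]]` rather than a scan over millions of rows.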

It's Not All Rainbows And Unicorns

● CouchDB databases and views can be very large on disk

● Some queries could not be substituted with CouchDB views

● Indexing tens of millions of documents for full text search with Lucene takes weeks

● Development takes longer, as the map/reduce model requires additional thought and planning

● Changing/Upgrading views in production not straightforward

http://www.couch.io/migrating-to-couchdb

http://twitter.com/about/opensource

● Vertically and horizontally partitioned MySQL
● Several layers of aggressive caching, all application managed
● Schema changes impossible, resulting in the use of bitfields and piggyback tables
● Hardware intensive
● Error prone
● Hitting MySQL limits
● Already eventually consistent
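For readers unfamiliar with the bitfield workaround named in the list above, here is a generic sketch of the technique: several boolean attributes are packed into a single integer column, so a new attribute claims the next free bit instead of requiring a new column. The module and flag names are hypothetical, and this is not a description of how Twitter implemented it.

```ruby
# Hypothetical flags; each one occupies a single bit of one integer column.
module UserFlags
  PRIVATE_ACCOUNT = 1 << 0
  VERIFIED        = 1 << 1
  SUSPENDED       = 1 << 2 # a new flag takes the next free bit, no schema change

  # Return the bits with the given flag turned on.
  def self.set(bits, flag)
    bits | flag
  end

  # Return the bits with the given flag turned off.
  def self.clear(bits, flag)
    bits & ~flag
  end

  # True when the given flag is on.
  def self.set?(bits, flag)
    (bits & flag) != 0
  end
end

flags = 0
flags = UserFlags.set(flags, UserFlags::VERIFIED)
flags = UserFlags.set(flags, UserFlags::SUSPENDED)
flags = UserFlags.clear(flags, UserFlags::SUSPENDED)
```

The trade-off is that the meaning of each bit lives only in application code, which the database cannot validate or describe.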

FlockDB

HBase

● Migrating from MySQL to Cassandra as their main online data store
● Hadoop/HBase used for people search feature
● FlockDB used to manage the social graph
● Hadoop for analytics
● “As with all NoSQL systems, strengths in different situations” - Kevin Weil, Analytics Lead, Twitter

http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010

● Increased availability
● The ability to support new features
● The ability to analyze their massive amount of data in a reasonable amount of time

http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010

Right Tool For The Job

Thanks!

john_p_wood@yahoo.com
@johnpwood