Making Sense of NoSQL
Ann Kelly
Dan McCreary
Dipti Borkar
April 2013
Copyright Kelly-McCreary & Associates, LLC
Presenters
• Dan McCreary, Kelly-McCreary & Associates
• Ann Kelly, Kelly-McCreary & Associates
• Dipti Borkar, Couchbase
Agenda
• What is NoSQL?
• What Triggered the NoSQL Movement?
• Database Architecture Patterns
• Common Characteristics of NoSQL Systems
• Business Benefits of NoSQL
• Core NoSQL Concepts
• Selected NoSQL Implementations
• Recent NoSQL Developments
• Selecting the Right NoSQL System
• Next Step: Selecting the Right NoSQL Pilot Project
• Quick Introduction to Document Databases & Couchbase
Pressures on Single CPU SQL
[Figure: a single-CPU RDBMS under pressure from four directions: Volume, Velocity, Variability, and Agility]
Three Eras of Databases
• RDBMS for transactions, Data Warehouse for analytics, and NoSQL for scalability
• 1985-1995: RDBMS
• 1995-2010: RDBMS, Data Warehouse
• 2010-Now: RDBMS, Data Warehouse, NoSQL
Advancements in Distributed Databases
NoSQL on Google Trends
http://www.google.com/trends/explore#q=NoSQL
Interest over time The number 100 represents the peak search volume
2009: the NoSQL "Revolt"
“NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.”
Computerworld magazine, July 1st, 2009
NoSQL!
Common Themes
• Horizontal scalability
• Clever use of hashing and caching
• Parallel execution of queries
– Move queries to the data, not the other way around
• Share resources when possible
– Example: memcached protocol
• Use simple interfaces when possible
– put, get, delete
Selecting a Database…
"Selecting the right data storage solution is no longer a trivial task."
[Flowchart: Start → Does it look like a document? → Yes: use Microsoft Office; No: use the RDBMS → Stop]
Six Types of Databases
• Relational
• Analytical (OLAP)
• Key-Value
• Column-Family
• Graph
• Document
Relational
• Data is usually stored row by row (row store)
• Standardized query language (SQL)
• Data model defined before you add data
• Joins merge data from multiple tables
• Results are tables
• Pros: mature ACID transactions with fine-grained security controls
• Cons: requires up-front data modeling, does not scale well
Analytical (OLAP)
• Based on a "star" schema with a central fact table for each event
• Optimized for read-mostly analysis of historical data
• Uses the MDX language to query "measures" for "categories" of data
• Pros: fast queries for large data
• Cons: not optimized for transactions and updates
Key-Value Stores
• Keys used to access opaque blobs of data
• Values can contain any type of data (images, video)
• Pros: scalable, simple API (put, get, delete)
• Cons: no way to query based on the content of the value
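The put/get/delete interface can be sketched in a few lines. This is a minimal in-memory illustration, not the API of any particular product; the class name `KeyValueStore` and the sample key are hypothetical.

```python
from typing import Dict, Optional


class KeyValueStore:
    """In-memory stand-in for a key-value store; values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42:avatar", b"\x89PNG...")  # the store never inspects the bytes
avatar = store.get("user:42:avatar")
store.delete("user:42:avatar")
missing = store.get("user:42:avatar")  # None: lookups are by key only
```

Because the store treats each value as an opaque blob, there is no operation here that could search by the content of a value, which is the limitation listed under Cons.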
Column-Family
• Key includes a row, column family and column name
• Store versioned blobs in one large table
• Queries can be done on rows, column families and column names
• Pros: good scale-out
• Cons: cannot query blob content; row and column designs are critical
Examples: HBase, Cassandra
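A rough sketch of the column-family key structure, assuming a plain Python dictionary in place of a real distributed table. The class and method names are hypothetical and do not correspond to the HBase or Cassandra APIs.

```python
from typing import Dict, List, Optional, Tuple

# Key = (row, column family, column name); each key maps to timestamped versions.
CellKey = Tuple[str, str, str]


class ColumnFamilyTable:
    def __init__(self) -> None:
        self._cells: Dict[CellKey, List[Tuple[int, bytes]]] = {}

    def put(self, row: str, family: str, column: str,
            timestamp: int, value: bytes) -> None:
        versions = self._cells.setdefault((row, family, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0])  # keep oldest version first

    def get(self, row: str, family: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self._cells.get((row, family, column))
        return versions[-1][1] if versions else None

    def columns_in_family(self, row: str, family: str) -> List[str]:
        """Query by key parts only; the blob content is never inspected."""
        return sorted(c for (r, f, c) in self._cells if r == row and f == family)


table = ColumnFamilyTable()
table.put("user1", "profile", "name", 1, b"Ann")
table.put("user1", "profile", "name", 2, b"Ann Kelly")
table.put("user1", "profile", "zip", 1, b"94040")
latest_name = table.get("user1", "profile", "name")
profile_columns = table.columns_in_family("user1", "profile")
```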
Graph Store
• Data is stored in a series of nodes and properties
• Queries are really graph traversals
• Ideal when relationships between data are key
– e.g. social networks
• Pros: fast network search, works with public linked data sets
• Cons: poor scalability when graphs don't fit into RAM, specialized query language
Examples: Neo4j, AllegroGraph
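To illustrate what "queries are really graph traversals" means, here is a small breadth-first search over an adjacency list. The social graph and the function name `shortest_path` are made up for the example; real graph stores expose their own query languages.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical social graph: each node maps to the nodes it is connected to.
friends: Dict[str, List[str]] = {
    "ann": ["dan", "dipti"],
    "dan": ["ann", "lee"],
    "dipti": ["ann"],
    "lee": ["dan", "ray"],
    "ray": ["lee"],
}


def shortest_path(graph: Dict[str, List[str]],
                  start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first traversal; returns one shortest path or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(friends, "ann", "ray")  # ['ann', 'dan', 'lee', 'ray']
```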
Document Store
• Data stored in nested hierarchies
• Logical data remains stored together as a unit
• Any item in the document can be queried
• Pros: No object-relational mapping layer, ideal for search
• Cons: Complex to implement, incompatible with SQL
Examples: MongoDB, Couchbase
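A minimal sketch of querying any item in a nested document, assuming documents are plain Python dictionaries. The helper names `get_path` and `find` and the dotted-path syntax are assumptions for illustration, not the query API of MongoDB or Couchbase.

```python
from typing import Any, Dict, List

docs: List[Dict[str, Any]] = [
    {"id": 1, "name": {"first": "Dipti", "last": "Borkar"}, "address": {"zip": "94040"}},
    {"id": 2, "name": {"first": "Joe", "last": "Smith"}, "address": {"zip": "94040"}},
    {"id": 4, "name": {"first": "Sarah", "last": "Gorin"}, "address": {"zip": "NW1"}},
]


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'address.zip'; return None if any step is missing."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(documents: List[Dict[str, Any]], path: str, value: Any) -> List[Dict[str, Any]]:
    """Return every document whose nested item at `path` equals `value`."""
    return [d for d in documents if get_path(d, path) == value]


matches = find(docs, "address.zip", "94040")
```

Each document stays together as one unit, and the nested hierarchy is read directly with no mapping layer in between.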
Business Solutions
• Big Data – horizontal scalability
• Search – full-text search
• High availability – fault tolerance
• Agility – quickly adapt to change
• Enterprise Class
– Security
– Monitoring
Shared Nothing Architecture
Shared nothing systems have proven to be the most cost-effective and flexible.
[Figure: three architectures compared. Shared RAM: multiple CPUs share RAM over a bus. Shared Disk: CPU/RAM nodes on a LAN share a SAN. Shared Nothing: each node has its own CPU, RAM and disk, connected by a LAN.]
Distribution Models
Peer-to-peer models do not have idle standby nodes.
[Figure: Master-Slave vs. Peer-to-Peer. In master-slave, requests go to a master in front of the nodes, and a standby master is used only if the primary master fails. In peer-to-peer, requests go directly to the nodes.]
Move Queries to the Nodes
Queries work best if they run on the local node that has the data.
[Figure: a single query fans out to a grid of database nodes, each running MapReduce locally]
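The idea of moving the query to the data can be sketched as a map step that runs against each node's local partition, followed by a reduce step that combines the small partial results. The node names and data below are hypothetical, and everything runs in one process purely for illustration.

```python
from functools import reduce
from typing import Dict, List

# Hypothetical cluster: each node holds its own partition of order amounts.
nodes: Dict[str, List[float]] = {
    "node1": [10.0, 25.0, 5.0],
    "node2": [40.0],
    "node3": [15.0, 5.0],
}


def local_query(partition: List[float]) -> Dict[str, float]:
    """Runs where the data lives; returns a small partial result."""
    return {"count": len(partition), "total": sum(partition)}


def combine(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Merge two partial results; only these summaries cross the network."""
    return {"count": a["count"] + b["count"], "total": a["total"] + b["total"]}


partials = [local_query(p) for p in nodes.values()]  # map step, one per node
result = reduce(combine, partials)                   # reduce step
```

Only the per-node summaries are shipped to the coordinator, not the underlying rows.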
Structured Search
• Retain document structure to allow a keyword match in "title" to rank higher than a keyword match in the text body
[Figure: the same cloud of words shown twice, once as a "flat ocean" of unstructured text and once with the document structure retained]
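A minimal sketch of structure-aware ranking, assuming each document keeps separate "title" and "body" fields. The weights in `FIELD_WEIGHTS` and the sample documents are arbitrary assumptions; real search engines use far richer scoring.

```python
from typing import Dict, List, Tuple

# Assumed weights: a hit in the title counts three times a hit in the body.
FIELD_WEIGHTS: Dict[str, float] = {"title": 3.0, "body": 1.0}

docs: List[Dict[str, str]] = [
    {"id": "a", "title": "NoSQL databases", "body": "An overview of data stores."},
    {"id": "b", "title": "Data stores", "body": "Where NoSQL fits among databases."},
]


def score(doc: Dict[str, str], keyword: str) -> float:
    """Sum weighted keyword counts across the structured fields."""
    kw = keyword.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = doc.get(field, "").lower().split()
        total += weight * words.count(kw)
    return total


def search(documents: List[Dict[str, str]], keyword: str) -> List[Tuple[str, float]]:
    """Return (id, score) pairs for matching documents, best first."""
    scored = [(doc["id"], score(doc, keyword)) for doc in documents]
    hits = [s for s in scored if s[1] > 0]
    return sorted(hits, key=lambda s: s[1], reverse=True)


ranking = search(docs, "nosql")
```

If both documents were flattened into a single text field, the two matches would be indistinguishable; keeping the structure is what lets the title hit win.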
Incremental MapReduce
• Unlike standard MapReduce, Incremental MapReduce only updates aggregates that need to be updated.
• This is an example of how pre-built values are updated with only deltas
• Very useful to save time when calculating aggregates of large data collections
[Figure: pre-calculated aggregate values (count(), sum(), avg(), min(), max()) for prior items are stored in HBase. When a new item arrives, only the new item is read and the aggregate values are updated in HBase.]
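The delta-update idea can be sketched as follows, assuming the stored aggregates live in a simple Python object rather than in HBase. The class name `Aggregate` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    """Pre-calculated values that are updated in place as items arrive."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        """Fold one new item into the aggregates without rereading prior items."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


agg = Aggregate()
for prior in [4.0, 10.0, 1.0]:  # prior items, processed once
    agg.add(prior)
agg.add(5.0)                    # new item: only this value is read
```

This sketch covers inserts only. Removing an item would be harder for min() and max(), since the previous extreme cannot be recovered from the stored aggregate alone.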
Copyright 2008 Dan McCreary & Associates
Is Shredding Really Necessary?
• Every time you take hierarchical data and put it into a traditional database you have to put repeating groups in separate tables and use SQL “joins” to reassemble the data
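As an illustration of shredding, the sketch below stores one hierarchical order in two SQLite tables and then uses a join to rebuild it. The table names, column names and sample order are hypothetical.

```python
import sqlite3

order = {"id": 1, "customer": "Ann", "items": [
    {"sku": "A1", "qty": 2},
    {"sku": "B7", "qty": 1},
]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")

# Shred: the repeating group "items" goes into its own table.
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)

# Reassemble: a join is needed to rebuild the original hierarchy.
rows = conn.execute(
    "SELECT o.id, o.customer, i.sku, i.qty "
    "FROM orders o JOIN order_items i ON i.order_id = o.id "
    "WHERE o.id = ? ORDER BY i.rowid",
    (order["id"],),
).fetchall()

rebuilt = {
    "id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for (_, _, sku, qty) in rows],
}
```

A document store would keep `order` as a single unit, so neither the shredding step nor the join would be needed.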
Object Relational Mapping
• T1 – HTML into Objects
• T2 – Objects into SQL Tables
• T3 – Tables into Objects
• T4 – Objects into HTML
[Figure: Web Browser, Object Middle Tier and Relational Database in a row, with translations T1 to T4 on the arrows between them]
"The Vietnam of Applications"
• Object-relational mapping has become one of the most complex components of building applications today
• A "quagmire" where many projects get lost
• Many "heroic efforts" have been made to solve the problem:
– Hibernate
– Ruby on Rails
• But sometimes the way to avoid complexity is to keep your architecture very simple
Perspectives
Perspective depends on your context
[Figure: Document Stores, OLAP/MDX, Object Stores and Graph Stores arranged around "NoSQL for Web 2.0 and Big Data"]
Selection Checklist
• Horizontal Scalability
• High Availability
• Search
• No object-relational mapping
• Security
• Monitoring
Architectural Tradeoffs
"I want a fast car with good mileage."
"I want a scalable database with low cost that runs well on the 1,000 CPUs in our data center."
Introduction to Document Databases and Couchbase
Dipti Borkar
Director, Product Management
Couchbase Server - Core Capabilities
NoSQL Document Database
• Easy Scalability: grow cluster without application changes, without downtime, with a single click
• Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
• Always On 24x365: no downtime for software upgrades, hardware maintenance, etc.
• Flexible Data Model: JSON document model with no fixed schema
Relational vs Document data model
• Relational data model: highly-structured table organization with rigidly-defined data formats and record structure.
• Document data model: collection of complex documents with arbitrary, nested data formats and varying "record" format.
Making a Change Using RDBMS

User Table
User ID   First   Last     Zip     Country ID
1         Dipti   Borkar   94040   001
2         Joe     Smith    94040   001
3         Ali     Dodson   94040   001
4         Sarah   Gorin    NW1     002
5         Bob     Young    30303   001
6         Nancy   Baker    10010   001
7         Ray     Jones    31311   001
8         Lee     Chen     V5V3M   008
• • •
50000     Doug    Moore    04252   001
50001     Mary    White    SW195   002
50002     Lisa    Clark    12425   001

Country Table
Country ID   Country name
001          USA
002          UK
003          Argentina
004          Australia
005          Aruba
006          Austria
007          Brazil
008          Canada
009          Chile
• • •
130          Portugal
131          Romania
132          Russia
133          Spain
134          Sweden

Photo Table
User ID   Photo ID   Comment   Country ID
2         d043       NYC       001
2         b054       Bday      007
5         c036       Miami     001
7         d072       Sunset    133
5002      e086       Spain     133

Status Table
User ID   Status ID   Text      Country ID
1         a42         At conf   134
4         b26         excited   007
5         c32         hockey    008
12        d83         Go A’s    001
5000      e34         sailing   005

Affiliations Table
User ID   Affl ID   Affl Name   Country ID
2         a42       Cal         001
4         b96       USC         001
7         c14       UW          001
8         e22       Oxford      002
Making the Same Change with a Document Database
Just add information to a document:
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA",
  "STATUS": {
    "TEXT": "At Conf",
    "GEO_LOC": "134"
  },
  "COUNTRY": "USA"
}
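The same change can be sketched in code. This example uses only Python's standard `json` module and a plain dictionary as the document; it is not the Couchbase client API, and treating GEO_LOC and COUNTRY as the newly added fields is an assumption about what the slide illustrates.

```python
import json

# Starting document, before the change.
doc = json.loads(
    '{"ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", '
    '"CITY": "MV", "STATE": "CA", "STATUS": {"TEXT": "At Conf"}}'
)

# The change: add new fields to this document only.
# No schema migration, and no other document is touched.
doc["STATUS"]["GEO_LOC"] = "134"
doc["COUNTRY"] = "USA"

updated = json.dumps(doc, indent=2)
```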
Couchbase Server 2.0 Architecture
[Figure: each node runs a Data Manager and a Cluster Manager]
Data Manager
• Query API (port 8092) and Query Engine
• Moxi (port 11211, Memcapable 1.0)
• Memcached with the Couchbase EP Engine (port 11210, Memcapable 2.0)
• Storage interface and New Persistence Layer
Cluster Manager (Erlang/OTP)
• HTTP REST management API / Web UI (port 8091)
• Erlang port mapper (port 4369)
• Distributed Erlang (ports 21100 - 21199)
• Components labeled "on each node" or "one per cluster": heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager
Couchbase “The basics”
Basic Operation
• Docs distributed evenly across servers
• Each server stores both active and replica docs
– Only one server is active for a given doc at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server a doc is on
– App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
[Figure: Couchbase Server cluster of three servers, each holding active and replica docs, with user-configured replica count = 1. Two app servers, each with a Couchbase client library and cluster map, send read/write/update requests to the cluster.]
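A rough sketch of how a client library might use a cluster map: hash the document key to a partition (Couchbase calls these vBuckets), then look up which server holds the active copy. The partition count, server names and map layout are assumptions chosen for illustration and do not reproduce the real client's hashing details.

```python
import zlib
from typing import Dict, List

NUM_VBUCKETS = 16  # assumed small number, for illustration only
SERVERS: List[str] = ["server1", "server2", "server3"]


def vbucket_for(key: str) -> int:
    """Hash a document key to a partition number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


# Cluster map: partition -> active server and replica server (replica count = 1).
cluster_map: Dict[int, Dict[str, str]] = {
    vb: {
        "active": SERVERS[vb % len(SERVERS)],
        "replica": SERVERS[(vb + 1) % len(SERVERS)],
    }
    for vb in range(NUM_VBUCKETS)
}


def server_for(key: str) -> str:
    """The client library resolves this; the app only ever supplies the key."""
    return cluster_map[vbucket_for(key)]["active"]


target = server_for("doc:5")
```

Because every app server holds the same map and uses the same hash, they all reach the same active server for a given document without coordinating with each other.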
Add Nodes to Cluster
• Two servers added with one-click operation
• Docs automatically rebalance across cluster
– Even distribution of docs
– Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
[Figure: the cluster grows from three servers to five. Active and replica docs are spread across Servers 1 to 5, with user-configured replica count = 1, and both app servers send read/write/update requests using their updated cluster maps.]
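One way to picture "even distribution, minimum movement" is a greedy rebalance over a partition map: work out each server's target share, then move partitions only off servers that hold more than their share. This is a simplified sketch under that assumption, not Couchbase's actual rebalance algorithm, and it ignores replicas.

```python
from collections import Counter
from typing import Dict, List


def rebalance(partition_map: Dict[int, str], servers: List[str]) -> Dict[int, str]:
    """Spread partitions evenly over `servers`, moving only surplus partitions."""
    new_map = dict(partition_map)
    base, extra = divmod(len(new_map), len(servers))
    # Target load: the first `extra` servers take one additional partition.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    load = Counter(new_map.values())
    for partition in sorted(new_map):
        owner = new_map[partition]
        if load[owner] > target.get(owner, 0):
            # Owner is over target: hand this partition to an under-loaded server.
            receiver = next(s for s in servers if load[s] < target[s])
            new_map[partition] = receiver
            load[owner] -= 1
            load[receiver] += 1
    return new_map


old_servers = ["server1", "server2", "server3"]
old_map = {p: old_servers[p % 3] for p in range(12)}

new_servers = old_servers + ["server4", "server5"]
new_map = rebalance(old_map, new_servers)
moved = sum(1 for p in old_map if old_map[p] != new_map[p])
```

In this example only the partitions needed to fill the two new servers change owner; everything else stays where it was, and the updated map is what clients would then use.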
Fail Over Node
• App servers accessing docs
• Requests to Server 3 fail
• Cluster detects server failed
– Promotes replicas of docs to active
– Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
[Figure: five-server cluster with user-configured replica count = 1. Server 3 fails, and replicas of its active docs on the other servers are promoted to active. Both app servers continue through their updated cluster maps.]
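The failover steps above can be sketched as an update to a cluster map. The map layout (one active and one replica server per partition) and the function name `fail_over` are assumptions for illustration, not Couchbase internals.

```python
from typing import Dict, Optional

ClusterMap = Dict[int, Dict[str, Optional[str]]]


def fail_over(cluster_map: ClusterMap, failed: str) -> ClusterMap:
    """Promote replicas for partitions whose active copy was on the failed server."""
    new_map: ClusterMap = {}
    for partition, copies in cluster_map.items():
        active, replica = copies["active"], copies["replica"]
        if active == failed:
            # Promote the replica; the partition has no replica until a rebalance.
            new_map[partition] = {"active": replica, "replica": None}
        elif replica == failed:
            # Active copy is unaffected, but its replica is gone.
            new_map[partition] = {"active": active, "replica": None}
        else:
            new_map[partition] = {"active": active, "replica": replica}
    return new_map


before: ClusterMap = {
    0: {"active": "server1", "replica": "server2"},
    1: {"active": "server3", "replica": "server1"},
    2: {"active": "server2", "replica": "server3"},
}
after = fail_over(before, "server3")
```

After failover some partitions are left without a replica, which is why a rebalance typically follows.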
New in 2.0
• JSON support
• Indexing and Querying
• Cross data center replication
• Incremental Map Reduce
Cluster wide - XDCR
[Figure: two Couchbase Server clusters, NY DATA CENTER and SF DATA CENTER, linked by cross data center replication. Each cluster has three servers holding active docs in RAM and on disk.]
Couchbase Server Admin Console
Use cases
Data driven use cases
• Support for unlimited data growth
• Data with non-homogeneous structure
• Need to change data structure quickly and often
• 3rd party or user defined structure
• Variable length documents
• Sparse data records
• Hierarchical data
Performance driven use cases
• Low latency matters
• High throughput matters
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Workloads with very high mutation rate per document
Common Use Cases
Social Gaming
• Couchbase stores player and game data
• Example customers include: Zynga, Tapjoy, Ubisoft, Tencent
Mobile Apps
• Couchbase stores user info and app content
• Example customers include: Kobo, Playtika
Ad Targeting
• Couchbase stores user information for fast access
• Example customers include: AOL, Mediamind, Convertro
Session Store
• Couchbase Server as a key-value store
• Example customers include: Concur, Sabre
User Profile Store
• Couchbase Server as a key-value store
• Example customers include: Tunewiki
High Availability Cache
• Couchbase Server used as a cache tier replacement
• Example customers include: Orbitz
Content & Metadata Store
• Couchbase document store with Elastic Search
• Example customers include: Tunewiki, McGraw Hill
3rd Party Data Aggregation
• Couchbase stores social media and data feeds
• Example customers include: Sambacloud
Recommended Reading
• Making Sense of NoSQL: A guide for managers and the rest of us
• Manning Publications
• Focus on objective architectural analysis
• Available now in Manning Early Access Program (MEAP) e-book (PDF)
• In print June 2013
• http://manning.com/mccreary
Dan McCreary & Ann Kelly
Kelly-McCreary & Associates
www.danmccreary.com
Dipti Borkar@[email protected]