Date post: | 12-May-2015 |
NoSQL for SQL Professionals
Dipti Borkar, Director, Product Management
Link to Slides
http://bit.ly/17pgrcP
Macro Trends Driving NoSQL Technology
• More Data
• More Users
• Interactive Apps
Lacking Solutions, Users Forced to Invent
• Bigtable: November 2006
• Dynamo: October 2007
• Cassandra: August 2008
• Voldemort: February 2009
Very few organizations can build and maintain database software technology. But every organization building interactive web applications needs this technology.
What Is the Biggest Data Management Problem Driving Use of NoSQL in the Coming Year?
• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• Performance challenges: 29%
• Cost: 16%
• All of these: 12%
• Other: 11%
Source: Couchbase Survey, December 2011, n = 1351.
Relational vs. NoSQL
Key Differences
Relational Technology Scales Up
• Application scales out: just add more commodity web servers (web/app server tier).
• RDBMS scales up: get a bigger, more complex server (relational database).
• Sharding is expensive and disruptive, and doesn't perform at web scale.
• [Figure: system cost and application performance plotted against users; the relational database won't scale beyond a certain point.]
Couchbase Server Scales Out Like App Tier
• Application scales out: just add more commodity web servers (web/app server tier).
• NoSQL database scales out: cost and performance mirror the app tier (Couchbase distributed data store).
• Scaling out flattens the cost and performance curves.
• [Figure: system cost and application performance plotted against users.]
Differences
1. Tables vs Document
• Relational has tables with predefined columns: the schema is determined before data can be inserted.
• Best practice is to normalize by splitting data into several tables, joined by a PK-FK relation.
Tables vs Document (contd.)
• In Couchbase, there are no tables, only documents.
• A logical entity is stored within a single document.
• Different documents do not need to have the same set of fields or structure.
• You differentiate types of documents either by the key names you provide or by adding attributes.
Relational vs Document Data Model
• Relational data model: highly structured table organization with rigidly defined data formats and record structure.
• Document data model: collection of complex documents with arbitrary, nested data formats and varying “record” format.
Differences
• Joins vs logical single document: a logical entity is a single document, so there is no need for joins. If the data is normalized across several documents, use a series of gets:
  recipe = couchbase.get("my-recipe-id")
  reviews = couchbase.multiget(recipe.comments)
• Transactions
  Relational: atomicity can span several records across several tables.
  NoSQL: atomicity is confined to the document level.
Key Couchbase Concepts
• Clients read/write from/to documents (user/application data).
• Documents are held in data buckets (multitenant architecture).
• Data buckets live on server nodes, based on bucket partitioning.
• Server nodes form a Couchbase cluster, which is dynamically scalable.
RDBMS Example: User Profile
User Info
KEY  First  Last    ZIP_id
1    Dipti  Borkar  2
2    Joe    Smith   2
3    Ali    Dodson  2
4    John   Doe     3
Address Info
ZIP_id  CITY  STATE  ZIP
1       DEN   CO     30303
2       MV    CA     94040
3       CHI   IL     60609
4       NY    NY     10010
To get information about a specific user, you perform a join across the two tables. For example, user 1 has ZIP_id 2, which resolves to MV, CA 94040.
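The join described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table and column names (user_info, address_info, user_key, first_name, last_name) are adapted from the slide's headers, and an in-memory SQLite database stands in for whatever RDBMS is actually used.

```python
import sqlite3

# In-memory database; table and column names are assumptions based on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address_info (
    zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT
);
CREATE TABLE user_info (
    user_key INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
    zip_id INTEGER REFERENCES address_info(zip_id)
);
INSERT INTO address_info VALUES
    (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040'),
    (3, 'CHI', 'IL', '60609'), (4, 'NY', 'NY', '10010');
INSERT INTO user_info VALUES
    (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2),
    (3, 'Ali', 'Dodson', 2), (4, 'John', 'Doe', 3);
""")

# One user's profile requires joining the two tables on the foreign key.
row = conn.execute(
    """SELECT u.first_name, u.last_name, a.city, a.state, a.zip
       FROM user_info AS u
       JOIN address_info AS a ON a.zip_id = u.zip_id
       WHERE u.user_key = ?""",
    (1,),
).fetchone()
print(row)
```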
Document Example: User Profile
All data in a single document (JSON):
{ "ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", "CITY": "MV", "STATE": "CA" }
Making a Change Using RDBMS
[Figure: a Country ID column is added to each table, referencing a new Country Table.]
User Table
User ID  First  Last    Zip    Country ID
1        Dipti  Borkar  94040  001
2        Joe    Smith   94040  001
3        Ali    Dodson  94040  001
4        Sarah  Gorin   NW1    002
5        Bob    Young   30303  001
6        Nancy  Baker   10010  001
7        Ray    Jones   31311  001
8        Lee    Chen    V5V3M  008
• • •
50000    Doug   Moore   04252  001
50001    Mary   White   SW195  002
50002    Lisa   Clark   12425  001
Country Table
Country ID  Country name
001         USA
002         UK
003         Argentina
004         Australia
005         Aruba
006         Austria
007         Brazil
008         Canada
009         Chile
• • •
130         Portugal
131         Romania
132         Russia
133         Spain
134         Sweden
Photo Table
User ID  Photo ID  Comment  Country ID
2        d043      NYC      001
2        b054      Bday     007
5        c036      Miami    001
7        d072      Sunset   133
5002     e086      Spain    133
Status Table
User ID  Status ID  Text     Country ID
1        a42        At conf  134
4        b26        excited  007
5        c32        hockey   008
12       d83        Go A’s   001
5000     e34        sailing  005
Affiliations Table
User ID  Affl ID  Affl Name  Country ID
2        a42      Cal        001
4        b96      USC        001
7        c14      UW         001
8        e22      Oxford     002
Making the Same Change With a Document DB
Just add information to a document (JSON):
{ "ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", "CITY": "MV", "STATE": "CA", "STATUS": { "TEXT": "At Conf", "GEO_LOC": "134" }, "COUNTRY": "USA" }
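A minimal sketch of the same change in code, assuming a plain Python dict as a stand-in for a data bucket (the "user::1" style keys are hypothetical, not a Couchbase convention taken from the slides). Only the one document is rewritten; no other document and no schema is touched.

```python
import json

# Hypothetical in-memory "bucket": document key -> JSON text.
bucket = {
    "user::1": json.dumps({"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
                           "ZIP": "94040", "CITY": "MV", "STATE": "CA"}),
    "user::2": json.dumps({"ID": 2, "FIRST": "Joe", "LAST": "Smith", "ZIP": "94040"}),
}

# Read, modify, and write back a single document; no schema change is involved.
doc = json.loads(bucket["user::1"])
doc["STATUS"] = {"TEXT": "At Conf", "GEO_LOC": "134"}
doc["COUNTRY"] = "USA"
bucket["user::1"] = json.dumps(doc)

updated = json.loads(bucket["user::1"])
untouched = json.loads(bucket["user::2"])   # other documents keep their old shape
```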
Relational vs Document Performance
User Table
User ID  First  Last    Zip
1        Frank  Wiegel  94040
2        Joe    Smith   94040
3        Ali    Dodson  94040
4        Sarah  Gorin   NW1
5        Bob    Young   30303
6        Nancy  Baker   10010
7        Ray    Jones   31311
8        Lee    Chen    V5V3
• • •
5000     Doug   Moore   04252
5001     Mary   White   41694
5002     Lisa   Clark   12425
Photo Table
User ID  Photo ID  Comment
2        d043      NYC
2        b054      Bday
5        c036      Miami
7        d072      Sunset
5002     e086      Spain
Status Table
User ID  Status ID  Text
1        a42        At conf
4        b26        excited
5        c32        hockey
12       d83        Go A’s
5000     e34        sailing
Affiliations Table
User ID  Affiliations ID  Affiliations Name
2        a42              Cal
4        b96              USC
7        c14              UW
8        e22              Oxford
[Figure: in the relational model, one user's data is spread across rows of the User, Photo, Status, and Affiliations tables; in the document model, each user's rows are grouped into a single JSON document.]
Faster response times and higher throughput
Faster response times and higher throughput
Document Databases Easily Accommodate Unstructured Data
Hotels (JSON)
{ "ID": 1, "NAME": "Fairmont San Francisco", "DESCRIPTION": "Historic grandeur…", "AVG_REVIEWER_SCORE": "4.3", "AMENITY": [ {"TYPE": "gym", "DESCRIPTION": "fitness center"}, {"TYPE": "wifi", "DESCRIPTION": "free wifi"} ], "RATE_TYPE": "nightly", "PRICE": "$199", "REVIEWS": ["review_1", "review_2"], "ATTRACTIONS": "Chinatown" }
{ "ID": 2, "NAME": "W San Francisco", "DESCRIPTION": "Chic, hip accommodations..", "AVG_REVIEWER_SCORE": "4.0", "AMENITY": [ {"TYPE": "spa", "DESCRIPTION": "Bliss Spa"}, {"TYPE": "wifi", "DESCRIPTION": "free wifi"}, {"TYPE": "dining", "DESCRIPTION": "bar/lounge"} ], "RATE_TYPE": "nightly", "PRICE": "$194", "REVIEWS": ["review_1", "review_2"] }
Reviews (JSON)
{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel & Location", "WOULD RECOMMEND": "yes", "AVG_REVIEWER_SCORE": "5", "REVIEW_DATE": "May 29, 2013", "USER_PROFILE_ID": "271" }
{ "REVIEW_ID": 2, "REVIEW": "Nice, but a few kinks", "WOULD RECOMMEND": "yes", "AVG_REVIEWER_SCORE": "4", "REVIEW_DATE": "May 22, 2013", "USER_PROFILE_ID": "923" }
User Profiles (JSON)
{ "USER_ID": 1, "DISPLAY_NAME": "Ted’s Trip Experience", "CITY": "Saratoga", "STATE": "California", "NUM_OF_REVIEWS": "8" }
{ "USER_ID": 2, "DISPLAY_NAME": "WhatWhat567", "CITY": "Kansas City", "STATE": "MO", "NUM_OF_REVIEWS": "3" }
Document IDs associate related objects
• Hotels point to reviews (the "REVIEWS" list of review IDs).
• Reviews point to users (the "USER_PROFILE_ID" field).
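Following these ID references amounts to a series of key lookups. The sketch below assumes a plain dict as the document store; the key naming ("hotel_1", "user_271") and the pairing of each review with a particular user profile are assumptions made for illustration, since the slides do not state them.

```python
# Hypothetical store: document key -> document. Field names follow the slides.
docs = {
    "hotel_1": {"ID": 1, "NAME": "Fairmont San Francisco",
                "REVIEWS": ["review_1", "review_2"]},
    "review_1": {"REVIEW_ID": 1, "REVIEW": "Loved Hotel & Location",
                 "USER_PROFILE_ID": "271"},
    "review_2": {"REVIEW_ID": 2, "REVIEW": "Nice, but a few kinks",
                 "USER_PROFILE_ID": "923"},
    "user_271": {"DISPLAY_NAME": "Ted's Trip Experience"},   # assumed pairing
    "user_923": {"DISPLAY_NAME": "WhatWhat567"},             # assumed pairing
}


def reviews_with_authors(hotel_key):
    """Follow hotel -> reviews -> users by document ID, one get per hop."""
    hotel = docs[hotel_key]
    result = []
    for review_key in hotel["REVIEWS"]:
        review = docs[review_key]
        user = docs["user_" + review["USER_PROFILE_ID"]]
        result.append((review["REVIEW"], user["DISPLAY_NAME"]))
    return result


pairs = reviews_with_authors("hotel_1")
```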
Indexing with Document Databases
Index on AVG_REVIEWER_SCORE. The index holds (score, doc_id) entries in sorted order:
… (4.0, doc_id), (4.0, doc_id), (4.1, doc_id), (4.3, doc_id), (5.0, doc_id) …
Querying with Document Databases
Query on AVG_REVIEWER_SCORE. The query is matched against the index, and the matching entries are returned as results:
… (3.4, doc_id), (3.4, doc_id), (3.5, doc_id), (3.6, doc_id), (3.7, doc_id), (3.8, doc_id), (4.0, doc_id), (4.1, doc_id), (4.3, doc_id), (4.5, doc_id), (4.7, doc_id), (4.9, doc_id), (5.0, doc_id), (5.0, doc_id) …
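The idea of a sorted index that answers range queries can be sketched as follows. This is not how any particular database implements its indexes; it is a small illustration using a sorted Python list and binary search. The third hotel and the numeric (rather than string) scores are assumptions added for the example.

```python
import bisect

# Hypothetical documents: doc_id -> hotel document (scores stored as numbers here).
hotels = {
    "hotel_1": {"NAME": "Fairmont San Francisco", "AVG_REVIEWER_SCORE": 4.3},
    "hotel_2": {"NAME": "W San Francisco", "AVG_REVIEWER_SCORE": 4.0},
    "hotel_3": {"NAME": "Example Inn", "AVG_REVIEWER_SCORE": 3.6},  # invented
}

# Build the index: (score, doc_id) entries kept in sorted order.
index = sorted((doc["AVG_REVIEWER_SCORE"], doc_id) for doc_id, doc in hotels.items())


def query_score_range(low, high):
    """Return doc_ids whose score lies in [low, high], via binary search on the index."""
    start = bisect.bisect_left(index, (low, ""))
    end = bisect.bisect_right(index, (high, chr(0x10FFFF)))
    return [doc_id for _, doc_id in index[start:end]]


matches = query_score_range(4.0, 5.0)
```

The query touches only the slice of the index between the two bounds, so documents outside the range are never read.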
Flavors of NoSQL
NoSQL catalog
                        Key-Value  Data Structure  Document            Column     Graph
Cache (memory only)     memcached  redis
Database (memory/disk)  membase                    mongoDB, couchbase  cassandra  Neo4j
The Key-Value Store – the foundation of NoSQL
Key → opaque binary value
Memcached – the NoSQL precursor
Key → opaque binary value
• In-memory only
• Limited set of operations
  • Blob storage: Set, Add, Replace, CAS
  • Retrieval: Get
  • Structured data: Append, Increment
• “Simple and fast.”
• Challenges: cold cache, disruptive elasticity
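The semantics of Set, Add, Replace, and CAS can be sketched with a small in-memory class. `TinyCache` is a hypothetical stand-in written for this illustration, not the memcached client API; it only mimics the behavior of each operation, including the compare-and-swap check that rejects a write when the value changed since it was read.

```python
class TinyCache:
    """Hypothetical in-memory cache that mimics memcached-style semantics."""

    def __init__(self):
        self._items = {}      # key -> (value, cas_id)
        self._next_cas = 1

    def _store(self, key, value):
        self._items[key] = (value, self._next_cas)
        self._next_cas += 1
        return True

    def set(self, key, value):          # always store
        return self._store(key, value)

    def add(self, key, value):          # store only if the key is absent
        return key not in self._items and self._store(key, value)

    def replace(self, key, value):      # store only if the key exists
        return key in self._items and self._store(key, value)

    def gets(self, key):                # value plus its current CAS id
        return self._items[key]

    def cas(self, key, value, cas_id):  # store only if unchanged since gets()
        return self._items[key][1] == cas_id and self._store(key, value)


cache = TinyCache()
cache.set("counter", 1)
value, cas_id = cache.gets("counter")
cache.set("counter", 99)                        # a concurrent writer gets in first
stale_write = cache.cas("counter", value + 1, cas_id)
value, cas_id = cache.gets("counter")
fresh_write = cache.cas("counter", value + 1, cas_id)
```

The stale write is rejected and the retry succeeds, which is the usual read, modify, CAS, retry loop for updating a single value safely.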
Couchbase – document-oriented database
Key → JSON object (“document”):
{ "string" : "string", "string" : value, "string" : { "string" : "string", "string" : value }, "string" : [ array ] }
• Auto-sharding
• Disk-based with built-in memcached cache
• Cache refill on restart
• Memcached compatible (drop-in replacement)
• Highly available (data replication)
• Add or remove capacity to a live cluster
• When values are JSON objects (“documents”): create indices and views, and query against the views
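Auto-sharding and adding capacity to a live cluster can be illustrated with a generic hash-partitioning sketch. This is an assumption-laden simplification, not Couchbase's actual partition map: the partition count, the CRC32-modulo hash, the round-robin placement, and the node names are all chosen for illustration only.

```python
import zlib

NUM_PARTITIONS = 64  # assumed value for illustration


def partition_for(key):
    # A key always hashes to the same partition, regardless of cluster size.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partition_map(nodes):
    # Spread partitions over nodes round-robin; only this map changes
    # when nodes are added or removed.
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key, partition_map):
    return partition_map[partition_for(key)]


three_nodes = build_partition_map(["node-a", "node-b", "node-c"])
four_nodes = build_partition_map(["node-a", "node-b", "node-c", "node-d"])
owner_before = node_for("user::1", three_nodes)
owner_after = node_for("user::1", four_nodes)
```

Because clients route by partition rather than by node, growing the cluster means moving some partitions and publishing a new map, without rehashing keys.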
MongoDB – Document-oriented database
Key → BSON object (“document”):
{ "string" : "string", "string" : value, "string" : { "string" : "string", "string" : value }, "string" : [ array ] }
• Disk-based with in-memory “caching”
• BSON (“binary JSON”) format and wire protocol
• Master-slave replication
• Auto-sharding
• Values are BSON objects
• Supports ad hoc queries; best when indexed
MongoDB Architecture
Cassandra – Column overlays
Key → opaque binary value, overlaid with columns: Column 1, Column 2, Column 3 (not present)
• Disk-based system
• Clustered
• External caching required for low-latency reads
• “Columns” are overlaid on the data
• Not all rows must have all columns
• Supports efficient queries on columns
• Restart required when adding columns
• Good cross-datacenter support
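The point that not all rows must have all columns can be shown with a tiny sketch. It models rows as nested Python dicts and is not Cassandra's storage format or query API; the row keys and column names are invented for illustration.

```python
# Hypothetical wide rows: row key -> {column name: value}; columns may be absent.
rows = {
    "user:1": {"first": "Dipti", "last": "Borkar", "zip": "94040"},
    "user:2": {"first": "Joe", "last": "Smith"},   # no "zip" column on this row
}


def read_column(row_key, column, default=None):
    # A missing column is simply not present; nothing is stored for it.
    return rows[row_key].get(column, default)


def rows_with_column(column):
    # Query by column: only rows that actually carry the column match.
    return sorted(key for key, cols in rows.items() if column in cols)


missing_zip = read_column("user:2", "zip")
zip_rows = rows_with_column("zip")
```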
Cassandra Architecture
Neo4j – Graph database
[Figure: several key → opaque binary value nodes connected by relationships.]
• Disk-based system
• External caching required for low-latency reads
• Nodes, relationships and paths
• Properties on nodes
• Delete, Insert, Traverse, etc.
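Nodes with properties, typed relationships, and a traversal can be sketched in a few lines. This is a generic in-memory illustration, not the Neo4j API or its query language; the node IDs and the "KNOWS" relationship type are assumptions for the example.

```python
from collections import deque

# Hypothetical graph: node id -> properties, plus (source, type, target) relationships.
nodes = {
    "dipti": {"name": "Dipti"},
    "joe": {"name": "Joe"},
    "ali": {"name": "Ali"},
}
relationships = [("dipti", "KNOWS", "joe"), ("joe", "KNOWS", "ali")]


def neighbors(node_id, rel_type):
    return [dst for src, rel, dst in relationships
            if src == node_id and rel == rel_type]


def traverse(start, rel_type):
    """Breadth-first traversal; returns node ids in the order they are reached."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in neighbors(current, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


reached = traverse("dipti", "KNOWS")
```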
Where is NoSQL a good fit?
Market Adoption
Internet Companies
• Social Gaming
• Ad Networks
• Social Networks
• Online Business Services
• E-Commerce
• Online Media
• Content Management
• Cloud Services
Enterprises
• Communications
• Retail
• Financial Services
• Health Care
• Automotive/Airline
• Agriculture
• Consumer Electronics
• Business Systems
Market Adoption – Customers
More than 300 customers and 5,000 production deployments worldwide, across Internet companies and enterprises.
Application Characteristics - Data driven
• Third-party or user-defined structure (e.g., Twitter feeds)
• Support for unlimited data growth (e.g., viral apps)
• Data with non-homogeneous structure
• Need to change data structure quickly and often
• Variable-length documents
• Sparse data records
• Hierarchical data
Couchbase is a good fit
Application Characteristics - Performance driven
• Low latency is critical (e.g., 1 millisecond)
• High throughput (e.g., 200,000 ops/sec)
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Read-heavy, mixed, or write-heavy workloads
Couchbase is a good fit
Common Use Cases
• Social Gaming: Couchbase stores player and game data. Example customers include Zynga, Tapjoy, Ubisoft, Tencent.
• Mobile Apps: Couchbase stores user info and app content. Example customers include Kobo, Playtika.
• Ad Targeting: Couchbase stores user information for fast access. Example customers include AOL, Mediamind, Convertro.
• Session Store: Couchbase Server as a key-value store. Example customers include Concur, Sabre.
• User Profile Store: Couchbase Server as a key-value store. Example customers include Tunewiki.
• High Availability Cache: Couchbase Server used as a cache tier replacement. Example customers include Orbitz.
• Content & Metadata Store: Couchbase document store with Elasticsearch. Example customers include McGraw Hill, Tunewiki.
• 3rd Party Data Aggregation: Couchbase stores social media and data feeds. Example customers include Sambacloud.
Q & A