CAP Theorem

Lecture 6B

CS4411: Databases II

• CAP Theorem • NoSQL Databases: Overview

Agenda

CAP Theorem

• Atomic – Transaction cannot be subdivided – All or nothing

• Consistent – Constraints don’t change from before transaction to after transaction – A transaction transforms a database from one consistent state to another consistent state.

• Isolated – Transactions execute independently of one another. – Database changes not revealed to users until after transaction has completed

• Durable – Database changes are permanent and must not be lost.

Transaction ACID Properties
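The "all or nothing" and "consistent state" properties can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `account` table, the `transfer` helper, and the balances are assumptions made for the example, not part of the lecture.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was rolled back as well

conn = sqlite3.connect(":memory:")
# The CHECK clause plays the role of an integrity constraint.
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])

ok = transfer(conn, "a", "b", 30)        # both updates applied
failed = transfer(conn, "a", "b", 500)   # would make a's balance negative: nothing applied
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

After the failed transfer the balances are the same as before it started, which is the atomicity guarantee in miniature.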

The limitations of distributed databases can be described by the so-called CAP theorem

§ Consistency: every node always sees the same data at any given instant (i.e., strict consistency)

§ Availability: the system continues to operate, even if nodes in a cluster crash, or some hardware or software parts are down due to upgrades

§ Partition Tolerance: the system continues to operate in the presence of network partitions

CAP Properties of distributed databases

• Consistency in Databases (ACID):
– Database has a set of integrity constraints
– A consistent database state is one where all integrity constraints are satisfied
– Each transaction run individually on a consistent database state must leave the database in a consistent state

• Consistency in distributed systems with replication
– Strong consistency: a schedule with read and write operations on an object should give results and final state equivalent to some schedule on a single copy of the object, with order of operations from a single site preserved
– Weak consistency (several forms)

What is Consistency?

• When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent

• For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service

• Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
§ Soft state: copies of a data item may be inconsistent
§ Eventually consistent: copies become consistent at some later time if there are no more updates to that data item

Eventual Consistency
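The propagation behaviour described above can be sketched with two replicas that exchange their contents. This is a minimal illustration under stated assumptions: the `Replica` class, the `anti_entropy` function, and the rule that the newest timestamp wins are all choices made for the example, not a description of any particular system.

```python
class Replica:
    """Holds key -> (timestamp, value); the entry with the newest timestamp wins."""
    def __init__(self):
        self.store = {}

    def write(self, key, value, ts):
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

def anti_entropy(a, b):
    """Exchange all entries in both directions so the two replicas converge."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)

r1, r2 = Replica(), Replica()
r1.write("x", "old", ts=1)
anti_entropy(r1, r2)
r1.write("x", "new", ts=2)      # update accepted by r1 only
stale = r2.read("x")            # soft state: r2 still returns the old copy
anti_entropy(r1, r2)            # no further updates, replicas synchronize
converged = r1.read("x") == r2.read("x") == "new"
```

Between the second write and the second exchange the copies disagree (soft state); once updates stop and the exchange runs, both replicas return the same value.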

• Traditionally, availability of a centralized server
• For distributed systems: availability of the system to process requests
• For a large system, at almost any point in time there’s a good chance that
– a node is down, or even
– the network is partitioned
• Distributed consensus algorithms will block during partitions to ensure consistency
• Many applications require continued operation even during a network partition, even at the cost of consistency

Availability

Also known as Brewer’s theorem, after Prof. Eric Brewer of the University of California, Berkeley, who presented it in 2000.

CAP Theorem

“Of three properties of a shared data system: data consistency, system availability and tolerance to network partitions, only two can be achieved at any given moment.”

There are many levels of consistency. Strict Consistency – RDBMS. Tunable Consistency – Cassandra. Eventual Consistency – Amazon Dynamo.

• Traditional databases choose consistency
• Most Web applications choose availability
– Except for specific parts such as order processing

[Figure: an App talks to two nodes, A and B. Both hold the same Data. Consistent and Available; No Partition.]

CAP Theorem

[Figure: an App talks to two partitioned nodes, A and B. A holds Data, B holds Old Data. Available and Partitioned; Not Consistent, we get back old data.]

CAP Theorem

[Figure: an App talks to two partitioned nodes, A and B. A holds New Data, B waits for new data. Consistent and Partitioned; Not available, waiting…]

CAP Theorem
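The three pictures above can be replayed in a few lines of Python. This is a toy model only: the `Cluster` class, its `mode` flag, and the use of `TimeoutError` to stand for "waiting" are assumptions made for the illustration.

```python
class Cluster:
    """Two replicas, A and B. `mode` is 'AP' or 'CP' (labels chosen for this sketch)."""
    def __init__(self, mode):
        self.mode = mode
        self.a = "data"
        self.b = "data"
        self.partitioned = False

    def write_to_a(self, value):
        self.a = value
        if not self.partitioned:
            self.b = value          # replication only works without a partition

    def read_from_b(self):
        if self.partitioned and self.mode == "CP":
            # B cannot confirm it holds the latest data, so it refuses to answer.
            raise TimeoutError("waiting for new data")
        return self.b               # AP: answer anyway, possibly with old data

healthy = Cluster("AP")
healthy.write_to_a("new data")
healthy_read = healthy.read_from_b()      # no partition: consistent and available

ap = Cluster("AP")
ap.partitioned = True
ap.write_to_a("new data")
ap_read = ap.read_from_b()                # available, but returns old data

cp = Cluster("CP")
cp.partitioned = True
cp.write_to_a("new data")
try:
    cp.read_from_b()
    cp_available = True
except TimeoutError:
    cp_available = False                  # consistent, but not available
```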

Almost the opposite of ACID.

• Basically available: Nodes in a distributed environment can go down, but the whole system shouldn’t be affected.

• Soft State (scalable): The state of the system and data changes over time.

• Eventual Consistency: Given enough time, data will be consistent across the distributed system.

BASE, an ACID Alternative

Eventual consistency does not make safety guarantees, i.e., an eventually consistent system can return any value before it converges

BASE differs from ACID – trades consistency for availability

ACID:
• Strong Consistency.
• Less availability.
• Pessimistic concurrency.
• Complex.

BASE:
• Availability is the most important thing.
• Willing to sacrifice for this (CAP).
• Weaker consistency (Eventual).
• Best effort.
• Simple and fast.
• Optimistic.

BASE vs ACID

§ When companies such as Google and Amazon were designing large-scale databases, 24/7 Availability was key
– A few minutes of downtime means lost revenue

§ When horizontally scaling databases to 1000s of machines, the likelihood of a node or a network failure increases tremendously

§ Therefore, in order to have strong guarantees on Availability and Partition Tolerance, they had to sacrifice “strict” Consistency (implied by the CAP theorem)

Large-Scale Databases

Maintaining consistency means balancing the strictness of consistency against availability/scalability
§ Good-enough consistency depends on your application

Strict Consistency

Generally hard to implement, and is inefficient

Loose Consistency

Easier to implement, and is efficient

Trading-Off Consistency

Examples:

• Acceptable in ATM withdrawals and cellphone calls
• Decouple updates to seller and buyer in a transaction

Abadi’s classification system: PACELC
• CAP theorem only matters when there is a partition
• Even if partitions are rare, applications may trade off Consistency for Latency
– E.g. PNUTS allows inconsistent reads to reduce latency
  • Critical for many applications
– But update protocol (via master) ensures consistency over availability
• Thus Abadi asks two questions:
– If there is Partitioning, how does system trade off Availability for Consistency?
– Else (no partitioning), how does system trade off Latency for Consistency?

PACELC: Availability vs Latency

PACELC: Availability vs Latency

• If there is Partitioning, how does system trade off Availability for Consistency?
• Else how does system trade off Latency for Consistency?

• Google Megastore: PC/EC • Yahoo PNUTS: PC/EL

Amazon Dynamo (by default): PA/EL
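The classification above is just a pair of choices per system, which can be written down directly. The `Pacelc` type and the `choice` helper below are names invented for this sketch; the three classifications are the ones listed on the slide.

```python
from typing import NamedTuple

class Pacelc(NamedTuple):
    on_partition: str   # "A" (availability) or "C" (consistency)
    otherwise: str      # "L" (latency) or "C" (consistency)

    def label(self):
        return f"P{self.on_partition}/E{self.otherwise}"

SYSTEMS = {
    "Google Megastore": Pacelc("C", "C"),
    "Yahoo PNUTS": Pacelc("C", "L"),
    "Amazon Dynamo (by default)": Pacelc("A", "L"),
}

def choice(system, partitioned):
    """Return what the system favours in the given situation."""
    p = SYSTEMS[system]
    return p.on_partition if partitioned else p.otherwise

pnuts_label = SYSTEMS["Yahoo PNUTS"].label()
dynamo_when_partitioned = choice("Amazon Dynamo (by default)", partitioned=True)
pnuts_when_healthy = choice("Yahoo PNUTS", partitioned=False)
```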

NoSQL Databases: Overview

• From CAP Theorem:
– CA, CP, PA databases

• Data model
– What data is being stored?

• CRUD interface
– API for Create, Read, Update, Delete
– Sometimes preceding S for Search

• Transaction consistency guarantees
• Replication and sharding model
– What’s automated and what’s manual?

NoSQL Database Features

More than 150 different NoSQL databases!!!

NoSQL Databases

Column-Family Store Key/Value Store

Document Store Graph Databases

NoSQL: we focus on 4 Data Models

NoSQL Data Models

Key-Value store

• Eventually-consistent Key-Value store
• Hierarchical Key-Value Stores
• Key-Value Stores In RAM
• Key-Value Stores on Disk
• Ordered Key-Value Stores

• Essentially, big distributed hash maps
• Origin attributed to Dynamo – Amazon’s DB for world-scale catalog/cart collections
– But Berkeley DB has been here for >20 years

• Data Model: store pairs ⟨key,opaque-value⟩
– Opaque means that DB does not associate any structure/semantics with the value; oblivious to values
– This may mean more work for the user: retrieving a large value and parsing to extract an item of interest
– Keys are unique.

• Sharding via partitioning of the key space
– Hashing, gossip and remapping protocols for load balancing and fault tolerance
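One common way to partition a key space by hashing is consistent hashing, sketched below. The slides do not say which scheme any given store uses, so treat this as an illustration of the idea only; the `HashRing` class, the use of MD5, and the number of virtual nodes are assumptions of the example.

```python
import bisect
import hashlib

def _h(s):
    """Map a string to a point on the ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []                    # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_h(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def node_for(self, key):
        # The first point clockwise from the key's hash owns the key.
        i = bisect.bisect(self.ring, (_h(key), "")) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["n1", "n2", "n3"])
keys = [f"k{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("n3")                         # simulate a node failure
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only keys that lived on the failed node are remapped; keys on the surviving nodes stay where they were, which is what makes this kind of scheme attractive for fault tolerance.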

• Redis
• Amazon’s DynamoDB
– Originally designed for Amazon’s workload at peaks
– Offered as part of Amazon’s Web services

• Riak
– Focuses on high availability, BASE
– “As long as your Riak client can reach one Riak server, it should be able to write data.”

• FoundationDB
– Focus on transactions, ACID

• Berkeley DB
– First release 1994, by Berkeley, acquired by Oracle
– ACID, replication


Example: Key-Value databases

• Redis is the most popular key-value database

Redis

• Basically a data structure store for strings, numbers, hashes, lists, sets

• Simplistic "transaction" management
– Queuing of commands as blocks, really
– Among ACID, only Isolation guaranteed
  • A block of commands that is executed sequentially; no transaction interleaving; no roll back on errors

• In-memory store
– Persistence by periodic saves to disk

• Comes with
– A command-line API
– Clients for different programming languages
  • Perl, PHP, Ruby, Tcl, C, C++, C#, Java, R, …

Command                        key     value
set x 10                       x       10
hset h y 5                     h       y→5
hset h1 name two
hset h1 value 2                h1      name→two value→2
hmset p:22 name Alma age 25    p:22    name→Alma age→25
sadd s 20
sadd s Alma                    s       {20,Alma}
rpush l a
rpush l b
lpush l c                      l       (c,a,b)

(simple value)
get x           >> 10

(hash table)
hget h y        >> 5
hkeys p:22      >> name , age

(set)
smembers s      >> 20 , Alma
scard s         >> 2

(list)
llen l          >> 3
lrange l 1 2    >> a , b
lindex l 2      >> b
lpop l          >> c
rpop l          >> b

Example of Redis Commands
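To make the semantics of these commands concrete without a running server, the sketch below mimics a handful of them with plain Python containers. It is not the Redis client API: the function names simply mirror the commands (`set_` avoids shadowing Python's built-in `set`), values are kept as strings, and `lrange` handles only non-negative indexes.

```python
from collections import deque

db = {}   # one flat keyspace, as in a single Redis database

def set_(k, v): db[k] = v
def get(k): return db[k]
def hset(k, field, v): db.setdefault(k, {})[field] = v
def hget(k, field): return db[k][field]
def hkeys(k): return list(db[k])
def sadd(k, member): db.setdefault(k, set()).add(member)
def scard(k): return len(db[k])
def rpush(k, v): db.setdefault(k, deque()).append(v)
def lpush(k, v): db.setdefault(k, deque()).appendleft(v)
def llen(k): return len(db[k])
def lrange(k, start, stop): return list(db[k])[start:stop + 1]  # stop is inclusive
def lpop(k): return db[k].popleft()
def rpop(k): return db[k].pop()

set_("x", "10")
hset("h", "y", "5")
hset("p:22", "name", "Alma"); hset("p:22", "age", "25")
sadd("s", "20"); sadd("s", "Alma")
rpush("l", "a"); rpush("l", "b"); lpush("l", "c")   # list is now (c, a, b)

x_value = get("x")
p22_fields = hkeys("p:22")
set_size = scard("s")
length = llen("l")
middle = lrange("l", 1, 2)
first = lpop("l")
last = rpop("l")
```

The values read back match the outputs listed on the slide, including the inclusive end index of `lrange`.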

• A value:
– Any <512MB binary string (e.g., JPEG image)
– List with up to 2^32 - 1 elements (more than 4 billion elements).

• Some key operations:
– Select database: select index (default index is 0)
– List all keys: keys *
– Remove all keys: flushall
– Check if a key exists: exists k

• You can configure the persistence model
– save m k means save every m seconds if at least k keys have changed

Redis: extra notes
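The `save m k` rule above is a simple threshold test, and several such rules can be active at once. A minimal sketch of the decision follows, assuming a hypothetical `should_snapshot` helper; the rule values are illustrative and not taken from the slides.

```python
def should_snapshot(rules, seconds_since_save, changed_keys):
    """rules is a list of (m, k) pairs, one per `save m k` line.

    A snapshot is due as soon as any rule is satisfied: at least m seconds
    have passed since the last save and at least k keys have changed.
    """
    return any(seconds_since_save >= m and changed_keys >= k for m, k in rules)

rules = [(900, 1), (300, 10), (60, 10000)]        # illustrative values

due_slow = should_snapshot(rules, 900, 1)         # one change, but 15 minutes passed
due_medium = should_snapshot(rules, 301, 10)
too_early = should_snapshot(rules, 59, 50000)     # many changes, but under a minute
```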

• Add-on module for managing multi-node applications over Redis

• Master-slave architecture for sharding + replication
– Multiple masters holding pairwise disjoint sets of keys, every master has a set of slaves for replication and sharding

http://redis.io/presentation/Redis_Cluster.pdf

Redis Cluster
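How the keys end up in pairwise disjoint sets is not spelled out on the slide. The Redis Cluster specification does it by mapping every key to one of 16384 hash slots using a CRC16 checksum and assigning ranges of slots to masters. The sketch below follows that rule but ignores hash tags (the `{...}` part of a key), and the three-master slot layout is a hypothetical example.

```python
import binascii

NUM_SLOTS = 16384

def key_slot(key: bytes) -> int:
    # binascii.crc_hqx with initial value 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(key, 0) % NUM_SLOTS

# Hypothetical layout: three masters with disjoint, contiguous slot ranges.
MASTERS = {
    "master-1": range(0, 5461),
    "master-2": range(5461, 10923),
    "master-3": range(10923, 16384),
}

def master_for(key: bytes) -> str:
    slot = key_slot(key)
    return next(name for name, slots in MASTERS.items() if slot in slots)

check_slot = key_slot(b"123456789")      # standard CRC16 check input
check_master = master_for(b"123456789")
```

Because every key hashes to exactly one slot and every slot belongs to exactly one master, the key sets held by the masters cannot overlap.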

Document store

• Similar in nature to key-value store, but value is tree structured as a Document

• Data model: store pairs ⟨key,Document⟩
• Motivation: avoid joins; ideally, all relevant joins already encapsulated in the document structure
• A document is an atomic object that cannot be split across servers
– But a document collection will be split
• Moreover, transaction atomicity is typically guaranteed within a single document

"Documents" are encoded in a standard data exchange format such as XML, JSON (JavaScript Object Notation) or BSON (Binary JSON). Unlike the simple key-value stores, the value column in document databases contains semi-structured data. A single column can house hundreds of such attributes, and the number and type of attributes recorded can vary from row to row. Also, unlike simple key-value stores, both keys and values are fully searchable in document databases.

Document store

Model generalizes column-family and key-value stores

• MongoDB
• Apache CouchDB
– Emphasizes Web access

• RethinkDB
– Optimized for highly dynamic application data

• RavenDB
– Designed for .NET, ACID

• Clusterpoint Server
– XML and JSON, a combined SQL/JavaScript QL

Example: Document store databases

• Open source, 1st release 2009, document store
– Actually, an extended format called BSON (binary JSON) for typing and better compression

• Supports replication (master/slave), sharding
– Developer provides the “shard key” – collection is partitioned by ranges of values of this key

• Consistency guarantees, CP of CAP
• Used by Adobe (experience tracking), Craigslist, eBay, FIFA (video game), LinkedIn, McAfee
• Provides connector to Hadoop
– Cloudera provides the MongoDB connector in distributions

MongoDB

• JavaScript Object Notation (JSON) model
• Database = set of named collections
• Collection = sequence of documents
• Document = BSON: {attribute_1: value_1, ..., attribute_k: value_k}
• Attribute = string (attribute_i ≠ attribute_j)
• Value = primitive value (string, number, date, ...), or a document, or an array
• Array = [value_1, ..., value_n]

• Key properties: hierarchical (like XML), no schema
– Collection docs may have different attributes

MongoDB Data Model

An example record from MongoDB, using JSON format, might look like

{
  "_id" : ObjectId("4fccbf281168a6aa3c215443"),
  "first_name" : "Thomas",
  "last_name" : "Jefferson",
  "address" : {
    "street" : "1600 Pennsylvania Ave NW",
    "city" : "Washington",
    "state" : "DC"
  }
}

Embedded object

Though records are called documents, they are not documents in the sense of a word processing document, although you can store binary data (using BSON format) in any of the fields in the document. You can also modify the structure of any document on the fly by adding and removing members from the document, either by reading the document into your program, modifying it and re-saving it, or by using various update commands.

MongoDB: Collection example


{
  item: "ABC2",
  details: { model: "14Q3", manufacturer: "M1 Corporation" },
  stock: [ { size: "M", qty: 50 } ],
  category: "clothing"
}
{
  item: "MNO2",
  details: { model: "14Q3", manufacturer: "ABC Company" },
  stock: [ { size: "S", qty: 5 }, { size: "M", qty: 5 }, { size: "L", qty: 1 } ],
  category: "clothing"
}

(docs.mongodb.org)

Collection inventory

Document insertion

db.inventory.insert(
  {
    item: "ABC1",
    details: { model: "14Q3", manufacturer: "XYZ Company" },
    stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
    category: "clothing"
  }
)

MongoDB: Collection example

Collection orders

{ _id: "a", cust_id: "abc123", status: "A", price: 25,
  items: [ { sku: "mmm", qty: 5, price: 3 }, { sku: "nnn", qty: 5, price: 2 } ] }
{ _id: "b", cust_id: "abc124", status: "B", price: 12,
  items: [ { sku: "nnn", qty: 2, price: 2 }, { sku: "ppp", qty: 2, price: 4 } ] }

db.orders.find(
  { status: "A" },                     (selection)
  { cust_id: 1, price: 1, _id: 0 }     (projection)
)

In SQL it would look like this:
SELECT cust_id, price FROM orders WHERE status="A"

Result:
{ cust_id: "abc123", price: 25 }

MongoDB: Simple Query
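The two arguments of `find` can be read as a filter followed by a field selection. The sketch below replays the query in plain Python over a list of dictionaries. It is an emulation for illustration, not the MongoDB driver: the `find` function here supports only equality conditions and keeps only the fields explicitly marked with 1 (the real system also returns `_id` unless it is set to 0).

```python
orders = [
    {"_id": "a", "cust_id": "abc123", "status": "A", "price": 25,
     "items": [{"sku": "mmm", "qty": 5, "price": 3}, {"sku": "nnn", "qty": 5, "price": 2}]},
    {"_id": "b", "cust_id": "abc124", "status": "B", "price": 12,
     "items": [{"sku": "nnn", "qty": 2, "price": 2}, {"sku": "ppp", "qty": 2, "price": 4}]},
]

def find(collection, selection, projection):
    """Keep documents matching every field in `selection`, then keep only projected fields."""
    keep = [field for field, on in projection.items() if on]
    return [
        {field: doc[field] for field in keep if field in doc}
        for doc in collection
        if all(doc.get(field) == value for field, value in selection.items())
    ]

result = find(orders, {"status": "A"}, {"cust_id": 1, "price": 1, "_id": 0})
```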

Collection orders

{ _id: "a", cust_id: "abc123", status: "A", price: 25 }
{ _id: "b", cust_id: "abc124", status: "B", price: 12 }
{ _id: "c", cust_id: "abc123", status: "A", price: 20 }

Collection PurchasesPerCustomer

{ _id: "abc123", price: 45 }
{ _id: "abc124", price: 12 }

Sum up the purchases per customer:

In SQL it would look like this:
SELECT cust_id, sum(price) FROM orders GROUP BY cust_id;

But orders are distributed all over...

We'll do it later

2 options now:
(1) Built-in MongoDB aggregates
(2) MapReduce + custom JS code (more flexible, less smart)

MongoDB: Map-reduce
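The map-reduce option can be sketched in plain Python: a map step emits a (customer, price) pair per order, the pairs are grouped by key, and a reduce step sums each group. The function names below are invented for the illustration; in a real deployment the map step would run on each shard and only the grouped pairs would travel over the network.

```python
from collections import defaultdict

orders = [
    {"_id": "a", "cust_id": "abc123", "status": "A", "price": 25},
    {"_id": "b", "cust_id": "abc124", "status": "B", "price": 12},
    {"_id": "c", "cust_id": "abc123", "status": "A", "price": 20},
]

def map_order(doc):
    """Emit one (key, value) pair per order."""
    yield doc["cust_id"], doc["price"]

def reduce_prices(key, values):
    """Combine all values emitted for one key."""
    return sum(values)

def map_reduce(docs, mapper, reducer):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)        # the "shuffle": group values by key
    return [{"_id": key, "price": reducer(key, values)} for key, values in groups.items()]

purchases_per_customer = map_reduce(orders, map_order, reduce_prices)
```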

• Similar to relational database model

• Structure:

– Column

– Super-column

– Column family

• Structure of database is defined by super-columns and column families.

• Data access is accomplished by specifying column family, key and column in order to get value, using following structure:

• <columnFamily>.<key>.<column> = <value>

Column family Model
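The access path `<columnFamily>.<key>.<column>` maps naturally onto nested dictionaries. In the sketch below the column family name `students` and the `get` helper are hypothetical; the point is only that each row carries its own set of columns and a lookup walks three levels.

```python
store = {
    "students": {                                   # column family
        "1": {"sid": 861, "name": "Alma", "address": "Haifa"},   # row key -> columns
        "3": {"sid": 955, "name": "Ahuva"},         # this row simply has no address column
    },
}

def get(store, path):
    """Resolve '<columnFamily>.<key>.<column>' to a value, or None if the column is absent."""
    column_family, key, column = path.split(".")
    return store[column_family][key].get(column)

name = get(store, "students.1.name")
missing = get(store, "students.3.address")

# Columns can be added to a single row at any time, without a schema change.
store["students"]["3"]["year"] = 2
year = get(store, "students.3.year")
```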

keyspace

Standard RDB

sid   name    address   year   faculty
861   Alma    Haifa     2      NULL
753   Amir    Jaffa     NULL   CS
955   Ahuva   NULL      2      IE

id   sid
1    861
2    753
3    955

id   name
1    Alma
2    Amir
3    Ahuva

id   address
1    Haifa
2    Jaffa

id   year
1    2
3    2

id   faculty
2    CS
3    IE

Column Store: each column stored separately (still SQL)

Why? Efficiency (fetch only required columns), compression, sparse data for free
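The decomposition shown in the tables above can be reproduced in a few lines. This is a sketch of the idea only, with hypothetical helper names `to_columns` and `project`; real column stores add compression and on-disk layouts that are not modelled here.

```python
rows = [
    {"sid": 861, "name": "Alma", "address": "Haifa", "year": 2, "faculty": None},
    {"sid": 753, "name": "Amir", "address": "Jaffa", "year": None, "faculty": "CS"},
    {"sid": 955, "name": "Ahuva", "address": None, "year": 2, "faculty": "IE"},
]

def to_columns(rows):
    """Store each column separately as {row id -> value}; NULLs take no space."""
    columns = {}
    for row_id, row in enumerate(rows, start=1):
        for name, value in row.items():
            if value is not None:
                columns.setdefault(name, {})[row_id] = value
    return columns

def project(columns, names, row_id):
    """Read only the requested columns for one row."""
    return {name: columns[name].get(row_id) for name in names}

columns = to_columns(rows)
year_column = columns["year"]
name_and_faculty = project(columns, ["name", "faculty"], 2)
```

A query that needs only `name` and `faculty` touches two of the five column maps, which is the "fetch only required columns" benefit mentioned above.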

column family:
1   sid:861   name:Alma    address:Haifa   ts:20
2   sid:753   name:Amir    address:Jaffa   ts:22
3   sid:955   name:Ahuva   ts:32

column family:
1   year:2       ts:26
2   faculty:CS   ts:25   email:{prime:c@d ext:c@e}
3   year:2   faculty:IE   ts:32   email:{prime:a@b ext:a@c}

(In the figure, a single name:value pair such as sid:861 is labeled “column”, and a nested group such as email:{...} is labeled “supercolumn”.)

Column-Family Store: NoSQL

(Cassandra model) timestamp for conflicts

Two Types of Column Store

• The two are often mixed as “column store” → confusion
– See Daniel Abadi’s blog

• Common idea: don’t keep a row in a consecutive block, split via projection
– Column store: each column is independent
– Column family store: each column family is independent

• Both provide some major efficiency benefits in common read-mainly workloads
– Given a query, load to memory only the relevant columns
– Columns can often be highly compressed due to value similarity
– Effective form for sparse information (no NULLs, no space)

Column store vs Column family store

• Column store (SQL):
– MonetDB (started 2002, Univ. Amsterdam)
– VectorWise (spawned from MonetDB)
– Vertica (M. Stonebraker)
– SAP Sybase IQ
– Infobright

• Column family store (NoSQL):
– Apache Cassandra
– Google’s BigTable (main inspiration to column families)
– Apache HBase (used by Facebook, LinkedIn, Netflix...)
– Hypertable

Example: Column store and Column-family store

• Initially developed by Facebook
– Open-sourced in 2008

• Used by 1500+ businesses, e.g., Comcast, eBay, GitHub, Hulu, Instagram, Netflix, Best Buy, ...

• Column-family store
– Supports key-value interface
– Provides a SQL-like CRUD interface: CQL

• Uses Bloom filters
– An interesting membership test that can have false positives but never false negatives; behaves well statistically

• BASE consistency model (AP)
– Gossip protocol (constant communication) to establish consistency
– Ring-based replication model

Apache Cassandra
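The Bloom filter mentioned above can be sketched in a few lines. This is a generic textbook version, not Cassandra's implementation: the bit-array size, the number of hash functions, and the use of SHA-256 to derive positions are all assumptions of the example.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)     # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # False means definitely absent; True means "possibly present".
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
stored = [f"row-{i}" for i in range(50)]
for key in stored:
    bf.add(key)

no_false_negatives = all(bf.might_contain(key) for key in stored)
```

A store can consult such a filter before touching disk: a negative answer safely skips the read, while a positive answer still has to be confirmed by the actual lookup.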

Cassandra Data Model

Columns are added and modified dynamically

Super-columns group columns under a common name

Cassandra Data Model

• Graph databases employ nodes, edges and properties • Based on graph theory

• Nodes represent entities • Edges are the lines that connect nodes to nodes • Properties are pertinent information that relate to nodes

Graph Model

These databases are used when data can be represented as graphs, for example social networks, criminal rings, gated communities, etc.

• Graph with nodes/edges marked with labels and properties (labeled property graph)
– neo4j (Java, 1st release 2010)
– Sparksee (DEX) (Java, 1st release 2008)
– InfiniteGraph (Java/C++, 1st release 2010)
– OrientDB (Java, 1st release 2010)

• Triple stores: Support W3C RDF and SPARQL, also viewed as graph databases
– MarkLogic, AllegroGraph, Blazegraph, IBM SystemG, Oracle Spatial & Graph, OpenLink Virtuoso, ontotext

Example: Graph databases

• Open source, written in Java
– First version released 2010

• Supports the Cypher query language
• Clustering support
– Replication and sharding through master-slave architectures

• Used by eBay, Walmart, Cisco, National Geographic, TomTom, Lufthansa, ...

neo4j


[Figure: example graph with callouts marking a node label, a property, a relationship direction, and a relationship name.]

Cypher Graph for Social Networks

Cypher Graph: E-mail Exchange


CREATE (alice:User {username:'Alice'}),
       (bob:User {username:'Bob'}),
       (charlie:User {username:'Charlie'}),
       (davina:User {username:'Davina'}),
       (edward:User {username:'Edward'}),
       (alice)-[:ALIAS_OF]->(bob)

Creating Graph Data

MATCH p = (email:Email {id:'6'})
      <-[:REPLY_TO*1..4]-(:Reply)<-[:SENT]-(replier)
RETURN replier.username AS replier, length(p) - 1 AS depth
ORDER BY depth

replier   depth
Davina    1
Bob       1
Charlie   2
Bob       3

Path Assignment

MATCH (bob:User {username:'Bob'})-[:SENT]->(email)-[:CC]->(alias),
      (alias)-[:ALIAS_OF]->(bob)
RETURN email

email
Node{id:"1",content:"..."}

Query Example
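The pattern in this query (an email Bob sent that is CC'd to one of his own aliases) can be traced by hand over a tiny in-memory graph. The Python sketch below is an emulation for illustration, not the neo4j driver: the users and the ALIAS_OF edge come from the CREATE statement shown earlier, while the email nodes and the SENT and CC edges are hypothetical data added so the pattern has something to match.

```python
nodes = {
    "alice": {"label": "User", "username": "Alice"},
    "bob": {"label": "User", "username": "Bob"},
    "charlie": {"label": "User", "username": "Charlie"},
    "e1": {"label": "Email", "id": "1"},     # hypothetical email node
    "e2": {"label": "Email", "id": "2"},     # hypothetical email node
}

edges = [  # (source, relationship, target)
    ("alice", "ALIAS_OF", "bob"),
    ("bob", "SENT", "e1"),                   # hypothetical
    ("e1", "CC", "alice"),                   # hypothetical: CC to Bob's alias
    ("bob", "SENT", "e2"),                   # hypothetical
    ("e2", "CC", "charlie"),                 # hypothetical: CC to an unrelated user
]

def out(node, relationship):
    """Targets reachable from `node` over one outgoing edge of the given type."""
    return [t for s, r, t in edges if s == node and r == relationship]

def emails_cc_to_own_alias(user):
    """(user)-[:SENT]->(email)-[:CC]->(alias), (alias)-[:ALIAS_OF]->(user)"""
    return [
        email
        for email in out(user, "SENT")
        if any(user in out(alias, "ALIAS_OF") for alias in out(email, "CC"))
    ]

matches = [nodes[e]["id"] for e in emails_cc_to_own_alias("bob")]
```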

Graph database

Discovering insurance fraud

http://info.neo4j.com/rs/neotechnology/images/Fraud%20Detection%20Using%20GraphDB%20-%202014.pdf


Graph representation of Insurance Fraud

Other Popular NoSQL Databases

• Replaced by Redis nowadays
• Designed to speed up dynamic web applications by alleviating database load
• RAM-resident key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering
• Simple interface
• Designed for quick deployment, ease of development
• APIs in many languages

Memcached

Why Redis beats Memcached

• A distributed key-value system

• Used at LinkedIn

• 10K-20K node operations/CPU

• Auto-sharding

• Graceful server failure handling

Voldemort

• Open source project by Apache Foundation

• Consists of two core components

– Hadoop Distributed File System (Storage)

– MapReduce (Compute)

• Column-oriented data store

• Java interface

• HBase is designed specifically to work with Hadoop

Hadoop / HBase

• Apache document-oriented store

• Written in Erlang

• RESTful JSON API

• Distributed, featuring robust, incremental replication with bi-directional conflict detection and management

CouchDB

• Native XML database designed to be used for petabyte-scale data stores

• ACID compliant

• Heavy use by federal agencies, document publishers and "high-variability" data

• Arguably the most successful NoSQL company

MarkLogic

• Open source native XML database

• Strong support for XQuery and XQuery extensions

• Heavily used by the Text Encoding Initiative (TEI) community and XRX/XForms communities

• Ideal for metadata management

• Integrated Lucene search and structured search

eXist

• Open Source
• Closely modeled after Google's Bigtable project
• High performance distributed data storage system
• Designed to support applications requiring maximum performance, scalability, and reliability
• Hypertable Query Language (HQL) that is syntactically similar to SQL

Hypertable

• The data is not structured or structure is changing

• You need to have a denormalized representation of your data

• You need massive write performance

• You need fast key-value access

• You need flexible schema/data types

• You need schema migration

• You need easier maintainability

When to use NoSQL ?

Date post:	15-May-2020