Some NoSQL

Date posted: 16-Jul-2015
Uploaded by: malk-zameth

By "SQL" we mean: relational DBs

So NoSQL means: non-relational

The biggest difference

So, what is relational?

Data is represented by Tables and Relations

Relational model, created by Edgar F. Codd at IBM in 1969

Data manipulation is done by queries, grouped into transactions

What is the problem with that?

"It does not scale"

A common, but bad, answer!

Why is it common?

ACID: the four properties of transactions in a relational database

Atomicity, Consistency, Isolation, Durability

Atomicity: each transaction is "all or nothing".

Consistency: each transaction brings the database from one valid state to another.

Isolation: concurrent transactions do not interfere with each other (the result is as if they had run one at a time).

Durability: once a transaction is committed, it remains so (even after a crash or shutdown).
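To make atomicity concrete, here is a minimal sketch using Python's built-in `sqlite3` module; the `accounts` table, the `transfer` function and the insufficient-funds rule are hypothetical. Used as a context manager, a `sqlite3` connection commits when the block succeeds and rolls back when an exception escapes, so the two updates either both happen or neither does.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    # Hypothetical schema: one row per account.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # "with conn" commits if the block succeeds and rolls back if an
    # exception escapes: the transfer is all or nothing.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Both UPDATEs above are undone by the rollback.
            raise ValueError("insufficient funds")


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    conn = open_bank()
    transfer(conn, "alice", "bob", 30)
    try:
        transfer(conn, "alice", "bob", 500)  # fails and is rolled back
    except ValueError:
        pass
    print(balances(conn))
```

After the failed transfer, the balances are exactly what the successful one left behind: the half-applied debit never becomes visible.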

Brewer's CAP theorem

Of Consistency, Availability and Partition tolerance: choose two.

You cannot scale without partition tolerance: no mainframe would be that big!

You cannot afford to ignore availability: "Sorry, customer, our service is down again!"

So, to scale, you have to drop strict consistency, which means some stale data is OK. That is a compromise: a risk to be considered and taken.

ACID means having consistency; to scale, you need an alternative

BASE

Basically Available, Soft state, Eventual consistency

Details here: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf

You drop Consistency for Eventual Consistency

That is the most important change for scaling purposes
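A toy simulation may help show what "eventual" means. This sketch assumes a single global counter as the write timestamp and last-write-wins conflict resolution; real systems use per-node clocks, vector clocks or other merge strategies, and the class names here are hypothetical. A write is acknowledged by one replica and propagated to the others later, so a read from another replica can return stale data until `sync()` runs.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # key -> (timestamp, value); the newest timestamp wins (last-write-wins).
    data: dict = field(default_factory=dict)

    def apply(self, key, ts, value):
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # replication messages not yet delivered
        self.clock = 0     # simplification: one global logical clock

    def write(self, key, value, via=0):
        self.clock += 1
        ts = self.clock
        # Acknowledged as soon as a single replica has it (availability).
        self.replicas[via].apply(key, ts, value)
        for i in range(len(self.replicas)):
            if i != via:
                self.pending.append((i, key, ts, value))

    def read(self, key, via):
        # May return stale data if replication has not happened yet.
        return self.replicas[via].read(key)

    def sync(self):
        # Deliver every pending message; timestamps make order irrelevant.
        for i, key, ts, value in self.pending:
            self.replicas[i].apply(key, ts, value)
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.write("cart:42", ["book"], via=0)
    print(store.read("cart:42", via=1))  # stale: the write has not arrived yet
    store.sync()
    print(store.read("cart:42", via=1))  # converged
```

Between the write and `sync()`, replica 1 answers with stale data; after it, every replica agrees.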

All that is true!

So if relational implies ACID, and ACID does not scale, then relational databases do not scale, right?

Wrong!

Why is the argument bad?

See Facebook: the world's biggest hive of data

Facebook uses several datastores

(polyglot persistence; we will get to that)

But most of Facebook's data is on MySQL, and it scales

You can make your relational data behave in a BASE way, given enough effort, time and money.

Should you? It depends on your data.

So what is the problem with the relational model?

The real one?

"If all you have is a hammer, everything looks like a nail." Abraham Maslow, The Psychology of Science, 1966, p. 15

We use it for non-relational data

Database (Complex Algorithms)

SQL Interpreter

SQL Generation (souped up concatenation)

Database Abstraction Layer (avoid lock in)

Model Translator (ORM usually)

Model Logic (the M in MVC)

Your App

Several layers just to force our data to be something else, AND to turn it back into our data!
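A hand-written stand-in for the "Model Translator" layer shows the round trip. This is a sketch with a hypothetical blog schema, using Python's built-in `sqlite3` and `json`; a real ORM automates the same steps. One nested value becomes rows in three tables and needs three queries to be reassembled, while a document or key-value store keeps it in one piece.

```python
import json
import sqlite3

# Hypothetical schema: one nested "post" object spread over three tables.
SCHEMA = """
CREATE TABLE posts    (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags     (post_id INTEGER REFERENCES posts(id), tag TEXT NOT NULL);
CREATE TABLE comments (post_id INTEGER REFERENCES posts(id), author TEXT, body TEXT);
"""


def save_post(conn, post):
    # Object -> rows: the nested value is split across tables.
    with conn:
        conn.execute("INSERT INTO posts (id, title) VALUES (?, ?)", (post["id"], post["title"]))
        conn.executemany("INSERT INTO tags (post_id, tag) VALUES (?, ?)",
                         [(post["id"], t) for t in post["tags"]])
        conn.executemany("INSERT INTO comments (post_id, author, body) VALUES (?, ?, ?)",
                         [(post["id"], c["author"], c["body"]) for c in post["comments"]])


def load_post(conn, post_id):
    # Rows -> object: three queries to rebuild what the app started with.
    (title,) = conn.execute("SELECT title FROM posts WHERE id = ?", (post_id,)).fetchone()
    tags = [t for (t,) in conn.execute(
        "SELECT tag FROM tags WHERE post_id = ? ORDER BY rowid", (post_id,))]
    comments = [{"author": a, "body": b} for a, b in conn.execute(
        "SELECT author, body FROM comments WHERE post_id = ? ORDER BY rowid", (post_id,))]
    return {"id": post_id, "title": title, "tags": tags, "comments": comments}


def save_post_as_document(store, post):
    # For contrast: a document/key-value store keeps the value whole.
    store[f"post:{post['id']}"] = json.dumps(post)


if __name__ == "__main__":
    post = {"id": 1, "title": "Some NoSQL", "tags": ["nosql", "databases"],
            "comments": [{"author": "ann", "body": "Nice!"}]}
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    save_post(conn, post)
    print(load_post(conn, 1) == post)
```

Every line of `save_post` and `load_post` is translation code that can hold bugs and costs time on each read and write.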

This adds bugs in each layer

This adds performance costs in all those translations

This adds integration costs

Ever spent dev time making those layers work?

This adds dev costs: you must jump through hoops to make your data behave relationally

What about NoSQL ?

Several data representations!

● Key-Value
● Document
● Column-Family
● Graph
● XML-based
● Object
● Grid
● Mixed (using several types)
● etc.

Key-Value: Redis, Riak, Couchbase, etc.

Key-Value Datastore: What is it?

You store keys (identifiers) and values (pretty much anything, serialized)

Just a quick way to store things under a name and recover them using that name.
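A minimal in-memory sketch of the idea, assuming JSON as the serialization format; the `KeyValueStore` class and the `session:` key prefix are hypothetical, not the API of Redis, Riak or any other product. The store only ever sees keys and opaque bytes.

```python
import json


class KeyValueStore:
    """Minimal in-memory key-value store: values are opaque serialized blobs."""

    def __init__(self) -> None:
        self._data: dict = {}

    def put(self, key: str, value) -> None:
        # Serialize on write; the store never looks inside the value.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key: str, default=None):
        # Recover the value by name only; no querying by content.
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


if __name__ == "__main__":
    store = KeyValueStore()
    store.put("session:abc123", {"user_id": 42, "cart": ["book-1"]})
    print(store.get("session:abc123")["cart"])
```

Finding, say, every cart that contains `book-1` would mean deserializing every value, which is why querying values is a reason to avoid this model.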

Key-Value Datastore: When to use?

● Dictionaries
● Session data
● User preferences
● Shopping cart
● Anything whose content you do not want to inspect or query.

Key-Value Datastore: When to avoid?

● You have relations
● You have multi-operation transactions
● You want to query the values
● You want to operate on sets of entries

Document: MongoDB, CouchDB, TerraStore, RavenDB, Lotus Notes, etc.

Document Datastore: What is it?

As with key-value, but your data is not amorphous: it is a document!

Each document behaves like a hash table: it has entries of a given kind that may themselves have entries (like an XML or JSON file).

Documents are schemaless: you have complete freedom over what goes inside them.
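A small sketch of the difference from key-value: documents of different shapes live side by side, and queries can look inside them. The `DocumentCollection` class is hypothetical; its dotted-path matching ("author.name") is only loosely modeled on the dot notation of stores such as MongoDB and is not their API.

```python
from typing import Any

_MISSING = object()  # sentinel: "this path does not exist in the document"


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path such as "author.name" into nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


class DocumentCollection:
    """In-memory collection of schemaless documents (nested dicts)."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def insert(self, doc_id: str, doc: dict) -> None:
        # No schema check: each document may have a different shape.
        self._docs[doc_id] = doc

    def get(self, doc_id: str):
        # Key-value style access by identifier still works.
        return self._docs.get(doc_id)

    def find(self, query: dict) -> list:
        # Unlike a plain key-value store, we look inside the values:
        # a document matches if every dotted path equals the given value.
        return [doc for doc in self._docs.values()
                if all(get_path(doc, path) == value for path, value in query.items())]


if __name__ == "__main__":
    posts = DocumentCollection()
    posts.insert("p1", {"title": "Hello", "author": {"name": "ann"}, "tags": ["intro"]})
    posts.insert("p2", {"title": "Specs", "author": {"name": "bob"}, "price": 9.5})
    print(posts.find({"author.name": "ann"}))
```

Documents lacking a queried field simply do not match, which is both the freedom and the risk of schema-free data.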

Document Datastore: When to use?

● When you have documents!
  ○ Blogs
  ○ CMS
● When freedom of schema is required
  ○ Analytics
  ○ E-commerce products
● When you wanted a key-value store but also wanted to query the values.

Document Datastore: When to avoid?

● You need complex/atomic transactions over different documents
  ○ in that case you may have a relation; you may need SQL after all!
● Schema-free usage renders your queries impossible.
● You want to enforce a schema.

Column-Family: HBase (on Hadoop), Cassandra, Amazon SimpleDB, Amazon DynamoDB, etc.

Column-Family Datastore: What is it?

Data in tables of rows and columns, like the relational model, but:
● Each row has a varying number of columns (hence the name)
● Each value (cell) is timestamped for comparison, expiry and conflict resolution.
● In masterless designs such as Cassandra, there is no master node; writes can be scaled by adding nodes.
● A column may contain another row.
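A single-node sketch of the data model, with hypothetical names: rows hold their own sets of columns, each value carries a timestamp used for last-write-wins conflict resolution, and an optional time-to-live makes cells expire. It does not model distribution or nested rows, and resolving equal timestamps in favour of the later arrival is a simplification.

```python
import time
from collections import defaultdict


class ColumnFamily:
    """Rows keyed by row key; each row holds its own set of timestamped columns."""

    def __init__(self):
        # row_key -> column_name -> (timestamp, value, expires_at or None)
        self._rows = defaultdict(dict)

    def write(self, row_key, column, value, timestamp=None, ttl=None):
        ts = time.time() if timestamp is None else timestamp
        expires_at = None if ttl is None else ts + ttl
        current = self._rows[row_key].get(column)
        # Last write wins: an older timestamp never overwrites a newer one.
        if current is None or ts >= current[0]:
            self._rows[row_key][column] = (ts, value, expires_at)

    def read_row(self, row_key, now=None):
        now = time.time() if now is None else now
        # Only live cells are returned; expired ones are filtered out.
        return {col: value
                for col, (_ts, value, exp) in self._rows.get(row_key, {}).items()
                if exp is None or exp > now}


if __name__ == "__main__":
    cf = ColumnFamily()
    cf.write("user:42", "email", "a@example.org", timestamp=100)
    cf.write("user:42", "email", "old@example.org", timestamp=90)  # older: ignored
    cf.write("user:42", "session", "tok", timestamp=100, ttl=60)   # expires at 160
    cf.write("user:7", "name", "bob", timestamp=100)                # different columns
    print(cf.read_row("user:42", now=120))
    print(cf.read_row("user:42", now=200))
```

Because writes only compare timestamps and never read-modify-write, concurrent writers rarely need to coordinate, which is what makes this model friendly to heavy write loads.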

Column-Family Datastore: When to use it?

● Logging
● Registering events
● Counters
● When you have massive concurrent writes with small chances of collisions (Facebook uses it for their internal messaging system)
● When your information has an expiry date

Column-Family Datastore: When to avoid it?

● You need ACID
● You need aggregate results (sums, averages, etc.)
● Your data is not tabular

Graph: Neo4j, Titan, FlockDB, OrientDB, etc.

Graph Datastore: What is it?

Data is represented by nodes (vertices, i.e. objects) connected by edges (relations).

The textbook definition of a graph.

The same data can represent several graphs.

Graph traversal may be persisted as a relation.
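A minimal sketch of graph storage and traversal, assuming an undirected "friend of" graph kept as adjacency sets; the `Graph` class and its method names are hypothetical, not the API of Neo4j or any other product. Breadth-first search finds a shortest path (useful for routing and dependencies), and a two-hop expansion gives friends-of-friends.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class Graph:
    """Undirected graph stored as adjacency sets (node -> neighbours)."""

    def __init__(self) -> None:
        self._adj: Dict[str, Set[str]] = {}

    def add_edge(self, a: str, b: str) -> None:
        self._adj.setdefault(a, set()).add(b)
        self._adj.setdefault(b, set()).add(a)

    def shortest_path(self, start: str, goal: str) -> Optional[List[str]]:
        # Breadth-first search: explores neighbours level by level,
        # so the first time we reach the goal the path is a shortest one.
        if start not in self._adj or goal not in self._adj:
            return None
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:  # walk back to the start
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for nxt in self._adj[node]:
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None

    def friends_of_friends(self, person: str) -> Set[str]:
        # Two hops away, excluding direct friends and the person.
        direct = self._adj.get(person, set())
        result: Set[str] = set()
        for friend in direct:
            result |= self._adj[friend]
        return result - direct - {person}


if __name__ == "__main__":
    g = Graph()
    for a, b in [("ann", "bob"), ("bob", "cid"), ("cid", "dan"), ("ann", "eve")]:
        g.add_edge(a, b)
    print(g.shortest_path("ann", "dan"))
    print(g.friends_of_friends("ann"))
```

In a relational schema, each extra hop would typically be another self-join on a "friendships" table; here it is just one more step of the traversal.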

Graph Datastore: When to use it?

Anywhere you should already be using graphs in your application:
● Any relations (in the relational-model sense) that carry no data
● Social relations (friend of, employee of, chief of, etc.)
● Dependencies
● Geographical data
● Routing, dispatching, etc.

Graph Datastore: When to avoid it?

Your application commonly writes to large sets of nodes (writing to many nodes at once is expensive)

Your relations carry payloads (in that case you need SQL)

Which one to choose?

The ones closer to your data

Yes, plural.

Polyglot Persistence: different datastores for different data

For each slice of data you want to store, ask which datastore model would best represent it

Stop nailing screws!

How do you diagnose the correct type of each data?

Linagora can help!

Questions?

