NoSQL in Perspective

Post on 17-May-2015

Jeff Smith jeffreyksmithjr@gmail.com

NoSQL on Wikipedia

92 databases 8 types 6 sub-types

Easy Questions

Is this a graph? Do I already have XML or JSON? Is this a caching problem?

Paul Graham on Programming Languages

Lisp

C

Math Problem

Lisp is just math. Math doesn't get stale. What in databases is just math?

Putting the R in RDBMSes

Relation

Attributes

Tuples
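The three terms on this slide can be made concrete with a minimal sketch. The Employee relation and its rows are hypothetical, chosen only for illustration: the attributes are the field names, each row is a tuple, and the relation is the set of those tuples.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    # Attributes: the named columns every tuple in the relation shares.
    name: str
    wage: int


# Relation: a set of tuples over the same attributes.
# Because it is a set, a repeated tuple is stored only once.
employees = {
    Employee("Ada", 50000),
    Employee("Grace", 62000),
    Employee("Ada", 50000),  # duplicate of the first tuple
}

print(Employee._fields)  # ('name', 'wage')
print(len(employees))    # 2
```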

Database Analogy

C is to Lisp as Relational Algebra is to Relational Calculus

C : Lisp :: Relational Algebra : Relational Calculus

Relational Algebra in Action

Relational Algebra:

R ⋉ S = { t : t ∈ R, s ∈ S, Fun(t ∪ s) }

SQL:

SELECT * FROM audience WHERE clue > 0;
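The semijoin in the relational algebra expression keeps each tuple of R that agrees with some tuple of S on every attribute they share. A minimal sketch follows, assuming rows are represented as dictionaries. The semijoin function, the talks table, and all row values are hypothetical and do not come from the talk.

```python
def semijoin(r, s):
    """Return the rows of r that match at least one row of s.

    Two rows match when they hold equal values for every attribute
    they have in common, so their union is still a function from
    attribute name to value.
    """
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append(t)
                break  # one matching row in s is enough
    return result


audience = [
    {"name": "Ann", "talk_id": 1},
    {"name": "Bob", "talk_id": 2},
    {"name": "Cy", "talk_id": 3},
]
talks = [
    {"talk_id": 1, "title": "NoSQL in Perspective"},
    {"talk_id": 3, "title": "MapReduce"},
]

matched = semijoin(audience, talks)
print([row["name"] for row in matched])  # ['Ann', 'Cy']
```

Only attributes of R appear in the result, which is what separates a semijoin from a natural join.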

Relational Calculus in Action?

Relational Calculus:

{ t : {name} | ∃ s : {name, wage} ( Employee(s) ∧ s.wage = 50.000 ∧ t.name = s.name ) }

Relevant Implemented Language:

This space under construction.
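The calculus expression describes the result set without saying how to compute it. A Python set comprehension has a similar declarative shape, though it is not an implementation of relational calculus. The sketch below assumes the slide's "50.000" means fifty thousand, and the sample rows are hypothetical.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    wage: int


# Assumption: "50.000" on the slide is read as fifty thousand.
TARGET_WAGE = 50_000

employee = {
    Employee("Ada", 50_000),
    Employee("Grace", 62_000),
    Employee("Linus", 50_000),
}

# Read as: the set of names such that some employee tuple s
# has s.wage equal to the target and that name.
names = {s.name for s in employee if s.wage == TARGET_WAGE}
print(sorted(names))  # ['Ada', 'Linus']
```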

Relational Model Utility

Essentially, all models are wrong, but some are useful.

- George E. P. Box

When relations are wrong

Sparse data

Irregular data

Poorly understood interrelationships

No definable indexes

Big data

No vertically scalable hardware

Papers Read Around the World

Google's BigTable: http://research.google.com/archive/bigtable.html

Amazon's Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

Lessons from Functional Programming

MapReduce: http://research.google.com/archive/mapreduce.html

MapReduce

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));

[1]
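The word-count pseudocode can be tried on one machine with a small sketch. This is not Google's implementation: run_mapreduce is a hypothetical single-process driver that stands in for the distributed shuffle, and the sample documents are made up.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield word, "1"


def reduce_fn(key, values):
    # key: a word, values: an iterable of counts encoded as strings
    return str(sum(int(v) for v in values))


def run_mapreduce(documents):
    # Shuffle step: group every intermediate value under its key.
    groups = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            groups[word].append(count)
    # Reduce step: one call per distinct key.
    return {word: reduce_fn(word, counts) for word, counts in groups.items()}


docs = {"a.txt": "to be or not to be", "b.txt": "to do"}
counts = run_mapreduce(docs)
print(counts["to"], counts["be"], counts["do"])  # 3 2 1
```

Because map_fn and reduce_fn keep no shared state, a real framework is free to run them on different machines, which is the lesson borrowed from functional programming.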

CAP Theorem

Consistency

Availability

Partition Tolerance

CAP Theorem?

Consistency

Availability

Partition Tolerance

Sacrifice Availability

Consistency Partition Tolerance

Then, sacrifice what?

Consistency

Partition Tolerance

Availability

PACELC

In the event of a Partition, does the system prioritize Availability or Consistency?

Else, does the system prioritize Latency or Consistency?

PACELC as a Tree

Partition: Availability or Consistency

Else: Latency or Consistency

Traditional RDBMSes: PC/EC

Partition: Consistency

Else: Consistency

Eventually Consistent: PA/EL

Partition: Availability

Else: Latency
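The two labels used so far, PC/EC and PA/EL, follow one pattern: the letter after P is the choice made during a partition, and the letter after E is the choice made otherwise. A minimal sketch that expands such a label is shown below. The describe function and CHOICES table are hypothetical helpers, not part of any database's API.

```python
CHOICES = {"A": "Availability", "C": "Consistency", "L": "Latency"}


def describe(label):
    """Expand a PACELC label such as 'PC/EC' or 'PA/EL'."""
    partition, otherwise = label.split("/")
    # partition looks like "PA" or "PC"; otherwise looks like "EL" or "EC".
    return {
        "partition": CHOICES[partition[1]],
        "else": CHOICES[otherwise[1]],
    }


print(describe("PC/EC"))  # {'partition': 'Consistency', 'else': 'Consistency'}
print(describe("PA/EL"))  # {'partition': 'Availability', 'else': 'Latency'}
```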

ELC: Replication Options

1. Update all nodes

2. Update the master node first

3. Update an arbitrary node first

Best of both worlds?

SQL

HadoopDB

MySQL Cluster

Riak Demo

N: persisted copies

R: read copies

W: write copies

Strong Consistency: R + W > N
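The rule R + W > N holds because any R nodes that are read and any W nodes that are written must then share at least one node, so every read sees the latest write. A minimal sketch that checks the rule and confirms the overlap by brute force is shown below. Both function names are hypothetical and neither is part of Riak's API.

```python
from itertools import combinations


def is_strongly_consistent(n, r, w):
    """Apply the rule from the slide: R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def quorums_always_overlap(n, r, w):
    """Check every possible read set against every possible write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


print(is_strongly_consistent(3, 2, 2))  # True
print(is_strongly_consistent(3, 1, 1))  # False
print(quorums_always_overlap(3, 2, 2))  # True
print(quorums_always_overlap(3, 1, 1))  # False
```

With N = 3, choosing R = 1 and W = 1 favors latency over consistency, while R = 2 and W = 2 gives strong consistency at the cost of waiting on a second node.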

Thanks

Jeff Smith jeffreyksmithjr@gmail.com