A little history A change in the air Database layouts CRUDy stuff Databases that I/we use Conclusion References
CS-695 NoSQL Database
Polyglot Persistence; Or, The Many Ways We Store Data
Dr. Chuck Cartledge
27 Aug. 2015
Table of contents I
1 A little history
2 A change in the air
3 Database layouts
4 CRUDy stuff
5 Databases that I/we use
6 Conclusion
7 References
Hammer and nails . . .
“. . . it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.”
Abraham H. Maslow [8]
Miscellanea
Origin of “polyglot . . . ”
Popularized by Neal Ford [4]:
Talked about software development
How things are evolving (SQL, XML, .NET, etc.)
How multi-threading is hard (concurrency, coordination, etc.)
Promoted the idea of enterprise development via Java and .NET
Take away: choose the right tool for the job.
Different languages will continue to exist because each is good at something and all are necessary.
Miscellanea
The world BC (Before Codd).
Databases existed before Edgar Codd.
Hierarchical approach – alive and well in our file system
Network approach – currently underpinning ideas for graph databases
These suffered because people had to know lots of details about how the database was implemented.
Miscellanea
The world after Codd.
Separate representation from implementation
Changes in database for optimization needn’t affect data queries
User interactions aren’t cluttered by “construction noise” (including indexing and sorting)
Codd’s relational data bank hides all implementation information.
Relational database management systems (RDBMS) hide information about how data is stored. The data language is independent of how data is stored [3].
Miscellanea
The world according to RDBMS.
Everything is neat and tidy
Everything can be defined in a set of tables that have relationships between them
If you make the database large enough, you can store anything and ask any question
Image from [10].
RDBMS reigned supreme for 30-40 years (starting in 1970). And then reality and Big Data started to hit.
How we turned and started to get to now.
And then things started changing.
Can’t point a finger at a specific incident, might be a critical mass.
The Internet made it easier to collect data.
A new generation of people thought about things in a different way.
The new data had three attributes: velocity, volume, variety [7].
New ways of looking at data encouraged new questions.
People wanted answers faster.
Many of these items couldn’t be supported by an RDBMS.
Make things faster.
Simple and complex ways
How to get more processing power to answer database questions?? Basically:
Scale up – buy faster CPU and more RAM
Scale out – buy more CPUs and get them to work in parallel
Scaling up with custom CPUs gets expensive very, very quickly.
Image from [9].
Commodity CPUs are almost a dime a dozen. Leading to clusters, network services, distributed applications, etc.
Make things faster.
Amdahl’s Law [1]
Division and measurement of serial and parallel operations appear time and again. (Shades of Mandelbrot.)
“Make the common fast.”
“Make the fast common.”
Understand what parts have to be done serially.
Understand what parts can be done in parallel.
Need to factor in “overhead” costs when computing speed up.
Make things faster.
Amdahl’s Law (A summary)
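Time for serial execution $\overset{\text{def}}{=} T(1)$
Portion that can NOT be parallelized $\overset{\text{def}}{=} B \in (0, 1]$
Number of parallel resources $\overset{\text{def}}{=} n$
$T(n) = T(1) \left( B + \frac{1}{n}(1 - B) \right)$
Speed up $\overset{\text{def}}{=} S(n)$
$S(n) = \frac{T(1)}{T(n)} = \frac{1}{B + \frac{1}{n}(1 - B)}$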
Dr. Gene Amdahl (circa 1960)
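A minimal Python sketch of the speed-up formula on this slide. The function name `amdahl_speedup` and the sample value B = 0.10 are illustrative assumptions, not part of the original slides, and the sketch ignores the "overhead" costs mentioned earlier.

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Return S(n) = 1 / (B + (1 - B) / n) for serial fraction B and n resources."""
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction (B) must be in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


if __name__ == "__main__":
    # Assumed example: 10% of the work must be done serially.
    for n in (1, 2, 8, 64, 1024):
        print(f"n={n:5d}  S(n)={amdahl_speedup(0.10, n):6.2f}")
```

As n grows the speed up approaches 1/B, so with B = 0.10 no number of extra CPUs gives more than a tenfold improvement.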
The questions changed.
We knew that we didn’t know.
Our questions and our data changed. RDBMS had limitations:
Supported ad hoc questions on predefined data
Didn’t support undefined or unstructured data
Could scale up not out, so database size was practically limited
SQL predicate calculus made logic awkward
RDBMS are very, very good at some things, but user needs were changing.
The questions changed.
What happens when we ask a different question??
When the RDBMS database was designed, we thought we knew what we wanted to know. That was then.
Now if we want to look at family relationships (parent, child, sibling, extended family, etc.)
We can add a column to the table for up/down relationships
We can add a column for side to side relationships
We can add a column for extended family relationships
The database doesn’t look like how we think about the problem.
When the data representation doesn’t match how we think, then something has to change.
A collection of different database layouts.
An RDBMS
Can add well formed data easily
Difficult to add new data fields or types
Each row is expected to have the same data
Supports unknown (ad hoc) queries well
Scales up not out
Popular RDBMS: Oracle, MySQL, MS SQL Server, PostgreSQL
The “King of the World” for a very long time. (A version lives in your phone.)
A collection of different database layouts.
A columnar database
Takes the idea of a row-oriented database and turns it on its side.
Can add new columns easily
Each row can use different columns
Scales up and out
Popular column-oriented databases: IBM DB2, Sybase IQ, Teradata
Image from [2].
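A small Python sketch of "turning a row-oriented layout on its side". The sample rows and the helper name `to_columns` are illustrative assumptions; real columnar databases use far more elaborate storage formats.

```python
# Row-oriented layout: one record per row; rows may use different columns.
ROWS = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Grace", "age": 45, "rank": "Rear Admiral"},
]


def to_columns(rows):
    """Pivot rows into {column_name: {row_id: value}}."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, {})[row["id"]] = value
    return columns


COLUMNS = to_columns(ROWS)

# Reading one column touches only that column's data.
total_age = sum(COLUMNS["age"].values())

# Adding a new column does not disturb the existing ones.
COLUMNS["email"] = {1: "ada@example.org"}
```

In this layout a row that lacks a column simply has no entry in it, which is one way to picture rows using different columns.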
A collection of different database layouts.
A Key-Value design
A number (called the key) locates all other data (the value[s]).
Use math on some data (may be more than one piece)
The math (hash function) returns one value (the key)
Use the key to find the rest of the data
Locating data can be fast
Hash function should return unique values
Popular Key-Value DBMS: Redis, Memcached, Amazon DynamoDB, Riak
Key-value databases are fast when using the hash function. Not so fast if you aren’t.
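The idea can be sketched in Python as a toy in-memory store. The class `TinyKeyValueStore`, its method names, and the choice of SHA-256 with 16 buckets are assumptions made for illustration; they do not describe how Redis, Memcached, DynamoDB, or Riak are implemented.

```python
import hashlib


class TinyKeyValueStore:
    """Toy key-value store: a hash of the key selects the bucket to search."""

    def __init__(self, bucket_count=16):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # The "math": hash the key and reduce it to a bucket index.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._buckets)
        return self._buckets[index]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Fast path: only one bucket is examined.
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default

    def find_by_value(self, wanted):
        # Slow path: without the key, every bucket must be scanned.
        return [k for bucket in self._buckets for k, v in bucket if v == wanted]
```

Looking up by key inspects a single bucket, while `find_by_value` has to walk the whole store, which is the "not so fast" case.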
A collection of different database layouts.
An Online Analytical Processing (OLAP) design
A way to visualize and analyze data using a “data cube” and basic functions:
1 Consolidation (roll-up) of the multi-dimensional data
2 Drill-down into the data
3 Slicing and dicing
Fast execution time
Incorporates aspects of navigational, hierarchical, and relational databases
Popular OLAP databases: Hyperion Solutions, Cognos, MicroStrategy, Applix
Image from [15].
Target users are business analysts and business process management.
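A rough Python sketch of the three basic functions on a tiny fact table. The sales figures, the dimension names, and the helpers `roll_up` and `slice_cube` are invented for illustration; OLAP products precompute and index these aggregates rather than scanning a list.

```python
from collections import defaultdict

DIMENSIONS = ("year", "region", "product")

# Each fact is (year, region, product, amount); the data is made up.
SALES = [
    (2014, "East", "widget", 10),
    (2014, "East", "gadget", 5),
    (2014, "West", "widget", 7),
    (2015, "East", "widget", 12),
    (2015, "West", "gadget", 3),
]


def roll_up(facts, keep):
    """Consolidate amounts over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[p] for p in positions)] += fact[-1]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [fact for fact in facts if fact[position] == value]


by_year = roll_up(SALES, ["year"])                    # roll-up
by_year_region = roll_up(SALES, ["year", "region"])   # drill-down
east_by_product = roll_up(slice_cube(SALES, "region", "East"), ["product"])  # slice
```

Drill-down here is just a roll-up that keeps more dimensions, so the same helper serves both directions.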
A collection of different database layouts.
A Graph design
A very different way to think about data.
Consists of two parts:
1 Node (something that exists as an entity in the database)
2 Arcs (something that describes a relationship between nodes)
You can have nodes without arcs. You cannot have arcs without nodes. Arcs can be unidirectional.
Popular graph databases: Neo4j, OrientDB, Titan, Giraph
Image from [6].
Questions are driven by the relationships between nodes rather than the nodes themselves.
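A minimal Python sketch of nodes and arcs, reusing the family-relationship example from earlier in the deck. The class `TinyGraph`, the relationship name `parent_of`, and the sample people are assumptions for illustration and are unrelated to any particular graph database's API.

```python
class TinyGraph:
    """Toy graph: nodes are entities, arcs are directed relationships."""

    def __init__(self):
        self.nodes = {}  # node name -> properties
        self.arcs = []   # (source, relationship, target)

    def add_node(self, name, **properties):
        self.nodes[name] = properties

    def add_arc(self, source, relationship, target):
        # Nodes may exist without arcs, but an arc needs both of its nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before an arc is added")
        self.arcs.append((source, relationship, target))

    def related(self, source, relationship):
        return [t for s, r, t in self.arcs if s == source and r == relationship]

    def siblings(self, name):
        # A relationship-driven question: who shares a parent with `name`?
        parents = [s for s, r, t in self.arcs if r == "parent_of" and t == name]
        found = []
        for parent in parents:
            for child in self.related(parent, "parent_of"):
                if child != name and child not in found:
                    found.append(child)
        return found
```

The sibling question is answered by following arcs, without adding any extra columns to the nodes themselves.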
A collection of different database layouts.
A document design
Document oriented databases can be “viewed,” and can have internal document databases (recursively).
Database is organized based on “tags”
Tag’s meaning is instance dependent
Tags can be nested (recursively)
Database structure may be XML based and represented in different ways
Popular document databases: MongoDB, CouchDB, Couchbase, MarkLogic
Sometimes document databases show up in unexpected places.
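A short Python sketch of nested tags, using a JSON-style document. The sample order document and the helper `tag_paths` are hypothetical and chosen only to show the recursion; they do not reflect any specific document database.

```python
import json

# A made-up document: tags ("order", "customer", ...) nest inside each other.
ORDER_JSON = """
{
  "order": {
    "id": 1001,
    "customer": {"name": "Ada", "city": "London"},
    "note": "gift wrap"
  }
}
"""


def tag_paths(document, prefix=""):
    """List every tag path in a nested document, walking it recursively."""
    paths = []
    for tag, value in document.items():
        path = f"{prefix}/{tag}"
        paths.append(path)
        if isinstance(value, dict):
            paths.extend(tag_paths(value, path))
    return paths


ORDER = json.loads(ORDER_JSON)
ORDER_PATHS = tag_paths(ORDER)
```

Another document in the same collection could carry entirely different tags, which is what "instance dependent" means in practice.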
Which design to use?
If I had a hammer, . . .
Questions to ask:
1 How much data will be in the database??
2 Will I be reading mostly??
3 Will I be writing mostly??
4 How accurate must the data be??
5 How many simultaneous readers and writers??
6 How robust/resilient must the database be??
7 How will the database be accessed??
8 What about ACID vs. BASE??
So many choices.
Which design to use?
ACID vs. BASE
One is a design principle, the other is counter marketing.
ACID [5]
1 A – Atomicity - all or nothing
2 C – Consistency - database is always valid
3 I – Isolation - concurrent operations behave as if run serially
4 D – Durable - the database is written to disk
A database action will complete completely.
BASE [12]
1 BA – Basically Available
2 S – Soft state - user guarantees consistency
3 E – Eventually consistent
A database action will probably complete eventually.
ACID comes with SQL. BASE comes with NoSQL.
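A small sketch of atomicity ("all or nothing") using Python's standard `sqlite3` module. The `accounts` table, the names, and the helper functions are assumptions for illustration; the sketch shows only the A in ACID, not isolation or durability.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


def transfer(conn, source, target, amount):
    """Move money between two existing accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target)
            )
            (remaining,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (source,)
            ).fetchone()
            if remaining < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False
```

When the overdraft check fails, both updates are rolled back together, so the balances are left exactly as they were before the call.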
Which design to use?
Consistency, Availability, Partition tolerance (CAP) Theorem
Sharing data in distributed systems is hard.
Data can be consistent across the system
Data can be available across the system
The system can continue to function if partitioned/split
You only get to choose two.
Image from [17].
RDBMS on a single machine means partition is undefined. Distributed systems only get two.
Create — darkness was on the face of the deep.
Ex nihilo nihil fit (out of nothing, nothing comes).
The CRUD approach doesn’t say what happened before the C.
RDBMS
CREATE DATABASE db_name;
CREATE TABLE table_name (column_name1 data_type(size), column_name2 data_type(size), . . . );
Columnar
CREATE DATABASE
CREATE table_name, column_name1, column_name2, ...;
Key-Value, Graph, Document
CREATE DATABASE
CREATE table_name
Graph, Document
CREATE DATABASE
Image from [11].
Implementation agnostic.
Create — darkness was on the face of the deep.
Create an entry
RDBMS
INSERT INTO table_name VALUES (value1, value2, value3, ...);
Columnar
PUT table_name, row_name, column_name1:, “value”;
Key-Value
ADD table_name, key_value, value;
Graph
CREATE relationship_name, vertex_name1, vertex_name2
Document
INSERT table_name (GML/XML/JSON “marked up” data)
Report — databases aren’t much good if you can’t get stuff out.
Report/retrieve an entry from the database
RDBMS
SELECT column_name, column_name FROM table_name;
Columnar
GET table_name, row_name1:, column_name:;
Key-Value
GET table_name, key_value;
Graph (pipe operations)
GET VERTEX|EDGE FILTER(expression) (. . . )
Document
FIND document_id
Update — things change.
Update an entry
RDBMS
UPDATE table_name SET column1=value1, column2=value2, ... WHERE some_column=some_value;
Columnar
DELETE FROM table_name WHERE [expression];
PUT table_name, row_name, column_name1:, “value”;
Key-Value
SET table_name, key_value, value;
Graph
GET VERTEX | EDGE FILTER(expression) (. . . ) REMOVE property ADD property
Document
UPDATE document_id value (same format as CREATE)
Delete — to remove that which once was.
Delete an entry
RDBMS
DELETE FROM table_name WHERE some_column=some_value;
Columnar
DELETE FROM table_name WHERE [expression];
Key-Value
DROP table_name, key_value;
Graph
GET VERTEX|EDGE FILTER(expression) (. . . ) REMOVE
Document
REMOVE document_id value
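The implementation-agnostic commands on the CRUD slides can be tried concretely. The sketch below runs one create/report/update/delete cycle against SQLite (through Python's standard `sqlite3` module) for the RDBMS case and against a plain dictionary standing in for a key-value store. The table `people`, the key `session:42`, and both function names are illustrative assumptions.

```python
import sqlite3


def relational_crud():
    """Return the table contents after the create, update, and delete steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, city TEXT)")  # what happens before the C
    snapshots = []

    conn.execute("INSERT INTO people VALUES (?, ?)", ("Ada", "London"))  # Create
    snapshots.append(conn.execute("SELECT name, city FROM people").fetchall())  # Report

    conn.execute("UPDATE people SET city = ? WHERE name = ?", ("Paris", "Ada"))  # Update
    snapshots.append(conn.execute("SELECT name, city FROM people").fetchall())

    conn.execute("DELETE FROM people WHERE name = ?", ("Ada",))  # Delete
    snapshots.append(conn.execute("SELECT name, city FROM people").fetchall())

    conn.close()
    return snapshots


def key_value_crud():
    """Same cycle with a dict standing in for a key-value store."""
    store = {}
    key = "session:42"
    snapshots = []

    store[key] = ["book"]              # ADD
    snapshots.append(store.get(key))   # GET

    store[key] = ["book", "pen"]       # SET replaces the whole value
    snapshots.append(store.get(key))

    del store[key]                     # DROP
    snapshots.append(store.get(key))
    return snapshots
```

Note that the key-value update replaces the whole value, whereas the SQL update changes one column of the matching row.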
Lots and they are hidden.
Shopping as an example
Firefox – SQLite for browser history
Shopping cart – Key-Value based on session ID
Recommended purchases – graph database
Credit card payment – SQL database
Excel record purchase – document
Save Excel file – hierarchical database
A continuum.
Things from a 50,000 foot perspective
[Figure: database types arranged along two axes. Data axis: messy to neat and tidy. Queries axis: rigid to ad-hoc. Types plotted: free text, K-V, Doc., OLAP, Col., RDBMS.]
A continuum.
Notional strengths and weaknesses
[Table: database types (RDBMS, K-V, Col., Doc., Graph) compared on ACID, BASE, ad-hoc queries, ∆ hardware, and hardware failure. Legend: supported; not supported by data model; no statement.]
Where can I get these things??
Popular open source databases
RDBMS – MySQL, PostgreSQL, SQLite
Key-Value – Redis, Memcached, Riak
Columnar – HBase, Accumulo, Hypertable
Document – MongoDB, CouchDB, Couchbase
Graph – Neo4j, OrientDB, Titan
Image from [16].
Open source does not mean free; your time costs money.
In summary . . .
What can we say??
1 Each type of database design fills a specific need/niche.
2 Each type could do the work of the others
  1 Each type has a data model tailored to its problem domain
  2 Performance is tied to the hardware (CPU and I/O)
RDBMS has been the King for a long time. Expect it to remain so due to inertia.
NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
by Sadalage and Fowler [14].
Book to be used and referred to during the course, ISBN 9780321826626.
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
by Redmond and Wilson [13].
A very nice and graspable tour of various NoSQL database types. Examples of each type are presented with exercises that can be completed in a weekend. Book to be used and referred to during the course, ISBN 9781934356920.
References I
[1] Gene M Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the Spring Joint Computer Conference, ACM, 1967, pp. 483–485.
[2] Dale Anderson, Column oriented database technologies, http://www.dbbest.com/blog/column-oriented-database-technologies/, 2012.
[3] Edgar F. Codd, A relational model of data for large shared data banks, Communications of the ACM 13 (1970), no. 6, 377–387.
[4] Neal Ford, Polyglot programming, http://memeagora.blogspot.com/2006/12/polyglot-programming.html, 2006.
[5] Jim Gray, The transaction concept: Virtues and limitations, Very Large Databases, vol. 81, 1981, pp. 144–154.
References II
[6] Andy Hogg, Whiteboard it the power of graph databases, http://www.computerweekly.com/feature/Whiteboard-it-the-power-of-graph-
2013.
[7] Doug Laney, 3D data management: Controlling data volume, velocity and variety, META Group Research Note 6 (2001).
[8] Abraham H. Maslow, The psychology of science, Henry Regency, 1966.
[9] Andrea Mauro, Storage scale-up vs. scale-out, http://vinfrastructure.it/2014/06/scale-out-vs-scale-in/, 2014.
[10] David Mertz, XML matters: Putting XML in context with hierarchical, relational, and object-oriented models, http://www.ibm.com/developerworks/library/x-matters8/, 2001.
References III
[11] Brian Panulla, If libraries were like relational databases, http://ghostednotes.com/2010/12/31/if-libraries-were-like-relational-
2010.
[12] Dan Pritchett, BASE: An ACID alternative, Queue 6 (2008), no. 3, 48–55.
[13] Eric Redmond and Jim R Wilson, Seven databases in seven weeks, Pragmatic Bookshelf, 2012.
[14] Pramod J Sadalage and Martin Fowler, NoSQL distilled, Pearson Education, 2012.
[15] DatabaseJournal Staff, Examples of SQL Server implementations, Database Journal (2010).
[16] Wikipedia Staff, Database, https://en.wikipedia.org/wiki/Database, 2015.
[17] Saeid Zebardast, Said experts, http://blog.zebardast.ir/, 2015.