The CIOs Guide to NoSQL 2012

Date post: 20-Aug-2015
Upload: dataversity
The CIO's Guide to NoSQL Dan McCreary July 12th 2012 Version 6
Transcript
Page 1: The CIOs Guide to NoSQL 2012

The CIO's Guide to

NoSQL

Dan McCreary

July 12th 2012 Version 6

Page 2: The CIOs Guide to NoSQL 2012

Agenda

• What is NoSQL?

• What Triggered the NoSQL Movement?

• How is NoSQL distinct from Big Data and Cloud Computing?

• Common Characteristics of NoSQL Systems

• Business Benefits of NoSQL

• Core NoSQL Concepts

• Selected NoSQL Implementations

• Recent NoSQL Developments

• Selecting the Right NoSQL System

• Next Step: Selecting the Right NoSQL Pilot Project

Copyright Kelly-McCreary & Associates, LLC 2

Page 3: The CIOs Guide to NoSQL 2012

Manning NoSQL Books

Page 4: The CIOs Guide to NoSQL 2012

Background for Dan McCreary

• Bell Labs

• NeXT Computer (Steve Jobs)

• Owner of Custom Object-Oriented Software Consultancy

• Federal data integration (National Information Exchange Model)

• Native XML/XQuery – 2006

• Advocate of NoSQL/XRX systems

• Working with Manning Publications on NoSQL Topic

Page 5: The CIOs Guide to NoSQL 2012

NoSQL Definition

The NoSQL movement is a set of concepts and technologies that allow the rapid and efficient processing of large data sets with a focus on performance and resiliency.

Page 6: The CIOs Guide to NoSQL 2012

Sample of NoSQL Jargon

Document orientation

Schema free

MapReduce

Horizontal scaling

Sharding and auto-sharding

Brewer's CAP Theorem

Consistency

Reliability

Partition tolerance

Single-point-of-failure

Object-Relational mapping

Key-value stores

Column stores

Document-stores

Memcached

Indexing

B-Tree

Configurable durability

Documents for archives

Functional programming

Document Transformation

Document Indexing and Search

Alternate Query Languages

Aggregates

OLAP

XQuery

MDX

RDF

SPARQL

Architecture Tradeoff Modeling

ATAM

Note that within the context of NoSQL many of these terms have different meanings!

Page 7: The CIOs Guide to NoSQL 2012

Selecting a Database…

"Selecting the right data storage solution is no longer a trivial task."

[Flowchart: Start; decision "Does it look like a document?" (Yes / No); outcomes "Use Microsoft Office" and "Use the RDBMS"; Stop]

Page 8: The CIOs Guide to NoSQL 2012

Pressures on SQL Only Systems

[Diagram labels: SQL, OLAP/BI/Data Warehouse, Social Networks, Scalability, Agile, Schema Free]

Page 9: The CIOs Guide to NoSQL 2012

Simplicity is a Virtue

• Many systems derive their strength by dramatically limiting the features in their system

• Simplicity allows database designers to focus on the primary business driver

• Examples:

– Touch screen interfaces

– Key-value data stores

Page 10: The CIOs Guide to NoSQL 2012

Historical Context

Mainframe Era

• 1 CPU

• COBOL and FORTRAN

• Punchcards and flat files

• $10,000 per CPU hour

MapReduce Era

• 10,000 CPUs

• Functional programming

• MapReduce "server farms"

• Pennies per CPU hour

Page 11: The CIOs Guide to NoSQL 2012

Two Approaches to Computation

11 Copyright 2010 Dan McCreary & Associates

Alonzo Church: make computations act like math functions.

John von Neumann: manage state with a program counter.

(1930s and 40s)

Which is simpler? Which is cheaper? Which will scale to 10,000 CPUs?

Page 12: The CIOs Guide to NoSQL 2012

Standard vs. MapReduce Prices

http://aws.amazon.com/elasticmapreduce/#pricing

John's Way Alonzo's Way

Page 13: The CIOs Guide to NoSQL 2012

MapReduce CPUs Cost Less!

[Bar chart: Cost Per CPU Hour (Cents), Standard CPU vs. MapReduce CPU]

http://aws.amazon.com/elasticmapreduce/#pricing

Cuts cost from 32 to 6 cents per CPU hour! Perhaps Alonzo was right!

Why? (hint: how "shareable" is this process)

Page 14: The CIOs Guide to NoSQL 2012

Perspectives

Native XML

OLAP MDX

Object Stores

Graph Stores

NoSQL for Web 2.0 and BigData

Perspective depends on your context

Page 15: The CIOs Guide to NoSQL 2012

Architectural Tradeoffs

"I want a fast car with good mileage."

"I want a scalable database with low cost that runs well on the 1,000 CPUs in our data center."

Page 16: The CIOs Guide to NoSQL 2012

NoSQL on Google Trends

Page 17: The CIOs Guide to NoSQL 2012

Recent History

• The term NoSQL became re-popularized around 2009

• Used for conferences of advocates of non-relational databases

• Became a contagious idea (a "meme")

• First of many "NoSQL meetups" in San Francisco organized by Johan Oskarsson

• Conversion from "No SQL" to "Not Only SQL" in recent years

Page 18: The CIOs Guide to NoSQL 2012

NoSQL and Web 2.0 Startups

• Many web 2.0 startups did not use Oracle or MySQL

• They built their own data stores, influenced by Amazon’s Dynamo and Google’s BigTable, in order to store and process huge amounts of data

• In the social community or cloud computing applications, most of these data stores became OpenSource software

Page 19: The CIOs Guide to NoSQL 2012

Google MapReduce

• 2004 paper that had a huge impact on the use of functional programming in the entire community

• Copied by many organizations, including Yahoo
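
The MapReduce idea can be sketched in a few lines of plain Python. This is only an illustration, not Google's implementation or Hadoop: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the word-count task are assumptions chosen for the example. The point is that the map and reduce steps are pure functions, so each call could in principle run on a different machine.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word; depends only on its input."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key; also free of side effects."""
    return key, sum(values)

def word_count(documents):
    # Every map_phase call is independent of the others.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    # Every reduce_phase call only sees the values for its own key.
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["NoSQL is not only SQL", "SQL is not going away"])
```

Because no step reads or writes shared state, the list of documents can be split across as many workers as are available without changing the result.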

Page 20: The CIOs Guide to NoSQL 2012

Google Bigtable Paper

• 2006 paper that gave focus to scalable databases

• Designed to reliably scale to petabytes of data and thousands of machines

Page 21: The CIOs Guide to NoSQL 2012

Amazon's Dynamo Paper

• Werner Vogels

• CTO - Amazon.com

• October 2, 2007

• Used to power Amazon's S3 service

• One of the most influential papers in the NoSQL movement

• Offered as a service (Amazon DynamoDB) in 2012

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.

Page 22: The CIOs Guide to NoSQL 2012

NoSQL "Meetups"

“NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.”

Computerworld magazine, July 1st, 2009

Page 23: The CIOs Guide to NoSQL 2012

Key Motivators

• Licensing RDBMS on multiple CPUs

• The Three "V"s

– Velocity – lots of data arriving fast

– Volume – web-scale BigData

– Variability – many exceptions

• Desire to escape rigid schema design

• Avoidance of complex Object-Relational Mapping (the "Vietnam" of computer science)

Page 24: The CIOs Guide to NoSQL 2012

Copyright 2008 Dan McCreary & Associates

Many Processes Today Are Driven By…

The constraints of yesterday…

Challenge:

Ask ourselves the question…

Does our current method of solving problems with tabular data…

Reflect the storage of the 1950s…

Or our actual business requirements?

What structures best solve the actual business problem?

Page 25: The CIOs Guide to NoSQL 2012

No-Shredding!

• Relational databases take a single hierarchical document and shred it into many pieces so it will fit in tabular structures

• Document stores prevent this shredding

My

Data

Page 26: The CIOs Guide to NoSQL 2012

Is Shredding Really Necessary?

• Every time you take hierarchical data and put it into a traditional database you have to put repeating groups in separate tables and use SQL “joins” to reassemble the data
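
A small sketch of what this shredding and reassembly looks like in practice, using Python's built-in `sqlite3` module. The purchase order, the table names (`orders`, `order_items`), and the column names are all hypothetical, made up for this illustration.

```python
import sqlite3

# A hierarchical document (hypothetical purchase order) with a repeating group.
order = {
    "order_id": 1,
    "customer": "Acme",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")

# "Shredding": the single document becomes rows in two tables.
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["order_id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["order_id"], item["sku"], item["qty"]) for item in order["items"]],
)

# Reassembly: a join brings the pieces back, then code rebuilds the hierarchy.
rows = conn.execute(
    "SELECT o.order_id, o.customer, i.sku, i.qty "
    "FROM orders o JOIN order_items i ON o.order_id = i.order_id "
    "ORDER BY i.rowid"
).fetchall()

rebuilt = {
    "order_id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}
```

A document store would keep `order` as one unit and hand it back unchanged, which is the translation work the slide argues can be avoided.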

Page 27: The CIOs Guide to NoSQL 2012

Object Relational Mapping

• T1 – HTML into Objects

• T2 – Objects into SQL Tables

• T3 – Tables into Objects

• T4 – Objects into HTML

[Diagram: Web Browser, Object Middle Tier, and Relational Database, connected by the translations T1 through T4]

Page 28: The CIOs Guide to NoSQL 2012

"The Vietnam of Applications"

• Object-relational mapping has become one of the most complex components of building applications today

• A "Quagmire" where many projects get lost

• Many "heroic efforts" have been made to solve the problem:

– Hibernate

– Ruby on Rails

• But sometimes the way to avoid complexity is to keep your architecture very simple

Page 29: The CIOs Guide to NoSQL 2012

Document Stores Need No Translation

• Documents in the database (JSON or XML)

• Documents in the application

• No object middle tier

• No "shredding"

• No reassembly

• Simple!

[Diagram: a Document in the Application Layer and a Document in the Database]

Page 30: The CIOs Guide to NoSQL 2012

The XML "Full Stack"

• XML lives in the web browser (XForms)

• REST interfaces

• XML in the database (Native XML, XQuery)

• XRX Web Application Architecture

• No translation!

[Diagram: Web Browser (XForms), REST interfaces, XML database]

Page 31: The CIOs Guide to NoSQL 2012

"Schema Free"

• Systems that automatically determine how to index data as the data is loaded into the database

• No a priori knowledge of data structure

• No need for up-front logical data modeling

– …but some modeling is still critical

• Adding new data elements or changing data elements is not disruptive

• Searching millions of records still has sub-second response time
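
To make "index data as it is loaded" concrete, here is a toy sketch in Python. The class `SchemaFreeStore` and its methods are invented for this example and do not correspond to any product's API; real systems use far more sophisticated index structures.

```python
from collections import defaultdict

class SchemaFreeStore:
    """Toy document store that indexes every field it sees; no schema is declared."""

    def __init__(self):
        self.documents = {}
        self.index = defaultdict(set)  # (path, value) -> set of document ids

    def _walk(self, node, path=""):
        """Yield (path, value) for every leaf value in a nested dict or list."""
        if isinstance(node, dict):
            for key, child in node.items():
                yield from self._walk(child, f"{path}/{key}")
        elif isinstance(node, list):
            for child in node:
                yield from self._walk(child, path)
        else:
            yield path, node

    def load(self, doc_id, document):
        """Store the document and index all of its leaf values on the way in."""
        self.documents[doc_id] = document
        for path, value in self._walk(document):
            self.index[(path, value)].add(doc_id)

    def find(self, path, value):
        """Answer from the index instead of scanning every document."""
        return sorted(self.index.get((path, value), set()))

store = SchemaFreeStore()
store.load(1, {"name": "Ann", "tags": ["cio", "nosql"]})
# A document with fields never seen before loads without any migration step.
store.load(2, {"name": "Bob", "address": {"city": "Minneapolis"}})
```

Here `store.find("/address/city", "Minneapolis")` is answered from the index even though no one declared an `address` field in advance.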

Page 32: The CIOs Guide to NoSQL 2012

Monoculture and Mono-architecture

Image Source: Wikipedia

Page 33: The CIOs Guide to NoSQL 2012

Eric Evans

“The whole point of seeking alternatives [to RDBMS systems] is that you need to solve a problem that relational databases are a bad fit for.”

Eric Evans, Rackspace

Page 34: The CIOs Guide to NoSQL 2012

Evolution of Ideas in OpenSource

• How quickly can new ideas be recombined into new database products?

• OpenSource software has proved to be the most efficient way to quickly recombine new ideas into new products

[Diagram labels: New Database Ideas (Schema-free, MapReduce, Auto-sharding, Cloud Computing); OpenSource; Proprietary Software; New Products (Product A, Product B)]

Page 35: The CIOs Guide to NoSQL 2012

Storage Architectural Patterns

Tables Trees

Triples Stars

Page 36: The CIOs Guide to NoSQL 2012

Finding the Right Match

Schema-Free

Mature Query Language

Standards Compliant

Use CMU's Architecture Tradeoff Analysis Method (ATAM)

Page 37: The CIOs Guide to NoSQL 2012

Avoidance of Unneeded Complexity

• Relational databases provide a variety of features to ALWAYS support strict data consistency

• Rich feature set and the ACID properties implemented by RDBMSs might be more than necessary for particular applications and use cases

Page 38: The CIOs Guide to NoSQL 2012

"One Size Fits…"

"One Size Does Not Fit All"

James Hamilton Nov. 3rd, 2009

http://perspectives.mvdirona.com/CommentView,guid,afe46691-a293-4f9a-8900-5688a597726a.aspx

Page 39: The CIOs Guide to NoSQL 2012

Different Thinking

Sequential Processing

• The output of any step can be used in the next step

• State must be carefully managed

Parallel Processing

• Each iteration of an XQuery FLWOR expression is an independent thread (no side effects)
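
The contrast can be sketched in Python rather than XQuery. The function `add_tax_cents` and the 7% rate are hypothetical; what matters is that the function has no side effects, so each item can be processed on any worker in any order.

```python
from concurrent.futures import ThreadPoolExecutor

def add_tax_cents(cents):
    """Pure function: the result depends only on the argument, no shared state."""
    return cents * 107 // 100

prices_cents = [1000, 2000, 3000]

# Sequential style: a running total is carried from one step to the next,
# so the steps must happen one after another.
running_total = 0
for cents in prices_cents:
    running_total += add_tax_cents(cents)

# Parallel style: every item is handled independently of the others.
with ThreadPoolExecutor(max_workers=3) as pool:
    taxed = list(pool.map(add_tax_cents, prices_cents))
```

Both styles produce the same numbers here; the difference is that only the second can be spread across many CPUs without coordinating shared state.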

Page 40: The CIOs Guide to NoSQL 2012

Cloud Computing

• High scalability

– Especially in the horizontal direction (multiple CPUs)

• Low administration overhead

– Simple web page administration

Page 41: The CIOs Guide to NoSQL 2012

Databases work well in the cloud

• Data warehousing specific databases for batch data processing and map/reduce operations

• Simple, scalable and fast key/value-stores

• Databases with a richer feature set than key/value-stores, filling the gap to traditional RDBMSs while offering good performance and scalability properties (such as document databases)

Page 42: The CIOs Guide to NoSQL 2012

Auto-Sharding

• When one database gets almost full it tells a "coordinator" system and the data automatically gets migrated to other systems

• Systems have "Partition Tolerance"

[Diagram: "Warning: Disk Full! Time to Shard." Before: one disk 90% full. After: two disks 45% full.]
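
The split described on this slide can be sketched as follows. This is a simplified, single-process model under stated assumptions: the `Coordinator` class, its range-based split, and the 90% threshold are illustrative choices, not the algorithm of any particular product.

```python
import bisect

class Coordinator:
    """Hypothetical coordinator that splits a shard once it passes a fill threshold."""

    def __init__(self, capacity=10, threshold=0.9):
        self.capacity = capacity    # how many keys one shard (disk) can hold
        self.threshold = threshold  # fill ratio that triggers a split
        self.lower_bounds = [""]    # smallest string key each shard accepts, sorted
        self.shards = [{}]          # one dict per shard, standing in for a disk

    def _locate(self, key):
        """Find the shard whose key range contains this key."""
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def put(self, key, value):
        i = self._locate(key)
        self.shards[i][key] = value
        if len(self.shards[i]) >= self.threshold * self.capacity:
            self._split(i)

    def get(self, key):
        return self.shards[self._locate(key)].get(key)

    def _split(self, i):
        """Migrate the upper half of a full shard's keys to a new shard."""
        keys = sorted(self.shards[i])
        middle = keys[len(keys) // 2]
        moved = {k: self.shards[i].pop(k) for k in keys if k >= middle}
        self.lower_bounds.insert(i + 1, middle)
        self.shards.insert(i + 1, moved)

coordinator = Coordinator(capacity=10, threshold=0.9)
for n in range(9):
    coordinator.put(f"k{n:02d}", n)
```

After the ninth insert the single shard reaches 90% of its capacity and is split in two, and reads through `get` still find every key because the coordinator tracks which range lives where.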

Page 43: The CIOs Guide to NoSQL 2012

Brewer's CAP Theorem

Consistency, Availability, Partition Tolerance

You cannot have all three, so pick two!

Page 44: The CIOs Guide to NoSQL 2012

Migrating to Partition Tolerance

Consistency

Availability Partition Tolerance

CP

AP

CA RDBMS

Page 45: The CIOs Guide to NoSQL 2012

Scale Up vs. Scale Out

Scale Up

• Make a single CPU as fast as possible

• Increase clock speed

• Add RAM

• Make disk I/O go faster

Scale Out

• Make many CPUs work together

• Learn how to divide your problems into independent threads

Page 46: The CIOs Guide to NoSQL 2012

Sample of NoSQL Systems

Document Stores Key-Value Stores

Graph Stores

XML

Object Stores

Column Stores

Memcache

Page 47: The CIOs Guide to NoSQL 2012

If you can't beat them…

Page 48: The CIOs Guide to NoSQL 2012

Key Value Stores

• A table with two columns and a simple interface

– Add a key-value

– For this key, give me the value

– Delete a key

• Blazingly fast and easy to scale
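
The three-operation interface above is small enough to sketch in full. The class below is an in-memory stand-in written for this illustration; the names `put`, `get`, and `delete` are assumptions, and real key-value stores add networking, persistence, and replication behind a similar surface.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store with only three operations."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        """Add a key-value pair (or overwrite the value for an existing key)."""
        self._table[key] = value

    def get(self, key, default=None):
        """For this key, give me the value."""
        return self._table.get(key, default)

    def delete(self, key):
        """Delete a key; deleting a missing key is not an error here."""
        self._table.pop(key, None)

kv = KeyValueStore()
kv.put("user:42:name", "Dan")
kv.put("user:42:cart", ["book", "pen"])
kv.delete("user:42:cart")
```

There is no query language and no join: the store never looks inside the value, which is part of why this model is easy to partition across many machines.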

Key Value

Page 49: The CIOs Guide to NoSQL 2012

Types of Key-Value Stores

• Eventually-consistent Key-Value Stores

• Hierarchical Key-Value Stores

• Key-Value Stores In RAM

• Key-Value Stores on Disk

• Ordered Key-Value Stores

Page 50: The CIOs Guide to NoSQL 2012

Cassandra

• Apache open source project

• Originally developed by Facebook

• Designed for highly distributed, highly reliable systems

• No single point of failure

• Column-family data model

http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf

Page 51: The CIOs Guide to NoSQL 2012

MongoDB

• Open Source License

• Document/Collection centric

• Sharding built-in, automatic

• Stores data in JSON format

• Query language is JSON

• Can be 10x faster than MySQL

• Many languages (C++, JavaScript, Java, Perl, Python etc.)

Page 52: The CIOs Guide to NoSQL 2012

Hadoop/HBase

• Open source implementation of MapReduce algorithm written in Java

• Initially created by Yahoo

– 300 person-years development

• Column-oriented data store similar to Google's BigTable

• Java interface

• HBase designed specifically to work with Hadoop and the Hadoop file system

Page 53: The CIOs Guide to NoSQL 2012

CouchDB

• Commercial Company

• Apache Project

• Written in Erlang

• RESTful JSON API

• Distributed, featuring robust, incremental replication with bi-directional conflict detection and management

Page 54: The CIOs Guide to NoSQL 2012

Memcached

• Free & open source in-memory caching system

• Designed to speed up dynamic web applications by alleviating database load

• RAM resident key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering

• Simple interface

• Designed for quick deployment, ease of development

• APIs in many languages
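
The way a cache like this alleviates database load can be sketched with the common cache-aside pattern. A plain Python dict stands in for the cache here; this is not the memcached client API, and `slow_database_lookup`, `get_user`, and the 60-second expiry are hypothetical names and values chosen for the example.

```python
import time

cache = {}    # stands in for a RAM-resident cache; not a real memcached client
db_calls = 0  # counts how often the "database" is actually hit

def slow_database_lookup(user_id):
    """Hypothetical expensive call that the cache is meant to spare."""
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id, ttl_seconds=60):
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]                     # cache hit: no database work
    value = slow_database_lookup(user_id)   # cache miss: go to the database
    cache[key] = (value, time.monotonic() + ttl_seconds)
    return value

first = get_user(7)
second = get_user(7)
```

The second call is served from memory, so the database sees one request instead of two; the expiry time bounds how stale a cached value can become.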

Page 55: The CIOs Guide to NoSQL 2012

MarkLogic

• Native XML database designed to be used for petabyte-scale data stores

• ACID compliant

• Role-based access control

• Heavy use by federal agencies, document publishers and "high-variability" data

• Arguably the most successful NoSQL company

Page 56: The CIOs Guide to NoSQL 2012

eXist

• OpenSource native XML database

• Strong support for XQuery and XQuery extensions

• Heavily used by the Text Encoding Initiative (TEI) community and XRX/XForms communities

• Ideal for metadata management

• Integrated Lucene search and structured search

Page 57: The CIOs Guide to NoSQL 2012

Riak

• Community and Commercial licenses

• A "Dynamo-inspired" database

• Written in Erlang

• Query JSON or Erlang

Page 58: The CIOs Guide to NoSQL 2012

Hypertable

• Open Source

• Closely modeled after Google's Bigtable project

• High performance distributed data storage system

• Designed to support applications requiring maximum performance, scalability, and reliability

• Hypertable Query Language (HQL) that is syntactically similar to SQL

Page 59: The CIOs Guide to NoSQL 2012

Selecting a NoSQL Pilot Project

• The "Goldilocks Pilot Project Strategy"

• Not too big, not too small, just the right size

• Duration

• Sponsorship

• Importance

• Skills

• Mentorship

Page 60: The CIOs Guide to NoSQL 2012

The Future of the NoSQL Movement

• Will data sets continue to grow at exponential rates?

• Will new system options become more diverse?

• Will new markets have different demands?

• Will some ideas be "absorbed" into existing RDBMS vendors' products?

• Will the NoSQL community continue to be the place where new database ideas and products are incubated?

• Will the job of doing high-quality architectural tradeoff analysis become easier?

Growth Diversity

Page 61: The CIOs Guide to NoSQL 2012

Start Finish

Using the Wrong Architecture

Credit: Isaac Homelund – MN Office of the Revisor

Page 62: The CIOs Guide to NoSQL 2012

Using the Right Architecture

Start Finish

Find ways to remove barriers to empowering the non-programmers on your team.

Page 63: The CIOs Guide to NoSQL 2012

Questions

Dan McCreary

President, Kelly-McCreary & Associates

[email protected]
