The CIO's Guide to
NoSQL
Dan McCreary
July 12th 2012 Version 6
Agenda
• What is NoSQL?
• What Triggered the NoSQL Movement?
• How is NoSQL distinct from Big Data and Cloud Computing?
• Common Characteristics of NoSQL Systems
• Business Benefits of NoSQL
• Core NoSQL Concepts
• Selected NoSQL Implementations
• Recent NoSQL Developments
• Selecting the Right NoSQL System
• Next Step: Selecting the Right NoSQL Pilot Project
Copyright Kelly-McCreary & Associates, LLC 2
Manning NoSQL Books
Background for Dan McCreary
• Bell Labs
• NeXT Computer (Steve Jobs)
• Owner of Custom Object-Oriented Software Consultancy
• Federal data integration (National Information Exchange Model)
• Native XML/XQuery – 2006
• Advocate of NoSQL/XRX systems
• Working with Manning Publications on NoSQL Topic
NoSQL Definition
The NoSQL movement is a set of concepts
and technologies that allow the rapid and
efficient processing of large data sets with a
focus on performance and resiliency.
Sample of NoSQL Jargon
Document orientation
Schema free
MapReduce
Horizontal scaling
Sharding and auto-sharding
Brewer's CAP Theorem
Consistency
Reliability
Partition tolerance
Single-point-of-failure
Object-Relational mapping
Key-value stores
Column stores
Document-stores
Memcached
Indexing
B-Tree
Configurable durability
Documents for archives
Functional programming
Document Transformation
Document Indexing and Search
Alternate Query Languages
Aggregates
OLAP
XQuery
MDX
RDF
SPARQL
Architecture Tradeoff Modeling
ATAM
Note that within the context of NoSQL many
of these terms have different meanings!
Selecting a Database…
"Selecting the right data storage solution is
no longer a trivial task."
(Flowchart: Start, then "Does it look like a document?" If yes, use Microsoft Office; if no, use the RDBMS. Stop.)
Pressures on SQL Only Systems
(Diagram: SQL at the center, under pressure from OLAP/BI/data warehouse, social networks, scalability, and agile, schema-free development.)
Simplicity is a Virtue
• Many systems derive their strength from dramatically limiting their features
• Simplicity allows database designers to focus on the primary business driver
• Examples:
– Touch-screen interfaces
– Key-value data stores
Historical Context
Mainframe Era
• 1 CPU
• COBOL and FORTRAN
• Punchcards and flat files
• $10,000 per CPU hour
MapReduce Era
• 10,000 CPUs
• Functional programming
• MapReduce "server farms"
• Pennies per CPU hour
Two Approaches to Computation
11 Copyright 2010 Dan McCreary & Associates
John von Neumann: manage state with a program counter.
Alonzo Church: make computations act like math functions.
Which is simpler? Which is cheaper? Which will scale to 10,000 CPUs?
1930s and 40s
Standard vs. MapReduce Prices
http://aws.amazon.com/elasticmapreduce/#pricing
(Pricing screenshot annotated "John's Way" and "Alonzo's Way".)
MapReduce CPUs Cost Less!
(Bar chart: cost per CPU hour in cents, StandardCPU vs. MapReduceCPU, on a 0 to 40 cent scale.)
http://aws.amazon.com/elasticmapreduce/#pricing
Cuts cost from 32 to 6 cents per CPU hour! Perhaps Alonzo was right!
Why? (Hint: how "shareable" is this process?)
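The arithmetic behind the headline uses only the two prices on the slide. The 10,000 CPU-hour job is a hypothetical workload, chosen to echo the "10,000 CPUs" of the MapReduce era:

```latex
\begin{aligned}
\text{savings} &= \frac{32 - 6}{32} = \frac{26}{32} \approx 0.81 \quad (\text{about } 81\%) \\
10{,}000 \text{ CPU-hours} \times \$0.32 &= \$3{,}200 \quad (\text{standard}) \\
10{,}000 \text{ CPU-hours} \times \$0.06 &= \$600 \quad (\text{MapReduce})
\end{aligned}
```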
Perspectives
• Native XML
• OLAP/MDX
• Object Stores
• Graph Stores
• NoSQL for Web 2.0 and BigData
Perspective depends on your context
Architectural Tradeoffs
"I want a fast car with good mileage."
"I want a scalable, low-cost database that runs
well on the 1,000 CPUs in our data center."
NoSQL on Google Trends
Recent History
• The term NoSQL was re-popularized around 2009
• Used to name conferences of non-relational database advocates
• Became a contagious idea, a "meme"
• First of many "NoSQL meetups" held in San Francisco, organized by Johan Oskarsson
• Shift from "No SQL" to "Not Only SQL" in recent years
NoSQL and Web 2.0 Startups
• Many Web 2.0 startups did not use Oracle or MySQL
• They built their own data stores, influenced by Amazon's Dynamo and Google's BigTable, to store and process huge amounts of data
• Most of these data stores, built for social networking and cloud computing applications, became OpenSource software
Google MapReduce
• A 2004 paper that had a huge impact on the use of functional programming across the entire community
• Copied by many organizations, including Yahoo
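To make the idea concrete, here is a single-process Python sketch of the map, shuffle, and reduce phases applied to word counting, the running example in the 2004 paper. The function names are hypothetical. A real framework runs many map and reduce tasks on separate machines. That is possible because `map_phase` and `reduce_phase` share no state.

```python
from collections import defaultdict


def map_phase(document):
    # Emit a (word, 1) pair for every word; touches no shared state.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group every emitted value under its key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(documents):
    # Each map_phase call is independent, so a cluster could run them in parallel.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = map_reduce(["the cat sat", "the dog sat"])
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```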
Google Bigtable Paper
• A 2006 paper that focused attention on scalable databases
• Bigtable is designed to reliably scale to petabytes of data and thousands of machines
Amazon's Dynamo Paper
• Werner Vogels
• CTO - Amazon.com
• October 2, 2007
• Used to power Amazon's S3 service
• One of the most influential papers in the NoSQL movement
• A related hosted service, Amazon DynamoDB, launched in 2012
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin,
Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”,
in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
NoSQL "Meetups"
“NoSQLers came to share how they had
overthrown the tyranny of slow, expensive
relational databases in favor of more
efficient and cheaper ways of managing
data.”
Computerworld magazine, July 1st, 2009
Key Motivators
• Cost of licensing an RDBMS on multiple CPUs
• The Three "V"s
– Velocity – lots of data arriving fast
– Volume – web-scale BigData
– Variability – many exceptions
• Desire to escape rigid schema design
• Avoidance of complex Object-Relational Mapping (the "Vietnam" of computer science)
Copyright 2008 Dan McCreary & Associates
Many Processes Today Are Driven By…
The constraints of yesterday…
Challenge:
Ask ourselves the question…
Does our current method of solving problems with tabular data…
Reflect the storage of the 1950s…
Or our actual business requirements?
What structures best solve the actual business problem?
No-Shredding!
• Relational databases take a single hierarchical document and shred it into many pieces so it will fit in tabular structures
• Document stores prevent this shredding
Is Shredding Really Necessary?
• Every time you take
hierarchical data and
put it into a traditional
database you have to
put repeating groups in
separate tables and
use SQL “joins” to
reassemble the data
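Here is a minimal illustration of shredding and reassembly. This Python sketch uses the standard-library `sqlite3` module with a hypothetical order document. The repeating group of line items goes into its own table, and a join puts the document back together. The table and column names are illustrative assumptions.

```python
import sqlite3

order = {
    "id": 1,
    "customer": "Acme",
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
db.execute(
    "CREATE TABLE order_items "
    "(order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER)"
)

# Shred: the repeating group goes to its own table, linked by order_id.
db.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
db.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)

# Reassemble: a join brings the pieces back together.
rows = db.execute(
    "SELECT o.id, o.customer, i.sku, i.qty FROM orders o "
    "JOIN order_items i ON i.order_id = o.id WHERE o.id = ? ORDER BY i.rowid",
    (1,),
).fetchall()

rebuilt = {
    "id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}
print(rebuilt == order)  # True
```

Every extra level of nesting adds another table and another join, which is the cost the slide is pointing at.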
Object Relational Mapping
• T1 – HTML into Objects
• T2 – Objects into SQL Tables
• T3 – Tables into Objects
• T4 – Objects into HTML
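(Diagram: Web Browser, Object Middle Tier, and Relational Database, with T1 and T2 carrying data toward the database and T3 and T4 carrying it back to the browser.)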
"The Vietnam of Applications"
• Object-relational mapping has become one of the most complex components of building applications today
• A "Quagmire" where many projects get lost
• Many "heroic efforts" have been made to solve the problem:
– Hibernate
– Ruby on Rails
• But sometimes the way to avoid complexity is to keep your architecture very simple
Document Stores Need No Translation
• Documents in the database (JSON or XML)
• Documents in the application
• No object middle tier
• No "shredding"
• No reassembly
• Simple!
(Diagram: the same document in the application layer and in the database.)
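The slide's point can be illustrated with a Python sketch in which a plain dictionary of JSON strings stands in for a document database. The `collection`, `save`, and `load` names are hypothetical, not any product's API. The document goes in and comes back out as one unit: no shredding, no middle-tier mapping, no reassembly.

```python
import json

# Hypothetical stand-in for a document collection: document id -> JSON text.
collection = {}


def save(doc):
    # The whole document is stored as one unit; nothing is split into tables.
    collection[doc["id"]] = json.dumps(doc)


def load(doc_id):
    # The application gets back exactly the structure it saved.
    return json.loads(collection[doc_id])


order = {"id": 1, "customer": "Acme",
         "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}]}
save(order)
print(load(1) == order)  # True
```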
The XML "Full Stack"
• XML lives in the web browser (XForms)
• REST interfaces
• XML in the database (Native XML, XQuery)
• XRX Web Application Architecture
• No translation!
(Diagram: XForms in the web browser exchanging XML with the XML database through REST interfaces.)
"Schema Free"
• Systems that automatically determine how to
index data as the data is loaded into the
database
• No a priori knowledge of data structure
• No need for up-front logical data modeling
– …but some modeling is still critical
• Adding new data elements or changing data
elements is not disruptive
• Searching millions of records still has sub-
second response time
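One way a system can "automatically determine how to index data" is to index every leaf value under its path as each document arrives. The Python sketch below is a simplified, hypothetical illustration of that idea; real products use far more sophisticated index structures. The second document adds new fields without any schema change.

```python
from collections import defaultdict

index = defaultdict(set)  # (field path, leaf value) -> ids of matching documents
documents = {}


def _walk(value, path):
    # Yield (path, leaf value) for every leaf in a nested dict/list document.
    if isinstance(value, dict):
        for key, child in value.items():
            yield from _walk(child, f"{path}/{key}" if path else key)
    elif isinstance(value, list):
        for child in value:
            yield from _walk(child, path)
    else:
        yield path, value


def add_document(doc_id, doc):
    # Index as the data is loaded; no up-front schema is needed.
    documents[doc_id] = doc
    for path, leaf in _walk(doc, ""):
        index[(path, leaf)].add(doc_id)


def find(path, value):
    return sorted(index.get((path, value), set()))


add_document(1, {"name": "Ann", "address": {"city": "Minneapolis"}})
add_document(2, {"name": "Bob", "tags": ["xml", "nosql"],
                 "address": {"city": "Minneapolis", "zip": "55401"}})
print(find("address/city", "Minneapolis"))  # [1, 2]
print(find("tags", "nosql"))                # [2]
```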
Monoculture and Mono-architecture
Image Source: Wikipedia
Eric Evans
“The whole point of seeking alternatives
[to RDBMS systems] is that you need to
solve a problem that relational databases
are a bad fit for.”
Eric Evans, Rackspace
Evolution of Ideas in OpenSource
• How quickly can new ideas be recombined into new database products?
• OpenSource software has proved to be the most efficient way to quickly recombine new ideas into new products
(Diagram: new database ideas such as schema-free design, MapReduce, auto-sharding, and cloud computing being recombined into new products, through both proprietary software (Product A, Product B) and OpenSource.)
Storage Architectural Patterns
• Tables
• Trees
• Triples
• Stars
Finding the Right Match
Schema-Free
Mature Query Language
Standards Compliant
Use CMU's Architecture Tradeoff Analysis Method (ATAM)
Avoidance of Unneeded Complexity
• Relational databases provide a variety of
features to ALWAYS support strict data
consistency
• The rich feature set and ACID properties implemented by RDBMSs may be more than particular applications and use cases need
"One Size Fits…"
"One Size Does Not Fit All"
James Hamilton Nov. 3rd, 2009
http://perspectives.mvdirona.com/CommentView,guid,afe46691-a293-4f9a-8900-5688a597726a.aspx
Different Thinking
Sequential Processing
• The output of any step can be used in the next step
• State must be carefully managed
Parallel Processing
• Each iteration of an XQuery FLWOR expression can run as an independent thread (no side effects)
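The contrast can be sketched in Python, standing in for XQuery here. The sequential loop carries a running total from step to step. The parallel version maps a side-effect-free function over the records, so the calls can be handed to separate workers. The `score` function and the sample records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def score(record):
    # Pure function: the result depends only on its argument; no shared state.
    return record["price"] * record["qty"]


records = [{"price": 3, "qty": 2}, {"price": 5, "qty": 1}, {"price": 2, "qty": 4}]

# Sequential style: each step reads and updates state left by the previous step.
total = 0
for record in records:
    total += score(record)

# Parallel style: every call is independent, so they can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_total = sum(pool.map(score, records))

print(total, parallel_total)  # 19 19
```

In CPython, threads do not speed up CPU-bound work because of the global interpreter lock. `ProcessPoolExecutor` offers the same `map` interface using separate processes. What matters here is the shape of the code: no call depends on another.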
Cloud Computing
• High scalability
– Especially in the horizontal direction (multiple CPUs)
• Low administration overhead
– Simple web page administration
Databases work well in the cloud
• Data-warehousing databases for batch data processing and MapReduce operations
• Simple, scalable, and fast key/value stores
• Databases with a richer feature set than key/value stores, filling the gap with traditional RDBMSs while offering good performance and scalability (such as document databases)
Auto-Sharding
• When one database is almost full, it notifies a "coordinator" system and data is automatically migrated to other systems
• Systems have "Partition Tolerance"
(Diagram: "Warning: Disk Full! Time to shard." Before: one disk 90% full. After: two disks 45% full.)
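The before/after picture can be sketched in Python as range-based sharding. A hypothetical `Coordinator` routes each key to the shard that owns its key range. When a shard reaches an assumed 90% fill threshold, the coordinator moves the upper half of that shard's keys to a new shard. This is a single-process illustration only; real systems differ in how they choose split points and move data. Keys are assumed to be strings.

```python
from bisect import bisect_right


class Shard:
    """Hypothetical storage node holding one contiguous range of keys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}

    def fill(self):
        return len(self.data) / self.capacity


class Coordinator:
    """Routes keys to shards and splits a shard that gets too full."""

    SPLIT_AT = 0.9  # assumed threshold, matching the slide's "90% full"

    def __init__(self, capacity):
        self.capacity = capacity
        self.lower_bounds = [""]  # smallest key each shard owns, kept sorted
        self.shards = [Shard(capacity)]

    def _index(self, key):
        # The owning shard has the greatest lower bound that is <= key.
        return bisect_right(self.lower_bounds, key) - 1

    def put(self, key, value):
        i = self._index(key)
        self.shards[i].data[key] = value
        if self.shards[i].fill() >= self.SPLIT_AT:
            self._split(i)

    def get(self, key):
        return self.shards[self._index(key)].data.get(key)

    def _split(self, i):
        # Migrate the upper half of the full shard's keys to a new shard.
        old = self.shards[i]
        keys = sorted(old.data)
        upper = keys[len(keys) // 2:]
        new = Shard(self.capacity)
        for k in upper:
            new.data[k] = old.data.pop(k)
        self.lower_bounds.insert(i + 1, upper[0])
        self.shards.insert(i + 1, new)


coord = Coordinator(capacity=20)
for n in range(18):
    coord.put(f"key{n:02d}", n)
print([f"{s.fill():.0%}" for s in coord.shards])  # ['45%', '45%']
```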
Brewer's CAP Theorem
Consistency
Availability Partition Tolerance
You cannot have all three, so pick two!
Migrating to Partition Tolerance
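(Diagram: the CAP triangle of Consistency, Availability, and Partition Tolerance, with the RDBMS on the CA side and CP and AP as the partition-tolerant alternatives.)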
Scale Up vs. Scale Out
Scale Up
• Make a single CPU as fast as possible
• Increase clock speed
• Add RAM
• Make disk I/O go faster
Scale Out
• Make many CPUs work together
• Learn how to divide your problems into independent threads
Sample of NoSQL Systems
• Document Stores
• Key-Value Stores
• Graph Stores
• XML
• Object Stores
• Column Stores
• Memcache
If you can't beat them…
Key Value Stores
• A table with two columns
and a simple interface
– Add a key-value
– For this key, give me the
value
– Delete a key
• Blazingly fast and easy to
scale
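The three-operation interface above fits in a few lines. This Python class is a hypothetical in-memory sketch, not any product's API. A Python dictionary plays the role of the two-column table.

```python
class KeyValueStore:
    """In-memory sketch of a key-value store's three operations."""

    def __init__(self):
        self._table = {}  # the "table with two columns": key -> value

    def put(self, key, value):
        # Add a key-value pair (or overwrite the existing value).
        self._table[key] = value

    def get(self, key):
        # For this key, give me the value (None if absent).
        return self._table.get(key)

    def delete(self, key):
        # Delete a key; deleting a missing key is a no-op.
        self._table.pop(key, None)


store = KeyValueStore()
store.put("user:42", '{"name": "Ann"}')
print(store.get("user:42"))  # {"name": "Ann"}
store.delete("user:42")
print(store.get("user:42"))  # None
```

Every operation touches exactly one key, so keys can be spread across many machines (for example, by hashing them). This is a large part of why such stores are easy to scale.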
Types of Key-Value Stores
• Eventually-Consistent Key-Value Stores
• Hierarchical Key-Value Stores
• Key-Value Stores in RAM
• Key-Value Stores on Disk
• Ordered Key-Value Stores
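An ordered key-value store keeps its keys sorted, so a range of keys can be scanned efficiently. Here is a minimal Python sketch, assuming a hypothetical `OrderedKV` class built on the standard-library `bisect` module. The date-keyed readings are made-up sample data.

```python
from bisect import bisect_left, insort


class OrderedKV:
    """Key-value store that keeps keys sorted to support range scans."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            insort(self._keys, key)  # insert while keeping the list sorted
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, end):
        # Yield (key, value) pairs with start <= key < end, in key order.
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


kv = OrderedKV()
for day, reading in [("2012-07-03", 71), ("2012-07-01", 68), ("2012-07-02", 75)]:
    kv.put(day, reading)
print(list(kv.scan("2012-07-01", "2012-07-03")))
# [('2012-07-01', 68), ('2012-07-02', 75)]
```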
Cassandra
• Apache open source project
• Originally developed by Facebook
• Designed for highly distributed, highly reliable systems
• No single point of failure
• Column-family data model
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
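A column-family model can be pictured as a sparse, nested map: column family, then row key, then a set of named columns that may differ from row to row. The Python sketch below is a simplified, hypothetical in-memory picture, not Cassandra's API. It resolves conflicting writes to the same column by keeping the value with the newest timestamp, a last-write-wins rule similar in spirit to Cassandra's timestamp-based resolution.

```python
from collections import defaultdict

# column family -> row key -> {column name: (value, timestamp)}
store = defaultdict(lambda: defaultdict(dict))


def insert(family, row_key, column, value, timestamp):
    columns = store[family][row_key]
    current = columns.get(column)
    # Last write wins: keep whichever value has the newest timestamp.
    if current is None or timestamp >= current[1]:
        columns[column] = (value, timestamp)


def get_row(family, row_key):
    # Return just the column values for one row, without timestamps.
    return {name: value for name, (value, _) in store[family][row_key].items()}


insert("users", "ann", "email", "ann@example.com", timestamp=1)
insert("users", "bob", "city", "St. Paul", timestamp=1)  # different columns per row
insert("users", "ann", "email", "ann@old.example", timestamp=0)  # older write ignored
print(get_row("users", "ann"))  # {'email': 'ann@example.com'}
print(get_row("users", "bob"))  # {'city': 'St. Paul'}
```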
MongoDB
• Open Source License
• Document/Collection centric
• Sharding built-in, automatic
• Stores data in a JSON-like binary format (BSON)
• Query language is JSON
• Can be 10x faster than MySQL for some workloads
• Drivers for many languages (C++, JavaScript, Java, Perl, Python, etc.)
Hadoop/HBase
• Open source implementation of the MapReduce framework, written in Java
• Development initially driven largely by Yahoo
– 300 person-years of development
• HBase: column-oriented data store modeled on Google's Bigtable
• Java interface
• HBase is designed specifically to work with Hadoop and the Hadoop Distributed File System (HDFS)
CouchDB
• Commercial company backing
• Apache project
• Written in Erlang
• RESTful JSON API
• Distributed, featuring robust, incremental replication with bi-directional conflict detection and management
Memcached
• Free & open source in-memory caching system
• Designed to speed up dynamic web applications by alleviating database load
• RAM resident key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering
• Simple interface
• Designed for quick deployment, ease of development
• APIs in many languages
MarkLogic
• Native XML database designed for petabyte-scale data stores
• ACID compliant
• Role-based access control
• Heavy use by federal agencies, document
publishers and "high-variability" data
• Arguably the most successful NoSQL
company
eXist
• OpenSource native XML database
• Strong support for XQuery and XQuery
extensions
• Heavily used by the Text Encoding Initiative
(TEI) community and XRX/XForms communities
• Ideal for metadata management
• Integrated Lucene search and structured search
Riak
• Community and Commercial licenses
• A "Dynamo-inspired" database
• Written in Erlang
• Queries in JSON or Erlang
Hypertable
• Open Source
• Closely modeled after Google's Bigtable project
• High performance distributed data storage system
• Designed to support applications requiring maximum performance, scalability, and reliability
• Hypertable Query Language (HQL), syntactically similar to SQL
Selecting a NoSQL Pilot Project
• The "Goldilocks Pilot
Project Strategy"
• Not too big, not too small, just the right size
• Duration
• Sponsorship
• Importance
• Skills
• Mentorship
The Future of the NoSQL Movement
• Will data sets continue to grow at exponential rates?
• Will new system options become more diverse?
• Will new markets have different demands?
• Will some ideas be "absorbed" into existing RDBMS vendors' products?
• Will the NoSQL community continue to be the place where new database ideas and products are incubated?
• Will the job of doing high-quality architectural tradeoff analysis become easier?
Using the Wrong Architecture
(Illustration: a path from Start to Finish.)
Credit: Isaac Homelund – MN Office of the Revisor
Using the Right Architecture
(Illustration: a path from Start to Finish.)
Find ways to remove barriers to empowering the non-programmers on your team.
Questions
Dan McCreary
President, Kelly-McCreary & Associates