Date posted: 11-May-2015
Category: Data & Analytics
Uploaded by: koverse-inc
NoSQL Databases and Analytic Use Cases
Aaron Cordova INFORMS
NoSQL
• Perhaps better is “Non-Relational”
• Departure from conventional relational db
• Trade traditional features for simplicity, scalability, flexibility
Types of NoSQL DBs
Columnar
• BigTable, HBase, Accumulo, Cassandra
Graph
• Neo4j, OrientDB
Key-Value
• Dynamo, Riak, Voldemort, BerkeleyDB
Document
• MongoDB, CouchDB, MarkLogic (XML)
Trades
Give up
• Cross-row Transactions
• Relational JOINs
• Type Checking
• SQL
Gain
• Simplicity
• Scalability (distributed)
• Schema Flexibility
• Geographic distribution
• Programmatic APIs
NoSQL Distributed
Name Age Phone
Bob 43 555-1212
Jenny 32 555-1213
Sally 28 555-1214
Joe 45 555-1215
Up to Petabytes
Consistency
Figure: three replicas of the Name / Age / Phone table. The second replica lists Jenny's phone as 867-5309; the other two list 555-1213. The mismatch is marked with an X.
Multiple Data Centers
Single Data Center
Consistency
Geographically Distributed, Eventually Consistent
• Dynamo, Riak, Voldemort, Cassandra, MongoDB, CouchDB
Single Data Center, Highly Consistent
• BigTable, HBase, Accumulo, Cassandra, Neo4j, OrientDB, MongoDB, MarkLogic (XML)
Programmability
Objects → SQL → DB
VS
Objects → DB
Programmability
Web Client (JavaScript) ↔ JSON ↔ Node.js server (JavaScript) ↔ JSON ↔ MongoDB
Analytics
Business Activity → Operational DB (×3): updates, transactions
Operational DBs → Analytical DB: denormalized, aggregations
Analytical DB → Business Intelligence
Analytics
Business Activity → OLTP (×3)
OLTP → ETL (schema knowledge; joins happen here) → OLAP
OLAP → Business Intelligence
Analytics
Business Activity → OLTP (×3)
OLTP → NoSQL DB → ? → Business Intelligence
NoSQL and Analytics
• Importing operational data can create a scale problem
• Combining operational data can create sparse data
• Operational schemas may change
NoSQL and Analytics
Scalability, Schema Flexibility
Full Outer Join
Cust.name Cust.age Orders.shoes Facebook.likes …
Bob 43 $50 - …
Sarah 32 $25 5/5/14 …
Sally 28 - 4/3/12 …
- - $35 11/1/13 …
- - - 9/24/12 …
Joe 45 $45 - …
… … … … …
Billions of rows
Thousands of columns
Sparse
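A minimal sketch of why a full outer join over operational sources yields sparse rows. The shared customer ids (`c1` to `c4`), the three source dictionaries, and the helper names are assumptions made for illustration; the slides do not show the join key.

```python
# Hypothetical operational sources, each keyed by an assumed shared customer id.
customers = {
    "c1": {"Cust.name": "Bob", "Cust.age": 43},
    "c2": {"Cust.name": "Sarah", "Cust.age": 32},
}
orders = {"c1": {"Orders.shoes": "$50"}, "c3": {"Orders.shoes": "$35"}}
likes = {"c2": {"Facebook.likes": "5/5/14"}, "c4": {"Facebook.likes": "9/24/12"}}


def full_outer_join(*sources):
    """Merge sources on their key; every key from every source is kept."""
    joined = {}
    for source in sources:
        for key, columns in source.items():
            joined.setdefault(key, {}).update(columns)
    return joined


def sparsity(joined):
    """Fraction of (row, column) cells that hold no value."""
    all_columns = {col for row in joined.values() for col in row}
    total_cells = len(joined) * len(all_columns)
    filled_cells = sum(len(row) for row in joined.values())
    return 1 - filled_cells / total_cells


joined = full_outer_join(customers, orders, likes)
print(joined["c3"])      # a row with only an order, no customer record
print(sparsity(joined))  # half of the cells are empty in this toy data
```

Each additional source adds columns that most rows never fill, which is how billions of rows and thousands of columns end up mostly empty.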
BigTable Data Model
Row ID Column Value
R000 Cust.name Bob
R000 Cust.age 43
R000 Orders.shoes $50
R002 Cust.name Sally
R002 Cust.age 32
R002 Facebook.likes 4/3/12
… … …
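The table above can be sketched as a list of (row ID, column, value) entries in which empty cells are simply never stored. This is an illustration of the layout only, not any particular database's API; `to_triples` and `lookup` are hypothetical helper names.

```python
def to_triples(rows):
    """Yield (row_id, column, value) for each non-empty cell, sorted by row then column."""
    for row_id in sorted(rows):
        for column in sorted(rows[row_id]):
            value = rows[row_id][column]
            if value is not None:  # empty cells take no space
                yield (row_id, column, value)


def lookup(triples, row_id):
    """Reassemble one logical row from its stored entries."""
    return {col: val for rid, col, val in triples if rid == row_id}


# Wide, sparse rows as they would appear after the full outer join.
rows = {
    "R000": {"Cust.name": "Bob", "Cust.age": 43,
             "Orders.shoes": "$50", "Facebook.likes": None},
    "R002": {"Cust.name": "Sally", "Cust.age": 32,
             "Orders.shoes": None, "Facebook.likes": "4/3/12"},
}

triples = list(to_triples(rows))
for entry in triples:
    print(entry)
print(lookup(triples, "R002"))
```

Storage cost follows the number of filled cells rather than rows times columns, so adding a rarely used column costs nothing for rows that lack it.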
MongoDB Data Model
{
  Cust.name: “Bob”,
  Cust.age: 43,
  Orders.shoes: $50
},
{
  Cust.name: “Sally”,
  Cust.age: 32,
  Facebook.likes: 4/3/12
},
…
NoSQL Data Loading Shift
NoSQL Analytics
• Composite, Sparse Schemas
• Scale out
• Aggressive Indexing
• Data Discovery
Conventional BI
• Data cleaning
• Regularization
• Denormalization
• Star Schema
• Known operational Schemas
Analytics
Business Activity → OLTP (×3)
OLTP → NoSQL DB (schema discovery; joins happen here)
NoSQL DB → Business Intelligence
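One way to picture schema discovery is to scan already-loaded records and report which fields occur, how often, and with which value types. This is a minimal sketch under that assumption; `discover_schema` is a hypothetical helper, not a function of any product named in these slides.

```python
from collections import Counter


def discover_schema(documents):
    """Count how many documents carry each field and collect the value types seen."""
    field_counts = Counter()
    field_types = {}
    for doc in documents:
        for field, value in doc.items():
            field_counts[field] += 1
            field_types.setdefault(field, set()).add(type(value).__name__)
    return field_counts, field_types


# Documents loaded without a schema declared up front.
documents = [
    {"Cust.name": "Bob", "Cust.age": 43, "Orders.shoes": "$50"},
    {"Cust.name": "Sally", "Cust.age": 32, "Facebook.likes": "4/3/12"},
]

counts, types = discover_schema(documents)
for field in sorted(counts):
    print(field, counts[field], sorted(types[field]))
```

The schema is read off the data after loading, so a change in an operational schema shows up as a new field in the report instead of a failed load.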
NoSQL Analytics Shift
Transformations
• MapReduce
• Pre-computed
• Large answers
• Simple Lookups
Queries
• SQL
• Computed on the fly
• Small answers
• Roll up
• Drill down
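The contrast between the two columns can be sketched in a few lines: a transformation pre-computes every answer in one pass and serves it by key, while a query computes one answer at request time. The in-memory `map_phase` and `reduce_phase` helpers and the order records are assumptions for illustration; they imitate the MapReduce pattern and are not Hadoop code.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def reduce_phase(pairs, reducer):
    """Group values by key, then reduce each group to a single result."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def emit_customer_amount(order):
    yield order["customer"], order["amount"]


def total_on_the_fly(orders, customer):
    """Query style: scan and aggregate at request time."""
    return sum(o["amount"] for o in orders if o["customer"] == customer)


# Hypothetical order records.
orders = [
    {"customer": "Bob", "amount": 50},
    {"customer": "Joe", "amount": 45},
    {"customer": "Bob", "amount": 25},
]

# Transformation style: compute all totals once, then answer by simple lookup.
totals = reduce_phase(map_phase(orders, emit_customer_amount), sum)
print(totals["Bob"])
print(total_on_the_fly(orders, "Bob"))
```

Both paths return the same number; the difference is when the work is done and how large the stored result is.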
Analytics
Business Activity → OLTP (×3)
OLTP → NoSQL DB (MapReduce transformations) → fast lookups → Business Intelligence
MapReduce Analytics
Supported
• SQL (Hive)
• Statistical Modeling
• Machine Learning
• Text Analytics
• Feature Extraction
• Image Processing
• Graph Analysis
MapReduce Analytic Workflow
• Reusable Transforms
• Searchable Collections
Combined-Data Security
Requirements
• Physically co-located data
• Strong logical access control
• Role-based
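A minimal sketch of role-based logical access control over co-located data, assuming access is granted per source prefix (the part of the column name before the dot). The roles, the `ROLE_GRANTS` mapping, and `visible_cells` are hypothetical; the slides do not specify a mechanism.

```python
# Hypothetical mapping from role to the source prefixes that role may read.
ROLE_GRANTS = {
    "sales": {"Cust", "Orders"},
    "marketing": {"Cust", "Facebook"},
}


def visible_cells(triples, roles):
    """Return only the (row, column, value) entries the given roles may read."""
    allowed = set().union(*(ROLE_GRANTS.get(role, set()) for role in roles))
    return [
        (row, col, val)
        for row, col, val in triples
        if col.split(".", 1)[0] in allowed
    ]


# All sources live in one table; the filter decides what each user sees.
triples = [
    ("R000", "Cust.name", "Bob"),
    ("R000", "Orders.shoes", "$50"),
    ("R002", "Facebook.likes", "4/3/12"),
]

print(visible_cells(triples, ["sales"]))
print(visible_cells(triples, ["marketing"]))
print(visible_cells(triples, []))  # no roles, nothing visible
```

The data stays physically together for joins, while each reader gets only the cells their roles permit.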
Questions?