Date posted: 18-Dec-2014
Category: Technology
Uploaded by: nati-shalom
Complex Analytics with NoSQL Data Store in Real Time
Nested Queries, Projection, Transactions and more
Nati Shalom, @natishalom, slideshare.net/giganati
What are we here to discuss?
Making Sense of the Exploding Data World
What That World Could Look Like if Disk Is No Longer the Bottleneck
Live Demo
Making Sense of The Exploding Data World
[Figure: Data Volume (GB, TB, PB) plotted against Data Velocity (Yr, Mo, Day, Hr, Min, Sec, MS, µS). Regions shown: Data Mining, Machine Learning, Data Warehouse, High Throughput OLTP, Operational Intelligence, Exploratory Analytics, OLTP, Business Intelligence, Streaming]
Capacity and Performance Drive New Data Management Technologies
Let’s Look at the Tradeoffs of Some Selected Solutions
SQL Queries
• Query: SQL
• Semantics:
  – CRUD
  – Aggregation
  – Projection
  – Partial update
• Performance: 100s/sec
• Consistency: Transactional
• Scaling: Mostly Scale-Up
• Availability: Disk Based
NoSQL
• Query: Proprietary but rich
• Semantics:
  – CRUD
  – Limited Aggregation (Map/Reduce)
  – No Projection
  – No Partial update
• Performance: 1000s/sec
• Consistency: Eventual
• Scaling: Mostly Scale-Out
• Availability: Based on replication
IMDG
• Query: Proprietary but rich
• Semantics:
  – CRUD
  – Aggregation API + Map/Reduce
  – Projection (GigaSpaces)
  – Partial Update (GigaSpaces)
• Performance: 100k/sec
• Consistency: Transactional
• Scaling: Mostly Scale-Out
• Availability: Replication
Key/Value
• Query: Key, Value
• Semantics:
  – Mostly Read
  – No Aggregation
  – No Projection
  – No Partial update
• Performance: Millions/sec
• Consistency: Atomic
• Scaling: Mostly Scale-Out
• Availability: Limited (varies quite substantially between implementations)
Stream Processing (Storm)
• Semantics
  – Event-driven data processing
• Used for continuous updates
  – No need for a costly “SELECT FOR UPDATE”
• Performance: tens of millions of updates/sec
[Diagram: Storm topology of Spouts and Bolts]
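The event-driven update model above can be sketched in a few lines. This is a minimal illustration in plain Python, not Storm's API: `click_spout`, `CountBolt`, and `run_topology` are hypothetical names. The point it tries to show is that when each event is applied as an incremental change by the component that owns the state, there is no read-lock-write round trip of the "SELECT FOR UPDATE" kind.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]  # (key, delta), an assumed event shape


def click_spout(events: Iterable[Event]) -> Iterator[Event]:
    """Spout stand-in: emits events into the pipeline one at a time."""
    for event in events:
        yield event


class CountBolt:
    """Bolt stand-in: owns its counters, so updates need no row lock."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def process(self, key: str, delta: int) -> None:
        # Apply the change directly instead of read, lock, write.
        self.totals[key] += delta


def run_topology(events: Iterable[Event]) -> Dict[str, int]:
    """Wire the spout to the bolt and drain the stream."""
    bolt = CountBolt()
    for key, delta in click_spout(events):
        bolt.process(key, delta)
    return dict(bolt.totals)


result = run_topology([("page_a", 1), ("page_b", 1), ("page_a", 2)])
```

In a real topology the bolt instances would be distributed and keyed by field grouping; the sketch keeps everything in one process to show only the update pattern.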
Common Assumption
Disk is the bottleneck
[Chart: performance on a logarithmic scale, 2000 to 2020. CPU Performance = 100X per decade; HDD Latency (Seek & Rotate) = Little Improvement. Gap annotations: 100X and 10,000X]
Source: GigaOM Research
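The chart's two gap annotations are consistent with simple compounding of the "100X per decade" figure. The sketch below works through that arithmetic; the function name `growth_factor` is hypothetical, and pairing the 100X and 10,000X labels with 2010 and 2020 is one reading of the chart, not something the slide states explicitly.

```python
def growth_factor(per_decade: float, years: float) -> float:
    """Compound a per-decade improvement factor over a number of years."""
    return per_decade ** (years / 10)


# CPU performance relative to the year 2000, at 100X per decade.
cpu_after_10_years = growth_factor(100, 10)
cpu_after_20_years = growth_factor(100, 20)
```

If disk seek and rotate latency barely moves over the same period, the CPU-to-disk gap grows at roughly the same rate as CPU performance itself.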
Capacity and Performance Drive New Data Management Technologies
(Source: IDC, 2013)
[Figure: technology categories shown: Big Data (Hadoop); NoSQL; In Memory, Stream Processing; RDBMS]
There’s No One Size Fits All
A Typical App Looks Like This..
[Diagram labels: Front End Analytics, RT, Batch, STORM]
The Data Flow Complexity
What if Disk Was no Longer the Bottleneck?
FLASH Closes the CPU to Storage Gap
Our Application Could Look Like This..
[Diagram: Front End on top of a High Speed Data Store (Using Flash/NVM) exposing Key/Value, SQL, Document, Graph, Transactional, and Map/Reduce interfaces. Other label: StreamBase]
Disk Becomes the New Tape
Common Data Store Serving Multiple Semantics/API
We're not there yet ..
But..
We Can Use a High Speed Data Bus to Integrate All of Our Data Sources
[Diagram labels: Front End Analytics, RT, Batch, STORM, connected through a High Speed Data Bus (Built-In Caching). Bus functions: RT Transactional Data Access, Direct Access, RT Streaming, Hadoop Synch, MySQL Synch, Mongo Synch]
High Speed Data Bus (Zoom In)
Designed for Transactional and Analytics Scenarios..
Homeland Security
Real Time Search
Social
eCommerce
User Tracking & Engagement
Financial Services
Many APIs, Same Data
Key/Value, SQL, Document, Graph, Transactional, Map/Reduce
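To make "many APIs, same data" concrete, here is a small sketch in plain Python. It is not the GigaSpaces API: `SharedStore` and its methods are hypothetical, and the sample records are made up. It only illustrates that key/value lookup, predicate queries, and map/reduce can all be views over a single copy of the data.

```python
from functools import reduce
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]


class SharedStore:
    """One in-memory data set, reachable through several access styles."""

    def __init__(self) -> None:
        self._rows: Dict[str, Doc] = {}

    # Key/value style access.
    def put(self, key: str, doc: Doc) -> None:
        self._rows[key] = dict(doc)

    def get(self, key: str) -> Optional[Doc]:
        return self._rows.get(key)

    # Query style access (a stand-in for a SQL WHERE clause).
    def select(self, where: Callable[[Doc], bool]) -> List[Doc]:
        return [doc for doc in self._rows.values() if where(doc)]

    # Map/reduce style access over the same rows.
    def map_reduce(self, mapper: Callable[[Doc], Any],
                   reducer: Callable[[Any, Any], Any], initial: Any) -> Any:
        return reduce(reducer, (mapper(d) for d in self._rows.values()), initial)


store = SharedStore()
store.put("u1", {"name": "Ann", "country": "US", "spend": 40})
store.put("u2", {"name": "Bob", "country": "UK", "spend": 25})
store.put("u3", {"name": "Cy", "country": "US", "spend": 10})

by_key = store.get("u2")
us_names = sorted(d["name"] for d in store.select(lambda d: d["country"] == "US"))
total_spend = store.map_reduce(lambda d: d["spend"], lambda a, b: a + b, 0)
```

Because every interface reads the same rows, there is no synchronization step between a key/value cache, a query store, and a batch store.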
Let’s take a closer look..
Nested Queries & Projections
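As a rough illustration of what a nested query with projection does, the following plain-Python sketch filters documents on a field inside a nested object and returns only the requested fields. The helpers `get_path` and `query`, the dotted-path syntax, and the sample data are all assumptions made for this example, not the product's query language.

```python
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]


def get_path(doc: Doc, path: str) -> Any:
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def query(docs: Iterable[Doc], path: str, expected: Any,
          fields: List[str]) -> List[Doc]:
    """Nested match on `path`, then project only `fields` from each hit."""
    return [
        {field: get_path(doc, field) for field in fields}
        for doc in docs
        if get_path(doc, path) == expected
    ]


people = [
    {"name": "Ann", "age": 31, "address": {"city": "Paris", "zip": "75001"}},
    {"name": "Bob", "age": 45, "address": {"city": "Oslo", "zip": "0150"}},
    {"name": "Cy", "age": 27, "address": {"city": "Paris", "zip": "75011"}},
]

parisians = query(people, "address.city", "Paris", ["name", "address.zip"])
```

Projection matters for performance because only the selected fields travel back to the client, rather than the whole object.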
Aggregations.
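A grouped aggregation can be sketched as follows. This is a generic Python illustration with a hypothetical `aggregate` helper and invented sample orders; it shows the kind of result an aggregation API returns (count, sum, average, maximum per group), not how any particular data grid computes it.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def aggregate(rows: Iterable[Dict[str, Any]], group_by: str,
              value: str) -> Dict[Any, Dict[str, float]]:
    """Group rows by one field and summarize another field per group."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[value])
    return {
        key: {
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
            "max": max(values),
        }
        for key, values in groups.items()
    }


orders = [
    {"region": "EU", "amount": 10.0},
    {"region": "EU", "amount": 30.0},
    {"region": "US", "amount": 5.0},
]

stats = aggregate(orders, "region", "amount")
```

In a partitioned store the same computation would typically run per partition and the partial results would be merged, which is where the Map/Reduce style mentioned earlier comes in.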
Fast Update …
Remains with strong consistency!
Transactions support
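The combination of fast partial updates and transactional consistency can be illustrated with a small single-process sketch. Everything here is hypothetical (`TxStore`, `change`, `transaction`, the account data); it uses a lock and a snapshot for rollback, which is a simplification chosen for clarity and not a description of how the product implements transactions.

```python
import copy
import threading
from contextlib import contextmanager
from typing import Any, Dict, Iterator


class TxStore:
    """Toy store with field-level updates and all-or-nothing transactions."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.RLock()  # re-entrant: usable inside a transaction

    def write(self, key: str, doc: Dict[str, Any]) -> None:
        with self._lock:
            self._data[key] = dict(doc)

    def read(self, key: str) -> Dict[str, Any]:
        with self._lock:
            return dict(self._data[key])

    def change(self, key: str, **fields: Any) -> None:
        """Partial update: only the named fields are touched."""
        with self._lock:
            self._data[key].update(fields)

    @contextmanager
    def transaction(self) -> Iterator["TxStore"]:
        """Hold the lock for the whole block; restore a snapshot on error."""
        with self._lock:
            snapshot = copy.deepcopy(self._data)
            try:
                yield self
            except Exception:
                self._data = snapshot
                raise


store = TxStore()
store.write("acct:1", {"owner": "Ann", "balance": 100})
store.write("acct:2", {"owner": "Bob", "balance": 50})


def transfer(src: str, dst: str, amount: int) -> None:
    with store.transaction():
        store.change(src, balance=store.read(src)["balance"] - amount)
        if store.read(src)["balance"] < 0:
            raise ValueError("insufficient funds")
        store.change(dst, balance=store.read(dst)["balance"] + amount)


transfer("acct:1", "acct:2", 30)       # commits
try:
    transfer("acct:1", "acct:2", 500)  # fails and is rolled back
except ValueError:
    pass
```

The second transfer leaves both balances exactly as the first one committed them, which is the behavior "strong consistency" refers to here.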
- 1KB object size and uniform distribution
- 2 sockets 2.8GHz CPU with total 24 cores, CentOS 5.8, 2 FusionIO SLC PCIe cards RAID
- YCSB measurements performed by SanDisk
[Chart: two workloads (No Read / 100% Write; 100% Read / No Write), comparing FDF-GigaSpaces on SSDs with Stock GigaSpaces in DRAM. Axis range 0 to 160; data labels 62, 121, 17, 56]
Assumptions: 1TB Flash = $2K; 1TB RAM = $20K
The Performance of RAM at a Cost/Capacity Closer to Disk
[Chart legend: ZetaScale-GigaSpaces on SSDs; Stock GigaSpaces in DRAM]
ZetaScale-GigaSpaces provides 2x to 3.6x better TPS/$ and 1:50 more capacity
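The cost assumptions and the TPS/$ claim can be checked with a little arithmetic. The sketch below uses the slide's stated prices and the four data labels from the benchmark chart. Pairing 62 and 121 with the flash-backed configuration and 17 and 56 with the DRAM configuration is an assumption: it is the pairing under which the ratios land near the stated 2x to 3.6x range. The variable names are hypothetical.

```python
# Stated on the slide: 1 TB Flash = $2K, 1 TB RAM = $20K.
FLASH_COST_PER_TB = 2_000
RAM_COST_PER_TB = 20_000

# How much more a terabyte of RAM costs than a terabyte of flash.
cost_ratio = RAM_COST_PER_TB / FLASH_COST_PER_TB

# Chart data labels as (flash-backed, DRAM) pairs. ASSUMED pairing, see above.
chart_labels = {
    "100% write": (62, 17),
    "100% read": (121, 56),
}

# Relative advantage of the flash-backed configuration per workload.
advantage = {
    workload: round(flash / dram, 1)
    for workload, (flash, dram) in chart_labels.items()
}
```

Under this reading the write-heavy workload shows the larger advantage, which fits the upper end of the quoted range.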
ZetaScale™ – XAP MemoryXtend
[Chart: Capacity, axis range 0 to 1200. XAP: 20; XAP Extend: 1000 (ratio 1:50)]
242k Read/Sec
Data is Moving to Cloud
Source: Managing Storage: Trends, Challenges, and Options (2013-2014). (EMC, 2013)
Orchestration needs to be integrated into the database solution to make it cloud ready
Many APIs, Same Data
Data Bus (Integration with Storm)
Built In Orchestration
Demo References
Summary
Nati Shalom
Check out the slides at http://www.slideshare.net/giganati