Page 1: Taming the Big Data Fire Hose

the NewSQL database you’ll never outgrow

Taming the Big Data Fire Hose

John Hugg, Sr. Software Engineer, VoltDB

Page 2: Taming the Big Data Fire Hose

Big Data Defined

Velocity
+ Moves at very high rates (think sensor-driven systems)
+ Valuable in its temporal, high velocity state

Volume
+ Fast-moving data creates massive historical archives
+ Valuable for mining patterns, trends and relationships

Variety
+ Structured (logs, business transactions)
+ Semi-structured and unstructured

Page 3: Taming the Big Data Fire Hose

Example Big Data Use Cases

Capital markets
+ High-frequency operations: Write/index all trades, store tick data
+ Lower-frequency operations: Show consolidated risk across traders

Call initiation request
+ High-frequency operations: Real-time authorization
+ Lower-frequency operations: Fraud detection/analysis

Inbound HTTP requests
+ High-frequency operations: Visitor logging, analysis, alerting
+ Lower-frequency operations: Traffic pattern analytics

Online game
+ High-frequency operations: Rank scores (defined intervals, player “bests”)
+ Lower-frequency operations: Leaderboard lookups

Real-time ad trading systems
+ High-frequency operations: Match form factor, placement criteria, bid/ask
+ Lower-frequency operations: Report ad performance from exhaust stream

Mobile device location sensor
+ High-frequency operations: Location updates, QoS, transactions
+ Lower-frequency operations: Analytics on transactions

Page 4: Taming the Big Data Fire Hose

Big Data and You

Incoming data streams are different than traditional business apps

+ You need to write data quickly and reliably, but …

It’s not just about high speed writes
+ You need to validate in real-time
+ You need to count and aggregate
+ You need to analyze in real-time
+ You need to scale on demand
+ You may need to transact

Page 5: Taming the Big Data Fire Hose

Big Data Management Infrastructure

Data sources: Online gaming; SaaS, Web 2.0

High Velocity
+ Structured data: ACID guarantees, relational/SQL, real-time analytics
+ Unstructured data: eventual consistency, schemaless (KV, document)

High Volume
+ Other OLAP data stores

Page 7: Taming the Big Data Fire Hose

High Velocity Data Management

Page 8: Taming the Big Data Fire Hose

High Velocity DBMS Requirements

Ingest at very high speeds and rates

Scale easily to meet growth and demand peaks

Support integrated fault tolerance

Support a wide range of real-time (or “near-time”) analytics

Integrate easily with high volume analytic datastores

Page 9: Taming the Big Data Fire Hose

High Speed Data Ingestion

Support millions of write operations per second at scale

Read and write latencies below 50 milliseconds

Provide ACID-level consistency guarantees (maybe)

Support one or more well-known application interfaces
+ SQL
+ Key/Value
+ Document

Page 10: Taming the Big Data Fire Hose

Scale to Meet Growth and Demand

Scale-out on commodity hardware

Built-in database partitioning
+ Manual sharding and/or add-on solutions are brittle, require apps to do “heavy lifting”, and can be an operational nightmare

Database must automatically implement defined partitioning strategy
+ Application should “see” a single database instance

Database should encourage scalability best practices
+ For example, replication of reference data minimizes need for multi-partition operations

Page 11: Taming the Big Data Fire Hose

A Look Inside Partitioning

table orders (partitioned): customer_id (partition key), order_id, product_id

table products (replicated): product_id, product_name

Partition 1
orders
1  101  2
1  101  3
4  401  2
products
1  knife
2  spoon
3  fork

Partition 2
orders
2  201  1
5  501  3
5  502  2
products
1  knife
2  spoon
3  fork

Partition 3
orders
3  201  1
6  601  1
6  601  2
products
1  knife
2  spoon
3  fork

select count(*) from orders where customer_id = 5
+ single-partition

select count(*) from orders where product_id = 3
+ multi-partition

insert into orders (customer_id, order_id, product_id) values (3,303,2)
+ single-partition

update products set product_name = 'spork' where product_id = 3
+ multi-partition
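
The single-partition versus multi-partition distinction on this slide can be sketched in a few lines of Python. This is an illustration only, not VoltDB code: the modulo routing rule in `partition_for` is an assumption chosen because it reproduces the slide's layout (customers 1 and 4 on partition 1, 2 and 5 on partition 2, 3 and 6 on partition 3), and the function names are hypothetical.

```python
NUM_PARTITIONS = 3

def partition_for(customer_id: int) -> int:
    """Map a partition key to a 1-based partition number.

    Assumed rule for illustration; a real database picks its own hashing.
    """
    return (customer_id - 1) % NUM_PARTITIONS + 1

# orders rows from the slide: (customer_id, order_id, product_id)
ORDERS = [
    (1, 101, 2), (1, 101, 3), (4, 401, 2),
    (2, 201, 1), (5, 501, 3), (5, 502, 2),
    (3, 201, 1), (6, 601, 1), (6, 601, 2),
]

# Distribute rows by partition key, as a partitioned table would be.
partitions = {p: [] for p in range(1, NUM_PARTITIONS + 1)}
for row in ORDERS:
    partitions[partition_for(row[0])].append(row)

def count_by_customer(customer_id: int):
    """Filter on the partition key: only the owning partition is read."""
    owner = partition_for(customer_id)
    count = sum(1 for r in partitions[owner] if r[0] == customer_id)
    return count, [owner]

def count_by_product(product_id: int):
    """Filter on a non-key column: every partition is read and the
    partial counts are summed."""
    touched = sorted(partitions)
    count = sum(1 for p in touched for r in partitions[p] if r[2] == product_id)
    return count, touched

print(count_by_customer(5))  # (2, [2])
print(count_by_product(3))   # (2, [1, 2, 3])
```

Each function returns the count together with the list of partitions it had to read, which makes the cost difference between the two query shapes visible.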

Page 12: Taming the Big Data Fire Hose

Integrated Fault Tolerance

Database should transparently support built-in “Tandem-style” HA

+ Users should be able to easily increase/decrease fault tolerance levels

Database should be easily and quickly recoverable in the event of severe hardware failures

Database should be able to automatically detect and manage a variety of partition fault conditions

Downed nodes should be “rejoinable” without the need for service windows

Page 13: Taming the Big Data Fire Hose

Partition Detection & Recovery

[Diagrams: a three-node cluster of Server A, Server B and Server C]

Network fault protection
+ Detects partition event
+ Determines which side of fault to disable
+ Snapshots and disables orphaned node(s)

Live node rejoin
+ Allows “downed” nodes to rejoin live cluster
+ Automatically re-synchs all node data
+ Coordinates transactions during re-synch
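
One way to picture "determines which side of fault to disable" is a majority rule with a deterministic tie-break. The slide does not state the rule, so everything in this sketch is an assumption: the function name `resolve_partition` is hypothetical, the larger side is assumed to survive, and ties are assumed to go to the side holding the lowest-named server. Snapshotting the orphaned nodes is not modeled.

```python
def resolve_partition(side_a, side_b):
    """Decide which side of a network fault keeps running.

    Assumed policy (not taken from the slides): the larger side survives;
    on a tie, the side containing the lowest-named server survives.
    Returns (surviving_nodes, disabled_nodes), both sorted.
    """
    a, b = set(side_a), set(side_b)
    if len(a) != len(b):
        keep = a if len(a) > len(b) else b
    else:
        lowest = min(a | b)
        keep = a if lowest in a else b
    disabled = (a | b) - keep
    return sorted(keep), sorted(disabled)

# Server C loses contact with A and B: C is the orphaned node.
print(resolve_partition({"A", "B"}, {"C"}))  # (['A', 'B'], ['C'])
```

A deterministic tie-break matters because both sides must reach the same decision without being able to talk to each other.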

Page 14: Taming the Big Data Fire Hose

Real-time Analytics

Database should support a wide variety of high performance reads

+ High-frequency single-partition
+ Lower-frequency multi-partition

Common analytic queries should be optimized in the database

+ Multi-partition aggregations, limits, etc.

Database should accommodate a flexible range of relational data operations

+ Particularly relevant to structured data
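
A multi-partition aggregation with a limit, such as a leaderboard lookup, is commonly handled as scatter-gather: each partition computes a small partial result and a coordinator merges them. The sketch below shows that pattern under stated assumptions; the player names and scores are made up, `top_n` and `total_rows` are hypothetical helpers, and the slides do not say this is how any particular database implements it.

```python
import heapq

# Hypothetical (player, score) rows spread over three partitions.
PARTITIONS = [
    [("ann", 50), ("bob", 70)],
    [("cy", 90), ("di", 10)],
    [("ed", 65)],
]

def top_n(partitions, n):
    """Scatter: each partition returns only its local top n.
    Gather: the coordinator merges the partials and applies the limit again."""
    partials = [heapq.nlargest(n, rows, key=lambda r: r[1]) for rows in partitions]
    merged = [row for part in partials for row in part]
    return heapq.nlargest(n, merged, key=lambda r: r[1])

def total_rows(partitions):
    """A count(*) across partitions is the sum of per-partition counts."""
    return sum(len(rows) for rows in partitions)

print(top_n(PARTITIONS, 2))    # [('cy', 90), ('bob', 70)]
print(total_rows(PARTITIONS))  # 5
```

Pushing the limit down to each partition keeps the merge step small: the coordinator sees at most n rows per partition rather than every row.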

Page 15: Taming the Big Data Fire Hose

Integration with Analytic Datastores

Database should offer high performance, transactional export

Export should allow a wide variety of common data enrichment operations

+ Normalize and de-normalize
+ De-duplicate
+ Aggregate

Architecture should support loosely-coupled integrations

+ Impedance mismatches
+ Durability
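
The enrichment operations named on this slide (de-normalize, de-duplicate, aggregate) can be illustrated on a small batch of exported rows. This is a sketch under assumptions: the row shape and product names reuse the orders and products example from the partitioning slide, and the function names are hypothetical rather than part of any export API.

```python
# Reference data, as in the replicated products table.
PRODUCTS = {1: "knife", 2: "spoon", 3: "fork"}

def deduplicate(rows):
    """Drop exact duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(rows))

def denormalize(rows):
    """Attach product_name to each (customer_id, order_id, product_id) row."""
    return [(c, o, p, PRODUCTS[p]) for c, o, p in rows]

def aggregate(rows):
    """Count order rows per customer_id."""
    counts = {}
    for customer_id, *_ in rows:
        counts[customer_id] = counts.get(customer_id, 0) + 1
    return counts

# A hypothetical export batch containing one repeated row.
exported = [(6, 601, 1), (6, 601, 1), (6, 601, 2), (3, 201, 1)]
clean = deduplicate(exported)

print(clean)               # [(6, 601, 1), (6, 601, 2), (3, 201, 1)]
print(denormalize(clean))  # first row becomes (6, 601, 1, 'knife')
print(aggregate(clean))    # {6: 2, 3: 1}
```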

Page 16: Taming the Big Data Fire Hose

VoltDB Export Data Flow

Loosely-coupled, asynchronous

Queue must be durable

Bi-directional durability

[Diagram: High Velocity Database Cluster]
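
To make "queue must be durable" concrete, here is a minimal file-backed queue in which records survive a restart and are only skipped once the consumer has acknowledged them. It is a sketch, not VoltDB's export mechanism: the `DurableQueue` class, the file names, and the acknowledge-by-count protocol are all assumptions made for illustration.

```python
import json
import os
import tempfile

class DurableQueue:
    """Append-only log plus a persisted count of acknowledged records."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "export.log")
        self.ack_path = os.path.join(directory, "export.ack")
        open(self.log_path, "a", encoding="utf-8").close()  # ensure the log exists

    def append(self, record):
        """Write one record and force it to disk before returning."""
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def _acked(self):
        try:
            with open(self.ack_path, encoding="utf-8") as f:
                return int(f.read() or 0)
        except FileNotFoundError:
            return 0

    def pending(self):
        """Records appended but not yet acknowledged by the consumer."""
        with open(self.log_path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        return [json.loads(line) for line in lines[self._acked():]]

    def ack(self, count):
        """Persist that `count` more records were consumed (atomic rename)."""
        tmp = self.ack_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(str(self._acked() + count))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.ack_path)

with tempfile.TemporaryDirectory() as d:
    q = DurableQueue(d)
    q.append({"order_id": 303})
    q.append({"order_id": 304})
    q.ack(1)                       # the consumer confirmed the first record
    reopened = DurableQueue(d)     # simulates a restart of the producer
    print(reopened.pending())      # [{'order_id': 304}]
```

Because the producer only appends and the consumer only acknowledges, the two sides stay loosely coupled and can run at different speeds.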

Page 17: Taming the Big Data Fire Hose

Big Data infrastructures will usually require more than one engine

+ High velocity engine for “fast” data
+ Analytic engine for “deep” data

Data characteristics will often determine which high velocity engine to use

+ NewSQL is often well-suited to structured data
+ NoSQL is often a good fit for unstructured data

Choose solutions that suit your needs and are designed for interoperability
