The Data Mullet: From all SQL to No SQL back to Some SQL

Date post: 25-Jun-2015
Upload: datadogslides
Page 1: The Data Mullet: From all SQL to No SQL back to Some SQL

The Data Mullet: From All SQL to No SQL back to Some SQL

Alexis Lê-Quôc @alq

Page 3: The Data Mullet: From all SQL to No SQL back to Some SQL

This Talk

• A (mostly) DIRTy Architecture for...

• A new application (datadoghq.com) on a limited budget

• Running on a public cloud

• Focussing on data stores.

Page 4: The Data Mullet: From all SQL to No SQL back to Some SQL

Some context

Page 5: The Data Mullet: From all SQL to No SQL back to Some SQL

Servers

Monitoring

IaaS, PaaS

Usage Analytics

Perf. Management

Apps

Hosting

CDNs Asset Management

SDLC

Ops team Dev team

Page 6: The Data Mullet: From all SQL to No SQL back to Some SQL
Page 7: The Data Mullet: From all SQL to No SQL back to Some SQL

Dev & Ops “collaborate”

Page 8: The Data Mullet: From all SQL to No SQL back to Some SQL

Concretely, what does Datadog do?

Page 9: The Data Mullet: From all SQL to No SQL back to Some SQL

etc.

Page 10: The Data Mullet: From all SQL to No SQL back to Some SQL

Watching real time feeds

Looking for patterns

Constant telemetry

Real-time

Bursty batches

Share

Page 11: The Data Mullet: From all SQL to No SQL back to Some SQL

Data Taxonomy

Metrics

• Unique visitors

• Load

• Transaction duration

• ...

Events

• Conversations

• Alerts

• Build & Deploys

• ...

Page 12: The Data Mullet: From all SQL to No SQL back to Some SQL

Unit of scale

• 1 source, typically a server

• 100 metrics

• Every 15 s

• 24,000 points per hour

• ~3 bytes per point

• 100 KB/hour, 850 MB/year

• Events

• whenever they occur

• Highest resolution: 1s

• Small payload + metadata
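
The per-source arithmetic above can be checked with a short sketch. The constant and function names are illustrative, and the inputs are the slide's own rounded estimates. At exactly 3 bytes per point the hourly total comes to about 72 KB, so the slide's 100 KB/hour and 850 MB/year read as rounded-up budget figures rather than exact products.

```python
# Back-of-the-envelope sizing for one source (typically a server).
# All inputs are the slide's rounded estimates; names are illustrative.
METRICS_PER_SOURCE = 100
REPORT_INTERVAL_S = 15
BYTES_PER_POINT = 3          # "~3 bytes per point" on the slide
HOURS_PER_YEAR = 24 * 365


def points_per_hour(metrics: int = METRICS_PER_SOURCE,
                    interval_s: int = REPORT_INTERVAL_S) -> int:
    """Number of data points one source emits per hour."""
    return metrics * (3600 // interval_s)


def bytes_per_hour(bytes_per_point: float = BYTES_PER_POINT) -> float:
    """Raw metric payload per source per hour, in bytes."""
    return points_per_hour() * bytes_per_point


def bytes_per_year(hourly_bytes: float) -> float:
    """Scale an hourly byte count to a (non-leap) year."""
    return hourly_bytes * HOURS_PER_YEAR


if __name__ == "__main__":
    print(points_per_hour())              # 24000
    print(bytes_per_hour() / 1e3)         # 72.0 KB/hour at exactly 3 bytes
    print(bytes_per_year(100e3) / 1e6)    # 876.0 MB/year at the rounded 100 KB/hour
```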

Page 13: The Data Mullet: From all SQL to No SQL back to Some SQL

ACID, BASE & DIRT

• ACID

• http://en.wikipedia.org/wiki/ACID

• BASE

• http://en.wikipedia.org/wiki/Eventual_consistency

• DIRT (Bryan Cantrill at Surge 2010)

• http://dtrace.org/resources/bmc/DIRT.pdf

Page 14: The Data Mullet: From all SQL to No SQL back to Some SQL

Let’s dig some DIRT

Page 15: The Data Mullet: From all SQL to No SQL back to Some SQL

DI-RealTime

Page 16: The Data Mullet: From all SQL to No SQL back to Some SQL

The Consequences of DIRT? Latency

• Data consumed by people (and machines)

• Low end-to-end latency (5-15s)

• Psycho-physiological Factor

• Same order of magnitude as email/SMS*

* http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.2465&rep=rep1&type=pdf

Page 17: The Data Mullet: From all SQL to No SQL back to Some SQL

The Consequences of DIRT? Concurrency

• Concurrent events & data points show up in sync

• Access Patterns?

• All recent data, e.g. last 24 hours

Page 18: The Data Mullet: From all SQL to No SQL back to Some SQL

The Consequences of DIRT? Tolerance to noise

• Not a System of Record

• “Real-time” decisions

• Drop (some) individual data points rather than be late

• Applies to metrics, not events
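
One way to picture the "drop rather than be late" policy is a bounded buffer for metric points next to an unbounded queue for events. This is a minimal sketch under that assumption; `IngestBuffer` and its fields are hypothetical names, not Datadog's actual ingestion code.

```python
from collections import deque


class IngestBuffer:
    """Hypothetical ingest buffer: lossy for metrics, lossless for events."""

    def __init__(self, max_metric_points: int):
        # deque(maxlen=...) silently discards the oldest item when full.
        self.metrics = deque(maxlen=max_metric_points)
        self.events = deque()          # unbounded: events are never dropped
        self.dropped_metrics = 0

    def add_metric(self, point):
        if len(self.metrics) == self.metrics.maxlen:
            self.dropped_metrics += 1  # the oldest point is about to be evicted
        self.metrics.append(point)

    def add_event(self, event):
        self.events.append(event)


# Simulate a consumer that has fallen behind: 5 arrivals, room for 3 metrics.
buf = IngestBuffer(max_metric_points=3)
for i in range(5):
    buf.add_metric(("load", i))
    buf.add_event(("deploy", i))
```

After the loop the buffer holds only the three most recent metric points, while all five events are still queued.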

Page 19: The Data Mullet: From all SQL to No SQL back to Some SQL

Cross here? Or here?

Noise but no Latency

Latency but no Noise

Page 20: The Data Mullet: From all SQL to No SQL back to Some SQL

DataIntensive-RT

Page 21: The Data Mullet: From all SQL to No SQL back to Some SQL

The Consequences of DIRT? Storage

• Business Cycles

• Retention Policy > Business Cycle

• E.g. retail, education 12 months

• Elastic Storage

• !CAPEX

Page 22: The Data Mullet: From all SQL to No SQL back to Some SQL

The Consequences of DIRT? Latency

• Datadog, a data exploration app for people

• Looking for patterns

• Ideal: 300 ms round-trip

• Access patterns for long-term data?

• Storage trade-off: precompute oft-used properties

• Run-time Trade-off: want longer timespan, get lower resolution

• != RRD
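
The two trade-offs above can be sketched as follows: choose the finest rollup interval that keeps a query under a fixed point budget, and average raw points into buckets of that size. The rollup ladder, the 500-point budget, and both function names are assumptions made for illustration; the talk does not describe the actual scheme.

```python
# Hypothetical rollup ladder in seconds, finest first.
ROLLUPS_S = [15, 60, 300, 3600, 86400]


def pick_resolution(timespan_s: int, max_points: int = 500) -> int:
    """Return the finest rollup interval that fits the point budget."""
    for interval in ROLLUPS_S:
        if timespan_s / interval <= max_points:
            return interval
    return ROLLUPS_S[-1]   # fall back to the coarsest rollup


def rollup_avg(points, interval_s):
    """Average (timestamp, value) pairs into buckets of interval_s seconds."""
    buckets = {}
    for ts, value in points:
        start = ts - ts % interval_s
        total, count = buckets.get(start, (0.0, 0))
        buckets[start] = (total + value, count + 1)
    return [(start, total / count)
            for start, (total, count) in sorted(buckets.items())]
```

With these assumed values, a one-hour query stays at the native 15 s resolution, a one-day query drops to 5-minute buckets, and a 30-day query drops to daily buckets.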

Page 23: The Data Mullet: From all SQL to No SQL back to Some SQL

Page 24: The Data Mullet: From all SQL to No SQL back to Some SQL

Aggregate

• Constant data influx

• Large data sets

Page 25: The Data Mullet: From all SQL to No SQL back to Some SQL

Aggregate

• Constant data influx

• Large data sets

Watch & Share

• Real-time updates

• On-the-fly data analysis

Page 26: The Data Mullet: From all SQL to No SQL back to Some SQL

Aggregate

• Constant data influx

• Large data sets

Look for Patterns

• On-demand visualization

• Background data analysis

Watch & Share

• Real-time updates

• On-the-fly data analysis

Page 27: The Data Mullet: From all SQL to No SQL back to Some SQL

Aggregate (BASE, DIRT)

• Constant data influx

• Large data sets

Look for Patterns

• On-demand visualization

• Background data analysis

Watch & Share

• Real-time updates

• On-the-fly data analysis

Page 28: The Data Mullet: From all SQL to No SQL back to Some SQL

Aggregate (BASE, DIRT)

• Constant data influx

• Large data sets

Look for Patterns

• On-demand visualization

• Background data analysis

Watch & Share (DIRT)

• Real-time updates

• On-the-fly data analysis

Page 29: The Data Mullet: From all SQL to No SQL back to Some SQL

Aggregate (BASE, DIRT)

• Constant data influx

• Large data sets

Look for Patterns (BASE)

• On-demand visualization

• Background data analysis

Watch & Share (DIRT)

• Real-time updates

• On-the-fly data analysis

Page 30: The Data Mullet: From all SQL to No SQL back to Some SQL

Aggregate (BASE, DIRT)

• Constant data influx

• Large data sets

Look for Patterns (BASE)

• On-demand visualization

• Background data analysis

Watch & Share (DIRT)

• Real-time updates

• On-the-fly data analysis

Datadog = DIRT + BASE + a tiny bit of ACID

Page 31: The Data Mullet: From all SQL to No SQL back to Some SQL

How It All Fits Together

Page 32: The Data Mullet: From all SQL to No SQL back to Some SQL

The Mullet: All SQL in front, NoSQL party in the back

Page 33: The Data Mullet: From all SQL to No SQL back to Some SQL

Actual Stack

Page 34: The Data Mullet: From all SQL to No SQL back to Some SQL

Choices, choices

• 5 axes

• Volume of Data

• Latency

• Ops: wake-up-in-the-middle-of-the-night factor

• Dev: community & tools

• Cost as in “a function of X”

Page 35: The Data Mullet: From all SQL to No SQL back to Some SQL

Choosing Elastic Storage

Page 36: The Data Mullet: From all SQL to No SQL back to Some SQL

Durable, Large-Scale Storage

• Postgres

• Mongo

• Cassandra

• (Riak)

• SciDB

Page 37: The Data Mullet: From all SQL to No SQL back to Some SQL

Durable, Large-Scale Storage

• Postgres

• Itemized data points in a time series are useless

• BLOB management not fun

• Mongo

• Cassandra

• (Riak)

• SciDB

Page 38: The Data Mullet: From all SQL to No SQL back to Some SQL

Durable, Large-Scale Storage

• Postgres

• Mongo

• SciDB

• Cassandra

• (Riak)

Page 39: The Data Mullet: From all SQL to No SQL back to Some SQL

Durable, Large-Scale Storage

• Postgres

• Mongo

• Durability in question in 2010

• SciDB

• Cassandra

• (Riak)

Page 40: The Data Mullet: From all SQL to No SQL back to Some SQL

Durable, Large-Scale Storage

• Postgres

• Mongo

• SciDB

• Very very early

• Cassandra

• (Riak)

Page 41: The Data Mullet: From all SQL to No SQL back to Some SQL

Durable, Large-Scale Storage

• Postgres

• Mongo

• SciDB

• Our pick: Cassandra

• (Riak)

Page 42: The Data Mullet: From all SQL to No SQL back to Some SQL

Cassandra: Volume of Data

• 100s of hosts, 150TB at FB in 2010

• Easy to distribute data, durable quorum writes
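
As a reminder of what "quorum writes" buy you, here is a minimal sketch of the usual quorum arithmetic: a quorum is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write when the write and read replica counts together exceed the replication factor. The replication factor of 3 is an assumption for illustration; the talk does not state the cluster's settings.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (W + R > N)."""
    return w + r > n


N = 3                 # assumed replication factor, not from the talk
W = R = quorum(N)     # quorum writes and quorum reads

if __name__ == "__main__":
    print(W, reads_see_latest_write(N, W, R))   # 2 True
    print(reads_see_latest_write(N, 1, 1))      # False
```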

Page 43: The Data Mullet: From all SQL to No SQL back to Some SQL

Cassandra: Latency

• < 10ms on writes

• reads more variable (on EC2)*

* More on this in a bit

Page 44: The Data Mullet: From all SQL to No SQL back to Some SQL

Cassandra: Ops

• Release Engineering too aggressive

• ~10 releases since 1/2011 on 0.7 branch

• Good resilience to node loss in the later 0.7 versions

• Annoying idiosyncrasies (cassandra.yaml, predictability of disk use)

Page 45: The Data Mullet: From all SQL to No SQL back to Some SQL

Cassandra: Dev

• Bizarre nomenclature (rows, columns... families?)

• Cumbersome data access

• Limited semantics if you are used to SQL

• Good libraries

Page 46: The Data Mullet: From all SQL to No SQL back to Some SQL

Cassandra: Cost

• Ops time

• I/O limits raised by increasing number of nodes

• Thereby increasing costs

Page 47: The Data Mullet: From all SQL to No SQL back to Some SQL

Riak

• Prototyped out of spite for Cassandra 0.7[0123]

• We ♡ Erlang

• Great folks

• But Cassandra pain subsided, priorities shifted.

• git merge datadog/riak did not happen

Page 48: The Data Mullet: From all SQL to No SQL back to Some SQL

Choosing In-Mem

Page 49: The Data Mullet: From all SQL to No SQL back to Some SQL

In-memory DB

• We started with Redis

• Then we stopped looking :)

Page 50: The Data Mullet: From all SQL to No SQL back to Some SQL

Redis

• Volume of Data

• Limited by available RAM, easy partitioning in our case

• Latency

• << 5 ms, dominated by network

• Ops

• Low-maintenance, stable, predictable, replicated, boringly rock-solid

• Dev

• Brilliant, clear docs, simple protocol, oft-used native data structures

• Cost

• ~ cost of RAM on EC2
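
The slide does not say how the data was partitioned across Redis instances. One common approach, shown here purely as an assumed illustration, is to hash each key to a shard with a stable hash so that the RAM limit applies per shard rather than to the whole data set. The host names and function names are hypothetical.

```python
import zlib

# Hypothetical shard hosts; the talk does not name any.
SHARDS = ["redis-0", "redis-1", "redis-2", "redis-3"]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def host_for(key: str) -> str:
    """Pick the shard host that owns this key."""
    return SHARDS[shard_for(key, len(SHARDS))]
```

A plain modulo scheme like this reshuffles most keys when the shard count changes, which is acceptable for a sketch but is one reason consistent hashing is often preferred.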

Page 51: The Data Mullet: From all SQL to No SQL back to Some SQL

Choosing a SQL Data Store

Page 52: The Data Mullet: From all SQL to No SQL back to Some SQL

General-purpose data store

• We ♡ SQL

• Oracle

• Postgres

Page 53: The Data Mullet: From all SQL to No SQL back to Some SQL

Oracle in numbers (license costs in thousands of dollars)

• base license 47.5

• clustered db 23

• replication 10

• partitioning 11.5

• analytics 23

• in-mem cache 23

• total: $138,000

Page 54: The Data Mullet: From all SQL to No SQL back to Some SQL

Oracle in numbers (license costs in thousands of dollars)

• base license 47.5

• clustered db 23

• replication 10

• partitioning 11.5

• analytics 23

• in-mem cache 23

• total: $138,000

• for 2 cores

• + 22% annual support

• Just in licenses...
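
The line items above do add up to the quoted total, and the support percentage can be turned into a yearly figure. This sketch assumes the 22% is charged on the full license total and stays flat every year, which the slide does not spell out; the variable and function names are illustrative.

```python
# Line items from the slide, in thousands of dollars, for 2 cores.
LICENSES_K = {
    "base license": 47.5,
    "clustered db": 23,
    "replication": 10,
    "partitioning": 11.5,
    "analytics": 23,
    "in-mem cache": 23,
}
SUPPORT_RATE = 0.22   # "+ 22% annual support"

total_k = sum(LICENSES_K.values())
annual_support_k = total_k * SUPPORT_RATE   # assumption: charged on the full total


def cost_after_years_k(years: int) -> float:
    """License total plus flat annual support for the given years, in $K."""
    return total_k + annual_support_k * years


if __name__ == "__main__":
    print(total_k)                          # 138.0
    print(round(annual_support_k, 2))       # 30.36
    print(round(cost_after_years_k(3), 2))  # 229.08
```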

Page 56: The Data Mullet: From all SQL to No SQL back to Some SQL

General-purpose data store

• Oracle

• Postgres

Page 57: The Data Mullet: From all SQL to No SQL back to Some SQL

Postgres

• Volume of Data

• High GBs, Low TBs

• Latency

• 10-100 ms after EXPLAIN ANALYZE

• Ops

• Low-maintenance, stable, predictable, replicated, boringly rock-solid

• Dev

• Well understood by (a certain class of) engineers

• Cost, a function of storage latency

Page 58: The Data Mullet: From all SQL to No SQL back to Some SQL

Not forgetting...

• VoltDB

• RAM-based, potentially a match for our DIRTy parts

• Stored procedures, an acquired taste

• Home-grown data stores (soon)

• Rainbird

• ...

Page 59: The Data Mullet: From all SQL to No SQL back to Some SQL

The Data Mullet

• All open-source, good if you’re ready to dive into the code

• $0 CAPEX

• All OPEX on EC2

Page 60: The Data Mullet: From all SQL to No SQL back to Some SQL

The Data Mullet on EC2

Structural Weakness: I/O latency at moderate throughputs

Page 61: The Data Mullet: From all SQL to No SQL back to Some SQL

One “bad” Cassandra query

Page 62: The Data Mullet: From all SQL to No SQL back to Some SQL

Clogging the I/O pipes on EC2

Maximum Average Wait: up to 670 ms

Maximum Service Time: up to 5 ms

While writing 100 MB/s and reading 30 MB/s

Page 63: The Data Mullet: From all SQL to No SQL back to Some SQL

Another “Bad” Query

             DEV        tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
03:35:02 PM  dev8-80    380     24000       5.7        62        47    130    1.3     47
03:35:02 PM  dev8-96    370     24000       5.6        63        46    120    1.2     45
03:35:02 PM  dev8-112   380     24000       5.5        63        46    120    1.2     46
03:35:02 PM  dev8-128   380     24000       7.2        63        56    150    1.3     50

• svctm: average service time in ms

• await: average wait in ms

• rd_sec/s: read throughput in sectors/s. Total: 46 MB/s

• tps: transfers per second. Consumer HD: ~75 tps. SSD: 1-30 Ktps
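
The 46 MB/s total can be recovered from the rd_sec/s column, since sar reports reads in 512-byte sectors. A minimal sketch, with the per-device values copied from the output above and illustrative names:

```python
SECTOR_BYTES = 512   # sar reports rd_sec/s in 512-byte sectors

# (device, rd_sec/s) taken from the sar output above
READS = [("dev8-80", 24000), ("dev8-96", 24000),
         ("dev8-112", 24000), ("dev8-128", 24000)]


def sectors_to_mib_per_s(sectors_per_s: float) -> float:
    """Convert a sectors-per-second rate to MiB per second."""
    return sectors_per_s * SECTOR_BYTES / 2**20


total_mib_s = sum(sectors_to_mib_per_s(s) for _, s in READS)

if __name__ == "__main__":
    print(total_mib_s)   # 46.875
```

Each device reads about 11.7 MiB/s, so the four together sit just under 47 MiB/s, in line with the slide's figure. At that modest throughput the average wait (120-150 ms) is roughly a hundred times the service time (1.2-1.3 ms), which suggests requests spend almost all of their time queued.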

Page 64: The Data Mullet: From all SQL to No SQL back to Some SQL

Mitigation of I/O issues?

Page 65: The Data Mullet: From all SQL to No SQL back to Some SQL

Cassandra: I/O Mitigation

• More nodes, more RAM, more partitions, more parallelism

• $$$

Page 66: The Data Mullet: From all SQL to No SQL back to Some SQL

Postgres: I/O Mitigation

• Scale up to a point

• Replicate

• Move to bare metal => $$$

• A well-trodden but difficult path

Page 67: The Data Mullet: From all SQL to No SQL back to Some SQL

Better yet...

• Less dependency on low-latency, durable storage

• Move more data to RAM (Redis)

• Archive immutable data

• S3/Cloudfront?

Page 68: The Data Mullet: From all SQL to No SQL back to Some SQL

A digression: Your Very Own Chaos Monkey

• Instances go bye-bye

• https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/741224

• Instances go bye-bye, take 2 (high load)

• https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920

Page 69: The Data Mullet: From all SQL to No SQL back to Some SQL

Takeaway

• By mixing and matching open-source SQL (PG) and NoSQL (Redis, Cassandra) Datadog has been able to quickly & simply get up-and-running with “$0” down payment on infrastructure.

Page 70: The Data Mullet: From all SQL to No SQL back to Some SQL

http://datadoghq.com

@datadoghq

Alexis Lê-Quôc @alq

