Running Cassandra in AWS

Patrick Eaton, PhD - patrick@stackdriver.com - @PatrickREaton

Joey Imbasciano - joey@stackdriver.com - @_joeyi

Stackdriver at a Glance

Stackdriver's hosted intelligent monitoring service helps SaaS companies innovate more by reducing the burden of day-to-day operations

● Cloud-native and cloud-aware
● Designed for complex distributed applications
● Founded by cloud/infrastructure industry veterans (Microsoft, VMware, EMC, Endeca, Red Hat) with deep systems and DevOps expertise
● Team of ~25, based in Downtown Boston

Intelligent Monitoring

Discover customer’s cloud-hosted applications
● Infrastructure inventory
● Logical units, like groups/clusters
● Services, hosted and self-managed
● Elastic resources

Monitor
● Various data sources
  ○ Provider metrics
  ○ Host metrics
  ○ Custom metrics
  ○ Endpoints
  ○ Events
  ○ Health
● Rich visualizations

Analyze
● Integrate data sources
● Aggregate metrics
● Report utilization, cost, etc.
● Detect policy violations
● Recommend actions

Lambda Architecture

● Typical of modern architectures for on-line applications
● Formalized by Nathan Marz
● Composed of "batch", "speed", and "serving" layers
● Batch layer
  ○ Store of record
  ○ Compute arbitrary views
● Speed layer
  ○ Low latency updates
  ○ Streaming algorithms
● Serving layer
  ○ Combine data from batch and speed layers to answer queries

[Diagram: Speed, Batch, and Serving layers]
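The three layers described above can be sketched in a few lines of Python. This is a toy illustration, not Stackdriver's implementation: the class names, the per-key running total, and the `reset` hand-off between layers are all assumptions made for the example.

```python
from collections import defaultdict


class BatchLayer:
    """Store of record: keeps every event and recomputes its view from scratch."""

    def __init__(self):
        self.events = []   # append-only master data
        self.view = {}     # precomputed view: key -> total

    def append(self, key, value):
        self.events.append((key, value))

    def recompute(self):
        # An "arbitrary view" here is just a per-key sum over all events.
        totals = defaultdict(int)
        for key, value in self.events:
            totals[key] += value
        self.view = dict(totals)


class SpeedLayer:
    """Low-latency running totals for events the batch view has not absorbed."""

    def __init__(self):
        self.recent = defaultdict(int)

    def update(self, key, value):
        self.recent[key] += value

    def reset(self):
        # Called once a batch recompute has covered everything seen so far.
        self.recent.clear()


def serve(batch, speed, key):
    """Serving layer: combine the batch view with the speed layer's delta."""
    return batch.view.get(key, 0) + speed.recent.get(key, 0)
```

Every event goes to both layers. Before a batch run the answer comes entirely from the speed layer; after `recompute()` and `reset()` the same answer comes from the batch view.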

Stackdriver Architecture

● Shares characteristics of lambda architecture
● Indexing (speed) path
  ○ Make "live" data available "pre-analysis"
● Analysis (batch) path
  ○ Compute aggregations
  ○ Create recommendations
● Query (serving) layer
  ○ Combine "live" and analyzed data to answer queries
  ○ May require on-the-fly analysis
● Alerting (speed) path (not discussed here)
  ○ Stream processing to detect policy-based anomalies

[Diagram: Database, Query (Serving), Analysis (Batch), Indexing (Speed), Alerting (Speed), Notification (Serving)]

Database Options

● We chose Cassandra!
  ○ True P2P architecture
  ○ Good support for write-heavy workloads
  ○ Compatible data model for time series data
    ■ Column per metric type, timestamps as columns
● Why not MySQL?
  ○ Experience with operating large, sharded deployments
  ○ Relational data model not a good match
● Why not HBase?
  ○ Operational complexity - zk, hadoop, hdfs, ...
  ○ Special "Master" role
● Why not Dynamo?
  ○ Avoid vendor lock-in and high cost
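To make the time-series data model above concrete, here is a small in-memory sketch of a wide row whose columns are ordered by timestamp. It is one plausible reading of the slide, not the actual Stackdriver schema: the `(resource, metric)` row key, the class name, and the half-open time range are assumptions.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Toy wide-row store: each row holds columns sorted by timestamp."""

    def __init__(self):
        # row key -> sorted list of (timestamp, value) columns
        self._rows = defaultdict(list)

    def write(self, row_key, timestamp, value):
        # Columns stay sorted by timestamp, so writes may arrive out of order.
        bisect.insort(self._rows[row_key], (timestamp, value))

    def slice(self, row_key, start, end):
        """Return columns with start <= timestamp < end."""
        columns = self._rows.get(row_key, [])
        lo = bisect.bisect_left(columns, (start,))
        hi = bisect.bisect_left(columns, (end,))
        return columns[lo:hi]
```

Because the columns in a row are kept in timestamp order, reading a time range is a contiguous slice rather than a scan, which is what makes this layout a good fit for metric queries.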

Stackdriver Architecture ++

● Archival pipeline stores all data
  ○ Very small surface area, battle-tested
  ○ Critical for disaster recovery
  ○ S3 considered durable enough
  ○ Replicated for availability
● Archive means Cassandra is "soft state"
● C* consolidates analysis and indexing results
● Properties of data in C*
  ○ Immutable data
  ○ Append-only
  ○ Read-1, write-1 consistency
● Scales out easily
  ○ Indexers, archivers, analyzers, query servers

[Diagram labels: Archive, Index, Analyze; Inventory, Data Series, Roll-ups, Analysis; Cassandra]

Cassandra at Stackdriver: Cluster Configuration

● Version: DataStax Community Edition 1.2.10
● Replication Factor: 3
● Vnodes
● Murmur3Partitioner
● Ec2Snitch
  ○ Aids in request efficiency
  ○ Enables Cassandra to ensure replicas are in different Availability Zones
● phi_convict_threshold: 8 -> 12
  ○ Used to determine when nodes are down
  ○ AWS network can be spotty
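A rough way to see what raising phi_convict_threshold from 8 to 12 buys is to model heartbeat gaps with an exponential distribution, as phi accrual failure detectors commonly do. The sketch below is a simplified model under that assumption and is not claimed to match Cassandra's exact code; the function names are made up for the example.

```python
import math


def phi(silence_seconds, mean_interval_seconds):
    """Suspicion level after `silence_seconds` without a heartbeat.

    Under an exponential model, P(gap > t) = exp(-t / mean), and phi is
    -log10 of that probability.
    """
    return silence_seconds / (mean_interval_seconds * math.log(10))


def silence_to_convict(threshold, mean_interval_seconds):
    """Seconds of silence needed before phi reaches `threshold`."""
    return threshold * mean_interval_seconds * math.log(10)


if __name__ == "__main__":
    for threshold in (8, 12):
        seconds = silence_to_convict(threshold, mean_interval_seconds=1.0)
        print("threshold %d: convict after about %.1f s of silence" % (threshold, seconds))
```

With a mean heartbeat interval of one second, this model convicts a node after about 18.4 seconds of silence at threshold 8 and about 27.6 seconds at threshold 12, so short network hiccups are less likely to mark a healthy node as down.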

Cassandra Topology in AWS

[Diagram "Where we started...": nodes in us-east-1a, us-east-1b, us-east-1c]

Keep it balanced!

[Diagram "Where we are...": nodes in us-east-1a, us-east-1b, us-east-1c]
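"Keep it balanced!" can be expressed as a simple placement rule: put each new node in the Availability Zone that currently has the fewest nodes. The helper below is a hypothetical sketch of that rule; the function name and the alphabetical tie-break are assumptions, not the tooling used at Stackdriver.

```python
from collections import Counter


def place_nodes(existing_zones, new_count, zones):
    """Choose a zone for each new node so per-zone node counts stay even.

    existing_zones: the zone of every node already in the cluster
    new_count:      how many nodes to add
    zones:          the zones the cluster is allowed to use
    """
    counts = Counter({zone: 0 for zone in zones})
    counts.update(existing_zones)
    placements = []
    for _ in range(new_count):
        # Least-populated zone first; break ties by zone name.
        zone = min(zones, key=lambda z: (counts[z], z))
        counts[zone] += 1
        placements.append(zone)
    return placements
```

For example, a cluster with two nodes in us-east-1a, one in us-east-1b, and none in us-east-1c would get its next three nodes in us-east-1c, us-east-1b, and us-east-1c, ending with two nodes per zone. With a replication factor of 3 and replicas spread across zones, equal zone sizes help keep load even.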

Cassandra EC2 Node Configuration

● m1.xlarge
  ○ 4 cores
  ○ 15 GB RAM
  ○ 4 ephemeral disks available
● 4 disks RAID-0 for Data Volume and CommitLog
  ○ ext4 - defaults,noatime
  ○ mdadm RAID-0
  ○ Compactions
  ○ Heavy Read/Write IO

Cassandra Automation and Operations

● Combination of Boto, Fabric, & Puppet
  ○ Boto for AWS API
  ○ Fabric + Puppet for Bootstrapping
  ○ Fabric for Operations
● One command to:
  ○ Launch a new cluster
  ○ Upsize a cluster
  ○ Replace a dead node
  ○ Remove existing nodes
  ○ List nodes in a cluster
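As an illustration of the "Boto for AWS API" piece, a launch step might look roughly like the sketch below. It is not the actual Stackdriver tooling: `launch_nodes`, the `cassandra-cluster` tag key, and the AMI id in the usage comment are hypothetical. The connection is passed in, so the function only relies on the `run_instances` and `create_tags` methods of a boto 2 EC2 connection.

```python
def launch_nodes(conn, cluster_name, zones, image_id, instance_type="m1.xlarge"):
    """Launch one instance per entry in `zones` and tag it with the cluster name.

    conn is expected to behave like a boto 2 EC2 connection.
    Returns the ids of the launched instances.
    """
    instance_ids = []
    for zone in zones:
        reservation = conn.run_instances(
            image_id,
            instance_type=instance_type,
            placement=zone,  # Availability Zone for this node
        )
        ids = [instance.id for instance in reservation.instances]
        # Hypothetical tag key, so a later "list nodes" step can find the node.
        conn.create_tags(ids, {"cassandra-cluster": cluster_name})
        instance_ids.extend(ids)
    return instance_ids


# Possible usage with boto 2 installed and AWS credentials configured:
#   import boto.ec2
#   conn = boto.ec2.connect_to_region("us-east-1")
#   launch_nodes(conn, "metrics", ["us-east-1a", "us-east-1b", "us-east-1c"],
#                image_id="ami-00000000")  # placeholder AMI id
```

Bootstrapping the launched hosts (RAID setup, Cassandra install, joining the ring) would then be handled by the Fabric and Puppet steps, which are not shown here.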

Our (Internal) Slogan

Cassandra Backups using S3

● No Cassandra Powered Backups
● Restore from S3
● Useful for major version upgrades

[Diagram labels: S3, Bulk Loader, Map Reduce, Cassandra, Data]

1. Data is archived when it is received
2. Bulk loader reads from S3
3. M/R re-analyzes data
4. Cassandra is repopulated
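The four steps above can be sketched with in-memory stand-ins: a list plays the role of the S3 archive and a dict plays the role of Cassandra. This is a toy model, not the production pipeline; the record shape `(metric, timestamp, value)`, the per-metric mean, and the key layout are assumptions.

```python
from collections import defaultdict


def archive(archive_store, record):
    """Step 1: append each record to the archive as it is received."""
    archive_store.append(record)


def bulk_load(archive_store):
    """Step 2: read archived records back (stand-in for reading from S3)."""
    for record in archive_store:
        yield record


def reanalyze(records):
    """Step 3: recompute aggregates (stand-in for the Map Reduce job)."""
    grouped = defaultdict(list)
    for metric, _timestamp, value in records:
        grouped[metric].append(value)
    return {metric: sum(values) / len(values) for metric, values in grouped.items()}


def repopulate(database, records, aggregates):
    """Step 4: write raw points and aggregates into a fresh store."""
    for metric, timestamp, value in records:
        database[("raw", metric, timestamp)] = value
    for metric, mean in aggregates.items():
        database[("mean", metric)] = mean
    return database
```

Because the archived records are immutable and each one maps to a fixed key, replaying the archive twice produces the same store. That is what lets the database be treated as rebuildable state.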

Disaster Recovery in the Wild

● October 23, Stackdriver suffered a total loss of our C* cluster
● Exhausted memory due to number of open file descriptors (see graph)
● We did not notice the problem until it was too late
● Nodes began crashing, resulted in inconsistent view of the ring
● Attempted to restart the cluster unsuccessfully for ~2 hours
● Provisioned new 36 node cluster in ~2 hours
● Directed “live” data to new cluster
● Started bulk restore operation from archive
● Full-fidelity data and aggregations
● No data loss due to archival pipeline
● See http://www.stackdriver.com/post-mortem-october-23-stackdriver-outage/

Cluster Restoration Process

[Diagram labels: UI, API, Gateway, S3, Bulk Loader, Map Reduce, Historical Data, New Data, New Cluster, Old Cluster]

Thank you!

Yes, we are hiring!

Patrick Eaton - patrick@stackdriver.com - @PatrickREaton
Joey Imbasciano - joey@stackdriver.com - @_joeyi