Post on 06-May-2015
Running Cassandra in AWS
Patrick Eaton, PhD - patrick@stackdriver.com - @PatrickREaton
Joey Imbasciano - joey@stackdriver.com - @_joeyi
Stackdriver at a Glance
Stackdriver's hosted intelligent monitoring service helps SaaS companies innovate more by reducing the burden of day-to-day operations
● Cloud-native and cloud-aware
● Designed for complex distributed applications
● Founded by cloud/infrastructure industry veterans (Microsoft, VMware, EMC, Endeca, Red Hat) with deep systems and DevOps expertise
● Team of ~25, based in Downtown Boston
Intelligent Monitoring
Discover customer’s cloud-hosted applications
● Infrastructure inventory
● Logical units, like groups/clusters
● Services, hosted and self-managed
● Elastic resources
Monitor
● Various data sources
  ○ Provider metrics
  ○ Host metrics
  ○ Custom metrics
  ○ Endpoints
  ○ Events
  ○ Health
● Rich visualizations
Analyze
● Integrate data sources
● Aggregate metrics
● Report utilization, cost, etc.
● Detect policy violations
● Recommend actions
Lambda Architecture
● Typical of modern architectures for on-line applications
● Formalized by Nathan Marz
● Composed of "batch", "speed", and "serving" layers
● Batch layer
  ○ Store of record
  ○ Compute arbitrary views
● Speed layer
  ○ Low latency updates
  ○ Streaming algorithms
● Serving layer
  ○ Combine data from batch and speed layers to answer queries
[Diagram: Data feeds the Speed and Batch layers, which feed the Serving layer]
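The serving-layer idea above can be sketched in a few lines of Python. This is a minimal illustration, not Stackdriver's code: the function name `serve_query`, the `(timestamp, value)` point shape, and the `batch_horizon` parameter are all assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (unix timestamp, value); hypothetical shape


def serve_query(batch_view: List[Point], speed_view: List[Point],
                batch_horizon: int) -> List[Point]:
    """Merge a precomputed batch view with recent speed-layer points.

    batch_horizon is the latest timestamp the batch layer has processed.
    Speed-layer points at or before it are dropped so that no point is
    reported twice.
    """
    recent = [p for p in speed_view if p[0] > batch_horizon]
    return sorted(batch_view + recent)


batch = [(0, 1.0), (60, 2.0)]        # already analyzed by the batch layer
speed = [(60, 2.0), (120, 3.0)]      # low-latency view, overlaps the batch view
merged = serve_query(batch, speed, batch_horizon=60)
print(merged)
```

The batch view stays authoritative up to its horizon, and the speed layer only fills in the gap after it.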
Stackdriver Architecture
● Shares characteristics of lambda architecture
● Indexing (speed) path
  ○ Make "live" data available "pre-analysis"
● Analysis (batch) path
  ○ Compute aggregations
  ○ Create recommendations
● Query (serving) layer
  ○ Combine "live" and analyzed data to answer queries
  ○ May require on-the-fly analysis
● Alerting (speed) path (not discussed here)
  ○ Stream processing to detect policy-based anomalies
[Diagram labels: Data, Database, Indexing (Speed), Analysis (Batch), Alerting (Speed), Query (Serving), Notification (Serving)]
Database Options
● We chose Cassandra!
  ○ True P2P architecture
  ○ Good support for write-heavy workloads
  ○ Compatible data model for time series data
    ■ Column per metric type, timestamps as columns
● Why not MySQL?
  ○ Experience with operating large, sharded deployments
  ○ Relational data model not a good match
● Why not HBase?
  ○ Operational complexity - ZooKeeper, Hadoop, HDFS, ...
  ○ Special "Master" role
● Why not Dynamo?
  ○ Avoid vendor lock-in and high cost
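To make the time-series point concrete, here is one possible reading of a wide-row layout, modeled in plain Python rather than in Cassandra itself. The class `WideRowStore`, the `(resource, metric)` row key, and the method names are hypothetical; the slides do not show the actual schema.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class WideRowStore:
    """In-memory stand-in for a wide-row layout (assumed, not the real schema).

    Each row is keyed by (resource, metric); its columns are kept sorted
    by timestamp, so a time-range read is a contiguous slice.
    """

    def __init__(self):
        self._rows = defaultdict(list)  # row key -> sorted [(ts, value), ...]

    def append(self, resource, metric, ts, value):
        insort(self._rows[(resource, metric)], (ts, float(value)))

    def slice(self, resource, metric, start, end):
        """Return all points with start <= ts <= end."""
        cols = self._rows.get((resource, metric), [])
        lo = bisect_left(cols, (start, float("-inf")))
        hi = bisect_right(cols, (end, float("inf")))
        return cols[lo:hi]


store = WideRowStore()
for ts, value in [(30, 0.7), (10, 0.5), (20, 0.6)]:  # arrival order may vary
    store.append("i-123", "cpu", ts, value)
print(store.slice("i-123", "cpu", 10, 20))
```

Keeping columns ordered by timestamp is what makes range queries over a metric cheap in this kind of model.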
Stackdriver Architecture ++
● Archival pipeline stores all data
  ○ Very small surface area, battle-tested
  ○ Critical for disaster recovery
  ○ S3 considered durable enough
  ○ Replicated for availability
● Archive means Cassandra is "soft state"
● C* consolidates analysis and indexing results
● Properties of data in C*
  ○ Immutable data
  ○ Append-only
  ○ Read-1, write-1 consistency
● Scales out easily
  ○ Indexers, archivers, analyzers, query servers
[Diagram labels: Data, Archive, Index, Analyze, S3, Cassandra, Query, Inventory, Data Series, Roll-ups, Analysis, Recs]
Cassandra at Stackdriver Cluster Configuration
● Version: DataStax Community Edition 1.2.10
● Replication Factor: 3
● Vnodes
● Murmur3Partitioner
● Ec2Snitch
  ○ Aids in request efficiency
  ○ Enables Cassandra to ensure replicas are in different Availability Zones
● phi_convict_threshold: 8 -> 12
  ○ Used to determine when nodes are down
  ○ AWS network can be spotty
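A rough feel for what raising phi_convict_threshold from 8 to 12 buys can be had from a simplified phi-accrual model. The sketch below assumes heartbeat gaps are exponentially distributed with a known mean. Cassandra's real failure detector estimates the mean from observed gossip intervals and differs in detail, so the function names and the 1-second mean are illustrative assumptions only.

```python
import math


def phi(silence_s: float, mean_interval_s: float) -> float:
    """Suspicion level after silence_s seconds without a heartbeat.

    Under the exponential assumption, P(gap > t) = exp(-t / mean),
    and phi is -log10 of that probability.
    """
    return silence_s / (mean_interval_s * math.log(10))


def silence_to_convict(threshold: float, mean_interval_s: float) -> float:
    """Seconds of silence needed before phi reaches the threshold."""
    return threshold * mean_interval_s * math.log(10)


for threshold in (8, 12):
    seconds = silence_to_convict(threshold, mean_interval_s=1.0)
    print(f"threshold {threshold}: convict after ~{seconds:.1f}s of silence")
```

In this model the higher threshold tolerates about 50% more silence before a node is marked down, which trades slower failure detection for fewer false convictions on a spotty network.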
Cassandra Topology in AWS
[Diagram: "Where we started..." with node 1 in us-east-1a, node 2 in us-east-1b, node 3 in us-east-1c]
Keep it balanced!
[Diagram: "Where we are..." with nodes across us-east-1a, us-east-1b, us-east-1c]
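"Keep it balanced!" can be read as: when growing the cluster, always add nodes to the Availability Zone that currently has the fewest. The helper below is a hypothetical sketch of that rule (the function `plan_new_nodes` is not part of the tooling described in the talk).

```python
from collections import Counter


def plan_new_nodes(current_zones, zones, count):
    """Pick an AZ for each of `count` new nodes, filling the emptiest first.

    current_zones lists the AZ of every existing node; ties are broken
    by zone name so the plan is deterministic.
    """
    load = Counter({zone: 0 for zone in zones})
    load.update(current_zones)
    plan = []
    for _ in range(count):
        zone = min(zones, key=lambda z: (load[z], z))
        plan.append(zone)
        load[zone] += 1
    return plan


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
existing = ["us-east-1a", "us-east-1a", "us-east-1b"]  # an unbalanced start
plan = plan_new_nodes(existing, zones, 3)
print(plan)
```

With a replication factor of 3 and replicas placed in different zones, equal node counts per zone help keep each node's share of the data similar.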
Cassandra EC2 Node Configuration
● m1.xlarge
  ○ 4 cores
  ○ 15 GB RAM
  ○ 4 ephemeral disks available
● 4 disks RAID-0 for Data Volume and CommitLog
  ○ ext4 - defaults,noatime
  ○ mdadm RAID-0
  ○ Compactions
  ○ Heavy Read/Write IO
Cassandra Automation and Operations
● Combination of Boto, Fabric, & Puppet
  ○ Boto for AWS API
  ○ Fabric + Puppet for Bootstrapping
  ○ Fabric for Operations
● One command to:
  ○ Launch a new cluster
  ○ Upsize a cluster
  ○ Replace a dead node
  ○ Remove existing nodes
  ○ List nodes in a cluster
Our (Internal) Slogan
Cassandra Backups using S3
● No Cassandra Powered Backups
● Restore from S3
● Useful for major version upgrades
[Diagram: Data -> S3 -> Bulk Loader -> Map Reduce -> Cassandra]
1. Data is archived when it is received
2. Bulk loader reads from S3
3. M/R re-analyzes data
4. Cassandra is repopulated
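The four steps above can be sketched end to end with in-memory stand-ins. Everything here is an assumption for illustration: a Python list plays the S3 archive, a dict plays Cassandra, and a per-minute average plays the Map/Reduce analysis. The function names are hypothetical.

```python
from collections import defaultdict


def bulk_load(archive):
    """Step 2: read every archived point back (stand-in for reading S3)."""
    yield from archive


def reanalyze(points, bucket_s=60):
    """Step 3: recompute per-bucket averages (stand-in for the M/R job)."""
    sums = defaultdict(lambda: [0.0, 0])
    for metric, ts, value in points:
        key = (metric, ts - ts % bucket_s)
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def restore(archive):
    """Step 4: rebuild a fresh store from the archive alone."""
    return dict(reanalyze(bulk_load(archive)))


# Step 1: points are appended to the archive as they arrive.
archive = [("cpu", 0, 10.0), ("cpu", 30, 20.0), ("cpu", 60, 40.0)]
store = restore(archive)
print(store)
```

Because the store is derived entirely from the archive, running `restore` again produces the same contents, which is what lets the database be treated as soft state.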
Disaster Recovery in the Wild
● October 23, Stackdriver suffered a total loss of our C* cluster
● Exhausted memory due to number of open file descriptors (see graph)
● We did not notice the problem until it was too late
● Nodes began crashing, resulted in inconsistent view of the ring
● Attempted to restart the cluster unsuccessfully for ~2 hours
● Provisioned new 36 node cluster in ~2 hours
● Directed “live” data to new cluster
● Started bulk restore operation from archive
● Full-fidelity data and aggregations
● No data loss due to archival pipeline
● See http://www.stackdriver.com/post-mortem-october-23-stackdriver-outage/
Cluster Restoration Process
[Diagram labels: UI, API, Gateway, S3, Bulk Loader, Map Reduce, Historical Data, New Data, Old Cluster, New Cluster]
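One reason live traffic and a historical backfill can target the same new cluster at once is that the data is immutable and append-only: a given key always carries the same value, so write order and replays do not change the result. The sketch below illustrates that property with a dict standing in for the cluster. The `(metric, timestamp)` key and the function name are assumptions.

```python
import random


def apply_writes(points):
    """Apply writes in the given order to an empty store."""
    store = {}
    for metric, ts, value in points:
        store[(metric, ts)] = value  # same key always carries the same value
    return store


historical = [("cpu", t, float(t)) for t in range(0, 300, 60)]   # from the archive
live = [("cpu", t, float(t)) for t in range(300, 420, 60)]       # arriving now

# Interleave both streams in arbitrary order, including replayed points.
interleaved = historical + live + historical[:2]
random.Random(0).shuffle(interleaved)

same = apply_writes(interleaved) == apply_writes(historical + live)
print(same)
```

Under these assumptions the restore job needs no coordination with the live write path.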
Thank you!
Yes, we are hiring!
Patrick Eaton - patrick@stackdriver.com - @PatrickREaton
Joey Imbasciano - joey@stackdriver.com - @_joeyi