Post on 10-May-2015
Cassandra at Lithium
Paul Cichonski, Senior Software Engineer
@paulcichonski
Lithium?
• Helping companies build social communities for their customers
• Founded in 2001
• ~300 customers
• ~84 million users
• ~5 million unique logins in past 20 days
Use Case: Notification Service
1. Stores subscriptions
2. Processes community events
3. Generates notifications when events match against subscriptions
4. Builds user activity feed out of notifications
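As a rough sketch, the match-and-fan-out steps above might look like the following (Python, with hypothetical names and in-memory dictionaries standing in for the Cassandra tables; this is illustrative, not Lithium's actual implementation):

```python
# subscriptions: target -> set of subscribers
# (mirrors the standard_subscription_index row shown later in the deck)
subscriptions = {
    "message:13": {"user:2", "user:53", "user:88"},
}

# feeds: subscriber -> list of notifications (mirrors activity_for_entity)
feeds = {}

def process_event(event):
    """Match an incoming community event against stored subscriptions and
    fan a notification out into every matching subscriber's feed."""
    for subscriber in subscriptions.get(event["target"], set()):
        notification = {"type": event["type"], "target": event["target"]}
        feeds.setdefault(subscriber, []).append(notification)

process_event({"target": "message:13", "type": "kudos"})
```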
Notification Service System View
The Cluster (v1.2.6)
• 4 nodes, each node:
  – CentOS 6.4
  – 8 cores, 2TB for commit-log, 3x 512GB SSD for data
• Average writes/s: 100-150, peak: 2,000
• Average reads/s: 100, peak: 1,500
• Use Astyanax on client-side
Data Model
Data Model: Subscriptions Fulfillment
identifies target of subscription
identifies entity that is subscribed
stored as:

standard_subscription_index row
row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13
user:2:creationtimestamp → 1390939660
user:53:creationtimestamp → 1390939665
user:88:creationtimestamp → 1390939670

maps to (cqlsh):
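The cqlsh view on the original slide was an image that did not survive transcription. As a hedged reconstruction, a Cassandra 1.2 compact-storage wide row like the one above would surface in CQL3 roughly as follows (the column names `key`/`column1`/`value` are cqlsh defaults for unnamed compact-storage columns, not names from the deck):

```sql
-- Hypothetical reconstruction, not the slide's actual schema:
-- the row key packs the subscription target, each column name packs
-- subscriber id + field, and the value is a creation timestamp.
CREATE TABLE standard_subscription_index (
    key     text,   -- e.g. '66edfdb7-...:message:13'
    column1 text,   -- e.g. 'user:2:creationtimestamp'
    value   bigint, -- e.g. 1390939660
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE;
```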
Data Model: Subscription Display (time series)
stored as:

subscriptions_for_entity_by_time row
row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0
1390939660:message:13
1390939665:board:53
1390939670:label:testlabel

maps to (cqlsh):
Data Model: Subscription Display (content browsing)
stored as:

subscriptions_for_entity_by_type row
row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2
message:13:creationtimestamp → 1390939660
board:53:creationtimestamp → 1390939665
label:testlabel:creationtimestamp → 1390939670

maps to (cqlsh):
Data Model: Activity Feed (fan-out writes)
JSON blob representing activity
stored as:

activity_for_entity row
row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0
1571b680-7254-11e3-8d70-000c29351b9d:kudos:event_summary → {kudos_json}
31aac580-8550-11e3-ad74-000c29351b9d:moderationAction:event_summary → {moderation_json}
f4efd590-82ca-11e3-ad74-000c29351b9d:badge:event_summary → {badge_json}

maps to (cqlsh):
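A minimal sketch of the fan-out write pattern behind this row (Python, illustrative names; the 30-day TTL comes from a later slide, the rest is assumed):

```python
import json
import time

TTL_SECONDS = 30 * 24 * 3600  # activity data carries a 30-day TTL per the deck

# feed rows keyed like activity_for_entity: one wide row per entity,
# columns keyed by event id + type (shapes here are illustrative)
feed_rows = {}

def fan_out(event_id, event_type, summary, subscribers):
    """Write one copy of the activity JSON into every subscriber's feed row."""
    column = f"{event_id}:{event_type}:event_summary"
    payload = json.dumps(summary)
    expires = time.time() + TTL_SECONDS   # stand-in for a column TTL
    for entity in subscribers:
        feed_rows.setdefault(entity, {})[column] = (payload, expires)

fan_out("1571b680", "kudos", {"kind": "kudos"}, ["user:2", "user:53"])
```

Note the design consequence the deck revisits later: every event is written once per subscriber, so heavy subscribers accumulate very wide rows.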
Migration Strategy (mysql → cassandra)
Data Migration: Trust, but Verify
lia → NS:
1) Bulk Migrate all subscription data (HTTP)
2) Consistency check all subscription data (HTTP)
Also runs after migration to verify shadow-writes
Fully repeatable due to idempotent writes
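A sketch of that verify step (Python, hypothetical shapes): diff the legacy store against Cassandra and re-issue any missing writes. Because subscription writes are idempotent, the whole check is safe to run repeatedly, including after shadow-writes go live.

```python
def consistency_check(legacy_rows, cassandra_rows, write):
    """Return the subscriptions missing from Cassandra, after repairing
    them via the provided (idempotent) write function."""
    missing = legacy_rows - cassandra_rows
    for subscription in missing:
        write(subscription)   # idempotent: re-running causes no harm
    return missing

cass = {("message:13", "user:2")}
legacy = {("message:13", "user:2"), ("board:53", "user:2")}
repaired = consistency_check(legacy, cass, cass.add)
```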
Verify: Consistency Checking
Subscription Write Strategy
[Diagram: the user's subscription_write goes to lia, which writes mysql directly and shadow-writes over activemq, across the NS system boundary, into the Notification Service, which writes Cassandra]
Reads for UI fulfilled by legacy mysql (temporary)
Reads for subscription fulfillment happen in NS
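The dual-write flow can be sketched as follows (Python, with a deque standing in for activemq and lists standing in for the two stores; purely illustrative):

```python
from collections import deque

mysql_store, cassandra_store = [], []
queue = deque()  # stands in for activemq

def handle_user_write(subscription):
    """lia's side: mysql stays the source of truth during migration,
    and every write is also shadow-written onto the queue."""
    mysql_store.append(subscription)
    queue.append(subscription)

def notification_service_drain():
    """NS's side: consume shadow writes into Cassandra asynchronously."""
    while queue:
        cassandra_store.append(queue.popleft())

handle_user_write({"target": "message:13", "subscriber": "user:2"})
notification_service_drain()
```

The payoff of this design is that either system can be trusted for reads only after the consistency check proves they agree.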
Path to Production: QA Issue #1 (many writes to same row kill cluster)
Problem: CQL INSERTs
Single-threaded writes were SLOW, even with BATCH (multi-second latency for writing chunks of 1,000 subscriptions)
Largest customer (~20 million subscriptions) would have taken weeks to migrate
Just Use More Threads? Not Quite
Cluster Essentially Died
Mutations Could Not Keep Up
Solution: Work Closer to Storage Layer
Work here (the storage-layer row):
66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13
user:2:creationtimestamp → 1390939660
user:53:creationtimestamp → 1390939665
user:88:creationtimestamp → 1390939670

Not here (the CQL view):
Solution: Thrift batch_mutate
More details: http://thelastpickle.com/blog/2013/09/13/CQL3-to-Astyanax-Compatibility.html
Allowed us to write 200,000 subscriptions to 3 CFs in ~45 seconds with almost no impact on cluster. NOTE: supposedly fixed in 2.0: CASSANDRA-4693
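The core idea of batch_mutate, sketched language-neutrally (this is not the Astyanax or Thrift API; names and shapes are illustrative): instead of one CQL INSERT per subscription, group mutations by row key and ship them to the storage layer in large chunks.

```python
def build_mutation_map(subscriptions):
    """Group (row_key, column, value) triples into batch_mutate's shape:
    {row_key: [(column, value), ...]}, so all columns for a row travel
    together in one storage-level mutation."""
    mutation_map = {}
    for row_key, column, value in subscriptions:
        mutation_map.setdefault(row_key, []).append((column, value))
    return mutation_map

def chunked(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 5,000 subscriptions against one index row, shipped as 5 batches of 1,000
subs = [("message:13", f"user:{i}:creationtimestamp", 1390939660 + i)
        for i in range(5000)]
batches = [build_mutation_map(chunk) for chunk in chunked(subs, 1000)]
```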
Path to Production: QA Issue #2 (read timeouts)
Tombstone Buildup and Timeouts
CF holding notification settings re-written every 30 minutes
Eventually tombstone build-up caused reads to time out
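A back-of-envelope calculation shows why this bites (assuming Cassandra's default gc_grace_seconds of 864,000, i.e. 10 days, and an illustrative 100 columns per rewrite): deleted columns cannot be purged by compaction until gc_grace_seconds has elapsed, so a CF rewritten every 30 minutes accumulates tombstones the entire time.

```python
gc_grace_seconds = 864_000          # Cassandra default: 10 days
rewrite_interval_seconds = 30 * 60  # settings CF rewritten every 30 minutes
columns_per_rewrite = 100           # illustrative assumption

# every rewrite within the gc_grace window leaves its deletes behind
rewrites_before_purge = gc_grace_seconds // rewrite_interval_seconds
tombstones_carried = rewrites_before_purge * columns_per_rewrite
# 480 rewrites x 100 columns = 48,000 tombstones a read may have to skip
```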
Solution
Production Issue #1 (dead cluster)
Hard Drive Failure on All Nodes
4 days after release, we started seeing this in /var/log/cassandra/system.log
After following a bunch of dead ends, we also found this in /var/messages.log
This cascaded to all nodes and within an hour the cluster was dead
TRIM Support to the Rescue
* http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
Production Issue #2 (repair causing tornadoes of destruction)
Activity Feed Data Explosion
• Activity data written with a TTL of 30 days
• Users in the 99th percentile were receiving multiple thousands of writes per day
• Compacted row maximum size: ~85MB (after 30 days)
Here be Dragons:
– CASSANDRA-5799: Column can expire while lazy compacting it...
Problem Did Not Surface for 30 Days
• Repairs started taking up to a week
• Created thousands of SSTables
• High latency:
Solution: Trim Feeds Manually
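A sketch of what trimming a feed row manually might look like (Python, illustrative; the cap of 100 entries is an assumption, not a number from the deck): keep only the newest N activity columns per entity and explicitly delete the rest, instead of relying on the 30-day TTL to eventually shrink the row.

```python
MAX_FEED_ENTRIES = 100  # assumed cap, not from the deck

def trim_feed(row, max_entries=MAX_FEED_ENTRIES):
    """row: {column_name: (timestamp, json_blob)}.
    Delete the oldest overflow columns and return their names."""
    if len(row) <= max_entries:
        return []
    by_age = sorted(row, key=lambda col: row[col][0])  # oldest first
    doomed = by_age[:len(row) - max_entries]
    for col in doomed:
        del row[col]   # in Cassandra this would issue a column delete
    return doomed

row = {f"event{i}": (i, "{}") for i in range(150)}
deleted = trim_feed(row)
```

The trade-off: trimming issues deletes of its own, so it still needs to be sized against gc_grace_seconds and compaction behavior.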
activity_for_entity cfstats
How We Monitor in Prod
• nodetool, OpsCenter, and JMX to monitor the cluster
• Yammer Metrics at every layer of the Notification Service; Graphite to visualize
• Netflix Hystrix in the Notification Service to guard against cluster failure
Lessons Learned
• Have a migration strategy that allows both systems to stay live until you have proven Cassandra in prod
• Longevity tests are key, especially if you will have tombstones
• Understand how gc_grace_seconds and compaction affect tombstone cleanup
• Test with production data loads if you can
Questions?
@paulcichonski