C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf


CMB – A Message Bus for the Cloud

CQS – Queuing Service
CNS – Topic based Pub Sub Service

Why did we build our own?
•  General purpose message bus to replace project-driven one-off solutions
•  Smooth data center failover, maybe even “active-active” queues
•  Must scale to millions of queues and 1000s of messages/sec (for example 1 queue per STB, i.e. set-top box)
•  Tight latency requirements (“10ms response time 95th pct”)
•  Evaluated other options to arrive at AWS SQS/SNS

AWS SQS Primer
“Simple Queue Service”
•  Focus on guaranteed delivery
•  Best effort on ordered delivery and on avoiding duplicates
•  Few simple core APIs: SendMessage(), ReceiveMessage(), DeleteMessage()
•  Do not trust message recipients
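The three core calls and the "do not trust message recipients" rule can be sketched with an in-memory queue. This is only an illustration of SQS-style semantics, not CQS or AWS code: the class name `SketchQueue`, the method names and the 30 second default are assumptions made for the example. A received message is hidden for a visibility timeout instead of being removed, so it comes back if the receiver never deletes it.

```python
import itertools
import time
from collections import OrderedDict


class SketchQueue:
    """In-memory stand-in for SQS-style semantics (hypothetical, not CQS code)."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self._messages = OrderedDict()  # message id -> body, in send order
        self._invisible_until = {}      # message id -> time it becomes visible again
        self._ids = itertools.count(1)
        self._vto = visibility_timeout
        self._clock = clock

    def send_message(self, body):
        msg_id = f"msg-{next(self._ids)}"
        self._messages[msg_id] = body
        return msg_id

    def receive_message(self):
        """Return (id, body) of the oldest visible message, or None."""
        now = self._clock()
        for msg_id, body in self._messages.items():
            until = self._invisible_until.get(msg_id)
            if until is None or until <= now:
                # Hide the message rather than removing it: if the receiver
                # crashes before calling delete_message, it reappears later.
                self._invisible_until[msg_id] = now + self._vto
                return msg_id, body
        return None

    def delete_message(self, msg_id):
        """Acknowledge a message so it is never delivered again."""
        self._messages.pop(msg_id, None)
        self._invisible_until.pop(msg_id, None)
```

A receiver that fails between receive_message() and delete_message() loses nothing; the price is that the same message may be delivered more than once.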

Advantages of adopting an API
If you do it on your own:
•  API design typically biased towards first use case
•  Almost guaranteed: You won’t get it right the first time (iterations)
•  Difficult for new users to adopt: Documentation, tools, community,…

Why did we build our own?

                                          AWS SQS
Guaranteed Delivery                       +
Simple, Robust API                        +
Scalability                               +
Active-Active                             ?
DC Failover                               ?
Latency & Throughput                      ?
Limitations (Msg Size, # Artifacts, …)    ?

“Build a horizontally scalable queuing service on top of Cassandra (and Redis) which is API compatible with AWS SQS / SNS API”

CQS over Cassandra and Redis

Cassandra
•  Cross-DC persistence and replication
•  Proven horizontal scalability

Redis
•  Meet latency requirements
•  Help with best effort ordering
•  Handle Visibility Timeout (VTO)

Cassandra Data Modeling
How to represent queued messages in Cassandra?
•  Single Column Queue
•  Single Row Queue
•  Multi-Row Queue

Single Column Queue

Single Row Queue

Multi-Row Queue

CQS Data Flow Example
1.  SendMessage(MSG1)
2.  SendMessage(MSG2)
3.  SendMessage(MSG3)
4.  MSG1 = ReceiveMessage()
5.  DeleteMessage(MSG1)

CQS Architecture Recap

Cassandra Persistence Layer
•  Messages sharded across 100 rows per queue
•  Avoid wide rows (> 500K)
•  Minimize churn (Tombstones)
•  Distribute queue among Cassandra nodes

Redis Caching Layer
•  To meet latency requirements
•  Payload cache (kicks in after first miss, pre-load next 10k)
•  Improve FIFOness by storing Msg IDs in Redis List
•  Handle message visibility entirely in Redis (Hashtable)
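A rough sketch of how the two layers could divide the work, assuming nothing about the real CMB schema. Plain dicts and lists stand in for Cassandra rows, the Redis list and the Redis hashtable; the class name, the random shard choice and the `<queue>_<shard>:<seq>` id format are all hypothetical. Only the figure of 100 shard rows per queue comes from the slide.

```python
import random
import time

NUM_SHARD_ROWS = 100  # per the slide: messages sharded across 100 rows per queue


class TwoTierQueueSketch:
    """Dicts and lists standing in for Cassandra rows, a Redis list and a Redis hash."""

    def __init__(self, queue_id, visibility_timeout=30.0, clock=time.monotonic, rng=None):
        self.queue_id = queue_id
        self.rows = {}     # "Cassandra": row key -> {message id: body}
        self.id_list = []  # "Redis list": ids of visible messages, oldest first
        self.hidden = {}   # "Redis hash": message id -> time it becomes visible again
        self._vto = visibility_timeout
        self._clock = clock
        self._rng = rng or random.Random()
        self._seq = 0

    def send(self, body):
        # Spread writes over the shard rows so no single row grows too wide.
        row_key = f"{self.queue_id}_{self._rng.randrange(NUM_SHARD_ROWS)}"
        self._seq += 1
        msg_id = f"{row_key}:{self._seq}"  # hypothetical id format embedding the row key
        self.rows.setdefault(row_key, {})[msg_id] = body
        self.id_list.append(msg_id)
        return msg_id

    def receive(self):
        now = self._clock()
        # Messages whose visibility timeout has passed go back to the head of the list.
        expired = [m for m, until in self.hidden.items() if until <= now]
        for m in expired:
            del self.hidden[m]
        self.id_list[:0] = expired
        if not self.id_list:
            return None
        msg_id = self.id_list.pop(0)
        self.hidden[msg_id] = now + self._vto
        row_key = msg_id.rsplit(":", 1)[0]
        return msg_id, self.rows[row_key][msg_id]

    def delete(self, msg_id):
        self.hidden.pop(msg_id, None)
        if msg_id in self.id_list:
            self.id_list.remove(msg_id)
        row_key = msg_id.rsplit(":", 1)[0]
        self.rows.get(row_key, {}).pop(msg_id, None)
```

In this sketch ordering and visibility never touch the persistence rows, which mirrors the slide's point that visibility is handled entirely in Redis while Cassandra holds the payloads.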

CQS Key Cassandra Features

Persistence and failover
•  Cross-DC replication in combination with Local Quorum Reads/Writes (tunable consistency)

Millions of queues, spiky traffic patterns
•  Massive horizontal scalability

Message order (FIFOness) / future dated messages
•  Wide rows, composite column keys / TimeUUID and column sort order

Message retention period (expiration)
•  TTL

Fast lookup of static metadata (Queues, Users etc.)
•  Row Cache, Secondary Indexes
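How sorted column keys and TTL could combine to give message order, future dated messages and a retention period can be sketched with a sorted list. This is an assumption-laden illustration, not the CMB schema: the class `SortedRowSketch`, the `(deliver_at, sequence)` key and the rule that expiry counts from write time are chosen for the example.

```python
import bisect
import itertools


class SortedRowSketch:
    """One wide row whose columns stay sorted by (deliver_at, sequence)."""

    def __init__(self, retention_seconds):
        self._columns = []            # sorted (deliver_at, seq, written_at, body)
        self._seq = itertools.count() # tie-breaker so tuples never compare bodies
        self._retention = retention_seconds

    def write(self, body, now, delay_seconds=0.0):
        # A future dated message simply sorts later in the row.
        deliver_at = now + delay_seconds
        bisect.insort(self._columns, (deliver_at, next(self._seq), now, body))

    def readable(self, now):
        """Bodies that are due and not yet expired, in column order."""
        return [
            body
            for deliver_at, _, written_at, body in self._columns
            if deliver_at <= now and now - written_at < self._retention
        ]
```

Reading the row front to back yields messages in delivery order without any sorting at read time, which is the property the column sort order provides.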

CQS Scalability
•  Send(), Receive() and Delete() scale with Cassandra Ring, API Servers (stateless) and Redis Shards
•  Are constant time operations
•  Queues not sharded across Redis servers!

CQS Availability
•  Depends on availability of Cassandra
•  Service functions without Redis!

CQS DC Failover

AWS SNS API
“Simple Notification Service”
•  Topic based Publish/Subscribe Service
•  Supported protocols: HTTP/CQS/SQS
•  Few simple core APIs: CreateTopic() / DeleteTopic(), Subscribe() / Unsubscribe(), ConfirmSubscription(), Publish()
•  Do not trust message recipients (redelivery policy)

CNS Data Flow Example
•  Single operation: Publish message MSG1 to a topic T with four Subscribers S1, S2, S5, S6.
•  S1, S2 are HTTP endpoints
•  S5, S6 are CQS queues
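The fan-out in this example can be sketched as one publish call that delivers to each subscriber according to its protocol. The `Subscriber` class and `publish` function are hypothetical stand-ins; a real publisher would issue an HTTP POST or a CQS SendMessage where this sketch only records the action.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    name: str
    protocol: str  # "http" or "cqs"
    received: list = field(default_factory=list)


def publish(subscribers, message):
    """Deliver one message to every subscriber; return deliveries per protocol."""
    counts = {}
    for sub in subscribers:
        if sub.protocol == "http":
            sub.received.append(("POST", message))         # stand-in for an HTTP POST
        elif sub.protocol == "cqs":
            sub.received.append(("SendMessage", message))  # stand-in for a queue send
        else:
            raise ValueError(f"unsupported protocol: {sub.protocol}")
        counts[sub.protocol] = counts.get(sub.protocol, 0) + 1
    return counts


topic_t = [
    Subscriber("S1", "http"),
    Subscriber("S2", "http"),
    Subscriber("S5", "cqs"),
    Subscriber("S6", "cqs"),
]
delivered = publish(topic_t, "MSG1")
```

One Publish call therefore turns into four deliveries, two per protocol, which is why the throughput slides distinguish publishes per second from messages per second.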

CNS Architecture Recap
•  CQS Queue preserves messages when Publish Workers are down or overloaded
•  CQS Visibility Timeout takes care of guaranteed delivery
•  Retry policy improves guaranteed delivery for temporarily unavailable endpoints (http)
•  Publish Workers hardened for rogue endpoints (failing endpoints, slow endpoints, …)
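A retry policy for a temporarily unavailable HTTP endpoint might look like the sketch below. The function name, the linear backoff and the default of three attempts are assumptions for illustration; the slides do not describe the actual CNS policy.

```python
import time


def deliver_with_retries(send, message, max_attempts=3, backoff_seconds=1.0, sleep=time.sleep):
    """Call send(message) until it succeeds or attempts run out.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return attempt
        except Exception:
            if attempt == max_attempts:
                return None  # give up; the caller decides what happens next
            sleep(backoff_seconds * attempt)  # wait longer after each failure
```

Bounding the number of attempts keeps a failing or slow endpoint from tying up a worker indefinitely, which is one way to read "hardened for rogue endpoints".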

Differences SQS/SNS and CQS/CNS
Goal: Full API compatibility

Current state:
•  All APIs implemented, most parameters supported
•  Can use AWS Java SDK and others

Limitations:
•  AWS4 signatures not supported (V1 and V2 ok)
•  SMS endpoints not supported, limited email support

Enhancements:
•  Additional APIs for monitoring and management
•  Unlimited number of queues, topics and subscriptions
•  Adjustable message size and other parameters (MSG <= 64KB, LP <= 20 sec, DS <= 900 sec, RP, …)

CMB Ready for Production Use?
•  Code of CMB Core is stable
•  Extensive testing done (including throughput scalability testing)
•  In use at Comcast (Sports, DVR, …)

Testing Goals
•  Functional testing (unit tests, good code coverage)
•  Stress testing (simulate Redis outage, data center failover)
•  Endurance testing
•  Load testing: Verify linear horizontal scalability (CQS / CNS throughput scalability)

CQS Throughput Scalability
•  Throughput as a function of Cassandra Ring size
•  Increase load until throughput (msg/sec) reaches a maximum
•  Increase ring size and re-test
•  Ensure sufficient API and Redis capacity to support largest ring
•  Deployment: 10 API Servers, 5 Redis Shards, 4-16 Node Cassandra

CQS Throughput Scalability

# Load Gen  # API Servers  # Redis Shards  Ring Size  API / Sec  P99
5           10             5               4          2832       <= 100 ms
5           10             5               8          6072       <= 100 ms
5           10             5               12         9472       <= 100 ms
6           12             6               16         11667      <= 100 ms
8           15             7               20         13514      <= 100 ms
8           15             7               24         15365      <= 100 ms

[Chart: API/sec (axis 0 to 18000) versus Cassandra ring size (4, 8, 12, 16, 20, 24)]
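To judge how close the results in the table above come to linear scaling, throughput per Cassandra node can be computed for each ring size. The numbers are copied from the table; the helper name `per_node` is made up for this sketch.

```python
# (ring size, API calls per second) copied from the CQS throughput table
results = [(4, 2832), (8, 6072), (12, 9472), (16, 11667), (20, 13514), (24, 15365)]


def per_node(rows):
    """Return (ring size, API calls per second per node) for each row."""
    return [(ring, round(rate / ring, 1)) for ring, rate in rows]


for ring, rate in per_node(results):
    print(f"{ring:>2} nodes: {rate} API calls/sec per node")
```

Per-node throughput stays between roughly 640 and 790 calls/sec across all ring sizes, so total throughput grows close to proportionally with the ring, with some tapering at 20 and 24 nodes.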

CNS Throughput Scalability
•  Most important metric: End-to-end latency
•  Fixed number of subscribers, gradually increase #msg/sec published until system is “overwhelmed”
•  Increase number of Publish Workers and re-test
•  Deployment: 8 node Cassandra Ring, 2 API Servers, 2 Redis Shards, 3-6 Publish Workers
•  Test setup: Single topic with 100 HTTP subscribers, 10 min test duration

CNS Throughput Scalability: 3 Publish Workers

                              API                                     PROD                                    CONS                                    HTTP
#PUB/SEC  #MSG/SEC  AVG(LAT)  AVG(RT)   P95(RT)   AVG(CQS)  P95(CQS)  AVG(RT)   P95(RT)   AVG(CQS)  P95(CQS)  AVG(RT)   P95(RT)   AVG(CQS)  P95(CQS)  AVG(RT)   P95(CQS)
5         500       198       21        44        10        26        77        150       77        155       47        96        38        85        12        18
10        1000      177       15        30        8         16        68        119       68        118       39        70        31        59        12        17
20        2000      160       16        29        9         17        69        120       69        120       40        69        31        59        9         16
40        4000      209       18        37        10        22        78        138       78        139       41        74        32        62        11        18
80        8000      75656     28        61        14        27        237       1020      237       1020      143       790       131       770       14        21

CNS Throughput Scalability: 6 Publish Workers

                              API                                     PROD                                    CONS                                    HTTP
#PUB/SEC  #MSG/SEC  AVG(LAT)  AVG(RT)   P95(RT)   AVG(CQS)  P95(CQS)  AVG(RT)   P95(RT)   AVG(CQS)  P95(CQS)  AVG(RT)   P95(RT)   AVG(CQS)  P95(CQS)  AVG(RT)   P95(CQS)
5         500       247       37        117       19        66        141       290       139       290       77        200       57        160       18        34
10        1000      226       41        130       21        79        163       380       162       370       79        200       58        160       17        39
20        2000      199       37        118       20        74        133       280       133       280       68        180       50        150       11        21
40        4000      225       45        140       25        110       148       320       148       320       76        210       53        170       18        25
80        8000      267       48        126       25        80        149       300       149       300       77        180       58        150       22        38
160       16000     145135    76        180       41        120       228       460       228       460       115       280       97        250       28        70

CNS Throughput Scalability

[Chart: 3 workers versus 6 workers, plotted against throughput (msgs/sec) at 500, 1000, 2000, 4000 and 8000]

Use Case: X1 Sports App

Moving Forward
•  Follow SNS / SQS APIs
•  More load and stress testing
•  Ease of deployment and scale up
•  More in-house production deployments (currently isolated by application)
•  CQS as a Service

Thank You!

http://github.com/Comcast/cmb

http://groups.google.com/forum/#!forum/cmb-user-forum

bwolf@sv.comcast.com

BACKUP

CNS Endurance Test

•  Single topic with 5 HTTP subscribers, 65 msg/sec published
•  14 million messages published over 12 hours
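A quick check of the endurance numbers. The slide does not say how the 14 million figure was counted; the sketch below only shows that it matches the publish rate multiplied by the duration and by the number of subscribers, which suggests (as an assumption) that it counts deliveries, not Publish calls.

```python
publish_rate = 65    # messages published per second (from the slide)
duration_hours = 12  # test duration (from the slide)
subscribers = 5      # HTTP subscribers on the topic (from the slide)

published = publish_rate * duration_hours * 3600  # Publish calls over the whole test
deliveries = published * subscribers              # one delivery per subscriber

print(f"published:  {published:,}")
print(f"deliveries: {deliveries:,}")
```

Under that reading, about 2.8 million publishes fan out to about 14 million deliveries.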

Use Case: EAS
