+ All Categories
Home > Technology > C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Date post: 05-Dec-2014
Category:
Upload: planet-cassandra
View: 5,499 times
Download: 0 times
Share this document with a friend
Description:
The Comcast Silicon Valley Innovation Center has developed a general purpose message bus for the cloud. The service is API compatible with Amazon's SQS/SNS and is built on Cassandra and Redis with the goal of linear horizontal scalability. This presentation offers and in-depth look at the architecture of the system and how they employ Cassandra as a central component to meet key requirements. Latest feature enhancements and performance data will also be covered.
53
CMB – A Message Bus for the Cloud
Transcript
Page 1: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CMB – A Message Bus for the Cloud

Page 2: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CMB – A Message Bus for the Cloud

CQS – Queuing Service CNS – Topic based Pub Sub Service

Page 3: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Why did we build our own? •  General purpose message bus to replace project driven one-off

solutions •  Smooth data center failover, maybe even “active-active” queues •  Must scale to millions of queues and 1000s of messages/sec (for

example 1 queue per STB) •  Tight latency requirements (“10ms response time 95th pct”) •  Evaluated other options to arrive at AWS SQS/SNS

Page 4: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

AWS SQS Primer “Simple Queuing Service” •  Focus on guaranteed delivery •  Best effort on orderly delivery, duplicates •  Few simple core APIs:

SendMessage() ReceiveMessage() DeleteMessage() •  Do not trust message recipients

Page 5: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Advantages of adopting an API If you do it on your own: •  API design typically biased towards first use case •  Almost guaranteed: You won’t get it right the first time (iterations) •  Difficult for new users to adopt: Documentation, tools, community,…

Page 6: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Why did we build our own? AWS  SQS  

Guaranteed  Delivery   +  Simple,  Robust  API   +  Scalability   +  Ac;ve-­‐Ac;ve   ?  DC  Failover   ?  Latency  &  Throughput   ?  Limita;ons  (Msg  Size,  #  Ar;facts,  …)   ?  

Page 7: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

“Build a horizontally scalable queuing service on top of Cassandra (and Redis) which is API compatible with AWS SQS / SNS API”

Page 8: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CQS over Cassandra and Redis Cassandra •  Cross-DC persistence and replication •  Proven horizontal scalability Redis •  Meet latency requirements •  Help with best effort ordering •  Handle Visibility Timeout (VTO)

Page 9: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Cassandra Data Modeling How to represent queued messages in Cassandra? •  Single Column Queue •  Single Row Queue •  Multi-Row Queue

Page 10: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Single Column Queue

Page 11: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Single Row Queue

Page 12: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Multi-Row Queue

Page 13: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CQS Data Flow Example 1.  SendMessage(MSG1) 2.  SendMessage(MSG2) 3.  SendMessage(MSG3) 4.  MSG1 = ReceiveMessage() 5.  DeleteMessage(MSG1)

Page 14: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 15: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 16: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 17: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 18: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 19: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 20: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CQS Architecture Recap Cassandra Persistence Layer •  Messages sharded across 100 rows per queue •  Avoid wide rows (> 500K) •  Minimize churn (Tombstones) •  Distribute queue among Cassandra nodes Redis Caching Layer •  To meet latency requirements •  Payload cache (kicks in after first miss, pre-load next 10k) •  Improve FIFOness by storing Msg IDs in Redis List •  Handle message visibility entirely in Redis (Hashtable)

Page 21: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CQS Key Cassandra Features Persistence and failover •  Cross-DC replication in combination with Local Quorum Reads/Writes

(tunable consistency) Millions of queues, spiky traffic patterns •  Massive horizontal scalability Message order (FIFOness) / future dated messages •  Wide rows, composite column keys / TimeUUID and column sort order Message retention period (expiration) •  TTL Fast lookup of static metadata (Queues, Users etc.) •  Row Cache, Secondary Indexes

Page 22: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CQS Scalability •  Send(), Receive() and Delete() scale with Cassandra

Ring, API Servers (stateless) and Redis Shards •  Are constant time operations •  Queues not sharded across Redis servers!

Page 23: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CQS Availability •  Depends on availability of Cassandra •  Service functions without Redis!

Page 24: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CQS DC Failover

Page 25: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

AWS SNS API “Simple Notification Service”

•  Topic based Publish/Subscribe Service •  Supported protocols: HTTP/CQS/SQS •  Few simple core APIs

CreateTopic() / DeleteTopic() Subscribe() / Unsubscribe() ConfirmSubscription() Publish()

•  Do not trust message recipients (redelivery policy)

Page 26: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CNS Data Flow Example •  Single operation: Publish message MSG1 to a topic

T with four Subscribers S1, S2, S5, S6. •  S1, S2 are HTTP endpoints •  S5, S6 are CQS queues

Page 27: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 28: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 29: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 30: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 31: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 32: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 33: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CNS Architecture Recap •  CQS Queue preserves messages when Publish

Workers are down or overloaded •  CQS Visibility Timeout takes care of guaranteed

delivery •  Retry policy improves guaranteed delivery for

temporarily unavailable endpoints (http) •  Publish Workers hardened for rogue endpoints (failing

endpoints, slow endpoints, …)

Page 34: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Differences SQS/SNS and CQS/CNS Goal: Full API compatibility Current state: •  All APIs implemented, most parameters supported •  Can use AWS Java SDK and others Limitations: •  AWS4 signatures not supported (V1 and V2 ok) •  SMS endpoints not supported, limited email support Enhancements: •  Additional APIs for monitoring and management •  Unlimited number of queues, topics and subscriptions •  Adjustable message size and other parameters (MSG <= 64KB, LP <= 20 sec, DS <= 900 sec, RP, …)

Page 35: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CMB Ready for Production Use? •  Code of CMB Core is stable •  Extensive testing done (including throughput

scalability testing) •  In use at Comcast (Sports, DVR, …)

Page 36: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Testing Goals •  Functional testing (unit tests, good code coverage) •  Stress testing (simulate Redis outage, data center failover) •  Endurance testing •  Load testing: Verify linear horizontal scalability (CQS / CNS

throughput scalability)

Page 37: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CQS Throughput Scalability •  Throughput as a function of Cassandra Ring size •  Increase load until throughput (msg/sec) reaches a maximum •  Increase ring size and re-test •  Ensure sufficient API and Redis capacity to support largest ring •  Deployment: 10 API Servers, 5 Redis Shards, 4-16 Node Cassandra

Ring

Page 38: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 39: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
Page 40: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CQS Throughput Scalability #  Load  Gen   #  API  Servers   #  Redis  Shards   Ring  Size   API  /  Sec   P99  

5   10   5   4   2832   <=  100  ms  

5   10   5   8   6072   <=  100  ms  

5   10   5   12   9472   <=  100  ms  

6   12   6   16   11667   <=  100  ms  

8   15   7   20   13514   <=  100  ms  

8   15   7   24   15365   <=  100  ms  

Page 41: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

0  2000  4000  6000  8000  10000  12000  14000  16000  18000  

4   8   12   16   20   24  

API/sec  

Ring  Size  

Page 42: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CNS Throughput Scalability •  Most important metric: End-to-end latency •  Fixed number of subscribers, gradually increase #msg/sec published

until system is “overwhelmed” •  Increase number of Publish Workers and re-test •  Deployment: 8 node Cassandra Ring, 2 API Servers, 2 Redis Shards,

3-6 Publish Workers •  Test setup: Single topic with 100 HTTP subscribers, 10 min test

duration

Page 43: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CNS Throughput Scalability 3 Publish Workers

#PUB/SEC  

#MSG/SEC  

AVG(LAT)    

API    AVG(RT)    

API    P95(RT)  

 

API  AVG(CQ

S)    

API  P95(CQ

S)    

PROD  AVG(RT

)    

PROD  P95(RT)  

 

PROD  AVG(CQ

S)    

PROD  P95(CQS)  

 

CONS  AVG(RT)  

 

CONS  P95(RT)  

 

CONS  AVG(CQS)  

 

CONS  P95(CQS

)    

HTTP  AVG(RT)  

 

HTTP  P95(CQS

)    

5   500   198   21   44   10   26   77   150   77   155   47   96   38   85   12   18  

10   1000   177   15   30   8   16   68   119   68   118   39   70   31   59   12   17  

20   2000   160   16   29   9   17   69   120   69   120   40   69   31   59   9   16  

40   4000   209   18   37   10   22   78   138   78   139   41   74   32   62   11   18  

80   8000   75656   28   61   14   27   237   1020   237   1020   143   790   131   770   14   21  

Page 44: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CNS Throughput Scalability 6 Publish Workers

#PUB/SEC   #MSG/SEC   AVG(LAT)  

 

API    AVG(RT)    

API    P95(RT)  

 

API  AVG(CQ

S)    

API  P95(CQ

S)    

PROD  AVG(RT

)    

PROD  P95(RT)  

 

PROD  AVG(CQ

S)    

PROD  P95(CQS)  

 

CONS  AVG(RT)  

 

CONS  P95(RT)  

 

CONS  AVG(CQS)  

 

CONS  P95(CQS

)    

HTTP  AVG(RT)  

 

HTTP  P95(CQS

)    

5   500   247   37   117   19   66   141   290   139   290   77   200   57   160   18   34  

10   1000   226   41   130   21   79   163   380   162   370   79   200   58   160   17   39  

20   2000   199   37   118   20   74   133   280   133   280   68   180   50   150   11   21  

40   4000   225   45   140   25   110   148   320   148   320   76   210   53   170   18   25  

80   8000   267   48   126   25   80   149   300   149   300   77   180   58   150   22   38  

160   16000   145135   76   180   41   120   228   460   228   460   115   280   97   250   28   70  

Page 45: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

0

200

400

600

800

1000

1200

500 1000 2000 4000 8000

Lat

ency

(ms)

Throughput (Msgs/Sec)

6 workers

3 workers

CNS  Throughput  Scalability  

Page 46: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Use Case: X1 Sports App

Page 47: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Use Case: X1 Sports App

Page 48: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Use Case: X1 Sports App

Page 49: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Moving Forward •  Follow SNS / SQS APIs •  More load and stress testing •  Ease of deployment and scale up •  More in-house production deployments (currently isolated

by application) •  CQS as a Service

Page 50: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Thank You!

http://github.com/Comcast/cmb

http://groups.google.com/forum/#!forum/cmb-user-forum

[email protected]

Page 51: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

BACKUP

Page 52: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

CNS Endurance Test

Single  topic  with  5  HTTP  subscribers,  65  msg/sec  published  14  mio  messages  published  over  12hrs  

Page 53: C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf

Use Case: EAS


Recommended