MongoDB as a Message Queue
Luke Gotszling
Aol / About.me
Silicon Valley MongoDB User Group
Big Data Week
Palo Alto, CA
April 25, 2012
Prior AMQP Usage
• 3-node RabbitMQ cluster on v1.8; opted to forgo disk persistence for better performance
• Hard to diagnose the cause of failures at scale
At About.me
• All asynchronous and periodic tasks
• Short-lived messages
• No journaling
• Sharded cluster on v2.0.4 (shard key = queue name)
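A minimal sketch of how a queue collection could be sharded on the queue name, as described above. The database name `app` is an assumption, and `messages` is the collection name used on the code slides. The function only builds the admin command documents, so it runs without a server; with pymongo and a running `mongos`, each document would be passed to `client.admin.command(...)`.

```python
def shard_queue_commands(db_name: str, collection: str = "messages") -> list:
    """Build admin commands that shard a queue collection on its queue name.

    db_name and collection are illustrative placeholders, not fixed names.
    """
    namespace = f"{db_name}.{collection}"
    return [
        # Allow collections in this database to be sharded.
        {"enableSharding": db_name},
        # Shard key = queue name: all messages of one queue share a key value,
        # so they stay in the same chunk (and therefore on the same shard).
        {"shardCollection": namespace, "key": {"queue": 1}},
    ]


for cmd in shard_queue_commands("app"):
    print(cmd)
    # Against a live mongos with pymongo installed, one would run instead:
    # client.admin.command(cmd)
```

Because every message of a given queue lives on one shard, this key spreads different queues across shards rather than spreading a single busy queue.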
Benefits
• Async operations
• Per-message (document) atomicity
• Batch processes
• Periodic processes
• Durability / ability to shard
• Operational familiarity
AMQP?
               Direct   Topic                Fanout
AMQP           Push     Yes                  Yes
Mongo Queue    Poll     Regular expression   Sort of*
* Options include passing a message along with an incrementing key, or multiple declarations. Fanout support was added to Kombu in v2.1; it reduces performance for non-fanout operations because of the additional queries.
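The table's "Regular expression" topic routing and the footnote's "incrementing key" option for fanout can be illustrated with a small in-memory model. This is a sketch under assumptions, not Kombu's implementation: `topic_filter`, `BroadcastLog`, `FanoutConsumer` and the `seq` field are hypothetical names. Fanout messages are never removed; each consumer remembers the last `seq` it saw and asks only for newer ones, which in MongoDB would be a query like `{"queue": q, "seq": {"$gt": last_seen}}` sorted on `seq`. In a real deployment the counter would have to be generated atomically across processes (for example with `$inc` on a counter document), and old messages would need cleaning up.

```python
import re
from itertools import count


def topic_filter(prefix: str) -> dict:
    """Hypothetical helper: MongoDB query matching every queue name starting with prefix."""
    return {"queue": {"$regex": "^" + re.escape(prefix)}}


class BroadcastLog:
    """Hypothetical in-memory stand-in for a collection of fanout messages."""

    def __init__(self):
        self._seq = count(1)  # the incrementing key (process-local here, not atomic across processes)
        self._docs = []       # messages are kept, not removed, so every consumer can read them

    def publish(self, queue: str, payload) -> int:
        seq = next(self._seq)
        self._docs.append({"queue": queue, "seq": seq, "payload": payload})
        return seq

    def newer_than(self, queue: str, last_seen: int) -> list:
        # Mirrors: find({"queue": queue, "seq": {"$gt": last_seen}}).sort("seq", 1)
        hits = [d for d in self._docs if d["queue"] == queue and d["seq"] > last_seen]
        return sorted(hits, key=lambda d: d["seq"])


class FanoutConsumer:
    """Each consumer tracks its own position, so all consumers receive every message."""

    def __init__(self, log: BroadcastLog, queue: str):
        self.log, self.queue, self.last_seen = log, queue, 0

    def poll(self) -> list:
        docs = self.log.newer_than(self.queue, self.last_seen)
        if docs:
            self.last_seen = docs[-1]["seq"]
        return [d["payload"] for d in docs]


log = BroadcastLog()
a, b = FanoutConsumer(log, "events"), FanoutConsumer(log, "events")
log.publish("events", "hello")
print(a.poll(), b.poll())  # ['hello'] ['hello']: both consumers receive it
log.publish("events", "again")
print(a.poll())            # ['again']: a only sees what is new to it
```

The extra per-consumer query is the kind of overhead the footnote attributes to fanout support.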
To cap or not to cap
• Capped collections[1]
  • Better performance, but limited to a single node[2]
  • FIFO
• Uncapped collections (used for the rest of this presentation)
  • Can shard; lower performance per node
  • FIFO-ish[3]; custom ordering available
[1] http://blog.boxedice.com/2011/04/13/queueing-mongodb-using-mongodb/
http://blog.boxedice.com/2011/09/28/replacing-rabbitmq-with-mongodb/
[2] SERVER-211, SERVER-2654
[3] Only down to 1-second granularity
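Why uncapped queues are only "FIFO-ish" can be seen from the ObjectId layout: its first 4 bytes are a big-endian count of seconds since the Unix epoch, and the remaining 8 bytes were, in the format of that era, a machine id, process id and counter (newer drivers use a random value plus a counter). Since `_id` is typically generated by the client driver, sorting on `_id` orders messages from different clients by second (assuming their clocks agree) and then by those other bytes. The two ids below are hypothetical values built for illustration.

```python
from datetime import datetime, timezone


def objectid_seconds(oid_hex: str) -> int:
    """Seconds since the Unix epoch stored in the first 4 bytes (big-endian) of an ObjectId."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    return int.from_bytes(raw[:4], "big")


def objectid_time(oid_hex: str) -> datetime:
    return datetime.fromtimestamp(objectid_seconds(oid_hex), tz=timezone.utc)


# Two hypothetical ids created in the same second by different client processes.
# The message enqueued FIRST came from a process whose non-timestamp bytes sort higher.
first_enqueued = "4f98a1c0" + "ffffff" + "0002" + "000001"
second_enqueued = "4f98a1c0" + "000000" + "0001" + "000001"

# Sorting on _id (as the consumer does) compares the timestamp first, then the remaining
# bytes, so within one second the order follows those bytes rather than insertion time.
consume_order = sorted([first_enqueued, second_enqueued])
print(consume_order[0] == second_enqueued)                              # True
print(objectid_time(first_enqueued) == objectid_time(second_enqueued))  # True
```

Equal-length lowercase hex strings sort in the same order as the underlying bytes, which is why plain `sorted` mirrors the `_id` ordering here.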
Code (mongo)
• Create:
  db.messages.insert({ queue: "email", payload: serialized_data })
• Consume:
  db.messages.findAndModify({ query: { queue: "email" }, sort: { _id: 1 }, remove: true })
• Index:
  db.messages.ensureIndex({ queue: 1 })
  db.messages.ensureIndex({ queue: 1, _id: 1 })
Code (Python)
• Create:
  self.client.insert({"payload": serialize(message), "queue": queue})
• Consume:
  self.client.database.command("findandmodify", "messages", query={"queue": queue}, sort={"_id": pymongo.ASCENDING}, remove=True)
• Index:
  col.ensure_index([("queue", 1)])
  col.ensure_index([("queue", 1), ("_id", 1)])
http://packages.python.org/kombu/
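The Python snippets above use the pymongo 2.x-era API (`ensure_index` and the raw `findandmodify` command). A hedged sketch of the same create/consume/index operations with the pymongo 3+ collection methods `insert_one`, `find_one_and_delete` and `create_index` follows. `MongoQueue` is a hypothetical wrapper, and JSON stands in for whatever `serialize` was; the collection is passed in so the class can be exercised without a server.

```python
import json


class MongoQueue:
    """Hypothetical queue wrapper over a pymongo (3.x+) Collection such as db.messages."""

    def __init__(self, collection):
        self.col = collection
        # The compound index serves both the {queue} filter and the sort on _id.
        self.col.create_index([("queue", 1), ("_id", 1)])

    def put(self, queue: str, message) -> None:
        # Create: the driver adds an ObjectId _id if the document lacks one.
        self.col.insert_one({"queue": queue, "payload": json.dumps(message)})

    def get(self, queue: str):
        # Consume: atomically remove and return the oldest message, or None when empty.
        doc = self.col.find_one_and_delete({"queue": queue}, sort=[("_id", 1)])
        return None if doc is None else json.loads(doc["payload"])
```

With a server this might be used as `MongoQueue(pymongo.MongoClient().app.messages)`, where `app` is a placeholder database name. As with `remove: true`, a message is deleted the moment it is fetched, so a consumer that crashes mid-task loses it.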
[Figure: Celery Task Creation Benchmarks (Single-Node); celery 2.4.5 / kombu 2.0 / pymongo 2.1 / amqplib 1.0.2 / eventlet 0.9.16. Y axis: tasks created per second; x axis: concurrency (processes, 1 to 5). Series: RabbitMQ v2.7.1, MongoDB 2.0.4 --nojournal, MongoDB 2.0.4 --journal.]
[Figure: Celery Task Consumption Benchmarks (Single-Node); celery 2.4.5 / kombu 2.0 / pymongo 2.1 / amqplib 1.0.2 / eventlet 0.9.16. Y axis: tasks consumed per second; x axis: concurrency (eventlet, 1 to 25). Series: RabbitMQ v2.7.1, MongoDB 2.0.4 --nojournal, MongoDB 2.0.4 --journal.]
Pros
• Familiar technology
• Sharding
• Durability
• Lower operational overhead
• Advanced querying (map/reduce, etc.)
Cons
• Not AMQP
• Need to poll
• Performance depends on polling frequency and concurrency
• Message consumption is a locking operation
• Fewer libraries available[1]
[1] Python has kombu; versions before v2.1 lack fanout support but have better async task performance.
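Because consumers must poll, both throughput and idle load depend on how often they ask. A common compromise, sketched here as an assumption rather than anything Kombu is known to do, is to poll again immediately while messages are flowing and back off exponentially (up to a cap) while the queue is empty. `fetch` is any callable returning a message or `None`, such as a wrapper around the `findAndModify` consume call; `poll_loop` and its interval defaults are hypothetical.

```python
import time
from typing import Callable, Optional


def poll_loop(fetch: Callable[[], Optional[object]],
              handle: Callable[[object], None],
              min_wait: float = 0.05,
              max_wait: float = 2.0,
              max_polls: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll fetch(); reset the delay after a hit, double it (up to max_wait) after a miss."""
    wait = min_wait
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        message = fetch()
        if message is not None:
            handle(message)
            wait = min_wait  # busy queue: poll again right away
            continue
        sleep(wait)          # empty queue: wait before the next poll
        wait = min(wait * 2, max_wait)
```

`max_polls` and the injectable `sleep` exist only to make the loop testable; a long-running worker would leave `max_polls=None`. A larger `max_wait` lowers idle load on the database at the cost of higher latency for the first message after a quiet period.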
Don’t Forget To Shard Your Collections!