MongoDB as A Message Queue

Luke Gotszling

Aol / About.me

Silicon Valley MongoDB User Group
Big Data Week
Palo Alto, CA
April 25, 2012

Prior AMQP Usage

• 3-node RabbitMQ cluster on v1.8, opted to forego disk persistence for better performance

• Hard to diagnose cause of failure at scale

At About.me

• All asynchronous and periodic tasks
• Short-lived messages
• No journalling
• Sharded cluster on v2.0.4 (shard key = queue name)

Benefits

• Async operations
• Per message (document) atomicity
• Batch processes
• Periodic processes
• Durability / ability to shard
• Operational familiarity

AMQP?

              Direct   Topic                Fanout
AMQP          Push     Yes                  Yes
Mongo Queue   Poll     Regular expression   Sort of*

* Options include passing a message along with an incrementing key or multiple declarations. Added to Kombu in v2.1 -- reduces performance for non-fanout operations due to additional queries
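The "Regular expression" entry for topic routing can be illustrated with a small sketch. It assumes AMQP-style binding keys made of dot-separated words, where `*` stands for exactly one word and `#` for any remaining text. The function name `topic_to_regex` is hypothetical, and the translation is a simplification for illustration, not Kombu's actual implementation.

```python
import re


def topic_to_regex(binding_key):
    """Translate a dot-separated binding key into a compiled regex.

    Simplified rules (an assumption of this sketch):
      *  matches exactly one word (no dots)
      #  matches any remaining text, dots included
    """
    parts = []
    for word in binding_key.split("."):
        if word == "*":
            parts.append(r"[^.]+")
        elif word == "#":
            parts.append(r".*")
        else:
            parts.append(re.escape(word))  # literal word
    return re.compile("^" + r"\.".join(parts) + "$")


# Made-up routing keys, for illustration only.
single_word = topic_to_regex("email.*")
any_depth = topic_to_regex("email.#")

one_word_matches = bool(single_word.match("email.welcome"))
two_words_match = bool(single_word.match("email.welcome.retry"))
deep_match = bool(any_depth.match("email.welcome.retry"))
```

A pattern built this way could, in principle, be passed to a MongoDB `$regex` query on the queue or routing-key field; whether that matches how a given library stores its bindings is an assumption to verify.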

To cap or not to cap
• Capped collections[1]
    • Better performance but limited to single node[2]
    • FIFO
• Uncapped collections -- rest of this presentation
    • Can shard, lower performance per-node
    • FIFO-ish[3], custom ordering available

[1] http://blog.boxedice.com/2011/04/13/queueing-mongodb-using-mongodb/
    http://blog.boxedice.com/2011/09/28/replacing-rabbitmq-with-mongodb/
[2] SERVER-211, SERVER-2654 (https://jira.mongodb.org/browse/SERVER-211)
[3] Only down to 1 second granularity
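The 1 second granularity in footnote [3] presumably refers to the default ObjectId `_id`, whose first 4 bytes hold the creation time in whole seconds. A minimal sketch of why sorting on `_id` is only approximately FIFO follows; the function name `objectid_seconds` and both ID values are made up for illustration.

```python
def objectid_seconds(oid_hex):
    """Return the creation time (seconds since the Unix epoch) stored in
    the first 4 bytes of an ObjectId given as a 24-character hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    return int(oid_hex[:8], 16)


# Two hypothetical ObjectIds created within the same second by different
# clients: the timestamp prefix is equal, the remaining bytes differ.
first_created = "4f97a1c0" + "aaaaaaaaaa" + "000001"
second_created = "4f97a1c0" + "0000000000" + "000002"

same_second = objectid_seconds(first_created) == objectid_seconds(second_created)

# Sorting on _id compares the full value, so within one second the order
# is decided by the non-time bytes, not by the true creation order.
sorted_ids = sorted([first_created, second_created])
```

Here the message created second sorts first, which is the kind of reordering "FIFO-ish" allows for. A custom ordering field would be needed where strict ordering matters.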




Code (mongo)
• Create:
    db.messages.insert({ queue: "email", payload: serialized_data })
• Consume:
    db.messages.findAndModify({ query: { queue: "email" }, sort: { _id: 1 }, remove: true })
• Index:
    db.messages.ensureIndex({ queue: 1 })
    db.messages.ensureIndex({ queue: 1, _id: 1 })

Code (Python)
• Create:
    self.client.insert({"payload": serialize(message), "queue": queue})
• Consume:
    self.client.database.command("findandmodify", "messages", query={"queue": queue}, sort={"_id": pymongo.ASCENDING}, remove=True)
• Index:
    col.ensure_index([("queue", 1)])
    col.ensure_index([("queue", 1), ("_id", 1)])

http://packages.python.org/kombu/
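The snippets above use the calls available at the time of the talk. As a rough sketch, the same create/consume pattern can be written against the `insert_one` and `find_one_and_delete` collection methods found in later PyMongo releases (3.0 and up). The names `publish`, `consume`, and `InMemoryCollection` are hypothetical; the in-memory class only mimics the two methods used here so that the example runs without a server.

```python
import itertools


def publish(collection, queue, payload):
    """Create: enqueue one message as one document."""
    collection.insert_one({"queue": queue, "payload": payload})


def consume(collection, queue):
    """Consume: remove and return the oldest payload in the queue, or None."""
    doc = collection.find_one_and_delete({"queue": queue}, sort=[("_id", 1)])
    return None if doc is None else doc["payload"]


class InMemoryCollection:
    """Hypothetical stand-in that mimics the two collection methods above."""

    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)  # integer _id instead of an ObjectId

    def insert_one(self, doc):
        self._docs.append(dict(doc, _id=next(self._ids)))

    def find_one_and_delete(self, filter, sort=None):
        # Equality matching only; real query operators are not supported.
        matches = [d for d in self._docs
                   if all(d.get(k) == v for k, v in filter.items())]
        if not matches:
            return None
        if sort:
            # Only the first (key, direction) pair is honoured in this sketch.
            key, direction = sort[0]
            matches.sort(key=lambda d: d[key], reverse=direction < 0)
        self._docs.remove(matches[0])
        return matches[0]


messages = InMemoryCollection()
publish(messages, "email", "first")
publish(messages, "sms", "other queue")
publish(messages, "email", "second")
received = [consume(messages, "email") for _ in range(3)]
```

Against a real deployment, `messages` would instead be a PyMongo collection object, and the two indexes listed above would still be needed for the query and sort to stay cheap.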



Celery Task Creation Benchmarks (Single-Node)

celery 2.4.5 / kombu 2.0 / pymongo 2.1 / amqplib 1.0.2 / eventlet 0.9.16

[Chart: tasks created per second (y-axis, 0 to 5600) against concurrency in processes (x-axis, 1 to 5). Series: RabbitMQ v2.7.1; MongoDB (2.0.4) --nojournal; MongoDB (2.0.4) --journal]

Celery Task Consumption Benchmarks (Single-Node)

celery 2.4.5 / kombu 2.0 / pymongo 2.1 / amqplib 1.0.2 / eventlet 0.9.16

[Chart: tasks consumed per second (y-axis, 0 to 2000) against eventlet concurrency (x-axis, 1 to 25). Series: RabbitMQ v2.7.1; MongoDB (2.0.4) --nojournal; MongoDB (2.0.4) --journal]

Pros
• Familiar technology
• Sharding
• Durability
• Lower operational overhead
• Advanced querying (map/reduce etc...)

Cons
• Not AMQP
• Need to poll
• Performance depends on polling frequency and concurrency
• Message consumption is a locking operation
• Fewer libraries available[1]

[1] Python has kombu, < v2.1 no fanout support but better async task performance
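Since consumers have to poll, the trade-off between latency and load comes down to how long a worker waits after finding the queue empty. One common approach, sketched here as an assumption rather than as what Celery or Kombu does, is exponential backoff with a cap. The names `poll`, `fetch`, and `handle` are hypothetical; `fetch` stands in for whatever performs the consume query and returns `None` when the queue is empty.

```python
import time


def poll(fetch, handle, min_delay=0.01, max_delay=1.0,
         max_empty_polls=None, sleep=time.sleep):
    """Call fetch() repeatedly, passing each message to handle().

    After an empty poll, sleep and double the delay (up to max_delay);
    after a message, reset the delay to min_delay. Stops once
    max_empty_polls consecutive empty polls occur (never, if None).
    Returns the number of messages handled.
    """
    delay = min_delay
    empty_polls = 0
    handled = 0
    while max_empty_polls is None or empty_polls < max_empty_polls:
        message = fetch()
        if message is None:
            empty_polls += 1
            sleep(delay)
            delay = min(delay * 2, max_delay)
        else:
            empty_polls = 0
            delay = min_delay
            handle(message)
            handled += 1
    return handled


# Demonstration with an in-memory list and a recorded (not real) sleep.
pending = ["a", "b"]
seen = []
delays = []
handled_count = poll(
    fetch=lambda: pending.pop(0) if pending else None,
    handle=seen.append,
    min_delay=0.25,
    max_delay=1.0,
    max_empty_polls=4,
    sleep=delays.append,
)
```

A short `min_delay` keeps latency low while messages are flowing, and the cap bounds how many idle queries each worker sends to the database.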

Don’t Forget To Shard Your Collections!

Questions?

about.me/luke

@lmgtwit