Post on 29-Nov-2014
DESIGN CONSIDERATIONS
Expect failure and design accordingly (process crashes, machine reboots, network partitions)
Break work into small, bite-size tasks
Idempotency: ensure nothing bad will happen if your job runs multiple times
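To make the idempotency point concrete, here is a minimal sketch. The `WelcomeEmailJob` class and its in-memory `Set` are hypothetical stand-ins: a real job would keep the "already done" record somewhere durable, such as the application database.

```ruby
require "set"

# Hypothetical job: delivers a welcome email at most once per user,
# no matter how many times the queue hands it the same work.
class WelcomeEmailJob
  attr_reader :deliveries

  def initialize
    @sent_to = Set.new   # stand-in for a durable "already done" record
    @deliveries = []     # stand-in for the real side effect
  end

  def perform(user_id)
    # Set#add? returns nil when the element is already present,
    # so a repeated run skips the side effect.
    return :skipped unless @sent_to.add?(user_id)

    @deliveries << user_id
    :sent
  end
end

job = WelcomeEmailJob.new
first = job.perform(42)    # does the work
second = job.perform(42)   # same job delivered again: no second email
```

Running the job twice leaves a single delivery, which is what lets a queue retry a job safely after a crash.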
SINGLE MACHINE
Distribute work to multiple worker threads or forked worker processes
Can easily parallelize work, but jobs go away if the process restarts
Cannot distribute work to multiple machines this way
IPC (Inter-Process Communication) is difficult to do right
Big no-no for web apps (you want to offload work to a separate machine)
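The worker-thread variant can be sketched with Ruby's built-in thread-safe `Queue`. The squaring "job" and the worker count of three are placeholders chosen for illustration.

```ruby
# Thread-safe queues from Ruby's core library.
jobs = Queue.new
results = Queue.new

# Start three worker threads that pull jobs until the queue is closed.
workers = 3.times.map do
  Thread.new do
    # Queue#pop returns nil once the queue is closed and empty,
    # which ends the loop.
    while (n = jobs.pop)
      results << n * n   # placeholder for real work
    end
  end
end

(1..10).each { |n| jobs << n }
jobs.close               # no more jobs: lets the workers finish
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
squares.sort!            # completion order is not guaranteed
```

Everything here lives in process memory, so any job still sitting in `jobs` is gone if the process restarts.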
MULTIPLE MACHINES
Distribute work to workers on other machines directly over the network
Ruby’s DRb can distribute work, but is unstable under high load
A dedicated messaging system can be used to distribute work reliably
Jobs are (usually) not persistent so can be lost if something crashes
PERSISTENT QUEUE
Workers pull jobs from a persistent backend queue
Suitable when many jobs need to be queued up and worked over time
Jobs can still be lost if workers crash or database hiccups
“Reliable” queueing can recover jobs if workers crash
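One common way to get "reliable" queueing is to reserve a job rather than remove it, and to put it back if the worker never acknowledges it. The sketch below is an in-memory model of that idea, not the API of any particular backend. `ReliableQueue`, its method names, and the 30-second timeout are all assumptions, and times are passed in as plain numbers so the example needs no sleeping.

```ruby
# Hypothetical in-memory stand-in for a persistent backend.
# Job names are assumed to be unique.
class ReliableQueue
  def initialize(timeout:)
    @timeout = timeout
    @ready = []       # jobs waiting for a worker
    @reserved = {}    # job => time it was handed to a worker
  end

  def push(job)
    @ready << job
  end

  # Hand a job to a worker, but remember it until it is acknowledged.
  def reserve(now)
    job = @ready.shift
    @reserved[job] = now if job
    job
  end

  # The worker finished: forget the job for good.
  def ack(job)
    @reserved.delete(job)
  end

  # Put jobs whose reservation has expired back on the ready list.
  def recover(now)
    expired = @reserved.select { |_, at| now - at >= @timeout }.keys
    expired.each do |job|
      @reserved.delete(job)
      @ready << job
    end
    expired
  end

  def ready_size
    @ready.size
  end
end

queue = ReliableQueue.new(timeout: 30)
queue.push("resize-image-1")
queue.push("resize-image-2")

done = queue.reserve(0)       # worker A takes the first job
crashed = queue.reserve(0)    # worker B takes the second job
queue.ack(done)               # A finishes; B crashes and never acks
recovered = queue.recover(31) # B's job is past its timeout
retried = queue.reserve(31)   # another worker picks it up again
```

A recovered job may already have been partly or fully run by the crashed worker, so this approach gives at-least-once execution and depends on jobs being idempotent.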
STATUS
Report back to the application on the job’s completion percentage and whether it succeeded or failed.
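A status record of this kind might look like the following sketch. `JobStatus` and `run_with_status` are hypothetical names. In practice the status would be written to a store the application can read, such as the queue backend itself.

```ruby
# Hypothetical status record a job updates as it runs.
class JobStatus
  attr_reader :state, :percent, :error

  def initialize
    @state = :queued
    @percent = 0
    @error = nil
  end

  def progress(done, total)
    @state = :working
    @percent = (done * 100) / total
  end

  def complete
    @state = :succeeded
    @percent = 100
  end

  def mark_failed(exception)
    @state = :failed
    @error = exception.message
  end
end

# Runs the block once per item, recording progress and the final outcome.
def run_with_status(status, items)
  items.each_with_index do |item, index|
    yield item
    status.progress(index + 1, items.size)
  end
  status.complete
rescue StandardError => e
  status.mark_failed(e)
end

ok = JobStatus.new
run_with_status(ok, [1, 2, 3, 4]) { |n| n * 2 }

bad = JobStatus.new
run_with_status(bad, [1, 2, 3, 4]) { |n| raise "boom" if n == 3 }
```

The failed run keeps the last percentage it reached, so the application can show how far the job got before it died.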
PRIORITY
If your queue fills up, important jobs might be waiting in the back of the queue. A priority queue allows important jobs to go to the top so they can be executed ASAP.
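A minimal sketch of the priority idea, assuming that a lower number means more urgent and that jobs of equal priority should run in arrival order. `PriorityJobQueue` is a hypothetical class. The linear scan in `pop` keeps the example short; a real backend would use a heap or an indexed column.

```ruby
# Hypothetical priority queue: lower priority number = more urgent.
class PriorityJobQueue
  def initialize
    @jobs = []
    @seq = 0   # arrival counter, breaks ties between equal priorities
  end

  def push(job, priority: 10)
    @jobs << [priority, @seq, job]
    @seq += 1
  end

  def pop
    # Arrays compare element by element, so this picks the lowest
    # priority number first and the earliest arrival second.
    entry = @jobs.min
    return nil unless entry

    @jobs.delete(entry)
    entry.last
  end
end

queue = PriorityJobQueue.new
queue.push("newsletter")
queue.push("weekly-report")
queue.push("password-reset", priority: 1)   # arrives last, runs first

order = []
while (job = queue.pop)
  order << job
end
```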
DEDICATED QUEUING SYSTEM
Backend built specifically for the purpose of queueing
Natively supports desired properties of queues
Gearman: One of the originals. Out of date, not as fully featured as modern alternatives
Beanstalkd: Very fully featured and well-maintained
GENERAL-PURPOSE DATABASE
Simple to use if you’re already using a standard database
May not scale to massive / high-throughput workloads
SQL: May have locking / concurrency issues
Document Store: Probably won’t provide reliability
Redis: Swiss-Army Knife of key-value stores, used by Resque and Sidekiq. Everything has to fit in memory.
MESSAGING SYSTEM
Provides generic message-passing capabilities (queues are just a special case)
Very scalable and high-throughput
Can be very complex to set up and use (topics, consumers, exchanges, brokers, OH MY)
ActiveMQ, RabbitMQ, ZeroMQ, HornetQ
Apache Kafka - distributed commit log
BATCH PROCESSING SYSTEM
MapReduce on huge volumes of data
Apache Hadoop
Apache Spark
Amazon Elastic MapReduce - hosted Hadoop
REALTIME PROCESSING SYSTEM
Continual stream of input (firehose), need results within seconds or minutes
Apache Storm
THIRD PARTY SERVICE - reliable message queue service
Amazon SQS: Scalable, but very bare-bones (lacks good Ruby worker client)
IronMQ / IronWorker
RUBY WORK QUEUE LIBRARIES
A backend isn’t very useful without a good worker library to run the jobs. Often the library can provide capabilities that the backend does not.
RESQUE VS SIDEKIQ
Resque forks workers, Sidekiq uses threads via Celluloid
Both use Redis for the backend and are mostly compatible with each other
Very fully featured (often via a separate gem)
Both come with web UI to make it easier to monitor job status
Sidekiq has a performance edge, and Sidekiq Pro offers reliability and batches
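Part of what makes the two libraries largely compatible is that both store a job in Redis as JSON naming a class and its arguments. The sketch below models only that round trip in plain Ruby, without Redis or either gem. `ThumbnailJob`, `encode_job`, and `run_job` are hypothetical names, and real payloads carry extra fields beyond `class` and `args`. The class-level `perform` follows Resque's convention; Sidekiq instead calls `perform` on an instance of the worker class.

```ruby
require "json"

# Hypothetical job class in the Resque style: a queue name plus a
# class-level perform method.
class ThumbnailJob
  @queue = :images

  def self.perform(image_id, size)
    "thumbnail for image #{image_id} at #{size}px"
  end
end

# Roughly what enqueueing does: serialize the class name and arguments.
# Arguments must survive a JSON round trip, so keep them simple.
def encode_job(klass, *args)
  JSON.generate("class" => klass.name, "args" => args)
end

# Roughly what a worker does: look the class up by name and run it.
def run_job(payload)
  data = JSON.parse(payload)
  Object.const_get(data["class"]).perform(*data["args"])
end

payload = encode_job(ThumbnailJob, 7, 128)   # this string is what would sit in Redis
result = run_job(payload)
```

Because only the class name and JSON-safe arguments travel through the queue, jobs should receive identifiers such as a record ID rather than full objects.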
DELAYED JOB
Uses Active Record, so easy to plug into existing Rails app
Fairly well supported in the community
Alternatives that take advantage of PostgreSQL advanced features: Queue Classic, Que, Toro
IN-MEMORY
Sucker Punch and Threaded In Memory Queue run workers in the same process (in background threads) and distribute the jobs directly to these workers.
HONORABLE MENTIONS
Sneakers - RabbitMQ
Backburner - Beanstalkd
TorqueBox Backgroundable (JRuby-only)
Qu - Supports multiple backends (Redis, MongoDB, SQS). Not as well maintained or fully featured.