Date post: | 06-May-2015 |
Category: | Technology |
Upload: | jonathan-dahl |
Asynchronous Processing
Jonathan Dahl
(and RailSpikes, Slantwise, Zencoder, etc.)
I. What is it, and why should I care?
Wife: What are you talking about at RailsConf this year?
Jon: Asynchronous Processing
Wife: [changes subject]
important tool
(ahem...)
Related Concepts
• Background Processing
• Parallel Processing
• Distributed Processing
has_attachment :storage => :s3
[Diagram: Image Upload, Send to S3, Browser Response. Timings shown: 15 seconds vs. 3 seconds]
has_attachment :storage => :file_system
[Diagram: Image Upload, Browser Response, Send to S3. Timings shown: 15 seconds vs. 3 seconds (who cares?)]
has_attachment :storage => :s3, :thumbnails => { :thumb => '100x100!', :small => '240x180>', :medium => '500x500>' }
Browser Response
Send Thumbnail A to S3
Image Upload
Generate 3 Thumbnails
Send Thumbnail B to S3
Send Thumbnail C to S3
Send Original to S3
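The diagrams above contrast doing the slow S3 work inside the request with deferring it until after the response. A minimal sketch of that difference in plain Ruby, assuming a hypothetical `slow_upload` method that stands in for the S3 transfer (the method names and the short sleep are illustrative, not taken from the talk):

```ruby
require 'thread'

# Hypothetical stand-in for a slow S3 transfer.
def slow_upload(name, log)
  sleep 0.05
  log << "uploaded #{name}"
end

# Synchronous: the response waits for every upload to finish.
def handle_request_sync(files, log)
  files.each { |f| slow_upload(f, log) }
  log << "response sent"
end

# Asynchronous: queue the uploads and respond immediately.
def handle_request_async(files, log, queue)
  files.each { |f| queue << f }
  log << "response sent"
end

# Background thread that drains the queue until it sees :done.
def start_uploader(queue, log)
  Thread.new do
    while (file = queue.pop) != :done
      slow_upload(file, log)
    end
  end
end
```

In the synchronous version "response sent" is the last log entry; in the asynchronous version it is the first, and the uploads trail behind it.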
II. When do I need it?
Time
Request
• Method (GET, POST)
• URI (host, port, path)
• Parameters
Response
• Status (200, 404, 500)
• Metadata (content type, server info, etc.)
• Body (xml, html, file)
Resources
Trigger
HTTP trigger - browser request
GET /photos/1.xml HTTP/1.1
Host: example.com:80
HTTP trigger - API request
cap staging deploy
Human trigger - capistrano
rake db:migrate
Human trigger - rake
$ script/console production
Loading production environment (Rails 2.0.2)
>> Photo.destroy_all
Human trigger - console
- Send email in 2 hours
- Sync data at 3:00am PST
- Notify admin when disk is 90% full
- Expire sessions that are inactive
- Archive records that exceed quota
No trigger?
1. Time
2. Resources
3. Trigger
Concrete examples
• Sending mail
• Transcoding video/audio
• Storing images on S3
• Receiving email
• Syncing with outside database
• Complex computations
class Emailer < ActionMailer::ARMailer
[Diagram: Zencoder. User, Video Sharing Website, Zencoder Manager, Workers, Data Storage (Amazon S3)]
class Photo < ActiveRecord::Base
  after_create :background_s3_upload

  def background_s3_upload
    Bj.submit "./script/runner ./jobs/send_to_s3.rb #{self.id}"
  end
end
III. So how do you decide what to use?
be seamless
how reliable?
when should it run?
dependencies and system complexity
scaling and/or performance
IV. The simple solution: fork or thread
Parallel vs. Background
1. Stay within one request
2. thread.join
3. ActiveRecord: ActiveRecord::Base.allow_concurrency = true
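The parallel case above (stay within one request, then join the threads) can be sketched in plain Ruby. This is an illustration only: `send_file` is a hypothetical slow call, and the sketch does not touch the database, so the `allow_concurrency` setting from the slide does not come into play here.

```ruby
# Hypothetical slow call, e.g. sending one thumbnail to remote storage.
def send_file(name)
  sleep 0.1
  "#{name} sent"
end

# Start one thread per file, then wait for all of them before returning.
def send_all_in_parallel(names)
  threads = names.map do |name|
    Thread.new { send_file(name) }
  end
  # Thread#value joins the thread and returns the block's result.
  threads.map(&:value)
end
```

The request still waits for the slowest transfer, but the transfers overlap instead of running one after another.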
fire and forget
Spawn
spawn(:method => :fork) do
  # do something
end
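For readers without the Spawn plugin, the fire-and-forget idea can be approximated with Ruby's own `fork`. This is a rough sketch for Unix-like systems, not the plugin's implementation; the helper name `fire_and_forget` is made up for the example.

```ruby
# Fire and forget: run the block in a child process and return at once.
# The parent never waits for a result. Process.detach starts a thread
# that reaps the child so it does not linger as a zombie.
def fire_and_forget
  pid = fork do
    yield
    exit!(0) # leave immediately, skipping at_exit handlers inherited from the parent
  end
  Process.detach(pid)
  pid
end
```

The caller gets a process id back immediately and has no way to learn whether the work succeeded, which is the trade-off the "fire and forget" slide is pointing at.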
1. Time
2. Resources
3. Trigger
V. More robust solutions
Task Storage / Task Trigger
Task Storage
• task details (what happens?)
• priority
• when to run
Task Trigger
• worker pulling jobs
• time-based
• execute immediately
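The storage and trigger halves listed above can be sketched as a tiny in-memory store that a worker pulls from. The names `Task`, `TaskStore`, and `next_due` are hypothetical; a real system would keep this in a database or a message queue.

```ruby
# One stored task: what to run, how urgent it is, and when it becomes eligible.
Task = Struct.new(:command, :priority, :run_at)

class TaskStore
  def initialize
    @tasks = []
  end

  def add(command, priority: 0, run_at: Time.now)
    @tasks << Task.new(command, priority, run_at)
  end

  # Remove and return the highest-priority task that is due, or nil.
  # Ties go to the task that was added first.
  def next_due(now = Time.now)
    index = nil
    @tasks.each_with_index do |task, i|
      next if task.run_at > now
      index = i if index.nil? || task.priority > @tasks[index].priority
    end
    index && @tasks.delete_at(index)
  end
end
```

A pulling worker would call `next_due` in a loop; a time-based trigger would call it on a schedule; "execute immediately" corresponds to adding a task whose `run_at` is now.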
Task Storage: Database, Message Queue
Task Trigger: daemon, cron
create_table "jobs" do |t|
  t.text     "command"
  t.integer  "priority"
  t.integer  "pid"
  t.datetime "submitted_at"
  t.datetime "started_at"
  t.datetime "finished_at"
  t.text     "result"
end
create_table "photos" do |t|
  t.string   "filename"
  t.datetime "created_at"
  t.datetime "processed_at"
end
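The `processed_at` column suggests storing task state on the record itself: anything with no `processed_at` is still waiting. A minimal sketch with plain structs standing in for ActiveRecord models (the helper names are hypothetical, and a Rails app would use a query on `processed_at` being NULL rather than `select`):

```ruby
# Stand-in for the photos table above.
Photo = Struct.new(:filename, :created_at, :processed_at)

# Records that have not been processed yet.
def unprocessed(photos)
  photos.select { |photo| photo.processed_at.nil? }
end

# Run the given block for each pending photo, then stamp it as done.
# Returns the photos that were processed in this pass.
def process_pending(photos, now = Time.now)
  unprocessed(photos).each do |photo|
    yield photo
    photo.processed_at = now
  end
end
```

A second pass over the same records finds nothing to do, which makes this pattern convenient for a cron-driven script.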
• Amazon SQS
• Websphere MQ
• Starling
• JMS
• beanstalkd
queue = SQS.get_queue("task_list")
queue.send_message "process:2872"
put message
message = queue.receive_message
receive message
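The put/receive pattern above can be tried without an SQS account by using Ruby's in-process `Queue` as a stand-in. The "action:id" message format is inferred from the single `"process:2872"` example, and `parse_message`, `drain`, and the handler table are hypothetical names for this sketch.

```ruby
require 'thread'

# Split a message such as "process:2872" into an action and a record id.
def parse_message(message)
  action, id = message.split(":", 2)
  [action.to_sym, Integer(id)]
end

# Receive every waiting message and dispatch it to the matching handler.
# Returns the handlers' results in the order the messages were received.
def drain(queue, handlers)
  results = []
  until queue.empty?
    action, id = parse_message(queue.pop)
    results << handlers.fetch(action).call(id)
  end
  results
end
```

Keeping the message down to an action and an id means the worker reloads the record itself, so the queue never carries stale copies of the data.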
Starling
starling -h 192.168.1.1 -d
require 'memcache'
starling = MemCache.new('192.168.1.1:22122')

# Put messages onto a queue:
starling.set('my_queue', 12345)

# Get message from the queue:
starling.get('my_queue')
storage choice?
• queue: optimized for performance
• database: you’ve already got one
daemon
#!/usr/bin/env ruby
class JobRequester < SimpleDaemon::Base
  def self.start
    loop { Job.process_next }
  end
end

JobRequester.daemonize
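A bare `loop` like the one above will spin when there is nothing to process unless `process_next` itself blocks or sleeps. One way to sketch a polite polling loop, assuming (this is not stated in the slides) that `process_next` returns a truthy value when it ran a job and a falsy value when the queue was empty:

```ruby
# Poll a job source until told to stop. `source` is any object that
# responds to #process_next; `stop` is a callable checked on every pass.
# Returns the number of jobs that were processed.
def run_worker(source, idle_sleep: 1, stop: -> { false })
  processed = 0
  until stop.call
    if source.process_next
      processed += 1
    else
      sleep idle_sleep # nothing to do: back off instead of spinning
    end
  end
  processed
end
```

The `stop` hook is there so a signal handler or a test can end the loop cleanly; a daemon would typically leave it at the default and run forever.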
0 6 * * * script/runner jobs/send_emails.rb
cronedit
require 'cronedit'
CronEdit::Crontab.Add "send-emails", { :minute => 0, :hour => 6, :command => "script/runner jobs/send_emails.rb" }
CronEdit::Crontab.Remove 'old-task'
trigger choice
• process: always running
• cron: as reliable as your operating system
BackgrounDRb
class BillingWorker < BackgrounDRb::MetaWorker
  set_worker_name :billing_worker

  def create(args = nil)
    # this method is called when worker is loaded for the first time
  end

  def charge_customer(customer_id = nil)
    logger.info 'charging customer now'
  end
end
MiddleMan.worker(:billing_worker).charge_customer(current_customer.id)
:backgroundrb:
  :ip: 0.0.0.0

:development:
  :backgroundrb:
    :port: 11111
    :log: foreground

:production:
  :backgroundrb:
    :port: 22222
    :lazy_load: true
    :debug_log: false

./script/backgroundrb start
AP4R
class MyController < ApplicationController
  def queue
    ap4r.async_to({:action => 'download'}, {:story => story.id, :url => params[:url]})
  end

  def download
    # long-running task
  end
end
Bj
Acronym
create_table "bj_job", :primary_key => "bj_job_id", :force => true do |t|
  t.text     "command"
  t.text     "state"
  t.integer  "priority"
  t.text     "tag"
  t.integer  "is_restartable"
  t.text     "submitter"
  t.text     "runner"
  t.integer  "pid"
  t.datetime "submitted_at"
  t.datetime "started_at"
  t.datetime "finished_at"
  t.text     "env"
  t.text     "stdin"
  t.text     "stdout"
  t.text     "stderr"
  t.integer  "exit_status"
end
Bj.submit "./script/runner ./jobs/task.rb"
after_create :bj_send_to_s3
def bj_send_to_s3
  Bj.submit "./script/runner ./jobs/send.rb #{id}"
end
Workling
# environment config
Workling::Remote.dispatcher = Workling::Remote::Runners::StarlingRunner.new

# task class
class ImageWorker < Workling::Base
  def send_to_s3(options = {})
    # put file to S3
  end
end

# trigger asynchronous job
ImageWorker.asynch_send_to_s3(:image_id => 2927)
script/workling_starling_client start
Pitfalls
race conditions
alive, but stalled
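Both pitfalls can be illustrated with an in-memory stand-in for a jobs table. The sketch below is an assumption-laden toy, not any library's behaviour: `JobTable`, `claim`, and `stalled` are hypothetical names, and the `Mutex` plays the role that an atomic conditional update or a row lock would play in a real database.

```ruby
require 'thread'

# Minimal stand-in for a row in a jobs table.
Job = Struct.new(:id, :pid, :started_at, :finished_at)

class JobTable
  def initialize(jobs)
    @jobs = jobs
    @lock = Mutex.new
  end

  # Race conditions: find and mark a job inside one critical section,
  # so two workers can never pick up the same job.
  def claim(worker_pid, now = Time.now)
    @lock.synchronize do
      job = @jobs.find { |j| j.pid.nil? }
      return nil unless job
      job.pid = worker_pid
      job.started_at = now
      job
    end
  end

  # Alive, but stalled: jobs that started more than `timeout` seconds
  # ago and have not recorded a finish time.
  def stalled(timeout, now = Time.now)
    @lock.synchronize do
      @jobs.select do |j|
        j.started_at && j.finished_at.nil? && now - j.started_at > timeout
      end
    end
  end
end
```

What to do with a stalled job (retry it, alert someone, kill the worker) is a policy decision the sketch leaves open.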
VI. Some recommendations
general purpose
Bj
distributed processing
SQS (+ custom worker)
time-scheduled
cron (+ rake or script)
speed + scalability
Starling/Workling
Thanks!
Jonathan Dahl
Slides at RailSpikes http://railspikes.com