Asynchronous Processing with Ruby on Rails (RailsConf 2008)

Post on 06-May-2015

RailsConf 2008 presentation: Asynchronous Processing with Ruby on Rails

Asynchronous Processing

Jonathan Dahl

(and RailSpikes, Slantwise, Zencoder, etc.)

I. What is it, and

why should I care?

Wife: What are you talking about at RailsConf this year?

Jon: Asynchronous Processing

Wife: [changes subject]

important tool

(ahem...)

Related Concepts

• Background Processing

• Parallel Processing

• Distributed Processing

has_attachment :storage => :s3

[Diagram, repeated across four builds: Image Upload, Send to S3, Browser Response. Timing labels: (15 seconds), then (3 seconds).]

has_attachment :storage => :file_system

[Diagram, repeated across two builds: Image Upload, Browser Response, Send to S3. Timing labels: (15 seconds), then (3 seconds) and (who cares?).]

has_attachment :storage => :s3,
               :thumbnails => { :thumb  => '100x100!',
                                :small  => '240x180>',
                                :medium => '500x500>' }

[Diagram, repeated across seven builds: Image Upload, Generate 3 Thumbnails, Send Thumbnail A to S3, Send Thumbnail B to S3, Send Thumbnail C to S3, Send Original to S3, Browser Response.]

II. When do I need it?

Time

Request

• Method (GET, POST)

• URI (host, port, path)

• Parameters

Response

• Status (200, 404, 500)

• Metadata (content type, server info, etc.)

• Body (xml, html, file)

Resources

Trigger

HTTP trigger - browser request

GET /photos/1.xml HTTP/1.1
Host: example.com:80

HTTP trigger - API request

cap staging deploy

Human trigger - capistrano

rake db:migrate

Human trigger - rake

$ script/console production
Loading production environment (Rails 2.0.2)
>> Photo.destroy_all

Human trigger - console

- Send email in 2 hours

- Sync data at 3:00am PST

- Notify admin when disk is 90% full

- Expire sessions that are inactive

- Archive records that exceed quota

No trigger?

1. Time

2. Resources

3. Trigger

Concrete examples

• Sending mail

• Transcoding video/audio

• Storing images on S3

• Receiving email

• Syncing with an outside database

• Complex computations

class Emailer < ActionMailer::ARMailer

[Diagram: Zencoder. User, Video Sharing Website, Zencoder Manager, four Workers, Data Storage (Amazon S3).]

class Photo < ActiveRecord::Base
  after_create :background_s3_upload

  def background_s3_upload
    Bj.submit "./script/runner ./jobs/send_to_s3.rb #{self.id}"
  end
end

III. So how do you

decide what to use?

be seamless

how reliable?

when should it run?

dependencies and system complexity

scaling and/or

performance

IV. The simple solution:

fork or thread

Parallel vs. Background

1. Stay within one request

2. thread.join

3. ActiveRecord

ActiveRecord::Base.allow_concurrency = true
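The three points above can be sketched in plain Ruby, assuming the slow steps are independent of one another. Here `upload` is a hypothetical stand-in that sleeps instead of talking to S3, and the file names are made up for illustration.

```ruby
# Hypothetical stand-in for a slow network call such as an S3 PUT.
def upload(name, delay = 0.2)
  sleep delay
  "uploaded #{name}"
end

# Start one thread per file, then wait for all of them before returning,
# so the work still finishes inside the current request.
def upload_in_parallel(names)
  threads = names.map { |name| Thread.new { upload(name) } }
  threads.map(&:value) # value joins the thread and returns the block's result
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = upload_in_parallel(%w[original thumb small medium])
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results
puts format("elapsed: %.2fs", elapsed)
```

Because the four simulated uploads wait at the same time, the total should be close to one upload rather than four. Threads that touch ActiveRecord need the concurrency setting from the slide; this sketch avoids the database entirely.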

fire and forget

Spawn

spawn(:method => :fork) do
  # do something
end
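For comparison, here is a rough sketch of what fire and forget with fork looks like using only core Ruby. It is not how the Spawn plugin is implemented; `fire_and_forget` and the marker file are hypothetical, and the sketch assumes a platform where `Process.fork` is available.

```ruby
require "tmpdir"

# Run the block in a child process and return immediately.
def fire_and_forget
  pid = fork do
    yield
    exit!(0) # leave the child without running the parent's at_exit hooks
  end
  Process.detach(pid) # reaps the child later so it does not become a zombie
end

marker = File.join(Dir.tmpdir, "fire_and_forget_#{Process.pid}.txt")
File.delete(marker) if File.exist?(marker)

watcher = fire_and_forget do
  sleep 0.2                   # pretend to do slow work
  File.write(marker, "done")
end

puts "parent continues right away"
watcher.join                  # only so this demo can show the result
puts File.read(marker)
```

In a real request the parent would not join; the join is here only to make the outcome visible. Nothing records whether the child succeeded, which is the trade-off of this approach.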

1. Time

2. Resources

3. Trigger

V. More robust

solutions

Task Storage

Task Trigger

Task Storage

• task details (what happens?)

• priority

• when to run

Task Trigger

• worker pulling jobs

• time-based

• execute immediately
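The storage and trigger ideas above can be sketched with an in-memory store. This is an illustration only: `Task`, `TaskStore`, and the command names are hypothetical, and a real system would keep the tasks in a database or queue.

```ruby
# One stored task: what to run, how urgent it is, and the earliest time to run it.
Task = Struct.new(:command, :priority, :run_at)

class TaskStore
  def initialize
    @tasks = []
  end

  def submit(command, priority: 0, run_at: Time.now)
    @tasks << Task.new(command, priority, run_at)
  end

  # Remove and return the highest-priority task that is due, or nil.
  # This is the call a worker would make when pulling jobs.
  def take_next(now = Time.now)
    due = @tasks.select { |task| task.run_at <= now }
    task = due.max_by(&:priority)
    @tasks.delete_if { |candidate| candidate.equal?(task) } if task
    task
  end
end

now = Time.now
store = TaskStore.new
store.submit("send_email",   priority: 1, run_at: now + 7200) # in 2 hours
store.submit("resize_photo", priority: 5, run_at: now)
store.submit("sync_data",    priority: 2, run_at: now)

first  = store.take_next(now)
second = store.take_next(now)
third  = store.take_next(now)        # the email is not due yet
later  = store.take_next(now + 7200)

puts [first, second, third, later].map { |t| t ? t.command : "nothing due" }.inspect
```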

Task Storage

• Database

• Message Queue

Task Trigger

• daemon

• cron

create_table "jobs" do |t|
  t.text     "command"
  t.integer  "priority"
  t.integer  "pid"
  t.datetime "submitted_at"
  t.datetime "started_at"
  t.datetime "finished_at"
  t.text     "result"
end

create_table "photos" do |t|
  t.string   "filename"
  t.datetime "created_at"
  t.datetime "processed_at"
end
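One way to read the `processed_at` column is that the photos table itself acts as the task list: a row with no timestamp still needs work. A minimal sketch of that idea, using a Struct as a stand-in for the table (no database is involved, and the helper names are hypothetical):

```ruby
# Stand-in for rows of the "photos" table.
Photo = Struct.new(:filename, :created_at, :processed_at)

# Rows that still need work are the ones with no processed_at timestamp.
def unprocessed(photos)
  photos.select { |photo| photo.processed_at.nil? }
end

# Process every pending row and stamp it so it is not picked up again.
def process_pending(photos, now = Time.now)
  unprocessed(photos).each do |photo|
    yield photo
    photo.processed_at = now
  end
end

photos = [
  Photo.new("a.jpg", Time.now, Time.now),
  Photo.new("b.jpg", Time.now, nil),
  Photo.new("c.jpg", Time.now, nil)
]

handled = []
process_pending(photos) { |photo| handled << photo.filename }
puts handled.inspect

second_pass = []
process_pending(photos) { |photo| second_pass << photo.filename }
puts second_pass.inspect
```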

• Amazon SQS

• Websphere MQ

• Starling

• JMS

• beanstalkd

queue = SQS.get_queue("task_list")

queue.send_message "process:2872"

put message

message = queue.receive_message

receive message
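The put and receive pattern above can be tried without any external service. In this sketch Ruby's built-in `Queue` stands in for a remote queue such as SQS, and the `action:id` message format is assumed from the `"process:2872"` example on the slide.

```ruby
# In-process stand-in for a remote message queue.
queue = Queue.new
processed = []

# Consumer: block until a message arrives, stop on the :done sentinel.
worker = Thread.new do
  while (message = queue.pop) != :done
    action, id = message.split(":", 2)
    processed << [action, Integer(id, 10)]
  end
end

# Producer: the web request only enqueues a short message and moves on.
queue << "process:2872"
queue << "process:2873"
queue << :done

worker.join
puts processed.inspect
```

The message carries only an identifier, so the worker is expected to load the real data itself. That keeps messages small and works the same way whether the queue is local or remote.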

Starling

starling -h 192.168.1.1 -d

require 'memcache'
starling = MemCache.new('192.168.1.1:22122')

# Put messages onto a queue:
starling.set('my_queue', 12345)

# Get message from the queue:
starling.get('my_queue')

storage choice?

• queue: optimized for performance

• database: you’ve already got one

daemon

#!/usr/bin/env ruby
class JobRequester < SimpleDaemon::Base
  def self.start
    loop { Job.process_next }
  end
end

JobRequester.daemonize
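A polling loop like the one above usually needs two additions in practice: a pause when there is no work, and a way to survive a failing job. The sketch below shows both under simple assumptions. `PollingWorker` is hypothetical, and a plain array of lambdas stands in for the jobs table.

```ruby
class PollingWorker
  attr_reader :completed

  def initialize(jobs, idle_sleep: 0.05)
    @jobs = jobs
    @idle_sleep = idle_sleep
    @running = true
    @completed = 0
  end

  def stop
    @running = false
  end

  # Returns true if a job was attempted, false if there was nothing to do.
  def process_next
    job = @jobs.shift
    return false unless job
    job.call
    @completed += 1
    true
  rescue StandardError => e
    warn "job failed: #{e.message}" # keep the loop alive after a bad job
    true
  end

  def run
    while @running
      sleep @idle_sleep unless process_next # back off when idle instead of spinning
    end
  end
end

output = []
jobs = [
  -> { output << "first" },
  -> { raise "boom" },
  -> { output << "third" }
]

worker = PollingWorker.new(jobs)
thread = Thread.new { worker.run }
sleep 0.3      # let the worker drain the list
worker.stop
thread.join
puts output.inspect
```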

0 6 * * * script/runner jobs/send_emails.rb
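The five leading fields of the crontab line are minute, hour, day of month, month, and day of week, so `0 6 * * *` means 06:00 every day. A simplified matcher makes that concrete. `cron_match?` is a hypothetical helper that handles only `*` and plain integers, not the ranges, lists, or steps that real cron accepts.

```ruby
# Simplified check of a five-field cron schedule against a time.
# Supports only "*" and plain integers in each field.
def cron_match?(expression, time)
  fields = expression.split
  raise ArgumentError, "expected 5 fields" unless fields.size == 5

  actual = [time.min, time.hour, time.day, time.month, time.wday]
  fields.zip(actual).all? do |field, value|
    field == "*" || Integer(field, 10) == value
  end
end

schedule = "0 6 * * *"
puts cron_match?(schedule, Time.new(2008, 5, 30, 6, 0)) # 06:00
puts cron_match?(schedule, Time.new(2008, 5, 30, 6, 1)) # 06:01
```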

cronedit

require 'cronedit'

CronEdit::Crontab.Add "send-emails", {
  :minute  => 0,
  :hour    => 6,
  :command => "script/runner jobs/send_emails.rb"
}

CronEdit::Crontab.Remove 'old-task'

trigger choice

• process: always running

• cron: as reliable as your operating system

BackgrounDRb

class BillingWorker < BackgrounDRb::MetaWorker
  set_worker_name :billing_worker

  def create(args = nil)
    # this method is called when worker is loaded for the first time
  end

  def charge_customer(customer_id = nil)
    logger.info 'charging customer now'
  end
end

MiddleMan.worker(:billing_worker).charge_customer(current_customer.id)

:backgroundrb:
  :ip: 0.0.0.0

:development:
  :backgroundrb:
    :port: 11111
    :log: foreground

:production:
  :backgroundrb:
    :port: 22222
    :lazy_load: true
    :debug_log: false

./script/backgroundrb start

AP4R

class MyController < ApplicationController
  def queue
    ap4r.async_to({:action => 'download'},
                  {:story => story.id, :url => params[:url]})
  end

  def download
    # long-running task
  end
end

Bj

Acronym

create_table "bj_job", :primary_key => "bj_job_id", :force => true do |t|
  t.text     "command"
  t.text     "state"
  t.integer  "priority"
  t.text     "tag"
  t.integer  "is_restartable"
  t.text     "submitter"
  t.text     "runner"
  t.integer  "pid"
  t.datetime "submitted_at"
  t.datetime "started_at"
  t.datetime "finished_at"
  t.text     "env"
  t.text     "stdin"
  t.text     "stdout"
  t.text     "stderr"
  t.integer  "exit_status"
end

Bj.submit "./script/runner ./jobs/task.rb"

after_create :bj_send_to_s3

def bj_send_to_s3
  Bj.submit "./script/runner ./jobs/send.rb #{id}"
end

Workling

# environment config
Workling::Remote.dispatcher = Workling::Remote::Runners::StarlingRunner.new

# task class
class ImageWorker < Workling::Base
  def send_to_s3(options = {})
    # put file to S3
  end
end

# trigger asynchronous job
ImageWorker.asynch_send_to_s3(:image_id => 2927)

script/workling_starling_client start

Pitfalls

race conditions
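A common race in job systems is two workers picking up the same job. The sketch below shows one way to avoid it in a single process: find and mark the job inside one critical section. `JobTable` is hypothetical and lives in memory; with a real database the equivalent would typically be a row lock or a conditional update, which this sketch does not attempt.

```ruby
Job = Struct.new(:id, :claimed_by)

class JobTable
  def initialize(count)
    @jobs = Array.new(count) { |i| Job.new(i + 1, nil) }
    @lock = Mutex.new
  end

  # Find-and-mark happens under one lock, so two workers can never
  # both see the same job as unclaimed.
  def claim(worker_name)
    @lock.synchronize do
      job = @jobs.find { |candidate| candidate.claimed_by.nil? }
      job.claimed_by = worker_name if job
      job
    end
  end
end

table = JobTable.new(50)
claimed = Queue.new

workers = 4.times.map do |n|
  Thread.new do
    while (job = table.claim("worker-#{n}"))
      claimed << job.id
    end
  end
end
workers.each(&:join)

ids = []
ids << claimed.pop until claimed.empty?
puts "claimed #{ids.size} jobs, #{ids.uniq.size} unique"
```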

alive, but stalled
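A worker process can still be running while its job has stopped making progress. If jobs record `started_at` and `finished_at`, as in the jobs table shown earlier, a periodic check can flag likely stalls. This is a minimal sketch: `stalled_jobs` and the 600-second timeout are assumptions for illustration.

```ruby
Job = Struct.new(:id, :started_at, :finished_at)

# A job counts as stalled if it started, never finished, and has been
# running longer than the timeout (in seconds).
def stalled_jobs(jobs, timeout:, now: Time.now)
  jobs.select do |job|
    job.started_at && job.finished_at.nil? && (now - job.started_at) > timeout
  end
end

now = Time.now
jobs = [
  Job.new(1, now - 30,   now - 10), # finished
  Job.new(2, now - 20,   nil),      # still within the timeout
  Job.new(3, now - 3600, nil),      # running for an hour: likely stuck
  Job.new(4, nil,        nil)       # never started
]

stuck = stalled_jobs(jobs, timeout: 600, now: now)
puts stuck.map(&:id).inspect
```

What to do with a flagged job (retry, alert, or kill the worker by its recorded pid) depends on whether the task is safe to run twice.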

VI. Some

recommendations

general purpose

Bj

distributed processing

SQS (+ custom worker)

time-scheduled

cron (+ rake or script)

speed + scalability

Starling/Workling

Thanks!

Jonathan Dahl

Slides at RailSpikes http://railspikes.com