
Orchestrating the execution of workflows for media streaming service and even more

Date post: 31-Jul-2015
Orchestrating the execution of workflows for media streaming service and even more — Shuen-Huei (Drake) Guan, sr. principal engineer, KKBOX; vice chairperson, PyCon APAC 2015
Transcript

Orchestrating the execution of workflows for media streaming service and even more

Shuen-Huei (Drake) Guan
sr. principal engineer, KKBOX
vice chairperson, PyCon APAC 2015

Who am I?

• administrator, Ptt BBS

• technical director / R&D manager, Digimax

• team player, KKBOX

• contributor, PyCon Taiwan

This is more a story than a tech talk. No KKBOX trade secrets are revealed.

There are just some slides about Python.

And, it's not about music streaming.

350 team players to serve 10M users across 6 countries

20M songs

Events

KKTIX, the always lovely sponsor!

If we can make music streaming work, how about video streaming? — KKBOX CxO

Let's work on a video-on-demand service

• Adaptive streaming.

• DRM protection.

• Video processing on cloud.

We thought video streaming would be similar to music streaming, but we were wrong.

Issue 1. Workflow

multiple distinct interconnected steps that need to be executed in a particular order in a distributed environment... — someone

flickr: siddhu2020 http://bit.ly/1FAukT2

Sample encoding workflow for music

def run(source, secret_key, cipher):
    # verify if the source is ok
    if not verify(source):
        return False

    # convert audio with different bitrates
    _ = [convert(source, i) for i in range(4)]

    # update id3 tag for all converted audios
    _ = update_id3_tag(_)

    # encrypt all audios
    _ = encrypt(_, secret_key, cipher)

    # deploy to backend DB
    deploy(_)

    return True
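The linear shape of this pipeline can be exercised end to end with stub steps. This is a minimal sketch: the function names mirror the slide, but the stub bodies (and the call log) are hypothetical, not KKBOX code.

```python
# Stub implementations to trace the pipeline's sequential dependencies.
calls = []

def verify(source):
    calls.append('verify')
    return source.endswith('.wav')

def convert(source, bitrate_index):
    calls.append('convert')
    return '%s.%d.mp3' % (source, bitrate_index)

def update_id3_tag(files):
    calls.append('tag')
    return files

def encrypt(files, secret_key, cipher):
    calls.append('encrypt')
    return ['enc(%s)' % f for f in files]

def deploy(files):
    calls.append('deploy')

def run(source, secret_key, cipher):
    if not verify(source):
        return False
    converted = [convert(source, i) for i in range(4)]
    tagged = update_id3_tag(converted)
    encrypted = encrypt(tagged, secret_key, cipher)
    deploy(encrypted)
    return True

print(run('song.wav', 'k', 'aes'))  # True
```

Each step consumes the previous step's output, which is exactly what makes this a workflow rather than a bag of independent tasks.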

Issue 2. Distribute tasks to the cloud, and use the cloud efficiently!

Gearman

Sample encoding workflow for music

Sample client code to submit a workflow1

$workflow = new Gearman_Workflow('KKBOX_Convert_Audio', array(
    'source' => $source,
    'args'   => $args));

$workflow->attachCallback(function () {});

$client->run($workflow);

1 warning, it's PHP.

Sample worker (server) code to do things1

class KKBOX_Convert_Audio extends Gearman_Worker {
    public function run($arg) {
        // check the source
        if (!verify()) return;
        // convert audio with different bitrates
        for ($i = 0; $i < 4; $i++) {
            convert($i);
        }
        // update id3 tag for all audios
        update_id3_tag();
        // encrypt audios
        encrypt();
        // sequentially deploy to backend DB
        for ($i = 0; $i < 4; $i++) {
            deploy($i);
        }
    }
}

1 warning, it's PHP.

Sample encoding workflow for video, a little bit complicated

Sample worker (server) code to do things1

class KKBOX_Encode_Video extends Gearman_Worker {
    public function run($arg) {
        transcode();
        encrypt();
    }
}

class KKBOX_Convert_Video extends Gearman_Worker {
    public function run($arg) {
        if (!verify()) return;

        // create asynchronous sub-workflows
        $result = create_sub_workflow(KKBOX_Encode_Video);
        // wait for all sub-workflows to finish
        joint($result);

        create_sub_workflow(KKBOX_Package_DASH, $result->encrypted);
        create_sub_workflow(KKBOX_Package_HLS, $result->plain);
        joint();

        deploy();
    }
}

1 warning, it's PHP.

The real Gearman worker code is far more complicated, without the elegance we would like to have.

Issue 3. Workflows would evolve...

• Let's save file size and IO.

• Let's make it faster.

• Let's add some more profiles.

• Let's fix some encoding.

Everything fails all the time. — Werner Vogels, CTO of Amazon

flickr: Bill Abbott http://bit.ly/1GnrSGr

Issue 4. Gearman server down!

Factors we pay the most attention to

• Encoding workflow

• Task distribution across machines on the cloud.

• Server maintenance.

We hope ...

1. no need to maintain this system;

2. easier to distribute workflow/tasks, even to local machine;

3. a high-level workflow definition. As long as you can draw your process on paper, you can map it to a workflow!

What Google suggests...

• Apache Kafka, Mesos, ...

• Gearman (sorry, but we've tried.)

• Luigi by Spotify

• Celery

• Potentially all message brokers with some additional work.
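What all of these tools ultimately provide is some form of the fan-out/join pattern the video workflow needs. As a broker-free illustration only (the step names `transcode` and `package` are hypothetical, not any of these libraries' APIs), the same shape can be sketched with the standard library:

```python
from concurrent.futures import ThreadPoolExecutor

def transcode(profile):
    # Stand-in for a real encoding step.
    return 'encoded-%d' % profile

def package(fmt, inputs):
    # Stand-in for a packaging step that joins on all encodes.
    return '%s(%s)' % (fmt, ','.join(inputs))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Fan out: encode several profiles in parallel.
    encoded = list(pool.map(transcode, range(4)))
    # Join on all encodes, then fan out again for packaging.
    packaged = [pool.submit(package, fmt, encoded) for fmt in ('DASH', 'HLS')]
    results = [f.result() for f in packaged]

print(results)
```

The value the listed tools add over this toy is exactly what a single process cannot give you: distribution across machines, persistence, and retries.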

AWS Simple Workflow (SWF)

class HelloWorker(swf.ActivityWorker):
    domain = DOMAIN
    version = VERSION
    task_list = TASKLIST

    def run(self):
        activity_task = self.poll()
        if 'activityId' in activity_task:
            print 'Hello, World!'
            self.complete()
        return True

class HelloDecider(swf.Decider):
    domain = DOMAIN
    task_list = TASKLIST
    version = VERSION

    def run(self):
        history = self.poll()
        if 'events' in history:
            # Find workflow events not related to decision scheduling.
            workflow_events = [e for e in history['events']
                               if not e['eventType'].startswith('Decision')]
            last_event = workflow_events[-1]

            decisions = swf.Layer1Decisions()
            if last_event['eventType'] == 'WorkflowExecutionStarted':
                decisions.schedule_activity_task(...)
            elif last_event['eventType'] == 'ActivityTaskCompleted':
                decisions.complete_workflow_execution()
            self.complete(decisions=decisions)
        return True
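At its core, a decider is just a function from event history to the next decision. That logic can be exercised without SWF at all; in this toy, boto-free sketch the returned strings are hypothetical stand-ins for the `Layer1Decisions` calls above:

```python
def decide(events):
    # Mirror the decider: look at the last non-decision event
    # and pick the next move for the workflow.
    workflow_events = [e for e in events
                       if not e['eventType'].startswith('Decision')]
    last_event = workflow_events[-1]
    if last_event['eventType'] == 'WorkflowExecutionStarted':
        return 'schedule_activity_task'
    elif last_event['eventType'] == 'ActivityTaskCompleted':
        return 'complete_workflow_execution'
    return None

history = [{'eventType': 'WorkflowExecutionStarted'},
           {'eventType': 'DecisionTaskScheduled'}]
print(decide(history))  # schedule_activity_task
```

Keeping the decision logic pure like this also makes it easy to unit-test a workflow without touching AWS.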

SWF

• Decider defines the workflow.

• We still need to write workflow logic in decider.

• Workers do the action.

• Every time we change the workflow or an action, we need to re-deploy deciders and workers.

Let's decouple the workflow and actions from SWF

Job script for a workflow

Job {KKBOX Convert Video} -subtasks {
    Task {Source Inspection} -cmds {
        Cmd { emilia verify -i s3://bucket/source.mp4 }
    }
    Task {Transcode} --parallel -subtasks {
        Iterate i -from 0 -to 4 -by 1 -template {
            Task {Transcode Audio} -cmds {
                Cmd { ffmpeg -i s3://bucket/source.mp4 -o /tmp/converted_$i.mp4 }
            }
        }
        Iterate i -from 0 -to 8 -by 1 -template {
            Task {Transcode Video} -cmds {
                Cmd { ffmpeg -i s3://bucket/source.mp4 -o /tmp/converted_$i.mp4 }
            }
        }
    }
    Task {Adaptive} -subtasks {
        Task {DASH} -subtasks { }
        Task {HLS} -subtasks { }
        Task {MSS} -subtasks { }
    }
}

What exactly is a job script?


Make it pythonic if that makes developers happier

source = 's3://bucket/source.mp4'

with Job():
    with Task('Source Inspection'):
        Cmd('emilia verify -i %s' % source)

    with Task('Transcode', parallel=True):
        for i in range(4):
            with Task():
                Cmd('ffmpeg -i %s ... -o /tmp/a_%d.mp4' % (source, i))
        for i in range(9):
            with Task():
                Cmd('ffmpeg -i %s ... -o /tmp/v_%d.mp4' % (source, i))

    with Task('Adaptive'):
        with Task('DASH'): pass
        with Task('HLS'): pass
        with Task('MSS'): pass
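Such a context-manager DSL is straightforward to build on a stack of open nodes. A minimal, hypothetical sketch (not the actual KKBOX implementation; `Node` and its fields are invented for illustration):

```python
_stack = []

class Node(object):
    # A tree node; entering a node makes it the parent of nodes
    # created inside the `with` block.
    def __init__(self, kind, name=None, **opts):
        self.kind, self.name, self.opts = kind, name, opts
        self.children = []
        if _stack:
            _stack[-1].children.append(self)

    def __enter__(self):
        _stack.append(self)
        return self

    def __exit__(self, *exc):
        _stack.pop()

def Job(**opts):
    return Node('Job', **opts)

def Task(name=None, **opts):
    return Node('Task', name, **opts)

def Cmd(line):
    return Node('Cmd', line)

# Build a small job tree with the same shape as the slide.
with Job() as job:
    with Task('Source Inspection'):
        Cmd('emilia verify -i s3://bucket/source.mp4')
    with Task('Transcode', parallel=True):
        for i in range(2):
            with Task():
                Cmd('ffmpeg ... /tmp/a_%d.mp4' % i)

print([t.name for t in job.children])  # ['Source Inspection', 'Transcode']
```

The resulting tree is plain data, so it can be serialized and handed to deciders and workers without re-deploying them when the workflow changes.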

Status

• 1,500,000 minutes of video encoded.

• 3,000 videos per day (max).

• 800 workers on 100 c3.8xlarge instances (max).

• spent lots of $.

• everyone is really happy with that performance.

Technical status

• Fault tolerance by retry. [decider]

• Workflows/tasks have priorities. [SWF]

• try..except..finally mechanism. [-whendone, -whenerror, -precmds, -postcmds, ...]
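Retry-based fault tolerance like the decider's can be sketched as a simple wrapper. This is a toy illustration (`with_retry` and `flaky` are hypothetical; the real mechanism lives in the decider and SWF's timeouts):

```python
import time

def with_retry(fn, attempts=3, delay=0):
    # Re-run a failing task up to `attempts` times before giving up.
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)

fails = {'n': 0}

def flaky():
    # Fails twice, then succeeds: "everything fails all the time".
    fails['n'] += 1
    if fails['n'] < 3:
        raise RuntimeError('transient')
    return 'ok'

print(with_retry(flaky))  # ok
```

In SWF terms, the decider observes an ActivityTaskFailed (or timed-out) event in the history and simply schedules the activity again.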

Question:Are you interested in this project?

To do:

• Use JSON or YAML for job script.

• A viewer to see the progress of workflows!

• Replace SWF with Apache Mesos or Mistral.

Thank You!@drakeguan

