Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Date post: 28-Jan-2018
Upload: dmitri-zimine
Genomic Computation at Scale with Serverless, StackStorm, and Docker SC17, 14 Nov 2017 Dmitri Zimine Fellow @ Extreme Networks @dzimine Image by Miki Yoshihito, Creative Commons license
Transcript
Page 1: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Genomic Computation at Scale with Serverless, StackStorm, and Docker
SC17, 14 Nov 2017
Dmitri Zimine, Fellow @ Extreme Networks
@dzimine

Image by Miki Yoshihito, Creative Commons license

Page 2: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Genomic Sequencing and Annotation

ACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGGTAACGTACGCCTACGTGACCGGTACTGGTAACGTATACACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGCTGGTAACGTATACCTCT...

[Diagram: DNA Sample → Sequencer → Sequenced Genome → Compute in silico → Annotated Sequence]

Page 3: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

So that…

Source: http://www.yourgenome.org

Page 4: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Victor Solovyev
Partner, leading scientist in computational biology

Victor Solovyev is a leading scientist in computational biology. His experience is a good mixture of academic positions, including Professor at Royal Holloway and KAUST, and various industry roles. His research on bioinformatics and genomic computation is published in Nature, Science, and Genome Research and is highly cited.

As Chief Scientific Officer at Softberry, he leads software development for biomedical data analysis and research in computational biology. Softberry software products were used in over 2000 research publications in 2016 alone. The Fgenesh program has been cited in ~3200, the Bprom program in ~800, and the Fgenesb pipeline in ~500 scientific publications.

Page 5: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

fgenesb pipeline: some [prev] results

Page 6: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Page 7: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

PROPERTIES:

Challenges:
• Offer annotation pipelines online
• Use cloud, for large elastic capacity
• Handle scale - spiky workload
• Economically

GAaaS – Genomic Annotation as a Service

Page 8: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Agenda

• Problem & Solution: domain demands, technology selection & serverless, toolchain, solution overview
• Show & Tell: demo
• Discussion: lessons learned, what to keep & what to refactor, the path forward

Page 9: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Typical genomic annotation pipeline

[Diagram: three stages. Prediction of genes and proteins (fgenesb); search for similar proteins in databases (Blast against NR, KOALA against KEGG); compilation and presentation of results (GCView). Other labels: 50-100Gb, 1Mb-3Gb, highly parallelizable.]

Page 10: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Annotation Pipelines

A basic exome pipeline delivering called variants from raw sequence could consist of as few as 12 steps, most of which can be run in parallel, but a real analysis will typically involve several additional downstream steps and complex report generation.

Source: Brief Bioinform bbw020. DOI: https://doi.org/10.1093/bib/bbw020

Page 11: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Annotation Pipelines

Properties:

• Steps:
  • Jobs/functions
  • Run times may be hours or days
  • Diverse (a.k.a. “don’t run on the same box”)
• Workflow orchestration:
  • Logical patterns: splits, parallels, joins
  • Data flow: upstream results -> downstream inputs
• Scale dimensions: spiky load
  • Low volume of requests
  • Very high compute demand per request
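The orchestration properties above (split, parallel, join, and upstream results feeding downstream inputs) can be illustrated with a small sketch. This is a toy model under stated assumptions, not the pipeline from the talk: the three step functions are hypothetical stand-ins for real annotation tools, and a thread pool stands in for a container cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_genes(sequence):
    # Hypothetical stand-in for a gene prediction step: cut the input into chunks.
    return [sequence[i:i + 4] for i in range(0, len(sequence), 4)]


def search_one(chunk):
    # Hypothetical stand-in for a per-item database search (the parallel part).
    return {"chunk": chunk, "gc": sum(base in "GC" for base in chunk)}


def compile_report(hits):
    # Join: collect every parallel result into one summary.
    return {"items": len(hits), "total_gc": sum(h["gc"] for h in hits)}


def run_pipeline(sequence, max_workers=4):
    chunks = predict_genes(sequence)                 # upstream result
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        hits = list(pool.map(search_one, chunks))    # split, then run in parallel
    return compile_report(hits)                      # join into the downstream input


if __name__ == "__main__":
    print(run_pipeline("ACGTGACCGGTACTGG"))
```

In a real deployment each step would run for hours in its own container, which is why the orchestration layer, not the function, has to own the split and join logic.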

Page 12: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Serverless

Page 13: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Authoritative: Mike Roberts on martinfowler.com:

My summary:
• Function, not service: “down when done”
• Scale: elastic, infinite, transparent for developer
• Pay-per-use consumption model

https://goo.gl/bTfgfU

What is Serverless?

Page 14: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Serverless fits!

*) BYOC – Bring Your Own Code (see the serverless compute manifesto, https://goo.gl/q9HsXB)

Typical Serverless requirements:

• “Functions”, not “servers”, down when done

• Elastic scale: handle spiky workload pattern

• BYOC*: package algorithms into containers

• Launch on a variety of events

Page 15: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Additional requirements:

• Long running times: hours

• Pipeline orchestration: execution logic and data passing

• Local Dev environment, consistent and convenient

Serverless fits, but…

Typical Serverless requirements:

• Elastic scale: handle spiky workload pattern

• “Functions”, not “servers”, down when done

• BYOC*: package programs into containers, run everywhere

• Launch on a variety of events

Page 16: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Why not <…>

AWS Lambda? 5 min limitation - jobs run for hours and days

Azure? No native support for Functions in Docker containers *

OpenWhisk? Lacks powerful workflow to orchestrate pipelines (only sequences)

*) At the time of selection. I will cover “what has changed” in Discussion.

Page 17: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

D I Y

Page 18: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Page 19: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Tool Chain

• Terraform provisions infra on AWS (WIP); Vagrant for local dev infra.
• Ansible deploys & configures software on infra.
• Docker to containerize functions and push to local Docker Registry.
• StackStorm orchestrates pipeline executions, invokes Swarm to run functions, dynamically scales Swarm on load.
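To make "invokes Swarm to run functions" in the tool chain above concrete, here is a minimal sketch of how an orchestrator could build a run-once Swarm service command. It is an assumption about one workable approach, not the talk's actual implementation: the helper name, the image name, and the job arguments are hypothetical, while the `docker service create` flags used (`--name`, `--detach`, `--restart-condition`, `--mount`) are standard Docker CLI options.

```python
import shlex


def swarm_job_command(name, image, args, share="/share", data="/data"):
    """Build a `docker service create` command for a run-once function."""
    return [
        "docker", "service", "create",
        "--name", name,
        "--detach",
        # Do not restart the container when the function exits ("down when done").
        "--restart-condition", "none",
        # Read-write scratch space shared between pipeline steps.
        "--mount", f"type=bind,source={share},target={share}",
        # Read-only reference data.
        "--mount", f"type=bind,source={data},target={data},readonly",
        image,
        *args,
    ]


if __name__ == "__main__":
    # Hypothetical image and arguments, for illustration only.
    cmd = swarm_job_command(
        "annotate-1", "localhost:5000/example-function", ["--input", "/share/in.fa"]
    )
    print(shlex.join(cmd))
```

The sketch only prints the command; an orchestrator action would execute it on the Swarm controller and then poll the service state to learn when the function has finished.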

Page 20: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

StackStorm, in 1 minute

[Diagram: StackStorm components: Sensors, Rules, Workflows, Actions. Triggers come in from, and calls go out to, IT domains: config mgmt, storage, networking, containers, cloud infra, monitoring, ops support.]
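The relationship between triggers, rules, and actions can be sketched as a toy event dispatcher. This is an illustrative model only, not StackStorm's API: the `Rule` class, the `dispatch` function, and the trigger name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    trigger: str                       # trigger type the rule listens for
    criteria: Callable[[dict], bool]   # predicate over the trigger payload
    action: Callable[[dict], object]   # action (or workflow) to run on a match


def dispatch(rules, trigger, payload):
    """Run the action of every rule that matches the emitted trigger."""
    return [
        rule.action(payload)
        for rule in rules
        if rule.trigger == trigger and rule.criteria(payload)
    ]


if __name__ == "__main__":
    # Hypothetical rule: start annotation when a non-empty sequence arrives.
    rules = [
        Rule(
            trigger="sequence.submitted",
            criteria=lambda p: p["size"] > 0,
            action=lambda p: f"annotate {p['file']}",
        )
    ]
    print(dispatch(rules, "sequence.submitted", {"file": "sample.fa", "size": 10}))
```

A sensor plays the role of whatever calls `dispatch`: it watches an external system and emits a trigger with a payload when something happens.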

Page 21: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

©2017 Extreme Networks, Inc. All rights reserved

StackStorm is like …

[Diagram: StackStorm's Sensors, Rules, Workflows, and Actions shown next to AWS Lambda and Step Functions]

Open source, for DIY serverless

Page 22: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Three Sides to Serverless Story

• End User: submits sequence, gets results, fast and cheap.
• Developer: packs algorithms in containers, defines pipelines.
• DevOps: provides infrastructure.

Page 23: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

1. DevOps: deploys serverless solution

[Diagram: DevOps deploys StackStorm, a Registry, and other infra next to a Swarm cluster of one Controller and three Workers running function containers f(x). Nodes mount /share (read-write) and /data (read-only). Labels: "$ function", "Scale".]

Page 24: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Page 25: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

2. Developer: creates functions, defines pipeline

1. Create functions (BYOC), pack into Docker image, push to local Registry
2. Define pipelines as StackStorm workflows

[Diagram: Developer pushes function images f(x) to the Registry and defines workflows in StackStorm]

Page 26: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

3. User submits data, System runs pipeline & produces results

1. User sends sequence data
2. StackStorm runs workflow, schedules functions as jobs on Swarm
3. Swarm schedules services
4. Docker pulls function’s images
5. Functions run in containers, produce data
6. StackStorm sends results back to user

[Diagram labels: End User, StackStorm, Swarm controller, Swarm Worker, Registry, f(x)]

Page 27: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Show & Tell, PART 1

Genomic annotation pipeline with StackStorm, Docker, and Docker Swarm

Page 28: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Page 29: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Scale: dynamically, on load

[Diagram: StackStorm, Registry, and other infra next to a Swarm cluster of one Controller and three Workers running function containers f(x), with share mounted read-write and data read-only. Label: "Scale".]

Page 30: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Show & Tell, PART 2

Dynamically scaling Swarm cluster on AWS, on workload
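The slides do not spell out the scaling policy, so the following is only a sketch of one plausible rule: size the worker pool so that every pending and running function has a slot, clamped to a minimum and maximum. All names and default limits here are assumptions for illustration.

```python
import math


def desired_workers(pending, running, slots_per_worker, min_workers=1, max_workers=10):
    """Number of workers needed to hold all pending and running functions."""
    needed = math.ceil((pending + running) / slots_per_worker)
    return max(min_workers, min(max_workers, needed))


def scaling_delta(current_workers, pending, running, slots_per_worker, **limits):
    """Positive: add workers; negative: remove idle workers; zero: no change."""
    target = desired_workers(pending, running, slots_per_worker, **limits)
    return target - current_workers


if __name__ == "__main__":
    # One worker today, 7 functions queued, 2 running, 3 slots per worker.
    print(scaling_delta(current_workers=1, pending=7, running=2, slots_per_worker=3))
```

An orchestrator workflow could evaluate a rule like this on a timer or on queue events, then provision or terminate cloud instances and join or drain them from the cluster.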

Page 31: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Page 32: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Agenda

• Problem & Solution: domain demands, technology selection & serverless, toolchain, solution overview
• Show & Tell: demo
• Discussion: lessons learned, what to keep & what to refactor, the path forward

Page 33: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Page 34: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Serverless hype accelerates

25+ frameworks … but no turn-key fit yet

Page 35: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Kubernetes Won the Container Arms Race

now with built-in AWS autoscaler.

Page 36: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Azure Introduced Container Instances

no messing with VMs, per-second billing.

Page 37: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

We are outpaced by technology

Page 38: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

We are outpaced by technology

So What?

Page 39: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Path Forward: Options

Option 1: Kubernetes

• Use Kubernetes pack from StackStorm Exchange
• Utilize k8s “run to completion” jobs
• Deploy on AWS, minikube for local development
• Leverage AWS autoscaler for elastic capacity

StackStorm handles pipeline workflow, calls k8s Jobs. Same app developer experience.
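A Kubernetes "run to completion" Job for one pipeline function might look like the manifest built below. This is a hedged sketch: the helper name, the job name, the image, and the arguments are hypothetical, while the manifest fields (`apiVersion: batch/v1`, `kind: Job`, `backoffLimit`, `restartPolicy: Never`) are standard Kubernetes Job fields.

```python
import json


def k8s_job_manifest(name, image, args, backoff_limit=0):
    """Build a batch/v1 Job that runs one function container to completion."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry failed pods; let the workflow decide what to do.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # The pod is not restarted after the function exits.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "args": list(args)}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image and arguments, for illustration only.
    manifest = k8s_job_manifest(
        "annotate-1", "example/function:latest", ["--input", "/share/in.fa"]
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be submitted with `kubectl apply -f`, or the same dictionary can be passed to a Kubernetes API client by a workflow action.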

Page 40: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Path Forward: Options

Option 2: Azure

• Use Azure’s “Self-orchestration” option with StackStorm
• Azure provides containers on demand (no VMs!)
• Per-container, per-second billing

StackStorm handles pipeline workflow, calls Azure containers. App developer experience stays the same.

Page 41: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Path forward: Change to Azure Container Instances

1. User sends sequence data
2. StackStorm runs workflow, schedules functions as containers on Azure
3. Azure schedules container instances
4. Docker pulls function’s images from Registry
5. Functions run in containers, produce data
6. StackStorm sends results back to user

[Diagram labels: End User, StackStorm, Azure Container Service, Azure Container Instance, Registry, f(x)]

Page 42: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Page 43: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Page 44: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

STACKSTORM EVENT-DRIVEN AUTOMATION ALLOWS YOU TO GET YOUR SOLUTION UP AND RUNNING QUICKLY SO YOU CAN DELIVER BUSINESS FAST, EXPERIMENT AND INNOVATE. ONCE YOU HAVE IT JUST RIGHT, YOU CAN BUILD A MORE PERMANENT VERSION WITH MICROSERVICES

[Diagram: Sensors, Rules, Workflows, Actions]

Page 45: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

StackStorm is an innovation platform where we can build solutions, experiment, and learn while delivering business value, before moving the implementation to dedicated services.

Page 46: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

StackStorm Open Source Platform

Brocade Workflow Composer (StackStorm Enterprise Edition)

Network Automation

StackStorm Exchange Community

Security Assisted Networking

Page 47: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Come and see! SC17 Exhibition, Booth #519

Page 48: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Image by Miki Yoshihito, Creative Commons license

Dmitri Zimine
Extreme Networks
@dzimine
http://github.com/dzimine/serverless-swarm

@Stack_Storm
http://github.com/StackStorm/st2
Star 2,317

Thank You!

