Date post: | 28-Jan-2018 |
Category: | Technology |
Upload: | dmitri-zimine |
Genomic Computation at Scale with Serverless, StackStorm, and Docker
SC17, 14 Nov 2017
Dmitri Zimine
Fellow @ Extreme Networks
@dzimine
Image by Miki Yoshihito, Creative Commons license
Genomic Sequencing and Annotation
ACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGGTAACGTACGCCTACGTGACCGGTACTGGTAACGTATACACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGCTGGTAACGTATACCTCT...
DNA Sample –> Sequencer –> Sequenced Genome –> Compute in silico –> Annotated Sequence
So that…
Source: http://www.yourgenome.org
Victor Solovyev
Partner, leading scientist in computational biology
Victor Solovyev is a leading scientist in computational biology. His experience is a good mixture of academic positions, including Professor at Royal Holloway and KAUST, and various industry roles. His research on bioinformatics and genomic computations is published in Nature, Science, and Genome Research and is highly cited.
As Chief Scientific Officer at Softberry, he leads software development for biomedical data analysis and research in computational biology. Softberry software products were used in over 2000 research publications in 2016 alone. The Fgenesh program has been cited in ~3200, the Bprom program in ~800, and the Fgenesb pipeline in ~500 scientific publications.
fgenesb pipeline: some [prev] results
PROPERTIES:
Challenges:
• Offer annotation pipelines online
• Use cloud, for large elastic capacity
• Handle scale - spiky workload
• Economically
GAaaS – Genomic Annotation as a Service
Agenda
Problem & Solution
Domain demands, technology selection & serverless, toolchain, solution overview
Show & Tell Demo
Discussion Lessons learned, what to keep & what to refactor, the path forward
Typical genomic annotation pipeline
• Prediction of genes and proteins: fgenesb
• Search for similar proteins in databases: Blast (NR), KOALA (KEGG)
• Compilation and presentation of results: GCView
[Diagram labels: 1Mb-3Gb, 50-100Gb, highly parallelizable]
Annotation Pipelines
A basic exome pipeline delivering called variants from raw sequence could consist of as few as 12 steps, most of which can be run in parallel, but a real analysis will typically involve several additional downstream steps and complex report generation.
Source: Brief Bioinform bbw020. DOI: https://doi.org/10.1093/bib/bbw020
PROPERTIES:
• Steps:
  • jobs/functions
  • Run times – may be hours & days
  • Diverse (a.k.a. “don’t run on the same box”)
• Workflow orchestration:
  • Logical patterns: splits, parallels, joins
  • Data flow: upstream results –> downstream inputs
• Scale dimensions: spiky load
  • Low volume of requests
  • Very high compute demand per request
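The orchestration properties listed above (a split into parallel branches, a join, and upstream results feeding downstream inputs) can be illustrated with a minimal Python sketch. The step functions `predict_genes`, `search_db`, and `compile_report` are hypothetical stand-ins that only manipulate strings; they are not the real annotation tools, and threads stand in for separately scheduled functions.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_genes(sequence):
    # Hypothetical stand-in for a gene prediction step: cut the input into triplets.
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]


def search_db(db_name, proteins):
    # Hypothetical stand-in for a similarity search against one database.
    return {p: f"{db_name}:{p}" for p in proteins}


def compile_report(results):
    # Join: merge the outputs of all parallel branches into one report.
    report = {}
    for branch in results:
        for protein, hit in branch.items():
            report.setdefault(protein, []).append(hit)
    return report


def run_pipeline(sequence, databases=("NR", "KEGG")):
    proteins = predict_genes(sequence)               # upstream step
    with ThreadPoolExecutor() as pool:               # split into parallel branches
        futures = [pool.submit(search_db, db, proteins) for db in databases]
        results = [f.result() for f in futures]      # join: wait for every branch
    return compile_report(results)                   # downstream step


if __name__ == "__main__":
    print(run_pipeline("ACGTGACCG"))
```

In a real deployment each step would be a separate container with its own runtime and resource needs, which is why the orchestration has to live outside the steps themselves.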
Serverless
Authoritative: Mike Roberts on martinfowler.com:
My summary:
• Function, not service: “down when done”
• Scale – elastic, infinite, transparent for developer
• Pay per use consumption model
https://goo.gl/bTfgfU
What is Serverless?
Serverless fits!
*) BYOC – Bring Your Own Code (see the serverless compute manifesto, https://goo.gl/q9HsXB)
Typical Serverless requirements:
• “Functions”, not “servers”, down when done
• Elastic scale: handle spiky workload pattern
• BYOC*: package algorithms into containers
• Launch on a variety of events
Additional requirements:
• Long running times: hours
• Pipeline orchestration: execution logic and data passing
• Local Dev environment, consistent and convenient
Serverless fits, but…
Typical Serverless requirements:
• Elastic scale: handle spiky workload pattern
• “Functions”, not “servers”, down when done
• BYOC*: package programs into containers, run everywhere
• Launch on a variety of events
Why not <…>
AWS Lambda? 5 min limitation - jobs run for hours and days
Azure? No native support for Functions in Docker containers *
OpenWhisk? Lacks powerful workflow to orchestrate pipelines (only sequences)
*) At the time of selecting. I will cover “what has changed” in Discussion.
D I Y
Terraform provisions infra on AWS (WIP); Vagrant for local dev infra.
Ansible deploys & configures software on infra.
Docker to containerize functions and push to local Docker Registry.
StackStorm orchestrates pipeline executions, invokes Swarm to run functions, dynamically scales Swarm on load.
Tool Chain
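As a rough illustration of "invokes Swarm to run functions", the sketch below builds a `docker service create` command for a function that should run once and stay down when done. This is one plausible approach, not necessarily how the project does it: the service name, the image name, the `localhost:5000` registry address, and the function arguments are hypothetical, and the `/share` and `/data` paths are assumed from the deployment diagram.

```python
import shlex


def swarm_job_command(name, image, args=(), share="/share", data="/data"):
    """Build a `docker service create` command for a run-to-completion function."""
    cmd = [
        "docker", "service", "create",
        "--name", name,
        # Do not restart the task after the function exits: "down when done".
        "--restart-condition", "none",
        # Writable shared volume for inputs and results (assumed path).
        "--mount", f"type=bind,source={share},target={share}",
        # Read-only volume for reference data (assumed path).
        "--mount", f"type=bind,source={data},target={data},readonly",
        image,
    ]
    cmd.extend(args)
    return cmd


if __name__ == "__main__":
    # Hypothetical function image and arguments, for illustration only.
    cmd = swarm_job_command("blast-1", "localhost:5000/blast", ["--in", "/share/genes.fa"])
    print(shlex.join(cmd))
```

The command is only printed here, so the sketch runs without a Docker daemon; an orchestrator action would execute it and then poll the service until its task completes.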
StackStorm, in 1 minute
Actions, Sensors
Workflows, Rules
IT Domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring, Ops Support
Triggers, Calls
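The relationship between these building blocks can be sketched as a toy model in Python: a sensor observes an event and emits a trigger, rules match triggers against criteria, and a matching rule calls an action or workflow. This is not StackStorm's API; the class names, the `file.created` trigger type, and `run_pipeline_action` are all hypothetical and serve only to show the event-driven pattern.

```python
class Rule:
    """Maps a trigger type plus a criteria check to an action call."""

    def __init__(self, trigger_type, criteria, action):
        self.trigger_type = trigger_type
        self.criteria = criteria
        self.action = action

    def matches(self, trigger_type, payload):
        return trigger_type == self.trigger_type and self.criteria(payload)


class Engine:
    """Holds rules and dispatches incoming triggers to matching actions."""

    def __init__(self):
        self.rules = []

    def add_rule(self, rule):
        self.rules.append(rule)

    def dispatch(self, trigger_type, payload):
        # A sensor would call this when it observes an external event.
        return [r.action(payload) for r in self.rules if r.matches(trigger_type, payload)]


def run_pipeline_action(payload):
    # Stand-in for an action or workflow; it only reports what it would do.
    return f"run pipeline on {payload['file']}"


if __name__ == "__main__":
    engine = Engine()
    engine.add_rule(
        Rule("file.created", lambda p: p["file"].endswith(".fa"), run_pipeline_action)
    )
    print(engine.dispatch("file.created", {"file": "sample.fa"}))
    print(engine.dispatch("file.created", {"file": "notes.txt"}))
```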
©2017 Extreme Networks, Inc. All rights reserved
StackStorm is like …
Actions, Sensors
Workflows, Rules
Step Functions
AWS Lambda
OpenSource, for DIY Serverless
Three Sides to Serverless Story
DevOps: Provides infrastructure
Developer: Packs algorithms in containers, defines pipelines
End User: Submits sequence, gets results, fast and cheap.
1. DevOps: deploys serverless solution
[Diagram: DevOps view. StackStorm and a Registry next to a cluster of one Controller and three Workers, each running f(x) functions; volumes share(:rw) and data(:ro) mounted at /share and /data; labels: $ function, Scale, other infra…]
2. Developer: creates functions, defines pipeline
1) Create functions (BYOC), pack into Docker image, push to local Registry
2) Define pipelines as StackStorm workflows
[Diagram: Developer, StackStorm, Registry, f(x) function images]
3. User submits data, System runs pipeline & produces results
1) User sends sequence data
2) StackStorm runs workflow, schedules functions as jobs on Swarm
3) Swarm schedules services
4) Docker pulls function’s images
5) Functions run in containers, produce data
6) StackStorm sends results back to user
[Diagram: End User, StackStorm, Swarm controller, Swarm Worker, Registry, f(x) containers]
Genomic annotation pipeline with StackStorm, Docker, and Docker Swarm
Show & Tell, PART 1
Scale: dynamically, on load
[Diagram: StackStorm, Registry, Controller, and a varying number of Workers running f(x) functions; volumes share(:rw) and data(:ro); labels: Scale, other infra…]
Show & Tell, PART 2
Dynamically scaling Swarm cluster on AWS, on workload
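Scaling on workload comes down to a decision function: given the jobs waiting and running, how many workers should the cluster have? The sketch below shows one simple policy. It is an assumption for illustration, not the policy used in the demo; the `jobs_per_worker` capacity and the `min_workers` and `max_workers` limits are hypothetical parameters.

```python
import math


def desired_workers(pending_jobs, running_jobs, jobs_per_worker,
                    min_workers=1, max_workers=10):
    """Return how many workers the cluster should have for the current load."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    # Enough workers to hold every pending and running job, rounded up.
    needed = math.ceil((pending_jobs + running_jobs) / jobs_per_worker)
    # Clamp to the configured cluster size limits.
    return max(min_workers, min(max_workers, needed))


def scaling_delta(current_workers, pending_jobs, running_jobs, jobs_per_worker, **limits):
    """Positive result: add workers; negative: remove workers; zero: no change."""
    target = desired_workers(pending_jobs, running_jobs, jobs_per_worker, **limits)
    return target - current_workers


if __name__ == "__main__":
    print(scaling_delta(current_workers=2, pending_jobs=7, running_jobs=2, jobs_per_worker=3))
```

An orchestrator could evaluate such a function periodically, or on job events, and translate the delta into provisioning or terminating worker instances.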
Serverless hype accelerates
25+ frameworks … but no turn-key fit yet
Kubernetes Won Container Arms Race
now with built-in AWS autoscaler.
Azure Introduced Container Instances
no messing with VMs, per-second billing.
We are outpaced by technology
So What?
Path Forward: Options
Option 1: Kubernetes
• Use Kubernetes pack from StackStorm Exchange
• Utilize k8s “run to completion” jobs
• Deploy on AWS, minikube for local development
• Leverage AWS autoscaler for elastic capacity
StackStorm handles pipeline workflow, calls k8s Jobs. Same app developer experience.
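A "run to completion" function maps naturally onto a Kubernetes Job. The sketch below builds a minimal `batch/v1` Job manifest as a Python dictionary and prints it as JSON. The job name, image, and arguments are hypothetical, and how a StackStorm workflow would submit the manifest is left open here.

```python
import json


def k8s_job_manifest(name, image, args=(), backoff_limit=0):
    """Build a batch/v1 Job manifest for a run-to-completion function."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries before the Job is marked failed.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # Never restart the container in place: "down when done".
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "args": list(args)}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical function image and arguments, for illustration only.
    manifest = k8s_job_manifest("blast-1", "registry.local/blast", ["--in", "/share/genes.fa"])
    print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON manifests as well as YAML, the printed output could be saved to a file and submitted with `kubectl apply -f`.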
Path Forward: Options
Option 2: Azure
• Use Azure’s “Self-orchestration” option with StackStorm
• Azure provides containers on demand (no VMs!)
• Per container, per second billing
StackStorm handles pipeline workflow, calls Azure containers. App developer experience stays the same.
Path forward: Change to Azure Container Instances
1) User sends sequence data
2) StackStorm runs workflow, schedules functions as containers on Azure
3) Azure schedules container instances
4) Docker pulls function’s images from Registry
5) Functions run in containers, produce data
6) StackStorm sends results back to user
[Diagram: End User, StackStorm, Azure Container Service, Azure Container Instance, Registry, f(x) containers]
STACKSTORM EVENT-DRIVEN AUTOMATION ALLOWS YOU TO GET YOUR SOLUTION UP AND RUNNING QUICKLY SO YOU CAN DELIVER BUSINESS FAST, EXPERIMENT AND INNOVATE. ONCE YOU HAVE IT JUST RIGHT, YOU CAN BUILD A MORE PERMANENT VERSION WITH MICROSERVICES
Actions, Sensors
Workflows, Rules
StackStorm is an innovation platform where we can build solutions, experiment, and learn while delivering business value, before moving the implementation to dedicated services.
StackStorm OpenSource Platform
Brocade Workflow Composer (StackStorm Enterprise Edition)
Network Automation
StackStorm Exchange Community
Security, Assisted Networking
Come and see! SC17 Exhibition, Booth #519
Dmitri Zimine
Extreme Networks
@dzimine
http://github.com/dzimine/serverless-swarm
@Stack_Storm
http://github.com/StackStorm/st2 (Star 2,317)
Thank You!