Post on 23-Feb-2016
Extending the Galaxy portal with parallel and distributed execution capability
Ketan Maheshwari, Alex Rodriguez, David Kelly, Ravi Madduri, Justin Wozniak, Michael Wilde, Ian Foster
Argonne National Laboratory & University of Chicago
swift-lang.org
Overview
Introduce the Galaxy and Swift systems
Couple the Swift and Galaxy gateway frameworks
Combine the features offered by Galaxy and Swift into an integrated platform
Different integration schemes based on user requirements and application patterns
Data management schemes
Example use case
A demo screencast (if time permits)
Summary and future work
Overview of the Galaxy Workflow System*
*Slide courtesy: Center for Genomic Regulation, Barcelona, Spain
[Figure: Galaxy web interface, with the tools panel, workspace, and monitor/history panel labeled]
Overview of Swift Parallel Scripting Framework
Simulation of super-cooled glass materials
Protein folding using homology-free approaches
Climate model analysis and decision making in energy policy
Simulation of RNA-protein interaction
Multiscale subsurface flow modeling
Modeling of power grid applications
All have published science results obtained using Swift
[Figure: Protein loop modeling for T0623, 25 res., 8.2Å to 6.3Å (excluding tail), with native, predicted, and initial structures. Courtesy A. Adhikari]
Motivation: Swift and Galaxy are complementary in many ways
Galaxy (galaxyproject.org) offers a simple, user-friendly web-based interface for composing, executing, and monitoring workflows
Galaxy results are sharable, reproducible and reusable
Galaxy is widely used and well supported by its user community, e.g. the Next Generation Sequencing (NGS) community
Swift provides a sophisticated interface to parallel and distributed platforms
Swift scripts are structured expressions of complex application flows which are readily executable on multiple, diverse and independent remote resources
Swift-Galaxy Integration Overview
Approaches enabling integration in different ways:
– At tool level
– At workflow level
– At language/expression level
[Figure: integration architecture, with components labeled user computer, Galaxy web-console, Galaxy server, Galaxy-tool, Swift, app libraries, and clouds, clusters, supercomputers, grids]
Computational Infrastructure
Galaxy offers limited support for distributed and parallel resources:
– Needs additional ad hoc configuration to interface with them
– Constrained in some ways, e.g. needs a shared file system*
Swift is robustly interfaced to a wider range of resource managers, with finer control over job submission parameters:
– Supports PBS/Torque, SGE, SLURM, Condor
– Supports bag-of-workstations: clouds, workstation clusters
– Supports distributed file systems and multiple execution sites simultaneously
* To the best of our knowledge
Interface with heterogeneous parallel systems is a challenge
## SLURM
#!/bin/bash
#SBATCH -J ...
#SBATCH -oe ...
#SBATCH -p ...
#SBATCH -N ...
ibrun ./my_exec args

## CONDOR
Executable=e
Universe=std
Error=err.$
Input=in.$
Output=out.$
Log=foo.log
Queue

## SGE
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
pwd
./my_exec args

## TORQUE/PBS
#!/bin/bash
#PBS -q ccs_short
#PBS -N my_serial_job
#PBS -l walltime=01:00:00
#PBS -l nodes=1:noib:ppn=1
#PBS -m e
./a.out
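The per-scheduler differences above are the kind of detail a common interface can hide. The following Python sketch is only an illustration of that idea, not how Swift implements it: the `JobSpec` class, its field names, and the `render` helper are hypothetical, and the directives used are limited to standard SLURM, PBS/Torque, and SGE options.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Scheduler-neutral job description (hypothetical; field names are illustrative)."""
    name: str
    queue: str
    nodes: int
    walltime: str  # HH:MM:SS
    command: str


def render_slurm(job: JobSpec) -> str:
    lines = [
        "#!/bin/bash",
        f"#SBATCH -J {job.name}",
        f"#SBATCH -p {job.queue}",
        f"#SBATCH -N {job.nodes}",
        f"#SBATCH -t {job.walltime}",
        job.command,
    ]
    return "\n".join(lines) + "\n"


def render_pbs(job: JobSpec) -> str:
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job.name}",
        f"#PBS -q {job.queue}",
        f"#PBS -l nodes={job.nodes}:ppn=1",
        f"#PBS -l walltime={job.walltime}",
        job.command,
    ]
    return "\n".join(lines) + "\n"


def render_sge(job: JobSpec) -> str:
    # Node counts in SGE go through site-specific parallel environments,
    # so this sketch leaves job.nodes out rather than guess a PE name.
    lines = [
        "#!/bin/bash",
        f"#$ -N {job.name}",
        f"#$ -q {job.queue}",
        f"#$ -l h_rt={job.walltime}",
        "#$ -cwd",
        "#$ -j y",
        "#$ -S /bin/bash",
        job.command,
    ]
    return "\n".join(lines) + "\n"


RENDERERS = {"slurm": render_slurm, "pbs": render_pbs, "sge": render_sge}


def render(job: JobSpec, scheduler: str) -> str:
    """Turn one job description into a submit script for the named scheduler."""
    try:
        return RENDERERS[scheduler](job)
    except KeyError:
        raise ValueError(f"unsupported scheduler: {scheduler}") from None


if __name__ == "__main__":
    job = JobSpec("my_serial_job", "ccs_short", 1, "01:00:00", "./a.out")
    for scheduler in RENDERERS:
        print(f"## {scheduler}")
        print(render(job, scheduler))
```

The user describes the job once, and the scheduler-specific syntax is generated from that single description.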
Scheme 1: Wrap Swift around Galaxy Tools
[Figure: tool list showing swift-tool A, swift-tool B, ..., swift-tool N alongside other Galaxy tools, with execution history entries]
Scheme 2: Interoperability between expressions
[Figure: Swift script and Galaxy Workflow linked by an XML transformation]
Internally, both Swift and Galaxy code is represented in XML dialects
An automated transformation converts one form into the other
Currently under development
Scheme 3: Harness Data Parallelism using foreach
[Figure: swift-foreach wrapper around multiple Galaxy-tool invocations, with in-data (split) on the input side and out-data (merge) on the output side]
foreach protein, idx in proteinList {
  runBlast(protein);
  tracef("The index is: %i\n", idx);
}

foreach idx in [begin:end:step] {
  runmyapp(idx);
}
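The split, parallel apply, and merge pattern behind the foreach wrapper can be sketched in Python as follows. This is an analogy only: a thread pool stands in for Swift's foreach, and `run_tool`, `split_input`, and `merge_output` are hypothetical helpers, not Galaxy or Swift APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_input(in_data: str) -> list:
    """Split the input dataset into independent records, one per non-empty line."""
    return [line for line in in_data.splitlines() if line.strip()]


def run_tool(record: str) -> str:
    """Hypothetical stand-in for one Galaxy tool invocation on one record."""
    return record.upper()


def merge_output(results: list) -> str:
    """Merge per-record results back into a single output dataset."""
    return "\n".join(results)


def foreach_parallel(in_data: str, tool=run_tool, max_workers: int = 4) -> str:
    """Apply `tool` to every record concurrently, then merge in input order."""
    records = split_input(in_data)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() yields results in input order, so the merged
        # output lines up with the input records.
        results = list(pool.map(tool, records))
    return merge_output(results)


if __name__ == "__main__":
    print(foreach_parallel("seq_a\nseq_b\nseq_c\n"))
```

Because each record is processed independently, the same tool can run on as many workers as the underlying resource provides.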
Cloud Interfaces
Galaxy instances running on cloud nodes are already taking advantage of cloud-based resources
Swift’s coasters mechanism can farm resources and combine multiple cloud and non-cloud resources in a single application run.
Data Management
Both Galaxy and Swift offer various data management capabilities
Galaxy offers remote data uploading and viewing capabilities
Swift allows disk-resident data to be operated upon as program variables
Swift’s data providers are interfaced with various data management protocols and can manage data movement at runtime
Evaluation Application: Inference analysis for power prices
[Figure: inference analysis workflow, with labels: generatesample, candidate solution, variance & mean, samples, batches, batch size, lower bound, upper bound]
Swift Script for Inference Analysis
import "mappings";
import "apps";
type file;

int nS[] = [10, 100, 1000, 10000, 100000];
foreach S, idxs in nS {
  sample0 = gensample(S, wind_data);
  obj[idxs] = ampl(sample0);
  foreach B, idxb in [10:40:10] {
    foreach k in [0:B] {
      sample1 = gensample(S, wind_data);
      obj_l[idxs][idxb][k] = ampl_L(sample1);
      sample2 = gensample(S, wind_data);
      obj_u[idxs][idxb][k] = ampl_U(sample2, obj[idxs]);
    }
  }
}
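To get a feel for how much independent work this loop nest exposes, the Python sketch below enumerates the app invocations it would generate. It assumes that Swift ranges such as [10:40:10] and [0:B] include their end value; that reading, and the `enumerate_tasks` helper itself, are assumptions made for illustration and are not part of the original script.

```python
from collections import Counter


def enumerate_tasks(sample_sizes, batch_range=(10, 40, 10)):
    """List the app calls of the loop nest, assuming inclusive range ends."""
    start, stop, step = batch_range
    tasks = []
    for idxs, S in enumerate(sample_sizes):
        # Candidate solution for this sample size.
        tasks.append(("gensample", S))
        tasks.append(("ampl", idxs))
        for idxb, B in enumerate(range(start, stop + 1, step)):
            for k in range(0, B + 1):
                # Lower-bound estimate for batch member k.
                tasks.append(("gensample", S))
                tasks.append(("ampl_L", idxs, idxb, k))
                # Upper-bound estimate, which also needs the candidate solution.
                tasks.append(("gensample", S))
                tasks.append(("ampl_U", idxs, idxb, k))
    return tasks


if __name__ == "__main__":
    tasks = enumerate_tasks([10, 100, 1000, 10000, 100000])
    per_app = Counter(task[0] for task in tasks)
    print(len(tasks), dict(per_app))
```

Under these assumptions the script yields 2090 app invocations, 1045 of them gensample calls, and only the ampl_U calls depend on an earlier result (the candidate solution for the same sample size).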
Summary
Swift-Galaxy integration improves science gateways:
– User control
– Structured distributed computing
– Simple
– Interactive
Commonalities in the basic execution models of Galaxy and Swift lead to many avenues for integration
Broadly, Swift acts as the backend manager while Galaxy is the frontend for operations
Example of combining command-line and GUI-based frameworks
Future Work
A generic approach for each of the integration schemes
Wider application adaptation
Finer and broader exposure of configuration options to users
Interactive monitoring features
Authentication features, Globus-based identity management
Acknowledgements
This work was supported in part by the NIH through the NHLBI grant: The Cardiovascular Research Grid (R24HL085343) and by the U.S. Department of Energy under contract DE-AC02-06CH11357.
We are grateful to Amazon, Inc., for an award of Amazon Web Services time that facilitated early experiments.
Colleagues in the Swift and Globus groups at the MCS Division, Argonne National Laboratory
Thank you!
Visit swift-lang.org for more information about the Swift parallel scripting framework