
Expressing and sharing workflows

Date post: 28-Jan-2018
Upload: daniel-s-katz
Page 1: Expressing and sharing workflows

National Center for Supercomputing Applications
University of Illinois at Urbana–Champaign

Expressing and sharing workflows

Daniel S. Katz
Assistant Director for Scientific Software & Applications, NCSA
Research Associate Professor, CS
Research Associate Professor, ECE
Research Associate Professor, iSchool
[email protected], [email protected]
@danielskatz

Page 2: Expressing and sharing workflows

What’s a workflow?

• A set of tasks and dependencies between them
  • Perhaps expressed as a data structure, e.g. a graph (DAG or cyclic)
• How is this different from a computer program?
  • The tasks are more well-defined (inputs, outputs)
  • The tasks are longer (running time O(sec) – O(hr))
• Why express it differently?
  • A program (script) is a natural way of expressing a workflow
    • Examples: shell scripts, programs in Swift/Parsl
    • YesWorkflow annotations help in understanding scripts
    • Swift/Parsl: functions are used to identify components
  • Expressing it as data corresponds to the compiled (assembly) version of the workflow
    • Useful for a lot of things, but not for understanding

http://swift-lang.org, http://parsl-project.org
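
To make the "workflow as a data structure" view concrete, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+). The task names and dependencies are hypothetical, chosen for illustration; they do not come from the slides. The dictionary is the "compiled" data form of a workflow: it is enough to derive a valid run order or to detect a cycle, but it says little about what each task does.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "download": set(),
    "clean": {"download"},
    "plot": {"clean"},
    "report": {"clean", "plot"},
}


def execution_order(dependencies):
    """Return one order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def is_dag(dependencies):
    """Return True if the dependency graph has no cycle."""
    try:
        execution_order(dependencies)
        return True
    except CycleError:
        return False


order = execution_order(workflow)
print(order)
```

A cyclic graph such as `{"a": {"b"}, "b": {"a"}}` makes `is_dag` return `False`, which is one of the checks that is easy on the data form and hard on an arbitrary script.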

Page 3: Expressing and sharing workflows

YesWorkflow (YW)

• Name: “Yes, scripts are (can be) workflows, too!”
• But the workflow (dataflow) is usually hidden in the script
• Idea: let the script author reveal the structure by declaring tasks (steps) and the dataflow between tasks
• This is a modeling step
  • very coarse (workflow: one big black box with inputs & outputs)
  • or rather fine (workflow has many steps, linked by dataflow)
• => a language to explain (graphically) the concepts (relevant steps, relevant data) you want to share
• => this conceptual YW model can itself be queried, and linked with runtime observables and provenance

Credit: Bertram Ludäscher
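
As an illustration of declaring steps and dataflow inside a script, the sketch below annotates a small Python function with YesWorkflow-style comments. The script itself (function name, step names, data names) is hypothetical, and the keywords shown (`@begin`, `@end`, `@in`, `@out`, `@desc`) reflect my understanding of the YW comment syntax; check the YesWorkflow documentation before relying on the exact form. The annotations are ordinary comments, so the script runs unchanged.

```python
# @begin clean_and_summarize @desc Hypothetical two-step script annotated for YesWorkflow.
# @in raw_values
# @out summary


def clean_and_summarize(raw_values):
    # @begin drop_negatives @desc Keep only the non-negative readings.
    # @in raw_values
    # @out clean_values
    clean_values = [v for v in raw_values if v >= 0]
    # @end drop_negatives

    # @begin compute_mean @desc Average the cleaned readings.
    # @in clean_values
    # @out summary
    summary = sum(clean_values) / len(clean_values)
    # @end compute_mean

    return summary


# @end clean_and_summarize
```

This is the "rather fine" end of the modeling range: two named steps linked by `clean_values`. Keeping only the outer `@begin`/`@end` pair would give the coarse, single-black-box model instead.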

Page 4: Expressing and sharing workflows

Parsl

• A Python-based parallel scripting library (http://parsl-project.org), based on ideas in Swift (http://swift-lang.org)
• Tasks exposed as functions (Python or bash)

    # A bash app returns the command line to run, as a template string
    @App('bash', data_flow_kernel)
    def echo(message, outputs=[]):
        return 'echo {0} &> {outputs[0]}'

    # A Python app runs the function body as a task
    @App('python', data_flow_kernel)
    def cat(inputs=[]):
        with open(inputs[0]) as f:
            return f.readlines()

• Return values are futures
• Other tasks can be called that depend on these futures
  • Will not run until futures are satisfied/filled
• Main code used to glue functions together

    # cat depends on the file that echo will produce
    hello = echo("Hello World!", outputs=['hello1.txt'])
    message = cat(inputs=[hello.outputs[0]])

• Fairly easy to understand
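
The futures idea can be tried without installing Parsl. The sketch below is not Parsl code: it swaps in the standard library's `concurrent.futures` and mirrors the `echo`/`cat` pair from the slide, with a hypothetical `run` helper and a temporary directory added for illustration. One difference to keep in mind: here `cat` waits on the upstream future explicitly by calling `.result()`, whereas the slide describes Parsl holding a task back until its input futures are filled.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def echo(message, path):
    """Write the message to a file and return the file's path."""
    with open(path, "w") as f:
        f.write(message + "\n")
    return path


def cat(path_future):
    """Wait for the upstream task, then read the file it produced."""
    with open(path_future.result()) as f:
        return f.readlines()


def run(message):
    """Glue code: submit both tasks and return the final result."""
    with tempfile.TemporaryDirectory() as tmp, ThreadPoolExecutor(max_workers=2) as pool:
        hello = pool.submit(echo, message, os.path.join(tmp, "hello1.txt"))
        lines = pool.submit(cat, hello)  # depends on the future returned by echo
        return lines.result()


print(run("Hello World!"))
```

Both `submit` calls return immediately; the dependency between the tasks is carried entirely by the `hello` future, which is the property the slide highlights.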

Page 5: Expressing and sharing workflows

How to promote/share workflows

• How do we share general software?
  • Libraries (units of execution with well-defined APIs)
  • Source code (fork model)
  • Source code repositories (GitHub), packaging systems/repositories (PyPI, CRAN)
• How do we share data?
  • Repositories (Dryad)
• For workflows
  • Libraries -> sub-workflows, defined to provide well-specified functionality
  • Source code -> source code (scripts), may still be hard to understand
  • Data -> data repository for workflows (MyExperiment)

Page 6: Expressing and sharing workflows

www.myexperiment.org
De Roure, D., Goble, C., Stevens, R. (2009) The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future Generation Computer Systems 25, pp. 561-7.

• A workflow commons for workflow sharing, designed using Web 2.0 principles
• Launched open beta in November 2007, still actively used
• Largest public collection of workflows, for multiple workflow systems
• 2400+ entries in Google Scholar refer to myExperiment
• Open source, REST API, part of Open Linked Data cloud (66k triples) - lod-cloud.net
• Introduced “packs” which led to Research Objects – www.researchobject.org
• Workflow collection studied in scientific workflow and e-Science communities
• Service maintained by Manchester and Oxford universities. Informs design of other workflow sharing systems.
• Content stats: 10591 members, 393 groups, 3876 workflows, 1233 files, 477 packs

Credit: Carole Goble

Page 7: Expressing and sharing workflows

GitHub

• Widely used for sharing software, and socially working on/with software (and many other types of documents)
• GitHub is used for sharing workflows today
  • Both scripts and data
• Borrowing from “Software vs. data in the context of citation”
  • A workflow as a program or a script is code, a creative work
    • Appropriate license: OSI-approved open source (e.g., BSD)
  • A workflow as a DAG is data?
    • Appropriate license: Creative Commons (e.g., CC-BY)?
• So, let’s keep workflows as programs/scripts
• Use YesWorkflow with scripts
• Use GitHub to share

Katz DS, Niemeyer KE, et al. (2016) Software vs. data in the context of citation. PeerJ Preprints 4:e2630v1 doi: 10.7287/peerj.preprints.2630v1

