Taverna workflows: provenance and reproducibility
Aleksandra Pawlik
The University of Manchester
Workflow approaches to investigation of biological complexity
STFC/NERC Workshop 15-16 October 2013
Workflows for improvement
Workflows are more than just pipelines…
Scaling up automated execution
Bringing together distributed and continually changing resources
Dealing with different standards, interfaces and implementations
Support for repeatable analysis
Taverna Engine Execution
Workflows in Scufl2
Functional dataflow, simple control flows, implicit iteration
Linking services and tools
Different data resources and formats
“In Workflow Programming” (e.g. Beanshell scripting)
Provenance collection: W3C PROV-O, OPM
Plug-in Framework
Infrastructures: Web Services (SOAP, REST), Grid, HPC
Common Tools: Excel Spreadsheets, Google Refine, R
OAuth security plug-in
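The implicit iteration listed above can be illustrated outside Taverna. What follows is a minimal Python sketch, not the Taverna engine or the Scufl2 API: `implicit_iterate`, `concat` and the `strategy` argument are hypothetical names chosen for illustration. The idea is that a step written for single values is applied automatically when it receives lists, either over every combination of elements (cross product) or over elements paired by position (dot product). The sketch returns a flat list, which is a simplification.

```python
from itertools import product


def implicit_iterate(step, *inputs, strategy="cross"):
    """Apply a single-item `step` over list inputs.

    strategy="cross": every combination of elements (cross product).
    strategy="dot":   elements paired by position (equal lengths required).
    """
    # Treat a bare value as a one-element list so it can be iterated too.
    lists = [x if isinstance(x, list) else [x] for x in inputs]
    if strategy == "cross":
        combos = product(*lists)
    elif strategy == "dot":
        if len({len(items) for items in lists}) > 1:
            raise ValueError("dot product needs lists of equal length")
        combos = zip(*lists)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [step(*args) for args in combos]


def concat(prefix, name):
    """A step that only knows how to handle single values."""
    return f"{prefix}:{name}"


cross_result = implicit_iterate(concat, ["a", "b"], ["x", "y"])
# ['a:x', 'a:y', 'b:x', 'b:y']
dot_result = implicit_iterate(concat, ["a", "b"], ["x", "y"], strategy="dot")
# ['a:x', 'b:y']
```

The step itself contains no loop; the iteration is supplied by the caller, which is the point of the technique.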
Workflow engine to run workflows
List of services
Construct and visualise workflows
Taverna Workbench
• Customizable for domains (e.g. expose services only for biodiversity)
• Desktop application
• Intermediate results views
• Plug-in framework
Taverna User Spectrum
[Figure: user spectrum]
Users: Workflow Engineer, Computational Scientist, Domain Scientist (Workflow User)
Tools: Workbench, Workbench Components, Lite, Player, Domain-Specific Website / Tool / Portal
Workflow Visibility: High to Low
Concept Knowledge: Taverna, Domain
Application Factory: productive by reuse
Right apps, right users
Commodity apps: Web. Spreadsheets. R.
Customisation
Mixed workflow / scripting
Deployment / Portability
Web based / desktop
Virtualised deployments
Cloud hosted service
A cloud-enabled local host
Local ownership
Capability building
Infrastructure: Legacy, others and your own software, datasets, services, codes, and platforms. Optimise and manage use of computing infrastructure.
WFMS middleware: Support design, configuration and execution of workflows. Manage utility actions for data, logging, security, compute, error. Shield incompatibilities & complexity.
Workflow: Parameterised, integrative, multi-step (data) pipelines, analytics, computational protocols. Can be repetitively reused.
Apps: Domain/task specific apps that incorporate (an ecosystem of) workflows. Integrate
Reuse and Reproducibility
~6,000 members, over 300 groups, over 3,000 workflows
Taverna Components
Workflow blocks made of a workflow
Well described
Well behaved
Well looked after
Agreed fail
Agreed formats in and out
Agreed provenance
Deposited in myExperiment
Grouped into families
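The component properties above (agreed formats in and out, agreed fail) amount to a contract around a workflow block. The following Python sketch shows one way such a contract could be enforced. It is not the Taverna Components implementation: `Component`, `ComponentError` and the port dictionaries are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class ComponentError(Exception):
    """The single, agreed way for a component to signal failure."""


@dataclass
class Component:
    name: str
    family: str                    # components are grouped into families
    inputs: Dict[str, type]        # agreed formats in
    outputs: Dict[str, type]       # agreed formats out
    body: Callable[..., dict]      # the wrapped block

    def run(self, **kwargs):
        # Reject missing or wrongly typed inputs before running the body.
        for port, kind in self.inputs.items():
            if port not in kwargs or not isinstance(kwargs[port], kind):
                raise ComponentError(f"{self.name}: bad or missing input '{port}'")
        try:
            result = self.body(**kwargs)
        except Exception as exc:
            # Any internal error surfaces as the same agreed failure type.
            raise ComponentError(f"{self.name}: {exc}") from exc
        # Check that the body honoured the declared output ports.
        for port, kind in self.outputs.items():
            if port not in result or not isinstance(result[port], kind):
                raise ComponentError(f"{self.name}: bad or missing output '{port}'")
        return result


to_upper = Component(
    name="to_upper",
    family="text",
    inputs={"text": str},
    outputs={"text": str},
    body=lambda text: {"text": text.upper()},
)

result = to_upper.run(text="acgt")
```

Because every component fails in the same way and declares its ports, a caller can swap one member of a family for another without changing the surrounding workflow.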
Provenance: how did you do it?
The link between computation and results
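To make "the link between computation and results" concrete, here is a minimal Python sketch that records which values each step used and generated, then queries that record. The relation names `used` and `wasGeneratedBy` are borrowed from the W3C PROV vocabulary, but the data layout, `entity_id`, `run_with_provenance` and `derived_from` are assumptions made for illustration. This is not Taverna's provenance format and not a valid PROV serialisation.

```python
import hashlib


def entity_id(value):
    """Derive a short, stable identifier from a value's repr."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).hexdigest()
    return "entity:" + digest[:12]


def run_with_provenance(step_name, func, inputs, trace):
    """Run func(**inputs) and append PROV-style relations to `trace`."""
    activity = f"activity:{step_name}:{len(trace['activity'])}"
    trace["activity"].append(activity)
    for port, value in inputs.items():
        eid = entity_id(value)
        trace["entity"][eid] = value
        trace["used"].append((activity, eid, port))
    output = func(**inputs)
    oid = entity_id(output)
    trace["entity"][oid] = output
    trace["wasGeneratedBy"].append((oid, activity))
    return output


def derived_from(eid, trace):
    """Return ids of the entities used by the activity that generated `eid`."""
    activities = {a for (e, a) in trace["wasGeneratedBy"] if e == eid}
    return sorted(e for (a, e, _port) in trace["used"] if a in activities)


trace = {"entity": {}, "activity": [], "used": [], "wasGeneratedBy": []}
total = run_with_provenance("add", lambda a, b: a + b, {"a": 2, "b": 3}, trace)
doubled = run_with_provenance("double", lambda x: x * 2, {"x": total}, trace)

# "How did you do it?": which inputs led to the final result.
sources_of_result = derived_from(entity_id(doubled), trace)
```

Collecting the trace is one call per step; using it is a lookup over the recorded relations, so the answer does not depend on rerunning the computation.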
Collecting -> Using Provenance
Reporting at different scales/levels
PDIFF: comparing provenance traces to diagnose divergence across experimental results [Woodman et al, 2011]
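The idea of comparing provenance traces to find where two runs diverge can be sketched in a few lines. The Python below is not the PDIFF algorithm; it is a deliberately simplified stand-in that assumes each run is an ordered list of step records with hypothetical keys `step`, `inputs` and `output`, and reports the first position at which the records differ.

```python
def first_divergence(run_a, run_b):
    """Return a description of the first differing step, or None if none."""
    fields = ("step", "inputs", "output")
    for position, (a, b) in enumerate(zip(run_a, run_b)):
        differing = [key for key in fields if a.get(key) != b.get(key)]
        if differing:
            return {
                "position": position,
                "step": a.get("step"),
                "differs_in": differing,
            }
    if len(run_a) != len(run_b):
        # One run has extra steps after a shared, identical prefix.
        return {
            "position": min(len(run_a), len(run_b)),
            "step": None,
            "differs_in": ["length"],
        }
    return None


run_a = [
    {"step": "fetch", "inputs": {"id": "P1"}, "output": "ACGT"},
    {"step": "align", "inputs": {"seq": "ACGT", "db": "v1"}, "output": 0.91},
]
run_b = [
    {"step": "fetch", "inputs": {"id": "P1"}, "output": "ACGT"},
    {"step": "align", "inputs": {"seq": "ACGT", "db": "v2"}, "output": 0.87},
]

divergence = first_divergence(run_a, run_b)
```

In this made-up example the runs agree on `fetch` and differ at `align`, in both inputs and output, which points at the changed `db` value as the likely cause.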
Research Objects
http://www.researchobject.org/
bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms
http://www.w3.org/community/rosc/
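As a rough illustration of bundling and relating the resources of an experiment, the Python sketch below packs a workflow, its inputs and its results into one ZIP archive together with a manifest that lists what is aggregated. The file name `manifest.json`, the manifest keys and `make_bundle` are assumptions made for this example; the sketch does not follow any Research Object specification.

```python
import io
import json
import zipfile


def make_bundle(resources, description):
    """Pack resources (path -> text content) and a manifest into ZIP bytes."""
    manifest = {
        "description": description,
        "aggregates": sorted(resources),  # paths of the bundled resources
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        bundle.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, content in resources.items():
            bundle.writestr(path, content)
    return buffer.getvalue()


def read_manifest(bundle_bytes):
    """Read the manifest back out of a bundle produced by make_bundle."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        return json.loads(bundle.read("manifest.json"))


bundle_bytes = make_bundle(
    {
        "workflow/analysis.t2flow": "<workflow/>",
        "inputs/sequences.txt": "ACGT\n",
        "results/scores.csv": "id,score\nP1,0.91\n",
    },
    description="Example experiment bundle",
)
manifest = read_manifest(bundle_bytes)
```

Keeping the manifest inside the archive means the description travels with the data it describes.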
Galaxy execution
Tools
Taverna server
Wrap as Tool
Taverna in Galaxy
Workflow in
Upload
The Taverna Suite of Tools
Client User Interfaces
Workflow Repository
Service Catalogue
Third Party Tools
Web Portals / Gateways
Activity and Service Plug-in Manager
Workflow Provenance
Workflow Server
Workflow Engine
Virtual Machine
Prog APIs
Command Line
Player
Workflow Components
Workbench
Taverna Lite
Interaction Server
Freely available
Open source
Current version 2.4
80,000+ downloads across versions
Active user forum & support
Windows/Mac OS X/Linux/Unix
Sustainability and user support
Tutorials and Workshops
www.taverna.org.uk
Taverna in other projects
BioDiversity Virtual e-Laboratory www.biovel.eu
Wf4Ever www.wf4ever-project.org
VPH-Share www.vph-share.eu
SCAPE www.scape-project.eu
Pacific Northwest National Laboratory www.pnnl.gov
Scientific Workflows and Provenance Working Group www.dataone.org
iPlant Collaborative www.iplantcollaborative.org
HELIO www.helio-vo.eu
KBase www.kbase.us
SHIWA www.shiwa-workflow.eu
Data-centric Computation
Scientific workflows over Distributed Cyber-Infrastructure.
Data sharing libraries and catalogues for all types of scientific artefacts and all types of scientists.
Knowledge Management
Metadata, semantics, digital exchange, preservation, publishing
Software Engineering
Software sustainability, software and data policy, training
Products Methods
For more information
Taverna http://www.taverna.org.uk
myExperiment http://www.myexperiment.org
myGrid http://www.mygrid.org.uk