
Versioning for Workflow Evolution

Date post: 04-Dec-2014
Upload: eran-withana
Description:
My presentation on "Versioning for Workflow Evolution", given at the DIDC 2010 conference in June 2010.
Versioning for Workflow Evolution. Eran Chinthaka Withana, Beth Plale (School of Informatics and Computing, Indiana University, Bloomington, Indiana); Roger Barga, Nelson Araujo (Microsoft Research, Microsoft Corporation, Redmond, Washington). 3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US; June 22, 2010.
Transcript
Page 1: Versioning for Workflow Evolution

Versioning for Workflow Evolution

Eran Chinthaka Withana, Beth Plale
School of Informatics and Computing, Indiana University, Bloomington, Indiana

Roger Barga, Nelson Araujo
Microsoft Research, Microsoft Corporation, Redmond, Washington

3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US; “Versioning for Workflow Evolution”; June 22, 2010; Eran C. Withana

Page 2: Versioning for Workflow Evolution

Workflow Evolution
• Computational Science Experiments
– Sequence of activities
– Set of configurable parameters and input data
– Produces outputs to be analyzed and evaluated further
• Evolution of Research
– Changes in research artifacts

Page 3: Versioning for Workflow Evolution

Workflow Evolution
• Workflows are a good tool for tracking the evolution of research
– Automate repeatable tasks in an efficient manner
– Algorithms and experimental procedures are encoded into workflows
– Tracking workflows tracks research too
• Tracking effects over time
– Provenance of data products
– Lineage and roots of errors, and the affected data products
• Comparing Results
– More than one research direction in a given experiment
– Comparing outputs from different paths of the research
• Attribution
– Attribution of credit based on who performed the work, who owns or created the workflow, and who owns the data products
– Sharing and attribution of research can and should be an integral part of research
  • E.g.: sub-modules from myexperiments.org
• Workflow Evolution Framework and versioning model
– Enables the management of knowledge encoded in workflow executions

Page 4: Versioning for Workflow Evolution

Related Work
• Workflow evolution shares a lot in common with provenance collection frameworks
– I. T. Foster, J.-S. Vockler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society.
• Existing evolution frameworks
– J. Freire, C. Silva, S. Callahan, E. Santos, C. Scheidegger, and H. Vo. Managing rapidly-evolving scientific workflows. Lecture Notes in Computer Science, 4145:10, 2006.
• Evolution Data Models
– L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, and H. T. Vo. Vistrails: Enabling interactive multiple-view visualizations. In IEEE Visualization, 2005 (VIS 05), pages 135-142.
• Versioning at different levels
– Application level: D. Santry, M. Feeley, N. Hutchinson, and A. Veitch. Elephant: The file system that never forgets. In Workshop on Hot Topics in Operating Systems, pages 2-7. IEEE Computer Society, 1999.
– System/database level: R. Chatterjee, G. Arun, S. Agarwal, B. Speckhard, and R. Vasudevan. Using applications of data versioning in database application development. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 315-325, Washington, DC, USA, 2004. IEEE Computer Society.
– Disk storage level: M. Flouris and A. Bilas. Clotho: Transparent data versioning at the block I/O level. In Proceedings of the 12th NASA Goddard, 21st IEEE Conference on Mass Storage Systems and Technologies (MSST 2004), pages 315-328, 2004.

Page 5: Versioning for Workflow Evolution

Use Cases
1. Research Reproduction
2. Scientific Workflows
– In LEAD, tracking namelist input files and visualizations
– Tracking activity binaries

Page 6: Versioning for Workflow Evolution

Versioning Model
• Dimensions of workflow evolution
– Direct evolution occurs when a user of the workflow performs one of the following actions:
  • Changes the flow and arrangement of the components within the system
  • Changes the components within the workflow
  • Changes input and/or output parameters or configuration parameters of different components within the workflow
– Contributions track components that are reused from a previous system
• Workflow Evolution Capturing Stages
– User explicitly saves the workflow
– User closes the workflow editor
– Execution of a workflow
• Warning: This granularity might not capture all edits
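The capturing stages and the warning about granularity can be illustrated with a minimal sketch. Everything named here (`CaptureEvent`, `WorkflowEditor`, the string encoding of a workflow) is hypothetical and is not taken from Trident; the rule that an unchanged workflow does not produce a new version is also an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class CaptureEvent(Enum):
    """The three capturing stages listed on the slide (names are hypothetical)."""
    EXPLICIT_SAVE = "user explicitly saved the workflow"
    EDITOR_CLOSED = "user closed the workflow editor"
    EXECUTION = "workflow was executed"


@dataclass
class WorkflowEditor:
    """Hypothetical editor that versions its content only at capture events."""
    content: str = ""
    versions: list = field(default_factory=list)  # (event, snapshot) pairs

    def edit(self, new_content: str) -> None:
        # Plain edits change the working copy but do not create a version.
        self.content = new_content

    def capture(self, event: CaptureEvent) -> None:
        # Assumption: skip the capture if nothing changed since the last version.
        if self.versions and self.versions[-1][1] == self.content:
            return
        self.versions.append((event, self.content))


editor = WorkflowEditor()
editor.edit("A -> B")
editor.edit("A -> B -> C")                  # the state "A -> B" is never versioned
editor.capture(CaptureEvent.EXPLICIT_SAVE)
editor.edit("A -> C")
editor.capture(CaptureEvent.EXECUTION)
editor.capture(CaptureEvent.EDITOR_CLOSED)  # unchanged since execution: no new version

captured = [snapshot for _, snapshot in editor.versions]
print(captured)
```

The intermediate state `"A -> B"` is lost because no capture event occurred while it was current, which is the kind of edit the warning refers to.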

Page 7: Versioning for Workflow Evolution

Architecture within Trident Scientific Workflow Workbench

[Figure: Trident Architecture. Labels: Trident Registry, Trident Runtime Services, Publish-Subscribe Blackboard, Data Model, Data Access Layer, Management, Monitor, Administration, Registry Management, Workflow Packages, Scientific Workflows, Windows Workflow Foundation, Design, Workbench, Browser]

[Figure: Trident Evolution Framework Architecture. Labels: Trident Workbench, Trident Data Model, Trident Registry, Evolution Framework, Versioning Model, Local Storage, Other Local/remote Versioning System]

Page 8: Versioning for Workflow Evolution

User View (within Trident)

Versioned Objects in Registry

Workflow Evolution View

Page 9: Versioning for Workflow Evolution

Performance Evaluation
• Evaluation strategies
  • Delta: difference between two consecutive versions
  • Checkpointing: complete version saved after a fixed number of versions
– No Delta, No Checkpointing
  • Each version saved as it is
– With Delta, No Checkpointing
  • Delta with previous version
– With Delta, With Checkpointing
  • Checkpointed after n versions
• Workflows used

Workflow   Size (Bytes)   Delta (Bytes)
O          1032           210
M          4087           2564
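The three strategies above can be sketched as a single version store with two switches. This is a minimal illustration, not the Trident implementation: the `VersionStore` class, the copy/add delta format, and the use of Python's `difflib` to compute differences are all assumptions made for the example.

```python
import difflib
from typing import Optional


def make_delta(old: str, new: str) -> list:
    """Encode `new` as copy/add operations against `old` (hypothetical format)."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("add", new[j1:j2]))   # store the new text literally
        # "delete": nothing to emit, the old range is simply skipped
    return ops


def apply_delta(old: str, delta: list) -> str:
    parts = []
    for op in delta:
        parts.append(old[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(parts)


class VersionStore:
    """use_delta=False                    -> No Delta, No Checkpointing
    use_delta=True, checkpoint_every=None -> With Delta, No Checkpointing
    use_delta=True, checkpoint_every=n    -> With Delta, With Checkpointing"""

    def __init__(self, use_delta: bool, checkpoint_every: Optional[int] = None):
        self.use_delta = use_delta
        self.checkpoint_every = checkpoint_every
        self.entries = []      # ("full", text) or ("delta", ops)
        self._latest = None

    def save(self, text: str) -> None:
        n = len(self.entries)
        store_full = (
            not self.use_delta
            or n == 0  # the first version is always stored in full
            or (self.checkpoint_every is not None and n % self.checkpoint_every == 0)
        )
        if store_full:
            self.entries.append(("full", text))
        else:
            self.entries.append(("delta", make_delta(self._latest, text)))
        self._latest = text

    def recover(self, index: int) -> str:
        # Walk back to the nearest full copy, then replay deltas forward.
        start = index
        while self.entries[start][0] != "full":
            start -= 1
        text = self.entries[start][1]
        for _, delta in self.entries[start + 1:index + 1]:
            text = apply_delta(text, delta)
        return text


history = ["a,b,c", "a,b,c,d", "a,x,c,d", "a,x,c,d,e", "q"]
stores = {
    "no delta, no checkpointing": VersionStore(use_delta=False),
    "delta, no checkpointing": VersionStore(use_delta=True),
    "delta, checkpoint every 2": VersionStore(use_delta=True, checkpoint_every=2),
}
for store in stores.values():
    for version in history:
        store.save(version)
```

Recovery in the delta-only store replays every delta since the first version, while checkpointing bounds the replay to the deltas since the last full copy. That is the trade-off between space and recovery time that the evaluation measures.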

Page 10: Versioning for Workflow Evolution

Performance Evaluation
• File Write Time

[Figure panels: O Workflow, M Workflow]

Page 11: Versioning for Workflow Evolution

Performance Evaluation
• Version Recovery Time

[Figure panels: O Workflow, M Workflow]

Page 12: Versioning for Workflow Evolution

Performance Evaluation
• Space Usage for a Version

[Figure panels: O Workflow, M Workflow]

Page 13: Versioning for Workflow Evolution

Performance Evaluation
• Data Retrieved per Version

[Figure panels: O Workflow, M Workflow]

Page 14: Versioning for Workflow Evolution

Discussion
• The "No Delta, No Checkpointing" option performs poorly with respect to storage usage
– Uses 4-5 times the storage for the smaller workflow with the small delta, and 2 times for the larger workflow with the large delta
• It outperforms both other options with respect to
– version save time: by a factor of 20-30 for the large workflow with the large delta, and 5 for the smaller workflow with the small delta
– version recovery time: by a factor of 10 for the smaller workflow with the small delta, and 5 for the larger workflow with the large delta
• Criteria for selecting object maintenance strategy
– size of data objects
– average changes for data objects between different versions of the same object
– response time to the user and the system
• Challenges in working with different types of artifacts
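As a rough consistency check on the storage figures, the workflow and delta sizes from the "Workflows used" table can be compared directly. The sketch assumes that a full version costs the whole workflow size and a delta version costs about one delta, ignoring checkpoint and metadata overhead; the slides do not state how the factors were derived, so this is only one plausible reading.

```python
# Sizes in bytes, taken from the "Workflows used" table.
workflows = {
    "O": {"size": 1032, "delta": 210},
    "M": {"size": 4087, "delta": 2564},
}

# Assumption: storage per version is `size` when saved in full, `delta` otherwise.
ratios = {name: w["size"] / w["delta"] for name, w in workflows.items()}

for name, ratio in ratios.items():
    print(f"{name}: a full version is about {ratio:.1f}x the size of a delta")
```

Under this assumption the ratios come out near 4.9 for workflow O and 1.6 for workflow M, in line with the reported "4-5 times" and "2 times".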

Page 15: Versioning for Workflow Evolution

Future Work
• Dynamic strategy to adjust versioning technique depending on object properties
• Challenges
– Unavailability of visualization software
– Visualizing different types of data products, integrating other visualization tools
• LEAD II Vortex2 use case
– Tracking different WF Activity library versions
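One way a dynamic strategy could look is sketched below, purely as an illustration of choosing a technique from object properties. The `ObjectProfile` fields mirror the selection criteria from the Discussion slide, but the decision rule and the 0.5 threshold are invented for this example and are not results or proposals from the presentation.

```python
from dataclasses import dataclass


@dataclass
class ObjectProfile:
    """Hypothetical summary of a versioned object's properties."""
    size_bytes: int            # size of the data object
    avg_delta_bytes: int       # average change between consecutive versions
    needs_fast_recovery: bool  # stands in for response-time requirements


def choose_strategy(profile: ObjectProfile, delta_ratio_threshold: float = 0.5) -> str:
    """Pick a versioning strategy; the threshold is an arbitrary placeholder."""
    if profile.size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    ratio = profile.avg_delta_bytes / profile.size_bytes
    if ratio >= delta_ratio_threshold:
        # Deltas are nearly as large as the object, so they save little space.
        return "no delta, no checkpointing"
    if profile.needs_fast_recovery:
        # Checkpoints bound how many deltas must be replayed on recovery.
        return "delta with checkpointing"
    return "delta, no checkpointing"


small = ObjectProfile(size_bytes=1032, avg_delta_bytes=210, needs_fast_recovery=False)
large = ObjectProfile(size_bytes=4087, avg_delta_bytes=2564, needs_fast_recovery=False)
print(choose_strategy(small))
print(choose_strategy(large))
```

The two profiles reuse the O and M workflow sizes from the evaluation only as sample inputs; a real selector would need thresholds calibrated against measured save and recovery times.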

Page 16: Versioning for Workflow Evolution

Thank You !!!

Questions …?

