Page 1: An Open Provenance Model for Scientific Workflows Professor Luc Moreau L.Moreau@ecs.soton.ac.uk University of Southampton lavm.

An Open Provenance Model for Scientific Workflows

Professor Luc Moreau
L.Moreau@ecs.soton.ac.uk
University of Southampton

www.ecs.soton.ac.uk/~lavm

Provenance & PASOA Teams

University of Southampton: Luc Moreau, Paul Groth, Simon Miles, Victor Tan, Miguel Branco, Sofia Tsasakou, Sheng Jiang, Steve Munroe, Zheng Chen

IBM UK (EU Project Coordinator): John Ibbotson, Neil Hardman, Alexis Biller

University of Wales, Cardiff: Omer Rana, Arnaud Contes, Vikas Deora, Ian Wootten, Shrija Rajbhandari

Universitat Politècnica de Catalunya (UPC): Steven Willmott, Javier Vazquez

SZTAKI: Laszlo Varga, Arpad Andics, Tamas Kifor

German Aerospace: Andreas Schreiber, Guy Kloss, Frank Danneman

Contents

Motivation

Provenance Concept Map

Process documentation in a concrete bioinformatics application

Conclusions

Motivation

Peer Review/Audit

Accounting

Banking

Healthcare

Academic publishing

e-Science datasets

How to undertake peer-reviewing and validation of e-Scientific results?

Current Solutions

Proprietary, monolithic

Silos, closed

Do not inter-operate with other applications

Not adaptable to new regulations

Provenance

Oxford English Dictionary:

the fact of coming from some particular source or quarter; origin, derivation

the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the passage of an item through its various owners.

Concept vs representation

Application Drivers

Aerospace engineering: maintain a historical record of design processes, up to 99 years.

Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency of matching and the recovery rate of patients

High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN)

Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval)

Provenance Concept Map

[Figure: concept map. Nodes: Application, Services, Process, Data product, Provenance (concept), Provenance (representation), Provenance Query, Process Documentation, P-structure, P-assertions. Edge labels: is an execution of, produces, has, has a structure, operates over, consists of, contains, assert, documents, is defined as a past, is represented by, is obtained by.]

Making Applications Provenance Aware

[Diagram: Application, Data Product, Provenance Store]

Assert p-assertions and record them as Process Documentation

Obtain the provenance of data by issuing provenance queries
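
The two activities on this slide, recording and querying, can be sketched in a few lines of Python. This is a minimal in-memory illustration and not the PASOA implementation: the class name `ProvenanceStore`, the methods `record` and `query`, the service names and the dictionary layout of a p-assertion are all assumptions made for the example.

```python
class ProvenanceStore:
    """Hypothetical in-memory stand-in for a provenance store."""

    def __init__(self):
        # Process documentation: p-assertions grouped by the data item they mention.
        self._by_data = {}

    def record(self, asserter, data_id, statement):
        """Record one p-assertion made by `asserter` about `data_id`."""
        entry = {"asserter": asserter, "statement": statement}
        self._by_data.setdefault(data_id, []).append(entry)

    def query(self, data_id):
        """Return a copy of every recorded p-assertion that mentions `data_id`."""
        return list(self._by_data.get(data_id, []))


# Application components assert what they did (names are made up).
store = ProvenanceStore()
store.record("service-A", "result.dat", "I sent result.dat")
store.record("service-B", "result.dat", "I received result.dat")

# Later, a user asks for the documentation behind a data product.
for p_assertion in store.query("result.dat"):
    print(p_assertion["asserter"], "-", p_assertion["statement"])
```

The point of the sketch is the separation of roles: components only append assertions, and the provenance of a data product is computed afterwards by a query over what was recorded.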

Process Documentation

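[Diagram: a service receiving messages M1 and M4 and sending messages M2 and M3, with internal functions f1 and f2]

Interaction p-assertions: I received M1, M4. I sent M2, M3.

Relationship p-assertions: M3 = f1(M1). M2 = f2(M1, M4). M2 is in reply to M1.

Service state p-assertions: I received M1 at time t. I used algorithm x.y.z.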
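
The three kinds of p-assertion on this slide can be written down as plain records. The sketch below encodes the slide's own example (M1 to M4, f1, f2); the class and field names, and the asserter name `"service"`, are hypothetical and are not taken from the PASOA schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionPAssertion:
    asserter: str
    direction: str  # "sent" or "received"
    message: str


@dataclass(frozen=True)
class RelationshipPAssertion:
    asserter: str
    output: str
    relation: str   # name of the function or relation applied
    inputs: tuple


@dataclass(frozen=True)
class ServiceStatePAssertion:
    asserter: str
    statement: str


documentation = [
    # Interaction p-assertions: what was exchanged.
    InteractionPAssertion("service", "received", "M1"),
    InteractionPAssertion("service", "received", "M4"),
    InteractionPAssertion("service", "sent", "M2"),
    InteractionPAssertion("service", "sent", "M3"),
    # Relationship p-assertions: how outputs depend on inputs.
    RelationshipPAssertion("service", "M3", "f1", ("M1",)),
    RelationshipPAssertion("service", "M2", "f2", ("M1", "M4")),
    RelationshipPAssertion("service", "M2", "in-reply-to", ("M1",)),
    # Service state p-assertions: internal state at the time.
    ServiceStatePAssertion("service", "received M1 at time t"),
    ServiceStatePAssertion("service", "used algorithm x.y.z"),
]

# Example: list the messages this service claims to have sent.
sent = [a.message for a in documentation
        if isinstance(a, InteractionPAssertion) and a.direction == "sent"]
```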

Data flow

Interaction p-assertions allow us to specify a flow of data between services

Relationship p-assertions allow us to characterise the flow of data “inside” a service

Overall data flow (internal + external) constitutes a DAG, which characterises the process that led to a result
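
A short sketch of how the two kinds of p-assertion combine into one graph. It reuses the M1 to M4 example from the previous slide; the message M5, the function g and all service names are hypothetical additions so that the flow crosses more than one service.

```python
# Interaction p-assertions (external flow): message -> (sender, receiver).
# Service names are invented for illustration.
interactions = {
    "M1": ("client", "service-1"),
    "M4": ("client", "service-1"),
    "M2": ("service-1", "service-2"),
    "M3": ("service-1", "client"),
    "M5": ("service-2", "client"),
}

# Relationship p-assertions (internal flow): output -> inputs it was derived from.
derived_from = {
    "M3": ["M1"],        # M3 = f1(M1)
    "M2": ["M1", "M4"],  # M2 = f2(M1, M4)
    "M5": ["M2"],        # hypothetical: M5 = g(M2), asserted by service-2
}


def provenance(message, edges=derived_from):
    """Return every message that `message` transitively depends on."""
    seen = set()
    stack = [message]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def services_involved(message):
    """Return the services that sent or received any message in the trace."""
    messages = provenance(message) | {message}
    return {service for m in messages for service in interactions[m]}
```

Calling `provenance("M5")` walks the graph backwards through service-2 and service-1, which is the sense in which the DAG characterises the process that led to a result.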

Process Documentation in a Concrete Bioinformatics Application

Biology

Determine how protein sequences fold into a 3D structure.

Structure of protein sequences may help to answer this question.

Structure can be quantified by textual compressibility.

Determine the amino acid groupings that maximize compressibility.
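
To make "textual compressibility" concrete, here is a small sketch: recode a sequence according to a grouping of amino acids, compress the result, and compare sizes. The slides do not say which compressor or which groupings the application uses, so `zlib`, the two-group alphabet and the toy sequence below are all assumptions chosen for illustration.

```python
import zlib

# Hypothetical grouping: residues in this set become 'h', all others 'p'.
HYDROPHOBIC = frozenset("AVLIMFWC")


def recode(sequence, group=HYDROPHOBIC):
    """Rewrite a sequence over the reduced two-symbol alphabet."""
    return "".join("h" if residue in group else "p" for residue in sequence)


def compressed_size(text):
    """Number of bytes after zlib compression at the highest level."""
    return len(zlib.compress(text.encode("ascii"), 9))


def compressibility(text):
    """Compressed size divided by original size; lower means more regular."""
    if not text:
        raise ValueError("empty sequence")
    return compressed_size(text) / len(text)


# Toy input, not real data: a made-up string of one-letter residue codes.
toy_sequence = "MKVLAAGIVKDEGRSTLLAVIMKDEGRSTAVLIMFWKDEGRST" * 10
print(compressibility(toy_sequence), compressibility(recode(toy_sequence)))
```

Searching for the grouping that maximizes compressibility would then mean evaluating a measure like this for many candidate groupings, which is the kind of multi-step process whose documentation the following slides describe.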

Collaboration Diagram

Actual Call DAG

The P-Structure

The logical structure of a provenance store

Interaction Record

The set of p-assertions pertaining to a given interaction (i.e., message exchange between a sender and a receiver)

Interaction Key

A unique identifier for an interaction

Sender identity

Receiver identity

Local id

View

The set of p-assertions created by an asserter involved in an interaction (sender or receiver view)
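
The nesting described so far (store, interaction record, interaction key, view) can be sketched as data structures. This is an illustrative layout only: the class names, the `record` helper and the example actor names are assumptions, and the actual P-structure is defined by the project's own schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InteractionKey:
    """Unique identifier for an interaction; frozen so it can be a dict key."""
    sender: str
    receiver: str
    local_id: str


@dataclass
class View:
    """P-assertions created by one asserter about one interaction."""
    asserter: str
    p_assertions: list = field(default_factory=list)


@dataclass
class InteractionRecord:
    """Everything recorded about one interaction, split into views."""
    key: InteractionKey
    views: dict = field(default_factory=dict)  # "sender" or "receiver" -> View


def record(store, key, role, asserter, p_assertion):
    """File a p-assertion under the right interaction record and view."""
    if role not in ("sender", "receiver"):
        raise ValueError("role must be 'sender' or 'receiver'")
    interaction_record = store.setdefault(key, InteractionRecord(key))
    view = interaction_record.views.setdefault(role, View(asserter))
    view.p_assertions.append(p_assertion)


store = {}
key = InteractionKey("client", "blast-wrapper", "1")
record(store, key, "sender", "client", "I sent the request")
record(store, key, "receiver", "blast-wrapper", "I received the request")
```

Because both parties file their assertions under the same key, the sender's and receiver's accounts of one message exchange end up side by side in a single interaction record.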

Asserter

The identity of an asserter

Interaction P-Assertion

An assertion of the contents of a message by an actor that has sent or received that message

Interaction P-Assertion Content

The content of an interaction p-assertion: here, the invocation of blast (through a wrapper)

Interaction Content

Provenance-related information passed in application messages

Actor State P-Assertion

An assertion made by an actor about its internal state in the context of a specific interaction

Relationship P-Assertion

With respect to an interaction, a relationship p-assertion is an assertion, made by an actor, that describes how the actor obtained output data or the whole message sent in that interaction by applying some function to input data or messages from other interactions.

Subject Id

The identity of the subject of a relationship

Object Id

The identity of the object of a relationship
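
As a rough sketch, a relationship p-assertion can be seen as a subject (the output being explained), a relation, and one or more objects (the inputs), where each identity points into a recorded interaction. The field names, the `DataRef` type and the interaction identifiers below are hypothetical; the example reuses M2 = f2(M1, M4) from the earlier slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    """Hypothetical identity: a data item located inside a given interaction."""
    interaction: str  # identifier of the interaction that carried the data
    item: str         # name of the data item or message


@dataclass(frozen=True)
class Relationship:
    asserter: str
    subject: DataRef  # Subject Id: the output being explained
    relation: str     # name of the function applied
    objects: tuple    # Object Ids: inputs, possibly from other interactions


relationship = Relationship(
    asserter="service",
    subject=DataRef("interaction-2", "M2"),
    relation="f2",
    objects=(DataRef("interaction-1", "M1"), DataRef("interaction-4", "M4")),
)

# The inputs named by the assertion, in order.
inputs = [obj.item for obj in relationship.objects]
```

Keeping the interaction identifier inside each identity is what lets a relationship asserted in one interaction refer to data received in another.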

Process Documentation Characteristics

Common logical structure of the provenance store shared by all asserting and querying actors

Can be produced autonomously, asynchronously by the different application components

Open, extensible model, for which we are producing a public specification

Tools can operate on it (e.g. visualisation, reasoning)

Performance (HPDC’05)

Standardisation Philosophy

Thin layer common between systems: extensible data model

Model can be extended for specific:

technologies (WS, Web, …), or

application domains (Bio, Healthcare, Desktop, …)

Service interfaces

Proposed List of Specifications

[Diagram of the proposed specifications. Documents: WS-Prov-Intro, WS-Prov-Primer, WS-Prov-Glo, WS-Prov-DM, WS-Prov-Rec, WS-Prov-Query, WS-Prov-DM-Link, WS-Prov-DM-Infer, WS-Prov-DM-Sec, WS-Prov-DM-Rel, WS-Prov-DM-DS, WS-Prov-SOAP, WS-Prov-WWW. Groupings shown: Generic Profiles, Domain Specific Profiles, Technology Bindings.]

Conclusions

To Sum Up

[Diagram: Provenance Store with Record and Query; uses: Compliance check, Rerun/Reproduce, Analyse; Standardising the documentation of Business Processes; Provenance Architecture; Methodology; Apply; sectors: Healthcare, Distribution, Finance, Aerospace, Automobile, Pharmaceutical]

Slide from John Ibbotson

Conclusions

Crucial topic for many applications

Full architectural specification

Implementation available for download

Methodology to make applications provenance-aware

Draft standardisation proposal to be released

www.pasoa.org

www.gridprovenance.org

twiki.ipaw.info

Provenance Challenge

Provenance Challenge Workshop at OGF18, Washington, September 11-14

Questions

