An Open Provenance Model for Scientific Workflows
Professor Luc Moreau, University of Southampton
www.ecs.soton.ac.uk/~lavm
Provenance & PASOA Teams
University of Southampton: Luc Moreau, Paul Groth, Simon Miles, Victor Tan, Miguel Branco, Sofia Tsasakou, Sheng Jiang, Steve Munroe, Zheng Chen
IBM UK (EU Project Coordinator): John Ibbotson, Neil Hardman, Alexis Biller
University of Wales, Cardiff: Omer Rana, Arnaud Contes, Vikas Deora, Ian Wootten, Shrija Rajbhandari
Universitat Politècnica de Catalunya (UPC): Steven Willmott, Javier Vazquez
SZTAKI: Laszlo Varga, Arpad Andics, Tamas Kifor
German Aerospace: Andreas Schreiber, Guy Kloss, Frank Danneman
Contents
Motivation
Provenance Concept Map
Process documentation in a concrete bioinformatics application
Conclusions
Peer Review/Audit
Accounting
Banking
Healthcare
Academic publishing
e-Science datasets
How to undertake peer-reviewing and validation of e-Scientific results?
Current Solutions
Proprietary, monolithic
Silos, closed
Do not inter-operate with other applications
Not adaptable to new regulations
Provenance
Oxford English Dictionary:
the fact of coming from some particular source or quarter; origin, derivation
the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the passage of an item through its various owners
Concept vs representation
Application Drivers
Aerospace engineering: maintain a historical record of design processes, up to 99 years.
Organ transplant management: tracking previous decisions, crucial to maximising matching efficiency and the recovery rate of patients
High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN)
Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval)
[Figure: provenance concept map. Nodes: Application, Services, Process, Data product, Provenance (concept), Provenance (representation), Provenance Query, Process Documentation, P-structure, P-assertions. Edge labels: is an execution of, produces, has, is defined as a past, is represented by, is obtained by, operates over, documents, has a structure, consists of, contains, assert.]
Making Applications Provenance Aware
[Figure: Application, Application, Data Product, Provenance Store]
Assert p-assertions and record them as Process Documentation
Obtain the provenance of data by issuing provenance queries
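The two roles above (recording p-assertions, then querying for the provenance of a data item) can be sketched in Python. This is only an illustration: the class name `ProvenanceStore`, the methods `record` and `query`, and the dictionary layout of a p-assertion are assumptions made for this sketch, not the PASOA interfaces.

```python
from collections import defaultdict


class ProvenanceStore:
    """Hypothetical in-memory store; not the PASOA API."""

    def __init__(self):
        # Process documentation: p-assertions grouped by the data item they mention.
        self._by_data = defaultdict(list)

    def record(self, asserter, data_id, statement):
        """Record one p-assertion made by `asserter` about `data_id`."""
        p_assertion = {"asserter": asserter, "data": data_id, "statement": statement}
        self._by_data[data_id].append(p_assertion)
        return p_assertion

    def query(self, data_id):
        """Return every recorded p-assertion that mentions `data_id`."""
        return list(self._by_data.get(data_id, []))


# Service names and statements below are made up for the example.
store = ProvenanceStore()
store.record("service-A", "M2", "I sent M2")
store.record("service-A", "M2", "M2 = f2(M1, M4)")
store.record("service-B", "M2", "I received M2")

for p in store.query("M2"):
    print(p["asserter"], "asserts:", p["statement"])
```

A real provenance query would follow relationships back through earlier interactions; this sketch only looks up assertions by data item.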
Process Documentation
[Figure: a service that receives messages M1 and M4 and sends messages M2 and M3, using functions f1 and f2]
Interaction p-assertions: I received M1, M4; I sent M2, M3
Relationship p-assertions: M3 = f1(M1); M2 = f2(M1, M4); M2 is in reply to M1
Service state p-assertions: I received M1 at time t; I used algorithm x.y.z
Data flow
Interaction p-assertions allow us to specify a flow of data between services
Relationship p-assertions allow us to characterise the flow of data “inside” a service
Overall data flow (internal + external) constitutes a DAG, which characterises the process that led to a result
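As a rough illustration of how the data flow forms a DAG, the sketch below encodes the relationship p-assertions from the M1 to M4 example as edges and traces what a result depends on. The dictionary encoding and the function names are assumptions made for this sketch; only relationship edges are shown, and interaction edges between services could be added to the same dictionary.

```python
# Relationship p-assertions from the example: output -> inputs it was derived from.
# The dictionary encoding is an assumption made for this sketch.
derived_from = {
    "M3": ["M1"],        # M3 = f1(M1)
    "M2": ["M1", "M4"],  # M2 = f2(M1, M4)
}


def ancestors(item, edges):
    """Return all items that `item` transitively depends on."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_acyclic(edges):
    """A data flow is a DAG only if no item is its own ancestor."""
    return all(node not in ancestors(node, edges) for node in edges)


print(sorted(ancestors("M2", derived_from)))  # ['M1', 'M4']
print(is_acyclic(derived_from))               # True
```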
Biology
How do protein sequences fold into a 3D structure?
The structure of protein sequences may help to answer this question.
Structure can be quantified by textual compressibility.
Which amino acid groupings maximize compressibility?
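To make "textual compressibility" concrete, here is a small Python sketch that recodes a sequence by amino acid group and compares compressed sizes. Everything specific in it is an assumption for illustration: the three groups, the sample sequence, and the use of `zlib` as the compressor are not taken from the slides.

```python
import zlib

# Hypothetical grouping of the 20 amino acids into three classes; the
# groupings actually explored in the application are not given here.
EXAMPLE_GROUPS = {
    "h": "AVLIMFWC",   # labelled hydrophobic for this sketch
    "p": "STNQYGPH",   # labelled polar for this sketch
    "c": "DEKR",       # labelled charged for this sketch
}


def recode(sequence, groups):
    """Replace each amino acid letter by the symbol of its group."""
    lookup = {aa: symbol for symbol, members in groups.items() for aa in members}
    return "".join(lookup[aa] for aa in sequence)


def compressibility(text):
    """Compressed size divided by original size; lower means more compressible."""
    raw = text.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# Made-up sample sequence, used only to exercise the functions.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
recoded = recode(sequence, EXAMPLE_GROUPS)
print(recoded)
print(round(compressibility(sequence), 3), round(compressibility(recoded), 3))
```

Searching for the grouping that maximizes compressibility would repeat this measurement over many candidate groupings and over much longer sequences, where compressor overhead matters less.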
Collaboration Diagram
Actual Call DAG
The P-Structure
The logical structure of a provenance store
Interaction Record
The set of p-assertions pertaining to a given interaction (i.e., message exchange between a sender and a receiver)
Interaction Key
A unique identifier for an interaction
Sender identity
Receiver identity
Local id
View
The set of p-assertions created by an asserter involved in an interaction (sender or receiver view)
Asserter
The identity of an asserter
Interaction P-Assertion
An assertion of the contents of a message by an actor that has sent or received that message
Interaction P-Assertion Content
The content of an interaction p-assertion: here, the invocation of blast (through a wrapper)
Interaction Content
Provenance-related information passed in application messages
Actor State P-Assertion
An assertion made by an actor about its internal state in the context of a specific interaction
Relationship P-Assertion
With respect to an interaction, a relationship p-assertion is an assertion, made by an actor, that describes how the actor obtained output data or the whole message sent in that interaction by applying some function to input data or messages from other interactions.
Subject Id
The identity of the subject of a relationship
Object Id
The identity of the object of a relationship
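The definitions above suggest a nested layout: interaction records indexed by interaction key, each holding a sender view and a receiver view that contain p-assertions. The Python sketch below is one possible reading of that layout. The class and field names are assumptions for illustration and do not reproduce the actual P-Structure schema; the actor names in the usage are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class InteractionKey:
    """Unique identifier for one message exchange."""
    sender: str
    receiver: str
    local_id: str


@dataclass
class PAssertion:
    kind: str      # "interaction", "actor_state" or "relationship"
    content: str


@dataclass
class View:
    """P-assertions made by one asserter (the sender or the receiver)."""
    asserter: str
    p_assertions: List[PAssertion] = field(default_factory=list)


@dataclass
class InteractionRecord:
    key: InteractionKey
    sender_view: Optional[View] = None
    receiver_view: Optional[View] = None


class PStructure:
    """Hypothetical container: interaction records indexed by interaction key."""

    def __init__(self):
        self.records: Dict[InteractionKey, InteractionRecord] = {}

    def add(self, key, asserter, p_assertion):
        """File a p-assertion under the view of the actor that asserted it."""
        if asserter not in (key.sender, key.receiver):
            raise ValueError("asserter is not a party to this interaction")
        record = self.records.setdefault(key, InteractionRecord(key))
        if asserter == key.sender:
            if record.sender_view is None:
                record.sender_view = View(asserter)
            view = record.sender_view
        else:
            if record.receiver_view is None:
                record.receiver_view = View(asserter)
            view = record.receiver_view
        view.p_assertions.append(p_assertion)


p_structure = PStructure()
key = InteractionKey(sender="client", receiver="blast-wrapper", local_id="1")
p_structure.add(key, "client", PAssertion("interaction", "sent: invoke blast"))
p_structure.add(key, "blast-wrapper", PAssertion("interaction", "received: invoke blast"))
p_structure.add(key, "blast-wrapper", PAssertion("actor_state", "received at time t"))

record = p_structure.records[key]
print(len(record.sender_view.p_assertions), len(record.receiver_view.p_assertions))
```

Making `InteractionKey` frozen keeps it hashable, so the sender and the receiver can each file assertions under the same key without coordinating with one another.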
Process Documentation Characteristics
Common logical structure of the provenance store shared by all asserting and querying actors
Can be produced autonomously, asynchronously by the different application components
Open, extensible model, for which we are producing a public specification
Tools can operate on it (e.g. visualisation, reasoning)
Performance (HPDC’05)
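The second characteristic above, autonomous and asynchronous production of process documentation, can be illustrated with a small Python sketch in which a component queues its p-assertions and a background thread stores them. The `AsyncRecorder` class is hypothetical, and a plain list stands in for the provenance store.

```python
import queue
import threading


class AsyncRecorder:
    """Hypothetical helper: buffers p-assertions and stores them in the background."""

    def __init__(self, store):
        self._store = store          # any object with append(); stands in for a store
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, p_assertion):
        """Return immediately; the application is not blocked by the store."""
        self._pending.put(p_assertion)

    def _drain(self):
        while True:
            item = self._pending.get()
            if item is None:         # sentinel: stop the worker
                break
            self._store.append(item)

    def close(self):
        """Flush everything queued so far and stop the worker."""
        self._pending.put(None)
        self._worker.join()


store = []
recorder = AsyncRecorder(store)
for statement in ("I received M1", "I sent M2"):
    recorder.record(statement)
recorder.close()
print(store)
```

Because each component owns its recorder, components need no coordination beyond agreeing on the shared logical structure of the store.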
Standardisation Philosophy
Thin layer common between systems: extensible data model
Model can be extended for specific:
technologies (WS, Web, …), or
application domains (Bio, Healthcare, Desktop, …)
Service interfaces
Proposed List of Specifications
[Figure: diagram of the proposed specifications, with groups labelled Generic Profiles, Domain Specific Profiles and Technology Bindings]
WS-Prov-Intro
WS-Prov-Primer
WS-Prov-DM
WS-Prov-Glo
WS-Prov-Rec
WS-Prov-Query
WS-Prov-DM-Link
WS-Prov-DM-Infer
WS-Prov-DM-DS
WS-Prov-DM-Sec
WS-Prov-DM-Rel
WS-Prov-SOAP
WS-Prov-WWW
To Sum Up
Standardising the documentation of Business Processes
[Figure: Provenance Store with Record and Query operations]
Compliance check, Rerun/Reproduce, Analyse
Provenance Architecture, Methodology
Apply: Healthcare, Distribution, Finance, Aerospace, Automobile, Pharmaceutical
Slide from John Ibbotson
Conclusions
Crucial topic for many applications
Full architectural specification
Implementation available for download
Methodology to make applications provenance-aware
Draft standardisation proposal to be released
www.pasoa.org
www.gridprovenance.org
twiki.ipaw.info
Provenance Challenge
Provenance Challenge Workshop at OGF18, Washington, September 11-14