Capturing provenance data
Dr Alison McKay (in place of Dr Richard Bagshaw)
University of Leeds, School of Mechanical Engineering
Distributed Aircraft Maintenance Environment - DAME
Purpose of presentation
• to present the DAME provenance research
• to discuss the experience of deploying this technology in Grid-based systems
Outline of presentation
• What do we mean by “provenance data”?
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?
• Provenance Data
– Recording the history of data and its place of origin
DAME Provenance Architecture
[Figure: architecture diagram. Components shown: Workflow Manager; Workflow Definition (BPEL); Workflow Script; Workflow Instances; Service Instance; Provenance Database; Provenance Viewer; Workflow Advisor]
Outline of presentation
• What do we mean by “provenance data”?
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?
RR Integrated Product Development process
[Figure: process timeline]
• Stage 1 (labels shown: New Project Planning; Business Concept Definition; Identify the Need; Preliminary Concept Definition)
• Stage 2: Full Concept Definition
• Stage 3: Propulsion System Realisation
• Stage 4: In-Service Monitoring & Technical Support
• Other labels: Capability Acquisition; Engine Launch; Entry into Service
DAME provenance data users
Provenance Requirement
• Legal Implications
• Audit Trail
• Contractual Obligations
• Troubleshooting
• Re-run diagnosis
Potential benefits
[Figure: failure mode curves plotted against Time, with T and Textra marked on the time axis]
• Failure mode curves: position and shape depend on
– engine type (from PDM/SDM)
– engine state (e.g., age)
– events (e.g., from QUOTE data)
• The line shows when failure occurs; its position and shape depend on the operating environment
• Position of an engine, i.e., its current state of health
Specific tasks to be supported
• Create an audit trail (Who, What, Where, Why, When, Which, hoW)
• Re-execute a workflow process
– repeat a workflow process (same Grid resources & services, sequence and data)
– rerun a workflow process (same Grid resources & services and sequence on different data)
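The distinction between repeating and rerunning a workflow can be sketched in a few lines of Python. This is only an illustration under assumptions: the `WorkflowRun` record, its field names, the step names and the data set references are hypothetical and are not taken from the DAME implementation.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical record of one executed workflow (illustrative names only)."""
    resources: Tuple[str, ...]  # Grid resources and services used
    sequence: Tuple[str, ...]   # order in which the steps were invoked
    data_ref: str               # reference to the input data set


def repeat(run: WorkflowRun) -> WorkflowRun:
    """Repeat: same resources and services, same sequence, same data."""
    return replace(run)


def rerun(run: WorkflowRun, new_data_ref: str) -> WorkflowRun:
    """Rerun: same resources, services and sequence, applied to different data."""
    return replace(run, data_ref=new_data_ref)


# Step names and data set references below are made up for the example.
original = WorkflowRun(
    resources=("SDM", "XTO"),
    sequence=("select_engine", "get_control_files", "run_xto"),
    data_ref="dataset-A",
)
repeated = repeat(original)
rerun_copy = rerun(original, "dataset-B")
```

In this sketch a repeat is an exact copy of the recorded run, while a rerun changes only the data reference, which is why the provenance record needs to hold resources, sequence and data separately.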
Outline of presentation
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?
Initial requirements
Support the re-execution of workflows with new data *
Provide provenance data for the Workflow Advisor
Provide a viewer for captured provenance data
* As opposed to repeating a given workflow using the same data and resources
DS&S perspective on requirements
• Origin of data fully traceable
– (Including time and date stamps)
• Processed data traceable through application software
• Any human interaction/annotations must be captured
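One way to picture these three requirements together is a data item that carries its origin, a time and date stamp, the software that produced it, and any human annotations. The sketch below is an assumption-laden illustration: `DataItem`, `Annotation`, `trace` and all field names and sample values are hypothetical, not part of DAME.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Annotation:
    """A human interaction or note attached to a data item (hypothetical)."""
    author: str
    text: str
    timestamp: datetime


@dataclass
class DataItem:
    """Hypothetical provenance-aware data item."""
    item_id: str
    origin: str                                # where the data came from
    created: datetime                          # time and date stamp
    produced_by: Optional[str] = None          # application software, if derived
    derived_from: Optional["DataItem"] = None  # parent item, if derived
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, author: str, text: str) -> None:
        """Capture a human annotation together with its time stamp."""
        self.annotations.append(
            Annotation(author, text, datetime.now(timezone.utc))
        )


def trace(item: DataItem) -> List[str]:
    """Walk back from a processed item to its original source."""
    chain = []
    current: Optional[DataItem] = item
    while current is not None:
        chain.append(current.item_id)
        current = current.derived_from
    return chain


# Sample values are invented for the example.
raw = DataItem("raw-1", origin="engine data download",
               created=datetime.now(timezone.utc))
processed = DataItem("proc-1", origin="derived",
                     created=datetime.now(timezone.utc),
                     produced_by="XTO", derived_from=raw)
processed.annotate("analyst", "Result reviewed")
lineage = trace(processed)
```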
Research issues
[Figure: matrix diagram]
• Columns: Specify; Define; Execute / deploy
• Rows: Product; Process
• Cell labels: Product Data Management system; Service Data Manager; Workflow process definition; Workflow execution data
Process definition (as defined)
[Figure: data model diagram]
• Entities: process definition; process; process relationship (composition relationship, connection relationship); process element; process element relationship; GRID resource; GRID resource usage
• Attributes and roles shown: id; name; description; start; end; date_and_time; resource; caller; callee; why_used; outcome; executed_by; related; relating; of
Process definition (as executed)
• Case: Case_id, User_id, Open_date, Close_date, Flight_start_date, Deadline_date, Tail_number, Airline, Airport, Stand, Quote_diagnosis, Quote_status, Engineer, Engineer_active, Engineer_why, Analyst, Analyst_active, Analyst_why, Expert, Expert_active, Expert_why
• Workflow: Workflow_sequence_number, Workflow_id, Workflow_author_id, Workflow_name, Workflow_description, Workflow_start_date, Workflow_end_date, Workflow_ip_data_type, Workflow_op_data_type, Workflow_diagnosis, Workflow_status
• Resource: Resource_sequence_number, Resource_id, Resource_name, Resource_type, Resource_description, Resource_start_time, Resource_end_time, Resource_location, Resource_configuration, Resource_version_number, Resource_status, Resource_req_no_of_processors, Resource_req_memory, Resource_req_operating_system, Resource_req_op_sys_ver_number
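The Case, Workflow and Resource records could be held in a relational store and joined to produce an audit trail. The following is a minimal sketch using an in-memory SQLite database and a subset of the fields. The table name `case_record`, the foreign-key links between the three tables and all sample values are assumptions made for illustration; the slide does not state how the records are related or stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE case_record (
    case_id     TEXT PRIMARY KEY,
    user_id     TEXT,
    open_date   TEXT,
    tail_number TEXT
);
CREATE TABLE workflow (
    workflow_id              TEXT PRIMARY KEY,
    case_id                  TEXT REFERENCES case_record(case_id),  -- assumed link
    workflow_sequence_number INTEGER,
    workflow_name            TEXT,
    workflow_status          TEXT
);
CREATE TABLE resource (
    resource_id              TEXT,
    workflow_id              TEXT REFERENCES workflow(workflow_id), -- assumed link
    resource_sequence_number INTEGER,
    resource_name            TEXT,
    resource_type            TEXT,
    resource_version_number  TEXT
);
""")

# Sample rows are invented for the example.
conn.execute("INSERT INTO case_record VALUES (?, ?, ?, ?)",
             ("C1", "U1", "2004-01-01", "TAIL-1"))
conn.execute("INSERT INTO workflow VALUES (?, ?, ?, ?, ?)",
             ("W1", "C1", 1, "diagnose", "complete"))
conn.executemany("INSERT INTO resource VALUES (?, ?, ?, ?, ?, ?)", [
    ("R1", "W1", 1, "SDM", "Data storage resource", "1.0"),
    ("R2", "W1", 2, "XTO", "Application resource", "1.0"),
])

# Audit trail: which resources were used, in order, for each case.
rows = conn.execute("""
SELECT c.case_id, w.workflow_name, r.resource_sequence_number, r.resource_name
FROM case_record c
JOIN workflow w ON w.case_id = c.case_id
JOIN resource r ON r.workflow_id = w.workflow_id
ORDER BY w.workflow_sequence_number, r.resource_sequence_number
""").fetchall()
```

The sequence-number fields are what make the ordered join possible, which is the information needed to repeat or rerun a workflow in the same order.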
MyGrid Workflow Provenance
• Workflow instance capture
– Workflow overview: Workflow ID, Status, Start Time, End Time, overall inputs and outputs, Service List
– Service Invocations: Status, Start Time, End Time, WSDL URI, DataSets x 2
– Inputs and Outputs: ID, Name, Type, Value
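The captured structure listed above can be sketched as nested records that serialise to JSON. This is not the MyGrid format: the class names, field names, the reading of "DataSets x 2" as one input set and one output set, and the sample URI and times are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataValue:
    """One input or output: ID, Name, Type, Value."""
    id: str
    name: str
    type: str
    value: str


@dataclass
class ServiceInvocation:
    """One service call within a workflow instance."""
    status: str
    start_time: str
    end_time: str
    wsdl_uri: str
    inputs: List[DataValue] = field(default_factory=list)   # assumed data set 1
    outputs: List[DataValue] = field(default_factory=list)  # assumed data set 2


@dataclass
class WorkflowInstance:
    """Workflow overview plus its service invocations."""
    workflow_id: str
    status: str
    start_time: str
    end_time: str
    services: List[ServiceInvocation] = field(default_factory=list)

    def service_list(self) -> List[str]:
        """Return the WSDL URI of each invoked service, in order."""
        return [s.wsdl_uri for s in self.services]


# Placeholder values, invented for the example.
instance = WorkflowInstance(
    workflow_id="wf-1", status="COMPLETE",
    start_time="2004-01-01T10:00:00Z", end_time="2004-01-01T10:05:00Z",
    services=[ServiceInvocation(
        status="COMPLETE",
        start_time="2004-01-01T10:00:10Z", end_time="2004-01-01T10:04:50Z",
        wsdl_uri="http://example.org/xto?wsdl",
        inputs=[DataValue("in-1", "engine", "string", "E123")],
        outputs=[DataValue("out-1", "diagnosis", "string", "no fault found")],
    )],
)
record = json.dumps(asdict(instance))
restored = json.loads(record)
```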
Outline of presentation
• What do we mean by “provenance data”?
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?
[Figure: example workflow diagram with legend]
Legend
• Interface (transfer) resource
• Interface (search) resource
• Data storage resource
• Transient data resource
• Compute resource
• Application resource
• Data interface GRID resource
• User executed process step
Diagram labels: SDM; MySQL-SDM2XTO; XTO Control Files; XTO; CR1
User executed process steps
• Look at SDM to select an engine
• Get XTO control files for selected engine
• Run XTO for selected engine
BOM data viewer
[Figure: component diagram]
• Components: Product data database; Web service: Database; Web service: Structure constructor; Graphical user interface
• Implementation labels: Software (Java) x 2; Software (Microsoft .Net)
Outline of presentation
• What do we mean by “provenance data”?
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?
Remaining tasks
• Support the re-execution of workflows with new data
• Provide provenance data for the Workflow Advisor
• Provide a viewer for captured provenance data
• Provide audit trail for accountability purposes
Provenance research issues
• Provenance requirements and scope
• Provenance data security
• Data storage format
• Centralised provenance data
• Stop points for audit trails
• Repeatability of GRID resources
Longer term research
[Figure: matrix diagram]
• Columns: Specify; Define; Execute / deploy
• Rows: Product; Process
• Cell labels: Requirements definition; Product Data Management system; Service Data Manager; Workflow process specification; Workflow process definition; Workflow execution data