
“Workflow” in Data Access and Integration: An OGSA-DAI/DAIS Perspective

Date post: 01-Jan-2016

“Workflow” in Data Access and Integration

An OGSA-DAI/DAIS Perspective

Mario Antonioletti

EPCC

[email protected]

e-Science Workflow Services - www.ogsadai.org.uk 2

Talk Overview

Background: OGSA-DAI and DAIS
Motivation and Definitions
Hierarchies of Service Coordination
Conclusions


OGSA-DAI and DAIS

GGF DAIS WG
  Database Access and Integration Services
  Attempting to standardise interfaces based on OGSI
OGSA-DAI
  Aims to provide an implementation of DAIS
  Serves the UK e-Science community
OGSA-DAI and DAIS
  Currently not aligned
    Data service interface in OGSA-DAI is coarse grained
      Based on an earlier version of DAIS
    Data service interface in DAIS is currently fine grained
      Scope for more coarse-grained interfaces
  OGSA-DAI will realign with DAIS once the latter stabilises


OGSA-DAI Project Partners

Powered by ….


Simple Data Service Scenario

[Diagram: a Client talks to a Data Service, which sits in front of one or more Data Resources.]

1. Provides access to a data resource.
2. May provide integration of several data resources.


Some Definitions

Data Resource
  An object that can source/sink data
  Currently databases are in scope
    Files and file systems may come into scope
Data Services
  Grid services
  Provide a common interface to data resources
  Expose some capabilities of a data resource
    SQL queries, XPath, BinX, …
  Can also provide additional capabilities
    Transformations, third-party data delivery, etc.


Motivation

Want common interfaces for:
  Data access
  Data integration
Requests to a data service may produce lots of data
  Want to minimise data movement
Hence encapsulate interactions with the service
  Serialise multiple interactions into one interaction
  Abstract each interaction into an "activity"
  Data flows between activities
  Use a document mechanism to describe this
DAIS and OGSA-DAI
  Concerned with data flow
  Currently do not have control constructs
    No looping, conditionals, splits, joins, …
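To make the "activity" idea concrete, here is a minimal Python sketch of several interactions serialised into one request, with data flowing between activities through named streams. The `Activity` class, the `perform` function and the three stand-in activities are all hypothetical illustrations, not OGSA-DAI or DAIS APIs. In line with the point above, there are no control constructs: activities simply run in order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Activity:
    """One step in a request: reads named input streams, writes one named output."""
    name: str
    run: Callable[..., object]
    inputs: List[str] = field(default_factory=list)
    output: str = ""


def perform(activities: List[Activity]) -> Dict[str, object]:
    """Run activities in document order, passing data through named streams."""
    streams: Dict[str, object] = {}
    for act in activities:
        # A KeyError here means an activity refers to a stream nobody produced.
        args = [streams[name] for name in act.inputs]
        result = act.run(*args)
        if act.output:
            streams[act.output] = result
    return streams


# Hypothetical stand-ins for a query, a transformation and a delivery.
delivered: List[str] = []
request = [
    Activity("statement", lambda: [(10, "alice")], output="rows"),
    Activity("transform", lambda rows: [r[1].upper() for r in rows],
             inputs=["rows"], output="names"),
    Activity("deliver", delivered.extend, inputs=["names"]),
]
perform(request)
print(delivered)
```

The client sends one request and the intermediate rows never leave the service, which is the data-movement saving the slide is after.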


Service Coordination Patterns

[Diagram: a Client, Data Services and other Services.]

1. Coordination of activities performed at one Data Service.
2. Client choreographs a set of services to work together … or a service may orchestrate on behalf of the client.
3. Orchestration of services using a document directed to one service.
4. Possibly interface with standard workflow languages, e.g. BPEL4WS, WSCI, …


Coordination Hierarchies

Service coordination may take place:
  Intra service
    Document based
  Inter service: application driven
    Choreographed/orchestrated by a client or service
  Inter service: document driven
    Orchestration
    Ideally would look the same as the intra-service document-based interface
  Combined with other workflow languages


Intra Service Processing

Service processing described by a document
Possible activities (OGSA-DAI perspective):
  Statement
    SQL query, XPath query
  Delivery
    Input data from a third party
    Output data to a third party
    Deliver data in the response
  Transformations
    XSL transformations, compression
OGSA-DAI has produced a framework for this


Simple Example: no data flow

[Diagram: two unconnected activities, sqlQueryStatement and deliverToURL.]

<sqlQueryStatement name="statement">
  <expression>
    select * from myTable where id=10
  </expression>
</sqlQueryStatement>

<deliverToURL name="deliverOutput">
  <toURL>ftp://anon:[email protected]/home</toURL>
</deliverToURL>


Simple Example: with data flow

[Diagram: sqlQueryStatement feeding its output into deliverToURL.]

<sqlQueryStatement name="statement">
  <expression>
    select * from myTable where id=10
  </expression>
  <resultSetStream name="output1"/>
</sqlQueryStatement>

<deliverToURL name="deliverOutput">
  <fromLocal from="output1"/>
  <toURL>ftp://anon:[email protected]/home</toURL>
</deliverToURL>


The Perform Document

<?xml version="1.0" encoding="UTF-8"?>
<gridDataServicePerform
    xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ogsadai.org.uk/namespaces/2003/07/gds/types
        ../../../../schema/ogsadai/xsd/activities/activities.xsd">
  <documentation>
    This example performs a simple select statement to retrieve
    one row from the test database. The results are delivered
    within the response document.
  </documentation>
  <sqlQueryStatement name="statement">
    <expression>
      select * from littleblackbook where id=10
    </expression>
    <resultSetStream name="output"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output"/>
    <toURL>ftp://anon:[email protected]/home</toURL>
  </deliverToURL>
</gridDataServicePerform>
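As a rough illustration of how the data flow in such a document could be read back, the Python sketch below parses a trimmed copy of the perform document and lists which activity feeds which. It assumes, based only on the example on this slide, that a child element carrying a `name` attribute declares an output stream and a child carrying a `from` attribute consumes one. The `data_flow` function and the `ftp://example.invalid/home` URL are placeholders for illustration and are not part of OGSA-DAI.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's document; the delivery URL is a placeholder.
PERFORM = """
<gridDataServicePerform
    xmlns="http://ogsadai.org.uk/namespaces/2003/07/gds/types">
  <sqlQueryStatement name="statement">
    <expression>select * from littleblackbook where id=10</expression>
    <resultSetStream name="output"/>
  </sqlQueryStatement>
  <deliverToURL name="deliverOutput">
    <fromLocal from="output"/>
    <toURL>ftp://example.invalid/home</toURL>
  </deliverToURL>
</gridDataServicePerform>
"""


def data_flow(xml_text):
    """Return (producer, stream, consumer) edges in document order."""
    root = ET.fromstring(xml_text.strip())
    producers = {}   # stream name -> name of the activity that declares it
    edges = []
    for activity in root:
        for child in activity:
            if "name" in child.attrib:        # assumed: declares an output stream
                producers[child.attrib["name"]] = activity.get("name")
        for child in activity:
            source = child.get("from")
            if source is not None:            # assumed: consumes a stream
                if source not in producers:
                    raise ValueError(f"unknown stream {source!r}")
                edges.append((producers[source], source, activity.get("name")))
    return edges


print(data_flow(PERFORM))
```

A check like this would reject a document whose `from` attribute names a stream that no earlier activity declares, which is one of the "sensible document" problems that untyped, loosely specified activities make possible.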


Predefined Building Blocks

sqlQueryStatement

sqlStoredProcedure

sqlUpdateStatement

sqlBulkLoadRowset

xPathStatement

xUpdateStatement

xQueryStatement

xmlResourceManagement

xmlCollectionManagement

relationalResourceManager

gzipCompression

zipArchive

xslTransform

inputStream

outputStream

DeliverFromURL

DeliverToURL

DeliverToGFTP

DeliverFromGFTP

DeliverToStream

DeliverFromGDT

DeliverToGDT


Activities: positives

Simple sequence pattern
  Data flow
  Avoid multiple message exchanges
  Minimise data movement
Extensible
  XML Schema excerpt gives syntax
  Associate an implementation with an activity
  Done at configuration
Allows optimisation
  Enactment engine can optimise interaction
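The extensibility point (an activity name in the document is bound to an implementation at configuration time) can be sketched with a simple registry. Everything in this Python example is hypothetical: the `REGISTRY` dictionary, the `register` decorator and the two toy activities only illustrate the pattern and do not reflect how OGSA-DAI actually wires activities.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry: element name used in a request -> implementation.
REGISTRY: Dict[str, Callable[[str], str]] = {}


def register(element_name: str):
    """Associate an implementation with an activity name (done once, at configuration)."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[element_name] = func
        return func
    return decorator


@register("upperCase")
def upper_case(data: str) -> str:
    return data.upper()


@register("reverse")
def reverse(data: str) -> str:
    return data[::-1]


def enact(element_names: Iterable[str], data: str) -> str:
    """Look up each named activity and apply it in sequence."""
    for name in element_names:
        if name not in REGISTRY:
            raise KeyError(f"no implementation configured for {name!r}")
        data = REGISTRY[name](data)
    return data


print(enact(["upperCase", "reverse"], "abc"))
```

Adding a new activity means registering one more name, without touching the enactment loop.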


Activities: negatives

Incomplete syntax
  Activity inputs and outputs are not typed
  No typing of data streams
  Possible issue in coming up with a sensible document
Activity implementation and XML schema are loosely coupled
  Keeping activity and implementation in sync
Semantics are not specified
Puts workload on the server
  Workloads on the server may need to be managed
Activities not exposed at the interface level
  This may change in line with DAIS
Perform document factored out from DAIS base specs
  Standardisation to become a DAIS informational document
  Scope may be bigger than DAIS


Inter Service Application Defined "Workflow"

Services stitched together by an application
  Could be a client
    Use the OGSA-DAI GridDataTransport (GDT) portType
  Could be another service
    Distributed Query Processing (DQP)
Services configured separately
  Each performs its part in the workflow


Client Driven Scenario (aka poor man's data integration)

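[Diagram: the Client creates two Data Services and sends each its own perform document; data moves from one Data Service to the other over GDT.]

Data Service delivering over GDT:
<sqlQueryStatement>…</sqlQueryStatement>
<deliverToGDT … />

Data Service receiving the data:
<inputStream … />
<sqlUpdateStatement>…</sqlUpdateStatement>

Client creates Data Services.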


Service Driven Scenario

Distributed Query Processing

[Diagram: a Client sends a query to a GDQS, which coordinates three GQES instances.]

GDQS: query planning, compilation, scheduling, evaluation, partitioning
GQES: evaluate sub-queries


More Complex DQP Scenario

[Diagram: a client calls perform(Query) on the GDQS (step 1). The GDQS calls createService on GQES factories hosted on other nodes (step 2) and sends each new GQES a perform(QuerySubplan) (step 3). Data moves between services over GDT, and results are returned (step 4). Nodes are labelled N0 to N4.

Operators shown in the query plan:
  sequential_scan, then reduce(proteinID, sequence)
  sequential_scan(term = 8372), then reduce(proteinID)
  hash_join(p.proteinID = t.proteinID)
  operation_call blast(p.sequence), then reduce(p.proteinID, blast), shown twice, calling a BLAST Web Service]
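For readers unfamiliar with the operators in the plan, the following Python toy runs the same shape of plan in a single process: two scans, two projections, a hash join and an operation call. It assumes that `reduce` means projection onto the listed columns; the table contents, the column names other than `proteinID`, `sequence` and `term`, and the `fake_blast` stub are invented for illustration. A real DQP deployment would partition this plan across GQES instances rather than run it in one place.

```python
def sequential_scan(table, predicate=lambda row: True):
    """Read every row, keeping those that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def reduce_columns(rows, columns):
    """Project each row onto the given columns (assumed meaning of 'reduce')."""
    return [{c: row[c] for c in columns} for row in rows]


def hash_join(left, right, key):
    """Build a hash index on the left input, then probe it with the right."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]


def fake_blast(sequence):
    """Stand-in for the call to an external BLAST web service."""
    return f"hits-for-{sequence}"


# Invented sample data.
proteins = [
    {"proteinID": "P1", "sequence": "MKV", "length": 3},
    {"proteinID": "P2", "sequence": "GAT", "length": 3},
]
terms = [
    {"proteinID": "P1", "term": 8372},
    {"proteinID": "P2", "term": 1111},
]

p = reduce_columns(sequential_scan(proteins), ["proteinID", "sequence"])
t = reduce_columns(sequential_scan(terms, lambda r: r["term"] == 8372),
                   ["proteinID"])
joined = hash_join(p, t, "proteinID")
result = [{"proteinID": r["proteinID"], "blast": fake_blast(r["sequence"])}
          for r in joined]
print(result)
```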


Application Driven "Workflow"

Labour intensive
Client driven (service choreography)
  Restricted to small numbers of services
  Need tooling
    Even then this is best done through other means
Service driven (service orchestration)
  DQP hides details
  There may be other examples …
  Need to explore this space further
Can probably accommodate these patterns in an existing workflow language
For more general data integration, need to describe more sophisticated behaviour


Inter Service Document Coordination

Currently evolving
Document describes:
  Sequence of operations that may span multiple services
Single document includes enough information to:
  Run an expression on a source data service
  Deliver the results to a target data service
  Run an expression on the target data service
Informational document to be presented at GGF10
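The three steps a single document has to carry (run on the source, deliver, run on the target) can be sketched as a small data structure. This Python example is purely illustrative: the `CoordinationDocument` class, its field names and the in-memory "services" are assumptions, since the slides note that this document format is still evolving and has no implementation yet.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CoordinationDocument:
    """Hypothetical single document spanning a source and a target service."""
    source: str
    source_expression: Callable[[List[int]], List[int]]
    target: str
    target_expression: Callable[[List[int], List[int]], None]


def run(doc: CoordinationDocument, services: Dict[str, List[int]]) -> None:
    # 1. Run an expression on the source data service.
    results = doc.source_expression(services[doc.source])
    # 2. Deliver the results to the target data service (a copy stands in
    #    for the transfer).
    delivered = list(results)
    # 3. Run an expression on the target data service, using the delivery.
    doc.target_expression(services[doc.target], delivered)


# In-memory stand-ins for two data services.
services = {"A": [1, 2, 3, 4], "B": []}
doc = CoordinationDocument(
    source="A",
    source_expression=lambda data: [x for x in data if x % 2 == 0],
    target="B",
    target_expression=lambda store, rows: store.extend(rows),
)
run(doc, services)
print(services["B"])
```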


A Dataset Example

[Diagram: a Client and two Data Services, annotated with the documents below.]

RequestDataRequest.xsd
<dataRequest> … </dataRequest>

RemoteRequiredTableDataAccessRecipe.xsd
<dar>
  <gsh> … </gsh>
  <type> … </type>
  <dataSet> … </dataSet>
</dar>


Document Driven "Workflow"

Work in this area is tentative
  No implementations as yet
  OGSA-DAI needs to see how it matures
Shows versatility
  Carries over some of the OGSA-DAI activity framework
Focused on data
  Can track provenance in the dataSet
Needs to be positioned against general workflow languages


Traditional Workflow

OGSA-DAI has not explored this space … yet
  May need such a framework to facilitate data integration
Traditionally, workflow:
  Revolves around the execution of atomic activities
  Uses a processing model, e.g. WfMC based
  Is akin to how people talk about service orchestration
Want to use existing frameworks as far as possible
  OGSA-DAI does not want to define its own workflow
  DAIS may come up with something
Clearly:
  Activity model can be used to implement a workflow
Collecting use cases


Workflow Issues

OGSA-DAI needs to play to see what works
Standards still evolving
IP rights:
  BPEL4WS
    Royalty-free … ?
  WSCI
    Royalty-free
Need workflow engines
Tooling to construct workflow
  Ptolemy II … Triana … ?


Summary & Conclusions

Base standards in a state of flux
  DAIS not settled down yet
  If you don't like what you see, get involved and change it
  Document-based interface needs to be reworked
OGSA-DAI implemented simple "workflow" patterns
  Successful for data access
  Shied away from real workflow
  Should try to use emerging standards if possible
Data integration will require workflow patterns
  Need to examine use cases
Positioning of OGSA-DAI
  Want it to be the leaves of your complex workflow graphs
  Wrap your data sources and sinks
Try OGSA-DAI and give feedback!


Further information

The OGSA-DAI Project Site:
  http://www.ogsadai.org.uk
The DAIS-WG site:
  http://cs.man.ac.uk/grid-db
OGSA-DAI Users Mailing list
  [email protected]
  General discussion on grid DAI matters
Formal support for OGSA-DAI releases
  http://www.ogsadai.org.uk/support
  [email protected]
OGSA-DAI training courses

