10 Sep 2005
NVO Summer School 2005
Managing VO data and process flows
Matthew J. Graham
CACR/Caltech
THE US NATIONAL VIRTUAL OBSERVATORY

Overview

• Astronomical data
• VOStore/VOSpace
• Workflows
• Astrogrid workflow
• CEA

The importance of data

• Data is the raison d’être of the VO
• LSST is the data source nonpareil
  – data rates of 540 MB/s, ~16 TB in 8 hrs
  – final archive > 3 PB of data

[Figure: VO Wheel™]

• Well-established ways of handling distributed data:
  – SRB
  – PVFS
  – OGSA-DAI
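
As a quick check of the LSST figures above, the nightly volume is simply rate times observing time. The sketch below assumes decimal units (1 TB = 10^6 MB); the function name is illustrative only.

```python
def nightly_volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Data volume in TB for a sustained rate over a number of hours (decimal units)."""
    seconds = hours * 3600
    total_mb = rate_mb_per_s * seconds
    return total_mb / 1e6  # assumption: 1 TB = 10^6 MB


volume = nightly_volume_tb(540, 8)
print(f"{volume:.1f} TB per 8-hour night")
```

This gives about 15.6 TB, consistent with the ~16 TB quoted on the slide.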

Data use cases

• Client has data:
  – stored locally: transfers it to service
  – stored locally: service retrieves it
  – stored elsewhere: service retrieves it

• Service generates data:
  – stores it locally: notifies client of location
  – transfers it to the client’s local store
  – transfers it to a client-designated store
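
The use cases above reduce to two questions: does the input travel by value or by reference, and where does the output end up? A minimal sketch of that distinction follows. All class and function names, and the URIs in the usage lines, are hypothetical and not part of any VO interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataInput:
    """Input handed to a service: inline bytes (client pushes) or a URI (service pulls)."""
    content: Optional[bytes] = None
    uri: Optional[str] = None


def resolve_input(data: DataInput, fetch: Callable[[str], bytes]) -> bytes:
    """Return the input bytes, retrieving them only if the client passed a reference."""
    if data.content is not None:
        return data.content       # client transferred the data itself
    if data.uri is not None:
        return fetch(data.uri)    # service retrieves it, from the client or elsewhere
    raise ValueError("input carries neither content nor a URI")


def deliver_output(result: bytes, store: Callable[[str, bytes], None],
                   destination: Optional[str], local_uri: str) -> str:
    """Write the result to the designated store, or locally, and report its location."""
    target = destination or local_uri
    store(target, result)
    return target                 # the client is notified of this location


# Usage with dictionaries standing in for remote and local storage.
remote = {"http://elsewhere/catalogue": b"remote bytes"}
saved = {}
data = resolve_input(DataInput(uri="http://elsewhere/catalogue"), remote.__getitem__)
where = deliver_output(data, saved.__setitem__, None, "local://results/1")
```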

VOStore

• Provides a uniform interface to existing or new data storage locations (Facade pattern)
• Structured/unstructured data both first level
• Methods:
  – get
  – put
  – list / listAll
  – importInit
  – importData (sync/async)
  – exportInit
  – exportData (sync/async)
  – delete
  – rename
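
To illustrate the Facade idea, here is a minimal sketch of a store interface with one in-memory backend. Only the method names get, put, list, delete and rename come from the slide; the signatures are assumptions, and the import/export transfer methods are left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class VOStore(ABC):
    """Uniform interface placed in front of an arbitrary storage backend."""

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def list(self) -> List[str]: ...

    @abstractmethod
    def delete(self, name: str) -> None: ...

    @abstractmethod
    def rename(self, old: str, new: str) -> None: ...


class InMemoryVOStore(VOStore):
    """Toy backend; a real one might wrap a file system, SRB or a database."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def get(self, name: str) -> bytes:
        return self._items[name]

    def put(self, name: str, data: bytes) -> None:
        self._items[name] = data

    def list(self) -> List[str]:
        return sorted(self._items)

    def delete(self, name: str) -> None:
        del self._items[name]

    def rename(self, old: str, new: str) -> None:
        self._items[new] = self._items.pop(old)


store = InMemoryVOStore()
store.put("image.fits", b"pixels")
store.rename("image.fits", "m31.fits")
```

Client code written against `VOStore` does not need to change when the backend does, which is the point of the pattern.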

VOSpace

• Orchestrates VOStores:
  – data collections: directories, user-defined
  – authorisation: user groups
  – processing efficiency: where is the nearest copy?
• move
• copy
• identifiers
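
A rough sketch of what orchestrating several stores could look like: copy and move between stores, plus a "nearest copy" lookup. Stores are modelled as plain dictionaries, the distance table is a made-up cost metric, and authorisation and identifiers are omitted; none of this reflects an actual VOSpace API.

```python
from typing import Dict, List


class VOSpace:
    """Toy orchestration layer over several named stores."""

    def __init__(self, stores: Dict[str, Dict[str, bytes]], distance: Dict[str, float]):
        self.stores = stores      # store name -> {item name -> bytes}
        self.distance = distance  # store name -> assumed cost of reaching it

    def locations(self, item: str) -> List[str]:
        """Names of all stores that hold a copy of the item."""
        return [name for name, items in self.stores.items() if item in items]

    def nearest(self, item: str) -> str:
        """Store holding the item with the lowest access cost."""
        holders = self.locations(item)
        if not holders:
            raise KeyError(item)
        return min(holders, key=lambda name: self.distance[name])

    def copy(self, item: str, src: str, dst: str) -> None:
        self.stores[dst][item] = self.stores[src][item]

    def move(self, item: str, src: str, dst: str) -> None:
        self.copy(item, src, dst)
        del self.stores[src][item]


space = VOSpace(
    stores={"caltech": {"m31.fits": b"pixels"}, "cambridge": {}},
    distance={"caltech": 1.0, "cambridge": 5.0},
)
space.copy("m31.fits", "caltech", "cambridge")
```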

A virtual super-peer data network?

How to manage the flows?

• Way of describing a flow:
  – processes/steps, inputs/outputs, serial/parallel execution, control logic, variables, inline scripting
  – preferably XML (verbose but rigorous)
• Way of controlling a flow: engine
• e-Science vs. e-Business:
  – open-ended vs. closed
  – verification and publication
  – static vs. dynamic workflows
  – volume and type of data
  – meta-transactions
  – customer, manager and user vs. scientist

Workflow patterns

[Figure: diagrams of the workflow patterns listed below]

• Sequence
• Parallel split / Synchronisation (AND)
• Exclusive choice / Simple merge (XOR)
• Multi choice, combined with:
  – Multi merge
  – Synchronizing merge
  – Discriminator
• Deferred choice
• Multiple instances with/without synchronisation
• Implicit termination
• Interleaved parallel routing
• Milestone
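
Three of the simpler patterns above (sequence, parallel split with synchronisation, and exclusive choice) can be sketched in a few lines. This is an illustrative toy built on Python's standard thread pool, not how any of the workflow engines discussed here implement them; the function names are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def sequence(steps: List[Step], value: Any) -> Any:
    """Sequence: each step consumes the output of the previous one."""
    for step in steps:
        value = step(value)
    return value


def parallel_split_and_sync(branches: List[Step], value: Any) -> List[Any]:
    """Parallel split (AND): start every branch; synchronisation: wait for all results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(branch, value) for branch in branches]
        return [future.result() for future in futures]


def exclusive_choice(condition: Callable[[Any], bool],
                     if_true: Step, if_false: Step, value: Any) -> Any:
    """Exclusive choice (XOR): exactly one branch runs."""
    branch = if_true if condition(value) else if_false
    return branch(value)


add_one = lambda x: x + 1
double = lambda x: x * 2

serial = sequence([add_one, double], 3)
parallel = parallel_split_and_sync([add_one, double], 3)
chosen = exclusive_choice(lambda x: x > 0, add_one, double, -4)
```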

Workflow kerfuffle

• Workflow languages: BPEL (BPEL4WS, WSBPEL, WSFL, XLANG), BPML, WS-CDL (WSCL, WSCI), XPDL, BPSS, PSL, AGWL, DGL, DPML, GJobDL, GSFL, GFDL, GWorkflowDL, MoML, SWFL, YAWL, SCUFL/Xscufl, WPDL, PIF, OWL-S, xWFL, XPL, INCA

• Workflow engines: Taverna, Kepler, Pegasus, DiscoveryNet, Triana, SPA, Geodise, ICENI, Askalon, GridNexus, BioPipe, BizTalk, BPWS4J, DAGMan, GridAnt, GJH, GRMS, GWFE, GWES, ITIEE, JIGSA, Karajan, ScyFLOW, SDSC Matrix, SHOP2, wftk, YAWL Engine, WFEE

Astrogrid workflow components

• JES (Job Execution System)
  – Astrogrid workflow engine
  – Manages control flow
  – Runs steps in a controlled asynchronous fashion
• CEC (Common Execution Controller)
  – Manages step execution
  – Manages data flow
• CEA (Common Execution Architecture) apps
  – datacenters: support complex queries against archives
  – processing: consume data files and reduce them

Astrogrid workflow schematic

[Figure: Astrogrid workflow schematic]

• Components: Portal, Registry, MySpace, Command Line CEA, Datacenter CEA, JES, Client library, CEC
• Interactions: Save/load workflow, Save/load data, Resolve application, Application list, Submit workflow

Astrogrid workflow language

<workflow name="a workflow">
  <description>description of the workflow</description>
  <sequence/flow>
    <set var="dec" value="15"/>
    <step name="a" result-var="a-results">
      <tool name="toolA" interface="simpleInterface">
        <input>
          <parameter name="RA"><value>21</value></parameter>
          <parameter name="Dec"><value>${dec}</value></parameter>
        </input>
        <output>
          <parameter name="results" indirect="true">
            <value>ftp://aServer/myResults</value>
          </parameter>
        </output>
      </tool>
    </step>
    <step name="b">…
  </sequence/flow>
  <script>… <if test=…> <while test=…> <for var=… items=…> <parfor var=… items=…> <try> <catch>
</workflow>
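
To show how such a document can drive execution, the sketch below parses a well-formed cut-down version of the example and resolves the `${dec}` variable before the step's tool would be invoked. It assumes the container is written as a plain `<sequence>` element and handles only `<set>` and `<step>`. It is a toy reader, not the JES engine, and `resolve_steps` is a made-up name.

```python
import xml.etree.ElementTree as ET
from string import Template

WORKFLOW = """
<workflow name="a workflow">
  <description>description of the workflow</description>
  <sequence>
    <set var="dec" value="15"/>
    <step name="a" result-var="a-results">
      <tool name="toolA" interface="simpleInterface">
        <input>
          <parameter name="RA"><value>21</value></parameter>
          <parameter name="Dec"><value>${dec}</value></parameter>
        </input>
        <output>
          <parameter name="results" indirect="true">
            <value>ftp://aServer/myResults</value>
          </parameter>
        </output>
      </tool>
    </step>
  </sequence>
</workflow>
"""


def resolve_steps(xml_text: str):
    """Walk the sequence in order, tracking variables and expanding ${name} in inputs."""
    root = ET.fromstring(xml_text)
    variables = {}
    steps = []
    for node in root.find("sequence"):
        if node.tag == "set":
            variables[node.get("var")] = node.get("value")
        elif node.tag == "step":
            tool = node.find("tool")
            inputs = {
                param.get("name"): Template(param.findtext("value")).safe_substitute(variables)
                for param in tool.find("input")
            }
            steps.append((node.get("name"), tool.get("name"), inputs))
    return steps


steps = resolve_steps(WORKFLOW)
```

Python's `string.Template` happens to use the same `${name}` syntax as the workflow variables, which keeps the substitution to a single call.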

CEA

• Creates a uniform interface and model for an application and its parameters
• Provides a higher-level description than WSDL:
  – Restricts how interfaces can be expressed
  – Provides specific semantics for astronomical quantities
  – Adds extra information, such as default values and GUI labels
• VOResource extensions for a general application
• Provides asynchronous operation:
  – callback, polling and job identification
• Allows separate data and control flows
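
The three ingredients of asynchronous operation named above (a job identifier, polling, and a callback) can be sketched with a background thread. The class, method names and status strings are all assumptions made for illustration and are not taken from the CEA interfaces.

```python
import threading
import uuid
from typing import Any, Callable, Dict, Optional


class AsyncRunner:
    """Toy asynchronous executor: submit returns at once with a job identifier."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        self._results: Dict[str, Any] = {}
        self._threads: Dict[str, threading.Thread] = {}
        self._lock = threading.Lock()

    def submit(self, app: Callable[..., Any], params: Dict[str, Any],
               callback: Optional[Callable[[str, str, Any], None]] = None) -> str:
        job_id = uuid.uuid4().hex  # job identification
        with self._lock:
            self._status[job_id] = "RUNNING"

        def run() -> None:
            try:
                result, state = app(**params), "COMPLETED"
            except Exception as exc:  # record the failure instead of losing it
                result, state = exc, "ERROR"
            with self._lock:
                self._results[job_id] = result
                self._status[job_id] = state
            if callback is not None:
                callback(job_id, state, result)  # callback notification

        thread = threading.Thread(target=run)
        self._threads[job_id] = thread
        thread.start()
        return job_id

    def poll(self, job_id: str) -> str:
        """Polling: ask for the current state without blocking."""
        with self._lock:
            return self._status[job_id]

    def wait(self, job_id: str) -> Any:
        """Block until the job finishes and return its result."""
        self._threads[job_id].join()
        with self._lock:
            return self._results[job_id]


runner = AsyncRunner()
seen = []
job = runner.submit(lambda ra, dec: ra + dec, {"ra": 21, "dec": 15},
                    callback=lambda j, state, result: seen.append((j, state, result)))
total = runner.wait(job)
```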

Minimum CEA compliance

• Must implement CommonExecutionConnector interface

• Must send a message to services implementing ResultsListener interface

• Should send messages to services implementing JobMonitor interface

• Should perform basic type checking on all parameter types during init phase
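
As a rough illustration of these requirements, the sketch below checks parameter types during an init phase, always notifies results listeners, and also notifies job monitors. Only the role names (results listener, job monitor) come from the slide; the class, its methods and the status strings are hypothetical and do not reproduce the real CommonExecutionConnector interface.

```python
from typing import Any, Callable, Dict, List


class ToyCeaApplication:
    """Toy stand-in for an application wrapped behind a CEA-style connector."""

    def __init__(self, func: Callable[..., Any], param_types: Dict[str, type]) -> None:
        self.func = func
        self.param_types = param_types
        self.results_listeners: List[Callable[[str, Any], None]] = []  # must be notified
        self.job_monitors: List[Callable[[str, str], None]] = []       # should be notified

    def init(self, params: Dict[str, Any]) -> None:
        """Basic type checking of every declared parameter during the init phase."""
        for name, expected in self.param_types.items():
            if name not in params:
                raise TypeError(f"missing parameter: {name}")
            if not isinstance(params[name], expected):
                raise TypeError(f"{name} must be of type {expected.__name__}")

    def execute(self, job_id: str, params: Dict[str, Any]) -> None:
        self.init(params)
        self._notify_monitors(job_id, "RUNNING")
        result = self.func(**params)
        for listener in self.results_listeners:
            listener(job_id, result)
        self._notify_monitors(job_id, "COMPLETED")

    def _notify_monitors(self, job_id: str, status: str) -> None:
        for monitor in self.job_monitors:
            monitor(job_id, status)


app = ToyCeaApplication(lambda ra, dec: (ra, dec), {"ra": float, "dec": float})
results, statuses = [], []
app.results_listeners.append(lambda job_id, result: results.append((job_id, result)))
app.job_monitors.append(lambda job_id, status: statuses.append(status))
app.execute("job-1", {"ra": 21.0, "dec": 15.0})
```

Passing `{"ra": "21", "dec": 15.0}` would raise `TypeError` in `init`, before the application itself runs.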

