Planning on the Grid

Post on 14-Jan-2016


With slides contributed by Ewa Deelman and Yolanda Gil

USC INFORMATION SCIENCES INSTITUTE

Thinking about applications of planning

You’ve seen Planning as X, X ∈ {SAT, CSP, ILP, …}

Now: Y as Planning, Y ∈ {Grid/Web services composition, …}


Problem-solving on Grids

Users pool access to distributed resources (computers, instruments, data, ..)

Applications are often composed of separate components run at several locations

Grid middleware tools, e.g. the Globus Toolkit, allow for job scheduling and resource discovery


The Computational Grid

Emerging computational and networking infrastructure
  brings together compute resources, data storage systems, instruments, human resources

Enables entirely new approaches to applications and problem solving
  remote resources the rule, not the exception
  can solve ever bigger problems

Wide-area distributed computing
  national and international

Facilitates collaborative environments
  sharing of data, which can be expensive to produce (experimentation/simulation)


Example: LIGO Experiment (Laser Interferometer Gravitational-Wave Observatory)

Aims to detect gravitational waves predicted by the theory of relativity.

Can be used to detect
  binary pulsars
  mergers of black holes
  “starquakes” in neutron stars

Two installations: in Louisiana (Livingston) and Washington State

Other projects: Virgo (Italy), GEO (Germany), Tama (Japan)

Instruments are designed to measure the effect of gravitational waves on test masses suspended in vacuum.

Data collected during experiments is a collection of time series (multi-channel)

Analysis is performed in time and Fourier domains


LIGO’s Pulsar Search (Laser Interferometer Gravitational-Wave Observatory)

[Figure: pulsar search data flow. Labels: interferometer; raw channels; long time frames; store; archive; extract channel; single frame; short time frames; Short Fourier Transform (30 minutes); transpose; extract frequency range; construct image; time-frequency image (axes: Hz, time); find candidate; event DB]


Motivation: Using Today’s Grid

Users have high-level requirements naturally stated in terms of the application domain
  Ex: Obtain frequency spectrum for signal S in instrument I and timeframe T

Users have to turn these requirements into executable job workflows in detailed scripts

Users must figure out which code generates the desired products, which files contain them, the physical location of the files, hosts that support execution given code requirements, availability of hosts, access policies, etc.

Users must query Grid middleware: metadata catalog, replica locator, resource descriptor and monitoring, etc.

Users must oversee execution


Problems with today’s Grid

Usability: users must be proficient in grid computing

Complexity: many interrelated choices and dead ends

Solution cost: any-cost solutions are already hard

Global cost: optimization necessary when there is contention

Reliability of execution: job resubmission upon failure


Planning for workflow generation and maintenance

Outline:

Formalization as a planning problem

Integration with the grid middleware

Case study: planning for workflows in LIGO

The grid as a test bed for planning and scheduling research


[Figure: the application development and execution process, from application domain through abstract workflow and concrete workflow to execution environment. Example: "FFT" in the application domain; "FFT filea" in the abstract workflow; "transfer filea from host1://home/filea to host2://home/file1" and "/usr/local/bin/fft /home/file1" in the concrete workflow; data on host1 and host2 in the execution environment. Abstract workflow generation: application component selection. Concrete workflow generation: resource selection, data replica selection, transformation instance selection, data transfer. Failure recovery methods: retry, pick different resources, specify a different workflow.]


Desiderata for workflow generator

Allow users to refer to data requirements by descriptions, not file names
  Intuitive, requires far less input

Seek high-quality workflows according to a variable metric

Model a variety of constraints declaratively
  Data dependencies, resource constraints, user access rights, …


Planning for workflow generation and maintenance

Outline:

Formalization as a planning problem

Integration with the grid middleware

Case study: planning for workflows in LIGO

The grid as a test bed for planning and scheduling research


Planning for workflow generation

Application components as operators

Desired data as goals

World state includes available hosts, existing data products, network bandwidths, …
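
The mapping above (components as operators, desired data as goals, existing data as world state) can be sketched as a tiny forward-search planner. This is a minimal illustration, not the planner used in the slides: the `Operator` class, the `plan` function and the three component names are assumptions made up for the example, and hosts and bandwidths are left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """An application component: needs some data products, creates others."""
    name: str
    preconds: frozenset  # data products that must already exist
    adds: frozenset      # data products created by running the component


def plan(initial, goals, operators):
    """Breadth-first forward search; returns a shortest list of operator names, or None."""
    start = frozenset(initial)
    goals = frozenset(goals)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goals <= state:
            return steps
        for op in operators:
            # Apply a component only if its inputs exist and it produces something new.
            if op.preconds <= state and not op.adds <= state:
                nxt = state | op.adds
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [op.name]))
    return None


# Hypothetical components, loosely modelled on the pulsar search pipeline.
OPERATORS = [
    Operator("extract-channel", frozenset({"raw-frames"}), frozenset({"channel"})),
    Operator("short-fourier-transform", frozenset({"channel"}), frozenset({"sft"})),
    Operator("pulsar-search", frozenset({"sft"}), frozenset({"pulsar-result"})),
]

workflow = plan({"raw-frames"}, {"pulsar-result"}, OPERATORS)
print(workflow)
```

If the initial state already contains an intermediate product such as `"sft"`, the search returns a shorter workflow, which is how existing data products get reused.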


Existing tools for building workflows: abstract workflow generation

Chimera
  Input-output transforms for files, in ‘Virtual Data Language’:

DV third1->pulsar(
  a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
  b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd"},
  t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q",
  fcenter="50.5", fband="0.004", instrument="H2", ra="3.123643", de="+2.56234",
  fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");


Planning operator

(operator pulsar-search
  (preconds
    ((<start-time> 7143800)
     (<channel> LSC-AS-Q)
     (<fcenter> 0.5)
     (<right-ascension> 50)
     (<sample-rate> 20)
     …)
    (and (created "H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd")))
  (effects ()
    ((add (created "H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd")))))


Operator with metadata parameters

(operator pulsar-search
  (preconds
    ((<start-time> Number)
     (<channel> Channel)
     (<fcenter> Number)
     (<right-ascension> Number)
     (<sample-rate> Number)
     (<file> File-Handle)
     ;; These two are parameters for the frequency-extract.
     (<f0> (and Number (get-low-freq-from-center-and-band <fcenter> <fband>)))
     (<fN> (and Number (get-high-freq-from-center-and-band <fcenter> <fband>)))
     …)
    (and (forall ((<sub-sft-file-group>
                   (and File-Group-Handle
                        (gen-sub-sft-range-for-pulsar-search
                          <f0> <fN> <start-time> <end-time> <sub-sft-file-group>))))
           (and (sub-sft-group <start-time> <end-time> <channel> <instrument> <format>
                               <f0> <fN> <sample-rate> <sub-sft-file-group>)
                (at <sub-sft-file-group> <host>)))))
  (effects ()
    ((add (created <file>))
     (add (pulsar <start-time> <end-time> <channel> <instrument> <format>
                  <fcenter> <fband> <fderv1> <fderv2> <fderv3> <fderv4> <fderv5>
                  <right-ascension> <declination> <sample-rate> <file>)))))


Operator with host identified

(operator pulsar-search
  (preconds
    ((<host> (or Condor-pool Mpi))
     (<start-time> Number)
     (<channel> Channel)
     (<fcenter> Number)
     (<right-ascension> Number)
     (<sample-rate> Number)
     (<file> File-Handle)
     ;; These two are parameters for the frequency-extract.
     (<f0> (and Number (get-low-freq-from-center-and-band <fcenter> <fband>)))
     (<fN> (and Number (get-high-freq-from-center-and-band <fcenter> <fband>)))
     (<run-time> (and Number
                      (estimate-pulsar-search-run-time
                        <start-time> <end-time> <sample-rate> <f0> <fN> <host> <run-time>)))
     …)
    (and (available pulsar-search <host>)
         (forall ((<sub-sft-file-group>
                   (and File-Group-Handle
                        (gen-sub-sft-range-for-pulsar-search
                          <f0> <fN> <start-time> <end-time> <sub-sft-file-group>))))
           (and (sub-sft-group <start-time> <end-time> <channel> <instrument> <format>
                               <f0> <fN> <sample-rate> <sub-sft-file-group>)
                (at <sub-sft-file-group> <host>)))))
  (effects ()
    ((add (created <file>))
     (add (at <file> <host>))
     (add (pulsar <start-time> <end-time> <channel> <instrument> <format>
                  <fcenter> <fband> <fderv1> <fderv2> <fderv3> <fderv4> <fderv5>
                  <right-ascension> <declination> <sample-rate> <file>)))))


Planning for workflow generation

Application components as operators
  Parameters include host: plan is a concrete workflow

Desired data (in descriptive form) as goals

World state includes available hosts, existing data products, network bandwidths, …


Operator descriptions

Represent applying a given component at a particular location with fixed parameters, inputs and outputs.

Preconditions combine
  data dependencies – derive input requirements from outputs
  task constraints – e.g. component must be run on an MPI machine


Plan quality

Objective function may include
  Performance – expected runtime, variance
  Reliability – probability of failure, expected number of retries
  Computational cost – use of ‘expensive’ resources, conformance to policies
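
One way to combine these criteria is a weighted sum, sketched below. The slides do not give the actual objective function, so the `Step` fields, the weights and the assumption of independent retry attempts (which gives an expected 1/(1 - p) attempts for failure probability p) are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Step:
    runtime: float  # expected runtime of one attempt, in seconds
    p_fail: float   # probability that one attempt fails (0 <= p_fail < 1)
    cost: float     # charge for using the resource, arbitrary units


def expected_attempts(p_fail):
    """Expected number of tries until success, assuming independent attempts."""
    return 1.0 / (1.0 - p_fail)


def plan_score(steps, w_time=1.0, w_retry=1.0, w_cost=1.0):
    """Weighted sum over a sequential plan; lower is better."""
    time = sum(s.runtime * expected_attempts(s.p_fail) for s in steps)
    retries = sum(expected_attempts(s.p_fail) - 1.0 for s in steps)
    cost = sum(s.cost for s in steps)
    return w_time * time + w_retry * retries + w_cost * cost


# Two hypothetical single-step plans: a fast but unreliable host, and a slower reliable one.
fast_but_flaky = [Step(runtime=100, p_fail=0.5, cost=1)]
slow_but_reliable = [Step(runtime=150, p_fail=0.0, cost=1)]
print(plan_score(fast_but_flaky), plan_score(slow_but_reliable))
```

With equal weights the reliable plan scores better here, because the expected retries on the flaky host double its effective runtime.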


Using local heuristics and global metrics

Need local heuristics since the search space is intractable
  e.g. prefer a host for a program with a high-bandwidth connection to where the output is required

Need to test a global metric (e.g. overall runtime) since local heuristics can lead to globally poor solutions
  Create as many plans as possible, return best
  Search control to eliminate redundant solutions


Example search heuristics

(control-rule only-transfer-from-loc-with-greatest-bandwidth

(if (and (current-ops (transfer-file))

(current-goal (at <file> <dest>))

(true-in-state (at <file> <loc1>))

(true-in-state (at <file> <loc2>))

(higher-bandwidth <loc1> <loc2> <dest>)))

(then reject bindings ((<from-loc> . <loc2>))))

(control-rule prefer-mpi-to-condor-for-pulsar-search

(if (and (current-ops (pulsar-search))

(type-of <mpi> Mpi)

(type-of <condor> Condor-pool)))

(then prefer bindings ((<host> . <mpi>)) ((<host> . <condor>))))
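
The effect of the two control rules above can be approximated in ordinary code. The sketch below is an interpretation, not a translation of the planner's rule engine: the first function rejects every transfer source except the highest-bandwidth one, the second only reorders candidate hosts so that MPI hosts are tried first. The function names, host names and the bandwidth table are hypothetical.

```python
def best_transfer_sources(file_locations, dest, bandwidth):
    """Reject rule: keep only the source locations with the highest bandwidth to dest."""
    if not file_locations:
        return []
    best = max(bandwidth[(loc, dest)] for loc in file_locations)
    return [loc for loc in file_locations if bandwidth[(loc, dest)] == best]


def order_hosts(hosts, host_type):
    """Prefer rule: try MPI hosts before Condor pools, without discarding any host."""
    # sorted() is stable, so hosts of the same type keep their original order.
    return sorted(hosts, key=lambda h: 0 if host_type[h] == "Mpi" else 1)


# Hypothetical bandwidths (arbitrary units) and host types.
bandwidth = {("host1", "host3"): 10, ("host2", "host3"): 100}
sources = best_transfer_sources(["host1", "host2"], "host3", bandwidth)

host_type = {"pool-a": "Condor-pool", "cluster-b": "Mpi", "pool-c": "Condor-pool"}
ordered = order_hosts(["pool-a", "cluster-b", "pool-c"], host_type)
print(sources, ordered)
```

The distinction matters for completeness: a reject rule prunes alternatives outright, while a prefer rule keeps them available if the preferred binding fails.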


Planning for workflow generation and maintenance

Outline:

Formalization as a planning problem

Integration with the grid middleware

The grid as a test bed for planning and scheduling research


[Figure: system architecture. Workflow planning: Request Manager, Current State Generator, AI-based Planner, Submission and Monitoring System. Execution: workflow executor (DAGMan) running on the Grid. Information and models: Metadata Catalog Service, Globus Replica Location Service, Globus Monitoring and Discovery Service, resource models. Labelled flows: high-level specs of desired results and intermediate data products; dynamic information; concrete workflow; raw data from the detector.]


Generating the planning problem

Currently, static file representation for available hosts, bandwidths

Query grid services prior to planning to find which relevant files exist
  Future versions will make dynamic queries

Goal is translated from user request, plan is translated into DAG format suitable for grid scheduler.
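
The last step, turning a plan into a DAG for the scheduler, amounts to deriving dependency edges from which step produces the files another step consumes. The sketch below assumes a plan given as a list of (job name, input files, output files) tuples and renders it with DAGMan's `JOB` and `PARENT ... CHILD` keywords; the job names and the `.sub` submit file names are made up, and the real translator in the system may work differently.

```python
def plan_to_dag(steps):
    """steps: list of (job name, input files, output files), in plan order.

    Returns (jobs, edges); an edge (a, b) means b consumes a file that a produces.
    """
    producer = {}
    jobs, edges = [], []
    for name, inputs, outputs in steps:
        jobs.append(name)
        for f in inputs:
            if f in producer:
                edge = (producer[f], name)
                if edge not in edges:  # avoid duplicate edges for multiple shared files
                    edges.append(edge)
        for f in outputs:
            producer[f] = name
    return jobs, edges


def to_dagman(jobs, edges):
    """Render as DAGMan-style lines (submit file names are hypothetical)."""
    lines = [f"JOB {j} {j}.sub" for j in jobs]
    lines += [f"PARENT {a} CHILD {b}" for a, b in edges]
    return "\n".join(lines)


# Hypothetical two-step plan: stage a file, then run an FFT on it.
steps = [
    ("transfer", [], ["/home/file1"]),
    ("fft", ["/home/file1"], ["/home/file1.fft"]),
]
jobs, edges = plan_to_dag(steps)
dag_text = to_dagman(jobs, edges)
print(dag_text)
```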


LIGO’s Pulsar Search at SC’02

Used LIGO’s data collected during the first scientific run of the instrument

Targeted a set of 1000 locations: known pulsar or random locations

Results of the analysis published to the LIGO Scientific Collaboration

Performed using LDAS and compute and storage resources at Caltech, University of Southern California, University of Wisconsin Milwaukee.


Summary: benefits of planning

Automating workflow composition
  Just being addressed in Grid middleware

Reasoning with explicit descriptions of data
  More intuitive for users
  Far fewer inputs required than at file level

Better workflows by searching many plans


Planning for workflow generation and maintenance

Outline:

Existing Grid tools for workflow generation

Formalization as a planning problem

Integration with the grid middleware

The grid as a test bed for planning and scheduling research


Many areas of planning research relevant for grid

Planning for a dynamic environment: plan monitoring and repair, planning under uncertainty

Scheduling: resource reasoning, temporal reasoning

Plan quality: learning, acquiring preferences, local search planning

Planning for information gathering: integrating access to grid services with workflow creation

Domain modeling: handling multiple ontologies, acquiring metadata descriptions, acquiring operators


Fault-tolerant planning for a dynamic environment

Grid resources become unavailable, queue length & network bandwidth change

Exploring plan repair strategies, balance of work done off-line and on-line

Modeling failures, keeping statistics for creating plans more likely to succeed, conditional plans, ..
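
As one concrete reading of "keeping statistics", a planner could count job outcomes per host and prefer hosts with a better record. The sketch below is an assumption about how that might look, not the system's method; it uses Laplace smoothing (a standard estimator chosen here for illustration) so that a host with no history gets an estimate of 0.5. The class and host names are hypothetical.

```python
from collections import defaultdict


class FailureStats:
    """Running counts of job outcomes per host."""

    def __init__(self):
        self.successes = defaultdict(int)
        self.failures = defaultdict(int)

    def record(self, host, ok):
        """Record one finished job: ok is True for success, False for failure."""
        if ok:
            self.successes[host] += 1
        else:
            self.failures[host] += 1

    def success_rate(self, host):
        """Laplace-smoothed estimate; unseen hosts get 0.5 rather than 0 or 1."""
        s, f = self.successes[host], self.failures[host]
        return (s + 1) / (s + f + 2)

    def most_reliable(self, hosts):
        return max(hosts, key=self.success_rate)


stats = FailureStats()
for _ in range(9):
    stats.record("host-a", True)
stats.record("host-a", False)
stats.record("host-b", True)
for _ in range(3):
    stats.record("host-b", False)

choice = stats.most_reliable(["host-a", "host-b", "host-c"])
print(choice)
```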


Fault-tolerant straw men

1. Current version: build fully detailed plan offline, resource allocation is fixed
  Ignores world dynamics

2. Build abstract plan (without specifying hosts) offline, use a matchmaker online
  Matchmaker makes local decisions only


Global reasoning is needed for resource allocation

[Figure: task graph from Start to Finish with tasks A (3), B (1) and C (5)]


Approaches for fault-tolerant planning in dynamic domains

RAX (Jonsson et al.): general framework. As implemented:
  offline: builds complete plan
  online: adjusts temporal intervals

Combining planning and scheduling
  offline: build several abstract plans
  online: reason about critical path to instantiate each plan

MDP/POMDP approaches

Open area…
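
Reasoning about the critical path, mentioned above for the online phase, means finding the longest chain of dependent tasks, since that chain bounds the overall runtime and is where the best resources pay off most. A minimal sketch follows; the task names, durations and dependency structure are hypothetical and are not taken from the slides.

```python
def critical_path(durations, preds):
    """durations: task -> time; preds: task -> list of predecessor tasks (must form a DAG).

    Returns (length, path) for the longest-duration chain of dependent tasks.
    """
    finish, best_pred = {}, {}

    def earliest_finish(task):
        if task not in finish:
            start, via = 0, None
            for p in preds.get(task, []):
                t = earliest_finish(p)
                if t > start:
                    start, via = t, p
            finish[task] = start + durations[task]
            best_pred[task] = via
        return finish[task]

    end = max(durations, key=earliest_finish)
    path = [end]
    while best_pred[path[-1]] is not None:
        path.append(best_pred[path[-1]])
    return finish[end], path[::-1]


# Hypothetical workflow: extract feeds both fft and merge; search follows fft.
durations = {"extract": 3, "fft": 1, "search": 5, "merge": 2}
preds = {"fft": ["extract"], "search": ["fft"], "merge": ["extract"]}
length, path = critical_path(durations, preds)
print(length, path)
```

Tasks off the critical path (here `merge`) have slack, so a scheduler can give them slower hosts without lengthening the workflow.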


Challenge: understanding when different approaches are more important

Hypotheses:
  Uneven task distribution, in terms of computational and data expense and resource constraints, will indicate global planning
  Time-dependency, e.g. need to re-plan during execution, will indicate local planning

Interesting project: use experiments in synthetic and real domains to test hypotheses and uncover new insights


Empirical tests with synthetic LIGO problems

Example: Problem requires 100 files on one machine. Vary the number that exist.

[Figure: "distribution - 1 machine". Run-time (y-axis, ticks 300 to 800) against number of files (x-axis); series: min, max, p-max, g-max, avg]


Domain modeling

Monolithic planner

Knowledge from several sources must be used

Current system:

[Figure: current system. Info from Grid services (RLS, MCS etc.) supplies state info (files, resources) to knowledge bases combined in one location: task requirements, existing data in files, available resources. The planner sends concrete tasks to Grid task schedulers. Other labels: component selector, resource selector, resource policies, resource queues, execution monitor, network bandwidth, user policies.]


Where does knowledge used by our planners come from?

[Figure: an operator schema, (Operator … (preconditions …) (effects …)), annotated with its knowledge sources: data dependencies (VDL*), task resource requirements, user policies & preferences, resource policies]

Each knowledge component is used for other purposes beyond planning


Automatically generated operators for several application domains

[Figure: an operator schema, (Operator … (preconditions …) (effects …)), generated from data dependencies (VDL*), task resource requirements and policies]

Investigating patterns of data descriptions for more efficient planning

Application domains: digital sky survey, LIGO, GEO, galaxy morphology, tomography


Question: if operators are gathered from distributed services, can we still guarantee soundness and completeness?

Under what kinds of conditions?


Representing appropriate information units with metadata

E.g. Have 60,000 files, want to allocate 60 tasks each dealing with 1,000 files.

Previously, application components specified in terms of specific files:

DV run59000->extractSFTData(
  input=[@{input:"nSFT.59000"},…,@{input:"nSFT.59999"}],
  output=[@{output:"eSFT.59000"},…,@{output:"eSFT.59999"}],
  t1="714384000", t2="714384063", freq="1008", band="4", instrument="H2");

… 59 similar clauses…

DV final->computeFStatistic(
  input=[@{input:"eSFT.00000"},…,@{input:"eSFT.59999"}],…);

(Each extractSFTData clause lists 1000 files; computeFStatistic lists all 60000 files.)


Metadata representation

Replace with two clauses, two input predicates

A predicate now represents a range of files

Simpler to model, greater generality, more efficient for reasoner

(operator run-extractSFTData-range
  (preconds
    ((<begin-file> Number)
     (<number-of-files> (and Number (> <number-of-files> 0)))
     (<local-begin-file> (and Number
                              (gen-smaller-number <number-of-files> 1000 <begin-file>))))
    (and (range "eSFT" <begin-file> 2 1 <local-begin-file>)
         (range "nSFT" <local-begin-file> 2 1 999)))
  (effects ()
    ((add (range "eSFT" <begin-file> 2 <number-of-files>)))))
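
To see why ranges are cheaper than file lists, consider the 60,000-file example: each task can be described by a begin index and a count instead of 1,000 file names. The sketch below shows that partitioning under the assumption that files are numbered consecutively from 0; the `partition` helper is hypothetical and does not model the extra arguments of the `range` predicate in the operator above, whose meaning the slides do not explain.

```python
def partition(total_files, files_per_task):
    """Split file indices 0..total_files-1 into (begin, count) ranges, one per task."""
    return [
        (begin, min(files_per_task, total_files - begin))
        for begin in range(0, total_files, files_per_task)
    ]


ranges = partition(60_000, 1_000)
print(len(ranges), ranges[0], ranges[-1])
```

Sixty pairs of integers replace sixty clauses of a thousand file names each, and the last range shrinks automatically when the total is not a multiple of the task size.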


Requires library operators for ranges

E.g. if a range of files exists, then so does any subrange

Questions: what are the required operators? Similar to spatial calculus RCC-8?

(operator subranges-exist
  (preconds
    ((<begin-file> Number)
     (<type> Object)
     (<number-of-files> (and Number (> <number-of-files> 0)))
     (<enclosing-begin> (and Number
        (gen-known-enclosing-begins <type> <begin-file> 2 1 <number-of-files>)))
     (<enclosing-number-of-files> (and Number
        (gen-known-enclosing-number-of-files <type> <enclosing-begin> 2 1 <number-of-files> <begin-file>))))
    (created-range <type> <enclosing-begin> 2 1 <enclosing-number-of-files>))
  (effects ()
    ((add (created-range <type> <begin-file> 2 1 <number-of-files>)))))
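
The inference this library operator encodes, "if a range of files exists, then so does any subrange", reduces to an interval containment check. The sketch below represents a known range as a hypothetical (type, begin, count) triple and ignores the other arguments of `created-range`. It only checks containment in a single known range and does not merge adjacent ranges, which a fuller range calculus would need.

```python
def subrange_exists(known_ranges, file_type, begin, count):
    """True if files begin..begin+count-1 lie inside one known range of the same type."""
    if count <= 0:
        return False
    end = begin + count
    return any(
        t == file_type and b <= begin and end <= b + n
        for t, b, n in known_ranges
    )


# Hypothetical state: two ranges of files are known to exist.
known = [("eSFT", 0, 1000), ("nSFT", 59000, 1000)]
print(subrange_exists(known, "eSFT", 100, 50))   # inside the eSFT range
print(subrange_exists(known, "eSFT", 990, 20))   # runs past the end of the range
```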


Conclusions

Implemented system takes data description requests from LIGO users, composes workflow and executes on the Grid

Planning and scheduling technologies can make a large contribution to Grid infrastructure

Many interesting challenges for planning and scheduling research from Grid applications

http://www.isi.edu/ikcap/cognitive-grids

http://www.isi.edu/~deelman/pegasus.htm


Koehler and Srivastava

Different approaches to specifying workflows by hand


WSDL service specification (no workflow specified)

<definitions targetNamespace="http://..." xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="OrderEvent"></message>
  <message name="TripRequest"></message>
  <message name="FlightRequest"></message>
  <message name="HotelRequest"></message>
  <message name="BookingFailure"></message>
  <portType name="pt1">
    <operation name="CToCI"><input message="TripRequest"/></operation>
  </portType>
  <portType name="pt2">
    <operation name="CIToHS"><output message="HotelRequest"/></operation>
  </portType>
  <portType name="pt3">
    <operation name="CIToFS"><output message="FlightRequest"/></operation>
  </portType>
  ...
  <portType name="pt9">
    <operation name="RIToFS"><output message="BookingFailure"/></operation>
  </portType>
</definitions>


BPEL4WS

<sequence>
  <receive partner="Customer" portType="pt1" operation="CToCI" container="OrderEvent"></receive>
  <flow>
    <invoke partner="HotelService" portType="pt2" operation="CIToHS" inputContainer="HotelRequest"></invoke>
    <invoke partner="FlightService" portType="pt3" operation="CIToFS" inputContainer="FlightRequest"></invoke>
  </flow>
  …
</sequence>


Golog


Back-up slides


What is Needed

We need alternative foundations that offer
  expressive representations
  flexible reasoners

Many Artificial Intelligence (AI) techniques are relevant:
  Planning to achieve given requirements
  Searching through problem spaces of related choices
  Using and combining heuristics
  Expressive knowledge representation languages
  Reasoners that can incorporate rules, definitions, axioms, etc.
  Schedulers and resource allocation techniques


Existing tools for building workflows: abstract workflow generation

Chimera
  Input-output transforms at level of actual files, in ‘Virtual Data Language’:

DV first1->createSFT(
  b=@{output:"H2_SFT_LSC-AS-Q_714384000_64.gwf"},
  t1="714384000", t2="714384063", format="frame", channel="H2:LSC-AS-Q", instrument="H2");

DV first2->createSFT(
  b=@{output:"H2_SFT_LSC-AS-Q_714384064_64.gwf"},
  t1="714384064", t2="714384127", format="frame", channel="H2:LSC-AS-Q", instrument="H2");

DV third1->pulsar(
  a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
  b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.123643_+2.56234.ilwd"},
  t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q",
  fcenter="50.5", fband="0.004", instrument="H2", ra="3.123643", de="+2.56234",
  fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");



Existing tools 2: concrete planner

Assigns specific hosts and data locations for tasks

Makes random selection of resources and data

Provided a feasible solution

Reused existing data products

[Figure: an abstract workflow (input F.a; steps Extract, Resample, Decimate, Concat; intermediate files F.b1, F.b2, F.c1, F.c2; output F.d) mapped to a concrete workflow. Data transfer nodes: "Gridftp host://f.a ….lumpy.isi.edu/nfs/temp/f.a". Execution nodes: "lumpy.isi.edu://usr/local/bin/extract", "Jet.caltech.edu://home/malcom/resample -I /home/malcolm/F.b1", Concat. Replica catalog registration nodes: "Register /F.d at home/malcolm/f2".]


Sample Pulsar Search Results to Date

SC 2002 run:
  Over 58 pulsar searches
  Total of 330 tasks, 469 data transfers, 330 output files produced
  The total runtime was 11:24:35

To date:
  185 pulsar searches
  Total of 975 tasks, 1365 data transfers, 975 output files
  Total runtime 96:49:47