A Resource Management Architecture for Metacomputing Systems

Karl Czajkowski¹, Ian Foster², Nick Karonis², Carl Kesselman¹, Stuart Martin², Warren Smith², and Steven Tuecke²

{karlcz, itf, karonis, carl, smartin, wsmith, tuecke}@globus.org
http://www.globus.org

¹ Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292-6695, http://www.isi.edu
² Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, http://www.mcs.anl.gov

Abstract. Metacomputing systems are intended to support remote and/or concurrent use of geographically distributed computational resources. Resource management in such systems is complicated by five concerns that do not typically arise in other situations: site autonomy and heterogeneous substrates at the resources, and application requirements for policy extensibility, co-allocation, and online control. We describe a resource management architecture that addresses these concerns. This architecture distributes the resource management problem among distinct local manager, resource broker, and resource co-allocator components and defines an extensible resource specification language to exchange information about requirements. We describe how these techniques have been implemented in the context of the Globus metacomputing toolkit and used to implement a variety of different resource management strategies. We report on our experiences applying our techniques in a large testbed, GUSTO, incorporating 15 sites, 330 computers, and 3600 processors.

1 Introduction

Metacomputing systems allow applications to assemble and use collections of computational resources on an as-needed basis, without regard to physical location. Various groups are implementing such systems and exploring applications in distributed supercomputing, high-throughput computing, smart instruments, collaborative environments, and data mining [10, 12, 18, 20, 22, 6, 25].

This paper is concerned with resource management for metacomputing: that is, with the problems of locating and allocating computational resources, and with authentication, process creation, and other activities required to prepare a resource for use.

Dror G. Feitelson, Larry Rudolph (Eds.): JSSPP'98, LNCS 1459, pp. 62-82, 1998. © Springer-Verlag Berlin Heidelberg 1998


We do not address other issues that are traditionally associated with scheduling (such as decomposition, assignment, and execution ordering of tasks) or the management of other resources such as memory, disk, and networks.

The metacomputing environment introduces five challenging resource management problems: site autonomy, heterogeneous substrate, policy extensibility, co-allocation, and online control.

1. The site autonomy problem refers to the fact that resources are typically owned and operated by different organizations, in different administrative domains [5]. Hence, we cannot expect to see commonality in acceptable use policy, scheduling policies, security mechanisms, and the like.

2. The heterogeneous substrate problem derives from the site autonomy problem and refers to the fact that different sites may use different local resource management systems [16], such as Condor [18], NQE [1], CODINE [11], EASY [17], LSF [28], PBS [14], and LoadLeveler [15]. Even when the same system is used at two sites, different configurations and local modifications often lead to significant differences in functionality.

3. The policy extensibility problem arises because metacomputing applications are drawn from a wide range of domains, each with its own requirements. A resource management solution must support the frequent development of new domain-specific management structures, without requiring changes to code installed at participating sites.

4. The co-allocation problem arises because many applications have resource requirements that can be satisfied only by using resources simultaneously at several sites. Site autonomy and the possibility of failure during allocation introduce a need for specialized mechanisms for allocating multiple resources, initiating computation on those resources, and monitoring and managing those computations.

5. The online control problem arises because substantial negotiation can be required to adapt application requirements to resource availability, particularly when requirements and resource characteristics change during execution. For example, a tele-immersive application that needs to simulate a new entity may prefer a lower-resolution rendering, if the alternative is that the entity not be modeled at all. Resource management mechanisms must support such negotiation.

As we explain in Section 2, no existing resource management system addresses all five problems. Some batch queuing systems support co-allocation, but not site autonomy, policy extensibility, and online control [16]. Condor supports site autonomy, but not co-allocation or online control [18]. Gallop [26] addresses online control and policy extensibility, but not the heterogeneous substrate or co-allocation problems. Legion [12] does not address the heterogeneous substrate problem.

In this paper, we describe a resource management architecture that we have developed to address the five problems. In this architecture, developed in the context of the Globus project [10], we address problems of site autonomy and


heterogeneous substrate by introducing entities called resource managers to provide a well-defined interface to diverse local resource management tools, policies, and security mechanisms. To support online control and policy extensibility, we define an extensible resource specification language that supports negotiation between different components of a resource management architecture, and we introduce resource brokers to handle the mapping of high-level application requests into requests to individual managers. We address the problem of co-allocation by defining various co-allocation strategies, which we encapsulate in resource co-allocators.

One measure of success for an architecture such as this is its usability in a practical setting. To this end, we have implemented and deployed this architecture on GUSTO, a large computational grid testbed comprising 15 sites, 330 computers, and 3600 processors, using LSF, NQE, LoadLeveler, EASY, Fork, and Condor as local schedulers. To date, this architecture and testbed have been used by ourselves and others to implement numerous applications and half a dozen different higher-level resource management strategies. This experiment represents a significant step forward in terms of the number of global metacomputing services implemented and the number and variety of commercial and experimental local resource management systems employed. A more quantitative evaluation of the approach remains a significant challenge for future work.

The rest of this paper is structured as follows. In the next section, we review current distributed resource management solutions. In subsequent sections we first outline our architecture and then examine each major function in detail: the resource specification language, local resource managers, resource brokers, and resource co-allocators. We summarize the paper and discuss future work in Section 8.

2 Resource Management Approaches

Previous work on resource management for metacomputing systems can be broken into two broad classes:

- Network batch queuing systems. These systems focus strictly on resource management issues for a set of networked computers. They do not address policy extensibility and provide only limited support for online control and co-allocation.

- Wide-area scheduling systems. Here, resource management is performed as a component of mapping application components to resources and scheduling their execution. To date, these systems do not address issues of heterogeneous substrates, site autonomy, and co-allocation.

In the following, we use representative examples of these two types of system to illustrate the strengths and weaknesses of current approaches.


2.1 Networked Batch Queuing Systems

Networked batch queuing systems, such as NQE [1], CODINE [11], LSF [28], PBS [14], and LoadLeveler [15], handle user-submitted jobs by allocating resources from a networked pool of computers. The user characterizes application resource requirements either explicitly, by some type of job control language, or implicitly, by selecting the queue to which a request is submitted. Networked batch queuing systems typically are designed for single administrative domains, making site autonomy difficult to achieve. Likewise, the heterogeneous substrate problem is also an issue because these systems generally assume that they are the only resource management system in operation. One exception is the CODINE system, which introduces the concept of a transfer queue to allow jobs submitted to CODINE to be allocated by some other resource management system, at a reduced level of functionality. An alternative approach to supporting substrate heterogeneity is being explored by the PSCHED [13] initiative. This project is attempting to define a uniform API through which a variety of batch scheduling systems may be controlled. The goals of PSCHED are similar in many ways to those of the Globus Resource Allocation Manager described in Section 5.

Batch scheduling systems provide a limited form of policy extensibility in that resource management policy is set by either the system or the system administrator, through the creation of scheduling policies or batch queues. However, this capability is not available to the end users, who have little control over how the batch scheduling system interprets their resource requirements.

Finally, we observe that batch queuing systems have limited support for online allocation, as these systems are designed to support applications in which the requirements specifications are of the form "get X done soon", where X is precisely defined but "soon" is not. In metacomputing applications, we have more complex, fluid constraints, in which we will want to make tradeoffs between time (when) and space (physical characteristics). Such constraints lead to a need for the resource management system to provide capabilities such as negotiation, inquiry interfaces, information-based control, and co-allocation, none of which are provided in these systems.

In summary, batch scheduling systems do not provide in themselves a complete solution to metacomputing resource management problems. However, clearly some of the mechanisms developed for resource location, distributed process control, and remote file access, to name a few, can be applied to wide-area systems as well. Furthermore, we note that network batch queuing systems will necessarily be part of the local resource management solution. Hence, any metacomputing resource management architecture must be able to interface to these systems.

2.2 Wide-Area Scheduling Systems

We now examine how resource management is addressed within systems developed specifically to schedule metacomputing applications. To gain a good perspective on the range of possibilities, we discuss four different schedulers, designed variously to support specific classes of applications (Gallop [26]), an


extensible object-oriented system (Legion [12]), general classes of parallel programs (PRM [22]), and high-throughput computation (Condor [18]).

The Gallop [26] system allocates and schedules tasks defined by a static task graph onto a set of networked computational resources. (A similar mechanism has been used in Legion [27].) Resource allocation is implemented by a scheduling manager, which coordinates scheduling requests, and a local manager, which manages the resources at a local site, potentially interfacing to site-specific scheduling and resource allocation services. This decomposition, which we also adopt, separates local resource management operations from global resource management policy and hence facilitates solutions to the problems of site autonomy, heterogeneous substrates, and policy extensibility. However, Gallop does not appear to handle authentication to local resource management services, thereby limiting the level of site autonomy that can be achieved.

The use of a static task-graph model makes online control in Gallop difficult. Resource selection is performed by attempting to minimize the execution time of the task graph as predicted by a performance model for the application and the prospective resource. However, because the minimization procedure and the cost model are fixed, there is no support for policy extensibility. Legion [12] overcomes this limitation by leveraging its object-oriented model. Two specialized objects, an application-specific Scheduler and a resource-specific Enactor, negotiate with one another to make allocation decisions. The Enactor can also provide co-allocation functions.

Gallop supports co-allocation for resources maintained within an administrative domain, but depends for this purpose on the ability to reserve resources. Unfortunately, reservation is not currently supported by most local resource management systems. For this reason, our architecture does not rely on reservation to perform co-allocation, but rather uses a separate co-allocation management service to perform this function.

The Prospero Resource Manager [22] (PRM) provides resource management functions for parallel programs written by using the PVM message-passing library. PRM consists of three components: a system manager, a job manager, and a node manager. The job manager makes allocation decisions, while the system and node managers actually allocate resources. The node manager is solely responsible for implementing resource allocation functions. Thus, PRM does not address issues of site autonomy or substrate heterogeneity. A variety of job managers can be constructed, allowing for policy extensibility, although there is no provision for composing job managers so as to extend an existing management policy. As in our architecture, PRM has both an information infrastructure (Prospero [21]) and a management API, providing the infrastructure needed to perform online control. However, unlike our architecture, PRM does not support co-allocation of resources.

Condor [18] is a resource management system designed to support high-throughput computations by discovering idle resources on a network and allocating those resources to application tasks. While Condor does not interface with existing resource management systems, resources controlled by Condor are


deallocated as soon as the "rightful" owner starts to use them. In this sense, Condor supports site autonomy and heterogeneous substrates. However, Condor currently does not interoperate with local resource authentication, limiting the degree of autonomy a site can assert. Condor provides an extensible resource description language, called classified ads, which provides limited control over resource selection to both the application and resource. However, the matching of application component to resource is performed by a system classifier, which defines how matches (and consequently resource management) take place, limiting the extensibility of this selection policy. Finally, Condor provides no support for co-allocation or online control.

Fig. 1. The Globus resource management architecture, showing how RSL specifications pass between application, resource brokers, resource co-allocators, and local managers (GRAMs). Notice the central role of the information service.

In summary, our review of current resource management approaches revealed a range of valuable services, but no single system that provides solutions to all five metacomputing resource management problems posed in the introduction.

3 Our Resource Management Architecture

Our approach to the metacomputing resource management problem is illustrated in Figure 1. In this architecture, an extensible resource specification language (RSL), discussed in Section 4 below, is used to communicate requests for resources between components: from applications to resource brokers, resource


co-allocators, and resource managers. At each stage in this process, information about resource requirements, coded as an RSL expression by the application, is refined by one or more resource brokers and co-allocators; information about resource availability and characteristics is obtained from an information service.

Resource brokers are responsible for taking high-level RSL specifications and transforming them into more concrete specifications through a process we call specialization. As illustrated in Figure 2, multiple brokers may be involved in servicing a single request, with application-specific brokers translating application requirements into more concrete resource requirements, and different resource brokers being used to locate available resources that meet those requirements.

Transformations effected by resource brokers generate a specification in which the locations of the required resources are completely specified. Such a ground request can be passed to a co-allocator, which is responsible for coordinating the allocation and management of resources at multiple sites. As we describe in Section 7, a variety of co-allocators will be required in a metacomputing system, providing different co-allocation semantics.
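The specialization process can be sketched as a chain of rewrite steps, each broker replacing abstract parameters with more concrete ones. The broker functions, parameter names, cost model, and manager address below are hypothetical illustrations, not part of the Globus implementation:

```python
# Illustrative sketch of broker specialization: each broker rewrites a
# requirement dictionary into a more concrete one. All parameter names,
# conversion factors, and the manager address are invented for illustration.

def dis_broker(request):
    """Application-specific broker: translate a simulation entity count
    into abstract compute requirements (assumed cost model)."""
    out = dict(request)
    entities = out.pop("entities")
    out["gflops"] = entities // 1000
    out["memory-gb"] = entities // 1000
    return out

def supercomputer_broker(request):
    """Resource broker: turn abstract compute requirements into a ground
    request naming a (placeholder) resource manager."""
    out = dict(request)
    out["count"] = out.pop("gflops") * 2           # assumed nodes-per-Gflop
    out["resourcemanager"] = "manager.example.org:8711"  # placeholder address
    return out

def specialize(request, brokers):
    """Refine a request by passing it through a chain of brokers."""
    for broker in brokers:
        request = broker(request)
    return request

ground = specialize({"entities": 100_000}, [dis_broker, supercomputer_broker])
```

Once every component of `ground` names a resource manager, the request is ground in the sense defined above and can be handed onward to a co-allocator.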

Fig. 2. This view of the Globus resource management architecture shows how different types of broker can participate in a single resource request. Application requests such as "I want to run a distributed interactive simulation involving 100,000 entities", "I want to perform a parameter study involving 10,000 separate trials", and "I want to create a shared virtual space with participants X, Y, and Z" are refined by domain-specific brokers (a DIS-specific broker, which produces "supercomputers providing 100 Gflops, 100 GB, < 100 msec"; a parameter study-specific broker; a collaborative environment-specific resource broker) and a supercomputer resource broker, in consultation with the information service. The resulting ground request, "80 nodes on Argonne SP, 256 nodes on CIT Exemplar, 300 nodes on NCSA O2000", is passed to a simultaneous-start co-allocator, which submits "Run SF-Express" requests to the Argonne, CIT, and NCSA resource managers.


Resource co-allocators break a multirequest (that is, a request involving resources at multiple sites) into its constituent elements and pass each component to the appropriate resource manager. As discussed in Section 5, each resource manager in the system is responsible for taking an RSL request and translating it into operations in the local, site-specific resource management system.

The information service is responsible for providing efficient and pervasive access to information about the current availability and capability of resources. This information is used to locate resources with particular characteristics, to identify the resource manager associated with a resource, to determine properties of that resource, and for numerous other purposes as high-level resource specifications are translated into requests to specific managers. We use the Globus system's Metacomputing Directory Service (MDS) [8] as our information service. MDS uses the data representation and application programming interface (API) defined on the Lightweight Directory Access Protocol (LDAP) to meet requirements for uniformity, extensibility, and distributed maintenance. It defines a data model suitable for distributed computing applications, able to represent computers and networks of interest, and provides tools for populating this data model. LDAP defines a hierarchical, tree-structured name space called a directory information tree (DIT). Fields within the namespace are identified by a unique distinguished name (DN). LDAP supports both distribution and replication. Hence, the local service associated with MDS is exactly an LDAP server (or a gateway to another LDAP server, if multiple sites share a server), plus the utilities used to populate this server with up-to-date information about the structure and state of the resources within that site. The global MDS service is simply the ensemble of all these servers. An advantage of using MDS as our information service is that resource management information can be used by other tools, as illustrated in Figure 3.
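To make the DIT/DN terminology concrete, the toy model below stores directory entries under hierarchical distinguished names. All DNs and attribute values are invented for illustration and do not reflect MDS's actual schema or the LDAP wire protocol:

```python
# Toy model of an LDAP-style directory information tree (DIT): entries live
# under hierarchical distinguished names (DNs). The DNs and attributes here
# are invented for illustration.

dit = {
    "o=Globus, o=ANL": {"type": "organization"},
    "cn=sp-node-1, o=Globus, o=ANL": {"memory": 128, "network": "atm"},
    "cn=sp-node-2, o=Globus, o=ANL": {"memory": 64, "network": "ethernet"},
}

def lookup(dn):
    """Resolve a distinguished name to its attribute set (or None)."""
    return dit.get(dn)

def search(base_dn, attr, value):
    """Return the DNs under base_dn whose attribute equals value."""
    return [dn for dn, attrs in dit.items()
            if dn.endswith(base_dn) and attrs.get(attr) == value]

atm_nodes = search("o=Globus, o=ANL", "network", "atm")
```

A broker performing resource location would issue queries like `search` against the information service; the hierarchical DN suffix plays the role of the LDAP search base.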

4 Resource Specification Language

We now discuss the resource specification language itself. The syntax of an RSL specification, summarized in Figure 4, is based on the syntax for filter specifications in the Lightweight Directory Access Protocol and MDS. An RSL specification is constructed by combining simple parameter specifications and conditions with the operators &, to specify the conjunction of parameter specifications; |, to express the disjunction of parameter specifications; and +, to combine two or more requests into a single compound request, or multirequest.

The set of parameter-name terminal symbols is extensible: resource brokers, co-allocators, and resource managers can each define a set of parameter names that they will recognize. For example, a resource broker that is specialized for tele-immersive applications might accept as input a specification containing a frames-per-second parameter and might generate as output a specification containing an mflops-per-second parameter, to be passed to a broker that deals with computational resources. Resource managers, the system components that



Fig. 3. The GlobusView tool uses MDS information about resource manager status to present information about the current status of a metacomputing testbed. On the left, we see the sites that are currently participating in the testbed; on the right is information about the total number of nodes that each site is contributing, the number of those nodes that are currently available to external users, and the usage of those nodes by Globus users.

specification := request
request       := multirequest | conjunction | disjunction | parameter
multirequest  := + request-list
conjunction   := & request-list
disjunction   := | request-list
request-list  := ( request ) request-list | ( request )
parameter     := parameter-name op value
op            := = | > | < | >= | <= | !=

Fig. 4. RSL syntax summary.
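The grammar above is small enough to parse with a few lines of recursive descent. The sketch below follows the productions in Figure 4 but is only an illustration, not the Globus RSL parser:

```python
# Recursive-descent parser for the RSL grammar above. Compound requests
# become ("+" | "&" | "|", [children]) tuples; parameter specifications
# become ("param", name, op, value) tuples.

def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "&|+()":
            tokens.append(c)
            i += 1
        elif c in "=<>!":
            # two-character operators: >=, <=, !=
            if i + 1 < len(text) and text[i + 1] == "=":
                tokens.append(text[i:i + 2])
                i += 2
            else:
                tokens.append(c)
                i += 1
        else:
            j = i
            while (j < len(text) and text[j] not in "&|+()=<>!"
                   and not text[j].isspace()):
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens

def parse(tokens):
    """request := multirequest | conjunction | disjunction | parameter"""
    tok = tokens.pop(0)
    if tok in ("+", "&", "|"):
        children = []
        while tokens and tokens[0] == "(":
            tokens.pop(0)                       # consume '('
            children.append(parse(tokens))
            if tokens.pop(0) != ")":
                raise ValueError("expected ')'")
        return (tok, children)
    # parameter := parameter-name op value
    return ("param", tok, tokens.pop(0), tokens.pop(0))

tree = parse(tokenize("&(count=5)(memory>=64)"))
```

Applied to the conjunction/disjunction example given later in this section, the parser yields a `&` node whose second child is the `|` of the two alternative node counts.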


actually talk to local scheduling systems, recognize two types of parameter-name terminal symbols:

- MDS attribute names, used to express constraints on resources: for example, memory>=64 or network=atm. In this case, the parameter name refers to a field defined in the MDS entry for the resource being allocated. The truth of the parameter specification is determined by comparing the value provided in the specification with the current value associated with the corresponding field in the MDS. Arbitrary MDS fields can be specified by providing their full distinguished name.

- Scheduler parameters, used to communicate information regarding the job, such as count (number of nodes required), max_time (maximum time required), executable, arguments, directory, and environment (environment variables). Scheduler parameters are interpreted directly by the resource manager.

For example, the specification

&(executable=myprog)
 (|(&(count=5)(memory>=64))
   (&(count=10)(memory>=32)))

requests 5 nodes with at least 64 MB memory, or 10 nodes with at least 32 MB. In this request, executable and count are scheduler attribute names, while memory is an MDS attribute name.

Our current RSL parser and resource manager disambiguate these two parameter types on the basis of the parameter name. That is, the resource manager knows which fields it will accept as scheduler parameters and assumes all others are MDS attribute names. Name clashes can be disambiguated by using the complete distinguished name for the MDS field in question.

The ability to include constraints on MDS attribute values in RSL specifications is important. As we discuss in Section 5, the state of resource managers is stored in MDS. Hence, resource specifications can refer to resource characteristics such as queue length, expected wait time, and number of processors available. This technique provides a powerful mechanism for controlling how an RSL specification is interpreted.
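Checking an MDS attribute constraint such as memory>=64 against the directory's current value can be sketched as below; the operator set matches the RSL grammar, while the node's attribute values are invented:

```python
import operator

# Evaluate a single RSL parameter specification against the current MDS
# fields of a resource. The attribute values here are invented.
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def coerce(value):
    """Compare numerically when possible, otherwise as strings."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value

def satisfies(mds_entry, name, op, value):
    """True if the entry has the named field and the comparison holds."""
    current = mds_entry.get(name)
    if current is None:
        return False
    return OPS[op](coerce(current), coerce(value))

node = {"memory": "128", "network": "atm"}    # invented MDS fields
ok = satisfies(node, "memory", ">=", "64") and satisfies(node, "network", "=", "atm")
```

In the architecture, this comparison happens against live MDS state, which is what lets a specification select, say, managers with short queues or idle processors.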

    The following example of a multirequest is derived from the example shown in Figure 2.

        +(&(count=80)(memory>=64M)(executable=sf_express)
           (resourcemanager=ico16.mcs.anl.gov:8711))
         (&(count=256)(network=atm)(executable=sf_express)
           (resourcemanager=neptune.cacr.caltech.edu:755))
         (&(count=300)(memory>=64M)(executable=sf_express)
           (resourcemanager=modi4.ncsa.edu:4000))

    This is a ground request: every component of the multirequest specifies a resource manager. A co-allocator can use the resourcemanager parameters specified in this request to determine to which resource manager each component of the multirequest should be submitted.
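    Because the request is ground, this routing step reduces to grouping components by their resourcemanager value. The sketch below illustrates the idea; the `route` function and the component dictionaries are hypothetical stand-ins for the co-allocator's internal representation, with contact strings taken from the example above.

    ```python
    # Sketch of how a co-allocator might route a ground multirequest:
    # each component names a resource manager, so routing is a simple
    # grouping step (function and data shapes are illustrative).

    def route(multirequest):
        """Map each component of a ground multirequest to the resource
        manager named by its 'resourcemanager' parameter. Raises if any
        component is not ground."""
        routing = {}
        for component in multirequest:
            try:
                manager = component["resourcemanager"]
            except KeyError:
                raise ValueError("not a ground request: component lacks "
                                 "a resourcemanager parameter")
            routing.setdefault(manager, []).append(component)
        return routing

    reqs = [{"count": 80, "executable": "sf_express",
             "resourcemanager": "ico16.mcs.anl.gov:8711"},
            {"count": 256, "executable": "sf_express",
             "resourcemanager": "neptune.cacr.caltech.edu:755"}]
    routing = route(reqs)
    # each resource manager contact now maps to its component list
    ```
    
    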


    72 Karl Czajkowski et al.

    Notations intended for similar purposes include the Condor "classified ad" [18] and Chapin's "task description vector" [5]. Our work is novel in three respects: the tight integration with a directory service, the use of specification rewriting to express broker operations (as described below), and the fact that the language and associated tools have been implemented and demonstrated effective when layered on top of numerous different low-level schedulers.

    We conclude this section by noting that it is the combination of resource brokers, information service, and RSL that makes online control possible in our architecture. Together, these services make it possible to construct requests dynamically, based on current system state and negotiation between the application and the underlying resources.

    5 Local Resource Management

    We now describe the lowest level of our resource management architecture: the local resource managers, implemented in our architecture as Globus Resource Allocation Managers (GRAMs). A GRAM is responsible for

    1. processing RSL specifications representing resource requests, by either denying the request or by creating one or more processes (a "job") that satisfy that request;
    2. enabling remote monitoring and management of jobs created in response to a resource request; and
    3. periodically updating the MDS information service with information about the current availability and capabilities of the resources that it manages.

    A GRAM serves as the interface between a wide area metacomputing environment and an autonomous entity able to create processes, such as a parallel computer scheduler or a Condor pool. Hence, a resource manager need not correspond to a single host or a specific computer, but rather to a service that acts on behalf of one or more computational resources. This use of local scheduler interfaces was first explored in the software environment for the I-WAY networking experiment [9], but is extended and generalized significantly here to provide a richer and more flexible interface.

    A resource specification passed to a GRAM is assumed to be ground: that is, to be sufficiently concrete that the GRAM can identify local resources that meet the specification without further interaction with the entity that generated the request. A particular GRAM implementation may achieve this goal by scheduling resources itself or, more commonly, by mapping the resource specification into a request to some local resource allocation mechanism. (To date, we have interfaced GRAMs to six different schedulers or resource allocators: Condor, EASY, Fork, LoadLeveler, LSF, and NQE.) Hence, the GRAM API plays a role for resource management similar to that played by IP for communication: it can co-exist with local mechanisms, just as IP rides on top of ethernet, FDDI, or ATM networking technology.


    A Resource Management Architecture for Metacomputing Systems 73

    The GRAM API provides functions for submitting and for canceling a job request and for asking when a job (submitted or not) is expected to run. An implementation of the latter function may use queue time estimation techniques [24]. When a job is submitted, a globally unique job handle is returned that can then be used to monitor and control the progress of the job. In addition, a job submission call can request that the progress of the requested job be signaled asynchronously to a supplied callback URL. Job handles can be passed to other processes, and callbacks do not have to be directed to the process that submitted the job request. These features of the GRAM design facilitate the implementation of diverse higher-level scheduling strategies. For example, a high-level broker or co-allocator can make a request on behalf of an application, while the application monitors the progress of the request.
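    The submit-then-monitor pattern just described can be sketched as below. The names (`submit_job`, `Job`, the contact string) are hypothetical illustrations, not the real GRAM C API; the point is only that the handle and callback are decoupled from the submitting process.

    ```python
    # Illustrative sketch of the GRAM submit/monitor pattern described
    # above. All names here are hypothetical stand-ins.

    class Job:
        """A handle to a remote job; state transitions are signaled to an
        optional callback, which need not live in the submitting process."""

        def __init__(self, handle, callback=None):
            self.handle = handle      # globally unique job handle
            self.state = "pending"
            self.callback = callback

        def transition(self, new_state):
            self.state = new_state
            if self.callback is not None:
                self.callback(self.handle, new_state)

    def submit_job(rsl, callback=None):
        """Submit an RSL specification and return a job handle that can
        be passed to other processes for monitoring and control."""
        contact = "https://gatekeeper.example.org/1234"   # hypothetical
        return Job(contact, callback)

    events = []
    job = submit_job("&(executable=myprog)(count=5)",
                     callback=lambda handle, state: events.append(state))
    job.transition("active")
    job.transition("done")
    # events records each asynchronous state notification, in order
    ```
    
    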

    5.1 GRAM Scheduling Model

    We discuss briefly the scheduling model defined by GRAM because this is relevant to subsequent discussion of co-allocation. This model is illustrated in Figure 5, which shows the state transitions that may be experienced by a GRAM job.

    Fig. 5. State transition diagram for resource allocation requests submitted to the GRAM resource management API

    When submitted, the job is initially pending, indicating that resources have not yet been allocated to the job. At some point, the job is allocated the requested resources, and the application starts running. The job then transitions to the active state. At any point prior to entering the done state, the job can be terminated, causing it to enter the failed state. A job can fail because of explicit termination, an error in the format of the request, a failure in the underlying resource management system, or a denial of access to the resource. The source of the failure is provided as part of the notification of state transition. When all of the processes in the job have terminated and resources have been deallocated, the job enters the done state.
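    The transition rules just described can be encoded as a small table; this is an illustrative sketch (the real GRAM implements these states in its C job manager), useful for checking which moves the model permits.

    ```python
    # The GRAM job lifecycle of Figure 5, encoded as a transition table
    # (illustrative sketch, not the actual implementation).

    TRANSITIONS = {
        "pending": {"active", "failed"},   # start running, or fail/terminate
        "active":  {"done", "failed"},     # finish, or fail/terminate
        "failed":  set(),                  # terminal
        "done":    set(),                  # terminal
    }

    def advance(state, new_state):
        """Return new_state if the move is legal under the GRAM model,
        otherwise raise."""
        if new_state not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state} -> {new_state}")
        return new_state

    s = advance("pending", "active")
    s = advance(s, "done")
    # advance("done", "active") would raise: done is a terminal state
    ```
    
    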


    5.2 GRAM Implementation

    The GRAM implementations that we have constructed have the structure shown in Figure 6. The principal components are the GRAM client library, the gatekeeper, the RSL parsing library, the job manager, and the GRAM reporter. The Globus security infrastructure (GSI) is used for authentication and for authorization.

    Fig. 6. Major components of the GRAM implementation. Those represented by thick-lined ovals are long-lived processes, while the thin-lined ovals are short-lived processes created in response to a request.

    The GRAM client library is used by an application or a co-allocator acting on behalf of an application. It interacts with the GRAM gatekeeper at a remote site to perform mutual authentication and transfer a request, which includes a resource specification and a callback (described below).

    The gatekeeper is an extremely simple component that responds to a request by performing three tasks: mutual authentication of user and resource, determining a local user name for the remote user, and starting a job manager which executes as that local user and actually handles the request. The first two security-related tasks are performed by calls to the Globus security infrastructure (GSI), which handles issues of site autonomy and substrate heterogeneity


    in the security domain. To start the job manager, the gatekeeper must run as a privileged program: on Unix systems, this is achieved via suid or inetd. However, because the interface to the GSI is small and well defined, it is easy for organizations to approve (and port) the gatekeeper code. In fact, the gatekeeper code has successfully undergone security reviews at a number of large supercomputer centers. The mapping of the remote user to a locally recognized user name minimizes the amount of code that must run as a privileged program; it also allows us to delegate most authorization issues to the local system.

    The job manager is responsible for creating the actual processes requested by the user. This task typically involves submitting a resource allocation request to the underlying resource management system, although if no such system exists on a particular resource, a simple fork may be performed. Once processes are created, the job manager is also responsible for monitoring the state of the created processes, notifying the callback contact of any state transitions, and implementing control operations such as process termination. The job manager terminates once the job for which it is responsible has terminated.

    The GRAM reporter is responsible for storing into MDS various information about scheduler structure (e.g., whether the scheduler supports reservation and the number of queues) and state (e.g., total number of nodes, number of nodes currently available, currently active jobs, and expected wait time in a queue). An advantage of implementing the GRAM reporter as a distinct component is that MDS reports can continue even when no gatekeeper or job manager is running: for example, when the gatekeeper is run from inetd.

    As noted above, GRAM implementations have been constructed for six local schedulers to date: Condor, LSF, NQE, Fork, EASY, and LoadLeveler. Much of the GRAM code is independent of the local scheduler, and so only a relatively small amount of scheduler-specific code needed to be written in each case. In most cases, this code comprises shell scripts that use the local scheduler's user-level API. State transitions are handled mostly by polling, because this proved to be more reliable than monitoring job processes by using mechanisms provided by the local schedulers.

    6 Resource Brokers

    As noted above, we use the term resource broker to denote an entity in our architecture that translates abstract resource specifications into more concrete specifications. As illustrated in Figure 2, this definition is broad enough to encompass a variety of behaviors, including application-level schedulers [3] that encapsulate information about the types of resource required to meet a particular performance requirement, resource locators that maintain information about the availability of various types of resource, and (ultimately) traders that create markets for resources. In each case, the broker uses information maintained locally, obtained from MDS, or contained in the specification to specialize the specification, mapping it into a new specification that contains more detail. Requests can be passed to several brokers, effectively composing the behaviors of


    those brokers, until eventually the specification is specialized to the point that it identifies a specific resource manager. This specification can then be passed to the appropriate GRAM or, in the case of a multirequest, to a resource co-allocator.

    We claim that our architecture makes it straightforward to develop a variety of higher-level schedulers. In support of this claim, we note that following the definition and implementation of GRAM services, a variety of people, including people not directly involved in GRAM definition, were able to construct half a dozen resource brokers quite quickly. We describe three of these here.
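    A single specialization step of the kind a broker performs can be sketched as follows. The candidate list and the `specialize` helper are invented for illustration (a real broker would draw candidates from MDS), but the shape of the operation matches the description above: rewrite a specification that lacks a resource manager into one that names a manager satisfying its constraints.

    ```python
    # Sketch of broker-style specification rewriting (illustrative):
    # replace an abstract specification with a more ground one by
    # choosing a resource that satisfies its constraints.

    CANDIDATES = [  # would come from MDS in a real broker; invented here
        {"contact": "ico16.mcs.anl.gov:8711", "free_nodes": 96, "memory": 128},
        {"contact": "modi4.ncsa.edu:4000",    "free_nodes": 40, "memory": 64},
    ]

    def specialize(spec):
        """Rewrite a spec lacking a resourcemanager into one naming a
        manager that satisfies its count and memory constraints."""
        if "resourcemanager" in spec:
            return spec                      # already ground
        for res in CANDIDATES:
            if (res["free_nodes"] >= spec["count"]
                    and res["memory"] >= spec.get("memory", 0)):
                return dict(spec, resourcemanager=res["contact"])
        raise LookupError("no resource satisfies the specification")

    ground = specialize({"count": 64, "memory": 64, "executable": "myprog"})
    # ground now names a specific resource manager and can go to a GRAM
    ```

    Composing brokers then amounts to applying such rewriting steps repeatedly until the specification is ground.
    
    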

    6.1 Nimrod-G

    David Abramson and Jonathan Giddy are using GRAM mechanisms to develop Nimrod-G, a wide-area version of the Nimrod [2] tool. Nimrod automates the creation and management of large parametric experiments. It allows a user to run a single application under a wide range of input conditions and then to aggregate the results of these different runs for interpretation. In effect, Nimrod transforms file-based programs into interactive "meta-applications" that invoke user programs much as we might call subroutines.

    When a user first requests that a computational experiment be performed, Nimrod-G queries MDS to locate suitable resources. It uses information in MDS entries to identify sufficient nodes to perform the experiment. The initial Nimrod-G prototype operates by generating a number of independent jobs, which are then allocated to computational nodes using GRAM. This module hides the nature of the execution mechanism on the underlying platform from Nimrod, hence making it possible to schedule work using a variety of different queue managers without modification to the Nimrod scripts. As a result, a reasonably complex cluster computing system could be retargeted for wide-area execution with relatively little effort.

    In the future, the Nimrod-G developers plan to provide a higher-level broker that allows the user to specify time and cost constraints. These constraints will be used to select computational nodes that can meet user requirements for time and cost or, if the constraints cannot be met, to explain the nature of the cost/time tradeoffs. As part of this work, a dynamic resource allocation module is planned that will monitor the state of each system and relocate work when necessary in order to meet the deadlines.

    6.2 AppLeS

    Rich Wolski has used GRAM mechanisms to construct an application-level scheduler (AppLeS) [3] for a large, loosely coupled problem from computational mathematics. As in Nimrod-G, the goal was to map a large number of independent tasks to a dynamically varying pool of available computers. GRAM mechanisms were used to locate resources (including parallel computers) and to initiate and manage computation on those resources. AppLeS itself provided


    fault tolerance, so that errors reported by GRAM would result in a task being resubmitted elsewhere.

    6.3 A Graphical Resource Selector

    The graphical resource selector (GRS) illustrated in Figure 7 is an example of an interactive resource selector constructed with our services. This Java application allows the user to build up a network representing the resources required for an application; another network can be constructed to monitor the status of candidate physical resources. A combination of automatic and manual techniques is then used to guide resource selection, eventually generating an RSL specification for the resources in question. MDS services are used to obtain the information used for resource monitoring and selection, and resource co-allocator services are used to generate the GRAM requests required to execute a program once a resource selection is made.

    Fig. 7. A screenshot of the Graphical Resource Selector. This network shows three candidate resources and associated network connections. Static information regarding operating system version and dynamically updated information regarding the number of currently available nodes (free nodes) and network latency and bandwidth (in msec and Mb/s, respectively) allow the user to select appropriate resources for a particular experiment.

    7 Resource Co-allocation

    Through the actions of one or more resource brokers, the requirements of an application are refined into a ground RSL expression. If the expression consists


    of a single resource request, it can be submitted directly to the manager that controls that resource. However, as discussed above, a metacomputing application often requires that several resources, such as two or more computers and intervening networks, be allocated simultaneously. In these cases, a resource broker produces a multirequest, and co-allocation is required. The challenge in responding to a co-allocation request is to allocate the requested resources in a distributed environment, across two or more resource managers, where global state, such as availability of a set of resources, is difficult to determine.

    Within our resource management architecture, multirequests are handled by an entity called a resource co-allocator. In brief, the role of a co-allocator is to split a request into its constituent components, submit each component to the appropriate resource manager, and then provide a means for manipulating the resulting set of resources as a whole: for example, for monitoring job status or terminating the job. Within these general guidelines, a range of different co-allocation services can be constructed. For example, we can imagine allocators that

    - mirror current GRAM semantics: that is, require all resources to be available before the job is allowed to proceed, and fail globally if failure occurs at any resource;
    - allocate at least N out of M requested resources and then return; or
    - return immediately, but gradually return more resources as they become available.

    Each of these services is useful to a class of applications. To date, we have had the most experience with a co-allocator that takes the first of these approaches: that is, extends GRAM semantics to provide for simultaneous allocation of a collection of resources, enabling the distributed collection of processes to be treated as a unit. We discuss this co-allocator in more detail.

    Fundamental to a GRAM-style concurrent allocation algorithm is the ability to determine whether the desired set of resources is available at some time in the future. If the underlying local schedulers support reservation, this question can be easily answered by obtaining a list of available time slots from each participating resource manager and choosing a suitable time slot [23]. Ideally, this scheme would use transaction-based reservations across a set of resource managers, as provided by Gallop [26]. In the absence of transactions, the ability either to make a tentative reservation or to retract an existing reservation is needed. However, in general, a reservation-based strategy is limited because currently deployed local resource management solutions do not support reservation.

    In the absence of reservation, we are forced to use indirect methods to achieve concurrent allocation. These methods optimistically allocate resources in the hope that the desired set will be available at some "reasonable" time in the future. Guided by sources of information such as the current availability of resources (provided by MDS) or queue-time estimation [24, 7], a resource broker can construct an RSL request that is likely, but not guaranteed, to succeed. If for some reason the allocation eventually fails, all of the started jobs must be terminated. This approach has several drawbacks:


    - It is inefficient in that computational resources are wasted while waiting for all of the requested resources to become available.
    - We need to ensure that application components do not start to execute before the co-allocator can determine whether the request will succeed. Therefore, the application must perform a barrier operation to synchronize startup across components, meaning that the application must be altered beyond what is required for GRAM.
    - Detecting failure of a request can be difficult if some of the request components are directed to resource managers that interface to queue-based local resource management systems. In these situations, a timeout must be used to detect failure.

    However, in spite of all of these drawbacks, co-allocation can frequently be achieved in practice as long as the resource requirements are not large compared with the capacity of the metacomputing system.

    We have implemented a GRAM-compatible co-allocator that implements a job abstraction in which multiple GRAM subjobs are collected into a single distributed job entity. State information for the distributed job is synthesized from the individual states of each subjob, and job control (e.g., cancellation) is automatically propagated to the resource managers at each subjob site. Subjobs are started independently and, as discussed above, must perform a runtime check-in operation. With the exception of this check-in operation, the co-allocator interface is a drop-in replacement for GRAM.

    We have used this co-allocator to manage resources for SF-Express [19, 4], a large-scale distributed interactive simulation application. Using our co-allocator and the GUSTO testbed, we were able to simultaneously obtain 852 compute nodes on three different architectures located at six different computer centers, controlled by three different local resource managers. The use of a co-allocation service significantly simplified the process of resource allocation and application startup.

    Running SF-Express "at scale" on a realistic testbed allowed us to study the scalability of our co-allocation strategy. One clear lesson learned is that the strict "all or nothing" semantics of the distributed job abstraction severely limits scalability. Even if each individual parallel computer is reasonably reliable and well understood, the probability of subjob failure due to improper configuration, network error, authorization difficulties, and the like increases rapidly as the number of subjobs increases. Yet many such failure modes resulted simply from a failure to allocate a specific instance of a commodity resource, for which an equivalent resource could easily have been substituted. Because such failures frequently occur after a large number of subjobs have been successfully allocated, it would be desirable to make the substitution dynamically, rather than to cancel all the allocations and start over.
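    The synthesis of a distributed-job state from individual subjob states, under the "all or nothing" semantics discussed above, can be sketched as a simple combining rule. The function below is an illustration of that rule, not the actual co-allocator code.

    ```python
    # Sketch of synthesizing a distributed-job state from GRAM subjob
    # states under "all or nothing" semantics (illustrative).

    def distributed_state(subjob_states):
        """Combine subjob states into one state for the whole job:
        any failure fails the job; the job is done only when all
        subjobs are done; it is active only once every subjob has
        at least started."""
        if "failed" in subjob_states:
            return "failed"
        if all(s == "done" for s in subjob_states):
            return "done"
        if all(s in ("active", "done") for s in subjob_states):
            return "active"
        return "pending"

    assert distributed_state(["active", "pending"]) == "pending"
    assert distributed_state(["active", "active"]) == "active"
    assert distributed_state(["active", "failed"]) == "failed"
    ```

    The first rule is exactly what limits scalability at large subjob counts: one failed subjob fails the entire distributed job.
    
    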

    We plan to extend the current co-allocation structure to support such dynamic job-structure modification. By passing information about the nature of the subjob failure out of the co-allocator, a resource broker can edit the specification, effectively implementing a backtracking algorithm for distributed resource


    allocation. Note that we can encode the necessary information about failure in a modified version of the original RSL request, which can be returned to the component that originally requested the co-allocation services. In this way, we can iterate through the resource-broker/co-allocation components of the resource management architecture until an acceptable collection of resources has been acquired on behalf of the application.

    8 Conclusions

    We have described a resource management architecture for metacomputing systems that addresses requirements of site autonomy, heterogeneous substrates, policy extensibility, co-allocation, and online control. This architecture has been deployed and applied successfully in a large testbed comprising 15 sites, 330 computers, and 3600 processors, within which LSF, NQE, LoadLeveler, EASY, Fork, and Condor were used as local schedulers.

    The primary focus of our future work in this area will be on the development of more sophisticated resource broker and resource co-allocator services within our architecture, and on the extension of our resource management architecture to encompass other resources such as disk and network. We are also interested in the question of how policy information can be encoded so as to facilitate automatic negotiation of policy requirements by resources, users, and processes such as brokers acting as intermediaries.

    Acknowledgments

    We gratefully acknowledge the contributions made by many colleagues to the development of the GUSTO testbed and the Globus resource management architecture: in particular, Doru Marcusiu at NCSA and Bill Saphir at NERSC. This work was supported by DARPA under contract N66001-96-C-8523, and by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.

    R e f e r e n c e s[1] C ra y R e s e a rc h , 1997 . D oc um e n t N um be r IN -2153 2 / 97 .[2] D. Ab ram son, R . Sosic, J . Giddy, and B . Hall . N imrod: A too l for per fo rm ing pa-r a m e t e r i s e d s i mu l a ti ons u s ing d i s t r i bu t e d w orks ta t ions . I n Proc . ~ th IE EE Symp.

    on High Performance Distributed Com puting. IEE E C om put e r S oc i e t y P re s s , 1995 .[3] F . B erm an, R . Wolski, S . F igue i ra , J . Schopf , and G . Shao. App l ica t ion- leve ls c he du li ng on d i s t r ibu t e d he t e roge ne ous ne t w orks. I n Proceedings of Supercom-puting '96. ACM Press , 1996.[4] S. B r un e t t a nd T . G o t t s c ha l k . S c a la b l e M odS A F s imu l a t i ons w i t h mo re t ha n50,000 vehicles us ing m ult iple scalable paral le l processors . In Proceedings of theSimulation Interoperability Workshop, 1997.

[5] S. Chapin. Distributed scheduling support in the presence of autonomy. In Proc. Heterogeneous Computing Workshop, pages 22-29, 1995.
[6] Joseph Czyzyk, Michael P. Mesnier, and Jorge J. Moré. The Network-Enabled Optimization System (NEOS) Server. Preprint MCS-P615-0996, Argonne National Laboratory, Argonne, Illinois, 1996.
[7] A. Downey. Predicting queue times on space-sharing parallel computers. In Proceedings of the 11th International Parallel Processing Symposium, 1997.
[8] S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, and S. Tuecke. A directory service for configuring high-performance distributed computations. In Proc. 6th IEEE Symp. on High Performance Distributed Computing, pages 365-375. IEEE Computer Society Press, 1997.
[9] I. Foster, J. Geisler, W. Nickless, W. Smith, and S. Tuecke. Software infrastructure for the I-WAY metacomputing experiment. Concurrency: Practice & Experience, 1998. To appear.
[10] I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11(2):115-128, 1997.
[11] GENIAS Software GmbH. CODINE: Computing in distributed networked environments, 1995. http://www.genias.de/genias/english/codine.html.
[12] A. Grimshaw, W. Wulf, J. French, A. Weaver, and P. Reynolds, Jr. Legion: The next logical step toward a nationwide virtual computer. Technical Report CS-94-21, Department of Computer Science, University of Virginia, 1994.
[13] The PSCHED API Working Group. PSCHED: An API for parallel job/resource management version 0.1, 1996. http://parallel.nas.nasa.gov/PSCHED/.
[14] R. Henderson and D. Tweten. Portable Batch System: External reference specification. Technical report, NASA Ames Research Center, 1996.
[15] International Business Machines Corporation, Kingston, NY. IBM LoadLeveler: User's Guide, September 1993.
[16] J. Jones and C. Brickell. Second evaluation of job queuing/scheduling software: Phase 1 report. NAS Technical Report NAS-97-013, NASA Ames Research Center, Moffett Field, CA 94035-1000, 1997. http://science.nas.nasa.gov/Pubs/TechReports/NASreports/NAS-97-013/jms.eval.rep2.html.
[17] David A. Lifka. The ANL/IBM SP scheduling system. In The IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 187-191, April 1995.
[18] M. Litzkow, M. Livny, and M. Mutka. Condor - a hunter of idle workstations. In Proc. 8th Intl Conf. on Distributed Computing Systems, pages 104-111, 1988.
[19] P. Messina, S. Brunett, D. Davis, T. Gottschalk, D. Curkendall, L. Ekroot, and H. Siegel. Distributed interactive simulation for synthetic forces. In Proceedings of the 11th International Parallel Processing Symposium, 1997.
[20] K. Moore, G. Fagg, A. Geist, and J. Dongarra. Scalable networked information processing environment (SNIPE). In Proceedings of Supercomputing '97, 1997.
[21] B. C. Neuman. Prospero: A tool for organizing internet resources. Electronic Networking: Research, Applications, and Policy, 2(1):30-37, Spring 1992.
[22] B. C. Neuman and S. Rao. The Prospero resource manager: A scalable framework for processor allocation in distributed systems. Concurrency: Practice & Experience, 6(4):339-355, 1994.
[23] R. Ramamoorthi, A. Rifkin, B. Dimitrov, and K. M. Chandy. A general resource reservation framework for scientific computing. In Scientific Computing in Object-Oriented Parallel Environments, pages 283-290. Springer-Verlag, 1997.
[24] W. Smith, I. Foster, and V. Taylor. Predicting application run times using historical information. Lecture Notes in Computer Science, 1998.

[25] Amin Vahdat, Eshwar Belani, Paul Eastham, Chad Yoshikawa, Thomas Anderson, David Culler, and Michael Dahlin. WebOS: Operating system services for wide area applications. In 7th Symposium on High Performance Distributed Computing, July 1998. To appear.
[26] J. Weissman. Gallop: The benefits of wide-area computing for parallel processing. Technical report, University of Texas at San Antonio, 1997.
[27] J. Weissman and A. Grimshaw. A federated model for scheduling in wide-area systems. In Proc. 5th IEEE Symp. on High Performance Distributed Computing, 1996.
[28] S. Zhou. LSF: Load sharing in large-scale heterogeneous distributed systems. In Proc. Workshop on Cluster Computing, 1992.

