
Presenting a Single System Image with Fine Granularity Mounts

Charles H. Sauer

IBM Advanced Engineering Systems

Austin, Texas 78758

ABSTRACT

In distributed system environments, a variety of administrative environments (system images) can be presented, reflecting different user requirements and administrative objectives. One of the most important system images is the so-called "single system image." This paper provides a context and definition for single system image. It describes an effective approach to collecting multiple UNIX™ systems into a single system image, based on simple use of remote mounts at fine granularities, including individual files. The approach is designed to allow for replication of administrative files, e.g., /etc/passwd, and graceful reconfiguration of the system to accommodate planned outages and respond to unplanned outages. Experiences with this approach and AIX™ Distributed Services are summarized.

INTRODUCTION

In a distributed system environment, individual machines usually perform roles as servers (file, print, name, ...) and/or clients. Subsets of machines may be associated into administrative groups, or the associations between machines may remain primarily pairwise and ad hoc. Figure 1 illustrates a typical software development environment. Some machines provide services to all other machines in the organization, e.g., network news, source control, special devices, etc. Some machines are administered directly by their owners and have only loose associations with other machines, e.g., the organization-wide servers. Many of the other machines are collected into single system images, based on suborganizations. These machines are administered as a group, with the intent that users can use any of the machines equivalently. There will be inherent exceptions to this, e.g., some machines will have color displays and others will have monochrome displays. And even where the hardware configurations are the same, the end users will usually be able to distinguish one machine from another, e.g., by querying a machine-readable serial number. But a successful approach will give users the illusion that all the machines are the same under most circumstances:

user accounts/passwords. A user can log in to any of the machines using the same login name and the same password. Regardless of which machine is used, the user has the same home directory and execution environment. When the user changes his/her password, using the standard passwd command, the change is effective immediately on all machines in the single system image. When an administrator adds a new account, this is done once for all the machines.

availability. Even though one or more of the machines is unavailable, the rest of the machines are still able to function together and present the same system image, except for resources which exist only on unavailable machines. A machine which cannot connect to other machines is still usable.

administered services. System-wide functions, e.g., print and mail service, function the same from machine to machine. If mail is sent to a particular user, it can be seen/handled on any of the machines.

UNIX is developed and licensed by AT&T. UNIX is a registered trademark in the U.S.A. and other countries. AIX is a trademark of International Business Machines Corporation.

February 17, 1988 CHS - I


This is not an exhaustive list, but is intended to be indicative. Wherever possible, the administrator should view the collection of machines as if it were one machine and use the same procedures that would be used on a single machine. We will use the above characteristics as an operational definition of "single system image" and discuss an approach which we believe is effective in meeting the definition.

[Figure 1 diagram not reproduced; surviving labels: /usr/news, /usr/lpp/pl8cc, /dev/aps5, /dev/mt, /usr/src, d75, auschs, bgeynon, Arch, Build, Test, Prototype]

Figure 1 - Associations of machines.

Distributed Services (DS) provides distributed operating system capabilities for the AIX operating system. These include distributed file services with local/remote transparency, distributed interprocess communication and a number of administrative services. For background information on DS, see Sauer et al. [1, 2, 3] and Levitt [4]. One of the design goals of DS was to provide support for mixed administrative environments, such as the one depicted in Figure 1, using the same protocols and conventions across the administrative environment. One of the cornerstones of this administrative flexibility is a general remote mount model. The focus of this paper is to show how the features of this remote mount model can be used to simply and effectively present a single system image. We first describe some of the characteristics of the DS mount model, then describe the approach to single system image, and finally discuss some additional related topics.

DISTRIBUTED SERVICES MOUNT MODEL

Distributed Services uses "remote mounts" to achieve local/remote transparency. A remote mount is much like a conventional mount in the Unix operating system, but the mounted filesystem is on a different machine than the mounted-on directory. Once the remote mount is established, local and remote files appear in the same directory hierarchy, and, with minor exceptions, file system calls have the same effect regardless of whether files (directories) are local or remote.¹ Mounts, both conventional and remote, are typically made as part of system startup, and thus are established before users log in. Additional remote mounts can be established during normal system operation, if desired.

Conventional mounts require that an entire file system be mounted. Distributed Services remote mounts allow mounts of subdirectories and individual files of a remote filesystem over a local directory or file, respectively. File granularity mounts are useful in configuring a single system image. For example, a shared copy of /etc/passwd may be mounted over a local /etc/passwd without hiding other, machine-specific, files in the /etc directory. Use of mounts at a fine granularity is key to this approach to single system image.

Virtual File Systems

The Distributed Services remote mount design is based on the Virtual File System approach used with NFS [5, 6]. This approach allows construction of essentially arbitrary mount hierarchies, including mounting a local object over a remote object, mounting a remote object over a remote object, mounting an object more than once within the same hierarchy, mount hierarchies spanning more than one machine, etc. The main constraint is that mounts are only effective in the machine performing the mount.

In conjunction with using the Virtual File System concept, we necessarily have replaced the traditional namei() kernel function, which translated a full path name to an i-number, with a component-by-component lookup() function. lookup() is used both for local and remote path name resolution. The arguments to lookup() are a file handle representing a directory and the name of a component to be found in that directory. lookup() returns a handle for the component, if found. A handle is effectively a pointer to the on-disk inode for the corresponding object plus a generation number for that inode. The generation number is used for subsequent validity tests.

When a client successfully requests a mount from a server, it receives a handle for the object it is mounting and stores it in its mount table. When the client is parsing a file pathname, e.g., for open(), and encounters the mounted object, the handle is given to the server as an argument in the lookup() remote procedure call. Typically, the mounted object is a directory, and the server will look up an object within that directory.

For example, let us suppose that a client mounts the server's /B over /a/b. The client then opens /a/b/c. When the client gets to b/c, it passes the handle for b and the component c to the server, requesting the server to look up and return a handle for c that can be used in the actual open() call. The server will return a handle for /B/c.
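To make this walk concrete, here is a minimal shell sketch of component-by-component resolution against a client mount table. It is an illustration under stated assumptions, not DS code: handles are modeled as hypothetical "machine:path" strings, the functions `lookup`, `covered` and `resolve` are invented for this example, and a lookup on a handle belonging to another machine merely stands in for the lookup() remote procedure call.

```shell
#!/bin/sh
# Hypothetical model: a handle is "machine:path" (a real DS handle is an
# on-disk inode pointer plus a generation number). The mount table file
# holds lines of the form "mounted-on-handle mounted-object-handle".

# lookup DIR_HANDLE NAME: handle for NAME within DIR_HANDLE. When the
# handle belongs to another machine, this stands for the lookup() RPC.
lookup() {
    host=${1%%:*}
    dir=${1#*:}
    case $dir in
        /) printf '%s:/%s\n' "$host" "$2" ;;
        *) printf '%s:%s/%s\n' "$host" "$dir" "$2" ;;
    esac
}

# covered HANDLE MOUNT_TABLE: the stored handle if HANDLE is mounted over,
# otherwise HANDLE unchanged. Only the client's own table is consulted.
covered() {
    while read -r on obj; do
        if [ "$on" = "$1" ]; then
            printf '%s\n' "$obj"
            return
        fi
    done < "$2"
    printf '%s\n' "$1"
}

# resolve PATH CLIENT MOUNT_TABLE: walk PATH one component at a time.
resolve() {
    h="$2:/"
    rest=${1#/}
    while [ -n "$rest" ]; do
        comp=${rest%%/*}
        case $rest in
            */*) rest=${rest#*/} ;;
            *) rest= ;;
        esac
        h=$(lookup "$h" "$comp")
        h=$(covered "$h" "$3")
    done
    printf '%s\n' "$h"
}

# The example from the text: the client mounts the server's /B over /a/b.
tab=$(mktemp)
echo 'client:/a/b server:/B' > "$tab"
resolve /a/b/c client "$tab"      # prints server:/B/c
rm -f "$tab"
```

Because each step consults only the current handle and the client's own mount table, mounts made on the server never enter the picture, consistent with mounts being effective only on the machine performing them.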

For file granularity mounts, the string form of the file name component is returned, along with the file handle of the (real) parent directory. This alternative to using the file handle for the mounted file allows replacement of the mounted file with a new version without loss of access to the file (with that name). (For example, when /etc/passwd is mounted and the passwd command is used, the old file is renamed opasswd and a new passwd file is produced. If we used a file handle for the file granularity mount, then the client would continue to access the old version of the file. Our approach gives the, presumably intended, effect that the client sees the new version of the file.)
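The difference between holding a handle and re-resolving (parent directory, name) can be observed on any Unix system, with an ordinary open file descriptor standing in for a handle. This is only an analogy assumed for illustration, not DS behavior: the temporary directory plays the role of /etc, and the rename-then-write sequence mimics what the text says the passwd command does.

```shell
#!/bin/sh
# Analogy only: an open descriptor plays the part of a file handle.
etc=$(mktemp -d)                  # stands in for /etc
echo old > "$etc/passwd"

exec 3< "$etc/passwd"             # "handle" bound to the original inode

# What passwd does, per the text: rename the old file, write a new one.
mv "$etc/passwd" "$etc/opasswd"
echo new > "$etc/passwd"

via_handle=$(cat <&3)             # still the old version (now opasswd)
via_name=$(cat "$etc/passwd")     # (parent directory, name) -> new version
exec 3<&-

echo "handle sees: $via_handle"   # handle sees: old
echo "name sees:   $via_name"     # name sees:   new
rm -rf "$etc"
```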

1. The traditional prohibition of links across devices applies to remote mounts. In addition, Distributed Services does not support direct access to remote special files (devices) and the remote mapping of data files using the AIX extensions to the shmat() system call. Note that program licenses may not allow execution of a remotely stored copy of a program.

There are several points to notice here. First, this approach is stateless in that the server can be recycled (e.g., powered off and on) and the handle(s) given to the client(s) performing a mount(s) remain valid, so the mount need not be repeated. This is true because the handle refers to an on-disk structure, not an in-memory structure. Second, the path resolution process must necessarily ignore mounts on the server, since these are not reflected in the on-disk structures and are not necessarily repeated when the server is recycled. Third, as an immediate consequence, the client must explicitly perform all mounts "for itself," since it does not "see" mounts performed by the server.
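Why a handle survives a server recycle yet can still be checked for staleness may be sketched as follows. Everything here is hypothetical: the "disk" is a text file of inode/generation pairs, a handle is the string inode.generation, and the helper names are invented. The point is only that validity is decided from on-disk state, so nothing the server held in memory needs to survive a reboot.

```shell
#!/bin/sh
# Hypothetical on-disk inode table: one "inode generation" pair per line.
# A handle is modeled as the string "inode.generation".

gen_of() {        # gen_of INODE TABLE: current generation on "disk"
    awk -v i="$1" '$1 == i { print $2 }' "$2"
}

make_handle() {   # make_handle INODE TABLE
    printf '%s.%s\n' "$1" "$(gen_of "$1" "$2")"
}

handle_valid() {  # handle_valid HANDLE TABLE: status 0 if still valid
    [ "$(gen_of "${1%%.*}" "$2")" = "${1#*.}" ]
}

reuse_inode() {   # file removed and inode reallocated: bump its generation
    awk -v i="$1" '$1 == i { $2 = $2 + 1 } { print }' "$2" > "$2.new" &&
        mv "$2.new" "$2"
}
```

A handle made before a "reboot" still checks out, because only the table on disk is consulted; once the inode is reused for a different file, the generation no longer matches and the old handle is rejected.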

Inherited Mounts

In constructing a single system image of Unix systems, it is desirable, if not necessary, to preserve the traditional directory hierarchy and conventions. All the machines in the single system image must see the same instances of /etc/passwd, /etc/hosts, home directories, spool directories for mail, and so forth. However, it is also desirable/necessary to be able to access local equivalents of these files/directories so that they may be kept up to date with the shared copies. For example, /etc/passwd refers to a shared copy of the file, and /native/etc/passwd refers to the unshared local version. In general, /native/a/b/... is established as the path to the local instance of /a/b/...

Without the concept of inherited mounts, discussed below, this implies that each machine would have to be doubly configured for its local (device) mounts. E.g., if / (root), /u and /usr are on partitions /dev/hd0, /dev/hd1 and /dev/hd2, then the desired mounts could be achieved by the commands:

    mount /dev/hd1 /u
    mount /dev/hd2 /usr
    mount / /native
    mount /u /native/u
    mount /usr /native/usr

Alternatively, the mount profile (/etc/filesystems in AIX) would contain an entry for each of these mounts. If another disk were added to hold /usr/src, then two profile entries would be needed, one for mounting /usr/src and one for mounting /native/usr/src.

Distributed Services implements inherited mounts on top of virtual file systems. There is a mntctl() system call and a corresponding remote procedure call. One of the options of mntctl() is to query and return a list of all mounts currently in effect on a given server. The mount command in AIX supports a -i (inherited) flag which causes the query to be performed and the additional mounts to be made. For the above example,

    mount -i / /native

would have the same net effect as the three separate mount commands for the /native subtree. When additional device mounts are configured, this single mount command still provides the desired effect of an aliased naming path for the local instance of the file hierarchy. Additional examples of motivation for inherited mounts are given in [3].
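A minimal sketch of what mount -i has to compute, assuming the list of mounts in effect (as a mntctl() query would return it) is available as one mounted-on path per line. The helper `inherited_mounts` is hypothetical and only prints the equivalent commands; it does not reproduce the AIX mount command's implementation.

```shell
#!/bin/sh
# inherited_mounts OBJECT TARGET
# Reads mounted-on paths (one per line, standing in for a mntctl() query
# result) on stdin and prints the mounts that "mount -i OBJECT TARGET"
# stands for: the explicit mount plus one per mount below OBJECT.
inherited_mounts() {
    obj=${1%/}      # "/" becomes "", so every absolute path lies below it
    tgt=${2%/}
    printf 'mount %s %s\n' "$1" "$2"
    while read -r m; do
        case $m in
            "$obj"/?*) printf 'mount %s %s%s\n' "$m" "$tgt" "${m#"$obj"}" ;;
        esac
    done
}

# The example from the text: /, /u and /usr are device mounts.
printf '/\n/u\n/usr\n' | inherited_mounts / /native
# mount / /native
# mount /u /native/u
# mount /usr /native/usr
```

Adding a /usr/src line to the input yields mount /usr/src /native/usr/src with no further configuration, which is exactly the second profile entry that inherited mounts make unnecessary.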


PRESENTATION OF SINGLE SYSTEM IMAGE

Objectives

The configuration is managed by a few simple profiles. It should be easy to add/delete machines and users, and to make other configuration changes.

All of the machines in the single system image cluster should use exactly the same configuration files, i.e., there is no distinction between the profiles on the server for /etc/passwd and related files and the ones on the clients.

As a result of the above, it should be simple to reconfigure to use a different server for the /etc files, either because of planned outages of the existing server or because of failure of the existing server.

Client machines should recognize when the server is unavailable, and switch to alternate copies of administrative files and other shared files.

Local replicated copies of the administrative files should be periodically updated, so that if there is an unplanned outage of the server, the other machines have up-to-date copies.

If a machine providing some of the home directories is unavailable, a user should discover this immediately at login time, and be able either to use an alternate home directory or to wait until the machine becomes available again.

General Approach

The discussion will focus on administrative files, e.g., /etc/passwd, and data files, e.g., home directory subhierarchies and mail spooling areas. Assuming, for the moment, a homogeneous processor architecture, the handling of executable files may be viewed as lying between two extremes: (1) all of the machines in the cluster have full copies of the executable code, and so there is no sharing of executables, or (2) there is a single shared copy of each executable file. The first extreme has the potential for inconsistency amongst the multiple copies, the administrative burden of ensuring that inconsistent copies are not present, and wastage of disk space for the redundant copies. The second extreme has the limitation that executables will be unusable if the shared copy is not accessible. An administrator will typically choose a policy in between the extremes, e.g., that the kernel and the files in /etc and /bin are replicated, but that other executables are shared. The approach discussed below supports such policies and allows flexibility in determining them. However, AIX and DS have other administrative mechanisms, known as "code serving," which address these policies in detail, so we focus on the administrative and data files.

Where heterogeneous processor architectures are involved, the motivation for a separate mechanism for code serving is stronger, since the mechanisms below should not be used for sharing binary executables across different processor architectures. The files explicitly shared in the mechanisms described below are in ASCII format, and are suitable for sharing across heterogeneous processor architectures. Thus the mechanisms themselves will work across heterogeneous processor architectures. However, in the heterogeneous environment, machine boundaries are much more likely to be visible, e.g., due to byte order considerations in application data. More stringent requirements must be placed on application code in such an environment, if the illusion of a single system is to be preserved.

Configuration Files

/etc/adminserver. One machine is designated as the "administrative server" and is the machine that has the disk copies of the shared administrative files such as /etc/passwd. The administrative server can be changed while machines are in operation, as discussed below. The name of the administrative server is stored in /etc/adminserver.

/etc/SSImachines. This file lists the names of all machines in the single system image (including the administrative server).

/etc/server.files. This file lists individual files that will be shared based on the administrative server's copy. For example, this list might include

    /etc/passwd
    /etc/group
    /etc/motd
    /etc/qconfig            (AIX printer configuration file)
    /etc/hosts
    /etc/hosts.equiv
    /etc/adminserver
    /etc/SSImachines
    /usr/adm/user.cfile     (AIX adduser defaults)
    /etc/server.files
    /etc/server.dirs        (see below)
    /etc/remounts.list      (see below)
    /etc/ug.SSI             (for id translates - see below)
    /etc/oug.SSI
    /etc/opasswd
    /etc/ogroup
    /etc/umountd.c          (source for umountd - see below)
    /usr/adm/newuser.sys
    /usr/adm/newuser.usr    (AIX adduser defaults)

Though the list could be longer or shorter, roughly this set of files has been appropriate in our experience.

/etc/server.dirs. This file lists directories, other than home directories, that will be shared based on the administrative server's copy. Assuming that code serving is handled separately, this list might include

    /usr/mail
    /usr/lib/news
    /usr/spool/news
    /usr/man

If code serving is not handled separately, then /usr/bin, /usr/lib, ... might be added to this list.

/etc/remounts.list. This file lists files and directories that will be unmounted when it is detected that the server is inaccessible and remounted when the server becomes accessible again. This is handled by the umountd daemon, discussed below. This list will be a subset of the combined lists in server.files and server.dirs, e.g.,

    /etc/passwd
    /etc/group
    /etc/motd
    /etc/qconfig            (AIX printer configuration file)
    /etc/hosts
    /etc/hosts.equiv
    /etc/adminserver
    /etc/remounts.list
    /usr/mail

This is a subset of the combined list, oriented toward normal operation when the administrative server is inaccessible. It is a subset because some operations, e.g., changing passwords, presumably will be deferred when the administrative server is inaccessible, and some directories, e.g., /usr/man and /usr/spool/news, are likely to be empty, except on the administrative server, and thus uninteresting when that machine is unavailable.
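Because /etc/remounts.list must be a subset of the combined server.files and server.dirs lists, an administrator might check that property before distributing a change. The helper below is a hypothetical sketch (not part of DS) built on sort and comm; it prints any stray entries and fails if there are some.

```shell
#!/bin/sh
# check_remounts FILES DIRS REMOUNTS
# Prints every entry of REMOUNTS missing from FILES and DIRS combined;
# exit status 0 only if REMOUNTS is a subset of their union.
check_remounts() {
    all=$(mktemp) rem=$(mktemp)
    sort -u "$1" "$2" > "$all"
    sort -u "$3" > "$rem"
    extra=$(comm -23 "$rem" "$all")   # lines only in REMOUNTS
    rm -f "$all" "$rem"
    [ -n "$extra" ] && printf '%s\n' "$extra"
    [ -z "$extra" ]
}

# Intended use (paths from the text):
# check_remounts /etc/server.files /etc/server.dirs /etc/remounts.list
```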

/etc/ug.SSI, /etc/oug.SSI. DS provides general translation mechanisms for user and group id translation [1, 2]. For machines within the cluster, there should be one-to-one translations, so that numeric ids are the same on all machines in the cluster, but machines in the cluster may also need translates to other machines outside the cluster. ug.SSI is used for a cluster-wide definition of the translates. For brevity, we will not discuss the content of these files.

Home Directories

For the sake of simplicity, it is assumed that home directories' pathnames have the form .../machine/user, where "machine" is the name of the machine where the home directory is actually stored. Even though paths are of this form, users will see the same actual home directory on each machine of the cluster, e.g., in our environment when user sauer is logged into machine d75, his home directory is still /u/auschs/sauer, since his home directory is stored on auschs. This is a minor sacrifice of transparency, since users usually do not use rooted paths to get to their home directories: their home directory is listed in /etc/passwd, so that is where they start, cd takes them back there, and the C shell "~" notation is often used to get to other users' home directories, e.g., cd ~dale. Shipley has proposed a similar convention for sharing home directories [7].
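Because the machine name is always the next-to-last component of a home directory path under this convention, it can be recovered mechanically, e.g., to tell a user at login which machine holds his or her home directory. The helper `home_machine` is hypothetical and assumes a standard colon-separated /etc/passwd entry with the home directory in the sixth field.

```shell
#!/bin/sh
# home_machine PASSWD_LINE: machine component of the home directory,
# assuming the .../machine/user convention described above.
home_machine() {
    home=$(printf '%s\n' "$1" | cut -d: -f6)
    dir=${home%/*}                 # strip the user component
    printf '%s\n' "${dir##*/}"     # keep the last remaining component
}

home_machine 'sauer:x:201:1:Charles Sauer:/u/auschs/sauer:/bin/csh'   # auschs
```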

Having paths of this form allows each machine to simply mount the home directories stored on other machines, e.g., mount -n auschs /u/auschs /u/auschs, or, in general,

    for i in `cat /etc/SSImachines`
    do
        if [ "$i" != "$myname" ]
        then
            mount -i -n $i /u/$i /u/$i
        fi
    done

This is a slightly simplified fragment from /etc/SSImounts, discussed below.

Machine Initialization

As part of init processing, after normal standalone initialization, e.g., fsck and device mounts, /etc/rc.DS starts DS and then runs /etc/SSImounts. SSImounts runs in the background so that local operations can begin without server availability. SSImounts runs on all machines, including the adminserver. Following are simplified sketches taken from SSImounts. Error checks, touches/mkdirs for mount points, precautionary umounts, etc. are omitted:

Initial mounts from adminserver, updating local copies of files:

    if [ "$myname" != "$adminserver" ]
    then
        until mount -i -n $adminserver /native /$adminserver
        do
            sleep $delay
        done
        for i in `cat /etc/server.files`
        do
            cp -p /$adminserver$i $i
            mount -n $adminserver /native$i $i
        done
        for i in `cat /etc/server.dirs`
        do
            mount -i -n $adminserver /native$i $i
            if [ "$i" = /usr/mail ]
            then
                /etc/movemail &    # see below
            fi
        done
    fi

Start /etc/umountd:

    make umountd
    /etc/umountd &

Update user/group ids.

    cmp /etc/ug.SSI /etc/oug.SSI
    if [ $? -ne 0 ]
    then
        dsldxprof -a -f /etc/ug.SSI
        if [ $? -eq 0 ]
        then
            cp -p /etc/ug.SSI /etc/oug.SSI
        fi
    fi

Mount home directories. This is as indicated in the previous fragment, except that the mounts are retried asynchronously in the background so that unavailability of any given machine doesn't delay availability of home directories from other machines.
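The asynchronous retry can be sketched as one background subshell per machine. This is an assumption-laden illustration rather than the SSImounts code: `retry_in_background` and `mount_homes` are hypothetical names, and the `MOUNT` variable exists only so that a stub can replace the real mount command when experimenting.

```shell
#!/bin/sh
# Hypothetical sketch: one independent background retry loop per machine,
# so a down machine delays only its own home directories.
delay=${delay:-60}
MOUNT=${MOUNT:-mount}    # hypothetical hook for substituting a stub

retry_in_background() {  # retry_in_background CMD [ARGS...]
    (
        until "$@"; do
            sleep "$delay"
        done
    ) &
}

mount_homes() {          # mount_homes MACHINES_FILE MYNAME
    while read -r m; do
        [ -z "$m" ] || [ "$m" = "$2" ] && continue
        retry_in_background "$MOUNT" -i -n "$m" "/u/$m" "/u/$m"
    done < "$1"
    # No wait here: the caller carries on while mounts are retried.
}

# Intended use: mount_homes /etc/SSImachines "$myname"
```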

Once these steps have been performed, the machine has joined the single system image.

/etc/movemail. Mail received while the administrative server is not available will be kept in the native spool directory, /usr/mail. movemail moves mail from the native spool directory to the shared spool directory whenever the shared directory is mounted, either by SSImounts or by remounts (see below).
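A minimal sketch of the movemail step, assuming one mailbox file per user in each spool directory. The function name is hypothetical, and the mail-spool locking and ownership/permission handling that a real implementation needs are deliberately omitted.

```shell
#!/bin/sh
# movemail_sketch NATIVE_SPOOL SHARED_SPOOL
# Appends each non-empty native mailbox to the same user's mailbox in the
# shared spool, then empties the native one. No locking (hypothetical).
movemail_sketch() {
    for box in "$1"/*; do
        [ -f "$box" ] && [ -s "$box" ] || continue
        user=${box##*/}
        cat "$box" >> "$2/$user" && cp /dev/null "$box"
    done
}

# With the paths from the text: movemail_sketch /native/usr/mail /usr/mail
```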

umountd

The key remaining topic is the daemon umountd. umountd uses a polling loop, performing the following functions and then sleeping before repeating them. The default sleep time is 60 seconds.

Detection of server inaccessibility. umountd attempts to open each of the files listed in /etc/server.files. If an open fails, umountd assumes the server is inaccessible and executes /etc/remounts. /etc/remounts unmounts all of the files in /etc/remounts.list and then attempts to remount them. remounts will execute movemail after successfully remounting /usr/mail. (umountd runs on the adminserver, but skips these steps.)

Updating modified files. umountd determines whether any of the files in /etc/server.files have been updated. If so, umountd locks the server copy and updates the native copy. (umountd running on the adminserver skips these steps.)

Detection of configuration changes. If key configuration files, e.g., /etc/adminserver or /etc/server.files, have been updated, umountd execs SSImounts. SSImounts applies the changes and then starts a fresh version of umountd, as indicated above.
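A shell rendering of one polling pass may help to show the first two functions. The real umountd is a C program (its source, umountd.c, is among the shared files), so this is a hypothetical sketch: locking, timing subtleties and the configuration-change check are omitted, and on the adminserver these steps would be skipped. The SHARED_ROOT and NATIVE_ROOT prefixes are assumptions that make the function testable; in real use they would be the empty string and /native.

```shell
#!/bin/sh
# poll_once FILES_LIST SHARED_ROOT NATIVE_ROOT REMOUNTS_CMD  (hypothetical)
poll_once() {
    # Detection of server inaccessibility: try to open every shared file.
    while read -r f; do
        [ -n "$f" ] || continue
        if ! cat "$2$f" > /dev/null 2>&1; then
            "$4"                 # e.g. /etc/remounts
            return 1
        fi
    done < "$1"
    # Updating modified files: refresh native copies that differ
    # (the real daemon locks the server copy first).
    while read -r f; do
        [ -n "$f" ] || continue
        cmp -s "$2$f" "$3$f" || cp -p "$2$f" "$3$f"
    done < "$1"
    return 0
}

# The daemon: poll, then sleep (60 seconds by default, as in the text).
umountd_loop() {
    while :; do
        poll_once /etc/server.files "" /native /etc/remounts
        sleep "${interval:-60}"
    done
}
```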

There are subtleties of locking and timing which we omit for brevity. Starting with DS 1.2, the source, umountd.c, is included in the DS samples directory, along with the installation documentation, installation command and so forth.

Note the power of the above mechanisms. By simply changing the name of the server in /etc/adminserver, e.g., for a scheduled outage of the adminserver, the machines in the cluster will shift to the new adminserver in a couple of minutes, without rebooting any of the machines or otherwise significantly disrupting users. For an unscheduled server outage of significant duration, a switchover to a different server can be accomplished by changing the adminserver file on each of the other machines and rebooting them. In either case, scheduled or unscheduled, the original server can rejoin the cluster as a client when it is ready, and then assume the server role again when its configuration files have been updated.
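The planned switchover works because every client's umountd notices that a key configuration file has changed. A hypothetical sketch of that check follows: it compares each file with the copy remembered from the previous poll and reports a change, after which the caller would exec SSImounts. The function name and the saved-copy directory are invented for illustration.

```shell
#!/bin/sh
# config_changed SAVED_DIR FILE...  (hypothetical)
# Status 0 if any FILE differs from the copy saved at the previous poll
# (a file seen for the first time counts as changed); saved copies are
# refreshed as a side effect.
config_changed() {
    saved=$1; shift
    changed=1
    for f in "$@"; do
        s="$saved/$(printf '%s' "$f" | tr / _)"
        if ! cmp -s "$f" "$s"; then
            changed=0
            cp -p "$f" "$s"
        fi
    done
    return "$changed"
}

# Intended use inside the polling loop (saved-copy directory hypothetical):
# config_changed /etc/umountd.saved /etc/adminserver /etc/server.files &&
#     exec /etc/SSImounts
```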

SUMMARY

We believe this approach effectively meets the operational definition given in the Introduction, in regard to user accounts, data, availability, and administered services. The mechanisms are designed to be simple to apply and administer, yet highly effective in presenting the image of a single system. These mechanisms are complementary to the underlying file system mechanisms of Distributed Services, and orthogonal to other enhancements such as the code server mechanisms. Thus the same underlying mechanisms are applied across the distributed environment, for example the abstraction of our software development environment depicted in Figure 1. In addition, significant administrative flexibility is present, as suggested in the preceding paragraph. These concepts could be applied to other distributed file systems supporting fine granularity mounts.

REFERENCES

1. Charles H. Sauer, Don W. Johnson, Larry Loucks, Amal A. Shaheen-Gouda, Todd A. Smith, "RT PC Distributed Services: Overview," Operating Systems Review 21, 3 (July 1987) pp. 18-29.

2. Charles H. Sauer, Don W. Johnson, Larry Loucks, Amal A. Shaheen-Gouda, Todd A. Smith, "RT PC Distributed Services: File System," ;login: 12, 5 (September/October 1987) pp. 12-22.


3. Charles H. Sauer, Don W. Johnson, Larry Loucks, Amal A. Shaheen-Gouda, Todd A. Smith, "Statelessness and Statefulness in Distributed Services," UniForum 1988, Dallas, Texas, February 1988.

4. Jason Levitt, "The IBM RT Gets Connected," BYTE 12, 12 (1987) pp. 133-138.

5. R. Sandberg, D. Goldberg, S. Kleiman, Dan Walsh and B. Lyon, "Design and Implementation of the Sun Network File System," USENIX Conference Proceedings, Portland, June 1985.

6. Sun Microsystems, Inc., Networking on the Sun Workstation, February 1986.

7. M. Shipley, "The Virtual Home Environment," UniForum 1988, Dallas, Texas, February 1988.
