
Presenting a Single System Image with Fine Granularity Mounts


Charles H. Sauer

IBM Advanced Engineering Systems

Austin, Texas 78758

ABSTRACT

In distributed system environments, a variety of administrative environments (system images) can be presented, reflecting different user requirements and administrative objectives. One of the most important system images is the so-called "single system image." This paper provides a context and definition for single system image. It describes an effective approach to collecting multiple UNIX™ systems into a single system image, based on simple use of remote mounts at fine granularities, including individual files. The approach is designed to allow for replication of administrative files, e.g., /etc/passwd, and graceful reconfiguration of the system to accommodate planned outages and respond to unplanned outages. Experiences with this approach and AIX™ Distributed Services are summarized.

INTRODUCTION

In a distributed system environment, individual machines usually perform roles as servers (file, print, name, ...) and/or clients. Subsets of machines may be associated into administrative groups, or the associations between machines may remain primarily pairwise and ad hoc. Figure 1 illustrates a typical software development environment. Some machines provide services to all other machines in the organization, e.g., network news, source control, special devices, etc. Some machines are administered directly by their owners and have only loose associations with other machines, e.g., the organization-wide servers. Many of the other machines are collected into single system images, based on suborganizations. These machines are administered as a group, with the intent that users can use any of the machines equivalently. There will be inherent exceptions to this, e.g., some machines will have color displays and others will have monochrome displays. And even where the hardware configurations are the same, the end users will usually be able to distinguish one machine from another, e.g., by querying a machine-readable serial number. But a successful approach will give users the illusion that all the machines are the same under most circumstances:

user accounts/passwords. A user can login to any of the machines using the same login name and the same password. Regardless of which machine is used, the user has the same home directory and execution environment. When the user changes his/her password, using the standard passwd command, the change is effective immediately on all machines in the single system image. When an administrator adds a new account, this is done once for all the machines.

availability. Even though one or more of the machines is unavailable, the rest of the machines are still able to function together and present the same system image, except for resources which exist only on unavailable machines. A machine which cannot connect to other machines is still usable.

administered services. System wide functions, e.g., print and mail service, function the same from machine to machine. If mail is sent to a particular user, it can be seen/handled on any of the machines.

UNIX is developed and licensed by AT&T. Unix is a registered trademark in the U.S.A. and other countries. AIX is a trademark of International Business Machines Corporation.

February 17, 1988 CHS - 1


This is not an exhaustive list, but is intended to be indicative. Wherever possible, the administrator should view the collection of machines as if it were one machine and use the same procedures that would be used on a single machine. We will use the above characteristics as an operational definition of "single system image" and discuss an approach which we believe is effective in meeting the definition.

[Figure 1: diagram not reproduced. Labels recoverable from the diagram: /usr/news, /usr/lpp/pl8cc, /dev/aps5, /dev/mt, d75, auschs, bgeynon, Arch, Build, Test, /usr/src, Prototype.]

Figure 1 - Associations of machines.

Distributed Services (DS) provides distributed operating system capabilities for the AIX operating system. These include distributed file services with local/remote transparency, distributed interprocess communication and a number of administrative services. For background information on DS, see Sauer et al. [1,2,3] and Levitt [4]. One of the design goals of DS was to provide support for mixed administrative environments, such as the one depicted in Figure 1, using the same protocols and conventions across the administrative environment. One of the cornerstones of this administrative flexibility is a general remote mount model. The focus of this paper is to show how the features of this remote mount model can be used to simply and effectively present a single system image. We first describe some of the characteristics of the DS mount model, then describe the approach to single system image, and finally discuss some additional related topics.

DISTRIBUTED SERVICES MOUNT MODEL

Distributed Services uses "remote mounts" to achieve local/remote transparency. A remote mount is much like a conventional mount in the Unix operating system, but the mounted filesystem is on a different machine than the mounted-on directory. Once the remote mount is established, local and remote files appear in the same directory hierarchy, and, with minor exceptions, file system calls have the same effect regardless of whether files (directories) are local or remote.¹ Mounts, both conventional and remote, are typically made as part of system startup, and thus are established before users login. Additional remote mounts can be established during normal system operation, if desired.

Conventional mounts require that an entire file system be mounted. Distributed Services remote mounts allow mounts of subdirectories and individual files of a remote filesystem over a local directory or file, respectively. File granularity mounts are useful in configuring a single system image. For example, a shared copy of /etc/passwd may be mounted over a local /etc/passwd without hiding other, machine-specific, files in the /etc directory. Use of mounts at a fine granularity is key to this approach to single system image.

Virtual File Systems

The Distributed Services remote mount design is based on the Virtual File System approach used with NFS [5,6]. This approach allows construction of essentially arbitrary mount hierarchies, including mounting a local object over a remote object, mounting a remote object over a remote object, mounting an object more than once within the same hierarchy, mount hierarchies spanning more than one machine, etc. The main constraint is that mounts are only effective in the machine performing the mount.

In conjunction with using the Virtual File System concept, we necessarily have replaced the traditional namei() kernel function, which translated a full path name to an i-number, with a component-by-component lookup() function. lookup() is used both for local and remote path name resolution. The arguments to lookup() are a file handle representing a directory and the name of a component to be found in that directory. lookup() returns a handle for the component, if found. A handle is effectively a pointer to the on-disk inode for the corresponding object and a generation number for that inode. The generation number is used for subsequent validity tests.
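The role of the generation number can be pictured with a small shell sketch. It is purely illustrative: the function name handle_valid, the "inode:generation" string form of a handle, and the two-column table standing in for the on-disk inodes are all assumptions made for this example, and it further assumes (as in NFS-style handles, though the text does not spell it out) that the generation number changes whenever an inode is reused.

```sh
#!/bin/sh
# handle_valid TABLE HANDLE: TABLE holds "inode generation" lines standing in
# for the on-disk inodes; HANDLE has the hypothetical form inode:generation.
# Succeeds only if the inode still carries the generation the handle recorded.
handle_valid() {
    inode=${2%%:*}
    gen=${2##*:}
    while read -r tab_inode tab_gen
    do
        if [ "$tab_inode" = "$inode" ]
        then
            [ "$tab_gen" = "$gen" ]
            return          # status of the generation comparison
        fi
    done < "$1"
    return 1                # inode not present at all
}

table=`mktemp` || exit 1
echo "17 1" > "$table"
handle=17:1
handle_valid "$table" "$handle" && echo "handle $handle is valid"

echo "17 2" > "$table"      # assumption: inode 17 was freed and reused
handle_valid "$table" "$handle" || echo "handle $handle is stale"
rm -f "$table"
```

Because the check consults only the (simulated) on-disk table, it gives the same answer before and after a server restart, which is the property the handle design relies on.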

When a client successfully requests a mount from a server, it receives a handle for the object it is mounting and stores it in its mount table. When the client is parsing a file pathname, e.g., for open(), and encounters the mounted object, the handle is given to the server as an argument in the lookup() remote procedure call. Typically, the mounted object is a directory, and the server will look up an object within that directory.

For example, let us suppose that a client mounts server's /B over /a/b. The client then opens /a/b/c. When the client gets to b/c, it passes the handle for b and the component c to the server, requesting the server to look up and return a handle for c that can be used in the actual open() call. The server will return a handle for /B/c.
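The walk-through above can be mimicked in a few lines of shell. This is only a sketch under stated assumptions: the function name resolve, the one-entry mount table, and the "server:" notation are invented for the example, and the real lookup() works on file handles inside the kernel rather than on strings.

```sh
#!/bin/sh
# Hypothetical client mount table with a single entry:
# the server's /B is mounted over the local directory /a/b.
MOUNT_POINT=/a/b
MOUNT_TARGET=server:/B

# resolve PATH: walk PATH one component at a time, as lookup() does,
# switching to the remote object when the mounted-over directory is reached.
resolve() {
    current=""          # stands in for the handle reached so far
    local_path=""       # local name of the directory reached so far
    old_ifs=$IFS
    IFS=/
    set -- $1           # split the path into its components
    IFS=$old_ifs
    for component in "$@"
    do
        [ -n "$component" ] || continue     # skip the empty leading field
        local_path="$local_path/$component"
        current="$current/$component"
        if [ "$local_path" = "$MOUNT_POINT" ]
        then
            current=$MOUNT_TARGET           # cross the mount point
        fi
    done
    echo "$current"
}

resolve /a/b/c      # prints server:/B/c
resolve /a/x        # prints /a/x
```

Only the client consults the mount table here, which mirrors the constraint that mounts are effective only in the machine performing them.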

For file granularity mounts, the string form of the file name component is returned, along with the file handle of the (real) parent directory. This alternative to using the file handle for the mounted file allows replacement of the mounted file with a new version without loss of access to the file (with that name). (For example, when /etc/passwd is mounted and the passwd command is used, the old file is renamed opasswd and a new passwd file is produced. If we used a file handle for the file granularity mount, then the client would continue to access the old version of the file. Our approach gives the, presumably intended, effect that the client sees the new version of the file.)
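The difference between holding a handle for the file itself and holding the parent directory plus a name can be observed locally with ordinary shell tools. In the sketch below, an open file descriptor plays the part of a per-file handle (an analogy chosen for this example, not the DS mechanism), and the scratch directory stands in for the server's /etc.

```sh
#!/bin/sh
# Work in a scratch directory standing in for the server's /etc.
dir=`mktemp -d` || exit 1
cd "$dir" || exit 1

echo "old entry" > passwd

# Analogue of a handle for the file itself: the descriptor stays bound
# to the original inode whatever the name later refers to.
exec 3< passwd

# What the passwd command does: keep the old file as opasswd, write a new one.
mv passwd opasswd
echo "new entry" > passwd

by_handle=`cat <&3`     # follows the inode: still the old contents
by_name=`cat passwd`    # parent directory + name: the new version
exec 3<&-

echo "by handle: $by_handle"
echo "by name:   $by_name"

cd / && rm -rf "$dir"
```

Resolving the name afresh within the parent directory is what lets a client pick up the replacement file without repeating the mount.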

1. The traditional prohibition of links across devices applies to remote mounts. In addition, Distributed Services does not support direct access to remote special files (devices) and the remote mapping of data files using the AIX extensions to the shmat() system call. Note that program licenses may not allow execution of a remotely stored copy of a program.


There are several points to notice here. First, this approach is stateless in that the server can be recycled (e.g., powered off and on) and the handle(s) given to the client(s) performing a mount(s) are still valid, so the mount need not be repeated. This is true because the handle refers to an on-disk structure, not an in-memory structure. Second, the path resolution process must necessarily ignore mounts on the server, since these are not reflected in the on-disk structures and are not necessarily repeated when the server is recycled. Third, as an immediate consequence, the client must explicitly perform all mounts "for itself," since it does not "see" mounts performed by the server.

Inherited Mounts

In constructing a single system image of Unix systems, it is desirable, if not necessary, to preserve the traditional directory hierarchy and conventions. All the machines in the single system image must see the same instances of /etc/passwd, /etc/hosts, home directories, spool directories for mail, and so forth. However, it is also desirable/necessary to be able to access local equivalents of these files/directories so that they may be kept up to date with the shared copies. For example, /etc/passwd refers to a shared copy of the file, and /native/etc/passwd refers to the unshared local version. In general, /native/a/b/... is established as the path to the local instance of /a/b/....

Without the concept of inherited mounts, discussed below, this implies that each machine would have to be doubly configured for its local (device) mounts. E.g., if / (root), /u and /usr are on partitions /dev/hd0, /dev/hd1 and /dev/hd2, then the desired mounts could be achieved by the commands:

    mount /dev/hd1 /u
    mount /dev/hd2 /usr
    mount / /native
    mount /u /native/u
    mount /usr /native/usr

Alternatively, the mount profile (/etc/filesystems in AIX) would contain an entry for each of these mounts. If another disk was added to hold /usr/src, then two profile entries would be needed, one for mounting /usr/src and one for mounting /native/usr/src.

Distributed Services implements inherited mounts on top of virtual file systems. There is a mntctl() system call and corresponding remote procedure call. One of the options of mntctl() is to query and return a list of all mounts currently in effect on a given server. The mount command in AIX supports a -i (inherited) flag which causes the query to be performed and the additional mounts to be made. For the above example,

    mount -i / /native

would have the same net effect as the three separate mount commands for the /native subtree. When additional device mounts are configured, this single mount command still provides the desired effect of an aliased naming path for the local instance of the file hierarchy. Additional examples of motivation for inherited mounts are given in [3].
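One way to picture an inherited mount is as the expansion of a single request into one mount per filesystem currently mounted beneath the source. The sketch below is a dry run under stated assumptions: inherited_mounts is a hypothetical helper, the list of mounted directories is supplied by hand on standard input (the real -i flag obtains it through mntctl()), and the commands are printed rather than executed.

```sh
#!/bin/sh
# inherited_mounts SRC DST: read currently mounted directories on stdin
# and print the mounts needed to reproduce SRC's subtree under DST.
inherited_mounts() {
    src=$1
    dst=$2
    prefix=${src%/}                 # "/" becomes "", "/usr" stays "/usr"
    echo "mount $src $dst"
    while read -r mounted
    do
        case $mounted in
        "$src")
            ;;                      # the source itself is already handled
        "$prefix"/*)
            # a filesystem mounted beneath SRC: alias it beneath DST
            rest=${mounted#"$prefix"}
            echo "mount $mounted $dst$rest"
            ;;
        esac
    done
}

# The example from the text: /, /u and /usr are separate device mounts.
printf '%s\n' / /u /usr | inherited_mounts / /native
```

For the three device mounts of the example this prints the same three /native mounts listed earlier, and adding /usr/src to the input adds the matching /native/usr/src line with no change to the command itself.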


PRESENTATION OF SINGLE SYSTEM IMAGE

Objectives

The configuration is managed by a few simple profiles. It should be easy to add/delete machines and users, and to make other configuration changes.

All of the machines in the single system image cluster should use exactly the same configuration files, i.e., there is no distinction between the profiles on the server for /etc/passwd and related files and the ones on the clients.

As a result of the above, it should be simple to reconfigure to use a different server for the /etc files, either because of planned outages of the existing server or because of failure of the existing server.

Client machines should recognize when the server is unavailable, and switch to alternate copies of administrative files and other shared files.

Local replicated copies of the administrative files should be periodically updated, so that if there is an unplanned outage of the server, the other machines have up to date copies.

If a machine providing some of the home directories is unavailable, a user should discover this immediately at login time, and be able to either use an alternate home directory or wait until the machine becomes available again.

General Approach

The discussion will focus on administrative files, e.g., /etc/passwd, and data files, e.g., home directory subhierarchies and mail spooling areas. Assuming, for the moment, a homogeneous processor architecture, executable files may be viewed between two extremes: (1) all of the machines in the cluster have full copies of the executable code, and so there is no sharing of executables, or (2) there is a single shared copy of each executable file. The first extreme has the potential for inconsistency amongst the multiple copies, administrative burden to ensure that inconsistent copies are not present, and wastage of disk space for the redundant copies. The second extreme has the limitation that executables will be unusable if the shared copy is not accessible. An administrator will typically choose a policy in between the extremes, e.g., that the kernel and the files in /etc and /bin are replicated, but that other executables are shared. The approach discussed below supports and allows flexibility in determining such policies. However, AIX and DS have other administrative mechanisms, known as "code serving," which address these policies in detail, so we focus on the administrative and data files.

Where heterogeneous processor architectures are involved, the motivation for a separate mechanism for code serving is stronger, since the mechanisms below should not be used for sharing binary executables across different processor architectures. The files explicitly shared in the mechanisms described below are in ASCII format, and are suitable for sharing across heterogeneous processor architectures. Thus the mechanisms themselves will work across heterogeneous processor architectures. However, in the heterogeneous environment, machine boundaries are much more likely to be visible, e.g., due to byte order considerations in application data. More stringent requirements must be placed on application code in such an environment, if the illusion of a single system is to be preserved.

Configuration Files

/etc/adminserver. One machine is designated as the "administrative server" and is the machine that has the disk copies of the shared administrative files such as /etc/passwd. The administrative server can be changed while machines are in operation, as discussed below. The name of the administrative server is stored in /etc/adminserver.

/etc/SSImachines. This file lists the names of all machines in the single system image (including the administration server).

/etc/server.files. This file lists individual files that will be shared based on the administrative server's copy. For example, this list might include

    /etc/passwd
    /etc/group
    /etc/motd
    /etc/qconfig            (AIX printer configuration file)
    /etc/hosts
    /etc/hosts.equiv
    /etc/adminserver
    /etc/SSImachines
    /usr/adm/user.cfile     (AIX adduser defaults)
    /etc/server.files
    /etc/server.dirs        (see below)
    /etc/remounts.list      (see below)
    /etc/ug.SSI             (for id translates - see below)
    /etc/oug.SSI
    /etc/opasswd
    /etc/ogroup
    /etc/umountd.c          (source for umountd - see below)
    /usr/adm/newuser.sys    (AIX adduser defaults)
    /usr/adm/newuser.usr

Though the list could be longer or shorter, roughly this set of files has been appropriate in our experience.

/etc/server.dirs. This file lists directories, other than home directories, that will be shared based on the administrative server's copy. Assuming that code serving is handled separately, this list might include

    /usr/mail
    /usr/lib/news
    /usr/spool/news
    /usr/man

If code serving is not handled separately, then /usr/bin, /usr/lib, ... might be added to this list.

/etc/remounts.list. This file lists files and directories that will be unmounted when it is detected that the server is inaccessible and remounted when the server becomes accessible again. This is handled by the umountd daemon, discussed below. This list will be a subset of the combined lists in server.files and server.dirs, e.g.,

    /etc/passwd
    /etc/group
    /etc/motd
    /etc/qconfig            (AIX printer configuration file)
    /etc/hosts
    /etc/hosts.equiv
    /etc/adminserver
    /etc/remounts.list
    /usr/mail

This is a subset of the combined list oriented toward normal operation when the administrative server is inaccessible. It is a subset because some operations, e.g., changing passwords, presumably will be deferred when the administrative server is inaccessible, and some directories, e.g., /usr/man and /usr/spool/news, are likely to be empty, except on the administrative server, and thus uninteresting when that machine is unavailable.
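Since remounts.list is meant to be a subset of the combined server.files and server.dirs lists, an administrator might want to check that property after editing the profiles. The following is a minimal sketch of such a check; the helper name not_in_combined is an assumption of this example, and it expects plain lists with one path per line and no annotations.

```sh
#!/bin/sh
# not_in_combined REMOUNTS FILES DIRS: print every entry of REMOUNTS that
# appears in neither FILES nor DIRS; succeed only if there are none.
not_in_combined() {
    # -F fixed strings, -x whole-line match, -v keep the lines that match nothing
    missing=`grep -v -x -F -f "$2" -f "$3" "$1"`
    if [ -n "$missing" ]
    then
        echo "$missing"
        return 1
    fi
    return 0
}

# Try it on scratch copies of the three profiles.
dir=`mktemp -d` || exit 1
printf '%s\n' /etc/passwd /etc/group /etc/remounts.list > "$dir/server.files"
printf '%s\n' /usr/mail /usr/man > "$dir/server.dirs"
printf '%s\n' /etc/passwd /usr/mail > "$dir/remounts.list"

not_in_combined "$dir/remounts.list" "$dir/server.files" "$dir/server.dirs" &&
    echo "remounts.list is a subset of the combined lists"
rm -rf "$dir"
```

An entry that appears only in remounts.list would be unmounted and remounted by the daemon without ever having been mounted by the initial configuration, so a non-empty report points at a profile mistake.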

/etc/ug.SSI, /etc/oug.SSI. DS provides general translation mechanisms for user and group id translation [1,2]. For machines within the cluster, there should be one to one translations, so that numeric ids are the same on all machines in the cluster, but machines in the cluster may also need translates to other machines outside the cluster. ug.SSI is used for a cluster wide definition of the translates. For brevity, we will not discuss the content of these files.

Home Directories

For sake of simplicity, it is assumed that home directories' pathnames have the form .../machine/user, where "machine" is the name of the machine where the home directory is actually stored. Even though paths are of this form, users will see the same actual home directory on each machine of the cluster, e.g., in our environment when user sauer is logged into machine d75, his home directory is still /u/auschs/sauer, since his home directory is stored on auschs. This is a minor sacrifice of transparency, since users usually do not use rooted paths to get to their home directories: their home directory is listed in /etc/passwd, so that is where they start, cd takes them back there, and the C shell "~" notation is often used to get to other users' home directories, e.g., cd ~dale. Shipley has proposed a similar convention for sharing home directories [7].

Having paths of this form allows each machine to simply mount the home directories stored on other machines, e.g., mount -n auschs /u/auschs /u/auschs, or, in general,

    for i in `cat /etc/SSImachines`
    do
        if [ $i != $myname ]
        then
            mount -i -n $i /u/$i /u/$i
        fi
    done

This is a slightly simplified fragment from /etc/SSImounts, discussed below.
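To experiment with the fragment away from a DS cluster, it can be recast as a dry run that prints the mount commands instead of issuing them. The sketch below assumes a hypothetical helper, home_mount_commands, a scratch copy of the machine list, and a made-up third machine name (m3); only auschs and d75 come from the text.

```sh
#!/bin/sh
# home_mount_commands LISTFILE MYNAME: print one mount command per machine
# named in LISTFILE, skipping the local machine MYNAME.
home_mount_commands() {
    while read -r machine
    do
        [ -n "$machine" ] || continue       # ignore blank lines
        if [ "$machine" != "$2" ]
        then
            echo "mount -i -n $machine /u/$machine /u/$machine"
        fi
    done < "$1"
}

# Scratch stand-in for /etc/SSImachines; m3 is a made-up name.
list=`mktemp` || exit 1
printf '%s\n' auschs d75 m3 > "$list"

# On machine d75 this prints the mounts for auschs and m3 only.
home_mount_commands "$list" d75
rm -f "$list"
```

Because every machine runs the same loop over the same list and differs only in its own name, the shared profile needs no per-machine editing.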

Machine Initialization

During init processing, after normal standalone initialization, e.g., fsck and device mounts, /etc/rc.DS starts DS and then runs /etc/SSImounts. SSImounts runs in the background so that local operations can begin without server availability. SSImounts runs on all machines, including the adminserver. Following are simplified sketches taken from SSImounts. Error checks, touches/mkdirs for mount points, precautionary umounts, etc. are omitted:

Initial mounts from adminserver, updating local copies of files:

    if [ $myname != $adminserver ]
    then
        until mount -i -n $adminserver /native /$adminserver
        do
            sleep $delay
        done
        for i in `cat /etc/server.files`
        do
            cp -p /$adminserver$i $i
            mount -n $adminserver /native$i $i
        done
        for i in `cat /etc/server.dirs`
        do
            mount -i -n $adminserver /native$i $i
            if [ $i = '/usr/mail' ]
            then
                /etc/movemail &     # see below
            fi
        done
    fi

Start /etc/umountd

    make umountd
    /etc/umountd &

Update user/group ids.

    cmp /etc/ug.SSI /etc/oug.SSI
    if [ $? -ne 0 ]
    then
        dsldxprof -a -f /etc/ug.SSI
        if [ $? -eq 0 ]
        then
            cp -p /etc/ug.SSI /etc/oug.SSI
        fi
    fi

Mount home directories. This is as indicated in the previous fragment, except that the mounts are retried asynchronously in the background so that availability of any given machine doesn't delay availability of home directories from other machines.
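The asynchronous retry described here might be sketched as one background job per machine. Everything in the sketch is an assumption layered on the paper's fragment: retry_until_ok is a hypothetical helper, the five second delay is arbitrary, and MOUNT defaults to a harmless "echo mount" so the loop can be tried without touching real mounts.

```sh
#!/bin/sh
# retry_until_ok DELAY COMMAND [ARG...]: run COMMAND until it succeeds,
# sleeping DELAY seconds between attempts.
retry_until_ok() {
    delay=$1
    shift
    until "$@"
    do
        sleep "$delay"
    done
}

# Stand-in for the real mount command; override MOUNT to change it.
MOUNT=${MOUNT:-"echo mount"}

for i in auschs d75
do
    # One background job per machine: a slow or unavailable machine
    # does not hold up the mounts from the others.
    retry_until_ok 5 $MOUNT -i -n "$i" "/u/$i" "/u/$i" &
done
wait    # only so the demonstration finishes cleanly
```

In a real startup script the final wait would presumably be omitted, so that initialization continues while unreachable machines are still being retried.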

Once these steps have been performed, then the machine has joined the single system image.

/etc/movemail. Mail received while the administrative server is not available will be kept in the native spool directory, /usr/mail. movemail moves mail from the native spool directory to the shared spool directory whenever the shared directory is mounted, either by SSImounts or remounts (see below).

umountd

The key remaining topic is the daemon umountd. umountd uses a polling loop, performing the following functions and then sleeping until repeating the functions. The default sleep time is 60 seconds.

Detection of server inaccessibility. umountd attempts to open each of the files listed in /etc/server.files. If an open fails, umountd assumes the server is inaccessible and executes /etc/remounts. /etc/remounts unmounts all of the files in /etc/remounts.list and then attempts to remount them. remounts will execute movemail after successfully remounting /usr/mail. (umountd runs on adminserver, but skips these steps.)

Updating modified files. umountd determines whether any of the files in /etc/server.files have been updated. If so, umountd locks the server copy and updates the native copy. (umountd running on adminserver skips these steps.)

Detection of configuration changes. If key configuration files, e.g., /etc/adminserver or /etc/server.files, have been updated, umountd exec's SSImounts. SSImounts applies the changes and then starts a fresh version of umountd, as indicated above.
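The three functions above can be sketched as one pass of a polling loop. The real daemon is a C program (umountd.c); the shell below is only an illustration of the control flow, under stated assumptions: the names can_open, poll_once and config_changed are invented here, the directory holding native copies is a hypothetical parameter, and the locking of the server copy that umountd performs is omitted.

```sh
#!/bin/sh
# Sketch only: one pass of a umountd-style polling loop.
# All paths are parameters so nothing here touches the real /etc files.

# can_open FILE: succeed if FILE can be opened for reading.
can_open() {
    ( exec < "$1" ) 2>/dev/null
}

# poll_once LIST_FILE NATIVE_DIR REMOUNT_CMD
#   LIST_FILE    one server file path per line (in the style of /etc/server.files)
#   NATIVE_DIR   hypothetical directory holding the native copies
#   REMOUNT_CMD  command to run when a server file cannot be opened
poll_once() {
    list=$1
    native=$2
    remount=$3
    while read -r f
    do
        [ -n "$f" ] || continue
        if ! can_open "$f"
        then
            "$remount" < /dev/null    # server presumed inaccessible
            return 1
        fi
        copy=$native/$(basename "$f")
        # Refresh the native copy when it is missing or differs (no locking here).
        if ! cmp -s "$f" "$copy" 2>/dev/null
        then
            cp -p "$f" "$copy"
        fi
    done < "$list"
    return 0
}

# config_changed STAMP FILE...: succeed if any FILE is newer than STAMP.
config_changed() {
    stamp=$1
    shift
    for c in "$@"
    do
        if [ -n "$(find "$c" -newer "$stamp" 2>/dev/null)" ]
        then
            return 0
        fi
    done
    return 1
}

# Hypothetical driver, left commented out because it never returns.
# The stamp file and native directory paths are placeholders.
#   while :
#   do
#       if config_changed "$STAMP" /etc/adminserver /etc/server.files
#       then
#           exec SSImounts
#       fi
#       poll_once /etc/server.files "$NATIVE_DIR" /etc/remounts
#       sleep 60
#   done
```

Comparing contents with cmp is a simplification chosen for the sketch; the text says only that umountd determines whether the files have been updated, without stating how.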

There are subtleties of locking and timing which we omit for brevity. Starting with DS 1.2, the source for umountd.c is included in the DS samples directory, along with the installation documentation, installation command and so forth.

Note the power of the above mechanisms. By simply changing the name of the server in /etc/adminserver, e.g., for a scheduled outage of the adminserver, the machines in the cluster will shift to the new adminserver in a couple of minutes, without rebooting any of the machines or otherwise significantly disrupting users. For an unscheduled server outage of significant duration, a switchover to a different server can be accomplished by changing the adminserver file on each of the other machines and rebooting them. In either case, scheduled or unscheduled, the original server can rejoin the cluster as a client when it is ready, and then assume the server role again when its configuration files have been updated.

SUMMARY

We believe this approach effectively meets the operational definition given in the Introduction, in regard to user accounts, data, availability, and administered services. The mechanisms are designed to be simple to apply and administer, yet highly effective in presenting the image of a single system. These mechanisms are complementary to the underlying file system mechanisms of Distributed Services, and orthogonal to other enhancements such as the code server mechanisms. Thus the same underlying mechanisms are applied across the distributed environment, for example the abstraction of our software development environment depicted in Figure 1. In addition, significant administrative flexibility is present, as suggested in the preceding paragraph. These concepts could be applied to other distributed file systems supporting fine granularity mounts.

REFERENCES

1. Charles H. Sauer, Don W. Johnson, Larry Loucks, Amal A. Shaheen-Gouda, Todd A. Smith, "RT PC Distributed Services: Overview," Operating Systems Review 21, 3 (July 1987) pp. 18-29.

2. Charles H. Sauer, Don W. Johnson, Larry Loucks, Amal A. Shaheen-Gouda, Todd A. Smith, "RT PC Distributed Services: File System," ;login: 12, 5 (September/October 1987) pp. 12-22.

3. Charles H. Sauer, Don W. Johnson, Larry Loucks, Amal A. Shaheen-Gouda, Todd A. Smith, "Statelessness and Statefulness in Distributed Services," UniForum 1988, Dallas, Texas, February 1988.

4. Jason Levitt, "The IBM RT Gets Connected," BYTE 12, 12 (1987) pp. 133-138.

5. R. Sandberg, D. Goldberg, S. Kleiman, Dan Walsh and D. Lyon, "Design and Implementation of the Sun Network File System," USENIX Conference Proceedings, Portland, June 1985.

6. Sun Microsystems, Inc., Networking on the Sun Workstation, February 1986.

7. M. Shipley, "The Virtual Home Environment," UniForum 1988, Dallas, Texas, February 1988.
