Distributed systems
Lecture 2: The Network File System (NFS) and Object Oriented Middleware (OOM)
Dr. Robert N. M. Watson

Last time
• Distributed systems are everywhere
  – Challenges including concurrency, delays, failures
  – The importance of transparency
• Simplest distributed systems are client/server
  – Client sends request as message
  – Server gets message, performs operation, and replies
  – Some care required handling retry semantics, timeouts
• One popular model is Remote Procedure Call (RPC)
  – Client calls functions on the server via network
  – Middleware generates stub code which can marshal/unmarshal arguments/return values
  – e.g. SunRPC/XDR
  – Transparency for the programmer, not just the user
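
As a rough illustration of what generated stub code does, the sketch below marshals the arguments of a call into bytes on the client and unmarshals them on the server. It loosely follows XDR conventions (big-endian 4-byte integers, length-prefixed strings padded to a multiple of 4 bytes). The procedure number, the function names, and the message layout are hypothetical; this is not the actual SunRPC wire format.

```python
import struct

def marshal_string(s):
    """Length-prefixed string, zero-padded to a multiple of 4 bytes (XDR-style)."""
    data = s.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def unmarshal_string(buf, pos):
    """Return the decoded string and the position just past its padding."""
    (length,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    s = buf[pos:pos + length].decode("utf-8")
    padded = length + (4 - length % 4) % 4
    return s, pos + padded

def client_stub_lookup(proc, dir_id, name):
    """Client stub: turn a local-looking call into a request message."""
    return struct.pack(">II", proc, dir_id) + marshal_string(name)

def server_stub_decode(msg):
    """Server stub: recover the procedure number and arguments."""
    proc, dir_id = struct.unpack_from(">II", msg, 0)
    name, _ = unmarshal_string(msg, 8)
    return proc, dir_id, name

# Hypothetical procedure number 4 and directory id 17.
request = client_stub_lookup(4, 17, "notes.txt")
decoded = server_stub_decode(request)
```

The programmer only ever calls something shaped like `client_stub_lookup`; the byte-level layout is the middleware's concern, which is the transparency the slide refers to.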

First case study: NFS
• NFS = Network File System (developed by Sun)
  – Aimed to provide distributed filing by remote access
• Key design decisions:
  – Distributed file system vs. remote disks
  – Client-server model
  – High degree of transparency
  – Tolerant of node crashes or network failure
• First public version, NFSv2 (1989), did this via:
  – Unix file system semantics (or almost)
  – Integration into kernel (including mount)
  – Simple stateless client/server architecture
• A set of RPC “programs”: mountd, nfsd, lockd, statd, ...
• Transparency for users and applications, but also for NFS programmers: hence SunRPC
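
NFS: Client/Server Architecture
[Figure: client side and server side. On the client, a user program enters the syscall level and the VFS layer, which sits above the local FS and the NFS client. The NFS client sends an RPC request to the NFS server, which sits below the server's syscall level and VFS layer alongside the server's local FS; an RPC response returns to the client. Steps are numbered 1 to 5.]
• Client uses opaque file handles to refer to files
• Server translates these to local inode numbers
• SunRPC with XDR running over UDP (originally)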
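
NFS: mounting remote file systems
[Figure: directory trees of the NFS client and the NFS server, with directories /, /tmp, /home, and /bin and entries x, y, foo, and bar, showing a server directory mounted into the client's tree.]
• NFS RPCs are methods on files identified by file handle(s)
• Bootstrap via dedicated mount RPC ‘program’ that:
  – Performs authentication (if any);
  – Negotiates any optional session parameters; and
  – Returns root file handle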
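
The bootstrap can be sketched as follows: a mount call hands back a root file handle, and every later name is resolved one component at a time by a lookup call on a directory handle. `ToyServer`, `resolve`, and the handle format are hypothetical names invented for this sketch; authentication and parameter negotiation are omitted, and the in-memory handle table is a toy shortcut (a real server would derive the handle from file system identifiers rather than keep a table).

```python
class ToyServer:
    """In-memory stand-in for an NFS server exporting one directory tree."""

    def __init__(self, tree):
        self._tree = tree
        self._path_of = {}     # opaque handle -> path tuple (server-private)
        self._handle_of = {}   # path tuple -> opaque handle

    def _handle(self, path):
        # Hand out one stable, opaque handle per path.
        if path not in self._handle_of:
            fh = b"fh-%d" % len(self._handle_of)
            self._handle_of[path] = fh
            self._path_of[fh] = path
        return self._handle_of[path]

    def _node(self, path):
        node = self._tree
        for name in path:
            node = node[name]
        return node

    def mount(self):
        """Mount 'program': return the root file handle."""
        return self._handle(())

    def lookup(self, dir_fh, name):
        """Return the handle of `name` inside the directory `dir_fh`."""
        path = self._path_of[dir_fh] + (name,)
        self._node(path)       # raises KeyError if the name does not exist
        return self._handle(path)

def resolve(server, root_fh, path):
    """Client side: walk a path using one lookup per component."""
    fh = root_fh
    for component in path.strip("/").split("/"):
        fh = server.lookup(fh, component)
    return fh

server = ToyServer({"home": {"x": b"", "y": b""}, "bin": {}})
root = server.mount()
fh_x = resolve(server, root, "home/x")
```

The client never interprets the bytes of a handle; it only passes them back to the server.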

NFS file handles and scoping
• Pure names expose no visible semantics (e.g., NFS handle)
• Impure names expose semantics (e.g., file paths)
[Figure: names used at each layer. User program: int; syscall level: file *; VFS layer: vnode *; NFS client: nfsnode *, fhandle_t; NFS server: fhandle_t, vnode *; local FS: vnode *, inode *. An example server-defined fhandle_t has the fields fsid, len, pad, ino, and gen, split between an NFS part and a local-file-system part.]
• Arguments at each layer are names with specific scopes
  – Layers translate between name spaces for encapsulation
  – Contents of names between layers often opaque
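
A minimal sketch of a server-defined file handle, using the field names from the figure (fsid, len, pad, ino, gen). The field widths, the `inode_table` dictionary, and the function names are assumptions made for illustration. The generation number lets the server notice that an inode number has been reused since the handle was issued, in which case the handle is stale.

```python
import struct

# Hypothetical layout: 8-byte fsid, 2-byte len, 2-byte pad, 4-byte ino, 4-byte gen.
FH_FORMAT = ">QHHII"

class StaleFileHandle(Exception):
    """Stands in for the ESTALE error returned to clients."""

def make_fhandle(fsid, ino, gen):
    """Server side: pack local identifiers into an opaque byte string."""
    fid_len = struct.calcsize(">HHII")   # length of the per-file part
    return struct.pack(FH_FORMAT, fsid, fid_len, 0, ino, gen)

def resolve_fhandle(fh, inode_table):
    """Server side: translate a handle back to (fsid, ino), or reject it."""
    fsid, _len, _pad, ino, gen = struct.unpack(FH_FORMAT, fh)
    if inode_table.get((fsid, ino)) != gen:
        raise StaleFileHandle("inode was freed or reused")
    return fsid, ino

# inode_table maps (fsid, ino) -> current generation number.
inode_table = {(1, 42): 7}
fh = make_fhandle(1, 42, 7)
location = resolve_fhandle(fh, inode_table)
```

Only the server calls `struct.unpack` on the handle; to the client it is a pure name with no visible structure.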

NFS is stateless
• Key NFS design decision to ease fault recovery
  – Obviously, file systems aren’t stateless, so…
• Stateless means the protocol doesn’t require:
  – Keeping any record of current clients
  – Keeping any record of current open files
• Server can crash + reboot, and clients do not have to do anything (except wait!)
• Clients can crash, and servers do not need to do anything (no cleanup etc)

Implications of stateless-ness
• No “open” or “close” operations
  – fh = lookup(<directory fh>, <filename>)
  – All file operations are via per-file handles
• No implied state linking multiple RPCs; e.g.,
  – UNIX file descriptor has “current offset” for I/O:
      read(fd, buf, 2048)
  – NFS file handle has no offset; operations are explicit:
      read(fh, buf, offset, 2048)
• This makes many operations idempotent
  – This use of SunRPC gives at-least-once semantics
  – Tolerate message duplication in network, RPC retries
• Challenges in providing Unix FS semantics…
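
To see why an explicit offset makes a read idempotent, the sketch below contrasts a toy stateless server with a toy stateful one when the same request arrives twice (for example, after an RPC retry). Both classes and their method signatures are hypothetical simplifications, not the NFS protocol.

```python
class StatelessServer:
    """Every request carries everything needed to execute it."""

    def __init__(self, files):
        self.files = files   # file handle -> file contents

    def read(self, fh, offset, count):
        return self.files[fh][offset:offset + count]

class StatefulServer:
    """The server remembers a current offset for each open file."""

    def __init__(self, files):
        self.files = files
        self.offsets = {}    # per-open-file state that a crash would lose

    def read(self, fd, count):
        pos = self.offsets.get(fd, 0)
        data = self.files[fd][pos:pos + count]
        self.offsets[fd] = pos + len(data)
        return data

files = {"fh1": b"abcdefgh"}

stateless = StatelessServer(files)
first = stateless.read("fh1", 0, 4)
retry = stateless.read("fh1", 0, 4)    # duplicate delivery: same answer

stateful = StatefulServer(files)
s_first = stateful.read("fh1", 4)
s_retry = stateful.read("fh1", 4)      # duplicate delivery: different answer
```

With at-least-once delivery, the stateless form returns the same bytes however many times the request is executed, whereas the stateful form silently skips data.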

Semantic tricks (and messes)
• rename(<old filename>, <new filename>)
  – Fundamentally non-idempotent
  – Strong expectation of atomicity
  – Server-side “cache” of recent RPC replies for replay
• unlink(<old filename>)
  – UNIX requires open files to persist after unlink()
  – What if the server removes a file that is open on a client?
  – Silly rename: clients translate unlink() to rename()
  – Only within client (not server delete, nor for other clients)
  – Other clients will have a stale file handle: ESTALE
• Stateless file locking seems impossible
  – Problem avoided (?): separate RPC protocols
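
The reply “cache” for non-idempotent operations can be sketched as below: the server remembers the reply it gave for each (client, transaction id) pair and replays it when a retry arrives, instead of executing the rename a second time. The class, the string error codes, and the unbounded dictionary are assumptions made for illustration; a real cache would be bounded and keep only recent replies.

```python
class RenameServer:
    def __init__(self, names):
        self.names = set(names)
        self.reply_cache = {}   # (client id, transaction id) -> reply

    def _rename(self, old, new):
        """The underlying non-idempotent operation."""
        if old not in self.names:
            return "ENOENT"
        self.names.remove(old)
        self.names.add(new)
        return "OK"

    def handle_rename(self, client, xid, old, new):
        key = (client, xid)
        if key in self.reply_cache:
            return self.reply_cache[key]    # retry: replay the first reply
        reply = self._rename(old, new)
        self.reply_cache[key] = reply
        return reply

server = RenameServer({"a"})
reply1 = server.handle_rename("client-1", 1, "a", "b")
reply2 = server.handle_rename("client-1", 1, "a", "b")   # retransmission

# Without the cache, executing the same request twice fails the second time.
naive = RenameServer({"a"})
naive1 = naive._rename("a", "b")
naive2 = naive._rename("a", "b")
```

The cache is state held by an otherwise stateless server, which is one reason the slide files it under "messes".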

Performance problems
• Neither side knows if other is alive or dead
  – All writes must be synchronously committed on server before it returns success
• Very limited client caching…
  – Risk of inconsistent updates if multiple clients have file open for writing at the same time
• These two facts alone meant that NFSv2 had truly dreadful performance

NFSv3 (1995)
• Mostly minor protocol enhancements
  – Scalability
    • Remove limits on path- and file-name lengths
    • Allow 64-bit offsets for large files
    • Allow large (>8KB) transfer-size negotiation
  – Explicit asynchrony
    • Server can do asynchronous writes (write-back)
    • Client sends explicit commit after some # writes
    • File timestamps piggybacked on server replies allow clients to manage cache: close-to-open consistency
  – Optimized RPCs (readdirplus, symlink)
• But had major impact on performance

NFSv3 readdirplus
• NFSv2 behaviour for “ls -l”
  – readdir() triggers NFS_READDIR to request names and handles
  – stat() on each file triggers one NFS_GETATTR RPC
• NFS3_READDIRPLUS returns names, handles, and attributes
  – Eliminates a vast number of round-trip times
• Principle: mask network latency by batching synchronous operations
[Figure: message sequence for a three-entry directory. NFSv2 client and server exchange one READDIR followed by three GETATTR RPCs: 4 x RTT (Round-Trip Time). NFSv3 client and server exchange a single READDIRPLUS: 1 x RTT.]
  drwxr-xr-x 55 al565 al565 12288 Feb 8 15:47 al565/
  drwxr-xr-x 115 am21 am21 49152 Feb 10 18:19 am21/
  drwxr-xr-x 214 atm26 atm26 36864 Feb 1 17:09 atm26/
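
The saving can be counted with a toy server that increments a counter on every RPC. The class and method names are hypothetical, and names stand in for file handles to keep the sketch short; the sizes are taken from the listing above.

```python
class CountingServer:
    def __init__(self, attrs):
        self.attrs = attrs   # entry name -> attributes
        self.rpcs = 0        # each RPC costs one round trip

    def readdir(self):
        self.rpcs += 1
        return list(self.attrs)

    def getattr_rpc(self, name):
        self.rpcs += 1
        return self.attrs[name]

    def readdirplus(self):
        self.rpcs += 1
        return dict(self.attrs)   # names and attributes in one reply

def ls_l_v2(server):
    """One READDIR, then one GETATTR per entry."""
    return {name: server.getattr_rpc(name) for name in server.readdir()}

def ls_l_v3(server):
    """A single READDIRPLUS."""
    return server.readdirplus()

entries = {"al565": {"size": 12288}, "am21": {"size": 49152}, "atm26": {"size": 36864}}

v2 = CountingServer(entries)
listing_v2 = ls_l_v2(v2)

v3 = CountingServer(entries)
listing_v3 = ls_l_v3(v3)
```

For a directory of n entries the NFSv2 pattern costs n + 1 round trips and the batched form costs one, which matters most when the round-trip time dominates.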

Distributed file system consistency
• Can a distributed application expect data written on client A to be visible to client B?
  – After write() on A, will a read() on B see it?
  – What if a process on A writes to a file, and then sends a message to a process on B to read the file?
• In NFSv3, no!
  – A may have freshly written data in its cache that it has not yet sent to the server via a write RPC
  – The server will return stale data to B’s read RPC
• Or:
  – B may return stale data in its cache from a prior read
• This problem is known as inconsistency:
  – Clients may see different versions of the same shared object

NFS close-to-open consistency (1)
• Guaranteeing global visibility for every write() required synchronous RPCs and prevented caching
• NFSv3 implements close-to-open consistency, which reduces synchronous RPCs and permits caching
  1. For each file it stores, the server maintains a timestamp of the last write performed
  2. When a file is opened, the client receives the timestamp; if the timestamp has changed since data was cached, the client invalidates its read cache, forcing fresh read RPCs
  3. While the file is open, data reads/writes for the file can be cached on the client, and write RPCs can be deferred
  4. When the file is closed, pending writes must be sent to the server (and ack’d) before close() can return
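
The four steps can be sketched with a toy client and server. Everything here is a simplification chosen for illustration: whole files are cached rather than blocks, the timestamp is a logical counter, and the class and method names are hypothetical.

```python
class Server:
    def __init__(self):
        self.data = {}     # file name -> contents
        self.mtime = {}    # step 1: timestamp of the last write per file
        self.clock = 0

    def getattr(self, name):
        return self.mtime.get(name, 0)

    def read(self, name):
        return self.data.get(name, b"")

    def write(self, name, data):
        self.clock += 1
        self.data[name] = data
        self.mtime[name] = self.clock

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # file name -> (timestamp when cached, data)
        self.pending = {}    # file name -> data written but not yet sent

    def open(self, name):
        # Step 2: compare timestamps and invalidate a stale read cache.
        mtime = self.server.getattr(name)
        cached = self.cache.get(name)
        if cached is not None and cached[0] != mtime:
            del self.cache[name]

    def read(self, name):
        # Step 3: serve reads from the local cache while the file is open.
        if name in self.pending:
            return self.pending[name]
        if name not in self.cache:
            self.cache[name] = (self.server.getattr(name), self.server.read(name))
        return self.cache[name][1]

    def write(self, name, data):
        self.pending[name] = data    # step 3: the write RPC is deferred

    def close(self, name):
        # Step 4: flush pending writes before close() returns.
        if name in self.pending:
            self.server.write(name, self.pending.pop(name))
            self.cache.pop(name, None)

server = Server()
server.write("f", b"old")
a, b = Client(server), Client(server)

b.open("f")
b.read("f")                    # B caches the old contents
a.open("f")
a.write("f", b"new")           # buffered on A only
stale = b.read("f")            # B cannot see A's write yet
a.close("f")                   # A flushes to the server
still_stale = b.read("f")      # B has not reopened, so it still uses its cache
b.close("f")
b.open("f")                    # timestamp changed: cache invalidated
fresh = b.read("f")
```

B sees the new data only after A has closed the file and B has opened it again, which is exactly the guarantee the name describes.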

NFS close-to-open consistency (2)
• We now have a consistency model that programmers can use to reason about when writes will be visible in NFS:
  – If a program on host A needs writes to a file to be visible to a program on host B, it must close() the file
  – If a program on host B needs reads from a file to include those writes, it must open() it after the corresponding close()
• This works quite well for some applications
  – E.g., distributed builds: inputs/outputs are whole files
  – E.g., UNIX maildir format (each email in its own file)
• It works very badly for others
  – E.g., long-running databases that modify records within a file
  – E.g., UNIX mbox format (all emails in one large file)
• Applications using NFS to share data must be designed for these semantics, or they will behave very badly!

NFSv4 (2003)
• Time for a major rethink
  – Single stateful protocol (including mount, lock)
  – TCP (or at least reliable transport) only
  – Explicit open and close operations
  – Share reservations
  – Delegation
  – Arbitrary compound operations
  – Many lessons learned from AFS (later in term)
• Now seeing significant deployment

Improving over SunRPC
• SunRPC (now “ONC RPC”) very successful but
  – Clunky (manual program, procedure numbers, etc)
  – Limited type information (even with XDR)
  – Hard to scale beyond simple client/server
• One improvement was OSF DCE (early 90’s)
  – Another project that learned from AFS
  – DCE = “Distributed Computing Environment”
  – Larger middleware system including a distributed file system, a directory service, and DCE RPC
  – Deals with a collection of machines (a cell) rather than just with individual clients and servers

DCE RPC versus SunRPC
• Quite similar in many ways
  – Interfaces written in Interface Definition Notation (IDN), and compiled to skeletons and stubs
  – NDR wire format: little-endian by default!
  – Can operate over various transport protocols
• Better security, and location transparency
  – Services identified by 128-bit “Universally” Unique Identifiers (UUIDs), generated by uuidgen
  – Server registers UUID with cell-wide directory service
  – Client contacts directory service to locate server… which supports service move, or replication
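
Location transparency through a directory service can be sketched as a table from service UUID to server address. The `CellDirectory` class, its methods, and the host names are hypothetical and do not reflect the DCE API; Python's standard `uuid` module stands in for `uuidgen`.

```python
import uuid

class CellDirectory:
    """Toy cell-wide directory: service UUID -> list of server addresses."""

    def __init__(self):
        self.bindings = {}

    def register(self, service_id, address):
        self.bindings.setdefault(service_id, []).append(address)

    def unregister(self, service_id, address):
        self.bindings[service_id].remove(address)

    def locate(self, service_id):
        servers = self.bindings.get(service_id, [])
        if not servers:
            raise LookupError("no server registered for this service")
        return servers[0]

FILE_SERVICE = uuid.uuid4()    # 128-bit identifier, as uuidgen would produce

directory = CellDirectory()
directory.register(FILE_SERVICE, "host-a:4000")
before = directory.locate(FILE_SERVICE)

# The service moves: the client's code and the UUID stay the same.
directory.register(FILE_SERVICE, "host-b:4000")
directory.unregister(FILE_SERVICE, "host-a:4000")
after = directory.locate(FILE_SERVICE)
```

Because clients name the service rather than the machine, a move or an added replica only changes the directory's contents.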

Object-Oriented Middleware
• SunRPC / DCE RPC forward functions, and do not have support for more complex types, exceptions, or polymorphism
• Object-Oriented Middleware (OOM) arose in the early 90s to address this
  – Assume programmer is writing in OO-style
  – ‘Remote objects’ will behave like local objects, but their methods will be forwarded over the network à la RPC
  – References to objects can be passed as arguments or return values, e.g., passing a directory object reference
• Makes it much easier to program, especially if your program is object oriented!

CORBA (1989)
• First OOM system was CORBA
  – Common Object Request Broker Architecture
  – Specified by the OMG: Object Management Group
• OMA (Object Management Architecture) is the general model of how objects interoperate
  – Objects provide services.
  – Client makes a request to an object for a service.
  – Client doesn’t need to know where the object is, or anything about how the object is implemented!
  – Object interface must be known (public)

Object Request Broker (ORB)
• The ORB is the core of the architecture
  – Connects clients to object implementations
  – Conceptually spans multiple machines (in practice, ORB software runs on each machine)
[Figure: a client with generated stub code and an object implementation with generated skeleton code, connected through the ORB.]

Invoking Objects
• Clients obtain an object reference
  – Typically via the naming service or trading service
  – (Object references can also be saved for use later)
• Interfaces defined by CORBA IDL
• Clients can call remote methods in 2 ways:
  1. Static Invocation: using stubs built at compile time (just like with RPC)
  2. Dynamic Invocation: actual method call is created on the fly. It is possible for a client to discover new objects at run time and access the object methods
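
The difference between the two styles can be illustrated with plain Python objects, where a local call stands in for the network hop. None of these classes is part of CORBA; `AccountStub`, `DynamicRequest`, and `list_operations` are hypothetical names, and Python introspection merely plays the role that an interface repository would play.

```python
class Account:
    """Stand-in for a remote object implementation."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance

class AccountStub:
    """Static invocation: the methods are fixed when the stub is generated."""

    def __init__(self, target):
        self._target = target

    def deposit(self, amount):
        return self._target.deposit(amount)   # would be marshalled and sent

class DynamicRequest:
    """Dynamic invocation: operation name and arguments are built at run time."""

    def __init__(self, target, operation):
        self._target = target
        self._operation = operation
        self._args = []

    def add_arg(self, value):
        self._args.append(value)
        return self

    def invoke(self):
        return getattr(self._target, self._operation)(*self._args)

def list_operations(obj):
    """Discover the public callable operations of an object at run time."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )

account = Account()
static_result = AccountStub(account).deposit(10)

operations = list_operations(account)
dynamic_result = DynamicRequest(account, operations[0]).add_arg(5).invoke()
```

The static path gets compile-time checking from the stub's fixed methods; the dynamic path trades that away for the ability to call operations the client did not know about when it was built.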

CORBA IDL
• Definition of language-independent remote interfaces
  – Language mappings to C++, Java, Smalltalk, …
  – Translation by IDL compiler
• Type system
  – basic types: long (32 bit), long long (64 bit), short, float, char, boolean, octet, any, …
  – constructed types: struct, union, sequence, array, enum
  – objects (common super type Object)
• Parameter passing
  – in, out, inout (= send remote, modify, update)
  – basic & constructed types passed by value
  – objects passed by reference

CORBA Pros and Cons
• CORBA has some unique advantages
  – Industry standard (OMG)
  – Language & OS agnostic: mix and match
  – Richer than simple RPC (e.g. interface repository, implementation repository, DII support, …)
  – Many additional services (trading & naming, events & notifications, security, transactions, …)
• However:
  – Really, really complicated / ugly / buzzwordy
  – Poor interoperability, at least at first
  – Generally to be avoided unless you need it!

Summary + next time
• NFS as an RPC, distributed-filesystem case study
  – Retry semantics vs. RPC semantics
  – Scoping, pure vs. impure names
  – Close-to-open consistency
  – Batching to mask network latency
• DCE RPC
• Object-Oriented Middleware (OOM)
• CORBA

• Java remote method invocation (RMI)
• XML-RPC, SOAP, etc, etc, etc.
• Starting to talk about distributed time