Scheduling & Resource Management in Distributed Systems
Rajesh Rajamani, [email protected]
http://www.cs.wisc.edu/condor
May 2001
Outline
  High-throughput computing and Condor
  Resource management in distributed systems
  Matchmaking
  Current research/Misc.
Power of Computing environments
  Power = Work / Time
  High Performance Computing
    • Fixed amount of work; how much time?
    • Traditional performance metrics: FLOPS, MIPS
    • Response time/latency oriented
  High Throughput Computing
    • Fixed amount of time; how much work?
    • Application-specific performance metrics
    • Throughput oriented
In other words …
  HPC - Enormous amounts of computing power over relatively short periods of time
    (+) Good for applications under sharp time constraints
  HTC - Large amounts of computing power for lengthy periods
    (+) What if you want to simulate 1000 applications on your latest DSP chip design over the next 3 months?
The Condor Project
  Goal - To develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources
More about Condor
  Started in the late 80’s
  Principal Investigator - Prof. Miron Livny
  Latest version 6.3.0 released
  Supports 14 different platforms (OS + Arch) including Linux, Solaris and WinNT
  Currently employs over 20 students and 5 staff
  We write code, debug, port, publish papers and YES, we also provide support!
Distributed ownership of resources
  Underutilized - 70% of CPU cycles in a cluster go to waste
  Fragmented - Resources owned by different people
  Use these resources to provide HTC, BUT without impacting the QoS available to the owner
  Achieved by allowing the resource owner to set an access policy using control expressions
Access policy
  Current state of the resource (e.g., keyboard idle for 15 minutes or load average less than 0.2)
  Characteristics of the request (run only jobs of research associates)
  Time of day/night that jobs can be run
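
The three kinds of conditions above can be combined into a single control expression. The following is a minimal Python sketch, not Condor's own policy syntax. The names `ResourceState`, `Request`, `owner_allows`, the `RESEARCH_ASSOCIATES` set, and the 18:00 to 08:00 window are all assumptions made for illustration; only the 15-minute idle time and the 0.2 load average come from the slide.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    keyboard_idle_minutes: float
    load_average: float
    hour_of_day: int  # 0-23, local time on the resource


@dataclass
class Request:
    owner: str


# Hypothetical group membership; a real policy would name real users.
RESEARCH_ASSOCIATES = {"alice", "bob"}


def owner_allows(state: ResourceState, request: Request) -> bool:
    """Evaluate a sample access policy for one incoming request."""
    # Current state of the resource (thresholds taken from the slide).
    machine_is_idle = state.keyboard_idle_minutes >= 15 or state.load_average < 0.2
    # Characteristics of the request.
    trusted_requester = request.owner in RESEARCH_ASSOCIATES
    # Time of day: the 18:00-08:00 window is an assumed example.
    off_hours = state.hour_of_day >= 18 or state.hour_of_day < 8
    return machine_is_idle and trusted_requester and off_hours
```

For example, `owner_allows(ResourceState(20, 0.9, 22), Request("alice"))` accepts the job, while the same request at noon, or from a user outside the group, is refused.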
What happens when you submit a job
  Diagram labels: Central Manager, Submitting machine, Available resource
  Resources announce their properties periodically
  1. User submits a job
  2. Submitting machine sends Classad of the job
  3. Matchmaker notifies parties of a match
  4. Parties negotiate
Important Mechanisms
  Mechanism       For
  Matchmaking     Resource management
  Checkpointing   Saving the state of a job
  Bypass          Remote system calls
  DAGMAN          Automatic job submission based on dependency graph
  Master-Worker   Exploiting task-level parallelism
Condor Architecture
  Manager
    • Collector: Database of resources
    • Negotiator: Matchmaker
    • Accountant: Priority maintenance
  Startds (represent owners of resources)
    • Implement owner's access control policy
  Schedds (represent customers of the system)
    • Maintain persistent queues of resource requests
Condor Architecture, cont.
Power of Condor
  Solves NUG30 Quadratic assignment problem, posed in 1968, over a period of 6.9 days, delivering over 96,000 CPU hours by commandeering an average of 650 machines!
  Compare this with the RSA-155 problem posed in 1977 and solved using 300 computers (over a period of 7 months) in the late 90s. If you were to use the same amount of resources as that used to solve NUG30, this could’ve been done in 2 weeks!
  “It (Chorus production) was done in parallel on machines in the computer center running XXX, and on the office machines under Condor. The latter did about 90% of the work!” - Helge MEINHARD (EP division, CERN)
Resource management using Matchmaking
  Opportunistic Resource Exploitation
    • Resource availability is unpredictable
      – Exploit resources as soon as they are available
      – Matchmaking performed continuously
  As against a centralized scheduler, which would have to deal with -
    • Heterogeneity of resources
    • Distributed ownership - widely varying allocation policies
    • Dynamic nature of the cluster
Classified Advertisements
  A simple language used by resource providers and customers to express their properties/requirements to the Collector
  Uses a semi-structured data model => no specific schema is required by the matchmaker, allowing it to work naturally in a heterogeneous environment
  Language folds the query language into the data model. Constraints may be expressed as attributes of the classad
  Should conform to the advertising protocol
Matchmaking with Classads
  4 steps to managing resources -
    1. Parties requiring matchmaking advertise their characteristics, preferences, constraints, etc.
    2. Advertisements matched by a Matchmaker
    3. Matched entities are notified
    4. Matched entities establish an allocation through a claiming process - could include authentication, constraint verification, negotiation of terms, etc.
  Method is symmetric
Classad example
  Sample classad of a workstation
    [
      Type = "Machine";
      OpSys = "Linux";
      Arch = "INTEL";
      Memory = 256 M;
      Constraint = true;
    ]
  Sample classad of a Job
    [
      Type = "Job";
      Owner = "run_sim";
      Constraint = other.Type == "Machine" &&
                   Arch == "INTEL" &&
                   OpSys == "Solaris251" &&
                   other.Memory >= Memory;
    ]
Example Classad (workstation)
    [
      Type = "Machine";
      Activity = "Idle";
      Name = "crow.cs.wisc.edu";
      Arch = "INTEL";
      OpSys = "Solaris251";
      Kflops = 21893;
      Memory = 64;
      Disk = 323496;  // KB
      DayTime = 36107;
Example Classad (contd.)
      ResearchGrp = {"miron", "thain", "john"};
      Untrusted = {"bgates", "lalooyadav", "thief"};
      Rank = member(other.Owner, ResearchGrp) * 10;
      Constraint = !member(other.Owner, Untrusted) && Rank >= 10 ? true : false  // To prevent malicious users
    ]
Example Classad (Submitted job)
    [
      Type = "Job";
      QDate = 886799469;
      Owner = "raman";
      Cmd = "run_sim";
      Iwd = "/usr/raman/sim2";
      Memory = 31;
      Rank = Kflops/1e3 + other.Memory/32;
      Constraint = other.Type == "Machine" && OpSys == "Solaris251" && Disk >= 10000 && other.Memory >= self.Memory;
    ]
Matchmaking
  Evaluates expressions in an environment that allows each classad to access attributes of the other
    • other.Memory >= self.Memory;
  References to non-existent attributes evaluate to undefined
  Considers pairs of ads incompatible unless their Constraint expressions both evaluate to true
  Rank is then used to choose among compatible matches
  Both parties are notified about the match - could generate and hand off a session key for authentication and security
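
The evaluation rules on this slide can be sketched in Python. This is an illustrative approximation under stated assumptions, not the Condor matchmaker: ads are plain dictionaries, `Constraint` and `Rank` are Python callables rather than classad expressions, `None` stands in for undefined, and only the job's `Rank` is used to pick among compatible machines. The helper names (`attr`, `constraint_holds`, `compatible`, `best_match`, `make_machine`, `make_job`) and the hosts `wren.example` and `nodisk.example` are hypothetical. The job owner is set to "thain" because the workstation's Constraint in the earlier example only admits members of ResearchGrp.

```python
UNDEFINED = None  # stand-in for the classad "undefined" value


def attr(ad, name):
    """Look up an attribute; a missing attribute yields UNDEFINED."""
    return ad.get(name, UNDEFINED)


def constraint_holds(ad, other):
    """True only if ad's Constraint evaluates to exactly True against other."""
    try:
        return ad["Constraint"](ad, other) is True
    except TypeError:
        # An ordering comparison touched UNDEFINED, so the result is not true.
        return False


def compatible(a, b):
    """A pair is compatible only when both Constraints evaluate to true."""
    return constraint_holds(a, b) and constraint_holds(b, a)


def best_match(job, machines):
    """Among compatible machines, pick the one the job ranks highest."""
    candidates = [m for m in machines if compatible(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["Rank"](job, m))


def make_machine(name, **attrs):
    ad = {
        "Type": "Machine",
        "Name": name,
        "ResearchGrp": {"miron", "thain", "john"},
        "Untrusted": {"bgates", "lalooyadav", "thief"},
        "Rank": lambda self, other: 10 if attr(other, "Owner") in self["ResearchGrp"] else 0,
        "Constraint": lambda self, other: (
            attr(other, "Owner") not in self["Untrusted"]
            and self["Rank"](self, other) >= 10
        ),
    }
    ad.update(attrs)
    return ad


def make_job(owner):
    return {
        "Type": "Job",
        "Owner": owner,
        "Memory": 31,
        "Rank": lambda self, other: attr(other, "Kflops") / 1e3 + attr(other, "Memory") / 32,
        "Constraint": lambda self, other: (
            attr(other, "Type") == "Machine"
            and attr(other, "OpSys") == "Solaris251"
            and attr(other, "Disk") >= 10000
            and attr(other, "Memory") >= self["Memory"]
        ),
    }


machines = [
    make_machine("crow.cs.wisc.edu", OpSys="Solaris251", Kflops=21893, Memory=64, Disk=323496),
    make_machine("wren.example", OpSys="Solaris251", Kflops=30000, Memory=128, Disk=500000),
    # No Disk attribute: the job's Disk test hits UNDEFINED and the pair is rejected.
    make_machine("nodisk.example", OpSys="Solaris251", Kflops=99999, Memory=256),
]
```

Calling `best_match(make_job("thain"), machines)` selects `wren.example`, since it is compatible and ranks above `crow.cs.wisc.edu`. The machine without a `Disk` attribute is never chosen despite its high Kflops, and a job owned by "bgates" gets no match at all.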
Separation of Matching and Claiming
  Weak consistency requirements - Claiming allows provider and customer to verify their constraints with respect to their current state
  Claiming protocol could use cryptographic techniques (authentication)
  Principals involved in a match are themselves responsible for establishing, maintaining and servicing a match
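
Because the matchmaker works from advertisements that may be stale, the claim step re-checks both sides against current state. A minimal Python sketch of that idea follows. The function names (`machine_accepts`, `job_accepts`, `claim`) and the simplified constraints are assumptions for illustration; authentication and negotiation of terms are left out.

```python
def machine_accepts(machine, job):
    """Provider-side check, evaluated against the machine's current state."""
    return machine["Activity"] == "Idle" and job["Owner"] not in machine["Untrusted"]


def job_accepts(job, machine):
    """Customer-side check, evaluated against the machine's current state."""
    return machine["Memory"] >= job["Memory"]


def claim(job, current_machine):
    """The claim succeeds only if both parties still accept each other."""
    return machine_accepts(current_machine, job) and job_accepts(job, current_machine)


# The ad the matchmaker saw when it formed the match.
advertised = {
    "Name": "crow.cs.wisc.edu",
    "Activity": "Idle",
    "Memory": 64,
    "Untrusted": {"bgates"},
}
job = {"Owner": "raman", "Memory": 31}

# By claim time the owner has returned, so the machine is no longer idle.
current = dict(advertised, Activity="Busy")
```

Here `claim(job, advertised)` would succeed, but `claim(job, current)` fails, so the match made on the stale ad is simply dropped and the job can be matched again later.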
Work outside the Condor kernel - New challenges
  Multilateral Matchmaking - Gangmatching
  IO regulation and Disk allocation - Kangaroo
  User interfaces - ClassadView
  Grid applications - Globus
  Security
Summary
  Matchmaking provides a scalable and robust resource management solution for HTC environments
  Classads are used by workstations and jobs
  Matchmaker forms the match and informs the parties, who in turn invoke the claiming protocol
  The parties are responsible for establishing, maintaining and servicing a match
  Questions?
Gangmatch request
    [
      Type = "Job";
      Owner = "raj";
      Cmd = "run_sim";
      Ports = {
        [
          Label = "cpu";
          ImageSize = 28 M;
          // Rank and constraints
        ],
        [
          Label = "License";
          Host = cpu.Name;
          // Rank and constraints
        ]
      }
    ]
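
One way to read the request above: the job needs two resources at once, and the "License" port refers back to whichever machine was bound to the "cpu" port. The Python sketch below illustrates that cross-reference with a naive nested search. It is an assumption-laden illustration, not the Gangmatching algorithm itself: the `gangmatch` function, the dictionary layout of the ads, and the host `wren.example` are hypothetical, and Rank is ignored.

```python
def gangmatch(image_size, machines, licenses):
    """Find a machine and a license that satisfy both ports together."""
    for machine in machines:
        # "cpu" port: the machine must have room for the job image.
        if machine["Memory"] < image_size:
            continue
        for lic in licenses:
            # "License" port: Host must equal the Name of the bound cpu.
            if lic["Host"] == machine["Name"]:
                return {"cpu": machine, "License": lic}
    return None


machines = [
    {"Name": "crow.cs.wisc.edu", "Memory": 64},
    {"Name": "wren.example", "Memory": 16},
]
licenses = [
    {"Host": "wren.example"},
    {"Host": "crow.cs.wisc.edu"},
]
```

With an image size of 28, `gangmatch(28, machines, licenses)` binds `crow.cs.wisc.edu` together with the license hosted on it. If only the `wren.example` license existed, no gang would form, because that machine is too small for the job.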