Distributed Operating Systems, CS551
Colorado State University
at Lockheed-Martin
Lecture 6 -- Spring 2001
CS551: Lecture 6

Topics
– Distributed Process Management (Chapter 7)
  Distributed Scheduling
  Algorithm Choices
  Scheduling Algorithm Approaches
  Coordinator Elections
  Orphan Processes
– Distributed File Systems (Chapter 8)
  Distributed Name Service
  Distributed File Service
  Distributed Directory Service
Distributed Deadlock Prevention
Assign each process a global timestamp when it starts
No two processes should have the same timestamp
Basic idea: “When one process is about to block waiting for a resource that another process is using, a check is made to see which has a larger timestamp (i.e. is younger).” Tanenbaum, DOS (1995)
Distributed Deadlock Prevention
Put a timestamp on each process, representing its creation time
Suppose a process needs a resource already owned by another process
Determine the relative ages of the two processes
Decide whether the waiting process should wait or die, or should preempt (wound) the owning process
Two different algorithms: wait-die and wound-wait
Distributed Deadlock Prevention
Allow wait only if the waiting process is older
– Since timestamps increase in any chain of waiting processes, cycles are impossible
Or allow wait only if the waiting process is younger
– Here timestamps decrease in any chain of waiting processes, so cycles are again impossible
Wiser to give older processes priority
Example: wait-die algorithm

[Diagram: processes 54 (older) and 79 (younger)]
Process 54 wants a resource held by process 79: 54 waits
Process 79 wants a resource held by process 54: 79 dies
Example: wound-wait algorithm

[Diagram: processes 54 (older) and 79 (younger)]
Process 54 wants a resource held by process 79: 54 preempts 79
Process 79 wants a resource held by process 54: 79 waits
Algorithm Comparison
Wait-die kills the young process
– When the young process restarts and requests the resource again, it is killed once more
– The less efficient of these two algorithms
Wound-wait preempts the young process
– When the young process re-requests the resource, it has to wait for the older process to finish
– The better of the two algorithms
(Both decision rules are sketched in code below.)
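A minimal sketch of the two decision rules, assuming a smaller timestamp means an older process; the function and variable names are illustrative, not from Galli or Tanenbaum:

```python
# Sketch: wait-die vs. wound-wait, assuming smaller timestamp = older process.

def wait_die(requester_ts, owner_ts):
    """Older requesters wait; younger requesters die (abort and retry later)."""
    return "wait" if requester_ts < owner_ts else "die"

def wound_wait(requester_ts, owner_ts):
    """Older requesters wound (preempt) the owner; younger requesters wait."""
    return "wound" if requester_ts < owner_ts else "wait"

# The slides' example, where process 54 is older than process 79:
assert wait_die(54, 79) == "wait"     # 54 wants 79's resource: 54 waits
assert wait_die(79, 54) == "die"      # 79 wants 54's resource: 79 dies
assert wound_wait(54, 79) == "wound"  # 54 preempts 79
assert wound_wait(79, 54) == "wait"   # 79 waits
```

In both rules the fixed timestamp ordering is what makes cycles in the chain of waiting processes impossible.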
Process Management in a Distributed Environment

Processes in a Uniprocessor
Processes in a Multiprocessor
Processes in a Distributed System
– Why scheduling is needed
– Scheduling priorities
– How to schedule
– Scheduling algorithms
Distributed Scheduling
Basically resource management
Want to distribute the processing load among the processing elements (PEs) in order to maximize performance
Consider several homogeneous processing elements on a LAN with equal average workloads
– The workload may still not be evenly distributed
– Some PEs may have idle cycles
Efficiency Metrics

Communication cost
– Low if very little or no communication is required
– Low if all communicating processes are on the same PE or not distant (a small number of hops)
Execution cost
– Relative speed of the PE
– Relative location of needed resources
– Type of operating system, machine code, architecture
Efficiency Metrics, continued
Resource Utilization
– May be based upon
  Current PE loads
  Load state/status
  Resource queue lengths
  Memory usage
  Other resource availability
Level of Scheduling
When should a process run locally, and when should it be sent to an idle PE?
Local Scheduling
– Allocate the process to the local PE
– Review Galli, Chapter 2, for more information
Global Scheduling
– Choose which PE executes which process
– Also called process allocation
– Precedes the local scheduling decision
Figure 7.1 Scheduling Decision Chart. (Galli, p. 152)
Distribution Goals
Load Balancing
– Tries to maintain an equal load throughout the system
Load Sharing
– Simpler
– Tries to prevent any PE from becoming too busy
Load Balancing / Load Sharing
Load Balancing
– Try to equalize loads at the PEs
– Requires more information
– More overhead
Load Sharing
– Avoid having an idle PE if there is work to do
Anticipating Transfers
– Avoid a PE idling while a task is on its way
– Get a new task just before the PE becomes idle
Figure 7.2 Load Distribution Goals. (Galli, p. 153)
Processor Allocation Algorithms
Assume virtually identical PEs
Assume PEs are fully interconnected
Assume processes may spawn children
Two strategies
– Non-migratory: static binding, non-preemptive
– Migratory: dynamic binding, preemptive
Processor Allocation Strategies
Non-migratory (static binding, non-preemptive)
– Transfer occurs before the process starts execution
– Once assigned to a machine, the process stays there
Migratory (dynamic binding, preemptive)
– Processes may move after execution begins
– Better load balancing
– Expensive: must collect and move the entire process state
– More complex algorithms
Efficiency Goals
Optimal
– Completion time
– Resource utilization
– System throughput
– Any combination thereof
Suboptimal
– Suboptimal approximate
– Suboptimal heuristic
Optimal Scheduling Algorithms
Require the state of all competing processes
The scheduler must have access to all related information
Optimization is a hard problem
– Usually NP-hard for multiple processors
Thus, consider
– Suboptimal approximate solutions
– Suboptimal heuristic solutions
Suboptimal Approximate Solutions

Similar to optimal scheduling algorithms
Try to find good solutions, not perfect solutions
Searches are limited
Include intelligent shortcuts
Suboptimal Heuristic Solutions

Heuristics
– Employ rules of thumb
– Employ intuition
– May not be provable
Generally considered to work in an acceptable manner
Examples:
– If a PE has a heavy load, don't give it more to do
– Exploit locality of reference for related processes and data
Figure 7.1 Scheduling Decision Chart. (Galli, p. 152)
Types of Load Distribution Algorithms

Static
– Decisions are hard-wired in
Dynamic
– Use system state information to make decisions
– Overhead of keeping track of that information
Adaptive
– A type of dynamic algorithm
– May behave differently at different load levels
Load Distribution Algorithm Issues
Transfer Policy
Selection Policy
Location Policy
Information Policy
Stability
Sender-initiated versus receiver-initiated
Symmetrically-initiated
Adaptive algorithms
Load Dist. Algs. Issues, cont.
Transfer Policy
– When is it appropriate to move a task?
– If the load at the sending PE > threshold
– If the load at the receiving PE < threshold
Location Policy
– Find a receiver PE
– Methods:
  Broadcast messages
  Polling: random, neighbors, recent candidates
Load Dist. Algs. Issues, cont.
Selection Policy
– Which task should migrate?
– Simple: select new tasks (non-preemptive)
– Criteria
  Cost of transfer: should be covered by the reduction in response time
  Size of the task
  Number of location-dependent system calls (these favor keeping the task on the local PE)
Load Dist. Algs. Issues, cont.

Information Policy
– What information should be collected? When? From whom? By whom?
– Demand-driven
  Get information when a PE becomes a sender or a receiver
  Sender-initiated: senders look for receivers
  Receiver-initiated: receivers look for senders
  Symmetrically-initiated: either of the above
– Periodic: at fixed time intervals; not adaptive
– State-change-driven
  Nodes send information about their own state (rather than waiting to be asked)
Load Dist. Algs. Issues, cont.

Stability
– Queuing-theoretic view
  Stable: sum of (arrival load + overhead) < capacity
  Effective: using the algorithm gives better performance than doing no load distribution
  An effective algorithm cannot be unstable
  A stable algorithm can still be ineffective (its overhead outweighs its benefit)
– Algorithmic view
  E.g., performing overhead operations but making no forward progress
  E.g., moving a task from PE to PE, only to learn that the move increases the destination PE's workload enough that the task needs to be transferred again
Load Dist Algs: Sender-Initiated

The sender PE thinks it is overloaded
Transfer Policy (sketched below)
– Threshold (T) based on the PE's CPU queue length (QL)
  Sender: QL > T
  Receiver: QL < T
Selection Policy
– Non-preemptive
  Allows only new tasks to be transferred
  Long-lived tasks make this policy worthwhile
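A minimal sketch of this threshold transfer policy; the threshold value and all names are assumptions, not from Galli:

```python
# Illustrative threshold transfer policy for sender-initiated distribution.
T = 5  # hypothetical CPU-queue-length threshold

def role(queue_length, threshold=T):
    """Classify a PE by comparing its CPU queue length (QL) to T."""
    if queue_length > threshold:
        return "sender"    # overloaded: try to ship a new task elsewhere
    if queue_length < threshold:
        return "receiver"  # underloaded: willing to accept work
    return "neutral"
```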
Load Dist Algs: Sender-Initiated
Location Policy (3 different policies)
– Random
  Select a receiver at random
  – Wasted effort if the destination is already loaded
  Want to avoid transferring the same task from PE to PE to PE
  – Include a limit on the number of transfers
– Threshold (sketched below)
  Start polling PEs at random
  – If a 'receiver' is found, send the task to it
  – Limit the search to a poll limit
    If the limit is hit, keep the task on the current PE
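A sketch of the Threshold location policy under stated assumptions: `peers` is the set of other PEs and `is_receiver` stands in for a real poll message; both names and the poll limit are illustrative:

```python
import random

POLL_LIMIT = 5  # hypothetical poll limit

def find_receiver(peers, is_receiver, poll_limit=POLL_LIMIT):
    """Poll randomly chosen PEs; return the first that reports itself a
    receiver, or None (keep the task locally) once the poll limit is hit."""
    for pe in random.sample(peers, min(poll_limit, len(peers))):
        if is_receiver(pe):  # stand-in for a QL < T poll reply
            return pe
    return None
```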
LDAs: Sender-Initiated
Location Policy (3 different policies, cont.)
– Shortest
  Poll a random set of PEs
  – Choose the PE with the shortest queue length
  Only a little better than the Threshold location policy
  – Not worth the additional work
LDAs: Sender-Initiated
Information Policy
– Demand-driven
  Collected after a PE identifies itself as a sender
Stability
– At high load, a PE might not find a receiver
– The polling effort will be wasted
– Polling increases the load on the system
  Could lead to instability
LDAs: Receiver-Initiated
The receiver is trying to find work
Transfer Policy
– If local QL < T, try to find a sender
Selection Policy
– Non-preemptive preferred, but there may not be any new tasks to take
– Worth the effort
LDAs: Receiver-Initiated
Location Policy (sketched below)
– Select a PE at random
– If taking a task would not move that PE's load below the threshold, take it
– If no luck after polling the poll-limit number of PEs,
  Wait until another task completes, or
  Wait another time period
Information Policy
– Demand-driven
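A sketch of this receiver-side polling rule; `queue_len_of` stands in for a real poll message, and every name is an assumption:

```python
import random

def acquire_task(peers, queue_len_of, T, poll_limit=5):
    """Poll random PEs; take a task only if giving one up still leaves the
    polled PE at or above the threshold (i.e., it remains a sender)."""
    for pe in random.sample(peers, min(poll_limit, len(peers))):
        if queue_len_of(pe) - 1 >= T:  # still loaded after losing one task
            return pe                  # transfer one task from this PE
    return None  # no luck: wait for a local completion or another period
```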
LDAs: Receiver-Initiated
Stability
– Tends to be stable
  At high load, a sender should be found quickly
Problem
– Transfers tend to be preemptive
  Tasks on the sender node have already started execution
LDAs: Symmetrically-Initiated
Both senders and receivers can search for transfer partners
Has both the advantages and the disadvantages of the two previous methods
Above-average algorithm
– Try to keep the load at each PE at an acceptable level
– Aiming for the exact average can cause thrashing
LDAs: Symmetrically-Initiated
Transfer Policy (sketched below)
– Each PE
  Estimates the average load
  Sets both an upper and a lower threshold, equidistant from its estimate
  If load > upper threshold, the PE acts as a sender
  If load < lower threshold, the PE acts as a receiver
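A sketch of this two-threshold classification; the distance `delta` is an assumed parameter:

```python
def classify(load, avg_estimate, delta=1.0):
    """Compare the load to thresholds set equidistant from the estimate."""
    if load > avg_estimate + delta:
        return "sender"
    if load < avg_estimate - delta:
        return "receiver"
    return "acceptable"  # between the thresholds: leave the PE alone
```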
LDAs: Symmetrically-Initiated

Location Policy
– Sender-initiated (sketched below)
  The sender broadcasts a TooHigh message and sets a timeout
  A receiver sends an Accept message, clears its timeout, increases its load value, and sets a timeout
  If the sender still wants to send when the Accept message arrives, it sends the task
  If the sender gets a TooLow message before an Accept, it sends the task
  If the sender's TooHigh timeout expires with no Accept
  – Its average estimate is too low
  – It broadcasts a ChangeAvg message to all PEs
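One way to sketch the sender's side of this exchange; the message names follow the slides, while the transport helpers (`broadcast`, `wait_reply`, `send_task`) are stand-ins for real messaging and timers:

```python
def sender_round(broadcast, wait_reply, send_task, timeout=1.0):
    """Broadcast TooHigh, then react to the first useful reply or a timeout."""
    broadcast(("TooHigh",))
    reply = wait_reply(timeout)       # returns (kind, pe) or None on timeout
    if reply is None:                 # no Accept before the timeout:
        broadcast(("ChangeAvg",))     # average estimate was too low; raise it
        return
    kind, pe = reply
    if kind in ("Accept", "TooLow"):  # a willing receiver exists
        send_task(pe)
```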
LDAs: Symmetrically-Initiated
Location Policy
– Receiver-initiated
  The receiver sends a TooLow message and sets a timeout
  The rest is the converse of the sender-initiated algorithm
Selection Policy
– Use a reasonable policy
  Non-preemptive, if possible
  Low cost
LDAs: Symmetrically-Initiated
Information Policy
– Demand-driven
– Determined at each PE
– Low overhead
LDAs: Adaptive
Stable symmetrically-initiated algorithm
– The previous instability was due to too much polling by the sender
– Each PE keeps lists of the other PEs, sorted into three categories (sketched below)
  Sender (overloaded)
  Receiver (underloaded)
  OK
– At the start, each PE has all other PEs on its receiver list
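A sketch of that per-PE bookkeeping; the class and method names are illustrative:

```python
class AdaptiveLists:
    """One PE's view of its peers, per the stable symmetric algorithm."""
    def __init__(self, peers):
        self.senders = []             # PEs believed overloaded
        self.ok = []                  # PEs believed acceptably loaded
        self.receivers = list(peers)  # everyone starts as a presumed receiver

    def update(self, pe, status):
        """Move `pe` to the list matching its freshly polled status."""
        for lst in (self.senders, self.ok, self.receivers):
            if pe in lst:
                lst.remove(pe)
        {"sender": self.senders,
         "ok": self.ok,
         "receiver": self.receivers}[status].append(pe)
```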
LDAs: Adaptive

Transfer Policy
– Based on the PE's CPU queue length
– Low threshold (LT) and high threshold (HT)
Selection Policy
– Sender-initiated: only sends new tasks
– Receiver-initiated: takes any task
  Trying for low cost
Information Policy
– Demand-driven; maintains the lists
LDAs: Adaptive

Location Policy
– Receiver-initiated
  Order of polling (sketched below)
  – Senders list: head to tail (newest information first)
  – OK list: tail to head (most out-of-date first)
  – Receivers list: tail to head
  When a PE becomes a receiver (QL < LT)
  – It starts polling
    If it finds a sender, a transfer happens
    Otherwise, it uses the replies to update its lists
  – It continues until
    it finds a sender,
    it is no longer a receiver, or
    it hits the poll limit
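Continuing the earlier `AdaptiveLists` sketch, the polling order could be expressed as:

```python
def polling_order(lists):
    """Senders newest-first, then OK and receiver lists oldest-first."""
    return (list(lists.senders)               # head to tail: fresh info first
            + list(reversed(lists.ok))        # tail to head: stale info first
            + list(reversed(lists.receivers)))
```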
LDAs: Adaptive Notes

– At high loads, activity is sender-initiated, but a sender will soon have an empty receiver list, so there is no polling
  The system then shifts to receiver-initiated activity
– At low loads, receiver-initiated polling tends to fail
  But overhead doesn't matter at low load
  And the lists get updated along the way
  So sender-initiated transfers should work quickly
Load Scheduling Algorithms (Galli)
Usage Points
– Charged for using remote PEs and resources
Graph Theory
– Minimum cutset of the assignment graph
– Maximum flow of the graph
Probes
– Messages sent to locate available, appropriate PEs
Scheduling Queues
Stochastic Learning
Figure 7.3 Usage Points. (Galli, p. 158)
Figure 7.4 Economic Usage Points. (Galli, p. 159)
Figure 7.5 Two-Processor Min-Cut Example. (Galli, p. 161)
Figure 7.6 A Station with Run Queues and Hints. (Galli, p. 164)
CPU Queue Length as Metric

PE queue length correlates well with response time
– Easy to measure
– Caution (sketched below):
  When accepting a new migrating process, increment the queue length right away
  A time-out may be needed in case the process never arrives
PE queue length does not correlate well with PE utilization
– A daemon to monitor PE utilization adds overhead
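A sketch of the increment-right-away caution; the class name and the 5-second timeout are assumptions:

```python
import threading

class QueueLengthMetric:
    """Count an incoming migrated process immediately; undo on a timeout."""
    def __init__(self):
        self.queue_length = 0
        self.lock = threading.Lock()

    def reserve_incoming(self, timeout_s=5.0):
        """Increment QL right away so other senders see the pending load."""
        with self.lock:
            self.queue_length += 1
        timer = threading.Timer(timeout_s, self._never_arrived)
        timer.start()
        return timer  # cancel this timer when the process actually arrives

    def _never_arrived(self):
        with self.lock:
            self.queue_length -= 1  # undo the reservation
```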
Election Algorithms
Bully algorithm (Garcia-Molina, 1982)
A ring election algorithm
Bully Algorithm
Each processor has a unique number
One processor notices that the leader/server is missing
– It sends messages to all other processors
– It requests to be appointed leader
– It includes its own processor number
Processors with higher (in some conventions, lower) processor numbers can bully the first processor
Figure 7.7 The Bully Algorithm. (Galli, p. 169)
Bully Algorithm, continued
The initiating processor need only send election messages to higher- (or lower-) numbered processors
Any processors that respond effectively tell the initiator that they overrule it and that it is out of the running
These processors then start sending election messages to the other top processors (sketched below)
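A sketch of one election round under the higher-number-wins convention used in this lecture's example; `send` and `alive_reply` are stand-ins for real message passing:

```python
def start_election(my_id, all_ids, send, alive_reply):
    """Challenge every higher-numbered processor; if none answers, win."""
    higher = [p for p in all_ids if p > my_id]
    answered = False
    for p in higher:
        send(p, ("ELECTION", my_id))
        if alive_reply(p):  # a higher node bullies us and runs its own election
            answered = True
    if not answered:        # nobody higher is alive: announce leadership
        for p in all_ids:
            if p != my_id:
                send(p, ("COORDINATOR", my_id))
        return my_id
    return None             # wait for the eventual COORDINATOR message
```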
Bully Example

[Diagram: processors 0-5; leader 5 has crashed]
2 calls an election
3 and 4 respond
Bully Example, continued

3 calls an election
4 calls an election
Bully Example, concluded

4 responds to 3
4 is the new leader
A Ring Election Algorithm
No token
Each processor knows its successor
When a processor notices the leader is down, it sends an election message to its successor
If the successor is down, it sends to the next processor around the ring
Each sender adds its own number to the message
Ring Election Algorithm, cont.
The first processor eventually receives back the election message containing its own number
The election message is changed to a coordinator message and resent around the ring
The highest processor number in the message becomes the new leader
When the first processor receives the coordinator message back, the message is deleted
(Both message types are sketched below)
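A sketch of one node's message handling; `succ(i)` is assumed to return the next live processor, and the message encoding is an assumption:

```python
def on_message(my_id, msg, send, succ):
    """Handle an ELECTION or COORDINATOR message arriving at this node."""
    kind, payload = msg
    if kind == "ELECTION":
        if my_id in payload:             # our own message came back around
            leader = max(payload)        # highest number becomes leader
            send(succ(my_id), ("COORDINATOR", (my_id, leader)))
        else:
            send(succ(my_id), ("ELECTION", payload + [my_id]))
    else:  # COORDINATOR: forward until it returns to its originator
        origin, leader = payload
        if origin != my_id:              # delete it once it arrives back home
            send(succ(my_id), ("COORDINATOR", payload))
```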
Ring Election Example

[Diagram: a ring of processors 0-7; leader 7 is down. Processor 3 notices and starts the election.]
The election message grows as it circulates, skipping the failed leader:
3 → 3,4 → 3,4,5 → 3,4,5,6 → 3,4,5,6,0 → 3,4,5,6,0,1 → 3,4,5,6,0,1,2
When the message returns to processor 3, the highest number in it (6) becomes the new leader.
Orphan Processes
A child process that is still active after its parent process has terminated prematurely
Can happen with remote procedure calls
Wastes resources
Can corrupt shared data
Can create more processes
Three solutions follow
Orphan Cleanup
A process must clean up after itself after a crash
– Requires each parent to keep a list of its children
– The parent thus has access to the family tree
– The list must be kept in nonvolatile storage
– On restart, each family-tree member is told of the parent process's death and halts execution (sketched below)
Disadvantage: parent overhead
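A sketch of the persisted family tree this scheme needs; the file format and helper names are assumptions:

```python
import json

def record_child(tree_path, parent, child):
    """Persist a parent->child edge in nonvolatile storage (a JSON file)."""
    try:
        with open(tree_path) as f:
            tree = json.load(f)
    except FileNotFoundError:
        tree = {}
    tree.setdefault(str(parent), []).append(str(child))
    with open(tree_path, "w") as f:
        json.dump(tree, f)

def members_to_halt(tree, dead_parent):
    """Every descendant that must be told to halt when the parent dies."""
    out = []
    for child in tree.get(str(dead_parent), []):
        out.append(child)
        out.extend(members_to_halt(tree, child))
    return out
```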
Figure 7.8 Orphan Cleanup Family Trees. (Galli, p. 170)
Child Process Allowance
All child processes receive a finite time allowance
If no time is left, the child must request more time from its parent
If the parent has terminated prematurely, the child's request goes unanswered
With no time allowance left, the child process dies (sketched below)
Requires more communication
Slows execution of child processes
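A sketch of the allowance loop from the child's side; `ask_parent_for_time` stands in for a real request message to the parent:

```python
import time

def run_with_allowance(work_step, ask_parent_for_time, allowance_s=10.0):
    """Do work until done, renewing the allowance; die if no renewal comes."""
    deadline = time.monotonic() + allowance_s
    while True:
        if time.monotonic() >= deadline:
            extra = ask_parent_for_time()  # returns None if the parent is gone
            if extra is None:
                return "died"              # orphan: allowance expired
            deadline = time.monotonic() + extra
        if work_step():                    # returns True when the work is done
            return "done"
```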
Figure 7.9 Child Process Allowance. (Galli, p. 172)
Process Version Numbers
Each process must keep track of a version number for its parent
After a system crash, the entire distributed system is assigned a new version number
A child is forced to terminate if its version number is out of date (sketched below)
The child may try to find its parent
– It terminates if unsuccessful
Requires a lot of communication
Figure 7.10 Process Version Numbers. (Galli, p. 174)
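A sketch of the termination check; `find_parent` stands in for the (communication-heavy) parent lookup:

```python
def child_should_terminate(parent_version, system_version, find_parent):
    """Terminate a child whose recorded parent version is out of date,
    unless it can still locate its parent under the new version."""
    if parent_version == system_version:
        return False        # up to date: keep running
    parent = find_parent()  # may require a lot of communication
    return parent is None   # terminate if the parent cannot be found
```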