Distributed Operating Systems, CS551
Colorado State University
at Lockheed-Martin
Lecture 6 -- Spring 2001
CS551: Lecture 6

Topics
– Distributed Process Management (Chapter 7)
  Distributed Scheduling
  Algorithm Choices
  Scheduling Algorithm Approaches
  Coordinator Elections
  Orphan Processes
– Distributed File Systems (Chapter 8)
  Distributed Name Service
  Distributed File Service
  Distributed Directory Service
Distributed Deadlock Prevention
Assign each process a global timestamp when it starts
No two processes should have the same timestamp
Basic idea: “When one process is about to block waiting for a resource that another process is using, a check is made to see which has a larger timestamp (i.e. is younger).” Tanenbaum, DOS (1995)
Distributed Deadlock Prevention
Put a timestamp on each process, representing its creation time
Suppose a process needs a resource already owned by another process
Determine the relative ages of the two processes
Decide whether the waiting process should wait or die, or should preempt (wound) the owning process
Two different algorithms: wait-die and wound-wait
Distributed Deadlock Prevention
Allow wait only if the waiting process is older
– Since timestamps increase in any chain of waiting processes, cycles are impossible
Or allow wait only if the waiting process is younger
– Here timestamps decrease in any chain of waiting processes, so cycles are again impossible
Wiser to give older processes priority
Example: wait-die algorithm

[Diagram: processes 54 (older) and 79 (younger)]
Process 54 wants a resource held by process 79: 54 waits
Process 79 wants a resource held by process 54: 79 dies
Example: wound-wait algorithm

[Diagram: processes 54 (older) and 79 (younger)]
Process 54 wants a resource held by process 79: 54 preempts 79
Process 79 wants a resource held by process 54: 79 waits
Algorithm Comparison
Wait-die kills the young process
– When the young process restarts and requests the resource again, it is killed once more
– The less efficient of these two algorithms
Wound-wait preempts the young process
– When the young process re-requests the resource, it has to wait for the older process to finish
– The better of the two algorithms
(Both decision rules are sketched in code below.)
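A minimal sketch of the two decision rules, assuming a smaller timestamp means an older process; the function and variable names are illustrative, not from Galli or Tanenbaum:

```python
# Sketch: wait-die vs. wound-wait, assuming smaller timestamp = older process.

def wait_die(requester_ts, owner_ts):
    """Older requesters wait; younger requesters die (abort and retry later)."""
    return "wait" if requester_ts < owner_ts else "die"

def wound_wait(requester_ts, owner_ts):
    """Older requesters wound (preempt) the owner; younger requesters wait."""
    return "wound" if requester_ts < owner_ts else "wait"

# The slides' example, where process 54 is older than process 79:
assert wait_die(54, 79) == "wait"     # 54 wants 79's resource: 54 waits
assert wait_die(79, 54) == "die"      # 79 wants 54's resource: 79 dies
assert wound_wait(54, 79) == "wound"  # 54 preempts 79
assert wound_wait(79, 54) == "wait"   # 79 waits
```

In both rules the fixed timestamp ordering is what makes cycles in the chain of waiting processes impossible.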
Process Management in a Distributed Environment

Processes in a Uniprocessor
Processes in a Multiprocessor
Processes in a Distributed System
– Why scheduling is needed
– Scheduling priorities
– How to schedule
– Scheduling algorithms
Distributed Scheduling
Basically resource management
Want to distribute the processing load among the processing elements (PEs) in order to maximize performance
Consider several homogeneous processing elements on a LAN with equal average workloads
– The workload may still not be evenly distributed
– Some PEs may have idle cycles
Efficiency Metrics

Communication cost
– Low if very little or no communication is required
– Low if all communicating processes are on the same PE or not distant (a small number of hops)
Execution cost
– Relative speed of the PE
– Relative location of needed resources
– Type of operating system, machine code, architecture
Efficiency Metrics, continued
Resource Utilization
– May be based upon
  Current PE loads
  Load state/status
  Resource queue lengths
  Memory usage
  Other resource availability
Level of Scheduling
When should a process run locally, and when should it be sent to an idle PE?
Local Scheduling
– Allocate the process to the local PE
– Review Galli, Chapter 2, for more information
Global Scheduling
– Choose which PE executes which process
– Also called process allocation
– Precedes the local scheduling decision
Figure 7.1 Scheduling Decision Chart. (Galli, p. 152)
Distribution Goals
Load Balancing
– Tries to maintain an equal load throughout the system
Load Sharing
– Simpler
– Tries to prevent any PE from becoming too busy
Load Balancing / Load Sharing
Load Balancing
– Try to equalize loads at the PEs
– Requires more information
– More overhead
Load Sharing
– Avoid having an idle PE if there is work to do
Anticipating Transfers
– Avoid a PE idling while a task is on its way
– Get a new task just before the PE becomes idle
Figure 7.2 Load Distribution Goals. (Galli, p. 153)
Processor Allocation Algorithms
Assume virtually identical PEs
Assume PEs are fully interconnected
Assume processes may spawn children
Two strategies
– Non-migratory: static binding, non-preemptive
– Migratory: dynamic binding, preemptive
Processor Allocation Strategies
Non-migratory (static binding, non-preemptive)
– Transfer occurs before the process starts execution
– Once assigned to a machine, the process stays there
Migratory (dynamic binding, preemptive)
– Processes may move after execution begins
– Better load balancing
– Expensive: must collect and move the entire process state
– More complex algorithms
Efficiency Goals
Optimal
– Completion time
– Resource utilization
– System throughput
– Any combination thereof
Suboptimal
– Suboptimal approximate
– Suboptimal heuristic
Optimal Scheduling Algorithms
Require the state of all competing processes
The scheduler must have access to all related information
Optimization is a hard problem
– Usually NP-hard for multiple processors
Thus, consider
– Suboptimal approximate solutions
– Suboptimal heuristic solutions
Suboptimal Approximate Solutions

Similar to optimal scheduling algorithms
Try to find good solutions, not perfect solutions
Searches are limited
Include intelligent shortcuts
Suboptimal Heuristic Solutions

Heuristics
– Employ rules of thumb
– Employ intuition
– May not be provable
Generally considered to work in an acceptable manner
Examples:
– If a PE has a heavy load, don't give it more to do
– Exploit locality of reference for related processes and data
Figure 7.1 Scheduling Decision Chart. (Galli, p. 152)
Types of Load Distribution Algorithms

Static
– Decisions are hard-wired in
Dynamic
– Use system state information to make decisions
– Overhead of keeping track of that information
Adaptive
– A type of dynamic algorithm
– May behave differently at different load levels
Load Distribution Algorithm Issues
Transfer Policy
Selection Policy
Location Policy
Information Policy
Stability
Sender-initiated versus receiver-initiated
Symmetrically-initiated
Adaptive algorithms
Load Dist. Algs. Issues, cont.
Transfer Policy
– When is it appropriate to move a task?
– If the load at the sending PE > threshold
– If the load at the receiving PE < threshold
Location Policy
– Find a receiver PE
– Methods:
  Broadcast messages
  Polling: random, neighbors, recent candidates
Load Dist. Algs. Issues, cont.
Selection Policy
– Which task should migrate?
– Simple: select new tasks (non-preemptive)
– Criteria
  Cost of transfer: should be covered by the reduction in response time
  Size of the task
  Number of location-dependent system calls (these favor keeping the task on the local PE)
Load Dist. Algs. Issues, cont.

Information Policy
– What information should be collected? When? From whom? By whom?
– Demand-driven
  Get information when a PE becomes a sender or a receiver
  Sender-initiated: senders look for receivers
  Receiver-initiated: receivers look for senders
  Symmetrically-initiated: either of the above
– Periodic: at fixed time intervals; not adaptive
– State-change-driven
  Nodes send information about their own state (rather than waiting to be asked)
Load Dist. Algs. Issues, cont.

Stability
– Queuing-theoretic view
  Stable: sum of (arrival load + overhead) < capacity
  Effective: using the algorithm gives better performance than doing no load distribution
  An effective algorithm cannot be unstable
  A stable algorithm can still be ineffective (its overhead outweighs its benefit)
– Algorithmic view
  E.g., performing overhead operations but making no forward progress
  E.g., moving a task from PE to PE, only to learn that the move increases the destination PE's workload enough that the task needs to be transferred again
Load Dist Algs: Sender-Initiated

The sender PE thinks it is overloaded
Transfer Policy (sketched below)
– Threshold (T) based on the PE's CPU queue length (QL)
  Sender: QL > T
  Receiver: QL < T
Selection Policy
– Non-preemptive
  Allows only new tasks to be transferred
  Long-lived tasks make this policy worthwhile
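A minimal sketch of this threshold transfer policy; the threshold value and all names are assumptions, not from Galli:

```python
# Illustrative threshold transfer policy for sender-initiated distribution.
T = 5  # hypothetical CPU-queue-length threshold

def role(queue_length, threshold=T):
    """Classify a PE by comparing its CPU queue length (QL) to T."""
    if queue_length > threshold:
        return "sender"    # overloaded: try to ship a new task elsewhere
    if queue_length < threshold:
        return "receiver"  # underloaded: willing to accept work
    return "neutral"
```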
Load Dist Algs: Sender-Initiated
Location Policy (3 different policies)
– Random
  Select a receiver at random
  – Wasted effort if the destination is already loaded
  Want to avoid transferring the same task from PE to PE to PE
  – Include a limit on the number of transfers
– Threshold (sketched below)
  Start polling PEs at random
  – If a 'receiver' is found, send the task to it
  – Limit the search to a poll limit
    If the limit is hit, keep the task on the current PE
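A sketch of the Threshold location policy under stated assumptions: `peers` is the set of other PEs and `is_receiver` stands in for a real poll message; both names and the poll limit are illustrative:

```python
import random

POLL_LIMIT = 5  # hypothetical poll limit

def find_receiver(peers, is_receiver, poll_limit=POLL_LIMIT):
    """Poll randomly chosen PEs; return the first that reports itself a
    receiver, or None (keep the task locally) once the poll limit is hit."""
    for pe in random.sample(peers, min(poll_limit, len(peers))):
        if is_receiver(pe):  # stand-in for a QL < T poll reply
            return pe
    return None
```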
LDAs: Sender-Initiated
Location Policy (3 different policies, cont.)
– Shortest
  Poll a random set of PEs
  – Choose the PE with the shortest queue length
  Only a little better than the Threshold location policy
  – Not worth the additional work
LDAs: Sender-Initiated
Information Policy
– Demand-driven
  Collected after a PE identifies itself as a sender
Stability
– At high load, a PE might not find a receiver
– The polling effort will be wasted
– Polling increases the load on the system
  Could lead to instability
LDAs: Receiver-Initiated
The receiver is trying to find work
Transfer Policy
– If local QL < T, try to find a sender
Selection Policy
– Non-preemptive preferred, but there may not be any new tasks to take
– Worth the effort
LDAs: Receiver-Initiated
Location Policy (sketched below)
– Select a PE at random
– If taking a task would not move that PE's load below the threshold, take it
– If no luck after polling the poll-limit number of PEs,
  Wait until another task completes, or
  Wait another time period
Information Policy
– Demand-driven
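A sketch of this receiver-side polling rule; `queue_len_of` stands in for a real poll message, and every name is an assumption:

```python
import random

def acquire_task(peers, queue_len_of, T, poll_limit=5):
    """Poll random PEs; take a task only if giving one up still leaves the
    polled PE at or above the threshold (i.e., it remains a sender)."""
    for pe in random.sample(peers, min(poll_limit, len(peers))):
        if queue_len_of(pe) - 1 >= T:  # still loaded after losing one task
            return pe                  # transfer one task from this PE
    return None  # no luck: wait for a local completion or another period
```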
LDAs: Receiver-Initiated
Stability
– Tends to be stable
  At high load, a sender should be found quickly
Problem
– Transfers tend to be preemptive
  Tasks on the sender node have already started execution
LDAs: Symmetrically-Initiated
Both senders and receivers can search for transfer partners
Has both the advantages and the disadvantages of the two previous methods
Above-average algorithm
– Try to keep the load at each PE at an acceptable level
– Aiming for the exact average can cause thrashing
LDAs: Symmetrically-Initiated
Transfer Policy (sketched below)
– Each PE
  Estimates the average load
  Sets both an upper and a lower threshold, equidistant from its estimate
  If load > upper threshold, the PE acts as a sender
  If load < lower threshold, the PE acts as a receiver
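A sketch of this two-threshold classification; the distance `delta` is an assumed parameter:

```python
def classify(load, avg_estimate, delta=1.0):
    """Compare the load to thresholds set equidistant from the estimate."""
    if load > avg_estimate + delta:
        return "sender"
    if load < avg_estimate - delta:
        return "receiver"
    return "acceptable"  # between the thresholds: leave the PE alone
```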
LDAs: Symmetrically-Initiated

Location Policy
– Sender-initiated (sketched below)
  The sender broadcasts a TooHigh message and sets a timeout
  A receiver sends an Accept message, clears its timeout, increases its load value, and sets a timeout
  If the sender still wants to send when the Accept message arrives, it sends the task
  If the sender gets a TooLow message before an Accept, it sends the task
  If the sender's TooHigh timeout expires with no Accept
  – Its average estimate is too low
  – It broadcasts a ChangeAvg message to all PEs
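One way to sketch the sender's side of this exchange; the message names follow the slides, while the transport helpers (`broadcast`, `wait_reply`, `send_task`) are stand-ins for real messaging and timers:

```python
def sender_round(broadcast, wait_reply, send_task, timeout=1.0):
    """Broadcast TooHigh, then react to the first useful reply or a timeout."""
    broadcast(("TooHigh",))
    reply = wait_reply(timeout)       # returns (kind, pe) or None on timeout
    if reply is None:                 # no Accept before the timeout:
        broadcast(("ChangeAvg",))     # average estimate was too low; raise it
        return
    kind, pe = reply
    if kind in ("Accept", "TooLow"):  # a willing receiver exists
        send_task(pe)
```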
LDAs: Symmetrically-Initiated
Location Policy
– Receiver-initiated
  The receiver sends a TooLow message and sets a timeout
  The rest is the converse of the sender-initiated algorithm
Selection Policy
– Use a reasonable policy
  Non-preemptive, if possible
  Low cost
LDAs: Symmetrically-Initiated
Information Policy
– Demand-driven
– Determined at each PE
– Low overhead
LDAs: Adaptive
Stable symmetrically-initiated algorithm
– The previous instability was due to too much polling by the sender
– Each PE keeps lists of the other PEs, sorted into three categories (sketched below)
  Sender (overloaded)
  Receiver (underloaded)
  OK
– At the start, each PE has all other PEs on its receiver list
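A sketch of that per-PE bookkeeping; the class and method names are illustrative:

```python
class AdaptiveLists:
    """One PE's view of its peers, per the stable symmetric algorithm."""
    def __init__(self, peers):
        self.senders = []             # PEs believed overloaded
        self.ok = []                  # PEs believed acceptably loaded
        self.receivers = list(peers)  # everyone starts as a presumed receiver

    def update(self, pe, status):
        """Move `pe` to the list matching its freshly polled status."""
        for lst in (self.senders, self.ok, self.receivers):
            if pe in lst:
                lst.remove(pe)
        {"sender": self.senders,
         "ok": self.ok,
         "receiver": self.receivers}[status].append(pe)
```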
LDAs: Adaptive

Transfer Policy
– Based on the PE's CPU queue length
– Low threshold (LT) and high threshold (HT)
Selection Policy
– Sender-initiated: only sends new tasks
– Receiver-initiated: takes any task
  Trying for low cost
Information Policy
– Demand-driven; maintains the lists
LDAs: Adaptive

Location Policy
– Receiver-initiated
  Order of polling (sketched below)
  – Senders list: head to tail (newest information first)
  – OK list: tail to head (most out-of-date first)
  – Receivers list: tail to head
  When a PE becomes a receiver (QL < LT)
  – It starts polling
    If it finds a sender, a transfer happens
    Otherwise, it uses the replies to update its lists
  – It continues until
    it finds a sender,
    it is no longer a receiver, or
    it hits the poll limit
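Continuing the earlier `AdaptiveLists` sketch, the polling order could be expressed as:

```python
def polling_order(lists):
    """Senders newest-first, then OK and receiver lists oldest-first."""
    return (list(lists.senders)               # head to tail: fresh info first
            + list(reversed(lists.ok))        # tail to head: stale info first
            + list(reversed(lists.receivers)))
```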
LDAs: Adaptive Notes

– At high loads, activity is sender-initiated, but a sender will soon have an empty receiver list, so there is no polling
  The system then shifts to receiver-initiated activity
– At low loads, receiver-initiated polling tends to fail
  But overhead doesn't matter at low load
  And the lists get updated along the way
  So sender-initiated transfers should work quickly
Load Scheduling Algorithms (Galli)
Usage Points
– Charged for using remote PEs and resources
Graph Theory
– Minimum cutset of the assignment graph
– Maximum flow of the graph
Probes
– Messages sent to locate available, appropriate PEs
Scheduling Queues
Stochastic Learning
Figure 7.3 Usage Points. (Galli, p. 158)
Figure 7.4 Economic Usage Points. (Galli, p. 159)
Figure 7.5 Two-Processor Min-Cut Example. (Galli, p. 161)
Figure 7.6 A Station with Run Queues and Hints. (Galli, p. 164)
CPU Queue Length as Metric

PE queue length correlates well with response time
– Easy to measure
– Caution (sketched below):
  When accepting a new migrating process, increment the queue length right away
  A time-out may be needed in case the process never arrives
PE queue length does not correlate well with PE utilization
– A daemon to monitor PE utilization adds overhead
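A sketch of the increment-right-away caution; the class name and the 5-second timeout are assumptions:

```python
import threading

class QueueLengthMetric:
    """Count an incoming migrated process immediately; undo on a timeout."""
    def __init__(self):
        self.queue_length = 0
        self.lock = threading.Lock()

    def reserve_incoming(self, timeout_s=5.0):
        """Increment QL right away so other senders see the pending load."""
        with self.lock:
            self.queue_length += 1
        timer = threading.Timer(timeout_s, self._never_arrived)
        timer.start()
        return timer  # cancel this timer when the process actually arrives

    def _never_arrived(self):
        with self.lock:
            self.queue_length -= 1  # undo the reservation
```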
Election Algorithms
Bully algorithm (Garcia-Molina, 1982)
A ring election algorithm
Bully Algorithm
Each processor has a unique number
One processor notices that the leader/server is missing
– It sends messages to all other processors
– It requests to be appointed leader
– It includes its own processor number
Processors with higher (in some conventions, lower) processor numbers can bully the first processor
Figure 7.7 The Bully Algorithm. (Galli, p. 169)
Bully Algorithm, continued
The initiating processor need only send election messages to higher- (or lower-) numbered processors
Any processors that respond effectively tell the initiator that they overrule it and that it is out of the running
These processors then start sending election messages to the other top processors (sketched below)
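A sketch of one election round under the higher-number-wins convention used in this lecture's example; `send` and `alive_reply` are stand-ins for real message passing:

```python
def start_election(my_id, all_ids, send, alive_reply):
    """Challenge every higher-numbered processor; if none answers, win."""
    higher = [p for p in all_ids if p > my_id]
    answered = False
    for p in higher:
        send(p, ("ELECTION", my_id))
        if alive_reply(p):  # a higher node bullies us and runs its own election
            answered = True
    if not answered:        # nobody higher is alive: announce leadership
        for p in all_ids:
            if p != my_id:
                send(p, ("COORDINATOR", my_id))
        return my_id
    return None             # wait for the eventual COORDINATOR message
```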
Bully Example

[Diagram: processors 0-5; leader 5 has crashed]
2 calls an election
3 and 4 respond
Bully Example, continued

3 calls an election
4 calls an election
Bully Example, concluded

4 responds to 3
4 is the new leader
A Ring Election Algorithm
No token
Each processor knows its successor
When a processor notices the leader is down, it sends an election message to its successor
If the successor is down, it sends to the next processor around the ring
Each sender adds its own number to the message
Ring Election Algorithm, cont.
The first processor eventually receives back the election message containing its own number
The election message is changed to a coordinator message and resent around the ring
The highest processor number in the message becomes the new leader
When the first processor receives the coordinator message back, the message is deleted
(Both message types are sketched below)
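A sketch of one node's message handling; `succ(i)` is assumed to return the next live processor, and the message encoding is an assumption:

```python
def on_message(my_id, msg, send, succ):
    """Handle an ELECTION or COORDINATOR message arriving at this node."""
    kind, payload = msg
    if kind == "ELECTION":
        if my_id in payload:             # our own message came back around
            leader = max(payload)        # highest number becomes leader
            send(succ(my_id), ("COORDINATOR", (my_id, leader)))
        else:
            send(succ(my_id), ("ELECTION", payload + [my_id]))
    else:  # COORDINATOR: forward until it returns to its originator
        origin, leader = payload
        if origin != my_id:              # delete it once it arrives back home
            send(succ(my_id), ("COORDINATOR", payload))
```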
Ring Election Example

[Diagram: a ring of processors 0-7; leader 7 is down. Processor 3 notices and starts the election.]
The election message grows as it circulates, skipping the failed leader:
3 → 3,4 → 3,4,5 → 3,4,5,6 → 3,4,5,6,0 → 3,4,5,6,0,1 → 3,4,5,6,0,1,2
When the message returns to processor 3, the highest number in it (6) becomes the new leader.
Orphan Processes
A child process that is still active after its parent process has terminated prematurely
Can happen with remote procedure calls
Wastes resources
Can corrupt shared data
Can create more processes
Three solutions follow
Orphan Cleanup
A process must clean up after itself after a crash
– Requires each parent to keep a list of its children
– The parent thus has access to the family tree
– The list must be kept in nonvolatile storage
– On restart, each family-tree member is told of the parent process's death and halts execution (sketched below)
Disadvantage: parent overhead
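A sketch of the persisted family tree this scheme needs; the file format and helper names are assumptions:

```python
import json

def record_child(tree_path, parent, child):
    """Persist a parent->child edge in nonvolatile storage (a JSON file)."""
    try:
        with open(tree_path) as f:
            tree = json.load(f)
    except FileNotFoundError:
        tree = {}
    tree.setdefault(str(parent), []).append(str(child))
    with open(tree_path, "w") as f:
        json.dump(tree, f)

def members_to_halt(tree, dead_parent):
    """Every descendant that must be told to halt when the parent dies."""
    out = []
    for child in tree.get(str(dead_parent), []):
        out.append(child)
        out.extend(members_to_halt(tree, child))
    return out
```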
Figure 7.8 Orphan Cleanup Family Trees. (Galli, p. 170)
Child Process Allowance
All child processes receive a finite time allowance
If no time is left, the child must request more time from its parent
If the parent has terminated prematurely, the child's request goes unanswered
With no time allowance left, the child process dies (sketched below)
Requires more communication
Slows execution of child processes
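A sketch of the allowance loop from the child's side; `ask_parent_for_time` stands in for a real request message to the parent:

```python
import time

def run_with_allowance(work_step, ask_parent_for_time, allowance_s=10.0):
    """Do work until done, renewing the allowance; die if no renewal comes."""
    deadline = time.monotonic() + allowance_s
    while True:
        if time.monotonic() >= deadline:
            extra = ask_parent_for_time()  # returns None if the parent is gone
            if extra is None:
                return "died"              # orphan: allowance expired
            deadline = time.monotonic() + extra
        if work_step():                    # returns True when the work is done
            return "done"
```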
Figure 7.9 Child Process Allowance. (Galli, p. 172)
Process Version Numbers
Each process must keep track of a version number for its parent
After a system crash, the entire distributed system is assigned a new version number
A child is forced to terminate if its version number is out of date (sketched below)
The child may try to find its parent
– It terminates if unsuccessful
Requires a lot of communication
Figure 7.10 Process Version Numbers. (Galli, p. 174)
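A sketch of the termination check; `find_parent` stands in for the (communication-heavy) parent lookup:

```python
def child_should_terminate(parent_version, system_version, find_parent):
    """Terminate a child whose recorded parent version is out of date,
    unless it can still locate its parent under the new version."""
    if parent_version == system_version:
        return False        # up to date: keep running
    parent = find_parent()  # may require a lot of communication
    return parent is None   # terminate if the parent cannot be found
```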