Page 1: Distributed Operating Systems CS551

Distributed Operating Systems CS551

Colorado State University

at Lockheed-Martin

Lecture 6 -- Spring 2001

Page 2: Distributed Operating Systems CS551


CS551: Lecture 6

Topics
– Distributed Process Management (Chapter 7)
• Distributed Scheduling
• Algorithm Choices
• Scheduling Algorithm Approaches
• Coordinator Elections
• Orphan Processes
– Distributed File Systems (Chapter 8)
• Distributed Name Service
• Distributed File Service
• Distributed Directory Service

Page 3: Distributed Operating Systems CS551


Distributed Deadlock Prevention

Assign each process a global timestamp when it starts

No two processes should have same timestamp

Basic idea: “When one process is about to block waiting for a resource that another process is using, a check is made to see which has a larger timestamp (i.e. is younger).” Tanenbaum, DOS (1995)

Page 4: Distributed Operating Systems CS551


Distributed Deadlock Prevention

Somehow put timestamps on each process, representing creation time of process

Suppose a process needs a resource already owned by another process

Determine relative ages of both processes
Decide if the waiting process should Preempt, Wait, Die, or Wound the owning process
Two different algorithms

Page 5: Distributed Operating Systems CS551


Distributed Deadlock Prevention

Allow wait only if the waiting process is older
– Since timestamps increase in any chain of waiting processes, cycles are impossible
Or allow wait only if the waiting process is younger
– Here timestamps decrease in any chain of waiting processes, so cycles are again impossible
Wiser to give older processes priority
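
The two rules fit in a few lines of code. Below is a minimal Python sketch of both (function names are illustrative, not from Galli or Tanenbaum); a smaller timestamp means an older process, and the checks mirror the examples on the next two slides.

```python
# A smaller timestamp means an older process (earlier creation time).
def wait_die(requester_ts, holder_ts):
    """Wait-die: older requesters wait; younger requesters die
    (and restart later with their original timestamp)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Wound-wait: older requesters wound (preempt) the holder;
    younger requesters wait."""
    return "wound" if requester_ts < holder_ts else "wait"

# The scenarios from the next two slides: timestamps 54 (older)
# and 79 (younger) contending for a resource held by the other.
assert wait_die(54, 79) == "wait"     # old wants what young holds
assert wait_die(79, 54) == "die"      # young wants what old holds
assert wound_wait(54, 79) == "wound"  # old preempts young
assert wound_wait(79, 54) == "wait"   # young waits for old
```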

Page 6: Distributed Operating Systems CS551


Example: wait-die algorithm

[Diagram: process 54 (older) wants a resource held by process 79 (younger); 54 waits. Process 79 (younger) wants a resource held by process 54 (older); 79 dies.]

Page 7: Distributed Operating Systems CS551


Example: wound-wait algorithm

[Diagram: process 54 (older) wants a resource held by process 79 (younger); 54 preempts 79. Process 79 (younger) wants a resource held by process 54 (older); 79 waits.]

Page 8: Distributed Operating Systems CS551


Algorithm Comparison

Wait-die kills the young process
– When the young process restarts and requests the resource again, it is killed once more
– The less efficient of the two algorithms
Wound-wait preempts the young process
– When the young process re-requests the resource, it waits for the older process to finish
– The better of the two algorithms

Page 9: Distributed Operating Systems CS551


Figure 7.7 The Bully Algorithm. (Galli, p. 169)

Page 10: Distributed Operating Systems CS551


Process Management in a Distributed Environment
Processes in a Uniprocessor
Processes in a Multiprocessor
Processes in a Distributed System
– Why we need to schedule
– Scheduling priorities
– How to schedule
– Scheduling algorithms

Page 11: Distributed Operating Systems CS551


Distributed Scheduling

Basically resource management
Want to distribute the processing load among the processing elements in order to maximize performance
Consider several homogeneous processing elements on a LAN with equal average workloads
– Workload may still not be evenly distributed
– Some PEs may have idle cycles

Page 12: Distributed Operating Systems CS551


Efficiency Metrics

Communication cost
– Low if very little or no communication is required
– Low if all communicating processes are on the same PE, or at least not distant (small number of hops)

Execution cost
– Relative speed of the PE
– Relative location of needed resources
– Type of operating system, machine code, architecture

Page 13: Distributed Operating Systems CS551


Efficiency Metrics, continued

Resource Utilization
– May be based upon:
• Current PE loads
• Load status
• Resource queue lengths
• Memory usage
• Other resource availability

Page 14: Distributed Operating Systems CS551


Level of Scheduling

When should a process run locally, and when should it be sent to an idle PE?

Local Scheduling
– Allocate the process to the local PE
– Review Galli, Chapter 2, for more information

Global Scheduling
– Choose which PE executes which process
– Also called process allocation
– Precedes the local scheduling decision

Page 15: Distributed Operating Systems CS551


Figure 7.1 Scheduling Decision Chart. (Galli, p. 152)

Page 16: Distributed Operating Systems CS551


Distribution Goals

Load Balancing
– Tries to maintain an equal load throughout the system

Load Sharing
– Simpler
– Tries to prevent any PE from becoming too busy

Page 17: Distributed Operating Systems CS551


Load Balancing / Load Sharing

Load Balancing
– Try to equalize loads at PEs
– Requires more information
– More overhead

Load Sharing
– Avoid having an idle PE if there is work to do

Anticipating Transfers
– Avoid PE idle wait while a task is coming
– Get a new task just before the PE becomes idle

Page 18: Distributed Operating Systems CS551


Figure 7.2 Load Distribution Goals. (Galli, p. 153)

Page 19: Distributed Operating Systems CS551


Processor Allocation Algorithms

Assume virtually identical PEs
Assume PEs are fully interconnected
Assume processes may spawn children
Two strategies
– Non-migratory: static binding, non-preemptive
– Migratory: dynamic binding, preemptive

Page 20: Distributed Operating Systems CS551


Processor Allocation Strategies

Non-migratory (static binding, non-preemptive)
– Transfer occurs before the process starts execution
– Once assigned to a machine, the process stays there

Migratory (dynamic binding, preemptive)
– Processes may move after execution begins
– Better load balancing
– Expensive: must collect and move the entire process state
– More complex algorithms

Page 21: Distributed Operating Systems CS551


Efficiency Goals

Optimal
– Completion time
– Resource utilization
– System throughput
– Any combination thereof

Suboptimal
– Suboptimal approximate
– Suboptimal heuristic

Page 22: Distributed Operating Systems CS551


Optimal Scheduling Algorithms

Requires the state of all competing processes
The scheduler must have access to all related information
Optimization is a hard problem
– Usually NP-hard for multiple processors
Thus, consider
– Suboptimal approximate solutions
– Suboptimal heuristic solutions

Page 23: Distributed Operating Systems CS551


SubOptimal Approximate Solutions

Similar to optimal scheduling algorithms
Try to find good solutions, not perfect solutions
Searches are limited
Include intelligent shortcuts

Page 24: Distributed Operating Systems CS551


SubOptimal Heuristic Solutions

Heuristics
– Employ rules of thumb
– Employ intuition
– May not be provable
Generally considered to work in an acceptable manner
Examples:
– If a PE has a heavy load, don't give it more to do
– Locality of reference for related processes and data

Page 25: Distributed Operating Systems CS551


Figure 7.1 Scheduling Decision Chart. (Galli, p. 152)

Page 26: Distributed Operating Systems CS551


Types of Load Distribution Algs

Static
– Decisions are hard-wired in advance

Dynamic
– Use current system state information to make decisions
– Overhead of keeping track of that information

Adaptive
– A type of dynamic algorithm
– May behave differently at different loads

Page 27: Distributed Operating Systems CS551


Load Distribution Algorithm Issues

Transfer Policy
Selection Policy
Location Policy
Information Policy
Stability
Sender-Initiated versus Receiver-Initiated
Symmetrically-Initiated
Adaptive Algorithms

Page 28: Distributed Operating Systems CS551


Load Dist. Algs. Issues, cont.

Transfer Policy
– When is it appropriate to move a task?
– If load at the sending PE > threshold
– If load at the receiving PE < threshold

Location Policy
– Find a receiver PE
– Methods:
• Broadcast messages
• Polling: random, neighbors, recent candidates

Page 29: Distributed Operating Systems CS551


Load Dist. Algs. Issues, cont.

Selection Policy
– Which task should migrate?
– Simple:
• Select new tasks
• Non-preemptive
– Criteria:
• Cost of transfer (should be covered by the reduction in response time)
• Size of task
• Number of location-dependent system calls (these must run on the local PE)

Page 30: Distributed Operating Systems CS551


Load Dist. Algs. Issues, cont.

Information Policy
– What information should be collected? When? From whom? By whom?
– Demand-driven
• Get information when a PE becomes a sender or receiver
• Sender-initiated: senders look for receivers
• Receiver-initiated: receivers look for senders
• Symmetrically-initiated: either of the above
– Periodic: at fixed time intervals; not adaptive
– State-change-driven
• Nodes send information about their state (rather than being solicited)

Page 31: Distributed Operating Systems CS551


Load Dist. Algs. Issues, cont.

Stability
– Queuing-Theoretic
• Stable: (arrival load + overhead) < capacity
• Effective: using the algorithm gives better performance than doing no load distribution
• An effective algorithm cannot be unstable
• A stable algorithm can be ineffective (overhead)
– Algorithmic Stability
• E.g., performing overhead operations but making no forward progress
• E.g., moving a task from PE to PE, only to find that it increases the PE's workload enough that it must be transferred again
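
The queuing-theoretic condition can be stated as a one-line predicate. This is an illustrative formalization assuming all quantities are in the same units (e.g., tasks per second); it is not a formula from Galli.

```python
def is_stable(arrival_rate: float, overhead_rate: float, capacity: float) -> bool:
    """Queuing-theoretic stability: the total offered load, real work
    plus algorithm overhead, must stay strictly below service capacity."""
    return arrival_rate + overhead_rate < capacity

print(is_stable(80.0, 5.0, 100.0))   # True: load plus overhead fits
print(is_stable(95.0, 10.0, 100.0))  # False: overhead pushes past capacity
```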


Page 33: Distributed Operating Systems CS551


Load Dist Algs: Sender-Initiated

The sender PE thinks it is overloaded

Transfer Policy
– Threshold (T) based on PE CPU queue length (QL)
• Sender: QL > T
• Receiver: QL < T

Selection Policy
– Non-preemptive
• Allows only new tasks
• Long-lived tasks make this policy worthwhile

Page 34: Distributed Operating Systems CS551


Load Dist Algs: Sender-Initiated

Location Policy (3 different policies)
– Random
• Select a receiver at random
• Useless or wasted if the destination is loaded
• Want to avoid transferring the same task from PE to PE to PE
• Include a limit on the number of transfers
– Threshold
• Start polling PEs at random
• If a 'receiver' is found, send the task to it
• Limit the search to 'Poll-limit' polls
• If the limit is hit, keep the task on the current PE
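
A minimal sketch of the Threshold variant just described. queue_length() and the PE dictionaries are hypothetical placeholders standing in for real load queries; T and POLL_LIMIT are tunable parameters.

```python
import random

T = 4            # CPU queue-length threshold
POLL_LIMIT = 5   # maximum number of PEs to poll per task

def queue_length(pe) -> int:
    """Placeholder: return the CPU queue length reported by PE `pe`."""
    return pe["ql"]

def place_new_task(local_pe, other_pes):
    """Sender-initiated, threshold location policy: poll random PEs and
    hand the task to the first one found below the threshold."""
    if queue_length(local_pe) <= T:
        return local_pe                      # not a sender: run locally
    for candidate in random.sample(other_pes, min(POLL_LIMIT, len(other_pes))):
        if queue_length(candidate) < T:      # found a receiver
            return candidate
    return local_pe                          # poll limit hit: keep the task

# Example: an overloaded local PE finds its one lightly loaded peer.
local = {"name": "pe0", "ql": 7}
peers = [{"name": "pe1", "ql": 6}, {"name": "pe2", "ql": 1}, {"name": "pe3", "ql": 5}]
print(place_new_task(local, peers)["name"])  # -> pe2
```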

Page 35: Distributed Operating Systems CS551


LDAs: Sender-Initiated

Location Policy (3 different policies, cont.)
– Shortest
• Poll a random set of PEs
• Choose the PE with the shortest queue length
• Only a little better than the Threshold location policy
• Not worth the additional work

Page 36: Distributed Operating Systems CS551


LDAs: Sender-Initiated

Information Policy
– Demand-driven
• After identifying a sender

Stability
– At high load, a PE might not find a receiver
– Polling will be wasted
– Polling increases the load on the system
• Could lead to instability

Page 37: Distributed Operating Systems CS551


LDAs: Receiver-Initiated

The receiver is trying to find work

Transfer Policy
– If local QL < T, try to find a sender

Selection Policy
– Non-preemptive preferred
• But there may not be any new tasks at the sender
• Still worth the effort

Page 38: Distributed Operating Systems CS551


LDAs: Receiver-Initiated

Location Policy
– Select a PE at random
– If taking a task would not move that PE's load below the threshold, take it
– If no luck after the Poll Limit tries:
• Wait until another task completes
• Wait another time period

Information Policy
– Demand-driven
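
The receiver side can be sketched as the converse of the sender-initiated threshold policy. As before, queue_length() and the PE dictionaries are illustrative placeholders, not Galli's notation.

```python
import random

T = 4
POLL_LIMIT = 5

def queue_length(pe) -> int:
    return pe["ql"]

def find_sender(local_pe, other_pes):
    """When the local queue drops below the threshold, poll random PEs
    for one whose load stays above threshold even after giving up a task."""
    if queue_length(local_pe) >= T:
        return None                          # not a receiver
    for candidate in random.sample(other_pes, min(POLL_LIMIT, len(other_pes))):
        if queue_length(candidate) - 1 >= T: # still a sender after transfer
            return candidate
    return None  # no luck: wait for a task to complete, or retry later

local = {"name": "pe0", "ql": 1}
peers = [{"name": "pe1", "ql": 3}, {"name": "pe2", "ql": 8}]
sender = find_sender(local, peers)
print(sender["name"] if sender else "no sender found")  # -> pe2
```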

Page 39: Distributed Operating Systems CS551


LDAs: Receiver-Initiated

Stability
– Tends to be stable
• At high load, a sender should be found

Problem
– Transfers tend to be preemptive
• Tasks on the sender node have already started

Page 40: Distributed Operating Systems CS551


LDAs: Symmetrically-Initiated

Both senders and receivers can search for tasks to transfer
Has both the advantages and the disadvantages of the two previous methods
Above-average algorithm
– Tries to keep the load at each PE at an acceptable level
– Aiming for the exact average can cause thrashing

Page 41: Distributed Operating Systems CS551


LDAs: Symmetrically-Initiated

Transfer Policy
– Each PE
• Estimates the average load
• Sets both an upper and a lower threshold, equidistant from the estimate
– If load > upper, the PE acts as a sender
– If load < lower, the PE acts as a receiver

Page 42: Distributed Operating Systems CS551


LDAs: Symmetrically-Initiated

Location Policy
– Sender-initiated
• The sender broadcasts a TooHigh message and sets a timeout
• A receiver sends an Accept message, clears its timeout, increases its load value, and sets a timeout
• If the sender still wants to send when the Accept message arrives, it sends the task
• If the sender gets a TooLow message before an Accept, it sends the task
• If the sender's TooHigh timeout expires with no Accept:
– The average estimate is too low
– It broadcasts a ChangeAvg message to all PEs
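
A toy, single-round sketch of the sender side of this exchange. The message names (TooHigh, Accept, TooLow, ChangeAvg) come from the slide; the synchronous in-memory "broadcast" and the peer callables are simplifying assumptions standing in for real asynchronous messaging and timeouts.

```python
def sender_round(load, upper, peers):
    """One sender-initiated round: 'broadcast' TooHigh, then act on the
    first useful reply; total silence means the average estimate is low."""
    if load <= upper:
        return "not a sender"
    replies = [peer("TooHigh") for peer in peers]   # broadcast and collect
    for reply in replies:
        if reply in ("Accept", "TooLow"):
            return "send task"                      # a receiver exists
    return "broadcast ChangeAvg"                    # no Accept: raise average

# One silent (overloaded) peer and one willing receiver.
peers = [lambda msg: None, lambda msg: "Accept"]
print(sender_round(load=9, upper=6, peers=peers))             # send task
print(sender_round(load=9, upper=6, peers=[lambda m: None]))  # broadcast ChangeAvg
```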

Page 43: Distributed Operating Systems CS551


LDAs: Symmetrically-Initiated

Location Policy
– Receiver-initiated
• The receiver sends a TooLow message and sets a timeout
• The rest is the converse of the sender-initiated algorithm

Selection Policy
– Use a reasonable policy
• Non-preemptive, if possible
• Low cost

Page 44: Distributed Operating Systems CS551


LDAs: Symmetrically-Initiated

Information Policy
– Demand-driven
– Determined at each PE
– Low overhead

Page 45: Distributed Operating Systems CS551


LDAs: Adaptive

Stable: Symmetrically-Initiated
– Previous instability was due to too much polling by the sender
– Each PE keeps lists of the other PEs, sorted into three categories:
• Senders (overloaded)
• Receivers (underloaded)
• OK
– Each PE starts with all other PEs on its receivers list

Page 46: Distributed Operating Systems CS551


LDAs: Adaptive

Transfer Policy
– Based on PE CPU queue length
– Low threshold (LT) and high threshold (HT)

Selection Policy
– Sender-initiated: only sends new tasks
– Receiver-initiated: takes any task
• Trying for low cost

Information Policy
– Demand-driven: maintains the lists

Page 47: Distributed Operating Systems CS551


LDAs: Adaptive

Location Policy
– Receiver-initiated
– Order of polling:
• Senders list: head to tail (newest info first)
• OK list: tail to head (most out-of-date first)
• Receivers list: tail to head
– When a PE becomes a receiver (QL < LT), it starts polling
• If it finds a sender, a transfer happens
• Else it uses the replies to update its lists
– It continues until:
• It finds a sender
• It is no longer a receiver
• It hits the Poll Limit
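
A minimal sketch of this polling order, with poll() as a hypothetical stand-in that returns the polled PE's current category; real replies would also drive list updates, which are only noted in a comment here.

```python
POLL_LIMIT = 6

def receiver_poll(senders, ok, receivers, poll):
    """Poll the senders list head-to-tail, then the OK and receivers
    lists tail-to-head, stopping at a sender or at the poll limit."""
    order = list(senders) + list(reversed(ok)) + list(reversed(receivers))
    for polled, pe in enumerate(order, start=1):
        status = poll(pe)
        if status == "sender":
            return pe                # transfer happens
        # otherwise the reply would be used to update the lists
        if polled >= POLL_LIMIT:
            break
    return None                      # no sender found within the limit

# Example with a fixed classification standing in for real poll replies.
status_of = {"pe1": "ok", "pe2": "receiver", "pe3": "sender"}
print(receiver_poll(["pe1"], ["pe2"], ["pe3"], lambda pe: status_of[pe]))  # pe3
```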

Page 48: Distributed Operating Systems CS551


LDAs: Adaptive

Notes
– At high loads, activity is sender-initiated, but the sender will soon have an empty receivers list, so there is no wasted polling
• Activity then shifts to receiver-initiated
– At low loads, receiver-initiated polling will usually fail
• But overhead doesn't matter at low load
• And the lists get updated
• So sender-initiated transfers should work quickly

Page 49: Distributed Operating Systems CS551


Load Scheduling Algorithms (Galli)

Usage Points
– Charged for using remote PEs and resources

Graph Theory
– Minimum cutset of the assignment graph
– Maximum flow of the graph

Probes
– Messages to locate available, appropriate PEs

Scheduling Queues
Stochastic Learning

Page 50: Distributed Operating Systems CS551


Figure 7.3 Usage Points. (Galli, p. 158)

Page 51: Distributed Operating Systems CS551


Figure 7.4 Economic Usage Points. (Galli, p. 159)

Page 52: Distributed Operating Systems CS551


Figure 7.5 Two-Processor Min-Cut Example. (Galli, p. 161)

Page 53: Distributed Operating Systems CS551


Figure 7.6 A Station with Run Queues and Hints. (Galli, p. 164)

Page 54: Distributed Operating Systems CS551


CPU Queue Length as Metric

PE queue length correlates well with response time
– Easy to measure
– Caution:
• When accepting a new migrating process, increment the queue length right away
• A time-out may be needed in case the process never arrives

PE queue length does not correlate well with PE utilization
– A daemon to monitor PE utilization adds overhead
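
The caution above suggests reserving a queue slot as soon as a migration is accepted and letting the reservation expire if the task never arrives. A small sketch, with illustrative class and method names:

```python
import time

class QueueLengthMetric:
    def __init__(self, timeout_s=5.0):
        self.running = 0
        self.reservations = {}          # task_id -> expiry time
        self.timeout_s = timeout_s

    def accept_migration(self, task_id):
        """Count the incoming task right away so other senders see it."""
        self.reservations[task_id] = time.monotonic() + self.timeout_s

    def task_arrived(self, task_id):
        self.reservations.pop(task_id, None)
        self.running += 1

    def queue_length(self):
        now = time.monotonic()
        # Drop reservations whose task never showed up.
        self.reservations = {t: e for t, e in self.reservations.items() if e > now}
        return self.running + len(self.reservations)

m = QueueLengthMetric(timeout_s=0.01)
m.accept_migration("t1")
print(m.queue_length())   # 1: the reserved slot counts immediately
time.sleep(0.02)
print(m.queue_length())   # 0: the reservation timed out
```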

Page 55: Distributed Operating Systems CS551


Election Algorithms

Bully algorithm (Garcia-Molina, 1982)
A ring election algorithm

Page 56: Distributed Operating Systems CS551


Bully Algorithm

Each processor has a unique number
One processor notices that the leader/server is missing
– Sends messages to all other processors
– Requests to be appointed leader
– Includes its processor number
Processors with higher (or, under the opposite convention, lower) processor numbers can bully the first processor

Page 57: Distributed Operating Systems CS551


Figure 7.7 The Bully Algorithm. (Galli, p. 169)

Page 58: Distributed Operating Systems CS551


Bully Algorithm, continued

The initial processor need only send election messages to the higher-numbered (or lower-numbered, under the opposite convention) processors

Any processors that respond effectively tell the first processor that they overrule it and that it is out of the running

These processors then start sending election messages to the remaining top processors
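
A compact sketch of one bully election under the highest-number-wins convention used in the example that follows; alive() is an illustrative stand-in for a real liveness check.

```python
def bully_election(starter, processors, alive):
    """`starter` messages every higher-numbered processor; any live one
    overrules it and runs its own election. The highest live processor
    hears no replies and becomes the coordinator."""
    higher = [p for p in processors if p > starter and alive(p)]
    if not higher:
        return starter                      # no one overrules: new leader
    # Each responder holds its own election; the outcome is the same.
    return bully_election(max(higher), processors, alive)

procs = [0, 1, 2, 3, 4, 5]
alive = lambda p: p != 5                    # processor 5 (old leader) is down
print(bully_election(2, procs, alive))      # -> 4, matching the example below
```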

Page 59: Distributed Operating Systems CS551


Bully Example

[Diagram: processors 0 through 5, with 5 (the old leader) down. Processor 2 notices and calls an election, sending election messages to 3, 4, and 5; processors 3 and 4 respond, knocking 2 out of the running.]

Page 60: Distributed Operating Systems CS551


Bully Example, continued

[Diagram: processors 3 and 4 each call their own election, sending election messages to the higher-numbered processors.]

Page 61: Distributed Operating Systems CS551


Bully Example, concluded

[Diagram: processor 4 responds to 3, knocking it out; the failed processor 5 never responds to 4, so 4 becomes the new leader.]

Page 62: Distributed Operating Systems CS551


A Ring Election Algorithm

No token
Each processor knows its successor
When a processor notices the leader is down, it sends an election message to its successor
If the successor is down, it sends to the next processor
Each sender adds its own number to the message

Page 63: Distributed Operating Systems CS551


Ring Election Algorithm, cont.

The first processor eventually receives back the election message containing its own number

The election message is then changed to a coordinator message and resent around the ring

The highest processor number in the message becomes the new leader

When the first processor receives the coordinator message back, the message is deleted
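
A minimal sketch of one pass of this algorithm; alive() is again an illustrative liveness check. The output matches the example on the next slide.

```python
def ring_election(starter, ring, alive):
    """Pass an election message around the ring, skipping dead processors
    and accumulating live IDs, until it returns to the starter. The
    highest accumulated ID becomes the coordinator."""
    ids = [starter]
    i = (ring.index(starter) + 1) % len(ring)
    while ring[i] != starter:
        if alive(ring[i]):
            ids.append(ring[i])              # each live node adds its number
        i = (i + 1) % len(ring)
    return max(ids), ids                     # coordinator message contents

ring = [0, 1, 2, 3, 4, 5, 6, 7]
alive = lambda p: p != 7                     # 7 (the old leader) is down
leader, seen = ring_election(3, ring, alive)
print(seen)    # [3, 4, 5, 6, 0, 1, 2], as in the example below
print(leader)  # 6 becomes the new leader
```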

Page 64: Distributed Operating Systems CS551


Ring Election Example

[Diagram: a ring of processors 0 through 7, with 7 (the old leader) down. Processor 3 starts the election; the message accumulates processor numbers as it circulates: 3; 3,4; 3,4,5; 3,4,5,6; 3,4,5,6,0; 3,4,5,6,0,1; and finally 3,4,5,6,0,1,2 arrives back at processor 3.]

Page 65: Distributed Operating Systems CS551


Orphan Processes

A child process that is still active after its parent process has terminated prematurely

Can happen with remote procedure calls
Wastes resources
Can corrupt shared data
Can create more processes
Three solutions follow

Page 66: Distributed Operating Systems CS551


Orphan Cleanup

A process must clean up after itself after a crash
– Requires each parent to keep a list of its children
– The parent thus has access to the family tree
– The list must be kept in nonvolatile storage
– On restart, each family-tree member is told of the parent process's death and halts execution
Disadvantage: parent overhead
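
A small sketch of this scheme: the parent persists each parent-child link to nonvolatile storage before the child starts, and on restart the recorded family tree is walked and every descendant is halted. The file name, notify() callback, and tree layout are illustrative assumptions.

```python
import json, os

FAMILY_FILE = "family_tree.json"   # illustrative nonvolatile store

def record_child(parent_id, child_id):
    """Persist the parent-child link before the child starts work."""
    tree = json.load(open(FAMILY_FILE)) if os.path.exists(FAMILY_FILE) else {}
    tree.setdefault(parent_id, []).append(child_id)
    with open(FAMILY_FILE, "w") as f:
        json.dump(tree, f)

def cleanup_after_crash(parent_id, notify):
    """On restart, walk the recorded family tree and halt every descendant."""
    tree = json.load(open(FAMILY_FILE)) if os.path.exists(FAMILY_FILE) else {}

    def halt(pid):
        for child in tree.pop(pid, []):
            notify(child, "parent terminated: halt")
            halt(child)                      # children may have children
    halt(parent_id)
    with open(FAMILY_FILE, "w") as f:
        json.dump(tree, f)

record_child("p1", "c1")
record_child("c1", "g1")                     # a grandchild
cleanup_after_crash("p1", lambda pid, msg: print(pid, "<-", msg))
```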

Page 67: Distributed Operating Systems CS551


Figure 7.8 Orphan Cleanup Family Trees. (Galli, p. 170)

Page 68: Distributed Operating Systems CS551


Child Process Allowance

All child processes receive a finite time allowance

If no time left, child must request more time from parent

If parent has terminated prematurely, child’s request goes unanswered

With no time allowance left, the child process dies
Requires more communication
Slows execution of child processes
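
A minimal sketch of the allowance scheme: the child works in slices and, when its allowance reaches zero, asks the parent for more; silence means the parent is gone. request_more() is an illustrative stand-in for a real parent-child message exchange.

```python
def run_child(work_slices, allowance, request_more):
    """Consume one allowance unit per slice; refill at zero or die."""
    for i in range(work_slices):
        if allowance == 0:
            grant = request_more()          # goes unanswered if parent died
            if grant is None:
                print("no reply from parent: child terminates")
                return
            allowance = grant
        allowance -= 1
        print(f"slice {i} done, allowance left: {allowance}")

# The parent grants one refill, then has terminated prematurely.
replies = iter([3, None])
run_child(work_slices=6, allowance=2, request_more=lambda: next(replies))
```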

Page 69: Distributed Operating Systems CS551

7 March 2001 CS-551, Lecture 6 69

Figure 7.9 Child Process Allowance. (Galli, p. 172)

Page 70: Distributed Operating Systems CS551


Process Version Numbers

Each process must keep track of a version number for its parent

After a system crash, the entire distributed system is assigned a new version number

Child forced to terminate if version number is out-of-date

Child may try to find its parent
– Terminates if unsuccessful

Requires a lot of communication
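
A minimal sketch of the version-number check a child might run after a crash; SYSTEM_VERSION and find_parent() are illustrative assumptions.

```python
SYSTEM_VERSION = 2          # bumped after the most recent system crash

def child_check(child_version, find_parent):
    """Run by each child: a stale version means it may be an orphan."""
    if child_version == SYSTEM_VERSION:
        return "keep running"
    if find_parent() is not None:            # parent survived the crash
        return "keep running"
    return "terminate"                       # out-of-date and orphaned

print(child_check(2, find_parent=lambda: None))    # current version: runs
print(child_check(1, find_parent=lambda: None))    # stale, no parent: dies
print(child_check(1, find_parent=lambda: "p7"))    # stale but parent found
```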

Page 71: Distributed Operating Systems CS551

7 March 2001 CS-551, Lecture 6 71

Figure 7.10 Process Version Numbers. (Galli, p. 174)

