10. resource management

Description: Introduction, Desirable Features of a Good Global Scheduling Algorithm, Task Assignment Approach, Load Balancing Approach, Load Sharing Approach
Page 1: 10. resource management

Sandeep Kumar Poonia
Head of Dept. CS/IT

B.E., M.Tech., UGC-NET

LM-IAENG, LM-IACSIT, LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE

Page 2: 10. resource management

Resource Management

Introduction

Desirable Features of Good Global

Scheduling Algorithm

Task Assignment Approach

Load Balancing Approach

Load Sharing Approach

11/10/2013, Sandeep Kumar Poonia

Page 3: 10. resource management

Resource Management

A resource can be logical, such as a shared file, or physical, such as a CPU (a node of the distributed system).

One of the functions of a distributed operating system is to assign processes to the nodes (resources) of the distributed system such that resource usage, response time, network congestion, and scheduling overhead are optimized.

Page 4: 10. resource management

Resource Management

Scheduling Techniques in DS

Task Assignment Approach

Load Balancing Approach

Load Sharing Approach


Page 5: 10. resource management

Resource Management

Task assignment approach, in which each process submitted by a user for processing is viewed as a collection of related tasks, and these tasks are scheduled to suitable nodes so as to improve performance.

Page 6: 10. resource management

Resource Management

Load-balancing approach, in which all the processes submitted by the users are distributed among the nodes of the system so as to equalize the workload among the nodes.

Page 7: 10. resource management

Resource Management

Load-sharing approach, which simply attempts to conserve the ability of the system to perform work by ensuring that no node is idle while processes wait to be processed.

Page 8: 10. resource management

Resource Management

Desirable features of a good global scheduling algorithm:

No A Priori Knowledge about the Processes

Dynamic in Nature

Quick Decision-making Capability

Balanced System Performance and Scheduling Overhead

Stability

Scalability

Fault Tolerance

Fairness of Service

Page 9: 10. resource management

Resource Management

No a priori knowledge about the processes: Scheduling algorithms that operate based on information about the characteristics and resource requirements of the processes pose an extra burden on the users, who must provide this information when submitting their processes for execution.

Dynamic in nature: Process assignment decisions should be dynamic, i.e., based on the current load of the system and not on some static policy. The scheduling algorithm should be flexible enough to migrate a process more than once, because the initial decision to place a process on a particular node may have to be changed after some time to adapt to the new system load.

Page 10: 10. resource management

Resource Management

Quick decision-making capability: Heuristic methods requiring less computational effort (and hence less time) while providing near-optimal results are preferable to exhaustive (optimal) solution methods.

Balanced system performance and scheduling overhead: Algorithms that provide near-optimal system performance with a minimum of overhead for gathering global state information (such as CPU load) are desirable. The overhead increases with the amount of global state information collected, while the usefulness of that information decreases, both because the information ages and because the cost of gathering and processing the extra information lowers the scheduling frequency.

Page 11: 10. resource management

Resource Management

Stability: Fruitless migration of processes, known as processor thrashing, must be prevented. For example, nodes n1 and n2 may both observe that node n3 is idle and offload a portion of their work to it, each unaware of the offloading decision made by the other. If n3 becomes overloaded as a result, it may in turn start transferring its processes to other nodes. This thrashing is caused by scheduling decisions being made at each node independently of the decisions made by other nodes.

Page 12: 10. resource management

Resource Management

Scalability: A scheduling algorithm should scale well as the number of nodes increases. An algorithm that makes scheduling decisions by first inquiring about the workload of all the nodes and then selecting the most lightly loaded node has poor scalability: it works only when there are few nodes, because the inquirer receives a flood of replies almost simultaneously, the time required to process the replies for making a node selection grows as the number of nodes (N) increases, and the network traffic quickly consumes network bandwidth. A simple approach is to probe only m of the N nodes when selecting a node.

Page 13: 10. resource management

Resource Management

Fault tolerance: A good scheduling algorithm should not be disabled by the crash of one or more nodes of the system. Also, if the nodes are partitioned into two or more groups due to link failures, the algorithm should be capable of functioning properly for the nodes within a group. Algorithms that have decentralized decision-making capability and consider only available nodes in their decision making have better fault tolerance.

Page 14: 10. resource management

Resource Management

Fairness of service: Global scheduling policies that blindly attempt to balance the load on all the nodes of the system are not good from the point of view of fairness of service: in any load-balancing scheme, heavily loaded nodes obtain all the benefits while lightly loaded nodes suffer poorer response times than in a stand-alone configuration. A fair strategy improves the response time of the former without unduly affecting the latter. Hence load balancing is replaced by the concept of load sharing, in which a node shares some of its resources as long as its own users are not significantly affected.

Page 15: 10. resource management

Resource Management

Assumptions of the task assignment approach:

1. A process has already been split up into pieces called tasks. This split occurs along natural boundaries (such as a method), so that each task has integrity in itself and data transfers among the tasks are minimized.

2. The amount of computation required by each task and the speed of each CPU are known.

3. The cost of processing each task on every node is known. This is derived from assumption 2.

4. The IPC costs between every pair of tasks are known. The IPC cost is 0 for tasks assigned to the same node. It is usually estimated by an analysis of the static program: if two tasks communicate n times and the average time for each inter-task communication is t, then the IPC cost for the two tasks is n * t.

5. Precedence relationships among the tasks are known.

6. Reassignment of tasks is not possible.

Page 16: 10. resource management

Resource Management

Goals of task assignment:

◦ Minimization of IPC costs

◦ Quick turnaround time for the complete process

◦ A high degree of parallelism

◦ Efficient utilization of system resources in general

These goals often conflict. For example, while minimizing IPC costs tends to assign all tasks of a process to a single node, efficient utilization of system resources tries to distribute the tasks evenly among the nodes. Likewise, while quick turnaround time and a high degree of parallelism encourage parallel execution of the tasks, the precedence relationships among the tasks limit their parallel execution.

Also note that with m tasks and q nodes there are q^m possible assignments of tasks to nodes. In practice, however, the actual number of possible assignments may be smaller, due to the restriction that certain tasks cannot be assigned to certain nodes because of their specific requirements (e.g., they need a certain amount of memory or a certain data file).

Page 17: 10. resource management

Resource Management

There are two nodes {n1, n2} and six tasks {t1, t2, t3, t4, t5, t6}. There are two task assignment parameters: the task execution cost (xab, the cost of executing task a on node b) and the inter-task communication cost (cij, the cost of communication between tasks i and j).

Inter-task communication costs:

        t1  t2  t3  t4  t5  t6
  t1     0   6   4   0   0  12
  t2     6   0   8  12   3   0
  t3     4   8   0   0  11   0
  t4     0  12   0   0   5   0
  t5     0   3  11   5   0   0
  t6    12   0   0   0   0   0

Execution costs:

        n1  n2
  t1     5  10
  t2     2   -
  t3     4   4
  t4     6   3
  t5     5   2
  t6     -   4

Task t6 cannot be executed on node n1 and task t2 cannot be executed on node n2, since the resources they need are not available on those nodes (marked "-" above).

Page 18: 10. resource management

Resource Management

1) Serial assignment, where tasks t1, t2, t3 are assigned to node n1 and tasks t4, t5, t6 are assigned to node n2:

Execution cost, x = x11 + x21 + x31 + x42 + x52 + x62 = 5 + 2 + 4 + 3 + 2 + 4 = 20

Communication cost, c = c14 + c15 + c16 + c24 + c25 + c26 + c34 + c35 + c36 = 0 + 0 + 12 + 12 + 3 + 0 + 0 + 11 + 0 = 38

Hence total cost = 58.

2) Optimal assignment, where tasks t1, t2, t3, t4, t5 are assigned to node n1 and task t6 is assigned to node n2:

Execution cost, x = x11 + x21 + x31 + x41 + x51 + x62 = 5 + 2 + 4 + 6 + 5 + 4 = 26

Communication cost, c = c16 + c26 + c36 + c46 + c56 = 12 + 0 + 0 + 0 + 0 = 12

Total cost = 38.

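The cost arithmetic above can be checked mechanically. The following Python sketch (the data structures and names are ours; the numbers are transcribed from the tables in the example) computes the cost of an assignment and brute-forces all q^m = 2^6 assignments:

```python
from itertools import product

# Execution costs from the example; None marks an infeasible placement
# (t2 cannot run on n2, t6 cannot run on n1).
exec_cost = {
    ("t1", "n1"): 5,    ("t1", "n2"): 10,
    ("t2", "n1"): 2,    ("t2", "n2"): None,
    ("t3", "n1"): 4,    ("t3", "n2"): 4,
    ("t4", "n1"): 6,    ("t4", "n2"): 3,
    ("t5", "n1"): 5,    ("t5", "n2"): 2,
    ("t6", "n1"): None, ("t6", "n2"): 4,
}
# Non-zero inter-task communication costs (the matrix is symmetric).
comm_cost = {
    ("t1", "t2"): 6, ("t1", "t3"): 4, ("t1", "t6"): 12,
    ("t2", "t3"): 8, ("t2", "t4"): 12, ("t2", "t5"): 3,
    ("t3", "t5"): 11, ("t4", "t5"): 5,
}
tasks = ["t1", "t2", "t3", "t4", "t5", "t6"]

def total_cost(assignment):
    """Execution cost of every task on its node, plus IPC cost for
    every communicating pair placed on different nodes."""
    cost = 0
    for t in tasks:
        x = exec_cost[(t, assignment[t])]
        if x is None:            # infeasible placement
            return None
        cost += x
    for (a, b), c in comm_cost.items():
        if assignment[a] != assignment[b]:
            cost += c            # IPC is free on the same node
    return cost

serial = {"t1": "n1", "t2": "n1", "t3": "n1",
          "t4": "n2", "t5": "n2", "t6": "n2"}
print(total_cost(serial))        # 58, as computed above

# Brute force over all 2^6 assignments to find the optimum.
best = min(c for nodes in product(["n1", "n2"], repeat=len(tasks))
           if (c := total_cost(dict(zip(tasks, nodes)))) is not None)
print(best)                      # 38: t1..t5 on n1, t6 on n2
```

Brute force is only feasible for tiny instances like this one; the minimum-cutset formulation described on the next slide is the general technique.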

Page 19: 10. resource management

Resource Management

Optimal assignments are found by first creating a static assignment graph. In this graph, the weights of the edges joining pairs of task nodes represent inter-task communication costs. The weight on the edge joining a task node to node n1 represents the execution cost of that task on node n2, and vice versa. Then we determine a minimum cutset in this graph.

A cutset is defined to be a set of edges such that, when these edges are removed, the nodes of the graph are partitioned into two disjoint subsets, with the nodes in one subset reachable from n1 and the nodes in the other reachable from n2. Each task node is reachable from either n1 or n2. The weight of a cutset is the sum of the weights of its edges, and it sums up the execution and communication costs of the corresponding assignment. An optimal assignment is found by finding a minimum-weight cutset.

Page 20: 10. resource management

Resource Management


Page 21: 10. resource management

Resource Management

A Taxonomy of Load-Balancing Algorithms

Load-balancing algorithms
◦ Static
  - Deterministic
  - Probabilistic
◦ Dynamic
  - Centralized
  - Distributed
    - Cooperative
    - Noncooperative

Page 22: 10. resource management

Resource Management

Static versus Dynamic

◦ Static algorithms use only information about the average behavior of the system

◦ Static algorithms ignore the current state or load of the nodes in the system

◦ Dynamic algorithms collect state information and react to changes in the system state

◦ Static algorithms are much simpler

◦ Dynamic algorithms are able to give significantly better performance

Page 23: 10. resource management

Resource Management

Deterministic versus Probabilistic

◦ Deterministic algorithms use information about the properties of the nodes and the characteristics of the processes to be scheduled

◦ Probabilistic algorithms use information about static attributes of the system (e.g., number of nodes, processing capability, topology) to formulate simple process placement rules

◦ The deterministic approach is difficult to optimize

◦ The probabilistic approach has poor performance

Page 24: 10. resource management

Resource Management

Centralized versus Distributed

◦ A centralized approach collects information at a server node, which makes the assignment decisions

◦ A distributed approach delegates decision making to entities on a predefined set of nodes

◦ Centralized algorithms can make efficient decisions but have lower fault tolerance

◦ Distributed algorithms avoid the bottleneck of collecting state information and react faster

Page 25: 10. resource management

Resource Management

Cooperative versus Noncooperative

◦ In noncooperative algorithms, entities act autonomously and make scheduling decisions independently of other entities

◦ In cooperative algorithms, distributed entities cooperate with each other

◦ Cooperative algorithms are more complex and involve larger overhead

◦ The stability of cooperative algorithms is better

Page 26: 10. resource management

Resource Management

A load-balancing algorithm is defined by the following policies:

Load estimation policy
◦ determines how to estimate the workload of a node

Process transfer policy
◦ determines whether to execute a process locally or remotely

State information exchange policy
◦ determines how to exchange load information among nodes

Location policy
◦ determines to which node a transferable process should be sent

Priority assignment policy
◦ determines the priority of execution of local and remote processes

Migration limiting policy
◦ determines the total number of times a process can migrate

Page 27: 10. resource management

Resource Management

Load estimation policy: To balance the workload on all the nodes of the system, it is necessary to decide how to measure the workload of a particular node.

Some measurable parameters (with time- and node-dependent factors) are the following:

◦ Total number of processes on the node

◦ Resource demands of these processes

◦ Instruction mixes of these processes

◦ Architecture and speed of the node's processor

Several load-balancing algorithms use the total number of processes, since it is simple and efficient to measure.

Page 28: 10. resource management

Resource Management

In some cases the true load can vary widely depending on the remaining service time of the processes, which can be estimated in several ways:

◦ Memoryless method: assumes that all processes have the same expected remaining service time, independent of the time used so far

◦ Past repeats: assumes that the remaining service time is equal to the time used so far

◦ Distribution method: if the distribution of service times is known, the process's remaining service time is the expected remaining time conditioned on the time already used

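The three estimators can be sketched as follows (a toy sketch; the function names and the discrete example distribution are ours):

```python
def memoryless(t_used, mean_service_time):
    # Expected remaining time is the same constant for every process,
    # regardless of how long it has already run.
    return mean_service_time

def past_repeats(t_used):
    # Remaining service time assumed equal to the time used so far.
    return t_used

def distribution(t_used, service_dist):
    """E[S - t | S > t] for a discrete distribution {total_time: prob}."""
    tail = {s: p for s, p in service_dist.items() if s > t_used}
    z = sum(tail.values())
    return sum((s - t_used) * p for s, p in tail.items()) / z

# Hypothetical workload: half the jobs take 2s, half take 10s.
dist = {2: 0.5, 10: 0.5}
mean = sum(s * p for s, p in dist.items())   # 6.0

print(memoryless(2, mean))    # 6.0
print(past_repeats(2))        # 2
print(distribution(2, dist))  # 8.0 -- only the 10s jobs survive past t=2
```

Note how the three policies disagree on the same process: after 2s of service the memoryless estimate is the overall mean, past-repeats predicts another 2s, and the distribution method correctly infers the job must be a long one.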

Page 29: 10. resource management

Resource Management

None of the previous methods works well in modern systems, because of periodically running processes and daemons.

An acceptable load estimation policy in such systems is to measure the CPU utilization of the nodes.

CPU utilization is defined as the number of CPU cycles actually executed per unit of real time.

It can be measured by setting up a timer that periodically checks the CPU state (idle/busy).

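A minimal sketch of this sampling idea, assuming a simulated clock and a busy/idle predicate instead of a real timer interrupt (all names are ours):

```python
def estimate_utilization(cpu_busy_at, n_samples, interval):
    """Sample the CPU state every `interval` time units and return the
    fraction of samples in which the CPU was busy."""
    busy = sum(1 for i in range(n_samples) if cpu_busy_at(i * interval))
    return busy / n_samples

# Hypothetical workload: the CPU is busy during the first 30 ticks of
# every 100-tick period, i.e. the true utilization is 0.30.
workload = lambda t: (t % 100) < 30

print(estimate_utilization(workload, n_samples=10_000, interval=1))  # 0.3
```

In a real system the sampling interval and the number of samples trade measurement accuracy against the overhead of the timer interrupts.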

Page 30: 10. resource management

Resource Management

Most algorithms use a threshold policy to decide whether a node is lightly loaded or heavily loaded.

The threshold value is a limiting value of the workload of a node, which can be determined by:

◦ Static policy: a predefined threshold value for each node, depending on its processing capability

◦ Dynamic policy: the threshold value is calculated from the average workload and a predefined constant

Below the threshold value a node accepts processes to execute; above the threshold value a node tries to transfer processes to a lightly loaded node.

Page 31: 10. resource management

Resource Management

A single-threshold policy may lead to an unstable algorithm, because an underloaded node could become overloaded right after a process migration. To reduce instability, a double-threshold policy, also known as the high-low policy, has been proposed.

Single-threshold policy: one threshold value divides the load scale into an overloaded and an underloaded region.

Double-threshold policy: a low mark and a high mark divide the load scale into overloaded, normal, and underloaded regions.

Page 32: 10. resource management

Resource Management

Double-threshold policy:

◦ When the node is in the overloaded region, new local processes are sent to run remotely, and requests to accept remote processes are rejected

◦ When the node is in the normal region, new local processes run locally, and requests to accept remote processes are rejected

◦ When the node is in the underloaded region, new local processes run locally, and requests to accept remote processes are accepted

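The three rules above can be sketched as a decision function (the marks, the return convention, and all names are ours):

```python
def region(load, low_mark, high_mark):
    """Classify a node's load under the high-low policy."""
    if load > high_mark:
        return "overloaded"
    if load < low_mark:
        return "underloaded"
    return "normal"

def handle(load, low_mark, high_mark):
    """Return (where new local processes run, whether remote requests
    are accepted) for a node at the given load."""
    r = region(load, low_mark, high_mark)
    if r == "overloaded":
        return ("remote", False)  # push new work away, reject remote work
    if r == "normal":
        return ("local", False)   # keep own work, still reject remote work
    return ("local", True)        # underloaded: keep own work, accept remote

print(handle(9, low_mark=2, high_mark=6))  # ('remote', False)
print(handle(4, low_mark=2, high_mark=6))  # ('local', False)
print(handle(1, low_mark=2, high_mark=6))  # ('local', True)
```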

Page 33: 10. resource management

Resource Management

Location policies:

Threshold method
◦ The policy selects a random node and checks whether that node is able to receive the process; if so, the process is transferred. If the node rejects it, another node is selected randomly. This continues until the probe limit is reached.

Shortest method
◦ L distinct nodes are chosen at random and each is polled to determine its load. The process is transferred to the node with the minimum load value, unless that node's workload prohibits it from accepting the process.
◦ A simple improvement is to discontinue probing whenever a node with zero load is encountered.

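Both policies can be sketched as follows (a toy sketch; parameter names such as `capacity` and `probe_limit` are ours, and loads are modeled as a plain list):

```python
import random

def threshold_policy(loads, capacity, probe_limit, rng):
    """Probe random nodes until one is below capacity or the probe
    limit is reached. Returns the chosen node index, or None."""
    for _ in range(probe_limit):
        n = rng.randrange(len(loads))
        if loads[n] < capacity:
            return n
    return None

def shortest_policy(loads, capacity, num_probes, rng):
    """Poll `num_probes` distinct random nodes and pick the least
    loaded one, unless even that node's workload prohibits it from
    accepting the process."""
    polled = rng.sample(range(len(loads)), num_probes)
    best = min(polled, key=lambda n: loads[n])
    return best if loads[best] < capacity else None

rng = random.Random(42)
loads = [5, 0, 3, 7, 2]
# Polling all 5 nodes, the shortest method always picks node 1 (load 0).
print(shortest_policy(loads, capacity=4, num_probes=5, rng=rng))  # 1
```

The zero-load improvement mentioned above would simply stop polling inside `shortest_policy` as soon as a node with load 0 is seen.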

Page 34: 10. resource management

Resource Management

Bidding method

◦ Nodes contain managers (to send processes) and contractors (to receive processes)

◦ Managers broadcast a request for bids; contractors respond with bids (prices based on the capacity of the contractor node) and the manager selects the best offer

◦ The winning contractor is notified and asked whether it accepts the process for execution or not

◦ Full autonomy for the nodes regarding scheduling

◦ Large communication overhead

◦ Difficult to decide on a good pricing policy

Page 35: 10. resource management

Resource Management

Pairing

◦ In contrast to the former methods, the pairing policy reduces the variance of load only between pairs of nodes

◦ Each node asks some randomly chosen node to form a pair with it

◦ If it receives a rejection, it randomly selects another node and tries to pair again

◦ Two nodes that differ greatly in load are temporarily paired with each other, and migration starts

◦ The pair is broken as soon as the migration is over

◦ A node only tries to find a partner if it has at least two processes

Page 36: 10. resource management

Resource Management

Dynamic policies require frequent exchange of state information, but these extra messages have two opposite impacts:

◦ Increasing the number of messages gives more accurate scheduling decisions

◦ Increasing the number of messages raises the queuing time of messages

State information exchange policies can be the following:
◦ Periodic broadcast
◦ Broadcast when state changes
◦ On-demand exchange
◦ Exchange by polling

Page 37: 10. resource management

Resource Management

Periodic broadcast

◦ Each node broadcasts its state information after the elapse of every T units of time

◦ Problem: heavy traffic, fruitless messages, and poor scalability, since the information exchange is too large for networks with many nodes

Broadcast when state changes

◦ Avoids fruitless messages by broadcasting the state only when a process arrives or departs

◦ A further improvement is to broadcast only when the state switches to another region (double-threshold policy)

Page 38: 10. resource management

Resource Management

On-demand exchange

◦ A node broadcasts a State-Information-Request message when its state switches from the normal region to either the underloaded or the overloaded region

◦ On receiving this message, other nodes reply with their own state information to the requesting node

◦ A further improvement is that only those nodes reply whose information is useful to the requesting node

Exchange by polling

◦ To avoid the poor scalability of broadcast messages, a partner node is searched for by polling the other nodes one by one, until a poll limit is reached

Page 39: 10. resource management

Resource Management

Priority assignment policies:

Selfish
◦ Local processes are given higher priority than remote processes. Worst response time performance of the three policies.

Altruistic
◦ Remote processes are given higher priority than local processes. Best response time performance of the three policies.

Intermediate
◦ When the number of local processes is greater than or equal to the number of remote processes, local processes are given higher priority than remote processes; otherwise, remote processes are given higher priority than local processes.

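The three policies fit in one small function (a sketch; the policy names are taken from the text, everything else is ours):

```python
def priority(policy, n_local, n_remote):
    """Which class of processes gets the higher execution priority."""
    if policy == "selfish":
        return "local"
    if policy == "altruistic":
        return "remote"
    # Intermediate: favor whichever class is in the majority,
    # with ties going to local processes.
    return "local" if n_local >= n_remote else "remote"

print(priority("intermediate", n_local=3, n_remote=2))  # local
print(priority("intermediate", n_local=1, n_remote=4))  # remote
```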

Page 40: 10. resource management

Resource Management

Migration limiting policy determines the total number of times a process can migrate:

◦ Uncontrolled: a remote process arriving at a node is treated just like a process originating at the node, so a process may be migrated any number of times

◦ Controlled: avoids the instability of the uncontrolled policy by using a migration-count parameter to fix a limit on the number of times a process can migrate
  - Irrevocable migration policy: the migration count is fixed at 1
  - For long-running processes the migration count should be greater than 1, to adapt to dynamically changing states

Page 41: 10. resource management

Resource Management

Drawbacks of the load-balancing approach

◦ Attempting to equalize the workload on all the nodes is not an appropriate objective, since a large overhead is generated by gathering exact state information

◦ Load balancing in the strict sense is not achievable, because the number of processes on a node always fluctuates and a temporal unbalance among the nodes exists at every moment

Basic ideas of the load-sharing approach

◦ It is necessary and sufficient to prevent nodes from being idle while some other nodes have more than two processes

◦ Load sharing is much simpler than load balancing, since it only attempts to ensure that no node is idle while a heavily loaded node exists

◦ The priority assignment policy and the migration limiting policy are the same as those for load-balancing algorithms

Page 42: 10. resource management

Resource Management

Since load-sharing algorithms simply attempt to avoid idle nodes, it is sufficient to know whether a node is busy or idle. Thus these algorithms normally employ the simplest load estimation policy, counting the total number of processes on a node.

In modern systems, where the permanent existence of several processes on an otherwise idle node is possible, algorithms measure CPU utilization to estimate the load of a node.

Page 43: 10. resource management

Resource Management

Load-sharing algorithms normally use an all-or-nothing strategy: the threshold value of all the nodes is fixed at 1. A node becomes a receiver node when it has no process, and becomes a sender node when it has more than one process.

To avoid wasting processing power on nodes that momentarily have zero processes, some load-sharing algorithms use a threshold value of 2 instead of 1.

When CPU utilization is used as the load estimation policy, the double-threshold policy should be used as the process transfer policy.

Page 44: 10. resource management

Resource Management

The location policy decides whether the sender node or the receiver node of a process takes the initiative to search for a suitable node in the system:

◦ Sender-initiated location policy: the sender node decides where to send the process; heavily loaded nodes search for lightly loaded nodes

◦ Receiver-initiated location policy: the receiver node decides from where to get a process; lightly loaded nodes search for heavily loaded nodes

Page 45: 10. resource management

Resource Management

Sender-initiated location policy
◦ When a node becomes overloaded, it either broadcasts a message or randomly probes the other nodes one by one to find a node that is able to receive remote processes
◦ When broadcasting, a suitable node is known as soon as a reply arrives

Receiver-initiated location policy
◦ When a node becomes underloaded, it either broadcasts a message or randomly probes the other nodes one by one to indicate its willingness to receive remote processes

Receiver-initiated policies require a preemptive process migration facility, since scheduling decisions are usually made at process departure epochs.

Page 46: 10. resource management

Resource Management

Experiences with location policies:

◦ Both policies give substantial performance advantages over the situation in which no load sharing is attempted

◦ The sender-initiated policy is preferable at light to moderate system loads

◦ The receiver-initiated policy is preferable at high system loads

◦ The sender-initiated policy provides better performance when the process transfer cost is significantly higher under the receiver-initiated policy than under the sender-initiated policy, due to the preemptive transfer of processes

Page 47: 10. resource management

Resource Management

In load-sharing algorithms it is not necessary for the nodes to exchange state information periodically; a node only needs to know the state of other nodes when it is either underloaded or overloaded.

Broadcast when state changes
◦ Under the sender-initiated (receiver-initiated) location policy, a node broadcasts a State-Information-Request message when it becomes overloaded (underloaded)
◦ This is called the broadcast-when-idle policy when a receiver-initiated policy is used with a fixed threshold value of 1

Poll when state changes
◦ In large networks a polling mechanism is used: it randomly asks different nodes for state information until an appropriate node is found or a probe limit is reached
◦ This is called the poll-when-idle policy when a receiver-initiated policy is used with a fixed threshold value of 1

Page 48: 10. resource management

Resource Management

Summary: The resource manager of a distributed system schedules processes to optimize a combination of resource usage, response time, network congestion, and scheduling overhead. Three approaches have been discussed:

◦ The task assignment approach deals with the assignment of tasks in order to minimize inter-process communication costs and improve turnaround time for the complete process, taking some constraints into account

◦ In the load-balancing approach, process assignment decisions attempt to equalize the average workload on all the nodes of the system

◦ In the load-sharing approach, process assignment decisions attempt to keep all the nodes busy if there are sufficient processes in the system for all the nodes

Page 49: 10. resource management

Resource Management

Component faults:

Transient faults: occur once and then disappear. E.g., a bird flying through the beam of a microwave transmitter may cause lost bits on some network; if the operation is retried, it may succeed.

Intermittent faults: occur, then vanish, then reappear, and so on. E.g., a loose contact on a connector.

Permanent faults: continue to exist until the fault is repaired. E.g., burnt-out chips, software bugs, and disk-head crashes.

Page 50: 10. resource management

Resource Management

There are two types of processor faults:

1. Fail-silent faults: a faulty processor just stops and does not respond

2. Byzantine faults: a faulty processor continues to run but gives wrong answers

Page 51: 10. resource management

Resource Management

Synchronous systems: a system that, when working, always responds to a message within a known finite bound is said to be synchronous. Otherwise, it is asynchronous.

Page 52: 10. resource management

Resource Management

There are three kinds of fault tolerance approaches:

1. Information redundancy: extra bits are added to allow recovery from garbled bits (e.g., a Hamming code)

2. Time redundancy: an action is performed again (e.g., using atomic transactions); useful for transient and intermittent faults

3. Physical redundancy: extra components are added. There are two ways to organize the extra physical equipment: active replication (use all the components at the same time) and primary backup (use the backup if the primary fails)

Page 53: 10. resource management

Resource Management

[Figure: active replication with triple modular redundancy. Three stages A, B, C are each replicated three times (A1, A2, A3; B1, B2, B3; C1, C2, C3), with voters V1-V9 between the stages.]

Page 54: 10. resource management

Resource Management

A system is said to be k-fault-tolerant if it can survive faults in k components and still meet its specifications.

k+1 processors can tolerate k fail-stop faults: if k of them fail, the one that is left can do the work. But 2k+1 processors are needed to tolerate k Byzantine faults, because if k processors send out wrong replies, there are still k+1 processors giving the correct answer, and by majority vote a correct answer can still be obtained.

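The 2k+1 arithmetic can be illustrated with a simple majority-vote sketch (names are ours):

```python
from collections import Counter

def vote(replies):
    """Return the majority answer among the replica replies, or None
    if no value is reported by a strict majority."""
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None

# k = 2 Byzantine replicas out of 2k + 1 = 5: the k + 1 = 3 correct
# replicas still form a strict majority.
print(vote([42, 42, 42, 7, 13]))  # 42
# With only 2k = 4 replicas, k wrong answers can block the majority.
print(vote([42, 42, 7, 7]))       # None
```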

Page 55: 10. resource management

Resource Management

Primary-backup protocol (client, primary, backup):

1. The client sends a request to the primary
2. The primary does the work
3. The primary sends an update to the backup
4. The backup does the work
5. The backup sends an acknowledgement to the primary
6. The primary sends the reply to the client

Page 56: 10. resource management

Resource Management

Backward recovery uses checkpoints. In the checkpointing method, two undesirable situations can occur:

Lost message: the state of process Pi indicates that it has sent a message m to process Pj, but Pj has no record of receiving this message.

Orphan message: the state of process Pj is such that it has received a message m from process Pi, but the state of Pi is such that it has never sent the message m to Pj.

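Both conditions can be detected mechanically by comparing the send and receive records in a pair of checkpoints (a toy sketch; the message ids are hypothetical):

```python
def classify_messages(sent, received):
    """Given the set of message ids recorded as sent by Pi and the set
    recorded as received by Pj at their checkpoints, return the
    (lost, orphan) inconsistencies between the two checkpoints."""
    lost = sent - received     # Pi sent it, Pj has no record of receiving it
    orphan = received - sent   # Pj received it, Pi has no record of sending it
    return lost, orphan

pi_sent = {"m1", "m2"}
pj_received = {"m2", "m3"}
lost, orphan = classify_messages(pi_sent, pj_received)
print(lost)    # {'m1'} -- a lost message
print(orphan)  # {'m3'} -- an orphan message
```

A strongly consistent set of checkpoints (next slide) is exactly one for which both sets come out empty.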

Page 57: 10. resource management

Resource Management

A strongly consistent set of checkpoints consists of a set of local checkpoints such that there is no orphan or lost message.

A consistent set of checkpoints consists of a set of local checkpoints such that there is no orphan message.

Page 58: 10. resource management

Resource Management

[Figure: processes Pi and Pj, each with a current checkpoint; Pi sends message m to Pj, after which a failure occurs.]

Page 59: 10. resource management

Resource Management

A process Pi needs to take a checkpoint only if there is another process Pj that has taken a checkpoint that includes the receipt of a message from Pi, and Pi has not recorded the sending of this message. In this way no orphan message will be generated.

Page 60: 10. resource management

Resource Management

Each process takes its checkpoints independently, without any coordination.

Page 61: 10. resource management

Resource Management

Synchronous checkpoints are established over a longer period, while asynchronous checkpoints are taken over a shorter period. That is, within one synchronous period there are several asynchronous periods.

Page 62: 10. resource management

Resource Management

Two-army problem: Two blue armies must reach agreement to attack a red army. If one blue army attacks by itself, it will be slaughtered. They can communicate only over an unreliable channel: sending a messenger who is subject to capture by the red army. Under these conditions they can never reach agreement on attacking.

Page 63: 10. resource management

Resource Management

Now assume the communication is perfect but the processors are not. The classical problem is called the Byzantine generals problem: there are N generals, and M of them are traitors. Can the loyal generals reach agreement?

Page 64: 10. resource management

Resource Management

[Figure: the Byzantine generals problem with N = 4 and M = 1. Generals 1, 2, and 4 are loyal and announce their values 1, 2, and 4 truthfully; general 3 is the traitor and sends different values (x, y, z) to the others.]

Page 65: 10. resource management

Resource Management

After the first round:
1 got (1,2,x,4); 2 got (1,2,y,4); 3 got (1,2,3,4); 4 got (1,2,z,4)

After the second round (each general passes on the vector it collected; the traitor sends arbitrary vectors):
1 got (1,2,y,4), (a,b,c,d), (1,2,z,4)
2 got (1,2,x,4), (e,f,g,h), (1,2,z,4)
4 got (1,2,x,4), (1,2,y,4), (i,j,k,l)

Taking the majority of each element:
1 got (1,2,_,4); 2 got (1,2,_,4); 4 got (1,2,_,4)

So all the good generals reach the same vector, and they know that 3 is the bad guy.

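The two-round exchange can be simulated for this example (a toy illustration of the N = 4, M = 1 case only, not a full OM(m) implementation; here the traitor sends a distinct bogus value to every recipient in both rounds, and all names are ours):

```python
from collections import Counter

def majority(xs):
    """Strict majority of a list, or None if there is none."""
    value, count = Counter(xs).most_common(1)[0]
    return value if count > len(xs) // 2 else None

def byzantine_agreement(values, traitor):
    """Two-round vector exchange with one traitor who sends a distinct
    bogus value to every recipient. Returns each loyal general's
    per-slot majority decision."""
    n = len(values)
    loyal = [g for g in range(n) if g != traitor]
    bogus = lambda to, slot: ("lie", to, slot)  # arbitrary, all distinct

    # Round 1: each loyal general g collects everyone's announced value.
    vec = {g: [values[s] if s != traitor else bogus(g, s) for s in range(n)]
           for g in loyal}

    # Round 2: generals exchange their round-1 vectors (the traitor
    # sends a garbage vector), then take the per-element majority.
    decisions = {}
    for g in loyal:
        held = [vec[g]]
        held += [vec[h] for h in loyal if h != g]
        held += [[bogus((g, "round2"), s) for s in range(n)]]  # traitor's
        decisions[g] = [majority([v[s] for v in held]) for s in range(n)]
    return decisions

# N = 4, M = 1: general 3 (index 2) is the traitor, as on the slide.
print(byzantine_agreement([1, 2, 3, 4], traitor=2))
# every loyal general decides [1, 2, None, 4]
```

All loyal generals end up with the same vector (1, 2, _, 4): they agree on each other's values and jointly identify the slot of the traitor as undecidable.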

Page 66: 10. resource management

Resource Management

If there are m faulty processors, agreement can be achieved only if 2m+1 correctly functioning processors are present, for a total of 3m+1. In particular, with n = 3 and m = 1, agreement cannot be reached.

Page 67: 10. resource management

Resource Management

[Figure: N = 3 generals with one traitor (general 3); generals 1 and 2 are loyal, and the traitor sends values x and y.]

After the first round:
1 got (1,2,x); 2 got (1,2,y); 3 got (1,2,3)

After the second round:
1 got (1,2,y), (a,b,c)
2 got (1,2,x), (d,e,f)

There is no majority, so agreement cannot be reached.

