
Chapter 7 - Local Stabilization

Date post: 14-Dec-2015
Upload: jair-jaquith
Chapter 7: roadmap

7.1 Super stabilization
7.2 Self-Stabilizing Fault-Containing Algorithms
7.3 Error-Detection Codes and Repair

Introduction

We present a scheme that can be used to correct the state of algorithms for ongoing long-lived tasks.

The scheme converts a non-stabilizing algorithm for such a task into a self-stabilizing algorithm for the same task.

The Malicious Fault Model

Starting from a safe configuration c, k processors experience transient faults, and a new configuration c’ is reached.

The states of the faulty processors can be chosen as the states that result in the longest convergence time.

The Malicious Fault Model (2)

Designing for this worst-case measure minimizes the convergence time in the worst-case scenario.

However, algorithms designed for the worst-case measure may have a larger average convergence time than other algorithms.

The Non-malicious Fault Model

In this model, a transient fault assigns a processor a state that is chosen with equal probability from the state space of that processor.

Average Convergence Time

Pr (c, k, c’) : The probability of reaching a particular configuration c’ from a safe configuration c due to the occurrence of k faults

WorstCase(c) : The maximal number of cycles before the system reaches a safe configuration when it starts in c

Average Convergence Time (2)

The average convergence time following the occurrence of k non-malicious transient faults is:

Σ Pr(c, k, c’) · WorstCase(c’), where the sum is computed over all possible configurations c’.
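The sum above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary representation, and the numbers are assumptions made for the example and do not come from the slides.

```python
def average_convergence_time(pr, worst_case):
    """Return the sum over c' of Pr(c, k, c') * WorstCase(c').

    pr         -- maps each configuration c' to Pr(c, k, c')
    worst_case -- maps each configuration c' to WorstCase(c')
    """
    return sum(p * worst_case[c_prime] for c_prime, p in pr.items())


# Hypothetical values, chosen only to illustrate the arithmetic.
pr = {"c1": 0.5, "c2": 0.25, "c3": 0.25}
worst_case = {"c1": 0, "c2": 4, "c3": 8}
print(average_convergence_time(pr, worst_case))  # 3.0
```

In this made-up instance the worst case is 8 cycles, while the average is 3 because half of the probability mass lands on a configuration that is already safe.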

Error Detection Codes

We use error-detection codes to reduce average convergence time

Each processor maintains a variable ErrorDetect that holds the error-detection code ed of its current state s.

The error-detecting function computes the pair <s, ed> given s.

Converting the Algorithm

Replace every step a by a step a’ that does the following:

1. Examines whether the value of ErrorDetect fits the current state

2. If it fits, executes a

3. Otherwise, executes a special repair step a’’

4. Computes the new ed’ by applying the error-detecting function to the resulting state s’
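The four steps can be sketched as a wrapper around an arbitrary step function. This is a minimal sketch under stated assumptions: the slides do not name a particular error-detection code, so CRC-32 from Python's zlib module is used as a stand-in, and the names error_detect, guarded_step, step and repair are hypothetical.

```python
import zlib


def error_detect(state: bytes) -> int:
    """Stand-in error-detecting function: CRC-32 of the state (an assumption)."""
    return zlib.crc32(state)


def guarded_step(state: bytes, ed: int, step, repair):
    """Play the role of step a': check ErrorDetect, then run a or the repair step."""
    if error_detect(state) == ed:      # 1. does ErrorDetect fit the state?
        new_state = step(state)        # 2. yes: execute the original step a
    else:
        new_state = repair(state)      # 3. no: execute the repair step a''
    return new_state, error_detect(new_state)  # 4. recompute ed'


# Hypothetical step and repair functions, for illustration only.
step = lambda s: s + b"x"
repair = lambda s: b"reset"

state, ed = guarded_step(b"abc", error_detect(b"abc"), step, repair)
corrupted_state, _ = guarded_step(b"abd", error_detect(b"abc"), step, repair)
```

In the first call the code fits and the ordinary step runs; in the second the state was altered without updating the code, so the repair step runs instead.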

Converting the Algorithm (2)

A transient fault can corrupt all the memory bits of a processor, including the bits of ErrorDetect.

Thus, under the non-malicious fault model, the probability that the value of ErrorDetect will still fit the state of the faulty processor decreases as the number of bits in ErrorDetect increases.
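To make the dependence on the number of bits concrete, the following sketch assumes that a fault leaves ErrorDetect with b uniformly random bits, independent of the corrupted state. Under that assumption (which is an idealization, not a statement from the slides) the corrupted code matches with probability 2^(-b). The function name is hypothetical.

```python
from fractions import Fraction


def undetected_fault_probability(bits: int) -> Fraction:
    """Chance that a uniformly random `bits`-bit ErrorDetect value fits the state."""
    return Fraction(1, 2 ** bits)


for bits in (8, 16, 32):
    print(bits, undetected_fault_probability(bits))
```

Each additional bit halves the probability of an undetected fault under this assumption.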

Pyramids

A pyramid ∆i = vi[0], vi[1], vi[2],…, vi[d] of views is maintained by every processor Pi, where vi[h] is a view of all the processors that are within a distance of no more than h from Pi, h time units ago.

In particular, vi[d] is a view of the entire system, d time units ago.
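Which processors appear in vi[h] depends only on the communication graph. The sketch below computes that set with a breadth-first search; the adjacency-list representation and the name view_members are assumptions made for this illustration.

```python
from collections import deque


def view_members(graph, i, h):
    """Return the processors within distance at most h from processor i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == h:          # do not expand beyond distance h
            continue
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


# Hypothetical line of four processors: 0 - 1 - 2 - 3.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(view_members(line, 0, 1))  # {0, 1}
```

On this line, the view of processor 0 covers the whole system once h reaches 3.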

[Figures omitted: each slide colors the vertices covered by one view in the pyramid of V1.]

V1[0] : view of V1 now.
V1[1] : view of the colored vertices, one time unit ago.
V1[2] : view of the colored vertices, two time units ago.
V1[3] : view of the colored vertices, three time units ago.
V1[4] : view of the entire system, four time units ago.
V1[5] and V1[6] are views of the entire system as well; the only difference is the time at which these views were taken.

Neighboring Pyramids

Neighboring processors exchange their pyramids and check that they agree on the shared portions.

If the shared portions are equal, then all the v[d] views are equal.

In addition, every processor checks that vi[d] is a consistent configuration for the input algorithm AL and the current task (the configuration is reachable from the initial state of AL).

Checking Consistent Configuration

Pi checks that its state in the view vi[h] , for 0 ≤ h ≤ d-1, is obtained by executing AL using the state of Pi and its neighbors in vi[h+1] .
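This check can be sketched as a loop over the levels of the pyramid. The representation is an assumption made for the example: views[h] is a dictionary from processor identifiers to states, and transition stands for one step of AL applied to a processor's state and the states of its neighbors. Neither name comes from the slides.

```python
def is_locally_consistent(views, i, neighbors, transition):
    """Check Pi's state in views[h] against AL applied to views[h+1], 0 <= h <= d-1."""
    d = len(views) - 1
    for h in range(d):
        older = views[h + 1]
        expected = transition(older[i], [older[j] for j in neighbors])
        if views[h][i] != expected:
            return False
    return True


# Hypothetical AL: every processor adopts the maximum value it sees.
take_max = lambda own, nbrs: max([own] + nbrs)

good = [{0: 5}, {0: 1, 1: 5}]   # state 5 follows from max(1, 5)
bad = [{0: 1}, {0: 1, 1: 5}]    # state 1 does not follow from max(1, 5)
```

Here is_locally_consistent(good, 0, [1], take_max) holds, while the same call on bad fails at level h = 0.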

Updating the Pyramids

In every time unit, Pi receives the pyramid ∆j = vj[0], vj[1], vj[2],…, vj[d] of every neighbor Pj, and uses the values of vj[d-1] to construct the value of the new vi[d].

The values of vj[d-1] contain information about every processor within distance d of Pi, d-1 time units ago.

In the same way, Pi uses the received values of vj[k-1], for 1 ≤ k ≤ d-1, (together with vi[k-1]) to compute vi[k].
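One way to picture the update is as a shift by one level combined with a merge of the neighbors' views. The sketch below assumes each view is a dictionary from processor identifiers to states and that the pyramids already agree on their shared portions, so the merge order does not matter. The function name and arguments are hypothetical.

```python
def update_pyramid(own, neighbor_pyramids, current_state, my_id):
    """Build the new pyramid of Pi from its old pyramid and its neighbors' pyramids."""
    d = len(own) - 1
    new = [{my_id: current_state}]            # new vi[0]: Pi's state now
    for k in range(1, d + 1):
        merged = dict(own[k - 1])             # start from the old vi[k-1]
        for pyramid in neighbor_pyramids:
            merged.update(pyramid[k - 1])     # add every received vj[k-1]
        new.append(merged)                    # becomes the new vi[k]
    return new


# Hypothetical line 0 - 1 - 2 with d = 2, seen from processor 1.
own = [{1: "b1"}, {0: "a0", 1: "b0", 2: "c0"}, {0: "a-", 1: "b-", 2: "c-"}]
left = [{0: "a1"}, {0: "a0", 1: "b0"}, {0: "a-", 1: "b-", 2: "c-"}]
right = [{2: "c1"}, {1: "b0", 2: "c0"}, {0: "a-", 1: "b-", 2: "c-"}]
new = update_pyramid(own, [left, right], "b2", 1)
```

After the update, new[1] holds the states that were current one time unit ago and new[2] holds the states from two time units ago.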

The Repair Scheme

First, we assume that the error-detection code identifies all the faults.

In general, the faulty processors initialize their states and collect state information from non-faulty processors to reconstruct their pyramids.

The Repair Scheme(2)

Let c’ be a configuration reached after several faults.

The processors fall into three groups: Faulty, Border-non-faulty, and Operating.

A processor that identifies an error assigns faulty to its local status variable and resets its pyramid.

Border-Non-Faulty and Operating

The pyramid of a non-faulty processor that is a neighbor of a faulty processor holds almost all the information that was stored in the faulty processor before the fault.

Such a processor assigns the value border-non-faulty to its local status variable.

The remaining non-faulty processors are defined as operating.
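The three groups can be illustrated with a global classification over a known graph. In the scheme itself each processor sets only its own status from local information; the global function below, its name, and the adjacency-list representation are assumptions made purely for illustration.

```python
def classify(graph, faulty):
    """Label every processor as faulty, border-non-faulty, or operating."""
    status = {}
    for p, neighbors in graph.items():
        if p in faulty:
            status[p] = "faulty"
        elif any(q in faulty for q in neighbors):
            status[p] = "border-non-faulty"   # non-faulty, next to a faulty one
        else:
            status[p] = "operating"
    return status


# Hypothetical line 0 - 1 - 2 - 3 in which processor 0 is faulty.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(classify(line, {0}))
```

On this line, processor 1 becomes border-non-faulty and processors 2 and 3 remain operating.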

[Figure omitted: processors labeled Faulty, Border-non-faulty, and Operating.]

Freezing the Pyramids

A border-non-faulty processor does not change its pyramid until all the faulty processors have finished reconstructing theirs.

The Topology Collection procedure is used to verify that.

Topology Collection

Every faulty and border-non-faulty processor sends the topology it knows at that moment to its neighbors.

After several rounds (the diameter of the corrupted region + 1), all the information in the pyramids of processors next to a faulty one has arrived.
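A single round of this exchange can be sketched as every participating processor merging what its neighbors knew at the start of the round. The representation (a set of known edges per processor, stored as ordered pairs) and the name collection_round are assumptions made for the example, not details given in the slides.

```python
def collection_round(graph, known):
    """One synchronous round: each processor adds its neighbors' known edges."""
    updated = {}
    for p in graph:
        merged = set(known[p])
        for q in graph[p]:
            merged |= known[q]     # knowledge held by q before this round
        updated[p] = merged
    return updated


# Hypothetical line 0 - 1 - 2; initially each processor knows its own edges.
line = {0: [1], 1: [0, 2], 2: [1]}
known = {0: {(0, 1)}, 1: {(0, 1), (1, 2)}, 2: {(1, 2)}}
known = collection_round(line, known)
```

Repeating the round spreads the knowledge one hop further each time.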

Topology Collection (2)

Every processor checks whether there exists a faulty processor that has an edge to a processor whose state is unknown.

When this test returns false, the pyramids of the faulty processors can be reconstructed.
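The test can be sketched as a scan over the collected edges. The data layout is an assumption: known_status maps each processor whose state has been collected to its status, and known_edges is a set of undirected edges. The function returns true exactly when the existence test above returns false.

```python
def has_all_topology(known_status, known_edges):
    """True iff no faulty processor has an edge to a processor with unknown state."""
    for u, v in known_edges:
        for a, b in ((u, v), (v, u)):          # edges are undirected
            if known_status.get(a) == "faulty" and b not in known_status:
                return False
    return True


# Hypothetical collected data: processor 2 has not been heard from yet.
status = {0: "faulty", 1: "border-non-faulty"}
edges = {(0, 1), (0, 2)}
print(has_all_topology(status, edges))  # False
```

Once the state of processor 2 has been collected, the same call returns True and reconstruction can begin.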

Reconstruction

The faulty processors reconstruct their pyramids using the collected information from the other pyramids and the transition functions of the processors

Back to Operating

Using a local counter, and the collected topology, the faulty and border-non-faulty processors conclude when the rest have finished reconstructing their pyramids

At the end of the repair process, all the processors change their status to operating

The algorithm

State variables:

Status = {operating, faulty, border non faulty}

Topology = {V , E}

Pyramid (Explained before)

RoundCounter: counts the number of rounds since the occurrence of the most recent fault.

The algorithm (cont.)

Upon a clock tick:
1. If (status = operating)
    1.1 if (DetectError())
        1.1.1 status = faulty
        1.1.2 Pyramid = nil
        1.1.3 RoundCounter = 0
    1.2 else if (HaveFaultyNeighbor())
        1.2.1 status = border non faulty
        1.2.2 RoundCounter = 0
    1.3 else UpdatePyramid()
2. Else
    2.1 ExchangeLocalTopologyInformation()
    2.2 if (HasAllTopology() & status = faulty)
        2.2.1 ReconstructPyramid()
    2.3 RoundCounter++
    2.4 if (Diameter(Topology) = RoundCounter)
        2.4.1 status = operating

Notes on the procedures:

DetectError(): detects whether a transient error occurred, using the error-detection codes.

HaveFaultyNeighbor(): true if one of the neighbors is faulty.

ExchangeLocalTopologyInformation(): sends information about the immediate neighbors, and receives information from the neighbors.

HasAllTopology(): returns true iff there is no edge from a faulty processor to a processor whose state is unknown.
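The clock-tick handler translates almost line by line into Python. In this sketch the procedures named in the pseudocode are passed in as callbacks, because the slides do not specify their implementations; the Processor class, its field names, and the stub callbacks in the usage example are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Processor:
    # Hypothetical callbacks standing in for the procedures of the pseudocode.
    detect_error: Callable[[], bool]
    have_faulty_neighbor: Callable[[], bool]
    update_pyramid: Callable[[], None]
    exchange_topology: Callable[[], None]
    has_all_topology: Callable[[], bool]
    reconstruct_pyramid: Callable[[], list]
    topology_diameter: Callable[[], int]
    status: str = "operating"
    round_counter: int = 0
    pyramid: Optional[list] = field(default_factory=list)

    def on_clock_tick(self) -> None:
        if self.status == "operating":                      # step 1
            if self.detect_error():                         # 1.1
                self.status = "faulty"
                self.pyramid = None
                self.round_counter = 0
            elif self.have_faulty_neighbor():               # 1.2
                self.status = "border-non-faulty"
                self.round_counter = 0
            else:                                           # 1.3
                self.update_pyramid()
        else:                                               # step 2
            self.exchange_topology()                        # 2.1
            if self.has_all_topology() and self.status == "faulty":
                self.pyramid = self.reconstruct_pyramid()   # 2.2.1
            self.round_counter += 1                         # 2.3
            if self.topology_diameter() == self.round_counter:
                self.status = "operating"                   # 2.4.1


# Usage with stub callbacks: a fault is detected on the first tick.
p = Processor(
    detect_error=lambda: True,
    have_faulty_neighbor=lambda: False,
    update_pyramid=lambda: None,
    exchange_topology=lambda: None,
    has_all_topology=lambda: True,
    reconstruct_pyramid=lambda: ["rebuilt"],
    topology_diameter=lambda: 2,
)
for _ in range(3):
    p.on_clock_tick()
```

With these stubs the processor turns faulty on the first tick, rebuilds its pyramid on the second, and returns to operating on the third, when RoundCounter equals the stubbed diameter of 2.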

Undetected Faults

What happens when the faults are not detected by the error-detection code?

Transient fault detectors and watchdog counters are used in this situation.

When an error is detected by the transient fault detector, the faulty processor starts counting while letting the repair scheme try to fix the problem.

Undetected Faults (2)

When the counter reaches its upper bound, the system is examined again.

If the repair has failed, a reset of the system is triggered.
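The watchdog behavior can be sketched as a small counter object. The class name, its methods, and the system_is_consistent callback (standing in for the transient fault detector's re-examination) are hypothetical; the slides do not say how the upper bound is chosen.

```python
class Watchdog:
    """Counts time units after a detected error and decides whether to reset."""

    def __init__(self, upper_bound):
        self.upper_bound = upper_bound
        self.counter = None            # None: no repair is currently being timed

    def start(self):
        """Called when the transient fault detector reports an error."""
        self.counter = 0

    def tick(self, system_is_consistent):
        """Advance one time unit; return True iff a reset should be triggered."""
        if self.counter is None:
            return False
        self.counter += 1
        if self.counter < self.upper_bound:
            return False               # the repair scheme is still given time
        self.counter = None            # upper bound reached: examine again
        return not system_is_consistent()


watchdog = Watchdog(upper_bound=3)
watchdog.start()
decisions = [watchdog.tick(lambda: False) for _ in range(3)]
```

In this run the repair never succeeds, so the reset decision is raised only on the third tick, when the counter reaches its upper bound.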

