
ECE 534 1

Fault-Tolerant Design of Digital Systems

EGE 534

Information Redundancy

Dr. Baback Izadi
Department of Electrical and Computer Engineering

State University of New York – New [email protected]

Outline

Error detecting and error correcting codes
Codes for storage and communication
Codes for arithmetic operations
Codes for control units
Example: Applying the detection techniques used so far
  A failure resilient node/network controller

Basic Concepts

Code
  A means of representing information using a well-defined set of rules
Codeword
  A collection of symbols used to represent a data item based on a specific code
  Example: BCD code using 0's and 1's
Valid codeword vs. invalid codeword
  A codeword is valid if it adheres to all the rules that define the code.
Encoding process
  Determining the codeword that corresponds to a data item
  Example: data item = 7 → 0111
Decoding process
  Recovering the original data from the codeword
Separable code: the original information is appended with the check bits to form the code
Non-separable code

Basic Concept of Error Detecting/Correcting

Error detecting code
  A specific representation that allows errors introduced into the code to be detected.
  n bits → 2^n combinations
  Is BCD an error detecting code?
Error correcting code
Error detection initiates reconfiguration
Error correction provides fault masking

[Figure: an error maps a valid codeword to an invalid codeword]

Fault Detection through Encoding

At the logic level, codes provide a means of masking or detecting errors
Formally, a code is a subset S of the universe U of possible vectors
A noncode word is a vector in the set U - S

Example (S = even parity, U = 2^8 vectors):
  X1 = <10010011> is a codeword; due to a multiple-bit error it becomes X3 = <10011100>, which is also in S, so the error is not detectable
  X2 is a codeword; it becomes X4, a noncode word, so the error is detectable

[Figure: set S inside universe U, with X1, X2, X3 inside S and X4 outside S]

Hamming Distance

The Hamming weight of a vector x (e.g., a codeword), w(x), is the number of nonzero elements of x.
The Hamming distance between two vectors x and y, d(x, y), is the number of bits in which they differ.
The distance of a code is the minimum of the Hamming distances between all pairs of code words.

Example: x = (1011), y = (0110)
  w(x) = 3, w(y) = 2, d(x, y) = 3
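The three definitions above can be checked with a few lines of Python. This is only a sketch: the function names and the small even-parity code used for the code-distance check are illustrative choices, not taken from the lecture.

```python
from itertools import combinations

def weight(x: str) -> int:
    """Hamming weight: number of nonzero elements of the bit string x."""
    return x.count("1")

def distance(x: str, y: str) -> int:
    """Hamming distance: number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def code_distance(codewords) -> int:
    """Distance of a code: minimum distance over all pairs of code words."""
    return min(distance(x, y) for x, y in combinations(codewords, 2))

# The example from the slide.
print(weight("1011"), weight("0110"), distance("1011", "0110"))  # 3 2 3

# Illustrative code: all 3-bit words with even parity.
even_parity_3bit = ["000", "011", "101", "110"]
print(code_distance(even_parity_3bit))  # 2
```

A code distance of 2 for the even-parity words is what allows a single-bit error to be detected but not corrected.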

Distance Properties

A code with a distance of 2 can detect a single fault.
A code with a distance of 3 can correct a single fault or detect a double fault.
To detect all error patterns of Hamming distance ≤ d, the code distance must be ≥ d + 1
To correct all error patterns of Hamming distance ≤ c, the code distance must be ≥ 2c + 1
To correct c bits and detect d additional bits, the code distance must be ≥ 2c + d + 1

Basic Code Operations

Consider n-bit vectors, a space of 2^n vectors
A subset of the 2^n vectors are code words
The subset is called an (n, k) code, where the fraction k/n is called the rate of the code
The addition operation on vectors is bitwise exclusive-OR
  X + Y = < x1 ⊕ y1, x2 ⊕ y2, …, xn ⊕ yn >
The multiplication operation is bitwise AND
  cX = < cx1, cx2, …, cxn >

Parity Codes

Separable code

Example: odd and even parity codes for BCD data (the parity bit is the rightmost bit)

Decimal digit   BCD    BCD odd parity   BCD even parity
0               0000   0000 1           0000 0
1               0001   0001 0           0001 1
2               0010   0010 0           0010 1
3               0011   0011 1           0011 0
4               0100   0100 0           0100 1
5               0101   0101 1           0101 0
6               0110   0110 1           0110 0
7               0111   0111 0           0111 1
8               1000   1000 0           1000 1
9               1001   1001 1           1001 0

[Figure: Organization of a memory that uses single-bit parity. The parity bit is generated when data is written to memory and checked when data is read.]

[Figure: XOR tree for parity generation; 4-bit parity generation and checking, with data bits d0 to d3, generated parity bit P, and an error signal]
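The generate-on-write, check-on-read scheme can be sketched in Python as follows. The helper names (`parity_bit`, `error_signal`, `bcd`) are hypothetical; the XOR reduction plays the role of the XOR tree in the figure.

```python
from functools import reduce
from operator import xor

def parity_bit(data_bits, odd=False):
    """XOR tree over the data bits; the result is inverted for odd parity."""
    p = reduce(xor, data_bits, 0)
    return p ^ 1 if odd else p

def error_signal(data_bits, stored_parity, odd=False):
    """On a read, regenerate the parity and compare it with the stored bit."""
    return parity_bit(data_bits, odd) != stored_parity

def bcd(digit):
    """4-bit BCD pattern of a decimal digit, most significant bit first."""
    return [(digit >> i) & 1 for i in (3, 2, 1, 0)]

# Reproduce the parity columns of the BCD table.
for digit in range(10):
    bits = bcd(digit)
    print(digit, bits, parity_bit(bits, odd=True), parity_bit(bits))
```

Flipping any single stored bit (data or parity) changes the regenerated comparison, so `error_signal` returns `True`; flipping two bits restores the parity and goes unnoticed.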

Variation of Parity Codes for RAM

[Figure: layouts of a 16-bit word (bits 15 to 0) under five parity organizations]
  Bit-per-word parity: bits 15 to 0 plus one parity bit P (odd or even)
  Bit-per-byte parity: bits 7 to 0 with P1 and bits 15 to 8 with P2 (one group odd, the other even)
  Bit-per-multiple-chips parity: bits 15-12, 11-8, 7-4, 3-0 and parity bits P4 P3 P2 P1 spread over Chips 1 to 5
  Bit-per-chip parity: bits 15-12, 11-8, 7-4, 3-0 and parity bits P4 P3 P2 P1 spread over Chips 1 to 5
  Interlaced parity: bits 15 to 0 plus parity bits P4 P3 P2 P1

Overlapping Parity

Parity groups are formed with each bit appearing in more than one parity group
Errors can be detected and located
The erroneous bit can be corrected by a simple complementation

Code word: 3 2 1 0 P2 P1 P0 (four information bits, three parity bits)

Bit in error   Parity in error
3              P2 P1 P0
2              P2 P1
1              P2 P0
0              P1 P0
P2             P2
P1             P1
P0             P0

Parity Codes for Memory - Comparison

Bit-per-word: one parity bit per data word
  Advantage: detects all single-bit errors
  Disadvantage: certain errors go undetected, e.g., a word, including the parity bit, becomes all 1s due to a failure of a bus or a set of data buffers

Bit-per-byte: each data portion (e.g., a byte) is protected by a separate parity bit; the parity of one group should be even and the parity of the other group should be odd
  Advantage: detects all-1s and all-0s conditions
  Disadvantage: ineffective for multiple errors, e.g., a whole-chip failure

Bit-per-multiple-chips: one bit from each chip is associated with a single parity bit
  Advantage: detects failure of an entire chip
  Disadvantage: cannot locate the failure of a complete chip

Bit-per-chip: each parity bit is associated with one chip of the memory
  Advantage: detects single-bit errors and identifies the chip with the erroneous bit
  Disadvantage: susceptible to whole-chip failure, i.e., a single chip error can corrupt multiple bits, and this may go undetected

Interlaced: similar to bit-per-multiple-chips; must ensure that no two adjacent bits are from the same parity group
  Advantage: detects errors in adjacent bits
  Disadvantage: parity groups are not based on the physical organization of the memory

Overlapping
  Advantage: errors can be detected and corrected
  Disadvantage: multiple parity bits are needed

Parity Prediction in Arithmetic Circuits

Binary adder
Two inputs: A = (a_{n-1} … a_0 a_c) and B = (b_{n-1} … b_0 b_c)
Two operands to be added: (a_{n-1} … a_0) and (b_{n-1} … b_0)
a_c and b_c are the check bits of A and B, respectively
The encoded output is s = (s_{n-1} … s_0 s_c), where (s_{n-1} … s_0) is determined by the ordinary binary addition of (a_{n-1} … a_0) to (b_{n-1} … b_0), and s_c is the check bit for (s_{n-1} … s_0)
Then (all sums are modulo 2, and c_i is the carry into bit i)

$$s_c = \bigoplus_{i=0}^{n-1} s_i = \bigoplus_{i=0}^{n-1} a_i \oplus \bigoplus_{i=0}^{n-1} b_i \oplus \bigoplus_{i=0}^{n-1} c_i$$

Reduces to

$$s_c = a_c \oplus b_c \oplus \bigoplus_{i=0}^{n-1} c_i$$

A four bit example

A = a3 a2 a1 a0    ac = a3 ⊕ a2 ⊕ a1 ⊕ a0
B = b3 b2 b1 b0    bc = b3 ⊕ b2 ⊕ b1 ⊕ b0
S = s3 s2 s1 s0    sc = s3 ⊕ s2 ⊕ s1 ⊕ s0

s0 = a0 ⊕ b0 ⊕ c0
s1 = a1 ⊕ b1 ⊕ c1
s2 = a2 ⊕ b2 ⊕ c2
s3 = a3 ⊕ b3 ⊕ c3

sc = (a0 ⊕ b0 ⊕ c0) ⊕ (a1 ⊕ b1 ⊕ c1) ⊕ (a2 ⊕ b2 ⊕ c2) ⊕ (a3 ⊕ b3 ⊕ c3)
sc = (a3 ⊕ a2 ⊕ a1 ⊕ a0) ⊕ (b3 ⊕ b2 ⊕ b1 ⊕ b0) ⊕ (c3 ⊕ c2 ⊕ c1 ⊕ c0)
sc = ac ⊕ bc ⊕ (c3 ⊕ c2 ⊕ c1 ⊕ c0)

In general:

$$s_c = a_c \oplus b_c \oplus \bigoplus_{i=0}^{n-1} c_i$$
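The prediction formula can be checked exhaustively for the four-bit case. The sketch below models a ripple-carry adder with carry-in c0 = 0 (an assumption; the slides do not fix c0), and the helper names are illustrative.

```python
def parity(x: int) -> int:
    """Modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1

def carries_in(a: int, b: int, n: int, c0: int = 0) -> int:
    """Word whose bit i is c_i, the carry into position i of a ripple-carry add."""
    carries, c = 0, c0
    for i in range(n):
        carries |= c << i
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c = (ai & bi) | (ai & c) | (bi & c)   # carry out of position i
    return carries

def predicted_check_bit(a: int, b: int, n: int) -> int:
    """s_c = a_c XOR b_c XOR (c_{n-1} XOR ... XOR c_0)."""
    return parity(a) ^ parity(b) ^ parity(carries_in(a, b, n))

n = 4
mask = (1 << n) - 1
all_match = all(
    predicted_check_bit(a, b, n) == parity((a + b) & mask)
    for a in range(1 << n) for b in range(1 << n)
)
print(all_match)  # True
```

The comparison on the right-hand side is the check bit recomputed from the sum bits, which is what the error-detection circuit compares against the predicted value.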

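Parity Prediction in Binary Adder

[Figure: binary adder with inputs a_{n-1} … a_0, b_{n-1} … b_0 and check bits a_c, b_c. The predicted check bit s_c = a_c ⊕ b_c ⊕ (c_{n-1} ⊕ … ⊕ c_0) is compared with the check bit recomputed from the sum outputs, s_c' = s_{n-1} ⊕ … ⊕ s_0, to produce the Error signal.]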

Parity-Checked Binary Adder

[Figure (a): adder built from Sum and Carry blocks for bit positions 0 to n-1, with carry signals c_i and c_i'; the sum bits s_{n-1} … s_0, together with a_c and b_c, feed the parity check that produces the Error signal.]

Parity-Checked Binary Adder (cont.)

[Figure (b): the parity-checked adder with the carries produced by a carry look-ahead circuit; the sum bits s_{n-1} … s_0, together with a_c and b_c, feed the parity check that produces the Error signal.]

Binary Multiplier

                      a3   a2   a1   a0
                    × b3   b2   b1   b0
                    -------------------
                    a3b0 a2b0 a1b0 a0b0
               a3b1 a2b1 a1b1 a0b1
          a3b2 a2b2 a1b2 a0b2
     a3b3 a2b3 a1b3 a0b3
---------------------------------------
p7   p6   p5   p4   p3   p2   p1   p0

p0 = a0b0
p1 = a0b1 ⊕ a1b0
p2 = a0b2 ⊕ a1b1 ⊕ a2b0 ⊕ c1,1
p3 = a0b3 ⊕ a1b2 ⊕ a2b1 ⊕ a3b0 ⊕ c2,1 ⊕ c1,2
p4 = a1b3 ⊕ a2b2 ⊕ a3b1 ⊕ c3,1 ⊕ c2,2 ⊕ c1,3
p5 = a2b3 ⊕ a3b2 ⊕ c3,2 ⊕ c2,3 ⊕ c1,4
p6 = a3b3 ⊕ c3,3 ⊕ c2,4
p7 = c3,4

Therefore, denoting the check bit for (p7 … p0) by pc (all sums are modulo 2):

$$p_c = \bigoplus_{i=0}^{7} p_i = \Big(\bigoplus_{i=0}^{3} a_i\Big)\Big(\bigoplus_{j=0}^{3} b_j\Big) \oplus \bigoplus_{i=1}^{3}\bigoplus_{j=1}^{4} c_{i,j} = a_c b_c \oplus \bigoplus_{i=1}^{3}\bigoplus_{j=1}^{4} c_{i,j}$$

Example:
          1 0 1 1
        × 0 1 1 0
        ---------
          0 0 0 0
        1 0 1 1
      1 0 1 1
    0 0 0 0
  ---------------
  p7 p6 p5 p4 p3 p2 p1 p0

Cases for A's and B's (sums are modulo 2)

A           B           Σ ai   Σ bj   Number of aibj = 1   Σ pj
Even # 1s   Even # 1s   0      0      Even                 0
Even # 1s   Odd # 1s    0      1      Even                 0
Odd # 1s    Odd # 1s    1      1      Odd                  1
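The step from the partial products to the term a_c b_c relies on the identity that the modulo-2 sum of all a_i b_j equals (⊕ a_i)(⊕ b_j). A brute-force check for 4-bit operands is sketched below; the function names are illustrative, and the carry terms c_{i,j} of the adder array are deliberately not modelled here.

```python
from itertools import product

def parity(x: int) -> int:
    """Modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1

def partial_product_parity(a: int, b: int, n: int = 4) -> int:
    """Modulo-2 sum of all n*n partial products a_i b_j."""
    p = 0
    for i, j in product(range(n), repeat=2):
        p ^= ((a >> i) & 1) & ((b >> j) & 1)
    return p

identity_holds = all(
    partial_product_parity(a, b) == (parity(a) & parity(b))
    for a in range(16) for b in range(16)
)
print(identity_holds)                            # True
print(partial_product_parity(0b1011, 0b0110))    # 0: A has odd parity, B even
```

The second line corresponds to the slide's example operands 1011 and 0110, which fall into the "even number of a_i b_j = 1" case.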

Multiplier Using Array of Full-Half Adders

[Figure: 4 × 4 array multiplier. Each row combines a3 a2 a1 a0 with one of b0, b1, b2, b3; the rows are added with half adders (HA) and full adders (FA), whose carries are labeled C1,1 … C3,4; the outputs are p7 … p0. HA: half adder, FA: full adder, S: sum, C: carry.]

Error Correction with Overlapped Parity

[Figure: error-correction circuit for the code word 3 2 1 0 P2 P1 P0. Three parity generators regenerate Pr0 from bits (1, 3, 0), Pr1 from bits (2, 0, 3), and Pr2 from bits (1, 2, 3); each is compared with the stored P0, P1, P2. A 3-to-8 decoder turns the comparison results into the signals No Error, Correct Bit 0 to Correct Bit 3, and Correct Bit P0 to Correct Bit P2, which complement the selected bit to produce the corrected bits.]

Hamming Error-Correcting Code

Requires from 10% to 40% redundancy
Overlapping parity
The Hamming single-error correcting code uses c parity check bits to protect n bits of information:
  2^c ≥ c + n + 1
Example:
  For four information bits (d3, d2, d1, d0), three parity bits (p2, p1, p0) are needed
  The bits are partitioned into groups as (d3, d1, d0, p0), (d3, d2, d0, p1) and (d3, d2, d1, p2)
  The grouping of bits can be determined from a list of binary numbers from 0 to 2^c - 1
  Each check bit is specified to set the parity, either even or odd, of its respective group

Bit position   7    6    5    4    3    2    1
Bit            d3   d2   d1   p2   d0   p1   p0

Hamming Error-Correcting Code

Bit position   7    6    5    4    3    2    1
Bit            d3   d2   d1   p2   d0   p1   p0

Parity bits calculation
  p2 = XOR of bits (5, 6, 7) = d1 ⊕ d2 ⊕ d3
  p1 = XOR of bits (3, 6, 7) = d0 ⊕ d2 ⊕ d3
  p0 = XOR of bits (3, 5, 7) = d0 ⊕ d1 ⊕ d3

Parity checking
  c2 = XOR of bits (4, 5, 6, 7) = p2 ⊕ d1 ⊕ d2 ⊕ d3
  c1 = XOR of bits (2, 3, 6, 7) = p1 ⊕ d0 ⊕ d2 ⊕ d3
  c0 = XOR of bits (1, 3, 5, 7) = p0 ⊕ d0 ⊕ d1 ⊕ d3

c2 c1 c0   Bit in error
0  0  0    No error
0  0  1    p0
0  1  0    p1
0  1  1    d0
1  0  0    p2
1  0  1    d1
1  1  0    d2
1  1  1    d3

Example: d3d2d1d0 = 0101 → p2p1p0 = 101
Hence, code = d3 d2 d1 p2 d0 p1 p0 = 0101101
Assume a bit error in bit d0 → 0101001
  c2 = p2 ⊕ d1 ⊕ d2 ⊕ d3 = 0
  c1 = p1 ⊕ d0 ⊕ d2 ⊕ d3 = 1
  c0 = p0 ⊕ d0 ⊕ d1 ⊕ d3 = 1
Hence c2c1c0 = 011: bit 3 (d0) is in error and needs to be inverted.

Observe that the check bits identify the position of the bit in error
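The encoding, checking, and correction steps for this (7,4) layout can be sketched in Python. Even parity is assumed, as in the worked example, and the function names are illustrative rather than part of the lecture.

```python
def encode(d3, d2, d1, d0):
    """Return the code word as [d3, d2, d1, p2, d0, p1, p0] (positions 7 down to 1)."""
    p2 = d1 ^ d2 ^ d3          # XOR of bits (5, 6, 7)
    p1 = d0 ^ d2 ^ d3          # XOR of bits (3, 6, 7)
    p0 = d0 ^ d1 ^ d3          # XOR of bits (3, 5, 7)
    return [d3, d2, d1, p2, d0, p1, p0]

def syndrome(word):
    """Return c2c1c0 as an integer: 0 means no error, otherwise the bit position."""
    d3, d2, d1, p2, d0, p1, p0 = word
    c2 = p2 ^ d1 ^ d2 ^ d3     # XOR of bits (4, 5, 6, 7)
    c1 = p1 ^ d0 ^ d2 ^ d3     # XOR of bits (2, 3, 6, 7)
    c0 = p0 ^ d0 ^ d1 ^ d3     # XOR of bits (1, 3, 5, 7)
    return (c2 << 2) | (c1 << 1) | c0

def correct(word):
    """Complement the bit the syndrome points at (position 7 is list index 0)."""
    fixed = list(word)
    pos = syndrome(fixed)
    if pos:
        fixed[7 - pos] ^= 1
    return fixed

code = encode(0, 1, 0, 1)
print(code)                      # [0, 1, 0, 1, 1, 0, 1]
received = [0, 1, 0, 1, 0, 0, 1] # d0 flipped
print(syndrome(received))        # 3
print(correct(received) == code) # True
```

Because every position from 1 to 7 has a distinct nonzero syndrome, any single-bit error in any code word is corrected this way.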

Check Bits and Syndromes for Single-Bit Errors

The original data is encoded by generating a set, Cg, of parity bits.
To check correctness, the encoding process is repeated and a set, Cc, of parity bits is generated.
If Cg and Cc agree, the information is correct.
If Cg and Cc disagree, the information is incorrect and must be corrected.
To aid the correction, a syndrome is defined:
  The syndrome is a binary word that has a 1 in each bit position in which Cg and Cc disagree; the syndrome points directly to the erroneous bit.

Erroneous bit   Check bits affected   Syndrome
d0              p0, p1                011
d1              p0, p2                101
d2              p1, p2                110
d3              p0, p1, p2            111
p0              p0                    001
p1              p1                    010
p2              p2                    100

Bit position   7    6    5    4    3    2    1
Bit            d3   d2   d1   p2   d0   p1   p0

Hamming Single-Error Correction Unit

[Figure: Hamming single-error correction unit for four information bits and three check bits. Three parity generators recompute p3 from (d3, d2, d1), p2 from (d3, d2, d0), and p1 from (d3, d1, d0); comparing them with the stored check bits gives the syndrome C3 C2 C1, which drives a controlled complementation unit. The syndrome determines which bit (if any) is complemented to produce the corrected bits. In this figure the check bits are numbered p3, p2, p1, so the code word is d3 d2 d1 p3 d0 p2 p1 in bit positions 7 to 1.]

Double-Bit Errors

Say d1 and d0 become faulty: the syndrome is 101 ⊕ 011 = 110, so d2 is incorrectly inverted.
Double errors can't be corrected; can they be detected?
Adding an extra parity bit p to check the data and the overlapping parity bits:
  d_{n-1} … d1 d0  p_{k-1} … p0  p

Erroneous bit   Check bits affected   Syndrome
d0              p0, p1                011
d1              p0, p2                101
d2              p1, p2                110
d3              p0, p1, p2            111
p0              p0                    001
p1              p1                    010
p2              p2                    100

Bit position   7    6    5    4    3    2    1
Bit            d3   d2   d1   p2   d0   p1   p0

                    Overlapping parity bits   Extra parity bit
Single-bit error    Detects error             Detects error
Double-bit error    Detects error             Does not detect error

Modified Hamming Code (SEC-DED)

Check bits computation
  P2 = XOR (5, 6, 7)
  P1 = XOR (3, 6, 7)
  P0 = XOR (3, 5, 7)
  P3 = parity over the first 7 bits of the code word, i.e., d3 d2 d1 p2 d0 p1 p0

Syndromes computation
  C2 = XOR (4, 5, 6, 7)
  C1 = XOR (2, 3, 6, 7)
  C0 = XOR (1, 3, 5, 7)
  C3 = parity over all 8 bits of the code word

H matrix for Single Error Correction and Double Error Detection

Bit position   1    2    3    4    5    6    7    8
Bit            p0   p1   d0   p2   d1   d2   d3   p3
c2             0    0    0    1    1    1    1    0
c1             0    1    1    0    0    1    1    0
c0             1    0    1    0    1    0    1    0
c3             1    1    1    1    1    1    1    1

c2 c1 c0 c3    Meaning
0  0  0  0     No errors
x2 x1 x0 1     Single error in position (x2 x1 x0)
y2 y1 y0 0     Double error (y2 y1 y0 ≠ 000)
0  0  0  1     Error in bit p3

Single Error Correction and Double Error Detection Hamming Code (SEC-DED) Example

Initial data: d3 d2 d1 d0 = 0 1 1 0

Check bits computation
  P2 = XOR (5, 6, 7)
  P1 = XOR (3, 6, 7)
  P0 = XOR (3, 5, 7)

Syndromes computation
  C2 = XOR (4, 5, 6, 7)
  C1 = XOR (2, 3, 6, 7)
  C0 = XOR (1, 3, 5, 7)
  C3 = parity over all 8 bits of the code word

Failure scenarios

Bit position   1    2    3    4    5    6    7    8
Bit            p0   p1   d0   p2   d1   d2   d3   p3
Scenario 1     1    1    0    0    1    1    0    0    No errors
Scenario 2     1    1    1    0    1    1    0    0    Single error in position 3
Scenario 3     1    1    0    0    1    0    0    0    Single error in position 6
Scenario 4     0    1    0    0    0    1    0    0    Double error
Scenario 5     1    1    0    0    1    1    0    1    Error in bit p3

Corresponding syndromes

Scenario   c0   c1   c2   c3
1          0    0    0    0
2          1    1    0    1
3          0    1    1    1
4          0    0    1    0
5          0    0    0    1
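The syndrome rules can be applied to the five scenarios with a short Python sketch. The word is stored with position 1 (p0) first, matching the table; `syndromes` and `classify` are hypothetical helper names, and even parity is assumed.

```python
def syndromes(word):
    """word[0] is bit position 1 (p0), word[7] is bit position 8 (p3)."""
    b = [0] + list(word)                 # shift to 1-based positions
    c2 = b[4] ^ b[5] ^ b[6] ^ b[7]
    c1 = b[2] ^ b[3] ^ b[6] ^ b[7]
    c0 = b[1] ^ b[3] ^ b[5] ^ b[7]
    c3 = 0
    for bit in word:                     # parity over all 8 bits
        c3 ^= bit
    return c2, c1, c0, c3

def classify(word):
    c2, c1, c0, c3 = syndromes(word)
    pos = (c2 << 2) | (c1 << 1) | c0
    if pos == 0 and c3 == 0:
        return "no errors"
    if c3 == 1:
        return "error in bit p3" if pos == 0 else f"single error in position {pos}"
    return "double error"

scenarios = [
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 1],
]
for word in scenarios:
    print(word, classify(word))
```

The overall parity c3 separates the odd-weight error patterns (correctable single errors) from the even-weight ones (double errors that are only flagged).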

Checksum Codes - Basic Concepts

The checksum is appended to a block of data when such blocks are transferred

[Figure: the original words d1 … dn and the checksum on the original data are transferred; the receiver computes a checksum on the received words r1 … rn and compares it with the received version of the checksum.]

di = original word of data
ri = received word of data

Single Precision Checksums

A single-precision checksum is formed by adding the data words and ignoring any overflow

Example (carry is ignored):
      0 1 1 1
      0 0 0 1
      0 1 1 0
    + 1 0 0 0
    ---------
  (1) 0 1 1 0    checksum

The single-precision checksum is unable to detect certain types of errors. In the example below, line d3 is faulty (always "1"). The received checksum and the checksum of the received data are equal, so no error is detected.

Transmit                        Receive
d3 d2 d1 d0                     d3 d2 d1 d0
0  1  1  1                      1  1  1  1
0  0  0  1                      1  0  0  1
0  1  1  0                      1  1  1  0
0  0  0  0                      1  0  0  0
1  1  1  0   (checksum)         1  1  1  0   (received checksum)
                                1  1  1  0   (checksum of received data)
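Both examples on this slide can be reproduced with a small Python sketch. The stuck-at-1 fault on line d3 is modelled by OR-ing every transferred word, including the checksum, with 1000; that fault model and the names used are assumptions made for illustration.

```python
def single_precision_checksum(words, n=4):
    """Add the n-bit words and ignore any overflow out of n bits."""
    return sum(words) & ((1 << n) - 1)

# First example: the carry out of the 4-bit sum is dropped.
print(format(single_precision_checksum([0b0111, 0b0001, 0b0110, 0b1000]), "04b"))  # 0110

# Second example: line d3 stuck at 1 during the transfer.
STUCK_D3 = 0b1000
original = [0b0111, 0b0001, 0b0110, 0b0000]
sent_checksum = single_precision_checksum(original)
received = [w | STUCK_D3 for w in original]
received_checksum = sent_checksum | STUCK_D3
recomputed = single_precision_checksum(received)

print(format(received_checksum, "04b"), format(recomputed, "04b"))  # 1110 1110
print(recomputed == received_checksum)  # True: the fault goes undetected
```

The fault adds 8 to each of the four words, 32 in total, which is a multiple of 16 and therefore vanishes when the overflow is discarded.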

Double Precision Checksums

Compute a 2n-bit checksum for a block of n-bit words
Overflow is still a concern, but it is now overflow from a 2n-bit sum

Example (every received word has d3 = 1):

Transmit                            Receive
d3 d2 d1 d0                         d3 d2 d1 d0
0  1  1  1                          1  1  1  1
0  0  0  1                          1  0  0  1
0  1  1  0                          1  1  1  0
0  0  0  0                          1  0  0  0
0 0 0 0 1 1 1 0   (checksum)        1 0 0 0 1 1 1 0   (received checksum)
                                    0 0 1 0 1 1 1 0   (checksum of received data)

The received checksum and the checksum of the received data are not equal, so the error is detected

Honeywell Checksums

Concatenate consecutive words to form double words, creating k/2 words of 2n bits; the checksum is formed over the newly structured data

[Figure: Word 1 … Word n regrouped as (Word 1, Word 2), (Word 3, Word 4), …, (Word n - 1, Word n), followed by the checksum]

Example:

Transmit
  Original data (d3 d2 d1 d0):   0111, 0001, 0110, 0000
  Double words:                  0001 0111 and 0000 0110
  Checksum of original data:     0001 1101

Receive
  Received data (d3 d2 d1 d0):   1111, 1001, 1110, 1000
  Double words:                  1001 1111 and 1000 1110
  Checksum of received data:     0010 1101
  Received checksum:             1001 1101

The received checksum and the checksum of the received data differ, so the error is detected
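A sketch of the Honeywell scheme for the slide's data is given below. Two details are assumptions read off the example rather than stated rules: the second word of each pair forms the upper half of the double word (0111 and 0001 give 0001 0111), and the faulty d3 line forces bit 3 of every transferred 4-bit word to 1.

```python
def honeywell_checksum(words, n=4):
    """Pair consecutive n-bit words into 2n-bit words and add them modulo 2^(2n)."""
    doubles = [(words[i + 1] << n) | words[i] for i in range(0, len(words), 2)]
    return sum(doubles) & ((1 << (2 * n)) - 1)

STUCK_D3 = 0b1000
original = [0b0111, 0b0001, 0b0110, 0b0000]
received = [w | STUCK_D3 for w in original]

sent = honeywell_checksum(original)
# The 8-bit checksum travels as two 4-bit words, so d3 is forced in both halves.
received_checksum = sent | (STUCK_D3 << 4) | STUCK_D3
recomputed = honeywell_checksum(received)

print(format(sent, "08b"))               # 00011101
print(format(recomputed, "08b"))         # 00101101
print(format(received_checksum, "08b"))  # 10011101
print(recomputed != received_checksum)   # True: the fault is detected
```

Because the stuck line now lands in two different bit positions of each double word, its effect no longer cancels in the sum.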

Residue Checksums

The same concept as the single-precision checksum, except that the carry bit is not ignored and is added to the checksum in an end-around carry fashion

[Figure: Word 1 … Word n are summed with end-around carry addition; the carry from each addition is fed back into the sum of data to form the checksum]

Example:

Transmit
  Original data (d3 d2 d1 d0):   0111, 0001, 0110, 0000
  Checksum of original data:     1110

Receive
  Received data (d3 d2 d1 d0):   1111, 1001, 1110, 1000
  Three carries are generated during end-around carry addition
  Checksum of received data:     0001
  Received checksum:             1110
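End-around carry addition can be sketched as below; the function also counts the carries so the result can be compared with the "three carries" noted in the example. The function name and return shape are illustrative.

```python
def residue_checksum(words, n=4):
    """Add n-bit words; each carry out of bit n-1 is added back into the sum."""
    mask = (1 << n) - 1
    total, carries = 0, 0
    for w in words:
        total += w
        if total > mask:                 # carry generated by this addition
            total = (total & mask) + 1   # end-around carry
            carries += 1
    return total, carries

original = [0b0111, 0b0001, 0b0110, 0b0000]
received = [0b1111, 0b1001, 0b1110, 0b1000]

print(residue_checksum(original))   # (14, 0) -> checksum 1110, no carries
print(residue_checksum(received))   # (1, 3)  -> checksum 0001, three carries
```

The recomputed checksum 0001 differs from the received checksum 1110, so this fault, which the single-precision checksum missed, is detected.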


Codes for Storage and Communication: Cyclic Codes

Cyclic codes are parity check codes with the additional property that a cyclic shift of a codeword is also a codeword
  if (C_{n-1}, C_{n-2}, …, C_1, C_0) is a codeword, (C_{n-2}, C_{n-3}, …, C_0, C_{n-1}) is also a codeword
Cyclic codes are used in
  sequential storage devices, e.g., tapes, disks, and data links
  communication applications
An (n, k) cyclic code can detect single-bit errors, multiple adjacent bit errors affecting fewer than (n - k) bits, and burst transient errors
Cyclic codes require less hardware, in the form of linear feedback shift registers
  in comparison, parity check codes require complex encoding and decoding circuits using arrays of EX-OR gates, AND gates, etc.

Cyclic Code and Polynomials

Cyclic codes rely on the representation of data by a polynomial
If (C_{n-1}, C_{n-2}, …, C_1, C_0) is a codeword, its polynomial representation is
  C(x) = C_{n-1} x^{n-1} + C_{n-2} x^{n-2} + … + C_1 x + C_0
The (n, k) cyclic code is characterized by
  G(x): the generator polynomial of degree (n - k)
  D(x): the data polynomial of degree (k - 1)
    (d_{k-1}, d_{k-2}, …, d_1, d_0) → D(x) = d_{k-1} x^{k-1} + d_{k-2} x^{k-2} + … + d_1 x + d_0
    Similar idea to number representation in a base r:
    (d3 d2 d1 d0)_r = d3 × r^3 + d2 × r^2 + d1 × r + d0
  V(x): the code polynomial of degree (n - 1) for the code word V = (v_{n-1}, v_{n-2}, …, v_1, v_0)

Cyclic Code and Polynomials

Encoding process: V(x) = D(x) G(x), using modulo-2 arithmetic
The code word V(x) is a non-separable code
Example: for the (7,4) code, G(x) = x^3 + x + 1
  Given data word (1011) → D(x) = x^3 + x + 1
    V(x) = G(x) D(x) = (x^3 + x + 1)(x^3 + x + 1)
         = x^6 + x^4 + x^3 + x^4 + x^2 + x + x^3 + x + 1
         = x^6 + x^2 + 1
    Hence the code word is (1000101)
  Given data word (1111) → D(x) = x^3 + x^2 + x + 1
    V(x) = G(x) D(x) = (x^3 + x^2 + x + 1)(x^3 + x + 1)
         = x^6 + x^5 + x^3 + 1
    Hence the code word is (1101001)
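Modulo-2 polynomial multiplication maps naturally onto shifts and XORs when a polynomial is stored as an integer whose bit i is the coefficient of x^i. The sketch below uses that representation (an implementation choice, not something the slides prescribe) to reproduce both code words.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two modulo-2 polynomials stored as bit masks (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a    # modulo-2 addition of the shifted copy of a
        a <<= 1
        b >>= 1
    return result

G = 0b1011  # G(x) = x^3 + x + 1

def encode(data: int, n: int = 7) -> str:
    """V(x) = D(x) G(x), printed as (v_{n-1} ... v_0)."""
    return format(gf2_mul(data, G), f"0{n}b")

print(encode(0b1011))  # 1000101
print(encode(0b1111))  # 1101001
```

Terms that appear twice, such as the two x^4 terms in the first example, cancel automatically because XOR is addition modulo 2.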

Basic Operations on Polynomials

One polynomial can be multiplied or divided by another following modulo-2 arithmetic: coefficients are 1 or 0, and addition and subtraction are the same
G(x) is a polynomial of degree (n - k) for an (n, k) code, with a unity coefficient in the (n - k) term
G(x) is a factor of x^n - 1, i.e., it divides it with zero remainder
  if a polynomial of degree n - k divides x^n - 1, then G(x) generates a cyclic code
  for the (7,4) code, G(x) = x^3 + x + 1; one can verify that G(x) divides x^7 - 1

Multiplication
  (x^4 + x^3 + x^2 + 1)(x^3 + x) = x^7 + x^6 + x^5 + x^3
                                  + x^5 + x^4 + x^3 + x
                                  = x^7 + x^6 + x^4 + x

Division
  [Long-division layout lost in extraction. Recoverable values: divisor x^3 + x, quotient x^4 + x^3 + x^2 + 1, remainder x.]
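The divisibility claim for the (7,4) generator can be verified with modulo-2 long division. The sketch stores polynomials as integers (bit i is the coefficient of x^i); `gf2_divmod` is a hypothetical helper, and x^7 - 1 is written as x^7 + 1 because subtraction and addition coincide modulo 2.

```python
def gf2_divmod(num: int, den: int):
    """Modulo-2 polynomial long division; returns (quotient, remainder)."""
    quotient = 0
    den_len = den.bit_length()
    while num.bit_length() >= den_len:
        shift = num.bit_length() - den_len
        quotient ^= 1 << shift     # next quotient term x^shift
        num ^= den << shift        # subtract (XOR) the shifted divisor
    return quotient, num

G = 0b1011                          # x^3 + x + 1
q, r = gf2_divmod(0b10000001, G)    # x^7 + 1
print(format(q, "b"), r)            # 10111 0 -> quotient x^4 + x^2 + x + 1, remainder 0

# Undo the multiplication example: (x^7 + x^6 + x^4 + x) / (x^3 + x).
q2, r2 = gf2_divmod(0b11010010, 0b1010)
print(format(q2, "b"), r2)          # 11101 0 -> quotient x^4 + x^3 + x^2 + 1
```

A zero remainder in the first call is exactly the condition for G(x) to generate a cyclic code of length 7.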

Cyclic Code - Example

Generator polynomial G(x) = x^3 + x + 1 for the (7,4) code
Data polynomial D(x) = d3 x^3 + d2 x^2 + d1 x + d0
Code polynomial V(x) = v6 x^6 + v5 x^5 + v4 x^4 + v3 x^3 + v2 x^2 + v1 x + v0
Code distance is 3
SEC/DED code

Cyclic codes for 4-bit information words

Information (d0, d1, d2, d3)   Code (v0, v1, v2, v3, v4, v5, v6)
0000                           0000000
0001                           0001101
0010                           0011010
0011                           0010111
0100                           0110100
0101                           0111001
0110                           0101110
0111                           0100011
1000                           1101000
1001                           1100101
1010                           1110010
1011                           1111111
1100                           1011100
1101                           1010001
1110                           1000110
1111                           1001011

Circuit to Generate Cyclic Code

Consider the blocks labeled X as multipliers, and the addition elements as modulo-2 adders
Another representation replaces the multipliers by storage elements and the adders by EX-OR gates

g(x) = x^3 + x + 1
V(x) = [(x^2 + 1)x + 1] D(x)

[Figure: D(x) passes through three x blocks and two modulo-2 adders to produce V(x); the equivalent clocked circuit uses Reg 1, Reg 2, Reg 3 and EX-OR gates]

Generation of Code Words

Data polynomial = d0 + d1 x + d2 x^2 + d3 x^3
Generator polynomial = 1 + x + x^3
Code polynomial = v0 + v1 x + v2 x^2 + v3 x^3 + v4 x^4 + v5 x^5 + v6 x^6

The encoding process

Clock period   D(x)   Reg 1   Reg 2   Reg 3   V(x)
0              1      0       0       0       1
1              1      1       0       1       0
2              0      1       1       1       1
3              1      0       1       1       0
4              0      1       0       0       0
5              0      0       1       0       0
6              0      0       0       1       1
7                     0       0       0

[Figure: clocked encoder with input D(x), registers Reg 1, Reg 2, Reg 3, and output V(x)]
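The register trace of the encoding process can be reproduced by simulating the shift-register circuit clock by clock. The wiring below is inferred from V(x) = [(x^2 + 1)x + 1] D(x): D feeds Reg 1, Reg 1 feeds Reg 2, Reg 2 XOR D feeds Reg 3, and the output is Reg 3 XOR D. Treat that wiring and the function name as assumptions about the figure.

```python
def encode_serial(d_bits, clocks=7):
    """d_bits: data bits, d0 first. Returns (trace rows, output bits v0 first)."""
    r1 = r2 = r3 = 0
    stream = list(d_bits) + [0] * (clocks - len(d_bits))   # flush with zeros
    trace, v_bits = [], []
    for t, d in enumerate(stream):
        v = r3 ^ d                       # output adder
        trace.append((t, d, r1, r2, r3, v))
        v_bits.append(v)
        r1, r2, r3 = d, r1, r2 ^ d       # clock edge: shift, with the x^2 + 1 adder
    return trace, v_bits

trace, v_bits = encode_serial([1, 1, 0, 1])   # D(x) = 1 + x + x^3
for row in trace:
    print(*row)
print(v_bits)   # [1, 0, 1, 0, 0, 0, 1] -> V(x) = 1 + x^2 + x^6
```

The output read in reverse order, 1000101, is the code word obtained earlier by multiplying (x^3 + x + 1) by itself.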

Decoding of Cyclic Codes

Determine if the code word (r_{n-1}, r_{n-2}, …, r_1, r_0) is valid
Code polynomial r(x) = r_{n-1} x^{n-1} + r_{n-2} x^{n-2} + … + r_1 x + r_0
If r(x) is a valid code polynomial, it should be a multiple of the generator polynomial g(x)
  r(x) = d(x) g(x) + s(x), where s(x), the syndrome polynomial, should be zero
Hence, divide r(x) by g(x) and check whether the remainder is equal to 0

D(x) = V(x) / G(x)

[Figure: division circuit in which V(x) and the feedback B(x) are added modulo 2 to give D(x)]

Decoding of Cyclic Codes

Example: for the (7,4) code, G(x) = x^3 + x + 1
  D(x) = V(x) + B(x)
  V(x) = D(x) - B(x)
  Since addition and subtraction are the same in modulo 2,
  V(x) = D(x) + B(x)
  V(x) = D(x)(x^3 + x + 1)
  B(x) = D(x)(x^3 + x)

[Figure: V(x) and B(x) are added modulo 2 to give D(x); D(x) passes through three x blocks and a modulo-2 adder to form B(x)]

Circuits for Decoding

Another representation replaces the multipliers by storage elements and the adders by EX-OR gates
Note: once the division is completed, the registers contain the value of the syndrome (remainder)

[Figure: decoding circuit with input V(x), feedback B(x), output D(x), and clocked registers Reg 1, Reg 2, Reg 3]

Example Decoding

Generator polynomial, g(x) = x^3 + x + 1

The decoding process with correct information

Clock period   V(x)   B(x)   Reg 1   Reg 2   Reg 3   D(x)
0              1      0      0       0       0       1
1              0      1      0       0       1       1
2              1      1      0       1       1       0
3              0      1      1       1       0       1
4              0      0      1       0       1       0
5              0      0      0       1       0       0
6              1      1      1       0       0       0
7                            0       0       0

V(x) is the codeword. The first four D(x) values are the original information; the last three are the syndrome, which is zero.

The decoding process with erroneous information

Clock period   V(x)   B(x)   Reg 1   Reg 2   Reg 3   D(x)
0              1      0      0       0       0       1
1              0      1      0       0       1       1
2              1      1      0       1       1       0
3              1      1      1       1       0       0
4              0      1      1       0       0       1
5              0      1      0       0       1       1
6              1      1      0       1       1       0
7                            1       1       0

V(x) is the received word. The last three D(x) values form a nonzero syndrome.

[Figure: decoding circuit with input v(x), output d(x), and clocked registers Reg 1, Reg 2, Reg 3]
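Both decoding traces can be reproduced by simulating the feedback division D(x) = V(x) + B(x) with B(x) = D(x)(x^3 + x). In this sketch the registers form a delay line for D (Reg 3 holds D delayed by one clock, Reg 1 by three clocks), which matches the register columns of the tables; the exact wiring of the figure is an assumption.

```python
def decode_serial(v_bits):
    """v_bits: received bits, v0 first. Returns (D(x) bits, final (Reg 1, Reg 2, Reg 3))."""
    reg1 = reg2 = reg3 = 0
    d_bits = []
    for v in v_bits:
        b = reg3 ^ reg1            # B = x*D + x^3*D
        d = v ^ b                  # D = V + B (modulo 2)
        d_bits.append(d)
        reg1, reg2, reg3 = reg2, reg3, d   # clock edge: shift D into the delay line
    return d_bits, (reg1, reg2, reg3)

good = [1, 0, 1, 0, 0, 0, 1]       # valid code word, v0 first
bad = [1, 0, 1, 1, 0, 0, 1]        # same word with v3 flipped

print(decode_serial(good))   # ([1, 1, 0, 1, 0, 0, 0], (0, 0, 0))
print(decode_serial(bad))    # ([1, 1, 0, 0, 1, 1, 0], (1, 1, 0))
```

For the valid word the first four output bits are the original data 1, 1, 0, 1 (d0 first) and the registers end at zero; for the corrupted word the registers hold a nonzero syndrome.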

Arithmetic Codes

Useful to check arithmetic operations
Parity codes are not preserved under addition and subtraction
Arithmetic codes can be separate (check bits disjoint from data bits) or nonseparate (combined check and data)
Several arithmetic codes
  AN codes, residue codes, inverse residue codes, Residue Number Systems (RNS)
Arithmetic codes must be invariant to a set of arithmetic operations
  A(b * c) = A(b) * A(c)

AN Codes

Multiply each data word N by some constant A
The code is invariant to addition and subtraction, but not to multiplication and division
The constant A determines the number of extra bits required and the error detection capability provided
A ≠ 2^a
  If A = 2^a:
    N  = n_{j-1} n_{j-2} … n_2 n_1 n_0
    AN = n_{j-1} n_{j-2} … n_2 n_1 n_0 00…0 (a trailing zeros)
    AN is still divisible by A if any of the bits n_i is in error, so such errors go undetected

Properties of AN Codes

A valid AN code is the 3N code

N      3N
0000   000000
0001   000011
0010   000110
0011   001001

An operation performed under an AN code can be checked by determining if the result is evenly divisible by A
  6 + 1 = 7: 010010 + 000011 = 010101 (21 is divisible by 3)
  If an output line is s-a-1, e.g., the result becomes 010111:
  23 is not divisible by 3, and the error is detected
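The divisibility check can be sketched directly in Python for A = 3. The stuck-at-1 output line is modelled by OR-ing the sum with a one-bit mask, which is an illustrative fault model; the function names are hypothetical.

```python
A = 3

def an_encode(n: int) -> int:
    """AN code word of the data word n."""
    return A * n

def an_check(word: int) -> bool:
    """A result is a valid code word only if it is evenly divisible by A."""
    return word % A == 0

s = an_encode(6) + an_encode(1)     # 010010 + 000011
print(format(s, "06b"), an_check(s))            # 010101 True

faulty = s | 0b000010               # output line S1 stuck at 1
print(format(faulty, "06b"), an_check(faulty))  # 010111 False
```

Addition needs no decoding step: the sum of two code words, 3·6 + 3·1, is itself the code word 3·7.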

Example of 3N Code

Resulting 3N code words for 4-bit information words

Original information   3N code word
0000                   000000
0001                   000011
0010                   000110
0011                   001001
0100                   001100
0101                   001111
0110                   010010
0111                   010101
1000                   011000
1001                   011011
1010                   011110
1011                   100001
1100                   100100
1101                   100111
1110                   101010
1111                   101101

[Figure: adder taking 3N code A (a5 … a0) and 3N code B (b5 … b0) and producing the 3N code of the sum (S5 … S0)]

Normal operation
    A = 0 1 0 0 1 0   (3N code of 6)
  + B = 0 0 0 0 1 1   (3N code of 1)
    S = 0 1 0 1 0 1   (3N code of 7)

If S1 is always "1"
    A = 0 1 0 0 1 0   (3N code of 6)
  + B = 0 0 0 0 1 1   (3N code of 1)
    S = 0 1 0 1 1 1   (not a valid 3N code)

Illustration of the error detection capabilities of the 3N arithmetic code. The presence of the fault results in the sum being an invalid 3N code.

Generating AN Codes

Generating the 3N code: 3N = N + 2N, computed with an (n+1)-bit adder
  N  = n_{j-1} n_{j-2} … n_2 n_1 n_0
  2N = n_{j-1} n_{j-2} … n_2 n_1 n_0 0

How to check if a code is a 3N

[Figure: Karnaugh map with rows and columns ordered 00, 01, 11, 10 and cells numbered 0 to 15, with six cells marked 1]

Residue codes

Separable codes: D | R, where D is the data and R is the residue of the data
N = Qm + r, where r is the remainder or residue, Q is the quotient, and m is the modulus
  N = 14, m = 3, r = 2: 14 = 2 mod (3)
Example: residue of 4-bit data mod 3

Data   Residue   Codeword
0000   0         0000 00
0001   1         0001 01
0010   2         0010 10
0011   0         0011 00
0100   1         0100 01

Residue Codes

The residue of an integer n with respect to A is n mod (A)
Residue codes are invariant to addition
  ((x mod A) + (y mod A)) mod A = (x + y) mod A
Residues can be handled separately from the data during the addition process
If A is any integer that is not a power of 2, then the residue code detects any single-bit error in an arithmetic operation of ADD or SUB

[Figure: X and Y feed an adder that produces X + Y, and a mod-A residue generator forms the residue of the sum; the residues (X mod A) and (Y mod A) feed a modulo-A adder; an equality checker compares the two results and signals an error if they are not equal]
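The checking arrangement in the figure can be sketched as follows, with A = 3 and a single-bit fault injected into the main adder output through an XOR mask. The fault-injection parameter and the function name are illustrative assumptions.

```python
A = 3   # check modulus; any integer that is not a power of 2

def residue_checked_add(x: int, y: int, fault_mask: int = 0):
    """Return (adder output, error flag) for x + y with residue checking."""
    rx, ry = x % A, y % A
    total = (x + y) ^ fault_mask      # main adder, with an optional injected bit flip
    predicted = (rx + ry) % A         # modulo-A adder working on the residues
    actual = total % A                # residue generator on the adder output
    return total, predicted != actual

print(residue_checked_add(9, 5))                 # (14, False)
print(residue_checked_add(9, 5, fault_mask=4))   # (10, True)
```

A single flipped bit changes the sum by a power of 2, which is never a multiple of 3, so the two residues always disagree and the fault is flagged.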

Low Cost Residue Code

m = 2^b - 1 (b ≥ 2); the code word then has n + b bits
Encoding in a low-cost residue code is done by dividing the information into groups of b bits and adding them mod (2^b - 1)
Example: b = 2, m = 2^2 - 1 = 3
  Data = (167)_10 = (10100111)_2 → 10 10 01 11
  10 + 10 = 01 and 01 + 11 = 01 (mod 3), then 01 + 01 = 10
  167 mod 3 = 2
Low cost residue code provides easy encoding

Inverse residue code
  Instead of appending r, append m - r
  Better detection of repeated-use faults
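The group-and-add encoding works because 2^b leaves a remainder of 1 modulo 2^b - 1, so a carry out of a b-bit sum can simply be added back in. A sketch, with an illustrative function name:

```python
def low_cost_residue(value: int, b: int) -> int:
    """Residue of value modulo 2^b - 1, computed by adding b-bit groups."""
    m = (1 << b) - 1
    acc = 0
    while value:
        acc += value & m                  # next b-bit group
        value >>= b
        if acc > m:
            acc = (acc & m) + (acc >> b)  # end-around carry
    return 0 if acc == m else acc         # an all-ones pattern also represents 0

print(low_cost_residue(167, 2))   # 2
print(167 % 3)                    # 2
```

No division hardware is needed: the whole computation is a chain of b-bit additions with end-around carry.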

Residue Number System (RNS)

Represent a number by its residues with respect to a set of relatively prime moduli
P = [3, 4, 5]
  32_10 = (2 0 2)_RNS
  14_10 = (2 2 4)_RNS

    32_10        (2 0 2)_RNS
  + 14_10      + (2 2 4)_RNS
  -------      -------------
    46_10        (1 2 1)_RNS

Speed advantage: carry-free number system
Error detection capability
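The addition example can be reproduced in a few lines. Each residue digit is added independently modulo its own modulus, which is what makes the system carry-free; the helper names are illustrative.

```python
P = (3, 4, 5)   # relatively prime moduli from the slide

def to_rns(x: int):
    """Residues of x with respect to each modulus in P."""
    return tuple(x % m for m in P)

def rns_add(u, v):
    """Digit-wise addition; no carry passes between digits."""
    return tuple((a + b) % m for a, b, m in zip(u, v, P))

a, b = to_rns(32), to_rns(14)
print(a, b)                          # (2, 0, 2) (2, 2, 4)
print(rns_add(a, b), to_rns(46))     # (1, 2, 1) (1, 2, 1)
```

With these moduli every integer from 0 to 59 (3 × 4 × 5 - 1) has a distinct representation.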

Residue Number System (RNS)

Example: P = [3, 4]

Number   RNS    RNS (Binary)
0        0 0    00 00
1        1 1    01 01
2        2 2    10 10
3        0 3    00 11
4        1 0    01 00
5        2 1    10 01
6        0 2    00 10
7        1 3    01 11

Code distance is one

Example: P = [3, 4, 5]

Number   RNS      RNS (Binary)
0        0 0 0    00 00 000
1        1 1 1    01 01 001
2        2 2 2    10 10 010
3        0 3 3    00 11 011
4        1 0 4    01 00 100
5        2 1 0    10 01 000
6        0 2 1    00 10 001
7        1 3 2    01 11 010

Code distance is two
Tolerates one error

Redundant moduli provide a higher code distance and therefore provide error detection

Self-Checking Concept

Have systems with capabilities of self-checking
Duplication and coding schemes need to compare the outputs of two modules
What if the checker fails? It is a single point of failure scenario
Self-checking is the ability to automatically detect the existence of a fault without the need for any externally applied stimulus

[Figure: a circuit receives a codeword and outputs a valid codeword, or an invalid codeword if faulty]

Self-Checking Circuits

Self-testing circuit: for every fault from a prescribed set, there exists at least one valid input code word that will produce an invalid output code word when a single fault is present in the circuit
Fault secure circuit: any single fault from a prescribed set results in the circuit either producing the correct code word or producing a non-code word, for any valid input code word
Totally self-checking circuit (TSC)
  the circuit is both fault secure and self-testing
  all single faults are detectable by at least one valid code word input, and when a given input combination does not detect the fault, the output is the correct code word output

Self Checking Checker

Two outputs are needed to avoid a stuck-at fault situation
If the checker is fault free (valid output)
  Checker output = 01 or 10
If the checker is faulty
  Checker output = 00 or 11

[Figure: a circuit with coded input and coded output; the coded output feeds a checker whose two outputs f and g form the error indicator]

Two Rail Checker

(' denotes complement)
f = A ⊕ B = AB' + A'B          A = x0 = y0'
g = (A ⊕ B)' = AB + A'B'       B = x1 = y1'

f = x0y1 + x1y0
g = x0x1 + y1y0

[Figure: gate-level two-rail checker with inputs x0, y1, x1, y0 and outputs f, g]
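The behaviour of the two output equations can be tabulated exhaustively. In this sketch an input is treated as valid when each pair (xi, yi) is complementary, which is the two-rail convention assumed from the definitions A = x0 = y0' and B = x1 = y1'.

```python
from itertools import product

def two_rail_checker(x0, y0, x1, y1):
    """f = x0 y1 + x1 y0, g = x0 x1 + y1 y0."""
    f = (x0 & y1) | (x1 & y0)
    g = (x0 & x1) | (y1 & y0)
    return f, g

results = {}
for x0, y0, x1, y1 in product((0, 1), repeat=4):
    valid_input = (x0 != y0) and (x1 != y1)
    f, g = two_rail_checker(x0, y0, x1, y1)
    results[(x0, y0, x1, y1)] = (f, g, valid_input)
    print(x0, y0, x1, y1, "->", f, g, "valid" if valid_input else "invalid")
```

For every valid input the output pair is complementary (01 or 10), and for every invalid input it is 00 or 11, so the output is itself a two-rail code and checkers can be cascaded.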

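Two Rail Checkers

[Figure: tree of two-rail checker cells. The input pairs (x0, y0) … (x5, y5) feed cells whose outputs (f1, g1) … (f4, g4) are combined by further cells into a single output pair (f, g).]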

"Vanilla" Node/Network Controller

Node/network controller
• executes high level protocols (e.g., reliable multicast) and interfaces directly with the network
• host/processing element can directly access memory of the controller
• the controller can access memory of the host/processing element

[Figure: compute node containing the host/processing element and the node/network controller (processor, RAM, ROM, buffer, network interface), connected by the system bus and the PCI bus]

Fault Resilient Node/Network Controller

High level protocol engine (executes high level protocols, e.g., reliable multicast)
• duplicated, tightly coupled CPUs
• data buses compared on memory accesses and other CPU operations (e.g., write to memory of the network adapter)
• memory write-protection (MASK) on all write operations from the processing element (usually DMA operations)
• checksum computation on the data from ROM

Network adapter
• parity checking on all data transfers over the bus
• watchdog to monitor the node communications with the network
• control flow checking and data audit to monitor SW errors

[Figure: compute node containing the host/processing element and the fault resilient node/network controller. The high level protocol engine has two processors, each with RAM and ROM, a comparator, and a reset line; MASK blocks protect memory writes; a bridge connects to the network adapter, which holds the low level protocol engine (RAM, ROM), buffer, watchdog, and network interface. The system bus and the PCI bus connect the parts.]
