Introduction Repetition Codes Parity Reed-Solomon Conclusion
Error Detection, Correction and Erasure Codes for Implementation in a Cluster File-system
Steve Baker
Department of Computer Science, Indiana State University
December 7th 2011
Introduction
Repetition Codes
One of the simplest ways to add redundancy to a code and allow for error recovery is to repeat the same data some number of times.
$$\begin{aligned}
C_1 &= \{\,00 = 0,\ 11 = 1\,\} \\
C_2 &= \{\,000 = 0,\ 111 = 1\,\} \\
C_3 &= \{\,00000 = 0,\ 11111 = 1\,\}
\end{aligned}$$
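A minimal Python sketch of these repetition codes, assuming each bit is repeated r times (r = 2, 3 and 5 for $C_1$, $C_2$ and $C_3$) and decoded by majority vote; `encode` and `decode` are illustrative names, and an erased copy is represented as `None`.

```python
def encode(bits, r):
    """Repeat every bit r times (r = 2, 3, 5 gives C1, C2, C3)."""
    return [b for b in bits for _ in range(r)]


def decode(codeword, r):
    """Majority-vote decode of an r-fold repetition code.

    Erased copies are passed as None and ignored, so a bit survives
    as long as at least one of its r copies does.
    """
    out = []
    for i in range(0, len(codeword), r):
        copies = [b for b in codeword[i:i + r] if b is not None]
        if not copies:
            raise ValueError(f"all {r} copies of bit {i // r} were erased")
        ones = sum(copies)
        zeros = len(copies) - ones
        if ones == zeros:
            # e.g. C1 with one flipped copy: the error is detected, not corrected
            raise ValueError(f"tie on bit {i // r}: cannot decide")
        out.append(1 if ones > zeros else 0)
    return out
```

With $C_2$ one flipped copy per bit is outvoted, while with $C_1$ a single flip can only be detected.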
Simply repeats the data one or more times.
Used in RAID 1 and in Google’s cluster file system (GFS).
Can withstand as many erasures as there are extra copies and has no computational overhead, so it is a good erasure code, but a poor one with respect to storage.
With only two devices (or a single datum with additional redundancy), every scheme reduces to a repetition code.
Parity
A single bit added to n bits to indicate whether the number of set bits is even or odd.
Can detect an odd number of errors (but not an even number), and can be used to correct a single erasure.
Widely used in RAID systems (levels 3 to 5, and 6) as an efficient code for reconstructing missing data.
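A small sketch of a single parity bit, assuming even parity (the slide allows either convention); the helper names are illustrative. It shows that one flipped bit is detected, two flipped bits are not, and a single erased bit at a known position can be rebuilt from the others.

```python
def parity_bit(bits):
    """Even parity: the xor of all bits, so data plus parity has an even number of 1s."""
    p = 0
    for b in bits:
        p ^= b
    return p


def looks_valid(word):
    """A received word (data bits followed by the parity bit) passes if its xor is 0."""
    return parity_bit(word) == 0


def rebuild_erased(word, position):
    """Recover the bit at a known erased position as the xor of every other bit."""
    return parity_bit(word[:position] + word[position + 1:])


data = [1, 0, 1, 1]
word = data + [parity_bit(data)]   # [1, 0, 1, 1, 1]
```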
Given n data disks, the parity P is generated by xoring the values of the data disks together, giving us:
$$P = D_1 \oplus D_2 \oplus D_3 \oplus \cdots \oplus D_n$$
Restoring the data of a lost data drive i involves computing the parity of the remaining drives, which we call $P_x$, and xoring that with the original parity. The value of the lost data is then:
$$D_i = P \oplus P_x = P \oplus D_1 \oplus \cdots \oplus D_{i-1} \oplus D_{i+1} \oplus \cdots \oplus D_n$$
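The same xor relation can be applied byte-wise to whole disk blocks. A minimal sketch, assuming each "disk" is a `bytes` object of equal length; the function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise xor of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(disks):
    """P = D1 xor D2 xor ... xor Dn."""
    return xor_blocks(disks)


def rebuild(disks, parity, lost):
    """Di = P xor Px, where Px is the parity of the surviving data disks.

    disks[lost] is never read, so it may hold anything (e.g. None).
    """
    survivors = [d for i, d in enumerate(disks) if i != lost]
    # P xor Px in one pass: P xor (xor of all surviving data disks)
    return xor_blocks([parity] + survivors)
```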
To correct more than one error using parity, one can try other parity-based schemes:
Use small groups, each with its own parity.
Use a Hamming code.
Arrange the data in two dimensions, with parity for both the rows and the columns.
Such schemes cannot reliably handle more than 1 erasure in one dimension, or more than 3 in two dimensions. For m parity bits, there is some arrangement of k < m failed data disks from which the system cannot recover.
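The two-dimensional scheme and its limit can be illustrated with a sketch. It assumes a bit grid with one parity bit per row and per column, parity bits that are never lost themselves, and a simple "peeling" decoder that repeatedly solves any row or column with exactly one unknown; all names are hypothetical. Three erasures are repaired, but four erasures at the corners of a rectangle leave every affected row and column with two unknowns.

```python
def encode_2d(grid):
    """Row and column parity bits for a rectangular grid of 0/1 values."""
    rows = [sum(r) % 2 for r in grid]
    cols = [sum(c) % 2 for c in zip(*grid)]
    return rows, cols


def recover_2d(grid, rows, cols, erased):
    """Peel erasures using the row/column parities.

    `erased` lists (row, col) positions whose values are unknown. Returns the
    repaired grid, or None if some erasures cannot be resolved this way.
    """
    g = [r[:] for r in grid]
    unknown = set(erased)
    progress = True
    while unknown and progress:
        progress = False
        for r, c in sorted(unknown):
            if sum(1 for (ur, _) in unknown if ur == r) == 1:
                bit = rows[r]                 # solve from the row parity
                for j, v in enumerate(g[r]):
                    if j != c:
                        bit ^= v
            elif sum(1 for (_, uc) in unknown if uc == c) == 1:
                bit = cols[c]                 # solve from the column parity
                for i, row in enumerate(g):
                    if i != r:
                        bit ^= row[c]
            else:
                continue   # two or more unknowns in both its row and its column
            g[r][c] = bit
            unknown.discard((r, c))
            progress = True
    return None if unknown else g
```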
The Reed-Solomon Algorithm
Developed by Irving S. Reed and Gustave Solomon.
A linear block code that works not on bits but on symbols (read: bytes).
Widely used to detect and correct errors on CD and DVD media.
Given m additional symbols added to the data, Reed-Solomon can detect up to m errors and correct up to ⌊m/2⌋ errors.
As an erasure code it can recover from any m erasures, making it an optimal erasure code, which is why it is of particular interest to us.
Originally viewed as a code in which the message is treated as the coefficients of a polynomial over a finite field, and an overdetermined system is created by oversampling that polynomial, i.e. evaluating it at more points than there are message symbols. Recovery of the original message remains possible as long as at least as many points remain in the system as there were symbols in the original message.
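A sketch of this evaluation view over $GF(2^4)$, under illustrative assumptions: the message symbols are polynomial coefficients, the polynomial is evaluated at the points 1 to 6, and lost evaluations are rebuilt from any three survivors by Lagrange interpolation (one standard way to solve the overdetermined system, not necessarily the original decoder). Multiplication uses shift-and-xor reduction modulo $x^4 + x + 1$, and all function names are hypothetical.

```python
PRIM = 0b10011  # x^4 + x + 1


def gf_mul(a, b):
    """Multiply in GF(2^4): carry-less shift-and-xor, reducing modulo PRIM."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return r


def gf_inv(a):
    """Multiplicative inverse by exhaustive search (fine for 15 non-zero elements)."""
    return next(x for x in range(1, 16) if gf_mul(a, x) == 1)


def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients lowest degree first) at x, Horner style."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def interpolate_at(points, x):
    """Value at x of the unique polynomial of degree < len(points) through (xi, yi)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = gf_mul(num, x ^ xj)   # (x - xj); subtraction is xor
                den = gf_mul(den, xi ^ xj)  # (xi - xj)
        total ^= gf_mul(yi, gf_mul(num, gf_inv(den)))
    return total


message = [8, 5, 10]                                          # degree-2 polynomial
codeword = [(x, poly_eval(message, x)) for x in range(1, 7)]  # 6 sample points
survivors = [codeword[0], codeword[3], codeword[5]]           # any 3 suffice
```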
More commonly, however, it is viewed as a cyclic code, in which the message polynomial is multiplied by a generator polynomial to produce an output polynomial whose coefficients are the encoded symbols. A received word can then be checked for errors by polynomial division by the generator polynomial: any non-zero remainder indicates an error. The remainder can then be used to set up and solve a system of linear equations, a process known as syndrome decoding, often using the Berlekamp-Massey algorithm.
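A sketch of the cyclic (generator-polynomial) view over $GF(2^4)$, assuming the common choice $g(x) = (x - \alpha)(x - \alpha^2)\cdots(x - \alpha^m)$ with $\alpha = x$ (the element represented by the integer 2) and $x^4 + x + 1$ as the field polynomial. It shows only non-systematic encoding by multiplication and error detection by the remainder; syndrome decoding and Berlekamp-Massey are not shown, and all names are hypothetical.

```python
PRIM = 0b10011  # x^4 + x + 1


def gf_mul(a, b):
    """Multiply in GF(2^4): carry-less shift-and-xor, reducing modulo PRIM."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return r


def poly_mul(p, q):
    """Product of two polynomials over GF(2^4), coefficients lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def poly_mod(p, g):
    """Remainder of p divided by the monic polynomial g (both lowest degree first)."""
    r = list(p)
    for shift in range(len(r) - len(g), -1, -1):
        lead = r[shift + len(g) - 1]
        if lead:
            for i, gi in enumerate(g):
                r[shift + i] ^= gf_mul(lead, gi)
    return r[:len(g) - 1]


def generator(m):
    """g(x) = (x + a)(x + a^2)...(x + a^m) with a = 2; minus equals plus here."""
    g, root = [1], 1
    for _ in range(m):
        root = gf_mul(root, 2)
        g = poly_mul(g, [root, 1])   # multiply by (x + root)
    return g


g = generator(2)                     # m = 2 check symbols
codeword = poly_mul([8, 5, 10], g)   # message polynomial times g(x)
```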
Because we are most interested in using Reed-Solomon as an erasure code, where the locations of the errors are known, we can employ a simpler method of encoding and decoding based on a properly formed information dispersal matrix (one formed by manipulating a Vandermonde matrix so that the sub-matrices we need are invertible via Gaussian elimination). First we discuss Galois fields, in which all arithmetic takes place.
Galois Fields
For many codes, Reed-Solomon included, it is necessary to perform addition, subtraction, multiplication and particularly division without introducing rounding errors or a loss of precision. Since the resulting codes often need to lie in a specific integer range, modular arithmetic would seem to be required. In ordinary modular arithmetic, however, division is problematic. Working modulo a prime solves that issue, but prevents us from using our data storage efficiently, since an h-bit word holds $2^h$ values and $2^h$ is not prime for h > 1. Arithmetic in a Galois field $GF(q)$ with $q = 2^h$, however, provides a way to accomplish all of this.
Galois Fields have the following properties:
(i) F is closed under addition and multiplication, i.e. a + b and a · b are in F whenever a and b are in F.
(ii) Commutative laws: a + b = b + a, a · b = b · a.
(iii) Associative laws: (a + b) + c = a + (b + c), a · (b · c) = (a · b) · c.
(iv) Distributive law: a · (b + c) = a · b + a · c.
(v) a + 0 = a for all a in F.
(vi) a · 1 = a for all a in F.
(vii) For any a in F, there exists an additive inverse element (−a) in F such that a + (−a) = 0.
(viii) For any a ≠ 0 in F, there exists a multiplicative inverse element a⁻¹ in F such that a · a⁻¹ = 1.
Because of (vii) and (viii) we have subtraction and division respectively, understanding that a − b = a + (−b) and a ÷ b = a · b⁻¹ for b ≠ 0.
We also have the following properties:
(i) a · 0 = 0 for all a in F.
(ii) a · b = 0 ⇒ a = 0 or b = 0. (Thus the product of two non-zero elements of a field is also non-zero.)
In a field of order $2^h$, addition and subtraction are both simply the xor operation.
Multiplication and division are handled quickly using logarithm and inverse-logarithm tables. To multiply, the logs of the two polynomials are added together and the inverse log of the sum is taken; to divide, the logs are subtracted instead. The logs are added and subtracted modulo $2^h - 1$, the number of non-zero elements in the field, and zero operands must be handled separately, since 0 has no logarithm.
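A minimal Python sketch of table-driven arithmetic in $GF(2^4)$, assuming the primitive polynomial $x^4 + x + 1$ (which matches the $GF(2^4)$ table given in this section); `gflog`, `gfilog`, `gf_mul` and `gf_div` are illustrative names. Note the reduction modulo $2^h - 1 = 15$ and the explicit handling of zero.

```python
PRIM = 0b10011          # x^4 + x + 1
H = 4
ORDER = (1 << H) - 1    # 15 non-zero elements

gfilog = [0] * ORDER        # gfilog[k] = x^k as an integer
gflog = [None] * (1 << H)   # gflog[v] = k; the log of 0 is undefined
v = 1
for k in range(ORDER):
    gfilog[k] = v
    gflog[v] = k
    v <<= 1                 # multiply by x
    if v & (1 << H):        # degree reached h: reduce modulo PRIM
        v ^= PRIM


def gf_add(a, b):
    """Addition and subtraction are both xor."""
    return a ^ b


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gfilog[(gflog[a] + gflog[b]) % ORDER]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^4)")
    if a == 0:
        return 0
    return gfilog[(gflog[a] - gflog[b]) % ORDER]
```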
The Galois field $GF(2^4)$, generated by the primitive polynomial $x^4 + x + 1$ (so that $x^4 = x + 1$):

element   polynomial           vector      decimal
0         0                    (0 0 0 0)    0
x^0       1                    (0 0 0 1)    1
x^1       x                    (0 0 1 0)    2
x^2       x^2                  (0 1 0 0)    4
x^3       x^3                  (1 0 0 0)    8
x^4       x + 1                (0 0 1 1)    3
x^5       x^2 + x              (0 1 1 0)    6
x^6       x^3 + x^2            (1 1 0 0)   12
x^7       x^3 + x + 1          (1 0 1 1)   11
x^8       x^2 + 1              (0 1 0 1)    5
x^9       x^3 + x              (1 0 1 0)   10
x^10      x^2 + x + 1          (0 1 1 1)    7
x^11      x^3 + x^2 + x        (1 1 1 0)   14
x^12      x^3 + x^2 + x + 1    (1 1 1 1)   15
x^13      x^3 + x^2 + 1        (1 1 0 1)   13
x^14      x^3 + 1              (1 0 0 1)    9
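Each row of the table is the previous row multiplied by x and reduced using $x^4 = x + 1$. A sketch, assuming the primitive polynomial $x^4 + x + 1$, that regenerates the "decimal" column with a table-free shift-and-xor multiply, which can also serve as a cross-check for log tables; the function names are illustrative.

```python
PRIM = 0b10011  # x^4 + x + 1


def gf_mul_slow(a, b):
    """Multiply two GF(2^4) elements bit by bit (carry-less, then reduce)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (xor) the current multiple of a
        b >>= 1
        a <<= 1                  # a := a * x
        if a & 0b10000:
            a ^= PRIM            # reduce when the degree reaches 4
    return result


def power_table():
    """Return [x^0, x^1, ..., x^14] as integers, i.e. the 'decimal' column."""
    table, v = [], 1
    for _ in range(15):
        table.append(v)
        v = gf_mul_slow(v, 2)    # 2 is the polynomial x
    return table
```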
Information Dispersal Matrix
To create the check-sums necessary for our Reed-Solomon encoding, we use an (n + m) × n information dispersal matrix, created by altering a standard Vandermonde matrix until its first n rows form the (n × n) identity matrix. Once we have a properly created dispersal matrix, any sub-matrix formed by deleting m rows of the matrix is invertible via Gaussian elimination.
A Vandermonde matrix is defined as:
$$V = \begin{pmatrix}
0^0 & 0^1 & 0^2 & \cdots & 0^{n-1} \\
1^0 & 1^1 & 1^2 & \cdots & 1^{n-1} \\
2^0 & 2^1 & 2^2 & \cdots & 2^{n-1} \\
3^0 & 3^1 & 3^2 & \cdots & 3^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
(n+m-1)^0 & (n+m-1)^1 & (n+m-1)^2 & \cdots & (n+m-1)^{n-1}
\end{pmatrix}$$
where $0^0 = 1$, $0^j = 0$ for $j > 0$, and all powers are computed in $GF(2^h)$.
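A sketch of building this matrix over $GF(2^4)$, assuming $x^4 + x + 1$ as the field polynomial and a shift-and-xor multiply; each entry is the previous entry in its row multiplied by the row index. The names are illustrative.

```python
PRIM = 0b10011   # x^4 + x + 1
FIELD_SIZE = 16


def gf_mul(a, b):
    """Multiply in GF(2^4): carry-less shift-and-xor, reducing modulo PRIM."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return r


def vandermonde(rows, cols):
    """rows x cols matrix whose entry (i, j) is i^j in GF(2^4), with 0^0 = 1."""
    if rows > FIELD_SIZE:
        raise ValueError("n + m must not exceed the field size (rows need distinct elements)")
    matrix = []
    for i in range(rows):
        row, power = [], 1            # i^0 = 1, including for i = 0
        for _ in range(cols):
            row.append(power)
            power = gf_mul(power, i)  # i^(j+1) = i^j * i
        matrix.append(row)
    return matrix
```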
The above Vandermonde matrix has the property that any sub-matrix formed by deleting m rows is invertible (provided $n + m \le 2^h$, so that the row indices are distinct field elements). Elementary column operations preserve the rank of every such sub-matrix, and thereby its invertibility [PD03]. Thus we can derive our information dispersal matrix A by applying the following elementary operations to the Vandermonde matrix until its first n rows form the identity matrix [PD03]:
Any column $C_i$ may be swapped with column $C_j$.
Any column $C_i$ may be replaced by $C_i \cdot c$, where $c \neq 0$.
Any column $C_i$ may be replaced by adding a multiple of another column to it: $C_i = C_i + c \cdot C_j$, where $j \neq i$ and $c \neq 0$.
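As an example [PD03], we construct A for n = 3, m = 3 over $GF(2^4)$. We first construct the 6 × 3 Vandermonde matrix over $GF(2^4)$:
$$\begin{pmatrix}
0^0 & 0^1 & 0^2 \\
1^0 & 1^1 & 1^2 \\
2^0 & 2^1 & 2^2 \\
3^0 & 3^1 & 3^2 \\
4^0 & 4^1 & 4^2 \\
5^0 & 5^1 & 5^2
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 \\
1 & 1 & 1 \\
1 & 2 & 4 \\
1 & 3 & 5 \\
1 & 4 & 3 \\
1 & 5 & 2
\end{pmatrix}
\Rightarrow
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
? & ? & ? \\
? & ? & ? \\
? & ? & ?
\end{pmatrix}$$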
Row 1 is already an identity row, so we move on to row 2. Since $f_{2,1} = f_{2,2} = f_{2,3} = 1$, we replace $C_1$ with $C_1 - C_2$ and $C_3$ with $C_3 - C_2$, resulting in (a).
$$\text{(a)}\quad \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
3 & 2 & 6 \\
2 & 3 & 6 \\
5 & 4 & 7 \\
4 & 5 & 7
\end{pmatrix}$$
All that is left is to convert row 3. Since $f_{3,3} = 6 \neq 1$, we replace $C_3$ with $6^{-1} C_3 = 7 C_3$ (in $GF(2^4)$, $6 \cdot 7 = 1$), resulting in (b).
$$\text{(b)}\quad \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
3 & 2 & 1 \\
2 & 3 & 1 \\
5 & 4 & 6 \\
4 & 5 & 6
\end{pmatrix}$$
Finally, we replace $C_1$ with $C_1 - 3C_3$ and $C_2$ with $C_2 - 2C_3$ to yield our desired A, shown in (c).
$$\text{(c)}\quad A = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 1 & 1 \\
15 & 8 & 6 \\
14 & 9 & 6
\end{pmatrix}$$
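The three column operations can be applied mechanically, one row at a time. A sketch over $GF(2^4)$, assuming $x^4 + x + 1$, an exhaustive-search inverse and illustrative names; for n = 3, m = 3 it performs the same steps as (a) to (c) above and reproduces A.

```python
PRIM = 0b10011  # x^4 + x + 1


def gf_mul(a, b):
    """Multiply in GF(2^4): carry-less shift-and-xor, reducing modulo PRIM."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return r


def gf_inv(a):
    """Inverse by exhaustive search; fine for the 15 non-zero elements."""
    return next(x for x in range(1, 16) if gf_mul(a, x) == 1)


def vandermonde(rows, cols):
    """Entry (i, j) is i^j in GF(2^4), with 0^0 = 1."""
    matrix = []
    for i in range(rows):
        row, power = [], 1
        for _ in range(cols):
            row.append(power)
            power = gf_mul(power, i)
        matrix.append(row)
    return matrix


def dispersal_matrix(n, m):
    """Reduce the (n+m) x n Vandermonde matrix to [I; F] with column operations only."""
    a = vandermonde(n + m, n)
    for k in range(n):
        # swap in a column that is non-zero in row k (one exists because the
        # top n x n block stays invertible under column operations)
        j = next(j for j in range(k, n) if a[k][j] != 0)
        if j != k:
            for row in a:
                row[k], row[j] = row[j], row[k]
        # scale column k so that a[k][k] becomes 1
        scale = gf_inv(a[k][k])
        for row in a:
            row[k] = gf_mul(row[k], scale)
        # clear the rest of row k: C_i = C_i - a[k][i] * C_k  (minus is xor)
        for i in range(n):
            factor = a[k][i]
            if i != k and factor != 0:
                for row in a:
                    row[i] ^= gf_mul(factor, row[k])
    return a
```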
The Reed-Solomon Algorithm
The Algorithm
The general idea of the algorithm is to generate a set of check-sums that are linearly independent of one another. Given the data set D composed of data elements $d_1, d_2, \ldots, d_n$, we generate the check-sum word $c_i$ by applying a function $F_i$ to the data D, i.e. $c_i = F_i(d_1, d_2, \ldots, d_n)$.
F consists of the last m rows of the information dispersal matrix A, which is formed from a Vandermonde matrix.
The first n rows of A form the identity matrix.
The remaining m rows are used to generate the check-sumwords from the data.
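Given our dispersal matrix $A = \begin{bmatrix} I \\ F \end{bmatrix}$ and our data D, we generate an array $E = \begin{bmatrix} D \\ C \end{bmatrix}$ with the equation:
$$A \cdot D = E$$
As matrices:
$$\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
f_{1,1} & f_{1,2} & \cdots & f_{1,n} \\
\vdots & \vdots & \ddots & \vdots \\
f_{m,1} & f_{m,2} & \cdots & f_{m,n}
\end{pmatrix}
\cdot
\begin{pmatrix} d_1 \\ d_2 \\ d_3 \\ \vdots \\ d_n \end{pmatrix}
=
\begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \\ c_1 \\ \vdots \\ c_m \end{pmatrix}$$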
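A sketch of the encoding step $A \cdot D = E$ over $GF(2^4)$, assuming $x^4 + x + 1$ and using the F rows of the n = m = 3 example; `checksums` and `encode` are illustrative names. Because the top of A is the identity, only the F rows need any arithmetic.

```python
PRIM = 0b10011  # x^4 + x + 1


def gf_mul(a, b):
    """Multiply in GF(2^4): carry-less shift-and-xor, reducing modulo PRIM."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return r


# Check-sum rows F of the n = m = 3 example (the last three rows of A)
F = [[1, 1, 1], [15, 8, 6], [14, 9, 6]]


def checksums(f_rows, data):
    """c_i = f_i1*d_1 + ... + f_in*d_n, where + is xor in GF(2^4)."""
    result = []
    for row in f_rows:
        c = 0
        for f, d in zip(row, data):
            c ^= gf_mul(f, d)
        result.append(c)
    return result


def encode(f_rows, data):
    """E = A . D with A = [I; F]: the identity rows just copy the data."""
    return list(data) + checksums(f_rows, data)
```

For D = (8, 5, 10), `encode(F, [8, 5, 10])` returns `[8, 5, 10, 7, 6, 11]`, i.e. the check-sums 7, 6 and 11 of the worked example.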
Data updates
When any of the data words changes, each check-sum word must be updated to reflect the change. This can be done efficiently by subtracting out the portion of the check-sum corresponding to the old data word and adding in the new amount; in this way we do not need to re-inspect the other data words to effect the change. Given a new data word $d'_j$ that replaces $d_j$, we compute:
$$c'_i = c_i + f_{i,j}\,(d'_j - d_j)$$
where $c'_i$ is the updated value of each check-sum $c_i$ [P96].
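A sketch of this update rule, assuming $GF(2^4)$ with $x^4 + x + 1$ and, for concreteness, the F rows from the n = m = 3 example; the index j is 0-based and the names are hypothetical. The incremental result should always equal a full recomputation, which is what the usage below checks.

```python
PRIM = 0b10011  # x^4 + x + 1
F = [[1, 1, 1], [15, 8, 6], [14, 9, 6]]   # check-sum rows of the example A


def gf_mul(a, b):
    """Multiply in GF(2^4): carry-less shift-and-xor, reducing modulo PRIM."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return r


def checksums(f_rows, data):
    """Full recomputation: c_i = xor over j of f_ij * d_j."""
    out = []
    for row in f_rows:
        c = 0
        for f, d in zip(row, data):
            c ^= gf_mul(f, d)
        out.append(c)
    return out


def update_checksums(f_rows, checks, j, old, new):
    """c'_i = c_i + f_ij * (d'_j - d_j); only the changed word j is needed."""
    delta = old ^ new                  # d'_j - d_j: subtraction is xor
    return [c ^ gf_mul(row[j], delta) for row, c in zip(f_rows, checks)]
```

For example, changing $d_2$ from 5 to 12 via `update_checksums(F, [7, 6, 11], 1, 5, 12)` gives the same result as `checksums(F, [8, 12, 10])`.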
Data recovery
To perform data recovery, we start from the identity matrix I and the data vector D, and replace the rows of I and D that correspond to missing data words with the next available check-sum row from A and the matching check-sum word from E. For example, if data word $d_2$ were missing, we would replace row 2 of the identity matrix with the first check-sum row of A (row n + 1) and replace the value of $d_2$ in D with $c_1$. For each further missing data element we repeat this process using the next available check-sum row, until we have replaced all the missing data elements or run out of check-sum rows (in which case the data cannot be recovered).
Once we have done this, we have the new matrices A′ and E′. To recover the data, we solve the following equation for D:
$$A' \cdot D = E'$$
Inverting the matrix A′ using Gaussian elimination yields:
$$D = (A')^{-1} \cdot E'$$
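A sketch of the whole recovery step over $GF(2^4)$, assuming $x^4 + x + 1$, 0-based row indices into A and illustrative names. `recover` builds A′ and E′ as described (each missing data row is replaced by the next surviving check-sum row) and solves for D with a Gauss-Jordan inversion.

```python
PRIM = 0b10011  # x^4 + x + 1
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [15, 8, 6], [14, 9, 6]]


def gf_mul(a, b):
    """Multiply in GF(2^4): carry-less shift-and-xor, reducing modulo PRIM."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return r


def gf_inv(a):
    return next(x for x in range(1, 16) if gf_mul(a, x) == 1)


def gf_invert(matrix):
    """Gauss-Jordan inversion over GF(2^4); row operations use xor for +/-."""
    n = len(matrix)
    aug = [list(row) + [1 if i == j else 0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = gf_inv(aug[col][col])
        aug[col] = [gf_mul(scale, x) for x in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0:
                aug[r] = [x ^ gf_mul(factor, y) for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def recover(a, n, available):
    """`available` maps surviving row indices of `a` to their stored words."""
    spare = [i for i in range(n, len(a)) if i in available]  # surviving check-sum rows
    rows = []
    for i in range(n):
        if i in available:
            rows.append(i)                 # keep identity row i and d_i
        elif spare:
            rows.append(spare.pop(0))      # next available check-sum row
        else:
            raise ValueError("too many erasures to recover")
    inverse = gf_invert([a[r] for r in rows])          # (A')^-1
    e_prime = [available[r] for r in rows]             # E'
    data = []
    for row in inverse:
        d = 0
        for coeff, e in zip(row, e_prime):
            d ^= gf_mul(coeff, e)
        data.append(d)
    return data
```

In the worked example below, the survivors are $d_2$ (row 1), $c_1$ (row 3) and $c_3$ (row 5), so `recover(A, 3, {1: 5, 3: 7, 5: 11})` rebuilds D.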
An RS-example
An example for n = 3, m = 3 and D = (8, 5, 10) in the field $GF(2^4)$. We first construct A by altering a Vandermonde matrix in the way described above, which yields:
$$A = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 1 & 1 \\
15 & 8 & 6 \\
14 & 9 & 6
\end{pmatrix}$$
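We use rows 4-6 to calculate our check-sum words $c_1$, $c_2$ and $c_3$ as follows:
$$\begin{aligned}
c_1 &= (1 \cdot 8) \oplus (1 \cdot 5) \oplus (1 \cdot 10) \\
&= 8 \oplus 5 \oplus 10 \\
&= (1000) \oplus (0101) \oplus (1010) = (0111) = 7 \\[4pt]
c_2 &= (15 \cdot 8) \oplus (8 \cdot 5) \oplus (6 \cdot 10) \\
&= 1 \oplus 14 \oplus 9 \\
&= (0001) \oplus (1110) \oplus (1001) = (0110) = 6 \\[4pt]
c_3 &= (14 \cdot 8) \oplus (9 \cdot 5) \oplus (6 \cdot 10) \\
&= 9 \oplus 11 \oplus 9 \\
&= (1001) \oplus (1011) \oplus (1001) = (1011) = 11
\end{aligned}$$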
Now suppose we lose the data elements $d_1$ and $d_3$ and the check-sum word $c_2$. We can still recover our data words (and, once we have, recompute the missing check-sum word). To do so, we take the identity matrix, replace rows 1 and 3, corresponding to the data words $d_1$ and $d_3$, with the two remaining check-sum rows (rows 4 and 6 of A), and then invert the result:
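$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\Rightarrow
A' = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 14 & 9 & 6 \end{pmatrix}
\Rightarrow
(A')^{-1} = \begin{pmatrix} 4 & 10 & 15 \\ 0 & 1 & 0 \\ 5 & 11 & 15 \end{pmatrix}$$
$$C = \begin{pmatrix} 7 \\ 6 \\ 11 \end{pmatrix}, \quad
D = \begin{pmatrix} ? \\ 5 \\ ? \end{pmatrix}
\Rightarrow
E' = \begin{pmatrix} 7 \\ 5 \\ 11 \end{pmatrix}$$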
Then, to find $d_1$ and $d_3$, we multiply rows 1 and 3 of $(A')^{-1}$ by the vector E′ to yield:
$$\begin{aligned}
d_1 &= (4 \cdot 7) \oplus (10 \cdot 5) \oplus (15 \cdot 11) \\
&= 15 \oplus 4 \oplus 3 \\
&= (1111) \oplus (0100) \oplus (0011) = (1000) = 8 \\[4pt]
d_3 &= (5 \cdot 7) \oplus (11 \cdot 5) \oplus (15 \cdot 11) \\
&= 8 \oplus 1 \oplus 3 \\
&= (1000) \oplus (0001) \oplus (0011) = (1010) = 10
\end{aligned}$$
Having obtained our data words, we can recalculate the missing check-sum $c_2$ through the normal mechanism.
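The arithmetic of this example can be checked mechanically. A sketch, assuming $GF(2^4)$ with $x^4 + x + 1$ and illustrative names, that multiplies A′ by the stated $(A')^{-1}$ (expecting the identity) and $(A')^{-1}$ by E′ (expecting $d_1 = 8$, $d_2 = 5$, $d_3 = 10$).

```python
PRIM = 0b10011  # x^4 + x + 1


def gf_mul(a, b):
    """Multiply in GF(2^4): carry-less shift-and-xor, reducing modulo PRIM."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return r


def mat_mul(x, y):
    """Matrix product over GF(2^4): products via gf_mul, sums via xor."""
    result = []
    for row in x:
        out = []
        for col in zip(*y):
            acc = 0
            for a, b in zip(row, col):
                acc ^= gf_mul(a, b)
            out.append(acc)
        result.append(out)
    return result


A_PRIME = [[1, 1, 1], [0, 1, 0], [14, 9, 6]]
A_PRIME_INV = [[4, 10, 15], [0, 1, 0], [5, 11, 15]]
E_PRIME = [[7], [5], [11]]
```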
Complexity
Given n data elements and m desired check-sum words, the complexity of the algorithm is as follows.
First, addition and subtraction in the field are a single xor operation, which we assume to be O(1).
Multiplication and division are two table look-ups, a regular addition or subtraction with a possible modulus operation (implemented as a comparison and an optional addition/subtraction), followed by one more table look-up: three table look-ups and up to two additions/subtractions, so at most 3T + 2h operations, i.e. O(h), where h is the number of bits in the operands of the addition/subtraction and T is some table look-up constant.
Generating the dispersal matrix first requires creating a Vandermonde matrix, whose first and second columns require no expensive operations to create. Each successive column requires, for each row, one field multiplication of the previous column's value by the row index; since the first row needs none, an (n + m) × n matrix requires at most (n + m − 1) × (n − 2) field multiplications. Conversion to dispersal form then requires, for each of the n rows being converted, up to n column operations of at most n + m multiplications and xors each, i.e. on the order of n²(n + m) operations. Since this matrix can be cached and is generated only once, its computational cost is mostly irrelevant.
Computing a single check-sum word requires n field multiplications and (n − 1) additions (xors) in the field. An update requires all m check-sums to be updated, which takes two xors and one multiplication for each check-sum.
Recovering data requires a matrix inversion, done by Gaussian elimination, which has an arithmetic complexity of approximately 2n³/3 operations, or O(n³). The inversion needs to be performed only once for a given set of failed devices, before data restoration begins, so its cost might be considered insignificant.
Recovering each of up to m data words then requires only n field multiplications and (n − 1) xor operations.
Overall, the time complexity of Reed-Solomon grows with the product of the number of data elements and check symbols in the system: O(n · m), which, since m < n, is O(n²).
Questions?
References
Raymond Hill. A First Course in Coding Theory. Oxford University Press, New York, 1996.
James S. Plank. A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems. University of Tennessee, CS-96-332, 1996. http://www.cs.utk.edu/~plank/plank/papers/CS-96-332.html
James S. Plank, Ying Ding. Note: Correction to the 1997 Tutorial on Reed-Solomon Coding. University of Tennessee, CS-03-504, 2003. http://www.cs.utk.edu/~plank/plank/papers/CS-03-504.html