236601 - Coding and Algorithms for Memories
Lecture 11
Array Codes and Distributed Storage
Large Scale Storage Systems
• Big Data Players: Facebook, Amazon, Google, Yahoo,…
Cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)
• Failures are the norm
Node failures at Facebook
Source: XORing Elephants: Novel Erasure Codes for Big Data, M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, VLDB 2013
State-of-the-Art Storing Schemes
• 3x Replication:
» Easily implemented and maintained
» Can tolerate any 2 disk failures
» Large storage overhead: 300% of the raw data size (three full copies). A big problem!
• More sophisticated schemes:
» Reed-Solomon (RS) Codes
» The repair problem
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
Widely used
Problem Setup
• Disks are stored together in a group (rack)
• Disk failures should be supported
• Requirements:
– Support as many disk failures as possible
– And yet…
• Optimal and fast recovery
• Low complexity
Problem Setup
• Question 1: How many extra disks are required to support a single disk failure?
Answer: 1. How?
• Question 2: How many extra disks are required to support two disk failures?
Answer: 2. How?
• Question 3: How many extra disks are required to support 3 disk failures?
Answer: 3. How?
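For a single failure, one extra disk holding the XOR of the data disks is enough: any one lost disk equals the XOR of all the surviving disks. The sketch below models each disk as a short byte string; the helper name `xor_blocks` and the toy contents are illustrative assumptions, not part of the lecture.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Three data disks A, B, C (toy contents) and one parity disk A+B+C.
A, B, C = b"\x0f\x10", b"\xf0\x01", b"\x33\x44"
parity = xor_blocks([A, B, C])

# Suppose disk B fails: XOR the surviving data disks with the parity disk.
recovered_B = xor_blocks([A, C, parity])
```

The same call recovers A or C, since XOR is its own inverse.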
Problem Setup
• Question 1: How many extra disks are required to support a single disk failure?
• Question 2: How many extra disks are required to support two disk failures?
• Question 3: How many extra disks are required to support 3 disk failures?
A B C A+B+C
A B C A+B+C αA+βB+γC
A B C A+B+C αA+βB+γC α’A+β’B+γ’C
Problem Setup
• Question 1: How many extra disks are required to support a single disk failure?
A B C A+B+C
{(x1,x2,x3,x4): x1+x2+x3+x4=0}
{(x1,x2,x3,x4): H1∙(x1,x2,x3,x4)^T=0}
H1 = (1,1,1,1)
• Question 2: How many extra disks are required to support two disk failures?
A B C A+B+C αA+βB+γC
{(x1,x2,x3,x4,x5): x1+x2+x3+x4=0, αx1+βx2+γx3+x5=0}
{(x1,x2,x3,x4,x5): H2∙(x1,x2,x3,x4,x5)^T=0}
H2 = (1,1,1,1,0; α,β,γ,0,1)
• Question 3: How many extra disks are required to support d disk failures? (shown here for d=3)
A B C A+B+C αA+βB+γC α’A+β’B+γ’C
{(x1,x2,x3,x4,x5,x6): x1+x2+x3+x4=0, αx1+βx2+γx3+x5=0, α’x1+β’x2+γ’x3+x6=0}
{(x1,x2,x3,x4,x5,x6): H3∙(x1,x2,x3,x4,x5,x6)^T=0}
H3 = (1,1,1,1,0,0; α,β,γ,0,1,0; α’,β’,γ’,0,0,1)
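To make the two-failure case concrete, here is a minimal sketch of the code defined by two parity-check equations, with the second equation using coefficients (written α, β, γ here). As a simplifying assumption it works over the prime field GF(7) rather than an extension field, and the values α=1, β=2, γ=3 are hypothetical. Two erased data symbols are recovered by solving the resulting 2x2 linear system.

```python
P = 7                           # prime modulus: arithmetic in GF(7) (assumption)
ALPHA, BETA, GAMMA = 1, 2, 3    # distinct non-zero coefficients (illustrative)

def encode(x1, x2, x3):
    """Append x4, x5 so that both parity-check equations hold mod P."""
    x4 = -(x1 + x2 + x3) % P
    x5 = -(ALPHA * x1 + BETA * x2 + GAMMA * x3) % P
    return [x1, x2, x3, x4, x5]

def recover_x1_x2(x3, x4, x5):
    """Solve  x1 + x2 = s  and  ALPHA*x1 + BETA*x2 = t  (mod P)."""
    s = -(x3 + x4) % P
    t = -(GAMMA * x3 + x5) % P
    det = (BETA - ALPHA) % P        # non-zero because ALPHA != BETA
    inv_det = pow(det, -1, P)       # modular inverse, Python 3.8+
    x2 = (t - ALPHA * s) * inv_det % P
    x1 = (s - x2) % P
    return x1, x2

word = encode(4, 5, 6)              # [4, 5, 6, 6, 3]
lost_pair = recover_x1_x2(*word[2:])
```

With α=β the determinant would be zero and the pair (x1, x2) could not be separated, which is why the second parity cannot simply repeat A+B+C.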
Problem Setup
• Question 2: How many extra disks are required to support two disk failures?
A B C A+B+C αA+βB+γC
{(x1,x2,x3,x4,x5): x1+x2+x3+x4=0, αx1+βx2+γx3+x5=0}
{(x1,x2,x3,x4,x5): H2∙(x1,x2,x3,x4,x5)^T=0}
H2 = (1,1,1,1,0; α,β,γ,0,1)
• Question: What is the requirement on H2?
Answer: Every 2x2 sub-matrix formed by two columns of H2 has rank two, i.e., any two columns are linearly independent
• Question: What is the requirement on H3?
Answer: Every 3x3 sub-matrix formed by three columns of H3 has rank three
Problem Setup
• Question: How many extra disks are required to support d disk failures?
Answer: d. How?
{(x1,x2,…,xn-1,xn): H∙(x1,x2,…,xn-1,xn)^T=0}, n=k+d
• What is the requirement on H?
• Answer: Every sub-matrix of size dxd has rank d
• Is it possible to construct such matrices?
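The requirement on H can be checked by brute force for small examples. The sketch below is an illustration under assumptions: it works over a prime field GF(p) instead of an extension field, and the function names and the two test matrices are hypothetical. It tests every choice of d columns of a d x n matrix for full rank.

```python
from itertools import combinations

def rank_mod_p(matrix, p):
    """Rank of a matrix over GF(p), p prime, by Gaussian elimination."""
    rows = [[v % p for v in r] for r in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)          # modular inverse, Python 3.8+
        rows[rank] = [v * inv % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(v - factor * w) % p for v, w in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def tolerates_d_failures(H, p):
    """True if every d x d sub-matrix made of d columns of H has rank d."""
    d, n = len(H), len(H[0])
    for cols in combinations(range(n), d):
        sub = [[H[r][c] for c in cols] for r in range(d)]
        if rank_mod_p(sub, p) < d:
            return False
    return True

H_good = [[1, 1, 1, 1, 0],
          [1, 2, 3, 0, 1]]      # second row uses distinct non-zero coefficients
H_bad = [[1, 1, 1, 1, 0],
         [1, 1, 3, 0, 1]]       # first two columns are identical
```

Over GF(5), `tolerates_d_failures(H_good, 5)` holds while `tolerates_d_failures(H_bad, 5)` does not, because losing the two disks with identical columns leaves one equation short.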
Reed Solomon Codes
• A code with a parity check matrix H whose entries are powers of α, where α is a primitive element of some extension field and its order satisfies O(α) > n-1
Claim: Every sub-matrix of size dxd has full rank
Vandermonde Matrices
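For reference, the standard fact behind the full-rank claim can be stated as follows. The indexing is an assumption: it uses the common form in which column j of the parity check matrix is (1, α^j, α^{2j}, …, α^{(d-1)j})^T for 0 ≤ j ≤ n-1.

```latex
V(x_1,\dots,x_d) =
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_d \\
\vdots & \vdots & & \vdots \\
x_1^{d-1} & x_2^{d-1} & \cdots & x_d^{d-1}
\end{pmatrix},
\qquad
\det V(x_1,\dots,x_d) = \prod_{1 \le i < j \le d} (x_j - x_i).
```

Under that assumption, any d columns j_1 < … < j_d form V(α^{j_1}, …, α^{j_d}). Since O(α) > n-1, the powers α^0, …, α^{n-1} are pairwise distinct, so every factor in the product is non-zero and the sub-matrix has rank d.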
Reed Solomon Codes
• Advantages:
– Support the maximum number of disk failures
– Are very common in practice and have relatively efficient encoding/decoding schemes
• Disadvantages:
– Require working over large fields
– Need to read k disks (a full stripe's worth of data) in order to recover even a single disk failure, so rebuild is not efficient
EVENODD Codes
• Designed by Mario Blaum, Jim Brady, Jehoshua Bruck, and Jai Menon
• Goal: Construct array codes correcting 2 disk failures using only binary XOR operations
– No need for calculations over extension fields
• Code construction:
– Every disk is a column
– The array size is (m-1)x(m+2), m is prime
– The last two columns are used for parity
EVENODD Codes
• Code construction:
– Every disk is a column
– The array size is (m-1)x(m+2), m is prime
– The last two columns are used for parity
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
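EVENODD Codes
• Redundancy Calculation:
– First parity drive: a simple XOR of the m data disks, computed row by row, for 0 ≤ l ≤ m-2
– Second parity drive: the XOR along the l-th diagonal of the data, XORed with the bit S, for 0 ≤ l ≤ m-2; in this example S=1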
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
1 0 1 1 0 1
0 0 0 1 1 0
0 0 0 0 1 1
1 1 1 0 1 0
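EVENODD Codes
• Redundancy Calculation:
– First parity drive: a simple XOR of the m data disks, computed row by row, for 0 ≤ l ≤ m-2
– Second parity drive: the XOR along the l-th diagonal of the data, XORed with the bit S, for 0 ≤ l ≤ m-2; in this example S=1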
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 1 1 0
1 0 0 0 0 1 1
0 1 1 1 0 1 0
0 0 0 0 0 0 0
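The redundancy calculation can be sketched in a few lines for the m=5 example. Two things below are assumptions inferred from the example array rather than stated in the text: the diagonals run down and to the right, wrapping modulo m through the imaginary all-zero row, and S is the XOR along the main diagonal. With these choices the output reproduces the printed array, whose rows list the diagonal-parity bit first, then the row-parity bit, then the five data bits. The function name is illustrative.

```python
def evenodd_encode(data, m):
    """Return (row_parity, diag_parity, S) for an (m-1) x m bit array."""
    rows = m - 1
    # Imaginary all-zero row m-1, so that diagonals can wrap modulo m.
    a = [list(r) for r in data] + [[0] * m]

    # First parity column: XOR of the m data bits in each row.
    row_parity = []
    for l in range(rows):
        bit = 0
        for t in range(m):
            bit ^= a[l][t]
        row_parity.append(bit)

    # S: XOR along the main diagonal, cells (t, t) (assumed orientation).
    S = 0
    for t in range(m):
        S ^= a[t][t]

    # Second parity column: S XOR the diagonal made of cells ((l+1+t) mod m, t).
    diag_parity = []
    for l in range(rows):
        bit = S
        for t in range(m):
            bit ^= a[(l + 1 + t) % m][t]
        diag_parity.append(bit)
    return row_parity, diag_parity, S

data = [[0, 1, 1, 0, 1],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 1, 1],
        [1, 1, 0, 1, 0]]
row_parity, diag_parity, S = evenodd_encode(data, 5)
```

For this input the row parities are 1, 0, 0, 1, the diagonal parities are 0, 0, 1, 0, and S is 1, in agreement with the example.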
Decoding of EVENODD Codes
• Observation: the value of S is the XOR of all the bits in the last two (parity) columns
S = 1
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 ? 0 ?
0 0 0 0 ? 1 ?
1 0 0 0 ? 1 ?
0 1 1 1 ? 1 ?
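The observation lets the decoder obtain S without any data column. The row-parity column XORs to the sum of all data bits. The diagonal-parity column XORs to the sum of all data bits except the diagonal that defines S, plus m-1 copies of S, which cancel because m-1 is even. Together the two columns leave exactly S. A short check on the example's parity bits (values copied from the example array):

```python
from functools import reduce
from operator import xor

row_parity = [1, 0, 0, 1]     # row-parity column of the example
diag_parity = [0, 0, 1, 0]    # diagonal-parity column of the example

# XOR of every bit in the two parity columns.
S = reduce(xor, row_parity + diag_parity)
```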
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 ? 0 ?
0 0 0 0 ? 1 0
1 0 0 0 ? 1 ?
0 1 1 1 ? 1 ?
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 ? 0 ?
0 0 0 0 1 1 0
1 0 0 0 ? 1 ?
0 1 1 1 ? 1 ?
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 ? 0 ?
0 0 0 0 1 1 0
1 0 0 0 ? 1 ?
0 1 1 1 ? 1 0
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 ? 0 ?
0 0 0 0 1 1 0
1 0 0 0 ? 1 ?
0 1 1 1 0 1 0
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 ? 0 1
0 0 0 0 1 1 0
1 0 0 0 ? 1 ?
0 1 1 1 0 1 0
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 1 1 0
1 0 0 0 ? 1 ?
0 1 1 1 0 1 0
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 1 1 0
1 0 0 0 ? 1 1
0 1 1 1 0 1 0
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 1 1 0
1 0 0 0 0 1 1
0 1 1 1 0 1 0
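The frames above alternate between a diagonal step and a row step. A minimal sketch of that zigzag for two erased data columns follows. It assumes the same conventions as the example (diagonals running down-right modulo m, S on the main diagonal) and its own layout with data columns indexed 0 to m-1; in the printed 7-entry rows the erased data columns 2 and 4 are the fifth and seventh entries. The function name is illustrative.

```python
def decode_two_data_columns(data, row_parity, diag_parity, m, i, j):
    """Recover erased data columns i < j (cells given as None).

    Assumed conventions: diagonal k is the set of cells ((k + t) mod m, t);
    diagonal 0 XORs to S and diagonal l + 1 XORs to S ^ diag_parity[l].
    Returns the repaired array and the order in which cells were filled.
    """
    rows = m - 1
    a = [list(r) for r in data] + [[0] * m]      # imaginary all-zero row
    S = 0
    for bit in list(row_parity) + list(diag_parity):
        S ^= bit                                  # XOR of both parity columns
    row_target = list(row_parity) + [0]
    diag_target = [S] + [S ^ p for p in diag_parity]

    order = []
    # Start on the diagonal through the imaginary cell of column i.
    r = (rows + j - i) % m
    while r != rows:
        k = (r - j) % m                           # diagonal containing (r, j)
        bit = diag_target[k]
        for t in range(m):
            if t != j:
                bit ^= a[(k + t) % m][t]
        a[r][j] = bit                             # diagonal step
        order.append((r, j))
        bit = row_target[r]
        for t in range(m):
            if t != i:
                bit ^= a[r][t]
        a[r][i] = bit                             # row step
        order.append((r, i))
        r = (r + j - i) % m
    return a[:rows], order

X = None
damaged = [[0, 1, X, 0, X],
           [0, 0, X, 1, X],
           [0, 0, X, 1, X],
           [1, 1, X, 1, X]]
recovered, order = decode_two_data_columns(
    damaged, [1, 0, 0, 1], [0, 0, 1, 0], 5, 2, 4)
```

The recovery order visits rows 1, 3, 0, 2, filling the later erased column first in each row, which is the order of the frames above. Because m is prime, the step j-i reaches every row before returning to the imaginary one.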
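EVENODD Codes
• Redundancy Calculation:
– First parity drive: a simple XOR of the m data disks, computed row by row, for 0 ≤ l ≤ m-2
– Second parity drive: the XOR along the l-th diagonal of the data, XORed with the bit S, for 0 ≤ l ≤ m-2; in this example S=1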
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 1 1 0
1 0 0 0 0 1 1
0 1 1 1 0 1 0
Decoding of EVENODD Codes
• Observation: the value of S is the XOR of all the bits in the last two (parity) columns
S = 1
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 ? 0 ?
0 0 0 0 ? 1 ?
1 0 0 0 ? 1 ?
0 1 1 1 ? 1 ?
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 ? 0 ?
0 0 0 0 ? 1 ?
1 0 0 0 0 1 ?
0 1 1 1 ? 1 ?
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 ? 0 ?
0 0 0 0 ? 1 ?
1 0 0 0 0 1 1
0 1 1 1 ? 1 ?
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 ?
0 0 0 0 ? 1 ?
1 0 0 0 0 1 1
0 1 1 1 ? 1 ?
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 ? 1 ?
1 0 0 0 0 1 1
0 1 1 1 ? 1 ?
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 ? 1 ?
1 0 0 0 0 1 1
0 1 1 1 0 1 ?
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 ? 1 ?
1 0 0 0 0 1 1
0 1 1 1 0 1 0
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 1 1 ?
1 0 0 0 0 1 1
0 1 1 1 0 1 0
Decoding of EVENODD Codes
0 1 1 0 1
0 0 1 1 0
0 0 0 1 1
1 1 0 1 0
0 1 0 1 1 0 1
0 0 0 0 1 1 0
1 0 0 0 0 1 1
0 1 1 1 0 1 0
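The same two erased columns can be rebuilt along two different zigzag chains, depending on which erased column's imaginary cell the decoder starts from. The sketch below only computes the order in which rows are completed; it assumes diagonals that run down-right modulo m, and the function and parameter names are illustrative.

```python
def zigzag_rows(m, i, j, start_from_first=True):
    """Rows in completion order when data columns i < j are erased.

    start_from_first=True starts on the diagonal through the imaginary
    cell of column i; False starts from the imaginary cell of column j.
    """
    step = (j - i) % m if start_from_first else (i - j) % m
    rows = []
    r = (m - 1 + step) % m
    while r != m - 1:            # stop on reaching the imaginary row again
        rows.append(r)
        r = (r + step) % m
    return rows

first_chain = zigzag_rows(5, 2, 4)                            # [1, 3, 0, 2]
second_chain = zigzag_rows(5, 2, 4, start_from_first=False)   # [2, 0, 3, 1]
```

For m=5 with data columns 2 and 4 erased, the first chain matches the earlier sequence of frames and the second chain matches the sequence that ends here. Running both chains at once is one way to parallelize the rebuild.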