On Coding for Distributed Networked Storage Systems
Frederique Oggier, joint work with Anwitaman Datta
Nanyang Technological University, Singapore
IMS-NTU Workshop on Coding and Cryptography, Singapore, May 2011
F. Oggier (NTU) Coding for Storage IMS-NTU Workshop 1 / 34
Outline
1 Coding for Distributed Networked Storage
2 Self-Repairing Codes: Definition and Constructions
3 Self-Repairing Codes: Analysis and Properties
Coding for Distributed Networked Storage
Distributed Networked Storage
A data owner wants to store data over a network of nodes (e.g. datacenter, back-up or archival in peer-to-peer networks).
Redundancy is essential for resilience
Replication: good availability and durability, but very costly.
Erasure codes: good trade-off between availability, durability and storage cost.
Erasure codes for storage systems
Repair
Nodes may go offline, or may fail, so that the data they store becomes unavailable.
Redundancy needs to be replenished, else data may be permanently lost over time (after multiple storage node failures).
Repair process using traditional Erasure Codes
Related work
1 J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao. OceanStore: An Architecture for Global-Scale Persistent Storage, ASPLOS 2000.
2 H. Weatherspoon, J. Kubiatowicz. Erasure Coding Vs. Replication: A Quantitative Comparison, Peer-to-Peer Systems, LNCS, 2002.
3 A. G. Dimakis, P. Brighten Godfrey, M. J. Wainwright, K. Ramchandran. The Benefits of Network Coding for Peer-to-Peer Storage Systems, Netcod 2007.
4 A. Duminuco, E. Biersack. Hierarchical Codes: How to Make Erasure Codes Attractive for Peer-to-Peer Storage Systems, Peer-to-Peer Computing (P2P), 2008.
5 K. V. Rashmi, N. B. Shah, P. V. Kumar and K. Ramchandran. Explicit Construction of Optimal Exact Regenerating Codes for Distributed Storage, Allerton Conf. on Control, Computing and Comm., 2009.
Self-Repairing Codes: Definition and Constructions
Self-Repairing Codes (SRC)
Motivation: minimize the number of nodes necessary to repair a missing block.
Gain: lower bandwidth consumption, lower computational complexity of repair, possibility for faster and parallel replenishment of lost redundancy.
Self-repairing codes are (n, k) codes such that
encoded fragments can be repaired directly from other subsets of encoded fragments.
a fragment can be repaired from a fixed number of encoded fragments, independently of which specific blocks are missing (analogous to erasure codes supporting reconstruction after any n − k losses, independently of which).
Self-Repairing Codes (a black-box view)
Homomorphic SRC (HSRC)
A first instance of self-repairing code.
Self-repairing Homomorphic Codes for Distributed Storage Systems, F. Oggier, A. Datta, INFOCOM 2011
Preliminaries: Weakly linearized polynomials
A weakly linearized polynomial $p(X)$ over $\mathbb{F}_q$, $q = 2^m$, has the form
$$p(X) = \sum_{i=0}^{k-1} p_i X^{2^i}, \quad p_i \in \mathbb{F}_q.$$
Let $a, b \in \mathbb{F}_{2^m}$ and let $p(X)$ be a weakly linearized polynomial given by $p(X) = \sum_{i=0}^{k-1} p_i X^{2^i}$. We have
$$p(a + b) = p(a) + p(b).$$
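The additivity property can be checked exhaustively on a small field. The following is a minimal sketch, assuming the field is GF(2^4) built with w^4 = w + 1 and that elements are stored as 4-bit integers; the helper names (`gf_mul`, `gf_pow`, `eval_linearized`) and the coefficient values are illustrative, not taken from the talk.

```python
from itertools import product

MODULUS = 0b10011  # x^4 + x + 1, i.e. w^4 = w + 1 (assumed field representation)

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^4) elements stored as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a & 0b10000:          # reduce as soon as the degree reaches 4
            a ^= MODULUS
    return result

def gf_pow(a: int, e: int) -> int:
    """Raise a to the e-th power by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def eval_linearized(coeffs, x):
    """Evaluate p(x) = sum_i coeffs[i] * x^(2^i)."""
    acc = 0
    for i, c in enumerate(coeffs):
        acc ^= gf_mul(c, gf_pow(x, 2 ** i))
    return acc

coeffs = [0b0011, 0b0101, 0b1000]  # arbitrary illustrative coefficients p_0, p_1, p_2

# p(a + b) == p(a) + p(b) for every pair of field elements
is_additive = all(
    eval_linearized(coeffs, a ^ b)
    == eval_linearized(coeffs, a) ^ eval_linearized(coeffs, b)
    for a, b in product(range(16), repeat=2)
)

# Contrast: the ordinary monomial X^3 is not additive
cube_is_additive = all(
    gf_pow(a ^ b, 3) == gf_pow(a, 3) ^ gf_pow(b, 3)
    for a, b in product(range(16), repeat=2)
)

print(is_additive, cube_is_additive)
```

The check succeeds because only the exponents 1, 2, 4, ... appear, and squaring is additive in characteristic 2.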
HSRC: Encoding
1 Take an object $o$ of length $M$:
$$o = (o_1, \ldots, o_k), \quad o_i \in \mathbb{F}_{2^{M/k}}.$$
2 Take a linearized polynomial with coefficients in $\mathbb{F}_{2^{M/k}}$
$$p(X) = \sum_{i=0}^{k-1} p_i X^{2^i},$$
and encode the $k$ fragments as its coefficients: $p_i = o_{i+1}$, $i = 0, \ldots, k - 1$.
3 Evaluate $p(X)$ at $n$ non-zero values $\alpha_1, \ldots, \alpha_n$ of $\mathbb{F}_{2^{M/k}}$ to get an $n$-dimensional codeword
$$(p(\alpha_1), \ldots, p(\alpha_n)),$$
and each $p(\alpha_i)$ is given to node $i$ for storage.
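The three encoding steps can be sketched in a few lines. This is an illustration under assumptions: the field is GF(2^4) with w^4 = w + 1, fragments are given directly as 4-bit integers, and the function name `hsrc_encode` and the sample fragment values are hypothetical.

```python
MODULUS = 0b10011  # x^4 + x + 1, i.e. w^4 = w + 1 (assumed field representation)
W = 0b0010         # the generator w as a 4-bit integer

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^4) elements stored as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MODULUS
    return result

def gf_pow(a: int, e: int) -> int:
    """Raise a to the e-th power by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def eval_linearized(coeffs, x):
    """Evaluate p(x) = sum_i coeffs[i] * x^(2^i)."""
    acc = 0
    for i, c in enumerate(coeffs):
        acc ^= gf_mul(c, gf_pow(x, 2 ** i))
    return acc

def hsrc_encode(fragments, points):
    """Step 2: fragments become the coefficients. Step 3: evaluate at each point."""
    return [eval_linearized(fragments, alpha) for alpha in points]

# Step 1: an object of M = 12 bits cut into k = 3 fragments (illustrative values)
fragments = [0b1010, 0b0111, 0b0001]

# n = 7 non-zero evaluation points, here chosen as powers of w
points = [gf_pow(W, e) for e in (0, 1, 2, 4, 5, 8, 10)]

codeword = hsrc_encode(fragments, points)
for node, (alpha, value) in enumerate(zip(points, codeword), start=1):
    print(f"node {node}: alpha={alpha:04b} stores {value:04b}")
```

Since every power of 1 is 1, the fragment stored for the point 1 is simply the XOR of all coefficients, which gives a quick sanity check on the encoder.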
HSRC: Encoding Illustration
HSRC: Decoding and Repair
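1 Decoding is ensured by Lagrange interpolation.
2 Repair: Express $\alpha_i$ in an $\mathbb{F}_2$-basis $B = \{b_1, \ldots, b_{M/k}\}$ of $\mathbb{F}_{2^{M/k}}$, then
$$\alpha_i = \sum_{j=1}^{M/k} \alpha_{ij} b_j, \ \alpha_{ij} \in \mathbb{F}_2 \ \Rightarrow \ p(\alpha_i) = \sum_{j=1}^{M/k} \alpha_{ij} p(b_j).$$
3 Computational cost of a repair: XORs.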
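The repair rule can be illustrated as follows. This is a sketch, assuming GF(2^4) with w^4 = w + 1 and the basis B = {1, w, w^2, w^3}, so that the coordinates of an element are just the bits of its integer form; `repair_from_basis` is a hypothetical helper name and the coefficients are arbitrary.

```python
MODULUS = 0b10011  # x^4 + x + 1, i.e. w^4 = w + 1 (assumed field representation)

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^4) elements stored as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MODULUS
    return result

def gf_pow(a: int, e: int) -> int:
    """Raise a to the e-th power by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def eval_linearized(coeffs, x):
    """Evaluate p(x) = sum_i coeffs[i] * x^(2^i)."""
    acc = 0
    for i, c in enumerate(coeffs):
        acc ^= gf_mul(c, gf_pow(x, 2 ** i))
    return acc

BASIS = [0b0001, 0b0010, 0b0100, 0b1000]  # 1, w, w^2, w^3

def repair_from_basis(alpha, basis_values):
    """Rebuild p(alpha) from the stored values p(b_j) using XORs only.

    basis_values[j] holds p(BASIS[j]); bit j of alpha is the coordinate alpha_ij.
    """
    value = 0
    for j, stored in enumerate(basis_values):
        if (alpha >> j) & 1:
            value ^= stored
    return value

coeffs = [0b1010, 0b0111, 0b0001]  # illustrative fragments used as coefficients
basis_values = [eval_linearized(coeffs, b) for b in BASIS]

# Every non-zero evaluation can be rebuilt without any field multiplication
repairs_match = all(
    repair_from_basis(alpha, basis_values) == eval_linearized(coeffs, alpha)
    for alpha in range(1, 16)
)
print(repairs_match)
```

The repairing node never needs the polynomial itself: it only combines fragments fetched from other nodes.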
HSRC: A toy example (I)
A data file $o = (o_1, \ldots, o_{12})$ of $M = 12$ bits is cut into $k = 3$ fragments ($M/k = 4$)
$$\mathbf{o}_1 = (o_1, \ldots, o_4), \ \mathbf{o}_2 = (o_5, \ldots, o_8), \ \mathbf{o}_3 = (o_9, \ldots, o_{12}) \in \mathbb{F}_{2^4}.$$
$\mathbb{F}_{2^4}^* = \langle w \rangle$, with $w^4 = w + 1$. Encode with the polynomial
$$p(X) = \sum_{i=1}^{4} o_i w^i X + \sum_{i=1}^{4} o_{i+4} w^i X^2 + \sum_{i=1}^{4} o_{i+8} w^i X^4.$$
For $n = 7$, evaluate $p(X)$ at, say, $1, w, w^2, w^4, w^5, w^8, w^{10}$. We get:
$$(p(1), p(w), p(w^2), p(w^4), p(w^5), p(w^8), p(w^{10}))$$
HSRC: A toy example (II)
missing fragment(s) | pairs to reconstruct missing fragment(s)
p(1) | (p(w), p(w^4)); (p(w^2), p(w^8)); (p(w^5), p(w^10))
p(w) | (p(1), p(w^4)); (p(w^2), p(w^5)); (p(w^8), p(w^10))
p(w^2) | (p(1), p(w^8)); (p(w), p(w^5)); (p(w^4), p(w^10))
p(1) and p(w) | (p(w^2), p(w^8)) or (p(w^5), p(w^10)) for p(1); (p(w^8), p(w^10)) or (p(w^2), p(w^5)) for p(w)
p(1), p(w) and p(w^2) | (p(w^5), p(w^10)) for p(1); (p(w^8), p(w^10)) for p(w); (p(w^4), p(w^10)) for p(w^2)
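The pairs in this table can be recomputed mechanically: a pair of live fragments repairs a missing one exactly when their evaluation points add up to the missing point. A minimal sketch, assuming w^4 = w + 1 and identifying each fragment p(w^e) by its exponent e (the names `powers_of_w` and `repair_pairs` are illustrative):

```python
from itertools import combinations

def powers_of_w():
    """Integer form of w^0 .. w^14 in GF(2^4) with w^4 = w + 1."""
    table, x = [], 1
    for _ in range(15):
        table.append(x)
        x <<= 1                  # multiply by w
        if x & 0b10000:
            x ^= 0b10011         # reduce with x^4 + x + 1
    return table

W_POW = powers_of_w()
STORED = [0, 1, 2, 4, 5, 8, 10]  # exponents e of the n = 7 stored fragments p(w^e)

def repair_pairs(missing, stored=STORED):
    """For each missing exponent, list the live pairs (a, b) with w^a + w^b = w^missing."""
    live = [e for e in stored if e not in missing]
    return {
        m: [(a, b) for a, b in combinations(live, 2)
            if W_POW[a] ^ W_POW[b] == W_POW[m]]
        for m in missing
    }

for missing in ([0], [1], [2], [0, 1], [0, 1, 2]):
    print(missing, repair_pairs(missing))
```

Because p is additive, p(w^a) XOR p(w^b) equals p(w^a + w^b), so the search only involves the evaluation points and not the stored data.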
Self-Repairing Codes from Projective Geometry (PSRC)
A second instance of self-repairing code.
Self-Repairing Codes for Distributed Storage - A Projective Geometric Construction, F. Oggier, A. Datta, preprint 2011
Preliminaries: Spreads
Consider a vector space of dimension $m$ over $\mathbb{F}_q$, namely, a projective space $PG(m - 1, q)$.
Let $P$ be a projective space. A $t$-spread of $P$ is a set $S$ of $t$-dimensional subspaces of $P$ which partitions $P$.
Theorem (Andre)
In $PG(m - 1, q)$, a $t$-spread exists if and only if $(t + 1) \mid m$.
Spreads from Field Extensions
Suppose that $(t + 1) \mid m$. Consider the finite fields $F_0 = \mathbb{F}_q$, $F_1 = \mathbb{F}_{q^{t+1}}$ and $F_2 = \mathbb{F}_{q^m}$.
Then $F_0 \subseteq F_1 \subseteq F_2$. The field $F_2$ is an $m$-dimensional vector space $V$ over $F_0$.
The subspaces of $V$ form the projective space $P = PG(m - 1, q)$. The field $F_1$ is a $(t + 1)$-dimensional subspace of $V$ and hence a $t$-dimensional (projective) subspace of $P$.
The same holds for all cosets $aF_1$ ($a \in F_2$, $a \neq 0$). These cosets partition the multiplicative group of $F_2$. Hence they form a $t$-spread of $P$.
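The coset construction can be carried out explicitly for the smallest interesting case. The sketch below assumes q = 2, t + 1 = 2 and m = 4, with GF(2^4) represented through w^4 = w + 1; the subfield with 4 elements is located as the set of solutions of x^3 = 1 together with 0. All names are illustrative.

```python
from itertools import combinations

MODULUS = 0b10011  # x^4 + x + 1, i.e. w^4 = w + 1 (assumed field representation)

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^4) elements stored as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MODULUS
    return result

# Non-zero elements of the subfield F_1 with 4 elements: the solutions of x^3 = 1
F1_STAR = [x for x in range(1, 16) if gf_mul(gf_mul(x, x), x) == 1]

# Cosets a * F_1 (non-zero part), one per spread element; duplicates collapse in the set
spread = {frozenset(gf_mul(a, x) for x in F1_STAR) for a in range(1, 16)}

# The cosets partition the 15 non-zero elements of the big field
covers_everything = set().union(*spread) == set(range(1, 16))
is_partition = covers_everything and sum(len(block) for block in spread) == 15

# Each coset, together with 0, is closed under addition (XOR): a 2-dimensional subspace
blocks_are_subspaces = all(
    (x ^ y) in block for block in spread for x, y in combinations(block, 2)
)

print(len(spread), is_partition, blocks_are_subspaces)
```

With these parameters the spread has (2^4 − 1)/(2^2 − 1) = 5 elements, each a line of PG(3, 2) containing 3 points.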
PSRC: Encoding
1 For an object $o$ of size $M$, consider the finite field $\mathbb{F}_{2^M}$.
2 Consider a $t$-spread $S$ formed of $t$-dimensional subspaces of $P$ such that $(t + 1) \mid M$. Set $\alpha = t + 1$. Assign to each node an $\mathbb{F}_2$-basis containing $\alpha$ vectors. The number of storage nodes is (at most)
$$n = \frac{2^M - 1}{2^\alpha - 1}.$$
3 The $i$th node will actually store
$$\{o v^T_{i\alpha+1}, \ldots, o v^T_{(i+1)\alpha}\}$$
for a total storage of $\alpha$.
PSRC: Decoding and Repair
1 Decoding is solving a system of linear equations.
2 Repair: The $l$th node $N_l$ stores $\nu_l \mathbb{F}_{2^\alpha}^*$, $l = 1, \ldots, n$. Let us assume this $l$th node fails, and a newcomer $N_i$ joins. Contact the $j$th node $N_j$ such that $\nu_j = \nu_i + \nu_l$. By combining the data stored at nodes $N_i$ and $N_j$, we get
$$\nu_i \mathbb{F}_{2^\alpha}^* \sqcup (\nu_i + \nu_l) \mathbb{F}_{2^\alpha}^*$$
which contains $\nu_l \mathbb{F}_{2^\alpha}^*$.
Lemma
For any choice of node $N_i$ among the remaining $n - 1$ live nodes, there exists at least one node $N_j$ such that $N_l$ can be repaired by downloading the data stored at nodes $N_i$ and $N_j$.
PSRC: A toy example
node | basis vectors | data stored
N1 | v1 = (1000), v2 = (0110) | {o1, o2 + o3}
N2 | v3 = (0100), v4 = (0011) | {o2, o3 + o4}
N3 | v5 = (0010), v6 = (1101) | {o3, o1 + o2 + o4}
N4 | v7 = (0001), v8 = (1010) | {o4, o1 + o3}
N5 | v9 = (1100), v10 = (0101) | {o1 + o2, o2 + o4}
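The toy example can be replayed in code: compute what each node stores for a sample object, check that the five subspaces form a spread, and list which pairs of live nodes can rebuild a lost node. This is a sketch under assumptions: vectors are written as 4-bit integers with o1 as the leftmost bit, the sample object is arbitrary, and the helper names are hypothetical.

```python
from itertools import combinations

# Basis vectors per node, copied from the table (o1 is the most significant bit)
NODES = {
    "N1": (0b1000, 0b0110),
    "N2": (0b0100, 0b0011),
    "N3": (0b0010, 0b1101),
    "N4": (0b0001, 0b1010),
    "N5": (0b1100, 0b0101),
}

def stored_bit(obj: int, v: int) -> int:
    """Inner product o . v^T over F_2."""
    return bin(obj & v).count("1") % 2

def span(vectors):
    """All F_2-linear combinations of the given vectors (including 0)."""
    elems = {0}
    for v in vectors:
        elems |= {x ^ v for x in elems}
    return elems

def is_spread(nodes) -> bool:
    """True if the non-zero parts of the node subspaces partition the 15 non-zero vectors."""
    seen = set()
    for basis in nodes.values():
        block = span(basis) - {0}
        if seen & block:
            return False
        seen |= block
    return seen == set(range(1, 16))

def helper_pairs(lost, nodes):
    """Pairs of live nodes whose stored vectors span the vectors of the lost node."""
    live = [name for name in nodes if name != lost]
    return [(a, b) for a, b in combinations(live, 2)
            if set(nodes[lost]) <= span(nodes[a] + nodes[b])]

obj = 0b1011  # illustrative object (o1, o2, o3, o4) = (1, 0, 1, 1)
data = {name: [stored_bit(obj, v) for v in basis] for name, basis in NODES.items()}

print(data)
print(is_spread(NODES))
print(helper_pairs("N1", NODES))
```

In this small instance any two live nodes jointly span the whole space, so every pair qualifies as helpers; larger parameters are where the choice of N_j in the lemma matters.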
Self-Repairing Codes: Analysis and Properties
Static resilience
There is at least one pair to repair a node, for up to (n − 1)/2 simultaneous failures
Static resilience of a distributed storage system is the probability that an object stored in the system stays available without any further maintenance, even when a fraction of nodes become unavailable.
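For the erasure-code baseline, static resilience has a simple closed form under two assumptions that are not stated on the slide: the code is maximum distance separable (any k of the n fragments suffice), and fragments are available independently with the same probability. The function name `ec_static_resilience` is illustrative, and the self-repairing codes themselves require a different analysis that is not reproduced here.

```python
from math import comb

def ec_static_resilience(n: int, k: int, p_frag: float) -> float:
    """Probability that at least k of n fragments are available.

    Assumes an MDS (n, k) erasure code and independent fragment
    availability with probability p_frag.
    """
    return sum(
        comb(n, i) * p_frag ** i * (1 - p_frag) ** (n - i)
        for i in range(k, n + 1)
    )

for p in (0.1, 0.2, 0.4, 0.8):
    print(f"p_frag={p:.1f}  EC(31,5): {ec_static_resilience(31, 5, p):.4f}")
```

Sweeping `p_frag` produces the kind of baseline curve against which a self-repairing code with the same (n, k) can be compared.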
Static resilience: HSRC versus EC
[Figure, two panels, each plotting p_obj (0 to 1) against p_frag (0 to 0.8). First panel: simulated (sim) and analytical (ana) curves for SRC(31,5) and SRC(15,4). Second panel: EC(63,5), SRC(63,5), EC(31,5), SRC(31,5).]
Figure: Static resilience of self-repairing codes (SRC): validation of analysis, and comparison with erasure codes (EC)
Static resilience: PSRC versus EC
[Figure: p_obj (0 to 1) plotted against p_node (0 to 100). Curves: PSRC(21,4), EC(21,4).]
Figure: Static resilience of self-repairing codes (PSRC): comparison with erasure codes (EC)
More on Resilience: HSRC versus EC
[Figure: ρ_x (0 to 1) plotted against x (0 to 30). Curves: HSRC(31,5), EC(31,5).]
Figure: Static resilience of self-repairing codes (HSRC): comparison with erasure codes (EC)
More on Resilience: PSRC versus EC
[Figure: ρ_x (0 to 1) plotted against x (0 to 20). Curves: PSRC(21,4), EC(21,4).]
Figure: Static resilience of self-repairing codes (PSRC): comparison with erasure codes (EC)
Fast & parallel repairs using HSRC: A toy example
Consider:
(15,4) code, nodes storing p(w^i) for i = 0, 1, 2, 3, 4, 5, 6 are missing
Nodes have upload/download bandwidth limit: one block per time unit
Possible pairs to repair each missing block:
fragment | suitable pairs to reconstruct
p(1) | (p(w^7), p(w^9)); (p(w^11), p(w^12))
p(w) | (p(w^7), p(w^14)); (p(w^8), p(w^10))
p(w^2) | (p(w^7), p(w^12)); (p(w^9), p(w^11)); (p(w^13), p(w^14))
p(w^3) | (p(w^8), p(w^13)); (p(w^10), p(w^12))
p(w^4) | (p(w^9), p(w^14)); (p(w^11), p(w^13))
p(w^5) | (p(w^7), p(w^13)); (p(w^12), p(w^14))
p(w^6) | (p(w^7), p(w^10)); (p(w^8), p(w^14))
A parallelized schedule:
node | p(w^0) | p(w^1) | p(w^2) | p(w^3) | p(w^4) | p(w^5) | p(w^6)
Time 1 | p(w^7) | p(w^8) | p(w^9) | p(w^13) | p(w^11) | p(w^12) | p(w^10)
Time 2 | p(w^9) | p(w^10) | p(w^11) | p(w^8) | p(w^13) | p(w^14) | p(w^7)
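The schedule can be checked against both constraints: each pair of downloads must add up to the missing evaluation point, and no live node may upload more than one block per time unit. A minimal sketch, assuming w^4 = w + 1 and identifying each fragment p(w^e) by its exponent e; `schedule_is_valid` and `candidate_pairs` are illustrative names.

```python
from itertools import combinations

def powers_of_w():
    """Integer form of w^0 .. w^14 in GF(2^4) with w^4 = w + 1."""
    table, x = [], 1
    for _ in range(15):
        table.append(x)
        x <<= 1                  # multiply by w
        if x & 0b10000:
            x ^= 0b10011         # reduce with x^4 + x + 1
    return table

W_POW = powers_of_w()
MISSING = list(range(7))         # exponents of the lost fragments
LIVE = list(range(7, 15))        # exponents of the surviving fragments

# Schedule from the table: missing exponent -> (download at time 1, download at time 2)
SCHEDULE = {0: (7, 9), 1: (8, 10), 2: (9, 11), 3: (13, 8),
            4: (11, 13), 5: (12, 14), 6: (10, 7)}

def candidate_pairs(target):
    """All live pairs whose evaluation points add up to w^target."""
    return [(a, b) for a, b in combinations(LIVE, 2)
            if W_POW[a] ^ W_POW[b] == W_POW[target]]

def schedule_is_valid(schedule) -> bool:
    # Each pair of downloads must reconstruct its target
    for target, (first, second) in schedule.items():
        if W_POW[first] ^ W_POW[second] != W_POW[target]:
            return False
    # Within one time unit, each live node uploads at most one block
    for slot in (0, 1):
        uploads = [pair[slot] for pair in schedule.values()]
        if len(set(uploads)) != len(uploads):
            return False
    return True

for target in MISSING:
    print(f"p(w^{target}):", candidate_pairs(target))
print("schedule valid:", schedule_is_valid(SCHEDULE))
```

All seven repairs finish in two time units because each missing node downloads from two distinct live nodes and the uploads are spread so that no live node is asked twice in the same slot.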
Systematic Object Retrieval using PSRC: A toy example
node | basis vectors | data stored
N1 | v1 = (1000), v2 = (0110) | {o1, o2 + o3}
N2 | v3 = (0100), v4 = (0011) | {o2, o3 + o4}
N3 | v5 = (0010), v6 = (1101) | {o3, o1 + o2 + o4}
N4 | v7 = (0001), v8 = (1010) | {o4, o1 + o3}
N5 | v9 = (1100), v10 = (0101) | {o1 + o2, o2 + o4}
Future/ongoing work
Efficient decoding, other instances of SRC
Implementation & integration in a distributed storage system
Various systems/algorithmic issues: topology-optimized placement, repair scheduling
Wrap Up
Design of codes for distributed networked storage
Self-Repairing Codes
New research topic in coding theory!
Q&A
More information: http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage/
Contact: {frederique,anwitaman}@ntu.edu.sg