
arXiv:1009.2556v2 [cs.IT] 27 Apr 2011

Securing Dynamic Distributed Storage Systems against Eavesdropping and Adversarial Attacks

Sameer Pawar, Salim El Rouayheb, Member, IEEE, and Kannan Ramchandran, Fellow, IEEE

Abstract: We address the problem of securing distributed storage systems against eavesdropping and adversarial attacks. An important aspect of these systems is node failures over time, necessitating, thus, a repair mechanism in order to maintain a desired high system reliability. In such dynamic settings, an important security problem is to safeguard the system from an intruder who may come at different time instances during the lifetime of the storage system to observe and possibly alter the data stored on some nodes. In this scenario, we give upper bounds on the maximum amount of information that can be stored safely on the system. For an important operating regime of the distributed storage system, which we call the bandwidth-limited regime, we show that our upper bounds are tight and provide explicit code constructions. Moreover, we provide a way to short list the malicious nodes and expurgate the system.

Index Terms: Byzantine adversary, Distributed Storage, Network Codes, Secrecy.

I. INTRODUCTION

Distributed storage systems (DSS) consist of a collection of n data storage nodes, typically individually unreliable, that are collectively used to reliably store data files over long periods of time. Applications of such systems are innumerable and include large data centers and peer-to-peer file storage systems such as OceanStore [1], Total Recall [2] and DHash++ [3] that use a large number of nodes spread widely across the Internet. To satisfy important requirements such as data reliability and load balancing, it is desirable for the system to be designed to enable a user, also referred to as a data collector, to download a file stored on the DSS by connecting to a smaller number k, k < n, of nodes. An important design problem for such systems arises from the individual unreliability of the system nodes due to many reasons, such as disk failures (often due to the use of inexpensive “commodity” hardware) or peer “churning” in peer-to-peer storage systems. In order to maintain a high system reliability, the data is stored redundantly across the storage nodes. Moreover, the system is repaired every time a node fails by replacing it with a new node that connects to d other nodes and downloads data to replace the lost one.

This research was funded by an NSF grant (CCF-0964018), a DTRA grant (HDTRA1-09-1-0032), and in part by an AFOSR grant (FA9550-09-1-0120).

Sameer Pawar is with the Wireless Foundation, Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94704 USA (e-mail: [email protected]).

Salim El Rouayheb is with the Wireless Foundation, Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94704 USA (e-mail: [email protected]).

K. Ramchandran is with the Wireless Foundation, Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94704 USA (e-mail: [email protected]).

Codes for protecting data from erasures have been well studied in classical channel coding theory, and can be used here to increase the reliability of distributed storage systems. Fig. 1 illustrates an example where a (4, 2) maximal distance separable (MDS) code is used to store a file F of 4 symbols $(a_1, a_2, b_1, b_2) \in \mathbb{F}_5^4$ distributively on n = 4 different nodes, $v_1, \dots, v_4$, each having a storage capacity of two symbols. The (4, 2) MDS code ensures that a data collector connecting to any k = 2 storage nodes, out of n = 4, can reconstruct the whole file F. However, what distinguishes the scenario here from the erasure channel counterpart is that, in the event of a node failure, the system needs to be repaired by replacing the failed node with a new one. A straightforward repair mechanism would be to add a replacement node that connects to k = 2 other nodes, downloads the whole file, reconstructs the lost part of the data and stores it. One drawback of this solution is the relatively high repair bandwidth, i.e., the total amount of data downloaded by the new replacement node. For this straightforward repair scheme, the repair bandwidth is equal to the size of the file F which can be large in general. A more efficient repair scheme that requires less repair bandwidth is depicted in Fig. 1 where node $v_1$ fails and is replaced by node $v_5$. By making node $v_5$ connect to d = 3 nodes instead of k = 2, it is possible to decrease the total repair bandwidth from 4 to 3 symbols. Note that, in the proposed repair solution, $v_5$ does not store the exact data that was on $v_1$; the only required property is that the data stored on all the surviving nodes $v_2$, $v_3$, $v_4$ and $v_5$ form a (4, 2) MDS code. The above important observations were the basis of the original work of [4] where the authors showed that there exists a fundamental tradeoff between the storage capacity at each node and the repair bandwidth. They also introduced and constructed regenerating codes as a new class of codes that generalizes classical erasure codes and permits the operation of a DSS at any operational point on the optimal tradeoff curve.

[Figure 1: the file F is stored on nodes $v_1$ $(a_1, a_2)$, $v_2$ $(b_1, b_2)$, $v_3$ $(a_1 + b_1, a_2 + b_2)$ and $v_4$ $(a_1 + 2b_1, 2a_2 + b_2)$; the replacement node $v_5$ stores $(a_1 + a_2, a_1 + 4a_2)$; a data collector DC is attached; edges are labeled with capacities 1 and 2.]

Fig. 1. An example of a distributed data storage system under repair. A file F of 4 symbols $(a_1, a_2, b_1, b_2) \in \mathbb{F}_5^4$ is stored on four nodes using a (4, 2) MDS code. Node $v_1$ fails and is replaced by a new node $v_5$ that downloads $(b_1 + b_2)$, $(a_1 + a_2 + b_1 + b_2)$ and $(a_1 + 4a_2 + 2b_1 + 2b_2)$ from nodes $v_2$, $v_3$ and $v_4$ respectively to compute and store $(a_1 + a_2, a_1 + 4a_2)$. Nodes $v_2, \dots, v_5$ form a new (4, 2) MDS code. The edges in the graph are labeled by their capacities. The figure also depicts a data collector connecting to nodes $v_2$ and $v_4$ to recover the stored file.
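The repair step of Fig. 1 can be checked numerically. The following is a minimal sketch, assuming symbols are represented as integers modulo 5; the function names `stored`, `repair` and `any_two_nodes_suffice` are illustrative and do not come from the paper. It rebuilds the content of $v_5$ from one symbol downloaded from each of $v_2$, $v_3$, $v_4$, and verifies by brute force over all 625 files that any two of the nodes $v_2, \dots, v_5$ determine the file.

```python
from itertools import combinations, product

P = 5  # symbols are elements of the prime field F_5 (integers mod 5)


def stored(file):
    """Contents of the surviving nodes v2, v3, v4 for a file (a1, a2, b1, b2)."""
    a1, a2, b1, b2 = file
    return {
        "v2": (b1 % P, b2 % P),
        "v3": ((a1 + b1) % P, (a2 + b2) % P),
        "v4": ((a1 + 2 * b1) % P, (2 * a2 + b2) % P),
    }


def repair(nodes):
    """Content of v5 computed from one downloaded symbol per helper node."""
    x2 = (nodes["v2"][0] + nodes["v2"][1]) % P      # b1 + b2
    x3 = (nodes["v3"][0] + nodes["v3"][1]) % P      # a1 + a2 + b1 + b2
    x4 = (nodes["v4"][0] + 2 * nodes["v4"][1]) % P  # a1 + 4a2 + 2b1 + 2b2
    # Subtract the v2 contribution to isolate (a1 + a2, a1 + 4a2).
    return ((x3 - x2) % P, (x4 - 2 * x2) % P)


def any_two_nodes_suffice():
    """Brute-force check that every pair of {v2, v3, v4, v5} determines the file."""
    files = list(product(range(P), repeat=4))
    for first, second in combinations(["v2", "v3", "v4", "v5"], 2):
        observations = set()
        for file in files:
            nodes = stored(file)
            nodes["v5"] = repair(nodes)
            observations.add(nodes[first] + nodes[second])
        if len(observations) != len(files):  # two files look identical
            return False
    return True


if __name__ == "__main__":
    print(repair(stored((1, 2, 3, 4))))  # (a1 + a2, a1 + 4a2) mod 5
    print(any_two_nodes_suffice())
```

Only three symbols cross the network during this repair, compared with four for the download-everything scheme, which is the bandwidth saving described above.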

When a distributed data storage system is formed using nodes widely spread across the Internet, e.g., peer-to-peer systems, individual nodes may not be secure and may be thus susceptible to an intruder that can eavesdrop on the nodes and possibly modify their data, e.g., viruses, botnet, etc. In this work, we address the issue of securing dynamic distributed storage systems, with nodes continually leaving and joining the system, against such intruders. The dynamic behavior of the system can jeopardize the data by making the intruder more powerful. For instance, while eavesdropping on a new node during the repair process, the intruder can observe not only its stored content but also all its downloaded data. Moreover, it allows an adversary to introduce errors on nodes beyond his/her control by sending erroneous messages when contacted for repair.

In our analysis, we focus on three different types of intruders: (i) a passive eavesdropper who can eavesdrop on ℓ nodes in the system, (ii) an active omniscient adversary who has complete knowledge of the data stored in the system and can maliciously modify the data on any b nodes in the system, and (iii) an active limited-knowledge adversary who can eavesdrop on any ℓ nodes and can maliciously corrupt the data on any b nodes among the ℓ observed ones. In the last case, the intruder’s knowledge about the stored data in the system is limited to what can be inferred from the nodes he/she is observing.

We define the secrecy and resiliency capacities of a distributed storage system as the maximum amount of information that it can store safely, respectively, in the presence of an eavesdropper or a malicious adversary. For these intruder scenarios, we derive general upper bounds on the secrecy and resiliency capacity of the system. Motivated by system considerations, we define an important operation regime that we call the bandwidth-limited regime where there is a fixed allowed budget for the repair bandwidth with no constraints on the node storage capacity. This regime is of increasing importance due to the asymmetry in the cost of bandwidth vs. storage. For the bandwidth-limited regime, we show that our upper bounds are tight and provide explicit constructions of capacity-achieving codes.

The work in this paper is related to the recent work in the literature on secure network coding for networks with restricted wiretapping sets [5] and networks comprising traitor nodes [6]. The problem of studying such networks is known to be much harder in general than models considering (unrestricted) compromised edges instead of nodes. For instance, the work of [5] implies that finding the secrecy capacity of networks with wiretapped nodes is an NP-hard problem. Moreover, non-linear coding at intermediate network nodes may be necessary for securing networks against malicious nodes as shown in [6]. The contribution of this paper resides, at a high level, in showing that the networks representing distributed storage systems have structural symmetry that makes the security problem more tractable than in general networks. We leverage this fact to derive the exact expressions of the secrecy and resiliency capacities of these systems in the important bandwidth-limited regime. Moreover, we present capacity-achieving codes that are linear. These codes are characterized by a separation property: the file to be stored is first encoded for security then stored in the system without any modification to the internal operation of the system nodes. An additional interesting property of our proposed codes is that, in the active adversary case, they permit the identification of a small list of suspected nodes guaranteed to contain the malicious ones, permitting thus the expurgation of the system.

The rest of this paper is organized as follows. In Section II, we discuss related work on distributed storage systems and secure network coding. In Section III, we describe the flow graph model for distributed storage systems and elaborate on the intruder model. We provide a brief summary of our main results in Section IV. In Section V, we derive an upper bound on the secrecy capacity of the system and provide an achievable scheme for the bandwidth-limited regime. We provide a similar analysis for the omniscient and limited-knowledge adversary cases respectively in Section VI and Section VII, where we find upper bounds on the resiliency capacity and construct capacity achieving codes for the bandwidth-limited regime. We conclude the paper in Section VIII and discuss some related open problems.

II. RELATED WORK

The pioneering work of Dimakis et al. in [4], [7], [8] demonstrated the fundamental trade-off between repair bandwidth and storage cost in a distributed storage system, where nodes fail over time and are repaired to maintain a desired system reliability. They also introduced regenerating codes as codes that are more efficient than classical erasure codes for distributed storage applications. In many scenarios of interest, the data is required to exist in the system always in a systematic form. This has motivated the study of exact regenerating codes [9], [10], [11], [12] that achieve this goal by repairing a failed node with an exact copy of the lost data. The construction of exact regenerating codes in [9] turns out to be instrumental in achieving the secrecy and resiliency capacity of a DSS in the bandwidth-limited regime.

In [7], the construction of regenerating codes was linked to finding network codes for a suitable network. Network coding was introduced in the seminal paper of [13] and extends the classical routing approach by allowing the intermediate nodes in the network to encode their incoming packets as opposed to just copying and forwarding them. The literature on network coding is now rich in interesting results, which can be found in references [14] and [15] that provide a comprehensive overview of this area.

In this paper, we are interested in securing distributed storage systems under repair dynamics, which is a special case of the more general problem of achieving security in dynamical systems. A node-based intruder model is natural in this setting and is related to the recent work of [16] on securing distributed storage systems in the presence of a trusted verifier and that of Kosut et al. in [6] on protecting data in networks with traitor nodes. An intruder model that can observe and/or change the data on links, as opposed to nodes, has been extensively studied in the network coding literature. Cai and Yeung introduced in [17], [18] the problem of designing secure network codes in the presence of an eavesdropper, which was further studied in [19], [20], [21], [5]. A Byzantine adversary that can maliciously introduce errors on the network links was investigated in [22], [23], [24], [25], [26]. The problem of error correction in networks was also studied by Cai and Yeung in [27], [28] from a classical coding theory perspective. A different approach for correcting errors in networks was proposed by Koetter and Kschischang in [29], where communication is established by transmitting subspaces instead of vectors through the network. The use of maximum rank-metric codes for error control under this model was investigated in [30].

III. MODEL

A. Distributed Storage System

A distributed storage system (DSS) is a dynamic network of storage nodes. These nodes include a source node that has an incompressible data file F of R symbols, or units, each belonging to a finite field $\mathbb{F}$. The source node is connected to n storage nodes $v_1, \dots, v_n$, each having a storage capacity of α symbols, which may be utilized to save coded parts of the file F. The storage nodes are individually unreliable and may fail over time. To guarantee a certain desired level of reliability, we assume that the DSS is required to always have n active, i.e., non-failed, storage nodes that are simultaneously in service. Therefore, when a storage node fails, it is replaced by a new node with the same storage capacity α. The DSS should be designed in such a way as to allow any legitimate user or data collector, that contacts any k out of the n active storage nodes available at any given time, to be able to reconstruct the original file F. We term this condition as the reconstruction property of distributed storage systems.

We assume that nodes fail one at a time¹, and we denote by $v_{n+i}$ the new replacement node added to the system to repair the i-th failure. The new replacement node connects then to some d nodes, d ≥ k, chosen, possibly randomly, out of the remaining active n − 1 nodes and downloads γ units of data in total from them, which are then possibly compressed (if α < γ) and stored on the node. The data stored on the replacement node can be different than the one that was stored on the failed node, as long as the reconstruction property of the DSS is retained. The process of replenishing redundancy to maintain the reliability of a DSS is referred to as the “regeneration” or “repair” process, and we call γ, the total amount of data (in symbols) downloaded for repair, the repair bandwidth of the system.

¹ Multiple nodes failing simultaneously is a rare event. When this occurs, the DSS implements an “emergency” recovery process that employs a reserved set of trusted nodes, guaranteed not to be compromised. The trusted nodes then replace the failed ones by acting as data collectors and downloading data from k active nodes. The trusted nodes then consecutively leave the system, thus triggering multiple rounds of the repair process.

[Figure 2: flow graph with a source node s, storage nodes $v_1, \dots, v_5$ each split into $x_{\mathrm{in}}^i$ and $x_{\mathrm{out}}^i$ joined by an edge of capacity α, repair edges of capacity β = 1 into $x_{\mathrm{in}}^5$, and a data collector DC.]

Fig. 2. The flow graph model of the DSS D(4, 2, 3) of Fig. 1 when node $v_1$ fails and is replaced by node $v_5$. Each storage node $v_i$ is represented by two nodes $x_{\mathrm{in}}^i$ and $x_{\mathrm{out}}^i$ connected by an edge $(x_{\mathrm{in}}^i, x_{\mathrm{out}}^i)$ of capacity α representing the node storage constraint. A data collector DC connecting to nodes $v_2$ and $v_4$ is also depicted.

Due to load balancing and “fairness” requirements in the system, the repair process is typically symmetric, where the new replacement node downloads an equal amount of data, β = γ/d units, from each of the nodes participating in the repair process. We will adopt the symmetric repair model throughout this paper. A distributed storage system D is thus characterized as D(n, k, d), where k ≤ d ≤ n − 1. For instance, the DSS depicted in Fig. 1 corresponds to D(4, 2, 3) operating at the point (α, γ) = (2, 3).

B. Flow Graph Representation

We adopt the same model as in [4] where the distributed storage system is represented by an information flow graph G. The graph G is a directed acyclic graph with capacity constrained edges. It consists of three kinds of nodes: a single source node s, input storage nodes $x_{\mathrm{in}}^i$ and output storage nodes $x_{\mathrm{out}}^i$, and data collectors $\mathrm{DC}_j$ for $i, j \in \{1, 2, \dots\}$. The source node s holds an information source S having the file F as a special realization. Each storage node $v_i$ in the DSS is represented by two nodes $x_{\mathrm{in}}^i$ and $x_{\mathrm{out}}^i$ in G. To account for the storage capacity of $v_i$, these two nodes are joined by a directed edge $(x_{\mathrm{in}}^i, x_{\mathrm{out}}^i)$ of capacity α (see Fig. 2).

The repair process that is initiated every time a failure occurs, causes the DSS, and consequently the flow graph, to be dynamic and evolving with time. At any given time, each node in the graph is either active or inactive depending on whether it has failed or not. The graph G starts with only the source node s and the nodes $x_{\mathrm{in}}^1, \dots, x_{\mathrm{in}}^n$ connected respectively to the nodes $x_{\mathrm{out}}^1, \dots, x_{\mathrm{out}}^n$. Initially, only the source node s is active and is connected to the storage input nodes $x_{\mathrm{in}}^1, \dots, x_{\mathrm{in}}^n$ by outgoing edges of infinite capacity. From this point onwards, the source node s becomes and remains inactive, and the n input and output storage nodes become active. When a node $v_i$ fails in a DSS, the corresponding nodes $x_{\mathrm{in}}^i$ and $x_{\mathrm{out}}^i$ become inactive in G. If a replacement node $v_j$ joins the DSS in the process of repairing a failure and connects to d active nodes $v_{i_1}, \dots, v_{i_d}$, the corresponding nodes $x_{\mathrm{in}}^j$ and $x_{\mathrm{out}}^j$ with the edge $(x_{\mathrm{in}}^j, x_{\mathrm{out}}^j)$ are added to the flow graph G, and node $x_{\mathrm{in}}^j$ is connected to the nodes $x_{\mathrm{out}}^{i_1}, \dots, x_{\mathrm{out}}^{i_d}$ by incoming edges of capacity β = γ/d units each. A data collector is represented by a node connected to k active storage output nodes through infinite capacity links enabling it to download all their stored data and reconstruct the file F. The graph G constitutes a multicast network with the data collectors as destinations. An underlying assumption here is that the flow graph corresponding to a distributed storage system depends on the sequence of failed nodes. As an example, we depict in Fig. 2 the flow graph corresponding to the DSS D(4, 2, 3) of the previous section (see Fig. 1) when node $v_1$ fails.

Let $\mathcal{V}$ be the set of nodes in the flow graph G. A cut $C(V, \bar{V})$ in the flow graph separating the source s from a data collector $\mathrm{DC}_i$ is a partition of the node set of G into two subsets $V \subset \mathcal{V}$ and $\bar{V} = \mathcal{V} \setminus V$, such that $s \in V$ and $\mathrm{DC}_i \in \bar{V}$. We say that an edge $(n_1, n_2)$ belongs to a cut $C(V, \bar{V})$ if $n_1 \in V$ and $n_2 \in \bar{V}$. The value of a cut is the sum of the capacities of the edges belonging to it.
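By the max-flow min-cut theorem, the smallest cut value separating s from a data collector equals the maximum flow between them, so it can be computed directly on a small instance. The sketch below builds the flow graph of D(4, 2, 3) with α = 2 and β = 1 after $v_1$ fails and $v_5$ joins, and runs a textbook Edmonds-Karp max flow. The names `max_flow`, `fig2_flow_graph` and `BIG` are hypothetical, and `BIG` is only a large finite stand-in for the infinite-capacity edges.

```python
from collections import deque

BIG = 10**9  # assumption: large finite stand-in for "infinite" capacity


def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow; `capacity` maps (u, v) to the edge capacity."""
    residual = dict(capacity)
    neighbours = {}
    for u, v in capacity:
        residual.setdefault((v, u), 0)  # reverse edge for the residual graph
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: flow equals the min cut
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = BIG, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck


def fig2_flow_graph(alpha=2, beta=1, dc_nodes=(2, 4)):
    """Flow graph of D(4, 2, 3) after v1 fails and v5 repairs from v2, v3, v4."""
    cap = {}
    for i in (1, 2, 3, 4):
        cap[("s", f"x{i}_in")] = BIG               # source to initial nodes
        cap[(f"x{i}_in", f"x{i}_out")] = alpha     # storage constraint
    cap[("x5_in", "x5_out")] = alpha
    for i in (2, 3, 4):
        cap[(f"x{i}_out", "x5_in")] = beta         # symmetric repair download
    for i in dc_nodes:
        cap[(f"x{i}_out", "DC")] = BIG             # data collector links
    return cap


if __name__ == "__main__":
    for pair in [(2, 4), (2, 5), (3, 5)]:
        print(pair, max_flow(fig2_flow_graph(dc_nodes=pair), "s", "DC"))
```

For each of the three data collectors tried here the min cut is 4, which matches the file size of 4 symbols stored in the example of Fig. 1.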

C. Intruder Model

We assume the presence of an illegitimate intruder in the DSS who can eavesdrop on some of the storage nodes, and possibly alter the stored data on some of them in order to sabotage the system. We characterize the power of an intruder by two parameters ℓ and b, where ℓ denotes the number of nodes that the intruder can eavesdrop on, and b denotes the number of nodes it can control by maliciously corrupting their data. We distinguish among three categories of intruders: a passive eavesdropper “Eve”, an active omniscient adversary “Calvin”, and an active limited-knowledge adversary “Charlie”. We always assume that all the data collectors and intruders have the complete knowledge of the storage and the repair scheme implemented in the system.

a) Passive Eavesdropper: We assume that the eavesdropper Eve can access up to ℓ, ℓ < k, nodes of her choice among all the storage nodes, $v_1, v_2, \dots$, possibly at different time instances as the system evolves. Eve is passive and can only read the data on the observed ℓ nodes without modifying it, i.e., b = 0. In the flow graph model, Eve is an eavesdropper that can access a fixed number ℓ of nodes chosen from the storage input nodes $x_{\mathrm{in}}^1, x_{\mathrm{in}}^2, \dots$. Notice that while a data collector observes the output storage nodes, i.e., the data stored on the nodes it connects to, Eve has access to the input storage nodes, and thus can observe, in addition to the stored data, all the messages incoming to these nodes. As a result, Eve can choose some of the compromised ℓ nodes to be among the initial n storage nodes, and/or, if she deems it more profitable, she can wait for certain failures to occur and then eavesdrop on the replacement nodes by observing their downloaded data.

b) Active Omniscient Adversary: The active adversary Calvin is omniscient [24], i.e., he knows the file F and the data stored on all the nodes. Moreover, Calvin can control b nodes in total, where 2b < k, that can include some of the original nodes $v_1, \dots, v_n$, and/or some replacement nodes $v_{n+1}, \dots$. Calvin can maliciously alter the data stored on the nodes under his control. He can also send erroneous outgoing messages when contacted for repair or reconstruction. In the flow graph, this corresponds to controlling a set of b input nodes $x_{\mathrm{in}}^{i_1}, x_{\mathrm{in}}^{i_2}, \dots, x_{\mathrm{in}}^{i_b}$ and the corresponding output nodes $x_{\mathrm{out}}^{i_1}, x_{\mathrm{out}}^{i_2}, \dots, x_{\mathrm{out}}^{i_b}$.

c) Active Limited-knowledge Adversary: The active adversary Charlie is not omniscient but has limited knowledge about the data stored in the system. In particular, he has a limited eavesdropping capability ℓ that is not sufficient to know all the stored data. In addition, Charlie can control b nodes of his choice and maliciously corrupt their data. In distributed storage systems, an intruder controlling a node will also observe its data. Therefore, we assume that b ≤ ℓ, and that these b nodes are a subset of the ℓ eavesdropped nodes. In the flow graph, this corresponds to eavesdropping on some ℓ input nodes $x_{\mathrm{in}}^{i_1}, \dots, x_{\mathrm{in}}^{i_\ell}$ and controlling a subset of size b of these nodes and the corresponding output nodes. A similar model was studied in [23], [24], [25] where the authors consider a limited-knowledge adversary that can eavesdrop and control edges rather than nodes in multicast networks.

IV. RESULTS

The primary goal of this work is to secure distributed storage systems with repair dynamics in the presence of different types of intruders: passive eavesdropper, active omniscient adversary and active limited-knowledge adversary. We address the following issues:

• In the case of a passive eavesdropper, we study the secrecy capacity $C_s$ of the DSS, i.e., the maximum amount of data that can be stored on the DSS and delivered to a legitimate data collector without revealing any information about the data to the intruder.

• In the case of an active adversary, we study the resiliency capacity $C_r$ of the DSS, i.e., the maximum amount of data that can be stored on the DSS and reliably made available to a legitimate data collector.

For a DSS with symmetric repair, we provide upper bounds on the secrecy capacity and resiliency capacity. These bounds are maximized for the choice of repair degree d = n − 1. In this case, we provide explicit coding schemes that can achieve these bounds in the bandwidth-limited regime. Our results are summarized in Table I. We also show that for the active adversary controlling b nodes, our capacity achieving schemes can identify a list, of size at most 2b nodes, that is guaranteed to contain the malicious nodes. Thus, the system can be expurgated of these corrupt nodes, and thereby its resiliency to active adversaries is rejuvenated.

TABLE I. Summary of our capacity results for a DSS D(n, k, d), with α units of storage capacity at each node and γ = dβ repair bandwidth. An adversary is characterized by two parameters: ℓ, the number of nodes it can eavesdrop on, and b, the number of nodes it can control. $C_s$ and $C_r$ denote the secrecy capacity and resiliency capacity, respectively. Γ is the upper limit on the repair bandwidth for the bandwidth-limited regime. Note that if the conditions on ℓ, b specified in the first column are not satisfied, then $C_s$, $C_r$ are equal to zero.

| Adversary model | Upper bound (γ = dβ) | Bandwidth-limited regime (Γ), d = n − 1, dβ = Γ |
| Passive eavesdropper (ℓ < k, b = 0) | $C_s(\alpha, \gamma) \le \sum_{i=\ell+1}^{k} \min\{(d-i+1)\beta, \alpha\}$ | $C_s^{BL}(\Gamma) = \sum_{i=\ell+1}^{k} (n-i)\beta$ |
| Active omniscient adversary (ℓ = k, 2b < k) | $C_r(\alpha, \gamma) \le \sum_{i=2b+1}^{k} \min\{(d-i+1)\beta, \alpha\}$ | $C_r^{BL}(\Gamma) = \sum_{i=2b+1}^{k} (n-i)\beta$ |
| Active limited-knowledge adversary (ℓ, b ≤ ℓ) | $C_r(\alpha, \gamma) \le \sum_{i=b+1}^{k} \min\{(d-i+1)\beta, \alpha\}$ | $C_r^{BL}(\Gamma) = \sum_{i=b+1}^{k} (n-i)\beta$ |

The upper bounds in Table I are based on cut arguments over the information flow graph representing the DSS [4]. Note that when there is no intruder, i.e., ℓ = b = 0, all the upper bounds in the second column of Table I collapse to the DSS capacity $M = \sum_{i=1}^{k} \min\{(d-i+1)\beta, \alpha\}$ which was derived in the original work of [4]. The upper bound on the secrecy capacity $C_s$, for the case of a passive eavesdropper, can be explained intuitively by recognizing that when the DSS knows the identity of the ℓ compromised nodes it can discard them and avoid using them for storage. Hence, in the expression of the upper bound on $C_s$, we see a loss of ℓ terms in the summation as compared to the capacity with no intruder.
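All three upper bounds share the same summand and differ only in the first index of the sum, so they can be evaluated with one helper. The sketch below is illustrative only; the function names are hypothetical and the numbers are for the running example D(4, 2, 3) with α = 2 and β = 1. When the first index exceeds k the sum is empty, which yields the value zero for the cases ℓ ≥ k, 2b ≥ k and b ≥ k; the additional requirement b ≤ ℓ of the limited-knowledge model is assumed and not checked.

```python
def cut_bound(d, k, alpha, beta, first):
    """Sum over i = first, ..., k of min((d - i + 1) * beta, alpha)."""
    return sum(min((d - i + 1) * beta, alpha) for i in range(first, k + 1))


def dss_capacity(d, k, alpha, beta):
    """Capacity M with no intruder (ell = b = 0)."""
    return cut_bound(d, k, alpha, beta, first=1)


def secrecy_upper_bound(d, k, alpha, beta, ell):
    """Upper bound on C_s for an eavesdropper observing ell nodes."""
    return cut_bound(d, k, alpha, beta, first=ell + 1)


def resiliency_upper_bound_omniscient(d, k, alpha, beta, b):
    """Upper bound on C_r for an omniscient adversary controlling b nodes."""
    return cut_bound(d, k, alpha, beta, first=2 * b + 1)


def resiliency_upper_bound_limited(d, k, alpha, beta, b):
    """Upper bound on C_r for a limited-knowledge adversary (b <= ell assumed)."""
    return cut_bound(d, k, alpha, beta, first=b + 1)


if __name__ == "__main__":
    d, k, alpha, beta = 3, 2, 2, 1  # the D(4, 2, 3) example at (alpha, gamma) = (2, 3)
    print("M =", dss_capacity(d, k, alpha, beta))
    print("C_s bound, ell = 1:", secrecy_upper_bound(d, k, alpha, beta, ell=1))
    print("C_r bound, omniscient, b = 1:",
          resiliency_upper_bound_omniscient(d, k, alpha, beta, b=1))
    print("C_r bound, limited, b = 1:",
          resiliency_upper_bound_limited(d, k, alpha, beta, b=1))
```

In this small example M = 4, one eavesdropped node halves the bound to 2, and a single omniscient adversarial node already violates 2b < k, so the bound drops to 0.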

The upper bound on the resiliency capacity $C_r$ for the case of an active omniscient adversary is similar to the one derived in [6] and can be regarded as a network version of the Singleton bound: a redundancy of 2b nodes is needed in order to correct the adversarial errors on b nodes. In contrast, a feasible strategy for the limited-knowledge adversary is to delete the data stored on the b nodes it controls, rendering them useless, which results in the corresponding upper bound. Rigorous proofs of these results will be provided in the coming sections.

To get more insight into the above results for the bandwidth-limited case, we consider an asymptotic regime for the DSS where the number of nodes goes to infinity whereas the parameters k, ℓ and b are kept constant. We compute the ratios $C_s^{BL}/M$ and $C_r^{BL}/M$, where M is the capacity of the DSS in the absence of any intruder. This ratio for the secrecy capacity is

$$\frac{C_s^{BL}(\Gamma)}{M} = \frac{\sum_{i=\ell+1}^{k}(n-i)\beta}{\sum_{i=1}^{k}(n-i)\beta} \approx 1 - \frac{\ell}{k}, \tag{1}$$

as n → ∞. Similarly, for the resiliency capacities, we have for the omniscient adversary,

$$\frac{C_r^{BL}(\Gamma)}{M} \approx 1 - \frac{2b}{k}, \tag{2}$$

and for the limited-knowledge adversary,

$$\frac{C_r^{BL}(\Gamma)}{M} \approx 1 - \frac{b}{k}. \tag{3}$$

Note that these asymptotic ratios are reminiscent of the capacity of the classical wiretap channel [31] in the case of a passive eavesdropper (1), the Singleton bound [32] in the case of an omniscient adversary (2), and the capacity of the erasure channel [33] for the case of a limited-knowledge adversary (3).
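The convergence of these ratios can be checked numerically. The following is a minimal sketch, not part of the original derivation; the function names `bl_capacity` and `capacity_ratio` and the parameter `skipped` (standing for ℓ, 2b or b, depending on the adversary model) are illustrative assumptions.

```python
from fractions import Fraction


def bl_capacity(n, k, skipped, beta=1):
    """Bandwidth-limited capacity for d = n - 1: sum_{i=skipped+1}^{k} (n - i) * beta.

    `skipped` is l (passive eavesdropper), 2b (omniscient adversary)
    or b (limited-knowledge adversary); skipped = 0 gives M.
    """
    return sum((n - i) * beta for i in range(skipped + 1, k + 1))


def capacity_ratio(n, k, skipped):
    """Exact ratio C^BL / M, where M is the capacity without an intruder."""
    return Fraction(bl_capacity(n, k, skipped), bl_capacity(n, k, 0))


if __name__ == "__main__":
    k, l = 10, 3
    for n in (20, 200, 20000):
        # The exact ratio approaches the asymptotic value 1 - l/k as n grows.
        print(n, float(capacity_ratio(n, k, l)), 1 - l / k)
```

For a fixed k, the gap between the exact ratio and 1 − ℓ/k shrinks on the order of 1/n, which is consistent with the approximations in (1)-(3).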

V. PASSIVE EAVESDROPPER

In this section, we consider a distributed storage system D(n, k, d) in the presence of a passive intruder “Eve”. As described in Section III, Eve can eavesdrop on any ℓ < k storage nodes² of her choice in order to learn information about the stored file. However, Eve cannot modify the data on these nodes. We assume that Eve has complete knowledge of the storage and repair schemes implemented in the DSS. Next, we define the secrecy capacity of a DSS as the maximum amount of data that can be stored on a DSS under a perfect secrecy requirement, i.e., without revealing any information about it to the eavesdropper.

A. Secrecy Capacity

Let S be a random variable uniformly distributed over $\mathbb{F}_q^R$ representing the incompressible data file of size R symbols at the source node, which is to be stored on the DSS. Thus, we have H(S) = R (in base q). Let $V_{in} := \{x_{in}^1, x_{in}^2, \dots\}$ and $V_{out} := \{x_{out}^1, x_{out}^2, \dots\}$ be the sets of input and output storage nodes in the flow graph, respectively. For each storage node $v_i$, let $D_i$ and $C_i$ be the random variables representing its downloaded messages and stored content, respectively. Thus, $C_i$ represents the data observed by a data collector DC when connecting to node $v_i$. If $v_i$ is compromised while joining the DSS, Eve will observe all its downloaded data $D_i$, with $H(D_i) \le \gamma$, and not only what it stores.

Let $V_{out}^a$ be the collection of all subsets of $V_{out}$ of cardinality k consisting of the nodes that are simultaneously active, i.e., not failed, at a certain instant in time. For any subset B of $V_{out}$, define $C_B := \{C_i : x_{out}^i \in B\}$. Similarly, for any subset E of $V_{in}$, define $D_E := \{D_i : x_{in}^i \in E\}$. The reconstruction property at the data collector can be written as

$$H(S \mid C_B) = 0 \quad \forall B \in V_{out}^a, \tag{4}$$

and the perfect secrecy condition implies

$$H(S \mid D_E) = H(S) \quad \forall E \subset V_{in} \text{ and } |E| \le \ell. \tag{5}$$

² When Eve observes ℓ ≥ k nodes, the secrecy capacity of the system is trivially equal to zero, since Eve can implement the data collector’s scheme to recover all the stored data.


Given a DSS D(n, k, d) with ℓ compromised nodes, its secrecy capacity, denoted by $C_s(\alpha, \gamma)$, is then defined to be the maximum amount of data that can be stored in this system such that the reconstruction property in (4) and the perfect secrecy condition in (5) are simultaneously satisfied for all possible data collectors and eavesdroppers, i.e.,

$$C_s(\alpha, \gamma) := \sup_{\substack{H(S \mid C_B) = 0 \ \forall B \\ H(S \mid D_E) = H(S) \ \forall E}} H(S), \tag{6}$$

where $B \in V_{out}^a$, $E \subset V_{in}$ and $|E| \le \ell$.

B. Special Cases

Before we proceed to the general problem of determining the secrecy capacity of a DSS, we analyze two special cases that shed light on the general problem.

1) Static Systems: A static version of the problem studied here corresponds to a DSS with ideal storage nodes that do not fail, and hence there is no need for repair in the system. The flow graph of this system then constitutes a well-known multicast network studied in network coding theory called the combination network [15, Chap. 4]. Therefore, the static storage problem can be regarded as a special case of wiretap networks [18], [20], or equivalently, as the erasure-erasure wiretap-II channel studied in [34]. The secrecy capacity for such systems is equal to (k − ℓ)α, and can be achieved using either the nested MDS codes of [34] or the coset codes of [20], [31].

Even though the above proposed solution is optimal for the static case, it can have a very poor security performance when applied directly to dynamic storage systems experiencing failures and repairs. For instance, consider the straightforward way of repairing a failed node by downloading the whole file and regenerating the lost data. In this case, if Eve observes the new replacement node while it is downloading the whole file, she will be able to reconstruct the entire original data. Hence, no secrecy scheme will be able to hide any part of the data from Eve, and the secrecy rate would be zero.

The case of static systems highlights the new dimension that the repair process brings into the secrecy picture of distributed storage systems. The dynamic nature of the DSS renders it intrinsically different from the static counterpart, making the repair process a key factor that should be carefully designed in order not to jeopardize the whole stored data.

2) Systems Using Random Network Coding: Using the flow graph model, the authors of [4] showed that random linear network codes over a large finite field can achieve any point (α, γ) on the optimal storage-repair bandwidth tradeoff curve with a high probability. Consider an example of a random linear network code used in a compromised DSS D(4, 3, 3) which stores a file of size R = 6 symbols with β = 1, i.e., γ = dβ = 3, and α = 3. From [4], it can be shown using the max-flow min-cut theorem that the maximum file size that can be stored on this DSS is equal to 6 symbols. In this case, each of the initial nodes $v_1, \dots, v_4$ stores 3 independently generated random linear combinations of the 6 information symbols. Assume now that node $v_4$ fails (see Fig. 3) and is replaced by a new node $v_5$ that connects to $v_1, v_2, v_3$ and downloads from each β = 1 random linear combination of their stored data. Now suppose that node $v_5$ fails after some time and is replaced by node $v_6$ in a similar fashion. If ℓ = 2 and Eve had accessed nodes $v_5$ and $v_6$ while they were being repaired, she would observe 6 random linear equations of the data symbols. Since the underlying field is typically of large size, the 6 linear equations observed by Eve are linearly independent with high probability. Hence, she will be able to reconstruct the whole file, and the secrecy rate here is equal to 0. Later in Example 3 we present a scheme that achieves a secrecy rate of 1 unit for this DSS.

Fig. 3. The DSS D(4, 3, 3) with (α, γ) = (3, 3), i.e., β = 1. Eve can observe ℓ = 2 nodes. Node $v_4$ fails and is replaced by node $v_5$, which fails in turn after some time and is replaced by node $v_6$. Nodes $v_5$ and $v_6$ are compromised and shown with broken boundaries. If random network coding is used and Eve observes nodes $v_5$ and $v_6$ during repair, she will be able to decode all the stored data with a high probability.
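The claim that Eve's six observed equations are linearly independent with high probability can be illustrated by a small simulation. This is a sketch under stated assumptions: the field GF(p) with p = 2^31 − 1, the helper names `rank_mod_p`, `random_combo` and `eve_observation_rank`, and the choice that $v_6$ is repaired from $v_1, v_2, v_3$ are illustrative and not taken from the paper. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import random

P = 2_147_483_647  # prime field size (2^31 - 1); an illustrative choice


def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p), computed by Gaussian elimination."""
    rows = [r[:] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] % p:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def random_combo(vectors, rng, p=P):
    """Random GF(p)-linear combination of the given coefficient vectors."""
    coeffs = [rng.randrange(p) for _ in vectors]
    return [sum(c * v[j] for c, v in zip(coeffs, vectors)) % p
            for j in range(len(vectors[0]))]


def eve_observation_rank(seed=0, file_size=6, alpha=3):
    """Rank of the 6 equations Eve sees while v5 and v6 are being repaired."""
    rng = random.Random(seed)
    # Each stored symbol is represented by its coefficient vector over the file.
    store = {i: [[rng.randrange(P) for _ in range(file_size)] for _ in range(alpha)]
             for i in (1, 2, 3, 4)}
    # v4 fails; v5 downloads beta = 1 random combination from each of v1, v2, v3.
    d5 = [random_combo(store[i], rng) for i in (1, 2, 3)]
    # v5 fails; v6 is repaired the same way (assumed here: again from v1, v2, v3).
    d6 = [random_combo(store[i], rng) for i in (1, 2, 3)]
    return rank_mod_p(d5 + d6)


if __name__ == "__main__":
    print(eve_observation_rank())  # rank 6 means Eve can solve for the whole file
```

A rank of 6 means the observed combinations determine all 6 file symbols, so no part of the file remains hidden in this scenario.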

While random network codes are appealing for use in distributed storage systems due to their decentralized nature and low complexity, the above analysis shows that they are not always suitable for achieving security. This is also in contrast with the case of multicast networks where an intruder can observe a fixed number of edges instead of nodes [18], wherein random network coding performs as well as any deterministic secure code [21].

C. Results on Passive Eavesdropper

We present here our two main results for the compromised DSS with a passive eavesdropper:

Theorem 1: [Secrecy Capacity Upper Bound] For a distributed storage system D(n, k, d) with ℓ < k compromised nodes, the secrecy capacity is upper bounded by

$$C_s(\alpha, \gamma) \le \sum_{i=\ell+1}^{k} \min\{(d-i+1)\beta, \alpha\}, \tag{7}$$

where β = γ/d.

Fig. 4. The flow graph of the DSS D(4, 3, 3) with (α, γ) = (3, 3), β = 1 and ℓ = 2. Node $v_3$ fails and is replaced by node $v_5$. Nodes $v_1, v_2$ are compromised to Eve and are shown with broken boundaries. A data collector DC connects to nodes $v_1, v_2, v_5$ to retrieve the data file. The data collector can get at most one unit of information securely on the path $(s, x_{in}^4, x_{out}^4, x_{in}^5, x_{out}^5, DC)$, which is not observed by Eve.

In the bandwidth-limited regime, we have a constraint on the repair bandwidth γ ≤ Γ, while no constraint is imposed on the node storage capacity α. The secrecy capacity in this regime is thus defined as

$$C_s^{BL}(\Gamma) := \sup_{\substack{\gamma \le \Gamma \\ \alpha \ge 0}} C_s(\alpha, \gamma) \tag{8}$$

$$\le \sup_{\gamma \le \Gamma} \sum_{i=\ell+1}^{k} (d-i+1)\beta. \tag{9}$$

The last inequality follows from Theorem 1 by setting α = Γ. When the parameter d is a system design choice, the maximum in the above optimization is attained at $d^* = n - 1$. In Section V-D, we demonstrate a scheme that achieves this upper bound, thereby establishing the following theorem.

Theorem 2: [Secrecy Capacity: Bandwidth-Limited Regime] For a distributed data storage system D(n, k, d) with d = n − 1 and ℓ < k compromised nodes, the secrecy capacity in the bandwidth-limited regime is given by

$$C_s^{BL}(\Gamma) = \sum_{i=\ell+1}^{k} (n-i)\beta,$$

where $\beta = \frac{\Gamma}{n-1}$, and can be achieved for a node storage capacity α = Γ.

Before we proceed to prove the above theorems, we consider an example that gives insights into the proof techniques.

Example 3: Consider again the DSS D(4, 3, 3) operating at α = 3, β = 1 and ℓ = 2 of Section V-B2. We show first that the upper bound on the secrecy capacity of this system is 1 as given by Theorem 1, and then provide a scheme that achieves it.
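The value of the bound for this example, and the fact that the bandwidth-limited bound grows with the repair degree d, can be checked with a few lines of code. This is a minimal sketch; the function name `secrecy_upper_bound` and the sample parameters in the usage loop are assumptions made for illustration.

```python
from fractions import Fraction


def secrecy_upper_bound(k, d, l, alpha, gamma):
    """Right-hand side of (7): sum_{i=l+1}^{k} min{(d - i + 1) * beta, alpha}.

    gamma and alpha are expected to be integers or Fractions; beta = gamma / d.
    """
    beta = Fraction(gamma, d)
    return sum(min((d - i + 1) * beta, alpha) for i in range(l + 1, k + 1))


if __name__ == "__main__":
    # Example 3: D(4, 3, 3) with alpha = gamma = 3 and l = 2 compromised nodes.
    print(secrecy_upper_bound(k=3, d=3, l=2, alpha=3, gamma=3))  # 1
    # Bandwidth-limited regime with n = 8, k = 4, l = 1 (illustrative values):
    # fix gamma = alpha = Gamma and vary the repair degree d.
    n, k, l, Gamma = 8, 4, 1, 7
    for d in range(k, n):
        print(d, secrecy_upper_bound(k, d, l, alpha=Gamma, gamma=Gamma))
```

With γ fixed at Γ, each summand equals Γ(1 − (i − 1)/d), which increases with d, so the largest value is obtained at d = n − 1.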

To obtain the upper bound on the secrecy capacity, consider the flow graph of this DSS shown in Fig. 4, where nodes $v_1$ and $v_2$ are compromised and observed by Eve. Suppose that node $v_3$ fails and is replaced by $v_5$, which downloads β = 1 unit of information from each of the d = 3 nodes $v_1, v_2, v_4$. We focus now on a data collector that connects to the three nodes $v_1$, $v_2$ and $v_5$ to reconstruct the source file. Even if the source node s and the data collector knew the location of the eavesdropper, the data collector can get at most one unit of secure information by ignoring all the information received from the compromised nodes. The data can only be conveyed securely through the path $(s, x_{in}^4, x_{out}^4, x_{in}^5, x_{out}^5, DC)$, which has a “bottleneck” edge $(x_{out}^4, x_{in}^5)$ with capacity β = 1 unit. Since our analysis is based on a worst case scenario, this gives an upper bound of 1 unit on the secrecy capacity. This bound can be reinterpreted as taking the minimum value of a cut separating the source s from any data collector in the flow graph after deletion of any two nodes. This argument can be generalized to any DSS D(n, k, d) by finding an upper bound on the value of the min-cut in the flow graph after deleting ℓ nodes. Thus, we obtain the upper bound of Theorem 1, whose detailed proof is provided in Appendix A.

Before we provide a coding scheme that achieves the previous upper bound, we define the nested MDS codes [34], which will be an important building block in our code construction.

Definition 4 (Nested MDS Codes): An (n, k) MDS code with generator matrix G is called nested if there exists a positive integer $k_0 < k$ such that $G = \begin{bmatrix} G_1 \\ G_2 \end{bmatrix}$, where $G_1$, of dimensions $(k_0 \times n)$, is itself a generator matrix of an $(n, k_0)$ MDS code.

Our proposed capacity-achieving code is depicted in Fig. 5 and consists of the concatenation of an outer nested MDS code with a special inner repetition code that was introduced in [9] for constructing exact regeneration codes. Let $S \in \mathbb{F}_q$ denote the information symbol that is to be securely stored on the system and $K = [K_1 \dots K_5]$ be a vector of independent random keys, each uniformly distributed over $\mathbb{F}_q$. The MDS coset code is chosen to be a nested MDS code [34] with its generator matrix given by $G := \begin{bmatrix} G_K \\ G_S \end{bmatrix}$, where

$$G_K = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad \text{and} \quad G_S = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Note that the matrix G is a generator of a (6, 6) MDS code and the sub-matrix $G_K$ is a generator of a (6, 5) MDS code ($k_0 = 5$). Hence, the code generated by G is a nested MDS code. Set $Z = S + \sum_{i=1}^{5} K_i$; then the codeword X given by

$$X = \begin{bmatrix} K & S \end{bmatrix} \begin{bmatrix} G_K \\ G_S \end{bmatrix}, \tag{10}$$

can be written as $X = \begin{bmatrix} Z & K_1 & \dots & K_5 \end{bmatrix}$. The encoded symbols $Z, K_1, \dots, K_5$ are then stored on the nodes $v_1, \dots, v_4$ as shown in Fig. 5, following the special repetition code of Rashmi et al. [9], which we henceforth refer to as the RSKR-repetition code.

In the RSKR-repetition code used here, nodes $v_1, \dots, v_4$ store respectively $\{Z, K_1, K_2\}$, $\{Z, K_3, K_4\}$, $\{K_1, K_3, K_5\}$ and $\{K_2, K_4, K_5\}$. Since d = 3, in the case of a failure the new replacement node contacts all the 3 remaining active nodes in the system and recovers an exact copy of the lost data. For example, when node $v_1$ fails, the new replacement node connects to nodes $v_2$, $v_3$ and $v_4$ and downloads the symbols $Z$, $K_1$ and $K_2$ from each, respectively. It can also be checked that a data collector connecting to any 3 nodes observes all the symbols $Z, K_1, \dots, K_5$ and hence can decode the information symbol S as $S = Z - \sum_{i=1}^{5} K_i$. However, an eavesdropper accessing any two nodes will observe some subset of 5 symbols out of 6, and therefore cannot obtain any information about S.
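Both properties of this example code (reconstruction from any 3 nodes and perfect secrecy against any 2 nodes) can be verified exhaustively over a small field. The sketch below assumes GF(3) purely to keep the enumeration small; the names `encode`, `decode`, `always_decodable` and `perfectly_secret` are illustrative.

```python
from collections import Counter
from itertools import combinations, product

Q = 3  # small prime field, chosen only so the exhaustive check stays fast

# RSKR-repetition layout of Example 3: names of the symbols stored on each node.
NODES = {
    1: ("Z", "K1", "K2"),
    2: ("Z", "K3", "K4"),
    3: ("K1", "K3", "K5"),
    4: ("K2", "K4", "K5"),
}


def encode(s, keys):
    """Coset encoding of (10): Z = S + K1 + ... + K5 over GF(Q)."""
    symbols = {f"K{i + 1}": key for i, key in enumerate(keys)}
    symbols["Z"] = (s + sum(keys)) % Q
    return symbols


def decode(observed):
    """Recover S = Z - (K1 + ... + K5) from the six distinct symbols."""
    return (observed["Z"] - sum(observed[f"K{i}"] for i in range(1, 6))) % Q


def always_decodable():
    """Every data collector contacting 3 nodes recovers S for every key choice."""
    for s, keys in product(range(Q), product(range(Q), repeat=5)):
        symbols = encode(s, keys)
        for dc in combinations(NODES, 3):
            view = {name: symbols[name] for v in dc for name in NODES[v]}
            if decode(view) != s:
                return False
    return True


def perfectly_secret(l=2):
    """Eve's view on any l nodes has the same distribution for every value of S."""
    for eve in combinations(NODES, l):
        dists = [Counter(tuple(encode(s, keys)[name] for v in eve for name in NODES[v])
                         for keys in product(range(Q), repeat=5))
                 for s in range(Q)]
        if any(d != dists[0] for d in dists):
            return False
    return True


if __name__ == "__main__":
    print(always_decodable(), perfectly_secret(2), perfectly_secret(3))
```

Comparing the full distribution of Eve's observation for each value of S is a direct check of condition (5); with three observed nodes the check fails, as expected, since Eve then sees all six symbols.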

Fig. 5. A schematic representation of the optimal code for the DSS D(4, 3, 3), operating at (α, γ) = (3, 3) with ℓ = 2, that achieves the secrecy capacity of 1 unit. The information symbol S and 5 independent random keys are mixed appropriately using an MDS coset code. The encoded symbols are then stored on the DSS using the RSKR-repetition code. An eavesdropper observing any ℓ = 2 nodes cannot get any information about the stored symbol S.

In the following section, we provide a generalization of the code in this example, and show that it achieves the secrecy capacity of the DSS for d = n − 1 in the bandwidth-limited regime, thus proving Theorem 2.

D. Secrecy Capacity in the Bandwidth-Limited Regime

The special cases studied in Section V-B pointed out that the main difficulty in determining the secrecy capacity of distributed storage systems is due to their dynamic nature. We will demonstrate that in the bandwidth-limited regime for d = n − 1, with a careful choice of code, it is possible to transform the problem of secrecy over a dynamic DSS into a static problem of secrecy over a point-to-point channel equivalent to the erasure-erasure wiretap channel-II in [34]. Then, we show that using nested MDS codes at the source one can achieve the secrecy capacity of the equivalent wiretap channel.

Our approach builds on the results of [9], where the authors constructed a family of exact regenerating codes for the DSS D(n, k, d) with d = n − 1, α = dβ. The “exact” property of these codes allows any repair node to reconstruct and store an identical copy of the data lost upon a failure. The code construction in [9] consists of the concatenation of an MDS code with the RSKR-repetition code. This construction is instrumental for obtaining codes that can achieve the secrecy capacity by carefully choosing the outer code to be a nested MDS coset code, as was done in Example 3.

For simplicity, we will explain the code for β = 1, i.e., Γ = n − 1. For any larger values of Γ, and in turn of β, the file can be split into chunks, each of which can be separately encoded using the construction corresponding to β = 1. Since the DSS is operating in the bandwidth-limited regime with no constraint on the node storage capacity, we choose α = Γ. From [4], we know that for a DSS D(n, k, d = n − 1) with α = n − 1, β = 1 the capacity in the absence of an intruder (ℓ = 0) is $M = \sum_{i=1}^{k}(n-i)$. Let $R := \sum_{i=\ell+1}^{k}(n-i)$ be the maximum number of information symbols that we could store securely on the DSS, and $\theta := \frac{n(n-1)}{2}$. Let $S = (s_1, \dots, s_R) \in \mathbb{F}_q^R$ denote the information file and $K = (K_1, \dots, K_{M-R}) \in \mathbb{F}_q^{M-R}$ denote M − R independent random keys, each uniformly distributed over $\mathbb{F}_q$. Then, the proposed code consists of an outer (θ, M) nested MDS code (see (10)) which takes S and K as an input and outputs $X = (x_1, \dots, x_\theta)$ as

$$X = \begin{bmatrix} K & S \end{bmatrix} \begin{bmatrix} G_K \\ G_S \end{bmatrix},$$

where $G := \begin{bmatrix} G_K \\ G_S \end{bmatrix}$ is a generator matrix of a (θ, M) MDS code such that $G_K$ itself is a generator matrix of a (θ, M − R) MDS code. This outer (θ, M) nested MDS code is then followed by an inner RSKR-repetition code which stores the codeword X on the DSS following the pattern depicted in Fig. 6.

Fig. 6. The structure of the RSKR-repetition code of Rashmi et al. [9] for n storage nodes, α = d = n − 1, β = 1 and $\theta = \frac{n(n-1)}{2}$. The RSKR-repetition code stores 2 copies of each coded symbol, i.e., the total number of stored symbols is nd = 2θ.

The RSKR-repetition codes were introduced in [9] as a method for constructing exact regenerating codes for a distributed storage system. These codes consist of “filling” the storage nodes $v_1, \dots, v_n$ successively, by repeating “vertically” (i.e., across all the nodes) the data stored “horizontally” (i.e., on a single storage node), as shown in Fig. 6. This procedure can be described using an auxiliary complete graph over n vertices $u_1, \dots, u_n$ that consists of θ edges. Suppose the edges are indexed by the coded symbols $x_1, \dots, x_\theta$. The code then consists of storing on node $v_i$ the indices of the edges adjacent to vertex $u_i$ in the complete graph. As a result, the RSKR-repetition code has a special property that every coded symbol $x_i$ is stored on exactly two storage nodes, and any pair of two storage nodes have exactly one coded symbol in common. This property, along with the fact that the repair degree is d = n − 1, enables the exact repair of any failed node in the DSS, as was explained in Example 3.
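The complete-graph description translates directly into code. The sketch below assumes that edges are numbered in lexicographic order, which reproduces the layouts used in the examples with n = 4; the function names `rskr_layout`, `has_rskr_property` and `repair_plan` are illustrative and not from [9].

```python
from itertools import combinations


def rskr_layout(n):
    """Map node i (1..n) to the indices of the coded symbols it stores.

    Coded symbol x_j labels the j-th edge of the complete graph on n vertices
    (edges taken in lexicographic order, an assumption of this sketch);
    node v_i stores the symbols of the edges incident to vertex u_i.
    """
    layout = {i: [] for i in range(1, n + 1)}
    for j, (a, b) in enumerate(combinations(range(1, n + 1), 2), start=1):
        layout[a].append(j)
        layout[b].append(j)
    return layout


def has_rskr_property(layout):
    """Each symbol is on exactly two nodes; any two nodes share exactly one symbol."""
    n = len(layout)
    theta = n * (n - 1) // 2
    copies = [sum(j in stored for stored in layout.values()) for j in range(1, theta + 1)]
    shared = [len(set(layout[a]) & set(layout[b])) for a, b in combinations(layout, 2)]
    return all(c == 2 for c in copies) and all(s == 1 for s in shared)


def repair_plan(layout, failed):
    """For each lost symbol, the surviving node holding its second copy."""
    return {j: next(v for v in layout if v != failed and j in layout[v])
            for j in layout[failed]}


if __name__ == "__main__":
    layout = rskr_layout(4)
    print(layout)                  # {1: [1, 2, 3], 2: [1, 4, 5], 3: [2, 4, 6], 4: [3, 5, 6]}
    print(repair_plan(layout, 1))  # {1: 2, 2: 3, 3: 4}
```

The repair plan shows why d = n − 1 suffices for exact repair: each of the n − 1 lost symbols is fetched from a different surviving node, one symbol (β = 1) per node.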

The use of the RSKR-repetition code transforms the dynamic storage system into a static point-to-point channel, as explained below. Notice first that since Γ = α = n − 1, all the data downloaded during the repair process is stored on the new replacement node without any further compression³. Thus, accessing a node during repair, i.e., observing its downloaded data, is equivalent to accessing it after repair, i.e., only observing its stored data. Second, the RSKR-repetition code restores the replacement node with an exact copy of the lost data. Therefore, even though there are failures and repairs, the data storage system looks exactly the same at any point of time: any data collector downloads M symbols out of $x_1, \dots, x_\theta$ by contacting k nodes, and any eavesdropper can observe $\mu = \sum_{i=1}^{\ell}(d-i+1) = M - R$ symbols. Thus, the system becomes similar to the erasure-erasure wiretap channel-II of parameters (θ, M, µ)⁴. Therefore, since the outer code is a nested MDS code, from [34] we know that it can achieve the secrecy capacity of the erasure-erasure wiretap channel, which is equal to M − µ. Hence for the DSS, our codes achieve the secrecy rate of

$$M - (M - R) = R = \sum_{i=\ell+1}^{k} (n-i).$$

This rate corresponds to β = 1. For the general case when β = Γ/(n − 1), the total secrecy rate achieved is

$$\sum_{i=\ell+1}^{k} (n-i)\beta,$$

thus completing the proof of Theorem 2.
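The counting step in this argument (any k nodes hold M distinct coded symbols, any ℓ nodes hold µ = M − R) can be confirmed by enumeration for small parameters. This is a sketch under the same lexicographic edge-numbering assumption as before; `observed_counts` and `check_wiretap_parameters` are illustrative names.

```python
from itertools import combinations


def rskr_layout(n):
    """Node i stores the indices of the edges incident to vertex i in K_n."""
    layout = {i: set() for i in range(1, n + 1)}
    for j, (a, b) in enumerate(combinations(range(1, n + 1), 2), start=1):
        layout[a].add(j)
        layout[b].add(j)
    return layout


def observed_counts(n, size):
    """All possible numbers of distinct symbols seen on a group of `size` nodes."""
    layout = rskr_layout(n)
    return {len(set().union(*(layout[v] for v in group)))
            for group in combinations(layout, size)}


def check_wiretap_parameters(n, k, l):
    """Return (M, mu, R) after checking them against every node subset."""
    M = sum(n - i for i in range(1, k + 1))
    mu = sum(n - i for i in range(1, l + 1))
    assert observed_counts(n, k) == {M}    # every data collector sees M symbols
    assert observed_counts(n, l) == {mu}   # every eavesdropper sees mu symbols
    return M, mu, M - mu


if __name__ == "__main__":
    print(check_wiretap_parameters(4, 3, 2))  # (6, 5, 1), the setting of Example 3
    print(check_wiretap_parameters(6, 4, 2))
```

Because the counts do not depend on which nodes are chosen, the dynamic system behaves like a single wiretap channel with fixed parameters (θ, M, µ).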

VI. ACTIVE OMNISCIENT ADVERSARY

In this section we study distributed storage systems in the presence of an active adversary “Calvin” that can control up to b nodes. Calvin can choose to control any b nodes among all the storage nodes $v_1, v_2, \dots$, possibly at different time instances as the system evolves in time due to failures and repairs. Calvin is also assumed to be omniscient (ℓ = k), so he knows the source file F. Since he additionally has complete knowledge of the storage and repair schemes, he knows the content stored on each node in the system. Under this setting, we define the resiliency capacity of a DSS as the maximum amount of data that can be stored on the DSS and delivered reliably to any data collector that contacts any k nodes in the system.

Example 5: Consider again our example of the DSS D(4, 3, 3) with α = γ = 3. Assume that there is an omniscient active adversary Calvin that can control one storage node, i.e., b = 1, and can modify its stored data and/or its messages outgoing to data collectors and repair nodes.

A first approach for finding a scheme to reliably store data on this DSS would be to use the results in the network coding literature [24], [27], [28], [29] on the capacity of multicast networks in the presence of an adversary that can control t edges of unit capacity each. It is shown there that the resiliency capacity of these networks is equal to Ω − 2t, where Ω is the capacity of the multicast network in the absence of the adversary. This resiliency capacity can be achieved by overlaying an error-correction code such as a Maximum Rank Distance (MRD) code [21] on top of the network at the source. This approach turns out to be not very useful here. In fact, the capacity in the absence of Calvin is 6 (see [4]), and b = 1 corresponds to t = α = 3. Hence, the above approach will achieve a storage rate of 6 − 2t = 0.

³ This corresponds to the Minimum Bandwidth Regenerating (MBR) codes described in [4].

⁴ In the erasure-erasure wiretap channel-II of parameters (θ, M, µ), the transmitter sends θ symbols through an erasure channel to a legitimate receiver that receives M symbols. The eavesdropper can observe any µ symbols out of the transmitted M [34].

Fig. 7. A coding scheme for storing 1 bit reliably on the DSS D(4, 3, 3) with α = 3 bits and β = 1, in the presence of an omniscient adversary Calvin who controls b = 1 node. [The message bit m ∈ {0, 1} is encoded by a (6, 1) repetition code into $x_1, \dots, x_6$ with $x_i = m$; nodes $v_1, \dots, v_4$ store $(x_1, x_2, x_3)$, $(x_1, x_4, x_5)$, $(x_2, x_4, x_6)$ and $(x_3, x_5, x_6)$, respectively.]

We now give a coding scheme that can reliably store 1 bit of information for the DSS. Later, we show that this is also the best that can be done, i.e., the resiliency capacity of this DSS is equal to 1 unit. The proposed code is formed by concatenating a (6, 1) repetition code with an RSKR-repetition code as shown in Fig. 7. The repair process is that of the RSKR-repetition codes described in Section V-D. When a node fails, the replacement node recovers the lost bits by downloading the bits with the same indices from the remaining three active nodes.

Any data collector contacting three nodes will observe 9 bits. In the static case, when no failure or repair occurs, only 3 bits (the ones stored on the compromised node) among the 9 bits observed by the data collector may be erroneous. In that case, the DC can perform a majority decoding to recover the information bit. However, in the dynamic model, the DC can receive up to 5 erroneous bits. To show how this may occur, assume that the DSS is storing the all-zero codeword, i.e., $x_i = 0$ for i = 1, . . . , 6, in Fig. 7, corresponding to the message m = 0. Suppose that node $v_1$ is the one that is compromised and controlled by the adversary Calvin, as shown in Fig. 8. Assume that Calvin changes all the 3 stored bits $(x_1, x_2, x_3)$ on node $v_1$ from (0, 0, 0) to (1, 1, 1) and also sends the erroneous bit “1” whenever $v_1$ is contacted for repair. Now suppose that node $v_2$ fails and is replaced by node $v_5$ which, based on the RSKR-repetition structure, downloads bits $x_1 = 1$, $x_4 = 0$ and $x_5 = 0$ from nodes $v_1$, $v_3$ and $v_4$, respectively. Suppose also that, after some period of time, node $v_3$ fails and is replaced by node $v_6$, which downloads bits $x_2 = 1$, $x_4 = 0$ and $x_6 = 0$ from nodes $v_1$, $v_5$ and $v_4$, respectively. An important point to note here is that our repair scheme is fixed and is based on the RSKR-repetition structure irrespective of the possible errors in the bits downloaded during the repair process. As a result, a data collector that contacts nodes $v_1$, $v_5$ and $v_6$ observes the data shown in the table in Fig. 8, which includes 5 errors.
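The failure-and-repair sequence just described can be replayed step by step. The sketch below is illustrative only: the layout dictionary encodes the RSKR structure for n = 4, and the names `simulate_attack` and `majority` are assumptions of this example.

```python
from collections import Counter

# RSKR layout for n = 4: indices of the bits x1..x6 held by each initial node.
LAYOUT = {"v1": (1, 2, 3), "v2": (1, 4, 5), "v3": (2, 4, 6), "v4": (3, 5, 6)}


def simulate_attack(m=0):
    """Bits seen by a data collector contacting v1, v5, v6 after two repairs."""
    # Honest initial state: every stored bit equals the message bit m.
    store = {v: {j: m for j in idx} for v, idx in LAYOUT.items()}
    # Calvin controls v1 and flips all its bits, also when answering repairs.
    store["v1"] = {j: 1 - m for j in LAYOUT["v1"]}

    def repair(failed, new):
        helpers = [v for v in store if v != failed]
        # Exact repair: fetch each lost index from the surviving node holding it,
        # without checking whether the downloaded bit is correct.
        fresh = {j: next(store[h][j] for h in helpers if j in store[h])
                 for j in store[failed]}
        del store[failed]
        store[new] = fresh

    repair("v2", "v5")  # v5 gets x1 from v1 (corrupted), x4 from v3, x5 from v4
    repair("v3", "v6")  # v6 gets x2 from v1 (corrupted), x4 from v5, x6 from v4
    return [store[v][j] for v in ("v1", "v5", "v6") for j in sorted(store[v])]


def majority(bits):
    """Majority vote over the observed bits."""
    return Counter(bits).most_common(1)[0][0]


if __name__ == "__main__":
    observed = simulate_attack(m=0)
    print(observed)            # [1, 1, 1, 1, 0, 0, 1, 0, 0]
    print(majority(observed))  # 1, although the stored message is 0
```

Five of the nine observed bits are wrong even though only one node is compromised, so a plain majority vote returns the wrong message.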


Fig. 8. Node $v_1$ with broken boundary is compromised and controlled by an omniscient adversary Calvin. Nodes $v_2$ and $v_3$ fail, and are replaced by nodes $v_5$, $v_6$ respectively. The all-zero codeword corresponding to message m = 0 is stored on the DSS. The data collector DC connecting to nodes $v_1$, $v_5$ and $v_6$ observes a total number of 9 bits, out of which 5 bits are erroneous and equal to “1”, as shown in the figure's table of DC observations: node $v_1$: (1, 1, 1); node $v_5$: (1, 0, 0); node $v_6$: (1, 0, 0).

In a worst case scenario, Calvin will be able to corrupt all the bits in the DSS having the same indices as the bits stored on the nodes he controls (here the bits with labels $x_1$, $x_2$ and $x_3$). Therefore, Calvin can introduce at most 5 erroneous bits on a collection of k = 3 nodes which may be observed by a data collector. In this case, a majority decoder, or equivalently a minimum Hamming distance decoder, will not be able to decode to the correct message.

To overcome this problem, we exploit the fact that Calvin controls only one node, so he can introduce errors only in specific patterns, to design a special decoder that will always decode to the correct message m irrespective of Calvin’s adversarial strategy. In fact, for any possible choice of the compromised node, one of the following four sets $T_1 = \{x_4, x_5, x_6\}$, $T_2 = \{x_2, x_3, x_6\}$, $T_3 = \{x_1, x_3, x_5\}$ and $T_4 = \{x_1, x_2, x_4\}$ is a trusted set that only contains symbols that were not altered by Calvin. For example, when Calvin controls $v_1$, the trusted set is $T_1$. The proposed decoder operates in the following way. First, it finds a set $T^* \in \{T_1, \dots, T_4\}$ whose elements all agree to either 0 or 1. Then, it declares accordingly that message m = 0 or m = 1 was stored. This decoder will always decode to the correct message since each set $T_i$ intersects with every other set $T_j$, $j \neq i$, in exactly one symbol and one of them is a trusted set. Therefore each set $T_i$ contains at least one symbol which is unaltered by Calvin. Thus, if all the symbols in $T_i$ agree, they will agree to the correct message.
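The trusted-set decoder for this example is short enough to check exhaustively against every single-node corruption. This is a minimal sketch; `TRUSTED_SETS`, `trusted_set_decode` and `decoder_is_always_correct` are illustrative names, and the observation is assumed to be given as one bit per index after removing repeated copies.

```python
from itertools import product

# Indices of the bits stored on each node (RSKR layout for n = 4).
LAYOUT = {1: (1, 2, 3), 2: (1, 4, 5), 3: (2, 4, 6), 4: (3, 5, 6)}
# T1..T4: for each node, the indices of the bits it does NOT store.
TRUSTED_SETS = [(4, 5, 6), (2, 3, 6), (1, 3, 5), (1, 2, 4)]


def trusted_set_decode(y):
    """Decode the message bit from y, a dict mapping index j (1..6) to a bit."""
    for t in TRUSTED_SETS:
        bits = {y[j] for j in t}
        if len(bits) == 1:  # all symbols of this set agree
            return bits.pop()
    raise ValueError("no unanimous set: more than one node appears corrupted")


def decoder_is_always_correct():
    """Try every message, compromised node and corruption pattern."""
    for m in (0, 1):
        for idx in LAYOUT.values():
            for fake in product((0, 1), repeat=3):
                y = {j: m for j in range(1, 7)}
                y.update(zip(idx, fake))  # Calvin overwrites his three indices
                if trusted_set_decode(y) != m:
                    return False
    return True


if __name__ == "__main__":
    # Observation of Fig. 8 after removing repeated indices.
    print(trusted_set_decode({1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0}))  # 0
    print(decoder_is_always_correct())                              # True
```

The exhaustive check mirrors the argument in the text: any unanimous set contains at least one unaltered symbol, so its common value must be the stored message.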

A. Results on Omniscient Adversary

In [6], the resiliency capacity of unicast networks with a single compromised node was analyzed and a cut-set upper bound was derived. In the following, Theorem 6 generalizes the bound in [6] for the case of distributed storage systems, where b ≥ 1 nodes are controlled by an omniscient adversary.

Theorem 6: [Resiliency Capacity Upper Bound] Consider a distributed storage system DSS D(n, k, d). If an omniscient adversary controls any b ≥ 1 nodes, with 2b < k, the resiliency capacity $C_r(\alpha, \gamma)$ is upper bounded as

$$C_r(\alpha, \gamma) \le \sum_{i=2b+1}^{k} \min\{(d-i+1)\beta, \alpha\}, \tag{11}$$

where β = γ/d. If 2b ≥ k, then $C_r(\alpha, \gamma) = 0$.

This bound is a network version of the Singleton bound and is obtained by computing the value of certain cuts in the flow graph of the DSS after the deletion of 2b nodes. The detailed proof of the above theorem is given in Appendix B.

The resiliency capacity in the bandwidth-limited regime is defined as

$$C_r^{BL}(\Gamma) := \sup_{\substack{\gamma \le \Gamma \\ \alpha \ge 0}} C_r(\alpha, \gamma),$$

where Γ is the upper limit on the total repair bandwidth. We again note that if the parameter d is a system design choice, the upper bound of Eq. (11) in the bandwidth-limited regime is maximized for d = n − 1. In the following section we exhibit a scheme that achieves this upper bound. This result is summarized in Theorem 7.

Theorem 7: Consider a distributed storage system D(n, k, d = n − 1) operating in the bandwidth-limited regime. If an omniscient adversary controls b nodes, with 2b < k, the resiliency capacity of the DSS is given by

$$C_r^{BL}(\Gamma) = \sum_{i=2b+1}^{k} (n-i)\beta, \tag{12}$$

where $\beta = \frac{\Gamma}{n-1}$, and can be achieved for a node storage capacity α = Γ. If 2b ≥ k, then $C_r^{BL}(\Gamma) = 0$.

B. Resiliency Capacity in the Bandwidth-Limited Regime

Similar to the proof of Theorem 2, it suffices to show the achievability for β = 1, i.e., Γ = n − 1. In this case, our capacity-achieving code uses a node storage capacity of α = n − 1 symbols.

The code has a similar structure to the scheme used in Section V for the case of a passive adversary and is a generalization of the code used in Example 5. The (6, 1) repetition code in the example is replaced by a (θ, R) MDS code, where $R := C_r(n-1) = \sum_{i=2b+1}^{k}(n-i)$ and $\theta = \frac{n(n-1)}{2}$. In the second layer, the output of the MDS code is stored on the DSS following the RSKR-repetition structure as in Fig. 6. As explained in Example 5, node failures are repaired using the RSKR-repetition structure (also see Section V for additional details) irrespective of the possible errors introduced by Calvin. Notice that the MDS code used here has a rate lower than the one used in the passive adversary case in Section V-D to allow for correcting the errors introduced by the adversary.

A data collector accessing any k nodes will observe a total of αk = (n − 1)k symbols, out of which $M = \sum_{i=1}^{k}(n-i)$ symbols have distinct indices, and $\frac{k(k-1)}{2}$ symbols are repeated due to the RSKR-repetition code. The adversary can corrupt identically the two copies of each symbol stored on the b controlled nodes. Therefore, the data collector focuses on M symbols with distinct indices out of (n − 1)k and uses them for decoding. These M symbols with distinct indices form a codeword of an (M, R) MDS code, say $\mathcal{X}$, which is possibly corrupted by the errors introduced by the adversary. The minimum distance of the MDS code $\mathcal{X}$ is

$$d_{min}(\mathcal{X}) = M - R + 1 = \sum_{i=1}^{2b} (n-i) + 1. \tag{13}$$

The adversary that controls b nodes can introduce up to $t = \sum_{i=1}^{b}(n-i)$ errors in the set of M symbols with distinct indices. A simple manipulation shows that $t > \lfloor \frac{d_{min}(\mathcal{X}) - 1}{2} \rfloor$. Therefore, a classical minimum distance decoder for $\mathcal{X}$ will not be able to recover the original file. Thus, the minimum distance decoder fails for this specific adversarial strategy, where Calvin corrupts the repeated symbols identically, and cannot be used for a general adversarial strategy.
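The inequality between t and the classical error-correction radius can be tabulated for concrete parameters. This is a small sketch; the function name `rskr_error_budget` is an illustrative assumption.

```python
def rskr_error_budget(n, k, b):
    """Return (d_min, t, radius) for d = n - 1 and beta = 1.

    d_min  : minimum distance M - R + 1 of the (M, R) MDS code seen by a DC, as in (13)
    t      : number of distinct symbols the adversary can corrupt with b nodes
    radius : classical unique-decoding radius floor((d_min - 1) / 2)
    """
    M = sum(n - i for i in range(1, k + 1))
    R = sum(n - i for i in range(2 * b + 1, k + 1))
    d_min = M - R + 1
    t = sum(n - i for i in range(1, b + 1))
    return d_min, t, (d_min - 1) // 2


if __name__ == "__main__":
    print(rskr_error_budget(4, 3, 1))  # (6, 3, 2): Example 5, 3 errors vs. radius 2
    for n, k, b in [(6, 4, 1), (10, 7, 2), (12, 9, 4)]:
        print((n, k, b), rskr_error_budget(n, k, b))
```

Since $d_{min} - 1 = t + \sum_{i=b+1}^{2b}(n-i)$ and every term of the second sum is smaller than the corresponding term of t, the radius is always strictly below t.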

Next, we present a novel decoder that can correct errors beyond the classical upper bound of $\lfloor \frac{d_{min}(\mathcal{X}) - 1}{2} \rfloor$ in the DSS. The main idea is to take advantage of the special structure of the error patterns that can be introduced by the adversary.

First, we introduce two definitions that will be useful in describing the decoding algorithm and that will serve as a generalization of the concept of trusted set in the previous example.

Definition 8: Puncturing a vector: Consider a vector $\vec{v} \in \mathbb{F}^N$ for some field $\mathbb{F}$. Let $I \subset \{1, 2, \dots, N\}$, |I| = p, be a given set. Then puncturing vector $\vec{v}$ with pattern I corresponds to deleting the entries in $\vec{v}$ indexed by the elements in I to obtain a vector $\vec{v}_I \in \mathbb{F}^{N-p}$.

Definition 9: Puncturing a Code: Consider a code $C$ in $\mathbb{F}^N$. Let $I \subset \{1, 2, \dots, N\}$, |I| = p, be a given set. The punctured code $C_I$ is obtained by puncturing all the codewords of $C$ with pattern I, i.e.,

$$C_I := \{\vec{x}_I \mid \vec{x} \in C\}.$$

Proposition 10: If $C$ is an MDS code with parameters (n, k), then for any given fixed pattern $I \subset \{1, 2, \dots, n\}$, |I| = p < (n − k + 1), the punctured code $C_I$ is also an MDS code with parameters (n − p, k).

Decoding Algorithm: Let $B$, $|B| \le b$, denote the set of storage nodes controlled by the adversary. Because of the exact repair property of the RSKR-repetition codes, it is sufficient to focus on the case when $B \subset \{v_1, \dots, v_n\}$ with $|B| = b$. For each such set $B$, we define $I_B \subset \{1, 2, \dots, \theta\}$ to be the set of the indices of the symbols stored on the nodes in $B$. For instance, in Example 5, if $B = \{v_1\}$, $I_B = \{1, 2, 3\}$.

The decoding algorithm proceeds in the following way:

1) The data collector connecting to $k$ nodes selects any $M$ symbols with distinct indices, out of the $(n-1)k$ observed symbols, as its input $Y \in \mathbb{F}_q^M$ for decoding. In Example 5, Fig. 8, the DC connecting to nodes $v_1, v_5, v_6$ observes the vector $(y_1, y_2, y_3, y_1, y_4, y_5, y_2, y_4, y_6)$. After removing the repeated symbols, we get $Y = (y_1, y_2, y_3, y_4, y_5, y_6)$. Note that for a fixed DC, $Y$ is a codeword of an $(M, R)$ MDS code, which we call $\mathcal{X}$, that includes possible errors introduced by the adversary. The code $\mathcal{X}$ itself is a punctured code of the outer $(\theta, R)$ MDS code.

2) For each $B \subset \{v_1, \dots, v_n\}$, $|B| = b$, find $I_B$.

3) Puncture $Y$ and the code $\mathcal{X}$ with pattern $I_B$ to obtain the observed word $Y_{I_B}$ and punctured code $\mathcal{X}_{I_B}$. Note that due to the RSKR-repetition structure, the size of such a puncturing pattern is

$$|I_B| = \sum_{i=1}^{b}(n-i),$$

which is less than the minimum distance of the MDS code $\mathcal{X}$ (see (13)). Hence, by Proposition 10, $\mathcal{X}_{I_B}$ is an MDS code.

4) Let $H_{\mathcal{X}_{I_B}}$ be the parity check matrix of the punctured code $\mathcal{X}_{I_B}$. Compute the syndrome of the observed word $Y_{I_B}$ as

$$\vec{\sigma}_{I_B} = H_{\mathcal{X}_{I_B}} Y_{I_B}^T.$$

5) If $\vec{\sigma}_{I_B} = 0$, then $Y_{I_B}$ is a codeword of $\mathcal{X}_{I_B}$. Assume it to be a trusted codeword and decode to the message using the code $\mathcal{X}_{I_B}$.
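The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the outer MDS code is taken to be a Reed-Solomon code over the prime field GF(11), the syndrome test of step 4 is replaced by an equivalent interpolation-based membership check, and the RSKR-repetition layout is modelled by placing symbol $s$ on the two endpoints of the $s$-th edge of the complete graph on $n$ nodes. All function names and the parameters $(n, k, b, R) = (5, 3, 1, 2)$ are hypothetical.

```python
from itertools import combinations

P = 11  # prime field size (assumption), large enough for theta = 10 evaluation points


def lagrange_eval(pts, x):
    """Evaluate at x the unique polynomial of degree < len(pts) through pts over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(pts):
        num, den = 1, 1
        for j, (xj, _) in enumerate(pts):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(msg, theta):
    """Reed-Solomon encoding: symbol s equals f(s), where f(j) = msg[j] for j < len(msg)."""
    pts = list(enumerate(msg))
    return [lagrange_eval(pts, s) for s in range(theta)]


def node_symbols(n):
    """Assumed layout: symbol s is stored on both endpoints of the s-th edge of K_n."""
    store = {u: [] for u in range(n)}
    for s, (u, v) in enumerate(combinations(range(n), 2)):
        store[u].append(s)
        store[v].append(s)
    return store


def decode(observed, n, b, R):
    """observed maps symbol index -> value for the distinct indices seen by the DC."""
    store = node_symbols(n)
    for B in combinations(range(n), b):                 # step 2: candidate traitor sets
        I_B = {s for u in B for s in store[u]}
        kept = [(s, y) for s, y in sorted(observed.items()) if s not in I_B]  # step 3
        base, rest = kept[:R], kept[R:]
        # Steps 4-5: the punctured word is a codeword iff every remaining symbol
        # lies on the polynomial fixed by the first R kept symbols.
        if all(lagrange_eval(base, s) == y for s, y in rest):
            return [lagrange_eval(base, s) for s in range(R)]
    return None


n, k, b, R = 5, 3, 1, 2
msg = [3, 7]
codeword = encode(msg, n * (n - 1) // 2)
store = node_symbols(n)
seen = {s for u in (0, 1, 2) for s in store[u]}         # DC contacts nodes 0, 1, 2
observed = {s: codeword[s] for s in seen}
for s in store[2]:                                      # node 2 plays the traitor
    if s in observed:
        observed[s] = (observed[s] + 1) % P             # both copies corrupted identically
print(decode(observed, n, b, R))
```

The loop over all size-$b$ node subsets is what makes the running time exponential in $b$.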

Proof of Correctness: We now prove the correctness of the above decoding algorithm by showing that it will always correct the errors introduced by the adversary and output the correct message. Notice first that the syndrome $\vec{\sigma}_{I_B}$ will always be equal to zero whenever $B = B^*$, the actual set of nodes controlled by the adversary (which is not known to the data collector). Therefore, the above decoding algorithm will always give an output. Next, we show that this output always corresponds to the correct message stored on the DSS. Denote by $X$ the true codeword in $\mathcal{X}$ that would have been observed by the DC in the absence of Calvin. Let $B^*$ be the set of the $b$ traitor nodes. Then, the proposed decoding algorithm fails iff there exists some other set $B \ne B^*$ and some other codeword $X' \in \mathcal{X}$, s.t. $X' \ne X$, for which $Y_{I_B} = X'_{I_B} \in \mathcal{X}_{I_B}$. This implies that

$$X_{I_{B^*} \cup I_B} = X'_{I_B \cup I_{B^*}}. \tag{14}$$

But, from the RSKR-repetition code structure we know

$$|I_{B^*} \cup I_B| \le \sum_{i=1}^{2b}(n-i). \tag{15}$$

Equations (14) and (15) imply that $d_{\min}(\mathcal{X}) \le \sum_{i=1}^{2b}(n-i)$, which contradicts equation (13).

Remark 11 (Decoder complexity): The complexity of the proposed decoder is exponential in the number $b$ of malicious nodes. Therefore, it is not practical for systems with large


values of $b$. However, this decoder can be regarded as a proof technique for the achievability of the resiliency capacity $C_r^{BL}$ of Theorem 7.

Remark 12 (Expurgation of malicious nodes): As shown above, the proposed decoder always decodes to the correct message, and thus, can identify the indices of any erroneous symbols. The data collector can then report this set of indices to a central authority (tracker) in the system. This authority will combine all the sets it receives, and knowing the RSKR-repetition structure (see Fig. 6), it forms a list of suspected nodes that will surely include the malicious nodes that are sending corrupted data to the data collectors. Since there are at most $b$ malicious nodes and each symbol $x_i$ is stored on exactly two nodes, the size of the list will be at most $2b$. The system is then purged by discarding the nodes in this list.

VII. ACTIVE LIMITED-KNOWLEDGE ADVERSARY

In this section, we consider the case of a non-omniscient active adversary with limited eavesdropping and controlling capabilities. We assume the adversary can eavesdrop on $\ell$ nodes and control some subset of $b \le \ell$ nodes out of these $\ell$ nodes. The adversary's knowledge about the stored file is limited to what it can deduce from the observed nodes. Moreover, we assume that the adversary knows the coding and decoding strategies at every node in the system. Clearly, when $\ell \ge k$, the adversary becomes omniscient. We are interested here in the limited-knowledge scenario that does not degenerate into the omniscient model studied in the previous section. For this case, we demonstrate that the resiliency capacity of the DSS exceeds that of the omniscient case, and can be achieved by storing a small hash on the nodes in addition to the data. Our approach is similar to that of [23], [24], [25], where the authors consider a limited-knowledge adversary that can eavesdrop on and control edges rather than nodes in multicast networks.

Example 13: Consider a DSS $D(5, 3, 4)$ with $\alpha = \gamma = 4$ and an adversary Charlie that can eavesdrop on and control one node, i.e., $b = \ell = 1$. In the omniscient case with $b = 1$, the resiliency capacity of this system as given by Theorem 7 is equal to 2. Here, we show that the limitation on Charlie's knowledge can be leveraged to increase the resiliency capacity to 5.

First, we show that the resiliency capacity for this DSS is upper bounded by 5. To that end, consider the case when node $v_1$ is observed and controlled by Charlie. Moreover, assume that nodes $v_2$ and $v_3$ fail successively and are replaced by nodes $v_6$ and $v_7$ as shown in Fig. 9. Consider now a data collector DC that connects to nodes $v_1, v_6, v_7$ and wants to reconstruct the stored file. One possible attack that Charlie can perform is to erase all the data stored on node $v_1$, i.e., always change it to a fixed value irrespective of the stored file. This renders node $v_1$ useless and the system performs as if node $v_1$ was removed, which reduces the value of the cut $C(V, \bar{V})$ (see Fig. 9) between the sources and data collector DC to 5.

We now exhibit a code that uses a simple “correlation” hash scheme to achieve the above upper bound with high probability.

Fig. 9. The limited-knowledge adversary Charlie eavesdrops on and controls node $v_1$, shown with the broken boundary. If Charlie erases the data stored on node $v_1$, the value of the cut $C(V, \bar{V})$, with $\bar{V} = \{x^1_{\text{out}}, x^6_{\text{in}}, x^6_{\text{out}}, x^7_{\text{in}}, x^7_{\text{out}}, \text{DC}\}$, between the source nodes and a data collector DC accessing nodes $v_1, v_6, v_7$ becomes equal to 5.

a) Code Construction: The code consists of an outer $(10, 5)$ MDS code over $\mathbb{F}_{q^v}$, followed by the RSKR-repetition code enabling the exact repair of the nodes in the case of failures. Furthermore, each data packet $x_i \in \mathbb{F}_{q^v}$ is appended with a hash vector $h_i = (h_{i,1}, \dots, h_{i,10}) \in \mathbb{F}_q^{10}$ computed as

$$h_{i,j} = x_i x_j^T,$$

for $j = 1, 2, \dots, 10$, where with abuse of notation, $x_i$ also denotes the vector $(x_{i,1}, \dots, x_{i,v})$ in $\mathbb{F}_q^v$ representing the corresponding element of $\mathbb{F}_{q^v}$. The schematic form of the code is shown in Table II below.

For simplicity, we assume in this example that the hash values stored on the nodes are made secure from Charlie, who can neither observe nor corrupt them. Later in Appendix C, we explain how this can be achieved in the general case with a negligible sacrifice in the system capacity. Note that even though Charlie cannot directly observe the hash table, he can generate some of the hash values using the observed data packets on the $\ell = 1$ eavesdropped node, since he knows the coding scheme. Charlie can use these computed hash values to carefully introduce errors in the data symbols such that they are still consistent with these hash values.

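TABLE II. The schematic form of the code stored on the DSS $D(5, 3, 4)$, along with the secure hash table that is not accessible to the adversary Charlie.

| Node | data $\in \mathbb{F}_{q^v}$ | hash $\in \mathbb{F}_q^{10}$ |
|---|---|---|
| $v_1$ | $x_1, x_2, x_3, x_4$ | $h_1, h_2, h_3, h_4$ |
| $v_2$ | $x_1, x_5, x_6, x_7$ | $h_1, h_5, h_6, h_7$ |
| $v_3$ | $x_2, x_5, x_8, x_9$ | $h_2, h_5, h_8, h_9$ |
| $v_4$ | $x_3, x_6, x_8, x_{10}$ | $h_3, h_6, h_8, h_{10}$ |
| $v_5$ | $x_4, x_7, x_9, x_{10}$ | $h_4, h_7, h_9, h_{10}$ |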

b) Decoding logic: A data collector contacting 3 nodes observes 12 symbols in total. In a worst case scenario, Charlie can corrupt 6 out of these 12 symbols. This can happen, for instance, when Charlie eavesdrops on and controls node $v_1$,


and maliciously changes its stored data from $x_i$ to $y_i = x_i + e_i$, $e_i \ne 0$, $i = 1, \dots, 4$. Then, $v_2$, $v_3$ fail successively (as shown in Fig. 9) and Charlie sends the erroneous symbols $y_1$ and $y_2$, respectively, to the replacement nodes $v_6$ and $v_7$ during the repair process. In this scenario, a data collector, unaware of Charlie's actual node location, accessing nodes $v_1$, $v_6$ and $v_7$ will have among its observations 6 corrupted symbols, namely those having indices $1, \dots, 4$ as shown in Table III, where the symbol $y_i$ denotes the possibly corrupted version of $x_i$, $i = 1, \dots, 9$. Here, we have $y_i = x_i$, $i = 5, \dots, 9$. The table also shows the hash vectors observed by the same data collector.

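TABLE III. The data symbols and hash values observed by the data collector contacting nodes $v_1, v_6, v_7$, when node $v_1$ is controlled by Charlie.

| Node | data $\in \mathbb{F}_{q^v}$ | hash $\in \mathbb{F}_q^{10}$ |
|---|---|---|
| $v_1$ | $y_1, y_2, y_3, y_4$ | $h_1, h_2, h_3, h_4$ |
| $v_6$ | $y_1, y_5, y_6, y_7$ | $h_1, h_5, h_6, h_7$ |
| $v_7$ | $y_2, y_5, y_8, y_9$ | $h_2, h_5, h_8, h_9$ |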

Among the 12 stored symbols $x_i$ observed by the data collector and their hashes $h_i$, each of the 3 symbols with indices 1, 2, 5 and the corresponding hash vectors $h_1, h_2, h_5$ are repeated twice. Since the adversary can change both copies of each repeated data symbol identically, our decoder focuses only on a set of $M = 9$ symbols of distinct indices and the corresponding hash vectors for decoding. Note that the corresponding 9 symbols $(x_1, \dots, x_9)$ form a codeword of a $(9, 5)$ MDS code that we refer to as $\mathcal{X}$.

Let $H$ denote the $9 \times 9$ hash matrix observed by the data collector, obtained as

$$H = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix},$$

where the $i$th row $h_i \in \mathbb{F}_q^{10}$ corresponds to the hash vector of the symbol $y_i$, $i = 1, \dots, 9$. The data collector then computes its own $9 \times 9$ hash matrix $\hat{H}$ from the 9 observed symbols $y_i$ as

$$\hat{H}_{ij} = y_i y_j^T, \quad 1 \le i, j \le 9.$$

Then, it compares the entries in $\hat{H}$ with the corresponding entries in $H$ to generate a $9 \times 9$ comparison table. Table IV is an example of such a comparison table where a “X” in position $(i, j)$ indicates that the computed hash and the observed hash match, i.e., $\hat{H}_{ij} = H_{ij}$, whereas “×” indicates that $\hat{H}_{ij} \ne H_{ij}$ due to the errors introduced by the adversary.

The decoder selects a trusted set of 5 symbols from $y_1, \dots, y_9$ that index a $5 \times 5$ sub-table of the comparison table where all the entries are “X”, e.g., symbols $y_5, y_6, y_7, y_8, y_9$ in Table IV. It then sets the remaining 4 symbols as erasures and proceeds to decode using a minimum distance decoder for the $(9, 5)$ MDS code $\mathcal{X}$, which can correct up to 4 erasures. There always exists at least one set of 5 symbols that generates a consistent hash table, e.g., $T = \{y_5, y_6, y_7, y_8, y_9\}$ when Charlie controls node $v_1$. Hence, the proposed decoding will eventually stop and output a decoding decision. Next, we analyze the probability of selecting a trusted set that results in an error in decoding.

TABLE IV. Example of the comparison table of the hash matrices $H$ and $\hat{H}$. Note that since Charlie observes the data symbols $x_1, \dots, x_4$, he can introduce errors such that the hash values of $y_1, \dots, y_4$ are consistent.

| Data Symbol | $y_1$ | $y_2$ | $y_3$ | $y_4$ | $y_5$ | $y_6$ | $y_7$ | $y_8$ | $y_9$ |
|---|---|---|---|---|---|---|---|---|---|
| $y_1$ | X | X | X | X | × | × | × | × | × |
| $y_2$ | X | X | X | X | × | × | × | × | × |
| $y_3$ | X | X | X | X | × | × | × | × | × |
| $y_4$ | X | X | X | X | × | × | × | × | × |
| $y_5$ | × | × | × | × | X | X | X | X | X |
| $y_6$ | × | × | × | × | X | X | X | X | X |
| $y_7$ | × | × | × | × | X | X | X | X | X |
| $y_8$ | × | × | × | × | X | X | X | X | X |
| $y_9$ | × | × | × | × | X | X | X | X | X |
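The comparison-table step can be illustrated with a small Python sketch. It is a toy under explicit assumptions: the nine data symbols are hand-picked vectors in $\mathbb{F}_q^3$ with $q = 101$ rather than a codeword of a $(9, 5)$ MDS code, Charlie's errors are a fixed naive offset (a careful adversary could instead craft errors that are mutually consistent, as in Table IV), and the trusted set is found by exhaustive search. All names are hypothetical.

```python
from itertools import combinations

Q = 101  # prime field size for the hash values (illustrative assumption)


def dot(a, b):
    """Inner product x_i x_j^T over GF(Q)."""
    return sum(x * y for x, y in zip(a, b)) % Q


def comparison_table(observed, stored_hash):
    """Entry (i, j) is True when the recomputed hash matches the stored one."""
    m = len(observed)
    return [[dot(observed[i], observed[j]) == stored_hash[i][j] for j in range(m)]
            for i in range(m)]


def find_trusted_set(table, size):
    """Return the first index set whose size x size sub-table is fully consistent."""
    for T in combinations(range(len(table)), size):
        if all(table[i][j] for i in T for j in T):
            return T
    return None


# Toy data: x_i = (1, i, i^2) for i = 1..9, standing in for the stored symbols.
x = [(1, i, i * i % Q) for i in range(1, 10)]
H = [[dot(a, b) for b in x] for a in x]            # secure hash table
# Charlie adds the error (1, 0, 0) to the first four symbols.
y = [((s[0] + 1) % Q, s[1], s[2]) if idx < 4 else s for idx, s in enumerate(x)]

table = comparison_table(y, H)
print(find_trusted_set(table, 5))
```

With zero-based indexing the search returns the indices of $y_5, \dots, y_9$; the remaining four symbols would then be treated as erasures.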

c) Error Analysis: Let $E = \{x_1, \dots, x_4\}$ denote the set of data symbols observed by Charlie by eavesdropping on $\ell = 1$ node ($v_1$ in this case). The above proposed decoder may result in an error only if the chosen trusted set $T$ contains at least one erroneous symbol, say $y_1$. Therefore, we can write $y_1 = x_1 + e_1$ for some error $e_1 \ne 0 \in \mathbb{F}_{q^v}$. Any chosen trusted set $T$ is also guaranteed to contain at least one error-free symbol that is not observed by Charlie, say $y_5 = x_5 \notin E$. To see this, note that the cardinality of the trusted set $T$ is 5, and by eavesdropping on and controlling any one node Charlie can observe and introduce errors in a maximum of 4 symbols with distinct indices in any data collector observation. For the set $T$, containing $y_1, y_5$ along with 3 other symbols, to be a trusted set, it has to generate a consistent hash table of size $5 \times 5$. Therefore, Charlie has to pick the error $e_1$ to satisfy $x_5 e_1^T = 0$.

The observation $E = \{x_1, \dots, x_4\}$ of Charlie is independent of $x_5$ due to the MDS property of the outer code. Therefore, for any choice of $e_1$ that Charlie makes, there are $q^v$ equally likely choices of $x_5$, out of which $q^{v-1}$ are orthogonal to the chosen $e_1$. Hence, the consistency condition of hash $\hat{H}_{5,1} = H_{5,1}$ is satisfied with probability

$$\Pr(x_5 e_1^T = 0 \mid E, e_1) = \frac{1}{q}.$$

Note that if Charlie could observe the complete hash table, then $x_5$ would no longer be independent of Charlie's observation. For example, if Charlie observes the hash value $H_{2,5} = x_2 x_5^T$, then for a given value of $x_2$ and $H_{2,5}$, there are only $q^{v-1}$ equally likely choices for $x_5$. In that case, Charlie can always choose $e_1$ to belong to the space orthogonal to the $(v-1)$-dimensional space of possible choices of $x_5$, thus deceiving the proposed decoder. Therefore, it is crucial to keep the hash values secure from Charlie.

It can be verified that the above reasoning easily carries over to any choice of $b = 1$ node controlled by Charlie. Therefore, the probability of error is upper bounded by $1/q$, which vanishes with increasing field size $q$.
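The counting argument behind the $1/q$ bound can be checked by brute force for small parameters. The sketch below is an illustration only: it assumes $q$ is prime so that arithmetic modulo $q$ realizes $\mathbb{F}_q$, represents elements of $\mathbb{F}_{q^v}$ as length-$v$ vectors as in the text, and uses the hypothetical sample values $q = 3$, $v = 3$, $e = (1, 2, 0)$.

```python
from itertools import product


def orthogonal_count(e, q):
    """Count vectors x in F_q^v with x e^T = 0, for a prime q and a fixed error e."""
    v = len(e)
    hits = sum(1 for x in product(range(q), repeat=v)
               if sum(a * b for a, b in zip(x, e)) % q == 0)
    return hits, q ** v


hits, total = orthogonal_count((1, 2, 0), 3)
# For any nonzero e, q**(v-1) of the q**v vectors are orthogonal to e,
# so a uniformly distributed x_5 passes the hash check with probability 1/q.
print(hits, total)
```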

d) Rate Analysis: We encode 5 information symbols in $\mathbb{F}_{q^v}$ to form the coded symbols $x_i$, $i = 1, \dots, 10$. For these 10 symbols we construct a hash table of size $10 \times 10$ with elements in $\mathbb{F}_q$. Hence the total overhead of the hash table


is $\frac{100}{5v} = O\!\left(\frac{1}{v}\right)$ per information symbol. Thus, the rate of our code is $5 - O\!\left(\frac{1}{v}\right)$, which approaches 5 with an increasing block length $v$.

A. Results on Active Limited-Knowledge Adversary

Below we summarize our two main results on the resiliency capacity in the case of a limited-knowledge adversary.

Theorem 14: For a DSS $D(n, k, d)$ with an adversary that can eavesdrop on any $\ell < k$ nodes and control a subset of size $b$ of these $\ell$ nodes ($b \le \ell$), the following upper bound holds on the resiliency capacity,

$$C_r(\alpha, \gamma) \le \sum_{i=b+1}^{k} \min\{(d-i+1)\beta, \alpha\} \tag{16}$$

where $d\beta = \gamma$.

Proof: (sketch) Consider a case when nodes $v_1, \dots, v_k$ fail successively and are replaced by nodes $v_{n+1}, \dots, v_{n+k}$ as shown in Fig. 10. Also consider a data collector DC that contacts these $k$ nodes $v_{n+1}, \dots, v_{n+k}$ to retrieve the source file. If the adversary Charlie controls the $b$ nodes $v_{n+1}, \dots, v_{n+b}$, one possible adversarial strategy that Charlie can use is to erase all the data stored on these $b$ nodes, i.e., always change it to a fixed value irrespective of the file stored on the DSS. This renders the $b$ controlled nodes useless, resulting in the upper bound stated in the theorem.

Let $R := \sum_{i=b+1}^{k} \min\{(d-i+1)\beta, \alpha\}$ and $E := \sum_{i=1}^{\ell} \min\{(d-i+1)\beta, \alpha\}$. Our second result states that if the eavesdropping capability $\ell$ of the adversary Charlie is limited, in particular if $\ell$ is such that $E < R$, the upper bound in Theorem 14 can be achieved for $d = n - 1$ in the bandwidth-limited regime.

Theorem 15: Consider a DSS $D(n, k, d = n-1)$ operating in the bandwidth-limited regime in the presence of an adversary that can eavesdrop on $\ell$ nodes and control a subset of size $b$ of these $\ell$ nodes ($b \le \ell$). Then, if the adversary is limited-knowledge, i.e., $\ell$ is such that $E < R$, the resiliency capacity of the system is

$$C_r^{BL}(\Gamma) = \sum_{i=b+1}^{k}(n-i)\beta, \tag{17}$$

where $\beta = \Gamma/(n-1)$.

The condition $E < R$ in Theorem 15 says that the eavesdropping capability of the adversary is insufficient to determine the message stored on the DSS, i.e., the adversary is not omniscient. This limitation in the adversary's knowledge enables every data collector to identify the erroneous symbols introduced by the adversary and discard them, thus resulting in erasures rather than errors. In this case also, identifying the erroneous symbols helps in the expurgation of the system and discarding the malicious nodes, as pointed out in Remark 12.
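To make the condition $E < R$ and the resulting capacity concrete, the sketch below evaluates (17) for $d = n - 1$. The function names are hypothetical, and the omniscient-case value is computed as $\sum_{i=2b+1}^{k}(n-i)\beta$, an expression inferred from (13) and from the value 2 quoted in Example 13 rather than restated from Theorem 7.

```python
def partial_sum(n, lo, hi, beta=1):
    """Sum of (n - i) * beta for i = lo..hi (empty sum is 0)."""
    return sum((n - i) * beta for i in range(lo, hi + 1))


def limited_knowledge_capacity(n, k, b, ell, beta=1):
    """Return the value of (17) when E < R holds, otherwise None."""
    R = partial_sum(n, b + 1, k, beta)   # target rate
    E = partial_sum(n, 1, ell, beta)     # what ell eavesdropped nodes reveal
    return R if E < R else None


def omniscient_capacity(n, k, b, beta=1):
    """Inferred bandwidth-limited capacity against an omniscient adversary."""
    return partial_sum(n, 2 * b + 1, k, beta)


# Example 13: D(5, 3, 4) with b = ell = 1 and beta = 1.
print(limited_knowledge_capacity(5, 3, 1, 1), omniscient_capacity(5, 3, 1))
```

For the parameters of Example 13 this gives 5 against 2, matching the values stated there; when $E \ge R$ the function returns `None` because Theorem 15 does not apply.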

The proof of Theorem 15 is detailed in Appendix C and is composed of two parts. In the first part, we assume that the hash table is secure from the adversary and generalize the reasoning of Example 13 to show how the hash table can be used to identify, with high probability, the erroneous symbols introduced by Charlie and thus decode correctly. In the second part, we demonstrate an efficient scheme to store the hash table securely and reliably with a negligible sacrifice in the system capacity.

Fig. 10. Part of the information flow graph corresponding to a DSS $D(n, k, d)$, when nodes $v_1, \dots, v_k$ fail successively and are replaced by nodes $v_{n+1}, \dots, v_{n+k}$. A data collector contacts these $k$ nodes and wants to reconstruct the stored file. Nodes $v_{n+1}, \dots, v_{n+\ell}$, shown with broken boundaries, are compromised by Eve while they were being repaired.

VIII. CONCLUSION

In this paper we have considered the problem of securing a distributed storage system under repair dynamics against eavesdropping and adversarial attacks. We proposed a new dynamical model for the intrusion, wherein the adversary intrudes into the system at different time instances in order to exploit the system repair dynamics to its own benefit. For the general model of an adversary that can eavesdrop on and/or maliciously change the data on some nodes in the system, we investigated the problem of determining the secrecy capacity and resiliency capacity of the system. We provided upper bounds on the secrecy and resiliency capacities and showed their achievability in the bandwidth-limited regime. General expressions of these capacities, in addition to efficient decoding algorithms, remain an open problem.

APPENDIX

A. Proof of Theorem 1

Consider a DSS $D(n, k, d)$ with $\ell < k$, operating at point $(\alpha, \gamma)$ with $d\beta = \gamma$. Assume that nodes $v_1, v_2, \dots, v_k$ have failed successively and were replaced during the repair process by the nodes $v_{n+1}, v_{n+2}, \dots, v_{n+k}$, respectively, as shown in the corresponding information flow graph $G$ in Fig. 10. Now suppose that Eve accesses the $\ell$ input nodes in the set $E = \{x^{n+1}_{\text{in}}, x^{n+2}_{\text{in}}, \dots, x^{n+\ell}_{\text{in}}\} \subset V_{\text{in}}$ while they were being repaired. Consider also a data collector DC that downloads data from the $k$ output nodes in $B = \{x^{n+1}_{\text{out}}, x^{n+2}_{\text{out}}, \dots, x^{n+k}_{\text{out}}\} \subset V^{a}_{\text{out}}$. The reconstruction property of Eq. (4) implies $H(S \mid C_B) = 0$ and the perfect secrecy


condition in Eq. (5) implies $H(S \mid D_E) = H(S)$. We can therefore write

$$
\begin{aligned}
H(S) &= H(S \mid D_E) - H(S \mid C_B) \\
&\overset{(1)}{\le} H(S \mid C_E) - H(S \mid C_B) \\
&\overset{(2)}{=} H(S \mid C_E) - H(S \mid C_E, C_{B \setminus E}) \\
&= I(S, C_{B \setminus E} \mid C_E) \\
&\le H(C_{B \setminus E} \mid C_E) \\
&= \sum_{i=\ell+1}^{k} H(C_{n+i} \mid C_{n+1}, \dots, C_{n+i-1}) \\
&\overset{(3)}{\le} \sum_{i=\ell+1}^{k} \min\{(d-i+1)\beta, \alpha\}.
\end{aligned}
$$

Inequality (1) follows from the Markov chain $S \to D_E \to C_E$, i.e., the stored data $C_E$ is dependent on $S$ only through the downloaded data $D_E$; (2) from $C_{B \setminus E} := \{C_{n+\ell+1}, \dots, C_{n+k}\}$; (3) follows from the fact that each node can store at most $\alpha$ units, and for each replacement node we have $H(C_i) \le H(D_i) \le d\beta$, and also from the topology of the network (see Fig. 10), where each node $x^{n+i}_{\text{in}}$ is connected to each of the nodes $x^{n+1}_{\text{out}}, \dots, x^{n+i-1}_{\text{out}}$ by an edge of capacity $\beta$. The upper bound of Theorem 1 then follows directly from the definition of Eq. (6).

B. Proof of Theorem 6

Consider a DSS $D(n, k, d)$ operating at point $(\alpha, \gamma)$ with $d\beta = \gamma$, in the presence of an omniscient adversary that can control $b$ nodes, with $2b < k$. Assume that nodes $v_{j+1}, v_{j+2}, \dots, v_k$, for some $j$, $2b < j < k$, have failed consecutively and were replaced by nodes $v_{n+1}, v_{n+2}, \dots, v_{n+(k-j)}$, respectively. The information flow graph $G$ of the DSS corresponding to this sequence of node failures and repairs is shown in Fig. 11. Consider a data collector (Fig. 11) that observes the stored data on the $k$ nodes $v_1, \dots, v_j, v_{n+1}, \dots, v_{n+k-j}$. Consider also the cut $C(V, \bar{V})$ with $\bar{V} = \{x^1_{\text{out}}, \dots, x^j_{\text{out}}, x^{n+1}_{\text{in}}, \dots, x^{n+k-j}_{\text{in}}, x^{n+1}_{\text{out}}, \dots, x^{n+k-j}_{\text{out}}, \text{DC}\}$ that separates the source nodes from the data collector DC. We group the edges belonging to this cut into 3 disjoint sets as follows:

1) $E_1$: the set of edges outgoing from nodes $x^p_{\text{in}}$, $p = 1, \dots, b$.

2) $E_2$: the set of edges outgoing from nodes $x^p_{\text{in}}$, $p = b+1, \dots, 2b$.

3) $E_3$: the set of edges outgoing from nodes $x^p_{\text{in}}$, $p = 2b+1, \dots, j$, in addition to the edges belonging to the cut $C(V, \bar{V})$ that are incoming to the nodes $x^q_{\text{in}}$, $q = n+1, \dots, n+k-j$.

Let $X_{E_i}(m)$, $i = 1, 2, 3$, be the symbols transmitted on the edges in set $E_i$ corresponding to the stored message $m$. We claim that in the presence of an adversary controlling any $b$ nodes and for any two distinct messages $m_1 \ne m_2$, the following condition is necessary for the DC to not make a decoding error:

$$X_{E_3}(m_1) \ne X_{E_3}(m_2).$$

Fig. 11. Part of the information flow graph corresponding to a DSS $D(n, k, d)$ when nodes $v_{j+1}, \dots, v_k$ fail successively and are replaced by nodes $v_{n+1}, \dots, v_{n+k-j}$. A data collector connects to nodes $v_1, \dots, v_j, v_{n+1}, \dots, v_{n+k-j}$ to retrieve the file.

Suppose that there exist two distinct messages $m_1 \ne m_2$ satisfying $X_{E_3}(m_1) = X_{E_3}(m_2)$. Suppose further that the symbols carried on the edges belonging to the cut $C(V, \bar{V})$ are $X_{E_1}(m_1)$, $X_{E_2}(m_2)$ and $X_{E_3}(m_1) = X_{E_3}(m_2)$. Then, assuming all the messages to be equally likely, the data collector will make a decoding error with probability at least $1/2$. This is true since it will not be able to distinguish between the following two cases:

• The true message is $m_2$ and the nodes $x^1_{\text{in}}, \dots, x^b_{\text{in}}$ are controlled by the adversary Calvin, who changed the transmitted symbols on the edges in the set $E_1$ from $X_{E_1}(m_2)$ to $X_{E_1}(m_1)$.

• The true message is $m_1$ and the nodes $x^{b+1}_{\text{in}}, \dots, x^{2b}_{\text{in}}$ are controlled by the adversary Calvin, who changed the transmitted symbols on the edges in the set $E_2$ from $X_{E_2}(m_1)$ to $X_{E_2}(m_2)$.

Thus, the capacity of the DSS is upper bounded by the total capacity of the edges in the set $E_3$, i.e.,

$$C_r(\alpha, \gamma) \le \sum_{i=2b+1}^{j} \alpha + \sum_{i=j+1}^{k} (d-i+1)\beta, \quad j = 2b+1, \dots, k-1.$$

The same analysis as above can be applied for $j = 2b$, resulting in

$$C_r(\alpha, \gamma) \le \sum_{i=2b+1}^{k} (d-i+1)\beta.$$


And also for $j = k$, which gives

$$C_r(\alpha, \gamma) \le \sum_{i=2b+1}^{k} \alpha.$$

The bound in Theorem 6 then follows by taking the minimum of all the above upper bounds obtained for $j = 2b, \dots, k$. It can be easily seen that the above argument extends to the case of $2b \ge k$, for which the set $E_3$ is empty and $C_r(\alpha, \gamma) = 0$.
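The minimization over $j$ can be written out directly. The following sketch evaluates each per-$j$ cut bound from this proof and takes their minimum; the function names and sample parameters are illustrative assumptions. It also compares the result with the closed form $\sum_{i=2b+1}^{k}\min\{(d-i+1)\beta, \alpha\}$, which equals the minimum because $(d-i+1)\beta$ is decreasing in $i$.

```python
def cut_bound(k, d, b, alpha, beta, j):
    """Capacity of the edge set E_3 for a given j in {2b, ..., k}."""
    storage_edges = alpha * max(0, j - 2 * b)                     # i = 2b+1..j
    repair_edges = sum((d - i + 1) * beta for i in range(j + 1, k + 1))
    return storage_edges + repair_edges


def resiliency_upper_bound(k, d, b, alpha, beta):
    """Minimum of the cut bounds over j = 2b..k; zero when 2b >= k."""
    if 2 * b >= k:
        return 0
    return min(cut_bound(k, d, b, alpha, beta, j) for j in range(2 * b, k + 1))


def closed_form(k, d, b, alpha, beta):
    """Term-by-term minimum, equal to the minimum over j."""
    return sum(min((d - i + 1) * beta, alpha) for i in range(2 * b + 1, k + 1))


# D(5, 3, 4) of Example 13 with alpha = 4, beta = 1 and b = 1.
print(resiliency_upper_bound(3, 4, 1, 4, 1), closed_form(3, 4, 1, 4, 1))
```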

C. Proof of Theorem 15

Consider a DSS $D(n, k, d)$, with $d = n - 1$, operating in the bandwidth-limited regime, in the presence of an adversary that can eavesdrop on $\ell$ nodes and control a subset of them of size $b$, $b \le \ell$. As in the earlier proofs, we show the achievability for $\beta = 1$, i.e., $\Gamma = n - 1$. Any larger values of $\beta$ or $\Gamma$ can be achieved by repeatedly applying the proposed scheme. Since there is no constraint on the node storage capacity $\alpha$ in the bandwidth-limited regime, we choose $\alpha = n - 1$. Let $\theta := \frac{n(n-1)}{2}$, $M := \sum_{i=1}^{k}(n-i)$, $R := \sum_{i=b+1}^{k}(n-i)$ and $E := \sum_{i=1}^{\ell}(n-i)$.

Our proof consists of two parts: 1) We assume that the hash table can be stored securely and reliably, and show an achievable scheme that can attain the resiliency capacity. 2) We present an efficient method to reliably and securely store the hash table in the presence of a limited-knowledge adversary Charlie.

C.1 Resiliency Capacity in the Limited-knowledge Case for the Bandwidth-Limited Regime

Code Construction: The code that we propose here is a generalization of the one used in Example 13 of Section VII. It consists of an outer $(\theta, R)$ MDS code whose output $X = (x_1, \dots, x_\theta) \in \mathbb{F}_{q^v}^{\theta}$ is stored on the $n$ storage nodes using an inner RSKR-repetition code that enables exact repair in case of any node failure. As shown in Table V, each data packet $x_i \in \mathbb{F}_{q^v}$, $i = 1, \dots, \theta$, is further appended with a hash vector $h_i = (h_{i,1}, \dots, h_{i,\theta}) \in \mathbb{F}_q^{\theta}$. The values of these hashes are computed as follows,

$$h_{i,j} = x_i x_j^T,$$

for $j = 1, 2, \dots, \theta$, where with abuse of notation $x_i$ also denotes the vector in $\mathbb{F}_q^v$ representing the corresponding element of $\mathbb{F}_{q^v}$. We assume for now that the hash values stored on the nodes are secure from Charlie, who can neither observe nor corrupt them (as shown in the next section). Although Charlie cannot directly observe the hash table, he can compute some of the hash values using the observed data packets on the $\ell$ eavesdropped nodes and possibly introduce errors that are consistent with these hash values.

Decoding Logic: A data collector accessing any $k$ nodes will observe a total of $(n-1)k$ symbols and the corresponding hash vectors, where $\binom{k}{2}$ indices are repeated twice. As noted earlier, since the adversary can corrupt both of the stored symbols with the same indices identically, the decoder focuses only on a set of $M = \sum_{i=1}^{k}(n-i)$ symbols with distinct indices along with their hash vectors to make a decoding decision. These $M$ symbols form a codeword of an $(M, R)$ MDS code $\mathcal{X}$, possibly corrupted by errors introduced by the adversary.

TABLE V. Schematic form of the code stored on the DSS $(n, k, d = n-1)$, along with the hash table that is not accessible to the adversary Charlie.

| Node | data packet $\in \mathbb{F}_{q^v}$ | hash $\in \mathbb{F}_q^{\theta}$ |
|---|---|---|
| $v_1$ | $x_1 \;\; x_2 \;\; \dots \;\; x_{n-1}$ | $h_1 \;\; h_2 \;\; \dots \;\; h_{n-1}$ |
| $v_2$ | $x_1 \;\; x_n \;\; \dots \;\; x_{2n-3}$ | $h_1 \;\; h_n \;\; \dots \;\; h_{2n-3}$ |
| $v_3$ | $x_2 \;\; x_n \;\; \dots \;\; x_{3n-6}$ | $h_2 \;\; h_n \;\; \dots \;\; h_{3n-6}$ |
| $\vdots$ | $\vdots$ | $\vdots$ |
| $v_n$ | $x_{n-1} \;\; x_{2n-3} \;\; \dots \;\; x_\theta$ | $h_{n-1} \;\; h_{2n-3} \;\; \dots \;\; h_\theta$ |

Recall that Charlie can eavesdrop on a total of $\ell$ nodes and control some subset of size $b \le \ell$ of these eavesdropped nodes in the system. Let $y_i$, $i = 1, \dots, \theta$, denote the possibly corrupted version of the original data symbols $x_i$. We have $y_i = x_i + e_i$, where $e_i$ is the error introduced by Charlie on the symbols stored on the nodes he controls, and for the rest of the symbols $e_i = 0$. Without loss of generality, we suppose that the data collector observes nodes $v_1, \dots, v_k$, i.e., data symbols $y_i$ and hash values $h_i$, $i \in \{1, 2, \dots, M\}$. The data collector observes the hash values with no errors since the hash table is assumed to be secure and reliable against the adversary. Let $H$ denote the observed $M \times \theta$ hash matrix having the vectors $h_i \in \mathbb{F}_q^{\theta}$, $i = 1, \dots, M$, as rows. The data collector then computes its own $M \times M$ hash matrix $\hat{H}$ as

$$\hat{H}_{ij} = y_i y_j^T, \quad 1 \le i, j \le M,$$

from the observed $M$ data packets and compares it with the corresponding entries in $H$. It generates an $M \times M$ comparison table similar to Table IV in Example 13. In this table a “X” in the $i$-th row and $j$-th column indicates that the computed hash and the observed hash match, i.e., $\hat{H}_{ij} = H_{ij}$, whereas “×” indicates that $\hat{H}_{ij} \ne H_{ij}$ due to the errors introduced by the adversary.

The decoder then selects a set of $R$ symbols, among $(y_1, \dots, y_M)$, that index an $R \times R$ sub-table of the comparison table with all its entries equal to “X”, and declares it as a trusted set with no errors. Then, it sets the rest of the $M - R$ observed symbols as erased and proceeds to decode the obtained vector as a codeword of an $(M, R)$ MDS code $\mathcal{X}$ with $M - R$ erasures. Since Charlie can control only $b$ nodes, there always exists at least one set of size $M - \sum_{i=1}^{b}(n-i) = R$ symbols that generates a consistent hash sub-table of size $R \times R$ with “X”. Hence, the proposed decoder is guaranteed to stop. Next, we compute the probability that the above decoder decodes to an incorrect message.

Error Analysis: The proposed decoder may result in an error in decoding only if the chosen trusted set of $R$ observed symbols contains at least one erroneous symbol, say $y_j = x_j + e_j$,


$e_j \ne 0$. Also, since $b \le \ell$, we have

$$\sum_{i=1}^{b}(n-i) \le \sum_{i=1}^{\ell}(n-i) < R, \tag{18}$$

where the last inequality follows from our assumption (see Theorem 15) that the eavesdropping capability $E$ is strictly less than the desired storage rate $R$. From equation (18), it is clear that the chosen trusted set contains at least one error-free symbol that is not observed by Charlie, say $y_i = x_i \notin E$. For this set to be a trusted set, it has to generate a consistent hash table of size $R \times R$. In particular, $\hat{H}_{ij} = H_{ij}$, i.e., $x_i e_j^T = 0$.

Next, we compute the probability of such an event. Let $E$ be the set of symbols in the codeword $X$ that are observed by Charlie. Since $X$ is the output of a $(\theta, R)$ MDS code and $|E| < R$, any symbol $\mathbf{x}_i$ of $X$ that does not belong to $E$ is uniformly distributed in $\mathbb{F}_{q^v}$ conditioned on $E$, i.e.,

$$\Pr(\mathbf{x}_i = x_i \mid E) = \frac{1}{q^v}, \quad x_i \in \mathbb{F}_{q^v}. \tag{19}$$

Therefore, for any choice of $e_j$ that Charlie makes based on his observation $E$, there are $q^v$ equally likely choices of $x_i$, out of which $q^{v-1}$ are orthogonal to the chosen $e_j$. Hence, the consistency condition of hash $\hat{H}_{i,j} = H_{i,j}$ is satisfied with probability

$$\Pr(x_i e_j^T = 0 \mid E, e_j) = \frac{1}{q},$$

which goes to zero with increasing field size $q$.

Note that if Charlie could observe the complete hash table, $x_i$ would no longer be independent of Charlie's observation. Then, as shown in Example 13, Charlie can always choose $e_j$ to belong to the orthogonal space of all possible choices of $x_i$, thus deceiving the proposed decoder. Therefore, it is crucial to keep the hash values secure from Charlie.

Rate Analysis: We encode $R$ information symbols in $\mathbb{F}_{q^v}$ using a $(\theta, R)$ MDS code to form a codeword $(x_1, \dots, x_\theta)$. For these symbols we construct a hash table of size $\theta \times \theta$ with symbols in $\mathbb{F}_q$. Hence the total overhead of the hash table is $\frac{\theta^2}{Rv} = O\!\left(\frac{1}{v}\right)$ per information symbol, which goes to zero with increasing block length $v$. Hence, asymptotically in block length $v$, these codes achieve the capacity of Theorem 15.
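The decay of the hash overhead with the block length can be tabulated with a short sketch. The function name and sample values are illustrative assumptions; the formula is the $\theta^2/(Rv)$ expression above with $\theta = n(n-1)/2$ and $R = \sum_{i=b+1}^{k}(n-i)$.

```python
from fractions import Fraction


def hash_overhead(n, k, b, v):
    """Hash-table overhead per information symbol, theta^2 / (R v), as an exact fraction."""
    theta = n * (n - 1) // 2
    R = sum(n - i for i in range(b + 1, k + 1))
    return Fraction(theta ** 2, R * v)


# Example 13 parameters (n, k, b) = (5, 3, 1): theta = 10, R = 5, overhead = 20 / v.
for v in (20, 200, 2000):
    print(v, hash_overhead(5, 3, 1, v))
```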

C.2 Reliable and Secure Storage of the Hash Table

The scheme described here for storing the hash table securely and reliably parallels the scheme proposed in [25]⁵ in the context of securing multicast networks. It aims at storing 1 bit of information securely and reliably. The scheme can then be repeated to store the complete hash table which, as shown in the previous section, is of constant size and independent of the block length $v$ of the information symbols. The total overhead incurred by this scheme can then be made arbitrarily small by increasing $v$.

⁵ The scheme of [25] is matrix-based and is designed for networks where intermediate nodes perform random network coding. Our scheme here can be regarded as a simple vector version of the one in [25]. This simplification is possible due to the special structure of the networks (information flow graphs) representing distributed storage systems in conjunction with the RSKR-repetition codes that limit coding in these networks to the source.

Code Construction: Let $G = \begin{pmatrix} G_K \\ G_S \end{pmatrix}$ be a generator matrix of a $(\theta, M)$ nested MDS code over the finite field $\mathbb{F}_q$ (symbols in the hash table also belong to the same field). The matrix $G_K$ in itself is a generator matrix of a $(\theta, E)$ MDS code over $\mathbb{F}_q$. If the bit to be stored is “1”, then choose a vector $S$ randomly and uniformly from $\mathbb{F}_q^{M-E}$; otherwise, set $S = 0 \in \mathbb{F}_q^{M-E}$. Let $K = (K_1, \dots, K_E)$ denote $E$ random keys, mutually independent and each uniformly distributed over $\mathbb{F}_q$. Now, we form the vector $X \in \mathbb{F}_q^{\theta}$ to be stored on the DSS as part of the hash table by “mixing” $S$ with the random keys using the nested MDS code as

$$X = K G_K + S G_S.$$

This encoded vector $X \in \mathbb{F}_q^{\theta}$ is then stored on the $(n, k, d)$ DSS using the RSKR-repetition code as shown in Fig. 6. The RSKR-repetition structure allows the exact repair of a node in case of failure, as explained in Section V.
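The encoding step $X = K G_K + S G_S$ can be sketched in a few lines. The sketch makes several assumptions that are not in the text: the field is a prime field $\mathbb{F}_p$ with $p > \theta$, the nested MDS code is a Reed-Solomon code whose generator is a Vandermonde matrix (its first $E$ rows generate a $(\theta, E)$ Reed-Solomon subcode, so the nesting property holds), and $1 \le E < M$. All function names are hypothetical.

```python
import random

def vandermonde(rows, points, p):
    """Matrix with entry [i][j] = points[j]**i mod p."""
    return [[pow(a, i, p) for a in points] for i in range(rows)]

def vec_mat(vec, mat, p):
    """Row vector times matrix over the prime field F_p."""
    cols = len(mat[0])
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) % p
            for j in range(cols)]

def encode_hash_bit(bit, theta, M, E, p, rng=random):
    """Encode one hash bit as X = K*G_K + S*G_S (assumes p prime, p > theta)."""
    points = list(range(1, theta + 1))   # theta distinct elements of F_p
    G = vandermonde(M, points, p)        # (theta, M) Reed-Solomon generator
    G_K, G_S = G[:E], G[E:]              # first E rows: (theta, E) MDS subcode
    K = [rng.randrange(p) for _ in range(E)]           # uniform random keys
    if bit == 1:
        S = [rng.randrange(p) for _ in range(M - E)]   # uniform random vector
    else:
        S = [0] * (M - E)                              # all-zero vector
    X = [(a + b) % p
         for a, b in zip(vec_mat(K, G_K, p), vec_mat(S, G_S, p))]
    return X, K, S, G_K, G_S

X, K, S, G_K, G_S = encode_hash_bit(bit=0, theta=10, M=9, E=2, p=101)
print(len(X))  # 10 symbols, one per index 1..theta
```

When the bit is “0”, the stored vector is a codeword of the subcode generated by `G_K` alone, which is the property the decoder later tests for.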

Security Analysis: The coding scheme used here is the same as the one in Section V-D, which discusses the passive adversary, and hence the vector $S$, which is of the appropriate rate $M - E$, is perfectly secure from Charlie eavesdropping on $\ell$ nodes. The perfect secrecy of $S$ implies the perfect secrecy of the hash bit.

Next we describe a decoding algorithm that the data collector uses to decode the stored bit with high probability of success, even in the presence of errors introduced by Charlie controlling $b$ nodes.

Decoding Logic: We denote by $D$ the decoder used by the data collector to recover the stored bit belonging to the hash table. $D$ implements the same decoding steps as the decoder of Section VI-B for the omniscient adversary, except for the decision rule that determines the output. The input to $D$ is the data observed by the data collector accessing $k$ nodes, which consists of $k\alpha = k(n-1)$ symbols, among which $\binom{k}{2}$ pairs have the same indices. The decoder executes the following steps:

1) $D$ selects any set of $M$ symbols having distinct indices among the observed $k\alpha$ symbols. These symbols are grouped in a vector $Y \in \mathbb{F}_q^{M}$, which can be written as

$$Y = K G_K + S G_S + e,$$

where $G_K$ and $G_S$ here denote the submatrices of the original $G_K$ and $G_S$, of size $E \times M$ and $(M - E) \times M$ respectively, formed by the columns of the selected symbols. The vector $e \in \mathbb{F}_q^{M}$, with up to $\sum_{i=1}^{b}(n-i)$ non-zero terms, is the error vector that accounts for the errors introduced by the adversary.

2) Let $B$, $|B| = b$, denote the set of storage nodes controlled by the adversary. Again, due to the exact repair property of the RSKR-repetition code, it is sufficient to consider $B \subset \{v_1, \dots, v_n\}$ with $|B| = b$. For each such set $B$, let $I_B \subset \{1, 2, \dots, \theta\}$ denote the set of indices of the symbols stored on the nodes in $B$.

3) For each possible $B \subset \{v_1, v_2, \dots, v_n\}$, $|B| = b$, $D$ punctures $Y$ with pattern $I_B$ to obtain $Y_{I_B}$ as

$$Y_{I_B} = K G_{K_{I_B}} + S G_{S_{I_B}} + e_{I_B},$$

where $G_{K_{I_B}}$ and $G_{S_{I_B}}$ are the submatrices of $G_K$ and $G_S$ obtained by deleting the columns corresponding to the punctured elements of $Y$, and $e_{I_B}$ is the punctured error vector.

4) $D$ checks whether $Y_{I_B}$ is a valid codeword of the code generated by the matrix $G_{K_{I_B}}$ by checking whether the corresponding syndrome is zero.

5) The decoder $D$ repeats steps 3) and 4) for each of the $\binom{n}{b}$ sets $B$ until the syndrome obtained in step 4) is zero. In this case, $D$ declares that bit “0” was stored. Otherwise, if for all possible values of $B$ no zero syndrome is obtained, $D$ declares that “1” was stored.
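The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the field is a prime field $\mathbb{F}_p$, the syndrome check of step 4) is replaced by an equivalent rank test (the punctured vector is a codeword exactly when appending it to the punctured $G_K$ does not raise the rank), symbol indices are 0-based, and the demo assumes a layout in which every pair of nodes shares one symbol, which matches the counts $k\alpha = k(n-1)$ and $|I_B| = \sum_{i=1}^{b}(n-i)$ used in the text. The names `rank_mod_p`, `decode_hash_bit` and `node_symbols` are hypothetical.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix over the prime field F_p (Gaussian elimination)."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)            # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                f = m[r][c]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

def decode_hash_bit(Y, cols, G_K, node_symbols, b, p):
    """Y[t] is the observed symbol with index cols[t]; G_K has theta columns;
    node_symbols maps each node to the set of symbol indices it stores."""
    for B in combinations(sorted(node_symbols), b):          # step 5: every B
        I_B = set().union(*(node_symbols[v] for v in B))     # step 2: indices on B
        keep = [t for t, c in enumerate(cols) if c not in I_B]   # step 3: puncture
        A = [[row[cols[t]] for t in keep] for row in G_K]
        y = [Y[t] for t in keep]
        if rank_mod_p(A + [y], p) == rank_mod_p(A, p):       # step 4: codeword test
            return 0
    return 1

# Demo with assumed parameters: n = 5 nodes, one symbol per pair of nodes.
p, n, b, M, E = 101, 5, 1, 9, 2
edges = list(combinations(range(n), 2))                      # theta = 10 indices
node_symbols = {v: {i for i, e in enumerate(edges) if v in e} for v in range(n)}
G = [[pow(i + 1, r, p) for i in range(len(edges))] for r in range(M)]  # Vandermonde
G_K = G[:E]
K = [7, 3]
X0 = [sum(K[r] * G_K[r][i] for r in range(E)) % p for i in range(len(edges))]  # bit "0"
cols = sorted(set().union(*(node_symbols[v] for v in (0, 1, 2))))  # collector reads nodes 0, 1, 2
Y = [(X0[c] + (1 if c in node_symbols[0] else 0)) % p for c in cols]  # node 0 corrupts its symbols
print(decode_hash_bit(Y, cols, G_K, node_symbols, b, p))    # 0
```

In the demo the adversary controls node 0, so the candidate set containing node 0 punctures every corrupted symbol and the remaining symbols form a codeword of the code generated by `G_K`, which is why a stored “0” is decoded correctly despite the errors.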

Error Analysis: We carry out the error analysis of the above decoding logic by considering two different cases, based on the value of the stored hash bit.

• Hash bit ‘0’: We will show that when the stored information bit is ‘0’, the decoder $D$ makes no error. This case corresponds to $S = 0$ and, thus, $Y = K G_K + e$. Let $B^*$ be the actual set of nodes controlled by Charlie. Then there is at least one set, namely $B = B^*$, for which $Y_{I_{B^*}} = K G_{K_{I_{B^*}}}$, since $e_{I_{B^*}} = 0$. As a result, the decoder always outputs “0”.

• Hash bit ‘1’: Information bit ‘1’ corresponds to

$$Y = K G_K + S G_S + e,$$

where $(K, S)$ is a uniformly random vector in $\mathbb{F}_q^{M}$ and $e \in \mathbb{F}_q^{M}$ is the error vector introduced by Charlie. Note that the matrix $G$ is a generator matrix of a $(\theta, M)$ MDS code, hence its $M \times M$ submatrix $\begin{pmatrix} G_K \\ G_S \end{pmatrix}$ formed by the columns of the selected symbols is invertible. Thus, we can write

$$Y = (K + e_K) G_K + (S + e_S) G_S, \tag{20}$$

where $e_K$, $e_S$ are the coefficients of the error vector $e$ in terms of the basis corresponding to the rows of $G_K$, $G_S$. We have already shown in the security analysis above that $S$ is perfectly secure from Charlie’s observation. Hence $S + e_S$ is a uniformly random vector in $\mathbb{F}_q^{M-E}$. Consider any set $B \subset \{v_1, \dots, v_n\}$ of cardinality $|B| = b$ with index set $I_B$. Then $|I_B| = \sum_{i=1}^{b}(n-i)$, hence the matrix $G_{I_B}$, obtained by deleting from this $M \times M$ submatrix the columns corresponding to the indices $I_B$, has $R = M - |I_B|$ or more columns. Now, the matrix $G_{K_{I_B}}$ is a generator of an $(M, E)$ MDS code and $E < R$ (Theorem 15). Hence, the rank of $G_{K_{I_B}}$ is $E$. This, along with the fact that the $M \times M$ submatrix is invertible, implies that the rank of the matrix $G_{S_{I_B}}$ is $R - E$ or more. The probability that the syndrome computed in step 4) of the proposed decoding logic for this set $B$ is equal to zero is equal to the probability of the event that a uniformly random vector $(S + e_S)$ lies in the space orthogonal to the span of the columns of $G_{S_{I_B}}$. This probability is upper bounded by $1/q^{R-E}$. Now, applying the union bound to all $\binom{n}{b}$ choices of the set $B$ that the decoder attempts, the probability of error can be upper bounded by

$$\binom{n}{b} \frac{1}{q^{R-E}},$$

which goes to zero as the field size $q$ increases.
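A small numerical check of the quantities in this bound may help. The sketch below evaluates $|I_B| = \sum_{i=1}^{b}(n-i)$ and the union bound $\binom{n}{b}/q^{R-E}$ for illustrative parameters; the values of $n$, $b$, $q$, $R$ and $E$ are assumptions chosen for the example, and both function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def punctured_index_count(n, b):
    """|I_B| = sum_{i=1}^{b} (n - i): distinct symbol indices held by b nodes."""
    return sum(n - i for i in range(1, b + 1))

def error_bound(n, b, q, R, E):
    """Union bound C(n, b) / q^(R - E) on the probability of decoding '1' as '0'."""
    return Fraction(comb(n, b), q ** (R - E))

print(punctured_index_count(5, 2))    # 7
print(error_bound(5, 1, 101, 5, 2))   # 5/1030301
print(error_bound(5, 1, 10007, 5, 2) < error_bound(5, 1, 101, 5, 2))  # True
```

The bound shrinks polynomially in $q$ with exponent $R - E$, so even a moderately large field makes a decoding error on a stored “1” very unlikely.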

Rate Analysis: In the code proposed above to store the hash values securely and reliably, we need $\theta$ symbols in $\mathbb{F}_q$ for each bit of hash information. Also, in the previous section we showed that the total size of the hash table of interest is $\theta^2$ symbols in $\mathbb{F}_q$, that is, $\theta^2 \log q$ bits. Thus, the total overhead of the proposed code to store the hash table is $\theta^3 \log q$ symbols of $\mathbb{F}_q$, which is independent of the block length $v$ of the information packets.

Thus, we have shown how the hash table described in Table V can be stored on the DSS with a negligible overhead, and is guaranteed with high probability to be secret and resilient to the adversary, provided that the field size $q$ and the block length $v$ are large enough.

REFERENCES

[1] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, and J. Kubiatowicz, “Maintenance-free global data storage,” IEEE Internet Computing, pp. 40–49, 2001.

[2] R. Bhagwan, K. Tati, Y.-C. Cheng, S. Savage, and G. M. Voelker, “Total recall: System support for automated availability management,” in Proc. NSDI, 2004.

[3] F. Dabek, J. Li, E. Sit, J. Robertson, M. Kaashoek, and R. Morris, “Designing a DHT for low latency and high throughput,” in Proc. NSDI, 2004.

[4] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Transactions on Information Theory, vol. 56, pp. 4539–4551, Sep. 2010.

[5] T. Cui, T. Ho, and J. Kliewer, “On secure network coding over networks with unequal link capacities and restricted wiretapping sets,” in IEEE Internat. Symp. Inform. Th. (ISIT), 2010.

[6] O. Kosut, L. Tong, and D. Tse, “Nonlinear network coding is necessary to combat general byzantine attacks,” in Proc. of 47th Annual Allerton Conf. on Comm., Control, and Computing, Oct. 2009.

[7] A. G. Dimakis, P. B. Godfrey, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” in IEEE Internat. Conf. on Comp. Comm. (INFOCOM), 2007.

[8] Y. Wu, A. G. Dimakis, and K. Ramchandran, “Deterministic regenerating codes for distributed storage,” in Proc. of 45th Annual Allerton Conf. on Comm., Control, and Computing, 2007.

[9] K. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, “Exact regenerating codes for distributed storage,” in Proc. of 47th Annual Allerton Conf. on Comm., Control, and Computing, 2009.

[10] Y. Wu and A. G. Dimakis, “Reducing repair traffic for erasure coding-based storage via interference alignment,” in IEEE Internat. Symp. Inform. Th. (ISIT), 2009.

[11] C. Suh and K. Ramchandran, “Exact regeneration codes for distributed storage repair using interference alignment,” in IEEE Internat. Symp. Inform. Th. (ISIT), 2010.

[12] N. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran, “Explicit codes minimizing repair bandwidth for distributed storage,” in Proceedings of IEEE Information Theory Workshop (ITW’10), 2010.

[13] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network Information Flow,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1204–1216, 2000.

[14] C. Fragouli and E. Soljanin, Network Coding Fundamentals (Foundations and Trends in Networking). Now Publishers Inc, 2007.

[15] R. Yeung, S.-Y. Li, and N. Cai, Network Coding Theory (Foundations and Trends in Communications and Information Theory). Now Publishers Inc, 2006.

[16] T. K. Dikaliotis, A. G. Dimakis, and T. Ho, “Security in distributed storage systems by communicating a logarithmic number of bits,” in IEEE Internat. Symp. Inform. Th. (ISIT), 2010.

[17] N. Cai and R. W. Yeung, “Secure network coding,” in IEEE Internat. Symp. Inform. Th. (ISIT), 2002.

[18] N. Cai and R. W. Yeung, “Secure Network Coding on a Wiretap Network,” IEEE Transactions on Information Theory, vol. 57, pp. 424–435, 2011.

[19] J. Feldman, T. Malkin, C. Stein, and R. A. Servedio, “On the capacity of secure network coding,” in Proc. of 42nd Annual Allerton Conf. on Comm., Control, and Computing, 2004.


TABLE VI: TABLE OF IMPORTANT NOTATIONS

| Notation | Explanation |
|---|---|
| $\mathcal{G}$ | Information flow graph of a distributed storage system. |
| $\mathcal{V}$ | Set of nodes in the information flow graph. |
| $C(V, \bar{V})$ | Cut partitioning the set of nodes $\mathcal{V}$ in a graph into two sets $V \subset \mathcal{V}$ and $\bar{V} = \mathcal{V} \setminus V$. |
| $S$ | Random variable representing an incompressible source file. |
| $n$ | Total number of active nodes in a distributed storage system. |
| $k$ | Number of nodes a data collector connects to in order to retrieve the source file. |
| $d$ | Number of nodes a new replacement node connects to during the repair process. |
| $\alpha$ | Storage capacity at each storage node in a distributed storage system. |
| $\beta$ | Amount of data downloaded from every node participating in the repair process. |
| $\gamma$ | The total amount of data downloaded during the repair process, i.e., the repair bandwidth. |
| $\Gamma$ | Upper limit on the repair bandwidth in the bandwidth-limited regime. |
| $D_i$ | All the data/messages downloaded on the replacement node $v_i$ during the repair process. |
| $C_i$ | Data stored on the node $v_i$. |
| $R$ | Desired or achieved storage rate. |
| $M$ | Capacity of the distributed storage system in the absence of an adversary. |
| $x_i$ | Data symbol or packet stored on a distributed storage system. |
| $y_i$ | Data symbol or packet, possibly corrupted by an adversary, observed by a data collector. |
| $\ell$ | Number of nodes an adversary can eavesdrop on in a distributed storage system. |
| $b$ | Number of nodes an active adversary can maliciously control. |
| $E$ | A set of symbols/nodes observed by an adversary by eavesdropping on $\ell$ nodes. |
| $C_s$ | Secrecy capacity of a distributed storage system. |
| $C_r$ | Resiliency capacity of a distributed storage system. |

[20] S. El Rouayheb and E. Soljanin, “On wiretap networks II,” in IEEE Internat. Symp. Inform. Th. (ISIT), 2007.

[21] D. Silva and F. R. Kschischang, “Security for wiretap networks via rank-metric codes,” in IEEE Internat. Symp. Inform. Th. (ISIT), 2008.

[22] T. Ho, B. Leong, R. Koetter, M. Medard, M. Effros, and D. Karger, “Byzantine modification detection in multicast networks using randomized network coding,” in IEEE Internat. Symp. Inform. Th. (ISIT), pp. 616–624, 2004.

[23] S. Jaggi and M. Langberg, “Resilient network codes in the presence of eavesdropping byzantine adversaries,” in IEEE Internat. Symp. Inform. Th. (ISIT), pp. 541–545, 2007.

[24] S. Jaggi, M. Langberg, S. Katti, T. Ho, D. Katabi, M. Medard, and M. Effros, “Resilient network coding in the presence of byzantine adversaries,” in IEEE Transactions on Information Theory (special issue on information-theoretic security), pp. 2596–2603, 2008.

[25] H. Yao, D. Silva, S. Jaggi, and M. Langberg, “Network codes resilient to jamming and eavesdropping,” in IEEE Internat. on Network Coding (NetCod’10), 2010.

[26] S. Kim, T. Ho, M. Effros, and S. Avestimehr, “New results on network error correction: capacities and upper bounds,” in Information Theory and Applications Workshop (ITA’10), 2010.

[27] R. W. Yeung and N. Cai, “Network error correction, part I: Basic concepts and upper bounds,” in Commun. Inf. Syst, vol. 6, pp. 19–36, 2006.

[28] R. W. Yeung and N. Cai, “Network error correction, part II: Lower bounds,” in Commun. Inf. Syst, vol. 6, pp. 37–54, 2006.

[29] R. Koetter and F. Kschischang, “Coding for errors and erasures in random network coding,” in IEEE Transactions on Information Theory, pp. 3579–3591, 2008.

[30] D. Silva, F. R. Kschischang, and R. Koetter, “A rank-metric approach to error control in random network coding,” in IEEE Transactions on Information Theory, 2008.

[31] L. H. Ozarow and A. D. Wyner, “Wire-tap channel-II,” in AT&T Bell lab tech. journal, vol. 63, no. 10, 1984.

[32] R. E. Blahut, Algebraic Codes for Data Transmission. Cambridge University Press, 2002.

[33] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley-Interscience, 2006.

[34] S. Arunkumar and S. W. Mclaughlin, “MDS codes on erasure-erasure wire-tap channel,” in arXiv:0902.3286v1, 2009.

Sameer Pawar received the M.S. degree in electrical engineering from the Indian Institute of Science (IISc), Bangalore, India, in 2005. Since 2007, he has been with the Department of Electrical Engineering and Computer Science at the University of California at Berkeley. Prior to that, he had been with the Communications Department, Infineon Technologies India. His research interests include information theory and coding theory for storage and communication systems. He is a recipient of the Gold Medal for the Best Masters Thesis in the Electrical Division at IISc.

Salim El Rouayheb (S’07–M’09) received the Diploma degree in electrical engineering from the Lebanese University, Faculty of Engineering, Roumieh, Lebanon, in 2002, and the M.S. degree in computer and communications engineering from the American University of Beirut, Lebanon, in 2004. He received the Ph.D. degree in electrical engineering from Texas A&M University, College Station, in 2009. He is currently a Postdoctoral Research Fellow with the Electrical Engineering and Computer Science Department, University of California, Berkeley. His research interests lie in the broad area of communications, with a focus on reliable and secure distributed information systems and on the algorithmic and information-theoretic aspects of networking.

Kannan Ramchandran is a Professor of Electrical Engineering and Computer Science at the University of California at Berkeley, where he has been since 1999. Prior to that, he was with the University of Illinois at Urbana-Champaign from 1993 to 1999, and was at AT&T Bell Laboratories from 1984 to 1990. His current research interests include distributed signal processing algorithms for wireless sensor and ad hoc networks, multimedia and peer-to-peer networking, multi-user information and communication theory, and wavelets and multi-resolution signal and image processing. Prof. Ramchandran is a Fellow of the IEEE. His research awards include the Eliahu Jury award for the best doctoral thesis in the systems area at Columbia University, the NSF CAREER award, the ONR and ARO Young Investigator Awards, two Best Paper awards from the IEEE Signal Processing Society, a Hank Magnuski Scholar award for excellence in junior faculty at the University of Illinois, and an Okawa Foundation Prize for excellence in research at Berkeley. He has published extensively in his field, holds 8 patents, serves as an active consultant to industry, and has held various editorial and Technical Program Committee positions.

