HAIL (High-Availability and Integrity Layer) for Cloud Storage
Kevin Bowers and Alina Oprea
RSA Laboratories
Joint work with Ari Juels
Cloud storage
[Figure: a client and a cloud storage provider with a web server and a storage server]
Pros:
• Lower cost
• Easier management
• Enables sharing and access from anywhere
Cons:
• Loss of control
• No guarantees of data availability
• Provider failures
Provider failures
Temporary unavailability:
• "Amazon S3 systems failure downs Web 2.0 sites: Twitterers lose their faces, others just want their data back" (Computer World, July 21, 2008)
• "Customers Shrug Off S3 Service Failure: At about 7:30 EST this morning, S3, Amazon.com’s online storage service, went down. The 2-hour service failure affected customers worldwide." (Wired, Feb. 15, 2008)
Permanent data loss:
• "Loss of customer data spurs closure of online storage service 'The Linkup'" (Network World, Nov 8, 2008)
• "Spectacular Data Loss Drowns Sidekick Users" (October 10, 2009)
How do we increase users’ confidence in the cloud?
Outline
• Proofs of Retrievability
– Constructions and practical aspects
– Limitations
• HAIL goals and adversarial model
• HAIL protocol design
– Encoding layer
– Decoding layer
– Challenge-response protocol
– Redistribution of shares in case of failures
• HAIL parameter tradeoffs
• Open problems
PORs: Proofs of Retrievability
• Client outsources a file F to a remote storage provider
• Client would like to ensure that her file F is retrievable
• The simple approach: client periodically downloads F
• This is resource-intensive!
• What about spot-checking instead?
– Sample a few file blocks periodically
– If file is not stored locally, need verification mechanism (e.g., MACs for each file block)
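The spot-checking idea can be sketched as follows. This is a minimal illustration, assuming HMAC-SHA256 as the per-block MAC and a 16-byte block size; the function names (`split_blocks`, `tag_block`, `spot_check`) are hypothetical and not taken from the talk.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 16  # bytes per block; an illustrative choice, not from the slides


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def tag_block(key: bytes, index: int, block: bytes) -> bytes:
    """MAC a block together with its position so blocks cannot be swapped."""
    msg = index.to_bytes(8, "big") + block
    return hmac.new(key, msg, hashlib.sha256).digest()


def spot_check(key: bytes, blocks: list[bytes], tags: list[bytes],
               num_samples: int) -> bool:
    """Sample distinct block positions and verify each returned (block, tag) pair.

    `blocks` and `tags` stand in for what the provider returns on request.
    """
    rng = secrets.SystemRandom()
    for index in rng.sample(range(len(blocks)), num_samples):
        expected = tag_block(key, index, blocks[index])
        if not hmac.compare_digest(expected, tags[index]):
            return False
    return True
```

The client only needs to keep the key; blocks and tags both live at the provider.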
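Spot-checking
[Figure: the cloud storage provider stores file F as blocks (B1, B4, B7, ...) with MAC tags (T1, T2, T3, ...); the client, holding key k, samples a block and its tag (B1, T1)]
Small corruptions go undetected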
Error correcting code
[Figure: the client holds key k; the cloud storage provider stores F plus parity blocks]
Corrects small corruption
ECC + MAC
[Figure: the client holds key k; the cloud storage provider stores file blocks (B1, B4, B7, ...), parity blocks (P1, ...), and MACs (T1, T2, T3, T4, ...) over file and parity blocks]
• Detects large corruption through spot-checking
• Corrects small corruption through ECC
Query aggregation
[Figure: the client, holding key k, sends a challenge and receives a response from the cloud storage provider, which stores F and parity blocks; MACs are computed over an aggregation of blocks]
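One common way to aggregate a challenge into a single short response is a linear MAC. The sketch below assumes blocks are integers in a prime field and tags have the form alpha * block + PRF(index); this is a stand-in in the style of homomorphic-MAC PORs and is not necessarily the construction pictured on the slide. The modulus and all function names are illustrative assumptions.

```python
import hashlib
import hmac

P = 2**61 - 1  # prime modulus; illustrative field size (assumption)


def prf(key: bytes, index: int) -> int:
    """Pseudorandom field element for a block position."""
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def tag_blocks(alpha: int, prf_key: bytes, blocks: list[int]) -> list[int]:
    """Client side: tag_i = alpha * block_i + PRF(i), all mod P."""
    return [(alpha * b + prf(prf_key, i)) % P for i, b in enumerate(blocks)]


def respond(blocks: list[int], tags: list[int],
            challenge: list[tuple[int, int]]) -> tuple[int, int]:
    """Provider side: return one aggregated block and one aggregated tag.

    `challenge` is a list of (block index, coefficient) pairs.
    """
    agg_block = sum(c * blocks[i] for i, c in challenge) % P
    agg_tag = sum(c * tags[i] for i, c in challenge) % P
    return agg_block, agg_tag


def verify(alpha: int, prf_key: bytes, challenge: list[tuple[int, int]],
           agg_block: int, agg_tag: int) -> bool:
    """Client side: the tag is linear, so the same combination must hold."""
    mask = sum(c * prf(prf_key, i) for i, c in challenge) % P
    return agg_tag == (alpha * agg_block + mask) % P
```

The response is two field elements regardless of how many blocks the challenge covers, which is the bandwidth saving aggregation is meant to provide.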
Practical considerations
• Applying such an ECC to all of F is impractical
• Instead, we can stripe the ECC
• If adversary knows the stripe structure, she can corrupt selectively…
Selective corruption
• Adversary targets a particular stripe
• File cannot be recovered
• The probability that the client detects the corruption through sampling is small if stripes are small
• Practical code parameters encode hundreds of bytes at a time (e.g., Reed-Solomon (255, 223, 32))
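A back-of-the-envelope calculation shows why a targeted stripe is hard to catch by sampling. The sketch assumes the client samples symbols uniformly with replacement, that the Reed-Solomon stripe has 255 symbols of which 223 are data (so 32 parity symbols, correcting up to 16 symbol errors), and that the adversary corrupts 17 symbols of one stripe. The file size and sample count below are made-up illustration values.

```python
STRIPE_LEN = 255                    # symbols per stripe
DATA_LEN = 223                      # data symbols per stripe
PARITY = STRIPE_LEN - DATA_LEN      # 32 parity symbols
MAX_CORRECTABLE = PARITY // 2       # 16 symbol errors can be corrected


def detection_probability(total_symbols: int, corrupted: int, samples: int) -> float:
    """Chance that at least one of `samples` uniform queries hits a corrupted symbol."""
    return 1.0 - (1.0 - corrupted / total_symbols) ** samples


# One more corrupted symbol than the code can correct makes the stripe unrecoverable.
needed = MAX_CORRECTABLE + 1
p = detection_probability(total_symbols=1_000_000, corrupted=needed, samples=1000)
print(f"{needed} corrupted symbols, detection probability {p:.4f}")
```

Under these assumptions the client notices the damage in fewer than 2% of audits, even though the stripe is already beyond repair.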
Adversarial codes: hide ECC stripes
• Do secret, randomized partitioning of F into stripes
– E.g., use a secret key to generate a pseudorandom permutation and then choose stripes sequentially
• Encrypt and permute parity blocks
• The encoding is still systematic
• But the adversary does not know where stripes are, so the adversary cannot feasibly target a stripe!
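The secret partitioning step might look like the sketch below. It assumes the pseudorandom permutation is obtained by sorting block indices by an HMAC-SHA256 value under the secret key, which is one simple way to get a key-dependent ordering; the talk does not specify the permutation, and the function names are hypothetical.

```python
import hashlib
import hmac


def secret_permutation(key: bytes, n: int) -> list[int]:
    """Order block indices by a keyed PRF value; unpredictable without the key."""
    def rank(i: int) -> bytes:
        return hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return sorted(range(n), key=rank)


def secret_stripes(key: bytes, n_blocks: int, stripe_len: int) -> list[list[int]]:
    """Permute the block indices, then cut the permuted order into stripes."""
    order = secret_permutation(key, n_blocks)
    return [order[i:i + stripe_len] for i in range(0, n_blocks, stripe_len)]


# Each inner list names the file blocks that are encoded together.
for stripe in secret_stripes(b"client secret key", n_blocks=12, stripe_len=4):
    print(stripe)
```

Because file blocks stay in place and only the grouping is secret, the encoding remains systematic, as the slide notes.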
POR papers
• Proofs of Retrievability (PORs)
– Juels-Kaliski 2007
• Proofs of Data Possession (PDPs)
– Burns et al. 2007
– Erway et al. 2009
• Unlimited queries using homomorphic MACs
– Shacham-Waters 2008
– Ateniese, Kamara and Katz 2009
• Fully general query aggregation in PORs
– Bowers, Juels and Oprea 2009
– Dodis, Vadhan and Wichs 2009
When PORs fail
[Figure: the client, holding key k, sends a challenge and receives a response from the cloud storage provider; the decoder finds F unrecoverable]
HAIL goals
• Resilience against cloud provider failure and temporary unavailability
• Use multiple cloud providers to construct a reliable cloud storage service out of unreliable components
– RAID (Redundant Array of Inexpensive Disks) for cloud storage under adversarial model
• Provide clients or third parties with auditing capabilities
– Efficient proofs of file availability by interacting with cloud providers
RAID (Redundant Array of Inexpensive Disks)
[Figure: a stripe of three data blocks B1, B2, B3 and one parity block P1 = B1 ⊕ B2 ⊕ B3; the lost block B2 is recovered as B1 ⊕ B3 ⊕ P1]
• Shift from monolithic, high-performance drives to cheaper drives with redundancy
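The XOR parity used in a RAID stripe can be shown in a few lines. The block contents below are arbitrary example bytes, and `xor_blocks` is a hypothetical helper name.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


b1, b2, b3 = b"AAAA", b"BBBB", b"CCCC"
p1 = xor_blocks(b1, b2, b3)            # parity block P1 = B1 xor B2 xor B3

# If the drive holding B2 fails, XOR the surviving blocks with the parity.
recovered_b2 = xor_blocks(b1, b3, p1)
print(recovered_b2)
```

A single parity block tolerates one lost block per stripe, and only when the position of the loss is known, which is the benign-failure setting RAID was designed for.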
RAID in the Cloud
[Figure: Provider A, Provider B, Provider C, Provider D]
• Fuse together cheap cloud providers to provide a high-quality (reliable) abstraction
– E.g., Memopal offers $0.02 / GB / month storage on a 5-year contract vs. Amazon at $0.15 / GB / month
…But the cloud is adversarial!
[Figure: Provider A, Provider B, Provider C, Provider D]
• RAID designed for benign failures (drive crashes)
• Static adversaries are not realistic
• A mobile adversary moves from provider to provider
– System failures and corruptions over time
– Corrupts a threshold of providers in each epoch (b out of n)
Mobile adversary
[Figure: Provider A, Provider B, Provider C, Provider D]
• Combination of proactive and reactive models
– Separate each server into code base and storage base
  Code base of servers cleaned at beginning of epoch (e.g., through reboot)
  At most b out of n servers have corrupted code in each epoch
– Challenge-responses used for detection of failure
  Corrupted storage recovered when failure is detected
HAIL protocolsHAIL protocols
• File encoding– Distribute a file across n storage providers– Add redundancy to tolerate provider failures– Small state stored locally by client (including secret key)
• File decoding– Recover original file by contacting a threshold of providers– Tolerate provider failures or unavailability
• Challenge-response protocol– Executed a number of times per epoch– Enables clients to perform integrity checks by contacting a
threshold of providers– Detects failures early and enhances data availability
• Share redistribution– When failure detected, clients reconstruct shares from
redundancy encoded in other providers
First idea: file replication with POR
[Figure: the client sends a POR challenge to each of Provider A, Provider B, and Provider C and receives a POR response from each; every provider stores a full copy of F with parity blocks and MACs]
File replication with POR: Issues
[Figure: the client computes MACs for F; Provider A, Provider B, and Provider C each store F, parity blocks, and MACs]
• Compute different MACs per provider
• Large encoding overhead
• Large storage overhead due to replication
Use redundancy across servers
[Figure: Provider A, Provider B, and Provider C each store a copy of F; the client requests block i from each provider and compares the returned blocks Fi]
Sample and check consistency across providers
Small-corruption attack
[Figure: Provider A, Provider B, and Provider C each store a copy of F; block Fi is corrupted at the providers]
The probability that the client samples the corrupted block is low
File cannot be recovered after ⌈n/b⌉ epochs
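The ⌈n/b⌉ bound can be checked with a small simulation. It assumes the adversary corrupts the same block on up to b fresh providers per epoch and that the corruption is never sampled by the client, so it persists across epochs; the function name is hypothetical.

```python
def epochs_to_corrupt_all(n: int, b: int) -> int:
    """Count epochs until one block is corrupted on all n replicas.

    Each epoch the mobile adversary controls at most b providers and uses
    them to corrupt the target block on providers not yet hit.
    """
    corrupted: set[int] = set()
    epochs = 0
    while len(corrupted) < n:
        fresh = [s for s in range(n) if s not in corrupted][:b]
        corrupted.update(fresh)
        epochs += 1
    return epochs


for n, b in [(3, 1), (5, 2), (6, 2)]:
    print(f"n={n}, b={b}: block lost everywhere after {epochs_to_corrupt_all(n, b)} epochs")
```

Once every replica of the block is corrupted, cross-provider comparison has no correct copy left to recover from.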
Replication with server code
[Figure: the client; Provider A, Provider B, and Provider C each store F plus server-code parity]
• Still vulnerable to small-corruption attack, once corruption exceeds the error correction rate of server code
• Large storage overhead due to replication
Dispersal erasure code
[Figure: each stripe of the original file F (blocks F1, F2, F3, 128 bits each) is spread across primary servers PA, PB, PC (k servers); secondary servers PD, PE (n-k servers) hold the dispersal code parity]
• File can be recovered from any k available servers
• For encoding efficiency, use striping for 128-bit blocks
Two encoding layersTwo encoding layers
PA PB PC PD PE
Server code
Dispersal code parity
F1 F2 F3
• Dispersal code reduces storage overhead of replication with similar availability guarantees
• Server code improves resilience to small-corruption attack
Checking for correct encoding
[Figure: servers PA, PB, PC, PD, PE and the client]
Client: check that stripe is a codeword in dispersal code
Aggregation of stripes
[Figure: servers PA, PB, PC, PD, PE and the client; stripes are weighted by coefficients 1, α, α^2 and summed]
Client: check that linear combination of stripes is a codeword
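Aggregation works because the dispersal code is linear: any weighted sum of codewords is again a codeword. The sketch below assumes the same toy interpolation-based code over the prime field of size 257 and weights 1, α, α^2 as in the figure; all names are hypothetical.

```python
P = 257  # small prime field for illustration only (assumption)


def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def encode_stripe(data: list[int], n: int) -> list[int]:
    """Systematic toy dispersal code: data symbols followed by parity symbols."""
    k = len(data)
    points = list(zip(range(1, k + 1), data))
    return data + [lagrange_eval(points, x) for x in range(k + 1, n + 1)]


def is_codeword(stripe: list[int], k: int) -> bool:
    """Re-derive the parity from the first k symbols and compare."""
    return encode_stripe(stripe[:k], len(stripe)) == stripe


def aggregate(stripes: list[list[int]], alpha: int) -> list[int]:
    """Column-wise sum of stripes weighted by 1, alpha, alpha^2, ...

    Each server can compute its own column entry locally.
    """
    n = len(stripes[0])
    return [sum(pow(alpha, r, P) * row[c] for r, row in enumerate(stripes)) % P
            for c in range(n)]


stripes = [encode_stripe(d, 5) for d in ([1, 2, 3], [40, 50, 60], [7, 8, 9])]
print(is_codeword(aggregate(stripes, alpha=7), k=3))
```

The client checks one aggregated stripe instead of three, and a corrupted symbol in any of the combined stripes shifts the aggregate off the code unless the weights happen to cancel it.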
Comparison
File replication with POR (each provider stores F, parity, and MACs):
- Large storage overhead due to replication
- Redundant MACs for POR
- Large encoding overhead
- Verifiable by client only
+ Increased lifetime
HAIL: Two encoding layers (dispersal and server code):
+ Optimal storage overhead for given availability level
+ Uses cross-server redundancy for verifying responses
+ Reasonable encoding overhead
+ Public verifiability
- Limited lifetime
Increase protocol lifetime
[Figure: servers PA, PB, PC, PD, PE holding file blocks F1, F2, F3, with a MAC attached to each block]
• Authenticate stripes with MACs
• One MAC per block
- Large storage overhead
- How can the MACs from multiple stripes be aggregated?
Integrity-protected dispersal code
[Figure: servers PA, PB, PC, PD, PE holding file blocks F1, F2, F3; each parity block has PRF_k1(pos) added to it]
• Embed integrity information into parity blocks of dispersal code
• Can check linear combination of MACs knowing only linear combination of blocks
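The idea of folding a MAC into a parity symbol can be sketched as follows. The slide only states that a PRF value keyed by position is added to the parity blocks; everything else here is an assumption: the parity is modeled as a polynomial evaluation at a secret point, the field is the prime 2^61 - 1, and all function names are hypothetical.

```python
import hashlib
import hmac

P = 2**61 - 1  # prime modulus; illustrative field size (assumption)


def prf(key: bytes, row: int, col: int) -> int:
    """Pseudorandom mask for the parity symbol at position (row, col)."""
    msg = row.to_bytes(8, "big") + col.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big") % P


def plain_parity(data: list[int], point: int) -> int:
    """Parity as a polynomial evaluation; linear in the data symbols."""
    return sum(d * pow(point, i, P) for i, d in enumerate(data)) % P


def protected_parity(data: list[int], row: int, col: int,
                     point: int, prf_key: bytes) -> int:
    """Parity symbol with the position-dependent PRF mask added."""
    return (plain_parity(data, point) + prf(prf_key, row, col)) % P


def verify_aggregate(agg_data: list[int], agg_parity: int, coeffs: dict[int, int],
                     col: int, point: int, prf_key: bytes) -> bool:
    """Check a combination of rows knowing only the combined data symbols.

    `coeffs` maps each challenged row to its coefficient.
    """
    mask = sum(c * prf(prf_key, row, col) for row, c in coeffs.items()) % P
    return agg_parity == (plain_parity(agg_data, point) + mask) % P
```

No separate MAC is stored: the masked parity symbol serves as both redundancy and authenticator, which is where the storage saving comes from.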
HAIL protocols
• Encoding
– Two layers of error correction: dispersal code and server code
– Integrity-protected dispersal code used to reduce storage overhead
– Server code is adversarial erasure code
• Decoding
– Reverse of encoding, using two layers of error correction
• Tradeoffs:
– Erasure dispersal code: tolerates n-m-1 failures per round, but decoding requires brute force in case of errors (do not know the positions of erasures)
– Error-correcting dispersal code: tolerates up to b = (n-m-1)/2 failures per round
HAIL protocols, cont.
• Challenge-response
– Executed a number of times in each round
– Challenge: a number of row positions
– Response: aggregated row
– Verification: response should be a codeword in dispersal code and composite MAC should be valid
• Redistribution of shares:
– Invoked when corruption of a fragment is detected by challenge-response
– Reconstruction done by client and involves downloading m correct file fragments
Encoding Performance
• HAIL requires two levels of encoding
• Order is important!
Encoding Security
• Security of the MAC depends on the size of the finite field used to perform Reed-Solomon encoding.
• Most Reed-Solomon codes are implemented over bytes, or at most 4-byte words (typical integer representation)
• 32-bit security is low from a cryptographic viewpoint
• Operating over larger symbols is slow
– Larger encodings can be generated by combining several smaller encodings
– Or, they can be implemented using extension fields
• To speed up larger symbol encoding, need fast operations in large Galois Fields
– Work with Jianqiang Luo and Lihao Xu at Wayne State Univ.
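To make the cost concrete, here is a plain shift-and-XOR multiplication in GF(2^w). It is a baseline sketch, not the fast method the talk refers to. The example uses GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B, the AES field) because its results are easy to check; a larger field would need its own irreducible polynomial, which is not given in the slides.

```python
def gf_mul(a: int, b: int, width: int, poly: int) -> int:
    """Multiply a and b in GF(2^width).

    `poly` is the irreducible reduction polynomial, including its leading
    x^width term. The loop runs once per bit of b, so the cost grows with
    the symbol size.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> width:           # degree reached `width`: reduce modulo poly
            a ^= poly
    return result


print(hex(gf_mul(0x57, 0x83, width=8, poly=0x11B)))
```

The bit-serial loop is what makes wide symbols slow, which motivates table-based or extension-field techniques for larger fields.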
Summary
• HAIL is an extension of RAID into the cloud
• High availability and tolerance to adversarial failures
– Low storage overhead due to integrity-protected dispersal code
• Enables client-side integrity checks
– Low bandwidth for challenge-response due to aggregation
• Papers:
– K. Bowers, A. Juels, and A. Oprea. Proofs of Retrievability: Theory and Implementation. ACM CCSW ’09.
– K. Bowers, A. Juels, and A. Oprea. HAIL: High Availability and Integrity Layer for Cloud Storage. ACM CCS ’09.
• http://www.rsalabs.com/