Page 1: On Utilization of Contributory Storage in Desktop Grids · On Utilization of Contributory Storage in Desktop Grids Chreston Miller, Ali R. Butt, and Patrick Butler ... •Advantages:

On Utilization of Contributory Storage in Desktop Grids

Chreston Miller, Ali R. Butt, and Patrick Butler

Department of Computer Science

Contributory Storage: Cheap Storage using Shared Resources

• Distributed setup with many participants

• Nodes contribute storage space for sharing

• Create a uniform global storage space

• Typically supports decentralized store/lookup

• Many systems build upon this idea

• PAST, CFS, OceanStore, Kosha, LOCKSS,…

Goal: Use of Contributory Storage in Scientific Computing

• Advantages:

• Provides economical storage with large capacity

• Supports parallel access to distributed resources

• Challenges:

• Limited individual file sizes

• Unreliable and transient participants

Simple replication or file splitting is unlikely to work

Need for techniques to use shared storage in scientific computing

Our Contribution: PeerStripe

Reliable Shared Storage

• Utilizes storage contributed by peer nodes

• Adapts data striping to support large files

• Employs error coding for fault tolerance

• Leverages multicast for efficient replication

• Supports easy integration with applications

Outline

• Preamble

• End to our Means

• Evaluation Study

• Conclusion

Outline

• Preamble

– Problem

– Motivation

– Our Contributions

– Core Technologies

• End to our Means

• Evaluation Study

• Conclusion

Core Technologies: Structured Peer-to-Peer Networks

• Implement Distributed Hash Table abstraction

• Facilitate decentralized operation

• Provide self-organization of participants

• Systems based on these networks provide:

• Mobility and location transparency

• Load-balancing

• We use the FreePastry substrate from Rice University and Microsoft
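The Distributed Hash Table abstraction listed above can be pictured with a small sketch. This is not the FreePastry API (FreePastry is a Java library); it is a toy, in-memory model that assumes SHA-1 identifiers on a circular space and assigns each key to the node whose identifier is numerically closest, which is the placement rule Pastry-style overlays follow. The class name `ToyDHT` and all helper names are hypothetical.

```python
import hashlib

ID_BITS = 160                 # SHA-1 digest size; an assumption of this sketch
ID_SPACE = 1 << ID_BITS       # identifiers live on a ring of this size


def to_id(name: str) -> int:
    """Hash a node name or key into the circular identifier space."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two identifiers on the ring."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


class ToyDHT:
    """In-memory stand-in for a structured overlay (no networking, no routing)."""

    def __init__(self, node_names):
        self.nodes = {to_id(n): n for n in node_names}
        self.store = {n: {} for n in node_names}

    def responsible_node(self, key: str) -> str:
        """Return the node whose identifier is numerically closest to the key."""
        k = to_id(key)
        closest = min(self.nodes, key=lambda nid: (ring_distance(nid, k), nid))
        return self.nodes[closest]

    def put(self, key: str, value: bytes) -> str:
        node = self.responsible_node(key)
        self.store[node][key] = value
        return node

    def get(self, key: str):
        return self.store[self.responsible_node(key)].get(key)
```

Because placement depends only on the hash of the key, any participant can locate a stored object without a central directory, which is what makes decentralized store/lookup possible.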

Core Technologies: Increasing Data Availability

• Erasure codes

• Provide redundancy against failures

• Incur less space overhead than replication

• Advanced codes can withstand multiple failures

• Multicast communication protocol

• Supports simultaneous messaging to many nodes

• Can be leveraged for efficient replication

Outline

• Preamble

• End to our Means

– Software Architecture

– Splitting a file

– Redundancy with multicast

– Error coding

– Interfacing with applications

• Evaluation Study

• Conclusion

PeerStripe Software Tasks

1. Storing large files

• Split file into chunks of different sizes

• Use DHTs to store chunks

2. Error coding chunks

• Use online code to provide redundancy

3. Chunk replication

• Replicate commonly used chunks

4. Interface with applications

• Provide APIs for applications to use

Part 1: Splitting Files into Chunks

[Figure: Data File → Splitter (gets capacity from nodes) → x chunks (n blocks/chunk) → Encoder (m blocks/chunk) → x*m error-coded blocks → Nodes]
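One way to read the splitter step is that chunk sizes follow the space each target node can offer, which is why chunks come out in different sizes. The sketch below illustrates that idea only; it is not PeerStripe's code. The `get_capacity` callback (how many bytes the candidate node for a given probe is willing to accept) and the `max_probes` limit are assumptions introduced for illustration.

```python
from typing import Callable, List, Tuple


def split_by_capacity(file_size: int,
                      get_capacity: Callable[[int], int],
                      max_probes: int = 1000) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that together cover file_size bytes.

    get_capacity(i) is a hypothetical query returning the number of bytes
    the i-th candidate node offers; a result of 0 means the node is full.
    """
    chunks: List[Tuple[int, int]] = []
    offset = 0
    probe = 0
    while offset < file_size:
        if probe >= max_probes:
            raise RuntimeError("not enough contributed capacity for this file")
        offered = get_capacity(probe)
        probe += 1
        if offered <= 0:
            continue  # this node cannot take data; try the next candidate
        length = min(offered, file_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks
```

For example, with candidate nodes offering 40, 0, 25, and 100 units, a 100-unit file is cut into chunks of 40, 25, and 35 units, and the full node is skipped.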

Part 2: Error Coding Chunks

• Each chunk is separately error coded

1. A chunk is split into n equal-size blocks

2. The blocks are error coded into m encoded blocks

3. Encoded blocks are inserted into the DHT
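The three steps above can be sketched as a short pipeline. This is a minimal illustration under stated assumptions: the encoder is passed in as a function, the example uses a null code (blocks stored unchanged, so m = n), a plain dictionary stands in for the DHT, and the key format `"<chunk name>/<block index>"` hashed with SHA-1 is hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def split_into_blocks(chunk: bytes, n: int) -> List[bytes]:
    """Step 1: split a chunk into n equal-size blocks, zero-padding the tail."""
    if n <= 0:
        raise ValueError("n must be positive")
    block_size = max(1, -(-len(chunk) // n))  # ceiling division
    padded = chunk.ljust(block_size * n, b"\x00")
    return [padded[i * block_size:(i + 1) * block_size] for i in range(n)]


def null_code(blocks: List[bytes]) -> List[bytes]:
    """Placeholder encoder: no redundancy, m equals n."""
    return list(blocks)


def store_chunk(chunk_name: str,
                chunk: bytes,
                n: int,
                encode: Callable[[List[bytes]], List[bytes]],
                dht: Dict[str, bytes]) -> List[str]:
    """Steps 2 and 3: encode the blocks, then insert each one under its own key."""
    encoded = encode(split_into_blocks(chunk, n))
    keys = []
    for index, block in enumerate(encoded):
        key = hashlib.sha1(f"{chunk_name}/{index}".encode("utf-8")).hexdigest()
        dht[key] = block
        keys.append(key)
    return keys
```

A real store would also record the original chunk length so the padding can be stripped on retrieval, and would swap `null_code` for an actual erasure code.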

Investigation of Error Codes

• Error codes tested and used:

• XOR code: Protect against single failures

• Online code: Protect against multiple failures

+ Good redundancy with small space overhead

- Recovery may consume resources
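To make the single-failure guarantee of the XOR code concrete, here is a small sketch of parity encoding and recovery. It assumes equal-size blocks and one parity block per group; the grouping used by PeerStripe is not specified here, and the function names are illustrative. Online codes are considerably more involved and are not shown.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-size blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_encode(blocks: List[bytes]) -> List[bytes]:
    """Append one parity block that is the XOR of all data blocks."""
    return list(blocks) + [reduce(xor_blocks, blocks)]


def xor_recover(encoded: List[Optional[bytes]]) -> List[bytes]:
    """Return the data blocks, rebuilding at most one missing block (None)."""
    missing = [i for i, block in enumerate(encoded) if block is None]
    if len(missing) > 1:
        raise ValueError("XOR parity tolerates only one missing block")
    if missing:
        present = [block for block in encoded if block is not None]
        encoded = list(encoded)
        # XOR of all surviving blocks equals the missing one.
        encoded[missing[0]] = reduce(xor_blocks, present)
    return encoded[:-1]  # drop the parity block
```

Losing any one block, data or parity, is recoverable; losing two is not, which is the gap that codes able to withstand multiple failures are meant to close.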

Part 3: Multicast-based Replication

• Leverage multicast for efficient and fast data dissemination to multiple destinations

• Faster recovery at the cost of space

• Challenge: Creation of a multicast-tree from source to replica destinations

Creating a Multicast Tree

• Use greedy approach

• Start from the source S

• Using the locality-aware DHT, select random nodes close to S as the first tier

• Repeat the selection at each tier until replica location R is reached

• Employ standard multicast protocols, e.g., Bullet, to push data from S to R
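The greedy construction above can be sketched as a tier-by-tier expansion. This is an illustration under several assumptions, not the authors' algorithm: node proximity is modeled with 2-D coordinates instead of a locality-aware DHT, and the `fanout`, `neighborhood`, and `seed` parameters are invented for the example. Pushing data along the tree (for instance with Bullet) is outside the sketch.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def build_multicast_tree(source: str,
                         replicas: Sequence[str],
                         coords: Dict[str, Point],
                         fanout: int = 2,
                         neighborhood: int = 4,
                         seed: int = 0) -> Dict[str, str]:
    """Grow tiers outward from the source; return a child -> parent map."""
    rng = random.Random(seed)
    parent: Dict[str, str] = {}
    in_tree = {source}
    frontier = [source]
    pending = set(replicas) - in_tree

    def dist(a: str, b: str) -> float:
        (x1, y1), (x2, y2) = coords[a], coords[b]
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

    while pending and frontier:
        next_tier: List[str] = []
        for node in frontier:
            # Candidates: the closest nodes that are not yet part of the tree.
            nearby = sorted((n for n in coords if n not in in_tree),
                            key=lambda n: (dist(node, n), n))[:neighborhood]
            # Randomly pick a few of them as this node's children.
            for child in rng.sample(nearby, min(fanout, len(nearby))):
                parent[child] = node
                in_tree.add(child)
                pending.discard(child)
                next_tier.append(child)
        frontier = next_tier
    return parent


def path_from_source(parent: Dict[str, str], source: str, node: str) -> List[str]:
    """Follow parent links back to the source and return the path in order."""
    path = [node]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return path
```

The expansion stops as soon as every replica location has been attached, so intermediate tiers only contain nodes that were picked on the way.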

[Figure: multicast tree from source S to replica locations R]

Part 4: Interfacing with Applications

• Modify applications to use direct calls to the PeerStripe API

• Works well for new applications

• Link applications with an interposing library to redirect I/O

• Transparent integration with existing applications

Outline

• Preamble

• End to our Means

• Evaluation Study

– Simulation

– Real world

– PlanetLab

– Condor

• Conclusion

Evaluation: Overview

1. Simulation study:

• Successful File Stores

• Number and size of chunks created

• System utilization (in terms of storage capacity)

• File availability with error coding

• Error code performance

• Effects of participant churn

2. Design verification on PlanetLab

3. Integration with Condor desktop grid

Simulation Study Setup

• 10,000-node directly connected network

• Assigned node capacities with mean 45 GB and variance 10 GB

• File system trace of 1.2M files totaling 278.7 TB

• Compare with PAST and CFS storage systems

Number of Successful File Stores

• 7.0x improvement over PAST

• 2.9x improvement over CFS

Number and Size of Chunks

• CFS: 61.25 chunks with stdev of 13.8

• Fixed chunk size of 4 MB

• PeerStripe: 3.72 chunks with stdev of 3.1

• Average chunk size 81.28 MB with stdev 19.9 MB

Fewer chunks in PeerStripe allow:

• Fewer expensive p2p lookups

• Performance similar to PAST

Overall System Capacity Utilization

• PeerStripe: 20.19% better than PAST

• PeerStripe: 7.18% better than CFS

• PeerStripe can utilize the available storage capacity more efficiently even at higher utilization

Error Coding: File Availability

• XOR code: 23% fewer failures

• Online code: 32% fewer failures

• Online code provides excellent fault tolerance against node failures

Error Coding Performance

• Compare XOR (1:1) and Online code with NULL code

• XOR: 3.3 times faster than online code

• Online code: slower than XOR, but

• Decoding can start as soon as a block becomes available and can be overlapped with retrieval of other blocks

• The efficiency of online code outweighs its overhead

Erasure code   Encoded size (MB)   Size overhead   Encoding time   Time overhead
Null           4                   0%              11              0%
XOR            6                   50%             79              618%
Online         4.12                3%              264             2300%
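The overhead columns follow directly from the raw size and time columns when the Null code is taken as the baseline. The short check below recomputes them; the time unit is not stated in the table, so the times are treated as plain numbers, and the dictionary layout is just a convenience for this sketch.

```python
# Raw values from the table: (encoded size in MB, encoding time in unstated units).
rows = {
    "Null": (4.0, 11),
    "XOR": (6.0, 79),
    "Online": (4.12, 264),
}


def overhead_pct(value: float, baseline: float) -> float:
    """Relative increase over the baseline, in percent."""
    return (value - baseline) / baseline * 100


base_size, base_time = rows["Null"]
for name, (size, time) in rows.items():
    print(f"{name:7s} size overhead {overhead_pct(size, base_size):6.1f}%  "
          f"time overhead {overhead_pct(time, base_time):7.1f}%")

# Ratio behind the "3.3 times faster" statement for XOR versus online code.
xor_speedup = rows["Online"][1] / rows["XOR"][1]
print(f"Online/XOR encoding time ratio: {xor_speedup:.2f}")
```

The trade-off is visible in the two columns: XOR is cheap to compute but adds half again as much data, while online code adds only about 3% in size at a much higher encoding cost.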

Effects of Participant Churn

• Failed up to 20% of total nodes

• 29.3 GB of data was regenerated per node failure on average

• Total of 58,625.8 GB regenerated

• 142.2 GB of data was lost, which is small compared to the 278.7 TB of total data

• The data recreated per failure is small: 0.01% of the total data

Nodes failed (% of total)   Data lost, total (GB)   Regenerated, total (GB)   Regenerated, average (GB)   Regenerated, Sd (GB)
10 percent                  0                       28044.35                  28.04                       79.85
20 percent                  142.18                  58625.78                  29.31                       80.02
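The per-failure averages and the 0.01% figure can be re-derived from the table together with the simulation setup (10,000 nodes, 278.7 TB of data). The check below assumes 1 TB = 1000 GB, which the slides do not state; the function and field names are illustrative.

```python
TOTAL_NODES = 10_000             # from the simulation setup
TOTAL_DATA_GB = 278.7 * 1000     # 278.7 TB, assuming 1 TB = 1000 GB


def summarize(pct_failed: int, lost_gb: float, regenerated_gb: float) -> dict:
    """Derive per-failure and relative figures for one churn experiment."""
    failures = TOTAL_NODES * pct_failed // 100
    average = regenerated_gb / failures
    return {
        "failures": failures,
        "avg_regenerated_gb": round(average, 2),
        "avg_share_pct": round(average / TOTAL_DATA_GB * 100, 2),
        "lost_share_pct": round(lost_gb / TOTAL_DATA_GB * 100, 2),
    }


ten = summarize(10, 0.0, 28044.35)
twenty = summarize(20, 142.18, 58625.78)
print(ten)
print(twenty)
```

Under these assumptions the averages match the table (28.04 GB and 29.31 GB per failure), each failure regenerates about 0.01% of the stored data, and the data lost at 20% failures is about 0.05% of the total.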

Verification on PlanetLab

• 40 different distributed sites

• Number of failed stores reduced by 330% w.r.t. PAST and 105% w.r.t. CFS

• Storage utilization: CFS 52%, PAST 47%, PeerStripe 63%

• Online codes provided 98.6% availability through four node failures

Interfacing with Condor

• Utilize a 32-node Condor pool

• CFS and PeerStripe worked for smaller files

• DHT lookups introduced an overhead, but PeerStripe requires only a few

• Overhead for PeerStripe is small

Outline

• Preamble

• End to our Means

• Evaluation Study

• Conclusion

Conclusion

• P2P-based storage can be extended with erasure coding and striping to provide robust, scalable, and reliable distributed storage for scientific computing

• PeerStripe achieves better utilization of the collective capacity of nodes with good performance

• Error coding is effective in providing fault tolerance and data availability

• Multicast can be used for replica maintenance

• Use of an interposing library allows easy integration with new and existing applications
