Date post: 20-Dec-2015

Fault Tolerant Storage And Quorum Systems in Dynamic Environments

Uri Nadav, Master thesis

Advisor: Moni Naor

The Weizmann Institute of Science

Slide - 2

Agenda

Fault-Tolerant Storage System Fighting Censors

Quorum System for Dynamic Networks

Slide - 3

Goal

Distributed file storage system

Peer-to-peer environment

Processors join and leave the system

Partial solutions:

Distributed file-sharing applications [Gnutella, Kazaa]

Distributed Hash Tables [DH, Chord, Viceroy]: store (key, value) pairs and perform lookup on key

Slide - 4

Fault-Tolerant Storage System

Censor: aims to eliminate access to some files

Design goal:

A reader should be able to reconstruct each file with high probability after the faults have occurred

The probability is taken over the coin tosses of the writer and the reader

Slide - 5

Adversarial Model

The adversary chooses the set of processors to crash

Fail-stop failures; we do not consider Byzantine failures

Different degrees of adaptiveness:

Non-adaptive adversary: the choice of faulty processors is not based on their content

Adversary with a limited number of queries: may query some processors

Slide - 6

Other Fault Models

Random faults model:

Examples: Distance Halving DHT, Chord

Standard technique: replication to log(n) processors, which assures survival with high probability

Adversarial faults [Fiat, Saia]:

A large fraction of the files remains accessible after the adversary crashes a linear fraction of the processors

Still, a censor can target a specific file
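
The replication technique for the random faults model can be checked with a short calculation. The sketch below assumes that each processor fails independently with probability p; the independence assumption, the parameter c and the function names are illustrative and do not come from the slides.

```python
import math

def replicas_needed(n: int, p: float, c: float = 1.0) -> int:
    """Smallest r with p**r <= n**(-c), i.e. r = ceil(c * ln(n) / ln(1/p))."""
    return math.ceil(c * math.log(n) / math.log(1.0 / p))

def loss_probability(p: float, r: int) -> float:
    """Probability that all r replicas of one file fail (independent faults)."""
    return p ** r

n, p = 1024, 0.5
r = replicas_needed(n, p)
# A logarithmic number of replicas drives the loss probability down to about 1/n.
print(r, loss_probability(p, r))
```

This only covers random faults: an adversary that knows where the replicas sit can crash exactly those log(n) processors.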

Slide - 7

Measures of Quality

Read/Write complexity: average number of processors accessed during a read/write operation

Number of rounds: number of rounds required by an adaptive reader

Blowup ratio: ratio between the total number of bits used for the storage of a file and its size

Slide - 8

Quorum Systems

Formal definition: $U$ is the universe and $F \subseteq 2^U$

If $A \cap B \neq \emptyset$ for all $A, B \in F$, then $F$ is called a quorum system; $A$ and $B$ are called quorum sets

Probabilistic $\varepsilon$-intersecting quorum system [Malkhi et al]: a strategy (distribution) $w$ over $2^U$

Two sets $A, B$ drawn from the strategy $w$ intersect with probability at least $1 - \varepsilon$

A quorum system is an intersecting family of sets over some universe

The set of processors from which a file is read must intersect the set of processors to which a file was written
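
The strict intersection requirement can be tested mechanically on a small universe. The sketch below is illustrative only: the majority system is a standard textbook example and not a construction from this talk, and the function names are made up here.

```python
from itertools import combinations

def is_quorum_system(family) -> bool:
    """True if every two sets in the family intersect (so no set is empty)."""
    sets = [frozenset(s) for s in family]
    return all(a & b for a in sets for b in sets)

def majority_quorums(universe):
    """All subsets of size floor(n/2) + 1; any two of them share an element."""
    items = list(universe)
    k = len(items) // 2 + 1
    return [frozenset(c) for c in combinations(items, k)]

print(is_quorum_system(majority_quorums(range(5))))   # majorities always intersect
print(is_quorum_system([{1, 2}, {3, 4}]))             # two disjoint sets do not
```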

Slide - 9

Storage system example

The $\varepsilon$-intersecting quorum system [Malkhi et al]

The quorums are all sets of size $\Theta(\sqrt{n})$

Pick one quorum uniformly at random

Intersection follows from the birthday paradox

Storage system:

Storage: a file is replicated to all members of a quorum set

Retrieval: choose a quorum set and probe its members
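
A minimal sketch of this storage system, assuming quorums of size about l·sqrt(n) for a constant l (the constant, the in-memory `store` list and all function names are assumptions for illustration). `miss_probability` computes exactly how often two random quorums fail to intersect, which is the birthday-paradox argument in numbers.

```python
import math
import random

def quorum_size(n: int, ell: float = 2.0) -> int:
    """Quorum size ell * sqrt(n), capped at n (ell is an assumed constant)."""
    return min(n, math.ceil(ell * math.sqrt(n)))

def pick_quorum(n: int, q: int, rng: random.Random) -> set:
    """A uniformly random set of q processors out of n."""
    return set(rng.sample(range(n), q))

def miss_probability(n: int, q: int) -> float:
    """Exact probability that two independent random q-subsets are disjoint."""
    if 2 * q > n:
        return 0.0
    return math.comb(n - q, q) / math.comb(n, q)

def write(store, key, value, q, rng):
    """Replicate the file to every member of a random quorum."""
    for proc in pick_quorum(len(store), q, rng):
        store[proc][key] = value

def read(store, key, q, rng):
    """Probe a random quorum; return the file if any member holds it."""
    for proc in pick_quorum(len(store), q, rng):
        if key in store[proc]:
            return store[proc][key]
    return None

n = 10_000
q = quorum_size(n)                       # 200 processors
print(q, miss_probability(n, q))         # bounded by exp(-q*q/n) = exp(-4)
```

Both the write cost and the number of stored copies equal the quorum size, which is where the high read/write complexity and blowup ratio of this example come from.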

Slide - 10

Properties of the Probabilistic Storage System

Pros:

Simplicity

Resilient against a linear number of faults, even if the processors are chosen adaptively by the adversary

Adapted to a dynamic environment [Abraham, Malkhi]

Cons:

High read/write complexity ($\Theta(\sqrt{n})$)

High blowup ratio ($\Theta(\sqrt{n})$)

Target: come up with a storage system with better parameters

Slide - 11

The Model

Non-adaptive adversary: chooses a set of processors (of linear size) without accessing any processor first

Non-adaptive reader: the processors to read are chosen without accessing any processor first

Theorem: A fault-tolerant storage system in the non-adaptive reader model, resilient against $\Theta(n)$ faults, cannot do better than the $\varepsilon$-intersecting storage system example.

Slide - 12

Lower Bounds for the Non-Adaptive Reader Model

Lower bound on the blowup ratio

Theorem: A system which tolerates $\Theta(n)$ faults, with ----- read complexity, has

Blowup Ratio =

Lower bound on the read/write complexity

Theorem: A system which tolerates $\Theta(n)$ faults has

Read Complexity $\cdot$ Write Complexity = ----

Formal definitions of the non-adaptive storage system

Slide - 13

Slightly Adaptive Adversary Model

The reader can make adaptive queries

We wish to have a small number of rounds, to shorten the time complexity of the read operation

Slightly adaptive adversary: the adversary's queries are less adaptive than the reader's queries

Fail-stop faults, not Byzantine faults

Slide - 14

Generic Storage Scheme

Storing a file:

Encode the file using a coding scheme with a constant blowup ratio (Reed-Solomon, IDA [Rabin])

Distribute it to a set chosen by a write strategy with load ------ (optimal)

Retrieval of a file, after faults:

Find enough processors from the 'write set'

Decode the file using the coding scheme

Fault tolerance, with high probability:

The adversary does not find any element of the write set during the adaptive queries phase

At least half of the processors in a chosen write set survive

Any half of the processors in the write set can reconstruct the file

To instantiate a storage system, plug in a write strategy and a read algorithm

Load: maximal probability of a processor being chosen
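
The property "any half of the processors in the write set can reconstruct the file" is what an erasure code with blowup ratio 2 provides. The sketch below is a toy Reed-Solomon-style code over a prime field, written for illustration only: the modulus, the symbol representation (integers below P) and all names are assumptions, and it is not the encoding used in the thesis. It needs Python 3.8+ for the modular inverse `pow(x, -1, P)`.

```python
from itertools import combinations

P = 2**31 - 1  # prime modulus (illustrative); file symbols are integers in [0, P)

def interpolate_at(points, x):
    """Value at x of the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """k data symbols -> n shares (x, y); shares 1..k carry the data itself."""
    points = [(i + 1, d) for i, d in enumerate(data)]
    return [(x, interpolate_at(points, x)) for x in range(1, n + 1)]

def decode(shares, k):
    """Recover the k data symbols from any k distinct shares."""
    points = list(shares)[:k]
    return [interpolate_at(points, x) for x in range(1, k + 1)]

def any_half_decodes(data):
    """Encode with blowup 2 and check that every half of the shares decodes."""
    k = len(data)
    shares = encode(data, 2 * k)
    return all(decode(subset, k) == list(data)
               for subset in combinations(shares, k))

print(any_half_decodes([5, 17, 42, 7]))
```

Each processor in the write set would hold one share, so the total storage is twice the file size regardless of n.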

Slide - 15

Choosing a write strategy

What about the random strategy of the example?

Not a good candidate: a read algorithm that finds a constant fraction of the surviving processors requires access to $\Omega(n)$ processors

We will present a strategy with ---------- read complexity and a logarithmic number of rounds, using the And-Or tree

Slide - 16

The And-Or Tree Structure

Complete binary tree

Leaves represent processors

Inner nodes are AND/OR gates, in alternating layers

[Figure: complete binary And-Or tree over processors 1 to 16; gate layers from the root down: AND, OR, AND, OR; legend: AND/OR gate, processor]

Slide - 17

The And-Or Tree Structure

Recursive definition of the ANDset and ORset collections

Recursive procedure for selecting a set

Write strategy: pick a set from the ANDset collection uniformly at random

Intersection property: a set from the ANDset collection and a set from the ORset collection always intersect

[Figure: the And-Or tree over processors 1 to 16 with the set {1, 3, 13, 16} highlighted; legend: AND/OR gate, processor]
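
The slides do not spell out the recursive definition, so the sketch below assumes the usual one: an ANDset takes both children at an AND gate and one child at an OR gate, and an ORset does the opposite. With an AND root over 16 leaves this reproduces the set {1, 3, 13, 16} from the figure. All names are illustrative, and the number of leaves is assumed to be a power of two.

```python
import random
from itertools import product

def collect(leaves, take_both_at, gate="AND"):
    """All sets that take both children at gates of type take_both_at
    and exactly one child at gates of the other type."""
    if len(leaves) == 1:
        return [frozenset(leaves)]
    mid = len(leaves) // 2
    child_gate = "OR" if gate == "AND" else "AND"
    left = collect(leaves[:mid], take_both_at, child_gate)
    right = collect(leaves[mid:], take_both_at, child_gate)
    if gate == take_both_at:
        return [a | b for a, b in product(left, right)]
    return left + right

def and_sets(leaves):
    return collect(leaves, "AND")

def or_sets(leaves):
    return collect(leaves, "OR")

def pick_and_set(leaves, rng, gate="AND"):
    """Recursive random selection: both halves at AND, a random half at OR."""
    if len(leaves) == 1:
        return frozenset(leaves)
    mid = len(leaves) // 2
    child_gate = "OR" if gate == "AND" else "AND"
    halves = (leaves[:mid], leaves[mid:])
    if gate == "AND":
        return (pick_and_set(halves[0], rng, child_gate)
                | pick_and_set(halves[1], rng, child_gate))
    return pick_and_set(rng.choice(halves), rng, child_gate)

procs = list(range(1, 17))
A, O = and_sets(procs), or_sets(procs)
# Every ANDset meets every ORset, although each has only sqrt(16) = 4 members.
print(len(A), len(O), all(a & o for a in A for o in O))
```

Because the tree is complete, choosing a uniformly random child at each OR gate gives a uniformly random ANDset.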

Slide - 18

Adaptive Read Algorithm

Pick a set from the ORset collection to find an element from the write set

[Figure: processors 1 to 16 with the write set marked and the set {1, 2, 5, 6} highlighted; legend: AND/OR gate, processor]

Slide - 19

Read Algorithm - Pruning the Tree…

To find the remaining items, the algorithm is applied recursively to the remaining subtrees

Total of processor-accesses during the read algorithm

[Figure: the pruned tree over processors 1 to 16 with the write set marked; legend: AND/OR gate, processor]

Slide - 20

Properties of the And-Or Storage System

Constant blowup ratio, write complexity and ---------- read complexity

Logarithmic number of rounds

Resilient against $\Theta(n)$ faults of a slightly adaptive adversary

Cannot expect anything much better in terms of read/write tradeoff!

Slide - 21

Early Stopping When Fewer Faults Occur

Drawback: the read complexity is high even when no faults occur

Dynamic read complexity: when up to t faults occur, the read complexity is a function of t

The price: a logarithmic instead of a constant blowup ratio

Slide - 22

Dynamically Adjusting to the Number of Faults

[Figure: And-Or tree with alternating AND and OR layers]

Every node represents a processor, not only the leaves

Redefine the ANDset collection so that each set includes all the visited nodes

The size of a set in the collection remains ------

Slide - 23

Where do we stand?

And-Or for a static network:

Ignored the routing scheme

Next: dynamic environment

Adaptation of the And-Or storage system

Storage coupled with the routing

Use the distance-halving network [Naor-Wieder]

Slide - 24

Dynamic Hash Tables

The continuous space is partitioned locally (on the fly) into cells corresponding to processors

Each point in [0,1) is covered by exactly one processor

[Figure: the interval [0,1) partitioned into cells]
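
One simple way to realize "each point is covered by exactly one processor" is sketched below. It assumes that every processor is identified by a point in [0,1) and covers the cell from its own point up to the next processor's point, with the last cell wrapping around; this cell rule and the function name are assumptions for illustration.

```python
from bisect import bisect_right

def covering_processor(ids, point):
    """Index of the processor whose cell [ids[i], ids[i+1]) contains point.

    ids must be sorted; the cell of the last processor wraps around 1 -> 0.
    """
    return (bisect_right(ids, point) - 1) % len(ids)

ids = [0.1, 0.4, 0.7]
print([covering_processor(ids, x) for x in (0.05, 0.1, 0.5, 0.99)])
```

A join or leave only changes the cells next to the affected point, which is what makes the partition local.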

Slide - 25

The Distance Halving Network [Naor, Wieder]

[Figure: the interval [0,1) with a point x]

Continuous graph:

Nodes: the points of the interval [0,1)

Edges: left and right outgoing edges from each point

Each point is the root of a binary tree subgraph
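
The slides do not give the edge maps. The sketch below assumes the maps commonly stated for the distance-halving graph, left(x) = x/2 and right(x) = x/2 + 1/2; treat them as an assumption. Exact fractions are used so that the spacing of the tree nodes can be checked without rounding.

```python
from fractions import Fraction

def left(x):
    """Assumed left edge of the continuous graph: x -> x/2."""
    return x / 2

def right(x):
    """Assumed right edge of the continuous graph: x -> x/2 + 1/2."""
    return x / 2 + Fraction(1, 2)

def tree_level(root, depth):
    """Sorted points at the given depth of the binary tree rooted at root."""
    level = [Fraction(root)]
    for _ in range(depth):
        level = [child for x in level for child in (left(x), right(x))]
    return sorted(level)

leaves = tree_level(Fraction(1, 3), 3)
gaps = {b - a for a, b in zip(leaves, leaves[1:])}
print(len(leaves), gaps)   # 8 points, all spaced exactly 1/8 apart
```

Under these maps the points at depth d are evenly spaced at distance 2^-d, so at depth log n they spread over all of [0,1).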

Slide - 26

The Distance Halving Network [Naor, Wieder]

Connect two processors if their respective cells are connected in the continuous graph


Slide - 27

Embedding the Storage System

The binary tree:

A subgraph of the continuous graph

Well defined for each point

Depth log n

Edges are covered by network connections

Gossip protocols

Each file has a different tree

Slide - 28

Storage Through Gossip

Data percolates along DH edges for log(n) steps

After a single write operation the writer is done, and the file can already be retrieved

Fault tolerance is built during the gossip: when messages reach the nodes in the $i$-th level, the file is $\Theta(2^i)$ fault-tolerant

Slide - 29

Retrieval

Uses the routing protocol of the DH-network

Routing dilation is $O(\log n)$

Total time for retrieval is $O(\log^2 n)$

Read complexity can be dynamically adjusted to the number of faults

Store in every processor visited
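
The retrieval time is consistent with the following accounting, under the assumption that each of the logarithmically many rounds of the adaptive read costs one routing operation in the DH-network:

```latex
T_{\text{retrieval}}
  \;=\; \underbrace{O(\log n)}_{\text{rounds of the adaptive read}}
  \;\cdot\; \underbrace{O(\log n)}_{\text{routing dilation per round}}
  \;=\; O(\log^2 n)
```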

Slide - 30

Fault-Tolerance

Balanced network: a processor covers a segment of size O(1/n)

Various balancing techniques (Manku, Karger and Ruhl, Naor and Wieder, Abraham et al)

Theorem: When the network is balanced, the system is $\Theta(n)$ fault-tolerant

Slide - 31

Open Questions

Do the lower bounds shown when both the reader and the adversary are non-adaptive hold when both are adaptive?

Is there a fault-tolerant storage system in the adaptive reader model with o(log(n)) rounds?

Slide - 32

Summary

The probabilistic solution is optimal in the non-adaptive reader model

The And-Or storage system:

Constant blowup ratio

Almost optimal read/write complexity

Adaptation of the storage system to a dynamic environment:

Storage uses the network topology

When the system is balanced it maintains fault tolerance

Slide - 33

Agenda

Fault-Tolerant Storage System Fighting Censors

The And-Or Quorum System

Static case

Dynamic networks

Quorum systems are important beyond their application in storage (mutual exclusion, load balancing, access control…)

Slide - 34

Measures of Quality

Load: the load of a strategy is the maximal probability of a processor being chosen; the load of the system is the minimum over all strategies

Availability: measured by the probability that all quorums are hit under random faults

Probe complexity: number of probes required to obtain a live quorum w.h.p.
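
The load of a given strategy is easy to compute for a small system. In the sketch below a strategy is represented as a dictionary from quorums to probabilities; that representation and the three-processor majority example are illustrative assumptions.

```python
from fractions import Fraction

def load(strategy):
    """Maximal probability, over processors, of being in the chosen quorum.

    strategy maps each quorum (a frozenset of processors) to its probability.
    """
    hit = {}
    for quorum, prob in strategy.items():
        for proc in quorum:
            hit[proc] = hit.get(proc, 0) + prob
    return max(hit.values())

third = Fraction(1, 3)
uniform_majority = {
    frozenset({0, 1}): third,
    frozenset({0, 2}): third,
    frozenset({1, 2}): third,
}
print(load(uniform_majority))   # each processor is in two of the three quorums
```

The load of the quorum system itself is the smallest value of `load` over all strategies, so a single strategy only gives an upper bound.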

Slide - 35

The And-Or Quorum System

Known properties [Naor, Wool]: optimal load, high availability

Our contribution:

Static network: optimal non-adaptive algorithm, optimal adaptive algorithm

Construction in a dynamic network

Slide - 36

Non Adaptive Algorithm

Probes

Matches a lower bound [Naor, Wieder]

[Figure label: 2 log log n]

Slide - 37

Adaptive Algorithm

Probe complexity

Run in parallel

2loglogn rounds

Local Adjustments

Slide - 38

Dynamic Quorum System

The universe constantly changes

Two challenges:

Integrity: the intersection property; combinatorial structure and properties

Locality: a local way to access a quorum

Slide - 39

Dynamic And-Or

Embedding of a binary tree:

DH-graph: left and right children

Define a tree on each point

The leaves divide [0,1) equally

A quorum of processors is the set of processors that covers the points in a quorum

Slide - 40

Dynamic And-Or

Locality: natural gossip protocol

Integrity: when the network grows or shrinks, members of quorums gossip themselves to their children/parent in the continuous graph

Network connections cover the edges in the continuous graph

Slide - 41

Load

A processor is chosen when leaves it covers are chosen

Optimal load on the leaves

Balanced network

Induced optimal load on the processors

Slide - 42

Availability of the Dynamic Quorum

Static case:

The global failure probability decays exponentially when each processor fails with probability < 0.25

Dynamic case:

Problem in the analysis: faults are not independent

When the network is balanced, two leaves are dependent only if they are covered by the same processor

Constant number of dependent faults

Domination by a product measure
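
For the static case with independent faults, the global failure probability can be computed exactly by recursing over the gates. The sketch below makes several assumptions that are not stated on the slides: the root is an AND gate, layers alternate, a quorum is an ANDset (both children at AND gates, one child at OR gates), and leaves fail independently with probability p. It illustrates the decay for one sample value of p below 0.25 and does not derive the 0.25 threshold.

```python
def failure_probability(p: float, height: int) -> float:
    """Probability that no fully live quorum exists in a tree with 2**height leaves."""
    def fail(h: int, gate: str) -> float:
        if h == 0:
            return p                      # a leaf fails with probability p
        child_gate = "OR" if gate == "AND" else "AND"
        f = fail(h - 1, child_gate)       # both subtrees are i.i.d.
        if gate == "AND":
            return 1 - (1 - f) ** 2       # fails if either subtree fails
        return f * f                      # fails only if both subtrees fail
    return fail(height, "AND")

for height in (4, 8, 12):
    print(2 ** height, failure_probability(0.2, height))
```

In the dynamic case this recursion no longer applies directly, because leaves covered by the same processor fail together.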

Slide - 43

Domination by a Product Measure

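Finite set $S$

Space of configurations: $\Omega = \{0,1\}^S$

Partial order on $\Omega$: for $\omega_1, \omega_2 \in \Omega$,

$\omega_1 \geq \omega_2$ if $\forall s \in S$, $\omega_1(s) \geq \omega_2(s)$

A function $f$ is increasing if

$\omega_1 \geq \omega_2 \Rightarrow f(\omega_1) \geq f(\omega_2)$

Product measure $\pi_p$:

$\forall s \in S$, $\Pr[\omega(s) = 0] = p$

$\omega(s)$ is independent of all others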

Slide - 44

Domination by a Product Measure

$\mu$, $\nu$: probability measures on $\Omega$

$\mu$ dominates $\nu$ ($\nu \preceq \mu$) if for every increasing $f$:

$E_\nu(f) \leq E_\mu(f)$

[Liggett et al]: If $\forall s \in S$, $\Pr_\mu[\omega(s) = 0] < p$, and this event is dependent on at most $k$ other such events (where $k$ is a constant), then

$\exists\, p' < p$ such that $\pi_{p'} \preceq \mu$

By decreasing $p$, $p'$ can be made arbitrarily close to 0

Slide - 45

Availability of the Dynamic And-Or

$S$ is the set of leaves; $\Omega$ is the set of configurations

$\mu$: the probability measure induced by random faults on processors

Balanced network: limited independence

$\mu$ dominates a product measure $\pi_{p'}$

When $p' < 0.25$, $F_{p'} \leq O(\exp(-n^{0.5}))$

Slide - 46

Probe Complexity of the Dynamic And-Or

Nonadaptive

Subtrees are not independent

Positively correlated

Adaptive

Expected constant height for local subtrees

Expected number of probes

Markov: optimal probe complexity with probability 1 - o(1)

Slide - 47

Other Dynamic Quorum Systems

Dynamic Probabilistic QS [Abraham, Malkhi]:

Random walk

Very high availability, for arbitrary failure probability

Higher load

Dynamic Paths [Naor, Wieder]:

Emulates the Paths quorum system

Voronoi diagram

High availability, for failure probability < 0.5

Slower adaptive algorithm

Slide - 48

Summary

Non-adaptive and adaptive algorithms for the And-Or system: optimal

Adaptive case: excellent time complexity

Adaptation over a dynamic overlay network:

Optimal load and probe complexity, and high availability

Domination by a product measure

Slide - 49

Open Questions (on Quorum Systems)

Lower bound to the adaptive algorithmic probe complexity

Better analysis of the adaptive algorithm for dynamic network

Slide - 50

Elementary Storage System

Write strategy $w$: a distribution on $\mathbb{N}^n$

Encoder: $E(f, q_w) \mapsto (x_1, \ldots, x_n)$, with $q_w$ chosen by $w$

Read strategy $r$: a distribution on $\{0,1\}^n$

Decoder: $D(x_1, \ldots, x_n) \mapsto \{0,1\}^k$

Slide - 51

Reconstruction

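Decoding a previously encoded file: $D(\pi(E(f, q_w), q_r)) = f$

$(\varepsilon, k)$-storage system: $\forall f \in \{0,1\}^k$, $\Pr[D(\pi(E(f, q_w), q_r)) = f] > 1 - \varepsilon$

$q_w$, $q_r$ are chosen from $w$, $r$

$\pi$ is a projection that masks the unread processors: $\pi((x_1, x_2, x_3, x_4), (1,1,0,1)) = (x_1, x_2, \bot, x_4)$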
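
The masking projection is simple to state in code. The sketch below uses `None` for a masked coordinate; that choice and the function name are illustrative.

```python
def project(x, q):
    """Keep x[i] where the read vector q has a 1; mask it where q has a 0."""
    return tuple(xi if qi else None for xi, qi in zip(x, q))

print(project(("x1", "x2", "x3", "x4"), (1, 1, 0, 1)))
```

The decoder therefore only ever sees the contents of the processors that the read strategy selected.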

Slide - 52

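$(\varepsilon, k)$-Intersection Property

Write strategy $w$, read strategy $r$

The pair $(w, r)$ satisfies the $(\varepsilon, k)$-intersection property if

$\Pr[\langle q_w, q_r \rangle > k] > 1 - \varepsilon$

where $\langle q_w, q_r \rangle$ is the number of bits read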
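
For small finite strategies the intersection probability can be computed exactly. The sketch below reads a write vector as the number of bits stored on each processor and a read vector as a 0/1 mask, so their inner product counts the bits read; this reading, the list-of-pairs representation of a strategy and the toy numbers are assumptions for illustration.

```python
def bits_read(q_w, q_r):
    """Inner product: bits stored on exactly the processors that are read."""
    return sum(w * r for w, r in zip(q_w, q_r))

def intersection_probability(w, r, k):
    """Pr[<q_w, q_r> > k] for strategies given as lists of (vector, probability)."""
    return sum(pw * pr
               for q_w, pw in w
               for q_r, pr in r
               if bits_read(q_w, q_r) > k)

w = [((2, 2, 0, 0), 0.5), ((0, 0, 2, 2), 0.5)]   # where the writer stores bits
r = [((1, 0, 1, 0), 0.5), ((0, 1, 0, 1), 0.5)]   # which processors are read
print(intersection_probability(w, r, 1))          # every pair shares 2 bits
```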

Slide - 53

Storage System Characterization

Theorem: Let $S = (w, E, r, D)$ be an $(\varepsilon, k)$-storage system. Then $w, r$ maintain the $(2\varepsilon, k)$-intersection property.

Slide - 54

Error Correcting Codes

View storage-system as coding scheme:

Message: files concatenated

Codeword: Processors’ memories concatenated

Worst case Faults-Model

Adversary “knows” the content of each processor

Slide - 55

Locally Decodable Codes

Decode a single symbol, instead of the whole message

No need to read all of the codeword(?)

Rates:

No code of linear length for a constant number of queries [Katz, Trevisan]

Exponential lower bound for 2 queries [Goldreich et al], [Wolf, Ker']

Linear rate for a polynomial number of queries

Multivariate code [Reed-Muller]

Slide - 56

Balanced And-Or Tree

Family of trees Constant difference

Optimal Load

Good Availability

Different constants

Useful for the dynamic case

