DHTs
Distributed Hash Tables (DHTs)
• Abstraction: a distributed hash-table data structure
  - insert(id, item)
  - item = query(id)  (or lookup(id))
  - Note: item can be anything: a data object, document, file, pointer to a file…
• Proposals
  - CAN, Chord, Kademlia, Pastry, Tapestry, etc.
Structured Networks
• Distributed Hash Tables (DHTs)
• Hash table interface: put(key, item), get(key)
• O(log n) hops
[Figure: structured overlay of nodes, each holding (key, item) pairs; put(K1, I1) routes item I1 to the node responsible for K1, and a later get(K1) from any node returns I1.]
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Ion Stoica, Robert Morris, David Karger,
M. Frans Kaashoek, Hari Balakrishnan
MIT and Berkeley
Chord IDs
• Key identifier = SHA-1(key)
• Node identifier = SHA-1(IP address)
• Both are uniformly distributed
• Both exist in the same ID space
• How to map key IDs to node IDs?
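As a concrete illustration of the two hash computations above, here is a minimal Python sketch; the helper name and the m = 7 ring size (matching the figures) are illustrative choices, not from the paper:

```python
# Sketch: deriving Chord IDs with SHA-1. The helper name and the
# m = 7 ring size are illustrative, not from the paper's code.
import hashlib

M = 7  # m-bit identifier space (matches the 7-bit ring in the figures)

def chord_id(data: str, m: int = M) -> int:
    """Hash a key or an IP address into the same m-bit ID space."""
    digest = hashlib.sha1(data.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

key_id = chord_id("song.mp3")      # key identifier = SHA-1(key)
node_id = chord_id("18.26.4.9")    # node identifier = SHA-1(IP address)
```

Because SHA-1 output is close to uniform, both kinds of IDs spread evenly over the circle.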
Consistent hashing [Karger 97]
[Figure: circular 7-bit ID space with nodes N32, N90, N105 ("Node 105") and keys K5 ("Key 5"), K20, K80 placed on the circle.]
A key is stored at its successor: node with next higher ID
• N: number of nodes in the system
• m: each node is assigned an m-bit identifier
• K: number of keys stored in the system
• Identifier circle: arithmetic modulo 2^m
• “Load balancing”: uniform distribution of keys to nodes
  - Roughly K/N keys change hands for each join/leave.
• Discussion:
  - What if nodes have heterogeneous capabilities?
  - What if keys have varying degrees of popularity?
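The successor rule can be sketched in a few lines of Python; the node IDs mirror the example ring from the consistent-hashing figure (a local toy model, not the distributed protocol):

```python
# Sketch of consistent hashing: a key lives at its successor, the node
# with the next-higher ID, wrapping around mod 2^m. Node IDs mirror the
# example ring (N32, N90, N105); this is a toy model, not a protocol.
import bisect

M = 7
nodes = sorted([32, 90, 105])

def successor(key_id: int) -> int:
    """Return the node responsible for key_id."""
    i = bisect.bisect_left(nodes, key_id % (2 ** M))
    return nodes[i % len(nodes)]   # wrap past the highest node to the lowest

assert successor(80) == 90    # K80 is stored at N90
assert successor(110) == 32   # wraps around the circle
```

Adding or removing one entry in `nodes` only remaps the keys in one arc of the circle, which is the "roughly K/N keys change hands" point above.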
Basic lookup
[Figure: N10 asks “Where is key 80?”; the query follows successor pointers through N32 and N60 until N90 answers “N90 has K80”. Ring: N10, N32, N60, N90, N105, N120.]
Simple lookup algorithm
Lookup(my-id, key-id)
  n = my successor
  if my-id < n < key-id
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done

• Correctness depends only on successors
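The algorithm above can be simulated locally; a modular interval test makes the circular comparison "my-id < n < key-id" precise (node IDs and helper names are illustrative):

```python
# Local simulation of successor-only lookup. The interval test handles
# wrap-around on the 2^m circle; node IDs are an illustrative toy ring.
M = 7
RING = 2 ** M
nodes = sorted([10, 32, 60, 90, 105])
succ = {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}

def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the circular interval (a, b]."""
    return 0 < (x - a) % RING <= (b - a) % RING

def lookup(my_id: int, key_id: int) -> int:
    while not in_interval(key_id, my_id, succ[my_id]):
        my_id = succ[my_id]        # next hop: forward to my successor
    return succ[my_id]             # done: key lies in (my_id, successor]

assert lookup(10, 80) == 90        # "N90 has K80", as in the earlier figure
```

Each hop advances exactly one position on the ring, so this version needs O(N) hops in the worst case, which is what the finger table fixes next.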
“Finger table” allows log(N)-time lookups
[Figure: node N80’s fingers partition the ring into intervals of ½, ¼, ⅛, 1/16, 1/32, 1/64, and 1/128 of the ID space.]
Finger i points to successor of n + 2^i
[Figure: same finger structure; e.g. N80’s finger for n + 2^5 = 112 points to N120, the successor of 112.]
Lookup with fingers
Lookup(my-id, key-id)
  look in local finger table for
    highest node n s.t. my-id < n < key-id
  if n exists
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done
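A toy version of the finger-based lookup, building on the same illustrative ring (the `succ_of` helper and finger construction are my scaffolding, not the paper's code):

```python
# Toy finger-table lookup: finger i of node n points to the successor of
# n + 2^i, so each hop roughly halves the remaining distance. Node IDs
# and helper names are illustrative.
import bisect

M = 7
RING = 2 ** M
nodes = sorted([10, 32, 60, 90, 105])

def succ_of(x: int) -> int:
    i = bisect.bisect_left(nodes, x % RING)
    return nodes[i % len(nodes)]

def in_open_interval(x: int, a: int, b: int) -> bool:
    return 0 < (x - a) % RING < (b - a) % RING

# finger[i] of node n = successor of (n + 2^i)
fingers = {n: [succ_of(n + 2 ** i) for i in range(M)] for n in nodes}

def lookup(my_id: int, key_id: int) -> int:
    # highest finger n such that my-id < n < key-id (circularly)
    for f in reversed(fingers[my_id]):
        if in_open_interval(f, my_id, key_id):
            return lookup(f, key_id)     # next hop
    return fingers[my_id][0]             # done: finger 0 is my successor

assert lookup(32, 19) == succ_of(19)     # reaches K19's successor
```

Choosing the highest qualifying finger is what halves the distance to the key at every hop, giving the O(log N) bound on the next slide.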
Lookups take O(log N) hops
[Figure: Lookup(K19) hops across the ring via fingers, halving the remaining distance each time, until it reaches N20, the successor of K19. Ring: N5, N10, N20, N32, N60, N80, N99, N110.]
Discussion
• “Structure”:
  - Neighbor relations imposed by ID
  - Performance not considered
  - Potentially log N hops, but each hop could be really inefficient.
• Debate:
  - Can “structured” protocols result in good performance?
• Variants (e.g. Pastry):
  - Slightly “relax” the structure
  - A neighbor must be chosen from among a “set of possibilities”.
  - Use performance to choose among them.
Joining: linked list insert
[Figure: new node N36 joins a ring containing N25 and N40, where keys K30 and K38 are stored at N40. Step 1: N36 runs Lookup(36) to find its successor.]
Join (2)
[Figure: step 2: N36 sets its own successor pointer to N40.]
Join (3)
[Figure: step 3: keys 26..36 (here K30) are copied from N40 to N36.]
Join (4)
[Figure: step 4: N25’s successor pointer is set to N36; K30 now lives at N36, K38 at N40.]
• Update finger pointers in the background.
• Correct successors produce correct lookups.
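The four join steps read like a linked-list insert; a local sketch of the sequence (the single-map ring state and RPC-free style are simplifications of the real protocol):

```python
# Local sketch of the join sequence: find the successor, set the new
# node's successor pointer, copy the keys it now owns, then fix the
# predecessor's pointer. A toy stand-in for the RPC-based protocol.
M = 7
RING = 2 ** M
successor = {25: 40, 40: 25}        # two-node ring: N25 -> N40 -> N25
keys = {25: set(), 40: {30, 38}}    # K30 and K38 start at N40

def between(x, a, b):               # x in circular interval (a, b]
    return 0 < (x - a) % RING <= (b - a) % RING

def join(n, existing):
    s = existing                    # 1. Lookup(n): walk to n's predecessor
    while not between(n, s, successor[s]):
        s = successor[s]
    successor[n] = successor[s]     # 2. set the new node's successor pointer
    keys[n] = {k for k in keys[successor[n]] if between(k, s, n)}
    keys[successor[n]] -= keys[n]   # 3. copy keys in (predecessor, n] over
    successor[s] = n                # 4. set the predecessor's successor pointer

join(36, existing=25)
assert successor[25] == 36 and successor[36] == 40
assert keys[36] == {30} and keys[40] == {38}   # K30 moved, K38 stayed
```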
Join (finger pointers)
• Node n contacts a “bootstrap node” b
• Node n learns its predecessor and fingers by asking b to look them up.
  - O(log² N) messages
  - Practical optimizations reduce this further.
• Need to update the finger tables of existing nodes to point to n.
  - p = find_predecessor(n - 2^(i-1))
  - FindPredecessor(id):
    • Contact a node x towards the id
    • If id lies between x and succ(x), return x
    • Else ask x for the node it knows about that most closely precedes id
Failures might cause incorrect lookup
[Figure: ring N10, N80, N85, N102, N113, N120; several of N80’s immediate successors have failed, so N80 doesn’t know its correct successor and Lookup(90) returns an incorrect result.]
Solution: successor lists
• Each node knows its r immediate successors
• After a failure, a node will know its first live successor
• Correct successors guarantee correct lookups
• The guarantee holds with some probability
• With O(log N) successors, the probability of a ring partition is small.
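A minimal sketch of the successor-list idea (r = 3 and the node set are illustrative choices):

```python
# Sketch: each node keeps r immediate successors, so after a failure it
# can fall back to the first live one. Node set and r are illustrative.
nodes = sorted([10, 32, 60, 90, 105])
r = 3

def successor_list(n):
    i = nodes.index(n)
    return [nodes[(i + j) % len(nodes)] for j in range(1, r + 1)]

def first_live_successor(n, alive):
    for s in successor_list(n):
        if s in alive:
            return s
    return None        # all r successors dead: the ring may partition

assert first_live_successor(32, alive={10, 32, 90, 105}) == 90  # N60 failed
```

Lookups stay correct as long as at least one entry in each node's list is alive, which is why r = O(log N) makes a partition unlikely.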
Concurrent Operations and Failures
• What’s discussed:
  - Simplistic scenarios: individual joins and leaves.
  - How about concurrent failures, churn, transient state?
• Paper argument:
  - Maintain correctness of the ring.
  - It’s OK to have inaccuracies in the finger table (a performance issue only).
• “Stabilization” primitive:
  - If node n runs stabilize, it asks its successor s for s’s predecessor p.
  - n decides whether p should be its successor instead.
• Does not discuss failures leading to a “partition” of the ring
• Finger-pointer updates are not extensively discussed.
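The stabilize primitive can be sketched against the earlier join example, where N36 has joined but N25 still points at N40 (the ring state and the simplified notify step are my assumptions, not the paper's pseudocode):

```python
# Sketch of stabilize: n asks its successor s for s's predecessor p; if p
# has slipped in between, n adopts p as its successor and notifies it.
# Ring state and the simplified notify step are illustrative.
M = 7
RING = 2 ** M
successor = {25: 40, 36: 40, 40: 25}    # N36 joined; N25 still points at N40
predecessor = {25: 40, 36: 25, 40: 36}

def between(x, a, b):                   # x in circular interval (a, b)
    return 0 < (x - a) % RING < (b - a) % RING

def stabilize(n):
    s = successor[n]
    p = predecessor[s]                  # ask successor for its predecessor
    if between(p, n, s):
        successor[n] = p                # p should be n's successor instead
    predecessor[successor[n]] = n       # notify (real Chord checks first)

stabilize(25)
assert successor[25] == 36              # the ring pointer is now correct
```

Running stabilize periodically at every node is what repairs successor pointers after concurrent joins, keeping the ring correct even while fingers are stale.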
Other Notable DHTs
• Concurrently developed with Chord:
  - CAN (Ratnasamy, Francis, Handley, Karp, Shenker)
  - Pastry (Rowstron, Druschel)
  - Tapestry (Zhao, Kubiatowicz, Joseph)
• Later:
  - Kademlia (Maymounkov, Mazières)
Discussion
• Churn and concurrent failures are tricky in general
• The paper, and DHT work in general, seems superficial about them (some follow-on work exists).
• “Ring correctness”: what if NATs exist?
Data Replication
• If a node leaves, its data is moved to a nearby node.
• What about abrupt failure?
• Paper: replicate data on the r successors.
  - Follow-up work looked at replication.
DHTs: Discussion
• Structured vs. unstructured search.
  - Structure helps, but what if most searches are for popular files?
• Lots of practical issues are tricky with structured protocols:
  - Handling node heterogeneity (Ethernet vs. DSL)
  - Handling NATs and firewalls
  - High churn
  - Performance vs. structure in neighbor selection
  - Load imbalance due to differences in file popularity
  - Keyword search / fuzzy keyword search
• Lots of work since on:
  - Resolving some of these issues
  - Comparing structured and unstructured protocols
  - Hybrid schemes that combine structured and unstructured approaches.
DHTs: Discussion
• DHTs evolved as a nice algorithm for search/lookup.
• Since then, the DHT camp argues it is a generic substrate
  - Multicast, anycast, file storage, etc. (e.g. the i3 paper, SIGCOMM ’02)
• Panacea / generic substrate, or a “solution looking for problems”?
• DHTs in the real world:
  - Have primarily seen success in lookup/search.
  - Deployed in Kademlia and recent versions of BitTorrent.