
Distributed Hash Tables

A lecture on Distributed Hash Tables.
Transcript
Page 1: Distributed Hash tables

DHTs


Page 2: Distributed Hash tables

Distributed Hash Tables (DHTs)

• Abstraction: a distributed hash-table data structure (see the interface sketch below).
- insert(id, item)
- item = query(id) (or lookup(id))
- Note: item can be anything: a data object, document, file, pointer to a file…

• Proposals: CAN, Chord, Kademlia, Pastry, Tapestry, etc.
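A minimal sketch of the abstraction in Python, runnable in a single process. The ToyDHT class and its naive hash-mod placement are illustrative assumptions, not any specific proposal; the systems named above replace this placement with consistent hashing and structured routing.

import hashlib

class ToyDHT:
    """Single-process stand-in: n 'nodes' partition the ID space,
    and insert/query route to the responsible node by hashing the id."""

    def __init__(self, n_nodes=4):
        self.tables = [dict() for _ in range(n_nodes)]  # one table per node

    def _node_for(self, id):
        h = int.from_bytes(hashlib.sha1(str(id).encode()).digest(), "big")
        return self.tables[h % len(self.tables)]        # responsible node

    def insert(self, id, item):
        self._node_for(id)[id] = item

    def query(self, id):
        return self._node_for(id).get(id)

dht = ToyDHT()
dht.insert("song.mp3", "http://example.org/song.mp3")   # item = a pointer
print(dht.query("song.mp3"))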

Page 3: Distributed Hash tables

Structured Networks

• Distributed Hash Tables (DHTs)
• Hash table interface: put(key, item), get(key)
• O(log n) hops

[Figure: a structured overlay of nodes, each storing a slice of the (key, item) table; put(K1, I1) routes the pair to the node responsible for K1, and get(K1) routes to that same node and returns I1.]

Page 4: Distributed Hash tables

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

Robert Morris, Ion Stoica, David Karger,
M. Frans Kaashoek, Hari Balakrishnan

MIT and Berkeley

Page 5: Distributed Hash tables

Chord IDs

• Key identifier = SHA-1(key)
• Node identifier = SHA-1(IP address)
• Both are uniformly distributed
• Both exist in the same ID space

• How do we map key IDs to node IDs?
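A minimal sketch of the two hash computations in Python; the example key and IP address are made up.

import hashlib

def sha1_id(data: str) -> int:
    """160-bit identifier as an integer; same function for keys and nodes."""
    return int.from_bytes(hashlib.sha1(data.encode()).digest(), "big")

key_id = sha1_id("song.mp3")     # key identifier = SHA-1(key)
node_id = sha1_id("18.26.4.9")   # node identifier = SHA-1(IP address)
# Both land in the same [0, 2^160) ID space, so they are comparable.
print(key_id < node_id)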

Page 6: Distributed Hash tables

Consistent hashing [Karger 97]

[Figure: a circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80 placed on the ring. Legend: K5 = key 5, N105 = node 105.]

A key is stored at its successor: node with next higher ID
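A sketch of the successor rule in Python; the node IDs come from the figure's 7-bit ring, and bisect finds the first node ID at or after the key, wrapping past zero.

import bisect

def successor(node_ids, key_id):
    """The node that stores key_id: first node ID >= key_id, with wrap."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id)
    return ring[i % len(ring)]    # past the largest node ID, wrap around

ring = [32, 90, 105]              # N32, N90, N105 from the figure
print(successor(ring, 80))        # K80 -> 90: stored at N90
print(successor(ring, 20))        # K20 -> 32: stored at N32
print(successor(ring, 120))       # wraps around the circle -> N32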

Page 7: Distributed Hash tables

• N: number of nodes in the system
• m: each node is assigned an m-bit identifier
• K: number of keys stored in the system
• Identifier circle: arithmetic is modulo 2^m

• "Load balancing": uniform distribution of keys to nodes.
- Roughly K/N keys change hands for each join/leave (see the sketch after this list).

• Discussion:
- What if nodes have heterogeneous capabilities?
- What if keys have varying degrees of popularity?
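To make the roughly-K/N claim concrete, here is a toy simulation (an illustrative sketch, not from the paper): hash 10,000 keys onto a ring of 50 nodes, add one node, and count how many keys change hands.

import hashlib

def h(s, bits=16):
    """Toy 16-bit ID space for readability."""
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % 2**bits

def owner(ring, k):
    """Successor rule: first node ID >= k, wrapping past zero."""
    return min((n for n in ring if n >= k), default=min(ring))

ring = {h(f"node{i}") for i in range(50)}        # N = 50
keys = [h(f"key{i}") for i in range(10_000)]     # K = 10,000
before = {k: owner(ring, k) for k in keys}

ring.add(h("node50"))                            # a single join
moved = sum(owner(ring, k) != before[k] for k in keys)
print(moved, "keys moved; K/N is about", 10_000 // len(ring))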

Page 8: Distributed Hash tables

Basic lookup

[Figure: on a ring with N10, N32, N60, N90, N105, N120, the query "Where is key 80?" is forwarded node to node via successor pointers until it reaches K80's successor, which answers "N90 has K80".]

Page 9: Distributed Hash tables

Simple lookup algorithm

Lookup(my-id, key-id)
  n = my successor
  if my-id < n < key-id                  // n still precedes the key (circularly)
    call Lookup(n, key-id) on node n     // next hop
  else
    return my successor                  // done

• Correctness depends only on successors.
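The same algorithm as a runnable Python sketch on a toy ring (node IDs from the earlier figure). The between helper makes the circular interval test explicit, which the slide's my-id < n < key-id notation glosses over.

def between(x, a, b):
    """True if x lies in the circular half-open interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def simple_lookup(succ, my_id, key_id):
    """Walk successor pointers until we pass the key."""
    n = succ[my_id]
    if between(key_id, my_id, n):
        return n                              # done: n stores key_id
    return simple_lookup(succ, n, key_id)     # next hop

# Successor pointers for the ring N10, N32, N60, N90, N105, N120:
succ = {10: 32, 32: 60, 60: 90, 90: 105, 105: 120, 120: 10}
print(simple_lookup(succ, 10, 80))            # "Where is key 80?" -> 90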

Page 10: Distributed Hash tables

“Finger table” allows log(N)-time lookups

[Figure: N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.]

Page 11: Distributed Hash tables

Finger i points to the successor of n + 2^i

[Figure: the same fingers; each one points to the successor of the ID it lands on, e.g. the finger shown for N80 points to N120, the first node at or after that point.]
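A sketch of finger-table construction in Python, following the slides' convention that finger i points to the successor of n + 2^i for i = 0..m-1 (the Chord paper indexes fingers from 1 and uses n + 2^(i-1)); IDs live in the figures' 7-bit space.

M = 7   # 7-bit toy ID space, as in the figures

def ring_successor(node_ids, x):
    """First node ID at or after x, wrapping past 2^M."""
    ring = sorted(node_ids)
    return next((n for n in ring if n >= x), ring[0])

def finger_table(n, node_ids):
    """finger[i] = successor((n + 2^i) mod 2^M) for i = 0..M-1."""
    return [ring_successor(node_ids, (n + 2**i) % 2**M) for i in range(M)]

nodes = [10, 32, 60, 90, 105, 120]
print(finger_table(80, nodes))   # -> [90, 90, 90, 90, 105, 120, 32]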

Page 12: Distributed Hash tables

Lookup with fingers

Lookup(my-id, key-id)
  look in local finger table for
    highest node n s.t. my-id < n < key-id
  if n exists
    call Lookup(n, key-id) on node n     // next hop
  else
    return my successor                  // done
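Combining the earlier sketches (between, succ, finger_table) gives a runnable toy version; it assumes every node's state is available in-process, which a real deployment obviously lacks.

def finger_lookup(fingers, succ, my_id, key_id):
    """Greedy Chord-style routing: take the largest finger that still
    precedes the key, falling back to the plain successor."""
    n = succ[my_id]
    if between(key_id, my_id, n):
        return n                                            # done
    for f in reversed(fingers[my_id]):                      # largest jump first
        if f != key_id and between(f, my_id, key_id):
            return finger_lookup(fingers, succ, f, key_id)  # next hop
    return finger_lookup(fingers, succ, n, key_id)

nodes = sorted(succ)
fingers = {n: finger_table(n, nodes) for n in nodes}
print(finger_lookup(fingers, succ, 32, 19))                 # key 19 -> N32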

Page 13: Distributed Hash tables

Lookups take O(log N) hops

[Figure: Lookup(K19) issued on a ring with nodes N5, N10, N20, N32, N60, N80, N99, N110; each finger hop at least halves the remaining ID-space distance to K19's successor.]

Page 14: Distributed Hash tables

Discussion

• "Structure":
- Neighbor relations imposed by ID.
- Performance not considered.
- Potentially log N hops, but each hop could be really inefficient.

• Debate: can "structured" protocols result in good performance?

• Variants (e.g. Pastry):
- Slightly "relax" the structure.
- A neighbor must be chosen from among a "set of possibilities".
- Use measured performance to choose among them.

Page 15: Distributed Hash tables

Joining: linked list insert

[Figure: step 1: new node N36 runs Lookup(36) to find its place between N25 and N40; N40 currently stores K30 and K38.]

Page 16: Distributed Hash tables

Join (2)

[Figure: step 2: N36 sets its own successor pointer to N40.]

Page 17: Distributed Hash tables

Join (3)

[Figure: step 3: keys in 26..36 (here K30) are copied from N40 to N36.]

Page 18: Distributed Hash tables

Join (4)

[Figure: step 4: N25's successor pointer is set to N36, completing the insert.]

Update finger pointers in the background; correct successors produce correct lookups.
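The four steps map onto a short Python sketch, reusing between and simple_lookup from earlier (a toy: real Chord fixes predecessor pointers and fingers lazily via stabilization).

def join(succ, keys, new_id, bootstrap):
    # 1. Lookup(new_id): find the node that will follow new_id.
    s = simple_lookup(succ, bootstrap, new_id)
    p = next(n for n in succ if succ[n] == s)   # s's current predecessor
    # 2. The new node sets its own successor pointer.
    succ[new_id] = s
    # 3. Copy the keys in (p, new_id] from s (the slides show the copy
    #    first; here the old copies are dropped immediately for brevity).
    keys[new_id] = {k for k in keys[s] if between(k, p, new_id)}
    keys[s] -= keys[new_id]
    # 4. Set the predecessor's successor pointer to the new node.
    succ[p] = new_id

succ2 = {25: 40, 40: 25}            # two-node ring from the figures
keys2 = {25: set(), 40: {30, 38}}
join(succ2, keys2, 36, bootstrap=25)
print(succ2, keys2)                 # N36 now precedes N40 and holds K30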

Page 19: Distributed Hash tables

Join (finger pointers)

• Node n contacts a "bootstrap node" b.
• Node n learns its predecessor and fingers by asking b to look them up.
- O(log^2 N) messages.
- Practical optimizations reduce this further.

• Existing nodes' finger tables need to be updated to point to n.
- p = find_predecessor(n - 2^(i-1))

• Find_Predecessor(id) (see the sketch below):
- Contact a node x and move towards the id.
- If id lies between x and succ(x), return x.
- Else ask x for the node it knows about that most closely precedes id.
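A sketch of the rule in Python, reusing between, M, succ, and fingers from the earlier sketches; "closest preceding node" is made explicit with a circular-distance measure.

def find_predecessor(fingers, succ, start, id):
    x = start
    while not between(id, x, succ[x]):
        # Fingers of x that still precede id, going clockwise.
        preceding = [f for f in fingers[x] if f != id and between(f, x, id)]
        if not preceding:
            x = succ[x]                       # no useful finger: step once
        else:
            # Take the one that closes the most remaining distance.
            x = max(preceding, key=lambda f: (f - x) % 2**M)
    return x

print(find_predecessor(fingers, succ, 32, 19))   # predecessor of id 19 -> 10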

Page 20: Distributed Hash tables

Failures might cause incorrect lookup

[Figure: several of N80's immediate successors (N85, N102, N113) have failed; N80 no longer knows its correct successor, so Lookup(90) returns an incorrect result.]

Page 21: Distributed Hash tables

Solution: successor lists

• Each node knows its r immediate successors.
• After a failure, it will know the first live successor.
• Correct successors guarantee correct lookups.

• The guarantee is probabilistic: with O(log N) successors, the probability of a ring partition is small.
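A sketch of the fallback rule, assuming each node keeps an r-entry successor list ordered nearest-first (the node IDs and failure pattern below are made up).

def first_live_successor(succ_list, node, alive):
    """After failures, adopt the first live entry of the successor list."""
    for s in succ_list[node]:
        if s in alive:
            return s
    raise RuntimeError("all r successors failed: possible ring partition")

succ_list = {80: [85, 102, 113]}    # r = 3 successors, nearest first
alive = {80, 113, 120}              # N85 and N102 have crashed
print(first_live_successor(succ_list, 80, alive))   # -> 113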

Page 22: Distributed Hash tables

Concurrent Operations and Failures

• What's discussed:
- Simplistic scenarios: individual joins and leaves.
- How about concurrent failures, churn, transient state?

• Paper's argument:
- Maintain correctness of the ring.
- It's OK to have inaccuracies in finger tables (a performance issue, not a correctness one).

• "Stabilization" primitive (see the sketch below):
- If node n runs stabilize, it asks its successor s for its predecessor p.
- n decides whether p should be its successor instead.

• Does not discuss failures leading to "partition" of the ring.
• Finger-pointer updates are not extensively discussed.
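A sketch of the primitive as the slide states it, reusing between; this is only the core rule, and the paper's full stabilize/notify protocol handles more cases.

def stabilize(succ, pred, n):
    s = succ[n]
    p = pred.get(s)
    if p is not None and p != s and between(p, n, s):
        succ[n] = p     # a newer node slipped in between n and s: adopt it
    # Let the (possibly new) successor consider n as its predecessor.
    s = succ[n]
    if pred.get(s) is None or between(n, pred[s], s):
        pred[s] = n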

Page 23: Distributed Hash tables

Other Notable DHTs

• Concurrently developed with Chord:
- CAN (Ratnasamy, Francis, Handley, Karp, Shenker)
- Pastry (Rowstron, Druschel)
- Tapestry (Zhao, Kubiatowicz, Joseph)

• Later:
- Kademlia (Maymounkov, Mazières)

Page 24: Distributed Hash tables

Discussion

• Churn and concurrent failures are tricky in general.
• The paper, and DHT work in general, seems superficial about this (some follow-on work exists).
• "Ring correctness": what if NATs exist?

Page 25: Distributed Hash tables

Data Replication

• If a node leaves, its data is moved to a nearby node.
• What about abrupt failure?
• Paper: replicate data on the r successors (see the sketch below).
- Follow-up work looked at replication in more depth.
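A sketch of the scheme, reusing simple_lookup (and its between helper) from earlier; r is the replication factor, and succ/keys are a toy ring's state as in the join sketch.

def store_replicated(succ, keys, key, start_node, r=3):
    """Write the key at its successor and the next r-1 nodes on the ring,
    so the data survives abrupt failure of up to r-1 of them."""
    n = simple_lookup(succ, start_node, key)
    for _ in range(r):
        keys.setdefault(n, set()).add(key)
        n = succ[n]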

Page 26: Distributed Hash tables

DHTs: Discussion

• Structured vs. unstructured search:
- Structure helps, but what if most searches are for popular files?

• Lots of practical issues are tricky with structured protocols:
- Handling node heterogeneity (Ethernet vs. DSL)
- Handling NATs and firewalls
- High churn
- Performance vs. structure in neighbor selection
- Load imbalance due to differences in file popularity
- Keyword search / fuzzy keyword search

• Lots of work since then on:
- Resolving some of these issues
- Comparing structured and unstructured protocols
- Hybrid schemes that combine structured and unstructured approaches

Page 27: Distributed Hash tables

DHTs: Discussion

• DHTs evolved as a nice algorithm for search/lookup.
• Since then, the DHT camp argues it is a generic substrate:
- Multicast, anycast, file storage, etc. (e.g. the i3 paper, SIGCOMM 2002)

• Panacea/generic substrate, or a "solution looking for problems"?

• DHTs in the real world:
- Have primarily seen success in lookup/search.
- Deployed via Kademlia in recent versions of BitTorrent.

