CompSci 514: Computer Networks
Lecture 13: Distributed Hash Table
Xiaowei Yang
Overview
• What problems do DHTs solve?
• How are DHTs implemented?
Background
• A hash table is a data structure that stores (key, object) pairs.
• Key is mapped to a table index via a hash function for fast lookup.
• Content distribution networks
  – Given a URL, return the object
Example of a Hash table: a web cache
• Client requests http://www.cnn.com
• Web cache returns the page content located at the 1st entry of the table.

  Index  Key                       Value
  0      http://www.cnn.com        Page content
  1      http://www.nytimes.com    ...
  2      http://www.slashdot.org   ...
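As a minimal sketch of the table above (the URLs are from the slide; the page contents are hypothetical placeholders), note that Python's built-in dict is itself a hash table:

```python
# A hash table maps a key (URL) to an object (page content).
# Contents are hypothetical stand-ins for real pages.
cache = {
    "http://www.cnn.com": "<html>CNN front page</html>",
    "http://www.nytimes.com": "<html>NYT front page</html>",
}

def lookup(url):
    """Return the cached page, or None on a cache miss."""
    return cache.get(url)

print(lookup("http://www.cnn.com"))       # cache hit: page content
print(lookup("http://www.slashdot.org"))  # cache miss: None
```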
DHT: why?
• If the number of objects is large, it is impossible for any single node to store all of them.
• Solution: distributed hash tables
  – Split one large hash table into smaller tables and distribute them to multiple nodes
[Figure: a DHT splits the (K, V) pairs of one hash table across multiple nodes]
A content distribution network
• A single provider that manages multiple replicas.
• A client obtains content from a close replica.
Basic function of DHT
• DHT is a "virtual" hash table
  – Input: a key
  – Output: a data item
• Data items are stored by a network of nodes.
• DHT abstraction
  – Input: a key
  – Output: the node that stores the key
• Applications handle key and data item association.
DHT: a visual example

[Figure: Insert(K1, V1): the pair (K1, V1) is stored at one node among many holding (K, V) pairs]
DHT: a visual example

[Figure: Retrieve(K1): the lookup is routed to the node storing (K1, V1)]
Desired properties of DHT
• Scalability: each node does not keep much state
• Performance: lookup latency is small
• Load balancing: no node is overloaded with a large amount of state
• Dynamic reconfiguration: when nodes join and leave, the amount of state moved between nodes is small.
• Distributed: no node is more important than others.
A straw man design
• Suppose all keys are integers
• The number of nodes in the network is n
• id = key % n

[Figure: 3 nodes. Node 0 stores (0, V1) and (3, V2); node 1 stores (1, V3) and (4, V4); node 2 stores (2, V5) and (5, V6)]
When node 2 dies
• A large number of data items need to be rehashed.
[Figure: after node 2 dies, 2 nodes remain. Node 0 stores (0, V1), (2, V5), (4, V4); node 1 stores (1, V3), (3, V2), (5, V6)]
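A small simulation (with hypothetical integer keys) shows why the straw-man design rehashes so badly: shrinking from n = 3 to n = 2 nodes moves about two thirds of the keys.

```python
def place(keys, n):
    # straw-man placement: node id = key % n
    return {k: k % n for k in keys}

keys = range(1000)
before = place(keys, 3)  # 3 nodes: 0, 1, 2
after = place(keys, 2)   # node 2 dies; rehash with n = 2
moved = sum(1 for k in keys if before[k] != after[k])
print(moved / len(keys))  # most keys move, even though only 1 node left
```

A key stays put only when k % 3 == k % 2, which happens for just 2 out of every 6 consecutive keys.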
Fix: consistent hashing
• When a node joins or leaves, the expected fraction of objects that must be moved is the minimum needed to maintain a balanced load.
• A node is responsible for a range of keys
• All DHTs implement consistent hashing
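A sketch of consistent hashing (the node names and the 16-bit id space are assumptions for illustration): when one of ten nodes leaves, only the keys that node owned move, roughly 1/10 of them rather than nearly all.

```python
import hashlib
from bisect import bisect_right

M = 2 ** 16  # small circular id space, an assumption for this sketch

def h(x):
    # hash a name onto the circle
    return int(hashlib.sha1(str(x).encode()).hexdigest(), 16) % M

def owner(key, node_ids):
    """Consistent hashing: a key belongs to the first node whose id
    follows hash(key) on the circle (wrapping around)."""
    ids = sorted(node_ids)
    i = bisect_right(ids, h(key))
    return ids[i % len(ids)]

nodes = [h("node-%d" % i) for i in range(10)]  # hypothetical node names
keys = ["key-%d" % i for i in range(1000)]
before = {k: owner(k, nodes) for k in keys}
after = {k: owner(k, nodes[:-1]) for k in keys}  # one node leaves
moved = sum(1 for k in keys if before[k] != after[k])
print(moved / len(keys))  # only the departed node's share of keys moves
```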
Chord: basic idea
• Hash both node ids and keys into an m-bit, one-dimensional circular identifier space
• Consistent hashing: a key is stored at the node whose identifier is closest to the key in the identifier space
  – "Key" refers to both the key and its hash value.
Basic components of DHTs
• Overlapping key and node identifier spaces
  – Hash(www.cnn.com/image.jpg) → an n-bit binary string
  – Nodes that store the objects also have n-bit strings as their identifiers
• Building routing tables
– Next hops
– Distance functions
– These two determine the geometry of DHTs
• Ring, tree, hypercube, hybrid (tree + ring), etc.
– Handle node join and leave
• Lookup and store interface
Chord: ring topology

[Figure: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80]

• A key is stored at its successor: the node with the next-higher ID
  – K80 is stored at N90; K5 and K20 wrap around to N32
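The successor rule on the slide's 7-bit ring can be sketched as follows (a simplified local model, ignoring real Chord messaging):

```python
from bisect import bisect_left

M = 7  # the slide's circular 7-bit id space

def successor(key_id, node_ids):
    """The node responsible for key_id: the first node whose id is
    >= key_id, wrapping past zero on the circle."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key_id % 2 ** M)
    return ids[i % len(ids)]

nodes = [32, 90, 105]
print(successor(5, nodes))   # K5  -> N32 (wraps past 0)
print(successor(20, nodes))  # K20 -> N32
print(successor(80, nodes))  # K80 -> N90
```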
Chord: how to find a node that stores a key?
• Solution 1: every node keeps a routing table to all other nodes
  – Given a key, a node knows which node id is the successor of the key
  – The node sends the query to that successor
  – What are the advantages and disadvantages of this solution?
[Figure: ring with N10, N32, N60, N90, N105, N120. N10 asks "Where is key 80?" and the answer "N90 has K80" is returned]

Solution 2: every node keeps a routing entry to its successor (a linked list)
Simple lookup algorithm

  Lookup(my-id, key-id)
    n = my successor
    if my-id < n < key-id
      call Lookup(key-id) on node n   // next hop
    else
      return my successor             // done

• Correctness depends only on successors
• Q1: will this algorithm miss the real successor?
• Q2: what's the average # of lookup hops?
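A sketch of this solution-2 lookup (the ring contents are hypothetical): each node knows only its successor, so a query walks the ring one hop at a time, O(N) hops on average.

```python
def between(x, a, b):
    # is x in the circular interval (a, b]?
    if a < b:
        return a < x <= b
    return x > a or x <= b

def linear_lookup(start, key_id, succ):
    """Successor-only lookup: forward the query around the ring until
    the node preceding the key is reached. succ maps node -> successor."""
    n, hops = start, 0
    while not between(key_id, n, succ[n]):
        n = succ[n]
        hops += 1
    return succ[n], hops

# hypothetical 6-node ring
nodes = [10, 32, 60, 90, 105, 120]
succ = {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}
owner, hops = linear_lookup(10, 80, succ)
print(owner, hops)  # N90 owns K80, reached after forwarding via N32, N60
```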
Solution 3: "finger tables" allow log(N)-time lookups

• Analogy: binary search

[Figure: N80's fingers cover 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring]

• Finger i points to the successor of n + 2^(i−1)
• A finger table entry includes the Chord ID and the IP address
• Each node stores a small table of log(N) entries

[Figure: N80's finger for id 112 (N80 + 32) points to N120]
Chord finger table example

• A 3-bit identifier space (ids 0–7) with nodes 0, 1, and 3

  Node 0 stores keys {5, 6}:
    start 1, interval [1,2), successor 1
    start 2, interval [2,4), successor 3
    start 4, interval [4,0), successor 0
  Node 1 stores key {1}:
    start 2, interval [2,3), successor 3
    start 3, interval [3,5), successor 3
    start 5, interval [5,1), successor 0
  Node 3 stores key {2}:
    start 4, interval [4,5), successor 0
    start 5, interval [5,7), successor 0
    start 7, interval [7,3), successor 0
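The finger tables above can be reproduced with a short sketch (m = 3, nodes 0, 1, 3):

```python
M = 3  # identifier bits: ids 0..7

def successor(x, ids):
    # first node id >= x on the circle, wrapping around
    x %= 2 ** M
    for n in sorted(ids):
        if n >= x:
            return n
    return min(ids)

def finger_table(n, ids):
    # finger i points to successor((n + 2^(i-1)) mod 2^M), i = 1..M
    return [successor(n + 2 ** (i - 1), ids) for i in range(1, M + 1)]

ids = [0, 1, 3]
print(finger_table(0, ids))  # [1, 3, 0]
print(finger_table(1, ids))  # [3, 3, 0]
print(finger_table(3, ids))  # [0, 0, 0]
```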
Lookup with fingers

  Lookup(my-id, key-id)
    look in local finger table for highest node n s.t. my-id < n < key-id
    if n exists
      call Lookup(key-id) on node n   // next hop
    else
      return my successor             // done
  // ask node n to find the successor of id
  n.find_successor(id)
    if (id ∈ (n, successor])
      return successor;
    else
      // forward the query around the circle
      return successor.find_successor(id);

Fig. 3. (a) Simple (but slow) pseudocode to find the successor node of an identifier id. Remote procedure calls and variable lookups are preceded by the remote node. (b) The path taken by a query from node 8 for key 54, using the pseudocode in Figure 3(a): the query passes through every intervening node (N14, N21, N32, N38, N42, N48, N51) before reaching K54's successor, N56.
  Finger table of N8:
    N8 + 1  → N14
    N8 + 2  → N14
    N8 + 4  → N14
    N8 + 8  → N21
    N8 + 16 → N32
    N8 + 32 → N42

Fig. 4. (a) The finger table entries for node 8, on a ring with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56. (b) The path of a query for key 54 starting at node 8, using the algorithm in Figure 5: N8 → N42 → N51 → N56.
  Notation     Definition
  finger[k]    first node on circle that succeeds (n + 2^(k−1)) mod 2^m, 1 ≤ k ≤ m
  successor    the next node on the identifier circle; finger[1].node
  predecessor  the previous node on the identifier circle

TABLE I. Definition of variables for node n, using m-bit identifiers.
The example in Figure 4(a) shows the finger table of node 8. The first finger of node 8 points to node 14, as node 14 is the first node that succeeds (8 + 2^0) mod 2^6 = 9. Similarly, the last finger of node 8 points to node 42, as node 42 is the first node that succeeds (8 + 2^5) mod 2^6 = 40.

This scheme has two important characteristics. First, each node stores information about only a small number of other nodes, and knows more about nodes closely following it on the identifier circle than about nodes farther away. Second, a node's finger table generally does not contain enough information to directly determine the successor of an arbitrary key k. For example, node 8 in Figure 4(a) cannot determine the successor of key 34 by itself, as this successor (node 38) does not appear in node 8's finger table.

  // ask node n to find the successor of id
  n.find_successor(id)
    if (id ∈ (n, successor])
      return successor;
    else
      n′ = closest_preceding_node(id);
      return n′.find_successor(id);

  // search the local table for the highest predecessor of id
  n.closest_preceding_node(id)
    for i = m downto 1
      if (finger[i] ∈ (n, id))
        return finger[i];
    return n;

Fig. 5. Scalable key lookup using the finger table.

Figure 5 shows the pseudocode of the find_successor operation, extended to use finger tables. If id falls between n and its successor, find_successor is finished and node n returns its successor. Otherwise, n searches its finger table for the node n′ whose ID most immediately precedes id, and then invokes find_successor at n′. The reason behind this choice of n′ is that the closer n′ is to id, the more it will know about the identifier circle in the region of id.

As an example, consider the Chord circle in Figure 4(b), and suppose node 8 wants to find the successor of key 54. Since the largest finger of node 8 that precedes 54 is node 42, node 8 will ask node 42 to resolve the query. In turn, node 42 will determine the largest finger in its finger table that precedes 54, i.e., node 51. Finally, node 51 will discover that its own successor, node 56, succeeds key 54, and thus returns node 56 as the answer.
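The Figure 5 pseudocode can be sketched in Python (a single-process simulation rather than a distributed implementation; the ring is the one from Figure 4):

```python
M = 6  # identifier bits; Figure 4 uses a 2^6 id space

class Node:
    def __init__(self, nid):
        self.id = nid
        self.fingers = []  # fingers[k] = successor of (id + 2^k) mod 2^M

    @property
    def successor(self):
        return self.fingers[0]

def in_open(x, a, b):
    # x in the circular open interval (a, b)
    return a < x < b if a < b else (x > a or x < b)

def in_half(x, a, b):
    # x in the circular half-open interval (a, b]
    return in_open(x, a, b) or x == b

def find_successor(n, key):
    if in_half(key, n.id, n.successor.id):
        return n.successor
    return find_successor(closest_preceding_node(n, key), key)

def closest_preceding_node(n, key):
    # scan fingers from farthest to closest (Fig. 5's "downto" loop)
    for f in reversed(n.fingers):
        if in_open(f.id, n.id, key):
            return f
    return n

# Build the Figure 4 ring and fill in every node's fingers.
ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
nodes = {i: Node(i) for i in ids}

def succ_of(x):
    x %= 2 ** M
    for i in sorted(ids):
        if i >= x:
            return nodes[i]
    return nodes[min(ids)]

for n in nodes.values():
    n.fingers = [succ_of(n.id + 2 ** k) for k in range(M)]

print(find_successor(nodes[8], 54).id)  # 56, via N42 and N51 as in the text
```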
Chord lookup example

[Figure: the 3-bit ring with nodes 0, 1, and 3 and the finger tables from the finger table example]

• Lookup(1, 6)
• Lookup(1, 2)
Node join

• Maintain the invariant:
  1. Each node's successor is correctly maintained
  2. For every key k, node successor(k) answers for k
• It's desirable that finger table entries are correct
• Each node maintains a predecessor pointer
• Tasks:
  – Initialize predecessor and fingers of the new node
  – Update existing nodes' state
  – Notify apps to transfer state to the new node
Chord Joining: linked list insert
• Node n queries a known node n′ to initialize its state
• For its successor: lookup(n)

[Figure: N36 joins a ring where N25's successor is N40, which stores K30 and K38. Step 1: Lookup(36)]
Join (2)
[Figure: Step 2: N36 sets its own successor pointer to N40, which stores K30 and K38]
Join (3)
• Note that join does not make the network aware of n
[Figure: Step 3: keys in (25, 36] (here K30) are copied from N40 to N36]
Join (4): stabilize
• Stabilize:
  1) obtains node n's successor's predecessor x, and determines whether x should be n's successor
  2) notifies n's successor of n's existence
  – N25 calls its successor N40 to obtain N40's predecessor
  – N25 sets its successor to N36
  – N25 notifies N36 that N25 is its predecessor
• Update finger pointers in the background periodically– Find the successor of each entry i
• Correct successors produce correct lookups
[Figure: Step 4: N25's successor pointer now points to N36; N36 holds K30 and N40 holds K38]
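The join-and-stabilize sequence above can be sketched as follows (a single-threaded simulation; real Chord runs stabilize periodically over RPC):

```python
class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = self
        self.predecessor = None

def in_open(x, a, b):
    # x in the circular open interval (a, b)
    return a < x < b if a < b else (x > a or x < b)

def stabilize(n):
    """Ask n's successor for its predecessor x, adopt x as n's successor
    if x lies between them, then notify the successor of n's existence."""
    x = n.successor.predecessor
    if x is not None and in_open(x.id, n.id, n.successor.id):
        n.successor = x
    notify(n.successor, n)

def notify(s, n):
    # s learns that n might be its predecessor
    if s.predecessor is None or in_open(n.id, s.predecessor.id, s.id):
        s.predecessor = n

# The slide's example: N36 joins a ring where N25 -> N40.
n25, n36, n40 = Node(25), Node(36), Node(40)
n25.successor, n40.predecessor = n40, n25
n36.successor = n40  # step 2: N36 sets its successor via lookup(36)
stabilize(n36)       # N40 learns that N36 is its predecessor
stabilize(n25)       # step 4: N25 adopts N36 as its new successor
print(n25.successor.id, n40.predecessor.id, n36.predecessor.id)
```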
Failures might cause incorrect lookups

[Figure: ring with N10, N80, N85, N102, N113, N120; N10 issues Lookup(90)]

• N80 doesn't know its correct successor, so the lookup is incorrect
Solution: successor lists
• Each node knows its r immediate successors
• After a failure, a node will know the first live successor
• Correct successors guarantee correct lookups
• The guarantee holds with some probability
• Higher-layer software can be notified to duplicate keys of failed nodes at live successors
Choosing the successor list length
• Assume 1/2 of the nodes fail
• P(a node's successor list is all dead) = (1/2)^r
  – i.e., P(this node breaks the Chord ring)
  – Depends on independent failures
• P(no broken nodes) = (1 − (1/2)^r)^N
  – r = 2·log2(N) makes this prob. ≈ 1 − 1/N
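The arithmetic can be checked directly (N = 1024 is an arbitrary example size):

```python
import math

def p_ring_intact(N, r, p_fail=0.5):
    """P(no node's entire successor list is dead), assuming each node
    fails independently with probability p_fail."""
    return (1 - p_fail ** r) ** N

N = 1024
r = int(2 * math.log2(N))   # r = 2 log2(N) = 20
print(p_ring_intact(N, r))  # close to 1 - 1/N
print(1 - 1 / N)
```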
Lookup with fault tolerance
  Lookup(my-id, key-id)
    look in local finger table and successor list
      for highest node n s.t. my-id < n < key-id
    if n exists
      call Lookup(key-id) on node n   // next hop
      if the call failed
        remove n from finger table
        return Lookup(my-id, key-id)
    else
      return my successor             // done
Chord performance
• Per-node storage
  – Ideally: K/N
  – Implementation: large variance due to uneven node-id distribution
• Lookup latency
  – O(log N)
Comments on Chord

• DHTs are used for p2p file lookup in the real world
• ID distance ≠ network distance
  – Reducing lookup latency and exploiting locality are research challenges
• Strict successor selection
  – Can't overshoot
• Asymmetry
  – A node does not learn its routing table entries from the queries it receives
Conclusion
• Consistent Hashing
– What problem does it solve
• Design of DHTs
  – Chord: ring
  – Kademlia: tree
    • Used in practice: eMule, BitTorrent
  – CAN: hypercube
  – Many others: Pastry, Tapestry, Viceroy, …
Discussion
• What tradeoffs does Chord make?
• How can we improve Chord's lookup latency?
• What are the possible applications of DHTs?
• Recursive lookup or iterative lookup?