
CompSci514: Computer Networks Lecture 13: Distributed Hash Table · 2018-11-20 · Lecture 13:...

Date post: 06-Jun-2020
CompSci 514: Computer Networks Lecture 13: Distributed Hash Table Xiaowei Yang
Page 1: CompSci514: Computer Networks Lecture 13: Distributed Hash Table · 2018-11-20 · Lecture 13: Distributed Hash Table Xiaowei Yang. Overview •What problems do DHTs solve? •How

CompSci 514: Computer Networks

Lecture 13: Distributed Hash Table

Xiaowei Yang


Overview

• What problems do DHTs solve?
• How are DHTs implemented?


Background

• A hash table is a data structure that stores (key, object) pairs.

• Key is mapped to a table index via a hash function for fast lookup.

• Content distribution networks
  – Given a URL, return the object


Example of a Hash table: a web cache

• Client requests http://www.cnn.com
• Web cache returns the page content located at the 1st entry of the table.

Index  URL                       Content
0      http://www.cnn.com        Page content
1      http://www.nytimes.com    ...
2      http://www.slashdot.org   ...
...    ...                       ...


DHT: why?

• If the number of objects is large, it is impossible for any single node to store them all.

• Solution: distributed hash tables.
  – Split one large hash table into smaller tables and distribute them to multiple nodes


DHT

[Figure: four nodes, each holding its own K/V table.]


A content distribution network

• A single provider that manages multiple replicas.

• A client obtains content from a close replica.


Basic function of DHT

• DHT is a "virtual" hash table
  – Input: a key
  – Output: a data item
• Data items are stored by a network of nodes.
• DHT abstraction
  – Input: a key
  – Output: the node that stores the key
• Applications handle key and data item association.
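The split between the DHT abstraction (key in, node out) and the application (which keeps the key and data item association) can be sketched as follows. This is a toy under stated assumptions: the class `ToyDHT`, the node names, and the helpers `put` and `get` are hypothetical, and node selection uses a plain hash modulo the node count as a placeholder, not a real DHT routing scheme.

```python
import hashlib


class ToyDHT:
    """Maps a key to the node responsible for it; it stores no values itself."""

    def __init__(self, node_names):
        self.node_names = sorted(node_names)

    def lookup(self, key):
        # Placeholder placement rule (assumption): hash the key, take it mod N.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % len(self.node_names)
        return self.node_names[index]


# The application layer owns the key -> data item association, one store per node.
dht = ToyDHT(["node-a", "node-b", "node-c"])
stores = {name: {} for name in dht.node_names}


def put(key, item):
    stores[dht.lookup(key)][key] = item


def get(key):
    return stores[dht.lookup(key)].get(key)


put("http://www.cnn.com", "page content")
print(get("http://www.cnn.com"))
```

Only `lookup` belongs to the DHT here; `put` and `get` are application code layered on top of it.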


DHT: a visual example

[Figure: five nodes, each with a K/V table. Insert (K1, V1) places the pair (K1, V1) at one of the nodes.]


DHT: a visual example

[Figure: five nodes, each with a K/V table. Retrieve K1 finds the pair (K1, V1) at the node that stores it.]


Desired properties of DHT

• Scalability: each node does not keep much state

• Performance: lookup latency is small

• Load balancing: no node is overloaded with a large amount of state

• Dynamic reconfiguration: when nodes join and leave, the amount of state moved between nodes is small.

• Distributed: no node is more important than others.


A straw man design

• Suppose all keys are integers.
• The number of nodes in the network is n.
• id = key % n

Node 0: (0, V1), (3, V2)
Node 1: (1, V3), (4, V4)
Node 2: (2, V5), (5, V6)


When node 2 dies

• A large number of data items need to be rehashed.

Node 0: (0, V1), (2, V5), (4, V4)
Node 1: (1, V3), (3, V2), (5, V6)
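The cost of the straw man can be checked with a few lines of Python. This is a minimal sketch using the six integer keys from the slides; the helper names `place` and `moved_keys` are made up for illustration.

```python
def place(keys, n):
    """Map each integer key to a node id with the straw-man rule id = key % n."""
    return {key: key % n for key in keys}


def moved_keys(keys, n_before, n_after):
    """Return the keys whose node changes when the node count changes."""
    before = place(keys, n_before)
    after = place(keys, n_after)
    return [key for key in keys if before[key] != after[key]]


keys = [0, 1, 2, 3, 4, 5]
moved = moved_keys(keys, 3, 2)
print(moved)  # keys that land on a different node once node 2 is gone
```

Only keys 2 and 5 lived on the dead node, yet keys 3 and 4 change nodes as well, so four of the six items move.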


Fix: consistent hashing

• When a node joins or leaves, the expected fraction of objects that must be moved is the minimum needed to maintain a balanced load.

• A node is responsible for a range of keys

• All DHTs implement consistent hashing
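A small experiment can illustrate the first bullet. The sketch below assumes one common form of consistent hashing: node names and keys are hashed with SHA-1 onto a 32-bit ring, and each key belongs to the first node at or after its hash. The names `ring_hash` and `owner`, the node and object names, and the ring size are all illustrative choices, not taken from the lecture.

```python
import bisect
import hashlib


def ring_hash(name, bits=32):
    """Hash a string onto a circular identifier space of 2**bits points."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def owner(key, nodes):
    """Return the node whose hashed id is the first at or after the key's hash."""
    ring = sorted((ring_hash(node), node) for node in nodes)
    ids = [node_id for node_id, _ in ring]
    index = bisect.bisect_left(ids, ring_hash(key)) % len(ring)  # wrap past the top
    return ring[index][1]


nodes = ["node-%d" % i for i in range(10)]
keys = ["object-%d" % i for i in range(1000)]

before = {key: owner(key, nodes) for key in keys}
survivors = [node for node in nodes if node != "node-3"]
after = {key: owner(key, survivors) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
# Every key that moved was owned by the departed node; all other keys stay put.
print(len(moved), all(before[key] == "node-3" for key in moved))
```

Compared with the key % n rule, the keys that move are exactly the ones the departed node was responsible for.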


Chord: basic idea

• Hash both node id and key into an m-bit, one-dimensional circular identifier space

• Consistent hashing: a key is stored at a node whose identifier is closest to the key in the identifier space
  – "Key" refers to both the key and its hash value.


Basic components of DHTs

• Overlapping key and node identifier space
  – Hash(www.cnn.com/image.jpg) → an n-bit binary string
  – Nodes that store the objects also have n-bit strings as their identifiers
• Building routing tables
  – Next hops
  – Distance functions
  – These two determine the geometry of DHTs
    • Ring, tree, hypercube, hybrid (tree + ring), etc.
  – Handle node join and leave
• Lookup and store interface


Chord: ring topology

• A key is stored at its successor: the node with the next higher ID

[Figure: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80. The labels "Key 5" and "Node 105" point at K5 and N105.]
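The successor rule can be tried on the ring from this slide. The sketch below is a minimal illustration: the function name `successor` is an arbitrary choice, and it assumes that a key whose ID equals a node ID is stored at that node.

```python
import bisect

M = 7                   # identifier bits, as in the slide's 7-bit ID space
NODES = [32, 90, 105]   # node ids from the figure, kept sorted


def successor(key_id, nodes=NODES):
    """Return the first node id at or after key_id, wrapping around the ring."""
    index = bisect.bisect_left(nodes, key_id % (2 ** M))
    return nodes[index % len(nodes)]


for key in (5, 20, 80):
    print("K%d -> N%d" % (key, successor(key)))
```

K5 and K20 both land on N32 and K80 lands on N90. A key past the last node, such as 110, wraps around to N32.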


Chord: how to find a node that stores a key?

• Solution 1: every node keeps a routing table to all other nodes
  – Given a key, a node knows which node id is the successor of the key
  – The node sends the query to the successor
  – What are the advantages and disadvantages of this solution?


• Solution 2: every node keeps a routing entry to the node's successor (a linked list)

[Figure: ring with nodes N10, N32, N60, N90, N105, N120 and key K80. The query "Where is key 80?" is answered with "N90 has K80".]


Simple lookup algorithm

Lookup(my-id, key-id)
    n = my successor
    if my-id < n < key-id
        call Lookup(key-id) on node n   // next hop
    else
        return my successor             // done

• Correctness depends only on successors
• Q1: will this algorithm miss the real successor?
• Q2: what's the average # of lookup hops?
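The successor-only lookup can be sketched in Python on the six-node ring from the Solution 2 figure. One deliberate difference from the slide's pseudocode: the comparison is done on a circular interval (a, b], as in the Chord paper's pseudocode, so that queries crossing the zero point still work. The helper names and the dictionary `successor_of` are assumptions made for this example.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps past zero


def simple_lookup(start, key_id, successor_of):
    """Follow successor pointers from `start`; return (owner, hops)."""
    node, hops = start, 0
    while not in_half_open(key_id, node, successor_of[node]):
        node = successor_of[node]  # next hop
        hops += 1
    return successor_of[node], hops


ring = [10, 32, 60, 90, 105, 120]
successor_of = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}

print(simple_lookup(32, 80, successor_of))
```

Starting from N105 instead of N32, the same query has to walk most of the ring before it reaches N60, whose successor N90 holds K80. The hop count therefore depends on where the query starts.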


Solution 3: "Finger table" allows log(N)-time lookups

• Analogy: binary search

[Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring.]


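Finger i points to successor of $n + 2^{i-1}$

• A finger table entry includes Chord ID and IP address

• Each node stores a small table of about log(N) entries

[Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring; the finger targeting ID 112 (80 + 32) points to N120.]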


Chord finger table example

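[Figure: 3-bit identifier circle (IDs 0 to 7) with nodes 0, 1, and 3 and their finger tables.]

Node 0 (keys: 5, 6)
start  interval  successor
1      [1,2)     1
2      [2,4)     3
4      [4,0)     0

Node 1 (keys: 1)
start  interval  successor
2      [2,3)     3
3      [3,5)     3
5      [5,1)     0

Node 3 (keys: 2)
start  interval  successor
4      [4,5)     0
5      [5,7)     0
7      [7,3)     0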
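The finger tables in this example can be recomputed from the rule that finger i of node n points to the successor of n + 2^(i-1) modulo 2^m. The following is a short sketch; `successor` and `finger_table` are illustrative names, and the node list must be sorted.

```python
import bisect


def successor(ident, nodes):
    """First node id at or after ident on the ring (nodes must be sorted)."""
    index = bisect.bisect_left(nodes, ident)
    return nodes[index % len(nodes)]


def finger_table(n, nodes, m):
    """Return [(start, successor)] for fingers 1..m of node n."""
    table = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % (2 ** m)
        table.append((start, successor(start, nodes)))
    return table


nodes = [0, 1, 3]
for n in nodes:
    print(n, finger_table(n, nodes, m=3))
```

For the three-node ring this gives successors 1, 3, 0 for node 0; 3, 3, 0 for node 1; and 0, 0, 0 for node 3.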


Lookup with fingers

Lookup(my-id, key-id)
    look in local finger table for
        highest node n s.t. my-id < n < key-id
    if n exists
        call Lookup(key-id) on node n   // next hop
    else
        return my successor             // done
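A runnable sketch of finger-based lookup follows. It uses the ten-node ring with 6-bit identifiers from the Chord paper example reproduced in this deck. It is a single-process simulation under simplifying assumptions: every node's fingers are computed from a global node list, and the function names (`in_open`, `fingers`, `find_successor`) are choices made for this illustration.

```python
import bisect

M = 6
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]


def in_open(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero


def successor(ident):
    """First node id at or after ident, wrapping around the ring."""
    return NODES[bisect.bisect_left(NODES, ident) % len(NODES)]


def fingers(n):
    """Finger i of node n points to successor(n + 2**(i-1)), for i = 1..M."""
    return [successor((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]


def find_successor(n, key_id, path=None):
    """Route a query for key_id starting at node n; return (owner, path)."""
    path = (path or []) + [n]
    succ = fingers(n)[0]  # finger 1 is the immediate successor
    if in_open(key_id, n, succ) or key_id == succ:
        return succ, path
    for finger in reversed(fingers(n)):  # try the highest finger first
        if in_open(finger, n, key_id):
            return find_successor(finger, key_id, path)
    return succ, path  # no closer finger known


print(find_successor(8, 54))
```

The query for key 54 issued at node 8 visits nodes 8, 42, and 51 and resolves to node 56.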


(a)
// ask node n to find the successor of id
n.find_successor(id)
    if (id ∈ (n, successor])
        return successor;
    else
        // forward the query around the circle
        return successor.find_successor(id);

(b) [Figure: identifier circle with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56 and key K54 next to N56; lookup(K54) is issued at N8.]

Fig. 3. (a) Simple (but slow) pseudocode to find the successor node of an identifier id. Remote procedure calls and variable lookups are preceded by the remote node. (b) The path taken by a query from node 8 for key 54, using the pseudocode in Figure 3(a).

(a) Finger table of N8

N8 + 1    N14
N8 + 2    N14
N8 + 4    N14
N8 + 8    N21
N8 + 16   N32
N8 + 32   N42

(b) [Figure: identifier circle with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56 and key K54 next to N56; lookup(54) is issued at N8.]

Fig. 4. (a) The finger table entries for node 8. (b) The path of a query for key 54 starting at node 8, using the algorithm in Figure 5.

Notation      Definition
finger[k]     first node on circle that succeeds $(n + 2^{k-1}) \bmod 2^m$, $1 \le k \le m$
successor     the next node on the identifier circle; finger[1].node
predecessor   the previous node on the identifier circle

TABLE I. Definition of variables for node n, using m-bit identifiers.

The example in Figure 4(a) shows the finger table of node 8. The first finger of node 8 points to node 14, as node 14 is the first node that succeeds $(8 + 2^0) \bmod 2^6 = 9$. Similarly, the last finger of node 8 points to node 42, as node 42 is the first node that succeeds $(8 + 2^5) \bmod 2^6 = 40$.

This scheme has two important characteristics. First, each node stores information about only a small number of other nodes, and knows more about nodes closely following it on the identifier circle than about nodes farther away. Second, a node's finger table generally does not contain enough information to directly determine the successor of an arbitrary key k. For example, node 8 in Figure 4(a) cannot determine the successor of key 34 by itself, as this successor (node 38) does not appear in node 8's finger table.

// ask node n to find the successor of id
n.find_successor(id)
    if (id ∈ (n, successor])
        return successor;
    else
        n′ = closest_preceding_node(id);
        return n′.find_successor(id);

// search the local table for the highest predecessor of id
n.closest_preceding_node(id)
    for i = m downto 1
        if (finger[i] ∈ (n, id))
            return finger[i];
    return n;

Fig. 5. Scalable key lookup using the finger table.

Figure 5 shows the pseudocode of the find_successor operation, extended to use finger tables. If id falls between n and its successor, find_successor is finished and node n returns its successor. Otherwise, n searches its finger table for the node n′ whose ID most immediately precedes id, and then invokes find_successor at n′. The reason behind this choice of n′ is that the closer n′ is to id, the more it will know about the identifier circle in the region of id.

As an example, consider the Chord circle in Figure 4(b), and suppose node 8 wants to find the successor of key 54. Since the largest finger of node 8 that precedes 54 is node 42, node 8 will ask node 42 to resolve the query. In turn, node 42 will determine the largest finger in its finger table that precedes 54, i.e., node 51. Finally, node 51 will discover that its own successor, node


Chord lookup example

[Figure: the 3-bit ring from the finger table example, with node 0 (keys 5, 6), node 1 (key 1), and node 3 (key 2) and their finger tables.]

• Lookup(1,6)
• Lookup(1,2)


Node join

• Maintain the invariant
  1. Each node's successor is correctly maintained
  2. For every key k, node successor(k) answers for k
  – It's desirable that finger table entries are correct
• Each node maintains a predecessor pointer
• Tasks:
  – Initialize predecessor and fingers of new node
  – Update existing nodes' state
  – Notify apps to transfer state to new node


Chord Joining: linked list insert

• Node n queries a known node n' to initialize its state
  – for its successor: lookup(n)

[Figure: nodes N25 and N40, with keys K30 and K38 held by N40; the joining node N36 performs step 1, Lookup(36).]


Join (2)

[Figure: nodes N25, N36, N40 with keys K30 and K38; step 2, N36 sets its own successor pointer.]


Join (3)

• Note that join does not make the network aware of n

[Figure: nodes N25, N36, N40; step 3, copy keys 26..36 from N40 to N36, so K30 now also appears at N36 next to K30 and K38 at N40.]


Join (4): stabilize

• Stabilize: (1) obtains the predecessor x of node n's successor and determines whether x should be n's successor; (2) notifies n's successor of n's existence
  – N25 calls its successor N40 to return its predecessor
  – Set its successor to N36
  – Notifies N36 that N25 is its predecessor
• Update finger pointers in the background periodically
  – Find the successor of each entry i
• Correct successors produce correct lookups

[Figure: nodes N25, N36, N40 with keys K30 and K38; step 4, set N25's successor pointer.]
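The four join steps can be replayed in a small single-process simulation. This is a sketch under simplifying assumptions: nodes are plain Python objects instead of remote machines, fingers and key transfer are left out, and the class and method names (`Node`, `join`, `stabilize`, `notify`) are chosen for this example, loosely following the terms used on the slides.

```python
def in_open(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # wraps past zero; when a == b, every x != a qualifies


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def find_successor(self, key):
        """Walk successor pointers until key falls in (node, node.successor]."""
        node = self
        while not (in_open(key, node.id, node.successor.id)
                   or key == node.successor.id):
            node = node.successor
        return node.successor

    def join(self, known):
        """Steps 1 and 2: look up our own id via a known node, set successor."""
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        """Adopt the successor's predecessor if it sits between us, then notify."""
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """Accept candidate as predecessor if it is closer than the current one."""
        if (self.predecessor is None
                or in_open(candidate.id, self.predecessor.id, self.id)):
            self.predecessor = candidate


# Existing two-node ring N25 <-> N40, then N36 joins through N25.
n25, n40, n36 = Node(25), Node(40), Node(36)
n25.successor, n25.predecessor = n40, n40
n40.successor, n40.predecessor = n25, n25

n36.join(n25)     # N36 learns that its successor is N40
n36.stabilize()   # N40 learns that N36 is its predecessor
n25.stabilize()   # N25 sees N40's predecessor N36 and adopts it as successor

print(n25.successor.id, n36.successor.id, n40.predecessor.id, n36.predecessor.id)
```

After the two stabilize rounds the ring reads N25, N36, N40, which matches step 4 on the slide. Until N25 runs stabilize, lookups routed through N25 still skip N36.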


Failures might cause incorrect lookup

• N80 doesn't know its correct successor, so the lookup is incorrect

[Figure: ring with nodes N10, N80, N85, N102, N113, N120 and a Lookup(90) query.]


Solution: successor lists

• Each node knows r immediate successors
• After failure, will know first live successor
• Correct successors guarantee correct lookups
  – Guarantee is with some probability

• Higher layer software can be notified to duplicate keys at failed nodes to live successors
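Picking the first live entry from a successor list is simple to sketch. The example below is hypothetical: it takes N80's successor list to be [85, 102, 113] with r = 3 and assumes that N85 and N102 have failed; the function name and the `is_alive` callback stand in for whatever failure detection a real system would use.

```python
def first_live_successor(successor_list, is_alive):
    """Return the first entry of the successor list that is alive, or None."""
    for node in successor_list:
        if is_alive(node):
            return node
    return None


# Hypothetical state at N80: its r = 3 nearest successors, two of which failed.
successor_list = [85, 102, 113]
failed = {85, 102}
print(first_live_successor(successor_list, lambda n: n not in failed))
```

If every entry in the list has failed the function returns `None`, which is the case where this node breaks the ring.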


Choosing the successor list length

• Assume 1/2 of nodes fail
• P(successor list all dead) = $(1/2)^r$
  – I.e. P(this node breaks the Chord ring)
  – Depends on independent failure
• P(no broken nodes) = $(1 - (1/2)^r)^N$
  – $r = 2\log_2 N$ makes prob. $\approx 1 - 1/N$
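With $r = 2\log_2 N$ the per-node failure term becomes $(1/2)^r = 1/N^2$, so $(1 - 1/N^2)^N \approx 1 - 1/N$. A quick numeric check, assuming independent failures and an example ring size of N = 1024 (the function name is made up for this sketch):

```python
import math


def p_ring_intact(n_nodes, r, p_fail=0.5):
    """P(no node has all r successors dead), assuming independent failures."""
    return (1.0 - p_fail ** r) ** n_nodes


N = 1024
r = 2 * int(math.log2(N))  # r = 2 * log2(N) = 20 for N = 1024
print(p_ring_intact(N, r), 1 - 1 / N)
```

The two printed values agree to about six decimal places.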


Lookup with fault tolerance

Lookup(my-id, key-id)
    look in local finger table and successor-list
        for highest node n s.t. my-id < n < key-id
    if n exists
        call Lookup(key-id) on node n   // next hop
        if call failed,
            remove n from finger table
            return Lookup(my-id, key-id)
    else return my successor            // done


Chord performance

• Per node storage
  – Ideally: K/N
  – Implementation: large variance due to uneven node ID distribution
• Lookup latency
  – O(log N)


Comments on Chord

• DHTs are used for p2p file lookup in the real world
• ID distance ≠ network distance
  – Reducing lookup latency and improving locality are research challenges
• Strict successor selection
  – Can't overshoot
• Asymmetry
  – A node does not learn its routing table entries from queries it receives


Conclusion

• Consistent Hashing

– What problem does it solve

• Design of DHTs

– Chord: ring

– Kademlia: tree
  • Used in practice: eMule, BitTorrent

– CAN: hypercube

– Many others: Pastry, Tapestry, Viceroy, ...


Discussion

• What tradeoff does Chord make?
• How can we improve Chord's lookup latency?
• What are the possible applications of DHT?
• Recursive lookup or iterative lookup?

