File Sharing: Hash/Lookup
Yossi Shasho (HW in the last slide)
• Based on "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications"
• Partially based on "The Impact of DHT Routing Geometry on Resilience and Proximity"
• Partially based on "Building a Low-latency, Proximity-aware DHT-Based P2P Network" http://www.computer.org/portal/web/csdl/doi/10.1109/KSE.2009.49
• Some slides liberally borrowed from:
  – Carnegie Mellon Peer-2-Peer 15-411
  – Petar Maymounkov and David Mazières' Kademlia talk, New York University
Peer-2-Peer
– Distributed systems without any centralized control or hierarchical organization.
– A long list of applications:
  • Redundant storage
  • Permanence
  • Selection of nearby servers
  • Anonymity, search, authentication, hierarchical naming, and more
– Core operation in most p2p systems is efficient location of data items
Think Big
• /home/google/: one namespace, thousands of servers
  – Map each key (= filename) to a value (= server)
  – A hash table? Think again:
    • What if a new server joins? What if a server fails?
    • How do we keep track of all the servers?
    • What about redundancy? And proximity?
    • Not scalable, centralized, fault-intolerant
    • Lots of new problems to come up…
DHT: Overview
• Abstraction: a distributed "hash table" (DHT) data structure:
  – put(id, item);
  – item = get(id);
• Scalable, decentralized, fault-tolerant
• Implementation: nodes in the system form a distributed data structure
  – Can be a ring, tree, hypercube, skip list, butterfly network, ...
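To make the abstraction concrete, here is a minimal, purely local stand-in for the put/get interface (a plain in-memory dict, not a distributed implementation; the class and key names are illustrative):

    # Minimal sketch of the DHT abstraction: the same two-call interface
    # a real DHT exposes, backed here by a local dict for simplicity.
    class LocalDHT:
        def __init__(self):
            self._table = {}

        def put(self, key, item):
            self._table[key] = item

        def get(self, key):
            return self._table.get(key)

    dht = LocalDHT()
    dht.put("movie.avi", "192.168.1.7")
    print(dht.get("movie.avi"))   # 192.168.1.7

A real DHT keeps the same interface but spreads the table over many nodes, which is what the rest of the talk is about.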
DHT: Overview (3)
• Good properties:
  – Distributed construction/maintenance
  – Load-balanced with uniform identifiers
  – O(log n) hops / neighbors per node
  – Provides underlying network proximity
Consistent Hashing
• When adding servers (rows) to the hash table, we don't want all the keys to change their mappings
• When adding the Nth server, we want ~1/N of the keys to change their mappings
• Is this achievable? Yes.
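A quick simulation of that claim, under the scheme this deck builds up to (hash servers and keys onto one circle and assign each key to its successor; all names are illustrative):

    import hashlib
    from bisect import bisect_right

    SPACE = 2**32   # size of the identifier circle

    def h(s):
        # Map an arbitrary string onto the identifier circle.
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % SPACE

    def assignment(servers, keys):
        ids = sorted(h(s) for s in servers)
        by_id = {h(s): s for s in servers}
        # A key belongs to its successor: the first server ID
        # clockwise from h(key), wrapping around the circle.
        return {k: by_id[ids[bisect_right(ids, h(k)) % len(ids)]]
                for k in keys}

    keys = [f"file-{i}" for i in range(10000)]
    before = assignment([f"srv-{i}" for i in range(9)], keys)
    after = assignment([f"srv-{i}" for i in range(10)], keys)
    moved = sum(before[k] != after[k] for k in keys)
    print(f"{moved / len(keys):.1%} of keys moved")   # ~10%

Going from 9 to 10 servers remaps roughly 1/10 of the keys, instead of nearly all of them as with a naive hash(key) % N scheme.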
Chord: Overview
• Just one operation: item = get(id)
• Each node needs routing info about only a few other nodes
• O(log N) for lookup, O(log² N) for join/leave
• Simple, with provable correctness and provable performance
• Apps built on top of it do the rest
Chord: Geometry
• Identifier space [1, N]; example: binary strings
• Keys (filenames) and values (server IPs) live in the same identifier space
• Keys and values are evenly distributed
• Now, put this identifier space on a circle
• Consistent hashing: a key is stored at its successor
Chord: Geometry (2)
• A key is stored at its successor: the node with the next-higher ID
[Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80.]
• Get(5) = 32
• Get(20) = 32
• Get(80) = 90
• Who maps to 105? Nobody.
Chord: Back to Consistent Hashing
• "When adding the Nth row, we want ~1/N of the keys to change their mappings." (the problem from a few slides back)
[Figure: the same ring after nodes N15 and N50 join. Only the keys on the new nodes' arcs change owners: K5 moves from N32 to N15, while K20 (successor N32) and K80 (successor N90) stay put.]
Chord: Basic Lookup
[Figure: N10 asks "Where is key 80?"; the query is forwarded around the ring N10 → N32 → N60 → N90 (the ring also contains N105 and N120), and N90 answers "N90 has K80".]
• Each node remembers only its next node (successor)
• O(N) lookup time: no good!
get(k):
  if I have k:
    return "ME"
  else:
    p ← next node
    return p.get(k)
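A self-contained toy version of this successor-only lookup, using the node IDs from the figure (class and method names are mine):

    # Toy ring where each node knows only its successor: lookup is O(N).
    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = None
            self.keys = set()

        def get(self, k, hops=0):
            # If we don't hold k, pass the query one node clockwise.
            if k in self.keys:
                return self.id, hops
            return self.successor.get(k, hops + 1)

    ids = [10, 32, 60, 90, 105, 120]
    nodes = [Node(i) for i in ids]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.successor = b                    # close the ring
    nodes[ids.index(90)].keys.add(80)      # K80 lives at its successor N90
    print(nodes[0].get(80))                # (90, 3): three hops from N10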
Chord: “Finger Table”
• The previous lookup was O(N). We want O(log N).
[Figure: node N80's fingers jump 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128 of the way around the ring.]
• Entry i in the finger table of node n is the first node n' such that n' ≥ n + 2^i
• In other words, the i-th finger of n points 1/2^(m-i) of the way around the ring (for an m-bit identifier space)

Finger Table (N80)
i | n + 2^i     | succ
0 | 80+2^0 = 81 | __
1 | 80+2^1 = 82 | __
2 | 80+2^2 = 84 | __
Chord: “Finger Table” Lookups
[Figure: ring with nodes N2, N9, N19, N31, N49, N65, N74, N81, N90. N65 asks "Where is key 40?"; following fingers, the query reaches N49, the successor of K40, which answers "40!".]
Finger Table (N65)
i | id+2^i       | succ
0 | 65+2^0 = 66  | N74
1 | 65+2^1 = 67  | N74
…
6 | 65+2^6 = 129 | N19

Finger Table (N19)
i | id+2^i       | succ
0 | 19+2^0 = 20  | N31
1 | 19+2^1 = 21  | N31
…
4 | 19+2^4 = 35  | N49
get(k):
  if I have k:
    return "ME"
  else:
    p ← closest finger preceding k
    return p.get(k)
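A runnable sketch of finger-table routing on the ring from the example (m = 7 bits; the helper names are mine, and the exact route can differ from the figure's, which picks fingers slightly differently):

    M = 7
    RING = sorted([2, 9, 19, 31, 49, 65, 74, 81, 90])

    def between(x, a, b):
        # True if x lies on the clockwise arc (a, b] of the circle.
        return (a < x <= b) if a < b else (x > a or x <= b)

    def successor(ident):
        # First node at or after ident, wrapping around the ring.
        return next((n for n in RING if n >= ident), RING[0])

    def fingers(n):
        # Finger i points at the first node >= n + 2^i (mod 2^M).
        return [successor((n + 2**i) % 2**M) for i in range(M)]

    def lookup(n, key, path=()):
        path += (n,)
        succ = successor((n + 1) % 2**M)
        if between(key, n, succ):
            return succ, path              # key is owned by n's successor
        # Otherwise forward to the closest finger preceding the key.
        nxt = max((f for f in fingers(n) if between(f, n, key)),
                  key=lambda f: (f - n) % 2**M, default=succ)
        return lookup(nxt, key, path)

    print(lookup(65, 40))   # K40 is owned by N49, e.g. (49, (65, 2, 19, 31))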
Chord: Example
• Assume an identifier space of size 8: IDs 0..7 (3 bits)
• Node n1 joins and is responsible for all keys
• (Succ. == successor)
[Figure: ring positions 0..7; n1 sits at position 1.]
Succ. Table (n1)
i | id+2^i    | succ
0 | 1+2^0 = 2 | 1
1 | 1+2^1 = 3 | 1
2 | 1+2^2 = 5 | 1
Chord: Example
• Nodes n0 and n6 join (n2 has joined as well)
[Figure: ring positions 0..7 with nodes n0, n1, n2 and n6.]
Succ. Table (n0)
i | id+2^i    | succ
0 | 0+2^0 = 1 | 1
1 | 0+2^1 = 2 | 2
2 | 0+2^2 = 4 | 6

Succ. Table (n6)
i | id+2^i            | succ
0 | 6+2^0 = 7         | 0
1 | 6+2^1 = 0 (mod 8) | 0
2 | 6+2^2 = 2 (mod 8) | 2
Chord: Example
• Nodes: n0, n1, n2, n6
• Items: 1 and 7

[Figure: ring positions 0..7; item 1 is stored at its successor n1, and item 7 at its successor n0. The nodes' successor tables are unchanged from the previous slide.]
Chord: Routing
Upon receiving a query for item id, a node:
1. Checks whether it stores the item locally
2. If not, forwards the query to the largest node i in its finger table such that i ≤ id
[Figure: query(7) is issued at n1; following the finger tables below, it is forwarded to n6 and then to n0, which stores item 7.]

Succ. Table (n1)
i | id+2^i | succ
0 | 2      | 2
1 | 3      | 6
2 | 5      | 6

Succ. Table (n2)
i | id+2^i | succ
0 | 3      | 6
1 | 4      | 6
2 | 6      | 6

Succ. Table (n0)
i | id+2^i | succ
0 | 1      | 1
1 | 2      | 2
2 | 4      | 6

Succ. Table (n6)
i | id+2^i | succ
0 | 7      | 0
1 | 0      | 0
2 | 2      | 2
Chord: Node Join
Node n joins, with one existing node n' in hand:
1. Initialize the fingers of n
   – Ask n' to look them up (log N fingers to initialize)
2. Update the fingers of the other nodes
   – Only a few nodes need to be updated
   – Look them up and tell them n is new in town
3. Transfer keys to n
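A minimal sketch of the key-transfer step under these assumptions (only keys on the arc (predecessor, n] move to the new node; function names are mine):

    from bisect import bisect_left

    def join(nodes, keys_at, n):
        # nodes: sorted node IDs; keys_at: {node_id: set of keys}.
        i = bisect_left(nodes, n)
        succ = nodes[i % len(nodes)]       # n's successor before insertion
        nodes.insert(i, n)
        pred = nodes[i - 1]                # n's predecessor (wraps via -1)
        if pred < n:
            on_arc = lambda k: pred < k <= n
        else:
            on_arc = lambda k: k > pred or k <= n
        moved = {k for k in keys_at[succ] if on_arc(k)}
        keys_at[succ] -= moved             # the successor hands over only
        keys_at[n] = moved                 # the keys that now belong to n

    nodes = [32, 90, 105]
    keys_at = {32: {5, 20}, 90: {80}, 105: set()}
    join(nodes, keys_at, 15)
    print(keys_at)   # K5 moved from N32 to the new N15

This matches the earlier ring example: when N15 joins, only K5 changes owner.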
Chord: Improvements
• Every 30s, ask your successor for its predecessor
  – Fix your own successor based on the answer
• Also, pick and verify a random finger
  – Rebuild finger table entries this way
• Keep a successor list of r successors
  – Handles unexpected node failures
  – The list can also be used to replicate data
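A toy model of one such stabilization round (illustrative names; a real implementation does this over RPC, roughly every 30 seconds):

    class Node:
        # A toy node holding only the ring pointers stabilization needs.
        def __init__(self, node_id):
            self.id = node_id
            self.successor = self
            self.predecessor = None

        def notify(self, candidate):
            # candidate thinks it may be our predecessor.
            if (self.predecessor is None or
                    strictly_between(candidate.id, self.predecessor.id, self.id)):
                self.predecessor = candidate

        def stabilize(self):
            x = self.successor.predecessor
            if x is not None and strictly_between(x.id, self.id, self.successor.id):
                self.successor = x          # someone joined in between us
            self.successor.notify(self)

    def strictly_between(x, a, b):
        # x strictly inside the clockwise arc (a, b) of the circle.
        return (a < x < b) if a < b else (x > a or x < b)

    # N50 joins between N32 and N90; two rounds repair the ring.
    n32, n50, n90 = Node(32), Node(50), Node(90)
    n32.successor, n50.successor = n90, n90
    n50.stabilize()           # N90 learns about N50
    n32.stabilize()           # N32 adopts N50 as its successor
    print(n32.successor.id)   # 50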
Chord: Performance
• Routing table size?
  – log N fingers
• Routing time?
  – Each hop is expected to halve the distance to the desired id => expect O(log N) hops
• Node joins?
  – Query for the fingers => O(log N)
  – Update other nodes' fingers => O(log² N)
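A quick empirical check of the O(log N) routing claim (a self-contained simulation; the parameters are arbitrary):

    import random
    from bisect import bisect_left

    M, N = 16, 1024
    random.seed(1)
    ring = sorted(random.sample(range(2**M), N))

    def succ(x):
        # First node clockwise from x (wrapping around).
        return ring[bisect_left(ring, x) % N]

    def between(x, a, b):
        # True if x lies on the clockwise arc (a, b].
        return (a < x <= b) if a < b else (x > a or x <= b)

    def hops(n, key):
        count = 0
        while not between(key, n, succ((n + 1) % 2**M)):
            fingers = [succ((n + 2**i) % 2**M) for i in range(M)]
            # Forward to the closest finger that precedes the key.
            n = max((f for f in fingers if between(f, n, key)),
                    key=lambda f: (f - n) % 2**M)
            count += 1
        return count

    trials = [hops(random.choice(ring), random.randrange(2**M))
              for _ in range(1000)]
    print(sum(trials) / len(trials))   # ~0.5 * log2(1024) = 5 hops

For N = 1024 the average comes out near (1/2)·log2 N, matching the halving argument above.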
Chord: Performance (4)
• Chord promises a few (O(log N)) hops on the overlay
  – But on the physical network, each hop can be quite far

[Figure: a Chord network with N=8 nodes and an m=8-bit key space; a handful of overlay hops can cross long physical distances.]
Applications employing DHTs
• eMule (KAD implements Kademlia, a DHT)
• An anonymous network (≥ 2 million downloads to date)
• BitTorrent (≥ 4.1.2 beta)
  – Trackerless BitTorrent, allows anonymity (thank god)

How two BitTorrent clients bootstrap the DHT:
1. Clients A and B handshake
2. A: "I have a DHT, it's on port X"
3. B pings port X of A
4. B gets a reply => starts adjusting its nodes, rows, …
Kademlia (KAD)
• The distance between A and B is A XOR B
• Nodes are treated as leaves in a binary tree
• A node's position in A's tree is determined by the longest prefix it shares with A
  – A's ID: 010010101
  – B's ID: 101000101
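A tiny illustration of the XOR metric on the two IDs above (9-bit IDs just to match the example):

    # The longer the shared prefix, the smaller the XOR distance.
    BITS = 9
    a = 0b010010101          # A's ID
    b = 0b101000101          # B's ID
    d = a ^ b                # XOR distance
    shared = BITS - d.bit_length()   # length of the common prefix
    print(f"distance = {d:0{BITS}b}, shared prefix bits = {shared}")
    # distance = 111010000, shared prefix bits = 0 (A and B differ
    # already at the first bit)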
Kademlia: Prefix Tree
• A node's position in A's tree is determined by the longest prefix it shares with A (=> log N subtrees)
[Figure: the 160-bit ID space drawn as a binary tree. Relative to our node, the other peers fall into subtrees sharing a common prefix of 001, 00, 0, or nothing.]
Kademlia: Lookup
• Consider a query for ID 111010… initiated by node 0011100…

[Figure: the query hops across subtrees, each hop landing in a smaller subtree that shares a longer prefix with the target 111010…]
Kademlia: K-Buckets

[Figure: the ID tree of a node with prefix 0011, divided into successive subtrees; each subtree corresponds to one k-bucket.]
• A node's binary tree is divided into a series of subtrees; consider the routing table of a node with prefix 0011
• A contact consists of <IP:Port, NodeID>
• The routing table is composed of one k-bucket per subtree
• In a 2-bucket (k=2) example, each bucket holds up to 2 contacts for its key range
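A minimal sketch of how contacts could be binned into k-buckets by XOR distance (the bucket index is just the position of the highest differing bit; the IDs and names here are mine):

    K = 2                    # bucket capacity, as in the 2-bucket example
    MY_ID = 0b00110000       # a toy 8-bit ID with prefix 0011

    def bucket_index(other_id):
        # Highest differing bit = which subtree the contact lives in.
        return (MY_ID ^ other_id).bit_length() - 1

    buckets = {}
    for contact in (0b00110001, 0b00111010, 0b01010101, 0b11110000):
        i = bucket_index(contact)
        bucket = buckets.setdefault(i, [])
        if len(bucket) < K:              # keep at most K contacts per range
            bucket.append(contact)
    for i in sorted(buckets):
        print(i, [format(c, "08b") for c in buckets[i]])
    # Contacts sharing a longer prefix with us land in lower buckets.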
Summary
1. The problem
2. Distributed hash tables (DHT)
3. Chord: a DHT scheme
   • Geometry
   • Lookup
   • Node joins
   • Performance
4. Extras
Homework
• Load balance is achieved when all servers in the Chord network are responsible for (roughly) the same number of keys
• Still, with some probability, one server can be responsible for significantly more keys than the others
• How can we lower the upper bound on the number of keys assigned to a server?
• Hint: simulation (see the scaffold below)
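A possible starting scaffold for that simulation (it only measures the imbalance; shrinking it is the exercise):

    import hashlib
    from bisect import bisect_left
    from collections import Counter

    SPACE = 2**32

    def h(s):
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % SPACE

    servers = sorted(h(f"server-{i}") for i in range(100))

    def owner(k):
        # Successor of k on the circle.
        return servers[bisect_left(servers, k) % len(servers)]

    load = Counter(owner(h(f"key-{i}")) for i in range(100000))
    print("mean load:", 100000 / len(servers))
    print("max load: ", max(load.values()))   # often several times the mean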