Peer-to-Peer Distributed Systems
Pete Keleher
Why Distributed Systems?
Aggregate resources!
– memory
– disk
– CPU cycles
Proximity to physical stuff
– things with sensors
– things that print
– things that go boom
– other people
Fault tolerance!
– Don’t want one tsunami to take everything down
Why Peer-to-Peer Systems?
What’s peer-to-peer?
(Traditional) Client-Server
[Diagram: one server in the middle, many clients around it.]
Peer-to-Peer
– Lots of reasonable machines
• No one machine loaded more than others
• No one machine irreplaceable!
Peer-to-Peer (P2P)
Where do the machines come from?
– “found” resources
• SETI@home
• BOINC
– existing resources
• computing “clusters” (32, 64, …)
What good is a peer-to-peer system?
– all those things mentioned before, including
– storage: files, MP3s, leaked documents, porn …
The lookup problem
[Diagram: nodes N1..N6 connected across the Internet. A publisher inserts (key=“title”, value=MP3 data…) at one node; a client at another node issues Lookup(“title”). How does the query find the data?]
Centralized lookup (Napster)
[Diagram: publisher N4 registers SetLoc(“title”, N4) with a central DB; the client sends Lookup(“title”) to the DB and fetches (key=“title”, value=MP3 data…) directly from N4.]
Simple, but O(N) state at the server, and a single point of failure.
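The centralized scheme is essentially one dictionary on one server. A toy sketch (illustrative only, not Napster’s actual protocol):

```python
# The entire index lives on one machine: O(N) state, one point of failure.
index = {}

def set_loc(title, node_addr):
    """Publisher registers where its content lives (SetLoc)."""
    index[title] = node_addr

def lookup(title):
    """Client asks the central server, then fetches from the returned node."""
    return index.get(title)

set_loc("title", "N4")
assert lookup("title") == "N4"   # now download the MP3 data from N4
```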
Flooded queries (Gnutella)
[Diagram: the client floods Lookup(“title”) to its neighbors, which forward it hop by hop until it reaches publisher N4, holding (key=“title”, value=MP3 data…).]
Robust, but worst case O(N) messages per lookup.
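For contrast, a toy Gnutella-style flood; the Node class and hop-limit mechanics here are simplifications for illustration, not Gnutella’s wire protocol:

```python
from collections import deque

class Node:
    def __init__(self):
        self.neighbors = []   # overlay links to other peers
        self.store = {}       # title -> data held locally

def flood_lookup(start, title, ttl=7):
    """Breadth-first flood with a hop limit. Robust (any path works),
    but the worst case contacts every node: O(N) messages."""
    seen, frontier = {start}, deque([(start, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if title in node.store:
            return node                    # found the publisher
        if hops_left == 0:
            continue
        for nbr in node.neighbors:
            if nbr not in seen:            # don't re-query a peer
                seen.add(nbr)
                frontier.append((nbr, hops_left - 1))
    return None
```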
Routed queries (Freenet, Chord, etc.)
[Diagram: the client’s Lookup(“title”) is routed along a path of intermediate nodes to publisher N4, holding (key=“title”, value=MP3 data…).]
Bad load balance: nothing guarantees keys are spread evenly across nodes.
Routing challenges
– Define a useful key nearness metric.
– Keep the hop count small: O(log N).
– Keep the routing tables small: O(log N).
– Stay robust despite rapid changes.
Distributed Hash Tables to the Rescue!
– Load balance: a distributed hash function spreads keys evenly over the nodes (consistent hashing).
– Decentralization: fully distributed (robustness).
– Scalability: lookup cost grows as the log of the number of nodes.
– Availability: internal tables adjust automatically as nodes join and fail.
– Flexible naming: no constraints on key structure.
What’s a Hash?
Wikipedia: any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer.
Example: Assume N is a large prime, and ‘a’ means the ASCII code for the letter ‘a’ (it’s 97). Then:
H(“pete”) = H(“pet”) × N + ‘e’
          = (H(“pe”) × N + ‘t’) × N + ‘e’
          = 451845518507
H(“pete”)  mod 1000 = 507
H(“peter”) mod 1000 = 131
H(“petf”)  mod 1000 = 986
It’s a deterministic random number generator!
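A minimal sketch of that rolling (polynomial) hash; the slide doesn’t say which prime it used, so the constant below is an arbitrary choice and the concrete values will differ from those above:

```python
def rolling_hash(s: str, N: int = 1000003) -> int:
    """H(s + c) = H(s) * N + ord(c), for a large prime N."""
    h = 0
    for ch in s:
        h = h * N + ord(ch)
    return h

# Similar inputs land far apart once reduced to a small range:
for word in ("pete", "peter", "petf"):
    print(word, rolling_hash(word) % 1000)
```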
Chord (a DHT)
– m-bit identifier space for both keys and nodes.
– Key identifier = SHA-1(key).
– Node identifier = SHA-1(IP address).
– Both are uniformly distributed.
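Computing the identifiers is a one-liner over SHA-1; a sketch (the sample address is made up):

```python
import hashlib

M = 160   # SHA-1 yields 160-bit identifiers

def chord_id(data: bytes) -> int:
    """Map a key or a node's IP address onto the m-bit identifier circle."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (1 << M)

key_id  = chord_id(b"title")           # key identifier
node_id = chord_id(b"10.0.0.7:8001")   # node identifier (hypothetical address)
```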
How to map key IDs to node IDs?
Consistent hashing [Karger 97]
[Diagram: a circular 7-bit ID space holding nodes N32, N90, N105 and keys K5, K20, K80.]
A key is stored at its successor: node with next higher ID
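A sketch of the successor rule on the 7-bit ring above, assuming a global view of all node IDs (which a real Chord node doesn’t have):

```python
import bisect

def successor(node_ids, key_id, m=7):
    """The node that stores key_id: the first node at or clockwise after it."""
    ids = sorted(i % (1 << m) for i in node_ids)
    pos = bisect.bisect_left(ids, key_id % (1 << m))
    return ids[pos % len(ids)]   # wrap around past the highest ID

ring = [32, 90, 105]
assert successor(ring, 80) == 90     # K80 -> N90
assert successor(ring, 20) == 32     # K20 -> N32
assert successor(ring, 110) == 32    # wraps around the ring to N32
```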
Basic lookup
[Diagram, shown over several animation steps: a ring with N10, N32, N60, N90, N105, N120. “Where is key 80?” is forwarded from node to node along successor pointers until the answer comes back: “N90 has K80”.]
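A sketch of that successor-chasing lookup, with local objects standing in for remote nodes (RPCs and failures omitted):

```python
M = 7
SPACE = 1 << M

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = None   # next node clockwise on the ring

def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b], mod 2^M."""
    d_ax, d_ab = (x - a) % SPACE, (b - a) % SPACE
    return d_ab == 0 or 0 < d_ax <= d_ab

def basic_lookup(start, key_id):
    """Forward the query along successor pointers: O(N) hops worst case."""
    n = start
    while not in_interval(key_id, n.id, n.successor.id):
        n = n.successor          # "Where is key 80?"
    return n.successor           # "N90 has K80"

# The ring from the diagram, starting the lookup at N32:
nodes = [Node(i) for i in (10, 32, 60, 90, 105, 120)]
for a, b in zip(nodes, nodes[1:] + nodes[:1]):
    a.successor = b
assert basic_lookup(nodes[1], 80).id == 90
```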
“ Finger table” allows log(N)-time lookups
[Diagram: N80’s fingers reach ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.]
Every node knows m other nodes in the ring.
Finger i points to the successor of n + 2^(i-1).
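Building a finger table from that rule, again assuming a global view for illustration:

```python
import bisect

def build_fingers(n_id, node_ids, m=7):
    """fingers[i-1] = successor(n + 2^(i-1)), for i = 1..m."""
    space = 1 << m
    ids = sorted(node_ids)
    fingers = []
    for i in range(1, m + 1):
        target = (n_id + (1 << (i - 1))) % space
        pos = bisect.bisect_left(ids, target)
        fingers.append(ids[pos % len(ids)])   # successor of the target ID
    return fingers

# N80 on the ring used in the lookup slides below;
# targets are 81, 82, 84, 88, 96, 112, and 144 mod 128 = 16:
print(build_fingers(80, [5, 10, 20, 32, 60, 80, 99, 110]))
# -> [99, 99, 99, 99, 99, 5, 20]
```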
[Diagram: N80’s finger structure again; e.g., the finger targeting ID 112 (= 80 + 2^5) points to N120, the successor of 112.]
Each node knows more about the portion of the circle close to it.
Lookups take O(log(N)) hops
[Diagram, shown over several animation steps: a ring with N5, N10, N20, N32, N60, N80, N99, N110. Lookup(K19) jumps via finger pointers, roughly halving the remaining distance each hop, until it reaches N20, the successor of K19.]
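A sketch of finger-based routing; Chord’s real pseudocode runs over RPC and handles node failures, while this local version keeps only the routing logic:

```python
M = 7
SPACE = 1 << M

class Node:
    def __init__(self, node_id, fingers=None):
        self.id = node_id
        self.fingers = fingers or []   # fingers[i-1] = successor(id + 2^(i-1))

def in_interval(x, a, b):
    """x in the ring interval (a, b], mod 2^M."""
    d_ax, d_ab = (x - a) % SPACE, (b - a) % SPACE
    return d_ab == 0 or 0 < d_ax <= d_ab

def find_successor(n, key_id):
    """Each hop at least halves the ring distance to key_id: O(log N) hops."""
    while not in_interval(key_id, n.id, n.fingers[0].id):  # fingers[0] = successor
        n = closest_preceding_finger(n, key_id)
    return n.fingers[0]

def closest_preceding_finger(n, key_id):
    """The highest finger that falls strictly between n and key_id."""
    for f in reversed(n.fingers):
        if f.id != key_id and in_interval(f.id, n.id, key_id):
            return f
    return n
```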
Joining: linked list insert
[Diagram: N36 joins between N25 and N40; N40 currently holds keys K30 and K38.]
1. N36 does Lookup(36) to find where it belongs.
Invariants to preserve:
1. Each node’s successor is correctly maintained.
2. For every key k, node successor(k) is responsible for k.
Join (2)
2. N36 sets its own successor pointer (to N40).
Initialize the new node’s finger table.
Join (3)
3. Set N25’s successor pointer (to N36).
Update finger pointers of existing nodes.
Join (4)
4. Copy keys 26..36 from N40 to N36 (K30 moves to N36; K38 stays at N40).
Transferring keys completes the join.
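A minimal join sketch under this linked-list view (no fingers, no concurrency; all helper names are mine):

```python
M = 7
SPACE = 1 << M

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self   # a lone node forms a one-element ring
        self.keys = {}          # key_id -> value

def in_interval(x, a, b):
    """x in the ring interval (a, b], mod 2^M."""
    d_ax, d_ab = (x - a) % SPACE, (b - a) % SPACE
    return d_ab == 0 or 0 < d_ax <= d_ab

def find_predecessor(start, key_id):
    n = start
    while not in_interval(key_id, n.id, n.successor.id):
        n = n.successor
    return n

def join(new, existing):
    pred = find_predecessor(existing, new.id)   # 1. Lookup(new.id)
    new.successor = pred.successor              # 2. new node's successor
    pred.successor = new                        # 3. predecessor's successor
    succ = new.successor
    for k in list(succ.keys):                   # 4. copy keys in (pred, new]
        if in_interval(k, pred.id, new.id):
            new.keys[k] = succ.keys.pop(k)

# N25 and N40 (holding K30 and K38); N36 joins and takes K30:
n25, n40 = Node(25), Node(40)
n25.successor, n40.successor = n40, n25
n40.keys = {30: "data", 38: "data"}
n36 = Node(36)
join(n36, n25)
assert 30 in n36.keys and 38 in n40.keys
```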
Stabilization Protocol
To handle concurrent node joins/fails/leaves:
– Keep successor pointers up to date, then verify and correct finger table entries.
– Incorrect finger pointers may only increase latency, but incorrect successor pointers may cause lookup failure.
– Nodes periodically run the stabilization protocol.
– It won’t correct a Chord system that has split into multiple disjoint cycles, or a single cycle that loops multiple times around the identifier space.
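The successor-repair half of the protocol, sketched after the Chord paper’s stabilize/notify pseudocode (local calls instead of RPCs, no failure handling):

```python
M = 7
SPACE = 1 << M

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.predecessor = None

def strictly_between(x, a, b):
    """x in the open ring interval (a, b), mod 2^M."""
    d_ax, d_ab = (x - a) % SPACE, (b - a) % SPACE
    return 0 < d_ax < d_ab or (d_ab == 0 and d_ax > 0)

def stabilize(n):
    """Run periodically: adopt any node that slid in between n and its
    successor, then remind the successor that n exists."""
    x = n.successor.predecessor
    if x is not None and strictly_between(x.id, n.id, n.successor.id):
        n.successor = x
    notify(n.successor, n)

def notify(n, candidate):
    """candidate believes it may be n's predecessor."""
    if n.predecessor is None or strictly_between(candidate.id, n.predecessor.id, n.id):
        n.predecessor = candidate
```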
Take Home Points
– Hashing is used to distribute data and nodes uniformly across a range.
– Random distribution balances load.
– The mark of an awesome systems paper:
• identify commonality across algorithms
• restrict work to implementing that one simple abstraction
• use it as a building block