CS 268: Lecture 22 DHT Applications

Ion Stoica
Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776

(Presentation based on slides from Robert Morris and Sean Rhea)


Outline

Cooperative File System (CFS)
Open DHT


Target CFS Uses

Serving data with inexpensive hosts:
- open-source distributions

- off-site backups

- tech report archive

- efficient sharing of music

[Figure: CFS nodes connected through the Internet]


How to mirror open-source distributions?

Multiple independent distributions
- Each has high peak load, low average

Individual servers are wasteful

Solution: aggregate
- Option 1: single powerful server
- Option 2: distributed service
  • But how do you find the data?


Design Challenges

Avoid hot spots
Spread storage burden evenly
Tolerate unreliable participants
Fetch speed comparable to whole-file TCP
Avoid O(#participants) algorithms

- Centralized mechanisms [Napster], broadcasts [Gnutella]

CFS solves these challenges


CFS Architecture

Each node is a client and a server
Clients can support different interfaces

- File system interface

- Music key-word search

[Figure: two nodes, each containing a client and a server, connected through the Internet]


Client-server interface

Files have unique names
Files are read-only (single writer, many readers)
Publishers split files into blocks
Clients check files for authenticity

[Figure: FS, client, and server components on nodes; the operations shown are "Insert file f", "Lookup file f", "Insert block", and "Lookup block"]
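The slide says publishers split files into blocks and clients check authenticity, but it does not say how. The sketch below assumes one common approach, naming each block by the SHA-1 hash of its content so a client can re-hash what it fetched. The function names and the dictionary standing in for the block store are hypothetical.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_id(block: bytes) -> str:
    """Assumed naming scheme: a block's ID is the SHA-1 of its content."""
    return hashlib.sha1(block).hexdigest()


def publish(data: bytes, block_size: int, store: dict) -> list:
    """Insert every block of a file and return the ordered list of block IDs."""
    ids = []
    for block in split_into_blocks(data, block_size):
        bid = block_id(block)
        store[bid] = block
        ids.append(bid)
    return ids


def fetch(ids: list, store: dict) -> bytes:
    """Look up each block and verify it against its ID before reassembling."""
    parts = []
    for bid in ids:
        block = store[bid]
        if block_id(block) != bid:
            raise ValueError("block failed authenticity check: " + bid)
        parts.append(block)
    return b"".join(parts)
```

With this naming scheme a server cannot substitute different content for a block without the client noticing, since the ID itself commits to the content.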


Server Structure

• DHash stores, balances, replicates, caches blocks

• DHash uses Chord [SIGCOMM 2001] to locate blocks

[Figure: Node 1 and Node 2, each with a DHash layer on top of a Chord layer]


Chord Hashes a Block ID to its Successor

• Nodes and blocks have randomly distributed IDs
• Successor: node with next highest ID

[Figure: circular ID space with nodes N10, N32, N60, N80, N100. Blocks B11 and B30 are held by N32; B33, B40, B52 by N60; B65, B70 by N80; B100 by N100; B112, B120, …, B10 by N10.]
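The successor rule can be sketched in a few lines. This is a minimal illustration that assumes a node whose ID equals the block ID counts as its successor (consistent with B100 landing on N100 in the figure); the function name is hypothetical.

```python
from bisect import bisect_left


def successor(block_id: int, node_ids: list) -> int:
    """Return the first node ID at or after block_id, wrapping around the ring."""
    ring = sorted(node_ids)
    index = bisect_left(ring, block_id)
    # Past the highest node ID, wrap to the lowest one.
    return ring[index % len(ring)]


nodes = [10, 32, 60, 80, 100]
owners = {b: successor(b, nodes) for b in (11, 33, 65, 100, 112)}
```

Here block 112 wraps around the ring and is assigned to node 10, which is why the figure lists B112, B120, …, B10 together.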


DHash/Chord Interface

lookup() returns list with node IDs closer in ID space to block ID

- Sorted, closest first

[Figure: inside a server, DHash calls Lookup(blockID) on Chord and receives a list of <node ID, IP address> pairs; Chord keeps a finger table of <node ID, IP address> entries]
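A rough sketch of what such a lookup() might return from a single node's finger table. It assumes "closer" means a smaller clockwise distance from the node to the block ID, and a 7-bit ID space chosen only to fit the small IDs in these slides; the function name and the IP addresses are hypothetical.

```python
def closer_nodes(my_id: int, block_id: int, finger_table: list, id_bits: int = 7) -> list:
    """Return finger entries closer to block_id than this node, closest first."""
    space = 1 << id_bits

    def gap(node_id: int) -> int:
        # Clockwise distance from node_id to block_id on the ring.
        return (block_id - node_id) % space

    closer = [(nid, ip) for nid, ip in finger_table if gap(nid) < gap(my_id)]
    return sorted(closer, key=lambda entry: gap(entry[0]))


fingers = [(10, "192.0.2.10"), (20, "192.0.2.20"), (40, "192.0.2.40"), (80, "192.0.2.80")]
result = closer_nodes(my_id=5, block_id=45, finger_table=fingers)
```

For node 5 looking up block 45, the sketch returns nodes 40, 20, and 10 in that order and leaves out node 80, which lies past the block on the ring.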


DHash Uses Other Nodes to Locate Blocks

[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Lookup(BlockID=45) proceeds in three numbered steps]


Storing Blocks

Long-term blocks are stored for a fixed time

- Publishers need to refresh periodically

Cache uses LRU

[Figure: disk divided into a cache and long-term block storage]
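The LRU policy for the cache portion of the disk could look like the sketch below. It is an in-memory illustration only; the class name and the choice to count capacity in blocks rather than bytes are assumptions.

```python
from collections import OrderedDict


class BlockCache:
    """Least-recently-used cache of blocks, keyed by block ID."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._blocks = OrderedDict()

    def get(self, block_id):
        if block_id not in self._blocks:
            return None
        # A hit makes this block the most recently used.
        self._blocks.move_to_end(block_id)
        return self._blocks[block_id]

    def put(self, block_id, data):
        self._blocks[block_id] = data
        self._blocks.move_to_end(block_id)
        if len(self._blocks) > self.capacity:
            # Evict the least recently used block.
            self._blocks.popitem(last=False)
```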


Replicate blocks at r successors

[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110, and Block 17]

• Node IDs are SHA-1 of IP Address
• Ensures independent replica failure
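Replica placement can be sketched as follows, under the assumption that the r replicas sit on the block's successor and the nodes immediately after it on the ring. The function name is hypothetical and the value of r is illustrative.

```python
from bisect import bisect_left


def replica_nodes(block_id: int, node_ids: list, r: int) -> list:
    """Return the r consecutive ring nodes starting at the block's successor."""
    ring = sorted(node_ids)
    start = bisect_left(ring, block_id)
    count = min(r, len(ring))  # never list the same node twice
    return [ring[(start + k) % len(ring)] for k in range(count)]


ring = [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]
replicas = replica_nodes(17, ring, r=3)
```

Because node IDs are hashes of IP addresses, nodes that are adjacent on the ring are unlikely to be adjacent in the network, which is what makes their failures roughly independent.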


Lookups find replicas

[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110, and Block 17; Lookup(BlockID=17) is shown as four numbered RPCs]

RPCs:
1. Lookup step
2. Get successor list
3. Failed block fetch
4. Block fetch
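The failed fetch followed by a successful one can be modeled with a simple loop over the successor list. This sketch makes several assumptions: a failed node is represented by `None`, each live node's storage is a dictionary, and the choice of which node is down is invented for the example.

```python
def fetch_block(block_id, successor_list, stores):
    """Try each replica in order; return (node, data) from the first that answers."""
    for node in successor_list:
        store = stores.get(node)
        if store is None:
            # Models a failed block fetch: the node is down or unreachable.
            continue
        if block_id in store:
            return node, store[block_id]
    raise KeyError(block_id)


# Hypothetical state: node 20 has failed, nodes 40 and 50 hold replicas.
stores = {20: None, 40: {17: b"block-17"}, 50: {17: b"block-17"}}
served_by, data = fetch_block(17, [20, 40, 50], stores)
```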


First Live Successor Manages Replicas

[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110, showing Block 17 and a copy of 17]

• Node can locally determine that it is the first live successor


DHash Copies to Caches Along Lookup Path

[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Lookup(BlockID=45) is shown as four numbered RPCs]

RPCs:
1. Chord lookup
2. Chord lookup
3. Block fetch
4. Send to cache


Caching at Fingers Limits Load

[Figure: node N32 and the fingers pointing to it]

• Only O(log N) nodes have fingers pointing to N32
• This limits the single-block load on N32


Virtual Nodes Allow Heterogeneity

Hosts may differ in disk/net capacity
Hosts may advertise multiple IDs
- Chosen as SHA-1(IP Address, index)
- Each ID represents a “virtual node”
Host load proportional to # of virtual nodes
Manually controlled

[Figure: Node A hosts virtual nodes N10, N60, N101; Node B hosts N5]
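Deriving several virtual node IDs from one host can be sketched as below. The slide gives only SHA-1(IP Address, index); how the two values are combined before hashing is not stated, so the comma-joined string here is an assumption, as are the function name and the example address.

```python
import hashlib


def virtual_node_ids(ip_address: str, count: int) -> list:
    """Return `count` virtual node IDs for one host, one per index."""
    ids = []
    for index in range(count):
        # Assumed encoding of the (IP address, index) pair.
        material = f"{ip_address},{index}".encode("ascii")
        ids.append(hashlib.sha1(material).hexdigest())
    return ids


# A host with more capacity could be configured with more virtual nodes.
big_host_ids = virtual_node_ids("192.0.2.1", 3)
small_host_ids = virtual_node_ids("192.0.2.2", 1)
```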


Why Blocks Instead of Files?

Cost: one lookup per block
- Can tailor cost by choosing good block size

Benefit: load balance is simple
- For large files

- Storage cost of large files is spread out

- Popular files are served in parallel
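The lookup cost of a block size is simple arithmetic, sketched here. The file and block sizes are illustrative values, not figures from the lecture.

```python
def lookups_per_file(file_size: int, block_size: int) -> int:
    """One lookup per block, so the cost is the block count (rounded up)."""
    return (file_size + block_size - 1) // block_size


one_mib = 1024 * 1024
small_blocks = lookups_per_file(one_mib, 8 * 1024)    # many lookups, finer load balance
large_blocks = lookups_per_file(one_mib, 64 * 1024)   # fewer lookups, coarser load balance
```

Under these assumed sizes, growing the block from 8 KiB to 64 KiB cuts the lookups for a 1 MiB file from 128 to 16, at the price of spreading the file over fewer nodes.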


Outline

Cooperative File System (CFS)
Open DHT


Questions:

How many DHTs will there be?

Can all applications share one DHT?


Benefits of Sharing a DHT

Amortizes costs across applications
- Maintenance bandwidth, connection state, etc.

Facilitates “bootstrapping” of new applications
- Working infrastructure already in place

Allows for statistical multiplexing of resources
- Takes advantage of spare storage and bandwidth

Facilitates upgrading existing applications
- “Share” DHT between application versions


The DHT as a Service

[Figure, built up over five slides: a set of nodes each holding key/value (K V) pairs, then labeled OpenDHT, then shown together with Clients]

What is this interface?


It’s not lookup()

[Figure: lookup(k) leads to the node responsible for key k]

What does this node do with it?

Challenges:
1. Distribution
2. Security


How are DHTs Used?

1. Storage
- CFS, UsenetDHT, PKI, etc.

2. Rendezvous
- Simple: Chat, Instant Messenger

- Load balanced: i3

- Multicast: RSS Aggregation, White Board

- Anycast: Tapestry, Coral


What about put/get?

Works easily for storage applications

Easy to share
- No upcalls, so no code distribution or security complications

But does it work for rendezvous?
- Chat? Sure: put(my-name, my-IP)

- What about the others?
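The chat case can be sketched against a toy in-memory stand-in for the DHT. `ToyDHT` is hypothetical and is not the OpenDHT client API; it only illustrates that rendezvous needs nothing beyond put and get. The user name and address are made up.

```python
class ToyDHT:
    """In-memory stand-in for a shared put/get service (not a real client)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # Several values may be stored under one key.
        self._table.setdefault(key, []).append(value)

    def get(self, key):
        return list(self._table.get(key, []))


dht = ToyDHT()

# A chat user announces where they can be reached: put(my-name, my-IP).
dht.put("alice", "192.0.2.10")

# A peer who wants to talk to alice looks up her current address.
addresses = dht.get("alice")
```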


Protecting Against Overuse

Must protect system resources against overuse
- Resources include network, CPU, and disk

- Network and CPU straightforward

- Disk harder: usage persists long after requests

Hard to distinguish malice from eager usage
- Don’t want to hurt eager users if utilization low

Number of active users changes over time
- Quotas are inappropriate


Fair Storage Allocation

Our solution: give each client a fair share
- Will define “fairness” in a few slides

Limits strength of malicious clients
- Only as powerful as they are numerous

Protect storage on each DHT node separately
- Must protect each subrange of the key space

- Rewards clients that balance their key choices


The Problem of Starvation

Fair shares change over time
- Decrease as system load increases

[Figure: timeline. Client 1 arrives and fills 50% of disk; Client 2 arrives and fills 40% of disk; Client 3 arrives and its max share is 10%. Starvation!]


Preventing Starvation

Simple fix: add time-to-live (TTL) to puts
- put(key, value) → put(key, value, ttl)

Prevents long-term starvation
- Eventually all puts will expire

Can still get short term starvation

[Figure: timeline. Client A arrives and fills the entire disk; Client B arrives and asks for space; B starves until Client A’s values start expiring]
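Short-term starvation with TTLs can be reproduced with a toy store. This is a sketch under simplifying assumptions: a single node, capacity counted in bytes of value data, and the clock passed in explicitly as `now`. The class and its methods are hypothetical.

```python
class TTLStore:
    """Toy single-node store where every put carries a time-to-live."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self._items = {}  # key -> (value, expiry_time)

    def _drop_expired(self, now):
        self._items = {k: (v, exp) for k, (v, exp) in self._items.items() if exp > now}

    def put(self, key, value: bytes, ttl, now) -> bool:
        """Store the value if it fits; return False when the disk is full."""
        self._drop_expired(now)
        used = sum(len(v) for k, (v, _) in self._items.items() if k != key)
        if used + len(value) > self.capacity_bytes:
            return False
        self._items[key] = (value, now + ttl)
        return True

    def get(self, key, now):
        self._drop_expired(now)
        entry = self._items.get(key)
        return entry[0] if entry else None


store = TTLStore(capacity_bytes=10)
a_ok = store.put("a", b"x" * 10, ttl=100, now=0)      # client A fills the disk
b_early = store.put("b", b"y", ttl=10, now=1)         # client B is refused
b_late = store.put("b", b"y", ttl=10, now=100)        # accepted once A's put expires
```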


Preventing Starvation

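Stronger condition:
Be able to accept r_min bytes/sec new data at all times

This is non-trivial to arrange!

[Figure: plot of space (0 to max) against time (now to max). A region with slope r_min is reserved for future puts; a candidate put is drawn as a rectangle of width TTL and height size; the sum must be < max capacity.]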

[Figure: two further space-versus-time plots, each showing a candidate put of a given TTL and size; one of them is marked "Violation!"]


Preventing Starvation

Formalize graphical intuition:

$f(\tau) = B(t_{now}) - D(t_{now}, t_{now} + \tau) + r_{min} \cdot \tau$

• $D(t_{now}, t_{now} + \tau)$: aggregate size of puts expiring in the interval $(t_{now}, t_{now} + \tau)$

To accept put of size $x$ and TTL $l$:

$f(\tau) + x < C$ for all $0 \le \tau < l$

Can track the value of f efficiently with a tree
- Leaves represent inflection points of f
- Add put, shift time are O(log n), n = # of puts
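The acceptance test can be sketched with a plain scan instead of the tree. This is a deliberately naive version, not the O(log n) structure the slide describes. It assumes B(t_now) is the number of bytes currently stored and C is the disk capacity, and it checks f only where it peaks: at each expiry time inside the window and at the end of the window (which is slightly conservative). All names are hypothetical.

```python
def can_accept(puts, now, x, ttl, capacity, r_min):
    """Return True if a put of size x and the given TTL keeps f(tau) + x < capacity.

    puts is a list of (size, expiry_time) pairs for data already accepted.
    """
    live = [(size, exp) for size, exp in puts if exp > now]
    stored = sum(size for size, _ in live)          # assumed meaning of B(t_now)
    horizon = now + ttl

    # f rises with slope r_min and drops right after each expiry, so its peaks
    # are at the expiry times within the window and at the window's end.
    check_times = sorted({exp for _, exp in live if exp < horizon} | {horizon})
    for t in check_times:
        expired = sum(size for size, exp in live if exp < t)   # D(t_now, t)
        f = stored - expired + r_min * (t - now)
        if f + x >= capacity:
            return False
    return True


existing = [(60, 10)]  # 60 bytes stored, expiring at time 10
short_put = can_accept(existing, now=0, x=30, ttl=5, capacity=100, r_min=1)
long_put = can_accept(existing, now=0, x=30, ttl=20, capacity=100, r_min=1)
```

In this made-up example the short put fits, while the long one is refused: by time 10 the reserved rate alone would bring usage to 70 bytes, and adding 30 more would reach capacity.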


Fair Storage Allocation

[Figure: put-processing pipeline]
1. Per-client put queues. Queue full: reject put. Not full: enqueue put.
2. Select most under-represented.
3. Wait until can accept without violating r_min.
4. Store and send accept message to client.

The Big Decision: Definition of “most under-represented”


Defining “Most Under-Represented”

Not just sharing disk, but disk over time
- 1 byte put for 100s same as 100 byte put for 1s
- So units are bytes × seconds, call them commitments

Equalize total commitments granted?
- No: leads to starvation
- A fills disk, B starts putting, A starves up to max TTL

[Figure: timeline. Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; now A starves!]


Defining “Most Under-Represented”

Instead, equalize rate of commitments granted
- Service granted to one client depends only on others putting “at same time”

[Figure: timeline. Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; A & B share available rate]


Mechanism inspired by Start-time Fair Queuing
- Have virtual time, $v(t)$
- Each put gets a start time $S(p_c^i)$ and finish time $F(p_c^i)$

$F(p_c^i) = S(p_c^i) + size(p_c^i) \times ttl(p_c^i)$

$S(p_c^i) = \max(v(A(p_c^i)) - \alpha, F(p_c^{i-1}))$

$v(t)$ = maximum start time of all accepted puts
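The tagging rules can be sketched as a small class. Several things here are assumptions rather than statements from the slides: `tag` is called at the put's arrival time so the current virtual time stands in for v(A(p)), a client's first put has no previous finish time and uses 0, and the constant subtracted from the virtual time is exposed as a parameter. The class and method names are hypothetical.

```python
class FairScheduler:
    """Assigns start and finish tags to puts, per client."""

    def __init__(self, alpha: float = 0.0):
        self.alpha = alpha
        self.virtual_time = 0.0      # v(t): max start time of accepted puts
        self.last_finish = {}        # client -> finish tag of its previous put

    def tag(self, client, size, ttl):
        """Compute (start, finish) for a put arriving now."""
        previous_finish = self.last_finish.get(client, 0.0)
        start = max(self.virtual_time - self.alpha, previous_finish)
        finish = start + size * ttl  # a commitment in bytes × seconds
        self.last_finish[client] = finish
        return start, finish

    def accept(self, start):
        """Record that a put with this start tag was accepted."""
        self.virtual_time = max(self.virtual_time, start)


sched = FairScheduler()
a_first = sched.tag("A", size=10, ttl=5)
sched.accept(a_first[0])
a_second = sched.tag("A", size=10, ttl=5)
b_first = sched.tag("B", size=10, ttl=5)
```

In this run client A's second put gets a later start tag than client B's first, so a scheduler that prefers the smallest start tag would serve B next. Treating the smallest start tag as "most under-represented" is one plausible reading, not something the slides state.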


FST Performance

