Resource Allocation in OpenHash: a Public DHT Service
Sean Rhea
with Brad Karp, Sylvia Ratnasamy,
Scott Shenker, Ion Stoica, and Harlan Yu
Introduction
• In building OceanStore, worked on Tapestry
– Found Tapestry problem harder than expected
– Main problem: handling churn
• Built Bamboo to be churn-resilient from start
– Working by 6/2003
– Rejected from NSDI in 9/2003
– Released code in 12/2003, about 10 groups using
– Accepted to USENIX, 6/2004
Introduction (cont.)
• Intended Bamboo to be general, reusable
– Supports Common API for DHTs
– Tens of DHT applications proposed in literature
– Still very few in common use--why?
• One possible barrier: deployment
– Need access to machines; not everyone is on PlanetLab
– Must monitor, restart individual processes
– Takes about an hour/day minimum right now
Simple DHT Applications
• Many uses of DHTs very simple: just put/get
– Don’t use Common API [Dabek et al.]
– No routing, no upcalls, etc.
• Examples:
– Dynamic DNS
– FreeDB
• In general: use DHT as highly available cache or rendezvous service
• Should be able to share a single DHT deployment
Sophisticated DHT Applications
• Other functionality of DHTs is lookup
– Map identifiers to application nodes efficiently
– Used by most sophisticated applications
• i3, OceanStore, SplitStream
• Can implement lookup on put/get
– Algorithm called ReDiR, IPTPS paper this year
• Sophisticated applications could also share a single DHT deployment
OpenHash: a Public DHT Service
• Idea: public DHT to amortize deployment effort
– Very low barrier to entry for simple applications
– Amortize bandwidth cost for sophisticated apps
• Challenges
– Economics
– Security
– Resource allocation
Overview
• Introduction
• OpenHash interface, assumptions
• Resource allocation
– Goals/problem formalization
– Rate-limiting puts
– Fair sharing
• Discussion
OpenHash Interface/Assumptions
• Want to keep things simple for clients
– Remember goal: low barrier to entry
• Simple put/get
– put(key, value, time-to-live)
– get(key)
• Service contract:
– Puts accepted/rejected immediately (not queued)
– Once accepted, put values available for whole TTL
• Predictable, zero-effort availability for clients
– After that, values thrown out by the DHT
• Easy garbage collection, also valuable for some apps
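The interface above can be sketched as follows. This is a minimal in-memory mock of the put/get service contract, not the actual OpenHash implementation; the class name `OpenHashClient` and its internal dict are illustrative assumptions.

```python
import time

# Minimal sketch of the put/get interface with TTLs (assumed names,
# not the real OpenHash client). Values expire after their TTL and
# are garbage-collected lazily on the next get.
class OpenHashClient:
    def __init__(self):
        self._store = {}  # key -> list of (value, expiry_time)

    def put(self, key, value, ttl):
        # Puts are accepted or rejected immediately (not queued);
        # this sketch always accepts and records an expiry time.
        self._store.setdefault(key, []).append((value, time.time() + ttl))
        return True

    def get(self, key):
        # Return only values whose TTL has not yet expired.
        now = time.time()
        live = [(v, t) for (v, t) in self._store.get(key, []) if t > now]
        self._store[key] = live
        return [v for (v, _) in live]
```

Note how the TTL gives the "easy garbage collection" property for free: nothing is ever explicitly deleted by clients.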
Resource Allocation Introduction
• Problem: disk space is limited
– If service is popular, space may be exhausted
– Malicious clients might exhaust it on purpose
• Rough goal: every client gets fair share of store
– Ideally, algorithm should be work-conserving
• Example:
– Three clients: A, B, and C; 10 GB of total space
– A and B want 1 GB each; C wants 20 GB
– A and B should get 1 GB each; C should get 8 GB
Problem Simplification
• For now, shares calculated per-DHT node
– Global fair sharing saved for future work
• Clients that balance puts won’t notice a problem
– Most DHT applications already balance puts
– Apps that can choose their keys can do even better
• Side benefit: encourages balancing puts
– Mitigates need for load balancing in DHT
• Let the users handle load balancing
– Easier for us to implement!
Problem Formalization (First Try)
• C - total available storage
• s_i - storage desired by client i; S = Σ_i s_i
• s_fair - fair share such that C = Σ_i min(s_i, s_fair)
• g_i - storage granted to client i; G = Σ_i g_i
• Goals
– Fairness: ∀i, g_i = min(s_i, s_fair)
– Utilization: G = min(C, S)
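The grants g_i = min(s_i, s_fair) can be computed by "water-filling": satisfy the smallest demands fully, then split what remains evenly. The helper below is a hypothetical sketch of this calculation, not code from the paper.

```python
# Sketch of computing per-client grants g_i = min(s_i, s_fair) by
# water-filling. Assumes all demands are known up front; the function
# name and signature are illustrative.
def fair_shares(demands, capacity):
    """Return per-client grants summing to min(C, S)."""
    if sum(demands) <= capacity:
        return list(demands)  # everyone fits; fully satisfied
    grants = [0.0] * len(demands)
    # Visit clients from smallest demand to largest; each gets at most
    # an even split of the storage still unallocated.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    remaining, left = capacity, len(demands)
    for i in order:
        grants[i] = min(demands[i], remaining / left)
        remaining -= grants[i]
        left -= 1
    return grants
```

On the earlier example (A and B want 1 GB, C wants 20 GB, 10 GB total), this yields 1, 1, and 8 GB, and both the fairness and utilization goals hold.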
Problem Formalization (Second Try)
• Previous version didn’t account for time
– Can only remove stored values as TTLs expire
– As such, can only adapt so quickly
– Before accepting one put, another must expire
• Add goal: always accept puts at rate R
– Prefer puts from underrepresented clients
– Intuition: R bounds the time it takes to correct unfairness
• New questions:
– How to guarantee space frees up at rate R?
– How to divide R among clients?
Overview
• Introduction
• OpenHash interface, assumptions
• Resource allocation
– Goals/problem formalization
– Rate-limiting puts
– Fair sharing
• Discussion
Accepting At Rate R
• S(t) - total data stored at time t
• A(t1, t2) - data added to system in [t1, t2)
• D(t1, t2) - data freed in [t1, t2)
• For adaptivity, need: A(t, t+∆t) ≥ R∆t
• Capacity limit: S(t) + A(t, t+∆t) - D(t, t+∆t) ≤ C
– Rearrange: C + D(t, t+∆t) - S(t) ≥ A(t, t+∆t)
• Combined with top eqn: C + D(t, t+∆t) - S(t) ≥ R∆t
– Rearrange: D(t, t+∆t) ≥ R∆t - C + S(t)
• Result: can accept any put that won’t make us violate this inequality at any point in the future
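One way to sketch this admission test: simulate accepting the put, then verify that scheduled departures D(t, t+∆t) satisfy the inequality at each future expiry point. This code is a hypothetical illustration (the function name and the choice to check only up to the last outstanding expiry are assumptions, not the paper's algorithm).

```python
# Sketch of the admission test: accept a put only if, after adding it,
# D(t, t+dt) >= R*dt - C + S(t) still holds at every scheduled expiry.
# 'expiries' maps expiry time -> bytes freed then; all names assumed.
def can_accept(size, ttl, stored, expiries, C, R, now=0.0):
    if stored + size > C:
        return False  # hard capacity limit
    new_stored = stored + size
    points = sorted(set(list(expiries) + [now + ttl]))
    freed = 0.0  # cumulative D over [now, t), i.e. frees strictly before t
    for t in points:
        # The inequality binds just before each expiry, where D is
        # still flat but R*dt has kept growing.
        if freed < R * (t - now) - C + new_stored:
            return False
        freed += expiries.get(t, 0.0) + (size if t == now + ttl else 0.0)
    return True
```

For example, with C = 100 bytes and R = 1 byte/s, an empty disk can take a 50-byte put with a 50 s TTL, but not with a 100 s TTL: the longer put would leave too little room to keep accepting at rate R before it expires.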
Implementing Rate Limiting
• Before accepting a put, must check D(t, t+∆t)
– Can we check this efficiently?
• Easy, assuming all puts have same TTL
– Can implement using a virtual “pipe”
– Pipe is TTL long, total capacity C
– New puts go into pipe, expire on exit
– Can easily show pipe is optimal for this case
• With varying TTLs, problem harder
– Puts with short TTLs expire in middle of pipe
– Bin-packing problem on new puts: find latest spot in pipe that satisfies desired size and TTL
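The uniform-TTL pipe can be sketched as a FIFO of pending expiries: because every put shares one TTL, the oldest put is always the next to leave, so a deque suffices. This is an illustrative sketch under that uniform-TTL assumption, not the varying-TTL bin-packing case.

```python
from collections import deque

# Sketch of the virtual "pipe" for the uniform-TTL case: a FIFO of
# (expiry_time, size) pairs whose total bytes never exceed C.
# Class and method names are illustrative assumptions.
class Pipe:
    def __init__(self, capacity, ttl):
        self.capacity, self.ttl = capacity, ttl
        self.queue = deque()  # (expiry_time, size), oldest first
        self.used = 0

    def _expire(self, now):
        # Drain puts that have reached the end of the pipe.
        while self.queue and self.queue[0][0] <= now:
            self.used -= self.queue.popleft()[1]

    def put(self, size, now):
        self._expire(now)
        if self.used + size > self.capacity:
            return False  # reject immediately, per the service contract
        self.queue.append((now + self.ttl, size))
        self.used += size
        return True
```

With varying TTLs this FIFO breaks down (short-TTL puts would expire mid-pipe), which is what forces the bin-packing formulation above.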
Overview
• Introduction
• OpenHash interface, assumptions
• Resource allocation
– Goals/problem formalization
– Rate-limiting puts
– Fair sharing
• Discussion
Choosing Puts for Fair Sharing
• Assume can accept new puts at rate R
– How do we divide it up between clients?
• Unlike fair queuing, two competing goals:
1. Want to make decisions (accept/reject) quickly
– In FQ, may queue for a long time before forwarding
2. Suffer consequences of decisions for full TTL
– In FQ, only interested in fairness over short window
• But one big advantage: long history
– Remember all puts whose TTLs haven’t expired
The Rate-Based Approach
• Accept based on recent put rates
– Already storing all puts, so also store rates
– (Could estimate these as in Approximate Fair Dropping)
– Basically, fair-share the input rate R
• Pros:
– Easy to implement
– If all clients put at uniform rates, gives fair stores
• Cons:
– To get fair share, must put at uniform rate
– What about bursty clients (avg. rate << max. rate)?
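A rate-based selector might look like the sketch below: estimate each client's put rate with an exponentially decayed average (in the spirit of Approximate Fair Dropping) and, among competing puts, prefer the client with the lowest estimate. All names and the half-life parameter are illustrative assumptions.

```python
# Sketch of the rate-based approach: decayed per-client rate estimates,
# with puts preferred from the lowest-rate client. Illustrative only.
class RateTracker:
    def __init__(self, half_life=60.0):
        self.half_life = half_life  # seconds for an estimate to halve
        self.rates = {}             # client -> estimated bytes/sec
        self.last_seen = {}         # client -> time of last update

    def record(self, client, size, now):
        # Decay the old estimate by elapsed time, then add this put.
        dt = now - self.last_seen.get(client, now)
        decay = 0.5 ** (dt / self.half_life)
        old = self.rates.get(client, 0.0)
        self.rates[client] = old * decay + size / self.half_life
        self.last_seen[client] = now

    def preferred(self, clients):
        # Among competing clients, prefer the lowest estimated rate.
        return min(clients, key=lambda c: self.rates.get(c, 0.0))
```

Note the con above in action: a bursty client's estimate spikes during its burst, so it loses out even if its long-term average rate is low.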
The Storage-Based Approach
• Accept puts based on amount of storage used
– Keep counters of storage used by each client
– Prefer new puts from clients with less data on disk
• Pros:
– Also easy to implement
– Gives fair stores regardless of uniformity of client put rates
• Cons:
– Over-represented clients block on under-represented ones
– Could be very disruptive as new clients enter system
The Commitment-Based Approach
• Base fairness around “commitments”
– How many bytes stored for how much more time
– New bytes entail more future commitment than old
• Pros:
– Better at bursts than rate-based approach
– Better than storage-based approach at not blocking over-represented clients
• Cons:
– Hard to think about in detail; hard to implement?
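One plausible reading of "bytes stored for how much more time" is byte-seconds: sum over a client's live puts of size times remaining TTL. The sketch below is that interpretation only, an assumption rather than the paper's definition.

```python
# Sketch of the commitment-based approach, interpreting a client's
# commitment as byte-seconds: size * remaining TTL, summed over its
# live puts. Function names and units are illustrative assumptions.
def commitment(puts, now):
    """puts: list of (size, expiry_time). Return total byte-seconds."""
    return sum(size * max(0.0, expiry - now) for size, expiry in puts)

def preferred(client_puts, now):
    """client_puts: dict client -> list of (size, expiry_time).
    Prefer the client with the smallest outstanding commitment."""
    return min(client_puts, key=lambda c: commitment(client_puts[c], now))
```

This captures both pros above: a new put of size s and TTL t costs s*t up front, so old, nearly-expired bytes barely count against a client, and an over-represented client's commitment decays continuously instead of blocking until whole values expire.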
Related Work
• Various fair queuing techniques
– Standard FQ
– Approximate Fair Dropping
– CSFQ
• Other DHT work
– Palimpsest
• Other networking work
– Internet backplane
Discussion
• What is the optimal rate-limiting algorithm?
– How close do our various schemes come to it?
• What’s the right model for sharing?
– Rate-based approach?
– Storage-based approach?
– Commitment-based approach?
– Some hybrid?
– Lottery scheduling?
• What other models make sense?
– Palimpsest?