Resource Allocation in OpenHash: a Public DHT Service
Sean Rhea
with Brad Karp, Sylvia Ratnasamy,
Scott Shenker, Ion Stoica, and Harlan Yu
Introduction
• In building OceanStore, worked on Tapestry
– Found Tapestry problem harder than expected
– Main problem: handling churn
• Built Bamboo to be churn-resilient from start
– Working by 6/2003
– Rejected from NSDI in 9/2003
– Released code in 12/2003, about 10 groups using
– Accepted to USENIX, 6/2004
Introduction (cont.)
• Intended Bamboo to be general, reusable
– Supports Common API for DHTs
– Tens of DHT applications proposed in literature
– Still very few in common use--why?
• One possible barrier: deployment
– Need access to machines; not everyone is on PlanetLab
– Must monitor, restart individual processes
– Takes about an hour/day minimum right now
Simple DHT Applications
• Many uses of DHTs very simple: just put/get
– Don’t use Common API [Dabek et al.]
– No routing, no upcalls, etc.
• Examples:
– Dynamic DNS
– FreeDB
• In general: use DHT as highly available cache or rendezvous service
• Should be able to share a single DHT deployment
Sophisticated DHT Applications
• Other functionality of DHTs is lookup
– Map identifiers to application nodes efficiently
– Used by most sophisticated applications
• i3, OceanStore, SplitStream
• Can implement lookup on put/get
– Algorithm called ReDiR, IPTPS paper this year
• Sophisticated applications could also share a single DHT deployment
OpenHash: a Public DHT Service
• Idea: public DHT to amortize deployment effort
– Very low barrier to entry for simple applications
– Amortize bandwidth cost for sophisticated apps
• Challenges
– Economics
– Security
– Resource allocation
Overview
• Introduction
• OpenHash interface, assumptions
• Resource allocation
– Goals/problem formalization
– Rate-limiting puts
– Fair sharing
• Discussion
OpenHash Interface/Assumptions
• Want to keep things simple for clients
– Remember goal: low barrier to entry
• Simple put/get
– put(key, value, time-to-live)
– get(key)
• Service contract:
– Puts accepted/rejected immediately (not queued)
– Once accepted, put values available for whole TTL
• Predictable, zero-effort availability for clients
– After that, values thrown out by the DHT
• Easy garbage collection, also valuable for some apps
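The interface above can be sketched as follows. This is a minimal in-memory mock of the put/get service contract, not the actual OpenHash implementation; the class name `OpenHashClient` and its internal dict are illustrative assumptions.

```python
import time

# Minimal sketch of the put/get interface with TTLs (assumed names,
# not the real OpenHash client). Values expire after their TTL and
# are garbage-collected lazily on the next get.
class OpenHashClient:
    def __init__(self):
        self._store = {}  # key -> list of (value, expiry_time)

    def put(self, key, value, ttl):
        # Puts are accepted or rejected immediately (not queued);
        # this sketch always accepts and records an expiry time.
        self._store.setdefault(key, []).append((value, time.time() + ttl))
        return True

    def get(self, key):
        # Return only values whose TTL has not yet expired.
        now = time.time()
        live = [(v, t) for (v, t) in self._store.get(key, []) if t > now]
        self._store[key] = live
        return [v for (v, _) in live]
```

Note how the TTL gives the "easy garbage collection" property for free: nothing is ever explicitly deleted by clients.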
Resource Allocation Introduction
• Problem: disk space is limited
– If service is popular, space may be exhausted
– Malicious clients might exhaust it on purpose
• Rough goal: every client gets fair share of store
– Ideally, algorithm should be work-conserving
• Example:
– Three clients: A, B, and C; 10 GB of total space
– A and B want 1 GB each; C wants 20 GB
– A and B should get 1 GB each; C should get 8 GB
Problem Simplification
• For now, shares calculated per-DHT node
– Global fair sharing saved for future work
• Clients that balance puts won’t notice a problem
– Most DHT applications already balance puts
– Apps that can choose their keys can do even better
• Side benefit: encourages balancing puts
– Mitigates need for load balancing in DHT
• Let the users handle load balancing
– Easier for us to implement!
Problem Formalization (First Try)
• C - total available storage
• s_i - storage desired by client i; S = Σ_i s_i
• s_fair - fair share such that C = Σ_i min(s_i, s_fair)
• g_i - storage granted to client i; G = Σ_i g_i
• Goals
– Fairness: ∀i, g_i = min(s_i, s_fair)
– Utilization: G = min(C, S)
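The grants g_i = min(s_i, s_fair) can be computed by "water-filling": satisfy the smallest demands fully, then split what remains evenly. The helper below is a hypothetical sketch of this calculation, not code from the paper.

```python
# Sketch of computing per-client grants g_i = min(s_i, s_fair) by
# water-filling. Assumes all demands are known up front; the function
# name and signature are illustrative.
def fair_shares(demands, capacity):
    """Return per-client grants summing to min(C, S)."""
    if sum(demands) <= capacity:
        return list(demands)  # everyone fits; fully satisfied
    grants = [0.0] * len(demands)
    # Visit clients from smallest demand to largest; each gets at most
    # an even split of the storage still unallocated.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    remaining, left = capacity, len(demands)
    for i in order:
        grants[i] = min(demands[i], remaining / left)
        remaining -= grants[i]
        left -= 1
    return grants
```

On the earlier example (A and B want 1 GB, C wants 20 GB, 10 GB total), this yields 1, 1, and 8 GB, and both the fairness and utilization goals hold.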
Problem Formalization (Second Try)
• Previous version didn’t account for time
– Can only remove stored values as TTLs expire
– As such, can only adapt so quickly
– Before accepting one put, another must expire
• Add goal: always accept puts at rate R
– Prefer puts from underrepresented clients
– Intuition: R bounds the time it takes to correct unfairness
• New questions:
– How to guarantee space frees up at rate R?
– How to divide R among clients?
Overview
• Introduction
• OpenHash interface, assumptions
• Resource allocation
– Goals/problem formalization
– Rate-limiting puts
– Fair sharing
• Discussion
Accepting At Rate R
• S(t) - total data stored at time t
• A(t1, t2) - data added to system in [t1, t2)
• D(t1, t2) - data freed in [t1, t2)
• For adaptivity, need: A(t, t+∆t) ≥ R∆t
• Capacity limit: S(t) + A(t, t+∆t) - D(t, t+∆t) ≤ C
– Rearrange: C + D(t, t+∆t) - S(t) ≥ A(t, t+∆t)
• Combined with top eqn: C + D(t, t+∆t) - S(t) ≥ R∆t
– Rearrange: D(t, t+∆t) ≥ R∆t - C + S(t)
• Result: can accept any put that won’t make us violate this inequality at any point in the future
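One way to sketch this admission test: simulate accepting the put, then verify that scheduled departures D(t, t+∆t) satisfy the inequality at each future expiry point. This code is a hypothetical illustration (the function name and the choice to check only up to the last outstanding expiry are assumptions, not the paper's algorithm).

```python
# Sketch of the admission test: accept a put only if, after adding it,
# D(t, t+dt) >= R*dt - C + S(t) still holds at every scheduled expiry.
# 'expiries' maps expiry time -> bytes freed then; all names assumed.
def can_accept(size, ttl, stored, expiries, C, R, now=0.0):
    if stored + size > C:
        return False  # hard capacity limit
    new_stored = stored + size
    points = sorted(set(list(expiries) + [now + ttl]))
    freed = 0.0  # cumulative D over [now, t), i.e. frees strictly before t
    for t in points:
        # The inequality binds just before each expiry, where D is
        # still flat but R*dt has kept growing.
        if freed < R * (t - now) - C + new_stored:
            return False
        freed += expiries.get(t, 0.0) + (size if t == now + ttl else 0.0)
    return True
```

For example, with C = 100 bytes and R = 1 byte/s, an empty disk can take a 50-byte put with a 50 s TTL, but not with a 100 s TTL: the longer put would leave too little room to keep accepting at rate R before it expires.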
Implementing Rate Limiting
• Before accepting a put, must check D(t, t+∆t)
– Can we check this efficiently?
• Easy, assuming all puts have same TTL
– Can implement using a virtual “pipe”
– Pipe is TTL long, total capacity C
– New puts go into pipe, expire on exit
– Can easily show pipe is optimal for this case
• With varying TTLs, problem harder
– Puts with short TTLs expire in middle of pipe
– Bin-packing problem on new puts: find latest spot in pipe that satisfies desired size and TTL
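The uniform-TTL pipe can be sketched as a FIFO of pending expiries: because every put shares one TTL, the oldest put is always the next to leave, so a deque suffices. This is an illustrative sketch under that uniform-TTL assumption, not the varying-TTL bin-packing case.

```python
from collections import deque

# Sketch of the virtual "pipe" for the uniform-TTL case: a FIFO of
# (expiry_time, size) pairs whose total bytes never exceed C.
# Class and method names are illustrative assumptions.
class Pipe:
    def __init__(self, capacity, ttl):
        self.capacity, self.ttl = capacity, ttl
        self.queue = deque()  # (expiry_time, size), oldest first
        self.used = 0

    def _expire(self, now):
        # Drain puts that have reached the end of the pipe.
        while self.queue and self.queue[0][0] <= now:
            self.used -= self.queue.popleft()[1]

    def put(self, size, now):
        self._expire(now)
        if self.used + size > self.capacity:
            return False  # reject immediately, per the service contract
        self.queue.append((now + self.ttl, size))
        self.used += size
        return True
```

With varying TTLs this FIFO breaks down (short-TTL puts would expire mid-pipe), which is what forces the bin-packing formulation above.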
Overview
• Introduction
• OpenHash interface, assumptions
• Resource allocation
– Goals/problem formalization
– Rate-limiting puts
– Fair sharing
• Discussion
Choosing Puts for Fair Sharing
• Assume can accept new puts at rate R
– How do we divide it up between clients?
• Unlike fair queuing, two competing goals:
1. Want to make decisions (accept/reject) quickly
– In FQ, may queue for a long time before forwarding
2. Suffer consequences of decisions for full TTL
– In FQ, only interested in fairness over short window
• But one big advantage: long history
– Remember all puts whose TTLs haven’t expired
The Rate-Based Approach
• Accept based on recent put rates
– Already storing all puts, so also store rates
– (Could estimate these as in Approximate Fair Dropping)
– Basically, fair-share the input rate R
• Pros:
– Easy to implement
– If all clients put at uniform rates, gives fair stores
• Cons:
– To get fair share, must put at uniform rate
– What about bursty clients (avg. rate << max. rate)?
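A rate-based selector might look like the sketch below: estimate each client's put rate with an exponentially decayed average (in the spirit of Approximate Fair Dropping) and, among competing puts, prefer the client with the lowest estimate. All names and the half-life parameter are illustrative assumptions.

```python
# Sketch of the rate-based approach: decayed per-client rate estimates,
# with puts preferred from the lowest-rate client. Illustrative only.
class RateTracker:
    def __init__(self, half_life=60.0):
        self.half_life = half_life  # seconds for an estimate to halve
        self.rates = {}             # client -> estimated bytes/sec
        self.last_seen = {}         # client -> time of last update

    def record(self, client, size, now):
        # Decay the old estimate by elapsed time, then add this put.
        dt = now - self.last_seen.get(client, now)
        decay = 0.5 ** (dt / self.half_life)
        old = self.rates.get(client, 0.0)
        self.rates[client] = old * decay + size / self.half_life
        self.last_seen[client] = now

    def preferred(self, clients):
        # Among competing clients, prefer the lowest estimated rate.
        return min(clients, key=lambda c: self.rates.get(c, 0.0))
```

Note the con above in action: a bursty client's estimate spikes during its burst, so it loses out even if its long-term average rate is low.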
The Storage-Based Approach
• Accept puts based on amount of storage used
– Keep counters of storage used by each client
– Prefer new puts from clients with less data on disk
• Pros:
– Also easy to implement
– Gives fair stores regardless of uniformity of client put rates
• Cons:
– Over-represented clients block on under-represented ones
– Could be very disruptive as new clients enter system
The Commitment-Based Approach
• Base fairness around “commitments”
– How many bytes stored for how much more time
– New bytes entail more future commitment than old
• Pros:
– Better at bursts than rate-based approach
– Better than storage-based approach at not blocking over-represented clients
• Cons:
– Hard to think about in detail; hard to implement?
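One plausible reading of "bytes stored for how much more time" is byte-seconds: sum over a client's live puts of size times remaining TTL. The sketch below is that interpretation only, an assumption rather than the paper's definition.

```python
# Sketch of the commitment-based approach, interpreting a client's
# commitment as byte-seconds: size * remaining TTL, summed over its
# live puts. Function names and units are illustrative assumptions.
def commitment(puts, now):
    """puts: list of (size, expiry_time). Return total byte-seconds."""
    return sum(size * max(0.0, expiry - now) for size, expiry in puts)

def preferred(client_puts, now):
    """client_puts: dict client -> list of (size, expiry_time).
    Prefer the client with the smallest outstanding commitment."""
    return min(client_puts, key=lambda c: commitment(client_puts[c], now))
```

This captures both pros above: a new put of size s and TTL t costs s*t up front, so old, nearly-expired bytes barely count against a client, and an over-represented client's commitment decays continuously instead of blocking until whole values expire.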
Related Work
• Various fair queuing techniques
– Standard FQ
– Approximate Fair Dropping
– CSFQ
• Other DHT work
– Palimpsest
• Other networking work
– Internet backplane
Discussion
• What is the optimal rate-limiting algorithm?
– How close do our various schemes come to it?
• What’s the right model for sharing?
– Rate-based approach?
– Storage-based approach?
– Commitment-based approach?
– Some hybrid?
– Lottery scheduling?
• What other models make sense?
– Palimpsest?