Informed Content Delivery Across
Adaptive Overlay Networks
John Byers
Dept. of Computer Science, Boston University
www.cs.bu.edu/~byers
Joint work with Jeffrey Considine, Michael Mitzenmacher and Stanislav Rost
Overlays for Content Delivery
• Build distribution topology out of unicast connections (tunnels).
• Requires active participation of end-systems.
• Native IP multicast unnecessary.
• Saves considerable bandwidth over N * unicast solution.
• Basic paradigm easy to build and deploy.
• Bonus: Overlay topology can adapt to network conditions by self-reconfiguration.
Use of Overlays
• Killer apps:
  • Millions of users want to download a new movie or watch the SIGCOMM technical sessions.
  • CDNs want to populate thousands of servers with new movies for those users.
• Research directions to date:
  • Considerable effort on optimizing overlay layout (Narada, Overcast, RON, etc.).
  • Scalable solutions for indexing/locating content using overlays (CAN, Chord, etc.).
• Our focus: Maximize throughput of large transfers across overlays.
Limitations of Existing Schemes
• Tree-like topologies
  • Rooted in history (IP Multicast)
  • Limitations:
    • bandwidth decreases monotonically from the source
    • losses increase monotonically along a path
• Does this matter in practice?
  • Anecdotal and experimental evidence says yes:
    • Downloads from multiple mirror sites in parallel [BLM ’99, RKB ’00].
    • Availability of better routes [SCHSA ’99, ABKM ’01].
    • Peer-to-peer: Morpheus, Kazaa and Grokster.
An Illustrative Example
1. A basic tree topology.
2. Harnessing the power of parallel downloads.
3. Incorporating collaborative transfers.
Our Philosophy
• Go beyond trees. Use additional links and bandwidth by:
  • downloading from multiple peers in parallel
  • taking advantage of “perpendicular” bandwidth
• Has potential to significantly speed up downloads…
• But only effective if:
  • collaboration is carefully orchestrated
  • methods are amenable to frequent adaptation of the overlay topology
Suitable Applications
• Prerequisite conditions:
  • Available bandwidth between peers.
  • Differences in content received by peers.
  • Rich overlay topology.
• Applications:
  • Downloads of large, popular files.
  • Video-on-demand or nearly real-time streams.
  • Shared virtual environments.
Erasure Codes
• We typically think of data as an ordered stream.
  • “I need packets 1-1,000.”
• Using erasure codes, data is like water:
  • Can generate a pool of redundant data from full original content.
  • You don’t care what droplets you get.
  • You don’t care if some spills.
  • You just want enough to get through the pipe.
  • “I need any 1,000 packets.”
• The digital fountain model [BLMR ’98] is ideal for use in a fluid overlay environment.
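The "any 1,000 packets" property can be illustrated with a toy rateless code. The sketch below is not the code family used in [BLMR ’98]; it assumes a simple random linear code over GF(2), where every encoded symbol is the XOR of a random subset of source blocks and the receiver solves for the blocks by Gaussian elimination. The names `encode_symbol` and `Decoder`, the 30% loss rate, and the block sizes are all illustrative assumptions.

```python
import random

def encode_symbol(blocks, rng):
    """Return (mask, value): the XOR of a random non-empty subset of blocks.
    Bit i of mask is set when source block i is part of the XOR."""
    k = len(blocks)
    mask = rng.getrandbits(k) or 1
    value = 0
    for i in range(k):
        if (mask >> i) & 1:
            value ^= blocks[i]
    return mask, value

class Decoder:
    """Incremental Gaussian elimination over GF(2)."""

    def __init__(self, k):
        self.k = k
        self.rows = {}  # pivot bit -> (mask, value); pivot is the mask's top bit

    def add(self, mask, value):
        """Absorb one symbol; return True if it added new information."""
        for pivot in sorted(self.rows, reverse=True):
            if (mask >> pivot) & 1:
                pmask, pvalue = self.rows[pivot]
                mask ^= pmask
                value ^= pvalue
        if mask == 0:
            return False  # linearly dependent on symbols already held
        self.rows[mask.bit_length() - 1] = (mask, value)
        return True

    def done(self):
        return len(self.rows) == self.k

    def decode(self):
        """Back-substitute from the lowest pivot upward."""
        blocks = [0] * self.k
        for pivot in range(self.k):
            mask, value = self.rows[pivot]
            for i in range(pivot):
                if (mask >> i) & 1:
                    value ^= blocks[i]
            blocks[pivot] = value
        return blocks

# Usage: the receiver does not care which symbols arrive or which are lost.
rng = random.Random(1)
source = [rng.getrandbits(64) for _ in range(32)]  # 32 source blocks
decoder = Decoder(len(source))
received = 0
while not decoder.done():
    mask, value = encode_symbol(source, rng)
    if rng.random() < 0.3:  # assumed 30% packet loss
        continue
    decoder.add(mask, value)
    received += 1
recovered = decoder.decode()
```

The sender never needs to know which symbols were lost; it keeps generating fresh ones until the receiver has enough. Practical fountain codes reach the same effect with far cheaper decoding than the elimination used here.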
Erasure Codes Offer Freedom
• Intrinsic resilience to packet loss, reordering.
• Better support for transient connections via stateless migration, suspension.
• Peers with full content can always generate useful symbols.
• Peers with partial content are more likely to have content to share.
• But using erasure codes comes at a price:
  • Content is no longer an ordered stream.
  • Therefore, collaboration is more difficult.
Informed Content Delivery: Definitions and Problem Statement
• Peers A and B have working sets of symbols S_A, S_B drawn from a large universe U and want to collaborate effectively.
• Key components:
  1) Summarize: Furnish a concise and useful sample of a working set to a peer.
  2) Approximately Reconcile: Compute as many elements in S_A - S_B as possible and transmit them.
• Do so with minimal control messaging overhead.
Min-Wise Summaries
Problem: Neighboring peers may have similar content.
Solution: Give peers a “calling card” (fits in 1 packet) to summarize the content they have, check similarity.
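A min-wise "calling card" can be sketched as follows. This is a minimal illustration, assuming salted SHA-256 hashes stand in for random permutations of the symbol universe and that symbols are identified by integers. The function names and the choice of 128 hashes (128 x 8 bytes = 1 KB, small enough for one packet) are assumptions, not values from the slides.

```python
import hashlib

def _hash(seed, symbol_id):
    """64-bit hash of a symbol id under the permutation selected by seed."""
    data = f"{seed}:{symbol_id}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def minwise_summary(working_set, num_hashes=128):
    """Keep only the minimum hash value under each of num_hashes functions."""
    return [min(_hash(seed, s) for s in working_set)
            for seed in range(num_hashes)]

def estimate_resemblance(summary_a, summary_b):
    """Fraction of matching minima estimates |A ∩ B| / |A ∪ B|."""
    matches = sum(a == b for a, b in zip(summary_a, summary_b))
    return matches / len(summary_a)

# Usage: two peers whose working sets overlap in 900 of 1,100 symbols.
set_a = set(range(0, 1000))
set_b = set(range(100, 1100))
true_resemblance = len(set_a & set_b) / len(set_a | set_b)
estimate = estimate_resemblance(minwise_summary(set_a), minwise_summary(set_b))
```

Each peer computes its summary once and can hand the same card to every neighbor; comparing two cards needs no access to the underlying sets.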
Recoding
Problem: What to transmit when peers have similar content?
Solution: Allow peers to probabilistically “hedge their bets,” minimizing chance of transmission of useless content.
Example:
Suppose the resemblance between S_A and S_B is 0.9.
If A sends a symbol at random, the probability of it being useful to B is 0.1.
A better strategy is to XOR 10 random symbols together.
B can extract one useful symbol when exactly one of the 10 is new to B, which happens with probability:
10 x (1/10) x (9/10)^9 ≈ 0.39 > 1/e ≈ 0.37
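The arithmetic in this example generalizes to other recoding degrees. The sketch below assumes, as the example does, that each symbol A picks is already known to B independently with probability equal to the resemblance; the function name `p_one_useful` is illustrative.

```python
import math

def p_one_useful(degree, resemblance):
    """Probability that exactly one of `degree` XORed symbols is new to B,
    assuming each symbol is independently known to B with probability
    `resemblance`."""
    p_new = 1.0 - resemblance
    return degree * p_new * resemblance ** (degree - 1)

# Compare plain transmission (degree 1) with recoded symbols at resemblance 0.9.
for degree in (1, 2, 5, 10, 20):
    print(degree, round(p_one_useful(degree, 0.9), 3))

p_degree_10 = p_one_useful(10, 0.9)  # equals (9/10)^9
beats_one_over_e = p_degree_10 > 1 / math.e
```

A degree that is too high is also wasteful: once several of the XORed symbols are new to B, B cannot separate them from a single recoded packet.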
Approximate Reconciliation Trees
Problem: Collaborating peers have overlapping content.
Solution: Efficient data structures for reconciliation.
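The slide does not describe how the trees are built, so the sketch below substitutes a simpler, well-known structure: a plain Bloom filter used as the digest of B's set. It is not an approximate reconciliation tree. It only illustrates the general idea that A can approximate S_A - S_B from a compact digest. The class name, filter size (8 bits per element), and hash count (5) are assumptions.

```python
import hashlib

class BloomFilter:
    """Compact set digest with false positives but no false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

def approx_difference(set_a, digest_of_b):
    """Symbols A holds that the digest says B lacks. A false positive can
    hide a useful symbol, but nothing B already holds is ever selected."""
    return {s for s in set_a if s not in digest_of_b}

# Usage: B sends its digest; A sends only symbols from the difference.
set_a = set(range(0, 1000))
set_b = set(range(200, 1200))
digest = BloomFilter(num_bits=8 * len(set_b), num_hashes=5)
for symbol in set_b:
    digest.add(symbol)
to_send = approx_difference(set_a, digest)
```

The reconciliation is approximate in one direction only: A may skip a few symbols B actually needs, but it never wastes bandwidth on a symbol B already holds.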
Experimental Scenarios
• Three methods for collaboration:
  • Uninformed: A transmits symbols at random to B.
  • Speculative: B transmits a min-wise summary to A; A then sends recoded symbols to B.
  • Reconciled: B transmits a digest of its set to A; A then sends packets from the set difference.
• Overhead:
  • Decoding overhead: with erasure codes, fixed 2.5%.
  • Reception overhead: useless duplicate packets.
  • Recoding overhead: useless recoding packets.
  • Overhead = (symbols received - symbols needed) / symbols needed
Pairwise Reconciliation
• 128 MB file, 96K input symbols
• 115K distinct symbols in system initially
• Containment of B in A: |S_A ∩ S_B| / |S_B|
Four peers in parallel
• 128 MB file, 96K input symbols
• 105K distinct symbols in system initially
• Containment of B in A: |S_A ∩ S_B| / |S_B|
Four peers, periodic updates
• 128 MB file, 96K input symbols
• 105K distinct symbols in system initially
• Digests updated at every 10%.
• Containment of B in A: |S_A ∩ S_B| / |S_B|
Conclusions
• Even with ultimate routing topology optimization, the choice of what to send is paramount to content delivery.
• Digital fountain model ideal for fluid and ephemeral network environments.
• Richly connected topologies are key to harnessing perpendicular bandwidth.
• Wanted: more algorithms for intelligent collaboration.