Date posted: 17-Jan-2016
Presented by Kelly Whitacre
Written by John W. Byers, Jeffrey Considine, Michael Mitzenmacher, Member, IEEE, and Stanislav Rost
Problem
Distributing a large new file across the Internet to millions of users simultaneously has proven to be challenging.
Possible Solution: Point-to-Point?
Wasted Bandwidth
• Having individual point-to-point connections from a single source wastes bandwidth
• Server must handle the load of possibly many clients
• Bandwidth costs money
Limited Transfer Rates
• Server should utilize available bandwidth
• Transfer rates are limited by the characteristics of the end-to-end paths
Possible Solution: IP Multicast?
Pros
• Solves bandwidth problems of point-to-point
• Server sends one copy
• Network handles the rest
Cons
• No flow control
• No retransmission of lost packets
• Limited deployment
Reliable Multicast
Digital fountain approach
• Erasure codes: send parity information with the packets so that lost packets can be recovered (no feedback channels are needed to ensure reliable delivery)
• Recirculation: information is re-circulated (fountain) for asynchronous client arrivals
• Parallel transfer rates: support heterogeneous client transfer rates so as not to flood the network
Digital Fountain Approach
Figure: the file can be recovered from any set of k encoding packets.
Source: http://www.sigcomm.org/sigcomm98/tp/abs_05.html
Cyclic Interleaving
Source: http://www.sigcomm.org/sigcomm98/tp/abs_05.html
Solution: Adaptive Overlay Networks
Source: http://www.cs.virginia.edu/~mngroup/hypercast/designdoc/Chp1-Overview/Chp1-Overview.html
Adaptive Overlay Networks
Differ from IP Multicast
• Do not use a multicast tree
• Flexibly adapt to changing network conditions
• End systems are explicitly required to collaborate!
• Can improve performance through additional cross-connections and active collaboration
Addressing Limitations: Content Delivery Scenario
Consider: Initial Delivery Tree
S = source. Shaded area = each node's working set of packets, the subset of packets it has received.
Addressing Limitations: Improving Transfer Rates
Harnessing the Power of Parallel Downloads
Establishing concurrent connections to multiple servers or peers with complete copies of the file
Tree Directed Acyclic Graph
Addressing Limitations: Improving Transfer Rates
Establishing concurrent connections to multiple peers
Harnessing the Power of Collaborative Transfer
Addressing Limitations: Improving Transfer Rates
(d) depicts the portions of content which can be beneficially exchanged via pair-wise transfers
Power of Cross-Connections & Collaboration
Considerations
1. (a) & (b) impede the full flow of content to downstream receivers
2. Opportunistic connections of (c) & (d) allow for higher transfer rates
• Yet, demand more careful orchestration between end systems
• Must determine set difference of working sets
3. Reconciliation is simple when working sets are limited to small contiguous blocks
• But this limits flexibility under the frequent changes that arise in adaptive overlay networks
Challenges
Stateful vs. Non-Stateful Solutions
Content Delivery Across Adaptive Overlay Networks
Adaptive Overlay Networks in a Fluid Internet
Challenges …
• Asynchrony: Receivers may open and close connections or leave and rejoin the infrastructure at arbitrary times
• Heterogeneity: Connections vary in speed and loss rates
• Transience: Routers, links, and end systems may fail and their performance may fluctuate over time
• Scalability: The service must scale to large receiver populations and large content
Need to …
• Adaptively detect and avoid congested or temporarily unstable areas of the network
• Dynamically establish paths with the most desirable end-to-end characteristics
• Deliver useful content, often in parallel, with a minimum of setup overhead and message complexity
Limitations of Stateful Solutions
Addresses issues of connection
• Connections that vary in speed and loss rates
• Clients coming and going at arbitrary times
A significant per-connection state
• Is highly unscalable
• May impact performance: state must be maintained in the face of reconfiguration and reconnection
• Is problematic with parallel downloading
Alternative: Encoded Content through Digital Fountain Approach
Digital Fountain Approach
• Resilience to packet loss: erasure-correcting code
• Guarantee
  Claims: recover the original source file from any subset of distinct symbols in the encoding stream equal in size to the original file
  In practice: recover a file from a few percent more than the number of symbols in the original file
Encoded Content through Digital Fountain Approach
Pros
• Continuous Encoding: Senders with a complete copy of a file may continuously produce fresh encoding symbols
• Time Invariance: New encoding symbols are produced independently from symbols produced in the past
• Tolerance: Digital fountain streams are useful to all receivers regardless of the times of their connections or disconnections and their rates of sampling the stream
• Additivity: Parallel downloads from multiple servers with complete copies of the content require no orchestration
Stateless!
Encoded Content through Digital Fountain Approach
Cons
• Encoding/decoding overhead
• Reconciliation methods are needed when collaborating end systems have only a portion of the content
Reconciliation and Informed Delivery
• Coarse-grained reconciliation
• Speculative transfers
• Fine-grained reconciliation
Note:
• Approaches proposed are local in scope and typically involve a pair or a small number of end systems
• Goal is to provide the most cost-effective reconciliation mechanisms, measuring cost in both computation and message complexity
Coarse-Grained Reconciliation
Estimate the resemblance of the working sets of pairs of nodes prior to establishing connections
• Quick estimates of the fraction of symbols common to the working sets of both peers
Approach 1: Employs random sampling
Approach 2: Employs sketches of each peer's working set
• High-level information
• Lightweight, computed efficiently
• Incrementally updated
• Fit into a single 1-kB packet
Notation & Framework
Let peers A and B have working sets S_A and S_B containing symbols from an encoding of the file
Containment: The containment of B in A is the quantity $|S_A \cap S_B| / |S_B|$
Resemblance: The resemblance of A and B is the quantity $|S_A \cap S_B| / |S_A \cup S_B|$
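A minimal sketch of the two quantities, assuming working sets are held as Python sets of integer keys. Containment of B in A is taken as |S_A ∩ S_B| / |S_B| and resemblance as |S_A ∩ S_B| / |S_A ∪ S_B|. The function names and the toy sets are illustrative, not from the slides.

```python
def containment(s_a: set, s_b: set) -> float:
    """Containment of B in A: the fraction of B's symbols that A already holds."""
    return len(s_a & s_b) / len(s_b)


def resemblance(s_a: set, s_b: set) -> float:
    """Resemblance of A and B: shared symbols over all distinct symbols."""
    return len(s_a & s_b) / len(s_a | s_b)


# Toy working sets (illustrative keys only).
s_a = {1, 2, 3, 4, 5, 6}
s_b = {4, 5, 6, 7}

c_b_in_a = containment(s_a, s_b)   # 3 shared out of 4 in B
r_ab = resemblance(s_a, s_b)       # 3 shared out of 7 distinct
print(c_b_in_a, r_ab)
```

Containment is asymmetric (it depends on which peer's set is the denominator), while resemblance is symmetric.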
Notation & Framework
Each element of a working set is identified by an integer key (sending an element entails sending its key)
Keys are distributed over the key space uniformly at random
With 64-bit keys, a 1-kB packet can hold roughly 128 keys
Two keys can map to the same element: if the elements are determined by a hash function seeded by the key, two keys may generate the same element with small probability
• Minimal impact
Random Sampling
Select elements of the working set at random and transport those to the peer.
Random Sampling
Pros
• Unbiased estimate of containment
• Can be incrementally updated using reservoir sampling
Cons
• The receiving peer must search its own working set for each element in the random sample
• Does not easily allow one peer to check the resemblance between prospective peers: A cannot check the resemblance between B and C
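The random-sampling estimate can be sketched as follows, assuming B sends 128 randomly chosen keys (about one 1-kB packet of 64-bit keys) and A checks each against its own working set. The fraction of hits estimates the containment of B in A. The set sizes, seed, and function name are illustrative assumptions.

```python
import random


def estimate_containment(sample, own_set):
    """Fraction of the sampled keys that the receiving peer already holds."""
    hits = sum(1 for key in sample if key in own_set)
    return hits / len(sample)


rng = random.Random(7)           # fixed seed so the sketch is repeatable
s_a = set(range(0, 8000))        # A's working set (toy keys)
s_b = set(range(6000, 10000))    # B's working set; half of it is also in A

# B draws a uniform sample of its keys and ships it to A.
sample = rng.sample(sorted(s_b), 128)

# A looks up every sampled key in its own working set.
est = estimate_containment(sample, s_a)
true_value = len(s_a & s_b) / len(s_b)   # 0.5 for these toy sets
print(est, true_value)
```

The per-key lookup on A's side is the search cost listed under Cons.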
Min-Wise Sketches
Calculates working set resemblance based on min-wise sketches
Min-Wise Sketches
π_i represents a random permutation on the key universe
1. A sends B a vector of A's minima
2. B counts the number of positions where the two vectors are equal (matching minima are elements that lie in both sets)
3. B divides by the total number of permutations
The result is an unbiased estimate of the resemblance
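A minimal sketch of the min-wise estimate, under stated assumptions: truly random permutations are replaced by linear maps x -> (a*x + b) mod P with P prime, which are permutations of [0, P) but only approximately min-wise independent. The key range, the choice of P, and all names are illustrative and not the presenters' implementation.

```python
import random

P = 2**61 - 1  # a Mersenne prime; keys are assumed to lie in [0, P)


def make_permutations(count, seed):
    """Draw (a, b) pairs; each defines the permutation x -> (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(count)]


def sketch(working_set, perms):
    """One entry per permutation: the minimum permuted key in the working set."""
    return [min((a * x + b) % P for x in working_set) for a, b in perms]


def estimate_resemblance(sketch_a, sketch_b):
    """Count positions where the minima agree, divided by the number of permutations."""
    matches = sum(1 for u, v in zip(sketch_a, sketch_b) if u == v)
    return matches / len(sketch_a)


rng = random.Random(1)
keys = rng.sample(range(P), 4000)   # keys spread uniformly over the key space
s_a = set(keys[:3000])
s_b = set(keys[1000:])              # 2000 shared keys, 4000 distinct keys

perms = make_permutations(180, seed=2)   # both peers must use the same permutations
est = estimate_resemblance(sketch(s_a, perms), sketch(s_b, perms))
print(est)   # should land near the true resemblance of 0.5
```

Because a sketch depends only on its own working set and the shared permutations, any peer holding two sketches can compare them, which is what lets A estimate the resemblance between B and C.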
Min-Wise Sketches
Pros
• Unbiased estimate of resemblance
• Allows similarity comparisons given any two sketches for any two peers: A can check the resemblance between B and C
Cons
• Truly random permutations cannot be used: storage requirements are impractical
• Possibility of false positives: π_i values are hashed to fewer bits to allow for more sketch elements in a packet
(Details not discussed)
Speculative Transfers
The sender makes “educated guesses” as to which symbols to generate and transfer
Sends symbols that are probably useful to the other peer
This process can be fine-tuned using the results of coarse-grained reconciliation
Speculative Transfers
When the containment of B in A is low, speculative transfer is trivial, since most of B's symbols are useful to A
When the containment of B in A is high, this strategy is inefficient: use recoding
Recoding
A recoding symbol is simply the bitwise XOR of a set of encoding symbols
Must be accompanied by a specification of the encoding symbols blended to create it
Must explicitly list the random seeds of the encoding symbols from which it was produced
Encoding/Decoding Recoding Symbols
Similar to the substitution rule
Example: a peer with y5, y8, y13 generates the recoding symbols
  Z1 = y13
  Z2 = y5 XOR y8
  Z3 = y5 XOR y13
A peer that receives Z1, Z2, Z3 can recover y13 directly from Z1, then by substitution recover y5 (from Z3) and y8 (from Z2)
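The y5, y8, y13 example can be traced in a few lines, assuming encoding symbols are equal-length byte strings and recoding is a bytewise XOR. The payload values and the helper name are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length symbols."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


# Sender side: encoding symbols held by the peer (illustrative payloads).
y5, y8, y13 = b"sym-05", b"sym-08", b"sym-13"

# Recoding symbols; each would travel with the list of symbols blended into it.
z1 = y13
z2 = xor_bytes(y5, y8)
z3 = xor_bytes(y5, y13)

# Receiver side: z1 is y13 itself; substitute it into z3, then the result into z2.
r13 = z1
r5 = xor_bytes(z3, r13)   # (y5 XOR y13) XOR y13 = y5
r8 = xor_bytes(z2, r5)    # (y5 XOR y8) XOR y5 = y8
print(r5, r8, r13)
```

Substitution works because XOR-ing a symbol with itself cancels it out.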
Fine-grained Reconciliation
Is a set-difference problem
• Tries to determine the exact set difference between S_A and S_B
Many approaches
• Polynomial-based
• Enumeration-based
• Bloom filter
• Search-based
• Approximate Reconciliation Trees (ART), which combine the compact representation of Bloom filters with the speed of a search-based approach
Bloom Filter
Represents a working set of n elements as a bit array, with the bits to set chosen by independent random hash functions
Flow
1. Peer A sends B a Bloom filter F_A of S_A
2. Peer B then checks each element of S_B against F_A
3. Peer B has determined S_B - S_A, the elements it holds that A lacks (apart from a few false positives)
This solution is effective particularly when the number of differences is a large fraction of the set size
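The Bloom filter flow can be sketched as below. This is a hedged illustration: the class, the SHA-256 based hashing, and the toy key ranges are assumptions, with 8 bits per element and 6 hash functions chosen to echo the experimental settings. B tests each of its keys against A's filter; a key that misses is certainly absent from A, while a few absent keys may be hidden by false positives.

```python
import hashlib


class BloomFilter:
    """Bit-array set summary; membership tests may give false positives only."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive one bit position per hash function by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


s_a = set(range(0, 1000))      # A's working set (toy keys)
s_b = set(range(500, 1500))    # B's working set

# Step 1: A builds a filter over its working set and sends it to B.
f_a = BloomFilter(num_bits=8 * len(s_a), num_hashes=6)
for key in s_a:
    f_a.add(key)

# Steps 2-3: B keeps the keys that the filter says A does not have.
useful_to_a = {key for key in s_b if key not in f_a}
print(len(useful_to_a), len(s_b - s_a))
```

Since a Bloom filter has no false negatives, everything in `useful_to_a` is genuinely missing at A; the cost of false positives is only that some useful symbols are withheld.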
Experimental Results
Demonstrate the benefits and costs of using reconciliation in peer-to-peer transfers and in parallel downloads
Simulation Parameters
All consider transfer of a 128-MB file
Origin server
• Divides this file into input symbols of 1400 bytes each (to fit in an Ethernet packet with headers)
• Encodes this file into a large set of encoding symbols
• Associates each encoding symbol with a 64-bit identifier representing the set of input symbols used to produce it
Min-wise sketches used 180 permutations, yielding 180 entries of 64 bits each for a total of 1440 bytes per summary
Bloom filters used 6 hash functions and 8(1 + 0.0025)L bits for a total of 96 kB per filter
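The quoted sizes can be checked with a little arithmetic. Two assumptions here are not stated on the slide: that 1 MB means 2^20 bytes, and that L is the number of 1400-byte input symbols in the file. Under those assumptions the numbers line up with the 1440-byte and 96-kB totals.

```python
import math

FILE_BYTES = 128 * 2**20    # assumption: 1 MB = 2**20 bytes
SYMBOL_BYTES = 1400

# Assumption: L is the number of input symbols in the file.
num_symbols = math.ceil(FILE_BYTES / SYMBOL_BYTES)

# Min-wise sketch: 180 entries of 64 bits each.
sketch_bytes = 180 * 64 // 8

# Bloom filter: 8(1 + 0.0025)L bits, converted to kilobytes.
bloom_bits = 8 * (1 + 0.0025) * num_symbols
bloom_kb = bloom_bits / 8 / 1000

print(num_symbols, sketch_bytes, round(bloom_kb, 1))
```

At roughly 8 bits per symbol, the filter is far smaller than an explicit list of 64-bit keys for the same working set.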
Collaboration Methods
1. Uninformed: The sending peer picks a symbol to send at random
2. Speculative: The sending peer uses a min-wise sketch from the receiving peer to estimate the containment
3. Reconciled: The sending peer uses either a Bloom filter or an ART from the receiving peer to filter out duplicate symbols and sends a random permutation of the differences
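A toy comparison of the uninformed and reconciled methods, under simplifying assumptions: the reconciled sender uses an exact set difference as a stand-in for a Bloom filter or ART, the speculative method is omitted, and the set sizes and seed are arbitrary. It counts how many symbols must be sent before the receiver gains a target number of new ones.

```python
import random


def uninformed_sends(sender, receiver, needed, rng):
    """Send uniformly random symbols until the receiver has gained `needed` new ones."""
    have = set(receiver)
    pool = sorted(sender)
    sends = gained = 0
    while gained < needed:   # requires needed <= len(sender - receiver)
        key = rng.choice(pool)
        sends += 1
        if key not in have:
            have.add(key)
            gained += 1
    return sends


def reconciled_sends(sender, receiver, needed, rng):
    """Send a random permutation of the difference; every symbol sent is new."""
    diff = sorted(sender - receiver)   # exact difference stands in for Bloom filter / ART
    rng.shuffle(diff)
    return len(diff[:needed])


rng = random.Random(3)
sender = set(range(0, 1000))
receiver = set(range(0, 800))   # receiver already holds 80% of the sender's symbols

n_uninformed = uninformed_sends(sender, receiver, needed=100, rng=rng)
n_reconciled = reconciled_sends(sender, receiver, needed=100, rng=rng)
print(n_uninformed, n_reconciled)
```

The gap between the two counts grows with the share of the sender's symbols the receiver already holds, in line with uninformed collaboration degrading as containment increases.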
Scenarios and Evaluation
Varying 3 experimental factors:
1. Set of connections in the overlay formed between sources and peers
2. Distribution of content among collaborating peers
3. Slack of the scenario (1.1 & 1.3)
  • When smaller than (1 + decoding overhead), the set of peers will be unable to recover the file
  • When larger than (1 + decoding overhead), the set of peers will most likely recover the file
  • Methods provide the most significant benefits over naive methods when there is only a small amount of slack
Scenario 1: Two Peers with Partial Content
One peer sends symbols to the other
Figure axis label: % of Shared Encoding Symbols
• Uninformed collaboration performs poorly and degrades significantly as the containment increases
• Speculative collaboration is more efficient, but the overhead still increases slowly with containment
• Overhead of reconciliation is purely from the cost of transmitting a Bloom filter or ART (less than 1%)
Scenario 2: Download from a Server with Complete Content
With concurrent transfer from a peer
Figure axis label: % of Shared Encoding Symbols
• Uninformed collaboration overhead is considerably lower than in Scenario 1 (a larger fraction of the content is sent directly via fresh symbols from the server)
• Speculative collaboration performs similarly to Scenario 1
• Reconciled collaboration has overhead slightly higher than receiving symbols directly from the server
Scenario 3: Parallel Download from Peers with Partial Content
Collaborating with multiple peers in parallel
Figure axis label: % of Shared Encoding Symbols
• Can leverage bandwidth from peers with partial content with only a slight increase in overhead
• Uninformed collaboration performs extremely poorly
• Speculative collaboration dramatically improves as containment increases
• Reconciled collaboration has much higher overhead than before
Conclusions
Adaptive overlay networks offer a powerful alternative to traditional mechanisms for content delivery
• Flexibility, scalability, and deployability
Informed and effective collaboration between end systems can be achieved through the digital fountain approach
• Care is needed to provide methods for representing and transmitting the content in a manner that is as flexible and scalable as the underlying capabilities of the delivery model
Questions?
Supplemental Reading and Resources
A Digital Fountain Approach to Reliable Distribution of Bulk Data http://www.ecse.rpi.edu/Homepages/shivkuma/teaching/sp2001/readings/digital-fountain.pdf
ACM SIGCOMM '98, A Digital Fountain Approach to Reliable Distribution of Bulk Data http://www.sigcomm.org/sigcomm98/tp/abs_05.html