Date posted: 27-Dec-2015
Uploaded by: solomon-dennis
2: Application Layer 1
Content Distribution
March 6, 2012
Contents
- P2P architecture and benefits
- P2P content distribution
- Content distribution network (CDN)
Pure P2P architecture
- no always-on server
- arbitrary end systems directly communicate peer-peer
- peers are intermittently connected and change IP addresses
Three topics:
- File distribution
- Searching for information
- Case study: Skype
File Distribution: Server-Client vs P2P
Question: How much time does it take to distribute a file of size $F$ from one server to $N$ peers?
[Figure: a server with upload bandwidth $u_s$ and peers 1 to N, each with upload bandwidth $u_i$ and download bandwidth $d_i$, connected through a network with abundant bandwidth]
- $u_s$: server upload bandwidth
- $u_i$: peer $i$ upload bandwidth
- $d_i$: peer $i$ download bandwidth
File distribution time: server-client
[Figure: the same server and N peers as above]
- The server sequentially sends N copies: $NF/u_s$ time.
- Client $i$ takes $F/d_i$ time to download.
Time to distribute F to N clients using the client/server approach:
$$d_{cs} = \max\left\{ \frac{NF}{u_s},\ \frac{F}{\min_i d_i} \right\}$$
This increases linearly with N (for large N).
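A minimal Python sketch that evaluates $d_{cs}$ directly, assuming consistent units (e.g., bits and bits per second); the function name `d_cs` and its argument order are illustrative, not from the lecture.

```python
def d_cs(F: float, u_s: float, d: list[float]) -> float:
    """Minimum client-server distribution time for a file of size F.

    u_s: server upload rate; d: download rates of the N clients.
    """
    N = len(d)
    server_time = N * F / u_s    # server must push N full copies
    client_time = F / min(d)     # slowest client needs F / d_min
    return max(server_time, client_time)


# Example: 20 clients, F = 1, u_s = 10, every d_i = 10
print(d_cs(1.0, 10.0, [10.0] * 20))  # 2.0
```

Here the server term dominates, so doubling N doubles the time.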
File distribution time: P2P
[Figure: the same server and N peers as above]
- The server must send one copy: $F/u_s$ time.
- Client $i$ takes $F/d_i$ time to download.
- $NF$ bits must be downloaded in aggregate.
- Fastest possible aggregate upload rate: $u_s + \sum_i u_i$.
$$d_{P2P} = \max\left\{ \frac{F}{u_s},\ \frac{F}{\min_i d_i},\ \frac{NF}{u_s + \sum_{i=1}^{N} u_i} \right\}$$
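The same style of sketch for $d_{P2P}$, again assuming consistent units; `d_p2p` and its parameters are illustrative names. Each of the three terms in the bound maps to one line.

```python
def d_p2p(F: float, u_s: float, u: list[float], d: list[float]) -> float:
    """Minimum P2P distribution time.

    u[i], d[i]: upload and download rates of peer i (the same N peers).
    """
    if len(u) != len(d):
        raise ValueError("u and d must describe the same N peers")
    N = len(d)
    seed_time = F / u_s                        # server uploads at least one copy
    client_time = F / min(d)                   # slowest peer downloads all F bits
    aggregate_time = N * F / (u_s + sum(u))    # NF bits over total upload capacity
    return max(seed_time, client_time, aggregate_time)


print(d_p2p(1.0, 10.0, [1.0] * 20, [10.0] * 20))
```

Unlike the client-server case, the aggregate term's denominator grows with N, because every new peer also adds upload capacity.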
Server-client vs. P2P: example
Client upload rate = $u$, $F/u$ = 1 hour, $u_s = 10u$, $d_{min} \ge u_s$
[Figure: minimum distribution time (hours, 0 to 3.5) vs. N (0 to 35) for P2P and client-server]
Client-server: $\approx NF/u_s$ vs. P2P: $\approx NF/(u_s + \sum_i u_i)$
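A short sketch that recomputes the two curves from the example's parameters, working in units where u = 1 and F = 1 so that times come out in hours. Taking $d_{min}$ exactly equal to $u_s$ is an assumption; the slide only says $d_{min} \ge u_s$, and any larger value gives the same curves here.

```python
# Slide parameters, in units where u = 1 and F = 1 so that F / u = 1 hour
U = 1.0
F = 1.0
U_S = 10 * U     # server upload rate u_s = 10u
D_MIN = U_S      # slide assumes d_min >= u_s; equality is an assumption


def client_server_hours(N: int) -> float:
    return max(N * F / U_S, F / D_MIN)


def p2p_hours(N: int) -> float:
    # every peer uploads at rate U, so the sum of u_i is N * U
    return max(F / U_S, F / D_MIN, N * F / (U_S + N * U))


for N in (5, 10, 20, 35):
    print(f"N={N:2d}  client-server={client_server_hours(N):.2f} h  P2P={p2p_hours(N):.2f} h")
```

The client-server time is N/10 hours (3.5 hours at N = 35), while the P2P time is at most N/(10 + N) hours, which approaches but never reaches 1 hour.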
P2P content distribution issues
Issues:
- Group management and data search
- Reliable and efficient file exchange
- Security/privacy/anonymity/trust
Approaches for group management and data search (i.e., who has what?):
- Centralized (e.g., BitTorrent tracker)
- Unstructured (e.g., Gnutella)
- Structured (Distributed Hash Tables [DHT])
Centralized model (Napster)
Original “Napster” design:
1) When a peer connects, it informs the central server of its IP address and content.
2) Alice queries for “Hey Jude”; the server replies that Bob has the file.
3) Alice requests the file from Bob.
[Figure: a centralized directory server and peers; Alice and Bob exchange the file directly]
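A minimal sketch of the directory server's role in steps 1 and 2, assuming a simple in-memory index from title to peer addresses. The class, its methods, and the IP address are hypothetical; this is not Napster's actual protocol.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical Napster-style central index: title -> peer IP addresses."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = defaultdict(set)

    def register(self, peer_ip: str, titles: list[str]) -> None:
        # Step 1: a connecting peer reports its IP address and content
        for title in titles:
            self.index[title].add(peer_ip)

    def unregister(self, peer_ip: str) -> None:
        # A departing peer must be removed, or queries return dead peers
        for peers in self.index.values():
            peers.discard(peer_ip)

    def query(self, title: str) -> list[str]:
        # Step 2: the server tells the requester which peers hold the title
        return sorted(self.index.get(title, set()))


directory = DirectoryServer()
directory.register("192.0.2.20", ["Hey Jude"])   # Bob (example address)
print(directory.query("Hey Jude"))
# Step 3 (not modeled): Alice fetches the file directly from Bob's address
```

Every lookup goes through this one object, which is exactly the single point of failure and scaling limit the centralized model is criticized for.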
Centralized model
[Figure: Alice asks the directory server Q: “Hey Jude”; A: “Bob has it”. Other peers: Jane, Judy]
File transfer is decentralized, but locating content is highly centralized.
Centralized model
Benefits:
- Low per-node state
- Limited bandwidth usage
- Short search time
- High success rate
- Fault tolerant
Drawbacks:
- Single point of failure
- Limited scale
- Possibly unbalanced load
- Copyright infringement (?)
File distribution: BitTorrent
- tracker: tracks peers participating in a torrent
- torrent: group of peers exchanging chunks of a file
[Figure: P2P file distribution; a peer obtains a list of peers from the tracker, then trades chunks with them]
BitTorrent (1)
- The file is divided into 256 KB chunks.
- A peer joining the torrent:
  - has no chunks, but will accumulate them over time
  - registers with the tracker to get a list of peers, and connects to a subset of peers (“neighbors”)
- While downloading, a peer uploads chunks to other peers.
- Peers may come and go.
- Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain.
BitTorrent (2)
Pulling chunks:
- At any given time, different peers have different subsets of file chunks.
- Periodically, a peer (Alice) asks each neighbor for the list of chunks that it has.
- Alice sends requests for her missing chunks, rarest first.
Sending chunks: tit-for-tat
- Alice sends chunks to the four neighbors currently sending her chunks at the highest rate; she re-evaluates the top 4 every 10 secs.
- Every 30 secs, she randomly selects another peer and starts sending it chunks (“optimistically unchoke”); the newly chosen peer may join the top 4.
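A minimal sketch of the rarest-first request order, assuming Alice already knows each neighbor's chunk list. The function name and the chunk IDs are illustrative. The slide does not say how ties between equally rare chunks are broken; this sketch breaks them by chunk ID only to stay deterministic.

```python
from collections import Counter


def rarest_first(have: set[int], neighbor_chunks: dict[str, set[int]]) -> list[int]:
    """Missing chunk IDs ordered from rarest to most common among neighbors."""
    copies = Counter()
    for chunks in neighbor_chunks.values():
        copies.update(chunks)            # count how many neighbors hold each chunk
    missing = [c for c in copies if c not in have]
    # Fewest copies first; ties broken by chunk ID (an assumption)
    return sorted(missing, key=lambda c: (copies[c], c))


alice_has = {0}
neighbors = {"A": {0, 1, 2}, "B": {1, 2}, "C": {2, 3}}
print(rarest_first(alice_has, neighbors))  # [3, 1, 2]
```

Fetching chunk 3 first matters because only neighbor C holds it; if C leaves, that chunk may disappear from Alice's neighborhood.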
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob.
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates.
(3) Bob becomes one of Alice’s top-four providers.
With a higher upload rate, a peer can find better trading partners and get the file faster!
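A sketch of Alice's choices in the scenario above, assuming she tracks the download rate she observes from each neighbor. The function names and the rate values are hypothetical; the 10-second and 30-second timers from the previous slide are noted in comments rather than simulated.

```python
import random
from typing import Optional


def top_providers(rates: dict[str, float], k: int = 4) -> list[str]:
    """Neighbors currently sending to us at the highest rates (re-run every 10 s)."""
    return sorted(rates, key=rates.get, reverse=True)[:k]


def optimistic_unchoke(rates: dict[str, float], top: list[str],
                       rng: random.Random) -> Optional[str]:
    """Pick one random neighbor outside the top k (re-run every 30 s)."""
    others = [peer for peer in rates if peer not in top]
    return rng.choice(others) if others else None


def unchoked(rates: dict[str, float], optimistic: Optional[str]) -> set[str]:
    """Neighbors Alice currently sends chunks to."""
    chosen = set(top_providers(rates))
    if optimistic is not None:
        chosen.add(optimistic)
    return chosen


# Download rates Alice observes from each neighbor (hypothetical values)
rates = {"A": 50.0, "B": 40.0, "C": 30.0, "D": 20.0, "E": 10.0, "Bob": 0.0}
rng = random.Random(1)
lucky = optimistic_unchoke(rates, top_providers(rates), rng)   # step (1)
print(sorted(unchoked(rates, lucky)))

# Step (3): Bob reciprocates at a high rate and enters Alice's top four
rates["Bob"] = 60.0
print(top_providers(rates))  # ['Bob', 'A', 'B', 'C']
```

Because only the top uploaders are served, a peer that uploads more climbs into more neighbors' top-four lists, which is the incentive the slide describes.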
P2P Case study: Skype
- Inherently P2P: pairs of users communicate.
- Proprietary application-layer protocol (inferred via reverse engineering)
- Hierarchical overlay with super nodes (SNs)
- Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC), supernodes (SN), and the Skype login server]
Peers as relays
Problem: when both Alice and Bob are behind “NATs”, the NAT prevents an outside peer from initiating a call to an inside peer.
Solution:
- Using Alice’s and Bob’s SNs, a relay is chosen.
- Each peer initiates a session with the relay.
- The peers can now communicate through the NATs via the relay.
Why Content Networks?
- More hops between client and Web server mean more congestion!
- The same data flows repeatedly over the links between clients and the Web server.
[Figure: server S and clients C1 to C4 connected through IP routers]
Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt
Why Content Networks?
- The origin server is a bottleneck as the number of users grows
- Flash crowds (for instance, Sept. 11)
- The Content Distribution Problem: arrange a rendezvous between a content source at the origin server (www.cnn.com) and a content sink (us, as users)
Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt
Example: Web Server Farm
- Simple solution to the content distribution problem: deploy a large group of servers
- Arbitrate client requests to servers using an “intelligent” L4-L7 switch
- Pretty widely used today
[Figure: an L4-L7 switch spreads requests from grad.umd.edu and ren.cis.udel.edu across three copies of www.cnn.com]
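The slide does not say how the switch picks a server. As one illustration, here is a sketch assuming a least-connections policy; the class name and replica labels are hypothetical, and real L4-L7 switches support several other policies.

```python
class L4L7Switch:
    """Hypothetical front-end switch: least-connections choice among replicas."""

    def __init__(self, replicas: list[str]) -> None:
        self.active = {r: 0 for r in replicas}   # open connections per replica

    def assign(self) -> str:
        # Pick the replica with the fewest open connections (ties: by name)
        replica = min(self.active, key=lambda r: (self.active[r], r))
        self.active[replica] += 1
        return replica

    def finish(self, replica: str) -> None:
        self.active[replica] -= 1                # connection closed


# copy1..copy3 stand for the three copies of www.cnn.com
switch = L4L7Switch(["copy1", "copy2", "copy3"])
first = [switch.assign() for _ in range(3)]
switch.finish("copy2")
print(first, switch.assign())
```

Note that this balances load only inside the farm; the requests still cross the same wide-area links to reach it.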
Example: Caching Proxy
- Mainly motivated by ISP business interests: reducing the ISP’s bandwidth consumption from the Internet
- Reduced network traffic
- Reduced user-perceived latency
[Figure: within the ISP, interceptors redirect TCP port 80 traffic from clients ren.cis.udel.edu and merlot.cis.udel.edu to a proxy, which fetches www.cnn.com over the Internet; other traffic bypasses the proxy]
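A minimal caching-proxy sketch, assuming a fixed time-to-live per cached object. The class name, the 60-second TTL, and the fake origin and clock are hypothetical stand-ins; they are there so the example runs without a network.

```python
import time
from typing import Callable


class CachingProxy:
    """Hypothetical ISP proxy cache with a fixed time-to-live per object."""

    def __init__(self, fetch_from_origin: Callable[[str], str],
                 ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch_from_origin
        self.ttl_s = ttl_s
        self.clock = clock
        self.cache: dict[str, tuple[float, str]] = {}  # url -> (expiry, body)

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.cache.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]              # hit: the origin never sees this request
        body = self.fetch(url)           # miss or expired: go to the origin
        self.cache[url] = (now + self.ttl_s, body)
        return body


origin_hits = []


def origin(url: str) -> str:
    origin_hits.append(url)              # what the origin server can count
    return f"content of {url}"


fake_time = [0.0]
proxy = CachingProxy(origin, ttl_s=60.0, clock=lambda: fake_time[0])
proxy.get("http://www.cnn.com/")
proxy.get("http://www.cnn.com/")         # served from cache
fake_time[0] = 61.0
proxy.get("http://www.cnn.com/")         # expired, refetched
print(len(origin_hits))  # 2
```

Three clients were served, but the origin saw only two requests; until an entry expires, the proxy also keeps serving whatever version it cached.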
But on Sept. 11, 2001
[Figure: user mslab.kaist.ac.kr and 1,000,000 other hosts request the new content (WTC news) from Web server www.cnn.com; the ISP’s caching proxies hold only old content, so the requests converge on the origin server, causing congestion and a bottleneck]
Problems with the discussed approaches (server farms and caching proxies):
- Server farms do nothing about problems due to network congestion.
- Caching proxies serve only their clients, not all users on the Internet.
- Content providers (say, Web servers) cannot rely on the existence and correct implementation of caching proxies.
- Accounting issues with caching proxies: for instance, www.cnn.com needs to know the number of hits to the webpage for the advertisements displayed on it.
Again on Sept. 11, 2001, with a CDN
[Figure: user mslab.kaist.ac.kr and 1,000,000 other users request the new content (WTC news) from nearby surrogates in FL, IL, DE, NY, MA, MI, CA, and WA; the distribution infrastructure replicates the content from Web server www.cnn.com to the surrogates]
Web replication: CDNs
- Overlay network to distribute content from origin servers to users
- Avoids large amounts of the same data repeatedly traversing potentially congested links on the Internet
- Reduces Web server load
- Reduces user-perceived latency
- Tries to route around congested networks
CDN vs. Caching Proxies
- Caches are used by ISPs to reduce bandwidth consumption; CDNs are used by content providers to improve quality of service to end users.
- Caches are reactive; CDNs are proactive.
- Caching proxies cater to their users (Web clients), not to content providers (Web servers); CDNs cater to both content providers (Web servers) and clients.
- CDNs give content providers control over their content; caching proxies do not.
CDN Architecture
[Figure: inside the CDN, a request routing infrastructure, a distribution & accounting infrastructure, and surrogates sit between the origin server and the clients]
CDN Components
- Distribution Infrastructure: moving or replicating content from the content source (origin server, content provider) to surrogates
- Request Routing Infrastructure: steering or directing a content request from a client to a suitable surrogate
- Content Delivery Infrastructure: delivering content to clients from surrogates
- Accounting Infrastructure: logging and reporting of distribution and delivery activities
Server Interaction with CDN
1. Distribution infrastructure: the origin server pushes new content to the CDN, OR the CDN pulls content from the origin server.
2. Accounting infrastructure: the origin server requests logs and other accounting info from the CDN, OR the CDN provides logs and other accounting info to the origin server.
[Figure: origin server www.cnn.com exchanging content and accounting data with the CDN]
Client Interaction with CDN
1. Hi! I need www.cnn.com/sept11
2. Go to surrogate newyork.cnn.akamai.com
3. Hi! I need content /sept11
[Figure: the client asks the CDN’s request routing infrastructure (1), is directed to the New York surrogate newyork.cnn.akamai.com (2), and requests the content from it (3); a California surrogate, california.cnn.akamai.com, is also available]
Q: How did the CDN choose the New York surrogate over the California surrogate?
Request Routing Techniques
- Request routing techniques use a set of metrics to direct users to the “best” surrogate.
- They are proprietary, but the underlying techniques are known:
  - DNS based request routing
  - Content modification (URL rewriting)
  - Anycast based (how common is anycast?)
  - URL based request routing
  - Transport layer request routing
  - Combination of multiple mechanisms
DNS based Request-Routing
- Common due to the ubiquity of DNS as a directory service
- A specialized DNS server is inserted in the DNS resolution process
- The DNS server is capable of returning a different set of A, NS, or CNAME records based on policies/metrics
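A sketch of the simplest kind of policy such a DNS server could apply: choosing the A record by the address of the resolver that asked. The policy table, the default surrogate, and the function name are hypothetical. Mapping the NYU prefix to 145.155.10.15 mirrors the answer in the example that follows; 198.51.100.7 is a documentation address.

```python
import ipaddress

# Hypothetical policy table: resolver address prefix -> surrogate A record
POLICY = [
    (ipaddress.ip_network("128.4.0.0/16"), "145.155.10.15"),
]
DEFAULT_SURROGATE = "58.15.100.152"


def answer_a_query(qname: str, resolver_ip: str,
                   ttl_s: int = 10) -> tuple[str, str, str, int]:
    """Return (name, type, address, TTL), chosen by the querying resolver's address."""
    resolver = ipaddress.ip_address(resolver_ip)
    for prefix, surrogate in POLICY:
        if resolver in prefix:
            return (qname, "A", surrogate, ttl_s)
    return (qname, "A", DEFAULT_SURROGATE, ttl_s)


print(answer_a_query("www.cnn.com", "128.4.4.12"))     # dns.nyu.edu asking
print(answer_a_query("www.cnn.com", "198.51.100.7"))   # some other resolver
```

The server only sees the resolver's address, never the end client's. This is the root of the originator and masking problems discussed later.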
DNS based Request-Routing
[Figure: client test.nyu.edu (128.4.30.15) sends (1) a DNS query for www.cnn.com to its local DNS server dns.nyu.edu (128.4.4.12), which queries the Akamai DNS. The Akamai DNS answers “A 145.155.10.15”, and the client opens a session with surrogate 145.155.10.15. The Akamai CDN’s surrogates are at 145.155.10.15 and 58.15.100.152 (newyork.cnn.akamai.com, california.cnn.akamai.com)]
Q: How does the Akamai DNS know which surrogate is closest?
DNS based Request-Routing
[Figure: on a DNS query from the local DNS server dns.nyu.edu (128.4.4.12), which serves test.nyu.edu (128.4.30.15), the surrogates measure their paths to the client’s DNS server and send the measurement results to the Akamai DNS]
DNS based Request-Routing
[Figure: client 76.43.35.53 uses client DNS server 76.43.32.4 to resolve www.cnn.com. Surrogates 145.155.10.15 and 58.15.100.152 report measurements to the Akamai DNS for requesting DNS 76.43.32.4: available bandwidth = 10 kbps, RTT = 10 ms; and available bandwidth = 5 kbps, RTT = 100 ms. The Akamai DNS maps requesting DNS 76.43.32.4 to surrogate 145.155.10.15 and answers: www.cnn.com A 145.155.10.15, TTL = 10s]
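A sketch of measurement-based selection for the example above. The slide does not say which surrogate produced which measurement, or how the metrics are combined. Pairing the better numbers with 145.155.10.15 (the address actually returned) and ranking by RTT and then bandwidth are both assumptions. `Report`, `choose`, and `dns_answer` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Report:
    surrogate: str          # surrogate that measured the path
    bandwidth_kbps: float   # available bandwidth to the requesting DNS server
    rtt_ms: float           # round-trip time to the requesting DNS server


def choose(reports: list[Report]) -> str:
    # Assumed ranking: lowest RTT first, then highest available bandwidth
    best = min(reports, key=lambda r: (r.rtt_ms, -r.bandwidth_kbps))
    return best.surrogate


def dns_answer(qname: str, reports: list[Report], ttl_s: int = 10) -> str:
    return f"{qname} A {choose(reports)} TTL = {ttl_s}s"


reports = [
    Report("145.155.10.15", bandwidth_kbps=10, rtt_ms=10),
    Report("58.15.100.152", bandwidth_kbps=5, rtt_ms=100),
]
print(dns_answer("www.cnn.com", reports))  # www.cnn.com A 145.155.10.15 TTL = 10s
```

A short TTL such as 10 s forces resolvers to ask again soon, so the mapping can follow changing measurements, at the cost of more DNS traffic.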
DNS based Request Routing: Discussion
- Originator Problem: the client may be far removed from its client DNS server.
- Client DNS Masking Problem: virtually all DNS servers, except for root DNS servers, honor requests for recursion.
  Q: Which DNS server resolves a request for test.nyu.edu?
  Q: Which DNS server performs the last recursion of the DNS request?
- Hidden Load Factor: a single DNS resolution may result in drastically different load on the selected surrogate, which complicates load balancing requests and predicting load on surrogates.
Summary
- P2P architecture and its benefits
- P2P content distribution: BitTorrent, Skype
- Content distribution network (CDN): DNS-based request routing
Distributed Hash Table (DHT)
- DHT = distributed P2P database
- The database has (key, value) pairs, e.g.:
  - key: SS number; value: human name
  - key: content type; value: IP address
- Peers query the DB with a key; the DB returns the values that match the key
- Peers can also insert (key, value) pairs
DHT Identifiers
- Assign an integer identifier to each peer in the range $[0, 2^n - 1]$. Each identifier can be represented by n bits.
- Require each key to be an integer in the same range.
- To get integer keys, hash the original key, e.g., key = h(“Led Zeppelin IV”). This is why it is called a distributed “hash” table.
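A minimal sketch of turning an arbitrary key into an n-bit identifier. The lecture only writes h(·). Using SHA-1 from Python's standard `hashlib` and reducing modulo $2^n$ are assumptions for illustration.

```python
import hashlib


def dht_key(name: str, n_bits: int) -> int:
    """Map an arbitrary key to an integer in [0, 2**n_bits - 1]."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()   # 160-bit hash
    return int.from_bytes(digest, "big") % (2 ** n_bits)


print(dht_key("Led Zeppelin IV", n_bits=4))
```

Every peer computes the same identifier for the same name, so a peer can locate a key without any central index.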
How to assign keys to peers?
- Central issue: assigning (key, value) pairs to peers.
- Rule: assign a key to the peer that has the closest ID.
- Convention in lecture: closest is the immediate successor of the key.
- Ex: n = 4; peers: 1, 3, 4, 5, 8, 10, 12, 14
  - key = 13: successor peer = 14
  - key = 15: successor peer = 1 (wrapping around)
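The successor rule from the example, sketched with Python's `bisect` module. The function name is illustrative. Wrapping past the largest ID back to the smallest is what makes key 15 land on peer 1.

```python
from bisect import bisect_left


def responsible_peer(key: int, peer_ids: list[int], n_bits: int) -> int:
    """Immediate successor of key on the ring of 2**n_bits identifiers."""
    ring = sorted(peer_ids)
    i = bisect_left(ring, key % (2 ** n_bits))   # first peer ID >= key
    return ring[i % len(ring)]                   # past the largest ID: wrap to smallest


peers = [1, 3, 4, 5, 8, 10, 12, 14]
print(responsible_peer(13, peers, n_bits=4))  # 14
print(responsible_peer(15, peers, n_bits=4))  # 1
```

A key equal to a peer ID (e.g., 12) is assigned to that peer itself.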
Chord (a circular DHT) (1)
[Figure: peers 1, 3, 4, 5, 8, 10, 12, 15 arranged on a ring]
- Each peer is only aware of its immediate successor and predecessor.
- The resulting ring of links is an “overlay network”.
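Chord (a circular DHT) (2)
[Figure: ring of peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111; the query “Who’s responsible for key 1110?” is passed from successor to successor (6 messages) until peer 1111 replies “I am”]
- Define closest as the closest successor.
- O(N) messages on average to resolve a query, when there are N peers.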
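A sketch of the successor-only lookup on the ring from the figure. The starting peer (3) is an assumption, chosen because it reproduces the 6 forwarding messages shown. A "message" here is one forward to the next peer; the final “I am” reply is not counted.

```python
PEERS = [1, 3, 4, 5, 8, 10, 12, 15]   # sorted peer IDs on the ring


def next_peer(p: int) -> int:
    return PEERS[(PEERS.index(p) + 1) % len(PEERS)]


def prev_peer(p: int) -> int:
    return PEERS[PEERS.index(p) - 1]      # index -1 wraps to the largest ID


def owns(p: int, key: int) -> bool:
    """Peer p is responsible for keys in (predecessor, p], wrapping past 0."""
    pred = prev_peer(p)
    if pred < p:
        return pred < key <= p
    return key > pred or key <= p         # p has the smallest ID


def ring_lookup(start: int, key: int) -> tuple[int, int]:
    """Pass the query successor to successor; return (owner, messages)."""
    current, messages = start, 0
    while not owns(current, key):
        current = next_peer(current)
        messages += 1
    return current, messages


print(ring_lookup(3, 0b1110))  # (15, 6)
```

In the worst case the query travels almost all the way around the ring, which is where the O(N) cost comes from.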
Chord (a circular DHT) with Shortcuts
- Each peer keeps track of the IP addresses of its predecessor, its successor, and its shortcuts.
- In the example, the query drops from 6 to 2 messages.
- It is possible to design shortcuts so that each peer has O(log N) neighbors and a query takes O(log N) messages.
[Figure: the ring of peers 1, 3, 4, 5, 8, 10, 12, 15 with shortcut links; “Who’s responsible for key 1110?”]
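The slide does not say how the shortcuts are chosen. This sketch assumes Chord-style fingers, where shortcut k of peer p points to successor(p + 2^k), and greedy forwarding to the farthest shortcut that does not pass the key. Starting at peer 3 is again an assumption; with it, the query takes the 2 messages shown. All names are illustrative.

```python
from bisect import bisect_left

N_BITS = 4
RING_SIZE = 2 ** N_BITS
PEERS = [1, 3, 4, 5, 8, 10, 12, 15]   # sorted peer IDs from the figure


def successor(ident: int) -> int:
    """First peer ID >= ident, wrapping around the ring."""
    i = bisect_left(PEERS, ident % RING_SIZE)
    return PEERS[i % len(PEERS)]


def in_open(x: int, a: int, b: int) -> bool:
    """x strictly between a and b, going clockwise."""
    if a < b:
        return a < x < b
    if a > b:
        return x > a or x < b
    return x != a


def in_half_open(x: int, a: int, b: int) -> bool:
    """x in (a, b], going clockwise."""
    if a < b:
        return a < x <= b
    if a > b:
        return x > a or x <= b
    return True


def shortcuts(p: int) -> list[int]:
    """Chord-style fingers: successor(p + 2**k) for k = 0 .. N_BITS-1."""
    return [successor(p + 2 ** k) for k in range(N_BITS)]


def shortcut_lookup(start: int, key: int) -> tuple[int, int]:
    """Return (responsible peer, number of forwarding messages)."""
    current, messages = start, 0
    while True:
        nxt = successor(current + 1)          # immediate successor
        if in_half_open(key, current, nxt):
            return nxt, messages + 1          # final hop to the owner
        # Forward to the farthest shortcut that does not pass the key
        for f in reversed(shortcuts(current)):
            if in_open(f, current, key):
                current = f
                break
        messages += 1


print(shortcut_lookup(3, 0b1110))  # (15, 2)
```

Peer 3's farthest useful shortcut is 12, which is one hop from the owner 15. Because shortcuts are spaced at powers of two, each jump roughly halves the remaining distance.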
Peer Churn
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]
- To handle peer churn, require each peer to know the IP addresses of its two successors.
- Each peer periodically pings its two successors to see if they are still alive.
Example: peer 5 abruptly leaves. Peer 4 detects this and makes 8 its immediate successor. It then asks 8 who its immediate successor is and makes 8’s immediate successor its second successor.
What if peer 13 wants to join?
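A sketch of the two-successor repair for the “peer 5 leaves” example. A boolean `alive` flag stands in for a failed ping. `Peer`, `build_ring`, and `check_successors` are hypothetical names, and joins such as peer 13's are not modeled.

```python
from typing import Optional


class Peer:
    def __init__(self, ident: int) -> None:
        self.ident = ident
        self.alive = True                     # stands in for "answers pings"
        self.succ1: Optional["Peer"] = None   # immediate successor
        self.succ2: Optional["Peer"] = None   # second successor


def build_ring(ids: list[int]) -> dict[int, Peer]:
    ids = sorted(ids)
    peers = {i: Peer(i) for i in ids}
    for k, i in enumerate(ids):
        peers[i].succ1 = peers[ids[(k + 1) % len(ids)]]
        peers[i].succ2 = peers[ids[(k + 2) % len(ids)]]
    return peers


def check_successors(peer: Peer) -> None:
    """Periodic ping of the first successor; repair the list if it is gone."""
    if not peer.succ1.alive:
        peer.succ1 = peer.succ2          # second successor becomes the first
    # Ask the first successor who its immediate successor is
    peer.succ2 = peer.succ1.succ1


ring = build_ring([1, 3, 4, 5, 8, 10, 12, 15])
ring[5].alive = False                    # peer 5 abruptly leaves
check_successors(ring[4])
print(ring[4].succ1.ident, ring[4].succ2.ident)  # 8 10
```

Keeping two successors lets the ring survive one failure between checks. If both successors fail before the next ping, this simple sketch cannot recover.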