1
On the Placement of Web Server Replicas
Lili Qiu, Microsoft Research
Venkata N. Padmanabhan, Microsoft Research
Geoffrey M. Voelker, UCSD
IEEE INFOCOM’2001, Anchorage, AK, April 2001
2
Outline
  Overview
  Related work
  Our approach
  Simulation methodology & results
  Summary
3
Motivation
  Growing interest in Web server replicas
    Exponential growth in Web usage
    Content providers want to offer better service at lower cost
    Solution: replication
  Forms of Web server replicas
    Mirror sites
    Content Distribution Networks (CDNs)
      CDN: a network of servers
      Examples: Akamai, Digital Island
[Figure: Clients, the Internet, and the Content Providers' replicas]
4
Placement of Web Server Replicas
  Problem specification
    Among a set of N potential sites, pick K sites as replicas to minimize users’ latency or bandwidth usage
[Figure: Clients, the Internet, and Content Providers]
5
Related Work
  Placement of Web proxies [LGI+99]
  Cache location [KRS00]
  Placement of Internet instrumentation [JJJ+00]
6
Our Approach
  Model the Internet as a graph
  Parameterize the graph using measured inputs
    # requests generated from each region
    Distance between different regions
  Map the placement problem onto a graph optimization problem
    Assumption: Each client uses a single replica that is closest to it
  Solve the graph optimization problem
    Using various approximation algorithms
7
Minimum K-median Problem
Given a complete graph G=(V,E), d(j), c(i,j)
  d(j): # requests
  c(i,j): distance between nodes i and j
    Latency or hop counts or other metric to be optimized
Find a subset $V' \subseteq V$ with $|V'| = K$ s.t. it minimizes
  $\sum_{v \in V} \min_{w \in V'} d(v)\, c(v,w)$
NP-hard problem
[Figure: example weighted graph]
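The objective on this slide can be evaluated directly for any candidate set of replicas. The following is a minimal sketch, assuming `demand[v]` plays the role of d(v) and `dist[v][w]` the role of c(v,w); the function names and the four-node instance are hypothetical and only illustrate the definition. The exhaustive search is practical only for tiny inputs, since the problem is NP-hard.

```python
from itertools import combinations

def placement_cost(demand, dist, replicas):
    """Sum over all nodes v of d(v) * c(v, w) for the closest replica w."""
    return sum(demand[v] * min(dist[v][w] for w in replicas)
               for v in range(len(demand)))

def brute_force_k_median(demand, dist, k):
    """Try every K-subset of nodes; exponential, so only for toy graphs."""
    nodes = range(len(demand))
    return min(combinations(nodes, k),
               key=lambda subset: placement_cost(demand, dist, subset))

# Hypothetical instance: four nodes on a line at positions 0, 1, 2, 10.
positions = [0, 1, 2, 10]
demand = [5, 1, 1, 4]  # d(v): requests generated at each node
dist = [[abs(a - b) for b in positions] for a in positions]  # c(v, w)

best = brute_force_k_median(demand, dist, 2)
print(best, placement_cost(demand, dist, best))
```

On this instance the best pair is the two heavily loaded end nodes, which leaves only the two light nodes in the middle to pay a non-zero distance.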
8
Placement Algorithms
  Tree-based algorithm [LGI+99]
    Assume the underlying topologies are trees, and model it as a dynamic programming problem
    $O(N^3 M^2)$ for choosing M replicas among N potential places
  Random
    Pick the best among several random assignments
  Hot spot
    Place replicas near the clients that generate the largest load
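The two simplest heuristics above can be sketched in a few lines. This is an illustrative sketch, not the authors' code. The names `random_placement` and `hot_spot_placement`, the `trials` and `seed` parameters, and the toy data are all assumptions. The hot-spot variant here places replicas exactly at the K most loaded nodes, which is a simplification of "near the clients that generate the largest load".

```python
import random

def placement_cost(demand, dist, replicas):
    """Total cost when each node is served by its closest replica."""
    return sum(d * min(dist[v][w] for w in replicas)
               for v, d in enumerate(demand))

def random_placement(demand, dist, k, trials=10, seed=0):
    """Draw several random K-subsets and keep the cheapest one."""
    rng = random.Random(seed)
    nodes = range(len(demand))
    candidates = [tuple(sorted(rng.sample(nodes, k))) for _ in range(trials)]
    return min(candidates, key=lambda r: placement_cost(demand, dist, r))

def hot_spot_placement(demand, k):
    """Simplified hot spot: put replicas at the K nodes with the most requests."""
    ranked = sorted(range(len(demand)), key=lambda v: demand[v], reverse=True)
    return tuple(sorted(ranked[:k]))

# Hypothetical instance: four nodes on a line at positions 0, 1, 2, 10.
positions = [0, 1, 2, 10]
demand = [5, 1, 1, 4]
dist = [[abs(a - b) for b in positions] for a in positions]

random_pick = random_placement(demand, dist, 2)
hot_pick = hot_spot_placement(demand, 2)
```

Neither heuristic looks at how replicas interact: random ignores the input entirely until the final comparison, and hot spot ignores distances.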
9
Placement Algorithms (Cont.)
  Greedy algorithm
    Calculate costs of assigning clients to replicas
    Select replica with lowest cost
    Adjust costs based upon assignment, repeat until done
  Super-Optimal algorithm
    Lagrangian relaxation + subgradient method
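The greedy steps listed above can be sketched as follows. This is a minimal illustration under the same conventions as the K-median definition (`demand[v]` for d(v), `dist[v][w]` for c(v,w)); the function name and the toy instance are hypothetical, and the slides do not specify tie-breaking, so ties here go to the lowest node index.

```python
def greedy_placement(demand, dist, k):
    """Pick replicas one at a time, each time adding the site that
    yields the lowest total cost given the sites already chosen."""
    n = len(demand)
    chosen = []
    # nearest[v]: distance from v to its closest replica chosen so far
    nearest = [float("inf")] * n
    for _ in range(k):
        best_site, best_cost = None, float("inf")
        for site in range(n):
            if site in chosen:
                continue
            # Cost if `site` were added to the current replica set
            cost = sum(demand[v] * min(nearest[v], dist[v][site])
                       for v in range(n))
            if cost < best_cost:
                best_site, best_cost = site, cost
        chosen.append(best_site)
        # Adjust per-node costs to reflect the new assignment
        nearest = [min(nearest[v], dist[v][best_site]) for v in range(n)]
    return chosen

# Hypothetical instance: four nodes on a line at positions 0, 1, 2, 10.
positions = [0, 1, 2, 10]
demand = [5, 1, 1, 4]
dist = [[abs(a - b) for b in positions] for a in positions]

picked = greedy_placement(demand, dist, 2)
print(picked)
```

On this instance greedy first picks node 1, the best single site, and then node 3, for a total cost of 6. The best pair is nodes 0 and 3 with cost 3, so the example also shows that an early greedy choice can be suboptimal.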
10
Simulation Methodology
  Network topology
    Randomly generated topologies
      Using GT-ITM Internet topology generator
    Real Internet network topology
      AS level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers
  Web Workload
    Real server traces
      MSNBC, ClarkNet, NASA Kennedy Space Center
  Performance Metric
    Relative performance: $\mathrm{cost}_{\mathrm{practical}} / \mathrm{cost}_{\mathrm{super\text{-}optimal}}$
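The denominator of this metric comes from the super-optimal algorithm (Lagrangian relaxation + subgradient method), whose formulation the slides do not give. The sketch below assumes the textbook relaxation of the K-median problem, in which the constraint "every client is assigned to exactly one replica" is moved into the objective with one multiplier per client. Under that assumption, any multiplier vector yields a value no larger than the optimal cost. The function name, the `iterations` and `step` parameters, the diminishing step rule, and the toy data are all hypothetical.

```python
def lagrangian_lower_bound(demand, dist, k, iterations=200, step=1.0):
    """Lower bound on the K-median optimum via Lagrangian relaxation."""
    n = len(demand)
    # w[i][j]: cost of serving client j from site i
    w = [[demand[j] * dist[j][i] for j in range(n)] for i in range(n)]
    lam = [0.0] * n  # one multiplier per client
    best = float("-inf")
    for t in range(1, iterations + 1):
        # reduced[i]: total negative reduced cost gained by opening site i
        reduced = [sum(min(0.0, w[i][j] - lam[j]) for j in range(n))
                   for i in range(n)]
        opened = sorted(range(n), key=lambda i: reduced[i])[:k]
        bound = sum(lam) + sum(reduced[i] for i in opened)
        best = max(best, bound)
        # Subgradient: 1 minus the number of opened sites serving client j
        grad = [1 - sum(1 for i in opened if w[i][j] - lam[j] < 0)
                for j in range(n)]
        lam = [lam[j] + (step / t) * grad[j] for j in range(n)]
    return best

# Hypothetical instance: four nodes on a line at positions 0, 1, 2, 10.
positions = [0, 1, 2, 10]
demand = [5, 1, 1, 4]
dist = [[abs(a - b) for b in positions] for a in positions]

lower = lagrangian_lower_bound(demand, dist, 2)
practical = 6  # cost of replicas at nodes 1 and 3 on this instance
relative_performance = practical / lower
```

Because the bound never exceeds the optimum, the ratio is at least 1, and a heuristic that scores close to 1 is close to optimal even though the optimum itself was never computed.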
11
Simulation Methodology (Cont.)
Simulate a network of N nodes ($100 \le N \le 3000$)
Cluster clients using network aware clustering [KW00]
  IP addresses with the same address prefix belong to a cluster
A small number of popular clusters account for most requests
  Top 10, 100, 1000, 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests respectively
Pick the top N clusters
Map them to different nodes
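The clustering and top-N selection steps can be illustrated as follows. This is a rough sketch, assuming a table of address prefixes is already available and that each client is assigned to its longest matching prefix; the prefix list, the client addresses, and the function names are made up for the example, and the details of [KW00] are not reproduced here.

```python
import ipaddress
from collections import Counter

def cluster_requests(client_ips, prefixes):
    """Count requests per cluster, where a cluster is the longest
    matching prefix for the client's IP address."""
    nets = sorted((ipaddress.ip_network(p) for p in prefixes),
                  key=lambda net: net.prefixlen, reverse=True)
    counts = Counter()
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        for net in nets:  # most specific prefixes are checked first
            if addr in net:
                counts[str(net)] += 1
                break
    return counts

def top_clusters(counts, n):
    """Return the n clusters that generate the most requests."""
    return [prefix for prefix, _ in counts.most_common(n)]

# Hypothetical prefix table and request log (one entry per request).
prefixes = ["10.0.0.0/8", "10.1.0.0/16", "192.168.0.0/16"]
requests = ["10.1.2.3", "10.1.9.9", "10.2.0.1", "192.168.1.1", "10.1.0.5"]

counts = cluster_requests(requests, prefixes)
popular = top_clusters(counts, 1)
```

Each selected cluster would then be mapped to one node of the simulated topology, with its request count serving as that node's demand.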
12
Simulation Methodology (Cont.)
  Random trees
  Random graphs
  AS-level topologies
  Sensitivity to the error in the input
13
Random Tree Topologies
The tree-based algorithm performs well, as expected. The greedy algorithm performs equally well.
14
Random Graph Topologies
The greedy and hot-spot algorithms outperform the tree-based algorithm.
15
Large Random Graph Topologies
The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
16
AS-level Internet Topologies
The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
17
Effects of Imperfect Knowledge about Input Data
Predicted workload (using moving window average)
Perfect topology information
Within 5% degradation when using predicted workload
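A moving window average predicts the next period's load of a cluster as the mean of its most recent observations. The sketch below is a minimal illustration; the window length, the cluster names, and the request counts are hypothetical, since the slides do not state them.

```python
def moving_window_average(history, window):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def predict_demand(histories, window):
    """Apply the predictor to every cluster's request history."""
    return {cluster: moving_window_average(history, window)
            for cluster, history in histories.items()}

# Hypothetical per-period request counts for two client clusters.
histories = {
    "cluster-a": [100, 120, 110, 130],
    "cluster-b": [10, 0, 20, 30],
}
predicted = predict_demand(histories, window=3)
print(predicted)
```

The predicted values would replace the true request counts as input to the placement algorithm, which is the setting evaluated on this slide.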
18
Effects of Imperfect Knowledge about Input Data (Cont.)
Predicted workload (using moving window average)
Noisy topology information
Perturb the distance between two nodes i and j by up to a factor of 2
Within 15% degradation when using predicted workload and noisy topology information
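One way to produce noisy topology information of this kind is sketched below. The slides say only that distances are perturbed "by up to a factor of 2", so the noise model is an assumption: each pairwise distance is multiplied by a random factor between 1/2 and 2, drawn uniformly on a logarithmic scale and applied symmetrically. The function name and parameters are hypothetical.

```python
import random

def perturb_distances(dist, max_factor=2.0, seed=0):
    """Return a copy of the distance matrix with each pairwise distance
    scaled by a random factor in [1/max_factor, max_factor]."""
    rng = random.Random(seed)
    n = len(dist)
    noisy = [row[:] for row in dist]
    for i in range(n):
        for j in range(i + 1, n):
            # Uniform on a log scale, so halving and doubling are equally likely
            factor = max_factor ** rng.uniform(-1.0, 1.0)
            noisy[i][j] = noisy[j][i] = dist[i][j] * factor
    return noisy

# Hypothetical instance: four nodes on a line at positions 0, 1, 2, 10.
positions = [0, 1, 2, 10]
dist = [[abs(a - b) for b in positions] for a in positions]
noisy = perturb_distances(dist)
```

A placement is then computed from `noisy` but its cost is evaluated on the true `dist`, which measures how much the wrong input hurts.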
19
Summary
  One of the first experimental studies on placement of Web server replicas
  Knowledge about client workload and topology is needed for provisioning replicas
  The greedy algorithm performs very well
    Within a factor of 1.1 – 1.5 of the super-optimal
    Insensitive to noise
      Stays within a factor of 2 of the super-optimal when the salted error is a factor of 4
  The hot spot algorithm performs nearly as well
    Within a factor of 1.6 – 2 of the super-optimal
  Obtaining input data
    Moving window average for load prediction
    Using BGP router data to obtain topology information
20
Conclusion
  Recommend using the greedy algorithm for deciding the placement of Web server replicas
21
Acknowledgement
  Craig Labovitz
  Yin Zhang
  Ravi Kumar
22
Comments on greedy algorithm performance
  Worst-case performance: unbounded
  Bad example
    A full homogeneous binary tree with $n = 2^i$ leaves and n caches
    optimal cost = 0
    greedy cost = (n-1)*d
  However, the worst-case scenario seems unlikely to occur in real and random topologies
[Figure: binary tree example labeled with 0 and d]
23
Simulation Results in Random Tree Topologies
28
Simulation Results in Real Internet Topologies
29
Obtaining Input Data
  Workload
    The number of requests generated by popular client clusters
      Stable
      Placement algorithm can use moving window average for predicting load with negligible impact on performance
  Network topology
    Propagation delay
    Hop count
    AS hop count
    Internet weather map
30
Placement of Web Server Replicas
  Goal
    Placing K replicas to minimize users’ latency or bandwidth usage
  Minimum K-median problem
    Select K servers to minimize the sum of assignment costs
    NP-hard problem
[Figure: Clients, the Internet, and the Content Providers' replicas]