ZGP001 (zphddef.ppt - 07/15/03)
Performance Evaluation of URL Routing for
Content Distribution Networks
PhD defense by
Zornitza Genova Prodanoff
Committee Members:
Dr. K. J. Christensen (Major Professor)
Dr. M. Varanasi
Dr. R. Perez
Dr. Chari
Dr. Labrador
This material is based upon work funded by the National Science Foundation under grant no. 9875177
ZGP002
Acknowledgements
I would like to thank:
My major professor Dr. Ken Christensen,
My committee: Dr. Varanasi, Dr. Perez, Dr. Chari, and Dr. Labrador
Dr. Suen for his comments at my proposal defense
My colleagues: K. Yoshigoe, A. Aslam, G. Perrera, and J. Shahbazian
My family
• Motivation
• Problem and contributions
• URL Routing
• Improvements to URL routing
• Evaluation of URL signatures
• Evaluation of hashing for URL routing
• Summary
• List of my publications
ZGP003
Topics
ZGP004
Motivation
“…2.5 Billion Hours Spent Waiting on the Web in 1998.” - John Roth, chief executive of Nortel Networks at Telecom '99
ZGP005
Problem:
Excessive delay in the Internet caused by the inability to efficiently access distributed content in the Web
My contributions:
1) Architected a new URL router that uses HTTP redirection
2) Investigated new use of CRC32 for reducing the size of routing tables
3) Investigated a new self-adjusting hashing method for faster URL routing look-up
4) Performed the first queuing evaluation of hashing - effects of correlation discovered
Problem and contributions
ZGP007
• Next generation Internet - Content Distribution Networks (CDNs)
  - A CDN is an overlay network on the Internet
  - A CDN co-locates content throughout the world
• CDNs are of great commercial and research interest
  - $15 million in NSF funding for Web services research
  - Akamai is one major CDN provider
URL routing
ZGP008
URL routing continued
Global content distribution in a CDN
[Figure: Elements shown are clients, proxy cache, transparent cache, Internet, reverse cache, distributed server, and origin site. Example URLs shown: http://www.some.com/page, http://334.249.2.8/page, http://214.29.2.15/page]
ZGP009
URL routing continued
HTTP redirection in a CDN
[Figure: Elements shown are clients, proxy cache, URL router, reverse cache, distributed server, and origin site.]
(1) HTTP request and redirect
(2) HTTP re-request and response
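Step (1) can be sketched in a few lines. This is only an illustration, assuming the URL router answers with an HTTP 302 status and a Location header; the slides do not say which redirect status code is used, and the function name and example address are hypothetical.

```python
def build_redirect(location_url: str) -> bytes:
    """Build a minimal HTTP/1.1 302 response pointing the client at location_url."""
    lines = [
        "HTTP/1.1 302 Found",           # assumed status code (not stated in the slides)
        f"Location: {location_url}",    # where the client should re-request
        "Content-Length: 0",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Step (1): the router redirects. Step (2): the client re-requests at the new URL.
response = build_redirect("http://192.0.2.8/page")
```

The client then repeats its request against the URL in the Location header, which is step (2).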
ZGP010
URL routing continued
Architecture of a new URL router
[Figure: One-armed URL router attached to a Layer 3 switch; the network links carry HTTP requests and redirects.]
Routing Table
URL 1 | loc 1 (state), loc 2 (state), … loc M1 (state)
URL 2 | loc 1 (state), loc 2 (state), … loc M2 (state)
… | …
URL N | loc 1 (state), loc 2 (state), … loc MN (state)
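A routing table of this shape could be held in memory roughly as follows. This is a sketch under assumptions: the slides do not define what "state" contains, so a single load value is used here, and the type names, addresses, and selection rule (lowest load wins) are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Location(NamedTuple):
    address: str   # where a copy of the content lives
    load: float    # assumed meaning of "state": lower is better


# URL -> list of locations holding that URL, each with its state.
RoutingTable = Dict[str, List[Location]]


def select_location(table: RoutingTable, url: str) -> Optional[Location]:
    """Return the least-loaded location for url, or None if the URL is unknown."""
    locations = table.get(url)
    if not locations:
        return None
    return min(locations, key=lambda loc: loc.load)


table: RoutingTable = {
    "http://www.some.com/page": [
        Location("http://192.0.2.8/page", load=0.7),
        Location("http://198.51.100.15/page", load=0.2),
    ],
}
best = select_location(table, "http://www.some.com/page")
```

Here `best` is the second location, since its assumed load is lower.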
ZGP011
URL routing continued
Need to exchange routing tables (digesting)
Summary Cache [17]
– Use Bloom filters to “merge” routing (hash) tables
Bloom filter is probabilistic and does not support updates
- False positives if non-unique hashes
- Results in a “routing collision” in the context of URLs
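A Bloom filter digest can be sketched as below. This is a minimal illustration, assuming the hash functions are taken as 32-bit words of the MD5 digest of the URL; the class name, filter size, and number of hash functions are illustrative choices, not values from the slides.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int = 4):
        if not 1 <= num_hashes <= 4:
            raise ValueError("MD5 yields at most four 32-bit words")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url: str):
        # Split the 128-bit MD5 digest into 32-bit words, one per hash function.
        digest = hashlib.md5(url.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            word = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield word % self.num_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))


digest = BloomFilter(num_bits=1024)
digest.add("http://www.some.com/page")
```

A membership test on a URL that was never added can still return True when its bit positions were all set by other URLs, which is the "routing collision" case.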
ZGP012
URL routing continued
Need to do look-ups in routing tables
• Why use hashing?
  – Build routing tables as hash tables for efficient look-up
• Idea of self-adjusting hash
  – Most frequently used keys are closer to the head
    » If chained hashing: rearrange after key accesses
    » Transposition rule for lists [50], [7]
    » Move-to-front rule for lists [33]
• Review of H1 hashing [74]
  – Self-adjusting by using transposition
ZGP013
URL routing continued
Chained resolution of hash table collision
[Figure: Hash table with indices 0, 1, 2, …, m-1. Keys k0, k1, k2, …, kn-1 map to records r0, r1, r2, …, rn-1, with colliding records linked into a chain at their index. The hashing collision at index 0 causes the chain to be created.]
URL routing continued
ZGP014
C1. [Create lists] For i ← 0 to m-1 set LISTi ← NULL.
C2. [Hash] Set i ← h(KEY), j ← 0.
C3. [Is there a list?] If LISTi = NULL, go to C6.
C4. [Compare] If KEY = LISTi[j], terminate.
C5. [Advance to next] If LISTi[j] ≠ NULL, set j ← j+1 and go to step C4.
C6. [Insert new key] Set LISTi[j] ← KEY.
C4A. [Compare and transpose – H1 hashing, replaces C4] If KEY = LISTi[j]: if j > 0, swap LISTi[j] with LISTi[j-1]; terminate.
H1 and Simple hashing algorithms based on [37]
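The steps above can be sketched in Python. This is an illustrative reading of the algorithm, not the author's implementation: chains are Python lists, the class and parameter names are hypothetical, and the returned comparison count is an added convenience for measuring look-up cost.

```python
from typing import Callable, List


class ChainedHashTable:
    def __init__(self, m: int, hash_fn: Callable[[str], int] = hash,
                 transpose: bool = False):
        self.m = m
        self.hash_fn = hash_fn
        self.transpose = transpose                               # True selects H1 (step C4A)
        self.lists: List[List[str]] = [[] for _ in range(m)]     # C1

    def access(self, key: str) -> int:
        """Look up key, inserting it if absent; return the number of comparisons."""
        chain = self.lists[self.hash_fn(key) % self.m]           # C2
        for j, stored in enumerate(chain):                       # C4 and C5
            if stored == key:
                if self.transpose and j > 0:                     # C4A: swap with predecessor
                    chain[j - 1], chain[j] = chain[j], chain[j - 1]
                return j + 1
        chain.append(key)                                        # C6
        return len(chain) - 1


# Force every key into one chain to make the effect of transposition visible.
h1 = ChainedHashTable(m=4, hash_fn=lambda k: 0, transpose=True)
for k in ["a", "b", "c"]:
    h1.access(k)
cost = h1.access("c")   # found at position 2, then moved one step toward the head
```

With `transpose=False` the same class behaves as Simple hashing and never reorders a chain.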
ZGP015
URL routing continued
Now begin my contributions in digesting and hashing (and evaluation thereof)
ZGP017
Improvements to URL routing
Open problems
1) Select best source based on state (and location of client)
2) Reduce the size of the routing table to update/share
3) Perform fast routing look-ups
My problems
ZGP018
Improvements to URL routing continued
• My idea…
  – Use CRC32 for URL signatures
• CRC32 circuitry is already part of an Ethernet adapter
  – Serial shift-register with wrapped XOR terms
• Use to get CRC32 signatures for URL in HTTP request header
• Need to calculate a CRC32 over a subfield [53]
  – The subfield is the URL in an HTTP request header
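In software, a URL signature of this kind can be computed with the standard library. The sketch below assumes the signature is the plain CRC32 of the URL bytes as computed by `zlib.crc32`, which uses the same generator polynomial as Ethernet; the function names and the request line are hypothetical.

```python
import zlib


def url_signature(url: str) -> int:
    """Return the 32-bit CRC32 of the URL as an unsigned integer."""
    return zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF


def extract_url(request_line: str) -> str:
    """Pull the URL subfield out of a request line such as 'GET /page HTTP/1.1'."""
    method, url, version = request_line.split(" ", 2)
    return url


signature = url_signature(extract_url("GET /page HTTP/1.1"))
```

Each URL is thus reduced to 4 bytes regardless of its length.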
ZGP019
Improvements to URL routing continued
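Define the following:
– P is the CRC32 generator polynomial
– A_i, i = 1, …, m is a polynomial (bit sequence)
– We store in a table (for all possible M) the remainders
$$R_M = \mathrm{Rem}\left(\frac{2^M}{P}\right),$$
where M is the length of the subfield
[Figure: Packet layout: packet header (A_0), subfield (A_2), rest of packet; A_1 spans the packet header and the subfield.]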
ZGP020
Improvements to URL routing continued
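We have the following:
$$R_{A_0} = \mathrm{Rem}\left(\frac{A_0}{P}\right), \qquad R_{A_1} = \mathrm{Rem}\left(\frac{A_1}{P}\right)$$
Returned by adapter - from CRC32 shift register
What we want (CRC32 for subfield):
$$R_{A_2} = \mathrm{Rem}\left(\frac{A_2}{P}\right)$$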
ZGP021
Improvements to URL routing continued
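For $R_{A_i} = \mathrm{Rem}\left(\frac{A_i}{P}\right)$ the following properties apply:
$$\mathrm{Rem}\left(\frac{\sum_{i=1}^{m} A_i}{P}\right) = \mathrm{Rem}\left(\frac{\sum_{i=1}^{m} R_{A_i}}{P}\right)$$
$$\mathrm{Rem}\left(\frac{\prod_{i=1}^{m} A_i}{P}\right) = \mathrm{Rem}\left(\frac{\prod_{i=1}^{m} R_{A_i}}{P}\right)$$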
ZGP022
Improvements to URL routing continued
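Solve for R_{A_2} as follows…
Let A_3 be A_0 shifted left M bits.
Then
$$R_{A_3} = \mathrm{Rem}\left(\frac{A_0 \cdot 2^M}{P}\right) = \mathrm{Rem}\left(\frac{R_{A_0} \cdot R_M}{P}\right) \quad \text{(32-bit multiply)}$$
and
$$R_{A_2} = \mathrm{Rem}\left(\frac{A_1 + A_3}{P}\right) = \mathrm{Rem}\left(\frac{R_{A_1} + R_{A_3}}{P}\right).$$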
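The derivation can be checked numerically. The sketch below works with pure polynomial remainders over GF(2), where addition is XOR, and deliberately ignores the initial value, final XOR, and bit reflection that a real Ethernet CRC32 applies. The helper names and the sample header and subfield bytes are hypothetical.

```python
P = 0x104C11DB7  # CRC32 generator polynomial, including the x^32 term


def gf2_mod(a: int, p: int = P) -> int:
    """Remainder of polynomial a divided by p, with coefficients in GF(2)."""
    while a.bit_length() >= p.bit_length():
        a ^= p << (a.bit_length() - p.bit_length())
    return a


def gf2_mul(a: int, b: int) -> int:
    """Carry-less (polynomial) product of a and b."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    return product


def subfield_remainder(r_a0: int, r_a1: int, m_bits: int) -> int:
    """Recover R_A2 from R_A0, R_A1, and the subfield length M in bits."""
    r_m = gf2_mod(1 << m_bits)            # table entry R_M
    r_a3 = gf2_mod(gf2_mul(r_a0, r_m))    # the 32-bit multiply
    return gf2_mod(r_a1 ^ r_a3)           # addition over GF(2) is XOR


header = b"GET "
subfield = b"/page"
m_bits = 8 * len(subfield)
a0 = int.from_bytes(header, "big")
a2 = int.from_bytes(subfield, "big")
a1 = (a0 << m_bits) ^ a2                  # header followed by subfield
recovered = subfield_remainder(gf2_mod(a0), gf2_mod(a1), m_bits)
```

`recovered` equals the remainder computed directly over the subfield, so only the two remainders and the stored value of R_M are needed.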
ZGP023
Improvements to URL routing continued
• My idea…
  – Aggressive hashing to perform fast look-up
    » Self-adjusting chained collision resolution
    » Fast way to do hash table look-ups
    » Based on move-to-front rule for lists [33], [50]
Improvements to URL routing continued
The new Aggressive hashing algorithm
C1. [Create lists] For i ← 0 to m-1 set LISTi ← NULL.
C2. [Hash] Set i ← h(KEY), j ← 0.
C3. [Is there a list?] If LISTi = NULL, go to C6.
C4. [Compare] If KEY = LISTi[j], terminate.
C5. [Advance to next] If LISTi[j] ≠ NULL, set j ← j+1 and go to step C4.
C6. [Insert new key] Set LISTi[j] ← KEY.
C4B. [Compare and move-to-front – Aggressive hashing, replaces C4] If KEY = LISTi[j]: if j > 0, set TEMP ← LISTi[j], for k ← j down to 1 set LISTi[k] ← LISTi[k-1], and set LISTi[0] ← TEMP; terminate.
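The move-to-front step can be sketched for a single chain as follows. This is an illustrative reading, with the chain held as a Python list; the function name and the returned comparison count are assumptions added for the example.

```python
from typing import List


def access_move_to_front(chain: List[str], key: str) -> int:
    """Look up key in one chain; move it to the head on a hit, append on a miss.

    Returns the number of key comparisons performed.
    """
    for j, stored in enumerate(chain):
        if stored == key:
            if j > 0:
                # Entries 0..j-1 each shift one position toward the tail.
                chain.insert(0, chain.pop(j))
            return j + 1
    chain.append(key)
    return len(chain) - 1


chain = ["a", "b", "c", "d"]
cost_first = access_move_to_front(chain, "d")    # found at the tail
cost_second = access_move_to_front(chain, "d")   # now found at the head
```

A repeated request for the same key costs one comparison after the first hit, whereas transposition would only move the key one position per access.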
ZGP026
Evaluation of URL signatures
Evaluation done with trace-driven simulation
Response variables:
1) Probability of false hits due to signature collisions
2) CPU time required to generate URL signatures
3) Reduction in processing and memory resources for URL look-up
ZGP027
Evaluation of URL signatures continued
Input data used in the evaluation:
Obtained lists of URLs from 9 cache and server HTTP logs
– Access lists
– URL lists
– CRC32 lists
Unique URLs range from 70 to 2.5 million (1.5 to 146 MBytes)
Continuity of logs was in months
Full URL string or CRC32 signature lists were built (generated by me)
2.1 GBytes of ASCII format raw data was used
ZGP028
Evaluation of URL signatures continued
Input data characteristics
Access list name | Number accesses | Number URLs | Mean URL length (B) | Full URL list size (bytes) | CRC32 list size (bytes)
www.peak.org | 16,374 | 70 | 23.93 | 1,675 | 280
SDMA | 41,941 | 153 | 33.76 | 5,165 | 612
UVA | 318,899 | 45,816 | 44.91 | 2,057,625 | 183,264
NLANR | 944,028 | 504,967 | 58.44 | 29,510,135 | 2,019,868
UC Berkeley | 1,791,349 | 149,344 | 41.87 | 6,253,716 | 597,376
mcs.net | 1,862,070 | 75,361 | 29.87 | 2,250,829 | 301,444
hyperreal.org | 4,080,590 | 86,338 | 89.17 | 7,698,337 | 345,352
CA*netII | 4,642,861 | 2,552,045 | 57.83 | 147,573,556 | 10,208,184
USF CSEE | 8,819,454 | 49,029 | 51.84 | 2,541,483 | 196,116
ZGP029
Experiments on the performance of CRC32
• Experiment #1: Number of CRC collisions was measured
  – CRC32 generated for each URL
  – Non-unique CRC32s counted
• Experiment #2: Measured CPU time to generate CRC32 URL list
  – Software CRC generation (8-bit look-up coded in “C”)
• Experiment #3: Measured CPU time required for look-up
  – All entries from access list were looked up in URL list
  – URL list is a Simple chained hash table
Evaluation of URL signatures continued
ZGP030
Evaluation of URL signatures continued
Access list name | Collisions measured | Calculated value | Pr[collision] measured
www.peak.org | 0 | 0 | 0.0000000
SDMA | 0 | 0 | 0.0000000
UVA | 0 | 1 | 0.0000000
NLANR | 68 | 59 | 0.0001347
UC Berkeley | 2 | 5 | 0.0000134
mcs.net | 0 | 1 | 0.0000000
hyperreal.org | 2 | 2 | 0.0000463
CA*netII | 1558 | 1516 | 0.0006105
USF CSEE | 2 | 1 | 0.0000408
Results for experiment #1
Measured and theoretical are close
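A collision count of this kind, together with a theoretical estimate, can be sketched as follows. The slides do not state how the calculated values were obtained; the estimate below is an assumed birthday-style approximation (n(n-1)/2 pairs, each colliding with probability 2^-32, two URLs per colliding pair), which gives about 59 for 504,967 URLs and about 1516 for 2,552,045 URLs. The counting rule and function names are likewise assumptions.

```python
import zlib
from collections import Counter
from typing import Callable, Iterable


def crc32_signature(url: str) -> int:
    return zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF


def count_colliding_urls(urls: Iterable[str],
                         signature: Callable[[str], int] = crc32_signature) -> int:
    """Count unique URLs whose signature is shared with another unique URL."""
    counts = Counter(signature(u) for u in set(urls))
    return sum(c for c in counts.values() if c > 1)


def expected_colliding_urls(n: int, bits: int = 32) -> float:
    """Assumed approximation: expected number of URLs involved in a colliding pair."""
    return n * (n - 1) / 2 ** bits


estimate_nlanr = expected_colliding_urls(504_967)
estimate_canet = expected_colliding_urls(2_552_045)
```

Passing a different `signature` function makes it possible to compare CRC32 against other 32-bit summaries on the same URL list.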
ZGP031
Evaluation of URL signatures continued
Access list | Time for URL list (millisec) | Time for URL (µsec)
www.peak.org | <10 | --
SDMA | <10 | --
UVA | 40 | 0.8730
NLANR | 460 | 0.9109
UC Berkeley | 100 | 0.6695
mcs.net | 40 | 0.5307
hyperreal.org | 120 | 1.3897
CA*netII | 2390 | 0.9368
USF CSEE | 40 | 0.8158
Results for experiment #2
Time per URL string is small (µsec)
ZGP032
Evaluation of URL signatures continued
[Figure: Look-up time (sec) versus H value (10 to 22) for CRC32 URL signatures and full URL.]
Results for experiment #3
CRC32 URL signature is better
ZGP033
Evaluation of URL signatures continued
Experiments for CRC32 vs. MD5-Bloom filter digesting
• Experiment #1: Measured digest size and generation CPU time
  – MD5-Bloom filter
  – CRC32
  – 32-bit checksum
  – Lempel-Ziv (LZ) compression (used pkzip25)
• Experiment #2: Measured digest size and CPU time
  – MD5-Bloom
• Experiment #3: Measured collisions
  – Control variable is URL length
  – MD5-Bloom vs. CRC32
  – URL length is a maximum of 25, 30, …, 80 bytes
ZGP034
Evaluation of URL signatures continued
Experiments for CRC32 vs. MD5-Bloom filter digesting (continued)
• Experiment #4: Measured digest size of the hash chain method
  – Based on the number of components
  – Tree structure of 32 bits for a <depth, hash code> pair
ZGP035
Evaluation of URL signatures continued
Method (Load Factor) | CA*net list: CPU time (sec) | CA*net list: Size (Mbytes) | CA*net list: Collisions (%) | CSE list: CPU time (sec) | CSE list: Size (Mbytes) | CSE list: Collisions (%)
MD5-Bloom (8) | 89.13 | 9.74 | 0.03 | 1.63 | 0.19 | 0.00
CRC32 | 16.22 | 9.74 | 0.03 | 0.27 | 0.19 | 0.00
32-bit checksum | 14.85 | 9.74 | 0.71 | 0.24 | 0.19 | 0.22
LZ compression | 17.35 | 16.43 | 0.00 | 0.23 | 0.25 | 0.00
MD5-Bloom (8) | 89.13 | 9.74 | 0.03 | 1.63 | 0.19 | 0.00
MD5-Bloom (16) | 92.37 | 19.47 | 0.00 | 1.71 | 0.37 | 0.00
MD5-Bloom (32) | 97.40 | 38.94 | 0.00 | 1.84 | 0.75 | 0.00
Results for experiments #1 and #2
Similar CRC32 and Bloom filter collisions
ZGP036
Evaluation of URL signatures continued
[Figure: Collisions (%) versus URL length (25 to 75 bytes) for MD5-Bloom and CRC32.]
Results for experiment #3
Collisions are same for CRC32 and Bloom filter
ZGP037
Evaluation of URL signatures continued
Results from experiment #4
• Hash chaining results in digests that are on average 212% larger than CRC32
Substantially larger than the other methods
ZGP038
Evaluation of URL signatures continued
Discussion of results
• CRC32 URL signatures reduce the size of URL lists and speed up look-up in a hash table
  – Require less network bandwidth to transfer
  – Require less memory for storage in the URL router
• For CRC32 the number of collisions was found to be small
• CRC32 digests require less CPU and produce the same collisions
ZGP040
Evaluation of hashing for URL routing continued
Look-up time experiments:
• Experiment #1: Effect of hash table size on look-up time (NASA access list)
• Experiment #2: Effect of hash table size (K) on look-up time (Clark.net access list)
ZGP041
Hash table look-up time for experiment #1
Evaluation of hashing for URL routing continued
[Figure: Mean look-up time versus hash table size K (8 to 13) for Simple, H1, and Aggressive hashing.]
For dense hash tables Aggressive is better than H1
ZGP042
Hash table look-up time for experiment #2
Evaluation of hashing for URL routing continued
[Figure: Mean look-up time versus K (8 to 13) for Simple, H1, and Aggressive hashing.]
Similar to experiment #1 results
ZGP043
Evaluation of hashing for URL routing continued
• Evaluation model (single server queue):
  [Figure: Single server queue. Arrivals are URLs to be looked up; queued URLs wait for the server; the server is a hash table look-up.]
• Response variables:
  – mean queuing delay
  – drop in utilization
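A single server queue of this kind can be simulated directly from a trace. The sketch below assumes FIFO service and takes the per-request look-up times as the service times; the function name and the sample numbers are hypothetical.

```python
from typing import Sequence


def mean_queueing_delay(arrival_times: Sequence[float],
                        service_times: Sequence[float]) -> float:
    """Mean time a request waits before service in a FIFO single-server queue."""
    if not arrival_times:
        return 0.0
    total_wait = 0.0
    server_free_at = 0.0
    for arrival, service in zip(arrival_times, service_times):
        start = max(arrival, server_free_at)   # wait if the server is still busy
        total_wait += start - arrival
        server_free_at = start + service
    return total_wait / len(arrival_times)


# Three URLs arrive one time unit apart; each look-up takes two time units.
delay = mean_queueing_delay([0.0, 1.0, 2.0], [2.0, 2.0, 2.0])
```

Because the delay depends on the order of the service times and not only on their mean, two hashing methods with similar mean look-up time can produce very different delays.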
ZGP044
Mean queue length experiments:
• Experiment #1: Effect of hash table size (K) on queue length (L) for utilization U = 80% (Simple chain) and exponential arrivals
• Experiment #2: Effect of burstiness (Tmax) on L for U = 80% (Simple chain) and K = 8
• Experiment #3: Effect of burstiness (Tmax) on L for U = 80% and K = 8
• Experiment #4: Effect of autocorrelation (unshuffled and shuffled ordering of requests) on L for U = 80% and K = 8
• Experiment #5: Effect of autocorrelation (unshuffled and shuffled ordering of requests) on L for U = 80% (Simple chain) and K = 8
Evaluation of hashing for URL routing continued
ZGP045
Evaluation of hashing for URL routing continued
[Figure: L (0 to 6) versus K (8 to 13) for Simple, H1, and Aggressive hashing.]
Results for experiment #1
Self-adjusting methods show similar performance
ZGP046
Evaluation of hashing for URL routing continued
[Figure: L (0 to 40) versus Tmax (50 to 1000) for H1 and Aggressive hashing. Simple hashing - value range is 5500 to 34000.]
Results for experiment #2
H1 shows faster increase in L
ZGP047
Evaluation of hashing for URL routing continued
[Figure: L (0 to 120K) versus Tmax (50 to 1000) for Simple, H1, and Aggressive hashing.]
Results for experiment #3
H1 has orders of magnitude worse queue length
ZGP048
Results for experiment #4
Evaluation of hashing for URL routing continued
Algorithm | unshuffled | shuffled | M/G/1
Simple | 5.20 | 3.15 | 3.13
H1 | 29102.01 | 8.58 | 8.57
Aggressive | 294.09 | 9.93 | 9.76
H1 has orders of magnitude worse queue length
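An M/G/1 reference value can be computed from the service times alone with the Pollaczek-Khinchine formula. The sketch below is an assumption about how such a column could be produced: the slides do not give the formula used, nor whether L counts the request in service (it does here). The M/G/1 model assumes independent service times, which is consistent with its values tracking the shuffled column.

```python
from statistics import mean, pvariance
from typing import Sequence


def mg1_mean_number_in_system(service_times: Sequence[float],
                              utilization: float) -> float:
    """Pollaczek-Khinchine mean number in an M/G/1 system at the given utilization."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    mean_service = mean(service_times)
    # Squared coefficient of variation of the service time.
    scv = pvariance(service_times) / mean_service ** 2
    return utilization + utilization ** 2 * (1.0 + scv) / (2.0 * (1.0 - utilization))


# Constant service times at U = 80%.
deterministic = mg1_mean_number_in_system([1.0, 1.0, 1.0], 0.8)
```

Higher variability in look-up time raises the result even when the mean look-up time and the utilization are unchanged.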
ZGP049
Results for experiment #5
Evaluation of hashing for URL routing continued
Algorithm | U | unshuffled | shuffled
Simple | 80.0% | 5.20 | 3.15
H1 | 21.7% | 0.43 | 0.36
Aggressive | 12.9% | 0.19 | 0.18
ZGP050
Discussion of results
• Aggressive hashing improves upon H1 hashing
  – Modest look-up time improvement
  – Significant improvement from a queueing perspective
• Queueing must be used for evaluating hashing algorithms
• Long-range dependence (LRD) in look-up time of H1 results in extreme queueing delay
  – Catastrophic effects on any application
Evaluation of hashing for URL routing continued
ZGP052
In summary, I have addressed the problem of
Excessive delay in the Internet caused by the inability to efficiently access distributed content in the Web
My work has shown that:
1) A URL router that uses HTTP redirection is feasible
2) CRC32 can be used for digesting of URL routing tables
3) Aggressive hashing improves upon existing hashing algorithms for fast look-up
4) Queueing behavior needs to be considered when evaluating hashing algorithms
Summary
Four publications have resulted
ZGP053
List of my related publications
1. Z. Genova and K. Christensen, “Managing Routing Tables for URL Routers in Content Distribution Networks,” submitted to the International Journal of Network Management in June 2003
2. Z. Genova and K. Christensen, “Efficient Summarization of URLs using CRC32 for Implementing URL Switching,” Proceedings of the 27th IEEE Conference on Local Computer Networks (LCN), pp. 343-344, November 2002
3. Z. Genova and K. Christensen, “Using Signatures to Improve URL Routing,” Proceedings of IEEE International Performance, Computing, and Communications Conference, pp. 45-52, April 2002
4. Z. Genova and K. Christensen, “Challenges in URL Switching for Implementing Globally Distributed Web Sites,” Proceedings of the Workshop on Scalable Web Services, pp. 89-94, August 2000