Scheduling file transfers on a circuit-switched network
Student: Hojun Lee
Advisor: Professor M. Veeraraghavan
Committee: Professor E. K. P. Chong, Professor S. Torsten, Professor S. Panwar, Professor M. Veeraraghavan
Date: 5/10/04
Problem statement
Increasing file sizes (e.g., multimedia; eScience: particle physics)
Increasing link rates (e.g., optical fiber)
Current protocols (e.g., TCP) do not exploit high bandwidth to decrease file-transfer delay
Example: a TCP connection with (1) a 1500 B MTU, (2) a 100 ms round-trip time (RTT), and (3) a steady throughput of 10 Gbps would tolerate at most one packet drop every 5,000,000,000 packets (not realistic)

Throughput = (MSS / RTT) * C / sqrt(p), where p is the packet loss rate and C = sqrt(3/2)
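The loss-rate figure on this slide can be checked by inverting the throughput formula. A minimal sketch (the function name is illustrative; MSS is taken as the full 1500 B MTU, as the slide does):

```python
import math

def mathis_loss_bound(mss_bytes, rtt_s, throughput_bps, c=math.sqrt(1.5)):
    """Invert Throughput = (MSS / RTT) * C / sqrt(p) to find the largest
    packet-loss rate p that still sustains the given throughput."""
    return (c * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2

# The slide's example connection: 1500 B MTU, 100 ms RTT, 10 Gbps
p = mathis_loss_bound(1500, 0.1, 10e9)
print(f"max loss rate {p:.1e}, i.e. ~1 drop per {1 / p:.1e} packets")
```

Running this yields roughly one tolerable drop per 5 billion packets, matching the slide.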
Solutions to this problem
Limit upgrades to end hosts:
Scalable TCP (Kelly), HighSpeed TCP (Floyd), FAST TCP (Low et al.)
Upgrade routers within the Internet:
Larger Maximum Transmission Unit (MTU), as proposed by Mathis
Our proposed solution: Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH)
End-to-end circuits are set up and released dynamically
CHEETAH is deployed on an add-on basis to the current Internet
File transfers using CHEETAH
Set up circuit, transfer file, release circuit
Do not keep the circuit open during user think time
Only a unidirectional circuit is used, for utilization reasons
Modes of operation of the circuit-switched network
Call-blocking mode: an "all-or-nothing" full-bandwidth allocation approach
Attempt a circuit setup:
If it succeeds, the end host enjoys a much shorter file-transfer delay than on the TCP/IP path
If it fails, fall back to the TCP/IP path
Call-queueing mode
Analytical model for blocking mode (mean delay if the circuit setup is attempted):

E[T_cheetah] = (1 - P_b)(E[T_setup] + T_transfer) + P_b(E[T_fail] + E[T_tcp])   (1)

P_b = call blocking probability, E[T_setup] = the mean call-setup delay,
T_transfer = time to transfer a file, E[T_fail] ~= E[T_setup]
E[T_tcp]: modeled per Padhye et al. and Cardwell et al. (modeling TCP latency);
a function of RTT, bottleneck link rate r, packet loss P_loss, and round-trip propagation delay T_prop
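Eq. (1) is a direct weighted average, as the sketch below shows; the numeric values (50 ms setup, a 1 GB file at 1 Gbps, a 60 s TCP delay) are illustrative assumptions, not figures from the thesis:

```python
def expected_cheetah_delay(p_b, t_setup, t_transfer, t_fail, t_tcp):
    """Eq. (1): E[T_cheetah] = (1 - Pb)(E[T_setup] + T_transfer)
                             + Pb(E[T_fail] + E[T_tcp])."""
    return (1 - p_b) * (t_setup + t_transfer) + p_b * (t_fail + t_tcp)

# 1 GB at 1 Gbps gives T_transfer = 8 s; take E[T_fail] ~= E[T_setup] = 50 ms
d = expected_cheetah_delay(p_b=0.1, t_setup=0.05, t_transfer=8.0,
                           t_fail=0.05, t_tcp=60.0)
print(f"E[T_cheetah] = {d:.2f} s")  # -> 13.25 s
```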
Routing decision
Compare E[T_cheetah] with E[T_tcp]; substituting E[T_fail] ~= E[T_setup] into Eq. (1) gives the threshold:

If E[T_setup] > (1 - P_b)(E[T_tcp] - T_transfer), i.e., E[T_cheetah] > E[T_tcp]:
resort directly to the TCP/IP path
If E[T_setup] <= (1 - P_b)(E[T_tcp] - T_transfer), i.e., E[T_cheetah] <= E[T_tcp]:
attempt circuit setup
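The decision rule reduces to a one-line test on the setup delay (derived by substituting E[T_fail] ~= E[T_setup] into Eq. (1)); the parameter values here are illustrative:

```python
def attempt_circuit(p_b, t_setup, t_transfer, t_tcp):
    """Attempt circuit setup iff E[T_cheetah] <= E[T_tcp], which is
    equivalent to E[T_setup] <= (1 - Pb)(E[T_tcp] - T_transfer)."""
    return t_setup <= (1 - p_b) * (t_tcp - t_transfer)

# Large file, circuit far faster than TCP: attempt the setup
print(attempt_circuit(p_b=0.1, t_setup=0.05, t_transfer=8.0, t_tcp=60.0))
# Tiny file, TCP already nearly as fast: go straight to the TCP/IP path
print(attempt_circuit(p_b=0.1, t_setup=0.05, t_transfer=0.01, t_tcp=0.02))
```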
File transfer delays for large files (1 GB and 1 TB) over the TCP/IP path
Numerical results for transfer delays of files of size 5 MB - 1 GB
Link rate r = 100 Mbps, k = 20
A circuit setup should always be attempted for these parameters
Numerical results for transfer delays of files of size 5 MB - 1 GB, cont'd
Link rate r = 1 Gbps, k = 20
A crossover file size exists in small propagation-delay environments
Crossover file sizes
(columns: P_b, a measure of loading on the ckt.-switched network; rows: P_loss, a measure of loading on the TCP/IP path)

r = 1 Gbps, T_prop = 0.1 ms, k = 20
                  P_b = 0.01   P_b = 0.1   P_b = 0.3
P_loss = 0.0001   22 MB        24 MB       30 MB
P_loss = 0.001    9 MB         10 MB       12 MB
P_loss = 0.01     < 5 MB       < 5 MB      < 5 MB

r = 100 Mbps, T_prop = 0.1 ms, k = 20
                  P_b = 0.01   P_b = 0.1   P_b = 0.3
P_loss = 0.0001   2.4 MB       2.65 MB     3.4 MB
P_loss = 0.001    2 MB         2.2 MB      2.8 MB
P_loss = 0.01     500 KB       550 KB      650 KB

In high propagation-delay environments, always attempt a circuit (utilization implications).
This work was presented at the PFLDNET 2003 workshop [1] and at Opticomm 2003 [2].
Motivation for call queueing
Example: large file transfer (1 TB), with P_loss = 0.0001, T_prop = 50 ms, r = 1 Gbps
TCP/IP path: delay = 4 days 14.9 hours
Circuit path: call setup attempt, then 1 TB / 1 Gbps = 2.2 hours
Problem with call queueing: low bandwidth utilization
Reason: upstream switches hold resources while waiting for downstream switches to admit a call, instead of using the wait period to admit short calls that traverse only the upstream segments
[Figure: Host A - Switch 1 - link 1 - Switch 2 - link 2 - Host B, with setup messages at each switch]
The call waits (queues) until resources become available on link 1, then reserves and holds that bandwidth until the call is set up all the way through
While the call is queued for link 2 resources, the link 1 resources are idle
Idea! Use knowledge of file sizes to "schedule" calls
The network knows the file sizes and bandwidths of admitted calls
When a new call arrives:
The network can figure out when resources will become available for the new call
The network can schedule the new call for a delayed start and provide this information to the requesting end host
The end host can then compare this delay with the expected delay on the TCP/IP path
Call scheduling on a single link
Main question: since files can be transferred at any rate, what rate should the network assign to a given file transfer?
One simple answer
In circuit-switched networks, use a fixed bandwidth allocation for the duration of a file transfer (a TDM/FDM scheme):
Transmission capacity C (bits/sec) is divided among n streams
Transmission of a file of L bits will take Ln/C sec
Even if other transfers complete before this transfer, its bandwidth cannot be increased
Contrast: a packet-switched system uses statistical multiplexing
Our answer
Greedy scheme: allocate the maximum bandwidth available that is less than or equal to R^i_max, the maximum rate requested for call i
Varying-Bandwidth List Scheduling (VBLS): the end host specifies the file size, maximum bandwidth limit, and a desired start time; the network returns a time-range-capacity allocation vector assigning varying bandwidth levels in different time ranges for the transfer
VBLS with Channel Allocation (VBLS/CA): a special case of practical interest that tracks actual channel allocations in different time ranges
Notation
Specified in call request i: the file size F_i, the maximum bandwidth R^i_max, and the desired start time T^i_req
Switch's response: a time-range-capacity (TRC) allocation vector of entries (B^i_k, E^i_k, C^i_k)
VBLS algorithm
Notation: Γ(t) is the available-bandwidth curve, v is the current time pointer, and F is the remaining file size
Initialization step: set v = T^i_req and F = F_i;
check the available bandwidth Γ(v) (if Γ(v) = 0, advance v to the next change point in the Γ(t) curve); let t' = the next change point
Case 1 (Γ(v) < R^i_max and F can be transmitted before the next change point in the Γ(t) curve):
set B^i_k = v, E^i_k = v + F / Γ(v), C^i_k = Γ(v); terminate loop
Case 2 (Γ(v) < R^i_max and F cannot be transmitted before the next change point in the Γ(t) curve):
set B^i_k = v, E^i_k = t', C^i_k = Γ(v);
F <- F - Γ(v)(E^i_k - v), v <- E^i_k, k <- k + 1;
repeat the loop (go back to the bandwidth-check step)
VBLS algorithm, cont'd
Case 3 (Γ(v) >= R^i_max and F can be transmitted before the next change point in the Γ(t) curve):
set B^i_k = v, E^i_k = v + F / R^i_max, C^i_k = R^i_max; terminate loop
Case 4 (Γ(v) >= R^i_max and F cannot be transmitted before the next change point in the Γ(t) curve):
set B^i_k = v, E^i_k = the next change point in the Γ(t) curve, C^i_k = R^i_max;
F <- F - R^i_max (E^i_k - v), v <- E^i_k, k <- k + 1;
repeat the loop (go back to the bandwidth-check step)
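The four cases collapse into one greedy loop over the change points of the availability curve. A minimal sketch, assuming Γ(t) is given as a sorted list of (start_time, bandwidth) change points whose last range has nonzero bandwidth; the function and variable names are illustrative, not from the thesis:

```python
def vbls_schedule(change_points, t_req, file_size, r_max):
    """Return a time-range-capacity (TRC) vector [(begin, end, rate), ...]."""
    trc = []
    remaining = file_size
    # index of the change point in effect at the requested start time
    idx = max(i for i, (t, _) in enumerate(change_points) if t <= t_req)
    v = t_req
    while remaining > 0:
        rate_avail = change_points[idx][1]
        next_change = (change_points[idx + 1][0]
                       if idx + 1 < len(change_points) else float("inf"))
        if rate_avail == 0:            # nothing available: jump to next change point
            v = next_change
            idx += 1
            continue
        rate = min(rate_avail, r_max)  # greedy: use all we are allowed to
        if remaining <= rate * (next_change - v):   # Cases 1 and 3: finish here
            trc.append((v, v + remaining / rate, rate))
            remaining = 0
        else:                          # Cases 2 and 4: fill the range, continue
            trc.append((v, next_change, rate))
            remaining -= rate * (next_change - v)
            v = next_change
            idx += 1
    return trc

# A toy curve: 2 channels free on [0,1), 1 on [1,2), 3 from t=2 on; r_max = 2
print(vbls_schedule([(0, 2), (1, 1), (2, 3)], t_req=0, file_size=4, r_max=2))
```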
Example of VBLS (figure)
[Figure: sources S1, S2, S3 feed a circuit switch sharing a single link with four channels (Ch. 1-4) toward destination D; the available time ranges of the Γ(t) curve change at t = 1, 2, 3, 4, 5.]
Requests, as (F_i, T^i_req, R^i_max), and the resulting TRC vectors:
(F_1, T^1_req, R^1_max) = (2, 1, 2) -> TRC_1
(F_2, T^2_req, R^2_max) = (2, 1, 2) -> TRC_2
(F_3, T^3_req, R^3_max) = (5, 3, 3) -> TRC_3
VBLS/CA algorithm
Four additions:
1) Track channel availability over time for each channel, in addition to tracking the total available-bandwidth curve Γ(t); furthermore, track the channel availability at each change point of Γ(t)
2) Track the set of open channels, to save switch programming time
3) If multiple channels are allocated within the same time range, count each allocation as a separate entry in the Time-Range-channeL (TRL) vector
4) When there are many candidate channels, apply two rules:
First rule: if the file transfer completes within a time range, choose the channel with the smallest leftover time
Second rule: if the file transfer does not complete within a time range, choose the channel with the largest leftover time
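The two channel-choice rules can be sketched as a small selector. This is a hedged illustration: "leftover time" is read here as how long a channel remains free past the point considered, and the names are invented for the example:

```python
def choose_channel(free_until, transfer_end, range_end):
    """free_until: {channel_id: time the channel stays available}.
    Applies the two VBLS/CA rules for picking among candidate channels."""
    if transfer_end <= range_end:
        # First rule: transfer completes within the range ->
        # channel with the smallest leftover time after the transfer
        return min(free_until, key=lambda ch: free_until[ch] - transfer_end)
    # Second rule: transfer spills past the range ->
    # channel with the largest leftover time
    return max(free_until, key=lambda ch: free_until[ch])

# Channel 1 free until t=10, channel 4 until t=15
print(choose_channel({1: 10, 4: 15}, transfer_end=8, range_end=10))   # rule 1
print(choose_channel({1: 10, 4: 15}, transfer_end=12, range_end=10))  # rule 2
```

Rule 1 minimizes fragmentation of a nearly-used-up channel; rule 2 keeps the transfer on a channel likely to stay open into the next range.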
Example of VBLS/CA

Parameter          Value
T^i_req            10
R^i_max            2
F_i                75 MB
Per-channel rate   1 Gbps

Round   Leftover (MB)   C_open   TRL_i entries (B^i_k, E^i_k, channel)
0       75              { }      { }
1       50              {1,4}    (10,20,1) (10,20,4)
2       25              {4}      (20,30,1) (20,30,4)
3       12.5            { }      (30,40,4)
4       0               {1}      (40,50,1)

(12.5 MB can be sent per channel in each time range)
Traffic model
File-arrival requests follow a Poisson process
File sizes (F_i) follow the bounded Pareto distribution:

f_X(x) = (α k^α / (1 - (k/p)^α)) x^(-α-1),   k <= x <= p

where α is the shape parameter (1.1 for the entire simulation) and k and p are the lower and upper bounds, respectively, of the allowed file-size range
R^i_max varies depending on the simulation settings
Validation of simulation against analytical results
Assumptions:
1. All file requests set their R^i_max to match the link capacity C
2. Arrivals follow a Poisson process
3. Service times follow the bounded Pareto distribution
4. k = 500 MB and p = 100 GB

Analytical model (M/G/1 waiting time):

E[W] = λ E[X²] / (2(1 - ρ))

where λ is the arrival rate, ρ = λ E[X] is the system load, X is the service time of a file (file size divided by C), and E[X²] is its second moment under the bounded Pareto distribution:

E[X²] = α (k^α p^(2-α) - k²) / ((2 - α)(1 - (k/p)^α) C²)

[Plot: file latency (sec) vs. system load (0-1); the simulation curve matches the analytical model.]
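The analytical curve is a short Pollaczek-Khinchine computation; the moment formula below follows by direct integration of the bounded-Pareto pdf, and the arrival rate and capacity in the example are illustrative assumptions. As a sanity check, the k = 500 MB, p = 100 GB, α = 1.1 parameters give a mean file size of about 2.27 GB:

```python
def bp_moment(alpha, k, p, n):
    """n-th moment of the bounded Pareto(alpha, k, p) distribution (n != alpha)."""
    norm = 1.0 - (k / p) ** alpha
    return (alpha * k ** alpha * (p ** (n - alpha) - k ** (n - alpha))
            / ((n - alpha) * norm))

def mg1_wait(lam, alpha, k, p, capacity):
    """Pollaczek-Khinchine mean wait E[W] = lam * E[X^2] / (2 (1 - rho)),
    with service time X = file size / capacity."""
    ex = bp_moment(alpha, k, p, 1) / capacity        # mean service time
    ex2 = bp_moment(alpha, k, p, 2) / capacity ** 2  # its second moment
    rho = lam * ex
    assert rho < 1.0, "the queue must be stable"
    return lam * ex2 / (2.0 * (1.0 - rho))

print(bp_moment(1.1, 500e6, 100e9, 1))  # mean file size, ~2.27e9 bytes
```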
Sensitivity analysis
We carry out four experiments:
(1) To understand the impact of R^i_max when all calls request the same constant R^i_max
(2) To understand the impact of the allowed file-size range (i.e., the parameters k and p)
(3) To understand the impact of R^i_max when calls request three different values of R^i_max (i.e., (1, 2, 4) and (1, 5, 10) channels)
(4) To understand the impact of the size of T_discrete
Sensitivity analysis, cont'd
First experiment: k = 500 MB, p = 100 GB, and R^i_max = 1, 5, 10, or 100 channels
[Left plot: file latency (sec) vs. system load, one curve per R^i_max value. Right plot: mean file transfer delay (sec) vs. system load, same curves.]
File latency: the mean waiting time across all files transferred
Mean file transfer delay: file latency + mean service time (transmission delay)
Sensitivity analysis, cont'd
Second experiment:
Case 1: k = 500 MB, p = 100 GB, α = 1.1
Case 2: k = 10 GB, p = 100 GB, α = 1.1
Question: in which case is the variance larger at first glance?

E[W] = λ E[X²] / (2(1 - ρ)),   σ²(X) = E[X²] - (E[X])²

[Left plot: file latency (sec) vs. system load for R^i_max = 1, 5, and 10 channels under each case. Right plot: log-scale file-size frequency histogram by bin number; for k = 500 MB the mean file size is 2.27 GB (bin 1), while for k = 10 GB it is 24.6 GB (bin 9).]
Sensitivity analysis, cont'd
Third experiment:
Case 1: per-channel rate = 10 Gbps, C = 1 Tbps (100 channels), R^i_max = 1, 2, or 4 channels
Case 2: per-channel rate = 1 Gbps, C = 100 Gbps (100 channels), R^i_max = 1, 5, or 10 channels
File throughput: long-run average of the file size divided by the file transfer delay
[Plots: file throughput (Gbps) vs. system load for each R^i_max value, one panel per case.]
Sensitivity analysis, cont'd
Fourth experiment
Assumptions:
1. All calls request the same R^i_max = 1 and the link capacity is C = 100 channels
2. The discrete time unit (T_discrete) is varied over 0.05, 0.5, 1, and 2 sec
[Left plot: file throughput (Gbps) vs. system load for T_discrete = 50 ms, 0.5 s, 1 s, and 2 s. Right plot: utilization (%) vs. system load for the same T_discrete values.]
Comparison of VBLS with FBLS and PS
Basic simulation setup:
File-arrival requests follow a Poisson process
Per-channel rate = 10 Gbps; R^i_max = 1, 5, or 10 channels (for 30%, 30%, and 40% of calls, respectively)
Bounded Pareto input parameters: α = 1.1, k = 5 MB, and p = 1 GB
Packet-switched system: files are divided into 1500 B packets that arrive at an infinite packet buffer at a constant packet rate equal to R^i_max divided by the packet length
Comparison of VBLS with FBLS and PS, cont'd
[Three plots: file throughput (Gbps) vs. system load for R^i_max = 1, 5, and 10 channels, each comparing FBLS, VBLS, and PS.]
• The performance of the VBLS scheme proved to be much better than that of the FBLS scheme
• The throughput performance of VBLS is indistinguishable from packet switching. This illustrates our main point: by taking file sizes into account and varying the bandwidth allocation for each transfer over its duration, we mitigate the performance degradation usually associated with circuit-based methods
• This work was presented at GAN'04 [3] and PFLDNET'04 [4] and will be published at ICC 2004 [5]
Call scheduling in the multiple-link case
Centralized online greedy scheme: create a new curve Γ_new(t) reflecting the available bandwidth across all links
Distributed online greedy scheme: needs a mechanism to merge the TRC and TRL vectors of multiple switches
Practical issues: clock synchronization, propagation delay
Additional notation for the multiple-link case

Symbol     Meaning
fTRC^i_n = {(B^if_kn, E^if_kn, C^i_k) : k = 1, 2, ..., N^i_n}
           Time-range-capacity allocation: capacity C^i_k is assigned to call i in time range k, starting at B^if_kn and ending at E^if_kn, by switch n. Since the number of time ranges can change from link to link, we add the subscript n.
rTRC^i_n = {(B^ir_kn, E^ir_kn, C^i_k) : k = 1, 2, ..., N^i_n}
           Time-range-capacity release: capacity C^i_k is to be released for call i, starting at B^ir_kn and ending at E^ir_kn, at switch (n-1).
M          Multiplicative factor used in reserving TRCs; if M = 5, the TRC vector reserved is 5 times the TRC allocation needed to transfer the file.
VBLS example for M = 1 (figure)
[Figure: source S1 reaches destination D1 through switches SW1, SW2, SW3; each link has four channels (Ch. 1-4), and the available time ranges of the curves Γ_1(t) and Γ_2(t) change at t = 1, ..., 6.]
Request (F_1, T^1_req, R^1_max, M) = (3, 1, 2, 1) yields fTRC_1 at SW1, but the call is blocked (X) downstream.
VBLS example for M = 2 (figure)
[Figure: same topology and Γ_1(t), Γ_2(t) curves as the M = 1 example.]
Request (F_1, T^1_req, R^1_max, M) = (3, 1, 2, 2) yields fTRC_1 and fTRC_2, plus a release vector rTRC_1.
Traffic model
[Figure: the study traffic flows Source -> SW1 -> SW2 -> SW3 -> Dest, while Src1/Dest1 and Src2/Dest2 inject interference traffic on the intermediate links.]
Bounded Pareto input parameters: α = 1.1, k = 500 MB, and p = 100 GB
Study traffic: the mean call arrival rate used by the source is a constant 10 files/sec
Interference traffic: the mean call arrival rates used for the interference traffic are varied (5, 10, 15, 20, 25, 30, 35, and 40 files/sec)
Sensitivity analysis
We carry out two experiments:
(1) To understand the impact of M (the multiplicative factor): M = 2, 3, and 4
(2) To understand the impact of the discrete time unit (T_discrete): T_discrete = 0.01, 0.1, and 1 sec
Sensitivity analysis, cont'd
First experiment (impact of M): vary M over 2, 3, and 4, with the propagation delay and T_discrete fixed at 5 ms and 10 ms, respectively
[Left plot: percentage of blocked calls vs. interference traffic load for M = 2, 3, and 4. Right plot: file throughput (Gbps) vs. interference traffic load for the same M values.]
Sensitivity analysis, cont'd
Second experiment (impact of T_discrete): vary T_discrete over 0.01, 0.1, and 1 sec, with the propagation delay and M fixed at 5 ms and 3, respectively
[Left plot: percentage of blocked calls vs. interference traffic load for T_discrete = 0.01, 0.1, and 1 sec. Right plot: file throughput (Gbps) vs. interference traffic load for the same values.]
Future work
We can add a second class of user requests targeted at interactive (long holding-time) applications, e.g., remote visualization and simulation steering; such requests would be specified as (H_i, R^i_min, R^i_max, T^i_req)
The simulation results for the multiple-link case are only preliminary
More comparisons via simulation: vary the propagation delays of the links while fixing other parameters such as M and T_discrete
Comparison between TCP/IP (FAST TCP) and the VBLS scheme:
Assume a finite buffer instead of an infinite buffer
Take into account congestion control and retransmission when packets are lost to buffer overflow
This might degrade the performance of the packet-switched system
References
1. M. Veeraraghavan, H. Lee, and X. Zheng, "File transfers across optical circuit-switched networks," PFLDnet 2003, Feb. 3-4, 2003, Geneva, Switzerland.
2. M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, and W. Feng, "CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture," in Proc. of Opticomm 2003, Oct. 13-17, 2003, Dallas, TX.
3. H. Lee, M. Veeraraghavan, E. K. P. Chong, and H. Li, "Lambda scheduling algorithm for file transfers on high-speed optical circuits," Workshop on Grids and Advanced Networks (GAN'04), April 19-22, 2004, Chicago, Illinois.
4. M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. K. P. Chong, and H. Li, "Scheduling and Transport for File Transfers on High-speed Optical Circuits," PFLDNET 2004, Feb. 16-17, 2004, Argonne, Illinois, http://www.-didc.lbl.gov/PFLDnet2004/.
5. M. Veeraraghavan, H. Lee, E. K. P. Chong, and H. Li, "A varying-bandwidth list scheduling heuristic for file transfers," in Proc. of ICC 2004, June 20-24, 2004, Paris, France.
6. M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. K. P. Chong, and H. Li, "Scheduling and Transport for File Transfers on High-speed Optical Circuits," Journal of Grid Computing (JOGC), 2004.
Thank you!