Adonet Spring School
Scheduling algorithms for input-queued IP routers
Emilio Leonardi
in collaboration with: P. Giaccone, M. Ajmone Marsan, A. Bianco, M. Mellia, F. Neri
Dipartimento di Elettronica, Telecommunication Network Group
Politecnico di Torino (Italy)
http://www.tlc-networks.polito.it
Budapest, March 2006
Outline
• IP routers
• OQ routers
• IQ routers
• Scheduling
• Optimal algorithms
• Heuristic algorithms
• Packet-mode algorithms
• Networks of routers
• CIOQ routers
• Multicast traffic
• Conclusions
Note
The slides marked RWP are reproduced with permission of Prof. Nick McKeown from the Electrical Engineering and Computer Science Dept. of Stanford University (CA, USA)
“The Internet is a mesh of routers”
core router
access router
enterprise router
“The Internet is a mesh of routers”
Access router:
• high number of ports at low speed (kbps/Mbps)
• several access protocols (modem, ADSL, cable)
Enterprise router:
• medium number of ports at high speed (Mbps)
• several services (IP classification, filtering)
Core router:
• moderate number of ports at very high speed (Mbps/Gbps)
• very high throughput
Basic functions
Routing
• computation of the output port of an incoming packet
• uses the routing tables computed by the routing protocols
• can be a complex procedure:
– very large routing tables
– dynamic variation of routes in the Internet
Basic functions
Switching
• transfer of packets from input ports to output ports
• resolution of the contentions for output ports
– queueing: where to store
– scheduling: what to transfer
Faster and faster
Need for high-performance routers
• to accommodate the bandwidth demands of new users and new services
• to support QoS
• to reduce costs
Packet processing and link speed
[Figure (RWP): two log-scale plots over 1985-2000. Packet processing power (SPEC95Int CPU results) follows Moore’s law, 2x / 18 months. Link speed (fiber capacity in Gbit/s, TDM then DWDM) grows 2x / 7 months. Source: SPEC95Int & David Miller, Stanford.]
The increase in electronic packet processing power cannot accommodate the increase in link speed
Memory access time
[Figure (RWP): memory access time (ns), log scale, 1980-2001. Access time improves 1.1x / 18 months, against Moore’s law at 2x / 18 months.]
Moore’s law
It is hard to keep up with Moore’s law:
• the bottleneck is memory speed
Moore’s law is too slow:
• routers need to improve faster than Moore’s law
RWP
Router capacity exceeds Moore’s law
Growth in capacity of commercial routers:
• 1992 ~ 2 Gb/s
• 1995 ~ 10 Gb/s
• 1998 ~ 40 Gb/s
• 2001 ~ 160 Gb/s
• 2003 ~ 640 Gb/s
Average growth rate: 2.2x / 18 months
RWP
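The quoted average growth rate can be cross-checked from the first and last data points on this slide. The sketch below assumes a simple compound-growth model between 1992 and 2003; the function name is illustrative only.

```python
# Capacity figures copied from the slide: year -> capacity in Gb/s.
capacity_gbps = {1992: 2, 1995: 10, 1998: 40, 2001: 160, 2003: 640}

def growth_per_18_months(data):
    """Average multiplicative growth per 18-month period (compound model)."""
    first_year, last_year = min(data), max(data)
    total_growth = data[last_year] / data[first_year]
    periods = (last_year - first_year) * 12 / 18  # number of 18-month periods
    return total_growth ** (1 / periods)

rate = growth_per_18_months(capacity_gbps)
print(f"average growth: {rate:.2f}x / 18 months")
```

Under this model the result is close to the 2.2x / 18 months stated on the slide.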
Single packet processing
The time available to process one packet is becoming shorter and shorter
• worst case: 40-byte packets (ACKs) travelling over the Internet
– 3.2 µs at 100 Mbps
– 320 ns at 1 Gbps
– 32 ns at 10 Gbps
– 3.2 ns at 100 Gbps
– 320 ps at 1 Tbps
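These per-packet times are simply the packet size in bits divided by the line rate. A minimal sketch that reproduces them (the function and constant names are illustrative):

```python
PACKET_BITS = 40 * 8  # worst-case 40-byte packet, in bits

def packet_time_seconds(link_bps):
    """Time to receive one worst-case packet at the given line rate (bit/s)."""
    return PACKET_BITS / link_bps

rates = [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9),
         ("100 Gbps", 100e9), ("1 Tbps", 1e12)]
for label, bps in rates:
    print(f"{label:>8}: {packet_time_seconds(bps) * 1e9:g} ns")
```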
Hardware architecture
[Figure: physical structure (line cards LC and control processor CP attached to the switching fabric SF) and logical structure (input ports IP and output ports OP connected through SF, with CP).]
Hardware architecture
Main elements
• line cards
– support input/output transmissions
– store packets
– adapt packets to the internal format of the switching fabric
– support data link protocols
– classify packets
– schedule packets
– support security
• switching fabric
– transfers packets from input ports to output ports
Hardware architecture
Main elements
• control processor/network processor
– runs routing protocols
– computes routing tables
– manages the overall system
• forwarding engines
– compute the packet destination (lookup)
– inspect packet headers
– rewrite packet headers
Interconnections among main elements - I
[Figure: line cards 1..N, separate forwarding engines, and the control processor, all attached to the switching fabric.]
Interconnections among main elements - II
[Figure: N combined line card & forwarding engine units and the control processor, attached to the switching fabric.]
Cell-based routers
[Figure: packets enter ISM 1..N, cross the cell switch (fabric) as cells, and leave ORM 1..N as packets.]
ISM: Input-Segmentation Module
ORM: Output-Reassembly Module
packet: variable-size data unit
cell: fixed-size data unit
Switching fabric
Our assumptions:
• bufferless
– to reduce internal hardware complexity
• non-blocking
– it is always possible to transfer in parallel from input to output ports any non-conflicting set of cells
Switching fabric
Examples:
• crossbar
• rearrangeable Clos network
• Benes network
• Batcher-Banyan network (self-routing)
Switching constraints
• at most one cell for each input and for each output can be transferred
[Figure: 4x4 crossbar with inputs 1-4 and outputs 1-4.]
Switching fabric
We do not discuss switching fabrics with internal buffers
• e.g., crossbars with a buffer at each crosspoint
Generic switching architecture
[Figure: inputs 1..N with input queues, read at speed Sin, feed the switching fabric, which writes at speed Sout into the output queues of outputs 1..N.]
Speedup
The speedup determines the switch performance:
• Sin = reading speed from input queues
• Sout = writing speed to output queues
• maximum speedup factor: S = max(Sin, Sout)
Performance comparison
The performance of different switching systems can be studied
• with analytical models
– introducing simplifying assumptions, but obtaining general results
• with simulation models
– obtaining more detailed results
Traffic description
$A_{ij}(n) = 1$ if a packet arrives at time n at input i, with destination reachable through output j
$\lambda_{ij} = E[A_{ij}(n)]$
traffic matrix: $\Lambda = [\lambda_{ij}]$
An arrival process is admissible if:
$$\sum_i \lambda_{ij} < 1, \qquad \sum_j \lambda_{ij} < 1$$
• that is, no input and no output is overloaded on average
• note that OQ switches exhibit finite delays only for admissible traffic
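Traffic scenarios
Uniform traffic
• Bernoulli i.i.d. arrivals
• usual testbed in the literature
– “easy to schedule”
$$\Lambda = \frac{\rho}{N} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$$
Diagonal traffic
• Bernoulli i.i.d. arrivals
• critical to schedule, since only two matchings are good
$$\Lambda = \frac{\rho}{3} \begin{pmatrix} 2 & 0 & 0 & 1 \\ 1 & 2 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 1 & 2 \end{pmatrix}$$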
Traffic scenarios
LogDiagonal traffic
• Bernoulli i.i.d. arrivals
• more critical than uniform, less than diagonal traffic
$$\Lambda = \frac{\rho}{2^N - 1} \begin{pmatrix} 8 & 1 & 2 & 4 \\ 4 & 8 & 1 & 2 \\ 2 & 4 & 8 & 1 \\ 1 & 2 & 4 & 8 \end{pmatrix}$$
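The three traffic scenarios can be generated programmatically. The sketch below rests on several assumptions that are not stated on the slides: a load factor `rho` scales each matrix, the 4x4 patterns extend cyclically to any N, and admissibility is tested with strict inequalities. All function names are illustrative.

```python
from fractions import Fraction

def uniform(n, rho):
    """Every input spreads its load evenly over all outputs."""
    return [[Fraction(rho) / n for _ in range(n)] for _ in range(n)]

def diagonal(n, rho):
    """2/3 of the load on the main diagonal, 1/3 on the adjacent (cyclic) one."""
    lam = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        lam[i][i] = Fraction(rho) * 2 / 3
        lam[i][(i - 1) % n] = Fraction(rho) / 3
    return lam

def log_diagonal(n, rho):
    """Powers of two arranged cyclically; each row sums to rho."""
    total = 2 ** n - 1
    return [[Fraction(rho) * 2 ** ((j - i - 1) % n) / total for j in range(n)]
            for i in range(n)]

def is_admissible(lam):
    """True if no input (row) and no output (column) is loaded at 1 or more."""
    n = len(lam)
    rows_ok = all(sum(lam[i]) < 1 for i in range(n))
    cols_ok = all(sum(lam[i][j] for i in range(n)) < 1 for j in range(n))
    return rows_ok and cols_ok

load = Fraction(9, 10)
for name, build in [("uniform", uniform), ("diagonal", diagonal),
                    ("logdiagonal", log_diagonal)]:
    print(name, is_admissible(build(4, load)))
```

Exact fractions are used so that row and column sums can be compared against 1 without rounding effects.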
Output Queued (OQ) switches
Sin = 1, Sout = N
• used for low-bandwidth routers
• no coordination among ports
• work-conserving
– best average delays
• complete control of delays
– support of QoS scheduling
Output Queued (OQ) switch
[Figure: inputs 1..N connected through a switching fabric with speedup N to the output queues of outputs 1..N.]
OQ performance
[Figure: performance of the OQ switch under uniform traffic.]
Note: OQ is optimal from the point of view of average delay and throughput
Simple Input Queued (IQ) switches
Sin = 1, Sout = 1
• 1 FIFO queue for each input port
• throughput limitations due to head of the line (HOL) blocking
• scheduling to resolve contentions for the same output
[Figure: inputs with one FIFO queue each, connected through the switching fabric to outputs 1..N.]
Head of the Line (HOL) Blocking
RWP
Simple IQ switch performance
[Figure: OQ vs. simple IQ under uniform traffic; the simple IQ switch saturates at a throughput of $2 - \sqrt{2} \approx 58\%$.]
Improving simple IQ switches
Window/bypass schedulers
• the first w cells of each queue contend for outputs
• HOL blocking is reduced, not eliminated
• w = 1 means FIFO at each input
• higher complexity
– the scheduler deals with wN cells
– non-FIFO queues
Improving IQ switches
Virtual output queueing (VOQ)
• one queue for each input/output pair
– N queues at each input
– $N^2$ queues in the whole switch
• eliminates HOL blocking
• used in high-bandwidth routers
– scheduling implemented in hardware at very high speed
IQ switches with VOQ
[Figure: each input 1..N holds virtual output queues 1..N; a scheduler, subject to input constraints and output constraints, configures the switching fabric towards outputs 1..N.]
Note: from now on, we always assume VOQ at the switch inputs
Scheduling in IQ switches
Scheduling can be modeled as a matching problem in a bipartite graph
• the edge from node i to node j refers to packets at input i directed to output j
• the weight of the edge can be
– binary (not empty/empty queue)
– queue length
– HOL cell waiting time, or cell age
– some other metric indicating the priority of the HOL cell to be served
Scheduling in IQ switches
[Figure: the scheduler turns the request graph between inputs and outputs into a matching (or permutation).]
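Scheduling in IQ switches
The scheduler maps the request matrix to a permutation:
$$\text{Request matrix} = \begin{pmatrix} 3 & 5 & 0 & 0 \\ 2 & 0 & 0 & 4 \\ 4 & 5 & 0 & 0 \\ 0 & 0 & 8 & 2 \end{pmatrix} \quad \longrightarrow \quad \text{Permutation} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$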
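The permutation shown for this request matrix can be reproduced by exhaustive search over all N! matchings. This is a sketch for small N only, not how a hardware scheduler would work, and the function name is illustrative. It assumes the matrix entries are the edge weights and that the scheduler picks the heaviest matching.

```python
from itertools import permutations

def max_weight_matching(weights):
    """Brute-force search: try every permutation, keep the heaviest one."""
    n = len(weights)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        w = sum(weights[i][perm[i]] for i in range(n))
        if w > best_weight:
            best_perm, best_weight = perm, w
    return best_perm, best_weight

# Request matrix from the slide (row = input, column = output).
requests = [[3, 5, 0, 0],
            [2, 0, 0, 4],
            [4, 5, 0, 0],
            [0, 0, 8, 2]]
perm, weight = max_weight_matching(requests)
print(perm, weight)  # perm[i] is the output matched to input i
```

For this matrix the heaviest matching pairs input 1 with output 2, input 2 with output 4, input 3 with output 1, and input 4 with output 3 (1-based), which is the permutation matrix on the slide.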
Implementing schedulers
Scheduling is a complex task
• a scheduling algorithm can be implemented in hardware if:
– it shows good performance for a wide range of traffic patterns
– it can be efficiently parallelized
– it can be efficiently pipelined
– it requires few iterations (or clock cycles)
– it requires limited control information
Scheduling uniform traffic
A number of algorithms give 100% throughput when traffic is uniform
For example:
• TDM and a few variants
• iSLIP (see later)
[Figure (RWP): example of TDM for a 4x4 switch.]
Birkhoff - von Neumann theorem
Any doubly stochastic matrix $\Lambda$ can be expressed as a convex combination of permutation matrices $\Pi_n$:
$$\Lambda = \sum_n a_n \Pi_n$$
with
$$a_n \ge 0, \qquad \sum_n a_n = 1$$
Scheduling non-uniform traffic
Thanks to the Birkhoff - von Neumann theorem, if the traffic is known and admissible, 100% throughput can be achieved by a TDM using:
• matching M1 for a fraction of time a1
• matching M2 for a fraction of time a2
• …
• matching Mk for a fraction of time ak
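The decomposition behind this TDM schedule can be sketched for small switches as follows. This is a brute-force illustration (it enumerates permutations), not an efficient algorithm; it assumes the input matrix is exactly doubly stochastic and uses exact fractions. The function name is illustrative.

```python
from fractions import Fraction
from itertools import permutations

def birkhoff_decomposition(matrix):
    """Split a doubly stochastic matrix into (fraction of time, matching) pairs.

    Raises StopIteration if the matrix is not doubly stochastic.
    """
    n = len(matrix)
    residual = [row[:] for row in matrix]
    terms = []
    while any(x > 0 for row in residual for x in row):
        # A permutation using only positive entries exists by Birkhoff's theorem.
        perm = next(p for p in permutations(range(n))
                    if all(residual[i][p[i]] > 0 for i in range(n)))
        coeff = min(residual[i][perm[i]] for i in range(n))
        for i in range(n):
            residual[i][perm[i]] -= coeff
        terms.append((coeff, perm))
    return terms

t, z = Fraction(1, 3), Fraction(0)
diagonal_traffic = [[2 * t, z, z, t],
                    [t, 2 * t, z, z],
                    [z, t, 2 * t, z],
                    [z, z, t, 2 * t]]
schedule = birkhoff_decomposition(diagonal_traffic)
for fraction_of_time, matching in schedule:
    print(fraction_of_time, matching)
```

Applied to the fully loaded diagonal traffic matrix, the sketch returns exactly two matchings, used for 2/3 and 1/3 of the time.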
Maximum Size Matching
Maximum Size Matching (MSM)
• among all the possible matchings, selects the one with the highest number of edges
– MSM is generally not unique
• the best MSM algorithm requires $O(N^{2.5})$ iterations, and cannot be implemented efficiently, since it is based on a flow augmenting path algorithm
Instability of MSM
Assume:
• P(arrival at Q12) = $\lambda$
• P(arrival at Q11) = P(arrival at Q22) = $1 - \lambda - \delta$
• Q12 = B » 0, Q11 = Q22 = 0
• in case of parity, serve Q11 and/or Q22 instead of Q12
Observe:
• Q12 is served only when A11 = 0 and A22 = 0, i.e. with probability:
$$P(\text{serve } Q_{12}) = P(\text{no arrivals at both } Q_{11} \text{ and } Q_{22}) = [1 - (1 - \lambda - \delta)]^2 = (\lambda + \delta)^2$$
• P(serve Q12) < P(arrival at Q12) if $\delta$ is small enough
Example: $\lambda = 0.5$; $\delta = 0.1$; P(serve Q12) = 0.36
[Figure: 2x2 switch with inputs In1, In2 and outputs Out1, Out2; two flows are labelled with rate $1 - \lambda - \delta$.]
Note: this proof is due to I. Keslassy, Stanford Univ.
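The example numbers can be checked directly. In this sketch `lam` stands for the arrival probability at Q12 and `delta` for the slack in the load of Q11 and Q22; both symbol names are assumptions, since the original symbols did not survive extraction. The threshold on `delta` follows from solving (lam + delta)^2 < lam.

```python
from math import sqrt

def p_serve_q12(lam, delta):
    """Probability that neither Q11 nor Q22 receives an arrival in a slot."""
    p_arrival_diag = 1 - lam - delta
    return (1 - p_arrival_diag) ** 2

def max_unstable_delta(lam):
    """Q12 is served less often than it is fed whenever delta is below this value."""
    return sqrt(lam) - lam

lam, delta = 0.5, 0.1
print(p_serve_q12(lam, delta))   # service probability of Q12
print(max_unstable_delta(lam))   # instability threshold on delta
```

With the slide's values the service probability of Q12 stays below its arrival probability, so Q12 grows without bound even though every input and output is loaded below 1.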
Maximum Size Matching
MSM maximizes the instantaneous throughput
MSM may not yield 100% throughput
• short-term decisions can be inefficient in the long term
• non-binary edge weights allow MWM to maximize the long-term throughput
Maximum Weight Matching
Maximum Weight Matching (MWM)
• among all the possible N! matchings, selects the one with the highest weight (sum of the edge metrics)
– MWM is generally not unique
• MWM is too complex to be implemented in hardware at high speed
– the best MWM algorithm requires $O(N^3)$ iterations, and cannot be implemented efficiently, since it is based on a flow augmenting path algorithm
– it cannot be parallelized and pipelined efficiently
• MWM has never been implemented in a commercial chipset
Maximum Weight Matching
When the traffic is unknown, MWM is the optimal solution of the scheduling problem, provided the weight is either the queue length or the cell age
• achieves 100% throughput under any admissible traffic
– also under non-Bernoulli arrival processes satisfying the law of large numbers
• achieves low average delays, very close to those of OQ switches
• possible starvation for lightly loaded packet flows
MWM with pipeline and latency
Let T and P be fixed
Dt denotes the matching used at time t
The following variations of MWM also achieve 100% throughput:
• Dt = MWM(t-P): MWM with pipeline degree P
• Dt = MWM(ceil(t/T)·T): MWM with latency T
• combinations of both
Thus, it seems easy to achieve 100% throughput!
MWM with pipeline and latency
But:
• What about throughput?
– 100% throughput, but it needs the computation of a MWM …
• What about delays?
– delays can be really bad!
General consideration
When scheduling in IQ switches, it is very difficult to achieve simultaneously
• high throughput
• low delay
• limited implementation complexity
Uniform traffic
MWM and MSM behave almost identically
[Figure: mean delay (log scale) vs. normalized load under uniform traffic, for MWM and MSM.]
LogDiagonal traffic
MSM is somewhat inferior to MWM
[Figure: mean delay (log scale) vs. normalized load under LogDiagonal traffic, for MWM and MSM.]
Diagonal traffic
MSM yields much longer delays than MWM at medium/high loads
[Figure: mean delay (log scale) vs. normalized load under diagonal traffic, for MWM and MSM.]
Approximations of MSM and MWM
Motivation
• strong interest in scheduling algorithms with
– very low complexity
– high performance
Usually
• implementable schedulers (low complexity): low throughput, long delays
• theoretical schedulers (high complexity): high throughput, short delays
Some implementable algorithms
Approximate MSM
• WFA, iSLIP, 2DRR, RC, FIRM and many others
Approximate MWM with $w_{ij} = X_{ij}$ (queue length)
• iLQF, RPA, learning algorithms
Approximate MWM with $w_{ij}$ = cell age
• iOCF
Approximate MWM with $w_{ij} = \sum_i X_{ij} + \sum_j X_{ij}$
• iLPF, MUCS
APPROXIMATIONS OF MAXIMUM SIZE MATCHING
Wave Front Arbiter
[Figure (RWP): requests and the resulting match for a 4x4 switch.]
Wave Front Arbiter
[Figure (RWP): requests and the resulting match.]
2N-1 steps
Wrapped Wave Front Arbiter
[Figure (RWP): requests and the resulting match.]
N steps instead of 2N-1
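The step counts of the two arbiters can be illustrated with a small software model. This sketch assumes the wave front sweeps anti-diagonals starting from the top-left corner of the request matrix (the slides do not fix the direction) and processes each diagonal sequentially, which is equivalent to the parallel hardware because cells on one diagonal share no row or column. Names are illustrative.

```python
def wave_front_arbiter(requests, wrapped=False):
    """Return (match, steps); match maps each matched input to its output."""
    n = len(requests)
    row_free = [True] * n
    col_free = [True] * n
    match = {}
    steps = n if wrapped else 2 * n - 1
    for d in range(steps):
        for i in range(n):
            # Plain WFA: cells with i + j == d. Wrapped WFA: (i + j) mod n == d.
            j = (d - i) % n if wrapped else d - i
            if not 0 <= j < n:
                continue
            if requests[i][j] and row_free[i] and col_free[j]:
                match[i] = j
                row_free[i] = col_free[j] = False
    return match, steps

requests = [[1, 1, 0, 0],
            [1, 0, 0, 1],
            [1, 1, 0, 0],
            [0, 0, 1, 1]]
plain_match, plain_steps = wave_front_arbiter(requests)
wrapped_match, wrapped_steps = wave_front_arbiter(requests, wrapped=True)
print(plain_match, plain_steps)
print(wrapped_match, wrapped_steps)
```

Both variants return a maximal matching; the wrapped version visits N wrapped diagonals instead of 2N-1 plain ones.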
iSLIP
iSLIP means “iterative SLIP”
• iterates among the following 3 phases
– Request
– Grant
– Accept
iSLIP
3 phases:
• Request (from inputs to outputs)
– each unmatched input sends a request to every output for which it has a cell
• Grant (from outputs to inputs)
– if an unmatched output receives requests, it sends a grant to one of the inputs
– contentions solved by a round-robin mechanism
• Accept (from inputs to outputs)
– if an unmatched input receives grants, it selects a single output and becomes matched to it
– contentions solved by a round-robin mechanism
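The three phases can be sketched as follows. The pointer update rule (advance a pointer one position past the matched port, and only for matches made in the first iteration) is not stated on this slide; it is taken here as an assumption from the commonly described iSLIP design. Function and variable names are illustrative.

```python
def islip(requests, grant_ptr, accept_ptr, iterations):
    """One time slot of iSLIP; returns a dict input -> output.

    grant_ptr[j] and accept_ptr[i] are round-robin pointers, updated in place.
    """
    n = len(requests)
    in_match, out_match = {}, {}
    for it in range(iterations):
        # Request + Grant: each unmatched output grants one requesting input.
        grants = {}  # input -> outputs that granted it
        for j in range(n):
            if j in out_match:
                continue
            candidates = [i for i in range(n)
                          if i not in in_match and requests[i][j]]
            if candidates:
                chosen = min(candidates, key=lambda i: (i - grant_ptr[j]) % n)
                grants.setdefault(chosen, []).append(j)
        # Accept: each input that received grants accepts exactly one.
        for i, outs in grants.items():
            j = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
            in_match[i], out_match[j] = j, i
            if it == 0:  # assumed rule: pointers move only on first-iteration matches
                grant_ptr[j] = (i + 1) % n
                accept_ptr[i] = (j + 1) % n
    return in_match

requests = [[1, 1, 0, 0],
            [1, 0, 0, 1],
            [1, 1, 0, 0],
            [0, 0, 1, 1]]
one_iteration = islip(requests, [0] * 4, [0] * 4, iterations=1)
two_iterations = islip(requests, [0] * 4, [0] * 4, iterations=2)
print(one_iteration)
print(two_iterations)
```

In this example one iteration leaves input 3 (index 2) unmatched, and a second iteration completes the matching.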
iSLIP
The round robin mechanism in iSLIP is designed so that, under uniform traffic, iSLIP emulates a dynamic TDM scheduler synchronized on the arrival pattern
iSLIP
iSLIP yields a maximal matching
• often, with log N iterations
• always, with N iterations
iSLIP was implemented on one chip in the Cisco 12000 router
http://www.cisco.com/warp/public/cc/pd/rt/12000/tech/fasts_wp.pdf
iSLIP
iSLIP demo
from: http://tiny-tera.stanford.edu/tiny-tera/demos/index.html
APPROXIMATIONS OF MAXIMUM WEIGHT MATCHING
iLQF
iLQF means “iterative Longest Queue First”
• iterates among the following 3 phases
– Request
– Grant
– Accept
iLQF
3 phases:
• Request (from inputs to outputs)
– each unmatched input sends all its queue lengths as requests to the corresponding outputs
• Grant (from outputs to inputs)
– if an unmatched output receives requests, it sends a grant to the input corresponding to the longest queue
– contentions solved by random choice
• Accept (from inputs to outputs)
– if an unmatched input receives grants, it selects the output with the longest queue
– contentions solved by random choice
iLQF
iLQF yields a maximal matching
• often, with log N iterations
• always, with N iterations
iLQF is robust to non-uniform traffic
iLQF
iLQF demo
from: http://tiny-tera.stanford.edu/tiny-tera/demos/index.html
RPA
RPA means “Reservation with Preemption and Acknowledgment”
Two phases
• Reservation (possibly preemptive)
• Acknowledgement
Sequential accesses to a reservation vector
• Urgj (if set) is the urgency of the transfer from input Inj to output j
Vector Res:
Out 1: Urg1, In1
Out 2: Urg2, In2
Out 3: Urg3, In3
…
Out N: UrgN, InN
RPA
Vector Res is sequentially accessed by all inputs
[Figure: inputs 1, 2, 3 and 4 access Res in turn.]
RPA
Initially, at each round: Urgj = 0 for all j
Reservation phase
• when input i accesses Res, it
– computes Wj = Xij - Urgj for all j
– finds j* such that Wj* = max{ Wj }
– if Wj* > 0, reserves output j* and sets Urgj* = Xij*, possibly overwriting the previous reservation
– otherwise, leaves the current reservation
RPA
Acknowledgement phase
• if input i still finds its reservation at output j, it books output j
• otherwise, it chooses an unreserved output j and books output j
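The reservation and acknowledgement phases can be sketched as below. Two details are assumptions, since the slides leave them open: inputs access the vector in index order, and an input that lost its reservation picks, among unreserved outputs, the one for which it holds the longest non-empty queue. Names are illustrative.

```python
def rpa(x):
    """One RPA round on queue-length matrix x; returns a dict input -> output."""
    n = len(x)
    urg = [0] * n       # urgency currently stored for each output
    res = [None] * n    # input currently holding the reservation of each output
    # Reservation phase: inputs access the vector one after the other.
    for i in range(n):
        w = [x[i][j] - urg[j] for j in range(n)]
        j_star = max(range(n), key=lambda j: w[j])
        if w[j_star] > 0:
            res[j_star] = i             # may preempt an earlier reservation
            urg[j_star] = x[i][j_star]
    # Acknowledgement phase.
    booked = {}  # output -> input
    for i in range(n):
        mine = [j for j in range(n) if res[j] == i]
        if mine:
            booked[mine[0]] = i
        else:
            # Assumed fallback: longest non-empty queue among unreserved outputs.
            free = [j for j in range(n)
                    if res[j] is None and j not in booked and x[i][j] > 0]
            if free:
                booked[max(free, key=lambda j: x[i][j])] = i
    return {i: j for j, i in booked.items()}

queues = [[3, 5, 0, 0],
          [2, 0, 0, 4],
          [4, 5, 0, 0],
          [0, 0, 8, 2]]
print(rpa(queues))
```

Each input reserves at most one output per round, so `mine` never holds more than one entry.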
Uniform traffic
Comparison between MWM, iSLIP, iLQF, and RPA
[Figure: mean delay (log scale) vs. normalized load under uniform traffic, for MWM, iSLIP, iLQF and RPA.]
LogDiagonal traffic
iSLIP saturates close to 84% throughput
[Figure: mean delay (log scale) vs. normalized load under LogDiagonal traffic, for MWM, iSLIP, iLQF and RPA.]
Diagonal traffic
RPA achieves 98% throughput, iLQF 87%, iSLIP 83%
[Figure: mean delay (log scale) vs. normalized load under diagonal traffic, for MWM, iSLIP, iLQF and RPA.]
LEARNING ALGORITHMS
Learning algorithms
Goal: find a good compromise among throughput, delay and complexity
Learning algorithms
Key observation
• the matchings generated by MWM show limited changes from one time slot to the next
– remembering the matching from the past simplifies the computation of the new matching
• the search implemented by MWM can be enhanced
– with a randomized approach
– by observing arrivals
– by searching in parallel
• based on an extension of randomized scheduling algorithms
Simple Randomized Schemes
Choose a matching at random and use it as the schedule
• does not yield 100% throughput
Choose 2 matchings at random and use the heavier one as the schedule
…
Choose N matchings at random and use the heaviest one as the schedule
None of these can give 100% throughput!
Simple randomized algorithms, 32x32
[Figure: mean IQ length (log scale) vs. normalized load under diagonal traffic, for MWM, R32 and R1.]
Bounds on Maximum Throughput
Tassiulas’ scheme
Consider the following policy
• Rt = matching picked at random (uniformly) among all the possible N! matchings
• Dt = arg max { W(Dt-1), W(Rt) }
Complexity is very low
• O(1) iterations
• easy to pipeline
Yields 100% throughput!
• note that the boost in throughput is due to the memory of the past matching Dt-1
However, delays are very large
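One step of this policy can be sketched in a few lines. The sketch assumes the weight W of a matching is the sum of the queue lengths it serves, and it represents a matching as a list mapping each input to an output; names are illustrative.

```python
import random

def weight(matching, x):
    """Sum of the queue lengths served by the matching (input i -> matching[i])."""
    return sum(x[i][j] for i, j in enumerate(matching))

def tassiulas_step(prev, x, rng):
    """Keep the heavier of the previous matching and one uniform random matching."""
    candidate = list(range(len(x)))
    rng.shuffle(candidate)  # uniform random permutation
    return candidate if weight(candidate, x) > weight(prev, x) else prev

rng = random.Random(0)
queues = [[3, 5, 0, 0],
          [2, 0, 0, 4],
          [4, 5, 0, 0],
          [0, 0, 8, 2]]
d = [0, 1, 2, 3]
history = [weight(d, queues)]
for _ in range(50):
    d = tassiulas_step(d, queues, rng)
    history.append(weight(d, queues))
print(d, history[-1])
```

With the queues frozen, as in this toy run, the weight of the retained matching can only grow from slot to slot, which is the memory effect the slide points to. In a real switch the queue lengths change every slot.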
Tassiulas’ scheme, 32x32
[Figure: mean IQ length (log scale) vs. normalized load under diagonal traffic, for MWM and Tassiulas’ scheme.]
Learning approach
[Figure: COMP1 combines Dt-1 and Mt into Dt.]
Properties of COMP1
• W(Dt) ≥ W(Dt-1)
• W(Dt) ≥ W(Mt)
Examples:
• COMP1 is the MAX between Dt-1 and Mt
• COMP1 is the MERGE between Dt-1 and Mt
MERGE procedure
[Figure: the MERGE procedure combines matchings X (W(X) = 12) and R (W(R) = 10) into a matching M (W(M) = 13); the annotated sums are 3-1+2-2 = 2 and 2-1+2-4 = -1.]
Emulating MWM is O(N)
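A merge of two matchings can be sketched as follows. The approach shown is an assumption based on the usual description of such procedures: the union of two perfect matchings splits into disjoint cycles, and within each cycle the edges of the heavier matching are kept. The matrix and matchings below are made up for illustration and are not the ones in the figure.

```python
def merge(a, b, x):
    """Merge matchings a and b (lists: input -> output) under weights x.

    The result weighs at least as much as either a or b.
    """
    n = len(a)
    inv_b = [0] * n
    for i, j in enumerate(b):
        inv_b[j] = i
    merged = [None] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        # Walk one cycle of the union graph, collecting its inputs.
        cycle, i = [], start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = inv_b[a[i]]  # input that b attaches to the output a gives to i
        w_a = sum(x[k][a[k]] for k in cycle)
        w_b = sum(x[k][b[k]] for k in cycle)
        source = a if w_a >= w_b else b
        for k in cycle:
            merged[k] = source[k]
    return merged

x = [[3, 1, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 2, 4],
     [0, 0, 1, 2]]
a = [0, 1, 2, 3]
b = [1, 0, 3, 2]
m = merge(a, b, x)
print(m, sum(x[i][j] for i, j in enumerate(m)))
```

Each input and each edge is visited a constant number of times, so the merge runs in O(N).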
The learning approach
[Figure: COMP1 combines Dt-1 and Mt into Dt.]
Properties of Mt
• informally, Mt should be a “good” sample in the space of all possible matchings
Examples:
• Mt is a matching picked uniformly at random
• Mt is a matching picked non-uniformly at random, with a high probability of being heavy
• Mt is derived from the arrival vector At
• Mt is a good “neighbor” of Dt-1
Theoretical properties
[Figure: COMP1 combines Dt-1 and Mt into Dt.]
Stability
• 100% throughput under any admissible Bernoulli traffic pattern
Delay
• the larger the weight of Mt, the smaller the queue lengths, and hence the smaller the delays
Example of practical implementation
Exploiting parallel search:
[Figure: Dt is obtained through MAX blocks from Dt-1, its neighbors N1 … NK (NK is the K-th neighbor of Dt-1), and the matching Mt derived from the arrivals At.]
This scheme is called APSARA
What is a “neighbor” of a matching?
• Each neighbor
– differs from Dt-1 in ONLY TWO edges
– can be generated very easily in hardware
• Example: 3 x 3 switch
[Figure: matching Dt-1 and its 3 neighbors N1, N2, N3.]
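Neighbors that differ in exactly two edges can be generated by swapping the outputs of two inputs. The sketch below assumes this is how the neighbor set is defined, and adds a simplified selection step that keeps the heaviest among the previous matching, its neighbors, and an optional extra candidate; names are illustrative.

```python
from itertools import combinations

def neighbors(matching):
    """All matchings obtained by swapping the outputs of two inputs."""
    result = []
    for i, k in combinations(range(len(matching)), 2):
        nb = list(matching)
        nb[i], nb[k] = nb[k], nb[i]
        result.append(nb)
    return result

def heaviest_nearby(prev, x, extra=None):
    """Heaviest matching among prev, its neighbors, and an optional extra one."""
    candidates = [list(prev)] + neighbors(prev)
    if extra is not None:
        candidates.append(list(extra))
    return max(candidates, key=lambda m: sum(x[i][j] for i, j in enumerate(m)))

print(neighbors([0, 1, 2]))
queues = [[0, 5, 0],
          [5, 0, 0],
          [0, 0, 1]]
print(heaviest_nearby([0, 1, 2], queues))
```

An N x N matching has N(N-1)/2 such neighbors, which gives the 3 neighbors of the 3 x 3 example.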
Max-APSARA
APSARA, as described before, is not maximal
Max-APSARA is a modified version of APSARA in which a maximal size matching algorithm runs on the remaining unmatched inputs/outputs
• e.g., if k inputs/outputs are unmatched,
– run iSLIP with k iterations
– select k random edges among the unmatched inputs/outputs
APSARA performance
[Figure: mean IQ length (log scale) vs. normalized load under diagonal traffic, for MWM, MaxAPSARA, APSARA, iSLIP and iLQF.]
Routers and switches
IP routers deal with variable-size packets
Hardware switching fabrics often deal with fixed-size cells
Question:
• how to integrate a hardware switching fabric within an IP router?
Router based on an IQ cell switch: cell-mode
[Figure: ISMs 1..N feed the IQ cell switch (switching fabric), followed by ORMs 1..N.]
Cell-mode scheduling
Scheduling algorithms work at cell level
• pros:
– 100% throughput achievable
• cons:
– interleaving of packets at the outputs of the switching fabric
Router based on an IQ cell switch: packet-mode
[Figure: ISMs 1..N feed the IQ cell switch (switching fabric), followed by ORMs 1..N.]
NO packet interleaving if packet-mode
Router based on an IQ cell switch: packet-mode
[Figure: ISMs 1..N feed the IQ cell switch (switching fabric), followed by ORMs 1..N.]
NO packet interleaving if packet-mode
ORMs can be removed
Packet-mode scheduling
Rule: packets are transferred as trains of cells
• when an input starts transferring the first cell of a packet comprising k cells, it continues to transfer in the following k-1 time slots
Pros:
• no interleaving of packets at the outputs
• easy extension of traditional schedulers
Cons:
• starvation due to long packets
– inherent in packet systems without preemption
– negligible at high line rates
Packet-mode scheduling
Questions
• can packet mode provide high throughput? YES!
• what about delays? It depends…
Packet-mode properties
Main theoretical results
• MWM in packet-mode yields 100% throughput
• packet mode can provide shorter delays than cell mode, depending on the packet length distribution
Simulation scenario
Router with ISMs and ORMs
Uniform packet traffic
• uniform packet load
• uniform (1,192) packet size distribution
Spotted packet traffic
• non-uniform packet load
• bimodal (3,100) packet size distribution
$$P = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}$$
Uniform packet traffic
Packet mode and cell mode reach the same throughput
[Figure: mean packet delay (log scale) vs. normalized load under uniform packet traffic, one panel for cell mode and one for packet mode, for MWM, MSM, iSLIP and iLQF.]
Spotted packet traffic
Packet mode reaches higher throughput than cell mode
[Figure: mean packet delay (log scale) vs. normalized load (0.5 to 1.0) under spotted packet traffic, one panel for cell mode and one for packet mode, for MWM, MSM, iSLIP and iLQF.]
Effect of packet size distribution
iSLIP: delayCM/delayPM for different packet size distributions
[Figure: packet-mode gain for iSLIP (delayCM/delayPM, from 0 to 2) vs. normalized load, for uniform, exponential, trimodal and bimodal packet size distributions; the upper region is marked “better PM”, the lower region “better CM”.]
At high load PM becomes better
Packet mode features
Packet mode scheduling
• is a feasible modification of schedulers
• improves throughput
– but it can generate some unfairness between long and short packets
– inherent to all variable-size packet networks without preemption
• may give better packet delays than cell mode
– depends on the packet size distribution
Network of IQ routers
Question:
• given a network of IQ switches and an admissible input traffic, is the network always stable?
NO!
• this is quite counterintuitive… but true
Networks of IQ routers
Consider the acyclic network of IQ routers in the following slide
• derived from well-established results from adversarial queueing theory
• a very specific scenario, but it comprises only a few switches…
– this situation may not be common, but cannot be excluded in real networks
Pathological network of IQ switches
Network with 8 switches and 4 flows
Instability of MWM
If MWM is adopted at each IQ router, and the traffic is admissible, the system can be unstable under Bernoulli i.i.d. arrivals
Instability of MWM
MWM is too greedy, in the sense that it can create traffic bursts that are amplified by each scheduler
A server can be idling when large bursts (directed to it) are blocked because of upstream contentions
• the problem arises when a packet flow is subject to priority changes along its path through the network
– it is “dangerous” to increase priority along the path
Stability in networks of routers
Global policies
• “Oldest in the network” and many others
– problem: they require global information about the network, and perfectly synchronized clocks at the ingress of the network
Local policies
• until now, nothing really satisfying is known … (work in progress)
Stability in networks of routers
Semi-local policies
• MWM with local information about the router neighbors can achieve 100% throughput under i.i.d. Bernoulli arrivals
• Virtual network queue
– the weights used by MWM are: wij = max{0, Xij - H(Xij)}, where H(Xij) is the size of the upstream queue that is sending packets to Xij
CIOQ routers
[Figure: inputs 1..N with VOQs, a switching fabric with speedup S, and output queues o1..oN at outputs 1..N.]
CIOQ routers
Question:
• if a low speedup S is allowed (and queues are available at both inputs and outputs), is it possible to design simple scheduling algorithms capable of achieving high throughput and low delay?
YES!
CIOQ routers with S=2
If S = 2
• it is easy to obtain 100% throughput
– all maximal matchings work
– based on stable marriage algorithms
• it is less easy to obtain work conservation
– an output never idles whenever a packet destined to it is present
– same average delays as OQ
– very good delay performance
– e.g.: LOOFA
• it is difficult to perfectly emulate OQ…
LOOFA
The occupancy Cj
• is the number of cells currently residing at the j-th output queue
• at each time slot, it is decremented by one because of departures
Basic idea of LOOFA
• give priority to output channels with low occupancy, thereby attempting to maintain work conservation for all outputs
LOOFA
If S = 2, during each of the two phases
• each unmatched input selects a non-empty VOQ directed to the unmatched output with the lowest occupancy, and sends a request to that output
• each unmatched output selects one of the requests, and sends a grant to that input
• repeat until the matching is maximal
• the selection at the outputs can be round robin, random, ...
CIOQ routers with S=2
If S = 2
• it is difficult (but possible) to perfectly emulate an OQ router in terms of packet departures
– it is impossible to distinguish, by observing arrivals and departures, whether the switching architecture is CIOQ or OQ
– delays are perfectly controlled
– easy to implement scheduling algorithms born for OQ (e.g., WFQ)
CIOQ routers
CIOQ are very promising architectures
• many degrees of freedom in design
– how to balance input/output buffers
– how the buffers interact (e.g., by backpressure mechanisms)
Several currently designed architectures are supposed to be CIOQ
The speedup S is becoming closer and closer to 1 in practical implementations of new switching architectures (CIOQ → IQ)
Multicast traffic
Misleading (but common) idea:
• observe
1. OQ can achieve 100% throughput under any admissible unicast and multicast traffic
2. OQ can be perfectly emulated by CIOQ with S = 2
• then, with S = 2 it is possible to achieve 100% throughput for multicast traffic
WRONG!
• because observation 2 holds only for unicast traffic
Multicast traffic
Question:
• what is the minimum speedup required to achieve 100% throughput?
Unknown!
Adonet Spring School
135
Multicast traffic
Possible implementations
Copy network before the switching fabric
• a multicast cell with f destinations is treated as f cells
• possible bandwidth inefficiency
Dedicated queue
• multicast packets are treated in some specific way
[Figure: the two implementations on an N x N switch, with labels UC, MC and UC+MC]
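As a rough illustration of the copy-network option, the sketch below replicates a multicast cell with f destinations into f unicast cells. The `Cell` type and the `copy_network` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    input_port: int
    fanout: frozenset  # set of destination output ports

def copy_network(cells):
    """Replace every cell by one single-destination copy per output in its fanout."""
    copies = []
    for cell in cells:
        for out in sorted(cell.fanout):
            copies.append(Cell(cell.input_port, frozenset({out})))
    return copies

# One multicast cell with fanout 3 plus one unicast cell become 4 unicast cells.
arrivals = [Cell(0, frozenset({1, 2, 3})), Cell(1, frozenset({0}))]
unicast_cells = copy_network(arrivals)
```

The bandwidth inefficiency shows up directly: input 0 received one cell but now has three cells to push through the fabric.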
Multicast traffic: optimal queueing
MC-VOQ queueing
Best throughput performance
• avoids HOL blocking
2^N - 1 queues for each input, one for each fanout set
• re-enqueuing process: out-of-sequence problem
• no re-enqueuing: some throughput degradation
[Figure: MC+UC traffic entering an N x N switch, with queues 1 ... 2^N - 1 at each input]
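A minimal sketch of MC-VOQ queueing at one input is given below. The class `McVoqInput`, the bitmask encoding of fanout sets, and the reading of re-enqueuing as "move the partially served cell to the tail of the queue of its residual fanout set" are all assumptions made for illustration; only the re-enqueuing variant is modelled.

```python
from collections import deque

def fanout_to_index(fanout):
    """Encode a non-empty set of output ports as a bitmask in 1 .. 2**N - 1."""
    mask = 0
    for out in fanout:
        mask |= 1 << out
    return mask

class McVoqInput:
    def __init__(self, n_ports):
        self.n_ports = n_ports
        self.queues = {}  # fanout bitmask -> deque of cell ids, created lazily

    def num_possible_queues(self):
        # One queue per non-empty subset of the N outputs.
        return 2 ** self.n_ports - 1

    def enqueue(self, cell_id, fanout):
        if not fanout:
            raise ValueError("fanout set must be non-empty")
        self.queues.setdefault(fanout_to_index(fanout), deque()).append(cell_id)

    def serve(self, mask, granted):
        """Send the HOL cell of queue `mask` to the outputs in bitmask `granted`.
        Returns the residual fanout bitmask (0 if fully served)."""
        cell_id = self.queues[mask].popleft()
        residue = mask & ~granted
        if residue:
            # Re-enqueuing: the cell joins the tail of another queue, so cells
            # of the same flow may now leave out of sequence.
            self.queues.setdefault(residue, deque()).append(cell_id)
        return residue
```

With N = 3 there are 7 possible queues per input; with N = 16 there would already be 65535, which is why the queues are created lazily here.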
Multicast traffic: optimal scheduling
The optimal scheduling for multicast traffic can be defined similarly to unicast traffic
• it is a sort of max flow algorithm on all N(2^N - 1) queues
Many heuristics can be envisaged to approximate it
Summary
3 main ingredients for IQ scheduling algorithms:
• Weight computation
• Matching computation
• Contention resolution
Summary
Weight computation
• obtains the priority of each input queue
• the metric can be related to queue length, waiting time of the cell at the HOL, …
Contention resolution
• needed whenever the selection is among candidates with equal weights
• can be round robin, or random
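The two ingredients above can be illustrated with a short sketch. All function names are hypothetical, and a queue is assumed to be a FIFO list of arrival timestamps, which is a simplification chosen only for this example.

```python
import random

def weight_queue_length(queue):
    """Weight = number of cells waiting in the queue."""
    return len(queue)

def weight_hol_waiting_time(queue, now):
    """Weight = waiting time of the head-of-line cell (0 if the queue is empty)."""
    return now - queue[0] if queue else 0

def pick_round_robin(candidates, pointer, n):
    """Choose the first candidate at or after `pointer`, scanning cyclically."""
    return min(candidates, key=lambda c: (c - pointer) % n)

def select(weights, pointer=0, rng=None):
    """Return the index with the largest weight, or None if all weights are 0.
    Ties are broken at random when `rng` is given, else by round robin."""
    best = max(weights)
    if best <= 0:
        return None
    tied = [k for k, w in enumerate(weights) if w == best]
    if rng is not None:
        return rng.choice(tied)
    return pick_round_robin(tied, pointer, len(weights))
```

For example, `select([2, 4, 4, 1], pointer=2)` picks index 2, while the same weights with `pointer=3` pick index 1, because the scan wraps around.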
Summary
Matching computation
• computes the matching, trying to maximize its total weight
• can be based on
– an iterative search, like in iSLIP, iOCF, iLQF
– a matrix greedy approach, like in MUCS, WFA
– a reservation vector, like in RPA
– a learning approach, like in APSARA
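To make "trying to maximize its total weight" concrete, the sketch below compares a generic greedy matching on the weight matrix with an exhaustive maximum weight matching. It is not an implementation of MUCS, WFA or any other algorithm named above; the function names and the matrix layout `w[i][j]` (weight of the queue from input i to output j) are assumptions.

```python
from itertools import permutations

def greedy_matching(w):
    """Repeatedly take the heaviest remaining (input, output) pair."""
    n = len(w)
    edges = sorted(
        ((w[i][j], i, j) for i in range(n) for j in range(n) if w[i][j] > 0),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    used_in, used_out, match = set(), set(), {}
    for _, i, j in edges:
        if i not in used_in and j not in used_out:
            match[i] = j
            used_in.add(i)
            used_out.add(j)
    return match

def max_weight_matching(w):
    """Exhaustive search over all N! permutations; usable only for tiny N."""
    n = len(w)
    best = max(permutations(range(n)),
               key=lambda p: sum(w[i][p[i]] for i in range(n)))
    return {i: j for i, j in enumerate(best) if w[i][j] > 0}

def total_weight(w, match):
    return sum(w[i][j] for i, j in match.items())

w = [[3, 2],
     [2, 0]]
greedy = greedy_matching(w)        # takes the pair of weight 3 first
optimal = max_weight_matching(w)   # prefers the two pairs of weight 2
```

On this 2 x 2 example the greedy choice reaches total weight 3, while the maximum weight matching reaches 4, which shows why greedy approaches are only approximations.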
Summary
Good IQ scheduling algorithms exist:
• 100% throughput
• short delay
• limited complexity
Performance differences are significant only close to saturation
Summary
Open questions concerning IQ schedulers:
• QoS guarantees
• stability of networks of switches
• multicast traffic