Buffered Crossbars With Performance Guarantees
Shang-Tse (Da) Chuang, Department of Electrical Engineering, Stanford University, http://yuba.stanford.edu/~stchuang
EE384Y, Thursday, April 29, 2004
2
Motivation
Network operators want performance guarantees
  Throughput guarantee
  Delay guarantee
High performance routers use crossbars
Hard to build crossbar-based routers with guarantees
My talk: How a crossbar with a small amount of internal
buffering can give guarantees
3
Contents
Throughput Guarantees
  Buffered Crossbar - 100% Throughput
  Buffered Crossbar - Work Conservation
Delay Guarantees
  Traditional Crossbar – Emulating an OQ Switch
  Buffered Crossbar – Emulating an OQ Switch
4
Generic Crossbar-Based Architecture
[Figure: input VOQs, a crossbar with a speedup of S, and a scheduler]
5
Admissible Traffic
Traffic Matrix: $[\lambda_{ij}]$
Traffic is admissible if $\sum_i \lambda_{ij} < 1$ and $\sum_j \lambda_{ij} < 1$ for all $i, j$
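As a rough illustration, the admissibility condition (no input and no output is oversubscribed) can be checked directly on a rate matrix. This is only a sketch: the function name `is_admissible`, the strict inequality, and the two example matrices are assumptions made here, not part of the talk.

```python
def is_admissible(rates):
    """Return True if every row sum and every column sum is strictly below 1.

    rates[i][j] is the arrival rate from input i to output j (cells per slot).
    """
    n = len(rates)
    rows_ok = all(sum(rates[i][j] for j in range(n)) < 1 for i in range(n))
    cols_ok = all(sum(rates[i][j] for i in range(n)) < 1 for j in range(n))
    return rows_ok and cols_ok


# Hypothetical 3x3 examples.
uniform = [[0.3, 0.3, 0.3] for _ in range(3)]  # every row and column sums to about 0.9
hotspot = [
    [0.6, 0.1, 0.1],
    [0.6, 0.1, 0.1],
    [0.1, 0.1, 0.1],
]  # column 0 sums to 1.3, so output 0 is oversubscribed

print(is_admissible(uniform))  # True
print(is_admissible(hotspot))  # False
```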
6
Throughput Guarantee
100% Throughput
  An algorithm delivers 100% throughput if for any admissible traffic the average backlog is finite
[Figure: crossbar with a speedup of S and a scheduler]
7
Previous Work
Timeline, 1985 to 2005:
  Wave Front Arbiter [Tamir]
  Parallel Iterative Matching [Anderson et al.]
  iSLIP [McKeown]
  Longest Port First [Mekkittikul et al.]
  Maximum Weight Matching [McKeown et al.]
  Maximal Matching, S=2 [Dai, Prabhakar]
Legend: Heuristics; Theoretically Proven
8
Maximal Matching Has Become Hard
TTX Switch Fabric
  Uses maximal matching
  Speedup less than 2
  Consumes up to 8 kW
  Limited to ~2.5 Tb/s
  No 100% throughput guarantee
9
Traditional Crossbar
Crossbar Requirements
  An input can send at most one cell
  An output can receive at most one cell
Scheduling Problem
  Must overcome two constraints simultaneously
New Crossbar
  Relieve contention
  Remove dependency between inputs and outputs
11
Buffered Crossbar
Arrival Phase
Scheduling Phases – Speedup of 2
Departure Phase
12
Scheduling Phase
Input Schedule
  Each input selects in parallel a cell for an empty crosspoint
Output Schedule
  Each output selects in parallel a cell from a full crosspoint
13
Example of Input/Output Scheduling
Round-robin Policy
  Each input schedules in a round-robin order
  Each output schedules in a round-robin order
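One scheduling phase with round-robin input and output schedules might be sketched as follows. The data layout (`voq` as cell counts, `xp` as one-cell crosspoint flags) and the pointer-update rule (advance past the port just served) are assumptions for illustration, not details given in the slides.

```python
def input_schedule(voq, xp, in_ptr):
    """Each input moves one cell into an empty crosspoint, chosen round-robin."""
    n = len(voq)
    for i in range(n):
        for step in range(n):
            j = (in_ptr[i] + step) % n
            if voq[i][j] > 0 and not xp[i][j]:
                voq[i][j] -= 1
                xp[i][j] = True
                in_ptr[i] = (j + 1) % n  # assumed pointer-update rule
                break


def output_schedule(xp, out_ptr):
    """Each output removes one cell from a full crosspoint, chosen round-robin."""
    n = len(xp)
    delivered = []
    for j in range(n):
        for step in range(n):
            i = (out_ptr[j] + step) % n
            if xp[i][j]:
                xp[i][j] = False
                out_ptr[j] = (i + 1) % n  # assumed pointer-update rule
                delivered.append((i, j))
                break
    return delivered


# 2x2 example: both inputs hold one cell for output 0.
voq = [[1, 0], [1, 0]]
xp = [[False, False], [False, False]]
in_ptr, out_ptr = [0, 0], [0, 0]

input_schedule(voq, xp, in_ptr)       # both inputs send in the same phase
first = output_schedule(xp, out_ptr)  # [(0, 0)]
second = output_schedule(xp, out_ptr) # [(1, 0)]
print(first, second)
```

In this example both inputs can send in the same phase even though they target the same output, because each cell lands in its own crosspoint; the output then drains the two crosspoints one at a time.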
14
Previous Work
Buffered Crossbar Simulations [Rojas-Cessa et al. 2001]
  32x32 switch, Uniform Bernoulli Traffic, Round-Robin, S=1
[Figure: average delay (cell times, log scale from 0.01 to 1000) versus offered load p (0.025 to 0.925) for 1-SLIP, 4-SLIP, Buffered Crossbar, and Ideal Router]
15
100% Throughput
Theorem 1: A buffered crossbar with speedup of 2 delivers 100% throughput for any admissible Bernoulli i.i.d. traffic using any work-conserving input/output schedules.
16
Intuition of Proof
[Figure: arrival rates labeled ε, <1-ε, and <1-ε]
(1-ε) + (1-ε) + ε = 2-ε
When a flow is backed up, the service for this backlog exceeds the arrivals
17
Intuition of Proof
$Q_{ij}$ = queue length
$B_{ij}$ = 0 if buffer empty, 1 if buffer full
$X_{ij} = \sum_k Q_{ik} + \sum_m (Q_{mj} + B_{mj})$
18
Intuition of Proof
Recall $X_{ij} = \sum_k Q_{ik} + \sum_m (Q_{mj} + B_{mj})$
If $Q_{ij} > 0$, then for $X_{ij}$:
  Expected increase is less than 2
  Expected decrease
    If $B_{ij} = 1$, then in the output schedule one $B_{*j}$ will decrease
    If $B_{ij} = 0$, then in the input schedule one $Q_{i*}$ will decrease
    Thus the expected decrease is 2
20
Work Conservation
Work-conserving Property
  If there is a cell for a given output in the system, that output is busy.
[Figure: Output Queued (OQ) Switch]
21
Emulating an OQ switch
Under identical inputs, the departure time of every cell from both switches is identical
22
Input Priority List
[Figure: example cells labeled with departure times]
Label each cell with its departure time
Arrange input cells into an input priority list
Output selects crosspoint with earliest departure time
23
Input Priority List
[Figure: the input priority list example, with one cell marked "Good guy" and other cells marked "Bad guys"]
24
Definitions
[Figure: example cell with 2 good guys and 2 bad guys]
Output Margin – cells at its output with earlier departure time
Input Margin – cells ahead in input priority list destined to different outputs
Total Margin – Output Margin minus Input Margin
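The three margin definitions can be made concrete with a small sketch. The `Cell` tuple, the function names, and the example values are hypothetical; they are chosen so that the cell of interest has two cells counted in its output margin and two in its input margin.

```python
from collections import namedtuple

# Hypothetical cell record: destination output and OQ departure time.
Cell = namedtuple("Cell", ["output", "departure"])


def output_margin(cell, output_queue):
    """Cells already at the cell's output that depart earlier."""
    return sum(1 for c in output_queue if c.departure < cell.departure)


def input_margin(cell, priority_list):
    """Cells ahead in the input priority list (head first) for other outputs."""
    pos = priority_list.index(cell)
    return sum(1 for c in priority_list[:pos] if c.output != cell.output)


def total_margin(cell, priority_list, output_queue):
    return output_margin(cell, output_queue) - input_margin(cell, priority_list)


# Input priority list, head first; the cell of interest is last.
plist = [Cell(2, 1), Cell(1, 2), Cell(3, 4), Cell(1, 5)]
# Cells already waiting at output 1.
out1 = [Cell(1, 3), Cell(1, 4), Cell(1, 9)]

cell = Cell(1, 5)
print(output_margin(cell, out1))        # 2
print(input_margin(cell, plist))        # 2
print(total_margin(cell, plist, out1))  # 0
```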
25
Emulation of FIFO OQ Switch
[Figure: example cells labeled with departure times]
Scheduling Phase
  Crosspoint is full – Output Margin will increase by one
  Crosspoint is empty – Input Margin will decrease by one
Total Margin increases by two
26
Emulation of FIFO OQ Switch
[Figure: example cells labeled with departure times]
Arrival Phase
  Input Margin might increase by one
Departure Phase
  Output Margin will decrease by one
Total Margin decreases by at most two
27
Emulation of FIFO OQ Switch
[Figure: example cells labeled with departure times]
Lemma 1: For every time slot, total margin does not decrease
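The per-slot bookkeeping behind Lemma 1 can be written out by adding up the changes stated for each phase. The symbol $TM$ for total margin and the phase subscripts are notation introduced here for illustration only.

```latex
\begin{align*}
\Delta TM_{\text{scheduling}} &= +2
  && \text{(scheduling phases, speedup of 2)} \\
\Delta TM_{\text{arrival}} &\ge -1
  && \text{(input margin might increase by one)} \\
\Delta TM_{\text{departure}} &= -1
  && \text{(output margin decreases by one)} \\
\Delta TM_{\text{slot}} &\ge 2 - 1 - 1 = 0
\end{align*}
```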
28
FIFO Insertion Policy
[Figure: example cells labeled with departure times, with an arriving cell]
Arrival Phase
  Cell for non-empty VOQ, insert behind cells for same output
  Cell for empty VOQ, insert at head of input priority list
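A minimal sketch of this insertion rule follows, with the input priority list held as a Python list whose head is index 0. Reading "behind cells for same output" as "directly behind the last such cell" is an interpretation made here, and the function name and `(output, cell_id)` tuples are hypothetical.

```python
def fifo_insert(priority_list, output, cell_id):
    """Insert an arriving cell into an input priority list (head is index 0)."""
    same = [k for k, (out, _) in enumerate(priority_list) if out == output]
    if same:
        # Non-empty VOQ: go directly behind the last cell for the same output.
        priority_list.insert(same[-1] + 1, (output, cell_id))
    else:
        # Empty VOQ: go to the head of the input priority list.
        priority_list.insert(0, (output, cell_id))


plist = [(1, "a1"), (2, "b1"), (1, "a2"), (3, "c1")]
fifo_insert(plist, 1, "a3")  # VOQ for output 1 is non-empty
fifo_insert(plist, 4, "d1")  # VOQ for output 4 is empty
print(plist)
# [(4, 'd1'), (1, 'a1'), (2, 'b1'), (1, 'a2'), (1, 'a3'), (3, 'c1')]
```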
29
FIFO Insertion Policy
[Figure: example cells labeled with departure times, with an arriving cell]
Lemma 2: An arriving cell will have a non-negative total margin
30
Emulation of FIFO OQ Switch
Theorem 2: A buffered crossbar with speedup of 2 can exactly emulate a FIFO OQ switch.
  Result was shown independently:
  B. Magill, C. Rohrs, R. Stevenson, “Output-Queued Switch Emulation by Fabrics With Limited Memory”, IEEE Journal on Selected Areas in Communications, pp. 606-615, May 2003.
Theorem 3: A buffered crossbar with speedup of 2 can be work-conserving with a distributed algorithm.
32
Delay Guarantees
[Figure: one output, many logical FIFO queues (1 to m); constrained traffic; weighted fair queueing sorts packets]
[Figure: one output, single PIFO queue; constrained traffic is pushed in]
Push In First Out (PIFO)
PIFO models
  Weighted Fair Queueing
  Weighted Round Robin
  Strict priority
  etc.
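A push-in first-out queue can be sketched as a sorted list: a packet may be pushed into any position, but packets only ever leave from the head. The `PIFO` class, its `rank` parameter, and the strict-priority example below are assumptions for illustration; a rank could equally be a WFQ finish time.

```python
import bisect


class PIFO:
    """Push-in first-out queue: insert anywhere by rank, remove only from the head."""

    def __init__(self):
        self._items = []  # kept sorted by (rank, arrival sequence)
        self._seq = 0     # tie-breaker so equal ranks stay in arrival order

    def push(self, rank, packet):
        bisect.insort(self._items, (rank, self._seq, packet))
        self._seq += 1

    def pop(self):
        return self._items.pop(0)[2]

    def __len__(self):
        return len(self._items)


# Strict priority modeled as a PIFO: rank 0 is high priority, rank 1 is low.
q = PIFO()
q.push(1, "low-a")
q.push(0, "high-a")
q.push(1, "low-b")
q.push(0, "high-b")

order = [q.pop() for _ in range(len(q))]
print(order)  # ['high-a', 'high-b', 'low-a', 'low-b']
```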
33
Achieving Delay Guarantees in Crossbars
Theorem 4: A crossbar switch with a speedup of 2 can exactly emulate an OQ switch which provides delay guarantees.
Theorem 5: A crossbar switch with a speedup of 2-1/N is necessary and sufficient to exactly emulate an NxN FIFO OQ switch.
35
Emulation of PIFO OQ Switch
[Figure: example cells labeled with departure times]
Crosspoint Blocking
  A cell in the crosspoint has a larger departure time
Swap Phase
  If an arriving cell has a smaller departure time than the cell in the crosspoint, swap the two cells
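The swap rule itself is a single comparison, sketched below. Cells are represented only by their departure times, an empty crosspoint is `None`, and the function name and return convention are hypothetical; what happens to the displaced cell afterwards is not specified on the slide and is left to the caller.

```python
def swap_phase(arriving, crosspoint):
    """Apply the swap rule; return (cell_in_crosspoint, cell_at_input).

    Both arguments are departure times; crosspoint is None when empty.
    """
    if crosspoint is not None and arriving < crosspoint:
        # Arriving cell departs earlier: it takes the crosspoint.
        return arriving, crosspoint
    # Otherwise nothing moves.
    return crosspoint, arriving


print(swap_phase(3, 7))     # (3, 7): arriving cell 3 replaces blocking cell 7
print(swap_phase(9, 7))     # (7, 9): no swap
print(swap_phase(3, None))  # (None, 3): empty crosspoint, nothing to swap
```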
36
PIFO Insertion Policy
[Figure: example cells labeled with departure times, with an arriving cell]
Arrival Phase
  Insert cell directly behind cell with departure time just earlier
  If cell has earliest departure time, then insert at head of input priority list
37
PIFO Emulation
Theorem 6: A buffered crossbar with speedup of 3 can exactly emulate an OQ switch with delay guarantees.
38
Header Scheduling Architecture
[Figure: input linecard, buffered crossbar with header scheduler, and output linecard, exchanging headers and grants]
39
Header Scheduling
[Figure: header scheduling example]
Schedule headers instead of cells
Headers are converted into grants in output schedule
Grants are sent back to the input
40
Grant Stream
[Figure: input linecard with grant FIFO, buffered crossbar with header scheduler, and output linecard, exchanging headers and grants]
Input can receive N grants in one scheduling phase
Bounded to p+N-1 grants over p consecutive phases
41
Counter Example
[Figure: grant FIFO, crosspoints, grants, and cells to output queue over scheduling phases p=1 to p=6]
42
Modified Buffered Crossbar
  N cells per crosspoint – requires N^3 cell buffers
  N cells per output – requires N^2 cell buffers
Theorem 7: A modified buffered crossbar with speedup of 2 can emulate an OQ switch with delay guarantees with a fixed delay of N scheduling phases.
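The buffer counts for the modified buffered crossbar follow from an NxN crossbar having N^2 crosspoints and N outputs. The worked instance below uses N = 32 purely as an illustrative size.

```latex
\begin{align*}
\text{crosspoint buffers} &= N^2 \text{ crosspoints} \times N \text{ cells} = N^3 \\
\text{output buffers} &= N \text{ outputs} \times N \text{ cells} = N^2 \\
N = 32: \quad N^3 &= 32768, \qquad N^2 = 1024
\end{align*}
```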
43
Summary
Buffered crossbars
  Use crosspoints to relieve contention
  Inputs and outputs schedule independently and in parallel
Performance guarantees
  Throughput – any work-conserving input/output schedule
  Work Conservation – simple insertion policy
  Delay – header scheduling
44
Relevant Papers
Crossbars
  Shang-Tse Chuang, Ashish Goel, Nick McKeown, Balaji Prabhakar, “Matching Output Queuing with a Combined Input Output Queued Switch,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, pp. 1030-1039, Dec. 1999.
Buffered Crossbars
  Shang-Tse Chuang, Sundar Iyer, Nick McKeown, “Practical Algorithms for Performance Guarantees in Buffered Crossbars,” Stanford HPNG Technical Report TR03-HPNG-061501.