8/6/2019 Sigcomm Tutorial
1/189
High Performance Switches and Routers: Theory and Practice
Sigcomm 99, August 30, 1999
Harvard University

Nick McKeown and Balaji Prabhakar
Departments of Electrical Engineering and Computer Science
Copyright 1999. All Rights Reserved
Tutorial Outline
- Introduction: What is a Packet Switch?
- Packet Lookup and Classification: Where does a packet go next?
- Switching Fabrics: How does the packet get there?
- Output Scheduling: When should the packet leave?
Introduction
What is a Packet Switch?
Basic Architectural Components
Some Example Packet Switches
The Evolution of IP Routers
Basic Architectural Components
[Figure: control-plane functions (routing, reservation, admission control, congestion control) sit above the datapath, which performs per-packet processing: policing, switching, and output scheduling.]
Basic Architectural Components
[Figure: the datapath of per-packet processing in three stages: 1. forwarding decision (lookup in a forwarding table), 2. interconnect, 3. output scheduling.]
Where high performance packet switches are used
[Figure: example deployments, including the Internet core (carrier-class core routers), edge routers, enterprise WAN access (ATM and frame relay switches), and enterprise campus switches.]
Introduction
What is a Packet Switch?
Basic Architectural Components
Some Example Packet Switches
The Evolution of IP Routers
ATM Switch
Lookup cell VCI/VPI in VC table.
Replace old VCI/VPI with new.
Forward cell to outgoing interface.
Transmit cell onto link.
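In code, the four steps amount to a table lookup and a relabel. A minimal sketch in Python; the VC table entries below are invented for illustration:

```python
# Sketch of the ATM forwarding steps: look up the incoming (port, VCI) in
# the VC table, swap in the new VCI, and hand the cell to the output port.
vc_table = {
    # (in_port, in_vci) -> (out_port, out_vci)   hypothetical entries
    (1, 42): (3, 77),
    (2, 10): (1, 5),
}

def forward_cell(in_port, in_vci, payload):
    out_port, out_vci = vc_table[(in_port, in_vci)]  # lookup in VC table
    return (out_port, out_vci, payload)              # relabel and forward

assert forward_cell(1, 42, b"cell") == (3, 77, b"cell")
```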
Ethernet Switch
Lookup frame DA in forwarding table.
If known, forward to correct port.
If unknown, broadcast to all ports.
Learn SA of incoming frame.
Forward frame to outgoing interface.
Transmit frame onto link.
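The learn/forward/flood logic above fits in a few lines. A sketch; the port numbers and addresses are made up:

```python
# Minimal learning-bridge sketch: learn the source address of every frame,
# forward to the learned port if the destination is known, else flood.
class Bridge:
    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}                        # MAC address -> port

    def handle(self, in_port, src, dst):
        self.table[src] = in_port              # learn SA of incoming frame
        if dst in self.table:                  # known: forward to correct port
            return [self.table[dst]]
        return sorted(self.ports - {in_port})  # unknown: flood other ports

b = Bridge(ports=[1, 2, 3, 4])
assert b.handle(1, src="A", dst="B") == [2, 3, 4]  # B unknown: flooded
assert b.handle(2, src="B", dst="A") == [1]        # A was learned on port 1
```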
IP Router
Lookup packet DA in forwarding table.
If known, forward to correct port.
If unknown, drop packet.
Decrement TTL, update header checksum.
Forward packet to outgoing interface.
Transmit packet onto link.
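The "decrement TTL, update header checksum" step is usually done incrementally rather than by recomputing the checksum over the whole header. A sketch of the RFC 1141-style update, checked here against a full recomputation (the sample header values are invented):

```python
import struct

def ip_checksum(header: bytes) -> int:
    # One's-complement sum of 16-bit words (checksum field zeroed).
    s = 0
    for (w,) in struct.iter_unpack("!H", header):
        s += w
        s = (s & 0xFFFF) + (s >> 16)   # fold the carry back in
    return ~s & 0xFFFF

def cksum_after_ttl_decrement(cksum: int) -> int:
    # TTL-1 lowers the TTL/protocol word by 0x0100, so the stored
    # (complemented) checksum rises by 0x0100; fold the carry.
    s = cksum + 0x0100
    return (s & 0xFFFF) + (s >> 16)

def make_header(ttl):
    # 20-byte IPv4 header with checksum field zeroed (sample values)
    return struct.pack("!BBHHHBBHII", 0x45, 0, 20, 0x1C46, 0x4000,
                       ttl, 6, 0, 0xAC100A63, 0xAC100A0C)

incremental = cksum_after_ttl_decrement(ip_checksum(make_header(64)))
assert incremental == ip_checksum(make_header(63))
```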
Introduction
What is a Packet Switch?
Basic Architectural Components
Some Example Packet Switches
The Evolution of IP Routers
First-Generation IP Routers
[Figure: a shared backplane connects a CPU and its buffer memory to several line interfaces, each with a MAC and DMA engine; every packet crosses the bus to the central CPU and back.]
Second-Generation IP Routers
[Figure: each line card now has a MAC, DMA engine, and local buffer memory; the CPU and central buffer memory remain on the shared bus, but most packets move line card to line card without touching the CPU.]
Third-Generation Switches/Routers
[Figure: a switched backplane replaces the shared bus; each line card has a MAC and local buffer memory, and the CPU card with its memory attaches to the fabric like another line interface.]
Fourth-Generation Switches/Routers: Clustering and Multistage
[Figure: multiple switch/router shelves (line interfaces numbered 1-32) interconnected into a multistage cluster.]
Packet Switches: References
- J. Giacopelli, M. Littlewood, W. D. Sincoskie, "Sunshine: A high performance self-routing broadband packet switch architecture," ISS 90.
- J. S. Turner, "Design of a broadcast packet switching network," IEEE Trans. Comm., June 1988, pp. 734-743.
- C. Partridge et al., "A fifty gigabit per second IP router," IEEE Trans. Networking, 1998.
- N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, M. Horowitz, "The Tiny Tera: A packet switch core," IEEE Micro, Jan-Feb 1997.
Tutorial Outline
- Introduction: What is a Packet Switch?
- Packet Lookup and Classification: Where does a packet go next?
- Switching Fabrics: How does the packet get there?
- Output Scheduling: When should the packet leave?
Basic Architectural Components
[Figure: the datapath of per-packet processing in three stages: 1. forwarding decision (lookup in a forwarding table), 2. interconnect, 3. output scheduling.]
Forwarding Decisions
- ATM and MPLS switches: direct lookup
- Bridges and Ethernet switches: associative lookup, hashing, trees and tries
- IP routers: caching, CIDR, Patricia trees/tries, other methods
- Packet classification
ATM and MPLS Switches: Direct Lookup
[Figure: the incoming VCI is used directly as the memory address; the data stored there is the outgoing (port, VCI).]
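The point of direct lookup is that no search happens at all: the label is the address. A sketch, assuming a hypothetical 12-bit VCI and made-up entries:

```python
# Direct lookup: the VCI indexes the table directly, so a forwarding
# decision costs exactly one memory reference.
VCI_BITS = 12
table = [None] * (1 << VCI_BITS)   # one slot per possible incoming VCI
table[42] = (3, 77)                # VCI 42 -> port 3, outgoing VCI 77

def lookup(vci):
    return table[vci]              # single memory reference, no search

assert lookup(42) == (3, 77)
assert lookup(43) is None
```

The cost is memory proportional to the label space, which is why this works for small VCI/label spaces but not for 32-bit IP addresses.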
Forwarding Decisions
- ATM and MPLS switches: direct lookup
- Bridges and Ethernet switches: associative lookup, hashing, trees and tries
- IP routers: caching, CIDR, Patricia trees/tries, other methods
- Packet classification
Bridges and Ethernet Switches: Associative Lookups
[Figure: the 48-bit address is presented to an associative memory (CAM); on a hit, the matching row's log2 N-bit address selects the associated data.]
- Advantages: simple
- Disadvantages: slow, high power, small, expensive
Bridges and Ethernet Switches: Hashing
[Figure: a hashing function reduces the 48-bit search key to a 16-bit memory address; the entry holds the full address, the associated data, and a hit indication.]
Lookups Using Hashing: An Example
[Figure: CRC-16 of the 48-bit key selects a 16-bit bucket address; colliding entries hang off each bucket on linked lists (lists of length 4, 2, and 3 are shown).]
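The scheme above is ordinary hashing with chaining. A sketch; `zlib.crc32` masked to 16 bits stands in for the slide's CRC-16, and the keys are invented:

```python
# Hashed lookup sketch: a 16-bit hash selects a bucket; colliding entries
# sit on a chain (a Python list) that is searched linearly.
import zlib

TABLE_BITS = 16

def bucket_of(key: bytes) -> int:
    # stand-in for CRC-16; any uniform 16-bit hash works for the sketch
    return zlib.crc32(key) & ((1 << TABLE_BITS) - 1)

table = [[] for _ in range(1 << TABLE_BITS)]

def insert(key: bytes, data):
    table[bucket_of(key)].append((key, data))

def lookup(key: bytes):
    for k, data in table[bucket_of(key)]:   # walk the chain
        if k == key:
            return data                     # hit
    return None                             # miss

insert(b"\x00\x11\x22\x33\x44\x55", "port 3")
assert lookup(b"\x00\x11\x22\x33\x44\x55") == "port 3"
assert lookup(b"\xde\xad\xbe\xef\x00\x01") is None
```

The chain walk is what makes the lookup time non-deterministic, the drawback the next slide quantifies.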
Lookups Using Hashing: Performance of the simple example

  E[R] = 1 + (M - 1) / 2N

where:
E[R] = expected number of memory references
M = number of memory addresses in table
N = number of linked lists
With M = N, E[R] is about 1.5.
Lookups Using Hashing
- Advantages: simple; expected lookup time can be small
- Disadvantages: non-deterministic lookup time; inefficient use of memory
Trees and Tries
[Figure: a binary search tree over N entries, of depth log2 N, beside a binary search trie that branches on successive address bits (storing, e.g., 010 and 111).]
Trees and Tries: Multiway Tries
[Figure: a 16-ary search trie; each node holds sixteen (4-bit value, pointer) entries, so an address is consumed four bits per level.]
Trees and Tries: Multiway Tries

Degree of tree | # Mem references | # Nodes (x10^6) | Total memory (MB) | Fraction wasted (%)
  2            | 48               | 1.09            | 4.3               | 49
  4            | 24               | 0.53            | 4.3               | 73
  8            | 16               | 0.35            | 5.6               | 86
 16            | 12               | 0.25            | 8.3               | 93
 64            |  8               | 0.17            | 21                | 98
256            |  6               | 0.12            | 64                | 99.5

Table produced from 2^15 randomly generated 48-bit addresses.

where:
D = degree of tree
L = number of layers/references
N = number of entries in table
E[n] = expected number of nodes, which follows from the probability that a depth-i node is occupied by at least one of the N random addresses:

  E[n] = 1 + sum_{i=1..L-1} D^i (1 - (1 - D^-i)^N)

E[w] = expected amount of wasted memory: the allocated node entries that no address occupies.
Forwarding Decisions
- ATM and MPLS switches: direct lookup
- Bridges and Ethernet switches: associative lookup, hashing, trees and tries
- IP routers: caching, CIDR, Patricia trees/tries, other methods
- Packet classification
Caching Addresses
[Figure: a second-generation router; the fast path is a hit in the line card's local buffer memory (cache), the slow path crosses the shared bus to the CPU and the full forwarding table.]
Caching Addresses
- LAN: average flow < 40 packets
- WAN: huge number of flows
[Figure: cache hit rate (0-100%) with a cache sized at 10% of the full table.]
IP Routers: Class-based Addresses
[Figure: the IP address space divided into Class A, Class B, Class C, and D; the routing table holds exact-match entries on the class-based network address, e.g. 212.17.9.0 -> Port 4 for destination 212.17.9.4.]
IP Routers: CIDR
[Figure: class-based addressing carves the space 0..2^32-1 into fixed Classes A-D; classless (CIDR) addressing allows arbitrary prefixes such as 65/8, 142.12/19, and 128.9/16, a block of 2^16 addresses starting at 128.9.0.0 that contains 128.9.16.14.]
IP Routers: CIDR
[Figure: nested prefixes 128.9/16, 128.9.16/20, 128.9.176/20, 128.9.19/24, and 128.9.25/24; address 128.9.16.14 lies inside both 128.9/16 and 128.9.16/20.]
Most specific route = longest matching prefix.
IP Routers: Metrics for Lookups
[Figure: a prefix table (128.9/16, 128.9.16/20, 128.9.176/20, 128.9.19/24, 128.9.25/24, 142.12/19, 65/8, each mapped to an output port) and the lookup of 128.9.16.14.]
Metrics: lookup time, storage space, update time, preprocessing time.
IP Router Lookup
IPv4 unicast destination-address-based lookup:
[Figure: the incoming packet's header enters the forwarding engine, which performs the next-hop computation against a forwarding table of (destination, next hop) entries.]
Need more than IPv4 unicast lookups
- Multicast
  - PIM-SM: longest prefix matching on the source and group address; try (S,G), then (*,G), then (*,*,RP); check incoming interface
  - DVMRP: incoming interface check followed by (S,G) lookup
- IPv6: 128-bit destination address field; exact address architecture not yet known
Lookup Performance Required
Gigabit Ethernet (84B packets): 1.49 Mpps

Line  | Line rate | 40B packets | 240B packets
T1    | 1.5 Mb/s  | 4.68 Kpps   | 0.78 Kpps
OC3   | 155 Mb/s  | 480 Kpps    | 80 Kpps
OC12  | 622 Mb/s  | 1.94 Mpps   | 323 Kpps
OC48  | 2.5 Gb/s  | 7.81 Mpps   | 1.3 Mpps
OC192 | 10 Gb/s   | 31.25 Mpps  | 5.21 Mpps
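These rates are simply line rate divided by packet size in bits; a quick check of the table's arithmetic:

```python
# Required lookup rate = line_rate / (8 * packet_size), in Mpps.
def mpps(line_rate_bps, pkt_bytes):
    return line_rate_bps / (8 * pkt_bytes) / 1e6

assert round(mpps(2.5e9, 40), 2) == 7.81     # OC48, 40B packets
assert round(mpps(10e9, 40), 2) == 31.25     # OC192, 40B packets
assert round(mpps(622e6, 40), 2) == 1.94     # OC12, 40B packets
# Gigabit Ethernet: 64B minimum frame + preamble and inter-frame gap = 84B
assert round(mpps(1e9, 84), 2) == 1.49
```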
Size of the Routing Table
[Figure: growth of the backbone routing table over time. Source: http://www.telstra.net/ops/bgptable.html]
Ternary CAMs
[Figure: an associative memory of (value, mask) pairs, e.g. 10.0.0.0/255.0.0.0 -> R1, 10.1.0.0/255.255.0.0 -> R2, 10.1.1.0/255.255.255.0 -> R3, 10.1.3.0/255.255.255.0 -> R4, 10.1.3.1/255.255.255.255 -> R4; a priority encoder selects the next hop of the highest-priority matching entry.]
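A software model of the same idea, using the slide's entries: every row "matches in parallel" (here, a scan), and the priority encoder becomes "first match wins", so rows are stored longest-prefix first:

```python
# TCAM row model: (value, mask, next_hop); first matching row wins.
from ipaddress import ip_address

def e(value, mask, hop):
    return (int(ip_address(value)), int(ip_address(mask)), hop)

TCAM = [                                   # highest priority first
    e("10.1.3.1", "255.255.255.255", "R4"),
    e("10.1.3.0", "255.255.255.0",   "R4"),
    e("10.1.1.0", "255.255.255.0",   "R3"),
    e("10.1.0.0", "255.255.0.0",     "R2"),
    e("10.0.0.0", "255.0.0.0",       "R1"),
]

def lookup(addr):
    a = int(ip_address(addr))
    for value, mask, hop in TCAM:          # priority encoder: first match
        if a & mask == value:
            return hop
    return None

assert lookup("10.1.3.1") == "R4"
assert lookup("10.1.1.9") == "R3"
assert lookup("10.200.0.1") == "R1"
```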
Binary Tries
Example prefixes: a) 00001, b) 00010, c) 00011, d) 001, e) 0101, f) 011, g) 100, h) 1010, i) 1100, j) 11110000
[Figure: the binary trie storing prefixes a-j; each branch consumes one bit (0 = left, 1 = right).]
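Longest-prefix match in a binary trie, using the example prefixes above; the deepest prefix node passed on the way down is the answer:

```python
# Binary trie: each node is a dict {bit: child}; a node also carries
# "prefix" if a stored prefix ends there.
PREFIXES = ["00001", "00010", "00011", "001", "0101",
            "011", "100", "1010", "1100", "11110000"]

def build(prefixes):
    root = {}
    for p in prefixes:
        node = root
        for bit in p:
            node = node.setdefault(bit, {})
        node["prefix"] = p
    return root

def lpm(root, addr_bits):
    best, node = None, root
    for bit in addr_bits:
        if "prefix" in node:
            best = node["prefix"]      # deepest prefix seen so far
        node = node.get(bit)
        if node is None:
            break
    else:
        if "prefix" in node:           # address ended exactly on a node
            best = node["prefix"]
    return best

trie = build(PREFIXES)
assert lpm(trie, "00010110") == "00010"
assert lpm(trie, "10100000") == "1010"
assert lpm(trie, "11011111") is None
```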
Patricia Tree
Example prefixes: a) 00001, b) 00010, c) 00011, d) 001, e) 0101, f) 011, g) 100, h) 1010, i) 1100, j) 11110000
[Figure: the same prefixes in a Patricia tree; chains of single-child nodes are collapsed, e.g. prefix j (11110000) is reached with Skip=5.]
Patricia Tree
- Advantages: general solution; extensible to wider fields
- Disadvantages: many memory accesses; may need backtracking; pointers take up a lot of space
- Backtracking can be avoided by storing the intermediate best-matched prefix (Dynamic Prefix Tries)
- 40K entries: 2MB data structure at 0.3-0.5 Mpps [O(W)]
Binary Search on Trie Levels
[Figure: instead of walking the trie one level at a time, binary search probes prefix lengths (trie levels) to bracket the level of the longest match for P.]
Binary Search on Trie Levels
Store a hash table for each prefix length to aid search at a particular trie level.
Example prefixes: 10.0.0.0/8, 10.1.0.0/16, 10.1.1.0/24, 10.1.2.0/24, 10.2.3.0/24
Example addresses: 10.1.1.4, 10.4.4.3, 10.2.3.9, 10.2.4.8
Per-length hash tables:
- length 8: 10
- length 16: 10.1, 10.2
- length 24: 10.1.1, 10.1.2, 10.2.3
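A sketch of the search on the example above. Shorter-length tables hold "markers" for longer prefixes (here 10.2 marks 10.2.3.0/24), and each entry precomputes its best real match so a hit can safely send the search toward longer lengths:

```python
from ipaddress import ip_network, ip_address

PREFIXES = ["10.0.0.0/8", "10.1.0.0/16", "10.1.1.0/24",
            "10.1.2.0/24", "10.2.3.0/24"]
nets = sorted((ip_network(p) for p in PREFIXES), key=lambda n: n.prefixlen)
lengths = sorted({n.prefixlen for n in nets})          # [8, 16, 24]

def trunc(a, l):
    return a >> (32 - l)

def best_real(a, max_len):
    # longest real prefix of length <= max_len matching address int a
    best = None
    for n in nets:                                     # ascending lengths
        if (n.prefixlen <= max_len and
                trunc(a, n.prefixlen) == trunc(int(n.network_address), n.prefixlen)):
            best = n
    return best

# One hash table per length; real prefixes and markers alike store the
# best matching prefix of that length or shorter.
tables = {l: {} for l in lengths}
for n in nets:
    base = int(n.network_address)
    for l in lengths:
        if l > n.prefixlen:
            break
        tables[l].setdefault(trunc(base, l), best_real(base, l))

def lookup(addr):
    a, best = int(ip_address(addr)), None
    lo, hi = 0, len(lengths) - 1
    while lo <= hi:                     # binary search on prefix lengths
        mid = (lo + hi) // 2
        key = trunc(a, lengths[mid])
        if key in tables[lengths[mid]]:
            best = tables[lengths[mid]][key] or best
            lo = mid + 1                # hit or marker: try longer
        else:
            hi = mid - 1                # miss: try shorter
    return str(best) if best else None

assert lookup("10.1.1.4") == "10.1.1.0/24"
assert lookup("10.4.4.3") == "10.0.0.0/8"
assert lookup("10.2.4.8") == "10.0.0.0/8"   # marker 10.2, bmp is the /8
```

The marker placement here (one per shorter level) over-provisions slightly compared to the published scheme, which only adds markers along the binary-search path.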
Binary Search on Trie Levels
- Advantages: scalable to IPv6
- Disadvantages: multiple hashed memory accesses; updates are complex
- 33K entries: 1.4MB data structure at 1.2-2.2 Mpps [O(log W)]
Compacting Forwarding Tables
[Figure: a bit vector (1 0 0 0 1 0 1 1 1 0 0 0 1 1 1 1) marking the positions where the next hop changes; runs of identical next hops need not be stored.]
Compacting Forwarding Tables
[Figure: a codeword array (bit vectors 10001010, 11100010, 10000010, 10110100, 11000000 with per-word values R1,0; R2,3; R3,7; R4,9; R5,0) and a base index array (0, 13) together locate entries in the compacted next-hop table.]
Multi-bit Tries
[Figure: a 16-ary search trie; each node holds sixteen (value, pointer) entries and each level consumes four address bits.]
Compressed Tries
[Figure: a trie compressed to levels L8, L16, and L24; only 3 memory accesses per lookup.]
Routing Lookups in Hardware
[Figure: histogram of number of prefixes vs. prefix length; most prefixes are 24 bits or shorter.]
Routing Lookups in Hardware
[Figure: prefixes up to 24 bits are expanded into a single table of 2^24 = 16M entries; the top 24 bits of the destination (142.19.6 of 142.19.6.14) index the table directly, and the entry holds the next hop.]
Routing Lookups in Hardware
[Figure: for prefixes longer than 24 bits, the first-level entry indexed by the top 24 bits (128.3.72 of 128.3.72.44) holds a pointer instead of a next hop; base + offset (the last 8 bits, 44) selects the next hop in a 2^8-entry second-level table.]
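A software sketch of the two-table scheme; dictionaries stand in for the DRAM tables, and routes must be added shortest-prefix first in this simplified version:

```python
# Two-level lookup: a 2^24-entry table indexed by the top 24 bits, whose
# entry is either a next hop or a pointer to a 256-entry block indexed by
# the low 8 bits.
from ipaddress import ip_network, ip_address

tbl24 = {}   # top-24-bits -> ("hop", next_hop) or ("ptr", block)

def add_route(prefix, next_hop):        # add shortest prefixes first
    net = ip_network(prefix)
    base = int(net.network_address)
    if net.prefixlen <= 24:
        # fill every 24-bit index the prefix covers
        for idx in range(base >> 8, (base + net.num_addresses) >> 8):
            tbl24[idx] = ("hop", next_hop)
    else:
        idx = base >> 8
        kind, val = tbl24.get(idx, (None, None))
        if kind == "ptr":
            block = val
        else:
            # start the block from the shorter covering route, if any
            block = {off: val for off in range(256)} if kind == "hop" else {}
        for off in range(base & 0xFF, (base & 0xFF) + net.num_addresses):
            block[off] = next_hop
        tbl24[idx] = ("ptr", block)

def lookup(addr):
    a = int(ip_address(addr))
    kind, val = tbl24.get(a >> 8, (None, None))
    if kind == "hop":
        return val                      # one memory reference
    if kind == "ptr":
        return val.get(a & 0xFF)        # two memory references
    return None

add_route("128.3.0.0/16", "B")
add_route("142.19.6.0/24", "A")
add_route("128.3.72.44/32", "C")
assert lookup("142.19.6.14") == "A"
assert lookup("128.3.72.44") == "C"
assert lookup("128.3.72.45") == "B"
```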
Routing Lookups in Hardware
[Figure: the general scheme: a first table of 2^N entries covers prefixes up to N bits; entry i for longer prefixes points to one of the 2^M-entry second-level tables, covering prefixes up to N+M bits.]
Routing Lookups in Hardware
- Advantages: 20 Mpps with 50ns DRAM; easy to implement in hardware
- Disadvantages: large memory required (9-33MB); depends on prefix-length distribution
Various compression schemes can be employed to decrease the storage requirements, e.g. carefully chosen variable-length strides, bitmap compression, etc.
IP Router Lookups: References
- A. Brodnik, S. Carlsson, M. Degermark, S. Pink, "Small forwarding tables for fast routing lookups," Sigcomm 1997, pp. 3-14.
- B. Lampson, V. Srinivasan, G. Varghese, "IP lookups using multiway and multicolumn search," Infocom 1998, vol. 3, pp. 1248-1256.
- M. Waldvogel, G. Varghese, J. Turner, B. Plattner, "Scalable high speed IP routing lookups," Sigcomm 1997, pp. 25-36.
- P. Gupta, S. Lin, N. McKeown, "Routing lookups in hardware at memory access speeds," Infocom 1998, vol. 3, pp. 1241-1248.
- S. Nilsson, G. Karlsson, "Fast address lookup for Internet routers," IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.
- V. Srinivasan, G. Varghese, "Fast IP lookups using controlled prefix expansion," Sigmetrics, June 1998.
Forwarding Decisions
- ATM and MPLS switches: direct lookup
- Bridges and Ethernet switches: associative lookup, hashing, trees and tries
- IP routers: caching, CIDR, Patricia trees/tries, other methods
- Packet classification
Providing Value-Added Services: Some Examples
- Differentiated services: regard traffic from Autonomous System #33 as "platinum-grade"
- Access Control Lists: deny udp host 194.72.72.33 194.72.6.64 0.0.0.15 eq snmp
- Committed Access Rate: rate-limit WWW traffic from sub-interface #739 to 10 Mb/s
- Policy-based routing: route all voice traffic through the ATM network
Multi-field Packet Classification
Given a classifier with N rules, find the action associated with the highest-priority rule matching an incoming packet.

Rule   | Field 1           | Field 2          | Field k | Action
Rule 1 | 152.163.190.69/21 | 152.163.80.11/32 | UDP     | A1
Rule 2 | 152.168.3.0/24    | 152.163.0.0/16   | TCP     | A2
...    |                   |                  |         |
Rule N | 152.168.0.0/16    | 152.0.0.0/8      | ANY     | An
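The naive reference algorithm is a linear scan in priority order; the schemes that follow exist because this is too slow for large classifiers. A sketch over the table above (the "default" action is an assumption):

```python
# Linear-scan classification: return the action of the first (i.e.
# highest-priority) rule whose source, destination, and protocol all match.
from ipaddress import ip_network, ip_address

RULES = [   # (src prefix, dst prefix, protocol or "ANY", action)
    ("152.163.190.69/21", "152.163.80.11/32", "UDP", "A1"),
    ("152.168.3.0/24",    "152.163.0.0/16",   "TCP", "A2"),
    ("152.168.0.0/16",    "152.0.0.0/8",      "ANY", "An"),
]

def classify(src, dst, proto):
    for src_p, dst_p, p, action in RULES:
        if (ip_address(src) in ip_network(src_p, strict=False)
                and ip_address(dst) in ip_network(dst_p, strict=False)
                and p in ("ANY", proto)):
            return action
    return "default"

assert classify("152.168.3.7", "152.163.4.4", "TCP") == "A2"
assert classify("152.168.9.1", "152.1.1.1", "UDP") == "An"
```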
Geometric Interpretation in 2D
[Figure: rules R1-R7 as rectangles in the (Field 1, Field 2) plane; a packet is a point, e.g. P1 or P2; a field pair such as (128.16.46.23, *) is a line, and (144.24/16, 64/24) is a rectangle.]
Proposed Schemes (contd.)

Crossproducting (Srinivasan et al [Sigcomm 98])
- Pros: fast accesses; suitable for multiple fields
- Cons: large memory requirements; suitable without caching only for classifiers with fewer than 50 rules

Bit-level Parallelism (Lakshman and Stiliadis [Sigcomm 98])
- Pros: suitable for multiple fields
- Cons: large memory bandwidth required; comparatively slow lookup rate; hardware only
Proposed Schemes (contd.)

Hierarchical Intelligent Cuttings (Gupta and McKeown [HotI 99])
- Pros: suitable for multiple fields; small memory requirements; good update time
- Cons: large preprocessing time

Tuple Space Search (Srinivasan et al [Sigcomm 99])
- Pros: suitable for multiple fields; the basic scheme has good update times and memory requirements
- Cons: classification rate can be low; requires perfect hashing for determinism

Recursive Flow Classification (Gupta and McKeown [Sigcomm 99])
- Pros: fast accesses; suitable for multiple fields; reasonable memory requirements for real-life classifiers
- Cons: large preprocessing time and memory requirements for large classifiers
Grid of Tries
[Figure: a trie on dimension 1; each valid prefix points to a trie on dimension 2, in which rules R1-R7 are stored.]
Grid of Tries
- Advantages: good solution for two dimensions
- Disadvantages: static solution; not easy to extend to higher dimensions
- 20K entries: 2MB data structure with 9 memory accesses [at most 2W]
Classification using Bit Parallelism
[Figure: for each dimension, every elementary interval carries a bit vector over the rules R1-R4; the AND of the per-dimension vectors gives the set of matching rules.]
Classification using Bit Parallelism
- Advantages: good solution for multiple dimensions, for small classifiers
- Disadvantages: large memory bandwidth; hardware optimized
- 512 rules: 1 Mpps with a single FPGA and five 128KB SRAM chips
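A toy model of the bitmap scheme: for each dimension, precompute a per-value bitmap of the rules whose range covers that value; classification is one AND per extra dimension plus a priority pick. The rules and ranges below are invented for illustration:

```python
# Bit-parallel classification on two small fields (values 0..15).
RULES = ["R1", "R2", "R3", "R4"]           # R1 = highest priority (bit 0)
DIM1 = [(0, 7), (4, 15), (0, 15), (8, 11)] # per-rule (lo, hi) on field 1
DIM2 = [(2, 9), (0, 3), (6, 15), (0, 15)]  # per-rule (lo, hi) on field 2

def bitmaps(ranges, width=16):
    # bitmap[v] has bit i set iff rule i's range covers value v
    maps = []
    for v in range(width):
        bits = 0
        for i, (lo, hi) in enumerate(ranges):
            if lo <= v <= hi:
                bits |= 1 << i
        maps.append(bits)
    return maps

B1, B2 = bitmaps(DIM1), bitmaps(DIM2)

def classify(f1, f2):
    match = B1[f1] & B2[f2]                # one AND of per-dimension bitmaps
    if match == 0:
        return None
    # priority encoder: lowest set bit = highest-priority matching rule
    return RULES[(match & -match).bit_length() - 1]

assert classify(5, 2) == "R1"
assert classify(12, 1) == "R2"
```

The "large memory bandwidth" drawback is visible here: every lookup reads one full N-bit vector per dimension.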
Classification Using Multiple Fields: Recursive Flow Classification
[Figure: the packet header fields F1...Fn (S = 128 bits in total, 2^128 possible headers) are reduced through successive memory lookups, 2^128 -> 2^64 -> 2^24 -> 2^T = 2^12, ending in the action.]
Packet Classification: References
- T. V. Lakshman, D. Stiliadis, "High speed policy-based packet forwarding using efficient multi-dimensional range matching," Sigcomm 1998, pp. 191-202.
- V. Srinivasan, S. Suri, G. Varghese, M. Waldvogel, "Fast and scalable layer 4 switching," Sigcomm 1998, pp. 203-214.
- V. Srinivasan, G. Varghese, S. Suri, "Fast packet classification using tuple space search," Sigcomm 1999.
- P. Gupta, N. McKeown, "Packet classification using hierarchical intelligent cuttings," Hot Interconnects VII, 1999.
- P. Gupta, N. McKeown, "Packet classification on multiple fields," Sigcomm 1999.
Tutorial Outline
- Introduction: What is a Packet Switch?
- Packet Lookup and Classification: Where does a packet go next?
- Switching Fabrics: How does the packet get there?
- Output Scheduling: When should the packet leave?
Switching Fabrics
- Output and input queueing
- Output queueing
- Input queueing: scheduling algorithms
- Combining input and output queues
- Other non-blocking fabrics
- Multicast traffic
Basic Architectural Components
[Figure: the datapath of per-packet processing in three stages: 1. forwarding decision (lookup in a forwarding table), 2. interconnect, 3. output scheduling.]
Interconnects: Two Basic Techniques
- Input queueing: usually a non-blocking switch fabric (e.g. crossbar)
- Output queueing: usually a fast bus
Interconnects: Output Queueing
[Figure: two variants for an N-port switch at line rate R: individual output queues, each needing memory bandwidth (N+1)R, and a centralized shared memory needing 2NR.]
Output Queueing: The Ideal
[Figure: every arriving cell is delivered straight to its output queue, even when several cells (here for outputs 1 and 2) arrive in the same time slot.]
Switching Fabrics
- Output and input queueing
- Output queueing
- Input queueing: scheduling algorithms
- Other non-blocking fabrics
- Combining input and output queues
- Multicast traffic
Input Queueing: Head of Line Blocking
[Figure: average delay vs. load for FIFO input queues; throughput saturates at a load of 58.6%.]
Head of Line Blocking
[Figure: a cell blocked at the head of a FIFO input queue holds up the cells behind it, even when their outputs are idle.]
Virtual Output Queues
[Figure: each input keeps a separate queue per output (virtual output queues), so no cell is blocked behind a cell bound for a different output.]
Input Queues: Virtual Output Queues
[Figure: average delay vs. load with virtual output queues; throughput can reach 100%.]
Input Queueing
[Figure: an input-queued switch needs memory bandwidth of only 2R per port, but its scheduler can be quite complex.]
Input Queueing: Scheduling
[Figure: inputs 1..m hold virtual output queues Q(1,1)..Q(m,n) fed by arrivals A_1(t)..A_m(t); a matching M selects which queues send to outputs 1..n, producing departures D_1(t)..D_n(t).]
Input Queueing: Scheduling
[Figure: VOQ occupancies (e.g. 2, 5, 2, 4, 2, 7) form a weighted request graph between inputs 1-4 and outputs 1-4; a bipartite matching of weight 18 is shown.]
Question: maximum weight or maximum size?
Input Queueing: Scheduling
- Maximum size matching: maximizes instantaneous throughput. But does it maximize long-term throughput?
- Maximum weight matching: can clear the most backlogged queues. But does it sacrifice long-term throughput?
Input Queueing: Scheduling
[Figure: a 2x2 example of conflicting requests between inputs 1, 2 and outputs 1, 2.]
Longest Queue First or Oldest Cell First
[Figure: a 4x4 switch with VOQ weights such as 10, 1, 11, 1; a maximum weight matching is computed over these weights.]
Maximum weight matching with weight = {queue length, waiting time} achieves 100% throughput.
Input Queueing
Why is serving long/old queues better than serving the maximum number of queues?
- When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput.
- When traffic is non-uniform, some queues become longer than others.
- A good algorithm keeps the queue lengths matched, and services a large number of queues.
[Figure: average VOQ occupancy by queue: flat under uniform traffic, uneven under non-uniform traffic.]
Wave Front Arbiter
[Figure: a 4x4 matrix of requests between inputs 1-4 and outputs 1-4, and the match obtained by sweeping a wavefront diagonally across it.]
Wave Front Arbiter
[Figure: another request matrix and its resulting match.]
Wave Front Arbiter: Implementation
[Figure: a 4x4 array of combinational logic blocks, one per (input, output) pair, through which grant decisions ripple.]
Wave Front Arbiter: Wrapped WFA (WWFA)
[Figure: wrapping the diagonals lets the arbiter finish in N steps instead of 2N-1.]
Input Queueing: Practical Algorithms
- Maximal size algorithms: Wave Front Arbiter (WFA), Parallel Iterative Matching (PIM), iSLIP
- Maximal weight algorithms: Fair Access Round Robin (FARR), Longest Port First (LPF)
Parallel Iterative Matching
[Figure: one iteration on a 4x4 switch. (1) Requests: each input requests every output it has cells for. (2) Grant: each output grants to one requesting input, selected at random. (3) Accept/Match: each input accepts one of its grants, selected at random. Unmatched ports repeat the process in iteration #2.]
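The request/grant/accept round can be modeled in a few lines. A sketch with invented request sets; unmatched ports retry in later iterations:

```python
import random

def pim(requests, n_ports, iterations=4, rng=random):
    # requests[i] = set of outputs input i has cells for
    match = {}                                   # input -> output
    free_in = set(range(n_ports))
    free_out = set(range(n_ports))
    for _ in range(iterations):
        # Grant: each free output picks one requesting free input at random
        grants = {}
        for out in list(free_out):
            askers = [i for i in free_in if out in requests.get(i, ())]
            if askers:
                grants.setdefault(rng.choice(askers), []).append(out)
        if not grants:
            break
        # Accept: each input accepts one of its grants at random
        for inp, outs in grants.items():
            out = rng.choice(outs)
            match[inp] = out
            free_in.discard(inp)
            free_out.discard(out)
    return match

reqs = {0: {0, 1}, 1: {0}, 2: {2}}
m = pim(reqs, n_ports=4, rng=random.Random(1))
assert m[2] == 2                                 # uncontested pair matches
assert all(out in reqs[i] for i, out in m.items())
assert len(set(m.values())) == len(m)            # the match is one-to-one
```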
Parallel Iterative Matching: Maximal is not Maximum
[Figure: a request pattern for which the maximal match found by PIM contains fewer edges than the maximum match.]
Parallel Iterative Matching: Analytical Results

Number of iterations to converge:
  E[U_i] <= N^2 / 4^i
  E[C] <= log2(N) + 4/3

where:
C = number of iterations required to resolve connections
N = number of ports
U_i = number of unresolved connections after iteration i
Input Queueing: Practical Algorithms
- Maximal size algorithms: Wave Front Arbiter (WFA), Parallel Iterative Matching (PIM), iSLIP
- Maximal weight algorithms: Fair Access Round Robin (FARR), Longest Port First (LPF)
iSLIP
iSLIP: Properties
- Random under low load
- TDM under high load
- Lowest priority given to the most recently used (MRU)
- With 1 iteration: fair to outputs
- Converges in at most N iterations; on average, in fewer
iSLIP
iSLIP: Implementation
[Figure: N grant arbiters (one per output) and N accept arbiters (one per input), each built from a programmable priority encoder with log2 N bits of state; their decisions update the pointers for the next cell time.]
Input Queueing: References
- M. Karol et al., "Input vs output queueing on a space-division packet switch," IEEE Trans. Comm., Dec 1987, pp. 1347-1356.
- Y. Tamir et al., "Symmetric crossbar arbiters for VLSI communication switches," IEEE Trans. Parallel and Dist. Sys., Jan 1993, pp. 13-27.
- T. Anderson et al., "High-speed switch scheduling for local area networks," ACM Trans. Comp. Sys., Nov 1993, pp. 319-352.
- N. McKeown, "The iSLIP scheduling algorithm for input-queued switches," IEEE Trans. Networking, April 1999, pp. 188-201.
- C. Lund et al., "Fair prioritized scheduling in an input-buffered switch," Proc. IFIP-IEEE Conf., April 1996, pp. 358-369.
- A. Mekkittikul et al., "A practical scheduling algorithm to achieve 100% throughput in input-queued switches," IEEE Infocom 98, April 1998.
Other Non-Blocking Fabrics: Clos Network
[Figure: a three-stage Clos network.]
Other Non-Blocking Fabrics: Clos Network
Expansion factor required = 2 - 1/N (but still blocking for multicast).
Other Non-Blocking Fabrics: Self-Routing Networks
[Figure: an 8x8 self-routing network connecting inputs 000-111 to outputs 000-111; each stage routes a cell on one bit of its destination address.]
Speedup
Context
- Input-queued switches
- Output-queued switches
- The speedup problem
- Early approaches
- Algorithms
- Implementation considerations
Speedup: Context
[Figure: a generic switch with memory on both sides of the fabric; the placement of memory gives output-queued, input-queued, or combined input- and output-queued (CIOQ) switches.]
Output-queued switches
- Best delay and throughput performance; possible to erect bandwidth firewalls between sessions
- Main problem: requires high fabric speedup (S = N)
- Unsuitable for high-speed switching
Input-queued switches
- Big advantage: a speedup of one is sufficient
- Main problem: can't guarantee delay due to input contention
- Overcoming input contention: use higher speedup
A Comparison
Memory speeds for a 32x32 switch:

Line rate | OQ memory BW | OQ access time/cell | IQ memory BW | IQ access time/cell
100 Mb/s  | 3.3 Gb/s     | 128 ns              | 200 Mb/s     | 2.12 us
1 Gb/s    | 33 Gb/s      | 12.8 ns             | 2 Gb/s       | 212 ns
2.5 Gb/s  | 82.5 Gb/s    | 5.12 ns             | 5 Gb/s       | 84.8 ns
10 Gb/s   | 330 Gb/s     | 1.28 ns             | 20 Gb/s      | 21.2 ns
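The table follows directly from N = 32 ports and 53-byte ATM cells (424 bits): an output memory must absorb up to N writes plus 1 read per cell time (bandwidth (N+1)R), while an input queue needs only 2R. A quick check:

```python
# Reproduce the comparison table's numbers for a 32x32 switch.
CELL_BITS = 53 * 8          # 424-bit ATM cell
N = 32

def oq_bandwidth(R):        # output-queued: N writes + 1 read per cell time
    return (N + 1) * R

def iq_bandwidth(R):        # input-queued: 1 write + 1 read
    return 2 * R

def access_time(bw):        # seconds available per memory access
    return CELL_BITS / bw

R = 100e6                   # 100 Mb/s line rate
assert oq_bandwidth(R) == 3.3e9                                # 3.3 Gb/s
assert round(access_time(oq_bandwidth(R)) * 1e9, 1) == 128.5   # ~128 ns
assert iq_bandwidth(R) == 200e6
assert round(access_time(iq_bandwidth(R)) * 1e6, 2) == 2.12    # 2.12 us
```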
The Speedup Problem
Find a compromise: 1 < speedup << N.
The findings
Very tantalizing ...
- under different settings (traffic, loading, algorithm, etc)
- and even for varying switch sizes
A speedup of between 2 and 5 was sufficient!
Using Speedup
[Figure: with speedup, cells destined to outputs 1 and 2 cross the fabric faster than they arrive, so queueing shifts toward the outputs.]
Intuition
- Speedup = 1: fabric throughput = 0.58 (Bernoulli IID inputs)
- Speedup = 2: fabric throughput = 1.16 (Bernoulli IID inputs); input efficiency = 1/1.16; average input queue = 6.25
Intuition (continued)
- Speedup = 3: fabric throughput = 1.74 (Bernoulli IID inputs); input efficiency = 1/1.74; average input queue = 1.35
- Speedup = 4: fabric throughput = 2.32 (Bernoulli IID inputs); input efficiency = 1/2.32; average input queue = 0.75
Issues
- Need hard guarantees: exact, not average
- Robustness: realistic, even adversarial, traffic, not friendly Bernoulli IID
The Ideal Solution
[Figure: can a switch with small speedup behave exactly like one with speedup = N?]
Speedup
- Apply the same inputs, packet by packet, to an OQ switch and to a CIOQ switch
- Obtain the same outputs, packet by packet
Algorithm: MUCF
Key concept: urgency value
- urgency = departure time - present time
MUCF: The Algorithm
- Outputs try to get their most urgent packets
- Inputs grant to the output whose packet is most urgent, ties broken by port number
- Loser outputs go for their next most urgent packet
- The algorithm terminates when no more matchings are possible
Stable Marriage Problem
[Figure: men = outputs (Bill, John, Pedro); women = inputs (Hillary, Maria, Monica).]
An example
Observation: Only two reasons a packet doesn't get to its output
- Input contention, output contention
- This is why a speedup of 2 works!!
What does this get us?
A speedup of 4 is sufficient for exact emulation of FIFO OQ switches, with MUCF
What about non-FIFO OQ switches?
E.g. WFQ, strict priority
What gives?
Complexity of the algorithms:
- Extra hardware for processing
- Extra run time (time complexity)
What is the benefit?
- Reduced memory bandwidth requirements
Tradeoff: memory for processing
- Moore's Law supports this tradeoff
Implementation - a closer look
Main sources of difficulty:
- Estimating urgency, etc. - info is distributed (and must be communicated among inputs and outputs)
- Matching process - too many iterations?
Estimating urgency depends on what is being emulated
- Like taking a ticket to hold a place in a queue
- FIFO, strict priorities - no problem
- WFQ, etc. - problems
Implementation (cont'd)
Matching process:
- A variant of the stable marriage problem
- Worst-case number of iterations in switching = N
- With high probability, and on average, approximately log(N)
- Worst-case number of iterations for SMP = N^2
Other Work
Relax the stringent requirement of exact emulation
- Least Occupied O/p First Algorithm (LOOFA)
- Keeps outputs always busy if there are packets
- By time-stamping packets, it also exactly mimics an OQ switch
Disallow arbitrary inputs
- E.g. leaky bucket constrained
- Obtain worst-case delay bounds
References for speedup
- Y. Oie et al., Effect of speedup in nonblocking packet switch, ICC '89.
- A.L. Gupta, N.D. Georganas, Analysis of a packet switch with input and output buffers and speed constraints, Infocom '91.
- S-T. Chuang et al., Matching output queueing with a combined input and output queued switch, IEEE JSAC, vol 17, no 6, 1999.
- B. Prabhakar, N. McKeown, On the speedup required for combined input and output queued switching, Automatica, vol 35, 1999.
- P. Krishna et al., On the speedup required for work-conserving crossbar switches, IEEE JSAC, vol 17, no 6, 1999.
- A. Charny, Providing QoS guarantees in input buffered crossbar switches with speedup, PhD Thesis, MIT, 1998.
Switching Fabrics
Output and Input Queueing
Output Queueing
Input Queueing
Scheduling algorithms
Other non-blocking fabrics
Combining input and output queues
Multicast traffic
Multicast Switching
The problem
Switching with crossbar fabrics
Switching with other fabrics
Multicasting
Method 2
Use copying properties of the crossbar fabric
No fanout-splitting: easy, but low throughput
Fanout-splitting: higher throughput, but not as simple. Leaves residue.
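The difference between the two disciplines can be sketched as follows (an illustrative model only; `fanout` is the set of outputs a head-of-line multicast cell must still reach in this cell time):

```python
# One cell time of multicast service at an input, with or without
# fanout-splitting. Returns the outputs served and the residue left behind.
def serve(fanout, free_outputs, split=True):
    """fanout, free_outputs: sets of output ports. Returns (delivered, residue)."""
    reachable = fanout & free_outputs
    if not split:
        # No fanout-splitting: all-or-nothing delivery in one cell time.
        return (fanout, set()) if fanout <= free_outputs else (set(), fanout)
    # Fanout-splitting: deliver what we can; the residue stays at the HOL.
    return reachable, fanout - reachable
```

With splitting, partial progress is made whenever any destination is free, which is why throughput improves at the cost of tracking residue.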
The effect of fanout-splitting
Performance of an 8x8 switch with and without fanout-splitting under uniform IID traffic
Placement of residue
Key question: How should outputs grant requests? (and hence decide placement of residue)
Residue and throughput
Result: Concentrating residue brings more new work forward, and hence leads to higher throughput.
But there are fairness problems to deal with.
This and other problems can be looked at in a unified way by mapping the multicasting problem onto a variation of Tetris.
Multicasting and Tetris
(Figure: cells from input ports 1-5 stacked over output ports 1-5, with residue scattered across outputs.)
Multicasting and Tetris
(Figure: the same cells with the residue concentrated on fewer outputs.)
Replication by recycling
Main idea: Make two copies at a time using a binary tree, with the input at the root and all possible destination outputs at the leaves.
(Figure: binary replication tree.)
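The recycling schedule can be sketched as follows (illustrative only; a real switch recycles copies through the fabric rather than manipulating lists):

```python
# Sketch of binary-tree replication: each pass splits every pending cell's
# destination set into two halves, so a cell with fanout F finishes after
# ceil(log2(F)) recycling passes.
def recycle_passes(destinations):
    """Return, per recycling pass, the destination sets of the copies made."""
    passes, work = [], [list(destinations)]
    while work:
        nxt = []
        for dests in work:
            if len(dests) == 1:
                continue  # a leaf: this copy goes straight to its output
            mid = len(dests) // 2
            nxt += [dests[:mid], dests[mid:]]  # two copies per cell per pass
        if nxt:
            passes.append(nxt)
        work = nxt
    return passes
```

A fanout of 4 takes two passes; a unicast cell takes none, since it needs no copying.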
Tutorial Outline
Introduction: What is a Packet Switch?
Packet Lookup and Classification: Where does a packet go next?
Switching Fabrics: How does the packet get there?
Output Scheduling: When should the packet leave?
Output Scheduling
What is output scheduling?
How is it done?
Practical Considerations
Output Scheduling
Allocating output bandwidth
Controlling packet delay
Output Scheduling
FIFO
Fair Queueing
Motivation
FIFO is natural but gives poor QoS
bursty flows increase delays for others
hence cannot guarantee delays
Need round robin scheduling of packets
Fair Queueing
Weighted Fair Queueing, Generalized Processor Sharing
Fair queueing: Main issues
Level of granularity:
- packet-by-packet? (favors long packets)
- bit-by-bit? (ideal, but very complicated)
Packet Generalized Processor Sharing (PGPS):
- serves packet-by-packet
- imitates the bit-by-bit schedule within a tolerance
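A simplified packet-by-packet sketch of the idea: each flow's packet gets a finish tag, and packets depart in increasing tag order. Note the virtual clock here is approximated by the last departure's tag, not the exact GPS virtual time, so this is an illustration rather than true PGPS:

```python
import heapq

# Finish tag: F = max(V, last_finish[flow]) + length / weight[flow].
# Serving in increasing-F order approximates the bit-by-bit (GPS) schedule.
class WFQ:
    def __init__(self, weights):
        self.w = weights
        self.last_finish = {f: 0.0 for f in weights}
        self.v = 0.0          # crude virtual-time approximation
        self.heap = []

    def enqueue(self, flow, length):
        tag = max(self.v, self.last_finish[flow]) + length / self.w[flow]
        self.last_finish[flow] = tag
        heapq.heappush(self.heap, (tag, flow, length))

    def dequeue(self):
        tag, flow, length = heapq.heappop(self.heap)
        self.v = tag          # advance virtual time to the departing tag
        return flow
```

With weights 5, 2, 1 and equal-length packets, the heavier flow departs first, matching the weighted-share intuition.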
How does WFQ work?
(Figure: three flows sharing a link with weights WR = 1, WG = 5, WP = 2.)
Delay guarantees
Theorem
If flows are leaky bucket constrained and all nodes employ GPS (WFQ), then the network can guarantee worst-case delay bounds to sessions.
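The leaky-bucket constraint in the theorem says a flow may send at most sigma + rho*t bits in any interval of length t. A sketch of a conformance check (the (time, bits) arrival format and parameter names are illustrative):

```python
# Check that a finished arrival trace is (sigma, rho) leaky-bucket constrained:
# over every interval starting at an arrival, cumulative bits must not exceed
# sigma + rho * (interval length). O(n^2), fine for a sketch.
def conforms(arrivals, sigma, rho):
    """arrivals: list of (time, bits) in nondecreasing time order."""
    for i, (t0, _) in enumerate(arrivals):
        total = 0
        for t1, bits in arrivals[i:]:
            total += bits
            if total > sigma + rho * (t1 - t0):
                return False
    return True
```

An online shaper would instead maintain a token bucket of depth sigma refilled at rate rho, but the constraint being enforced is the same.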
Practical considerations
For every packet, the scheduler needs to:
- classify it into the right flow queue, and maintain a linked-list for each flow
- schedule it for departure
Complexities of both are O(log [# of flows])
- the first is hard to overcome
- the second can be overcome by DRR
Deficit Round Robin
(Figure: DRR example with quantum size 500; each flow's deficit counter accumulates one quantum per round and is spent as head-of-line packets are sent.)
Good approximation of FQ
Much simpler to implement
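The DRR idea can be sketched as follows (quantum, packet sizes, and the fixed round count are illustrative; a real scheduler runs continuously over an active list):

```python
from collections import deque

# Deficit Round Robin: each backlogged flow earns one quantum of credit per
# round and sends head-of-line packets while its deficit covers them.
# O(1) work per packet when quantum >= maximum packet size.
def drr(queues, quantum, rounds):
    """queues: list of deques of packet sizes. Returns [(flow, size), ...]."""
    deficit = [0] * len(queues)
    sent = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0   # idle flows carry no credit forward
                continue
            deficit[i] += quantum
            while q and q[0] <= deficit[i]:
                pkt = q.popleft()
                deficit[i] -= pkt
                sent.append((i, pkt))
    return sent
```

Resetting the deficit of an empty queue is what keeps idle flows from hoarding bandwidth for later.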
But...
WFQ is still very hard to implement:
- classification is a problem
- needs to maintain too much state information
- doesn't scale well
Strict Priorities and Diff Serv
Classify flows into priority classes
maintain only per-class queues
perform FIFO within each class
avoid curse of dimensionality
Diff Serv
A framework for providing differentiated QoS
set Type of Service (ToS) bits in packet headers
this classifies packets into classes
routers maintain per-class queues
condition traffic at network edges to conform to class requirements
May still need queue management inside the network
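The per-class queues with strict priorities described above can be sketched as follows (the class count is illustrative; class 0 is taken as highest priority):

```python
from collections import deque

# Strict-priority service over per-class FIFO queues, as in the Diff Serv
# model: the highest-priority non-empty class always wins.
class PriorityScheduler:
    def __init__(self, num_classes):
        self.queues = [deque() for _ in range(num_classes)]

    def enqueue(self, cls, pkt):
        self.queues[cls].append(pkt)

    def dequeue(self):
        for q in self.queues:        # scan from highest priority down
            if q:
                return q.popleft()
        return None                  # nothing queued
```

Per-class state is constant in the number of flows, which is exactly how this avoids the curse of dimensionality.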
References for O/p Scheduling
- A. Demers et al., Analysis and simulation of a fair queueing algorithm, ACM SIGCOMM 1989.
- A. Parekh, R. Gallager, A generalized processor sharing approach to flow control in integrated services networks: the single node case, IEEE Trans. on Networking, June 1993.
- A. Parekh, R. Gallager, A generalized processor sharing approach to flow control in integrated services networks: the multiple node case, IEEE Trans. on Networking, August 1993.
- M. Shreedhar, G. Varghese, Efficient Fair Queueing using Deficit Round Robin, ACM SIGCOMM, 1995.
- K. Nichols, S. Blake (eds), Differentiated Services: Operational Model and Definitions, Internet Draft, 1998.
Active Queue Management
Problems with traditional queue management
tail drop
Active Queue Management
goals
an example
effectiveness
Tail Drop Queue Management
(Figure: lock-out - some flows monopolize the queue once it reaches max queue length.)
Tail Drop Queue Management
Drop packets only when queue is full
long steady-state delay
global synchronization
bias against bursty traffic
Global Synchronization
Bias Against Bursty Traffic
Alternative Queue Management Schemes
Active Queue Management: Goals
Solve lock-out and full-queue problems
no lock-out behavior
no global synchronization
no bias against bursty flow
Provide better QoS at a router
low steady-state delay
lower packet dropping
Active Queue Management
Problems with traditional queue management
tail drop
Active Queue Management
goals
an example
effectiveness
Random Early Detection (RED)
if qavg < minth: admit every packet
else if minth <= qavg < maxth: drop the incoming packet with a probability that rises linearly from 0 (at minth) to maxp (at maxth)
else if qavg >= maxth: drop every incoming packet
(Figure: drop probability vs. qavg, ramping between minth and maxth.)
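The RED dropping rule can be sketched as follows (a simplification: full RED also spaces drops using a count since the last drop, and updates qavg with an EWMA of the instantaneous queue length):

```python
import random

# RED drop decision. q_avg is the averaged queue length; min_th, max_th and
# max_p are the usual RED parameters. The `rand` argument exists only to make
# the sketch deterministic under test.
def red_drop(q_avg, min_th, max_th, max_p, rand=random.random):
    if q_avg < min_th:
        return False                  # admit every packet
    if q_avg >= max_th:
        return True                   # drop every packet
    p = max_p * (q_avg - min_th) / (max_th - min_th)
    return rand() < p                 # linear ramp from 0 up to max_p
```

Because the decision uses the averaged queue length, short bursts pass through while sustained congestion triggers early, randomized drops.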
Effectiveness of RED: Lock-Out
Packets are randomly dropped
Each flow has the same probability of being discarded
Effectiveness of RED: Full-Queue
Drop packets probabilistically in anticipation of congestion (not when queue is full)
Use qavg to decide packet dropping probability: allow instantaneous bursts
Randomness avoids global synchronization
What QoS does RED Provide?
Lower buffer delay: good interactive service
qavg is controlled to be small
Given responsive flows: packet dropping is reduced
early congestion indication allows traffic to throttle back before congestion
Given responsive flows: fair bandwidth allocation
Unresponsive or aggressive flows
Don't properly back off during congestion
Take away bandwidth from TCP-compatible flows
Monopolize buffer space
Control Unresponsive Flows
Some active queue management schemes
RED with penalty box
Flow RED (FRED)
Stabilized RED (SRED)
identify and penalize unresponsive flows with a bit of extra work
Active Queue Management: References
- B. Braden et al., Recommendations on queue management and congestion avoidance in the internet, RFC 2309, 1998.
- S. Floyd, V. Jacobson, Random early detection gateways for congestion avoidance, IEEE/ACM Trans. on Networking, 1(4), Aug. 1993.
- D. Lin, R. Morris, Dynamics of random early detection, ACM SIGCOMM, 1997.
- T. Ott et al., SRED: Stabilized RED, INFOCOM 1999.
- S. Floyd, K. Fall, Router mechanisms to support end-to-end congestion control, LBL technical report, 1997.
Tutorial Outline
Introduction: What is a Packet Switch?
Packet Lookup and Classification: Where does a packet go next?
Switching Fabrics: How does the packet get there?
Output Scheduling: When should the packet leave?
Basic Architectural Components
(Figure: control plane - routing, reservation, admission control, congestion control; datapath, per-packet processing - policing, switching, output scheduling.)
Basic Architectural Components
Datapath: per-packet processing
(Figure: 1. Forwarding Decision, via Forwarding Table; 2. Interconnect; 3. Output Scheduling.)