Mastering Data Center QoS
BRKRST-2509
V5.2
Slides: https://db.tt/xN7Lw9AJ
Lucien Avramov @flying91
Distinguished Speaker and Technical Marketing Engineer – INSBU
Nexus 9000 and ACI
© 2014 Cisco and/or its affiliates. All rights reserved. BRKRST-2509 Cisco Public
Session Objective: WHY, WHEN and HOW of QoS
At the end of the session, the participants should:
• Understand Data Center QoS Requirements and Capabilities
• Understand QoS implementation on Nexus platforms
• Understand how to configure QoS on Nexus
Mastering Data Center QoS – BRKRST-2509
• Data Center QoS Requirements
• Nexus QoS Capabilities
• Nexus QoS Configuration
–Nexus Configuration Model: MQC
–Platform Configuration Examples
Evolution of QoS Design – Where do we Start?
• Quality of Service is not just about voice and video anymore
• Campus Specialization
  – Desktop-based Unified Communications
  – Blended Wired & Wireless Access
• Data Center Specialization
  – Compute and Storage Virtualization
  – Cloud Computing
• Protocol convergence onto the fabric
  – Storage – FCoE, iSCSI, NFS
  – Inter-Process and compute communication (RoCE, vMotion, …)
• Switching Evolution and Specialization
Data Center QoS Design Requirements – Where are we starting from?
• VoIP and Video are now mainstream technologies
• Ongoing evolution to the full spectrum of Unified Communications
• High Definition Executive Communication Application requires stringent Service-Level Agreement (SLA)
– Reliable Service—High Availability Infrastructure
– Application Service Management—QoS
Media-based Application Requirements – Voice vs. Video at the Packet Level

[Figure: voice packets are small (roughly 200-byte audio samples) and evenly spaced every 20 msec; video packets range up to ~1400 bytes and arrive in bursts, one burst per video frame, every 33 msec.]
Enterprise / Campus QoS Design Requirements – QoS for Voice / Video is implicit (Medianet, RFC 4594)

Application Class | Per-Hop Behavior | Admission Control | Queuing & Dropping | Application Examples
VoIP Telephony | EF | Required | Priority Queue (PQ) | Cisco IP Phones (G.711, G.729)
Broadcast Video | CS5 | Required | (Optional) PQ | Cisco IP Video Surveillance / Cisco Enterprise TV
Realtime Interactive | CS4 | Required | (Optional) PQ | Cisco TelePresence
Multimedia Conferencing | AF4 | Required | BW Queue + DSCP WRED | Cisco Unified Personal Communicator, WebEx
Multimedia Streaming | AF3 | Recommended | BW Queue + DSCP WRED | Cisco Digital Media System (VoDs)
Network Control | CS6 | – | BW Queue | EIGRP, OSPF, BGP, HSRP, IKE
Call-Signaling | CS3 | – | BW Queue | SCCP, SIP, H.323
Ops / Admin / Mgmt (OAM) | CS2 | – | BW Queue | SNMP, SSH, Syslog
Transactional Data | AF2 | – | BW Queue + DSCP WRED | ERP Apps, CRM Apps, Database Apps
Bulk Data | AF1 | – | BW Queue + DSCP WRED | E-mail, FTP, Backup Apps, Content Distribution
Best Effort | DF | – | Default Queue + RED | Default Class
Scavenger | CS1 | – | Min BW Queue (Deferential) | YouTube, iTunes, BitTorrent, Xbox Live
Data Center QoS Requirements – Goodput

• A balanced fabric is a function of maximal throughput ‘and’ minimal loss => “Goodput”
• Application-level throughput (goodput) is the total bytes received from all senders divided by the finishing time of the last sender.
  – “Understanding TCP Incast Throughput Collapse in Datacenter Networks”
[Figure: 5-millisecond view of traffic exceeding the congestion threshold.]
• Data Center design goal: optimize the balance of end-to-end fabric latency with the ability to absorb traffic peaks and prevent any associated traffic loss.
http://www.ietf.org/id/draft-dcbench-def-00.txt
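The goodput definition above can be written compactly (notation is mine, not from the session):

```latex
% Goodput G over N senders, where B_i is the total bytes received
% from sender i and t_finish is the finishing time of the last sender:
G \;=\; \frac{\sum_{i=1}^{N} B_i}{t_{\mathrm{finish}}}
```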
Diversity of Data Center Application Flows
• Small Flows/Messaging
(Heart-beats, Keep-alive, delay sensitive application messaging)
• Small – Medium Incast
(Hadoop Shuffle, Scatter-Gather, Distributed Storage)
• Large Flows
(HDFS Insert, File Copy)
• Large Incast
(Hadoop Replication, Distributed Storage)
Data Center QoS Design Requirements – What else do we need to consider?

• The Data Center adds a number of new traffic types and requirements
  – No Drop, IPC, Storage, vMotion, …
• New protocols and mechanisms
  – 802.1Qbb, 802.1Qaz, ECN, …

Spectrum of Design Evolution:
• Ultra Low Latency – queueing is designed out of the network whenever possible; nanoseconds matter
• Warehouse Scale – ECN & Data Center TCP; Hadoop and incast loads on the server ports
• Virtualized Data Center – vMotion, iSCSI, FCoE, NAS, CIFS; multi-tenant applications; voice & video
• HPC/GRID – low latency; bursty traffic (workload migration); IPC; iWARP & RoCE
Trust Boundaries – What have we trusted?

[Figure: trust boundaries placed at the access-edge switches]
• Conditionally Trusted Endpoints – example: IP Phone + PC
• Secure Endpoint – example: software-protected PC with centrally administered QoS markings
• Unsecure Endpoint
The Evolving Data Centre Architecture – Re-Visit: Where Is the Edge?

[Figure: four stages in the evolution of the network/fabric edge]
• Physical server – NIC and HBA on the PCI-E bus present Eth 2/12 and FC 3/11; the edge of the network and fabric sits at the adapters.
• Virtualized server – the hypervisor provides virtualization of PCI-E resources (vNIC/vEth, VMFS/SCSI); still 2 PCI addresses on the bus; the edge of the fabric moves into the host.
• Converged Network Adapter – provides virtualization of the physical media, carrying Ethernet and Fibre Channel (Eth 2/12, vFC 3) over a single 10GbE link.
• SR-IOV adapter – provides multiple PCIe resources (Eth 1, FC 2, Eth 3, FC 4, … up to 126 functions, with veth/vFC pass-thru) over 10GE with VNTag.

Compute and Fabric Edge are Merging
What do we trust and where do we classify and mark?

• Data Centre architecture can provide a new set of trust boundaries
• The virtual switch extends the trust boundary into the memory space of the hypervisor
• Converged and virtualized adapters provide for local classification, marking and queuing

[Figure: vPC fabric with per-tier QoS roles]
• N1KV – Classification, Marking & Queuing (trust boundary)
• CNA/A-FEX – Classification and Marking
• N2K – CoS Marking; CoS-based queuing in the extended fabric
• N5K – CoS/DSCP Marking, Queuing and Classification
• N7K/N6K – CoS/DSCP Marking, Queuing and Classification; CoS/DSCP-based queuing in the extended fabric
Data Center QoS Model: 4 and 8 Class Model

• 4-Class Model: Realtime; Signaling / Control; Critical Data; Best Effort
• 8-Class Model: Voice; Interactive Video; Streaming Video; Call Signaling; Network Control; Critical Data; Scavenger; Best Effort
• 12-Class Model: Voice; Broadcast Video; Realtime Interactive; Multimedia Conferencing; Multimedia Streaming; Call Signaling; Network Control; Network Management; Transactional Data; Bulk Data; Scavenger; Best Effort
http://www.cisco.com/en/US/docs/solutions/Enterprise/WAN_and_MAN/QoS_SRND_40/QoSIntro_40.html#wp61135
Evolving Data Centre Architecture – Where and How is Storage Connected?

[Figure: five computer-system storage stacks – the flexibility of a Unified Fabric transport, ‘Any RU to Any Spindle’]
• iSCSI appliance – application / file system / volume manager / SCSI device driver / iSCSI driver / TCP/IP stack / NIC; block I/O over IP to the appliance’s iSCSI layer.
• iSCSI gateway – the same host stack; block I/O over IP to a gateway that bridges to FC (FC HBA).
• NAS appliance – application / I/O redirector (NFS/CIFS) / TCP/IP stack / NIC; file I/O over IP to the filer’s file system and block device driver.
• NAS gateway – the same host stack; file I/O over IP to a gateway that bridges to FC.
• FCoE SAN – application / file system / SCSI device driver / FCoE driver / CNA; block I/O directly onto the unified fabric toward the SAN.
NX-OS QoS Requirements: CoS or DSCP?

• We have non-IP-based traffic to consider again
  – FCoE – Fibre Channel over Ethernet
  – RoCE – RDMA over Converged Ethernet
• DSCP is still marked, but CoS will be required in Nexus Data Center designs

PCP/CoS | Network priority | Acronym | Traffic characteristics
1 | 0 (lowest) | BK | Background
0 | 1 | BE | Best Effort
2 | 2 | EE | Excellent Effort
3 | 3 | CA | Critical Applications
4 | 4 | VI | Video, < 100 ms latency
5 | 5 | VO | Voice, < 10 ms latency
6 | 6 | IC | Internetwork Control
7 | 7 (highest) | NC | Network Control
IEEE 802.1Q-2005
NX-OS QoS Requirements – Where do we put the new traffic types?

• In this example of a Virtualized Multi-Tenant Data Center there is a potential overlap/conflict with voice/video queuing assignments, e.g.
  – CoS 3 – FCoE ‘and’ Call Control

Traffic Type | CoS | Class, Property, BW Allocation
Infrastructure – Control | 6 | Platinum, 10%
Infrastructure – vMotion | 4 | Silver, 20%
Tenant – Gold, Transactional | 5 | Gold, 30%
Tenant – Silver, Transactional | 2 | Bronze, 15%
Tenant – Bronze, Transactional | 1 | Best Effort, 10%
Storage – FCoE | 3 | No Drop, 15%
Storage – NFS datastore | 5 | Silver
Non-Classified Data | 1 | Best Effort
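A classification policy for a table like this can be sketched in MQC; the class names and qos-group numbers below are illustrative assumptions (mine, not from the session), and the number of usable qos-groups varies by platform:

```
! Sketch only - class names and qos-group numbers are illustrative
class-map type qos match-all VMOTION
  match cos 4
class-map type qos match-all FCOE-COS3
  match cos 3
policy-map type qos TENANT-CLASSIFY
  class VMOTION
    set qos-group 2
  class FCOE-COS3
    set qos-group 1
```

Note that on Nexus 5500, FCoE itself would normally be handled by the predefined class-fcoe service policies rather than a hand-built class.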
Mastering Data Center QoS – BRKRST-2509
• Data Center QoS Requirements
• Nexus QoS Capabilities
• Nexus QoS Configuration
Priority Flow Control – FCoE Flow Control – 802.1Qbb
• Enables lossless Ethernet using PAUSE based on a CoS value as defined in 802.1p
• When the link is congested, CoS values assigned to “no-drop” will be PAUSED
• Traffic assigned to other CoS values continues to transmit, relying on upper-layer protocols for retransmission
• Not only for FCoE traffic
[Figure: eight virtual lanes between transmit queues and receive buffers on an Ethernet link; a PAUSE (STOP) is issued on one congested lane while the other lanes continue – analogous to Fibre Channel buffer-to-buffer credits (R_RDY).]
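The no-drop behavior described above is configured through a network-qos class. This is a minimal sketch in the Nexus 5500/6000 style used later in this session; the class, policy, and qos-group names are illustrative assumptions, not from the session:

```
! Sketch only - names and qos-group number are illustrative
class-map type qos match-all NODROP-COS4
  match cos 4
policy-map type qos CLASSIFY-IN
  class NODROP-COS4
    set qos-group 2
class-map type network-qos NODROP-COS4-NQ
  match qos-group 2
policy-map type network-qos NODROP-NQ
  class type network-qos NODROP-COS4-NQ
    pause no-drop
system qos
  service-policy type qos input CLASSIFY-IN
  service-policy type network-qos NODROP-NQ
```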
Enhanced Transmission Selection (ETS) – Bandwidth Management – 802.1Qaz

• Prevents a single traffic class from “hogging” all the bandwidth and starving other classes
• When a given load doesn’t fully utilize its allocated bandwidth, that bandwidth is available to other classes
• Helps accommodate classes of a “bursty” nature
[Figure: offered traffic vs. realized utilization on a 10GE link over intervals t1–t3 – the HPC and storage classes keep their ~3G/s guarantees, while LAN traffic borrows bandwidth a class leaves unused, reaching up to 6G/s when HPC offers only 2G/s.]
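ETS-style guarantees map to a type queuing policy with DWRR weights. A minimal sketch, with illustrative class names, qos-group numbers, and percentages (assumptions, not from the session):

```
! Sketch only - names, groups, and percentages are illustrative
class-map type queuing STORAGE-Q
  match qos-group 1
class-map type queuing HPC-Q
  match qos-group 2
policy-map type queuing ETS-OUT
  class type queuing STORAGE-Q
    bandwidth percent 30
  class type queuing HPC-Q
    bandwidth percent 30
  class type queuing class-default
    bandwidth percent 40
system qos
  service-policy type queuing output ETS-OUT
```

Each class is guaranteed its percentage under congestion but may exceed it when other classes are idle, which is the borrowing behavior shown in the figure.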
Unified Fabric

• Storage I/O – flexibility and serialized re-use
[Figure: server life cycle – boot (3 Gb), production (2 Gb), and vMotion (4 Gb) phases carried over separate front, back, and SAN networks on dedicated links (8 Gb + 2 Gb + 2 Gb + 2 Gb = 14 Gb of provisioned I/O).]
Unified Fabric

• Storage I/O – flexibility and serialized re-use
[Figure: with consolidated I/O, the same boot (10 Gb), production (20 Gb), and vMotion (10 Gb) phases of the server life cycle run over just 2 cables (2 x 20 Gb).]
Data Center Bridging Capability Exchange Protocol (DCBX) – 802.1Qaz

• Negotiates Ethernet capabilities – PFC, ETS, CoS values – between DCB-capable peer devices
• Simplifies management: allows for configuration and distribution of parameters from one node to another
• Responsible for logical link up/down signaling of Ethernet and Fibre Channel
• DCBX is LLDP with new TLV fields
• The original pre-standard CIN (Cisco, Intel, Nuova) DCBX utilized additional TLVs
• DCBX negotiation failures result in:
  – per-priority-pause not enabled on CoS values
  – the vfc not coming up, when DCBX is being used in an FCoE environment

[Diagram: DCBX exchange between a switch and a CNA adapter]
dc11-5020-3# sh lldp dcbx interface eth 1/40
Local DCBXP Control information:
Operation version: 00 Max version: 00 Seq no: 7 Ack no: 0
Type/
Subtype Version En/Will/Adv Config
006/000 000 Y/N/Y 00
<snip>
https://www.cisco.com/en/US/netsol/ns783/index.html
Explicit Congestion Notification (ECN) – TCP

• ECN is an extension to TCP that provides end-to-end congestion notification without dropping packets. Both the network infrastructure and the end hosts must support ECN for it to function properly.
• ECN uses the two least-significant bits of the DiffServ field in the IP header to encode four values:
  – 00 – Non ECN-Capable Transport
  – 10 – ECN-Capable Transport, ECT(0)
  – 01 – ECN-Capable Transport, ECT(1)
  – 11 – Congestion Experienced (CE)
• During periods of congestion a router sets the ECN field to CE (11); the receiving host then notifies the source host to reduce its transmission rate.

ECN Configuration
N3K-1(config)# policy-map type network-qos traffic-priorities
N3K-1(config-pmap-nq)# class type network-qos class-gold
N3K-1(config-pmap-nq-c)# congestion-control random-detect ecn

The configuration for enabling ECN is very similar to the previous WRED example, so only the policy-map configuration with the ecn option is shown for simplicity.

Notes: WRED and ECN are always applied in the system policy. When configuring ECN, ensure there are no queuing policy-maps applied to the interfaces; only configure the queuing policy under the system policy.
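Per the note above, the policy is attached at the system level rather than per interface; a minimal sketch of that attachment (using the policy-map name from the example above):

```
! Sketch only - attach the ECN-enabled network-qos policy system-wide
system qos
  service-policy type network-qos traffic-priorities
```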
ECN in Action! Incast Results

[Figure: goodput (0–12,000 Mbps) vs. server incast count (1–23), comparing plain TCP with ECN-enabled TCP.]
ACI Fabric – Next Generation DC QoS

• Topology and traffic pattern changes require us to re-evaluate the assumptions of congestion management within the Data Center
• A higher density of uplinks with a greater multi-pathing ratio results in more variability in congestion patterns
• Distribution of workload adds another dimension of traffic patterns
• Two options:
  – Spend the time to statically engineer marking, queuing and traffic patterns to accommodate these new patterns
  – Build a more systems-based, reactive approach to congestion management for traffic within the Data Center

[Figure: APIC-managed fabric – 40G links with 10G sinks and sources, higher density of multi-pathing, increasing distribution of workload.]
ACI Fabric Load Balancing – Flowlet Switching

• State-of-the-art ECMP hashes flows (5-tuples) to a path to prevent reordering TCP packets.
• Flowlet switching* routes bursts of packets from the same flow independently.
• No packet re-ordering, provided the gap between bursts satisfies Gap ≥ |d1 – d2|, the difference between the path delays d1 and d2.

[Figure: a TCP flow between H1 and H2 split across two paths with delays d1 and d2.]
*Flowlet Switching (Kandula et al. ’04)
ACI Fabric Load Balancing – Congestion Monitoring

1. Every packet is sourced from an iLeaf with an identifier for the ingress port to the fabric (LBTag) and a DRE = 0 congestion metric.
2. At every ASIC hop along the path the DRE field is updated with the congestion metric (if my DRE > current DRE then update the DRE field, else keep the current DRE).
3. On receipt of each packet, the egress leaf updates the DRE weight in a feedback table tracking the LBTag (source identifier).
4. The next packet sent back to the originating iLeaf carries with it feedback information on the max load recorded on the ingress path.
5. On receipt of the metric feedback, the original leaf updates the flowlet load-balancing weights (it will also receive indication of the load from the other direction).
ACI Fabric Load Balancing – Dynamic Flow Prioritization

• Real traffic is a mix of large (elephant) and small (mice) flows.
• Standard (single priority): large flows severely impact performance (latency & loss) for small flows.
• Dynamic Flow Prioritization: the fabric automatically gives a higher priority to small flows.
• Key idea: the fabric detects the initial few flowlets of each flow and assigns them to a high-priority class.
Application Performance Improvements – ACI Fabric Load Balancing

[Figure: normalized flow completion time (FCT) and slowdown (x times slower) vs. traffic load (30–80%), comparing standard ECMP with no priority, ECMP ‘with’ priority, and dynamic load balancing with priority.]
• IFLB results in ~20–35% better throughput efficiency in steady state
• During link loss, application flow completion time is significantly reduced
• As traffic volumes increase, application impact is significantly reduced
Hadoop – Symmetric Topology (no link failure)

[Figure: job completion time (0–500 sec) over 40 trials, comparing ECMP with DLB/Flowlet.]
Hadoop – Asymmetric Topology (link failure)

[Figure: job completion time (0–500 sec) over 40 trials – DLB/Flowlet shows a ~2x improvement over ECMP when a link has failed.]
Application Performance Improvements – MemCacheD Throughput Improvement

[Figure: memcached throughput (MB/s) over time – throughput collapses when background iperf traffic starts, then recovers (~10x improvement) once Dynamic Packet Prioritization is enabled.]
When are buffers needed?

• Speed mismatch
• Incast / many-to-one conversations
• Storage

[Figure: 10GE aggregation feeding 1GE access ports – a speed-mismatch point.]
Buffer Amount – 1GE vs. 10GE Buffer Usage

[Figure: buffer cell usage and map/reduce job completion percentage over time for the same Hadoop job on 1GE- and 10GE-attached servers – the 1GE buffer usage is substantially higher than the 10GE buffer usage.]
Going to 10GE lowers the buffer utilization on the switching layer.
Buffer Amount – the Buffer Bloat

[Figure: ASIC floorplan – the buffer competes for die area with the tables (L2, L3, MC, etc.) and the remaining logic (forwarding, etc.).]
Buffer Amount – The Switch Architecture

[Figure: three switch buffering architectures]
• Ingress and egress per-port buffers around a crossbar with scheduler – e.g., Cat6k, N7K M-series
• Ingress per-port (VOQ) buffering with a crossbar scheduler – e.g., N5K, N6K, N7K F-series
• Shared-memory buffer with a scheduler – e.g., N3K; the N9K multi-SoC design combines several such ASICs
Nexus 7000 F2/F2e QoS – Ingress Queuing

• The ingress-buffered architecture implements a large, distributed buffer pool to absorb congestion.
• It absorbs congestion at every ingress port contributing to the congestion, leveraging all per-port ingress (VOQ) buffer.
• Versus egress-buffered architectures (e.g., Catalyst 6500), which absorb congestion at each egress port and therefore require large per-port egress buffers.
• Excess traffic does not consume fabric bandwidth only to be dropped at the egress port.

[Figure: with a 2:1 ingress:egress ratio, two ports’ ingress VOQ buffers are available for congestion management; with 8:1, eight ports’ buffers are available.]
Nexus 6000/5500/5000 – Ingress Queuing

• In typical Data Center access designs, multiple ingress access ports transmit to a few uplink ports.
• Nexus 5000 and 5500 utilize an ingress queuing architecture: packets are stored in ingress buffers until the egress port is free to transmit.
• Ingress queuing provides an additive effect: the total queue size available is equal to [number of ingress ports x queue depth per port].
• Statistically, ingress queuing provides the same advantages as shared-buffer memory architectures.

[Figure: when egress queue 0 is full and the link congested, traffic is queued on all ingress interface buffers, providing a cumulative scaling of buffers for congested ports.]
Nexus 5500 QoS Defaults
• QoS is enabled by default (not possible to turn it off)
• Three default class of services defined when system boots up
– Two for control traffic (CoS 6 & 7)
– Default Ethernet class (class-default – all others)
• Cisco Nexus 5500 switch supports five user-defined classes and the one default drop system class
• FCoE queues are ‘not’ pre-allocated
• When configuring FCoE the predefined service policies must be added to existing QoS configurations
# Predefined FCoE service policies
service-policy type qos input fcoe-default-in-policy
service-policy type queuing input fcoe-default-in-policy
service-policy type queuing output fcoe-default-out-policy
service-policy type network-qos fcoe-default-nq-policy
Nexus 6000 – Increased Packet Buffer

[Figure: ingress UPC (16MB buffer, unicast and multicast VOQs) and egress UPC (9MB buffer) around the unified crossbar fabric, with 448 Gbps / 224 Gbps paths.]
• A 25-MB packet buffer is shared by every three 40GE ports or twelve 10GE ports.
• The buffer is 16 MB at ingress and 9 MB at egress.
• Unicast packets can be buffered at both ingress and egress.
• Multicast is buffered at egress.
Nexus 6000/5000/5500 QoS – PFC

• Actions when congestion occurs depend on the policy configuration:
  – PAUSE the upstream transmitter for lossless traffic
  – Tail drop for regular traffic when the buffer is exhausted
• Priority Flow Control (PFC) or 802.3x PAUSE can be deployed to ensure lossless delivery for applications that can’t tolerate packet loss.
• The buffer management module monitors buffer usage for no-drop classes of service; it signals the MAC to generate PFC (or link-level PAUSE) when buffer usage crosses the threshold.
• FCoE traffic is assigned to class-fcoe, which is a no-drop system class.
• Other classes of service by default have normal drop behavior (tail drop) but can be configured as no-drop.

[Figure sequence: 1. congestion or flow control on the egress port; 2. the egress UPC stops allowing fabric grants; 3. traffic is queued on ingress; 4. if the queue is marked no-drop or flow control, a PAUSE is sent.]
Nexus QoS Key Concepts – Ingress Model (Nexus 6000/5000/5500/7000 F-series) [For Your Reference]

• Ingress buffering and queuing (as defined by the ingress queuing policy) occurs at the VOQ of each ingress port.
• Ingress VOQ buffers are the primary congestion-management point for arbitrated traffic.
• Egress scheduling (as defined by the egress queuing policy) is enforced by the egress port.
• Egress scheduling dictates the manner in which egress port bandwidth is made available at ingress.
• Per-port, per-priority grants from the arbiter control which ingress frames reach the egress port.
Nexus 3000 / 3500 – Shared Memory Architecture

[Figure: a buffer/queuing block with 8 unicast and 4 multicast queues per egress port (up to 64 egress ports), scheduled by multi-level (per-port and per-group) deficit round robin.]
• A pool of 18MB/9MB of buffer space is divided up among egress per-port reserved buffer and dynamically shared buffer (9MB total shared).
Nexus 2248TP-E – 32MB Shared Buffer
• Speed mismatch between 10G NAS and 1G server requires QoS tuning
• Nexus 2248TP-E utilizes a 32MB shared buffer to handle larger traffic bursts
• Hadoop, NAS, AVID are examples of bursty applications
• You can control the queue limit for a specified Fabric Extender for egress direction (from the network to the host)
• You can use a lower queue-limit value on the Fabric Extender to prevent one blocked receiver from affecting traffic that is sent to other non-congested receivers (“head-of-line blocking”)
N5548-L3(config-fex)# hardware N2248TPE queue-limit 4000000 rx
N5548-L3(config-fex)# hardware N2248TPE queue-limit 4000000 tx
N5548-L3(config)#interface e110/1/1
N5548-L3(config-if)# hardware N2348TP queue-limit 4096000 tx
[Figure: a 10G-attached source (NAS array) sending NFS/iSCSI traffic through the fabric to a 1G-attached server hosting VMs – tune the 2248TP-E to support an extremely large burst (Hadoop, AVID, …).]
Nexus 2248TP-E Counters
N5596-L3-2(config-if)# sh queuing interface e110/1/1
Ethernet110/1/1 queuing information:
Input buffer allocation:
Qos-group: 0
frh: 2
drop-type: drop
cos: 0 1 2 3 4 5 6
xon xoff buffer-size
---------+---------+-----------
0 0 65536
Queueing:
queue qos-group cos priority bandwidth mtu
--------+------------+--------------+---------+---------+----
2 0 0 1 2 3 4 5 6 WRR 100 9728
Queue limit: 2097152 bytes
Queue Statistics:
---+----------------+-----------+------------+----------+------------+-----
Que|Received / |Tail Drop |No Buffer |MAC Error |Multicast |Queue
No |Transmitted | | | |Tail Drop |Depth
---+----------------+-----------+------------+----------+------------+-----
2rx| 5863073| 0| 0| 0| - | 0
2tx| 426378558047| 28490502| 0| 0| 0| 0
---+----------------+-----------+------------+----------+------------+-----
Annotations on the output above:
• The input buffer allocation shows the ingress queue limit (configurable)
• “Queue limit” shows the egress queue limit (configurable)
• The Queueing table shows, per egress queue, the CoS-to-queue mapping, bandwidth allocation and MTU
• Queue Statistics are per-port, per-queue counters; tail drops indicate drops due to oversubscription
dc11-5020-4# sh queuing int eth 1/39
Interface Ethernet1/39 TX Queuing
qos-group sched-type oper-bandwidth
0 WRR 50
1 WRR 50
Interface Ethernet1/39 RX Queuing
qos-group 0
q-size: 243200, HW MTU: 1600 (1500 configured)
drop-type: drop, xon: 0, xoff: 1520
Statistics:
Pkts received over the port : 85257
Ucast pkts sent to the cross-bar : 930
Mcast pkts sent to the cross-bar : 84327
Ucast pkts received from the cross-bar : 249
Pkts sent to the port : 133878
Pkts discarded on ingress : 0
Per-priority-pause status : Rx (Inactive), Tx (Inactive)
<snip – other classes repeated>
Total Multicast crossbar statistics:
Mcast pkts received from the cross-bar : 283558
Annotations on the output above:
• “Interface Ethernet1/39 TX Queuing” shows the egress (Tx) queuing configuration
• “Pkts discarded on ingress” counts packets arriving on this port but dropped from the ingress queue due to congestion on an egress port
Mapping the switch architecture to ‘show queuing’
Mastering Data Center QoS – BRKRST-2509
• Data Center QoS Requirements
• Nexus QoS Capabilities
• Nexus QoS Configuration
–Nexus Configuration Model: MQC
–Platform Configuration Examples
  • Nexus 7x00
  • Nexus 6000 / 5x00 / 3000
Nexus QoS – Capabilities and Configuration

• Nexus 1000v/3000/5000/6000/7000 supports a new set of QoS capabilities designed to provide per-system-class based traffic control:
  – Lossless Ethernet – Priority Flow Control (IEEE 802.1Qbb)
  – Traffic Protection – Bandwidth Management (IEEE 802.1Qaz)
  – Configuration signaling to end points – DCBX (part of IEEE 802.1Qaz)
• These new capabilities are added to and managed by the common Cisco MQC (Modular QoS CLI), which defines a three-step configuration model:
  1. Define matching criteria via a class-map
  2. Associate an action with each defined class via a policy-map
  3. Apply the policy to the entire system or an interface via a service-policy
• Nexus leverages the MQC qos-group capability to identify and define traffic in policy configuration.
Nexus QoS – Configuration Principles

• 3 MQC building blocks: class-map, policy-map, service-policy
• Classes:
  – N1000v: 64 classes (8 pre-defined)
  – N3K: 8 classes / qos-groups (4 multicast)
  – N6K/N5K: 6 classes
  – N7K: 2 to 8 classes
• Policy types: type network-qos, type queuing, type qos
Nexus QoS – Overview

Type (CLI) | Description | Applied To…
qos | Packet classification based on Layer 2/3/4 (ingress) | Interface or System
network-qos | Packet marking (CoS), congestion control WRED/ECN (egress) | System
queuing | Scheduling – queuing bandwidth % / priority queue (egress) | Interface or System

• QoS is enabled by default (NX-OS default)
• The qos policy defines how the system classifies traffic, assigned to qos-groups
• The network-qos policy defines system policies, e.g. which CoS values ALL ports treat as drop versus no-drop
• The ingress queuing policy defines how an ingress port buffers ingress traffic for ALL destinations over the fabric
• The egress queuing policy defines how an egress port transmits traffic on the wire
  – Conceptually, it controls how all ingress ports schedule traffic toward the egress port over the fabric (by controlling the manner in which bandwidth availability is reported to the arbiter)
Nexus QoS Methodology
53
1 – type qos
Classification: ACL, CoS, DSCP, IP RTP, Precedence, Protocol
Policy: sets the qos-group of the system class this traffic flow is mapped to
2 – type network-qos
Classification: system class matched by qos-group
Policy: MTU, queue-limit (5k), set CoS (802.1p marking), set DSCP (5500/3k), ECN-WRED (3k)
3 – type queuing
Classification: system class matched by qos-group
Policy: ETS guaranteed scheduling – deficit weighted round robin (DWRR) percentage; Priority – strict priority scheduling (only one class can be configured for priority in a given queuing policy)
4 – Apply service-policy
Software QoS Model – Configuration Steps
54
STEP 1 – QoS (ingress classification):
class-map type qos class-app-1
  match access-group app-1
class-map type qos class-app-2
  match access-group app-2
policy-map type qos policy-qos
  class type qos class-app-1
    set qos-group 1
  class type qos class-app-2
    set qos-group 2
STEP 2 – Network-QoS:
class-map type network-qos class-app-1
  match qos-group 1
class-map type network-qos class-app-2
  match qos-group 2
policy-map type network-qos policy-nq
  class type network-qos class-app-1
    pause no-drop
    mtu 9216
  class type network-qos class-app-2
    set cos 2
    queue-limit 81920 bytes
STEP 3 – Queuing (ingress/egress):
class-map type queuing class-app-1
  match qos-group 1
class-map type queuing class-app-2
  match qos-group 2
policy-map type queuing policy-queue
  class type queuing class-default
    bandwidth percent 10
  class type queuing class-app-1
    bandwidth percent 50
  class type queuing class-app-2
    bandwidth percent 40
STEP 4 – Apply QoS (globally or per interface):
system qos
  service-policy type qos input policy-qos
  service-policy type network-qos policy-nq
  service-policy type queuing output policy-queue
Resulting flow for traffic matching ACL app-2: qos-group 2, CoS set to 2, 82 KB queue-limit
Configuring QoS – type ‘network-qos’ Policies
55
• Define global queuing and scheduling parameters for all interfaces in the switch – identify drop/no-drop classes, instantiate specific default queuing policies, etc.
• One network-QoS policy per system; applies to all ports in all VDCs
• The assumption is that the network-QoS policy is defined and applied consistently network-wide – particularly for no-drop applications, end-to-end consistency is mandatory
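One quick consistency check on each switch is to inspect the active system policy (a sketch; the `type network-qos` filter is assumed to be available on the running NX-OS release):

```
N7K# show policy-map system type network-qos
```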
(Diagram: Switch 1, Switch 2 and Switch 3, each with a fabric, ingress modules and an egress module. Network-QoS policies should be applied consistently on all switches network-wide.)
Mastering Data Center QoS – BRKRST-2509
56
• Data Center QoS Requirements
• Nexus QoS Capabilities
• Nexus QoS Configuration
–Nexus Configuration Model: MQC
–Platform Configuration Examples • Nexus 7x00
• Nexus 6000 / 5x00 / 3000
Ingress/Egress Queuing Class-Maps
57
• class-map type queuing – Configures COS to Queue mapping
• Queuing class-map names are static, based on port-type and queue (Predefined)
• Configurable only in default VDC
– Changes apply to ALL ports of specified type in ALL VDCs
– Changes are traffic disruptive for ports of specified type
N7k-ADMIN(config)# class-map type queuing match-any ?
1p3q4t-out-pq1 1p7q4t-out-q-default 1p7q4t-out-q6 8q2t-in-q1 8q2t-in-q6
1p3q4t-out-q-default 1p7q4t-out-q2 1p7q4t-out-q7 8q2t-in-q2 8q2t-in-q7
1p3q4t-out-q2 1p7q4t-out-q3 2q4t-in-q-default 8q2t-in-q3
1p3q4t-out-q3 1p7q4t-out-q4 2q4t-in-q1 8q2t-in-q4
1p7q4t-out-pq1 1p7q4t-out-q5 8q2t-in-q-default 8q2t-in-q5
N7k-ADMIN(config)# class-map type queuing match-any 1p7q4t-out-pq1
N7k-ADMIN(config-cmap-que)# match cos 7
N7k-ADMIN(config-cmap-que)#
(Queue structures shown: 10G/40G/100G ingress, 10G/40G/100G egress, 1G ingress, 1G egress)
Ingress Queuing – Logical View
58
8e template: CoS 0-4 (Q-Default), CoS 5-7 (Q1)
7e template: CoS 0,1 (Q-Default), CoS 2,4 (Q3), CoS 3 (Q4), CoS 5-7 (Q1)
6e template: CoS 0-2 (Q-Default), CoS 4 (Q3), CoS 3 (Q4), CoS 5-7 (Q1)
4e template: CoS 0 (Q-Default), CoS 4 (Q3), CoS 1-3 (Q4), CoS 5-7 (Q1)
8e-4q4q template: CoS 0-1 (Q-Default), CoS 2 (Q4), CoS 3-4 (Q3), CoS 5-7 (Q1)
Legend: skid buffer; drop queue; no-drop queue; pause active
No-drop queues use high and low drop thresholds on the skid buffers, plus a high (pause) threshold and a low (resume) threshold
Egress Queuing – Logical View
59
The default egress queuing policies map CoS values to queues as follows (red in the original diagram indicates no-drop):
• default-4q-8e-out-policy: PQ1 (CoS 5-7) strict priority; Q2 (CoS 3,4), Q3 (CoS 2) and Q-Default (CoS 0,1) share the remaining bandwidth via DWRR at 33% / 33% / 33%
• default-4q4q-8e-out-policy: same as 8e – PQ1 (CoS 5-7) strict priority; Q2 (CoS 3,4), Q3 (CoS 2), Q-Default (CoS 0,1) DWRR 33% / 33% / 33%
• default-4q-7e-out-policy: PQ1 (CoS 5-7) strict priority; remaining bandwidth split 50% / 50% via DWRR between the drop queues (Q2: CoS 0,1; Q3: CoS 2,4) and the no-drop Q-Default (CoS 3)
• default-4q-6e-out-policy: PQ1 (CoS 5-7, drop) priority; PQ2 (CoS 4) and PQ3 (CoS 3) no-drop priority queues; Q-Default (CoS 0-2, drop) receives 100% of the remaining DWRR bandwidth
• default-4q-4e-out-policy: PQ1 (CoS 5-7, drop) and PQ2 (CoS 4, no-drop) priority queues; Q3 (CoS 1-3, no-drop) and Q-Default (CoS 0, drop) each receive 100% of the DWRR bandwidth within their drop / no-drop group
Queuing Policies
• policy-map type queuing – Define per-queue behavior such as queue size, WRED, shaping
• priority – defines queue as the priority queue
• bandwidth – defines WRR weights for each queue
• shape – defines SRR weights for each queue
• queue-limit – defines queue size and defines tail-drop thresholds
• random-detect – sets WRED thresholds for each queue
60
N7k(config)# policy-map type queuing pri-q
N7k(config-pmap-que)# class type queuing 1p7q4t-out-pq1
N7k(config-pmap-c-que)# ?
  bandwidth  no  queue-limit  set  exit  priority  random-detect  shape
N7k(config-pmap-c-que)#
Note that some “sanity” checks are only performed when you attempt to tie the policy to an interface
Example: WRED on ingress 10G ports
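Combining the commands above into one sketch (the queue names match the 10G egress structure shown earlier; the percentages are illustrative, and which combinations are accepted varies by module):

```
N7k(config)# policy-map type queuing my-out-q
N7k(config-pmap-que)# class type queuing 1p7q4t-out-pq1
N7k(config-pmap-c-que)# priority level 1
N7k(config-pmap-c-que)# class type queuing 1p7q4t-out-q2
N7k(config-pmap-c-que)# bandwidth percent 60
N7k(config-pmap-c-que)# queue-limit percent 30
N7k(config-pmap-c-que)# class type queuing 1p7q4t-out-q-default
N7k(config-pmap-c-que)# bandwidth percent 40
```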
Queuing Service Policies
• service-policy type queuing – Attach a queuing policy-map to an interface
• Queuing policies are always tied to the physical port
• No more than one input and one output queuing policy per port
tstevens-7010(config)# int e1/1
tstevens-7010(config-if)# service-policy type queuing input my-in-q
tstevens-7010(config-if)# service-policy type queuing output my-out-q
tstevens-7010(config-if)#
61
Modifying MTU – N7K – F series linecards
62
• MTU in network-QoS policy applies to all F1/F2 interfaces in absence of per-port MTU configuration. User-configured per-port MTU overrides any MTU in network-QoS policy (for that port)
• Per-port or network-QoS defined MTUs must be less than or equal to configured system jumbomtu value
• When configured per-port, an L2 switchport MTU must be either 1518 or the “system jumbomtu” value
• Example of per-port MTU (modifies MTU only on specified port):
N7K(config)# interface e3/1
N7K(config-if)# mtu 9216
N7K(config-if)#
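The jumbomtu ceiling referenced above is set globally; a minimal sketch (9216 is a commonly used maximum):

```
N7K(config)# system jumbomtu 9216
```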
Modifying MTU – N7K - F1 and F2 series linecards
63
• Example of network-QoS MTU (modifies MTU for specified class on all F1/F2 ports)
N7K# !Clone the 7E policy (cannot modify default policies)
N7K# qos copy policy-map type network-qos default-nq-7e-policy prefix new-
N7K# conf
Enter configuration commands, one per line. End with CNTL/Z.
N7K(config)# !Modify the newly cloned policy-map
N7K(config)# policy-map type network-qos new-nq-7e
N7K(config-pmap-nqos)# !Modify the 7E drop class
N7K(config-pmap-nqos)# class type network-qos c-nq-7e-drop
N7K(config-pmap-nqos-c)# mtu 8000
N7K(config-pmap-nqos-c)# !Apply the new policy-map to the system qos target
N7K(config-pmap-nqos-c)# system qos
N7K(config-sys-qos)# service-policy type network-qos new-nq-7e
N7K(config-sys-qos)#
What Is a Strict Priority Queue?
64
• In classic definition, SP queue gets complete, unrestricted access to all interface bandwidth and is serviced until empty
–Can theoretically starve all other traffic classes
• Depending on hardware implementation, additional options for the SP queue exist:
–Multiple PQs with hierarchical relationship (e.g., level 1 vs. level 2)
–Multiple PQs with bandwidth sharing according to DWRR weights
–Optional SP queue shaping
M1 modules:
• SP queue adheres to the classic SP queue definition
–You cannot limit how much interface bandwidth traffic mapped to the SP queue consumes
• Use care in mapping traffic to SP queue – SP traffic should be low volume
F1/F2 modules:
• Multiple SP queues can exist, depending on active network-QoS template
• SP queue(s) can be shaped to prevent complete starvation of other classes
–Note that a shaped queue cannot exceed the shaped rate even if no congestion exists
Modifying Queuing Behavior Shape the SP Queue on F1/F2 Modules
• Clone a default egress “type queuing” policy-map – creates a copy of a default egress queuing policy
• Shape the SP queue in the new (cloned) “type queuing” policy – limits SP queue bandwidth consumption
• Apply the new “type queuing” policy to the target interface(s) – applies the new queuing policy to F1/F2 interfaces
Important: applying a new queuing policy takes effect immediately and is disruptive to any ports to which the policy is applied
65
Modifying Queuing Behavior Shape the SP Queue on F1/F2 Modules
66
• Example: Shape the SP queue to 2Gbps on an interface, using a queuing policy cloned from the default “8E” egress queuing policy
N7K# !Clone the 8E egress queuing policy
N7K# qos copy policy-map type queuing default-4q-8e-out-policy prefix new-
N7K# conf t
Enter configuration commands, one per line. End with CNTL/Z.
N7K(config)# !Modify new queuing policy
N7K(config)# policy-map type queuing new-4q-8e-out
N7K(config-pmap-que)# !Modify the SP queue
N7K(config-pmap-que)# class type queuing 1p3q1t-8e-out-pq1
N7K(config-pmap-c-que)# !Shape the queue to 20% (2G)
N7K(config-pmap-c-que)# shape percent 20
N7K(config-pmap-c-que)# !Apply the new policy to target interface
N7K(config-pmap-c-que)# int e 2/1
N7K(config-if)# service-policy type queuing output new-4q-8e-out
N7K(config-if)#
Modifying Queuing Behavior Make an Interface “Untrusted”
67
• Create an ingress “type queuing” policy-map that sets CoS to 0 – rewrites the CoS of all frames to 0 (only needed if the ingress port is an 802.1Q trunk)
• Create a “type qos” marking policy that sets DSCP to 0 – rewrites the DSCP of all IP packets to 0
• Apply the new policies to the target interface(s)
Important: applying new queuing policy takes effect immediately and is disruptive to any ports to which the policy is applied
Modifying Queuing Behavior Make an Interface “Untrusted” – F1/F2 Modules
68
N7K# !Clone the default input queuing policy, or create a new one from scratch
N7K# qos copy policy-map type queuing default-4q-8e-in-policy prefix untrusted-
N7K# conf
Enter configuration commands, one per line. End with CNTL/Z.
N7K(config)# !Modify the cloned policy
N7K(config)# policy-map type queuing untrusted-4q-8e-in
N7K(config-pmap-que)# !For F1/F2, must specify all queues even for untrusted policy
N7K(config-pmap-que)# class type queuing 2q4t-8e-in-q1
N7K(config-pmap-c-que)# !Give q1 the minimum buffer space
N7K(config-pmap-c-que)# queue-limit percent 1
N7K(config-pmap-c-que)# class type queuing 2q4t-8e-in-q-default
N7K(config-pmap-c-que)# !Give q-default maximum buffer space
N7K(config-pmap-c-que)# queue-limit percent 99
N7K(config-pmap-c-que)# !Set COS 0 for all frames
N7K(config-pmap-c-que)# set cos 0
N7K(config-pmap-c-que)# policy-map type qos untrusted
N7K(config-pmap-qos)# !use class-default to match everything
N7K(config-pmap-qos)# class class-default
N7K(config-pmap-c-qos)# !change DSCP of all packets to 0
N7K(config-pmap-c-qos)# set dscp 0
N7K(config-pmap-c-qos)# int e1/1-32
N7K(config-if-range)# !tie the queuing & qos policies to the target interface(s)
N7K(config-if-range)# service-policy type queuing input untrusted-4q-8e-in
N7K(config-if-range)# service-policy type qos input untrusted
Note: for an access switchport, queuing policy not necessary since no COS received
Priority Flow Control – Nexus 7K Operations Configuration – Switch Level
69
• No-Drop PFC w/ MTU 2K set for Fibre Channel
N7K-50(config)# system qos
N7K-50(config-sys-qos)# service-policy type network-qos default-nq-7e-policy
show policy-map system
Type network-qos policy-maps
=====================================
policy-map type network-qos default-nq-7e-policy
class type network-qos c-nq-7e-drop
match cos 0-2,4-7
congestion-control tail-drop
mtu 1500
class type network-qos c-nq-7e-ndrop-fcoe
match cos 3
match protocol fcoe
pause
mtu 2112
Policy template choices:
Template | Drop CoS (Priority) | No-Drop CoS (Priority)
default-nq-8e-policy | 0,1,2,3,4,5,6,7 (5,6,7) | none
default-nq-7e-policy | 0,1,2,4,5,6,7 (5,6,7) | 3
default-nq-6e-policy | 0,1,2,5,6,7 (5,6,7) | 3,4 (4)
default-nq-4e-policy | 0,5,6,7 (5,6,7) | 1,2,3,4 (4)
show class-map type network-qos c-nq-7e-ndrop-fcoe
Type network-qos class-maps
=============================================
class-map type network-qos match-any c-nq-7e-ndrop-fcoe
Description: 7E No-Drop FCoE CoS map
match cos 3
match protocol fcoe
Hierarchical Queuing Policies for ETS – F1 and F2
70
Enhanced Transmission Selection (ETS) provides priority group mappings and bandwidth ratios
‒ Controls hierarchical queuing policies for drop versus no-drop traffic classes
‒ Defines bandwidth ratios advertised in DCBX for drop versus no-drop classes
Only active when a no-drop network-qos policy is active (7E/6E/4E)
Top-level policy-map defines overall queue-limit and bandwidth ratios for drop versus no-drop classes
Second-level policy-map defines priority, queue-limit, and bandwidth ratios for individual drop and no-drop classes
Example of ETS Queuing Policy – F linecards
71
• Example using default queuing policies under the 6E network-QoS template
• Top-level policy-map (defines overall bandwidth ratios for drop versus no-drop classes):
policy-map type queuing default-4q-6e-out-policy
  class type queuing c-4q-6e-drop-out
    service-policy type queuing default-4q-6e-drop-out-policy
    bandwidth remaining percent 70
  class type queuing c-4q-6e-ndrop-out
    service-policy type queuing default-4q-6e-ndrop-out-policy
    bandwidth remaining percent 30
• Second-level policy-maps (define priority and bandwidth ratios for individual drop and no-drop classes):
policy-map type queuing default-4q-6e-drop-out-policy
  class type queuing 3p1q1t-6e-out-pq1
    priority level 1
  class type queuing 3p1q1t-6e-out-q-default
    bandwidth remaining percent 100
policy-map type queuing default-4q-6e-ndrop-out-policy
  class type queuing 3p1q1t-6e-out-pq2
    priority level 1
  class type queuing 3p1q1t-6e-out-pq3
    priority level 2
Mastering Data Center QoS – BRKRST-2509
72
• Data Center QoS Requirements
• Nexus QoS Capabilities
• Nexus QoS Configuration
–Nexus Configuration Model: MQC
–Platform Configuration Examples • Nexus 7x00
• Nexus 6000 / 5x00 / 3000
Configuration Templates – MTU 6000/5000/3000/2000
73
• MTU can be configured for each class of service (no interface-level MTU)
• No fragmentation, since the Nexus 5000 / 3000 is an L2 switch
• Cut-through: frames larger than the MTU are truncated
• Store-and-forward: frames larger than the MTU are dropped
• With the L3 module (5000) or license (3000), L3 MTU is set at the routed interface / SVI level
Each CoS queue on the Nexus 5000 supports a unique MTU
class-map type qos iSCSI
  match cos 2
policy-map type qos iSCSI
  class iSCSI
    set qos-group 2
class-map type queuing iSCSI
  match qos-group 2
class-map type network-qos iSCSI
  match qos-group 2
policy-map type network-qos iSCSI
  class type network-qos iSCSI
    mtu 9216
system qos
  service-policy type qos input iSCSI
  service-policy type network-qos iSCSI
Configuration Templates – MTU 6000/5000/3000/2000
74
• Nexus 5000 / 3000 supports different MTU for each system class
• MTU is defined in network-qos policy-map
• L2: no interface level MTU support on Nexus 5000
• L3 MTU: at SVI / Routed port level
policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo
interface ethernet 1/x
  mtu 9216
Each qos-group on the Nexus 5000/3000 supports a unique MTU
Configuration Templates – CoS Marking
75
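A sketch of what such a CoS-marking template could look like, following the same pattern as the MTU templates above (the class name, match criterion, qos-group, and CoS value are illustrative):

```
class-map type qos Video
  match dscp 34
policy-map type qos Video
  class Video
    set qos-group 3
class-map type network-qos Video
  match qos-group 3
policy-map type network-qos Video
  class type network-qos Video
    set cos 4
system qos
  service-policy type qos input Video
  service-policy type network-qos Video
```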
ETS – Strict Priority and Bandwidth management
76
• Create classification rules first by defining and applying policy-map type qos
• Define and apply policy-map type queuing to configure strict priority and bandwidth sharing
pod3-5010-2(config)# class-map type queuing class-voice
pod3-5010-2(config-cmap-que)# match qos-group 2
pod3-5010-2(config-cmap-que)# class-map type queuing class-high
pod3-5010-2(config-cmap-que)# match qos-group 3
pod3-5010-2(config-cmap-que)# class-map type queuing class-low
pod3-5010-2(config-cmap-que)# match qos-group 4
pod3-5010-2(config-cmap-que)# exit
pod3-5010-2(config)# policy-map type queuing policy-BW
pod3-5010-2(config-pmap-que)# class type queuing class-voice
pod3-5010-2(config-pmap-c-que)# priority
pod3-5010-2(config-pmap-c-que)# class type queuing class-high
pod3-5010-2(config-pmap-c-que)# bandwidth percent 50
pod3-5010-2(config-pmap-c-que)# class type queuing class-low
pod3-5010-2(config-pmap-c-que)# bandwidth percent 30
pod3-5010-2(config-pmap-c-que)# class type queuing class-fcoe
pod3-5010-2(config-pmap-c-que)# bandwidth percent 20
pod3-5010-2(config-pmap-c-que)# class type queuing class-default
pod3-5010-2(config-pmap-c-que)# bandwidth percent 0
pod3-5010-2(config-pmap-c-que)# system qos
pod3-5010-2(config-sys-qos)# service-policy type queuing output policy-BW
(Example topology: a traditional server with 1Gig FC HBAs and 1Gig Ethernet NICs; FCoE traffic is given 20% of the 10GE link.)
Priority Flow Control – Nexus 6000 / 5500/5000
77
• On the Nexus 5000, once “feature fcoe” is configured, two classes are created by default
• class-fcoe is configured as no-drop with an MTU of 2158
• Enabling the FCoE feature on the Nexus 5548/5596 does not create the no-drop policies automatically as on the Nexus 5010/5020
• You must add the policies under system qos:
policy-map type qos fcoe-default-in-policy
class type qos class-fcoe
set qos-group 1
class type qos class-default
set qos-group 0
policy-map type network-qos fcoe-default-nq-policy
class type network-qos class-fcoe
pause no-drop
mtu 2158
system qos
service-policy type qos input fcoe-default-in-policy
service-policy type queuing input fcoe-default-in-policy
service-policy type queuing output fcoe-default-out-policy
service-policy type network-qos fcoe-default-nq-policy
Configuring QoS on the Nexus 6000/ 5500
78
• Check System Classes
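A sketch of verification commands for the system classes (output format varies by release; the interface is illustrative):

```
N5K# show policy-map system
N5K# show queuing interface ethernet 1/1
```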
QoS On ACI Fabric - Topology
79
Traffic marking can be performed by the fabric, simplifying inbound and outbound QoS policy enforcement in the Data Center
(Diagram: ACI fabric managed by a cluster of APIC controllers, with attached VMs)
DSCP Marking on ACI Fabric – Create Policy
80
Under Custom QoS, verify there is a policy named “Marking_Test”. Double-click on it; you’ll observe that the policy remarks all packets marked with DSCP 0 to 63 to AF23.
DSCP Marking on ACI – Apply Policy
81
Tenant -> Application Profiles
Verify that the Custom QoS policy “Marking_Test” is associated with that EPG.
DSCP Marking – Verify Marking
82
Once the policy is associated, go to the Web VM console and ping the App VM. In addition, run tcpdump on the App console to see that the marking has changed.
Summary
83
• Data Center QoS requires characterization beyond voice and video.
• New capabilities: PFC, ETS, DCBX
• Platform consistency: MQC
• Platform dependencies: PFC support, priority queue placement, queue structure
• Different types of congestion / traffic flows
• More to QoS than buffer tuning: application and transport tuning
• How to configure QoS on Nexus switches
Recommended Readings
84
Participate in the “My Favorite Speaker” Contest
• Promote your favorite speaker through Twitter and you could win $200 of Cisco Press products (@CiscoPress)
• Send a tweet and include
– Your favorite speaker’s Twitter handle <@flying91>
– Two hashtags: #CLUS #MyFavoriteSpeaker
• You can submit an entry for more than one of your “favorite” speakers
• Don’t forget to follow @CiscoLive and @CiscoPress
• View the official rules at http://bit.ly/CLUSwin
Promote Your Favorite Speaker and You Could be a Winner
85
Complete Your Online Session Evaluation
• Give us your feedback and you could win fabulous prizes. Winners announced daily.
• Complete your session evaluation through the Cisco Live mobile app or visit one of the interactive kiosks located throughout the convention center.
Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online
86
Continue Your Education
• Demos in the Cisco Campus
• Walk-in Self-Paced Labs
• Table Topics
• Meet the Engineer 1:1 meetings
87