SAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage Networking CCIE 6448 BRKSAN-3446

SAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric

Edward Mazurek

Technical Lead Data Center Storage Networking

CCIE 6448

BRKSAN-3446

Agenda

• Introduction

• Slow Drain Terminology

• Understanding Fibre Channel Flow Control

• MDS Slow Drain Features

• Troubleshooting Slow Drain

• Alerting and Mitigating Slow Drain

• Conclusion

Introduction

• Slow drain is a term to describe SAN congestion

• When devices do not receive data at the line rate this can cause congestion in the SAN

• SANs are getting increasingly complex and heterogeneous

• Many different speeds

• Many different types of devices

• Host/storage workloads increasing

Reasons for Slow Drain

• Edge devices - An edge device can be slow to respond for a variety of reasons:

• Server performance problems: application or OS

• Host bus adapter (HBA) problems: driver or physical failure

• Speed mismatches: one fast device and one slow device

• Nongraceful virtual machine exit on a virtualized server, resulting in packets held in HBA buffers

• Storage subsystem performance problems, including overload

• Inter Switch Links (ISL)

• Lack of B2B credits for the distance the ISL is traversing

• Ex: 4 credits per km @ 8Gbps

• The existence of slow drain edge devices

• Edge devices with faster speeds than ISLs even when port-channeled
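
The distance rule of thumb above (4 credits per km at 8Gbps) can be turned into a quick estimate. This is a minimal sketch, not a Cisco sizing tool: the function name is hypothetical, and the linear scaling with link speed is an assumption that is not stated on the slide.

```python
import math

# Rule of thumb from the slide: 4 B2B credits per km at 8 Gbps.
CREDITS_PER_KM_AT_8G = 4.0


def credits_needed(distance_km: float, speed_gbps: float = 8.0) -> int:
    """Estimate the B2B credits needed to keep an ISL of this length busy.

    Assumption (not from the slide): the requirement scales linearly with
    link speed, so a 4 Gbps link needs half as many credits per km.
    """
    per_km = CREDITS_PER_KM_AT_8G * (speed_gbps / 8.0)
    return math.ceil(distance_km * per_km)


if __name__ == "__main__":
    for km in (1, 10, 50):
        print(f"{km} km @ 8 Gbps -> {credits_needed(km)} credits")
```

If the credits configured on the ISL are below this estimate, the link can run out of Tx credits purely because of distance, even when no device is misbehaving.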

Reasons for Slow Drain

• Port-channel bandwidth is not the same as individual link bandwidth: 4 x 4Gb is not equal to 16Gb

• An individual exchange (src, dst, oxid) traverses a single ISL

• A member ISL sending at the full 4Gbps rate causes congestion back to storage

[Diagram: 8Gb edge ports on either side of a 16Gb (total) port-channel made of 4 x 4Gb links, with host H1 at one end. The Tx queue is full due to the slower link, the VOQs are congested, and the upstream port shows 0 Rx credits remaining, no R_Rdy sent and no B2B credits.]
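
The point that one exchange stays on one member link can be sketched as follows. The hash below is purely illustrative (CRC32 over src, dst and oxid); it is an assumption for demonstration and not the actual MDS load-balancing algorithm, and the function name is hypothetical.

```python
import zlib


def pick_member_link(src_fcid: int, dst_fcid: int, oxid: int, n_links: int) -> int:
    """Map one exchange to one port-channel member (illustrative hash only)."""
    key = f"{src_fcid:06x}:{dst_fcid:06x}:{oxid:04x}".encode()
    return zlib.crc32(key) % n_links


if __name__ == "__main__":
    # Every frame of the same exchange hashes to the same member link,
    # so that exchange can never exceed the speed of one 4Gb member.
    link = pick_member_link(0x010203, 0x040506, 0x1234, 4)
    print("exchange 0x1234 always uses member link", link)
```

Because the mapping is per exchange, a single large transfer from an 8Gb port is limited to one 4Gb member regardless of how many links are in the port-channel.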

SAN Congestion! BRKSAN-3446

Slow Drain Terminology

Slow Drain Terminology

• B2B – Buffer to Buffer Credits / Credits Remaining

• B2B Transitions to Zero

• Slow Ports / B2B Credit not Available

• Stuck Ports

• Credit Loss Recovery

Slow Drain Terminology

• Buffer to Buffer credits or B2B credits are the agreed upon buffer space on each side of a FC link

• Occurs on FLOGI and ACC(FLOGI)

• Occurs on ELP and ACC(ELP)

• B2B credit remaining is the count of FC frames that still can be sent by each side of a FC link

• Credits are returned by R_Rdy FC ordered set

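B2B credits / Credits remaining

[Diagram: End Device to MDS. The end device sends FLOGI with its x Rx credits, which become the MDS's x Tx credits. The MDS answers with ACC(FLOGI) carrying its y Rx credits, which become the end device's y Tx credits.]

[Diagram: MDS to MDS over an ISL. ELP carries x Rx credits (the peer's x Tx credits); ACC(ELP) carries y Rx credits (the peer's y Tx credits).]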

Slow Drain Terminology

• Whenever a port hits zero credits this is counted as a “B2B transition to zero”

• Occurs and is counted in both Tx and Rx direction

• Transmit B2B transition to zero indicates attached device didn’t return credits

• Receive B2B transitions to zero indicates port didn’t return credits

• Important: Amount of time at zero credits is not easily determined

• Can occur normally

B2B Transitions to zero

[Diagram: interface fc1/13 sends FC frames until it has 0 remaining Tx credits; each time this happens the "transmit B2B credit transitions to zero" counter is incremented (130 in the example).]

Slow Drain Terminology

• A port that is attached to a FC device that returns credits slowly

• The receiver of the FC frame does not immediately return an R_Rdy message to the sender

• B2B Tx Credit not available is a term most often used when the MDS detects that the B2B credits are at zero for 100ms

• Called a Slow Port

Slow Port Detection

[Diagram: timeline for interface fc1/13 with 0 remaining Tx credits. "Tx credit not available" is incremented at 100ms and again at 200ms, and a slowport event is recorded if configured (New!).]

MDS Slow Drain Features

• If no-credit-drop is configured and Tx credits are at 0 for the configured amount of time, the port is considered “stuck”.

• Start dropping frames immediately without regard to age of frames

• Any newly arriving frames are dropped immediately as long as the port remains at 0 Tx credits

• Frees up frames queued at ISLs destined for slow/stuck ports quicker

Stuck Port Detection

[Diagram: timeline with no-credit-drop set to 300ms. From 0 sec the port has 0 Tx credits while frames from other ports wait in the Rx queue. At 300ms the frames in the Rx queue are dropped and any newly arriving frames are dropped immediately. Once a credit arrives, sending resumes.]

Slow Drain Terminology

• If transmit credits are at zero for 1 second (F port) or 1.5 seconds (E port), the port invokes credit loss recovery

• Link Reset(LR) is transmitted

• If Link Reset Response(LRR) is received then both sides are back at the full B2B credits

• If LRR is not received then link is failed

• Counter is incremented and optionally a port-monitor alert can be generated

• Link Reset is better named Link Credit Reset – Part of FC-FS - Framing and Signaling

Credit Loss Recovery

[Diagram, successful case: no credits (stuck) from 0 sec; at 1/1.5 sec LR is sent; LRR arrives by the +60ms mark and the port resumes normal operation, nondisruptively.]

[Diagram, unsuccessful case: no credits (stuck) from 0 sec; at 1/1.5 sec LR is sent; there is no response by the +60ms mark and the port goes through shut/no shut.]
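
The thresholds in this terminology section (any hit at zero, 100ms, the no-credit-drop value, and 1 or 1.5 seconds) can be lined up in one small function. This is a sketch for reasoning about the timeline only: the function and the event labels are hypothetical names, not MDS counters or CLI keywords.

```python
from typing import List, Optional


def zero_credit_events(ms_at_zero: int, port_type: str = "F",
                       no_credit_drop_ms: Optional[int] = None) -> List[str]:
    """List which slow drain conditions a port has reached.

    ms_at_zero: how long the port has been continuously at 0 Tx credits.
    port_type: "F" (1 second recovery timer) or "E" (1.5 seconds).
    no_credit_drop_ms: configured no-credit-drop value, or None if unset.
    """
    events = []
    if ms_at_zero > 0:
        events.append("tx-transition-to-zero")
    if ms_at_zero >= 100:
        events.append("tx-credit-not-available")
    if no_credit_drop_ms is not None and ms_at_zero >= no_credit_drop_ms:
        events.append("no-credit-drop")
    recovery_ms = 1000 if port_type == "F" else 1500
    if ms_at_zero >= recovery_ms:
        events.append("credit-loss-recovery")
    return events


if __name__ == "__main__":
    print(zero_credit_events(50))
    print(zero_credit_events(300, "F", no_credit_drop_ms=300))
    print(zero_credit_events(1200, "E"))
```

The last call shows why the port type matters: an E port at zero credits for 1.2 seconds has not yet reached credit loss recovery, while an F port would have.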

Understanding Fibre Channel Flow Control

Understanding Fibre Channel Flow Control

• Fibre Channel classes

• Fibre Channel class 3

• Fibre Channel Flow Control

• Fibre Channel Flow Control – Example

• MDS frame and credit processing

Fibre Channel classes

All data currently is transported using class 3

[Table: Class 1, Class 2, Class 3, Class 4, Class 6, Class F and FC-AL, each marked with an X; the axis is labeled "Fabric".]

Fibre Channel Class 3

• Class 3 is a best-effort packetized service:

• The receiving port does not acknowledge receipt of frames. If the fabric cannot deliver the frame for any reason, the frame can be discarded without notifying the sending port. However, Class 3 is not really unreliable, because it relies on ULP to help ensure that frames are delivered, by detecting and recovering from lost frames

• Class 3 does not guarantee fixed latency because data paths are variable

• Class 3 does not guarantee in-order delivery. For most Fibre Channel applications, including storage applications, the ULP is responsible for guaranteeing in-order delivery

Fibre Channel Flow Control

• Fibre Channel flow control attempts to minimize the chance of dropped frames

• Frames are only transmitted when it is known that the receiver has buffer space

• For each frame sent an R_Rdy (B2B Credit) should be returned

• R_Rdys can only be returned once the frame that has previously occupied that buffer location has been handled

• R_Rdys are not sent reliably – they can be corrupted/lost

• Each side informs the other side of the number of buffer credits it has

• F ports: in the Fabric Login (FLOGI)

• E ports: in the Exchange Link Parameters (ELP)

• Note: B2B credits are not negotiated – just agreed to

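Fibre Channel Flow Control: N-Port Login

[Diagram: the end device's N-port sends FLOGI with 1 credit to the F-port on MDS9710-A ("N-port has one credit!"); the MDS replies with ACC (FLOGI) with 3 credits. Buffers are drawn as B: one at the N-port and three at the F-port.]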

MDS9710-A# show int fc1/14

fc1/14 is up

……….

Transmit B2B Credit is 1

Receive B2B Credit is 3

3 receive B2B credit remaining

1 transmit B2B credit remaining

1 low priority transmit B2B credit remaining

Note: These values are not typical. They are chosen for simplicity. Typical F ports values 16-32

[Diagram callout on the End Device link: "F-Port has three credits!"]

Fibre Channel Flow Control

• As FC frames flow into the fabric, the MDS Rx buffer queue is decremented by 1 B2B credit for each received frame

• Once an R_Rdy is sent by the MDS, it frees up one B2B credit

Frame Flow Control

[Diagram: the N-port sends three frames into the F-port's buffers (B B B); the F-port returns an R_Rdy for each buffer it frees.]

MDS9710-A# show interface fc1/14

fc1/14 is up

……….

Transmit B2B Credit is 1

Receive B2B Credit is 3

1 receive B2B credit remaining

1 transmit B2B credit remaining

1 low priority transmit B2B credit remaining

MDS9710-A# show interface fc1/14

fc1/14 is up

……….

Transmit B2B Credit is 1

Receive B2B Credit is 3

0 receive B2B credit remaining

1 transmit B2B credit remaining

1 low priority transmit B2B credit remaining
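
The receive-side accounting shown in the two outputs above can be modeled in a few lines. This is a minimal sketch with hypothetical class and method names; it only tracks the "receive B2B credit remaining" value and is not how the port ASIC is implemented.

```python
class RxPort:
    """Tracks the 'receive B2B credit remaining' value of one port."""

    def __init__(self, rx_credits: int) -> None:
        self.rx_credits = rx_credits   # advertised at login, static
        self.remaining = rx_credits    # instantaneous value

    def receive_frame(self) -> None:
        """A received frame occupies one buffer until R_Rdy is returned."""
        if self.remaining == 0:
            raise RuntimeError("sender transmitted without a credit")
        self.remaining -= 1

    def send_r_rdy(self) -> None:
        """Returning R_Rdy frees one buffer, up to the advertised total."""
        if self.remaining < self.rx_credits:
            self.remaining += 1


if __name__ == "__main__":
    port = RxPort(rx_credits=3)        # Receive B2B Credit is 3
    port.receive_frame()
    port.receive_frame()
    print(port.remaining, "receive B2B credit remaining")
    port.receive_frame()
    print(port.remaining, "receive B2B credit remaining")
    port.send_r_rdy()
    print(port.remaining, "receive B2B credit remaining")
```

The run mirrors the slide: two frames leave 1 credit remaining, a third leaves 0, and one R_Rdy brings the count back to 1.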

Understanding Fibre Channel Flow Control

• Tx indicates transmit side of port

• Rx indicates receive side of port

• One side’s Tx is the adjacent side’s Rx

• Important to understand which direction the congestion is on

• Note: Increasing B2B credits does not usually increase performance

Tx and Rx Perspective

[Diagram: two E ports joined by an ISL. fc1/1: Transmit B2B Credit is 500, Receive B2B Credit is 250. fc1/2: Receive B2B Credit is 500, Transmit B2B Credit is 250.]

MDS9710-A# show interface fc1/1 bbcredit

fc1/1 is trunking

Transmit B2B Credit is 500

Receive B2B Credit is 250

Receive B2B Credit performance buffers is 0

250 receive B2B credit remaining

500 transmit B2B credit remaining

500 low priority transmit B2B credit remaining

[Diagram labels: MDS9710-A, ISL, MDS9710-B]

Fibre Channel Flow Control - Example cont'd: Normal flow

[Analyzer trace: Xgig analyzer ports FC Port(1,1,4) and FC Port(1,1,3) sit inline between the MDS and the server. FC data and R_Rdy alternate on the ports, with a delta time of about 0.7us.]

Fibre Channel Flow Control - Example cont'd: Delayed/No R_RDYs

[Analyzer trace: same setup (MDS, Xgig analyzer ports FC Port(1,1,4) and FC Port(1,1,3), server). Only FC data is seen, with no R_Rdys.]

Fibre Channel Flow Control - Example cont'd: R_RDY recovery

[Analyzer trace: same setup (MDS, Xgig analyzer ports FC Port(1,1,4) and FC Port(1,1,3), server). R_Rdys start arriving, followed by more R_Rdys.]

MDS Frame and Credit Processing

[Diagram: Line Card 1 and Line Card 2, each with ports, a VOQ and an XBAR interface, connected through two Fabric Modules (XBAR); the Arbiter sits on the Active Supervisor.]

1. Initiator sends an FC frame to the MDS port ASIC

2. FC frame is received in its entirety and stored

3. FC frame is transmitted to the VOQ

4. XBAR interface sends a request to the Arbiter for a grant to transmit the frame to the egress port via the XBAR

5. Arbiter grants the request to the XBAR interface to forward the frame; the grant is only sent when the egress port has buffer space available

6. FC frame is forwarded to the XBAR, then R_Rdy is sent back since the buffer is now free

7. FC frame is forwarded to the egress line card

8. MDS port ASIC forwards the frame to the target

9. Credit is returned to the Arbiter

The Issue: Non-Responsive Devices causing upstream blocking

[Diagram: hosts H1 and H2 and storage S1 and S2 are connected through two switches joined by an ISL. A 5 MB read is issued from host H2 to storage S2. The slow drain device sends no R_Rdy, leaving the switch port with 0 Tx credits remaining and no B2B credits. Congestion builds in the interface buffers and VOQs at every hop (0 Rx credits remaining, no R_Rdy sent) back to the storage ports. All devices are impacted.]

MDS Slow Drain Features

MDS Slow Drain Features - Existing

• Virtual Output Queues

• Display credits and remaining credits

• Detect Tx and Rx credit transitions to zero

• Slow Port Detection

• Tx and Rx Credit not Available

• Stuck Port Detection

• Credit Loss Recovery / LR Rcvd B2B

• Display ingress queuing

MDS Slow Drain Features - Enhanced!

• Congestion drop frames

• No credit drop frames

• On Board Failure Logging

• Port-monitor alerting / portguard

MDS Slow Drain Features - New!

• slowport-monitor

• show interface counters - txwait

• show interface - Percentage Tx credits are available for last 1s/1m/1h/72h

• txwait-history graphs

• show logging onboard txwait

• SNMP fcIfTxWaitCount variable

• show tech-support slowdrain

• DCNM Slow Drain Analysis

Virtual Output Queues (VOQs)

[Diagram, VOQ model: the input queue at port 1 is split into one VOQ per output port (ports 4, 5 and 6), each with its own top of VOQ. Switch without VOQ: a single input queue at port 1 holds frames to ports 4, 5 and 6 behind one top of queue.]

This diagram shows the primary difference between a VOQ-based switch and a switch without VOQ.

If destination port 4 was congested, the switch without VOQ would block, with frames to other output ports waiting behind the blocked port.

In contrast, VOQ means that only the VOQ associated with port 4 will be blocked; frames to all other ports will flow normally.

• MDS implements VOQs on the input interface

• VOQs help prevent head of line blocking

• VOQs can alleviate but do not prevent congestion caused by slow drain
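
The head of line blocking contrast can be sketched with two tiny queue models. This is an illustration only: the frame order and function names are made up for the example and do not represent MDS internals.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Set

# Destination port of each frame, in arrival order at input port 1.
FRAMES = [4, 5, 6, 5, 6, 4, 6, 4]


def drain_single_queue(frames: Iterable[int], blocked: Set[int]) -> List[int]:
    """Switch without VOQ: stop at the first frame whose output is blocked."""
    queue = deque(frames)
    sent = []
    while queue and queue[0] not in blocked:
        sent.append(queue.popleft())
    return sent


def drain_voq(frames: Iterable[int], blocked: Set[int]) -> List[int]:
    """VOQ switch: one queue per output port, only blocked outputs wait."""
    voqs = defaultdict(deque)
    for dst in frames:
        voqs[dst].append(dst)
    sent = []
    for dst, queue in voqs.items():
        if dst not in blocked:
            sent.extend(queue)
    return sent


if __name__ == "__main__":
    print("without VOQ:", drain_single_queue(FRAMES, blocked={4}))
    print("with VOQ:   ", drain_voq(FRAMES, blocked={4}))
```

With port 4 congested, the single queue sends nothing because its first frame is for port 4, while the VOQ model still delivers every frame for ports 5 and 6.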

MDS Slow Drain Features

• MDS can display the Tx and Rx credits agreed upon on each interface

• MDS can also display the credits remaining in both directions

• Tx and Rx credits are a static value

• Remaining credits are an instantaneous value

• Available via show interface bbcredit command

Display credits and remaining credits

[Callouts on the output below: 28 Tx frames outstanding (128 - 100); MDS owes 8 credits (32 - 24)]

MDS9710# show interface fc1/1 bbcredit

fc1/1 is up

Transmit B2B Credit is 128

Receive B2B Credit is 32

Receive B2B Credit performance buffers is 0

24 receive B2B credit remaining

100 transmit B2B credit remaining

100 low priority transmit B2B credit remaining
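
The two callouts (frames outstanding and credits owed) are simple differences between the static and remaining values. A minimal sketch of computing them from captured output follows; the parser and its names are hypothetical, and the regular expressions assume the output format shown on this slide.

```python
import re

SAMPLE = """\
fc1/1 is up
    Transmit B2B Credit is 128
    Receive B2B Credit is 32
    Receive B2B Credit performance buffers is 0
    24 receive B2B credit remaining
    100 transmit B2B credit remaining
    100 low priority transmit B2B credit remaining
"""


def credit_usage(text: str) -> dict:
    """Return Tx frames outstanding and Rx credits owed by this port."""
    def grab(pattern: str) -> int:
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        return int(match.group(1))

    tx_total = grab(r"Transmit B2B Credit is (\d+)")
    rx_total = grab(r"Receive B2B Credit is (\d+)")
    rx_left = grab(r"^\s*(\d+) receive B2B credit remaining")
    tx_left = grab(r"^\s*(\d+) transmit B2B credit remaining")
    return {
        "tx_frames_outstanding": tx_total - tx_left,
        "rx_credits_owed": rx_total - rx_left,
    }


if __name__ == "__main__":
    print(credit_usage(SAMPLE))
```

Since the remaining values are instantaneous, a single sample like this is only a snapshot; repeated samples are needed before drawing conclusions.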

MDS Slow Drain Features

• Each time the Tx or Rx credits go to zero the MDS increments a counter

• Maintained as a hardware statistic

• Available in

• show interface counters

• slot x show hardware internal fc-mac port y statistics

• show hardware internal statistics

• Since there is no indication of time at zero this is not a great indication of slow drain in and of itself

• Use the slowport-monitor or various txwait commands instead

Detect credit transitions to zero

MDS Slow Drain Features

• MDS software process detects when a port is at zero Tx or Rx credits for 100ms

• Since done by software may not catch each and every time

• Available in:

• slot x show hardware internal fc-mac port y error-statistics

• show logging onboard error-stats

• xxx_CNTR_RX_WT_AVG_B2B_ZERO

• xxx_CNTR_TX_WT_AVG_B2B_ZERO

• show system internal snmp credit-not-available

• port-monitor tx-credit-not-available

Tx/Rx Credit not Available (Timestamped!)

[Diagram: B2B credits are sampled every 100 ms on a timeline from 0 sec to 1 sec.]

MDS Slow Drain Features

• Creditmon is a process that runs periodically in each linecard

• It checks for transmit credits at zero

• F Port at 0 Tx credits for 1 second

• E Port at 0 Tx credits for 1.5 seconds

• Credit loss recovery invoked

• If successful then non-disruptive

• If port at 0 Rx credits, adjacent device is responsible for initiating recovery

• Part of FC-FS specification

Credit Loss Recovery

[Diagram, successful case: no credits (stuck) from 0 sec; at 1/1.5 sec LR is sent; LRR arrives by the +60ms mark and the port resumes normal operation.]

[Diagram, unsuccessful case: no credits (stuck) from 0 sec; at 1/1.5 sec LR is sent; there is no response by the +60ms mark and the link fails with "Link failure Link reset failed due to timeout".]

MDS Slow Drain Features

• Adjacent device initiates credit loss recovery

• If MDS receives LR it checks if input buffers are empty

• If input buffers are not empty in 90ms the “LR Rcvd B2B” condition occurs and the link fails with reason “Link failure Link Reset failed nonempty Recv queue”

• Indication of upstream congestion

LR Rcvd B2B

slot 10 show port-config internal link-events

Time                          PortNo   Speed  Event  Reason
----                          ------   -----  -----  ------
Apr 3 18:53:36 2014 00591356  fc10/30  4G     UP     Not FL
Apr 3 18:53:34 2014 00810034  fc10/30  ---    DOWN   LR Rcvd B2B

[Diagram: MDS port and FC device. The FC device gets no credits from the MDS from 0 sec and sends LR at 1 sec. The MDS gives no response by the +90ms mark (LR Rcvd B2B) and the link goes through shut/no shut.]

MDS Slow Drain Features

• Each frame the MDS receives is time stamped

• If frame cannot be delivered to the egress port it is timeout dropped

• MDS (by default) drops frames as timeout drops at 500ms

• Can be configured 100ms-500ms in 1ms intervals

• Lowering will timeout frames quicker and reduce effects of slow drain devices

Congestion Drop Frames (Enhanced!)

[Diagram: frames arrive and queue from 0 sec while waiting for a credit. The timestamp of each frame is checked, and at 500ms the frames are dropped from the queue.]
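
The timestamp check described above can be sketched as a queue of (arrival time, frame) pairs. This is a simplified model with hypothetical names; the 100ms to 500ms range and the 500ms default come from the slide, everything else is an assumption for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple


def drop_expired(queue: Deque[Tuple[int, str]], now_ms: int,
                 timeout_ms: int = 500) -> List[str]:
    """Remove and return frames that have waited at least timeout_ms.

    queue holds (arrival_ms, frame) pairs, oldest first.
    """
    if not 100 <= timeout_ms <= 500:
        raise ValueError("congestion drop timeout must be 100-500 ms")
    dropped = []
    while queue and now_ms - queue[0][0] >= timeout_ms:
        dropped.append(queue.popleft()[1])
    return dropped


if __name__ == "__main__":
    waiting = deque([(0, "frame-a"), (200, "frame-b"), (450, "frame-c")])
    print("default 500 ms:", drop_expired(deque(waiting), now_ms=600))
    print("lowered to 200 ms:", drop_expired(deque(waiting), now_ms=600, timeout_ms=200))
```

Lowering the timeout drops frame-b as well as frame-a at the same instant, which is how a lower value frees buffers sooner for unrelated traffic.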

MDS Slow Drain Features

• Frames normally queued for Congestion Drop time

• Optionally, frames can be dropped immediately if the egress port is at 0 Tx B2B credits for a specified time

• Frees up frames queued at ISLs destined for slow/stuck ports quicker

• Helps unrelated devices in the presence of congestion

• Configured 1ms-500ms in 1ms intervals

• Done by hardware at exact time

Stuck port / No-credit-drop frames (Enhanced!)

[Diagram: timeline with no-credit-drop set to 300ms. From 0 sec the port has 0 Tx credits while frames from other ports wait in the Rx queue. At 300ms the frames in the Rx queue are dropped and any newly arriving frames are dropped immediately. Once a credit arrives, sending resumes.]

MDS Slow Drain Features

• MDS can show ports that have frames queued and the destination (egress) port(s) they are queued for

• Instantaneous (real time) only

• Helpful when other indications are not showing clear indications

• DI – Destination Index – This is an internal representation of the port

Display ingress queuing

[Diagram: an ingress port holds VOQs, one per destination index (DI), each leading to an egress port.]

MDS Slow Drain Features

• MDS 9710/9396S has the capability of displaying some key packet info for packets that have experienced a timeout drop

• 32 packets are kept per forwarding instance

• Output contains:

• Source FCID (SID)

• Destination FCID(DID)

• RCTL – Routing control (ELS, ABTS, etc.)

• Source Index(SI)

• Destination Index(DI)

Display dropped packet info

MDS Slow Drain Features

• MDS can monitor ports withholding credits for as low as 1ms

• Records last 10 events for duration and date/time when occurred

• Included in OBFL

• Full featured for MDS 9700, 9396S, 9148S and 9250i

• Gen 3 has similar but only records 1 event per 100ms cycle

• Gen 4 records total wait time in 100ms

Slowport-monitor (New!)

system timeout slowport-monitor 5 mode f

[Diagram: timeline in 1ms steps from 0 sec. The port stays at 0 Tx credits past the 5ms threshold and an R_RDY arrives shortly after the 10ms mark.]

Interface  Delay  Timestamp
fc1/13     11ms   03/27/2015 12:01:00
fc1/13     8ms    03/27/2015 14:09:45

MDS Slow Drain Features

• Each linecard logs significant events to an NVRAM buffer

• Events are time stamped

• Events can be displayed by date/time

• show logging onboard <module x> <starttime mm/dd/yy-hh:mm:ss>

• error-stats

• flow-control request-timeout

• flow-control timeout-drops

• slowport-monitor-events

• txwait

On Board Failure Logging (OBFL) (New!)

[Diagram: each line card (Line Card 1, Line Card 2, ... Line Card n) has its own OBFL NVRAM holding Error-stats, Flow-control (Timeouts, Request-timeouts), Slowport-monitor-events and Txwait.]

MDS Slow Drain Features

• Allows alerting on many slow drain indications

• Three new counters!

• Optional portguard action allows either port to be flapped or error-disabled

• Different policies for E / F ports

Port-monitor / Portguard (New!)

[Diagram: with port-monitor active, the counters Link-loss, Credit-loss, Tx-credit-not-avail, Slowport-count, Slowport-oper-delay and txwait raise SNMP alerts to the DCNM server.]

MDS Slow Drain Features

• Displays graphical history of TxWait (credit not available)

• Shows

• Last minute

• Last 60 minutes

• Last 72 hours

txwait-history

MDS# show process creditmon txwait-history port 13

TxWait history for port fc1/13:

==============================

79998 79993 999999

08887 58882 9899999

000000000000299870000000000000000029994000000000000362999500

1000 ### ### ######

900 #### ### ######

800 #### #### ######

700 ##### #### ######

600 ##### #### ######

500 ##### #### ######

400 ##### #### ######

300 ##### ##### ######

200 ##### ##### ######

100 ##### ##### #######

0....5....1....1....2....2....3....3....4....4....5....5....6

0 5 0 5 0 5 0 5 0 5 0

Credit Not Available per second (last 60 seconds)

# = TxWait (ms)

New!

MDS Slow Drain Features

• show interface counters command

• Provides a quick way to check for problems

• Available for:

• MDS 9500 (Gen4 only)

• MDS 9700 (Gen5)

• MDS 9396S (Gen5)

• MDS 9148S

• MDS 9250i

Percentage Tx credits are available for last 1s/1m/1h/72h New!

MDS Slow Drain Features

• New “show tech-support” flavor

• Contains all the commands necessary to troubleshoot SAN congestion issues

• Best when issued against the entire fabric via DCNM

show tech-support slowdrain New!

Troubleshooting Slow Drain

Troubleshooting Slow Drain

• Classifying Slow Drain Symptoms

• Methodology

• Level by Level Troubleshooting

Troubleshooting Slow Drain: Classifying Slow Drain Symptoms

Levels of Performance Degradation

Level  Host Symptoms               Default Switch Behavior
1      Latency                     Frame queuing
2      SCSI errors/retransmission  Frame dropping
3      Extreme Delay               Links failing/reset

Note: Each level includes all the symptoms of the previous levels

Troubleshooting Slow Drain

• Latency indicates SCSI exchanges are taking longer than normal

• No SCSI errors or retransmissions are noted

• Subtle and difficult to detect

• ISLs and other ports should be checked for low numbers of Tx/Rx remaining credits

• Use new slowport-monitor, OBFL txwait, txwait-history and alerting capabilities

Classifying Slow Drain Symptoms - Level 1: Latency

Troubleshooting Slow Drain

• Once any frame in a SCSI exchange is dropped the exchange will be aborted

• Abort exchanges will be listed in host logs

• Frames are held for a maximum of 500ms prior to dropping as timeouts

• This is the default Congestion Drop value

• Frames can also be dropped as timeouts if no-credit-drop is configured

• Use “show logging onboard starttime <date-time> error-stats”

Classifying Slow Drain Symptoms - Level 2: Retransmission

Troubleshooting Slow Drain

• Typically caused by ports being without credits for 1 second (F ports) or 1.5 seconds (E ports)

• Credit-loss Recovery is invoked

• Links may fail and/or flap

• Typically many timeout drops are also recorded

Classifying Slow Drain Symptoms - Level 3: Extreme Delay

Page 52: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Troubleshooting Slow Drain

• Cisco recommends troubleshooting slow drain in the following order:

Methodology

Level 3: Extreme Delay

Level 2: Retransmission

Level 1: Latency

Page 53: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Troubleshooting Slow Drain

• If Rx congestion then find ports communicating with this port that have Tx congestion

• Zoning defines which devices communicate with this port

• Understand topology

• If port communicating with port showing Rx congestion is FCIP

• Check for TCP retransmits

• Check for overutilization of FCIP

Methodology – Follow Congestion to Source

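[Diagram: an F port and an E port on a switch; one is labeled "Rx Credits: 0 Remaining", the other "Tx Credits: 0 Remaining", with "Congestion" marked between them]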

Page 54: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Troubleshooting Slow Drain

• If Tx congestion found

• If F port then device attached is slow drain device

• If E port then go to adjacent switch and continue troubleshooting

• Continue to track through the fabric until destination F-port is discovered

Methodology - Follow Congestion to Source

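[Diagram: two switches joined by E ports, each with an F port; one port is labeled "Rx Credits: 0 Remaining", another "Tx Credits: 0 Remaining", with "Congestion" marked along the path]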
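The hop-by-hop procedure on this slide can be sketched as a small loop. This is only an illustration of the reasoning: the `Port` record, the `Fabric` dictionary and the switch and interface names are hypothetical and are not part of any MDS or DCNM API. In practice each step is a manual check of the Tx credit counters on the switch in question.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Port:
    mode: str                    # "F" or "E"
    tx_congested: bool           # True if the port is short of Tx credits
    peer: Optional[str] = None   # adjacent switch name (E ports only)


# Hypothetical fabric model: switch name -> {interface name -> Port}
Fabric = Dict[str, Dict[str, Port]]


def follow_congestion(fabric: Fabric, start_switch: str) -> List[Tuple[str, str]]:
    """Follow Tx-congested E ports until a Tx-congested F port is found.

    Returns the (switch, interface) hops taken. If the last hop is an
    F port, the device attached to it is the suspected slow drain device.
    """
    path: List[Tuple[str, str]] = []
    visited = set()
    switch: Optional[str] = start_switch
    while switch is not None and switch not in visited:
        visited.add(switch)
        congested = {n: p for n, p in fabric[switch].items() if p.tx_congested}
        # A Tx-congested F port ends the search.
        f_ports = [n for n, p in congested.items() if p.mode == "F"]
        if f_ports:
            path.append((switch, f_ports[0]))
            return path
        # Otherwise move to the switch behind a Tx-congested E port.
        e_ports = [n for n, p in congested.items() if p.mode == "E" and p.peer]
        if not e_ports:
            return path          # no Tx congestion seen on this switch
        path.append((switch, e_ports[0]))
        switch = fabric[switch][e_ports[0]].peer
    return path
```

The sketch follows only the first congested port it finds on each switch; a real investigation would check every Tx-congested port.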

Page 55: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level 3: Extreme Delay - Troubleshooting

• Supervisor command on all platforms

• Module command also available

• Credit loss recovery events are the most severe slow drain indications

• Check/change cables/SFPs/HBAs

• Show logging onboard error-stats also contains this

Check for credit loss recovery

MDS9710-1# show process creditmon credit-loss-events

Module: 01 Credit Loss Events: YES

----------------------------------------------------

| Interface | Total | Timestamp |

| | Events | |

----------------------------------------------------

| fc1/13 | 11524 | 1. Sat Mar 29 14:21:48 2014 |

| | | 2. Sat Mar 29 14:21:47 2014 |

| | | 3. Sat Mar 29 14:21:46 2014 |

| | | 4. Sat Mar 29 14:21:45 2014 |

| | | 8. Sat Mar 29 14:21:41 2014 |

| | | 9. Sat Mar 29 14:21:40 2014 |

| | |10. Sat Mar 29 14:21:39 2014 |

----------------------------------------------------

Page 56: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level 3: Extreme Delay - Troubleshooting

• Two places to check:

1. Module link-events

2. Logging log

• Both indicate the same thing: Rx congestion

• The problem is normally not with this port but with the port it is switching packets to

• If multiple ports fail at similar times, they are switching to the same port

Check for LR Rcvd B2B

MDS9710-1# slot 1 show port-config internal link-events

*************** Port Config Link Events Log ***************

---- ------ ----- ----- ------

Time PortNo Speed Event Reason

---- ------ ----- ----- ------

...

Jul 28 00:46:39 2012 00670297 fc1/25 --- DOWN LR Rcvd B2B

MDS9710-1# show logging log

%PORT-2-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link

failure)

%PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link

failure Link Reset failed nonempty recv queue)

Page 57: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level 2: Retransmission - Troubleshooting

• TIMEOUT drops are drops for packets that hit either the Congestion Drop or No-Credit-Drop thresholds

• They are normally counted several different ways

• Reference appendix for counter names and definitions

Check for Transmit Frame Drops

MDS9710-1# show hard internal statistics module 1 pktflow dropped

Hardware statistics on module 01:

|------------------------------------------------------------------------|

| Device:Lightning Role:ARB-MUX Mod: 1 |

|------------------------------------------------------------------------|

|------------------------------------------------------------------------|

| Device:F16 Xbar Driver Role:FABRIC Mod: 1 |

|------------------------------------------------------------------------|

|------------------------------------------------------------------------|

| Device:F16 Que Driver Role:QUE Mod: 1 |

|------------------------------------------------------------------------|

|------------------------------------------------------------------------|

| Device:F16 Fwd Driver Role:L2 Mod: 1 |

|------------------------------------------------------------------------|

|------------------------------------------------------------------------|

| Device:F16 Mac Driver Role:FCMAC Mod: 1 |

|------------------------------------------------------------------------|

Instance:1

Cntr Name Value Ports

----- ----- ----- -----

0 F16_TMM_TIMEOUT_STATS_DROP 0000000000088775 13-16 -

1 F16_TMM_PORT_FRM_DROP_CNT 0000000000088775 13 -

2 F16_TMM_TOLB_TIMEOUT_DROP_CNT 0000000000088775 13 -

Page 58: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level 2: Retransmission - Troubleshooting

• Counters are polled every 20 seconds

• When counter value changes it is included

• Several different counters are in error-stats:

• Timeout drops

• Credit loss recovery

• Tx/Rx credit not available(100ms)

• Force timeout on/off

Show logging onboard error-stats

mds9710-2# show logging onboard error-stats

----------------------------

Module: 1

----------------------------

--------------------------------------------------------------------------------

ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC

--------------------------------------------------------------------------------

Interface | | | Time Stamp

Range | Error Stat Counter Name | Count |MM/DD/YY HH:MM:SS

| | |

--------------------------------------------------------------------------------

fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |242618 |04/14/14 12:17:58

fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |124 |04/14/14 12:17:58

fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |124 |04/14/14 12:17:58

fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |201650 |04/14/14 12:17:38

fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/14/14 12:17:38

fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |107 |04/14/14 12:17:38

242618 – 201650 = 40968 timeout drops in the last 20 seconds
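The subtraction in the callout above can be applied to every counter in the output. A minimal sketch, assuming the rows have already been copied out of the `show logging onboard error-stats` output as tuples; the `counter_deltas` helper is illustrative and is not part of NX-OS.

```python
from typing import Dict, List, Tuple

# (interface, counter name, count, timestamp), newest rows first
Sample = Tuple[str, str, int, str]


def counter_deltas(rows: List[Sample]) -> Dict[Tuple[str, str], int]:
    """Return newest-minus-previous count for each (interface, counter)."""
    newest: Dict[Tuple[str, str], int] = {}
    deltas: Dict[Tuple[str, str], int] = {}
    for intf, name, count, _timestamp in rows:
        key = (intf, name)
        if key not in newest:
            newest[key] = count                  # first (latest) sample
        elif key not in deltas:
            deltas[key] = newest[key] - count    # second (previous) sample
    return deltas


rows = [
    ("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT", 242618, "04/14/14 12:17:58"),
    ("fc1/13", "FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO", 124, "04/14/14 12:17:58"),
    ("fc1/13", "FCP_SW_CNTR_CREDIT_LOSS", 124, "04/14/14 12:17:58"),
    ("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT", 201650, "04/14/14 12:17:38"),
    ("fc1/13", "FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO", 108, "04/14/14 12:17:38"),
    ("fc1/13", "FCP_SW_CNTR_CREDIT_LOSS", 107, "04/14/14 12:17:38"),
]
deltas = counter_deltas(rows)
print(deltas[("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT")])  # 40968
```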

Page 59: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level 2: Retransmission - Troubleshooting

• MDS 9710 and 9396S maintain a FIFO list of the last 32 dropped packets

• Display is per instance (8 ports)

• These contain:

• Source FCID (SID)

• Destination FCID(DID)

• RCTL – Routing control (ELS, ABTS, etc.)

• Source Index(SI)

• Destination Index(DI)

• These are not necessarily the slow device! Could be a victim!

Display dropped packet information

module-1# show hardware internal fcmac inst 0 tmm_timeout_stat_buffer

Port Group num: 0 TMM TIMEOUT BUFFERS

---------------------------------------------

TO_RD:22 TO_WR:6 NUM PKTS:32

--------------------------------------------------------------

TMM TIMEOUT Packet :0

CHIPTIME :14227(0x3793) ZERO:0 FCTYPE:0

SID:330040 DID:170040 RCTL:0

TSTMP_VALID:1 HDRTSTMP:14176(0x3760) HDRCTL:6144 SI:12

DI:2 AT:0 PORTNUM:1

TMM TIMEOUT Packet :1

CHIPTIME :14227(0x3793) ZERO:0 FCTYPE:0

SID:330040 DID:170040 RCTL:0

TSTMP_VALID:1 HDRTSTMP:14176(0x3760) HDRCTL:6144 SI:12

DI:2 AT:0 PORTNUM:1

MDS9710-2# show system internal fcfwd idxmap port-to-interface

Port to Interface Table:(All values in hex)

--------------------------------------------------------------------------------

glob| |VL|lcl| if |slot|port| mts | port| flags

idx | if_index | |idx|type| | | node| mode|

-----|--------------------------|--|---|----|----|----|-----|-----|-------------

0| 01000000 fc1/1 | 0| 00| 01 | 00 | 00 | 0102| 08 | 00

1| 01001000 fc1/2 | 0| 01| 01 | 00 | 01 | 0102| 00 | 00

2| 01002000 fc1/3 | 0| 02| 01 | 00 | 02 | 0102| 00 | 00

<snip>

12| 01012000 fc1/13 | 0| 12| 01 | 00 | 12 | 0102| 00 | 00

Actual interface name

Shows packets from fc1/13 to fc1/3 dropped

Page 60: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level 1: Latency - Troubleshooting

• Indicates 100ms increments where Tx B2B credits were 0

• Percentages are relative to 1 second, so 10% is 100ms and 20% is 200ms

Credit Not Available

MDS9513# show system internal snmp credit-not-available

Module: 6 Number of events logged: 6

------------------------------------------------------------------------------------------

Port Threshold Rising/Falling Interval(s) Event Time Type Duration available

----------------------------------------------------------------------------------------------------------

fc6/32 10/0(%) 1 Wed Apr 2 17:23:54 2014 Rising 10%

fc6/32 10/0(%) 1 Wed Apr 2 17:24:39 2014 Falling 0%

fc6/32 10/0(%) 1 Wed Apr 2 17:24:40 2014 Rising 20%

fc6/32 10/0(%) 1 Wed Apr 2 17:25:53 2014 Falling 0%

fc6/32 10/0(%) 1 Wed Apr 2 17:25:54 2014 Rising 20%

100ms Tx Delay

200ms Tx Delay
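The percentage-to-delay conversion used in the callouts above is simple arithmetic. A minimal sketch; the function name is illustrative and the 1-second default mirrors the Interval(s) column in the output.

```python
def credit_unavailable_ms(percent: int, interval_s: int = 1) -> int:
    """Convert a credit-not-available percentage into milliseconds."""
    return percent * interval_s * 1000 // 100


print(credit_unavailable_ms(10))  # 100
print(credit_unavailable_ms(20))  # 200
```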

Page 61: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level 1: Latency - Troubleshooting

• Included in OBFL error-stats

• Tracked in both Rx and Tx directions

• Indicates 100ms intervals where Tx or Rx credit is not available

Credit Not Available – continued

--------------------------------------------------------------------------------

ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC

--------------------------------------------------------------------------------

Interface | | | Time Stamp

Range | Error Stat Counter Name | Count |MM/DD/YY HH:MM:SS

| | |

--------------------------------------------------------------------------------

fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1496855 |04/07/15 22:44:23

fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |217 |04/07/15 22:44:23

fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |19 |04/07/15 22:44:23

fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1486654 |04/07/15 22:44:03

fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/07/15 22:44:03

fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |9 |04/07/15 22:44:03

FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO: credit not available, in 100ms increments. Incremented by 217 - 108 = 109

Credit loss incremented by 19 - 9 = 10 in 20 seconds

Page 62: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level 1: Latency - Troubleshooting

• Data frames are sent using low priority credits

• If Tx B2B credit remaining is low then congestion is toward the adjacent switch

• If Rx B2B credit remaining is low then congestion is in this switch and perhaps other switches

• Can only be done while congestion is in progress

Check ISLs for Lack of Transmit Credits

MDS9710# show interface | include "fc|Belong|low priority|remain" | exclude "description" | exclude "Peer" | include "trunking" next 3

fc1/3 is trunking

500 receive B2B credit remaining

0 transmit B2B credit remaining

0 low priority transmit B2B credit remaining

Page 63: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• Transitions to zero indicate when Tx or Rx B2B credits go to zero even just for an instant of time

• Transmit indicates the adjacent device is withholding credits

• Receive indicates this MDS is withholding credits from the adjacent device

• Look for large incrementing numbers since some devices go to zero normally

Check Transitions to Zero counters

9710-1# show int fc1/13 counters

fc1/13

549317 Transmit B2B credit transitions to zero

2388296 Receive B2B credit transitions to zero

1934443328 2.5us TxWait due to lack of transmit

credits

Percentage Tx credits not available for last

1s/1m/1h/72h: 0%/0%/98%/1%

32 receive B2B credit remaining

17 transmit B2B credit remaining

17 low priority transmit B2B credit remaining

Last clearing of "show interface" counters 01:25:25

Page 64: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• “Prio 3” is class 3

• 000004 is port bitmap in hexadecimal indicating the presence of one or more queued frames

• B’0000 0000 0000 0000 0000 0100’

• GI (Hex) is Global Index (egress port)

Check for Frame Queuing on Ingress Ports

module-1# show hardware internal f16_que inst 0 table iqm-statusmem0

+-------------------------------------------------------------------------------

| IQM: PG0 Status Memory (logical layout) for F16 Que Driver

| Inst 0; port(s) 1-8

|

Note: Only non-zero entries are displayed

Each non-zero bit indicates pending frame in VOQ for that IB

+----------+--------+--------+--------+--------+

| GI (Hex) | Prio 0 | Prio 1 | Prio 2 | Prio 3 |

+----------+--------+--------+--------+--------+

| c | 000000 | 000000 | 000000 | 000004 |

+----------+--------+--------+--------+--------+

rtp-san-33-18-9710-2# show system internal fcfwd idxmap port-to-interface

Port to Interface Table:(All values in hex)

--------------------------------------------------------------------------------

glob| |VL|lcl| if |slot|port| mts | port| flags

idx | if_index | |idx|type| | | node| mode|

-----|--------------------------|--|---|----|----|----|-----|-----|-------------

0| 01000000 fc1/1 | 0| 00| 01 | 00 | 00 | 0102| 00 | 00

1| 01001000 fc1/2 | 0| 01| 01 | 00 | 01 | 0102| 00 | 00

…snip

b| 0100b000 fc1/12 | 0| 0b| 01 | 00 | 0b | 0102| 00 | 00

c| 0100c000 fc1/13 | 0| 0c| 01 | 00 | 0c | 0102| 00 | 00

Bitmap bit positions: Port 3, Port 2, Port 1

Port fc1/3

Egress port fc1/13

Each instance is 8 ports on this LC

Input interface fc1/3 has frame(s) queued for fc1/13
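Decoding the bitmap and the Global Index by hand is error-prone, so the two lookups can be sketched in code. This sketch assumes the lowest bit of the bitmap is the first port of the instance (ports 1-8 for instance 0) and that Global Index n maps to port n+1 of the slot. That matches the idxmap table shown for module 1 but should be confirmed against `show system internal fcfwd idxmap port-to-interface` on the switch. Both function names are illustrative.

```python
from typing import List


def queued_ingress_ports(bitmap_hex: str, first_port: int = 1) -> List[int]:
    """Return the port numbers whose bit is set in an IQM status bitmap.

    Assumes bit 0 (least significant) corresponds to `first_port`.
    """
    value = int(bitmap_hex, 16)
    ports = []
    bit = 0
    while value:
        if value & 1:
            ports.append(first_port + bit)
        value >>= 1
        bit += 1
    return ports


def egress_interface(gi_hex: str, slot: int = 1) -> str:
    """Map a Global Index to an interface name (assumed: GI n -> fc<slot>/<n+1>)."""
    return f"fc{slot}/{int(gi_hex, 16) + 1}"


print(queued_ingress_ports("000004"))  # [3]
print(egress_interface("c"))           # fc1/13
```

With the values from the output above, the result is that ingress port fc1/3 has frames queued for egress port fc1/13.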

Page 65: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• For generation 3:

• slot x show hardware internal up-xbar <0-1> queued-packet-info

• For generation 4:

• slot x show hardware internal que inst <0-3> memory iqm-statusmem0|1

• For generation 5/9396S:

• slot x show hardware internal f16_que inst 0 table iqm-statusmem0

• 9148, 9250i & 9148S – Not available

• Each instance is a defined number of ports and is LC specific

• Issue the command several times and look for GI values that keep recurring. The recurring GI is the slow port.

• Real time (instantaneous)

Check for Frame Queuing on Ingress Ports - continued

Page 66: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• Request-timeouts indicate frames that could not immediately be sent to “Dest Intf” (the slow egress port)

• They do not indicate actual packet drops, only delay

• If Dest Intf is FCIP then there are problems on the FCIP tunnel

• Check for TCP retransmits

• Check for overutilization of FCIP

Check for Arbitration Timeouts

MDS9513# show logging onboard flow-control request-timeout

----------------------------

Module: 9

----------------------------

--------------------------------------------------------------------------------

| Dest | Source |Events| Timestamp | Timestamp |

| Intf | Intf | Count| Earliest | Latest |

--------------------------------------------------------------------------------

|fc1/2 |fc9/24, | 28|Sun Feb 9 00:28:23 2014|Sun Feb 9 00:28:24 2014|

--------------------------------------------------------------------------------

Page 67: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• txwait is a counter that increments every 2.5us when port is at 0 Tx credits and there are frames queued for transmit

• txwait * 2.5 / 1000000 = seconds of time the port was unable to transmit

• Only applies to the following:

• MDS 9500 with generation 4 linecards:

• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)

• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)

• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)

• MDS 9148S 16G Multilayer Fabric Switch

• MDS 9250i Multiservice Fabric Switch

• MDS 9396S 16G Multilayer Fabric Switch

• Others will return zero

txwait New!

Page 68: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• txwait can be seen in the following:

• show interface counters

• Raw value in 2.5us units

• show interface counters

• Percentage Tx credits not available for last 1s/1m/1h/72h

• show process creditmon txwait-history

• 60sec, 60min, 72hour graphs

• show logging onboard txwait

• SNMP fcIfTxWaitCount variable

txwait - continued New!

Page 69: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

mds9710-1# show interface fc1/13 counters | i fc|wait

fc1/13

6252650 2.5us Txwaits due to lack of transmit credits

6252650 * 2.5 / 1000000 = 15.631625 seconds

The above indicates the MDS was unable to transmit on this port for over 15 seconds since the counters were last cleared

txwait - show interface counters New!
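The conversion from raw txwait ticks to seconds shown above can be written as a one-line helper. A minimal sketch; the function and constant names are illustrative.

```python
TXWAIT_TICK_US = 2.5  # each txwait increment represents 2.5 microseconds


def txwait_seconds(ticks: int) -> float:
    """Convert a raw txwait counter value into seconds without Tx credits."""
    return ticks * TXWAIT_TICK_US / 1_000_000


print(txwait_seconds(6252650))
```

Because the counter is cumulative, the result is only meaningful relative to the time since the interface counters were last cleared.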

Page 70: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• Utilizes the underlying txwait counter

txwait - Percentage Tx credits not available for last 1s/1m/1h/72h

MDS9710-1# show interface fc1/13 counters

fc1/13

5 Transmit B2B credit transitions to zero

2 Receive B2B credit transitions to zero

0 2.5us TxWait due to lack of transmit credits

Percentage Tx credits not available for last 1s/1m/1h/72h: 1%/5%/3%/2%

32 receive B2B credit remaining

128 transmit B2B credit remaining

128 low priority transmit B2B credit remaining

New!

Page 71: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

MDS9513# show logging onboard txwait module 4

---------------------------------

Module: 4 txwait count

---------------------------------

Notes:

- Sampling period is 20 seconds

- Only txwait delta >= 100 ms are logged

-----------------------------------------------------------------------------

| Interface | Delta TxWait Time | Congestion | Timestamp |

| | 2.5us ticks | seconds | | |

-----------------------------------------------------------------------------

| fc4/1 | 52927 | 0 | 0% | Wed May 27 13:20:12 2015 |

| fc4/1 | 52926 | 0 | 0% | Wed May 27 13:19:52 2015 |

| fc4/1 | 105854 | 0 | 1% | Wed May 27 13:19:32 2015 |

| fc4/1 | 52926 | 0 | 0% | Wed May 27 13:19:12 2015 |

• Delta values are recorded when they are at least 100ms in the 20 second interval

txwait - show logging onboard txwait New!

Recorded every 20 seconds
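The columns of the OBFL txwait output follow from the delta tick count. The sketch below reproduces the seconds and congestion percentage for two rows above. It assumes both values are truncated to whole numbers, which is inferred from the sample output rather than from documentation. The function names are illustrative.

```python
SAMPLE_PERIOD_S = 20   # OBFL txwait sampling period
TICK_US = 2.5          # one txwait tick is 2.5 microseconds
LOG_THRESHOLD_S = 0.1  # only deltas of at least 100 ms are logged


def txwait_delta_summary(delta_ticks: int):
    """Return (whole seconds, whole percent congestion) for one 20 s sample."""
    seconds = delta_ticks * TICK_US / 1_000_000
    percent = seconds / SAMPLE_PERIOD_S * 100
    return int(seconds), int(percent)   # truncation is an assumption


def is_logged(delta_ticks: int) -> bool:
    """True if the delta meets the 100 ms logging threshold."""
    return delta_ticks * TICK_US / 1_000_000 >= LOG_THRESHOLD_S


print(txwait_delta_summary(52927))   # (0, 0)
print(txwait_delta_summary(105854))  # (0, 1)
```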

Page 72: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• Graphical display of time where Tx credits are not available

• Similar in format to cpu history

• 3 graphs per port

• Last 60 seconds

• Last 60 minutes

• Last 72 hours

• Utilizes the underlying txwait counter

txwait-history

mds9710-1# show process creditmon txwait-history module 1 port 13

TxWait history for port fc1/13:

==============================

697 54 6994

299 18 4780

000000000000000000000000000000000029000290088400000000000000

1000 # ##

900 # ##

800 ## ##

700 ## ##

600 ### ###

500 ### ## ###

400 ### ## ####

300 ### ## ####

200 ### ## ####

100 ### ## ####

0....5....1....1....2....2....3....3....4....4....5....5....6

0 5 0 5 0 5 0 5 0 5 0

Credit Not Available per second (last 60 seconds)

# = TxWait (ms)

New!

Page 73: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• system timeout slowport-monitor <1-500> mode e|f

• Events are captured every 100ms

• Last 10 events per port captured in slowport-monitor-events

• Logging onboard slowport-monitor-events captures more events

• Currently implemented for:

• 9500

• Gen 3 LCs - DS-X9248-48K9 and DS-X92xx-96K9 modules

• Gen 4 LCs - DS-X9232-256K9 and DS-X9248-256K9 modules

• 9700 & 9396S (Gen 5)

• 9250i & 9148S

• Differences exist between Gen3, Gen4 and 9700/9250i/9148S/9396S

slowport-monitor New!

Page 74: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• system timeout slowport-monitor… must be configured

• Events are captured every 100ms

• Last 10 events per port captured in slowport-monitor-events

• Differences exist between Gen3, Gen4 and 9700/9250i/9148S/9396S

show process creditmon slowport-monitor command New!

Page 75: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• Gen3 modules have basic HW capabilities

• In each 100ms interval it can only be determined whether the port was at zero Tx credits for the admin delay period

• The actual amount of time and the number of times in that 100ms cannot be determined

• Recorded when at least one complete event occurred

• No oper delay

Slowport-monitor – Gen3 LCs - DS-X9248-48K9 and DS-X92xx-96K9 modules

mds9513# show process creditmon slowport-monitor-events module 2

Module: 02 Slowport Detected: YES

==================================================================

=======

Interface = fc2/1

--------------------------------------------------------

| admin | slowport | Timestamp |

| delay | detection | |

| (ms) | count | |

--------------------------------------------------------

| 10 | 194 | 1. 04/29/15 17:19:13.345 |

| 10 | 193 | 2. 04/29/15 17:19:13.245 |

| 10 | 192 | 3. 04/29/15 17:19:13.145 |

| 10 | 191 | 4. 04/29/15 17:19:13.045 |

| 10 | 190 | 5. 04/29/15 17:19:12.945 |

| 10 | 189 | 6. 04/29/15 17:19:12.845 |

| 10 | 188 | 7. 04/29/15 17:19:12.745 |

| 10 | 187 | 8. 04/29/15 17:19:12.645 |

| 10 | 186 | 9. 04/29/15 17:19:12.545 |

| 10 | 185 |10. 04/29/15 17:19:12.445 |

--------------------------------------------------------

Only 1 event in last 100ms

100ms intervals

Configured delay 10ms

New!

Page 76: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• Gen4 modules use txwait for slowport-monitor

• Recorded when txwait is >= admin delay within 100ms

• oper delay is cumulative delay

• Txwait is cumulative for the 100ms interval

• 1 x 10ms

• 10 x 1ms

Slowport-monitor – Gen4 LCs - DS-X92xx-256K9

MDS9513# show process creditmon slowport-monitor-events module 4

Module: 04 Slowport Detected: YES

==================================================================

Interface = fc4/1

----------------------------------------------------------------

| admin | slowport | txwait| Timestamp |

| delay | detection | oper | |

| (ms) | count | delay | |

| | | (ms) | |

----------------------------------------------------------------

| 10 | 18 | 16 | 1. 05/21/15 14:39:09.102 |

| 10 | 17 | 56 | 2. 05/21/15 14:39:09.002 |

| 10 | 16 | 59 | 3. 05/21/15 14:39:08.905 |

| 10 | 15 | 10 | 4. 05/21/15 14:38:54.590 |

| 10 | 14 | 41 | 5. 05/21/15 14:38:54.490 |

| 10 | 13 | 80 | 6. 05/21/15 14:38:54.390 |

| 10 | 12 | 37 | 7. 05/21/15 14:38:39.970 |

| 10 | 11 | 56 | 8. 05/21/15 14:38:39.870 |

| 10 | 10 | 34 | 9. 05/21/15 14:38:39.775 |

| 10 | 9 | 29 |10. 05/21/15 14:38:25.430 |

----------------------------------------------------------------

Only 1 event per 100ms

100ms intervals

Configured delay 10ms

New!

Cumulative delay in 100ms

Page 77: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• Gen5/9250i/9148S/9396S have enhanced HW capabilities

• Each 100ms interval the number of times Tx credits remained at 0 for the configured(admin) delay is counted.

• The average operational delay is determined – This is how long the port was at 0 Tx credits

• Recorded when at least one complete event occurred

slowport-monitor – 9700/9250i/9148S/9396S (Gen 5 LCs)

MDS9710-1# show process creditmon slowport-monitor-events

Module: 01 Slowport Detected: YES

==================================================================

=======

Interface = fc1/13

----------------------------------------------------------------

| admin | slowport | oper | Timestamp |

| delay | detection | delay | |

| (ms) | count | (ms) | |

----------------------------------------------------------------

| 5 | 1300 | 20 | 1. 04/01/15 23:03:38.823 |

| 5 | 1296 | 19 | 2. 04/01/15 23:03:38.724 |

| 5 | 1291 | 19 | 3. 04/01/15 23:03:38.623 |

| 5 | 1256 | 19 |10. 04/01/15 23:03:37.923 |

----------------------------------------------------------------


Configured delay(5ms)

Actual average delay

4 events in last 100ms

New!

Note: Oper delay is limited by the no-credit-drop threshold

Page 78: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

slowport-monitor – Comparison

[Figure: three timelines of Tx credits against time (0 to 100ms, one polling interval) with "system timeout slowport-monitor 5" configured. Each timeline shows the same 2 zero-credit events in the 100ms interval, one of 15ms and one of 30ms.]

• 9500 Gen3: 0 Tx credits for >= 5ms in 100ms. 1 event logged

• 9500 Gen4: total time 45ms >= 5ms in 100ms. 1 event logged

• Gen5/9250i/9148S/9396S: oper delay (15ms + 30ms) / 2 = 22ms. 2 events logged
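The three behaviors in the comparison can be modeled in a few lines. This is a simplified sketch of the logic the slides describe, not the hardware implementation. It takes the list of zero-Tx-credit durations seen in one 100ms interval and returns (events logged, oper delay in ms). Truncating the Gen5 average to a whole millisecond is an assumption based on the 22ms shown in the slide.

```python
from typing import List, Optional, Tuple

Result = Tuple[int, Optional[int]]  # (events logged, oper delay in ms or None)


def gen3(zero_credit_ms: List[int], admin_delay_ms: int) -> Result:
    """Gen3: only a yes/no indication per interval, no oper delay."""
    hit = any(d >= admin_delay_ms for d in zero_credit_ms)
    return (1 if hit else 0, None)


def gen4(zero_credit_ms: List[int], admin_delay_ms: int) -> Result:
    """Gen4: one event if the cumulative txwait reaches the admin delay."""
    total = sum(zero_credit_ms)
    return (1, total) if total >= admin_delay_ms else (0, None)


def gen5(zero_credit_ms: List[int], admin_delay_ms: int) -> Result:
    """Gen5/9250i/9148S/9396S: count each qualifying event, average the delay."""
    events = [d for d in zero_credit_ms if d >= admin_delay_ms]
    if not events:
        return (0, None)
    return (len(events), sum(events) // len(events))


interval = [15, 30]        # two zero-credit periods within one 100 ms interval
print(gen3(interval, 5))   # (1, None)
print(gen4(interval, 5))   # (1, 45)
print(gen5(interval, 5))   # (2, 22)
```

One consequence of the Gen4 model is that ten separate 1ms stalls trigger a 10ms admin delay just as a single 10ms stall does, while Gen3 and Gen5 would report nothing for the ten short stalls.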

Page 79: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

More events available via logging onboard

MDS9710-1# show logging onboard slowport-monitor-events

---------------------------------

Module: 1 slowport-monitor-events

---------------------------------

--------------------------------------------------------------------------

| admin | slowport | oper | Timestamp | Interface

| delay | detection | delay | |

| (ms) | count | (ms) | |

--------------------------------------------------------------------------

| 20 | 49 | 489 | 05/11/15 21:04:46.779 | fc1/13

| 20 | 48 | 489 | 05/11/15 21:04:46.272 | fc1/13

| 20 | 47 | 489 | 05/11/15 21:04:45.779 | fc1/13

| 20 | 46 | 489 | 05/11/15 21:04:45.272 | fc1/13

show logging onboard slowport-monitor command New!

Gen5/9250i/9148S/9396S

Page 80: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

Linecard | Maximum events per 100ms interval | Actual delay measured? | Notes

DS-X9248-48K9 (gen3), DS-X9224-96K9 (gen3), DS-X9248-96K9 (gen3) | 1 | No. Just an indication if admin delay was reached. Actual delay could be much more | If actual delay hits slowport-monitor admin delay then an indication is made. That indication is checked every 100ms and if true then raise event

DS-X9232-256K9 (gen4), DS-X9248-256K9 (gen4) | 1 | Yes. Actual delay is total delay per 100ms interval | If total delay (sum of all individual delays) in 100ms interval hits slowport-monitor admin delay then raise event

DS-X9448-768K9 (gen5), MDS 9396S (gen5), MDS 9148S, MDS 9250i | 100 | Yes. Average delay for all events in 100ms interval | If actual delay hits slowport-monitor admin delay and port recovered (received credit) then raise event. These are checked every 100ms interval.

Slowport-monitor – Comparison

Page 81: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

Level-1 Troubleshooting: Latency

• Contains all the commands available that pertain to slow drain

• Contains “context” commands to understand the FC topology

• Contains name server commands to identify devices

• Contains active zonesets to understand device relationships

• Most useful when run from DCNM and gathered for the entire fabric

• SAN Client -> Tools -> Run CLI Commands…

show tech-support slowdrain New!

Page 82: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

SAN Congestion! BRKSAN-3446

DCNM Slow Drain Analysis

Page 83: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

DCNM Slow Drain Analysis

• DCNM 7.1(1) added Slow Drain Analysis

• Used for pulling fabric wide slow drain counters for a defined period of time

• Useful for ongoing slow drain problems

• Accessed from the Web Client Health -> Diagnostics -> Slow drain Analysis

New!

Page 84: SAN Congestion! Understanding, Troubleshooting, · PDF fileSAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric Edward Mazurek Technical Lead Data Center Storage

DCNM Slow Drain Analysis

Starting

Slow Drain Analysis


DCNM Slow Drain Analysis

3 steps to initiate collection of slow drain counters for a fabric

Step 1: Select fabric

Step 2: Choose duration

Step 3: Start collection


DCNM Slow Drain Analysis

While underway…

Almost finished


DCNM Slow Drain Analysis

Finished

Select job


DCNM Slow Drain Analysis

Completed Report

509 credit loss events in 10 minutes!

Only show rows with non-zero counters

Filter results as needed


DCNM Slow Drain Analysis

Counter explanations - help

Hover over a counter for additional information


DCNM Slow Drain Analysis

Show non-zero data rows only

Only show rows with non-zero counters

Only 3 rows with non-zero counters


DCNM Slow Drain Analysis

Filtering

Filter results as needed


Slow Drain Alerting and Mitigation


Slow Drain Alerting and Mitigation

• Port Monitor

• Congestion counters

• Portguard

• Adjust Congestion Drop Threshold

• Setting the No Credit Drop Threshold


Slow Drain Alerting and Mitigation

• Port-monitor allows monitoring of several counters relating to slow drain

• credit-loss-reco Credit loss recovery counter

• lr-rx The number of link resets received by the fc-port

• lr-tx Link resets transmitted by the fc-port

• timeout-discards Timeout discards counter

• tx-credit-not-available Credit not available counter(in 100ms increments)

• tx-discards Tx discards counter

• slowport-count Number of slowport events

• slowport-oper-delay Slowport operational delay

• txwait Amount of time at 0 Tx credits and packets queued

Port-monitor alerting

Note: There are other counters that are valuable and should also be considered for inclusion in monitoring but are not part of slow drain


Slow Drain Alerting and Mitigation

• Number of times credit loss recovery was initiated because a port was at 0 Tx credits for 1 second (F ports) or 1.5 seconds (E ports)

• Most severe indication of congestion

• Normally other counters like timeout-discards will also increment

• Configure as a simple delta counter with a low value

• Applies to all types of switches and linecards

Port-monitor counter - credit-loss-reco


Slow Drain Alerting and Mitigation

• Number of times a Link Reset(LR) was received(lr-rx)

• Number of times a Link Reset(LR) was transmitted(lr-tx)

• Similar to credit-loss-reco counter

• May increment for other reasons besides congestion

• Normally other counters like timeout-discards will also increment

• Configure as a simple delta counter with a low value

• Applies to all types of switches and linecards

Port-monitor counter - lr-rx and lr-tx


Slow Drain Alerting and Mitigation

• Number of packets dropped due to reaching the congestion-drop (timeout) threshold

• When packets are dropped SCSI errors will result at the hosts and targets

• Configure as a simple delta counter with a low value

• Applies to all types of switches and linecards

Port-monitor counter - timeout-discards


Slow Drain Alerting and Mitigation

• Indicates 100ms intervals of a port at 0 Tx credits

• rising-threshold is configured as a percentage of the polling interval (1 second)

• Examples:

• counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4

• 10 is 10% of 1 second or 100ms

• counter tx-credit-not-available poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4

• 20 is 20% of 1 second or 200ms

• Only multiples of 10 (10, 20, 30, etc.) should be configured

• Applies to all types of switches and linecards

Port-monitor counter - tx-credit-not-available
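The percentage arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the fixed 1-second poll interval is an assumption taken from the two examples above.

```python
POLL_INTERVAL_MS = 1000  # assumption: poll-interval 1 (second), as in the slide's examples


def zero_credit_ms(rising_threshold_pct: int) -> int:
    """Translate a tx-credit-not-available rising-threshold (a percentage of
    the 1-second poll interval) into milliseconds spent at 0 Tx credits."""
    # The counter advances in 100ms steps, so only multiples of 10% are meaningful.
    if not 0 < rising_threshold_pct <= 100 or rising_threshold_pct % 10:
        raise ValueError("rising-threshold should be a multiple of 10 between 10 and 100")
    return POLL_INTERVAL_MS * rising_threshold_pct // 100


print(zero_credit_ms(10))  # 100
print(zero_credit_ms(20))  # 200
```

A value such as 15 is rejected because it falls between two 100ms steps of the counter.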


Slow Drain Alerting and Mitigation

• The number of packets dropped at egress for a variety of reasons.

• This counter includes timeout-discards as well

• Configure as a simple delta counter with a low value

• Applies to all types of switches and linecards

Port-monitor counter - tx-discards


Slow Drain Alerting and Mitigation

• Counts the number of times the slowport-monitor threshold was reached

• Only applies to MDS 9500 with generation 3 linecards

• 1/2/4/8 Gbps 24-Port Fibre Channel switching module (DS-X9224-96K9)

• 1/2/4/8 Gbps 48-Port Fibre Channel switching module (DS-X9248-96K9)

• 1/2/4/8 Gbps 4/44-Port Fibre Channel switching module (DS-X9248-48K9)

• Only counts a maximum of once per 100ms interval (10 per second)

• Indicates 0 Tx credits for at least the slowport-monitor interval

• Slowport-monitor must be configured for this to alert

• Refer to gen3 slowport-monitor section for more info

Port-monitor counter - slowport-count


Slow Drain Alerting and Mitigation

• Alerts on slowport operational(actual) delay

• Only applies to the following

• MDS 9500 with generation 4 linecards

• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)

• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)

• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)

• MDS 9148S 16G Multilayer Fabric Switch

• MDS 9250i Multiservice Fabric Switch

• MDS 9396S 16G Multilayer Fabric Switch

• Alerts on operational(actual) delay not on the admin(configured) delay

Port-monitor counter - slowport-oper-delay


Slow Drain Alerting and Mitigation

• Configured as an absolute counter

• Slowport-monitor must be configured for this to alert!

• Refer to Gen4 slowport-monitor section for more info

• Refer to Gen5/9250i/9148S/9396S slowport-monitor section for more info

Port-monitor counter - slowport-oper-delay - continued


Slow Drain Alerting and Mitigation

• Measures time port is at 0 Tx credits and frames are queued to send

• Only applies to the following

• MDS 9500 with generation 4 linecards

• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)

• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)

• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)

• MDS 9148S 16G Multilayer Fabric Switch

• MDS 9250i Multiservice Fabric Switch

• MDS 9396S 16G Multilayer Fabric Switch

• Configured as a percentage of the polling interval

Port-monitor counter - txwait
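As a rough illustration of the percentage form, the sketch below converts a raw TxWait delta into a percentage of the poll interval. It assumes the raw counter advances in 2.5 microsecond units, which is how the command reference later in this deck describes the Txwait value in show interface counters; the function name is made up for this example.

```python
TXWAIT_TICK_US = 2.5  # assumed unit of the raw TxWait counter (2.5 microseconds)


def txwait_percent(delta_ticks: int, poll_interval_s: int = 1) -> float:
    """Percentage of the poll interval spent at 0 Tx credits with frames queued."""
    waited_us = delta_ticks * TXWAIT_TICK_US
    return waited_us * 100 / (poll_interval_s * 1_000_000)


# 80,000 ticks * 2.5us = 200ms, which is 20% of a 1-second poll interval
print(txwait_percent(80_000))  # 20.0
```

With the sample policies in this section (rising-threshold 20), a port in that state would just reach the txwait threshold.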


Slow Drain Alerting and Mitigation

Linecard                 slowport-count   slowport-oper-delay   txwait
----------------------   --------------   -------------------   ------
DS-X9248-48K9 (gen3)     X
DS-X9224-96K9 (gen3)     X
DS-X9248-96K9 (gen3)     X
DS-X9232-256K9 (gen4)                     X                     X
DS-X9248-256K9 (gen4)                     X                     X
DS-X9448-768K9 (gen5)                     X                     X
MDS 9148S                                 X                     X
MDS 9250i                                 X                     X
MDS 9396S                                 X                     X

Port-monitor slowport counters - comparison
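When building policies for a mixed fabric, the comparison can be kept as a small lookup table. The sketch below is one possible encoding in Python; the dictionary and function names are hypothetical, and the contents simply restate the rows of the comparison.

```python
GEN3_COUNTERS = frozenset({"slowport-count"})
GEN4_AND_LATER_COUNTERS = frozenset({"slowport-oper-delay", "txwait"})

# Platform -> slowport-related port-monitor counters, as listed in the comparison
SLOWPORT_COUNTERS = {
    "DS-X9248-48K9": GEN3_COUNTERS,
    "DS-X9224-96K9": GEN3_COUNTERS,
    "DS-X9248-96K9": GEN3_COUNTERS,
    "DS-X9232-256K9": GEN4_AND_LATER_COUNTERS,
    "DS-X9248-256K9": GEN4_AND_LATER_COUNTERS,
    "DS-X9448-768K9": GEN4_AND_LATER_COUNTERS,
    "MDS 9148S": GEN4_AND_LATER_COUNTERS,
    "MDS 9250i": GEN4_AND_LATER_COUNTERS,
    "MDS 9396S": GEN4_AND_LATER_COUNTERS,
}


def usable_counters(platform: str) -> frozenset:
    """Return the slowport counters listed for a platform (empty if not listed)."""
    return SLOWPORT_COUNTERS.get(platform, frozenset())


print(sorted(usable_counters("MDS 9148S")))  # ['slowport-oper-delay', 'txwait']
```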


Slow Drain Alerting and Mitigation

• Port-monitor allows separate policies

• F, FL ports(access)

• E, TL ports(trunks)

• Both F ports and E ports

• Only one policy type per port can be active at a time

• Note: port-type access includes F port connections to NPV switches that can carry several logins

• Note: NP ports are not currently monitored

Port-monitor alerting

MDS9513(config-port-monitor)# port-type ?

access-port Configure port-monitoring for access ports

all Configure port-monitoring for all ports

trunks Configure port-monitoring for trunk ports


Slow Drain Alerting and Mitigation

• counter <name> poll-interval <interval> {delta | absolute} rising-threshold <rthresh> event <id> falling-threshold <fthresh> event <id> [portguard {errordisable | flap}]

• poll-interval – Seconds - How often should this counter be checked?

• delta – Compare the current value with the value at the previous poll interval

• absolute – Match the actual value

• rising-threshold – How much the counter must increase in this poll interval to trigger

• event – Indicates severity of alert - info, warning, error, etc.

• falling-threshold – The value the counter (or its delta) must fall to or below in a poll interval to reset the alert

• portguard – Optional – Action to take when rising-threshold is reached

• errordisable – Place the port in the error-disabled state. Requires a manual shut/no shut to re-activate

• flap – shut/no shut the port

Port-monitor alerting - continued
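The interaction of delta/absolute sampling with the rising and falling thresholds can be sketched as follows. This is a simplified model written for illustration, assuming RMON-style hysteresis (a rising event fires once and is re-armed only after the value falls to the falling-threshold); the class and function names are hypothetical and the switch's exact behavior may differ.

```python
from dataclasses import dataclass


@dataclass
class CounterPolicy:
    rising: int
    falling: int
    absolute: bool = False  # False models the "delta" keyword


def evaluate(policy: CounterPolicy, samples: list) -> list:
    """Return (poll_number, 'rising' | 'falling') events for a series of
    counter readings taken once per poll interval."""
    if policy.absolute:
        values = list(samples)
    else:
        # delta: compare each reading with the reading at the previous poll
        values = [cur - prev for prev, cur in zip(samples, samples[1:])]

    events = []
    armed = True  # a rising event can only fire while armed
    for poll, value in enumerate(values):
        if armed and value >= policy.rising:
            events.append((poll, "rising"))
            armed = False
        elif not armed and value <= policy.falling:
            events.append((poll, "falling"))
            armed = True
    return events


# lr-rx style policy: delta, rising-threshold 5, falling-threshold 1
readings = [0, 2, 9, 16, 16, 20]  # cumulative counter, deltas are 2, 7, 7, 0, 4
print(evaluate(CounterPolicy(rising=5, falling=1), readings))
# [(1, 'rising'), (3, 'falling')]
```

Note that the second delta of 7 does not raise a second alert, because the falling-threshold has not yet been crossed.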


Slow Drain Alerting and Mitigation

• Monitor-counter command determines which counters are active in a policy

Port-monitor alerting – continued

rtp-san-33-18-9710-1(config-port-monitor)# monitor counter ?

credit-loss-reco Configure credit loss recovery counter

err-pkt-from-port Configure err-pkt-from-port counter

err-pkt-from-xbar Configure err-pkt-from-xbar counter

err-pkt-to-xbar Configure err-pkt-to-xbar counter

invalid-crc Configure invalid-crc counter

invalid-words Configure invalid-words counter

link-loss Configure link-failure counter

lr-rx Configure the number of link resets received by the fc-port

lr-tx Configure the number of link resets transmitted by the fc-port

rx-datarate Configure rx performance counter

signal-loss Configure signal-loss counter

slowport-count Configure slow port sub-100ms counter

slowport-oper-delay Configure slow port operation delay

sync-loss Configure sync-loss counter

timeout-discards Configure timeout discards counter

tx-credit-not-available Configure credit not available counter

tx-datarate Configure tx performance counter

tx-discards Configure tx discards counter

txwait Configure tx total wait counter


Slow Drain Alerting and Mitigation

• Event indicates severity in alert

• 1 – Fatal

• 2 – Critical

• 3 – Error

• 4 – Warning

• 5 - Informational

Port-monitor alerting – RMON event severities

mds9513(config-port-monitor)# show rmon events

Event 1 is active, owned by PMON@FATAL

Description is FATAL(1)

Event firing causes log and trap to community public, last fired never

Event 2 is active, owned by PMON@CRITICAL

Description is CRITICAL(2)

Event firing causes log and trap to community public, last fired never

Event 3 is active, owned by PMON@ERROR

Description is ERROR(3)

Event firing causes log and trap to community public, last fired never

Event 4 is active, owned by PMON@WARNING

Description is WARNING(4)

Event firing causes log and trap to community public, last fired 2014/02/21-17:13:11

Event 5 is active, owned by PMON@INFO

Description is INFORMATION(5)

Event firing causes log and trap to community public, last fired 2014/03/08-08:25:19


Slow Drain Alerting and Mitigation

Port-monitor alerting – Example

port-monitor name AllPorts

port-type all

no monitor counter link-loss

no monitor counter sync-loss

no monitor counter signal-loss

no monitor counter invalid-words

no monitor counter invalid-crc

counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4

counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4

counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4

counter timeout-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4

counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4

counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4

no monitor counter rx-datarate

no monitor counter tx-datarate

no monitor counter err-pkt-from-port

no monitor counter err-pkt-to-xbar

no monitor counter err-pkt-from-xbar

counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4

counter slowport-oper-delay poll-interval 1 absolute rising-threshold 50 event 4 falling-threshold 0 event 4

counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4

Policy applies to Access(F) and Trunk(E) ports

These counters are not monitored

Note: The above monitors 9 slow drain counters and does not monitor 10 others
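The counts in the note can be checked mechanically. The following Python sketch parses the AllPorts policy text shown above and separates the counters it monitors from those it disables; the parsing function is hypothetical and only understands the two line shapes used in this example.

```python
POLICY = """\
port-monitor name AllPorts
port-type all
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4
counter slowport-oper-delay poll-interval 1 absolute rising-threshold 50 event 4 falling-threshold 0 event 4
counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4
"""


def summarize(policy_text: str):
    """Split a port-monitor policy into monitored and unmonitored counter names."""
    monitored, unmonitored = [], []
    for line in policy_text.splitlines():
        words = line.split()
        if words[:3] == ["no", "monitor", "counter"]:
            unmonitored.append(words[3])
        elif words[:1] == ["counter"]:
            monitored.append(words[1])
    return monitored, unmonitored


monitored, unmonitored = summarize(POLICY)
print(len(monitored), len(unmonitored))  # 9 10
```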


Slow Drain Alerting and Mitigation

MDS9710-1# show port-monitor AllPorts

Policy Name : AllPorts

Admin status : Not Active

Oper status : Not Active

Port type : All Ports

---------------------------------------------------------------------------------------------------------

Counter Threshold Interval Rising Threshold event Falling Threshold event PMON Portguard

------- --------- -------- ---------------- ----- ------------------ ----- --------------

TX Discards Delta 60 50 4 10 4 Not enabled

LR RX Delta 60 5 4 1 4 Not enabled

LR TX Delta 60 5 4 1 4 Not enabled

Timeout Discards Delta 60 50 4 10 4 Not enabled

Credit Loss Reco Delta 60 1 4 0 4 Not enabled

TX Credit Not Available Delta 1 10% 4 0% 4 Not enabled

slowport-count Delta 1 5 4 0 4 Not enabled

slowport-oper-delay Absolute 1 50ms 4 0ms 4 Not enabled

txwait Delta 1 20% 4 0% 4 Not enabled

----------------------------------------------------------------------------------------------------------

Port-monitor alerting – activation and output


Slow Drain Alerting and Mitigation

• SNMP traps that are sent with the following object identifiers (OIDs):

• fcIfTxWtAvgBBCreditTransitionToZero: 1.3.6.1.4.1.9.9.289.1.2.1.1.38

• Note: There is no OID in the Rx direction.

• fcIfCreditLoss: 1.3.6.1.4.1.9.9.289.1.2.1.1.37

• fcIfLinkResetOuts: 1.3.6.1.4.1.9.9.289.1.2.1.1.10

• fcIfLinkResetIns: 1.3.6.1.4.1.9.9.289.1.2.1.1.9

• fcIfTimeOutDiscards: 1.3.6.1.4.1.9.9.289.1.2.1.1.35

• fcIfOutDiscards: 1.3.6.1.4.1.9.9.289.1.2.1.1.36

• fcIfSlowportCount: 1.3.6.1.4.1.9.9.289.1.2.1.1.44

• fcIfSlowportOperDelay: 1.3.6.1.4.1.9.9.289.1.2.1.1.45

• fcIfTxWaitCount: 1.3.6.1.4.1.9.9.289.1.2.1.1.15

SNMP trap OIDs sent by port-monitor
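On the trap receiver side, the OIDs above can be mapped back to counter names with a simple lookup. The sketch below is a minimal Python example; the function name is hypothetical, and the assumption that a received OID may carry a trailing instance suffix (for example an interface index) is mine, not something stated on the slide.

```python
# OID -> object name, copied from the list on this slide
SLOW_DRAIN_OIDS = {
    "1.3.6.1.4.1.9.9.289.1.2.1.1.38": "fcIfTxWtAvgBBCreditTransitionToZero",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.37": "fcIfCreditLoss",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.10": "fcIfLinkResetOuts",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.9": "fcIfLinkResetIns",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.35": "fcIfTimeOutDiscards",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.36": "fcIfOutDiscards",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.44": "fcIfSlowportCount",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.45": "fcIfSlowportOperDelay",
    "1.3.6.1.4.1.9.9.289.1.2.1.1.15": "fcIfTxWaitCount",
}


def name_for_oid(oid: str):
    """Return the object name for an OID, ignoring a leading dot and any
    trailing instance suffix; return None for unknown OIDs."""
    oid = oid.lstrip(".")
    for base, name in SLOW_DRAIN_OIDS.items():
        # Match the base exactly or followed by ".<instance>" so that
        # ...1.1.3 is not mistaken for ...1.1.38
        if oid == base or oid.startswith(base + "."):
            return name
    return None


print(name_for_oid(".1.3.6.1.4.1.9.9.289.1.2.1.1.37.16789504"))  # fcIfCreditLoss
```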


Slow Drain Alerting and Mitigation

• Adding portguard to errordisable or flap a port can help the switch automatically mitigate problems

• Should be done to access(F) ports only

• Use separate access(F) and trunk(E) policies

• Applies to delta counters only

Port-monitor portguard


Slow Drain Alerting and Mitigation

• The following adds portguard to timeout-discards and credit-loss-reco and adjusts the rising-threshold up a bit:

Port-monitor portguard - continued

port-monitor name AccessPorts

port-type access

no monitor counter link-loss

no monitor counter sync-loss

no monitor counter signal-loss

no monitor counter invalid-words

no monitor counter invalid-crc

counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4

counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4

counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4

counter timeout-discards poll-interval 60 delta rising-threshold 60 event 4 falling-threshold 10 event 4 portguard errordisable

counter credit-loss-reco poll-interval 60 delta rising-threshold 4 event 4 falling-threshold 0 event 4 portguard errordisable

counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4

no monitor counter rx-datarate

no monitor counter tx-datarate

no monitor counter err-pkt-from-port

no monitor counter err-pkt-to-xbar

no monitor counter err-pkt-from-xbar

counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4

counter slowport-oper-delay poll-interval 1 absolute rising-threshold 50 event 4 falling-threshold 0 event 4

counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4

Error disable the port when 60 timeout-discards happen in 60 seconds

Error disable the port when 4 credit loss recovery events occur in 60 seconds

Access(F) port policy


Slow Drain Alerting and Mitigation

Port-monitor portguard – trunk (E) port policy

port-monitor name ISLPorts

port-type trunks

no monitor counter link-loss

no monitor counter sync-loss

no monitor counter signal-loss

no monitor counter invalid-words

no monitor counter invalid-crc

counter tx-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4

counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4

counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4

counter timeout-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4

counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4

counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4

no monitor counter rx-datarate

no monitor counter tx-datarate

no monitor counter err-pkt-from-port

no monitor counter err-pkt-to-xbar

no monitor counter err-pkt-from-xbar

counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4

counter slowport-oper-delay poll-interval 1 absolute rising-threshold 80 event 4 falling-threshold 0 event 4

counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4

Trunk(E) port policy


Slow Drain Alerting and Mitigation

mds9710-1# show port-monitor active

Policy Name : ISLPorts

Admin status : Active

Oper status : Active

Port type : All Trunk Ports

---------------------------------------------------------------------------------------------------------

Counter Threshold Interval Rising Threshold event Falling Threshold event PMON Portguard

------- --------- -------- ---------------- ----- ------------------ ----- --------------

TX Discards Delta 60 100 4 10 4 Not enabled

LR RX Delta 60 5 4 1 4 Not enabled

LR TX Delta 60 5 4 1 4 Not enabled

Timeout Discards Delta 60 100 4 10 4 Not enabled

Credit Loss Reco Delta 60 1 4 0 4 Not enabled

TX Credit Not Available Delta 1 10% 4 0% 4 Not enabled

slowport-count Delta 1 5 4 0 4 Not enabled

slowport-oper-delay Absolute 1 50ms 4 0ms 4 Not enabled

txwait Delta 1 20% 4 0% 4 Not enabled

----------------------------------------------------------------------------------------------------------

Continued next slide…

Port-monitor portguard – when activated


Slow Drain Alerting and Mitigation

…continued from previous slide

Policy Name : AccessPorts

Admin status : Active

Oper status : Active

Port type : All Access Ports

---------------------------------------------------------------------------------------------------------

Counter Threshold Interval Rising Threshold event Falling Threshold event PMON Portguard

------- --------- -------- ---------------- ----- ------------------ ----- --------------

TX Discards Delta 60 50 4 10 4 Not enabled

LR RX Delta 60 5 4 1 4 Not enabled

LR TX Delta 60 5 4 1 4 Not enabled

Timeout Discards Delta 60 60 4 10 4 Error Disable

Credit Loss Reco Delta 60 4 4 0 4 Error Disable

TX Credit Not Available Delta 1 10% 4 0% 4 Not enabled

slowport-count Delta 1 5 4 0 4 Not enabled

slowport-oper-delay Absolute 1 50ms 4 0ms 4 Not enabled

Tx wait Delta 1 20% 4 0% 4 Not enabled

----------------------------------------------------------------------------------------------------------

Port-monitor portguard – when activated - continued


Slow Drain Alerting and Mitigation

DCNM event log


Slow Drain Alerting and Mitigation

• Lowering congestion drop timeout value from 500ms to 200ms

• Frees up ingress buffer space quicker

• Can be set differently on F and E ports

• Congestion timeout for mode F should be smaller than(or equal to) mode E.

• Global command for switch

• Recommended for F ports

Adjust Congestion Drop Threshold Lower

system timeout congestion-drop 200 mode f

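[Figure: egress queue timeline from 0 sec to 200ms with one credit and four queued frames. The timestamp of each frame is checked, and frames that have reached the congestion-drop timeout are dropped from the queue.]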
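The timestamp check described above can be modeled in a few lines. This is a toy Python sketch, not switch code: the queue representation and function name are invented for illustration, and the 500ms default reflects the default congestion-drop value mentioned on this slide.

```python
def drop_expired(queue, now_ms, congestion_drop_ms=500):
    """Split queued frames into (kept, dropped) by comparing each frame's age
    with the congestion-drop timeout.

    queue is a list of (frame_id, arrival_ms) tuples.
    """
    kept, dropped = [], []
    for frame_id, arrival_ms in queue:
        age_ms = now_ms - arrival_ms
        if age_ms >= congestion_drop_ms:
            dropped.append(frame_id)
        else:
            kept.append(frame_id)
    return kept, dropped


queue = [("a", 0), ("b", 150), ("c", 350)]

# Default 500ms: at t=400ms nothing has aged out yet
print(drop_expired(queue, now_ms=400))                          # (['a', 'b', 'c'], [])
# Lowered to 200ms: the two oldest frames are dropped sooner, freeing buffers
print(drop_expired(queue, now_ms=400, congestion_drop_ms=200))  # (['c'], ['a', 'b'])
```

The second call shows why lowering the value frees ingress buffer space more quickly: frames destined for the slow port leave the queue earlier.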


Setting the No Credit Drop Threshold

• No-credit-drop causes frames to be dropped immediately if the destination port is at 0 Tx credits for the time specified

• Should be used in conjunction with lowering congestion-drop threshold

• Recommended for F ports

• Can drastically improve ISL performance under slow drain conditions

• xxx_FORCE_TIMEOUT_ON/OFF counter

• By default no-credit-drop is not enabled

system timeout no-credit-drop 200 mode f
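The difference from congestion-drop is that the decision depends on how long the egress port has been at 0 Tx credits, not on the age of each frame. A deliberately simplified Python model, with a hypothetical function name and return values, might look like this:

```python
def egress_action(zero_credit_ms, no_credit_drop_ms=None):
    """Decide what happens to frames destined for a port that has been at
    0 Tx credits for zero_credit_ms.

    no_credit_drop_ms=None models the default, where no-credit-drop is not enabled.
    """
    if no_credit_drop_ms is not None and zero_credit_ms >= no_credit_drop_ms:
        return "drop"   # frames for this port are dropped immediately
    return "queue"      # frames wait (and may later hit congestion-drop)


print(egress_action(300))                         # queue
print(egress_action(300, no_credit_drop_ms=200))  # drop
print(egress_action(150, no_credit_drop_ms=200))  # queue
```

The 300ms and 200ms figures mirror the R_RDY delay and timeout values used in the test results that follow.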


Test results – congestion-timeout/no-credit-timeout

Topology

[Figure: test topology. Labels shown: ISL; Slow Drain Device; test ports Ag104/1, Ag104/2, Ag104/3 and Ag104/4; switch ports Fc1/3, Fc1/13 and Fc1/14 on each side; link speeds of 4Gbps (four links) and 8Gbps (one link).]


Test results – congestion-timeout/no-credit-timeout

104/4 R_RDY delay 300ms – Default timeout settings – frames/sec


Test results – congestion-timeout/no-credit-timeout

104/4 R_RDY delay 300ms – Congestion-drop/no-credit-drop 200ms

Almost 3X improvement on the flow!


Summary


Summary

• FC B2B flow control helps reduce packet loss

• Devices with problems can cause congestion problems in the fabric

• This congestion can propagate through the fabric affecting unrelated devices

• MDS has several features designed to alert on, identify, and mitigate slow drain

• Classify your problem and follow the troubleshooting guidelines


Troubleshooting Summary

• Proactive

Configure slowport-monitor

Configure congestion-drop and no-credit-drop

Configure port-monitor policies

• Reactive

Use several show logging onboard commands with starttime option to display events

Where do you start?


Troubleshooting Summary

• Configure slowport-monitor @ 10-25ms for both E & F ports

system timeout slowport-monitor 10 mode e

system timeout slowport-monitor 10 mode f

• Configure congestion-drop on F ports

system timeout congestion-drop 200 mode f

Don’t go below 200ms!

• Configure no-credit-drop on F ports

system timeout no-credit-drop <ms> mode f

200ms – safe, 100ms – aggressive, 50ms – Very aggressive

• Configure port-monitor policy(s)

Use samples included in port-monitor section

Proactive
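The proactive recommendations can be turned into a quick sanity check for planned values. The Python sketch below is illustrative only: the function name is hypothetical, and treating values between the three no-credit-drop points on the slide (200, 100, 50) as belonging to the more aggressive bucket is my own assumption.

```python
def review_timeouts(slowport_ms, congestion_drop_f_ms, no_credit_drop_f_ms):
    """Compare planned F-port timeout values with the guidance on this slide.

    Returns (no_credit_drop_posture, warnings).
    """
    warnings = []
    if not 10 <= slowport_ms <= 25:
        warnings.append("slowport-monitor is outside the suggested 10-25ms range")
    if congestion_drop_f_ms < 200:
        warnings.append("congestion-drop is below 200ms")

    # Assumption: in-between values are bucketed with the more aggressive label
    if no_credit_drop_f_ms >= 200:
        posture = "safe"
    elif no_credit_drop_f_ms >= 100:
        posture = "aggressive"
    else:
        posture = "very aggressive"
    return posture, warnings


print(review_timeouts(10, 200, 200))  # ('safe', [])
print(review_timeouts(10, 200, 100))  # ('aggressive', [])
```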


Troubleshooting Summary

• show logging onboard <starttime mm/dd/yy-00:00:00> error-stats

Includes timestamped indications of all three levels of congestion

Credit-loss-recovery

timeout-discards

Latency 100ms Tx & Rx average wait

• show logging onboard <starttime mm/dd/yy-00:00:00> slowport-monitor-events

Includes timestamped slowport-monitor-events

Mostly for level 1 (latency) issues

• show logging onboard <starttime mm/dd/yy-00:00:00> txwait

Includes timestamped interfaces that had >=100ms delay in 20 seconds

Mostly for level 1 (latency) issues

Reactive
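When gathering these outputs from several switches, the starttime argument has to be formatted as mm/dd/yy-hh:mm:ss. The helper below is a small Python sketch for building the three commands listed above; the function name is hypothetical, and the optional module argument follows the syntax given in the command reference of this deck.

```python
from datetime import datetime

OBFL_KEYWORDS = ("error-stats", "slowport-monitor-events", "txwait")


def obfl_command(start: datetime, keyword: str, module=None) -> str:
    """Build a 'show logging onboard' command with a starttime filter."""
    if keyword not in OBFL_KEYWORDS:
        raise ValueError(f"unsupported keyword: {keyword}")
    parts = ["show logging onboard"]
    if module is not None:
        parts.append(f"module {module}")
    # starttime uses the mm/dd/yy-hh:mm:ss format
    parts.append("starttime " + start.strftime("%m/%d/%y-%H:%M:%S"))
    parts.append(keyword)
    return " ".join(parts)


print(obfl_command(datetime(2014, 3, 8, 8, 25, 19), "error-stats"))
# show logging onboard starttime 03/08/14-08:25:19 error-stats
```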


Additional References

• Slow Drain Device Detection and Congestion Avoidance Whitepaper

• http://www.cisco.com/c/en/us/products/collateral/storage-networking/mds-9700-series-multilayer-directors/white_paper_c11-729444.html

• Generation 4 (gen4) Linecard Slow Drain Counters and Commands Troubleshooting

• http://www.cisco.com/c/en/us/support/docs/storage-networking/mds-9509-multilayer-director/116098-trouble-gen4-00.html

• MDS 9148 Slow Drain Counters and Commands

• http://www.cisco.com/c/en/us/support/docs/storage-networking/mds-9100-series-multilayer-fabric-switches/116401-trouble-mds9148-00.html


MDS Command Reference

Command
    Function

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…]
    Displays basic interface information including Tx/Rx credit values agreed to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters
    Displays more counters pertaining to the interface including: Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details
    Displays more counters pertaining to the interface but regarding slow drain, not much different than the above. This will only work for a specified interface or range of interfaces. To display all fc interfaces use the show interface detailed-counters command. Displays one or more interfaces.

show interface detailed-counters
    Displays more counters for all interfaces but regarding slow drain, not much different than the show interface counters command.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats
    Display OBFL error-stats. This contains many counters related to slow drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait
    Display txwait delta values recorded when greater than 99ms per 20 second interval

MDS 9500
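The table above mentions two TxWait figures: the show interface counters value reported in 2.5us units, and the OBFL txwait log, which records deltas greater than 99ms per 20 second interval. The following is a rough sketch of how the two relate. The function names are hypothetical, and treating the OBFL delta as being in the same 2.5us ticks is an assumption made for illustration.

```python
TICK_US = 2.5            # show interface counters reports TxWait in 2.5 us units
OBFL_INTERVAL_S = 20     # OBFL txwait sampling interval named in the table
OBFL_THRESHOLD_MS = 100  # "greater than 99ms" means 100 ms or more


def ticks_to_ms(ticks: int) -> float:
    """Convert a TxWait tick count to milliseconds."""
    return ticks * TICK_US / 1000.0


def percent_of_interval(delta_ticks: int, interval_s: int = OBFL_INTERVAL_S) -> float:
    """Share of the interval (in percent) that the port spent unable to transmit."""
    return 100.0 * ticks_to_ms(delta_ticks) / (interval_s * 1000.0)


def obfl_would_record(delta_ticks: int) -> bool:
    """True when the delta reaches the OBFL txwait logging threshold (assumed tick units)."""
    return ticks_to_ms(delta_ticks) >= OBFL_THRESHOLD_MS


if __name__ == "__main__":
    delta = 40_000  # hypothetical delta between two samples
    print(ticks_to_ms(delta), percent_of_interval(delta), obfl_would_record(delta))
```

Under these assumptions the logging threshold corresponds to a port being blocked for 0.5% of a 20 second interval, which is why the txwait log is most useful for latency-level congestion.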


MDS Command Reference: MDS 9500 - continued

Command
    Function

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events
    Display OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface

show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x]
    Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.

show hardware internal statistics [module x] [device all|fcmac]
    Displays statistical information for ports which include errors as well

show hardware internal statistics module x pktflow dropped
    Displays packet drop counters

show hardware internal errors [module x]
    Displays error information for ports. Error indications include frame drops/discards for various reasons including timeout discards.

slot x show port-config internal link-events
    Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.

show process creditmon credit-loss-events [module x]
    Display last 10 credit loss events per interface

show process creditmon slowport-monitor-events [module x]
    Display last 10 slowport-monitor events per interface. System timeout slowport-monitor must be configured
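Because the slowport-monitor commands return nothing until the feature is configured, it can help to generate the configuration and verification commands together. This is a sketch only: the syntax follows the form this deck gives elsewhere (system timeout slowport-monitor <ms> mode e|f), the helper name is hypothetical, and the 10 ms value is an arbitrary example rather than a recommendation.

```python
from typing import List, Sequence

# Display commands listed in this deck for slowport-monitor events.
VERIFY_COMMANDS = (
    "show process creditmon slowport-monitor-events",
    "show logging onboard slowport-monitor-events",
)


def slowport_monitor_config(timeout_ms: int, modes: Sequence[str] = ("e", "f")) -> List[str]:
    """Return one config line per port mode (hypothetical helper, not an NX-OS API)."""
    if timeout_ms <= 0:
        raise ValueError("timeout must be a positive number of milliseconds")
    return [f"system timeout slowport-monitor {timeout_ms} mode {m}" for m in modes]


if __name__ == "__main__":
    for line in slowport_monitor_config(10):  # 10 ms is an assumed example value
        print(line)
    for line in VERIFY_COMMANDS:
        print(line)
```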


MDS Command Reference: MDS 9500 - continued

Command
    Function

show process creditmon txwait-history [module x] [port x]
    Display 60 second, 60 minute, 72 hour histogram graphs. Only valid for generation 4 linecards

show system internal snmp credit-not-available
    Displays instances of the 100ms tx-credit-not-available

slot x show hardware internal up-xbar <0-1> queued-packet-info
    Displays information indicating packets that are momentarily queued. For generation 3 linecards only

slot x show hardware internal que inst <0-3> memory iqm-statusmem0|1
    Displays information indicating packets that are momentarily queued. For generation 4 linecards only


MDS Command Reference: MDS 9700 / MDS 9396S

Command
    Function

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…]
    Displays basic interface information including Tx/Rx credit values agreed to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters
    Displays more counters pertaining to the interface including: Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details
    Displays more counters pertaining to the interface but regarding slow drain, not much different than the above. This will only work for a specified interface or range of interfaces. To display all fc interfaces use the show interface detailed-counters command. Displays one or more interfaces.

show interface detailed-counters
    Displays more counters for all interfaces but regarding slow drain, not much different than the show interface counters command.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats
    Display OBFL error-stats. This contains many counters related to slow drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait
    Display txwait delta values recorded when greater than 99ms per 20 second interval


MDS Command Reference: MDS 9700 / MDS 9396S - continued

Command
    Function

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events
    Display OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface

show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x]
    Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.

show hardware internal statistics [module x] [device all|fcmac]
    Displays statistical information for ports which include errors as well

show hardware internal statistics [module x|module-all] pktflow dropped
    Displays packet drop counters. Note: if “module x” or [module-all] is omitted then only the counters for the supervisors are displayed. This is probably not what you want.

show hardware internal errors [module x|module-all]
    Displays error information for ports. Error indications include frame drops/discards for various reasons including timeout discards. Note: if “module x” or [module-all] is omitted then only the counters for the supervisors are displayed. This is probably not what you want.

slot x show port-config internal link-events
    Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.

show process creditmon credit-loss-events [module x]
    Display last 10 credit loss events per interface

show process creditmon slowport-monitor-events [module x]
    Display last 10 slowport-monitor events per interface. System timeout slowport-monitor must be configured


MDS Command Reference: MDS 9700 / MDS 9396S - continued

Command
    Function

show process creditmon txwait-history [module x] [port x]
    Display 60 second, 60 minute, 72 hour histogram graphs. Only valid for generation 4 linecards

show system internal snmp credit-not-available
    Displays instances of the 100ms tx-credit-not-available

slot x show hardware internal fcmac inst [0-5 | 0-11] tmm_timeout_stat_buffer
    Displays information indicating packets dropped due to timeouts

slot x show hardware internal f16_que inst [0-5 | 0-11] table iqm-statusmem0|1
    Displays information indicating packets that are momentarily queued.


MDS Command Reference: MDS 9148

Command
    Function

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…]
    Displays basic interface information including Tx/Rx credit values agreed to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters
    Displays more counters pertaining to the interface including: Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details
    Displays more counters pertaining to the interface but regarding slow drain, not much different than the above. This will only work for a specified interface or range of interfaces. To display all fc interfaces use the show interface detailed-counters command. Displays one or more interfaces.

show interface detailed-counters
    Displays more counters for all interfaces but regarding slow drain, not much different than the show interface counters command.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats
    Display OBFL error-stats. This contains many counters related to slow drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use.

show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x]
    Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.


MDS Command Reference: MDS 9148 - continued

Command
    Function

show hardware internal statistics all
    Displays statistical information for ports which include errors as well

slot 1 show hardware internal errors
    Displays error information for ports. Error indications include frame drops/discards for various reasons including timeout discards.

show hardware internal packet-flow dropped
    Display counts of packets dropped

show hardware internal packet-dropped-reason
    Displays counters of packets dropped and the counter names (reasons) for each

slot x show port-config internal link-events
    Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.

show process creditmon credit-loss-events [module x]
    Display last 10 credit loss events per interface

show system internal snmp credit-not-available
    Displays instances of the 100ms tx-credit-not-available


MDS Command Reference: MDS 9250i

Command
    Function

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…]
    Displays basic interface information including Tx/Rx credit values agreed to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters
    Displays more counters pertaining to the interface including: Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details
    Displays more counters pertaining to the interface but regarding slow drain, not much different than the above. This will only work for a specified interface or range of interfaces. To display all fc interfaces use the show interface detailed-counters command. Displays one or more interfaces.

show interface detailed-counters
    Displays more counters for all interfaces but regarding slow drain, not much different than the show interface counters command.

show interface fcx/y counters details
    Displays more counters pertaining to the interface but regarding slow drain, not much different than the above.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats
    Display OBFL error-stats. This contains many counters related to slow drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use. This command requires a single interface or interface range.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait
    Display txwait delta values recorded when greater than 99ms per 20 second interval


MDS Command Reference: MDS 9250i - continued

Command
    Function

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events
    Display OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface

show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x]
    Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.

slot 1 show hardware internal statistics device all
    Displays statistical information for ports which include errors as well

slot 1 show hardware internal errors
    Displays error information for ports. Error indications include frame drops/discards for various reasons including timeout discards.

slot 1 show port-config internal link-events
    Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.

show process creditmon credit-loss-events [module x]
    Display last 10 credit loss events per interface

show process creditmon slowport-monitor-events
    Display last 10 slowport-monitor events per interface. System timeout slowport-monitor must be configured

show process creditmon txwait-history [module x] [port x]
    Display 60 second, 60 minute, 72 hour histogram graphs. Only valid for generation 4 linecards

show system internal snmp credit-not-available
    Displays instances of the 100ms tx-credit-not-available

show hardware internal packet-flow dropped
    Display counts of packets dropped


MDS Command Reference: MDS 9148S

Command
    Function

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…]
    Displays basic interface information including Tx/Rx credit values agreed to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters
    Displays more counters pertaining to the interface including: Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays one or more interfaces.

show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details
    Displays more counters pertaining to the interface but regarding slow drain, not much different than the above. This will only work for a specified interface or range of interfaces. To display all fc interfaces use the show interface detailed-counters command. Displays one or more interfaces.

show interface detailed-counters
    Displays more counters for all interfaces but regarding slow drain, not much different than the show interface counters command.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats
    Display OBFL error-stats. This contains many counters related to slow drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout, etc. Often the first command to use.

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait
    Display txwait delta values recorded when greater than 99ms per 20 second interval


MDS Command Reference: MDS 9148S - continued

Command
    Function

show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events
    Display OBFL slowport-monitor-events. This is similar to show process creditmon slowport-monitor-events but will likely contain more than 10 events per interface

show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x]
    Display OBFL arbitration timeouts. Note these are not packet drops. These likely indicate the destination interface listed is congested. The source interface will retry the arbitration request.

slot 1 show hardware internal statistics device all
    Displays statistical information for ports which include errors as well

slot 1 show hardware internal errors
    Displays error information for ports. Error indications include frame drops/discards for various reasons including timeout discards.

slot 1 show port-config internal link-events
    Linecard command to display the link-events log. This is a concise event log for interfaces going up and down.

show process creditmon credit-loss-events [module x]
    Display last 10 credit loss events per interface

show process creditmon slowport-monitor-events
    Display last 10 slowport-monitor events per interface. System timeout slowport-monitor must be configured

show process creditmon txwait-history [module x] [port x]
    Display 60 second, 60 minute, 72 hour histogram graphs. Only valid for generation 4 linecards

show system internal snmp credit-not-available
    Displays instances of the 100ms tx-credit-not-available

show hardware internal packet-flow dropped
    Display counts of packets dropped


Slow drain counters and descriptions

For the following MDS switches:

9500 – Gen2/3/4 linecards

9700 – Gen 5 linecard

9148 – 8G 48 port Fabric switch

9250i – Multiservice Fabric Switch

9148S – 16G 48 port Fabric switch

9396S – 16G 96 port fabric switch

Table 1 – Counters indicating delay only

Table 2 – Counters indicating frame drops

Table 3 – Counters indicating action on or for an interface

Table 4 – Counters representing interrupts

Table 5 – SNMP variables

Superscripts indicate linecard generation or switch type. See the

list located after Table 5.


Slow drain counters and descriptions
Table 1 - Counters indicating delay only

Counter Name:
    FCP_CNTR_RCM_CH0_LACK_OF_CREDIT [2]
    AK_FCP_CNTR_RCM_CH0_LACK_OF_CREDIT [3]
    THB_RCM_RCP0_RBBZ_CH0 [4, note1]
    F16_RCM_RCP0_RBBZ_CH0 [5]
    FCP_CNTR_RCM_RBBZ_CH0 [48]
    VIP_RCM_RBBZ_CH0_CNT [50i, 48S]

Description:
    Total count of transitions to zero for Rx B2B credits on ch0; these transitions typically indicate that the switch is applying back pressure to the attached device because of perceived congestion, and this perceived congestion can be the result of a lack of Tx B2B credits being returned on an interface over which this device is communicating

    There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.

    Also shown in the output of show interface counters:
        xxxx receive B2B credit transitions from zero
    or
        xxxx Receive B2B credit transitions to zero
    In the above “from” was changed to “to” via this bug:
    CSCug35184 show interface counters - transitions of rx BB credit to zero state

Commands:
    Sup:
        show hardware internal statistics all [2, 3, 4, 48, note2]
        show hardware internal statistics device all [5, note2]
        None [48s, 50i, note3]
    Linecard:
        slot x show hardware internal statistics [2, 3, 48]
        slot x show hardware internal fc-mac port x error-statistic [2, 3]
        slot x show hardware internal statistics device fcmac all [4]
        slot x show hardware internal statistics device fcmac|all [5, 48s, 50i]

Additional Info:
    Note1: CSCts28865 B2B credit 0 transitions incorrect for generation 4 linecards. Integrated in NX-OS 5.2(2)
    Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac
    Note3: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S


Slow drain counters and descriptions
Table 1 - Counters indicating delay only (continued)

Counter Name:
    FCP_CNTR_QMM_CH0_LACK_OF_TRANSMIT_CREDIT [2]
    AK_FCP_CNTR_QMM_CH0_LACK_OF_TRANSMIT_CREDIT [3]
    THB_TMM_PORT_TBBZ_CH0 [4, note1]
    F16_RCM_RCP0_TBBZ_CH0 [5]
    FCP_CNTR_TMM_TBBZ_CH0 [48]
    FCP_CNTR_TMM_TBBZ_CH1 [48]
    VIP_TMM_TBBZ_CH0_CNT [50i, 48S]
    VIP_TMM_TBBZ_CH1_CNT [50i, 48S]

Description:
    Total count of transitions to zero for Tx B2B credits on ch0 or ch1; these transitions are typically the result of the attached device's withholding of R_Rdy primitive from the switch due to congestion in that device.

    There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.

    Also shown in the output of show interface counters:
        xxxx transmit B2B credit transitions from zero
    or
        xxxx Transmit B2B credit transitions to zero
    In the above “from” was changed to “to” via this bug:
    CSCug35184 show interface counters - transitions of rx BB credit to zero state

Commands:
    Sup:
        show hardware internal statistics all [2, 3, 4, 48]
        show hardware internal statistics device all [5]
        None [48s, 50i, note3]
    Linecard:
        slot x show hardware internal statistics [2, 3, 48]
        slot x show hardware internal fc-mac port x error-statistic [2, 3]
        slot x show hardware internal statistics device fcmac all [4]
        slot x show hardware internal statistics device fcmac|all [5, 48s, 50i]

Additional Info:
    Note1: CSCts28865 B2B credit 0 transitions incorrect for generation 4 linecards
    Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac
    Note3: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S


Slow drain counters and descriptions
Table 1 - Counters indicating delay only (continued)

Counter Name:
    None [2, 3]
    THB_TMM_PORT_TWAIT_CNT [4]
    F16_TMM_PORT_TWAIT_CNT [5]
    None [48]
    VIP_TMM_TXWAIT_CH0_CNT [50i, 48S]
    VIP_TMM_TXWAIT_CH1_CNT [50i, 48S]

Description:
    Packet is available to send, but no credit is available.
    Gen4/Gen5: increments every clock cycle (cycle = 2.353 ns at 425 MHz)
    9250i/9148S: increments every clock cycle (cycle = 2 ns at 500 MHz)
    Must multiply by the number of ports in the port-group to get actual time.
    To calculate actual time:
        Twait * clock cycle time * ports in port_group

Commands:
    Sup: [note3]
        None [5, note1]
        None [48s, 50i, note2]
    Linecard:
        slot x show hardware internal statistics device fcmac|all [5, 48s, 50i]

Additional Info:
    Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac
    Note2: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S
    Note3: See Table 5 SNMP variable fcIfTxWaitCount
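The TxWait conversion described in this table row (counter value times clock cycle time times ports in the port-group) can be sketched as below. The cycle times are the ones stated on the slide. The function name, the platform keys, and the example port-group size of 4 are assumptions for illustration only, so check the actual port-group size for the hardware in question.

```python
# Clock cycle time in nanoseconds, as stated on the slide.
CYCLE_NS = {
    "gen4": 2.353,   # 425 MHz
    "gen5": 2.353,   # 425 MHz
    "9250i": 2.0,    # 500 MHz
    "9148s": 2.0,    # 500 MHz
}


def twait_seconds(count: int, platform: str, ports_in_group: int) -> float:
    """Convert a raw hardware TxWait counter value to seconds without credit."""
    if ports_in_group < 1:
        raise ValueError("ports_in_group must be at least 1")
    return count * CYCLE_NS[platform] * ports_in_group / 1e9


if __name__ == "__main__":
    # Hypothetical reading: 125,000,000 cycles on a 9250i with 4 ports per group.
    print(twait_seconds(125_000_000, "9250i", 4))
```

Comparing the result against the wall-clock time between two counter readings gives the fraction of that period during which frames were waiting for credit.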


Slow drain counters and descriptions
Table 1 - Counters indicating delay only (continued)

Counter Name:
    FCP_CNTR_RX_WT_AVG_B2B_ZERO [2, 48]
    AK_FCP_CNTR_RX_WT_AVG_B2B_ZERO [3]
    FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO [4]
    FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO [48, note1]
    FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO [5, 50i, note2]
    FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO [48S]

Description:
    Count of the number of times an interface was at zero Rx B2B credits for 100 ms; this status typically indicates that the switch is withholding R_Rdy primitive to the device attached on that interface due to congestion in the path to devices with which it is communicating

    Always incremented by the software creditmon process.

Commands:
    OBFL:
        show logging onboard error-stats [2, 3, 4, 5, 48, 48s, 50i]
    Sup Hardware internal errors:
        show hardware internal errors all|module x [2, 3, 4]
        None [5, note5]
        None [48s, 50i, note4]
    Sup Hardware internal statistics:
        show hardware internal statistics all [2, 3, 48]
        None [4, 5, note3]
        None [48, 48s, 50i, note4]
    Linecard Hardware internal statistics:
        slot x show hardware internal statistics [2, 3, 48]
        slot x show hardware internal fc-mac port x error-statistic [2, 3]
        slot x show hardware internal errors [4, 5, 48, 48s, 50i]
        slot x show hardware internal statistics device fcmac all port x [4, 5]

Additional Info:
    Note1: MDS 9148 added support for this counter in NX-OS 5.2(6)
    Note2: Gen5 and 9250i do not increment in NX-OS 6.2(1), 6.2(5) and 6.2(7). CSCui27981 FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO not incrementing on DS-X9448-768K9. Integrated in: 6.2(9)
    Note3: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
    Note4: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open


Slow drain counters and descriptions
Table 1 - Counters indicating delay only (continued)

Counter Name:
    FCP_CNTR_TX_WT_AVG_B2B_ZERO [2]
    AK_FCP_CNTR_TX_WT_AVG_B2B_ZERO [3]
    FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO [4]
    FCP_CNTR_TX_WT_AVG_B2B_ZERO [48, note1, note2]
    FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO [5, 50i, 48s]

Description:
    Count of the number of times that an interface was at zero Tx B2B credits for 100 ms. This status typically indicates congestion at the device attached on that interface.

    Incremented by the creditmon software process on MDS 9500 and 9148. Consequently, it could indicate an interval between 100ms and 199ms.

    In NX-OS 6.2(1) through 6.2(7) on the 9710 and NX-OS 6.2(5) through 6.2(7) on the 9250i, this was incremented based on the F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H and VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_RAISING interrupts. Consequently, it was incremented only once, when the HW interrupt occurred, and not at each 100ms interval as in prior releases.

    MDS 9700, 9250i and 9148S: In NX-OS 6.2(9) these are once again incremented by the software creditmon process. They will once again increment each 100ms interval where the port remains at 0 Tx credits.

Commands:
    OBFL:
        show logging onboard error-stats [2, 3, 4, 5, 48, 48s, 50i]
    Sup Hardware internal errors:
        show hardware internal errors all|module x [2, 3, 4]
        None [5, note5]
        None [48s, 50i, note4]
    Sup Hardware internal statistics:
        show hardware internal statistics all [2, 3, 48]
        None [4, 5, note3]
        None [48, 48s, 50i, note4]
    Linecard Hardware internal statistics:
        slot x show hardware internal statistics [2, 3, 48]
        slot x show hardware internal fc-mac port x error-statistic [2, 3]
        slot x show hardware internal errors [4, 5, 48, 48s, 50i]
        slot x show hardware internal statistics device fcmac all port x [4, 5]

Additional Info:
    Note1: MDS 9148 added support for this counter in NX-OS 5.2(6)
    Note2: CSCud93587 MDS9148 OBFL doesn't contain FCP_CNTR_TX_WT_AVG_B2B_ZERO. Integrated in: Unresolved
    Note3: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
    Note4: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open
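Since the software creditmon process counts in 100ms steps, a single increment of the TX_WT_AVG_B2B_ZERO counter can correspond to anything from 100ms to 199ms at zero Tx credits, as the description above notes. The sketch below extends that to several increments. It assumes the increments came from consecutive polling intervals, which the counter itself does not guarantee, and the function name is hypothetical.

```python
from typing import Tuple

POLL_MS = 100  # creditmon increments the counter per 100 ms at zero Tx credits


def zero_credit_bounds_ms(increments: int) -> Tuple[int, int]:
    """Lower and upper bound (ms) on time at zero credits.

    Assumes all increments belong to one uninterrupted zero-credit period.
    """
    if increments < 1:
        raise ValueError("need at least one counter increment")
    lower = increments * POLL_MS
    upper = lower + POLL_MS - 1
    return lower, upper


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, zero_credit_bounds_ms(n))
```

If the increments were spread across separate congestion episodes, the lower bound still holds for the total, but each episode may have been close to the 100ms minimum.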


Slow drain counters and descriptions
Table 1 - Counters indicating delay only (continued)

Counter Name:
    RI12_CP_CNT_RESEND_MSG_DROP [2, 3]
    FAL_RI0_CP_CNT_RESEND_MSG_DROP [4]

Description:
    These are not packet drops. Only the request resend message to the arbiter was dropped. This can be the case when the original request was finally serviced, so the follow-up message was dropped. It can indicate some minor congestion of the egress port, so the request could not be granted immediately.

    This is counted against the ingress port. It probably indicates some congestion on an egress port.

    Check show logging onboard flow-control request-timeout - You might see corresponding entries.

Commands:
    Sup:
        show hardware internal errors all|module x
    OBFL:
        show logging onboard error-stats

Counter Name:
    F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT [5, note1, note3]

Description:
    Count of times port was at zero Tx credits for the stuck port timeout value.

    In NX-OS 6.2(1) through 6.2(7) on the 9710, this was used for credit loss recovery, so it was set to 1s (F port)/1.5s (E port).

    MDS 9700, 9250i and 9148S: In NX-OS 6.2(9) the software creditmon process once again detects credit loss recovery and the stuck force timeout is used for “system timeout no-credit-drop”. Defaults to 500ms with no action taken (no packets are dropped when it is reached).

    Needs to be configured via:
        System timeout no-credit-drop <ms> mode e|f

    This counter will increment even if “system timeout no-credit-drop” is not configured since it defaults to 500ms. If no-credit-drop is not configured then no action is taken and it simply indicates the port was at zero Tx credits for 500ms. [Note3]

    This is similar to the viper counter:
        VIP_TMM_STK_PRT_TO_TRANSITION_CHx_CNT

Commands:
    Sup:
        None [Note2]
    Linecard:
        Slot x show hardware internal statistics device all
        slot x show hardware internal errors

Additional Info:
    Note1: Might falsely increment during port flap: CSCus70632 F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT increments during port flap
    Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
    Note3: CSCut27271 Stuck port threshold not reset to default when removing no-credit-drop. Integrated in: NX-OS 6.2(13)
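The stuck-port counter behaviour described above for NX-OS 6.2(9) and later (the counter increments at the default 500ms even when no-credit-drop is not configured, but frames are dropped only when it is configured) can be modelled with a small sketch. This is an illustrative model of the text, not the switch's implementation, and the function and parameter names are hypothetical.

```python
from typing import Optional, Tuple

DEFAULT_STUCK_PORT_MS = 500  # default threshold stated on the slide


def stuck_port_outcome(
    zero_credit_ms: int, no_credit_drop_ms: Optional[int] = None
) -> Tuple[bool, bool]:
    """Return (counter_increments, frames_dropped) for one zero-credit period.

    no_credit_drop_ms is None when 'system timeout no-credit-drop' is not configured.
    """
    threshold = DEFAULT_STUCK_PORT_MS if no_credit_drop_ms is None else no_credit_drop_ms
    counter_increments = zero_credit_ms >= threshold
    # Frames are dropped only when the feature is explicitly configured.
    frames_dropped = counter_increments and no_credit_drop_ms is not None
    return counter_increments, frames_dropped


if __name__ == "__main__":
    print(stuck_port_outcome(600))        # not configured: counted, nothing dropped
    print(stuck_port_outcome(600, 200))   # configured at a hypothetical 200 ms
```

In other words, a nonzero counter on its own shows that a port sat at zero Tx credits for the threshold period; whether frames were discarded depends on the configuration.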


Slow drain counters and descriptions
Table 1 - Counters indicating delay only (continued)

Counter Name:
    F16_TMM_PORT_CWAIT_FORCE_TIMEOUT_H_L_CNT [5]

Description:
    Count of times a credit was received after the slow port timeout threshold had been triggered.

Commands:
    Sup:
        None [Note1]
    Linecard:
        Slot 1 show hardware internal statistics device all|fcmac
        slot x show hardware internal errors

Additional Info:
    Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open


Slow drain counters and descriptions Table 1 - Counters indicating delay only

Table 1 - continued

VIP_TMM_STK_PRT_TO_TRANSITION_CH0_CNT48S,50i,note2

VIP_TMM_STK_PRT_TO_TRANSITION_CH1_CNT48S,50i,note2

Count of times port was at zero Tx credits for the stuck port timeout value.

- channel 0 (high priority queue)

- channel 1 (low priority queue)

NX-OS 6.2(5) through 6.2(7) on the 9250i this was used for credit loss

recovery so was set to 1s(F port)/1.5s(E port).

In NX-OS 6.2(9) the software creditmon process once again handles credit loss recovery, and the stuck force timeout is used for “system timeout no-credit-drop”. It defaults to 500ms with no action taken (no packets are dropped when it is reached).

Needs to be configured via:

System timeout no-credit-drop <ms> mode e|f

This counter will increment even if “system timeout no-credit-drop” is not

configured since it defaults to 500ms. If no-credit-drop is not configured then

no action is taken and it simply indicates the port was at zero Tx credits for

500ms. Note3

This is similar to the Gen5 counter

F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT

Sup:

NoneNote1

Linecard:

slot 1 show hardware internal statistics device all|fcmac

slot 1 show hardware internal errors

Note1: CSCus85931 Need show

hardware internal errors & stats

cmds on MDS 9250i 9148 9148S

Integrated in: open

Note2: CSCut27271 Stuck port

threshold not reset to default when

removing no-credit-drop

Integrated in: open


Slow drain counters and descriptions Table 1 - Counters indicating delay only

Table 1 - end

VIP_TMM_SLO_PRT_TO_TRANSITION_CH0_CNT48S,50i

VIP_TMM_SLO_PRT_TO_TRANSITION_CH1_CNT48S,50i

Count of times port was at zero Tx credits for the slow port timeout value.

- channel 0 (high priority queue)

- channel 1 (low priority queue)

NX-OS 6.2(5) through 6.2(7) on the 9250i this was used to increment the

FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter.

Consequently, this only occurred once when the HW interrupt occurred and

not each 100ms interval like in prior instances.

In NX-OS 6.2(9) this is used for the slowport-monitor feature.

Needs to be configured via:

System timeout slowport-monitor <ms> mode e|f

Slowport-monitor events can be displayed via:

show process creditmon slowport-monitor-events

show logging onboard slowport-monitor-events

This is similar to the Gen5 counter;

F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H

Sup:

NoneNote1

Linecard:

show hardware internal statistics device all|fcmac

Note1: CSCus85931 Need show

hardware internal errors & stats

cmds on MDS 9250i 9148 9148S

Integrated in: open


Slow drain counters and descriptions Table 2 - Counters indicating frame drops

Table 2

Counter Name Description Commands Additional Info

None2

None3

THB_TMM_PORT_FRM_DROP_CNT4 ,note 1

F16_TMM_PORT_FRM_DROP_CNT5

FCP_CNTR_TMM_NORMAL_DROP48

VIP_TMM_NORMAL_DROP_CNT50i, 48S, note2

VIP_TMM_TOTAL_DROP_CNT50i, 48S, note2

Number of frames dropped in tolb_path or np path by the Transmit Memory

Manager(TMM); these drops include all types of packet drops: timeout,

offline, abort drops, dummy frame drops at egress, etc.

These counters are the aggregate counters for all the underlying counters.

OBFL:

Show logging onboard error-stats2,3,4,5,48,48s,50i

Sup Hardware internal errors/statistics:

show hardware internal errors all|module x2,3,4

show hardware internal statistics all48

None48s,50i,note3

Sup packet-dropped-reason:

show hardware internal packet-dropped-reason mod 48,48S,50i

Linecard Hardware internal errors

show hardware internal fc-mac port x error-statistic2,3

show hardware internal errors4,5,48,48s,50i

Note 1: Bugs:

CSCud77292 Gen 4 linecards do

not increment output discards on

interface statistics

Integrated into NX-OS 5.2(8c) and

6.2(1)

Note 2: It is not normal for drops to

occur so this counter's name is

misleading. The following bug

renamed

VIP_TMM_NORMAL_DROP_CNT to

VIP_TMM_TOTAL_DROP_CNT since

the drops included in this counter

are not necessarily normal.

CSCus60322 Add

VIP_TMM_TO_CNT and

VIP_TMM_TO_DROP_CNT to

packet-flow dropped

Integrated in NX-OS 6.2(13)


Slow drain counters and descriptions Table 2 - Counters indicating frame drops

Table 2 - continued

FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES2

AK_FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES3

THB_TMM_TOLB_TIMEOUT_DROP_CNT4

F16_TMM_TOLB_TIMEOUT_DROP_CNT5

FCP_CNTR_TMM_TIMEOUT_DROP48

VIP_TMM_TO_DROP_CNT50i, 48S,note2,note3

Timeout drops at egress due to frames hitting the congestion drop threshold

Congestion drop threshold is set via the following command and is on at 500ms

by default on all “modes” (port types):

system timeout congestion-drop mode e|f

OBFL:

Show logging onboard error-stats2,3,4,5,48,48s,50i

Show logging onboard flow-control timeout-drops 2,3,4,5,48,48s,50i,note1

Sup Hardware internal errors

show hardware internal errors all|module x2,3,4

show hardware internal statistics all48

show hardware internal statistics all5,note4

None48s,50i,note5

Sup packet-dropped-reason

show hardware internal packet-dropped-reason mod 2,3,48,48S,50i

Linecard Hardware internal errors

slot x show hardware internal statistics2,3,48

show hardware internal fc-mac port x error-statistic2,3

show hardware internal errors2,3,4,5,48,48s,50i

show hardware internal statistics device fcmac all4

show hardware internal statistics device all|fcmac2,3,5,48,48s,50i

Note1: These do not appear until

NX-OS 6.2(9)

Note2: These are included in

VIP_TMM_TO_CNT.

Note3: Should be included in `show

hardware internal statistics pktflow

dropped`:

CSCus60322 Add

VIP_TMM_TO_CNT and

VIP_TMM_TO_DROP_CNT to

packet-flow dropped

Integrated into NX-OS 6.2(13)

Note4: CSCut21070 show

hardware internal statistics sup

command does not include fcmac

Integrated in: open

Note5: CSCus85931 Need show

hardware internal errors & stats

cmds on MDS 9250i 9148 9148S

Integrated in: open


Slow drain counters and descriptions Table 2 - Counters indicating frame drops

Table 2 - continued

THB_TMM_TIMEOUT_STATS_DROP4

F16_TMM_TIMEOUT_STATS_DROP5

Timeout stats dropped because stats fifo full

These counters are not real drops. The F16/TBIRD ASIC has a TIMEOUT STATS FIFO in the TMM that holds packets that have timed out. If the FIFO is full and has not been read, newly timed-out packets are not written into the FIFO; they are counted by TIMEOUT_STATS_DROP instead.

This is a TBIRD/F16 feature. Viper, Gen2, and Gen3 do not have it.

FCP_CNTR_LAF_C3_TIMEOUT_FRAMES_DISCARD2

AK_FCP_CNTR_LAF_C3_TIMEOUT_FRAMES_DISCARD3

THB_TMM_TO_CNT_CLASS_34

F16_TMM_TO_CNT_CLASS_35

None48,50i, 48S

Count of class-3 Fibre Channel frames dropped as a result

of congestion-drop timeout

OBFL:

Show logging onboard error-stats2,3,4,5

Sup Hardware internal errors

show hardware internal errors all|module x2,3

show hardware internal statistics device

fcmac|all4,5,note1

Linecard Hardware internal errors

show hardware internal fc-mac port x error-

statistic2,3

show hardware internal statistics2,3

show hardware internal statistics device fcmac all4

show hardware internal statistics device fcmac5

show hardware internal errors2,3

Note1: CSCut21070

show hardware internal

statistics sup command

does not include fcmac

Integrated in: open


Slow drain counters and descriptions Table 2 - Counters indicating frame drops

Table 2 - continued

FCP_CNTR_LAF_CF_TIMEOUT_FRAMES_DISCARD2

AK_FCP_CNTR_LAF_CF_TIMEOUT_FRAMES_DISCARD3

THB_TMM_TO_CNT_CLASS_F4

F16_TMM_TO_CNT_CLASS_F5

None48,48s,50i

Count of class-F Fibre Channel frames dropped due to congestion-drop

timeout

OBFL:

Show logging onboard error-stats2,3,4,5

Sup Hardware internal errors

show hardware internal errors all|module x2,3

show hardware internal statistics device fcmac|all4,5,note1

Linecard Hardware internal errors

show hardware internal fc-mac port x error-statistic2,3

show hardware internal statistics2,3

show hardware internal statistics device fcmac all4

show hardware internal statistics device fcmac5

show hardware internal errors2,3

Note1: CSCut21070 show

hardware internal statistics sup

command does not include fcmac

Integrated in: open

F16_TMM_PORT_STUCK_FORCE_TIMEOUT_CNT5

VIP_TMM_STUCK_PORT_TO_CNT48S,50i

Total number of frames dropped by force timeout due to stuck port processing (no-credit-drop)

Gen2/3/4 /9148 do not have a counter for this. Any frames dropped as a result

of no-credit-drop on these are just counted as timeout discards.

Sup:

None 5,note1

None 48S,50i,note2

Linecard:

show hardware internal statistics device all|fcmac

Note1: CSCut21070 show

hardware internal statistics sup

command does not include fcmac

Integrated in: open

Note2: CSCus85931 Need show

hardware internal errors & stats

cmds on MDS 9250i 9148 9148S

Integrated in: open


Slow drain counters and descriptions Table 2 - Counters indicating frame drops

Table 2 - end

FCP_CNTR_TMM_TIMEOUT48

VIP_TMM_TO_CNT 48S,50i,note1

Total count of timeout drops, which includes frames timed out due to congestion (pkt timeout), HW stuck force timeout, and HW slow port force timeout.

Sup hardware internal statistics:

show hardware internal statistics all48

None 48S,50i,note2

Sup packet-dropped-reason

show hardware internal packet-dropped-reason mod 48,48S,50i

Linecard hardware internal errors/statistics:

show hardware internal errors48,48s,50i

show hardware internal statistics48

show hardware internal statistics device all|fcmac48s,50i

OBFL:

Show logging onboard error-stats48,48s,50i

Note1: Should be included in `show

hardware internal statistics pktflow

dropped`. See:

CSCus60322 Add

VIP_TMM_TO_CNT and

VIP_TMM_TO_DROP_CNT to

packet-flow dropped

Integrated into 6.2(13)

Note2: CSCus85931 Need show

hardware internal errors & stats

cmds on MDS 9250i 9148 9148S

Integrated in: open

F16_TMM_PORT_CWAIT_FORCE_TIMEOUT_CNT5

VIP_TMM_SLOW_PORT_TO_CNT 48S,50i

Total timeout packets dropped due to slow-port-monitor processing. This

doesn’t increment since the slow-port-monitor feature doesn’t include a

packet drop function

None. Not implemented.


Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface

Table 3

Counter Name Description Commands Additional Info

FCP_CNTR_CREDIT_LOSS2 ,48

AK_FCP_CNTR_CREDIT_LOSS3

FCP_SW_CNTR_CREDIT_LOSS4,5 ,48s,50i

Count of the number of times that creditmon credit loss recovery has been

invoked on a port

OBFL:

Show logging onboard error-stats2,3,4,5,48,48s,50i

Sup Hardware internal errors/statistics:

show hardware internal errors all|module x2,3,4

None4,5,note1

show hardware internal statistics all48

None48s,50i,note2

Linecard hardware internal errors/statistics

show hardware internal fc-mac port x error-statistic2,3

show hardware internal statistics2,3,48

show hardware internal statistics device fcmac all4

show hardware internal statistics device fcmac|all5,48,48S,50i

show hardware internal errors2,3,4,48,48S,50i

Note1: CSCut21070 show

hardware internal statistics sup

command does not include fcmac

Integrated in: open

Note2: CSCus85931 Need show

hardware internal errors & stats

cmds on MDS 9250i 9148 9148S

Integrated in: open


Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface

Table 3 - continued

FCP_CNTR_FORCE_TIMEOUT_ON2 ,48

AK_FCP_CNTR_FORCE_TIMEOUT_ON3

FCP_SW_CNTR_FORCE_TIMEOUT_ON4

FCP_SW_CNTR_FORCE_TIMEOUT_ON5,50i,48s,note2

Count of the number of times the "system timeout no-credit-drop threshold"

has been reached by this port; when a port is at zero Tx B2B credits for the time

specified, the port starts to drop packets at line rate

Note 1: For the 9700 and 9250i these counters will only increment prior to the

introduction of the HW slow drain feature in 6.2(9). Since the 9148S was first

supported in NX-OS 6.2(9) it will never have it increment. Reference the

F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT5 and

VIP_TMM_STK_PRT_TO_TRANSITION_CH0_CNT48S,50i counters which indicate

the same thing (but are not in OBFL). Checking on whether these should be re-

added.

See the following counters to determine frames dropped due to force

timeout:

F16_TMM_PORT_STUCK_FORCE_TIMEOUT_CNT5

VIP_TMM_STUCK_PORT_TO_CNT48S,50i

OBFL:

Show logging onboard error-stats2,3,4,5,48,48s,50i

Sup hardware internal errors/statistics

show hardware internal errors all|module x2,3,4

show hardware internal statistics all48

None48s,50i,note2

Linecard hardware internal errors/statistics:

show hardware internal fc-mac port x error-statistic2,3

show hardware internal statistics2,3,48

show hardware internal statistics device fcmac all4

show hardware internal statistics device fcmac|all48

show hardware internal errors2,3,4,48

None 5,48S,50i,note3

Note2: CSCus85931 Need show

hardware internal errors & stats

cmds on MDS 9250i 9148 9148S

Integrated in: open

Note3:

After NX-OS 6.2(9) the following

counters are not incrementing on

MDS 9700, 9250i, 9148S:

CSCus93140 no-credit-drop SW

counters not incrementing on MDS

9700, 9250i, 9148S

Integrated in: open


Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface

Table 3 - continued

FCP_CNTR_FORCE_TIMEOUT_OFF2

AK_FCP_CNTR_FORCE_TIMEOUT_OFF3

FCP_SW_CNTR_FORCE_TIMEOUT_OFF4,48

FCP_SW_CNTR_FORCE_TIMEOUT_OFF5,50i,48s,note 1

Count of the number of times that the port has recovered from the system

timeout no-credit-drop condition; this status typically means that R_Rdy

primitive has been returned or possibly that an LR and LRR has occurred.

Note 1: For the 9700 and 9250i these counters will only increment prior to the

introduction of the HW slow drain feature in 6.2(9). Since the 9148S was first

supported in NX-OS 6.2(9) they will never increment. They are re-added into

hardware internal statistics and logging onboard error-stats via the following

bug:

CSCus93140 no-credit-drop SW counters not incrementing on MDS 9700,

9250i, 9148S

F16_TMM_PORT_STUCK_FORCE_TIMEOUT_H_L_CNT5 indicates the same thing.

OBFL:

Show logging onboard error-stats2,3,4,5,48,48s,50i

Sup hardware internal errors/statistics

show hardware internal errors all|module x2,3,4

show hardware internal statistics all48

None48s,50i,note2

Linecard hardware internal errors/statistics:

show hardware internal fc-mac port x error-statistic2,3

show hardware internal statistics2,3,48

show hardware internal statistics device fcmac all4

show hardware internal statistics device fcmac|all48

show hardware internal errors2,3,4,48

None 5,48S,50i,note3

Note2: CSCus85931 Need show

hardware internal errors & stats

cmds on MDS 9250i 9148 9148S

Integrated in: open

Note3:

After NX-OS 6.2(9) the following

counters are not incrementing on

MDS 9700, 9250i, 9148S:

CSCus93140 no-credit-drop SW

counters not incrementing on MDS

9700, 9250i, 9148S

Integrated in: open


Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface

Table 3 - continued

AK_FCP_CNTR_LINK_RESET_OUT2,3

FCP_SW_CNTR_LINK_RESET_OUT48

FCP_SW_CNTR_LINK_RESET_OUT4,5,50i,48S, note 1

Count of times a Link Credit Reset(LR) was transmitted from the interface.

Also shown in the output of “show interface counters detail” as:

xxx link reset protocol errors transmitted

Or

xxx link reset transmitted while link is active

Note the above just counts link resets that are transmitted when the link is

active.

Sup hardware internal errors/statistics

show hardware internal statistics all2,3,48

Linecard hardware internal errors/statistics

show hardware internal statistics all2,3,48

Note1: These are not incremented.

CSCus99138 Port software counters

not incrementing

Integrated in: open

AK_FCP_CNTR_LINK_RESET_IN2,3

FCP_SW_CNTR_LINK_RESET_IN48

FCP_SW_CNTR_LINK_RESET_IN4,5,50i,48S, note 1

Count of times a Link Credit Reset(LR) was received on the interface.

Also shown in the output of “show interface counters detail” as:

xxx link reset protocol errors received

Or

xxx link reset received while link is active

Note the above just counts link resets that are received when the link is active.

Reference AK_FCP_CNTR_LINK_RESET_OUT above.

Show logging onboard interrupt-stats will show

IP_FCMAC_INTR_PRIM_RX_SEQ_LR

Reference Additional Info in

AK_FCP_CNTR_LINK_RESET_OUT

above.

AK_FCP_CNTR_LRR_OUT 2,3

FCP_SW_CNTR_LRR_OUT4,5,48,50i,48S, note 1

Count of times a Link Credit Reset Response(LRR) was transmitted from the

interface.

Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in

AK_FCP_CNTR_LINK_RESET_OUT

above.

AK_FCP_CNTR_LRR_IN 2,3

FCP_SW_CNTR_LRR_IN4,5,48,50i,48S, note 1

Count of times a Link Credit Reset Response(LRR) was received on the interface.

Also shown using show interface fcx/y

xx input OLS,xx LRR,0 NOS,xx loop inits

Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in

AK_FCP_CNTR_LINK_RESET_OUT

above.


Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface

Table 3 - end

AK_FCP_CNTR_OLS_OUT 2,3

FCP_SW_CNTR_OLS_OUT4,5,48,50i,48S, note 1

Count of times an Off Line Sequence(OLS) was transmitted from the interface.

Also shown using show interface fcx/y

xx output OLS,xx LRR, xx NOS, xx loop inits

Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in

AK_FCP_CNTR_LINK_RESET_OUT

above.

AK_FCP_CNTR_OLS_IN 2,3

FCP_SW_CNTR_OLS_IN4,5,48,50i,48S, note 1

Count of times an Off Line Sequence(OLS) was received on the interface.

Also shown using show interface fcx/y

xx input OLS,xx LRR,0 NOS,xx loop inits

Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in

AK_FCP_CNTR_LINK_RESET_OUT

above.

AK_FCP_CNTR_NOS_OUT 2,3

FCP_SW_CNTR_NOS_OUT4,5,48,50i,48S, note 1

Count of times a Not Operational Sequence(NOS) was transmitted from the

interface.

Also shown using show interface fcx/y

xx output OLS,xx LRR, xx NOS, xx loop inits

Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in

AK_FCP_CNTR_LINK_RESET_OUT

above.

AK_FCP_CNTR_NOS_IN 2,3

FCP_SW_CNTR_NOS_IN4,5,48,50i,48S, note 1

Count of times a Not Operational Sequence(NOS) was received on the

interface.

Also shown using show interface fcx/y

xx input OLS,xx LRR,0 NOS,xx loop inits

Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in

AK_FCP_CNTR_LINK_RESET_OUT

above.

AK_FCP_CNTR_LRR_OUT 2,3

FCP_SW_CNTR_LRR_OUT4,5,48,50i,48S, note 1

Count of times a Link Credit Reset Response(LRR) was transmitted from the

interface.

Also shown using show interface fcx/y

xx output OLS,xx LRR, xx NOS, xx loop inits

Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in

AK_FCP_CNTR_LINK_RESET_OUT

above.
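The primitive-sequence lines quoted in Table 3 ("xx input OLS,xx LRR,0 NOS,xx loop inits") can be pulled apart with a small parser when collecting `show interface fcx/y` output from many ports. The following is only a sketch: the line layout is taken from the table as written, the function name `parse_primitive_line` is hypothetical, and the sample numbers in the usage note are made up for illustration.

```python
import re
from typing import Dict, Optional, Union

# Layout assumed from Table 3: "<n> input|output OLS,<n> LRR,<n> NOS,<n> loop inits".
# Optional spaces after the commas are tolerated because both spellings appear in the table.
_PRIMITIVE_LINE = re.compile(
    r"^\s*(\d+)\s+(input|output)\s+OLS,\s*(\d+)\s+LRR,\s*(\d+)\s+NOS,\s*(\d+)\s+loop inits\s*$"
)


def parse_primitive_line(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the OLS/LRR/NOS/loop-init counts from one line, or None if it does not match."""
    match = _PRIMITIVE_LINE.match(line)
    if match is None:
        return None
    ols, direction, lrr, nos, loop_inits = match.groups()
    return {
        "direction": direction,
        "OLS": int(ols),
        "LRR": int(lrr),
        "NOS": int(nos),
        "loop_inits": int(loop_inits),
    }
```

For example, `parse_primitive_line("5 input OLS,3 LRR,0 NOS,0 loop inits")` yields a dictionary with `direction` set to `"input"` and `LRR` set to `3`; lines in any other layout return `None` rather than raising.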


Slow drain counters and descriptions Table 4 – Interrupt counters

Table 4

Counter Name Description Commands Additional Info

F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H5 Slowport condition detected count (Low to High transition: i.e. credit wait

(cwait) > threshold)

NX-OS 6.2(1) through 6.2(7) - Count of times port was at zero Tx credits for 100ms. Only increments on the initial 100ms interval. In these “pre-slowport-monitor” releases this counter was used to trigger the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.

NX-OS 6.2(9) and later – Should not occur.

OBFL:

show logging onboard interrupt-stats

Linecard:

Slot x show hardware internal fcmac port x interrupt-counts

F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_H_L5 Slowport condition exited count (High to Low transition: i.e. credit wait (cwait) < threshold)

NX-OS 6.2(1) through 6.2(7) - Count of times port received a credit after

being at zero Tx credits for 100ms or longer. In these “pre-slowport-

monitor” releases this counter was used to re-arm the

FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.

NX-OS 6.2(9) and later – Should not occur.

OBFL:

show logging onboard interrupt-stats

Linecard:

Slot x show hardware internal fcmac port x interrupt-counts


Slow drain counters and descriptions Table 4 – Interrupt counters

Table 4 - continued

F16_FCP_INTR_TMM_P_STUCK_FORCE_TIMEOUT_L_H5 Stuck port condition detected count (Low to High transition).

Configured via:

“system timeout no-credit-drop xxx mode e|f”

Defaults to 500ms with no action.

OBFL:

show logging onboard interrupt-stats

Linecard:

Slot x show hardware internal fcmac port x interrupt-counts

F16_FCP_INTR_TMM_P_STUCK_FORCE_TIMEOUT_H_L5 Count of times stuck port condition exited. OBFL:

show logging onboard interrupt-stats

Linecard:

Slot x show hardware internal fcmac port x interrupt-counts

VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH0_OVER_THRESHOLD_RAISING48s,

50i

VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_RAISING48s,

50i

Slowport condition detected count (Low to High transition: i.e. credit wait

(cwait) > threshold)

NX-OS 6.2(5) through 6.2(7) - Count of times port was at zero Tx credits for 100ms. Only increments on the initial 100ms interval. In these “pre-slowport-monitor” releases this counter was used to trigger the FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.

NX-OS 6.2(9) and later – Should not occur.

OBFL:

show logging onboard interrupt-stats

Linecard:

Slot x show hardware internal fcmac port x interrupt-counts

Note: VIPER does not have a High to

Low interrupt like F16.


Slow drain counters and descriptions Table 4 – Interrupt counters

Table 4 - continued

VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH0_OVER_THRESHOLD_FALLING48s,50i

VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_FALLING48s,50i

Slowport condition exited count. OBFL:

show logging onboard interrupt-stats

Linecard:

Slot x show hardware internal fcmac port x interrupt-counts

Note: These are displayed in OBFL

with the VIPER_FCP_INTR_ prefix but

without the prefix in other places.

VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH0_RAISING

VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH1_RAISING

Count of times port was at zero Tx credits for the stuck port timeout

value ”no-credit-drop” (default value 500ms).

NX-OS 6.2(5) through 6.2(7) - Count of times port was at zero Tx credits

for 1s(F port) or 1.5s(E port). In these “pre-slowport-monitor” releases

this interrupt was used to trigger the FCP_SW_CNTR_CREDIT_LOSS

counter increment.

NX-OS 6.2(9) and later – Should not occur.

OBFL:

show logging onboard interrupt-stats

Linecard:

Slot x show hardware internal fcmac port x interrupt-counts

VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH0_FALLING

VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH1_FALLING

Count of times stuck port condition exited. OBFL:

show logging onboard interrupt-stats

Linecard:

Slot x show hardware internal fcmac port x interrupt-counts


Slow drain counters and descriptions Table 4 – Interrupt counters

Table 4 - continued

IP_FCMAC_INTR_PRIM_RX_SEQ_NOS Not Operational Sequence received on the interface.

NOS is a sequence that is transmitted continuously until an OLS is received.

OBFL:

show logging onboard interrupt-stats

Linecard:

show hardware internal fc-mac port 1 interrupt-counts2,3,48

Slot x show hardware internal fcmac port x interrupt-counts4,5,48s,50i

show interface fcx/y counters details

show interface detailed-counters

xxx non-operational sequences

received

IP_FCMAC_INTR_PRIM_RX_SEQ_OLS Off Line Sequence received on the interface.

OLS is a sequence that is transmitted continuously until an LR is received.

OBFL:

show logging onboard interrupt-stats

Linecard:

show hardware internal fc-mac port 1 interrupt-counts2,3,48

Slot x show hardware internal fcmac port x interrupt-counts4,5,48s,50i

show interface fcx/y counters details

show interface detailed-counters

xxx Offline Sequence errors received

IP_FCMAC_INTR_PRIM_RX_SEQ_LR Link Reset received on the interface. LR is sent under two conditions

normally:

1) Link bringup – NOS/OLS/LR/LRR

2) Credit Loss Recovery – LR is sent to bring each side up to its full

complement of B2B credits. This doesn’t bounce or flap the link

but just restores the B2B credits.

OBFL:

show logging onboard interrupt-stats

Linecard:

show hardware internal fc-mac port 1 interrupt-counts2,3,48

Slot x show hardware internal fcmac port x interrupt-counts4,5,48s,50i

show interface fcx/y counters details

show interface detailed-counters

xxx link reset protocol errors received


Slow drain counters and descriptions Table 4 – Interrupt counters

Table 4 - end

IP_FCMAC_INTR_PRIM_RX_SEQ_LRR Link Reset Response received on the interface. This is sent in response to a Link Reset. OBFL:

show logging onboard interrupt-stats

Linecard:

show hardware internal fc-mac port 1 interrupt-counts2,3,48

Slot x show hardware internal fcmac port x interrupt-counts4,5,48s,50i

show interface fcx/y counters details

show interface detailed-counters

xxx link reset responses received


Slow drain counters and descriptions Table 5 - SNMP variables applicable to slow drain

Table 5

Counter Name Description Commands Additional Info

fcIfTxWaitCount 2,3,48 ,see note 1

fcIfTxWaitCount 4,note2,note3

fcIfTxWaitCount 5,note 3

fcIfTxWaitCount 50i, 48S, see note 4

OID 1.3.6.1.4.1.9.9.289.1.2.1.1.15

The number of times the FC-port waited due to lack of transmit credits and there

were packets queued for transmit. This is in units of 2.5us.

To convert to seconds: txwait * 2.5 / 1000000

There is no OID for the Rx direction of this.

Not generated by port-monitor

Based on the following counters:

THB_TMM_PORT_TWAIT_CNT4

F16_TMM_PORT_TWAIT_CNT5

VIP_TMM_TXWAIT_CH0_CNT50i, 48S

VIP_TMM_TXWAIT_CH1_CNT50i, 48S

Displayed via:

Show interface fcx/y counters detailed | i wait

-or-

Show interface detailed-counters | i fc|wait

Example:

rtp-san-34-15-9513# show int fc4/1 counters details | i wait

82864704 waits due to lack of transmit credits

Not generated by port-monitor

Note1: On gen2, gen3 and 9148,

this will always return zero.

Note2: Added to Gen4 linecards in

NX-OS 5.2(2)

Note3: Prior to 6.2(11a) this counter

was inaccurate. See the following

bug:

CSCus15233 fcIfTxWaitCount

incorrect on DS-X9232-256K9 and

DS-X9248-768K9

Fixed in 6.2(11a)

Note4: Prior to 6.2(11a) this counter

was inaccurate. See the following

bug:

CSCus15745 fcIfTxWaitCount

incorrect for MDS 9250i and 9148S

Fixed in 6.2(11a)
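The TxWait conversion in the fcIfTxWaitCount row can be sketched in a few lines of Python. This is an illustration only: the 2.5us unit comes from the table, while the helper names and the idea of expressing a counter delta as a percentage of a polling interval are assumptions added here, not something the slides define.

```python
TXWAIT_UNIT_MICROSECONDS = 2.5  # one fcIfTxWaitCount increment, per the table


def txwait_to_seconds(txwait_ticks: int) -> float:
    """Convert a raw TxWait count to seconds: txwait * 2.5 / 1000000."""
    return txwait_ticks * TXWAIT_UNIT_MICROSECONDS / 1_000_000


def txwait_percent(delta_ticks: int, interval_seconds: float) -> float:
    """Hypothetical helper: share of a polling interval spent waiting for Tx credits.

    delta_ticks is the difference between two samples of the counter taken
    interval_seconds apart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return 100.0 * txwait_to_seconds(delta_ticks) / interval_seconds


if __name__ == "__main__":
    # Value from the fc4/1 example in the table.
    print(txwait_to_seconds(82864704))
```

Applied to the fc4/1 example, 82864704 waits works out to roughly 207 seconds at zero Tx credits with frames queued. Because the counter is cumulative, a delta between two polls is usually more informative than the raw total.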

fcIfCreditLoss2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.37

The number of link resets that have occurred due to unavailable

credits from the peer side of the link.

Generated by port-monitor counter credit-loss-reco

Shown in the output of show interface counters:

xxx timeout discards, xxx credit loss

Credit loss recovery is initiated by

the MDS after 1 second(F port) / 1.5

seconds(E port) at zero Tx credits.

Other products may initiate at

different intervals
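The timers mentioned across these tables (100ms for tx-credit-not-available, the 500ms default stuck port threshold, and 1 second / 1.5 seconds for credit loss recovery) can be lined up in a short sketch that reports which of them a given stretch at zero Tx credits would reach. This is a simplification under stated assumptions: it uses only the default values quoted in the tables, ignores any configured `system timeout` values, and the function and label names are hypothetical.

```python
from typing import List

# Default thresholds quoted in these tables, in milliseconds.
TX_CREDIT_NOT_AVAILABLE_MS = 100                   # fcIfTxWtAvgBBCreditTransitionToZero
STUCK_PORT_DEFAULT_MS = 500                        # no-credit-drop default (no action unless configured)
CREDIT_LOSS_RECOVERY_MS = {"F": 1000, "E": 1500}   # creditmon credit loss recovery


def thresholds_reached(zero_credit_ms: float, port_type: str) -> List[str]:
    """List the default thresholds reached by a port at zero Tx credits for zero_credit_ms."""
    port_type = port_type.upper()
    if port_type not in CREDIT_LOSS_RECOVERY_MS:
        raise ValueError("port_type must be 'E' or 'F'")
    reached = []
    if zero_credit_ms >= TX_CREDIT_NOT_AVAILABLE_MS:
        reached.append("tx-credit-not-available")
    if zero_credit_ms >= STUCK_PORT_DEFAULT_MS:
        reached.append("stuck-port-threshold")
    if zero_credit_ms >= CREDIT_LOSS_RECOVERY_MS[port_type]:
        reached.append("credit-loss-recovery")
    return reached
```

For instance, 1200ms at zero credits reaches all three thresholds on an F port but only the first two on an E port, which is why the same stall can show credit loss on an edge port and not on an ISL.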


Slow drain counters and descriptions Table 5 - SNMP variables applicable to slow drain

Table 5 - continued

fcIfLinkResetOuts 2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.10

The number of link reset protocol errors issued by

the FC-Port to the attached FC-Port.

Generated by port-monitor counter lr-tx

Shown in the output of show interface fcx/y counters detailed:

xxx link reset protocol errors transmitted

or

xxx link reset transmitted while link is active

fcIfLinkResetIns2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.9

The number of link reset protocol errors received by

the FC-Port from the attached FC-port

Generated by port-monitor counter lr-rx

Shown in the output of show interface fcx/y counters detailed:

xxx link reset protocol errors received

or

xxx link reset received while link is active

fcIfTimeOutDiscards2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.35

The number of packets that are dropped due to time-out at the FC-port or due to the FC-port going

offline.

Generated by port-monitor counter timeout-discards

Shown in the output of show interface counters:

xxx timeout discards, xxx credit loss


Slow drain counters and descriptions Table 5 - SNMP variables applicable to slow drain

Table 5 - continued

fcIfOutDiscards2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.36

The total number of packets that are discarded in the egress side of the FC-port.

Generated by port-monitor counter tx-discards

Shown in the show interface fcx/y command:

xxx discards, xxx errors

fcIfTxWtAvgBBCreditTransitionToZero [2,3,4,5,48,50i,48S] OID 1.3.6.1.4.1.9.9.289.1.2.1.1.38

Count of the number of times that an interface was at zero Tx B2B credits for 100 ms.

This status typically indicates congestion at the device attached to that interface.

Generated by port-monitor counter tx-credit-not-available

show system internal snmp credit-not-available

See the following hardware internal error for more info:

xxx_CNTR_TX_WT_AVG_B2B_ZERO

Note: There is no OID in the Rx direction.

CSCus93323: Portmonitor fcIfTxWtAvgBBCreditTransitionToZero truncates hcAlarmOwner
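Because each increment of this counter represents 100 ms at zero Tx B2B credits, two readings can be turned into an estimate of how much of a polling interval the port spent without credits. The following is a sketch only: the function name and the sample readings are hypothetical, and expressing the result as a percentage of the interval is an assumption made here for illustration.

```python
MS_PER_INCREMENT = 100  # from the slide: one count per 100 ms at zero Tx credits


def credit_unavailable_percent(prev_count, curr_count, poll_seconds):
    """Estimate the share of the polling interval spent at zero Tx B2B credits."""
    delta = curr_count - prev_count
    ms_at_zero = delta * MS_PER_INCREMENT
    return 100.0 * ms_at_zero / (poll_seconds * 1000.0)


# Hypothetical readings: the counter rose by 3 during a 1-second poll,
# which corresponds to 300 ms at zero credits.
print(credit_unavailable_percent(40, 43, 1))
```

Intervals shorter than 100 ms at zero credits do not register in this counter, so the estimate is a lower bound on time spent without credits.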

fcIfBBCreditTransistionFromZero [2,3,4,5,48,50i,48S] OID 1.3.6.1.4.1.9.9.289.1.2.1.1.28

Increments when the transmit B2B credit transitions to zero.

There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.

Not generated by port-monitor

Shown in the output of show interface counters:

xxxx Transmit B2B credit transitions to zero

Based on the TBBZ hardware statistic.

Slow drain counters and descriptions

Table 5 - SNMP variables applicable to slow drain (end)

fcHCIfBBCreditTransistionFromZero [2,3,4,5,48,50i,48S] OID 1.3.6.1.4.1.9.9.289.1.2.1.1.40

Increments when the transmit B2B credit transitions to zero.

There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.

Not generated by port-monitor

Shown in the output of show interface counters:

xxxx Transmit B2B credit transitions to zero

Based on the TBBZ hardware statistic.
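The fcIf and fcHCIf variants report the same TBBZ statistic. By SNMP naming convention the HC prefix marks a high-capacity (64-bit) counter, so the sketch below assumes the non-HC object is a 32-bit counter that can wrap between polls. That width is an assumption, not something the table states, and the function name is hypothetical.

```python
def counter_delta(prev, curr, bits=32):
    """Difference between two readings of a counter that wraps at 2**bits."""
    modulus = 1 << bits
    return (curr - prev) % modulus


# A 32-bit reading that wrapped past its maximum between two polls.
print(counter_delta(4294967290, 5))

# The same helper applied to a 64-bit reading that did not wrap.
print(counter_delta(10, 25, bits=64))
```

The modulo handles a single wrap between polls. Polling the HC object where it is available avoids the question, since a 64-bit counter is unlikely to wrap in practice.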

fcIfBBCreditTransistionToZero [2,3,4,5,48,50i,48S] OID 1.3.6.1.4.1.9.9.289.1.2.1.1.39

Increments when the receive B2B credit transitions to zero.

There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.

Not generated by port-monitor

Shown in the output of show interface counters:

xxxx Receive B2B credit transitions to zero

Based on the RBBZ hardware statistic.

fcHCIfBBCreditTransistionToZero [2,3,4,5,48,50i,48S] OID 1.3.6.1.4.1.9.9.289.1.2.1.1.41

Increments when the receive B2B credit transitions to zero.

There is no indication of time at zero for this counter. It could stay at zero for just an instant or for an extended duration of time.

Not generated by port-monitor

Shown in the output of show interface counters:

xxxx Receive B2B credit transitions to zero

Based on the RBBZ hardware statistic.
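The Table 5 entries on these slides share one OID prefix and differ only in the final column number, so they can be kept in a small lookup. The sketch below covers only the variables listed on these slides. The helper name is hypothetical, and appending an interface index to form the per-port instance OID is an assumption based on how SNMP table columns are normally addressed.

```python
# Common prefix and column numbers copied from the Table 5 entries above.
TABLE5_BASE = "1.3.6.1.4.1.9.9.289.1.2.1.1"
TABLE5_COLUMNS = {
    "fcIfLinkResetIns": 9,
    "fcIfBBCreditTransistionFromZero": 28,
    "fcIfTimeOutDiscards": 35,
    "fcIfOutDiscards": 36,
    "fcIfTxWtAvgBBCreditTransitionToZero": 38,
    "fcIfBBCreditTransistionToZero": 39,
    "fcHCIfBBCreditTransistionFromZero": 40,
    "fcHCIfBBCreditTransistionToZero": 41,
}

# port-monitor counter names as listed in the table; variables that the
# table marks "Not generated by port-monitor" are left out.
PORT_MONITOR_COUNTER = {
    "fcIfLinkResetIns": "lr-rx",
    "fcIfTimeOutDiscards": "timeout-discards",
    "fcIfOutDiscards": "tx-discards",
    "fcIfTxWtAvgBBCreditTransitionToZero": "tx-credit-not-available",
}


def instance_oid(name, if_index):
    """Build a per-interface OID (assumes the column is indexed by interface index)."""
    return f"{TABLE5_BASE}.{TABLE5_COLUMNS[name]}.{if_index}"


# 16777216 is a made-up interface index used only for illustration.
print(instance_oid("fcIfTimeOutDiscards", 16777216))
print(PORT_MONITOR_COUNTER.get("fcIfBBCreditTransistionToZero"))
```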

Slow drain counters and descriptions Legend - superscripts

• Superscripts:

• 1: Generation 1 modules are no longer supported as of NX-OS 5.0 (and later releases) and are not covered by this presentation

• 2: Generation 2 DS-X9112, DS-X9124, DS-X9148, and DS-X9304-18K9 modules

• 3: Generation 3 DS-X9248-48K9 and DS-X92xx-96K9 modules

• 4: Generation 4 DS-X92xx-256K9 modules

• 5: Generation 5 Cisco MDS 9710/9706 DS-X9448-768K9 module and MDS 9396S

• 48: Cisco MDS 9148

• 50i: Cisco MDS 9250i

• 48S: Cisco MDS 9148S

• Legend

• AK: Aakash (Generation 2 or Generation 3 line card MAC ASIC)

• THB: Thunderbird (Generation 4 ASIC)

• F16: F16 (Generation 5 ASIC)

• SAB: Sabre ASIC for MDS 9148

• VIP: Viper ASIC for MDS 9250i and 9148S

• RI: Request Interface

• TMM: Transmit Memory Manager

• FCP_SW: These indicate software counters
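The superscript list attached to each counter can be expanded into platform names with a lookup built from the legend above. This is a minimal sketch: the dictionary text paraphrases the legend, the function name is hypothetical, and superscript 1 is omitted because Generation 1 modules are not covered by this presentation.

```python
# Superscript tags and platforms, paraphrased from the legend above.
PLATFORMS = {
    "2": "Generation 2 modules",
    "3": "Generation 3 modules",
    "4": "Generation 4 modules",
    "5": "Generation 5 module (MDS 9710/9706) and MDS 9396S",
    "48": "Cisco MDS 9148",
    "50i": "Cisco MDS 9250i",
    "48S": "Cisco MDS 9148S",
}


def expand_superscripts(tags):
    """Turn a comma-separated superscript list into platform names."""
    return [PLATFORMS[tag.strip()] for tag in tags.split(",") if tag.strip()]


# The superscript list used by the Table 5 entries.
for platform in expand_superscripts("2,3,4,5,48,50i, 48S"):
    print(platform)
```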

Complete Your Online Session Evaluation

Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online

• Give us your feedback to be entered into a Daily Survey Drawing. A daily winner will receive a $750 Amazon gift card.

• Complete your session surveys through the Cisco Live mobile app or your computer on Cisco Live Connect.

Continue Your Education

• Demos in the Cisco Campus

• Walk-in Self-Paced Labs

• Table Topics

• Meet the Engineer 1:1 meetings

Thank you
