SAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric
Edward Mazurek
Technical Lead Data Center Storage Networking
CCIE 6448
BRKSAN-3446
• Introduction
• Slow Drain Terminology
• Understanding Fibre Channel Flow Control
• MDS Slow Drain Features
• Troubleshooting Slow Drain
• Alerting and Mitigating Slow Drain
• Conclusion
Agenda
Introduction
• Slow drain is a term to describe SAN congestion
• When devices do not receive data at the line rate this can cause congestion in the SAN
• SANs are getting increasingly complex and heterogeneous
• Many different speeds
• Many different types of devices
• Host/storage workloads increasing
Reasons for Slow Drain
• Edge devices - An edge device can be slow to respond for a variety of reasons:
• Server performance problems: application or OS
• Host bus adapter (HBA) problems: driver or physical failure
• Speed mismatches: one fast device and one slow device
• Nongraceful virtual machine exit on a virtualized server, resulting in packets held in HBA buffers
• Storage subsystem performance problems, including overload
• Inter Switch Links (ISL)
• Lack of B2B credits for the distance the ISL is traversing
• Ex: 4 credits per km @ 8Gbps
• The existence of slow drain edge devices
• Edge devices with faster speeds than ISLs even when port-channeled
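The distance rule of thumb above (4 credits per km at 8Gbps) can be turned into a quick sizing check. This is a minimal sketch: the function name is illustrative, and the per-km figure for any speed other than 8Gbps is not given here, so it must be supplied by the caller.

```python
import math


def isl_credits_needed(distance_km: float, credits_per_km: float = 4.0) -> int:
    """Estimate the B2B credits needed to cover an ISL of the given length.

    The default of 4 credits per km is the 8Gbps figure quoted above.
    Other speeds need their own per-km value (an assumption of this sketch).
    """
    if distance_km < 0 or credits_per_km <= 0:
        raise ValueError("distance must be >= 0 and credits_per_km must be > 0")
    return math.ceil(distance_km * credits_per_km)


print(isl_credits_needed(10))  # 10 km ISL at 8Gbps
```

An ISL configured with fewer credits than this estimate would be a candidate for the "lack of B2B credits" cause listed above.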
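Reasons for Slow Drain
• Port-channel bandwidth is not the same as individual link bandwidth: 4 x 4Gb is not equal to 16Gb
• An individual exchange traverses a single ISL (load balancing is by src, dst, oxid)
• A member ISL sending at the full 4Gbps rate causes congestion back to storage
[Figure: 8Gb edge devices communicate with host H1 across a 16Gb (total) port-channel of 4 x 4Gb links. The Tx queue toward the slower member link fills, the ingress port reaches 0 Rx credits remaining and sends no R_Rdy, and the sender is left with no B2B credits. Congestion backs up through the VOQs.]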
Slow Drain Terminology
• B2B – Buffer to Buffer Credits / Credits Remaining
• B2B Transitions to Zero
• Slow Ports / B2B Credit not Available
• Stuck Ports
• Credit Loss Recovery
Slow Drain Terminology
• Buffer to Buffer credits or B2B credits are the agreed upon buffer space on each side of a FC link
• Occurs on FLOGI and ACC(FLOGI)
• Occurs on ELP and ACC(ELP)
• B2B credit remaining is the count of FC frames that still can be sent by each side of a FC link
• Credits are returned by R_Rdy FC ordered set
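B2B credits / Credits remaining
[Figure: End device to MDS: the FLOGI advertises the end device's x Rx credits, which become the MDS's x Tx credits; the ACC(FLOGI) advertises the MDS's y Rx credits, which become the end device's y Tx credits. MDS to MDS over an ISL: ELP and ACC(ELP) exchange x and y credits in the same way.]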
Slow Drain Terminology
• Whenever a port hits zero credits this is counted as a “B2B transition to zero”
• Occurs and is counted in both Tx and Rx direction
• Transmit B2B transition to zero indicates attached device didn’t return credits
• Receive B2B transition to zero indicates this port didn’t return credits
• Important: Amount of time at zero credits is not easily determined
• Can occur normally
B2B Transitions to zero
[Figure: interface fc1/13 sends FC frames until it has 0 remaining Tx credits. Each time this happens the "transmit B2B credit transitions to zero" counter is incremented (130 in the example).]
Slow Drain Terminology
• A port that is attached to a FC device that returns credits slowly
• The receiver of the FC frame does not immediately return an R_Rdy message to the sender
• B2B Tx Credit not available is a term most often used when the MDS detects that the B2B credits are at zero for 100ms
• Called a Slow Port
Slow Port Detection (New!)
[Figure: timeline for interface fc1/13 at 0 remaining Tx credits, with marks at 0ms, 100ms and 200ms. At each 100ms mark where credits are still at zero, "Tx credit not available" is incremented and a slowport event is recorded (if configured).]
MDS Slow Drain Features
• If no-credit-drop is configured and Tx credits are at 0 for the configured amount of time, the port is considered “stuck”
• Start dropping frames immediately without regard to age of frames
• Any newly arriving frames are dropped immediately as long as the port remains at 0 Tx credits
• Frees up frames queued at ISLs destined for slow/stuck ports quicker
Stuck Port Detection
[Figure: timeline with no-credit-drop set to 300ms. The port is at 0 Tx credits from 0 sec. At 300ms the frames in the Rx queue from other ports are dropped, and any newly arriving frames are dropped immediately. Once a credit arrives, sending resumes.]
Slow Drain Terminology
• If transmit credits are at zero for 1 second (F port) or 1.5 seconds (E port), the port invokes credit loss recovery
• Link Reset(LR) is transmitted
• If Link Reset Response(LRR) is received then both sides are back at the full B2B credits
• If LRR is not received then link is failed
• Counter is incremented and optionally a port-monitor alert can be generated
• Link Reset is better named Link Credit Reset – Part of FC-FS - Framing and Signaling
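The thresholds in this terminology section (100ms for credit not available, the configured no-credit-drop time for a stuck port, 1 second or 1.5 seconds for credit loss recovery) can be lined up in one place. This is a rough sketch for reasoning about the thresholds, not how the switch implements them; the function name and the returned labels are illustrative.

```python
def conditions_reached(ms_at_zero, port_type="F", no_credit_drop_ms=None):
    """List the slow drain conditions reached by a port held at 0 Tx credits.

    ms_at_zero        -- continuous time at zero Tx credits, in milliseconds
    port_type         -- "F" (recovery at 1 s) or "E" (recovery at 1.5 s)
    no_credit_drop_ms -- configured no-credit-drop time, or None if not configured
    """
    recovery_ms = 1000 if port_type.upper() == "F" else 1500
    reached = []
    if ms_at_zero >= 100:
        reached.append("tx-credit-not-available")
    if no_credit_drop_ms is not None and ms_at_zero >= no_credit_drop_ms:
        reached.append("stuck-port")
    if ms_at_zero >= recovery_ms:
        reached.append("credit-loss-recovery")
    return reached


print(conditions_reached(300, "F", no_credit_drop_ms=300))
print(conditions_reached(1200, "E"))
```

Note how the same 1200ms stall triggers credit loss recovery on an F port but not yet on an E port.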
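Credit Loss Recovery
[Figure: two timelines. Successful: the port has no credits (stuck) from 0 sec; at 1/1.5 sec an LR is sent, the LRR is received by the +60ms mark, credits are restored and the port resumes normal operation (nondisruptive). Unsuccessful: the LR is sent at 1/1.5 sec, no response is received by the +60ms mark, and the port goes through shut/no shut.]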
Understanding Fibre Channel Flow Control
• Fibre Channel classes
• Fibre Channel class 3
• Fibre Channel Flow Control
• Fibre Channel Flow Control – Example
• MDS frame and credit processing
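Fibre Channel classes
All data is currently transported using class 3
[Table: support matrix listing Class 1, Class 2, Class 3, Class 4, Class 6, Class F and FC-AL, with a "Fabric" column; the column each X belongs to is not recoverable from the extraction.]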
Fibre Channel Class 3
• Class 3 is a best-effort packetized service:
• The receiving port does not acknowledge receipt of frames. If the fabric cannot deliver the frame for any reason, the frame can be discarded without notifying the sending port. However, Class 3 is not really unreliable, because it relies on the upper-layer protocol (ULP) to detect and recover from lost frames
• Class 3 does not guarantee fixed latency because data paths are variable
• Class 3 does not guarantee in-order delivery. For most Fibre Channel applications, including storage applications, the ULP is responsible for guaranteeing in-order delivery
Fibre Channel Flow Control
• Fibre Channel flow control attempts to minimize the chance of dropped frames
• Frames are only transmitted when it is known that the receiver has buffer space
• For each frame sent an R_Rdy (B2B Credit) should be returned
• R_Rdys can only be returned once the frame that has previously occupied that buffer location has been handled
• R_Rdys are not sent reliably – they can be corrupted/lost
• Each side informs the other side of the number of buffer credits it has
• F ports - In the Fabric Login(FLOGI)
• E ports – In the Exchange Link Parameters(ELP)
• Note: B2B credits are not negotiated – just agreed to
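The credit accounting described above can be modelled in a few lines. This is a toy sketch of the transmit side of one link direction, assuming one credit per frame and one R_Rdy per freed buffer; the class and method names are illustrative and do not correspond to any MDS software interface.

```python
class B2BTransmitter:
    """Transmit side of one direction of an FC link."""

    def __init__(self, tx_credits):
        self.tx_credits = tx_credits      # buffer count advertised by the peer (FLOGI/ELP)
        self.remaining = tx_credits       # "transmit B2B credit remaining"
        self.transitions_to_zero = 0      # "transmit B2B credit transitions to zero"

    def send_frame(self):
        """Send one frame if a credit is available; return True if it was sent."""
        if self.remaining == 0:
            return False                  # must wait for an R_Rdy
        self.remaining -= 1
        if self.remaining == 0:
            self.transitions_to_zero += 1
        return True

    def receive_r_rdy(self):
        """The peer freed one buffer, so one credit comes back."""
        if self.remaining < self.tx_credits:
            self.remaining += 1


link = B2BTransmitter(tx_credits=3)
sent = [link.send_frame() for _ in range(4)]   # the fourth frame has no credit
link.receive_r_rdy()
print(sent, link.remaining, link.transitions_to_zero)
```

The counter only records that zero was reached, not how long the port stayed there, which is why transitions to zero alone are a weak indicator.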
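Fibre Channel Flow Control: N-Port Login
[Figure: the end device (N port) sends a FLOGI advertising 1 credit ("N-port has one credit!"); the F port on MDS9710-A replies with ACC(FLOGI) advertising 3 credits, shown as three buffers.]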
MDS9710-A# show int fc1/14
fc1/14 is up
……….
Transmit B2B Credit is 1
Receive B2B Credit is 3
3 receive B2B credit remaining
1 transmit B2B credit remaining
1 low priority transmit B2B credit remaining
Note: These values are not typical; they are chosen for simplicity. Typical F port values are 16-32.
[Figure callout: the F port facing the end device has three credits.]
Fibre Channel Flow Control
• As FC frames flow into the fabric, the MDS Rx buffer queue is decremented by 1 B2B credit for each received frame
• Once an R_Rdy is sent by the MDS, it frees up one B2B credit
Frame Flow Control
[Figure: the N port sends three frames into the F port's buffers; the MDS returns an R_Rdy as a buffer is freed.]
MDS9710-A# show interface fc1/14
fc1/14 is up
……….
Transmit B2B Credit is 1
Receive B2B Credit is 3
1 receive B2B credit remaining
1 transmit B2B credit remaining
1 low priority transmit B2B credit remaining
MDS9710-A# show interface fc1/14
fc1/14 is up
……….
Transmit B2B Credit is 1
Receive B2B Credit is 3
0 receive B2B credit remaining
1 transmit B2B credit remaining
1 low priority transmit B2B credit remaining
Understanding Fibre Channel Flow Control
• Tx indicates transmit side of port
• Rx indicates receive side of port
• One side’s Tx is the adjacent side’s Rx
• Important to understand which direction the congestion is on
• Note: Increasing B2B credits does not usually increase performance
Tx and Rx Perspective
[Figure: E ports fc1/1 and fc1/2 at opposite ends of an ISL. fc1/1: Transmit B2B Credit is 500, Receive B2B Credit is 250. fc1/2: Receive B2B Credit is 500, Transmit B2B Credit is 250.]
MDS9710-A# show interface fc1/1 bbcredit
fc1/1 is trunking
Transmit B2B Credit is 500
Receive B2B Credit is 250
Receive B2B Credit performance buffers is 0
250 receive B2B credit remaining
500 transmit B2B credit remaining
500 low priority transmit B2B credit remaining
[Figure labels: MDS9710-A, ISL, MDS9710-B]
Fibre Channel Flow Control – Example cont’d: Normal flow
[Analyzer trace: an Xgig analyzer sits between the MDS and the server on FC Port(1,1,3) and FC Port(1,1,4). FC data and R_Rdys are seen on the ports; delta time is about 0.7us.]
Fibre Channel Flow Control – Example cont’d: Delayed/No R_RDYs
[Analyzer trace: same setup. Only FC data is seen; there are no R_Rdys.]
Fibre Channel Flow Control – Example cont’d: R_RDY recovery
[Analyzer trace: same setup. R_Rdys start arriving, followed by more R_Rdys, alongside an FC frame.]
MDS Frame and Credit Processing
[Figure: components shown are Line Card 1 and Line Card 2 (each with ports, VOQs and an XBAR interface), the Active Supervisor, the Arbiter and two Fabric Modules (XBAR).]
1. Initiator sends an FC frame to the MDS port ASIC
2. FC frame is received in its entirety and stored
3. FC frame is transmitted to the VOQ
4. XBAR interface sends a request to the Arbiter for a grant to transmit the frame to the egress port via the XBAR
5. Arbiter grants the request to the XBAR interface to forward the frame; the grant is only sent when the egress port has buffer space available
6. FC frame is forwarded to the XBAR, then an R_Rdy is sent back since the buffer is now free
7. FC frame is forwarded to the egress line card
8. MDS port ASIC forwards the frame to the target
9. Credit is returned to the Arbiter
The Issue: Non-Responsive Devices causing upstream blocking
[Figure: host H2 issues a 5 MB read to storage S2. A slow drain device leaves its switch port at 0 Tx credits remaining, so frames back up in the interface buffer and the VOQs. Upstream ports reach 0 Rx credits remaining and send no R_Rdy, so the ISL and then the storage-facing ports (S1, S2) are left with no B2B credits. Congestion spreads hop by hop and all devices (H1, H2) are impacted.]
MDS Slow Drain Features
MDS Slow Drain Features - Existing
• Virtual Output Queues
• Display credits and remaining credits
• Detect Tx and Rx credit transitions to zero
• Slow Port Detection
• Tx and Rx Credit not Available
• Stuck Port Detection
• Credit Loss Recovery / LR Rcvd B2B
• Display ingress queuing
MDS Slow Drain Features - Enhanced!
• Congestion drop frames
• No credit drop frames
• On Board Failure Logging
• Port-monitor alerting / portguard
MDS Slow Drain Features - New!
• slowport-monitor
• show interface counters - txwait
• show interface - Percentage Tx credits are available for last 1s/1m/1h/72h
• txwait-history graphs
• show logging onboard txwait
• SNMP fcIfTxWaitCount variable
• show tech-support slowdrain
• DCNM Slow Drain Analysis
Virtual Output Queues (VOQs)
[Figure: VOQ model versus a switch without VOQ. In both cases port 1 receives frames destined for ports 4, 5 and 6. In the VOQ model the input side of port 1 keeps a separate queue per destination port, each with its own top of VOQ. In the switch without VOQ all frames share one input queue with a single top of queue.]
This diagram shows the primary difference between a VOQ-based switch and a switch without VOQ. If destination port 4 were congested, the switch without VOQ would block, with frames to other output ports waiting behind the frame for the blocked port. In contrast, VOQ means that only the VOQ associated with port 4 is blocked; frames to all other ports flow normally.
MDS implements VOQs on the input interface
VOQs help prevent head of line blocking
VOQs can alleviate but do not prevent congestion caused by slow drain
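The head-of-line blocking difference can be shown with a small simulation. This is a sketch under simple assumptions: the arrival order in FRAMES is made up (only the destination ports 4, 5 and 6 come from the diagram), and a blocked port never drains during the run.

```python
from collections import defaultdict, deque

# Destination port of each frame arriving at port 1 (arrival order is assumed).
FRAMES = [4, 5, 5, 6, 4, 6, 6, 4]


def drain_single_queue(frames, blocked):
    """One input queue: stop at the first frame whose egress port is blocked."""
    queue = deque(frames)
    delivered = []
    while queue and queue[0] not in blocked:
        delivered.append(queue.popleft())
    return delivered


def drain_voq(frames, blocked):
    """One queue per egress port: only the queues for blocked ports stall."""
    voqs = defaultdict(deque)
    for dst in frames:
        voqs[dst].append(dst)
    delivered = []
    for dst, queue in voqs.items():
        if dst not in blocked:
            delivered.extend(queue)
    return delivered


print(drain_single_queue(FRAMES, blocked={4}))  # nothing gets past the head frame
print(drain_voq(FRAMES, blocked={4}))           # frames for ports 5 and 6 still flow
```

The VOQ case still holds the port 4 frames in buffers, which is why VOQs alleviate but do not prevent congestion from a slow drain device.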
MDS Slow Drain Features
• MDS can display the Tx and Rx credits agreed upon on each interface
• MDS can also display the credits remaining in both directions
• Tx and Rx credits are a static value
• Remaining credits are an instantaneous value
• Available via show interface bbcredit command
Display credits and remaining credits
• 128 - 100 = 28 Tx frames outstanding
• 32 - 24 = 8 credits owed by the MDS
MDS9710# show interface fc1/1 bbcredit
fc1/1 is up
Transmit B2B Credit is 128
Receive B2B Credit is 32
Receive B2B Credit performance buffers is 0
24 receive B2B credit remaining
100 transmit B2B credit remaining
100 low priority transmit B2B credit remaining
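The two figures called out on this slide are simple differences between the agreed credits and the credits remaining. A minimal sketch that derives them from pasted command output follows; the parsing assumes the output lines look exactly like the sample above, and the function name is illustrative.

```python
import re

SAMPLE = """\
fc1/1 is up
    Transmit B2B Credit is 128
    Receive B2B Credit is 32
    Receive B2B Credit performance buffers is 0
    24 receive B2B credit remaining
    100 transmit B2B credit remaining
    100 low priority transmit B2B credit remaining
"""


def outstanding_credits(text):
    """Return frames sent but not yet credited, and credits the switch still owes."""
    def grab(pattern):
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        return int(match.group(1))

    tx_total = grab(r"Transmit B2B Credit is (\d+)")
    rx_total = grab(r"Receive B2B Credit is (\d+)")
    tx_left = grab(r"^\s*(\d+) transmit B2B credit remaining")
    rx_left = grab(r"^\s*(\d+) receive B2B credit remaining")
    return {
        "tx_frames_outstanding": tx_total - tx_left,
        "rx_credits_owed": rx_total - rx_left,
    }


print(outstanding_credits(SAMPLE))
```

Because remaining credits are instantaneous values, a single sample like this is only a snapshot.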
MDS Slow Drain Features
• Each time the Tx or Rx credits go to zero the MDS increments a counter
• Maintained as a hardware statistic
• Available in
• show interface counters
• slot x show hardware internal fc-mac port y statistics
• show hardware internal statistics
• Since there is no indication of time at zero this is not a great indication of slow drain in and of itself
• Use the slowport-monitor or various txwait commands instead
Detect credit transitions to zero
MDS Slow Drain Features
• MDS software process detects when a port is at zero Tx or Rx credits for 100ms
• Because this is done in software, it may not catch every occurrence
• Available in:
• slot x show hardware internal fc-mac port y error-statistics
• show logging onboard error-stats
• xxx_CNTR_RX_WT_AVG_B2B_ZERO
• xxx_CNTR_TX_WT_AVG_B2B_ZERO
• show system internal snmp credit-not-available
• port-monitor tx-credit-not-available
Tx/Rx Credit not Available (Timestamped!)
[Figure: timeline from 0 sec to 1 sec; B2B credits are sampled every 100 ms.]
MDS Slow Drain Features
• Creditmon is a process that runs periodically in each linecard
• It checks for transmit credits at zero
• F Port at 0 Tx credits for 1 second
• E Port at 0 Tx credits for 1.5 seconds
• Credit loss recovery invoked
• If successful then non-disruptive
• If port at 0 Rx credits, adjacent device is responsible for initiating recovery
• Part of FC-FS specification
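Credit Loss Recovery
[Figure: two timelines. Successful: the port has no credits (stuck) from 0 sec; at 1/1.5 sec an LR is sent, the LRR is received by the +60ms mark, credits are restored and the port resumes normal operation. Unsuccessful: the LR is sent at 1/1.5 sec, no response is received by the +60ms mark, and the link fails with "Link failure Link reset failed due to timeout".]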
MDS Slow Drain Features
• Adjacent device initiates credit loss recovery
• If MDS receives LR it checks if input buffers are empty
• If input buffers are not empty in 90ms the “LR Rcvd B2B” condition occurs and the link fails with reason “Link failure Link Reset failed nonempty Recv queue”
• Indication of upstream congestion
LR Rcvd B2B
slot 10 show port-config internal link-events
Time PortNo Speed Event Reason
---- ------ ----- ----- ------
Apr 3 18:53:36 2014 00591356 fc10/30 4G UP Not FL
Apr 3 18:53:34 2014 00810034 fc10/30 --- DOWN LR Rcvd B2B
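[Figure: timeline between the MDS port and the FC device. The device receives no credits from the MDS from 0 sec and sends an LR at 1 sec. The MDS gives no response, and at +90ms the link goes through shut/no shut with reason LR Rcvd B2B.]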
MDS Slow Drain Features
• Each frame the MDS receives is time stamped
• If frame cannot be delivered to the egress port it is timeout dropped
• MDS (by default) drops frames as timeout drops at 500ms
• Can be configured 100ms-500ms in 1ms intervals
• Lowering will timeout frames quicker and reduce effects of slow drain devices
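The timestamp check behind congestion drop can be sketched as an age test on each queued frame. The 500ms default and the 100ms-500ms range are from the bullets above; the function name, the tuple layout and the choice of ">=" for the comparison are assumptions of this sketch.

```python
def congestion_drop(queue, now_ms, timeout_ms=500):
    """Split queued frames into (kept, dropped) according to their age.

    queue is a list of (frame_id, arrival_ms) tuples; a frame whose age has
    reached timeout_ms is treated as a timeout drop.
    """
    if not 100 <= timeout_ms <= 500:
        raise ValueError("congestion drop timeout must be 100-500 ms")
    kept, dropped = [], []
    for frame_id, arrival_ms in queue:
        age_ms = now_ms - arrival_ms
        (dropped if age_ms >= timeout_ms else kept).append(frame_id)
    return kept, dropped


queued = [("a", 0), ("b", 300), ("c", 450)]
print(congestion_drop(queued, now_ms=520))                  # default 500 ms
print(congestion_drop(queued, now_ms=520, timeout_ms=200))  # lowered timeout
```

Lowering the timeout drops frame "b" as well, which frees the buffer sooner at the cost of more aborted exchanges.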
Congestion Drop Frames (Enhanced!)
[Figure: timeline from 0 sec to 500ms. Frames arrive and are queued; the timestamp of each frame is checked, and frames that reach the 500ms timeout are dropped from the queue.]
MDS Slow Drain Features
• Frames normally queued for Congestion Drop time
• Optionally, frames can be dropped immediately if the egress port is at 0 Tx B2B credits for a specified time
• Frees up frames queued at ISLs destined for slow/stuck ports quicker
• Helps unrelated devices in the presence of congestion
• Configured 1ms-500ms in 1ms intervals
• Done by hardware at exact time
Stuck port / No-credit-drop frames (Enhanced!)
[Figure: timeline with no-credit-drop set to 300ms. The port is at 0 Tx credits from 0 sec. At 300ms the frames in the Rx queue from other ports are dropped, and any newly arriving frames are dropped immediately. Once a credit arrives, sending resumes.]
MDS Slow Drain Features
• MDS can show ports that have frames queued and the destination (egress) port(s) they are queued for
• Instantaneous (real time) only
• Helpful when other indications are not showing clear indications
• DI – Destination Index – This is an internal representation of the port
Display ingress queuing
[Figure: an ingress port with VOQs; each VOQ holds frames for one destination index (DI), which corresponds to an egress port.]
MDS Slow Drain Features
• MDS 9710/9396S has the capability of displaying some key packet info for packets that have experienced a timeout drop
• 32 packets are kept per forwarding instance
• Output contains:
• Source FCID (SID)
• Destination FCID(DID)
• RCTL – Routing control (ELS, ABTS, etc.)
• Source Index(SI)
• Destination Index(DI)
Display dropped packet info
MDS Slow Drain Features
• MDS can monitor ports withholding credits for as low as 1ms
• Records last 10 events for duration and date/time when occurred
• Included in OBFL
• Full featured for MDS 9700, 9396S, 9148S and 9250i
• Gen 3 has similar but only records 1 event per 100ms cycle
• Gen 4 records total wait time in 100ms
Slowport-monitor (New!)
system timeout slowport-monitor 5 mode f
[Figure: timeline at 0 Tx credits from 0 sec, with marks at 5ms and 10ms; the R_RDY arrives after the 5ms threshold has passed, so an event is recorded.]
Interface Delay Timestamp
fc1/13 11ms 03/27/2015 12:01:00
fc1/13 8ms 03/27/2015 14:09:45
MDS Slow Drain Features
• Each linecard logs significant events to an NVRAM buffer
• Events are time stamped
• Events can be displayed by date/time
• Show logging onboard <module x> <starttime mm/dd/yy-hh:mm:ss>
• error-stats
• flow-control request-timeout
• flow-control timeout-drops
• slowport-monitor-events
• txwait
On Board Failure Logging (OBFL) (New!)
[Figure: Line Card 1, Line Card 2 through Line Card n each keep their own OBFL log in NVRAM, containing error-stats, flow-control timeouts, request-timeouts, slowport-monitor-events and txwait.]
MDS Slow Drain Features
• Allows alerting on many slow drain indications
• Three new counters!
• Optional portguard action allows either port to be flapped or error-disabled
• Different policies for E / F ports
Port-monitor / Portguard (New!)
[Figure: with port-monitor active, the switch sends SNMP alerts to the DCNM server for these counters: link-loss, credit-loss, tx-credit-not-avail, slowport-count, slowport-oper-delay, txwait.]
MDS Slow Drain Features
• Displays graphical history of TxWait (credit not available)
• Shows
• Last minute
• Last 60 minutes
• Last 72 hours
txwait-history
MDS# show process creditmon txwait-history port 13
TxWait history for port fc1/13:
==============================
79998 79993 999999
08887 58882 9899999
000000000000299870000000000000000029994000000000000362999500
1000 ### ### ######
900 #### ### ######
800 #### #### ######
700 ##### #### ######
600 ##### #### ######
500 ##### #### ######
400 ##### #### ######
300 ##### ##### ######
200 ##### ##### ######
100 ##### ##### #######
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
Credit Not Available per second (last 60 seconds)
# = TxWait (ms)
MDS Slow Drain Features
• show interface counters command
• Provides a quick way to check for problems
• Available for:
• MDS 9500 (Gen4 only)
• MDS 9700 (Gen5)
• MDS 9396S (Gen5)
• MDS 9148S
• MDS 9250i
Percentage Tx credits are available for last 1s/1m/1h/72h New!
MDS Slow Drain Features
• New “show tech-support” flavor
• Contains all the commands necessary to troubleshoot SAN congestion issues
• Best when issued against the entire fabric via DCNM
show tech-support slowdrain New!
Troubleshooting Slow Drain
• Classifying Slow Drain Symptoms
• Methodology
• Level by Level Troubleshooting
Troubleshooting Slow Drain: Classifying Slow Drain Symptoms
Levels of Performance Degradation
Level  Host Symptoms                Default Switch Behavior
1      Latency                      Frame queuing
2      SCSI errors/retransmission   Frame dropping
3      Extreme Delay                Links failing/reset
Note: Each level includes all the symptoms of the previous levels
Troubleshooting Slow Drain
• Latency indicates SCSI exchanges are taking longer than normal
• No SCSI errors or retransmissions are noted
• Subtle and difficult to detect
• ISLs and other ports should be checked for low numbers of Tx/Rx remaining credits
• Use new slowport-monitor, OBFL txwait, txwait-history and alerting capabilities
Classifying Slow Drain Symptoms - Level 1: Latency
Troubleshooting Slow Drain
• Once any frame in a SCSI exchange is dropped the exchange will be aborted
• Aborted exchanges will be listed in host logs
• Frames are held for a maximum of 500ms prior to dropping as timeouts
• This is the default Congestion Drop value
• Frames can also be dropped as timeouts if no-credit-drop is configured
• Use “show logging onboard starttime <date-time> error-stats”
Classifying Slow Drain Symptoms - Level 2: Retransmission
Troubleshooting Slow Drain
• Typically caused by ports without credits for 1 or 1.5 seconds
• Credit-loss Recovery is invoked
• Links may fail and/or flap
• Typically many timeout drops are also recorded
Classifying Slow Drain Symptoms - Level 3: Extreme Delay
Troubleshooting Slow Drain
• Cisco recommends troubleshooting slow drain in the following order:
Methodology
Level 3: Extreme Delay
Level 2: Retransmission
Level 1: Latency
Troubleshooting Slow Drain
• If Rx congestion then find ports communicating with this port that have Tx congestion
• Zoning defines which devices communicate with this port
• Understand topology
• If port communicating with port showing Rx congestion is FCIP
• Check for TCP retransmits
• Check for overutilization of FCIP
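Methodology – Follow Congestion to Source
[Figure: an F port with 0 Rx credits remaining and an E port on the same switch with 0 Tx credits remaining; the congestion is followed from the Rx side to the Tx side.]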
Troubleshooting Slow Drain
• If Tx congestion found
• If F port then device attached is slow drain device
• If E port then go to adjacent switch and continue troubleshooting
• Continue to track through the fabric until destination F-port is discovered
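Methodology - Follow Congestion to Source
[Figure: F port, E port, ISL, E port, F port. Starting from the port with 0 Rx credits remaining, the congestion is followed across the E ports to the F port with 0 Tx credits remaining.]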
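The "follow congestion to source" procedure is a simple hop-by-hop walk. The sketch below models it over a hand-built table; the switch names, port names and the "next" field (the Tx-congested port the operator finds on the adjacent switch) are all hypothetical and would come from manual inspection of each switch.

```python
def trace_to_slow_device(start, fabric):
    """Follow Tx congestion from `start` until an F port is reached.

    fabric maps (switch, port) -> {"type": "E" or "F", "next": (switch, port) or None}.
    """
    path, seen, current = [start], {start}, start
    while fabric[current]["type"] == "E":
        nxt = fabric[current]["next"]
        if nxt is None or nxt in seen:
            raise ValueError(f"cannot follow congestion past {current}")
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


# Hypothetical three-switch fabric.
FABRIC = {
    ("sw1", "fc1/1"): {"type": "E", "next": ("sw2", "fc2/5")},
    ("sw2", "fc2/5"): {"type": "E", "next": ("sw3", "fc1/13")},
    ("sw3", "fc1/13"): {"type": "F", "next": None},
}

path = trace_to_slow_device(("sw1", "fc1/1"), FABRIC)
print(path[-1])  # the F port whose attached device is the slow drain device
```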
Level 3: Extreme Delay - Troubleshooting
• Supervisor command on all platforms
• Module command also available
• Credit loss recovery events are the most severe slow drain indications
• Check/change cables/SFPs/HBAs
• Show logging onboard error-stats also contains this
Check for credit loss recovery
MDS9710-1# show process creditmon credit-loss-events
Module: 01 Credit Loss Events: YES
----------------------------------------------------
| Interface | Total | Timestamp |
| | Events | |
----------------------------------------------------
| fc1/13 | 11524 | 1. Sat Mar 29 14:21:48 2014 |
| | | 2. Sat Mar 29 14:21:47 2014 |
| | | 3. Sat Mar 29 14:21:46 2014 |
| | | 4. Sat Mar 29 14:21:45 2014 |
| | | 8. Sat Mar 29 14:21:41 2014 |
| | | 9. Sat Mar 29 14:21:40 2014 |
| | |10. Sat Mar 29 14:21:39 2014 |
----------------------------------------------------
Level 3: Extreme Delay - Troubleshooting
• Two places to check:
1. Module link-events
2. Logging log
• Both indicate the same thing: Rx congestion
• The problem is normally not with this port but with the port it is switching packets to
• If multiple ports fail at similar times, they are likely switching to the same port
Check for LR Rcvd B2B
MDS9710-1# slot 1 show port-config internal link-events
*************** Port Config Link Events Log ***************
---- ------ ----- ----- ------
Time PortNo Speed Event Reason
---- ------ ----- ----- ------
...
Jul 28 00:46:39 2012 00670297 fc1/25 --- DOWN LR Rcvd B2B
MDS9710-1# show logging log
%PORT-2-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link failure)
%PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link failure Link Reset failed nonempty recv queue)
Level 2: Retransmission - Troubleshooting
• TIMEOUT drops are drops for packets that hit either the Congestion Drop or No-Credit-Drop thresholds
• They are normally counted several different ways
• Reference appendix for counter names and definitions
Check for Transmit Frame Drops
MDS9710-1# show hard internal statistics module 1 pktflow dropped
Hardware statistics on module 01:
|------------------------------------------------------------------------|
| Device:Lightning Role:ARB-MUX Mod: 1 |
|------------------------------------------------------------------------|
|------------------------------------------------------------------------|
| Device:F16 Xbar Driver Role:FABRIC Mod: 1 |
|------------------------------------------------------------------------|
|------------------------------------------------------------------------|
| Device:F16 Que Driver Role:QUE Mod: 1 |
|------------------------------------------------------------------------|
|------------------------------------------------------------------------|
| Device:F16 Fwd Driver Role:L2 Mod: 1 |
|------------------------------------------------------------------------|
|------------------------------------------------------------------------|
| Device:F16 Mac Driver Role:FCMAC Mod: 1 |
|------------------------------------------------------------------------|
Instance:1
Cntr Name Value Ports
----- ----- ----- -----
0 F16_TMM_TIMEOUT_STATS_DROP 0000000000088775 13-16 -
1 F16_TMM_PORT_FRM_DROP_CNT 0000000000088775 13 -
2 F16_TMM_TOLB_TIMEOUT_DROP_CNT 0000000000088775 13 -
Level 2: Retransmission - Troubleshooting
• Counters are polled every 20 seconds
• When counter value changes it is included
• Several different counters are in error-stats:
• Timeout drops
• Credit loss recovery
• Tx/Rx credit not available(100ms)
• Force timeout on/off
Show logging onboard error-stats
mds9710-2# show logging onboard error-stats
----------------------------
Module: 1
----------------------------
--------------------------------------------------------------------------------
ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC
--------------------------------------------------------------------------------
Interface | | | Time Stamp
Range | Error Stat Counter Name | Count |MM/DD/YY HH:MM:SS
| | |
--------------------------------------------------------------------------------
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |242618 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |124 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |124 |04/14/14 12:17:58
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |201650 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |107 |04/14/14 12:17:38
242618 - 201650 = 40968 timeout drops in the last 20 seconds
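Since OBFL error-stats records a counter only when its value changes, the useful number is the difference between consecutive entries. A minimal sketch of that subtraction follows, with the rows above typed in by hand as tuples; parsing the raw command output is left out, and the function name is illustrative.

```python
from collections import defaultdict

# (interface, counter name, count, timestamp), newest first, as OBFL prints them.
ROWS = [
    ("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT", 242618, "04/14/14 12:17:58"),
    ("fc1/13", "FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO", 124, "04/14/14 12:17:58"),
    ("fc1/13", "FCP_SW_CNTR_CREDIT_LOSS", 124, "04/14/14 12:17:58"),
    ("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT", 201650, "04/14/14 12:17:38"),
    ("fc1/13", "FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO", 108, "04/14/14 12:17:38"),
    ("fc1/13", "FCP_SW_CNTR_CREDIT_LOSS", 107, "04/14/14 12:17:38"),
]


def latest_deltas(rows):
    """Return {(interface, counter): newest count minus the previous count}."""
    history = defaultdict(list)
    for interface, counter, count, _timestamp in rows:
        history[(interface, counter)].append(count)
    return {key: counts[0] - counts[1]
            for key, counts in history.items() if len(counts) >= 2}


for key, delta in latest_deltas(ROWS).items():
    print(key, delta)
```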
Level 2: Retransmission - Troubleshooting
• MDS 9710 and 9396S maintain a FIFO list of the last 32 dropped packets
• Display is per instance(8 ports)
• These contain:
• Source FCID (SID)
• Destination FCID(DID)
• RCTL – Routing control (ELS, ABTS, etc.)
• Source Index(SI)
• Destination Index(DI)
• These are not necessarily the slow device! Could be a victim!
Display dropped packet information
module-1# show hardware internal fcmac inst 0 tmm_timeout_stat_buffer
Port Group num: 0 TMM TIMEOUT BUFFERS
---------------------------------------------
TO_RD:22 TO_WR:6 NUM PKTS:32
--------------------------------------------------------------
TMM TIMEOUT Packet :0
CHIPTIME :14227(0x3793) ZERO:0 FCTYPE:0
SID:330040 DID:170040 RCTL:0
TSTMP_VALID:1 HDRTSTMP:14176(0x3760) HDRCTL:6144 SI:12
DI:2 AT:0 PORTNUM:1
TMM TIMEOUT Packet :1
CHIPTIME :14227(0x3793) ZERO:0 FCTYPE:0
SID:330040 DID:170040 RCTL:0
TSTMP_VALID:1 HDRTSTMP:14176(0x3760) HDRCTL:6144 SI:12
DI:2 AT:0 PORTNUM:1
MDS9710-2# show system internal fcfwd idxmap port-to-interface
Port to Interface Table:(All values in hex)
--------------------------------------------------------------------------------
glob| |VL|lcl| if |slot|port| mts | port| flags
idx | if_index | |idx|type| | | node| mode|
-----|--------------------------|--|---|----|----|----|-----|-----|-------------
0| 01000000 fc1/1 | 0| 00| 01 | 00 | 00 | 0102| 08 | 00
1| 01001000 fc1/2 | 0| 01| 01 | 00 | 01 | 0102| 00 | 00
2| 01002000 fc1/3 | 0| 02| 01 | 00 | 02 | 0102| 00 | 00
<snip>
12| 01012000 fc1/13 | 0| 12| 01 | 00 | 12 | 0102| 00 | 00
Callouts: the table maps each index to the actual interface name. SI:12 is fc1/13 and DI:2 is fc1/3, so the output shows dropped packets from fc1/13 to fc1/3.
Level 1: Latency - Troubleshooting
• Indicates 100ms increments where Tx B2B credits were 0
• Percentages are of a 1-second interval, so 20% is 200ms
Credit Not Available
MDS9513# show system internal snmp credit-not-available
Module: 6 Number of events logged: 6
------------------------------------------------------------------------------------------
Port Threshold Rising/Falling Interval(s) Event Time Type Duration available
----------------------------------------------------------------------------------------------------------
fc6/32 10/0(%) 1 Wed Apr 2 17:23:54 2014 Rising 10%
fc6/32 10/0(%) 1 Wed Apr 2 17:24:39 2014 Falling 0%
fc6/32 10/0(%) 1 Wed Apr 2 17:24:40 2014 Rising 20%
fc6/32 10/0(%) 1 Wed Apr 2 17:25:53 2014 Falling 0%
fc6/32 10/0(%) 1 Wed Apr 2 17:25:54 2014 Rising 20%
Callouts: Rising 10% = 100ms Tx delay; Rising 20% = 200ms Tx delay
Level 1: Latency - Troubleshooting
• Included in OBFL error-stats
• Tracked in both Rx and Tx directions
• Indicates 100ms intervals where Tx or Rx credit is not available
Credit Not Available – continued
--------------------------------------------------------------------------------
ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC
--------------------------------------------------------------------------------
Interface | | | Time Stamp
Range | Error Stat Counter Name | Count |MM/DD/YY HH:MM:SS
| | |
--------------------------------------------------------------------------------
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1496855 |04/07/15 22:44:23
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |217 |04/07/15 22:44:23
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |19 |04/07/15 22:44:23
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |1486654 |04/07/15 22:44:03
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/07/15 22:44:03
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |9 |04/07/15 22:44:03
FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO (credit not available, 100ms increments): incremented by 217 - 108 = 109 in 20 seconds
FCP_SW_CNTR_CREDIT_LOSS (credit loss): incremented by 19 - 9 = 10 in 20 seconds
Level 1: Latency - Troubleshooting
• Data frames are sent using low priority credits
• If Tx B2B credit remaining is low then congestion is toward the adjacent switch
• If Rx B2B credit remaining is low then congestion is in this switch and perhaps other switches
• Can only be done while congestion is in progress
Check ISLs for Lack of Transmit Credits
MDS9710# show interface | include "fc|Belong|low priority|remain" | exclude "description" | exclude "Peer" | include "trunking" next 3
fc1/3 is trunking
500 receive B2B credit remaining
0 transmit B2B credit remaining
0 low priority transmit B2B credit remaining
Level-1 Troubleshooting: Latency
• Transitions to zero indicate when Tx or Rx B2B credits go to zero even just for an instant of time
• Transmit indicates the adjacent device is withholding credits
• Receive indicates this MDS is withholding credits from the adjacent device
• Look for large incrementing numbers since some devices go to zero normally
Check Transitions to Zero counters
9710-1# show int fc1/13 counters
fc1/13
…
549317 Transmit B2B credit transitions to zero
2388296 Receive B2B credit transitions to zero
1934443328 2.5us TxWait due to lack of transmit credits
Percentage Tx credits not available for last 1s/1m/1h/72h: 0%/0%/98%/1%
32 receive B2B credit remaining
17 transmit B2B credit remaining
17 low priority transmit B2B credit remaining
Last clearing of "show interface" counters 01:25:25
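The TxWait counter in this output is in units of 2.5us, so it can be converted to wall-clock time at zero credits. The sketch below does that conversion and compares it with the time since the counters were cleared; treating the "Last clearing" value 01:25:25 as an elapsed hh:mm:ss duration is an assumption of this sketch, and the function names are illustrative.

```python
TXWAIT_TICK_SECONDS = 2.5e-6  # one TxWait count = 2.5 microseconds


def txwait_seconds(ticks):
    """Convert a TxWait counter value to seconds spent waiting for Tx credits."""
    return ticks * TXWAIT_TICK_SECONDS


def hms_to_seconds(hms):
    """Convert an hh:mm:ss string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


waited = txwait_seconds(1934443328)
elapsed = hms_to_seconds("01:25:25")
print(f"{waited:.1f} s at zero Tx credits out of {elapsed} s "
      f"({100 * waited / elapsed:.1f}% of the time)")
```

A port that spends most of the measurement window waiting for credits is a far stronger slow drain indication than a large transitions-to-zero count.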
Level-1 Troubleshooting: Latency
• “Prio 3” is class 3
• 000004 is port bitmap in hexadecimal indicating the presence of one or more queued frames
• B’0000 0000 0000 0000 0000 0100’
• GI (Hex) is Global Index (egress port)
Check for Frame Queuing on Ingress Ports
module-1# show hardware internal f16_que inst 0 table iqm-statusmem0
+-------------------------------------------------------------------------------
| IQM: PG0 Status Memory (logical layout) for F16 Que Driver
| Inst 0; port(s) 1-8
|
Note: Only non-zero entries are displayed
Each non-zero bit indicates pending frame in VOQ for that IB
+----------+--------+--------+--------+--------+
| GI (Hex) | Prio 0 | Prio 1 | Prio 2 | Prio 3 |
+----------+--------+--------+--------+--------+
| c | 000000 | 000000 | 000000 | 000004 |
+----------+--------+--------+--------+--------+
rtp-san-33-18-9710-2# show system internal fcfwd idxmap port-to-interface
Port to Interface Table:(All values in hex)
--------------------------------------------------------------------------------
glob| |VL|lcl| if |slot|port| mts | port| flags
idx | if_index | |idx|type| | | node| mode|
-----|--------------------------|--|---|----|----|----|-----|-----|-------------
0| 01000000 fc1/1 | 0| 00| 01 | 00 | 00 | 0102| 00 | 00
1| 01001000 fc1/2 | 0| 01| 01 | 00 | 01 | 0102| 00 | 00
…snip
b| 0100b000 fc1/12 | 0| 0b| 01 | 00 | 0b | 0102| 00 | 00
c| 0100c000 fc1/13 | 0| 0c| 01 | 00 | 0c | 0102| 00 | 00
Bitmap bits (right to left): Port 1, Port 2, Port 3
Port fc1/3 (ingress)
Egress port fc1/13
Each instance is 8 ports on this LC
Input interface fc1/3 has frame(s) queued for fc1/13
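The bitmap decoding described above can be sketched in a few lines. This is an illustration only: it assumes bit 0 of the bitmap maps to the first port of the instance (port 1 for instance 0), which matches the slide example where 000004 corresponds to fc1/3. The function name and the `first_port` parameter are hypothetical.

```python
def ports_with_queued_frames(bitmap_hex, first_port=1):
    """Decode a hex port bitmap into the list of ingress port numbers
    that have at least one frame queued (one bit per port)."""
    bitmap = int(bitmap_hex, 16)
    ports = []
    bit = 0
    while bitmap >> bit:
        if (bitmap >> bit) & 1:
            # Assumed mapping: bit 0 -> first_port, bit 1 -> first_port + 1, ...
            ports.append(first_port + bit)
        bit += 1
    return ports

print(ports_with_queued_frames("000004"))
```

For the entry in the table above (GI c, Prio 3, 000004) this yields port 3, that is, fc1/3 has frames queued toward the egress port with global index c (fc1/13).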
Level-1 Troubleshooting: Latency
• For generation 3:
• slot x show hardware internal up-xbar <0-1> queued-packet-info
• For generation 4:
• slot x show hardware internal que inst <0-3> memory iqm-statusmem0|1
• For generation 5/9396S:
• slot x show hardware internal f16_que inst 0 table iqm-statusmem0
• 9148, 9250i & 9148S – Not available
• Each instance is a defined number of ports and is LC specific
• Issue the command several times and look for a GI that appears repeatedly. That GI is the slow port.
• Real time (instantaneous)
Check for Frame Queuing on Ingress Ports - continued
Level-1 Troubleshooting: Latency
• Request-timeouts indicate frames that could not immediately be sent to “Dest Intf” (egress port -slow)
• Do not indicate actual packet drops – just delayed
• If Dest Intf is FCIP then there are problems on the FCIP tunnel
• Check for TCP retransmits
• Check for overutilization of FCIP
Check for Arbitration Timeouts
MDS9513# show logging onboard flow-control request-timeout
----------------------------
Module: 9
----------------------------
--------------------------------------------------------------------------------
| Dest | Source |Events| Timestamp | Timestamp |
| Intf | Intf | Count| Earliest | Latest |
--------------------------------------------------------------------------------
|fc1/2 |fc9/24, | 28|Sun Feb 9 00:28:23 2014|Sun Feb 9 00:28:24 2014|
--------------------------------------------------------------------------------
Level-1 Troubleshooting: Latency
• txwait is a counter that increments every 2.5us when port is at 0 Tx credits and there are frames queued for transmit
• txwait * 2.5 / 1000000 = seconds of time the port was unable to transmit
• Only applies to the following:
• MDS 9500 with generation 4 linecards:
• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)
• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)
• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)
• MDS 9148S 16G Multilayer Fabric Switch
• MDS 9250i Multiservice Fabric Switch
• MDS 9396S 16G Multilayer Fabric Switch
• Others will return zero
txwait New!
Level-1 Troubleshooting: Latency
• txwait can be seen in the following:
• show interface counters
• Raw value in 2.5us units
• Percentage Tx credits not available for last 1s/1m/1h/72h
• show process creditmon txwait-history
• 60sec, 60min, 72hour graphs
• show logging onboard txwait
• SNMP fcIfTxWaitCount variable
txwait - continued New!
Level-1 Troubleshooting: Latency
mds9710-1# show interface fc1/13 counters | i fc|wait
fc1/13
6252650 2.5us Txwaits due to lack of transmit credits
6252650 * 2.5 / 1000000 = 15.631625 seconds
The above indicates the MDS was unable to transmit for over 15 seconds since the counters were last cleared
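The conversion above is simple enough to script when many interfaces need checking. A minimal sketch, using the 2.5us tick size stated on the slide; the function name is illustrative.

```python
TXWAIT_TICK_US = 2.5  # each txwait increment represents 2.5 microseconds

def txwait_seconds(ticks):
    """Convert a raw txwait counter value into seconds at 0 Tx credits."""
    return ticks * TXWAIT_TICK_US / 1_000_000

print(txwait_seconds(6252650))
```

The result is cumulative since the counters were last cleared, so compare it against the "Last clearing" time before judging severity.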
txwait - show interface counters New!
Level-1 Troubleshooting: Latency
• Utilizes the underlying txwait counter
txwait - Percentage Tx credits not available for last 1s/1m/1h/72h
MDS9710-1# show interface fc1/13 counters
fc1/13
…
5 Transmit B2B credit transitions to zero
2 Receive B2B credit transitions to zero
0 2.5us TxWait due to lack of transmit credits
Percentage Tx credits not available for last 1s/1m/1h/72h: 1%/5%/3%/2%
32 receive B2B credit remaining
128 transmit B2B credit remaining
128 low priority transmit B2B credit remaining
Level-1 Troubleshooting: Latency
MDS9513# show logging onboard txwait module 4
…
---------------------------------
Module: 4 txwait count
---------------------------------
Notes:
- Sampling period is 20 seconds
- Only txwait delta >= 100 ms are logged
-----------------------------------------------------------------------------
| Interface | Delta TxWait Time | Congestion | Timestamp |
| | 2.5us ticks | seconds | | |
-----------------------------------------------------------------------------
| fc4/1 | 52927 | 0 | 0% | Wed May 27 13:20:12 2015 |
| fc4/1 | 52926 | 0 | 0% | Wed May 27 13:19:52 2015 |
| fc4/1 | 105854 | 0 | 1% | Wed May 27 13:19:32 2015 |
| fc4/1 | 52926 | 0 | 0% | Wed May 27 13:19:12 2015 |
• Delta values are recorded when they are at least 100ms in the 20 second interval
txwait - show logging onboard txwait New!
Recorded every 20 seconds
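The columns in the onboard txwait log can be reproduced from the delta tick count. The sketch below assumes the 2.5us tick and the 20 second sampling period from the notes in the output, and assumes the congestion percentage is truncated to a whole number, which is an inference from the sample rows rather than a documented rule. The function names are illustrative.

```python
TICK_US = 2.5          # txwait tick size from the slide
SAMPLE_PERIOD_S = 20   # sampling period from the output notes
LOG_THRESHOLD_MS = 100 # only deltas of at least 100 ms are logged

def txwait_delta_summary(delta_ticks):
    """Return (wait in ms, congestion percent of the 20 s sample period)."""
    wait_ms = delta_ticks * TICK_US / 1000
    # Truncation to an integer percent is assumed from the sample output.
    congestion_pct = int(wait_ms / (SAMPLE_PERIOD_S * 1000) * 100)
    return wait_ms, congestion_pct

def is_logged(delta_ticks):
    """True when the delta meets the 100 ms logging threshold."""
    return delta_ticks * TICK_US / 1000 >= LOG_THRESHOLD_MS

for ticks in (52927, 105854):
    print(ticks, txwait_delta_summary(ticks), is_logged(ticks))
```

For the rows above, 52927 ticks is roughly 132ms (0%) and 105854 ticks is roughly 265ms (1%), which agrees with the logged congestion column.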
Level-1 Troubleshooting: Latency
• Graphical display of time where Tx credits are not available
• Similar in format to cpu history
• 3 graphs per port
• Last 60 seconds
• Last 60 minutes
• Last 72 hours
• Utilizes the underlying txwait counter
txwait-history
mds9710-1# show process creditmon txwait-history module 1 port 13
TxWait history for port fc1/13:
==============================
697 54 6994
299 18 4780
000000000000000000000000000000000029000290088400000000000000
1000 # ##
900 # ##
800 ## ##
700 ## ##
600 ### ###
500 ### ## ###
400 ### ## ####
300 ### ## ####
200 ### ## ####
100 ### ## ####
0....5....1....1....2....2....3....3....4....4....5....5....6
0 5 0 5 0 5 0 5 0 5 0
Credit Not Available per second (last 60 seconds)
# = TxWait (ms)
Level-1 Troubleshooting: Latency
• system timeout slowport-monitor <1-500> mode e|f
• Events are captured every 100ms
• Last 10 events per port captured in slowport-monitor-events
• Logging onboard slowport-monitor-events captures more events
• Currently implemented for:
• 9500
• Gen 3 LCs - DS-X9248-48K9 and DS-X92xx-96K9 modules
• Gen 4 LCs - DS-X9232-256K9 and DS-X9248-256K9 modules
• 9700 & 9396S (Gen 5)
• 9250i & 9148S
• Differences exist between Gen3, Gen4 and 9700/9250i/9148S/9396S
slowport-monitor New!
Level-1 Troubleshooting: Latency
• system timeout slowport-monitor… must be configured
• Events are captured every 100ms
• Last 10 events per port captured in slowport-monitor-events
• Differences exist between Gen3, Gen4 and 9700/9250i/9148S/9396S
show process creditmon slowport-monitor command New!
Level-1 Troubleshooting: Latency
• Gen3 modules have basic HW capabilities
• Every 100ms it can only be determined whether the port was at zero Tx credits for the admin delay period
• The actual amount of time and the number of times in that 100ms cannot be determined
• Recorded when at least one complete event occurred
• No oper delay
Slowport-monitor – Gen3 LCs - DS-X9248-48K9 and DS-X92xx-96K9 modules
mds9513# show process creditmon slowport-monitor-events module 2
Module: 02 Slowport Detected: YES
=========================================================================
Interface = fc2/1
--------------------------------------------------------
| admin | slowport | Timestamp |
| delay | detection | |
| (ms) | count | |
--------------------------------------------------------
| 10 | 194 | 1. 04/29/15 17:19:13.345 |
| 10 | 193 | 2. 04/29/15 17:19:13.245 |
| 10 | 192 | 3. 04/29/15 17:19:13.145 |
| 10 | 191 | 4. 04/29/15 17:19:13.045 |
| 10 | 190 | 5. 04/29/15 17:19:12.945 |
| 10 | 189 | 6. 04/29/15 17:19:12.845 |
| 10 | 188 | 7. 04/29/15 17:19:12.745 |
| 10 | 187 | 8. 04/29/15 17:19:12.645 |
| 10 | 186 | 9. 04/29/15 17:19:12.545 |
| 10 | 185 |10. 04/29/15 17:19:12.445 |
--------------------------------------------------------
Only 1 event in last 100ms
100ms intervals
Configured delay 10ms
Level-1 Troubleshooting: Latency
• Gen4 modules use txwait for slowport-monitor
• Recorded when txwait is >= admin delay within 100ms
• Oper delay is the cumulative txwait within the 100ms interval
• For example, with a 10ms admin delay, either 1 x 10ms or 10 x 1ms at 0 Tx credits records an event
Slowport-monitor – Gen4 LCs - DS-X92xx-256K9
MDS9513# show process creditmon slowport-monitor-events module 4
Module: 04 Slowport Detected: YES
==================================================================
Interface = fc4/1
----------------------------------------------------------------
| admin | slowport | txwait| Timestamp |
| delay | detection | oper | |
| (ms) | count | delay | |
| | | (ms) | |
----------------------------------------------------------------
| 10 | 18 | 16 | 1. 05/21/15 14:39:09.102 |
| 10 | 17 | 56 | 2. 05/21/15 14:39:09.002 |
| 10 | 16 | 59 | 3. 05/21/15 14:39:08.905 |
| 10 | 15 | 10 | 4. 05/21/15 14:38:54.590 |
| 10 | 14 | 41 | 5. 05/21/15 14:38:54.490 |
| 10 | 13 | 80 | 6. 05/21/15 14:38:54.390 |
| 10 | 12 | 37 | 7. 05/21/15 14:38:39.970 |
| 10 | 11 | 56 | 8. 05/21/15 14:38:39.870 |
| 10 | 10 | 34 | 9. 05/21/15 14:38:39.775 |
| 10 | 9 | 29 |10. 05/21/15 14:38:25.430 |
----------------------------------------------------------------
Only 1 event per 100ms
100ms intervals
Configured delay 10ms
Cumulative delay in 100ms
Level-1 Troubleshooting: Latency
• Gen5/9250i/9148S/9396S have enhanced HW capabilities
• Each 100ms interval the number of times Tx credits remained at 0 for the configured(admin) delay is counted.
• The average operational delay is determined – This is how long the port was at 0 Tx credits
• Recorded when at least one complete event occurred
slowport-monitor – 9700/9250i/9148S/9396S (Gen 5 LCs)
MDS9710-1# show process creditmon slowport-monitor-events
Module: 01 Slowport Detected: YES
=========================================================================
Interface = fc1/13
----------------------------------------------------------------
| admin | slowport | oper | Timestamp |
| delay | detection | delay | |
| (ms) | count | (ms) | |
----------------------------------------------------------------
| 5 | 1300 | 20 | 1. 04/01/15 23:03:38.823 |
| 5 | 1296 | 19 | 2. 04/01/15 23:03:38.724 |
| 5 | 1291 | 19 | 3. 04/01/15 23:03:38.623 |
…
| 5 | 1256 | 19 |10. 04/01/15 23:03:37.923 |
----------------------------------------------------------------
Configured delay (5ms)
Actual average delay
4 events in last 100ms
Note: Oper delay is limited by the no-credit-drop threshold
Level-1 Troubleshooting: Latency
slowport-monitor – Comparison
[Figure: Tx credits versus time (ms) over one 100ms polling interval, with "system timeout slowport-monitor 5" configured and 2 events (15ms and 30ms at 0 Tx credits) in the interval]
• 9500 Gen3: 0 Tx credits for >= 5ms in 100ms; 1 event logged
• 9500 Gen4: Total time 45ms >= 5ms in 100ms; 1 event logged
• Gen5/9250i/9148S/9396S: Oper delay (15ms + 30ms) / 2 = 22ms; 2 events logged
Level-1 Troubleshooting: Latency
More events available via logging onboard
MDS9710-1# show logging onboard slowport-monitor-events
…
---------------------------------
Module: 1 slowport-monitor-events
---------------------------------
--------------------------------------------------------------------------
| admin | slowport | oper | Timestamp | Interface
| delay | detection | delay | |
| (ms) | count | (ms) | |
--------------------------------------------------------------------------
| 20 | 49 | 489 | 05/11/15 21:04:46.779 | fc1/13
| 20 | 48 | 489 | 05/11/15 21:04:46.272 | fc1/13
| 20 | 47 | 489 | 05/11/15 21:04:45.779 | fc1/13
| 20 | 46 | 489 | 05/11/15 21:04:45.272 | fc1/13
show logging onboard slowport-monitor command New!
Gen5/9250i/9148S/9396S
Level-1 Troubleshooting: Latency
Linecard | Maximum events per 100ms interval | Actual delay measured? | Notes
DS-X9248-48K9 (gen3), DS-X9224-96K9 (gen3), DS-X9248-96K9 (gen3) | 1 | No. Just an indication if admin delay was reached. Actual delay could be much more | If actual delay hits slowport-monitor admin delay then an indication is made. That indication is checked every 100ms and if true an event is raised
DS-X9232-256K9 (gen4), DS-X9248-256K9 (gen4) | 1 | Yes. Actual delay is total delay per 100ms interval | If total delay (sum of all individual delays) in 100ms interval hits slowport-monitor admin delay then an event is raised
DS-X9448-768K9 (gen5), MDS 9396S (gen5), MDS 9148S, MDS 9250i | 100 | Yes. Average delay for all events in 100ms interval | If actual delay hits slowport-monitor admin delay and port recovered (received credit) then an event is raised. These are checked every 100ms interval
Slowport-monitor – Comparison
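The differences between the generations can be made concrete with a small model. This is a simplified sketch of the reporting rules in the comparison above, not the hardware implementation: each function takes the list of zero-credit durations (ms) seen in one 100ms interval, and the integer truncation of the Gen5 average is an assumption taken from the slide example (15ms and 30ms reported as 22ms). All function names are illustrative.

```python
def gen3_report(zero_credit_ms, admin_delay_ms):
    """Gen3: at most 1 event per interval, no measured delay."""
    hit = any(d >= admin_delay_ms for d in zero_credit_ms)
    return {"events": 1 if hit else 0, "oper_delay_ms": None}

def gen4_report(zero_credit_ms, admin_delay_ms):
    """Gen4: at most 1 event; oper delay is the total txwait in the interval."""
    total = sum(zero_credit_ms)
    if total >= admin_delay_ms:
        return {"events": 1, "oper_delay_ms": total}
    return {"events": 0, "oper_delay_ms": None}

def gen5_report(zero_credit_ms, admin_delay_ms, max_events=100):
    """Gen5/9250i/9148S/9396S: one event per qualifying delay (up to 100),
    oper delay is the average of those delays."""
    hits = [d for d in zero_credit_ms if d >= admin_delay_ms][:max_events]
    if not hits:
        return {"events": 0, "oper_delay_ms": None}
    # Integer truncation is assumed from the slide example (22, not 22.5).
    return {"events": len(hits), "oper_delay_ms": sum(hits) // len(hits)}

interval = [15, 30]  # two zero-credit periods within one 100 ms interval
for report in (gen3_report, gen4_report, gen5_report):
    print(report.__name__, report(interval, admin_delay_ms=5))
```

With a 5ms admin delay and the two delays from the comparison figure, the model gives 1 event with no delay on Gen3, 1 event with 45ms on Gen4, and 2 events with 22ms on Gen5.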
Level-1 Troubleshooting: Latency
• Contains all the commands available that pertain to slow drain
• Contains “context” commands to understand the FC topology
• Contains name server commands to identify devices
• Contains active zonesets to understand device relationships
• Most useful when run from DCNM and gathered for the entire fabric
• SAN Client -> Tools -> Run CLI Commands…
show tech-support slowdrain New!
DCNM Slow Drain Analysis
DCNM Slow Drain Analysis
• DCNM 7.1(1) added Slow Drain Analysis
• Used for pulling fabric wide slow drain counters for a defined period of time
• Useful for ongoing slow drain problems
• Accessed from the Web Client Health -> Diagnostics -> Slow drain Analysis
DCNM Slow Drain Analysis
Starting Slow Drain Analysis
• 3 steps to initiate collection of slow drain counters for a fabric
• Step 1: Select fabric
• Step 2: Choose duration
• Step 3: Start collection
• While underway, the job shows its progress (screenshot callout: "Almost finished")
• When finished, select the job to open the report
DCNM Slow Drain Analysis
Completed Report
• Example report: 509 credit loss events in 10 minutes!
• Only show rows with non-zero counters (in the example, only 3 rows have non-zero counters)
• Filter results as needed
• Hover over a counter for additional information
Slow Drain Alerting and Mitigation
Slow Drain Alerting and Mitigation
• Port Monitor
• Congestion counters
• Portguard
• Adjust Congestion Drop Threshold
• Setting the No Credit Drop Threshold
Slow Drain Alerting and Mitigation
• Port-monitor allows monitoring of several counters relating to slow drain
• credit-loss-reco Credit loss recovery counter
• lr-rx The number of link resets received by the fc-port
• lr-tx Link resets transmitted by the fc-port
• timeout-discards Timeout discards counter
• tx-credit-not-available Credit not available counter(in 100ms increments)
• tx-discards Tx discards counter
• slowport-count Number of slowport events
• slowport-oper-delay Slowport operational delay
• txwait Amount of time at 0 Tx credits and packets queued
Port-monitor alerting
Note: There are other counters that are valuable and should also be considered for inclusion in monitoring but are not part of slow drain
Slow Drain Alerting and Mitigation
• Number of times credit loss recovery was initiated due to a port being at 0 Tx credits for 1 second (F ports) or 1.5 seconds (E ports)
• Most severe indication of congestion
• Normally other counters like timeout-discards will also increment
• Configure as a simple delta counter with a low value
• Applies to all types of switches and linecards
Port-monitor counter - credit-loss-reco
Slow Drain Alerting and Mitigation
• Number of times a Link Reset(LR) was received(lr-rx)
• Number of times a Link Reset(LR) was transmitted(lr-tx)
• Similar to credit-loss-reco counter
• May increment for other reasons besides congestion
• Normally other counters like timeout-discards will also increment
• Configure as a simple delta counter with a low value
• Applies to all types of switches and linecards
Port-monitor counter - lr-rx and lr-tx
Slow Drain Alerting and Mitigation
• Number of packets dropped due to reaching the congestion-drop (timeout) threshold
• When packets are dropped SCSI errors will result at the hosts and targets
• Configure as a simple delta counter with a low value
• Applies to all types of switches and linecards
Port-monitor counter - timeout-discards
Slow Drain Alerting and Mitigation
• Indicates 100ms intervals of a port at 0 Tx credits
• rising-threshold is configured as a percentage of the polling interval (1 second)
• Examples:
• counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
• 10 is 10% of 1 second or 100ms
• counter tx-credit-not-available poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4
• 20 is 20% of 1 second or 200ms
• Only multiples of 10 (10, 20, 30, etc…) should be configured
• Applies to all types of switches and linecards
Port-monitor counter - tx-credit-not-available
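The percentage-to-time relationship above can be checked with a short helper. A minimal sketch: the function name is illustrative, and the multiple-of-10 check simply encodes the recommendation on this slide (the counter advances in 100ms steps).

```python
def credit_not_available_threshold_ms(rising_threshold_pct, poll_interval_s=1):
    """Translate a tx-credit-not-available rising-threshold (percent of the
    poll interval) into milliseconds at 0 Tx credits."""
    if not 0 < rising_threshold_pct <= 100:
        raise ValueError("threshold must be between 1 and 100 percent")
    if rising_threshold_pct % 10 != 0:
        # The slide recommends multiples of 10 only.
        raise ValueError("only multiples of 10 should be configured")
    return rising_threshold_pct * poll_interval_s * 1000 // 100

print(credit_not_available_threshold_ms(10))
print(credit_not_available_threshold_ms(20))
```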
Slow Drain Alerting and Mitigation
• The number of packets dropped at egress for a variety of reasons.
• This counter includes timeout-discards as well
• Configure as a simple delta counter with a low value
• Applies to all types of switches and linecards
Port-monitor counter - tx-discards
Slow Drain Alerting and Mitigation
• Counts the number of times the slowport-monitor threshold was reached
• Only applies to MDS 9500 with generation 3 linecards
• 1/2/4/8 Gbps 24-Port Fibre Channel switching module (DS-X9224-96K9)
• 1/2/4/8 Gbps 48-Port Fibre Channel switching module (DS-X9248-96K9)
• 1/2/4/8 Gbps 4/44-Port Fibre Channel switching module (DS-X9248-48K9)
• Only counts a maximum of once per 100ms interval (10 per second)
• Indicates 0 Tx credits for at least the slowport-monitor interval
• Slowport-monitor must be configured for this to alert
• Refer to gen3 slowport-monitor section for more info
Port-monitor counter - Slowport-count New!
Slow Drain Alerting and Mitigation
• Alerts on slowport operational(actual) delay
• Only applies to the following
• MDS 9500 with generation 4 linecards
• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)
• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)
• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)
• MDS 9148S 16G Multilayer Fabric Switch
• MDS 9250i Multiservice Fabric Switch
• MDS 9396S 16G Multilayer Fabric Switch
• Alerts on operational(actual) delay not on the admin(configured) delay
Port-monitor counter - slowport-oper-delay New!
Slow Drain Alerting and Mitigation
• Configured as an absolute counter
• Slowport-monitor must be configured for this to alert!
• Refer to Gen4 slowport-monitor section for more info
• Refer to Gen5/9250i/9148S/9396S slowport-monitor section for more info
Port-monitor counter - slowport-oper-delay - continued New!
Slow Drain Alerting and Mitigation
• Measures time port is at 0 Tx credits and frames are queued to send
• Only applies to the following
• MDS 9500 with generation 4 linecards
• MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)
• MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)
• MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)
• MDS 9148S 16G Multilayer Fabric Switch
• MDS 9250i Multiservice Fabric Switch
• MDS 9396S 16G Multilayer Fabric Switch
• Configured as a percentage of the polling interval
Port-monitor counter - txwait New!
Slow Drain Alerting and Mitigation
Linecard | slowport-count | slowport-oper-delay | txwait
DS-X9248-48K9 (gen3) | X | - | -
DS-X9224-96K9 (gen3) | X | - | -
DS-X9248-96K9 (gen3) | X | - | -
DS-X9232-256K9 (gen4) | - | X | X
DS-X9248-256K9 (gen4) | - | X | X
DS-X9448-768K9 (gen5) | - | X | X
MDS 9148S | - | X | X
MDS 9250i | - | X | X
MDS 9396S | - | X | X
Port-monitor slowport counters - comparison
Slow Drain Alerting and Mitigation
• Port-monitor allows separate policies
• F, FL ports(access)
• E, TL ports(trunks)
• Both F ports and E ports
• Only one policy type per port can be active at a time
• Note: port-type access includes F port connections to NPV switches that can carry several logins
• Note: NP ports are not currently monitored
Port-monitor alerting
MDS9513(config-port-monitor)# port-type ?
access-port Configure port-monitoring for access ports
all Configure port-monitoring for all ports
trunks Configure port-monitoring for trunk ports
Slow Drain Alerting and Mitigation
• counter <name> poll-interval <interval> {delta | absolute} rising-threshold <rthresh> event <id> falling-threshold <fthresh> event <id> [portguard {errordisable | flap}]
• poll-interval – Seconds - How often should this counter be checked?
• delta – Compare the current value with the value at the previous poll interval
• absolute – Match the actual value
• rising-threshold – How much the counter must increase in this poll interval to trigger
• event – Indicates severity of alert - info, warning, error, etc.
• falling-threshold - How much the counter must decrease in this poll interval to reset
• portguard – Optional – Action to take when rising-threshold is reached
• errordisable – Place port in error-disable state. Requires manual shut/no shut to re-activate
• flap – shut/no shut port
Port-monitor alerting - continued
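How the rising and falling thresholds interact for a delta counter can be sketched as follows. This is an illustration under a stated assumption: it models RMON-style hysteresis, where a rising event fires once and is not repeated until a poll interval's delta has dropped to the falling-threshold or below. The function name and the sample counter values are hypothetical.

```python
def threshold_events(samples, rising, falling):
    """samples: cumulative counter values, one per poll interval.
    Returns a list of (poll number, event type, delta)."""
    if not samples:
        return []
    events = []
    armed = True  # a rising event may fire only while armed
    previous = samples[0]
    for poll, current in enumerate(samples[1:], start=1):
        delta = current - previous  # "delta" mode: change since last poll
        previous = current
        if armed and delta >= rising:
            events.append((poll, "rising", delta))
            armed = False
        elif not armed and delta <= falling:
            events.append((poll, "falling", delta))
            armed = True
    return events

# Hypothetical credit-loss-reco readings, rising-threshold 1, falling-threshold 0.
print(threshold_events([0, 0, 1, 3, 3, 3], rising=1, falling=0))
```

In this example the rising event fires at poll 2, no second alert is raised at poll 3 even though the counter keeps increasing, and the falling event at poll 4 re-arms the alert.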
Slow Drain Alerting and Mitigation
• Monitor-counter command determines which counters are active in a policy
Port-monitor alerting – continued
rtp-san-33-18-9710-1(config-port-monitor)# monitor counter ?
credit-loss-reco Configure credit loss recovery counter
err-pkt-from-port Configure err-pkt-from-port counter
err-pkt-from-xbar Configure err-pkt-from-xbar counter
err-pkt-to-xbar Configure err-pkt-to-xbar counter
invalid-crc Configure invalid-crc counter
invalid-words Configure invalid-words counter
link-loss Configure link-failure counter
lr-rx Configure the number of link resets received by the fc-port
lr-tx Configure the number of link resets transmitted by the fc-port
rx-datarate Configure rx performance counter
signal-loss Configure signal-loss counter
slowport-count Configure slow port sub-100ms counter
slowport-oper-delay Configure slow port operation delay
sync-loss Configure sync-loss counter
timeout-discards Configure timeout discards counter
tx-credit-not-available Configure credit not available counter
tx-datarate Configure tx performance counter
tx-discards Configure tx discards counter
txwait Configure tx total wait counter
Slow Drain Alerting and Mitigation
• Event indicates severity in alert
• 1 – Fatal
• 2 – Critical
• 3 – Error
• 4 – Warning
• 5 - Informational
Port-monitor alerting – RMON event severities
mds9513(config-port-monitor)# show rmon events
Event 1 is active, owned by PMON@FATAL
Description is FATAL(1)
Event firing causes log and trap to community public, last fired never
Event 2 is active, owned by PMON@CRITICAL
Description is CRITICAL(2)
Event firing causes log and trap to community public, last fired never
Event 3 is active, owned by PMON@ERROR
Description is ERROR(3)
Event firing causes log and trap to community public, last fired never
Event 4 is active, owned by PMON@WARNING
Description is WARNING(4)
Event firing causes log and trap to community public, last fired
2014/02/21-17:13:11
Event 5 is active, owned by PMON@INFO
Description is INFORMATION(5)
Event firing causes log and trap to community public, last fired
2014/03/08-08:25:19
Slow Drain Alerting and Mitigation
Port-monitor alerting – Example
port-monitor name AllPorts
port-type all
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4
counter slowport-oper-delay poll-interval 1 absolute rising-threshold 50 event 4 falling-threshold 0 event 4
counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4
Policy applies to Access(F) and Trunk(E) ports
These counters are not monitored
Note: The above monitors 9 slow drain counters and does not monitor 10 others
Slow Drain Alerting and Mitigation
MDS9710-1# show port-monitor AllPorts
Policy Name : AllPorts
Admin status : Not Active
Oper status : Not Active
Port type : All Ports
---------------------------------------------------------------------------------------------------------
Counter Threshold Interval Rising Threshold event Falling Threshold event PMON Portguard
------- --------- -------- ---------------- ----- ------------------ ----- --------------
TX Discards Delta 60 50 4 10 4 Not enabled
LR RX Delta 60 5 4 1 4 Not enabled
LR TX Delta 60 5 4 1 4 Not enabled
Timeout Discards Delta 60 50 4 10 4 Not enabled
Credit Loss Reco Delta 60 1 4 0 4 Not enabled
TX Credit Not Available Delta 1 10% 4 0% 4 Not enabled
slowport-count Delta 1 5 4 0 4 Not enabled
slowport-oper-delay Absolute 1 50ms 4 0ms 4 Not enabled
txwait Delta 1 20% 4 0% 4 Not enabled
----------------------------------------------------------------------------------------------------------
Port-monitor alerting – activation and output
Slow Drain Alerting and Mitigation
• SNMP traps are sent with the following object identifiers (OIDs):
• fcIfTxWtAvgBBCreditTransitionToZero: 1.3.6.1.4.1.9.9.289.1.2.1.1.38
• Note: There is no OID in the Rx direction.
• fcIfCreditLoss: 1.3.6.1.4.1.9.9.289.1.2.1.1.37
• fcIfLinkResetOuts: 1.3.6.1.4.1.9.9.289.1.2.1.1.10
• fcIfLinkResetIns: 1.3.6.1.4.1.9.9.289.1.2.1.1.9
• fcIfTimeOutDiscards: 1.3.6.1.4.1.9.9.289.1.2.1.1.35
• fcIfOutDiscards: 1.3.6.1.4.1.9.9.289.1.2.1.1.36
• fcIfSlowportCount: 1.3.6.1.4.1.9.9.289.1.2.1.1.44
• fcIfSlowportOperDelay: 1.3.6.1.4.1.9.9.289.1.2.1.1.45
• fcIfTxWaitCount: 1.3.6.1.4.1.9.9.289.1.2.1.1.15
SNMP trap OIDs sent by port-monitor
Slow Drain Alerting and Mitigation
• Adding portguard to errordisable or flap a port can help the switch automatically mitigate problems
• Should be done to access(F) ports only
• Use separate access(F) and trunk(E) policies
• Applies to delta counters only
Port-monitor portguard
Slow Drain Alerting and Mitigation
• The following adds portguard to timeout-discards and credit-loss-reco and adjusts the rising-threshold up a bit:
Port-monitor portguard - continued
port-monitor name AccessPorts
port-type access
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 60 event 4 falling-threshold 10 event 4 portguard errordisable
counter credit-loss-reco poll-interval 60 delta rising-threshold 4 event 4 falling-threshold 0 event 4 portguard errordisable
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4
counter slowport-oper-delay poll-interval 1 absolute rising-threshold 50 event 4 falling-threshold 0 event 4
counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4
Error disable the port when 60 timeout-discards happen in 60 seconds
Error disable the port when 4 credit loss recovery events occur in 60 seconds
Access(F) port policy
Slow Drain Alerting and Mitigation
Port-monitor portguard – trunk (E) port policy
port-monitor name ISLPorts
port-type trunks
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
counter slowport-count poll-interval 1 delta rising-threshold 5 event 4 falling-threshold 0 event 4
counter slowport-oper-delay poll-interval 1 absolute rising-threshold 80 event 4 falling-threshold 0 event 4
counter txwait poll-interval 1 delta rising-threshold 20 event 4 falling-threshold 0 event 4
Trunk(E) port policy
Slow Drain Alerting and Mitigation
mds9710-1# show port-monitor active
Policy Name : ISLPorts
Admin status : Active
Oper status : Active
Port type : All Trunk Ports
---------------------------------------------------------------------------------------------------------
Counter Threshold Interval Rising Threshold event Falling Threshold event PMON Portguard
------- --------- -------- ---------------- ----- ------------------ ----- --------------
TX Discards Delta 60 100 4 10 4 Not enabled
LR RX Delta 60 5 4 1 4 Not enabled
LR TX Delta 60 5 4 1 4 Not enabled
Timeout Discards Delta 60 100 4 10 4 Not enabled
Credit Loss Reco Delta 60 1 4 0 4 Not enabled
TX Credit Not Available Delta 1 10% 4 0% 4 Not enabled
slowport-count Delta 1 5 4 0 4 Not enabled
slowport-oper-delay Absolute 1 80ms 4 0ms 4 Not enabled
txwait Delta 1 20% 4 0% 4 Not enabled
----------------------------------------------------------------------------------------------------------
Continued next slide…
Port-monitor portguard – when activated
Slow Drain Alerting and Mitigation
…continued from previous slide
Policy Name : AccessPorts
Admin status : Active
Oper status : Active
Port type : All Access Ports
---------------------------------------------------------------------------------------------------------
Counter                  Threshold  Interval  Rising Threshold  event  Falling Threshold  event  PMON Portguard
-------                  ---------  --------  ----------------  -----  ------------------ -----  --------------
TX Discards              Delta      60        50                4      10                 4      Not enabled
LR RX                    Delta      60        5                 4      1                  4      Not enabled
LR TX                    Delta      60        5                 4      1                  4      Not enabled
Timeout Discards         Delta      60        60                4      10                 4      Error Disable
Credit Loss Reco         Delta      60        4                 4      0                  4      Error Disable
TX Credit Not Available  Delta      1         10%               4      0%                 4      Not enabled
slowport-count           Delta      1         5                 4      0                  4      Not enabled
slowport-oper-delay      Absolute   1         50ms              4      0ms                4      Not enabled
Tx wait                  Delta      1         20%               4      0%                 4      Not enabled
----------------------------------------------------------------------------------------------------------
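The rising and falling thresholds in these policies can be read as a hysteresis pair. The Python sketch below illustrates that reading only. It assumes RMON-style alarm semantics (a rising event fires when the polled value reaches the rising threshold and cannot fire again until the value has dropped to the falling threshold). The function name and the sample deltas are hypothetical, and this is not NX-OS code.

```python
def threshold_events(samples, rising, falling):
    """Return (index, kind) events for a series of per-interval counter values.

    Assumption: a rising event re-arms only after the value has dropped to
    the falling threshold (RMON-style hysteresis).
    """
    events = []
    armed = True  # a rising event may fire only while armed
    for i, value in enumerate(samples):
        if armed and value >= rising:
            events.append((i, "rising"))
            armed = False
        elif not armed and value <= falling:
            events.append((i, "falling"))
            armed = True
    return events


# Hypothetical per-interval "Credit Loss Reco" deltas, checked against the
# AccessPorts thresholds shown above (rising 4, falling 0).
deltas = [0, 2, 5, 6, 1, 0, 4]
events = threshold_events(deltas, rising=4, falling=0)
print(events)
```

With these sample values the second high reading (6) does not raise a new event, because the counter has not yet returned to the falling threshold.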
Port-monitor portguard – when activated - continued
Slow Drain Alerting and Mitigation
DCNM event log
Slow Drain Alerting and Mitigation
• Lower the congestion-drop timeout from the default 500ms to 200ms
• Frees up ingress buffer space more quickly
• Can be set differently on F and E ports
• The congestion timeout for mode F should be less than or equal to the timeout for mode E
• Global command for the switch
• Recommended for F ports
Adjust Congestion Drop Threshold Lower
system timeout congestion-drop 200 mode f
[Figure: timeline from 0 sec to 200ms showing a credit and four queued frames, with the steps "Check Timestamp of each frame" and "Drop the Frames from the queue"]
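The timestamp check described on this slide can be sketched in a few lines of Python. This is an illustration only: it assumes a FIFO queue of (frame, arrival time) pairs and that a frame is dropped once its age reaches the timeout. The function name, the sample frames and the exact comparison are assumptions, not the switch implementation.

```python
from collections import deque


def drop_aged_frames(queue, now_ms, timeout_ms=200):
    """Drop frames from the head of a FIFO queue whose age has reached timeout_ms.

    queue holds (frame_name, arrival_ms) tuples in arrival order.
    Returns the list of dropped entries; the queue is modified in place.
    """
    dropped = []
    while queue and now_ms - queue[0][1] >= timeout_ms:
        dropped.append(queue.popleft())
    return dropped


# Hypothetical frames that arrived at 0, 50 and 150 ms, checked at 260 ms.
queue = deque([("frame-a", 0), ("frame-b", 50), ("frame-c", 150)])
dropped = drop_aged_frames(queue, now_ms=260)
print(dropped, list(queue))
```

Lowering the timeout from 500 to 200 in this model simply makes older frames leave the queue sooner, which is the buffer-freeing effect the slide describes.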
Setting the No Credit Drop Threshold
• No-credit-drop causes frames to be dropped immediately if the destination port is at 0 Tx credits for the time specified
• Should be used in conjunction with lowering congestion-drop threshold
• Recommended for F ports
• Can drastically improve ISL performance under slow drain conditions
• xxx_FORCE_TIMEOUT_ON/OFF counter
• By default no-credit-drop is not enabled
system timeout no-credit-drop 200 mode f
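The difference between the two timers can be summarized as a small decision function. The Python sketch below is a simplified model under stated assumptions: congestion-drop acts on the age of an individual frame, while no-credit-drop acts on how long the egress port has been at zero Tx credits. The defaults mirror the slides (congestion-drop 500ms, no-credit-drop not enabled); the function and parameter names are hypothetical.

```python
def egress_action(zero_credit_ms, frame_age_ms,
                  no_credit_drop_ms=None, congestion_drop_ms=500):
    """Decide what happens to a frame queued for an egress port.

    zero_credit_ms: how long the port has been at 0 Tx credits.
    frame_age_ms:   how long this frame has been queued.
    no_credit_drop_ms=None models the default (feature not enabled).
    """
    if no_credit_drop_ms is not None and zero_credit_ms >= no_credit_drop_ms:
        return "drop (no-credit-drop)"
    if frame_age_ms >= congestion_drop_ms:
        return "drop (congestion-drop)"
    return "queue"


# A frame that has waited only 10 ms, destined to a port stuck at 0 credits
# for 250 ms, with both timers set to 200 ms:
print(egress_action(250, 10, no_credit_drop_ms=200, congestion_drop_ms=200))
# The same frame with default settings stays queued:
print(egress_action(250, 10))
```

In this model a newly arrived frame is dropped immediately once the port has crossed the no-credit threshold, instead of waiting out its own congestion-drop timer.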
Test results – congestion-timeout/no-credit-timeout Topology
[Figure: test topology. Labels: ISL; Slow Drain Device; test ports Ag104/1, Ag104/2, Ag104/3, Ag104/4; switch ports Fc1/3, Fc1/13 and Fc1/14 on each switch; four 4Gbps links and one 8Gbps link]
Test results – congestion-timeout/no-credit-timeout 104/4 R-Rdy delay 300ms - Default timeout settings – frames/sec
Test results – congestion-timeout/no-credit-timeout 104/4 R-Rdy delay 300ms – Congestion-drop/no-credit-drop 200ms
Almost 3X improvement on the flow!
Summary
• FC B2B flow control helps reduce packet loss
• Devices with problems can cause congestion problems in the fabric
• This congestion can propagate through the fabric affecting unrelated devices
• MDS has several features designed to alert on, identify and mitigate slow drain
• Classify your problem and follow the troubleshooting guidelines
Troubleshooting Summary
• Proactive
Configure slowport-monitor
Configure congestion-drop and no-credit-drop
Configure port-monitor policies
• Reactive
Use several show logging onboard commands with starttime option to display events
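When collecting OBFL data for a known time window, it can help to generate the command strings with the starttime value already formatted. The Python sketch below builds the three show logging onboard commands named in this deck using the mm/dd/yy-hh:mm:ss format from the command reference. The helper name and the sample date are hypothetical; the commands are only printed, not sent to a switch.

```python
from datetime import datetime


def obfl_commands(start, module=None):
    """Build 'show logging onboard' commands with a formatted starttime.

    start:  datetime for the beginning of the window of interest.
    module: optional module number for the [module x] keyword.
    """
    stamp = start.strftime("%m/%d/%y-%H:%M:%S")
    mod = f" module {module}" if module is not None else ""
    keywords = ["error-stats", "slowport-monitor-events", "txwait"]
    return [f"show logging onboard{mod} starttime {stamp} {kw}" for kw in keywords]


# Hypothetical window starting at midnight on 15 March 2016.
commands = obfl_commands(datetime(2016, 3, 15))
for line in commands:
    print(line)
```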
Where do you start?
Troubleshooting Summary
• Configure slowport-monitor @ 10-25ms for both E & F ports
system timeout slowport-monitor 10 mode e
system timeout slowport-monitor 10 mode f
• Configure congestion-drop on F ports
system timeout congestion-drop 200 mode f
Don’t go below 200ms!
• Configure no-credit-drop on F ports
system timeout no-credit-drop <ms> mode f
200ms – safe, 100ms – aggressive, 50ms – Very aggressive
• Configure port-monitor policy(s)
Use samples included in port-monitor section
Proactive
Troubleshooting Summary
• show logging onboard <starttime mm/dd/yy-00:00:00> error-stats
Includes timestamped indications of all three levels of congestion
Credit-loss-recovery
timeout-discards
Latency 100ms Tx & Rx average wait
• show logging onboard <starttime mm/dd/yy-00:00:00> slowport-monitor-events
Includes timestamped slowport-monitor-events
Mostly for grade 1 (latency) issues
• show logging onboard <starttime mm/dd/yy-00:00:00> txwait
Includes timestamped interfaces that had >=100ms delay in 20 seconds
Mostly for grade 1 (latency) issues
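The 100ms-in-20-seconds criterion is easier to reason about once TxWait ticks are converted to time. The Python sketch below does that arithmetic, assuming the delta is expressed in the 2.5us units that the command reference in this deck gives for the interface TxWait counter. The function names and sample values are hypothetical.

```python
TICK_US = 2.5  # assumption: TxWait is counted in 2.5 microsecond units


def txwait_ms(ticks):
    """Convert a TxWait tick count to milliseconds."""
    return ticks * TICK_US / 1000.0


def logged_to_obfl(delta_ticks, threshold_ms=100):
    """True if a 20 second delta meets the >=100ms criterion from the slide."""
    return txwait_ms(delta_ticks) >= threshold_ms


def percent_of_interval(delta_ticks, interval_s=20):
    """Share of the interval the port spent waiting for credit, in percent."""
    return 100.0 * txwait_ms(delta_ticks) / (interval_s * 1000.0)


# 40,000 ticks of 2.5 us each is exactly 100 ms.
print(txwait_ms(40000), logged_to_obfl(40000), percent_of_interval(40000))
```

Under this assumption, the logging threshold corresponds to a port being unable to transmit for 0.5% of a 20 second interval.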
Reactive
Additional References
• Slow Drain Device Detection and Congestion Avoidance Whitepaper
• http://www.cisco.com/c/en/us/products/collateral/storage-networking/mds-9700-series-multilayer-directors/white_paper_c11-729444.html
• Generation 4 (gen4) Linecard Slow Drain Counters and Commands Troubleshooting
• http://www.cisco.com/c/en/us/support/docs/storage-networking/mds-9509-multilayer-director/116098-trouble-gen4-00.html
• MDS 9148 Slow Drain Counters and Commands
• http://www.cisco.com/c/en/us/support/docs/storage-networking/mds-9100-series-multilayer-fabric-switches/116401-trouble-mds9148-00.html
MDS Command Reference
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed to
in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one or
more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow drain,
not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not much
different than the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow drain
including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout,
etc. Often the first command to use.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait Display txwait delta values recorded when greater than 99ms per 20 second
interval
MDS 9500
MDS Command Reference
MDS 9500 - continued
Command Function
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events Display OBFL slowport-monitor-events. This is similar to show process
creditmon slowport-monitor-events but will likely contain more than 10 events
per interface
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These
likely indicate the destination interface listed is congested. The source
interface will retry the arbitration request.
show hardware internal statistics [module x] [device all|fcmac] Displays statistical information for ports which include errors as well
show hardware internal statistics module x pktflow dropped Displays packet drop counters
show hardware internal errors [module x] Displays error information for ports. Error indications include frame
drops/discards for various reasons including timeout discards.
slot x show port-config internal link-events Linecard command to display the link-events log. This is a concise event log
for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface
show process creditmon slowport-monitor-events [module x] Display last 10 slowport-monitor events per interface
System timeout slowport-monitor must be configured
MDS Command Reference
MDS 9500 - continued
Command Function
show process creditmon txwait-history [module x] [port x] Display 60 second, 60 minute, 72 hour histogram graphs
Only valid for generation 4 linecards
show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
slot x show hardware internal up-xbar <0-1> queued-packet-info Displays information indicating packets that are momentarily queued.
For generation 3 linecards only
slot x show hardware internal que inst <0-3> memory iqm-statusmem0|1 Displays information indicating packets that are momentarily queued.
For generation 4 linecards only
MDS Command Reference
MDS 9700 / MDS 9396S
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed
to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays
one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow
drain, not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not
much different than the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow
drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss,
force-timeout, etc. Often the first command to use.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait Display txwait delta values recorded when greater than 99ms per 20
second interval
MDS Command Reference
MDS 9700 / MDS 9396S - continued
Command Function
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events Display OBFL slowport-monitor-events. This is similar to show process
creditmon slowport-monitor-events but will likely contain more than 10
events per interface
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These
likely indicate the destination interface listed is congested. The source
interface will retry the arbitration request.
show hardware internal statistics [module x] [device all|fcmac] Displays statistical information for ports which include errors as well
show hardware internal statistics [module x|module-all] pktflow dropped Displays packet drop counters
Note: if “module x” or [module-all] is omitted then only the counters for the
supervisors are displayed. This is probably not what you want.
show hardware internal errors [module x|module-all] Displays error information for ports. Error indications include frame
drops/discards for various reasons including timeout discards.
Note: if “module x” or [module-all] is omitted then only the counters for the
supervisors are displayed. This is probably not what you want.
slot x show port-config internal link-events Linecard command to display the link-events log. This is a concise event
log for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface
show process creditmon slowport-monitor-events [module x] Display last 10 slowport-monitor events per interface
System timeout slowport-monitor must be configured
MDS Command Reference
MDS 9700 / MDS 9396S - continued
Command Function
show process creditmon txwait-history [module x] [port x] Display 60 second, 60 minute, 72 hour histogram graphs
Only valid for generation 4 linecards
show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
slot x show hardware internal fcmac inst [0-5 | 0-11] tmm_timeout_stat_buffer Displays information indicating packets dropped due to timeouts
slot x show hardware internal f16_que inst [0-5 | 0-11] table iqm-statusmem0|1 Displays information indicating packets that are momentarily queued.
MDS Command Reference
MDS 9148
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed
to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays
one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow
drain, not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not
much different than the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow drain
including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout,
etc. Often the first command to use.
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These likely
indicate the destination interface listed is congested. The source interface will retry
the arbitration request.
MDS Command Reference
MDS 9148 - continued
Command Function
show hardware internal statistics all Displays statistical information for ports which include errors as well
slot 1 show hardware internal errors Displays error information for ports. Error indications include frame
drops/discards for various reasons including timeout discards.
show hardware internal packet-flow dropped Display counts of packets dropped
show hardware internal packet-dropped-reason Displays counters of packets dropped and the counter names(reasons) for
each
slot x show port-config internal link-events Linecard command to display the link-events log. This is a concise event
log for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface
show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
MDS Command Reference
MDS 9250i
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed
to in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays
one or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow
drain, not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not
much different than the show interface counters command.
show interface fcx/y counters details Displays more counters pertaining to the interface but regarding slow drain, not
much different than the above.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow drain
including timeout drops, tx/rx 100ms credit-not-available, credit-loss, force-timeout,
etc. Often the first command to use. This command requires a single interface or
interface range.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait Display txwait delta values recorded when greater than 99ms per 20 second interval
MDS Command Reference
MDS 9250i - continued
Command Function
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events Display OBFL slowport-monitor-events. This is similar to show process
creditmon slowport-monitor-events but will likely contain more than 10
events per interface
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These
likely indicate the destination interface listed is congested. The source
interface will retry the arbitration request.
slot 1 show hardware internal statistics device all Displays statistical information for ports which include errors as well
slot 1 show hardware internal errors Displays error information for ports. Error indications include frame
drops/discards for various reasons including timeout discards.
slot 1 show port-config internal link-events Linecard command to display the link-events log. This is a concise event log
for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface
show process creditmon slowport-monitor-events Display last 10 slowport-monitor events per interface
System timeout slowport-monitor must be configured
show process creditmon txwait-history [module x] [port x] Display 60 second, 60 minute, 72 hour histogram graphs
Only valid for generation 4 linecards
show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
show hardware internal packet-flow dropped Display counts of packets dropped
MDS Command Reference
MDS 9148S
Command Function
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] Displays basic interface information including Tx/Rx credit values agreed to
in the FLOGI/ELP exchange and Tx/Rx credits remaining. Displays one or
more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters Displays more counters pertaining to the interface including:
Timeout-discards, credit-loss, Tx/Rx transitions to zero, Tx/Rx credits
remaining, Txwait in 2.5us, average Txwait for 1s/1m/1h/72h. Displays one
or more interfaces.
show interface [fcx/y[-z]] [,fcx/y[-z]] [,…] counters details Displays more counters pertaining to the interface but regarding slow drain,
not much different than the above. This will only work for a specified
interface or range of interfaces. To display all fc interfaces use the show
interface detailed-counters command. Displays one or more interfaces.
show interface detailed-counters Displays more counters for all interfaces but regarding slow drain, not
much different than the show interface counters command.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] error-stats Display OBFL error-stats. This contains many counters related to slow
drain including timeout drops, tx/rx 100ms credit-not-available, credit-loss,
force-timeout, etc. Often the first command to use.
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] txwait Display txwait delta values recorded when greater than 99ms per 20
second interval
MDS Command Reference
MDS 9148S - continued
Command Function
show logging onboard [module x] [starttime mm/dd/yy-hh:mm:ss] slowport-monitor-events Display OBFL slowport-monitor-events. This is similar to show process
creditmon slowport-monitor-events but will likely contain more than 10
events per interface
show logging onboard flow-control [starttime mm/dd/yy-hh:mm:ss] request-timeout [module x] Display OBFL arbitration timeouts. Note these are not packet drops. These
likely indicate the destination interface listed is congested. The source
interface will retry the arbitration request.
slot 1 show hardware internal statistics device all Displays statistical information for ports which include errors as well
slot 1 show hardware internal errors Displays error information for ports. Error indications include frame
drops/discards for various reasons including timeout discards.
slot 1 show port-config internal link-events Linecard command to display the link-events log. This is a concise event
log for interfaces going up and down.
show process creditmon credit-loss-events [module x] Display last 10 credit loss events per interface
show process creditmon slowport-monitor-events Display last 10 slowport-monitor events per interface
System timeout slowport-monitor must be configured
show process creditmon txwait-history [module x] [port x] Display 60 second, 60 minute, 72 hour histogram graphs
Only valid for generation 4 linecards
show system internal snmp credit-not-available Displays instances of the 100ms tx-credit-not-available
show hardware internal packet-flow dropped Display counts of packets dropped
Slow drain counters and descriptions
For the following MDS switches:
9500 – Gen2/3/4 linecards
9700 – Gen 5 linecard
9148 – 8G 48 port Fabric switch
9250i – Multiservice Fabric Switch
9148S – 16G 48 port Fabric switch
9396S – 16G 96 port fabric switch
Table 1 – Counters indicating delay only
Table 2 – Counters indicating frame drops
Table 3 – Counters indicating action on or for an interface
Table 4 – Counters representing interrupts
Table 5 – SNMP variables
Superscripts indicate linecard generation or switch type. See the
list located after Table 5.
Slow drain counters and descriptions
Counter Name Description Commands Additional Info
FCP_CNTR_RCM_CH0_LACK_OF_CREDIT ^2
AK_FCP_CNTR_RCM_CH0_LACK_OF_CREDIT ^3
THB_RCM_RCP0_RBBZ_CH0 ^4,note1
F16_RCM_RCP0_RBBZ_CH0 ^5
FCP_CNTR_RCM_RBBZ_CH0 ^48
VIP_RCM_RBBZ_CH0_CNT ^50i,48S
Total count of transitions to zero for Rx B2B credits on ch0; these
transitions typically indicate that the switch is applying back pressure to
the attached device because of perceived congestion, and this perceived
congestion can be the result of a lack of Tx B2B credits being returned
on an interface over which this device is communicating
There is no indication of time at zero for this counter. It could stay at zero
for just an instant or for an extended duration of time.
Also shown in the output of show interface counters:
xxxx receive B2B credit transitions from zero
or
xxxx Receive B2B credit transitions to zero
In the above “from” was changed to “to” via this bug:
CSCug35184 show interface counters - transitions of rx BB credit to
zero state
Sup:
show hardware internal statistics all ^2,3,4,48,note2
show hardware internal statistics device all ^5,note2
None ^48s,50i,note3
Linecard:
slot x show hardware internal statistics ^2,3,48
slot x show hardware internal fc-mac port x error-statistic ^2,3
slot x show hardware internal statistics device fcmac all ^4
slot x show hardware internal statistics device fcmac|all ^5,48s,50i
Note1: CSCts28865 B2B credit 0 transitions incorrect for generation 4 linecards. Integrated in NX-OS 5.2(2)
Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac
Note3: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S
Table 1 - Counters indicating delay only
Slow drain counters and descriptions
FCP_CNTR_QMM_CH0_LACK_OF_TRANSMIT_CREDIT ^2
AK_FCP_CNTR_QMM_CH0_LACK_OF_TRANSMIT_CREDIT ^3
THB_TMM_PORT_TBBZ_CH0 ^4,note1
F16_RCM_RCP0_TBBZ_CH0 ^5
FCP_CNTR_TMM_TBBZ_CH0 ^48
FCP_CNTR_TMM_TBBZ_CH1 ^48
VIP_TMM_TBBZ_CH0_CNT ^50i,48S
VIP_TMM_TBBZ_CH1_CNT ^50i,48S
Total count of transitions to zero for Tx B2B credits on ch0 or ch1; these
transitions are typically the result of the attached device's withholding of
R_Rdy primitive from the switch due to congestion in that device.
There is no indication of time at zero for this counter. It could stay at zero for
just an instant or for an extended duration of time.
Also shown in the output of show interface counters:
xxxx transmit B2B credit transitions from zero
or
xxxx Transmit B2B credit transitions to zero
In the above “from” was changed to “to” via this bug:
CSCug35184 show interface counters - transitions of rx BB credit to zero
state
Sup:
show hardware internal statistics all ^2,3,4,48
show hardware internal statistics device all ^5
None ^48s,50i,note3
Linecard:
slot x show hardware internal statistics ^2,3,48
slot x show hardware internal fc-mac port x error-statistic ^2,3
slot x show hardware internal statistics device fcmac all ^4
slot x show hardware internal statistics device fcmac|all ^5,48s,50i
Note1: CSCts28865 B2B credit 0 transitions incorrect for generation 4 linecards
Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac
Note3: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S
Table 1 - continued
Table 1 - Counters indicating delay only
Slow drain counters and descriptions
None ^2,3
THB_TMM_PORT_TWAIT_CNT ^4
F16_TMM_PORT_TWAIT_CNT ^5
None ^48
VIP_TMM_TXWAIT_CH0_CNT ^50i,48S
VIP_TMM_TXWAIT_CH1_CNT ^50i,48S
Packet is available to send, but no credit is available.
Gen4/Gen5: increments every clock cycle (cycle = 2.353 nanoseconds at 425 MHz)
9250i/9148S: increments every clock cycle (cycle = 2 nanoseconds at 500 MHz)
Must multiply by the number of ports in the port-group to get actual time.
To calculate actual time:
Twait count * clock cycle time * ports in port-group
Sup: ^note3
None ^5,note1
None ^48s,50i,note2
Linecard:
slot x show hardware internal statistics device fcmac|all ^5,48s,50i
Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac
Note2: CSCus85931 Need show hardware internal errors command on MDS 9250i, 9148, 9148S
Note3: See Table 5 SNMP variable fcIfTxWaitCount
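The calculation on this slide can be worked through as follows. This Python sketch takes the clock frequencies from the slide (425 MHz for Gen4/Gen5, 500 MHz for 9250i/9148S) and derives the cycle time from them. The counter value and the port-group size of 4 are hypothetical example inputs, and the function name is an assumption.

```python
def twait_seconds(count, clock_hz, ports_in_group):
    """Convert a hardware TWAIT counter value to seconds.

    Follows the slide's formula: count * clock cycle time * ports in port-group.
    """
    cycle_s = 1.0 / clock_hz  # 425 MHz -> about 2.353 ns, 500 MHz -> 2 ns
    return count * cycle_s * ports_in_group


GEN5_HZ = 425e6       # Gen4/Gen5 linecards, per the slide
MDS9250I_HZ = 500e6   # 9250i/9148S, per the slide

# Hypothetical reading: 50,000,000 counts on a 9250i port in a 4-port group.
waited = twait_seconds(50_000_000, MDS9250I_HZ, ports_in_group=4)
print(f"{waited:.3f} s without Tx credit")
```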
Table 1 - Counters indicating delay only
Table 1 - continued
Slow drain counters and descriptions Table 1 - Counters indicating delay only
Table 1 - continued
FCP_CNTR_RX_WT_AVG_B2B_ZERO ^2,48
AK_FCP_CNTR_RX_WT_AVG_B2B_ZERO ^3
FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO ^4
FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO ^48,note1
FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO ^5,50i,note2
FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO ^48S
Count of the number of times an interface was at zero Rx B2B credits for 100
ms; this status typically indicates that the switch is withholding R_Rdy
primitive to the device attached on that interface due to congestion in the
path to devices with which it is communicating
Always incremented by the software creditmon process.
OBFL:
show logging onboard error-stats ^2,3,4,5,48,48s,50i
Sup Hardware internal errors:
show hardware internal errors all|module x ^2,3,4
None ^5,note5
None ^48s,50i,note4
Sup Hardware internal statistics:
show hardware internal statistics all ^2,3,48
None ^4,5,note3
None ^48,48s,50i,note4
Linecard Hardware internal statistics:
slot x show hardware internal statistics ^2,3,48
slot x show hardware internal fc-mac port x error-statistic ^2,3
slot x show hardware internal errors ^4,5,48,48s,50i
slot x show hardware internal statistics device fcmac all port x ^4,5
Note1: MDS 9148 added support for this counter in NX-OS 5.2(6)
Note2: Gen5 and 9250i do not increment in 6.2(1), 6.2(5) and 6.2(7). CSCui27981 FCP_SW_CNTR_RX_WT_AVG_B2B_ZERO not incrementing on DS-X9448-768K9. Integrated in: 6.2(9)
Note3: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
Note4: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open
Slow drain counters and descriptions Table 1 - Counters indicating delay only
Table 1 - continued
FCP_CNTR_TX_WT_AVG_B2B_ZERO ^2
AK_FCP_CNTR_TX_WT_AVG_B2B_ZERO ^3
FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO ^4
FCP_CNTR_TX_WT_AVG_B2B_ZERO ^48,note1,note2
FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO ^5,50i,48s
Count of the number of times that an interface was at zero Tx B2B credits for
100 ms. This status typically indicates congestion at the device attached on
that interface.
Incremented by the creditmon software process on MDS 9500 and 9148.
Consequently, it could indicate an interval between 100ms and 199ms.
In NX-OS 6.2(1) through 6.2(7) on the 9710 and NX-OS 6.2(5) through 6.2(7) on
the 9250i, this counter was incremented based on the
F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H and
VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_RAISING
interrupts.
Consequently, it incremented only once, when the HW interrupt occurred, and
not at each 100ms interval as before.
MDS 9700, 9250i and 9148S: In NX-OS 6.2(9) these are once again
incremented by the software creditmon process. They will once again
increment each 100ms interval where the port remains at 0 Tx credits.
OBFL:
show logging onboard error-stats ^2,3,4,5,48,48s,50i
Sup Hardware internal errors:
show hardware internal errors all|module x ^2,3,4
None ^5,note5
None ^48s,50i,note4
Sup Hardware internal statistics:
show hardware internal statistics all ^2,3,48
None ^4,5,note3
None ^48,48s,50i,note4
Linecard Hardware internal statistics:
slot x show hardware internal statistics ^2,3,48
slot x show hardware internal fc-mac port x error-statistic ^2,3
slot x show hardware internal errors ^4,5,48,48s,50i
slot x show hardware internal statistics device fcmac all port x ^4,5
Note1: MDS 9148 added support for this counter in NX-OS 5.2(6)
Note2: CSCud93587 MDS9148 OBFL doesn't contain FCP_CNTR_TX_WT_AVG_B2B_ZERO. Integrated in: Unresolved
Note3: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
Note4: CSCus85931 Need show hardware internal errors & stats cmds on MDS 9250i 9148 9148S. Integrated in: open
Slow drain counters and descriptions Table 1 - Counters indicating delay only
Table 1 - continued
RI12_CP_CNT_RESEND_MSG_DROP ^2,3
FAL_RI0_CP_CNT_RESEND_MSG_DROP ^4
These are not packet drops. Only the request resend message to arbiter was
dropped. This can be the case when the original request was finally serviced,
so the follow up message was dropped. It can indicate some minor
congestion of the egress port, so request could not be granted immediately.
This is counted against the ingress port. It probably indicates some
congestion on an egress port.
Check show logging onboard flow-control request-timeout - You might see
corresponding entries.
Sup:
show hardware internal errors all|module x
OBFL:
show logging onboard error-stats
F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT ^5,note1,note3
Count of times port was at zero Tx credits for the stuck port timeout value.
In NX-OS 6.2(1) through 6.2(7) on the 9710 this was used for credit loss
recovery, so it was set to 1s (F port)/1.5s (E port).
MDS 9700, 9250i and 9148S: In NX-OS 6.2(9) the software creditmon process
once again detects credit loss recovery and the stuck force timeout is used for
“system timeout no-credit-drop”. Defaults to 500ms with no action taken (no
packets are dropped when it is reached).
Needs to be configured via:
system timeout no-credit-drop <ms> mode e|f
This counter will increment even if “system timeout no-credit-drop” is not
configured since it defaults to 500ms. If no-credit-drop is not configured then
no action is taken and it simply indicates the port was at zero Tx credits for
500ms. ^note3
This is similar to the viper counter:
VIP_TMM_STK_PRT_TO_TRANSITION_CHx_CNT
Sup:
None ^note2
Linecard:
slot x show hardware internal statistics device all
slot x show hardware internal errors
Note1: Might falsely increment during port flap: CSCus70632 F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT increments during port flap
Note2: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
Note3: CSCut27271 Stuck port threshold not reset to default when removing no-credit-drop. Integrated in: NX-OS 6.2(13)
Slow drain counters and descriptions Table 1 - Counters indicating delay only
Table 1 - continued
F16_TMM_PORT_CWAIT_FORCE_TIMEOUT_H_L_CNT ^5
Count of times a credit was received after the slow port timeout threshold had
been triggered.
Sup:
None ^note1
Linecard:
slot 1 show hardware internal statistics device all|fcmac
slot x show hardware internal errors
Note1: CSCut21070 show hardware internal statistics sup command does not include fcmac. Integrated in: open
Slow drain counters and descriptions Table 1 - Counters indicating delay only
Table 1 - continued
VIP_TMM_STK_PRT_TO_TRANSITION_CH0_CNT ^48S,50i,note2
VIP_TMM_STK_PRT_TO_TRANSITION_CH1_CNT ^48S,50i,note2
Count of times port was at zero Tx credits for the stuck port timeout value.
- channel 0 (high priority queue)
- channel 1 (low priority queue)
In NX-OS 6.2(5) through 6.2(7) on the 9250i this was used for credit loss
recovery, so it was set to 1s (F port) / 1.5s (E port).
In NX-OS 6.2(9) the software creditmon process once again handles credit loss
recovery and the stuck force timeout is used for “system timeout no-credit-drop”.
Defaults to 500ms with no action taken (no packets are dropped when it is
reached).
Needs to be configured via:
system timeout no-credit-drop <ms> mode e|f
This counter will increment even if “system timeout no-credit-drop” is not
configured since it defaults to 500ms. If no-credit-drop is not configured then
no action is taken and it simply indicates the port was at zero Tx credits for
500ms. Note2
This is similar to the Gen5 counter:
F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT
Sup:
None (Note1)
Linecard:
slot 1 show hardware internal statistics device all|fcmac
slot 1 show hardware internal errors
Note1: CSCus85931 Need show
hardware internal errors & stats
cmds on MDS 9250i 9148 9148S
Integrated in: open
Note2: CSCut27271 Stuck port
threshold not reset to default when
removing no-credit-drop
Integrated in: open
Slow drain counters and descriptions Table 1 - Counters indicating delay only
Table 1 - end
VIP_TMM_SLO_PRT_TO_TRANSITION_CH0_CNT48S,50i
VIP_TMM_SLO_PRT_TO_TRANSITION_CH1_CNT48S,50i
Count of times port was at zero Tx credits for the slow port timeout value.
- channel 0 (high priority queue)
- channel 1 (low priority queue)
In NX-OS 6.2(5) through 6.2(7) on the 9250i this was used to increment the
FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter.
Consequently, the increment occurred only once, when the HW interrupt fired,
and not at each 100ms interval as in prior implementations.
In NX-OS 6.2(9) this is used for the slowport-monitor feature.
Needs to be configured via:
system timeout slowport-monitor <ms> mode e|f
Slowport-monitor events can be displayed via:
show process creditmon slowport-monitor-events
show logging onboard slowport-monitor-events
This is similar to the Gen5 counter:
F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H
Sup:
None (Note1)
Linecard:
show hardware internal statistics device all|fcmac
Note1: CSCus85931 Need show
hardware internal errors & stats
cmds on MDS 9250i 9148 9148S
Integrated in: open
Slow drain counters and descriptions Table 2 - Counters indicating frame drops
Table 2
Counter Name Description Commands Additional Info
None2
None3
THB_TMM_PORT_FRM_DROP_CNT4 ,note 1
F16_TMM_PORT_FRM_DROP_CNT5
FCP_CNTR_TMM_NORMAL_DROP48
VIP_TMM_NORMAL_DROP_CNT50i, 48S, note2
VIP_TMM_TOTAL_DROP_CNT50i, 48S, note2
Number of frames dropped in tolb_path or np path by the Transmit Memory
Manager(TMM); these drops include all types of packet drops: timeout,
offline, abort drops, dummy frame drops at egress, etc.
These counters are the aggregate counters for all the underlying counters.
OBFL:
Show logging onboard error-stats2,3,4,5,48,48s,50i
Sup Hardware internal errors/statistics:
show hardware internal errors all|module x2,3,4
show hardware internal statistics all48
None48s,50i,note3
Sup packet-dropped-reason:
show hardware internal packet-dropped-reason mod 48,48S,50i
Linecard Hardware internal errors
show hardware internal fc-mac port x error-statistic2,3
show hardware internal errors4,5,48,48s,50i
Note 1: Bugs:
CSCud77292 Gen 4 linecards do
not increment output discards on
interface statistics
Integrated into NX-OS 5.2(8c) and
6.2(1)
Note 2: It is not normal for drops to
occur so this counter's name is
misleading. The following bug
renamed
VIP_TMM_NORMAL_DROP_CNT to
VIP_TMM_TOTAL_DROP_CNT since
the drops included in this counter
are not necessarily normal.
CSCus60322 Add
VIP_TMM_TO_CNT and
VIP_TMM_TO_DROP_CNT to
packet-flow dropped
Integrated in NX-OS 6.2(13)
Slow drain counters and descriptions Table 2 - Counters indicating frame drops
Table 2 - continued
FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES2
AK_FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES3
THB_TMM_TOLB_TIMEOUT_DROP_CNT4
F16_TMM_TOLB_TIMEOUT_DROP_CNT5
FCP_CNTR_TMM_TIMEOUT_DROP48
VIP_TMM_TO_DROP_CNT50i, 48S,note2,note3
Timeout drops at egress due to frames hitting the congestion-drop threshold.
The congestion-drop threshold is set via the following command and is on at
500ms by default for all “modes” (port types):
system timeout congestion-drop <ms> mode e|f
OBFL:
Show logging onboard error-stats2,3,4,5,48,48s,50i
Show logging onboard flow-control timeout-drops 2,3,4,5,48,48s,50i,note1
Sup Hardware internal errors
show hardware internal errors all|module x2,3,4
show hardware internal statistics all48
show hardware internal statistics all5,note4
None48s,50i,note5
Sup packet-dropped-reason
show hardware internal packet-dropped-reason mod 2,3,48,48S,50i
Linecard Hardware internal errors
slot x show hardware internal statistics2,3,48
show hardware internal fc-mac port x error-statistic2,3
show hardware internal errors2,3,4,5,48,48s,50i
show hardware internal statistics device fcmac all4
show hardware internal statistics device all|fcmac2,3,5,48,48s,50i
Note1: These do not appear until
NX-OS 6.2(9)
Note2: These are included in
VIP_TMM_TO_CNT.
Note3: Should be included in `show
hardware internal statistics pktflow
dropped`:
CSCus60322 Add
VIP_TMM_TO_CNT and
VIP_TMM_TO_DROP_CNT to
packet-flow dropped
Integrated into NX-OS 6.2(13)
Note4: CSCut21070 show
hardware internal statistics sup
command does not include fcmac
Integrated in: open
Note5: CSCus85931 Need show
hardware internal errors & stats
cmds on MDS 9250i 9148 9148S
Integrated in: open
Slow drain counters and descriptions Table 2 - Counters indicating frame drops
Table 2 - continued
THB_TMM_TIMEOUT_STATS_DROP4
F16_TMM_TIMEOUT_STATS_DROP5
Timeout statistics entries dropped because the stats FIFO was full.
These counters are not real frame drops. On the F16 and Thunderbird
ASICs the TMM has a timeout stats FIFO that holds packets which
have timed out. If the FIFO is full and has not been read, newly
timed-out packets are not written into the FIFO and are instead
counted by TIMEOUT_STATS_DROP.
This is a Thunderbird/F16 feature. Viper, Gen2, and Gen3 do not
have it.
FCP_CNTR_LAF_C3_TIMEOUT_FRAMES_DISCARD2
AK_FCP_CNTR_LAF_C3_TIMEOUT_FRAMES_DISCARD3
THB_TMM_TO_CNT_CLASS_34
F16_TMM_TO_CNT_CLASS_35
None48,50i, 48S
Count of class-3 Fibre Channel frames dropped as a result
of congestion-drop timeout
OBFL:
Show logging onboard error-stats2,3,4,5
Sup Hardware internal errors
show hardware internal errors all|module x2,3
show hardware internal statistics device fcmac|all4,5,note1
Linecard Hardware internal errors
show hardware internal fc-mac port x error-statistic2,3
show hardware internal statistics2,3
show hardware internal statistics device fcmac all4
show hardware internal statistics device fcmac5
show hardware internal errors2,3
Note1: CSCut21070
show hardware internal
statistics sup command
does not include fcmac
Integrated in: open
Slow drain counters and descriptions Table 2 - Counters indicating frame drops
Table 2 - continued
FCP_CNTR_LAF_CF_TIMEOUT_FRAMES_DISCARD2
AK_FCP_CNTR_LAF_CF_TIMEOUT_FRAMES_DISCARD3
THB_TMM_TO_CNT_CLASS_F4
F16_TMM_TO_CNT_CLASS_F5
None48,48s,50i
Count of class-F Fibre Channel frames dropped due to congestion-drop
timeout
OBFL:
Show logging onboard error-stats2,3,4,5
Sup Hardware internal errors
show hardware internal errors all|module x2,3
show hardware internal statistics device fcmac|all4,5,note1
Linecard Hardware internal errors
show hardware internal fc-mac port x error-statistic2,3
show hardware internal statistics2,3
show hardware internal statistics device fcmac all4
show hardware internal statistics device fcmac5
show hardware internal errors2,3
Note1: CSCut21070 show
hardware internal statistics sup
command does not include fcmac
Integrated in: open
F16_TMM_PORT_STUCK_FORCE_TIMEOUT_CNT5
VIP_TMM_STUCK_PORT_TO_CNT48S,50i
Total number of frames dropped by force timeout during stuck port processing
(no-credit-drop).
Gen2/3/4 and the 9148 do not have a counter for this. Any frames dropped as a
result of no-credit-drop on these are just counted as timeout discards.
Sup:
None 5,note1
None 48S,50i,note2
Linecard:
show hardware internal statistics device all|fcmac
Note1: CSCut21070 show
hardware internal statistics sup
command does not include fcmac
Integrated in: open
Note2: CSCus85931 Need show
hardware internal errors & stats
cmds on MDS 9250i 9148 9148S
Integrated in: open
Slow drain counters and descriptions Table 2 - Counters indicating frame drops
Table 2 - end
FCP_CNTR_TMM_TIMEOUT48
VIP_TMM_TO_CNT 48S,50i,note1
Total timeout drops counter. It includes frames timed out due to congestion
(packet timeout), HW stuck port force timeout, and HW slow port force
timeout.
Sup hardware internal statistics:
show hardware internal statistics all48
None 48S,50i,note2
Sup packet-dropped-reason
show hardware internal packet-dropped-reason mod 48,48S,50i
Linecard hardware internal errors/statistics:
show hardware internal errors48,48s,50i
show hardware internal statistics48
show hardware internal statistics device all|fcmac48s,50i
OBFL:
Show logging onboard error-stats48,48s,50i
Note1: Should be included in `show
hardware internal statistics pktflow
dropped`. See:
CSCus60322 Add
VIP_TMM_TO_CNT and
VIP_TMM_TO_DROP_CNT to
packet-flow dropped
Integrated into 6.2(13)
Note2: CSCus85931 Need show
hardware internal errors & stats
cmds on MDS 9250i 9148 9148S
Integrated in: open
F16_TMM_PORT_CWAIT_FORCE_TIMEOUT_CNT5
VIP_TMM_SLOW_PORT_TO_CNT 48S,50i
Total timeout packets dropped due to slowport-monitor processing. This
doesn’t increment since the slowport-monitor feature doesn’t include a
packet drop function.
None. Not implemented.
Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface
Table 3
Counter Name Description Commands Additional Info
FCP_CNTR_CREDIT_LOSS2 ,48
AK_FCP_CNTR_CREDIT_LOSS3
FCP_SW_CNTR_CREDIT_LOSS4,5 ,48s,50i
Count of the number of times that creditmon credit loss recovery has been
invoked on a port
OBFL:
Show logging onboard error-stats2,3,4,5,48,48s,50i
Sup Hardware internal errors/statistics:
show hardware internal errors all|module x2,3,4
None4,5,note1
show hardware internal statistics all48
None48s,50i,note2
Linecard hardware internal errors/statistics
show hardware internal fc-mac port x error-statistic2,3
show hardware internal statistics2,3,48
show hardware internal statistics device fcmac all4
show hardware internal statistics device fcmac|all5,48,48S,50i
show hardware internal errors2,3,4,48,48S,50i
Note1: CSCut21070 show
hardware internal statistics sup
command does not include fcmac
Integrated in: open
Note2: CSCus85931 Need show
hardware internal errors & stats
cmds on MDS 9250i 9148 9148S
Integrated in: open
Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface
Table 3 - continued
FCP_CNTR_FORCE_TIMEOUT_ON2 ,48
AK_FCP_CNTR_FORCE_TIMEOUT_ON3
FCP_SW_CNTR_FORCE_TIMEOUT_ON4
FCP_SW_CNTR_FORCE_TIMEOUT_ON5,50i,48s,note2
Count of the number of times the "system timeout no-credit-drop threshold"
has been reached by this port; when a port is at zero Tx B2B credits for the time
specified, the port starts to drop packets at line rate
Note 1: For the 9700 and 9250i these counters increment only in releases prior
to the introduction of the HW slow drain feature in 6.2(9). Since the 9148S was
first supported in NX-OS 6.2(9), the counter will never increment on it.
Reference the F16_TMM_PORT_STUCK_FORCE_TIMEOUT_L_H_CNT5 and
VIP_TMM_STK_PRT_TO_TRANSITION_CH0_CNT48S,50i counters, which indicate
the same thing (but are not in OBFL). Checking on whether these should be
re-added.
See the following counters to determine frames dropped due to force
timeout:
F16_TMM_PORT_STUCK_FORCE_TIMEOUT_CNT5
VIP_TMM_STUCK_PORT_TO_CNT48S,50i
OBFL:
Show logging onboard error-stats2,3,4,5,48,48s,50i
Sup hardware internal errors/statistics
show hardware internal errors all|module x2,3,4
show hardware internal statistics all48
None48s,50i,note2
Linecard hardware internal errors/statistics:
show hardware internal fc-mac port x error-statistic2,3
show hardware internal statistics2,3,48
show hardware internal statistics device fcmac all4
show hardware internal statistics device fcmac|all48
show hardware internal errors2,3,4,48
None 5,48S,50i,note3
Note2: CSCus85931 Need show
hardware internal errors & stats
cmds on MDS 9250i 9148 9148S
Integrated in: open
Note3:
After NX-OS 6.2(9) the following
counters are not incrementing on
MDS 9700, 9250i, 9148S:
CSCus93140 no-credit-drop SW
counters not incrementing on MDS
9700, 9250i, 9148S
Integrated in: open
Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface
Table 3 - continued
FCP_CNTR_FORCE_TIMEOUT_OFF2
AK_FCP_CNTR_FORCE_TIMEOUT_OFF3
FCP_SW_CNTR_FORCE_TIMEOUT_OFF4,48
FCP_SW_CNTR_FORCE_TIMEOUT_OFF5,50i,48s,note 1
Count of the number of times that the port has recovered from the system
timeout no-credit-drop condition; this status typically means that R_Rdy
primitive has been returned or possibly that an LR and LRR has occurred.
Note 1: For the 9700 and 9250i these counters will only increment prior to the
introduction of the HW slow drain feature in 6.2(9). Since the 9148S was first
supported in NX-OS 6.2(9) they will never increment. They are re-added into
hardware internal statistics and logging onboard error-stats via the following
bug:
CSCus93140 no-credit-drop SW counters not incrementing on MDS 9700,
9250i, 9148S
F16_TMM_PORT_STUCK_FORCE_TIMEOUT_H_L_CNT5 indicates the same thing.
OBFL:
Show logging onboard error-stats2,3,4,5,48,48s,50i
Sup hardware internal errors/statistics
show hardware internal errors all|module x2,3,4
show hardware internal statistics all48
None48s,50i,note2
Linecard hardware internal errors/statistics:
show hardware internal fc-mac port x error-statistic2,3
show hardware internal statistics2,3,48
show hardware internal statistics device fcmac all4
show hardware internal statistics device fcmac|all48
show hardware internal errors2,3,4,48
None 5,48S,50i,note3
Note2: CSCus85931 Need show
hardware internal errors & stats
cmds on MDS 9250i 9148 9148S
Integrated in: open
Note3:
After NX-OS 6.2(9) the following
counters are not incrementing on
MDS 9700, 9250i, 9148S:
CSCus93140 no-credit-drop SW
counters not incrementing on MDS
9700, 9250i, 9148S
Integrated in: open
Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface
Table 3 - continued
AK_FCP_CNTR_LINK_RESET_OUT2,3
FCP_SW_CNTR_LINK_RESET_OUT48
FCP_SW_CNTR_LINK_RESET_OUT4,5,50i,48S, note 1
Count of times a Link Credit Reset(LR) was transmitted from the interface.
Also shown in the output of “show interface counters detail” as:
xxx link reset protocol errors transmitted
Or
xxx link reset transmitted while link is active
Note the above just counts link resets that are transmitted when the link is
active.
Sup hardware internal errors/statistics
show hardware internal statistics all2,3,48
Linecard hardware internal errors/statistics
show hardware internal statistics all2,3,48
Note1: These are not incremented.
CSCus99138 Port software counters
not incrementing
Integrated in: open
AK_FCP_CNTR_LINK_RESET_IN2,3
FCP_SW_CNTR_LINK_RESET_IN48
FCP_SW_CNTR_LINK_RESET_IN4,5,50i,48S, note 1
Count of times a Link Credit Reset(LR) was received on the interface.
Also shown in the output of “show interface counters detail” as:
xxx link reset protocol errors received
Or
xxx link reset received while link is active
Note the above just counts link resets that are received when the link is active.
Reference AK_FCP_CNTR_LINK_RESET_OUT above.
Show logging onboard interrupt-stats will show
IP_FCMAC_INTR_PRIM_RX_SEQ_LR
Reference Additional Info in
AK_FCP_CNTR_LINK_RESET_OUT
above.
AK_FCP_CNTR_LRR_OUT 2,3
FCP_SW_CNTR_LRR_OUT4,5,48,50i,48S, note 1
Count of times a Link Credit Reset Response(LRR) was transmitted from the
interface.
Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in
AK_FCP_CNTR_LINK_RESET_OUT
above.
AK_FCP_CNTR_LRR_IN 2,3
FCP_SW_CNTR_LRR_IN4,5,48,50i,48S, note 1
Count of times a Link Credit Reset Response(LRR) was received on the interface.
Also shown using show interface fcx/y
xx input OLS,xx LRR,0 NOS,xx loop inits
Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in
AK_FCP_CNTR_LINK_RESET_OUT
above.
Slow drain counters and descriptions Table 3 - Counters indicating an action on or for an interface
Table 3 - end
AK_FCP_CNTR_OLS_OUT 2,3
FCP_SW_CNTR_OLS_OUT4,5,48,50i,48S, note 1
Count of times an Off Line Sequence(OLS) was transmitted from the interface.
Also shown using show interface fcx/y
xx output OLS,xx LRR, xx NOS, xx loop inits
Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in
AK_FCP_CNTR_LINK_RESET_OUT
above.
AK_FCP_CNTR_OLS_IN 2,3
FCP_SW_CNTR_OLS_IN4,5,48,50i,48S, note 1
Count of times an Off Line Sequence(OLS) was received on the interface.
Also shown using show interface fcx/y
xx input OLS,xx LRR,0 NOS,xx loop inits
Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in
AK_FCP_CNTR_LINK_RESET_OUT
above.
AK_FCP_CNTR_NOS_OUT 2,3
FCP_SW_CNTR_NOS_OUT4,5,48,50i,48S, note 1
Count of times a Not Operational Sequence (NOS) was transmitted from the
interface.
Also shown using show interface fcx/y
xx output OLS,xx LRR, xx NOS, xx loop inits
Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in
AK_FCP_CNTR_LINK_RESET_OUT
above.
AK_FCP_CNTR_NOS_IN 2,3
FCP_SW_CNTR_NOS_IN4,5,48,50i,48S, note 1
Count of times a Not Operational Sequence (NOS) was received on the
interface.
Also shown using show interface fcx/y
xx input OLS,xx LRR,0 NOS,xx loop inits
Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in
AK_FCP_CNTR_LINK_RESET_OUT
above.
AK_FCP_CNTR_LRR_OUT 2,3
FCP_SW_CNTR_LRR_OUT4,5,48,50i,48S, note 1
Count of times a Link Credit Reset Response(LRR) was transmitted from the
interface.
Also shown using show interface fcx/y
xx output OLS,xx LRR, xx NOS, xx loop inits
Reference AK_FCP_CNTR_LINK_RESET_OUT above. Reference Additional Info in
AK_FCP_CNTR_LINK_RESET_OUT
above.
Slow drain counters and descriptions Table 4 – Interrupt counters
Table 4
Counter Name Description Commands Additional Info
F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_L_H5 Slowport condition detected count (Low to High transition: i.e. credit wait
(cwait) > threshold)
NX-OS 6.2(1) through 6.2(7) - Count of times port was at zero Tx credits
for 100ms. Only increments on the initial 100ms interval. In these
“pre-slowport-monitor” releases this counter was used to trigger the
FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.
NX-OS 6.2(9) and later – Should not occur.
OBFL:
show logging onboard interrupt-stats
Linecard:
Slot x show hardware internal fcmac port x interrupt-counts
F16_FCP_INTR_TMM_P_CWAIT_FORCE_TIMEOUT_H_L5 Slowport condition exited count (High to Low transition: i.e. credit wait
(cwait) < threshold)
NX-OS 6.2(1) through 6.2(7) - Count of times port received a credit after
being at zero Tx credits for 100ms or longer. In these
“pre-slowport-monitor” releases this counter was used to re-arm the
FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.
NX-OS 6.2(9) and later – Should not occur.
OBFL:
show logging onboard interrupt-stats
Linecard:
Slot x show hardware internal fcmac port x interrupt-counts
Slow drain counters and descriptions Table 4 – Interrupt counters
Table 4 - continued
F16_FCP_INTR_TMM_P_STUCK_FORCE_TIMEOUT_L_H5 Stuck port condition detected count (Low to High transition).
Configured via:
“system timeout no-credit-drop xxx mode e|f”
Defaults to 500ms with no action.
OBFL:
show logging onboard interrupt-stats
Linecard:
Slot x show hardware internal fcmac port x interrupt-counts
F16_FCP_INTR_TMM_P_STUCK_FORCE_TIMEOUT_H_L5 Count of times stuck port condition exited. OBFL:
show logging onboard interrupt-stats
Linecard:
Slot x show hardware internal fcmac port x interrupt-counts
VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH0_OVER_THRESHOLD_RAISING48s,
50i
VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_RAISING48s,
50i
Slowport condition detected count (Low to High transition: i.e. credit wait
(cwait) > threshold)
NX-OS 6.2(5) through 6.2(7) - Count of times port was at zero Tx credits
for 100ms. Only increments on the initial 100ms interval. In these
“pre-slowport-monitor” releases this counter was used to trigger the
FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO counter increment.
NX-OS 6.2(9) and later – Should not occur.
OBFL:
show logging onboard interrupt-stats
Linecard:
Slot x show hardware internal fcmac port x interrupt-counts
Note: VIPER does not have a High to
Low interrupt like F16.
Slow drain counters and descriptions Table 4 – Interrupt counters
Table 4 - continued
VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH0_OVER_THRESHOLD_FALLING48s,50i
VIPER_FCP_INTR_TMM_CWAIT_AVG_LIVE_CH1_OVER_THRESHOLD_FALLING48s,50i
Slowport condition exited count.
OBFL:
show logging onboard interrupt-stats
Linecard:
Slot x show hardware internal fcmac port x interrupt-counts
Note: These are displayed in OBFL
with the VIPER_FCP_INTR_ prefix but
without the prefix in other places.
VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH0_RAISING
VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH1_RAISING
Count of times port was at zero Tx credits for the stuck port timeout
value “no-credit-drop” (default value 500ms).
NX-OS 6.2(5) through 6.2(7) - Count of times port was at zero Tx credits
for 1s(F port) or 1.5s(E port). In these “pre-slowport-monitor” releases
this interrupt was used to trigger the FCP_SW_CNTR_CREDIT_LOSS
counter increment.
NX-OS 6.2(9) and later – Should not occur.
OBFL:
show logging onboard interrupt-stats
Linecard:
Slot x show hardware internal fcmac port x interrupt-counts
VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH0_FALLING
VIPER_FCP_INTR_TMM_STUCK_PORT_TIMEOUT_CH1_FALLING
Count of times stuck port condition exited. OBFL:
show logging onboard interrupt-stats
Linecard:
Slot x show hardware internal fcmac port x interrupt-counts
Slow drain counters and descriptions Table 4 – Interrupt counters
Table 4 - continued
IP_FCMAC_INTR_PRIM_RX_SEQ_NOS Not Operational Sequence received on the interface.
NOS is a sequence that is transmitted continuously until an OLS is received.
OBFL:
show logging onboard interrupt-stats
Linecard:
show hardware internal fc-mac port 1 interrupt-counts2,3,48
Slot x show hardware internal fcmac port x interrupt-counts4,5,48s,50i
show interface fcx/y counters details
show interface detailed-counters
xxx non-operational sequences
received
IP_FCMAC_INTR_PRIM_RX_SEQ_OLS Off Line Sequence received on the interface.
OLS is a sequence that is transmitted continuously until an LR is received.
OBFL:
show logging onboard interrupt-stats
Linecard:
show hardware internal fc-mac port 1 interrupt-counts2,3,48
Slot x show hardware internal fcmac port x interrupt-counts4,5,48s,50i
show interface fcx/y counters details
show interface detailed-counters
xxx Offline Sequence errors received
IP_FCMAC_INTR_PRIM_RX_SEQ_LR Link Reset received on the interface. LR is sent under two conditions
normally:
1) Link bringup – NOS/OLS/LR/LRR
2) Credit Loss Recovery – LR is sent to bring each side up to its full
complement of B2B credits. This doesn’t bounce or flap the link
but just restores the B2B credits.
OBFL:
show logging onboard interrupt-stats
Linecard:
show hardware internal fc-mac port 1 interrupt-counts2,3,48
Slot x show hardware internal fcmac port x interrupt-counts4,5,48s,50i
show interface fcx/y counters details
show interface detailed-counters
xxx link reset protocol errors received
Slow drain counters and descriptions Table 4 – Interrupt counters
Table 4 - end
IP_FCMAC_INTR_PRIM_RX_SEQ_LRR Link Reset Response received on the interface. This is sent in response to a Link Reset.
OBFL:
show logging onboard interrupt-stats
Linecard:
show hardware internal fc-mac port 1 interrupt-counts2,3,48
Slot x show hardware internal fcmac port x interrupt-counts4,5,48s,50i
show interface fcx/y counters details
show interface detailed-counters
xxx link reset responses received
Slow drain counters and descriptions Table 5 - SNMP variables applicable to slow drain
Table 5
Counter Name Description Commands Additional Info
fcIfTxWaitCount 2,3,48 ,see note 1
fcIfTxWaitCount 4,note2,note3
fcIfTxWaitCount 5,note 3
fcIfTxWaitCount 50i, 48S, see note 4
OID 1.3.6.1.4.1.9.9.289.1.2.1.1.15
The number of times the FC-port waited due to lack of transmit credits while there
were packets queued for transmit. This is in units of 2.5us.
To calculate seconds: txwait * 2.5 / 1000000
There is no OID for the Rx direction of this.
Not generated by port-monitor
Based on the following counters:
THB_TMM_PORT_TWAIT_CNT4
F16_TMM_PORT_TWAIT_CNT5
VIP_TMM_TXWAIT_CH0_CNT50i, 48S
VIP_TMM_TXWAIT_CH1_CNT50i, 48S
Displayed via:
Show interface fcx/y counters detailed | i wait
-or-
Show interface detailed-counters | i fc|wait
Example:
rtp-san-34-15-9513# show int fc4/1 counters details | i wait
82864704 waits due to lack of transmit credits
Note1: On gen2, gen3 and 9148,
this will always return zero.
Note2: Added to Gen4 linecards in
NX-OS 5.2(2)
Note3: Prior to 6.2(11a) this counter
was inaccurate. See the following
bug:
CSCus15233 fcIfTxWaitCount
incorrect on DS-X9232-256K9 and
DS-X9248-768K9
Fixed in 6.2(11a)
Note4: Prior to 6.2(11a) this counter
was inaccurate. See the following
bug:
CSCus15745 fcIfTxWaitCount
incorrect for MDS 9250i and 9148S
Fixed in 6.2(11a)
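The TxWait conversion above can be sketched in a few lines of Python. This is a minimal illustration, not an MDS tool: the function names are hypothetical, and the percentage helper assumes two raw counter samples taken a known number of seconds apart with no counter wrap in between.

```python
# One TxWait tick represents 2.5 microseconds at zero Tx credits
# with frames queued for transmit (per the fcIfTxWaitCount description).
TXWAIT_TICK_US = 2.5


def txwait_to_seconds(txwait_ticks: int) -> float:
    """Convert a raw TxWait count to seconds: txwait * 2.5 / 1000000."""
    return txwait_ticks * TXWAIT_TICK_US / 1_000_000


def txwait_percent(prev_ticks: int, curr_ticks: int, interval_seconds: float) -> float:
    """Percentage of a polling interval the port spent waiting for Tx credits.

    Hypothetical helper: assumes curr_ticks >= prev_ticks (no counter wrap).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_ticks - prev_ticks
    return 100.0 * txwait_to_seconds(delta) / interval_seconds


if __name__ == "__main__":
    # Raw value taken from the example output above.
    print(txwait_to_seconds(82864704))
```

Applied to the example value of 82864704 waits, the conversion gives roughly 207 seconds of accumulated wait time since the counters were last cleared.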
fcIfCreditLoss2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.37
The number of link resets that have occurred due to unavailable
credits from the peer side of the link.
Generated by port-monitor counter credit-loss-reco
Shown in the output of show interface counters:
xxx timeout discards, xxx credit loss
Credit loss recovery is initiated by
the MDS after 1 second(F port) / 1.5
seconds(E port) at zero Tx credits.
Other products may initiate at
different intervals
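The timers described in these tables can be lined up in a short Python sketch. It is a simplification for illustration only: it takes a continuous duration at zero Tx credits and reports which of the documented thresholds that duration reaches (100 ms for tx-credit-not-available, the 500 ms default stuck port threshold, and 1 s / 1.5 s for credit loss recovery on F / E ports). The function and label names are hypothetical, and the real hardware and creditmon behavior is more involved than a single comparison.

```python
# Thresholds as stated in the tables; values in milliseconds.
CREDIT_NOT_AVAILABLE_MS = 100                      # fcIfTxWtAvgBBCreditTransitionToZero
STUCK_PORT_DEFAULT_MS = 500                        # no-credit-drop default threshold
CREDIT_LOSS_RECOVERY_MS = {"F": 1000, "E": 1500}   # MDS credit loss recovery


def thresholds_reached(zero_credit_ms, port_mode, stuck_port_ms=STUCK_PORT_DEFAULT_MS):
    """Return the thresholds reached by a continuous time at zero Tx credits.

    port_mode is "E" or "F". stuck_port_ms stands in for a configured
    "system timeout no-credit-drop" value (assumption for illustration).
    """
    mode = port_mode.upper()
    if mode not in CREDIT_LOSS_RECOVERY_MS:
        raise ValueError("port_mode must be 'E' or 'F'")
    reached = []
    if zero_credit_ms >= CREDIT_NOT_AVAILABLE_MS:
        reached.append("tx-credit-not-available")
    if zero_credit_ms >= stuck_port_ms:
        reached.append("stuck-port")
    if zero_credit_ms >= CREDIT_LOSS_RECOVERY_MS[mode]:
        reached.append("credit-loss-recovery")
    return reached


if __name__ == "__main__":
    print(thresholds_reached(1200, "F"))
    print(thresholds_reached(1200, "E"))
```

The same 1.2 seconds at zero credits reaches credit loss recovery on an F port but not on an E port, which is why the port mode is an input.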
Slow drain counters and descriptions Table 5 - SNMP variables applicable to slow drain
Table 5 - continued
fcIfLinkResetOuts 2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.10
The number of link reset protocol errors issued by
the FC-Port to the attached FC-Port.
Generated by port-monitor counter lr-tx
Shown in the output of show interface fcx/y counters detailed:
xxx link reset protocol errors transmitted
or
xxx link reset transmitted while link is active
fcIfLinkResetIns2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.9
The number of link reset protocol errors received by
the FC-Port from the attached FC-port
Generated by port-monitor counter lr-rx
Shown in the output of show interface fcx/y counters detailed:
xxx link reset protocol errors received
or
xxx link reset received while link is active
fcIfTimeOutDiscards2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.35
The number of packets that are dropped due to time-out at the FC-port or due to the FC-port going
offline.
Generated by port-monitor counter timeout-discards
Shown in the output of show interface counters:
xxx timeout discards, xxx credit loss
Slow drain counters and descriptions Table 5 - SNMP variables applicable to slow drain
Table 5 - continued
fcIfOutDiscards2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.36
The total number of packets that are discarded in the egress side of the FC-port.
Generated by port-monitor counter tx-discards
Shown in the show interface fcx/y command:
xxx discards, xxx errors
fcIfTxWtAvgBBCreditTransitionToZero2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.38
Count of the number of times that an interface was at zero Tx B2B credits for 100 ms.
This status typically indicates congestion at the device attached on that interface.
Generated by port-monitor counter tx-credit-not-available
show system internal snmp credit-not-available
See the following hardware internal
error for more info:
xxx_CNTR_TX_WT_AVG_B2B_ZERO
Note: There is no OID in the Rx
direction.
CSCus93323 Portmonitor
fcIfTxWtAvgBBCreditTransitionToZero
truncates hcAlarmOwner
fcIfBBCreditTransistionFromZero2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.28
Increments when the transmit B2B credit transitions to zero
There is no indication of time at zero for this counter. It could stay at zero for just an
instant or for an extended duration of time.
Not generated by port-monitor
Shown in the output of show interface counters:
xxxx Transmit B2B credit transitions to zero
Based off of the TBBZ hardware
statistic.
Slow drain counters and descriptions Table 5 - SNMP variables applicable to slow drain
Table 5 - end
fcHCIfBBCreditTransistionFromZero2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.40
Increments when the transmit B2B credit transitions to zero
There is no indication of time at zero for this counter. It could stay at zero for just an
instant or for an extended duration of time.
Not generated by port-monitor
Shown in the output of show interface counters:
xxxx Transmit B2B credit transitions to zero
Based off of the TBBZ hardware
statistic.
fcIfBBCreditTransistionToZero2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.39
Increments when the receive B2B credit transitions to zero
There is no indication of time at zero for this counter. It could stay at zero for just an
instant or for an extended duration of time.
Not generated by port-monitor
Shown in the output of show interface counters:
xxxx Receive B2B credit transitions to zero
Based off of the RBBZ hardware
statistic.
fcHCIfBBCreditTransistionToZero2,3,4,5,48,50i, 48S OID 1.3.6.1.4.1.9.9.289.1.2.1.1.41
Increments when the receive B2B credit transitions to zero
There is no indication of time at zero for this counter. It could stay at zero for just an
instant or for an extended duration of time.
Not generated by port-monitor
Shown in the output of show interface counters:
xxxx Receive B2B credit transitions to zero
Based off of the RBBZ hardware
statistic.
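For polling, the OIDs in Table 5 can be collected into a lookup. The Python sketch below only builds OID strings; it does not perform any SNMP request. The helper name is hypothetical, and appending the interface ifIndex as the instance suffix is an assumption about how the table is indexed that should be checked against the MIB before use.

```python
# Column OIDs exactly as listed in Table 5.
BASE = "1.3.6.1.4.1.9.9.289.1.2.1.1"
SLOW_DRAIN_OIDS = {
    "fcIfLinkResetIns": f"{BASE}.9",
    "fcIfLinkResetOuts": f"{BASE}.10",
    "fcIfTxWaitCount": f"{BASE}.15",
    "fcIfBBCreditTransistionFromZero": f"{BASE}.28",
    "fcIfTimeOutDiscards": f"{BASE}.35",
    "fcIfOutDiscards": f"{BASE}.36",
    "fcIfCreditLoss": f"{BASE}.37",
    "fcIfTxWtAvgBBCreditTransitionToZero": f"{BASE}.38",
    "fcIfBBCreditTransistionToZero": f"{BASE}.39",
    "fcHCIfBBCreditTransistionFromZero": f"{BASE}.40",
    "fcHCIfBBCreditTransistionToZero": f"{BASE}.41",
}


def instance_oid(name: str, if_index: int) -> str:
    """Build the per-interface OID for a Table 5 variable.

    Assumption: the instance is identified by the interface ifIndex.
    """
    return f"{SLOW_DRAIN_OIDS[name]}.{if_index}"


if __name__ == "__main__":
    # 16777216 is an arbitrary example ifIndex, not taken from the slides.
    print(instance_oid("fcIfTxWaitCount", 16777216))
```

The variable names keep the "Transistion" spelling used in the table so they match what a MIB browser displays.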
Slow drain counters and descriptions Legend - superscripts
• Superscripts:
• 1: Generation 1 modules are no longer supported by NX-OS 5.0 (and later releases) and are not covered by this presentation
• 2: Generation 2 DS-X9112, DS-X9124, DS-X9148, and DS-X9304-18K9 modules
• 3: Generation 3 DS-X9248-48K9 and DS-X92xx-96K9 modules
• 4: Generation 4 DS-X92xx-256K9 modules
• 5: Generation 5 Cisco MDS 9710/9706 DS-X9448-768K9 module and MDS 9396S
• 48: Cisco MDS 9148
• 50i: Cisco MDS 9250i
• 48S: Cisco MDS 9148S
• Legend
• AK: Aakash (Generation 2 or Generation 3 line card MAC ASIC)
• THB: Thunderbird (Generation 4 ASIC)
• F16: F16 (Generation 5 ASIC)
• SAB: Sabre ASIC for MDS 9148
• VIP: Viper ASIC for MDS 9250i and 9148S
• RI: Request Interface
• TMM: Transmit Memory Manager
• FCP_SW: These indicate software counters
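The legend can be applied mechanically to a counter name. The following Python sketch is illustrative only: it maps the leading prefix of a counter name to the ASIC named in the legend and flags FCP_SW software counters. The function name is hypothetical, and the VIPER entry is added because the Table 4 interrupt counters use that longer prefix rather than VIP.

```python
# Prefix meanings copied from the legend above.
ASIC_PREFIXES = {
    "AK": "Aakash (Generation 2 or Generation 3 line card MAC ASIC)",
    "THB": "Thunderbird (Generation 4 ASIC)",
    "F16": "F16 (Generation 5 ASIC)",
    "SAB": "Sabre ASIC for MDS 9148",
    "VIP": "Viper ASIC for MDS 9250i and 9148S",
    # Table 4 interrupt counters are listed with the longer VIPER_ prefix.
    "VIPER": "Viper ASIC for MDS 9250i and 9148S",
}


def describe_counter(name: str):
    """Return (asic_description_or_None, is_software_counter) for a counter name."""
    prefix = name.split("_", 1)[0]
    asic = ASIC_PREFIXES.get(prefix)
    is_software = name.startswith("FCP_SW_")
    return asic, is_software


if __name__ == "__main__":
    for counter in ("F16_TMM_PORT_FRM_DROP_CNT",
                    "AK_FCP_CNTR_CREDIT_LOSS",
                    "FCP_SW_CNTR_CREDIT_LOSS"):
        print(counter, describe_counter(counter))
```

Counters with no ASIC prefix, such as FCP_CNTR_CREDIT_LOSS, return None for the ASIC. For those, the superscripts in the tables are the only guide to the platform.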
Thank you