
Emergency Maintenance(V100R001 01)

Date post: 14-Apr-2018
Upload: xotonuco
  • 7/30/2019 Emergency Maintenance(V100R001 01)

    1/73

    Quidway S9300 Terabit Routing Switch

    V100R001

    Emergency Maintenance

    Issue 01

    Date 2009-04-15

    Huawei Proprietary and Confidential

    Copyright Huawei Technologies Co., Ltd.

    Huawei Technologies Co., Ltd. provides customers with comprehensive technical support and service. For any

    assistance, please contact our local office or company headquarters.

    Huawei Technologies Co., Ltd.

    Address: Huawei Industrial Base

    Bantian, Longgang

    Shenzhen 518129

    People's Republic of China

    Website: http://www.huawei.com

    Email: [email protected]

    Copyright Huawei Technologies Co., Ltd. 2009. All rights reserved.

    No part of this document may be reproduced or transmitted in any form or by any means without prior written

    consent of Huawei Technologies Co., Ltd.

    Trademarks and Permissions

    and other Huawei trademarks are the property of Huawei Technologies Co., Ltd.

    All other trademarks and trade names mentioned in this document are the property of their respective holders.

    Notice

    The information in this document is subject to change without notice. Every effort has been made in the

    preparation of this document to ensure accuracy of the contents, but the statements, information, and

    recommendations in this document do not constitute a warranty of any kind, express or implied.

    Contents

    About This Document
    1 Overview of Emergency Maintenance
        1.1 Definition of Emergency Maintenance
        1.2 Definition of Emergencies
        1.3 Initiation of Emergency Maintenance
        1.4 Guidelines for Emergency Maintenance
        1.5 Flow for Emergency Maintenance
            1.5.1 Notifying Huawei of the Emergency
            1.5.2 Locating the Fault
            1.5.3 Collecting Fault Information
            1.5.4 Rectifying the Fault
            1.5.5 Obtaining Help
            1.5.6 Checking the Handling Result
            1.5.7 Recording Information About Emergency Maintenance
        1.6 Emergency Maintenance Precautions
        1.7 Technical Support
    2 Emergency Maintenance for Device Faults
        2.1 Overview
        2.2 Flow for Handling Device Faults
        2.3 Directions for Emergency Maintenance
            2.3.1 Failure to Log In to a System Through the Console Interface
            2.3.2 Failure to Start a System
            2.3.3 Abnormality of the Board Status
            2.3.4 Abnormality of the Interface Status
    3 Emergency Maintenance for Service Faults
        3.1 Overview
        3.2 Flow for Handling Service Faults
        3.3 Guide to Emergency Maintenance
            3.3.1 Failure to Forward IP Unicast Packets
            3.3.2 Failure to Forward IP Multicast Packets
            3.3.3 Failure to Forward MPLS VPN Packets
    4 Guide to Fault Information Collection
        4.1 Overview
        4.2 Collection of Basic Fault Information
        4.3 Collection of Device Fault Information
    5 Guide to System Reboot
        5.1 Overview
        5.2 Preparation for System Reboot
        5.3 Guide to System Reboot
            5.3.1 Running Command Lines
            5.3.2 Pressing the RESET Button on the MCU/SRU
            5.3.3 Switching Off and Switching On the System
            5.3.4 Operating Through the NMS
        5.4 Verification of System Reboot
            5.4.1 Displaying Information About System Reboot
            5.4.2 Checking the Software Version and Configuration File
        5.5 Handling of a System Reboot Failure
    6 Emergency Maintenance Record Table
        6.1 Notice of Emergency Maintenance
        6.2 Emergency Record Table
    7 System Upgrading Through BIOS

    Figures

    Figure 1-1 Flowchart of emergency maintenance
    Figure 1-2 Flowchart for identifying the type of a fault
    Figure 2-1 Flowchart for handling device faults
    Figure 2-2 Flowchart for handling the failed login to a system through the console interface
    Figure 2-3 Flowchart for handling the failed system start
    Figure 2-4 Flowchart for handling the abnormality of the board status
    Figure 2-5 Flowchart for handling the abnormality of the interface status
    Figure 3-1 Flowchart for handling service faults
    Figure 3-2 Flowchart for handling the failure to forward IP unicast packets
    Figure 3-3 Flowchart for handling the failure to forward IP multicast packets
    Figure 3-4 Flowchart for handling the failure to forward MPLS VPN packets

    Tables

    Table 1-1 Methods of identifying the fault type
    Table 2-1 Collection of information about the failure to log in to a system through the console interface
    Table 2-2 Collection of information about the failure to start a system
    Table 2-3 Collection of information about the abnormality of the board status
    Table 2-4 Collection of information about the abnormality of the interface status
    Table 3-1 Collection of information about the failure to forward IP unicast packets
    Table 3-2 Collection of information about the failure to forward IP multicast packets
    Table 3-3 Collection of information about the failure to forward MPLS VPN packets
    Table 4-1 Collection of basic fault information
    Table 4-2 Collection of device fault information
    Table 6-1 Notice of emergency maintenance

    About This Document

    Purpose

    This document describes how to rectify the device faults and service faults of the Quidway

    S9300 Terabit Routing Switch. It also provides instructions for fault information collection and

    device reboot.

    Related Versions

    The following table lists the product versions related to this document.

    Product Name    Version
    S9300           V100R001

    Intended Audience

    This document is intended for:

    - Policy planning engineers
    - Installation and commissioning engineers
    - NM configuration engineers
    - Technical support engineers

    Organization

    This document is organized as follows.

    Chapter                        Description
    1 Overview of Emergency        Provides the definitions, causes, principle, flowcharts, and
      Maintenance                  precautions of emergency maintenance.
    2 Emergency Maintenance        Describes the emergency maintenance for device faults,
      for Device Faults            focusing on fault clearance and service recovery and not fault
                                   rectification.
    3 Emergency Maintenance        Describes the emergency maintenance for service faults,
      for Service Faults           focusing on fault clearance and service recovery rather than
                                   fault rectification.
    4 Guide to Fault               Describes how to collect and back up fault information on
      Information Collection       time after an emergency fault occurs.
    5 Guide to System Reboot       Describes how to restart the device manually when services
                                   are interrupted because of a device fault and the device
                                   cannot restart automatically.
    6 Emergency Maintenance        Describes the tables that you need to fill in when performing
      Record Table                 emergency maintenance.
    7 System Upgrading             Describes how to upgrade software through BIOS when the
      Through BIOS                 host software program fails to start.

    Conventions

    Symbol Conventions

    The symbols that may be found in this document are defined as follows.

    Symbol     Description
    DANGER     Indicates a hazard with a high level of risk, which if not avoided, will
               result in death or serious injury.
    WARNING    Indicates a hazard with a medium or low level of risk, which if not
               avoided, could result in minor or moderate injury.
    CAUTION    Indicates a potentially hazardous situation, which if not avoided, could
               result in equipment damage, data loss, performance degradation, or
               unexpected results.
    TIP        Indicates a tip that may help you solve a problem or save time.
    NOTE       Provides additional information to emphasize or supplement important
               points of the main text.

    General Conventions

    The general conventions that may be found in this document are defined as follows.

    Convention         Description
    Times New Roman    Normal paragraphs are in Times New Roman.
    Boldface           Names of files, directories, folders, and users are in boldface. For
                       example, log in as user root.
    Italic             Book titles are in italics.
    Courier New        Examples of information displayed on the screen are in Courier New.

    Command Conventions

    The command conventions that may be found in this document are defined as follows.

    Convention          Description
    Boldface            The keywords of a command line are in boldface.
    Italic              Command arguments are in italics.
    [ ]                 Items (keywords or arguments) in brackets [ ] are optional.
    { x | y | ... }     Optional items are grouped in braces and separated by vertical bars.
                        One item is selected.
    [ x | y | ... ]     Optional items are grouped in brackets and separated by vertical bars.
                        One item is selected or no item is selected.
    { x | y | ... }*    Optional items are grouped in braces and separated by vertical bars.
                        A minimum of one item or a maximum of all items can be selected.
    [ x | y | ... ]*    Optional items are grouped in brackets and separated by vertical bars.
                        Several items or no item can be selected.
    &                   The parameter before the & sign can be repeated 1 to n times.
    #                   A line starting with the # sign is a comment.

    GUI Conventions

    The GUI conventions that may be found in this document are defined as follows.

    Convention    Description
    Boldface      Buttons, menus, parameters, tabs, window, and dialog titles are in
                  boldface. For example, click OK.
    >             Multi-level menus are in boldface and separated by the ">" signs. For
                  example, choose File > Create > Folder.

    Keyboard Operations

    The keyboard operations that may be found in this document are defined as follows.

    Format          Description
    Key             Press the key. For example, press Enter and press Tab.
    Key 1+Key 2     Press the keys concurrently. For example, pressing Ctrl+Alt+A means
                    the three keys should be pressed concurrently.
    Key 1, Key 2    Press the keys in turn. For example, pressing Alt, A means the two
                    keys should be pressed in turn.

    Mouse Operations

    The mouse operations that may be found in this document are defined as follows.

    Action          Description
    Click           Select and release the primary mouse button without moving the pointer.
    Double-click    Press the primary mouse button twice continuously and quickly without
                    moving the pointer.
    Drag            Press and hold the primary mouse button and move the pointer to a
                    certain position.

    Update History

    Updates between document issues are cumulative. Therefore, the latest document issue contains all updates made in previous issues.

    Updates in Issue 01 (2009-04-15)

    Initial commercial release.

    1 Overview of Emergency Maintenance

    About This Chapter

    This chapter describes the definition of emergency events and the guidelines, flowchart, and precautions of emergency maintenance.

    1.1 Definition of Emergency Maintenance
    This section describes the definition and functions of emergency maintenance.

    1.2 Definition of Emergencies
    This section describes the definition and category of emergencies.

    1.3 Initiation of Emergency Maintenance
    This section describes the initiation of emergency maintenance.

    1.4 Guidelines for Emergency Maintenance
    This section describes the guidelines for emergency maintenance.

    1.5 Flow for Emergency Maintenance
    This section describes the flowchart of emergency maintenance.

    1.6 Emergency Maintenance Precautions
    This section describes the precautions for emergency maintenance.

    1.7 Technical Support
    This section describes how to seek Huawei technical support.

    1.1 Definition of Emergency Maintenance

    This section describes the definition and functions of emergency maintenance.

    Emergency maintenance refers to rectifying an urgent and unexpected fault (such as a power failure or a service interruption) of a system or a device so that it resumes normal operation and losses are minimized.

    The measures for emergency maintenance also guide maintenance personnel in taking preventive measures to protect the system before a surge in traffic.

    This document describes how to perform emergency maintenance for the Quidway S9300 Terabit Routing Switch.

    The Huawei Quidway S9300 Terabit Routing Switch (hereafter referred to as the Terabit Routing Switch) is applied to the access layer, convergence layer, and transport layer of Metro Ethernet networks. The Terabit Routing Switches include the S9312, S9306, and S9303.

    1.2 Definition of Emergencies

    This section describes the definition and category of emergencies.

    An emergency is a fault that occurs unexpectedly, involves a wide range of devices or services, and affects network operation and service quality. For the S9300, these faults are:

    - Abnormal system: All services are interrupted.
    - Abnormal Switch Routing Unit (SRU) or Main Control Unit (MCU): All services are interrupted.
    - Abnormal service card: Some services are interrupted.
    - Abnormal service module: Some services are interrupted.
    - Abnormal network: Network services are interrupted.

    Generally, alarms and logs about an abnormality are displayed before an emergency arises. You can determine whether an emergency has occurred by checking the alarms and logs or by reviewing customer complaints.

    NOTE

    The roadmap of emergency maintenance described in this chapter applies to emergencies. For common troubleshooting, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting.

    1.3 Initiation of Emergency Maintenance

    This section describes the initiation for emergency maintenance.

    The causes of an emergency include a software or hardware failure, an incorrect setting, an improper maintenance operation, a line failure, or a natural disaster. Emergency maintenance is initiated in any of the following situations:

    - Complaints of customers
      Customer complaints are a main trigger for emergency maintenance. When a fault reported by a customer or the Customer Service Center meets the conditions in Definition of Emergencies, initiate emergency maintenance.

    - Alarm indication
      When you check the alarms output by the Network Management System (NMS) or displayed on the terminal, initiate emergency maintenance if the alarms indicate a possible wide range of service failures.

    - Natural disaster
      When a natural disaster such as an earthquake, a fire, or a flood occurs, devices must be powered off temporarily to protect them from damage, so emergency maintenance needs to be initiated. Power on the devices again after the disaster.

    1.4 Guidelines for Emergency Maintenance

    This section describes the guidelines for emergency maintenance.

    Emergent faults can easily cause network access failures for numerous users, device breakdown, and service interruption, resulting in serious losses. To improve the efficiency of handling an emergent fault and to minimize losses, comply with the following basic guidelines before maintaining the S9300:

    - To keep a device running stably and minimize the probability of emergencies, refer to the Quidway S9300 Terabit Routing Switch Routine Maintenance.

    - The core function of emergency maintenance is to recover system operation and service provisioning as soon as possible. To respond to an emergency, you must have plans ready for various emergencies, based on the emergency maintenance manual. Managers and maintenance personnel must be familiar with the plans and well trained in them.

    - Maintenance personnel must attend the mandatory emergency maintenance training and learn the basic methods of identifying and handling emergent faults.

    - When an emergency occurs, keep calm and check whether the hardware devices and the routing are working normally. Then check whether the emergency is caused by an S9300. If it is, handle the fault according to the prepared plans or the procedures in this manual.

    - The CF card contains important data. When an emergency occurs, do not format the CF card before consulting Huawei engineers.

    - Contact the Customer Service Center or the local office of Huawei early for technical support during troubleshooting.

    - After handling an emergent fault, collect alarm information related to the fault and send the fault handling report, device alarm files, and log files to Huawei for analysis. This helps Huawei improve its after-sales service.

    1.5 Flow for Emergency Maintenance

    This section describes the flowchart of emergency maintenance.

    NOTE

    - You must maintain detailed records of operations and results so that Huawei engineers can refer to them during troubleshooting and handle a fault quickly.

    - When a fault persists, contact the Huawei Customer Service Center. For contact information, see Technical Support.

    The main purpose of emergency maintenance is to recover a system as soon as possible. Figure 1-1 shows the flowchart of emergency maintenance.

    Figure 1-1 Flowchart of emergency maintenance

    [Flowchart not reproduced. Nodes: Start; Notify Huawei of the Emergency; Locate the Fault; Collect fault information; Rectify the Fault; Service recover? (Yes/No); Obtain help; Check the handling result; Record information about emergency maintenance; End.]

    1.5.1 Notifying Huawei of the Emergency

    1.5.2 Locating the Fault

    1.5.3 Collecting Fault Information

    1.5.4 Rectifying the Fault

    1.5.5 Obtaining Help

    1.5.6 Checking the Handling Result

    1.5.7 Recording Information About Emergency Maintenance
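
The order of these steps can be sketched in Python. This is only an illustration and is not part of the S9300 documentation: the function name emergency_flow, the service_recovered callback, and the max_attempts limit are hypothetical, and the sketch assumes that obtaining help leads back to another attempt to rectify the fault, which the flowchart text alone does not confirm.

```python
from typing import Callable, List


def emergency_flow(service_recovered: Callable[[int], bool],
                   max_attempts: int = 3) -> List[str]:
    """Return the ordered steps taken, following sections 1.5.1 to 1.5.7.

    service_recovered(attempt) is a hypothetical callback that reports
    whether services have recovered after the given rectification attempt.
    """
    steps = [
        "notify Huawei of the emergency",
        "locate the fault",
        "collect fault information",
    ]
    for attempt in range(1, max_attempts + 1):
        steps.append("rectify the fault")
        if service_recovered(attempt):
            steps.append("check the handling result")
            break
        # Assumption: after obtaining help, rectification is tried again.
        steps.append("obtain help")
    # Records are kept whether or not the services recovered.
    steps.append("record information about emergency maintenance")
    return steps


if __name__ == "__main__":
    # Example: services recover on the second rectification attempt.
    for step in emergency_flow(lambda attempt: attempt >= 2):
        print(step)
```

The attempt limit only keeps the sketch from looping forever; in practice the decision to stop retrying belongs to the maintenance personnel and Huawei support.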

    1.5.1 Notifying Huawei of the Emergency

    When an emergency occurs, contact Huawei immediately for technical support.

    NOTE

    Even if you can complete emergency maintenance independently with the guidance of this manual, notify Huawei of the emergency. Huawei technical personnel then keep records of the fault to improve after-sales services.

    1.5.2 Locating the Fault

    When an emergency occurs, identify the nature of the fault by referring to the customer's complaint and the alarm information. An emergency can be any of the following types:

    - Abnormal system: All services are interrupted.
    - Abnormal Switch Routing Unit (SRU) or Main Control Unit (MCU): All services are interrupted.
    - Abnormal service card: Some services are interrupted.
    - Abnormal service module: Some services are interrupted.
    - Abnormal network: Network services are interrupted.

    1.5.3 Collecting Fault Information

    When an emergency occurs, collect and back up information about the fault promptly and provide it to Huawei engineers when seeking technical support.

    For details about fault information collection, see Guide to Fault Information Collection.

    Recording Basic Fault Information

    Collect the following basic information:

    - Specific time when the fault occurs
    - Detailed description of the fault
    - Software version of the S9300
    - Measures taken after the fault and the results
    - Severity level of the problem and expected time of system recovery

    Backing Up Device Fault Information

    Back up the following information:

    - Indicator status of the boards, power modules, and fans
    - Device alarms
    - Device logs
    - Device configuration
    - Device debugging information if the debugging is enabled
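
The two checklists above can be tracked with a small record structure, sketched here in Python. The class name FaultRecord, its field names, and the missing_items helper are hypothetical names chosen for illustration; they are not part of any Huawei tool. The sketch treats debugging information as required only when debugging is enabled, as stated in the list.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FaultRecord:
    # Basic fault information
    occurred_at: str = ""
    description: str = ""
    software_version: str = ""
    measures_and_results: str = ""
    severity_and_expected_recovery: str = ""
    # Device fault information to back up
    indicator_status: str = ""
    device_alarms: str = ""
    device_logs: str = ""
    device_configuration: str = ""
    debugging_info: str = ""  # needed only if debugging is enabled

    def missing_items(self, debugging_enabled: bool = False) -> List[str]:
        """List the checklist items that are still empty."""
        optional = set() if debugging_enabled else {"debugging_info"}
        return [
            f.name
            for f in fields(self)
            if f.name not in optional and not getattr(self, f.name).strip()
        ]


if __name__ == "__main__":
    record = FaultRecord(occurred_at="2009-04-15 10:00",
                         description="All services interrupted")
    print(record.missing_items())
```

Checking missing_items() before contacting technical support gives a quick view of what still has to be collected.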

    1.5.4 Rectifying the Fault

    Determine whether the fault is a hardware fault, such as a device breakdown, or a software fault, such as a service failure, by following the flowchart shown in Figure 1-2 and the identifying methods listed in Table 1-1.

    Figure 1-2 Flowchart for identifying the type of a fault

    [Flowchart not reproduced. Decision nodes, each with Yes and No branches: Can log in through the console interface? System starts normally? Board status is normal? Interface status is normal? Outcomes: A device fault occurs; A service fault occurs.]
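
The four checks in Figure 1-2 can be read as a short decision chain, sketched below in Python. The function name identify_fault_type and its parameters are hypothetical. The sketch assumes that a "No" at any check indicates a device fault and that passing all four checks indicates a service fault; this reading matches the device fault topics listed for chapter 2 in the contents, but the arrows of the original flowchart are not available here to confirm it.

```python
from typing import Optional, Tuple


def identify_fault_type(console_login_ok: bool,
                        system_started_ok: bool,
                        board_status_ok: bool,
                        interface_status_ok: bool) -> Tuple[str, Optional[str]]:
    """Classify a fault from the four checks, evaluated in flowchart order.

    Returns the fault type and the first failed check (None if all pass).
    """
    checks = [
        ("login through the console interface", console_login_ok),
        ("system startup", system_started_ok),
        ("board status", board_status_ok),
        ("interface status", interface_status_ok),
    ]
    for name, ok in checks:
        if not ok:
            # Assumption: the first failed check points to a device fault.
            return ("device fault", name)
    return ("service fault", None)


if __name__ == "__main__":
    print(identify_fault_type(True, True, False, True))
```

Returning the first failed check makes it easy to jump to the matching procedure for device faults.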

    Table 1-1 Methods of identifying the fault type

    Item                  Identifying Method
    Login through the     Connect the COM port of the PC or terminal to the console interface of the
    console interface     S9300 with a standard RS-232 configuration cable and set the relevant
                          parameters correctly on the terminal. For details, refer to the Quidway
                          S9300 Terabit Routing Switch Configuration Guide - Basic Configurations.
                          Check that the terminal displays normally, for example, that the command
                          prompt is available on the terminal.
    System startup        Check whether the system starts normally. If the command prompt is
                          displayed, the system has started normally.

    Item: Board status
    Identifying method: Run the display device command on the terminal to check whether the status of all boards is Normal. In the case of a local fault, check the status of the service board connected to the user who reports the fault. For example:

        display device
        S9312's Device status:
        Slot  Sub  Type  Online   Power    Register    Alarm   Primary
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        9     -    LPU   Present  PowerOn  Registered  Normal  NA
        13    -    SRU   Present  PowerOn  Registered  Normal  Master

    Item: Interface status
    Identifying method: Run the display interface command on the terminal to check whether the status of the interface connected to the user who reports the fault is Up and whether more packets are transmitted and received on the interface during a specified period. For example:

        display interface GigabitEthernet 1/0/12
        GigabitEthernet1/0/12 current state : UP
        Description:HUAWEI, Quidway Series, GigabitEthernet1/0/12 Interface
        Switch Port,PVID : 1,The Maximum Frame Length is 1526
        IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 0018-2000-0083
        Speed : 1000, Loopback: NONE
        Duplex: FULL, Negotiation: ENABLE
        Mdi : NORMAL
        Last 300 seconds input rate 0 bits/sec, 0 packets/sec
        Last 300 seconds output rate 616 bits/sec, 0 packets/sec
        Input: 0 packets, 0 bytes
        Unicast: 0, NUnicast: 0
        Discard: 0, Error : 0
        Jumbo : 0
        Output: 191636 packets, 18992248 bytes
        Unicast: 12, NUnicast: 191624
        Discard: 19, Error : 0
        Jumbo : 0
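    When many boards are installed, the board-status check can be automated on captured output. The Python sketch below assumes the column layout shown in the display device example (Slot, Sub, Type, Online, Power, Register, Alarm, Primary); other software versions may print a different layout, and the function name is hypothetical.

```python
def abnormal_boards(display_device_output):
    """Return (slot, type, register, alarm) for boards not Registered/Normal."""
    problems = []
    for line in display_device_output.splitlines():
        fields = line.split()
        # Board rows start with a numeric slot ID and have at least 7 columns:
        # Slot Sub Type Online Power Register Alarm [Primary]
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        slot, _sub, board_type, _online, _power, register, alarm = fields[:7]
        if register != "Registered" or alarm != "Normal":
            problems.append((int(slot), board_type, register, alarm))
    return problems


SAMPLE = """\
S9312's Device status:
Slot Sub Type Online Power Register Alarm Primary
- - - - - - - - - - - - - - - - - - - - - - - - -
9 - LPU Present PowerOn Registered Normal NA
13 - SRU Present PowerOn Registered Normal Master
"""
```

    An empty result for the captured text means that every listed board is registered and reports no alarm.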

    After you identify the fault type, see 2 Emergency Maintenance for Device Faults and 3

    Emergency Maintenance for Service Faults to proceed with emergency maintenance.
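    The identification logic reduces to four yes/no checks made in a fixed order. As a minimal sketch (the function name and return format are hypothetical), the first failed check selects the device-fault procedure, and passing all four points to the service-fault chapter:

```python
def classify_fault(can_log_in, system_starts, boards_normal, interfaces_normal):
    """Map the four checks to a fault type and the section that handles it."""
    checks = [
        (can_log_in,
         "2.3.1 Failure to Log In to a System Through the Console Interface"),
        (system_starts, "2.3.2 Failure to Start a System"),
        (boards_normal, "2.3.3 Abnormality of the Board Status"),
        (interfaces_normal, "2.3.4 Abnormality of the Interface Status"),
    ]
    # The checks are evaluated in order; the first failure decides the result.
    for passed, section in checks:
        if not passed:
            return ("device fault", section)
    return ("service fault", "3 Emergency Maintenance for Service Faults")
```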

    1.5.5 Obtaining Help

    Obtain Huawei technical support according to the contact information given in Technical

    Support.

    1.5.6 Checking the Handling Result

    After services resume, check the device status, board indicators, and alarms to confirm that the system runs normally. Perform a dialing test to verify that services are normal. For detailed operations, refer to the Quidway S9300 Terabit Routing Switch Routine Maintenance.

    It is recommended that technical personnel monitor the system during peak service hours so that further problems can be handled immediately.

    1.5.7 Recording Information About Emergency Maintenance

    Record the following information about emergency maintenance for a further analysis:

    l Time of emergency maintenance

    l Version information

    l Fault symptom

    l Handling procedure and result

    For the format of an information record table, refer to Appendix A Emergency Maintenance

    Record Table.

    Record the output information during emergency maintenance by using the Capture Text function of HyperTerminal or the equivalent function of another Telnet terminal.
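    If the terminal program cannot capture text, a small script can keep the same kind of record. The following Python sketch is an assumption-laden illustration, not a Huawei tool: log_line is a hypothetical helper that appends each line of session output to a file with a timestamp.

```python
from datetime import datetime


def log_line(path, text, now=None):
    """Append one line of terminal output to a log file with a timestamp."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(f"{stamp} {text}\n")
```

    Timestamps make it easier to fill in the time and handling procedure fields of the record table afterwards.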

    1.6 Emergency Maintenance Precautions

    This section describes the precautions for emergency maintenance.

    To ensure the security of the device and safety of the operators, comply with the following

    guidelines.

    Static Electricity

    Wear an ESD wrist strap before operating a board or the backplane, and follow these rules:

    l When you replace a board:

      - If the board to be replaced is an active SRU/MCU, perform an active/standby switchover first, and then remove the board. A standby SRU/MCU can be removed without an active/standby switchover.

      - If the board to be replaced is a standby SRU/MCU, an LPU, or a CMU, run the power off slot slot-id command to power off the board, and then remove the board.

    l Always keep a board in its antistatic bag until you install it.

    l Always place a removed board in an antistatic bag.

    Laser/LED

    When you maintain a device with an optical module or optical interface, follow these rules:

    l Do not look straight into an optical fiber that is emitting light when you install and maintain the optical fiber.

    l Do not look straight into the connector of an optical fiber that is emitting light when you replace a pluggable optical module.

    l Only qualified personnel who have been trained may operate optical modules and optical fibers.

    CAUTION

    When you install and maintain the optical fiber, keep the connector of the optical fiber clean,

    unfolded, and straight.

    1.7 Technical Support

    This section describes how to seek Huawei technical support.

    If a fault persists after the maintenance personnel perform emergency maintenance according to the flowchart, contact Huawei Technical Support Center or the local office for technical support.

    NOTE

    Huawei Technologies Co., Ltd. provides 24-hour technical support services.

    You can contact Huawei Technical Support Center at:

    l Telephone: +86-755-28560000

    l Fax: +86-755-28560111

    l Website: http://support.huawei.com

    l Email: [email protected]

    NOTE

    l For contact information about local offices, log in to http://support.huawei.com.

    l To make technical support personnel easy to reach, it is recommended that you prepare a phone directory and post it at the maintenance site. The phone directory can contain contact information about the superior maintenance personnel, Huawei engineers, transmission office maintenance personnel, and remote office maintenance personnel. Provide at least two contact methods for each person.

    The maintenance personnel need to keep a detailed record of the emergency maintenance procedures, notify Huawei of the type of the board to be replaced, and apply for a spare one according to the warranty articles, so that the fault can be rectified sooner. The fax can adopt the format of the Notice of Emergency Maintenance. For details, refer to Appendix A Emergency Maintenance Record Table.

    2 Emergency Maintenance for Device Faults

    About This Chapter

    This chapter describes the flowchart and directions of handling device faults.

    2.1 Overview

    This section describes the definition and types of device faults.

    2.2 Flow for Handling Device Faults

    This section describes the flowchart for handling device faults.

    2.3 Directions for Emergency Maintenance

    This section describes the flowchart and procedure for handling a device fault.

    2.1 Overview

    This section describes the definition and types of device faults.

    A device fault refers to a hardware failure of a device. To rectify a device fault, you must reset,

    repair, or replace the relevant hardware.

    While a device is running, you can determine that a device fault has occurred and initiate emergency maintenance in any of the following cases:

    l You fail to log in to the system through the console interface.

    l You fail to start the system.

    l The board status is abnormal.

    l The interface status is abnormal.

    2.2 Flow for Handling Device Faults

    This section describes the flowchart for handling device faults.

    The roadmap of the emergency maintenance for device faults is as follows:

    1. Check the status of the integrated system.

    2. Check the board status.

    3. Check the interface status.

    Figure 2-1 shows the flowchart for handling device faults.

    Figure 2-1 Flowchart for handling device faults

    [Flowchart: Start, then four checks in order. Can log in through the console interface? If No, handle the failed system login through the console interface. System starts normally? If No, handle the failed system start. Board status is normal? If No, handle the abnormality of the board status. Interface status is normal? If No, handle the abnormality of the interface status. If the answer to all four checks is Yes, proceed to the flow for handling service faults.]

    2.3 Directions for Emergency Maintenance

    This section describes the flowchart and procedure for handling a device fault.

    2.3.1 Failure to Log In to a System Through the Console Interface

    2.3.2 Failure to Start a System

    2.3.3 Abnormality of the Board Status

    2.3.4 Abnormality of the Interface Status

    2.3.1 Failure to Log In to a System Through the Console Interface

    Fault Description

    After the COM port of a PC or terminal is connected to the console interface of an S9300 with a standard RS-232 configuration cable and the relevant parameters are set, nothing is displayed on the terminal.

    Fault Information Collection

    If you are unable to log in to a system through the console interface, collect the following

    information besides that described in Guide to Fault Information Collection for future

    reference.

    Table 2-1 Collection of information about the failure to log in to a system through the console

    interface

    No. 1
    Collecting item: Communication parameters of the COM port
    Collecting method: Check the communication parameters of the COM port in the terminal program, such as Windows-based HyperTerminal, including the baud rate, data bits, parity check, stop bits, and flow control.

    No. 2
    Collecting item: Indicator status
    Collecting method: Check the status of the following indicators:

    l RUN, ALM, and ACT indicators of the MCU/SRU

    l RUN, ALM, and FAULT indicators of the power modules

    l Status indicators of the fans

    Handling Flowchart

    Figure 2-2 Flowchart for handling the failed login to a system through the console interface

    [Flowchart: Start, then four checks in order, each corrective action being followed by a Fault rectified? check that leads to End on Yes. Parameters of the COM interface are correct? If No, modify the parameters. Cable is in good condition? If No, replace the cable. Power module runs normally? If No, repair the power supply system. The SRU/MCU runs normally? If No, switch to the standby SRU/MCU or replace the SRU/MCU. If the fault persists, reset the system. If the fault is still not rectified, seek technical support.]

    CAUTION

    All the following steps can be performed only when the user services are already interrupted. If the user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.

    Procedure

    Step 1 Check and modify the parameters of the COM port.

    Check whether the parameters of the COM port are identical with those of the console interface

    on the S9300. If the parameters are not identical, modify the parameters of the COM port.

    By default, the console interface of the S9300 uses a baud rate of 9600 bps, 8 data bits, 1 stop bit, no parity check, and no flow control.

    NOTE

    If the parameters of the console interface have been modified, use the modified values.
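    The comparison in Step 1 can be written down as a simple table lookup. The sketch below is illustrative only: the dictionary keys and the function name are hypothetical, and the default values are the ones stated in this step. If the console interface has been reconfigured, pass the modified values as expected.

```python
# Default console settings of the S9300 as stated in Step 1.
S9300_CONSOLE_DEFAULTS = {
    "baud_rate": 9600,
    "data_bits": 8,
    "stop_bits": 1,
    "parity": "none",
    "flow_control": "none",
}


def console_mismatches(terminal_settings, expected=S9300_CONSOLE_DEFAULTS):
    """Return {parameter: (terminal_value, expected_value)} for each mismatch."""
    return {
        name: (terminal_settings.get(name), want)
        for name, want in expected.items()
        if terminal_settings.get(name) != want
    }
```

    An empty result means that the terminal settings match, and the next step is to check the cable.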

    Step 2 Check and replace the cable.

    If the parameters of the COM port are correct, check whether the cable is in good condition.

    You can replace the cable with a new one to check whether you can then log in normally.

    Step 3 Check and repair the power supply system.

    If the indicators of all the boards are off and all the fans have stopped (which you can determine by listening for fan rotation), or if the ALM indicator of the power module is on, the power supply system of the device is possibly faulty and needs repair.

    The power supply system consists of the following:

    l Power supply system of the equipment room, chassis, or cabinet

    l Power module

    l Power supply system of the backplane

    You can check the power supply system as follows:

    l Check that the power module is switched on. When there are multiple power modules, ensure

    that at least one works normally.

    l Check whether the ALM indicator of the power module is on. If so, it indicates that the power

    module is faulty. You can replace the power module to solve the problem.

    l If the preceding checks find no problem but the power supply system still fails to work, seek Huawei technical support as described in Technical Support.
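    The three power checks above can be expressed as a short decision routine. This Python sketch is a hedged illustration: the function name and the input format (one dictionary per power module with switched_on and alm_on flags) are assumptions made for the example.

```python
def power_actions(modules, system_powered):
    """Suggest actions from power-module state.

    modules: list of dicts with boolean keys 'switched_on' and 'alm_on'.
    system_powered: True if the boards and fans are receiving power.
    """
    actions = []
    for index, module in enumerate(modules, start=1):
        if not module["switched_on"]:
            actions.append(f"switch on power module {index}")
        elif module["alm_on"]:
            # A lit ALM indicator means that the power module is faulty.
            actions.append(f"replace power module {index}")
    if not actions and not system_powered:
        # The checks found nothing, but the power supply still does not work.
        actions.append("seek Huawei technical support")
    return actions
```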

    Step 4 Switch to the standby MCU/SRU or replace the MCU/SRU.

    After you confirm that the parameters of the COM port are correctly set, the cable is in good

    condition, and the power supply system works normally, the MCU/SRU is possibly faulty. When

    there are active and standby MCUs/SRUs, you can connect the configuration cable to the standby

    MCU/SRU; when there is only one MCU/SRU, you can replace it with a spare one.

    Step 5 Reset the system.

    If the fault persists after you perform the preceding steps, you can reset the system. To do so, switch off the power module, wait three minutes, and then switch it on again.

    Step 6 Seek technical support.

    To seek Huawei technical support, see Technical Support.

    ----End

    2.3.2 Failure to Start a System

    Fault Description

    The system fails to start, and the terminal shows one or more of the following symptoms:

    l The terminal displays a message indicating that initialization fails.

    l The terminal stops at the file decompression state for a long period.

    l The system restarts continuously.

    Fault Information Collection

    If you are unable to start a system, collect the following information besides the generic

    information described in Guide to Fault Information Collection for future reference.

    Table 2-2 Collection of information about the failure to start a system

    No. 1
    Collecting item: Information about system startup
    Collecting method: Use the Capture Text function of the HyperTerminal or the related functions of other Telnet terminals to record information about system startup through a COM port or Telnet terminal.

    No. 2
    Collecting item: Name of the startup file
    Collecting method: Check the name of the startup file through the Basic Input/Output System (BIOS) menu.

    Handling Flowchart

    Figure 2-3 Flowchart for handling the failed system start

    [Flowchart: Start. Decision nodes, in order: The CF card self-test fails? The module self-test fails? File is incorrectly decompressed? System continuously restarts? Action nodes: remove and reinsert the CF card, replace the CF card, debug or replace the SRU/MCU, re-upload the startup files through BIOS, make the startup files of the active and standby SRU/MCU identical, and seek technical support. Each corrective action is followed by a Fault rectified? check that leads to End on Yes.]

    CAUTION

    All the following steps can be performed only when the user services are already interrupted. If

    the user services are not interrupted, collect fault information and provide feedback to Huawei

    engineers for further processing.

    Procedure

    Step 1 Remove and insert the CF card.

    If the "CF Card Init.....FAIL!" message is displayed, the CF card may be loose. You can try the following operations to solve the problem:

    1. Remove the MCU/SRU.

    2. Remove the CF card, and then insert it.

    3. Re-insert the MCU/SRU.

    Step 2 Replace the CF card.

    If the fault cannot be rectified after the CF card is re-inserted, you need to replace the CF card.

    Step 3 Replace the MCU/SRU.

    If the system prompts "Initializing module IPC_VP_CHANNEL.................FAIL!", or the memory self-test still fails after you perform Steps 1 and 2, the MCU/SRU is possibly faulty. You can try to replace the MCU/SRU. When there is only one MCU/SRU, you can replace it with a spare one.

    Step 4 Upload the startup file through BIOS again.

    When the system stops at the phase of file decompression or continuously restarts, the startup

    file is possibly incorrect or damaged. You can try to upload the startup file through BIOS.

    It is complicated to upload the startup file through BIOS. Contact Huawei engineers and perform

    the uploading with their guidance. For the procedures, see System Upgrading Through

    BIOS.

    Step 5 Seek technical support.

    To seek Huawei technical support, see Technical Support.

    ----End
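    The two startup messages quoted in this procedure can be searched for in a captured startup log. The Python sketch below is illustrative: the function and rule names are hypothetical, and only the two message texts given in Steps 1 and 3 are matched. A stall at file decompression or a continuous restart has no quoted message text, so those symptoms must still be judged by observation.

```python
import re

# (pattern, suggested step) pairs built from the messages quoted in the procedure.
# The number of dots may vary, so any run of dots is accepted.
RULES = [
    (re.compile(r"CF Card Init\.*FAIL"),
     "Step 1: remove and reinsert the CF card; Step 2: replace it if the fault persists"),
    (re.compile(r"Initializing module IPC_VP_CHANNEL\.*FAIL"),
     "Step 3: replace the MCU/SRU"),
]


def suggest_step(console_text):
    """Return the suggested step for a known failure message, else None."""
    for pattern, step in RULES:
        if pattern.search(console_text):
            return step
    return None
```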

    2.3.3 Abnormality of the Board Status

    Fault Description

    The abnormality of the board status can be determined in one or more of the following cases:

    l When you run the display device command to view information about a board, the board status is Abnormal.

    l When you run the display device command to view information about a board, the board

    status is Unregistered.

    l The RUN/ALM indicator of a board blinks at a frequency of 2 Hz or the red indicator is

    on.

    l A board continuously restarts.

    Fault Information Collection

    For the abnormality of the board status, collect the following information besides the generic information described in Guide to Fault Information Collection for future reference.

    Table 2-3 Collection of information about the abnormality of the board status

    No. 1
    Collecting item: Indicator status of a board
    Collecting method: Check whether the indicator of a board is off, is on, blinks at a frequency of 2 Hz, or blinks at a frequency of 1 Hz.

    No. 2
    Collecting item: Detailed information about a board
    Collecting method: Check detailed information about a board by using the display device slot-id command.

    Handling Flowchart

    Figure 2-4 Flowchart for handling the abnormality of the board status

    [Flowchart: Start. Reset the board. Fault rectified? If Yes, End. If No, replace the board. Fault rectified? If Yes, End. If No, cut over the services on the board and seek technical support.]

    CAUTION

    All the following steps can be performed only when the user services are already interrupted. If

    the user services are not interrupted, collect fault information and provide feedback to Huawei

    engineers for further processing.

    Procedure

    Step 1 Reset the board.

    It is complicated to handle the abnormality of the board status. In an emergency situation, it is

    recommended to solve the problem by resetting or replacing the board. For other maintenance

    measures, such as fault location, contact Huawei engineers.

    You can reset a board by running the reset slot command, pressing the RESET button on the panel, or removing and reinserting the board.

    Step 2 Replace the board.

    When resetting the board fails to solve the problem, you can try to replace the board with a spare

    one.

    Step 3 Cut over the services on the board and seek technical support.

    If the fault persists after you perform Steps 1 and 2, you can cut over the services on the faulty board to a board that is running normally or in an idle slot. For the cutover operations, contact Huawei engineers or perform the cutover according to the cutover scheme of the customer.

    In addition, provide fault information to the local office for technical support.

    ----End
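    The procedure follows a common pattern: apply the least disruptive remedy first, check whether the fault is rectified, and escalate only if it is not. A minimal Python sketch of that pattern follows; the function name and the callables passed to it are hypothetical placeholders for the manual operations.

```python
def first_successful_remedy(remedies, fault_rectified):
    """Apply remedies in order and stop as soon as the fault is rectified.

    remedies: list of (name, action) pairs, where action is a callable.
    fault_rectified: callable that returns True once the fault is gone.
    Returns the name of the remedy that worked, or None to signal escalation
    (cut over the services and seek technical support).
    """
    for name, action in remedies:
        action()
        if fault_rectified():
            return name
    return None
```

    For a faulty board, the remedies would be "reset the board" followed by "replace the board".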

    2.3.4 Abnormality of the Interface Status

    Fault Description

    The abnormality of the interface status can be determined in one or more of the following cases:

    l When you run the display interface command to view the status of an interface, the

    interface status is DOWN.

    l When you run the display interface command to view the status of an interface, the number of the sent and received packets on the interface remains the same.

    l The indicator status of an interface is abnormal. For example, the LINK indicator of the

    interface is off.
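    To judge whether the packet counters remain the same, compare two captures of the display interface output taken some time apart. The Python sketch below assumes that the output contains "Input: N packets" and "Output: N packets" lines, as in the display interface example in Table 1-1; the function names are hypothetical.

```python
import re


def packet_counters(display_interface_output):
    """Extract the (input, output) packet totals from display interface text."""
    received = re.search(r"^\s*Input:\s*(\d+) packets",
                         display_interface_output, re.MULTILINE)
    sent = re.search(r"^\s*Output:\s*(\d+) packets",
                     display_interface_output, re.MULTILINE)
    if not received or not sent:
        raise ValueError("packet counters not found in the output")
    return int(received.group(1)), int(sent.group(1))


def counters_stalled(earlier_output, later_output):
    """Return True if neither packet counter changed between two captures."""
    return packet_counters(earlier_output) == packet_counters(later_output)
```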

    Fault Information Collection

    For the abnormality of the interface status, collect the following information besides the generic

    information described in Guide to Fault Information Collection for future reference.

    Table 2-4 Collection of information about the abnormality of the interface status

    No. 1
    Collecting item: Indicator status of an interface
    Collecting method: Check whether the indicator status of an interface is off, is on, blinks at a frequency of 2 Hz, or blinks at a frequency of 1 Hz.

    No. 2
    Collecting item: Detailed information about an interface
    Collecting method: Collect detailed information about an interface by using the display interface interface-type interface-number command.

    No. 3
    Collecting item: Brief IP-related information about an interface
    Collecting method: Collect brief IP-related information about an interface by using the display ip interface brief command.

    No. 4
    Collecting item: Brief information about all interfaces
    Collecting method: Collect brief information about all interfaces by using the display interface brief command.

    Handling Flowchart

    Figure 2-5 Flowchart for handling the abnormality of the interface status

    [Flowchart: Start. Decision nodes: Status of interface indicator normal? Interface status is Up? Packets are sent and received normally? Is manually shut down? Is the status normal? Action nodes: start the interface, detect the link, perform a local loopback test, check and modify the configuration of the data link layer or the upper layer protocol, reset the interface, cut over the services on the board and seek technical support, and proceed to the flow for handling service faults. Corrective actions are followed by a Fault rectified? check that leads to End on Yes.]

    CAUTION

    All the following steps can be performed only when the user services are already interrupted. If the user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.

    Procedure

    Step 1 Start the interface.

    If the configuration shows that an interface has been shut down with the shutdown command, run the undo shutdown command in the interface view to start it.

    Step 2 Detect the link.

    Before detecting a link, check whether the LINK indicator of the interface is on.

    If so, it indicates that the physical link is Up and you can detect the link as follows:

    1. Check that the interface parameters at both ends of the link are identical, such as the duplex

    mode and rate.

    2. When the interfaces are optical ones, check whether the receiving and sending optical

    powers at both ends are normal by using the optical power meter. When you find that either

    end only sends or receives data, the optical module is possibly faulty or the optical fiber

    possibly fails to match the optical module. Then you can try to replace the optical module

    or the optical fiber.

    DANGER

    Do not look straight into an optical fiber that is emitting light when you check the receiving and sending optical powers. You must use the optical power meter to measure the optical power.

    When the LINK indicator of the interface is off, you can check the link as follows:

    1. Perform a physical loopback test on the device. That is, connect the faulty interface to

    another interface that is in the normal state with an optical fiber or cable in good condition.

    2. When the LINK indicator is on, it indicates that the interface runs normally. You need to check whether the optical fiber or the cable is damaged and whether the trunk link runs normally. In this case, the neighboring office is required to cooperate.

    3. If the LINK indicator is off, it indicates that the interface hardware is faulty. When a

    pluggable optical module is used, you can replace the optical module; otherwise, you can

    cut over the services from the faulty interface to another interface that runs normally.
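    The link detection in this step branches on the LINK indicator before and during the physical loopback test. The following Python sketch summarizes those branches; the function name, parameter names, and returned wording are hypothetical and only restate the text of this step.

```python
def link_check_action(link_on_initially, link_on_in_loopback=None,
                      pluggable_optical_module=False):
    """Return the next action according to the state of the LINK indicator.

    link_on_in_loopback is None until the physical loopback test has been run.
    """
    if link_on_initially:
        return ("compare the duplex mode and rate at both ends; for optical "
                "interfaces, measure the optical power with a power meter")
    if link_on_in_loopback is None:
        return ("perform a physical loopback test with an optical fiber or "
                "cable in good condition")
    if link_on_in_loopback:
        return ("interface hardware is normal: check the optical fiber or "
                "cable and the trunk link with the neighboring office")
    if pluggable_optical_module:
        return "replace the optical module"
    return "cut over the services to another interface that runs normally"
```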

    Step 3 Perform a local loopback test.

    When the interface status is Up, but the number of sent and received packets on the interface remains the same during a long period, it indicates that the interface neither receives nor sends any packets. Then you can run the loopback local command on the interface to perform a local loopback test, and use the ping command to test data sending and receiving and to view the change in the number of sent and received packets.

    NOTE

    After the local loopback test is complete, run the undo loopback command to disable the local loopback immediately.
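    If the loopback test is scripted, the requirement to disable the loopback afterwards can be enforced in code. The Python sketch below is an illustration under assumptions: send is a hypothetical callable that delivers one command to the device in the interface view, and only the loopback local and undo loopback commands named in this step are issued.

```python
from contextlib import contextmanager


@contextmanager
def local_loopback(send):
    """Enable local loopback and always disable it again, even on error."""
    send("loopback local")
    try:
        yield
    finally:
        # Runs whether the test body finishes normally or raises an exception.
        send("undo loopback")
```

    The ping test is placed inside the with block, so the loopback is removed even if the session fails midway.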

    Step 4 Check and modify the configurations of the data link layer or the upper layer protocols.

    If the interface still fails to send and receive packets in the local loopback test, check the configuration of the data link layer or the upper layer protocols. For example, check that the configurations of the Point-to-Point Protocol (PPP) or the High-level Data Link Control (HDLC) protocol at both ends are identical and that the routing protocols run normally.

    Step 5 Reset the interface.

    After you perform the preceding steps, you can reset the interface if the fault persists.

    To reset an interface, run the shutdown and undo shutdown commands.

    Step 6 Contact Huawei technical support personnel.

    To seek Huawei technical support, see 1.7 Technical Support.

    ----End

    3 Emergency Maintenance for Service Faults

    About This Chapter

    This chapter describes the flowchart and directions for handling service faults.

    3.1 Overview

    This section describes the definition and types of service faults.

    3.2 Flow for Handling Service Faults

    This section describes the flowchart for handling service faults.

    3.3 Guide to Emergency Maintenance

    This section describes the flowchart and procedure for handling a service fault.


    3.1 Overview

    This section describes the definition and types of service faults.

    A service fault is partial or global service congestion caused by a software or network fault. You can handle a service fault by modifying the service configuration, resetting service modules, or restoring network connections.

    NOTE

    Generally, a hardware fault may result in service interruption. For the handling of a device fault, see

    Emergency Maintenance for Device Faults.

    This chapter describes the emergency maintenance for service faults, focusing on fault clearance and prompt service recovery rather than fault rectification. To locate, handle, and rectify common service faults, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting.

    For the S9300, common emergent service faults fall into the following types:

    • Failure to forward IP unicast packets

    • Failure to forward IP multicast packets

    • Failure to forward MPLS VPN packets

    NOTE

    MPLS VPN = Multi-Protocol Label Switching Virtual Private Network

    3.2 Flow for Handling Service Faults

    This section describes the flowchart for handling service faults.

    Figure 3-1 shows the flowchart for handling service faults.


    Figure 3-1 Flowchart for handling service faults

    [Flowchart. From Start, the scope of the fault is checked in this order: all users, users on a certain board, users on a certain interface, users of a certain type, single users. For each of the first four checks, Yes leads to "Handle the fault" and No leads to the next check. If the fault involves only single users, the flow proceeds to the troubleshooting flow. The flow finishes at End.]

    NOTE

    For a fault that affects only a single user, you do not need to initiate emergency maintenance. For the common fault handling flowchart, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting.
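
    The decision sequence of Figure 3-1 can be sketched in Python as follows. This is only an illustration of the triage rule: the FaultScope names and the triage function are hypothetical labels introduced here and are not part of any Huawei tool.

```python
from enum import Enum


class FaultScope(Enum):
    """Scope of a service fault, in the order Figure 3-1 checks it (hypothetical names)."""
    ALL_USERS = 1
    BOARD = 2
    INTERFACE = 3
    USER_TYPE = 4
    SINGLE_USER = 5


def triage(scope: FaultScope) -> str:
    """Return the action that Figure 3-1 suggests for the given fault scope."""
    if scope is FaultScope.SINGLE_USER:
        # A fault that affects a single user does not warrant emergency maintenance.
        return "proceed to the troubleshooting flow"
    return "handle the fault (emergency maintenance)"
```

    For example, triage(FaultScope.BOARD) selects emergency maintenance, whereas triage(FaultScope.SINGLE_USER) points to the ordinary troubleshooting flow.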

    3.3 Guide to Emergency Maintenance

    This section describes the flowchart and procedure for handling a service fault.

    3.3.1 Failure to Forward IP Unicast Packets

    3.3.2 Failure to Forward IP Multicast Packets

    3.3.3 Failure to Forward MPLS VPN Packets


    3.3.1 Failure to Forward IP Unicast Packets

    Fault Description

    Certain unicast packets on a network cannot be forwarded.

    Fault Information Collection

    If IP unicast packets cannot be forwarded, collect the following information besides the generic

    information described in Guide to Fault Information Collection for fault location and future

    reference.

    Table 3-1 Collection of information about the failure to forward IP unicast packets

    No. | Collecting Item | Collecting Method
    1 | Information about FIB entries | display fib
    2 | Information about certain FIB entries on a board in a specified slot | display fib [ vpn-instance vpn-instance-name ] [ | { begin | exclude | include } regular-expression ]
    3 | Information about ARP entries on a specified interface | display arp interface
    4 | Information about public BGP routes | display bgp routing-table
    5 | Information about the routes advertised to and imported from a specified peer | display bgp routing-table peer ip-address { advertised-routes | received-routes } [ statistics ]
    6 | Information about the establishment of all BGP peers | display bgp peer
    7 | Information about the interface enabled with IS-IS | display isis interface [ verbose ]
    8 | Information about IS-IS LSDBs | display isis lsdb
    9 | Configuration of mesh-groups | display isis mesh-group
    10 | Information about the IS-IS neighbor relationships set up with the local end | display isis peer [ verbose ]
    11 | Information about the IS-IS routing table | display isis route
    12 | Logs of IS-IS routing calculation | display isis spf-log
    13 | OSPF errors | display ospf error
    14 | Information about the interface enabled with OSPF | display ospf interface
    15 | Information about OSPF peers | display ospf peer
    16 | Information about OSPF LSDBs | display ospf lsdb
    17 | Information about the OSPF routing table | display ospf routing
    18 | Running status and configuration of RIP processes | display rip
    19 | Information about all the activated routes of the RIP database | display rip database
    20 | Information about the interface enabled with RIP | display rip interface
    21 | Information about RIP neighbors | display rip neighbor

    NOTE

    FIB = Forwarding Information Base; ARP = Address Resolution Protocol; BGP = Border Gateway Protocol; IS-IS = Intermediate System to Intermediate System; LSDB = Link State Database; OSPF = Open Shortest Path First; RIP = Routing Information Protocol
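
    When only some of the routing protocols run on the device, the collection list in Table 3-1 can be narrowed accordingly. The following Python sketch shows one way to do that. The command strings are taken from Table 3-1 with optional arguments omitted and word spacing restored; the grouping keys and the commands_for function are assumptions made for this illustration, and the exact command syntax should be checked against the command reference.

```python
# Command strings from Table 3-1; the grouping keys are hypothetical.
UNICAST_COLLECTION = {
    "common": ["display fib", "display arp interface"],
    "bgp": ["display bgp routing-table", "display bgp peer"],
    "isis": ["display isis interface", "display isis lsdb", "display isis mesh-group",
             "display isis peer", "display isis route", "display isis spf-log"],
    "ospf": ["display ospf error", "display ospf interface", "display ospf peer",
             "display ospf lsdb", "display ospf routing"],
    "rip": ["display rip", "display rip database", "display rip interface",
            "display rip neighbor"],
}


def commands_for(protocols):
    """Return the common commands followed by those of each listed protocol."""
    commands = list(UNICAST_COLLECTION["common"])
    for protocol in protocols:
        # An unknown protocol name raises KeyError, so typos are not silently ignored.
        commands.extend(UNICAST_COLLECTION[protocol.lower()])
    return commands
```

    For a device that runs only OSPF, commands_for(["ospf"]) returns the two common commands followed by the five OSPF commands.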


    Handling Flowchart

    Figure 3-2 Flowchart for handling the failure to forward IP unicast packets

    [Flowchart. Decision nodes: "Can receive upstream packets?", "Can forward packets?", "Routing entries are correct?", "Forwarding entries are correct?". Action nodes: "Recover the uplink", "Recover the downlink", "Reset the routing protocol", "Refresh the FIB entries", "Reset the system", "Seek technical support". Each action is followed by a "Fault rectified?" check; the flow runs from Start to End.]

    CAUTION

    Perform the following steps only when user services are already interrupted. If user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.


    Procedure

    Step 1 Check and recover the uplink.

    When some unicast packets fail to be forwarded, check whether the S9300 can receive upstream packets. You can run the display interface command to check whether the number of received packets on the device changes. If the device cannot receive any upstream packets, perform the following:

    1. Check whether the status of the upstream interface on the S9300 is normal. For details, see

    Abnormality of the Interface Status.

    2. If the status of the upstream interface is normal, ping the peer interface of the upstream interface. If the ping succeeds, you can assume that the fault is on the upstream device. To recover the system, contact the site office where the upstream device resides.

    3. If the ping fails, check the link connecting the interface on the S9300 to the upstream device. For example, check that the cable is correctly connected, that the optical module and optical power are normal, that the relay agent works normally, and that the IP address is correct.

    4. If the fault persists after you perform the preceding steps, contact Huawei for technical support. For details, see Technical Support.
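
    The uplink checks above can be summarized in a short Python sketch. It assumes that the received-packet counter has been read twice from the display interface output and that the interface status and ping result are already known; the function names and return strings are hypothetical and only mirror the wording of this step.

```python
def receiving_packets(count_before: int, count_after: int) -> bool:
    """Return True if the received-packet counter advanced between two samples."""
    return count_after > count_before


def diagnose_uplink(interface_normal: bool, ping_peer_ok: bool) -> str:
    """Map the results of sub-steps 1 to 3 to the next action."""
    if not interface_normal:
        return "handle the abnormal interface status"
    if ping_peer_ok:
        return "suspect the upstream device; contact its site office"
    return "check the link: cable, optical module and power, relay agent, IP address"
```

    The same logic applies to the downlink check, with the downstream interface in place of the upstream interface.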

    Step 2 Check and recover the downlink.

    When the S9300 can receive incoming packets but cannot send packets, check the connection and communication between the S9300 and the downstream device as follows:

    1. Check whether the status of the downstream interface on the S9300 is normal. For details,

    see Abnormality of the Interface Status.

    2. If the status of the downstream interface is normal, ping the peer interface of the downstream interface. If the ping succeeds, you can assume that the fault is on the downstream device. To recover the system, contact the site office where the downstream device resides.

    3. If the ping fails, check the link connecting the interface on the S9300 to the downstream device. For example, check that the cable is correctly connected, that the optical module and optical power are normal, that the relay agent works normally, and that the IP address is correct.

    4. If the link is in good condition, the communication between the S9300 and the downstream device is possibly abnormal. Check the configuration, such as routing, as described in the following step.

    Step 3 Check and restore the routing entries.

    If the S9300 fails to communicate with its downstream device, the routing entries are possibly incorrect. You can try to check and restore the routing entries as follows:

    1. Check whether a route to the downstream device exists in the routing table of the S9300.

    If the route does not exist, add a static route, and then check whether the ARP entries on

    the downstream device can be learned.

    2. When the ARP entries on the downstream device cannot be learned, you can add static

    ARP entries.

    3. If there is still no route to the downstream device in the routing table of the S9300, the

    routing table is possibly oversized. You can try to delete unnecessary routing entries and

    update the routing table. Then check whether the S9300 learns the route.

    4. If a route to the downstream device exists, check that the routing entry is correct, including the routing protocol, subnet mask, preference, and hop count. Troubleshooting of IP routing is complicated and is not described here. For details, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting - IP Routing.

    5. If the fault persists after you perform the preceding steps, reset the relevant routing protocol.

    For example, reset all IS-IS connections through the reset isis all command.

    6. If resetting the relevant routing protocol is ineffective, proceed to the following step.
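
    Sub-step 1 above asks whether a route to the downstream device exists. As a minimal sketch, assuming the prefixes of the routing table have been copied into a list of strings, the following Python function returns the routes that cover a destination address, most specific first. The function name and the addresses in the usage note are made up for illustration.

```python
import ipaddress


def covering_routes(routes, destination):
    """Return the route prefixes that contain the destination, longest prefix first."""
    dest = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(route) for route in routes]
    # Keep only prefixes of the same IP version that contain the destination.
    matches = [net for net in networks if net.version == dest.version and dest in net]
    return sorted(matches, key=lambda net: net.prefixlen, reverse=True)
```

    With the routes 0.0.0.0/0, 10.1.0.0/16, and 10.1.2.0/24, the destination 10.1.2.5 is covered by all three, and 10.1.2.0/24 is listed first. An empty result means that no route to the destination exists in the list.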

    Step 4 Check and restore FIB entries.

    If the communication fails when the routing entries are normal, the FIB entries are possibly

    incorrect. You can run the display fib [ verbose ] command to check the FIB entries for their

    correctness. In the case of incorrect FIB entries, update the FIB entries and deliver them again.

    Step 5 Reset the system.

    Resetting the system is the last resort and the most effective solution for a software problem. If other users are not affected, you can reset the system to solve the problem.

    Before resetting the system with the reboot command, save the current configuration with the save command. If the fault affects only a small range of users, you can run the schedule reboot command to reset the system during off-peak hours, such as the early morning.

    NOTE

    If the system can be restarted through a software program, do not reset the system.

    Step 6 Seek technical support.

    To obtain Huawei technical support, see Technical Support.

    ----End
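
    The order of the procedure above can be expressed as a small lookup, sketched here in Python. The check names and the next_action function are hypothetical; the actions paraphrase Steps 1 to 6, and a check that is missing from the input is treated as failed.

```python
# (check name, action to take when the check fails), in procedure order.
UNICAST_FLOW = [
    ("can_receive_upstream", "recover the uplink"),
    ("can_forward", "recover the downlink"),
    ("routing_entries_correct", "restore the routing entries or reset the routing protocol"),
    ("fib_entries_correct", "refresh the FIB entries"),
]


def next_action(checks: dict) -> str:
    """Return the action for the first failed check, or the last-resort action."""
    for name, action in UNICAST_FLOW:
        if not checks.get(name, False):
            return action
    return "reset the system; if the fault persists, seek technical support"
```

    For example, if upstream packets are received but packets cannot be forwarded, next_action returns "recover the downlink".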

    3.3.2 Failure to Forward IP Multicast Packets

    Fault Description

    You can determine that a failure to forward IP multicast packets occurs in any of the following situations:

    • A multicast distribution tree (MDT) cannot be set up.

    • No multicast routing entry exists on the S9300 directly connected to the multicast source.

    • Clients fail to receive multicast data, which may be due to the incorrect configuration of the Internet Group Management Protocol (IGMP).

    • The Protocol Independent Multicast (PIM) routing table has no (S, G) entry.

    • The multicast data can reach intermediate S9300s but not the last-hop S9300.

    • Although an interface on an intermediate S9300 receives the multicast data, no corresponding (S, G) entry is created in the PIM routing table.

    • The static Rendezvous Point (RP) fails to communicate with the dynamic RP.

    • Mosaics are displayed in the multicast video image on clients.

    • The multicast video programs are out of sync on clients connected to different S9300s, although the programs play smoothly, without mosaics.


    Fault Information Collection

    If the IP multicast packets cannot be forwarded, collect the following information besides the

    generic information described in Guide to Fault Information Collection for future reference.

    NOTE

    • Before using a debugging command to collect debugging information, run the terminal debugging command to enable the debugging display on the terminal, and then run the terminal monitor command to enable the display on the terminal.

    • To make fault location easier, collect debugging information over an extended period.

    • After you collect the debugging information, run the undo debugging all command to disable all debugging immediately.
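
    The order of commands described in this note can be sketched as follows in Python. The debugging_session function is a hypothetical helper that only assembles the command sequence; it does not connect to a device. The point of the sketch is that undo debugging all always closes the sequence.

```python
def debugging_session(debug_commands):
    """Return the command sequence for one debugging capture.

    The display is enabled first, and debugging is always disabled at the end.
    """
    sequence = ["terminal debugging", "terminal monitor"]
    sequence.extend(debug_commands)
    sequence.append("undo debugging all")
    return sequence
```

    For example, debugging_session(["debugging pim all"]) yields terminal debugging, terminal monitor, debugging pim all, and undo debugging all, in that order.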

    Table 3-2 Collection of information about the failure to forward IP multicast packets

    No. | Collecting Item | Collecting Method
    1 | All routes learned on the S9300 | display ip routing-table
    2 | PIM routing table on the S9300 | display pim routing-table
    3 | Information about the unicast routes used by PIM | display pim claimed-route
    4 | Multicast routing table on the S9300 | display multicast routing-table
    5 | Multicast forwarding table on the S9300 | display multicast forwarding-table
    6 | All PIM neighbors of the S9300 | display pim neighbor
    7 | All the interfaces enabled with PIM on the S9300 | display pim interface
    8 | BSR information learned by the S9300 when PIM-SM is enabled | display pim bsr-info
    9 | RP information learned by the S9300 when PIM-SM is enabled | display pim rp-info
    10 | Whether the group that wants to receive multicast data can be mapped to the RP when the S9300 runs PIM-SM | display pim rp-info group-address
    11 | Information about the RPF neighbors and RPF interfaces from the S9300 to the multicast source | display multicast rpf-info source-address
    12 | Information about IGMP groups | display igmp group
    13 | Information about IGMP interfaces | display igmp interface
    14 | Information about the IGMP routing table | display igmp routing-table
    15 | All debugging information about PIM | debugging pim all (disable the debugging immediately after you collect the information)
    16 | All debugging information about IGMP | debugging igmp all (disable the debugging immediately after you collect the information)

    NOTE

    PIM-SM = Protocol Independent Multicast-Sparse Mode; RPF = Reverse Path Forwarding


    Handling Flowchart

    Figure 3-3 Flowchart for handling the failure to forward IP multicast packets

    [Flowchart. Decision nodes: "User joins the multicast group G by IGMP?", "TTL of the packets is big enough to clients?", "RP about group G on all devices is identical?", "Multicast routing entries are correct?". Action nodes: "Restore the IGMP configurations", "Modify the TTL", "Restore the RP configurations", "Restore the routing protocol configurations", "Reset the system", "Seek technical support". Each action is followed by a "Fault rectified?" check; the flow runs from Start to End.]


    CAUTION

    Perform the following steps only when user services are already interrupted. If user services are not interrupted, collect fault information and provide feedback to Huawei engineers for further processing.

    Procedure

    Step 1 Check and restore the IGMP configuration.

    When clients fail to receive multicast data, check the IGMP configuration on the S9300

    connecting the clients for correctness as follows:

    1. Check whether multicast is enabled on the S9300. That is, check whether the multicast

    routing-enable command is run. If the command is not run, enable multicast in the system

    view and ensure that IGMP is enabled on all interfaces. Then check whether the clients

    succeed in receiving multicast data.

    2. If the clients still fail to receive multicast data, check whether the interface status is normal. Run the display igmp interface interface-name command to check whether information about the specified interface is displayed. If no information is displayed, handle the fault according to Abnormality of the Interface Status. If the interface status is normal, check whether the clients succeed in receiving multicast data.

    3. If the clients still fail to receive multicast data, check whether access control lists (ACLs) are configured on the interface to prevent hosts from joining group G. Run the display current-configuration interface interface-name command to check whether an IGMP group policy is configured. If so, modify the ACL configuration to permit hosts to join group G. Then check whether the clients succeed in receiving multicast data.

    4. If the clients still fail to receive multicast data, check whether the interface resides on the same network as the hosts. If the interface resides on a different network, modify the IP address of the interface, and then check whether the clients succeed in receiving multicast data.

    5. If the fault persists after you perform the preceding checks, run the reset igmp group command to delete the IGMP group, and then have the clients join the multicast group again.

    6. If deleting the IGMP group is not effective, proceed to the following step.
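
    Sub-step 4 above checks whether the interface resides on the same network as the hosts. A minimal Python sketch of that comparison is shown below. It assumes that the interface address and mask are written in prefix notation; the function name and the addresses in the usage note are illustrative only.

```python
import ipaddress


def on_same_network(interface_cidr: str, host_address: str) -> bool:
    """Return True if the host address falls inside the interface's subnet."""
    interface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(host_address) in interface.network
```

    For an interface configured as 192.168.1.1/24, the host 192.168.1.20 is on the same network, whereas the host 192.168.2.20 is not.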

    Step 2 Check and modify the Time-to-Live (TTL) value of the packets sent by the multicast source.

    Check the TTL value of the (S, G) packets sent by the multicast source S. If this value is too small, increase it so that the packets can reach the hosts.
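
    Why a small TTL stops the packets can be shown with a short calculation. Each router on the path decrements the TTL by 1 and discards the packet when the TTL reaches 0, so the initial TTL must be greater than the number of routers between the source and the hosts. The following Python sketch encodes only this rule; the function name is hypothetical, and any additional TTL limits configured on interfaces are not modeled.

```python
def reaches_hosts(initial_ttl: int, router_hops: int) -> bool:
    """Return True if a packet with the given initial TTL survives all router hops.

    Each router decrements the TTL by 1 and drops the packet when it reaches 0,
    so the packet is delivered only if initial_ttl > router_hops.
    """
    return initial_ttl > router_hops
```

    For example, a source that sends packets with a TTL of 1 cannot reach hosts behind even a single router, whereas a TTL of 2 is sufficient for one router hop.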

    Step 3 Check and modify the RP configuration.

    If the fault persists after you perform the preceding steps, check the RP configuration for

    correctness. First, ensure that all the devices in the PIM domain are enabled with PIM. There

    are two cases:

    When an RP is specified statically in the network, perform the following:

    1. Check whether the same static-rp command is run on all the devices. If the command is not run, run the same static-rp command on all the devices, and then check whether multicast data is received. When ACLs are configured, ensure that the ACL configurations are also the same. Then check whether the clients succeed in receiving multicast packets.

    2. Check whether ACLs are configured to prevent the static RP from serving group G. If so,

    modify the ACL configuration to remove the restriction. Then check whether the clients

    succeed in receiving multicast packets.
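
    Checking that the same static-rp command is present on every device can be tedious by hand. As a rough sketch, assuming the static-rp line of each device has already been copied into a dictionary (with None for a device that has no such line), the following Python function lists the devices that differ from the most common setting. The device names, the addresses, and the function itself are hypothetical, and treating the most common line as the reference is only a heuristic.

```python
from collections import Counter


def inconsistent_devices(static_rp_by_device: dict) -> list:
    """Return the devices whose static-rp line differs from the most common one."""
    if not static_rp_by_device:
        return []
    # Use the most frequent configuration line as the reference value.
    reference, _ = Counter(static_rp_by_device.values()).most_common(1)[0]
    return sorted(
        device for device, line in static_rp_by_device.items() if line != reference
    )
```

    Given three devices configured with static-rp 10.1.1.1 and one configured with static-rp 10.2.2.2, the function returns only the name of the fourth device.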

    When a dynamic BSR-RP is configured in the network, perform the following:

    1. Check whether the BSR is correctly configured by running the display pim bsr-info command on the BSR. If the BSR is not correctly configured, reconfigure it.

    2. Run the display pim rp-info command on the BSR to check whether the BSR learns RP information. If the BSR fails to learn RP information, check that the RP is correctly configured, that a route exists between the BSR and the RP, and that the BSR and the RP can ping each other. If the route is faulty, refer to the Quidway S9300 Terabit Routing Switch Troubleshooting - Multicast.

    3. Run the display current-configuration command on both the BSR and the RP to check whether crp-policy commands are run to prohibit group G. If so, modify the ACL configuration.

    4. If performing this step is not effective, proceed to the following step.

    Step 4 Check and restore multicast routing entries.

    If the fault per

