
    CONTROLLED SHUTDOWN OF SP CORE LINKS

    TECHNICAL PAPER

    Peter De Vriendt

    Clarence Filsfils

    Document Version 1.1

    2008-05-28

    Cisco Systems, Inc.
    170 West Tasman Drive
    San Jose, CA 95134-1706
    USA
    http://www.cisco.com
    Tel: 408 526-4000
    800 553-NETS (6387)
    Fax: 408 526-4100


    Page 2 of 16 Copyright 2009 Cisco Systems, Inc. All rights reserved. Document Version 1.1

    Legal Notice

    THE SPECIFICATIONS AND INFORMATION REGARDING THE PRODUCTS IN THIS DOCUMENT ARE SUBJECT TO CHANGE WITHOUT NOTICE. ALL STATEMENTS, INFORMATION, AND RECOMMENDATIONS IN THIS MANUAL ARE BELIEVED TO BE ACCURATE BUT ARE PRESENTED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. USERS MUST TAKE FULL RESPONSIBILITY FOR THEIR APPLICATION OF ANY PRODUCTS.

    THE SOFTWARE LICENSE AND LIMITED WARRANTY FOR THE ACCOMPANYING PRODUCT ARE SET FORTH IN THE INFORMATION PACKET THAT SHIPPED WITH THE PRODUCT AND ARE INCORPORATED HEREIN BY THIS REFERENCE. IF YOU ARE UNABLE TO LOCATE THE SOFTWARE LICENSE OR LIMITED WARRANTY, CONTACT YOUR CISCO REPRESENTATIVE FOR A COPY.

    The Cisco implementation of TCP header compression is an adaptation of a program developed by the University of California, Berkeley (UCB) as part of UCB's public domain version of the UNIX operating system. All rights reserved. Copyright 1981, Regents of the University of California.

    NOTWITHSTANDING ANY OTHER WARRANTY HEREIN, ALL DOCUMENT FILES AND SOFTWARE OF THESE SUPPLIERS ARE PROVIDED AS IS WITH ALL FAULTS. CISCO AND THE ABOVE-NAMED SUPPLIERS DISCLAIM ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, THOSE OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE.

    IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THIS MANUAL, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

    CCDE, CCENT, Cisco Eos, Cisco Lumin, Cisco Nexus, Cisco StadiumVision, Cisco TelePresence, Cisco WebEx, the Cisco logo, DCE, and Welcome to the Human Network are trademarks; Changing the Way We Work, Live, Play, and Learn and Cisco Store are service marks; and Access Registrar, Aironet, AsyncOS, Bringing the Meeting To You, Catalyst, CCDA, CCDP, CCIE, CCIP, CCNA, CCNP, CCSP, CCVP, Cisco, the Cisco Certified Internetwork Expert logo, Cisco IOS, Cisco Press, Cisco Systems, Cisco Systems Capital, the Cisco Systems logo, Cisco Unity, Collaboration Without Limitation, EtherFast, EtherSwitch, Event Center, Fast Step, Follow Me Browsing, FormShare, GigaDrive, HomeLink, Internet Quotient, IOS, iPhone, iQuick Study, IronPort, the IronPort logo, LightStream, Linksys, MediaTone, MeetingPlace, MeetingPlace Chime Sound, MGX, Networkers, Networking Academy, Network Registrar, PCNow, PIX, PowerPanels, ProConnect, ScriptShare, SenderBase, SMARTnet, Spectrum Expert, StackWise, The Fastest Way to Increase Your Internet Quotient, TransPath, WebEx, and the WebEx logo are registered trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries.

    All other trademarks mentioned in this document or website are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company.

    Copyright 2009 Cisco Systems, Inc. All rights reserved.


    Abstract

    This paper describes how Service Providers can bring core links out of service without causing traffic loss.

    Key Technologies

    ISIS

    OSPF

    Fast Convergence

    IOS-XR

    IOS


    Table of Contents

    1 Overview
    2 Recommended actions for taking a link out of service
    3 Test setup
    4 Overview of possible options and impact on packet loss
      4.1 Overview
      4.2 Local vs. remote traffic
      4.3 Unplugging the fiber
      4.4 Administratively shutting down the interface
        4.4.1 Shutting down a GE interface
        4.4.2 Shutting down POS interfaces
      4.5 Configuring max-metric or max-metric-1
      4.6 Admin shutdown of IGP interface
      4.7 OSPF passive interface
    5 Micro-loops
    6 Bringing the link back in service
    7 Conclusion


    1 Overview

    In order to assure network availability and avoid violating Service Level Agreements, service providers are paying more attention than ever to avoiding data traffic loss. One of the operations that needs to occur regularly is taking a link out of service (and bringing it back in service). In this document, we focus on core links and explain how this can be done in a controlled manner resulting in minimal or no packet loss. We explain the behavior of the different options and illustrate through testing why one option is preferred over the others.

    The examples are illustrated on an IOS-XR test-bed using CRS1 routers in a simulated service provider ISIS topology. The focus is on IOS-XR, but we also pay attention to IOS, in particular in those cases where differences with IOS-XR are observed.

    The same results apply to OSPF when used as IGP, unless otherwise mentioned. We assume that all links are point-to-point and that POS links are configured with PPP. In case Layer 2 devices are installed between two routers, we assume BFD is configured between them.

    2 Recommended actions for taking a link out of service

    The best way to take a link out of service using IOS-XR is to first shut down the link under the IGP configuration. This triggers rerouting of traffic onto the alternate path without causing any packet loss [1].

    When traffic has been rerouted, the link can be safely taken out of service by either unplugging the fiber or shutting down the interface.

    An example is illustrated hereafter:

    RP/0/RP1/CPU0:crs1e8-1#conf t
    RP/0/RP1/CPU0:crs1e8-1(config)#router isis 1
    RP/0/RP1/CPU0:crs1e8-1(config-isis)#int Gi0/6/0/4
    RP/0/RP1/CPU0:crs1e8-1(config-isis-if)#shut
    RP/0/RP1/CPU0:crs1e8-1(config-isis-if)#commit

    Note: In OSPF it is not possible to configure the shutdown of the interface. A feature request has been opened (CSCsq38111). As a workaround, one can configure the interface as passive prior to bringing the link down.

    As IOS does not have the ability to shut down the IGP interface, one can also configure the interface as passive in IGP configuration mode.

    To bring the link back in service, one should bring the link up at its physical layer, assure the link comes up at Layer 2 and then bring the link up at IP level. When all actions are taken to assure that the link is in a stable and healthy state, the routing protocol interfaces can be brought up and a lossless transition onto the new link will be observed.

    [1] Assuming no micro-loop exists for the rerouted data traffic.

    3 Test setup

    In order to illustrate the behavior of the different actions, a regular fast convergence test-bed has been set up using IOS-XR 3.6.1 on CRS1 and c12k routers. The simulated topology is a 700-node/2000-prefix ISIS IP network, out of which 500 prefixes are configured with priority critical. The UUT is always a CRS1 8-slot chassis with RP rev A and MSC rev A. ISIS initial delay timers for LSP-gen and SPF are set to 0 ms.

    [Figure: test-bed topology diagram, annotated "700 nodes, 2000 prefixes, 500 important"]

    Figure 1: Standard Fast Convergence test-bed

    Convergence graphs (see Figure 3 for an example) report percentile values calculated over 10 iterations for each scenario. Reported values are P0 (best result), P50 (median), P90 and P100 (worst case). The X axis represents the prefix position as it is being rerouted by the IGP and the Y axis represents the traffic loss in milliseconds as measured by a traffic generator.
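As an illustration of how such per-prefix percentile curves can be derived from repeated runs, the following is a minimal sketch. The paper does not state which percentile definition was used; the nearest-rank method, the function names and the sample data below are all assumptions made for illustration only.

```python
def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile of an ascending list (assumed definition).
    pct=0 returns the minimum, pct=100 the maximum."""
    n = len(sorted_vals)
    rank = max(1, (pct * n + 99) // 100)  # integer ceiling of pct*n/100
    return sorted_vals[rank - 1]

def summarize(iterations):
    """iterations: list of runs, each a list of per-prefix loss values in ms.
    Returns one {percentile: loss} dict per prefix position."""
    summary = []
    for pos in range(len(iterations[0])):
        vals = sorted(run[pos] for run in iterations)
        summary.append({p: nearest_rank(vals, p) for p in (0, 50, 90, 100)})
    return summary

# Made-up sample: 10 runs, 3 prefix positions (not measured data)
runs = [[10 + i, 20 + 2 * i, 30 + 3 * i] for i in range(10)]
result = summarize(runs)
print(result[0])
```

Each dict in `result` corresponds to one point on the X axis, and its four values correspond to the four plotted curves.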

    4 Overview of possible options and impact on packet loss

    4.1 Overview

    When bringing a core link out of service, the following options are available:

    1. Unplugging the fiber

    2. Administratively shutting down the interface

    3. Configuring IGP max-metric or max-metric-1

    4. Shutting down the interface in router configuration mode (Best, preferred)


    4.2 Local vs. remote traffic

    As traffic is flowing in two directions, when analyzing the impact of the actions taken on data traffic, we have to distinguish between local and remote traffic. Hence we define:

    1. Local node: the node where the action has been taken

    2. Remote node: the node on the other side of the link that is neighboring with the local node

    3. Local traffic: traffic exiting the local node where the action has been taken

    4. Remote traffic: traffic exiting the remote node getting forwarded on the link to the local node

    The picture below illustrates this:

    [Figure: two diagrams of a local node and a remote node connected by a link, one labeled "Local traffic" and one labeled "Remote traffic"]

    Figure 2: Local and remote traffic

    It is clear that when taking an action, both the local and the remote node need to reroute traffic onto an alternate path. Hence the action taken on the local node should also act as a trigger for convergence on the remote node, so that remote traffic gets rerouted as well.

    4.3 Unplugging the fiber

    When unplugging the fiber, normal convergence characteristics will be seen, with minimal loss if both local and remote node are able to trigger convergence upon detecting the Layer 1 failure.

    We assume that both Tx and Rx are unplugged at the same time on the local node. This causes an immediate loss of traffic in both directions. In the lab setup an optical switch is used that causes a LOL (loss of light) on both sides. In this failure case local and remote traffic should see the same loss of connectivity (LoC) if no Layer 2 devices are installed. Indeed, both local and remote node should trigger convergence when they detect the LOL, hence normal convergence characteristics are expected.

    With Layer 2 devices in between, one of the routers might not detect the Layer 1 failure and might have to rely on BFD detection for triggering convergence. Convergence time will then depend on the BFD timeout interval. Note that convergence might also get triggered upon receipt of the neighbor's LSP, depending on the timers set for LSP-gen and the BFD timeout. Suppose, for example, that BFD is configured with a timeout of 150 ms and the LSP-gen initial wait is set to 50 ms. When the local node detects the failure, it will generate an LSP after 50 ms and flood it. BFD on the remote node will not detect the failure for 150 ms. After 50 ms plus a small delta of less than 10 ms, the remote node will receive the LSP from the local node and trigger convergence. During SPF calculation, the two-way connectivity check (TWCC) will fail and rerouting on the remote node will occur.
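The timing argument above can be checked with simple arithmetic. The sketch below is a deliberately simplified model, not a description of the router implementation: it assumes the local node detects the failure at time 0, considers only two candidate triggers on the remote node (BFD timeout and LSP receipt), and uses 10 ms as an upper bound for the flooding delta. The function name is hypothetical.

```python
def remote_trigger_ms(bfd_timeout_ms, lsp_gen_initial_wait_ms, flooding_delta_ms):
    """Return (time_ms, cause) of the first event that triggers convergence
    on the remote node, measured from local failure detection.
    Simplified model: only BFD timeout and LSP receipt are considered."""
    lsp_arrival = lsp_gen_initial_wait_ms + flooding_delta_ms
    if lsp_arrival < bfd_timeout_ms:
        return lsp_arrival, "LSP received, TWCC fails during SPF"
    return bfd_timeout_ms, "BFD timeout"

# Values from the example in the text; the 10 ms delta is an upper bound
example = remote_trigger_ms(bfd_timeout_ms=150,
                            lsp_gen_initial_wait_ms=50,
                            flooding_delta_ms=10)
print(example)
```

With the values from the text, the LSP arrives at most 60 ms after detection, well before the 150 ms BFD timeout. With a longer LSP-gen initial wait the BFD timeout would fire first.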

    The following graph illustrates convergence times upon POS link failure as seen on a CRS1 8-slot chassis with RP-revA and MSC-revA.

    [Plot "36112-crs1-P-ISIS00-500Pr-2kT-i2i-Layer1down-050708-lnlb": X axis prefix nr (0 to 2000), Y axis traffic loss in msec (0 to 180), curves for the 0%, 50%, 90% and 100% percentiles; Agilent measurements]

    Figure 3: POS Layer 1 failure local traffic

    4.4 Administratively shutting down the interface

    A very common approach to bringing a core link out of service is to administratively shut down the router interface in configuration mode. However, this approach will never result in zero-loss convergence. The exact behavior depends on the OS implementation. For example, it could be that upon shutdown of the interface, convergence is triggered while traffic keeps being forwarded. This would result in less traffic loss than expected. Such behavior was not observed on IOS-XR.

    4.4.1 Shutting down a GE interface

    When shutting down the GE interface, local traffic will immediately experience loss. At the same time, the ISIS adjacency is deleted and the routing protocol is triggered for convergence. Very little difference will be seen in terms of convergence times when compared to unplugging the fiber, as Figure 4 illustrates.


    [Plot "36112-crs1-P-ISIS00-500Pr-2kT-i2i-GEAdminDownLocal-051608-lnlb": X axis prefix nr (0 to 2000), Y axis traffic loss in msec (0 to 200), curves for the 0%, 50%, 90% and 100% percentiles; Agilent measurements]

    Figure 4: Convergence due to admin shut down of a GE interface

    Remote traffic experiences a bit more loss, as shown in Figure 5. When incoming traffic starts dropping on the local node, the OS needs to signal the link down and the remote router needs some time to detect this and trigger its own convergence.

    [Plot "36112-crs1-P-ISIS00-500Pr-2kT-i2i-GEAdminDownLocalRemote-051608-lnlb": X axis prefix nr (0 to 2000), Y axis traffic loss in msec (0 to 200), curves for the 0%, 50%, 90% and 100% percentiles; Agilent measurements]

    Figure 5: Remote traffic convergence due to GE admin failure

    With Layer 2 switches in between the two devices, BFD on the remote router will have to detect that the connection with its neighbor is down and will trigger convergence on the remote node, as can be seen in this example:

    Local node:

    RP/0/0/CPU0:c12e5-1(config)#int GigabitEthernet0/3/0/0.3
    RP/0/0/CPU0:c12e5-1(config-subif)#shut
    RP/0/0/CPU0:c12e5-1(config-subif)#commit

    RP/0/0/CPU0:May 20 10:36:01.840 : isis[239]: %ROUTING-ISIS-4-ADJCHANGE : Adjacency to c12e6-1 (GigabitEthernet0/3/0/0.3) (L2) Down, Interface state down

    LC/0/3/CPU0:May 20 10:36:01.863 : bfd_agent[111]: %L2-BFD-6-SESSION_REMOVED : BFD session to neighbor 18.0.0.18 on interface GigabitEthernet0/3/0/0.3 has been removed

    Remote node:

    RP/0/9/CPU0:c12e6-1#
    LC/0/3/CPU0:May 20 10:36:02.056 : bfd_agent[111]: %L2-BFD-6-SESSION_STATE_DOWN : BFD session to neighbor 18.0.0.17 on interface GigabitEthernet0/3/0/1.3 has gone down. Reason: Echo function failed

    4.4.2 Shutting down POS interfaces

    When shutting down a POS interface, one needs to configure the router to send L-AIS on the link such that the remote node can quickly take down its link and trigger convergence. In IOS, this is done by configuring pos ais-shut on the POS interface. In IOS-XR, Layer 1 and Layer 2 are separated: a controller interface holds the Layer 1 configuration and the POS interface holds the Layer 2 configuration. POS alarm information is to be configured under the controller interface:

    interface POS0/2/0/5
     ipv4 address 10.10.105.85 255.255.255.252
     encapsulation ppp
     pos
      crc 32
     !
    !
    controller SONET0/2/0/5
     ais-shut

    As a result, when shutting down the interface in interface POS x/x/x/x configuration mode, no L-AIS is sent to the remote node. Instead, the interface on the local node goes down while the remote node has to wait for its PPP timeout to bring the link down. ISIS will also have to time out if BFD is not configured on the POS interface.

    Pitfall: Don't shut down a POS interface in IOS-XR, as L-AIS is not sent. As a result, the remote end of the link will not go down until PPP times out. Instead, configure the controller interface to send L-AIS and then do an admin shutdown of the controller interface.

    Taking the above considerations into account, convergence after shutting down the SONET controller interface results in the same loss as shutting down a GE interface.

    If an administrator nevertheless shuts down the POS interface and not the controller, local traffic will see normal convergence characteristics (similar to unplugging the fiber). And although the link will not go down until PPP times out and the adjacency will not go down until the hold timer expires, the remote node will trigger convergence when it receives the LSP from the local node. The SPF calculation will fail TWCC, which will trigger rerouting for the remote traffic.
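To make the role of TWCC concrete, the following is a minimal sketch of a shortest-path computation that only uses a link when both endpoints advertise each other. It is an illustration of the principle, not the ISIS implementation; the link-state database layout, the node names L (local), R (remote) and A (alternate), and the metrics are assumptions.

```python
import heapq

def spf(lsdb, source):
    """Dijkstra over a link-state database.
    lsdb maps node -> {neighbor: metric} as advertised in that node's own LSP.
    A link u->v is used only if v also advertises u (two-way connectivity check)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, metric in lsdb.get(u, {}).items():
            if u not in lsdb.get(v, {}):
                continue  # TWCC fails: v no longer reports u as a neighbor
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical triangle. R's own adjacency to L has not timed out yet,
# but L's new LSP no longer lists R.
lsdb = {
    "L": {"A": 10},
    "R": {"L": 1, "A": 10},
    "A": {"L": 10, "R": 10},
}
dist_from_r = spf(lsdb, "R")
print(dist_from_r)
```

In this sketch R still lists L as a neighbor, yet its SPF reaches L through A because the direct link fails the two-way check. This mirrors why remote traffic is rerouted before the PPP or hold timers expire.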

    4.5 Configuring max-metric or max-metric-1

    Another option that was considered is to configure ISIS with max-metric or max-metric-1 prior to bringing the link down (admin or physical failure). This results in traffic being rerouted away from the affected link without causing traffic loss. The disadvantage of this approach is that it works in one direction only: configuring max-metric-1 at one end of the link reroutes traffic on the local node, but the remote node needs its own metric change for remote traffic to converge. This results in longer periods of asymmetrical routing. Another drawback is that the administrator, when reconfiguring the metric before bringing the link back in service, might set the wrong metric value, hence jeopardizing capacity planning. With these drawbacks in mind, the next option is considered preferable.
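The one-directional nature of a metric change can be illustrated with a small directed-graph sketch. The topology, node names and metrics are hypothetical, and the value used for max-metric-1 assumes wide metrics, where 2^24 - 1 is the maximum link metric and 2^24 - 2 is the highest value still considered in the SPF computation.

```python
import heapq

MAX_METRIC_MINUS_1 = 2**24 - 2  # assumption: wide metrics

def first_hop(links, src, dst):
    """links: {(u, v): metric}, one entry per direction.
    Returns the first hop on the shortest path from src to dst, or None."""
    heap = [(0, src, None)]
    done = set()
    while heap:
        d, u, hop = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            return hop
        for (a, b), m in links.items():
            if a == u and b not in done:
                # The first hop is fixed when leaving the source
                heapq.heappush(heap, (d + m, b, hop if hop else b))
    return None

# Hypothetical triangle: direct link L-R (metric 10), alternate path via A
links = {
    ("L", "R"): 10, ("R", "L"): 10,
    ("L", "A"): 100, ("A", "L"): 100,
    ("A", "R"): 100, ("R", "A"): 100,
}
drained = dict(links)
drained[("L", "R")] = MAX_METRIC_MINUS_1  # metric raised on the local node only

print(first_hop(drained, "L", "R"), first_hop(drained, "R", "L"))
```

After the change, L forwards traffic for R via A, while R still forwards traffic for L over the direct link. The link is therefore only drained in both directions once the remote side raises its metric as well.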

    4.6 Admin shutdown of IGP interface

    The advantage of IOS-XR is that all IGP interface-specific configuration is entered under the IGP router configuration mode:

    router isis 1
     is-type level-2-only
     net 39.0001.0000.0000.0000.0200.00
     nsf ietf
     log adjacency changes
     lsp-gen-interval maximum-wait 1000 initial-wait 0 secondary-wait 50
     lsp-refresh-interval 65000
     max-lsp-lifetime 65535
     address-family ipv4 unicast
      metric-style wide
      spf-interval maximum-wait 1000 initial-wait 0 secondary-wait 50
      spf prefix-priority critical isis-critical-acl
      spf prefix-priority medium isis-med-acl
     !
     interface Loopback0
      passive
      circuit-type level-2-only
      address-family ipv4 unicast
       metric 1 level 2
      !
     !
     interface GigabitEthernet0/3/0/1
      circuit-type level-2-only
      bfd minimum-interval 50
      bfd multiplier 3
      bfd fast-detect ipv4
      point-to-point
      address-family ipv4 unicast
       metric 200 level 2

    As a result, one can do an admin shutdown of the IGP router interface. This triggers convergence on the local node without causing any traffic loss, assuming no micro-loops exist in the network (more on micro-loops in section 5). Remote node convergence is lossless as well and can be triggered in different ways:


    1. Upon shutting down the interface, ISIS sends a hello that immediately brings down the ISIS adjacency on the remote node, hence triggering convergence. This is done on POS interfaces, and a feature request has been opened to have this done on GE interfaces configured as point-to-point (CSCsq40551). If the remote node runs IOS, convergence will be triggered by point (3) hereafter.

    2. If the interface is GE, the remote node will time out (a direct hello is not sent on the LAN, even if it is configured as point-to-point, see CSCsq40551). In order for the neighbor to go down quickly, it is advised to configure BFD between the two nodes. When this is done, BFD signals a session down message immediately after the IGP interface is shut down, hence bringing down the IGP adjacency on the remote node.

    3. If BFD is not configured between local and remote node, and the remote node is timing out the adjacency, convergence on the remote node will still happen very quickly since it is triggered by the receipt of the LSP of the local node. This triggers an SPF calculation which will fail TWCC with the local node, hence triggering convergence.

    Some example output when the IGP adjacencies are on a POS interface:

    Local node:

    RP/0/RP1/CPU0:crs1e8-1(config)#router isis 1
    RP/0/RP1/CPU0:crs1e8-1(config-isis)#int PO0/2/0/5
    RP/0/RP1/CPU0:crs1e8-1(config-isis-if)#shut
    RP/0/RP1/CPU0:crs1e8-1(config-isis-if)#commit

    RP/0/RP1/CPU0:May 19 17:50:59.876 : isis[247]: %ROUTING-ISIS-4-ADJCHANGE : Adjacency to c12e5-1 (POS0/2/0/5) (L2) Down, Interface state down

    Remote node:

    RP/0/0/CPU0:May 19 17:50:59.882 : isis[239]: %ROUTING-ISIS-4-ADJCHANGE : Adjacency to crs1e8-1 (POS0/5/0/0) (L2) Down, Interface state down

    RP/0/0/CPU0:May 19 17:50:59.882 : isis[239]: L2 Adj crs1e8-1 (POS0/5/0/0): State change: UP -> DOWN (Interface state down)

    An example on a GE interface with BFD configured:

    Local node:

    RP/0/RP1/CPU0:crs1e8-1(config)#router isis 1
    RP/0/RP1/CPU0:crs1e8-1(config-isis)#int GigabitEthernet0/6/0/4
    RP/0/RP1/CPU0:crs1e8-1(config-isis-if)#shut
    RP/0/RP1/CPU0:crs1e8-1(config-isis-if)#commit

    RP/0/RP1/CPU0:May 20 10:01:49.156 : isis[247]: %ROUTING-ISIS-4-ADJCHANGE : Adjacency to c12e5-1 (GigabitEthernet0/6/0/4) (L2) Down, Interface state down

    LC/0/6/CPU0:May 20 10:01:49.157 : bfd_agent[105]: %L2-BFD-6-SESSION_REMOVED : BFD session to neighbor 10.10.105.214 on interface GigabitEthernet0/6/0/4 has been removed

    Remote node:

    RP/0/0/CPU0:c12e5-1(config-isis-if)#
    LC/0/3/CPU0:May 20 10:01:49.161 : bfd_agent[111]: %L2-BFD-6-SESSION_STATE_DOWN : BFD session to neighbor 10.10.105.213 on interface GigabitEthernet0/3/0/3 has gone down. Reason: Nbor signalled down

    RP/0/0/CPU0:May 20 10:01:49.170 : isis[239]: %ROUTING-ISIS-4-ADJCHANGE : Adjacency to crs1e8-1 (GigabitEthernet0/3/0/3) (L2) Down, BFD session DOWN


    Test results from the test-bed show that there is no packet loss when rerouting from one path to the backup path upon triggering convergence using this methodology:

    [Plot "36112-crs1-P-ISIS00-500Pr-2kT-i2i-isisIntdown-050808-lnlb": X axis prefix nr (0 to 2200), Y axis traffic loss in msec (0 to 1), curves for the 0%, 50%, 90% and 100% percentiles; Agilent measurements]

    [Plot "36112-crs1-P-ISIS00-500Pr-2kT-i2i-IsisIntShutRemote-050908-lnlb": X axis prefix nr (0 to 2200), Y axis traffic loss in msec (0 to 1), curves for the 0%, 50%, 90% and 100% percentiles; Agilent measurements]

    Figure 6: There is no packet loss for local and remote traffic for an admin shut down of the IGP interface.


    4.7 OSPF passive interface

    As OSPF does not yet support the admin shutdown of its interface (feature requested with CSCsq38111), a valid workaround is to configure the interface as passive. This immediately brings down the adjacency on the local node and triggers convergence. In order to also quickly bring down the adjacency on the remote node, one needs to configure BFD between the neighbors. Upon committing the passive interface configuration, BFD signals a neighbor down message to the remote node, which then immediately brings its OSPF adjacency down:

    Local node:

    RP/0/RP1/CPU0:crs1e8-1(config)#router ospf 1
    RP/0/RP1/CPU0:crs1e8-1(config-ospf)# area 0
    RP/0/RP1/CPU0:crs1e8-1(config-ospf-ar)# interface POS0/2/0/5
    RP/0/RP1/CPU0:crs1e8-1(config-ospf-ar-if)#passive
    RP/0/RP1/CPU0:crs1e8-1(config-ospf-ar-if)#commit

    RP/0/RP1/CPU0:May 21 11:54:55.370 : ospf[291]: %ROUTING-OSPF-5-ADJCHG : Process 1, Nbr 10.10.104.5 on POS0/2/0/5 in area 0 from FULL to DOWN, Neighbor Down: interface down or detached

    LC/0/2/CPU0:May 21 11:54:55.370 : bfd_agent[111]: %L2-BFD-6-SESSION_REMOVED : BFD session to neighbor 10.10.105.86 on interface POS0/2/0/5 has been removed

    Remote node:

    RP/0/0/CPU0:c12e5-1#
    LC/0/5/CPU0:May 21 11:54:55.378 : bfd_agent[111]: %L2-BFD-6-SESSION_STATE_DOWN : BFD session to neighbor 10.10.105.85 on interface POS0/5/0/0 has gone down. Reason: Nbor signalled down

    RP/0/0/CPU0:May 21 11:54:55.384 : ospf[295]: %ROUTING-OSPF-5-ADJCHG : Process 1, Nbr 10.10.104.200 on POS0/5/0/0 in area 0 from FULL to DOWN, Neighbor Down: BFD session down

    If BFD is not configured between the nodes, the adjacency on the remote node will only go down when the OSPF hellos have timed out. However, rerouting of remote traffic will still be quite fast, as it is triggered by the receipt of the LSA of the local node (TWCC will fail).

    5 Micro-loops

    Studies on service provider topologies have shown that micro-loops are rare (less than 20% of link failures). Some of the traffic could be dropped because it is caught in short-duration loops between routers while they are in the process of converging. We call such loops micro-loops. When using the technique of gracefully shutting down a core link, short loops could still occur since the change in topology affects every router in the network. So although traffic on the local node is rerouted without causing any loss on that node, other routers in the network might also change their routing table due to this topology change. This might cause traffic to be dropped if there is some desynchronization between two or more routers in the network. An example of such a topology is given hereafter:

    [Figure: topology diagram with routers R1, R2, R3, R4, R5 and R6; link metrics shown in the original: 5, 3, 1, 1, 1, 1, 1]

    Figure 7: Topology where micro-loop could occur

    Before the admin shutdown of the IGP link R3-R4, R1 routes traffic to R2 and R2 routes traffic to R3.

    When the IGP adjacency between R3 and R4 is deleted, R2 will reroute traffic back to R1 and R1 will reroute traffic to R5. In this case there is a probability that R2 reroutes a prefix back to R1 before R1 has rerouted this prefix to R5. This would cause traffic to that particular prefix to be dropped for as long as R1 and R2 keep pointing their FIB entries at each other.
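The transient state described above can be reproduced with a small sketch. The exact link placement and metrics of Figure 7 are not recoverable here, so the topology below is an assumption chosen to match the described behavior (R1-R2-R3-R4 with metric 1 per link, and an alternate path R1-R5-R6-R4 with metrics 5, 1 and 3). The function names are hypothetical.

```python
import heapq

def next_hops(links, dst):
    """Next hop toward dst for every node of an undirected graph.
    links: {(u, v): metric}. Runs Dijkstra outward from dst."""
    adj = {}
    for (u, v), m in links.items():
        adj.setdefault(u, []).append((v, m))
        adj.setdefault(v, []).append((u, m))
    dist, nh = {dst: 0}, {}
    heap = [(0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, m in adj.get(u, []):
            if d + m < dist.get(v, float("inf")):
                dist[v] = d + m
                nh[v] = u  # v forwards to u to reach dst
                heapq.heappush(heap, (d + m, v))
    return nh

def has_loop(fib, src, dst):
    """Follow next hops from src; True if a node is visited twice before dst."""
    seen, node = set(), src
    while node != dst:
        if node in seen or node not in fib:
            return node in seen
        seen.add(node)
        node = fib[node]
    return False

# Assumed topology and metrics (not taken verbatim from Figure 7)
links = {("R1", "R2"): 1, ("R2", "R3"): 1, ("R3", "R4"): 1,
         ("R1", "R5"): 5, ("R5", "R6"): 1, ("R6", "R4"): 3}
before = next_hops(links, "R4")
after = next_hops({k: m for k, m in links.items() if k != ("R3", "R4")}, "R4")

# Transient state: R2 has already updated its FIB, R1 has not
transient = dict(before)
transient["R2"] = after["R2"]
print(has_loop(before, "R1", "R4"), has_loop(transient, "R1", "R4"),
      has_loop(after, "R1", "R4"))
```

Both the initial and the final forwarding states are loop-free in this sketch; the loop exists only while R1 and R2 hold FIB entries from different topology views.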

    The study of micro-loops is still ongoing. Predicting their probability and duration is very complex, as many factors come into play, such as differences in hardware/software of the rerouting nodes, differences in time of trigger due to flooding/propagation delays, differences in FIB update time due to routing protocol update order, differences in CPU utilization, etc. It is outside the scope of this paper to give an example of such micro-loops through testing, as it might not be representative of real network situations and would hence give biased information.


    6 Bringing the link back in service

    When the link was properly taken out of service, the reverse actions should be taken to bring the link back up and have it used for data forwarding. First, the physical and data link layers are to be brought up and the link should be checked for connectivity at the network layer. Once all checks have passed to assure the link is working error-free, the IGP interface can be unshut. This should not cause any traffic loss. When the IGP has formed adjacencies, each side of the link will build a new LSP and flood it through the network. Traffic will only get forwarded onto the new link when TWCC passes, meaning that both LSPs must have been received on the rerouting node. In case MPLS has been configured, it is advised to configure MPLS-IGP sync in order to assure labels have been exchanged prior to converging onto the new path. When session-protection was previously configured, there is no strict requirement to have MPLS-IGP sync enabled, since the LDP session with the neighbor should stay up after the link was taken out of service, provided an alternate IGP path was available to reach it.

    Figure 8 shows that no data traffic is lost when converging to the new link.

    [Plot "36112-crs1-P-ISIS00-500Pr-2kT-i2i-isisIntdown-050808-lnlb": X axis prefix nr (0 to 2000), Y axis traffic loss in msec (0 to 1), curves for the 0%, 50%, 90% and 100% percentiles; Agilent measurements, In Service]

    Figure 8: Convergence when bringing the link back in service does not cause any traffic loss

    7 Conclusion

    With IOS-XR it is possible to do a controlled shutdown of a core link without causing any traffic loss, assuming no micro-loop exists in the network. Studies on service provider networks have shown that this is the case for 80% of the links of an average SP core network.

