Date post: 26-Apr-2020
National Institute of Advanced Industrial Science and Technology

An Analysis of ISP Backbone Availability

Katsushi Kobayashi

• All results in this talk are based only on the IS-IS messages provided by the Internet2 Observatory. Therefore, the results for specific links and nodes in this presentation do not directly reflect the quality of the service or of its equipment.

How much availability does ISP infrastructure provide?

• Your ISP offers a 99.9% SLA for intra-ISP traffic:
  • Is it really premium?
  • Is it worth paying more for?
• This talk presents infrastructure availability only, not taking into account:
  • Any convergence delay of the routing protocol
  • Packet behavior

Internet infrastructure: viewpoint from routing

• Break down network failures into their causes:
  • Routing and centralized NMS (Labovitz ’99)
• A lot of BGP activities
  • BGP failures affect the worldwide Internet system.
  • BGP can be seen by other ISPs.
  • BGP continues to be recorded by UO’s RouteViews.

ISP infrastructure: viewpoint from IGP

• Fewer IGP activities than BGP:
  • IS-IS on Qwest, Alaettinoglu (’02)
  • OSPF on Michi-Net, Watson (’03)
• A collector must be installed inside the ISP network.
• An IGP data set will disclose ISP backbone quality.
• Or: a network that is working fine is not news :)
• IGP messages represent infrastructure events:
  • Lost adjacency, ext. route: circuit / switch / interface down
  • Est. adjacency, ext. route: circuit / switch / interface up
  • Lost LSP/LSA: router down
  • Reset LSP/LSA seq.: router up
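The mapping from IGP observations to infrastructure events can be sketched as a small lookup. This is an illustration only: the observation labels and the `infer_event` function are hypothetical names chosen here, not part of any collector described in the talk.

```python
# Hypothetical labels for what a collector observes in the link-state database,
# mapped to the infrastructure event each one suggests.
IGP_TO_INFRA = {
    "lost_adjacency": "circuit / switch / interface down",
    "lost_ext_route": "circuit / switch / interface down",
    "established_adjacency": "circuit / switch / interface up",
    "established_ext_route": "circuit / switch / interface up",
    "lost_lsp": "router down",
    "reset_lsp_seq": "router up",
}


def infer_event(observation: str) -> str:
    """Return the infrastructure event suggested by one IGP observation."""
    try:
        return IGP_TO_INFRA[observation]
    except KeyError:
        raise ValueError(f"unknown IGP observation: {observation!r}")
```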

IS-IS collector in Abilene

• The IS-IS collector is part of the I2 Abilene Observatory activity.
  http://ndb2-blmt.abilene.ucaid.edu/isis/ Contributed by Shu Zhang [ZK06]
• Deployed at all Abilene nodes to give multiple observation points.
• Synchronized with a CDMA timer (GPS based).
• The data set is available from Aug. ’04 to Apr. ’07.

[ZK06] S. Zhang and K. Kobayashi, “Rtanaly: A System to Detect and Measure IGP Routing Changes”

Abilene Network Map

[Figure: Abilene backbone map. Nodes: Seattle, Denver, Sunnyvale, Los Angeles, Kansas City, Chicago, Indianapolis, Atlanta, Washington, New York City, Houston.]

11 nodes with T640 routers, and 14 OC192 circuits.

Abilene IS-IS operation

• 9 sec. hello interval; an IS-IS adjacency is lost after 3 missed hellos.
  • A failure detection delay of 22.5 sec. is assumed.
  • Faster failure detection is possible, e.g., with a shorter hello interval, BFD, or carrier loss on circuit failure.
• The IGP maintains infrastructure information only.
  • Minimizes the IGP database.
  • No BGP routes are imported into IS-IS.
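One reading that reproduces the 22.5 sec. figure, offered as an assumption rather than the author's stated derivation: the adjacency is dropped 3 × 9 = 27 sec. after the last hello received, and a failure falls on average halfway through a hello interval, i.e. 4.5 sec. after that last hello. The function name below is made up for this sketch.

```python
def expected_detection_delay(hello_interval: float, missed_hellos: int) -> float:
    """Mean time from failure to adjacency loss.

    Assumes the hold time is hello_interval * missed_hellos, measured from
    the last hello received, and that the failure instant is uniformly
    distributed within a hello interval (mean offset hello_interval / 2).
    """
    hold_time = hello_interval * missed_hellos
    return hold_time - hello_interval / 2


print(expected_detection_delay(9, 3))  # 22.5
```

Under the same assumptions, the worst case is the full 27 sec. hold time and the best case is 18 sec.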

• Network availability, as used hereafter:
  • The whole network works without any failure.
  • This is the network operator’s viewpoint.
  • The availability of a specific source-destination path is not considered.
  • This is not the customer’s viewpoint.
• Timeframe:
  • May include more than one event at the same time.

[Figure: timelines labeled ATLA, IPLS, and Network, marking a timeframe of failure and a timeframe of double failure.]
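The timeframe notion can be read as interval merging: overlapping outages collapse into one network-level timeframe, and a span where two or more outages overlap is a double-failure timeframe. The code below is a minimal sketch under that reading. The function names and the example intervals are invented for illustration and are not taken from the Abilene data.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def timeframes(outages: List[Interval], min_overlap: int = 1) -> List[Interval]:
    """Return maximal spans where at least `min_overlap` outages are active."""
    # Sweep line: +1 at each start, -1 at each end. At equal timestamps the
    # -1 sorts first, so back-to-back outages are not counted as overlapping.
    events = sorted([(s, 1) for s, e in outages] + [(e, -1) for s, e in outages])
    frames, active, frame_start = [], 0, None
    for t, delta in events:
        before = active
        active += delta
        if before < min_overlap <= active:
            frame_start = t          # threshold crossed upward: frame opens
        elif active < min_overlap <= before:
            frames.append((frame_start, t))  # crossed downward: frame closes
    return frames


def availability(frames: List[Interval], duration: float) -> float:
    """Fraction of the monitored duration with no timeframe in progress."""
    return 1 - sum(e - s for s, e in frames) / duration


# Made-up outages on two links, in seconds from the start of monitoring.
outages = [(0, 100), (250, 300), (50, 120)]
print(timeframes(outages))                 # [(0, 120), (250, 300)]
print(timeframes(outages, min_overlap=2))  # [(50, 100)]
```

Counting timeframes instead of raw events is one way to explain why a timeframe count can be lower than a disrupt count.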

Abilene IS-IS overview ’05-’06

• Node failure: the node’s LSP times out, or its sequence number is reset.
  • Only once in ’05 (53 sec. downtime), twice in ’06 (1,298 sec.)
• Circuit failure: an adjacency disappears from the list in the LSP.
  • Commonly observed: 635 timeframes in ’05, 513 in ’06.
• Ext. route failure: a route disappears from the LSP.
  • Does it represent edge troubles?
  • Difficult to identify whether it is serious or trivial.

(Slide callout: “To focus on this failure.”)

Lost adjacency event

[Figure: two single-failure histograms, x-axis log_10(Disrupt time (sec.)) from 0 to 6 with reference marks at 60 sec., 1 hour, and 1 day, y-axis Frequency.]

• 2005/Jan.-Dec.: monitor duration 365 days, total disrupt count 635, availability 0.95443
• 2006/Jan.-Dec.: monitor duration 365 days, total disrupt count 513, availability 0.98424

Note that the above histograms are drawn from IS-IS data captured at Atlanta. A few details differ at other IS-IS observation points.

Breakdown in ’05

[Figure: per-link histograms of disruption time, x-axis log_10(Disrupt time (sec.)) from 0 to 6, y-axis Frequency; monitor duration 365 days for every panel.]

Link        Disrupt cnt.   Avail.
ATLA-IPLS   288            0.99137
CHIN-IPLS   34             0.99981
CHIN-NYCM   64             0.99947
DNVR-KSCY   4              0.99997
DNVR-SNVA   12             0.99302
DNVR-STTL   8              0.99997

Availability Map (05/01-12)

[Figure: Abilene backbone map with nodes Seattle, Denver, Sunnyvale, Los Angeles, Kansas City, Chicago, Indianapolis, Atlanta, Washington, New York City, and Houston. Each link is labeled Availability / Disrupt count / Longest down time (sec.). One annotation reads "Hurricane Katrina, Aug. ’05".]

Link labels, in extraction order (the link each label belongs to is shown only in the figure):

0.9999/11/800
0.9991/122/12,494
0.9730/24/819,803
0.9930/12/183,352
0.9913/288/170,303
0.9998/34/1,364
0.9994/64/5,090
0.9998/10/3,940
0.9997/54/1,194
0.9999/4/398
0.9992/16/7,071
0.9997/12/2,349
0.9999/8/501
0.9993/18/14,192

Yearly summary ’05 - ’06

            2005/Jan.-Dec.          2006/Jan.-Dec.
Link        Avail.   Disrupt cnt.   Avail.   Disrupt cnt.
ATLA-HSTN   0.9738   24             0.9990   39
ATLA-IPLS   0.9914   288            0.9975   48
ATLA-WASH   0.9998   12             0.9994   25
CHIN-IPLS   0.9998   34             0.9998   14
CHIN-NYCM   0.9995   64             0.9999   30
DNVR-KSCY   1.0000   4              0.9999   18
DNVR-SNVA   0.9930   12             0.9922   51
DNVR-STTL   1.0000   8              0.9999   5
HSTN-KSCY   0.9993   18             0.9990   19
HSTN-LOSA   0.9991   121            0.9996   40
IPLS-KSCY   0.9998   10             0.9998   17
LOSA-SNVA   0.9997   54             0.9993   128
NYCM-WASH   0.9993   17             0.9989   113
SNVA-STTL   1.0000   11             1.0000   129
Total(*)    0.9544   677            0.9842   676
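As a consistency check on the Total(*) row, the sketch below sums the per-link disrupt counts and approximates whole-network availability as one minus the sum of per-link unavailabilities. That approximation is an assumption made here (link outages rarely overlap), not a method stated in the talk, and the table values are rounded to four decimals, so agreement can only be approximate. The names `SUMMARY` and `totals` are illustrative.

```python
# (avail_2005, disrupts_2005, avail_2006, disrupts_2006), copied from the table.
SUMMARY = {
    "ATLA-HSTN": (0.9738, 24, 0.9990, 39),
    "ATLA-IPLS": (0.9914, 288, 0.9975, 48),
    "ATLA-WASH": (0.9998, 12, 0.9994, 25),
    "CHIN-IPLS": (0.9998, 34, 0.9998, 14),
    "CHIN-NYCM": (0.9995, 64, 0.9999, 30),
    "DNVR-KSCY": (1.0000, 4, 0.9999, 18),
    "DNVR-SNVA": (0.9930, 12, 0.9922, 51),
    "DNVR-STTL": (1.0000, 8, 0.9999, 5),
    "HSTN-KSCY": (0.9993, 18, 0.9990, 19),
    "HSTN-LOSA": (0.9991, 121, 0.9996, 40),
    "IPLS-KSCY": (0.9998, 10, 0.9998, 17),
    "LOSA-SNVA": (0.9997, 54, 0.9993, 128),
    "NYCM-WASH": (0.9993, 17, 0.9989, 113),
    "SNVA-STTL": (1.0000, 11, 1.0000, 129),
}


def totals(avail_col: int, count_col: int):
    """Return (approximate network availability, total disrupt count).

    Assumption: outages on different links rarely overlap, so network
    unavailability is close to the sum of per-link unavailabilities.
    """
    count = sum(row[count_col] for row in SUMMARY.values())
    unavailability = sum(1 - row[avail_col] for row in SUMMARY.values())
    return round(1 - unavailability, 4), count


for year, cols in (("2005", (0, 1)), ("2006", (2, 3))):
    print(year, totals(*cols))
```

The disrupt counts add up to the reported 677 and 676, and the approximate availabilities land within about 0.0001 of the reported 0.9544 and 0.9842.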

Critical events.

• Two or more lost adjacencies in the same timeframe
  • Some combinations have a serious impact, but not all events lead to a split-graph condition.
• 32 timeframes (47 disrupts) in ’05, 58 (61) in ’06
• 26/47 timeframes in ’05, and 49/61 in ’06, are attributed to a node missing from the LSP database.
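Whether a given combination of failed links leads to a split graph can be checked with a plain connectivity search. The sketch below takes its topology from the 14 link names in the yearly summary table. The function `is_split` and the example failure sets are illustrative, not taken from the talk.

```python
from collections import deque

# Link list taken from the link names in the yearly summary table.
LINKS = [
    ("ATLA", "HSTN"), ("ATLA", "IPLS"), ("ATLA", "WASH"), ("CHIN", "IPLS"),
    ("CHIN", "NYCM"), ("DNVR", "KSCY"), ("DNVR", "SNVA"), ("DNVR", "STTL"),
    ("HSTN", "KSCY"), ("HSTN", "LOSA"), ("IPLS", "KSCY"), ("LOSA", "SNVA"),
    ("NYCM", "WASH"), ("SNVA", "STTL"),
]


def is_split(failed):
    """True if removing the `failed` links disconnects the backbone graph."""
    failed = {frozenset(link) for link in failed}
    nodes = {n for link in LINKS for n in link}
    neighbors = {n: set() for n in nodes}
    for a, b in LINKS:
        if frozenset((a, b)) not in failed:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Breadth-first search from an arbitrary node.
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbors[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen != nodes


# Two failures that leave every node reachable.
print(is_split([("ATLA", "IPLS"), ("CHIN", "IPLS")]))  # False
# Two failures that isolate CHIN.
print(is_split([("CHIN", "IPLS"), ("CHIN", "NYCM")]))  # True
```

With this link list no single link failure splits the graph, and losing all three IPLS links at once isolates IPLS, which corresponds to the missing-node case.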

Two or more link failures (2) - Missing node -

[Figure: Abilene backbone map with nodes Seattle, Denver, Sunnyvale, Los Angeles, Kansas City, Chicago, Indianapolis, Atlanta, Washington, New York City, and Houston.]

Missing IPLS router at:
...........
06/02/19 05:31-05:56
06/02/19 06:30-06:35
06/02/19 15:47-15:51
............

Two or more failures in ’05

[Figure: two histograms for 2005/Jan.-Dec., x-axis log_10(Disrupt time (sec.)) from 0 to 6 with reference marks at 60 sec., 1 hour, and 1 day, y-axis Frequency; monitor duration 365 days.]

• All lost adjacency events (single-failure): total disrupt count 637, availability 0.95435
• Two or more missing (double-failure): total disrupt count 47, availability 0.99976

Two or more failures in ’06

[Figure: two histograms for 2006/Jan.-Dec., x-axis log_10(Disrupt time (sec.)) from 0 to 6 with reference marks at 60 sec., 1 hour, and 1 day, y-axis Frequency; monitor duration 365 days.]

• All lost adjacency events (single-failure): total disrupt count 514, availability 0.98419
• Two or more missing (double-failure): total disrupt count 61, availability 0.99959

Is a single link failure trivial? (1)

• Events with two or more lost adjacencies are rare: more than 99.95% availability, < 5 hours/year of downtime.
• More than 500 single lost adjacencies were found.
  • 637 times in ’05, and 514 in ’06
• 3-4 hours/year of downtime is estimated:
  • Assuming only 22 sec. of downtime for each lost adjacency.
  • Other delays, e.g., routing convergence, degrade it further.
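The two downtime figures on this slide can be reproduced with simple arithmetic. The sketch below uses only numbers from the slides (22 sec. per lost adjacency, 637 and 514 events, 99.95% availability, a 365-day year). The function names are made up for illustration.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 365 * 24  # matches the 365-day monitor duration


def downtime_hours_from_events(events: int, seconds_per_event: float = 22) -> float:
    """Yearly downtime if every lost adjacency costs a fixed detection delay."""
    return events * seconds_per_event / SECONDS_PER_HOUR


def downtime_hours_from_availability(availability: float) -> float:
    """Yearly downtime implied by an availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


print(round(downtime_hours_from_events(637), 2))           # 3.89 (’05)
print(round(downtime_hours_from_events(514), 2))           # 3.14 (’06)
print(round(downtime_hours_from_availability(0.9995), 2))  # 4.38
```

So the single-failure estimate of 3-4 hours/year is of the same order as the under-5-hours/year bound implied by 99.95% availability for double failures.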

Is a single link failure trivial? (2)

• Is 22 sec. of downtime per lost adjacency an overestimate?
  • A router can detect a circuit failure faster when triggered by lower-layer information, e.g., loss of optical signal or framer errors.
  • IGP timer hacks or BFD provide faster failure detection, sub-second or less [AC02].
  • The sub-second bound derives from the propagation delay limit, which cannot be reduced.
  • IP FRR would help more.

Conclusion

• Full-year availability evaluation for ’05-’06 using Abilene IS-IS trace data:
  • > 99.95% backbone network availability from the IGP viewpoint.
  • This is better than the real figure, which is further reduced by routing convergence delay and access links.
  • The Abilene backbone has over-provisioned bandwidth.
• A network that worked fine is not news :-)
• Thanks to Shu Zhang, Randy Bush, and Xing Li

