
ESnet5 Deployment Lessons Learned

Joe Metzger, Network Engineer

ESnet Network Engineering Group

TIP

January 16, 2013


Outline

ESnet5 Overview
•  Transport Network
•  Router Network
•  Transition Constraints

Deployment Experiences
•  Challenges & Risks
•  General Issues
•  What went well
•  What could have gone better
•  What still needs to be done

ESnet5 Transport Network

ESnet partnered with Internet2 to build a Ciena OME 6500 nation-wide optical system
•  ESnet has 50% of the capacity of the shared optical system
   −  (Excluding the northern path from Seattle to Chicago through ID, MT, ND, MN & WI)
•  Shared spectrum, chassis, configuration, management, etc.

System
•  Over 14,000 miles of shared Internet2 fiber, most of it pre-existing
•  The optical system inventory report has 6,495 components!
   −  341 nodes, 60+ add/drop/regen
   −  80% are common; the rest are dedicated to ESnet or Internet2 (over 500 are XFPs)
   −  Sunnyvale has 4 32-slot shelves
   −  Sacramento is a 7-direction node

We extended the shared optical system to connect national laboratories
•  ~600 miles of ESnet fiber, including building 12 new laterals
•  Ring connecting Chicago-Hub & Starlight to ANL & FERMI
•  Ring connecting Sacramento & Sunnyvale to LBL, SNLL, LLNL, SLAC, JGI, NERSC
•  Spur to ORNL

We also have our existing Infinera system on Long Island between hubs in NYC and BNL.

Services
•  Point-to-point static dedicated optical circuits

ESnet5 Optical System, January 2013

[Map of the ESnet5 optical network (generated 1/3/2013); geography is only representational. Legend: segments carrying 44, 61, or 88 lambdas; 100G+ lit; Add/Drop Node (Ciena); ORNL Express Node (Ciena); LIMAN Node (Infinera).]

ESnet5 Optical System, January 2013: Lit Waves

[Map of lit waves (generated 1/10/2013); geography is only representational. Legend: segments carrying 44, 61, or 88 lambdas; 100G+ lit; 100G wave < 1000 km; 100G wave > 1000 km; 10G wave with optical protection; 4x10G mux'd wave; x2/x3 wave-count multipliers; Add/Drop Node (Ciena); ORNL Express Node (Ciena); LIMAN Node (Infinera).]

ESnet5 Routed Network

Routers
•  16 new Alcatel-Lucent (ALU) 7750 SR-12s
   −  10-slot router with up to 2x100G per slot today
   −  56 100G interfaces & 200+ 10G interfaces
•  35 existing Juniper MXs
   −  Used in 10G hubs, commercial exchange points, sites
•  12 existing Juniper M7i & M10i
   −  For terminating links slower than GE
•  5 really old Cisco 7206s to be retired
   −  Terminating links slower than GE

Services
•  Standard routed IP (including full Internet services)
•  Point-to-point dynamic virtual circuits using OSCARS (illustrated in the sketch below)
•  Various overlay networks (private VPNs, LHCONE VRF)
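For readers unfamiliar with OSCARS: a user or application asks for a point-to-point circuit with a guaranteed bandwidth over a time window, and the path is provisioned and torn down automatically. The sketch below is purely illustrative of that request shape; the endpoint URL, payload fields, and reserve_circuit helper are hypothetical stand-ins, not the actual OSCARS API.

```python
# Purely illustrative: the endpoint, payload fields, and port names
# below are HYPOTHETICAL stand-ins, not the real OSCARS interface.
import json
import urllib.request

def reserve_circuit(src, dst, mbps, start, end,
                    api="https://oscars.example.net/reserve"):  # hypothetical
    """Ask the reservation service for a point-to-point virtual circuit."""
    payload = {
        "source": src,            # edge port at the source site
        "destination": dst,       # edge port at the destination site
        "bandwidth_mbps": mbps,   # guaranteed rate for the reservation
        "start_time": start,      # ISO 8601 window during which the
        "end_time": end,          #   circuit exists, then auto-teardown
    }
    req = urllib.request.Request(
        api, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)    # reservation id / status

# Example: a 10 Gb/s overnight circuit for a bulk data transfer.
# reserve_circuit("anl-cr5:port1", "lbl-cr5:port2", 10_000,
#                 "2013-01-16T20:00:00Z", "2013-01-17T06:00:00Z")
```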


ESnet5 Routed Network Map, January 2013

[Network map; geography is only representational. Legend: Routed IP 100 Gb/s; Routed IP 4x10 Gb/s; 3rd-party 10 Gb/s; Express/metro 100 Gb/s; Express/metro 10G; 10G express multi-path; lab-supplied links; other links; tail circuits; commercial peering points; R&E network peering locations; major Office of Science (SC) sites (e.g. LBNL) and major non-SC DOE sites (e.g. LLNL); ESnet PoP/hub locations (e.g. SUNN); ESnet optical transport nodes and optical node locations (only some are shown); ESnet-managed 10G and 100G routers; site-managed routers.]

ESnet5 Transition Constraints

Deadlines
•  ESnet4 backbone waves on the Internet2 DWS (Infinera system) needed to be shut down no later than Nov 30th, 2012
•  ESnet4 backbone waves on the NLR system needed to be shut down by Dec 31st, 2012
•  ESnet4 metro waves in the San Francisco Bay Area needed to be shut down by Dec 31st, 2012

Contributing Challenges
•  Contracting & procurement always take longer than planned
•  Equipment delivery delays

Challenges & Risks – 10x10 MSA Optics

•  There are around 100 pluggable 100G optics (CFPs) in ESnet5 now
•  Router & transport vendors in mid-2011:
   −  LR4s were the only available & supported optic
   −  LR4s were as high as $375K list each, with discounted prices greater than $50K
•  Santur put out a press release in July 2011 saying their 10x10 MSA CFPs were available for under $5K each in large quantities
   −  So costs for 100 CFPs were bounded between $0.5M (100 x <$5K) and $5M (100 x >$50K)
•  We made a decision to go with Santur 10x10 MSA CFPs
   −  Worked with ALU and convinced them to support and resell them
   −  Purchased 18 from Santur directly for ANI Phase 1 while working with Ciena to get them into their testing & support process
•  We had several millions of dollars of risk if this didn't work out

10x10 MSA Optics: The Results

No interoperability problems or failures yet!
•  Currently resold and supported by ALU, Ciena, Brocade & others
•  Work in Juniper & Cisco gear, but not certified or resold by them
•  No unexpected interoperability problems encountered*
•  Typically >50% cheaper than LR4s

Future Outlook
•  We have deployed a small number of LR4s where required for interoperability with Cisco gear in Cisco-supported configurations
•  Santur was bought by NeoPhotonics
•  NeoPhotonics only sells to OEMs
•  A limited number (1?) of manufacturers of the 10x10 MSA CFP leads to supply-chain risks
•  We are experiencing delivery delays

Challenges & Risks – ALU 7750s

The ALU 7750 is a new entry to the R&E market sector
•  Supports all the right protocols & has an attractive feature set
•  Designed as a broadband service delivery platform
•  We have been able to make it do what we need so far
•  We have barely touched the unique features that it offers

Did run into one serious issue
•  The box has many config knobs, some of which enable behavior specified in Internet drafts that have not been accepted by the global community
•  Don't twist any unless you fully understand the global implications of what they do (see the audit sketch below)
•  A future ALU OS release will have 1 less knob
•  Sorry to those folks who were impacted…
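One defensive practice this lesson suggests (our illustrative sketch, not something presented in the talk): keep a reviewed baseline config per router and diff every running config against it, so any knob that drifts from the vetted baseline gets flagged before it can leak draft-only behavior to the wider Internet. A minimal sketch, assuming configs are exported as plain-text files; the filenames are hypothetical.

```python
# Minimal config-audit sketch: flag any lines where a router's running
# config differs from a reviewed baseline. Assumes configs are saved as
# plain-text files; the filenames below are hypothetical examples.
import difflib

def audit_config(baseline_path, running_path):
    with open(baseline_path) as f:
        baseline = f.readlines()
    with open(running_path) as f:
        running = f.readlines()
    # Unified diff: '+' lines are knobs present only in the running
    # config; '-' lines are baseline settings that were removed/changed.
    return list(difflib.unified_diff(baseline, running,
                                     fromfile=baseline_path,
                                     tofile=running_path))

if __name__ == "__main__":
    for line in audit_config("cr5-baseline.conf", "cr5-running.conf"):
        print(line, end="")
```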


Being on the Leading/Bleeding Edge

This is a fun place to be, but it does add a bit of stress.

Having good relationships with your suppliers is critical!
•  We have had excellent support from Ciena and ALU in dealing with challenges, problems & delays

Must be flexible, because plans will change
•  Moved all of our 1st set of Ciena 100G transponders to shorter-reach spans
•  Replaced all of our 1st generation of 100G router interfaces
•  Will be replacing all of our 3rd-party 100G CFPs in our Ciena systems

What went well?

•  Partner relationship with Internet2 & sharing the common infrastructure
•  Consolidation of ESnet4 IP & SDN routers to make space for the ESnet5 routers
•  Router & transponder installations
•  The first ~13,000 miles of 'common' fiber & optical system installs
•  Transitioning the new routers & circuits into production
•  Staff changes (retirement of some senior people) were not as disruptive as expected
•  Acceptance testing

Acceptance Testing

>1 petabyte, no loss (or more than 100 PB if you count every packet in and every packet out on every interface).
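The "every packet in and every packet out on every interface" accounting amounts to snapshotting interface counters before and after each test run and verifying the deltas show zero errors or drops. A minimal sketch of that bookkeeping, assuming per-interface counters have already been collected (e.g., via SNMP) into dictionaries; the interface name and field names are illustrative.

```python
# Sketch of the zero-loss bookkeeping: given per-interface packet
# counters sampled before and after a test run, verify every interface
# saw zero drops/errors and report the packet deltas. Counter collection
# itself (e.g., over SNMP) is assumed to have happened elsewhere.

def check_loss(before, after):
    """before/after: {iface: {'in_pkts', 'out_pkts', 'in_errs', 'drops'}}"""
    clean = True
    for iface in sorted(after):
        d_in = after[iface]["in_pkts"] - before[iface]["in_pkts"]
        d_out = after[iface]["out_pkts"] - before[iface]["out_pkts"]
        d_err = after[iface]["in_errs"] - before[iface]["in_errs"]
        d_drop = after[iface]["drops"] - before[iface]["drops"]
        if d_err or d_drop:
            clean = False
        print(f"{iface}: {d_in} in / {d_out} out, "
              f"{d_err} errors, {d_drop} drops")
    return clean

# Hypothetical example: one interface, 1e9 packets, no loss.
before = {"100ge-1/1/1": {"in_pkts": 0, "out_pkts": 0,
                          "in_errs": 0, "drops": 0}}
after = {"100ge-1/1/1": {"in_pkts": 10**9, "out_pkts": 10**9,
                         "in_errs": 0, "drops": 0}}
assert check_loss(before, after)
```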


What could have gone better - Shipping

Never put more than $250K of equipment in a single shipping crate. Even if it is a very heavy-duty crate with a strong steel shelf in the middle! The crate will be dropped. The shelf will bend, and the cards will be damaged!

[Photos: the dropped shipping crate and the damaged cards.]

Discovered some serious PDU problems

Some of our AC dual-supply PDUs had a power-switching component that could silently fail closed, allowing them to leak power between input feeds and creating a serious shock hazard!

The DC units had a problem with the DC power lug bolts & locking nuts, leading to power-balancing issues and the potential to short.

We worked with the vendor, and they addressed both of these problems.

We swapped out a bunch of PDUs.

Lateral Build Challenges – JGI

Locate services found and clearly marked 1 of the water lines. We found the other one.

Albuquerque

Goal:
•  The ESnet4 hub was in 104 Gold
•  The ESnet5 hub is on the first floor of 505 Marquette
•  We needed to install 2 new tail circuits, or swing 3 existing circuits from our ESnet4 hub to our ESnet5 hub, before the backbone circuits to the ESnet4 hub terminated on November 30th

Challenges:
•  Some providers only provide services at the building Minimum Point Of Entry, regardless of what the order might say
•  Others don't provide any services there; they pull their fiber into a proper suite/facility
•  Some vendors take a long time to turn up new services
•  The more parties involved, the more complex things get!

Albuquerque Solution

Big thanks to Ed May and Gary Bauerschmidt at UNM/ABQG! Level3 & CenturyLink also put in a lot of hard work to make this happen. Together we made it work with more than 48 hours to spare before the deadline!

[Diagram of the Albuquerque solution: ESnet4 routers (ALBU-CR1, ALBU-SDN1; MX480/MX960) at 104 Gold and the ESnet5 router (ALBQ-CR5) plus switch (ALBQ-ASW1) on the Level3 1st floor of 505 Marquette, with 10G ESnet4 and 100G ESnet5 backbone circuits toward DENV-CR2/CR5, ELPA-ANI, ELPA-CR1 and SNLA-RT3, 1G/10G tail circuits to the LANL router, and fiber paths through the Level3, CenturyLink & ESnet fiber distribution panels in the ABQG basement room, the CenturyLink main hub, and the RAMP room MPOE. The critical room is where both providers are colocated!]

What's Left to Do?

•  100G production connections to the labs:
   −  ANL, BNL, FNAL, LBL, LLNL, ORNL & NERSC
•  100G production connections to peers:
   −  MANLAN, Starlight, PACWAVE, WIX, Internet2
•  40G into Equinix Ashburn & re-arranging our Washington DC ring to provide diverse backbone connections for JLAB & other sites in the area
•  Lots of cleanup & consolidation at the hubs, moving connections from the MXs to the ALUs
•  Normalize our 100G testbed infrastructure
•  Swap out the unsupported 'third-party' 10x10 MSA CFPs in our Ciena interfaces with Ciena-supported ones covered by a 4-hour on-site maintenance contract
•  Additional diversity for ANL & FNAL

Summary

•  Next time:
   −  Plan more time & resources for communicating early and often
   −  Plan more time & resources for handling logistics; it is harder and takes more time than expected
   −  Include the logistics headaches in the cost-benefit analysis of dealing with multiple types of 10G optics (XFP & SFP+, in both 1310nm & 1550nm)

We have a great team, and everybody pulled together to work on the challenges as they came up!

Questions?

Thanks!

Joe Metzger – [email protected]

http://www.es.net/

http://fasterdata.es.net/

