ESnet5 Deployment Lessons Learned
Joe Metzger, Network Engineer
ESnet Network Engineering Group
TIP
January 16 2013
Lawrence Berkeley National Laboratory U.S. Department of Energy | Office of Science
Outline
ESnet5 Overview
• Transport Network
• Router Network
• Transition Constraints
Deployment Experiences
• Challenges & Risks
• General Issues
• What went well
• What could have gone better
• What still needs to be done
ESnet5 Transport Network
ESnet partnered with Internet2 to build a nationwide Ciena OME 6500 optical system
• ESnet has 50% of the capacity of the shared optical system
  − (Excluding the northern path from Seattle to Chicago through ID, MT, ND, MN & WI)
• Shared spectrum, chassis, configuration, management, etc.
• Over 14,000 miles of shared Internet2 fiber, most of it pre-existing
• Optical system inventory report has 6495 components!
  − 341 nodes, 60+ add/drop/regen
  − 80% are common, the rest are dedicated to ESnet or Internet2 (over 500 are XFPs)
  − Sunnyvale has four 32-slot shelves
  − Sacramento is a 7-direction node
We extended the shared optical system to connect national laboratories
• ~600 miles of ESnet fiber, including building 12 new laterals
• Ring connecting Chicago-Hub & Starlight to ANL & FERMI
• Ring connecting Sacramento & Sunnyvale to LBL, SNLL, LLNL, SLAC, JGI, NERSC
• Spur to ORNL
We also have our existing Infinera system on Long Island between hubs in NYC and BNL.
Services
• Point-to-Point static dedicated optical Circuits
ESnet5 Optical System January 2012
[Figure: map of the ESnet5 optical network, stamped 1/3/2013. Geography is only representational. Legend: 44, 61 and 88 lambdas; 100G+ lit; add/drop node (Ciena); express node (Ciena); LIMAN node (Infinera). Labeled locations include SEAT, SUNN, SACR, LOSA, DENV, KANS, HOUS, CHIC, STAR, ATLA, WASH, NEWY, AOFA and BOST, plus the lab sites ANL, FNAL, ORNL, SNLL, NERSC, LBNL, JGI, SLAC and BNL.]
ESnet5 Optical System January 2012 Lit Waves
[Figure: the same ESnet5 optical network map with lit waves overlaid, stamped 1/10/2013. Legend adds: 100G wave < 1000 km; 100G wave > 1000 km; 10G wave with optical protection; 4x10G mux’d wave. Some spans are marked X2 or X3.]
ESnet5 Routed Network
Routers
• 16 new Alcatel-Lucent (ALU) 7750-SR12
  − 10-slot router with up to 2x100G per slot today
  − 56 100G interfaces & 200+ 10G interfaces
• 35 existing Juniper MXs
  − Used in 10G hubs, commercial exchange points, sites
• 12 existing Juniper M7i & M10i
  − For terminating links slower than GE
• 5 really old Cisco 7206s to be retired
  − Terminating links slower than GE
Services
• Standard routed IP (including full Internet services)
• Point-to-point dynamic virtual circuits using OSCARS
• Various overlay networks (private VPNs, LHCONE VRF)
[Figure: ESnet5 routed network map, January 2012. Geography is only representational. Legend: routed IP 100 Gb/s; routed IP 4 x 10 Gb/s; 3rd party 10 Gb/s; express/metro 100 Gb/s; express/metro 10G; express multi-path 10G; lab-supplied links; other links; tail circuits; major Office of Science (SC) sites; major non-SC DOE sites; ESnet PoP/hub locations; ESnet optical node locations (only some are shown); ESnet-managed 10G and 100G routers; site-managed routers; commercial peering points; R&E network peering locations. Labeled sites include SNLL, PNNL, LANL, SNLA, JLAB, PPPL, BNL, AMES, LLNL, JGI, LBNL, ANL, GFDL, PU Physics and SDSC.]
ESnet5 Transition Constraints
Deadlines
• ESnet4 backbone waves on the Internet2 DWS (Infinera system) needed to be shut down no later than Nov 30th 2012
• ESnet4 backbone waves on the NLR system needed to be shut down by Dec 31st 2012
• ESnet4 metro waves in the San Francisco Bay Area needed to be shut down by Dec 31st 2012
Contributing Challenges
• Contracting & procurement always take longer than planned
• Equipment delivery delays
Challenges & Risks – 10x10 MSA Optics
• There are around 100 pluggable 100G optics (CFPs) in ESnet5 now
• Router & transport vendors in mid-2011:
  − LR4s were the only available & supported optic
  − LR4s were as high as $375K list each, with discounted prices greater than $50K
• Santur put out a press release saying their 10x10 MSA CFPs were available for under $5K each in large quantities in July 2011
  − So, costs for 100 CFPs were bounded between $0.5M and $5M
• We made a decision to go with Santur 10x10 MSA CFPs
  − Worked with ALU and convinced them to support and resell them
  − Purchased 18 from Santur directly for ANI Phase 1 while working with Ciena to get them into their testing & support process
• We had several millions of dollars of risk if this didn’t work out
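The cost bounds quoted above follow from simple multiplication. As a rough sketch in Python (the constant and function names are illustrative, and both unit prices are only the approximate figures from the slide: "under $5K" for a 10x10 MSA CFP and "greater than $50K" for a discounted LR4):

```python
# Approximate figures quoted on the slide; names are illustrative only.
CFP_COUNT = 100            # "around 100" pluggable 100G optics in ESnet5
PRICE_10X10_USD = 5_000    # Santur 10x10 MSA CFP, "under $5K each"
PRICE_LR4_USD = 50_000     # discounted LR4, "greater than $50K"


def total_cost(count: int, unit_price_usd: int) -> int:
    """Total spend for `count` optics at a given unit price."""
    return count * unit_price_usd


low = total_cost(CFP_COUNT, PRICE_10X10_USD)
high = total_cost(CFP_COUNT, PRICE_LR4_USD)

print(f"lower bound: ${low / 1e6:.1f}M")
print(f"upper bound: ${high / 1e6:.1f}M")
print(f"difference:  ${(high - low) / 1e6:.1f}M")
```

The difference between the two bounds is about $4.5M, which is consistent with the "several millions of dollars of risk" noted above.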
10x10 MSA Optics: The results
No interoperability problems or failures yet!
• Currently resold and supported by ALU, Ciena, Brocade & others
• Work in Juniper & Cisco gear, but not certified or resold by them
• No unexpected interoperability problems encountered*
• Typically >50% cheaper than LR4s
Future Outlook
• We have deployed a small number of LR4s where required for interoperability with Cisco gear in Cisco-supported configurations
• Santur was bought by NeoPhotonics
• NeoPhotonics only sells to OEMs
• The limited number (1?) of manufacturers of the 10x10 MSA CFP leads to supply-chain risks
• We are experiencing delivery delays
Challenges & Risks – ALU 7750s
The ALU 7750 is a new entry to the R&E market sector
• Supports all the right protocols & has an attractive feature set
• Designed as a broadband service delivery platform
• We have been able to make it do what we need so far
• We have barely touched the unique features that it offers
We did run into one serious issue
• The box has many config knobs, some of which enable behavior specified in Internet-Drafts that have not been accepted by the global community
• Don’t twist any unless you fully understand the global implications of what they do
• A future ALU OS release will have one fewer knob
• Sorry to those folks who were impacted…
Being on the Leading/Bleeding Edge
This is a fun place to be, but it does add a bit of stress
Having good relationships with your suppliers is critical!
• We have had excellent support from Ciena and ALU in dealing with challenges, problems & delays
Must be flexible because plans will change
• Moved all of our 1st set of Ciena 100G transponders to shorter-reach spans
• Replaced all of our 1st generation of 100G router interfaces
• Will be replacing all of our 3rd-party 100G CFPs in our Ciena gear
What went well?
• Partner relationship with Internet2 & sharing the common infrastructure
• Consolidation of ESnet4 IP & SDN routers to make space for the ESnet5 routers
• Router & transponder installations
• The first ~13,000 miles of ‘common’ fiber & optical system installs
• Transitioning the new routers & circuits into production
• Staff changes (retirement of some senior people) were not as disruptive as expected
• Acceptance testing
Acceptance Testing
More than 1 petabyte transferred with no loss (or more than 100 PB if you count every packet in and every packet out on every interface)
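For a sense of scale, the sketch below estimates how long a single 100G link would need to run at full line rate to carry 1 PB. It assumes a decimal petabyte (10^15 bytes) and ignores protocol overhead; neither assumption comes from the slides, which do not describe the test procedure, and the names are illustrative.

```python
# Assumptions (not from the slides): decimal petabyte, one 100 Gb/s link
# running at full line rate, no protocol overhead.
PETABYTE_BYTES = 10**15
LINK_RATE_BPS = 100 * 10**9


def transfer_seconds(num_bytes: int, rate_bps: int) -> float:
    """Seconds needed to move num_bytes over a link of rate_bps bits/s."""
    return num_bytes * 8 / rate_bps


seconds = transfer_seconds(PETABYTE_BYTES, LINK_RATE_BPS)
print(f"{seconds:.0f} s = {seconds / 3600:.1f} hours")
```

Under these assumptions, 1 PB corresponds to roughly 22 hours of sustained traffic on one 100G interface.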
What could have gone better - Shipping
Never put more than $250K of equipment in a single shipping crate. Even if it is a very heavy-duty crate with a strong steel shelf in the middle! The crate will be dropped. The shelf will bend, and the cards will be damaged!
[Photos: the shipping crate and the contents of box 6]
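One way to apply the $250K rule when planning a shipment is to split the packing list so that no crate exceeds the cap. The following is a minimal first-fit-decreasing sketch, not a description of how ESnet actually packs equipment; the function name, item names and item values are all hypothetical.

```python
CRATE_VALUE_CAP_USD = 250_000  # the per-crate limit suggested on the slide


def pack_crates(items, cap=CRATE_VALUE_CAP_USD):
    """Group (name, value_usd) items into crates using first-fit decreasing.

    Each item goes into the first crate whose total stays at or below the
    cap. A single item worth more than the cap cannot be split, so it ends
    up alone in its own crate.
    """
    crates = []
    for name, value in sorted(items, key=lambda it: it[1], reverse=True):
        for crate in crates:
            if sum(v for _, v in crate) + value <= cap:
                crate.append((name, value))
                break
        else:
            # No existing crate has room: start a new one.
            crates.append([(name, value)])
    return crates


# Hypothetical shipment, values in USD.
shipment = [
    ("card-a", 120_000),
    ("card-b", 90_000),
    ("card-c", 150_000),
    ("card-d", 60_000),
    ("card-e", 30_000),
]

crates = pack_crates(shipment)
for number, crate in enumerate(crates, start=1):
    total = sum(value for _, value in crate)
    print(f"crate {number}: ${total:,} {[name for name, _ in crate]}")
```

With this made-up shipment the items fit into two crates worth $240,000 and $210,000, so no single drop puts more than the cap at risk.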
Discovered some serious PDU problems
Some of our AC dual-supply PDUs had a power switching component that could silently fail closed, allowing them to leak power between input feeds and creating a serious shock hazard!
DC units had a problem with the DC power lug bolts & locking nuts, leading to power balancing issues and the potential to short.
We worked with the vendor and they addressed both of these problems.
We swapped out a bunch of PDUs.
Lateral Build Challenges - JGI
Locate services found and clearly marked 1 of the water lines. We found the other one.
Albuquerque
Goal:
• The ESnet4 hub was in 104 Gold
• The ESnet5 hub is on the first floor of 505 Marquette
• We needed to install 2 new tail circuits, or swing 3 existing circuits from our ESnet4 hub to our ESnet5 hub, before the backbone circuits to the ESnet4 hub terminated on November 30th
Challenges:
• Some providers only provide services at the building Minimum Point Of Entry, regardless of what the order might say
• Others don’t provide any services there; they pull their fiber into a proper suite/facility
• Some vendors take a long time to turn up new services
• The more parties involved, the more complex things get!
Albuquerque Solution
Big thanks to Ed May and Gary Bauerschmidt at UNM/ABQG!
Level3 & CenturyLink also put in a lot of hard work to make this happen.
Together we made it work with more than 48 hours to spare before the deadline!
[Figure: Albuquerque cut-over diagram. It shows the ESnet4 hub at 104 Gold and the ESnet5 hub at 505 Marquette (Level3, 1st floor), connected through Level3, CenturyLink and ESnet FDPs, the ABQG room in the basement, the CenturyLink main hub, and the RAMP room at the MPOE, which is marked as the critical room where both providers are colocated. Devices shown: ELPA-ANI, ALBU-SDN1, DENV-CR5, DENV-CR2, ALBQ-ASW1, the LANL router, ALBU-CR1, SNLA-RT3, ALBQ-CR5 and ELPA-CR1 (MX480 and MX960 models are labeled), with 1G and 10G tail circuits, 10G ESnet4 backbone links, and 100G links to other ESnet5 nodes.]
What’s Left to do?
• 100G production connections to the labs:
  − ANL, BNL, FNAL, LBL, LLNL, ORNL & NERSC
• 100G production connections to peers:
  − MANLAN, Starlight, PACWAVE, WIX, Internet2
• 40G into Equinix Ashburn & re-arranging our Washington DC ring to provide diverse backbone connections for JLAB & other sites in the area
• Lots of cleanup & consolidation at the hubs, moving connections from the MXs to the ALUs
• Normalize our 100G Testbed infrastructure
• Swap out our unsupported ‘third party’ 10x10 MSA CFPs in our Ciena interfaces with Ciena-supported ones covered by a 4-hour on-site maintenance contract
• Additional diversity for ANL & FNAL
Summary
• Next Time
  − Plan more time & resources for communicating early and often
  − Plan more time & resources for handling logistics
• It is harder and takes more time than expected
  − Include the logistics headaches in the cost-benefit analysis of dealing with multiple types of 10G optics (XFP & SFP+ in both 1310 & 1550 nm)
We have a great team, and everybody pulled together to work on the challenges as they came up!