David Foster, CERN
LHCC Comprehensive Review, November 2007
LHCOPN Networking Status
David Foster
Head, Network and Communications Systems Group
CERN IT-CS
LHCC Comprehensive Review, November 2007
Information
• All technical content is on the LHCOPN Twiki: http://lhcopn.cern.ch
• Coordination Process
  – LHCOPN Meetings (every 3 months)
• Active Working Groups
  – Routing
  – Monitoring
  – Operations
  – Active Interfaces to External Networking Activities
    • European Network Policy Groups
    • US Research Networking
    • Grid Deployment Board
    • LCG Management Board
    • EGEE
Overview
• LHC Wide Area Networking
  – LHCOPN Mission
  – Current Status
  – Production
  – Issues and Risks
• Not Covered
  – CERN General Purpose Networking
  – Accelerator and Experiment Networks
  – Other Communications Systems
Mission
• To assure the T0-T1 transfer capability.
  – Essential for the Grid to distribute data out to the T1s.
  – Capacity must be large enough to deal with most situations, including “Catch up”.
  – The excess capacity can be used for T1-T1 transfers.
    • Lower priority than T0-T1
    • May not be sufficient for all T1-T1 requirements
• Resiliency Objective
  – No single failure should cause a T1 to be isolated.
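The “Catch up” requirement can be illustrated with simple arithmetic: after an outage, the backlog must be cleared using only the headroom between the link capacity and the nominal rate, because new data keeps arriving. The sketch below is a minimal illustration; the function name and the example figures (a 4 Gbit/s nominal rate and a 12 hour outage) are assumptions for illustration and do not come from the presentation.

```python
def catch_up_hours(outage_hours: float, nominal_gbps: float, capacity_gbps: float) -> float:
    """Hours needed after an outage to clear the backlog while new data keeps arriving.

    All three inputs are hypothetical planning figures, not measured values.
    """
    if capacity_gbps <= nominal_gbps:
        raise ValueError("no headroom: the backlog can never be cleared")
    backlog = outage_hours * nominal_gbps      # data not sent during the outage (Gbit/s * h)
    headroom = capacity_gbps - nominal_gbps    # spare rate available for catching up
    return backlog / headroom


# Assumed example: 10G link, 4 Gbit/s nominal T0-T1 rate, 12 hour outage.
print(catch_up_hours(outage_hours=12, nominal_gbps=4, capacity_gbps=10))
```

Under these assumed figures the backlog clears in 8 hours. The closer the nominal rate gets to the link capacity, the longer the catch-up takes, which is why the capacity needs a margin over the nominal rate.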
GÉANT2: Consortium of 34 NRENs
Multi-Wavelength Core (to 40) + 0.6-10G Loops
Dark Fiber Core Among 16 Countries:
Austria, Belgium, Bosnia-Herzegovina, Czech Republic, Denmark, France, Germany, Hungary, Ireland, Italy, Netherlands, Slovakia, Slovenia, Spain, Switzerland, United Kingdom
22 PoPs, ~200 Sites
38k km Leased Services, 12k km Dark Fiber
Supporting Light Paths for LHC, eVLBI, et al.
H. Doebbeling
OPN Status Summary, November 2007
Link     Status           Nominal E2E Capacity      Changes Expected
BNL      OPN Production   10G (Colt)                -
FNAL     OPN Production   10G (Qwest)               -
TRIUMF   OPN Production   5G                        +1G Backup link
ASGC     OPN Production   2x1G (2.5G+10G to AMS)    10G via GN2 + IP peering with GN2, Q4 07
NDGF     OPN Production   10G                       Connected to Nordunet. Connect to NDGF OPN, Q1 08
SARA     OPN Production   10G                       Still using GEANT/IP, Q4 07
RAL      OPN Production   10G                       -
FZK      OPN Production   10G                       -
CNAF     OPN Production   10G                       -
IN2P3    OPN Production   10G                       -
PIC      OPN Production   10G                       -
USLHCNet, November 2007
Link                   Status           Nominal E2E Capacity    Changes Expected
CERN - MANLan          OPN Production   10G (Colt)              -
CERN - Starlight       OPN Production   10G (Qwest)             -
CERN - NetherLight     Backup           10G (GN2)               -
NetherLight - MANLan   Backup           10G (Global Crossing)   -
MANLan - StarLight     Backup           10G (Global Crossing)   -
MANLan - StarLight     Backup           10G (Qwest)             -
MANLan - London        Backup           10G (GC)                New Q1 08
London - CERN          Backup           10G                     New Q1 08
USLHCNet
• A number of links providing alternate routing for primary traffic.
• Relationship with ESNet (and DOE approval) to provide capacity (O(5G)) on the ManLan – AMS link for additional ESNet-GEANT peering
– This helps for US Tier-1 to EU Tier-2 connectivity.
– US Tier-2 to EU Tier-1 will require additional peering I2-GEANT. Discussions are ongoing.
CBF Status Summary, November 2007
Link           Status             Nominal E2E Capacity   Provider   Changes Expected
SARA - NDGF    -                  10G                    -          Q4 '07
SARA - FZK     In Place. Unused   10G                    -          Q4 '07 In Production
FZK - CNAF     In Production      10G                    -          -
FZK - IN2P3    In Production      10G                    -          -
Triumf - BNL   In Place.          1G                     -          Q4 '07 In Production
T0-T1 Lambda routing (schematic)
[Figure: schematic map of the lambda routes from the T0 (CERN, Geneva) to the T1s across Europe and the Atlantic. From Michael Enrico, DANTE.]
T0-T1s:
• CERN-RAL
• CERN-PIC
• CERN-IN2P3
• CERN-CNAF
• CERN-GRIDKA
• CERN-NDGF
• CERN-SARA
• CERN-TRIUMF
• CERN-ASGC
• USLHCNET NY (AC-2)
• USLHCNET NY (VSNL N)
• USLHCNET Chicago (VSNL S)
ASGC: ??? Via SMW-3 or 4 (?)
T1-T1 Lambda routing (schematic)
[Figure: the same schematic map, highlighting the T1-T1 lambda routes. From Michael Enrico, DANTE.]
T1-T1s:
• GRIDKA-CNAF
• GRIDKA-IN2P3
• GRIDKA-SARA
• SARA-NDGF
ASGC: ??? Via SMW-3 or 4 (?)
Some Initial Observations
[Figure: the same schematic map, annotated with shared physical paths. Key: GEANT2, NREN, USLHCNET, Via SURFnet, T1-T1 (CBF). From Michael Enrico, DANTE.]
(Between CERN and BASEL)
• Following lambdas run in same fibre pair:
  – CERN-GRIDKA
  – CERN-NDGF
  – CERN-SARA
  – CERN-SURFnet-TRIUMF/ASGC (x2)
  – USLHCNET NY (AC-2)
• Following lambdas run in same (sub-)duct/trench (all above +):
  – CERN-CNAF
  – USLHCNET NY (VSNL N) [supplier is COLT]
• Following lambda MAY run in same (sub-)duct/trench as all above:
  – USLHCNET Chicago (VSNL S) [awaiting info from Qwest…]
(Between BASEL and Zurich)
• Following lambdas run in same trench:
  – CERN-CNAF
  – GRIDKA-CNAF (T1-T1)
• Following lambda MAY run in same trench as all above:
  – USLHCNET Chicago (VSNL S) [awaiting info from Qwest…]
ASGC: ??? Via SMW-3 or 4 (?)
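The shared fibre and trench observations above amount to a shared-risk analysis: cut one physical segment, remove every lambda that uses it, and check which sites can still reach CERN. The following is a minimal sketch of that check under strong simplifications. It models only some of the European lambdas named on these slides and ignores the transatlantic links, layer-3 backups and any route not listed. The segment labels and the function name are illustrative assumptions.

```python
from collections import defaultdict, deque

# Illustrative labels for the shared physical segments named on the slide.
FIBRE = "CERN-Basel fibre pair"
TRENCH = "CERN-Basel trench"
BZ = "Basel-Zurich trench"

# (site_a, site_b, segments the lambda is stated to share). Partial model only.
LINKS = [
    ("CERN", "GRIDKA", {FIBRE, TRENCH}),
    ("CERN", "NDGF", {FIBRE, TRENCH}),
    ("CERN", "SARA", {FIBRE, TRENCH}),
    ("CERN", "CNAF", {TRENCH, BZ}),
    ("GRIDKA", "CNAF", {BZ}),
    ("GRIDKA", "SARA", set()),  # no shared segment stated on the slide
    ("SARA", "NDGF", set()),    # no shared segment stated on the slide
]


def isolated_sites(links, failed_segment, root="CERN"):
    """Return the sites that cannot reach root once failed_segment is cut."""
    adjacency = defaultdict(set)
    sites = set()
    for a, b, segments in links:
        sites.update((a, b))
        if failed_segment not in segments:  # lambda survives the cut
            adjacency[a].add(b)
            adjacency[b].add(a)
    # Breadth-first search from the root over the surviving lambdas.
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(sites - seen)


for segment in (FIBRE, TRENCH, BZ):
    print(segment, "->", isolated_sites(LINKS, segment))
```

In this toy model, cutting the fibre pair isolates no site because traffic can detour via CNAF, cutting the whole CERN-Basel trench isolates all four T1s, and cutting the Basel-Zurich trench isolates CNAF. A real assessment would need the full set of lambdas and backup paths.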
Result
• SARA-CERN lambda has been rerouted
• 4th diverse USLHCNET lambda will be added
• RAL & PIC still need backups
• CNAF needs a 3rd route into CERN
  – Long route around “eastern ring” OR
  – New CBF solution(s)…
• Further investigations required, in particular concerning:
  – Physical routing of GRIDKA-IN2P3 in Paris area
  – Leased lambdas passing through UK
• Further analysis is on-going
  – May be some layer-1 switching solutions (LCAS) that could help on the GEANT footprint.
    • Can do “LCAS protected 10GE” for ASGC
  – Tests are on-going on the USLHCNet footprint
Link Layer Monitoring
• perfSONAR deployment is very well advanced (but not yet complete). It monitors the “up/down” status of the links.
• Integrated into the “End to End Coordination Unit” (E2ECU) run by DANTE
• Provides simple indications of “hard” faults.
• Insufficient to understand the quality of the connectivity
Initial Active Measurements
• One Way Latency
  – To measure network Reliability & detect Congestion
  – Between
    • Tier0 to Tier1
    • Tier1 to Tier1
• Bandwidth
  – To detect & quantify service degradation
  – Between
    • Tier0 and Tier1
    • Tier1 to Tier1
• ICMP based Latency
  – To measure Reliability & Congestion
  – Between
    • LHCOPN Edge into Tier1 facility
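As an illustration of how one-way latency samples can indicate both reliability and congestion, the sketch below treats lost probes as a reliability signal and a rise of the median delay above the minimum observed delay as a congestion signal (queueing adds delay on top of the fixed propagation time). This is a simplified assumption for illustration, not the method used by the deployed tools. The function name, the 5 ms threshold and the sample values are hypothetical.

```python
from statistics import median


def summarise_one_way_delay(samples_ms, queueing_threshold_ms=5.0):
    """Summarise one-way delay probes; None marks a lost probe.

    The threshold is an assumed tuning value, not taken from the presentation.
    """
    if not samples_ms:
        raise ValueError("no probes were sent")
    received = [s for s in samples_ms if s is not None]
    loss = 1 - len(received) / len(samples_ms)   # reliability indicator
    if not received:
        return {"loss": loss, "baseline_ms": None, "congested": False}
    baseline = min(received)                     # approximates the uncongested path delay
    congested = (median(received) - baseline) > queueing_threshold_ms
    return {"loss": loss, "baseline_ms": baseline, "congested": congested}


# Hypothetical samples: delay jumps from about 10 ms to about 19 ms, one probe lost.
print(summarise_one_way_delay([10.0, 10.2, None, 10.1, 18.5, 19.0, 18.7, 19.2]))
```

One-way measurements of this kind need synchronised clocks at both ends, which is one reason dedicated measurement servers at each site are useful.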
Active Monitoring Deployment
• It is a small number of servers at each Tier-1
• DANTE proposes to deploy this as a “service”, managed and maintained by them.
  – Mainly funded by the GEANT project as part of the “transition to service” activity.
  – Major advantages in terms of measurement quality and consistency.
• Will be presented at the next OB
  – Documents in preparation to cover requirements from the T1s and a “security plan”.
Operational Procedures
• Have to be finalised but need to deal with change and incident management.
  – Many parties involved.
  – Have to agree on the real processes involved (activity being led by Mathieu Goutelle)
• Recent Operations workshop made some progress
  – Try to avoid, wherever possible, too many “coordination units”.
  – All parties agreed we need some centralised information to have a global view of the network and incidents.
  – Further workshop planned to quantify this.
  – We also need to understand existing processes used by T1s.
Resiliency Issues
• The physical fiber path considerations continue
  – Some lambdas have been re-routed. Others still may be.
• Layer3 backup paths for RAL and PIC are still an issue.
  – In the case of RAL, excessive costs seem to be a problem.
  – For PIC, still some hope of a CBF between RedIris and Renater
• Overall the situation is quite good with the CBF links, but can still be improved.
  – Most major “single” failures are protected against.
Bigger Issues
• Will be important to get some agreements from the T1s
  – Active Monitoring
  – Operational Management – in progress
• GEANT-2 will end (March 2009), GEANT-3 is being planned. GN-4 and beyond?
  – Assumption is that GEANT will continue ad infinitum
• What will follow from EGEE-III in terms of network management resources?
  – DANTE may be able to take over most of the responsibility
• Funding for USLHCNet assumed to continue.