EGEE-III INFSO-RI-222667
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Xavier Jeannin, Activity Manager, CNRS
EGEE-III First Review, 24-25 June, 2009
SA2: Networking Support Status Report
EGEE-III INFSO-RI-222667 Networking Support – Xavier Jeannin - EGEE-III First Review 24-25 June 2009 2
SA2 Overview
Country              Total PM planned at M24   Total FTE
France                                    96         4.0
Germany                                   12         0.5
Greece                                    18         0.8
Italy                                     12         0.5
Russia                                     6         0.3
Spain                                      6         0.3
GEANT2-DANTE (UK)                          3         0.1
Total                                    153         6.4
SA2 provides an interface with the network
• Operational interface that ensures the daily relations with the network infrastructures: ENOC, advanced network services tasks
• Relational interface that ensures the “higher level” of interactions with the network providers
SA2 – EGEE-III
SA2 Global view
TSA2.2 Support for the ENOC
IPv6 (GARR, CNRS)
Operational procedures (CNRS)
WLCG Support (CNRS)
Operational tools and maintenance (RRC-KI, CNRS)
TSA2.3 Overall Networking coordination
TSA2.1 Running the ENOC
TT exchange standardization (GRNET)
Advanced network services (GRNET)
TNLC
IPv6 (GARR, CNRS)
Monitoring (DFN)
Site networking needs (RedIRIS)
Troubleshooting (DFN)
TSA2.4 Management and general project tasks
EGEE Network Operation Centre CNRS
[Figure: the ENOC sits between GGUS (users, support units, sites) and the EGEE network (NRENs, GÉANT2)]
ENOC ensuring E2E connectivity for Grid sites on the whole path
[Figure: path between Grid site 1 and Grid site 2 across RC 1, NREN A, GÉANT2, NREN B and RC 2; each segment is operated by its own NOC (NOC of RC1, NOC of NREN A, DANTE for GÉANT2, NOC of NREN B, NOC of RC2)]
A single point of contact between EGEE and the NREN
Role of the ENOC
Network connectivity assessment
• Assessment for year 2008 on EGEE certified Grid sites (~300)
85% of sites < 4 days of downtime/year = 98.90% reachability/year
46 sites
55% of sites have less than a day of yearly unscheduled network downtime
80% of off-site network troubles are solved within 30 minutes
Tool: DownCollector https://ccenoc.in2p3.fr/DownCollector
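The conversion from yearly downtime to reachability quoted on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and it assumes a 365-day year, which is the assumption that reproduces the 98.90% figure.

```python
DAYS_PER_YEAR = 365  # assumption: a 365-day year reproduces the slide's 98.90%


def reachability_percent(downtime_days: float, days_in_year: int = DAYS_PER_YEAR) -> float:
    """Return the share of the year, in percent, during which a site is reachable."""
    if not 0 <= downtime_days <= days_in_year:
        raise ValueError("downtime must lie between 0 and the length of the year")
    return 100.0 * (1.0 - downtime_days / days_in_year)


if __name__ == "__main__":
    # 4 days of downtime per year
    print(f"{reachability_percent(4):.2f}%")  # prints 98.90%
    # 1 day of downtime per year
    print(f"{reachability_percent(1):.2f}%")  # prints 99.73%
```

Under the same assumption, the "less than a day" bound on the slide corresponds to a reachability above roughly 99.73%.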
ENOC metrics
• Issue: very few Grid user notifications about network problems
• 19 NRENs
• 11 different languages
• 75% of European certified sites covered
ENOC webserver statistics
WLCG Support CNRS
• SA2 has also taken the lead in designing and implementing a pioneering federated operational model for the LHCOPN
– https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel
WLCG Support
• Processes were documented and disseminated
– Several meetings and training sessions helped the dissemination
• Related tools were released, including a GGUS helpdesk tailored for the LHCOPN
• Implementation is ongoing and will be ready for LHC start-up
Operational tools and maintenance RRC-KI
• Trouble matching and correlation for the ENOC
– Correlate tickets with monitoring data
– Better assessment of the impact on the grid of trouble tickets
[Figure: architecture of the trouble matching system. Inputs: problem notifications & tickets from the GEANT/NRENs community and the ENOC helpdesk services, and active node probes. Components: ticket preprocessing, active monitoring & alert processing, topology database (NOD), history correlation matrix, statistical matching engine (event correlator), metaticket store, node state store (node state reports).]
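To make "correlate tickets with monitoring data" concrete, here is a minimal sketch that pairs a ticket with the monitoring alerts raised on the same node within the ticket's time window. It is a plain time-window match, not the statistical matching engine of the figure, and every name in it (`Ticket`, `Alert`, `match_alerts`, the 10-minute tolerance) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Ticket:
    """A trouble ticket reduced to the fields needed for matching (hypothetical)."""
    ticket_id: str
    node: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Alert:
    """A monitoring alert for one node (hypothetical)."""
    node: str
    time: datetime
    status: str


def match_alerts(ticket: Ticket, alerts: List[Alert],
                 tolerance: timedelta = timedelta(minutes=10)) -> List[Alert]:
    """Return the alerts on the ticket's node that fall inside the ticket's
    time window, widened by `tolerance` on both sides."""
    earliest = ticket.start - tolerance
    latest = ticket.end + tolerance
    return [a for a in alerts
            if a.node == ticket.node and earliest <= a.time <= latest]
```

A ticket with many matching alerts on Grid nodes would be a natural candidate for a higher impact ranking, which is the kind of assessment the bullets above aim at.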
Operational tools and maintenance
• First stage of our study
• The results are experimental and should improve
[Figure panels: (1) Alert notifications, (2) Resulting node status (Moderate, Bad, Moderate, Fine), (3) The matcher output]
Dec 7 16:34:06 loss-moderate (rising)
Dec 7 16:39:07 loss-bad (rising)
Dec 7 18:04:07 loss-moderate (resuming)
Dec 7 18:09:08 loss-fine (resuming)
• Future work plan includes:
• production version setting up
• automatic ticket ranking based on matching results
• tuning of matching algorithm, possibly through more extensive use of the topology knowledge
[Figure: counter value over time, with NORMAL, WARNING and CRITICAL zones separated by a normal/warning hold zone and a warning/critical hold zone; marked transitions: rising to warning, rising to critical, resuming to normal]
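The zones and hold zones in this figure read like threshold classification with hysteresis: a counter only changes level once it has fully crossed a hold zone, so values hovering near a threshold do not generate a stream of alerts. The sketch below illustrates that idea under this assumption; the class name and the numeric bounds used with it are hypothetical, and it is not taken from the ENOC tool.

```python
from enum import IntEnum
from typing import List, Tuple


class Level(IntEnum):
    NORMAL = 0
    WARNING = 1
    CRITICAL = 2


class HysteresisClassifier:
    """Classify a counter into NORMAL/WARNING/CRITICAL using two hold zones.

    Each hold zone is a (low, high) pair. The level rises only when the value
    exceeds `high`, and resumes only when the value drops below `low`.
    """

    def __init__(self, warning_hold: Tuple[float, float],
                 critical_hold: Tuple[float, float]):
        if not (warning_hold[0] <= warning_hold[1]
                <= critical_hold[0] <= critical_hold[1]):
            raise ValueError("hold zones must be ordered and non-overlapping")
        self._hold = [warning_hold, critical_hold]
        self.level = Level.NORMAL

    def update(self, value: float) -> List[str]:
        """Feed one sample and return the transition events it caused."""
        events = []
        # Rise while the value is above the hold zone just over the current level.
        while self.level < Level.CRITICAL and value > self._hold[self.level][1]:
            self.level = Level(self.level + 1)
            events.append(f"rising to {self.level.name.lower()}")
        # Resume while the value is below the hold zone just under the current level.
        while self.level > Level.NORMAL and value < self._hold[self.level - 1][0]:
            self.level = Level(self.level - 1)
            events.append(f"resuming to {self.level.name.lower()}")
        return events
```

With hold zones (1.0, 2.0) and (5.0, 6.0), a sample of 1.5 leaves a NORMAL node unchanged, 2.5 raises it to WARNING, and a later 1.5 still keeps it at WARNING until the value falls below 1.0.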
Network monitoring tools DFN
• Network monitoring tools for efficient troubleshooting
– PerfSONAR-Lite TroubleShooting System
– Launch test on demand from a Grid site under central server control: ping, traceroute, DNS lookup, nmap and bandwidth measurements
– Based on PerfSONAR-PS (Perl version)
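As an illustration of launching tests on demand under central control, the sketch below maps a small whitelist of test names to command lines and rejects anything else. It is not the PerfSONAR-Lite TSS implementation (which is Perl based); the function names, the whitelist and the target check are assumptions, and bandwidth measurement is left out because it needs a cooperating remote endpoint.

```python
import re
import subprocess
from typing import List

# Hypothetical whitelist: test name -> command prefix (standard Unix tools).
ALLOWED_TESTS = {
    "ping": ["ping", "-c", "4"],
    "traceroute": ["traceroute"],
    "dns": ["host"],
    "nmap": ["nmap"],
}

# Host names or IP literals only; a leading "-" is refused so that a target
# can never be interpreted as a command-line option.
_TARGET_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.:-]*")


def build_command(test: str, target: str) -> List[str]:
    """Return the argument vector for a whitelisted test against `target`."""
    if test not in ALLOWED_TESTS:
        raise ValueError(f"test {test!r} is not allowed")
    if not _TARGET_RE.fullmatch(target):
        raise ValueError(f"target {target!r} is not a valid host name or address")
    return ALLOWED_TESTS[test] + [target]


def run_test(test: str, target: str, timeout_s: int = 60) -> str:
    """Run the test locally and return its standard output."""
    result = subprocess.run(build_command(test, target), capture_output=True,
                            text=True, timeout=timeout_s)
    return result.stdout
```

Passing an argument list rather than a shell string means no shell is involved, which matters when test requests arrive from a remote server.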
Network monitoring tools
• First beta-release is foreseen for June
– Beta-testers: CNRS, NORDUnet, GARR
– First version Autumn 2009
Sites networking needs RedIRIS
• Assess network requirements (bandwidth, delay, jitter, etc.) for a site within the Grid, according to the kind of site and VOs supported
• Empirical approach
• Deployment of perfSONAR at country scale
– RedIRIS contributes considerably more effort to this task than EGEE funds
– First deployment in Europe of such a solution over several domains (4 domains, 9 sites); no appliance box is used
– PerfSONAR is deployed in the EGEE sites and in the networks used
• Issue: interoperability between perfSONAR versions
• First deployment end of September
Sites networking needs
Topology of the network monitored by this task
Technical Network Liaison Committee
• TNLC (Technical Network Liaison Committee):
– Set up during EGEE in order to ease the technical discussions between EGEE, the NRENs and the GÉANT2 project
– Participants: EGEE SA2, GÉANT2 (represented by DANTE as coordinator of GÉANT2), some of the NRENs involved in the EGEE activities, the NREN PC and CERN
– 2 meetings
• Work mainly focused on:
· Improvement of trouble ticket contents
· Improve the assessment of the impact of problems on the Grid
· Monitoring
· Design a solution for the Grid infrastructure
Advanced network services GRNET
• Collaboration with the AMPS (Advanced Multi-domain Provisioning System) team in order to automate SLA establishment
• Development of a web interface to manage the EGEE SLA requests
– Store and manage the EGEE users’ SLA requests
• ENOC will act on behalf of the user
• The user request is stored in the ENOC
• The ENOC validates it and will then use the AMPS system to make the reservation
• AutoBAHN has also been studied but does not seem mature at the moment
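The request flow described on this slide (store, validate, then reserve on the user's behalf) can be sketched as a small state machine. Everything here is hypothetical: the class and field names are invented for illustration, and the call to AMPS is represented by an injected function because the AMPS interface is not described in this presentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class State(Enum):
    STORED = "stored"
    VALIDATED = "validated"
    REJECTED = "rejected"
    RESERVED = "reserved"


@dataclass
class SlaRequest:
    """One user SLA request as kept by the ENOC (hypothetical fields)."""
    request_id: int
    user: str
    src_site: str
    dst_site: str
    bandwidth_mbps: int
    state: State = State.STORED
    reservation_ref: str = ""


class SlaRegistry:
    """Stores requests, records the ENOC decision, then triggers the reservation."""

    def __init__(self, reserve_fn: Callable[[SlaRequest], str]):
        # reserve_fn stands in for the provisioning system; it returns a reference.
        self._reserve_fn = reserve_fn
        self._requests: Dict[int, SlaRequest] = {}

    def store(self, user: str, src_site: str, dst_site: str, bandwidth_mbps: int) -> int:
        request_id = len(self._requests) + 1
        self._requests[request_id] = SlaRequest(
            request_id, user, src_site, dst_site, bandwidth_mbps)
        return request_id

    def get(self, request_id: int) -> SlaRequest:
        return self._requests[request_id]

    def validate(self, request_id: int, approved: bool) -> None:
        request = self._requests[request_id]
        if request.state is not State.STORED:
            raise RuntimeError("only stored requests can be validated")
        request.state = State.VALIDATED if approved else State.REJECTED

    def reserve(self, request_id: int) -> str:
        request = self._requests[request_id]
        if request.state is not State.VALIDATED:
            raise RuntimeError("only validated requests can be reserved")
        request.reservation_ref = self._reserve_fn(request)
        request.state = State.RESERVED
        return request.reservation_ref
```

Keeping the reservation call behind a function parameter lets the same registry be exercised with a stub, and later wired to whichever provisioning system is chosen.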
Trouble ticket exchange standardization GRNET
• GRNET and the ENOC team provide the ENOC with a central server that translates the NRENs’ tickets into a standard ticket format
– Designed and implemented with open source software
• Ticket normalization is very important to improve the efficiency of project-wide network operations
• Dissemination was also made through the submission of an RFC about the normalization of trouble tickets (“The Network Trouble Ticket Data Model”, Internet Draft) http://tools.ietf.org/html/draft-dzis-nwg-nttdm-00
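A minimal sketch of what translating NREN tickets into a standard ticket can look like: one field map per source plus a shared status vocabulary. The source names, field names and status values below are hypothetical and are not taken from the Internet Draft; they only illustrate the mapping idea, including the fact that tickets arrive in several languages.

```python
from typing import Dict

# Hypothetical per-source field maps: local field name -> standard field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "nren_a": {"id": "ticket_id", "subject": "title", "state": "status"},
    "nren_b": {"Numero": "ticket_id", "Titre": "title", "Etat": "status"},
}

# Hypothetical status vocabulary, lower-cased local value -> standard value.
STATUS_MAP: Dict[str, str] = {
    "open": "open",
    "opened": "open",
    "ouvert": "open",
    "closed": "closed",
    "solved": "closed",
    "fermé": "closed",
}


def normalize(source: str, raw: Dict[str, str]) -> Dict[str, str]:
    """Translate one source-specific ticket into the common representation."""
    try:
        field_map = FIELD_MAPS[source]
    except KeyError:
        raise ValueError(f"no field map for source {source!r}") from None
    ticket = {"source": source}
    for local_name, standard_name in field_map.items():
        if local_name in raw:
            ticket[standard_name] = raw[local_name]
    status = ticket.get("status")
    if status is not None:
        # Unrecognised values are kept visible as "unknown" rather than guessed.
        ticket["status"] = STATUS_MAP.get(status.strip().lower(), "unknown")
    return ticket
```

Adding a new NREN then amounts to adding one entry to the field map and, if needed, a few status translations.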
IPv6 GARR/CNRS
• Analysis of the gLite source code
– Using the IPv6 metric (IPv6 code checker) in ETICS
– Around 110 bugs on non-compliant function calls in the code reported
– This analysis effectively incited developers to work on IPv6
• IPv6 compliance of external gLite dependencies
• A new IPv6 code checker developed by SA2: IPv6 CARE http://sourceforge.net/projects/ipv6-care
– It monitors the execution of any program, even without its source code, detects networking function calls and provides a diagnosis
• Many informative studies https://twiki.cern.ch/twiki/bin/view/EGEE/IPv6FollowUp
– IPv6 programming method C/C++, Java, Python and Perl / IPv6 testing method gSOAP / Axis / Axis2 / Boost:asio / gridFTP / PythonZSI / PerlSOAPLite
– Assessment of the IPv6 compliance of gLite components: DPM & LFC
• Dissemination: meetings, training session, demonstration, video
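To show what a "non-compliant function call" means in practice, the sketch below scans C source text for a few well-known IPv4-only library calls and suggests their protocol-independent replacements. It is a simple static text scan written for illustration: it is neither the ETICS IPv6 metric nor IPv6 CARE (which observes running programs), and the function name `find_ipv4_only_calls` is hypothetical.

```python
import re
from typing import List, Tuple

# Classic IPv4-only C library calls and their protocol-independent replacements.
IPV4_ONLY_CALLS = {
    "gethostbyname": "getaddrinfo",
    "gethostbyaddr": "getnameinfo",
    "inet_addr": "inet_pton",
    "inet_aton": "inet_pton",
    "inet_ntoa": "inet_ntop",
}

# Match a listed name used as a function call, e.g. "inet_addr(".
_CALL_RE = re.compile(r"\b(" + "|".join(IPV4_ONLY_CALLS) + r")\s*\(")


def find_ipv4_only_calls(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, call found, suggested replacement) for each hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _CALL_RE.finditer(line):
            name = match.group(1)
            findings.append((lineno, name, IPV4_ONLY_CALLS[name]))
    return findings
```

A scan of this kind also flags calls inside comments and strings, so its results are candidates for review rather than confirmed bugs.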
Current stand on gLite and IPv6
• Full IPv6 compliance – for the production version
• Full IPv6 compliance – for a prototype version
• IPv6 compliance to be tested/verified by SA2 – gLite part of the deployment module claimed to be IPv6 compliant
• IPv6 porting currently on-going
• IPv6 porting plan exists
• No porting plan yet (that we are aware of)
[Figure: gLite components placed by IPv6 compliance level: LFC, DPM, globus-url-copy/gridFTP, BDII (perl), CREAM, VObox, lcgutils, VOMS, PX, MON, dCache, Torque C/S, MPIutils, Condorutils, AMGA, gfal, FTS, BDII (python), WMproxy/Job submission, blah, WMS-server]
Middleware work plan and evolution
• Workplan of JRA1 around IPv6
[Figure: timeline from Feb 09 to Apr 10 covering BLAH, WMS / WMproxy (Job Management), GFAL, LCG util, FTS, VOMS Client and APIs, VOMS Server, RGMA]
• Evolution obtained on the gLite repository of ETICS
[Figure: IPv6 compliant versus non-IPv6 compliant packages, September 2007 and March 2009; values shown: 67% and 80%]
Integration of IPv6 within EGEE
[Figure: the distributed IPv6 testbed, two dual-stack sites connected through the IPv4/IPv6 Internet (Renater/GEANT/GARR)]
UREC/PARIS (IPv6 gateway 2001:660:3302:7006::1): User Interface (UI), VOMS server, LB server, Workload management server (WMS), LCG Computing Element (CE) with Worker Nodes WN1 and WN2 (Torque/PBS), UREC site BD-II, DPM Storage Element (SE), LFC File Catalog, MyProxy server (PX)
GARR/ROME (gateway 2001:760::159:242/64): User Interface (UI2), VOMS2, LB server, Workload management server (WMS), LCG Computing Element and CREAM Computing Element with Worker Nodes WN1 and WN2 (Torque/PBS), Storage Element (DPM1), LFC File Catalog, SA2 top level BD-II, RGMA-BDII, GARR site BD-II, DEV, Grid Job monitoring DB
A distributed testbed
• Demonstration of the first two dual-stack IPv4/IPv6 sites of EGEE at User Forum 09
Grid job over IPv6
Issues & future plans
• ENOC
– Issue: lack of monitoring data and troubleshooting tools deployed in the end sites and available for the ENOC
· Deployment of PerfSONAR-Lite TroubleShooting System
· SA2 is providing an extra effort to design network monitoring with NREN support
· Spanish EGEE sites network monitoring
– Issue: impact assessment of trouble tickets
· Collaboration with NRENs
– Transition toward EGI-NGI
• Second step of trouble matching and correlation work
• Improvement of trouble ticket standardization software
• Advanced network services
– Make the SLA installation procedure more automatic
• IPv6
– Integration into EGEE validation process
– Testing new gLite IPv6 modules
• Issue: network activity understaffed within the EGI-NGI project
Main achievements
• ENOC running https://ccenoc.in2p3.fr/
– First version of the trouble tickets model has been implemented
• WLCG / LHCOPN
– Design the OPN operational model https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel
• Monitoring
– Beta-version of PerfSONAR-Lite TSS
• IPv6 https://twiki.cern.ch/twiki/bin/view/EGEE/IPv6FollowUp
– IPv6 CARE, informative reports
– Status and work plan of the middleware
– Demonstration of the first two dual-stack IPv4/IPv6 sites of EGEE
• Trouble ticket exchange standardization
– Submission of an RFC on the normalization of trouble tickets (“The Network Trouble Ticket Data Model”, Internet Draft)
• TNLC
– EGEE 09 - Terena NRENs & Grid joint meeting, Barcelona Sept. 2009