
SA2: Networking Support Status Report

Date post: 24-Feb-2016
SA2: Networking Support Status Report. Xavier Jeannin Activity Manager CNRS EGEE-III First Review, 24-25 June, 2009. SA2 Overview. SA2 provides an interface with the network - PowerPoint PPT Presentation
Page 1: SA2: Networking Support  Status Report

EGEE-III INFSO-RI-222667

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

Xavier Jeannin, Activity Manager, CNRS

EGEE-III First Review, 24-25 June, 2009

SA2: Networking Support Status Report

Page 2: SA2: Networking Support  Status Report


SA2 Overview

Country              Total PM planned at M24   Total FTE
France               96                        4.0
Germany              12                        0.5
Greece               18                        0.8
Italy                12                        0.5
Russia               6                         0.3
Spain                6                         0.3
GEANT2-DANTE (UK)    3                         0.1
Total                153                       6.4
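
The FTE column is consistent with dividing each person-month figure by 24, assuming "M24" means project month 24 (that reading is an assumption, not stated on the slide). A minimal sketch of the arithmetic, rounding half-up to one decimal:

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: "M24" is project month 24, so 1 FTE = 24 person-months.
PROJECT_MONTHS = 24

# Person-month figures copied from the table above.
person_months = {
    "France": 96, "Germany": 12, "Greece": 18, "Italy": 12,
    "Russia": 6, "Spain": 6, "GEANT2-DANTE (UK)": 3,
}

def fte(pm):
    """Person-months divided by the project duration, rounded half-up to 0.1."""
    return (Decimal(pm) / PROJECT_MONTHS).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP
    )

total_pm = sum(person_months.values())
for country, pm in person_months.items():
    print(f"{country:20s} {pm:4d} PM  {fte(pm)} FTE")
print(f"{'Total':20s} {total_pm:4d} PM  {fte(total_pm)} FTE")
```

The total FTE is computed from the total person-months (153 / 24), not by summing the rounded per-country values, which would give a slightly different figure.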

SA2 provides an interface with the network
• Operational interface that ensures the daily relations with the network infrastructures: ENOC, advanced network services tasks
• Relational interface that ensures the “higher level” of interactions with the network providers

Page 3: SA2: Networking Support  Status Report


SA2 – EGEE-III

SA2 Global view


TSA2.2 Support for the ENOC

IPv6 (GARR, CNRS)

Operational procedures (CNRS)

WLCG Support (CNRS)

Operational tools and maintenance (RRC-KI, CNRS)

TSA2.3 Overall Networking coordination

TSA2.1 Running the ENOC

TT exchange standardization (GRNET)

Advanced network services (GRNET)

TNLC

IPv6 (GARR, CNRS)

Monitoring (DFN)

Site networking needs (RedIRIS)

Troubleshooting (DFN)

TSA2.4 Management and general project tasks

Page 4: SA2: Networking Support  Status Report


EGEE Network Operation Centre (CNRS)


Role of the ENOC

• A single point of contact between EGEE and the NRENs
• ENOC ensuring E2E connectivity for Grid sites on the whole path

[Figure: the ENOC sits between the EGEE side (sites, users, GGUS, support units) and the network side (NRENs, GÉANT2).]

[Figure: end-to-end path from Grid site 1 to Grid site 2 through RC 1 (operated by NOC of RC1), NREN A (operated by NOC of NREN A), GÉANT2 (operated by DANTE), NREN B (operated by NOC of NREN B) and RC 2 (operated by NOC of RC2).]

Page 5: SA2: Networking Support  Status Report


Network connectivity assessment

• Assessment for year 2008 on EGEE certified Grid sites (~300)
• 85% of sites < 4 days of downtime/year = 98.90% reachability/year
• 55% of sites have less than a day of yearly unscheduled network downtime
• 80% of off-site network troubles are solved within 30 minutes
• Tool: DownCollector https://ccenoc.in2p3.fr/DownCollector

[Figure: chart label "46 sites"]
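
The 98.90% figure follows from the 4-day bound on yearly downtime. A short sketch of the conversion, assuming a 365-day year (the slide does not state the denominator, but 365 reproduces its value):

```python
# Assumption: a 365-day year; this reproduces the 98.90% quoted on the slide.
DAYS_PER_YEAR = 365

def reachability(downtime_days, period_days=DAYS_PER_YEAR):
    """Percentage of the period during which the site was reachable."""
    return 100.0 * (1 - downtime_days / period_days)

print(f"4 days down: {reachability(4):.2f}% reachable")
print(f"1 day down:  {reachability(1):.2f}% reachable")
```

By the same formula, the sites with less than a day of unscheduled downtime are reachable more than about 99.7% of the year.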

Page 6: SA2: Networking Support  Status Report


ENOC metrics


• Issue: very few Grid user notifications about network problems
• 19 NRENs
• 11 different languages
• 75% of European certified sites covered

[Figure: ENOC webserver statistics]

Page 7: SA2: Networking Support  Status Report


WLCG Support (CNRS)

• SA2 has also taken the lead in designing and implementing a pioneering federated operational model for the LHCOPN
– https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel

Page 8: SA2: Networking Support  Status Report


WLCG Support

• Processes were documented and disseminated
– Several meetings and training sessions helped the dissemination

• Related tools were released, including a GGUS helpdesk tailored for the LHCOPN

• Implementation is ongoing and will be ready for LHC start-up

Page 9: SA2: Networking Support  Status Report


Operational tools and maintenance (RRC-KI)

• Trouble matching and correlation for the ENOC
– Correlate tickets with monitoring data
– Better assessment of the impact on the grid of trouble tickets

[Figure: trouble matching architecture. Labels: GEANT/NRENs community; ENOC helpdesk services; ticket preprocessing; active monitoring & alert processing; problem notifications & tickets; topology database (NOD); history correlation matrix; statistical matching engine (event correlator); metaticket store; active node probes; attributes to analysis; node state store; node state reports.]
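
To illustrate what "correlate tickets with monitoring data" can mean in its simplest form, here is a sketch that matches each ticket to the alerts raised on the same node during the ticket's time window. It is a plain time-window match, not the statistical matching engine named in the figure; the `Ticket` and `Alert` types, their fields and the 10-minute slack are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Ticket:
    # Hypothetical fields: a real trouble ticket carries far more information.
    ticket_id: str
    node: str
    start: datetime
    end: datetime

@dataclass
class Alert:
    # Hypothetical monitoring alert, e.g. a packet-loss state change.
    node: str
    time: datetime
    kind: str

def correlate(tickets, alerts, slack=timedelta(minutes=10)):
    """Map each ticket id to the alerts on the same node within its window.

    The window is widened by `slack` on both sides because ticket
    timestamps are often approximate (an assumption of this sketch).
    """
    matches = {}
    for ticket in tickets:
        matches[ticket.ticket_id] = [
            alert for alert in alerts
            if alert.node == ticket.node
            and ticket.start - slack <= alert.time <= ticket.end + slack
        ]
    return matches

tickets = [Ticket("TT-1", "router-a", datetime(2008, 12, 7, 16, 30),
                  datetime(2008, 12, 7, 18, 0))]
alerts = [
    Alert("router-a", datetime(2008, 12, 7, 16, 34), "loss-moderate"),
    Alert("router-b", datetime(2008, 12, 7, 16, 35), "loss-bad"),
    Alert("router-a", datetime(2008, 12, 7, 20, 0), "loss-fine"),
]
for ticket_id, matched in correlate(tickets, alerts).items():
    print(ticket_id, [a.kind for a in matched])
```

A ticket with matching alerts has measurable impact on the monitored nodes, which is one possible input to the impact assessment mentioned above.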

Page 10: SA2: Networking Support  Status Report


Operational tools and maintenance

• First stage of our study
• The results are experimental and should improve

[Figure: three panels: (1) Alert notifications, (2) Resulting node status (Fine, Moderate, Bad), (3) The matcher output. Log lines shown:
Dec 7 16:34:06 loss-moderate (rising)
Dec 7 16:39:07 loss-bad (rising)
Dec 7 18:04:07 loss-moderate (resuming)
Dec 7 18:09:08 loss-fine (resuming)]

• Future work plan includes:
– production version setting up
– automatic ticket ranking based on matching results
– tuning of matching algorithm, possibly through more extensive use of the topology knowledge

[Figure: counter versus time, with a NORMAL ZONE, a WARNING ZONE and a CRITICAL ZONE separated by a normal/warning hold zone and a warning/critical hold zone. Transitions shown: rising to warning, rising to critical, resuming to normal.]
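
The zone diagram describes threshold detection with hysteresis: the state only changes once the counter leaves a hold zone, so a value hovering around a threshold does not generate a stream of alerts. A minimal sketch of that idea follows; the class names and the numeric bounds are hypothetical, since the slide gives no values.

```python
from dataclasses import dataclass

LEVELS = ["normal", "warning", "critical"]

@dataclass
class HoldZone:
    # Band between two levels in which the current state is kept.
    low: float
    high: float

class HysteresisTracker:
    def __init__(self, zones):
        # zones[i] separates LEVELS[i] from LEVELS[i + 1];
        # zones must be ordered and non-overlapping.
        self.zones = zones
        self.level = 0

    def update(self, value):
        """Feed one counter sample; return an event string or None."""
        previous = self.level
        # Rise only when the sample is above the top of the next hold zone.
        while self.level < len(self.zones) and value > self.zones[self.level].high:
            self.level += 1
        # Resume only when the sample is below the bottom of the zone beneath.
        while self.level > 0 and value < self.zones[self.level - 1].low:
            self.level -= 1
        if self.level > previous:
            return f"rising to {LEVELS[self.level]}"
        if self.level < previous:
            return f"resuming to {LEVELS[self.level]}"
        return None

# Hypothetical bounds: normal/warning hold zone 2..4, warning/critical 6..8.
demo = HysteresisTracker([HoldZone(2, 4), HoldZone(6, 8)])
for sample in [1, 3, 5, 3, 7, 9, 7, 5, 3, 1]:
    event = demo.update(sample)
    if event:
        print(sample, event)
```

Samples that fall inside a hold zone (3 and 7 in the demo) never produce an event, whichever direction the counter is moving.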

Page 11: SA2: Networking Support  Status Report


Network monitoring tools (DFN)

• Network monitoring tools for efficient troubleshooting
– PerfSONAR-Lite TroubleShooting System
– Launch test on demand from a Grid site under central server control: ping, traceroute, DNS lookup, nmap and bandwidth measurements
– Based on PerfSONAR-PS (Perl version)

Page 12: SA2: Networking Support  Status Report


Network monitoring tools

• First beta release is foreseen for June
– Beta testers: CNRS, NORDUnet, GARR
– First version: Autumn 2009

Page 13: SA2: Networking Support  Status Report


Sites networking needs (RedIRIS)

• Assess network requirements (bandwidth, delay, jitter, etc.) for a site within the Grid, according to the kind of site and VOs supported

• Empirical approach

• Deployment of perfSONAR at country scale
– RedIRIS contributes much more effort to this task than EGEE funds
– First deployment in Europe of such a solution over several domains (4 domains, 9 sites); no appliance box is used
– perfSONAR is deployed in EGEE sites and in the networks used

• Interoperability issue between perfSONAR versions
• First deployment: end of September

Page 14: SA2: Networking Support  Status Report


Sites networking needs

Topology of the network monitored by this task

Page 15: SA2: Networking Support  Status Report


Technical Network Liaison Committee

• TNLC (Technical Network Liaison Committee):
– Set up during EGEE in order to ease the technical discussions between EGEE, the NRENs and the GÉANT2 project
– Participants: EGEE SA2, GÉANT2 (represented by DANTE as coordinator of GÉANT2), some of the NRENs involved in the EGEE activities, the NREN PC and CERN
– 2 meetings

• Work mainly focused on:
· Improvement of trouble ticket contents
· Improve the assessment of the impact of problems on the Grid
· Monitoring
· Design a solution for the Grid infrastructure


Page 16: SA2: Networking Support  Status Report


Advanced network services (GRNET)

• Collaboration with the AMPS team (Advanced Multi-domain Provisioning System) in order to automate SLA establishment
• Development of a web interface to manage the EGEE SLA requests
– Store and manage the EGEE users’ SLA requests
• ENOC will act on behalf of the user
• The user request is stored in the ENOC
• The ENOC validates it and will then use the AMPS system to make the reservation
• AutoBAHN has also been studied but does not seem mature at the moment


Page 17: SA2: Networking Support  Status Report


Trouble ticket exchange standardization (GRNET)

• GRNET and the ENOC team provide the ENOC with a central server translating NRENs’ tickets into a standard ticket format
– Designed and implemented with open source software

• Ticket normalization is very important to improve the efficiency of project-wide network operations

• Dissemination was also made through the submission of an RFC on the normalization of trouble tickets (“The Network Trouble Ticket Data Model”, Internet Draft) http://tools.ietf.org/html/draft-dzis-nwg-nttdm-00
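
The translation step can be pictured as a per-source field mapping onto one common schema. The sketch below is illustrative only: the source names, the raw field names and the target field names are hypothetical, and are not taken from any real NREN ticket format or from the Internet Draft's data model.

```python
# Hypothetical mappings: raw field name -> standard field name, per source.
FIELD_MAPS = {
    "nren_a": {"id": "ticket_id", "opened": "start_time", "desc": "description"},
    "nren_b": {"ref": "ticket_id", "debut": "start_time", "texte": "description"},
}

def normalize(source, raw_ticket):
    """Translate one raw ticket (a dict) into the common schema.

    Fields without a mapping are dropped; the source is recorded so the
    original ticket can still be traced.
    """
    mapping = FIELD_MAPS[source]
    ticket = {
        standard_name: raw_ticket[raw_name]
        for raw_name, standard_name in mapping.items()
        if raw_name in raw_ticket
    }
    ticket["source"] = source
    return ticket

print(normalize("nren_a", {"id": "A-42", "opened": "2009-03-01T10:00",
                           "desc": "Fibre cut", "internal": "x"}))
print(normalize("nren_b", {"ref": "B-7", "debut": "2009-03-02T08:30",
                           "texte": "Maintenance"}))
```

Once every source yields the same keys, tools such as ticket ranking or correlation only need to understand one format.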

Page 18: SA2: Networking Support  Status Report


IPv6 (GARR/CNRS)

• Analysis of the gLite source code
– Using the IPv6 metric (IPv6 code checker) in ETICS
– Around 110 bugs reported on non-compliant function calls in the code
– This analysis effectively prompted developers to work on IPv6

• IPv6 compliance of external gLite dependencies

• A new IPv6 code checker developed by SA2: IPv6 CARE http://sourceforge.net/projects/ipv6-care
– It monitors the execution of any program, even without the source code, detects networking function calls and provides a diagnosis

• Many informative studies https://twiki.cern.ch/twiki/bin/view/EGEE/IPv6FollowUp
– IPv6 programming methods: C/C++, Java, Python and Perl
– IPv6 testing methods: gSOAP / Axis / Axis2 / Boost:asio / gridFTP / PythonZSI / PerlSOAPLite
– Assessment of the IPv6 compliance of gLite components: DPM & LFC

• Dissemination: meetings, training session, demonstration, video
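
As an illustration of what a "non-compliant function call" looks like in one of the languages listed, the sketch below uses Python's standard `socket` module. `socket.gethostbyname()` returns IPv4 addresses only, whereas `socket.getaddrinfo()` with `AF_UNSPEC` is protocol-independent. The `resolve` helper is a hypothetical name for this example, not part of gLite or IPv6 CARE.

```python
import socket

def resolve(host, port):
    """Return (family, sockaddr) pairs for host, covering IPv4 and IPv6.

    getaddrinfo() with AF_UNSPEC lets the resolver return both address
    families, so the calling code does not hard-wire AF_INET.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
    )
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]

# Numeric literals are used here so that the example needs no DNS access.
for literal in ("127.0.0.1", "::1"):
    for family, sockaddr in resolve(literal, 80):
        print(literal, family.name, sockaddr)
```

For outgoing connections, `socket.create_connection((host, port))` performs the same lookup and tries each returned address in turn, which is usually the simplest way to write a client that works over both protocols.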

Page 19: SA2: Networking Support  Status Report


Current stand on gLite and IPv6

IPv6 compliance status legend:
• Full IPv6 compliance – for the production version
• Full IPv6 compliance – for a prototype version
• IPv6 compliance to be tested/verified by SA2 – gLite part of the deployment module claimed to be IPv6 compliant
• IPv6 porting currently ongoing
• IPv6 porting plan exists
• No porting plan yet (that we are aware of)

[Figure: IPv6 compliance status of gLite components. Components shown: LFC, DPM, globus-url-copy/gridFTP, BDII (perl), CREAM, VObox, lcgutils, VOMS, PX, MON, dCache, Torque C/S, MPIutils, Condorutils, AMGA, gfal, FTS, BDII (python), WMproxy/Job submission, blah, WMS-server.]

Page 20: SA2: Networking Support  Status Report


Middleware work plan and evolution

• Workplan of JRA1 around IPv6

[Figure: timeline from Feb 09 to Apr 10 covering BLAH, WMS / WMproxy (Job Management), GFAL / LCG util, FTS, VOMS Client and APIs, VOMS Server, RGMA.]

• Evolution obtained on the gLite repository of ETICS

[Figure: bar chart of IPv6 compliant versus non-IPv6 compliant packages for September 2007 and March 2009; values shown: 67% and 80%.]

Page 21: SA2: Networking Support  Status Report


Integration of IPv6 within EGEE

• A distributed testbed: GARR/ROME and UREC/PARIS, connected over the IPv4/IPv6 Internet (Renater/GEANT/GARR)
• Demonstration of the 2 first dual stack IPv4/IPv6 sites of EGEE at User Forum 09
• Grid job over IPv6

[Figure: testbed topology.
UREC/PARIS (Gateway IPv6 2001:660:3302:7006::1): UI (User Interface), VOMS (VOMS server), LB (LB server), WMS (Workload management server), CE (LCG Computing Element), WN1 and WN2 (Worker Node, Torque/PBS), BDII (UREC site BD-II), SE (DPM Storage Element), LFC (LFC File Catalog), PX (MyProxy server).
GARR/ROME (Gateway 2001:760::159:242/64): UI2 (User Interface), VOMS2, LB (LB server), WMS (Workload management server), CE (LCG Computing Element), CREAM (CREAM Computing Element), WN1 and WN2 (Worker Node, Torque/PBS), DPM1 (Storage Element), LFC (LFC File Catalog), BDII (GARR site BD-II), SA2 top level BD-II, RGMA-BDII, DEV (Grid Job monitoring DB).]

Page 22: SA2: Networking Support  Status Report


Issues & future plans

• ENOC
– Issue: lack of monitoring data and troubleshooting tools deployed in the end sites and available for the ENOC
· Deployment of PerfSONAR-Lite TroubleShooting System
· SA2 is providing an extra effort to design network monitoring with NREN support
· Spanish EGEE sites network monitoring
– Issue: impact assessment of trouble tickets
· Collaboration with NRENs
– Transition toward EGI-NGI
• Second step of trouble matching and correlation work
• Improvement of trouble ticket standardization software
• Advanced network services
– Make the SLA installation procedure more automatic
• IPv6
– Integration into EGEE validation process
– Testing new gLite IPv6 modules
• Issue: network activity understaffed within the EGI-NGI project


Page 23: SA2: Networking Support  Status Report


Main achievements

• ENOC running https://ccenoc.in2p3.fr/
– First version of the trouble tickets model has been implemented
• WLCG / LHCOPN
– Design of the OPN operational model https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel
• Monitoring
– Beta version of PerfSONAR-Lite TSS
• IPv6 https://twiki.cern.ch/twiki/bin/view/EGEE/IPv6FollowUp
– IPv6 CARE, informative reports
– Status and work plan of the middleware
– Demonstration of the 2 first dual stack IPv4/IPv6 sites of EGEE
• Trouble ticket exchange standardization
– Submission of an RFC on the normalization of trouble tickets (“The Network Trouble Ticket Data Model”, Internet Draft)
• TNLC
– EGEE 09 - Terena NRENs & Grid joint meeting, Barcelona, Sept. 2009


