
Report from WG2

Date post: 01-Jan-2016

Experiment Support
CERN IT Department
CH-1211 Geneva 23
Switzerland
www.cern.ch/it

Report from WG2

Andrea Sciabà


WG2 areas

• Support tools
  – Ticketing tools
  – Accounting tools
  – Request trackers
  – Administration tools
• Underlying services
  – Messaging services
  – Information services

• WLCG operations and procedures


Support tools

• Overview
  – Tools mostly developed by other projects (OSG, EGEE, EGI…)
  – WLCG heavily influenced their development
  – Rather mature by now


Technology and tools

• GGUS
• Savannah
• TRAC
• JIRA
• GOCDB
• OIM
• EGI operations portal


Ticketing tools and request trackers (1/2)

• GGUS
  – Used by all 4 experiments for incident reporting
• Savannah
  – Used by ATLAS, CMS and LHCb: for internal investigation before bridging incidents to GGUS (CMS) or to other trackers (ATLAS), and for development and/or release management (LHCb)
• TRAC and JIRA
  – Used by some experiments (such as CMS) as development trackers, but the supporters make them available ‘as is’, so required improvements (e.g. on performance) are done on a best-effort basis


Ticketing tools and request trackers (2/2)

• Areas of improvement
  – GGUS
    • Some external interfaces periodically break
    • Ensure continuous availability
  – Savannah
    • Improve integration with other systems
  – TRAC / JIRA
    • Experiments would like them to be officially supported
• Areas of potential efficiency gains
  – GGUS: better reporting to avoid information repetition in multiple meetings
• Largest use of operational effort
• Missing areas
  – Savannah future uncertain


Accounting tools (1/2)

• Overview of technology and tools
  – APEL, Gratia, SGAS, DGAS
    • APEL receives CPU accounting data from its clients and from the other accounting systems
    • Provides a single database of WLCG accounting data (~1 G jobs since 2004)
  – EGI Accounting Portal
    • Provides summaries by site/month/VO/user/FQAN; data can be plotted and downloaded
    • Authorisation to see data on users depends on role
  – SAM/Nagios is used to check that sites publish data and whether the data is published centrally
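
The summary views listed above amount to grouping job records by a chosen set of keys and summing a metric. A minimal sketch of that idea, assuming a hypothetical flat record layout: the field names (`site`, `month`, `vo`, `cpu_hours`, `jobs`) and all sample values are invented for illustration and are not the APEL or portal schema.

```python
from collections import defaultdict

# Hypothetical job-summary records; field names and values are illustrative only.
RECORDS = [
    {"site": "SITE-A", "month": "2011-05", "vo": "atlas", "cpu_hours": 1200.0, "jobs": 300},
    {"site": "SITE-A", "month": "2011-05", "vo": "cms", "cpu_hours": 800.0, "jobs": 150},
    {"site": "SITE-B", "month": "2011-05", "vo": "atlas", "cpu_hours": 500.0, "jobs": 90},
    {"site": "SITE-A", "month": "2011-06", "vo": "atlas", "cpu_hours": 700.0, "jobs": 210},
]


def summarise(records, keys):
    """Sum CPU hours and job counts for each distinct combination of `keys`."""
    totals = defaultdict(lambda: {"cpu_hours": 0.0, "jobs": 0})
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group]["cpu_hours"] += rec["cpu_hours"]
        totals[group]["jobs"] += rec["jobs"]
    return dict(totals)


if __name__ == "__main__":
    # Summary by site and month, as one of the views mentioned on the slide.
    for group, total in sorted(summarise(RECORDS, ("site", "month")).items()):
        print(group, total)
```

Passing a different key tuple, for example `("vo",)`, gives the per-VO view from the same records.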


Accounting tools (2/2)

• Areas of improvement
  – Benchmarking: published data not reliable
  – SAM tests for accounting data publication do not check the total of all batch systems, hence missing info may pass unnoticed
  – Storage accounting: development of a portal under way in EMI; non-EMI SEs will have to provide data in the correct format
  – Evolve the Accounting Portal API into a full RESTful interface
• Areas of potential efficiency gains
  – Improved reliability from the redevelopment of the messaging infrastructure; messaging is also used by Gratia, etc.
• Largest use of operational effort
  – Not reported
• Missing areas
  – Not reported
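
The gap noted above is that a publication test can pass while one of a site's batch systems is silent. A stricter check could compare the batch systems a site is expected to run against those that actually published. The sketch below assumes hypothetical inputs (a site-to-batch-system mapping and a list of published pairs) and is not a description of how the SAM probe works.

```python
def missing_publishers(expected, published):
    """Return {site: sorted list of batch systems that did not publish}.

    expected:  dict mapping site name -> iterable of batch-system identifiers
    published: iterable of (site, batch_system) pairs seen in the central database
    """
    seen = set(published)
    missing = {}
    for site, systems in expected.items():
        absent = sorted(s for s in systems if (site, s) not in seen)
        if absent:
            missing[site] = absent
    return missing


# Hypothetical example data; host names are invented.
EXPECTED = {
    "SITE-A": ["ce01.site-a.example", "ce02.site-a.example"],
    "SITE-B": ["ce01.site-b.example"],
}
PUBLISHED = [
    ("SITE-A", "ce01.site-a.example"),
    ("SITE-B", "ce01.site-b.example"),
]

if __name__ == "__main__":
    # SITE-A published from only one of its two batch systems.
    print(missing_publishers(EXPECTED, PUBLISHED))
```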


Administration tools (1/2)

• Overview
  – GOCDB and EGI Operations portal provide several critical functionalities
    • Information repository for all EGI sites and VOs
    • Downtime publication
    • Broadcasts
  – GOCDB has a programmatic interface used to get info about registered sites, services and downtimes
  – OIM provides very similar functionality for OSG
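
A programmatic interface of this kind is typically consumed by fetching a structured document and filtering it. The sketch below parses an inline XML sample to find downtimes active at a given time. The element and attribute names are assumptions made for illustration, not the documented GOCDB response format, and no network request is made.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical downtime listing; element and attribute names are assumed,
# not taken from the real GOCDB schema.
SAMPLE_XML = """
<results>
  <downtime site="SITE-A" service="SRM" severity="OUTAGE"
            start="2011-06-01T08:00:00" end="2011-06-01T12:00:00"/>
  <downtime site="SITE-B" service="CE" severity="WARNING"
            start="2011-06-02T09:00:00" end="2011-06-02T10:00:00"/>
</results>
"""


def parse_time(text):
    """Parse an ISO-like timestamp and treat it as UTC."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def active_downtimes(xml_text, at):
    """Return (site, service, severity) for downtimes covering the instant `at`."""
    root = ET.fromstring(xml_text)
    active = []
    for node in root.findall("downtime"):
        start = parse_time(node.get("start"))
        end = parse_time(node.get("end"))
        if start <= at < end:
            active.append((node.get("site"), node.get("service"), node.get("severity")))
    return active


if __name__ == "__main__":
    when = datetime(2011, 6, 1, 9, 0, tzinfo=timezone.utc)
    print(active_downtimes(SAMPLE_XML, when))
```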


Administration tools (2/2)

• Areas of improvement
  – More up-to-date info in GOCDB
  – Supported VOs
• Areas of potential efficiency gains
  – Seamless integration of GOCDB and OIM
  – Smarter and more reliable downtime notifications
  – Easier definition of new service types
• Largest use of operational effort
  – None identified
• Missing areas
  – A way to publish experiment news to a portal (similar to the CERN IT Status Board)


Underlying services

• Overview
  – Messaging system and the information system
    • Both developed by WLCG
  – Will have to include batch systems as well


Technology and tools

• Active-MQ MSG system
• BDII
• GLUE
• LDAP


Messaging system (1/2)

• Overview
  – Operated by EGI: two brokers at CERN, one at AUTH and one at SRCE
  – Two more broker services at CERN for testing and validation, one for ATLAS/DDM, one for IT-ES (each consisting of 2 prod and 1 test broker)
  – Used by several applications
    • APEL
    • SAM
    • Ganga/DIANE monitoring
    • LFC catalogue synchronisation (EMI prototype)
    • ATLAS/DDM tracer service (prototype)
    • FTS monitoring


Messaging system (2/2)

• Areas of improvement
  – Security
  – Scalability
• Areas of potential efficiency gains
  – Improve availability and reliability: now the service must be stopped during some interventions
• Largest use of operational effort
  – None identified
• Missing areas
  – None identified


Information services (1/2)

• Overview
  – Covers several use cases
    • Service discovery
    • Installed software
    • Storage capacity and accounting
    • Batch system queue status
    • Configuration
    • Installed capacity
  – Fully distributed, hierarchical set of BDIIs, based on OpenLDAP
  – Implements GLUE schema
  – Information providers generate the service information
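
Service discovery against an LDAP-based information system comes down to filtering entries by attribute. The sketch below works on an inline LDIF-style sample rather than a live BDII. The attribute names follow the GLUE 1.3 naming style (`GlueServiceType`, `GlueServiceEndpoint`); the DNs, host names and values are invented, and the parser is deliberately simplified (it ignores LDIF line folding and base64-encoded values).

```python
# Hypothetical entries in simplified LDIF form; hosts and values are invented.
SAMPLE_LDIF = """\
dn: GlueServiceUniqueID=srm.site-a.example,mds-vo-name=SITE-A,o=grid
GlueServiceUniqueID: srm.site-a.example
GlueServiceType: SRM
GlueServiceEndpoint: httpg://srm.site-a.example:8443/srm/managerv2

dn: GlueServiceUniqueID=lfc.site-b.example,mds-vo-name=SITE-B,o=grid
GlueServiceUniqueID: lfc.site-b.example
GlueServiceType: lcg-file-catalog
GlueServiceEndpoint: lfc.site-b.example
"""


def parse_entries(text):
    """Split simplified LDIF text into a list of {attribute: [values]} dicts."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            # Split on the first ": " only, so colons inside values are kept.
            name, _, value = line.partition(": ")
            entry.setdefault(name, []).append(value)
        entries.append(entry)
    return entries


def find_endpoints(entries, service_type):
    """Return the endpoints of all entries whose GlueServiceType matches."""
    return [
        endpoint
        for entry in entries
        if service_type in entry.get("GlueServiceType", [])
        for endpoint in entry.get("GlueServiceEndpoint", [])
    ]


if __name__ == "__main__":
    print(find_endpoints(parse_entries(SAMPLE_LDIF), "SRM"))
```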


Information services (2/2)

• Areas of improvement
  – Stability: service info is prone to disappear, which is a problem because use cases have shifted towards needing more stability
  – Information validity: information provider output is very fragile and configuration is very error-prone
  – Better policies for resource publication
  – Lower latency for dynamic information
• Areas of potential efficiency gains
  – Better validation tools
  – Accurate storage information would make storage accounting a lot easier
  – Provide more powerful and user-friendly client tools
• Largest use of operational effort
  – Configuration and validation of information
  – Debugging IS problems for users and sites
• Missing areas
  – Continuous certification and auditing of the BDII information by WLCG
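
A validation tool of the kind asked for above could start from simple consistency rules applied to each published record. The sketch below is one possible shape for such a check, under stated assumptions: the rules are hypothetical, and the attribute names are modelled on GLUE 1.3 storage attributes and should be treated as illustrative.

```python
# Attributes every record is expected to carry (illustrative choice).
REQUIRED = ("GlueSEUniqueID", "GlueSATotalOnlineSize", "GlueSAUsedOnlineSize")


def validate(record):
    """Return a list of human-readable problems found in one published record."""
    problems = [f"missing attribute {name}" for name in REQUIRED if name not in record]
    if problems:
        return problems
    try:
        total = int(record["GlueSATotalOnlineSize"])
        used = int(record["GlueSAUsedOnlineSize"])
    except ValueError:
        return ["size attributes must be integers"]
    if total < 0 or used < 0:
        problems.append("negative size")
    if used > total:
        problems.append("used size exceeds total size")
    return problems


if __name__ == "__main__":
    # Hypothetical record in which the used size is larger than the total.
    print(validate({
        "GlueSEUniqueID": "se.site-a.example",
        "GlueSATotalOnlineSize": "100",
        "GlueSAUsedOnlineSize": "140",
    }))
```

Run periodically over all published records, a rule set like this would give a basic form of the continuous auditing listed under missing areas.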


WLCG operations (1/4)

• Overview
  – Goals are:
    • Efficient communication
    • Quick resolution of issues according to agreed targets
    • Coordination and decision
    • Well defined procedures
  – Describes roles, bodies, communication channels and procedures
  – Lots of experience accumulated
  – Quality is good but still manpower intensive
  – No visible decrease of incidents


WLCG operations (2/4)

• Technology and tools (so to speak…)
  – Daily meeting
  – Tier-1 service coordination meeting
  – GDB
• Roles and bodies
  – Security, information, data management officers
  – Site administrators
  – Site security officers
  – Experiment contact persons


WLCG operations (3/4)

• Procedures and policies
  – Scheduling downtimes
    • Well defined rules to declare them
  – Problem handling
    • Little in terms of formal procedures; issues and incidents are handled and discussed in the daily meeting and the T1SCM
    • SIRs for major incidents are an essential tool
    • GDB also useful to discuss issues at a general level


WLCG operations (4/4)

• Areas of improvement
  – Sometimes the link between an experiment and a site is not strong enough
    • The very need for site contacts can be seen as an issue…
  – Improve communication of the experiment requirements to the sites (e.g. via VO cards)
• Areas of potential efficiency gains
  – To have a real WLCG operations team: now experiments do most of the computing operations
  – A better communication channel for the Tier-2s (now only the GDB)
• Largest use of operational effort
• Missing areas

