Transcript
Page 1

Wolfgang von Rüden, CERN, IT Department

September 2006

IHEPCCC Meeting

CERN Site Report, based on the input from many IT colleagues,

with additional information in hidden slides

Wolfgang von Rüden

IT Department Head, CERN

22 September 2006

Page 2

General Infrastructure and Networking

Page 3

Computer Security (IT/DI)
• Incident analysis
  • 14 compromised computers on average per month in 2006
  • Mainly due to user actions on Windows PCs, e.g. trojan code installed
  • Detected by security tools monitoring connections to IRC/botnets
  • Some Linux systems were compromised by knowledgeable attacker(s)
  • Motivation appears to be money earned from controlled computers
• Security improvements in progress
  • Strengthened computer account policies and procedures
  • Ports closed in CERN main firewall (http://cern.ch/security/firewall)
  • Controls networks separated and stronger security policies applied
  • Logging and traceability extended to better identify the cause of incidents
  • Investigation of intrusion detection at 10 Gbps based on netflow data (see the sketch below)
• What is the policy of other labs concerning high-numbered ports?
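The netflow-based detection mentioned above amounts to scanning flow records for connections to known command-and-control endpoints. A minimal sketch of that idea, assuming simplified flow records and placeholder watchlists (not the actual CERN tooling):

```python
# Illustrative only: flag internal hosts whose flows go to common IRC ports
# or to known botnet controllers. Field names and watchlists are assumptions.
from dataclasses import dataclass

IRC_PORTS = {6660, 6661, 6662, 6663, 6664, 6665, 6666, 6667, 6668, 6669, 7000}
KNOWN_CONTROLLERS = {"192.0.2.13"}  # placeholder address

@dataclass
class Flow:
    src_ip: str
    dst_ip: str
    dst_port: int

def suspicious_hosts(flows):
    """Return source addresses with flows matching either watchlist."""
    return {f.src_ip for f in flows
            if f.dst_port in IRC_PORTS or f.dst_ip in KNOWN_CONTROLLERS}
```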

Page 4

Timeline for Security Incidents May 2000 - August 2006

[Chart: number of security incidents (compromised machines) per month, January 2000 to July 2006, scale 0-250. Annotated events: Code Red worm (web servers); Suckit rootkits (Linux); Blaster worm variants (Windows); IRC-based hacker networks (all platforms). Annotations: "Systems exposed in firewall caused most incidents", "Non-centrally managed laptops & downloaded code caused most incidents", "Change in trend".]

Page 5

Computing and Network Infrastructure for Controls: CNIC (IT/CO & IT/CS)

• Problem:
  • Control systems now based on TCP/IP and commercial PCs and devices
  • PLCs and other controls equipment cannot currently be secured
• Consequences: control systems vulnerable to viruses and hacking attacks
• Risks: downtime or physical damage to accelerators and experiments
• Constraints:
  • Access to control systems by off-site experts is essential
  • Production systems can only be patched during maintenance periods
• Actions taken: set up the CNIC Working Group to
  • Establish multiple separate Campus and Controls network domains
  • Define rules and mechanisms for inter-domain & off-site communications (see the sketch below)
  • Define policies for access to and use of Controls networks
  • Designate persons responsible for controls networks & connected equipment
  • Define & build suitable management tools for Windows, Linux and networks
  • Test security of COTS devices and request corrections from suppliers
  • Collaborate with organizations & users working on better controls security
• Ref: A 'defence-in-depth' strategy to protect CERN's control systems, http://cnlart.web.cern.ch/cnlart/2006/001/15
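The "rules and mechanisms for inter-domain communications" above boil down to a default-deny policy between the Campus and Controls domains, with explicitly authorised exceptions (e.g. for off-site experts). A minimal sketch of that idea; the rule format and host names are illustrative assumptions, not the actual CNIC implementation:

```python
# Illustrative only: default-deny between Campus and Controls domains, with an
# explicit allow-list of (source host, destination host, TCP port) entries.
# Host names and the example port are placeholders.
AUTHORISED_FLOWS = {
    ("expert-gw.campus.example", "plc-ps-01.controls.example", 502),
}

def connection_allowed(src_host: str, dst_host: str, port: int) -> bool:
    """Only explicitly listed flows may cross the domain boundary."""
    return (src_host, dst_host, port) in AUTHORISED_FLOWS
```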

Page 6

Networking Status (IT/CS)
• Internal CERN network infrastructure progressing on time
  • New campus backbone upgraded
  • Farm router infrastructure in place
  • New infrastructure for external connectivity in place
  • CERN internal infrastructure (starpoints) upgrade in progress to provide better desktop connectivity
  • Management tools to improve security control have been developed and put into production (control of connections between the Campus and the Technical networks); this is part of the CNIC project
  • New firewall infrastructure being developed to improve aggregate bandwidth and integrate into a common management scheme
  • Large parts of the experimental areas and pits now cabled
  • The DANTE PoP for GÉANT2 was installed at CERN towards the end of 2005
• Current work items
  • Improved wireless network capabilities being studied for the CERN site

Page 7

LHCOPN Status

• LHCOPN links coming online
  • Final circuits to 8 Tier-1s
  • Remaining 3 due before the end of the year
• LHCOPN management
  • Operations and monitoring responsibilities being shared between EGEE (Layer 3) and DANTE (Layers 1/2)
  • Transatlantic link contracts passed to USLHCNet (Caltech) to aid DoE transparency
  • 3 links to be commissioned this year: Geneva-Chicago, Geneva-New York and Amsterdam-New York
• Current work items
  • Improve management of multiple services across transatlantic links using VCAT/LCAS technology; being studied by the USLHCNet group
  • Investigate the use of cross-border fibre for path redundancy in the OPN

Page 8

LHCOPN L2 Circuits
[Map: LHCOPN Layer 2 circuits. Legend: 3x10G, 2x10G, 1x10G, <10G, bandwidth managed, cross-border fibre.]

Page 9

Scientific Linux @ CERN (IT/FIO)
• See https://www.scientificlinux.org/distributions/roadmap
  • Many thanks to FNAL for all their hard work
  • Support for SL3 ends 31st October 2007
• SLC4
  • CERN-specific version of SL4
    • Binary compatible for end users
    • Adds AFS, tape hardware support, ... required at CERN
  • Certified for general use at CERN end-March
  • Interactive and batch services available since June
  • New CPU servers commissioned in October (1 MSI2K) will all be installed with SLC4
  • Switch of the default from SLC3 to SLC4 foreseen (hoped!) for end October/November
    • Depends on availability of EGEE middleware
    • Will almost certainly be 32-bit: too much software is not yet 64-bit compatible
• SLC5 is low priority
  • Could arrive 1Q07 at the earliest, and no desire to switch OS just before LHC startup
  • But need to start planning for 2008 soon

Page 10

Internet Services (IT/IS)

• Possible subjects for HEP-wide coordination
  • E-mail coordination for anti-spam, attachments, digital signatures, secure e-mail and common policies for visitors
  • Single sign-on and integration with Grid certificates
  • Managing vulnerabilities in desktop operating systems and applications; policies concerning "root" and "Administrator" rights on desktop computers; antivirus and anti-spyware policies
  • Common policies for web hosting; role of CERN as a "catch-all" web hosting service for small HEP labs, conferences and activities distributed across multiple organizations
  • Desktop instant messaging and IP telephony? Protocols, integration with email, presence information?

Page 11

Conference and AV Support (IT/UDS)
• Video conferencing services
  • HERMES H.323 MCU: joint project with IN2P3 (host), CNRS and INSERM
  • VRVS preparing EVO rollout
  • Seamless audio/video conference integration through SIP (beta test)
  • SMAC: conference recording (Smart Multimedia Archive for Conferences); joint project with EIF (engineering school) and Uni Fribourg; pilot in the main auditorium
• Video conference rooms refurbishment
  • Pilot rooms in B.40: standard (CMS), fully-featured (ATLAS)
  • 12 more requested before LHC turn-on
• Multimedia Archive Project
  • Digitisation: photo / audio / video
  • CDS storage and publication, e.g. http://cdsweb.cern.ch/?c=Audio+Archives

Page 12

Indico & Invenio Directions (IT/UDS)

• Indico as the "single interface"
  • Agenda migration virtually complete
  • VRVS booking done
  • HERMES and eDial booking soon
  • CRBS: physical room booking under study
  • Invenio for Indico search
• CDS powered by Invenio
  • Released in collaboration with EPFL
  • Finishing major code refresh into Python
  • Flexible output formatting: XML, BibTeX
  • RSS feeds; Google Scholar interfacing
  • In 18 languages (contributions from around the globe)
• Collaborative tools
  • Baskets, reviewing, commenting
• Document "add-ons"
  • Citation extraction and linking (SLAC planning to collaborate)
  • Key-wording (ontology with DESY)

Page 13

Open Access
• Preprints: already wholly OA
  • Operational Circular 6 (rev. 2001) requires every CERN author to submit a copy of their scientific documents to the CERN Document Server (CDS)
  • Institutional archive & HEP subject archive
• Publications
  • Tripartite colloquium, December 2005: "OA Publishing in Particle Physics"
    • Authors, publishers, funding agencies
  • Task force (report June 2006)
    • ... to study and develop sustainable business models for particle physics
    • Conclusion: a significant fraction of particle physics journals are ready for a rapid transition to OA under a consortium-funded sponsoring model

Page 14

Oracle related issues (IT/DES)

• Serious bug causing logical data corruption (wrong cursor sharing, a side effect of a new algorithm enabled by default in RDBMS 10.2.0.2)
  • LFC and VOMS affected
  • Problem reported 11 Aug
  • Workaround in place 21 Aug (with a small negative side-effect)
  • First pre-patch released 29 Aug
  • Second pre-patch released 14 Sep
  • Production patch expected any day now
• Support request escalated to the highest level
  • "In one of the most complex parts of the product"
  • Regular phone conferences with the Critical Account Manager
• What to learn:
  • We feel we got good attention, but it still took time
  • Not always good to be on the latest release!

Page 15

CERN openlab
• Concept
  • Partner/contributor sponsors latest hardware, software and brainware (young researchers)
  • CERN provides experts, test and validation in a Grid environment
  • Partners: 500,000 €/year, 3 years
  • Contributors: 150,000 €, 1 year
• Current activities
  • Platform competence centre
  • Grid interoperability centre
  • Security activities
  • Joint events

Page 16

WLCG Update

Page 17

WLCG depends on two major science grid infrastructures:
• EGEE - Enabling Grids for E-sciencE
• OSG - US Open Science Grid

Page 18

Grid progress this year
• Baseline services from the TDR are in operation
• Agreement (after much discussion) on VO Boxes
• gLite 3
  • Basis for startup on the EGEE grid
  • Introduced (just) on time for SC4
  • New Workload Management System now entering production
• Metrics
  • Accounting introduced for Tier-1s and CERN (CPU and storage)
  • Site availability measurement system introduced; reporting for Tier-1s & CERN from May
  • Job failure analysis
• Grid operations
  • All major LCG sites active
  • Daily monitoring and operations now mature (EGEE and OSG), taken in turn by 5 sites for EGEE
  • Evolution of the EGEE regional operations support structure

Page 19

Data Distribution
• Pre-SC4 tests in April, CERN to Tier-1s: the SC4 target of 1.6 GB/s was reached, but only for one day
• Experiment-driven transfers (ATLAS and CMS) sustained 50% of the target (0.8 GB/s) under much more realistic conditions
• CMS transferred a steady 1 PByte/month between Tier-1s & Tier-2s during a 90-day period
• ATLAS distributed 1.25 PBytes from CERN during a 6-week period
[Chart: data distribution rates, with the 1.6 GBytes/sec target and the 0.8 GBytes/sec level marked.]
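For scale, the monthly and multi-week volumes quoted above can be converted to average rates and compared with the 1.6 GB/s target. A quick worked conversion, assuming decimal petabytes, a 30-day month and a 42-day "6-week" period:

```python
# Illustrative arithmetic only: convert a volume moved over a period into an
# average rate in GB/s (1 PB taken as 10^6 GB).
SECONDS_PER_DAY = 86_400

def average_rate_gb_per_s(petabytes: float, days: float) -> float:
    return petabytes * 1_000_000 / (days * SECONDS_PER_DAY)

print(average_rate_gb_per_s(1.0, 30))    # CMS:   ~0.39 GB/s sustained
print(average_rate_gb_per_s(1.25, 42))   # ATLAS: ~0.34 GB/s over 6 weeks
```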

Page 20

Availability of WLCG Tier-1 Sites + CERN, August 2006

[Chart: daily availability (0-100%) for 1-31 August 2006, one panel per site (CERN-PROD, FZK-LCG2, IN2P3-CC, INFN-T1, RAL-LCG2, SARA-MATRIX, Taiwan-LCG2, TRIUMF-LCG2, USCMS-FNAL-WC1, NDGF, PIC, BNL), each annotated with its availability and reliability percentages and its tests passed and scheduled downtime, and colour-coded against the target: < 90% of target, ≥ 90% of target, ≥ target.]

Data from SAM monitoring. Site availability and reliability as agreed in the WLCG MB on 11 July 2006 (scheduled interruptions are excluded when calculating reliability; see the sketch below).

Notes:
• SAM tests at one site fail due to a dCache function failure that does not affect CMS jobs; the problem is understood and is being worked on
• All sites were assumed up while SAM itself had problems on 1, 3 and 4 August
• Two sites are not integrated into the Site Availability Monitoring (SAM) system and are not included in the overall average

August 2006 summary:
• two sites not yet integrated in the measurement framework
• SC4 target: 88% availability
• 10-site average: 74%
• best 8 sites average: 85%
• reliability (which excludes scheduled downtime) ~1% higher
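The difference between the two figures is simply whether scheduled interruptions count against a site. A minimal sketch of the calculation, assuming hourly test slots and field names that are illustrative rather than the actual SAM schema:

```python
# Illustrative only: availability counts every slot; reliability ignores
# slots that fall inside announced (scheduled) downtime.
from dataclasses import dataclass

@dataclass
class TestSlot:
    passed: bool           # did the SAM tests pass in this slot?
    scheduled_down: bool   # was the site in scheduled downtime?

def availability(slots: list[TestSlot]) -> float:
    return sum(s.passed for s in slots) / len(slots)

def reliability(slots: list[TestSlot]) -> float:
    considered = [s for s in slots if not s.scheduled_down]
    return sum(s.passed for s in considered) / len(considered)
```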

Page 21

Job Reliability Monitoring
• Ongoing work
  • System to process and analyse job logs implemented for some of the major activities in ATLAS and CMS
  • Errors identified, and their frequency reported to developers and the TCG (see the sketch after the chart)
  • Expect to see results feeding through from development to products in a fairly short time
  • More impact expected when the new RB enters full production (the old RB is frozen)
• Daily report on the most important site problems
  • Allows the operation team to drill down from site to computing elements to worker nodes
  • In use by the end of August
• Intention is to report long-term trends by site and VO

[Chart: trend for FNAL, 23 May to 20 September 2006 (y-axis 0 to 1.2).]
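The log analysis described above is, at its core, counting how often each error signature occurs so the most frequent failures can be fed back to developers and sites. A minimal sketch of that idea; the log format and the error-extraction pattern are assumptions, not the actual tooling:

```python
# Illustrative only: rank error reasons found in job logs by frequency.
import re
from collections import Counter

ERROR_RE = re.compile(r"ERROR[: ]+(?P<reason>.+)", re.IGNORECASE)

def error_frequencies(log_lines):
    counts = Counter()
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group("reason").strip()] += 1
    return counts.most_common()

# Example with made-up log lines:
logs = [
    "2006-08-14 ERROR: Cannot contact storage element",
    "2006-08-14 ERROR: Proxy expired",
    "2006-08-15 ERROR: Cannot contact storage element",
]
for reason, count in error_frequencies(logs):
    print(count, reason)
```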

Page 22

LCG Commissioning Schedule, 2006 - 2008
[Timeline diagram; milestones and activities:]
• SC4 becomes the initial service when reliability and performance goals are met
• 01 Jul 07: service commissioned, with full 2007 capacity and performance
• First physics
• Continued testing of computing models and basic services
• Testing DAQ to Tier-0 (??) and integrating into the DAQ, Tier-0, Tier-1 data flow
• Building up end-user analysis support
• Exercising the computing systems, ramping up job rates, data management performance, ...
• Initial service commissioning: increase performance, reliability and capacity to target levels; gain experience in monitoring, 24x7 operation, ...
• Introduce residual services: full FTS services; 3D; SRM v2.2; VOMS roles

Page 23

Challenges and Concerns
• Site reliability
  • Achieve MoU targets, with a more comprehensive set of tests
  • Tier-0, Tier-1 and (major) Tier-2 sites
  • Concerns over staffing levels at some sites
  • 24x7 operation needs to be planned and tested; it will be problematic at some sites, including CERN, during the first year, when unexpected problems have to be resolved
• Tier-1s and Tier-2s learning exactly how they will be used
  • Mumbai workshop, Tier-2 workshops
  • Experiment computing model tests
  • Storage, data distribution
• Tier-1/Tier-2 interaction
  • Test out data transfer services and network capability
  • Build operational relationships
• Mass storage
  • Complex systems, difficult to configure
  • Castor 2 not yet fully mature
  • SRM v2.2 to be deployed, with storage classes and policies implemented by sites
  • 3D Oracle Phase 2: sites not yet active/staffed

Page 24

Challenges and Concerns (continued)
• Experiment service operation
  • Manpower intensive
  • Interaction with Tier-1s and large Tier-2s
  • Need a sustained test load to verify site and experiment readiness
• Analysis on the Grid is very challenging
  • Overall growth in usage is very promising
  • CMS has the lead, with over 13k jobs/day submitted by ~100 users using ~75 sites (July 06)
  • Analysis users will continue to have an impact on, and uncover weaknesses in, services at all levels
• Understanding the CERN Analysis Facility
• DAQ testing looks late
  • The Tier-0 needs time to react to any unexpected requirements and problems

Page 25

Tier0 Update

Page 26

CERN Fabric progress this year
• Tier-0 testing has progressed well
  • Artificial system tests, and ATLAS Tier-0 testing at full throughput
  • Comfortable that target data rates and throughput can be met, including CASTOR 2
  • But DAQ systems are not yet integrated in these tests
• CERN Analysis Facility (CAF)
  • Testing of experiment approaches to this has started only in the past few months
  • Includes PROOF evaluation by ALICE
  • Much has still to be understood
  • Essential to maintain Tier-0/CAF flexibility for hardware during the early years
• CASTOR 2
  • Performance is largely understood
  • Stability and the ability to maintain a 24x365 service is now the main issue

Page 27

CERN Tier0 Summary (IT/FIO)
• Infrastructure
  • A difficult year for cooling, but the (long delayed) upgrade to the air conditioning system is now complete
  • The upgrade to the electrical infrastructure should be complete in early 2007 with the installation of an additional 2.4 MW of UPS capacity
  • No spare UPS capacity for physics services until then; the additional UPS systems are required before we install the hardware foreseen for 2007
  • Looking now at a possible future computer centre, as the rise in power demand for computing systems seems inexorable; demand is likely to exceed the current 2.5 MW limit by 2009/10
  • Water-cooled racks as installed at the experiments seem to be more cost-effective than air cooling
• Procurement
  • We have evaluated tape robots from IBM and STK, and also their high-end tape drives, over the past 9 months
  • Re-use of media means high-end drives are more cost-effective over a 5-year period
  • Good performance seen from equipment from both vendors
  • CPU and disk server procurement continues with regular calls for tender
  • The long time between start of the process and equipment delivery remains, but the process is well established

Page 28

Tier0 Summary (continued)
• Readiness for LHC production
  • Castor2 now seems on track
    • All LHC experiments fully migrated to Castor2
    • Meeting testing milestones [images on the following slides]
    • Still some development required, but effort is now focussed on the known problem areas as opposed to firefighting
  • Grid services now integrated with other production services
    • The Service Dashboard at https://cern.ch/twiki/bin/view/LCG/WlcgScDash shows the readiness of services for production operation
    • Significant improvement in readiness over the past 9 months [see later image]
  • Now a single daily meeting for all T0/T1 services
• Still concerns over a possible requirement for 24x7 support by engineers
  • Many problems still cannot be debugged by on-call technicians
  • For data distribution, full problem resolution is likely to require contact with the remote site

Page 29

Grid Service dashboard

Page 30

The EGEE project
• Phase 1
  – 1 April 2004 - 31 March 2006
  – 71 partners in 27 countries (~32 M€ funding from the EU)
• Phase 2
  – 1 April 2006 - 31 March 2008
  – 91 partners in 32 countries (~37 M€ EU funding)
• Status
  – Large-scale, production-quality grid infrastructure in use by HEP and other sciences (~190 sites, 30,000 jobs/day)
  – gLite 3.0 Grid middleware deployed
• EGEE provides essential support to the LCG project

Page 31

EU projects related to EGEE

Page 32

Sustainability: Beyond EGEE-II
• Need to prepare for a permanent Grid infrastructure
  – Production usage of grid infrastructure requires long-term planning
  – Ensure reliable and adaptive support for all sciences
  – Independent of short project cycles
  – Modelled on the success of GÉANT
• Infrastructure managed in collaboration with national grid initiatives

Page 33

EGEE’06 Conference
• EGEE’06: Capitalising on e-infrastructures
  – Keynotes on state-of-the-art and real-world use
  – Dedicated business track
  – Demos and business/industry exhibition
  – Involvement of the international community
• 25-29 September 2006
• Geneva, Switzerland, organised by CERN
• http://www.eu-egee.org/egee06

