LCG Service Challenges Plan
Jamie Shiers, CERN-IT-GD
22 March 2005
LCG Project, Grid Deployment Group, CERN
Introduction
Draws heavily on draft Service Challenge chapter for the LCG TDR
And recent presentations by various people (POB, GDB, SC, …)
Some high level (but worrying…) comments on SC1 and SC2
Main focus is planning for future challenges (mainly SC3…)
Main difference wrt previous presentations is the emphasis on different aspects of the challenges
i.e. not just raw transfer rates…
Draft milestones presented at February SC meeting well understood
Planning is an iterative, continuous process…
Key Principles
Service challenges result in a series of services that exist in parallel with the baseline production service
Rapidly and successively approach production needs of LHC
Initial focus: core (data management) services
Swiftly expand out to cover full spectrum of production and analysis chain
Must be as realistic as possible, including end-end testing of key experiment use-cases over extended periods with recovery from glitches and longer-term outages
Necessary resources and commitment are a pre-requisite to success!
Should not be under-estimated!
Overall Schedule
Timeline 2005–2008: SC2 → SC3 → SC4 → LHC Service Operation, leading through cosmics and first beams to first physics and the full physics run. Each challenge passes through preparation, setup and service phases.
Apr05 – SC2 complete
June05 – Technical Design Report
Jul05 – SC3 Throughput Test
Sep05 – SC3 Service Phase
Dec05 – Tier-1 Network operational
Apr06 – SC4 Throughput Test
May06 – SC4 Service Phase starts
Sep06 – Initial LHC Service in stable operation
Apr07 – LHC Service commissioned
SC1 / SC2
Dec04 milestone – Service Challenge I complete:
mass store (disk) to mass store (disk)
3 T1s (Lyon, Amsterdam, Chicago); others also participated…
500 MB/sec (individually and aggregate), 2 weeks sustained
Software: GridFTP plus some scripts
SC1 did not successfully complete its goals:
We did not meet the milestone of 500 MB/s for 2 weeks
We need to do these challenges to see what actually goes wrong – a lot of things do, and did, go wrong
We need better test plans for validating the infrastructure before the challenges (network throughput, disk speeds, etc… – a sketch follows)
Mar05 milestone – Service Challenge II should be complete:
Software: reliable file transfer service
mass store (disk) to mass store (disk); BNL, CNAF, FNAL, FZK, IN2P3, NIKHEF, RAL (more than originally planned)
100 MB/s per T1, 500 MB/s aggregate out of CERN; push 2+ sites to ~500 MB/s
1 month sustained
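Given the call above for better pre-challenge test plans, here is a minimal sketch of what an automated throughput sanity check might look like. It assumes iperf is installed at both ends (with a server already running at the remote site) and that a dd write against the transfer staging area approximates disk speed; the host name, path and thresholds are illustrative, not part of the actual SC test plan.

```python
#!/usr/bin/env python
"""Pre-challenge sanity checks: rough network and disk throughput.

A minimal sketch -- host names, paths and thresholds are examples,
not the agreed SC validation procedure.
"""
import re
import subprocess

def network_mbps(remote_host, seconds=30):
    # iperf in client mode, reporting in Mbits/sec (-f m); a server
    # ("iperf -s") must already be running at remote_host.
    out = subprocess.run(
        ["iperf", "-c", remote_host, "-t", str(seconds), "-f", "m"],
        capture_output=True, text=True, check=True).stdout
    rates = re.findall(r"([\d.]+)\s+Mbits/sec", out)
    return float(rates[-1]) if rates else 0.0

def disk_write_mbs(path, megabytes=1024):
    # Write 1 GB with O_DIRECT to bypass the page cache, then parse
    # the throughput figure dd prints on stderr.
    out = subprocess.run(
        ["dd", "if=/dev/zero", "of=%s/ddtest" % path,
         "bs=1M", "count=%d" % megabytes, "oflag=direct"],
        capture_output=True, text=True).stderr
    m = re.search(r"([\d.]+)\s+MB/s", out)
    return float(m.group(1)) if m else 0.0

if __name__ == "__main__":
    net = network_mbps("gridftp01.example-t1.org")   # hypothetical T1 host
    disk = disk_write_mbs("/storage/sc_staging")     # hypothetical pool
    print("network: %.0f Mbit/s, disk: %.0f MB/s" % (net, disk))
    # 100 MB/s per stream is roughly 800 Mbit/s on the wire:
    if net < 800 or disk < 100:
        print("WARNING: below per-T1 target - investigate before the challenge")
```

Checks like this, run at every participating site before the throughput phase, are one concrete form the "better test plans" above could take.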
SC2 – Current Status
SC1/2 - Conclusions
Setting up the infrastructure and achieving reliable transfers, even at much lower data rates than needed for LHC, is complex and requires a lot of technical work + coordination
Even within one site – people are working very hard & are stressed. Stressed people do not work at their best. Far from clear how this scales to SC3/SC4, let alone to LHC production phase
Compound this with the multi-site / multi-partner issue, together with time zones etc and you have a large “non-technical” component to an already tough problem (example of technical problem follows…)
But… the end point is fixed (time + functionality)
We should be careful not to over-complicate the problem or potential solutions
And not forget there is still a humungous amount to do…
(much much more than we’ve done…)
SC2 – Sample Network Problem
Q: Since Tuesday 15 March at 1pm (more or less) we have been experiencing very low performance for data streams originating at Karlsruhe (192.108.46.0/24) and at CNAF (192.135.23.0/24) and reaching CERN (192.16.160.0/24) via the backup 10G connection: data flows that were running at 1 Gbps are now at 5-10 Mbps from Karlsruhe and ~500 Mbps from CNAF. There is no problem in the other direction (from CERN to the two locations). (Reported March 17)
A: Following tests between our NOC and DFN, a faulty connection on a patch panel was identified as the origin of these errors. It has been fixed and no errors appear now. (Solved March 21)
Who / how is this going to be debugged (and fixed) during the Service Phase at 00:15 on a Sunday morning, when an experiment's framework detects something is wrong?
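Part of the answer has to be automation: a transfer-rate watchdog can at least detect and escalate such a drop without waiting for a person to notice. A minimal sketch follows, in which measure_rate_mbs() and page_oncall() are hypothetical hooks into site transfer monitoring and the operator alarm system, and the targets and thresholds are illustrative.

```python
#!/usr/bin/env python
"""Transfer-rate watchdog sketch for the SC service phase.

measure_rate_mbs() and page_oncall() are hypothetical hooks: the
first would read the achieved rate from site monitoring, the second
would raise an operator alarm. All numbers are illustrative.
"""
import time

EXPECTED_MBS = {"FZK": 100.0, "CNAF": 100.0}   # agreed per-T1 targets
ALARM_FRACTION = 0.2   # alarm if below 20% of target
GRACE_CYCLES = 3       # tolerate short glitches before paging

def measure_rate_mbs(site):
    raise NotImplementedError("read achieved rate from site monitoring")

def page_oncall(site, rate, target):
    raise NotImplementedError("hook into the operator alarm system")

def watchdog(poll_seconds=300):
    low_count = dict.fromkeys(EXPECTED_MBS, 0)
    while True:
        for site, target in EXPECTED_MBS.items():
            rate = measure_rate_mbs(site)
            if rate < ALARM_FRACTION * target:
                low_count[site] += 1
                # Escalate only on sustained degradation, so a single
                # bad sample at 00:15 on Sunday does not page anyone.
                if low_count[site] == GRACE_CYCLES:
                    page_oncall(site, rate, target)
            else:
                low_count[site] = 0
        time.sleep(poll_seconds)
```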
SC2 – “Production problems”
For well understood reasons, we are currently trying to move away from ‘R&D’ style h/w (machines + network)
We need a stop-gap solution for SC2 and another for SC3
We are a long way from a "production mentality"
Don't play with things whilst we're running production unless the change request is signed in blood.
All these issues can – and will – be solved, but doesn't anyone feel a sense of Déjà Vu?
(Some) Meetings
SC2 meetings: phone conference Monday afternoon; daily call with different sites
Post-SC2 meetings: Tuesday 4pm (starting April)
Storage Management Workshop: 5-7 April @ CERN
T0/T1 network meeting: 8 April at NIKHEF/SARA
Pre-SC3 meeting: June 13th – 15th at CERN
Taipei SC meeting: 26th April
Hepix @ FZK: 9-13 May
There are also others:
Weekly SC meetings at CERN
T2 discussions with UK – next meeting: June in Glasgow
T2 INFN workshop in Bari – 26-27 May
T1 site visits – PIC / BNL / FNAL / Triumf / ASCC
etc. etc.
IMHO: meetings are needed, but not everyone needs to go to all. Nor do we need the same style – I want "stand-up" or "face-to-face".
SC3 Preparation Workshop
This (proposed) workshop will focus on very detailed technical planning for the whole SC3 exercise.
It is intended to be as interactive as possible, i.e. not presentations to an audience largely in a different (wireless) world.
There will be sessions devoted to specific experiment issues, Tier1 issues, Tier2 issues as well as the general service infrastructure.
Planning for SC3 has already started and will continue prior to the workshop.
This is an opportunity to get together to iron out concerns and issues that cannot easily be solved by e-mail, phone conferences and/or other meetings prior to the workshop.
Is there a better way to do it? Better time?
SC3 on
SC3 is significantly more complex than previous challenges: it includes experiments' s/w, additional m/w, Tier2s etc.
Proving we can transfer dummy files from A to B proves nothing – though we obviously need to show that the basic infrastructure works…
Preparation for SC3 includes: understanding experiments' Computing Models; agreeing involvement of experiments' production teams; visiting all (involved) Tier1s (multiple times); preparing for the involvement of 50-100 Tier2s
Short of resources at all levels: "Managerial" – discussing with experiments and Tier1s (visiting); "Organizational" – milestones, meetings, workshops, …; "Technical" – preparing challenges and running the CERN end – 24 x 7???
2005 Q1 – SC3 preparation
Prepare for the next service challenge (SC3) in parallel with SC2 (reliable file transfer):
Build up the 1 GByte/s challenge facility at CERN – the current 500 MByte/s facility used for SC2 will become the testbed from April onwards (10 ftp servers, 10 disk servers, network equipment)
Build up infrastructure at each external centre – average capability ~150 MB/sec at a Tier-1 (to be agreed with each T1)
Further develop the reliable transfer framework software – include catalogues, include VO's
Focus: disk-network-disk bandwidths
2005 Q2-3 – SC3 challenge
SC3 – 50% service infrastructure:
Same T1s as in SC2 (Fermi, NIKHEF/SARA, GridKa, RAL, CNAF, CCIN2P3); add at least two T2s
"50%" means approximately 50% of the nominal rate of ATLAS+CMS
Using the 1 GByte/s challenge facility at CERN: disk at T0 to tape at all T1 sites at 60 MByte/s; data recording at T0 from the same disk buffers; moderate disk-disk traffic between T1s and T2s
Use ATLAS and CMS files, reconstruction, ESD skimming codes (numbers to be worked out when the models are published)
Goal – 1 month sustained service in July: 500 MBytes/s aggregate at CERN, 60 MBytes/s at each T1; end-to-end data flow peaks of at least a factor of two at T1s; network bandwidth peaks ?? (a back-of-envelope consistency check follows)
Focus: tape-network-disk bandwidths
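As a back-of-envelope check that these targets are mutually consistent, the per-site and aggregate numbers can be related directly. The short computation below takes the site list, rates and factor-of-two peak from the goals above; the 25% protocol-overhead allowance is an illustrative assumption.

```python
#!/usr/bin/env python
"""Back-of-envelope check of the SC3 throughput goals (MB = 10**6 bytes)."""

T1_SITES = ["Fermi", "NIKHEF/SARA", "GridKa", "RAL", "CNAF", "CCIN2P3"]
PER_T1_MBS = 60.0           # sustained disk-to-tape rate per T1
CERN_AGGREGATE_MBS = 500.0  # aggregate out of CERN
PEAK_FACTOR = 2.0           # end-to-end peaks of at least 2x at T1s

sustained_sum = PER_T1_MBS * len(T1_SITES)   # 360 MB/s for six T1s
print("sum of per-T1 targets : %.0f MB/s" % sustained_sum)
print("CERN aggregate target : %.0f MB/s" % CERN_AGGREGATE_MBS)
print("headroom at CERN      : %.0f MB/s" % (CERN_AGGREGATE_MBS - sustained_sum))

# One month sustained at the aggregate rate:
seconds_per_month = 30 * 24 * 3600
volume_tb = CERN_AGGREGATE_MBS * seconds_per_month / 1e6
print("month at 500 MB/s     : ~%.0f TB moved" % volume_tb)  # ~1300 TB

# Per-T1 network needed to absorb the factor-of-two peaks
# (8 bits/byte, plus an assumed 25% protocol overhead allowance):
peak_gbps = PER_T1_MBS * PEAK_FACTOR * 8 * 1.25 / 1000
print("per-T1 link for peaks : ~%.1f Gbit/s" % peak_gbps)    # ~1.2 Gbit/s
```

The ~1.2 Gbit/s per-T1 figure lines up with the "Key dates for Connectivity" slide later in this talk, which calls for 1 Gbps at Tier-1s from Sep05.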
2005 Q2-3 – SC3 additional centres
In parallel with SC3, prepare additional centres using the 500 MByte/s test facility
Test Taipei, Vancouver, Brookhaven, additional Tier-2s
Further develop the framework software: catalogues, VO's, use experiment-specific solutions
2005 Sep-Dec – SC3 Service
50% Computing Model Validation Period
The service exercised in SC3 is made available to experiments as a stable, permanent service for computing model tests
Additional sites are added as they come up to speed
End-to-end sustained data rates: 500 MBytes/s at CERN (aggregate); 60 MBytes/s at Tier-1s; modest Tier-2 traffic
SC3 – Milestone Decomposition
File transfer goals: build up disk-disk transfer speeds to 150 MB/s (SC2 was 100 MB/s – agreed by site); include tape, with transfer speeds of 60 MB/s
Tier1 goals: bring in additional Tier1 sites wrt SC2 (PIC and Nordic most likely added later: SC4?)
Tier2 goals: start to bring Tier2 sites into the challenge; agree the services T2s offer / require; on-going plan (more later) to address this via GridPP, INFN etc.
Experiment goals: address the main offline use cases except those related to analysis, i.e. real data flow out of T0-T1-T2; simulation in from T2-T1
Service goals: include CPU (to generate files) and storage; start to add additional components – catalogs, VOs, experiment-specific solutions, 3D involvement, …; choice of software components, validation, fallback, …
SC3 – Experiment Goals
Meetings on-going to discuss goals of SC3 and experiment involvement
Focus on: first demonstrate robust infrastructure; add 'simulated' experiment-specific usage patterns; add experiment-specific components; run experiments' offline frameworks but don't preserve data; exercise primary Use Cases except analysis (SC4)
Service phase: data is preserved…
Has significant implications on resources beyond file transfer services: storage, CPU, network – both at CERN and at participating sites (T1/T2); may have different partners for experiment-specific tests (e.g. not all T1s)
In effect, experiments' usage of the SC during the service phase = data challenge
Must be exceedingly clear on goals / responsibilities during each phase!
SC3 – Experiment Involvement (cont.)
Regular discussions with experiments have started: ATLAS – at DM meetings; ALICE+CMS – every ~2 weeks; LHCb – no regular slot yet, but discussions started…
Anticipate starting first with ALICE and CMS (exactly when TBD), ATLAS and LHCb around October
T2 sites being identified in common with these experiments (more later…)
List of experiment-specific components and the sites where they need to be deployed is being drawn up
Need this on the April timeframe for adequate preparation & testing
ALICE vs. LCG Service Challenges
We could: sample (bunches of) "RAW" events stored at T0 from our catalogue; reconstruct at T0; ship from T0 to T1's; reconstruct at T1 with calibration data; store/catalogue the output
As soon as T2's start to join SC3: keep going with the reconstruction steps, plus simulate events at T2's; ship from T2 to T1's; reconstruct at T1's and store/catalogue the output
ALICE vs. LCG Service Challenges
What would we need for SC3?
AliRoot deployed on LCG/SC3 sites – ALICE
Our AliEn server, with task queue for SC3 jobs, catalogue to sample existing MC events and mimic raw data generation from the DAQ, and UI(s) for submission to LCG/SC3 – ALICE
WMS + CE/SE Services on SC3 – LCG
Appropriate amount of storage resources – LCG
Appropriate JDL files for the different tasks – ALICE
Access to the ALICE AliEn Data Catalogue from LCG
ALICE vs. LCG Service Challenges
Last step: Try the analysis of reconstructed data
That is SC4: We have some more time to think about it
ATLAS & SC3
July: SC3 phase 1 – infrastructure performance demonstration
Little direct ATLAS involvement; ATLAS observes performance of components and services to guide future adoption decisions
Input from the ATLAS database group
ATLAS & SC3
September: SC3 phase 2 – experiment testing of the computing model
Running 'real' software production and data management systems, but working with throw-away data
ATLAS production involvement; Release 11 scaling debugging
Debug scaling for distributed conditions data access, calibration/alignment, DDM, event data distribution and discovery
T0 exercise testing
ATLAS & SC3
Mid-October: SC3 phase 3 – service phase becomes a production facility
Production, adding more T2s; operations at SC3 scale producing and distributing useful data
New DDM system deployed and operating
Conduct distributed calibration/alignment scaling test; conduct T0 exercise
Progressively integrate new tier centers into the DDM system
After the T0 exercise, move to steady-state operations for T0 processing and data handling workflows
CMS & SC3
Would like to use SC3 ‘service’ asap
Are already shipping data around & processing it as foreseen in SC3 service phase
More details in e.g. the FNAL SC presentation (next…)
CMS & Service Challenge III
Service Challenge III – stated goal by LCG: achieve 50% of the nominal data rate
Evaluating tape capacity to achieve the challenge while still supporting the experiment
Add significant degrees of additional complexity:
File catalogs – OSG currently has only experiment-supported catalogs; CMS has a prototype data management system at the same time, so exactly what this means within the experiment needs to be understood
VO management software – anxious to reconcile VO management infrastructure in time for the challenge; within OSG we have started using some of the more advanced functionality of VOMS that we would clearly like to maintain
Use CMS's offline frameworks to generate the data and drive the data movement
Requirements are still under negotiation
LHCb
Would like to see robust infrastructure before getting involved
Timescale: October 2005
Expecting LCG to provide somewhat more than the other experiments
i.e. higher-level functionality than the file transfer service
Experiment plans - Summary
SC3 phases: setup and config – July + August; experiment software with throwaway data – September; then the service phase
Service phase start: CMS – July (or sooner); ALICE – July would be best…; ATLAS – mid-October; LHCb – post-October
Tier-0 exercise; distribution to Tier-1 …
A Simple T2 Model
N.B. this may vary from region to region
Each T2 is configured to upload MC data to, and download data via, a given T1
In case the T1 is logically unavailable, wait and retry – MC production might eventually stall (sketched below)
For data download, retrieve via an alternate route / T1 – which may well be at lower speed, but hopefully rare
Data residing at a T1 other than the 'preferred' T1 is transparently delivered through an appropriate network route
T1s are expected to have at least as good interconnectivity as to the T0
Each Tier-2 is associated with a Tier-1, which is responsible for getting them set up
Services at T2 are managed storage and reliable file transfer; DB component at the T1, user agent also at the T2
1 GBit network connectivity – shared (less will suffice to start with, more may be needed!)
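A minimal sketch of that retry-and-fallback behaviour, where is_available(), store_file() and fetch_file() are hypothetical hooks onto whatever storage and transfer tooling the T2 actually runs; the T1 names are examples only.

```python
#!/usr/bin/env python
"""Sketch of the simple T2 model: MC upload via the preferred T1 with
wait-and-retry, data download with fallback to alternate T1s.

is_available(), store_file() and fetch_file() are hypothetical hooks
onto the site's real tooling; T1 names are examples only.
"""
import time

PREFERRED_T1 = "RAL"             # the T2's associated T1 (example)
ALTERNATE_T1S = ["CNAF", "FZK"]  # fallback routes (examples)

def is_available(t1):
    raise NotImplementedError("probe the T1 storage/transfer endpoint")

def store_file(path, t1):
    raise NotImplementedError("upload via the reliable file transfer service")

def fetch_file(path, t1):
    raise NotImplementedError("download via the reliable file transfer service")

def upload_mc(path, retry_seconds=600):
    # MC data goes only to the preferred T1: wait and retry while it
    # is unavailable -- accepting that production might eventually stall.
    while not is_available(PREFERRED_T1):
        time.sleep(retry_seconds)
    store_file(path, PREFERRED_T1)

def download(path):
    # Downloads try the preferred T1 first, then alternate routes,
    # which may be slower but should rarely be needed.
    for t1 in [PREFERRED_T1] + ALTERNATE_T1S:
        if is_available(t1):
            return fetch_file(path, t1)
    raise RuntimeError("no T1 route currently available for " + path)
```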
Prime Tier-2 sites
For SC3 we aim for:
DESY – FZK (CMS + ATLAS)
Lancaster – RAL (ATLAS)
London – RAL (CMS)
ScotGrid – RAL (LHCb)
Torino – CNAF (ALICE)
US sites – FNAL (CMS)
Responsibility lies between T1 and T2 (+ experiments); CERN's role is limited:
Develop a manual "how to connect as a T2"
Provide relevant s/w + installation guides
Assist in workshops, training etc.
Other interested parties: Prague, Warsaw, Moscow, …
Also attacking the larger-scale problem through national / regional bodies: GridPP, INFN, HEPiX, US-ATLAS, US-CMS

Site            Tier1           Experiment
Bari, Italy     CNAF, Italy     CMS
Turin, Italy    CNAF, Italy     ALICE
DESY, Germany   FZK, Germany    ATLAS, CMS
Lancaster, UK   RAL, UK         ATLAS
London, UK      RAL, UK         CMS
ScotGrid, UK    RAL, UK         LHCb
US Tier2s       BNL / FNAL      ATLAS / CMS
Tier2 Region – Coordinating Body – Comments:
Italy – INFN: A workshop is foreseen for May, during which hands-on training on the Disk Pool Manager and File Transfer components will be held.
UK – GridPP: A coordinated effort to set up managed storage and File Transfer services is being managed through GridPP and monitored via the GridPP T2 deployment board.
Asia-Pacific – ASCC Taipei: The services offered by and to Tier2 sites will be exposed, together with a basic model for Tier2 sites, at the Service Challenge meeting held at ASCC in April 2005.
Europe – HEPiX: A similar activity will take place at HEPiX at FZK in May 2005, together with detailed technical presentations on the relevant software components.
US – US-ATLAS and US-CMS: Tier2 activities in the US are being coordinated through the corresponding experiment bodies.
Canada – Triumf: A Tier2 workshop will be held around the time of the Service Challenge meeting to be held at Triumf in November 2005.
Other sites – CERN: One or more workshops will be held to cover those Tier2 sites with no obvious regional or other coordinating body, most likely end 2005 / early 2006.
Service Goals
Expect relatively modest increase in ‘service’ components
File catalog based on agreement from Baseline Services WG
Other services agreed by BSWG
Experiment-specific components and impact on other services, e.g. Distributed Database Services, need to be clarified as soon as possible
Similarly, requirements for processing power and storage at all sites involved (T0, T1, T2)
(This is for both Service and Challenge phases: where we run the experiments’ s/w and store the output!)
Ramp-up of FTS and T1s
As expected, this is proving to be a significant amount of work!
Need to keep up momentum:
Increasing file transfer performance; Adding tape; Building up network links; Bringing additional T1s into the challenges…
File Transfer Milestones
Need to understand how the infrastructure at CERN will build up to address the file transfer goals of SC3: internal / external network infrastructure; file transfer server nodes; access to experiment files
For each Tier1, need to understand how local infrastructure and network connectivity will be built up to meet requirements
For Tier2s, assume the "Minimal Usable Model" (see earlier slide)…
Assume the DB for FTS will run at the T1, with a "client" also at the T2; transfers can be initiated from both ends (a deployment sketch follows)
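To illustrate that deployment split, here is a minimal sketch of a thin T2-side agent working against a request queue hosted in the T1's database, so that either end can initiate transfers. The connect_t1_db(), next_request() and run_transfer() helpers are hypothetical placeholders: the real FTS schema and client tools were still being defined at the time of this plan.

```python
#!/usr/bin/env python
"""Deployment sketch for the T1/T2 file-transfer split: the request
queue (database) lives at the T1, a lightweight agent runs at the T2,
and either end may pick work off the shared queue.

connect_t1_db(), next_request() and run_transfer() are hypothetical
placeholders, not the actual FTS interfaces.
"""
import time

def connect_t1_db(t1_host):
    raise NotImplementedError("open a connection to the FTS DB at the T1")

def next_request(db, site):
    # Claim the oldest queued request involving this site (in either
    # direction), or return None if the queue is empty.
    raise NotImplementedError

def run_transfer(request):
    raise NotImplementedError("hand the request to the transfer machinery")

def t2_agent(t1_host, site, poll_seconds=60):
    # The T2 runs only this thin agent; all state stays in the T1 DB,
    # so the T1 can equally well initiate transfers from its end.
    db = connect_t1_db(t1_host)
    while True:
        request = next_request(db, site)
        if request is None:
            time.sleep(poll_seconds)
        else:
            run_transfer(request)
```

Because the queue is authoritative at the T1, a T2 outage loses no requests: the agent simply resumes polling when it comes back.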
2005 Sep-Dec - SC4 preparation
In parallel with the SC3 model validation period, in preparation for the first 2006 service challenge (SC4):
Using the 500 MByte/s test facility, test PIC and Nordic T1s and the T2's that are ready (Prague, LAL, UK, INFN, ..)
Build up the production facility at CERN to 3.6 GBytes/s
Expand the capability at all Tier-1s to the full nominal data rate
2006 Jan-Aug - SC4
SC4 – full computing model services: Tier-0, ALL Tier-1s, all major Tier-2s operational at full target data rates (~2 GB/sec at Tier-0)
Acquisition – reconstruction – recording – distribution, PLUS ESD skimming, servicing Tier-2s
Goal – stable test service for one month – April 2006
100% Computing Model Validation Period (May-August 2006)
Tier-0/1/2 full model test – all experiments, 100% nominal data rate, with processing load scaled to 2006 CPUs
2006 Sep – LHC service available
The SC4 service becomes the permanent LHC service – available for experiments’ testing, commissioning, processing of cosmic data, etc.
All centres ramp up to the capacity needed at LHC startup – TWICE nominal performance
Milestone to demonstrate this 3 months before first physics data: April 2007
Key dates for Connectivity
June05 – Technical Design Report; credibility review by LHCC
Sep05 – SC3 Service: 8-9 Tier-1s sustain 1 Gbps at Tier-1s, 5 Gbps at CERN; extended peaks at 10 Gbps at CERN and some Tier-1s
Jan06 – SC4 Setup: all Tier-1s; 10 Gbps at >5 Tier-1s, 35 Gbps at CERN
July06 – LHC Service: all Tier-1s; 10 Gbps at Tier-1s, 70 Gbps at CERN
Key dates for Services
June05 – Technical Design Report
Sep05 – SC3 Service Phase
May06 – SC4 Service Phase
Sep06 – Initial LHC Service in stable operation
Additional threads started to address:
Experiment involvement; bringing T2s into SC3; longer-term goals of bringing all T2s into the LCG Service (Challenges)
The enthusiasm and support provided to these new activities is much appreciated
We have a lot of work ahead…
…but the problem is beginning to become tractable(?)
Conclusions
To be ready to fully exploit LHC, significant resources need to be allocated to a series of Service Challenges by all concerned parties
These challenges should be seen as an essential on-going and long-term commitment to achieving production LCG
The countdown has started – we are already in (pre-)production mode
Next stop: 2020