Steve Lloyd IoP Dublin March 2005 Slide 1
The Grid for Particle Physic(ist)s
Steve Lloyd
Queen Mary, University of London
What is it and how do I use it?
Slide 2
Starting from this event…
We are looking for this “signature”
Selectivity: 1 in 10^13
Like looking for one person in a thousand world populations
Or for a needle in 20 million haystacks!
LHC Data Challenge
• ~100,000,000 electronic channels
• 800,000,000 proton-proton interactions per second
• 0.0002 Higgs per second
• 10 PBytes of data a year
• (10 Million GBytes = 14 Million CDs)
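The quoted numbers hang together; a quick back-of-envelope check (assuming ~0.7 GB per CD, which is my assumption, not stated on the slide):

```python
# Back-of-envelope check of the LHC Data Challenge numbers above.
interactions_per_second = 800_000_000   # proton-proton interactions per second
higgs_per_second = 0.0002               # Higgs produced per second

# Interactions per Higgs: the "1 in 10^13" selectivity, to order of magnitude
selectivity = interactions_per_second / higgs_per_second
print(f"1 Higgs in {selectivity:.0e} interactions")

# 10 PBytes/year = 10 million GBytes; at ~0.7 GB per CD that is ~14 million CDs
gbytes_per_year = 10 * 1_000_000
cds_per_year = gbytes_per_year / 0.7
print(f"about {cds_per_year / 1e6:.0f} million CDs")
```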
Computing Solution: The Grid
Slide 3
The Grid
Ian Foster / Carl Kesselman:
"A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."
'Grid' means different things to different people
All agree it’s a funding opportunity!
Slide 4
Electricity Grid
Analogy with the Electricity Power Grid
'Standard Interface'
Power Stations
Distribution Infrastructure
Slide 5
Computing Grid
Computing and Data Centres
Fibre Optics of the Internet
Slide 6
Middleware
[Diagram: a single PC alongside the Grid]

Single PC:
  Programs (Word/Excel, Email/Web, Games, Your Program)
  Operating System
  CPU, Disks etc

Grid:
  Your Program (submitted from a User Interface Machine)
  Middleware: Resource Broker, Information Service, Replica Catalogue, Bookkeeping Service
  CPU Clusters, Disk Servers

Middleware is the Operating System of a distributed computing system.
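The Resource Broker's matchmaking role can be sketched in miniature (a toy illustration only, not the real EDG middleware; the site names, tags and free-CPU counts are invented):

```python
# Toy sketch of Resource Broker matchmaking - illustration only, not the
# real EDG middleware. Site names and numbers are made up.
sites = {
    "site-a": {"free_cpus": 40, "tags": {"VO-atlas-release-9.0.4"}},
    "site-b": {"free_cpus": 10, "tags": {"VO-atlas-release-8.0.1"}},
    "site-c": {"free_cpus": 25, "tags": {"VO-atlas-release-9.0.4"}},
}

def broker(required_tag):
    """Pick the matching site with the most free CPUs, like a (much
    simplified) Resource Broker consulting the Information Service."""
    matches = [name for name, info in sites.items()
               if required_tag in info["tags"]]
    return max(matches, key=lambda name: sites[name]["free_cpus"],
               default=None)
```

A real broker also weighs data location (via the Replica Catalogue), queue lengths and site policy; this sketch keeps only the tag-matching step.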
Slide 7
GridPP
19 UK Universities, CCLRC (RAL & Daresbury) and CERN
Funded by the Particle Physics and Astronomy Research Council (PPARC)
GridPP1 - 2001-2004 £17m "From Web to Grid"
GridPP2 - 2004-2007 £16m "From Prototype to Production"
Not planning GridPP3 – aim is to incorporate Grid activities and facilities into baseline programme.
Slide 8
International Collaboration
• EU DataGrid (EDG) 2001-2004 – Middleware Development Project
• US and other Grid projects – Interoperability
• LHC Computing Grid (LCG) – Grid Deployment Project for LHC
• EU Enabling Grids for e-Science (EGEE) 2004-2006 – Grid Deployment Project for all disciplines
[Diagram: GridPP's place within LCG and EGEE]
Slide 9
GridPP Support
Manpower for Experiments
Manpower for Middleware Development:
• Metadata
• Storage
• Workload Management
• Security
• Information and Monitoring
• Networking
Hardware and Manpower at RAL (LHC Tier-1, BaBar Tier-A)
Manpower for System Support at Institutes (Tier-2s)
Manpower for LCG at CERN (under discussion)
(Not directly supported but using LCG)
Slide 10
Paradigm Shift?
LHCb Monte Carlo Production
[Pie charts: Grid vs Non-Grid split of production, month by month]
May: 89% : 11%
Jun: 80% : 20%
Jul: 77% : 23%
Aug: 27% : 73%
Slide 11
Tier Structure
Tier 0: CERN computer centre (online system, offline farm)
Tier 1: National centres – RAL (UK), France, Italy, Germany, USA
Tier 2: Regional groups – ScotGrid, NorthGrid, SouthGrid, London
Tier 3: Institutes – e.g. Glasgow, Edinburgh, Durham
Tier 4: Workstations
Slide 12
[Chart: RAL Linux CSF weekly CPU utilisation, financial year 2000/01 (platform-related CPU hours, 0–18,000). Annotated "Resource Discovery at Tier-1" between 1 July 2000 and 1 October 2000 – the pre-Grid era.]
[Chart: GRID load, 21-28 July 2004 – full again in 8 hours!]
Slide 13
UK Tier-2 Centres
ScotGrid: Durham, Edinburgh, Glasgow
NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
London: Brunel, Imperial, QMUL, RHUL, UCL
Mostly funded by HEFCE
Slide 14
Tier-2 Resources
Committed Resources at each Tier-2 in 2007, compared with the Experiments' Requirement of a Tier-2 in 2008:

            |        CPU           |        Disk
            | ALICE ATLAS CMS LHCb | ALICE ATLAS CMS  LHCb
  London    |  0.0   1.0  0.8  0.4 |  0.0   0.2  0.3  11.0
  NorthGrid |  0.0   2.5  0.0  0.3 |  0.0   1.3  0.0  12.1
  ScotGrid  |  0.0   0.2  0.0  0.2 |  0.0   0.0  0.0  39.6
  SouthGrid |  0.2   0.5  0.2  0.3 |  0.0   0.1  0.0   6.8

Need SRIF3 Resources! (The table doesn't include SRIF3.)
Experiment shares determined by the Institutes who bought the kit.
Overall LCG shortfall: ~30% in CPU, ~50% in Disk (all Tiers).
Slide 15
The LCG Grid
123 Sites
33 Countries
10,314 CPUs
3.3 PBytes Disk
Slide 16
Grid Demo
http://www.hep.ph.ic.ac.uk/e-science/projects/demo/index.html
Slide 17
Getting Started
1. Get a digital certificate: http://ca.grid-support.ac.uk/
   (Authentication – who you are)
2. Join a Virtual Organisation (VO): http://lcg-registrar.cern.ch/
   For LHC join LCG and choose a VO.
   (Authorisation – what you are allowed to do)
3. Get access to a local User Interface Machine (UI) and copy your files and certificate there.
Slide 18
Job Preparation
Prepare a file of Job Description Language (JDL):

############# athena.jdl #################
Executable = "athena.sh";
StdOutput = "athena.out";
StdError = "athena.err";
InputSandbox = {"athena.sh", "MyJobOptions.py", "MyAlg.cxx", "MyAlg.h",
                "MyAlg_entries.cxx", "MyAlg_load.cxx", "login_requirements",
                "requirements", "Makefile"};
OutputSandbox = {"athena.out", "athena.err", "ntuple.root", "histo.root",
                 "CLIDDBout.txt"};
Requirements = Member("VO-atlas-release-9.0.4",
                      other.GlueHostApplicationSoftwareRunTimeEnvironment);
##########################################

Executable: the script to run.
InputSandbox: the input files – my C++ code, job options and the script to run.
OutputSandbox: the output files to bring back.
Requirements: choose the ATLAS version (satisfied by ~32 sites).
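Writing JDL by hand is easy to get wrong (quoting, braces, semicolons); a small hypothetical helper (`make_jdl` is my own name, not part of the Grid tools) can assemble a minimal file of this shape from Python lists:

```python
# Hypothetical helper (not part of the Grid tools) that assembles a minimal
# EDG-style JDL string like athena.jdl above.
def make_jdl(executable, stdout, stderr, inputs, outputs, release_tag):
    def jdl_list(items):
        # Quote each entry and wrap the list in braces, JDL-style.
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"
    return "\n".join([
        f'Executable = "{executable}";',
        f'StdOutput = "{stdout}";',
        f'StdError = "{stderr}";',
        f"InputSandbox = {jdl_list(inputs)};",
        f"OutputSandbox = {jdl_list(outputs)};",
        f'Requirements = Member("{release_tag}", '
        "other.GlueHostApplicationSoftwareRunTimeEnvironment);",
    ])

jdl = make_jdl("athena.sh", "athena.out", "athena.err",
               ["athena.sh", "MyJobOptions.py"],
               ["athena.out", "athena.err", "ntuple.root"],
               "VO-atlas-release-9.0.4")
```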
Slide 19
Job Submission
Make a copy of your certificate to send out (~ once a day):

[lloyd@lcgui ~/atlas]$ grid-proxy-init
Your identity: /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=steve lloyd
Enter GRID pass phrase for this identity:
Creating proxy .............................. Done
Your proxy is valid until: Thu Mar 17 03:25:06 2005
[lloyd@lcgui ~/atlas]$

Submit the job, giving the VO (--vo atlas), a file to hold job IDs (-o jobIDfile) and the JDL file:

[lloyd@lcgui ~/atlas]$ edg-job-submit --vo atlas -o jobIDfile athena.jdl
Selected Virtual Organisation name (from --vo option): atlas
Connecting to host lxn1188.cern.ch, port 7772
Logging to host lxn1188.cern.ch, port 9002
================================ edg-job-submit Success ====================================
 The job has been successfully submitted to the Network Server.
 Use edg-job-status command to check job current status.
 Your job identifier (edg_jobId) is:

 - https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ

 The edg_jobId has been saved in the following file:
 /home/lloyd/atlas/jobIDfile
============================================================================================
[lloyd@lcgui ~/atlas]$
Slide 20
Job Status

Find out its status:

[lloyd@lcgui ~/atlas]$ edg-job-status -i jobIDfile
------------------------------------------------------------------
1 : https://lxn1188.cern.ch:9000/tKlZHxqEhuroJUhuhEBtSA
2 : https://lxn1188.cern.ch:9000/IJhkSObaAN5XDKBHPQLQyA
3 : https://lxn1188.cern.ch:9000/BMEOq90zqALvkriHdVeN7A
4 : https://lxn1188.cern.ch:9000/l6wist7SMq6jVePwQjHofg
5 : https://lxn1188.cern.ch:9000/wHl9Yl_puz9hZDMe1OYRyQ
6 : https://lxn1188.cern.ch:9000/PciXGNuAu7vZfcuWiGS3zQ
7 : https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ
a : all
q : quit
------------------------------------------------------------------
Choose one or more edg_jobId(s) in the list - [1-7]all:7
*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ
Current Status:   Done (Success)
Exit code:        0
Status Reason:    Job terminated successfully
Destination:      lcg00125.grid.sinica.edu.tw:2119/jobmanager-lcgpbs-short
reached on:       Wed Mar 16 17:45:41 2005
*************************************************************
[lloyd@lcgui ~/atlas]$

Ran at: Taiwan (the map on the slide also shows jobs at RAL, Valencia and CERN)
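With several jobs in flight it helps to pull the "Current Status" field out of the edg-job-status text programmatically; a sketch, assuming the output format shown above:

```python
import re

# Sketch: extract the "Current Status" field from edg-job-status output,
# so many jobs can be checked in a loop. The text layout shown on the
# slide is assumed.
def current_status(text):
    match = re.search(r"Current Status:\s*(.+)", text)
    return match.group(1).strip() if match else None

sample = (
    "Status info for the Job : https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ\n"
    "Current Status:   Done (Success)\n"
    "Exit code:        0\n"
)
```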
Slide 21
Job Retrieval
Retrieve the Output:

[lloyd@lcgui ~/atlas]$ edg-job-get-output -dir . -i jobIDfile
Retrieving files from host: lxn1188.cern.ch ( for https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ )
*********************************************************************************
                        JOB GET OUTPUT OUTCOME

 Output sandbox files for the job:
 - https://lxn1188.cern.ch:9000/0uDjtwbBbj8DTRetxYxoqQ
 have been successfully retrieved and stored in the directory:
 /home/lloyd/atlas/lloyd_0uDjtwbBbj8DTRetxYxoqQ
*********************************************************************************

[lloyd@lcgui ~/atlas]$ ls -lt /home/lloyd/atlas/lloyd_0uDjtwbBbj8DTRetxYxoqQ
total 11024
-rw-r--r--  1 lloyd  hep       224 Mar 17 10:47 CLIDDBout.txt
-rw-r--r--  1 lloyd  hep     69536 Mar 17 10:47 ntuple.root
-rw-r--r--  1 lloyd  hep      5372 Mar 17 10:47 athena.err
-rw-r--r--  1 lloyd  hep  11185282 Mar 17 10:47 athena.out
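After retrieval it is worth checking that every file named in the OutputSandbox actually came back; a small sketch (the file list is taken from the JDL example earlier):

```python
from pathlib import Path

# Sketch: report which OutputSandbox files did not arrive in the
# directory that edg-job-get-output created.
def missing_outputs(directory, expected):
    d = Path(directory)
    return [name for name in expected if not (d / name).exists()]

output_sandbox = ["athena.out", "athena.err", "ntuple.root",
                  "histo.root", "CLIDDBout.txt"]
```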
Slide 22
Conclusions
• The Grid is here – it works!
• Currently difficult to install and maintain the middleware and the experiments' software
• It is straightforward to use
• There are huge resources available: last week LXBATCH had 6,500 ATLAS jobs queued – LCG had 3,017 free CPUs
• Need to scale to full size: ~10,000 → 100,000 CPUs
• Need stability, robustness, security (a hackers' paradise!) etc
• Need continued funding beyond the start of the LHC!

Use it!
Slide 23
Further Info
http://www.gridpp.ac.uk