Robin Middleton
RAL-PPD/EGEE/GridPP
Grid Computing
A high-level look at Grid Computing in the world of Particle Physics and at the LHC in particular.
I am indebted to the EGEE, LCG and GridPP projects and to colleagues therein for much
of the material presented here.
2
Overview
• e-Science and the Grid
• Grids in Particle Physics
  – EGEE, LCG, GridPP
  – Virtual Organisations
• Computing Model (very high level !)
• Components of the EGEE/LCG/GridPP Grid
  – security
  – information service
  – resource brokering
  – data management
• Monitoring & User Support
• Other Projects / Sciences
• Sustainability & EGI
• Further Information
  – Links
3
What is e-Science ?
What is the Grid ?
e-Science
• …also : e-Infrastructure, cyberinfrastructure, e-Research, …
• Includes
  – grid computing (e.g. WLCG, EGEE, EGI, OSG, TeraGrid, NGS…)
    • computationally and/or data intensive; highly distributed over a wide area
  – digital curation
  – digital libraries
  – collaborative tools (e.g. Access Grid)
  – …many other areas
• Most UK Research Councils active in e-Science
  – BBSRC
  – NERC (e.g. climate studies, NERC DataGrid)
  – ESRC (e.g. NCeSS)
  – AHRC (e.g. studies in collaborative performing arts)
  – EPSRC (e.g. eMinerals, MyGrid, …)
  – STFC (formerly PPARC and CCLRC) (e.g. GridPP, AstroGrid)
5
e-Science in 2000
• Dr John Taylor (former Director General of Research Councils, Office of Science and Technology)
  – "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
  – "e-Science will change the dynamic of the way science is undertaken."
• SR2000 e-Science Budgets
  – £80m Collaborative projects
  – Generic Challenges : EPSRC (£15m), DTI (£15m)
  – Industrial Collaboration (£40m)
  – Academic Application Support Programme : Research Councils (£74m), DTI (£5m)
    • PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
And 9 Years on…
• An independent panel of international experts has judged the UK's e-Science Programme as "world-leading", citing that "investments are already empowering significant contributions to wellbeing in the UK and the world beyond".
“The panel found the e-Science Programme to have had a positive economic impact, especially in the important areas of life sciences and medicine, materials, and energy and sustainability. Attractive to industry from its inception, the programme has drawn in around £30 million from industrial collaborations, both in cash and in-kind. Additionally it has already contributed to 138 stakeholder collaborations, 30 licenses or patents, 14 spin-off companies and 103 key results taken up by industry and early indications show there are still more to come.”
http://www.rcuk.ac.uk/news/100210.htm
6
Grids, clouds, supercomputers, etc.
7
(Ack: Bob Jones, EGEE Project Director, October 2009)
Grids
• Collaborative environment
• Distributed resources (political/sociological)
• Commodity hardware (also supercomputers)
• (HEP) data management
• Complex interfaces (bug not feature)

Supercomputers
• Expensive
• Low-latency interconnects
• Applications peer reviewed
• Parallel/coupled applications
• Traditional interfaces (login)
• Also SC grids (DEISA, TeraGrid)

Clouds
• Proprietary (implementation)
• Economies of scale in management
• Commodity hardware
• Virtualisation for service provision and encapsulating application environment
• Details of physical resources hidden
• Simple interfaces (too simple?)

Volunteer computing
• Simple mechanism to access millions of CPUs
• Difficult if (much) data involved
• Control of environment ✓
• Community building – people involved in Science
• Potential for huge amounts of real work
Many different problems : amenable to different solutions
No right answer
Consider ALL as a combined e-Infrastructure ecosystem
Aim for interoperability and combine the resources into a consistent whole
Grids and Clouds
• GridPP4 will take a closer look at clouds
• Issues
  – relative costs
  – I/O bandwidth : getting the data into the cloud in the first place !
  – data security : entrusting our data to an external body
8
© GridTalk
9
What is the Grid ?
• Much more than the web…
• "Grid computing [is] distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation... we review the "Grid problem", which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations."
  – from "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke
• “The Web on Steroids” !
10
What is the Grid ?
• The Grid : Blueprint for a New Computing Infrastructure (Ian Foster & Carl Kesselman)
  – "A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities."
– http://www.mkp.com/mk/default.asp?isbn=1558604758
• What is the Grid ? A Three Point Checklist (Ian Foster)
  i) Co-ordinates resources that are not subject to centralised control
     – see the EGEE/LCG/GridPP Grid
  ii) …using standard, open, general-purpose protocols and interfaces
     – see Open Grid Forum, X.509 (also Globus, Condor, gLite)
  iii) …to deliver nontrivial qualities of service
     – see LCG MoU, service availability, resource promises (CPU, storage, network)
  – http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
11
Haven't we been here before ?
• Multiple standards
• Interoperability
• Distributed resource sharing
• No common management
• Concurrent access
• Flexible

Analogies continue…
• Far from perfect
• Much room for improvement
• A "trip" hazard !
Acknowledgement: J.Gordon
12
Grids in Particle Physics
(with the LHC as an example)
13
EGEE / LCG / GridPP

LCG – LHC Computing Grid
  Distributed production environment for physics data processing
  In 2007 : 100,000 CPUs, 15 PB/yr, 5,000 physicists, 500 institutes

EGEE – Enabling Grids for E-sciencE
  Starts from the LCG infrastructure
  Production Grid in 27 countries
  HEP, BioMed, CompChem, Earth Science, …

GridPP – Grid computing for HEP in the UK
  Major contributor to LCG & EGEE
  19 institutes
14
EGEE-III
• 140 partners in 33 countries
• ~32M€ (EU); 2 years
• ~250 sites in 45 countries
• >75,000 CPU-cores
• >270 VOs
• ~210k jobs/day (peak 230k)
• ~76 million jobs run in the year to Aug '08
[Charts: CPU time delivered (CPU months) per EGEE federation, with UK/Ireland highlighted (~x2 growth); number of jobs per month, reaching ~231k jobs/day]
15
EGEE-III (same statistics as on the previous slide)
• "Other VOs" ~30k jobs/day
EGEE “reach” in 2008
16
LCG – LHC Computing Grid
• Worldwide LHC Computing Grid
• Framework to deliver distributed computing for the LHC experiments
  – Middleware / Deployment
  – Service / Data Challenges
  – Security
  – Applications Software
  – Distributed Analysis
  – Private Optical Network
  – MoUs
• Coverage
  – Europe : EGEE
  – USA : OSG
  – Asia : NAREGI, Taipei, China…
  – Other…
17
GridPP
• Integrated within the LCG/EGEE framework
• UK service operations (LCG/EGEE)
  – Tier-1 & Tier-2s
• HEP applications : integration & exploitation by experiments
  – @ LHC, FNAL, SLAC
  – GANGA (LHCb & ATLAS)
• Increasingly close working with NGS
• Phase 1 : 2001-2004
  – Prototype (Tier-1)
• Phase 2 : 2004-2008
  – "From Prototype to Production"
  – Production (Tier-1 & 2)
• Phase 3 : 2008-2011
  – "From Production to Exploitation"
  – Reconstruction, Monte Carlo, Analysis
• Phase 4 : 2011-2014…
  – routine operation during LHC running
[Chart: Tier-1 farm usage]
18
Virtual Organisations
19
LHC Data Rates
• 40 MHz beam-crossing rate
• ~10^7 detector channels
  → ~10^15 bytes/s off the detector
• Trigger reduces the rate by ~10^7
• A few MB per event, ~100 Hz "to tape"
• ~10^7 s of data taking in a year !!
  → 10^8 x 10^7 ≈ 10^15 bytes ≈ 1 PB/yr per experiment
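
A quick back-of-the-envelope check of that number (a sketch with assumed round values, not an official experiment figure):

    # order-of-magnitude estimate of the yearly raw data volume per LHC experiment
    event_rate_hz = 100      # events written "to tape"
    event_size_b  = 1e6      # ~1 MB per event ("a few MByte/event" in round numbers)
    live_seconds  = 1e7      # ~10^7 s of data taking per year
    volume_bytes  = event_rate_hz * event_size_b * live_seconds
    print(volume_bytes / 1e15, "PB per experiment per year")   # ~1 PB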
20
LHC Computing Model
[Diagram: the tiered LHC computing model – CERN Tier 0 at the centre; national Tier 1 centres (Germany, USA, UK, France, Italy, …, plus a CERN Tier 1); Tier 2 centres serving regional groups of labs and universities (Lab a, Uni a, …); Tier 3 resources in physics departments; desktops for the individual physicist; physics groups map onto the lower tiers]
21
22
Tier Centres & the UK
23
A typical Access Pattern
[Diagram: data tiers and access rates for one year of acquisition and analysis by a typical LHC experiment –
  Raw data ~1000 TB → Reco-V1 ~1000 TB and Reco-V2 ~1000 TB → ESD-V1.1, V1.2, V2.1, V2.2 ~100 TB each → many AOD sets ~10 TB each.
  Aggregate average access rates : 100 MB/s (2-5 physicists), 500 MB/s (5-10 physicists), 1000 MB/s (~50 physicists), 2000 MB/s (~150 physicists)]
24
Principal Components
25
The Middleware Stack
• Application level services : user interfaces, applications
  – GANGA etc.; EGEE/LCG & experiments
• "Collective" services : job management, data management, information system, application monitoring
  – Resource Brokers, FTS, LFC, Dashboards, SAM
• "Basic" services : user access, security, data transfer, information schema
  – gLite (+ Globus, Condor, etc.), X.509
• System software : operating system, local scheduler, file system
  – Scientific Linux; PBS, Condor, LSF, …; NFS, AFS, …
• Hardware : computing cluster, network resources, data storage
26
gLite
27
Security – Job Submission
[Diagram: actors – user (user certificate), VO, MyProxy server, WMS, information system, CEs (host certificates)]
1. VO affiliation is managed by the VO (AccessControlBase); the information system publishes which CEs authorise which VOs (low frequency)
2. The user uploads a long-lived credential to a MyProxy server (low frequency)
3. Jobs are submitted to the WMS using a short-lived proxy (high frequency)
4. "CEs for VOs in authz?" – the VO credential is used by the resource broker to pre-select the available CEs
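
On a gLite User Interface the credential side of this picture is driven by two commands; a minimal sketch (the VO name and MyProxy host below are placeholders, not from the slides):

    voms-proxy-init --voms myvo                 # create a short-lived VOMS proxy from the user certificate
    myproxy-init -d -n -s myproxy.example.org   # store a longer-lived credential so the WMS can renew the proxy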
28
Security – Running a Job
[Diagram: actors – MyProxy server (long-term certificate), WMS, CE (host certificate, proxy), VO]
1. Cert download : the WMS retrieves a renewed proxy from the MyProxy server when needed
2. Job start : the CE checks the authentication & authorisation information (LCAS/LCMAPS)
• LCAS : authorisation based on (multiple) VO/group/role attributes
• LCMAPS : mapping to a pool account and to (multiple) groups
  – default VO → default UNIX group
  – other VO/group/role → other UNIX group(s)
• The VOMS credential created with voms-proxy-init is thus used for authorisation and mapping on the CE
29
Information System
• At the core of making the Grid function
• Hierarchy of distributed BDII/LDAP servers
• Information organised using the GLUE Schema
30
Information System - LDAP
• Lightweight Directory Access Protocol :
  – structures data as a tree (DIT = Directory Information Tree)
  – following the path from a node back to the root of the DIT builds a unique name, the DN :
    "id=ano, ou=PPD, or=STFC, st=Chilton, c=UK, o=grid"
• Example DIT :
  – o=grid (root of the DIT)
    – c=US, c=UK, c=Spain
      – st=Chilton
        – or=STFC
          – ou=PPD, ou=ESC
            – leaf entry : objectClass: person; cn: A.N.Other; phone: 5555666; office: R1-3.10
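
Because a BDII is simply an LDAP server publishing the GLUE schema, it can be queried with standard LDAP tools. A minimal sketch (the BDII host name is a placeholder; the attributes are from the GLUE 1.x schema):

    ldapsearch -x -H ldap://top-bdii.example.org:2170 -b o=grid \
        '(objectClass=GlueCE)' GlueCEUniqueID GlueCEStateFreeCPUs
    # lists every published Computing Element and the number of free CPUs it advertises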
31
WMS – Workload Management System
• The WMS is composed of the following parts :
  1. User Interface (UI) : access point for the user to the WMS
  2. Resource Broker (RB) : the broker of Grid resources, responsible for finding the "best" resources to which to submit jobs
  3. Job Submission Service (JSS) : provides a reliable submission system
  4. Information Index (BDII) : a server (based on LDAP) which collects information about Grid resources – used by the Resource Broker to rank and select resources
  5. Logging and Bookkeeping services (LB) : store job information, available for users to query
• (However, this is evolving with the moves to the gLite RB and the gLite CE !)
Example JDL – "hello world" :

    Executable    = "/bin/echo";
    Arguments     = "Good Morning";
    StdError      = "stderr.log";
    StdOutput     = "stdout.log";
    OutputSandbox = {"stderr.log", "stdout.log"};

Example JDL – with an input sandbox, input data and resource requirements :

    Executable         = "gridTest";
    StdError           = "stderr.log";
    StdOutput          = "stdout.log";
    InputSandbox       = {"/home/robin/test/gridTest"};
    OutputSandbox      = {"stderr.log", "stdout.log"};
    InputData          = "lfn:testbed0-00019";
    DataAccessProtocol = "gridftp";
    Requirements       = other.Architecture == "INTEL" &&
                         other.OpSys == "LINUX" && other.FreeCpus >= 4;
    Rank               = "other.GlueHostBenchmarkSF00";
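
With the gLite WMS command-line clients a JDL file like the one above is typically submitted and followed like this (a sketch; the file names are illustrative):

    glite-wms-job-submit -a -o jobids hello.jdl   # delegate a proxy automatically, append the job ID to 'jobids'
    glite-wms-job-status -i jobids                # query the Logging & Bookkeeping service for the job state
    glite-wms-job-output -i jobids                # retrieve the output sandbox (stdout.log, stderr.log) once Done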
32
Data Management
• DPM – Disk Pool Manager
  – also (dCache), CASTOR
• LFC – LCG File Catalogue
• FTS – File Transfer Service
33
Storage - DPM
• Disk Pool Manager : lightweight disk-only storage element
  – disk-only storage with a focus on manageability
• Features
  – secure : authentication via GSI or Kerberos 5, authorisation via VOMS
  – full POSIX ACL support with DN (userid) and VOMS groups
  – disk pool management (direct socket interface)
  – storage name space (aka storage file catalog)
  – DPM can act as a site-local replica catalog
  – SRMv1, SRMv2.1 and SRMv2.2
  – gridFTP, rfio
• Other Storage Element technologies…
  – dCache
  – CASTOR
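
For orientation, the DPM storage name space is usually browsed with the DPM name-server client tools; a minimal sketch (head-node host, domain and VO path are placeholders):

    export DPNS_HOST=dpm.example.org              # DPM head node running the name server
    dpns-ls -l /dpm/example.org/home/myvo         # list the storage name space (POSIX-like ACLs apply)
    dpns-mkdir /dpm/example.org/home/myvo/test    # create a directory, subject to VOMS-based authorisation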
34
File Catalogues
• LFC
  – secure (authn : GSI, authz : VOMS) file and replica catalogue; DLI
  – supports full POSIX namespace and ACLs
  – central file catalogue and local file catalogue modes
• Fireman
  – secure (authn : GSI, authz : VOMS/ACL) file, replica and meta-data catalogue
  – data location interface (DLI) for the WMS
  – web-service interface with bulk operations
• AMGA
  – grid meta-data catalogue
  – streaming socket interface

Glossary :
  SURL = Storage URL
  GUID = Global Unique ID
  LFN  = Logical File Name
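
In day-to-day use the LFC is mostly exercised indirectly through the lcg-utils commands, which combine the catalogue and storage operations. A minimal sketch (VO name, SE host and LFN are placeholders):

    lcg-cr --vo myvo -d se.example.org -l lfn:/grid/myvo/user/test.dat file:///home/user/test.dat
                                       # copy a local file to a Storage Element and register it in the LFC
    lcg-lr --vo myvo lfn:/grid/myvo/user/test.dat                        # list the replicas (SURLs)
    lcg-cp --vo myvo lfn:/grid/myvo/user/test.dat file:///tmp/copy.dat   # copy a replica back locally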
35
File Transfer Service
• The File Transfer Service is a data-movement fabric service
  – multi-VO service, used to balance the usage of site resources according to VO and site policies
  – uses the SRM and gridFTP services of an SE
• Why is it needed ?
  – for the user, it provides reliable point-to-point movement of Storage URLs (SURLs) between Storage Elements
  – for the site manager, it provides a reliable and manageable way of serving file-movement requests from their VOs
  – for the VO manager, it provides the ability to control requests coming from users (re-ordering, prioritisation, ...)
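
From the user's side an FTS job is just a source/destination SURL pair handed to the service; a sketch with the gLite FTS command-line clients (the endpoint URL and SURLs are placeholders):

    FTS=https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer
    glite-transfer-submit -s $FTS \
        srm://se1.example.org/dpm/example.org/home/myvo/test.dat \
        srm://se2.example.org/dpm/example.org/home/myvo/test.dat    # prints a transfer-job ID
    glite-transfer-status -s $FTS <job-id>                          # poll until Done/Failed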
36
Grid Portals
37
Ganga
• Job definition & management
• Implemented in Python
• Extensible – plug-ins
• Used by ATLAS, LHCb & non-HEP communities (example below)
http://ganga.web.cern.ch/ganga/index.php
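
Within a Ganga session, jobs are ordinary Python objects; a minimal sketch in the style of Ganga's interface (the plug-in names follow Ganga's built-in Executable application and LCG backend, but treat the details as illustrative):

    # run inside a 'ganga' session, where Job, Executable and LCG are already defined
    j = Job()
    j.application = Executable(exe='/bin/echo', args=['Good Morning'])
    j.backend = LCG()          # submit via the gLite/LCG workload management system
    j.submit()
    print(j.status)            # 'submitted', 'running', 'completed', ...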
38
GENIUS (INFN)
http://grid.infn.it/modules/italian/index.php?pagenum=6
39
Monitoring the Grid
User Support
40
Monitoring
• Lots and lots of it !
  – SAM – Service Availability Monitor
    • https://twiki.cern.ch/twiki/bin/view/LCG/SAMOverview
  – Network monitoring – GridMon
    • http://gridmon.dl.ac.uk/gridmon/graph.html
  – Google Maps & real-time monitors
  – Grid Map
    • http://gridmap.cern.ch/gm/
  – Application level
    • ARDA Dashboard
      – CMS Dashboard
      – ATLAS Dashboard
      – LHCb Dashboard
      – ALICE Dashboard
41
Google Map – Site Status
• http://goc02.grid-support.ac.uk/googlemaps/sam.html
42
Google Map – Site Status
43
LCG Real-time Monitor
44
LCG Real-time Monitor
45
GridMap (2008)
GridMap (2009)
46
47
ARDA Dashboard http://dashboard.cern.ch/
• Used by all 4 LHC experiments to monitor jobs and file movements
UK Status (& SAM Tests)
51
52
User Support
• Documentation
  – oodles of it !
  – but much room for improvement !
• Experiment (VO) specific contacts
• GGUS – Global Grid User Support
  – ticket based
  – linked to
    • regional centres
    • software experts
53
Other Projects
Other Sciences
54
EGEE Related Infrastructure Projects
• DEISA, TeraGrid
• Coordination in SA1 for : EELA, BalticGrid, EUMedGrid, EUChinaGrid, SEE-GRID
• Interoperation with : OSG, NAREGI
• SA3 : DEISA, ARC, NAREGI
55
EGEE Collaborating Projects
• Applications : improved services for academia, industry and the public
• Support Actions : key complementary functions
• Infrastructures : geographical or thematic coverage
EGEE - Communities
• Astronomy & Astrophysics
  – large-scale data acquisition, simulation, data storage/retrieval
• Computational Chemistry
  – use of software packages (incl. commercial) on EGEE
• Earth Sciences
  – seismology, atmospheric modelling, meteorology, flood forecasting, pollution
• Fusion (build-up to ITER)
  – ion kinetic transport, massive ray tracing, stellarator optimisation
• Grid Observatory
  – collect data on Grid behaviour (Computer Science)
• High Energy Physics
  – four LHC experiments, BaBar, D0, CDF, Lattice QCD, Geant4, SixTrack, …
• Life Sciences
  – medical imaging, bioinformatics, drug discovery
  – WISDOM – drug discovery for neglected / emergent diseases (malaria, H5N1, …)
56
ESFRI Projects
• Many are starting to look at their e-Science needs
  – some at a similar scale to the LHC (petascale)
– project design study stage
– http://cordis.europa.eu/esfri/
57
Cherenkov Telescope Array
National e-Science Centre and other e-Science Centres
• Edinburgh & Glasgow collaboration
• e-Science Institute
• Lectures & presentations
• Meeting place
• NeSC Mission Statement
  – To stimulate and sustain the development of e-Science in the UK, to contribute significantly to its international development and to ensure that its techniques are rapidly propagated to commerce and industry.
  – To identify and support e-Science projects within and between institutions in Scotland, and to provide the appropriate technical infrastructure and support in order to ensure rapid uptake of e-Science techniques by Scottish scientists.
  – To encourage the interaction and bi-directional flow of ideas between computing science research and e-Science applications.
  – To develop advances in scientific data curation and analysis and to be a primary source of top-quality systems and repositories that enable management, sharing and best use of research data.
58
Digital Curation
59
• Digital Curation Centre
  – Edinburgh, NeSC, HATII, UKOLN, STFC
  – Objectives
    • Provide strategic leadership in digital curation and preservation for the UK research community, with particular emphasis on science data
    • Influence and inform national and international policy
    • Provide advocacy and expert advice and guidance to practitioners and funding bodies
    • Create, manage and develop an outstanding suite of resources and tools
    • Raise the level of awareness and expertise amongst data creators and curators, and other individuals with a curation role
    • Strengthen community curation networks and collaborative partnerships
    • Continue our strong association with our research programme
• Particle Physics
  – study group / workshops (DESY & SLAC) in 2009 -> intermediate report to ICFA
60
Sustainability
EGI – European Grid Infrastructure
61
Where Next ?
[Diagram: evolution from testbeds, through routine usage, to a utility service – from national, to European e-Infrastructure, to global]
Move to EGI/NGI
• De-centralised
  – emphasises NGIs
  – still some centralised tasks
• Governed by NGIs
• Initial co-funding from the EU
• For all disciplines
  – sciences, humanities, …
62
e-IRG Recommendation, 12/2005 :
"The e-IRG recognizes that the current project-based financing model of grids (e.g., EGEE, DEISA) presents continuity and interoperability problems, and that new financing and governance models need to be explored – taking into account the role of national grid initiatives as recommended in the Luxembourg e-IRG meeting."
• Specialised Support Centres
  – for VOs / disciplines (e.g. HEP)
  – externally funded
EGI - Management Structure
63
EGI - Tasks
• Accounting, Security, User Support, Problem Tracking, Middleware Testing, Deployment, VO Registration, Monitoring, Grid Information Systems, etc.
64
EGI - Transition
65
• EGI-DS Project
  – establish the Blueprint for EGI
  – establish EGI.org
• EGEE
  – begin transition ~Spring 2009
• EGI
  – operational Spring 2010
• Continuity of service is KEY
  – not only for the LHC…
EGI - Status
• EU bids
  – proposals submitted – Nov '09
  – significant (for HEP) SSCs not invited to hearing !
  – EGI-InSPIRE & EMI hearing yesterday -> anticipate infrastructure & middleware development will be funded
• New legal entity, EGI.eu, created last week in Amsterdam
  – …and soon recruiting
• Proto UK NGI based on NGS & GridPP is in place
66
67
Further Information
iSGTW
International Science Grid This Week
68
http://www.isgtw.org/
69
Links
• GridPP http://www.gridpp.ac.uk/
• LCG http://lcg.web.cern.ch/LCG/
  – LCG wiki https://twiki.cern.ch/twiki/bin/view/LCG/WebHome
  – monitoring & status
• EGEE http://www.eu-egee.org/
  – gLite http://glite.web.cern.ch/glite/
• EGI (European Grid Initiative)
  – Design Study http://web.eu-egi.org/
• Computing in…
  – ATLAS http://atlas-computing.web.cern.ch/atlas-computing/computing.php
  – CMS http://cms.cern.ch/iCMS/jsp/page.jsp?mode=cms&action=url&urlkey=CMS_COMPUTING
  – LHCb http://lhcb-comp.web.cern.ch/lhcb-comp/
• Portals
  – Ganga http://ganga.web.cern.ch/ganga/
  – GILDA https://gilda.ct.infn.it/
• Open Grid Forum http://www.ogf.org/
• Globus http://www.globus.org/
• Condor http://www.cs.wisc.edu/condor/
Robin Middleton
RAL-PPD/EGEE/GridPP
The End