UK Testbed Status and EDG Testbed Two
Steve Traylen
GridPP 7, Oxford
28th April 2003 Steve Traylen, [email protected]
PPD
Outline
• Status of the UK Sites.
• Release of EDG 2.
• UK Certificates.
• Grid monitoring in the UK.
Manchester
CE SE(1.5TB)
EDG 1.4
80xWN
CE SE(5TB)
EDG 1.4
60xWN
CE SE
EDG 1.4
9xWN
EDG Testbed BaBar Farm DZero Farm
•GridPP and BaBar VO Servers.
•User Interface
•Plan for the DZero farm to join LCG.
•SRIF bid in place for significant HEP resources by the end of the year.
UCL
CE SE
EDG 1.4
1xWN
EDG Testbed
•Network monitors for WP7 development.
•SRIF bid in place for 200 CPUs by the end of the year, to join LCG1.
RAL PPD
CE SE
EDG 1.4
9xWN
EDG Testbed
CE SE
EDG 2.0
1xWN
RGMA Testbed
MON
•User Interface
•Plan to form part of the Southern Tier2 Centre within LCG1.
•50 CPUs and 5TB of disk expected by the end of the year.
Birmingham
CE SE
EDG 1.4
1xWN
EDG Testbed
•Expansion to 60 CPUs and 4TB.
•Expect to participate in LCG1/EDG2.
Liverpool
CE SE
EDG 1.4
1xWN
EDG Testbed
•Currently unmaintained.
•Plan to follow EDG 2, possibly integrating BaBar farm.
RAL
CE SE
EDG 1.4
5xWN
EDG Testbed
CE
EDG 1.4
230xWN
Tier1/A
CE SE
EDG 2.0
RGMA Testbed
MON
CE SE
EDG 2.0
1xWN
EDG Dev Testbed
MON SE
ADS
•UI within CSF.
•NM for EDG2.
•Top level MDS for EDG.
•Various WP3 and WP5 dev nodes.
•VOMS for DEV TB.
•http://ganglia.gridpp.rl.ac.uk/
SE
LCG0 Testbed
CE 1xWN
Cambridge
CE SE
EDG 1.4
15xWN
EDG Testbed
•Farm shared with local NA-48 and GANGA users.
•Some RH 7.3 WNs for the ongoing Atlas challenge.
•3TB GridFTP-SE.
•Plan to join LCG1/EDG2 later this year, adding an extra 50 CPUs.
•EDG jobs will soon be fed into the local E-Science farm.
•http://farm002.hep.phy.cam.ac.uk/cavendish/
Bristol
CE SE
EDG 1.4
1xWN
EDG Testbed
CE SE
EDG 2.0
1xWN
RGMA Testbed
MON
CE SE
CMS-LCG0
CMS/LHCb Farm
24xWN
CE SE
EDG 1.4
BaBar Farm
78xWN
•GridPP RC.
•Plan to join EDG2 and LCG1
Imperial College
CE SE
EDG 1.4
EDG Testbed
WNs
CE
EDG 1.4
WNs
BaBar Farm
CE SE
CMS-LCG0
CMS-LCG0
WN
CE SE
EDG 2.0
1xWN
RGMA Testbed
MON
•RB and BD-II for EDG 1.4.
•RB and BD-II for EDG 2.0.
•Plan to be in LCG1 and other testbeds.
Queen Mary
CE SE
EDG 1.4
1xWN
EDG Testbed
32xWN
•The CE also feeds EDG jobs to the 32-node E-Science farm.
•Plan to have LCG1/EDG2 running by the end of the year.
•Expansion with SRIF grants (64 WNs + 2TB in Jan 2004, 100 WNs + 8TB in Dec 2004).
•http://194.36.10.1/ganglia-webfrontend
Oxford
CE SE
EDG 1.4
2xWN
EDG Testbed
•Plan to join EDG2/LCG1.
•Nagios monitoring has been set up.
•(RAL is also evaluating Nagios.)
•Planning to send EDG jobs into the 10-WN CDF farm.
•128 node cluster being ordered now.
Glasgow
CE SE
EDG 1.4
ScotGRID
59xWN
• New hardware expected soon.
• WNs on a private network with outbound NAT in place.
• As ScotGRID grows, the plan is for it to become part of LCG.
• Various WP2 development boxes.
CE SE
EDG 2.0
RGMA Testbed
MON
UK Overview
• There are now significant resources within EDG.
• Integrating EDG with an existing farm has been done many times, but it remains difficult.
• Sites are keen to take part in LCG1 or EDG2.
• By the end of the year many HEP farms plan to be contributing to LCG1 resources.
EDG 2.0
• Now in a permanent state of imminent release.
• Since 27th May:
– 25 pre-releases.
– 295 configuration changes.
– Changes range from a typo fix to a new resource broker.
Criteria for cutting EDG 2.0
• For EDG 2.0 to be released, the following must be satisfied:
– 50 sequential jobs: 98% success.
– 250 jobs run through one RB: 80% success.
– 5 jobs with a 2GB i/o sandbox: 80% success.
– 25 jobs that require two proxy renewals: 80% success.
– Upload and register a 1GB file to an SE, then replicate it to a mass storage device.
– Register 1000 files in less than 1000s.
– Match a job against three files on an SE.
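The job-based thresholds above amount to simple pass/fail arithmetic. The sketch below encodes them so that test-run results can be checked mechanically; the criterion labels and function names are illustrative assumptions, not part of any EDG tool, and the replication and matching criteria (plain pass/fail steps) are not modelled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Criterion:
    name: str              # short label for the test (illustrative)
    jobs: int              # number of jobs submitted
    required_percent: int  # minimum percentage that must succeed


# Job-based release criteria as listed on the slide.
CRITERIA = [
    Criterion("sequential jobs", 50, 98),
    Criterion("jobs through one RB", 250, 80),
    Criterion("2GB i/o sandbox", 5, 80),
    Criterion("two proxy renewals", 25, 80),
]


def passes(criterion: Criterion, succeeded: int) -> bool:
    """True if `succeeded` of the criterion's jobs meets its success threshold."""
    if not 0 <= succeeded <= criterion.jobs:
        raise ValueError("succeeded must lie between 0 and the number of jobs")
    # Integer comparison avoids floating-point rounding at the threshold.
    return succeeded * 100 >= criterion.required_percent * criterion.jobs


def registration_ok(seconds_for_1000_files: float) -> bool:
    """Registering 1000 files must take less than 1000 s (under 1 s per file on average)."""
    return seconds_for_1000_files < 1000


def release_ready(successes: Dict[str, int], seconds_for_1000_files: float) -> bool:
    """All job-based criteria pass and the registration timing is met."""
    return (all(passes(c, successes[c.name]) for c in CRITERIA)
            and registration_ok(seconds_for_1000_files))
```

For example, 49 successes out of 50 sequential jobs is exactly 98% and passes, while 48 does not; with 250 jobs through one RB, at least 200 must succeed.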
Installation of an EDG2 Testbed
• LCFGng is the recommended installation method; there are no manual installation instructions yet.
– Significantly better than LCFG.
• Configuration (site-cfg.h) is less cryptic.
• Less hand installation is required; still done by hand:
– Install host certificates.
– PBS server.
– MySQL tables.
– mkgridmap.conf.
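mkgridmap.conf drives the generation of the grid-mapfile, which maps a user's certificate subject (DN) to a local account. The sketch below only illustrates the shape of that output from a list of VO members; it is not the edg-mkgridmap tool. It assumes that each VO maps to a pool account written as ".<vo>" and that a DN listed in several VOs keeps the first one; the DNs in the usage example are made up.

```python
from typing import Dict, Iterable, Tuple


def gridmap_line(dn: str, account: str) -> str:
    """One grid-mapfile entry: the quoted certificate subject DN, then the local account."""
    return f'"{dn}" {account}'


def build_gridmap(members: Iterable[Tuple[str, str]]) -> str:
    """Build grid-mapfile text from (DN, VO) pairs.

    Each VO is mapped to a pool account written as '.<vo>' (assumed convention);
    if a DN appears in several VOs, the first one listed wins (also an assumption).
    """
    chosen: Dict[str, str] = {}
    for dn, vo in members:
        chosen.setdefault(dn, vo)
    return "".join(gridmap_line(dn, "." + vo) + "\n" for dn, vo in chosen.items())
```

For instance, a hypothetical member "/C=UK/O=eScience/CN=Example User" of the babar VO would produce the line `"/C=UK/O=eScience/CN=Example User" .babar`.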
Integration after 2.0
• Use gcc 3.2.2 throughout.
– Currently used by the RB and the APIs the RB uses.
• GridFTP access to CASTOR.
• Integration of VOMS.
– Currently ongoing in parallel.
– Has no impact on existing software.
• This will be EDG 2.1.
Required Nodes
• CE: gatekeeper, MDS, gin, ..
• SE: GridFTP, WP5-SE, gin, …
• WN: PBS batch worker + client tools.
• MON: Servlets for a site, GOut for the RB. Also collects fabric monitoring information…
– On small sites it can be moved to the CE.
• Configuration is generally more modular.
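The node roles above can be written down as a simple table of services per host type. The sketch below does that and shows the small-site option of folding the MON services onto the CE; the service names are taken from the slide (the elided ".." entries are left out), and the layout function itself is a hypothetical planning aid, not part of the EDG configuration.

```python
from typing import Dict, List

# Services per node type as listed on the slide; elided entries are omitted.
NODE_SERVICES: Dict[str, List[str]] = {
    "CE": ["gatekeeper", "MDS", "gin"],
    "SE": ["GridFTP", "WP5-SE", "gin"],
    "WN": ["PBS batch worker", "client tools"],
    "MON": ["site servlets", "GOut for the RB", "fabric monitoring collection"],
}


def site_layout(small_site: bool) -> Dict[str, List[str]]:
    """Return services per host type; on a small site the MON services move onto the CE."""
    # Copy the lists so the reference table above is never modified.
    layout = {node: list(services) for node, services in NODE_SERVICES.items()}
    if small_site:
        layout["CE"].extend(layout.pop("MON"))
    return layout
```

A small site then plans for three host types (CE, SE and WNs) instead of four.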
LCG1 or EDG2
• Which testbed should I join?
– Significant resources best suited to LCG1.
– Small dynamic testbeds can contribute to continued development of testbed two.
UK Certificates
• The UK e-Science CA was added to the production EDG testbed three weeks ago.
• The UK HEP CA will stop issuing certificates.
– Existing certificates will still be valid for the remainder of their lifetime.
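Because existing UK HEP CA certificates stay valid until they expire, users may want to know how long theirs has left before moving to the e-Science CA. The sketch below computes this from a certificate's notAfter date using Python's `ssl.cert_time_to_seconds`; the function names are illustrative.

```python
import ssl
import time
from typing import Optional


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter time, given as e.g. 'Apr 28 12:00:00 2004 GMT'."""
    expiry = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    if now is None:
        now = time.time()
    return (expiry - now) / 86400


def still_valid(not_after: str, now: Optional[float] = None) -> bool:
    """A certificate remains usable until its notAfter time passes."""
    return days_remaining(not_after, now) > 0
```

The date string can be obtained with `openssl x509 -noout -enddate -in ~/.globus/usercert.pem`, which prints `notAfter=...`; pass the text after the `=` sign.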
VO Membership + EDG Guidelines
[Chart: total members and UK members per VO for BaBar, Eobs, Iteam, LHCb, Alice, BioMe, CMS, Atlas and WP6; scale 0 to 120 members.]
Ganglia
• Ganglia provides time plots of system metrics.
• In use at RAL, Cambridge and QMUL.
• By default: load, network i/o, memory.
• Trivial to add new metrics, e.g. active MySQL connections for CMS.
• Expansion to the UK is possible via LCFG objects and instructions; however, WP4 tools might be used instead.
• Data could be collected centrally for a UK view.
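Custom metrics such as active MySQL connections are typically fed into Ganglia with its `gmetric` command-line tool. The sketch below builds and runs such a call from Python. The metric name "mysql_connections" is hypothetical, and the value would have to come from the database itself, for example from MySQL's `SHOW STATUS LIKE 'Threads_connected'`. Check the option spellings against `gmetric --help` on your installation.

```python
import shutil
import subprocess
from typing import List


def gmetric_command(name: str, value: int, units: str) -> List[str]:
    """Build a gmetric call that publishes an unsigned integer metric to Ganglia."""
    return ["gmetric", f"--name={name}", f"--value={value}",
            "--type=uint32", f"--units={units}"]


def publish(name: str, value: int, units: str) -> bool:
    """Run gmetric if it is installed; return False when it is not on the PATH."""
    if shutil.which("gmetric") is None:
        return False
    subprocess.run(gmetric_command(name, value, units), check=True)
    return True
```

Run periodically (for example from cron), `publish("mysql_connections", count, "connections")` would add a new time plot alongside the default load, network and memory metrics.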
GridPP Map
• Checks HEP sites every 6(?) hours for:
– Ping
– Globus Submission
– EDG Job Submission via Imperial RB.
– EDG Job Submission via Lyon RB.
• http://www.gridpp.ac.uk/map/
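The map's periodic checks follow a simple pattern: run each test against each site and record pass or fail. The sketch below shows that pattern with pluggable check functions. The real ping, Globus and RB submissions are not implemented here; they stand in as hypothetical callables. The "ok/partial/down" summary is an assumed presentation, not the map's actual scheme.

```python
from typing import Callable, Dict, Iterable

Check = Callable[[str], bool]  # takes a site name, returns True on success


def run_checks(sites: Iterable[str], checks: Dict[str, Check]) -> Dict[str, Dict[str, bool]]:
    """Run every named check against every site; a check that raises counts as a failure."""
    results: Dict[str, Dict[str, bool]] = {}
    for site in sites:
        results[site] = {}
        for name, check in checks.items():
            try:
                results[site][name] = bool(check(site))
            except Exception:
                results[site][name] = False
    return results


def summarise(site_results: Dict[str, bool]) -> str:
    """Hypothetical per-site summary: 'ok' if all checks pass, 'down' if none, else 'partial'."""
    passed = sum(site_results.values())
    if passed == len(site_results):
        return "ok"
    return "down" if passed == 0 else "partial"
```

Treating an exception as a failure keeps one unreachable site or broken RB from stopping the whole sweep.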
GridPP RB Monitoring @ Imperial
• Publishes service status.
• Publishes times for LDAP queries of resources.
• http://www.hep.ph.ic.ac.uk/~dguser/diagnostics.html
• Imperial also submits test jobs that are more sophisticated than the map's, e.g. checking for the existence of a CloseSE.
• http://www.hep.ph.ic.ac.uk/~dguser/Qstatus.html
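Publishing LDAP query times comes down to timing one query per resource and noting whether it succeeded. The sketch below shows that measurement. `query` is a hypothetical stand-in for the real LDAP search against a resource's information service, which is not implemented here.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def time_queries(resources: Iterable[str],
                 query: Callable[[str], object]) -> Dict[str, Tuple[bool, float]]:
    """For each resource, run `query` and record (succeeded, elapsed seconds)."""
    timings: Dict[str, Tuple[bool, float]] = {}
    for resource in resources:
        start = time.perf_counter()
        try:
            query(resource)
            ok = True
        except Exception:
            ok = False
        # Failed queries are timed too: a slow timeout is itself useful information.
        timings[resource] = (ok, time.perf_counter() - start)
    return timings
```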
Monitoring
• Currently lots of monitoring but no central location.
• Most monitoring currently only shows the current state.
• The Grid operations centre can coordinate much of this.
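To show more than the current state, a central collector has to keep history. The sketch below stores a bounded series of (timestamp, status) samples per site and derives the current state and a simple availability figure from it. The class is hypothetical, and the default of 28 samples assumes one week of checks at a 6-hour interval (4 per day × 7 days).

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class StatusHistory:
    """Central store keeping the last `maxlen` (timestamp, status) samples per site."""

    def __init__(self, maxlen: int = 28) -> None:
        # 28 samples cover one week if checks run every 6 hours (an assumption).
        self._samples: Dict[str, Deque[Tuple[float, str]]] = defaultdict(
            lambda: deque(maxlen=maxlen))

    def record(self, site: str, timestamp: float, status: str) -> None:
        """Append a sample; the oldest one is dropped once `maxlen` is reached."""
        self._samples[site].append((timestamp, status))

    def current(self, site: str) -> str:
        """Most recent status for `site`, or 'unknown' if nothing has been recorded."""
        samples = self._samples.get(site)
        return samples[-1][1] if samples else "unknown"

    def availability(self, site: str, ok: str = "ok") -> float:
        """Fraction of retained samples for `site` whose status equals `ok`."""
        samples = self._samples.get(site)
        if not samples:
            return 0.0
        return sum(1 for _, status in samples if status == ok) / len(samples)
```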