Large-Scale Computing with Grids
Jeff Templon
KVI Seminar, 17 February 2004
Jeff Templon – Seminar, KVI, 2004.02.17 - 2
Information I intend to transfer
• Why are Grids interesting?
  • Grids are solutions, so I will spend some time talking about the problem and show how Grids are relevant. Solutions should solve a problem.
• What are Computational Grids?
• How are Grids being used, in HEP as well as in other fields?
• What are some examples of current Grid “research topics”?
Our Problem
• Place event info on 3D map
• Trace trajectories through hits
• Assign type to each track
• Find particles you want
• Needle in a haystack!
• This is “relatively easy” case
More complex example
Data Handling and Computation for Physics Analysis
[Diagram: data flow among detector; event filter (selection & reconstruction); raw data; event reprocessing; event simulation; processed data; event summary data; batch physics analysis; analysis objects (extracted by physics topic); interactive physics analysis]
Computational Aspects
To reconstruct and analyze 1 event takes about 90 seconds
Maybe only a few out of a million are interesting. But we have to check them all!
Analysis program needs lots of calibration; determined from inspecting results of first pass.
Each event will be analyzed several times!
Online system: multi-level trigger (filter out background, reduce data volume)
• 40 MHz (40 TB/sec) into level 1 - special hardware
• 75 kHz (75 GB/sec) into level 2 - embedded processors
• 5 kHz (5 GB/sec) into level 3 - PCs
• 100 Hz (100 MB/sec) into data recording & offline analysis
One of the four LHC detectors
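The four rate and data-volume pairs quoted for the trigger chain imply an average event size and a rejection factor at each level. The following is a minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes) and taking the pairs in the order quoted; the stage labels are illustrative names, not taken from the slide.

```python
# Rates (Hz) and data volumes (bytes/s) as quoted for the trigger chain.
# The stage labels are illustrative names chosen for this sketch.
STAGES = [
    ("detector output", 40e6, 40e12),
    ("after level 1", 75e3, 75e9),
    ("after level 2", 5e3, 5e9),
    ("after level 3", 100.0, 100e6),
]


def event_size_bytes(rate_hz, volume_bytes_per_s):
    """Average event size implied by an event rate and a data volume."""
    return volume_bytes_per_s / rate_hz


def reduction_factors(stages):
    """Rate reduction achieved between each pair of consecutive stages."""
    return [prev[1] / cur[1] for prev, cur in zip(stages, stages[1:])]


if __name__ == "__main__":
    for name, rate, volume in STAGES:
        size_mb = event_size_bytes(rate, volume) / 1e6
        print(f"{name}: {size_mb:.1f} MB per event")
    print("reduction per level:", reduction_factors(STAGES))
    print("overall reduction:", STAGES[0][1] / STAGES[-1][1])
```

Every stage works out to about 1 MB per event, so under these assumptions the trigger reduces the data volume purely by discarding events, not by shrinking them.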
Computational Implications (2)
• 90 seconds per event to reconstruct and analyze
• 100 incoming events per second
• To keep up, need either:
  • A computer that is nine thousand times faster, or
  • nine thousand computers working together
• Moore’s Law: wait 20 years and computers will be 9000 times faster (we need them in 2007!)
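Both the factor of nine thousand and the 20-year wait follow from short calculations. The sketch below reproduces them; the 18-month doubling period is an assumption made here for illustration, since the slide does not state which version of Moore's Law it uses.

```python
import math

SECONDS_PER_EVENT = 90       # time to reconstruct and analyze one event (from the slide)
EVENTS_PER_SECOND = 100      # incoming event rate (from the slide)
DOUBLING_PERIOD_YEARS = 1.5  # assumed doubling time for CPU speed (not from the slide)


def cpus_needed(seconds_per_event, events_per_second):
    """CPUs required to process events as fast as they arrive."""
    return seconds_per_event * events_per_second


def years_until_speedup(factor, doubling_period_years):
    """Years of repeated doubling needed to reach a given speedup factor."""
    return math.log2(factor) * doubling_period_years


if __name__ == "__main__":
    n = cpus_needed(SECONDS_PER_EVENT, EVENTS_PER_SECOND)
    print("CPUs needed:", n)  # 9000
    years = years_until_speedup(n, DOUBLING_PERIOD_YEARS)
    print(f"years to wait for one CPU that fast: {years:.1f}")
```

With the assumed doubling period the wait comes out just under 20 years, consistent with the figure on the slide.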
More Computational Issues
• Four LHC experiments – roughly 36k CPUs needed
• BUT: accelerator not always “on” – need fewer
• BUT: multiple passes per event – need more!
• BUT: haven’t accounted for Monte Carlo production – more!!
• AND: haven’t addressed the needs of “physics users” at all!
Large Dynamic Range Computing
• Clear that we have enough work to keep somewhere between 50k and 100k CPUs busy
• Most of the activities are “bursty”, so doing them with dedicated facilities is inefficient
• Need some meta-facility that can serve all the activities
  • Reconstruction & reprocessing
  • Analysis
  • Monte-Carlo
  • All four LHC experiments
  • All the LHC users
LHC User Distribution
• Putting all computers in one spot leads to traffic jams
• Which spot is willing to pay for & maintain 100k CPUs?
Classic Motivation for Grids
• Large Scales: 50k CPUs, petabytes of data
  • (if we’re only talking ten machines, who cares?)
• Large Dynamic Range: bursty usage patterns
  • Why buy 25k CPUs if 60% of the time you only need 900 CPUs?
• Multiple user groups on single system
  • Can’t “hard-wire” the system for your purposes
• Wide-area access requirements
  • Users not in same lab or even continent
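The cost of bursty usage on a dedicated facility can be made concrete with a utilization estimate. The sketch below uses the 25k / 60% / 900 figures from the slide and assumes, as a best case that the slide does not state, that the full 25k CPUs are busy for the remaining 40% of the time.

```python
def average_utilization(capacity, demand_profile):
    """Fraction of a dedicated facility that is busy on average.

    demand_profile is a list of (fraction_of_time, cpus_needed) pairs
    whose time fractions must sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in demand_profile)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    used = sum(fraction * min(cpus, capacity) for fraction, cpus in demand_profile)
    return used / capacity


if __name__ == "__main__":
    # 60% of the time only 900 CPUs are needed (from the slide);
    # the other 40% at full load is an assumption for this sketch.
    profile = [(0.6, 900), (0.4, 25_000)]
    print(f"average utilization: {average_utilization(25_000, profile):.1%}")
```

Even in this best case well under half of the purchased capacity does useful work on average, which is the excess a grid can offer to other users.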
Solution using Grids
• Large Scales: 50k CPUs, petabytes of data
  • Assemble 50k+ CPUs and petabytes of mass storage
  • Don’t need to be in the same place!
• Large Dynamic Range: bursty usage patterns
  • When you need less than you have, others use excess capacity
  • When you need more, use others’ excess capacities
• Multiple user groups on single system
  • “Generic” grid software services (think web server here)
• Wide-area access requirements
  • Public Key Infrastructure for authentication & authorization
How do Grids Work?
Public Key Infrastructure & Generic Grid Software Services
Public Key Infrastructure – Single Login
• Based on asymmetric cryptography: public/private key pairs
• Need network of trusted Certificate Authorities (e.g. Verisign)
• CAs issue certificates to users, computers, or services running on computers (typically valid for two years)
• Certificates carry
  • an “identity”: /O=dutchgrid/O=users/O=kvi/CN=Nasser Kalantar
  • a public key
  • identity of the issuing CA
  • a digital signature (transformation on cert text using CA private key)
• Need to tighten this part up, takes too long
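The idea of a signature as a "transformation on cert text using CA private key" can be illustrated with textbook RSA on tiny numbers. This is a toy sketch only: the primes, the certificate text layout, and the "Example CA" issuer name are assumptions made for illustration, and real certificates use far larger keys together with a padding scheme.

```python
import hashlib

# Toy CA key pair built from tiny textbook primes. Illustration only:
# this is NOT secure and is not how real grid certificates are produced.
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(text):
    """Hash the certificate text and reduce it into the toy modulus range."""
    h = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % N


def ca_sign(cert_text):
    """CA side: transform the digest of the text with the private key."""
    return pow(digest(cert_text), D, N)


def verify(cert_text, signature):
    """Anyone holding the CA public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(cert_text)


if __name__ == "__main__":
    # Hypothetical certificate text; the issuer name is made up.
    cert = "subject=/O=dutchgrid/O=users/O=kvi/CN=Nasser Kalantar; issuer=Example CA"
    sig = ca_sign(cert)
    print("genuine signature accepted:", verify(cert, sig))
    print("altered signature accepted:", verify(cert, (sig + 1) % N))
```

Only the holder of the private exponent can produce a value that verifies, while checking needs nothing but the public key; that asymmetry is what lets a single certificate serve as a login across many sites.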
What is a Grid?
(what are these Generic Grid Software Services?)
[Diagram: an Information System and several nodes labeled C. Legend: C, DS, Grid software service (like http server)]
Information System is Central Nervous System of Grid
Info system defines grid
Data Grid
[Diagram: I.S., D.M.S, and several nodes labeled C and DS. Legend: C, DS, Grid software service]
Computing Task Submission
[Diagram: I.S., D.M.S, W.M.S., and several nodes labeled C and DS. Legend: C, DS, Grid software service. Arrow labels: “proxy + command; (data);”, “Coarse Requirements”, “Candidate Clusters”, “Get fresh, detailed info”]
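The submission flow in this diagram (coarse requirements, candidate clusters, then fresh detailed info) can be sketched as a two-step matchmaking routine. Everything below is hypothetical: the record fields, cluster names, and numbers are invented for illustration and do not describe the actual workload management software.

```python
from dataclasses import dataclass


@dataclass
class ClusterRecord:
    name: str
    os: str
    max_wallclock_hours: int
    free_cpus: int  # value published in the information system, possibly stale


# Hypothetical contents of the information system.
INFO_SYSTEM = [
    ClusterRecord("cluster-a", "linux", 48, 10),
    ClusterRecord("cluster-b", "linux", 24, 200),
    ClusterRecord("cluster-c", "linux", 72, 5),
    ClusterRecord("cluster-d", "solaris", 72, 300),
]

# Hypothetical fresh state, as if obtained by asking each cluster directly.
FRESH_FREE_CPUS = {"cluster-a": 0, "cluster-b": 150, "cluster-c": 40, "cluster-d": 300}


def candidate_clusters(info_system, os, min_wallclock_hours):
    """Step 1: filter on coarse requirements using published information."""
    return [
        c for c in info_system
        if c.os == os and c.max_wallclock_hours >= min_wallclock_hours
    ]


def choose_cluster(candidates, fresh_free_cpus):
    """Step 2: pick the candidate with the most free CPUs right now."""
    live = [(fresh_free_cpus.get(c.name, 0), c.name) for c in candidates]
    live = [entry for entry in live if entry[0] > 0]
    return max(live)[1] if live else None


if __name__ == "__main__":
    candidates = candidate_clusters(INFO_SYSTEM, os="linux", min_wallclock_hours=36)
    print("candidates:", [c.name for c in candidates])
    print("chosen:", choose_cluster(candidates, FRESH_FREE_CPUS))
```

In this made-up data the published numbers favor cluster-a, but the fresh query shows it is full, which is one reason for asking the candidates directly before dispatching the job.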
Computing Task Execution
[Diagram: I.S., D.M.S, W.M.S., logger, one node labeled C, and several nodes labeled DS. Legend: C, DS, Grid software service. Arrow labels: “proxy + command; (data);”, “Find DMS”, “Where is my data?”, “List of best locations”, “proxy”]
Computing Task Execution
[Diagram: I.S., D.M.S, W.M.S., logger, one node labeled C, and several nodes labeled DS. Legend: C, DS, Grid software service. Arrow labels: “How to contact O.D.S.?”, “Where do I put the data?”, “proxy + data”, “Register output”, “Done”]
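The "Where is my data?" and "Register output" steps both amount to queries against a catalog that maps a file's logical name to the places holding a copy. The toy class below sketches that idea; the class name, method names, logical file name, and storage URLs are all hypothetical and are not the interface of any real data management service.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy stand-in for a data management service.

    Maps a logical file name to the storage locations that hold a copy.
    """

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, storage_url):
        """Record that a copy of the file exists at storage_url."""
        if storage_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(storage_url)

    def locate(self, logical_name):
        """Return all known locations for the file (empty list if unknown)."""
        return list(self._replicas.get(logical_name, []))


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    # A finished job stores its output, then registers it (hypothetical names).
    catalog.register("lfn:run42/output.dat", "gsiftp://storage-a.example.org/run42/output.dat")
    catalog.register("lfn:run42/output.dat", "gsiftp://storage-b.example.org/run42/output.dat")
    print(catalog.locate("lfn:run42/output.dat"))
    print(catalog.locate("lfn:unknown"))
```

A later job only needs the logical name; the catalog tells it where physical copies live.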
Can do a bit more with data…
Computing Task Submission with data
[Diagram: I.S., D.M.S, W.M.S., and several nodes labeled C and DS. Legend: C, DS, Grid software service. Arrow labels: “proxy + command; (data file spec);”, “Coarse Requirements”, “Candidate Clusters”, “Find copies of data”, “Get fresh, detailed info”]
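When a job names its input files, the "Find copies of data" step lets the scheduler prefer clusters that already have those files nearby. The following is a minimal sketch of such a ranking, assuming a simple count of locally available input files as the score; the function name, cluster names, and file names are hypothetical.

```python
def rank_by_data_locality(candidates, replica_sites, input_files):
    """Order candidate clusters by how many input files they hold locally.

    candidates    -- list of cluster names
    replica_sites -- dict mapping a file name to the set of clusters with a copy
    input_files   -- files named in the job's data file spec
    Ties are broken alphabetically so the result is deterministic.
    """
    def local_count(cluster):
        return sum(1 for f in input_files if cluster in replica_sites.get(f, set()))

    return sorted(candidates, key=lambda c: (-local_count(c), c))


if __name__ == "__main__":
    # Hypothetical candidates and replica locations.
    candidates = ["cluster-a", "cluster-b", "cluster-c"]
    replica_sites = {
        "file1": {"cluster-b", "cluster-c"},
        "file2": {"cluster-c"},
        "file3": {"cluster-a"},
    }
    print(rank_by_data_locality(candidates, replica_sites, ["file1", "file2"]))
    # ['cluster-c', 'cluster-b', 'cluster-a']
```

A real broker would weigh this against free CPUs and queue lengths; the sketch isolates only the data-locality part.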
Computing Task Execution
[Diagram: I.S., D.M.S, W.M.S., logger, one node labeled C, and several nodes labeled DS. Legend: C, DS, Grid software service. Arrow labels: “proxy + command; (data file spec);”, “Find DMS”, “Where is my data?”, “List of best locations”, “Local access”]
Grids are working now
See it in action
Applications
D0 Run II Data Reprocessing
Biomedical Image Comparison
Ozone Profile Computation, Storage, & Access
Grid Research Topics
• “push” vs. “pull” model of operation (you just saw “push”)
  • We know how to do fine-grained priority & access control in “push”
  • We think “pull” is more robust operationally
• Software distribution
  • HEP has BIG programs with many versions … “pre” vs. “zero” install
• Data distribution
  • Similar issues (software is just executable data!)
  • A definite bottleneck in D0 reprocessing (1 Gbit/100/2/2=modem!)
  • Speedup limited to 92% of theoretical
  • About 1 hr CPU lost per job
• Optimization in general is hard, and is it worth it????
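The difference between the "push" and "pull" models mentioned above can be sketched with two small scheduling functions. This is a simplified illustration under assumed data structures, not a description of any deployed grid scheduler; the job and site names are hypothetical.

```python
from collections import deque


def push_schedule(jobs, free_slots):
    """Push: a central broker decides where each job goes.

    The broker relies on its own view of free slots per site, which in
    a real system may be out of date by the time the job arrives.
    """
    slots = dict(free_slots)
    assignment = {}
    for job in jobs:
        # Site with the most free slots; alphabetical order breaks ties.
        site = max(sorted(slots), key=lambda s: slots[s])
        if slots[site] <= 0:
            break  # no capacity left in the broker's view
        assignment[job] = site
        slots[site] -= 1
    return assignment


def pull_schedule(jobs, worker_requests):
    """Pull: jobs wait in a central queue; a site asks for work when it is free."""
    queue = deque(jobs)
    assignment = {}
    for site in worker_requests:
        if not queue:
            break
        assignment[queue.popleft()] = site
    return assignment


if __name__ == "__main__":
    jobs = ["job1", "job2", "job3"]
    print(push_schedule(jobs, {"site-a": 2, "site-b": 1}))
    print(pull_schedule(jobs, ["site-b", "site-a", "site-b"]))
```

In the push version the broker holds all the policy, which fits the fine-grained priority and access control point; in the pull version a job is only handed out when a site is actually ready, which is one way to read the robustness claim.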
Didn’t talk about
• Grid projects
  • European Data Grid – ends this month, 10 M€ / 3y
  • VL-E – ongoing, 20 M€ / 5y
  • EGEE – starts in April, 30 M€ / 2y
• Gridifying HEP software & computing activities
• Transition from supercomputing to Grid computing (funding)
• Integration with other technologies
  • Submit, monitor, control computing work from your PDA or mobile!
Conclusions
• Grids are working now for workload management in HEP
• Big developments in next two years with large-scale deployments of LCG & EGEE projects
• Seems Grids are catching on (IBM / Globus alliance)
• Lots of interesting stuff to try out!!!
What’s There Now?
• Job Submission
  • Marriage of Globus, Condor-G, EDG Workload Manager
  • Latest version reliable at 99% level
• Information System
  • New System (R-GMA) – very good information model, implementation still evolving
  • Old System (MDS) – poor information model, poor architecture, but hacks allow 99% “uptime”
• Data Management
  • Replica Location Service – convenient and powerful system for locating and manipulating distributed data; mostly still user-driven (no heuristics)
• Data Storage
  • “Bare gridFTP server” – reliable but mostly suited to disk-only mass store
  • SRM – no mature implementations
Grids are Information
[Diagram: two Information System boxes; grids labeled A-Grid and B-Grid]