DIANE Project, CHEP 03
DIANE
Distributed Analysis Environment for semi-interactive simulation and analysis in Physics
Jakub T. Moscicki, CERN/IT
The need for distribution
do the analysis/simulation job in parallel tasks
to speed up the work
by using powerful, worldwide distributed computational resources,
accessing data in mass storage systems that is otherwise too big to fit on your laptop.
Practical Example
example: simulation with analysis
each task produces a file with histograms
job result = sum of histograms produced by tasks
master-worker model
client starts a job
workers perform tasks and produce histograms
master integrates the results
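The master-worker model above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DIANE code: a histogram is modelled as a Counter mapping bin index to number of entries, run_task is a hypothetical stand-in for a real simulation task, and threads play the role of workers only to keep the example self-contained.

```python
"""Minimal master-worker sketch: tasks fill histograms, the master sums them.

Assumptions (not DIANE's API): a histogram is a Counter mapping
bin index -> entries; run_task() is a hypothetical simulation task.
"""
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

N_BINS = 10

def run_task(task_id: int, n_events: int = 1000) -> Counter:
    """Hypothetical worker task: 'simulate' n_events and histogram them."""
    rng = random.Random(task_id)  # per-task seed, so results are reproducible
    hist = Counter()
    for _ in range(n_events):
        hist[rng.randrange(N_BINS)] += 1
    return hist

def run_job(n_tasks: int, n_workers: int) -> Counter:
    """Master: hand tasks to a pool of workers and integrate the results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for hist in pool.map(run_task, range(n_tasks)):
            total.update(hist)  # job result = sum of task histograms
    return total

if __name__ == "__main__":
    result = run_job(n_tasks=20, n_workers=4)
    print(sum(result.values()))  # 20 tasks x 1000 events = 20000 entries
```

Because each task seeds its own random generator, the summed histogram does not depend on how many workers run the tasks or in which order they finish.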
Tools at hand: local batch queue
clusters/farms of PCs running batch queues
use LSF or PBS to submit parallel analysis tasks producing histograms
collect and post-process results by hand
add all the resulting histogram files
> foreach i (1 2 3 4 5 6 7 8 9 10)
> bsub -q 8nh run-worker
> end
Job <250973> is submitted to queue <8nh>.
Job <250974> is submitted to queue <8nh>.
...
> ls
LSFJOB_250973 LSFJOB_250974 LSFJOB_250975
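The tcsh loop above could equally be driven from Python. This is a minimal sketch, assuming LSF's bsub is on the PATH and run-worker is the user's own worker script; only the confirmation message format is taken from the transcript, and the helper names parse_job_id and submit_workers are hypothetical.

```python
"""Sketch of the bsub loop, driven from Python instead of tcsh.

Assumes bsub is on PATH and run-worker is the user's own script.
parse_job_id/submit_workers are hypothetical helper names.
"""
import re
import subprocess

# bsub confirms e.g. "Job <250973> is submitted to queue <8nh>."
JOB_RE = re.compile(r"Job <(\d+)> is submitted to queue <([^>]+)>")

def parse_job_id(bsub_output: str) -> int:
    """Extract the numeric LSF job id from bsub's confirmation message."""
    match = JOB_RE.search(bsub_output)
    if match is None:
        raise ValueError(f"unexpected bsub output: {bsub_output!r}")
    return int(match.group(1))

def submit_workers(n: int, queue: str = "8nh", command: str = "run-worker") -> list[int]:
    """Submit n identical worker jobs and return their LSF job ids."""
    job_ids = []
    for _ in range(n):
        out = subprocess.run(["bsub", "-q", queue, command],
                             capture_output=True, text=True, check=True).stdout
        job_ids.append(parse_job_id(out))
    return job_ids
```

Even with the submission scripted, the results still have to be found in the LSFJOB_<id> directories and added by hand.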
Tools at hand: global batch queue
federation of clusters, also known as a GRID
use EDG Resource Broker to submit tasks
> dg-job-submit worker.jdl
Connecting to host grid014.ct.infn.it, port 7771
Logging to host grid014.ct.infn.it, port 15830
******************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Resource Broker.
Use dg-job-status command to check job current status.
Your job identifier (dg_jobId) is:
- https://grid014.ct.infn.it:7846/137.138.181.249/195456283026315?grid014.ct.infn.it:7771
******************************************************************************************
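The job identifier printed by dg-job-submit is what later dg-job-status queries need, so a script submitting many workers has to capture it. A sketch, assuming the identifier is the line beginning with "- https://" as in the output above; extract_dg_job_id is a hypothetical helper, not part of the EDG tools.

```python
"""Capture the dg_jobId from dg-job-submit output (hypothetical helper).

Assumption: the id sits on its own line, prefixed with "- ", as in
the transcript above.
"""
import re

DG_JOB_ID_RE = re.compile(r"^\s*-\s+(https://\S+)\s*$", re.MULTILINE)

def extract_dg_job_id(output: str) -> str:
    """Return the first job identifier URL found in the submit output."""
    match = DG_JOB_ID_RE.search(output)
    if match is None:
        raise ValueError("no dg_jobId found in dg-job-submit output")
    return match.group(1)
```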
Comments
using middleware directly requires a lot of manual work:
integration of task results
keeping track of failed tasks and resubmitting workers
no easy way to monitor job progress or cancel jobs
only one task per worker: very inefficient if worker initialization time is long
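A back-of-the-envelope model shows why one task per worker hurts when initialization is slow. The numbers are illustrative assumptions, not measurements: t_init is the worker start-up time, t_task the processing time per task, and the persistent-worker case assumes each worker initializes once and then processes several tasks (n_workers no larger than n_tasks).

```python
"""Cost of 'one task per worker' vs. persistent workers (illustrative only)."""

def cpu_time_one_task_per_worker(n_tasks: int, t_init: float, t_task: float) -> float:
    # every task is a fresh batch job, so initialization is paid n_tasks times
    return n_tasks * (t_init + t_task)

def cpu_time_persistent_workers(n_tasks: int, n_workers: int,
                                t_init: float, t_task: float) -> float:
    # each worker initializes once, then keeps taking tasks until none are left
    return n_workers * t_init + n_tasks * t_task

if __name__ == "__main__":
    # e.g. 100 tasks, 10 workers, 60 s start-up, 30 s of work per task
    print(cpu_time_one_task_per_worker(100, 60, 30))      # 9000
    print(cpu_time_persistent_workers(100, 10, 60, 30))   # 3600
```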
User Wishlist
automatic integration of task results
monitoring of job progress and individual tasks
automatic error-recovery policies
task granularity may be changed independently of the number of workers: natural load balancing and performance optimization
performance fine-tuning: workers may be mapped to threads, processes or machines depending on the context
a uniform, transparent and easy-to-use user interface and API hiding the complexity of the underlying middleware mechanisms
the same API and UI are used when running local jobs and GRID jobs
batch, interactive and semi-interactive operation modes
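Two wishlist items, automatic error recovery and task granularity independent of the number of workers, can be illustrated with a pull-based task queue. This is a sketch only, not DIANE's implementation: run_job, MAX_RETRIES and the retry policy are hypothetical, tasks must be hashable, and threads stand in for workers.

```python
"""Pull-based task distribution with a simple retry policy (sketch).

Hypothetical names: run_job, MAX_RETRIES. Threads stand in for workers.
"""
import queue
import threading

MAX_RETRIES = 2  # hypothetical error-recovery policy: 2 resubmissions

def run_job(tasks, n_workers, process):
    """Run process(task) for each task; return (results, failed_tasks)."""
    todo = queue.Queue()
    for task in tasks:
        todo.put((task, 0))  # (task, attempts made so far)
    results, failed = {}, []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task, attempts = todo.get_nowait()
            except queue.Empty:
                return  # no work left: this worker exits
            try:
                value = process(task)
            except Exception:
                if attempts < MAX_RETRIES:
                    todo.put((task, attempts + 1))  # resubmit automatically
                else:
                    with lock:
                        failed.append(task)
                continue
            with lock:
                results[task] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed
```

Since a worker pulls the next task only when it is free, a faster worker simply processes more tasks; the number of tasks can be tuned without touching the number of workers.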
Wishlist (cont'd)
a lightweight “add-on” framework which drives the execution of parallel jobs in the master-worker model on top of any specific middleware implementation:
application oriented: targets common HEP use cases
independent of any particular analysis tool
layered and modular architecture, easy to adapt to new environments: important for middleware transitions
integrated in a modern scripting environment, e.g. Python
using standards: e.g. exploit AIDA for analysis, making it easy to plug in your favourite analysis tool
To address these issues, the DIANE Project was set up at CERN/IT
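The layered, middleware-independent idea can be sketched as a small backend interface: the master-worker logic makes the same call whatever the middleware, and only the backend knows the concrete command. All class and function names are hypothetical illustrations, not DIANE's API; the command lines reuse the bsub and dg-job-submit invocations shown earlier, and the sketch only builds them rather than executing them.

```python
"""Middleware-independent worker start-up layer (hypothetical names)."""
from abc import ABC, abstractmethod

class Backend(ABC):
    """Hypothetical interface: each backend knows how to start one worker."""
    @abstractmethod
    def worker_command(self) -> list[str]:
        """Return the command line that starts one worker."""

class LocalBackend(Backend):
    def __init__(self, script: str):
        self.script = script
    def worker_command(self) -> list[str]:
        return [self.script]  # run the worker directly on this machine

class LSFBackend(Backend):
    def __init__(self, script: str, queue: str = "8nh"):
        self.script, self.queue = script, queue
    def worker_command(self) -> list[str]:
        return ["bsub", "-q", self.queue, self.script]  # local batch queue

class EDGBackend(Backend):
    def __init__(self, jdl_file: str):
        self.jdl_file = jdl_file
    def worker_command(self) -> list[str]:
        return ["dg-job-submit", self.jdl_file]  # grid Resource Broker

def worker_commands(backend: Backend, n_workers: int) -> list[list[str]]:
    """Same call for local, batch or grid: only the backend object changes."""
    return [backend.worker_command() for _ in range(n_workers)]

if __name__ == "__main__":
    for b in (LocalBackend("run-worker"), LSFBackend("run-worker"), EDGBackend("worker.jdl")):
        print(worker_commands(b, 2))
```

Adapting to new middleware then means adding one Backend subclass, leaving the master-worker layer untouched.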
DIANE Overview
DIANE R&D Project
started in 2001 at CERN/IT with very limited resources (~1 FTE)
collaboration with Geant4 groups at CERN, INFN and ESA
successful prototypes running on LSF and EDG
Applications of DIANE
Examples of interdisciplinary applications:
Geant4 simulation and analysis
speed-up factor ~30
cern.ch/diane
LHC: ntuple analysis and simulation
radiotherapy: brachytherapy, IMRT
space missions: ESA Bepi Colombo, LISA
DIANE for HEP workgroup clusters
features:
many users, many jobs
diverse applications: ntuple analysis, simulation, ...
interactive ... semi-interactive ... batch
~100s of machines
dynamic environment: users may submit their analysis code
some applications may be preconfigured
mixed CPU- and I/O-intensive
general analysis (e.g. ntuple projections) or experiment-specific apps
load balancing important
DIANE for Simulation in Medical Apps
example: brachytherapy, optimization of treatment planning by MC simulation
features:
CPU intensive
few users, few jobs
one preconfigured application
interactive: seconds to minutes
~10s of machines
ongoing joint collaboration with Geant4 and hospital units in Torino, Italy
DIANE for Simulation in Space Science
LISA: MC simulation for a gravitational-wave experiment
Bepi Colombo mission: HERMES experiment
features:
CPU intensive
big jobs (10 processor-years)
preconfigured applications
batch: days
1000+ machines
requirements:
error recovery important
monitoring and diagnostics
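A quick estimate of what 10 processor-years on 1000 machines means in wall-clock time, assuming perfect scaling and one processor per machine (both idealizations):

```python
def wall_clock_days(cpu_years: float, n_machines: int) -> float:
    """Ideal wall-clock time in days: perfect scaling, one CPU per machine."""
    return cpu_years * 365.25 / n_machines

print(wall_clock_days(10, 1000))  # ~3.65 days
```

Even in this ideal case a job keeps a thousand machines busy for several days, which is why error recovery and monitoring appear among the requirements.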
DIANE Prototype and Testing
scalability tests:
70 worker nodes
140 million Geant4 events
DIANE Screenshot
Sun Mar 16 14:58:31 2003 : DIANE.JobMaster.workerReady : worker 5 now ready
Sun Mar 16 14:58:42 2003 : DIANE.JobMaster<ControlThread>.run : number of tasks to finish: 1
len(self.master.job_progress) : 5
len(self.master.ready_workers) : 9
len(self.master.busy_workers) : 1
len(self.master.registered_workers):10
Sun Mar 16 14:58:45 2003 : DIANE.JobMaster.receiveTaskResult : recieved result, taskid =3 status: ok
Processing file task-output2.hbk
Adding histogram 10
Adding histogram 20
Scanned all IDs from 0 to 100, other HBOOK ids (if any) were ignored
Sun Mar 16 14:58:45 2003 : DIANE.JobMaster<ControlThread>.run : job completed ok, quitting control loop
DIANE.JobMaster<ControlThread>.notifyJobFinished : starting notification
DIANE.JobMaster<ControlThread>.notifyJobFinished : deactivating master
DIANE.JobMaster.workerReady : master not activated
DIANE.JobMaster<ControlThread>.sendResultToClient : terminated...
terminating JobMaster server process
312.520u 77.250s 15:09.53 42.8% 0+0k 0+0io 5835pf+0w
[1] Done start_master
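The master's log above is regular enough to be processed by a script, e.g. to follow task results as they arrive. A sketch assuming the "<timestamp> : <component> : <message>" layout inferred from this screenshot alone; parse_line and task_results are hypothetical helpers, and other DIANE versions may log differently.

```python
"""Parse DIANE master log lines like those in the screenshot (sketch).

Assumption: format inferred only from the screenshot above.
"""
import re
from datetime import datetime

LOG_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) : (?P<component>\S+) : (?P<msg>.*)$")
RESULT_RE = re.compile(r"taskid =(?P<taskid>\d+) status: (?P<status>\w+)")

def parse_line(line: str):
    """Return (timestamp, component, message), or None for other lines."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
    return ts, m["component"], m["msg"]

def task_results(lines):
    """Yield (taskid, status) for every receiveTaskResult entry."""
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1].endswith("receiveTaskResult"):
            r = RESULT_RE.search(parsed[2])
            if r:
                yield int(r["taskid"]), r["status"]
```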
References
more information:
cern.ch/diane
www.ge.infn.it/geant4/techtransf
aida.freehep.org