
The ATLAS Computing & Analysis Model - Roger Jones, Lancaster University - GDB, BNL, Long Island, 6/9/2006



Transcript

The ATLAS Computing & Analysis Model
Roger Jones, Lancaster University
GDB, BNL, Long Island, 6/9/2006

RWL Jones 6 Sept 2006 BNL 2 - Overview
- Very brief summary
- ATLAS facilities and their roles
- Resource requirements: bandwidth, hardware, running schedules
- Analysis modes and operations
- Data selection
- Interoperability

RWL Jones 6 Sept 2006 BNL 3 - Computing Resources
- Computing Model well evolved, documented in the C-TDR; externally reviewed in October
- There are (and will remain for some time) many unknowns:
  - Calibration and alignment strategy is still evolving
  - Physics data access patterns still to be truly exercised; unlikely to know the real patterns until 2007/2008!
  - Still uncertainties on the event sizes and reconstruction time
- All of this stresses the vital role of the ATLAS Computing System Commissioning exercises
  - These began with the (partially successful) T0-T1-T2 transfer tests in June/July
  - Will have: T0 processing -> T1 -> T2, T1 reprocessing -> T2, calibration data challenge, analysis challenge, full chain test
- Lesson from LEP (and the Tevatron?): you always underestimate what you need!

RWL Jones 6 Sept 2006 BNL 4 - ATLAS Facilities (Steady State)
- Event Filter Farm at CERN
  - Located near the experiment; assembles data into a stream to the Tier 0 center
- Tier 0 Center at CERN
  - Raw data -> mass storage at CERN and to Tier 1 centers
  - Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD)
  - Ship ESD, AOD to Tier 1 centers; mass storage at CERN
- Tier 1 Centers distributed worldwide (10 centers)
  - Re-reconstruction of raw data, producing new ESD, AOD (~2 months after arrival and at year end)
  - Scheduled, group access to full ESD and AOD
- Tier 2 Centers distributed worldwide (approximately 30 centers)
  - On-demand user physics analysis of shared datasets
  - Monte Carlo simulation, producing ESD, AOD; ESD, AOD -> Tier 1 centers
- CERN Analysis Facility
  - Heightened access to ESD and RAW/calibration data on demand
  - Calibration, detector optimisation, some analysis - vital in early stages
- Tier 3 Centers distributed worldwide
  - Physics analysis

RWL Jones 6 Sept 2006 BNL 5 - Inputs to the ATLAS Computing Model in TDR
- [Table of C-TDR input parameters]
- These are now all out of the window! Please treat new numbers as preliminary

RWL Jones 6 Sept 2006 BNL 6 - New Straw Man Profile
Year | Energy | Luminosity | Physics beam time
2007 | GeV    | 5x10^30    | protons: 26 days at 30% overall efficiency, 0.7x10^6 seconds
2008 | TeV    | 0.5x10^33  | protons: starting beginning of July, 4x10^6 seconds; ions: end of run, 5 days at 50% overall efficiency, 0.2x10^6 seconds
2009 | TeV    | 1x10^33    | protons: 50% better than 2008, 6x10^6 seconds; ions: 20 days of beam at 50% efficiency, 10^6 seconds
2010 | TeV    | 1x10^34    | TDR targets - protons: 10^7 seconds; ions: 2x10^6 seconds
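The beam-time figures in the straw-man profile follow from days of running scaled by an assumed overall efficiency; a minimal sketch of that arithmetic, using only the day/efficiency pairs quoted above:

```python
# Rough cross-check of the straw-man physics beam time: days of running
# scaled by an overall efficiency, expressed as live seconds.
SECONDS_PER_DAY = 86_400

def live_seconds(days: float, efficiency: float) -> float:
    """Physics-quality beam time in seconds for a given running period."""
    return days * SECONDS_PER_DAY * efficiency

# Figures quoted on the straw-man profile slide:
print(f"protons, 26 days at 30%: {live_seconds(26, 0.30):.2e} s")  # ~0.7e6 s
print(f"ions,     5 days at 50%: {live_seconds(5, 0.50):.2e} s")   # ~0.2e6 s
print(f"ions,    20 days at 50%: {live_seconds(20, 0.50):.2e} s")  # ~0.9e6 s, quoted as 10^6 s
```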
RWL Jones 6 Sept 2006 BNL 7 - Inputs to the ATLAS Computing Model (2)
- Under revision in the light of reality!
- (Some of) the early ESD will be larger
- Simulation is remaining obstinately slow (especially G4)
- Allow a subset of the data to be larger in early running

RWL Jones 6 Sept 2006 BNL 8 - Refinements of Model
- Much more planning for early running
- Calibration, optimisation and simulation heightened with early data
- Heightened access to RAW, ESD and augmented formats (discussion of streams later)
- Evolution of roles: more RAW/ESD and partial reconstruction in Tier 2s; Tier 1s retain some simulation role in early years

RWL Jones 6 Sept 2006 BNL 9 - Total ATLAS Requirements at start of 2008
- [Table of CPU (MSI2k), disk (PB) and tape (PB) requirements for Tier 0, CAF, Tier 1s and Tier 2s]

RWL Jones 6 Sept 2006 BNL 10 - Evolution
- [Chart of the evolution of resource requirements with time]

RWL Jones 6 Sept 2006 BNL 11 - Observations
- Data storage requirements generally fall with reduced live-time (obviously)
- CPU does not fall as much
  - CERN CPU determined by rate and calibration requirements
  - More calibration and optimisation on 2007 data
  - Higher than hoped simulation time per event
- T1s see significant reductions
  - Cumulative effect of less data on reprocessing
- T2s see a small initial fall but are bigger post 2009
  - Argument for spreading the gain and the pain with the T1s

RWL Jones 6 Sept 2006 BNL 12 - Data Flow (Tier 0 and Tier 2 views)
- EF farm -> T0: 320 MB/s continuous
- T0: raw data -> mass storage at CERN
- T0: raw data -> Tier 1 centers
- T0: ESD, AOD, TAG -> Tier 1 centers (2 copies of ESD distributed worldwide)
- T1 -> T2: some RAW/ESD, all AOD, all TAG, some group-derived datasets
- T2 -> T1: simulated RAW, ESD, AOD, TAG
- T0 -> T2: calibration processing?
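The per-stream rates on the Tier-1 data-flow slide that follows are just file size multiplied by file rate; a minimal sketch of that conversion, using the RAW, ESD and AOD figures quoted there:

```python
# Convert a file size and file rate into sustained bandwidth and daily volume,
# as used for the 2008 Tier-1 data-flow figures (e.g. RAW: 1.6 GB files at 0.02 Hz).
SECONDS_PER_DAY = 86_400

def flow(file_size_gb: float, files_per_second: float) -> tuple[float, float, float]:
    """Return (MB/s, TB/day, files/day) for a stream of files."""
    mb_per_s = file_size_gb * 1000 * files_per_second
    tb_per_day = mb_per_s * SECONDS_PER_DAY / 1e6
    files_per_day = files_per_second * SECONDS_PER_DAY
    return mb_per_s, tb_per_day, files_per_day

print(flow(1.6, 0.02))   # RAW:  ~32 MB/s, ~2.7 TB/day, ~1.7K files/day
print(flow(0.5, 0.02))   # ESD:  ~10 MB/s, ~0.8 TB/day, ~1.7K files/day
print(flow(0.01, 0.2))   # AOD:   ~2 MB/s, ~0.16 TB/day, ~17K files/day
```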
RWL Jones 6 Sept 2006 BNL 13 - ATLAS partial & average T1 Data Flow (2008)
[Flow diagram linking the Tier-0 CPU farm, this Tier-1's disk buffer, tape and disk storage, the other Tier-1s and each associated Tier-2]
- Tier-0 -> Tier-1 disk buffer:
  - RAW: 1.6 GB/file, 0.02 Hz, 1.7K files/day, 32 MB/s, 2.7 TB/day
  - ESD2: 0.5 GB/file, 0.02 Hz, 1.7K files/day, 10 MB/s, 0.8 TB/day
  - AOD2: 10 MB/file, 0.2 Hz, 17K files/day, 2 MB/s, 0.16 TB/day
  - AODm2: 500 MB/file, 0.34K files/day, 2 MB/s, 0.16 TB/day
  - Combined RAW + ESD2 + AODm2: 3.74K files/day, 44 MB/s, 3.66 TB/day
- Tier-0 output of RAW + ESD (2 copies) + AODm (10 copies): 1 Hz, 85K files/day, 720 MB/s
- RAW to tape at the Tier-1: 1.6 GB/file, 0.02 Hz, 1.7K files/day, 32 MB/s, 2.7 TB/day
- Exchange with the other Tier-1s (each direction): ESD2 0.5 GB/file, 0.02 Hz, 1.7K files/day, 10 MB/s, 0.8 TB/day; AODm2 500 MB/file, 3.1K files/day, 18 MB/s, 1.44 TB/day
- Flows involving the Tier-2s and reprocessing: ESD1 0.5 GB/file, 0.02 Hz, 1.7K files/day, 10 MB/s, 0.8 TB/day; AODm1 and AODm2 500 MB/file, 0.04 Hz, 3.4K files/day, 20 MB/s, 1.6 TB/day each
- Plus simulation and analysis data flow

RWL Jones 6 Sept 2006 BNL 14 - T1/T2 Group
- This group has been trying to describe:
  - Network traffic to T1s and T2s at each specific site
  - Required T2 storage at associated T1s
- Note: this has all been based on the C-TDR model and resources, so is unrealistic
- Note: we also know that the pledges will change

RWL Jones 6 Sept 2006 BNL 15 - Observations
- The wide range of T1 sizes introduces some inefficiencies compared with the ideal case
  - Some T1s will have a large load because of their chosen T2s
  - Some are underused and we must negotiate a better balance
- The T2s tend to have too high a CPU/disk ratio
  - Optimal use of the T2 resources delivers lots of simulation, with network and T1 disk consequences (although the higher CPU time per event will reduce this)
  - The T2 disk only allows about ~60% of the required analysis
  - Other models would seriously increase network traffic
- A full ESD copy at BNL has network implications elsewhere

RWL Jones 6 Sept 2006 BNL 16 - Analysis computing model
- Analysis model broken into two components:
  - Scheduled central production of augmented AOD, tuples and TAG collections from ESD; derived files moved to other T1s and to T2s
  - On-demand user analysis of augmented AOD streams, tuples, new selections etc., and individual user simulation and CPU-bound tasks matching the official MC production; modest job traffic between T2s

RWL Jones 6 Sept 2006 BNL 17 - Streaming
- This is an optimisation issue: all discussions are about optimisation of data access
- TDR had 4 streams from the event filter: primary physics, calibration, express, problem events
  - The calibration stream has split at least once since!
- At AOD, envisage ~10 streams
  - Based on trigger bits (immutable)
  - Optimises access for detector optimisation
- We are now planning ESD and RAW streaming
  - Straw-man streaming schemes to be tested in large-scale exercises
- Debates between inclusive and exclusive streams (access vs data management); inclusive may add ~10% to data volumes (see the sketch below)
- (Some of) all streams to all Tier 1s
- Raw data to archive blocked by stream and time for efficient reprocessing
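A minimal sketch of the inclusive-versus-exclusive trade-off mentioned above. The stream names and trigger bits are hypothetical, not the ATLAS menu; the point is that inclusive streaming writes an event into every stream whose trigger fired (duplicating overlap events, hence the ~10% volume overhead), while exclusive streaming assigns each event to exactly one stream.

```python
# Hypothetical illustration of inclusive vs exclusive streaming.
# Stream names and trigger bits are invented for the example.
events = [
    {"id": 1, "triggers": {"e25i"}},          # single-stream event
    {"id": 2, "triggers": {"mu20"}},
    {"id": 3, "triggers": {"e25i", "mu20"}},  # overlap event
]
stream_of = {"e25i": "egamma", "mu20": "muon"}

# Inclusive: an event goes to every stream whose trigger bit is set.
inclusive = {"egamma": [], "muon": []}
for ev in events:
    for trig in ev["triggers"]:
        inclusive[stream_of[trig]].append(ev["id"])

# Exclusive: each event is written exactly once, to a single stream chosen
# by a fixed precedence rule (overlaps could instead go to a dedicated stream).
precedence = ["e25i", "mu20"]
exclusive = {"egamma": [], "muon": []}
for ev in events:
    first = next(t for t in precedence if t in ev["triggers"])
    exclusive[stream_of[first]].append(ev["id"])

print(inclusive)  # {'egamma': [1, 3], 'muon': [2, 3]} -> event 3 stored twice
print(exclusive)  # {'egamma': [1, 3], 'muon': [2]}    -> no duplication
```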
RWL Jones 6 Sept 2006 BNL 18 - TAG Access
- Streaming is not the only way to partition and access a subset
- Selection and direct access to individual events is via a TAG database
  - TAG is a keyed list of variables per event
  - Overhead of file opens is acceptable in many scenarios
  - Works very well with pre-streamed data
- Two roles (see the sketch below):
  - Direct access to an event in a file via a pointer
  - Data collection definition function
- Two formats, file and database:
  - Now believe large queries require the full database: a multi-TB relational database, which restricts it to Tier 1s and large Tier 2s/CAF
  - File-based TAG allows direct access to events in files (pointers); ordinary Tier 2s hold the file-based primary TAG corresponding to locally held datasets

RWL Jones 6 Sept 2006 BNL 19 - TAG in Tiers
- Full relational database too demanding for most Tier 2s
- Expect each Tier 2 to hold the file-based TAG for every local dataset
  - Supports event access and limited dataset definition
- Tier 1s will be expected to hold the full database TAG as well as the file formats (for distribution)
- Tentative plans for queued access to the full database version

RWL Jones 6 Sept 2006 BNL 20 - Group Analysis
- Group analysis will produce:
  - Deep copies of subsets
  - Dataset definitions
  - TAG selections
- Characterised by access to full ESD and perhaps RAW
  - This is resource intensive, so it must be a scheduled activity
  - Can back-navigate from AOD to ESD at the same site
  - Can harvest small samples of ESD (and some RAW) to be sent to Tier 2s
  - Must be agreed by physics and detector groups
- Big Trains
  - Most efficient access if analyses are blocked into a big train
  - Idea around for a while, already used in e.g. heavy ions
  - Each wagon (group) has a wagon master (production manager), who must ensure it will not derail the train
  - The train must run often enough (every ~2 weeks?)

RWL Jones 6 Sept 2006 BNL 21 - On-demand Analysis
- Restricted to Tier 2s and the CAF
  - Can specialise some Tier 2s for some groups, but ALL Tier 2s are for ATLAS-wide usage
- Role- and group-based quotas are essential
  - Quotas to be determined per group, not per user
- Data selection
  - Over small samples with the Tier-2 file-based TAG and the AMI dataset selector
  - TAG queries over larger samples by batch job to the database TAG at Tier-1s/large Tier 2s
- What data?
  - Group-derived EventViews, ROOT trees, subsets of ESD and RAW
  - Pre-selected or selected via a Big Train run by the working group
- Optimised turn-around may require appropriate queues
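A minimal sketch of the two TAG roles described above, file-based for illustration. The variable names, cut values and the event-pointer layout (file identifier plus entry number) are hypothetical; the real TAG schema is defined by ATLAS event metadata, not by this example.

```python
# Hypothetical file-based TAG: one record of selection variables per event,
# plus a pointer (file identifier + entry number) for direct event access.
from dataclasses import dataclass

@dataclass
class TagRecord:
    run: int
    event: int
    n_electrons: int
    missing_et: float   # GeV
    file_guid: str      # which AOD/ESD file holds the event
    entry: int          # entry number inside that file

tag = [
    TagRecord(1234, 1, 2, 35.0, "guid-A", 17),
    TagRecord(1234, 2, 0, 80.0, "guid-A", 18),
    TagRecord(1235, 7, 1, 120.0, "guid-B", 3),
]

# Role 1: dataset/collection definition - a query over the TAG variables.
selection = [t for t in tag if t.n_electrons >= 1 and t.missing_et > 30.0]

# Role 2: direct event access - follow the pointers, opening only the files needed.
for t in selection:
    print(f"run {t.run} event {t.event}: open {t.file_guid}, read entry {t.entry}")
```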
RWL Jones 6 Sept 2006 BNL 22 - ATLAS Data Management
- Based on datasets
- The PoolFileCatalog API is used to hide grid differences
  - On LCG, the LFC acts as the local replica catalogue
  - Aims to provide uniform access to data on all grids
- FTS is used to transfer data between the sites
  - Movements triggered by subscriptions
  - In recent exercises, T2 data was subscribed by the T1 or by regional ATLAS; moving quickly to subscriptions done by a data placement team
- Evidently data management is a central aspect of Distributed Analysis and of production
  - PANDA is closely integrated with DDM and operational
  - The LCG instance was closely coupled with SC3; right now we run a smaller instance for test purposes
  - The final production version will be based on new middleware for SC4 (FPS)

RWL Jones 6 Sept 2006 BNL 23 - Dataset Access
- Collections of selected files comprise a dataset
  - A dataset will have a well-defined associated luminosity (an integer number of luminosity blocks)
- At present the primary source of dataset information is the simulation data from the production system
  - The production database suffices for now
- Soon (!) this will be from real data
  - Datasets will also be defined by physics groups and detector groups
  - Associated data will be modified for detector status, calibration info etc.
  - Requires a separate repository for dataset information and selection
- The ATLAS Metadata Interface (AMI) is being developed for this
  - Keeps the production database secure
- Interaction between dataset and TAG selection is being worked out

RWL Jones 6 Sept 2006 BNL 24 - Interoperability
- Interoperability between Grid deployments is crucial
- Gathering topics of concern (my quick survey):
  - An effective cross-deployment operations infrastructure (an improved GGUS)
  - Data management and movement interoperability
  - Will cross-deployment information systems support FTS advanced functionality?
  - Common local catalogues
  - Common monitoring, accounting and VOMS-based policies
- NorduGrid is a vital resource to us, but we lack a clear view of its future evolution
  - Their visit next week will be very useful to us

RWL Jones 6 Sept 2006 BNL 25 - Conclusions
- Computing Model data placement is well evolved for Raw, ESD and AOD at the tiered centers
  - Still need to understand all the implications of physics analysis
- Distributed Analysis and the Analysis Model are progressing well
- Reductions from the revised schedule are significant but not as big as you might expect
  - CPU/disk imbalances really distort the model
  - T1/T2 load sharing still needs optimisation
- SC4/Computing System Commissioning in 2006 is vital
- Some issues will only be resolved with real data

