The ATLAS ROOT-based data formats:
recent improvements and performance measurements
Wahid Bhimji University of Edinburgh
J. Cranshaw, P. van Gemmeren, D. Malon, R. D. Schaffer, and I. Vukotic
On behalf of the ATLAS collaboration
CHEP 2012
ATLAS ROOT-based data formats - CHEP 2012
Overview
• ATLAS workflows and ROOT usage
• Optimisations of ATLAS data formats: “AOD/ESD” and “D3PD”
• An ATLAS ROOT I/O testing framework
ATLAS ROOT I/O data flow
[Figure (some simplification!): RAW → (Reco) → ESD, AOD → (Reduce) → dESD, dAOD → D3PD → user ntuples; TAG; analysis takes place at several of these stages]
• RAW: bytestream, not ROOT
• AOD/ESD: ROOT with POOL persistency
• D3PD: ROOT ntuples with only primitive types and vectors of those
The diagram distinguishes the Athena software framework from non-Athena user code (standard tools and examples).
ATLAS analysis
User analysis (here called “analysis”, as opposed to “production”) is:
• Not centrally organised
• Heavy on I/O: all ATLAS analysis is ROOT I/O
Analysis share: 55% by number of jobs, 22% by wall-clock time.
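Taken together, the two shares imply that an average analysis job is much shorter than an average production job. The worked check below assumes that every job is either analysis or production, so production accounts for the remaining 45% of jobs and 78% of wall-clock time:

```python
# Shares quoted above; production is assumed to be the complement of analysis.
analysis_jobs, analysis_wall = 0.55, 0.22
production_jobs, production_wall = 1 - analysis_jobs, 1 - analysis_wall

# Mean wall time per job is proportional to (share of wall time) / (share of jobs).
mean_analysis = analysis_wall / analysis_jobs        # 0.40, arbitrary units
mean_production = production_wall / production_jobs  # ~1.73, same units

ratio = mean_analysis / mean_production
print(f"Mean analysis job: {ratio:.0%} of the mean production job's wall time")
```

Under that assumption an analysis job averages roughly a quarter of a production job's wall time. This is consistent with analysis consisting of many short, I/O-heavy jobs.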
ATLAS analysis: growth of D3PD
[Plots: total and D3PD analysis jobs, showing more jobs over time; the Top and SM groups have the most jobs; AOD: 46%]
“pathena” jobs:
• Running in the Athena framework
• E.g. private simulation; AOD analysis
“prun” jobs:
• Whatever the user wants!
• Mostly D3PD analysis
Optimisation of D3PD analysis is becoming very important.
ATLAS ROOT I/O
POOL: AOD/ESD use ROOT I/O via the POOL persistency framework. ROOT is the only really supported technology for object streaming into files.
ROOT versions:
• 2011 data (Tier 0 processing): ROOT 5.26
• 2011 data (reprocessing): ROOT 5.28
• 2012 data: ROOT 5.30
ROOT I/O features we use
Writing files:
• From 2011 we use ROOT “autoflush” and “optimize baskets”. Initial 2011 running: baskets (buffers) resized to share out the (default) 30 MB and to hold a similar number of entries.
• Split level: data members of objects are placed in different branches. Initial 2011 running: AOD/ESD fully split (split level 99) into primitive data.
Reading files:
• There is a memory buffer, TTreeCache (TTC), which learns the used branches and adds them to a cache. It is used by default for AOD->D3PD in Athena; for user code it’s up to the users.
See e.g. P. Canal’s talk at CHEP10.
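A simplified model can illustrate the “optimize baskets” idea: one buffer budget is shared out so that all branches hold a similar number of entries. This is a sketch under stated assumptions, not ROOT's implementation. The helper and branch names are hypothetical, bytes per entry are treated as exact, and 30 MB is taken as 30,000,000 bytes:

```python
def share_out_baskets(bytes_per_entry, budget_bytes=30_000_000):
    """Hypothetical helper: split one buffer budget across branches.

    Every branch gets the same number of entries per basket, so each basket
    size is proportional to that branch's bytes per entry.
    """
    total_per_entry = sum(bytes_per_entry.values())
    entries_per_basket = budget_bytes // total_per_entry
    sizes = {name: entries_per_basket * size
             for name, size in bytes_per_entry.items()}
    return entries_per_basket, sizes


# Illustrative branches (made-up names and sizes).
entries, sizes = share_out_baskets({"jet_pt": 40, "el_eta": 20, "trk_hits": 940})
print(entries)  # 30000
print(sizes)    # {'jet_pt': 1200000, 'el_eta': 600000, 'trk_hits': 28200000}
```

In this model every branch reaches its basket boundary at the same entry, so the baskets of different branches stay aligned. A reader can then fetch one block of entries for all of its branches together.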
ESD/AOD Optimisations
ATLAS Athena processes event by event, with no partial object retrieval.
Previous layout (fully split, 30 MB AutoFlush): many branches and many events per basket.
Non-optimal, particularly for event picking:
• Selecting with TAG, using event metadata (e.g. selecting on trigger, event or object properties): no payload data is retrieved for unselected events; slower data rate but faster overall
• Also the multi-processor AthenaMP framework: multiple workers, each reading a non-sequential part of the input
2011 ESD/AOD Optimisations
Switched splitting off, but kept member-wise streaming:
• Each collection stored in a single basket, except for the largest container
Number of baskets reduced from ~10,000 to ~800:
• Increases the average basket size by more than 10x
• Lowers the number of reads
Write fewer events per basket in the optimised layout:
• ESD: flush every 5 events
• AOD: flush every 10 events
Less data is needed if selecting events when reading.
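Fewer events per basket helps event picking for a simple reason. Suppose a fraction p of events is selected independently at random and each basket holds k events. A basket must then be read whenever at least one of its events is selected, which happens with probability 1 - (1 - p)^k. The sketch below computes that estimate; the independence assumption and the 100-event basket are illustrative, not measured:

```python
def fraction_of_baskets_read(p_selected, events_per_basket):
    """Probability that a basket holds at least one selected event, assuming
    each event is selected independently with probability p_selected."""
    return 1.0 - (1.0 - p_selected) ** events_per_basket


# 5 and 10 events: the ESD and AOD flush settings; 100: an illustrative large basket.
for k in (5, 10, 100):
    print(k, round(fraction_of_baskets_read(0.01, k), 3))
# 5 0.049
# 10 0.096
# 100 0.634
```

With 1% of events selected, about 10% of baskets are touched at 10 events per basket. At 100 events per basket the figure is about 63%.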
Performance Results
• Reading all events is ~30% faster
• Selective reading (1%) using TAGs: 4-5 times faster
AOD layout | All events | Selective 1% read
OLD: fully split, 30 MB AutoFlush | 55 (±3) ms/ev. | 270 ms/ev.
CURRENT: no split, 10-event AutoFlush | 35 (±2) ms/ev. | 60 ms/ev.
Local disk read; controlled environment; cache cleaned.
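The summary figures can be cross-checked against the table's central values with a short calculation:

```python
# Central values from the AOD layout table (ms per event).
old = {"all events": 55.0, "selective 1%": 270.0}
new = {"all events": 35.0, "selective 1%": 60.0}

for mode in old:
    speedup = old[mode] / new[mode]         # ratio of per-event times
    time_saved = 1 - new[mode] / old[mode]  # fractional reduction in time per event
    print(f"{mode}: {speedup:.2f}x, {time_saved:.0%} less time per event")
# all events: 1.57x, 36% less time per event
# selective 1%: 4.50x, 78% less time per event
```

With the quoted ±3 and ±2 ms uncertainties, the all-events reduction lies between about 29% and 43%, consistent with the “~30% faster” summary. The selective ratio of 4.5 matches “4-5 times faster”.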
Further Performance Results
File size is very similar in the old and current formats.
Virtual memory footprint reduced by about 50-100 MB for writing and reading:
• Fewer baskets loaded into memory
Write speed has increased by about 20%:
• Write speed increased even further (an increase of almost 50%) when the compression level was relaxed
New scheme used for the autumn 2011 reprocessing:
• Athena AOD read speed (including transient/persistent conversion and reconstruction of some objects) > 5 MB/s, up from ~3 MB/s in the original processing (this includes the ROOT 5.26 to 5.28 change as well as the layout change)
Testing Framework
ROOT I/O changes affect different storage systems on the Grid differently:
• E.g. TTC with Rfio/DPM needed some fixes
We have also seen cases where AutoFlush and TTC don’t reduce HDD reads/time as expected.
Need regular tests on all systems used (in addition to controlled local tests) to avoid I/O “traps”.
Also now have a ROOT I/O working group, well attended by ROOT developers, ATLAS, CMS and LHCb:
• Coming up with a rewritten basket optimisation
• We promised to test any developments rapidly
Built a Testing Framework using HammerCloud:
• Takes our tests from SVN; runs on all large Tier 2s
• Highly instrumented: ROOT (e.g. reads, bytes); WN (traffic, load, CPU type); storage type etc.
[Diagram: SVN (defines code, release, dataset, …) → HammerCloud, regularly submitting single tests to sites; inputs: ROOT source (via curl) and the dataset (identical dataset pushed to all sites); jobs upload stats to an Oracle DB, read by data mining tools (command line, web interface, ROOT scripts)]
• Extremely fast feedback: a.m.: new feature to test in; p.m.: answers for all storage systems in the world
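The slides do not describe the HammerCloud and Oracle DB internals. The sketch below is only a hypothetical illustration of the data-mining step: each test job uploads one record of instrumented metrics, and a query aggregates the records, here into CPU efficiency per storage type. The record schema, helper and site names are all made up:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobMetrics:
    """One uploaded measurement (hypothetical schema, for illustration only)."""
    site: str
    storage: str       # storage system type, e.g. "dCap", "xrootd", "DPM"
    root_version: str
    cpu_time_s: float
    wall_time_s: float
    read_calls: int
    bytes_read: int


def cpu_efficiency_by_storage(results):
    """Total CPU time / total wall time, grouped by storage system type."""
    cpu = defaultdict(float)
    wall = defaultdict(float)
    for r in results:
        cpu[r.storage] += r.cpu_time_s
        wall[r.storage] += r.wall_time_s
    return {storage: cpu[storage] / wall[storage] for storage in wall}


# Made-up records from three hypothetical sites.
results = [
    JobMetrics("SITE_A", "xrootd", "5.30", 90.0, 100.0, 1_200, 2_000_000_000),
    JobMetrics("SITE_B", "xrootd", "5.30", 180.0, 200.0, 1_300, 2_000_000_000),
    JobMetrics("SITE_C", "DPM", "5.30", 50.0, 100.0, 25_000, 2_000_000_000),
]
print(cpu_efficiency_by_storage(results))  # {'xrootd': 0.9, 'DPM': 0.5}
```

Summing times before dividing weights each job by its wall time. Averaging per-job efficiencies would instead give every job equal weight. Which one is appropriate is an analysis choice.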
Examples of Tests
1. ROOT-based reading of D3PD (or AOD):
• Provides metrics from ROOT (no. of reads, read speed)
• Like a user D3PD analysis
• Reading all branches and 100% or 10% of events (at random)
• Reading a limited 2% of branches (those used in a real Higgs analysis)
2. Using different ROOT versions:
• Latest Athena releases
• Using 5.32 (not yet in Athena)
• Using the trunk of ROOT
3. Athena D3PD making
4. Instrumented user code examples
5. Wide-Area-Network tests
http://ivukotic.web.cern.ch/ivukotic/HC/index.asp
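For the “10% of events (at random)” test to be comparable across sites, every site should read the same events. The slides do not say how the subset is chosen. One simple approach, sketched here as an assumption with a hypothetical helper and seed, is a seeded random sample:

```python
import random


def select_events(n_events, fraction, seed=12345):
    """Hypothetical helper: choose which entries a test job reads.

    A fixed seed makes the random subset identical at every site, so timings
    from different storage systems compare the same reads. Entries are
    returned in increasing order so the file is still traversed forwards.
    """
    rng = random.Random(seed)
    n_selected = round(n_events * fraction)
    return sorted(rng.sample(range(n_events), n_selected))


subset = select_events(1_000, 0.10)
print(len(subset), subset[:5])
```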
Testing ROOT Versions
[Plot: wall time on each storage system, 100% read, TTC on (30 MB): ROOT 5.28 (Athena 17.0.4), ROOT 5.30 (Athena 17.1.4), ROOT 5.32 (red)]
• Tracking the change to ROOT 5.30 in Athena: no significant changes in wall times on any storage system
• ROOT 5.32 (red): again no big change on average
Tuning D3PDs: Compression and Auto-Flush
Rewrite D3PD files:
• Using ROOT 5.30 (the current Athena release)
Try different zip levels; the current default is 1:
• Local testing suggested “6” is more optimal (in read speed), so these files were copied to all sites
• Zip 6 files are ~5% smaller, so there are also gains in copy times and disk space
Change the autoflush setting, currently 30 MB:
• Try extreme values of 3 and 300 MB
Tuning D3PDs
Files written (in ROOT 5.30) with different zip levels / autoflush buffer sizes:
• Zip 6 is at least as good as 1, so the default was changed
• 3 MB autoflush is not good!
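The zip-level trade-off can be explored outside ROOT with Python's zlib module. This is only a stand-in for the per-basket compression applied when ROOT writes files. The payload is synthetic, so the sizes and timings it prints say nothing about real D3PDs. A higher deflate level spends more CPU at write time searching for matches and usually yields smaller output, while decompression cost depends little on the level:

```python
import time
import zlib


def compare_levels(data, levels=(1, 6)):
    """Compress one buffer at several zlib levels; return {level: (bytes, seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        assert zlib.decompress(packed) == data  # lossless round trip
        results[level] = (len(packed), elapsed)
    return results


# Synthetic, fairly repetitive payload standing in for ntuple basket contents.
payload = b"".join(b"%d," % (i % 977) for i in range(200_000))
for level, (size, seconds) in compare_levels(payload).items():
    print(f"level {level}: {size} bytes ({size / len(payload):.1%}), {seconds * 1e3:.1f} ms")
```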
TTreeCache
• TTreeCache is essential at some sites
• Users still don’t set it
• Different optimal values per site
• The ability to set it in the job environment would be useful
[Plot: 300 MB TTC vs no TTC]
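Counting the requests a job sends to storage shows why a cache can matter so much at some sites. The model below is a deliberately crude sketch with a hypothetical helper. Real TTreeCache behaviour also depends on the cache size, the learning phase and the storage protocol:

```python
import math


def read_calls(n_entries, entries_per_cluster, n_branches_read, use_cache):
    """Rough count of storage read requests for one pass over a tree.

    Simplified model (assumption): each read branch has one basket per
    cluster of entries. Without a cache, every basket is a separate request;
    with a TTreeCache-like cache, the baskets of all read branches in one
    cluster are fetched together in a single request.
    """
    n_clusters = math.ceil(n_entries / entries_per_cluster)
    return n_clusters if use_cache else n_clusters * n_branches_read


# Illustrative numbers, not taken from the measurements.
print(read_calls(1_000_000, 2_000, 200, use_cache=False))  # 100000
print(read_calls(1_000_000, 2_000, 200, use_cache=True))   # 500
```

If each request pays a fixed latency, the gap between the two counts turns directly into wall time. Since latency differs between storage systems, this could help explain why the optimal setting varies per site.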
Reading limited branches
D3PD users don’t usually pick events, but they do pick small sets of branches from large D3PDs.
So we have a test that reads a limited set of branches (~2%), based on those read in for an H->bb analysis.
Drop in CPU efficiency on some storage systems
• Systems using a vector-reading protocol (dCap, xrootd) still have high efficiency
[Plot: CPU efficiency by storage system, TTC: 300 MB]
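When only ~2% of branches are read, the wanted baskets are small and scattered through the file, so the number of separate requests grows. The rough sketch below shows why protocols with vector reads can keep efficiency high. It uses a hypothetical helper and an assumed per-request limit:

```python
import math


def requests_needed(n_byte_ranges, vectored, max_ranges_per_request=1024):
    """Client requests needed to fetch n_byte_ranges scattered byte ranges.

    Simplified model (assumption): a plain protocol issues one request per
    range; a vector-read protocol packs up to max_ranges_per_request ranges
    (a hypothetical limit) into one request.
    """
    if vectored:
        return math.ceil(n_byte_ranges / max_ranges_per_request)
    return n_byte_ranges


# Illustrative: 100 branches read, one basket each in 500 clusters.
ranges = 100 * 500
print(requests_needed(ranges, vectored=False))  # 50000
print(requests_needed(ranges, vectored=True))   # 49
```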
Wide-Area-Network Analysis
[Plot: CPU efficiency (0-100%) vs ping time, running on various US sites: local read, reading from other US sites, reading from CERN]
CPU Efficiencies over WAN
First measurements: not bad rates
• 94% local-read efficiency drops to 60%-80% for other US sites
• and to around 45% for reading from CERN
Offers promise for this kind of running if needed:
• Plan to use such measurements for scheduling decisions
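A toy model shows why CPU efficiency falls with ping time and what could recover it. The helper and all numbers are illustrative assumptions and are not fitted to the measurements above:

```python
def cpu_efficiency(cpu_s, n_round_trips, rtt_s, bytes_read, bandwidth_bytes_per_s):
    """CPU efficiency = CPU time / wall time, under a toy wall-time model.

    Assumption: I/O is not overlapped with computation, so each synchronous
    round trip adds one RTT and moving the data adds bytes / bandwidth.
    """
    wall_s = cpu_s + n_round_trips * rtt_s + bytes_read / bandwidth_bytes_per_s
    return cpu_s / wall_s


# Illustrative job: 1000 s CPU, 5000 round trips, 10 GB read at 1 GB/s.
for rtt_ms in (1, 50, 150):
    eff = cpu_efficiency(1000.0, 5000, rtt_ms / 1000, 10e9, 1e9)
    print(f"RTT {rtt_ms:>3} ms: efficiency {eff:.0%}")
```

In this model the round-trip term dominates at long distances. Anything that cuts the number of synchronous round trips therefore matters most over the WAN: a larger cache, vector reads or asynchronous prefetching.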
Conclusions
Made performance improvements in ATLAS ROOT I/O.
Built an I/O testing framework for monitoring and tuning.
Plan to:
• Test and develop core ROOT I/O with the working group: basket optimisation, asynchronous prefetching
• Provide sensible defaults for user analysis
• Further develop WAN reading
• Site tuning
• Lots more mining of our performance data
• New I/O strategies for multicore (see next talk!)