The ATLAS Trigger and Data Acquisition System – An historical overview
Fred Wickens, representing ATLAS TDAQ
But with some personal commentary
SLAC – 16 Nov 2010
16 Nov 2010 ATLAS TriggerDAQ 2
Health Warning
• This talk has been prepared at relatively short notice
  – insufficient time for me to check that details are totally up-to-date with the appropriate experts
• Some statements are my own opinions/recollections
  – may not be agreed by other ATLAS TDAQ participants
• Some slides have been stolen from other public talks on ATLAS; any errors of interpretation or detail are entirely mine
  – If you wish to know more details of the system, see the talks given at various recent conferences – especially CHEP2010, e.g. the talks by Nicoletta Garelli and Ricardo Goncalo
16 Nov 2010 ATLAS TriggerDAQ 3
Outline
The talk will describe the ATLAS Trigger and Data Acquisition system, focusing mainly on the DataFlow and HLT Framework: how it evolved, what it is now, its performance and some perspectives for the future.
• Introduction
  – Including a description of the problem as it appeared in 1994, when the ATLAS Technical Proposal was written
• The history of how the system evolved
  – Noting some of the key architectural and implementation decisions
• How it works
• The performance achieved in 2010
• Future challenges
• Summary
16 Nov 2010 ATLAS TriggerDAQ 4
Introduction
16 Nov 2010 ATLAS TriggerDAQ 5
The ATLAS Detector
• Large angular coverage: |η| < 4.9; tracking in |η| < 2.5
• Inner detector ~100M Channels
– Pixels, Si-strips and Transition Radiation Tracker
• Calorimeters – O(100K) Channels
– Liquid Argon electromagnetic; Iron-scintillating tile hadronic
• Outer Muon Spectrometer ~ 1M Channels
• Magnets:
– Inner Tracker 2T solenoid
– Muons 4T air-core toroids
16 Nov 2010 ATLAS TriggerDAQ 6
7 TeV
Physics rates at the LHC
• At the LHC the physics of interest is a small fraction of the total interaction rate
  – b-physics fraction ~ 10^-3
  – t-physics fraction ~ 10^-8
  – Higgs fraction ~ 10^-11
• At 14 TeV and luminosity 10^34 cm^-2 s^-1 (design energy + luminosity) this gives (arithmetic sketched below):
  – Total interactions 10^9 s^-1
  – b-physics 10^6 s^-1
  – t-physics 10 s^-1
  – Higgs 10^-2 s^-1
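The rates above are just the quoted fractions multiplied by the total interaction rate. A minimal sketch of that arithmetic (Python; the values are the approximate numbers quoted on this slide):

```python
# Rough arithmetic behind the rates quoted above (approximate values from the slide).
total_rate_hz = 1e9  # total interaction rate at design luminosity, ~10^9 per second

fractions = {
    "b-physics": 1e-3,
    "t-physics": 1e-8,
    "Higgs": 1e-11,
}

for process, fraction in fractions.items():
    # rate of interest = fraction of the total interaction rate
    print(f"{process}: ~{fraction * total_rate_hz:.0e} per second")
# -> b-physics ~1e+06/s, t-physics ~1e+01/s, Higgs ~1e-02/s
```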
16 Nov 2010 ATLAS TriggerDAQ 7
The LHC and ATLAS
• The LHC design has
  – Energy 14 TeV
  – Luminosity 10^34 cm^-2 s^-1
  – Bunch separation 25 ns (bunch length ~1 ns)
• This results in
  – ~23 interactions / bunch crossing
    • ~80 charged particles (mainly soft pions) / interaction
    • ~2000 charged particles / bunch crossing
  – Produces ~1 PB/s in the detector (a stack of CDs a mile high!)
• The ATLAS Technical Proposal assumed (see the sketch below):
  – Event size of ~1 MB
  – Level-1 trigger rate of ~100 kHz
  – Hence a data rate into DAQ/HLT of ~100 GB/s
  – Acceptable rate to off-line ~100 MB/s
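A back-of-envelope sketch of the TP data-flow arithmetic (Python; the inputs are the TP assumptions quoted above):

```python
# Back-of-envelope data-flow arithmetic for the TP assumptions above (approximate).
bunch_crossing_rate_hz = 40e6   # LHC bunch-crossing rate
level1_rate_hz = 100e3          # assumed Level-1 accept rate
event_size_bytes = 1e6          # assumed event size, ~1 MB
offline_rate_hz = 100           # acceptable rate to off-line

daq_input = level1_rate_hz * event_size_bytes        # ~100 GB/s into DAQ/HLT
offline_output = offline_rate_hz * event_size_bytes  # ~100 MB/s to storage

hlt_rejection = level1_rate_hz / offline_rate_hz            # ~1000x needed from L2 + EF
total_rejection = bunch_crossing_rate_hz / offline_rate_hz  # ~4e5x overall

print(f"DAQ/HLT input  : {daq_input / 1e9:.0f} GB/s")
print(f"Off-line output: {offline_output / 1e6:.0f} MB/s")
print(f"Rejection needed: HLT ~{hlt_rejection:.0f}x, overall ~{total_rejection:.0e}x")
```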
16 Nov 2010 ATLAS TriggerDAQ 8
Experiment TDAQ comparisons
16 Nov 2010 ATLAS TriggerDAQ 9
Time Line
• 1994 Dec – ATLAS Technical Proposal
  – First physics assumed in 2005
• 1998 June – Level-1 Technical Design Report
• 1998 June – DAQ, HLT and DCS Technical Progress Report
• 2000 March – DAQ, HLT and DCS Technical Proposal
• 2003 June – DAQ, HLT and DCS Technical Design Report
  – First physics now assumed in 2007
• 2005 – Detector commissioning, no central DAQ
• 2006 – Detector commissioning with central DAQ
• 2007 – Start of combined cosmic running
• 2008 Sept – a few days of LHC running + more cosmic running
• 2009 Nov – LHC re-start at 900 GeV
• 2010 Mar – LHC starts running at 7 TeV (3.5 on 3.5)
16 Nov 2010 ATLAS TriggerDAQ 10
Sociology
• The ATLAS TDAQ community comprises a very large number of people and many institutes, with only a few people at most institutes
  – L1 TDR – 85 people from 20 institutes
  – DAQ/HLT/DCS TP – 197 people from 45 institutes
  – DAQ/HLT/DCS TPR – 211 people from 42 institutes
  – DAQ/HLT/DCS TDR – 228 people from 41 institutes
  – Currently:
    • TDAQ author list – 574 people from 105 institutes
    • TDAQ Institutes Board – 73 institutes
• In addition, much of the early development of ATLAS was done in the context of the LHC R&D projects, which formed their own sub-communities, many wedded to particular views, potential solutions and even technologies
16 Nov 2010 ATLAS TriggerDAQ 11
Funding Issues
• Complicated the picture even more:
• Most of TDAQ was funded directly by the participating Funding Agencies – not indirectly via the Common Fund
  – Consequently the number of Funding Agencies (FAs) involved was:
    • L1 – 7 FAs (3 L1-Calo, 3 L1-Muon, 1 Central L1)
    • DAQ/HLT – originally 15 FAs + ~17% CF
      – Note: subsequently 7 other FAs have also contributed
    • DCS – 2 FAs + ~40% CF
• TDAQ has had to adjust to several major perturbations in funding
  – Loss of a major part of the CF to meet shortfalls elsewhere
  – The initial DAQ/HLT system had to be scaled back by ~50%, as money was needed to meet ATLAS cash-flow problems
  – But offset more recently by some additional contributions from new ATLAS collaborators
16 Nov 2010 ATLAS TriggerDAQ 12
Summary of the Problem
• Thus ATLAS faced a problem:
  – Requiring a system of unprecedented scale and performance
  – A number of candidate technologies existed which might support such a system, but no clear front runner in terms of performance, cost, longevity and future evolution
  – A long timescale – time for technologies to evolve and solutions to emerge, but a need to ensure a solution is in place
  – A large, diverse community – many ideas but little agreement
  – A spectrum of approaches – from too abstract to too concrete
• Development of the system was done gradually, with various targeted studies:
  – To obtain a better understanding of the issues
  – To strike a balance between abstract and concrete
  – To find a good solution, avoiding the search for the "best"
  – To form a coherent community with consensus views
16 Nov 2010 ATLAS TriggerDAQ 13
History of how the System Evolved
16 Nov 2010 ATLAS TriggerDAQ 14
The ATLAS TP Architecture for DAQ/HLT
• 3-level trigger
• L1 uses selected coarse data from the calorimeters and muon spectrometer
• L1 latency ~2 μs
  – Data held in pipelines in the detector front-ends
• Average L2 latency ~10 ms
• Event Building at ~1 kHz
• Data to storage/off-line at ~100 Hz / ~100 MB/s
16 Nov 2010 ATLAS TriggerDAQ 15
Possible Architecture Implementation of L2
• Uses RoI principle (see later)
• During L2 decision time data stored in “LVL2 Buffers”
• Parallel processing in Local Processors of data within each RoI from different detectors
• Results from different detectors and different RoI’s combined in a Global Processor
16 Nov 2010 ATLAS TriggerDAQ 16
Possible Overall DAQ/HLT Implementation
• Much of the thinking was still based on custom h/w
  – VMEbus crates
  – DSPs or FPGAs for L2 Local
  – Special processor boards with a micro-kernel for L2 Global
  – Various high-speed interconnects suggested: Fibre Channel, HIPPI, ATM, SCI
• Although it was recognised that commodity h/w might become available for some parts
16 Nov 2010 ATLAS TriggerDAQ 17
Some key choices in the TP Architecture - 1
• Uniformity from the level of the ROL (Read-Out Link)
  – The read-out of each detector, up to the output of the Read-Out Driver (ROD), is the responsibility of the detector group.
    • Although there are some commonalities, there are major differences across the RODs
• Separation of the RODs and the "Read-Out Crates" (now the Read-Out System – ROS)
  – This simplified decision making, and also greatly simplified stand-alone detector and TDAQ commissioning and debugging.
  – The separation continues to give considerable operational advantages
  – But it has been noted that combining these units could lead to cheaper hardware and more flexible solutions.
16 Nov 2010 ATLAS TriggerDAQ 18
Some key choices in the TP Architecture - 2
• Separation of the ROD crates of different detectors into a small number (~15) of fixed TTC zones (Timing, Trigger and Control – a real-time, high-precision timing system for synchronisation and the transport of small data packets)
  – The DAQ (and the EF part of the HLT, but not L1 or L2) can also be partitioned to allow concurrent, independent operation of different detectors.
    • This supports parallel independent calibration or debugging runs of different detectors.
• The LVL1 architecture essentially as built (see below)
  – Although there were further developments
    • some technology changes (e.g. fewer ASICs, more FPGAs)
    • max rate reduced from 100 to 75 kHz
      – to reduce the cost of some detector electronics
• The RoI principle
  – See next slide
16 Nov 2010 ATLAS TriggerDAQ 19
Regions of Interest
• The Level-1 selection is dominated by local signatures (i.e. within a Region of Interest – RoI)
• Typically there are 1-2 RoIs/event
• Can obtain further rate reduction at Level-2 using just the data within the Region of Interest (a toy sketch follows below)
  – E.g. validate the calorimeter data at full granularity
  – If still OK, check the track in the inner detector
• Emphasis on reducing the network b/w and processing power required
• Thus reduced the demand on the technology, but gave a stronger coupling between Trigger and DataFlow
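To make the RoI idea concrete, here is a minimal sketch (Python, not ATLAS code): Level-2 asks only for the read-out buffers whose detector region overlaps a small eta-phi window around the Level-1 RoI, so only a few percent of the event crosses the network. The window size, granularity and ROB mapping below are invented for the illustration.

```python
# Toy sketch of RoI-driven data requests (illustrative only; the window size and the
# ROB-to-region mapping below are invented, not the real ATLAS detector mapping).

def robs_in_roi(eta, phi, rob_map, half_width=0.2):
    """Return the read-out buffers whose eta-phi region overlaps a window of
    +/- half_width around the Level-1 Region of Interest (phi wrap-around ignored)."""
    selected = []
    for rob_id, (eta_lo, eta_hi, phi_lo, phi_hi) in rob_map.items():
        if eta_lo <= eta + half_width and eta_hi >= eta - half_width and \
           phi_lo <= phi + half_width and phi_hi >= phi - half_width:
            selected.append(rob_id)
    return selected

# Toy mapping: 80 ROBs, each covering a coarse eta-phi patch of the calorimeter.
toy_rob_map = {
    rob_id: (eta0, eta0 + 0.5, phi0, phi0 + 0.8)
    for rob_id, (eta0, phi0) in enumerate(
        (e / 10.0, p / 10.0) for e in range(-25, 25, 5) for p in range(-31, 31, 8)
    )
}

requested = robs_in_roi(eta=1.3, phi=0.7, rob_map=toy_rob_map)
print(f"L2 requests {len(requested)} of {len(toy_rob_map)} ROBs "
      f"(~{len(requested) / len(toy_rob_map):.0%} of the event data)")
```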
16 Nov 2010 ATLAS TriggerDAQ 20
ARCHITECTURE
[TP-era architecture diagram]
• Three logical levels:
  – LVL1 – fastest (~2 μs): only calorimeter and muon data, hardwired
  – LVL2 – local (~10 ms): LVL1 refinement + track association
  – LVL3 – full event (~1 s): "offline"-style analysis
• Hierarchical data-flow:
  – On-detector electronics: pipelines
  – Event fragments buffered in parallel
  – Full event in a processor farm
• Rates: 40 MHz input to Trigger/DAQ (~1 PB/s equivalent); ~100 Hz / ~100 MB/s to physics storage
16 Nov 2010 ATLAS TriggerDAQ 21
Level-1 TDR
[Diagram: the Calorimeter Trigger Processor (e/γ, jet, ET) and Muon Trigger Processor send sub-trigger information to the Central Trigger Processor; the Region-of-Interest Unit (Level-1/Level-2) passes RoI information to the Level-2 Trigger; timing, trigger and control are distributed to the front-end systems]
• Calorimeter and muon
  – trigger on inclusive signatures
    • muons
    • em/tau/jet calo clusters
    • missing and sum ET
• Bunch crossing identified
• Hardware trigger with
  – Programmable thresholds
  – Selection based on multiplicities and thresholds (a toy sketch follows below)
• Region of Interest information sent to Level-2, e.g.
  – calo clusters (ET > 10 GeV)
  – muon tracks (pT > 6 GeV)
16 Nov 2010 ATLAS TriggerDAQ 22
Evolution up to the TPR (1998) and TP (2000)
• Wide range of studies in technology and software
• Standardised suites of software starting to emerge for various functions
• Some major changes for L2; consensus that:
  – It should be implemented using sequential processing of algorithms, mainly in Unix PCs
  – The custom RoI Distributor assumed earlier should be dropped
    • Data requests should pass via the network
• But still far from consensus in some key areas
  – Networks (Ethernet slowly emerged, but various more exotic networks were favoured for a long time)
  – Read-Out Buffers – the DAQ groups focussed on functionality, the L2 community on performance. But even here some consensus was emerging on a custom ROBin card
16 Nov 2010 ATLAS TriggerDAQ 23
Evolution up to the TPR (1998) and TP (2000)
• In the TP
  – yet more convergence
  – the whole DAQ/HLT system is described in a common language (UML)
16 Nov 2010 ATLAS TriggerDAQ 24
The TDR (2003), or The System Crystallises
• Overall Architecture agreed
16 Nov 2010 ATLAS TriggerDAQ 25
The TDR (2003)
• Baseline Implementation agreed
16 Nov 2010 ATLAS TriggerDAQ 26
TDR (2003)
• Gigabit Ethernet to be used for all networks
• Most of the system to use standard rack-mounted Linux PC servers
• Read-Out System based on an industrial PC plus ROBin cards
• Custom h/w limited to:
  – RoIB (VME-based system to build the RoI pointers from different parts of L1 into a single record)
  – ROBin – custom PCI card to buffer event data from the detector RODs
  – ROL – the read-out link used to transport data from a detector ROD to a ROBin (160 MB/s S-Link)
• Some adjustments to rates and to the assumed event size (1.5 MB)
16 Nov 2010 ATLAS TriggerDAQ 27
TDR (2003)
• Standard racks and their locations defined
• Assumed the HLT would be implemented with 8 GHz (single-core) dual-socket PCs!
16 Nov 2010 ATLAS TriggerDAQ 28
ATLAS TDAQ Barrack Rack Layout (SDX1)
[Rack-layout diagrams, selectable by year (2005-2009): SDX Level 1 holds rows of EF racks; SDX Level 2 holds EF/L2 (XPU) racks, SFI, SFO and DC racks, Back-end/DC/Online switch racks, Online, DCS, DSS and patch-panel racks. Annotations cover rack counts per row, the landing load limit (350 daN/m2), front/back aisle widths (~63-100 cm, constrained by structural beams, ventilation flaps and power-distribution boxes on most racks), air flow, ventilation ducts, water pipes, the cable tray to USA15, shaft PX15 and the door through which racks enter.]
16 Nov 2010 ATLAS TriggerDAQ 29
How it Works
16 Nov 2010 ATLAS TriggerDAQ 30
ARCHITECTURE
[Diagram of the architecture as built]
• LVL1 (hardware, latency 2.5 μs): Calorimeter and Muon triggers; data held in FE pipelines for 2.5 μs; 40 MHz (~1 PB/s) reduced to 75 kHz
• Read-Out: on a LVL1 accept, event fragments flow from the Read-Out Drivers (RODs) over the Read-Out Links (120 GB/s) into the Read-Out Buffers (ROBs) of the Read-Out Sub-systems (ROS)
• LVL2 (~10 ms): the RoI Builder (ROIB) and L2 supervisors (L2SV) pass RoIs to the L2 processors (L2P) over the L2 network (L2N); RoI data requests fetch only 1-2% of each event (~2 GB/s); output ~2 kHz
• Event Builder (EB): on a LVL2 accept, full events are built at ~3 GB/s
• Event Filter (EF, ~1 s/event): EF processors (EFP) on the EF network (EFN) reduce the rate to ~200 Hz, ~300 MB/s to storage
16 Nov 2010 ATLAS TriggerDAQ 31
The Trigger Framework
• The development and testing of event selection code used the off-line software framework (Athena)
• The event selection code was then ported to the on-line "DataCollection" framework
  – The latter provides the interfaces to the on-line services (e.g. run control, configuration, message passing)
• But an increasing number of services provided in off-line code would need to be ported and maintained for on-line use
  – Including the services required to handle calibration and alignment data
• The TDR introduced the "PESA Steering Controller" (PSC) to reduce this on-going effort
16 Nov 2010 ATLAS TriggerDAQ 32
The Trigger Framework
• The PSC is an interface inside the on-line application
  – Provides an "Athena-like" (i.e. off-line) environment
  – Hides the on-line complications from the event selection s/w
16 Nov 2010 ATLAS TriggerDAQ 33
The Trigger Framework
• The PSC allowed the use of many off-line services
  – Simplifies trigger code development and testing
  – Provides direct access to the s/w handling calibration/alignment
  – Greater homogeneity between off-line and on-line code
  – Event selection code in L2 and EF sees the same environment – so eased moving algorithms between them
  – In principle it still allowed multiple algorithm threads in a single application – in practice this proved no longer practical
    • e.g. some offline services used external libraries which were not thread-safe
• Hence L2 moved to the use of many off-line services – but dropped multiple algorithm threads
  – Services introduced this way still need thorough testing
    • to ensure that they meet the online requirements (timing, memory leaks and robustness)
16 Nov 2010 ATLAS TriggerDAQ 34
Event Selection Code
• HLT algorithms:
  – Extract features from sub-detector data
  – Combine features to reconstruct physical objects
    • electron, muon, jet, etc.
  – Combine objects to test the event topology
  – Organised into Trigger Chains
• Trigger Chain (sketched in code below):
  – Started if its seed has fired
  – Processing of a chain stops as soon as an algorithm is not passed
  – A chain passes if the last Hypothesis in the chain is passed
  – Can be used to seed other chains in the next level
• Trigger Menu:
  – Consists of a list of triggers, including prescales at each level
    • i.e. L1 Item -> L2 Chain -> EF Chain
  – A trigger can be enabled/disabled during a run using the prescales
• An event is passed if at least one EF Chain passed
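A minimal sketch of the chain logic described above (Python; the class, the example algorithm and the thresholds are illustrative, not the real ATLAS steering code):

```python
# Minimal sketch of chain-based selection (illustrative; the real HLT steering
# configures chains as sequences of feature-extraction and Hypothesis algorithms).
import random

class Chain:
    def __init__(self, name, seed, steps, prescale=1):
        self.name, self.seed, self.steps, self.prescale = name, seed, steps, prescale

    def run(self, event, fired_seeds):
        if self.seed not in fired_seeds:
            return False                      # a chain only starts if its seed fired
        if self.prescale > 1 and random.randrange(self.prescale) != 0:
            return False                      # prescale: keep roughly 1 in N candidates
        for algorithm, hypothesis in self.steps:
            features = algorithm(event)       # feature extraction (clustering, tracking, ...)
            if not hypothesis(features):
                return False                  # stop at the first Hypothesis not passed
        return True                           # last Hypothesis passed -> chain passes

# Toy menu: one L2 chain seeded by an L1 item; the event is kept if any chain passes.
l2_em12 = Chain("L2_em12", seed="L1_EM10",
                steps=[(lambda ev: ev["cluster_et"], lambda et: et > 12.0)])
menu = [l2_em12]

event = {"cluster_et": 15.0}
print(any(chain.run(event, fired_seeds={"L1_EM10"}) for chain in menu))   # True
```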
16 Nov 2010 ATLAS TriggerDAQ 35
Execution of a Trigger Chain
[Diagram: an EM RoI found at Level 1 seeds a Level-2 chain – L2 calorimetry (cluster?), L2 tracking (track?), cluster-track match? – which in turn seeds the Event Filter chain – EF calorimetry, EF tracking, e/γ reconstruction (e/γ OK?)]
• Level 1: a Region of Interest is found and its position in the EM calorimeter is passed to Level 2
• Level 2, seeded by Level 1: fast reconstruction algorithms; reconstruction within the RoI (electromagnetic clusters)
• Event Filter, seeded by Level 2: offline reconstruction algorithms; refined alignment and calibration
16 Nov 2010 ATLAS TriggerDAQ 36
Changes since the TDR
• Multi-core technology – the 8 GHz (and faster) CPU clocks did not appear, so we have to use multi-core CPUs. The major impact to date is how to handle the large increase in the number of applications!
• XPU racks – the initial HLT racks are connected to both the DataCollection network (for L2) and the Back-End network (for EF)
• Fewer, but larger and more performant, SFOs
• Concept of Luminosity Blocks – defines a short period (1-2 mins) in which running is stable – implemented using a tag added in the L1-CTP (sketched below)
  – Allows parts of the system to be removed/added within a run
  – Allows a synchronised change of trigger prescales within a run
• Event streaming – separate files for different event types (express, different physics streams, calibration, debug, etc.)
• Partial Event Building added – for greater flexibility in calibrations
• Better scaling by more proxies (e.g. for databases, Information Servers, gathering of histograms)
• Better monitoring and configuration tools
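A minimal sketch of how per-luminosity-block prescale sets give a synchronised change within a run (Python; the trigger names, block numbers and the lookup function are illustrative, not the real implementation):

```python
# Toy sketch of synchronised prescale changes at luminosity-block boundaries
# (illustrative; the real system distributes the change via a tag in the L1-CTP).

# Prescale sets keyed by the luminosity block at which they take effect.
prescale_sets = {
    0:   {"L1_MBTS_1": 1,    "L1_EM10": 1},   # start of run, low luminosity
    120: {"L1_MBTS_1": 1000, "L1_EM10": 1},   # later: prescale the min-bias trigger away
}

def prescales_for_block(lumi_block):
    """Every node applies the most recent set starting at or before this block,
    so the change happens for everyone at the same luminosity-block boundary."""
    start = max(lb for lb in prescale_sets if lb <= lumi_block)
    return prescale_sets[start]

print(prescales_for_block(30))    # {'L1_MBTS_1': 1, 'L1_EM10': 1}
print(prescales_for_block(150))   # {'L1_MBTS_1': 1000, 'L1_EM10': 1}
```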
16 Nov 2010 ATLAS TriggerDAQ 37
ATLAS TDAQ System
[Diagram: underground (UX15/USA15), ~90M detector channels (~98% fully operational in 2010) are read out through the Read-Out Drivers (RODs) and ~1600 Read-Out Links into the Read-Out Subsystems (ROSes) [~150 nodes], alongside the Level-1 trigger, the Timing Trigger Control (TTC) and the VME-based RoI Builder [1]. On the surface (SDX1), the Level-2 supervisors, Level-2 farm [~500], DataFlow Manager, Event Builder SubFarm Inputs (SFIs) [~100], Event Filter (EF) farm [~1600] and Local Storage SubFarm Outputs (SFOs) [~5] exchange event data requests, requested event data and delete commands; accepted events are sent to data storage at the CERN computer centre. Control, configuration and monitoring use file servers [70] and further nodes [48, 26] connected by network switches [4]; the control, configuration and monitoring network is not shown. Square brackets give approximate node counts.]
16 Nov 2010 ATLAS TriggerDAQ 38
Other Components/Issues
• No time to include many other important aspects
• In particular:
  – Run Control
  – Monitoring – of the infrastructure and of the data
  – Configuration – of the many, O(10K), applications
  – Error reporting
  – Use of databases
  – Calibration runs
  – Scaling of all of the above
16 Nov 2010 ATLAS TriggerDAQ 39
Status in 2010 Running
16 Nov 2010 ATLAS TriggerDAQ 40
TDAQ Farm Status

Component            Installed  Comments
Online & Monitoring  100%       ~60 nodes
ROSes                100%       ~150 nodes
ROIB & L2SVs         100%
HLT (L2+EF)          ~50%       ~800 XPU nodes; ~300 EF nodes
Event Builder        100%       ~60 nodes (exploiting multi-core)
SFO                  100%       Headroom for high instantaneous throughput
Networking           100%       Redundancy deployed in critical areas

27 XPU racks, ~800 XPU nodes. XPU = L2 or EF Processing Unit: can be configured to run either as L2 or as EF on a "run by run" basis. The possibility to move processing power between L2 and the EF allows high flexibility to meet the trigger needs.
16 Nov 2010 ATLAS TriggerDAQ 41
DataFlow rates Achieved
Output from ROS
• Have exceeded by a good margin the TDR specification of 20 kHz L2 data requests together with EB requests at 3 kHz
Event Building
• Have sustained data rates of well over 4.5 GB/s for a wide range of event sizes (100 kB to 10 MB)
  – An EB test with 1.3 MB events achieved 9 GB/s
SFO Output
• Have sustained running at well over 1 GB/s
  – cf. 300 MB/s in the TDR
• Output to the Computer Centre runs at up to ~900 MB/s
16 Nov 2010 ATLAS TriggerDAQ 42
ATLAS Run Efficiency
ATLAS efficiency at stable beams at √s = 7 TeV (not luminosity weighted):
• Run Efficiency 96.5% (green): fraction of the time in which ATLAS is recording data while the LHC is delivering stable beams
• Run Efficiency Ready 93% (grey): fraction of the time in which ATLAS is recording physics data with the innermost detectors at nominal voltages (safety aspect)
Key functionality for maximizing efficiency:
• Data taking starts at the beginning of the LHC fill
• Stop-less removal/recovery: automated removal/recovery of channels which stopped the trigger
• Dynamic resynchronization: automated procedure to resynchronize channels which lost synchronization with the LHC clock, without stopping the trigger
752.7 h of stable beams (March 30th - Oct 11th)
16 Nov 2010 ATLAS TriggerDAQ 43
Trigger Menu and Configuration
Trigger menu:
• Collection of trigger signatures
• ≈200-500 algorithm chains in current menus
• Algorithms re-used in many chains
• Selections dictated by the ATLAS physics programme
• Includes calibration & monitoring chains
Configuration infrastructure:
• Very flexible!
• Pre-scale factors employed to change the menu while running
  – At a change of Lumi Block
• Adapts to the changing LHC luminosity
16 Nov 2010 ATLAS TriggerDAQ 44
Trigger Commissioning
• Initial timing-in of L1 with cosmics + single beams
• First collisions: L1 only
• Since June: gradual activation of the HLT
16 Nov 2010 ATLAS TriggerDAQ 45
Beam spot monitoring in L2
• Example of using the flexibility and spare capacity in the system
• Fits the primary vertex using tracks reconstructed from Inner Detector data
• Does not use RoIs, so limited to a few kHz – because of the ROS request limit
• A very useful diagnostic during 2010 for LHC tuning
16 Nov 2010 ATLAS TriggerDAQ 46
Evolution of LHC Luminosity in 2010
16 Nov 2010 ATLAS TriggerDAQ 47
Total Integrated Luminosity in 2010
16 Nov 2010 ATLAS TriggerDAQ 48
16 Nov 2010 ATLAS TriggerDAQ 49
Summary of 2010 p-p running
• The ATLAS TDAQ system has operated beyond its design requirements
  – meeting changing needs for trigger commissioning, understanding the detector and accelerator, and delivering physics results
• A robust and flexible system
  – thanks to years of planning, prototyping, commissioning and dedicated work by many people
• The data recording farm is regularly used well beyond its design specifications
• The system (dataflow and trigger) has successfully handled running with luminosity spanning 5 orders of magnitude
• There is space for the trigger to evolve, and the selections will continue to be optimized for even higher luminosities
• There are enough HLT nodes to meet the full EB rate and the present luminosity
  – If needed, more CPU power will be installed in 2011
• High Run Efficiency for Physics of 93%
• Ready for running in 2011
16 Nov 2010 ATLAS TriggerDAQ 50
Moved on to 4 weeks of Heavy Ion running
16 Nov 2010 ATLAS TriggerDAQ 51
16 Nov 2010 ATLAS TriggerDAQ 52
Future Challenges
16 Nov 2010 ATLAS TriggerDAQ 53
Future Challenges – Up to Design Luminosity
• Small increase in event size
  – Still dominated by the non-zero-suppressed LAr Calo
• Need higher HLT rejection (x~4)
  – L1 rate increased x~2, SFO rate reduced by x~2
• Possible limits
  – HLT CPU power
    • Add up to ~1200 more EF nodes
    • Also increase L2 – reduce the number of XPUs used for EF
  – ROS request rate
    • Some requests to go beyond the TDR performance, to allow more non-RoI usage in L2 (e.g. ID full-scan and missing ET)
• Possible improvements (with little change in network b/w):
  – Optimise the details of the detector mapping into the ROSs
  – Further optimisation of the ROS s/w and/or the L2 data requests
  – Update the ROS PCs
  – Add the possibility of fetching pre-processed data (e.g. energy sums)
16 Nov 2010 ATLAS TriggerDAQ 54
Future Challenges – Beyond Design Luminosity
• Improve L1:
  – Use topology information and finer-granularity data
  – Add a L1 Track Trigger
  – i.e. move some algorithms from L2 to L1
• Seems likely that:
  – L1 accept rate will still be ~100 kHz (detector power)
  – SFO event rate will still be ~200 Hz (off-line capacity)
  – But with much bigger events – several MB
• DataFlow b/w and HLT processing will need to increase
  – By at least the factor of the event-size increase, possibly more
  – DataFlow – more full-scans and/or a higher EB rate
  – HLT processing – more complex events, and some algorithms "stolen" by L1
• May be able to offset these by using pre-processed data
  – E.g. in the ROD, in the ROS, or a fast h/w Track Finder (FTK)
• Technology improvements will help
  – faster networks, faster CPUs, GPUs?, ...
16 Nov 2010 ATLAS TriggerDAQ 55
Some likely Issues
• Scaling with the number of applications – likely to increase even more (currently > 10,000)
  – At least until we find a way to use parallel threads
    • a problem also for off-line, but they may find a solution that is not compatible with the on-line environment!
  – Need to avoid this causing problems (delays) in: state transitions; configuration; gathering monitoring data (especially histograms)
• Load balancing – heterogeneous farms, events in the tails of the processing-time distribution, L2 vs EF
• Bandwidth balancing – the mapping of detectors to RODs/ROLs/ROSs balanced the b/w load for design luminosity, but should be revisited
16 Nov 2010 ATLAS TriggerDAQ 56
Outlook
• Studies underway for a DataFlow evolution prototype
  – Combine L2, EB and EF in the same processor
    • but still use RoIs for L2
  – Addresses various issues, including load balancing and some aspects of scaling
• For the intermediate term – may need to revise the specification for ROS requests and then investigate how to meet it (retaining the current RODs and ROLs)
• For running in the longer term (>2020) – revisit the whole architecture of the DAQ/HLT
16 Nov 2010 ATLAS TriggerDAQ 57
Summary
• Have described:
  – The ATLAS TDAQ system and how, over a number of years, the architecture and implementation planning evolved to a system based almost entirely on commodity hardware
  – How it uses the RoI principle to guide data flow across the networks as well as data processing in the HLT
  – The status of the system today and the performance achieved in 2010
    • DataFlow rates at or beyond the TDR specifications and >93% efficiency
    • A flexible HLT system running and many algorithms deployed
  – Briefly, the challenges for the future: to reach design luminosity and beyond
16 Nov 2010 ATLAS TriggerDAQ 58
Backup Slides
16 Nov 2010 ATLAS TriggerDAQ 59
Pixel: 10x100μm; 80 M channels Strips: 80μm; 6 M channels
160000 channels
Beam Pickup: at ±175 m from ATLAS. Triggers on filled bunches and provides the reference timing.
Minbias Trigger Scintillator: 32 sectors on the LAr cryostats. Main trigger for initial running; coverage 2.1 < |η| < 3.8.
16 Nov 2010 ATLAS TriggerDAQ 60
Detector ROD’s (Read-Out Drivers)
• Subdetector specific designs• Collects and processes data (no event selection) • Built as VME modules
– DSP’s, FPGA’s or ASIC’s for processing
– E.g. LArg ROD’s use DSPs to calculate
• Energy, fit quality, …
• Output via standard Read-Out Link (ROL) – 160 MByte/s optical fiber to
Read-Out System
16 Nov 2010 ATLAS TriggerDAQ 61
ROS - Read-Out System
• ROBin boards
  – 64-bit x 66 MHz PCI card with buffer memory and hardware-assisted memory control
  – FPGA and embedded PPC
  – Receives data from up to 3 ROLs
  – Stores data during the LVL2 decision time
• ROS PCs
  – Standard PC servers
  – Up to 4 ROBins per ROS PC
  – Receive data requests and send responses via GbE
16 Nov 2010 ATLAS TriggerDAQ 62
Final ROBIN: ~600 used in the full read-out
[Photograph with numbered callouts (1)-(5)]
• 3 Read-Out Link channels (1) (200 MByte/s per channel), 64 MByte buffer memory per ROL, electrical Gigabit Ethernet (2), PowerPC processor (466 MHz) (3), 128 MByte program and data memory, Xilinx XC2V2000 FPGA (4), 66 MHz PCI-64 interface (5)
• 12-layer PCB, 220 x 106 mm
• Surface Mounted Devices on both sides
• Power consumption ~15 W (operational)
16 Nov 2010 ATLAS TriggerDAQ 63
MDT ROD’s, LArg ROD’s + ROS’s
16 Nov 2010 ATLAS TriggerDAQ 64
ATLAS DAQ/HLT Hardware
HLT racks on the right; Online services + Event Builder + SFO on the left
16 Nov 2010 ATLAS TriggerDAQ 65
TDAQ 2010
[Diagram of the 2010 operating points: 1.5 MB events at 150 ns bunch spacing; collision rate ~1 MHz; ~20 kHz L1 accepts (~30 GB/s into the read-out system); ~3.5 kHz after L2 (~40 ms average processing time), giving ~5.5 GB/s event building; ~350 Hz after the EF (~300 ms average processing time), giving ~550 MB/s to storage; a high-rate test with random triggers reached ~65 kHz]
16 Nov 2010 ATLAS TriggerDAQ 66
Minimum Bias Trigger
• Soft QCD studies
• Provides a control trigger on p-p collisions; discriminates against beam-related backgrounds (using the signal time)
• Minimum Bias Trigger Scintillators (MBTS) installed in each end-cap
  – 32 sectors on the LAr cryostats; main trigger for initial running; coverage 2.1 < |η| < 3.8
• Example: MBTS_1 – at least 1 hit in the MBTS
• Also the number of hits in the Inner Detector
[Plot: rates compared with the LHC collision rate for nb = 2 and nb = 4 bunches]
16 Nov 2010 ATLAS TriggerDAQ 67
Muon Trigger
(~80% acceptance due to support structures etc.; JINST 3 (2008) S08003)
• Low pT: J/ψ, Υ and B-physics
• High pT: H/Z/W/τ → μ, SUSY, exotics
• Level 1: look for coincidence hits in the muon trigger chambers
  – Resistive Plate Chambers (barrel) and Thin Gap Chambers (endcap)
  – pT resolved from the coincidence hits in a look-up table
• Level 2: refine the Level 1 candidate with precision hits from the Muon Drift Tubes (MDT) and combine with an inner detector track
• Event Filter: use offline algorithms and precision; a complementary algorithm does inside-out tracking and muon reconstruction
16 Nov 2010 ATLAS TriggerDAQ 68
• Stand-alone: muons reconstructed from Muon Spectrometer information only
  – L2 efficiency > 98% w.r.t. L1 for muons with pT > 4 GeV
  – Good agreement with simulation
• Combined: muons reconstructed from a Muon Spectrometer segment combined with an Inner Detector track
  – Sharp turn-on and high efficiency
  – Good agreement with simulation
• An alternative inside-out algorithm is also used in the Event Filter
[Plots: stand-alone Level 2 efficiency w.r.t. Level 1, pT > 4 GeV; combined Event Filter efficiency w.r.t. Level 2, pT > 4 GeV]
16 Nov 2010 ATLAS TriggerDAQ 69
e/γ Trigger
• pT ≈ 3-20 GeV: b/c/tau decays, SUSY
• pT ≈ 20-100 GeV: W/Z/top/Higgs
• pT > 100 GeV: exotics
• Level 1: local ET maximum in Δη x Δφ = 0.2 x 0.2, with a possible isolation cut
• Level 2: fast tracking and calorimeter clustering – use shower-shape variables plus track-cluster matching
• Event Filter: high-precision offline algorithms wrapped for online running
[Plot: W → eν; L1 EM trigger, pT > 5 GeV]
16 Nov 2010 ATLAS TriggerDAQ 70
Jet Trigger (ATLAS-CONF-2010-065)
• QCD multijet production, top, SUSY, generic BSM searches
• Level 1: look for a local maximum in ET in calorimeter towers of Δη x Δφ = 0.4 x 0.4 to 0.8 x 0.8
• Level 2: simplified cone clustering algorithm (3 iterations max) on calorimeter cells (a toy sketch follows below)
• Event Filter: anti-kT algorithm on calorimeter cells; currently running in transparent mode (no rejection)
• The High Level Trigger is running at the EM scale plus jet energy scale corrections at the moment
Note in preparation
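As a rough illustration of what "a simplified cone clustering algorithm (3 iterations max)" can look like, here is a toy iterative-cone sketch in Python; the cell list, cone radius and seeding are invented, and phi wrap-around is ignored. This is not the ATLAS L2 jet code.

```python
# Toy iterative-cone sketch (illustrative only; not the ATLAS L2 jet algorithm).
import math

def iterative_cone(cells, radius=0.4, max_iterations=3):
    """cells: list of (ET, eta, phi). Start from the highest-ET cell and iterate the
    ET-weighted centroid of the cells inside the cone, at most max_iterations times."""
    _, eta, phi = max(cells)                  # seed direction = highest-ET cell
    for _ in range(max_iterations):
        in_cone = [(et, e, p) for et, e, p in cells
                   if math.hypot(e - eta, p - phi) < radius]
        sum_et = sum(et for et, _, _ in in_cone)
        eta = sum(et * e for et, e, _ in in_cone) / sum_et
        phi = sum(et * p for et, _, p in in_cone) / sum_et
    return sum_et, eta, phi

toy_cells = [(25.0, 0.52, 1.10), (12.0, 0.60, 1.25), (3.0, 0.45, 0.95), (1.0, 2.0, -1.0)]
print(iterative_cone(toy_cells))   # (ET, eta, phi) of the reconstructed cone
```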
16 Nov 2010 ATLAS TriggerDAQ 71
Missing ET Trigger
• SUSY, Higgs
• Level 1: ETmiss and ΣET calculated from all calorimeter towers (sketched below)
• Level 2: only muon corrections possible
• Event Filter: re-calculate from calorimeter cells and reconstructed muons
[Plots: Level 1, 5 GeV threshold; Level 1, 20 GeV threshold]
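A minimal sketch of the tower-based calculation mentioned above (Python; the tower values are invented): missing ET is the magnitude of the negative vector sum of the tower transverse energies, and ΣET is their scalar sum.

```python
# Toy sketch of missing-ET and sum-ET from calorimeter towers (illustrative values).
import math

def missing_et(towers):
    """towers: list of (ET, phi) in GeV/radians. Returns (missing ET, sum ET)."""
    ex = -sum(et * math.cos(phi) for et, phi in towers)
    ey = -sum(et * math.sin(phi) for et, phi in towers)
    return math.hypot(ex, ey), sum(et for et, _ in towers)

toy_towers = [(30.0, 0.1), (25.0, 3.0), (10.0, 1.5)]
met, sum_et = missing_et(toy_towers)
print(f"missing ET ~{met:.1f} GeV, sum ET {sum_et:.1f} GeV")
```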
16 Nov 2010 ATLAS TriggerDAQ 72
Plans for Heavy Ion Run
• Collect ≈3μb-1 of Pb-Pb collisions at 2.76 TeV/nucleon during 4 weeks in November
• Take advantage of ATLAS capabilities– Good angular coverage– Good particle ID– Forward scintillators and Zero Degree
Calorimeters
• Trigger rate ≈ 140 Hz– σPb+Pb≈ 7.6 barn– L ≈ 1x1025cm-2s-1 (1% of design)– I.e. around 100Hz of collisions
• Use modified L1 menu only– Use as little High Level Trigger as
possible– Avoid tracking if possible (1000s of
tracks for central collisions)
Trigger       Triggers, thresholds, etc.
Minimum bias  Hits in forward scintillators, zero-degree calorimeters, luminosity detectors etc. for wide eta coverage; primary triggers for the heavy ion run
Σ ET          50, 500, 1000, 2000 GeV; centrality trigger and centrality veto to enhance peripheral collisions
Jets          Single and di-jet triggers; scalar sum of jet energy for centrality veto
EM            Single photon and electron triggers
Muons         Single muon and di-muon triggers
Tau           Single tau and di-tau triggers