
HLT Infrastructure Commissioning

Steven Robertson Institute of Particle Physics

ATLAS NSERC Review

Vancouver, B.C. November 14th, 2007


Outline

● Overview of ATLAS trigger system
● HLT hardware installation and commissioning schedule
● Trigger commissioning and technical runs
● Canadian HLT Testbed and online integration


Trigger Concept

Trigger design is motivated by the need to minimize data movement and processing time:
● Regions of Interest (ROIs) – access data only in detector regions flagged by previous trigger levels/algorithms (a minimal sketch of an ROI-based data request follows below)
● LVL1 – implemented in hardware, based on reduced-granularity detector information
● LVL2 – accesses data from individual detector Read Out Buffers (ROBs), but only in ROIs
● EF – uses event fragments from event building, with access to full detector granularity but limited by processing time
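
To make the ROI concept concrete, here is a minimal illustrative sketch (not ATLAS TDAQ code) of how a LVL2 request might restrict itself to the Read Out Buffers overlapping an ROI. The RegionOfInterest and RobInfo types, their field names and the eta/phi lookup are hypothetical stand-ins for the real region-selector and data-collection services.

// Illustrative sketch: select only the ROBs whose detector region overlaps
// the LVL1 ROI, so that LVL2 requests a small subset of the event data.
#include <cmath>
#include <cstdint>
#include <vector>

const double kPi = 3.14159265358979323846;

struct RegionOfInterest { double eta, phi, dEta, dPhi; };                        // hypothetical
struct RobInfo { std::uint32_t robId; double etaMin, etaMax, phiMin, phiMax; };  // hypothetical

// Wrap an azimuthal difference into [-pi, pi].
double deltaPhi(double a, double b) {
  double d = a - b;
  while (d > kPi)  d -= 2.0 * kPi;
  while (d < -kPi) d += 2.0 * kPi;
  return d;
}

// Return only the ROB identifiers overlapping the ROI: these are the only
// event fragments that would be requested, instead of the whole detector.
std::vector<std::uint32_t> robsInRoi(const RegionOfInterest& roi,
                                     const std::vector<RobInfo>& robMap) {
  std::vector<std::uint32_t> selected;
  for (const RobInfo& rob : robMap) {
    const double robEta = 0.5 * (rob.etaMin + rob.etaMax);
    const double robPhi = 0.5 * (rob.phiMin + rob.phiMax);
    if (std::fabs(robEta - roi.eta) < roi.dEta &&
        std::fabs(deltaPhi(robPhi, roi.phi)) < roi.dPhi) {
      selected.push_back(rob.robId);
    }
  }
  return selected;
}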


ATLAS trigger and dataflow


HLT Overview

SV: Level-2 Supervisor - controls Level-2 trigger

DFM: Data-Flow Manager - controls event building

pROS: pseudo-ROS – buffers results from Level-2 trigger


Trigger Slices

[Diagram: jet trigger slice – L1 Jet ROI → L2 (T2CaloJet, TrigL2JetHypo) → EF (TrigCaloCellMaker, TrigCaloTowerMaker, TrigJetRec, TrigJetHypo), with time running down the chain]

The trigger is organized as a series of vertical "slices" representing classes of distinct trigger signatures
● Trigger algorithms executed per ROI, controlled by the trigger steering
● Feature extraction (FEX) algorithms
● Hypothesis algorithms – apply selection

Jet slice example (a minimal sketch of the FEX + hypothesis pattern is given after this list):
Level 1:
● Sliding 0.8x0.8 window with 0.4x0.4 central cluster
Level 2:
● One FEX algorithm (calorimeter jets with RCone = 0.4)
● Hypothesis algorithm: cut on jet ET
Event Filter:
● FEX algorithm: unpack calorimeter data, construct calorimeter towers, perform jet reconstruction
● Hypothesis algorithm: cut on jet ET

(See talk by B. Vachon)
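
Below is a minimal sketch of the FEX + hypothesis pattern described above; it is not the actual T2CaloJet/TrigJetHypo code. The CaloCell and Jet structs, the simplified cone sum and the ET threshold are hypothetical placeholders: the FEX builds a cone jet (RCone = 0.4) around the ROI direction and the hypothesis algorithm simply accepts or rejects based on the reconstructed jet ET.

#include <vector>

struct CaloCell { double et, eta, phi; };  // hypothetical placeholder types
struct Jet { double et, eta, phi; };

// FEX step: build one cone jet around the ROI direction (very simplified;
// phi wrap-around is ignored for brevity).
Jet coneJetFex(const std::vector<CaloCell>& cells,
               double roiEta, double roiPhi, double rCone = 0.4) {
  Jet jet{0.0, roiEta, roiPhi};
  for (const CaloCell& c : cells) {
    const double dEta = c.eta - roiEta;
    const double dPhi = c.phi - roiPhi;
    if (dEta * dEta + dPhi * dPhi < rCone * rCone) jet.et += c.et;
  }
  return jet;
}

// Hypothesis step: accept the ROI only if the jet passes the ET threshold.
bool jetHypo(const Jet& jet, double etThreshold) {
  return jet.et > etThreshold;
}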


HLT farms

Maximum HLT input rate is fixed by the L1 hardware, output rate by ATLAS data storage constraints (i.e. doesn't scale with physics rate)
● HLT farm capacity dictates the sophistication of trigger algorithms (i.e. what data can be unpacked and how much processing can be performed)
● Full HLT: ~500 L2 PCs, ~1800 EF PCs (all multi-core)
● ~800 of these will be "EF2" PCs capable of being used either as L2 or EF nodes
● Consistent with TDR total HLT capacity, but with somewhat different hardware


HLT preseries hardware

Currently 181 EF2 PCs (~5.2 racks) installed in SDX1
● mostly 2xClovertown (quad-core) with 1GB/core RAM, but also some 2xWoodcrest (dual-core) with 2GB/core

Preseries hardware specifications and configuration are similar to those in the RTI request
● e.g. 2x1Gb/s network, cold-swappable power supplies, remote management via IPMI, etc.
● studies to evaluate need for more memory on HLT nodes (i.e. more than 1GB/core?)


Hardware acquisition schedule

A large fraction of the dataflow and online hardware has already been purchased:
● ROS, DFM, SFO, monitoring and online essentially 100%

So far, only HLT hardware specifically needed for commissioning (technical runs, detector commissioning and cosmic-ray runs) has been purchased:
● <10% of the overall HLT system
● Plan to expand to the full system in 2009-2010 as luminosity requires and funding permits
● Initial purchase for "summer 2008" to bring the total to >=36 racks for first LHC beams, but delay the purchase as late as possible and potentially stage it into "early" and "late" phases
● Full size of the system will depend on physics needs, limited by system performance, algorithm performance and timing (and of course, funding)


HLT Racks in SDX1

Empty racks (with power) already installed in SDX1
● 47U on upper floor – central switches, dataflow, online; will ultimately house EF2s for EF or L2
● 52U on lower floor (currently "preseries") will house only EF racks

Rack coolers delivered for the upper floor but only partially installed
● Cooling is the main limitation – studying options for higher performance


EF2 Racks

EF2 racks can be allocated as either L2 or EF farm resources depending on (time-varying) need

Each EF2 rack has:
● 1 x 10 GbE connection to the Dataflow network for the L2 function
● 2 x 1 GbE connections to the Backend network for the EF function
● 1 x 1 GbE connection to the control network

[Diagram: XPU (EF2) rack connectivity – L2PUs/EFPs behind data and control concentrator switches, with GE and 10 GE data links to the DataFlow and BackEnd networks and GE control links to the control network, alongside ROS PCs (ROBs), SFIs and SVs]

The RTI proposal requests funding for 10 EF2 racks with an extended purchase profile:
● maintain maximum flexibility for the HLT system; in practice the price difference between L2, EF and EF2 racks is a few %


HLT Hardware

HLT will be built from off-the-shelf computing hardware

● Late purchase to exploit price/performance improvements

Various options are being considered as they become available:

● 2006 reference hardware was 3GHz 2xWoodcrest (dual-core Intel Xeon)

● In early 2007 1.86GHz 2xClovertowns (quad-core Intel Xeon) purchased for testing

● CERN Tier-0 purchasing twin-motherboard 2xClovertown (2.33GHz)

● New AMD quad-core chips currently being evaluated

Blade solution not currently competitive with 1U servers, but this may change in the future


HLT Hardware Commissioning

TDAQ technical runs are the primary commissioning exercises for the DAQ and HLT
● Since December 2006, seven ~1 week runs
● Seven ATLAS Canada members have so far worked shifts during these runs; many others are indirectly involved via e.g. data quality and HLT algorithm development

● Specific set of objectives during each run designed to test the integration of new features, from basic run control, HLT supervision (i.e. handling of “dummy” HLT algorithms), HLT algorithm performance, overall trigger menu, data quality monitoring and data flow to mass storage


Technical runs

Use a realistic mix of simulated (L1-preselected) events injected into the TDAQ front end
● format identical to real data coming off the detector
● validate DAQ software and HLT integration in a realistic data-taking environment and with HLT "preseries" hardware
● study timing, resource usage, networking, memory leaks, stability etc

● refine “wetware” interface for run control, monitoring and data quality

● define procedures for shifters, as well as for integration of software at Point 1 (i.e. validation in online framework)

● shift handover procedures, checklists, shift logging etc


Technical Runs

Technical runs also provide the opportunity to exercise the DAQ and run control system, and in particular are (currently) the main mechanism for training of expert shifters and TDAQ operations experts


Online Monitoring

Strong coupling to Canadian HLT algorithm development and data quality activities (talks by R. Moore, B. Vachon)

Online histogram monitoring (May 2007 technical run): [screenshot]


November 2007 Run Highlights

The most recent technical run took place Nov 19-27:
● 5 racks of 31 HLT "preseries" PCs available (~7% of the full system)
● In addition to mixed Monte Carlo, specially prepared MC samples and real cosmics data (from an earlier detector commissioning run) were used for specific tests
● full 10^31 trigger menu implemented (200+ trigger chains)
● full HLT chain from L2 to CASTOR (mass storage) implemented
● up-to-date HLT and offline software (i.e. no cheating)
● integration of many software tools and functionalities:
  ● TriggerTool and ORACLE database for management of the trigger menu, prescales etc.
  ● PartitionMaker for defining the resources required for a particular data-taking configuration (previously required hand-editing of XML files by experts)
  ● Histogram monitoring – many new features in online and offline handling of data-quality histograms


Nov 2007 technical run highlights

● Several memory leaks located: ~2k/event in EF!
● "long" (60 hour) run resulted in a hang of ~200 of 960 L2PUs
● Many algorithm performance studies by individual trigger "slices", e.g. jets and missing ET:
  ● compare cell vs FEB data unpacking for the L2 jets (first time FEB unpacking done online!)
  ● compute Ex, Ey, Ez by summing the 128 cells connected to one FEB (LAr); see the sketch after this list
  ● total processing time dominated by unpacking and data preparation time
  ● initial studies indicate a factor of 2 improvement using FEB unpacking

[Plot: L2 jet processing time, cell unpacking vs FEB unpacking]
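
As a rough illustration of the FEB-level sum mentioned above, the sketch below adds the energies of the 128 LAr cells read out by one front-end board into Cartesian components Ex, Ey, Ez. This is not ATLAS code: the Cell and FebSums structures and their field names are hypothetical stand-ins for the real detector data classes.

#include <cmath>
#include <vector>

struct Cell {    // hypothetical stand-in for a LAr calorimeter cell
  double e;      // cell energy
  double theta;  // polar angle of the cell centre
  double phi;    // azimuthal angle of the cell centre
};

struct FebSums { double ex = 0.0, ey = 0.0, ez = 0.0; };

// Sum the (up to) 128 cells attached to one FEB into Ex, Ey, Ez.
FebSums sumFebCells(const std::vector<Cell>& febCells) {
  FebSums s;
  for (const Cell& c : febCells) {
    const double et = c.e * std::sin(c.theta);  // transverse projection
    s.ex += et * std::cos(c.phi);
    s.ey += et * std::sin(c.phi);
    s.ez += c.e * std::cos(c.theta);
  }
  return s;
}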


Canadian HLT Testbed

HLT testbed purchased in 2005 with Quebec FQRNT Equipe funds (grant held by Warburton, Robertson and Vachon)
● Commissioned primarily by UdeM postdocs and students

ATLAS software infrastructure and configuration identical to that used in technical runs
● can reproduce the full functionality of the HLT software environment with a single instance of each HLT element (e.g. EF PC)
● (obviously) some differences in networking etc compared with the actual HLT

Testbed hardware:
● UltraSPARC T1 2x(quad core)
● LVL2: 2.6GHz Opteron 2x(dual core)
● EF: 2.2GHz Opteron 1x(dual core)
● Gateway: 2.2GHz Opteron 1x(single core)


Canadian HLT Testbed

Used for:
● Hardware performance testing (single/dual/quad core, memory etc)
  ● nodes can be reconfigured to check e.g. EF performance on single vs dual core
● Release validation in advance of technical runs
  ● avoid wasted time during the technical run
  ● DAQ screen-shots on pages 15 & 16 were actually from the testbed, not from a technical run!
● Memory leak and stability checking
  ● possible to perform much longer runs than ~1 week
● Detailed performance checking and online integration of HLT algorithms
  ● without the time-pressure of technical runs
● ATLAS DAQ and/or data quality monitoring shifter training


Canadian HLT Testbed

Test of L2 Jet algorithm performance in a multi-threaded environment (on a 2.6GHz dual-core Opteron 285); a schematic sketch of this kind of timing scan follows the bullets below:

● Total time (per ROI) for various L2 Jet trigger thresholds and different numbers of active threads

● VBF signal Monte Carlo simulation
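
The sketch below shows, purely illustratively, how such a scan might be set up: it times a dummy per-ROI workload while varying the number of concurrent worker threads and reports the average time per ROI. It is not the ATLAS test harness; processRoi() and the workload constants are hypothetical stand-ins for the real L2 jet FEX and hypothesis code.

#include <chrono>
#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int kRoisPerThread = 2000;   // ROIs processed by each worker thread
constexpr int kCellsPerRoi   = 50000;  // size of the dummy per-ROI workload

// Dummy per-ROI workload standing in for the real L2 jet algorithm.
double processRoi(int nCells) {
  double e = 0.0;
  for (int i = 0; i < nCells; ++i) e += std::sqrt(static_cast<double>(i) + 1.0);
  return e;
}

int main() {
  for (int nThreads = 1; nThreads <= 4; ++nThreads) {
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
      workers.emplace_back([] {
        volatile double sink = 0.0;  // keep the work from being optimized away
        for (int r = 0; r < kRoisPerThread; ++r) sink = sink + processRoi(kCellsPerRoi);
      });
    }
    for (auto& w : workers) w.join();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    std::printf("%d thread(s): %.4f ms per ROI\n", nThreads,
                elapsed.count() / (nThreads * kRoisPerThread));
  }
  return 0;
}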


Online integration

HLT algorithm development typically takes place within the offline software framework, but algorithms must run online...
● Time optimization, use of multi-threading
● Services, e.g. online database
● Online histogram monitoring (in addition to offline monitoring)

[Plot: total processing time, old vs new code, on the Canadian HLT Testbed (2.6GHz Opteron 285) and in the March 2007 Technical Run]

Factor of 2 improvement in total processing time by replacing "pow(x,2)" by "x*x" and removing unnecessary dereferencing (illustrated in the sketch below)
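
As an illustration of that kind of micro-optimization (the surrounding types and loop are hypothetical, not the actual HLT code): replacing pow(x,2) with x*x avoids a general-purpose library call for a simple square, and iterating over the container directly removes redundant dereferencing in the hot loop.

#include <cmath>
#include <cstddef>
#include <vector>

struct Cell { double ex, ey; };  // hypothetical placeholder type

// Before: library call and repeated dereferencing inside the loop.
double sumE2_old(const std::vector<Cell>* cells) {
  double sum = 0.0;
  for (std::size_t i = 0; i < cells->size(); ++i) {
    sum += std::pow((*cells)[i].ex, 2) + std::pow((*cells)[i].ey, 2);
  }
  return sum;
}

// After: x*x instead of pow(x,2), and a single dereference per iteration.
double sumE2_new(const std::vector<Cell>* cells) {
  double sum = 0.0;
  for (const Cell& c : *cells) {
    sum += c.ex * c.ex + c.ey * c.ey;
  }
  return sum;
}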


Conclusion

Canadians are playing an active and increasing role in HLT commissioning activities
● This is a natural progression of HLT algorithm development, data quality monitoring, online integration and HLT testbed activities as ATLAS moves towards data taking
● There is Canadian leadership in several important areas of HLT development which are coupled to online operations, but it is desirable to extend the scope of these activities to encompass online operational responsibilities

The HLT RTI request represents a hardware contribution to ATLAS which is consistent with our ongoing Canadian HLT activities and which would help confirm our central role in HLT online operations


Control network

Core online control network:
● Run control
● Online databases
● Monitoring traffic

[Diagram: core online control network – Core 1/Core 2 switches with network root (to ATCN) and management servers, connected via GE and 10 GE control links and concentrator switches to 4 online racks, 4 SFI racks, 57 EF racks, 3 SFO racks, 17 LVL2 racks and 1 DC control rack, plus the DataFlow and BackEnd networks and the SFO switch]


Back end network

SFI to EF traffic, and data output to mass storage via SFOs

[Diagram: back-end network – ~100 SFIs and 57 EF racks (EFs behind EF concentrator switches) around a central switch, with ~30 SFOs behind an SFO concentrator sending data to mass storage; GE and 10 GE data links, GE control links, and connections to the DataFlow and Control networks]


Data Collection network

● LVL2 traffic
● Event Builder traffic

[Diagram: data collection network – ~160 ROSs (ROS PCs with ROBs) behind ~20 ROS concentrator switches, connected to two central switches (Central 1 and Central 2, redundant paths A and B) serving 17 L2PU racks (~550 L2PUs behind L2PU concentrators), ~100 SFIs and the SVs, with links to the BackEnd and Control networks; GE and 10 GE data links, GE control links]


ATLAS Trigger System

Yet another representation of the trigger: [diagram]


Benchmarking: AMD


Benchmarking: Intel

