BTeV was terminated in February of 2005.
BTeV Trigger
BTeV Trigger Overview
• Trigger philosophy: trigger on a characteristic common to all heavy-quark decays: separated production and decay vertices.
• Aim: reject > 99.9% of background; keep > 50% of B events.
• The challenge for the BTeV trigger and data acquisition system is to reconstruct particle tracks and interaction vertices in every beam crossing, looking for topological evidence of a B (or D) decay.
• This is feasible for the BTeV detector and trigger system because of:
– Pixel detector: low occupancy, excellent spatial resolution, fast readout
– Heavily pipelined and parallel architecture (~5000 processors)
– Sufficient memory to buffer events while awaiting the trigger decision
– Rapid developments in technology: FPGAs, processors, networking
• 3 levels:
– L1 vertex trigger (pixels only) + L1 muon trigger
– L2 vertex trigger: refined tracking and vertexing
– L3: full event reconstruction, data compression
BTeV detector
(Figure: the BTeV detector at the pp interaction region, with the 30-station Si pixel detector. Each multichip module (~5 cm x 1 cm) carries Si pixel sensors and 5 FPIX ROCs; each ROC reads 128 rows x 22 columns, and each sensor has 14,080 pixels (128 rows x 110 columns) of 50 µm x 400 µm.)
L1 vertex trigger algorithm
• Segment finder (pattern recognition)
– Find beginning and ending segments of tracks from hit clusters in 3 adjacent stations (triplets):
• beginning segments: required to originate from the beam region
• ending segments: required to project out of the pixel detector volume
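The triplet idea can be sketched in software. This is a toy model: the station positions, line tolerance, and beam-region half-width below are invented for illustration and are not BTeV firmware parameters.

```python
# Toy sketch of the "beginning segment" search: find hit triplets in
# three adjacent stations that line up as a straight track and whose
# projection back to z = 0 falls inside the beam region.
from itertools import product

def find_beginning_segments(hits, z, beam_half_width=0.5, tol=0.1):
    """hits: three lists of transverse hit coordinates, one per station;
    z: the three station z positions. Returns (x1, x2, x3) triplets."""
    segments = []
    for x1, x2, x3 in product(*hits):
        slope = (x2 - x1) / (z[1] - z[0])
        # straight-line check: the station-3 hit must sit on the 1-2 line
        if abs(x1 + slope * (z[2] - z[0]) - x3) > tol:
            continue
        # beginning segment: projects back into the beam region at z = 0
        if abs(x1 - slope * z[0]) <= beam_half_width:
            segments.append((x1, x2, x3))
    return segments

hits = [[0.9, 5.0], [1.8, 7.0], [2.7, 4.0]]   # toy hits in 3 stations
print(find_beginning_segments(hits, z=[10.0, 20.0, 30.0]))
```

Ending segments would use the same triplet test with the opposite boundary condition (projecting out of the detector volume instead of into the beam region).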
• Tracking and vertex finding
– Match beginning and ending segments found by the FPGA segment finder to form complete tracks.
– Reconstruct primary interaction vertices using complete tracks with pT < 1.2 GeV/c.
– Find tracks that are "detached" from the reconstructed primaries.
(Figure: a pp collision producing a B meson, with the b-decay vertex detached from the primary vertex.)
• Trigger decision
– Generate a Level-1 accept if there are two "detached" tracks going into the instrumented arm of the BTeV detector.
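The decision step reduces to counting detached tracks. A minimal sketch; the 0.15 impact-parameter cut is an invented placeholder (the real trigger cut on detachment significance of reconstructed 3-D tracks):

```python
def level1_accept(impact_params, ip_cut=0.15, n_required=2):
    """Accept the crossing if at least n_required tracks have an
    impact parameter (w.r.t. the primary vertex) above the cut."""
    detached = sum(1 for b in impact_params if abs(b) > ip_cut)
    return detached >= n_required

# two tracks (0.30, 0.22) are detached -> Level-1 accept
print(level1_accept([0.01, 0.30, 0.02, 0.22]))   # True
```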
BTeV trigger overview
(Block diagram: front-end electronics of the BTeV detector (PIX: > 2 x 10^7 channels) feed the Level-1 buffers and the L1 vertex and L1 muon triggers, whose results go to Global Level-1 (GL1). The Information Transfer Control Hardware (ITCH) signals RDY and, on a GL1 accept, requests the data for crossing #N through the Level 2/3 crossing switch into the Level-2/3 buffers and processor farm (#1 ... #m); a Level-3 accept sends the event to data logging.)
Data rates:
• Into L1: 2.5 MHz crossing rate, 500 GB/s (200 KB/event)
• L1 rate reduction ~50x: 50 kHz, 12.5 GB/s (250 KB/event)
• L2/3 rate reduction ~20x: 2.5 kHz, 200 MB/s (250 KB / 3.125 = 80 KB/event)
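The quoted rates and bandwidths are mutually consistent, as a quick check shows:

```python
crossing_rate = 2.5e6                      # 2.5 MHz beam-crossing rate
event_size = 200e3                         # 200 KB/event at the front end
print(crossing_rate * event_size / 1e9)    # 500.0 GB/s into the L1 buffers

l1_rate = crossing_rate / 50               # ~50x L1 rejection
l23_rate = l1_rate / 20                    # ~20x L2/3 rejection
print(l1_rate / 1e3, l23_rate / 1e3)       # 50.0 kHz, 2.5 kHz

compressed = 250e3 / 3.125                 # 250 KB/event compressed to 80 KB
print(l23_rate * compressed / 1e6)         # 200.0 MB/s to data logging
```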
Level 1 vertex trigger architecture
(Dataflow: 30 pixel stations -> pixel pre-processors -> FPGA segment finders -> switch (sort by crossing number) -> ~2500-node track/vertex farm -> MERGE -> to Global Level-1 (GL1).)
Pixel preprocessor
• Collision hall: pixel stations with FPIX2 read-out chips feed data combiner boards (DCBs). Each hit carries row (7 bits), column (5 bits), BCO (8 bits), and ADC (3 bits) plus a sync bit, sent over optical links to the counting room.
• Counting room: pixel processors receive the links. Stages: optical receiver interface -> time stamp expansion -> events sorted by time and column -> hit cluster finder & x-y coordinate translator -> Level 1 buffer interface -> FPGA segment finder (connected to the neighboring FPGA segment finders).
The Segment Tracker Architecture
(Diagram: bend-view hits from stations N-1, N, and N+1 form short doublets; a MUX combines them into long doublets and then triplets. Long doublet and triplet projections are checked against the nonbend views of stations N-1, N, and N+1. Outputs: short doublets and confirmed ("BB33") triplets.)
• Interior and exterior track segments are found in parallel in FPGAs.
• The segment finder algorithm is implemented in VHDL.
(Figure: bend and nonbend views of stations 15-17, i.e. 12 half pixel planes at 12 different z locations.)
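The two-view confirmation implied by the diagram can be sketched as follows; the window size and coordinates are invented for illustration. A bend-view triplet survives only if each of the three stations also has a nonbend-view hit near the triplet's projection:

```python
def confirm_in_nonbend(triplet_y, nonbend_hits, window=0.2):
    """triplet_y: projected nonbend coordinates in stations N-1, N, N+1;
    nonbend_hits: actual nonbend-view hits per station."""
    return all(
        any(abs(y - hit) < window for hit in hits)
        for y, hits in zip(triplet_y, nonbend_hits)
    )

print(confirm_in_nonbend([1.0, 1.1, 1.2], [[0.98], [1.12, 3.0], [1.19]]))  # True
print(confirm_in_nonbend([1.0, 1.1, 1.2], [[0.98], [3.0], [1.19]]))        # False
```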
L1 Track and Vertex Farm
• The original baseline of the L1 track and vertex farm used custom-made processor boards based on DSPs or other processors; the total was estimated at 2500 TI C6711 DSPs. The L1 switch was also custom designed.
• After the DOE CD1 review, BTeV changed the L1 baseline design:
– L1 switch: commercial off-the-shelf Infiniband switch (or equivalent).
– L1 farm: array of commodity general-purpose processors, Apple G5 Xserves (or equivalent).
Level 1 Trigger Architecture (New Baseline)
(Diagram, one highway shown: 30 pixel stations -> pixel processors -> FPGA segment finders (56 per highway) -> 56 inputs at ~45 MB/s each into the Level 1 switch (Infiniband) -> 33 outputs at ~76 MB/s each into the track/vertex farm: 33 "8GHz" Apple Xserve G5s with dual IBM 970s (Trk/Vtx nodes #1..#N). Results go to Global Level 1, an Apple Xserve identical to the track/vertex nodes, and to the Level 1 buffer; a PTSM Ethernet network manages the nodes.)
R&D projects
• Software development for DSP Pre-prototype.• Level 1 trigger algorithm processing time studies on
various processors.– Part of trigger system R&D for a custom-made Level
1 trigger computing farm.• StarFabric Switch test and bandwidth measurement.
– R&D for new Level 1 Trigger system baseline design.– After DOE CD1 review, BTeV collaboration decided to
change baseline design of Level 1 trigger system. • L1 Switch – replace custom switch with Infiniband switch(or
equivalent).• L1 Farm – replace DSP hardware with Apple G5 Xserves (or
equivalent).
• Pixel Preprocessor of Level 1 trigger system.– Clustering algorithm and firmware development.
DSP Pre-prototype main goals
• Investigate current DSP hardware and software to determine technical choices for the baseline design.
• Study I/O data-flow strategies.
• Study control and monitoring techniques.
• Study FPGA firmware algorithms and simulation tools.
– Understand the major blocks needed.
– Estimate logic size and achievable data bandwidths.
• Measure internal data-transfer rates, latencies, and software overheads between processing nodes.
• Provide a platform to run DSP fault-tolerant routines.
• Provide a platform to run trigger algorithms.
Features of DSP Pre-prototype Board
• Four DSP mezzanine cards on the board, allowing different TI DSPs to be tested for comparison.
• An FPGA data I/O manager provides two-way data buffering and connects the PCI Test Adapter (PTA) card to each DSP.
• Two ArcNet network ports:
– Port I is the PTSM (Pixel Trigger Supervisor Monitor) port.
– Port II is the Global Level 1 result port.
– Each network port is managed by a Hitachi microcontroller.
– The PTSM microcontroller communicates with the DSPs via the DSP Host Port Interface for initialization and commands.
– The GL1 microcontroller receives trigger results via the DSPs' Buffered Serial Port (BSP).
• Compact Flash card to store DSP software and parameters.
• Multiple JTAG ports for debugging and initial startup.
• Operator LEDs.
L1 trigger 4-DSP prototype board
(Board diagram, dated 9/01: four DSPs with RAM and ROM; LVDS link interfaces; input and output buffers with FPGA buffer-control managers; Hitachi H8 controllers with ArcNet interfaces to the Pixel Trigger Supervisor Monitor (PTSM) and, via the DSPs' McBSP, trigger decisions to Global Level-1; Host Port Interface, JTAG, and FLASH RAM; PCI Test Adapter connection.)
Level 1 Pixel Trigger Test Stand for the DSP pre-prototype
(Photo: Xilinx programming cable, PTA+PMC card, ARCnet card, TI DSP JTAG emulator, DSP daughter card.)
DSP Pre-prototype Software (1)
• PTSM task on the Hitachi PTSM microcontroller.
– System initialization; kernel and DSP application downloading.
– Command parsing and distribution to subsystems.
– Error handling and reporting.
– Hardware and software status reporting.
– Diagnostics and testing functions.
• GL1 task on the Hitachi GL1 microcontroller.
– Receives the trigger results from the DSPs and sends them to the GL1 host computer.
• Hitachi microcontroller API: a library of low-level C routines has been developed to support many low-level functions.
– ArcNet network driver.
– Compact Flash API; supports the FAT16 file system.
– LCD API: display messages on the on-board LCD.
– Serial port API.
– JTAG API.
– One-Wire API.
– DSP interface API: boot and reset the DSPs; access memory and registers on the DSPs.
DSP Pre-prototype Software (2)
• Host computer software.
– PTSM menu-driven interface.
– GL1 message receiving and displaying.
• Custom protocol built directly on the lowest level of the ArcNet network driver; most efficient, with no standard-protocol overhead.
Processor evaluation
• We continued to measure the Level 1 trigger algorithm processing time on various new processors.
• MIPS RM9000x2 processor, Jaguar-ATX evaluation board.
– Time studies on Linux 2.4.
– Time studies standalone, with the MIPS SDE Lite 5.03.06 compiler.
– System (Linux) overhead on the processing time is about 14%.
• PowerPC 7447 (G4) and PowerPC 8540 PowerQUICC III.
– GDA Tech PMC8540 eval card and Motorola Sandpoint eval board with PMC7447A.
– Green Hills Multi 2000 IDE with a Green Hills probe for standalone testing.
(Photos: Green Hills probe; 8540 eval board.)
Candidate processors for Level 1 Farm

Processor                     L1 algorithm processing time
TI TMS320C6711 (baseline)     1,571 us (provided for comparison)
PMC Sierra MIPS RM9000x2      341 us (600 MHz, MIPS SDE Lite 5.03.06)
Motorola 8540 PQIII PPC       271 us (660 MHz, GHS MULTI 2K 4.01)
Motorola 74xx G4 PPC          195 us (1 GHz 7455, Apple PowerMac G4)
Motorola 7447A G4 PPC         121 us (1.4 GHz, GHS MULTI 2K 4.01)
Intel Pentium 4/Xeon          117 us (2.4 GHz Xeon)
IBM 970 PPC                   74 us (2.0 GHz Apple PowerMac G5)

These times are well suited for an off-the-shelf solution using desktop PCs (or G5 servers) for the computing farm.
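A back-of-envelope sizing from these timings. This ignores I/O, scheduling overhead, and the occupancy assumptions behind the official 2500-DSP estimate, so it gives only rough lower bounds on CPU counts:

```python
crossing_rate = 2.5e6        # crossings/s entering the L1 farm
timings_us = {
    "TI TMS320C6711 (baseline)": 1571,
    "PMC Sierra MIPS RM9000x2": 341,
    "Intel Xeon 2.4 GHz": 117,
    "IBM 970 (G5) 2.0 GHz": 74,
}
for name, t in timings_us.items():
    # each CPU handles 1/t crossings per second, so the farm needs
    # at least crossing_rate * t CPUs to keep up
    print(f"{name}: >= {crossing_rate * t * 1e-6:.0f} CPUs")
```

For the G5 this gives at least ~185 CPUs for the full crossing rate, which the 33 dual-CPU Xserves per highway comfortably exceed once every highway is counted, consistent with the excess-capacity goals.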
StarFabric Switch Testing and Bandwidth Measurement
• In the new baseline design of the BTeV Level 1 trigger system, a commercial, off-the-shelf switch will be used for the event builder.
• Two commercial switch technologies were tested: Infiniband (by Fermilab) and StarFabric (by the IIT group with Fermilab).
• Hardware setup for StarFabric switch testing:
– PCs with PCI bus 32/33.
– StarFabric adapter, StarGen 2010.
– StarFabric switch, StarGen 1010.
• Software:
– StarFabric Windows driver.
(Test stand: P4/W2k PC (SG2010 adapter, PCI 32/33) <-> SG1010 switch <-> Athlon/XP PC (SG2010 adapter, PCI 32/33).)
L1 Switch Bandwidth Measurement
• The StarFabric bandwidth is 74-84 MB/s for packet sizes of 1 KB to 8 KB. This cannot meet the bandwidth requirement of the event builder: a 167 MB/s target at peak luminosity (<6> interactions/BCO), with 50% excess capacity.
• A simple way to improve performance is to use PCI-X (32/66 or 64/66). The Infiniband test stand uses PCI-X adapters in the input/output computer nodes.
• Based on this result and other considerations, Infiniband was chosen for the new baseline design of the Level 1 trigger system. But we are still looking at StarFabric and other possible switch fabrics.
(Plot: measured bandwidth vs. packet size for Infiniband and StarFabric, with the 167 MB/s bandwidth target line.)
Pixel Preprocessor
(Diagram: the 30-station pixel detector front-end feeds the pixel preprocessor stages: optical receiver interface -> time stamp expansion -> events sorted by time and column -> hit cluster finder & x-y coordinate translator -> Level 1 buffer interface to the DAQ, and on to the segment trackers. The pixel preprocessor & segment tracker (PP&ST) nodes drive 56 inputs at ~45 MB/s each into the Level 1 Infiniband switch, which has 33 outputs at ~76 MB/s each.)
Row and Column Clustering
• A track can hit more than one pixel due to charge sharing.
• One function of the pixel preprocessor is to find adjacent pixel hits, group them into a cluster, and calculate the x-y coordinates of the cluster.
• Hits in adjacent rows of the same column form a row cluster.
• Two overlapping row clusters in adjacent columns form a cross-column cluster.
(Figure: pixel chip with example hit clusters.)
Cluster Finder Block Diagram
• The input hits arrive in a defined row order; the column order, however, is not defined.
• The hash sorter is used to produce a defined column order.
• The row cluster processor identifies adjacent hits and passes each cluster's starting/ending row numbers to the next stage.
• The cross-column processor groups overlapping hits (or clusters) in adjacent columns together.
• Cluster parameters are computed in the cluster parameter calculator.
(Block diagram: hit input -> FIFO -> hash sorter (column ordering) -> row cluster processor (cross-row clusters) -> cross-column processor (cross-column clusters, columns N-1 and N) -> cluster parameter calculator -> clusters.)
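The row-cluster and cross-column steps can be modeled in software. This is a behavioral sketch, not the firmware; it reads a "row cluster" as a run of consecutive hit rows within one column, as the starting/ending row numbers suggest:

```python
def row_clusters(rows):
    """Sorted hit rows in one column -> list of (row_start, row_end)
    runs of consecutive rows."""
    runs = []
    for r in rows:
        if runs and r == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], r)   # extend the current run
        else:
            runs.append((r, r))           # start a new run
    return runs

def overlaps(a, b):
    """Do two (start, end) row ranges overlap?"""
    return a[0] <= b[1] and b[0] <= a[1]

def cross_column_clusters(col_a, col_b):
    """Pair row clusters from two adjacent columns whose row ranges
    overlap; unpaired clusters remain single-column."""
    a_runs, b_runs = row_clusters(col_a), row_clusters(col_b)
    merged, used = [], set()
    for ra in a_runs:
        m = next((i for i, rb in enumerate(b_runs)
                  if i not in used and overlaps(ra, rb)), None)
        if m is None:
            merged.append([ra])
        else:
            used.add(m)
            merged.append([ra, b_runs[m]])
    merged += [[rb] for i, rb in enumerate(b_runs) if i not in used]
    return merged

print(cross_column_clusters([3, 4, 9], [4, 5]))
# [[(3, 4), (4, 5)], [(9, 9)]]
```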
Implementation for Cross-Column Cluster Finder
(Diagram: hits and cross-row headers for columns A and B pass through FIFO1 and FIFO2 under state control, emitting cross-column headers and hits. Cases shown:
• The cluster in Col. A is a single-column one and is popped out.
• The two clusters form a cross-column one and are popped out together.
• If Col. B is not adjacent to Col. A, the entire Col. A is popped out.
• A cluster in Col. B that is not connected with Col. A is filled into FIFO2.)
Implementation for Cross-Column Cluster Finder (cont'd)
• The cross-column cluster finder firmware is written in VHDL.
(Flow chart: fill Col. A; if Col. B is not Col. A + 1, pop Col. A and fill Col. B. Otherwise compare the head clusters, with uA1/uAN the first/last rows of the Col. A cluster and uB1/uBN those of the Col. B cluster: the cases (1) uAN < uB1, (2) uA1 > uBN, and neither (overlapping ranges) select the pop/fill actions applied to FIFO1 and FIFO2 under state control.)
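One plausible reading of the comparison logic, as a software mock; the action mapping here is my interpretation of the flow chart and diagram captions, not taken from the firmware:

```python
def decide(uA1, uAN, uB1, uBN):
    """Head-of-FIFO decision for clusters in adjacent columns A and B;
    uA1/uAN and uB1/uBN are the first/last rows of each cluster."""
    if uAN < uB1:        # A ends before B starts: A can match nothing
        return "pop A as a single-column cluster"
    if uA1 > uBN:        # B ends before A starts: keep B for later
        return "fill B into FIFO2"
    return "pop A and B as one cross-column cluster"   # ranges overlap

print(decide(2, 4, 3, 6))   # overlapping row ranges
```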
BES-II DAQ System
The BES experiment upgraded its detector and DAQ system in 1997.
Beijing Spectrometer
Performance of BES-II and BES-I

Subsystem  Variable     BES-II              BES-I
MDC        sigma_P/P    1.78%(1+P^2)^1/2    1.76%(1+P^2)^1/2
           sigma_xy     198-224 µm          200-250 µm
           dE/dx        8.0%                7.8%
VC         sigma_xy     90 µm               220 µm
TOF        sigma_T      180 ps              375 ps
SC         sigma_E/E    21% E^-1/2          24.4% E^-1/2
MUON       sigma_Z      7.9 cm (layer 1), 10.6 cm (layer 2), 13.2 cm (layer 3)
DAQ        dead time    10 ms               20 ms
BES-II DAQ System
• Front-end electronics for all subsystems except the VC consist of CAMAC BADCs (Brilliant ADCs).
• VCBD: VME CAMAC branch driver. Reads the data of one detector subsystem and stores it in a local buffer.
• Two VME CPU modules with the real-time OS VMEexec.
– One for data acquisition and event building.
– The other for event logging to tape and sending a fraction of the events to the Alpha 3600.
• DEC Alpha 3600 machine.
– DAQ control console.
– Status/error reporting.
– Online data analysis and display.
– Communication with the BEPC control machines to obtain BEPC status parameters.
• System dead time: 10 ms.
– BADC conversion: 6 ms.
– VCBD readout: 3 ms.
(Diagram: ALPHA 3600 server (OpenVMS) on Ethernet; VME crates with a VME167 host, two VME162 targets, and VME memory, linked by VME bus repeaters; VCBDs 0-8 read the CAMAC branches (MDC-Q, TRG, MDC-T, H.V., LUM, ESC, BSC, MUON, TOF); 1131 and 1821 modules read the FASTBUS VC 1879 TDCs; SCSI bus to disk and 8 mm tape; BES control.)
Fastbus subsystem for Vertex Chamber
• One Fastbus crate for 640 VC channels.
• Fastbus logic board:
– Distributes all signal types to the TDCs: common stop, reset (fast clear).
– Produces internal start and stop test pulses.
– A good-event signal tells the 1821 to read data from the 1879s.
(Diagram: ECL inputs into the 1879 TDCs; the 1821 SM/I and logic board provide common stop, good event, and reset on the FASTbus; SIB to a PC; 1131 and VME162 modules on the VME bus.)
Microcode for the 1821
• Initialization of the 1879s:
– TDC scale: 1 µs.
– Compaction parameter: 10 ns.
– Active time interval: 512 bins.
• Read out 1879 data into the data memory of the 1821.
– Block transfer.
– Sparse data scan: only TDC modules containing data are read out.
• Send a data-ready signal (interrupt) to VME.
• SONIC language: symbolic macro assembler, converted to microcode under LIFT.
• LIFT (LeCroy Interactive Fastbus Toolkit): a PC tool for developing microcode and testing the Fastbus system.
VC DAQ Software in VME
• A task running in the VME162.
• Controlled by the BES-II DAQ main task through message queues.
• Downloads the microcode into the 1821.
• Controls the VC data-taking procedure.
• Reads time data from the 1821 into the 1131 data memory after receiving the interrupt signal.
• Data transfer modes:
– High 16 bits: DMA.
– Low 16 bits: word by word.
• Measured transfer rate:
– 96 (chans) x 7 (modules) x 2 (both edges) + 3 (marks) = 1347 32-bit words.
– High 16 bits, DMA: 1.1 ms @ VME162.
– Low 16 bits, word by word: 3.5 ms @ VME162.
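The event size and effective bandwidths follow directly from these numbers:

```python
words = 96 * 7 * 2 + 3        # chans x modules x edges + marker words
print(words)                  # 1347 32-bit words
event_bytes = words * 4
print(round(event_bytes / 1.1e-3 / 1e6, 1))   # DMA: ~4.9 MB/s
print(round(event_bytes / 3.5e-3 / 1e6, 1))   # word-by-word: ~1.5 MB/s
```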
The End
Backup slides
BTeV trigger architecture
(Block diagram: BTeV detector front-end boards -> data combiners + optical transmitters -> optical receivers -> pixel processors -> FPGA segment finders -> track/vertex farm -> Global Level-1 (GL1) with the Information Transfer Control Hardware (ITCH); Level-1 buffers feed a cross-connect switch across 8 data highways into the Level 2/3 processor farm (12 x 24-port Fast Ethernet switches plus a Gigabit Ethernet switch) and on to the data logger.)
L1 Highway Bandwidth Estimates
(Diagram: segment tracker nodes (~30), worker nodes (~96), and L2 nodes (~15) connected by a switching fabric and the DAQ highway switch, with the GL1 + ITCH node, L1 buffers, and muon front ends. Labeled rates: total triplets 2.5 GB/s; other (raw pixel data) 10 MB/s; results + triplets 54 MB/s after 1/50 rejection (other: 1 GB/s); results + triplets to the GL1 + ITCH node 583 KB/s; 167 MB/s, 83 MB/s, 15 MB/s, 1.8 MB/s + 0.5 MB/s, and 300 KB/s (via a bridge) on the L1 buffer, muon, and L2 links. Bandwidth estimates are for 6 interactions/crossing and include 50% excess capacity.)