Jin Huang (BNL)
TPC Electronics Meeting, Jin Huang <[email protected]>
TODO: needs testing optically and mechanically
Thanks to Al for running production. Quantity is enough for all TPC needs. Still need to test with SFP+ transceivers.
Thanks to Al and John for putting together the kit.
Expect the first two delivered for testing in a few weeks.
If OK, approve production for the remaining 8 boards.
Soldering sample (TTM); one of two boxes of parts
Data path on EBDC: FELIX -> DMA -> memory -> compression on CPU -> memory -> NIC -> buffer boxes
Max ~10 Gbps average rate per EBDC server (x24 EBDCs, after compression to 60% of original size)
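The per-EBDC budget above can be cross-checked with one line of arithmetic; a minimal sketch using only numbers quoted on this slide (the 60% figure is taken as the compressed/original size ratio):

```python
# EBDC output budget check, using only numbers quoted on this slide.
N_EBDC = 24                     # number of EBDC servers
MAX_OUT_PER_EBDC_GBPS = 10.0    # per-server average output cap to buffer boxes
COMPRESSION_RATIO = 0.60        # compressed size / original size (LZO-like)

total_out_gbps = N_EBDC * MAX_OUT_PER_EBDC_GBPS      # total rate to buffer boxes
total_in_gbps = total_out_gbps / COMPRESSION_RATIO   # implied uncompressed rate

print(total_out_gbps)              # 240.0
print(round(total_in_gbps, 1))     # 400.0
```

The implied 400 Gbps uncompressed input matches the post-trigger rate quoted in the year-5 data-rate table.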
                                  AuAu(Y-1)  AuAu(Y-3)  AuAu(Y-5)    pp     pA
Average collision rate [kHz]         100        140        170     12900   2800
FEE -> DAM data rate [Gbps]         1100       1476       1800      1700   1470
DAM -> DAQ data rate [Gbps]          170        209        240       160    133
Per-event size @ DAQ [MB/evt]        1.4        1.7        2.0       1.3    1.1

TPC data rate table, summed over 24 EBDCs (2019 computing review).
Plot annotations: 10 Gbps/EBDC, 7 Gbps/EBDC.
A modern server: C621 motherboard, 2x 16-core Intel Xeon Silver 4216. A PCI-express data source that matches FELIX throughput in sPHENIX:
◦ ASUS Hyper M.2 X16 PCIe 3.0 + 4x Samsung 970 EVO Plus M.2 2280 1 TB
◦ Demonstrated 48 Gbps write and 90 Gbps read in a 4-stripe software RAID0
1.3 TB of TPC beam data from the 2019 FTBF test beam: uncompressed SAMPA ADC data in PRDF packaging.
Thanks to Martin and John: 2x bonded 10 Gbps SFP+ NICs as the data sink over the network.
Compared LZO, LZ4, and gzip at multiple compression levels.
Analysis notes: https://github.com/sPHENIX-Collaboration/ebdc_compression/blob/master/data_48x_20GbpsNetwork/analysis.pdf
NIC limited to 20 Gbps.
LZO lvl-3 (lzo1x_1_15_compress) fits the throughput and compresses data to just above 60% of the original size
◦ Similar to the choice made in PHENIX
◦ Possible further optimization of parameters
LZ4 lvl-3 reaches the year-1 data rate and produces better compression
◦ Decompression is ~3x faster than LZO in other tests
gzip lvl-1 is also close to the year-1 data rate and gives very good compression (<45% of original size)
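A minimal, stdlib-only sketch of the level-scan methodology: the actual study ran lzop, lz4, and gzip over 1.3 TB of SAMPA PRDF data; here zlib (the gzip algorithm) stands in, and the synthetic low-entropy payload is an assumption, so the printed numbers are illustrative only.

```python
import random
import time
import zlib

# Synthetic low-entropy payload, loosely mimicking zero-suppressed ADC data.
random.seed(42)
data = bytes(random.choice(b'\x00\x00\x00\x01\x02\x03') for _ in range(1 << 20))

for level in (1, 3, 9):
    t0 = time.perf_counter()
    out = zlib.compress(data, level)
    dt = time.perf_counter() - t0
    ratio = len(out) / len(data)       # compressed / original (smaller is better)
    gbps = len(data) * 8 / dt / 1e9    # single-core input throughput
    print(f"zlib-{level}: ratio={ratio:.3f}  in-rate={gbps:.2f} Gbps")
```

The real scan multiplies such single-core rates by the 48 parallel jobs used on the EBDC test server.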
Plot annotations: Year 1, Year 5, test network limit; compression assumption.
dataset zipcmd ziplevel jobs totalInTime totalInSize totalOutTime totalOutSize Compression inRateGbps outRateGbps
0 gzip-1-48 gzip 1 48 35681.4667 1.345599e+12 35682.1754 5.911506e+11 0.439321 14.481192 6.361771
1 gzip-2-48 gzip 2 48 40012.4084 1.345599e+12 40013.1186 5.872109e+11 0.436394 12.913748 5.635376
2 gzip-3-48 gzip 3 48 56121.1630 1.345599e+12 56122.0689 5.710247e+11 0.424365 9.207047 3.907082
3 gzip-5-48 gzip 5 48 88668.8855 1.345599e+12 88670.1592 5.786260e+11 0.430014 5.827412 2.505831
4 gzip-7-48 gzip 7 48 260473.2410 1.345599e+12 260476.5507 5.738001e+11 0.426427 1.983736 0.845908
5 gzip-9-48 gzip 9 48 469253.2066 1.345599e+12 469258.8453 5.701411e+11 0.423708 1.101133 0.466553
6 lz4-1-48 lz4 1 48 17078.0360 1.345599e+12 17082.7751 9.084490e+11 0.675126 30.255830 20.420828
7 lz4-2-48 lz4 2 48 17037.3998 1.345599e+12 17041.7892 9.084490e+11 0.675126 30.327994 20.469941
8 lz4-3-48 lz4 3 48 41074.9621 1.345599e+12 41087.9044 7.775625e+11 0.577856 12.579687 7.266956
9 lz4-5-48 lz4 5 48 80064.1867 1.345599e+12 80089.0275 7.219256e+11 0.536509 6.453699 3.461391
10 lz4-7-48 lz4 7 48 129191.7374 1.345599e+12 129232.2927 7.007271e+11 0.520755 3.999560 2.082136
11 lz4-9-48 lz4 9 48 149248.8691 1.345599e+12 149296.7664 6.983585e+11 0.518994 3.462071 1.796219
12 lzop-1-48 lzop 1 48 16856.7238 1.345599e+12 16857.4112 8.581284e+11 0.637729 30.653060 19.547563
13 lzop-2-48 lzop 2 48 16902.8734 1.345599e+12 16903.5811 8.557885e+11 0.635991 30.569368 19.441017
14 lzop-3-48 lzop 3 48 16919.5788 1.345599e+12 16920.2196 8.557885e+11 0.635991 30.539186 19.421899
15 lzop-5-48 lzop 5 48 16922.7959 1.345599e+12 16923.4848 8.557885e+11 0.635991 30.533380 19.418152
16 lzop-7-48 lzop 7 48 328273.5071 1.345599e+12 328282.7795 6.592038e+11 0.489896 1.574023 0.771086
17 lzop-9-48 lzop 9 48 841328.1659 1.345599e+12 841350.8210 6.548982e+11 0.486696 0.614160 0.298901
Compression time by level (1 = fastest preset, 9 = highest compression), from the benchmark linked below:
level  gzip   bzip2  lzma   lzma -e  xz     xz -e  lz4   lzop
1      8.1s   58.3s  31.7s  4m37s    32.2s  4m40s  1.3s  1.6s
2      8.5s   58.4s  40.7s  4m49s    41.9s  4m53s  1.4s  1.6s
3      9.6s   59.1s  1m2s   4m36s    1m1s   4m39s  1.3s  1.5s
5      14s    1m1s   3m5s   5m       3m6s   4m53s  -     1.5s
7      21s    1m2s   4m14s  5m52s    4m13s  5m57s  -     35s
9      33s    1m3s   4m48s  6m40s    4m51s  6m40s  -     1m5s
https://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
Decompression time by level, from the same benchmark:
level  gzip  bzip2  lzma  lzma -e  xz    xz -e  lz4   lzop
1      3.5s  3.4s   6.7s  5.9s     7.2s  6.5s   0.4s  1.5s
2      3s    15.7s  6.3s  5.6s     6.8s  6.3s   0.3s  1.4s
3      3.2s  15.9s  6s    5.6s     6.7s  6.2s   0.4s  1.4s
5      3.2s  16s    5.5s  5.4s     6.2s  6s     -     1.5s
7      3s    15s    5.3s  5.3s     5.9s  5.8s   -     1.3s
9      3s    15s    5s    5.1s     5.6s  5.6s   -     1.2s
Buffer box
Input data stream: 600-1100 bidirectional fiber links; max continuous 6.5 Gbps / fiber
TPC DAM L3 scope, WBS 1.2.6
Output data stream to buffer boxes: 24 x 25+ Gbps Ethernet via fiber, after triggering and compression
DAQ L2 scope, WBS 1.6
Clock/trigger input: optical links, clock = 9.4 MHz
FEE L3 scope, WBS 1.2.5
Structure/GEM L3 scopes, WBS 1.2.1-4
Transfer to SDCC continuously
➢ 150k channels
➢ 600 FEEs
➢ 24 sectors
➢ Continuous readout
Continuous-time simulation assuming clusters/layer = <dNch/deta> x2 and 3x5 samples/cluster. The largest reduction factor comes from trigger throttling, which reduces the data by a factor of ~4.
Raw data size for a single event:
◦ 10 Mbit for MB AuAu, and 130 kbit for MB pp
Associating hits with LVL-1 triggers (15 kHz) reduces the data size to be recorded.
Lossless data compression (LZO algorithm) reduces the data to 60% of the original size.
Final average per-event data sizes (TPC-only) are:
◦ 1.4 MB/evt for MB AuAu in Year-1
◦ 2.0 MB/evt for MB AuAu in Year-5
◦ 1.3 MB/evt for MB pp
System                              AuAu(Y-1)  AuAu(Y-5)    pp
Collision rate [kHz]                   100        170      12900
Raw data rate [Gbps]                  1100       1800       1700
After LVL-1 trigger [Gbps]             290        400        260
After lossless compression [Gbps]      170        240        160
Per-event size [MB/evt]                1.4        2.0        1.3
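The per-event sizes in the table follow directly from the compressed rates and the 15 kHz LVL-1 trigger rate; a quick arithmetic check:

```python
# Per-event size = (compressed data rate) / (LVL-1 trigger rate), in MB/event.
TRIGGER_RATE_HZ = 15e3   # 15 kHz LVL-1 trigger

def mb_per_event(rate_gbps):
    """Convert an average compressed rate in Gbps to MB per triggered event."""
    return rate_gbps * 1e9 / 8 / TRIGGER_RATE_HZ / 1e6

for system, gbps in [("AuAu Y-1", 170), ("AuAu Y-5", 240), ("pp", 160)]:
    print(f"{system}: {mb_per_event(gbps):.1f} MB/evt")
# AuAu Y-1: 1.4, AuAu Y-5: 2.0, pp: 1.3 -- matching the table
```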
After the computing review, in July 2018, we improved the estimate with a Geant4-based simulation: https://indico.bnl.gov/event/4024/
◦ Mean channel occupancy in the inner rings is ~25% for a triggered event at a 170 kHz collision rate, 30% at 200 kHz
◦ The average single-MB-event data rate is consistent between the <dNch/deta> x2 estimate and the Geant4 simulation (15% smaller)
◦ The full pile-up simulation suggests a 25% higher data rate than the <dNch/deta> x2 estimate (from the longer ionization trails of off-time tracks?)
FEE Occupancy, AuAu MB + 170kHz
In the February DAQ workfest, there was much discussion of how the TPC DAQ fits into the sPHENIX global trigger and busy control.
Following that, John K. demonstrated that, using Block RAM, we could allocate 77 Mb in the DAM/FELIX FPGA
◦ Remains to be demonstrated in actual usage as a FIFO under all constraints
This study quantifies buffer usage on the FEE and FELIX with a Geant4-simulated data stream and a "leaky-basket" buffer simulation
◦ Assuming "tight-packed" data: 2x 8-bit header + tightly packed 10-bit ADC data
Estimating per-FELIX data in a random 13 us time window: the per-FELIX data volume has a mean of 0.16 MB (~100 Gbps) and a very long tail.
Plot annotation: 100 Gbps x 13 us
At ~100 Gbps input, with demonstrated FELIX output at a similar rate, transmitting all hits to the EBDC memory for trigger reduction is impractical.
However, if we throttle the data on FELIX, we can dramatically reduce the load on the PCIe transmission.
Required buffer depth: ~70 Mb / 9 MB
◦ Guaranteed buffer for trigger delay = 16 bit * 256 chan/FEE * 26 FEE * 20/us * 6.4 us = 14 Mb / 1.7 MB
◦ FIFO for DMA transmission (next slides): ~2 MB
◦ Guaranteed buffer for taking the current trigger before generating a real-time busy signal = 16 bit * 256 chan/FEE * 26 FEE * 20/us * 13 us = 27 Mb / 3.5 MB
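The buffer-budget bullets above can be reproduced numerically; a sketch (the 2 MB DMA FIFO is taken as quoted, everything else is the same arithmetic):

```python
# FELIX buffer-budget arithmetic, from the bullets above.
BITS_PER_SAMPLE = 16     # bits per stored ADC word
CHAN_PER_FEE = 256       # channels per FEE
FEE_PER_FELIX = 26       # FEEs served by one FELIX
SAMPLES_PER_US = 20      # 20 MHz ADC sampling -> 20 samples per us

def buffer_bits(window_us):
    """Bits needed to hold all channels for a given time window."""
    return BITS_PER_SAMPLE * CHAN_PER_FEE * FEE_PER_FELIX * SAMPLES_PER_US * window_us

trigger_delay_bits = buffer_bits(6.4)   # trigger-delay buffer
pre_busy_bits = buffer_bits(13)         # buffer before real-time busy
dma_fifo_bits = 2 * 8e6                 # ~2 MB DMA FIFO, as quoted

total_bits = trigger_delay_bits + pre_busy_bits + dma_fifo_bits
print(trigger_delay_bits / 1e6, "Mb")   # ~13.6 Mb (the "14 Mb / 1.7 MB" item)
print(pre_busy_bits / 1e6, "Mb")        # ~27.7 Mb (the "27 Mb / 3.5 MB" item)
print(total_bits / 1e6, "Mb total")     # ~57 Mb, inside the Block RAM budget
```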
Using a leaky-basket simulation to estimate buffer usage and the probability of a full buffer
At 100 Gbps DMA transfer (throughput demonstrated by ATLAS)
A 1 MB buffer fills roughly once per 10 s; a 2 MB buffer almost never fills.
What about slower-than-expected DMA transfer, say half speed at 50 Gbps? A 1 MB buffer would give a ~10^-4 busy fraction in any time frame; 2 MB would be quite comfortable.
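A toy version of such a leaky-basket simulation: the hypothetical `busy_fraction` helper below assumes a synthetic long-tailed burst distribution (a Pareto stand-in, not the Geant4-derived per-window sizes used in the real study) and a post-throttling mean input of ~25 Gbps, so its printed numbers are illustrative only.

```python
import random

# Leaky-basket toy: bursty per-13us input drains into a fixed-rate DMA link;
# count how often a finite FIFO would hit its cap (i.e. assert busy).
WINDOW_US = 13.0
DMA_GBPS = 50.0    # pessimistic half-speed DMA
DRAIN_BYTES = DMA_GBPS * 1e9 / 8 * WINDOW_US * 1e-6   # bytes drained per window

def busy_fraction(buffer_bytes, n_windows=100_000, mean_bytes=4.0e4, alpha=1.5):
    """Fraction of 13 us windows in which the buffer is full."""
    rng = random.Random(1)                      # fixed seed for reproducibility
    scale = mean_bytes * (alpha - 1) / alpha    # Pareto mean = scale*a/(a-1)
    level, overflows = 0.0, 0
    for _ in range(n_windows):
        burst = rng.paretovariate(alpha) * scale   # long-tailed input burst
        level = min(level + burst, buffer_bytes)
        if level >= buffer_bytes:
            overflows += 1                         # would assert busy here
        level = max(level - DRAIN_BYTES, 0.0)
    return overflows / n_windows

for buf_mb in (1, 2):
    print(f"{buf_mb} MB buffer: busy fraction ~ {busy_fraction(buf_mb * 1e6):.1e}")
```

Swapping in the simulated per-window size distribution and the actual post-throttle rates reproduces the study's setup.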
Panel titles: 100 Gbps DMA transfer (shown possible by ATLAS) | much slower 50 Gbps DMA transfer
The FEEs observe similarly large fluctuations in occupancy. Depending on the FEE's location, the average data rate is ~5 Gbps at modules 1 & 2 and ~2.5 Gbps at module 3.
Plot annotation: 5 Gbps x 13 us
Assuming double 6.5 Gbps links to FELIX, with a total 10 Gbps payload rate after encoding and a maximum 95% usage
Assuming a single 6.5 Gbps link to FELIX, with a ~5 Gbps payload rate after encoding and a maximum 95% usage
Previously we assumed a single data link on module-2 FEEs. However, at this rate we would require double 6.5 Gbps links.
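The link budget behind this conclusion is simple arithmetic; a sketch using the numbers on these slides (the payload fraction is an assumed round number for the encoding overhead):

```python
# FEE -> FELIX uplink budget check.
LINK_GBPS = 6.5                  # raw line rate per fiber link
PAYLOAD_FRACTION = 5.0 / 6.5     # ~5 Gbps payload out of a 6.5 Gbps link
MAX_USAGE = 0.95                 # keep links below 95% occupancy

def usable_gbps(n_links):
    """Usable payload bandwidth for n uplinks per FEE."""
    return n_links * LINK_GBPS * PAYLOAD_FRACTION * MAX_USAGE

AVG_FEE_RATE_GBPS = 5.0          # average at module 1 & 2 FEEs
print(usable_gbps(1))   # ~4.75 Gbps: below the ~5 Gbps average -> insufficient
print(usable_gbps(2))   # ~9.5 Gbps: comfortable headroom
```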
Panel titles: double uplinks per module-2 FEE | single uplinks per module-2 FEE
Max-charge layer: module-2's 1st layer (#17)
◦ Average <Q> ~ 125 fC
◦ P[Q > 300 fC / 13 us] ~ 8%, P[Q > 600 fC / 13 us] < 2%
The SAMPA chip is expected to operate up to 600-700 fC / 20 us.
TPC layers shown: [1-16]: R1, [17-32]: R2, [33-48]: R3
We would need almost all of the 70 Mb of Block RAM the FELIX FPGA can offer:
◦ Buffer for the trigger delay, performing trigger throttling on the FPGA
◦ DMA FIFO with a 10^-6 chance of being full
◦ Produce a real-time busy for GL1, with a guaranteed buffer for the current event prior to busy
We would need double FEE->FELIX fiber links for module 1 & module 2 FEEs.
◦ Buffer usage ~60 kB at a <10^-6 chance of a full buffer. But loss at the FEE is non-deterministic, so a deeper buffer would be helpful.
◦ Total FEE->FELIX fiber links = 6*2 (module 1) + 8*2 (module 2) + 12 (module 3) = 40 out of 47 FELIX input links
sPHENIX uses a next-generation TPC that operates in continuous mode and can receive a high amount of charge.
Past parameters were estimated via <dNch/deta> * scale factor (x2) * average cluster size (15 ADC) / charge (60 e).
Tony's implementation of the TPC FEE simulation (PR455) allows investigating the TPC FEE and DAQ in Geant4 simulation, checking average rates and quantifying fluctuations.
Two questions to address:
◦ Charge received by the TPC FEE. The FEE chip (SAMPA) is expected to operate up to 600-700 fC / 20 us
◦ DAQ output data rate after event building. A <dNch/deta>-based DAQ simulation expects a 400 Gbps output rate to the memory of the DAQ computers (240 Gbps after LZO compression)
The TPC is the dominant data contributor to sPHENIX events. Using the past <dNch/deta> x2 estimate, the expected event size is:
◦ Single MB collision, no pile-up: 1.05 MB/event (before compression)
◦ Year-5 average, MB + 170 kHz AuAu (plots below): 3.3 MB/event (before compression), 240 Gbps (15 kHz trigger, LZO compression)
We are now simulating the event size and data rate in Geant4.
Used b = 0-14.7 fm AuAu sHIJING simulation to mimic 6.8 b Au+Au inelastic MB collision events
◦ <dNch/deta> ~ 190
Primary collision within |z| < 10 cm; pile-up collisions have a Gaussian z width of 30 cm.
TPC data are sampled in wavelets (+4 ADC samples for the 1st ADC sample above threshold). Wavelets are streamed in a byte-aligned structure in the raw data:
◦ 9 bytes for a 5-ADC-sample wavelet
◦ Extends for longer wavelets
Draft wavelet data format:
◦ 2-byte header
◦ 7 bytes of data for 5x 10-bit ADC values
Plot annotations: timing header/FEE; 5-ADC-sample wavelets; two wavelets piled up
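The 9-byte figure follows from packing five 10-bit samples into 50 bits (7 bytes with 6 pad bits) plus the 2-byte header; a sketch with a hypothetical `pack_wavelet` helper (the header field layout is a placeholder, not the actual draft format):

```python
# Packing sketch for the draft wavelet format: 2-byte header followed by
# five 10-bit ADC samples tightly packed into 7 bytes -> 9 bytes total.
def pack_wavelet(header, samples):
    assert len(samples) == 5 and all(0 <= s < 1024 for s in samples)
    word = 0
    for s in samples:              # concatenate five 10-bit samples, MSB first
        word = (word << 10) | s
    n_data = (10 * len(samples) + 7) // 8   # 50 bits -> 7 bytes (6 pad bits)
    return header.to_bytes(2, "big") + word.to_bytes(n_data, "big")

w = pack_wavelet(0xBEEF, [100, 200, 300, 400, 500])
print(len(w))   # 9 -- matching the 9-byte figure quoted above
```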
Only hits within the |z| = 10 cm + eta = 1.1 lines are used in the central tracking program, assuming a 10 cm vertex limit.
This saves ~17% of the DAQ output rate. Implemented as a wavelet acceptance cut.
Note: this cut needs to be rechecked with distortions and with a possible large-vertex program.
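A sketch of the geometric idea behind this cut, with a hypothetical `in_acceptance` helper; the actual cut in the wavelet code may differ, this only illustrates the |z| <= 10 cm + eta = 1.1 boundary (z = r sinh(eta) along a constant-eta line from a vertex at +/-10 cm):

```python
import math

ETA_MAX = 1.1     # eta boundary of the central tracking acceptance
Z_VTX_CM = 10.0   # assumed vertex limit

def in_acceptance(r_cm, z_cm):
    """Keep a hit if it lies inside the eta=1.1 lines drawn from z = +/-10 cm."""
    return abs(z_cm) <= Z_VTX_CM + r_cm * math.sinh(ETA_MAX)

print(in_acceptance(30.0, 20.0))   # True: well inside the acceptance
print(in_acceptance(30.0, 60.0))   # False: 10 + 30*sinh(1.1) ~ 50 cm boundary
```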
<dNch/deta> estimation: accepted ADC samples in bit size
sPHENIX TPC FEE in Geant4 simulation: accepted ADC samples in ADC sum
Axis labels: R1 (6 mm strip); R2,3 (12.5 mm strip)
Single MB collision, no pile-up, emulated with 0-14.7 fm sHIJING collisions:
<dNch/deta> x2 estimation: 1.05 MB/event (before compression)
Geant4 simulation: 0.9 MB/event (before compression)
Consistent within the uncertainty of the arbitrary x2 scale factor in the <dNch/deta> estimate.
<dNch/deta> estimation: accepted ADC samples in bit size
Consider MB collisions at a 170 kHz AuAu collision rate (year-5 average). For pile-up tracks, the high-eta ionization trail shifted into the timing acceptance is likely to leave more ADC hits than central-rapidity tracks, which leads to longer wavelets and larger data rates.
<dNch/deta> x2 estimation: 3.3 MB/event (before compression), 240 Gbps (15 kHz trigger, LZO compression)
Geant4 simulation:
◦ ~3.6 MB/event (before compression), ~260 Gbps (15 kHz trigger, LZO compression)
◦ Note: the statistics are low; more events need to be run
Note: the statistics are low
Since Tony's PR#455, we can check the TPC FEE and DAQ running parameters and their fluctuations in Geant4 simulations.
Total charge in the FEE: in the most challenging events and running conditions (0-7% central + 200 kHz pile-up Au+Au collisions), there is a non-negligible chance (1-8%) for a TPC FEE channel to reach 300 fC in 13 us.
◦ Simulated with conservative pile-up collisions: 0-12 fm events at a 200 kHz pile-up collision rate
◦ The chip developers expect the TPC FEE chip to handle at least 600-700 fC / 20 us
◦ Necessary to follow up and test the higher limits; 600 fC / 13 us is much safer (chance < 2%)
Data rate:
◦ The average single-MB-event data rate is consistent between the <dNch/deta> x2 estimate and the Geant4 simulation (difference ~15%)
◦ The full pile-up simulation suggests a slightly higher data rate than the <dNch/deta> x2 estimate (from the longer ionization trails of off-time tracks?)
TPC DAQ lib in coresoftware:
◦ https://github.com/sPHENIX-Collaboration/coresoftware/pull/462
Fun4All macros:
◦ https://github.com/blackcathj/macros/tree/tpc_electronics_hepmc_mb/macros/g4simulations
Plotting macros:
◦ https://github.com/sPHENIX-Collaboration/analysis/blob/master/TPC/DAQ/macros/DrawTPCIntegratedCharge.C
The FEE chip (SAMPA) is expected to operate at least up to 300 fC / 20 us.
Takao: the <dNch/deta> estimate expects ~100 fC average charge for 2 central collisions.
In the current setup, it is easiest to simulate one trigger window, ~13 us.
Consider the busiest event:
◦ 1x central AuAu event triggered
◦ 200 kHz Au+Au collisions (year-5 begin-of-fill luminosity) -> ~5 pile-up collisions relevant for the drift window of one collision. Pile-up collisions are conservatively emulated using a 0-12 fm HIJING simulation (~0-66% central)
◦ Current charge implementation: 1.2 cm of MIP ionization ~ 57 e based on Geant4 edep, amplified by a fixed GEM gain of x2000 -> TPC FEE
As we know, the occupancy is high: up to a 30% per-channel average in the inner layers, with tails towards 50%.
Next: count the charge deposited in each FEE channel (after GEM amplification) within a 13 us drift window.
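The charge scale implied by the implementation numbers above is easy to check; a sketch (the "MIP-equivalent segments" interpretation at the end is an illustration, not part of the study):

```python
# Charge per MIP segment from the quoted implementation numbers:
# ~57 primary electrons per 1.2 cm MIP segment, fixed GEM gain x2000.
E_CHARGE_C = 1.602e-19   # elementary charge [C]
N_PRIMARY = 57           # primary electrons per 1.2 cm MIP segment
GEM_GAIN = 2000          # fixed GEM gain

q_fc = N_PRIMARY * GEM_GAIN * E_CHARGE_C * 1e15   # charge per segment in fC
print(f"{q_fc:.1f} fC per MIP segment")           # 18.3 fC

# So a busiest-pad average of <Q> ~ 125 fC / 13 us corresponds to roughly
# 125 / 18 ~ 7 MIP-equivalent segments in one drift window.
print(round(125 / q_fc))                          # 7
```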
sPHENIX TPC FEE in Geant4 simulation: number of ADC samples per channel within a drift window (max ~250), for 1 triggered central Au+Au collision + 200 kHz pile-up collisions.
Plot: # of active ADC samples per channel per event vs. full tracking layer ID (TPC layers 1-48); max ~250 hits / 13 us; reference lines at 30% and 100% occupancy.
Max-charge layer: module-2's 1st layer (#17)
◦ Average <Q> ~ 125 fC
◦ P[Q > 300 fC / 13 us] ~ 8%, P[Q > 600 fC / 13 us] < 2%
The SAMPA chip is expected to operate up to 600-700 fC / 20 us.
TPC layers shown: [1-16]: R1, [17-32]: R2, [33-48]: R3
Panel titles: 0-7% central + 200 kHz Au+Au | 0-100% central + 170 kHz Au+Au