9/11/01 CDR Z Fitter 1
Z Fitter Algorithm and Implementation
Masahiro Morii, Harvard U.
Requirements, I/O
Algorithm
Implementation
Resources & Latency

Executive Summary
Z Fitter measures track’s z0, pT, tan λ
Algorithm demonstrated in C++ emulation
  LUTs reduce real-time computation to a minimum
Resources and timing evaluated
  Uses 4% of CLBs in XC2V4000
  Can process 3 seeds/CLK4 in pipeline
  Latency < 1 CLK4 for each seed
FPGA implementation ready to start

Functionality
Fit seed tracks from the Finder to a helix
  A seed track = a set of 10 TSF segments
  Measure z0, pT, tan λ → Decision Module
[Figure: segments, fitted track, z0]

I/O
Inputs from Seed Finder
  TSF segments, hit map
  Curvature ρ (or 1/pT), tan λ
  FPGA internal bus
Outputs to Decision Module
  Fit results: z0, z0 error, ρ, tan λ
  Hit map
  Sent over 10 traces at 45 MHz

Inputs
TSF segments
  10-bit φ and 4-bit φ error
  φ is relative to the seed segment → only 9 segments/seed are needed
  Error is not used by the Fitter
Hit map
  Which layer had a segment
  10 bits
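
A minimal sketch of reading the 10-bit hit map, assuming that bit i is set when layer i has a segment. That bit order is an assumption, suggested by the emulation code's test `hitmap & (1<<i)`. The helper names `layerHasSegment` and `countSegments` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: bit i of the hit map is set when layer i (0..9) has a segment.
constexpr int kNumLayers = 10;

// True if the given layer contributed a segment to this seed.
inline bool layerHasSegment(std::uint16_t hitmap, int layer) {
    assert(layer >= 0 && layer < kNumLayers);
    return ((hitmap >> layer) & 1u) != 0;
}

// Number of layers with a segment.
inline int countSegments(std::uint16_t hitmap) {
    int n = 0;
    for (int layer = 0; layer < kNumLayers; ++layer)
        n += layerHasSegment(hitmap, layer) ? 1 : 0;
    return n;
}
```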

Inputs
Curvature ρ
  Fitter needs a 1st guess of ρ from the Finder
  6-bit resolution
tan λ
  Not used by the Fitter
  Finder provides 6-bit resolution

Outputs
Fit results

Quantity    Unit           Resolution   Limits
z0          cm             8 bits       ±127 cm
z0 error    cm             4 bits       0 to 15 cm
ρ           2^-12 cm^-1    8 bits       |pT| > 145 MeV/c
tan λ       2^-5           8 bits       |tan λ| < 3.97

Hit map
  Passed through from input
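
The output units can be turned back into physical values with a few scalings. The sketch below is illustrative only: the struct `FitResult` and the functions `decode` and `ptGeV` are hypothetical names, and the pT conversion assumes a uniform 1.5 T solenoid field, which the slides do not state.

```cpp
#include <cassert>
#include <cmath>

// Physical values decoded from the Fitter's integer outputs.
struct FitResult {
    double z0_cm;
    double z0err_cm;
    double rho_per_cm;  // curvature in cm^-1
    double tanLambda;
};

// Units from the output table: z0 and z0 error in cm,
// curvature in 2^-12 cm^-1, tan(lambda) in 2^-5.
inline FitResult decode(int z0, int z0err, int rho, int dip) {
    assert(z0err >= 0 && z0err <= 15);
    FitResult r;
    r.z0_cm = z0;
    r.z0err_cm = z0err;
    r.rho_per_cm = std::ldexp(static_cast<double>(rho), -12);
    r.tanLambda = std::ldexp(static_cast<double>(dip), -5);
    return r;
}

// Assumption (not in the slides): uniform field, default 1.5 T, so that
// pT [GeV/c] = 0.0029979 * B [T] / |rho| [cm^-1].
inline double ptGeV(double rho_per_cm, double fieldTesla = 1.5) {
    assert(rho_per_cm != 0.0);
    return 0.0029979 * fieldTesla / std::fabs(rho_per_cm);
}
```

Under these assumptions the largest 8-bit magnitude, 127, gives tan λ = 127/32 ≈ 3.97 and pT ≈ 0.145 GeV/c, which matches the limits quoted in the table.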

Algorithm
Step 1: r-φ fit
  Ignore stereo information
  6 measurements of φ at different r
  Find φseed and ρ that minimize χ² in r-φ
Step 2: z0 fit
  Use stereo information
  6 measurements of z at different r
  Find z0 and tan λ that minimize χ² in r-z

r-φ Fit
Merge stereo layers → virtual axial layers
  3 U+V pairs plus 3 axial → 6 r-φ measurements
[Figure; surviving label: "= 0"]
Subtract φ shift due to curvature using input ρ
  Residuals are due to errors in ρ and φ(seed)
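
r-φ Fit
φ residual after the curvature subtraction, with α_i the derivative of the curvature shift of layer i with respect to ρ, evaluated at ρ_in (sign convention as in the emulation code zFitter):
\[ \varphi_i^{\mathrm{resid}} = \Delta\varphi_{\mathrm{seed}} - \alpha_i\,\Delta\rho, \qquad \alpha_i \equiv \left.\frac{\partial\varphi_i}{\partial\rho}\right|_{\rho_{\mathrm{in}}} \]
Calculate and minimize
\[ \chi^2 = \sum_i r_i^2 \left( \varphi_i^{\mathrm{resid}} - \Delta\varphi_{\mathrm{seed}} + \alpha_i\,\Delta\rho \right)^2 \]
Error of φseed:
\[ \Delta\varphi_{\mathrm{seed}} = \frac{\sum_i r_i^2\varphi_i^{\mathrm{resid}}\,\sum_i r_i^2\alpha_i^2 \;-\; \sum_i r_i^2\alpha_i\varphi_i^{\mathrm{resid}}\,\sum_i r_i^2\alpha_i}{\sum_i r_i^2\,\sum_i r_i^2\alpha_i^2 \;-\; \left(\sum_i r_i^2\alpha_i\right)^2} \]
Error of ρ:
\[ \Delta\rho = \frac{\sum_i r_i^2\varphi_i^{\mathrm{resid}}\,\sum_i r_i^2\alpha_i \;-\; \sum_i r_i^2\alpha_i\varphi_i^{\mathrm{resid}}\,\sum_i r_i^2}{\sum_i r_i^2\,\sum_i r_i^2\alpha_i^2 \;-\; \left(\sum_i r_i^2\alpha_i\right)^2} \]
\[ \rho = \rho_{\mathrm{in}} + \Delta\rho \]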
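
The r-φ fit is a two-parameter weighted linear least-squares problem, so it can be cross-checked in floating point before any fixed-point work. The sketch below assumes the model resid_i = Δφ − α_i Δρ with weights r_i², which is the form implied by the sums in the emulation code. The names `fitRPhi` and `demoRPhi`, the radii, and the choice α_i = r_i/2 are illustrative assumptions, not values from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RPhiFit {
    double dPhi;  // correction to phi_seed
    double dRho;  // correction to the input curvature
};

// r[i]     : layer radius
// alpha[i] : derivative of the curvature shift of layer i w.r.t. rho
// resid[i] : phi residual left after subtracting the curvature shift
// Minimizes sum_i r_i^2 (resid_i - dPhi + alpha_i * dRho)^2.
inline RPhiFit fitRPhi(const std::vector<double>& r,
                       const std::vector<double>& alpha,
                       const std::vector<double>& resid) {
    assert(r.size() == alpha.size() && r.size() == resid.size());
    double s = 0, sa = 0, saa = 0, sp = 0, sap = 0;
    for (std::size_t i = 0; i < r.size(); ++i) {
        const double w = r[i] * r[i];
        s   += w;
        sa  += w * alpha[i];
        saa += w * alpha[i] * alpha[i];
        sp  += w * resid[i];
        sap += w * alpha[i] * resid[i];
    }
    const double denom = s * saa - sa * sa;
    assert(denom != 0.0);
    return { (sp * saa - sap * sa) / denom,
             (sp * sa  - sap * s ) / denom };
}

// Hypothetical demo: six layers with made-up radii, alpha_i = r_i / 2,
// and residuals generated from known corrections.
inline RPhiFit demoRPhi(double dPhiTrue, double dRhoTrue) {
    const std::vector<double> r = {30.0, 40.0, 50.0, 55.0, 65.0, 75.0};
    std::vector<double> alpha, resid;
    for (double ri : r) {
        alpha.push_back(0.5 * ri);
        resid.push_back(dPhiTrue - 0.5 * ri * dRhoTrue);
    }
    return fitRPhi(r, alpha, resid);
}
```

With noiseless residuals generated from the model, the fit should return the injected corrections up to rounding. In hardware the three sums that depend only on the hit map and curvature would come from look-up tables, leaving the two data-dependent sums for run time.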

z0 Fit
Go back to 6 stereo layers
  Apply corrections for Δφ and Δρ
[Figure; surviving label: "="]
Subtract φ shift due to curvature using fitted ρ
  Residuals are due to stereo angles
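
z0 Fit
Convert φ residual to z, with τ_i the stereo angle of layer i:
\[ z_i = \frac{r_i\,\varphi_i^{\mathrm{resid}}}{\tan\tau_i} \]
6 z_i make a straight line in the d-z plane
[Figure: z versus d; the straight line through the points meets the z axis at z0]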
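
z0 Fit
Minimize
\[ \chi^2 = \sum_i \frac{\left(z_i - z_0 - d_i\tan\lambda\right)^2}{\sigma_i^2} \]
Assume a 1 mm error in r-φ, r_i δφ_i = 1 mm, so that
\[ \sigma_i = \frac{1\ \mathrm{mm}}{\tan\tau_i} \]
Solution:
\[ z_0 = \frac{\sum_i \frac{z_i}{\sigma_i^2}\,\sum_i \frac{d_i^2}{\sigma_i^2} \;-\; \sum_i \frac{z_i d_i}{\sigma_i^2}\,\sum_i \frac{d_i}{\sigma_i^2}}{\sum_i \frac{1}{\sigma_i^2}\,\sum_i \frac{d_i^2}{\sigma_i^2} \;-\; \left(\sum_i \frac{d_i}{\sigma_i^2}\right)^2} \]
\[ \tan\lambda = \frac{\sum_i \frac{z_i d_i}{\sigma_i^2}\,\sum_i \frac{1}{\sigma_i^2} \;-\; \sum_i \frac{z_i}{\sigma_i^2}\,\sum_i \frac{d_i}{\sigma_i^2}}{\sum_i \frac{1}{\sigma_i^2}\,\sum_i \frac{d_i^2}{\sigma_i^2} \;-\; \left(\sum_i \frac{d_i}{\sigma_i^2}\right)^2} \]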
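
The z0 fit is a weighted straight-line fit of z against d. A floating-point sketch, useful as a reference for the bit-wise version, might look like the following. The function names `zFromPhi`, `fitZ` and `demoZ` and all numerical inputs are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// z from a stereo-layer phi residual: z = r * phiResid / tan(stereo angle).
inline double zFromPhi(double r, double phiResid, double tanStereo) {
    assert(tanStereo != 0.0);
    return r * phiResid / tanStereo;
}

struct ZFit {
    double z0;
    double tanLambda;
};

// Minimizes sum_i (z_i - z0 - d_i * tanLambda)^2 / sigma_i^2.
inline ZFit fitZ(const std::vector<double>& d,
                 const std::vector<double>& z,
                 const std::vector<double>& sigma) {
    assert(d.size() == z.size() && d.size() == sigma.size());
    double s1 = 0, sd = 0, sdd = 0, sz = 0, szd = 0;
    for (std::size_t i = 0; i < d.size(); ++i) {
        const double w = 1.0 / (sigma[i] * sigma[i]);
        s1  += w;
        sd  += w * d[i];
        sdd += w * d[i] * d[i];
        sz  += w * z[i];
        szd += w * z[i] * d[i];
    }
    const double denom = s1 * sdd - sd * sd;
    assert(denom != 0.0);
    return { (sz * sdd - szd * sd) / denom,
             (szd * s1 - sz * sd) / denom };
}

// Hypothetical demo: six points generated exactly on a line.
inline ZFit demoZ(double z0True, double tanTrue) {
    const std::vector<double> d = {30.0, 40.0, 50.0, 55.0, 65.0, 75.0};
    const std::vector<double> sigma = {2.0, 2.0, 2.5, 2.5, 3.0, 3.0};
    std::vector<double> z;
    for (double di : d) z.push_back(z0True + di * tanTrue);
    return fitZ(d, z, sigma);
}
```

The numerator and denominator terms map one to one onto the emulation code's `sumzs2`, `sumzds2`, `sumd2s2`, `sumds2`, `sums2` and `denomzt`, where everything except the two z-dependent sums is precomputed.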

Implementation
Biggest concern: Speed
  Pre-compute as much as possible
  Most computation packed in LUTs
  Only additions and multiplications at run time
First step: Software emulation
  Bit-wise emulation of what hardware will do
  Validate the algorithm
  Study and optimize the performance
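
The pre-computation idea can be illustrated with a toy case: a division that depends only on the hit map is replaced by a table of scaled reciprocals, so the run-time work is additions, one multiplication and one shift. This is a sketch of the general technique, not the Fitter's actual tables. The 6-layer size, the 2^16 scale and the names `buildReciprocalTable`, `meanOverHits` and `demoMean` are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLayers = 6;
constexpr int kShift = 16;  // fixed-point scale 2^16 (hypothetical choice)

using RecipTable = std::array<std::int32_t, 1 << kLayers>;

// Built once, offline: table[hitmap] = round(2^16 / number of hit layers).
inline RecipTable buildReciprocalTable() {
    RecipTable table{};
    for (int hitmap = 1; hitmap < (1 << kLayers); ++hitmap) {
        int n = 0;
        for (int i = 0; i < kLayers; ++i) n += (hitmap >> i) & 1;
        table[hitmap] = ((1 << kShift) + n / 2) / n;
    }
    return table;
}

// Run time: additions, one table look-up, one multiplication, one shift.
// Inputs are assumed non-negative so the shift is well defined.
inline int meanOverHits(const RecipTable& table, int hitmap, const int* value) {
    assert(hitmap > 0 && hitmap < (1 << kLayers));
    std::int32_t sum = 0;
    for (int i = 0; i < kLayers; ++i)
        if (hitmap & (1 << i)) sum += value[i];
    return static_cast<int>(
        (static_cast<std::int64_t>(sum) * table[hitmap]) >> kShift);
}

// Hypothetical demo with fixed per-layer values.
inline int demoMean(int hitmap) {
    static const RecipTable table = buildReciprocalTable();
    static const int value[kLayers] = {10, 20, 30, 40, 50, 60};
    return meanOverHits(table, hitmap, value);
}
```

The same pattern appears in the emulation code as a product with `denomrp` or `denomzt` followed by `>>16`.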

Software Emulation
bool
L1DczNIFitter::zFitter(const L1DczNIFtable* table, int hitmap,
                       const int* segphi, int rhoin,
                       int &z0, int &z0err, int &rhoout, int &dipout) const
{
  // Hit maps that cannot be fitted: set all outputs to flag values
  if (!table->fitok(hitmap)) {
    z0 = -128; z0err = 15; rhoout = -128; dipout = -128;
    return false;
  }

  // r-phi fit
  int phi[9];
  int rh = table->rh5(rhoin);
  int hitax = table->hitax(hitmap);
  int sumr2phi = 0;
  int sumr2phidpdr = 0;
  int i;
  for (i = 0; i < 9; i++) {
    phi[i] = (segphi[i]*table->phiconv(i))>>13;        // unit conversion
    phi[i] += table->twistzero(i);                     // twist correction
    if (rhoin >= 0) phi[i] += table->curvcorr(i,rh);   // curvature correction
    else            phi[i] -= table->curvcorr(i,rh);
    if (table->useax(hitax,i)) {                       // accumulation
      sumr2phi     += table->wr2(i)*phi[i];
      sumr2phidpdr += table->wr2dpdr(i,rh)*phi[i];
    }
  }
  int dPhi1 = (sumr2phi*table->sumr2dpdr2(hitax,rh))>>13;
  int dPhi2 = (sumr2phidpdr*table->sumr2dpdr(hitax,rh))>>13;
  int dPhi3 = (dPhi1-dPhi2)*table->denomrp(hitax,rh);
  int dPhi  = dPhi3>>16;
  int dRho1 = (sumr2phi*table->sumr2dpdr(hitax,rh))>>14;
  int dRho2 = (sumr2phidpdr*table->sumr2(hitax))>>14;
  int dRho3 = (dRho1-dRho2)*table->denomrp(hitax,rh);
  int dRho  = dRho3>>16;
  rhoout = (rhoin<<2) + dRho;

  // z0 fit, using the 6 stereo layers (low 6 bits of the hit map)
  hitmap &= 63;
  rh = table->rh3(rhoout);
  int sumzs2 = 0;
  int sumzds2 = 0;
  for (i = 0; i < 6; i++) {
    phi[i] += -dPhi + ((table->dphidrho(i,rh)*dRho)>>6);   // dPhi, dRho correction
    int z = (table->rstereo(i)*phi[i])>>8;                 // calculation of z
    if (hitmap & (1<<i)) {                                 // accumulation
      sumzs2  += z*table->sigma2z(i);
      sumzds2 += z*table->dsigma2z(i,rh);
    }
  }
  int z01 = (sumzs2*table->sumd2s2(hitmap,rh))>>6;
  int z02 = (sumzds2*table->sumds2(hitmap,rh))>>6;
  z0 = ((z01-z02)*table->denomzt(hitmap,rh))>>16;
  int td1 = (sumzds2*table->sums2(hitmap))>>1;
  int td2 = (sumzs2*table->sumds2(hitmap,rh))>>1;
  dipout = ((td1-td2)*table->denomzt(hitmap,rh))>>16;
  z0err = table->z0err(hitmap,rh);
  return true;
}
The whole code (58 lines of C++) is made of LUTs, additions, multiplications and bit shifts

Engineering Constraints
FPGA: Xilinx Virtex-II
  XC2V4000 chosen for the Seed Track Finder
  Allow much smaller resources than Finder → as few CLBs as possible
Latency: as short as possible
  Ideally < 1 CLK4
  FPGA runs at 180 MHz → 48 ticks/CLK4
  Most logic operations take 1 tick
  18-bit multiplication takes 2 ticks

Seed Counting
12 seeds/module/CLK4
  4 Engines → each Finder/Fitter pair processes 3 seeds
  Fitter receives a new seed every 1/3 CLK4
Fitting takes ~3/4 CLK4 for each seed → pipeline processing
[Figure: Finder → Fitter → Decision Module; a seed arriving every 1/3 CLK4; ~3/4 CLK4 in the Fitter]

Data Flow
[Block diagram]
Inputs: 9 TSF segments x 10 bits/segment, segment hit map (10 bits), initial curvature (6 bits)
r-φ fit: segment serializer → 3 r-φ pipelines → accumulator
Dual-port memory between the r-φ fit and the z0 fit
z0 fit: 2 z0 pipelines → accumulator → z0, tan λ, z0err
Outputs: z0 (8 bits), tan λ (8 bits), curvature (8 bits), z0 error (4 bits), hit map (10 bits)

r-φ Fit Block
[Block diagram]
Input: 9 segments → segment serializer → 3 r-φ pipelines, 3 segs/pipeline
Each pipeline: unit conversion, stereo cancellation, curvature subtraction
Accumulator: accumulate Σ r_i² φ_i and Σ r_i² α_i φ_i, where α_i is the derivative of the curvature shift with respect to ρ at ρ_in
Dual-port memory: carry stereo segments to z0 Fit

Accumulator
Accumulate 2 quantities from 3 sources arriving every 2 ticks
[Block diagram: Pipelines 1, 2 and 3 each deliver r_i² φ_i and r_i² α_i φ_i; a hit map LUT drives a switch; two MUX + adder chains form Σ r_i² φ_i and Σ r_i² α_i φ_i]

Δφ and Δρ Calculation
[Block diagram: hit map and curvature address two LUTs; their outputs are multiplied (X) with Σ r_i² φ_i and Σ r_i² α_i φ_i]
6 multiplications and 2 additions in 5 ticks

z0 Fit Block
[Block diagram]
Inputs: Δφ and Δρ from r-φ Fit, 6 stereo segments from the dual-port memory
2 z0 pipelines, 3 segs/pipeline: Δφ, Δρ correction and z conversion
Accumulator: accumulate Σ z_i/σ_i² and Σ z_i d_i/σ_i²
Output: z0, tan λ
Done!

Resources
Dominated by the LUTs

Function                CLBs   RAM blocks   Multipliers
Look-up tables           144            8
Computation               42                         29
Serialization              8
Dual-port memory           3
Miscellaneous             20
Total Z Fitter           217            8            29
Available in XC2V4000   5760          120           120
Usage (%)                 4%           7%           24%

Timing
[Timing chart; horizontal axis 0, 8, 16, 24, 32 ticks; time is in CLK180 ticks; markers: input arrives, next seed arrives]
r-φ pipeline:
  Serialization
  Unit conversion
  Twist correction
  Curvature correction
  Calculation of r2phi & r2phidpdr
  Accumulation
  Calculation of dPhi & dRho (Δφ and Δρ calculated)
  Calculation of rhoout → rhoout
  Calculation of z0err → z0err
Dual-port memory (DPM)
z0 pipeline:
  dPhi correction
  dRho correction
  Calculation of z
  Calculation of z/err2 & zd/err2
  Accumulation
  Calculation of z0 & dipout → z0 & dipout (z0 and tan λ calculated)
Total: 37 ticks

Timing and Latency
Separation between two input seeds: 15 ticks
  OK to process 3 seeds/CLK4 @ 180 MHz
  2 seeds/CLK4 if 120 MHz (same as Finder)
Latency for z0 and tan λ = 37 ticks
  ~3/4 CLK4 @ 180 MHz
  Output will add ~5 ticks
  Still < 1 CLK4
  If 120 MHz, ~1.3 CLK4
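
The throughput and latency figures follow from simple tick arithmetic. The sketch below reproduces them, taking 48 ticks/CLK4 at 180 MHz, a 15-tick seed separation, 37 ticks of fitting and about 5 ticks of output from the slides. The function names are hypothetical.

```cpp
#include <cassert>

// From the slides: 48 ticks per CLK4 at 180 MHz, so ticks scale as MHz * 48 / 180.
constexpr int ticksPerClk4(int fpgaMHz) { return fpgaMHz * 48 / 180; }

// Seeds that fit in one CLK4 when consecutive seeds need `separation` ticks.
constexpr int seedsPerClk4(int fpgaMHz, int separation = 15) {
    return ticksPerClk4(fpgaMHz) / separation;
}

// Latency in units of CLK4: fitting ticks plus output ticks.
inline double latencyInClk4(int fpgaMHz, int fitTicks = 37, int outputTicks = 5) {
    assert(fpgaMHz > 0);
    return static_cast<double>(fitTicks + outputTicks) / ticksPerClk4(fpgaMHz);
}

static_assert(ticksPerClk4(180) == 48, "180 MHz gives 48 ticks/CLK4");
static_assert(ticksPerClk4(120) == 32, "120 MHz gives 32 ticks/CLK4");
static_assert(seedsPerClk4(180) == 3, "3 seeds/CLK4 at 180 MHz");
static_assert(seedsPerClk4(120) == 2, "2 seeds/CLK4 at 120 MHz");
```

At 180 MHz the total of 42 ticks is 0.875 CLK4, and at 120 MHz it is about 1.31 CLK4, in line with the "< 1 CLK4" and "~1.3 CLK4" figures quoted on the slide.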

Executive Summary
Z Fitter measures track’s z0, pT, tan λ
Algorithm demonstrated in C++ emulation
  LUTs reduce real-time computation to a minimum
Resources and timing evaluated
  Uses 4% of CLBs in XC2V4000
  Can process 3 seeds/CLK4 in pipeline
  Latency < 1 CLK4 for each seed
FPGA implementation ready to start