
TigerSHARC CLU Closer look at the XCORRS

Date post: 05-Jan-2016
Page 1: TigerSHARC CLU Closer look at the XCORRS

TigerSHARC CLU

Closer look at the XCORRS

M. Smith,

University of Calgary, Canada

[email protected]

Page 2: TigerSHARC CLU Closer look at the XCORRS
Page 3: TigerSHARC CLU Closer look at the XCORRS

The practice

Suppose we have a vector of in-phase and out-of-phase data gathered by an antenna, from a satellite for example. Gain issues scale it by 16:

-16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, etc.

Question: if the original data from the satellite had the form -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, ..., how is the satellite data delayed?

FOR THIS EXAMPLE: 0, 3, 6, 9, 12, etc.
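The delay question above can be checked with a small sketch. It assumes a circular correlation against the conjugated code, which is the usual textbook definition and not something stated on the slide; the function name `circular_correlation` and the 48-sample length are illustrative choices.

```python
# Original satellite pattern (period 3) and the received copy scaled by 16.
original = [-1 - 1j, 1 + 1j, 1 + 1j] * 16          # 48 samples
received = [16 * v for v in original]


def circular_correlation(data, code, shift):
    """Correlate data with the code circularly delayed by `shift` samples."""
    n = len(code)
    return sum(data[i] * code[(i - shift) % n].conjugate() for i in range(n))


# Magnitude of the correlation for delays 0..14.
magnitudes = [abs(circular_correlation(received, original, s)) for s in range(15)]
peak = max(magnitudes)
peak_shifts = [s for s, m in enumerate(magnitudes) if m == peak]
print(peak_shifts)  # [0, 3, 6, 9, 12]
```

Because the pattern repeats every 3 samples, every delay that is a multiple of 3 gives the same maximum.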

Page 4: TigerSHARC CLU Closer look at the XCORRS

Tackle the issue with FIR

First: modify the correlation function to handle complex values. Ignore that issue at the moment.

Imagine 1024 data points + 1024 PRN values.

Need to do 1024 FIRs, each of 1024 taps.

We know how to optimize to do 2 taps every cycle (one in X and one in Y).

Total time is 1024 * 512 cycles, about 1 ms at 500 MHz.

XCORRS can do 8 * 16 = 128 taps each cycle in each compute block, 128 times faster than one tap per cycle per block
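The cycle counts on this slide can be reproduced with a few lines of arithmetic. This is only a back-of-envelope sketch; the variable names are illustrative and the 500 MHz clock is the figure quoted on the slide.

```python
CLOCK_HZ = 500e6            # clock rate quoted on the slide
N = 1024                    # data points, and taps per FIR

total_taps = N * N                      # 1024 FIRs of 1024 taps each
fir_cycles = total_taps // 2            # 2 taps per cycle (one in X, one in Y)
fir_time_ms = fir_cycles / CLOCK_HZ * 1e3

xcorrs_taps_per_block = 8 * 16          # 8 taps for each of 16 TR registers

print(fir_cycles)                # 524288, i.e. 1024 * 512
print(round(fir_time_ms, 3))     # 1.049
print(xcorrs_taps_per_block)     # 128
```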

Page 5: TigerSHARC CLU Closer look at the XCORRS

Where does the CLU fit in?

Page 6: TigerSHARC CLU Closer look at the XCORRS

XCORRS definition

Page 7: TigerSHARC CLU Closer look at the XCORRS

THEORY: mathematical definition

Uses registers TR, D, C

And something called CUT

Page 8: TigerSHARC CLU Closer look at the XCORRS

Satellite data

Quad fetch brings in 8 complex values, 8 bits each.

Pattern here is -1 + 0j, 1 + 0j, 1 + 0j, -1 + 0j, 1 + 0j, 1 + 0j, ……….

Page 9: TigerSHARC CLU Closer look at the XCORRS

PRN code – 2 bit complex number

Seems strange to have two dummy bits, but it actually makes sense.

PRN: -1 - 1j, 1 + j, 1 + j, -1 - 1j, 1 + j, 1 + j, ……….

+1 and -1 are associated with the PSK; more next lecture.

Problem: BINARY means 1 and 0, so how do we represent 1 and -1?

Page 10: TigerSHARC CLU Closer look at the XCORRS

PRN

Page 11: TigerSHARC CLU Closer look at the XCORRS

PRN

The value 0x3 goes in as C15 and C16: binary 0011 gives C15 = -1 - j and C16 = +1 + j
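One way to picture the 2-bit encoding is the sketch below. It assumes that a 1 bit stands for -1 and a 0 bit for +1, and that the lower-numbered coefficient sits in the lower bit pair, which is consistent with the 0011 example above. Which bit of each pair holds the real part is a guess made for illustration, and `encode_prn` / `decode_prn` are hypothetical helper names, not part of any TigerSHARC tool.

```python
def encode_prn(codes):
    """Pack complex codes with parts in {+1, -1} into an int, 2 bits per code.

    codes[0] lands in the two least significant bits (assumed ordering).
    """
    word = 0
    for i, c in enumerate(codes):
        re_bit = 1 if c.real < 0 else 0      # assumed: 1 encodes -1
        im_bit = 1 if c.imag < 0 else 0
        word |= ((im_bit << 1) | re_bit) << (2 * i)
    return word


def decode_prn(word, count):
    """Unpack `count` 2-bit codes back into complex values."""
    out = []
    for i in range(count):
        pair = (word >> (2 * i)) & 0b11
        re = -1 if pair & 0b01 else 1
        im = -1 if pair & 0b10 else 1
        out.append(complex(re, im))
    return out


pair = [-1 - 1j, 1 + 1j]             # C15, C16 from the slide
print(hex(encode_prn(pair)))         # 0x3
print(decode_prn(0x3, 2))            # [(-1-1j), (1+1j)]
```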

Page 12: TigerSHARC CLU Closer look at the XCORRS

Loading the THR registers

Page 13: TigerSHARC CLU Closer look at the XCORRS

Standard XCORRS instruction

Lower 46 bits of THR1:0

R7:4

TR0, TR1, TR2 ……. TR15

Page 14: TigerSHARC CLU Closer look at the XCORRS

TR15:0 = XCORRS(R7:4, THR3:0)

TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 8 taps
………..
TR15 += D7 * C7 + D6 * C6 + … 8 taps

64 taps each cycle, on both X and Y compute blocks, if set up properly

128 taps each cycle. These are "complex taps", compared to 2 real taps / cycle after Lab 3
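The tap listing above follows one pattern: TRn accumulates D[k] * C[k + 15 - n] for k = 0..7. A minimal software model of that pattern, written as a sketch, is shown here. It assumes a plain complex multiply with no conjugation or scaling, which the slide does not specify; `xcorrs_model` is a hypothetical name, and the test data reuses the period-3 patterns from the earlier slides.

```python
def xcorrs_model(tr, d, c):
    """Accumulate into 16 TR values from 8 data values D0..D7 and 23 codes C0..C22."""
    for n in range(16):
        for k in range(8):
            tr[n] += d[k] * c[k + 15 - n]    # e.g. TR0 gets D7 * C22, TR15 gets D7 * C7
    return tr


data_pattern = [-1 + 0j, 1 + 0j, 1 + 0j]
code_pattern = [-1 - 1j, 1 + 1j, 1 + 1j]
d = [data_pattern[k % 3] for k in range(8)]      # D0..D7
c = [code_pattern[i % 3] for i in range(23)]     # C0..C22

tr = xcorrs_model([0] * 16, d, c)
reals = [int(v.real) for v in tr]
peaks = [n for n, r in enumerate(reals) if r == max(reals)]
print(peaks)   # [0, 3, 6, 9, 12, 15]
```

With these inputs the real part peaks in every third TR register, in line with the period-3 data.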

Page 15: TigerSHARC CLU Closer look at the XCORRS

TR15:0 = XCORRS(R7:4, THR3:0) (CUT -7)

TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 8 taps
………..
TR14 += D7 * C8 + D6 * C7 2 taps
TR15 += D7 * C7 1 tap

Page 16: TigerSHARC CLU Closer look at the XCORRS

TR15:0 = XCORRS(R7:4, THR3:0) (CUT -15)

TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 7 taps
………..
TR7 += D7 * C15 1 tap
TR8 += 0 0 taps
………..
TR15 += 0 0 taps

Page 17: TigerSHARC CLU Closer look at the XCORRS

TR15:0 = XCORRS(R7:4, THR3:0) (CUT +15)

TR0 += 0 0 taps
TR1 += D0 * C14 1 tap
………..
TR7 += D6 * C14 + D5 * C13 + … 7 taps
TR8 += D7 * C14 + D6 * C13 + … 8 taps
………..
TR15 += D7 * C7 + D6 * C6 + … 8 taps
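The three CUT listings can be summarized by one rule, inferred here purely from the tap counts on these slides and not from the processor manual: CUT -n appears to drop coefficients below Cn, and CUT +n appears to drop Cn and above. The sketch below counts surviving taps per TR register under that assumed rule; `keep_coefficient` and `taps_per_register` are hypothetical names.

```python
def keep_coefficient(index, cut):
    """True if C[index] takes part in the sum (rule inferred from the slides)."""
    if cut < 0:
        return index >= -cut     # CUT -n: C0..C(n-1) are dropped
    if cut > 0:
        return index < cut       # CUT +n: Cn and above are dropped
    return True                  # no CUT: every coefficient is used


def taps_per_register(cut):
    """Number of surviving taps in TR0..TR15, using TRn += D[k] * C[k + 15 - n]."""
    return [
        sum(keep_coefficient(k + 15 - n, cut) for k in range(8))
        for n in range(16)
    ]


print(taps_per_register(0))     # 8 taps in every register
print(taps_per_register(-7))    # TR0..TR8 keep 8 taps, then 7 down to 1 at TR15
print(taps_per_register(-15))   # 8 down to 1 for TR0..TR7, then 0
print(taps_per_register(15))    # 0 up to 7 for TR0..TR7, then 8
```

Under this rule the counts for CUT -15 and CUT +15 add up to 8 in every register, which suggests the two variants cover complementary halves of a full 8-tap sum.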

Page 18: TigerSHARC CLU Closer look at the XCORRS
Page 19: TigerSHARC CLU Closer look at the XCORRS

TR15:0 = XCORRS(R7:4, THR3:0) (CUT -15)

TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 7 taps
………..
TR7 += D7 * C15 1 tap
TR8 += 0 0 taps
………..
TR15 += 0 0 taps

Page 20: TigerSHARC CLU Closer look at the XCORRS
Page 21: TigerSHARC CLU Closer look at the XCORRS

TR15:0 = XCORRS(R7:4, THR3:0) (CUT -7)

TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 8 taps
………..
TR14 += D7 * C8 + D6 * C7 2 taps
TR15 += D7 * C7 1 tap

Page 22: TigerSHARC CLU Closer look at the XCORRS
Page 23: TigerSHARC CLU Closer look at the XCORRS

TR15:0 = XCORRS(R7:4, THR3:0)

TR0 += D7 * C22 + D6 * C21 + … 8 taps
TR1 += D7 * C21 + D6 * C20 + … 8 taps
………..
TR15 += D7 * C7 + D6 * C6 + … 8 taps

64 taps each cycle, on both X and Y compute blocks, if set up properly

128 taps each cycle. These are "complex taps", compared to 2 real taps / cycle after Lab 3

Page 24: TigerSHARC CLU Closer look at the XCORRS
Page 25: TigerSHARC CLU Closer look at the XCORRS

Problem at this point: THR3:2 is empty.

Need to bring in more PRN values.

Page 26: TigerSHARC CLU Closer look at the XCORRS

TR15:0 = XCORRS(R7:4, THR3:0) (CUT +15)

TR0 += 0 0 taps
TR1 += D0 * C14 1 tap
………..
TR7 += D6 * C14 + D5 * C13 + … 7 taps
TR8 += D7 * C14 + D6 * C13 + … 8 taps
………..
TR15 += D7 * C7 + D6 * C6 + … 8 taps

Page 27: TigerSHARC CLU Closer look at the XCORRS
Page 28: TigerSHARC CLU Closer look at the XCORRS

Final Result

Maximum correlation occurs every 3 shifts, which is what we expect.

Is it the correct result?

Page 29: TigerSHARC CLU Closer look at the XCORRS

Correlation – result expected

In step:

-1 + 0j, 1 + 0j, 1 + 0j, … 16 times

with

-1 - j, 1 + j, 1 + j, … 16 times

-1 * -1 + 1 * 1 + 1 * 1 + … = 48 = 0x30 (real component)

Out of step:

-1 + 0j, 1 + 0j, 1 + 0j, … 16 times

with

1 + j, 1 + j, -1 - j, … 16 times

-1 * 1 + 1 * 1 + 1 * -1 + … = -16 = -0x10 = 0xFFF0
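The expected values on this slide can be confirmed with a short check. It is a sketch that sums only the real component of the products, treats "16 times" as 16 repeats of the 3-sample pattern, and shows the negative result as a 16-bit two's-complement value to match the 0xFFF0 on the slide. The name `real_correlation` is illustrative.

```python
data = [-1 + 0j, 1 + 0j, 1 + 0j] * 16                 # 48 samples
code_in_step = [-1 - 1j, 1 + 1j, 1 + 1j] * 16
code_out_of_step = [1 + 1j, 1 + 1j, -1 - 1j] * 16


def real_correlation(d, c):
    """Sum of the real parts of the element-wise products."""
    return int(sum((x * y).real for x, y in zip(d, c)))


in_step = real_correlation(data, code_in_step)
out_of_step = real_correlation(data, code_out_of_step)

print(in_step, hex(in_step))                    # 48 0x30
print(out_of_step, hex(out_of_step & 0xFFFF))   # -16 0xfff0
```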

Page 30: TigerSHARC CLU Closer look at the XCORRS

Final Result

1) Now have correlation values for 16 shifts in TR registers. Store them to external memory. Repeat for all other necessary shifts, then find the maximum.
2) Now make parallel in SISD mode
3) Now make parallel in SIMD mode

Page 31: TigerSHARC CLU Closer look at the XCORRS

Take home Quiz 4

Old requirement

Do Lab 4 with FFT and XCORRS

Write tests and demonstrate XCORRS used for correlation

a) Not parallel instruction format, but in a loop
b) Now do in optimized SISD mode
c) Now do in optimized SIMD mode

