The SKA LOW correlator design challenges
CSIRO ASTRONOMY AND SPACE SCIENCE
John Bunton | CSP System Engineer
C4SKA, Auckland, 9-10 February 2017
SKA1 Low antenna station (Australia)
Station beamforming part of Receptor Sub-Element (LFAA)
Low Stations
Low Frequency Aperture Array - LFAA
LFAA has 512 stations, maximum baseline less than 65 km
• Distributed between 1-16 subarrays
Each station has 256 dual-polarisation log-periodic antennas.
• Frequency band 50-350 MHz
Signal is sent using RF-over-Fibre to the digitiser/beamformer.
Output to CSP_Low
• 384 “coarse” channels per station (781 kHz each, 300 MHz total)
• Channels can have any of 8 different look directions.
• Stations can be in any of 16 subarrays
• Up to 128 look directions!
• 5.8 Tbps of data
Central Signal Processing
Central Signal Processing (CSP) is tasked to take the data from LFAA and produce
• visibility data and
• Pulsar products.
CSP is divided into 4 sub-elements
• Correlator and beamformer
• Pulsar timing
• Pulsar search and
• Local Monitor and Control
Correlator and beamformer (CBF) work package is done by CSIRO (Australia), ASTRON (Netherlands) and AUT (New Zealand)
Correlator – “standard” mode
Low correlator full Stokes (all polarisation parameters)
Low: 524,800 correlations per frequency channel
Up to 65k channels across the band
Low: 34G correlations per dump, 0.25 s dumps, 11.0 Tbps output
Note: originally 0.9 s dumps
Compute load in the correlator (one correlation accumulate is equivalent to 8 flop)
Low: 524,800 correlations at 0.3 GHz = 1.26 Petaflops
Other processing within CSP is similar in compute load
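The figures on this slide can be reproduced with simple arithmetic. The sketch below is one accounting that happens to give 524,800: four polarisation products per station pair plus three per autocorrelation. That breakdown, the reading of "65k" as 65,536, and all names are assumptions, not taken from the slides.

```python
# Back-of-envelope check of the Low correlator numbers.
# Assumption: 4 products (XX, XY, YX, YY) per cross-correlation and
# 3 products (XX, YY, XY) per autocorrelation, since YX is the conjugate of XY.
N_STATIONS = 512
CROSS_PRODUCTS = 4
AUTO_PRODUCTS = 3
FLOP_PER_CORRELATION = 8      # from the slide: one correlation accumulate = 8 flop
BANDWIDTH_HZ = 0.3e9          # 300 MHz processed bandwidth
N_CHANNELS = 65536            # assumption: "65k" read as 2**16
DUMP_TIME_S = 0.25
OUTPUT_RATE_BPS = 11.0e12


def correlations_per_channel(n_stations):
    """Number of correlation products formed for one frequency channel."""
    baselines = n_stations * (n_stations - 1) // 2
    return baselines * CROSS_PRODUCTS + n_stations * AUTO_PRODUCTS


def compute_load_flops(n_stations, bandwidth_hz):
    """Correlator compute load in flop/s for complex samples at the bandwidth rate."""
    return correlations_per_channel(n_stations) * bandwidth_hz * FLOP_PER_CORRELATION


correlations_per_dump = correlations_per_channel(N_STATIONS) * N_CHANNELS
# Bits per correlation implied by the quoted output rate (a derived value, not a spec).
implied_bits_per_correlation = OUTPUT_RATE_BPS * DUMP_TIME_S / correlations_per_dump

print(correlations_per_channel(N_STATIONS))
print(compute_load_flops(N_STATIONS, BANDWIDTH_HZ) / 1e15, "Pflop/s")
print(correlations_per_dump / 1e9, "G correlations per dump")
print(implied_bits_per_correlation, "bits per correlation (implied)")
```

The implied word size comes out close to 80 bits per correlation, which is only an inference from the quoted rates.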
                             MWA   LOFAR   JVLA   ASKAP   ALMA   SKA1_LOW   SKA1_MID
Correlator (TFLOPS)            8      19    131     224    746       1258       2484
Input data rate (Tbit/s)    0.08    0.34    3.1    12.4   10.4        5.8       29.9
Output data rate (Gbit/s)      3      97      1      20     48      10995       2543
Frequency resolution
Frequency resolution, “standard” observing
• 4.6 kHz across 300 MHz
• 52k frequency channels
Or Zoom Mode - 4 bands
• Zoom band bandwidth: each 4, 8, 16, 32, 64, 128 or 256 MHz
• Note: originally just 4, 8, 16, 32 MHz
• Zoom band centre frequency – anywhere in observing band.
• Band overlap allowed
• An LFAA frequency channel can be in any zoom band
• 16k output frequency channels per zoom band
• Resolutions 0.23 to 14.5 kHz
Requirement churn
A recent Engineering Change Proposal allows an exchange of bandwidth for number of inputs.
Each of the 512 LFAA stations can be configured to sum subsets of its 256 antennas
• Example: over 150 MHz, sum two separate sets of 128 antennas
– Looks like two smaller stations (substations), each with 150 MHz bandwidth.
– Input to correlator is now 1024 stations; to keep the total number of correlations constant, bandwidth is reduced to 75 MHz.
Must design so it is possible to accommodate 1024 or 2048 substations
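The bandwidth-for-inputs trade follows from the correlation count growing roughly as the square of the number of inputs. A minimal sketch, assuming that N-squared scaling and a 512-input, 300 MHz baseline (the function name is hypothetical):

```python
# Bandwidth that keeps the total number of correlations constant
# when the number of correlator inputs changes.
# Assumption: correlations scale as (number of inputs)**2 times bandwidth.
BASE_INPUTS = 512
BASE_BANDWIDTH_MHZ = 300.0


def bandwidth_for_inputs(n_inputs):
    """Bandwidth in MHz affordable for n_inputs at constant correlation load."""
    return BASE_BANDWIDTH_MHZ * (BASE_INPUTS / n_inputs) ** 2


for n in (512, 1024, 2048):
    print(n, "inputs ->", bandwidth_for_inputs(n), "MHz")
```

Under this scaling 1024 substations get the 75 MHz quoted on the slide, and 2048 substations would get 18.75 MHz.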
Other major recent changes
• Added zoom modes 64-256 MHz. Required a major design change.
• Decrease in integration time – major increase in output data rate
Design must be capable of adapting to requirement changes.
Station based processing
FX correlator implemented – channelise data to final resolution before correlation
• 8 different frequency resolutions (226 Hz to 14.5 kHz) - 8 filterbanks ???
• Finest zoom mode needs a 4096 channel filterbank
– AH HA! Implement finest zoom and integrate in frequency for the rest
– Integration of 1, 2, 4, 8, 16, 24, 32 or 64 channels covers all resolutions
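The "one filterbank, then integrate in frequency" idea can be sketched as follows. The 226 Hz finest resolution and the integration factors are from the slide; the function name and the toy input are illustrative assumptions.

```python
# Derive all output resolutions from a single finest-resolution filterbank
# by summing groups of adjacent fine channels.
FINEST_RESOLUTION_HZ = 226.0
INTEGRATION_FACTORS = (1, 2, 4, 8, 16, 24, 32, 64)


def integrate_channels(values, factor):
    """Sum each run of `factor` adjacent fine channels into one output channel."""
    if len(values) % factor != 0:
        raise ValueError("channel count must be a multiple of the integration factor")
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]


resolutions_hz = [FINEST_RESOLUTION_HZ * k for k in INTEGRATION_FACTORS]
print(resolutions_hz)

# Toy usage: 192 fine channels of unit power integrated by 24 gives 8 channels.
coarse = integrate_channels([1.0] * 192, 24)
print(len(coarse), coarse[0])
```

The resulting resolutions run from 226 Hz to about 14.5 kHz, matching the range quoted for the zoom modes.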
Relative delay of astronomical signal to stations varies with time
• Must be removed
• Implemented as sample delay correction (coarse) and phase slope across filterbank channels.
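A minimal sketch of the two-stage delay correction: a whole-sample shift for the coarse part and a linear phase slope across channels for the residual. The sample rate, channel offsets and sign convention below are assumptions chosen for illustration, not values from the design.

```python
import cmath
import math


def split_delay(delay_s, sample_rate_hz):
    """Split a delay into a whole number of samples and a sub-sample residual."""
    whole_samples = round(delay_s * sample_rate_hz)
    residual_s = delay_s - whole_samples / sample_rate_hz
    return whole_samples, residual_s


def phase_slope(residual_s, channel_offsets_hz):
    """Per-channel phasors that remove the residual delay.

    Assumption: a delay tau appears as phase +2*pi*f*tau at offset f,
    so the correction multiplies by exp(-2j*pi*f*tau).
    """
    return [cmath.exp(-2j * math.pi * f * residual_s) for f in channel_offsets_hz]


# Hypothetical numbers: 10.4 us delay at a 1 MHz sample rate.
whole, residual = split_delay(10.4e-6, 1.0e6)
phasors = phase_slope(residual, [0.0, 1000.0, 2000.0])
print(whole, residual)
print([cmath.phase(p) for p in phasors])
```

The residual is at most half a sample, and the phase grows linearly with channel offset, which is why it can be applied as a slope across filterbank channels.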
RFI flagging – input data has flags and internal flagging needed
Corner turning
Filterbanks and correlation engine cannot process all frequency channels simultaneously
• Must process part of the bandwidth at a time
• Filterbanks and correlator process a few of the 384 LFAA channels at a time
Store all frequency channels for a short-term integration time, then read out to the filterbanks all time data for a limited number of channels at a time
Input data – All frequencies for limited time (0.2 ms)
Output data – All time data for limited bandwidth (0.9 s, 0.25 s)
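The corner turn is a transpose of the buffered data from time-major to channel-major order. A toy sketch with in-memory lists (the real design uses external memory; the function names and sizes here are illustrative only):

```python
def corner_turn(frames):
    """Transpose frames[time][channel] into buffer[channel][time]."""
    return [list(column) for column in zip(*frames)]


def read_out(buffer_by_channel, first_channel, n_channels):
    """Read all buffered time samples for a limited set of channels."""
    return buffer_by_channel[first_channel:first_channel + n_channels]


# Toy data: 3 time steps arriving with all 4 channels each.
frames = [[(t, c) for c in range(4)] for t in range(3)]
turned = corner_turn(frames)

# Each output row now holds every time sample for one channel.
print(read_out(turned, 2, 1))
```

Writes arrive as "all frequencies, short time" and reads leave as "all time, few frequencies", which is what lets the filterbanks and correlator work through the band a few channels at a time.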
Gemini board
Proof of Concept (POC)
Single FPGA Xilinx Virtex Ultrascale+, water cooled, 4x Hybrid Memory Cubes (2 link, 4G), 4x12 fibre optical at 25G, 4x
Four to be mounted in a 1U chassis
BUT 2 link, 4G HMC is now end-of-life
Redesign underway for Prototype
HMC high bandwidth memory replaced by integrated HBM (smaller but faster)
Add DDR4 for bulk memory
Design Evolution - POC design – Separate Subsystems
With the Gemini POC, originally had a separate Correlator, Beamformer and Station Based processing, and 0.9 s integration time
Major corner turn for correlator in the Station Based processing (144 Gbps per FPGA). But insufficient HMC for 0.9 s (1.3 TB double buffer, but had 43 FPGAs with 16 GB each)
Must store and accumulate the full time integration in the correlator: 0.34 TB
But this uses most of the available HMC bandwidth
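The memory shortfall can be checked with a short calculation. This sketch assumes the 1.3 TB figure is the 5.8 Tbps LFAA input held for two 0.9 s integration periods, and reads "16 G" as 16 GB of HMC per FPGA; both are interpretations, not statements from the slides.

```python
# Corner-turn buffer needed versus HMC available in the POC design.
INPUT_RATE_BPS = 5.8e12        # total LFAA data rate into CSP_Low
POC_FPGAS = 43
HMC_BYTES_PER_FPGA = 16e9      # assumption: 4 HMCs of 4 GB each


def double_buffer_bytes(integration_s):
    """Bytes needed to double-buffer the full input for one integration period."""
    return 2 * INPUT_RATE_BPS * integration_s / 8


needed = double_buffer_bytes(0.9)
available = POC_FPGAS * HMC_BYTES_PER_FPGA
print(needed / 1e12, "TB needed")
print(available / 1e12, "TB available")
```

On these assumptions the POC has roughly half the memory that a 0.9 s double buffer requires.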
Prototype (Unified) Design
Change to HBM reduced the available memory by half. Major problems fitting the design in
Go to unified design - Station based processing and correlation in the same FPGA.
• Number of FPGAs that accept inputs from LFAA increased from 43 to 288
– Six times reduction in input bandwidth per FPGA
– Can now use DDR4 for corner turn
– AND no buffer size limitation: 0.9 s possible
– Correlator can output data to SDP for a frequency channel as soon as it is computed. Very little memory needed for correlator buffer
What looked like a disaster with the loss of a key component has led to a better and more robust design
Connecting the FPGAs
The Unified design has 288 FPGAs
All FPGAs must be able to communicate with all others
Switch ??
But the heart of a switch is usually an FPGA
Number the FPGAs (X,Y,Z) (X,Y 1:6) (Z 1:8)
Arrange as a cube with these coordinates
Cross connect within rows and columns
Including self connection, each FPGA has 6 connections in X, 6 in Y and 8 in Z: 20 connections in total - 500 Gbps
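The cube interconnect can be modelled in a few lines. The sketch assumes a 6 x 6 x 8 arrangement, 25 Gbps per connection (which gives the quoted 500 Gbps for 20 connections), and an X-then-Y-then-Z hop order chosen only for illustration.

```python
from itertools import product

X_SIZE, Y_SIZE, Z_SIZE = 6, 6, 8
LINK_GBPS = 25  # assumption: one 25G optical lane per connection


def nodes():
    """All FPGA coordinates (x, y, z), numbered from 1."""
    return list(product(range(1, X_SIZE + 1),
                        range(1, Y_SIZE + 1),
                        range(1, Z_SIZE + 1)))


def links(node):
    """Connections of one FPGA along each axis, self connections included."""
    x, y, z = node
    along_x = [(i, y, z) for i in range(1, X_SIZE + 1)]
    along_y = [(x, j, z) for j in range(1, Y_SIZE + 1)]
    along_z = [(x, y, k) for k in range(1, Z_SIZE + 1)]
    return along_x + along_y + along_z


def route(src, dst):
    """Reach any FPGA in at most three hops, fixing one coordinate per hop."""
    hops = [src]
    for axis in range(3):
        current = list(hops[-1])
        if current[axis] != dst[axis]:
            current[axis] = dst[axis]
            hops.append(tuple(current))
    return hops


print(len(nodes()), "FPGAs")
print(len(links((3, 4, 3))), "connections,", len(links((3, 4, 3))) * LINK_GBPS, "Gbps")
print(route((1, 1, 1), (3, 4, 3)))
```

Every FPGA can reach every other one in at most three hops without a central switch.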
FPGA Cube
[Figure: FPGA cube with axes X (1 to 6), Y (1 to 6) and Z (1 to 8); nodes (1,1,1), (3,1,1), (3,4,1) and (3,4,3) are marked.]
Data Flow
Input FPGAs have at most one LFAA input (2 stations, full bandwidth)
ZXY connections uniformly distribute data for processing
Allows uniform distribution of compute and output in Zoom mode
Beamformers must bring all frequency data together; they use the same XYZ connections
[Figure: block diagram of the signal flow from LFAA inputs to SDP, PSS, PST and VLBI outputs. Labelled stages: Station Processing, Buffers, Array Processing. Labelled blocks: Ingest, Doppler Correction, Buffer In, Filterbank, Delay, RFI, Correlator, Output Buffer, Corr Emit, PSS Beamformer, PSS Emit, PST Beamformer, PST Buffer, PST Emit, VLBI Reformat. Interconnect labels: Z, XY, XYZ.]
0.9 to 0.25 sec integration
With 0.9 s integration, output uses 72 FPGAs (50% full)
• One in 4 FPGAs have output.
• Aggregate using Z connect. All 8 interconnected FPGAs send data to two output FPGAs, half to each
At 0.25 s (changed requirement) this is 144 at 100%. Use 180 at 80%
• Simply change to 5 out of 8 FPGAs having outputs. Each FPGA sends 1/5 of its data to an output FPGA.
• Design can accommodate 22 Tbps of output data to SDP without modifications.
• Small change to hardware (duplicate the 8 Z connections on each FPGA) and can do 44 Tbps
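The output scaling can be sketched from the slide's own figures: 144 output FPGAs are fully loaded at 0.25 s dumps carrying about 11 Tbps. The assumption that output rate scales inversely with integration time, and the function names, are illustrative.

```python
# Output link utilisation versus number of output FPGAs.
TOTAL_FPGAS = 288
OUTPUTS_AT_FULL_LOAD = 144     # from the slide: 144 at 100% for 0.25 s dumps
REFERENCE_DUMP_S = 0.25
OUTPUT_RATE_TBPS = 11.0        # output rate at 0.25 s dumps


def utilisation(n_output_fpgas, integration_s=REFERENCE_DUMP_S):
    """Fraction of output capacity used (assumes rate scales as 1/integration)."""
    return OUTPUTS_AT_FULL_LOAD * (REFERENCE_DUMP_S / integration_s) / n_output_fpgas


def max_output_tbps(n_output_fpgas):
    """Output rate supported if every output FPGA runs at 100%."""
    return OUTPUT_RATE_TBPS * n_output_fpgas / OUTPUTS_AT_FULL_LOAD


print(utilisation(180))              # 5 of every 8 FPGAs have outputs
print(utilisation(72, 0.9))          # 1 of every 4 FPGAs, original 0.9 s dumps
print(max_output_tbps(TOTAL_FPGAS))  # every FPGA has an output
```

With these assumptions 180 output FPGAs run at 80% and all 288 support 22 Tbps, as on the slide. The 0.9 s case comes out nearer 56% than the quoted 50%, so the slide figure is presumably rounded or uses a slightly different basis.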
Substations
Unified design eases usage of fast memory
Without substations, 4 LFAA channels are processed in parallel
• Need for uniform distribution of load from 4 zoom bands.
With 2048 substations, process 1 LFAA channel at a time - 4X stations
• Same data rate to correlator, same size of input buffer to correlator for 2048 substations
Correlator processes 2048 stations in 16 passes
• Buffer for correlation products increases from 55 MB to 0.88 GB
– Progressive readout during processing could reduce this, but is more complex
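The 16 passes and the 16-fold buffer growth are consistent with tiling the 2048 x 2048 correlation matrix into 512 x 512 blocks. That tiling is an assumed reading of the slide, and the names below are hypothetical.

```python
# Passes and product-buffer size when correlating more inputs than the
# correlator handles in one pass.
# Assumption: each pass correlates one 512 x 512 block of station pairs.
BLOCK_INPUTS = 512
BASE_BUFFER_MB = 55.0   # product buffer for 512 stations, from the slide


def correlator_passes(n_inputs):
    """Number of 512 x 512 blocks needed to cover all input pairs."""
    blocks_per_side = n_inputs // BLOCK_INPUTS
    return blocks_per_side ** 2


def product_buffer_mb(n_inputs):
    """Buffer needed if the products of every pass are held until readout."""
    return BASE_BUFFER_MB * correlator_passes(n_inputs)


print(correlator_passes(2048), "passes")
print(product_buffer_mb(2048) / 1000, "GB")
```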
Conclusion
CSIRO/ASTRON/AUT have designed a flexible FPGA-based system for the Low Correlator
It has sufficient spare resources, I/O and memory to accommodate recent requirement changes and still have spare capacity
• 20 of 48 internal optical connections per FPGA are currently used.
• Further expansion possible – not I/O limited
Zoom mode changes required a major redesign of data ordering but no change to the hardware
Changes to integration time and addition of substations were changes only to some subsystems
Revised Low Correlator and Beamformers
All processing modules identical.
Cross connects deliver part band to each correlator and beamformer.
Interconnects in reverse aggregate the data
e.g. 2 complete PSS beams per Filterbank/Correlator
Now 4 LFAA stations per Gemini
Previously 12, with separate filterbank
[Figure: Gemini interconnect diagram. 128 Gemini boards, each taking input from LFAA and carrying filterbanks, cross connects, correlator, and PSS and PST beamformers, with outputs to SDP, PSS and PST. Labels: "1st group of 16" to "8th group of 16", "6 Gemini per group", "1/8 BW per link", "16 groups of 8 also an option", "All internal links bi-directional".]
Gemini version II
One board to rule them all (functions, that is)
One HMC retained for high-bandwidth external memory
Two DDR4 to be added for high memory depth
Up to 4 12-fibre 25G optics + QSFP, SFP
Change to card rack system. Each card a single Gemini II with all I/O
Water cooling, up to 200 W per card
One FPGA per LRU - Reduced (1/4) I/O per line replaceable unit (LRU)
Pluggable optics, power and water at rear – easy replacement
All data connections optical
SKA1 Overview
SKA1-low stations include Station Beamformer
Central Signal Processing includes Correlator and Pulsar systems
Thank you
CASS
John Bunton
SKA1 CSP System Engineer
t +61 2 9372 4420
e [email protected]
w www.atnf.csiro.au/projects/askap
PO BOX 76 EPPING, 1710, AUSTRALIA
Central Signal Processing
For Mid and Low Central Signal Processing (CSP) consists of
Correlator between all pairs of elements (dish or station)
Tied Array Beamforming coherent sum of signals from all elements
Tied Array beams are processed by
Pulsar Search engine (limited bandwidth)
Pulsar Timing engine
and are used for VLBI
LMC - Monitor of performance and control of all functions (NRC Canada, )
SKA1 MID antennas (South Africa)
Mid Dishes
133 15m offset Gregorian Dishes + 64 MeerKAT dishes
Total of 197 dishes (Distributed between 1-16 subarrays)
maximum baseline, less than 150 km
Receivers for 5 bands
0.35 to 1.050 GHz full bandwidth 0.70 GHz at 8 bit resolution
0.95 to 1.76 GHz full bandwidth 0.81 GHz at 8 bit resolution
1.65 to 3.05 GHz not installed during construction
2.80 to 5.18 GHz not installed during construction
4.6 to 13.8 GHz 2 x 2.5GHz! at 4 bits resolution
16 subarrays
CSP Organisation at PDR 2014 (Correlator)
In December 2014 the Preliminary Design Review (PDR) was held
At that time there were three telescopes: Low, Mid and Survey.
A Physical Implementation Proposal (PIP) was submitted for each
Low, led by Oxford University, with three designs in a single PIP
Uniboard (ASTRON), PowerMX (NRC Canada), Redback (CSIRO)
Survey, led by AUT (NZ), considered many options in a single PIP
Redback (CSIRO), PowerMX (NRC Canada), multicore processor, GPUs and ASIC
Mid, led by NRC Canada, had three separate PIPs
PowerMX (Canada), Redback (CSIRO Australia) & SKARAB (S.A.)
Project management: MDA Canada; Local Monitor and Control: NRC
Pulsars
The Pulsar teams are
Pulsar Timing, led by Swinburne University
CPU/GPU based
Pulsar Search, led by Manchester University
CPU/GPU based with FPGA acceleration/power saving
Pulsar search on limited bandwidth (120 MHz Low, 300 MHz Mid)
They process array beams (coherent, polarisation corrected sums of data from ~400 stations) generated by CBF
A shake up for CSP Correlator/Beamformer
One outcome of the review was that the SKA Office wanted just one design to proceed for each telescope
THEN, as total cost was too high, rebaselining occurred. Decisions:
Stop the Survey Telescope
Delay work on PAFs (led by CSIRO) (critical to Survey)
The politicians stepped in and decided which designs would proceed
NRC Canada continue to lead Mid (PowerMX)
CSIRO to lead Low with ASTRON (Redback/Uniboard) + AUT
This resulted in the South African and UK teams leaving (taking most of the Systems Engineering with them)
Pulsar Timing Beamforming
Mid and Low form 16 tied array beams
Delay aligned, coherent summation of dual polarisation data
Must apply polarisation correction for each value summed
~3M Jones matrices for Low
Time resolution of data 200ns Mid, 2us Low
Basically no significant time ripples allowed, narrow impulse response
For Mid approach taken was fractional time delay filter on wideband signal to do delay correction and achieve narrow impulse response
For Low summation done on narrow band channels, phase only which is cheaper than fractional time delay – followed by synthesis.
New approach to avoid synthesis filterbank being investigated.
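The polarisation-corrected coherent sum described above can be sketched as a 2x2 Jones matrix applied to each dual-polarisation sample before summing over stations. The breakdown of the "~3M" figure as stations x beams x coarse channels is an assumption that happens to give a similar number; delay alignment is taken as already done, and all names are hypothetical.

```python
def apply_jones(jones, voltage):
    """Multiply a 2x2 complex Jones matrix by a dual-polarisation sample (x, y)."""
    (a, b), (c, d) = jones
    x, y = voltage
    return (a * x + b * y, c * x + d * y)


def tied_array_sum(jones_per_station, voltages):
    """Coherent sum over stations of polarisation-corrected samples."""
    sum_x = 0j
    sum_y = 0j
    for jones, voltage in zip(jones_per_station, voltages):
        corrected_x, corrected_y = apply_jones(jones, voltage)
        sum_x += corrected_x
        sum_y += corrected_y
    return sum_x, sum_y


# Assumed breakdown of the Jones matrix count for Low:
# one matrix per station, per tied-array beam, per coarse channel.
n_jones_low = 512 * 16 * 384
print(n_jones_low)

# Toy usage: three stations with identity correction and identical samples.
identity = ((1 + 0j, 0j), (0j, 1 + 0j))
beam = tied_array_sum([identity] * 3, [(1 + 0j, 1j)] * 3)
print(beam)
```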
Pulsar Timing
Pulsar signal is smeared due to dispersion (delay ∝ wavelength²)
Must remove dispersion.
Pulsar timing implements overlap-save convolution of the beamformed time series with the correction filter.
Time series ~1 minute, bandwidth 10 MHz, multi-million point FFTs in GPUs.
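The size of the dispersion smearing can be estimated with the standard cold-plasma dispersion relation, in which delay scales as the inverse square of frequency. The dispersion constant below is the commonly used value; the example band edges and dispersion measure are hypothetical.

```python
# Dispersive delay between two frequencies for a given dispersion measure (DM).
K_DM_MS = 4.148808  # ms GHz^2 per (pc cm^-3), standard dispersion constant


def dispersion_delay_ms(dm, f_low_ghz, f_high_ghz):
    """Extra arrival delay at f_low relative to f_high, in milliseconds."""
    return K_DM_MS * dm * (f_low_ghz ** -2 - f_high_ghz ** -2)


# Hypothetical example: a 10 MHz band from 100 to 110 MHz, DM = 10 pc cm^-3.
smear_ms = dispersion_delay_ms(10.0, 0.100, 0.110)
print(smear_ms, "ms across the band")
```

At Low frequencies the smearing across even a 10 MHz band can reach a large fraction of a second, which is why the coherent dedispersion filter is long and the FFTs run to millions of points.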
Very stringent timing requirements on data supplied
less than 10ns error over a 10 year period.
Pulsar Search (PSS)
Mid to form 1500 power beams at a bandwidth of 300MHz
Low to form 500 power beams at a bandwidth of 120MHz
dishes/stations in compact area used, PSS beams to fill dish/station beam
Coherent summation of dish/station data, phase on narrow channels ~20kHz
Polarisation correction to dish/station data (beam centre) and for Low after beamforming. (~800,000 Jones Matrices for MID)
Search each beam for Pulsars in PSS engine (GPU/CPU 16 racks, Low)
500 dispersion measures
in each dispersion measure acceleration search to 300ms-1
Other Functions
Both Mid and Low require a transient buffer.
in Low allocated to LFAA, 256 GB per station, 150 MHz of data, 2-bit precision
in Mid allocated to CSP 32GB per dish, 300MHz of data, 2-bit precision
Mid produces four VLBI beams.
VLBI possible Europe, America, Australia, Asia.
not sufficient low-frequency telescopes for VLBI with Low.
Hardware
Pulsar Search (PSS) and Pulsar Timing (PST) use common hardware for Mid and Low. CPU/GPU based (FPGA acceleration for PSS).
PST two racks at each site
PSS 16 racks at Low, dissipating ~160 kW; Mid 59 racks @ ~470 kW
wider bandwidth, more beams but lower total delay to search
Each compute node processes 2 beams (TBC)
Correlator and Beamformers are FPGA based
Mid based on PowerMX
Low based on Perentie (development from Redback+Uniboard)
Perentie (CSIRO/ASTRON) for Low
July 2015: final confirmation that CSIRO would lead CSP for Low
Condition of leadership was to collaborate with ASTRON on design.
At that time ASTRON were completing their Uniboard II
CSIRO platform was Redback-3. Both multi-FPGA boards.
For SKA, CSIRO had proposed Redback-5, a board with a single FPGA
After a lengthy downselect process it was decided in November to proceed with the GEMINI board
Four of these to be mounted in a 1U chassis