Increasing the speed of wideband VLBI correlation using GPUs
Arash Roshanineshat
Summer 2017
Overview
• Background
• Correlation
• DiFX
• Hardware
• DiFX with GPU
• Benchmarks
• Conclusion
• Future Work
• Acknowledgements
Background
The Event Horizon Telescope (EHT) is an international collaboration aiming to capture the first image of a black hole by creating a virtual Earth-sized telescope.
EHT Telescope Array:
Correlation
The goal is to make high-resolution maps of radio sources.
After correlation, the result appears to have come from a single antenna whose surface is made up of the actual individual antennas.
Correlation methods:
• XF
  1. Cross multiplication
  2. Frequency transformation
• FX
  1. Frequency transformation
  2. Cross multiplication
[Figure: FX correlator diagram with input voltages V(t), filter banks, and cross multipliers]
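The two orderings can be compared on a toy example. The sketch below is a pure-Python illustration, not DiFX code: it uses a naive DFT and circular cross-correlation on real-valued samples, and the function names (`dft`, `fx_correlate`, `xf_correlate`) and the sample values are assumptions made for this example. It checks that transforming first and then multiplying (FX) gives the same cross-power spectrum as correlating first and then transforming (XF).

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for a toy example."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fx_correlate(v1, v2):
    """FX: transform each stream to frequency, then cross multiply per channel."""
    f1, f2 = dft(v1), dft(v2)
    return [a * b.conjugate() for a, b in zip(f1, f2)]

def xf_correlate(v1, v2):
    """XF: cross multiply over all circular lags, then transform the lag function.

    Real-valued inputs are assumed, so no conjugate is needed on v2.
    """
    n = len(v1)
    lags = [sum(v1[(t + lag) % n] * v2[t] for t in range(n)) for lag in range(n)]
    return dft(lags)

# Two short, made-up real-valued sample streams (illustrative only).
v1 = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
v2 = [0.5, -1.0, 2.0, 1.0, -0.5, 0.0, 1.5, 2.5]

fx = fx_correlate(v1, v2)
xf = xf_correlate(v1, v2)
max_diff = max(abs(a - b) for a, b in zip(fx, xf))
print(f"largest FX/XF difference: {max_diff:.2e}")
```

The FX ordering needs only one multiplication per frequency channel after the transforms, whereas this XF sketch needs one sum per lag before its transform.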
Correlation platforms:
• Application-Specific Integrated Circuits (ASICs)
• Field-Programmable Gate Arrays (FPGAs)
In 2007, Adam Deller introduced the Distributed FX (DiFX) software package.
DiFX
• The core is programmed in C and C++
• Suitable for generic multi-processor systems
• Supports modern hard-drive recording systems naturally
• Easy to configure
[Figure: DiFX architecture with source data, DataStream processes, Core processes, and the FX Manager]
EHT data is recorded at 4096 megasamples per second.
The DiFX correlator does not process it in real time.
Can we speed up the process?
2010, Andrew Woods, “Accelerating Software Radio Astronomy FX Correlation with GPU and FPGA Co-processors”
He simplified the DiFX code and used only the core.
He concluded that co-processors such as GPUs and FPGAs speed up the process.
His work focused on the X-engine of DiFX.
To study the effect of GPU co-processors on the full DiFX package, we set up a cluster from scratch.
Hardware
• 4 machines without GPUs
• 20 CPUs each
• 3.4 GHz max frequency
• 1 GPU machine
• 16 CPUs
• 3.0 GHz max frequency
• 4 x 1080 Ti GPUs
[Figure: star-shaped cluster network. Labels: Loading Data, 40 GbE, LAN]
Dependencies
Dependencies of DiFX:
• MPI libraries (OpenMPI)
  This library provides message-passing functions to distribute data across the cluster
• Haystack Observatory Postprocessing System (HOPS)
  This library provides the “fourfit” program, which is used to plot the output data
• Intel® Integrated Performance Primitives (Intel® IPP)
  A highly optimized vector library for Intel CPUs
Additional dependency:
• CUDA driver
  This makes it possible to use Nvidia GPUs
Software
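DiFX operation block diagram:
• Config files: .v2d and .vex
• $ vex2difx produces .input and .calc
• $ calcif2 processes .calc
• $ mpifxcorr (the core) reads .input and the VDIF files and writes the output files
• $ difx2mark4 converts the output files to Mark4 data files
• $ fourfit produces the plots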
Output of DiFX on CPU:
DiFX with GPU
Vector operations and FFT libraries are defined in the file architecture.h.in.
The following architectures are defined:
• INTEL
• GENERIC
- INTEL architecture uses:
• Intel Integrated Performance Primitives (IPP) for the F-engine
• Intel Integrated Performance Primitives (IPP) for the X-engine
- GENERIC architecture uses:
• FFTW for the F-engine
• Standard C++ for the X-engine
FFTW → cuFFTW
CUDA makes porting the code much easier than before: cuFFTW provides an FFTW-compatible interface, so little modification of the code is needed.
X-engine:
vectorMul(src, dst, length) {
    for i = 0 to length - 1:
        dst[i] = src[i] * src[i]
}
This is an embarrassingly parallel problem: each output element is computed independently of the others.
F-engine:
All FFT functions use the cuFFTW library.
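The reason the X-engine multiplication parallelizes so easily can be shown with a small sketch. This is plain Python rather than CUDA or DiFX code. The two-input signature, the function names, and the `n_workers` parameter are assumptions made for illustration, since the pseudocode above takes a single source vector. Each chunk of indices is handled without reading any result from another chunk, which is what lets a GPU assign elements to separate threads.

```python
def vector_mul(src1, src2):
    """Serial element-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(src1, src2)]

def vector_mul_chunked(src1, src2, n_workers):
    """Element-wise product computed chunk by chunk.

    The chunks never depend on each other, so each pass of the outer loop
    could run on its own worker (or GPU thread block) in any order.
    """
    n = len(src1)
    chunk = (n + n_workers - 1) // n_workers  # ceiling division
    out = [None] * n
    for w in range(n_workers):
        lo, hi = w * chunk, min((w + 1) * chunk, n)
        for i in range(lo, hi):
            out[i] = src1[i] * src2[i]
    return out

# Made-up complex samples standing in for two frequency-domain streams.
a = [complex(i, -i) for i in range(10)]
b = [complex(2 * i, 1) for i in range(10)]

serial = vector_mul(a, b)
chunked = vector_mul_chunked(a, b, n_workers=4)
print(serial == chunked)
```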
• The GPU output differs only very slightly from the CPU output; floating-point operations are not guaranteed to give identical results.
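One common reason for such small differences is that floating-point addition is not associative, so an implementation that combines terms in a different order (as parallel hardware or a different FFT library may do) can round differently. The snippet below is a generic illustration with made-up values, not DiFX output.

```python
values = [0.1, 0.2, 0.3]

# Same three numbers, two different summation orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))
```

The two sums are not bit-identical, yet they agree to roughly 16 significant digits, which is the scale of difference one would expect between otherwise equivalent pipelines.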
Benchmarks
Benchmark of DiFX: time (seconds) by number of machines
Number of machines     1    2    3    4
Only CPU machines    396  198  130   90
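The CPU-only timings can be turned into a speedup and a parallel efficiency per machine count. The sketch below uses the usual definitions (speedup = single-machine time / n-machine time, efficiency = speedup / n); the function name `scaling` is an assumption for this example, and the numbers are the CPU-only benchmark figures.

```python
# CPU-only benchmark: number of machines -> run time in seconds.
cpu_only_seconds = {1: 396, 2: 198, 3: 130, 4: 90}

def scaling(times):
    """Return {n: (speedup, efficiency)} relative to the single-machine run."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}

for n, (speedup, efficiency) in scaling(cpu_only_seconds).items():
    print(f"{n} machine(s): speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```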
Replace one of the CPU machines with the GPU machine.
The GPU is off!
Benchmark of DiFX: time (seconds) by number of machines
Number of machines     1    2    3    4
Only CPU machines    396  198  130   90
GPU Disabled         290  138   94   71
Turn on the GPU!
Benchmark of DiFX: time (seconds) by number of machines
Number of machines     1    2    3    4
Only CPU machines    396  198  130   90
GPU Disabled         290  138   94   71
GPU Enabled          258  119   80   63
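The relative gains can be read off the benchmark figures with a short calculation. The sketch below computes the percentage reduction in run time for each machine count; the helper name `percent_reduction` is an assumption for this example, and the data are the three rows of the benchmark.

```python
# Run times in seconds for 1, 2, 3 and 4 machines.
times = {
    "Only CPU machines": [396, 198, 130, 90],
    "GPU Disabled": [290, 138, 94, 71],
    "GPU Enabled": [258, 119, 80, 63],
}

def percent_reduction(baseline, improved):
    """Percentage by which `improved` is shorter than `baseline`, per column."""
    return [100.0 * (b - i) / b for b, i in zip(baseline, improved)]

vs_gpu_off = percent_reduction(times["GPU Disabled"], times["GPU Enabled"])
vs_cpu_only = percent_reduction(times["Only CPU machines"], times["GPU Enabled"])

for n, (off, cpu) in enumerate(zip(vs_gpu_off, vs_cpu_only), start=1):
    print(f"{n} machine(s): {off:.1f}% shorter than GPU off, "
          f"{cpu:.1f}% shorter than CPU only")
```

Comparing against the "GPU Disabled" row isolates the effect of the GPU itself, while comparing against the CPU-only row also includes the difference between the two kinds of machine.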
Conclusion
• The GPU makes the process about 20% to 25% faster
• From a financial perspective, a GPU machine is cheaper yet performs better
Future Work
• Optimize the GPU processing
• Study alternative libraries, such as TensorFlow and Thrust, to support more co-processors
Acknowledgments
Jonathan Weintroub
Shep Doeleman
Andre Young
Lindy Blackburn
Rurik Primiani
Mark Peryer
Geoff Crew
DiFX Community
Questions?