Page 1: My Presentation SAO3tauceti.caltech.edu/casper-workshop-2017/slides/7_roshanineshat.pdf · Title: Microsoft PowerPoint - My_Presentation_SAO3.pptx Author: arash Created Date: 8/14/2017

Increasing the speed of wideband VLBI correlation using GPUs

Arash Roshanineshat

Summer 2017


Overview

• Background
• Correlation
• DiFX
• Hardware
• DiFX with GPU
• Benchmarks
• Conclusion
• Future Work
• Acknowledgements


Background


The Event Horizon Telescope (EHT) is an international collaboration aiming to capture the first image of a black hole by creating a virtual Earth-sized telescope.

EHT Telescope Array:


Correlation


The result appears to have come from a single antenna whose surface is made of the actual individual antennas.


The goal is to make high-resolution maps of radio sources.


Correlation


Correlation methods:

• XF
  1. Cross multiplication
  2. Frequency transformation
• FX
  1. Frequency transformation
  2. Cross multiplication

[Figure: FX correlator diagram with input voltages V(t), filter banks, and cross multipliers (X)]
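The two orderings can be compared on a toy signal. The following is a minimal Python sketch, not DiFX code: it assumes circular correlation over a short block, uses a naive DFT as a stand-in for an FFT library, and all function names are illustrative. Under those assumptions the correlation theorem says both orderings yield the same cross-power spectrum, which the sketch checks numerically.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def fx_correlate(v1, v2):
    """FX: transform each signal first, then cross-multiply channel by channel."""
    s1, s2 = dft(v1), dft(v2)
    return [a * b.conjugate() for a, b in zip(s1, s2)]

def xf_correlate(v1, v2):
    """XF: circular cross-correlation over all lags first, then transform."""
    n = len(v1)
    lags = [sum(v1[(t + k) % n] * complex(v2[t]).conjugate() for t in range(n))
            for k in range(n)]
    return dft(lags)

# Hypothetical test data: v2 is v1 circularly delayed by one sample.
v1 = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.0]
v2 = v1[-1:] + v1[:-1]

fx = fx_correlate(v1, v2)
xf = xf_correlate(v1, v2)
max_diff = max(abs(a - b) for a, b in zip(fx, xf))
print(max_diff)  # difference between the two orderings, at rounding-error level
```

In this sketch the XF path computes N lags of N products each before a single transform, while the FX path needs two transforms and only N multiplications.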


Correlation


Correlation platforms:

• Application-Specific Integrated Circuits (ASICs)
• Field-Programmable Gate Arrays (FPGAs)

In 2007, Adam Deller introduced the Distributed FX (DiFX) software package.


DiFX


• The core is programmed in C and C++
• Suitable for generic multi-processor systems
• Supports modern hard-drive recording systems naturally
• Easy to configure

[Figure: DiFX block diagram. Labels: Source data; DataStream 1, DataStream 2, DataStream N; Core 1, Core 2, DataStream M; FX Manager]


DiFX


EHT data is recorded at 4096 mega-samples per second

The DiFX correlator does not process it in real time

Can we speed up the process?


DiFX


2010, Andrew Woods, “Accelerating Software Radio Astronomy FX Correlation with GPU and FPGA Co-processors”

He simplified the DiFX code and used only the core.

He concluded that co-processors such as GPUs and FPGAs speed up the process.

He focused on the X-engine of DiFX.

To study the effect of GPU co-processors on the full DiFX package, we set up a cluster from scratch.


Hardware


• 4 machines without GPU
• 20 CPUs each
• 3.4 GHz max frequency


Hardware

• 1 GPU machine
• 16 CPUs
• 3.0 GHz max frequency
• 4 x 1080 Ti GPUs


Hardware

[Figure: star-shaped network used for loading data; link labels: 40 GbE, LAN]


Dependencies

Dependencies of DiFX:

• MPI libraries (OpenMPI): provide the message-passing functions used to distribute data across the cluster.
• Haystack Observatory Postprocessing System (HOPS): provides the “fourfit” program used to plot the output data.
• Intel® Integrated Performance Primitives (Intel® IPP): a highly optimized vector library for Intel CPUs.

Additional dependency:

• CUDA driver: makes it possible to use Nvidia GPUs.


Software

DiFX operation block diagram:

[Figure: block diagram. Labels: config files (.v2d, .vex); $ vex2difx; .input; .calc; $ calcif2; VDIF files; $ mpifxcorr (core); output files; $ difx2mark4; Mark4 data files; $ fourfit; plots]


Software

Output of DiFX on CPU:


DiFX with GPU

Vector operations and FFT libraries are defined in the file architecture.h.in.

The following architectures are defined:

• INTEL
• GENERIC


DiFX with GPU

- INTEL mode uses:
• Intel Integrated Performance Primitives (IPP) for the F-engine
• Intel Integrated Performance Primitives (IPP) for the X-engine

- GENERIC architecture uses:
• FFTW for the F-engine
• Standard C++ for the X-engine

FFTW → cuFFTW

CUDA makes porting the code much easier than before, requiring only minor code changes.


DiFX with GPU

X-engine:

vectorMul(src, dst, length) {
    // each output element depends only on the same input index
    for i = 0 to length - 1:
        dst[i] = src[i] * src[i]
}

Embarrassingly parallel problem
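Why an element-wise vector multiply counts as embarrassingly parallel can be sketched in a few lines of Python. This is an illustration only, not DiFX or CUDA code: the two-input signature, the chunk count, and the use of a thread pool as a stand-in for GPU threads are all assumptions. Because no output element depends on any other, the index range can be cut into chunks, processed independently, and concatenated to give the same result as the serial loop.

```python
from concurrent.futures import ThreadPoolExecutor

def vector_mul(a, b):
    """Serial element-wise product; output i depends only on inputs at index i."""
    return [x * y for x, y in zip(a, b)]

def vector_mul_chunked(a, b, n_chunks=4):
    """Same product, computed as independent chunks (a stand-in for GPU threads)."""
    n = len(a)
    step = max(1, -(-n // n_chunks))  # ceiling division
    ranges = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = pool.map(lambda r: vector_mul(a[r[0]:r[1]], b[r[0]:r[1]]), ranges)
        # pool.map preserves chunk order, so concatenation restores index order
        return [v for part in parts for v in part]

# Hypothetical input data
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [0.5, 0.5, 2.0, 2.0, 1.0]

serial = vector_mul(a, b)
chunked = vector_mul_chunked(a, b, n_chunks=2)
print(serial == chunked)
```

The sketch demonstrates independence of the work items, not speed; Python threads do not run this arithmetic concurrently the way GPU threads would.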

F-engine:

All FFT functions use the cuFFTW library


DiFX with GPU


• The difference is very small. Floating-point operations are not guaranteed to give identical results.
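One common reason two floating-point implementations disagree in the last digits is that they perform the same additions in a different order. Whether this is the exact cause of the difference seen here is an assumption; the Python sketch below only illustrates the general effect and the usual remedy of comparing with a tolerance instead of testing for exact equality.

```python
import math

# The same three numbers, added in two different orders
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

bit_identical = left_to_right == right_to_left
close_enough = math.isclose(left_to_right, right_to_left, rel_tol=1e-12)

print(bit_identical)  # False: the results differ in the last bit
print(close_enough)   # True: they agree within a relative tolerance
```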


Benchmarks

Benchmark of DiFX, time in seconds (columns 1 to 4 follow the chart's horizontal axis):

Configuration           1     2     3     4
Only CPU machines     396   198   130    90

Legend: CPU machine
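The CPU-only timings can be turned into speedup and parallel-efficiency figures. The Python sketch below assumes that the chart's horizontal-axis values 1 to 4 are the number of machines used, which the slide does not state explicitly; the function name is illustrative.

```python
# Times in seconds from the chart; keys are assumed to be machine counts.
cpu_only = {1: 396, 2: 198, 3: 130, 4: 90}

def scaling(times):
    """Return {n: (speedup, efficiency)} relative to the single-machine time."""
    t1 = times[1]
    return {n: (t1 / t, t1 / t / n) for n, t in times.items()}

for n, (speedup, efficiency) in scaling(cpu_only).items():
    print(f"{n} machine(s): speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency near 1.0 means the run time falls roughly in proportion to the number of machines.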


Benchmarks

Replace one of the CPU machines with the GPU machine.

The GPU is off!


Benchmarks

Benchmark of DiFX, time in seconds (columns 1 to 4 follow the chart's horizontal axis):

Configuration           1     2     3     4
Only CPU machines     396   198   130    90
GPU disabled          290   138    94    71

Legend: CPU machine; GPU machine, GPU off


Benchmarks

Turn on the GPU!

Benchmarks

Benchmark of DiFX, time in seconds (columns 1 to 4 follow the chart's horizontal axis):

Configuration           1     2     3     4
Only CPU machines     396   198   130    90
GPU disabled          290   138    94    71
GPU enabled           258   119    80    63

Legend: CPU machine; GPU machine, GPU off; GPU machine, GPU on
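One way to read these timings is as the relative time reduction between two rows. The slides do not say which rows should be compared, so the two comparisons in this Python sketch are assumptions, and the helper name is illustrative.

```python
# Times in seconds from the chart, in column order 1 to 4
only_cpu     = [396, 198, 130, 90]
gpu_disabled = [290, 138, 94, 71]
gpu_enabled  = [258, 119, 80, 63]

def reduction_percent(before, after):
    """Percentage by which each 'after' time is shorter than the 'before' time."""
    return [round(100 * (b - a) / b, 1) for b, a in zip(before, after)]

gpu_on_vs_off = reduction_percent(gpu_disabled, gpu_enabled)
gpu_on_vs_cpu_only = reduction_percent(only_cpu, gpu_enabled)

print(gpu_on_vs_off)       # [11.0, 13.8, 14.9, 11.3]
print(gpu_on_vs_cpu_only)  # [34.8, 39.9, 38.5, 30.0]
```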


Conclusion

• GPUs make the process about 20% to 25% faster
• From a financial perspective, a GPU machine is cheaper yet performs better


Future Work

• Optimize the GPU process
• Study alternative libraries, such as TensorFlow and Thrust, to support more co-processors


Acknowledgments


Jonathan Weintroub

Shep Doeleman

Andre Young

Lindy Blackburn

Rurik Primiani

Mark Peryer

Geoff Crew

DiFX Community


Questions?
