Distributed Processing of Future
Radio Astronomical Observations
Ger van Diepen
ASTRON, Dwingeloo
ATNF, Sydney
ADASS2007; GvD 24-9-2007
Contents
Introduction
Data Distribution
Architecture
Performance issues
Current status and future work
Data Volume in future telescopes
LOFAR
37 stations (666 baselines) grows to 63 stations (1953 baselines)
128 subbands of 256 channels (32768 channels)
666*32768*4*8 bytes/sec = 700 MByte/sec
5 hour observation gives 12 TBytes
ASKAP (spectral line observation)
45 stations (990 baselines)
32 beams, 16384 channels each
990*32*16384*4*8 bytes/10 sec = 1.6 GByte/sec
12 hour observation gives 72 TBytes
One day of observing > the entire world radio archive
ASKAP continuum: 280 GBytes (64 channels)
MeerKAT similar to ASKAP
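The quoted rates follow directly from the slide's own numbers (4 polarisations, 8 bytes per complex visibility); a short sketch verifying the arithmetic:

```python
# Sketch: verify the back-of-the-envelope data rates quoted above,
# assuming 4 polarisations and 8 bytes per complex visibility as on the slide.

def vis_rate_bytes(baselines, channels, pols=4, bytes_per_vis=8, dump_sec=1):
    """Visibility data rate in bytes per second."""
    return baselines * channels * pols * bytes_per_vis / dump_sec

# LOFAR: 666 baselines, 128 subbands x 256 channels, 1 s dumps
lofar = vis_rate_bytes(666, 128 * 256)       # ~700 MByte/s
lofar_5h = lofar * 5 * 3600                  # ~12 TBytes

# ASKAP spectral line: 990 baselines, 32 beams x 16384 channels, 10 s dumps
askap = vis_rate_bytes(990, 32 * 16384, dump_sec=10)  # ~1.6 GByte/s
askap_12h = askap * 12 * 3600                         # ~72 TBytes

print(f"LOFAR: {lofar/1e6:.0f} MB/s, {lofar_5h/1e12:.1f} TB per 5 h")
print(f"ASKAP: {askap/1e9:.2f} GB/s, {askap_12h/1e12:.1f} TB per 12 h")
```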
Key Issues

                 Traditionally              Future
Size             Few GBytes                 Several TBytes
Processing time  Weeks-months               < 1 day
Mode             Interactive                Automatic pipeline
Archived?        Always                     Some data
Where            Desktop                    Dedicated machine
IO               Many passes through data   Only few passes possible
Package used     AIPS, Miriad, Casa, ...    ?
Data Distribution
Visibility data need to be stored in a distributed way
Limited use for parallel IO
Too much data to share across the network
Bring processes to the data,
NOT data to processes
Data DistributionData Distribution
Distribution must be efficient for all purposes Distribution must be efficient for all purposes (flagging, calibration, imaging, (flagging, calibration, imaging,
deconvolution)deconvolution)
Process locally where possible and exchange as Process locally where possible and exchange as few data as possiblefew data as possible
Loss of a data partition should not be too painfulLoss of a data partition should not be too painful
Spectral partitioning seems best candidateSpectral partitioning seems best candidate
Architecture
Connection types:
Socket
MPI
Memory
DB
Data Processing
A series of steps has to be performed on the data
(solve, subtract, correct, image, ...)
Master gets steps from a control process (e.g. Python)
If possible, a step is sent directly to the appropriate workers
Some steps (e.g. solve) need iteration:
Substeps are sent to workers
Replies are received and forwarded to other workers
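The dispatch pattern above can be sketched as a toy master-worker loop; the class and step names here are illustrative, not the actual LOFAR/CONRAD API:

```python
# Hypothetical sketch of the master-worker step dispatch: the master
# receives a list of steps and sends each to all appropriate workers.

class Worker:
    def __init__(self, name):
        self.name = name

    def execute(self, step):
        # A real worker would run the step on its local data partition.
        return f"{self.name}: done {step}"

class Master:
    def __init__(self, workers):
        self.workers = workers

    def run(self, steps):
        results = []
        for step in steps:
            # Simple steps go directly to the workers; iterative steps
            # (e.g. solve) would loop over substeps here instead.
            replies = [w.execute(step) for w in self.workers]
            results.append(replies)
        return results

master = Master([Worker("node1"), Worker("node2")])
out = master.run(["subtract", "correct", "image"])
print(out[0])   # replies from all workers for the first step
```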
Calibration Processing
Solving non-linearly:

do {
    1: get normal equations
    2: send equations to solver
    3: get solution
    4: send solution
} while (!converged)
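A minimal sketch of this loop for a toy model v = g*m, with two workers each holding part of the data (the data and function names are hypothetical). Each iteration the workers form normal equations for the gain increment, the solver combines and solves them, and the solution is sent back:

```python
# Toy distributed solve: workers accumulate normal equations locally,
# the solver combines the small matrices (here scalars) and solves.

def normal_equations(part, g):
    """One worker: accumulate normal equations sum(d*d), sum(d*r) for the
    increment dg, where r = v - g*m is the residual and d = dr/dg = m."""
    a = sum(m * m for m, v in part)
    b = sum(m * (v - g * m) for m, v in part)
    return a, b

# Two data partitions of (model, observed) pairs; the true gain is 2.5.
parts = [[(1.0, 2.5), (2.0, 5.0)], [(3.0, 7.5), (4.0, 10.0)]]

g = 1.0                       # initial guess
converged = False
while not converged:          # the do { ... } while (!converged) loop
    eqs = [normal_equations(p, g) for p in parts]  # 1: get normal equations
    a = sum(e[0] for e in eqs)                     # 2: send to solver,
    b = sum(e[1] for e in eqs)                     #    which combines them
    dg = b / a                                     # 3: get solution
    g += dg                                        # 4: send solution back
    converged = abs(dg) < 1e-8

print(f"solved gain: {g:.3f}")  # -> solved gain: 2.500
```

Only the accumulated normal equations cross the network, not the visibilities themselves, which is why this scales to distributed data.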
Performance: IO
Distributed IO, yet 24 minutes to read 72 TBytes once
IO should be asynchronous to avoid an idle CPU
Deployment decision what storage to use:
Local disks (RAID)
SAN or NAS
Sufficient IO bandwidth to all machines is needed
Calibration and imaging are run repeatedly, so the data will be accessed multiple times
BUT operate on chunks of data (work domains) to keep data in memory while performing many steps on them
Possibly store in multiple resolutions
Tiling for efficient IO under different access patterns
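The work-domain idea can be sketched as iterating over time/frequency chunks and amortising one read over all processing steps; the names and chunk sizes are illustrative only:

```python
# Sketch: iterate over work domains (chunks of time and frequency) so that
# many steps run on a chunk while it is still in memory, instead of
# re-reading the whole data set once per step.

def work_domains(n_times, n_chans, t_chunk, c_chunk):
    """Yield (time-slice, channel-slice) work domains covering the data."""
    for t0 in range(0, n_times, t_chunk):
        for c0 in range(0, n_chans, c_chunk):
            yield (slice(t0, min(t0 + t_chunk, n_times)),
                   slice(c0, min(c0 + c_chunk, n_chans)))

steps = ["flag", "solve", "subtract", "correct"]
domains = list(work_domains(n_times=100, n_chans=256, t_chunk=25, c_chunk=64))
for dom in domains:
    # one read of this chunk ...
    for step in steps:
        pass                 # ... amortised over all steps on it
print(len(domains))          # -> 16 reads instead of 16 * len(steps)
```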
Performance: Network
Process locally where possible
Send as little data as possible
(normal equations are small matrices)
Overlap operations:
e.g. form normal equations for the next work domain while the solver solves the current one
Performance: CPU
Parallelisation (OpenMP, ...)
Vectorisation (SSE instructions)
Keep data in the CPU cache as much as possible, so use smallish data arrays
Optimal layout of data structures
Keep intermediate results if they do not change
Reduce the number of operations by reducing the resolution
Current status
The basic framework has been implemented and is used in LOFAR and CONRAD calibration and imaging
Can be deployed on a cluster or supercomputer (or a desktop)
Tested on a SUN cluster, Cray XT3, IBM PC cluster, MacBook
A Resource DB describes the cluster layout and data partitioning.
Hence the master can derive which processor should process which part of the data.
Parallel processed image (Tim Cornwell)
Runs on ATNF's Sun cluster "minicp": 8 nodes
Each node = 2 dual-core Opterons, 1 TB, 12 GB
Also on the Cray XT3 at WASP (Perth, WA)
Data simulated using AIPS++
Imaged using the CONRAD synthesis software
New software using casacore
Running under OpenMPI
Long-integration continuum image:
8 hours integration
128 channels over 300 MHz
Single beam
Used 1, 2, 4, 8, 16 processing nodes for calculation of residual images
Scales well
Must scale up a hundredfold, or more...
Future work
More work needed on robustness:
Discard a partition when a processor or disk fails
Move to another processor if possible (e.g. if replicated)
Store data in multiple resolutions?
Use master-worker in flagging and deconvolution
A worker can use accelerators like GPGPU, FPGA, Cell
(maybe through RapidMind)
A worker can be a master itself to make use of a BG/L in a PC cluster
Future work
Extend to image processing (a few TBytes):
Source finding
Analysis
Display
VO access?
Thank you
Joint work with people at ASTRON, ATNF, and KAT
More detail in the next talk about LOFAR calibration
See the poster about CONRAD software
Ger van Diepen
diepen@astron.nl