
Are we there yet? Experiences developing and commissioning the HPC System for ASKAP Telescope

CSIRO ASTRONOMY AND SPACE SCIENCE

Juan Carlos (JC) Guzman | Head of ATNF Software and Computing | Perth HPC Advisory Council Conference, 31 July – 1 August 2017


We acknowledge the Wajarri Yamatji people as the traditional owners of the Observatory site and the Noongar people as the traditional owners of the land where this meeting is being held.

Outline
• Overview of ASKAP
• ASKAP Computing System: history, challenges and future
• Lessons learned


Australian SKA Pathfinder: overview
• 36-antenna multi-beam interferometer in a radio-quiet zone
• Frequency range: 700 MHz to 1.8 GHz; baselines from 23 m to 6 km
• Survey instrument, pushing a wide instantaneous field of view
• 2nd-generation phased-array feed (PAF) receiver + flexible beamformer
• 3-axis mount (the whole antenna can rotate): can fix or rotate the beam pattern
• Automatic processing eventually: necessary for the full instrument
• Early science with 12 antennas started in October 2016
• Most reported science was with BETA (6-antenna array with Mk I PAF)
• 18 antennas have already been integrated into the array
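As a rough illustration of what the 23 m to 6 km baselines mean for image sharpness, the usual rule of thumb θ ≈ λ/B gives the angular resolution at the two band edges. This is a sketch only: it ignores weighting and synthesized-beam shape (assumptions not stated on the slide), and the function name is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def resolution_arcsec(freq_hz: float, baseline_m: float) -> float:
    """Rule-of-thumb angular resolution theta ~ lambda / B, in arcseconds."""
    wavelength_m = SPEED_OF_LIGHT / freq_hz
    theta_rad = wavelength_m / baseline_m
    return math.degrees(theta_rad) * 3600.0


if __name__ == "__main__":
    # Band edges and longest baseline quoted on the overview slide
    for freq_hz in (700e6, 1.8e9):
        res = resolution_arcsec(freq_hz, 6000.0)
        print(f"{freq_hz / 1e6:.0f} MHz, 6 km: ~{res:.1f} arcsec")
```

This gives roughly 15 arcsec at 700 MHz and 6 arcsec at 1.8 GHz on the longest baseline.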

Phased Array Feed: 188 single-polarisation receivers

Wide field of view

Murchison Radio Observatory (MRO)
• 126 km²
• 32 km roads and tracks
• 16000 km optic fibre
• >8000 fibres
• Control building
• Power station
• Under construction

MRO power station

ASKAP: system architecture

[Figure: system architecture diagram; x36 antennas; combined data rate ~21 Tb/s; ~2.5 GB/s]


Indirect imaging of the sky

Synthesis telescopes measure correlations between received voltages for each pair of antennas.

Three different types of images are required:
• Continuum image: very accurate image; needs multiple iterations; hard to parallelize
• Spectral line cube: 16200 independent images, each at a slightly different frequency; embarrassingly parallel task; one iteration may be sufficient
• Transient image: very coarse image; made every 5 seconds

We need to make images in near real time, ideally all three types in parallel.
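Because each spectral-line plane is independent, the cube can be split across processes with no communication at all. A minimal sketch of a contiguous block decomposition of the 16200 channels over a hypothetical number of ranks; ASKAPsoft is C++/MPI, but this is not its actual work distribution, and `image_channel` is a hypothetical placeholder for the real per-channel imaging.

```python
N_CHANNELS = 16200  # independent spectral-line images quoted on the slide


def channel_range(rank: int, n_ranks: int, n_channels: int) -> range:
    """Contiguous block of channels owned by `rank` (0-based).

    The first `n_channels % n_ranks` ranks get one extra channel, so the
    load differs by at most one channel between ranks.
    """
    base, extra = divmod(n_channels, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def image_channel(channel: int) -> None:
    """Hypothetical placeholder: a real imager would grid, FFT and
    deconvolve the visibilities of this one channel."""


def run_rank(rank: int, n_ranks: int, n_channels: int = N_CHANNELS) -> int:
    """Image every channel owned by `rank`; no data is exchanged with other ranks."""
    count = 0
    for channel in channel_range(rank, n_ranks, n_channels):
        image_channel(channel)
        count += 1
    return count


if __name__ == "__main__":
    n_ranks = 300  # hypothetical job size
    counts = [run_rank(r, n_ranks) for r in range(n_ranks)]
    print(sum(counts), min(counts), max(counts))
```

The continuum image has no such shortcut: its deconvolution iterates over all channels together, which is why the slide calls it hard to parallelize.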

To first order (narrow field of view), the measurement equation is a 2D Fourier transform.
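For reference, the standard narrow-field form of the measurement equation (textbook notation, assumed here since the slide does not spell it out) relates the visibility V measured on a baseline with coordinates (u, v), in wavelengths, to the sky brightness I(l, m) seen through the antenna primary beam A(l, m), where (l, m) are direction cosines:

```latex
V(u, v) = \iint A(l, m)\, I(l, m)\, e^{-2\pi i (u l + v m)} \,\mathrm{d}l \,\mathrm{d}m
```

Imaging inverts this relation: the irregularly sampled V(u, v) are interpolated onto a regular grid (gridding) so that an FFT can be used, and predicting model visibilities from a sky model needs the reverse interpolation (degridding).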


ASKAP Key Computing Requirements
• 90% of the computational cost is in gridding/degridding
  ◦ https://www.skatelescope.org/uploaded/59116_132_Memo_Humphreys.pdf
  ◦ Developed stand-alone benchmarking gridding code for testing on multiple platforms
• 10,000 cores (80% efficiency), 4 GB/core, 200 TFLOP/s peak
• Data ingest from correlator ~2.8 GB/s = ~10 TB/h (raw visibilities)
  ◦ Processing of raw visibilities (calibration & imaging) needs to keep up
  ◦ Cannot afford to keep raw visibilities
• Multiple science products after observations: ~5 PB/year
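Since ~90% of the cost is attributed to gridding/degridding, a minimal pure-Python sketch of convolutional gridding and its adjoint may show where that cost comes from: every visibility touches (2·support + 1)² grid cells. It assumes a toy Gaussian kernel, a square grid centred at index n // 2, and hypothetical function names; it is not the ASKAPsoft implementation or the benchmark code referenced above, which work in C++ on far larger data volumes with more elaborate kernels.

```python
import math


def taper(offset: float, support: int) -> float:
    """Toy Gaussian kernel (assumption: production imagers use prolate
    spheroidal or W-/AW-projection kernels instead)."""
    return math.exp(-(offset / (0.5 * support)) ** 2)


def _taps(u: float, v: float, n: int, cell: float, support: int):
    """Yield (row, col, weight) for every grid cell touched by one sample."""
    gu = u / cell + n // 2          # continuous grid coordinates
    gv = v / cell + n // 2
    iu, iv = round(gu), round(gv)   # nearest grid cell
    for dv in range(-support, support + 1):
        for du in range(-support, support + 1):
            col, row = iu + du, iv + dv
            if 0 <= col < n and 0 <= row < n:
                yield row, col, taper(col - gu, support) * taper(row - gv, support)


def grid_visibilities(vis, n: int, cell: float, support: int = 3):
    """Convolutional gridding of (u, v, value) samples onto an n x n grid.

    u and v are in wavelengths; cell is the grid spacing in wavelengths.
    Work is O(len(vis) * (2 * support + 1) ** 2).
    """
    grid = [[0j] * n for _ in range(n)]
    for u, v, value in vis:
        taps = list(_taps(u, v, n, cell, support))
        total = sum(w for _, _, w in taps)
        for row, col, w in taps:
            # Normalised so each sample deposits exactly `value` in total
            grid[row][col] += value * w / total
    return grid


def degrid(grid, u: float, v: float, cell: float, support: int = 3) -> complex:
    """Adjoint step: interpolate a model visibility at (u, v) from the grid."""
    n = len(grid)
    acc, total = 0j, 0.0
    for row, col, w in _taps(u, v, n, cell, support):
        acc += grid[row][col] * w
        total += w
    return acc / total if total else 0j


if __name__ == "__main__":
    samples = [(10.0, -5.0, 1 + 0j), (-3.2, 7.9, 0.5 - 0.5j), (0.4, 0.0, 2 + 1j)]
    uv_grid = grid_visibilities(samples, n=64, cell=1.0)
    print(sum(sum(row) for row in uv_grid))  # sum of sample values, up to rounding
    print(degrid(uv_grid, 10.0, -5.0, cell=1.0))
```

An inverse 2D FFT of the grid (for example with `numpy.fft.ifft2`) would then give the dirty image; that step is left out to keep the sketch dependency-free.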


The Pawsey High Performance Computing Centre for SKA Science
• AUD$80M supercomputing centre
• 25% of resources to support operational requirements of storage and processing of data from ASKAP and MWA
• Construction completed April 2013

ASKAP Central Processor @ Pawsey Centre

Ingest cluster
• 16 nodes, 2 sockets per node
• 8-core CPUs, 64 GB of RAM per node

Central Processor (Galaxy): 472 x Cray XC30 compute nodes
• 200 TFLOP/s peak
• 64 GB of RAM per node
• 2 sockets per node, 10 cores each

Shared storage: Cray Sonexion Lustre storage
• 1.3 PB usable
• 480 x 4 TB disk drives
• Peak I/O performance: 30 Gb/s
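The Galaxy figures can be checked against the key computing requirements with simple arithmetic. This sketch only multiplies out the numbers quoted on the slides; the constant and function names are illustrative.

```python
# Figures quoted on the requirements and Central Processor slides
REQUIRED_CORES = 10_000
REQUIRED_GB_PER_CORE = 4.0

GALAXY_NODES = 472
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 10
RAM_GB_PER_NODE = 64.0
GALAXY_PEAK_TFLOPS = 200.0


def galaxy_summary() -> dict:
    """Derive per-core figures for Galaxy from the per-node specification."""
    cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
    cores = GALAXY_NODES * cores_per_node
    return {
        "cores": cores,
        "gb_per_core": RAM_GB_PER_NODE / cores_per_node,
        "peak_gflops_per_core": GALAXY_PEAK_TFLOPS * 1000.0 / cores,
    }


if __name__ == "__main__":
    s = galaxy_summary()
    print(f"cores: {s['cores']} ({s['cores'] / REQUIRED_CORES:.0%} of the requirement)")
    print(f"memory: {s['gb_per_core']:.1f} GB/core vs {REQUIRED_GB_PER_CORE:.0f} GB/core required")
    print(f"peak: ~{s['peak_gflops_per_core']:.1f} GFLOP/s per core")
```

This gives 9,440 cores (about 94% of the 10,000-core figure) and 3.2 GB/core against the 4 GB/core requirement; the slides do not say whether the memory gap had practical consequences.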

ASKAPsoft
• In development since 2007
• Extensive re-use of core libraries
• Re-written synthesis (parallel) code in C++/MPI
• Assumptions
  ◦ Instrument stable (relatively easy to calibrate)
  ◦ Good global sky model
  ◦ Imaging model adequate
• Automated calibration and imaging (pipeline)
  ◦ ASKAP is one of the pathfinders in this domain (streaming + batch)
• Treat processing software as a part of the telescope
  ◦ Requires a paradigm shift in the science community
• Commissioning requires different things to the full telescope

[Figure: ASKAP science processing diagram, from "ASKAP Science Processing" (ASKAP-SW-0020, version 2.0, 20/12/2011; prepared by Tim Cornwell, Ben Humphreys, Emil Lenc, Maxim Voronkov and Matthew Whiting; reviewed and approved by Ilana Feain). Components: ingest pipeline; UV data at 16416 channels (18.5 kHz) and 304 channels (1 MHz); ~30 channels (10 MHz); small-N (e.g. continuum) and large-N (e.g. spectral line) imager pipelines, each with an imager (cimager), source finder/identifier and source catalogue, producing images and an image cube; calibration pipeline (ccalibrator) producing a calibration solution; transient detector pipeline (transient imager cfimager, transient finder/identifier, transient detections); calibration data service, sky model service, light curve service.]
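The two UV data products in the diagram are related by a whole-number factor: 16416 / 304 = 54, and 1 MHz / 54 ≈ 18.5 kHz, matching the quoted fine-channel width. A minimal sketch of that reduction, assuming plain boxcar averaging of complex visibilities for one baseline (the slides don't say how ASKAPsoft performs it); the names are hypothetical.

```python
FINE_CHANNELS = 16416    # fine-resolution UV data (18.5 kHz channels)
COARSE_CHANNELS = 304    # coarse UV data (1 MHz channels)
AVERAGING_FACTOR = FINE_CHANNELS // COARSE_CHANNELS  # fine channels per coarse channel


def average_channels(spectrum: list[complex], factor: int) -> list[complex]:
    """Boxcar-average consecutive blocks of `factor` channels."""
    if len(spectrum) % factor:
        raise ValueError("spectrum length must be a multiple of factor")
    return [sum(spectrum[i:i + factor]) / factor
            for i in range(0, len(spectrum), factor)]


if __name__ == "__main__":
    print(AVERAGING_FACTOR, f"{1e6 / AVERAGING_FACTOR / 1e3:.2f} kHz")
    coarse = average_channels([1 + 0j] * FINE_CHANNELS, AVERAGING_FACTOR)
    print(len(coarse))
```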

ASKAPsoft for Commissioning & Early Science
• Smaller datasets!
  ◦ 1 TB/hr (ASKAP-12) vs 10 TB/hr (ASKAP-36)
  ◦ Coarser natural resolution (maximum baseline = 2.18 km)
• Able to do manual processing: still hard (many beams, large cubes), but tractable
  ◦ Processing team will run pipelines manually upon completion of the observation
  ◦ Needed to understand and learn about the instrument!
• Some features not available
  ◦ Processing is not automated
  ◦ No sky model available, nor calibration service applied in ingest
  ◦ Transient pipeline not yet developed
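The tenfold difference in data rate is roughly what baseline counting predicts, since visibility volume grows with the number of antenna pairs, N(N − 1)/2. A sketch assuming the same channels, beams and integration time for both array sizes (the function name is hypothetical):

```python
def n_baselines(n_antennas: int, include_autos: bool = False) -> int:
    """Number of antenna pairs; optionally count auto-correlations too."""
    pairs = n_antennas * (n_antennas - 1) // 2
    return pairs + n_antennas if include_autos else pairs


if __name__ == "__main__":
    for autos in (False, True):
        small, full = n_baselines(12, autos), n_baselines(36, autos)
        print(f"autos={autos}: {small} vs {full} -> x{full / small:.1f}")
```

Either way (about 9.5x without auto-correlations, 8.5x with them) the ratio is close to an order of magnitude, in line with 1 TB/hr vs 10 TB/hr.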

Results: ASKAPsoft first 36-beam image

Image credit: Wasim Raja

• Continuum image with 9 antennas at 939.5 MHz
• Processing resembles an early-science experiment
  ◦ Each beam calibrated separately
  ◦ Individual deconvolution of different beams
• Only ASKAPsoft used

Results: NGC 7232 WALLABY Early Science

Credit: Juan Madrid, 14 Sep 2016

ASKAP Computing Project
• Team of 7 people distributed between Perth & Sydney
• External reviews: Preliminary Design (2009), Critical Design (2010) and Production Readiness (2016)
• Iterative software development process, ~2-month cycles
• Continuous integration tool (Jenkins)
• Confluence & JIRA
• Subversion, soon to be moved to git

Issues
• 1.3 PB fast storage (Lustre filesystem), aka /scratch2
  ◦ Multiple users doing manual processing, needed during commissioning and Early Science
  ◦ Shared with MWA users
  ◦ Shortage of space and non-deterministic performance affected the data ingest software (ingest pipeline)
  ◦ Underestimated the scratch space needs of the Early Science Program
• New 1.9 PB filesystem in May 2017
  ◦ Procured by Pawsey
  ◦ 1 PB dedicated to ASKAP real-time and 0.9 PB to MWA
• Still have a shortage of 0.5 to 1 PB to support the Early Science program, depending on the fate of /scratch2
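To see why scratch space ran short, compare the ingest rates with the 1 PB now dedicated to ASKAP. This sketch assumes decimal units (1 PB = 1000 TB), that the whole partition holds only raw visibilities, and that nothing is deleted; the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def tb_per_hour(gb_per_s: float) -> float:
    """Convert a sustained rate in GB/s to TB/h (decimal units)."""
    return gb_per_s * SECONDS_PER_HOUR / 1000.0


def hours_to_fill(capacity_pb: float, rate_tb_per_hour: float) -> float:
    """Hours of continuous ingest that fit in `capacity_pb` petabytes."""
    return capacity_pb * 1000.0 / rate_tb_per_hour


if __name__ == "__main__":
    full_rate = tb_per_hour(2.8)  # correlator ingest rate from the requirements slide
    for label, rate in (("ASKAP-12", 1.0), ("ASKAP-36", full_rate)):
        hours = hours_to_fill(1.0, rate)
        print(f"{label}: {rate:.1f} TB/h fills 1 PB in ~{hours:.0f} h (~{hours / 24:.0f} days)")
```

At the full-array rate the dedicated 1 PB holds only about four days of raw visibilities, which is consistent with the requirement that processing keep up with ingest rather than buffer raw data.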

Issues
• Need a stable instrument to validate our assumptions
  ◦ PAF beams stable (relatively easy to calibrate)
  ◦ Good global sky model (continuum)
  ◦ Imaging performance adequate (high dynamic range)
• Early Science and Commissioning are a different use case from the full (automated) pipeline → scope creep
• Underestimated the effort on software integration, verification and support
• Sharing resources with ASKAP commissioning and SKA pre-construction

Next steps
• Software development for basic modes for full ASKAP
  ◦ Scaling testing and debugging
  ◦ Real-time services development and integration (calibration)
  ◦ Automated continuum and spectral line pipelines
• Additional science pipelines
  ◦ Full polarisation calibration
  ◦ "Postage stamps": small regions (10" spatial resolution)
  ◦ Transient and zoom-mode pipeline
• Upgrade of the Galaxy platform in 12 to 24 months (TBD)
  ◦ Testing, profiling in Athena (benchmarking & data challenges)
  ◦ Evaluating GPU code
  ◦ Updating ASKAP computing requirements

Next steps: towards SKA1 in Australia
• SKA1_LOW ~100 times larger than ASKAP
• Joint ICRAR/CSIRO SKA Science Data Processing project (named Rialto)
  ◦ Continue our involvement in the SDP consortium towards CDR
  ◦ Next generation of calibration and imaging processing software as a prototype for SKA1_LOW, ASKAP and MWA
  ◦ Re-use of ASKAPsoft and the DALiuGE execution framework


Lessons learned (for SKA1)
• ASKAP operations model does not follow the traditional HPC (batch) user/support model
  ◦ Build strong relationships with service providers: service agreements, co-location
  ◦ Dedicated resources at all levels for radio astronomy: people, software, hardware
• Commissioning of telescopes takes a long time and significant resources, and is different to full operations of the telescope
  ◦ Support for the transition period was underestimated
• Isolate fast shared storage (Lustre filesystem) from the "traditional" HPC user model, and include more storage if you can

Are we there yet?

ASKAPsoft is already working!

Still lots of work to do, many challenges ahead and more to learn!

When is software really finished? ... Never?

CSIRO Astronomy and Space Science | Juan Carlos Guzman, Head of ATNF Software and Computing | t +61 8 6436 8569 | e juan.guzman@csiro.au | w www.csiro.au/cass


Thank you