ASKAP Computing - ASKAIC Technical Update “Towards SKA - ASKAP computing & the development of SKA systems”
Ben Humphreys, ASKAP Computing Project Engineer
10th July 2009
Image Credit: Swinburne Astronomy Productions
Outline
• ASKAP Computing & Software Architecture Overview
• ASKAP Central Processor Overview
  • Design
  • Middleware
  • Hardware
• ASKAP Data Storage
• Challenges for ASKAP Computing
ASKAP Computing & Software Architecture Overview
Architecture Goals
• The structure of the system
  • Top-level decomposition of the software system
  • Definition of interactions between components
• Loosely coupled components
• Hardware platform independence
• Language independence
• Operating system independence
• Implementation independence
• Location and server transparency
• Asynchronous communication
Logical View
Evaluation of Middleware
• Evaluated:
  • Apache Tuscany (SCA/SOA)
  • ActiveMQ / JMS (Message Oriented Middleware)
  • ICE (Distributed Objects)
Evaluation of Middleware
• Requirements:
  • Support Linux
  • Language bindings for Java, C++ and Python
  • Support for Request / Response style communication (synchronous and asynchronous)
  • Support for Publish / Subscribe style communication
  • Promote loose coupling
  • Support for fault tolerance (replication, etc.)
• Desirable:
  • Mature
  • Open Source
Evaluation of Middleware
• Both ActiveMQ and ICE were found to meet our requirements
• We have selected ICE over ActiveMQ/JMS primarily because of its interface definition language
  • Avoids having to define our own interface definition language
  • Avoids having to build our own bindings between the programming languages and the interface definitions
• Also, many of our interfaces appear to be best suited to an object-oriented model
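To illustrate why the interface definition language weighed so heavily, here is a minimal sketch of how an ASKAP-style interface might be declared in Slice and consumed from C++. The VisStream interface, its publish operation and the endpoint details are hypothetical, not taken from the actual ASKAP interface definitions; the point is that slice2cpp/slice2java/slice2py generate the language bindings for us.

```cpp
// Hypothetical Slice definition (VisStream.ice):
//
//   module askap { interface VisStream { void publish(string chunk); }; };
//
// Minimal C++ client using the Ice 3.3-era API (sketch only):
#include <Ice/Ice.h>
#include <VisStream.h>   // generated by slice2cpp from the Slice file above

int main(int argc, char* argv[])
{
    Ice::CommunicatorPtr ic = Ice::initialize(argc, argv);

    // Locate the (hypothetical) servant and narrow to the generated proxy type
    Ice::ObjectPrx base = ic->stringToProxy("VisStream:default -h cpserver -p 10000");
    askap::VisStreamPrx stream = askap::VisStreamPrx::checkedCast(base);

    stream->publish("metadata-annotated visibility chunk");

    ic->destroy();
    return 0;
}
```

A Java or Python client would use the same Slice definition with its own generated bindings, which is what gives us the language independence listed in the architecture goals.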
Evaluation of Middleware ICE
• The Internet Communications Engine (ICE) is an object-oriented middleware with support for C++, .NET, Java, Python, Ruby, and PHP
• Ice supports heterogeneous environments: client and server can be written in different programming languages, can run on different operating systems and machine architectures, and can communicate using a variety of networking technologies
• Open Source
  • Available under the GPL and a commercial license
• Put simply, ICE is CORBA done right!
• Long history of CORBA use in similar systems
  • Advanced Technology Solar Telescope (ATST) employs ICE
Other Software Decisions
• Central processor decisions will be discussed later
• Telescope Operating System (aka Monitoring and Control)
  • Experimental Physics and Industrial Control System (EPICS)
  • Originally written jointly by Los Alamos National Laboratory and Argonne National Laboratory
  • EPICS released 1990, used in >100 projects
  • Decision made after extensive evaluation of:
    • PVSS-II
    • EPICS
    • ALMA Common Software (ACS)
    • TANGO
Other Software Decisions
• Logging
  • Log4j, Log4cxx, Log4py (see the Log4cxx sketch after this list)
• Monitoring
  • MoniCA (existing ATNF package)
• Data Services
  • MySQL or PostgreSQL
  • SQLAlchemy
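As a small illustration of the logging choice, a minimal Log4cxx sketch; the logger name and messages are illustrative only, and a production setup would load its configuration from a properties file rather than using the basic configurator.

```cpp
#include <log4cxx/logger.h>
#include <log4cxx/basicconfigurator.h>

int main()
{
    // Simple console appender; real deployments would configure appenders/levels externally
    log4cxx::BasicConfigurator::configure();

    log4cxx::LoggerPtr logger = log4cxx::Logger::getLogger("askap.cp.ingest");

    LOG4CXX_INFO(logger, "Ingest pipeline started");
    LOG4CXX_WARN(logger, "Flagging suspect data on a baseline");
    return 0;
}
```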
Pending Software Decisions
• Observation Preparation Tool
  • Investigating the possibility of reuse:
    • Gemini Observing Tool
    • James Clerk Maxwell Telescope (JCMT) Observing Tool
    • ALMA Observing Tool
• Alarm Management System
  • Ongoing evaluation:
    • EPICS Alarm Handler
    • DESY (German Electron Synchrotron) Alarm Management System
• Operator Displays
  • Rich client?
  • Web based?
ASKAP Central Processor Overview
Key Requirements
• Process data from observing to archive with minimal human decision making
• Calibrate automatically
• Imaging
  • Fully automated processing, random field
  • Fully automated processing, Galactic plane
  • Fully automated processing, HI
• Form science oriented catalogues automatically
Traditional Approach to Data Reduction
• The traditional/interactive approach to data reduction typically involves multiple accesses to the same data
• We don't want to sit in front of a computer all day, so ideally reducing, say, an 8-12 hour observation would take an hour or two
• For ASKAP this approach would require ~500GB/s • For SKA perhaps a few PB/s
Streaming Approach to Data Reduction
• Streaming approach possible for single-pass, automated data reduction
• No temporary storage is required for the raw data, but the system must keep up with the input data rate
• However, there are downsides:
  • Certain optimizations are impossible; the data must generally be processed in the order it is received
  • Failure handling becomes much more difficult
Buffered Approach to Data Reduction
• Buffering the dataset at a certain stage allows:
  • Optimizations in ordering before the compute-intensive stage
  • Better failure handling
• Single-pass, automated data reduction. Must keep up with the input data rate but can cope with some processing-time variability
• For ASKAP this approach would require ~10GB/s • For SKA perhaps 20-50TB/s
Processing models
• Streamed
  • Visibility data is processed immediately
  • Few optimization options
  • Need enough memory to hold the images: over 100 TB for 16,384 channels
  • Time to finish off processing at the end of the observation
  • Difficult to handle faults without losing channels
• Buffered
  • Data is stored to disk for part or all of the observation
  • When enough data is ready, processing starts
  • High latency
  • Reduced memory requirement (16-32 TB for 16,384 channels)
  • Many choices to optimize the sequence of processing and hence performance
  • Simplifies fault handling
  • Impossible if ingest rate exceeds disk I/O capability
Central Processor Design
• Central Processor is composed of:
  • Frontend (Data Conditioner), responsible for (see the sketch after this list):
    • Ingesting visibilities from the correlator
    • Ingesting metadata from the telescope operating system
    • Annotating the visibility stream with metadata
    • Flagging suspect data (e.g. RFI-impacted data)
    • Calculation of calibration parameters
    • Applying calibration to the observed visibilities
    • Writing out a measurement set or similar
  • Backend, responsible for:
    • Processing of calibrated visibilities into science products:
      • Image cubes
      • Source catalogues
      • Transient detections
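To make the front-end structure concrete, a minimal C++ sketch of the "Pipes & Filters" style referred to on a later slide. The class names are illustrative only and do not reflect the actual ASKAPsoft ingest pipeline code.

```cpp
#include <memory>
#include <vector>

// Illustrative stand-in for an annotated chunk of visibilities
struct VisChunk { /* visibilities, metadata, flags, ... */ };

// Each front-end stage is a filter that transforms the chunk in place
class Filter {
public:
    virtual ~Filter() {}
    virtual void process(VisChunk& chunk) = 0;
};

class MetadataAnnotator : public Filter {
public:
    void process(VisChunk&) { /* merge telescope operating system metadata */ }
};

class Flagger : public Filter {
public:
    void process(VisChunk&) { /* flag RFI-impacted data */ }
};

class CalibrationApplier : public Filter {
public:
    void process(VisChunk&) { /* apply calibration parameters */ }
};

// The "pipe": each chunk flows through the filters in order, one pass per chunk
class IngestPipeline {
public:
    void add(Filter* f) { itsFilters.push_back(f); }
    void run(VisChunk& chunk) {
        for (size_t i = 0; i < itsFilters.size(); ++i) {
            itsFilters[i]->process(chunk);
        }
    }
private:
    std::vector<Filter*> itsFilters;  // ownership handling omitted in this sketch
};
```

A pipeline instance would be assembled by adding the filters in processing order and calling run() on each incoming chunk before the result is written out.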
Central Processor Design Physical View (Omits Control Plane)
Central Processor Design Logical View
Central Processor Design Central Processor Process View / Data Flow
Central Processor Design
• Front-End
  • Based on the "Pipes & Filters" design pattern
• Back-End
  • Is more complex:
    • Spectral line imager is generally a linear processing pipeline
    • Continuum imager is iterative and involves a large reduction step
    • Transient detection may be more event/decision oriented
• Design for coarse-grained parallelism (see the MPI sketch after this list)
  • For most tasks, each channel can be handled independently
  • For many tasks, each baseline, polarization, or time sample can be handled independently
  • Doesn't preclude fine-grained parallelism where required
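A minimal sketch of the coarse-grained, per-channel decomposition using MPI. Here processChannel() stands in for the per-channel gridding/imaging work and is a placeholder, not an ASKAPsoft function.

```cpp
#include <mpi.h>

// Placeholder for the per-channel work (gridding, FFT, deconvolution, ...)
void processChannel(int /*channel*/) { /* channel-local imaging */ }

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nChannels = 16384;   // full ASKAP spectral resolution

    // Round-robin distribution: channels are independent, so no communication
    // is needed until the per-channel results are gathered and written out.
    for (int ch = rank; ch < nChannels; ch += size) {
        processChannel(ch);
    }

    MPI_Barrier(MPI_COMM_WORLD);   // wait for all ranks before gathering results
    MPI_Finalize();
    return 0;
}
```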
Central Processor Middleware
• Selection of middleware can limit hardware options due to platform and operating system support restrictions
• Extensive evaluation of middleware was carried out
• Many potential middleware/frameworks identified:
  • MPI
  • CORBA (OmniORB, TAO)
  • IBM InfoSphere Streams (System S)
  • Data Distribution Service (DDS)
  • ICE
Central Processor Middleware Front-End
• We have selected ICE for the frontend of the central processor
• ICE runs on most major Unix platforms:
  • Full support from ZeroC: Linux, Solaris, Mac OS X
  • Builds and runs: HP-UX, FreeBSD
• Prototypes of both the front-end (ingest pipeline) and the imager have been developed using ICE
Central Processor Middleware Back-End
• The “Backend” of the central processor is where the substantial compute resources in ASKAP reside
• We have selected MPI for the backend of the central processor
  • MPI is supported on practically all HPC platforms
  • We believe this decision gives us the widest range of hardware options
  • The MPI implementation of the ASKAPsoft simulator, imager & source finder has been used for over two years
• Our design also requires a resource manager for the backend
  • Must be DRMAA compliant
  • Currently using Torque/PBS
Model for synthesis data processing costs
• Key result from investigations over last two years
• Under continual refinement
• Key parameters
  • Convolution support
  • Cost per million points per sec
  • Baseline length
• Gridding only
  • Calibration, deconvolution, and source finding not yet included
Evaluation of Hardware
• Built a small benchmark from the core gridding/degridding algorithm. About 90% of our computing requirements relate to this algorithm
• Distributed benchmark very widely
• Benchmarked on systems from:
  • Intel (Harpertown and Nehalem CPUs)
  • AMD (Opteron 2000 series CPUs)
  • NEC (SX-8R & SX-9R)
  • SGI (SGI Altix 4700 Itanium & SGI Altix XE)
  • IBM (BlueGene/P)
  • NVIDIA (Tesla C870 GPU & GeForce GTX 260)
  • Cray (XT5 & X2)
Evaluation of Hardware
[Benchmark results chart; other numbers are confidential]
Evaluation of Hardware
• Special processors for convolutional resampling:
  • Co-processors
  • FPGAs
  • Cell processors
  • GPGPUs
• Field Programmable Gate Arrays (FPGA)
  • Performance appeared to be promising for small grid sizes: ~50x speedup relative to a CPU
  • Limited memory makes large grid sizes challenging
  • Long development cycle and requires a very specialised skill set
Evaluation of Hardware
• Cell Processor
  • The STI Cell processor has an HPC variant sold by IBM in the QS22 blade
  • Difficult programming model
  • Very small (256 KB) local memory is problematic for our imaging algorithms; the memory bus becomes a significant bottleneck
  • See the paper by Varbanescu, et al.
• General Purpose Graphics Processing Unit (GPGPU)
  • General purpose GPUs available from NVIDIA and AMD
  • We have ported our gridding benchmark with some promising results
  • Software development effort is larger than with a regular CPU and may cancel out any cost savings
Evaluation of Hardware
• Have identified a metric to measure price/performance: price per million grid points per second (see the sketch after this list)
  • i.e. how much it costs to acquire a computer to perform at a certain level
• Have identified a metric to measure power/performance: watts per million grid points per second
  • i.e. how much power is required to perform at a certain level
• The final hardware decision will take many other factors into account: reliability, maintainability, quality, maintenance costs, integration/packaging, etc.
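For concreteness, a small sketch of how the two metrics are computed. The performance, price and power figures below are placeholders, not actual benchmark or vendor numbers.

```cpp
#include <cstdio>

int main()
{
    // Placeholder figures for a hypothetical system -- not real data
    const double gridPointsPerSec = 2.0e9;    // measured by the gridding benchmark
    const double priceAUD         = 1.5e6;    // acquisition cost
    const double powerWatts       = 120.0e3;  // sustained power draw

    const double mPointsPerSec = gridPointsPerSec / 1.0e6;

    std::printf("Price/performance: %.1f $ per Mpoints/s\n", priceAUD / mPointsPerSec);
    std::printf("Power/performance: %.2f W per Mpoints/s\n", powerWatts / mPointsPerSec);
    return 0;
}
```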
Hardware Needs
• We are somewhat reliant on Moore's Law
  • Building the Central Processor with hardware available today would be too costly
• Hardware options must be kept open as long as possible so we are not railroaded into a certain platform/technology
• Discussions we have had with vendors, and testing of current hardware, indicate that next-generation systems (2011-2012) will match our requirements and budget
Hardware Needs for BETA
• Approximate requirements for an indicative Intel/AMD cluster:
  • 3-6 TFlop/s
    • 256-512 cores (as of late 2008 / early 2009)
  • 1-2 TB memory
  • Good memory bandwidth
    • > 5 GB/s per core
  • 50 TB persistent storage (1 GB/s I/O rate)
    • Plus backup solution
  • Modest network interconnect
    • Single 1 GbE for compute nodes
    • Single 10 GbE for the ingest and output nodes
Hardware Needs for ASKAP
• Approximate requirements for an indicative Intel/AMD cluster:
  • 100 TFlop/s
    • ~8000 cores (as of late 2008 / early 2009)
    • ~10000 if we assume a more realistic 80% efficiency
  • 16-150 TB memory (depending on processing model)
  • Good memory bandwidth
    • > 5 GB/s per core
  • 1 PB persistent storage (10 GB/s I/O rate)
    • Plus backup solution
  • Modest network interconnect
    • 1 GbE for compute nodes (but would likely use 10 GbE or InfiniBand)
    • 2-4 x 10 GbE for the ingest and output nodes
Central Processor Development Environment
• Hardware:
  • x86 64-bit
    • Intel Core 2 / Core i7
    • AMD Opteron
• Operating System:
  • Linux
    • Debian Etch
    • Fedora 8
  • Mac OS X 10.5
• Middleware:
  • OpenMPI 1.3
  • ICE 3.3
• Other 3rd Party Packages:
  • LAPACK
  • FFTW
  • BLAS
  • WCSLib
  • Boost
  • LOFAR Software
  • Casacore
  • Duchamp
• Can use optimized math libraries. ASKAPsoft has been trialed with:
  • Intel MKL
  • AMD Core Math Library
  • IBM ESSL
  • ATLAS
ASKAPsoft Codebase Status
• ASKAPsoft in development since 2006
• Approximately 110,000 SLOC
  • C++: 53,375 (49%)
  • ANSI C: 30,616 (28%)
  • Python: 13,232 (12%)
  • Fortran: 11,272 (10%)
  • sh: 1,105 (1%)
• Depends upon over 50 third party packages
Central Processor Scalability Testing
• Currently performing scalability testing at the National Computational Infrastructure (NCI) National Facility
• SGI Altix XE Cluster System
  • 156 x SGI Altix XE 320 nodes
  • 1248 cores (312 x 3.0 GHz Intel Harpertown CPUs)
  • DDR InfiniBand interconnect
  • 18 x quad-core SGI Altix XE 210 servers for the Lustre filesystem
• Migrating to the new Sun Constellation system late 2009
  • Hosted by the NCI National Facility @ ANU
  • 1500 Sun Blade modules (12,000 cores)
  • 500 TB Lustre filesystem
Central Processor Scaling Timeline
Central Processor I/O Scaling
ASKAP Data Storage
• ASKAP Telescope/Instrument - Online System
  • Location: Geraldton & Boolardy
  • Provides the storage required for the ASKAP instrument to operate
• ASKAP Science Data Archive Facility
  • Location: probably Perth
  • Where astronomers go to access ASKAP data products
ASKAP Data Storage Online System (Geraldton/Boolardy)
• RDBMS (MySQL or PostgreSQL), probably < 10 TB
  • What do we store there?
    • Configuration
    • Scheduling Blocks
    • Source Catalogues
    • Calibration Parameters
    • Monitoring Archives
    • Logging & Alarms
• Parallel Filesystem (Lustre, GPFS, PVFS, pNFS), minimum 1 PB
  • What do we store there?
    • Raw Datasets
      • Visibilities
      • Metadata
    • Images
    • Image Cubes
ASKAP Science Data Archive Facility
• Where astronomers go to access ASKAP data products
• Located separately from the Processing Centre (probably Perth)
• One-way data path from Online Data Store to ASDAF
  • Save for acknowledgement that data has been received
• Data sent to Archive:
  • Images, cubes, visibilities (continuum only) + their metadata
  • Transient images & time series
  • Source catalogues
• Capabilities of Archive:
  • Limited to queries and downloads
  • Reprocessing capabilities (e.g. stacking cubes) out of scope
  • Standard VO-style queries plus more specialised ones on ASKAP-specific metadata
  • Normal downloading is hard (large images!), so provide a "take-away" capability
ASKAP Science Data Archive Facility Size of potential data products
Product                         Size
Continuum Visibility Data       ~ 370 TB/year
Spectral Line Visibility Data   ~ 23 PB/year
Transient Visibility Data       ~ 23 TB/year
Continuum Images                ~ 256 TB/year
Transient Images                ~ 8.4 PB/year
Spectral Line Images            ~ 4 PB/all-sky survey
Continuum Catalogue             ~ 60 GB
Transient Catalogue             ?
Spectral Line Catalogue         ?
Spectral Line Stacks            ?
Source: Cornwell, T.J. “Cost estimates for the ASKAP Science Archive”, ASKAP-SW-0016, 2008
ASKAP Science Data Archive Facility
• Currently in negotiations with ICRAR for development of the ASKAP Science Data Archive Facility
• Hardware/Software Needs
  • Fast online storage
  • Cheap offline storage
  • Hierarchical Storage Management (HSM)
  • RDBMS/Hadoop/SciDB
• Significant software development and innovation required!!
  • Managing large datasets
  • Virtual observatory (IVOA) interface
Challenges for ASKAP Computing
• All the usual:
  • Developing parallel/distributed software
  • Debugging parallel/distributed software
  • Reliability
  • Power (both logistics and running costs)
  • Acquisition, maintenance & software development costs
• Plus a few slightly more specific to our needs:
  • Batch vs Streaming
  • Flop/s vs Memory sub-system performance
Challenges for ASKAP Computing Batch vs Streaming
• The vast majority of HPC facilities run batch jobs, usually modeling/simulations, and not real-time processing of streaming data
• The tools to harness the potential of HPC for real-time data acquisition and analysis are still in their infancy or in most cases don't exist
• The ASKAP Central Processor leverages two software frameworks:
  • ICE - for handling of input data streams
  • MPI - for harnessing the power of an HPC system
• Ideally one software framework would suit the end-to-end requirements
Challenges for ASKAP Computing Flop/s vs Memory sub-system performance
• Our imaging algorithms are more data intensive than computationally intensive. The typical inner operation is:
  • Load spectral sample (α)
  • Load convolution value (x)
  • Load grid point (y)
  • Compute and store y ← y + α·x
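In code, this inner operation is essentially a complex multiply-accumulate per affected grid point. A minimal sketch follows, assuming single-precision complex data and an illustrative memory layout rather than the actual ASKAPsoft gridder, to show why memory traffic rather than arithmetic dominates.

```cpp
#include <complex>
#include <vector>

// Add one visibility sample (alpha) to the uv grid through its convolution kernel.
// For each affected grid point: load kernel value (x), load grid point (y),
// compute y += alpha * x, store y.
void gridSample(std::complex<float> alpha,                           // visibility sample
                const std::vector<std::complex<float> >& cfunc,      // kernel, support x support
                int support,
                std::vector<std::complex<float> >& grid,             // image-sized uv grid
                int gridWidth,
                int u0, int v0)                                       // kernel origin on the grid
{
    for (int j = 0; j < support; ++j) {
        const int gOff = (v0 + j) * gridWidth + u0;
        const int cOff = j * support;
        for (int i = 0; i < support; ++i) {
            grid[gOff + i] += alpha * cfunc[cOff + i];
        }
    }
}
```

Each update performs roughly 8 floating-point operations against about 24 bytes of memory traffic (load kernel, load grid, store grid), i.e. well under one flop per byte, which is why memory sub-system performance rather than peak flop/s dominates.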
Challenges for ASKAP Computing Flop/s vs Memory sub-system performance
• Locality optimizations are hard because of:
  • the large memory requirements of the images and convolution function, plus a quasi-random access pattern
  • the high input data rate and potential inability to buffer and reorder input data
Challenges for ASKAP Computing Flop/s vs Memory sub-system performance
• Bridging the Processor-Memory Gap
  • Good recent progress:
    • DDR3
    • Move towards on-chip memory controllers
    • More channels to memory
  • Can't let the gap widen any further
• Locality awareness will always be important
• Software advances are critical:
  • New implementations of algorithms
  • New algorithms
  • New approaches to data processing
Challenges for ASKAP Computing Flop/s vs Memory sub-system performance
• The NVIDIA GPU architecture is very good in this respect:
  • NVIDIA GeForce GTX 285 memory bandwidth: 159 GB/s
  • Typical x86 CPU memory bandwidth: 15-35 GB/s
• Memory stalls don't leave the GPU core(s) idle
  • Other threads can be scheduled while one thread is stalled on a load or store, but there need to be many (1000s of) threads to effectively hide memory latency
Contact Us Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected] Web: www.csiro.au
Thank you
Australia Telescope National Facility Ben Humphreys ASKAP Computing Project Engineer
Phone: 02 9372 4211 Email: [email protected] Web: http://www.atnf.csiro.au/projects/askap/