SKA Science Data Processor Update
Dr. John Taylor (on behalf of the Science Data Processor Consortium)
High Performance Computing and Research Computing Service, University of Cambridge
Email: [email protected]
ISC17 Frankfurt
Overview
• Science Data Processor Context
• Overview of the Science Data Processor (SDP)
• Architectural Considerations
• ALASKA Prototyping Environment
• Next Steps
One SDP, Two Telescopes
SDP Scope SKA Phase 1
Ref. SKA-TEL-SDP-0000001, SDP Preliminary Architecture Design, P. Alexander et al.
SDP Key Performance Requirements -- SKA Phase 1
[Diagram: SDP key performance requirements, showing the Science Data Processor and its interfaces to the Telescope Manager, CSP and Observatory]
SDP Local Monitoring & Control
Data Processor:
• High performance: ~100 PetaFLOPS
• Data intensive: ~100 PetaBytes/observation (job)
• Partially real-time: ~10 s response time
• Partially iterative: ~10 iterations/job (~6 hours)
Data Preservation:
• High volume & high growth rate: ~100 PetaByte/year
• Infrequent access: a few times/year at most
Delivery System:
• Data distribution: ~100 PetaByte/year from Cape Town & Perth to the rest of the world
• Data discovery: visualisation of 100k × 100k × 100k voxel cubes
Indicative data rates between elements: ~1 TByte s⁻¹ ingest, ~10 GByte s⁻¹, ~20 GByte s⁻¹ and ~1 GByte s⁻¹ (TBC)
Illustrative Requirements
• HPC
 – ~250 PFLOP system (peak)
 – ~200 PByte/s aggregate bandwidth to fast working memory
 – ~80 PByte storage
 – ~0.5-1 TeraByte/s sustained write to storage
 – ~5-10 TeraByte/s sustained read from storage
 – ~10,000 FLOPS per byte read from storage
 – ~2 Bytes/FLOP memory bandwidth
• Preservation and LMC
 – ~1-10 GBytes/s QA information
 – ~20 GBytes/s to HSM for preservation
 – ~10 s latency to respond to alerts
• Ingest
 – ~1 TByte/s
SDP Overview
• The SDP is NOT just another HPC system
 – It must achieve high performance on key scientific algorithms in the Exascale regime
 • State-of-the-art HPC technologies are critical
• BUT it also needs to:
 – Collect, manage, store and deliver vast amounts of data as viable products
 • Big Data => variety, velocity, volume, veracity => value
 – Combine real-time and iterative execution environments, providing feedback at various cadences to other elements of the telescope
 • High Performance Data Analytics
 – Operate 365 days a year
 • Highly available, accommodating failure via software, as in modern hyperscale environments
 – Be extensible and scalable
 • Provide a modern ecosystem to accommodate new algorithm development and upgrades
SDP Challenges
• Power efficiency
 – The current US Exascale roadmap indicates 20-25 MW(!) for an ExaFLOP system by 2023
 – The US Government typically pays around 200-250 MUSD for such machines
 – ECP is now under way, with a separate budget to develop the capability
• Cost
 – Are our assumptions correct? How will growth rates pan out (processor, memory, networking and storage)?
SDP Challenges
• Complexity of hardware and software
• Combining real-time (streaming) and off-line (batch) processing with feedback
• Multiple sub-systems (Ingest, Buffer, Processing, Control, Preservation and Delivery)
• Scalability
 – Hardware roadmaps
 – Demonstrated software scaling is uncertain
• Extensibility, scalability, maintainability
 – SKA1 is the first “milestone” – expecting significant expansion in the 2020s, with a 50-year observatory lifetime
KEY CHARACTERISTICS OF RADIO INTERFEROMETRY IMAGE PROCESSING
Key Characteristics of SKA Data Processing
• Very large data volumes: all data are processed in each observation
• Noisy data and sparse, incomplete sampling: corrected for by deconvolution using iterative algorithms (~10 iterations)
• Corrupted measurements: corrected by jointly solving for the sky brightness distribution and for the slowly changing corruption effects using iterative algorithms
• Multiple dimensions of data parallelism: loosely coupled tasks, so a large degree of parallelism is inherently available
Data-parallelism schemes
[Diagram: visibility data spanning frequency and time & baseline, mapped onto processing nodes]
o Data parallelism is dominated by frequency
o This provides the dominant scaling
o Nothing more is needed if each processing node can manage the complete processing of a frequency channel
Data-parallelism schemes
[Diagram: visibility data spanning frequency and time & baseline]
o Further data parallelism comes from locality in UVW-space
o Used to balance memory bandwidth per node
o Some overlap regions on target grids are needed
o UV data is buffered either on a locally shared store or locally on each node
Data-parallelism schemes
[Diagram: visibility data spanning frequency and time & baseline]
o To manage total I/O from the buffer/bus, distribute visibility data across nodes for the same target grid, which is duplicated
o Duplication of the target grid provides fail-over protection (a partition sketch follows below)
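A minimal sketch of this channel-to-node assignment in Python. The channel, node and replica counts are illustrative assumptions rather than SDP sizing; the point is only that partitioning by frequency, with each target grid duplicated on a second node, is a simple static mapping.

```python
# Sketch: distribute visibility data by frequency channel across processing
# nodes, holding each channel (and hence its target grid) on `replicas` nodes
# so that a node failure can be absorbed.  All sizes are assumptions.
from collections import defaultdict

n_channels = 64          # assumed number of frequency channels
n_nodes = 16             # assumed number of processing nodes
replicas = 2             # each target grid held on two nodes

def assign_channels(n_channels: int, n_nodes: int, replicas: int) -> dict[int, list[int]]:
    """Map each frequency channel to `replicas` distinct nodes (round-robin)."""
    assignment: dict[int, list[int]] = defaultdict(list)
    for ch in range(n_channels):
        for r in range(replicas):
            assignment[ch].append((ch * replicas + r) % n_nodes)
    return assignment

plan = assign_channels(n_channels, n_nodes, replicas)
print("channel 0 ->", plan[0], " channel 1 ->", plan[1])
# If a node fails, its channels can be re-pointed at the surviving replica node.
```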
KEY ARCHITECTURAL CONSIDERATIONS AND MAPPING TO CURRENT HARDWARE
SDP Functional Breakdown
Ref. SKA-TEL-SDP-00000013, SDP Preliminary Architecture Design, P. Alexander et al.
Imaging Component
Imaging and Fast Imaging in more detail
Image Processing Model
[Diagram: image processing model – UV data store feeding the major cycle]
• UV processors: subtract the current sky model from the visibilities using the current calibration model
• Grid the UV data to form gridded data (e.g. W-projection)
• Imaging processors: image the gridded data
• Deconvolve the imaged data (minor cycle) and update the current sky model
• Solve for telescope and image-plane calibration; update the calibration model
• Output: astronomical quality data
(A simplified loop sketch follows.)
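A heavily simplified Python sketch of the major/minor cycle above, using toy stand-ins for degridding, gridding/imaging and calibration. The functions, shapes and iteration counts are assumptions for illustration only; real SDP pipelines use W-projection gridding and far more elaborate calibration.

```python
# Toy major cycle: subtract the current sky model from the visibilities,
# image the residuals, run a CLEAN-like minor cycle, update calibration.
import numpy as np

N = 256                                          # image size (assumption)

def degrid(sky_model, cal):                      # predict model visibilities
    return np.fft.fft2(sky_model) * cal

def grid_and_image(residual_vis):                # grid residuals and image them
    return np.real(np.fft.ifft2(residual_vis))

def minor_cycle(dirty, sky_model, gain=0.1, n_iter=100):
    """CLEAN-like loop: move a fraction of the peak residual into the sky model."""
    for _ in range(n_iter):
        iy, ix = np.unravel_index(np.argmax(np.abs(dirty)), dirty.shape)
        sky_model[iy, ix] += gain * dirty[iy, ix]
        dirty[iy, ix] -= gain * dirty[iy, ix]
    return dirty, sky_model

def solve_calibration(vis, sky_model, cal):      # stand-in for a calibration solve
    return cal

rng = np.random.default_rng(0)
vis = np.fft.fft2(rng.random((N, N)))            # fake "observed" visibilities
sky_model = np.zeros((N, N))
cal = np.ones((N, N), dtype=complex)

for major in range(3):                           # ~10 major cycles in practice
    residual_vis = vis - degrid(sky_model, cal)
    dirty = grid_and_image(residual_vis)
    dirty, sky_model = minor_cycle(dirty, sky_model)
    cal = solve_calibration(vis, sky_model, cal)
    print(f"major cycle {major}: peak residual {np.abs(dirty).max():.3f}")
```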
Compute Island/Node Concept
Ref. SKA-TEL-SDP-0000018, SDP Data Processor Platform Design, C. Broekema.
Compute Island Concept
Current Hardware Costed Concept
SDP Networking
SDP Hardware Concept
PROTOTYPING ACTIVITIES
Prototyping for SDP – What we need to explore
• PRIORITY BASED ON RISK
• Provisioning, Management and Control
 – Multiple networks (single tenant), large HPC platform, integration with multiple sub-systems (LMC, Preservation, Delivery), logging and event handling
• Software Defined Networking
 – Multiplicity of networks required (some RDMA); can these all be subsumed by SDN (and over Ethernet)?
• Virtualization and Containerization
 – What is the overhead for parallel applications? A small percentage can have a dramatic impact on cost. Bare-metal provisioning at scale (see the sketch after this list)
• Orchestration of Pipelines with Execution Framework
 – Data-driven Execution Framework, scheduling of pipelines, perhaps based on a COTS Big Data solution
• Orchestration of feedback mechanisms
 – “High Performance Middleware”, distributed database to maintain telescope state and sky model
• Management of Storage Hierarchies
 – RDMA to object storage, parallel FS, API to support the hierarchy
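One way to quantify the virtualization/containerization overhead question is to run the same communication micro-benchmark on bare metal, in a VM and in a container. Below is a minimal mpi4py ping-pong sketch; the message size and repeat count are illustrative assumptions.

```python
# Minimal two-rank ping-pong to estimate point-to-point bandwidth.
# Run with e.g. `mpirun -n 2 python pingpong.py` on each platform and compare.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

msg_bytes = 64 * 1024 * 1024          # 64 MiB per message (assumption)
reps = 20
buf = np.zeros(msg_bytes, dtype=np.uint8)

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=1)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=1)
elapsed = MPI.Wtime() - t0

if rank == 0:
    gbytes = 2 * reps * msg_bytes / 1e9   # ping + pong per repetition
    print(f"bandwidth ~ {gbytes / elapsed:.2f} GB/s")
```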
Prototyping Platform
• Create a flexible but performance-driven prototyping environment (P3) to support and inform the architecture
 – Support migration of SIP
 – Create a software environment to define infrastructure
 – Provision a variety of storage technologies to provide an experimental bench
 – Provision multiple networks to accommodate data flows
 – Provision some CPU acceleration
 – Enable PaaS to investigate Execution Frameworks
• Solution: OpenStack à la SKA – sits on top of P3
The Buffer
• Ephemeral storage is required to support
 – real-time (hot – hours) and
 – batch (cold – weeks) processing
• The buffer is localised to a Data Island, which is a subset of a Compute Island
• Provides a single namespace across 1-n compute nodes
• Hot buffer – currently conceived as local to the nodes; driven by performance to meet real-time needs
• Cold buffer – network attached; driven by capacity and resilience
(A staging sketch follows.)
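A minimal sketch of the hot/cold staging idea: data needed for real-time processing lives on node-local storage for hours, then is demoted to a network-attached cold tier for batch reprocessing over weeks. The paths, lifetime and DataItem fields are illustrative assumptions, not the SDP buffer design.

```python
# Sketch: age hot buffer items out to the cold tier once real-time work is done.
import shutil, time
from dataclasses import dataclass
from pathlib import Path

HOT_ROOT = Path("/local/hot_buffer")      # node-local NVMe/SSD (assumed mount point)
COLD_ROOT = Path("/cephfs/cold_buffer")   # network-attached store (assumed mount point)
HOT_LIFETIME_S = 6 * 3600                 # keep data hot for ~one observation (~6 h)

@dataclass
class DataItem:
    name: str
    created: float          # epoch seconds when the item was written hot

def hot_path(item: DataItem) -> Path:
    return HOT_ROOT / item.name

def stage_to_cold(item: DataItem) -> None:
    """Move an item from the hot tier to the cold tier."""
    COLD_ROOT.mkdir(parents=True, exist_ok=True)
    shutil.move(str(hot_path(item)), str(COLD_ROOT / item.name))

def age_out(items: list[DataItem]) -> None:
    """Demote any hot item older than the hot lifetime."""
    now = time.time()
    for item in items:
        if now - item.created > HOT_LIFETIME_S and hot_path(item).exists():
            stage_to_cold(item)
```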
Execution Engine and Data Life Cycle
Approach: build on Big Data concepts – a “data-driven”, graph-based processing approach that is receiving a lot of attention. Inspired by Hadoop, but for our complex data flow.
[Diagram: graph-based approach vs. Hadoop]
Execution Engine: hierarchical
Processing Level 1:
• Cluster and data distribution controller
• Relatively low data rate and cadence of messaging
• Staging: aggregation of data products
Processing Level 2:
• Static data distribution exploiting inherent parallelism
Execution Engine: hierarchical
Processing Level 2 (Data Island):
• Shared file store across the data island
• Worker nodes: task-level parallelism to achieve scaling
• Process controller (duplicated for resilience)
• Cluster manager, e.g. Mesos-like
(A two-level sketch follows.)
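A minimal sketch of the two-level idea: a level-1 controller hands a coarse work description to each data island, and a level-2 process controller inside the island fans it out to worker processes with task-level parallelism. Island and worker counts, and the task body, are assumptions for illustration only.

```python
# Sketch: level-1 controller dispatches channel lists to data islands;
# level-2 controller runs tasks in parallel within each island.
from concurrent.futures import ProcessPoolExecutor, as_completed

def process_channel(island_id: int, channel: int) -> str:
    """Level-2 worker task: stand-in for gridding/imaging one frequency channel."""
    return f"island {island_id}: channel {channel} done"

def run_island(island_id: int, channels: list[int], workers: int = 4) -> list[str]:
    """Level-2 process controller: task-level parallelism within one data island."""
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_channel, island_id, c): c for c in channels}
        for fut in as_completed(futures):
            results.append(fut.result())   # in practice: retry / re-submit on failure
    return results

if __name__ == "__main__":
    # Level-1 controller: static data distribution (low-rate messaging),
    # e.g. channels 0-15 split across two islands.
    islands = {0: list(range(0, 8)), 1: list(range(8, 16))}
    for island_id, channels in islands.items():
        for line in run_island(island_id, channels):
            print(line)
```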
What do we need
• Processing Level 1:
 – Custom framework to provide scale-out
• Processing Level 2:
 – Many similarities to Big Data frameworks
 – Need to modify / develop for high throughput
• High Throughput Data Analytics Framework (HTDAF)
 – Possibly a development of something like SPARK (see the sketch after this list)
 • New data models
 • Links to external processes
 • Memory management between framework and processes
 – The shared file system needs to be very intelligent about data placement / management, or be tightly coupled with the HTDAF
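A minimal PySpark sketch of the kind of HTDAF usage hinted at above: frequency channels as partitions, each handed to an external process on the worker, with Spark handling scheduling and retries. The executable name "grid_and_image" and the channel count are purely illustrative assumptions.

```python
# Sketch: per-channel work expressed as a Spark job that pipes to an external
# process (the "links to external processes" point above).  Submit with spark-submit.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("sdp-htdaf-sketch")
sc = SparkContext(conf=conf)

n_channels = 1024                                   # assumed number of frequency channels
channels = sc.parallelize(range(n_channels), numSlices=n_channels)

# Each channel id (one per line) is piped through a hypothetical external
# imaging executable; Spark collects the textual results back to the driver.
results = channels.map(str).pipe("grid_and_image").collect()
print(f"processed {len(results)} channel results")
```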
StackHPC
Performance Prototype Platform
▸ Specification complete early 2017
▸ Deployment started late March
▸ Early users started April
▸ Hosted and Managed by Cambridge University, UIS
StackHPC
Performance Prototype Platform
ALASKA – Hardware
▸ 3x Control nodes
▸ 29x Compute nodes
▸ 2x High-memory nodes
▸ 2x GPU nodes
▸ 1x NVMe Storage node
▸ 2x SSD Storage nodes
▸ 5x ARM64 Ceph Storage Cluster
Prototyping Platform
Prototyping Platform - Hardware
BASED ON CURRENT HARDWARE CONCEPT OF A COMPUTE ISLAND
Performance Prototype Platform
Technology Evaluation Zoo
 – GPU
 – NVMe
 – ARM64
 – HPC network fabrics
 – High memory
Software Evaluation Zoo
 – Ironic
 – Magnum
 – Sahara
 – Monasca
 – SDN
Application Evaluation Zoo
 – Spark et al.
 – MPI workloads
 – Containerised workloads
 – RDMA data flow models
 – Programmable networks
 – Stimulus generation
Performance Prototype Platform
Developing Ironic
 – Zero-touch registration
 – Scalable provisioning
 – Multi-network support
 – Reconfigurable BIOS & RAID
Developing Monasca
 – Logging via Monasca
 – Postgres data store
 – Grafana visualisation
 – Multi-tenant logging service
 – HPC monitoring services
Developing Kolla
 – Monasca containers
 – Kolla-on-Bifrost
Prototyping Work So Far
• P3 system is isolated in the WCDC
• Infrastructure is software defined
• SDN enablement
• Support for SIP
• Docker Swarm, Mesos, SPARK, HPCaaS
• CephFS subsystem
Next Steps (short-term)
• Hot buffer – POSIX cluster file system
• Cold buffer – object store or file system
• High-performance monitoring and logging
• DBaaS
SKA and OpenStack Science-WG
We are not alone…
BACKUP SLIDES
What is the SKA?
• Next-generation radio telescope – compared to the best current instruments it offers:
 • ~100 times the sensitivity
 • ~10⁶ times faster imaging of the sky
 • More than 5 square km of collecting area on scales of up to 3000 km
• Two phases (2023 and 2030)
• Will address some of the key problems of astrophysics and cosmology (and physics)
• Builds on techniques developed in Europe
• It is an interferometer
• Uses innovative technologies...
• Major ICT project
• Need performance at low unit cost
Pulsars as Natural Clocks: Testing Gravity
• Pulsars are rotating neutron stars
• Pulse once per revolution → very accurate clocks
• The SKA will detect around 30,000 pulsars in the Galaxy
• Relativistic binaries to test gravity
• A timing network of pulsars to detect gravitational waves
SKA Context Diagram
SDP – these are off-site! (in Perth & Cape Town)
Science Regional Centres