Distributed Resource Management and Parallel Computation
Dr Michael Rudgyard, Streamline Computing Ltd
Streamline Computing Ltd
• Spin-out of Warwick (& Oxford) Universities
• Specialising in distributed (technical) computing
– Cluster and GRID computing technology
• 14 employees & growing; focussed expertise in:
– Scientific computing
– Computer systems and support
– Presently 5 PhDs in HPC and parallel computation
– Expect growth to 20+ people in 2003
Strategy
• Establish an HPC systems integration company…
• …but re-invest profits into software:
– Exploiting IP and significant expertise
– First software product released
– Two more products in prototype stage
• Two complementary ‘businesses’
– Both high-growth
Track Record (2001 to date)
• Installations include:
– Largest Sun HPC cluster in Europe (176 processors)
– Largest Sun / Myrinet cluster in the UK (128 processors)
– AMD, Intel and Sun clusters at 21 UK universities
– Commercial clients include Akzo Nobel, Fujitsu, McLaren F1, Rolls-Royce, Schlumberger, Texaco…
• Delivered a 264-processor Intel/Myrinet cluster:
– 1.3 Tflop/s peak!
– Forms part of the White Rose Computational Grid
Streamline and Grid Computing
• Pre-configured ‘grid’-enabled systems:
– Clusters and farms
– The SCore parallel environment
– Virtual ‘desktop’ clusters
• Grid-enabled software products:
– The Distributed Debugging Tool
– Large-scale distributed graphics
– Scalable, intelligent & fault-tolerant parallel computing
‘Grid’-enabled turnkey clusters
• Choice of DRMs and schedulers:
– (Sun) GridEngine
– PBS / PBS-Pro
– LSF / ClusterTools
– Condor
– Maui Scheduler
• Globus 2.x gatekeeper (Globus 3 to follow?)
• Customised access portal
The SCore parallel environment
• Developed by the Real World Computing Partnership in Japan (www.pccluster.org).
• Unique features that are unavailable in most parallel environments:
– Low-latency, high-bandwidth MPI drivers
– Network transparency: Ethernet, Gigabit Ethernet and Myrinet
– Multi-user time-sharing (gang scheduling)
– O/S-level checkpointing and failover
– Integration with PBS and SGE
– MPICH-G port
– Cluster management functionality
‘Desktop’ Clusters
• Linux workstation strategy:
– Integrated software stack for HPTC (compilers, tools & libraries), cf. UNIX workstations
• Aim to provide a GRID at point of sale:
– Single point of administration for several machines
– Files served from the front-end
– Resource management
– Globus-enabled
– Portal
• A cluster with monitors!
The Distributed Debugging Tool
• A debugger for distributed parallel applications
– Launched at Supercomputing 2002
• Aim is to be the de facto HPC debugging tool:
– Linux ports for the GNU, Absoft, Intel and PGI compilers
– IA64 and Solaris ports; AIX and HP-UX soon…
– Commodity pricing structure!
• Existing architecture lends itself to the GRID:
– Thin-client GUI + XML middleware + back-end
– Expect a GRID-enabled version in 2003
Distributed Graphics Software
• Aims:
– To enable very large models to be viewed and manipulated using commodity clusters
– Visualisation on a (local or remote) graphics client
• Technology:
– Sophisticated data-partitioning and parallel I/O tools
– Compression using distributed model simplification
– Parallel (real-time) rendering
• To be GRID-enabled within the e-Science ‘Gviz’ project
Parallel Compiler and Tools Strategy
• Aim to invest in new computing paradigms
• Developing parallel applications is far from trivial:
– OpenMP does not map well onto cluster architectures
– MPI is too low-level (a minimal example follows this slide)
– Few skills in the marketplace!
– Yet growth of MPPs is exponential…
• Most existing applications are not GRID-friendly:
– # of processors fixed
– No fault tolerance
– Little interaction with the DRM
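A minimal example of the ‘MPI is too low-level’ point: even a trivial block decomposition forces the programmer to manage rank arithmetic and collectives by hand, with the process count fixed at launch. This is a generic illustration, not Streamline source.

/* Minimal MPI data decomposition: the programmer must derive the
 * local work range from the (fixed) process count by hand.
 * Compile: mpicc -o decomp decomp.c    Run: mpirun -np 4 ./decomp */
#include <mpi.h>
#include <stdio.h>

#define N 1000000  /* global problem size */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Explicit block partitioning: indices lo..hi-1 belong to this rank. */
    int chunk = (N + size - 1) / size;
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    double local = 0.0, total = 0.0;
    for (int i = lo; i < hi; i++)
        local += 1.0 / (1.0 + (double)i);   /* stand-in for real work */

    /* Even a global sum is an explicit collective call. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f (computed on %d fixed processes)\n", total, size);

    MPI_Finalize();
    return 0;
}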
DRM for Parallel Computation
• Throughput of parallel jobs is limited by:
– Static submission model: ‘mpirun -np …’
– Static execution model: # of processors fixed
– Scalability: many jobs use too many processors!
– Job starvation
• Available tools can only solve some of these issues:
– Advance reservation and back-fill (e.g. Maui)
– Multi-user time-sharing (gang scheduling)
• The application itself must take responsibility!
Dynamic Job Submission
• The job scheduler should decide the available processor resource!
• The application then requires:
– In-built partitioning / data management
– An appropriate parallel I/O model
– Hooks into the DRM
• The DRM requires:
– Typical memory and processor requirements
– LOS information
– Hooks into the application (a launcher sketch follows this slide)
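A minimal sketch of such a hook, assuming a Sun Grid Engine environment: SGE exports NSLOTS to parallel jobs, so a launcher can take the processor count from the scheduler’s grant rather than from a hand-typed ‘-np’. The ./solver binary is a hypothetical stand-in; the richer memory/LOS exchange described above is not shown.

/* Sketch: let the DRM, not the user, fix the processor count.
 * NSLOTS is set by Sun Grid Engine for parallel-environment jobs;
 * we fall back to 1 when running outside the DRM. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *s = getenv("NSLOTS");
    int nprocs = s ? atoi(s) : 1;
    if (nprocs < 1) nprocs = 1;

    /* A DRM-aware launcher builds the mpirun command from the granted
     * allocation instead of a user-supplied '-np' value. */
    char cmd[256];
    snprintf(cmd, sizeof cmd, "mpirun -np %d ./solver", nprocs);
    printf("launching: %s\n", cmd);
    return system(cmd);
}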
Dynamic Parallel Execution
• Additional resources may become available or be required by other applications during execution…
• Ideal situation:
– DRM informs the application
– Application dynamically re-partitions itself
• Other issues:
– DRM requires knowledge of the application (the benefit of data redistribution must outweigh its cost!)
– Frequency of dynamic scheduling
– Message passing must have dynamic capabilities (e.g. MPI-2 process spawning; see the sketch after this slide)
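A sketch of that dynamic capability using the standard MPI-2 MPI_Comm_spawn interface. A real DRM-driven application would take n_extra from a scheduler notification; here it is a constant, and ./worker is a hypothetical companion binary.

/* Grow a running computation by spawning workers on newly granted
 * processors, then repartition data across the enlarged job. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Comm children = MPI_COMM_NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int n_extra = 2;  /* in practice: granted by the DRM at run time */

    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, n_extra, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

    if (rank == 0)
        printf("spawned %d extra workers; repartition data next\n", n_extra);

    /* The worker side mirrors this call on its parent communicator. */
    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}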
The Intelligent Parallel Application
• Optimal scheduling requires more information:
– How well the application scales
– Peak and average memory requirements
– Application performance vs. architecture
• The application ‘cookie’ concept (sketched after this slide):
– Application (and/or DRM) should gather information about its own capabilities
– DRM can then limit the number of available processors
– Ideally requires hooks into the programming paradigm…
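One possible shape for the cookie, as a sketch: a small record of measured scaling behaviour persisted where the DRM can read it before deciding an allocation. The field names and the solver.cookie file are illustrative assumptions, not a defined format.

/* An application 'cookie': self-knowledge the scheduler can act on. */
#include <stdio.h>

typedef struct {
    char   app_name[64];
    int    max_useful_procs;   /* beyond this, speed-up flattens out    */
    double mem_per_proc_mb;    /* peak memory footprint per process     */
    double parallel_fraction;  /* Amdahl fraction measured on past runs */
} app_cookie;

int main(void)
{
    app_cookie c = { "solver", 64, 512.0, 0.97 };

    /* The application (and/or DRM) records measurements after each run... */
    FILE *f = fopen("solver.cookie", "wb");
    if (!f) return 1;
    fwrite(&c, sizeof c, 1, f);
    fclose(f);

    /* ...and the scheduler can cap the allocation accordingly. */
    printf("advise DRM: at most %d processors for %s\n",
           c.max_useful_procs, c.app_name);
    return 0;
}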
Fault Tolerance
• On large MPPs, processors/components will fail!
• Applications need fault tolerance:
– Checkpointing + RAID-like redundancy (cf. SCore); a minimal sketch follows this slide
– Dynamic repartitioning capabilities
– Interaction with the DRM
– Transparency from the user’s perspective
• Fault tolerance relies on many of the capabilities described above…
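A minimal sketch of application-level checkpointing, the simplest ingredient in the list above: each rank periodically dumps its partition so a restart (possibly after repartitioning onto a different processor count) can resume from the last complete step. File names and the interval are illustrative.

/* Per-rank checkpointing inside a time-stepping loop. */
#include <mpi.h>
#include <stdio.h>

#define NLOCAL 1024
#define STEPS  100
#define CKPT_EVERY 10

int main(int argc, char **argv)
{
    int rank;
    double u[NLOCAL] = { 0.0 };
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 1; step <= STEPS; step++) {
        for (int i = 0; i < NLOCAL; i++)
            u[i] += 1.0;                        /* stand-in for real work */

        if (step % CKPT_EVERY == 0) {
            char name[64];
            snprintf(name, sizeof name, "ckpt_r%d_s%d.dat", rank, step);
            FILE *f = fopen(name, "wb");
            if (f) {
                fwrite(u, sizeof(double), NLOCAL, f);
                fclose(f);
            }
            /* Synchronise so the checkpoint set is globally consistent. */
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}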
Conclusions
• Commitment to near-term GRID objectives:
– Turn-key clusters, farms and storage installations
– Ongoing development of ‘GRID-enabled’ tools
– Driven by existing commercial opportunities…
• ‘Blue-sky’ project for next-generation applications:
– Exploits existing IP and an advanced prototype
– Expect moderate income from focussed exploitation
– Strategic positioning: existing paradigms will ultimately be a barrier to the success of (V-)MPP computers / clusters!