Distributed Resource Management and Parallel Computation
Dr Michael Rudgyard, Streamline Computing Ltd
Streamline Computing Ltd
• Spin-out of Warwick (& Oxford) Universities
• Specialising in distributed (technical) computing
– Cluster and GRID computing technology
• 14 employees & growing; focussed expertise in:
– Scientific computing
– Computer systems and support
– Presently 5 PhDs in HPC and parallel computation
– Expect growth to 20+ people in 2003
Strategy
• Establish an HPC systems integration company…
• …but re-invest profits into software:
– Exploiting IP and significant expertise
– First software product released
– Two more products in prototype stage
• Two complementary ‘businesses’
– Both high-growth
Track Record (2001 to date)
• Installations include:
– Largest Sun HPC cluster in Europe (176 processors)
– Largest Sun / Myrinet cluster in the UK (128 processors)
– AMD, Intel and Sun clusters at 21 UK universities
– Commercial clients include Akzo Nobel, Fujitsu, McLaren F1, Rolls-Royce, Schlumberger, Texaco…
• Delivered a 264-processor Intel/Myrinet cluster:
– 1.3 Tflop/s peak!
– Forms part of the White Rose Computational Grid
Streamline and Grid Computing
• Pre-configured ‘grid’-enabled systems:
– Clusters and farms
– The SCore parallel environment
– Virtual ‘desktop’ clusters
• Grid-enabled software products:
– The Distributed Debugging Tool
– Large-scale distributed graphics
– Scalable, intelligent & fault-tolerant parallel computing
‘Grid’-enabled turnkey clusters
• Choice of DRMs and schedulers:
– (Sun) GridEngine
– PBS / PBS-Pro
– LSF / ClusterTools
– Condor
– Maui Scheduler
• Globus 2.x gatekeeper (Globus 3 to follow?)
• Customised access portal
The SCore parallel environment
• Developed by the Real World Computing Partnership in Japan (www.pccluster.org).
• Unique features that are unavailable in most parallel environments:
– Low-latency, high-bandwidth MPI drivers
– Network transparency: Ethernet, Gigabit Ethernet and Myrinet
– Multi-user time-sharing (gang scheduling)
– O/S-level checkpointing and failover
– Integration with PBS and SGE
– MPICH-G port
– Cluster management functionality
‘Desktop’ Clusters
• Linux workstation strategy:
– Integrated software stack for HPTC (compilers, tools & libraries), cf. UNIX workstations
• Aim to provide a GRID at point of sale:
– Single point of administration for several machines
– Files served from the front-end
– Resource management
– Globus-enabled
– Portal
• A cluster with monitors!
The Distributed Debugging Tool
• A debugger for distributed parallel applications
– Launched at Supercomputing 2002
• Aim is to be the de facto HPC debugging tool:
– Linux ports for the GNU, Absoft, Intel and PGI compilers
– IA64 and Solaris ports; AIX and HP-UX soon…
– Commodity pricing structure!
• Existing architecture lends itself to the GRID:
– Thin-client GUI + XML middleware + back-end
– Expect a GRID-enabled version in 2003
Distributed Graphics Software
• Aims:
– To enable very large models to be viewed and manipulated using commodity clusters
– Visualisation on a (local or remote) graphics client
• Technology:
– Sophisticated data-partitioning and parallel I/O tools
– Compression using distributed model simplification
– Parallel (real-time) rendering
• To be GRID-enabled within the e-Science ‘Gviz’ project
Parallel Compiler and Tools Strategy
• Aim to invest in new computing paradigms
• Developing parallel applications is far from trivial:
– OpenMP does not map well onto cluster architectures
– MPI is too low-level (a minimal example follows this slide)
– Few skills in the marketplace!
– Yet growth of MPPs is exponential…
• Most existing applications are not GRID-friendly:
– # of processors fixed
– No fault tolerance
– Little interaction with the DRM
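A minimal example of the ‘MPI is too low-level’ point: even a trivial block decomposition forces the programmer to manage rank arithmetic and collectives by hand, with the process count fixed at launch. This is a generic illustration, not Streamline source.

/* Minimal MPI data decomposition: the programmer must derive the
 * local work range from the (fixed) process count by hand.
 * Compile: mpicc -o decomp decomp.c    Run: mpirun -np 4 ./decomp */
#include <mpi.h>
#include <stdio.h>

#define N 1000000  /* global problem size */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Explicit block partitioning: indices lo..hi-1 belong to this rank. */
    int chunk = (N + size - 1) / size;
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    double local = 0.0, total = 0.0;
    for (int i = lo; i < hi; i++)
        local += 1.0 / (1.0 + (double)i);   /* stand-in for real work */

    /* Even a global sum is an explicit collective call. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f (computed on %d fixed processes)\n", total, size);

    MPI_Finalize();
    return 0;
}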
DRM for Parallel Computation
• Throughput of parallel jobs is limited by:
– Static submission model: ‘mpirun -np …’
– Static execution model: # of processors fixed
– Scalability: many jobs use too many processors!
– Job starvation
• Available tools can only solve some of these issues:
– Advance reservation and back-fill (e.g. Maui)
– Multi-user time-sharing (gang scheduling)
• The application itself must take responsibility!
Dynamic Job Submission
• The job scheduler should decide the available processor resource!
• The application then requires:
– In-built partitioning / data management
– An appropriate parallel I/O model
– Hooks into the DRM
• The DRM requires:
– Typical memory and processor requirements
– LOS information
– Hooks into the application (a launcher sketch follows this slide)
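A minimal sketch of such a hook, assuming a Sun Grid Engine environment: SGE exports NSLOTS to parallel jobs, so a launcher can take the processor count from the scheduler’s grant rather than from a hand-typed ‘-np’. The ./solver binary is a hypothetical stand-in; the richer memory/LOS exchange described above is not shown.

/* Sketch: let the DRM, not the user, fix the processor count.
 * NSLOTS is set by Sun Grid Engine for parallel-environment jobs;
 * we fall back to 1 when running outside the DRM. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *s = getenv("NSLOTS");
    int nprocs = s ? atoi(s) : 1;
    if (nprocs < 1) nprocs = 1;

    /* A DRM-aware launcher builds the mpirun command from the granted
     * allocation instead of a user-supplied '-np' value. */
    char cmd[256];
    snprintf(cmd, sizeof cmd, "mpirun -np %d ./solver", nprocs);
    printf("launching: %s\n", cmd);
    return system(cmd);
}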
Dynamic Parallel Execution
• Additional resources may become available or be required by other applications during execution…
• Ideal situation:
– DRM informs the application
– Application dynamically re-partitions itself
• Other issues:
– DRM requires knowledge of the application (the benefit of data redistribution must outweigh its cost!)
– Frequency of dynamic scheduling
– Message passing must have dynamic capabilities (e.g. MPI-2 process spawning; see the sketch after this slide)
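A sketch of that dynamic capability using the standard MPI-2 MPI_Comm_spawn interface. A real DRM-driven application would take n_extra from a scheduler notification; here it is a constant, and ./worker is a hypothetical companion binary.

/* Grow a running computation by spawning workers on newly granted
 * processors, then repartition data across the enlarged job. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Comm children = MPI_COMM_NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int n_extra = 2;  /* in practice: granted by the DRM at run time */

    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, n_extra, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

    if (rank == 0)
        printf("spawned %d extra workers; repartition data next\n", n_extra);

    /* The worker side mirrors this call on its parent communicator. */
    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}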
The Intelligent Parallel Application
• Optimal scheduling requires more information:
– How well the application scales
– Peak and average memory requirements
– Application performance vs. architecture
• The application ‘cookie’ concept (sketched after this slide):
– Application (and/or DRM) should gather information about its own capabilities
– DRM can then limit the number of available processors
– Ideally requires hooks into the programming paradigm…
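One possible shape for the cookie, as a sketch: a small record of measured scaling behaviour persisted where the DRM can read it before deciding an allocation. The field names and the solver.cookie file are illustrative assumptions, not a defined format.

/* An application 'cookie': self-knowledge the scheduler can act on. */
#include <stdio.h>

typedef struct {
    char   app_name[64];
    int    max_useful_procs;   /* beyond this, speed-up flattens out    */
    double mem_per_proc_mb;    /* peak memory footprint per process     */
    double parallel_fraction;  /* Amdahl fraction measured on past runs */
} app_cookie;

int main(void)
{
    app_cookie c = { "solver", 64, 512.0, 0.97 };

    /* The application (and/or DRM) records measurements after each run... */
    FILE *f = fopen("solver.cookie", "wb");
    if (!f) return 1;
    fwrite(&c, sizeof c, 1, f);
    fclose(f);

    /* ...and the scheduler can cap the allocation accordingly. */
    printf("advise DRM: at most %d processors for %s\n",
           c.max_useful_procs, c.app_name);
    return 0;
}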
Fault Tolerance
• On large MPPs, processors/components will fail!
• Applications need fault tolerance:
– Checkpointing + RAID-like redundancy (cf. SCore); a minimal sketch follows this slide
– Dynamic repartitioning capabilities
– Interaction with the DRM
– Transparency from the user’s perspective
• Fault tolerance relies on many of the capabilities described above…
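A minimal sketch of application-level checkpointing, the simplest ingredient in the list above: each rank periodically dumps its partition so a restart (possibly after repartitioning onto a different processor count) can resume from the last complete step. File names and the interval are illustrative.

/* Per-rank checkpointing inside a time-stepping loop. */
#include <mpi.h>
#include <stdio.h>

#define NLOCAL 1024
#define STEPS  100
#define CKPT_EVERY 10

int main(int argc, char **argv)
{
    int rank;
    double u[NLOCAL] = { 0.0 };
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 1; step <= STEPS; step++) {
        for (int i = 0; i < NLOCAL; i++)
            u[i] += 1.0;                        /* stand-in for real work */

        if (step % CKPT_EVERY == 0) {
            char name[64];
            snprintf(name, sizeof name, "ckpt_r%d_s%d.dat", rank, step);
            FILE *f = fopen(name, "wb");
            if (f) {
                fwrite(u, sizeof(double), NLOCAL, f);
                fclose(f);
            }
            /* Synchronise so the checkpoint set is globally consistent. */
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}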
Conclusions
• Commitment to near-term GRID objectives:
– Turn-key clusters, farms and storage installations
– Ongoing development of ‘GRID-enabled’ tools
– Driven by existing commercial opportunities…
• ‘Blue-sky’ project for next-generation applications:
– Exploits existing IP and an advanced prototype
– Expect moderate income from focussed exploitation
– Strategic positioning: existing paradigms will ultimately be a barrier to the success of (V-)MPP computers / clusters!