Rocks Clusters
SUN HPC Consortium
November 2004
Federico D. Sacerdoti
Advanced CyberInfrastructure Group
San Diego Supercomputer Center
Copyright © 2004 F. Sacerdoti, M. Katz, G. Bruno, P. Papadopoulos, UC Regents
Outline
• Rocks Identity
• Rocks Mission
• Why Rocks
• Rocks Design
• Rocks Technologies, Services, Capabilities
• Rockstar
Rocks Identity
• System to build and manage Linux Clusters
  - A general Linux maintenance system for N nodes
  - Desktops too
  - Happens to be good for clusters
• Free
• Mature
• High performance: designed for scientific workloads
Rocks Mission
• Make Clusters Easy (Papadopoulos, 00)
• Most cluster projects assume a sysadmin will help build the cluster.
• Build a cluster without assuming CS knowledge
  - Simple idea, complex ramifications
  - Automatic configuration of all components and services: ~30 services on the frontend, ~10 on each compute node
• Clusters for scientists
• Results in a very robust system that is insulated from human mistakes
Why Rocks
• Easiest way to build a Rockstar-class machine with SGE ready out of the box
• More supported architectures: Pentium, Athlon, Opteron, Nocona, Itanium
• More happy users: 280 registered clusters, 700-member support list, HPCwire Readers' Choice Award 2004
• More configured HPC software: 15 optional extensions (rolls) and counting.
• Unmatched Release Quality.
Why Rocks
• Big projects use Rocks:
  - BIRN (20 clusters)
  - GEON (20 clusters)
  - NBCR (6 clusters)
• Supports different clustering toolkits:
  - Rocks Standard (RedHat HPC)
  - SCE
  - SCore (single process space)
  - OpenMosix (single process space: on the way)
Rocks Design
• Uses RedHat’s intelligent installer
  - Leverages RedHat’s ability to discover & configure hardware
  - Everyone tries system imaging at first: who has homogeneous hardware? If so, whose cluster stays that way?
• Description-based install: Kickstart, like Sun's Jumpstart
• Contains a viable operating system
  - No need to “pre-configure” an OS (see the sketch below)
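To see what "description based" means, here is an abridged kickstart sketch. The directives are standard RedHat kickstart syntax, but the partition sizes, packages, and password hash are illustrative; the file Rocks actually generates is far richer.

    # Abridged RedHat kickstart sketch (values are illustrative)
    install
    lang en_US
    keyboard us
    rootpw --iscrypted $1$examplehash
    clearpart --all --initlabel
    part / --size 8000
    part swap --size 1000
    %packages
    @ base
    openssh-server
    %post
    # arbitrary shell runs after the packages are installed
    echo "configured by kickstart" > /etc/motd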
Rocks Design
• No special “Rocksified” package structure. Can install any RPM.
• Where the Linux core packages come from:
  - RedHat Advanced Workstation (from SRPMS)
  - Enterprise Linux 3
Rocks Leap of Faith
• Install is the primitive operation for upgrade and patch
  - Seems wrong at first: why must you reinstall the whole thing?
  - Actually right: debugging a Linux system is fruitless at this scale; reinstall enforces stability
  - The primary user has no sysadmin to help troubleshoot
• Rocks install is scalable and fast: 15 min for an entire cluster
  - Post-script work is done in parallel by the compute nodes
• Power admins may use up2date or yum for patches; patches reach compute nodes by reinstall (sketch below)
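A sketch of that patch workflow. up2date and yum are stock RedHat tools; shoot-node is the Rocks helper that forces a compute node to reinstall, and the node name is illustrative.

    # Patch the frontend in place
    up2date -u        # or: yum update
    # Push the change to compute nodes by reinstalling them
    shoot-node compute-0-0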
Rocks Technology
Cluster Integration with Rocks
1. Build a frontend node
   1. Insert CDs: Base, HPC, Kernel, optional Rolls
   2. Answer install screens: network, timezone, password
2. Build compute nodes
   1. Run insert-ethers on the frontend (a dhcpd listener)
   2. PXE boot the compute nodes in name order
3. Start computing (see the command sketch below)
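In command form, the compute-node step looks roughly like this; "Compute" is the appliance type chosen from the insert-ethers menu, and the node names follow the Rocks convention.

    # On the frontend: start the dhcpd listener and choose
    # "Compute" from the appliance menu
    insert-ethers
    # Power on each compute node with PXE boot enabled;
    # insert-ethers names them compute-0-0, compute-0-1, ...
    # in the order their DHCP requests arrive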
Rocks Tech: Dynamic Kickstart File
[Diagram: how the kickstart file is generated at node-install time]
Rocks Roll Architecture
• Rolls are Rocks modules (think Apache modules)
• Software for the cluster:
  - Packaged (3rd-party tarballs)
  - Tested
  - Automatically configured (services)
• RPMs plus a kickstart graph, in ISO form (see the sketch below)
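To make "kickstart graph" concrete: the graph is XML. A roll ships node files (package lists plus post-install scripts) and graph edges that splice them into the base graph. The sketch below shows only the shape; the package and node names are made up.

    <!-- node file: what to install and how to configure it -->
    <kickstart>
      <package>example-app</package>
      <post>
    echo "configured by the example roll" >> /etc/motd
      </post>
    </kickstart>

    <!-- graph file: splice the node into the compute appliance -->
    <graph>
      <edge from="compute" to="example-app"/>
    </graph>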
Rocks Tech: Dynamic Kickstart File
[Diagram: the kickstart graph with the HPC Roll spliced into the base graph]
Rocks Tech: Wide Area Net Install
Install a frontend without CDs
Benefits
• Can install from a minimal boot image (rough sketch below)
• Rolls are downloaded dynamically
• Community can build specific extensions
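Roughly, the flow is as follows. Typing "frontend" at the boot prompt is how Rocks selects a frontend install; the central-server step is paraphrased from this era of Rocks and the details may differ.

    # Boot from the minimal CD and request a frontend install
    boot: frontend
    # The install screens then ask for a "central" server that
    # hosts the rolls; they are downloaded from it over HTTP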
Rocks Tech: Security & Encryption
To protect the kickstart file
Rocks Tech: 411 Information Service
• 411 does what NIS does: distributes passwords and other login files (see the sketch below)
• File based; simple HTTP transport plus multicast change alerts
• Scalable
• Secure
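On a Rocks frontend that looks like the following; the make target and the 411get client are the real 411 interface, and the file chosen is just an example.

    # Frontend: after editing /etc/passwd, encrypt and publish
    # every 411-managed file (listed in /var/411/Files.mk)
    make -C /var/411
    # Compute node: fetch one file by hand (the 411 client
    # normally does this on a multicast alert or periodic poll)
    411get /etc/passwd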
Rocks Services
Rocks Cluster Homepage
Rocks Services: Ganglia Monitoring
Rocks Services: Job Monitoring
SGE Batch System
Rocks Services: Job Monitoring
How a job affects resources on this node
Rocks Services: Configured, Ready
• Grid (Globus, from NMI)
• Condor (NMI), with Globus GRAM
• SGE, with Globus GRAM
• MPD parallel job launcher (Argonne); MPICH 1 and 2 (submit-script sketch after this list)
• Intel Compiler set
• PVFS
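As a taste of the out-of-the-box batch setup, a hedged SGE submit script for an MPICH job; the "mpich" parallel environment is the conventional SGE/MPICH integration, and the slot count and binary name are illustrative.

    #!/bin/bash
    # Hedged SGE + MPICH submit script (PE name, slot count, and
    # binary are illustrative)
    #$ -cwd
    #$ -pe mpich 4
    mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./my_mpi_app

Submitted with qsub; SGE fills in $NSLOTS and writes the machine file for the slots it granted.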
Rocks Capabilities
High Performance Interconnect Support
• Myrinet
  - All major versions, GM2
  - Automatic configuration; supported in Rocks since the first release
• Infiniband, via collaboration with AMD & Infinicon
  - Native IB and IPoIB
Rocks Visualization “Viz” Wall
• Enables LCD clusters
  - One PC per tile
  - Gigabit Ethernet
  - Tile frame
• Applications:
  - Large remote sensing
  - Volume rendering
  - Seismic interpretation
  - Electronic Visualization Lab
  - Bioinformatics
  - Bio-imaging (NCMIR BioWall)
Rockstar
Rockstar Cluster
• Collaboration between SDSC and SUN
• 129 nodes: Sun V60x (dual P4 Xeon)
• Gigabit Ethernet networking (copper)
• Top500 list positions: 201, 433
• Built on the showroom floor of Supercomputing Conference 2003
  - Racked, wired, installed: 2 hrs total
  - Running apps through SGE
Building of Rockstar
[MPEG-4 video of the Rockstar build]
Rockstar Topology
• 24-port switches
• Not a symmetric network
  - Best case: 4:1 bisection bandwidth; worst case: 8:1; average: 5.3:1
  - (An N:1 ratio means the network delivers 1/N of full bisection bandwidth)
• Linpack achieved 49% of peak
  - Very close to the percentage of peak of the 1st-generation DataStar at SDSC
Rocks Future Work
• High availability: N frontend nodes
  - Not that far off (supplemental install-server design)
  - Limited by the batch system
  - Frontends are long-lived in practice: Keck 2 cluster (UCSD) uptime: 249 days, 2:56
• Extreme install scaling
• More Rolls!
• Refinements
www.rocksclusters.org
• Rocks mailing list: https://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
• Rocks Cluster Register: http://www.rocksclusters.org/rocks-register
• Core: {fds,bruno,mjk,phil}@sdsc.edu