Building a High-performance Computing Cluster Using FreeBSD
BSDCon '03, September 10, 2003
Brooks Davis, Michael AuYeung, Gary Green, Craig Lee
The Aerospace Corporation, El Segundo, CA
{brooks,lee,mauyeung}@aero.org, Gary.B.Green@aero.org
HPC Clustering Basics
● HPC Cluster features:
– Commodity computers
– Networked to enable distributed, parallel computations
– Vastly lower cost compared to traditional supercomputers
● Many, but not all, HPC applications work well on clusters
Cluster Overview
● Fellowship is the Aerospace Corporate Cluster
– Name is short for "The Fellowship of the Ring"
● Running FreeBSD 4.8-STABLE
● Over 183 GFlops of floating point performance using the LINPACK benchmark
Cluster Overview: Nodes and Servers
● 160 Nodes (320 CPUs)
– Dual-CPU 1U systems with Gigabit Ethernet
– 86 Pentium III (7 1GHz, 40 1.26GHz, 39 1.4GHz)
– 74 Xeon 2.4GHz
● 4 Core Systems
– frodo – management server
– fellowship – shell server
– gamgee – backup, database, monitoring server
– legolas – scratch server (2.8TB)
Cluster Overview: Network and Remote Access
● Gigabit Ethernet network
– Cisco Catalyst 6513 switch
– Populated with 11 16-port 10/100/1000T blades
● Serial console access
– Cyclades TS2000 and TS3000 Terminal Servers
● Power control
– Baytech RPC4 and RPC14 serial power controllers
Cluster Overview: Physical Layout
Design Issues
● Operating System
● Hardware Architecture
● Network Interconnects
● Addressing and Naming
● Node Configuration Management
● Job Scheduling
● System Monitoring
Operating System
● Almost anything can work
● Considerations:
– Local experience
– Needed applications
– Maintenance model
– Need to modify OS
● FreeBSD
– Diskless support
– Cluster architect is a committer
– Ease of upgrades
– Linux Emulation
Hardware Architecture
● Many choices:
– i386, SPARC, Alpha
● Considerations:
– Price
– Performance
– Power/heat
– Software support (OS, apps, dev tools)
● Intel PIII/Xeon
– Price
– OS Support
– Power
Network Interconnects
● Many choices
– 10/100 Ethernet
– Gigabit Ethernet
– Myrinet
● Issues
– price
– OS support
– application mix
● Gigabit Ethernet
– application mix
● middle ground between tightly and loosely coupled applications
– price
Addressing and Naming Schemes
● To subnet or not?
● Public or private IPs?
● Naming conventions
– The usual rules apply to core servers
– Large clusters probably want more mechanical names for nodes
● 10.5/16 private subnet
● Core servers named after Lord of the Rings characters
● Nodes named and numbered by location (see the sketch after this slide)
– rack 1, node 1:
● r01n01
● 10.5.1.1
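As a hypothetical illustration of this location-based scheme, a tiny sh helper can derive both names from a (rack, node) pair; the script name and layout are illustrative assumptions, not Fellowship's actual tooling:

    #!/bin/sh
    # nodename.sh (hypothetical): map a rack/node pair to the
    # r01n01-style hostname and 10.5/16 address described above.
    rack=$1
    node=$2
    printf 'r%02dn%02d\n' "$rack" "$node"    # rack 1, node 1 -> r01n01
    printf '10.5.%d.%d\n' "$rack" "$node"    # rack 1, node 1 -> 10.5.1.1

For example, "sh nodename.sh 3 12" prints r03n12 and 10.5.3.12.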
Node Configuration Management
● Major methods:
– individual installs
– automated installs
– network booting
● Automation is critical
● Network booted nodes
– PXE (see the dhcpd sketch after this slide)
● Automatic node disk configuration
– version in MBR
– diskprep script
● Upgrade using copy of root
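A minimal sketch of the PXE boot piece, assuming the ISC dhcpd port and FreeBSD's pxeboot loader; the addresses, paths, and MAC below are placeholders, not Fellowship's actual configuration:

    # dhcpd.conf fragment (illustrative)
    subnet 10.5.0.0 netmask 255.255.0.0 {
        next-server 10.5.0.1;                        # TFTP server holding pxeboot
        filename "pxeboot";                          # FreeBSD PXE loader
        option root-path "10.5.0.1:/diskless/root";  # NFS root for nodes

        host r01n01 {
            hardware ethernet 00:11:22:33:44:55;     # placeholder MAC
            fixed-address 10.5.1.1;
        }
    }

Each node gets a fixed address keyed to its MAC, which keeps the r01n01-style naming stable across reboots.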
Job Scheduling
● Options
– manual scheduling
– batch queuing systems (SGE, OpenPBS, etc.)
– custom schedulers
● Sun Grid Engine (example job script after this slide)
– Ported to FreeBSD starting with Ron Chen's patches
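As a sketch of what submitting work to SGE looks like; the parallel environment name "mpi" and the script itself are illustrative assumptions, not Fellowship's actual setup:

    #!/bin/sh
    # hello.sge (hypothetical): toy SGE batch job
    #$ -N hello              # job name
    #$ -cwd                  # run in the submission directory
    #$ -pe mpi 4             # request 4 slots in an assumed "mpi" PE
    #$ -o hello.out
    #$ -e hello.err
    echo "running on $HOSTNAME with $NSLOTS slots"

Submit with "qsub hello.sge" and monitor with "qstat"; SGE sets NSLOTS for parallel jobs.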
System Monitoring
● Standard monitoring tools:
– Nagios (formerly NetSaint)
– Big Sister
● Cluster-specific tools:
– Ganglia
– Most schedulers
● Ganglia
– port: sysutils/ganglia-monitor-core (install sketch after this slide)
● Sun Grid Engine
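A minimal sketch of deploying the Ganglia monitor from that port; the rc.conf knob shown is an assumption about the port's startup script, so check the installed rc script for the real variable name:

    # On each node, build the Ganglia monitoring daemon from ports
    cd /usr/ports/sysutils/ganglia-monitor-core
    make install clean

    # Assumed knob to start gmond at boot (verify against the port)
    echo 'gmond_enable="YES"' >> /etc/rc.conf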
System Monitoring: Ganglia
Lessons Learned
● Hardware attrition can be significant
● Neatness counts in cabling
● System automation is very important
– If you do it to a node, automate it (see the sketch after this slide)
● Much of the HPC community thinks the world is a Linux box
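A minimal sketch of that rule in practice, assuming the r01n01-style names above and passwordless ssh from the management server; the rack and node counts are placeholders:

    #!/bin/sh
    # allnodes.sh (hypothetical): run one command on every node.
    # Usage: sh allnodes.sh uptime
    for rack in $(jot 10); do          # placeholder: 10 racks
        for node in $(jot 16); do      # placeholder: 16 nodes per rack
            host=$(printf 'r%02dn%02d' "$rack" "$node")
            ssh -n "$host" "$@"
        done
    done

jot is the BSD counterpart of seq, printing 1..N one per line.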
FY 2004 Plans
● Switch upgrades: Sup 720 and 48-port blades
● New racks: another row of racks adding 6 more node racks (192 nodes)
● More nodes: either more Xeons or Opterons
● Upgrade to FreeBSD 5.x
Future Directions
● Determining a node replacement policy
● Clustering on demand
● Scheduler improvements
● Grid integration (Globus Toolkit)
● Trusted clusters
Wish List
● Userland:
– Database-driven PXE/DHCP server
● Kernel:
– Distributed file system support (e.g., GFS)
– Checkpoint and restart capability
– BProc-style distributed process management
Acknowledgements
Aerospace
– Michael AuYeung
– Brooks Davis
– Alan Foonberg
– Gary Green
– Craig Lee
Vendors
– iXsystems
– Off My Server
– Iron Systems
– Raj Chahal
● iXsystems, Iron Systems, ASA Computers
Resources
● Paper and presentation:
– http://people.freebsd.org/~brooks/papers/bsdcon2003/