Introduction to Fyrkat
An introduction to Fyrkat
Cluster Computing
Andreas Engelbredt Dalsgaard
May 25, 2011
Andreas Engelbredt Dalsgaard An introduction to Fyrkat
How to get an account
https://fyrkat.grid.aau.dk/useraccount
How to get help
https://fyrkat.grid.aau.dk/wiki
What is a Cluster Anyway
It is NOT something that does any of the following:
Use magic
Automatically make your program run faster
Provide a single virtual OS image of all resources
Always make your software faster
Then what is it?
Terminology
Cluster
A set of closely connected computers
Usually homogeneous
Connectivity: Gigabit Ethernet, InfiniBand/Myrinet/etc.
They usually run some form of *nix
Often has InfiniBand/Myrinet/etc.
Terminology
Grid Computing
Virtual supercomputer
Composed of several clusters
Distributed within an organisation or globally
High-energy physics, e.g. CERN
Cloud Computing
Software as a service
On-demand resources
Cloud storage
Commodity hardware components
Usually virtual access to resources
So what is a cluster used for and by whom
Purpose
To solve problems faster than on a single machine
To solve problems that cannot be solved on a single machine
Performance is everything
Users want the last 10%
Think about how the cache is used
Think about memory organisation
Consider communication overhead
Users
Scientific researchers
Engineers
Academic institutions
Government agencies, e.g. military
What does it look like
Cluster Overview
[Figure: cluster overview diagram. Labels: User, Internet, Frontend node, Storage, Admin node, LAN, Compute nodes]
Cluster Overview (2)
How are clusters used
An LRMS: Local Resource Management System
Also called a batch system
Tasks are split into jobs
Jobs are executed by the batch system
Order of execution depends on the scheduler
A job gets a set of nodes/CPUs (exclusively)
How are most LRMS used
Make a job description
Submit it to the LRMS
The LRMS will decide when to run the job
Torque/PBS, SLURM, LoadLeveler, Condor
Some are open source, some are proprietary
What Runs Fyrkat
OpenSSH is used extensively
Ubuntu 10.04 LTS
Admin node
An LRMS server
OpenLDAP server (user directory service)
Application file server (/pack)
Compilers: GNU and Intel C, C++ and Fortran
Libraries, e.g. Open MPI
DHCP, TFTP, email, mailing list, monitoring, monitoring and monitoring
Frontend node
LRMS client
Mount application filesystem
Firewall
What Runs Fyrkat (2)
Compute nodes
LRMS client
Mount application filesystem
/scratch partition
Everything else is read only
Storage
User data
Automounted on all nodes
Battery-backed RAM, SSD, disks
No guarantees, use backup
A Little About Hardware
For common jobs
84 IBM blade computers in 6 BladeCenters (killing1-84): 672 cores, 1.3 TB RAM
Each with two Xeon E5345 quad-core 2.33 GHz CPUs, 16 GB RAM, a 53 GB scratch partition, Gbit Ethernet and InfiniBand interconnect.
24 HP computers (lion1-24): 288 cores, 3.5 TB RAM
Each with two Xeon X5650 six-core 2.66 GHz CPUs, 145 GB RAM, a 53 GB scratch partition, Gbit Ethernet and InfiniBand interconnect.
Multi-core CPUs are the norm
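The totals quoted for the two node groups follow directly from the per-node figures. A small sketch of that arithmetic in plain sh; the helper name total_cores is only illustrative and is not a Fyrkat tool:

```shell
#!/bin/sh
# Sketch: derive the totals on this slide from the per-node figures.
# total_cores is a hypothetical helper, not part of any cluster software.

# total_cores NODES CPUS_PER_NODE CORES_PER_CPU
total_cores() {
    echo $(( $1 * $2 * $3 ))
}

# killing1-84: 84 nodes, two quad-core Xeon E5345 each
killing_cores=$(total_cores 84 2 4)
# lion1-24: 24 nodes, two six-core Xeon X5650 each
lion_cores=$(total_cores 24 2 6)

# Total RAM in GB: number of nodes times RAM per node
killing_ram_gb=$(( 84 * 16 ))
lion_ram_gb=$(( 24 * 145 ))

echo "killing: $killing_cores cores, $killing_ram_gb GB RAM"
echo "lion:    $lion_cores cores, $lion_ram_gb GB RAM"
```

The RAM sums come out at 1344 GB and 3480 GB, which round to the 1.3 TB and 3.5 TB stated above.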
A Little About Hardware (2)
Test jobs
5 SUN computers (sister1-5)
With a Xeon X3220 quad-core 2.40 GHz CPU, 8 GB RAM, a 66 GB scratch partition, Gbit Ethernet interconnect
Interactive jobs
4 SUN computers (tiger1-4)
With two Xeon X5570 quad-core 2.93 GHz CPUs with hyper-threading (HT) enabled, 68 GB RAM, Gbit Ethernet interconnect.
GPU jobs
10 Colfax computers (cub1-10)
With two Xeon X5570 quad-core 2.93 GHz CPUs with HT enabled, 48 GB RAM, Gbit Ethernet and InfiniBand interconnect, 3x Tesla C2070 or Tesla C1050 (soon GTX 580)
Trends
GPUs used for calculations
Massive parallelism
10 Gbit Ethernet
What people use Fyrkat for
Scout
Analysis of integrated circuits
Continuous systems
E.g. noise analysis in integrated circuits
Mobile communication
Fyrkat
Continuous systems
Simulate an antenna's behaviour
Antenna near-field simulation using the finite-difference time-domain method
Acoustic simulations
Grid computing
How people use Fyrkat
Parameter sweeps
Serial code run in parallel
Distributed jobs
Programs are made explicitly parallel
This is hard work
OpenMP, Pthreads, MPI
MPI is the hard part
Usually it is more of a rewrite than a port
Many MPI solutions for different interconnect types
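A parameter sweep, as listed at the top of this slide, needs no parallel programming: the same serial program is submitted once per parameter value. A minimal sketch, assuming a hypothetical serial program ./simulate that takes the parameter as its only argument; the parameter values are made up, and by default the script only prints the sbatch commands instead of submitting anything:

```shell
#!/bin/sh
# Sketch: one independent serial job per parameter value.
# ./simulate and the parameter values are hypothetical placeholders.
# DRY_RUN=1 (the default here) only prints the sbatch commands.

DRY_RUN=${DRY_RUN:-1}
outdir=${TMPDIR:-/tmp}/sweep.$$
mkdir -p "$outdir"

# write_job FILE PARAM: write a one-task job script for one parameter value
write_job() {
    cat > "$1" <<EOF
#!/bin/sh
#SBATCH --time=5:00
#SBATCH -n 1
#SBATCH --job-name=sweep_$2
./simulate $2
EOF
}

for p in 1 2 4 8; do
    script="$outdir/job_$p.sh"
    write_job "$script" "$p"
    if [ "$DRY_RUN" = 1 ]; then
        echo "sbatch $script"
    else
        sbatch "$script"
    fi
done
```

Each generated script requests a single task, so the batch system is free to spread the jobs over whatever cores are idle.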
Common Considerations
Task generation
Static task generation
Matrix multiplication
Parameter sweeps
State-space exploration, e.g. 15-puzzle
Dynamic task generation
Simulation using the finite-difference time-domain method
Ray tracing
State-space exploration, e.g. 15-puzzle
Task size
Uniform versus non-uniform
Size of Data Associated with Tasks
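When tasks are generated statically and are of uniform size, the work can be divided among workers before anything runs. The sketch below shows one common way to do that, a block distribution of N tasks over P workers; the helper name chunk_bounds is hypothetical and the numbers are only an example:

```shell
#!/bin/sh
# chunk_bounds N P R: print "first last" (0-based, inclusive) for the tasks
# assigned to worker R when N uniform tasks are split over P workers.
# The first N mod P workers receive one extra task. If there are fewer
# tasks than workers, idle workers get an empty range (last = first - 1).
chunk_bounds() {
    cb_n=$1
    cb_p=$2
    cb_r=$3
    cb_base=$(( cb_n / cb_p ))
    cb_extra=$(( cb_n % cb_p ))
    if [ "$cb_r" -lt "$cb_extra" ]; then
        cb_first=$(( cb_r * (cb_base + 1) ))
        cb_last=$(( cb_first + cb_base ))
    else
        cb_first=$(( cb_extra * (cb_base + 1) + (cb_r - cb_extra) * cb_base ))
        cb_last=$(( cb_first + cb_base - 1 ))
    fi
    echo "$cb_first $cb_last"
}

# Example: 10 tasks over 4 workers
r=0
while [ "$r" -lt 4 ]; do
    echo "worker $r: tasks $(chunk_bounds 10 4 "$r")"
    r=$(( r + 1 ))
done
```

A fixed split like this suits uniform tasks. With non-uniform or dynamically generated tasks, some workers would finish early and sit idle, which is why the distinctions on this slide matter.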
Fyrkat
ssh fyrkat.grid.aau.dk
Pretty easy to get access to: talk to your supervisor and use https://fyrkat.grid.aau.dk/useraccount
Scout
ssh scout.es.aau.dk
Hard to get access to - Talk to Torben Larsen or Josva Kleist
How to get help
https://fyrkat.grid.aau.dk/wiki
How to use the job manager on Fyrkat
Software is installed in /pack
Start by adapting your PATH environment variable (see wiki)
Should add: /pack/slurm/bin/
See job queue: squeue
See status of worker nodes: sinfo
See details about a job: scontrol show job <job ID>
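Put together, the steps on this slide might look like the sketch below. The job ID 1234 is a made-up value, and the queries are guarded so the snippet also runs on a machine without SLURM; the wiki remains the reference for the recommended PATH setup:

```shell
#!/bin/sh
# Sketch: put the SLURM tools from /pack on the PATH, then query the cluster.
# The job ID 1234 is a made-up example value.

PATH=/pack/slurm/bin:$PATH
export PATH

# Only run the queries where the SLURM commands are actually available.
if command -v squeue >/dev/null 2>&1; then
    squeue                                        # jobs queued or running
    sinfo                                         # state of the worker nodes
    scontrol show job 1234 || echo "no such job"  # details for one job
else
    echo "SLURM commands not found; is /pack mounted on this machine?"
fi
```

Setting PATH this way only lasts for the current shell session. To keep it, the same two lines would normally go in a shell startup file such as ~/.profile.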
How to submit a job on Fyrkat
my_job_script:
#!/bin/sh
#SBATCH --time=5:00
#SBATCH -n 1
#SBATCH --job-name=hello_world
#SBATCH --error=slurm-%j.err
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=[email protected]
echo Hello World
Submit it: sbatch my_job_script
See job queue: squeue
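After submitting, sbatch replies with a line of the form "Submitted batch job <id>", and that ID is what scontrol and the %j in file names refer to. A small sketch of picking the ID out of that reply; job_id_from is a hypothetical helper, and a sample reply stands in for real sbatch output so the snippet runs without a cluster:

```shell
#!/bin/sh
# Sketch: pull the job ID out of sbatch's confirmation line.
# job_id_from is a hypothetical helper; the sample reply below stands in
# for real sbatch output so the snippet runs without a cluster.

# job_id_from "Submitted batch job 1234" prints 1234
job_id_from() {
    echo "$1" | awk '/^Submitted batch job/ { print $4 }'
}

reply="Submitted batch job 1234"      # on Fyrkat: reply=$(sbatch my_job_script)
jobid=$(job_id_from "$reply")

echo "job id: $jobid"
echo "stdout file: slurm-$jobid.out"  # sbatch's default output file name
echo "stderr file: slurm-$jobid.err"  # from --error=slurm-%j.err in the script
```

With the job script above, "Hello World" should end up in the stdout file in the directory the job was submitted from.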
How to submit an MPI job on Fyrkat
export PATH=/pack/slurm/bin:/pack/openmpi-gnu-ib/bin:$PATH
mpicc mpi_array.c -o mpi_array
mpi_job_script:
#!/bin/sh
#SBATCH --time=5:00
#SBATCH -n 12
#SBATCH --partition=lion
export LD_LIBRARY_PATH=/pack/openmpi-gnu-ib/lib:$LD_LIBRARY_PATH
mpirun mpi_array
Submit it: sbatch mpi_job_script
Demonstration
How to get an account
https://fyrkat.grid.aau.dk/useraccount