Machine overview

ARCHER (a Cray XC30) is a Massively Parallel Processor (MPP) supercomputer built from many thousands of individual nodes.
There are two basic types of node in any Cray XC30:
• Compute nodes (4920)
  • These only run user computation and are always referred to as "compute nodes"
  • 24 cores per node, therefore approximately 120,000 cores in total
• Service/login nodes (72, of which 8 are login nodes)
  • Login nodes allow users to log in and perform interactive tasks
  • Other miscellaneous service functions
• Serial/post-processing nodes (2)
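The headline core count follows directly from the figures above; a quick check:

```shell
# 4920 compute nodes x 24 cores per node
total_cores=$((4920 * 24))
echo "$total_cores cores"   # 118080, i.e. roughly 120,000
```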
About ARCHER
Interacting with the system

Users do not log directly into the system. Instead they run commands via an esLogin server, which relays commands and information via a service node referred to as a "Gateway node".
[Diagram: ARCHER system layout — esLogin nodes connect over the external network (Ethernet) to Gateway nodes; compute and serial nodes sit on the Cray Aries interconnect inside the Cray XC30 cabinets; LNET nodes link via Infiniband to the Lustre OSS servers of the Cray Sonexion filesystem]

User guide
Job submission example

my_job.pbs:

#!/bin/bash --login
#PBS -l select=2
#PBS -N test-job
#PBS -A budget
#PBS -l walltime=0:20:0

# Make sure any symbolic links are resolved to absolute path
export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)

aprun -n 48 -N 24 ./hello_world

Submitting the job returns its ID, and qstat shows it first queued (Q) and then running (R):

nbrown23@eslogin008:~> qsub my_job.pbs
50818.sdb
nbrown23@eslogin008:~> qstat -u $USER
50818.sdb  nbrown23  standard  test-job     --  2  48  --  00:20  Q  --
nbrown23@eslogin008:~> qstat -u $USER
50818.sdb  nbrown23  standard  test-job  29053  2  48  --  00:20  R  00:00

The job passes through the PBS queue and runs on the compute nodes; on completion its standard output and error are returned as test-job.o50818 and test-job.e50818.
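The numbers in the script are consistent with each other: select=2 requests 2 nodes, aprun -N 24 places 24 processes on each node, so -n should be 2 x 24 = 48. A quick sanity check:

```shell
# select=2 nodes, each running -N 24 processes per node
nodes=2
procs_per_node=24
echo $((nodes * procs_per_node))   # 48, matching aprun -n 48
```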
Quick start guide
ARCHER Layout

Compute node architecture and topology
Cray XC30 node

The XC30 compute node features:
• 2 × Intel® Xeon® sockets/dies
  • 12-core Ivy Bridge each
• 64 GB of memory in normal nodes
  • 128 GB in 376 "high memory" nodes
• 1 × Aries NIC
  • Connects to the shared Aries router and the wider network
[Diagram: Cray XC30 compute node and blade — two NUMA nodes, each a 12-core Intel® Xeon® die with 32 GB of DDR3 memory, linked by QPI; the Aries NIC attaches over PCIe 3.0 to the Aries router and the Aries network]
Cray XC30 Rank-1 Network
• Chassis with 16 compute blades
• 128 sockets
• Inter-Aries communication over the backplane
• Per-packet adaptive routing
• 16 Aries connected by the backplane
Cray XC30 Rank-2 Copper Network
• 4 nodes connect to a single Aries
• 6 backplanes connected with copper cables in a 2-cabinet group (768 sockets)
• Active optical cables interconnect groups

[Diagram: copper connections within a 2-cabinet group; optical connections between groups]
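The socket counts quoted for the two network ranks follow from the packaging: 4 nodes per Aries, 2 sockets per node, 16 blades per chassis, and 6 chassis (backplanes) per 2-cabinet group:

```shell
# Rank-1: 16 blades x 4 nodes x 2 sockets per node
chassis_sockets=$((16 * 4 * 2))
echo "$chassis_sockets sockets per chassis"   # 128
# Rank-2: 6 backplanes (chassis) in a 2-cabinet group
group_sockets=$((chassis_sockets * 6))
echo "$group_sockets sockets per group"       # 768
```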
ARCHER Filesystems

Brief Overview
Nodes and filesystems

[Diagram: the login/PP nodes mount the RDF, /home and /work; the compute nodes mount /work]
ARCHER Filesystems
• /home (/home/n02/n02/<username>)
  • Small (200 TB) filesystem for critical data (e.g. source code)
  • Standard performance (NFS)
  • Fully backed up
• /work (/work/n02/n02/<username>)
  • Large (>4 PB) filesystem for use during computations
  • High-performance, parallel (Lustre) filesystem
  • No backup
• RDF (/nerc/n02/n02/<username>)
  • Research Data Facility
  • Very large (26 PB) filesystem for persistent data storage (e.g. results)
  • High-performance, parallel (GPFS) filesystem
  • Backed up via snapshots
User guide
Research Data Facility
• Mounted on machines such as:
  • ARCHER (service and PP nodes)
  • DiRAC BlueGene/Q (frontend nodes)
  • Data Transfer Nodes (DTN)
  • JASMIN
• Data Analytic Cluster (DAC)
  • Run compute-, memory-, or IO-intensive analyses on data hosted on the service
  • Nodes are specifically tailored for data-intensive work, with direct connections to the disks
  • Separate from ARCHER but very similar architecture
RDF guide
ARCHER Software

Brief Overview
Cray's Supported Programming Environment

Programming languages: Fortran, C, C++, Python

Compilers:
• Cray Compiling Environment (CCE)
• GNU compilers
• 3rd-party compilers (e.g. Intel Composer)

Programming models:
• Distributed memory (Cray MPT): MPI, SHMEM
• PGAS & global view: UPC (CCE), CAF (CCE), Chapel
• Shared memory: OpenMP 3.0, OpenACC

I/O libraries: NetCDF, HDF5

Optimized scientific libraries:
• BLAS (libgoto), LAPACK, ScaLAPACK
• Iterative Refinement Toolkit
• Cray Adaptive FFTs (CRAFFT), FFTW
• Cray PETSc (with CASK), Cray Trilinos (with CASK)

Tools:
• Environment setup: Modules
• Performance analysis: CrayPat, Cray Apprentice2
• Debuggers: Allinea (DDT), lgdb
• Debugging support tools: Abnormal Termination Processing, STAT
• Scoping analysis: Reveal
Module environment
• Software is available via the module environment
  • Allows you to load different packages, and different versions of packages
  • Deals with potential library conflicts
• This is based around the module command
  • List currently loaded modules: module list
  • List all available modules: module avail
  • Load a module: module load x
  • Unload a module: module unload x
Best practice guide
ARCHER SAFE

Service Administration
https://www.archer.ac.uk/safe
SAFE
• SAFE is an online ARCHER management system on which all users have an account
  • Request machine accounts
  • Reset passwords
  • View resource usage
• It is the primary way in which PIs manage their ARCHER projects
  • Management of project users
  • Tracking users' project usage
  • Emailing users of the project
SAFE user guide
Project resources
• Machine usage is charged in kAUs
  • This is time spent running your jobs on the compute nodes: 0.36 kAUs per node-hour
  • There is no usage charge for time spent on the login nodes, post-processing nodes or the RDF DAC
  • You can track usage via SAFE or the budgets command (recalculated daily)
• Disk quotas
  • There is no specific charge for disk usage, but all projects have quotas
  • If you need more disk space, contact the PI (or us, if you manage the project)
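As a worked example of the 0.36 kAUs per node-hour rate (the job size here is illustrative): a job on 2 nodes for 3 hours costs 2 x 3 x 0.36 = 2.16 kAUs.

```shell
# Cost in kAUs = nodes x wallclock hours x 0.36 (example: 2 nodes, 3 hours)
awk 'BEGIN { nodes = 2; hours = 3; printf "%.2f kAU\n", nodes * hours * 0.36 }'
# prints 2.16 kAU
```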
User guide
To conclude…
• You will be using ARCHER during this course
• If you have any questions then let us know
• The documentation on the ARCHER website is a good reference tool
  • Especially the quick start guide
• In normal use, if you have any questions or cannot find something, contact the helpdesk
  • [email protected]