Operational computing environment at EARS


Jure Jerman
Meteorological Office
Environmental Agency of Slovenia (EARS)

Outline

Linux cluster at the Environmental Agency of Slovenia, history and present state
Operational experiences
Future requirements for limited area modelling
Needed ingredients for a future system?

History & background

EARS: small service, limited resources for NWP
Small NWP group, research & operations
First research Alpha-Linux cluster (1996) – 20 nodes
First operational Linux cluster at EARS (1997)
  5 x Alpha CPU
  One of the first operational clusters in Europe in the field of meteorology

Tuba – current cluster system

Installed 3 years ago, already outdated
Important for gathering experience
Hardware:
  13 compute nodes, 1 master node
  Dual Xeon 2.4 GHz, 28 GB memory
  Gigabit Ethernet
  Storage: 4 TB IDE2SCSI disk array, xfs filesystem

Tuba software

Open source, whenever possible
Cluster management software:
  OS: RH Linux + SCore (5.8.2) (www.pccluster.org)
  Mature parallel environment
  Lower latency MPI implementation
  Transparent to user
  Gang scheduling
  Pre-empting
  Checkpointing
  Parallel shell
  Automatic fault recovery (hardware or SCore)
  FIFO scheduler
  Capability of integration with OpenPBS and SGE
Lahey and Intel compilers

http://www.pccluster.org/

Ganglia - Cluster health monitoring

Operational experiences

In production for almost 3 years
Unmonitored suite
Minimal hardware-related problems so far!
Some problems with SCore (mainly related to buffers in MPI)
NFS-related problems
ECMWF's SMS solves the majority of problems

Reliability

Operational setup

ALADIN model
290x240x37 domain
9.3 km resolution
54 h integration
Target: 1 h
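
To put the setup figures in perspective, the following minimal sketch computes the number of grid points in the domain and the speed relative to real time that the 1 h target implies. The function and constant names are illustrative assumptions, not part of the ALADIN configuration; only the numbers are taken from the slide.

```python
# Back-of-the-envelope figures for the operational setup.
# Names are hypothetical; only the numbers come from the slide.
NX, NY, NLEV = 290, 240, 37     # domain size: grid points in x, y and levels
FORECAST_HOURS = 54             # integration length
TARGET_WALLCLOCK_HOURS = 1      # target time to solution


def grid_points(nx: int, ny: int, nlev: int) -> int:
    """Total number of grid points in the 3D domain."""
    return nx * ny * nlev


def required_speedup(forecast_hours: float, wallclock_hours: float) -> float:
    """How many times faster than real time the model has to run."""
    return forecast_hours / wallclock_hours


if __name__ == "__main__":
    print("grid points:", grid_points(NX, NY, NLEV))
    print("speed vs. real time:",
          required_speedup(FORECAST_HOURS, TARGET_WALLCLOCK_HOURS))
```

With the slide's numbers this is about 2.6 million grid points, integrated 54 times faster than real time.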

Optimizations

Not everything is in the hardware
Code optimizations
  B-level parallelization (up to 20 % at greater numbers of processors)
  Load balancing of grid point computations (depending on the number of processors)
Parameter tuning
  NPROMA cache tuning
  MPI message size
Improvement in compilers (Lahey -> Intel 8.1: 20 to 25 %)
Still to work on: OpenMP (better efficiency of memory usage)
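
Two of the items above can be illustrated with a short sketch. It assumes that load balancing means spreading grid points as evenly as possible over the processes, and that NPROMA denotes the block length used to split the local grid-point loop so that each block fits in cache. The function names are hypothetical and this is not the ALADIN code itself.

```python
from typing import List, Tuple


def balanced_counts(n_points: int, n_procs: int) -> List[int]:
    """Spread n_points over n_procs so that counts differ by at most one."""
    base, extra = divmod(n_points, n_procs)
    # The first `extra` ranks take one additional point each.
    return [base + 1 if rank < extra else base for rank in range(n_procs)]


def nproma_blocks(n_local: int, nproma: int) -> List[Tuple[int, int]]:
    """Split a local loop of n_local points into [start, end) blocks of
    at most nproma points; the last block may be shorter."""
    return [(start, min(start + nproma, n_local))
            for start in range(0, n_local, nproma)]


if __name__ == "__main__":
    # 290 x 240 horizontal points over 26 processes (13 dual-CPU nodes, assumed).
    counts = balanced_counts(290 * 240, 26)
    print(min(counts), max(counts))
    # Blocks for the first process with a trial block length of 512.
    print(len(nproma_blocks(counts[0], 512)))
```

Tuning then amounts to timing the model for several block lengths and keeping the fastest one on the given processor.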

Non-operational use

Downscaling of ERA-40 reanalysis with the ALADIN model
Estimation of wind energy potential over Slovenia
Multiple nesting of the target computational domain into ERA-40 data
10-year period, 8 years / month
Major question: how to ensure coexistence with the operational suite

Foreseen developments in limited area modeling

Currently ALADIN 9 km
2008-2009: Arome, 2.5 km (ALADIN NH solver + Meso NH physics)
3 times more expensive per grid point
Target Arome: ~200 to 300 times more expensive (same computational domain, same time range)
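
A rough scaling estimate helps to see where a factor of this size comes from. The sketch below assumes that the cost grows with the number of horizontal grid points (resolution ratio squared), that the time step shrinks in proportion to the grid spacing, and that the number of vertical levels stays unchanged. These assumptions and the function name are illustrative and are not stated on the slide.

```python
def relative_cost(dx_old_km: float, dx_new_km: float,
                  per_point_factor: float,
                  refine_time_step: bool = True) -> float:
    """Estimated cost ratio of the new setup over the old one
    for the same domain and the same forecast range."""
    ratio = dx_old_km / dx_new_km
    horizontal = ratio ** 2                         # more points in x and y
    temporal = ratio if refine_time_step else 1.0   # assumed: time step scales with dx
    return horizontal * temporal * per_point_factor


if __name__ == "__main__":
    # 9.3 km ALADIN -> 2.5 km Arome, 3 times more expensive per grid point.
    print(round(relative_cost(9.3, 2.5, 3.0)))
```

Under these assumptions the estimate is roughly 150 times. The quoted 200 to 300 times would then require further factors not given here, for example more vertical levels or a shorter time step than pure scaling suggests.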

How to get there (if?)

Linux commodity cluster at EARS?
First upgrade in mid-2006
5 times the current system (if possible, below 64 processors)
Tests going on with:
  New processors: AMD Opteron, Intel Itanium-2
  Interconnection: InfiniBand, Quadrics?
  Compilers: PathScale (AMD Opteron)
Crucial: parallel file system (TerraGrid), already installed, replacement of NFS
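
The upgrade target implies a minimum gain per processor, which the sketch below works out. It assumes that "5 times the current system" refers to the throughput of the 13 dual-processor compute nodes, that the master node does not count, and that the model scales perfectly. All names are hypothetical.

```python
COMPUTE_NODES = 13      # current Tuba compute nodes
CPUS_PER_NODE = 2       # dual Xeon
SYSTEM_SPEEDUP = 5.0    # upgrade target relative to the current system
MAX_NEW_CPUS = 64       # upper bound on processors in the new system


def min_per_cpu_speedup(current_cpus: int, system_speedup: float,
                        max_new_cpus: int) -> float:
    """Minimum speedup each new processor must deliver over an old one,
    assuming perfect parallel scaling."""
    return current_cpus * system_speedup / max_new_cpus


if __name__ == "__main__":
    current = COMPUTE_NODES * CPUS_PER_NODE
    print(min_per_cpu_speedup(current, SYSTEM_SPEEDUP, MAX_NEW_CPUS))
```

With 26 current processors, each new processor has to be at least about twice as fast as a 2.4 GHz Xeon, and more in practice, since scaling is never perfect.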

How to stay on the open side of the fence?

Linux and other open source projects are evolving
A growing number of increasingly complex software projects
Specific (operational) requirements in meteorology
Space for system integrators
The price/performance gap between commodity and brand-name systems gets smaller as the size of the system grows
The pioneering time of Beowulf clusters seems to be over
Importance of extensive testing of all cluster components

Conclusions

Positive experiences with a small commodity Linux cluster, great price/performance ratio
Our present approach to developing a new cluster works for small clusters, might work for medium-sized ones, and does not work for big systems
The future is probably Linux clusters, but branded ones
