Operational computing environment at EARS. Jure Jerman, Meteorological Office, Environmental Agency of Slovenia (EARS)
Transcript
Slide 1
Operational computing environment at EARS Jure Jerman
Meteorological Office Environmental Agency of Slovenia (EARS)
Slide 2
Outline
- Linux cluster at the Environmental Agency of Slovenia: history and present state
- Operational experiences
- Future requirements for limited area modelling
- Needed ingredients for a future system?
Slide 3
History & background
- EARS: small service, limited resources for NWP
- Small NWP group, research & operations
- First research Alpha-Linux cluster (1996), 20 nodes
- First operational Linux cluster at EARS (1997), 5 x Alpha CPU
- Among the first operational clusters in Europe in the field of meteorology
Slide 4
Tuba, the current cluster system
- Installed 3 years ago, already outdated
- Important for gathering experience
- Hardware:
  - 13 compute nodes, 1 master node; dual Xeon 2.4 GHz; 28 GB memory
  - Gigabit Ethernet
- Storage: 4 TB IDE-to-SCSI disk array, XFS filesystem
Slide 5
Tuba software
- Open source, whenever possible
- Cluster management software:
  - OS: RH Linux + SCore 5.8.2 (www.pccluster.org)
  - Mature parallel environment
  - Lower-latency MPI implementation
  - Transparent to the user
  - Gang scheduling, pre-empting, checkpointing
  - Parallel shell
  - Automatic fault recovery (hardware or SCore)
  - FIFO scheduler; capable of integration with OpenPBS and SGE
- Lahey and Intel compilers
Slide 6
Ganglia - Cluster Health monitoring
Slide 7
Operational experiences
- In production for almost 3 years
- Unmonitored suite
- Minimal hardware-related problems so far!
- Some problems with SCore (mainly related to buffers in MPI)
- NFS-related problems
- ECMWF's SMS solves the majority of problems
Slide 8
Reliability
Slide 9
Operational setup
- ALADIN model
- 290 x 240 x 37 domain
- 9.3 km resolution
- 54 h integration
- Target: 1 h
Slide 10
Optimizations: not everything is in the hardware
- Code optimizations:
  - B-level parallelization (up to 20 % at greater numbers of processors)
  - Load balancing of grid-point computations (depending on the number of processors)
- Parameter tuning:
  - NPROMA cache tuning
  - MPI message size
- Improvement in compilers (Lahey to Intel 8.1: 20-25 %)
- Still to work on: OpenMP (better efficiency of memory usage)
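The NPROMA tuning mentioned above can be pictured with a minimal sketch (mine, not ALADIN code): grid-point computations are carried out over fixed-size blocks of NPROMA points, so the working arrays stay small enough to remain in cache.

```python
# Hypothetical illustration of NPROMA-style blocking (not actual ALADIN
# code): apply a per-point computation block by block so the working set
# fits in cache. The squaring below stands in for physics/dynamics work.

def process_blocked(values, nproma):
    """Apply a per-point computation in blocks of size nproma."""
    out = []
    for start in range(0, len(values), nproma):
        block = values[start:start + nproma]   # working set sized for cache
        out.extend(v * v for v in block)       # stand-in per-point work
    return out

points = list(range(10))
print(process_blocked(points, 4))  # same result as processing all at once
```

The result is independent of NPROMA; only the memory-access pattern changes, which is why NPROMA is a pure tuning parameter.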
Slide 11
Non-operational use
- Downscaling of the ERA-40 reanalysis with the ALADIN model
- Estimation of the wind energy potential over Slovenia
- Multiple nesting of the target computational domain into ERA-40 data
- 10-year period, 8 years / month
- Major question: how to ensure coexistence with the operational suite
Slide 12
Foreseen developments in limited area modelling
- Currently: ALADIN, 9 km
- 2008-2009: AROME, 2.5 km (ALADIN NH solver + Meso-NH physics)
- 3 times more expensive per grid point
- Target AROME: ~200-300 x more expensive (same computational domain, same time range)
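A back-of-envelope check of the cost figure above, under assumptions of mine that are not on the slide: horizontal grid points grow with the square of the resolution ratio, and the time step shrinks linearly with resolution (CFL), so the number of steps grows by the same ratio.

```python
# Rough cost scaling from ALADIN (9.3 km) to AROME (2.5 km), same domain
# and time range. Assumptions (mine): 2D grid-point count scales with the
# square of the resolution ratio; time-step count scales linearly (CFL).
ratio = 9.3 / 2.5          # horizontal resolution ratio, ~3.7
grid_points = ratio ** 2   # ~13.8x more grid points
time_steps = ratio         # ~3.7x more time steps
per_point = 3              # 3x more expensive per grid point (from slide)
total = grid_points * time_steps * per_point
print(round(total))        # ~154x
```

This lands in the same order of magnitude as the slide's ~200-300 x; extra vertical levels and the heavier non-hydrostatic physics would account for the remaining factor.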
Slide 13
How to get there (if?): a Linux commodity cluster at EARS?
- First upgrade in mid-2006
- 5 times the current system (if possible, below 64 processors)
- Tests going on with:
  - New processors: AMD Opteron, Intel Itanium 2
  - Interconnects: InfiniBand, Quadrics?
  - Compilers: PathScale (AMD Opteron)
- Crucial: parallel file system (TerraGrid), already installed, as a replacement for NFS
Slide 14
How to stay on the open side of the fence?
- Linux and other open-source projects are evolving
- A growing number of increasingly complex software projects
- Specific (operational) requirements in meteorology
- Room for system integrators
- The price/performance gap between commodity and brand-name systems narrows as system size grows
- The pioneering era of Beowulf clusters seems to be over
- Importance of extensive testing of all cluster components
Slide 15
Conclusions
- Positive experiences with a small commodity Linux cluster: great price/performance ratio
- Our present way of developing a new cluster works for small clusters, might work for medium-sized ones, and doesn't work for big systems
- The future probably belongs to Linux clusters, but branded ones