OpenMOSIX experiences in Naples
INFN - Napoli (1)
INFM - UDR Napoli (2)
University of Naples, Dept. of Physics (3)
CINECA (Bologna) – November 2002
Rosario Esposito (1) [resposit@na.infn.it]
Paolo Mastroserio (1) [mastroserio@na.infn.it]
Francesco Maria Taurino (1,2) [taurino@na.infn.it]
Gennaro Tortone (1) [tortone@na.infn.it]
Index
- introduction
- our first cluster: MOSIX (Feb 2000)
- Majorana (Jan 2001)
- farm setup: Etherboot & ClusterNFS
- VIRGO experiment (Jun 2001)
- ARGO experiment (Jan 2002)
- conclusions
Introduction (1/2)
Why a Linux farm?
- high performance
- low cost

Problems with big supercomputers:
- high cost
- low and expensive scalability
  (CPU, disk, memory, OS, programming tools, applications)
Introduction (2/2)
Why OpenMOSIX?

In this environment, (open)Mosix has proven to be an optimal solution, giving a general performance boost on the systems where it was deployed:
- network transparency
- preemptive process migration
- dynamic load balancing
- decentralized control and autonomy
Our first cluster: MOSIX (Feb 2000) (1/2)
Our first test cluster was configured in February 2000:
10 PCs, running Mandrake 6.1, acting as public X terminals used by our students to open X sessions on a DEC-Unix AlphaServer.
Those machines had the following hardware configuration:
- Pentium 200 MHz
- 64 MB RAM
- 4 GB hard disk
Our first cluster: MOSIX (Feb 2000) (2/2)
We tried to turn these "X terminals" into something more useful…
Mosix 0.97.3 and kernel 2.2.14 were used to convert those PCs into a small cluster to perform some data-reduction tests (MP3 compression with the bladeenc program).
Compressing a wav file to MP3 format using bladeenc could take up to 10 minutes on a Pentium 200. Using the Mosix cluster, without any source code modification, we were able to compress a ripped audio CD (14-16 songs) in no more than 20 minutes.
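In practice, the whole run amounts to launching one encoder process per track and letting Mosix balance the load. The sketch below illustrates the idea; the bladeenc command-line invocation is an assumption, not taken from the talk:

```python
import subprocess

def encode_all(wav_files, encoder="bladeenc"):
    """Start one encoder process per track and wait for them all.

    On a plain PC these run one after another; under (open)Mosix each
    CPU-bound process is transparently migrated to an idle node, so the
    album finishes in roughly the time of the slowest single track.
    No change to the encoder's source code is required.
    """
    procs = [subprocess.Popen([encoder, wav]) for wav in wav_files]
    return [p.wait() for p in procs]
```

The key point is that ordinary, unmodified processes are the unit of parallelism: migration happens underneath them.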
Having verified the ability of Mosix to reduce the execution time of these "toy" programs, thanks to preemptive process migration and dynamic load balancing, we decided to implement a bigger cluster to offer a high-performance facility to our scientific community…
Majorana (Jan 2001) (1/2)
we decided to build a more powerful Mosix cluster available to all of our users, using low cost solutions and opensource tools (MOSIX, MPI, PVM);
The farm was composed of 6 machines.

5 computing elements, each with:
- Abit VP6 motherboard
- 2 Pentium III @ 800 MHz
- 512 MB RAM (PC133)
- a 100 Mb/s network card (3Com 3C905C)
Majorana (Jan 2001) (2/2)
1 server with:
- Asus CUR-DLS motherboard
- 2 Pentium III @ 800 MHz
- 512 MB RAM (PC133)
- 4 IDE hard disks (OS + home directories in RAID)
- a 100 Mb/s network card (3Com 3C905C) on the public LAN
- a 1000 Mb/s network card (Netgear GA620T) on the private LAN

All of the nodes were connected to a Netgear switch equipped with 8 Fast Ethernet ports and a Gigabit port dedicated to the server.
farm setup: Etherboot & ClusterNFS
Diskless nodes:
- low cost
- eliminates install/upgrade of hardware and software on the diskless client side
- backups are centralized in one single main server
- zero administration at the diskless client side
Etherboot (1/2)
Description: Etherboot is a package for creating ROM images that can download code from the network to be executed on an x86 computer.
Example: maintaining software centrally for a cluster of identically configured workstations.
URL: http://www.etherboot.org
Etherboot (2/2)
The components needed by Etherboot are:
- a bootstrap loader, on a floppy or in an EPROM on a NIC
- a BOOTP or DHCP server, for handing out IP addresses and other information when sent a MAC (Ethernet card) address
- a TFTP server, for sending the kernel images and other files required in the boot process
- an NFS server, for providing the disk partitions that will be mounted when Linux is being booted
- a Linux kernel that has been configured to mount the root partition via NFS
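As an illustration, the DHCP side of such a setup can be described with an ISC dhcpd host entry like the one below. This is a hedged sketch: the MAC address, IP address and file paths are placeholders, not the actual farm configuration from the talk.

```
host node01 {
  hardware ethernet 00:01:02:03:04:05;   # MAC address of the node's NIC
  fixed-address 192.168.1.11;            # IP handed out for that MAC
  filename "/tftpboot/vmlinuz-node";     # kernel image fetched via TFTP
  option root-path "192.168.1.1:/";      # NFS root mounted by the kernel
}
```

One such stanza per diskless node ties the whole boot chain together: the bootstrap loader broadcasts its MAC, receives this IP and filename, loads the kernel over TFTP, and the kernel mounts its root filesystem from the NFS server.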
Diskless farm setup traditional method (1/2)
Traditional method

Server:
- BOOTP server
- NFS server
- separate root directory for each client

Client:
- BOOTP to obtain an IP address
- TFTP or boot floppy to load the kernel
- rootNFS to mount the root filesystem
Diskless farm setup traditional method (2/2)
Traditional method – problems: separate root directory structure for each node
- hard to set up: lots of directories with slightly different contents
- difficult to maintain: changes must be propagated to each directory
ClusterNFS

Description: cNFS is a patch to the standard Universal-NFS server code that "parses" each file request to determine the appropriate match on the server.
Example: when client machine foo2 asks for the file /etc/hostname, it gets the contents of /etc/hostname$$HOST=foo2$$.
URL: https://sourceforge.net/projects/clusternfs
ClusterNFS features
ClusterNFS allows all machines (including the server) to share the root filesystem:
- all files are shared by default
- files for all clients are named filename$$CLIENT$$
- files for a specific client are named filename$$IP=xxx.xxx.xxx.xxx$$ or filename$$HOST=host.domain.com$$
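The server-side name matching can be sketched as follows. This is a simplification written for illustration, not the actual cNFS patch code; the real server works inside the NFS request path rather than on local paths:

```python
import os

def resolve(path, hostname, ip):
    """Pick the file a ClusterNFS-style server would serve to a client.

    Checks the per-host, per-IP and 'all clients' variants in turn,
    falling back to the plain shared file (simplified sketch).
    """
    for candidate in (
        f"{path}$$HOST={hostname}$$",   # file for this specific host
        f"{path}$$IP={ip}$$",           # file for this specific IP
        f"{path}$$CLIENT$$",            # file for all clients
        path,                           # shared file, served as-is
    ):
        if os.path.exists(candidate):
            return candidate
    return None
```

So when foo2 asks for /etc/hostname, the per-host variant wins if it exists; every other node that has no such variant simply sees the shared file.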
Diskless farm setup with ClusterNFS (1/2)
ClusterNFS method

Server:
- BOOTP server
- ClusterNFS server
- single root directory for server and clients

Clients:
- BOOTP to obtain an IP address
- TFTP or boot floppy to load the kernel
- rootNFS to mount the root filesystem
Diskless farm setup with ClusterNFS (2/2)
ClusterNFS method – advantages:
- easy to set up: just copy (or create) the files that need to be different
- easy to maintain: changes to shared files are global
- easy to add nodes
VIRGO experiment (Jun 2001) (1/4)
VIRGO is a collaboration between Italian and French research teams for the realization of an interferometric gravitational-wave detector;
The main goal of the VIRGO project is the first direct detection of gravitational waves emitted by astrophysical sources;
Interferometric gravitational wave detectors produce a large amount of “raw” data that require a significant computing power to be analysed.
To satisfy such a strong requirement of computing power we decided to build a Linux cluster running MOSIX.
VIRGO experiment (Jun 2001) (2/4)

[Diagram: farm layout – 12 SuperMicro 6010H nodes (18 GB each) connected through a data switch and an internal switch; AlphaServer 4100 with 144 GB of storage; VIRGO Lab switch on the LAN]

Hardware

Farm nodes: SuperMicro 6010H
- dual Pentium III 1 GHz
- RAM: 512 MB
- HD: 18 GB
- 2 Fast Ethernet interfaces
- 1 Gbit Ethernet interface (only on the master node)

Storage: AlphaServer 4100
- HD: 144 GB
VIRGO experiment (Jun 2001) (3/4)
The Linux farm has been heavily tested by executing intensive data-analysis procedures based on the Matched Filter algorithm, one of the best ways to search for known waveforms within a signal affected by background noise.
Matched Filter analysis has a high computational cost, as the method consists of an exhaustive comparison between the source signal and a set of known waveforms, called "templates", to find possible matches. Using a larger number of templates improves the identification of known signals, but a greater number of floating-point operations has to be performed.
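The essence of the method can be sketched in a few lines. This is an illustrative toy, not the VIRGO pipeline: a real matched filter works on long data streams and weights the correlation by the detector's noise spectrum, whereas here the score is a plain dot product:

```python
def correlate(signal, template):
    """Slide the template over the signal and return the best
    (score, offset) pair; the score is a raw dot product here."""
    best_score, best_offset = float("-inf"), 0
    for off in range(len(signal) - len(template) + 1):
        score = sum(s * t for s, t in zip(signal[off:], template))
        if score > best_score:
            best_score, best_offset = score, off
    return best_score, best_offset

def best_template(signal, templates):
    """Exhaustive comparison against the template bank: more templates
    means better identification, and more floating-point work."""
    scores = [correlate(signal, t)[0] for t in templates]
    return max(range(len(templates)), key=lambda i: scores[i])
```

Because each template comparison is an independent process-sized chunk of work, a bank of templates maps naturally onto a cluster with process migration.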
Running Matched Filter test procedures on the MOSIX cluster has shown a progressive reduction of execution times, due to the high scalability of the computing nodes and an efficient dynamic load distribution.
VIRGO experiment (Jun 2001) (4/4)
[Figure: speed-up of repeated Matched Filter executions – measured vs. theoretical speed-up for 1 to 24 processors]

The increase in computing speed with respect to the number of processors does not follow an exactly linear curve; this is mainly due to the growth of communication time, spent by the computing nodes transmitting data over the local area network.
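The shape of such a curve can be sketched with a simple overhead model, where each extra node adds a small fraction of communication time. The coefficient below is illustrative only, not fitted to the measured VIRGO data:

```python
def speedup(n, comm_overhead=0.02):
    """Speed-up on n nodes when each additional node costs a fraction
    `comm_overhead` of the run in network communication.
    With comm_overhead=0 this reduces to the ideal linear speed-up."""
    return n / (1.0 + comm_overhead * (n - 1))

# Compare the ideal curve with the overhead model:
for n in (1, 4, 8, 12, 16, 20, 24):
    print(f"{n:2d} processors: ideal {n:5.2f}, model {speedup(n):5.2f}")
```

The model reproduces the qualitative behaviour of the plot: the measured curve tracks the theoretical one closely at first, then falls increasingly below it as communication time grows with the node count.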
ARGO experiment (Jan 2002) (1/3)
The aim of the ARGO-YBJ experiment is to study cosmic rays, mainly cosmic gamma-radiation, at an energy threshold of ~100 GeV, by means of the detection of small size air showers.
This goal will be achieved by operating a full-coverage array in the Yangbajing Laboratory (Tibet, P.R. China) at 4300 m a.s.l.

As we have seen for the Virgo experiment, the analysis of the data produced by Argo requires a significant amount of computing power. To satisfy this requirement we decided to implement an OpenMOSIX cluster.
ARGO experiment (Jan 2002) (2/3)
Currently Argo researchers are using a small Linux farm, located in Naples, consisting of:
- 5 machines (dual 1 GHz Pentium III with 1 GB RAM) running RedHat 7.2 + openMosix 2.4.13
- 1 file server with 1 TB of disk space
ARGO experiment (Jan 2002) (3/3)
At this time the Argo OpenMOSIX farm is mainly used to run Monte Carlo simulations using “Corsika”, a Fortran application developed to simulate and analyse extensive air showers.
The farm is also used to run other applications such as GEANT to simulate the behaviour of the Argo detector.
The OpenMOSIX farm is responding very well to the researchers' computing requirements, and we have already decided to upgrade the cluster in the near future, adding more computing nodes and starting the analysis of real data produced by Argo.
Currently, ARGO researchers in Naples have produced 212 GB of simulated data with this OpenMOSIX cluster.
Conclusions
- the most noticeable features of OpenMOSIX are its load-balancing and process-migration algorithms, which mean that users need no knowledge of the current state of the nodes
- this is most useful in time-sharing, multi-user environments, where users have no means of knowing (and are usually not interested in) the status (e.g. the load) of the nodes
- a parallel application can be executed by forking many processes, just as on an SMP machine, while OpenMOSIX continuously attempts to optimize the resource allocation
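That last point can be made concrete with ordinary fork-based parallelism. The sketch below is plain Unix code with nothing OpenMOSIX-specific in it, which is precisely the point: on an OpenMOSIX cluster each forked child is a candidate for transparent migration to an idle node:

```python
import os

def work(i):
    # CPU-bound toy task; under OpenMOSIX this whole process may be
    # migrated transparently to a less loaded node.
    return sum(j * j for j in range(10_000 + i))

def run_parallel(n):
    """Fork n worker processes, just as on an SMP machine,
    and return their exit statuses."""
    pids = []
    for i in range(n):
        pid = os.fork()
        if pid == 0:            # child: do the work, then exit cleanly
            work(i)
            os._exit(0)
        pids.append(pid)
    # parent: collect every child's exit status
    return [os.waitpid(p, 0)[1] for p in pids]
```

On a single machine the children time-share one CPU; on the cluster the same unmodified program spreads across the nodes, which is what makes the fork model attractive for multi-user farms.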