Cluster
A cluster is a type of parallel or distributed processing system consisting of a collection of interconnected stand-alone computers that work together as a single, integrated computing resource.
clus·ter n. 1. A group of the same or similar elements gathered or occurring closely together; a bunch: “She held out her hand, a small tight cluster of fingers” (Anne Tyler).
2. Linguistics. Two or more successive consonants in a word, as cl and st in the word cluster.
Why Parallel Processing?
Evolutionary Computation
Features: it simulates the mechanism of heredity and evolution in living creatures, it can be applied to several types of problems, and it incurs a huge computational cost.
Since there are several individuals, the task can be divided into subtasks.
High Performance Computing
Top500 (http://www.top500.org)

Ranking | Name              | Rmax (Gflops) | # Proc
1       | ASCI White        | 4938          | 8192
2       | ASCI Red          | 2379          | 9632
3       | ASCI Blue Pacific | 2144          | 5808
4       | ASCI Blue         | 1608          | 6144
5       | SP Power III      | 1417          | 1336
Parallel Computers
Commodity Hardware
CPU: Pentium, Alpha, Power, etc.
Networking: Internet, LAN, WAN, Gigabit, wireless, etc.

PCs + Networking → PC Clusters
Why PC Clusters?
Hardware: commodity, off-the-shelf
Software: open source, freeware
Peopleware: university students and staff, lab nerds

High ability, low cost, easy to set up, easy to use, possession
Top500 (http://www.top500.org)

Ranking | Name                  | Rmax (Gflops) | # Proc
60      | Los Lobos             | 237           | 512
84      | CPlant Cluster        | 232.6         | 580
126     | CLIC PIII 800 MHz     | 143.3         | 528
215     | Kepler PIII 650 MHz   | 96.2          | 196
396     | SCore II/PIII 800 MHz | 64.7          | 132
Contents of this tutorial
Concept of PC clusters
Small cluster
Advanced cluster (hardware, software)
Books, Web sites, …
Conclusions
What is a cluster computing system?
Beowulf Cluster
A Beowulf is a collection of personal computers (PCs) interconnected by widely available networking, running one of several open-source Unix-like operating systems. Some Linux clusters are built for reliability instead of speed; these are not Beowulfs. The Beowulf Project was started by Donald Becker when he moved to CESDIS in early 1994. CESDIS was located at NASA's Goddard Space Flight Center and was operated for NASA by USRA.
http://beowulf.org/
Avalon
Los Alamos National Laboratory
Alpha (140) + Myrinet. The first Beowulf in the Top500 ranking.
http://cnls.lanl.gov/Frames/avalon-a.html
The Berkeley NOW project
The Berkeley NOW project is building system support for using a network of workstations (NOW) to act as a distributed supercomputer on a building-wide scale. April 30, 1997: NOW makes the LINPACK Top500! June 15, 1998: NOW Retreat Finale.
http://now.cs.berkeley.edu/
Cplant Cluster
Sandia National Laboratory. Alpha (580) + Myrinet.
http://www.cs.sandia.gov/cplant/
RWCP Cluster
A typical Japanese cluster: SCore, OpenMP, Myrinet.
http://pdswww.rwcp.or.jp/
Doshisha Cluster
Pentium III 0.8 GHz (256) + Fast Ethernet; Pentium III 1.0 GHz (2×64) + Myrinet 2000
http://www.is.doshisha.ac.jp/cluster/index.html
Let’s start building a simple cluster system!
Simple Cluster
$10,000
8 nodes + gateway (file server), Fast Ethernet, switching hub
What do we need? Hardware
CPU, memory, motherboard, hard disk, case, network card, cable, hub
Normal PCs
Classification of Parallel Computers
What do we need? Software
OS; tools (editor, compiler); parallel library (message passing)
Message Passing Libraries
MPI (Message Passing Interface)
PVM (Parallel Virtual Machine)
PVM was developed at Oak Ridge National Laboratory and the University of Tennessee.
MPI is an API for message passing. 1992: MPI Forum; 1994: MPI-1; 1997: MPI-2.
http://www.epm.ornl.gov/pvm/pvm_home.html
http://www-unix.mcs.anl.gov/mpi/index.html
Implementations of MPI
Free implementations: MPICH, LAM, WMPI (Windows 95/NT), CHIMP/MPI, MPI Light
Vendor implementations (for parallel computers): MPI/PRO
Procedure of constructing clusters
Prepare several PCs
Connect the PCs
Install OS and tools
Install development tools and a parallel library
Installing MPICH/LAM
# rpm -ivh lam-6.3.3b28-1.i386.rpm
# rpm -ivh mpich-1.2.0-5.i386.rpm

# dpkg -i lam2_6.3.2-3.deb
# dpkg -i mpich_1.1.2-11.deb
# apt-get install lam2
# apt-get install mpich
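Once installed, programs are compiled and launched with the wrapper commands both implementations provide. A minimal sketch (the program name and node count are assumptions, not from the slides):

$ mpicc -O2 -o hello hello.c
$ mpirun -np 8 ./hello

With MPICH the participating nodes are typically listed in a machines file; with LAM the daemons are started beforehand with lamboot.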
Parallel programming (MPI)
[Diagram: the user submits jobs/tasks through a gateway to the PC cluster, just as on a massively parallel computer.]
Initialization
Communicator
Acquiring the number of processes
Acquiring the rank
Termination
Programming style sheet
#include "mpi.h"

int main( int argc, char **argv )
{
    int nprocs, myrank;

    MPI_Init( &argc, &argv ) ;
    MPI_Comm_size( MPI_COMM_WORLD, &nprocs ) ;
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank ) ;

    /* parallel procedure */

    MPI_Finalize( ) ;
    return 0 ;
}
Communications
One-by-one (point-to-point) communication: process A and process B send and receive data directly.
Group communication.
[Sending]
MPI_Send( void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm )
  void *buf : sending buffer starting address (IN)
  int count : number of data elements (IN)
  MPI_Datatype datatype : data type (IN)
  int dest : rank of the receiving point (IN)
  int tag : message tag (IN)
  MPI_Comm comm : communicator (IN)

[Receiving]
MPI_Recv( void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status )
  void *buf : receiving buffer starting address (OUT)
  int source : rank of the sending point (IN)
  int tag : message tag (IN)
  MPI_Status *status : status (OUT)
One by one communication
~Hello.c~

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, src, dest, tag = 1000, count;
    char inmsg[10], outmsg[] = "hello";
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    count = sizeof(outmsg) / sizeof(char);
    if (myid == 0) {
        src = 1; dest = 1;
        MPI_Send(&outmsg, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        MPI_Recv(&inmsg, count, MPI_CHAR, src, tag, MPI_COMM_WORLD, &stat);
        printf("%s from rank %d\n", inmsg, src);
    } else {
        src = 0; dest = 0;
        MPI_Recv(&inmsg, count, MPI_CHAR, src, tag, MPI_COMM_WORLD, &stat);
        MPI_Send(&outmsg, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        printf("%s from rank %d\n", inmsg, src);
    }
    MPI_Finalize();
    return 0;
}
One by one communication
MPI_Recv(&inmsg, count, MPI_CHAR, src, tag, MPI_COMM_WORLD, &stat);
MPI_Send(&outmsg, count, MPI_CHAR, dest, tag, MPI_COMM_WORLD);

can be combined into a single call:

MPI_Sendrecv(&outmsg, count, MPI_CHAR, dest, tag, &inmsg, count, MPI_CHAR, src, tag, MPI_COMM_WORLD, &stat);
Calculation of PI (approximation)

π = ∫₀¹ 4/(1+x²) dx

[Graph: y = 4/(1+x²) plotted for 0 ≤ x ≤ 1]
Parallel conversion: the integral is divided into subsections, each subsection is allotted to a processor, and the partial results are assembled.
Group communication

Broadcast:
MPI_Bcast( void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm )
  buf : data; root : rank of the sending point

Communication and operation (reduce):
MPI_Reduce( void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm )
  op : operation handle (MPI_SUM, MPI_MAX, MPI_MIN, MPI_PROD); root : rank of the receiving point
Approximation of PI: programming flow
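The flow can be made concrete with a midpoint-rule pi program. This is a minimal sketch following the steps above (broadcast n, compute local sums, reduce); the variable names and the choice of n are assumptions, not from the slides:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, procs, i, n = 100000;   /* number of subsections (assumed) */
    double h, x, sum = 0.0, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* rank 0 broadcasts the number of subsections to every process */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* each process integrates its own share of 4/(1+x^2) over [0,1] */
    h = 1.0 / n;
    for (i = myid; i < n; i += procs) {
        x = h * (i + 0.5);            /* midpoint of subsection i */
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* the partial results are assembled on rank 0 */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}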
More cluster systems!
Hardware: CPU
Intel Pentium III, IV  http://www.intel.com/
AMD Athlon  http://www.amd.com/
Transmeta Crusoe  http://www.transmeta.com/
Hardware: Network
Ethernet, Gigabit Ethernet, Myrinet, QsNet, Giganet, SCI, Atoll, VIA, InfiniBand
Wake-on-LAN
Hardware: Hard disk
SCSI, IDE, RAID, diskless cluster
http://www.linuxdoc.org/HOWTO/Diskless-HOWTO.html
Hardware: Case
Rack or box; the trade-offs are cost (inexpensive), density (compact), and ease of maintenance.
Software
OS: Linux kernels
Open source, freeware, networking support
Features: the /proc file system, loadable kernel modules, virtual consoles, package management (two of these are illustrated below)
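For example, the /proc file system and the loaded kernel modules can be inspected from the shell (standard Linux commands; the output varies by node):

$ cat /proc/cpuinfo     # CPU type and speed of a node
$ lsmod                 # kernel modules currently loaded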
Linux distributions
Red Hat  www.redhat.com
Debian GNU/Linux  www.debian.org
S.u.S.E.  www.suse.com
Slackware  www.slackware.org
Kernels: http://www.kernel.org/
Administration software
NFS (Network File System), NIS (Network Information System), NTP (Network Time Protocol)
One server, many clients; a sample NFS setup is sketched below.
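A minimal sketch of sharing the gateway's /home over NFS (the host name "gateway" and the 192.168.1.0/24 address range are assumptions):

/etc/exports on the gateway:
  /home 192.168.1.0/255.255.255.0(rw)

/etc/fstab entry on each node:
  gateway:/home  /home  nfs  defaults  0  0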
Resource Management and Scheduling
Process distribution, load balancing, job scheduling of multiple tasks

CONDOR  http://www.cs.wisc.edu/condor/
DQS  http://www.scri.fsu.edu/~pasko/dqs.html
LSF  http://www.platform.com/index.html
The Sun Grid Engine  http://www.sun.com/software/gridware/

A sample Condor submission is sketched after this list.
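As an illustration, a job is handed to Condor with a small submit description file; a minimal sketch (the file and job names are assumptions):

# job.submit
universe   = vanilla
executable = my_job
output     = my_job.out
error      = my_job.err
log        = my_job.log
queue

$ condor_submit job.submit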
Tools for Program Development
Editor: Emacs
Languages: C, C++, Fortran, Java
Compilers:
GNU http://www.gnu.org/
NAG http://www.nag.co.uk
PGI http://www.pgroup.com/
VAST http://www.psrv.com/
Absoft http://www.absoft.com/
Fujitsu http://www.fqs.co.jp/fort-c/
Intel http://developer.intel.com/software/products/compilers/index.htm
Tools for Program Development
Make, CVS
Debuggers: gdb, TotalView  http://www.etnus.com
Free MPI Implementations
MPICH
http://www-unix.mcs.anl.gov/mpi/index.html
Easy to use, high portability: for UNIX, NT/Win, Globus

LAM
http://www.lam-mpi.org/
High availability
[Graphs: speedup vs. number of processes (up to 60) for LAM 6.3.2 and MPICH 1.2.1, at problem sizes 1X, 5X, 10X, 50X, 100X, 500X]
MPICH vs. LAM (SMP)
# nodes: 32 (2 processors each)
Processor: Pentium III 700 MHz
Memory: 128 MB
OS: Linux 2.2.16
Network: Fast Ethernet (TCP/IP), switching hub
Benchmark: DGA; gcc 2.95.3, mpicc, -O2 -funroll-loops
[Graphs: speedup vs. number of processes (up to 30) for LAM 6.4-a3 and MPICH 1.2.0, at problem sizes 1X, 5X, 10X, 50X, 100X, 500X]
MPICH vs. LAM (number of processes)
# nodes: 8
Processor: Pentium III 850 MHz
Memory: 256 MB
OS: Linux 2.2.17
Network: Fast Ethernet (TCP/IP), switching hub
Benchmark: DGA; gcc 2.95.3, mpicc, -O2 -funroll-loops
Profiler
MPE (MPICH)
Paradyn  http://www.cs.wisc.edu/paradyn/
Vampir  http://www.pallas.de/pages/vampir.htm
Message passing libraries for Windows
PVM: PVM 3.4, WPVM
MPI: mpich, WMPI (Critical Software), MPICH/NT (Mississippi State Univ.), MPI/Pro (MPI Software Technology)
Cluster Distribution
FAI http://www.informatik.uni-koeln.de/fai/
Alinka http://www.alinka.com/
Mosix http://www.mosix.cs.huji.ac.il/
Bproc http://www.beowulf.org/software/bproc.html
Scyld http://www.scyld.com/
Score  http://pdswww.rwcp.or.jp/dist/score/html/index.html
Math Libraries
PhiPAC from Berkeley
FFTW from MIT  www.fftw.org
ATLAS (Automatically Tuned Linear Algebra Software)  www.netlib.org/atlas/
ATLAS is an adaptive software architecture; it is faster than other portable BLAS implementations and comparable with the machine-specific libraries provided by vendors.
Math Library
PETSc is a large suite of data structures and routines for both uniprocessor and parallel scientific computing.
http://www-fp.mcs.anl.gov/petsc/
Parallel Genetic Algorithms
Models of Parallel GAs
Master-slave (micro-grained)
Cellular (fine-grained)
Distributed GAs (island, coarse-grained)
Master-slave model
The master node performs crossover, mutation, and selection; the slaves (clients) evaluate individuals.
a) The master delivers each individual to a slave.
b) A slave returns the fitness value as soon as it finishes its calculation.
c) The master then sends it a not-yet-evaluated individual.
A sketch of this dispatch loop in MPI follows.
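A minimal sketch of the master's dispatch loop using the MPI calls introduced earlier (the population size, gene count, and fitness function are assumptions, not from the slides; a termination message to the slaves is omitted for brevity):

#include "mpi.h"

#define POP   64                 /* population size (assumed) */
#define GENES 10                 /* genes per individual (assumed) */

void master(int procs, double pop[POP][GENES], double fit[POP])
{
    MPI_Status st;
    int sent = 0, done = 0, slave;

    /* a) deliver one individual to every slave */
    for (slave = 1; slave < procs && sent < POP; slave++, sent++)
        MPI_Send(pop[sent], GENES, MPI_DOUBLE, slave, sent, MPI_COMM_WORLD);

    while (done < POP) {
        double f;
        /* b) a slave returns a fitness value as soon as it finishes */
        MPI_Recv(&f, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &st);
        fit[st.MPI_TAG] = f;     /* the tag identifies the individual */
        done++;
        /* c) send that slave a not-yet-evaluated individual */
        if (sent < POP) {
            MPI_Send(pop[sent], GENES, MPI_DOUBLE, st.MPI_SOURCE, sent,
                     MPI_COMM_WORLD);
            sent++;
        }
    }
}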
Cellular GAs
Distributed Genetic Algorithms (island GAs)
Each subpopulation evolves independently; at intervals, individuals migrate between subpopulations. A migration sketch follows.
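A minimal sketch of ring migration between islands, one island per MPI process, reusing MPI_Sendrecv from earlier (the number of migrants and the gene count are assumptions):

#include "mpi.h"

#define MIG   4                  /* migrants per exchange (assumed) */
#define GENES 10                 /* genes per individual (assumed) */

/* send MIG emigrants to the next island in the ring and receive
   MIG immigrants from the previous one */
void migrate(double emigrants[MIG][GENES], double immigrants[MIG][GENES],
             int myid, int procs)
{
    MPI_Status st;
    int next = (myid + 1) % procs;
    int prev = (myid + procs - 1) % procs;

    MPI_Sendrecv(emigrants,  MIG * GENES, MPI_DOUBLE, next, 0,
                 immigrants, MIG * GENES, MPI_DOUBLE, prev, 0,
                 MPI_COMM_WORLD, &st);
}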
Searching Ability of DGAs
Books and Web sites
Books
“Building Linux Clusters”
“How to Build a Beowulf”
“High Performance Cluster Computing”
Web sites
IEEE Computer Society Task Force on Cluster Computing  http://www.ieeetfcc.org/
White Paper  http://www.dcs.port.ac.uk/~mab/tfcc/WhitePaper/
Cluster Top500  http://clusters.top500.org/
Beowulf Project  http://www.beowulf.org/
Beowulf Underground  http://www.beowulf-underground.org/
In this tutorial…
The concept of cluster systems, how to build them, and parallel genetic algorithms.
SSI (Single System Image)
Entry point, file directory, control point, virtual network, memory space, job manager, user interface, misc.
Global Computing (Grid)
Several types of computers on the Internet combine into a powerful calculation resource.
Examples: SETI@home, Project RC5
From global to space computing
Distributed Computing