
Stavros C. Farantos- eUROPEAN nETWORK for aDVANCED cOMPUTING tECHNOLOGY for sCIENCE


8/3/2019 Stavros C. Farantos- eUROPEAN nETWORK for aDVANCED cOMPUTING tECHNOLOGY for sCIENCE

http://slidepdf.com/reader/full/stavros-c-farantos-european-network-for-advanced-computing-technology-for 1/70

eUROPEAN nETWORK for aDVANCED cOMPUTING

 tECHNOLOGY for sCIENCE

DISSEMINATION REPORT

Compiled by

Stavros C. Farantos (FORTH)

Institute of Electronic Structure and Laser

Foundation for Research and Technology, Hellas

And

Department of Chemistry, University of Crete


Table of Contents

1. INTRODUCTION

2. ENACTS - Dissemination

3. WHAT HAS BEEN ACCOMPLISHED

4. CONCLUSIONS FOR THE FUTURE

APPENDIX I

CECAM-ENACTS Workshop 2003

CECAM-ENACTS Workshop 2004

APPENDIX II

OVERVIEW OF ENACTS STUDIES

Report on Grid Service Requirements (pp. 326)

Report on High Performance Computing Development for the Next Decade, and its Implications for Molecular Modelling Applications (pp. 103)

Report on Grid Enabling Technologies (pp. 76)

Report on Data Management in HPC (pp. 146)

Report on Distance Learning and Support (pp. 56)

Report on Software Reusability and Efficiency (pp. 81)

Report on Grid Metacenter Demonstrator: Demonstrating a European Metacentre (pp. 41)

Report on Survey of Users' Needs (pp. 69)

APPENDIX III

Article on Distance Learning and Support by ICCC

APPENDIX IV

LIST OF ACRONYMS


1. INTRODUCTION

In this report we present the actions taken to disseminate the work performed in the eight studies of ENACTS over the four-year period 2001-2004. Furthermore, we collect in an appendix the aims and the conclusions of each study, taken from their final reports. We believe this effort is worth pursuing, given that the reports from the eight investigations amount to a total of 898 pages.

Practically all important topics of Grid technology have been considered in the ENACTS studies: computational Grids, data Grids and data management, collaborative Grids, Grid enabling technologies, software efficiency and reusability, distance learning and support, plus the construction of a demonstrator for a future European Metacentre. The current middleware and software which deploy and support a Grid are reviewed, and in the reports the reader can find the results and analysis of questionnaires addressed to leading hardware specialists and to users from all computational sciences. These questionnaires describe the present status of Grid computing and, most importantly, reveal future trends.

The purpose of this report is to highlight the main achievements of the work done in these four years, and we shall try to foresee the future implications of the rapidly advancing Grid technology for computational sciences, specifically for molecular sciences. The importance of the latter stems from the emphasis given in the twenty-first century to studying complex physical systems from first principles, i.e. considering the atomistic composition of matter and the basic laws of physics.

 

For completeness, the following Section reminds us where we started by referring to the main objectives of the ENACTS project, and particularly to the dissemination tasks. In Section 3 we summarize the actions taken in the dissemination project, and in the final Section some general conclusions are drawn about the future implications of computational Grids in Computational Sciences. In Appendix I we tabulate the programs of the two CECAM-ENACTS workshops held in Lyon in 2003 and 2004 respectively. In Appendix II we compile the objectives and the results of each study, extracted from the sectoral report of each project. Appendix III contains the article on Distance Learning and Support written by ICCC, which will be presented at EDEN (European Distance and E-learning Network, http://www.eden-online.org/), June 20-23, 2005, Helsinki, Finland. Finally, Appendix IV lists the meaning of the acronyms widely used in the Grid literature.

 

2. ENACTS - Dissemination

ENACTS is a Co-operation Network in the "Improving Human Potential Access to Research Infrastructures" Programme.

This Infrastructure Co-operation Network brings together High Performance Computing (HPC) Large Scale Facilities (LSF) funded by the DGXII's IHP programme and key user groups. The aim is to evaluate future trends in the way that computational science will be performed and their pan-European implications. As part of the Network's remit, it will run a Round Table to monitor and advise the operation of the four IHP LSFs in this area.

This co-operation network follows on from the successful Framework IV Concerted Action (DIRECT: ERBFMECT970094) and brings together many of the key players from around Europe who offer a rich diversity of High Performance Computing (HPC) systems and services. In ENACTS, our strategy involves close co-operation at a pan-European level: to review service provision and distil best practice, to monitor users' changing requirements for value-added services, and to track technological advances. In HPC the key developments are in the area of Grid computing and are driven by large US programmes. In Europe we urgently need to evaluate the status and likely impacts of these technologies in order to move towards our goal of European Grid computing, a "virtual infrastructure" where each researcher, regardless of nationality or geographical location, has access to the best resources and can conduct collaborative research with top-quality scientific and technological support.

ENACTS provides participants with a co-operative structure within which to review the impact of Grid computing technologies, enabling them to formulate a strategy for increasing the quantity and quality of access provided.

The scope of our network is computational science: the HPC infrastructures which enable it and the researchers, primarily in the physical sciences, who use it. Three of the participants (EPCC, CINECA and CESCA-CEPBA) are LSFs providing Researchers' Access in HPC under the HCM and TMR programmes. All were successful in bidding to Framework Programme V (FP V) for IHP funding to continue their programmes. In this, they have been joined by Parallab and the associated Bergen Computational Physics Laboratory (BCPL); all four LSFs are full partners in this network proposal and plan to co-operate more closely in the Transnational Access programme. Between them, these LSFs have already provided access to over 500 European researchers in a very wide range of disciplines and are thus well placed to understand the needs of academic and industrial researchers. The other 10 ENACTS members are drawn from a range of European organisations, with the aim of including representation from interested user groups and also from centres in economically less favoured regions. Their input will ensure that the Network's strategy is guided by users' needs and is relevant both to smaller start-up centres and to larger, more established facilities.

A list of participants together with their role and skills is given in Table 1, whilst their geographical distribution is illustrated in Figure 1.

Table 1: ENACTS participants by role and skills

Participant       Role      Skills/Interests
EPCC              IHP-LSF   Particle physics, materials science
ICCC Ltd          User      Optimisation techniques, control engineering
UNI-C             LSF       Statistical computing, bioinformatics, multimedia
CSC               User      Meteorology, chemistry
ENS-L             Society   Computational condensed matter physics, chemistry
FORTH             User      Computer science, computational physics, chemistry
TCD               User      Particle physics, pharmaceuticals
CINECA            IHP-LSF   Meteorology, VR
CSCISM            User      Molecular sciences
UiB               IHP-LSF   Computational physics
PSNC              User      Computer science, networking
UPC               IHP-LSF   Meteorology, computer science
NSC               User      Meteorology, CFD, engineering
CSCS-ETH-Zurich   LSF       Computer science, physics


Work Plan

The aims of the ENACTS dissemination activity are:

• to raise awareness of the project and to publicise its activities, particularly its findings and results;
• to provide a mechanism to leverage efforts at a European level;
• to identify, define and undertake exploitation activities which will be beneficial to the operators and users of European Large Scale Facilities, at a pan-European level.

The main focus in Year 4 will be on the following:


WP1: Workshop - CECAM-ENS-L, one of the project partners, will organise a workshop to disseminate the project results to users of molecular simulation techniques, one of the core user groups of computational science. This workshop will be held in France.

WP2: Workshop/conference - ENACTS will host or participate in a European workshop or conference on Grid Computing. This activity will be organised by ICCC.

WP3: Dissemination Report - FORTH, with input from participants, will produce a report on the dissemination activities, with pointers on how the project should be taken forward, e.g. RTD.

3. WHAT HAS BEEN ACCOMPLISHED

WP1:

According to Work Package 1 of the dissemination project, the following actions have been taken. In 2003 CECAM (European Centre for Atomic and Molecular Computations, http://www.cecam.fr/) organized a four-day workshop with emphasis on the implications of High Performance Computing for Atomistic and Molecular Simulations. Glenn Martyna, Jacob Schiotz and Gilles Zerah organized this event by inviting key speakers, mainly computer code developers, to give talks on “Component Architectures, Open Standards and Parallel Algorithms for Molecular and Atomistic Simulations on Large Grids, Supercomputers, Workstations and Clusters”. The event took place in Lyon, October 13-16, and the detailed program of this meeting is shown in Appendix I.

The speakers covered topics such as Ab Initio software for HPC (ABINI, OHMMS, GAMESS), packages for molecular simulations of millions of atoms and the construction of force fields (CP2K, TINKER, CHARMM, AMBER), as well as subjects such as managing large amounts of data (XML), molecular informatics and the development of parallel software with Python and BSP. Talks on developing Grid computing algorithms to break fundamental barriers in molecular simulations were particularly interesting. Among the few existing algorithms is the one used in Folding@Home, a project devoted to the protein folding problem. This Grid paradigm is based on the Internet and on volunteers who offer their PCs to run the programs as screensavers.
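The volunteer-computing paradigm just described can be illustrated with a minimal Python sketch. It is not the Folding@Home protocol: it assumes the work splits into independent units, and the names `WorkUnit`, `VolunteerServer` and `simulate` are hypothetical. The one design point it tries to capture is that volunteer PCs may disappear at any time, so the server must be able to hand an unfinished unit to another volunteer.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    uid: int
    seed: int  # hypothetical: initial condition of one independent trajectory


@dataclass
class VolunteerServer:
    """Hypothetical work-unit server for volunteer computing."""
    pending: deque = field(default_factory=deque)
    in_flight: dict = field(default_factory=dict)  # uid -> (volunteer, WorkUnit)
    results: dict = field(default_factory=dict)    # uid -> result

    def add(self, units):
        self.pending.extend(units)

    def request(self, volunteer):
        """Give the next pending unit to an idle volunteer PC, or None if none is left."""
        if not self.pending:
            return None
        unit = self.pending.popleft()
        self.in_flight[unit.uid] = (volunteer, unit)
        return unit

    def submit(self, uid, result):
        """Accept a returned result; results for units not in flight are ignored."""
        if uid in self.in_flight:
            del self.in_flight[uid]
            self.results[uid] = result

    def reissue(self, volunteer):
        """A volunteer switched off its PC: queue its unfinished units again."""
        lost = [uid for uid, (owner, _) in self.in_flight.items() if owner == volunteer]
        for uid in lost:
            _, unit = self.in_flight.pop(uid)
            self.pending.append(unit)
        return len(lost)


def simulate(unit):
    # Stand-in for a real simulation run: any deterministic function of the seed.
    return (unit.seed * 7919) % 1000


server = VolunteerServer()
server.add(WorkUnit(uid=i, seed=100 + i) for i in range(4))
first = server.request("pc-A")
server.request("pc-B")               # pc-B takes unit 1 ...
server.submit(first.uid, simulate(first))
server.reissue("pc-B")               # ... then goes offline, so unit 1 is queued again
while (unit := server.request("pc-C")) is not None:
    server.submit(unit.uid, simulate(unit))
print(sorted(server.results))        # [0, 1, 2, 3]
```

Because the units are independent, the order in which they come back does not matter; this is what makes the scheme tolerant of unreliable, geographically scattered volunteer machines.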

A second CECAM workshop was organized on November 16-17, 2004, in Lyon, to collect and evaluate the main results of the ENACTS projects. Representatives from each working group highlighted the main objectives and conclusions of their study. It was interesting to see what happens two years after the completion of the first studies. Here we briefly present the events of this meeting. More details can be found in Appendix II, where we have compiled from the published reports the objectives and the conclusions of each project. We hope that this will provide the interested reader with a detailed overview of the ENACTS accomplishments.

I. Grid Service Requirements (2002)


This study was the first ENACTS project, carried out by EPCC and PSNC with the aim of establishing a firm base on which the following projects could rely. The current Grid technology is reviewed, the terminology is explained and, through a well-studied questionnaire, the Grid service requirements are investigated. In their extended report, which amounts to 329 pages, the authors collect important information and draw conclusions about future trends in Grid computational technology.

The objective of the study was to track what the user community expects from the emerging Grid technology. The results will be useful to infrastructure operators, middleware developers, the user community (especially from the physical computational sciences), and policy makers and funding bodies.

The well thought-out and carefully designed questionnaire was addressed to 85 group leaders and examined seven areas of interest:

– Key characteristics of user group
– Level of awareness
– User profile
– Application profile
– Infrastructure
– Security and services
– Future needs

Here are some important outcomes:

– 71% expect the “Grid” to help tackle Grand Challenge problems
– Grid technology is seen as too complicated
– It is often perceived as one further level of complexity on top of HPC
– Security concerns: being “in control”

Discussion: Computational Grids for solving Grand Challenge problems are still missing!

II. High Performance Computing Development for the Next Decade, and its Implications for Molecular Modelling Applications (2002)

The second study of ENACTS was conducted by NSC, EPCC and CSCISM. The objectives were:

• Determine the likely technology and economic trends which will prescribe the hardware architectures of HPC systems over the next 5 to 10 years (from 2001).
  – Survey of the technology roadmap for processors, memory, networking (closely coupled and LAN), data storage, custom-built solutions, and software paradigms and standards.
  – Grid-related technologies.
• A case study focusing on the usefulness and implications of the technologies discussed in the technology roadmap.
  – The implications for the molecular science community.

Seven interviewees participated as individuals and not as company representatives:

– Dr. Martin Walker, Compaq
– Dr. Wolfgang Mertz, SGI
– Mr. Benoit Marchand, Sun
– Dr. Jamshed Mirza, IBM
– Dr. Burton J. Smith, Cray
– Mr. Tadashi Watanabe, NEC
– Dr. David Snelling, Fujitsu

Hewlett-Packard was invited but did not participate.


Here are some of the conclusions from the interviews:

• An increased mass market will provide a basis for technological development.
  – Off-the-shelf components.
  – Linux clusters for HPC.
  – New / non-traditional HPC vendors.
• Future HPC systems are parallel, scalable architectures based on clustered SMPs.
  – COTS as well as proprietary components.
  – Unix, with Linux as target.
  – SMP nodes with one to thousands of processors.
  – Price, not technology, sets the upper limit on size.
• Peak performance as well as price/performance will continue to follow Moore's law.
  – Performance strongly depends on application characteristics.
• Programming models and languages will follow an evolutionary rather than a revolutionary path.
  – MPI and OpenMP for parallel programming.
  – Fortran, C, C++ and Java.
  – Larger efforts than today will be needed to achieve high performance.
  – Lack of adequate tools for software development will become a key issue.
• We strongly believe that the Grid will play a key role in the way HPC is used.
  – A layer between the user and the HPC system.
  – Vendors should take the Grid into consideration when developing new systems and make old software Grid-enabled.
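To see what “following Moore's law” would mean over the 5 to 10 year horizon of this study, the growth factor can be worked out directly. The sketch below assumes a fixed doubling period (18 months by default, with 24 months for comparison); the interviews do not state a period, so both values are assumptions, and the function name `moore_factor` is ours.

```python
def moore_factor(years, doubling_months=18.0):
    """Performance growth after `years`, assuming a doubling every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


for months in (18.0, 24.0):
    for horizon in (5, 10):
        factor = moore_factor(horizon, months)
        print(f"doubling every {months:.0f} months, {horizon:2d} years: x{factor:.1f}")
```

With an 18-month doubling, performance grows by about 10x in five years and about 100x in ten; with a 24-month doubling the factors are about 5.7x and 32x. The spread between the two assumptions is itself a reminder of how sensitive such roadmaps are to the doubling period.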

The case study: Molecular sciences

• Construct four typical model HPC systems five years from 2001:
  – Two based on projections of a proprietary hardware system (IBM SP Power4):
    • A 12 M€ system for a large computing centre.
    • A 3 M€ departmental system.
  – Two based on projections of a COTS system (Intel Pentium 3-4):
    • A 12 M€ system for a large computing centre.
    • A 3 M€ departmental system.
– Model potentials based on Force-Field (FF) parameterizations.
  – Scales as O(M^2) with system size M. Accuracy 20 kcal/mol.
  – Excellent scaling and performance on massively parallel systems.
– Density Functional Theory (DFT) derived potentials.
  – Scales as O(M^3) or O(M^2 log M). Accuracy 3-7 kcal/mol.
  – Needs very efficient FFT, large memory and extremely efficient interconnects.
– Ab initio potentials derived directly from the Schrödinger equation.
  – Scales as O(M^8). Accuracy 0.5 kcal/mol.
  – Requires enormous disk storage, large memory, high bandwidth and low latency. Memory-bound, extreme data connectivity.
– Force-Field methods:
  – Will continue to dominate for studies of liquid crystals, ferroelectric nematic materials and protein folding (up to a few million atoms and time scales up to milliseconds).
– Density Functional Theory (DFT) methods:
  – Accurate electronic, structural and dynamical reactive properties of systems containing 3000 (10000) atoms 5 (10) years from 2001.
  – Will replace FF methods in chemometric applications such as drug design.
  – Accurate simulations relevant to respiration and photosynthesis.
  – Simulations of nano-scale systems such as molecular engines, quantum computation devices and chemical storage of data.
– Ab initio methods:
  – Accuracy 0.2 kcal/mol.
  – Simulations of elementary reactions relevant to the field of atmospheric chemistry and combustion.
  – Computations of cross sections and rate constants of systems of up to 10-50 atoms.
  – Studies of elementary reactions in interstellar space.
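The scaling laws quoted in the case study can be made concrete with a short worked sketch. It assumes the cost is exactly proportional to the quoted power law, with all prefactors ignored, so only ratios between runs are meaningful; the 10x machine speedup used at the end is a hypothetical figure, not one taken from the study.

```python
import math

# Cost models quoted in the case study: cost ~ f(M) for system size M.
# Prefactors are ignored (assumption), so only ratios between runs are meaningful.
COST_MODELS = {
    "Force field, O(M^2)": lambda m: m ** 2,
    "DFT, O(M^3)": lambda m: m ** 3,
    "DFT, O(M^2 log M)": lambda m: m ** 2 * math.log(m),
    "Ab initio, O(M^8)": lambda m: m ** 8,
}


def cost_ratio(cost, m_old, m_new):
    """Factor by which the work grows when the system size goes from m_old to m_new."""
    return cost(m_new) / cost(m_old)


def feasible_growth(p, speedup):
    """For cost ~ M**p, factor by which M can grow if the machine is `speedup` times faster."""
    return speedup ** (1.0 / p)


for name, cost in COST_MODELS.items():
    print(f"{name:22s} doubling M costs x{cost_ratio(cost, 1000, 2000):.1f}")

# Hypothetical 10x faster machine, same wall-clock budget:
for name, p in (("Force field", 2), ("DFT", 3), ("Ab initio", 8)):
    print(f"{name:12s} tractable M grows x{feasible_growth(p, 10.0):.2f}")
```

Doubling M multiplies the cost by 4, 8, about 4.4 and 256 respectively, while a tenfold faster machine lets the tractable system size grow by only about 3.16x, 2.15x and 1.33x. This is one way to read the conclusion that force-field methods will continue to dominate for systems of millions of atoms, while ab initio methods remain confined to a few tens of atoms.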

Recommendations for HPC centres

• Collaborate and team up with other centres. Share resources.
• Expertise in many different areas of system architecture.
• Involvement in open source development.
• Continued emphasis on MPI and OpenMP.
• Increased performance expertise.
• Expertise in Grid infrastructure and middleware.
• Application-specific centres.

Discussion: Most of the conclusions are still valid (2004)!

III. Grid Enabling Technologies (2002)

This study was carried out by FORTH and CSCS. The main objectives were:

1. The report uses the service definitions from the Grid Service Requirements report and evaluates different technologies and middleware which could be used to implement these services.
2. Survey of the current status of computational grids in the molecular science community: physics, chemistry and biology.
3. Description of the authors' experiences in installing Unicore and Globus on a local cluster, thus implementing a small local computational grid.


In the first part of the study, an Internet search was carried out (2002) to locate operating testbeds all over the world, to see what kind of middleware they use and what services they provide. Among the middleware for implementing a Grid (Globus, Unicore, Legion), the authors investigated in detail the toolkit approach of Globus and the abstract job object approach of Unicore, whereas among the established testbed grids they explored the USA-based ALLIANCE and NASA Information Power Grid (IPG) and the European EUROGRID.

The two main conclusions of the study were:

1. Globus as a Grid deployment technology dominates on both sides of the Atlantic Ocean.
2. Most of the applications concern particle physics, astronomy, environmental and meteorological projects and, to a lesser extent, biological and chemical applications.

Discussion: The conclusions are still (2004) valid!

IV. Data Management in HPC (2003)

This study was performed by CINECA and TCD. The main objective was to gain an understanding of the problems associated with storing, managing and extracting information from the large datasets increasingly being generated by computational scientists, and of the technologies which could address these problems.

Principal aims of this study:

• Investigate and evaluate currently available technologies
• Explore new standards and support their development
• Develop good practice for users and centres
• Investigate platform-independent and distributed storage solutions
• Explore the use of different technologies in a coordinated fashion across a wide range of data-intensive application domains

This in-depth analysis included:

1. Basic technology for data management
2. Data models and scientific data libraries
3. Finding data and metadata
4. Higher level projects involving complex data management
5. Enabling technologies for higher level systems
6. Analysis of Data Management Questionnaire
7. Executive summary and recommendations


The conclusions are really interesting:

• 60% of respondents perceived a real benefit from better data management.
• The majority do not have access to sophisticated or high-performance storage systems.
• Many computational scientists are unaware of the evolving GRID technology.
• The use of database management systems (DBMS) is scarce.
• For industry, security and reliability are stringent requirements for users of distributed data services.
• The survey identifies problems arising from interoperability limits:
  - limited access to resources,
  - geographic separation,
  - site-dependent access policies,
  - security assets and policies,
  - data format proliferation,
  - lack of bandwidth,
  - coordination,
  - standardising data formats.
• Resource integration problems arising from different physical and logical schemas:
  - relational databases,
  - structured and semi-structured databases,
  - owner-defined formats, etc.
• Representation of semantic properties of scientific data and their utilization in retrieval systems.
• The questionnaire shows that European researchers are some way behind in their take-up of data management solutions.
• However, many good solutions seem to arise from European projects and GRID programmes in general.
• The European research community expects to benefit hugely from the results of these projects, and more specifically from the demonstration of production-based Grid projects.

V. Distance Learning and Support (2004)

The study was done by ICCC and UNI-C. The main objectives were:

Part II: Various definitions of distance learning and education. Presentation of basic distance learning tools and models, including a discussion of their features, limitations and benefits for prospective users. Although this text is relatively general, the authors want to focus on three main user groups which have been identified: HPC centres (service providers), the Scientific Grid community (researchers and users) and the Industrial Grid community (vendors and end-users). These three target groups are briefly discussed at the end of this section.

Part III: Results from a comprehensive survey distributed among 85 and undertaken among 25 major European research groups. It focuses on different aspects, such as the needs and requirements of various potential target groups and the pedagogical and organisational approach which fits best with the identified target groups. The survey includes a clear analysis of how to ascertain the feasibility, viability and relevance of adapting a proper distance learning strategy to the training requirements, and leads into an evaluation of, and agreement on, a framework for the collaborative development of suitable distance-learning course material.

Part IV: Conclusions and recommendations. The purpose of the survey presented in Part III is to gain a better understanding of these key user groups' needs and requirements with a view to establishing a proper framework for distance learning and support. The analysis of this survey, together with Part II, which presented general concepts and technological issues, will be instrumental in establishing key recommendations for the target groups. This section summarizes the distance learning features offered by the leading but still rather small groups of Grid specialists and users, and makes recommendations on a possible strategy that supports a successful uptake of Grid technology across larger communities.

Discussion: The Grid Service Requirements report highlighted a paradox: there is little enthusiasm for distance learning among computational scientists, yet we need to train a large number of scientists in Grid Computing quickly.

VI. Software Reusability and Efficiency (2004)

The study examines issues surrounding the portability of software and suggests how program efficiency can be ensured across a set of heterogeneous and/or geographically distributed computational platforms and data storage devices. Furthermore, it reviews emerging software technologies as these affect the ability of existing and new applications to make the best use of HPC facilities.

Here are some of their conclusions:

1. Moving large numbers of application codes between heterogeneous HPC systems or to new software environments is only feasible if well-accepted standards, languages and tools are available.

2. An HPC centre or consortium should therefore promote best practices in portable data management towards its user community and provide appropriate advice and training.

3. A tool that is considered mature enough and provides added value should be promoted to the user community of the HPC consortium by providing adequate training material.

4. An HPC user may not rapidly move onto the grid if this requires significant changes to his/her existing application.

5. Adapting to emerging web and grid standards may require regular or frequent changes to the grid-enabled application.

6. The use of open standards should be promoted actively.

7. The Web has become the user interface of global business, and Web services now offer a strong foundation for software interoperability through the core open standards of XML, SOAP, WSDL and UDDI. Models and applications that make use of this huge potential are just beginning to emerge.

8. Many of the new technologies require knowledge of programming languages like Java and C++, scripting languages like Perl, middleware like Globus, and XML-based standards. Several of these are fairly new and evolving.

9. Another emerging form of grid computing is the collaborative grid. Such a grid enables the creation of virtual organizations in which remote research groups can perform joint research and share data.

10. The use of grid technologies will eventually lead to more remote (distant) collaborations. It is therefore essential that mechanisms are in place for code maintenance by a large research group with multiple programmers modifying a single code.
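Conclusion 2 calls for portable data management. One concrete, low-level aspect of this is byte order: a raw binary file written on a little-endian x86 node is misread on a big-endian machine (such as a SPARC/Solaris node) unless the layout is fixed explicitly. The minimal Python sketch below assumes a hypothetical record layout (a 4-byte tag, a 32-bit count, then 64-bit floats) that is not taken from any ENACTS study; `MAGIC`, `pack_values` and `unpack_values` are illustrative names.

```python
import struct

MAGIC = b"ENAC"  # hypothetical 4-byte tag identifying the record type


def pack_values(values):
    """Serialise float64 values with an explicit, platform-independent layout.

    '<' selects little-endian byte order and standard sizes with no padding,
    so the bytes are the same whichever machine writes them.
    """
    return MAGIC + struct.pack(f"<i{len(values)}d", len(values), *values)


def unpack_values(blob):
    """Read back a record written by pack_values, on any platform."""
    if blob[:4] != MAGIC:
        raise ValueError("unrecognised record")
    (count,) = struct.unpack_from("<i", blob, 4)
    return list(struct.unpack_from(f"<{count}d", blob, 8))


data = [1.5, -2.25, 3.0]
blob = pack_values(data)
print(len(blob), unpack_values(blob) == data)  # 32 True
```

Scientific data libraries of the kind reviewed in the Data Management study address the same problem more generally, adding metadata as well; the sketch only shows the underlying principle.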

VII. Grid Metacenter Demonstrator: Demonstrating a European Metacentre (2004)

The ENACTS project would seem incomplete without at least some of us gaining experience of working with a real grid. This was realized through the construction of a Data Grid to run a project on Quantum Chromodynamics. The aims of the project were:

 

• Set up a data grid across the 3 sites using QCDgrid.
• Use a genuine scientific scenario.
• Ensure all data is described using metadata.
• Ensure the data is portable between the systems involved.
• QCDgrid was written to manage the QCD data belonging to the UK QCD community (UKQCD).
• The original grid consisted of 6 geographically dispersed sites (UKQCD).
• Around 5 terabytes of data.
• The amount of data is expected to grow dramatically when QCDOC comes online later in 2004.
• QCDgrid is a layer of software written on top of the Globus Toolkit.
  – Uses the security infrastructure and basic grid operations such as data transfer.
  – Also uses more advanced features such as the replica catalogue.
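The replica catalogue mentioned above maps one logical file name to the physical copies held at different sites, which is what lets a data grid survive the loss of a site. The toy Python class below only illustrates that idea together with the demonstrator's aim that every file be described by metadata; it is not the Globus Toolkit replica catalogue interface, and the class name, file names, site names and metadata keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalogue:
    """Toy replica catalogue: logical file name -> metadata and the sites holding a copy.

    Hypothetical illustration only; not the Globus Toolkit replica catalogue interface.
    """
    entries: dict = field(default_factory=dict)

    def register(self, lfn, metadata):
        # Mirrors the demonstrator's aim that every file be described by metadata.
        if not metadata:
            raise ValueError(f"{lfn}: metadata is required")
        self.entries[lfn] = {"metadata": dict(metadata), "sites": set()}

    def add_replica(self, lfn, site):
        self.entries[lfn]["sites"].add(site)

    def locate(self, lfn):
        """Physical locations of a logical file."""
        return sorted(self.entries[lfn]["sites"])

    def find(self, **query):
        """Logical names whose metadata match every key=value pair in the query."""
        return sorted(
            lfn for lfn, entry in self.entries.items()
            if all(entry["metadata"].get(k) == v for k, v in query.items())
        )

    def remove_site(self, site):
        """Drop a site; return files left with no copy, which must be replicated again."""
        orphaned = []
        for lfn, entry in self.entries.items():
            entry["sites"].discard(site)
            if not entry["sites"]:
                orphaned.append(lfn)
        return sorted(orphaned)


cat = ReplicaCatalogue()
cat.register("cfg/ens1/0001", {"ensemble": "ens1", "format": "binary"})
cat.register("cfg/ens1/0002", {"ensemble": "ens1"})
cat.register("cfg/ens2/0001", {"ensemble": "ens2"})
cat.add_replica("cfg/ens1/0001", "site-A")
cat.add_replica("cfg/ens1/0001", "site-B")
cat.add_replica("cfg/ens1/0002", "site-A")
cat.add_replica("cfg/ens2/0001", "site-C")

print(cat.find(ensemble="ens1"))      # ['cfg/ens1/0001', 'cfg/ens1/0002']
orphaned = cat.remove_site("site-A")
print(orphaned)                       # ['cfg/ens1/0002']
print(cat.locate("cfg/ens1/0001"))    # ['site-B']
```

Keeping logical names separate from physical locations means users query by metadata and never need to know where a copy lives. The cost is that the catalogue schema becomes a central dependency, which fits the difficulties reported below when the replica schema had to be rewritten during a middleware upgrade.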

Conclusions and difficulties

• Certificates
  – Some certificate issuers took several weeks to issue certificates.
  – Different policies on issuing certificates, e.g. for non-human users (project accounts).
  – Not too many difficulties using multiple certificates.
• Moving to a heterogeneous environment
  – Installing Globus 2.x is difficult on Solaris; this led to the Solaris node being unable to submit data.
  – A few minor problems getting system-specific functions to work (e.g. the df command).
  – Usual minor compilation issues; the gcc compiler was required.
• Globus
  – This presented the biggest difficulty!
  – Installation difficulties and firewall issues:
    • several months passed before a “hello world” job would run from any site to any other.
  – Migrating from GT 2.0 to GT 2.4:
    • Major difficulties!
    • Had to rewrite the replica schema.
    • Had to remove some error-handling functionality.

VIII. Survey of Users' Needs (2004)


In its final year, ENACTS launched a questionnaire to find out what European users need from HPC and Grid computing. The final conclusions are as follows.

Access to computing power is the key issue driving involvement in Grid-related activities. It is very encouraging that a large majority of users are willing to share knowledge, tools, data and results. The main challenges Grid computing will face will be political; more agreements will be needed on how to use the data and resources in a multinational Grid infrastructure.

The Grid is at the development stage, and users are willing to contribute to the development of the Grid infrastructure if proper help and support are provided. Virtual tools for education and training could enable faster, cheaper and more environmentally friendly communication. To encourage the adoption of virtual tools, these tools should be made more user-friendly.

The growing amount of data being produced presents challenges. Standardisation and interoperability efforts by the international community are needed for the diffusion of knowledge, cooperation and the best exploitation of resources.

Discussion: these conclusions agree well with what has been learnt in previous years.

WP2:

With respect to Work Package 2, ICCC will present a paper on Distance Learning and Support at the EDEN annual conference, to be held in Helsinki on June 20-23, 2005. The article is shown in Appendix III.

WP3:

This report.


4. CONCLUSIONS

Through its well-planned, in-depth studies, the ENACTS project has shown that Grids of computers are well developed not only at the national level but also through European multinational integration. Computer scientists in most European countries have taken initiatives and secured financial support to create nationwide Grids supporting scientific research, education, economic activities and administration. In its Sixth Framework Programme for research and technology, the European Union approved several Grid computing projects: EGEE (Enabling Grids for e-Science in Europe), DEISA (Distributed European Infrastructure for Supercomputing Applications), HPC-Europa (High Performance Computing Europa) and NextGrid (an architecture for the next-generation Grid). The middleware to deploy and operate a Grid is well established, with Globus taking a leading role.

The key players in hardware development have given their full support to this new technology. IBM, for example, is backing the World Community Grid (http://www.worldcommunitygrid.org/) and its human proteome folding project, which uses distributed computing to predict the structure of all proteins found in the human genome. IBM has donated the hardware, software, technical advice and expertise to build the infrastructure for the project, which is run by the grid-computing specialists United Devices (http://www.ud.com/). IBM also contributes through the publication of books and articles available as Redbooks (http://www.redbooks.ibm.com/: “Introduction to Grid Computing with Globus”, “Enabling Applications for Grid Computing with Globus”, “Fundamentals of Grid Computing”), thus supporting education in the new computer technologies (http://www.developer.ibm.com/university/students/contests/Scholars_Challenge.html).

In recent years, tools to deploy and operate a Grid have been developed, and the number of such packages seems likely to increase with time. A few examples are “The Gridbus Toolkit” (http://www.gridbus.org/), ASSIST (Application of the Grid and the coordination language ASSIST) and P-Grade (Parallel Grid Run-time and Application Development Environment, http://www.lpds.sztaki.hu/pgrade/).

What kind of applications do the existing Grids run?

Among the existing Grids (computational, data, collaborative, access-grid, educational), the most successful are the data Grids, whose best representative is the DataGrid project (http://eu-datagrid.web.cern.ch/eu-datagrid/), coordinated by CERN and funded by the European Union. Its objective was to build the infrastructure for handling data ranging from hundreds of terabytes to petabytes across widely distributed scientific communities. By contrast, successful computational Grids comparable to the DataGrid have not yet appeared, in spite of what the majority of HPC users want:

“71% of the users asked expect the ‘Grid’ to help them to tackle Grand Challenge problems”.

Why such a difference? Let us look at the situation in the molecular, computational materials and biological sciences. Admittedly, these are among the most demanding computational sciences in terms of computer resources. Numerical simulations provide a means of studying the properties of materials under equilibrium or non-equilibrium conditions. Starting from physically founded cohesion models or from an ab initio description of atomic bonding, atomic-scale simulation techniques (Molecular Dynamics and Monte Carlo) give insight into the behaviour of the materials studied, biological or not. This can help improve the quality of materials for technological applications and possibly lead to the development of new materials and/or fabrication processes. Unfortunately, the scales of most properties relevant for applications are fundamentally different from those of atomistic simulations. The challenge for numerical simulations therefore consists in bridging the gaps in the space and time scales that separate atomic and macroscopic material properties, which may differ by more than twelve orders of magnitude.
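As a rough illustration of the size of this gap, assume a Molecular Dynamics time step of about one femtosecond and a target of one millisecond of physical time (both figures are our illustrative assumptions, not ENACTS results). The number of sequential integration steps required is then:

```latex
N_{\mathrm{steps}} = \frac{T}{\Delta t} \approx \frac{10^{-3}\ \mathrm{s}}{10^{-15}\ \mathrm{s}} = 10^{12}
```

Because each step depends on the result of the previous one, adding processors can shorten the time per step but not the number of steps.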

Grid (distributed) computing could be the solution for bridging the gaps in time and space.

As a matter of fact, computational quantum chemistry was among the first disciplines to try to exploit the idea of networking computers. In the early nineties, Hartree-Fock calculations were distributed across continents to evaluate the performance of such remotely distributed clusters of computers. Interestingly, these early experiments proved that (i) on a network of computers a computation can be performed faster than on any single machine, and (ii) a computation can be accelerated further by introducing heterogeneity into the network. However, in the following years there was no significant progress in this area, and the conclusion of ENACTS was that, whereas computational quantum chemistry was among the pioneers of metacomputing, other areas of HPC have taken the lead. This pessimistic conclusion appears to be still valid.

Today, the most successful computational Grid model is the Internet-based one, which accesses thousands of PCs by running the programs as screensavers. The first such application was SETI@home (http://setiathome.ssl.berkeley.edu/), which analyses data from radio telescopes looking for signs of extraterrestrial life. Taking its inspiration from SETI, a protein folding project called Folding@Home (http://www.stanford.edu/group/pandegroup/folding) has been in operation at Stanford University for four years now.

1 In order to use successfully a worldwide distributed computing environment of thousands or even millions of heterogeneous processors, such as the Grid, communication among these processors should be kept to a minimum. There are not yet enough molecular simulations run in a Grid environment to point out the strategy one should adopt in writing codes for molecular applications. A practical rule is to allow each processor to work independently, even though calculations are repeated, and to let the processors communicate only if something important happens to one of them. Such a strategy is called Embarrassing Parallelization. A. F. Voter (Phys. Rev. B, 57, R13985, 1998) has shown that a linear speedup can be obtained in calculating first-transition rate constants by integrating replicas of the system independently on M processors. A set of parallel replicas of a single simulation can be statistically coupled to closely approximate long trajectories, which produces a nearly linear speedup over a single simulation. The Folding@Home investigators have simulated the folding mechanism and accurately predicted the folding rates of several fast-folding proteins.
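The replica idea in the note above can be illustrated with a toy model. The Python sketch below is our own simplification, not Voter's method in full: it assumes that escape from a state is a Poisson process with a hypothetical rate `rate`, so each replica's escape time is exponentially distributed, and it omits the dephasing and decorrelation stages of the real method. Because the first escape among M independent replicas occurs at M times `rate`, the wall-clock wait shrinks by a factor of about M, while the simulated time accumulated over all replicas still recovers the single-trajectory rate.

```python
from __future__ import annotations

import random
import statistics


def parallel_replica_event(rate: float, n_replicas: int, rng: random.Random) -> tuple[float, float]:
    """Simulate one escape event with M independent replicas.

    Returns (wall_clock_time, accumulated_simulation_time).
    """
    # Each replica evolves independently; no communication until one escapes.
    escape_times = [rng.expovariate(rate) for _ in range(n_replicas)]
    t_wall = min(escape_times)           # the first escape stops every replica
    t_accumulated = n_replicas * t_wall  # all M replicas ran for t_wall
    return t_wall, t_accumulated


def run_parallel_replica(rate: float, n_replicas: int, n_events: int,
                         seed: int = 0) -> tuple[float, float]:
    """Return (mean wall-clock time per event, estimated escape rate)."""
    rng = random.Random(seed)
    events = [parallel_replica_event(rate, n_replicas, rng) for _ in range(n_events)]
    mean_wall = statistics.fmean(t_wall for t_wall, _ in events)
    total_accumulated = sum(t_acc for _, t_acc in events)
    return mean_wall, n_events / total_accumulated


if __name__ == "__main__":
    for m in (1, 8):
        wall, k_est = run_parallel_replica(rate=0.5, n_replicas=m, n_events=20000)
        print(f"M={m}: mean wall-clock time {wall:.3f}, estimated rate {k_est:.3f}")
```

With M = 8 the mean wall-clock time per event should fall roughly eightfold (from about 2.0 to about 0.25 in these arbitrary units), while the estimated rate stays close to 0.5.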

Another active academic project is Predictor@home (http://predictor.scripps.edu/), which is aimed at protein structure prediction. This is a pilot of the Berkeley Open Infrastructure for Network Computing (BOINC), a software platform for distributed computing using volunteer computer resources. Predictor@home, which is led by Charles Brooks of The Scripps Research Institute, has the final goal of testing new algorithms as part of the sixth biennial CASP (Critical Assessment of Techniques for Protein Structure Prediction). Another action on the human protein folding problem has been taken by the World Community Grid (WCG) (http://www.worldcommunitygrid.org/), which is supported by United Devices (UD). UD also runs the life-science research hub (http://www.grid.org/) to search for drug candidates to treat the smallpox virus. Some 35 million molecules were virtually screened against smallpox proteins, and 44 strong treatment candidates were identified and handed to the US Department of Defense for further evaluation. The WCG takes the idea a stage further and aims to create the world’s largest public computing Grid and apply it to a number of humanitarian projects, of which protein folding is the first. In the “Human Proteome Folding Project” the program used is called Rosetta; it computes a scoring function to sort through the possible structures for a given sequence and choose the best. The computers in the Grid try to fold the protein in different ways. This will be done millions of times for each protein, and the lowest-scoring structures will be compared by researchers with the Protein Data Bank (PDB) at Rutgers University.

The common ingredient of the above successful applications in Internet Grid computing is the computational algorithm employed, which guarantees “Embarrassing Parallelization”. Thus, practice up to now demonstrates that Grid computing requires new approaches to solving scientific problems. Parallelized codes written for large parallel machines may be of little use when thousands of computers connected by relatively slow networks have to be exploited.
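The pattern shared by these projects can be sketched as a simple task farm. In the minimal Python sketch below, which is not Rosetta or any project's actual code, each work unit performs many independent random trials against a hypothetical scoring function `toy_score` (lower is better), and the only communication is the final reduction that keeps the best-scoring result. Seeding each unit by its index makes the outcome independent of where or in what order the units run; a local thread pool stands in here for the thousands of volunteer machines of a real Internet Grid.

```python
from __future__ import annotations

import random
from concurrent.futures import ThreadPoolExecutor


def toy_score(conformation: list[float]) -> float:
    """Hypothetical score standing in for a real energy function (lower is better)."""
    return sum((x - 0.3) ** 2 for x in conformation)


def work_unit(seed: int, n_trials: int, length: int) -> tuple[float, int, list[float]]:
    """One independent work unit: random trials, keep the lowest score found."""
    rng = random.Random(seed)  # per-unit seed: results do not depend on scheduling
    best = None
    for _ in range(n_trials):
        conformation = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        score = toy_score(conformation)
        if best is None or score < best[0]:
            best = (score, seed, conformation)
    return best


def run_task_farm(n_units: int, n_trials: int, length: int, max_workers: int = 4):
    """Farm out work units with no communication between them, then reduce."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(work_unit, range(n_units),
                                [n_trials] * n_units, [length] * n_units))
    return min(results, key=lambda result: result[0])  # the only "communication" step
```

Scaling up means adding work units rather than speeding up communication, which is exactly the property that codes written for tightly coupled parallel machines tend to lack.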

“Without these new algorithms and new programming paradigms suitable for distributed computing, Computational Grids will not play their expected role and will not fulfil computational scientists’ ambitions to tackle Grand Challenge problems.”


APPENDIX I

CECAM-ENACTS

Workshop 2003

Component Architectures, Open Standards and Parallel Algorithms for Molecular and Atomistic Simulations on Large Grids, Supercomputers, Workstations and Clusters

Day 1: Morning, Chair: Glenn Martyna

9:00-9:20 Glenn Martyna, IBM, YKT, “Welcome and Introduction”.

9:20-10:10 Xavier Gonze, Universite Catholique de Louvain, “The ABINIT software project”.

10:10-10:30 Coffee Break!

10:30-11:20 Jeongnim Kim, University of Illinois, “Development of portable electronic structure tools on high-performance computers (OHMMS)”.

11:20-12:10 Mark Gordon, Iowa State University, “Enabling high performance electronic structure theory: Models and applications (GAMESS)”.

Day 1: Afternoon, Chair: Heather Netzloff

1:40-2:30 Thijs Vlugt, Utrecht University, “Parallel Configurational Bias MC”.

2:30-2:50 Coffee Break!

2:50-3:40 Rajiv Kalia, USC, “Multimillion atom simulation of nanoscale dynamics and fracture”.

3:40-4:30 Jens Jørgen Mortensen, Danmarks Tekniske Universitet, “Real space implementation of the projector augmented wave method in CAMPOS Atomic Simulation Environment”.

4:30-5:20 Discussion.

Day 2: Morning, Chair: Ken Esler

9:00-9:40 Fawzi Mohamed, CSCS Science Division, “The CP2K molecular simulation package”.

9:40-10:30 Martin Head-Gordon, UC Berkeley, “Fast methods for electron correlation (QCHEM)”.

10:30-10:50 Coffee Break!

10:50-11:40 Ramkumar Vadali, UIUC, “A Framework for Multiple Concurrent FFTs”.

11:40-12:20 Glenn Martyna, IBM, YKT, “Fine grained parallelization of plane wave based DFT”.

Day 2: Afternoon, Chair: Gilles Zerah

1:40-2:30 Xavier Gonze, Universite Catholique de Louvain, “Introduction to XML”.

2:30-2:50 Coffee Break!

2:50-3:20 Peter Murray-Rust, Cambridge, “Molecular Informatics for the Grid”.

FSatom Forum: Gilles Zerah moderates

i. Introduction to FSatom.

ii. Summary of activities of FSatom.

iii. Report from Pseudopotential work group.

iv. Forum and Collective Decisions.

Day 3: Morning, Chair: Michael Shirts

9:30-10:20 Jakob Schiotz, Danmarks Tekniske Universitet, “Python as a glue in atomic-scale simulations: Advanced simulation methods and parallel molecular dynamics within the CAMPOS Atomic Simulation Environment”.

10:20-10:40 Coffee Break!

10:40-11:30 Weitao Yang, Duke University, “(1) O(N) Electronic Structure Calculations with Nonorthogonal Localized Molecular Orbitals (2) Distributed Computation of Chemical Reaction Paths in Enzymes”.

11:30-12:20 Heather Netzloff, Iowa State University, “Simulating solvent effects and liquid structure with the effective fragment method (GAMESS)”.

Day 3: Afternoon, Chair: Philip Blood

1:40-2:30 Konrad Hinsen, Centre de Biophysique Molecul, “High level parallel software development with Python and BSP”.

2:30-2:50 Coffee Break!

2:50-3:40 Pengyu Ren, Washington University, “TINKER Polarizable Multipole Based Model for Molecular Simulation”.

3:40-4:30 Michael Crowley, Scripps, “CHARMM and AMBER: Palaverous colloquy kills parallel efficiency. How do we cut down on the chatter?”.

4:30-5:20 Michael Shirts, Stanford University, “Folding@Home: Grid computing algorithms to break fundamental barriers in molecular simulation”.

Day 4: Morning, Chair: Ramkumar Vadali

9:00-9:15 Philip Blood, University of Utah, “Grid computing for multiple time and length scales”.

9:15-10:05 Ken Esler, University of Illinois, “Path Integral Monte Carlo for Systems of Heavy Atoms”.

10:05-10:25 Coffee Break!

10:25-11:15 Carlo Cavazzoni, Cineca, “Computational Material Science Issues on Grid and Large Clusters”.

11:15-12:05 Mark Tuckerman, NYU, “Towards a linear scaling approach in plane wave based DFT: Gauge invariance and localized orbitals (PINY MD)”.

Day 4: Afternoon, Chair: Gilles Zerah

(a) Workshop Summary and Discussion.

(b) Glenn Martyna, IBM, Yorktown Heights, “Closing Remarks”.


CECAM-ENACTS

Workshop 2004

NOV 16, 2004

============

10.00 - 10.30 What is ENACTS: our objectives, our methodology

10.30 - 12.30 6 x 15-minute talks presenting the strong points of each of the 6 studies

12.30 - 14.00 Lunch

14.00 - 14.30 Lessons learnt from the ENACTS demonstrator (feasibility)

14.30 - 15.00 Conclusions / most interesting aspects of user survey

15.00 - 15.30 Coffee break 

15.30 - 16.00 An overview of trends in HPC and Grid Computing in Europe (policy/funding aspects as well as technological aspects)

16.00 - 18.00 Presentations from the invited speakers

===============================================================

NOV 17, 2004

============

9.00 - 10.30 Users’ presentations and general discussion (continued)

Coffee break

11.00 General discussion and conclusions

12.00 Meeting close


APPENDIX II

Overview of ENACTS Studies

PROJECT 1

Grid Service Requirements

http://www.epcc.ed.ac.uk/enacts/gridservice.pdf 

JC Desplat, Judy Hardy, Mario Antonioletti (EPCC)

Edinburgh Parallel Computing Centre

and

Jarek Nabrzyski, Maciej Stroinski, Norbert Meyer (PSNC)

Poznan Supercomputing and Networking Centre

Study objectives

The objective of this study was to specify the level and quality of services users require from a computational Grid. The report was written in consultation with a representative selection of users from different computational science groups across Europe. This top-down approach is complemented by a bottom-up examination of the enabling technologies. There were five work packages in this project, totalling 6 staff months of effort.

This study is composed of four main parts:

Part I: Presentation of the ENACTS project, its scope and membership.

Part II: Presentation of the different types of computational Grids, with a discussion of their features, limitations, and FAQ for prospective users. Since the material presented in this section is relatively detailed and tackles technical considerations which are mostly irrelevant to end-users, the authors expect this section to be of particular interest to operators of HPC systems and system administrators who want to find out more about Grid computing. Sections specifically aimed at end-users (such as a FAQ) have also been included in this second part. This section has been written by PSNC.

Part III: Results from a comprehensive survey undertaken among 85 major European research groups. This covered aspects such as awareness and expectations of Grid computing, group and application profiles, working practices, bottlenecks and requirements, future needs, etc. This section has been written by EPCC.

Part IV: Conclusions and recommendations. The purpose of the survey presented in Part III is to gain a better understanding of these key user groups’ composition and working practices in view of establishing their key requirements. The analysis of this survey, together with Part II, which presented technical and implementation issues, will be instrumental in establishing key recommendations for policy makers and resource providers. This section will provide a summary of the features offered by the leading-edge Grid middleware presented in Part II, compare this to the users’ requirements established throughout Part III, and, following these, make recommendations on a possible strategy for a successful uptake of Grid technology within the user community. This section has been written by EPCC.

Summary and conclusions

This section comprises three main parts:

1. A summary of the features offered by leading-edge Grid middleware technology and higher-level components, based on the detailed material presented in Part II;

2. A summary of the key service requirements identified by the user community in the preceding survey; and

3. A proposed strategy to increase the successful uptake of Grid-based computational resources within the user community. This strategy will lead to recommendations aimed at both HPC resource providers and funding agencies.

A. Summary of features available within a Grid environment:

comparative study of different Grid models

This study aims to help users of Grid systems that can exploit heterogeneous networked computing resources by discovering under-utilised remote computers and deploying jobs to them. As many of the systems which are currently available tend to have significantly similar functions, a study of the relative suitability of such systems for being extended and used for this purpose is a logical first step. This report presents the findings of such a study, which considered the systems presented in the previous section: LSF, Globus, Legion, Condor, Unicore and Entropia.

The main features of the systems analysed are divided into four major categories.

System, flexibility and interface: See Table 1. These data lead to the following conclusions.

• The majority of the systems are available in the public domain. The only system that does not have a public-domain version is LSF.

• Documentation is excellent for LSF, and good for the majority of the other systems. The majority of the systems are constantly being updated and extended with new capabilities.

• The majority of the systems operate under several versions of UNIX, including Linux.

• Windows NT versions are under development for Condor and Globus.

• Impact on the owners of computational nodes is typically small and configurable by the users themselves.

• The majority of the systems have both a graphical and a command-line interface. The only system without a graphical interface is Legion.

Scheduling and Resource Management: See Table 2. We draw the following conclusions.

• Batch jobs are supported by all systems, whether centralised or distributed.

• Interactive jobs are not supported by Condor and Unicore.

• Parallel jobs are fully supported only by LSF, although limited support is provided by Globus and Legion.

• Resource requests from users are supported by all the job management systems except Legion. In Globus, a special specification language, RSL (Resource Specification Language), is used to specify the job requirements.

• The most flexible scheduling is provided by LSF. In Condor and Entropia, one of several predefined scheduling policies can be selected by a global administrator.

• All centralised job management systems support job priorities assigned by users.

• All centralised job management systems and Globus support job monitoring.

• Accounting is available in all centralised job management systems.
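To give a flavour of the Globus RSL mentioned above, the hypothetical Python helper below assembles a job description in the conjunctive form used by GT2 GRAM, where each requirement is written as a parenthesised `(attribute=value)` relation after a leading `&`. The attribute names in the example (`executable`, `arguments`, `count`) are, to our knowledge, standard GRAM RSL attributes, but the helper itself and the simplified quoting rules it applies are assumptions; the Globus documentation for the toolkit version in use is the authority.

```python
def _rsl_value(value) -> str:
    """Render one value: integers bare, everything else double-quoted."""
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    text = str(value)
    if '"' in text:
        # Escaping rules are deliberately not guessed at in this sketch.
        raise ValueError(f"embedded quotes are not supported here: {text!r}")
    return f'"{text}"'


def to_rsl(attributes: dict) -> str:
    """Build a conjunctive RSL string such as &(executable="/bin/date")(count=1)."""
    relations = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            # Multi-valued attributes (e.g. arguments) are space-separated.
            rendered = " ".join(_rsl_value(item) for item in value)
        else:
            rendered = _rsl_value(value)
        relations.append(f"({name}={rendered})")
    return "&" + "".join(relations)


if __name__ == "__main__":
    print(to_rsl({"executable": "/bin/ls", "arguments": ["-l", "/tmp"], "count": 1}))
```

Run as a script, it prints `&(executable="/bin/ls")(arguments="-l" "/tmp")(count=1)`.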

Efficiency and utilisation: See Table 3. These data lead to the following conclusions.

• Stage-in and stage-out are supported by LSF, Globus and Legion, and to a limited extent by Condor.

• Time-sharing of jobs is supported by LSF, Globus, Legion, and Unicore.

• Time-sharing of processes is available in Condor.

• Process migration is supported by all the centralised job management systems.

• Dynamic load balancing is supported only by LSF.

• Scalability is high for LSF and all distributed job management systems.

• Scalability in Condor is limited by a central scheduler.


Fault tolerance and security: See Table 4. We draw the following conclusions.

• System-level checkpointing is supported only by LSF, and only for a small subset of the operating systems. Run-time library checkpointing is supported by LSF and Condor. User-level checkpointing is supported by LSF and Condor.

• Fault tolerance is best in LSF and Condor.

• Authentication based on Kerberos is provided in LSF, Globus, Legion and Unicore.

• Strong authentication based on SSL and X.509 certificates is provided in Condor, Globus, and Unicore. Additionally, Globus supports hardware authentication tokens.

• Strong authorisation is available in LSF, Condor and Legion, DCE.

• Encryption is clearly documented in Globus, Legion, and Unicore, but may also be present in several other systems using Kerberos and SSL.
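User-level checkpointing, listed above, places the responsibility for saving and restoring state on the application itself. The minimal Python sketch below illustrates the idea with a toy running-sum computation; the state layout, file name and `stop_after` parameter are hypothetical, the last one used only to simulate a job being killed. State is written periodically to a JSON file through a temporary file and `os.replace`, so that an interruption during the write leaves the previous checkpoint intact, and a restarted job resumes from the last saved step.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temporary file, then atomically replace the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path: str, initial_state: dict) -> dict:
    """Resume from the last checkpoint if one exists, otherwise start fresh."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return dict(initial_state)


def run(n_steps: int, path: str, checkpoint_every: int = 10, stop_after=None):
    """Toy job: sum of squares 0..n_steps-1, checkpointed every few steps.

    `stop_after` (hypothetical) simulates the job being killed after that many
    steps in the current session; the function then returns None.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    steps_this_session = 0
    while state["step"] < n_steps:
        state["total"] += state["step"] ** 2
        state["step"] += 1
        steps_this_session += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
        if stop_after is not None and steps_this_session >= stop_after:
            return None  # simulated crash: progress since the last save is lost
    save_checkpoint(path, state)
    return state["total"]
```

Rerunning a killed job with the same checkpoint path repeats at most about `checkpoint_every` steps of work; choosing that interval trades checkpoint I/O against lost computation.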


B. Summary of key user requirements within a Grid infrastructure

Advocating the use of Grid computing to the user community invariably yields one reaction:

“Why? Can you demonstrate how the Grid would benefit my work?”

This has to constitute the core preoccupation of people involved in the uptake of Grid computing, before any of the technical aspects presented in Part III are considered.

The survey presented in Part III has provided a relatively detailed account of user groups’ characteristics (such as group size and scope, level of co-ordination and composition, etc.), of their expectations of Grid computing and of their current requirements. The most remarkable finding was how much these factors varied across research areas. A summary of these characteristic features is given in Table 25, page 195.


This diversity across the user spectrum suggests that there will be no such thing as an ideal Grid infrastructure. Grid user service requirements must be considered from two different angles to get a better understanding of how best to address them:

1. Consider users’ aims and expectations: What do they plan to use computational Grids for? What are the expected benefits for the user community from their own point of view?

2. Consider users’ requirements: What do users identify as the key features and services within a Grid environment, and why?

What do users expect from a computat ional Grid?

One of the most reassuring aspects of the survey was the high level of support from the user community for the concept of Grid computing. Indeed, a stunning 91% of the group leaders participating in this survey believe that their group will benefit from accessing computational Grids. Beyond this, though, it is important to highlight exactly what users expect from this new technology.

Summarising the discussion from “Perception and benefits of computational Grids,” page 130, users expect Grid computing, in order of importance, to:

1. provide more cycles;

2. enable the study of larger scale problems;

3. increase opportunities to share application codes and packages;

4. provide access to sophisticated visualisation and data analysis tools; and

5. share distributed data.

Which services will users require within a Grid environment?

“Section 2: Awareness,” page 125 and “Section 6: Security and services,” page 174 considered the key features users would expect to find within a Grid environment. A summary of these findings is presented in Table 25, page 195, under the categories ‘most important factors in a Grid environment’ and ‘most important added services in a Grid environment’.

A detailed discussion of these findings, and in particular an analysis of discipline-specific variation and requirements, is available in Part III. The core requirements essentially consist of increased levels of conventional hardware resources such as faster CPUs, more processors or greater total memory. Whilst the need for greater total memory and access to more PEs (processing elements) is a clear indication of the users’ intention to tackle larger or more complex problems (see IV.2.2.2), the need for faster CPUs also encompasses the issues raised in IV.2.2.1, i.e. more cycles and increased throughput.

It is not surprising that these were the most highly rated factors, as these are the ones typically put forward during procurements of HPC systems. However, they are not linked to Grid computing per se. In this respect, the following categories are much more relevant, i.e. the need for (in decreasing order of importance):

1. longer run-times;

2. ease of use;

3. better throughput;

4. best machine for the job; and

5. access to resources not locally available.

In addition to these requirements, users also regarded the following services to be important within a Grid environment (again, in decreasing order of importance):

1. availability and reliability;


2. ease of use;

3. support (e.g., technical queries);

4. security of code and data;

5. guaranteed turnaround time; and

6. remote visualisation tools.

 Miscellaneous considerations

A number of other important issues also emerged throughout the questionnaire. Although they did not fit into any of the categories of sections IV.2.2 and IV.2.3, these issues nonetheless have a definite relevance to this study. They are:

1. The difference in awareness level across research areas is quite noticeable and will have implications in terms of strategy through funding and dissemination activities. Some research groups will adopt the technology earlier than others, based on the following combination of factors:

– the benefits they expect from computational Grids (see “Section 2: Awareness,” page 125);

– the type of services they will require (see “Section 2: Awareness,” page 125, and “Section 6: Security and services,” page 174);

– their structure, level of co-ordination, and existing collaborations at a national and international level (see “Section 1: Start,” page 121, and “Section 3: User profile,” page 146);

– the type of resources they own and their willingness to integrate them within a Grid (see “Section 5: Infrastructure,” page 170 and “Section 6: Security and services,” page 174); and

– their existing links with HPC centres.

2. There is a much greater diversity of working practices, size, structure, collaborative links, and applications’ characteristics and requirements across research areas than within any single area (see “Section 3: User profile,” page 146). This suggests the suitability of topical Grids for large, well co-ordinated groups involved in international collaborations.

3. The type of architecture deemed essential within a Grid infrastructure also varies considerably across research areas (see Table 25, page 195 and “Most wanted resources in a Grid environment,” page 142). In particular, the reliance on large MPP systems has to be better understood, as such systems will progressively be superseded by clusters of shared-memory systems.

4. The preference for a particular Grid concept varies not only across disciplines but also with the level of experience of the users concerned (see Figure 35, page 150 and “Features of mainstream research areas,” page 147). Thus, portals are clearly favoured by the less Grid-literate, whilst toolkits (and in particular Globus) are put forward as the best solution by those at the other end of the spectrum. This suggests that the development of portals for mainstream application codes and packages (such as Gaussian) is the most effective strategy for introducing the concept of Grid computing to groups from the chemistry and bioinformatics areas (see “Section 3: User profile,” page 146). It is likely that the additional features of portals, such as the ability to generate configuration files and perform sanity checks on them, will also prove important to these communities.


5. The preferred scope for computational Grids (local, national, topical, etc.) correlated directly with two things: the type of collaborations the groups were involved in, and the amount of resources they directly controlled.

6. The composition of the research groups suggests that many groups in physics and engineering make little re-use of their codes (see “Section 3: User profile,” page 146). Composition here means the proportion of black-box users to user-developers, i.e. the distribution of their users’ level of computing expertise. These practices result in a plethora of codes ‘in development’ that are unsuitable for integration within a portal or problem solving environment. The high frequency of recompilations required between successive runs illustrates this (see “Section 4: Application profile,” page 156). Steps should be taken to encourage community-led initiatives to develop standard skeleton codes, following the examples of, say, Cactus [27] or DL_POLY [98].

7. Application bottlenecks appear roughly uniform across application areas. CPU performance, memory-to-CPU bandwidth, and memory capacity were reported as the top 3 bottlenecks across all areas. However, network performance is starting to become an issue for the computing, engineering, and astrophysics communities. Data storage capacity is a concern for the chemistry and astrophysics communities (see “Section 4: Application profile,” page 156). Network performance will be one of the main bottlenecks for any user who makes routine use of computational steering or metacomputing. Poor network performance will equally affect mechanisms that provide job migration (e.g., for fault tolerance or guaranteed turnaround time) and access to distributed datasets.

8. Following recent investments in the national and transnational network infrastructures, the bottleneck for end-to-end bandwidth has now moved to within institutions and universities (see “Section 5: Infrastructure,” page 170). A successful uptake of services such as remote visualisation will require upgrades to local networks, and funding for these can prove difficult to secure.

9. Security is a concern for researchers both as users of Grids and as prospective resource providers. If a group decides to set up its own local Grid, or even to join an existing Grid, the group’s systems administrators and support teams must be able to fully understand, operate, and trust the security framework (see “Section 6: Security and services,” page 174).

10. Although it was shown that most application codes could be made fully portable with few modifications, it is important to keep promoting code portability within the research community (see “Section 4: Application profile,” page 156). The same remark also applies to data interoperability.

PROJECT 2

High Performance Computing Development for the Next Decade, and its Implications for Molecular Modelling Applications

http://www.epcc.ed.ac.uk/enacts/hpcroad.pdf 

Jan Fagerstroem, Torgny Faxen, Peter Munger, and Anders Ynnerman (NSC)

National Supercomputer Centre

J-C Desplat (EPCC)

Edinburgh Parallel Computing Centre

Filippo De Angelis, Francesco Mercuri, Marzio Rosi, Antonio Sgamellotti, Francesco

Tarantelli and Giuseppe Vitillaro (CSCISM)

Center for High Performance Computing in Molecular Sciences

Study objectives

This report presents the results of the HPC Technology Roadmap study, the second study within the ENACTS project. The project started in February 2001.

The HPC Technology Roadmap study was undertaken by the National Supercomputer Centre (NSC) in Linköping, Sweden, in collaboration with the Center for High Performance Computing in Molecular Sciences (CSCISM) in Perugia, Italy. The study had two objectives. The first was to determine the likely technology and economic trends that will shape the hardware architectures of HPC systems over the next 5 to 10 years. The second was to evaluate the effects that these trends will have on applications software. NSC and CSCISM shared the work as follows. NSC provided a survey of the technology roadmap for processors, memory, networking (closely coupled and LAN), data storage, custom-built solutions, and software paradigms and standards. The survey was based on interviews with several major HPC vendors and is reported in section I. NSC also coordinated the study. Based on the results of the technology roadmap survey, CSCISM provided a case study for the key molecular science community. It examines how useful the technologies discussed in the roadmap are and what they imply. The case study is presented in section II.

Summary and conclusions

Who can look into the future? We all know this is impossible, but it is nevertheless necessary. To plan our activities we need some model of what we think the future will bring. The interview material for this report comes from highly acclaimed, knowledgeable representatives of the most prestigious and well-known vendors of HPC systems. It is very interesting and exciting material to read. All interviewees stressed that they were giving their own personal view of future HPC, not necessarily that of their vendor. We therefore have unique personal opinions of the HPC future, based on experience and knowledge from rich sources, some of which is only available to people within the corporations.


Many forward-looking statements are extrapolations of today’s trends. All material from the interviews is made public, so it is no surprise that vendor representatives cannot share all of their knowledge, especially about major changes in strategy. Nevertheless, such a complete set of in-depth interviews gives a unique snapshot of where the industry believes it is heading.

However, it is hard not to get lost in all the details, especially in a field that changes as rapidly as the computer industry. Everything is moving ahead at a very rapid pace, and some areas faster than others. Below we try to summarise and interpret the interviews. We keep the focus on the implications for end users and computer centres.

Summary of the interviews

1. HPC market

Looking first at how the market is viewed: HPC today has a rather small share of the total computer market, while the commercial (enterprise) and consumer markets are much larger. This trend will continue. It also means that most investment will go into non-HPC markets. However, opinions differ on the implications. One view is that this market trend will force future HPC systems to be built directly from commercial components (processors, networks or even complete systems). The other view is that proprietary components can still be developed cost-effectively for HPC-specific solutions. This is because what the consumer and commercial markets develop and invest in can also be used for HPC, especially the facilities and methods for making chips.

There is also general agreement that most of the investment will move to the consumer market, if it has not already done so. Again, however, opinions differ on how this will affect future HPC system architectures. Some believe that future systems will be based mainly on consumer market components, even though this might require a whole new programming paradigm. Others believe it will enable new technologies that can be used in more traditional HPC environments.

Constant price pressure and the presence of low-budget solutions continue to squeeze profit margins. Vendors need to find places where they can cut costs. Software is expensive to develop and maintain, and the Open source movement is seen as a way of sharing and thus cutting cost. People-intensive tasks such as extensive benchmarking are another area where vendors are looking to cut cost.

2. HPC systems

The interviewees’ views of future system architecture show both similarities and large differences. At a very general level, one could say that they all expect future systems to have:

   • Clustered SMP nodes giving a scalable parallel architecture.
   • SMP nodes that can be anything from a single processor up to thousands of processors.
   • A size limited only by:
      – available money
      – cost of building space and utility (power)
      – reliability
   • General purpose designs (although special purpose systems are not ruled out).
   • Non-uniform memory access (but with very large variations in “NUMAness”).


This is also reflected in a general agreement on MPI and OpenMP as the parallel programming models for the present as well as the future.

Beneath this very general common view, however, almost everything else can differ: the processors, memory subsystems, node interconnects, compilers, operating systems, power consumption, cooling, floor space, etc. It is therefore equally fair to say that we are really talking about very different computer systems.

Special purpose systems will emerge as a complement to general purpose systems. They will be built from consumer-product components, which gives very high price/performance, but they will require very different programming models. Initially they will target niche markets with suitable applications, such as life sciences.

Besides vendor-specific architectures, low-cost cluster HPC systems (Beowulf) have opened up a whole new sector for HPC. They started as a rather limited solution but have matured significantly in recent years and continue to evolve, particularly at the high end. Most vendors claim that Beowulf clusters are complementary to their own offerings, and many vendors now also offer their own Beowulf clusters. It is not yet clear how this market will evolve or who the main players will be. Beowulf clusters will probably blend into the HPC world more and more seamlessly. As with traditional HPC, we also expect a wide variety of HW and SW configurations for Beowulfs.

3. Building blocks (processors, memory (bw, latency), network, storage, graphics, etc.)

Silicon will continue to be used for chips for the next 5–10 years, and Moore’s law is expected to continue to hold. Note, however, that this refers to the “original” Moore’s law: the number of transistors doubling every 18 months. This raises several questions. How will all the new transistors be used? How will they translate into sustained performance improvements? Different interviewees answered these questions differently.
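A few lines of Python show the compound growth implied by the “original” Moore’s law. This is a minimal sketch of the arithmetic only. It assumes a fixed 18-month doubling period, as quoted above. It says nothing about how the extra transistors turn into sustained performance, which is exactly the open question the interviewees raised.

```python
def moore_growth(years: float, doubling_months: float = 18.0) -> float:
    """Growth factor in transistor count after `years`,
    assuming a constant doubling period of `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (years * 12.0 / doubling_months)


# The 5- and 10-year horizons considered in this study.
for horizon in (5, 10):
    print(f"{horizon:>2} years: transistor count x {moore_growth(horizon):.1f}")
```

Under this assumption the transistor budget grows by a factor of about 10 over 5 years and about 100 over 10 years.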

The number of processor architectures continues to decrease, but there is no general agreement on whether this means the death of special purpose processors. High-volume processors have to be very general purpose, so their performance for HPC applications will not be optimal. On the other hand, it remains an open question whether the cost of developing special purpose processors will be prohibitive.

Memory subsystems will continue to differentiate vendors. Microprocessor performance will keep increasing much faster than memory performance. Memory latency, measured in clock cycles, is reaching the point where it will actually start to increase. Latency reduction techniques such as pre-fetch, multi-threading and deep memory hierarchies will therefore be applied, and many different techniques will be developed in this area. Codes with regular memory access patterns are likely to benefit more than others from many of these techniques. Only two vendors actively pursue vectorisation, but it could also be viewed as a latency reduction technique. Some vendors focus mainly on bandwidth. Bandwidth is likely to become a main differentiator between systems and could replace the whole vector/no-vector debate.
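A short calculation shows why latency “in terms of clock cycles” grows even if memory itself gets somewhat faster. The sketch below is illustrative only. The starting latency (100 ns), the starting clock rate (1 GHz) and the annual rates (clock rate up 40% per year, DRAM latency down 7% per year) are hypothetical round numbers chosen for the example, not figures from the interviews.

```python
def latency_cycles(years: int,
                   latency_ns: float = 100.0,     # hypothetical starting DRAM latency
                   clock_ghz: float = 1.0,        # hypothetical starting clock rate
                   clock_growth: float = 0.40,    # hypothetical annual clock-rate growth
                   latency_gain: float = 0.07) -> float:  # hypothetical annual latency cut
    """Memory latency measured in processor cycles after `years` years.

    cycles = latency [ns] * clock [cycles per ns]; both factors compound annually.
    """
    latency = latency_ns * (1.0 - latency_gain) ** years
    clock = clock_ghz * (1.0 + clock_growth) ** years
    return latency * clock


for year in range(6):
    print(f"year {year}: {latency_cycles(year):6.1f} cycles")
```

With these assumptions the clock factor (1.4 per year) outgrows the latency factor (0.93 per year), so latency in cycles rises by roughly 30% a year. Pre-fetching, multi-threading and deeper memory hierarchies try to hide this growing gap.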

Mass storage systems will increasingly be networked rather than directly attached. Disk density will double every year, and tape density will grow somewhat more slowly. I/O rates are unlikely to keep up with the increase in capacity. There is no consensus on whether disks will overtake tapes for long-term storage. Alternative mass storage solutions are still far from being commercially ready.
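One way to express capacity outgrowing I/O rates is the time needed to read a full disk once. The sketch assumes that capacity doubles every year, as stated above. It also assumes that sequential bandwidth grows by a hypothetical 20% per year. The starting values (100 GB, 40 MB/s) are likewise illustrative assumptions.

```python
def full_scan_hours(capacity_gb: float, bandwidth_mb_s: float) -> float:
    """Hours needed to stream a whole disk once at a sustained sequential rate."""
    seconds = capacity_gb * 1000.0 / bandwidth_mb_s  # 1 GB = 1000 MB (decimal units)
    return seconds / 3600.0


def scan_time_trend(years: int,
                    capacity_gb: float = 100.0,      # hypothetical starting capacity
                    bandwidth_mb_s: float = 40.0,    # hypothetical starting bandwidth
                    capacity_growth: float = 1.0,    # density doubles every year
                    bandwidth_growth: float = 0.2) -> list[float]:  # hypothetical I/O growth
    """Full-disk scan time in hours for each year from 0 to `years`."""
    times = []
    for _ in range(years + 1):
        times.append(full_scan_hours(capacity_gb, bandwidth_mb_s))
        capacity_gb *= 1.0 + capacity_growth
        bandwidth_mb_s *= 1.0 + bandwidth_growth
    return times


for year, hours in enumerate(scan_time_trend(5)):
    print(f"year {year}: {hours:5.2f} h to read the whole disk")
```

With these assumptions the time to read the whole disk grows by a factor of 2/1.2, about 1.67, every year. This is why bandwidth, rather than capacity, becomes the limiting factor.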

Most interviews touched on graphics. There is strong consensus that increased bandwidth, combined with only a very limited increase in display resolution, will change the way some visualisation is done. Centralised graphics servers will be used with thin clients for display only, even over large distances.

4. Parallel programming models

No one predicts any major change in the parallel programming models. Most vendors seem to believe that, whatever the hardware looks like, it must be possible to program it in either MPI or OpenMP. Some also speak in favour of Co-array Fortran and UPC, but these are very unlikely to become as dominant. Some favour the hybrid MPI+OpenMP programming model, but it remains very unclear whether it will be widely used. HPF seems to have a future in Japan only.

5. Programming languages

No surprises here: Fortran, C, C++ and maybe Java continue to dominate.

6. Software tools

There is potential for conflict over the software tools needed for debugging and performance analysis of very large systems. Because of cost, vendors are increasingly reluctant to develop such tools themselves. Instead, they direct customers to commercial 3rd-party software or to open source. At the same time, the hardware underneath may be very different even where the parallel programming models are the same across platforms. System-specific tools are therefore required to fully understand each system and get the most out of it.

7. Pain/gain

The pain/gain ratio for taming the biggest and most powerful systems will not improve. It will probably get worse, perhaps much worse. Amdahl’s law is a stern taskmaster, especially with processor counts reaching the 10000’s. Maintaining very large systems will also require exceptional effort. Housing, powering, maintaining and scheduling systems with tens of thousands of processors is not for the faint of heart. It is no coincidence that public presentations of new very large systems spend significant time on the building size, the power required, pictures of cables, etc.
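Amdahl’s law makes the “stern taskmaster” remark quantitative. Suppose a fraction p of a code’s run time parallelises perfectly and the rest is serial. The speedup on N processors is then S(N) = 1 / ((1 - p) + p/N), which is bounded above by 1/(1 - p). The sketch below evaluates this for 10000 processors. The parallel fractions are illustrative values, not measurements of any particular application.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: S(N) = 1 / ((1 - p) + p / N)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


N = 10_000
for p in (0.99, 0.999, 0.9999):
    s = amdahl_speedup(p, N)
    print(f"p = {p}: speedup {s:7.1f}, parallel efficiency {s / N:6.1%}")
```

Even a code that is 99.9% parallel reaches a speedup of only about 909 on 10000 processors, a parallel efficiency of roughly 9%.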

8. Operating systems

Linux is the driving force among open source operating systems in the HPC world, and no one can afford to ignore it. It is not yet clear how HPC-specific features will make it into Linux. The initial euphoria seems to have settled down a bit as vendors start to realise that working with the Open source community is very different from doing their own development. Either way, we still believe that Linux will be the main HPC operating system 5 years from now.

9. Price performance (peak vs sustained)

There is general agreement that price/performance will continue to improve at current rates, roughly following Moore’s law. Note, however, that this applies to peak speeds. For sustained performance the picture is much less clear. Most answers indicate that sustained performance will depend even more strongly on the type of application. Highly parallel applications with low bandwidth requirements will increasingly get much better price/performance than traditional memory-intensive HPC applications that are not trivially parallel.

10. Benchmarks

There is almost unanimous agreement that benchmarking for customer procurements will have to change. It costs too much in today’s market, where margins keep shrinking. Better general benchmarks are needed, but awareness of current efforts such as IDC’s was low.

11. Grid

Most interviewees believed that the Grid is still immature but will become important in the future. Their opinions on how this will happen, however, differed substantially. Direct experience of the Grid seemed low, and the responses might have been different if the interviewees had more experience of it. Interestingly, no one thought that the Grid and Grid development would have any impact on future system architecture. They saw it rather as something their systems are used for, and as an opportunity to sell systems!

12. Future computer centres

According to the interviewees, HPC computer centres will still have a place in the future. Many different reasons were given. One alone is enough to believe in a future for HPC centres: very high performance will demand even more expertise in fields such as hardware, applications, performance analysis and tuning, logistics, and procurement. Many predicted application-specific centres.

Our conclusions of the technology watch report

Anyone who believed that future HPC systems would all converge on a unified computer architecture will be disappointed. Instead, new advances in all relevant computer technologies widen the possibilities further than before. These advances are made possible by a growing market as well as by the Open source movement.

We see five major trends for the next 5–10 years:

1. We believe that the growing mass market for computer and consumer systems will sustain exponential technical progress in all areas of computer systems development. This will bring many new and exciting possibilities for HPC systems too. We will continue to see a proliferation of systems rather than a convergence towards a common platform. Non-traditional HPC vendors will provide a substantial part of future HPC systems, driven largely by Linux clusters used for HPC. Off-the-shelf components mean much lower profit margins, which will create new business models that differ from today’s. New HPC vendors will appear, some will merge and some will disappear.

2. We see future HPC systems as parallel, scalable computing architectures based on the notion of clustered SMPs. They can be scaled by adding nodes and/or by enlarging the nodes themselves. A significant portion of the components will be COTS, but proprietary components will still play an important role. The operating system will be Unix, with Linux as the target. SMP nodes can range from a single processor up to thousands of processors. They will vary widely in processors, compilers, memory subsystems, node interconnect and mass storage. The upper limit in size is set by price, not technology.

3. Peak performance and price/performance will both continue to improve according to Moore’s law. This is not true for sustained performance, however, which will depend even more strongly on application characteristics than it does today. Applications that can easily exploit advances in consumer product technology might even see a “super” Moore’s law performance increase. Life sciences applications fall into this category.

4. Parallel programming models and programming languages will evolve rather than undergo a revolution. MPI and OpenMP will remain the parallel programming models, and Fortran, C, C++ and Java the programming languages. This is good news for the user. The bad news is that high performance will still require detailed knowledge of both the application and the underlying hardware. Reaching extreme performance will require even greater effort than today because of the sheer size of the systems that will be available. The lack of adequate software development tools will probably become a key issue for HPC.

5. The Grid is evolving fast, and we strongly believe it will play a key role in the way HPC is used. In the context of this report, we see one new key role for it: acting as a layer between the user and the HPC system. This layer would hide the geographical location of systems and, perhaps more importantly, their architectures and usage. In many cases, hiding these details will allow the continued proliferation of HPC system architectures that we see today. We believe HPC vendors should take this into account when developing new systems. This development should be very straightforward for users of 3rd-party codes and for simple compile-and-run applications. However, many high performance users still need direct, close access to the system, and the Grid will take longer to develop in this area.

We believe that HPC centres will still be needed, but several factors will mean fewer but larger centres. These factors include ever faster networks, the Grid and Grid middleware, the increased pain/gain ratio, increased system complexity, and higher infrastructure costs for logistics (power, housing). HPC centres will have to compete with other centres, and to succeed we make the following recommendations:

   • Collaborations and teaming up with other centres. Resource sharing.
   • Expertise in many different areas of system architecture.
   • Involvement in open source development.
   • Continued emphasis on MPI and OpenMP.
   • Increasing performance expertise, as vendor support decreases due to leaner profit models.
   • Expertise in Grid infrastructure and middleware.
   • Application-specific centres.

Molecular sciences

Overview

From the interviews with the computer companies’ representatives, the following scenario appears to characterise the near-future development of computer hardware and architectures:

Almost without exception, the experts interviewed agree that Moore’s law will continue to describe accurately the pace of further chip integration, at least for the next 5 years. The same applies to the resulting trend in the theoretical peak performance of computers. There is also wide agreement on the short- to medium-term exponential increase in communication bandwidth. However, one of the most important consequences for high performance computing is that latency will increase rapidly when measured in CPU cycles. This applies to memory latency and, more generally, to communication latency. It will prevent most applications from fully reaping the benefits of Moore’s law. The main architectural features proposed to partly alleviate this problem aim to hide latency. They do so through deeper data storage hierarchies and/or a moderate-to-high level of hardware multi-threading.

Increased transistor integration will probably also have two further consequences. On the one hand, more room for extra logic will be available on-chip, allowing widespread development of relatively low-cost special purpose processors. On the other hand, high integration will probably mean that not all transistors can be synchronised to one clock, so multiple-clock chips will appear.

With various shifts of balance, we will probably continue to see the development of both relatively low-cost Commercial-Off-The-Shelf (COTS) computers and proprietary hardware. The latter will deliver higher performance at a much higher cost. SMP clustering will presumably be the dominant architecture. An HPC machine will comprise a total number of processors in the 10000’s, which will cause problems of reliability. These problems will have to be addressed by redundancy and statistical algorithms, eventually merging into true AI management systems.
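A standard back-of-the-envelope model illustrates the reliability concern. Assume that node failures are independent and exponentially distributed, and that any single failure interrupts a job running on the whole machine. The machine’s mean time between failures (MTBF) is then the node MTBF divided by the number of nodes. The per-node MTBF of 50000 hours used below is a hypothetical value chosen for illustration.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """MTBF of the whole machine: independent exponential failure rates add up."""
    return node_mtbf_hours / nodes


def prob_job_interrupted(job_hours: float, node_mtbf_hours: float, nodes: int) -> float:
    """Probability that at least one node fails during a job spanning all nodes."""
    failure_rate = nodes / node_mtbf_hours  # failures per hour for the whole machine
    return 1.0 - math.exp(-failure_rate * job_hours)


NODE_MTBF = 50_000.0  # hypothetical per-node MTBF in hours (about 5.7 years)
for n in (100, 1_000, 10_000):
    mtbf = system_mtbf_hours(NODE_MTBF, n)
    p = prob_job_interrupted(24.0, NODE_MTBF, n)
    print(f"{n:>6} nodes: system MTBF {mtbf:8.1f} h, P(24 h job interrupted) = {p:.1%}")
```

With these assumptions a 10000-node machine fails on average every 5 hours, and a day-long job spanning the whole machine is almost certain to be interrupted. At this scale, redundancy and fault-tolerant algorithms become unavoidable.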

The capacity of disk drives for I/O systems will probably continue to grow much faster than I/O bandwidth, so bandwidth will become the limiting factor. Networked I/O systems based on SAN, NAS and Internet SCSI will become widespread.

The most popular programming languages currently in use (Fortran, C, C++, Java) will continue to dominate software development. Interestingly, it is generally agreed that the programmer’s pain/gain ratio is bound to increase. This signals that compiler and RTE technology will not keep pace with hardware complexity and speed.

It can safely be assumed that Unix-like operating systems will continue to dominate HPC. Linux will rapidly increase its market share, although proprietary operating systems will be phased out very slowly, if at all.

Based on the above assumptions, we have projected two HPC computers likely to be typical 5 years from now, within the allotted price range. One is an example of a proprietary SMP cluster. The other is the natural development of current Beowulf clusters. Based on the interviews and current trends, we have assumed that the price per processor will remain roughly unchanged within the projected time span.

Conclusions of the case study report

The previous section outlined three approaches to defining the inter-atomic potential. Given the current state of theory and algorithms, we can try to project their impact on applications in the field of Molecular Sciences over the next 5 to 10 years. In particular:

   • Model potentials based on Force-Field (FF) parameterisations will continue to dominate in the fields of liquid crystals, ferroelectric nematic materials [21] and protein folding. The systems studied in these fields are large (up to a few million atoms), and the relevant dynamical phenomena occur on very long time-scales (up to milliseconds). For these reasons FF-based methods will remain the only viable simulation tool for up to 10 years.

   • In 5 (10) years, DFT methods will probably allow accurate computation of the electronic, structural and dynamical reactive properties of systems containing 3000 (10000) atoms. We therefore predict that DFT methods will replace FF parameterisations in chemometric applications, which need a large number of medium-size calculations. This will have a direct impact on pharmacology. Designing a new drug usually requires a pre-selection based on computer simulations and data analysis. A much more accurate description of the molecular systems and properties under investigation translates directly into higher selectivity for the target system. This can reduce the number of laboratory tests by up to a factor of 10. DFT methods will also allow accurate simulation of small protein systems, or of realistic portions of them. This will have particular impact on understanding the action mechanism of metalloenzymes, where a reduced model usually neglects the fundamental underlying interactions. The importance of this field is clear from the fact that both respiration and photosynthesis involve metallo-organic active centres made up of several thousand atoms. Understanding the action mechanism of such systems will make it possible to devise efficient synthetic bio-mimetic analogues of the natural systems, with a high impact in the fields of energy storage and molecular sensors. Moreover, we predict that DFT-based methods will allow accurate simulation of nano-scale systems. This will have a high impact on the design of molecular engines, quantum computation devices and chemical storage of data [18].

Ab initio potentials will reach such a high standard of accuracy (ca. 0.2 kcal/mol) as to allow the realistic simulation of elementary reactions of relevance in the field of atmospheric chemistry (e.g. ozone depletion) [37] or combustion (e.g. nitrogen oxides chemistry) [46], integrating and partially replacing existing experimental techniques. Ab initio simulations will allow the accurate computation of reactive cross sections and rate constants of elementary systems composed of up to 10–50 atoms, under extreme conditions (high temperature and pressure), on a quantitative basis. Moreover, the study of elementary reactions in interstellar space [16, 45, 34], e.g. the synthesis of organic compounds from small molecules and atoms (C, N, O, H), which is not usually directly accessible to experimentalists, will be of great help in the design of new spacecraft materials and ultimately in understanding the origin of life. However, according to the previous analysis, the computational resources required for such demanding applications can be provided only by large-scale computing facilities. In this view, the 12 MEuro platform option discussed above might represent only a starting point for the creation of a large transnational European supercomputing resource.
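For orientation, the two quantities mentioned above can be made concrete with standard textbook relations. These are added here for illustration and are not taken from the ENACTS study. The first line converts the target accuracy of 0.2 kcal/mol into other common units. The second gives the thermal rate constant k(T) as the Boltzmann average of the reactive cross section σ(E) over the relative translational energy E, where μ is the reduced mass of the colliding partners and k_B is Boltzmann's constant:

```latex
\begin{aligned}
0.2\ \text{kcal mol}^{-1} &= 0.2 \times 4.184\ \text{kJ mol}^{-1} \approx 0.837\ \text{kJ mol}^{-1} \approx 8.7\ \text{meV} \approx 70\ \text{cm}^{-1} \\
k(T) &= \left( \frac{8}{\pi \mu \, (k_B T)^{3}} \right)^{1/2} \int_{0}^{\infty} E \, \sigma(E) \, e^{-E/k_B T} \, dE
\end{aligned}
```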


PROJECT 3

Grid Enabling Technologies

http://www.epcc.ed.ac.uk/enacts/gridenabling.pdf

Stavros C. Farantos and Stamatis Stamatiadis (FORTH)

Institute of Electronic Structure and Laser

Foundation for Research and Technology, Hellas

And

Nello Nellari and Djordje Maric (ETH-CSCS)

Swiss Center for Scientific Computing

Study objectives

This study was accomplished in 2002. Its main objective was to evaluate the current technologies for Grid computing, as described in the "Grid Service Requirements" study and as implemented in several projects and Grid test-beds around the world. The study mainly covered academic applications, with emphasis on Molecular Sciences.

The aims of this sectoral report were:

To briefly review the most popular software packages for implementing a Grid. Here we examine Globus, Legion and UNICORE, since most applications are related to these technologies.

To consider the current trend of development for Grid technologies and the definition of standards.

To locate the available test-beds for computational Grids worldwide, mainly focusing on those based on Globus, Legion and UNICORE.

To review the current applications and projects that use the established software packages for the computational Grids mentioned above.

To investigate the present status of Grid computing for Molecular Simulations and point out the need for a new programming design of such applications.

To report personal experience from building a local Grid based on the Globus paradigm and UNICORE.

The report consists of seven chapters.

After the Introduction, Chapter 2 briefly reviews the most commonly used middleware for implementing a Grid: Globus, Legion and UNICORE. It emphasizes their main characteristics and differences and then considers future developments for Grid technologies. In Chapter 3 the authors collect, in Tables 5 to 7, test-beds of computational and data Grids as well as applications and projects, and give short descriptions of their history and functions. This information, which is principally collected from the World Wide Web, is by no means complete. Appendix A gives the URL addresses, which the interested reader can consult for more information. The chapter also presents the types of applications and the categories of scientists that current Grids address, and attempts to point out future needs. Chapter 4 deals with the current status of Grids in the Molecular Sciences: Physics, Chemistry, and Biology. Chapters 5 and 6 present the personal experience of the authors in installing Globus and UNICORE on local computers. The authors constructed small local computational Grids, on which a few applications could be run. Finally, Chapter 7 summarizes the main conclusions of this study.

Summary and Conclusions

The study reported in this document is organized around six main points:

- a general analysis of the principal Grid software packages;
- the current development of Grid technologies;
- the location of computational Grid test-beds;
- a review of the applications and projects running on those test-beds;
- the status of Grid computing for Molecular Simulations;
- a summary of our practical experiences in implementing a computational Grid with Globus and UNICORE.

The analysis of Grid software packages in Chapter 2 considers Globus, Legion and UNICORE in particular, since at the moment those products characterize most of the current Grid deployments and the most relevant Grid projects. This part aims to provide information complementary to the first ENACTS report, "Grid Service Requirements". Indeed, the first ENACTS report offers an extensive presentation of available Grid technologies, solutions and products, and this sectoral report often refers to that document for technical details and for the presentation of less popular Grid products.

Currently the majority of projects and Grid initiatives are based on Globus. Globus also seems to dominate Grid test-beds, especially academic test-beds and Grid deployments built on Linux machines and Linux clusters. However, the current phase appears to be an early phase of Grid computing. It is characterized by difficult interoperability between different Grid technologies, a complex installation and configuration process for Grid products, and a very limited number of end users. At the moment there is a growing consensus around the OGSA proposal, which is based on Web Services standards and specifications, and OGSA-compliant Grid products such as Globus 3.x are now available. The expectation is that the next phase of Grid computing will be characterized by agreement on a well-defined set of standards and specifications. This should enable easier interaction between different technologies and components, and thus competition and possibly also specialization among the various Grid products. The final objective is to provide the user community with a set of services for effective, seamless and easy access to various distributed and heterogeneous resources. The success of Grid computing will be measured by the benefits that end users gain in their day-to-day work by using Grid technologies and products.

The lists of current test-beds and of the applications/projects run on them that are presented in this report are by no means supposed to be exhaustive. We know that several others exist, some of which may have only just started. However, the following conclusions can be drawn from the internet search we have carried out:

1. Globus as a Grid deployment technology dominates on both sides of the Atlantic Ocean.

2. Most of the applications concern particle physics, astronomy, environmental and meteorological projects and, to a lesser extent, biological and chemical applications.

It is interesting to compare the above lists with a previous review, whose reference is given in the Introduction. The explosive growth of test-beds and applications in Grid computing is remarkable. Also worth mentioning is the announcement of a new version of Mathematica suitable for a Grid environment, GridMathematica (http://www.wolfram.com/products/gridmathematica/). Mathematica is a popular software package for symbolic and numerical calculations as well as graphics.

Special attention was given to Grid applications in the field of Molecular Sciences. Although quantum chemistry has always played a pioneering role in applying new computer technologies, there has been little progress in the Grid case. This is because distributed computing requires new numerical algorithms and new ways of programming. At present there are just a few examples that use the established Grid deployments, but we believe that their number will grow in the near future. One might think that parallelized codes are ready to run on a Grid. This is not true, since employing thousands or even millions of computers requires different programming approaches. At present only problems with trivial ("embarrassing") parallelization can be treated on a Grid.
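A minimal task-farming sketch can illustrate what makes a problem "embarrassingly" parallel. It assumes independent work units that never communicate, so each unit can go to any free resource and the results are simply collected. Here the work unit is a hypothetical `energy_at_point` function that evaluates a Morse potential at one bond length, standing in for an expensive single-point calculation. A local thread pool from Python's standard `concurrent.futures` stands in for a Grid resource broker. This is not Globus, Legion or UNICORE.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def energy_at_point(r: float) -> float:
    """Hypothetical stand-in for an expensive, independent calculation:
    a Morse potential evaluated at bond length r (arbitrary units)."""
    d_e, a, r_e = 1.0, 1.5, 1.0
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def farm(points, max_workers=4):
    """Dispatch each point as an independent task and gather results in order.

    No task needs data from any other task, which is what makes the problem
    suitable for loosely coupled, distributed resources. Threads keep the
    sketch portable; the pattern is the same with processes or remote jobs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(energy_at_point, points))


if __name__ == "__main__":
    grid = [0.5 + 0.1 * i for i in range(20)]
    energies = farm(grid)
    best_energy, best_r = min(zip(energies, grid))
    print(f"minimum energy {best_energy:.4f} at r = {best_r:.2f}")
```

A tightly coupled code, such as a parallel SCF calculation that exchanges data at every iteration, does not fit this pattern. That is why such codes need new algorithms before they can exploit a Grid.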

The last two chapters of the document describe the experiences of the authors in installing Globus and UNICORE. They are intended to give an idea of how the models and concepts described in the second chapter are implemented in practice. The intention is therefore to complement this study with practical experience rather than to provide a user guide for installing Globus or UNICORE.


PROJECT 4

Data Management in HPC

http://www.epcc.ed.ac.uk/enacts/datamanagement.pdf 

Giovanni Erbacci, Marco Sbrighi (CINECA)

Inter-University Consortium

And

Audrey Crosbie, Carole McGloughlin (TCD)

Trinity College Dublin

Study objectives

The objectives of this study were to gain an understanding of the problems associated with storing, managing and extracting information from the large datasets increasingly being generated by computational scientists, and to identify the emerging technologies that could address these problems.

The principal deliverable from this activity is a Sectoral Report that enables the participants to pool their knowledge of the latest data-mining, warehousing and assimilation methodologies. The document reports on user needs and investigates new developments in computational science. The report makes recommendations on how these technologies can meet the changing needs of Europe's Scientific Researchers.

The study focuses on:

The increasing demands of data storage, analysis, transfer, integration, mining, etc.;

The needs of users;

The emerging technologies to address these increasing needs.

This study benefits the following groups:

Researchers involved in both advanced academic and industrial activities, who will be able to manage their data more effectively;

The research centres, which will be able to give better advice, produce more research and deliver better services for their capital investments;

European Research Programmes, which can develop more international collaborations and improved support for the European Research Area.

The report consists of eight chapters:

Chapter 1 ENACTS Project

Chapter 2 Basic technology for data management

Chapter 3 Data models and scientific data libraries

Chapter 4 Finding data and metadata

Chapter 5 Higher level projects involving complex data management

Chapter 6 Enabling technologies for higher level systems

Chapter 7 Analysis of Data Management Questionnaire

Chapter 8 Summary and Conclusions


Summary and conclusions

The objective of this study was to understand the problems, and to reference the emerging solutions, associated with storing, managing and extracting information from the large datasets increasingly being generated by computational scientists. These results are presented in this report, which contains a state-of-the-art overview of scientific data management tools, technologies, methodologies, ongoing projects and European activities.

More precisely, the report investigates and evaluates current technologies. It explores new standards and supports their development. It suggests good practice for users and Centres, investigates platform-independent and distributed storage solutions, and explores the coordinated use of different technologies across a wide range of data-intensive application domains. A survey was conducted to assess the problems associated with managing large datasets and to assess the impact of current hardware and software technologies on current Data Management Requirements. This report documents the users' needs and investigates new data management tools and techniques applicable to the service and support of computational scientists. The following main observations come from the analysis of the questionnaire:

60% of those who responded to the questionnaire stated that they perceived a real benefit from better data management activity.

The majority of participants in the survey do not have access to sophisticated or high-performance storage systems.

Many of the computational scientists who answered the questionnaire are unaware of the evolving GRID technologies, and the use of data management technologies within these groups is limited.

For industry, security and reliability are stringent requirements for users of distributed data services.

The survey identifies problems arising from interoperability limits, such as:

o limited access to resources,
o geographic separation,
o site-dependent access policies,
o security assets and policies,
o data format proliferation,
o lack of bandwidth,
o coordination,
o standardising data formats.

Resource integration problems arise from different physical and logical schemas, such as relational databases, structured and semi-structured databases and owner-defined formats.

The questionnaire shows that European researchers are some way behind in their take-up of data management solutions. However, many good solutions seem to be emerging from European projects and from GRID programmes in general. The European research community expects to benefit greatly from the results of these projects, and more specifically from the demonstration of production-based Grid projects. From the above observations, and from the analysis of the emerging technologies available or under development for the scientific data management community, we suggest the following actions.

Recommendations

This report aims to make some recommendations on how technologies can meet the changing needs of Europe's Scientific Researchers. The first and most obvious point is that a clear gap exists: the participants show a complete lack of awareness of the available software. European researchers appear content to continue with their own style of data storage, manipulation, etc. There is an urgent need to get the information out to them. This could be done through dissemination and through demonstration of available equipment within institutions. If that is successful, computational scientists could then be encouraged to participate, on a much greater scale, in GRID computing projects. A real demonstration and implementation of a GRID environment across multiple institutions would show the true benefits. User courses and the availability of machines would also have to be improved.

1. The Role of HPC Centres in the Future of Data Management

The HPC centres all over Europe will play an important role in the future of data management and GRID computing. Information on the current state of the art and on software should be available through these Centres. If the information is not freely available, these centres should at a minimum direct researchers to where they can obtain it.

Dissemination and demonstrations would certainly be a good start to improving the awareness of researchers. Examples include flyers and posters on the current state of the art, along with seminars, tutorials and conferences that would appeal to everyone involved, across all areas of the scientific community. HPC centres can be seen as the key to a nation's technological and economic success. Their role spans all of the computational sciences.

2. National Research Councils

National Research Councils play an important role in the research of the future. This report aims to make recommendations to "national research councils" on avoiding bottlenecks in applications. The ENACTS reports endeavour to find current bottlenecks and eliminate them for future researchers. This report introduces two national research bodies, one from the UK and one from Denmark. They are used as examples to demonstrate promotion activities and to show how national bodies can encourage the use of new tools and techniques. The UK Research Council states that "e-Science is about global collaboration in key areas of science and the next generation of infrastructure that will enable it." Research Councils UK (RCUK) is a strategic partnership set up by the seven UK Research Councils to champion science, engineering and technology. Through RCUK, the Research Councils are working together to create a common framework for research, training and knowledge transfer. http://www.shef.ac.uk/cics/facilities/natrcon.htm

The Danish National Research Foundation is committed to funding unique research within the basic sciences, life sciences, technical sciences, social sciences and the humanities. The aim is to identify and support groups of scientists who, based on international evaluation, are able to create innovative and creative research environments of the highest international quality.

http://www.dg.dk/english_objectives.html

3. Technologies and Standards

Computational scientists increasingly need to engage in data storage, data analysis, data transfer, data integration and data mining. This report gives an overview of the emerging technologies being developed to address these needs. The ENACTS partners would support and encourage continued research and technology transfer in data management tools and techniques.

New Technologies

Discovery Net Project. The arrival of new disciplines (such as bioinformatics) and technologies will transform a data dump into knowledge and information. The Discovery Net Project at Imperial College of Science, UK, aims to build the first e-Science platform for scientific discovery from the data generated by a wide variety of high-throughput devices. It is a multi-disciplinary project, serving application scientists from various fields including biology, combinatorial chemistry, renewable energy and geology. It provides a service-oriented computing model for knowledge discovery. Users can connect to and use data analysis software, as well as data sources that third parties make available online. It defines standard architectures and tools that allow scientists to plan, manage, share and execute complex knowledge discovery and data analysis procedures as remote services. Service providers can publish data mining and data analysis software components as services for use in knowledge discovery procedures. Data owners can provide interfaces and access to scientific databases, data stores, sensors and experimental results as services, so that these can be integrated into knowledge discovery processes. http://ex.doc.ic.ac.uk/new/index.php
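The service-oriented model described above can be sketched in a few lines. Data sources and analysis components are published under names and then chained into a discovery procedure. This is a hypothetical, in-process illustration only. The registry, the `publish` decorator, the service names and the `run_workflow` helper are invented for the example and are not Discovery Net's interfaces. In a real deployment, each step would be a remote service call.

```python
from typing import Any, Callable, Dict, List

# Hypothetical service registry: providers publish callables under a name.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def publish(name: str):
    """Decorator that registers a function as a named 'service'."""
    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = func
        return func
    return register


@publish("source.readings")
def readings() -> List[float]:
    # Stand-in for a data owner exposing experimental results as a service.
    return [4.1, 3.9, 10.5, 4.0, 4.2]


@publish("analysis.drop_outliers")
def drop_outliers(values: List[float], limit: float = 2.0) -> List[float]:
    # Keep values within `limit` of the median (a deliberately simple filter).
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return [v for v in values if abs(v - median) <= limit]


@publish("analysis.mean")
def mean(values: List[float]) -> float:
    # Arithmetic mean of the values passed on by the previous step.
    return sum(values) / len(values)


def run_workflow(steps: List[str]) -> Any:
    """Execute a linear workflow: the first step is a data source, and each
    following step receives the previous step's output."""
    result = REGISTRY[steps[0]]()
    for name in steps[1:]:
        result = REGISTRY[name](result)
    return result


if __name__ == "__main__":
    print(run_workflow(["source.readings", "analysis.drop_outliers", "analysis.mean"]))
```

Because steps are looked up by name, a provider can replace an analysis component without any change to the workflows that use it. This is the decoupling that the service-oriented approach aims for.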

OGSA-DAI is playing an important role in the construction of middleware that assists with access to, and the integration of, data from separate data sources via the Grid. The project identifies the requirements, designs solutions and delivers software that meets this purpose. It was conceived by the UK Database Task Force and works closely with the Global Grid Forum DAIS-WG and the Globus Team. It is funded by the DTI e-Science Grid Core Project, involving the National e-Science Centre, ESNW, IBM, EPCC and ORACLE. http://www.ogsadai.org.uk/index.php
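One typical task behind "access to and integration of data from separate data sources" is joining records held in different repositories on a shared key. The sketch below shows a plain in-memory hash join between two hypothetical sources, a sample catalogue and a measurement table, both with invented dummy values. It illustrates the general technique only. It is not the OGSA-DAI API, which exposes such operations through Grid services.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Inner-join two row streams on `key`.

    The (ideally smaller) left input is loaded into a hash table once; the
    right input is then streamed, so it never has to fit in memory."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    for row in right:
        for match in table.get(row[key], []):
            yield {**match, **row}


# Two hypothetical, independently maintained sources (dummy values).
samples = [
    {"sample_id": 1, "site": "A"},
    {"sample_id": 2, "site": "B"},
]
measurements = [
    {"sample_id": 1, "value": 0.42},
    {"sample_id": 3, "value": 0.17},  # no matching sample: dropped by the join
    {"sample_id": 2, "value": 0.93},
]

if __name__ == "__main__":
    for joined in hash_join(samples, measurements, key="sample_id"):
        print(joined)
```

Building the table from the smaller input and streaming the larger one is the usual design choice when one side is too large to hold in memory. The same concern drives e-DIKT's work on joins across terabyte-sized tables.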

Data Standards

Standardising data formats will increase the usability of the software currently available. There is an ongoing effort to provide a standardised framework for metadata, including binary data, such as the DFDL initiative. DFDL (Data Format Description Language) is part of a Global Grid Forum (GGF) initiative (http://www.epcc.ed.ac.uk/dfdl/). Currently DFDL is an informal email discussion group working on a language that describes how data formats should be specified. There is a need for a standardised, unambiguous description of data. XML provides an essential mechanism for transferring data between services in an application- and platform-neutral format. However, it is not well suited to large datasets with repetitive structures, such as large arrays or tables. Furthermore, many legacy systems and valuable data sets do not use the XML format. The aim of this working group is to define an XML-based language, the Data Format Description Language (DFDL), for describing the structure of binary and character-encoded (ASCII/Unicode) files and data streams so that their format, structure, and metadata can be exposed. This effort specifically does not aim to create a generic data representation language. Rather, DFDL endeavours to describe existing formats in an actionable manner, making the data accessible in its current format through generic mechanisms.
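The central idea of DFDL can be illustrated with Python's standard `struct` module: describe an existing binary layout declaratively, then let generic software read the data without converting it. The field list below is a hypothetical stand-in written as plain Python data. It is not DFDL syntax, which is XML-based. The record layout, an integer identifier followed by three little-endian doubles, is invented for the example.

```python
import struct
from typing import Dict, Iterator, List, Tuple

# Hypothetical format description: (field name, struct code), little-endian.
# Only this description knows the layout; the reader below is fully generic.
ATOM_RECORD: List[Tuple[str, str]] = [
    ("atom_id", "i"),  # 4-byte signed integer
    ("x", "d"),        # 8-byte IEEE double
    ("y", "d"),
    ("z", "d"),
]


def read_records(data: bytes, description: List[Tuple[str, str]]) -> Iterator[Dict[str, object]]:
    """Decode consecutive fixed-size records from `data` using `description`."""
    layout = struct.Struct("<" + "".join(code for _, code in description))
    names = [name for name, _ in description]
    for values in layout.iter_unpack(data):
        yield dict(zip(names, values))


if __name__ == "__main__":
    # Build a small binary "legacy file" in memory for the demonstration.
    blob = struct.pack("<iddd", 1, 0.0, 0.0, 0.0) + struct.pack("<iddd", 2, 0.96, 0.0, 0.0)
    for record in read_records(blob, ATOM_RECORD):
        print(record)
```

To support a different legacy format, only the description changes; `read_records` stays the same. That separation is what makes the data "accessible through generic mechanisms".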

Data interoperability is of great importance, especially within a GRID context. The iVDGL project is a global data grid that will serve as the forefront for experiments in both physics and astronomy (http://www.ivdgl.org/). Data interoperability is the sharing of data between unrelated data sources and multiple applications. Creating enterprise data warehouses and creating commerce websites from heterogeneous data sources are two of the most popular scenarios for using Microsoft SQL as an interoperability platform. It preserves users' investments in existing systems through easy data interoperability, while providing additional functionality and cost effectiveness that their existing database systems do not provide. It enables easy access to data and the exchange of data among groups.

Global File Systems

Traditional local file systems support a persistent name space by creating a mapping between blocks found on disk devices and a set of files, file names, and directories. These file systems view devices as local: devices are not shared, so the file system has no need to enforce device-sharing semantics. Instead, the focus is on aggressively caching and aggregating file system operations to improve performance by reducing the number of actual disk accesses required for each file system operation.
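The caching strategy mentioned above can be sketched as a least-recently-used (LRU) block cache between file system operations and the disk. The `FakeDisk` class and the access pattern are hypothetical. The point is simply that repeated reads of the same blocks come from memory, so the count of real disk accesses falls.

```python
from collections import OrderedDict


class FakeDisk:
    """Hypothetical block device that counts physical reads."""

    def __init__(self):
        self.reads = 0

    def read_block(self, n: int) -> bytes:
        self.reads += 1
        return f"block-{n}".encode()


class BlockCache:
    """LRU cache of disk blocks with a fixed capacity."""

    def __init__(self, disk: FakeDisk, capacity: int):
        self.disk = disk
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> data, oldest first

    def read(self, n: int) -> bytes:
        if n in self.blocks:
            self.blocks.move_to_end(n)  # mark as most recently used
            return self.blocks[n]
        data = self.disk.read_block(n)  # cache miss: go to the device
        self.blocks[n] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used block
        return data


if __name__ == "__main__":
    disk = FakeDisk()
    cache = BlockCache(disk, capacity=3)
    for n in [0, 1, 2, 0, 1, 2, 0, 1, 2, 3]:
        cache.read(n)
    print("logical reads: 10, physical reads:", disk.reads)
```

In this example, ten logical reads cost only four physical reads. This kind of cache is safe only because the device is not shared. Once several nodes can write the same blocks, as in GFS, the caches must be kept coherent across the cluster.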

GFS

The Global File System (GFS) is a shared-device cluster file system for Linux. GFS supports journaling and rapid recovery from client failures. Nodes within a GFS cluster physically share the same storage by means of Fibre Channel (FC) or shared SCSI devices. The file system appears to be local on each node, and GFS synchronizes file access across the cluster. GFS is fully symmetric. In other words, all nodes are equal and there is no server that could be either a bottleneck or a single point of failure. GFS uses read and write caching while maintaining full UNIX file system semantics. To find out more please see http://www.aspsys.com/software/cluster/gfs_clustering.aspx

FedFS

There has been an increasing demand for better performance and availability in storage systems. In addition, as the amount of available storage grows and access patterns become more dynamic and diverse, the maintenance properties of a storage system have become as important as its performance and availability. A federated file system (FedFS) is a distributed file system architecture. It loosely clusters the local file systems of the cluster nodes into an ad-hoc global file space, to be used by a distributed application. A federated file system is a per-application global file naming facility that the application can use to access files in the cluster in a location-independent manner. FedFS also supports dynamic reconfiguration, dynamic load balancing through migration, and recovery through replication. FedFS provides all these features on top of autonomous local file systems. Each application creates its federated file system ad hoc, and the file system's lifetime is limited to that of the distributed application. In fact, a federated file system is a convenience that lets a distributed application access files of multiple local file systems across a cluster through location-independent file naming. This location-independent global naming enables FedFS to implement load balancing, migration and replication for increased availability and performance. http://discolab.rutgers.edu/fedfs/
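Location-independent naming is the property that lets FedFS migrate and replicate files. It can be sketched as a per-application table that maps each global file name to the node(s) and local paths currently holding the file. Everything here is a hypothetical illustration of the idea, not FedFS's actual design or interface: the `FederatedNamespace` class, the node names and the paths are all invented. The application only ever uses the global name, so moving or copying a file changes the table but not the application code.

```python
import itertools
from typing import Dict, List, Tuple

Location = Tuple[str, str]  # (node name, path in that node's local file system)


class FederatedNamespace:
    """Per-application global name space built on top of local file systems."""

    def __init__(self):
        self._locations: Dict[str, List[Location]] = {}
        self._rr = itertools.count()  # round-robin counter for load balancing

    def create(self, global_name: str, node: str, local_path: str) -> None:
        self._locations[global_name] = [(node, local_path)]

    def replicate(self, global_name: str, node: str, local_path: str) -> None:
        """Add a copy on another node (for availability and load spreading)."""
        self._locations[global_name].append((node, local_path))

    def migrate(self, global_name: str, src_node: str, node: str, local_path: str) -> None:
        """Move the copy held on src_node; the global name does not change."""
        replicas = self._locations[global_name]
        idx = [n for n, _ in replicas].index(src_node)
        replicas[idx] = (node, local_path)

    def resolve(self, global_name: str) -> Location:
        """Pick one replica, rotating through them to spread the load."""
        replicas = self._locations[global_name]
        return replicas[next(self._rr) % len(replicas)]


if __name__ == "__main__":
    ns = FederatedNamespace()
    ns.create("/app/input.dat", "node1", "/scratch/a/input.dat")
    ns.replicate("/app/input.dat", "node2", "/tmp/input.dat")
    ns.migrate("/app/input.dat", "node1", "node3", "/data/input.dat")
    print(ns.resolve("/app/input.dat"), ns.resolve("/app/input.dat"))
```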

Knowledge Management

e-DIKT (e-Science Data Information & Knowledge Transformation) is a project that applies solid software engineering techniques to leading-edge computer science research to produce robust, scalable data management tools that enable new research areas in e-Science. e-DIKT has been funded through a Research Development Grant from the Scottish Higher Education Funding Council. e-DIKT will initially investigate the use of new database techniques in astronomy, bioinformatics and particle physics, and in creating virtual global organisations using the new Open Grid Services Architecture (OGSA). e-DIKT's realm of enquiry will be the Grid scale, the terabyte regime of data management. Its goal is to strain-test computer science theories and techniques at this scale. e-DIKT is currently looking into the following areas of research:

Enabling interoperability and interchange of binary and XML data in astronomy: tools to provide an "implicit XML" representation of pre-existing binary files;

Enabling relational joins across terabyte-sized database tables;

Testing new data replication tools for particle physics;

Engineering industrial-strength middleware to support the data management needs of biologists and biochemists investigating protein structure and function (as it relates to human disease and the development of drug therapies);

Building a data integration testbed using the Open Grid Services Architecture Data Access and Integration components being developed as part of the UK's core e-Science programme and the Globus-3 Toolkit.
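The "implicit XML" idea in the first item can be illustrated by leaving the binary file untouched and generating an XML view of its records only when a client asks for it. This hypothetical sketch uses Python's standard `struct` and `xml.etree.ElementTree` modules with an invented two-field record layout. It is not e-DIKT's software.

```python
import struct
import xml.etree.ElementTree as ET
from typing import Iterator

# Invented layout for the example: object id (uint32) and magnitude (float32).
RECORD = struct.Struct("<If")


def implicit_xml(data: bytes) -> Iterator[ET.Element]:
    """Yield one <object> element per binary record, on demand.

    The binary data stays the primary representation; XML is produced lazily,
    so a large file never has to be converted (or stored) as XML in full."""
    for obj_id, magnitude in RECORD.iter_unpack(data):
        elem = ET.Element("object", id=str(obj_id))
        ET.SubElement(elem, "magnitude").text = f"{magnitude:.2f}"
        yield elem


if __name__ == "__main__":
    blob = RECORD.pack(7, 12.5) + RECORD.pack(8, 14.25)
    for elem in implicit_xml(blob):
        print(ET.tostring(elem, encoding="unicode"))
```

Clients that understand XML see ordinary elements, while the compact binary file, which suits large repetitive datasets better, is never rewritten.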

As it works over time with a wider range of scientific areas, e-DIKT is expected to develop generic spin-off technologies that may have commercial applications in Scotland and beyond, in areas such as drug discovery, financial analysis and agricultural development. For this reason, a key member of the e-DIKT team will be a dedicated commercialisation manager who will push out the benefits of e-DIKT to industry and business. http://www.edikt.org/

Meeting the Users’ Needs

One of the points addressed in the Users' Questionnaire was the ease of use of new data management tools. Although many researchers are not content with their own data management, they would not be willing to change unless the changeover was easy. That is, ease of use and the quantity and quality of functions would be important issues for researchers considering migration to new and improved systems. The ENACTS partners welcome the European initiatives and projects that aim to develop GRID computing and data management tools. However, this development must be focused on the end user, and project results must be tested on real systems so that applications research can benefit from migrating to new tools.

Future Developments

The principal deliverable from this activity is this Sectoral Report, which has enabled the participants to pool their knowledge of the latest data management technologies and methodologies. The document has focused on user needs and has investigated new developments in computational science. The study has identified:

The increasing data storage, analysis, transfer, integration and mining requirements being generated by European researchers;

That the needs of researchers across different disciplines are diverse but can converge through a general policy and standard for data management;

That emerging technologies exist to address these varied and increasing needs, but a greater flow of information to a wider audience is required to educate the scientific community about what is coming in GRID computing.

The study will benefit:

Researchers, who will gain knowledge and guidance that further enables their research;

Research centres, which will be better positioned to give advice and leadership;

European research programmes, in developing more international collaborations.


PROJECT 5

Distance Learning and Support

http://www.epcc.ed.ac.uk/enacts/DistanceLearning.pdf 

Josef Novak, Miroslav Rozloznik, Miroslav Tuma,

Division HPC, Prague, Czech Republic (IHCC)

Institute of High Performance Computing Centre

Study objectives

The main task of this study was to describe and analyze the role of distance learning and support within the field of Grid computing. Grid computing tools often strongly affect the strategies and efficiency of distance learning. The study was originally planned to consist of four main parts:

Part I: Presentation of the ENACTS project and of the Distance Learning and Support Project, with the envisaged workplan, technical objectives and benefits. We also give a short description of the participating organizations together with the list of research team members.

Part II: Various definitions of distance learning and education. Presentation of basic distance learning tools and models, including a discussion of their features, limitations, and benefits for prospective users. Although this text is relatively general, the authors focus on three main user groups that have been identified: HPC centers (service providers), the Scientific Grid community (researchers and users) and the Industrial Grid community (vendors and end-users). These three target groups are briefly discussed at the end of this section. This section was written by ICCC.

Part III: Results from a comprehensive survey distributed to 85 and completed by 25 major European research groups. It focuses on different aspects, such as the needs and requirements of various potential target groups and the pedagogical and organisational approach that best fits the identified target groups. The survey includes a clear analysis of how to ascertain the feasibility, viability and relevance of adapting a proper distance learning strategy to the training requirements, and leads into an evaluation of, and agreement on, a framework for the collaborative development of suitable distance-learning-based course material. This section, incorporated in the Final Report, was written by I.C.C.C. in close collaboration with UNI-C.

Part IV: Conclusions and recommendations. The purpose of the survey presented in Part III is to gain a better understanding of these key user groups' needs and requirements with a view to establishing a proper framework for distance learning and support. The analysis of this survey, together with Part II, which presented general concepts and technological issues, will be instrumental in establishing key recommendations for the target groups. This section summarizes the distance learning features offered by the leading, but still rather small, groups of Grid specialists and users, and makes recommendations on a possible strategy that supports a successful uptake of Grid technology across larger communities. This section was written by ICCC.

Based on the results of the questionnaire, we changed the structure as follows: Part I, Part III and Part IV were kept as proposed, while Part II was replaced by a short introductory text on distance learning definitions and the identified user groups. The basic reason for this change was that the evaluation of the results led to unexpected conclusions: while distance learning seems relevant to Grid computing, it should be applied in a special form, which we deal with below.


Summary and conclusions

This section contains three subsections that cover the following subjects:

1. Related projects and activities

2. Recommended distance learning concepts and tools offered by contemporary technologies

3. Key requirements and needs of users in the Grid community

The first subsection summarizes the relations to related projects and partners' efforts. We decided to include such a subsection here in order to highlight a few important facts: to see some projects dealing with distance learning from another perspective; to understand some of the concepts considered here in different frameworks and with different results; and to motivate further investigations.

The other subsections are based on the answers from Part III, their interpretation and the derivation of general conclusions. The way the results were assembled was described above. We believe that the proposed strategies and recommendations should help to establish a proper framework for distance learning and support. Consequently, they might contribute to the successful uptake of Grid technology in even larger communities in Europe and all over the world.

1. Related projects and activities

There are a number of projects and activities closely related to ENACTS. An important contribution will probably be made by the results of the LeGE working group, whose main task is to facilitate the establishment of a European Learning Grid Infrastructure. This project is interesting both for its goals and for the way grid technologies attempt to achieve these goals (LeGE-WG: Learning Grid of Excellence Working Group; http://www.lege-wg.org). It clearly distinguishes between primary scientific and technological objectives on one side and operational objectives on the other. The former set of objectives is devoted to basic technological and pedagogical issues, legislative conditions, new European methodologies and the standardisation of emerging Grid-aware solutions. The latter set deals with the practical steps to promote e-learning; in fact, it covers the whole European Higher Education community, the European Scientific and Engineering Research communities and the like. In a sense, LeGE might be considered a very general metaproject with respect to our research.

An important dual view could be provided by a research project that tries to generalize our target groups horizontally. In other words, one may consider general target groups of students, researchers or computational scientists. It is difficult to predict how important distance learning is for each group. An overview of their interests, needs and requirements can be found at ILIAS Open Source, University of Cologne (http://www.ilias.uni-koeln.de/ios/index.html).

Another interesting project somewhat related to ENACTS is the EGEE project (Enabling Grids for E-science in Europe). This project was launched on April 1st, 2004 (http://public.eu-egee.org/). Its goal is to describe and interpret current national and regional Grid efforts. The project covers a large portion of the industrial partners.

2. Summary of recommended distance learning concepts and tools

As we have seen above, there is, predictably, a non-negligible interest in acquiring scientific and technological information. What we did not expect was that most users acquire this information from traditional sources. In particular, users prefer standard research papers and non-research articles, manuals, booklets, and hardware and software documentation. Yet there is one modern feature: the means of access are electronic tools (e-mail or the Internet). The web environment is the new wrapper that mainly contains the classical sources of information listed above. New scientific and technological information is not obtained only from paper-based sources. Another important way to obtain it is through workshops, conferences and congresses. There is typically a reasonable limit on the size of such meetings: it is well known that meetings with a larger number of participants are less effective in passing on scientific information and in supporting the learning process. Rather, they play a social role in the scientific community, being important for celebrating important personalities, awarding prizes and the like.

We have investigated the role of the type of training in the overall educational process. Based on the previous study, there are basically two important types of training: informal and organised training. The former is individualistic and tries to understand the subject matter from scratch. In such cases, for instance when learning a programming language, it is important to start by reading examples rather than user guides or manuals. These tools become more important later, when informal training transfers smoothly into organised training. Users often really need to check the basic ideas of the new subject carefully. The amount of time spent on these initial exercises, which we might call setup time, is rather individual. The user might then attend intensive courses on some software or hardware products, and learn new ways to cope with new communication tools and to use grids.

If we try to point out the consequences of the basic user proposals, we can see that users may need peer-based (community-facilitated) resources rather than organised instructor-led courses. Such courses or training should come later, when the user requires more advanced information. One example from earlier times was the community of users of parallel computational tools, for whom individual access was much more important than organised learning of various formalisms. Note that organised training might not be very effective in this case, since the techniques of parallel programming and of using parallel machines are changing rapidly. In addition, parallel programming tools are typically very individualistic.

Users are interested in an “Open source” approach to training and information resources and materials. They need access to databases in order to see closely related information, and enough material to extend their knowledge in various directions. The next section will try to analyze the results obtained from the point of view of distance learning, and then propose a specific form of distance learning.

3. A proposed strategy and recommendations for establishing a proper framework for distance learning and support in the Grid community

As we have seen, five basic features characterize mainstream distance learning (conservative definition):

1. it is a highly structured activity;

2. it deals with highly structured content;

3. it is typically a one-to-many (teacher-centered) process in which the tutor plays a key role;

4. it is characterised by frequent monitoring of its participants through tests, assignments, etc.;

5. it is based on a combination of various web-based tools.

Consider now the questions users may ask. They may need large data resources that do not precisely correspond to the characterization of distance learning given above. Although the activity might be considered highly structured in both its form and content, it is not teacher-centered in the strictest sense of the word. Database-oriented learning serves more for information exchange between two partners: those who create the databases and those who use them. Nevertheless, this database-oriented learning tends to develop in a teacher-centered way. While the first encounters with teaching in grid computing might be very unorganised, they have to change and transform into fully organized training.


We see another conclusion concerning the role of teachers and students in distance learning. It appears that the delayed form might be further developed through increased activity by the students. Let us try to explain this conclusion more carefully. There are many types of specific grid computations. At present it would be very costly and inefficient to train highly specialized experts in a field, such as Grid Computing, that is changing so rapidly. Nowadays, it is more important to raise the general level of knowledge of Grid Computing in particular communities. Deep training of experts might be more useful once Grid Computing becomes a generally accepted form of computation. In such a future scenario, students will play a more active role in the learning process than the current distance learning analysis based on the questionnaire suggests. In this context, let us distinguish two basic user groups: experienced users entering the new emerging field of grid computing, and students who are obtaining their first qualification. One important difference between these users lies in their possibilities. While an experienced user has no difficulty attending a couple of workshops per annum, this could be a problem for students because of a lack of funding. Consequently, students are a very specific class of users with a much more open attitude to distance learning techniques.

Now let us consider a user-friendly environment, a reasonable compromise among the users' demands for a flexible distance learning framework. We will call it a grid training portal: a customisable, personalised web interface for accessing services using distance learning and education tools. It would provide a common gateway to resources, with special attention to the tools mentioned above. That is, personalised and delayed distance learning forms must be preferred.

We do not aim to present a fixed form of what we call the distance learning portal. Instead, we would like to summarise our conclusions and subsequent recommendations in terms of a flexible tool “under construction”. Of course, it should reflect available technologies as well. We will describe its basic structure, putting an emphasis on its most important features.

1. Control (management) unit

2. User resources

3. Information resources

4. Communication subsystem

Let us now describe these parts of the proposed portal system. The control unit, or more accurately the management unit, comprises the basic institutions and individuals together with the tools that run the portal. Although there should be various specific rules on how to handle the portal organisation, it is important to solve the problems of its technical updates, financial support, technological development, software upgrades, etc. Some of these might require rather sophisticated strategic decisions. The control unit should contain two specific layers: a service (maintenance) layer for implementing the control mechanisms, and an evaluation layer. One of the most important tasks of the control unit is to balance two basic functions. First, technological developments will put strong pressure on the hardware tools, covering both node demands and network demands. Technologically, these demands will present themselves in the need to make the nodes more powerful and to give the network ever-larger bandwidth. Nevertheless, there is a strong gap between very fast processors and relatively slow connections. In other words, network technology is lagging behind, and increasing network bandwidth is a real technological challenge. Second, the management unit must take care of financial resources, including:

1. The start-up development costs,

2. The cost of the hardware and software tools connected to the portal, including regular updates and upgrades,

3. The operational cost of the technology,

4. The management cost, and

5. The technology remediation costs.
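The five cost items can be read as the terms of a simple additive cost function, which is what the management unit would weigh against the technological requirements. The sketch below is a minimal illustration under that assumption: the `PortalCosts` class, its field names and the use of a single common currency unit are ours, not the study's, and it makes no attempt to model how the items interact.

```python
from dataclasses import dataclass, fields


@dataclass
class PortalCosts:
    """Hypothetical cost breakdown for a grid training portal.

    One field per cost item in the list above; all values are assumed
    to be in the same unit (e.g. euros per year).
    """
    startup_development: float
    hardware_software: float  # tools connected to the portal, incl. updates/upgrades
    operational: float
    management: float
    remediation: float

    def total(self) -> float:
        # Simplest possible cost function: the items are assumed to add up.
        return sum(getattr(self, f.name) for f in fields(self))

    def shares(self) -> dict[str, float]:
        # Fraction of the total attributable to each item, e.g. to see how
        # dominant start-up costs are during the start-up period.
        total = self.total()
        if total == 0:
            raise ValueError("total cost is zero; shares are undefined")
        return {f.name: getattr(self, f.name) / total for f in fields(self)}
```

Comparing `shares()` across successive years would show whether start-up costs shrink relative to operational costs as the technology matures, which is the kind of realistic forecast discussed in the text.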


The cost function must be carefully evaluated and balanced against the technological requirements that increase the overall portal cost. Once these two items are balanced, we should take care of the overall efficiency. Mr. Soren Nipper in [14] proposed and presented a picture showing the user groups at the top of a pyramid whose large base corresponds to the overall costs. This picture may be valid for the time being; however, it need not correspond to future development. Therefore, when taking such models into account, a realistic forecast plays an important role. At present, the picture may reflect large start-up costs, since we are still in the start-up period of Grid technology.

By user resources we mean the groups of portal users. Their leaders will be engaged in strategic decisions. The learning mechanism will not be strongly teacher-centered but more or less balanced between teachers and students. Users will come with new initiatives in order to offer overall improvements. As far as the target groups are concerned, the results of the questionnaire suggest that they should not be very large. On the other hand, we do not have an exact idea of how they will develop in the future. Some hints, however, suggest that the target groups may grow, as happened with parallel computational tools two decades ago: after a long period of relatively small user groups, we now see large teams collaborating over the net on the development of large-scale HPC applications, using distributed tools like SourceForge and powerful version-synchronising software that enables tens of developers to collaborate.

Information resources are the third part of the overall portal system. By that we mean the technological content of the portal, covering both its hardware and software parts, particularly information databases with papers, lectures in written or recorded form, simulation software and the technological tools to present all these various materials. As of now we do not have very large multimedia resources for Grid Computing in our field, although this will likely change in the future. In any case, developing mechanisms to store, protect, develop, update and clearly organise these data items is a more challenging problem, and this part of the portal will have a specific layer of its own. A specific feature of the information resources will be their hierarchical nature. In our case, grid content will be discussed first at the level of HPC centers, then at the national level (if there is one) and finally within the European grid community. In fact, this subsystem is exactly the one that must be organized hierarchically.
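The hierarchical organisation described above (HPC centre, then national level, then European grid community) can be pictured as a chain of content stores in which a request is answered at the most local level holding the item and otherwise escalates to the next level up. The sketch below assumes exactly that escalation rule; the `ContentLevel` class, its `find` method and the sample items are hypothetical illustrations, not a design taken from the study.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentLevel:
    """One level of a hypothetical hierarchical content store."""
    name: str
    items: dict[str, str] = field(default_factory=dict)
    parent: Optional["ContentLevel"] = None

    def find(self, key: str) -> Optional[tuple[str, str]]:
        """Look for `key` at this level first, then escalate upwards.

        Returns (level name, item) for the most local match, or None.
        """
        level = self
        while level is not None:
            if key in level.items:
                return level.name, level.items[key]
            level = level.parent
        return None


# Illustrative hierarchy: HPC centre -> national level -> European community.
europe = ContentLevel("European grid community", {"grid-intro": "Introductory lecture"})
national = ContentLevel(
    "National grid", {"grid-intro": "National-language lecture"}, parent=europe
)
centre = ContentLevel(
    "HPC centre", {"job-howto": "Local job submission guide"}, parent=national
)
```

Because the national level is optional, an HPC centre's `parent` could equally point straight at the European level.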

As far as the portal content of the information resources is concerned, it can be stratified into independent layers. The first layer may contain databases of written and electronically distributed information. Users accessing the portal have different needs; therefore, the documents contained in its databases should be sorted according to their requirements. Other data, such as test codes, video and audio material, should be stratified in a similar manner, thereby creating a user-friendly environment for grid users.
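One way to realise this stratification is to tag every resource with a layer (written documents, test codes, video, audio) and with the target groups it serves, and to build each user's view by filtering on the group. The sketch assumes that tagging scheme; the three groups are the target groups named earlier (HPC centers, Scientific and Industrial Grid communities), while the `Resource` class, the `view_for` function and the sample catalogue are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    DOCUMENTS = "written and electronically distributed documents"
    TEST_CODES = "test codes"
    VIDEO = "video material"
    AUDIO = "audio material"


class Group(Enum):
    # The three target groups identified in the study.
    HPC_CENTRES = "HPC centres (service providers)"
    SCIENTIFIC = "Scientific Grid community"
    INDUSTRIAL = "Industrial Grid community"


@dataclass(frozen=True)
class Resource:
    title: str
    layer: Layer
    audiences: frozenset[Group]


def view_for(resources: list[Resource], group: Group) -> dict[Layer, list[str]]:
    """Return the titles relevant to `group`, stratified by layer.

    Layers with nothing relevant for the group are omitted.
    """
    view: dict[Layer, list[str]] = {}
    for res in resources:
        if group in res.audiences:
            view.setdefault(res.layer, []).append(res.title)
    return view


# Hypothetical catalogue entries, for illustration only.
catalogue = [
    Resource("Grid middleware overview", Layer.DOCUMENTS,
             frozenset({Group.SCIENTIFIC, Group.INDUSTRIAL})),
    Resource("Sample MPI job", Layer.TEST_CODES, frozenset({Group.SCIENTIFIC})),
    Resource("Site operations handbook", Layer.DOCUMENTS,
             frozenset({Group.HPC_CENTRES})),
]
```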

The final item is the communication subsystem: a mechanism that exchanges information among the three previous items. The communication patterns do not need to be uniform for the whole portal; in fact, various modes of communication can be supported. For some types of information exchange, synchronous connections, such as videoconferences, audio conferences or Access Grid, are preferable; for others, asynchronous mechanisms are preferred. In general, we distinguish two types of information signals: control signals and service signals. The first type serves to keep the portal in good shape and to support its development. The second type (which should prevail by a large margin) serves the users.
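The distinction between synchronous and asynchronous channels, and between control and service signals, can be made concrete with a small message hub. In this sketch, which is hypothetical throughout (class names, the queue-then-flush delivery rule and the sample messages are ours), synchronous messages are delivered at once, asynchronous ones are queued until flushed, and `service_share` reports how far service traffic prevails over control traffic.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    CONTROL = "control"  # keeps the portal in good shape, supports its development
    SERVICE = "service"  # serves the users


class Mode(Enum):
    SYNC = "synchronous"    # e.g. videoconference, audio conference, Access Grid
    ASYNC = "asynchronous"  # e.g. e-mail or forum-style exchanges


@dataclass
class Message:
    kind: Kind
    mode: Mode
    body: str


class CommunicationSubsystem:
    """Hypothetical message hub linking the other three portal parts."""

    def __init__(self) -> None:
        self.delivered: list[Message] = []
        self.pending: deque[Message] = deque()

    def send(self, msg: Message) -> None:
        # Synchronous traffic is delivered at once; asynchronous traffic waits.
        if msg.mode is Mode.SYNC:
            self.delivered.append(msg)
        else:
            self.pending.append(msg)

    def flush(self) -> None:
        # Deliver queued asynchronous messages in arrival order.
        while self.pending:
            self.delivered.append(self.pending.popleft())

    def service_share(self) -> float:
        # Fraction of delivered traffic that serves users rather than control.
        if not self.delivered:
            return 0.0
        service = sum(1 for m in self.delivered if m.kind is Kind.SERVICE)
        return service / len(self.delivered)
```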

While creating a portal is our main recommendation, the question remains open as to whether this is something we had in mind even before evaluating the answers. In other words, we need to ask whether the answers hide some other possible, and even completely different, solutions. We have aimed at minimizing this hidden risk through our methodology, in which we give some standard technologies a new, well-defined content.


PROJECT 6

Software Reusability and Efficiency

http://www.epcc.ed.ac.uk/enacts/softwareefficiency.pdf 

Jacko Koster (Parallab, BCCS)

Parallab, Bergen Center for Computational Science,

University of Bergen (Norway)

Study objectives

The objective of this study was to determine the implications, for new and, more importantly, existing programs, of using a pan-European computing metacentre. Specifically, the study looks at how portability between facilities can be ensured and how this will affect program efficiency. The report consists of six chapters:

Chapter 1 Introduction to the ENACTS project.

Chapter 2 Software reusability

Chapter 3 Standards programming languages

Chapter 4 Software efficiency

Chapter 5 Grid and distributed environments, middleware

Chapter 6 Summary and conclusions

Challenges and Potential Benefits. The uptake of computer simulation by new groups and the constant quest for greater efficiency of computer utilisation mean that new techniques and approaches are always required. The study addresses emerging software technologies that tackle these challenges and how they affect the efficiency and reuse of existing applications, with the aim of reviewing the current position. The study has two beneficiaries: the users, who will be able to work more effectively, and the computer centres, which will be able to give better advice and produce more research for their capital investments.

Scope of the study

During the study, we became aware of the difference in the way various software developers and development groups approach software reuse. For an academic researcher or research group, software reuse is typically the use of software that is in the public domain, open source, or otherwise freely available to the researcher in some form. Reuse of software is often a way to acquire competence and to improve and build upon methods, techniques and algorithms (and the software itself) that were developed elsewhere by colleagues or other research groups. Whether or not to reuse software is often a psychological and/or individual issue. In an industrial environment, on the other hand, software reuse involves different concepts. Obviously, commercial considerations (like licensing, development cost estimates and market potential) play an important role, but other issues, like quality assurance, reliability, maintainability, and the support organization of the software, are also more critical. In all cases, software reuse aims at exploiting previously acquired competence and reducing the development cost of new applications. In the report, we primarily address the HPC user community connected to the ENACTS consortium, which is mostly academic. However, we touch upon commercial aspects of code reuse as well.

The concepts of software reuse and efficiency are closely related to many other concepts in software engineering, including (but not limited to) the following:


• software architecture

• end-user and application programming interfaces

• software life cycle (design, prototyping, implementation, verification, testing, and

maintenance)

• software quality, reliability, fault tolerance

• software development environments

It is beyond the scope of this study to review all of these in detail; there is a vast body of methodology, literature and ongoing research in these areas. However, during the study we found it impossible to avoid addressing some aspects of these concepts. In writing this sectoral report, we have tried to stay as close as possible to the original objectives of the study (efficiency and reusability) and have minimized the inclusion of other software engineering concepts.

Software reusability in community-led initiatives deals a lot with the design of standards for protocols and languages. We have not attempted to describe recent and emerging language standards in detail. Many of these are still under standardization and therefore subject to change in the near future. Describing the temporary status of evolving standards in detail would make this report obsolete in a relatively short time.

Summary and conclusion

In this report, we have reviewed some of the aspects related to efficiency and reusability in modern and

emerging software technologies for computational science. We hope that the document has been useful

to the reader in a variety of ways, for example:

• to understand some of the major factors that impede software efficiency and successful software reuse in traditional HPC environments and in distributed and grid environments

• to recognize the relationship between software reuse and various other software engineering concepts and techniques

• to recognize the complexity related to software efficiency in modern (multi-level) hardware and software technologies

• to understand the importance of software standards and software interoperability and their impact on many facets of software design and application development, and on computational science and engineering in general.

One objective of this study was to put forward recommendations for the establishment of a pan-European HPC centre with respect to software efficiency and reusability. We believe the following conclusions and recommendations are valid.

1. Moving large numbers of application codes between heterogeneous HPC systems or to new software environments is only feasible if well-accepted standards, languages and tools are available. In a pan-European HPC consortium, this will be facilitated if the user work environment is standardized in some form and best practices for software development and job execution are established. Potentially troublesome hardware-specific or site-specific dependencies should obviously be eliminated or dealt with in a transparent manner.

2. Traditionally, data management connected to a numerical simulation was limited to the use of the programming bindings provided. The application typically ran on only one architecture, and the local file systems ensured that the application's I/O could be achieved using the primitive bindings provided by programming languages like C and Fortran. The archival, retrieval, and exploration of remote (off-site) data is of growing importance as computer systems enable the rapid generation of extremely large, complex, and remote data sets. Increasing the ease and efficiency of data transfer and access will greatly enhance the amount of science that can be performed on HPC facilities, and will allow inter-disciplinary sharing of data. An HPC centre or consortium should therefore promote best practices in portable data management within its user community and provide appropriate advice and training.

3. Tools are important in enhancing the efficiency of the software engineering process and of the software itself, and permit rapid code debugging, performance profiling and analysis, and subsequent software re-engineering. HPC centres should actively keep users up to date with the latest information on emerging tools, monitor their development and report on those that are deemed most effective and useful for the user community. Promising tools should be evaluated for robustness, functionality and performance, and their applicability in real-life applications should be assessed. A tool that is considered mature enough and provides added value should be promoted to the user community of the HPC consortium by providing adequate training material.

4. An HPC user may not move rapidly onto the grid if this requires significant changes to his/her existing application. Ideally, such a move should be seamless and require few software revisions. Getting users onto the grid will require that end-user interfaces to the grid middleware are simple, or similar to what the user is accustomed to in a traditional HPC environment. The computational scientist should not have to go through a lengthy learning process before he/she feels comfortable using grid technology. Ideally, the sheer technological complexity of the grid should be hidden from the user as much as possible. The HPC centres will have to make a serious effort to achieve this by choosing the right middleware technologies, providing friendly end-user interfaces (portals) to this middleware, and providing adequate training and support for the new methodologies.

An emerging problem in grid computing is the sheer number of portals being developed worldwide. A grid portal may be constructed as a Web page interface that provides easy access to grid applications and offers user authentication, job submission, job monitoring, and access to the job results. Many grid projects are developing their own interfaces to software packages that have a well-defined interface (e.g., the chemistry application Gaussian). There appears to be considerable overlapping (and also incompatible) activity in this context that could be avoided by merging portal development projects and reusing previously developed (generic) portal framework software.

5. Standards for grid services and web services are rapidly evolving. This is good in the sense that it shows significant progress in (and need for) the methodology. On the other hand, it makes it harder for a user of the grid (not necessarily a code developer) to keep up to date with the latest developments and keep the application ‘alive’. Adapting to emerging web and grid standards may require regular or frequent changes to the grid-enabled application, which is undesirable for production software. An additional complication is that HPC centres typically do not support the same (recent) version of a rapidly evolving standard or tool. Moreover, different (e.g., vendor) implementations of evolving standards may not all support the latest changes to the standard definition. Regular synchronization between the HPC centres is needed in this context to reduce this kind of portability problem.

6. The use of open standards should be promoted actively. Open standards have several advantages (see Sections 2.5 and 3). The contribution by users or user groups to the standardization process of open standards should also be promoted. Open standards are used not only in high performance computing and grid computing, but also in many other areas of science, for example visualization (OpenGL).

7. The Web has become the user interface of global business, and Web services now offer a strong foundation for software interoperability through the core open standards of XML, SOAP, WSDL, and UDDI. Models and applications that make use of this huge potential are just beginning to emerge.

It can be expected that Web services will play an increasingly important role in integrating new initiatives in HPC consortia. Web services standards permit application-to-application interoperability, but the coordination of a set of Web services working towards a common end is still an open issue. Several XML-based protocols are under standardization to target specific needs of businesses and application domains. These standards clearly illustrate the momentum behind the Web services computing community.

The large majority of the HPC user community is not currently familiar with Web services. We therefore strongly recommend that HPC centres provide good training programs for their users, so that they become acquainted at an early stage with the basic concepts and tools of Web services and see the impact these services can have on their research activity.

8. Many of the new technologies require knowledge of programming languages like Java and C++, scripting languages like Perl, middleware like Globus, and XML-based standards. Several of these are fairly new and still evolving. Computational science groups and (senior) scientists that have traditionally used HPC in their activities are not always familiar with these recent languages and technologies, and this lack of familiarity hinders the uptake of the new technologies by traditional HPC user groups. It is therefore important that HPC centres provide adequate training programs not only on the latest technologies but also on the basic components (and concepts) that underpin them.

9. In its early days, grid computing was often thought of as a computational grid: an infrastructure that combines a set of distributed computing resources into one big computing resource on which one can run a large-scale application that solves a computational problem too large to fit on any single machine. However, application efficiency on such a grid remains an issue. The interconnect between the individual resources can be slow, faulty, and insecure, and hence the efficiency and reliability of the overall distributed application may not be what one would like. Moreover, these limitations in the interconnect may lead to under-utilization of the individual (but expensive) resources. These days, one can see a shift towards grid computing as a data grid: an infrastructure for distributed data management that is transparent to the user and the application. Such an infrastructure greatly facilitates the reusability of existing applications. For example, a data grid allows the physical location of the data to be decoupled from the physical location of the application. The application need not be aware of this, since the runtime environment will make sure that the data are transferred to the place where the application expects them to be. The data grid provides mechanisms that form the glue between remote applications, devices that generate data, and databases, and thus enables the creation of smart coupled applications.

Another emerging form of grid computing is the collaborative grid. Such a grid enables the creation of virtual organizations in which remote research groups can perform joint research and share data.

10. The use of grid technologies will eventually lead to more remote (distant) collaborations. It is therefore essential that mechanisms are in place for code maintenance by a large research group with multiple programmers modifying a single code. It is our experience that many scientists have little knowledge of the wealth of tools available to assist in this. An HPC centre or consortium should be committed to providing advice and training in code development and maintenance; application of these techniques leads directly to increased productivity and enhanced code portability. This will become even more apparent once grid-enabling technology has become mature enough to be an accepted tool for remote (global) collaboration.

Finally, we note that several newly established European projects in Framework VI address some of the issues raised in this report. These include HPC-EUROPA [33] and DEISA [24].
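Recommendation 9 notes that a data grid decouples the physical location of the data from the physical location of the application. One common way to achieve this is a replica catalogue that maps a logical file name to the physical copies held at different sites. The following is a minimal, hypothetical Python sketch of that idea; the class and method names, the site names and the `gsiftp://` URLs are illustrative assumptions, not the API of Globus, QCDgrid or any other middleware.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalogue:
    """Hypothetical replica catalogue: logical file name -> physical copies."""

    # logical name -> {site name: physical URL of the copy held at that site}
    replicas: dict[str, dict[str, str]] = field(default_factory=dict)

    def register(self, logical_name: str, site: str, url: str) -> None:
        """Record that a copy of `logical_name` is held at `site`."""
        self.replicas.setdefault(logical_name, {})[site] = url

    def resolve(self, logical_name: str, local_site: str) -> str:
        """Return the physical copy an application at `local_site` should use.

        A local copy is preferred. Otherwise a remote copy is returned; in a
        real data grid the runtime environment would then stage it to
        `local_site` before the application starts.
        """
        copies = self.replicas.get(logical_name)
        if not copies:
            raise KeyError(f"no replica registered for {logical_name!r}")
        if local_site in copies:
            return copies[local_site]
        # Deterministic fallback: the remote site whose name sorts first.
        return copies[min(copies)]


# Usage: the application only ever refers to the logical file name.
catalogue = ReplicaCatalogue()
catalogue.register("lfn:milc/run042.xdr", "EPCC", "gsiftp://epcc.example.org/milc/run042.xdr")
catalogue.register("lfn:milc/run042.xdr", "TCD", "gsiftp://tcd.example.org/milc/run042.xdr")
print(catalogue.resolve("lfn:milc/run042.xdr", "TCD"))       # gsiftp://tcd.example.org/milc/run042.xdr
print(catalogue.resolve("lfn:milc/run042.xdr", "Parallab"))  # gsiftp://epcc.example.org/milc/run042.xdr
```

Keeping the logical-to-physical mapping in one place is what lets data move or be replicated without the application code changing.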


PROJECT 7

Grid Metacenter Demonstrator

Demonstrating a European Metacentre: Feasibility Report

http://www.epcc.ed.ac.uk/enacts/demonstrator.pdf

Chris Johnson, Jean-Christophe Desplat (EPCC)

Edinburgh Parallel Computing Centre

and

Jacko Koster, Jan-Frode Myklebust (BCCS)

Parallab, Bergen Center for Computational Science,

University of Bergen (Norway)

and

Geoff Bradley (TCD)

Trinity College Dublin

Study objectives

Here we describe the ENACTS Demonstrator activity itself, starting with the objective stated in the Technical Annex [ENA]. We then explain how the objectives evolved as new trends within the user community were followed, describe the deliverables expected, and describe the way in which this “Demonstrator” activity fits into the ENACTS project as a whole.

Objective [Demonstrating a European Metacentre]: To draw together the results from all of the Phase I technology studies and evaluate their practical consequences for operating a pan-European metacentre and constructing a best-practice model for collaborative working amongst facilities.

Since the time at which the objective was written, trends and technology have moved on at a fast and unpredictable pace. In particular, large-scale scientific communities have necessarily become more concerned with handling the large amounts of data they produce, rather than simply with getting the most out of compute cycles on large machines. For this reason, we concentrated on the data aspects of the “pan-European metacentre”, the intention being to demonstrate that “The Grid” is able to solve many of the problems of data-sharing across what are becoming known as “virtual organisations”, something which will become increasingly important over the coming years (see [EU]).

The Demonstrator activity began on 30th June 2003, following on from a kick-off meeting in Dublin earlier in 2003, and involved three European partners, previously described:

Centre | Role | Skills & Interests

EPCC | Leading activity | Particle Physics & Globus

Parallab | Participating | Physics & Globus

TCD | Participating & providing users | Physics & Globus

In contrast to the earlier ENACTS activities, the main deliverable of the Demonstrator activity does not consist of reports, but of the actual demonstrator itself. This document describes the work done during the ENACTS Demonstrator activity. It is intended primarily as a “Feasibility” report for those interested in setting up a pan-European metacentre, based on our findings in setting up such a centre.


Summary and conclusions

Successes

This demonstration project was successful in a number of areas:

• A data Grid has been successfully deployed on three clusters/supercomputers, with QCDgrid running across the Grid.

• An XML Catalogue is operating with an XML Schema to describe the MILC metadata, which is stored in an eXist database.

• The MILC code has been altered to produce machine-independent XDR output.

• The MILC code has been altered to read in/write out XML data files.

• The test users have certificates and have accessed the data Grid.

• All components of the Grid have been demonstrated to the users.
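The third success above, machine-independent XDR output, works because XDR (the External Data Representation standard, RFC 4506) fixes the byte layout: a 32-bit integer is four big-endian bytes, a double is eight big-endian IEEE 754 bytes, and a variable-length array is an unsigned count followed by its elements. The following minimal Python sketch shows that encoding for an array of doubles using only the standard `struct` module; the function names are hypothetical, and this is an illustration of XDR, not the actual MILC file layout.

```python
import struct


def xdr_pack_doubles(values: list[float]) -> bytes:
    """Encode a variable-length array of doubles in XDR form (RFC 4506).

    Layout: a 4-byte big-endian unsigned count, then each value as an
    8-byte big-endian IEEE 754 double. The '>' prefix forces big-endian
    order with no padding, whatever the byte order of the host machine.
    """
    return struct.pack(f">I{len(values)}d", len(values), *values)


def xdr_unpack_doubles(data: bytes) -> list[float]:
    """Decode bytes produced by xdr_pack_doubles on any platform."""
    (count,) = struct.unpack_from(">I", data, 0)
    return list(struct.unpack_from(f">{count}d", data, 4))


if __name__ == "__main__":
    payload = xdr_pack_doubles([1.0, -2.5])
    print(len(payload))                 # 20  (4-byte count + 2 x 8 bytes)
    print(payload[:4].hex())            # 00000002
    print(xdr_unpack_doubles(payload))  # [1.0, -2.5]
```

Because the byte order is fixed by the format rather than by the writing machine, the same file reads back identically on a little-endian cluster and a big-endian supercomputer.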

Problem areas

The users had conceptual problems with the Grid and with understanding the purpose of the Metacentre. Their limited knowledge of Grid technologies automatically led them to believe that they were gaining access to a computational Grid as opposed to a data Grid.

User feedback

The users were two QCD scientists in the School of Maths, Trinity College Dublin.

Q. How familiar were you with Grid technologies prior to this project?

A. Only aware of a user working on this type of project at Edinburgh University.

Q. Would you be more/less likely to get involved in a Grid project now?

A. It would be nice to use such resources, but I think it would need a huge number of nodes before it would be useful, as other groups would also need access to the machine. We would rather run on local machines with guaranteed resources.

Q. What functionalities were missing from the Grid test-bed?

A. Ideally, the Grid would have one central node that allocates jobs depending on the load of each cluster/supercomputer. Effectively the Grid should appear as one large cluster. Additional web-based documentation of resources etc. required (Note: this was provided).

Q. Would access to this Grid improve your productivity/efficiency and how would it alter your work practices?

A. Having loads of CPUs is helpful, but it would only really be usable if I didn’t have to go searching for idle machines. A well organised file server would be useful.

Q. Would you engage in collaborative activities more readily via the Grid? Would you share datafiles, results, etc.?

A. Yes!
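The wish expressed in the answers above, for “one central node that allocates jobs depending on the load of each cluster/supercomputer”, amounts to a simple meta-scheduling policy. A minimal sketch, assuming each site reports its total and busy CPUs, is a least-loaded dispatcher. The `Site` type, the `dispatch` function, the site sizes and the load metric are all hypothetical; production meta-schedulers also weigh queue wait times, data locality and local site policies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Site:
    """A cluster/supercomputer as seen by a hypothetical central dispatcher."""
    name: str
    total_cpus: int
    busy_cpus: int = 0

    @property
    def free_cpus(self) -> int:
        return self.total_cpus - self.busy_cpus

    @property
    def load(self) -> float:
        """Fraction of CPUs in use (0.0 = idle, 1.0 = full)."""
        return self.busy_cpus / self.total_cpus


def dispatch(sites: list[Site], cpus_needed: int) -> Site | None:
    """Send a job to the least-loaded site that can currently fit it.

    Returns the chosen site (its busy count is updated), or None if no
    site has enough free CPUs, in which case the job would be queued.
    """
    candidates = [s for s in sites if s.free_cpus >= cpus_needed]
    if not candidates:
        return None
    # Ties on load are broken by site name so the choice is deterministic.
    best = min(candidates, key=lambda s: (s.load, s.name))
    best.busy_cpus += cpus_needed
    return best


sites = [Site("EPCC", 64, 48), Site("Parallab", 32, 8), Site("TCD", 16, 0)]
print(dispatch(sites, 8).name)  # TCD       (load 0.0)
print(dispatch(sites, 8).name)  # Parallab  (0.25 beats TCD's new 0.5)
print(dispatch(sites, 20))      # None      (no site has 20 free CPUs)
```

From the user's side this is the “one large cluster” view: a single submission point, with the choice of machine made centrally.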

Additional user feedback

The following additional user feedback was provided:

• Having machine-independent data is useful.

• It is useful to have the binary data accessible on all sites.

• The metadata for each job is very useful because if you’re sharing data with other users you can easily find out the parameters they used to generate their results.

• Automatic upload of XML & binary output after a run has completed would be useful. The MILC code was modified to do this.

• It would be desirable to have the entire job submission process available through a web interface.

• A global queueing system, a global ‘qsub’, would be very useful.

• The large XML Schema used for MILC has many blank fields for this application, which is confusing and messy.

• The whole Grid system seems to be a bit delicate! Where’s the computational Grid?

• In the longer term it would be better to take software written by the TCD QCD group and Grid-enable it.

• The whole Certificate issue was very frustrating!
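Several of the feedback items concern the per-job XML metadata, which users valued because it lets them find the parameters others used to generate their results. The sketch below shows such a lookup with Python's standard `xml.etree.ElementTree`. The element and attribute names (`catalogue`, `run`, `param`, `name`, `value`, `file`) and the parameter values are hypothetical stand-ins, not the actual MILC XML Schema, and in the demonstrator the metadata was held in an eXist database rather than in an in-memory string.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata records; the real MILC schema is far larger.
CATALOGUE = """
<catalogue>
  <run file="lfn:milc/run001.xdr">
    <param name="beta" value="5.6"/>
    <param name="volume" value="16x16x16x32"/>
  </run>
  <run file="lfn:milc/run002.xdr">
    <param name="beta" value="5.7"/>
    <param name="volume" value="16x16x16x32"/>
  </run>
</catalogue>
"""


def _params(run: ET.Element) -> dict[str, str]:
    """Collect a run's <param> elements into a plain dict."""
    return {p.get("name"): p.get("value") for p in run.findall("param")}


def find_runs(xml_text: str, **wanted: str) -> list[str]:
    """Return the data files of all runs whose parameters match `wanted`."""
    root = ET.fromstring(xml_text)
    return [
        run.get("file")
        for run in root.iter("run")
        if all(_params(run).get(k) == v for k, v in wanted.items())
    ]


def run_parameters(xml_text: str, data_file: str) -> dict[str, str]:
    """Look up the parameters used to generate a given data file."""
    root = ET.fromstring(xml_text)
    for run in root.iter("run"):
        if run.get("file") == data_file:
            return _params(run)
    raise KeyError(data_file)


print(find_runs(CATALOGUE, beta="5.7"))                      # ['lfn:milc/run002.xdr']
print(run_parameters(CATALOGUE, "lfn:milc/run001.xdr")["beta"])  # 5.6
```

A schema tailored to the application would also address the complaint about many blank fields: only the parameters that matter for a run need to appear.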

Feasibility

Our ENACTS “Demonstrator” activity has shown that it is feasible to set up a data grid across geographically dispersed sites using available technology, provided that considerable effort is first put into setting up Globus and its associated packages. Given more time, we could have increased the number of users of our system, which would have provided us with more feedback.

One could also ask the question: “How would such a Grid scale up to use by more sites and users?” There does not appear to be an obvious limitation on the number of sites or number of users within QCDgrid itself, although in the future it would be useful to have more control over the ownership of files stored in the Grid. For example, it is not presently possible for users to delete files once they are on the Grid; this has to be done by the administrators.
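The ownership limitation just described (only administrators can delete files) suggests recording an owner for each logical file, so that owners can remove their own data while administrators keep overall control. The following is a minimal, hypothetical Python sketch of such a permission check; the class, its methods and the user names are illustrative assumptions and are not part of QCDgrid.

```python
class OwnedFileCatalogue:
    """Hypothetical catalogue that records who registered each logical file."""

    def __init__(self, admins: set[str]) -> None:
        self.admins = set(admins)
        self.owner: dict[str, str] = {}  # logical file name -> owning user

    def register(self, logical_name: str, user: str) -> None:
        """Add a file to the Grid, recording the registering user as owner."""
        if logical_name in self.owner:
            raise FileExistsError(logical_name)
        self.owner[logical_name] = user

    def delete(self, logical_name: str, user: str) -> None:
        """Allow deletion by the file's owner or by an administrator."""
        if logical_name not in self.owner:
            raise FileNotFoundError(logical_name)
        if user != self.owner[logical_name] and user not in self.admins:
            raise PermissionError(f"{user} may not delete {logical_name}")
        # A real data grid would also remove every physical replica here.
        del self.owner[logical_name]


grid = OwnedFileCatalogue(admins={"gridadmin"})
grid.register("lfn:milc/run001.xdr", "alice")
grid.delete("lfn:milc/run001.xdr", "alice")  # permitted: alice owns the file
```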

The “demonstrator” itself was intended to take present technology and evaluate it, and we believe this has been successful. The activity has also motivated those involved in the QCDgrid project to generalise the software for use beyond their initial UK Grid. Readers are encouraged to try the software for themselves.


PROJECT 8

Survey of Users' Needs

http://www.epcc.ed.ac.uk/enacts/userrequirements.pdf 

Giovanni Erbacci and Claudio Gheller (CINECA)

Inter-University Consortium

and

Satu Torikka (CSC)

Centre for Scientific Computing

Study objectives

The present report presents the results of the “Survey of Users’ Needs” scientific/technological activity, whose objectives are to determine users' requirements for access to HPC facilities and datastores and to assess the implications for changes in their working patterns if these were provided within a metacentre model. Furthermore, the report outlines how users perceive emerging technology affecting their research, and looks at the technological barriers to the mobility of researchers. The results are based on the opinions of both large user groups and individual users of high performance and distributed computing facilities in Europe.

Work Plan

One of the key tasks for ENACTS is to collect information from existing and potential users of high performance and distributed computing about their future requirements. While much of the qualitative, in-depth information on requirements can be collected via ENACTS participants (both HPC centres and user representatives), ENACTS also aims to collect quantitative information from a wider range of groups via a web-based questionnaire. This will enable the Network to check the requirements of a far wider cross-section of the computational science community than would otherwise be possible.

The questionnaire (see the details in Section 2) has been designed to gain information in a range of areas, including the value placed on current services, limitations and application bottlenecks, the level of user experience and expertise, and future requirements.

In addition, detailed information has been collected by means of interviews from key user groups represented or identified by ENACTS participants. Each user group was asked the same series of open-ended questions about their requirements.

Main Tasks

The “Survey of Users’ Needs” activity consists of six workpackages, totalling 6.6 staff-months of effort. The elapsed time for the activity is 6 months. The workpackages are summarised below.

Workpackage | Effort (staff-months) | Participants

WP1 User survey | 2.0 | CINECA, CSC, Network Co-ordinator

WP2 Results analysis | 1.0 | CINECA, CSC, Network Co-ordinator

WP3 In-depth interviews | 2.0 | CINECA, CSC, Network Co-ordinator

WP4 User requirements report | 0.5 | CINECA, CSC, Network Co-ordinator

WP5 Dissemination | 0.6 | CINECA, CSC, Network Co-ordinator

WP6 Project Management | 0.5 | CINECA
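As a quick arithmetic check on the table, the six per-workpackage efforts add up to the 6.6 staff-months quoted for the whole activity. The sketch uses Python's `Decimal` so the sum is exact rather than subject to binary floating-point rounding; the dictionary simply restates the table.

```python
from decimal import Decimal

# Effort per workpackage in staff-months, restated from the table above.
effort = {
    "WP1 User survey": Decimal("2.0"),
    "WP2 Results analysis": Decimal("1.0"),
    "WP3 In-depth interviews": Decimal("2.0"),
    "WP4 User requirements report": Decimal("0.5"),
    "WP5 Dissemination": Decimal("0.6"),
    "WP6 Project Management": Decimal("0.5"),
}

total = sum(effort.values())
print(len(effort), total)  # 6 6.6
```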

WP1: This is a survey of the current experience and requirements of users and potential users of HPC facilities. It involves the design and promotion of a web-based questionnaire for users of European LSFs and HPC centres. This workpackage comprises 2 staff-months of effort and has been performed by CINECA and CSC. The other HPC centres in the ENACTS network have promoted the questionnaire to their users and contact organisations.

WP2: The questionnaire returns have been analysed and data-mined to look for significant trends. This workpackage took 1 staff-month of effort and was undertaken by CINECA and CSC.

WP3: Eleven in-depth phone or face-to-face interviews have been conducted with representatives of significant computational science research groups or organisations in Europe, to solicit their views and opinions on future requirements for HPC. This activity has been led by CINECA and CSC, with assistance from other ENACTS participants, who identified target interview groups and conducted interviews. This workpackage took 2 staff-months of effort.

WP4: The output from the analysis of the user survey and the completed in-depth interviews has been used to produce the User Requirements report (this report). This report details the differing requirements of significant user groups in Europe and summarises user requirements for Grid Computing. CINECA and CSC have written this report. The workpackage required 0.5 staff-months of effort.


The report consists of 7 chapters:

• Chapter 1 introduces the ENACTS project and the objectives of the specific activity.

• Chapter 2 describes the structure and the content of the questionnaire. Furthermore, it presents the data collection and analysis procedure adopted.

• Chapters 3, 4 and 5 present the details of the various sections of the questionnaire and the results obtained for each section.

• Chapter 6 presents the results of in-depth interviews with several selected users.

• Chapter 7 draws the conclusions and proposes some recommendations and suggestions for the future development of user-needs-driven research tools.

Summary and conclusions

In this report we have presented the results of a study that aims at analysing the requirements and desiderata of European HPC users with respect to high-end computing resources, applications and data management tools. Moreover, the survey outlines how users perceive emerging technologies and how these can affect their research/development work.

The study is based on a questionnaire addressed to users of HPC facilities, datastores and Grids, in particular researchers and scientific software developers who require medium/large computing resources, so that they can provide meaningful feedback on high-end CPU devices and infrastructures.

Eleven in-depth interviews with selected representatives of computational science research groups or organisations in Europe have been collected and analysed in order to provide a wider perspective on how technological resources should evolve to fulfil the expectations of the research community.

Conclusions on the user questionnaire


The questionnaire was answered by 125 users from eighteen European countries, mostly university researchers, representative of a much larger scientific community. Most of the participants are part of medium (2-10 members) or large (11-50 members) research groups. It should be noted that the majority of these collaborations are local; only a third of the research groups are involved in international collaborations. This is mainly due to the difficulty of holding everyday remote working sessions. Collaborative environment technologies can be crucial to improve the diffusion of transnational working groups and the spread of skills and knowledge across Europe.

The average present-day HPC user who emerges from Part 2 of the questionnaire shows that there is still a “traditional” approach to computational science. Most of the participants use local resources (workstations, departmental servers) with small (1 to 4 processors) or medium (8 to 32 processors) configurations. The main concerns relate to the speed of a single CPU and the memory size, rather than to having plenty of distributed resources. Access to the computing platforms is mostly via ssh connections rather than more sophisticated methods like web portals, which may be due mainly to security concerns. The most common operating systems are Linux and various proprietary Unix flavours (AIX, IRIX, etc.). Windows is gaining ground, even though it is still little used in the research community; other products like MacOS are quite uncommon. A large fraction of the researchers' computing-related work is dedicated to code development. Commercial or freeware codes are not common in the scientific community. This is due both to the high specificity of many problems, which require dedicated algorithms and codes, and to a certain scepticism towards commercial software. Self-made or self-modified codes, starting from previously home-made programs, are the common choice for most researchers, and very little space is given to commercial or freeware applications. Specialised scientific libraries are frequently used, since they are highly optimised, precise and accurate tools for performing standard tasks (like array and linear algebra operations, FFT etc.). A traditional approach to numerics is once more confirmed by the choice of Fortran and Fortran 90 as programming languages. However, other high-performance languages, in particular C and C++, are also starting to spread, reflecting an openness toward experimenting with different and new technological opportunities. This is also confirmed by the interest in open source products, which are usually not completely stable or easy to use, but can provide the basic components for developing new applications.

Data do not seem to be a major concern for researchers. Most of the applications are computing-intensive and, although the amount of data produced is growing rapidly, its storage, management and even analysis are considered secondary problems. Results are usually stored in files with no particular organisation and often with no standard format (like CGNS, FITS, HDF); usually either raw binary files or ASCII tables are used to save data. This can be a strong limitation for data exchange, even inside the same research group. Above all, it can be a critical challenge for the standardisation and interoperability efforts of the international community and, more generally, for the diffusion of knowledge, cooperation and the best exploitation of resources.

The crucial role of computing power also emerges from the analysis of the users’ interest and involvement in Grid-related issues. In fact, the main attraction is having a large aggregate computing power available, with much less concern for its architecture. This is also confirmed by the other points the users indicated as most critical in a distributed system, namely high network bandwidth, principally to download data, and efficient schedulers, to maximise the throughput of the workflow. Other opportunities, like portals, shared file systems on distributed platforms, large storage capacity, data management via databases and collaborative working sessions, are perceived as much less interesting and attractive. The Grid, since it is still in a development phase, is seen as an unfriendly environment, mainly due to the lack of stable APIs, incomplete documentation, and difficulties in management and in the development of suitable applications. Nevertheless, a large fraction of the users is willing to contribute to the development of the Grid infrastructure if proper help and support is provided. It is also interesting to note that more than 27% of the users have already been involved in Grid-related research projects. Finally, it is very encouraging that a large majority of the users is willing to share their codes and, under proper conditions, their data and results. Those who do not want to share resources justify their choice by saying that these are too specific to their work and therefore useless for the community.

Conclusions on the in-depth interviews

The in- depth interviews have outlined the following results.

The need for more CPU power was emphasised by most representatives. Some of them considered more storage capacity as important as CPU power, and some saw data storage and data management as having become more important than CPU power. The importance of data management tools is likely to increase, and the development of data analysis, data mining and visualisation tools will continue. The future “winning” HPC architecture for the scientific community will be clusters for throughput volume computing. Clusters, possibly combined with Grid access, will comprise common, relatively cheap processors and efficient networks between processors. The evolution in the software area will also be towards clusters, moving away from platform-specific applications. Shared memory systems, especially efficient parallel machines, will remain as solutions for selected sub-communities who need peak performance. In some answers the price of shared memory systems and vector machines was considered too high for future systems. The amount of data will increase in the future, and many research groups will use huge data sets stored in distributed storage. Easy and reliable access to these data sets is important and requires international collaboration. Technological solutions have to be developed to handle very big data sets.

According to all the interviewees, the Grid is here to stay and is developing, but opinions on its development vary. Grids may become useful for the scientific community at large, or they may remain limited solutions used by traditional high performance computing and by people needing to manage huge data sets distributed over several places. For a successful future of Grid infrastructures, a lot of development is needed to make Grids more reliable and easy enough to use. There are interesting pilot projects going on all around the EU. The interviewees agree that the scientific community is willing to accept the sharing of knowledge, resources, tools and data, because this is the prerequisite for a successful Grid architecture. However, there will be both open and commercial software tools available. Databases will be open to a large extent, but commercial interests may restrict access to data. The real challenge is collaboration at both the political and the technical level; more agreements will be needed on how to use the data and resources.

Virtual organisations, distributed workshops (e.g. Access Grid) and distance learning are already useful tools in educational and training activities. Today, however, the technology is still difficult: it usually requires a set of non-standard equipment and software, and special skills to operate. For these tools to be more useful to the scientific community, both technological and human resources are needed, e.g. to provide easy access from the desktop. These tools can enable faster, cheaper and more environmentally friendly communication.

Developments in cluster computing and data storage solutions are seen as the most important expectations of the scientific community with respect to near-future technological development. HPC centres will have an important role in helping scientific communities build cluster solutions and in giving users advanced application support, e.g. in code optimisation and Grid usage. HPC centres can become the backbone of a true Grid infrastructure, offering the scientific community easy and sound access to HPC and datastores.

Closing remarks

The user requirements, representing a common view of the HPC user community and of the representatives of significant computational science research groups or organisations in Europe, can be summarized as follows:


Computing power is the key factor in users' involvement in Grid-related issues. It is very encouraging that a large majority of the users is willing to share knowledge, tools, data and results. The main challenges Grid computing will face will be political; more agreements will be needed on how to use the data and resources in a multinational Grid infrastructure.

The Grid is in a development stage, and users are willing to contribute to the development of the Grid infrastructure if proper help and support is provided. Virtual tools for education and training could enable faster, cheaper and more environmentally friendly communication. To enhance the adoption of virtual tools, these tools should be more user-friendly.

The growing amount of data being produced poses challenges. Standardisation and interoperability efforts by the international community are needed for the diffusion of knowledge, cooperation and the best exploitation of resources.
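This closing remark echoes the earlier observation that results are often saved as raw binary files or ASCII tables with no standard format. The problem with raw binary is that the bytes carry no description of themselves: the same eight bytes decode to a different number depending on the byte order the reader assumes. The sketch below, using only Python's `struct` and `json` modules, contrasts this with a minimal self-describing layout (a one-line JSON header recording type, byte order and length, followed by the data). The layout and function names are hypothetical illustrations of the idea behind standard formats such as HDF, FITS or CGNS, not a substitute for them.

```python
import json
import struct


def write_self_describing(values: list[float]) -> bytes:
    """Hypothetical layout: one JSON header line, then little-endian doubles."""
    header = {"dtype": "float64", "byteorder": "little", "count": len(values)}
    # json.dumps escapes newlines, so the first b"\n" always ends the header.
    return json.dumps(header).encode() + b"\n" + struct.pack(f"<{len(values)}d", *values)


def read_self_describing(blob: bytes) -> list[float]:
    """Decode the payload using the header, not guesswork."""
    header_line, payload = blob.split(b"\n", 1)
    header = json.loads(header_line)
    order = "<" if header["byteorder"] == "little" else ">"
    return list(struct.unpack(f"{order}{header['count']}d", payload))


if __name__ == "__main__":
    blob = write_self_describing([1.5, 2.0])
    print(read_self_describing(blob))  # [1.5, 2.0]

    # Raw bytes with no header: a reader assuming the wrong byte order
    # silently obtains a different, meaningless value.
    raw = struct.pack("<d", 1.5)
    print(struct.unpack(">d", raw)[0] == 1.5)  # False
```

Standard formats go much further (multi-dimensional arrays, units, provenance metadata), but the principle is the same: the file, not the reader, states how its contents are to be interpreted.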


APPENDIX III

LIFELONG Distance Learning and Support Study in the ENACTS Project

Josef Novák, I.C.C.C. Group, a.s.; Miroslav Rozložník, I.C.C.C. Group, a.s.; Miroslav Tůma, I.C.C.C. Group, a.s.

Abstract: The paper is devoted to the joint scientific and technological study on distance learning and support within the ENACTS project, a network of leading institutions from around Europe offering High Performance Computing systems and services together with key developments in the area of Grid computing. The first part describes the ENACTS project, with an emphasis on the methodology for evaluating distance learning techniques and support used in applications related to Grid Computing. The second part of the paper presents the main results of the study on current experience with distance learning in the Grid Computing community. We especially emphasize the lifelong aspects of distance learning.

1 The ENACTS Project

The ENACTS Project is a Co-operation Network in the ‘Improving Human Potential Access to Research Infrastructures’ Programme. It has been running since 2000. This Infrastructure Co-operation Network brings together High Performance Computing (HPC) Large Scale Facilities (LSF) funded by the DGXII's IHP programme and key user groups. Its aim has been to evaluate future trends in the way that computational science will be performed, including the pan-European implications. As part of the Network's remit, it runs a Round Table to monitor and advise the operation of the four IHP LSFs in this area: EPCC (UK), CESCA-CEPBA (Spain), CINECA (Italy), and BCPL-Parallab (Norway).

This co-operation network follows on from the successful Framework IV Concerted Action (DIRECT: ERBFMECT970094) [1] and brings together many of the key players from around Europe who offer a rich diversity of High Performance Computing (HPC) systems and services. In ENACTS, our strategy involves close co-operation at a pan-European level: to review service provision and distil best practice, to monitor users' changing requirements for value-added services, and to track technological advances. In HPC the key developments are in the area of Grid computing and are driven by large US programmes.

In Europe we urgently need to evaluate the status and likely impact of these technologies in order to move towards our goal of European Grid computing, a ‘virtual infrastructure’ in which each researcher, regardless of nationality or geographical location, has access to the best resources and can conduct collaborative research with top-quality scientific and technological support. ENACTS provides participants with a co-operative structure within which to review the impact of Grid computing technologies, enabling them to formulate a strategy for increasing the quantity and quality of the access provided.

The principal objective of the project is to enable the formation of a pan-European HPC metacentre. Achieving this goal requires both capital investment and a careful study of the software and support implications for users and HPC centers. Part of the latter is the core objective of this study. The project is organised in two phases. A set of six studies of key enabling technologies was undertaken during the first phase:

1. Grid service requirements (EPCC, PSNC);

2. The roadmap for HPC (NSC, CSCISM);

3. Grid enabling technologies (ETH-Zurich, FORTH);

4. Data management and assimilation (CINECA, TCD);

5. Distance learning and support (ICCC, UNI-C);

6. Software efficiency and reusability (UPC, UiB).


2 Joint Scientific and Technological Study on Distance Learning and Support

This section describes the basic issues of a study devoted specifically to distance learning and support. The objective starting points of this study were as follows. Users accessing facilities through a pan-European metacentre typically require training and support from remote centers in order to make the best use of the available facilities. WWW-based technologies are emerging to accomplish this, but they are largely untried outside intranets. Here, ENACTS aims to determine the most appropriate support and training methods and the enabling technologies. The advent of Grid computing will make it ever more likely that users will be using facilities remotely, which means that the same networked technologies must be used to provide training and support. Currently, there is little standardization in the technologies used to develop training courses and none in the area of distributed support.

Let us now describe the technical objectives and conditions of this study. They follow from the need to agree on a framework for the collaborative development of distance learning course material. The objective is to make it easy for participants (and other institutions in Europe) to develop or customise reusable training material. The widespread availability of distance learning material will increase the accessibility of HPC systems. One of the aims of the metacentre is to make training in appropriate tools and techniques available to researchers who are remote from the facilities they are accessing. The provision of an appropriate framework for course development makes this more feasible.

Distance learning is thus of interest to all infrastructure operators and research groups in Europe, but the cost of developing and maintaining training material has discouraged most centers from committing time and effort to it. Collaborative projects offer the potential for sharing the costs and the effort, but it is vital to select an appropriate development environment. The result of this research, a set of papers and focused studies reviewing the state of the tools, the standards and the methodology, will be of practical benefit.

The definition of distance education may seem straightforward enough, but there is an ongoing debate as to what is involved in the process and concept of distance education. Glenn Hoyle's Distance Learning on the Net (http://www.hoyle.com/distance.htm) provides a list of definitions of Distance Learning from various sources. His own summary is: "Distance Learning is a general term used to cover the broad range of teaching and learning events in which the student is separated (at a distance) from the instructor, or other fellow learners."

There are many different definitions of distance learning; they will be discussed in the talk. With few notable exceptions, the actual beneficiaries and users of tomorrow's Grid technology have not yet established a dialogue on the standardization of the technologies and tools used in education. Remarkably, formal remote training and distance learning are not rated highly in our survey. Nevertheless, it is appropriate to ask what would be the most relevant and effective distance learning method in the context of the European and international Grid communities.

3 Grids and Distance Learning: Results and Recommendations

3.1 Peer-based versus organized courses

This subsection gives a basic overview of our main results. First, in the Grid community there is a non-negligible interest in acquiring scientific and technological information. What we did not expect was that most users acquire this information from traditional sources. In particular, users prefer standard research papers and non-research articles, manuals, booklets, and hardware, software and software documentation. There is, however, one modern feature: access is based on electronic tools (e-mail or the internet). The web environment is the new wrapper that mainly contains the classical sources of information listed above.


New scientific and technological information does not come only from paper-based sources. Another important way to obtain it is through workshops, conferences and congresses. As far as the size of such meetings is concerned, there is typically a reasonable limit. It is well known that meetings with a larger number of participants are less effective at passing on scientific information and at supporting the learning process. Rather, they play a social role in the scientific community: they are important for honouring leading personalities, awarding prizes and the like.

We have investigated the role of the type of training in the overall educational process. Based on the previous study, there are basically two important types of training: informal and organised. The former is individualistic: the learner tries to understand the subject matter from scratch. In such cases, for instance when learning a programming language, it is important to start by reading examples rather than user guides or manuals. Those become more important later, when informal training transfers smoothly into organised training. Users often need to check the basic ideas of the new subject carefully. The amount of time spent on these initial exercises, which we might call the setup time, varies from person to person. The user might then attend intensive courses on particular software or hardware products, learn new ways of coping with new communication tools, and learn how to use grids.

If we try to draw out the consequences of the basic user proposals, we can see that users may need a peer-based (community-facilitated) resource rather than organised instructor-led courses. Such courses or training should come later, when the user requires more advanced information. One example from the past is the community of users of parallel computational tools, for whom individual access was much more important than organised learning of various formalisms. Note that organised training might not be very effective in this case, since the techniques of parallel programming and of using parallel machines change rapidly. In addition, parallel programming tools are typically very individualistic.

Users are interested in an "open source" approach to training, information resources and materials. They need access to databases in order to see closely related information, and they need enough material to extend their knowledge in various directions.

The next paragraphs analyse the results obtained from the point of view of standard distance learning, and then propose a specific approach to distance learning. As we have seen, five basic features characterize mainstream distance learning (the conservative definition):

1. it is a highly structured activity;

2. it deals with highly structured content;

3. it is typically a one-to-many (teacher-centered) process in which the tutor plays a key role;

4. it is characterised by frequent monitoring of its participants through tests, assignments, etc.;

5. it is based on a combination of various web-based tools.

3.2 Role of teacher in distance learning for grid computing

Consider now the questions users may ask. They may need large data resources that do not correspond precisely to the characterization of distance learning given above. Although the activity might be considered highly structured in both form and content, it is not teacher-centered in the strictest sense of the word. Database-oriented learning serves rather for information exchange between two partners: those who create the databases and those who use them. Nevertheless, database-oriented learning tends to develop in a teacher-centered way. While the first encounters with teaching in grid computing might be very unorganised, they will have to change and turn into fully organized training.

We draw another conclusion concerning the role of teachers and students in distance learning: it appears that the delayed form might be developed further through increased activity by the students. Let us explain this conclusion more carefully. There are many types of specific grid computations. At present it would be very costly and inefficient to train highly specialized experts in a field as rapidly changing as Grid Computing. It is more important now to raise the general level of knowledge of Grid Computing within particular communities. Deep training of experts might be more useful once Grid Computing becomes a generally accepted form of computation. In such a future scenario, students will play a more active role in the learning process than the current analysis of distance learning, based on the questionnaire, suggests.


In this context, let us distinguish two basic user groups: experienced users entering the newly emerging field of grid computing, and students who are obtaining their first qualification. One important difference between these groups lies in their opportunities. While an experienced user has no difficulty attending a couple of workshops per year, this could be a problem for students because of a lack of funding. Consequently, students are a very specific class of users with a much more open attitude to distance learning techniques.

Now let us consider a user-friendly environment, a reasonable compromise among users' demands for a flexible distance learning framework. We will call it a grid training portal. It is a customisable, personalised web interface for accessing services through distance learning and education tools. It would provide a common gateway to resources, with special attention to the tools mentioned above. In other words, personalised and delayed forms of distance learning should be preferred.

3.3 Lifelong Distance Learning Portal for Grid Users

In this subsection we do not aim to present a fixed form of what we call the distance learning portal, mentioned above. Instead, we summarise our conclusions and the resulting recommendations in terms of a flexible tool "under construction". Of course, this tool should also reflect the available technologies. We describe its basic structure, with an emphasis on its most important features, which are the following.

Control (management) unit

User resources

Information resources

Communication subsystem

Let us now describe these parts of the proposed portal system. The control unit, or more accurately the management unit, comprises the core institutions and individuals together with the tools that run the portal. Although there should be various specific rules on how to handle the organisation of the portal, it is important to solve the problems of its technical updates, financial support, technological development, software upgrades, etc. Some of these might require rather sophisticated strategic decisions. The control unit should contain two specific layers: a service (maintenance) layer that implements the control mechanisms, and an evaluation layer. One of the most important tasks of the control unit is to balance two basic functions. First, technological developments will put strong pressure on the hardware tools, in terms of both node demands and network demands. Technologically, these demands will show up as the need to make the nodes more powerful and to give the network ever larger bandwidth. Nevertheless, there is a wide gap between very fast processors and relatively slow connections. In other words, network technology is lagging behind, and increasing network bandwidth is a real technological challenge. Second, the management unit must take care of financial resources, including:

1. Start-up development costs

2. Cost of the hardware and software tools connected to the portal, including regular updates and upgrades

3. Operational cost of the technology

4. Management cost

5. Technology remediation

The cost function must be carefully evaluated and balanced against the technological requirements that increase the overall cost of the portal. Once these two items are balanced, we should attend to overall efficiency. Mr. Soren Nipper [14] presented a picture showing the user groups at the top of a pyramid whose large base corresponds to the overall costs. This picture may be valid only temporarily; it does not necessarily correspond to future development. Therefore, when such models are taken into account, a realistic forecast plays an important role. At present, the figure may reflect large start-up costs, since we are still in the start-up period of Grid technology.

By user resources we mean the groups of portal users. Their leaders will be engaged in strategic decisions. The learning mechanism will not be strongly teacher-centered but more or less balanced between teacher and student. Users will bring new initiatives that lead to overall improvements. As far as the target groups are concerned, the results of the questionnaire suggest that they should not be very large. On the other hand, we do not have an exact idea of how they will develop in the future. Some hints, however, suggest that the target groups may grow, as happened with parallel computational tools two decades ago. After a long period of relatively small user groups, we now see large teams collaborating over the network on the development of large-scale HPC applications, using distributed tools like SourceForge and powerful version-synchronising software that enables tens of developers to collaborate.

Information resources are the third part of the overall portal system. By this we mean the technological content of the portal, covering both its hardware and software parts: in particular, information databases with papers, lectures in written or recorded form, simulation software, and the technological tools to present all these materials. At present we do not have very large multimedia resources for Grid Computing in our field, although this will likely change in the future. In any case, developing mechanisms to store, protect, develop, update and clearly organise these data items is the more challenging problem, and a specific layer will be devoted to it. A distinctive feature of the information resources will be their hierarchical nature. In our case, grid content will be handled first at the level of HPC centers, then at the national level (where one exists), and finally within the European grid community. In fact, this is precisely the subsystem that must be organized hierarchically.

As far as the portal content of the information resources is concerned, it can be stratified into independent layers. The first layer may contain databases of written and electronically distributed information. Users accessing the portal have different needs, so the documents in its databases should be sorted according to their requirements. Other data, such as test codes and video and audio material, should be stratified in a similar manner, thereby creating a user-friendly environment for Grid users.
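The hierarchical organisation described above (HPC centre, then national level, then the European grid community), combined with stratification of content by user needs, can be illustrated with a small data structure. This is only a hypothetical sketch: the class names, the `audience` labels and the lookup rule (search the local centre first, then move up the hierarchy) are our own assumptions for illustration, not part of any existing portal.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResourceItem:
    """One piece of portal content: a paper, a recorded lecture, test code, etc."""
    title: str
    kind: str      # e.g. "paper", "video", "test code"
    audience: str  # stratification layer, e.g. "beginner" or "advanced" (hypothetical labels)


@dataclass
class ResourceNode:
    """One level of the content hierarchy (HPC centre, national or European)."""
    name: str
    level: str
    parent: Optional[ResourceNode] = None
    items: list[ResourceItem] = field(default_factory=list)

    def lookup(self, audience: str) -> list[tuple[str, str]]:
        """Collect (level, title) pairs for one audience, from this node up to the root."""
        found: list[tuple[str, str]] = []
        node: Optional[ResourceNode] = self
        while node is not None:
            found.extend((node.level, item.title) for item in node.items if item.audience == audience)
            node = node.parent
        return found


# Hypothetical three-level hierarchy: European -> national -> HPC centre.
europe = ResourceNode("European grid community", "european")
national = ResourceNode("National grid", "national", parent=europe)
centre = ResourceNode("HPC centre", "centre", parent=national)

europe.items.append(ResourceItem("Grid computing overview", "paper", "beginner"))
centre.items.append(ResourceItem("Local job submission tutorial", "video", "beginner"))
centre.items.append(ResourceItem("MPI tuning notes", "paper", "advanced"))

beginner_view = centre.lookup("beginner")
print(beginner_view)
```

Walking up from the local centre means that users see site-specific material first and fall back on shared national or European content; another portal design could equally merge all levels and rank the results.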

The final item is the communication subsystem, the mechanism that exchanges information among the three previous items. The communication patterns do not need to be uniform across the whole portal; in fact, various modes of communication can be supported. For some types of information exchange, synchronous connections such as videoconferences, audio conferences or the Access Grid are preferable; at other times asynchronous mechanisms are preferred. In general, we distinguish two types of information signals: control signals and service signals. The first type keeps the portal in good shape and supports its development. The second type (which should prevail by a large margin) serves the users.
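The distinction between control and service signals, and between synchronous and asynchronous channels, can be made concrete with a minimal message-routing sketch. All names here (`SignalType`, `Mode`, `Message`, `route`, `service_share`) and the sample traffic are hypothetical; they only illustrate the classification proposed above and do not describe an existing portal implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class SignalType(Enum):
    CONTROL = "control"  # keeps the portal in good shape and supports its development
    SERVICE = "service"  # serves the users; expected to dominate the traffic


class Mode(Enum):
    SYNCHRONOUS = "synchronous"    # e.g. videoconference, audio conference, Access Grid session
    ASYNCHRONOUS = "asynchronous"  # e.g. e-mail or stored lecture material


@dataclass(frozen=True)
class Message:
    sender: str     # portal part sending the message (hypothetical labels)
    recipient: str  # portal part receiving it
    signal: SignalType
    mode: Mode
    body: str


def route(messages):
    """Group messages into channels keyed by (signal type, communication mode)."""
    channels = defaultdict(list)
    for msg in messages:
        channels[(msg.signal, msg.mode)].append(msg)
    return dict(channels)


def service_share(messages):
    """Return the fraction of messages that are service (user-facing) signals."""
    if not messages:
        return 0.0
    service = sum(1 for m in messages if m.signal is SignalType.SERVICE)
    return service / len(messages)


# Hypothetical traffic between the management unit, the users and the information resources.
traffic = [
    Message("users", "information", SignalType.SERVICE, Mode.ASYNCHRONOUS, "request lecture notes"),
    Message("users", "users", SignalType.SERVICE, Mode.SYNCHRONOUS, "Access Grid tutorial session"),
    Message("users", "information", SignalType.SERVICE, Mode.ASYNCHRONOUS, "download test code"),
    Message("management", "information", SignalType.CONTROL, Mode.ASYNCHRONOUS, "schedule software upgrade"),
]

channels = route(traffic)
for (signal, mode), msgs in channels.items():
    print(signal.value, mode.value, len(msgs))
print("service share:", service_share(traffic))
```

Under these assumptions, monitoring `service_share` is one simple way for the management unit to check that service traffic does prevail by a large margin.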

4 Conclusions

In the four sections of our paper we have summarized the basic points of the distance-learning part of the EU's ENACTS project. After describing its basic goal, we turned to the methodology and the results. Embedding our results into the distance learning and support framework is also interesting from a theoretical point of view. On the other hand, creating the learning portal appears to be a very practical implication of our research, and in this paper we have covered the basic issues related to its form and contents. While creating a portal is our main recommendation, the question remains open whether this is something we had in mind even before we started the research. In other words, we need to ask whether this answer hides other possible, even completely different, solutions. We have tried to minimize this hidden risk through our methodology, in which we give some standard technologies a new, well-defined content.

References:

[1] Glenn Hoyle: Distance Learning on the Net. (http://www.hoyle.com/distance.htm).

[2] Distance Education Clearinghouse (http://www.uwex.edu/disted), The University of Wisconsin-Extension, 2003.

[3] Distance Learning Resource Network (http://www.dlrn.org), Star Schools Program, U.S. Dept. of Education, 2000.

[4] Virginia Steiner: "What is Distance Education?", DLRN Research Associate, 1995. (http://www.dlrn.org/text/library/dl/whatis.html).

[5] Michael Moore, Greg Kearsley: Distance Education: A Systems View, Wadsworth Publishing Company, 1996.

[6] Instructional Technology Council, Washington DC, 2003. (http://144.162.197.250).

[7] The Institute for Distance Education, University of Maryland University College. (http://www.umuc.edu/ide/modlmenu.html).

[8] Distance Education: A Consumer's Guide, Western Cooperative for Educational Telecommunications, Pub. 2A300, 12 pp., 1997. (http://www.wcet.info/resources/publications/conguide/index.htm).

[9] Derek Stockley, Chirnside Park, Melbourne, Victoria, Australia, 1996-2003. (http://derekstockley.com.au/elearning-definition.html).

[10] I. Stanchev, E. Niemi, N. Mileva: Teacher's Guide "How to Develop Open and Distance Learning Materials", University of Twente, The Netherlands.

[11] Kit Logan and Pete Thomas: Learning Styles in Distance Education Students Learning to Program. Proceedings of the 14th Workshop of the Psychology of Programming Interest Group, Brunel University, June 2002, pp. 29-44. (www.ppig.org).

[12] Karen Bradford: Deep and Surface Approaches to Learning and the Strategic Approach to Study in Higher Education; Based on Phenomenographic Research. (http://www.arasite.org/guestkb.htm).

[13] The Centre for Teaching and Learning, University College Dublin. (http://www.ucd.ie/teaching/good/deep.htm).

[14] Soren Nipper: talk in Dublin, annual meeting of the ENACTS project, 2003.

[15] (www.online-colleges-courses-degrees-classes.org/q-and-a-online-accredited-college.html).

[16] VOD – The Online ITV Dictionary. (http://www.itvdictionary.com/vod.html).

[17] Institute of Educational Technology at The Open University. (http://iet.open.ac.uk/courseonline/).

[18] Stanford Online and the Stanford Center for Professional Development. (www.scpd.stanford.edu).

[19] The Access Grid Project. (http://www.accessgrid.org).

[20] The European Access Grid. (http://euroag.accessgrid.org).

[21] EPCC's Access Grid Node. (http://www.epcc.ed.ac.uk/computing/grid/accessgrid/).

[22] (http://tecfa.unige.ch/edu-comp/edu-ws94/contrib/peraya.fm.html).

[23] Interwise - enterprise communications platform for Web conferencing. (http://www.interwise.com).

[24] Guide to Web Conferencing, Online Conferencing, e-Conferencing, Data Conferencing… (http://www.thinkofit.com/webconf/).

[25] http://publish.uwo.ca/maandrus/Table.htm

Blackboard.com (http://www.blackboard.com).

[26] WebCT. (http://www.webct.com).

[27] Fle3: Learning Environments for Progressive Inquiry Research Groups, UIAH Media Lab, University of Art and Design Helsinki. (http://fle3.uiah.fi).

[28] ILIAS Opensource, University of Cologne. (http://www.ilias.uni-koeln.de/ios/index.html).

[29] COSE VLE, Staffordshire University Enterprises Ltd. (http://www.staffs.ac.uk/COSE).

[30] The Centre for Educational Technology Interoperability Standards (CETIS). (http://www.cetis.ac.uk/).

[31] LeGE-WG: Learning Grid of Excellence Working Group. (http://www.lege-wg.org).

[32] Distance learning projects in EU. (http://www.know-2.org/index.cfm?PID=1).


APPENDIX IV

LIST OF ACRONYMS

ACM Association for Computing Machinery

AI Artificial Intelligence

API Application Program Interface

ASP Application Service Provider

ATM Asynchronous Transfer Mode

B2B Business-to-business

BEOWULF Cluster of PCs connected by a fast network

BLAS Basic Linear Algebra Subprograms

BSP Bulk Synchronous Parallel computing

CAVE CAVE Automatic Virtual Environment

CI Configuration Interaction

CMOS Complementary Metal Oxide Semiconductor

COTS Commercial Off-The-Shelf

CP Car-Parrinello

CPU Central Processing Unit

CRM Customer Relationship Management

CSCISM Center for High Performance Computing in Molecular Sciences

DAS Direct Attached Storage

DFT Density Functional Theory

DRAM Dynamic Random Access Memory

DRMAA Distributed Resource Management Application API

DSP Digital Signal Processing

DTF Distributed Terascale Facility (IBM HPC installation in the US)

DWD Deutscher Wetterdienst

EJB Enterprise JavaBeans

ENACTS European Network for Advanced Computing Technology for

Science

EPCC Edinburgh Parallel Computing Centre

ERP Enterprise Resource Planning

FAQ Frequently Asked Questions

FeRAM Ferroelectric RAM

FF Force Field

FFT Fast Fourier Transform

FeDFS Federated file system

FM Fast Multipole

FP-CMOS Flexible Parameter CMOS

FPGA Field Programmable Gate Array

GFS The Global File System

GGF Global Grid Forum

GSN Gigabyte System Network

GT2 Globus Toolkit 2

GUPS Giga Updates Per Second

HPC High Performance Computing

HPF High Performance Fortran

HSM Hierarchical Storage Management

IA-64 Intel Architecture 64 bit

IDC International Data Corporation

IEEE Institute of Electrical and Electronics Engineers

IP Internet Protocol

ISV Independent Software Vendor

ITRS International Technology Roadmap for Semiconductors

J2EE Java 2 Enterprise Edition

JSP Java Server Pages

JXTA JuXTApose

LAN Local Area Network 

LAPACK Linear Algebra PACKage

LTC Linux Technology Center (at IBM)

MCSCF Multi-Configuration Self-Consistent Field

MD Molecular Dynamics

MPI Message Passing Interface

MPP Massively Parallel Processing

MRAM Magnetoresistive RAM

MTA Multi Threaded Architecture (Cray HPC system)

NAS Network Attached Storage

NAVi Network Animated View

NSA National Security Agency

NSC National Supercomputer Centre

NUMA Non-Uniform Memory Access

OGSA Open Grid Services Architecture

OpenMP Open Multi Processing

P2P Peer-to-peer

PBLAS Parallel Basic Linear Algebra Subprograms

PC Personal Computer

PP Pseudo Potential

PW Plane Wave

QOS Quality Of Service

RAM Random Access Memory

RFP Request For Proposal

RSL Resource Specification Language

RTE Run Time Environment

SAN Storage Area Network 

SCSI Small Computer System Interface

SCSL Source Code Software Licensing

SHMEM Shared Memory (access library)

SIA Semiconductor Industry Association

SMP Symmetric Multi Processing

SPEC Standard Performance Evaluation Corporation

SPP Special Purpose Processor

Sun ONE Sun Open Net Environment

TCO Total Cost of Ownership

TCP Transmission Control Protocol

TSMC Taiwan Semiconductor Manufacturing Company

UMA Uniform Memory Access

UPC Unified Parallel C

W3C World Wide Web Consortium

WAN Wide Area Network 

WDM Wavelength Division Multiplexing

XC Exchange Correlation

XML Extensible Markup Language

ZPL Z (level) Programming Language

