Complex HPC meeting
Mark Baker, Anne C. Elster, Emmanuel Jeannot, Antonio Plaza, Leonel Sousa, Frédéric Suter
October 19-20, 2009, Lisbon, Portugal
1 Program

Schedule

Monday, October 19 (parallel sessions in Rooms 2.1, 2.2, 2.3 and 1.1)
9h-9h45      Introduction
9h55-10h55   Sessions
10h55-11h25  Break
11h25-12h45  Sessions
12h45-14h15  Lunch
14h15-15h55  Sessions
15h55-16h25  Break
16h25-18h05  Sessions
Evening      Social Event (Restaurant TBA)

Tuesday, October 20
9h-11h       Sessions
11h-11h30    Break
11h30-12h00  Working group discussions
12h15-12h45  Wrap-up
12h45-14h    Lunch and free time for discussion
14h-16h      MC Meeting
Speakers, talk titles, and keywords:

Anne C. Elster: GPU Computing and Numerical Algorithms for Complex HPC
  Keywords: GPU Computing and Numerical Algorithms for Complex HPC
Francisco F. Rivera: Research activities of the CA group (Univ. Santiago de Compostela)
Domingo Gimenez: Parallel Routines Modelling and Applications at the Parallel Computing Group of the University of Murcia
  Keywords: parallel routines modeling, heterogeneous computing, scheduling, parallel computing applications
Alexandru Herisanu: HPC Infrastructure and Applications in CS@UPB
  Keywords: Cluster computing, HPC, applications, Grid Computing
Alexey Lastovetsky: High performance heterogeneous computing in UCD
  Keywords: high performance heterogeneous computing
Dana Petcu: Parallel computing in Romania
  Keywords: Cluster computing; Applications in science and engineering
Leonel Sousa: PCs (CPU+GPU) = Heterogeneous Systems
  Keywords: GPU, CUDA, MPI, CUBLAS, ATLAS
Ramon Doallo: Research activities of the CA group (Univ. A Coruña)
  Keywords: HPC support tools - Programmability - Parallel libraries - Graphics
Mark Baker: Multi-core, Clouds, low-level infrastructure, and green computing
  Keywords: Multi-core, clouds, low-level infrastructure, green computing
Uros Cibej: Challenges of data-intensive computation in heterogeneous environments
  Keywords: data-intensive computation, data replication, scheduling
Andrea Clematis: Issues for multi level parallel systems exploitation in complex applications
  Keywords: multilevel parallelism, grid, applications
Katerina Doka: Research Activities of the PDSG Group
  Keywords: Grid, Cloud, P2P, Data Management
Daniel Katz: Science on the TeraGrid
  Keywords: large-scale computing, distributed applications
Ester Martin Garzon: The research group Supercomputation-Algorithms
  Keywords: high performance computing; image processing; tomographic reconstruction; global optimization; multimedia
Svetozar Margenov: Parallel PCG Algorithms for Voxel FEM Systems
  Keywords: large-scale scientific computing, parallel computing, IBM Blue Gene/P
Pierre Ramet: Dynamic Scheduling for Sparse Direct Solver on NUMA and Multicore Architectures
  Keywords: Sparse Direct solver, NUMA, Multicore, Dynamic Scheduling
Bora Ucar: MUMPS: A multifrontal massively parallel sparse direct solver
  Keywords: Parallel computing; sparse direct solvers; linear systems of equations
Jerzy Wasniewski: A Fast Minimal Storage Factorization of Symmetric Matrices
  Keywords: Numerical analysis, Linear Algebra, Symmetric, triangular, and Hermitian matrices, Cholesky algorithm, diagonal pivoting method
Roman Wyrzykowski: Parallel adaptive finite element package with dynamic load balancing for 3D thermomechanical problems
  Keywords: 3D FEM, adaptive methods, parallel algorithms/applications, dynamic load balancing, multigrid algorithms, PC-clusters, multicore computing, GPU
Magne Haveraaen: A Hardware Independent Parallel Programming Model
Jorge Barbosa: Dynamic scheduling on heterogeneous machines
  Keywords: dynamic scheduling, parallel task, list scheduling, cluster computing
Eleni Karatza: Performance of Scheduling Strategies in Distributed Systems
  Keywords: Performance, Scheduling, Mapping, Distributed Systems, Simulation
Gudula Rünger: Mapping and scheduling of Parallel Tasks for Multi-core Systems
  Keywords: Parallel tasks, mixed parallelism, scheduling, mapping
Ozan Sonmez: Application-Oriented Scheduling in Multicluster grids
  Keywords: Multicluster grids, co-allocation, cycle scavenging, job runtime predictions
Konstantinos Karaoglanoglou: Discovering resources and mapping in large-scale distributed environments
  Keywords: Resource Discovery, Fault-Tolerance, Trust Issues, Distributed Systems, Grid and Cluster Computing
Pierre Kuonen: On-going research activities at the GRID & Ubiquitous Computing Group
  Keywords: Grid programming, resources management
Zafeirios Papazachos: Modeling and Performance Analysis of Scheduling Algorithms in Grid Systems
  Keywords: Scheduling policies; Distributed systems; Performance; Modeling and Simulation; Grid and cluster computing
Ramin Yahyapour: Resource Management and Scheduling for Clouds and Grids
  Keywords: Cloud Computing, Scheduling
Stylianos Zikos: Performance of resource allocation policies in grid systems
  Keywords: Distributed systems, grids, site selection, resource allocation policies
Ranieri Baraglia: Sorting using bitonic network with CUDA
  Keywords: GPU, Parallel programming, Bitonic sort
Sylvain Contassot-Vivier: PDE solver using asynchronous algorithms on a GPU cluster
  Keywords: Cluster computing, GPU computing, asynchronism, numerical computation
Jan Kwiatkowski: Automatic Program Parallelization
  Keywords: parallel processing, automatic program parallelization
Sidi Ahmed Mahmoudi: Parallel Image Processing on GPU with CUDA and OpenGL
  Keywords: GPU, CUDA, OpenGL, Image Processing, Medical Imaging
Samuel Thibault: StarPU, a runtime system for accelerator-based multicore machines
  Keywords: Multicore, Accelerator, GPU, Cell, Scheduling, DSM
Thomas Brady: SmartGridRPC: The New RPC Model of High Performance Grid Computing and its Implementation
  Keywords: Grid Computing, High Performance Computing, Scientific Computing, Grid Middleware, GridRPC, SmartGridRPC
Attila Kertesz: Grid Interoperability Solutions in Grid Resource Management
  Keywords: Grid Interoperability, Grid Resource Management, Grid Broker, Meta-Brokering
Fotis Loukos: Distributed computation in Peer-to-Peer networks
  Keywords: distributed computation, peer-to-peer
Marcin Paprzycki: Software Agents as Resource Brokers in Grid
Thomas Rauber: Parallel ODE-solvers on Multi-core Systems
  Keywords: multi-core, numerical analysis, ODE solvers
Franck Seinstra: Real-World Distributed Computing with Ibis
  Keywords: high-performance distributed computing; user transparency; platform-independence; middleware-independence; fault-tolerance; malleability; guaranteed connectivity; real-world, real-time, and off-line applications; "The Promise of the Grid"
Leszek Borzemski: MWING: A Scalable Experimental Framework for Monitoring and Measurement in Distributed Complex HPC Environments
  Keywords: Grid, cloud computing, communication system, heterogeneous computing, distributed system monitoring and measurement
Marco Danelutto: Autonomic management for efficient HPC
  Keywords: structured parallel computing, algorithmic skeletons, autonomic management, non-functional features, performance tuning
Alexey Kalinov: Challenges of physical verification on heterogeneous platform
  Keywords: EDA, physical verification, hierarchical processing
Philipp Kegel: Challenges and Approaches in Parallelizing Applications for Medical Imaging
  Keywords: Medical Imaging, PET, Threading Building Blocks, GPGPU
Antonio Plaza: High Performance Computing in Remote Sensing Applications
  Keywords: High performance computing, cluster computing, heterogeneous computing, FPGAs, GPUs, applications, r
Nuno Roma: Heterogeneous multi-core computer architectures and dedicated processing structures for signal processing applications
  Keywords: Heterogeneous multi-core architectures, Video encoding, Biological sequences alignment (DNA and protein)
Julius Zilinskas: High-Performance Computing in Global Optimization and Optimization-Based Visualization
  Keywords: parallel algorithms, global optimization, visualization of multidimensional data
Francois Broquedis: ForestGOMP, an efficient OpenMP runtime system for hierarchical architectures
  Keywords: OpenMP, multicore, runtime system, affinities, NUMA
Manuel Prieto-Matias: OS Scheduling on Asymmetric Multicore Systems
  Keywords: OS scheduling, Asymmetric Multicore
João Sobral: Parallel programming refinements for heterogeneous multi-core parallel systems
  Keywords: separation of concerns, heterogeneous multicore systems, parallel programming
Talks are grouped into the following sessions:
Monitoring/visualisation: 2 talks
Multicore: 3 talks
Applications: 5 talks
General talks (small scale): 7 talks
General talks (large scale): 7 talks
GPU: 5 talks
Middleware: 6 talks
Scheduling 1: 4 talks
Scheduling 2: 5 talks
Numerical analysis: 5 talks
2 List of Presentations
Multi-core, Clouds, low-level infrastructure, and green computing
Mark Baker, SSE, University of Reading, [email protected]

Keywords: Multi-core, clouds, low-level infrastructure, green computing
Abstract: New hardware and software technologies are emerging all the time, and there is a never-ending issue of what choices should be used to support both legacy and evolving applications and their algorithms. An example of one of these issues is the use of multi-core processors. The producers of these processors seem to think that a shared-memory model is adequate, but in reality the HPC market-place will want to use large systems of multi-core processors, where there will be a need for intra- and inter-communication between the processors and cores. This type of programming can be handled using MPI/OpenMP/threads, but in reality a programming paradigm that encompasses this type of architecture is needed. Indeed, the way Intel, AMD, IBM Cell, and GPU processors are designed means that optimisation of code on these systems will take a lot of time and effort. Another area that is changing rapidly is the move away from Grid computing towards Clouds and Virtualisation, which are innovative but potentially do not meet the needs of some HPC users. In addition, the need for Green IT computing that optimises application performance and potentially saves power is a very important aspect of computing in the future.
Sorting using bitonic network with CUDA
Ranieri Baraglia, CNR-ISTI, [email protected]

Keywords: GPU, Parallel programming, Bitonic sort
Abstract: After a short description of the ISTI's HPCLab and a highlight of the research activities conducted, we present a fast sorting algorithm implementing an efficient bitonic sorting network on graphics processors. Sorting is a fundamental and universal problem in computer science. Even if sorting has been extensively addressed by many research works, it still remains an interesting challenge to make it faster by exploiting novel technologies. In this light, the presentation shows how to use graphics processors as coprocessors to speed up sorting while allowing the CPU to perform other tasks. Our new algorithm exploits a memory-efficient data access pattern maintaining the minimum number of accesses to off-chip memory. We introduce an efficient instruction dispatch mechanism to improve the overall sorting performance.
References:
• K. E. Batcher. Sorting networks and their applications. In AFIPS '68 (Spring): Proceedings of the April 30-May 2, 1968, Spring Joint Computer Conference, pages 307-314, New York, NY, USA, 1968. ACM.
• M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, Washington, DC, USA, 1999. IEEE Computer Society.
• N. K. Govindaraju, J. Gray, R. Kumar, and D. Manocha. GPUTeraSort: High performance graphics coprocessor sorting for large database management. In ACM SIGMOD International Conference on Management of Data, Chicago, United States, June 2006.
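What makes a bitonic network attractive on GPUs is that its compare-exchange pattern is fixed and data-independent, so every stage can run as a grid of independent threads. A minimal CPU-side sketch of the network (an illustration of the classic i-j-k formulation, not the authors' CUDA implementation):

```c
/* Minimal CPU sketch of a bitonic sorting network (illustration only;
 * a CUDA version would run each (k, j) stage as a kernel, one thread
 * per compare-exchange).  n must be a power of two. */
void bitonic_sort(int *a, int n) {
    for (int k = 2; k <= n; k <<= 1)          /* size of bitonic subsequences */
        for (int j = k >> 1; j > 0; j >>= 1)  /* compare distance this stage  */
            for (int i = 0; i < n; i++) {
                int p = i ^ j;                /* partner element              */
                if (p > i) {
                    int up = (i & k) == 0;    /* sort direction of this block */
                    if ((a[i] > a[p]) == up) {
                        int t = a[i]; a[i] = a[p]; a[p] = t;
                    }
                }
            }
}
```

All iterations of the inner `i` loop touch disjoint pairs, which is exactly the parallelism the talk exploits; the stage count is fixed at O(log^2 n) regardless of the input.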
Dynamic scheduling on heterogeneous machines
Jorge Barbosa, University of Porto, [email protected]

Keywords: dynamic scheduling, parallel task, list scheduling, cluster computing
Abstract: The aim of this presentation is to describe the work developed on dynamically scheduling multi-user and independent jobs on clusters, both homogeneous and heterogeneous. The dynamic behavior means that the scheduler is able to adapt the scheduling when new jobs are submitted and also when processor availability changes. The aim for future work is to study the possibility of extending the scheduling algorithms to dynamic application scheduling on multicore heterogeneous machines. Another field of research is to develop grid (multi-cluster) scheduling strategies according to the utility computing concept, that is, to lower computation costs without harming users' QoS requirements, by taking into account energy consumption as well as computation time.
References:
• J. Barbosa, Belmiro Moreira: Dynamic job scheduling on heterogeneous clusters, in 8th International Symposium on Parallel and Distributed Computing, pp. 3-10, 2009.
• J. Barbosa, J. Tavares and A. J. Padilha: Optimizing dense linear algebra algorithms on heterogeneous machines, in Algorithms and Tools for Parallel Computing on Heterogeneous Clusters, Nova Science Publishers, N.Y., pp. 17-31, 2007.
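As a toy illustration of dynamic scheduling on machines of different speeds (a sketch of a minimum-completion-time heuristic of our choosing, not the talk's algorithm; the function name and parameters are hypothetical), each arriving job goes to the processor that would finish it earliest given its speed and current load:

```c
/* Hypothetical minimum-completion-time heuristic: a job of `work` units
 * is assigned to the processor that finishes it earliest, given each
 * processor's speed and the time at which it becomes free.
 * Returns the chosen processor and updates its availability. */
int schedule_job(double work, int nproc,
                 const double *speed, double *free_at) {
    int best = 0;
    double best_finish = free_at[0] + work / speed[0];
    for (int p = 1; p < nproc; p++) {
        double finish = free_at[p] + work / speed[p];
        if (finish < best_finish) { best_finish = finish; best = p; }
    }
    free_at[best] = best_finish;  /* processor is busy until the job completes */
    return best;
}
```

Because the decision uses the current `free_at` values, the assignment adapts as jobs arrive and as processors load up, which is the "dynamic" aspect the abstract describes.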
MWING: A Scalable Experimental Framework for Monitoring and Measurement in Distributed Complex HPC Environments
Leszek Borzemski, Wroclaw University of Technology, Wroclaw, Poland

Keywords: Grid, cloud computing, communication system, heterogeneous computing, distributed system monitoring and measurement
Abstract: Today's complex scientific problems are programmed on high-performance computing systems which are often developed as distributed systems. Monitoring and measurements can help in the exploitation of distributed systems, especially in the context of performance. Collected measurements can be essential in terms of network behavior diagnosis and understanding how the Internet works and how to improve its operability. We have been designing MWING, a scalable experimental framework to set up and carry out distributed monitoring and measurements. MWING is a good starting point in the design of HPC systems based on the cloud computing paradigm. The system can synchronize the activity of autonomous measurement/monitoring software agents that collect information on various network performance/reliability characteristics, e.g. to measure the quality of communication services. The MWING framework uses local and centralized databases. Locally gathered data can be uploaded in one place, allowing further analysis of the data, e.g. with the use of a professional data mining engine. MWING has been used in a global experiment to observe and predict network behavior in a real-life Internet-wide distributed measurement infrastructure.
References:
• L. Borzemski, L. Cichocki, M. Fras, M. Kliber, Z. Nowak: MWING: A Multiagent System for Web Site Measurements. LNCS, vol. 4496, 2007, pp. 278-287.
• L. Borzemski, L. Cichocki, M. Kliber: Architecture of Multiagent Internet Measurement System MWING Release 2. LNCS, vol. 5559, 2009, pp. 410-419.
SmartGridRPC: The New RPC Model of High Performance Grid Computing and its Implementation
Thomas Brady, University College Dublin, [email protected]

Keywords: Grid Computing, High Performance Computing, Scientific Computing, Grid Middleware, GridRPC, SmartGridRPC
Abstract: The SmartGridRPC model is an extension of the GridRPC model which aims to achieve higher performance. The traditional GridRPC provides a programming model and API for mapping individual tasks of an application in a distributed Grid environment, based on the client-server model characterised by a star network topology. SmartGridRPC provides a programming model and API for mapping a group of tasks of an application in a distributed Grid environment, based on a fully connected network topology.
The presentation will outline the SmartGridRPC programming model and API, its implementation in SmartGridSolve, and its performance advantages over the GridRPC model. In addition, experimental results using a real-world application will also be presented.
References:
• Brady T., Dongarra J., Guidolin M., Lastovetsky A., and Seymour K.: SmartGridRPC: The New RPC Model for High Performance Grid Computing and its Implementation in SmartGridSolve. Manuscript submitted for publication, April 2009.
• Guidolin M., Brady T., and Lastovetsky A.: ADL: Obtaining Higher Performance in SmartGridSolve with Irregular Algorithms. Manuscript submitted for publication, April 2009.
• Brady T., Guidolin M., Lastovetsky A.: Experiments with SmartGridSolve: Achieving higher performance by improving the GridRPC model, in Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid2008), Tsukuba, Japan, 2008.
• Brady T., Konstantinov E., Lastovetsky A.: SmartNetSolve: High Level Programming System for H
ForestGOMP, an efficient OpenMP runtime system for hierarchical architectures
Francois Broquedis, INRIA - LaBRI - University of Bordeaux, [email protected]

Keywords: OpenMP, multicore, runtime system, affinities, NUMA
Abstract: Today, multicore is everywhere, and HPC computers are getting more and more hierarchical and hard to program efficiently. Even if we can still program them like SMP architectures, the performance we obtain is often disappointing. Indeed, we now have to take care of both cache and memory affinities, to benefit from cache sharing and to limit NUMA penalties. The ForestGOMP platform extends the OpenMP runtime system to do so by providing a way to group related threads together and attach data to these groups. It then provides several scheduling policies to distribute OpenMP threads and the attached data on any hierarchical architecture, and ways to design your own scheduler.
References:
• Dynamic Task and Data Placement over NUMA Architectures: an OpenMP Runtime Perspective. Francois Broquedis, Nathalie Furmento, Brice Goglin, Raymond Namyst, and Pierre-Andre Wacrenier.
• Scheduling Dynamic OpenMP Applications over Multicore Architectures. Francois Broquedis, Francois Diakhate, Samuel Thibault, Olivier Aumage, Raymond Namyst, and Pierre-Andre Wacrenier.
Challenges of data-intensive computation in heterogeneous environments
Uros Cibej, University of Ljubljana, Slovenia, [email protected]

Keywords: data-intensive computation, data replication, scheduling
Abstract: Data-intensive applications are gaining more and more importance in HPC. The emergence of new applications that require enormous amounts of data has revealed new problems that need to be dealt with when executing such jobs in heterogeneous and geographically distributed environments. In our presentation we will describe some of the challenges in this area and the directions we have taken in order to solve these issues.
Issues for multi level parallel systems exploitation in complex applications
Andrea Clematis, IMATI - CNR, [email protected]

Keywords: multilevel parallelism, grid, applications
Abstract: Different research activities are currently carried out at IMATI-CNR involving the exploitation of multilevel parallel systems in complex applications. An overview of issues arising from these activities will be presented, referring to the following domains:
- biomedical domain: aspects related to the use of multilevel parallel processing are considered with reference to molecular docking and tissue microarray analysis;
- hydro-meteo domain: ongoing research activities to deploy complex hydro-meteo workflows on the Grid are shortly illustrated;
- mechanical system design: the need for practical multilevel parallelism in the design and simulation of mechanical components is discussed.
Current research activities and projects on these topics are described as well.
PDE solver using asynchronous algorithms on a GPU cluster
Sylvain Contassot-Vivier, University Henri Poincare - Nancy 1, [email protected]

Keywords: Cluster computing, GPU computing, asynchronism, numerical computation
Abstract: We present a PDE solver based on the multisplitting-Newton approach, using a GPU-accelerated sparse linear solver as its inner core. The multisplitting approach allows us to make use of asynchronism, which noticeably decreases the overall computation time by performing an implicit overlapping of computations with communications. Moreover, as most PDE problems come from physical modelling in which the dependency scheme produces sparse matrices, we propose as our inner linear solver a sparse one, designed to work with structured matrices where all the non-zeros are on a few diagonals. Finally, several benchmarks point out the interest of using asynchronous algorithms together with local accelerators like GPUs.
References:
• The inner linear solver has been presented at ParCo2009. That presentation includes the results of the entire project, consisting of using asynchronous algorithms on GPU clusters.
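To make the multisplitting idea concrete, here is a deliberately simplified sketch of ours (not the authors' solver): a Jacobi-style relaxation on a tridiagonal, diagonally dominant system, i.e. a matrix whose non-zeros lie on a few diagonals. In a multisplitting method, each node runs such an iteration on its own block and exchanges boundary values; in the asynchronous variant, it keeps iterating with whatever neighbour values have already arrived instead of waiting at a synchronisation point.

```c
#include <stdlib.h>

/* Simplified sketch: Jacobi relaxation on a tridiagonal system A x = b,
 * the kind of "few diagonals" sparse matrix a structured inner solver
 * targets.  Each multisplitting node would run this on its own block;
 * asynchronism means not waiting for fresh boundary values each sweep. */
void jacobi_tridiag(int n, const double *lower, const double *diag,
                    const double *upper, const double *b,
                    double *x, int iters) {
    double *xn = malloc(n * sizeof *xn);
    for (int it = 0; it < iters; it++) {
        for (int i = 0; i < n; i++) {
            double s = b[i];
            if (i > 0)     s -= lower[i] * x[i - 1];  /* sub-diagonal   */
            if (i < n - 1) s -= upper[i] * x[i + 1];  /* super-diagonal */
            xn[i] = s / diag[i];                      /* relaxed update */
        }
        for (int i = 0; i < n; i++) x[i] = xn[i];
    }
    free(xn);
}
```

Only the three diagonals are stored, which is what makes such a solver memory-friendly on a GPU; convergence requires diagonal dominance (or a similar condition), as in the example below.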
Autonomic management for efficient HPC
Marco Danelutto, Dept. Computer Science - Univ. Pisa, [email protected]

Keywords: structured parallel computing, algorithmic skeletons, autonomic management, non-functional features, performance tuning
Abstract: We will discuss recent advances in the autonomic management of non-functional features (performance tuning, security, fault tolerance, power management) in structured parallel computations. Results will be presented related to the management of performance in different experiments. The evolution of the methodology, which allows managing several different non-functional concerns within the same application, will also be discussed.
References:
• Marco Aldinucci, Marco Danelutto, Peter Kilpatrick: Autonomic management of non-functional concerns in distributed and parallel application programming, IPDPS 2009, Rome, May 2009.
• Marco Aldinucci, Marco Danelutto and Peter Kilpatrick: Co-design of distributed systems using skeletons and autonomic management abstractions, in: Euro-Par 2008 Workshops - Parallel Processing, Selected Papers, pages 403-414, Springer, 2009.
• Marco Aldinucci, Marco Danelutto, Giorgio Zoppi and Peter Kilpatrick: Advances in Autonomic Components - Services, in: From Grids to Service and Pervasive Computing (Proc. of the CoreGRID Symposium 2008), pages 3-17, Springer, 2008.
Research activities of the CA group (Univ. A Coruña)
Ramon Doallo, University of A Coruña, [email protected]

Keywords: HPC support tools - Programmability - Parallel libraries - Graphics
Abstract: This talk will present the current research lines of the Computer Architecture Group of the University of A Coruña, Spain, which involve support tools for HPC (high performance compilers, middleware, fault-tolerance, administration), programmability (PGAS UPC), development of parallel libraries, and computer graphics and visualization.
Research Activities of the PDSG Group
Katerina Doka, National Technical University of Athens, [email protected]

Keywords: Grid, Cloud, P2P, Data Management
Abstract: The presentation is about the research activities of the Parallel and Distributed Systems Group of the School of Electrical and Computer Engineering of the National Technical University of Athens. We will focus on Data Management in Peer-to-Peer systems.
GPU Computing and Numerical Algorithms for Complex HPC
Anne C. Elster, NTNU (Norwegian Univ. of Science and Technology), [email protected]

Keywords: GPU Computing and Numerical Algorithms for Complex HPC
Abstract: In this presentation I will highlight some of my group's work on numerical algorithms in GPU computing and how it relates to Complex HPC. As the leader of the "Numerical analysis for hierarchical and heterogeneous and multicore systems" Working Group of this COST Action, I will also give some pointers to current issues. I also invite others to come to me with suggestions.
References:
• http://www.idi.ntnu.no/~elster/hpc-lab
• http://www.idi.ntnu.no/~elster/hpc-group/elster-alumni.html
Research activities of the CA group (Univ. Santiago de Compostela)
Francisco F. Rivera, Univ. Santiago de Compostela, [email protected]

Keywords:
Abstract: A summary of the recent research activities of the group will be presented: performance analysis and prediction, run-time performance optimization, applications, and Grid simulation, among others.
References:
• www.ac.usc.es
Parallel Routines Modelling and Applications at the Parallel Computing Group of the University of Murcia
Domingo Gimenez, University of Murcia, [email protected]

Keywords: parallel routines modeling, heterogeneous computing, scheduling, parallel computing applications
Abstract: In this talk we introduce the Parallel Computing Group at the University of Murcia and its fields of research: parallel routines modeling and optimization, scheduling on heterogeneous systems, and applications of parallel computing (meteorology, maritime contamination, filter design, ...).
A Hardware Independent Parallel Programming Model
Magne Haveraaen, Universitetet i Bergen, [email protected]

Keywords:
Abstract: Parallel programming faces two major challenges: how to efficiently map computations to different parallel hardware architectures, and how to do it in a modular way, i.e., without rewriting the problem solving code. We propose to treat dependencies as first class entities in programs. Programming a highly parallel machine or chip can then be formulated as finding an efficient embedding of the computation's data dependency into the underlying hardware's communication layout. With the data dependency pattern of a computation extracted as an explicit entity in a program, one has a powerful tool to deal with parallelism.
References:
• Eva Burrows and Magne Haveraaen: A Hardware Independent Parallel Programming Model. Journal of Logic and Algebraic Programming, Volume 78, Issue 7, August-September 2009, Pages 519-538. http://dx.doi.org/10.1016/j.jlap.2009.06.002
HPC Infrastructure and Applications in CS@UPB
Alexandru Herisanu, University Politehnica of Bucharest, [email protected]

Keywords: Cluster computing, HPC, applications, Grid Computing
Abstract: In this presentation we show the recent advances in terms of hardware and software infrastructure, as well as HPC applications, in the Computer Science and Engineering Department of the University Politehnica of Bucharest.
Challenges of physical verification on heterogeneous platform
Alexey Kalinov, Cadence Design Systems, [email protected]

Keywords: EDA, physical verification, hierarchical processing
Abstract: Physical verification is a process whereby an integrated circuit layout is checked via Electronic Design Automation (EDA) software tools to see if it meets certain criteria. The nature of the process is quite irregular. Problems in parallel implementation and possible benefits of using modern heterogeneous platforms are discussed.
Discovering resources in distributed environments
Konstantinos Karaoglanoglou, Aristotle University of Thessaloniki, Department of Informatics

Keywords: Resource Discovery, Distributed Systems
Abstract: The proposed research work discusses issues concerning the "Algorithms and tools for mapping and executing applications onto distributed and heterogeneous systems" scientific area. The discussed research work is conducted at the Parallel and Distributed Systems Group in the Department of Informatics of the Aristotle University of Thessaloniki under the supervision of Professor Dr. Helen Karatza. In our previous efforts, we have extensively dealt with the discovery of resources and the efficient mapping of applications in large-scale heterogeneous distributed environments. In order to efficiently identify appropriate resources for certain applications, we enhanced the proposed mechanisms with a matchmaking framework. Moreover, our research took the direction of managing the uncertainties encountered in such environments by proposing a mechanism able to overcome the phenomenon of resource failures. As for future research directions, we intend to deal with efficient mapping mechanisms that take into consideration the "resource evolution" phenomenon, weighing in the technical changes that so commonly occur in the resources of distributed environments. Finally, we intend to work extensively in the direction of trusted mapping, directing applications to reliable resources in a distributed environment. The objective of this research direction is to provide robustness to distributed environments against malicious behaviours.
References:
• K. Karaoglanoglou and H. Karatza: "Resource Discovery in a Dynamical Grid based on Re-routing Tables", SIMPAT, Elsevier, July 2008.
Performance of Scheduling Strategies in Distributed Systems
Eleni Karatza, Aristotle University of Thessaloniki, Department of Informatics

Keywords: Performance, Scheduling, Mapping, Distributed Systems, Simulation
Abstract: Distributed systems offer considerable computational power, which can be used to solve problems with large computational requirements. The scheduling strategy used in such a system is of great significance, since the performance achieved is proportional to the algorithm's effectiveness. An efficient scheduling strategy maximizes system performance and avoids unnecessary delays. In this talk we will present various scheduling strategies in distributed systems for various workloads. Parallel jobs are examined. Simulation models are used to evaluate the performance of the scheduling algorithms.
References:
• H. D. Karatza: Periodic Task Cluster Scheduling in Distributed Systems, in Computer System Performance Modeling in Perspective, E. Gelenbe (Ed.), World Scientific, Imperial College Press, 2006, pp. 257-276.
Science on the TeraGrid
Daniel Katz, University of Chicago, [email protected]

Keywords: large-scale computing, distributed applications
Abstract: This presentation will talk about the TeraGrid, and how various types of users make use of it to obtain research results.
Challenges and Approaches in Parallelizing Applications for Medical Imaging
Philipp Kegel, University of Munster, [email protected]

Keywords: Medical Imaging, PET, Threading Building Blocks, GPGPU
Abstract: Image reconstruction in the field of medical imaging, in particular positron emission tomography (PET), is a time-consuming task. In addition, new algorithms are being developed to increase image quality (e.g. respiratory motion correction), which require even more computational power. In a joint project with physicists, physicians, mathematicians and other computer scientists, we are working on the parallelization of a real-world application for PET medical imaging. We ported this application to various parallel architectures (clusters, GPUs, multi-cores) to evaluate their aptitude for imaging algorithms. Currently, we are working with new programming models (e.g., Threading Building Blocks or OpenCL) to exploit recent parallel architectures, and have started developing a domain-specific parallel library for medical imaging applications.
References:
• T. Hoefler, M. Schellmann, S. Gorlatch, and A. Lumsdaine: Communication optimization for medical image reconstruction algorithms. In Lecture Notes in Computer Science, volume 5205, pages 75-83, Berlin/Heidelberg, 2008. Springer.
• M. Schellmann, J. Vörding, S. Gorlatch, and D. Meiländer: Cost-effective medical image reconstruction: from clusters to graphics processing units. In CF '08: Proceedings of the 2008 Conference on Computing Frontiers, pages 283-292, New York, NY, USA, 2008. ACM.
Grid Interoperability Solutions in Grid Resource ManagementAttila Kertesz MTA SZTAKI attila.kertesz@
sztaki.hu
Keywords: Grid Interoperability, Grid Resource Management, Grid Broker, Meta-BrokeringAbstract: Since the management and beneficial utilization of highly dynamic grid resources cannot be handled bythe users themselves, various grid resource management tools have been developed, supporting different grids. Toease the simultaneous utilization of different middleware systems, researchers need to revise current solutions. GridInteroperability can be achieved at different levels of grid systems. In this talk we gather interoperation efforts in Gridresource management, focusing on the following approaches: (1) extending existing resource brokers with multiplemiddleware support, (2) interfacing grid portals to different brokers and middleware or (3) developing a new, higherlevel middleware component that not only interfaces various brokers but also coordinates their simultaneous utilization.We show that all of these approaches contribute to enable Grid Interoperability, and conclude that the third solution isa significant step towards the final solution.References:
• A. Kertesz, P. Kacsuk, Grid Interoperability Solutions in Grid Resource Management, IEEE Systems Journal's Special Issue on Grid Resource Management, Volume 3, Issue 1, pp. 131-141, doi: 10.1109/JSYST.2008.2011263, March 2009.
• A. Kertesz, J. D. Dombi, J. Dombi, Adaptive scheduling solution for grid meta-brokering, Acta Cybernetica, Volume 19, pp. 105-123, 2009.
On-going research activities at the GRID & Ubiquitous Computing Group
Pierre Kuonen University of Applied Sciences of Western Switzerland, Fribourg [email protected]
Keywords: Grid programming, resources management.
Abstract: The GRID & Ubiquitous Computing Group is part of the Information and Communication Technologies department of the University of Applied Sciences of Western Switzerland, Fribourg (HES-SO/EIA-FR). This group is active on topics related to parallel and distributed systems such as Grid computing and was a partner of the CoreGRID NoE. More specifically, the group's activities mainly focus on the following aspects:
• Programming models and tools for GRID and distributed systems.
• Resource management for GRID systems.
This presentation will focus on these two aspects of the research activities of the GRID & Ubiquitous Computing Group. Two projects will be presented:
• POP-C++: a comprehensive object-oriented environment for developing HPC applications on the Grid.
• SmartGRID: a grid scheduling framework based on swarm intelligence, aimed at serving the overall GRID as a whole.
References:
• "Programming the Grid with POP-C++", T. A. Nguyen, P. Kuonen, in Future Generation Computer Systems (FGCS), N.H. Elsevier, Volume 23, Issue 1, 1 January 2007, pages 23-30.
• Ye Huang, Amos Brocco, Pierre Kuonen, Michele Courant, Beat Hirsbrunner, "SmartGRID: A Fully Decentralized Grid Scheduling Framework Supported by Swarm Intelligence", in: International Conference on Grid and Cooperative Computing (GCC'08), Shenzhen, China, IEEE Press, 160-168, October 2008.
Automatic Program Parallelization
Jan Kwiatkowski Wroclaw University of Technology [email protected]
Keywords: parallel processing, automatic program parallelization
Abstract: Nowadays it is becoming increasingly difficult to enhance processor speed, so multiplying processing units seems to be the best way to achieve higher performance. As a result, multi-core and hybrid processors, as well as GPUs used as processing units, are becoming more and more popular for commercial and home usage. Developing programs for these "architectures" requires from the programmers some additional specific knowledge about the processor architecture and parallel programming. We propose SliCer, a hardware-independent tool that automatically parallelizes serial programs written in the C language by creating the proper number of threads, which can later be executed in parallel depending on the architecture used. We use virtualization, which makes it possible to utilize a variety of dynamically selected hardware resources to ensure efficient execution in accordance with the requirements.
References:
• Automatic Program Parallelization for Multicore Processors, accepted for PPAM’09 Conference
High performance heterogeneous computing in UCD
Alexey Lastovetsky University College Dublin [email protected]
Keywords: high performance heterogeneous computing
Abstract: In this talk, I will present the research and development in the area of high performance heterogeneous computing conducted in the UCD Heterogeneous Computing Laboratory. Our current research covers models, algorithms and programming tools for high performance computing on heterogeneous computational clusters and Grids. We are now extending our target platforms to multicores, hybrid computing nodes and clusters of such nodes.
References:
• hcl.ucd.ie
Distributed computation in Peer-to-Peer networks
Fotis Loukos Aristotle University of Thessaloniki, Department of
Keywords: distributed computation, peer-to-peer
Abstract: Peer-to-Peer networks are an emerging topic in networking. They consist of many users, each one equal to another, forming an overlay over another network such as the Internet. One of their uses is cooperation and distributed computation. There are many large-scale projects for distributed computation that use such networks, and frameworks have been created to support them. We will present this model of computation together with the advantages, disadvantages and problems that we face when implementing it.
References:
• Fotis Loukos and Helen Karatza, "Reputation based Friend-to-Friend networks", Peer-to-Peer Networking and Applications, Springer, Volume 2, Issue 1, Pages 13-23, March 2009
• Fotis Loukos and Helen Karatza, "Enabling Cooperation in MANET-based Peer-to-Peer systems", In Mobile Peer-to-Peer Computing for Next Generation Distributed Environments: Advancing Conceptual and Algorithmic Applications, Boon-Chong Seet Editor, IGI Global, Pages 118-131, 2009
Parallel Image Processing on GPU with CUDA and OpenGL
Sidi Ahmed Mahmoudi University of Mons [email protected]
Keywords: GPU, CUDA, OpenGL, Image Processing, Medical Imaging
Abstract: The motivation of our work is to demonstrate the interest of GPU exploitation, using CUDA and OpenGL, for boosting the performance of image processing algorithms. This concern is particularly important for a broad set of applications, such as real-time video processing, motion analysis, etc. We have implemented several algorithms such as geometrical transformations, noise removal, Gaussian smoothing and edge detection. These algorithms have been applied to high resolution and medical images. We propose a development scheme based upon CUDA for parallel constructs and OpenGL for visualization, which reduces data transfer between device and host memories. Experiments have been conducted on several platforms, e.g. the GeForce 8600 and GTX 280 GPUs, showing a global speedup ranging from 20 to 60 by comparison with a standard CPU implementation.
References:
• S.A. Mahmoudi, M.S. Haidar, N. Ihaddadene, Ch. Djeraba. Abnormal event Detection in real time video. 1st Int. Workshop on Multimedia Interaction Analysis of Users in a Controlled Environment, Oct. 2008, Chania, Greece.
• Mohammed Benjelloun, Saïd Mahmoudi, "Spine Localization in X-ray Images Using Interest Point Detection". Journal of Digital Imaging, Vol 22, No 3 (June), 2009: pp 309-318
• Mohammed Benjelloun, Saïd Mahmoudi, "X-ray Image Segmentation for Vertebral Mobility Analysis". International Journal of Computer Assisted Radiology and Surgery, Volume 2, Number 6, April 2008, pages 371-383.
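To make concrete why filters like Gaussian smoothing map so well to GPUs, here is a minimal CPU-side sketch (plain Python, not the authors' CUDA code): every output pixel depends only on a fixed input neighbourhood, so all pixels can be computed independently in parallel.

```python
def gaussian3x3(img):
    # 3x3 Gaussian kernel (integer weights summing to 16). Each output
    # pixel reads only a fixed 3x3 input neighbourhood, so the outer two
    # loops are embarrassingly parallel -- exactly what a CUDA kernel
    # exploits with one thread per pixel. Border pixels are left at 0
    # for brevity.
    k = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += k[dy + 1][dx + 1] * img[y + dy][x + dx]
            out[y][x] = acc / 16.0
    return out
```

On a GPU the same per-pixel body becomes the kernel, and the OpenGL interop the abstract mentions lets the result stay in device memory for display instead of being copied back to the host.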
Parallel PCG Algorithms for Voxel FEM Systems
Svetozar Margenov Institute for Parallel Processing, BAS [email protected]
Keywords: large-scale scientific computing, parallel computing, IBM Blue Gene/P
Abstract: The presented study is motivated by the development of parallel numerical methods, algorithms, and software tools for micro finite element simulation of human bones. The voxel representation of the bone micro structure is obtained from a high resolution computer tomography image. The reference volume element has a strongly heterogeneous micro structure composed of solid and fluid phases. Crouzeix-Raviart and Rannacher-Turek nonconforming finite elements are used to discretize the arising strongly heterogeneous elasticity problems.
The efficiency of codes incorporating BoomerAMG and parallel MIC(0) will be discussed. The size of the considered large-scale problems goes beyond a billion degrees of freedom. The presented parallel numerical tests include results on the IBM Blue Gene/P machine of the Bulgarian Supercomputing Center. The ongoing Bulgarian NSF project Center of Excellence on Supercomputing Applications will be introduced.
References:
• P. Arbenz, S. Margenov, Y. Vutov, Parallel MIC(0) preconditioning of 3D elliptic problems discretized by Rannacher-Turek finite elements, Computers and Mathematics with Applications, 55 (10), 2197-2211
• S. Margenov, Y. Vutov, Parallel MIC(0) Preconditioning for Numerical Upscaling of Anisotropic Linear ElasticMaterials, Large-Scale Scientific Computing, Springer LNCS (to appear)
• N. Kosturski, S. Margenov, Numerical Homogenization of Bone Microstructure, Large-Scale Scientific Com-puting, Springer LNCS (to appear)
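For readers unfamiliar with the PCG iteration underlying these solvers, a minimal dense sketch follows. A Jacobi (diagonal) preconditioner stands in here for MIC(0) or BoomerAMG, which are far more effective on real FEM systems; the parallel versions distribute the matrix-vector product and dot products across processes.

```python
def pcg(A, b, M_inv_diag, tol=1e-10, max_it=200):
    # Preconditioned conjugate gradient for a dense SPD matrix A (list of
    # rows). M_inv_diag holds the inverse diagonal of the preconditioner.
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda v: [dot(row, v) for row in A]
    x = [0.0] * n
    r = b[:]                                   # residual r = b - A*x0, x0 = 0
    z = [mi * ri for mi, ri in zip(M_inv_diag, r)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_it):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [mi * ri for mi, ri in zip(M_inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

For the billion-unknown voxel systems in the abstract, A is of course never stored densely; the same iteration is applied matrix-free or in sparse distributed form.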
Supercomputation-Algorithms
Ester Martin Garzon University of Almeria [email protected]
Keywords: HPC; image processing; 3D tomography; global optimization; multimedia.
Abstract: The research group Supercomputation-Algorithms focuses its interest on a set of problems that require intensive computation and that come from different scientific and technological fields. Our lines are: (1) development of High Performance Computing techniques and (2) its application to the fields of (a) image processing and tomographic reconstruction, (b) global optimization and (c) multimedia. In the HPC area, our goals are the design of new dynamic load balancing approaches focused on heterogeneous or non-dedicated clusters and the tuning of the sparse matrix-vector product to current parallel architectures. For image processing and tomographic reconstruction, our lines are: improvement of the algorithms for noise reduction and the methodologies for 3D tomographic reconstruction; development of a new algorithm for segmentation embedded in our noise reduction method; and parallelization of 3D reconstruction methods for computer clusters and supercomputers. In global optimization, the problems to be tackled are: the study of the effectiveness and efficiency of meta-heuristic methods; development of techniques for the reduction of the complexity and the search space in Branch and Bound algorithms; and the design of efficient high performance metaheuristic algorithms to solve engineering and industrial problems. In multimedia, our research is intended to improve the characteristics of a fully scalable video compression system for P2P multicast networks and mobile environments.
References:
• http://www.ace.ual.es/Investigacion/public.html
Modeling and Performance Analysis of Scheduling Algorithms in Grid Systems
Zafeirios Papazachos Aristotle University of Thessaloniki [email protected]
Keywords: Scheduling policies; Distributed systems; Performance; Modeling and Simulation; Grid and cluster computing
Abstract: Distributed systems have emerged as a cost-effective and scalable solution to the increasing demand for computing resources. A distributed system consists of several resources which are connected via a communication network. In order to maximize the efficiency of such a system, a proper scheduling algorithm is necessary. The scheduling algorithm is responsible for allocating the available system resources to the existing jobs. An effective way of examining an algorithm's performance is by implementing a simulation model and then running simulation experiments. Topics of interest that are currently examined by means of simulation are: Quality of Service (QoS) in grid and cluster systems, scheduling policies for heterogeneous systems, and performance analysis of distributed systems.
References:
• Z. Papazachos, H. Karatza, "Performance Evaluation of Gang Scheduling in a Two-Cluster System with Migrations", Proceedings of the 8th International Workshop on Performance Modeling, Evaluation, and Optimization of Ubiquitous Computing and Network Systems (PMEO-UCNS 2009) in conjunction with the IEEE International Parallel & Distributed Processing Symposium (IPDPS), Rome, Italy, May 25-29, 2009, pp. 1-8.
• Z. Papazachos and H. Karatza, "The Impact of Task Service Time Variability on Gang Scheduling Performance in a Two-Cluster System", Simulation Modelling Practice and Theory, Elsevier, Volume 17, Issue 7, August 2009, pp. 1276-1289.
Software Agents as Resource Brokers in Grid
Marcin Paprzycki SRI PAS [email protected]
Keywords:
Abstract: It is a widely held belief that software agents will become the next revolution in information technology. One of the areas where they are expected to play an important role is the Grid. Claims to this effect, as well as research results, can be found in the work of B. diMartino, O. Rana, B. Prasad and others. In our work we have taken a different approach from these researchers and proposed that agent teams should be utilized for resource brokering and management. In this way, software agents become the "brain" for the Grid "brawn". The aim of the presentation will be to outline the assumptions underlying our work, introduce our system, and discuss how agents representing users interact with the agent teams, and how the high-level intelligent infrastructure can utilize the actual Grid middleware to execute a job.
Parallel computing in Romania
Dana Petcu West University of Timisoara [email protected]
Keywords: Cluster computing; Applications in science and engineering
Abstract: The presentation will refer to the activities of the Romanian teams working with parallel computing theories or parallel codes and their applications in engineering and science. A particular emphasis will be placed on the results of the recent national collaborative projects as well as those of the team from West University of Timisoara [1-9].
References:
• G. Macariu, D. Petcu: Parallel Multiple Polynomial Quadratic Sieve on Multi-Core Architectures. SYNASC2007: 59-65
• N. Somosi, D. Petcu: A Parallel Algorithm for Rendering Huge Terrain Surfaces. SYNASC 2006: 274-278
• D. Petcu: Parallel Jess. ISPDC 2005: 307-316
• D. Petcu: Adapting a Partitioning-Based Heuristic Load-Balancing Algorithm to Heterogeneous ComputingEnvironments. SYNASC 2005: 170-173
• C. Bonchis, G. Ciobanu, C. Izbasa, D. Petcu: A Web-Based P Systems Simulator and Its Parallelization. UC2005: 58-69
• D. Petcu: Parallel Explicit State Reachability Analysis and State Space Construction. ISPDC 2003: 207-214
• D. Zaharie, D. Petcu: Adaptive Pareto Differential Evolution and Its Parallelization. PPA
High Performance Computing in Remote Sensing Applications
Antonio Plaza University of Extremadura [email protected]
Keywords: High performance computing, cluster computing, heterogeneous computing, FPGAs, GPUs, applications, remote sensing
Abstract: Remote sensing applications for Earth and planetary observation require computationally effective processing techniques in order to facilitate exploitation of high dimensional data sets in several contexts, including environmental modeling and assessment, risk/hazard prevention and response, defense/security, and monitoring of human-induced threats such as oil spills and other types of chemical contamination. With the aim of providing an overview of recent developments and new trends in the design of parallel and distributed systems for hyperspectral image analysis, this paper discusses and inter-compares different strategies for efficiently implementing a standard hyperspectral image processing chain, including heterogeneous networks of workstations, field programmable gate arrays (FPGAs) or graphics processing units (GPUs). Combined, these parts deliver a snapshot of the state-of-the-art in those areas, and a thoughtful perspective on the potential and emerging challenges of adapting high performance computing systems to remote sensing problems.
OS Scheduling on Asymmetric Multicore Systems
Manuel Prieto-Matias Complutense University of Madrid [email protected]
Keywords: OS scheduling, Asymmetric Multicore
Abstract: Asymmetric multicore processors promise higher performance per watt than their symmetric counterparts, but to fully tap into their potential, the operating system must be aware of the asymmetry present in the platform. Previous research examining this issue has focused on mapping sequential applications to asymmetric cores in consideration of their Instruction Level Parallelism. The ArTeCS group has been working on this problem but addressing instead the mapping of parallel applications that exhibit variations in their amount of thread-level parallelism. We will also describe other research interests of the ArTeCS group within this network.
Dynamic Scheduling for Sparse Direct Solver on NUMA and Multicore Architectures
Pierre Ramet LaBRI - INRIA Bordeaux [email protected]
Keywords: Sparse Direct solver, NUMA, Multicore, Dynamic Scheduling
Abstract: Parallel sparse direct solvers are now able to solve efficiently real-life three-dimensional problems with several millions of equations. The PaStiX solver provides a hybrid MPI-thread implementation that is well suited for SMP nodes. This technique makes it possible to treat large 3D problems where the memory overhead due to communication buffers was a bottleneck to the use of direct solvers. We introduce a simple way to dynamically schedule an application based on a dependency tree, making it more suitable for NUMA or multi-core architectures.
References:
• see http://www.labri.fr/perso/ramet/
Parallel ODE-solvers on Multi-core Systems
Thomas Rauber University of Bayreuth [email protected]
Keywords: multi-core, numerical analysis, ODE solvers
Abstract: Ordinary differential equations (ODEs) arise in many application areas such as fluid flow, physics-based animation, mechanical systems, or chemical reactions. Due to the advent of multi-core technology, parallel resources are now widely available. The objective of this talk is to give an overview of parallel ODE solvers in the context of the hierarchical architecture of multi-core systems.
Heterogeneous multi-core computer architectures and dedicated processing structures for signal processing applications
Nuno Roma INESC-ID / IST TU Lisbon [email protected]
Keywords: Heterogeneous multi-core architectures, Video encoding, Biological sequences alignment (DNA and proteins)
Abstract: Two research projects ongoing in INESC-ID on multi-core computer architectures and dedicated processing structures for signal processing applications will be presented: (*) heterogeneous many-core processing structures for advanced video coding (H.264/AVC); (*) heterogeneous multi-core processing structures for computational biology acceleration.
The first project targets the development of a novel class of scalable and highly-efficient heterogeneous multi-core architectures for advanced video encoding using a Processor-In-Memory (PIM) approach. The architecture consists not only of programmable general purpose processors (GPPs), but also of dedicated and highly efficient accelerator units, interconnected with fast data communication buses and programmable interconnection switches, using efficient data distribution strategies to minimize the bandwidth requirements.
The second project targets the proposal of a new self-contained architecture of a heterogeneous parallel structure for generic biological sequences alignment (DNA and protein). It incorporates a general purpose processor (GPP) and multiple specialized structures dedicated to the computation of pairwise matching scores. The GPP allows the implementation of the non-regular parts of several SW tools that can be implemented by this structure, including not only the simple global/local alignment algorithms (Smith-Waterman, etc.), but also a broad set of packages based on heuristic strategies (FASTA, BLAST, etc.).
References:
• T. Almeida, N. Roma, "A parallel programming framework for multi-core DNA sequence alignment", submitted to Int. Workshop on Multi-Core Computing Systems (MuCoCoS'2010), Poland, Feb. 2010;
• T. Dias, N. Roma, L. Sousa, M. Ribeiro, "Reconfigurable architectures and processors for real-time video motion estimation", J. Real-Time Image Processing, Springer Berlin, vol. 2, n. 4, pp 191-205, Dec. 2007;
• T. Dias, S. Momcilovic, N. Roma, L. Sousa, "Adaptive Motion Estimation Processor for Autonomous Video Devices", EURASIP J. on Embedded Systems, Hindawi, n. 57234, pp. 1-10, May 2007.
Mapping and scheduling of Parallel Tasks for Multi-core Systems
Gudula Rünger Chemnitz University of Technology [email protected]
Keywords: Parallel tasks, mixed parallelism, scheduling, mapping
Abstract: Multi-core clusters provide a huge amount of computing resources and use a hierarchically structured interconnection network. In this talk, we consider hierarchically structured parallel tasks to improve application performance on multi-core clusters. In particular, we consider scheduling and mapping techniques for parallel tasks that take the architecture of the target systems into consideration. We evaluate the impact of scheduling and mapping for different application programs on different parallel machines.
References:
• Dümmler, J.; Rauber, T.; Rünger, G.: Mapping Algorithms for Multiprocessor Tasks on Multi-Core Clusters. In: Proc. of the 37th Int. Conf. on Parallel Processing (ICPP 2008), pp. 141-148. IEEE Computer Society, Portland, Oregon, USA, 2008.
• Dümmler, J.; Kunis, R.; Rünger, G.: Layer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints. In: Parallel Computing: Architectures, Algorithms and Applications: Proc. of the Int. Conf. ParCo 2007 (Advances in Parallel Computing, Vol. 15), pp. 321-328. IOS Press, Jülich/Aachen, Germany, 2007.
• Dümmler, J.; Rauber, T.; Rünger, G.: Scalable Computing with Parallel Tasks. To appear in: Proc. of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS09), co-
Real-World Distributed Computing with Ibis
Franck Seinstra Department of Computer Science, Vrije Universiteit,
Keywords: high-performance distributed computing; user transparency; platform-independence; middleware-independence; fault-tolerance; malleability; guaranteed connectivity; real-world, real-time, and off-line applications; "The Promise of the Grid"
Abstract: Ibis is an open source software framework, developed at the Vrije Universiteit, Amsterdam, that drastically simplifies the process of programming and deploying high-performance parallel and distributed (grid) applications. Ibis supports a range of programming models that yield efficient implementations, even on distributed sets of heterogeneous resources. Also, Ibis is specifically designed to run in hostile (grid) environments that are inherently dynamic and faulty, and that suffer from connectivity problems.
One of the main features of the Ibis system is that it allows multiple grid systems, clusters, clouds, mobile devices, and stand-alone machines to be applied concurrently and transparently from within a single application (even under real-time constraints). Ibis has been applied successfully in a number of real-world applications, and has won prizes in several international competitions, including the First IEEE International Scalable Computing Challenge (at CCGrid 2008) and the First International Data Analysis Challenge for Finding Supernovae (at IEEE Cluster/Grid 2008). Future research goals of the Ibis project include support for efficient and transparent use of multi-core systems and hardware accelerators (incl. GPUs, FPGAs, Cells, etcetera).
References:
• Ibis website: http://www.cs.vu.nl/ibis/
• H.E. Bal, N. Drost, R. Kemp, J. Maassen, R.V. van Nieuwpoort, C. van Reeuwijk, and F.J. Seinstra. "Ibis: Real-World Problem Solving using Real-World Grids". Proceedings of the 23rd International Parallel & Distributed Processing Symposium (IPDPS 2009) - Sixth High-Performance Grid Computing Workshop (HPGC 2009), Rome, Italy, May 2009.
• R. Kemp, N.O. Palmer, Th. Kielmann, F.J. Seinstra, N. Drost, J. Maassen and H.E. Bal, "eyeDentify: Multimedia Cyber Foraging from a Smartphone", IEEE International Symposium on Multimedia (ISM2009), San Diego, USA, 14-16 December 2009.
Parallel programming refinements for heterogeneous multi-core parallel systems
João Sobral Universidade do Minho [email protected]
Keywords: separation of concerns, heterogeneous multicore systems, parallel programming
Abstract: Programming by refinement advocates the development of programs by incrementally refining a high-level specification towards more platform-specific implementations. In this talk we present the concept of parallel programming refinement and several abstractions that can be used to generate efficient code for shared and distributed memory architectures and compositions of these systems. We particularly present the case of parallel sorting algorithms, where we could generate the most well-known parallel implementations by refining an abstract specification of a sorting algorithm.
References:
• R. Gonçalves, J. Sobral, Pluggable Parallelisation, 18th ACM International Symposium on High Performance Distributed Computing (HPDC'09), Munich, June 2009.
• J. Sobral, Incrementally Developing Parallel Applications with AspectJ, 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS'06), Rhodes, Greece, April 2006.
Application-Oriented Scheduling in Multicluster Grids
Ozan Sonmez Delft University of Technology [email protected]
Keywords: Multicluster grids, co-allocation, cycle scavenging, job runtime predictions
Abstract: Different application types in grids, such as workflows, parallel applications, and bags-of-tasks, pose different requirements that should be taken into account both from a scheduling and a system point of view, in order to improve their execution performance. This talk mainly covers our research and experiences on supporting various application types in a real multicluster grid scheduler, named KOALA.
References:
• O. Sonmez, H. Mohamed, D.H.J. Epema, "On the Benefit of Processor Co-Allocation in Multicluster Grid Systems", IEEE Transactions on Parallel and Distributed Systems, 2009.
• O. Sonmez, N. Yigitbasi, A. Iosup, and D.H.J. Epema, "Trace-Based Evaluation of Job Runtime and Queue Wait Time Predictions in Grids", In the ACM/IEEE Int'l. Symp. on High Performance Distributed Computing (HPDC'09), Jun 11-13, 2009
• O. Sonmez, B. Grundeken, H. Mohamed, A. Iosup, and D.H.J. Epema, "Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems", In the IEEE International Symposium on Cluster Computing and the Grid (CCGrid'09), May 18-21, 2009
PCs (CPU+GPU) = Heterogeneous Systems
Leonel Sousa INESC-ID/IST, TU Lisbon [email protected]
Keywords: GPU, CUDA, MPI, CUBLAS, ATLAS
Abstract: Nowadays, commodity computers are complex heterogeneous systems that provide a huge amount of computational power. However, to take advantage of this power we have to orchestrate the use of processing units with different characteristics, such as general purpose multi-cores and GPUs. Moreover, these heterogeneous systems can be interconnected to form a cluster of heterogeneous nodes, and once again exploiting the available computational power raises the same type of problems, at a different level. A collaborative execution environment [1] is presented for exploiting data parallelism in a heterogeneous system composed of CPUs and GPUs, and an extension of CUDA is proposed for using it in clusters of message-passing systems (MPI-CUDA [2]), in order to take advantage of clusters of these types of heterogeneous nodes.
References:
• Aleksandar Ilic, Leonel Sousa, "Collaborative Execution Environment for Heterogeneous Parallel Systems",submitted to PDP’10.
• Shinichi Yamagiwa and Leonel Sousa, "CaravelaMPI: Message Passing Interface for Parallel GPU-based Applications", ISPDC'09
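A toy illustration of the kind of decision such a collaborative environment must make: partitioning a data-parallel workload across devices in proportion to their relative speeds. The numbers and the static strategy are hypothetical simplifications; the environment of [1] balances work dynamically at run time.

```python
def partition(total, speeds):
    # Split `total` work items across devices proportionally to their
    # relative speeds (e.g. benchmarked throughput of each CPU or GPU).
    # Integer truncation leaves a remainder, which is assigned to the
    # last device so that every item is covered exactly once.
    s = sum(speeds)
    sizes = [total * sp // s for sp in speeds]
    sizes[-1] += total - sum(sizes)
    return sizes

# Hypothetical example: a CPU rated 100 and a GPU rated 400 share
# 1000 data items, so the GPU receives four times the CPU's share.
sizes = partition(1000, [100, 400])
```

Each chunk can then be processed by the corresponding device (or, in the MPI-CUDA setting, by the corresponding node) and the partial results gathered afterwards.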
StarPU, a runtime system for accelerator-based multicore machines
Samuel Thibault University Bordeaux 1 [email protected]
Keywords: Multicore, Accelerator, GPU, Cell, Scheduling, DSM
Abstract: StarPU is a scheduling framework for heterogeneous architectures which uses all available computing units in a uniform way. We achieve high performance by using a powerful data-caching and data-prefetching engine and by using autotuned performance prediction models which make it easy to implement advanced scheduling policies.
References:
• http://runtime.bordeaux.inria.fr/StarPU/
MUMPS: A multifrontal massively parallel sparse direct solver
Bora Ucar LIP/ENS Lyon [email protected]
Keywords: Parallel computing; sparse direct solvers; linear systems of equations
Abstract: Improving the behaviour of parallel direct methods on modern platforms is critical to solve large linear systems of equations arising in many scientific and engineering applications. Considering the complexity of novel parallel architectures, including massively parallel and multi-core systems, there are challenging research issues to address. After a presentation of our parallel sparse direct solver (MUMPS), we will discuss some of those issues.
References:
• P. R. Amestoy, A. Guermouche, J.-Y. L’Excellent and S. Pralet. Hybrid scheduling for the parallel solution oflinear systems, Parallel Computing 32 (2): 136-156, 2006.
A Fast Minimal Storage Factorization of Symmetric Matrices
Jerzy Wasniewski Danish Technical University [email protected]
Keywords: Numerical analysis, Linear Algebra, Symmetric, triangular, and Hermitian matrices, Cholesky algorithm, diagonal pivoting method.
Abstract: We describe new data formats for storing triangular, symmetric, and Hermitian matrices. The standard two-dimensional arrays of Fortran and C (also known as full format) that are used to store triangular, symmetric, and Hermitian matrices waste nearly half the storage space but provide high performance via the use of level 3 BLAS. Standard packed format arrays fully utilize storage (array space) but provide low performance, as there are no level 3 packed BLAS. We combine the good features of packed and full storage using the new formats to obtain high performance using level 3 BLAS. Also, these new formats require exactly the same minimal storage as LAPACK packed format. These new formats even outperform the LAPACK full format on some computer platforms.
References:
• LAPACK Note 199. F.G. Gustavson, J. Wasniewski, J. Langou and J.J. Dongarra, "Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion". UT-CS-08-614, April 28, 2008. Accepted for TOMS/ACM.
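To make the storage figures concrete, here is a small sketch of the classical packed layout the abstract contrasts against (assuming LAPACK's column-major lower-triangular packed convention; the RFP format of the talk is more involved): packed storage needs only n(n+1)/2 entries instead of the n*n of full storage.

```python
def packed_index(i, j, n):
    # Index of element A[i][j] (with i >= j) of an n x n lower-triangular
    # matrix in column-major packed storage: columns are stored one after
    # another, column j starting after the j previously stored columns of
    # lengths n, n-1, ..., n-j+1.
    assert i >= j
    return i + j * (2 * n - j - 1) // 2

n = 4
full_entries = n * n              # 16 entries in full (2D array) storage
packed_entries = n * (n + 1) // 2 # 10 entries in packed storage
```

The mapping is a bijection onto 0 .. n(n+1)/2 - 1, which is exactly the "minimal storage" property; the drawback, as the abstract notes, is that this layout does not expose the contiguous blocks that level 3 BLAS need.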
Parallel adaptive finite element package with dynamic load balancing for 3D thermomechanical problems
Roman Wyrzykowski Czestochowa University of Technology [email protected]
Keywords: 3D FEM, adaptive methods, parallel algorithms/applications, dynamic load balancing, multigrid algorithms, PC-clusters, multicore computing, GPU
Abstract: Numerical modeling of 3D thermomechanical problems is a complex and time-consuming issue. Adaptive techniques are powerful tools to perform such modeling efficiently using FEM analysis. During adaptation, computational workloads change unpredictably at runtime, therefore dynamic load balancing is required.
This paper presents the parallel adaptive FEM package NuscaS with dynamic load balancing for 3D unstructured meshes. This object-oriented package for parallel FEM modeling is developed at Czestochowa University of Technology to investigate different thermomechanical phenomena. NuscaS uses the message-passing paradigm, and is suitable for distributed memory parallel computers such as PC-clusters. The implementation of adaptation in NuscaS is based on using the ParMETIS tool for mesh repartitioning and load balancing.
Multigrid methods are among the fastest numerical algorithms for solving large sparse systems of linear equations. Multigrid is also a good preconditioning algorithm for Krylov iterative solvers. That is why this paper also presents parallelization aspects of the geometric multigrid algorithms developed for the NuscaS package. A parallel conjugate gradient method with multigrid preconditioning is used for solving FEM problems with NuscaS on PC-clusters.
References:
• R. Wyrzykowski, T. Olas, N. Sczygiol, Object-Oriented Approach to Finite Element Modeling on Clusters,Lecture Notes in Computer Science, 1947 (2001), 250-257.
• T. Olas, R. Wyrzykowski, K. Karczewski, A. Tomas, Performance Modeling of Parallel FEM Computations onClusters, Lecture Notes in Computer Science, 3019 (2004), 189-200.
• R. Wyrzykowski, N. Meyer, T. Olas, L. Kuczynski, B. Ludwiczak, C. Czaplewski, S. Oldziej, Meta-computations on the CLUSTERIX Grid, Lecture Notes in Computer Science, 4699 (2007), 489-500.
Resource Management and Scheduling for Clouds and Grids
Ramin Yahyapour TU Dortmund [email protected]
Keywords: Cloud Computing, Scheduling
Abstract: The dynamic management of Cloud environments is a complex task for resource providers as well as service consumers. Considering that in a large-scale system tens of thousands of services and tens of thousands of cores need to be managed, manual management is not suitable and automatic mechanisms need to be established. The optimization goals vary, as performance is not necessarily the most important criterion. New criteria such as power consumption and fault tolerance are becoming more important. SLAs are considered a major element in future management systems.
Performance of resource allocation policies in grid systems
Stylianos Zikos Aristotle University of Thessaloniki, Department of Informatics
Keywords: Distributed systems, grids, site selection, resource allocation policies
Abstract: Efficient job scheduling in large-scale distributed systems, such as grids, is challenging due to the large number of distributed and heterogeneous resources. Job scheduling takes place at multiple levels and is addressed by different entities at the grid and local levels. Grid schedulers can utilize dynamic site load information for site selection, while local schedulers apply resource allocation policies. Keeping job response times low is a primary performance objective. Moreover, other important parameters need to be taken into account, such as the communication traffic among schedulers, the fair utilization of available resources, and the energy consumption. Optimizing these parameters is not an easy task, especially in an environment where job service demands may be highly variable and unknown a priori.
References:
• S. Zikos and H.D. Karatza, “Resource Allocation Strategies in a 2-level Hierarchical Grid System”, Proceedingsof the 41st Annual Simulation Symposium (ANSS), IEEE Computer Society Press, SCS, April 13-16, 2008,Ottawa, Canada, pp. 157-174.
• S. Zikos and H. Karatza, “The Impact of Service Demand Variability on Resource Allocation Strategies in aGrid System”, ACM Transactions on Modeling and Computer Simulation (TOMACS), accepted, to appear.
• S. Zikos and H. Karatza, “Communication Cost Effective Scheduling Policies of Nonclairvoyant Jobs with LoadBalancing in a Grid”, The Journal of Systems and Software, Elsevier, accepted, to appear.
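A grid-level scheduler that uses dynamic site load information for site selection, as described in the abstract, can be sketched as follows. This is a hypothetical minimal policy (least-queued site, random tie-breaking) for illustration only; the site records and field names are invented and the papers above study richer policies.

```python
import random

def select_site(sites):
    """Grid-level site selection: pick the site with the fewest
    queued jobs, breaking ties at random.  Illustrates a dynamic,
    load-based policy; not the exact policy from the cited work."""
    min_load = min(s["queued"] for s in sites)
    candidates = [s for s in sites if s["queued"] == min_load]
    return random.choice(candidates)

# Hypothetical snapshot of per-site queue lengths
sites = [
    {"name": "siteA", "queued": 5},
    {"name": "siteB", "queued": 2},
    {"name": "siteC", "queued": 2},
]
chosen = select_site(sites)
```

The tie-breaking step matters in practice: always choosing the first least-loaded site can concentrate traffic, while randomization spreads jobs, and communication cost, across equally loaded sites.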
High-Performance Computing in Global Optimization and Optimization-Based Visualization
Julius Zilinskas Institute of Mathematics and Informatics [email protected]
Keywords: parallel algorithms, global optimization, visualization of multidimensional data
Abstract: Many problems in engineering, physics, economics, and other fields reduce to global minimization with many local minimizers. Global optimization algorithms are computationally intensive, so high-performance computing is important. Multidimensional scaling is a technique for the visualization of multidimensional data, whose essential part is the optimization of a function possessing many adverse properties, including multidimensionality, multimodality, and non-differentiability. In this talk, optimization and visualization algorithms relying on linear algebra, and their parallelization, are discussed.
References:
• R. Ciegis, D. Henty, B. Kågström, J. Žilinskas (Eds.)(2009) Parallel Scientific Computing and Optimization.Springer, ISBN 978-0-387-09706-0. doi:10.1007/978-0-387-09707-7.
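The objective at the heart of multidimensional scaling is a stress function: the mismatch between target dissimilarities and distances in the low-dimensional embedding. The sketch below shows the raw (unnormalized) stress as one common variant; it is an illustration of the kind of multimodal objective the talk discusses, not code from the cited work.

```python
import numpy as np

def stress(X, D):
    """Raw MDS stress: sum over point pairs of the squared difference
    between the target dissimilarity D[i, j] and the embedding
    distance ||X[i] - X[j]||.  This objective is multimodal and
    non-differentiable where points coincide, which is why global
    optimization methods are needed to minimize it."""
    n = len(D)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = np.linalg.norm(X[i] - X[j])
            s += (D[i, j] - d_ij) ** 2
    return s

# Three points on a line embed their mutual distances exactly,
# so the stress of this embedding is zero.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
X = np.array([[0.0], [1.0], [2.0]])
val = stress(X, D)
```

Because every evaluation is O(n^2) in the number of points and the minimizer must be sought globally, the computation parallelizes naturally over point pairs and over independent optimization runs.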
3 List of Participants
Name | Affiliation | E-mail | Center of interests
Francisco Almeida | La Laguna University | [email protected] | MultiCore, Cluster, Grid, Library, Scheduling, Heterogeneous computing, GPU
Mark Baker | SSE, University of Reading | [email protected] | MultiCore, Cluster, Grid, Library, GPU
Ranieri Baraglia | CNR-ISTI | [email protected] | Applications, Heterogeneous computing, GPU
Jorge Barbosa | University of Porto | [email protected] | MultiCore, Scheduling, Heterogeneous computing
Leszek Borzemski | Wroclaw University of Technology, Wroclaw, Poland | | MultiCore, Cluster, Grid, Applications, Library, Scheduling, Heterogeneous computing, GPU
Thomas Brady | University College Dublin | [email protected] | Grid, Applications, Library, Heterogeneous computing
Francois Broquedis | INRIA - LaBRI - University of Bordeaux | | MultiCore
Jose Carlos Cabaleiro | Univ. Santiago de Compostela | [email protected] | MultiCore, Cluster, Grid, Applications
Gabriele Capannini | CNR-ISTI | [email protected] | Applications, Heterogeneous computing, GPU
Uros Cibej | University of Ljubljana, Slovenia | [email protected] | Grid, Scheduling
Andrea Clematis | IMATI - CNR | [email protected] | MultiCore, Grid, Applications
Sylvain Contassot-Vivier | University Henri Poincare - Nancy 1 | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Heterogeneous computing, GPU
Marco Danelutto | Dept. Computer Science - Univ. Pisa | [email protected] | MultiCore, Cluster, Grid, Heterogeneous computing
Ramon Doallo | University of A Coruña | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Library, GPU, Other
Katerina Doka | National Technical University of Athens | | Grid, Applications, Heterogeneous computing
Anne C. Elster | NTNU (Norwegian Univ. of Science and Technology) | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Library, Scheduling, Heterogeneous computing, GPU
Francisco F. Rivera | Univ. Santiago de Compostela | [email protected] | MultiCore, Cluster, Grid, Applications
Ivan Georgiev | Institute of Mathematics and Informatics, BAS | [email protected] | Applications, Numerical Analysis, Library, Heterogeneous computing
Arnaud Giersch | LIFC, Univ. Franche-Comté | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Domingo Gimenez | University of Murcia | [email protected] | Applications, Scheduling, Heterogeneous computing
Magne Haveraaen | Universitetet i Bergen | [email protected] | MultiCore, Numerical Analysis, Library, Scheduling, GPU
Alexandru Herisanu | University Politehnica of Bucharest | [email protected] | MultiCore, Cluster, Grid, Applications, Heterogeneous computing
Emmanuel Jeannot | INRIA | [email protected] | MultiCore, Cluster, Grid, Library, Scheduling
Alexey Kalinov | Cadence Design Systems | [email protected] | Cluster, Applications, Heterogeneous computing, GPU
Konstantinos Karaoglanoglou | Aristotle University of Thessaloniki, Department of Informatics | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Eleni Karatza | Aristotle University of Thessaloniki, Department of Informatics | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Daniel Katz | University of Chicago | [email protected] | MultiCore, Cluster, Grid, Applications, Scheduling, Heterogeneous computing
Philipp Kegel | University of Munster | [email protected] | MultiCore, GPU
Attila Kertesz | MTA SZTAKI | [email protected] | Grid, Scheduling
Pierre Kuonen | University of Applied Sciences of Western Switzerland, Fribourg | [email protected] | MultiCore, Grid, Library, GPU
Krzysztof Kurowski | Institute of Bioorganic Chemistry - Poznan Supercomputing and Networking Center | | MultiCore, Cluster, Grid, Applications, Library, Scheduling, Heterogeneous computing, GPU
Jan Kwiatkowski | Wroclaw University of Technology | [email protected] | MultiCore, Cluster, Grid, Heterogeneous computing, GPU
Alexey Lastovetsky | University College Dublin | [email protected] | MultiCore, Cluster, Grid, Library, Scheduling, Heterogeneous computing
Fotis Loukos | Aristotle University of Thessaloniki, Department of Informatics | [email protected] | Applications, Library, Scheduling, Heterogeneous computing
Sidi Ahmed Mahmoudi | University of Mons | [email protected] | Applications, GPU
Pierre Manneback | University of Mons | [email protected] |
Svetozar Margenov | Institute for Parallel Processing, BAS | | Cluster, Applications, Numerical Analysis, Heterogeneous computing, Other
Ester Martin Garzon | University of Almeria | [email protected] | MultiCore, Cluster, Applications, Heterogeneous computing, GPU
José Carlos Mouriño Gallego | Application Senior Technician / CESGA | [email protected] | MultiCore, Grid, Applications, Library, Scheduling, Heterogeneous computing, GPU
Zafeirios Papazachos | Aristotle University of Thessaloniki | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Marcin Paprzycki | SRI PAS | [email protected] | Grid, Applications, Library, Scheduling, Heterogeneous computing, Other
Marcelo Pasin | University of Lisbon | [email protected] | Cluster, Grid
Dana Petcu | West University of Timisoara | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Scheduling
Antonio Plaza | University of Extremadura | [email protected] | Cluster, Applications, Heterogeneous computing, GPU
Peter Popov | IPP-BAS | [email protected] | Numerical Analysis, Library, Heterogeneous computing
Manuel Prieto-Matias | Complutense University of Madrid | [email protected] | MultiCore, Applications, Scheduling, Heterogeneous computing, GPU
Pierre Ramet | LaBRI - INRIA Bordeaux | [email protected] | MultiCore, Numerical Analysis, Library, Scheduling, GPU
Thomas Rauber | University Bayreuth | [email protected] | MultiCore, Cluster, Numerical Analysis, Scheduling, Heterogeneous computing
Wolfgang Rehm | University of Technology Chemnitz (CUT) | [email protected] | Cluster, Library, Heterogeneous computing
Nuno Roma | INESC-ID / IST TU Lisbon | [email protected] | MultiCore, Applications, Heterogeneous computing
Gudula Rünger | Chemnitz University of Technology | [email protected] | MultiCore, Cluster, Applications, Numerical Analysis, Scheduling
Franck Seinstra | Department of Computer Science, Vrije Universiteit, Amsterdam | [email protected] | MultiCore, Cluster, Grid, Applications, Library, GPU, Other
João Sobral | Universidade do Minho | [email protected] | MultiCore, Cluster, Library, Heterogeneous computing
Ozan Sonmez | Delft University of Technology | [email protected] | Cluster, Grid, Applications, Scheduling
Leonel Sousa | INESC-ID / IST, TU Lisbon | [email protected] | Cluster, Scheduling, Heterogeneous computing, GPU
Frederic Suter | CC IN2P3 / CNRS | [email protected] | Grid, Scheduling, Heterogeneous computing
Fabricio Sylva | University of Lisbon, Department of Informatics | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Guillermo L. Taboada | University of A Coruña, Spain | [email protected] | MultiCore, Cluster, Library
Samuel Thibault | University Bordeaux 1 | [email protected] | MultiCore, Library, Scheduling, Heterogeneous computing, GPU
Bora Ucar | LIP/ENS Lyon | [email protected] | Numerical Analysis, Scheduling, Heterogeneous computing
Jerzy Wasniewski | Danish Technical University | [email protected] | MultiCore, Applications, Numerical Analysis, Library, Other
Roman Wyrzykowski | Czestochowa University of Technology | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Library, GPU, Other
Ramin Yahyapour | TU Dortmund | [email protected] | Cluster, Grid, Scheduling
Stylianos Zikos | Aristotle University of Thessaloniki, Department of Informatics | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Julius Zilinskas | Institute of Mathematics and Informatics | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Scheduling, Heterogeneous computing