Complex HPC meeting
Mark Baker, Anne C. Elster, Emmanuel Jeannot, Antonio Plaza, Leonel Sousa, Frédéric Suter
October 19-20, 2009, Lisbon, Portugal
1 Program

Schedule

Monday, October 19 (parallel sessions in Rooms 2.1, 2.2, 2.3 and 1.1)
9h-9h45      Introduction
9h55-10h55   Sessions
10h55-11h25  Break
11h25-12h45  Sessions
12h45-14h15  Lunch
14h15-15h55  Sessions
15h55-16h25  Break
16h25-18h05  Sessions
Evening      Social Event (Restaurant TBA)

Tuesday, October 20
9h-11h       Sessions
11h-11h30    Break
11h30-12h00  Working group discussions
12h15-12h45  Wrap-up
12h45-14h    Lunch and free time for discussion
14h-16h      MC Meeting
Speakers, talk titles, and keywords:

Anne C. Elster: GPU Computing and Numerical Algorithms for Complex HPC
  Keywords: GPU Computing and Numerical Algorithms for Complex HPC
Francisco F. Rivera: Research activities of the CA group (Univ. Santiago de Compostela)
Domingo Gimenez: Parallel Routines Modelling and Applications at the Parallel Computing Group of the University of Murcia
  Keywords: parallel routines modeling, heterogeneous computing, scheduling, parallel computing applications
Alexandru Herisanu: HPC Infrastructure and Applications in CS@UPB
  Keywords: Cluster computing, HPC, applications, Grid Computing
Alexey Lastovetsky: High performance heterogeneous computing in UCD
  Keywords: high performance heterogeneous computing
Dana Petcu: Parallel computing in Romania
  Keywords: Cluster computing; Applications in science and engineering
Leonel Sousa: PCs (CPU+GPU) = Heterogeneous Systems
  Keywords: GPU, CUDA, MPI, CUBLAS, ATLAS
Ramon Doallo: Research activities of the CA group (Univ. A Coruña)
  Keywords: HPC support tools - Programmability - Parallel libraries - Graphics
Mark Baker: Multi-core, Clouds, low-level infrastructure, and green computing
  Keywords: Multi-core, clouds, low-level infrastructure, green computing
Uros Cibej: Challenges of data-intensive computation in heterogeneous environments
  Keywords: data-intensive computation, data replication, scheduling
Andrea Clematis: Issues for multi level parallel systems exploitation in complex applications
  Keywords: multilevel parallelism, grid, applications
Katerina Doka: Research Activities of the PDSG Group
  Keywords: Grid, Cloud, P2P, Data Management
Daniel Katz: Science on the TeraGrid
  Keywords: large-scale computing, distributed applications
Ester Martin Garzon: The research group Supercomputation-Algorithms
  Keywords: high performance computing; image processing; tomographic reconstruction; global optimization; multimedia
Svetozar Margenov: Parallel PCG Algorithms for Voxel FEM Systems
  Keywords: large-scale scientific computing, parallel computing, IBM Blue Gene/P
Pierre Ramet: Dynamic Scheduling for Sparse Direct Solver on NUMA and Multicore Architectures
  Keywords: Sparse Direct solver, NUMA, Multicore, Dynamic Scheduling
Bora Ucar: MUMPS: A multifrontal massively parallel sparse direct solver
  Keywords: Parallel computing; sparse direct solvers; linear systems of equations
Jerzy Wasniewski: A Fast Minimal Storage Factorization of Symmetric Matrices
  Keywords: Numerical analysis, Linear Algebra, Symmetric, triangular, and Hermitian matrices, Cholesky algorithm, diagonal pivoting method
Roman Wyrzykowski: Parallel adaptive finite element package with dynamic load balancing for 3D thermomechanical problems
  Keywords: 3D FEM, adaptive methods, parallel algorithms/applications, dynamic load balancing, multigrid algorithms, PC-clusters, multicore computing, GPU
Magne Haveraaen: A Hardware Independent Parallel Programming Model
Jorge Barbosa: Dynamic scheduling on heterogeneous machines
  Keywords: dynamic scheduling, parallel task, list scheduling, cluster computing
Eleni Karatza: Performance of Scheduling Strategies in Distributed Systems
  Keywords: Performance, Scheduling, Mapping, Distributed Systems, Simulation
Gudula Rünger: Mapping and scheduling of Parallel Tasks for Multi-core Systems
  Keywords: Parallel tasks, mixed parallelism, scheduling, mapping
Ozan Sonmez: Application-Oriented Scheduling in Multicluster grids
  Keywords: Multicluster grids, co-allocation, cycle scavenging, job runtime predictions
Konstantinos Karaoglanoglou: Discovering resources and mapping in large-scale distributed environments
  Keywords: Resource Discovery, Fault-Tolerance, Trust Issues, Distributed Systems, Grid and Cluster Computing
Pierre Kuonen: On-going research activities at the GRID & Ubiquitous Computing Group
  Keywords: Grid programming, resources management
Zafeirios Papazachos: Modeling and Performance Analysis of Scheduling Algorithms in Grid Systems
  Keywords: Scheduling policies; Distributed systems; Performance; Modeling and Simulation; Grid and cluster computing
Ramin Yahyapour: Resource Management and Scheduling for Clouds and Grids
  Keywords: Cloud Computing, Scheduling
Stylianos Zikos: Performance of resource allocation policies in grid systems
  Keywords: Distributed systems, grids, site selection, resource allocation policies
Ranieri Baraglia: Sorting using bitonic network with CUDA
  Keywords: GPU, Parallel programming, Bitonic sort
Sylvain Contassot-Vivier: PDE solver using asynchronous algorithms on a GPU cluster
  Keywords: Cluster computing, GPU computing, asynchronism, numerical computation
Jan Kwiatkowski: Automatic Program Parallelization
  Keywords: parallel processing, automatic program parallelization
Sidi Ahmed Mahmoudi: Parallel Image Processing on GPU with CUDA and OpenGL
  Keywords: GPU, CUDA, OpenGL, Image Processing, Medical Imaging
Samuel Thibault: StarPU, a runtime system for accelerator-based multicore machines
  Keywords: Multicore, Accelerator, GPU, Cell, Scheduling, DSM
Thomas Brady: SmartGridRPC: The New RPC Model of High Performance Grid Computing and its Implementation
  Keywords: Grid Computing, High Performance Computing, Scientific Computing, Grid Middleware, GridRPC, SmartGridRPC
Attila Kertesz: Grid Interoperability Solutions in Grid Resource Management
  Keywords: Grid Interoperability, Grid Resource Management, Grid Broker, Meta-Brokering
Fotis Loukos: Distributed computation in Peer-to-Peer networks
  Keywords: distributed computation, peer-to-peer
Marcin Paprzycki: Software Agents as Resource Brokers in Grid
Thomas Rauber: Parallel ODE-solvers on Multi-core Systems
  Keywords: multi-core, numerical analysis, ODE solvers
Franck Seinstra: Real-World Distributed Computing with Ibis
  Keywords: high-performance distributed computing; user transparency; platform-independence; middleware-independence; fault-tolerance; malleability; guaranteed connectivity; real-world, real-time, and off-line applications; "The Promise of the Grid"
Leszek Borzemski: MWING: A Scalable Experimental Framework for Monitoring and Measurement in Distributed Complex HPC Environments
  Keywords: Grid, cloud computing, communication system, heterogeneous computing, distributed system monitoring and measurement
Marco Danelutto: Autonomic management for efficient HPC
  Keywords: structured parallel computing, algorithmic skeletons, autonomic management, non-functional features, performance tuning
Alexey Kalinov: Challenges of physical verification on heterogeneous platform
  Keywords: EDA, physical verification, hierarchical processing
Philipp Kegel: Challenges and Approaches in Parallelizing Applications for Medical Imaging
  Keywords: Medical Imaging, PET, Threading Building Blocks, GPGPU
Antonio Plaza: High Performance Computing in Remote Sensing Applications
  Keywords: High performance computing, cluster computing, heterogeneous computing, FPGAs, GPUs, applications, r
Nuno Roma: Heterogeneous multi-core computer architectures and dedicated processing structures for signal processing applications
  Keywords: Heterogeneous multi-core architectures, Video encoding, Biological sequences alignment (DNA and protein)
Julius Zilinskas: High-Performance Computing in Global Optimization and Optimization-Based Visualization
  Keywords: parallel algorithms, global optimization, visualization of multidimensional data
Francois Broquedis: ForestGOMP, an efficient OpenMP runtime system for hierarchical architectures
  Keywords: OpenMP, multicore, runtime system, affinities, NUMA
Manuel Prieto-Matias: OS Scheduling on Asymmetric Multicore Systems
  Keywords: OS scheduling, Asymmetric Multicore
João Sobral: Parallel programming refinements for heterogeneous multi-core parallel systems
  Keywords: separation of concerns, heterogeneous multicore systems, parallel programming
Talks are grouped into the following sessions:
Monitoring/visualisation: 2 talks
Multicore: 3 talks
Applications: 5 talks
General talks (small scale): 7 talks
General talks (large scale): 7 talks
GPU: 5 talks
Middleware: 6 talks
Scheduling 1: 4 talks
Scheduling 2: 5 talks
Numerical analysis: 5 talks
2 List of Presentations
Multi-core, Clouds, low-level infrastructure, and green computing
Mark Baker, SSE, University of Reading, [email protected]

Keywords: Multi-core, clouds, low-level infrastructure, green computing
Abstract: New hardware and software technologies are emerging all the time, and there is a never-ending issue of what choices should be used to support both legacy and evolving applications and their algorithms. An example of one of these issues is the use of multi-core processors. The producers of these processors seem to think that a shared-memory model is adequate, but in reality the HPC market-place will want to use large systems of multi-core processors, where there will be a need for intra- and inter-communication between the processors and cores. This type of programming can be handled using MPI/OpenMP/threads, but in reality a programming paradigm that encompasses this type of architecture is needed. Indeed, the way Intel, AMD, IBM Cell, and GPU processors are designed means that optimisation of code on these systems will take a lot of time and effort. Another area that is changing rapidly is the move away from Grid computing towards Clouds and Virtualisation, which are innovative but potentially do not meet the needs of some HPC users. In addition, the need for Green IT computing that optimises application performance and potentially saves power is a very important aspect of computing in the future.
Sorting using bitonic network with CUDA
Ranieri Baraglia, CNR-ISTI, [email protected]

Keywords: GPU, Parallel programming, Bitonic sort
Abstract: After a short description of the ISTI's HPCLab and a highlight of the research activities conducted, we present a fast sorting algorithm implementing an efficient bitonic sorting network on graphics processors. Sorting is a fundamental and universal problem in computer science. Even if sorting has been extensively addressed by many research works, it still remains an interesting challenge to make it faster by exploiting novel technologies. In this light, the presentation shows how to use graphics processors as coprocessors to speed up sorting while allowing the CPU to perform other tasks. Our new algorithm exploits a memory-efficient data access pattern maintaining the minimum number of accesses to off-chip memory. We introduce an efficient instruction dispatch mechanism to improve the overall sorting performance.
References:
• K. E. Batcher. Sorting networks and their applications. In AFIPS '68 (Spring): Proceedings of the April 30-May 2, 1968, Spring Joint Computer Conference, pages 307-314, New York, NY, USA, 1968. ACM.
• M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, Washington, DC, USA, 1999. IEEE Computer Society.
• N. K. Govindaraju, J. Gray, R. Kumar, and D. Manocha. GPUTeraSort: High performance graphics coprocessor sorting for large database management. In ACM SIGMOD International Conference on Management of Data, Chicago, United States, June 2006.
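What makes a bitonic network attractive on GPUs is that its compare-exchange pattern is fixed and data-independent, so every stage can run as a grid of independent threads. A minimal CPU-side sketch of the network (an illustration of the classic i-j-k formulation, not the authors' CUDA implementation):

```c
/* Minimal CPU sketch of a bitonic sorting network (illustration only;
 * a CUDA version would run each (k, j) stage as a kernel, one thread
 * per compare-exchange).  n must be a power of two. */
void bitonic_sort(int *a, int n) {
    for (int k = 2; k <= n; k <<= 1)          /* size of bitonic subsequences */
        for (int j = k >> 1; j > 0; j >>= 1)  /* compare distance this stage  */
            for (int i = 0; i < n; i++) {
                int p = i ^ j;                /* partner element              */
                if (p > i) {
                    int up = (i & k) == 0;    /* sort direction of this block */
                    if ((a[i] > a[p]) == up) {
                        int t = a[i]; a[i] = a[p]; a[p] = t;
                    }
                }
            }
}
```

All iterations of the inner `i` loop touch disjoint pairs, which is exactly the parallelism the talk exploits; the stage count is fixed at O(log^2 n) regardless of the input.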
Dynamic scheduling on heterogeneous machines
Jorge Barbosa, University of Porto, [email protected]

Keywords: dynamic scheduling, parallel task, list scheduling, cluster computing
Abstract: The aim of this presentation is to describe the work developed on dynamically scheduling multi-user and independent jobs on clusters, both homogeneous and heterogeneous. The dynamic behavior means that the scheduler is able to adapt the scheduling when new jobs are submitted and also when processor availability changes. The aim for future work is to study the possibility of extending the scheduling algorithms to dynamic application scheduling on multicore heterogeneous machines. Another field of research is to develop grid (multi-cluster) scheduling strategies according to the utility computing concept, that is, to lower computation costs without harming users' QoS requirements, by taking into account energy consumption as well as computation time.
References:
• J. Barbosa, Belmiro Moreira: Dynamic job scheduling on heterogeneous clusters, in 8th International Symposium on Parallel and Distributed Computing, pp. 3-10, 2009.
• J. Barbosa, J. Tavares and A. J. Padilha: Optimizing dense linear algebra algorithms on heterogeneous machines, in Algorithms and Tools for Parallel Computing on Heterogeneous Clusters, Nova Science Publishers, N.Y., pp. 17-31, 2007.
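As a toy illustration of dynamic scheduling on machines of different speeds (a sketch of a minimum-completion-time heuristic of our choosing, not the talk's algorithm; the function name and parameters are hypothetical), each arriving job goes to the processor that would finish it earliest given its speed and current load:

```c
/* Hypothetical minimum-completion-time heuristic: a job of `work` units
 * is assigned to the processor that finishes it earliest, given each
 * processor's speed and the time at which it becomes free.
 * Returns the chosen processor and updates its availability. */
int schedule_job(double work, int nproc,
                 const double *speed, double *free_at) {
    int best = 0;
    double best_finish = free_at[0] + work / speed[0];
    for (int p = 1; p < nproc; p++) {
        double finish = free_at[p] + work / speed[p];
        if (finish < best_finish) { best_finish = finish; best = p; }
    }
    free_at[best] = best_finish;  /* processor is busy until the job completes */
    return best;
}
```

Because the decision uses the current `free_at` values, the assignment adapts as jobs arrive and as processors load up, which is the "dynamic" aspect the abstract describes.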
MWING: A Scalable Experimental Framework for Monitoring and Measurement in Distributed Complex HPC Environments
Leszek Borzemski, Wroclaw University of Technology, Wroclaw, Poland

Keywords: Grid, cloud computing, communication system, heterogeneous computing, distributed system monitoring and measurement
Abstract: Today's complex scientific problems are programmed on high-performance computing systems which are often developed as distributed systems. Monitoring and measurements can help in the exploitation of distributed systems, especially in the context of performance. Collected measurements can be essential in terms of network behavior diagnosis and understanding how the Internet works and how to improve its operability. We have been designing MWING, a scalable experimental framework to set up and carry out distributed monitoring and measurements. MWING is a good starting point in the design of HPC systems based on the cloud computing paradigm. The system can synchronize the activity of autonomous measurement/monitoring software agents that collect information on various network performance/reliability characteristics, e.g. to measure the quality of communication services. The MWING framework uses local and centralized databases. Locally gathered data can be uploaded in one place, allowing further analysis of the data, e.g. with the use of a professional data mining engine. MWING has been used in a global experiment to observe and predict network behavior in a real-life Internet-wide distributed measurement infrastructure.
References:
• L. Borzemski, L. Cichocki, M. Fras, M. Kliber, Z. Nowak: MWING: A Multiagent System for Web Site Measurements. LNCS, vol. 4496, 2007, pp. 278-287.
• L. Borzemski, L. Cichocki, M. Kliber: Architecture of Multiagent Internet Measurement System MWING Release 2. LNCS, vol. 5559, 2009, pp. 410-419.
SmartGridRPC: The New RPC Model of High Performance Grid Computing and its Implementation
Thomas Brady, University College Dublin, [email protected]

Keywords: Grid Computing, High Performance Computing, Scientific Computing, Grid Middleware, GridRPC, SmartGridRPC
Abstract: The SmartGridRPC model is an extension of the GridRPC model which aims to achieve higher performance. The traditional GridRPC provides a programming model and API for mapping individual tasks of an application in a distributed Grid environment, based on the client-server model characterised by a star network topology. SmartGridRPC provides a programming model and API for mapping a group of tasks of an application in a distributed Grid environment, based on a fully connected network topology.
The presentation will outline the SmartGridRPC programming model and API, its implementation in SmartGridSolve, and its performance advantages over the GridRPC model. In addition, experimental results using a real-world application will also be presented.
References:
• Brady T., Dongarra J., Guidolin M., Lastovetsky A., and Seymour K.: SmartGridRPC: The New RPC Model for High Performance Grid Computing and its Implementation in SmartGridSolve. Manuscript submitted for publication, April 2009.
• Guidolin M., Brady T., and Lastovetsky A.: ADL: Obtaining Higher Performance in SmartGridSolve with Irregular Algorithms. Manuscript submitted for publication, April 2009.
• Brady T., Guidolin M., Lastovetsky A.: Experiments with SmartGridSolve: Achieving higher performance by improving the GridRPC model, in Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid2008), Tsukuba, Japan, 2008.
• Brady T., Konstantinov E., Lastovetsky A.: SmartNetSolve: High Level Programming System for H
ForestGOMP, an efficient OpenMP runtime system for hierarchical architectures
Francois Broquedis, INRIA - LaBRI - University of Bordeaux, [email protected]

Keywords: OpenMP, multicore, runtime system, affinities, NUMA
Abstract: Today, multicore is everywhere, and HPC computers are getting more and more hierarchical and hard to program efficiently. Even if we can still program them like SMP architectures, the performance we obtain is often disappointing. Indeed, we now have to take care of both cache and memory affinities, to benefit from cache sharing and to limit NUMA penalties. The ForestGOMP platform extends the OpenMP runtime system to do so by providing a way to group related threads together and attach data to these groups. It then provides several scheduling policies to distribute OpenMP threads and the attached data on any hierarchical architecture, and ways to design your own scheduler.
References:
• Dynamic Task and Data Placement over NUMA Architectures: an OpenMP Runtime Perspective. Francois Broquedis, Nathalie Furmento, Brice Goglin, Raymond Namyst, and Pierre-Andre Wacrenier.
• Scheduling Dynamic OpenMP Applications over Multicore Architectures. Francois Broquedis, Francois Diakhate, Samuel Thibault, Olivier Aumage, Raymond Namyst, and Pierre-Andre Wacrenier.
Challenges of data-intensive computation in heterogeneous environments
Uros Cibej, University of Ljubljana, Slovenia, [email protected]

Keywords: data-intensive computation, data replication, scheduling
Abstract: Data-intensive applications are gaining more and more importance in HPC. The emergence of new applications that require enormous amounts of data has revealed new problems that need to be dealt with when executing such jobs in heterogeneous and geographically distributed environments. In our presentation we will describe some of the challenges in this area and the directions we have taken in order to solve these issues.
Issues for multi level parallel systems exploitation in complex applications
Andrea Clematis, IMATI - CNR, [email protected]

Keywords: multilevel parallelism, grid, applications
Abstract: Different research activities are currently carried out at IMATI-CNR involving the exploitation of multilevel parallel systems in complex applications. An overview of issues arising from these activities will be presented, referring to the following domains:
- biomedical domain: aspects related to the use of multilevel parallel processing are considered with reference to molecular docking and tissue microarray analysis;
- hydro-meteo domain: ongoing research activities to deploy complex hydro-meteo workflows on the Grid are shortly illustrated;
- mechanical system design: the need for practical multilevel parallelism in the design and simulation of mechanical components is discussed.
Current research activities and projects on these topics are described as well.
PDE solver using asynchronous algorithms on a GPU cluster
Sylvain Contassot-Vivier, University Henri Poincare - Nancy 1, [email protected]

Keywords: Cluster computing, GPU computing, asynchronism, numerical computation
Abstract: We present a PDE solver based on the multisplitting-Newton approach, using a GPU-accelerated sparse linear solver as its inner core. The multisplitting approach allows us to make use of asynchronism, which noticeably decreases the overall computation time by performing an implicit overlapping of computations with communications. Moreover, as most PDE problems come from physical modelling in which the dependency scheme produces sparse matrices, we propose as our inner linear solver a sparse one, designed to work with structured matrices where all the non-zeros are on a few diagonals. Finally, several benchmarks point out the interest of using asynchronous algorithms together with local accelerators like GPUs.
References:
• The inner linear solver has been presented at ParCo2009. That presentation includes the results of the entire project, consisting of using asynchronous algorithms on GPU clusters.
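To make the multisplitting idea concrete, here is a deliberately simplified sketch of ours (not the authors' solver): a Jacobi-style relaxation on a tridiagonal, diagonally dominant system, i.e. a matrix whose non-zeros lie on a few diagonals. In a multisplitting method, each node runs such an iteration on its own block and exchanges boundary values; in the asynchronous variant, it keeps iterating with whatever neighbour values have already arrived instead of waiting at a synchronisation point.

```c
#include <stdlib.h>

/* Simplified sketch: Jacobi relaxation on a tridiagonal system A x = b,
 * the kind of "few diagonals" sparse matrix a structured inner solver
 * targets.  Each multisplitting node would run this on its own block;
 * asynchronism means not waiting for fresh boundary values each sweep. */
void jacobi_tridiag(int n, const double *lower, const double *diag,
                    const double *upper, const double *b,
                    double *x, int iters) {
    double *xn = malloc(n * sizeof *xn);
    for (int it = 0; it < iters; it++) {
        for (int i = 0; i < n; i++) {
            double s = b[i];
            if (i > 0)     s -= lower[i] * x[i - 1];  /* sub-diagonal   */
            if (i < n - 1) s -= upper[i] * x[i + 1];  /* super-diagonal */
            xn[i] = s / diag[i];                      /* relaxed update */
        }
        for (int i = 0; i < n; i++) x[i] = xn[i];
    }
    free(xn);
}
```

Only the three diagonals are stored, which is what makes such a solver memory-friendly on a GPU; convergence requires diagonal dominance (or a similar condition), as in the example below.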
Autonomic management for efficient HPC
Marco Danelutto, Dept. Computer Science - Univ. Pisa, [email protected]

Keywords: structured parallel computing, algorithmic skeletons, autonomic management, non-functional features, performance tuning
Abstract: We will discuss recent advances in the autonomic management of non-functional features (performance tuning, security, fault tolerance, power management) in structured parallel computations. Results will be presented related to the management of performance in different experiments. The evolution of the methodology, which allows managing several different non-functional concerns within the same application, will also be discussed.
References:
• Marco Aldinucci, Marco Danelutto, Peter Kilpatrick: Autonomic management of non-functional concerns in distributed and parallel application programming, IPDPS 2009, Rome, May 2009.
• Marco Aldinucci, Marco Danelutto and Peter Kilpatrick: Co-design of distributed systems using skeletons and autonomic management abstractions, in: Euro-Par 2008 Workshops - Parallel Processing, Selected Papers, pages 403-414, Springer, 2009.
• Marco Aldinucci, Marco Danelutto, Giorgio Zoppi and Peter Kilpatrick: Advances in Autonomic Components - Services, in: From Grids to Service and Pervasive Computing (Proc. of the CoreGRID Symposium 2008), pages 3-17, Springer, 2008.
Research activities of the CA group (Univ. A Coruña)
Ramon Doallo, University of A Coruña, [email protected]

Keywords: HPC support tools - Programmability - Parallel libraries - Graphics
Abstract: This talk will present the current research lines of the Computer Architecture Group of the University of A Coruña, Spain, which involve support tools for HPC (high performance compilers, middleware, fault-tolerance, administration), programmability (PGAS UPC), development of parallel libraries, and computer graphics and visualization.
Research Activities of the PDSG Group
Katerina Doka, National Technical University of Athens, [email protected]

Keywords: Grid, Cloud, P2P, Data Management
Abstract: The presentation is about the research activities of the Parallel and Distributed Systems Group of the School of Electrical and Computer Engineering of the National Technical University of Athens. We will focus on Data Management in Peer-to-Peer systems.
GPU Computing and Numerical Algorithms for Complex HPC
Anne C. Elster, NTNU (Norwegian Univ. of Science and Technology), [email protected]

Keywords: GPU Computing and Numerical Algorithms for Complex HPC
Abstract: In this presentation I will highlight some of my group's work on numerical algorithms in GPU computing and how it relates to Complex HPC. As the leader of the "Numerical analysis for hierarchical and heterogeneous and multicore systems" Working Group of this COST Action, I will also give some pointers to current issues. I also invite others to come to me with suggestions.
References:
• http://www.idi.ntnu.no/~elster/hpc-lab
• http://www.idi.ntnu.no/~elster/hpc-group/elster-alumni.html
Research activities of the CA group (Univ. Santiago de Compostela)
Francisco F. Rivera, Univ. Santiago de Compostela, [email protected]

Keywords:
Abstract: A summary of the recent research activities of the group will be presented: performance analysis and prediction, run-time performance optimization, applications, and Grid simulation, among others.
References:
• www.ac.usc.es
Parallel Routines Modelling and Applications at the Parallel Computing Group of the University of Murcia
Domingo Gimenez, University of Murcia, [email protected]

Keywords: parallel routines modeling, heterogeneous computing, scheduling, parallel computing applications
Abstract: In this talk we introduce the Parallel Computing Group at the University of Murcia and its fields of research: parallel routines modeling and optimization, scheduling on heterogeneous systems, and applications of parallel computing (meteorology, maritime contamination, filter design, ...).
A Hardware Independent Parallel Programming Model
Magne Haveraaen, Universitetet i Bergen, [email protected]

Keywords:
Abstract: Parallel programming faces two major challenges: how to efficiently map computations to different parallel hardware architectures, and how to do it in a modular way, i.e., without rewriting the problem solving code. We propose to treat dependencies as first class entities in programs. Programming a highly parallel machine or chip can then be formulated as finding an efficient embedding of the computation's data dependency into the underlying hardware's communication layout. With the data dependency pattern of a computation extracted as an explicit entity in a program, one has a powerful tool to deal with parallelism.
References:
• Eva Burrows and Magne Haveraaen: A Hardware Independent Parallel Programming Model. Journal of Logic and Algebraic Programming, Volume 78, Issue 7, August-September 2009, Pages 519-538. http://dx.doi.org/10.1016/j.jlap.2009.06.002
HPC Infrastructure and Applications in CS@UPB
Alexandru Herisanu, University Politehnica of Bucharest, [email protected]

Keywords: Cluster computing, HPC, applications, Grid Computing
Abstract: In this presentation we show the recent advances in terms of hardware and software infrastructure, as well as HPC applications, in the Computer Science and Engineering Department of the University Politehnica of Bucharest.
Challenges of physical verification on heterogeneous platform
Alexey Kalinov, Cadence Design Systems, [email protected]

Keywords: EDA, physical verification, hierarchical processing
Abstract: Physical verification is a process whereby an integrated circuit layout is checked via Electronic Design Automation (EDA) software tools to see if it meets certain criteria. The nature of the process is quite irregular. Problems in parallel implementation and possible benefits of using modern heterogeneous platforms are discussed.
Discovering resources in distributed environments
Konstantinos Karaoglanoglou, Aristotle University of Thessaloniki, Department of Informatics

Keywords: Resource Discovery, Distributed Systems
Abstract: The proposed research work discusses issues concerning the "Algorithms and tools for mapping and executing applications onto distributed and heterogeneous systems" scientific area. The discussed research work is conducted at the Parallel and Distributed Systems Group in the Department of Informatics of the Aristotle University of Thessaloniki under the supervision of Professor Dr. Helen Karatza. In our previous efforts, we have extensively dealt with the discovery of resources and the efficient mapping of applications in large-scale heterogeneous distributed environments. In order to efficiently identify appropriate resources for certain applications, we enhanced the proposed mechanisms with a matchmaking framework. Moreover, our research took the direction of managing the uncertainties encountered in such environments by proposing a mechanism able to overcome the phenomenon of resource failures. As for future research directions, we intend to deal with efficient mapping mechanisms that take into consideration the "resource evolution" phenomenon, weighing in the technical changes that so commonly occur in the resources of distributed environments. Finally, we intend to work extensively in the direction of trusted mapping, directing applications to reliable resources in a distributed environment. The objective of this research direction is to provide robustness to distributed environments against malicious behaviours.
References:
• K. Karaoglanoglou and H. Karatza: "Resource Discovery in a Dynamical Grid based on Re-routing Tables", SIMPAT, Elsevier, July 2008.
Performance of Scheduling Strategies in Distributed Systems
Eleni Karatza, Aristotle University of Thessaloniki, Department of Informatics

Keywords: Performance, Scheduling, Mapping, Distributed Systems, Simulation
Abstract: Distributed systems offer considerable computational power, which can be used to solve problems with large computational requirements. The scheduling strategy used in such a system is of great significance, since the performance achieved is proportional to the algorithm's effectiveness. An efficient scheduling strategy maximizes system performance and avoids unnecessary delays. In this talk we will present various scheduling strategies in distributed systems for various workloads. Parallel jobs are examined. Simulation models are used to evaluate the performance of the scheduling algorithms.
References:
• H. D. Karatza: Periodic Task Cluster Scheduling in Distributed Systems, in Computer System Performance Modeling in Perspective, E. Gelenbe (Ed.), World Scientific, Imperial College Press, 2006, pp. 257-276.
Science on the TeraGrid
Daniel Katz, University of Chicago, [email protected]

Keywords: large-scale computing, distributed applications
Abstract: This presentation will talk about the TeraGrid, and how various types of users make use of it to obtain research results.
Challenges and Approaches in Parallelizing Applications for Medical Imaging
Philipp Kegel, University of Munster, [email protected]

Keywords: Medical Imaging, PET, Threading Building Blocks, GPGPU
Abstract: Image reconstruction in the field of medical imaging, in particular positron emission tomography (PET), is a time-consuming task. In addition, new algorithms are being developed to increase image quality (e.g. respiratory motion correction), which require even more computational power. In a joint project with physicists, physicians, mathematicians and other computer scientists, we are working on the parallelization of a real-world application for PET medical imaging. We ported this application to various parallel architectures (clusters, GPUs, multi-cores) to evaluate their aptitude for imaging algorithms. Currently, we are working with new programming models (e.g., Threading Building Blocks or OpenCL) to exploit recent parallel architectures, and have started developing a domain-specific parallel library for medical imaging applications.
References:
• T. Hoefler, M. Schellmann, S. Gorlatch, and A. Lumsdaine: Communication optimization for medical image reconstruction algorithms. In Lecture Notes in Computer Science, volume 5205, pages 75-83, Berlin/Heidelberg, 2008. Springer.
• M. Schellmann, J. Vörding, S. Gorlatch, and D. Meiländer: Cost-effective medical image reconstruction: from clusters to graphics processing units. In CF '08: Proceedings of the 2008 Conference on Computing Frontiers, pages 283-292, New York, NY, USA, 2008. ACM.
Grid Interoperability Solutions in Grid Resource ManagementAttila Kertesz MTA SZTAKI attila.kertesz@
sztaki.hu
Keywords: Grid Interoperability, Grid Resource Management, Grid Broker, Meta-BrokeringAbstract: Since the management and beneficial utilization of highly dynamic grid resources cannot be handled bythe users themselves, various grid resource management tools have been developed, supporting different grids. Toease the simultaneous utilization of different middleware systems, researchers need to revise current solutions. GridInteroperability can be achieved at different levels of grid systems. In this talk we gather interoperation efforts in Gridresource management, focusing on the following approaches: (1) extending existing resource brokers with multiplemiddleware support, (2) interfacing grid portals to different brokers and middleware or (3) developing a new, higherlevel middleware component that not only interfaces various brokers but also coordinates their simultaneous utilization.We show that all of these approaches contribute to enable Grid Interoperability, and conclude that the third solution isa significant step towards the final solution.References:
• A. Kertesz, P. Kacsuk, Grid Interoperability Solutions in Grid Resource Management, IEEE Systems Journal's Special Issue on Grid Resource Management, Volume 3, Issue 1, pp. 131-141, doi: 10.1109/JSYST.2008.2011263, March 2009.
• A. Kertesz, J. D. Dombi, J. Dombi, Adaptive scheduling solution for grid meta-brokering, Acta Cybernetica, Volume 19, pp. 105-123, 2009.
On-going research activities at the GRID & Ubiquitous Computing Group
Pierre Kuonen University of Applied Sciences of Western Switzerland, Fribourg [email protected]
Keywords: Grid programming, resources management.
Abstract: The GRID & Ubiquitous Computing Group is part of the Information and Communication Technologies department of the University of Applied Sciences of Western Switzerland, Fribourg (HES-SO/EIA-FR). This group is active on topics related to parallel and distributed systems such as Grid computing and was a partner of the CoreGRID NoE. More specifically, the group's activities mainly focus on the following aspects:
• Programming models and tools for GRID and distributed systems.
• Resource management for GRID systems.
This presentation will focus on these two aspects of the research activities of the GRID & Ubiquitous Computing Group. Two projects will be presented:
• POP-C++: a comprehensive object-oriented environment for developing HPC applications on the Grid.
• SmartGRID: a grid scheduling framework based on swarm intelligence, aimed at serving the overall GRID as a whole.
References:
• "Programming the Grid with POP-C++", T. A. Nguyen, P. Kuonen, in Future Generation Computer Systems (FGCS), N.H. Elsevier, Volume 23, Issue 1, 1 January 2007, pages 23-30.
• Ye Huang, Amos Brocco, Pierre Kuonen, Michele Courant, Beat Hirsbrunner, "SmartGRID: A Fully Decentralized Grid Scheduling Framework Supported by Swarm Intelligence", in: International Conference on Grid and Cooperative Computing (GCC'08), Shenzhen, China, IEEE Press, 160-168, October 2008.
Automatic Program Parallelization
Jan Kwiatkowski Wroclaw University of Technology [email protected]
Keywords: parallel processing, automatic program parallelization
Abstract: Nowadays it is becoming increasingly difficult to enhance processor speed, so multiplying processing units seems to be the best way to achieve higher performance. As a result, multi-core and hybrid processors, as well as GPUs used as processing units, are becoming more and more popular for commercial and home usage. Developing programs for these "architectures" requires from the programmers some additional specific knowledge about the processor architecture and parallel programming. We propose SliCer, a hardware-independent tool that automatically parallelizes serial programs written in the C language by creating the proper number of threads, which can later be executed in parallel depending on the architecture used. We use virtualization, which makes it possible to utilize a variety of dynamically selected hardware resources to ensure efficient execution in accordance with the requirements.
References:
• Automatic Program Parallelization for Multicore Processors, accepted for PPAM’09 Conference
High performance heterogeneous computing in UCD
Alexey Lastovetsky University College Dublin [email protected]
Keywords: high performance heterogeneous computing
Abstract: In this talk, I will present the research and development in the area of high performance heterogeneous computing conducted in the UCD Heterogeneous Computing Laboratory. Our current research covers models, algorithms and programming tools for high performance computing on heterogeneous computational clusters and Grids. We are now extending our target platforms to multicores, hybrid computing nodes and clusters of such nodes.
References:
• hcl.ucd.ie
Distributed computation in Peer-to-Peer networks
Fotis Loukos Aristotle University of Thessaloniki, Department of
Keywords: distributed computation, peer-to-peer
Abstract: Peer-to-Peer networks are an emerging topic in networking. They consist of many users, each one equal to another, forming an overlay over another network such as the Internet. One of their uses is cooperation and distributed computation. There are many large-scale projects for distributed computation that use such networks, and frameworks have been created to support them. We will present this model of computation together with the advantages, disadvantages and problems that we face when implementing it.
References:
• Fotis Loukos and Helen Karatza, "Reputation based Friend-to-Friend networks", Peer-to-Peer Networking and Applications, Springer, Volume 2, Issue 1, Pages 13-23, March 2009
• Fotis Loukos and Helen Karatza, "Enabling Cooperation in MANET-based Peer-to-Peer systems", In Mobile Peer-to-Peer Computing for Next Generation Distributed Environments: Advancing Conceptual and Algorithmic Applications, Boon-Chong Seet Editor, IGI Global, Pages 118-131, 2009
Parallel Image Processing on GPU with CUDA and OpenGL
Sidi Ahmed Mahmoudi University of Mons [email protected]
Keywords: GPU, CUDA, OpenGL, Image Processing, Medical Imaging
Abstract: The motivation of our work is to demonstrate the interest of GPU exploitation, using CUDA and OpenGL, for boosting the performance of image processing algorithms. This concern is particularly important for a broad set of applications, such as real-time video processing, motion analysis, etc. We have implemented several algorithms such as geometrical transformations, noise removal, Gaussian smoothing and edge detection. These algorithms have been applied to high resolution and medical images. We propose a development scheme based upon CUDA for parallel constructs and OpenGL for visualization, which reduces data transfer between device and host memories. Experiments have been conducted on several platforms, e.g. the GeForce 8600 and GTX 280 GPUs, showing a global speedup ranging from 20 to 60 by comparison with a standard CPU implementation.
References:
• S.A. Mahmoudi, M.S. Haidar, N. Ihaddadene, Ch. Djeraba. Abnormal event Detection in real time video. 1st Int. Workshop on Multimedia Interaction Analysis of Users in a Controlled Environment, Oct. 2008, Chania, Greece.
• Mohammed Benjelloun, Saïd Mahmoudi, "Spine Localization in X-ray Images Using Interest Point Detection". Journal of Digital Imaging, Vol 22, No 3 (June), 2009: pp 309-318
• Mohammed Benjelloun, Saïd Mahmoudi, "X-ray Image Segmentation for Vertebral Mobility Analysis". International Journal of Computer Assisted Radiology and Surgery, Volume 2, Number 6, April 2008, pages 371-383.
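To make concrete why filters like Gaussian smoothing map so well to GPUs, here is a minimal CPU-side sketch (plain Python, not the authors' CUDA code): every output pixel depends only on a fixed input neighbourhood, so all pixels can be computed independently in parallel.

```python
def gaussian3x3(img):
    # 3x3 Gaussian kernel (integer weights summing to 16). Each output
    # pixel reads only a fixed 3x3 input neighbourhood, so the outer two
    # loops are embarrassingly parallel -- exactly what a CUDA kernel
    # exploits with one thread per pixel. Border pixels are left at 0
    # for brevity.
    k = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += k[dy + 1][dx + 1] * img[y + dy][x + dx]
            out[y][x] = acc / 16.0
    return out
```

On a GPU the same per-pixel body becomes the kernel, and the OpenGL interop the abstract mentions lets the result stay in device memory for display instead of being copied back to the host.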
Parallel PCG Algorithms for Voxel FEM Systems
Svetozar Margenov Institute for Parallel Processing, BAS [email protected]
Keywords: large-scale scientific computing, parallel computing, IBM Blue Gene/P
Abstract: The presented study is motivated by the development of parallel numerical methods, algorithms, and software tools for micro finite element simulation of human bones. The voxel representation of the bone micro structure is obtained from a high resolution computer tomography image. The reference volume element has a strongly heterogeneous micro structure composed of solid and fluid phases. Crouzeix-Raviart and Rannacher-Turek nonconforming finite elements are used to discretize the arising strongly heterogeneous elasticity problems.
The efficiency of codes incorporating BoomerAMG and parallel MIC(0) will be discussed. The size of the considered large-scale problems goes beyond a billion degrees of freedom. The presented parallel numerical tests include results on the IBM Blue Gene/P machine of the Bulgarian Supercomputing Center. The ongoing Bulgarian NSF project Center of Excellence on Supercomputing Applications will be introduced.
References:
• P. Arbenz, S. Margenov, Y. Vutov, Parallel MIC(0) preconditioning of 3D elliptic problems discretized by Rannacher-Turek finite elements, Computers and Mathematics with Applications, 55 (10), 2197-2211
• S. Margenov, Y. Vutov, Parallel MIC(0) Preconditioning for Numerical Upscaling of Anisotropic Linear ElasticMaterials, Large-Scale Scientific Computing, Springer LNCS (to appear)
• N. Kosturski, S. Margenov, Numerical Homogenization of Bone Microstructure, Large-Scale Scientific Com-puting, Springer LNCS (to appear)
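For readers unfamiliar with the PCG iteration underlying these solvers, a minimal dense sketch follows. A Jacobi (diagonal) preconditioner stands in here for MIC(0) or BoomerAMG, which are far more effective on real FEM systems; the parallel versions distribute the matrix-vector product and dot products across processes.

```python
def pcg(A, b, M_inv_diag, tol=1e-10, max_it=200):
    # Preconditioned conjugate gradient for a dense SPD matrix A (list of
    # rows). M_inv_diag holds the inverse diagonal of the preconditioner.
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda v: [dot(row, v) for row in A]
    x = [0.0] * n
    r = b[:]                                   # residual r = b - A*x0, x0 = 0
    z = [mi * ri for mi, ri in zip(M_inv_diag, r)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_it):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [mi * ri for mi, ri in zip(M_inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

For the billion-unknown voxel systems in the abstract, A is of course never stored densely; the same iteration is applied matrix-free or in sparse distributed form.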
Supercomputation-Algorithms
Ester Martin Garzon University of Almeria [email protected]
Keywords: HPC; image processing; 3D tomography; global optimization; multimedia.
Abstract: The research group Supercomputation-Algorithms focuses its interest on a set of problems that require intensive computation and that come from different scientific and technological fields. Our lines are: (1) development of High Performance Computing techniques and (2) its application to the fields of (a) image processing and tomographic reconstruction, (b) global optimization and (c) multimedia. In the HPC area, our goals are the design of new dynamic load balancing approaches focused on heterogeneous or non-dedicated clusters and the tuning of the sparse matrix-vector product to current parallel architectures. For image processing and tomographic reconstruction, our lines are: improvement of the algorithms for noise reduction and the methodologies for 3D tomographic reconstruction; development of a new algorithm for segmentation embedded in our noise reduction method; and parallelization of 3D reconstruction methods for computer clusters and supercomputers. In global optimization, the problems to be tackled are: the study of the effectiveness and efficiency of meta-heuristic methods; development of techniques for the reduction of the complexity and the search space in Branch and Bound algorithms; and the design of efficient high performance metaheuristic algorithms to solve engineering and industrial problems. In multimedia, our research is intended to improve the characteristics of a fully scalable video compression system for P2P multicast networks and mobile environments.
References:
• http://www.ace.ual.es/Investigacion/public.html
Modeling and Performance Analysis of Scheduling Algorithms in Grid Systems
Zafeirios Papazachos Aristotle University of Thessaloniki [email protected]
Keywords: Scheduling policies; Distributed systems; Performance; Modeling and Simulation; Grid and cluster computing
Abstract: Distributed systems have emerged as a cost-effective and scalable solution to the increasing demand for computing resources. A distributed system consists of several resources which are connected via a communication network. In order to maximize the efficiency of such a system, a proper scheduling algorithm is necessary. The scheduling algorithm is responsible for allocating the available system resources to the existing jobs. An effective way of examining an algorithm's performance is by implementing a simulation model and then running simulation experiments. Topics of interest that are currently examined by means of simulation are: Quality of Service (QoS) in grid and cluster systems, scheduling policies for heterogeneous systems, and performance analysis of distributed systems.
References:
• Z. Papazachos, H. Karatza, "Performance Evaluation of Gang Scheduling in a Two-Cluster System with Migrations", Proceedings of the 8th International Workshop on Performance Modeling, Evaluation, and Optimization of Ubiquitous Computing and Network Systems (PMEO-UCNS 2009) in conjunction with the IEEE International Parallel & Distributed Processing Symposium (IPDPS), Rome, Italy, May 25-29, 2009, pp. 1-8.
• Z. Papazachos and H. Karatza, "The Impact of Task Service Time Variability on Gang Scheduling Performance in a Two-Cluster System", Simulation Modelling Practice and Theory, Elsevier, Volume 17, Issue 7, August 2009, pp. 1276-1289.
Software Agents as Resource Brokers in Grid
Marcin Paprzycki SRI PAS [email protected]
Keywords:
Abstract: It is a widely held belief that software agents will become the next revolution in information technology. One of the areas where they are expected to play an important role is the Grid. Claims to this effect, as well as research results, can be found in the work of B. diMartino, O. Rana, B. Prasad and others. In our work we have taken a different approach from these researchers and proposed that agent teams should be utilized for resource brokering and management. In this way, software agents become the "brain" for the Grid "brawn". The aim of the presentation will be to outline the assumptions underlying our work, introduce our system, and discuss how agents representing users interact with the agent teams, and how the high-level intelligent infrastructure can utilize the actual Grid middleware to execute a job.
Parallel computing in Romania
Dana Petcu West University of Timisoara [email protected]
Keywords: Cluster computing; Applications in science and engineering
Abstract: The presentation will refer to the activities of the Romanian teams working with parallel computing theories or parallel codes and their applications in engineering and science. A particular emphasis will be placed on the results of the recent national collaborative projects as well as those of the team from West University of Timisoara [1-9].
References:
• G. Macariu, D. Petcu: Parallel Multiple Polynomial Quadratic Sieve on Multi-Core Architectures. SYNASC2007: 59-65
• N. Somosi, D. Petcu: A Parallel Algorithm for Rendering Huge Terrain Surfaces. SYNASC 2006: 274-278
• D. Petcu: Parallel Jess. ISPDC 2005: 307-316
• D. Petcu: Adapting a Partitioning-Based Heuristic Load-Balancing Algorithm to Heterogeneous ComputingEnvironments. SYNASC 2005: 170-173
• C. Bonchis, G. Ciobanu, C. Izbasa, D. Petcu: A Web-Based P Systems Simulator and Its Parallelization. UC2005: 58-69
• D. Petcu: Parallel Explicit State Reachability Analysis and State Space Construction. ISPDC 2003: 207-214
• D. Zaharie, D. Petcu: Adaptive Pareto Differential Evolution and Its Parallelization. PPA
High Performance Computing in Remote Sensing Applications
Antonio Plaza University of Extremadura [email protected]
Keywords: High performance computing, cluster computing, heterogeneous computing, FPGAs, GPUs, applications, remote sensing
Abstract: Remote sensing applications for Earth and planetary observation require computationally effective processing techniques in order to facilitate exploitation of high dimensional data sets in several contexts, including environmental modeling and assessment, risk/hazard prevention and response, defense/security, and monitoring of human-induced threats such as oil spills and other types of chemical contamination. With the aim of providing an overview of recent developments and new trends in the design of parallel and distributed systems for hyperspectral image analysis, this paper discusses and inter-compares different strategies for efficiently implementing a standard hyperspectral image processing chain, including heterogeneous networks of workstations, field programmable gate arrays (FPGAs) or graphics processing units (GPUs). Combined, these parts deliver a snapshot of the state-of-the-art in those areas, and a thoughtful perspective on the potential and emerging challenges of adapting high performance computing systems to remote sensing problems.
OS Scheduling on Asymmetric Multicore Systems
Manuel Prieto-Matias Complutense University of Madrid [email protected]
Keywords: OS scheduling, Asymmetric Multicore
Abstract: Asymmetric multicore processors promise higher performance per watt than their symmetric counterparts, but to fully tap into their potential, the operating system must be aware of the asymmetry present in the platform. Previous research examining this issue has focused on mapping sequential applications to asymmetric cores in consideration of their Instruction Level Parallelism. The ArTeCS group has been working on this problem but addressing instead the mapping of parallel applications that exhibit variations in their amount of thread-level parallelism. We will also describe other research interests of the ArTeCS group within this network.
Dynamic Scheduling for Sparse Direct Solver on NUMA and Multicore Architectures
Pierre Ramet LaBRI - INRIA Bordeaux [email protected]
Keywords: Sparse Direct solver, NUMA, Multicore, Dynamic Scheduling
Abstract: Parallel sparse direct solvers are now able to solve efficiently real-life three-dimensional problems with several millions of equations. The PaStiX solver provides a hybrid MPI-thread implementation that is well suited for SMP nodes. This technique makes it possible to treat large 3D problems where the memory overhead due to communication buffers was a bottleneck to the use of direct solvers. We introduce a simple way to dynamically schedule an application based on a dependency tree, making it more suitable for NUMA or multi-core architectures.
References:
• see http://www.labri.fr/perso/ramet/
Parallel ODE-solvers on Multi-core Systems
Thomas Rauber University of Bayreuth [email protected]
Keywords: multi-core, numerical analysis, ODE solvers
Abstract: Ordinary differential equations (ODEs) arise in many application areas such as fluid flow, physics-based animation, mechanical systems, or chemical reactions. Due to the advent of multi-core technology, parallel resources are now widely available. The objective of this talk is to give an overview of parallel ODE solvers in the context of the hierarchical architecture of multi-core systems.
Heterogeneous multi-core computer architectures and dedicated processing structures for signal processing applications
Nuno Roma INESC-ID / IST TU Lisbon [email protected]
Keywords: Heterogeneous multi-core architectures, Video encoding, Biological sequences alignment (DNA and proteins)
Abstract: Two research projects ongoing in INESC-ID on multi-core computer architectures and dedicated processing structures for signal processing applications will be presented: (*) heterogeneous many-core processing structures for advanced video coding (H.264/AVC); (*) heterogeneous multi-core processing structures for computational biology acceleration.
The first project targets the development of a novel class of scalable and highly-efficient heterogeneous multi-core architectures for advanced video encoding using a Processor-In-Memory (PIM) approach. The architecture consists not only of programmable general purpose processors (GPPs), but also of dedicated and highly efficient accelerator units, interconnected with fast data communication buses and programmable interconnection switches, using efficient data distribution strategies to minimize the bandwidth requirements.
The second project targets the proposal of a new self-contained architecture of a heterogeneous parallel structure for generic biological sequences alignment (DNA and protein). It incorporates a general purpose processor (GPP) and multiple specialized structures dedicated to the computation of pairwise matching scores. The GPP allows the implementation of the non-regular parts of several SW tools that can be implemented by this structure, including not only the simple global/local alignment algorithms (Smith-Waterman, etc.), but also a broad set of packages based on heuristic strategies (FASTA, BLAST, etc.).
References:
• T. Almeida, N. Roma, "A parallel programming framework for multi-core DNA sequence alignment", submitted to Int. Workshop on Multi-Core Computing Systems (MuCoCoS'2010), Poland, Feb. 2010;
• T. Dias, N. Roma, L. Sousa, M. Ribeiro, "Reconfigurable architectures and processors for real-time video motion estimation", J. Real-Time Image Processing, Springer Berlin, vol. 2, n. 4, pp 191-205, Dec. 2007;
• T. Dias, S. Momcilovic, N. Roma, L. Sousa, "Adaptive Motion Estimation Processor for Autonomous Video Devices", EURASIP J. on Embedded Systems, Hindawi, n. 57234, pp. 1-10, May 2007.
Mapping and scheduling of Parallel Tasks for Multi-core Systems
Gudula Rünger Chemnitz University of Technology [email protected]
Keywords: Parallel tasks, mixed parallelism, scheduling, mapping
Abstract: Multi-core clusters provide a huge amount of computing resources and use a hierarchically structured interconnection network. In this talk, we consider hierarchically structured parallel tasks to improve application performance on multi-core clusters. In particular, we consider scheduling and mapping techniques for parallel tasks that take the architecture of the target systems into consideration. We evaluate the impact of scheduling and mapping for different application programs on different parallel machines.
References:
• Dümmler, J.; Rauber, T.; Rünger, G.: Mapping Algorithms for Multiprocessor Tasks on Multi-Core Clusters. In: Proc. of the 37th Int. Conf. on Parallel Processing (ICPP 2008), pp. 141-148. IEEE Computer Society, Portland, Oregon, USA, 2008.
• Dümmler, J.; Kunis, R.; Rünger, G.: Layer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints. In: Parallel Computing: Architectures, Algorithms and Applications: Proc. of the Int. Conf. ParCo 2007 (Advances in Parallel Computing, Vol. 15), pp. 321-328. IOS Press, Jülich/Aachen, Germany, 2007.
• Dümmler, J.; Rauber, T.; Rünger, G.: Scalable Computing with Parallel Tasks. To appear in: Proc. of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS09), co-
Real-World Distributed Computing with Ibis
Franck Seinstra Department of Computer Science, Vrije Universiteit,
Keywords: high-performance distributed computing; user transparency; platform-independence; middleware-independence; fault-tolerance; malleability; guaranteed connectivity; real-world, real-time, and off-line applications; "The Promise of the Grid"
Abstract: Ibis is an open source software framework, developed at the Vrije Universiteit, Amsterdam, that drastically simplifies the process of programming and deploying high-performance parallel and distributed (grid) applications. Ibis supports a range of programming models that yield efficient implementations, even on distributed sets of heterogeneous resources. Also, Ibis is specifically designed to run in hostile (grid) environments that are inherently dynamic and faulty, and that suffer from connectivity problems.
One of the main features of the Ibis system is that it allows multiple grid systems, clusters, clouds, mobile devices, and stand-alone machines to be applied concurrently and transparently from within a single application (even under real-time constraints). Ibis has been applied successfully in a number of real-world applications, and has won prizes in several international competitions, including the First IEEE International Scalable Computing Challenge (at CCGrid 2008) and the First International Data Analysis Challenge for Finding Supernovae (at IEEE Cluster/Grid 2008). Future research goals of the Ibis project include support for efficient and transparent use of multi-core systems and hardware accelerators (incl. GPUs, FPGAs, Cells, etcetera).
References:
• Ibis website: http://www.cs.vu.nl/ibis/
• H.E. Bal, N. Drost, R. Kemp, J. Maassen, R.V. van Nieuwpoort, C. van Reeuwijk, and F.J. Seinstra. "Ibis: Real-World Problem Solving using Real-World Grids". Proceedings of the 23rd International Parallel & Distributed Processing Symposium (IPDPS 2009) - Sixth High-Performance Grid Computing Workshop (HPGC 2009), Rome, Italy, May 2009.
• R. Kemp, N.O. Palmer, Th. Kielmann, F.J. Seinstra, N. Drost, J. Maassen and H.E. Bal, "eyeDentify: Multimedia Cyber Foraging from a Smartphone", IEEE International Symposium on Multimedia (ISM2009), San Diego, USA, 14-16 December 2009.
Parallel programming refinements for heterogeneous multi-core parallel systems
João Sobral Universidade do Minho [email protected]
Keywords: separation of concerns, heterogeneous multicore systems, parallel programming
Abstract: Programming by refinement advocates the development of programs by incrementally refining a high-level specification towards more platform-specific implementations. In this talk we present the concept of parallel programming refinement and several abstractions that can be used to generate efficient code for shared and distributed memory architectures and compositions of these systems. We particularly present the case of parallel sorting algorithms, where we could generate the most well-known parallel implementations by refining an abstract specification of a sorting algorithm.
References:
• R. Gonçalves, J. Sobral, Pluggable Parallelisation, 18th ACM International Symposium on High Performance Distributed Computing (HPDC'09), Munich, June 2009.
• J. Sobral, Incrementally Developing Parallel Applications with AspectJ, 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS'06), Rhodes, Greece, April 2006.
Application-Oriented Scheduling in Multicluster Grids
Ozan Sonmez Delft University of Technology [email protected]
Keywords: Multicluster grids, co-allocation, cycle scavenging, job runtime predictions
Abstract: Different application types in grids, such as workflows, parallel applications, and bags-of-tasks, pose different requirements that should be taken into account both from a scheduling and a system point of view, in order to improve their execution performance. This talk mainly covers our research and experiences on supporting various application types in a real multicluster grid scheduler, named KOALA.
References:
• O. Sonmez, H. Mohamed, D.H.J. Epema, "On the Benefit of Processor Co-Allocation in Multicluster Grid Systems", IEEE Transactions on Parallel and Distributed Systems, 2009.
• O. Sonmez, N. Yigitbasi, A. Iosup, and D.H.J. Epema, "Trace-Based Evaluation of Job Runtime and Queue Wait Time Predictions in Grids", In the ACM/IEEE Int'l. Symp. on High Performance Distributed Computing (HPDC'09), Jun 11-13, 2009
• O. Sonmez, B. Grundeken, H. Mohamed, A. Iosup, and D.H.J. Epema, "Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems", In the IEEE International Symposium on Cluster Computing and the Grid (CCGrid'09), May 18-21, 2009
PCs (CPU+GPU) = Heterogeneous Systems
Leonel Sousa INESC-ID/IST, TU Lisbon [email protected]
Keywords: GPU, CUDA, MPI, CUBLAS, ATLAS
Abstract: Nowadays, commodity computers are complex heterogeneous systems that provide a huge amount of computational power. However, to take advantage of this power we have to orchestrate the use of processing units with different characteristics, such as general purpose multi-cores and GPUs. Moreover, these heterogeneous systems can be interconnected to form a cluster of heterogeneous nodes, and once again exploiting the available computational power raises the same type of problems, at a different level. A collaborative execution environment [1] is presented for exploiting data parallelism in a heterogeneous system composed of CPUs and GPUs, and an extension of CUDA is proposed for using it in clusters of message-passing systems (MPI-CUDA [2]), in order to take advantage of clusters of these types of heterogeneous nodes.
References:
• Aleksandar Ilic, Leonel Sousa, "Collaborative Execution Environment for Heterogeneous Parallel Systems",submitted to PDP’10.
• Shinichi Yamagiwa and Leonel Sousa, "CaravelaMPI: Message Passing Interface for Parallel GPU-based Applications", ISPDC'09
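A toy illustration of the kind of decision such a collaborative environment must make: partitioning a data-parallel workload across devices in proportion to their relative speeds. The numbers and the static strategy are hypothetical simplifications; the environment of [1] balances work dynamically at run time.

```python
def partition(total, speeds):
    # Split `total` work items across devices proportionally to their
    # relative speeds (e.g. benchmarked throughput of each CPU or GPU).
    # Integer truncation leaves a remainder, which is assigned to the
    # last device so that every item is covered exactly once.
    s = sum(speeds)
    sizes = [total * sp // s for sp in speeds]
    sizes[-1] += total - sum(sizes)
    return sizes

# Hypothetical example: a CPU rated 100 and a GPU rated 400 share
# 1000 data items, so the GPU receives four times the CPU's share.
sizes = partition(1000, [100, 400])
```

Each chunk can then be processed by the corresponding device (or, in the MPI-CUDA setting, by the corresponding node) and the partial results gathered afterwards.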
StarPU, a runtime system for accelerator-based multicore machines
Samuel Thibault University Bordeaux 1 [email protected]
Keywords: Multicore, Accelerator, GPU, Cell, Scheduling, DSM
Abstract: StarPU is a scheduling framework for heterogeneous architectures which uses all available computing units in a uniform way. We achieve high performance by using a powerful data-caching and data-prefetching engine and by using autotuned performance prediction models which make it easy to implement advanced scheduling policies.
References:
• http://runtime.bordeaux.inria.fr/StarPU/
MUMPS: A multifrontal massively parallel sparse direct solver
Bora Ucar LIP/ENS Lyon [email protected]
Keywords: Parallel computing; sparse direct solvers; linear systems of equations
Abstract: Improving the behaviour of parallel direct methods on modern platforms is critical to solve large linear systems of equations arising in many scientific and engineering applications. Considering the complexity of novel parallel architectures, including massively parallel and multi-core systems, there are challenging research issues to address. After a presentation of our parallel sparse direct solver (MUMPS), we will discuss some of those issues.
References:
• P. R. Amestoy, A. Guermouche, J.-Y. L’Excellent and S. Pralet. Hybrid scheduling for the parallel solution oflinear systems, Parallel Computing 32 (2): 136-156, 2006.
A Fast Minimal Storage Factorization of Symmetric Matrices
Jerzy Wasniewski Danish Technical University [email protected]
Keywords: Numerical analysis, Linear Algebra, Symmetric, triangular, and Hermitian matrices, Cholesky algorithm, diagonal pivoting method.
Abstract: We describe new data formats for storing triangular, symmetric, and Hermitian matrices. The standard two-dimensional arrays of Fortran and C (also known as full format) that are used to store triangular, symmetric, and Hermitian matrices waste nearly half the storage space but provide high performance via the use of level 3 BLAS. Standard packed format arrays fully utilize storage (array space) but provide low performance, as there are no level 3 packed BLAS. We combine the good features of packed and full storage using the new formats to obtain high performance using level 3 BLAS. Also, these new formats require exactly the same minimal storage as LAPACK packed format. These new formats even outperform the LAPACK full format on some computer platforms.
References:
• LAPACK Note 199. F.G. Gustavson, J. Wasniewski, J. Langou and J.J. Dongarra, "Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion". UT-CS-08-614, April 28, 2008. Accepted for TOMS/ACM.
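To make the storage figures concrete, here is a small sketch of the classical packed layout the abstract contrasts against (assuming LAPACK's column-major lower-triangular packed convention; the RFP format of the talk is more involved): packed storage needs only n(n+1)/2 entries instead of the n*n of full storage.

```python
def packed_index(i, j, n):
    # Index of element A[i][j] (with i >= j) of an n x n lower-triangular
    # matrix in column-major packed storage: columns are stored one after
    # another, column j starting after the j previously stored columns of
    # lengths n, n-1, ..., n-j+1.
    assert i >= j
    return i + j * (2 * n - j - 1) // 2

n = 4
full_entries = n * n              # 16 entries in full (2D array) storage
packed_entries = n * (n + 1) // 2 # 10 entries in packed storage
```

The mapping is a bijection onto 0 .. n(n+1)/2 - 1, which is exactly the "minimal storage" property; the drawback, as the abstract notes, is that this layout does not expose the contiguous blocks that level 3 BLAS need.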
Parallel adaptive finite element package with dynamic load balancing for 3D thermomechanical problems
Roman Wyrzykowski Czestochowa University of Technology [email protected]
Keywords: 3D FEM, adaptive methods, parallel algorithms/applications, dynamic load balancing, multigrid algorithms, PC-clusters, multicore computing, GPU
Abstract: Numerical modeling of 3D thermomechanical problems is a complex and time-consuming issue. Adaptive techniques are powerful tools to perform such modeling efficiently using FEM analysis. During adaptation, computational workloads change unpredictably at runtime, therefore dynamic load balancing is required.
This paper presents the parallel adaptive FEM package NuscaS with dynamic load balancing for 3D unstructured meshes. This object-oriented package for parallel FEM modeling is developed at Czestochowa University of Technology to investigate different thermomechanical phenomena. NuscaS uses the message-passing paradigm, and is suitable for distributed memory parallel computers such as PC-clusters. The implementation of adaptation in NuscaS is based on using the ParMETIS tool for mesh repartitioning and load balancing.
Multigrid methods are among the fastest numerical algorithms for solving large sparse systems of linear equations. Multigrid is also a good preconditioning algorithm for Krylov iterative solvers. That is why this paper also presents parallelization aspects of the geometric multigrid algorithms developed for the NuscaS package. A parallel conjugate gradient method with multigrid preconditioning is used for solving FEM problems with NuscaS on PC-clusters.
References:
• R. Wyrzykowski, T. Olas, N. Sczygiol, Object-Oriented Approach to Finite Element Modeling on Clusters,Lecture Notes in Computer Science, 1947 (2001), 250-257.
• T. Olas, R. Wyrzykowski, K. Karczewski, A. Tomas, Performance Modeling of Parallel FEM Computations onClusters, Lecture Notes in Computer Science, 3019 (2004), 189-200.
• R. Wyrzykowski, N. Meyer, T. Olas, L. Kuczynski, B. Ludwiczak, C. Czaplewski, S. Oldziej, Meta-computations on the CLUSTERIX Grid, Lecture Notes in Computer Science, 4699 (2007), 489-500.
Resource Management and Scheduling for Clouds and Grids
Ramin Yahyapour TU Dortmund [email protected]
Keywords: Cloud Computing, Scheduling
Abstract: The dynamic management of Cloud environments is a complex task for resource providers as well as service consumers. Considering that in a large-scale system tens of thousands of services and tens of thousands of cores need to be managed, manual management is not suitable and automatic mechanisms need to be established. The optimization goals vary, as performance is not necessarily the most important criterion. New criteria such as power consumption and fault tolerance are becoming more important. SLAs are considered a major element in future management systems.
Performance of resource allocation policies in grid systems
Stylianos Zikos Aristotle University of Thessaloniki, Department of Informatics
Keywords: Distributed systems, grids, site selection, resource allocation policies
Abstract: Efficient job scheduling in large-scale distributed systems, such as grids, is challenging due to the large number of distributed and heterogeneous resources. Job scheduling takes place at multiple levels and is addressed by different entities at the grid and local levels. Grid schedulers can utilize dynamic site load information for site selection, while local schedulers apply resource allocation policies. Keeping job response times low is a primary performance objective. Moreover, other important parameters need to be taken into account, such as the communication traffic among schedulers, the fair utilization of available resources, and the energy consumption. Optimizing these parameters is not an easy task, especially in an environment where job service demands may be highly variable and unknown a priori.
References:
• S. Zikos and H.D. Karatza, “Resource Allocation Strategies in a 2-level Hierarchical Grid System”, Proceedingsof the 41st Annual Simulation Symposium (ANSS), IEEE Computer Society Press, SCS, April 13-16, 2008,Ottawa, Canada, pp. 157-174.
• S. Zikos and H. Karatza, “The Impact of Service Demand Variability on Resource Allocation Strategies in aGrid System”, ACM Transactions on Modeling and Computer Simulation (TOMACS), accepted, to appear.
• S. Zikos and H. Karatza, “Communication Cost Effective Scheduling Policies of Nonclairvoyant Jobs with LoadBalancing in a Grid”, The Journal of Systems and Software, Elsevier, accepted, to appear.
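A grid-level scheduler that uses dynamic site load information for site selection, as described in the abstract, can be sketched as follows. This is a hypothetical minimal policy (least-queued site, random tie-breaking) for illustration only; the site records and field names are invented and the papers above study richer policies.

```python
import random

def select_site(sites):
    """Grid-level site selection: pick the site with the fewest
    queued jobs, breaking ties at random.  Illustrates a dynamic,
    load-based policy; not the exact policy from the cited work."""
    min_load = min(s["queued"] for s in sites)
    candidates = [s for s in sites if s["queued"] == min_load]
    return random.choice(candidates)

# Hypothetical snapshot of per-site queue lengths
sites = [
    {"name": "siteA", "queued": 5},
    {"name": "siteB", "queued": 2},
    {"name": "siteC", "queued": 2},
]
chosen = select_site(sites)
```

The tie-breaking step matters in practice: always choosing the first least-loaded site can concentrate traffic, while randomization spreads jobs, and communication cost, across equally loaded sites.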
High-Performance Computing in Global Optimization and Optimization-Based Visualization
Julius Zilinskas Institute of Mathematics and Informatics [email protected]
Keywords: parallel algorithms, global optimization, visualization of multidimensional data
Abstract: Many problems in engineering, physics, economics, and other fields reduce to global minimization with many local minimizers. Global optimization algorithms are computationally intensive, so high-performance computing is important. Multidimensional scaling is a technique for the visualization of multidimensional data, whose essential part is the optimization of a function possessing many adverse properties, including multidimensionality, multimodality, and non-differentiability. In this talk, optimization and visualization algorithms relying on linear algebra, and their parallelization, are discussed.
References:
• R. Ciegis, D. Henty, B. Kågström, J. Žilinskas (Eds.)(2009) Parallel Scientific Computing and Optimization.Springer, ISBN 978-0-387-09706-0. doi:10.1007/978-0-387-09707-7.
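The objective at the heart of multidimensional scaling is a stress function: the mismatch between target dissimilarities and distances in the low-dimensional embedding. The sketch below shows the raw (unnormalized) stress as one common variant; it is an illustration of the kind of multimodal objective the talk discusses, not code from the cited work.

```python
import numpy as np

def stress(X, D):
    """Raw MDS stress: sum over point pairs of the squared difference
    between the target dissimilarity D[i, j] and the embedding
    distance ||X[i] - X[j]||.  This objective is multimodal and
    non-differentiable where points coincide, which is why global
    optimization methods are needed to minimize it."""
    n = len(D)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = np.linalg.norm(X[i] - X[j])
            s += (D[i, j] - d_ij) ** 2
    return s

# Three points on a line embed their mutual distances exactly,
# so the stress of this embedding is zero.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
X = np.array([[0.0], [1.0], [2.0]])
val = stress(X, D)
```

Because every evaluation is O(n^2) in the number of points and the minimizer must be sought globally, the computation parallelizes naturally over point pairs and over independent optimization runs.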
3 List of Participants
Name | Affiliation | E-mail | Center of interests
Francisco Almeida | La Laguna University | [email protected] | MultiCore, Cluster, Grid, Library, Scheduling, Heterogeneous computing, GPU
Mark Baker | SSE, University of Reading | [email protected] | MultiCore, Cluster, Grid, Library, GPU
Ranieri Baraglia | CNR-ISTI | [email protected] | Applications, Heterogeneous computing, GPU
Jorge Barbosa | University of Porto | [email protected] | MultiCore, Scheduling, Heterogeneous computing
Leszek Borzemski | Wroclaw University of Technology, Wroclaw, Poland | | MultiCore, Cluster, Grid, Applications, Library, Scheduling, Heterogeneous computing, GPU
Thomas Brady | University College Dublin | [email protected] | Grid, Applications, Library, Heterogeneous computing
Francois Broquedis | INRIA - LaBRI - University of Bordeaux | | MultiCore
Jose Carlos Cabaleiro | Univ. Santiago de Compostela | [email protected] | MultiCore, Cluster, Grid, Applications
Gabriele Capannini | CNR-ISTI | [email protected] | Applications, Heterogeneous computing, GPU
Uros Cibej | University of Ljubljana, Slovenia | [email protected] | Grid, Scheduling
Andrea Clematis | IMATI - CNR | [email protected] | MultiCore, Grid, Applications
Sylvain Contassot-Vivier | University Henri Poincare - Nancy 1 | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Heterogeneous computing, GPU
Marco Danelutto | Dept. Computer Science - Univ. Pisa | [email protected] | MultiCore, Cluster, Grid, Heterogeneous computing
Ramon Doallo | University of A Coruña | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Library, GPU, Other
Katerina Doka | National Technical University of Athens | | Grid, Applications, Heterogeneous computing
Anne C. Elster | NTNU (Norwegian Univ. of Science and Technology) | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Library, Scheduling, Heterogeneous computing, GPU
Francisco F. Rivera | Univ. Santiago de Compostela | [email protected] | MultiCore, Cluster, Grid, Applications
Ivan Georgiev | Institute of Mathematics and Informatics, BAS | [email protected] | Applications, Numerical Analysis, Library, Heterogeneous computing
Arnaud Giersch | LIFC, Univ. Franche-Comté | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Domingo Gimenez | University of Murcia | [email protected] | Applications, Scheduling, Heterogeneous computing
Magne Haveraaen | Universitetet i Bergen | [email protected] | MultiCore, Numerical Analysis, Library, Scheduling, GPU
Alexandru Herisanu | University Politehnica of Bucharest | [email protected] | MultiCore, Cluster, Grid, Applications, Heterogeneous computing
Emmanuel Jeannot | INRIA | [email protected] | MultiCore, Cluster, Grid, Library, Scheduling
Alexey Kalinov | Cadence Design Systems | [email protected] | Cluster, Applications, Heterogeneous computing, GPU
Konstantinos Karaoglanoglou | Aristotle University of Thessaloniki, Department of Informatics | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Eleni Karatza | Aristotle University of Thessaloniki, Department of Informatics | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Daniel Katz | University of Chicago | [email protected] | MultiCore, Cluster, Grid, Applications, Scheduling, Heterogeneous computing
Philipp Kegel | University of Munster | [email protected] | MultiCore, GPU
Attila Kertesz | MTA SZTAKI | [email protected] | Grid, Scheduling
Pierre Kuonen | University of Applied Sciences of Western Switzerland, Fribourg | [email protected] | MultiCore, Grid, Library, GPU
Krzysztof Kurowski | Institute of Bioorganic Chemistry - Poznan Supercomputing and Networking Center | | MultiCore, Cluster, Grid, Applications, Library, Scheduling, Heterogeneous computing, GPU
Jan Kwiatkowski | Wroclaw University of Technology | [email protected] | MultiCore, Cluster, Grid, Heterogeneous computing, GPU
Alexey Lastovetsky | University College Dublin | [email protected] | MultiCore, Cluster, Grid, Library, Scheduling, Heterogeneous computing
Fotis Loukos | Aristotle University of Thessaloniki, Department of Informatics | [email protected] | Applications, Library, Scheduling, Heterogeneous computing
Sidi Ahmed Mahmoudi | University of Mons | [email protected] | Applications, GPU
Pierre Manneback | University of Mons | [email protected] |
Svetozar Margenov | Institute for Parallel Processing, BAS | | Cluster, Applications, Numerical Analysis, Heterogeneous computing, Other
Ester Martin Garzon | University of Almeria | [email protected] | MultiCore, Cluster, Applications, Heterogeneous computing, GPU
José Carlos Mouriño Gallego | Application Senior Technician / CESGA | [email protected] | MultiCore, Grid, Applications, Library, Scheduling, Heterogeneous computing, GPU
Zafeirios Papazachos | Aristotle University of Thessaloniki | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Marcin Paprzycki | SRI PAS | [email protected] | Grid, Applications, Library, Scheduling, Heterogeneous computing, Other
Marcelo Pasin | University of Lisbon | [email protected] | Cluster, Grid
Dana Petcu | West University of Timisoara | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Scheduling
Antonio Plaza | University of Extremadura | [email protected] | Cluster, Applications, Heterogeneous computing, GPU
Peter Popov | IPP-BAS | [email protected] | Numerical Analysis, Library, Heterogeneous computing
Manuel Prieto-Matias | Complutense University of Madrid | [email protected] | MultiCore, Applications, Scheduling, Heterogeneous computing, GPU
Pierre Ramet | LaBRI - INRIA Bordeaux | [email protected] | MultiCore, Numerical Analysis, Library, Scheduling, GPU
Thomas Rauber | University Bayreuth | [email protected] | MultiCore, Cluster, Numerical Analysis, Scheduling, Heterogeneous computing
Wolfgang Rehm | University of Technology Chemnitz (CUT) | [email protected] | Cluster, Library, Heterogeneous computing
Nuno Roma | INESC-ID / IST TU Lisbon | [email protected] | MultiCore, Applications, Heterogeneous computing
Gudula Rünger | Chemnitz University of Technology | [email protected] | MultiCore, Cluster, Applications, Numerical Analysis, Scheduling
Franck Seinstra | Department of Computer Science, Vrije Universiteit, Amsterdam | [email protected] | MultiCore, Cluster, Grid, Applications, Library, GPU, Other
João Sobral | Universidade do Minho | [email protected] | MultiCore, Cluster, Library, Heterogeneous computing
Ozan Sonmez | Delft University of Technology | [email protected] | Cluster, Grid, Applications, Scheduling
Leonel Sousa | INESC-ID / IST, TU Lisbon | [email protected] | Cluster, Scheduling, Heterogeneous computing, GPU
Frederic Suter | CC IN2P3 / CNRS | [email protected] | Grid, Scheduling, Heterogeneous computing
Fabricio Sylva | University of Lisbon, Department of Informatics | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Guillermo L. Taboada | University of A Coruña, Spain | [email protected] | MultiCore, Cluster, Library
Samuel Thibault | University Bordeaux 1 | [email protected] | MultiCore, Library, Scheduling, Heterogeneous computing, GPU
Bora Ucar | LIP/ENS Lyon | [email protected] | Numerical Analysis, Scheduling, Heterogeneous computing
Jerzy Wasniewski | Danish Technical University | [email protected] | MultiCore, Applications, Numerical Analysis, Library, Other
Roman Wyrzykowski | Czestochowa University of Technology | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Library, GPU, Other
Ramin Yahyapour | TU Dortmund | [email protected] | Cluster, Grid, Scheduling
Stylianos Zikos | Aristotle University of Thessaloniki, Department of Informatics | [email protected] | Cluster, Grid, Scheduling, Heterogeneous computing
Julius Zilinskas | Institute of Mathematics and Informatics | [email protected] | MultiCore, Cluster, Grid, Applications, Numerical Analysis, Scheduling, Heterogeneous computing