The Distributed ASCI Supercomputer Project
Henri Bal, Raoul Bhoedjang, Rutger Hofman, Ceriel Jacobs, Thilo Kielmann, Jason Maassen,
Rob van Nieuwpoort, John Romein, Luc Renambot, Tim Rühl, Ronald Veldema, Kees Verstoep,
Aline Baggio, Gerco Ballintijn, Ihor Kuz, Guillaume Pierre, Maarten van Steen, Andy Tanenbaum,
Gerben Doornbos, Desmond Germans, Hans Spoelder, Evert-Jan Baerends, Stan van Gisbergen
Faculty of Sciences, Vrije Universiteit
Hamideh Afsermanesh, Dick van Albada, Adam Belloum, David Dubbeldam, Zeger Hendrikse,
Bob Hertzberger, Alfons Hoekstra, Kamil Iskra, Drona Kandhai, Dennis Koelma,
Frank van der Linden, Benno Overeinder, Peter Sloot, Piero Spinnato
Department of Computer Science, University of Amsterdam
Dick Epema, Arjan van Gemund, Pieter Jonker, Andrei Radulescu, Cees van Reeuwijk, Henk Sips
Delft University of Technology
Peter Knijnenburg, Michael Lew, Floris Sluiter, Lex Wolters
Leiden Institute of Advanced Computer Science, Leiden University
Hans Blom, Cees de Laat, Aad van der Steen
Faculty of Physics and Astronomy, Utrecht University
Abstract
The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting
of four cluster computers at different locations. DAS has been used for research on communication
software, parallel languages and programming systems, schedulers, parallel applications, and distributed
applications. The paper gives a preview of the most interesting research results obtained so far in the
DAS project (footnote 1).
Footnote 1: More information about the DAS project can be found on http://www.cs.vu.nl/das/
1 Introduction
The Distributed ASCI Supercomputer (DAS) is an experimental testbed for research on wide-area distributed
and parallel applications. The system was built for the Advanced School for Computing and Imaging (ASCI)
(footnote 2), a Dutch research school in which several universities participate. The goal of DAS is to
provide a common computational infrastructure for researchers within ASCI, who work on various aspects
of parallel and distributed systems, including communication substrates, programming environments, and
applications. Like a metacomputer [41] or computational grid [17], DAS is a physically distributed system
that appears to its users as a single, coherent system. Unlike metacomputers, DAS was designed as a
homogeneous system.
The DAS system consists of four cluster computers, located at four different universities in ASCI, linked
together through wide area networks (see Figure 1). All four clusters use the same processors and local
network and run the same operating system. Each university has fast access to its own local cluster. In
addition, a single application can use the entire wide-area system, for example for remote collaboration
or distributed supercomputing. DAS can be seen as a prototype computational grid, but its homogeneous
structure makes it easier to avoid the engineering problems of heterogeneous systems. (Heterogeneous systems
are the object of study in several other projects, most notably Legion [18] and Globus [16].) DAS can
also be seen as a cluster computer, except that it is physically distributed.
This paper gives a preview of some research results obtained in the DAS project since its start in June
1997. We first describe the DAS architecture in more detail (Section 2) and then we discuss how DAS is used
for research on systems software (Section 3) and applications (Section 4). Finally, in Section 5 we present
our conclusions.
2 The DAS system
DAS was designed as a physically distributed homogeneous cluster computer. We decided to use cluster
technology because of the excellent price/performance ratio of commodity (off-the-shelf) hardware. We
wanted the system to be distributed, to give the participating universities fast access to some local resources.
One of the most important design decisions was to keep the DAS system homogeneous. The reasons for
this choice were to allow easy exchange of software and to stimulate cooperation between ASCI researchers.
Both the hardware and the software of DAS are homogeneous: each node has the same processor and runs
the same operating system. Also, the local area network within all clusters is the same. The only variations
in the system are the amount of local memory and network interface memory (SRAM) in each node and the
number of nodes in each cluster. Three clusters have 24 nodes; the cluster at the VU initially contained 64
nodes, but was expanded to 128 nodes in May 1998.
Footnote 2: The ASCI research school is unrelated to, and came into existence before, the Accelerated
Strategic Computing Initiative.
Figure 1: The wide-area DAS system. [Diagram: clusters at VU Amsterdam (128 nodes), UvA Amsterdam
(24 nodes), Leiden (24 nodes), and Delft (24 nodes), connected by 6 Mbit/s ATM links.]
We selected the 200 MHz Pentium Pro as processor for the DAS nodes, at the time of purchase the
fastest Intel CPU available. The choice for the local network was based on research on an earlier cluster
computer, comparing the performance of parallel applications on Fast Ethernet, ATM, and Myrinet [29].
Myrinet was selected as local network, because it was by far the fastest of the networks considered. Myrinet
is a switch-based network using wormhole routing. Each machine contains an interface card with the LANai
4.1 programmable network interface processor. The interface cards are connected through switches. The
Myrinet switches are connected in a ring topology for the 24-node clusters and in a 4 by 8 torus for the
128-node cluster. Myrinet is used as fast user-level interconnect. The nodes in each cluster are also connected
by a partially switched Fast Ethernet, which is used for operating system traffic.
Since DAS is homogeneous, it runs a single operating system. We initially chose BSD/OS (from BSDI)
as OS, because it is a stable system with commercial support. The increasing popularity of Linux both
worldwide and within the ASCI school made us switch to Linux (RedHat 5.2) in early 1999.
The clusters were assembled by Parsytec. Figure 2 shows the 128-node cluster of the Vrije Universiteit.
Each node is a compact module rather than a desktop PC or minitower, but it contains a standard PC
motherboard. The clusters were delivered in June 1997.
The clusters are connected by wide-area networks in two different ways:
• using the National Research Network infrastructure (best effort network) and the LANs of the
universities
• using an ATM based Virtual Private Network (Quality of Service network)
Figure 2: The 128-node DAS cluster at the Vrije Universiteit.
This setup allowed us to compare a dedicated fixed Quality of Service network with a best effort network.
The best effort network generally consists of 100 Mbit/s Ethernet connections to a local infrastructure; each
university typically had a 34 Mbit/s connection to the Dutch backbone, later increased to 155 Mbit/s. The
ATM connections are all 6 Mbit/s constant bitrate permanent virtual circuits. The round trip times on
the ATM connections have hardly any variation and typically are around 4 ms. On the best effort network
the traffic always has to pass about 4 routers, each of which causes a delay of about a millisecond.
Measurements show that the round trip times vary by about an order of magnitude due to the other Internet
traffic. Attainable throughput on the ATM network is also constant. On the best effort network, the potential
throughput is much higher, but during daytime congestion typically gives throughputs of about 1-2 Mbit/s.
This problem improved later during the project.
For most research projects, we thus use the dedicated 6 Mbit/s ATM links. The clusters are connected by
these links using a fully-connected graph topology, so there is a link between every pair of clusters.
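As a small illustration of the fully-connected topology, the sketch below enumerates the wide-area links
between the four sites. The site names are taken from Figure 1; the helper name `wide_area_links` is ours
and is not part of any DAS software.

```python
from itertools import combinations

# Site names as they appear in Figure 1.
SITES = ["VU Amsterdam", "UvA Amsterdam", "Leiden", "Delft"]

def wide_area_links(sites):
    """Return one link per unordered pair of sites (fully-connected graph)."""
    return list(combinations(sites, 2))

links = wide_area_links(SITES)
print(len(links))  # n * (n - 1) / 2 = 6 links for n = 4 sites
```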
3 Research on systems software
DAS is used for various research projects on systems software, including low-level communication protocols,
languages, and schedulers.
3.1 Low-level communication software
High-speed networks like Myrinet can obtain communication speeds close to those of supercomputers, but
realizing this potential is a challenging problem. There are many intricate design issues for low-level network
interface (NI) protocols [8]. We have designed and implemented a network interface protocol for Myrinet,
called LFC [9, 10]. LFC is efficient and provides the right functionality for higher-level programming
systems. The LFC software runs partly on the host and partly on the embedded LANai processor of the
Myrinet network interface card. An interesting feature of LFC is its spanning tree broadcast protocol, which
is implemented on the NIs. By forwarding broadcast messages on the NI rather than on the host, fewer
interactions are needed between the NI and the host, thus speeding up broadcasts substantially.
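The idea of spanning tree forwarding can be sketched as follows. This is only an illustration: the tree
shape used by LFC is not specified here, so we assume a complete binary tree over node ranks, and the
function names `children` and `broadcast` are hypothetical. Each node that receives the message forwards
it to its children, which in LFC would happen on the NI without involving the host.

```python
def children(rank, n, fanout=2):
    """Children of `rank` in a complete `fanout`-ary tree over ranks 0..n-1 (assumed shape)."""
    first = rank * fanout + 1
    return [c for c in range(first, first + fanout) if c < n]

def broadcast(n, fanout=2):
    """Simulate forwarding from root 0; return the forwarding step at which each rank receives."""
    step = {0: 0}
    frontier = [0]
    while frontier:
        nxt = []
        for r in frontier:
            for c in children(r, n, fanout):
                step[c] = step[r] + 1   # c receives one forwarding step after its parent
                nxt.append(c)
        frontier = nxt
    return step

steps = broadcast(24)          # one 24-node cluster
print(max(steps.values()))     # depth of the assumed tree
```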
We have also developed a higher-level communication library, called Panda [5], which supports
asynchronous point-to-point communication, remote procedure calls, totally-ordered broadcast, and
multithreading. On Myrinet, Panda is implemented on top of LFC. The LFC and Panda libraries have been used
for a variety of programming systems, including MPI, PVM, Java, Orca, Jackal, and CRL.
3.2 Languages and programming systems
Various languages and libraries have been studied using DAS. Part of this work focuses on local clusters,
but several programming environments also have been implemented on the entire wide-area DAS system.
Manta is an implementation of Java designed for high-performance computing. Manta uses a static
(native) compiler rather than an interpreter or JIT (just-in-time compiler), to allow more aggressive
optimizations. Manta's implementation of Remote Method Invocation (RMI) is far more efficient than that
in other Java systems. On Myrinet, Manta obtains a null-latency for RMIs of 37 µsec, while the JDK 1.1
obtains a latency of more than 1200 µsec [31]. This dramatic performance improvement was obtained by
generating specialized serialization routines at compile time, by reimplementing the RMI protocol itself,
and by using LFC and Panda instead of TCP/IP. Manta uses its own RMI protocol, but also has the
functionality to interoperate with other Java Virtual Machines. To handle polymorphic RMIs [53], Manta is
able to accept a Java class file (bytecode) from a JVM, compile it dynamically to a binary format, and link
the result into the running application program.
We have also developed a fine-grained Distributed Shared Memory system for Java, called Jackal [52].
Jackal allows multithreaded (shared-memory) Java programs to be run on distributed-memory systems,
such as a DAS cluster. It implements a software cache-coherence protocol that manages regions. A region
is a contiguous block of memory that contains either a single object or a fixed-size partition of an array.
Jackal caches regions and invalidates the cached copies at synchronization points (the start and end of a
synchronized statement in Java). Jackal uses local and global mark-and-sweep garbage collection algorithms
that are able to deal with replicated objects and partitioned arrays. Jackal has been implemented on DAS
on top of LFC.
Spar/Java is a data and task parallel programming language for semi-automatic parallel programming, in
particular for the programming of array-based applications [47]. Apart from a few minor modifications, the
language is a superset of Java. This provides Spar/Java with a modern, solid language as basis, and makes
it accessible to a large group of users. Spar/Java extends Java with constructs for parallel programming,
extensive support for array manipulation, and a number of other powerful language features. It has a flexible
annotation language for specifying data and task mappings at any level of detail [46]. Alternatively,
compile-time or run-time schedulers can do (part of) the scheduling. Spar/Java runs on the DAS using MPI
and Panda.
Orca is an object-based distributed shared memory system. Its runtime system dynamically replicates
and migrates shared objects, using heuristic information from the compiler. Orca has been implemented
on top of Panda and LFC. An extensive performance analysis of Orca was performed on DAS, including a
comparison with the TreadMarks page-based DSM and the CRL region-based DSM [5]. Also, a data-parallel
extension to Orca has been designed and implemented, resulting in a language with mixed task and data
parallelism. This extended language and its performance on DAS are described in [20].
In the ESPRIT project PREPARE, an HPF compiler has been developed with an advanced and efficient
parallelization engine for regular array assignments [14, 39]. The PREPARE HPF compiler has been ported
to the DAS and uses the CoSy compilation framework in combination with the MPI message passing library.
In another ESPRIT project, called Dynamite (footnote 3), we have developed an environment for the dynamic
migration of tasks in a PVM program [21, 22, 44]. A version of this code is now available for SUN OS
5.5.1, SUN OS 5.6 and Linux/i386 2.0 and 2.2 (libc5 and glibc 2.0). DAS is being used for developing
and testing the Linux version of this code. The aim of this work is to develop an environment for Linux,
supporting dynamic task migration for PVM and MPI, and to make this environment available to the
research community. Dynamite is minimally intrusive in the sense that it does not require modifications in
the user's program and is implemented entirely in user space and thus does not require modifications to the
kernel. The Dynamite system includes: a modified version of the Linux ELF dynamic loader, which does
checkpointing and restarting of tasks; a modified version of PVM, supporting the transparent migration
of tasks; monitoring programs for the system load and the PVM system; a dynamic task scheduler; and,
optionally, a GUI that guides the user through the necessary steps to set up the environment
and to start up a program.
Several programming systems have also been implemented on multiple clusters of the wide-area DAS
system. Both Orca and Manta have been implemented on wide-area DAS and have been used to study the
performance of wide-area parallel applications [35, 45]. In addition, we have developed a new MPI (Message
Passing Interface) library for wide-area systems, called MagPIe [27]. MagPIe optimizes MPI's collective
communication operations and takes the hierarchical structure of wide-area systems into account. With
MagPIe, most collective operations require only a single wide-area latency. For example, an MPI broadcast
message is performed by sending it in parallel over the different wide-area links and then forwarding it
within each cluster. Existing MPI implementations that are not aware of the wide-area structure often
forward a message over multiple wide-area links (thus taking multiple wide-area latencies) or even send the
same information multiple times over the same wide-area link. On DAS, MagPIe outperforms MPICH by a
factor of 2-8.
Footnote 3: See http://www.hoise.com/dynamite
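The cluster-aware broadcast described above can be sketched in a few lines. This is our own simplified
model, not MagPIe code: the function name `hierarchical_broadcast`, the choice of the first node of each
remote cluster as coordinator, and the flat local fan-out are all assumptions made for brevity.

```python
def hierarchical_broadcast(clusters, root):
    """Return (wide_area_sends, local_sends) for a cluster-aware broadcast.

    clusters: dict mapping cluster name -> list of node ids.
    root: the node that starts the broadcast.
    """
    root_cluster = next(name for name, nodes in clusters.items() if root in nodes)
    wide, local = [], []
    for name, nodes in clusters.items():
        if name == root_cluster:
            coordinator = root
        else:
            coordinator = nodes[0]             # hypothetical choice of coordinator
            wide.append((root, coordinator))   # exactly one wide-area message per remote cluster
        # Flat local fan-out for brevity; a real library would forward along a tree.
        local.extend((coordinator, n) for n in nodes if n != coordinator)
    return wide, local

clusters = {"VU": [0, 1, 2, 3], "UvA": [4, 5], "Leiden": [6, 7], "Delft": [8, 9]}
wide, local = hierarchical_broadcast(clusters, root=0)
print(len(wide), len(local))
```

In this model every wide-area message leaves the root directly, so no path crosses more than one
wide-area link, and no data item crosses the same link twice.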
3.3 Schedulers
Another topic we are investigating with the DAS is the scheduling of parallel programs across multiple DAS
clusters. Our current DAS scheduler (prun) only operates on single clusters, so for multi-cluster scheduling
we need a mechanism for co-allocation of processors in different clusters at the same time. We are currently
investigating the use of Globus with its support for co-allocation [13] for this purpose. So far, we have
implemented a simple interface between Globus and prun, and we have been able to submit and run
two-cluster jobs through Globus. An important feature of prun that facilitates co-allocation is its support
for reservations. We are planning to enhance the interface between the local schedulers and Globus, and if
necessary the co-allocation component of Globus, so that better global scheduling decisions can be
made. Our first results on the performance of co-allocation in DAS-like systems can be found in [11].
4 Research on applications
We have used the DAS system for applications that run on a single cluster (Section 4.1) and for wide-area
applications (Section 4.2). Also, we have studied Web-based applications (Section 4.3) and worldwide
distributed applications (Section 4.4).
4.1 Parallel applications
DAS has been used for a large number of parallel applications, including discrete event simulation [33, 40],
Lattice Gas and Lattice Boltzmann simulations [15, 24, 25], parallel imaging [28], image searching [12],
datamining, N-body simulation [42], game tree search [38], simulation of branch-prediction mechanisms, ray
tracing, molecular dynamics, and quantum chemistry [19]. We discuss some of these applications in more
detail below.
The aim of the PILE project is to design a parallel programming model and environment for time-constrained
image processing applications [28]. The programming model is based on the analysis of typical solutions
employed by users from the image processing community. The PILE system is built around a software library
containing a set of abstract data types and associated operations executing in a data parallel fashion. As
implementation vehicles on the DAS, MPI, CRL, and Spar/Java are being investigated.
Another application area is image databases. Visual concept recognition [12] algorithms typically require
the solution of computationally intensive sub-problems such as correlation and the optimal linear principal
component transforms. We have designed efficient algorithms for distributing the recognition problem across
high bandwidth, distributed computing networks. This has led not only to new algorithms for parallelizing
prevalent principal component transforms, but also to novel techniques for segmenting images and video for
real time applications.
Another project investigates hardware usage for branch predictors. Branch predictors are used in most
processors to keep the instruction pipeline filled. As a first step, we want to investigate the effects of many
different (flavors of) algorithms. For this purpose, a database was built which currently holds about 8000
SQL-searchable records. Each record contains a detailed description of the state of a branch predictor after
the run of a trace. These traces were created on the DAS machine using a MIPS simulator, which simulates
six different Spec95 benchmarks. The database was also built on the DAS machine, which took about 20
hours using 24 nodes. The individual nodes were used as stand-alone computers. Each node ran a copy of a
branch-predictor simulator and worked on its own data set. The main advantage of using the DAS machine
for this project is that it provides a transparent multi-computer platform, which made it possible to build the
required database in a fraction of the time needed by a single computer. With the resulting database the
investigation of the next steps in this project has been started.
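To give an impression of what a branch-predictor simulator does with a trace, the sketch below runs a
table of two-bit saturating counters over a list of (program counter, taken) pairs. The two-bit counter
is a textbook predictor that we chose for illustration; it is an assumption and not necessarily one of
the algorithms studied in the project, and the function name and table size are hypothetical.

```python
def simulate_two_bit(trace, table_size=16):
    """Run a table of 2-bit saturating counters over (pc, taken) pairs; return the hit rate."""
    counters = [1] * table_size          # states 0-1 predict not taken, 2-3 predict taken
    hits = 0
    for pc, taken in trace:
        i = pc % table_size              # simple direct-mapped index (assumed)
        predicted = counters[i] >= 2
        hits += predicted == taken
        # Saturating update: move toward 3 on taken, toward 0 on not taken.
        counters[i] = min(3, counters[i] + 1) if taken else max(0, counters[i] - 1)
    return hits / len(trace)

# A loop branch that is taken nine times and then falls through once.
loop_trace = [(0x40, True)] * 9 + [(0x40, False)]
print(simulate_two_bit(loop_trace))
```

Because each node can run such a simulator on its own trace, the workload needs no communication
between nodes, which matches the stand-alone use of the DAS nodes described above.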
We also use DAS for experimental research on the Time Warp Discrete Event Simulation method. The
availability of fast, low-latency communication is an important asset here. We have made extensive use of the
DAS to run Time Warp simulations for studying the influence of the application dynamics on the execution
behavior of the Time Warp simulation kernel. The application under study is an Ising spin system. The
results clearly show the influence of the Ising spin dynamics on the Time Warp execution behavior in terms
of rollback length and frequency, and turnaround times. The results indicate the need for adaptive optimism
control mechanisms. Successful experiments showed the versatility of optimism control. First results have
been obtained for describing and measuring self organization in parallel asynchronous Cellular Automata with
Time Warp optimistic scheduling. It was shown that different scaling laws exist for rollback lengths with
varying Time Warp windows. These results were experimentally validated for stochastic spin dynamics
systems. The work is described in more detail in [40].
Another project studies N-body simulations. The numerical integration of gravitational N-body problems
in their most basic formulation requires $O(N^2)$ operations per time step and $O(N)$ time steps to study the
evolution of the system over a dynamically interesting period of time. Realistic system sizes range from
a few thousand (open clusters) through $10^6$ (globular clusters) to $10^{12}$ (large galaxies). There are several
options to speed up such calculations: use a parallel computer system; use fast, special purpose, massively
parallel hardware, such as the GRAPE-4 system of Makino [32]; avoid recomputing slowly varying forces too
frequently (i.e., use individual time-steps); or carefully combine groups of particles in computing the forces on
somewhat distant particles (this leads to the well-known hierarchical methods). Each of these techniques has
distinct advantages and drawbacks. In our research we strive to find optimal mixes of the above approaches
for various classes of problems. We have attached two GRAPE-4 boards, which were kindly made available
by Jun Makino, to two separate DAS nodes at the University of Amsterdam. The system is used both by the
Astronomy Department for actual N-body simulations, and by the Section Computational Science to model
the behavior of such hybrid computer systems and to guide the development of more advanced approaches
to N-body simulations, combining some or all of the above techniques.
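The quadratic cost of the basic formulation is easy to see in a direct-summation sketch. The code below
is a generic textbook force loop, not one of the astronomical codes used in the project; the function
name, the unit gravitational constant, and the softening parameter `eps` are our assumptions.

```python
import math

def accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct-summation gravitational accelerations with softening length `eps` (assumed)."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):                  # N * (N - 1) pair interactions per time step
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(x * x for x in d) + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc

# Two unit masses one length unit apart attract each other with unit acceleration.
acc = accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], eps=0.0)
print(acc)
```

The inner double loop is the part that special-purpose hardware such as GRAPE evaluates, while
hierarchical methods reduce the number of pairs that have to be visited at all.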
The implementation of a hierarchical algorithm on a parallel computer and the control of the resulting
numerical errors are important components of our research. The methodologies to be developed have a
wider applicability than astrophysical N-body problems alone. Experiments have been performed on two
astronomical N-body codes that have been instrumented to obtain performance data for individual software
components and the GRAPE. One of the codes was adapted to run as a parallel code on the DAS with
GRAPE. A simulation model for the resulting hybrid architecture has been developed that reproduces the
actual performance of the system quite accurately, so we will use this model for performance analysis and
prediction for similar hybrid systems.
We also study particle models with highly constrained dynamics. These Lattice Gas (LGA) and Lattice
Boltzmann models (LBM) originated as mesoscopic particle models that can mimic hydrodynamics. We use
these models to study fluctuations in fluids and to study flow and transport in disordered media, such as
random fiber networks and random close packings of spheres. In all cases the computational demands are
huge. Typical LGA runs require 50 hours of compute time on 4 to 8 DAS nodes and are compute bound.
Typical LBM runs simulate flow on very large grids ($256^3$ to $512^3$) and are usually bounded by the available
memory in the parallel machine. Large production runs are typically executed on 8 to 16 DAS nodes.
Our parallel Lattice Boltzmann code was originally developed on a Cray T3E [24, 25] under MPI.
Because of the inherent data locality of the LBM iteration, parallelization was straightforward. However, to
guarantee good load balancing we use Orthogonal Recursive Bisection to obtain a good partitioning of the
computational box. The code was ported to several systems (DAS, Parsytec CC, IBM SP2). Currently, we
investigate flow and transport in random close packings of spheres, using the DAS.
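Orthogonal Recursive Bisection can be sketched as a recursive median cut. The version below is a
minimal illustration under our own assumptions: all points carry equal weight, the number of parts is a
power of two, and the cut is made along the axis with the largest extent. It is not the partitioning code
used in the Lattice Boltzmann program, and the function name `orb` is hypothetical.

```python
def orb(points, parts):
    """Split `points` (tuples of coordinates) into `parts` groups; `parts` is a power of two."""
    if parts == 1:
        return [points]
    dims = len(points[0])
    # Cut along the axis with the largest extent.
    axis = max(range(dims),
               key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                 # median cut balances the point counts
    return orb(ordered[:mid], parts // 2) + orb(ordered[mid:], parts // 2)

# An 8 x 4 grid of sites divided over four processors.
grid = [(x, y) for x in range(8) for y in range(4)]
groups = orb(grid, 4)
print([len(g) for g in groups])
```

For a porous medium, only the fluid sites would be passed to the partitioner, so that each processor
receives a comparable amount of work even when the solid fraction varies across the box.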
We developed a generic parallel simulation environment for thermal 2-dimensional LGA [15]. The
decomposition of the rectangular computational grid is obtained by a strip-wise partitioning. An important
part of the simulation is continuous Fourier transformations of density fluctuations after each LGA iteration.
As a parallel FFT we use the very efficient public domain package FFTW (http://www.fftw.org/) which
executes without any adaptations on the DAS.
Another project involves parallel ray tracing. The ray tracer we use is based on Radiance and uses
PVM (on top of Panda) for inter-processor communication. The port to DAS was carried out with the aim of
comparing performance results on different platforms, including the DAS, a Parsytec CC, and various clusters
of workstations. The algorithm consists of a data-parallel part for those tasks which require either a
large amount of data or data which is difficult to predict in advance. This leads to a basic, but unbalanced
workload. In order to achieve proper efficiencies, demand-driven tasks are created where possible. These
include tasks which are relatively compute intensive and require a small amount of data. Demand-driven
tasks are then used to balance the workload. An overview of the algorithm, including results, is given
in [36].
In the Multigame project, we have developed an environment for distributed game-tree search. A
programmer describes the legal moves of a game in the Multigame language and the compiler generates a move
generator for that game. The move generator is linked with a runtime system that contains parallel search
engines and heuristics, resulting in a parallel program. Using the Multigame system, we developed a new
search algorithm for single-agent search, called Transposition Driven Search, which obtains nearly perfect
speedups up to 128 DAS nodes [38].
Several interactive applications are being studied that use parallelism to obtain real-time responses, for
example for steering simulations. Examples are simulation of the chaotic behavior of lasers or of the motion
of magnetic vortices in disordered superconductors, real-time analysis of experimental data (e.g., determining
the current patterns in flat conductors from magneto-optical imaging [54]), and post-processing experimental
results. Examples of the latter category are reconstruction of 3D images of teeth from local CT-data and
reconstruction of the sea-bottom structure from acoustical data. As many problems use linear-algebraic
operations or Fast Fourier Transforms, excellent speed-ups can be obtained.
One of the largest applications ported to DAS is the Amsterdam Density Functional (ADF) program [19]
of the Theoretical Chemistry section of the Vrije Universiteit. ADF is a quantum chemical program. It uses
density functional theory to calculate the electronic structure of molecules, which can be used for studying
various chemical problems. ADF has been implemented on DAS on top of the MPI/Panda/LFC layers. The
obtained speed-up strongly depends on the chosen accuracy parameters, the molecule, and the calculated
property. The best speed-up measured so far was 70 on 90 CPUs for a calculation on 30 water molecules.
Current work focuses on eliminating the remaining sequential bottlenecks and improving the load balancing
scheme.
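To put the best measured speed-up in perspective, the sketch below computes the parallel efficiency
and the sequential fraction that Amdahl's law would imply if sequential code were the only source of
loss. That is an assumption made for the sake of the estimate: load imbalance and communication also
contribute, so the true sequential fraction is likely smaller. The function names are ours.

```python
def efficiency(speedup, cpus):
    """Parallel efficiency: achieved speed-up divided by the number of CPUs."""
    return speedup / cpus

def amdahl_serial_fraction(speedup, cpus):
    """Serial fraction f that solves speedup = 1 / (f + (1 - f) / cpus)."""
    return (cpus / speedup - 1) / (cpus - 1)

eff = efficiency(70, 90)
f = amdahl_serial_fraction(70, 90)
print(f"efficiency = {eff:.3f}, implied serial fraction = {f:.4f}")
```

Under this simple model, a sequential fraction of only about 0.3% already accounts for the whole gap
between a speed-up of 70 and the ideal of 90, which illustrates why the remaining sequential
bottlenecks matter at this scale.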
We have also compared the application performance of the DAS clusters with that of a 4-processor SGI
Origin200 system (at the University of Utrecht). This study mainly uses the EuroBen benchmark, which
indicates performance for technical/scientific computations. The single-node observed peak performance of
the Pentium Pro nodes of DAS is 3-5 times lower than that of the Origin200 nodes, mainly because the
O200 nodes are superscalar (they have more independently schedulable floating-point operations per cycle).
We compared the communication performance of DAS and the O200 by running a simple ping-pong test
written in Fortran 77/MPI. The MPI system used for DAS is a port of MPICH on top of Panda and LFC.
The bandwidth within a local DAS cluster is about half of that of the O200. Finally, we measured the
performance of a dense matrix-vector multiplication. The DAS version scales well, but for large problem sizes
the program runs out of its L2 cache, resulting in a performance decrease. The maximum speed obtained
on DAS (1228 Mflop/s) is a factor of 6 lower than on the O200 (7233 Mflop/s).
4.2 Wide-area applications
One of the goals of the DAS project was to do research on distributed supercomputing applications, which
solve very large problems using multiple parallel systems at different geographic locations. Most experiments
in this area done so far (e.g., SETI@home and RSA-155) use very coarse-grained applications. In our work,
we also investigate whether more medium-grained applications can be run efficiently on a wide-area system.
The problem here, of course, is the relatively poor performance of the wide-area links. On DAS, for example,
most programming systems obtain a null-latency over the local Myrinet network of about 30-40 µsec, whereas
the wide-area ATM latency is several milliseconds. The throughput obtained for Myrinet typically is 30-60
Mbyte/sec and for the DAS ATM network it is about 0.5 Mbyte/sec. So, there are about two orders of
magnitude difference in performance between the local and wide-area links.
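The gap can be made concrete with a simple latency-plus-bandwidth cost model. The parameter values
below are assumptions derived from the ranges quoted above: we take the midpoints for Myrinet and read
"several milliseconds" as 3 ms. The function and constant names are ours.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to deliver one message under a linear latency + size / bandwidth model."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Assumed values: midpoints of the quoted Myrinet ranges; ATM latency taken as 3 ms.
MYRINET = dict(latency_s=35e-6, bandwidth_bytes_per_s=45e6)
ATM = dict(latency_s=3e-3, bandwidth_bytes_per_s=0.5e6)

for size in (0, 1_000, 100_000):
    local = transfer_time(size, **MYRINET)
    wide = transfer_time(size, **ATM)
    print(f"{size:>7} bytes: wide-area / local = {wide / local:.0f}x")
```

With these assumed values the ratio stays in the range of roughly 85 to 90 for all three message sizes,
which is consistent with the two orders of magnitude mentioned above.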
Many researchers have therefore come to believe that it is impossible to run medium-grained applications
on wide-area systems. Our experience, however, contradicts this expectation. The reason is that it is possible
to exploit the hierarchical structure of systems like DAS. Most communication links in DAS are fast Myrinet
links, and only a few links are slow ATM links. We have discovered that many parallel algorithms can be
optimized by taking this hierarchical structure into account [6, 35, 45]. The key is to avoid the overhead on
the wide-area links, or to mask wide-area communication. Many applications can be made to run much faster
on the wide-area DAS using well-known optimizations like message combining, hierarchical load balancing,
and latency hiding. The speedups of the optimized programs are often close to those on a single cluster with
the same total number of CPUs.
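Message combining, one of the optimizations mentioned above, can be sketched as a small buffering
layer in front of the wide-area send routine. This is an illustrative sketch rather than code from any
of the DAS programming systems; the class name, the fixed batch size, and the `send` callback are
assumptions.

```python
from collections import defaultdict

class CombiningSender:
    """Buffer small messages per destination cluster and send them in batches."""

    def __init__(self, send, batch_size=4):
        self.send = send                    # callable(cluster, list_of_messages)
        self.batch_size = batch_size
        self.pending = defaultdict(list)

    def post(self, cluster, message):
        """Queue a message; forward the whole batch once it is full."""
        self.pending[cluster].append(message)
        if len(self.pending[cluster]) >= self.batch_size:
            self.flush(cluster)

    def flush(self, cluster):
        """Send whatever is queued for `cluster` as one wide-area message."""
        if self.pending[cluster]:
            self.send(cluster, self.pending.pop(cluster))

sent = []
sender = CombiningSender(lambda cluster, batch: sent.append((cluster, batch)))
for i in range(10):
    sender.post("Delft", i)
sender.flush("Delft")
print(len(sent))   # ten logical messages travel in three wide-area sends
```

Each batch pays the wide-area latency once, so the per-message overhead drops with the batch size, at
the price of delaying the first messages in a batch.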
In addition to this distributed supercomputing type of application, interest is also increasing in using DAS
for other types of computational-grid applications. An important issue is to harness the large computational
and storage capabilities that are provided by such systems. A next step is to develop new types of application
environments on top of these grids. Virtual laboratories are one form of such new application environments.
They will, in the near future, have to allow an experimental scientist (whether a physicist, a biologist, or an
engineer) to do experiments or develop designs. A virtual laboratory environment will consist of equipment
(like a mass spectrometer or a DNA micro array) that can be remotely controlled and that will provide data
sets that can be stored in the information management part of the laboratory. Interaction with the data
can, among others, take place in virtual reality equipment like a CAVE. We have used DAS and another
SMP-based cluster computer (Arches) to develop the distributed information management for such systems
as well as to study the possibilities for more generic user-oriented middleware for such laboratories [23].
In another project (which is part of the Dutch Robocup initiative), we have developed an interactive and
collaborative visualization system for multi-agent systems, in particular robot soccer. Our system allows
human users in CAVEs at different geographic locations to participate in a virtual soccer match [37, 43].
The user in the CAVE can navigate over the soccer field and kick a virtual ball. In this way, the user interacts
with a running (parallel) simulation program, which runs either on an SP2 or a DAS cluster. In general,
many other scientific applications can benefit from such a form of high-level steering from a Virtual Reality
environment. The wide-area networks of DAS are an interesting infrastructure for implementing distributed
collaborative applications in general, as the link latency has hardly any variation.
4.3 Web-based appli ations
We have studied Web caching using DAS and Arches, focusing mainly on cache replacement and coherence
strategies. This study showed that the techniques currently used for Web caching are very simple
and are mostly derived from earlier work in computer architecture systems. The experiments show that these
techniques are still quite efficient compared to some new techniques that have been proposed specifically for
Web caching [1, 2, 3]. Since in Web caching both strong and weak document coherency are considered,
we have performed experiments in which we studied the quality of the hits (good hits are performed on
up-to-date cached documents). We have shown the existence of two categories of replacement strategies,
performing the hits on recently requested documents or (mainly) on long-term cached documents. The usage
of the cached documents is quite different in both classes and the cache size has a different impact on each
category. We have compared the efficiency of strong and weak document coherency. The results show that
with weak document coherency, between 10% and 26% of the forwarded documents were out-of-date, while
the useless generated traffic remains quite high: 40% to 70% of the messages exchanged to check the state
of the cached documents are useless. To study strong coherency, we used a typical method that uses the
invalidation protocol. The results show that the cost paid for this can be quite high. On average 70% of the
invalidation messages are useless and arrive at a cache server after the target document has been removed
from the cache.
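The terms used above (good hits, stale hits, useless check messages, useless invalidations) can be made concrete with a small Python sketch of a single cache with LRU replacement. It is an illustration only and not the simulator used in our experiments: the class name DocumentCache, the time-to-live (TTL) rule used to model weak coherency, and the integer version numbers standing in for document contents are all assumptions.

```python
from collections import OrderedDict


class DocumentCache:
    """Hypothetical LRU cache; entries are trusted for `ttl` time units."""

    def __init__(self, capacity, ttl):
        self.capacity = capacity
        self.ttl = ttl
        self.entries = OrderedDict()  # url -> [cached_version, last_checked]
        self.stats = {"miss": 0, "good_hit": 0, "stale_hit": 0,
                      "useful_check": 0, "useless_check": 0}

    def request(self, url, now, origin_version):
        """Serve `url` at time `now`; `origin_version` is what the server holds."""
        entry = self.entries.get(url)
        if entry is None:
            outcome = "miss"
            self.entries[url] = [origin_version, now]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        else:
            if now - entry[1] >= self.ttl:
                # TTL expired: ask the server whether the copy is still valid.
                changed = entry[0] != origin_version
                self.stats["useful_check" if changed else "useless_check"] += 1
                entry[0], entry[1] = origin_version, now
            # Within the TTL the copy is served without contacting the server,
            # so it may be out of date (weak coherency).
            outcome = "good_hit" if entry[0] == origin_version else "stale_hit"
            self.entries.move_to_end(url)
        self.stats[outcome] += 1
        return outcome

    def invalidate(self, url):
        """Strong coherency: the server pushes an invalidation for `url`.

        Returns False when the message is useless because the document
        has already been evicted from this cache.
        """
        return self.entries.pop(url, None) is not None
```

Replaying a request trace through request() and reading the stats dictionary gives the fraction of stale hits and of useless checks for a given cache size and TTL; calling invalidate() for every server-side update gives the fraction of useless invalidations.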
In another project, called ARCHIPEL [7], we study information service brokerage systems. In order
to support the wide variety of application requirements, fundamental approaches to electronic commerce,
Web-based clearing houses, distributed databases, and information service brokerage systems need to be
developed. The ARCHIPEL system developed at the University of Amsterdam aims at both addressing and
analyzing these complex needs, and providing the environment and system for their adequate support. The
results achieved at this early stage of the project include the identification of the challenging requirements for
the ARCHIPEL support infrastructure, such as Web-based operation, cooperation and interoperability, high
performance, support for node autonomy, preservation of information visibility (and information modification)
rights, and platform heterogeneity. So far, a comprehensive description of the ARCHIPEL infrastructure has
been provided that unites different characteristics of existing parallel, distributed, and federated database
management systems within one uniform architecture model. Currently, different methodologies and
technologies are being evaluated to fulfill the requirements, among others XML technology, the ODBC
standard, and the rapidly improving Java-based programming and distributed management tools.
4.4 Worldwide distributed applications
The goal of the Globe project [50, 51] is to design and build a prototype of a scalable infrastructure for future
worldwide distributed applications. The infrastructure is designed to support up to a billion users, spread
over the whole world. These users may own up to a trillion objects, some of them mobile. Many aspects of
the system, such as replicating data over multiple locations and managing security, should be automatic or at
least easily configurable. Although Globe interworks with the World Wide Web, it is intended to run natively
on the Internet, just as email, USENET news, and FTP do. DAS, with its 200 nodes at four locations, has
been used as a testbed to test pieces of the Globe system. We hope that a first prototype of Globe will be
available in early 2001.
The software technology underlying Globe is the distributed shared object. Each object has methods
that its users can invoke to obtain services. An object may be replicated on multiple machines around the
world and accessed locally from each one as if it were a local object. The key to scalability is that each
object has its own policies with respect to replication, coherence, communication, security, etc., and these
policies are encapsulated within the object. For example, an object providing financial services may require
sequential consistency, whereas an object providing sports scores may have a much weaker consistency. It
is this ability to tailor the various policies per object that makes Globe scalable because those objects with
demanding requirements can have them without affecting objects that can live with weaker guarantees.
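The idea of encapsulating the coherence policy inside each object can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not the Globe implementation or its API: the class ReplicatedObject, the policy names "sequential" and "weak", and the explicit sync() call that stands in for background propagation are all hypothetical.

```python
class ReplicatedObject:
    """Hypothetical replicated object that carries its own coherence policy."""

    def __init__(self, n_replicas, policy):
        if policy not in ("sequential", "weak"):
            raise ValueError("unknown policy: " + policy)
        self.policy = policy
        self.replicas = [{} for _ in range(n_replicas)]  # one state per replica
        self.pending = []  # updates not yet propagated (weak policy only)

    def write(self, replica_id, key, value):
        if self.policy == "sequential":
            # Stand-in for a sequentially consistent protocol: every replica
            # applies the update, in the same order, before the write returns.
            for state in self.replicas:
                state[key] = value
        else:
            # Weak policy: apply locally now; other replicas catch up on sync().
            self.replicas[replica_id][key] = value
            self.pending.append((key, value))

    def read(self, replica_id, key):
        """Reads are always served from the local replica."""
        return self.replicas[replica_id].get(key)

    def sync(self):
        """Propagate pending updates to all replicas (weak policy)."""
        for key, value in self.pending:
            for state in self.replicas:
                state[key] = value
        self.pending.clear()
```

An account object created with the "sequential" policy pays for every write at all replicas, while a sports-score object created with the "weak" policy writes locally and lets remote readers see an older value until the next sync(). Neither choice affects the other object.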
One of the aspects studied in detail is locating objects (files, mailboxes, Web pages, etc.) in such a large
system. When an object wishes to be found, it registers with the location server, which tracks the location
of all objects in a worldwide tree [48, 49]. The tree exploits locality, caching, and other techniques to make
it scalable. Recent work has focused on handling mobile objects. Other work has looked at automating the
decision about where to place replicas of objects to minimize bandwidth and delay [34]. The main conclusion
here is that the ability to tailor the replication policy to each object's access patterns provides significant
gains over a single replication policy for all objects and is far better than having only a single copy of each
object. We have also looked at security issues [30].
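How a tree can exploit locality during lookups is illustrated by the Python sketch below. It is a simplified model under our own assumptions rather than the algorithm of the Globe location service [48, 49]: the names Node, register, and lookup are hypothetical, each object has a single address, and caching and mobility are left out. An object registers at a leaf region and leaves a chain of pointers up to the root; a lookup climbs from the client's leaf only as far as the first node that knows the object and then follows the pointers down.

```python
class Node:
    """A region in a hypothetical location tree (the root covers the world)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.pointers = {}  # object id -> child Node, or an address at a leaf


def register(leaf, obj_id, address):
    """Store the address at the leaf and a chain of pointers up to the root."""
    leaf.pointers[obj_id] = address
    child, node = leaf, leaf.parent
    while node is not None:
        node.pointers[obj_id] = child
        child, node = node, node.parent


def lookup(leaf, obj_id):
    """Climb until a node knows the object, then descend to its address.

    Returns (address, hops); address is None if the object is unknown.
    """
    node, hops = leaf, 0
    while obj_id not in node.pointers:
        if node.parent is None:
            return None, hops
        node, hops = node.parent, hops + 1
    target = node.pointers[obj_id]
    while isinstance(target, Node):
        node, hops = target, hops + 1
        target = node.pointers[obj_id]
    return target, hops
```

In this model a client in the same region as the object resolves it in fewer hops than a client on another continent, and only the latter lookup reaches the root.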
Three applications of Globe have been built. The first one, Globedoc [26], is used to produce a better
Web on top of Globe. Globedoc allows one or more HTML pages plus some appropriate collection of icons,
images, etc. to be packaged together into a single Globedoc and transmitted all at once, a vastly more
efficient scheme than the current Web. Gateways to and from the Web have been constructed so Globedoc
objects can be viewed with existing browsers. The second application is the Globe Distribution Network [4],
an application to provide a scalable worldwide distribution scheme for complex free software packages. The
third application is an instant-messaging service (called Loc8) built as a front end to the location service.
This service allows users all over the world to contact each other, regardless of whether they are mobile or not.
Special attention has been paid to security. In contrast to the centralized approach of existing services, our
Loc8 service is highly distributed and exploits locality as much as possible to attain scalability.
5 Conclusions
The Distributed ASCI Supercomputer (DAS) is a 200-node homogeneous wide-area cluster computer that
is used for experimental research within the ASCI research school. Since the start of the project in June
1997, a large number of people have used the system for research on communication substrates, scheduling,
programming languages and environments, and applications.
The cluster computers of DAS use Myrinet as a fast user-level interconnect. Efficient communication
software is the key to obtaining high communication performance on modern networks like Myrinet.
We designed an efficient low-level communication substrate for Myrinet, called LFC. LFC provides the
right functionality to higher level layers (e.g., MPI, PVM, Java RMI, Orca), allowing them to obtain high
communication performance. Most programming systems (even Java RMI) obtain null-latencies of 30-40 µsec
and throughputs of 30-60 Mbyte/sec over Myrinet. We have therefore been able to successfully implement a
wide variety of parallel applications on the DAS clusters, including many types of imaging applications and
scientific simulations.
DAS also is an excellent vehicle for doing research on wide-area applications, because it is homogeneous
and uses dedicated wide-area networks. The constant bandwidth and the low round trip times of the
WANs make message passing between the clusters predictable. On heterogeneous computational grids [17],
additional problems must be solved (e.g., due to differences in processors and networks). Our work on
wide-area parallel applications on DAS shows that there is a much richer variety of applications than expected
that can benefit from distributed supercomputing. The basic assumption we rely on is that the distributed system
is structured hierarchically. This assumption fails for systems built from individual workstations at random
locations on the globe, but it does hold for grids built from MPPs, clusters, or networks of workstations.
We expect that future computational grids will indeed be structured in such a hierarchical way, exhibiting
locality like DAS.
Acknowledgements
The DAS system is financed partially by the Netherlands Organization for Scientific Research (NWO) and
by the board of the Vrije Universiteit. The system was built by Parsytec, Germany. The wide-area ATM
networks are part of the Surfnet infrastructure. A large number of other people have contributed to the
DAS project, including Egon Amade, Arno Bakker, Sanya Ben Hassen, Christopher Hänle, Philip Homburg,
Jim van Keulen, Jussi Leiwo, Aske Plaat, Patrick Verkaik, Ivo van der Wijk (Vrije Universiteit), Gijs
Nelemans, Gert Polletiek (University of Amsterdam), Wil Denissen, Vincent Korstanje, Frits Kuijlman
(Delft University of Technology), and Erik Reinhard (University of Bristol). We thank Jun Makino for
kindly providing us with two GRAPE-4 boards.
References
[1] A. Belloum and B. Hertzberger. Dealing with One-Time Documents in Web Caching. In EUROMICRO'98
Conference, Sweden, August 1998.
[2] A. Belloum and B. Hertzberger. Replacement Strategies in Web Caching. In ISIC/CIRA/ISAS'98
IEEE conference, Gaithersburg, Maryland, September 1998.
[3] A. Belloum, A.H.J. Peddemors, and B. Hertzberger. JERA: A Scalable Web Server. In Proceedings of
the PDPTA'98 conference, pages 167-174, Las Vegas, NV, 1998.
[4] A. Bakker, E. Amade, G. Ballintijn, I. Kuz, P. Verkaik, I. van der Wijk, M. van Steen, and A.S.
Tanenbaum. The Globe Distribution Network. In Proc. 2000 USENIX Annual Conf. (FREENIX
Track), pages 141-152, June 2000.
[5] H. Bal, R. Bhoedjang, R. Hofman, C. Jacobs, K. Langendoen, T. Rühl, and F. Kaashoek. Performance
Evaluation of the Orca Shared Object System. ACM Transactions on Computer Systems, 16(1):1-40,
February 1998.
[6] H.E. Bal, A. Plaat, M.G. Bakker, P. Dozy, and R.F.H. Hofman. Optimizing Parallel Applications for
Wide-Area Clusters. In International Parallel Processing Symposium, pages 784-790, Orlando, FL,
April 1998.
[7] A. Belloum, H. Muller, and B. Hertzberger. Scalable Federations of Web Caches. Submitted to the
special issue on Web performance of the Journal of Performance Evaluation, 1999.
[8] R.A.F. Bhoedjang, T. Rühl, and H.E. Bal. Design Issues for User-Level Network Interface Protocols for
Myrinet. IEEE Computer, 31(11):53-60, November 1998.
[9] R.A.F. Bhoedjang, T. Rühl, and H.E. Bal. Efficient Multicast on Myrinet Using Link-Level Flow
Control. In Proc. of the 1998 Int. Conf. on Parallel Processing, pages 381-390, Minneapolis, MN,
August 1998.
[10] R.A.F. Bhoedjang, K. Verstoep, T. Rühl, H.E. Bal, and R.F.H. Hofman. Evaluating Design Alternatives
for Reliable Communication on High-Speed Networks. In Proc. 9th Int. Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS-9), Cambridge, MA, November
2000.
[11] A.I.D. Bucur and D.H.J. Epema. The Influence of the Structure and Sizes of Jobs on the Performance of
Co-Allocation. In Sixth Workshop on Job Scheduling Strategies for Parallel Processing (in conjunction
with IPDPS2000), Lecture Notes in Computer Science, Cancun, Mexico, May 2000. Springer-Verlag,
Berlin.
[12] J. Buijs and M. Lew. Learning Visual Concepts. In ACM Multimedia'99, Orlando, FL, November 1999.
[13] K. Czajkowski, I. Foster, and C. Kesselman. Resource Co-Allocation in Computational Grids. In Proc.
of the 8-th IEEE Int'l Symp. on High Performance Distributed Computing, pages 219-228, Redondo
Beach, CA, USA, July 1999.
[14] W.J.A. Denissen, V.J. Korstanje, and H.J. Sips. Integration of the HPF Data-parallel Model in the CoSy
Compiler Framework. In 7th International Conference on Compilers for Parallel Computers (CPC'98),
pages 141-158, Linkoping, Sweden, June 1998.
[15] D. Dubbeldam, A.G. Hoekstra, and P.M.A. Sloot. Computational Aspects of Multi-Species Lattice-Gas
Automata. In P.M.A. Sloot, M. Bubak, A.G. Hoekstra, and L.O. Hertzberger, editors, High-Performance
Computing and Networking (HPCN Europe '99), Amsterdam, The Netherlands, number 1593 in Lecture
Notes in Computer Science, pages 339-349, Berlin, April 1999. Springer-Verlag.
[16] I. Foster and C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. Int. Journal of
Supercomputer Applications, 11(2):115-128, Summer 1997.
[17] I. Foster and C. Kesselman, editors. The GRID: Blueprint for a New Computing Infrastructure. Morgan
Kaufmann, 1998.
[18] A.S. Grimshaw and Wm. A. Wulf. The Legion Vision of a Worldwide Virtual Computer. Comm. ACM,
40(1):39-45, January 1997.
[19] C. Fonseca Guerra, J. G. Snijders, G. te Velde, and E. J. Baerends. Towards an Order-N DFT method.
Theor. Chem. Acc., 99:391-403, 1998.
[20] S. Ben Hassen, H.E. Bal, and C. Jacobs. A Task and Data Parallel Programming Language based on
Shared Objects. ACM Trans. on Programming Languages and Systems, 20(6):1131-1170, November
1998.
[21] K.A. Iskra, Z.W. Hendrikse, G.D. van Albada, B.J. Overeinder, and P.M.A. Sloot. Experiments with
Migration of PVM Tasks. In Research and Development for the Information Society Conference
Proceedings (ISThmus 2000), pages 295-304, 2000.
[22] K.A. Iskra, F. van der Linden, Z.W. Hendrikse, B.J. Overeinder, G.D. van Albada, and P.M.A. Sloot.
The implementation of Dynamite - an environment for migrating PVM tasks. ACM OS Review
(submitted), 2000.
[23] E. Kaletas, A.H.J. Peddemors, and H. Afsarmanesh. ARCHIPEL Cooperative Islands of Information.
Internal report, University of Amsterdam, Amsterdam, The Netherlands, June 1999.
[24] D. Kandhai. Large Scale Lattice-Boltzmann Simulations: Computational Methods and Applications. PhD
thesis, University of Amsterdam, Amsterdam, The Netherlands, 1999.
[25] D. Kandhai, A. Koponen, A.G. Hoekstra, M. Kataja, J. Timonen, and P.M.A. Sloot. Lattice Boltzmann
Hydrodynamics on Parallel Systems. Computer Physics Communications, 111:14-26, 1998.
[26] A.M. Kermarrec, I. Kuz, M. van Steen, and A.S. Tanenbaum. A Framework for Consistent, Replicated
Web Objects. In Proceedings of the 18th International Conference on Distributed Computing Systems
(ICDCS), May 1998.
[27] T. Kielmann, R.F.H. Hofman, H.E. Bal, A. Plaat, and R.A.F. Bhoedjang. MagPIe: MPI's Collective
Communication Operations for Clustered Wide Area Systems. In ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming, pages 131-140, Atlanta, GA, May 1999.
[28] D. Koelma, P.P. Jonker, and H.J. Sips. A software architecture for application driven high performance
image processing. Parallel and Distributed Methods for Image Processing, Proceedings of SPIE,
3166:340-351, July 1997.
[29] K. Langendoen, R. Hofman, and H. Bal. Challenging Applications on Fast Networks. In HPCA-4
High-Performance Computer Architecture, pages 125-137, Las Vegas, NV, February 1998.
[30] J. Leiwo, C. Hänle, P. Homburg, C. Gamage, and A.S. Tanenbaum. A security design for a wide-area
distributed system. In Proc. Second Int'l Conf. Information Security and Cryptography (ICISC'99),
LNCS 1878, December 1999.
[31] J. Maassen, R. van Nieuwpoort, R. Veldema, H.E. Bal, and A. Plaat. An Efficient Implementation
of Java's Remote Method Invocation. In ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming, pages 173-182, Atlanta, GA, May 1999.
[32] J. Makino, M. Taiji, T. Ebisuzaki, and D. Sugimoto. GRAPE-4: A Massively-parallel Special-purpose
Computer for Collisional N-body Simulation. Astrophysical Journal, 480:432-446, 1997.
[33] B.J. Overeinder, A. Schoneveld, and P.M.A. Sloot. Self-Organized Criticality in Optimistic Simulation
of Correlated Systems. Submitted to Journal of Parallel and Distributed Computing, 2000.
[34] G. Pierre, I. Kuz, M. van Steen, and A.S. Tanenbaum. Differentiated Strategies for Replicating Web
Documents. In Proc. 5th International Web Caching and Content Delivery Workshop, May 2000.
[35] A. Plaat, H. Bal, and R. Hofman. Sensitivity of Parallel Applications to Large Differences in Bandwidth
and Latency in Two-Layer Interconnects. In Fifth International Symposium on High-Performance
Computer Architecture, pages 244-253, Orlando, FL, January 1999. IEEE CS.
[36] E. Reinhard, A. Chalmers, and F.W. Jansen. Hybrid scheduling for parallel rendering using coherent
ray tasks. In J. Ahrens, A. Chalmers, and Han-Wei Shen, editors, 1999 IEEE Parallel Visualization
and Graphics Symposium, pages 21-28, October 1999.
[37] L. Renambot, H.E. Bal, D. Germans, and H.J.W. Spoelder. CAVEStudy: an Infrastructure for
Computational Steering in Virtual Reality Environments. In Ninth IEEE International Symposium on High
Performance Distributed Computing, Pittsburgh, PA, August 2000.
[38] J.W. Romein, A. Plaat, H.E. Bal, and J. Schaeffer. Transposition Table Driven Work Scheduling in
Distributed Search. In 16th National Conference on Artificial Intelligence (AAAI), pages 725-731,
Orlando, Florida, July 1999.
[39] H.J. Sips, W. Denissen, and C. van Reeuwijk. Analysis of Local Enumeration and Storage Schemes in
HPF. Parallel Computing, 24:355-382, 1998.
[40] P.M.A. Sloot and B.J. Overeinder. Time Warped Automata: Parallel Discrete Event Simulation of
Asynchronous CA's. In Proceedings of the Third International Conference on Parallel Processing and
Applied Mathematics, number 1593 in Lecture Notes in Computer Science, pages 43-62. Springer-Verlag,
Berlin, September 1999.
[41] L. Smarr and C.E. Catlett. Metacomputing. Communications of the ACM, 35(6):44-52, June 1992.
[42] P.F. Spinnato, G.D. van Albada, and P.M.A. Sloot. Performance Analysis of Parallel N-Body Codes.
In High-Performance Computing and Networking (HPCN Europe 2000), number 1823 in Lecture Notes
in Computer Science, pages 249-260. Springer-Verlag, Berlin, May 2000.
[43] H.J.W. Spoelder, L. Renambot, D. Germans, H.E. Bal, and F.C.A. Groen. Man Multi-Agent Interaction
in VR: a Case Study with RoboCup. In IEEE Virtual Reality 2000 (poster), March 2000. The full paper
is online at http://www.cs.vu.nl/~renambot/vr/.
[44] G.D. van Albada, J. Clinckemaillie, A.H.L. Emmen, J. Gehring, O. Heinz, F. van der Linden, B.J.
Overeinder, A. Reinefeld, and P.M.A. Sloot. Dynamite - Blasting Obstacles to Parallel Cluster
Computing. In High-Performance Computing and Networking (HPCN Europe '99), number 1593 in Lecture
Notes in Computer Science, pages 300-310. Springer-Verlag, Berlin, April 1999.
[45] R. van Nieuwpoort, J. Maassen, H.E. Bal, T. Kielmann, and R. Veldema. Wide-Area Parallel Computing
in Java. In ACM 1999 Java Grande Conference, pages 8-14, San Francisco, California, June 1999.
[46] C. van Reeuwijk, W.J.A. Denissen, F. Kuijlman, and H.J. Sips. Annotating Spar/Java for the
Placements of Tasks and Data on Heterogeneous Parallel Systems. In Proceedings CPC 2000, Aussois, January
2000.
[47] C. van Reeuwijk, A.J.C. van Gemund, and H.J. Sips. Spar: a Programming Language for
Semi-automatic Compilation of Parallel Programs. Concurrency Practice and Experience, 9(11):1193-1205,
November 1997.
[48] M. van Steen, F.J. Hauck, G. Ballintijn, and A.S. Tanenbaum. Algorithmic Design of the Globe
Wide-Area Location Service. The Computer Journal, 41(5):297-310, 1998.
[49] M. van Steen, F.J. Hauck, P. Homburg, and A.S. Tanenbaum. Locating Objects in Wide-Area Systems.
IEEE Communications Magazine, pages 104-109, January 1998.
[50] M. van Steen, P. Homburg, and A.S. Tanenbaum. Globe: A Wide-Area Distributed System. IEEE
Concurrency, pages 70-78, January-March 1999.
[51] M. van Steen, A.S. Tanenbaum, I. Kuz, and H.J. Sips. A Scalable Middleware Solution for Advanced
Wide-Area Web Services. Distributed Systems Engineering, 6(1):34-42, March 1999.
[52] R. Veldema, R.A.F. Bhoedjang, R.F.H. Hofman, C.J.H. Jacobs, and H.E. Bal. Jackal: A
Compiler-Supported, Fine-Grained, Distributed Shared Memory Implementation of Java. Technical report, Vrije
Universiteit Amsterdam, July 2000.
[53] J. Waldo. Remote Procedure Calls and Java Remote Method Invocation. IEEE Concurrency, pages
5-7, July-September 1998.
[54] R.J. Wijngaarden, H.J.W. Spoelder, R. Surdeanu, and R. Griessen. Determination of Two-dimensional
Current Patterns in Flat Superconductors from Magneto-optical Measurements: An Efficient Inversion
Scheme. Phys. Rev. B, 54:6742-6749, 1996.