Introduction to Grids
David Boyd
Information Technology Department
Rutherford Appleton Laboratory
The Grid - what is it?
“The grid is an emerging infrastructure that will fundamentally change the way we think about - and use - computing. The grid will connect multiple regional and national computational grids to create a universal source of computing power.”
(cf. electric power grid)
The Grid: Blueprint for a New Computing Infrastructure, Foster and Kesselman, Morgan Kaufmann Publishers
Promise of ubiquitous computing?
• “Global virtual computer”
• Grid “middleware” as the OS of this computer
• Accessible via a “plug on the wall”
• Dependable - predictable, sustained performance
• Consistent - standard interfaces, services and operating parameters
• Pervasive - widely available via controlled access
• Inexpensive - perceived cost relative to alternative solutions
We will eventually take the Grid for granted
Where the Grid is coming from . . .
• Work over the last decade on several distinct topics
– Metacomputing - combining distributed heterogeneous computing resources on a single problem
– Data Archiving - building collections of data on specific topics with open metadata catalogues and well-documented data formats
– Collaborative Working - network-based facilities for distributed, concurrent working on shared information
– Data visualization - large-scale immersive facilities for 3D visual interaction with data coupled to computational analysis
– Network-based information systems - WAIS, Gopher, WWW
– Instrumentation control - remote control of experimental equipment and real time data gathering systems
The Grid (potentially) offers . . .
• A consistent way of combining many types of “computing” resource located anywhere to solve complex problems
• These resources can be:
– networks (LAN, SJ4, Geant (EU), Internet2 (US), . . . )
– computing systems (HPC, MPP, SMP, clusters, workstations, PCs, . . . )
– data storage facilities (tape robots, disk farms, local caches, . . . )
– visualization / VR environments (VR Centres, CAVEs, . . . , desktop)
– scientific instruments (telescopes, microscopes, synchrotrons, satellites, . . . )
– . . . . basically any device which can communicate over the internet
But the Grid is not yet . . .
• mature, reliable technology
• a well-understood system
• a generic solution
• easy to use
• capable of sustained operation
• economically validated
. . . however . . .
• I might have said the same about the Web ~7 years ago
– and it has since transformed the world’s access to information and created a business revolution . . . imagine the potential of the Grid over the next 5-7 years
The Web vs The Grid
• The Web supports wide area data/information location and retrieval
• The Grid supports complete process initiation and execution including any necessary data location and retrieval
– offers the potential to carry out significantly large tasks
– opens up new capabilities for knowledge generation
• Currently Grid tools provide a relatively low level of operational control
– higher level tools will be developed to automate low level processes
– agent technology will eventually support real time dynamic process optimisation
Process optimisation
• By looking at whole processes it will be possible to optimise them using intelligent management strategies
– identifying appropriate (by user-specified criteria) computing resources for the task
– locating appropriate sources of the relevant data
– extracting relevant subsets of this data
– assessing options for migrating the data or the task
– evaluating the available network service quality
– making a decision on a strategy to carry out the user’s task
– adapting this to changing circumstances in real time
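The first steps in this list can be illustrated with a minimal Python sketch. It is not how any Grid toolkit works: the `Site` record, its fields and the cost rule (pick the eligible site with the least data movement) are all hypothetical, chosen only to show the trade-off between migrating the data and migrating the task.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of one candidate computing resource."""
    name: str
    cpu_hours_free: float  # spare compute capacity at the site
    holds_data: bool       # True if the input data is already stored here
    link_mbps: float       # bandwidth from the data's home site, in megabits/s


def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes over a link of link_mbps."""
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / link_mbps / 3600


def choose_site(sites, size_gb, cpu_hours_needed):
    """Return the eligible site with the smallest data-movement cost.

    A site is eligible if it meets the user-specified CPU requirement.
    Returns None when no site qualifies.
    """
    eligible = [s for s in sites if s.cpu_hours_free >= cpu_hours_needed]
    if not eligible:
        return None

    def cost(site):
        # Running where the data lives costs nothing to move;
        # otherwise the data must be migrated to the task.
        if site.holds_data:
            return 0.0
        return transfer_hours(size_gb, site.link_mbps)

    return min(eligible, key=cost)
```

A real strategy would also have to re-evaluate this choice as load and network quality change, which is the "adapting in real time" step that this sketch leaves out.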
Grid architecture
[Figure: Grid architecture layers]
• Applications: high-energy physics data analysis, climate modelling, collaborative engineering, parameter studies, on-line experiments
• Application Toolkits: high throughput, data intensive computing, collaborative working, remote visualization, remote control
• Grid Services: protocols, authentication, directory service, resource management, metadata, data access, fault detection
• Grid Fabric: archives, networks, instrumentation, control interfaces, computers, display devices
Grid tools and toolkits
• There are many useful tools and toolkits now available
– e.g. Globus, Condor, Legion, SRB, LDAP, OOFS, . . .
• Globus provides resource allocation and management (GRAM), information access (MDS), authentication (GSI)
• Condor provides high throughput computing on distributed networks of workstations with checkpointing
• Legion is an object-based large scale distributed computing environment designed to handle trillions of objects
• Storage Resource Broker provides facilities for managing a distributed data repository which includes multiple copies
An application of the Grid
[Figure: on-line instruments at the Advanced Photon Source (source: The Globus project)]
• real-time collection
• tomographic reconstruction
• archival storage
• wide-area dissemination
• desktop & VR clients with shared controls
Science-driven Grid applications
• Environmental science
– coupled atmosphere and ocean simulations with long simulated timescales at high resolution
• Biological science
– multiple protein folding simulations to generate statistically valid models of complex molecules
• Astronomy - “Virtual Observatory”
– searching across many instrument-specific data archives to study a new class of object at all wavelengths
• Materials science
– combining and analysing data from different experimental facilities to derive the structure of complex new materials
Large Hadron Collider Data Grid
[Figure: 5-tier model]
• Tier 0: CERN
• Tier 1: FNAL, RAL, IN2P3
• Tier 2: Lab a, Uni b, Lab c, Uni n
• Department
• Desktop
• Link speeds shown: 2.5 Gbps, 622 Mbps, 155 Mbps
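To give a feel for the link speeds quoted in the tier diagram, the following Python sketch works out how long a dataset would take to cross each one. The 1 TB dataset size is an illustrative assumption, not a figure from the LHC plans, and the calculation ignores protocol overhead and competing traffic, so real transfers would be slower.

```python
# Link speeds quoted in the tier diagram, in megabits per second.
LINKS_MBPS = {"2.5 Gbps": 2500, "622 Mbps": 622, "155 Mbps": 155}


def link_transfer_seconds(size_tb: float, rate_mbps: float) -> float:
    """Ideal time in seconds to move size_tb terabytes at rate_mbps."""
    bits = size_tb * 1e12 * 8  # decimal terabytes to bits
    return bits / (rate_mbps * 1e6)


if __name__ == "__main__":
    size_tb = 1.0  # assumed dataset size, for illustration only
    for label, rate in LINKS_MBPS.items():
        hours = link_transfer_seconds(size_tb, rate) / 3600
        print(f"{size_tb} TB over {label}: {hours:.1f} hours")
```

At 2.5 Gbps the ideal time for 1 TB is 3200 seconds, under an hour. The 155 Mbps links are about sixteen times slower, which suggests why the placement of data across tiers matters.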
Some US Grid initiatives
• Grid Physics Network (GriPhyN), a consortium of universities, was recently given $11.8m by NSF over 5 years for projects in particle physics and astronomy
• Particle Physics Data Grid is a consortium of major US government laboratories collaborating on Grid developments for the US particle physics programme
• NASA Information Power Grid is being built to support coupled multi-disciplinary simulations and to provide a national resource for rapid response crisis and disaster management
• Grid Forum is an open forum of the major Grid development teams with Working Groups on high priority problem areas and a remit to develop standards
EU DataGrid project
• Will link national Grid projects to create a European Grid infrastructure
• Strong focus on developing middleware tools
• Application-driven from 3 areas
– Particle Physics
– Earth Observation
– Biosciences
• 21 partners! (6 main, 15 associated)
• Close collaboration with US initiatives
• Likely to lead to several follow-on projects to exploit this infrastructure for social and commercial applications
CLRC e-Science programme
• Project-driven programme to exploit Grid technologies for the benefit of the CLRC science programme and its users by enhancing the CLRC infrastructure and developing Grid expertise
– Data Portal - developing a metadata-driven access mechanism for a wide range of scientific data from CLRC facilities and programmes
– ATLAS Datastore - Grid-enabling and enhancing the Datastore to Petabyte capacity
– Gbit LAN - building a Gbit internal network in preparation for Grid-based applications and imminent SJ4 connection
– Grid reference system - developing and supporting a reference Grid platform as a basis for internal Grid application projects
– plus several science application pilot projects
UK Grid programme
• Proposal to Treasury for £90m as part of SR2000 bid
• Awaiting news from OST
• Use science as the “storm troopers” to build a national Grid infrastructure
• Achieve world-beating science as a result
• Give UK business a head start on its competitors by early exposure to the technology
• Transfer the expertise gained into the UK commercial software industry to develop a global market
The End