Grid computing : an introduction
Lionel Brunie
Institut National des Sciences Appliquées
Lyon, France
Hansel and Gretel are lost in the forest of the definitions :
- Distributed system
- Parallel system
- Cluster computing
- Meta-computing
- Grid computing
- Peer to peer
- Global computing
- Internet computing
- Network computing
Distributed system
- N autonomous computers (sites) : n administrators, n data/control flows
- an interconnection network
- User view : one single (virtual) system
- « Traditional » programmer view : client-server
Parallel System
- 1 computer, n nodes : one administrator, one scheduler, one power source
- memory : it depends
- Programmer view : one single machine executing parallel codes. Various programming models (message passing, distributed shared memory, data parallelism…)
Cluster computing
- Use of PCs interconnected by a (high performance) network as a (cheap) parallel machine
- Two main approaches
  - dedicated network (based on a high performance network : Myrinet, SCI, Fibre Channel...)
  - non-dedicated network (based on a (good) LAN)
Network computing
- From LAN (cluster) computing to WAN computing
- Set of machines distributed over a MAN/WAN that are used to execute parallel, loosely coupled codes
- Depending on the infrastructure (software and hardware), network computing comes in several flavours : Internet computing, P2P, Grid computing, etc.
Meta computing
- Definitions become fuzzy...
- A meta computer = set of (widely) distributed (high performance) processing resources that can be combined to process a parallel, not so loosely coupled code
- A meta computer = parallel virtual machine over a distributed system
[Figure : clusters of PCs (each on a SAN), a LAN, a supercomputer and a visualization system interconnected over a WAN]
Grid computing (1)
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations” (I. Foster)
Grid computing (2)
- Information grid : wide access to distributed data : the Web
- Data grid : management and processing of very large distributed data sets
- Computing grid ~ meta computer
  - Ex : Globus, Legion
Internet computing
- Use of (idle) computers interconnected by the Internet for processing high-throughput applications
- Ex : SETI@HOME, Décrypthon, RSA-155
- Programmer view : a single master, n servants
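The master/servant model can be sketched in a few lines of Python. This is a minimal illustration, not how SETI@HOME or Décrypthon are implemented : threads stand in for remote volunteer machines, and `check_unit` is a hypothetical work unit (counting multiples of 7 in a range), chosen only so the result is easy to verify.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_unit(unit):
    """Hypothetical servant task: count the multiples of 7 in [start, stop)."""
    start, stop = unit
    return sum(1 for n in range(start, stop) if n % 7 == 0)


def master(total, unit_size, n_servants):
    """Cut [0, total) into work units, hand them to servants, merge the results."""
    units = [(s, min(s + unit_size, total)) for s in range(0, total, unit_size)]
    results = {}
    # Threads stand in for remote, volunteer machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_servants) as pool:
        futures = {pool.submit(check_unit, u): u for u in units}
        for fut in as_completed(futures):  # results come back in any order
            results[futures[fut]] = fut.result()
    return sum(results.values()), len(units)


if __name__ == "__main__":
    print(master(total=1000, unit_size=100, n_servants=4))  # (143, 10)
```

The master only cuts, distributes and merges; servants never talk to each other, which is why this model tolerates slow, unreliable Internet links.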
Global computing
- Internet computing on a pool of sites
- Meta computing with loosely coupled codes
- Grid computing with poor communication facilities
- Ex : Condor
Peer to peer computing
- A site is both client and server : servent
- Dynamic servent discovery by « contamination »
- 2 approaches :
  - centralized management : Napster
  - distributed management : Gnutella, Kazaa
- Application : file sharing
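Discovery by « contamination » amounts to flooding a query through the overlay of servents. The sketch below assumes a Gnutella-style scheme in which each query carries a hop limit (TTL) and servents drop copies they have already seen; the topology, the `flood_query` name and the set of file holders are hypothetical.

```python
from collections import deque


def flood_query(overlay, origin, holders, ttl):
    """Flood a query from `origin`; return the servents reached that hold the file.

    overlay: dict servent -> list of neighbour servents (hypothetical topology)
    holders: set of servents sharing the requested file
    ttl: maximum number of hops a query copy may travel
    """
    seen = {origin}
    hits = []
    queue = deque([(origin, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if node in holders and node != origin:
            hits.append(node)
        if hops_left == 0:
            continue  # TTL exhausted: this copy is not forwarded
        for neighbour in overlay.get(node, []):
            if neighbour not in seen:  # drop duplicate copies of the query
                seen.add(neighbour)
                queue.append((neighbour, hops_left - 1))
    return sorted(hits)


if __name__ == "__main__":
    overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
               "D": ["B", "C", "E"], "E": ["D"]}
    print(flood_query(overlay, "A", {"D", "E"}, ttl=2))  # ['D']
    print(flood_query(overlay, "A", {"D", "E"}, ttl=3))  # ['D', 'E']
```

With a TTL of 2 the query from A reaches D but not E; raising the TTL widens the search at the cost of more messages, which is the basic trade-off of decentralized discovery.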
Grid computing
- Data-intensive physical sciences
  - High energy & nuclear physics
- Simulation
  - Earth observation, climate modeling
  - Geophysics, earthquake modeling
  - Fluids, aerodynamic design
  - Pollutant dispersal scenarios
- Astronomy : digital sky surveys : the planned Large Synoptic Survey Telescope will produce over 10 petabytes per year by 2008 !
- Molecular genomics
- Medical images
A Brain is a Lot of Data! (Mark Ellisman, UCSD)
- We need to get to one micron to know the location of every cell. We’re just now starting to get to 10 microns
- And comparisons must be made among many
Performance evolution of computer components
- Network vs. computer performance
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Disk capacity doubles every 12 months
- 1986 to 2000
  - Computers : x 500
  - Networks : x 340,000
- 2001 to 2010
  - Computers : x 60
  - Networks : x 4000
Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner, Caufield and Perkins.
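To see how the quoted doubling periods turn into the 2001 to 2010 factors, assume perfectly steady exponential growth : a quantity that doubles every T months grows by a factor 2^(12 · years / T). This is an idealization, and `growth_factor` is our own name.

```python
def growth_factor(years, doubling_months):
    """Improvement factor after `years` if performance doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


# 2001 to 2010 = 9 years, with the doubling periods quoted on the slide
computers = growth_factor(9, 18)  # 2**6  = 64
networks = growth_factor(9, 9)    # 2**12 = 4096
disks = growth_factor(9, 12)      # 2**9  = 512
print(round(computers), round(networks), round(disks))  # 64 4096 512
```

Over 9 years this gives x 64 for computers and x 4096 for networks, in line with the x 60 and x 4000 quoted for 2001 to 2010.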
Partial conclusion
- It is not a phantasm !
- Real need for very high performance infrastructures
- Basic idea : share computing resources
Back to roots (routes)
- Railways, telephone, electricity, roads, bank system
- Complexity, standards, distribution, integration (large/small)
- Impact on the society : how the US grew
- Big differences with the grid :
  - clients (the citizens) are NOT providers (State or companies)
  - small number of actors/providers
  - small number of applications
  - strong supervision/control
Computational grid
« HW and SW infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities »
- Performance criteria :
  - security
  - reliability
  - computing power
  - latency
  - services
  - throughput
Applications
- Distributed supercomputing
- High throughput computing
- On demand (real time) computing
- Data intensive computing
- Collaborative computing
An Example Virtual Organization : CERN’s Large Hadron Collider
- 1800 Physicists, 150 Institutes, 32 Countries
- 100 PB of data by 2010; 50,000 CPUs?
Grid Communities & Applications : Data Grids for High Energy Physics
[Figure : tiered LHC data grid. Tier 0 : online system and offline processor farm (~20 TIPS) at the CERN Computer Centre. Tier 1 : regional centres (FermiLab ~4 TIPS; France, Italy and Germany regional centres). Tier 2 : Tier 2 centres of ~1 TIPS each (e.g. Caltech). Institutes (~0.25 TIPS) hold a physics data cache. Tier 4 : physicist workstations. Link rates range from ~PBytes/sec at the detector, through ~100 MBytes/sec and ~622 Mbits/sec (or air freight, deprecated) between tiers, down to ~1 MBytes/sec at the workstations.]
- There is a “bunch crossing” every 25 nsecs.
- There are 100 “triggers” per second.
- Each triggered event is ~1 MByte in size.
- Physicists work on analysis “channels”.
- Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
www.griphyn.org www.ppdg.net www.eu-datagrid.org
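The trigger figures above can be checked with a little arithmetic. The constants come from the notes; reading the result as the ~100 MBytes/sec link into the offline processor farm is our interpretation of the figure.

```python
BUNCH_CROSSING_S = 25e-9  # one bunch crossing every 25 ns
TRIGGERS_PER_S = 100      # triggered (kept) events per second
EVENT_SIZE_MB = 1.0       # ~1 MByte per triggered event

crossings_per_s = 1 / BUNCH_CROSSING_S            # ~40 million crossings per second
kept_fraction = TRIGGERS_PER_S / crossings_per_s  # ~1 crossing in 400,000 is kept
output_mb_s = TRIGGERS_PER_S * EVENT_SIZE_MB      # ~100 MBytes/sec after the trigger

# prints: 40000000 crossings/s, 1 in 400000 kept, 100 MBytes/s
print(f"{crossings_per_s:.0f} crossings/s, 1 in {1 / kept_fraction:.0f} kept, "
      f"{output_mb_s:.0f} MBytes/s")
```

The trigger thus discards all but about one crossing in 400,000, which is what brings the ~PBytes/sec at the detector down to a rate a processor farm can store.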
Levels of cooperation
- End system (computer, disk, sensor…)
  - multithreading, local I/O
- Cluster (heterogeneous)
  - synchronous communications, DSM, parallel I/O
  - parallel processing
- Intranet
  - heterogeneity, distributed admin, distributed FS and databases
  - low supervision, resource discovery
  - high throughput
- Internet
  - no control, collaborative systems, (international) WAN
  - brokers, negotiation
Basic services
- Authentication
- Authorization
- Activity control
- Resource information
- Resource brokering
- Scheduling
- Job submission, data access/migration and execution
- Accounting
Layered Grid Architecture (By Analogy to Internet Architecture)
- Application
- Collective : “Coordinating multiple resources” : ubiquitous infrastructure services, app-specific distributed services
- Resource : “Sharing single resources” : negotiating access, controlling use
- Connectivity : “Talking to things” : communication (Internet protocols) & security
- Fabric : “Controlling things locally” : access to, & control of, resources
[Figure : the grid layers are drawn alongside the Internet Protocol Architecture (Application, Transport, Internet, Link)]
From I. Foster
Aspects of the Problem
- Need for interoperability when different groups want to share resources
  - Diverse components, policies, mechanisms
  - E.g., standard notions of identity, means of communication, resource descriptions
- Need for shared infrastructure services to avoid repeated development, installation
  - E.g., one port/service/protocol for remote access to computing, not one per tool/application
  - E.g., Certificate Authorities : expensive to run
- A common need for protocols & services
From I. Foster
Security : Why Grid Security is Hard
- Resources being used may be extremely valuable & the problems being solved extremely sensitive
- Resources are often located in distinct administrative domains
  - Each resource may have its own policies & procedures
- Users may be different
- The set of resources used by a single computation may be large, dynamic, and/or unpredictable
  - Not just client/server
- It must be broadly available & applicable
  - Standard, well-tested, well-understood protocols
  - Integration with a wide variety of tools
Grid Security : various views
- User View
  1) Easy to use
  2) Single sign-on
  3) Run applications : ftp, ssh, MPI, Condor, Web, …
  4) User based trust model
  5) Proxies/agents (delegation)
- Resource Owner View
  1) Specify local access control
  2) Auditing, accounting, etc.
  3) Integration w/ local system : Kerberos, AFS, license mgr.
  4) Protection from compromised resources
- Developer View
  - API/SDK with authentication, flexible message protection, flexible communication, delegation, ...
  - Direct calls to various security functions (e.g. GSS-API)
  - Or security integrated into higher-level SDKs : e.g. GlobusIO, Condor
Grid security : requirements
- Authentication
- Authorization and delegation of authority
- Assurance
- Accounting
- Auditing and monitoring
- Integrity and confidentiality
Resources
- Description
- Advertising
- Cataloging
- Matching
- Claiming
- Reserving
- Checkpointing
Resource layers
- Application layer : tasks, resource requests
- Application resource management layer : intertask resource management, execution environment
- System layer : resource matching, global brokering
- Owner layer : owner policy : who may use what
- End-resource layer : end-resource policy (e.g. O.S.)
Resource management (1)
- Services and protocols depend on the infrastructure
- Some parameters
  - stability of the infrastructure (same set of resources or not)
  - freshness of the resource availability information
  - reservation facilities
  - multiple resource or single resource brokering
- Example request : I need from 10 to 100 CE (computing elements), each with at least 128 MB RAM and a computing power of at least 50 Mips
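A broker answering the example request has to filter computing elements on each constraint, then check the minimum count. The sketch assumes a static list of CEs with known RAM and speed (`ComputingElement` and `match_request` are hypothetical names); a real broker would also face stale availability information and reservations, as listed above.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str
    ram_mb: int
    mips: int


def match_request(ces, min_count, max_count, min_ram_mb, min_mips):
    """Return between min_count and max_count CEs meeting the constraints, or None."""
    eligible = [ce for ce in ces if ce.ram_mb >= min_ram_mb and ce.mips >= min_mips]
    if len(eligible) < min_count:
        return None  # the request cannot be satisfied right now
    # Prefer the fastest CEs when more are eligible than requested.
    eligible.sort(key=lambda ce: ce.mips, reverse=True)
    return eligible[:max_count]


if __name__ == "__main__":
    pool = [ComputingElement(f"ce{i}", 256, 100) for i in range(15)]
    pool += [ComputingElement(f"small{i}", 64, 200) for i in range(5)]
    granted = match_request(pool, 10, 100, min_ram_mb=128, min_mips=50)
    print(len(granted))  # 15: the 64 MB CEs are excluded
```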
Resource management (2)
Figure : the structure of an RMS...
Resource management and scheduling (1)
- Levels of scheduling
  - job scheduling (global level ; perf : throughput)
  - resource scheduling (perf : fairness, utilization)
  - application scheduling (perf : response time, speedup, produced data…)
- Mapping/scheduling
  - resource discovery and selection
  - assignment of tasks to computing resources
  - data distribution
  - task scheduling on the computing resources
  - (communication scheduling)
- Individual perfs are not necessarily consistent with the global (system) perf !
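As one concrete instance of assigning tasks to computing resources, here is the greedy minimum-completion-time (MCT) heuristic, a classic list-scheduling rule for heterogeneous machines. It is our choice of example, not a method given on the slide; task sizes and resource speeds are hypothetical inputs, and on a grid the speeds would only be predictions.

```python
def mct_schedule(task_work, speeds):
    """Greedy minimum-completion-time (MCT) mapping of tasks to resources.

    task_work: list of task sizes (e.g. millions of instructions)
    speeds: dict resource -> predicted speed (e.g. MIPS)
    Returns (list of (task index, resource), makespan).
    """
    ready = {r: 0.0 for r in speeds}  # time at which each resource becomes free
    assignment = []
    for i, work in enumerate(task_work):
        # Pick the resource on which this task would finish earliest
        # (ties go to the first resource in dict order).
        best = min(speeds, key=lambda r: ready[r] + work / speeds[r])
        ready[best] += work / speeds[best]
        assignment.append((i, best))
    return assignment, max(ready.values())


if __name__ == "__main__":
    plan, makespan = mct_schedule([100, 100, 100], {"fast": 100, "slow": 50})
    print(plan, makespan)  # [(0, 'fast'), (1, 'fast'), (2, 'slow')] 2.0
```

MCT minimizes the makespan of this one application only; as the slide warns, that need not coincide with good global throughput or fairness.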
Resource management and scheduling (2)
- Grid problems
  - predictions are not definitive : dynamicity !
  - Heterogeneous platforms
  - Checkpointing and migration
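A Resource Management System example (Globus)
[Figure : the Application submits an RSL request to a Broker, which performs RSL specialization using Queries & Info from the Information Service and produces ground RSL; a Co-allocator splits the ground RSL into simple ground RSL requests, each sent to a GRAM in front of a local resource manager (LSF, Condor, NQE).]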
Resource information (1)
- What is to be stored ?
  - Organization, people, computing resources, software packages, communication resources, event producers, devices…
  - what about data ???
- A key issue in such dynamic environments
- A first approach : (distributed) directory (LDAP)
  - easy to use
  - tree structure
  - distribution
  - static
  - mostly read ; not efficient updating
  - hierarchical
  - poor procedural language
Resource information (2)
- But :
  - dynamicity
  - complex relationships
  - frequent updates
  - complex queries
- A second approach : (relational) database
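The relational approach lets dynamic attributes live in their own table, updated in place, and joined with static descriptions at query time. The sketch uses Python's built-in `sqlite3` with a hypothetical two-table schema (`host`, `host_load`) and made-up data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE host (name TEXT PRIMARY KEY, site TEXT, ram_mb INTEGER, mips INTEGER);
CREATE TABLE host_load (name TEXT PRIMARY KEY REFERENCES host(name), cpu_load REAL);
"""


def open_info_db():
    """Build a small in-memory resource database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO host VALUES (?, ?, ?, ?)", [
        ("n1", "lyon", 256, 100), ("n2", "lyon", 64, 200),
        ("n3", "grenoble", 512, 80), ("n4", "grenoble", 256, 40)])
    conn.executemany("INSERT INTO host_load VALUES (?, ?)", [
        ("n1", 0.9), ("n2", 0.1), ("n3", 0.2), ("n4", 0.0)])
    return conn


def find_hosts(conn, min_ram_mb, min_mips, max_load):
    """Join static descriptions with dynamic load, least loaded first."""
    rows = conn.execute(
        """SELECT h.name FROM host AS h JOIN host_load AS l ON l.name = h.name
           WHERE h.ram_mb >= ? AND h.mips >= ? AND l.cpu_load < ?
           ORDER BY l.cpu_load""",
        (min_ram_mb, min_mips, max_load))
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = open_info_db()
    print(find_hosts(conn, 128, 50, 0.5))  # ['n3']
    # A frequent, cheap update of a dynamic attribute:
    conn.execute("UPDATE host_load SET cpu_load = ? WHERE name = ?", (0.1, "n1"))
    print(find_hosts(conn, 128, 50, 0.5))  # ['n1', 'n3']
```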
Data management
- It was long forgotten !!!
- Though it is a key issue !
- Issues :
  - indexing
  - retrieval
  - replication
  - caching
  - traceability (auditing)
- And security !!!
The Replica Management Problem
- Maintain a mapping between logical names for files and collections and one or more physical locations
- Decide where and when a piece of data must be replicated
- Important for many applications
- Example : CERN high-level trigger data
  - Multiple petabytes of data per year
  - Copy of everything at CERN (Tier 0)
  - Subsets at national centers (Tier 1)
  - Smaller regional centers (Tier 2)
  - Individual researchers will have copies
- Even more complex with sensitive data like medical data !!!
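At its core, a replica catalog is a one-to-many mapping from logical file names to physical locations. This in-memory sketch shows only that mapping (the class name, the `lfn://` names and the `example.org` hosts are hypothetical); deciding where and when to replicate, distributing the catalog itself, and security are left out.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Minimal logical-name -> physical-locations mapping (in memory, single site)."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that a physical copy `pfn` of logical file `lfn` exists."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        """Forget one physical copy; drop the logical name when no copy is left."""
        copies = self._replicas.get(lfn)
        if copies is None:
            return
        copies.discard(pfn)
        if not copies:
            del self._replicas[lfn]

    def lookup(self, lfn):
        """All known physical locations of `lfn` (empty list if unknown)."""
        return sorted(self._replicas.get(lfn, ()))


if __name__ == "__main__":
    rc = ReplicaCatalog()
    lfn = "lfn://hep/run42/trigger.dat"  # hypothetical names throughout
    rc.register(lfn, "gsiftp://tier0.example.org/data/run42/trigger.dat")
    rc.register(lfn, "gsiftp://tier1.example.org/cache/trigger.dat")
    print(rc.lookup(lfn))
```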
Programming on the grid : potential programming models
- Message passing (PVM, MPI)
- Distributed Shared Memory
- Data Parallelism (HPF, HPC++)
- Task Parallelism (Condor)
- Client/server - RPC
- Agents
- Integration system (Corba, DCOM, RMI)
Program execution : issues
- Parallelize the program with the right job structure, communication patterns/procedures, algorithms
- Discover the available resources
- Select the suitable resources
- Allocate or reserve these resources
- Migrate the data
- Initiate computations
- Monitor the executions ; checkpoints ?
- React to changes
- Collect results
The Legion system
- University of Virginia
- Object-oriented approach. Objects = data, applications, sensors, computing resources, codes… : everything is an object !
- Loosely coupled codes
- Single naming space
- Reuse of existing OS and protocols ; definition of message formats and high level protocols
- Core objects : naming, binding, object creation/activation/deactivation/destruction
- Methods : description via an IDL
- Security : in the hands of the users
- Resource allocation : a site can define its own policy
The Globus toolkit
- A set of integrated executable management (GEM) services for the Grid
- Services
  - resource management (GRAM-DUROC)
  - communication (NEXUS - MPICH-G2, globus_io)
  - information (MDS)
  - data management (replica catalog)
  - security (GSI)
  - monitoring (HBM)
  - remote data access (GASS - GridFTP - RIO)
  - executable management (GEM)
  - execution
- Commodity Grid Kits (Java, Python, Corba, Matlab…)
High-Throughput Computing : Condor
- High-throughput computing platform for mapping many tasks to idle computers
- Since 1986 !
- Major components
  - A central manager manages pool(s) of [distributively owned or dedicated] computers. A CM = scheduler + coordinator
  - DAGMan manages user task pools
  - Matchmaker schedules tasks to computers using classified ads
  - Checkpointing and process migration
  - No simple communications
- Parameter studies, data analysis
- Condor married Globus : Condor-G
- More than 150 Condor pools in the world ; or on your machine !
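Matchmaking with classified ads is two-sided : the job's requirements must accept the machine and the machine owner's requirements must accept the job, after which the job's rank picks the preferred match. Real Condor ClassAds are written in their own expression language with `Requirements` and `Rank` attributes; representing ads as Python dictionaries holding callables is an assumption of this sketch, as are all attribute names and values.

```python
def matchmake(job, machines):
    """Return the best machine ad for `job`, or None.

    An ad is a dict of attributes plus 'requirements', a callable
    (my_ad, other_ad) -> bool, and optionally 'rank', (my_ad, other_ad) -> number.
    """
    # A match needs BOTH sides' requirements to hold (job and owner policies).
    candidates = [m for m in machines
                  if job["requirements"](job, m) and m["requirements"](m, job)]
    if not candidates:
        return None
    job_rank = job.get("rank", lambda me, other: 0)
    return max(candidates, key=lambda m: job_rank(job, m))


def owner_policy(me, other):
    # Hypothetical owner policy: lend the workstation after 15 idle minutes.
    return me["keyboard_idle_s"] > 15 * 60


if __name__ == "__main__":
    job = {"owner": "alice",
           "requirements": lambda me, m: m["opsys"] == "LINUX" and m["memory_mb"] >= 128,
           "rank": lambda me, m: m["mips"]}  # prefer faster machines
    machines = [
        {"name": "m1", "opsys": "LINUX", "memory_mb": 256, "mips": 100,
         "keyboard_idle_s": 3600, "requirements": owner_policy},
        {"name": "m2", "opsys": "LINUX", "memory_mb": 512, "mips": 300,
         "keyboard_idle_s": 10, "requirements": owner_policy},
        {"name": "m3", "opsys": "LINUX", "memory_mb": 64, "mips": 500,
         "keyboard_idle_s": 9999, "requirements": owner_policy},
    ]
    print(matchmake(job, machines)["name"])  # m1: m2 is busy, m3 lacks memory
```

When the owner of m2 leaves the keyboard long enough, m2 becomes acceptable and wins on rank: owners keep control of their machines while idle cycles are still harvested.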
Defining a DAG
- A DAG is defined by a .dag file, listing each of its nodes and their dependencies :

    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D

- Each node will run the Condor job specified by its accompanying Condor submit file
[Figure : the diamond DAG : Job A runs first, then Job B and Job C, then Job D]
From Condor tutorial
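DAGMan only submits a node once all of its parents have completed. The sketch below parses the `diamond.dag` file and groups its nodes into waves that could run concurrently, using `graphlib.TopologicalSorter` from the Python standard library (3.9+). It only handles `Job` and `Parent ... Child` lines and submits nothing; it is not DAGMan's implementation.

```python
from graphlib import TopologicalSorter  # Python 3.9+

DIAMOND_DAG = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""


def parse_dag(text):
    """Return ({node: submit file}, {node: set of parent nodes})."""
    jobs, parents_of = {}, {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        keyword = words[0].upper()  # keywords treated case-insensitively here
        if keyword == "JOB":
            jobs[words[1]] = words[2]
            parents_of.setdefault(words[1], set())
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            parents, children = words[1:split], words[split + 1:]
            for child in children:
                parents_of.setdefault(child, set()).update(parents)
    return jobs, parents_of


def execution_waves(parents_of):
    """Group nodes into waves; every node in a wave has all its parents done."""
    sorter = TopologicalSorter(parents_of)
    sorter.prepare()  # also raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)  # these nodes could be submitted concurrently
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    jobs, parents_of = parse_dag(DIAMOND_DAG)
    print(execution_waves(parents_of))  # [['A'], ['B', 'C'], ['D']]
```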
Conclusion
- Just a new toy for scientists or a revolution ?
- Complexity from heterogeneity, wide distribution, security, dynamicity
- Many approaches
- Still much work to do !!!
- A global framework for grid computing, pervasive computing and Web services ?
Functional View of Grid Data Management
[Figure components :
- Application
- Metadata Service : location based on data attributes
- Replica Location Service : location of one or more physical replicas
- Information Services : state of grid resources, performance measurements and predictions
- Planner : data location, replica selection, selection of compute and storage nodes
- Executor : initiates data transfers and computations
- Security and Policy
- Data Movement, Data Access
- Compute Resources, Storage Resources]
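The Planner's replica selection step can be sketched as picking the physical copy with the smallest predicted transfer time, using bandwidth and latency predictions of the kind the Information Services provide. The cost model (latency plus size over bandwidth), the site names and the numbers are all hypothetical.

```python
def select_replica(replica_sites, size_mb, bandwidth_mb_s, latency_s):
    """Pick the replica with the smallest predicted transfer time.

    bandwidth_mb_s / latency_s: per-site predictions from an information
    service (hypothetical values); sites without predictions are skipped.
    Returns (site, predicted seconds) or None.
    """
    known = [s for s in replica_sites if s in bandwidth_mb_s and s in latency_s]
    if not known:
        return None

    def predicted_time(site):
        # Simple cost model: startup latency + size / bandwidth.
        return latency_s[site] + size_mb / bandwidth_mb_s[site]

    best = min(known, key=predicted_time)
    return best, predicted_time(best)


if __name__ == "__main__":
    bw = {"cern": 10.0, "lyon": 50.0, "caltech": 40.0}  # MBytes/s
    lat = {"cern": 0.1, "lyon": 0.02, "caltech": 0.15}  # seconds
    print(select_replica(["cern", "lyon", "caltech", "ral"], 1000, bw, lat))
```

Because the predictions can be stale, a real Executor would still have to monitor the transfer and possibly fall back to another replica.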
Components in Globus Toolkit 3.0
- Security : GSI, WS-Security
- Data Management : RFT (OGSI), RLS, WU GridFTP
- Resource Management : Pre-WS GRAM, WS GRAM (OGSI)
- Information Services : MDS2, WS-Index (OGSI)
- WS Core : Java WS Core (OGSI), OGSI C Bindings
Components in Globus Toolkit 3.2
- Security : GSI, WS-Security, CAS (OGSI), SimpleCA
- Data Management : RFT (OGSI), RLS, OGSA-DAI, WU GridFTP
- Resource Management : Pre-WS GRAM, WS GRAM (OGSI)
- Information Services : MDS2, WS-Index (OGSI)
- WS Core : Java WS Core (OGSI), OGSI C Bindings, OGSI Python Bindings (contributed), pyGlobus (contributed), XIO
Planned Components in GT 4.0
- Security : GSI, WS-Security, CAS (WSRF), SimpleCA, Authz Framework
- Data Management : RFT (WSRF), RLS, OGSA-DAI, New GridFTP
- Resource Management : Pre-WS GRAM, WS-GRAM (WSRF), CSF (contribution)
- Information Services : MDS2, WS-Index (WSRF)
- WS Core : Java WS Core (WSRF), C WS Core (WSRF), pyGlobus (contributed), XIO
GT2 GRAM
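[Figure : the Requestor and the Gatekeeper (running as root on the server, with host credentials) authenticate each other; the Requestor sends a request and the Gatekeeper responds, then invokes a JobManager in the user account. The Gatekeeper is trusted by server and user.]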
GT3 GRAM
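[Figure : the Requestor exchanges a signed request and a signed response with the MMJFS, which runs in a non-privileged Globus account with the host credentials. The HostEnvStarter and GRIM components run as root and are trusted by the server; the JobManager is invoked in the user account, and GRIM provides GRIM credentials.]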
GT4 GRAM
http://www-unix.globus.org/toolkit/docs/3.2/gram/ws/developer/architecture.html