Chapter 7: The Anatomy of the Grid
Prepared by: NITIN PANDYA, Assistant Professor, SVBIT
Content
Problem statement
Architecture
Globus Toolkit
Futures
The Grid Problem
Resource sharing & coordinated problem solving in dynamic,
multi-institutional virtual organizations
Elements of the Problem
Resource sharing
Computers, storage, sensors, networks, …
Sharing always conditional: issues of trust, policy, negotiation,
payment, …
Coordinated problem solving
Beyond client-server: distributed data analysis, computation,
collaboration, …
Dynamic, multi-institutional virtual orgs
Community overlays on classic org structures
Large or small, static or dynamic
Grid Communities & Applications: Data Grids
for High Energy Physics
[Figure: the tiered HEP Data Grid. The Online System emits ~PBytes/sec of raw detector data and feeds the CERN Computer Centre (Tier 0), with an Offline Processor Farm of ~20 TIPS, at ~100 MBytes/sec. Tier 1 Regional Centres (FermiLab ~4 TIPS; France, Italy, and Germany Regional Centres) receive data at ~622 Mbits/sec or Air Freight (deprecated). Tier 2 Centres (~1 TIPS each, e.g., Caltech) connect at ~622 Mbits/sec; institute servers (~0.25 TIPS) at ~100 MBytes/sec; physicist workstations (Tier 4) at ~1 MBytes/sec.]
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second.
Each triggered event is ~1 MByte in size.
Physicists work on analysis “channels”; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server (physics data cache).
1 TIPS is approximately 25,000 SpecInt95 equivalents.
Image courtesy Harvey Newman, Caltech
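The trigger numbers above fix the data rate flowing into the grid; a quick back-of-the-envelope check (the one-year figure assumes continuous running, which is optimistic):

```python
# Back-of-the-envelope check of the rates in the figure above. The
# one-year volume assumes 100% live time, which is optimistic.
triggers_per_sec = 100          # "100 triggers per second"
event_size_mb = 1.0             # "each triggered event is ~1 MByte"

rate_mb_per_sec = triggers_per_sec * event_size_mb
year_pb = rate_mb_per_sec * 3600 * 24 * 365 / 1e9   # MB -> PB (decimal)

print(rate_mb_per_sec)    # 100.0 MB/s, matching the ~100 MBytes/sec link
print(round(year_pb, 2))  # ~3.15 PB/year of triggered data
```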
Grid Communities and Applications: Network
for Earthquake Eng. Simulation
NEESgrid: national infrastructure to
couple earthquake engineers with
experimental facilities, databases,
computers, & each other
On-demand access to experiments,
data streams, computing, archives,
collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
Grid Communities and Applications:
Mathematicians Solve NUG30
Community: an informal collaboration of mathematicians and computer scientists
Condor-G delivers 3.46E8 CPU seconds in 7 days (peak 1009 processors) across 8 sites in the U.S. and Italy
Solves the NUG30 quadratic assignment problem; solution permutation:
14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23
MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
Grid Communities and Applications: Home Computers Evaluate AIDS Drugs
Community = 1000s of home computer users, a philanthropic computing vendor (Entropia), and a research group (Scripps)
Common goal = advance AIDS research
Grid Architecture
Why Discuss Architecture?
Descriptive
Provide a common vocabulary for use when describing Grid
systems
Guidance
Identify key areas in which services are required
Prescriptive
Define standard “Intergrid” protocols and APIs to facilitate
creation of interoperable Grid systems and portable applications
What Sorts of Standards?
Need for interoperability when different groups want to share resources
E.g., IP lets me talk to your computer, but how do we establish & maintain sharing?
How do I discover, authenticate, authorize, describe what I want to do, etc., etc.?
Need for shared infrastructure services to avoid repeated development, installation, e.g.
One port/service for remote access to computing, not one per tool/application
X.509 enables sharing of Certificate Authorities
So, in Defining Grid Architecture, We Must
Address …
Development of Grid protocols & services
Protocol-mediated access to remote resources
New services: e.g., resource brokering
“On the Grid” = speak Intergrid protocols
Mostly (extensions to) existing protocols
Development of Grid APIs & SDKs
Facilitate application development by supplying higher-level abstractions
The (hugely successful) model is the Internet
The Grid is not a distributed OS!
The Role of Grid Services and Tools
[Figure: applications and tools build on Grid services over the network. Grid services (remote monitor, remote access, information services, fault detection, resource mgmt, ...) provide a common layer above the network; collaboration tools, data mgmt tools, distributed simulation, and other tools build on these services to support applications.]
Layered Grid Architecture
Application
Collective (“coordinating multiple resources”): ubiquitous infrastructure services, application-specific distributed services
Resource (“sharing single resources”): negotiating access, controlling use
Connectivity (“talking to things”): communication (Internet protocols) & security
Fabric (“controlling things locally”): access to, and control of, resources
(Compare the Internet Protocol architecture: Application, Transport, Internet, Link)
Protocols, Services, and Interfaces Occur
at Each Level
Applications
Languages/Frameworks
Collective Services: Collective Service APIs and SDKs; Collective Service Protocols
Resource Services: Resource APIs and SDKs; Resource Service Protocols
Connectivity: Connectivity APIs; Connectivity Protocols
Fabric Layer: Local Access APIs and Protocols
Where Are We With Architecture?
No “official” standards exist
Nor is it clear what this would mean
But:
Globus Toolkit has emerged as the de facto standard for several
important Connectivity, Resource, and Collective protocols
GGF has an architecture working group
Technical specifications are being developed for architecture
elements: e.g., security, data, resource management, information
The Globus Toolkit
Grid Services Architecture (1): Fabric Layer
Just what you would expect: the diverse mix of resources that
may be shared
Individual computers, Condor pools, file systems, archives,
metadata catalogs, networks, sensors, etc., etc.
Few constraints on low-level technology: connectivity and
resource level protocols form the “neck in the hourglass”
The Globus Toolkit provides a few selected components (e.g.,
bandwidth broker)
Grid Services Architecture (2): Connectivity
Layer Protocols & Services
Communication
Internet protocols: IP, DNS, routing, etc.
Security: Grid Security Infrastructure (GSI)
Uniform authentication & authorization mechanisms in multi-
institutional setting
Single sign-on, delegation, identity mapping
Public key technology, SSL, X.509, GSS-API
Supporting infrastructure: Certificate Authorities, key management,
etc.
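A minimal sketch of the single sign-on and delegation idea, assuming a toy credential model: real GSI uses X.509 proxy certificates signed by the long-term key (via `grid-proxy-init`), whereas here the “signature” is just a recorded issuer link, purely for illustration.

```python
# Toy model of GSI single sign-on and delegation: a long-term identity
# issues a short-lived proxy, and a service accepts any credential whose
# chain roots in a trusted identity and whose lifetime has not expired.
# NOT real GSI: the issuer link stands in for an X.509 signature.
import time

class Credential:
    def __init__(self, subject, issuer=None, lifetime=12 * 3600):
        self.subject = subject
        self.issuer = issuer              # None for a long-term identity
        self.expires = time.time() + lifetime

def grid_proxy_init(identity, lifetime=12 * 3600):
    """Single sign-on: derive a short-lived proxy from the identity."""
    return Credential(identity.subject + "/proxy", issuer=identity,
                      lifetime=lifetime)

def accepts(cred, trusted_subjects):
    """A site accepts a credential if its chain roots in a trusted id."""
    while cred is not None:
        if cred.expires < time.time():
            return False                  # some link in the chain expired
        if cred.issuer is None:
            return cred.subject in trusted_subjects
        cred = cred.issuer                # walk up the delegation chain
    return False

alice = Credential("/O=Grid/CN=Alice")
proxy = grid_proxy_init(alice)            # sign on once...
print(accepts(proxy, {"/O=Grid/CN=Alice"}))    # ...use everywhere: True
print(accepts(proxy, {"/O=Grid/CN=Mallory"}))  # False
```

The point of the chain walk is exactly the slide's “single sign-on, delegation”: the user authenticates once, and every later hop checks the delegation chain rather than re-prompting the user.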
[Figure: GSI single sign-on and delegation. The user signs on once via a “grid-id” and is assigned credentials in the form of a user proxy holding a Globus credential. The user proxy then performs mutual user-resource authentication with each site: at Site 1, GRAM/GSI maps the grid identity to a Kerberos ticket; at Site 2, to a public-key certificate. Each site maps the grid identity to local ids, and the processes it starts receive credentials of their own, enabling authenticated interprocess communication between sites.]
GSI Futures
Scalability in numbers of users & resources
Credential management
Online credential repositories (“MyProxy”)
Account management
Authorization
Policy languages
Community authorization
Protection against compromised resources
Restricted delegation, smartcards
GSI Futures: Community Authorization
[Figure: Community Authorization Service (CAS) request flow.]
1. The user sends a CAS request naming the resources and operations wanted. The CAS holds user/group memberships, resource/collective memberships, and collective policy information, and asks: does the collective policy authorize this request for this user?
2. The CAS replies with a capability and resource CA info.
3. The user sends the resource a request authenticated with the capability. The resource consults local policy information and asks: is this request authorized for the CAS, and is it authorized by the capability?
4. The resource replies.
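The CAS flow can be sketched as follows. This is a toy: a shared HMAC key stands in for the CAS's public-key signature, and the policy table, key, and names are all invented for illustration.

```python
# Toy sketch of the CAS capability flow. A real CAS signs capabilities
# with public-key cryptography; a shared HMAC key stands in here.
import hashlib
import hmac
import json

CAS_KEY = b"community-secret"           # hypothetical CAS signing key

def cas_issue_capability(user, resource, operations, policy):
    """Steps 1-2: check collective policy, return a signed capability."""
    allowed = [op for op in operations
               if op in policy.get((user, resource), ())]
    if not allowed:
        return None                     # collective policy denies the request
    body = json.dumps({"user": user, "resource": resource, "ops": allowed},
                      sort_keys=True)
    sig = hmac.new(CAS_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def resource_authorize(capability, op):
    """Step 3: verify the capability came from the CAS and permits op."""
    expect = hmac.new(CAS_KEY, capability["body"].encode(),
                      hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expect, capability["sig"]):
        return False                    # not issued by (this) CAS
    return op in json.loads(capability["body"])["ops"]

policy = {("alice", "disk1"): ("read",)}   # collective policy table
cap = cas_issue_capability("alice", "disk1", ["read", "write"], policy)
print(resource_authorize(cap, "read"), resource_authorize(cap, "write"))
```

Note that the capability carries only the operations the collective policy actually granted (“read” here), so the resource never needs to consult the community policy itself.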
Grid Services Architecture (3): Resource Layer
Protocols & Services
Resource management: GRAM
Remote allocation, reservation, monitoring, control of [compute] resources
Data access: GridFTP
High-performance data access & transport
Information: MDS (GRRP, GRIP)
Access to structure & state information
& others emerging: catalog access, code repository access, accounting, …
All integrated with GSI
GRAM Resource Management Protocol
Grid Resource Allocation & Management
Allocation, monitoring, control of computations
Simple HTTP-based RPC
Job request:
Returns a “job contact”: Opaque string that can be passed between clients, for
access to job
Job cancel, Job status, Job signal
Event notification (callbacks) for state changes
Pending, active, done, failed, suspended
Servers for most schedulers; C and Java APIs
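A hypothetical sketch of this lifecycle: a job request returns an opaque job contact, and registered callbacks fire on state changes. The class and function names are illustrative, not the real GRAM client API.

```python
# Illustrative model of the GRAM job lifecycle described above: submit
# returns an opaque "job contact", and callbacks are notified as the job
# moves through pending/active/suspended/done/failed.
import itertools

STATES = ("pending", "active", "suspended", "done", "failed")
_contact_ids = itertools.count(1)

class GramJob:
    def __init__(self):
        # Opaque string that any client can use to reach the job.
        self.contact = f"https://gatekeeper.example.org:8443/{next(_contact_ids)}"
        self.state = "pending"
        self.callbacks = []

    def register_callback(self, fn):
        self.callbacks.append(fn)       # event notification registration

    def set_state(self, state):
        assert state in STATES
        self.state = state
        for fn in self.callbacks:
            fn(self.contact, state)     # notify listeners of the transition

jobs = {}

def submit(executable):
    """Job request: returns a job contact usable by any client."""
    job = GramJob()
    jobs[job.contact] = job
    return job.contact

def status(contact):
    """Job status, looked up via the opaque contact string."""
    return jobs[contact].state

contact = submit("/bin/hostname")
seen = []
jobs[contact].register_callback(lambda c, s: seen.append(s))
jobs[contact].set_state("active")
jobs[contact].set_state("done")
print(status(contact), seen)   # done ['active', 'done']
```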
Resource Management Futures
GRAM-2 protocol (ETA late 2001)
Advance reservations & multiple resource types
Recoverable requests, timeout, etc.
Use of SOAP (RPC using HTTP + XML)
Policy evaluation points for restricted proxies
Data Access & Transfer
GridFTP: extended version of popular FTP protocol for Grid
data access and transfer
Secure, efficient, reliable, flexible, extensible, parallel,
concurrent, e.g.:
Third-party data transfers, partial file transfers
Parallelism, striping (e.g., on PVFS)
Reliable, recoverable data transfers
Reference implementations
Existing clients and servers extended: wuftpd (server), ncftp (client)
Flexible, extensible libraries
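A toy illustration of the parallel, partial-transfer idea: split a byte payload into ranges, move each range on its own thread, and reassemble. Real GridFTP negotiates parallel TCP streams over the wire with GSI security; nothing here is the actual protocol.

```python
# Toy sketch of parallel/partial transfers in the GridFTP style: each
# "stream" is a thread responsible for one byte range of the file.
import threading

def parallel_transfer(data: bytes, streams: int) -> bytes:
    n = len(data)
    chunk = (n + streams - 1) // streams    # ceil-divide into ranges
    out = bytearray(n)

    def move_range(offset):                 # one stream: a partial transfer
        out[offset:offset + chunk] = data[offset:offset + chunk]

    threads = [threading.Thread(target=move_range, args=(i * chunk,))
               for i in range(streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                            # wait for all ranges to land
    return bytes(out)

payload = bytes(range(256)) * 40            # 10,240-byte test payload
assert parallel_transfer(payload, 4) == payload
```

Because the byte ranges are disjoint, the streams need no coordination beyond the final join; this independence is what makes striping across servers (the PVFS case above) a natural extension.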
Grid Services Architecture (4): Collective
Layer Protocols & Services
Index servers aka metadirectory services
Custom views on dynamic resource collections assembled by a community
Resource brokers (e.g., Condor Matchmaker)
Resource discovery and allocation
Replica management and replica selection
Optimize aggregate data access performance
Co-reservation and co-allocation services
End-to-end performance
Etc., etc.
The Grid Information Problem
Large numbers of distributed “sensors” with different properties
Need for different “views” of this information, depending on
community membership, security constraints, intended purpose,
sensor type
The Globus Toolkit Solution: MDS-2
Registration & enquiry protocols, information models, query
languages
Provides standard interfaces to sensors
Supports different “directory” structures for various
discovery/access strategies
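The registration-and-enquiry pattern can be sketched as follows, with soft-state (TTL-based) registration standing in for GRRP and attribute queries standing in for GRIP; the schema and names are invented for illustration.

```python
# Sketch of an MDS-style index: "sensors" register themselves with a
# time-to-live (soft state), and clients query by attribute filters.
import time

index = {}   # resource name -> (attributes, expiry time)

def register(name, attributes, ttl=300):
    """Soft-state registration: entries expire unless refreshed."""
    index[name] = (attributes, time.monotonic() + ttl)

def query(**filters):
    """Enquiry: return live resources matching all attribute filters."""
    now = time.monotonic()
    return [name for name, (attrs, expiry) in index.items()
            if expiry > now
            and all(attrs.get(k) == v for k, v in filters.items())]

register("cluster1.example.org", {"type": "compute", "cpus": 64})
register("disk1.example.org", {"type": "storage", "cpus": 0})
print(query(type="compute"))   # ['cluster1.example.org']
```

The soft-state design is the key property: a resource that crashes simply stops refreshing and silently ages out of every directory built over it.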
Resource Management Architecture
[Figure: an application expresses its requirements in RSL. A broker, issuing queries to the Information Service, specializes the RSL; a co-allocator splits the result into simple “ground” RSL requests, one per GRAM server, and each GRAM drives a local resource manager (LSF, Condor, NQE). Systems built on this architecture include ASCI DISCOM, Condor-G, Nimrod-G, Poznan*, U. Lecce, DUROC, and MPICH-G2.]
* See talk by Jarek Nabrzyski et al.
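RSL specialization can be sketched as follows, using invented resource names and a dict-based stand-in for RSL: the broker rewrites an abstract request into ground requests against concrete resources reported by the information service.

```python
# Sketch of broker-side RSL specialization. SITES stands in for the
# Information Service; the dict-based "RSL" and hostnames are invented.
SITES = {
    "lsf.cluster.example.org":  {"manager": "LSF",    "free_cpus": 48},
    "condor.pool.example.org":  {"manager": "Condor", "free_cpus": 40},
}

def specialize(abstract_rsl):
    """Broker: turn an abstract request into per-site ground requests."""
    needed = abstract_rsl["count"]
    ground = []
    for site, info in SITES.items():
        if needed == 0:
            break
        take = min(needed, info["free_cpus"])
        ground.append({"resource": site,
                       "executable": abstract_rsl["executable"],
                       "count": take})
        needed -= take
    if needed:
        raise RuntimeError("not enough free CPUs anywhere")
    return ground   # a co-allocator would submit each piece via GRAM

plan = specialize({"executable": "/bin/sim", "count": 64})
print([(g["resource"], g["count"]) for g in plan])
```

The split plan (48 CPUs at one site, 16 at another) is exactly what the co-allocator in the figure hands to the individual GRAM servers.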
Data Grid Architecture
[Figure: the application presents an attribute specification to the Metadata Catalog, which returns a logical collection and logical file name. The Replica Catalog maps these to multiple physical locations (Replica Locations 1-3, backed by a tape library plus disk cache, a disk array, or a disk cache). A Replica Selection service, using performance information and predictions from NWS and MDS, chooses the selected replica, which is then accessed with GridFTP commands.]
+ “Virtual data”: transparency with respect to location and materialization (www.griphyn.org)
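Replica selection then reduces to picking the minimum predicted transfer time; a minimal sketch, with made-up hosts and bandwidth forecasts playing the role of NWS:

```python
# Sketch of replica selection: choose the replica with the lowest
# predicted transfer time. Hostnames and forecasts are invented.
def select_replica(size_mb, replicas):
    """replicas maps location -> predicted bandwidth (MB/s)."""
    return min(replicas, key=lambda loc: size_mb / replicas[loc])

replicas = {
    "replica1.example.org": 12.0,   # MB/s, as a forecaster might predict
    "replica2.example.org": 45.0,
    "replica3.example.org": 30.0,
}
print(select_replica(500, replicas))   # replica2.example.org
```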
Grid Futures
Large Grid Projects are in Place
DOE ASCI DISCOM
DOE Particle Physics Data Grid
DOE Earth Systems Grid
DOE Science Grid
DOE Fusion Collaboratory
European Data Grid
Egrid (see talk by G. Allen et al.)
NASA Information Power Grid
NSF National Technology Grid
NSF Network for Earthquake Eng Simulation
NSF Grid Application Development Software
NSF Grid Physics Network
Problem Evolution
Past-present: O(10^2) high-end systems; Mb/s networks; centralized (or entirely local) control
I-WAY (1995): 17 sites, week-long; 155 Mb/s
GUSTO (1998): 80 sites, long-term experiment
NASA IPG, NSF NTG: O(10) sites, production
Present: O(10^4)-O(10^6) data systems, computers; Gb/s networks; scaling, decentralized control
Scalable resource discovery; restricted delegation; community policy
GriPhyN Data Grid: 100s of sites, O(10^4) computers; complex policies
Future: O(10^6)-O(10^9) data, sensors, computers; Tb/s networks; highly flexible policy, control
The Future: All Software is Network-Centric
We don’t build or buy “computers” anymore, we borrow or
lease required resources
When I walk into a room, need to solve a problem, need to
communicate
A “computer” is a dynamically, often collaboratively
constructed collection of processors, data sources, sensors,
networks
Similar observations apply for software
And Thus …
Reduced barriers to access mean that we do much more
computing, and more interesting computing, than today =>
Many more components (& services); massive parallelism
All resources are owned by others => Sharing (for fun or
profit) is fundamental; trust, policy, negotiation, payment
All computing is performed on unfamiliar systems =>
Dynamic behaviors, discovery, adaptivity, failure
Summary
The Grid problem: Resource sharing & coordinated problem
solving in dynamic, multi-institutional virtual organizations
Grid architecture: Emphasize protocol and service definition to
enable interoperability and resource sharing
Globus Toolkit as a source of protocol and API definitions, reference
implementations
For more info: www.globus.org, www.griphyn.org,
www.gridforum.org