An Introduction to Grid Computing
BEAM Workshop
December 2004
Mark Servilla
LTER Network Office
SEEK-BEAM Workshop Dec 2004 2
Presentation Agenda
Definitions
Evolution of the Grid
Characteristics
Computing Model
Protocols
Examples
References
Definitions of a Grid
“… a network of conductors for distribution of electric power; also: a network of radio or television stations” – Merriam-Webster
“… the illusion of a simple yet large and powerful self-managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources” – IBM Redbooks
“Grid Computing enables virtual organizations to share geographically distributed resources as they pursue common goals, assuming the absence of central location, central control, omniscience, and an existing trust relationship.” – Globus Alliance
“The Web provides us information — the grid allows us to process it.” - Ahmar Abbas
The Evolution of Grid Technology
High-Performance Computing
Cluster Computing
Peer-to-Peer Computing
Internet Computing
High-Performance Computing
Traditionally known as supercomputing
Specialized for parallel processing algorithms
Shared equally among academia, research, and commercial sectors
Cluster Computing
Originated in 1994 with the Beowulf cluster at NASA
High-performance, massively parallel (2 to 1000 nodes)
Commodity hardware (Intel, AMD)
Low-cost software (Linux, FreeBSD)
Interconnected via high-speed private networks
Shared storage (SAN/NAS)
AMD Athlon cluster at the University of Heidelberg, Germany – 825 Gflops, the 35th fastest high-performance computer in the world
Peer-to-Peer Computing
Primarily used for distributed storage and file-sharing
Early models (rcp, scp, ftp): restricted to LANs or limited to known peers
Internet-based models: centralized (Napster, Kazaa*) and decentralized (Gnutella)
*Kazaa: 100,000,000 downloads by 2004; 2 million new downloads a week
Centralized Peer-to-Peer
[Diagram: peers sending queries (?) for .mp3 files through a central server]
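A minimal sketch of the centralized model, assuming a Napster-style design in which the server keeps only an index of which peer shares which file and the download itself then happens directly between peers. The class name, method names, and peer addresses are hypothetical; this is not the wire protocol of any real service.

```python
from collections import defaultdict

class CentralIndex:
    """Hypothetical central index server: it stores only who has what;
    the files themselves stay on the peers."""

    def __init__(self):
        self._owners = defaultdict(set)   # file name -> set of peer addresses

    def register(self, peer, files):
        # A peer announces its shared files when it connects.
        for name in files:
            self._owners[name].add(peer)

    def unregister(self, peer):
        # Drop a departing peer from every index entry.
        for owners in self._owners.values():
            owners.discard(peer)

    def search(self, name):
        # Queries go to the central server; the transfer is peer-to-peer.
        return sorted(self._owners.get(name, set()))

index = CentralIndex()
index.register("peer1:6699", ["song.mp3"])
index.register("peer2:6699", ["song.mp3", "other.mp3"])
print(index.search("song.mp3"))   # ['peer1:6699', 'peer2:6699']
index.unregister("peer1:6699")
print(index.search("song.mp3"))   # ['peer2:6699']
```

The single index makes search cheap, but it is also the single point of control and failure that the decentralized model removes.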
Decentralized Peer-to-Peer
[Diagram: peers forwarding queries (?) for .mp3 files to one another, with no central server]
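Without a central index, a query has to travel through the network itself. The sketch below assumes Gnutella-style flooding: each peer forwards the query to its neighbors, drops queries it has already seen, and stops forwarding once a hop limit (TTL) is used up. The topology, file lists, and function name are hypothetical inputs chosen for illustration.

```python
from collections import deque

def flood_search(neighbors, files, start, wanted, ttl=3):
    """Breadth-first query flooding with a hop limit (TTL).
    `neighbors` maps peer -> adjacent peers; `files` maps peer -> shared names."""
    hits = []
    seen = {start}                     # peers ignore queries they have already seen
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if wanted in files.get(peer, ()):
            hits.append(peer)
        if remaining == 0:
            continue                   # TTL exhausted: do not forward further
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits

neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
files = {"C": {"song.mp3"}, "E": {"song.mp3"}}
print(flood_search(neighbors, files, "A", "song.mp3", ttl=2))  # ['C']
print(flood_search(neighbors, files, "A", "song.mp3", ttl=3))  # ['C', 'E']
```

The TTL is the trade-off: a small value limits network traffic but can miss copies (peer E above) that lie beyond the query's horizon.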
Internet Computing
Volunteer or philanthropic computing; utilizes personal desktop computers connected to the Internet
Desktop computers are idle approximately 95% of their lifespan
Divide-and-conquer approach:
Tasks are broken into smaller subtasks
Desktops execute subtasks during idle time
Desktops send results back to a central server, which aggregates them
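The divide-and-conquer cycle above can be sketched in a few lines. This is a toy, assuming a workload that splits into independent ranges (a sum of squares stands in for real science); threads stand in for volunteer desktops, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_task(n, chunk):
    """Server side: break the range [0, n) into independent subtasks."""
    return [range(start, min(start + chunk, n)) for start in range(0, n, chunk)]

def run_subtask(subtask):
    """Volunteer side: process one subtask during idle time (toy workload)."""
    return sum(i * i for i in subtask)

def aggregate(partials):
    """Server side: combine the partial results sent back by volunteers."""
    return sum(partials)

subtasks = split_task(1_000, chunk=100)
with ThreadPoolExecutor(max_workers=4) as pool:   # threads stand in for desktops
    partials = list(pool.map(run_subtask, subtasks))
print(len(subtasks), aggregate(partials))         # 10 332833500
```

A real volunteer system would also have to handle desktops that disappear mid-task, for example by reissuing subtasks whose results never arrive.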
Synthesis into the Grid
High-performance computing: pioneered the use of “parallel” algorithms
Cluster computing: demonstrated shared computing and storage, along with load-balancing protocols
Peer-to-peer computing: distributed storage resources with no central authority
Internet computing: a geographically distributed virtual organization; the fabric of the project vanishes when the task completes
Grid Characteristics
Resources that:
are connected via a network
are geographically distributed
may consist of heterogeneous hardware and/or software
are managed transparently for performance and fault tolerance
Creates the illusion of virtual organizations and projects without a central authority or central control
Explicit trust relationships between users and resources
A system that scales in space and time
Types of Resources
Computation: utilization of computing cycles found on processors of the machines on the grid
Storage: to increase capacity, performance, sharing, and reliability of data
Communication: to increase capacity, performance, and reliability of data communication
Collaboration tools: to facilitate collaboration through conferencing, visualization, and data sharing
Software and licenses: to share site-specific software and/or licenses
Special equipment, capacities, architectures, and policies: printers, imaging, sensors, or other local specialty resources
Grid Ingredients
Grid Topologies
Departmental Grids: localized to a specific group of people; generally the same hardware and software; designed for high throughput and high performance over a dedicated network
Enterprise Grids: service to numerous groups within a single company or campus; resource heterogeneity increases; company-wide local area network
Extraprise Grids: service to multiple companies, partners, and customers within a particular domain; domain-based private network
Global Grids: established over the public Internet
Resource-based Grids
Compute Grids: desktop nodes, server nodes, high-performance computing clusters
Data Grids: performance-based distributed storage; replication for fault tolerance
Collaboration Grids: support for video-conferencing, visualization, and data sharing
Utility Grids: maintained and managed by a commercial service provider; compute resources acquired on a per-need basis; application resources purchased on a per-use or per-minute basis
Application Characteristics
Perfect Parallelism – computations run autonomously (Monte Carlo Simulations)
Data Parallelism – operations performed on data simultaneously (database searches)
Functional Parallelism – multiple operations are performed simultaneously
Optimized for parallel execution
Not capable of parallel computation, e.g., the Fibonacci series (1, 1, 2, 3, 5, 8, 13, 21, …), where each term depends on the two before it: F(k+2) = F(k+1) + F(k)
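The two ends of this spectrum can be contrasted in a short sketch. It assumes a Monte Carlo estimate of π as the perfectly parallel case (each task has its own random generator and needs nothing from the others) and the Fibonacci recurrence as the sequential case. Threads stand in for grid nodes, and the task sizes are arbitrary choices.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def pi_chunk(seed, samples=50_000):
    """One Monte Carlo task: it shares no state with any other task."""
    rng = random.Random(seed)                      # private generator per task
    inside = sum(1 for _ in range(samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return inside, samples

# Perfect parallelism: these 8 tasks could run on 8 different grid nodes.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(pi_chunk, range(8)))
estimate = 4 * sum(i for i, _ in results) / sum(n for _, n in results)
print(estimate)                                    # close to 3.14

def fibonacci(count):
    """Computed by the recurrence, F(k+2) cannot start until F(k+1) and F(k) exist."""
    seq = [1, 1]
    while len(seq) < count:
        seq.append(seq[-1] + seq[-2])
    return seq[:count]

print(fibonacci(8))                                # [1, 1, 2, 3, 5, 8, 13, 21]
```

Adding more nodes speeds up the first computation almost linearly, while the loop in `fibonacci` gains nothing from extra nodes because every step waits on the previous one.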
Questions to Ask When Thinking Grid
Identity and Authentication—Is this user who he says he is? Is this program the right program?
Authorization and Policy—What can the user do on the grid? What can the application do on the grid? What resources are the user and/or application allowed to access?
Resource Discovery—Where are the resources?
Resource Characterization—What types of resources are available?
Resource Allocation—What policy is applied when assigning the resources? What is the actual process of assigning the resources? Who gets how much?
Resource Management—Which resource can be used at what time and for what purpose?
Accounting/Billing/Service Level Agreement (SLA)—How much of the resources is being used? What is the rating schedule? What is the SLA?
Security—How do I make sure that this is done securely? How do we know if we have been compromised? What steps are taken once a security breach is detected?
A Grid Computing Model
(the Globus view)
Software stack consisting of standards, protocols, APIs, and SDKs
Loosely based on the Internet model
A detailed view…
Fabric – protocols and interfaces to the resources being shared
Connectivity – protocols for grid-specific network transactions (IP, DNS, WSDL); Security implementation (GSI)
Resource – protocols to initiate and control sharing of local resources (GRAM, GridFTP, GRIS)
Collective – protocols for system-wide deployment (versus local)
Application – protocols targeted at a specific application or class of applications
Grid Protocols
Grid Security Infrastructure (GSI)
Grid Resource Allocation and Management (GRAM)
Grid File Transfer Protocol (GridFTP)
Grid Information Services (GIS)
Grid Security Infrastructure
Extended from SSL/TLS and X.509 protocols; utilizes PKI with a Certificate Authority
Primary objective is “Authentication”: generates a primary credential; generates a temporary proxy credential
Certificate Authority responsibilities:
Positively identifying entities requesting certificates
Issuing, removing, and archiving certificates
Protecting the Certificate Authority server
Maintaining a namespace of unique names for certificate owners
Serving signed certificates to those needing to authenticate entities
Logging activity
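The relationship between a primary credential, a temporary proxy credential, and the Certificate Authority can be modeled as a short chain that is checked link by link. This is a toy sketch: the `Credential` class, the "/CN=proxy" naming, the 12-hour lifetime, and the example names are illustrative assumptions, and the signature and key checks that real X.509/GSI validation performs are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Credential:
    """Toy stand-in for an X.509 certificate: subject, issuer, expiry only.
    Real GSI credentials also carry keys and signatures; those checks are omitted."""
    subject: str
    issuer: str
    not_after: datetime

def make_proxy(user_cred, hours=12, now=None):
    """Derive a short-lived proxy credential, issued by the user rather than by the CA."""
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = min(now + timedelta(hours=hours), user_cred.not_after)  # never outlive the parent
    return Credential(subject=user_cred.subject + "/CN=proxy",      # illustrative naming
                      issuer=user_cred.subject, not_after=expiry)

def chain_is_valid(chain, trusted_cas, now=None):
    """Check a chain ordered proxy -> user: each link issued by the next, none expired,
    and the last one issued by a trusted Certificate Authority."""
    if now is None:
        now = datetime.now(timezone.utc)
    if any(cred.issuer != parent.subject for cred, parent in zip(chain, chain[1:])):
        return False
    if any(cred.not_after <= now for cred in chain):
        return False
    return chain[-1].issuer in trusted_cas

now = datetime(2004, 12, 1, tzinfo=timezone.utc)
ca = "/O=Grid/CN=Example CA"                                  # hypothetical names
user = Credential("/O=Grid/CN=Jane Doe", ca, now + timedelta(days=365))
proxy = make_proxy(user, hours=12, now=now)
print(chain_is_valid([proxy, user], {ca}, now=now))                        # True
print(chain_is_valid([proxy, user], {ca}, now=now + timedelta(hours=13)))  # False: proxy expired
```

The short proxy lifetime is the point of the design: the long-lived primary credential is used once to sign the proxy, and the proxy, which does less damage if stolen, is what gets handed to remote jobs.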
Public Key Infrastructure
User A (sender):
1. User A encrypts (signs) the message with A’s private key
2. Obtains User B’s public key from the CA
3. Encrypts the message with B’s public key
4. Sends the message
User B (receiver):
1. User B decrypts the message with B’s private key
2. Obtains User A’s public key from the CA
3. Decrypts A’s message with A’s public key
4. B knows the message is from A
[Diagram: users “A” and “B”, each holding a public and a private key; the Certificate Authority provides B’s public key and A’s public key; authentication credential]
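The two four-step sequences above can be traced with textbook RSA on tiny numbers. This is a toy only: the primes, exponents, and message are arbitrary assumptions, there is no padding, and the keys are far too small to be secure. It simply shows that A's private-key step followed by B's public-key step can be undone by B's private key and then A's public key. B's modulus is chosen larger than A's so that A's signed value fits inside it.

```python
def make_keypair(p, q, e):
    """Textbook RSA key pair from two small primes (toy sizes, no padding: insecure)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)
    return (n, e), (n, d)                 # (public key, private key)

def apply_key(key, value):
    n, exponent = key
    return pow(value, exponent, n)

# Hypothetical keys; in the slide's model the CA hands out the public halves.
a_public, a_private = make_keypair(61, 53, 17)   # n = 3233
b_public, b_private = make_keypair(89, 97, 5)    # n = 8633, larger than A's n

message = 65
# User A: apply A's private key (sign), then encrypt with B's public key.
signed = apply_key(a_private, message)
ciphertext = apply_key(b_public, signed)

# User B: decrypt with B's private key, then apply A's public key.
recovered = apply_key(a_public, apply_key(b_private, ciphertext))
print(recovered == message)   # True: A's public key undoes A's private-key step
```

Only B can remove the outer layer, which gives confidentiality; recovering the message with A's public key is what lets B attribute it to A.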
Grid Resource Allocation and Management
Allows programs to be started on remote resources
Resource Specification Language (RSL)
Resource requirements: machine type, number of nodes, memory, etc.
Job configuration: directory, executable, arguments, environment
Communication protocols: HTTP-based RPC (early protocol); web services (WSDL, SOAP)
“create 5-10 instances of myprog, each on a machine with at least 64MB memory that is available to me for 4 hours, or 10 instances, on a machine with at least 32MB of memory”
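In GRAM this request would be written in RSL; the sketch below does not reproduce RSL syntax. It only models, under stated assumptions, the matching a resource manager has to perform for the quoted request: try the first alternative, and fall back to the second. The `Machine` fields and function names are hypothetical, and because the quote gives no time limit for the second alternative, the sketch assumes none.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    memory_mb: int
    hours_available: float

def match(machines, min_count, max_count, min_memory_mb, min_hours):
    """One alternative of a request: up to max_count qualifying machines, or None if too few."""
    fits = [m for m in machines
            if m.memory_mb >= min_memory_mb and m.hours_available >= min_hours]
    return fits[:max_count] if len(fits) >= min_count else None

def allocate(machines):
    """The quoted request, tried in order:
    5-10 instances on machines with >= 64 MB that are free for 4 hours, or else
    10 instances on machines with >= 32 MB (no time limit stated, so 0 is assumed)."""
    return match(machines, 5, 10, 64, 4) or match(machines, 10, 10, 32, 0)

pool = ([Machine(f"big{i}", 64, 8) for i in range(4)] +
        [Machine(f"small{i}", 32, 2) for i in range(8)])
chosen = allocate(pool)
print(len(chosen), chosen[0].name)   # 10 big0: only 4 machines meet the first alternative
```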
Grid File Transfer Protocol
Provides high-speed, reliable transfer of large volumes of data (petabytes)
Extension of standard FTP to include:
striped/parallel data channels
partial file transfers
automatic and manual TCP buffer size settings
progress monitoring
extended restart functionality
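Three of these extensions, parallel channels, partial transfers, and restart, fit together naturally: split the file into byte ranges, move each range on its own channel, and remember which ranges finished so a restarted transfer sends only the rest. The sketch below is an in-memory toy under those assumptions and does not implement GridFTP's wire format. It also computes the usual rule of thumb for TCP buffer sizing, the bandwidth-delay product (buffer = bandwidth × round-trip time).

```python
from concurrent.futures import ThreadPoolExecutor

def plan_ranges(size, streams):
    """Split [0, size) into one contiguous byte range per parallel stream."""
    if size == 0:
        return []
    step = -(-size // streams)                    # ceiling division
    return [(off, min(off + step, size)) for off in range(0, size, step)]

def transfer(source, dest, ranges, done):
    """Copy each pending byte range on its own thread; `done` records finished
    ranges so a restarted transfer can skip them (a toy restart marker)."""
    pending = [r for r in ranges if r not in done]

    def copy(rng):
        start, end = rng
        dest[start:end] = source[start:end]       # partial-file transfer of one block
        return rng

    with ThreadPoolExecutor(max_workers=max(1, len(pending))) as pool:
        done.update(pool.map(copy, pending))

def tcp_buffer_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return bandwidth_bits_per_s * rtt_s / 8

data = bytes(range(256)) * 40             # 10,240-byte stand-in for a large file
copy_buf = bytearray(len(data))
ranges = plan_ranges(len(data), streams=4)
done = set()
transfer(data, copy_buf, ranges[:2], done)   # first attempt "fails" after two blocks
transfer(data, copy_buf, ranges, done)       # restart sends only the missing blocks
print(bytes(copy_buf) == data, len(done))    # True 4
print(tcp_buffer_bytes(1e9, 0.08))           # 1 Gbit/s at 80 ms RTT: about 10 MB
```

The buffer calculation explains why default TCP settings hurt wide-area transfers: a buffer much smaller than bandwidth × RTT caps throughput no matter how fast the link is, which is one reason to open several parallel streams.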
Grid Information Services
Grid Resource Information Service (GRIS): provides resource-specific information
Grid Resource Registration (GRR): updates GRIS about resource status
Grid Index Information Service (GIIS): an aggregate directory service; provides a collection of information that has been gathered from multiple GRIS servers
Grid Resource Inquiry (GRI): queries GRIS servers for resource information; queries GIIS servers for information
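The division of labor among these services can be sketched as two in-memory classes: one GRIS per resource, and a GIIS that aggregates whatever GRIS servers have registered with it. Real Globus information services were directory (LDAP) servers; the classes, method names, and attribute names here are hypothetical and only illustrate the roles the slide describes.

```python
class GRIS:
    """Hypothetical per-resource information service holding that resource's attributes."""
    def __init__(self, resource, **attributes):
        self.resource = resource
        self.attributes = dict(attributes)

    def update(self, **status):              # the role the slide assigns to GRR
        self.attributes.update(status)

    def query(self):                         # a GRI-style inquiry against one GRIS
        return {"resource": self.resource, **self.attributes}

class GIIS:
    """Hypothetical aggregate directory built from many registered GRIS servers."""
    def __init__(self):
        self._registered = []

    def register(self, gris):
        self._registered.append(gris)

    def query(self, predicate=lambda record: True):   # a GRI-style inquiry against the index
        return [rec for rec in (g.query() for g in self._registered) if predicate(rec)]

node1 = GRIS("cluster-a", cpus=64, free_cpus=10)
node2 = GRIS("cluster-b", cpus=128, free_cpus=0)
index = GIIS()
index.register(node1)
index.register(node2)
node2.update(free_cpus=32)                   # new status is visible on the next query
print([r["resource"] for r in index.query(lambda r: r["free_cpus"] >= 16)])  # ['cluster-b']
```

This toy GIIS asks every GRIS on each query; a production index would more likely cache records and refresh them periodically, trading freshness for scale.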
Open Grid Services Architecture
Marriage of grid protocols with web service protocols
Specifications for:
how Grid Services are created and discovered
how Grid Service instances are named and referenced
interfaces that define any Grid Service
Initial release with GT 3.0 mid-2003; GT 4.0 Jan 2005
Grid Examples
Network for Earthquake Engineering and Simulation (NEESGrid)
Biomedical Informatics Research Network (BIRN)
EcoGrid
NEESGrid (Network for Earthquake Engineering and Simulation)
Linking scientists and facilities: observation of an experiment in progress; observation before and after an experiment; remote operation of an experiment
Linking facilities and data: hybrid operation of physical simulations with other simulations, both physical and numerical; automatic archiving of raw data, calibration data, and processed data
Linking scientists and data: collaborative views (static) of time-synchronized data visualizations; collaborative views of time-synchronized data visualizations with video and audio recordings
Linking scientists and other scientists: synchronous communication, such as with colleagues during an experiment; asynchronous communication, such as with colleagues over the course of preparing a publication resulting from an experiment
NEESGrid (Network for Earthquake Engineering and Simulation)
Network Architecture Diagram
BIRN (Biomedical Informatics Research Network)
Testbed for a biomedical knowledge infrastructure
Federated database of neuro-imaging data
Fusion of diverse data sources (location; level of aggregation)
Grid access to computational resources
Datamining software
Scalable and extensible
Driven by research needs, not by technology-pull or technology-push
EcoGrid
Metadata standardization: Ecological Metadata Language (“EML”)
Integrate diverse data networks from ecology, biodiversity, and environmental sciences
Standardized interfaces to data resources: Metacat, SRB, DiGIR, Xanthoria
Metadata-mediated data access (application-based): supports multiple metadata standards, with EML and Darwin Core as foci
Computational services: pre-defined analytical services; on-the-fly analytical services
EcoGrid
*EML facilitates semi-automatic data binding
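Semi-automatic data binding means the column names and types come from the metadata rather than being typed in by hand. A minimal sketch, assuming a heavily simplified EML-like fragment (a real EML document carries much more, such as physical format and units) and an assumed mapping from `storageType` values to Python converters. The data rows are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified, EML-like metadata for a headerless CSV table (not a complete EML document).
EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1" packageId="example.1" system="example">
  <dataset>
    <dataTable>
      <entityName>plots.csv</entityName>
      <attributeList>
        <attribute><attributeName>site</attributeName><storageType>string</storageType></attribute>
        <attribute><attributeName>temp_c</attributeName><storageType>float</storageType></attribute>
        <attribute><attributeName>count</attributeName><storageType>integer</storageType></attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml:eml>"""

DATA = "site_a,21.5,3\nsite_b,30.25,7\n"          # made-up rows

CONVERTERS = {"string": str, "float": float, "integer": int}   # assumed type mapping

def schema_from_eml(text):
    """Read (name, converter) pairs for each attribute, in document order."""
    root = ET.fromstring(text)
    return [(attr.findtext("attributeName"),
             CONVERTERS.get(attr.findtext("storageType"), str))
            for attr in root.iter("attribute")]

def bind(text, schema):
    """Attach names and types from the metadata to headerless CSV rows."""
    return [{name: convert(value) for (name, convert), value in zip(schema, row)}
            for row in csv.reader(io.StringIO(text))]

records = bind(DATA, schema_from_eml(EML))
print(records[1])   # {'site': 'site_b', 'temp_c': 30.25, 'count': 7}
```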
Grid Organizations
Globus Alliance
Globus Toolkit™: reference implementation of the grid architecture and grid protocols
http://www.globus.org
NSF Middleware Initiative (NMI)
Supports the design, development, testing, and deployment of middleware for HPC
http://www.nsf-middleware.org
GRIDS Center
Grid Research Integration Deployment and Support Center, part of NMI
http://www.grids-center.org
Global Grid Forum
Main standards body governing the world-wide grid community
http://www.globalgridforum.org
Recommended Texts
Ahmar Abbas. Grid Computing: A Practical Guide to Technology and Applications. Charles River Media, 2004.
Luis Ferreira et al. Introduction to Grid Computing with Globus. IBM Redbooks, 2004.
Bart Jacob et al. Enabling Applications for Grid Computing with Globus. IBM Redbooks, 2003.
Luis Ferreira et al. Grid Services Programming and Application Enablement. IBM Redbooks, 2004.