Outline
- Motivation
- Definition and characteristics of Grids
- Example Grid applications
- Grid Architecture
- How a Grid Is Assembled
- Overview of the Globus Toolkit
  - Security Tools
  - Monitoring and Discovery System
  - Computing/Execution Tools
  - Data Tools
- A more detailed example: The Earth System Grid
Motivation: Supporting Scientific Applications
- Computation intensive
  - Large-scale simulation and analysis (climate modeling, galaxy formation, gravity waves, event simulation)
  - Engineering (parameter studies, linked models)
- Data intensive
  - Experimental data analysis (high energy physics)
  - Image & sensor analysis (astronomy, climate)
- Distributed collaboration
  - Online instrumentation (microscopes, x-ray)
  - Remote visualization (climate studies, biology)
  - Engineering (large-scale structural testing)
- Large, complex scientific problems
  - Require people in several organizations to collaborate
  - Share computing resources, data, instruments
The Grid Problem
Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (from "The Anatomy of the Grid: Enabling Scalable Virtual Organizations")
- Enable communities ("Virtual Organizations") to share geographically distributed resources as they pursue common goals
- Assuming the absence of:
  - central location
  - central control
  - omniscience
  - existing trust relationships
An Old Idea …
“The time-sharing computer system can unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.” Fernando Corbato and Robert Fano, 1966
“We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.” Len Kleinrock, 1967
Earth System Grid objectives
To support the infrastructural needs of the national and international climate community, ESG is providing crucial technology to securely access, monitor, catalog, transport, and distribute data in today's Grid computing environment.
[Figure: ESG architecture, showing HPC hardware running climate models, the ESG sites, and the ESG Portal. Slide courtesy of Dave Bernholdt, ORNL.]
ESG Facts and Figures

ESG Portal at NCAR:
- 130 TB of data at four locations; 840,331 files
- Includes the past 6 years of joint DOE/NSF climate modeling experiments
- 3,200 registered users
- Downloads to date: 25 TB, 91,000 files

IPCC AR4 ESG Portal:
- 28 TB of data at one location; 68,400 files
- Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
- Model data from 11 countries
- 818 registered analysis projects
- Downloads to date: 123 TB, 543,500 files (300 GB/day average)
- 300 scientific papers published to date based on analysis of IPCC AR4 data
[Charts: worldwide ESG user base; IPCC downloads in GB/day, daily and 7-day average, Nov 2004 – Oct 2006 (as of 10/12/06). Slide courtesy of Dave Bernholdt, ORNL.]
NSF's TeraGrid*
A National Science Foundation Investment in Cyberinfrastructure
- $100M 3-year construction (2001-2004); $150M 5-year operation & enhancement (2005-2009)
- Sites: UCSD, UT, UC/ANL, NCSA, PSC, ORNL, PU, IU

TeraGrid DEEP: Integrating NSF's most powerful computers (60+ TF)
- 2+ PB online data storage
- National data visualization facilities
- World's most powerful network (national footprint)

TeraGrid WIDE Science Gateways: Engaging Scientific Communities
- 90+ community data collections
- Growing set of community partnerships spanning the science community
- Leveraging NSF ITR, NIH, DOE, and other science community projects
- Engaging peer Grid projects such as Open Science Grid in the U.S., as well as peer Grids in Europe and Asia-Pacific

Base TeraGrid Cyberinfrastructure: Persistent, Reliable, National
- Coordinated distributed computing and information environment
- Coherent user outreach, training, and support
- Common, open infrastructure services

* Slide courtesy of Ray Bair, Argonne National Laboratory
Data Grids for High Energy Physics
(Image courtesy Harvey Newman, Caltech)

[Diagram: tiered data distribution for LHC physics]
- Tier 0: CERN Computer Centre; the Online System feeds the Offline Processor Farm (~20 TIPS) at ~100 MBytes/sec, with raw data leaving the detector at ~PBytes/sec
- Tier 1: regional centres (FermiLab ~4 TIPS; France, Italy, and Germany regional centres) linked to CERN at ~622 Mbits/sec (or air freight, deprecated)
- Tier 2: centres of ~1 TIPS each (e.g., Caltech), linked at ~622 Mbits/sec
- Institute servers (~0.25 TIPS each) with physics data caches
- Tier 4: physicist workstations, at ~1 MBytes/sec

Key numbers:
- There is a "bunch crossing" every 25 nsecs
- There are 100 "triggers" per second; each triggered event is ~1 MByte in size
- Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
- 1 TIPS is approximately 25,000 SpecInt95 equivalents
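The quoted rates are easy to sanity-check. A quick back-of-envelope calculation, using only the numbers from the slide (100 triggers/sec, ~1 MByte per event):

```python
# Back-of-envelope check of the trigger data rate quoted above.
# Inputs are taken directly from the slide.
triggers_per_second = 100
event_size_mb = 1.0  # ~1 MByte per triggered event

rate_mb_per_s = triggers_per_second * event_size_mb
print(rate_mb_per_s)  # 100.0 -> matches the "~100 MBytes/sec" link

# Sustained daily volume at that rate, in TB (10^12 bytes):
tb_per_day = rate_mb_per_s * 1e6 * 86400 / 1e12
print(round(tb_per_day, 2))  # 8.64 TB/day
```

So the ~100 MBytes/sec link out of the online system is exactly the trigger rate times the event size, and it implies on the order of 8-9 TB of triggered data per day.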
Elements of a Grid
- Resource sharing
  - Computers, storage systems, sensors, networks, …
  - This sharing is always conditional: issues of trust, policy, negotiation, payment, etc.
- Coordinated problem solving
  - Distributed data analysis, computation, simulation, collaboration, …
- Dynamic, multi-institutional virtual organizations
  - Community overlays on classic organizational structures
  - May be large or small, static or dynamic
Two Rules (or Principles) of the Grid
1. Can't rely on homogeneity of resources
   - In practice, resources in a large, distributed environment will be heterogeneous
   - STRATEGY: Plan for diverse systems and use mechanisms to manage heterogeneity
2. Can't rely on trust among participants
   - Sites will not be willing to share their resources if they cannot trust clients from other sites
   - STRATEGY: Provide a security model that can express complicated social networks
   - STRATEGY: Use full disclosure when making requests (who is requesting, authorizing, and authenticating the request) and give service owners tools to enforce local policies
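The full-disclosure strategy above can be made concrete with a small sketch: a request carries its requester, authorizer, and authentication method explicitly, and each site applies its own local policy. All names and the policy structure here are illustrative assumptions, not a real Grid API.

```python
# Toy model of "full disclosure" plus local policy enforcement.
# The field names and policy format are made up for illustration.
from dataclasses import dataclass

@dataclass
class Request:
    requester: str      # identity making the request
    authorizer: str     # identity vouching for it (e.g., a VO broker)
    auth_method: str    # how the requester was authenticated
    action: str         # what is being asked for

def site_allows(request: Request, local_policy: dict) -> bool:
    """Local policy decision: the site, not the VO, has the final word."""
    rule = local_policy.get(request.action)
    if rule is None:
        return False  # default deny
    return (request.authorizer in rule["trusted_authorizers"]
            and request.auth_method in rule["accepted_auth"])

policy = {
    "submit_job": {
        "trusted_authorizers": {"vo-broker.example.org"},
        "accepted_auth": {"x509"},
    }
}

req = Request("alice@uni-a.example", "vo-broker.example.org",
              "x509", "submit_job")
print(site_allows(req, policy))  # True
print(site_allows(Request("bob@uni-b.example", "unknown-broker",
                          "x509", "submit_job"), policy))  # False
```

The point of the sketch is the division of labor: the requester discloses everything up front, and the resource owner keeps an enforceable local policy rather than extending blanket trust to other sites.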
Grid Infrastructure
- Provides distributed management
  - Of physical resources
  - Of software services
  - Of communities and their policies
- Unified treatment
  - Build on Web Services framework
  - Use Web Services Resource Framework (WS-RF), Web Services Notification (WS-Notification), etc. to represent and access state associated with a service
  - Common management abstractions & interfaces
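The WS-RF/WS-Notification idea above can be illustrated schematically: a service exposes named "resource properties" that clients can read, and interested parties can subscribe to change notifications. This is a plain-Python sketch of the pattern only, not the actual WS-RF wire protocol or API.

```python
# Schematic sketch of the WS-RF / WS-Notification pattern:
# named resource properties plus change notifications.
from typing import Any, Callable

class StatefulResource:
    def __init__(self, **properties: Any) -> None:
        self._props = dict(properties)
        self._subscribers: list[Callable[[str, Any], None]] = []

    def get_resource_property(self, name: str) -> Any:
        """Analogous to WS-RF's GetResourceProperty operation."""
        return self._props[name]

    def subscribe(self, callback: Callable[[str, Any], None]) -> None:
        """Analogous to a WS-Notification Subscribe."""
        self._subscribers.append(callback)

    def set_resource_property(self, name: str, value: Any) -> None:
        """Update state and notify every subscriber of the change."""
        self._props[name] = value
        for notify in self._subscribers:
            notify(name, value)

# A compute resource advertising its state:
node = StatefulResource(free_cpus=16, queue_length=0)
events = []
node.subscribe(lambda name, value: events.append((name, value)))
node.set_resource_property("free_cpus", 12)
print(node.get_resource_property("free_cpus"))  # 12
print(events)  # [('free_cpus', 12)]
```

This is what "representing and accessing state associated with a service" means in practice: the state lives behind a uniform get/subscribe interface rather than in service-specific ad hoc calls.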
Elements of the End-to-End Problem Include …
- Massively parallel petascale simulation
- High-performance parallel I/O
- Remote visualization
- High-speed reliable data movement
- Terascale local analysis
- Data access and analysis by external users
- Troubleshooting problems in the end-to-end system
- Security
- Orchestration of these various activities
Slide Courtesy of Ian Foster
Layered Grid Architecture (By Analogy to Internet Architecture)
- Application
- Collective: "Coordinating multiple resources": ubiquitous infrastructure services, app-specific distributed services
- Resource: "Sharing single resources": negotiating access, controlling use
- Connectivity: "Talking to things": communication (Internet protocols) & security
- Fabric: "Controlling things locally": access to, & control of, resources

(Internet Protocol Architecture analogy: Application, Transport, Internet, Link)
Protocols, Services, and APIs Occur at Each Level
- Applications, built on Languages/Frameworks
- Collective layer: Collective Service APIs and SDKs; Collective Services; Collective Service Protocols
- Resource layer: Resource APIs and SDKs; Resource Services; Resource Service Protocols
- Connectivity layer: Connectivity APIs; Connectivity Protocols
- Fabric Layer: Local Access APIs and Protocols
Important Points
- Built on Internet protocols & services
  - Communication, routing, name resolution, etc.
- "Layering" here is conceptual; it does not imply constraints on who can call what
- Protocols/services/APIs/SDKs will, ideally, be largely self-contained
- Some things are fundamental: e.g., communication and security
- But it is advantageous for higher-level functions to use common lower-level functions
The Hourglass Model
- Focus on architecture issues
  - Propose a set of core services as basic infrastructure
  - Use them to construct high-level, domain-specific solutions
- Design principles
  - Keep participation cost low
  - Enable local control
  - Support for adaptation
- "IP hourglass" model: applications and diverse global services at the top, a narrow waist of core services in the middle, the local OS at the bottom
Connectivity Layer: Protocols & Services
(GSI: www.gridforum.org/security)
- Communication protocols
  - Internet protocols: IP, DNS, routing, etc.
- Security protocols and infrastructure
  - Uniform authentication, authorization, and message protection mechanisms in a multi-institutional setting
  - Single sign-on, delegation, identity mapping
  - E.g., public key technology, SSL, X.509, GSS-API
  - Supporting infrastructure: Certificate Authorities, certificate & key management, …
Resource Layer: Protocols & Services
- Job submission and management tools
  - Remote allocation, advance reservation, control of compute resources
- Data transport tools
  - High-performance data access & transport
- Information provider
  - Collects information about the current state of a resource and makes it available to higher-level services
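The resource-layer capabilities listed above (remote allocation, advance reservation, state reporting) can be sketched for a single compute resource. Class and method names here are illustrative, not the interface of a real resource manager such as GRAM.

```python
# Minimal sketch of resource-layer operations on one compute resource:
# advance reservation, job submission against a reservation, and a
# status query in the information-provider role.
class ComputeResource:
    def __init__(self, total_cpus: int) -> None:
        self.total_cpus = total_cpus
        self.reservations: dict[str, int] = {}  # reservation id -> cpus
        self.jobs: dict[str, str] = {}          # job id -> state

    def reserve(self, res_id: str, cpus: int) -> bool:
        """Advance reservation: hold capacity before the job arrives."""
        if cpus + sum(self.reservations.values()) > self.total_cpus:
            return False
        self.reservations[res_id] = cpus
        return True

    def submit(self, job_id: str, res_id: str) -> bool:
        """Submit a job against an existing reservation."""
        if res_id not in self.reservations:
            return False
        self.jobs[job_id] = "running"
        return True

    def status(self, job_id: str) -> str:
        """Information-provider role: report current job state."""
        return self.jobs.get(job_id, "unknown")

node = ComputeResource(total_cpus=32)
print(node.reserve("r1", 16))      # True
print(node.reserve("r2", 24))      # False: would exceed capacity
print(node.submit("job-1", "r1"))  # True
print(node.status("job-1"))        # running
```

Note that each operation acts on a single resource; coordinating reservations across several such resources is exactly what the collective layer adds.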
Collective Layer: Protocols & Services
- Information services
  - Aggregate and publish information about resource characteristics
  - Monitor current status of resources
- Resource brokers
  - Resource discovery and allocation
- Metadata and replica catalogs
- Data management services (e.g., replication)
- Co-reservation and co-allocation services
- Workflow management services
Example: High-Throughput Computing System
- App: High-throughput computing system
- Collective (App): Dynamic checkpoint, job management, failover, staging
- Collective (Generic): Brokering, certificate authorities
- Resource: Access to data, access to computers, access to network performance data
- Connectivity: Communication, service discovery (DNS), authentication, authorization, delegation
- Fabric: Storage systems, schedulers
Example: Grid Services for Data-Intensive Applications
- App: Discipline-specific Data Grid application
- Collective (App): Coherency control, replica selection, task management, data placement services, …
- Collective (Generic): Replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, …
- Resource: Access to data, access to computers, access to network performance data, …
- Connectivity: Communication, service discovery (DNS), authentication, authorization, delegation
- Fabric: Storage systems, clusters, networks, network caches, …